# Control

Native control module for SD.Next for the Diffusers backend
Can be used for control-guided generation as well as image and text workflows
## Install

### Final release

For the final release no extra steps will be required
### Pre-release

- Make sure you're using the latest SD.Next from the dev branch:

  ```
  git checkout dev
  webui --upgrade
  ```

- Make sure you enable extra logging as described below
- Yes, you need to be in `backend=diffusers`
- Any issues should be reported on the SD.Next Discord server in the dedicated channel:
  https://discord.com/channels/1101998836328697867/1186383781066719322
  Do not create GitHub issues for pre-release versions
### Additional steps

- ControlNet-XS support requires `diffusers` 0.25.dev from the latest main branch:

  ```
  venv\Scripts\activate (on Windows)
  source venv/bin/activate (on Linux)
  pip uninstall diffusers
  pip install git+https://github.com/huggingface/diffusers
  exit
  ```

  After changing the `diffusers` version, you need to start SD.Next using the `webui --experimental` flag
  or it will automatically reinstall the latest known supported version (a quick verification sketch follows these steps)
- DWPose: requires the OpenMMLab framework

  ```
  pip install openmim
  mim install mmengine mmcv mmpose mmdet
  ```
- MediaPipe: requires the MediaPipe framework

  ```
  pip install mediapipe
  ```
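To confirm these installs took effect, a quick check from inside the activated venv (a minimal sketch, not part of SD.Next itself):

```python
# report the active diffusers build and verify the optional processor frameworks
import importlib

import diffusers

print(f"diffusers: {diffusers.__version__}")  # a successful upgrade shows a 0.25.x dev build
for pkg in ("mmpose", "mediapipe"):  # DWPose backend and MediaPipe respectively
    try:
        importlib.import_module(pkg)
        print(f"{pkg}: ok")
    except ImportError as exc:
        print(f"{pkg}: missing ({exc})")
```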
## Supported Control Models

- lllyasviel ControlNet for SD 1.5 and SD-XL models
  Includes ControlNets as well as Reference-only mode and any compatible 3rd-party models
  Original ControlNets are 1.4GB each for SD15; for SDXL they are a massive 4.9GB each
- VisLearn ControlNet XS for SD 1.5 and SD-XL models
  Lightweight ControlNet models for SDXL at only 165MB each with near-identical results
- TencentARC T2I-Adapter for SD 1.5 and SD-XL models
  Adapters provide similar functionality at a much lower resource cost of only 300MB each
- CiaraRowles TemporalNet for SD 1.5 models

All built-in models are downloaded upon first use and stored in `/models/controlnets`, `/models/adapters`, `/models/processors`
Listed below are all models that are supported out-of-the-box:
### ControlNet

- SD15:
  Canny, Depth, IP2P, LineArt, LineArt Anime, MLSD, NormalBae, OpenPose,
  Scribble, Segment, Shuffle, SoftEdge, TemporalNet, HED, Tile
- SDXL:
  Canny Small XL, Canny Mid XL, Canny XL, Depth Zoe XL, Depth Mid XL

Note: only models compatible with the currently loaded base model are listed
Additional ControlNet models in safetensors format can be downloaded manually and placed into `/models/controlnets`
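These checkpoints are standard diffusers ControlNet models, so a manually downloaded file can also be inspected outside SD.Next; a rough sketch using plain diffusers (the file path is a placeholder, and this is not SD.Next's internal loading code):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# load a manually downloaded single-file checkpoint; the path is a placeholder
controlnet = ControlNetModel.from_single_file(
    "models/controlnets/example-controlnet.safetensors", torch_dtype=torch.float16
)
# attach it to a matching SD 1.5 base model
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
```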
### ControlNet XS

- SDXL:
  Canny, Depth
### T2I-Adapter

- SD15:
  Canny, Depth, Depth Zoe, OpenPose, Sketch
- SDXL:
  Canny XL, Depth Zoe XL, Depth Midas XL, LineArt XL, OpenPose XL, Sketch XL

Note: only models compatible with the currently loaded base model are listed
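For reference, here is a minimal sketch of loading one of the SDXL adapters above in plain diffusers (model names are the published TencentARC and StabilityAI checkpoints; this is illustrative only, not SD.Next's internal code):

```python
import torch
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter

# the Canny XL adapter listed above, as published by TencentARC
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", adapter=adapter, torch_dtype=torch.float16
)
```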
### Processors

- Pose style: OpenPose, DWPose, MediaPipe Face
- Outline style: Canny, Edge, LineArt Realistic, LineArt Anime, HED, PidiNet
- Depth style: Midas Depth Hybrid, Zoe Depth, Leres Depth, Normal Bae
- Segmentation style: SegmentAnything
- Other: MLSD, Shuffle

Note: processor sizes vary from none for built-in ones to anywhere between 200MB and 4.2GB for ZoeDepth-Large
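As an illustration of what a processor produces, here is a standalone Canny example using the controlnet_aux package (an assumption for this sketch; SD.Next ships its own processor implementations, and input.png is a placeholder):

```python
from controlnet_aux import CannyDetector
from PIL import Image

canny = CannyDetector()
image = Image.open("input.png")  # placeholder input image
edges = canny(image)  # edge map that serves as the control image
edges.save("canny.png")
```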
### Reference

Reference mode is its own pipeline, so it cannot have multiple units or processors
## Workflows

### Inputs & Outputs

- Image -> Image
- Batch: list of images -> Gallery and/or Video
- Folder: folder with images -> Gallery and/or Video
- Video -> Gallery and/or Video
Notes:
- Input/Output/Preview panels can be minimized by clicking on them
- For video output, make sure to set video options
### Unit

- A unit is: input plus processor plus control
- A pipeline consists of any number of configured units
  If a unit uses control modules, all control modules inside the pipeline must be of the same type,
  e.g. ControlNet, ControlNet-XS, T2I-Adapter or Reference (see the sketch after this list)
- Each unit can use the primary input or its own override input
- A unit can have no processor, in which case it will run control on the input directly
  Use this when you're working with predefined input templates
- A unit can have no control, in which case it will run the processor only
- Any combination of input, processor and control is possible
  For example, two enabled units with processor-only will produce a compound processed image, but without control
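Conceptually, multiple units of the same control type map onto diffusers' multi-ControlNet support; a rough sketch under that assumption (model names are examples, and this is not SD.Next's actual pipeline code):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# two units of the same control type become a list of ControlNets in one pipeline
controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets, torch_dtype=torch.float16
)
# each control image pairs with the unit at the same index:
# result = pipe(prompt, image=[canny_image, depth_image]).images[0]
```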
### What-if?

- If no input is provided, the pipeline will run in txt2img mode
  Can be freely used instead of standard `txt2img`
- If none of the units have control or adapter, the pipeline will run in img2img mode using the input image
  Can be freely used instead of standard `img2img`
- If you have a processor enabled, but no controlnet or adapter loaded,
  the pipeline will run in img2img mode using the processed input
- If you have multiple processors enabled, but no controlnet or adapter loaded,
  the pipeline will run in img2img mode on the blended processed image
- Output resolution is by default set to the input resolution
  Use resize settings to force any other resolution
- The resize operation can run before processing (on the input image) or after it (on the output image)
- Using video input will run the pipeline on each frame unless skip frames is set
  Video output is a standard list of images (gallery) and can optionally be encoded into a video file
  The video file can be interpolated using RIFE for smoother playback
## Logging

To enable extra logging for troubleshooting purposes, set environment variables before running SD.Next

- Linux:

  ```
  export SD_CONTROL_DEBUG=true
  export SD_PROCESS_DEBUG=true
  ./webui.sh --debug
  ```

- Windows:

  ```
  set SD_CONTROL_DEBUG=true
  set SD_PROCESS_DEBUG=true
  webui.bat --debug
  ```
## Requirements

Control itself does not have any additional requirements and any used models are downloaded automatically
However, some processors require additional packages to be installed
Note: it's recommended to activate the venv before installing requirements

- DWPose: requires the OpenMMLab framework

  ```
  pip install openmim
  mim install mmengine mmcv mmpose mmdet
  ```

- MediaPipe: requires the MediaPipe framework

  ```
  pip install mediapipe
  ```
## Limitations / TODO

### Todo

- Validate all models: ControlNet, T2I-Adapter, Reference
- Bind restore button and override controls
- API is missing
- Some metadata is not included in output images (key metadata is included)
- Model load can take some time and there is no progress indicator in the UI, only in the logs
  Especially on first access, since the model needs to be downloaded
### Future

- Pose editor
- Support for kohya-ss lllite controlnets
- Metrabs model as an openpose preprocessor
- Use DPT for depth and segmentation: https://huggingface.co/Intel/dpt-hybrid-midas https://huggingface.co/docs/transformers/main/en/model_doc/dpt
- Multi-frame rendering: https://xanthius.itch.io/multi-frame-rendering-for-stablediffusion
- Deflickering and deghosting