Chapter 5: ControlNet & Pose Control
April 13, 2026
Welcome to Chapter 5: ControlNet & Pose Control. In this part of ComfyUI Tutorial: Mastering AI Image Generation Workflows, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
ControlNet is one of the most transformative additions to Stable Diffusion, and ComfyUI provides the ideal environment for harnessing its full potential. While standard text-to-image generation gives you control over what appears in an image, ControlNet gives you control over how it appears -- the composition, structure, pose, and spatial arrangement of every element. In this chapter, you will learn to integrate ControlNet models into your ComfyUI workflows, chain multiple control signals together, and fine-tune parameters for production-quality results.
How ControlNet Works
ControlNet injects structural guidance into the diffusion process by conditioning the model on an additional input signal -- a control image derived from techniques such as edge detection, depth estimation, or pose extraction. The control image is processed through a trainable copy of the U-Net's encoder blocks, and the resulting features are added as residuals to the main U-Net's middle block and decoder skip connections at each resolution level.
flowchart LR
A[Reference Image] --> B[Preprocessor]
B --> C[Control Image]
C --> D[ControlNet Encoder]
D --> E[Feature Injection]
F[Text Prompt] --> G[CLIP Encoder]
G --> H[Conditioning]
H --> I[KSampler / U-Net]
E --> I
J[Empty Latent] --> I
I --> K[VAE Decode]
K --> L[Final Image]
classDef input fill:#e1f5fe,stroke:#01579b
classDef process fill:#f3e5f5,stroke:#4a148c
classDef output fill:#e8f5e8,stroke:#1b5e20
class A,F,J input
class B,C,D,E,G,H,I,K process
class L output
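To make the feature-injection step in the diagram concrete, here is a minimal sketch of how a control residual might be blended into U-Net features. It uses plain Python lists in place of tensors, and `inject_control` is a hypothetical name chosen for illustration, not a ComfyUI function.

```python
from typing import List


def inject_control(unet_features: List[float],
                   control_features: List[float],
                   strength: float) -> List[float]:
    """Add strength-scaled control residuals to U-Net features (toy model)."""
    if len(unet_features) != len(control_features):
        raise ValueError("feature shapes must match")
    # Each control feature is scaled by the strength and added as a residual.
    return [u + strength * c for u, c in zip(unet_features, control_features)]


if __name__ == "__main__":
    base = [0.5, -1.0, 2.0]
    control = [1.0, 1.0, -2.0]
    print(inject_control(base, control, 0.0))  # strength 0.0 leaves features unchanged
    print(inject_control(base, control, 0.5))
```

With a strength of 0.0 the control branch contributes nothing, which is why lowering strength smoothly relaxes the structural constraint.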
Key Concepts
| Concept | Description |
|---|---|
| Control Image | A preprocessed image (edge map, depth map, pose skeleton) that guides generation |
| Preprocessor | An algorithm that extracts structural information from a reference image |
| Strength | How strongly the control signal influences the final output (typically 0.0 to 1.0; the ComfyUI node also accepts higher values) |
| Start/End Percent | The portion of the diffusion process during which the control signal is active |
| Multi-ControlNet | Stacking multiple ControlNet models for layered structural guidance |
Setting Up ControlNet in ComfyUI
Step 1: Install Required Models
ControlNet models must be placed in the models/controlnet/ directory. Each preprocessor type has a corresponding ControlNet model.
# Create the controlnet directory
mkdir -p ComfyUI/models/controlnet
# Download ControlNet models (example for SD 1.5)
# Place .safetensors or .pth files in models/controlnet/
# Common models:
# control_v11p_sd15_openpose.safetensors
# control_v11f1p_sd15_depth.safetensors
# control_v11p_sd15_canny.safetensors
# control_v11p_sd15_lineart.safetensors
# control_v11p_sd15_scribble.safetensors
# For SDXL ControlNet models:
# diffusers_xl_canny_full.safetensors
# diffusers_xl_depth_full.safetensors
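After copying files into place, a quick check can confirm that ComfyUI will find them. The sketch below lists model files under models/controlnet/; the helper name `list_controlnet_models` and the `ComfyUI` root path are assumptions for illustration.

```python
from pathlib import Path
from typing import List

# File extensions mentioned above for ControlNet weights.
MODEL_SUFFIXES = {".safetensors", ".pth"}


def list_controlnet_models(comfy_root: str) -> List[str]:
    """Return sorted model file names under <comfy_root>/models/controlnet."""
    model_dir = Path(comfy_root) / "models" / "controlnet"
    if not model_dir.is_dir():
        return []
    return sorted(
        p.name for p in model_dir.iterdir()
        if p.is_file() and p.suffix.lower() in MODEL_SUFFIXES
    )


if __name__ == "__main__":
    found = list_controlnet_models("ComfyUI")
    print(found if found else "No ControlNet models found")
```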
Step 2: Install Preprocessor Nodes
The comfyui_controlnet_aux package provides all standard preprocessors.
# Install the ControlNet auxiliary preprocessors
cd ComfyUI/custom_nodes
git clone https://github.com/Fannovel16/comfyui_controlnet_aux.git
cd comfyui_controlnet_aux
pip install -r requirements.txt
# Restart ComfyUI to register the new nodes
Step 3: Build the Basic ControlNet Workflow
flowchart TD
subgraph Input
A[Load Image] --> B[Canny Edge Detector]
C[Load Checkpoint]
D[CLIP Text Encode +]
E[CLIP Text Encode -]
end
subgraph ControlNet
F[Load ControlNet Model]
B --> G[Apply ControlNet]
F --> G
D --> G
end
subgraph Generation
G --> H[KSampler]
E --> H
C --> H
I[Empty Latent Image] --> H
end
subgraph Output
H --> J[VAE Decode]
C --> J
J --> K[Save Image]
end
classDef input fill:#e1f5fe,stroke:#01579b
classDef control fill:#fff3e0,stroke:#ef6c00
classDef gen fill:#f3e5f5,stroke:#4a148c
classDef output fill:#e8f5e8,stroke:#1b5e20
class A,B,C,D,E input
class F,G control
class H,I gen
class J,K output
ControlNet Preprocessors
Each preprocessor extracts different structural information from a reference image. Choosing the right preprocessor is critical for achieving the desired control.
Preprocessor Comparison
| Preprocessor | Use Case | Control Type | Best For |
|---|---|---|---|
| Canny | Edge detection | Hard edges | Architecture, mechanical objects |
| Depth (MiDaS/Zoe) | Depth estimation | Spatial depth | Landscapes, room layouts |
| OpenPose | Body pose | Human skeleton | Character poses, figure drawing |
| Lineart | Line extraction | Clean outlines | Illustrations, comics |
| Scribble | Rough sketches | Freeform outlines | Quick concept art |
| SoftEdge (HED/PIDI) | Soft edge detection | Gentle outlines | Organic shapes, portraits |
| Normal Map | Surface normals | 3D surface direction | Product renders, 3D integration |
| Segmentation | Semantic regions | Scene layout | Multi-object composition |
| Shuffle | Color/texture mixing | Style transfer | Maintaining color palettes |
| IP-Adapter | Image prompt | Visual similarity | Style and subject reference |
Preprocessor Node Configuration
# Canny Edge Detection
canny_config = {
"node": "CannyEdgePreprocessor",
"low_threshold": 100, # Lower = more edges detected
"high_threshold": 200, # Higher = fewer, stronger edges
"resolution": 512 # Processing resolution
}
# OpenPose Detection
# The detect_* inputs take "enable" / "disable", matching the workflow JSON later in this chapter
openpose_config = {
"node": "OpenposePreprocessor",
"detect_hand": "enable", # Include hand keypoints
"detect_body": "enable", # Include body keypoints
"detect_face": "enable", # Include face keypoints
"resolution": 512
}
# Depth Estimation (MiDaS)
depth_config = {
"node": "MiDaS-DepthMapPreprocessor",
"a": 6.283, # pi * 2, depth sensitivity
"bg_threshold": 0.1, # Background cutoff
"resolution": 512
}
# Lineart Detection
lineart_config = {
"node": "LineartPreprocessor",
"coarse": False, # False = fine lines, True = thick lines
"resolution": 512
}
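The two Canny thresholds in the configuration above work as a pair: gradients above the high threshold count as strong edges, gradients between the two count as weak edges, and anything below the low threshold is dropped. The sketch below shows only that classification step. It is a simplification with a hypothetical function name: a real Canny detector also keeps weak edges only when they connect to strong ones.

```python
from typing import List


def classify_gradients(magnitudes: List[int], low: int, high: int) -> List[str]:
    """Label gradient magnitudes the way Canny's double threshold would (toy version)."""
    labels = []
    for m in magnitudes:
        if m >= high:
            labels.append("strong")      # always kept as an edge
        elif m >= low:
            labels.append("weak")        # kept only if linked to a strong edge
        else:
            labels.append("suppressed")  # discarded
    return labels


if __name__ == "__main__":
    samples = [50, 150, 250]
    print(classify_gradients(samples, low=100, high=200))
    # Lowering the low threshold lets more pixels survive as edge candidates.
    print(classify_gradients(samples, low=40, high=200))
```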
Pose Control with OpenPose
OpenPose is the most popular ControlNet preprocessor for character work. It detects human body keypoints and generates a skeleton overlay that guides the diffusion model to reproduce exact poses.
OpenPose Workflow
# Complete OpenPose ControlNet workflow configuration
openpose_workflow = {
"1": {
"class_type": "LoadImage",
"inputs": {
"image": "reference_pose_photo.png"
}
},
"2": {
"class_type": "OpenposePreprocessor",
"inputs": {
"image": ["1", 0],
"detect_hand": "enable",
"detect_body": "enable",
"detect_face": "enable",
"resolution": 512
}
},
"3": {
"class_type": "ControlNetLoader",
"inputs": {
"control_net_name": "control_v11p_sd15_openpose.safetensors"
}
},
"4": {
"class_type": "CheckpointLoaderSimple",
"inputs": {
"ckpt_name": "dreamshaper_8.safetensors"
}
},
"5": {
"class_type": "CLIPTextEncode",
"inputs": {
"text": "a warrior in full armor, standing heroically, fantasy art, highly detailed",
"clip": ["4", 1]
}
},
"6": {
"class_type": "CLIPTextEncode",
"inputs": {
"text": "blurry, low quality, deformed, extra limbs",
"clip": ["4", 1]
}
},
"7": {
"class_type": "ControlNetApply",
"inputs": {
"conditioning": ["5", 0],
"control_net": ["3", 0],
"image": ["2", 0],
"strength": 0.85
}
},
"8": {
"class_type": "EmptyLatentImage",
"inputs": {
"width": 512,
"height": 768,
"batch_size": 1
}
},
"9": {
"class_type": "KSampler",
"inputs": {
"model": ["4", 0],
"positive": ["7", 0],
"negative": ["6", 0],
"latent_image": ["8", 0],
"seed": 42,
"steps": 30,
"cfg": 7.5,
"sampler_name": "euler_ancestral",
"scheduler": "karras",
"denoise": 1.0
}
},
"10": {
"class_type": "VAEDecode",
"inputs": {
"samples": ["9", 0],
"vae": ["4", 2]
}
},
"11": {
"class_type": "SaveImage",
"inputs": {
"images": ["10", 0],
"filename_prefix": "openpose_output"
}
}
}
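Before queueing a workflow like the one above, it can help to verify that every link points at a node that exists. The sketch below does that and also builds (without sending) a request for ComfyUI's /prompt HTTP endpoint. The helper names are hypothetical, and the server address assumes a local instance on the default port.

```python
import json
import urllib.request
from typing import Any, Dict, List


def find_broken_links(workflow: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return descriptions of inputs that reference node ids missing from the workflow."""
    problems = []
    for node_id, node in workflow.items():
        for name, value in node.get("inputs", {}).items():
            # A link is a two-item list: [source_node_id, output_index].
            is_link = (isinstance(value, list) and len(value) == 2
                       and isinstance(value[0], str) and isinstance(value[1], int))
            if is_link and value[0] not in workflow:
                problems.append(f"{node_id}.{name} -> {value[0]}")
    return problems


def build_prompt_request(workflow: Dict[str, Dict[str, Any]],
                         server: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Build a POST request for the /prompt endpoint (assumed local server address)."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    demo = {
        "1": {"class_type": "LoadImage", "inputs": {"image": "pose.png"}},
        "2": {"class_type": "OpenposePreprocessor", "inputs": {"image": ["1", 0]}},
    }
    print(find_broken_links(demo))
    print(build_prompt_request(demo).full_url)
```

To actually queue the job, pass the request to `urllib.request.urlopen` while ComfyUI is running.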
Pose Keypoint Reference
| Keypoint Group | Points | Description |
|---|---|---|
| Body | 18 points | Nose, neck, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles |
| Hands | 21 per hand | Wrist plus four joints on each of the five fingers |
| Face | 70 points | Eyes, nose, mouth, jawline, eyebrows |
| Foot | 3 per foot | Big toe, small toe, heel |
Advanced Control Techniques
Multi-ControlNet Stacking
You can combine multiple ControlNet models to achieve layered control -- for example, using OpenPose for body pose and Depth for scene composition simultaneously.
flowchart TD
A[Reference Image] --> B[OpenPose Preprocessor]
A --> C[Depth Preprocessor]
A --> D[Canny Preprocessor]
E[Load ControlNet: OpenPose] --> F[Apply ControlNet 1]
B --> F
G[Positive Conditioning] --> F
H[Load ControlNet: Depth] --> I[Apply ControlNet 2]
C --> I
F --> I
J[Load ControlNet: Canny] --> K[Apply ControlNet 3]
D --> K
I --> K
K --> L[KSampler]
L --> M[VAE Decode]
M --> N[Final Image]
classDef preprocess fill:#fff3e0,stroke:#ef6c00
classDef controlnet fill:#e1f5fe,stroke:#01579b
classDef gen fill:#f3e5f5,stroke:#4a148c
class B,C,D preprocess
class E,F,H,I,J,K controlnet
class L,M,N gen
# Multi-ControlNet configuration
# Each ControlNet is applied sequentially, chaining the conditioning output
# First ControlNet: OpenPose for body structure
controlnet_1 = {
"class_type": "ControlNetApply",
"inputs": {
"conditioning": ["positive_clip", 0],
"control_net": ["openpose_model", 0],
"image": ["openpose_image", 0],
"strength": 0.9 # Strong pose adherence
}
}
# Second ControlNet: Depth for spatial composition
controlnet_2 = {
"class_type": "ControlNetApply",
"inputs": {
"conditioning": ["controlnet_1", 0], # Chain from first
"control_net": ["depth_model", 0],
"image": ["depth_image", 0],
"strength": 0.6 # Moderate depth guidance
}
}
# Third ControlNet: Canny for fine edge detail
controlnet_3 = {
"class_type": "ControlNetApply",
"inputs": {
"conditioning": ["controlnet_2", 0], # Chain from second
"control_net": ["canny_model", 0],
"image": ["canny_image", 0],
"strength": 0.4 # Subtle edge hints
}
}
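The chaining pattern above is mechanical enough to generate. The sketch below builds a sequence of ControlNetApply nodes in which each node consumes the conditioning output of the previous one. `chain_controlnets` and the starting node id are hypothetical choices for illustration.

```python
from typing import Any, Dict, List, Tuple


def chain_controlnets(positive: List[Any],
                      layers: List[Tuple[str, str, float]],
                      first_id: int = 100) -> Tuple[Dict[str, Dict[str, Any]], List[Any]]:
    """Build chained ControlNetApply nodes.

    layers holds (control_net_node_id, image_node_id, strength) tuples.
    Returns the new nodes plus the link to the final conditioning output.
    """
    nodes: Dict[str, Dict[str, Any]] = {}
    conditioning = positive
    for offset, (net_id, image_id, strength) in enumerate(layers):
        node_id = str(first_id + offset)
        nodes[node_id] = {
            "class_type": "ControlNetApply",
            "inputs": {
                "conditioning": conditioning,  # previous link in the chain
                "control_net": [net_id, 0],
                "image": [image_id, 0],
                "strength": strength,
            },
        }
        conditioning = [node_id, 0]  # the next layer chains from this node
    return nodes, conditioning


if __name__ == "__main__":
    nodes, final = chain_controlnets(["5", 0], [("3", "2", 0.9), ("12", "13", 0.6)])
    print(list(nodes), final)
```

Connect the returned final link to the KSampler's positive input.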
Control Strength and Timing
Fine-tuning when and how strongly ControlNet influences the generation is essential for natural-looking results.
# ControlNet Advanced Apply node provides timing control
advanced_controlnet = {
"class_type": "ControlNetApplyAdvanced",
"inputs": {
"positive": ["clip_positive", 0],
"negative": ["clip_negative", 0],
"control_net": ["controlnet_model", 0],
"image": ["preprocessed_image", 0],
"strength": 0.8,
"start_percent": 0.0, # Begin control at step 0%
"end_percent": 0.8 # Release control at step 80%
}
}
| Strength | Start % | End % | Effect |
|---|---|---|---|
| 1.0 | 0.0 | 1.0 | Maximum control, strict adherence throughout |
| 0.8 | 0.0 | 0.8 | Strong structure, natural fine details |
| 0.5 | 0.0 | 0.5 | Loose guidance, high creative freedom |
| 0.7 | 0.2 | 0.9 | Skip initial noise layout, control mid-process |
| 1.0 | 0.0 | 0.4 | Lock in composition early, free detail phase |
Best practices for timing:
- Ending early (end_percent < 1.0): Lets the model add natural details without rigid constraint in the final steps. This often produces more photorealistic results.
- Starting late (start_percent > 0.0): Allows the model to establish its own global composition before structural control kicks in. Useful when the control image is a rough approximation.
- Strength below 0.5: The control signal becomes a gentle suggestion rather than a constraint. Combine with high CFG for best results.
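To get a feel for these windows, the sketch below estimates which sampler steps fall inside a given start/end range. It treats the percentages as fractions of the step count, which is an approximation: ComfyUI resolves them against the noise schedule, so the exact boundary step can differ. `active_steps` is a hypothetical helper.

```python
from typing import List


def active_steps(steps: int, start_percent: float, end_percent: float) -> List[int]:
    """Approximate which 0-based step indices fall inside the control window."""
    if not 0.0 <= start_percent <= end_percent <= 1.0:
        raise ValueError("expected 0.0 <= start_percent <= end_percent <= 1.0")
    # A step is active when its position in the run lies in [start, end).
    return [i for i in range(steps) if start_percent <= i / steps < end_percent]


if __name__ == "__main__":
    window = active_steps(steps=30, start_percent=0.0, end_percent=0.8)
    print(f"{len(window)} of 30 steps are controlled")
```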
ControlNet with Image-to-Image
ControlNet can be combined with img2img workflows for guided modifications of existing images.
# ControlNet + img2img workflow
controlnet_img2img = {
"load_image": {
"class_type": "LoadImage",
"inputs": {"image": "source_photo.png"}
},
"encode_source": {
"class_type": "VAEEncode",
"inputs": {
"pixels": ["load_image", 0],
"vae": ["checkpoint", 2]
}
},
"preprocess": {
"class_type": "CannyEdgePreprocessor",
"inputs": {
"image": ["load_image", 0],
"low_threshold": 100,
"high_threshold": 200
}
},
"apply_controlnet": {
"class_type": "ControlNetApply",
"inputs": {
"conditioning": ["positive_prompt", 0],
"control_net": ["canny_controlnet", 0],
"image": ["preprocess", 0],
"strength": 0.75
}
},
"sampler": {
"class_type": "KSampler",
"inputs": {
"model": ["checkpoint", 0],
"positive": ["apply_controlnet", 0],
"negative": ["negative_prompt", 0],
"latent_image": ["encode_source", 0], # Use encoded source
"denoise": 0.65, # Partial denoise to retain source structure
"steps": 25,
"cfg": 7.0,
"sampler_name": "euler",
"scheduler": "karras"
}
}
}
ControlNet for SDXL
SDXL ControlNet models work similarly but require SDXL-specific model files and typically operate at higher resolutions.
# SDXL ControlNet workflow differences
sdxl_controlnet = {
"checkpoint": {
"class_type": "CheckpointLoaderSimple",
"inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}
},
"controlnet_model": {
"class_type": "ControlNetLoader",
"inputs": {"control_net_name": "diffusers_xl_canny_full.safetensors"}
},
"latent": {
"class_type": "EmptyLatentImage",
"inputs": {
"width": 1024, # SDXL native resolution
"height": 1024,
"batch_size": 1
}
},
"preprocessor": {
"class_type": "CannyEdgePreprocessor",
"inputs": {
"image": ["reference_image", 0],
"low_threshold": 100,
"high_threshold": 200,
"resolution": 1024 # Match SDXL resolution
}
}
}
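Reference images rarely arrive at exactly 1024x1024. One common approach, sketched below, is to scale the image to roughly the same pixel count while preserving its aspect ratio, then snap each side to a multiple of 8 because the latent is one eighth of the pixel resolution. The pixel-budget heuristic and the name `fit_to_pixel_budget` are assumptions, not ComfyUI rules.

```python
import math
from typing import Tuple


def fit_to_pixel_budget(width: int, height: int,
                        target_side: int = 1024,
                        multiple: int = 8) -> Tuple[int, int]:
    """Scale (width, height) to about target_side**2 pixels, snapped to a multiple."""
    scale = math.sqrt((target_side * target_side) / (width * height))

    def snap(value: int) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value * scale / multiple)) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    print(fit_to_pixel_budget(1920, 1080))
    print(fit_to_pixel_budget(512, 512))
```

Use the result for both the Empty Latent Image dimensions and the preprocessor resolution so the control image and latent stay aligned.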
SD 1.5 vs. SDXL ControlNet Comparison
| Feature | SD 1.5 ControlNet | SDXL ControlNet |
|---|---|---|
| Resolution | 512x512 native | 1024x1024 native |
| Model Size | ~700 MB | ~2.5 GB |
| VRAM Required | ~4 GB | ~8 GB |
| Available Models | Extensive (11+ types) | Growing (5-6 types) |
| Quality | Good | Excellent |
| Speed | Fast | Moderate |
| Community Support | Mature | Rapidly expanding |
Practical Recipes
Recipe 1: Architecture Preservation
Maintain the exact structure of a building while changing its style.
# Use Canny with high strength + Depth for 3D consistency
architecture_recipe = {
"canny_strength": 0.95, # Near-exact edge preservation
"canny_start": 0.0,
"canny_end": 1.0,
"depth_strength": 0.6, # Moderate 3D guidance
"depth_start": 0.0,
"depth_end": 0.7,
"prompt": "gothic cathedral, dark fantasy style, dramatic lighting, 8k",
"negative": "modern, contemporary, bright, cheerful",
"steps": 30,
"cfg": 8.0,
"sampler": "dpm_2_ancestral",
"scheduler": "karras"
}
Recipe 2: Character Pose Transfer
Apply a specific pose from a reference photo to a generated character.
# OpenPose with face and hand detection
pose_transfer_recipe = {
"openpose_strength": 0.85,
"detect_body": True,
"detect_hand": True,
"detect_face": True,
"prompt": "anime girl in school uniform, cherry blossoms, spring, high quality",
"negative": "realistic, photo, deformed hands, extra fingers",
"steps": 25,
"cfg": 7.0,
"sampler": "euler_ancestral",
"scheduler": "normal"
}
Recipe 3: Depth-Guided Landscapes
Generate landscapes that match the spatial composition of a reference.
# Depth map for spatial consistency
landscape_recipe = {
"depth_strength": 0.7,
"depth_start": 0.0,
"depth_end": 0.85,
"prompt": "alien planet landscape, bioluminescent plants, two moons, sci-fi concept art",
"negative": "earth, realistic, mundane, urban",
"steps": 35,
"cfg": 9.0,
"sampler": "dpm_2",
"scheduler": "karras"
}
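The recipe dictionaries above are shorthand, not something ComfyUI loads directly. A small adapter can turn one into a ControlNetApplyAdvanced node, as in this sketch. The function name and the `prefix` convention (for example `depth` or `canny`) are hypothetical, and the link arguments must point at nodes that exist in your own workflow.

```python
from typing import Any, Dict, List


def recipe_to_advanced_node(recipe: Dict[str, Any], prefix: str,
                            positive: List[Any], negative: List[Any],
                            control_net: List[Any], image: List[Any]) -> Dict[str, Any]:
    """Map '<prefix>_strength/_start/_end' recipe keys onto node inputs."""
    return {
        "class_type": "ControlNetApplyAdvanced",
        "inputs": {
            "positive": positive,
            "negative": negative,
            "control_net": control_net,
            "image": image,
            "strength": recipe[f"{prefix}_strength"],
            # Fall back to the full sampling range when a recipe omits timing.
            "start_percent": recipe.get(f"{prefix}_start", 0.0),
            "end_percent": recipe.get(f"{prefix}_end", 1.0),
        },
    }


if __name__ == "__main__":
    landscape = {"depth_strength": 0.7, "depth_start": 0.0, "depth_end": 0.85}
    node = recipe_to_advanced_node(landscape, "depth",
                                   ["5", 0], ["6", 0], ["3", 0], ["2", 0])
    print(node["inputs"]["end_percent"])
```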
Troubleshooting ControlNet
| Problem | Cause | Solution |
|---|---|---|
| ControlNet has no effect | Wrong model/preprocessor pairing | Verify the ControlNet model matches the preprocessor type |
| Output looks distorted | Strength too high | Reduce strength to 0.6-0.8 and set end_percent to 0.8 |
| Poses are inaccurate | Low-quality pose detection | Use higher-resolution input images; enable hand/face detection |
| Out of memory with multi-ControlNet | Too many models loaded | Use model unloading; reduce resolution; apply one at a time |
| Control image resolution mismatch | Preprocessor and latent size differ | Set preprocessor resolution to match your Empty Latent Image dimensions |
| SDXL ControlNet not loading | Wrong model version | Ensure you are using SDXL-specific ControlNet models, not SD 1.5 |
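The first row of the table, a mismatched model and preprocessor, is easy to catch automatically. The sketch below is a rough filename heuristic: it assumes model files contain a keyword such as `canny` or `openpose`, as the example file names in this chapter do. The mapping and function name are hypothetical, and a renamed model file would defeat the check.

```python
# Keyword expected in the ControlNet file name for each preprocessor node.
PREPROCESSOR_KEYWORDS = {
    "CannyEdgePreprocessor": "canny",
    "OpenposePreprocessor": "openpose",
    "MiDaS-DepthMapPreprocessor": "depth",
    "LineartPreprocessor": "lineart",
}


def pairing_looks_right(preprocessor: str, model_file: str) -> bool:
    """Return True when the model file name mentions the preprocessor's control type."""
    keyword = PREPROCESSOR_KEYWORDS.get(preprocessor)
    if keyword is None:
        raise KeyError(f"unknown preprocessor: {preprocessor}")
    return keyword in model_file.lower()


if __name__ == "__main__":
    print(pairing_looks_right("OpenposePreprocessor",
                              "control_v11p_sd15_openpose.safetensors"))
    print(pairing_looks_right("CannyEdgePreprocessor",
                              "control_v11f1p_sd15_depth.safetensors"))
```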
Summary
ControlNet transforms ComfyUI from a text-driven image generator into a precision composition tool. By injecting structural signals -- edges, depth, poses, and more -- into the diffusion process, you gain direct control over the spatial layout and form of your generated images. Mastering preprocessor selection, strength tuning, and multi-ControlNet stacking unlocks workflows that rival professional concept art pipelines.
Key Takeaways
- ControlNet adds structural guidance to the diffusion process through parallel encoder feature injection.
- Preprocessor choice matters -- Canny for hard edges, Depth for spatial layout, OpenPose for character poses, Lineart for illustrations.
- Strength and timing (start/end percent) are your primary levers for balancing control precision with creative freedom.
- Multi-ControlNet stacking allows layered control (e.g., pose + depth + edge simultaneously) with decreasing strength for each additional layer.
- End the control signal early (end_percent 0.7-0.8) for more natural, photorealistic results.
- SDXL ControlNet requires SDXL-specific models and higher resolution but produces superior output.
Next Steps
With ControlNet mastered, you can precisely control the structure and composition of your generated images. In the next chapter, we will explore LoRA models and model customization -- learning how to fine-tune generation style, add specific characters or concepts, and stack multiple LoRA adapters for unique artistic results.
Continue to Chapter 6: LoRA & Model Customization
Built with insights from the ComfyUI project.
What Problem Does This Solve?
Most teams struggle here not because they need to write more code, but because they have not set clear boundaries for class_type, inputs, and ControlNet nodes, so behavior becomes unpredictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about Chapter 5: ControlNet & Pose Control as an operating subsystem inside ComfyUI Tutorial: Mastering AI Image Generation Workflows, with explicit contracts for inputs, state transitions, and outputs.
Use the implementation notes around safetensors, image, classDef as your checklist when adapting these patterns to your own repository.
How it Works Under the Hood
Under the hood, Chapter 5: ControlNet & Pose Control usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for class_type.
- Input normalization: shape incoming data so inputs receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through ControlNet.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
Source Code Walkthrough
comfy/controlnet.py
The ControlNet integration in comfy/controlnet.py applies the conditioned guidance to the UNet during diffusion sampling. The ControlBase class is the base interface for control models, and nodes.py imports the module alongside its node type helpers:
import comfy.controlnet
from comfy.comfy_types import IO, ComfyNodeABC, InputTypeDict, FileLocator
ControlNet models are loaded via the same folder_paths checkpoint resolution system as regular checkpoints. The ControlNetApply and ControlNetApplyAdvanced nodes in nodes.py accept an image tensor (the control signal) and a strength float, passing them to the ControlNet model for injection into the diffusion UNet's middle and output blocks.