Chapter 5: ControlNet & Pose Control

April 13, 2026

Welcome to Chapter 5: ControlNet & Pose Control. In this part of ComfyUI Tutorial: Mastering AI Image Generation Workflows, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.

ControlNet is one of the most transformative additions to Stable Diffusion, and ComfyUI provides the ideal environment for harnessing its full potential. While standard text-to-image generation gives you control over what appears in an image, ControlNet gives you control over how it appears -- the composition, structure, pose, and spatial arrangement of every element. In this chapter, you will learn to integrate ControlNet models into your ComfyUI workflows, chain multiple control signals together, and fine-tune parameters for production-quality results.

How ControlNet Works

ControlNet injects structural guidance into the diffusion process by conditioning the model on an additional input signal -- a control image derived from techniques such as edge detection, depth estimation, or pose extraction. The control image is processed through a parallel copy of the model's encoder, and the resulting features are added to the main U-Net at each resolution level.

flowchart LR
    A[Reference Image] --> B[Preprocessor]
    B --> C[Control Image]
    C --> D[ControlNet Encoder]
    D --> E[Feature Injection]

    F[Text Prompt] --> G[CLIP Encoder]
    G --> H[Conditioning]
    H --> I[KSampler / U-Net]
    E --> I

    J[Empty Latent] --> I
    I --> K[VAE Decode]
    K --> L[Final Image]

    classDef input fill:#e1f5fe,stroke:#01579b
    classDef process fill:#f3e5f5,stroke:#4a148c
    classDef output fill:#e8f5e8,stroke:#1b5e20

    class A,F,J input
    class B,C,D,E,G,H,I,K process
    class L output
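
The feature injection step can be pictured with a toy sketch. This is not ComfyUI's implementation: the function name `inject_control` and the flat lists standing in for feature maps are assumptions made purely for illustration. It shows only the idea that each resolution level of the U-Net receives the matching ControlNet feature, scaled by the strength.

```python
def inject_control(unet_features, control_features, strength=1.0):
    """Add strength-scaled control features to U-Net features, level by level.

    Each argument is a list of levels; each level is a flat list of floats
    standing in for a feature map (a real model uses 4-D tensors).
    """
    if len(unet_features) != len(control_features):
        raise ValueError("expected one control feature per U-Net level")
    mixed = []
    for unet_level, control_level in zip(unet_features, control_features):
        if len(unet_level) != len(control_level):
            raise ValueError("feature sizes differ within a level")
        mixed.append([u + strength * c for u, c in zip(unet_level, control_level)])
    return mixed


# Two hypothetical resolution levels, control applied at half strength.
levels = inject_control([[1.0, 2.0], [3.0]], [[0.5, 0.5], [1.0]], strength=0.5)
print(levels)
```

With a strength of 0.0 the U-Net features pass through unchanged, which is why low strength values behave like an ordinary text-to-image run.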

Key Concepts

| Concept | Description |
| --- | --- |
| Control Image | A preprocessed image (edge map, depth map, pose skeleton) that guides generation |
| Preprocessor | An algorithm that extracts structural information from a reference image |
| Strength | How strongly the control signal influences the final output (typically 0.0 to 1.0) |
| Start/End Percent | The portion of the diffusion process during which the control signal is active |
| Multi-ControlNet | Stacking multiple ControlNet models for layered structural guidance |

Setting Up ControlNet in ComfyUI

Step 1: Install Required Models

ControlNet models must be placed in the models/controlnet/ directory. Each preprocessor type has a corresponding ControlNet model.

# Create the controlnet directory
mkdir -p ComfyUI/models/controlnet

# Download ControlNet models (example for SD 1.5)
# Place .safetensors or .pth files in models/controlnet/
# Common models:
#   control_v11p_sd15_openpose.safetensors
#   control_v11f1p_sd15_depth.safetensors
#   control_v11p_sd15_canny.safetensors
#   control_v11p_sd15_lineart.safetensors
#   control_v11p_sd15_scribble.safetensors

# For SDXL ControlNet models:
#   diffusers_xl_canny_full.safetensors
#   diffusers_xl_depth_full.safetensors
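
Before building a workflow, it can help to confirm which model files are actually in place. The following is a minimal sketch using only the Python standard library; the helper name `list_controlnet_models` is hypothetical, and it assumes the `models/controlnet/` layout described above.

```python
from pathlib import Path

# File extensions mentioned in this chapter for ControlNet weights.
MODEL_SUFFIXES = {".safetensors", ".pth"}


def list_controlnet_models(comfy_root):
    """Return sorted ControlNet model filenames under models/controlnet/."""
    model_dir = Path(comfy_root) / "models" / "controlnet"
    if not model_dir.is_dir():
        return []
    return sorted(
        path.name
        for path in model_dir.iterdir()
        if path.is_file() and path.suffix.lower() in MODEL_SUFFIXES
    )


if __name__ == "__main__":
    # "ComfyUI" is the install directory used in the shell commands above.
    for name in list_controlnet_models("ComfyUI"):
        print(name)
```

An empty result usually means the files were saved to a different folder or the install path is wrong.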

Step 2: Install Preprocessor Nodes

The comfyui_controlnet_aux package provides all standard preprocessors.

# Install the ControlNet auxiliary preprocessors
cd ComfyUI/custom_nodes
git clone https://github.com/Fannovel16/comfyui_controlnet_aux.git
cd comfyui_controlnet_aux
pip install -r requirements.txt

# Restart ComfyUI to register the new nodes

Step 3: Build the Basic ControlNet Workflow

flowchart TD
    subgraph Input
        A[Load Image] --> B[Canny Edge Detector]
        C[Load Checkpoint]
        D[CLIP Text Encode +]
        E[CLIP Text Encode -]
    end

    subgraph ControlNet
        F[Load ControlNet Model]
        B --> G[Apply ControlNet]
        F --> G
        D --> G
    end

    subgraph Generation
        G --> H[KSampler]
        E --> H
        C --> H
        I[Empty Latent Image] --> H
    end

    subgraph Output
        H --> J[VAE Decode]
        C --> J
        J --> K[Save Image]
    end

    classDef input fill:#e1f5fe,stroke:#01579b
    classDef control fill:#fff3e0,stroke:#ef6c00
    classDef gen fill:#f3e5f5,stroke:#4a148c
    classDef output fill:#e8f5e8,stroke:#1b5e20

    class A,B,C,D,E input
    class F,G control
    class H,I gen
    class J,K output
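
The diagram above can be written out as an API-format workflow dictionary. This is a sketch rather than a canonical workflow: the builder name `build_canny_workflow`, the node ids, the default strength of 0.8, and the output filename prefix are assumptions. The node class names and input keys follow the OpenPose workflow and the Canny configuration shown later in this chapter.

```python
def build_canny_workflow(image_name, positive, negative, checkpoint, controlnet):
    """Return an API-format workflow dict that mirrors the basic ControlNet diagram."""
    return {
        "1": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "2": {"class_type": "CannyEdgePreprocessor",
              "inputs": {"image": ["1", 0], "low_threshold": 100,
                         "high_threshold": 200, "resolution": 512}},
        "3": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": checkpoint}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["3", 1]}},
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["3", 1]}},
        "6": {"class_type": "ControlNetLoader",
              "inputs": {"control_net_name": controlnet}},
        # Only the positive conditioning passes through Apply ControlNet.
        "7": {"class_type": "ControlNetApply",
              "inputs": {"conditioning": ["4", 0], "control_net": ["6", 0],
                         "image": ["2", 0], "strength": 0.8}},
        "8": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 512, "batch_size": 1}},
        "9": {"class_type": "KSampler",
              "inputs": {"model": ["3", 0], "positive": ["7", 0],
                         "negative": ["5", 0], "latent_image": ["8", 0],
                         "seed": 42, "steps": 30, "cfg": 7.5,
                         "sampler_name": "euler_ancestral",
                         "scheduler": "karras", "denoise": 1.0}},
        "10": {"class_type": "VAEDecode",
               "inputs": {"samples": ["9", 0], "vae": ["3", 2]}},
        "11": {"class_type": "SaveImage",
               "inputs": {"images": ["10", 0], "filename_prefix": "canny_output"}},
    }


workflow = build_canny_workflow(
    "reference.png", "a stone bridge at dawn", "blurry, low quality",
    "dreamshaper_8.safetensors", "control_v11p_sd15_canny.safetensors",
)
print(len(workflow), "nodes")
```

Each link is a two-element list of the source node id and its output index, so `["3", 1]` means the CLIP output of the checkpoint loader.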

ControlNet Preprocessors

Each preprocessor extracts different structural information from a reference image. Choosing the right preprocessor is critical for achieving the desired control.

Preprocessor Comparison

| Preprocessor | Use Case | Control Type | Best For |
| --- | --- | --- | --- |
| Canny | Edge detection | Hard edges | Architecture, mechanical objects |
| Depth (MiDaS/Zoe) | Depth estimation | Spatial depth | Landscapes, room layouts |
| OpenPose | Body pose | Human skeleton | Character poses, figure drawing |
| Lineart | Line extraction | Clean outlines | Illustrations, comics |
| Scribble | Rough sketches | Freeform outlines | Quick concept art |
| SoftEdge (HED/PIDI) | Soft edge detection | Gentle outlines | Organic shapes, portraits |
| Normal Map | Surface normals | 3D surface direction | Product renders, 3D integration |
| Segmentation | Semantic regions | Scene layout | Multi-object composition |
| Shuffle | Color/texture mixing | Style transfer | Maintaining color palettes |
| IP-Adapter | Image prompt | Visual similarity | Style and subject reference |

Preprocessor Node Configuration

# Canny Edge Detection
canny_config = {
    "node": "CannyEdgePreprocessor",
    "low_threshold": 100,    # Lower = more edges detected
    "high_threshold": 200,   # Higher = fewer, stronger edges
    "resolution": 512        # Processing resolution
}

# OpenPose Detection
# The detect_* options take "enable" / "disable", matching the workflow below
openpose_config = {
    "node": "OpenposePreprocessor",
    "detect_hand": "enable",  # Include hand keypoints
    "detect_body": "enable",  # Include body keypoints
    "detect_face": "enable",  # Include face keypoints
    "resolution": 512
}

# Depth Estimation (MiDaS)
depth_config = {
    "node": "MiDaS-DepthMapPreprocessor",
    "a": 6.283,              # pi * 2, depth sensitivity
    "bg_threshold": 0.1,     # Background cutoff
    "resolution": 512
}

# Lineart Detection
lineart_config = {
    "node": "LineartPreprocessor",
    "coarse": "disable",     # "disable" = fine lines, "enable" = thick lines
    "resolution": 512
}
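
The comments on the two Canny thresholds follow from how the algorithm's double threshold works: gradients at or above the high threshold become definite edges, gradients between the two thresholds are kept only if they connect to a definite edge, and anything below the low threshold is dropped. The sketch below illustrates that classification for a single gradient value. The function name `classify_gradient` is hypothetical, and whether the boundaries are inclusive depends on the implementation, so treat the comparisons as an assumption.

```python
def classify_gradient(magnitude, low_threshold=100, high_threshold=200):
    """Classify one gradient magnitude the way Canny's double threshold does."""
    if low_threshold > high_threshold:
        raise ValueError("low_threshold must not exceed high_threshold")
    if magnitude >= high_threshold:
        return "strong"      # always kept as an edge
    if magnitude >= low_threshold:
        return "weak"        # kept only when connected to a strong edge
    return "discarded"       # treated as noise


for value in (250, 150, 50):
    print(value, classify_gradient(value))
```

Lowering `low_threshold` moves more pixels from "discarded" to "weak", which is why the control image gains detail; raising `high_threshold` shrinks the set of "strong" seeds, so fewer edge chains survive.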

Pose Control with OpenPose

OpenPose is the most popular ControlNet preprocessor for character work. It detects human body keypoints and generates a skeleton overlay that guides the diffusion model to reproduce exact poses.

OpenPose Workflow

# Complete OpenPose ControlNet workflow configuration
openpose_workflow = {
    "1": {
        "class_type": "LoadImage",
        "inputs": {
            "image": "reference_pose_photo.png"
        }
    },
    "2": {
        "class_type": "OpenposePreprocessor",
        "inputs": {
            "image": ["1", 0],
            "detect_hand": "enable",
            "detect_body": "enable",
            "detect_face": "enable",
            "resolution": 512
        }
    },
    "3": {
        "class_type": "ControlNetLoader",
        "inputs": {
            "control_net_name": "control_v11p_sd15_openpose.safetensors"
        }
    },
    "4": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {
            "ckpt_name": "dreamshaper_8.safetensors"
        }
    },
    "5": {
        "class_type": "CLIPTextEncode",
        "inputs": {
            "text": "a warrior in full armor, standing heroically, fantasy art, highly detailed",
            "clip": ["4", 1]
        }
    },
    "6": {
        "class_type": "CLIPTextEncode",
        "inputs": {
            "text": "blurry, low quality, deformed, extra limbs",
            "clip": ["4", 1]
        }
    },
    "7": {
        "class_type": "ControlNetApply",
        "inputs": {
            "conditioning": ["5", 0],
            "control_net": ["3", 0],
            "image": ["2", 0],
            "strength": 0.85
        }
    },
    "8": {
        "class_type": "EmptyLatentImage",
        "inputs": {
            "width": 512,
            "height": 768,
            "batch_size": 1
        }
    },
    "9": {
        "class_type": "KSampler",
        "inputs": {
            "model": ["4", 0],
            "positive": ["7", 0],
            "negative": ["6", 0],
            "latent_image": ["8", 0],
            "seed": 42,
            "steps": 30,
            "cfg": 7.5,
            "sampler_name": "euler_ancestral",
            "scheduler": "karras",
            "denoise": 1.0
        }
    },
    "10": {
        "class_type": "VAEDecode",
        "inputs": {
            "samples": ["9", 0],
            "vae": ["4", 2]
        }
    },
    "11": {
        "class_type": "SaveImage",
        "inputs": {
            "images": ["10", 0],
            "filename_prefix": "openpose_output"
        }
    }
}
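
A dictionary in this format can be queued on a running ComfyUI server by posting it to the `/prompt` endpoint. The sketch below only builds the request, so it runs without a server. The helper name `build_prompt_request` is hypothetical, and the address assumes a local server on ComfyUI's default port 8188.

```python
import json
import urllib.request


def build_prompt_request(workflow, server="http://127.0.0.1:8188"):
    """Build (but do not send) a POST request that queues an API-format workflow."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A one-node stand-in; in practice pass the full openpose_workflow dict.
request = build_prompt_request(
    {"1": {"class_type": "LoadImage", "inputs": {"image": "reference_pose_photo.png"}}}
)
print(request.get_method(), request.full_url)

# To actually queue the job against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```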

Pose Keypoint Reference

| Keypoint Group | Points | Description |
| --- | --- | --- |
| Body | 18 points | Nose, neck, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles |
| Hands | 21 per hand | Wrist, knuckles, finger joints, fingertips |
| Face | 70 points | Eyes, nose, mouth, jawline, eyebrows |
| Foot | 3 per foot | Heel, big toe, small toe |
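
As a quick piece of arithmetic, the number of keypoints in a skeleton depends on which detectors are switched on. The sketch below uses the body, hand, and face counts from the table above. The function name `keypoint_count` is hypothetical, and foot points are left out because the preprocessor options shown in this chapter only toggle body, hands, and face.

```python
BODY_POINTS = 18
HAND_POINTS = 21   # per hand
FACE_POINTS = 70


def keypoint_count(detect_body=True, detect_hand=True, detect_face=True):
    """Return the number of keypoints for one person given the enabled detectors."""
    total = 0
    if detect_body:
        total += BODY_POINTS
    if detect_hand:
        total += 2 * HAND_POINTS   # left and right hand
    if detect_face:
        total += FACE_POINTS
    return total


print(keypoint_count())                     # body + hands + face
print(keypoint_count(detect_face=False))    # body + hands only
```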

Advanced Control Techniques

Multi-ControlNet Stacking

You can combine multiple ControlNet models to achieve layered control -- for example, using OpenPose for body pose and Depth for scene composition simultaneously.

flowchart TD
    A[Reference Image] --> B[OpenPose Preprocessor]
    A --> C[Depth Preprocessor]
    A --> D[Canny Preprocessor]

    E[Load ControlNet: OpenPose] --> F[Apply ControlNet 1]
    B --> F
    G[Positive Conditioning] --> F

    H[Load ControlNet: Depth] --> I[Apply ControlNet 2]
    C --> I
    F --> I

    J[Load ControlNet: Canny] --> K[Apply ControlNet 3]
    D --> K
    I --> K

    K --> L[KSampler]
    L --> M[VAE Decode]
    M --> N[Final Image]

    classDef preprocess fill:#fff3e0,stroke:#ef6c00
    classDef controlnet fill:#e1f5fe,stroke:#01579b
    classDef gen fill:#f3e5f5,stroke:#4a148c

    class B,C,D preprocess
    class E,F,H,I,J,K controlnet
    class L,M,N gen

# Multi-ControlNet configuration
# Each ControlNet is applied sequentially, chaining the conditioning output

# First ControlNet: OpenPose for body structure
controlnet_1 = {
    "class_type": "ControlNetApply",
    "inputs": {
        "conditioning": ["positive_clip", 0],
        "control_net": ["openpose_model", 0],
        "image": ["openpose_image", 0],
        "strength": 0.9   # Strong pose adherence
    }
}

# Second ControlNet: Depth for spatial composition
controlnet_2 = {
    "class_type": "ControlNetApply",
    "inputs": {
        "conditioning": ["controlnet_1", 0],  # Chain from first
        "control_net": ["depth_model", 0],
        "image": ["depth_image", 0],
        "strength": 0.6   # Moderate depth guidance
    }
}

# Third ControlNet: Canny for fine edge detail
controlnet_3 = {
    "class_type": "ControlNetApply",
    "inputs": {
        "conditioning": ["controlnet_2", 0],  # Chain from second
        "control_net": ["canny_model", 0],
        "image": ["canny_image", 0],
        "strength": 0.4   # Subtle edge hints
    }
}
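
Because every layer follows the same pattern, the chain can be generated instead of written by hand. The following sketch assumes the same `ControlNetApply` input keys used above; the helper name `chain_controlnets` and the tuple layout of `layers` are hypothetical.

```python
def chain_controlnets(base_conditioning, layers):
    """Build chained ControlNetApply nodes.

    base_conditioning: link to the positive prompt, e.g. ["positive_clip", 0]
    layers: list of (node_id, control_net_link, image_link, strength) tuples
    Returns (nodes, final_link); final_link feeds the sampler's positive input.
    """
    nodes = {}
    previous = base_conditioning
    for node_id, control_net, image, strength in layers:
        nodes[node_id] = {
            "class_type": "ControlNetApply",
            "inputs": {
                "conditioning": previous,   # output of the previous layer
                "control_net": control_net,
                "image": image,
                "strength": strength,
            },
        }
        previous = [node_id, 0]
    return nodes, previous


nodes, final_link = chain_controlnets(
    ["positive_clip", 0],
    [
        ("controlnet_1", ["openpose_model", 0], ["openpose_image", 0], 0.9),
        ("controlnet_2", ["depth_model", 0], ["depth_image", 0], 0.6),
        ("controlnet_3", ["canny_model", 0], ["canny_image", 0], 0.4),
    ],
)
print(final_link)
```

The returned link is what the KSampler's `positive` input should reference, so adding or removing a layer never requires rewiring the sampler by hand.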

Control Strength and Timing

Fine-tuning when and how strongly ControlNet influences the generation is essential for natural-looking results.

# ControlNet Advanced Apply node provides timing control
advanced_controlnet = {
    "class_type": "ControlNetApplyAdvanced",
    "inputs": {
        "positive": ["clip_positive", 0],
        "negative": ["clip_negative", 0],
        "control_net": ["controlnet_model", 0],
        "image": ["preprocessed_image", 0],
        "strength": 0.8,
        "start_percent": 0.0,   # Begin control at step 0%
        "end_percent": 0.8      # Release control at step 80%
    }
}

| Strength | Start % | End % | Effect |
| --- | --- | --- | --- |
| 1.0 | 0.0 | 1.0 | Maximum control, strict adherence throughout |
| 0.8 | 0.0 | 0.8 | Strong structure, natural fine details |
| 0.5 | 0.0 | 0.5 | Loose guidance, high creative freedom |
| 0.7 | 0.2 | 0.9 | Skip initial noise layout, control mid-process |
| 1.0 | 0.0 | 0.4 | Lock in composition early, free detail phase |

Best practices for timing:

  • Ending early (end_percent < 1.0): Lets the model add natural details without rigid constraint in the final steps. This often produces more photorealistic results.
  • Starting late (start_percent > 0.0): Allows the model to establish its own global composition before structural control kicks in. Useful when the control image is a rough approximation.
  • Strength below 0.5: The control signal becomes a gentle suggestion rather than a constraint. Combine with high CFG for best results.
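
To get a feel for what a start/end window means in sampler steps, the sketch below lists the step indices that fall inside it. It assumes progress maps linearly onto step indices; ComfyUI's actual mapping may differ depending on the scheduler, so treat the result as an approximation. The function name `active_steps` is hypothetical.

```python
def active_steps(total_steps, start_percent, end_percent):
    """Return the step indices during which the control signal is applied.

    Assumes step i (0-based) sits at progress i / total_steps and is active
    when start_percent <= progress < end_percent.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0.0 <= start_percent <= end_percent <= 1.0:
        raise ValueError("need 0.0 <= start_percent <= end_percent <= 1.0")
    return [
        i for i in range(total_steps)
        if start_percent <= i / total_steps < end_percent
    ]


steps = active_steps(30, 0.0, 0.8)
print(f"{len(steps)} of 30 steps controlled, last controlled step index: {steps[-1]}")
```

Under this assumption, a 30-step run with `end_percent` of 0.8 leaves the final six steps free of structural guidance.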

ControlNet with Image-to-Image

ControlNet can be combined with img2img workflows for guided modifications of existing images.

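# ControlNet + img2img workflow (excerpt)
# The "checkpoint", "positive_prompt", "negative_prompt", and "canny_controlnet"
# nodes referenced below are defined elsewhere in the full graph.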
controlnet_img2img = {
    "load_image": {
        "class_type": "LoadImage",
        "inputs": {"image": "source_photo.png"}
    },
    "encode_source": {
        "class_type": "VAEEncode",
        "inputs": {
            "pixels": ["load_image", 0],
            "vae": ["checkpoint", 2]
        }
    },
    "preprocess": {
        "class_type": "CannyEdgePreprocessor",
        "inputs": {
            "image": ["load_image", 0],
            "low_threshold": 100,
            "high_threshold": 200
        }
    },
    "apply_controlnet": {
        "class_type": "ControlNetApply",
        "inputs": {
            "conditioning": ["positive_prompt", 0],
            "control_net": ["canny_controlnet", 0],
            "image": ["preprocess", 0],
            "strength": 0.75
        }
    },
    "sampler": {
        "class_type": "KSampler",
        "inputs": {
            "model": ["checkpoint", 0],
            "positive": ["apply_controlnet", 0],
            "negative": ["negative_prompt", 0],
            "latent_image": ["encode_source", 0],  # Use encoded source
            "denoise": 0.65,  # Partial denoise to retain source structure
            "seed": 42,       # KSampler needs a seed; same value as the OpenPose example
            "steps": 25,
            "cfg": 7.0,
            "sampler_name": "euler",
            "scheduler": "karras"
        }
    }
}
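
When working with excerpts like this one, links that point at nodes missing from the dictionary are a common source of errors. The sketch below scans an API-format workflow for such dangling links. The helper name `find_dangling_links` is hypothetical, and it assumes links are two-element lists of a node id string and an output index, as in the examples in this chapter.

```python
def find_dangling_links(workflow):
    """Return sorted node ids that are referenced by a link but not defined."""
    missing = set()
    for node in workflow.values():
        for value in node.get("inputs", {}).values():
            is_link = (
                isinstance(value, list)
                and len(value) == 2
                and isinstance(value[0], str)
                and isinstance(value[1], int)
            )
            if is_link and value[0] not in workflow:
                missing.add(value[0])
    return sorted(missing)


# A trimmed copy of the img2img excerpt: "checkpoint" is never defined.
excerpt = {
    "load_image": {"class_type": "LoadImage", "inputs": {"image": "source_photo.png"}},
    "encode_source": {
        "class_type": "VAEEncode",
        "inputs": {"pixels": ["load_image", 0], "vae": ["checkpoint", 2]},
    },
}
print(find_dangling_links(excerpt))
```

An empty list means every link resolves to a node in the dictionary; it does not check that the output index or data type is correct.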

ControlNet for SDXL

SDXL ControlNet models work similarly but require SDXL-specific model files and typically operate at higher resolutions.

# SDXL ControlNet workflow differences
sdxl_controlnet = {
    "checkpoint": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}
    },
    "controlnet_model": {
        "class_type": "ControlNetLoader",
        "inputs": {"control_net_name": "diffusers_xl_canny_full.safetensors"}
    },
    "latent": {
        "class_type": "EmptyLatentImage",
        "inputs": {
            "width": 1024,   # SDXL native resolution
            "height": 1024,
            "batch_size": 1
        }
    },
    "preprocessor": {
        "class_type": "CannyEdgePreprocessor",
        "inputs": {
            "image": ["reference_image", 0],
            "low_threshold": 100,
            "high_threshold": 200,
            "resolution": 1024  # Match SDXL resolution
        }
    }
}
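
The resolution difference also shows up in latent space, because the VAE used by both SD 1.5 and SDXL downsamples each side by a factor of 8 into 4 latent channels. The sketch below computes the latent shape for a given image size; the function name `latent_shape` is hypothetical.

```python
def latent_shape(width, height, channels=4, downscale=8):
    """Return (channels, latent_height, latent_width) for an image size in pixels."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width % downscale or height % downscale:
        raise ValueError(f"width and height must be multiples of {downscale}")
    return (channels, height // downscale, width // downscale)


print(latent_shape(1024, 1024))  # SDXL example from this section
print(latent_shape(512, 768))    # SD 1.5 portrait size from the OpenPose workflow
```

An SDXL latent at 1024x1024 holds four times as many values as an SD 1.5 latent at 512x512, which is one reason SDXL workflows need more VRAM and time per step.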

SD 1.5 vs. SDXL ControlNet Comparison

| Feature | SD 1.5 ControlNet | SDXL ControlNet |
| --- | --- | --- |
| Resolution | 512x512 native | 1024x1024 native |
| Model Size | ~700 MB | ~2.5 GB |
| VRAM Required | ~4 GB | ~8 GB |
| Available Models | Extensive (11+ types) | Growing (5-6 types) |
| Quality | Good | Excellent |
| Speed | Fast | Moderate |
| Community Support | Mature | Rapidly expanding |

Practical Recipes

Recipe 1: Architecture Preservation

Maintain the exact structure of a building while changing its style.

# Use Canny with high strength + Depth for 3D consistency
architecture_recipe = {
    "canny_strength": 0.95,       # Near-exact edge preservation
    "canny_start": 0.0,
    "canny_end": 1.0,
    "depth_strength": 0.6,        # Moderate 3D guidance
    "depth_start": 0.0,
    "depth_end": 0.7,
    "prompt": "gothic cathedral, dark fantasy style, dramatic lighting, 8k",
    "negative": "modern, contemporary, bright, cheerful",
    "steps": 30,
    "cfg": 8.0,
    "sampler": "dpm_2_ancestral",
    "scheduler": "karras"
}
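
The recipe is a flat summary, not a workflow, so its values still have to be mapped onto node inputs. One way to do that, sketched below, is to translate each `<prefix>_strength`, `<prefix>_start`, and `<prefix>_end` triple into the `strength`, `start_percent`, and `end_percent` inputs of a `ControlNetApplyAdvanced` node. The helper name `controlnet_inputs_from_recipe` is hypothetical.

```python
def controlnet_inputs_from_recipe(recipe, prefix):
    """Extract ControlNetApplyAdvanced timing inputs for one control type."""
    return {
        "strength": recipe[f"{prefix}_strength"],
        # Missing timing keys fall back to "active for the whole run".
        "start_percent": recipe.get(f"{prefix}_start", 0.0),
        "end_percent": recipe.get(f"{prefix}_end", 1.0),
    }


# The ControlNet-related values of the architecture recipe.
recipe = {
    "canny_strength": 0.95, "canny_start": 0.0, "canny_end": 1.0,
    "depth_strength": 0.6, "depth_start": 0.0, "depth_end": 0.7,
}
canny_inputs = controlnet_inputs_from_recipe(recipe, "canny")
depth_inputs = controlnet_inputs_from_recipe(recipe, "depth")
print(canny_inputs)
print(depth_inputs)
```

The resulting dictionaries can be merged into the `inputs` of two chained `ControlNetApplyAdvanced` nodes, Canny first and Depth second.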

Recipe 2: Character Pose Transfer

Apply a specific pose from a reference photo to a generated character.

# OpenPose with face and hand detection
pose_transfer_recipe = {
    "openpose_strength": 0.85,
    "detect_body": True,
    "detect_hand": True,
    "detect_face": True,
    "prompt": "anime girl in school uniform, cherry blossoms, spring, high quality",
    "negative": "realistic, photo, deformed hands, extra fingers",
    "steps": 25,
    "cfg": 7.0,
    "sampler": "euler_ancestral",
    "scheduler": "normal"
}

Recipe 3: Depth-Guided Landscapes

Generate landscapes that match the spatial composition of a reference.

# Depth map for spatial consistency
landscape_recipe = {
    "depth_strength": 0.7,
    "depth_start": 0.0,
    "depth_end": 0.85,
    "prompt": "alien planet landscape, bioluminescent plants, two moons, sci-fi concept art",
    "negative": "earth, realistic, mundane, urban",
    "steps": 35,
    "cfg": 9.0,
    "sampler": "dpm_2",
    "scheduler": "karras"
}

Troubleshooting ControlNet

| Problem | Cause | Solution |
| --- | --- | --- |
| ControlNet has no effect | Wrong model/preprocessor pairing | Verify the ControlNet model matches the preprocessor type |
| Output looks distorted | Strength too high | Reduce strength to 0.6-0.8 and set end_percent to 0.8 |
| Poses are inaccurate | Low-quality pose detection | Use higher-resolution input images; enable hand/face detection |
| Out of memory with multi-ControlNet | Too many models loaded | Use model unloading; reduce resolution; apply one at a time |
| Control image resolution mismatch | Preprocessor and latent size differ | Set preprocessor resolution to match your Empty Latent Image dimensions |
| SDXL ControlNet not loading | Wrong model version | Ensure you are using SDXL-specific ControlNet models, not SD 1.5 |
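
The first row of the table can be partly automated. The sketch below guesses a model's control type from its filename and compares it with the preprocessor in use. It is a heuristic only: it assumes filenames contain the type keyword, as the model names listed in Step 1 do, and the function names are hypothetical.

```python
# Keywords taken from the model filenames listed in Step 1.
CONTROL_TYPES = ("openpose", "depth", "canny", "lineart", "scribble")


def detect_control_type(model_filename):
    """Guess the control type from a model filename, or return None."""
    name = model_filename.lower()
    for control_type in CONTROL_TYPES:
        if control_type in name:
            return control_type
    return None


def model_matches_preprocessor(model_filename, preprocessor_type):
    """Return True when the filename's control type equals the preprocessor type."""
    return detect_control_type(model_filename) == preprocessor_type.lower()


print(model_matches_preprocessor("control_v11p_sd15_openpose.safetensors", "openpose"))
print(model_matches_preprocessor("control_v11f1p_sd15_depth.safetensors", "canny"))
```

A `False` result is a prompt to double-check the pairing, not proof of a mistake, since some models are renamed after download.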

Summary

ControlNet transforms ComfyUI from a text-driven image generator into a precision composition tool. By injecting structural signals -- edges, depth, poses, and more -- into the diffusion process, you gain direct control over the spatial layout and form of your generated images. Mastering preprocessor selection, strength tuning, and multi-ControlNet stacking unlocks workflows that rival professional concept art pipelines.

Key Takeaways

  1. ControlNet adds structural guidance to the diffusion process through parallel encoder feature injection.
  2. Preprocessor choice matters -- Canny for hard edges, Depth for spatial layout, OpenPose for character poses, Lineart for illustrations.
  3. Strength and timing (start/end percent) are your primary levers for balancing control precision with creative freedom.
  4. Multi-ControlNet stacking allows layered control (e.g., pose + depth + edge simultaneously) with decreasing strength for each additional layer.
  5. End the control signal early (end_percent 0.7-0.8) for more natural, photorealistic results.
  6. SDXL ControlNet requires SDXL-specific models and higher resolution but produces superior output.

Next Steps

With ControlNet mastered, you can precisely control the structure and composition of your generated images. In the next chapter, we will explore LoRA models and model customization -- learning how to fine-tune generation style, add specific characters or concepts, and stack multiple LoRA adapters for unique artistic results.

Continue to Chapter 6: LoRA & Model Customization


Built with insights from the ComfyUI project.

What Problem Does This Solve?

The hard part of a ControlNet workflow is not adding more nodes. It is keeping each node's contract clear: its class_type, its inputs, and the ControlNet conditioning it passes on, so that behavior stays predictable as the graph grows.

In practical terms, this chapter helps you avoid three common failures:

  • coupling a workflow too tightly to one model, preprocessor, or resolution
  • missing the handoff boundaries between setup (loading models), execution (sampling), and validation (checking the output)
  • changing strength or timing values without a way to compare against, or roll back to, a known-good workflow

After working through this chapter, you should be able to reason about a ControlNet workflow as a subsystem of ComfyUI with explicit contracts for inputs, state transitions, and outputs.

Use the notes on model files (.safetensors), control images, and node wiring as your checklist when adapting these patterns to your own workflows.

How it Works Under the Hood

Under the hood, a ControlNet workflow usually follows a repeatable control path:

  1. Context bootstrap: load the checkpoint and the ControlNet model (CheckpointLoaderSimple, ControlNetLoader).
  2. Input normalization: run the reference image through a preprocessor so the ControlNet receives a control image at a consistent resolution.
  3. Core execution: attach the control image to the conditioning with ControlNetApply and pass the result to KSampler.
  4. Policy and safety checks: keep strength, start_percent, and end_percent in sensible ranges, and confirm the ControlNet model matches the base checkpoint (SD 1.5 vs. SDXL).
  5. Output composition: decode the latent with VAEDecode and save the image for downstream use.
  6. Operational telemetry: watch the ComfyUI console output for node errors and memory warnings when debugging or tuning performance.

When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.

Source Code Walkthrough

comfy/controlnet.py

The ControlNet integration in comfy/controlnet.py applies the conditioned guidance to the UNet during diffusion sampling. ControlBase is the base class for the control models defined there, and nodes.py imports the module alongside the node type helpers:

import comfy.controlnet
from comfy.comfy_types import IO, ComfyNodeABC, InputTypeDict, FileLocator

ControlNet models are loaded via the same folder_paths checkpoint resolution system as regular checkpoints. The ControlNetApply and ControlNetApplyAdvanced nodes in nodes.py accept an image tensor (the control signal) and a strength float, passing them to the ControlNet model for injection into the diffusion UNet's middle and output blocks.

Chapter Connections