ComfyUI-Anima-LLLite
May 20, 2026 · View on GitHub
ComfyUI custom node for ControlNet-LLLite for Anima (DiT-based).
LLLite is a lightweight ControlNet variant that injects a low-rank correction into the attention (and optionally MLP) projections of the DiT. This node loads weights trained with kohya-ss/sd-scripts and applies them to the ComfyUI Anima model at inference time.
This is intended as a minimal reference implementation. Community nodes with extra features (per-step scheduling, multi-cond, region masks, per-block selection, …) are welcome and can build on this codebase.
Install
Clone into ComfyUI/custom_nodes/:
cd ComfyUI/custom_nodes
git clone <this-repo> ComfyUI-Anima-LLLite
Place LLLite weights (.safetensors) under ComfyUI/models/controlnet/.
Node
Apply Anima ControlNet-LLLite (loaders category)
| Input | Type | Notes |
|---|---|---|
model | MODEL | Anima checkpoint |
lllite_name | filename | from models/controlnet/ |
image | IMAGE | control image (any resolution; auto-resized to latent×8) |
strength | FLOAT | LLLite multiplier (default 1.0) |
start_percent | FLOAT | start of the active sampling window (0.0 = sigma_max) |
end_percent | FLOAT | end of the active sampling window (1.0 = sigma_min) |
preserve_wrapper | BOOLEAN | if a previous node already installed a model_function_wrapper, delegate to it from inside this node's wrapper so they cascade instead of overwriting (default True) |
mask (optional) | MASK | required when the weights are 4-channel (inpaint); white = inpaint area, black = keep |
Output: patched MODEL.
All architectural parameters (cond_emb_dim, mlp_dim, cond_dim,
cond_resblocks, use_aspp, aspp_dilations, target_layers,
cond_in_channels, inpaint_masked_input) are baked into the trained
weights and read automatically from the safetensors metadata — they are
not exposed as node inputs because changing them would just cause load
errors.
Inpainting (4-channel) weights
If the loaded weights were trained with --lllite_cond_in_channels=4
(inpaint mode), the node also requires a MASK input. The mask convention
matches the sd-scripts training/inference code:
white(1.0) = inpaint area (the region the model should fill)black(0.0) = keep
The mask is resized to match the control image (nearest-neighbor),
binarized at 0.5, and concatenated with the RGB control image as a 4th
channel (rescaled to [-1, +1] to match the RGB range). When the weights
were trained with --lllite_inpaint_masked_input, the RGB channels are
also zeroed inside the mask region before concatenation — this is read
from the weights metadata and applied automatically.
3-channel weights continue to work with no mask input; if a mask is
connected to a 3-channel LLLite the node logs a warning and ignores it.
A 4-channel weight without a mask raises a ValueError rather than
silently substituting an empty mask.
Cascading multiple LLLite nodes
Multiple Apply Anima ControlNet-LLLite nodes can be chained on the same
MODEL to apply more than one LLLite at the same time (e.g. a pose LLLite
and a depth LLLite, or several inpaint LLLites with different conditions).
Just feed the output MODEL of one node into the model input of the next.
ComfyUI's model_options has a single model_function_wrapper slot, so two
nodes that each install a wrapper would normally cause the outer one to
silently overwrite the inner one. To avoid this, each node captures the
previously-installed wrapper before cloning and delegates to it from inside
its own wrapper, so wrappers stack instead of replacing each other. This is
controlled by the preserve_wrapper toggle (default True); turn it off to
get the legacy single-wrapper behavior, or to deliberately replace an
upstream wrapper.
Notes when cascading:
- Each node has its own
start_percent/end_percentwindow — when a node's window is inactive it is a pass-through to the next wrapper, so you can stage different LLLites across different parts of the schedule. strengthis per-node and they add independently in the LLLite output; there is no global normalization, so very high combined strengths can over-saturate the same way a single highstrengthwould.- The pattern is compatible with other well-behaved wrapper-installing
nodes (e.g.
ChromaRadianceOptions) provided they follow the same delegate-to-previous-wrapper convention.
How it works
-
Discovers the LLLite target Linears under each Anima DiT block according to the
target_atomicsrecorded in the weight metadata:self_attn_q_pre→ self-attentionq_projself_attn_kv_pre→ self-attentionk_proj+v_projcross_attn_q_pre→ cross-attentionq_projmlp_fc1_pre→GPT2FeedForward.layer1(the MLP fc1)
The LLM-Adapter sub-tree and
output_projare always skipped. -
On each sampling step the wrapper monkey-patches those Linears with the LLLite forward, runs
apply_model, and restores the originals — so the patch never leaks across model clones. -
The control image is resized to
latent_HW * 8once per resolution, then embedded by the v2conditioning1trunk (Conv 4×4 s=4 → Conv 3×3 s=1 → Conv 4×4 s=4 with GroupNorm/SiLU, optional ResBlocks, optional ASPP, 1×1 projection, LayerNorm) to a per-token feature map matching the DiT token grid. For 4-channel (inpaint) weights, the mask is resized with nearest-neighbor, binarized, rescaled to[-1, +1], and concatenated as the 4th channel of the cond image before this trunk. -
Each LLLite module applies the v2 forward:
down → silu → concat with (cond + per-module depth-embedding) → mid → FiLM(γ, β) → silu → up, added to the original Linear input. -
CFG is supported:
cond_embis broadcast to match the runtime batch size.
Weight format (v2, named keys)
The state dict is in the v2 self-describing format. Each LLLite module's weights are prefixed by the target Linear's full path, so the file alone identifies which Linear each tensor belongs to:
lllite_conditioning1.{conv1,conv2,conv3,norm1,...,proj,out_norm,...}.{weight,bias}
lllite_dit_blocks_{i}_self_attn_q_proj.down.{weight,bias}
lllite_dit_blocks_{i}_self_attn_q_proj.mid.{weight,bias}
lllite_dit_blocks_{i}_self_attn_q_proj.cond_to_film.{weight,bias}
lllite_dit_blocks_{i}_self_attn_q_proj.up.{weight,bias}
lllite_dit_blocks_{i}_self_attn_q_proj.depth_embed
...
The expected safetensors metadata keys are:
lllite.version(="2")lllite.cond_emb_dim,lllite.mlp_dimlllite.cond_dim,lllite.cond_resblockslllite.use_aspp,lllite.aspp_dilationslllite.target_atomics(canonical, comma-separated atomic specifiers) — falls back tolllite.target_layersfor older v2 snapshotslllite.cond_in_channels("3"standard /"4"inpaint; defaults to3)lllite.inpaint_masked_input("true"/"false"; defaults tofalse, only meaningful whencond_in_channels=4)
Legacy weights with lllite_modules.{i}.* keys (the pre-v2 format) are
rejected on load. Re-train against the current sd-scripts codebase.
Credits
- LLLite design and Anima training implementation: kohya-ss
- Adapted for ComfyUI from
sd-scripts - ComfyUI port written collaboratively with Claude (Anthropic,
claude-opus-4-7)