Janus

April 3, 2026

Janus is the original unified multimodal model from DeepSeek. It decouples visual encoding for understanding from visual encoding for generation.

Architecture

Janus uses separate vision encoders for understanding and generation tasks, unified through a shared LLM backbone. Key design:

  • Understanding encoder: SigLIP vision encoder
  • Generation: Autoregressive VQ token prediction (576 discrete tokens per image)
  • LLM base: DeepSeek-LLM-1.3B
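The 576-token figure follows from the size of the token grid. A minimal sketch of the arithmetic, assuming a 384×384 input image and a VQ tokenizer that downsamples by 16× per side (both values are assumptions based on the public Janus release and are not stated on this page; image_token_count is a hypothetical helper):

```python
def image_token_count(image_size: int = 384, downsample: int = 16) -> int:
    """Number of discrete VQ tokens for a square image.

    The defaults (384 px, 16x downsampling) are assumed values,
    not taken from this page.
    """
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by downsample")
    side = image_size // downsample  # tokens along one edge of the grid
    return side * side               # total tokens in the square grid


print(image_token_count())  # 24 * 24 = 576
```

Under these assumptions, generating one image means predicting 576 tokens autoregressively, one per grid cell.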

Supported Variants

| Variant | HuggingFace | Parameters |
|---|---|---|
| Janus-1.3B | deepseek-ai/Janus-1.3B | 1.3B |

Relationship to Janus-Pro and JanusFlow

All three models share the same repository but differ in architecture and scale:

| Aspect | Janus | Janus-Pro | JanusFlow |
|---|---|---|---|
| Generation method | VQ autoregressive | VQ autoregressive | Rectified flow ODE |
| Parameters | 1.3B | 7B | 1.3B |
| Image tokens | 576 discrete | 576 discrete | Continuous (30 steps) |
| External VAE | No | No | SDXL VAE |

Janus and Janus-Pro share the same backbone adapter (janus_pro). Switch between them by changing model_path.
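As a rough illustration of switching models through model_path alone, the sketch below builds two configs that differ only in that key. The helper make_backbone_cfg and the Janus-Pro checkpoint path are hypothetical. The other keys mirror the Python API example in the Inference section.

```python
from typing import Any, Dict


def make_backbone_cfg(
    model_path: str,
    janus_root: str = "/path/to/model/Janus",
) -> Dict[str, Any]:
    """Build a backbone_cfg dict for the shared janus_pro adapter.

    Hypothetical helper: only model_path changes between Janus and Janus-Pro.
    """
    return {
        "model_path": model_path,
        "janus_root": janus_root,
        "seed": 42,
        "torch_dtype": "bfloat16",
    }


janus_cfg = make_backbone_cfg("/path/to/Janus-1.3B")
# Placeholder path for a Janus-Pro checkpoint (assumption, adjust to your setup).
janus_pro_cfg = make_backbone_cfg("/path/to/Janus-Pro-7B")
```

Either dict can then be passed as backbone_cfg with backbone_name="janus_pro".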

Dependencies

The model environment is managed via the janus_pro image defined in modal/images.py. For local setup, install the dependencies listed in model/Janus/requirements.txt.

Inference

Python API

from umm.inference.pipeline import InferencePipeline
from umm.inference.multimodal_inputs import InferenceRequest

pipeline = InferencePipeline(backbone_name="janus_pro", backbone_cfg={
    "model_path": "/path/to/Janus-1.3B",
    "janus_root": "/path/to/model/Janus",
    "seed": 42,
    "torch_dtype": "bfloat16",
})

# Generation
result = pipeline.run(InferenceRequest(
    backbone="janus_pro", task="generation",
    prompt="A cat sitting on a rainbow",
))

# Understanding
result = pipeline.run(InferenceRequest(
    backbone="janus_pro", task="understanding",
    prompt="Describe this image",
    images=["path/to/image.jpg"],
))

Supported Benchmarks

Janus uses the same configs as Janus-Pro; change model_path to point at Janus-1.3B:

| Benchmark | Config |
|---|---|
| DPG Bench | configs/eval/dpg_bench/dpg_bench_janus_pro.yaml |
| GenEval | configs/eval/geneval/geneval_janus_pro.yaml |
| WISE | configs/eval/wise/wise_janus_pro.yaml |
| UEval | configs/eval/ueval/ueval_janus_pro.yaml |
| Uni-MMMU | configs/eval/uni_mmmu/uni_mmmu_janus_pro.yaml |
| MME | configs/eval/mme/mme_janus_pro.yaml |
| MMMU | configs/eval/mmmu/mmmu_janus_pro.yaml |
| MMBench | configs/eval/mmbench/mmbench_janus_pro.yaml |
| MM-Vet | configs/eval/mmvet/mmvet_janus_pro.yaml |
| MathVista | configs/eval/mathvista/mathvista_janus_pro.yaml |
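The config paths listed here follow a regular pattern, configs/eval/<dir>/<dir>_janus_pro.yaml. A small lookup sketch, where BENCHMARK_DIRS and config_path are hypothetical names introduced only for illustration and are not part of the repository:

```python
# Benchmark name -> directory name, as listed in this section.
BENCHMARK_DIRS = {
    "DPG Bench": "dpg_bench",
    "GenEval": "geneval",
    "WISE": "wise",
    "UEval": "ueval",
    "Uni-MMMU": "uni_mmmu",
    "MME": "mme",
    "MMMU": "mmmu",
    "MMBench": "mmbench",
    "MM-Vet": "mmvet",
    "MathVista": "mathvista",
}


def config_path(benchmark: str, backbone: str = "janus_pro") -> str:
    """Return the eval config path for a benchmark (hypothetical helper)."""
    directory = BENCHMARK_DIRS[benchmark]  # raises KeyError for unknown names
    return f"configs/eval/{directory}/{directory}_{backbone}.yaml"


print(config_path("GenEval"))
```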

Key Configuration Parameters

  • Generation: seed and torch_dtype are set in backbone_cfg; cfg_weight=5.0 and parallel_size=4 are model defaults
  • Understanding: uses VLChatProcessor for image preprocessing
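For context on cfg_weight: in autoregressive image generation this is typically the classifier-free guidance scale, which blends conditional and unconditional next-token logits. The sketch below shows that blend in plain Python. It assumes cfg_weight is applied this way, which this page does not spell out. apply_cfg is a hypothetical helper and not a function from the repository. parallel_size is likewise assumed to be the number of images sampled per prompt.

```python
from typing import List


def apply_cfg(
    cond_logits: List[float],
    uncond_logits: List[float],
    cfg_weight: float = 5.0,
) -> List[float]:
    """Blend conditional and unconditional logits (classifier-free guidance).

    guided = uncond + cfg_weight * (cond - uncond)
    Assumed formulation; hypothetical helper for illustration only.
    """
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("logit vectors must have the same length")
    return [u + cfg_weight * (c - u) for c, u in zip(cond_logits, uncond_logits)]


guided = apply_cfg([1.0, 2.0], [0.0, 2.0], cfg_weight=5.0)
```

With cfg_weight=1.0 the blend reduces to the conditional logits; larger values push sampling further toward the prompt.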