Janus-Pro

July 2, 2026 · View on GitHub

Scaled-up multimodal model from DeepSeek (7B) with improved training and stronger multimodal capabilities.

Original repository: https://github.com/deepseek-ai/Janus
Paper: Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Backbone key: janus_pro
Capabilities: Understanding, Generation (NO Editing)

Supported Variants

Variant	HuggingFace	Parameters
Janus-Pro-7B	`deepseek-ai/Janus-Pro-7B`	7B

Janus-Pro shares the same backbone adapter (janus_pro) and architecture as the original Janus (1.3B), but at larger scale with better performance. See also JanusFlow for the rectified flow variant.

Dependencies

The model environment is managed via the janus_pro image defined in modal/images.py. For local setup, install the dependencies listed in model/Janus/requirements.txt.

Janus-Pro requires Flash Attention (v2.7.4). The Modal image already includes it. For local setup, install a pre-compiled wheel matching your environment — see modal/README.md for the exact environment parameters and installation instructions.

Architecture Note

Janus-Pro uses parallel generation: a single forward pass produces multiple images simultaneously (parallel_size=4). Generation uses classifier-free guidance (cfg_weight=5.0) with 576 image tokens per image.

Inference

CLI

# Generation
PYTHONPATH=src python -m umm.cli.main infer --config configs/inference/janus_pro_generation.yaml

# Understanding
PYTHONPATH=src python -m umm.cli.main infer --config configs/inference/janus_pro_understanding.yaml

Python API

from umm.inference.pipeline import InferencePipeline
from umm.inference.multimodal_inputs import InferenceRequest

pipeline = InferencePipeline(backbone_name="janus_pro", backbone_cfg={
    "model_path": "/path/to/janus_pro_weights",
    "janus_root": "/path/to/model/Janus",
    "seed": 42,
    "torch_dtype": "bfloat16",
})

# Generation
result = pipeline.run(InferenceRequest(
    backbone="janus_pro", task="generation",
    prompt="A cat sitting on a rainbow",
))

# Understanding
result = pipeline.run(InferenceRequest(
    backbone="janus_pro", task="understanding",
    prompt="Describe this image",
    images=["path/to/image.jpg"],
))

Supported Benchmarks

Benchmark	Config
DPG Bench	`configs/eval/dpg_bench/dpg_bench_janus_pro.yaml`
GenEval	`configs/eval/geneval/geneval_janus_pro.yaml`
WISE	`configs/eval/wise/wise_janus_pro.yaml`
UEval	`configs/eval/ueval/ueval_janus_pro.yaml`
Uni-MMMU	`configs/eval/uni_mmmu/uni_mmmu_janus_pro.yaml`
MME	`configs/eval/mme/mme_janus_pro.yaml`
MMMU	`configs/eval/mmmu/mmmu_janus_pro.yaml`
MMBench	`configs/eval/mmbench/mmbench_janus_pro.yaml`
MMStar	`configs/eval/mmstar/mmstar_janus_pro.yaml`
MM-Vet	`configs/eval/mmvet/mmvet_janus_pro.yaml`
MathVista	`configs/eval/mathvista/mathvista_janus_pro.yaml`

# Example: run GenEval
PYTHONPATH=src python -m umm.cli.main eval --config configs/eval/geneval/geneval_janus_pro.yaml

# Example: run UEval
PYTHONPATH=src python -m umm.cli.main eval --config configs/eval/ueval/ueval_janus_pro.yaml

Key Configuration Parameters

Generation: seed, torch_dtype (cfg_weight=5.0 and parallel_size=4 are model defaults)
Understanding: uses VLChatProcessor for image preprocessing; returns text and sft_format