Fast version
May 20, 2025 · View on GitHub
TaylorSeer-HiDream
1. Set Up Conda Environment
# First install Python 3.12
# Then install the requirements
pip install -r requirements.txt
2. Download Model Checkpoints
If you experience connection issues with Hugging Face, you can use the Hugging Face mirror:
export HF_ENDPOINT=https://hf-mirror.com
Download HiDream Models
# Full version
huggingface-cli download --resume-download HiDream-ai/HiDream-I1-Full \
--local-dir HiDream-ai/HiDream-I1-Full
# Dev version
huggingface-cli download --resume-download HiDream-ai/HiDream-I1-Dev \
--local-dir HiDream-ai/HiDream-I1-Dev
# Fast version
huggingface-cli download --resume-download HiDream-ai/HiDream-I1-Fast \
--local-dir HiDream-ai/HiDream-I1-Fast
Download Llama-3.1-8B
HiDream requires Llama-3.1-8B, which you need to request access for. Once you have the necessary permissions, you can download it using one of these methods:
# Option 1: Using huggingface-cli
huggingface-cli download --resume-download meta-llama/Llama-3.1-8B \
--local-dir meta-llama/Llama-3.1-8B
# Option 2: Using modelscope
modelscope download --model LLM-Research/Meta-Llama-3.1-8B
3. Quick Start
# For full model inference
python ./inference.py --model_type full
# For distilled dev model inference
python ./inference.py --model_type dev
# For distilled fast model inference
python ./inference.py --model_type fast
4. Gradio Demo
python gradio_demo.py
5. Sampling with TaylorSeer-HiDream
We've developed a custom script to evaluate the image generation quality metrics. Use the following command to run sampling tests:
python sampling.py --prompt_file </path/to/your/test/prompt.txt> \
--output_dir </path/to/your/generated/samples/folder> \
--add_sampling_metadata
Performance Evaluation
We tested TaylorSeer-HiDream on the DrawBench200 benchmark. Here's an example of how to run the test:
python sampling.py --prompt_file /path/to/your/prompts/DrawBench200.txt \
--output_dir /path/to/your/generated/samples/folder \
--add_sampling_metadata
Results
TaylorSeer-HiDream demonstrates significant performance improvements:
TaylorSeer optimization reduces generation time by approximately 72% (from 76s to 21s per image) while maintaining comparable quality metrics on the DrawBench200 benchmark.
6. Start with TaylorSeer-HiDream-I1-Fast-nf4
This section explains how to launch TaylorSeer-HiDream using the nf4 quantized model versions. These models are optimized for reduced memory and computation requirements, making them suitable for resource-constrained environments.(A 24GB GPU should be sufficient to run it.)
Download nf4 Weights
# Dev version
huggingface-cli download --resume-download azaneko/HiDream-I1-Dev-nf4 --local-dir /root/autodl-tmp/pretrained_models/azaneko/HiDream-I1-Dev-nf4
# Fast version
huggingface-cli download --resume-download azaneko/HiDream-I1-Fast-nf4 --local-dir /root/autodl-tmp/pretrained_models/azaneko/HiDream-I1-Fast-nf4
# Full version
huggingface-cli download --resume-download azaneko/HiDream-I1-Full-nf4 --local-dir /root/autodl-tmp/pretrained_models/azaneko/HiDream-I1-Full-nf4
#Download Quantized LLaMA Model (INT4)
huggingface-cli download --resume-download hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 --local-dir /root/autodl-tmp/pretrained_models/hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4
Install hdi1 Inference Package
The nf4 models require the hdi1 inference library, which depends on flash-attn. If you've already installed the base requirements.txt for HiDream, it's recommended to reuse the same Conda environment and install hdi1 as follows:
pip install hdi1 --no-build-isolation
Note: This may recompile parts of flash-attn, so ensure your environment is properly configured.
Quick Inference Example
You can quickly generate images using the CLI provided by hdi1. Here’s a sample command:
python -m hdi1 "A cat holding a sign that says 'hello world'" -m fast
The -m fast flag specifies the HiDream-I1-Fast-nf4 model.
Replace the prompt with your own text to generate different images.
TaylorSeer vs. Original HiDream: Key Differences
The main architectural difference between TaylorSeer-HiDream and the original HiDream repository lies in the addition of two new modules:
- taylor_utils/
This module implements Taylor series-based prediction for efficient inference. It manages:
Cache step prediction using different orders of Taylor approximation.
Dynamic adjustment of cache updates during inference to optimize performance.
The core idea is to reuse computation across time steps using Taylor expansion, significantly reducing redundant operations.
- cache_functions/
This module handles the initialization and configuration of the cache system, which enables the TaylorSeer optimizations. In particular:
cache_init.py defines the cache structure and its parameters:
cache_dic['fresh_threshold'] = 4 # Determines after how many steps the cache should be refreshed
cache_dic['max_order'] = 1 # Specifies the maximum order of Taylor approximation used
These settings govern when to recompute or reuse previous computations, striking a balance between speed and accuracy.