LAMIC

August 11, 2025 ยท View on GitHub

LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer

License: Apache-2.0 arXiv Hugging Face

This repo contains minimal inference code to run layout-aware multi-image composition & editing with our LAMIC based on Flux.1 Kontext-dev open-source models.

๐Ÿš€ Updates

  • [2025.07] This work is now open source, and you are free to use it!
  • [2025.08.01] We released a technical report! See our detailed method here.
  • [2025.08.08] LAMIC's Comfyui implementation is coming soon!

An example of LAMIC

Generate using the same input "example_inputs.json" with four different random number seeds.

example_forestexample_manexample_sea_turtleexample_sea_turtle
exampleexample_bboxed

Local install

Install LAMIC

git clone https://github.com/Suchenl/LAMIC.git
cd LAMIC
conda create --name myenv python=3.10.18
pip install -r requirements.txt

Install underlying foundation model

NameHuggingFace repoModelScope repo
FLUX.1 Kontext [dev]https://huggingface.co/black-forest-labs/FLUX.1-Kontext-devhttps://www.modelscope.cn/models/black-forest-labs/FLUX.1-Kontext-dev

For example, using modelscope

modelscope download --model black-forest-labs/FLUX.1-Kontext-dev --local_dir ./your_dir

Usage

Run example

python batch_run.py --input_path "./example_inputs.json" --num_img_per_sample 4 --concat_per_sample True --flux_kontext_transformer_path "your_local_kontext_transformer_path" --flux_path "your_local_flux_path (flux or flux kontext, both are acceptable)"

Of course, you can also refer to example_input to quickly build your own input. The number of reference images is flexibleโ€”just try it!

More refined control (modulate your custom inputs)

1. Use your predefined masks for layout control instead of margin boxes

Replace the key "bbox" with "mask_path".

For example, "mask_path": "your_mask_path (image format)"

2. Enhance the impact of one or more of your reference images

If you find that in some cases, the reference performance of some reference images is insufficient, you can try increasing the value of "to_area", which means the area to which the original reference image is scaled in proportion to its length and width. Since LAMIC is based on the attention between tokens, increasing the area of some reference images can increase the number of tokens, thereby enhancing the reference performance of the generated image. Vice versa.

3. Reduce the impact of one or more of your reference images on other reference images

In addition to the second method, you can also consider adding the key of "de-overlap" to other reference images, to eliminate the influence of some reference images on other reference images (when the target areas of the two reference images overlap)

For example, "ref_img_4": {"de-overlap": [1, 2]}. This means eliminating the influence of reference image 4 on the overlapping area with reference image 1, and eliminating the influence of reference image 4 on the overlapping area with reference image 2.

Framework of LAMIC

framework

You are free to:

  • โœ… Modify, adapt, or build upon the work
  • โœ… Distribute your modified versions
  • โœ… Use for research, education, or production systems

๐Ÿ“„ Citation

If you find this work useful to you, please consider citing our paper and giving the repo a โญ:

@article{chen2025lamic,
  title={LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer},
  author={Chen, Yuzhuo and Ma, Zehua and Wang, Jianhua and Kang, Kai and Yao, Shunyu and Zhang, Weiming},
  journal={arXiv preprint arXiv:2508.00477},
  year={2025}
}