CoCoIns: Consistent Subject Generation via Contrastive Instantiated Concepts

December 2, 2025

Lee Hsin-Ying¹, Kelvin C.K. Chan², Ming-Hsuan Yang¹

¹University of California, Merced  ²Google DeepMind

Note

CoCoIns generates consistent subjects across individual creations without requiring reference images, fine-tuning, or access to other creations.

Contrastive Concept Instantiation (CoCoIns) is a framework for synthesizing consistent subjects across multiple independent generations. The framework consists of a generative model and a mapping network that transforms input latent codes into pseudo-words associated with specific concept instances. Users can generate consistent subjects by reusing the same latent codes across different prompts.

Setup

Requirements

  • Python 3.10

Installation

pip install -r requirements.txt

The pre-trained checkpoint (model.safetensors) is included in the checkpoint/ directory.

Usage

Input Prompts

Mark the subject that should be made consistent across generations by wrapping it with <subj1> and </subj1>. The current checkpoint works best with prompts containing a single subject.

Collect the prompts in a text file, e.g., data/example1.txt, where each line is a prompt. Subjects with the same tag will be consistent across generations. For example,

A <subj1>person</subj1> harvests grapes in the vineyard.
A <subj1>person</subj1> transplants coral fragments.
A <subj2>person</subj2> grades assignments in his classroom.
A <subj2>person</subj2> leads a yoga class in the studio.
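
If the prompts start out untagged, the tags can be added programmatically. The following is a minimal sketch, not part of the repository: `tag_subject` is a hypothetical helper that wraps the first whole-word occurrence of a subject in the `<subjN>` tags described above.

```python
import re


def tag_subject(prompt: str, subject: str, subject_id: int) -> str:
    """Wrap the first whole-word occurrence of `subject` in <subjN>...</subjN>.

    Hypothetical helper for preparing prompt files; it is not provided by
    the CoCoIns repository.
    """
    pattern = re.compile(rf"\b{re.escape(subject)}\b")
    # A callable replacement avoids backslash-escape processing of the subject text.
    tagged, count = pattern.subn(
        lambda m: f"<subj{subject_id}>{m.group(0)}</subj{subject_id}>",
        prompt,
        count=1,
    )
    if count == 0:
        raise ValueError(f"{subject!r} not found in prompt: {prompt!r}")
    return tagged


# Two prompts sharing subject 1 and one prompt using subject 2.
prompts = [
    tag_subject("A person harvests grapes in the vineyard.", "person", 1),
    tag_subject("A person transplants coral fragments.", "person", 1),
    tag_subject("A person leads a yoga class in the studio.", "person", 2),
]
prompt_file_text = "\n".join(prompts) + "\n"  # one prompt per line
```

Writing `prompt_file_text` to a file under data/ gives an input in the one-prompt-per-line format shown above.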

Test Datasets

The data/ directory also contains the datasets evaluated in the paper:

  • portraits.txt contains 1,000 prompts featuring a subject looking at the camera. This dataset evaluates face similarity with clear frontal faces.
  • scenes.txt contains 1,000 free-form prompts generated by a large language model. Face poses and angles vary, so this dataset evaluates real-world performance.

Inference

Generation

python test.py --prompts data/example1.txt --checkpoint checkpoint --output output --max-subject-id 2

Set --max-subject-id to the maximum subject ID in your input prompts.
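
For longer prompt files, the value can be derived rather than counted by hand. This is a small sketch, assuming tags follow the `<subjN>` form shown in Input Prompts; `max_subject_id` is a hypothetical helper and not part of test.py.

```python
import re

# Matches opening tags such as <subj1> or <subj12> and captures the numeric ID.
SUBJ_OPEN = re.compile(r"<subj(\d+)>")


def max_subject_id(lines) -> int:
    """Return the largest subject ID found in an iterable of prompt lines."""
    ids = [int(n) for line in lines for n in SUBJ_OPEN.findall(line)]
    if not ids:
        raise ValueError("no <subjN> tags found in the prompts")
    return max(ids)


example_lines = [
    "A <subj1>person</subj1> harvests grapes in the vineyard.",
    "A <subj2>person</subj2> leads a yoga class in the studio.",
]
example_max = max_subject_id(example_lines)
```

Because the function accepts any iterable of strings, an open file handle for a prompt file such as data/example1.txt can be passed directly.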

The output images and subject codes will be saved in output/. You can then load the saved codes to generate the same subjects in new contexts:

python test.py --prompts data/example2.txt --checkpoint checkpoint --output output2 --codes output/codes.pt --max-subject-id 2

Evaluation

You can evaluate subject consistency across generations using the --prompt-per-code option. For example, portraits.txt and scenes.txt contain only <subj1>. With --prompt-per-code 5, the script samples a new code for subject 1 every five prompts, so the subjects in samples 1-5 are consistent with each other, those in samples 6-10 are consistent with each other, and so on.

python test.py --prompts data/scenes.txt --checkpoint checkpoint --output output --prompt-per-code 5 --max-subject-id 1
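
The grouping described above can be sketched as integer division of the prompt index. This illustrates the documented behavior only; it is an assumption about how test.py assigns codes, and `code_groups` is a hypothetical name.

```python
def code_groups(num_prompts: int, prompts_per_code: int) -> list[int]:
    """Map each 0-based prompt index to the index of the code it would share.

    Prompts with the same group index are expected to show the same subject.
    """
    if prompts_per_code < 1:
        raise ValueError("prompts_per_code must be at least 1")
    return [i // prompts_per_code for i in range(num_prompts)]


# With 5 prompts per code, prompts 1-5 share group 0, prompts 6-10 share group 1, ...
groups = code_groups(12, 5)
```

When computing a consistency metric, comparing images only within the same group index matches this grouping.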

Citation

@article{hsin-ying2025cocoins,
    title={CoCoIns: Consistent Subject Generation via Contrastive Instantiated Concepts},
    author={Lee Hsin-Ying and Kelvin C.K. Chan and Ming-Hsuan Yang},
    journal={Transactions on Machine Learning Research},
    year={2025},
}