TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation

July 30, 2025 · View on GitHub

Yinda Chen¹,²*, Haoyuan Shi¹,²*, Xiaoyu Liu¹, Te Shi², Ruobing Zhang³,², Dong Liu¹, Zhiwei Xiong¹,²†, Feng Wu¹,²‡

¹MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
²Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
³Institute for Brain and Intelligence, Fudan University

*Equal Contribution †Project Leader ‡Corresponding Author

This repository contains the official implementation of the paper TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation. It includes experimental settings, source code, and theoretical proofs. For details, please refer to the original paper.

Pipeline of our proposed methods

Network details of our proposed methods

📰 News

[2025.06] 🎉 TokenUnify was accepted by ICCV 2025, looking forward to meeting you in Hawaii.
[2025.06] 📊 MEC dataset released! Wafer (MEC) dataset available on HuggingFace.
[2025.06] 🔧 Pre-trained weights updated! Robust initialization weights (pre-trained, not fine-tuned) available in the Pretrained_weights folder on HuggingFace.
[2024.12] 🎉 Code and pre-training dataset released! Core implementation and pre-training weights released.
[2024.12] 📊 Datasets released! Pre-training dataset available on HuggingFace.
[2024.05] 📝 Paper released! TokenUnify paper published on arXiv.

🚀 Overview

TokenUnify introduces a novel autoregressive visual pre-training method for neuron segmentation from electron microscopy (EM) volumes. The method tackles the unique challenges of EM data including high noise levels, anisotropic voxel dimensions, and ultra-long spatial dependencies through hierarchical predictive coding that combines three complementary prediction tasks:

Random Token Prediction: Captures noise-robust spatial patterns and learns position-invariant local feature detectors.
Next Token Prediction: Maintains sequential dependencies and captures critical transitional patterns in neuronal morphology.
Next-All Token Prediction: Models global context and long-range correlations while mitigating cumulative errors in autoregression.

Leveraging the Mamba architecture's linear-time sequence modeling capabilities, TokenUnify achieves 44% improvement in neuron segmentation performance compared to training from scratch and 25% improvement over MAE, while demonstrating superior scaling properties and reducing autoregressive error accumulation from O(K) to O(√K) for sequences of length K.

🛠️ Environment Setup

Set up the environment using our Docker image:

sudo docker pull registry.cn-hangzhou.aliyuncs.com/mybitahub/large_model:mamba0224_ydchen

📦 Dataset Download

Datasets for pre-training and segmentation:

Dataset Type	Dataset Name	Description	URL
Pre-training Dataset	Large EM Datasets	Various brain regions for pre-training	🤗 EM Pretrain Dataset
Segmentation Dataset	Wafer (MEC)	High-resolution neuron segmentation	🤗 Wafer_EM Dataset
Segmentation Dataset	CREMI Dataset	Circuit reconstruction challenge	CREMI Dataset
Segmentation Dataset	AC3/AC4	Mouse brain cortex dataset	Google Drive

🏋️ Model Weights

Pre-trained (robust initialization, not fine-tuned) TokenUnify weights are available in the Pretrained_weights folder:

Model	Parameters	Dataset	URL
TokenUnify_pretrained-100M	100M	EM Multi-dataset	🤗 Pretrained_weights
TokenUnify_pretrained-200M	200M	EM Multi-dataset	🤗 Pretrained_weights
TokenUnify_pretrained-500M	500M	EM Multi-dataset	🤗 Pretrained_weights
TokenUnify_pretrained-1B	1B	EM Multi-dataset	🤗 Pretrained_weights

Fine-tuned weights are also available:

Model	Parameters	Dataset	URL
TokenUnify-100M	100M	EM Multi-dataset	🤗 HuggingFace
TokenUnify-200M	200M	EM Multi-dataset	🤗 HuggingFace
TokenUnify-500M	500M	EM Multi-dataset	🤗 HuggingFace
TokenUnify-1B	1B	EM Multi-dataset	🤗 HuggingFace
superhuman	-	EM Multi-dataset	🤗 HuggingFace

🔥 Usage Guide

1. Pre-training (8 nodes)

bash src/run_mamba_mae_AR.sh

2. Pre-training (32 nodes - Large scale)

bash src/launch_huge.sh

3. Fine-tuning

bash src/run_mamba_seg.sh

Hierarchical Predictive Coding Framework: We introduce a unified framework that integrates three distinct visual structure perspectives within a coherent information-theoretic formulation, providing optimal coverage of visual data structure while reducing autoregressive error accumulation from O(K) to O(√K).
Large-Scale EM Dataset: We construct one of the largest EM neuron segmentation datasets with 1.2 billion finely annotated voxels across six functional brain regions, providing an ideal testbed for long-sequence visual modeling.
Billion-Parameter Mamba Network: We achieve the first billion-parameter Mamba network for visual autoregression, demonstrating both effectiveness and computational efficiency in processing long-sequence visual data with favorable scaling properties.

📄 License

Usage Notes

Before the public release of the data, the following usage restrictions must be met:

Non-commercial Use: Users do not have the rights to copy, distribute, publish, or use the data for commercial purposes or develop and produce products. Any format or copy of the data is considered the same as the original data. Users may modify the content and convert the data format as needed but are not allowed to publish or provide services using the modified or converted data without permission.
Research Purposes Only: Users guarantee that the authorized data will only be used for their own research and will not share the data with third parties in any form.
Citation Requirements: Research results based on the authorized data, including books, articles, conference papers, theses, policy reports, and other publications, must cite the data source according to citation norms, including the authors and the publisher of the data.
Prohibition of Profit-making Activities: Users are not allowed to use the authorized data for any profit-making activities.
Termination of Data Use: Users must terminate all use of the data and destroy the data (e.g., completely delete from computer hard drives and storage devices/spaces) upon leaving their team or organization or when the authorization is revoked by the copyright holder.

Data Information

Sample Source: Mouse MEC MultiBeam-SEM, Intelligent Institute Brain Imaging Platform (Wafer 4 at layer VI, wafer 25, wafer 26, and wafer 36 at layer II/III)
Resolution: 8nm x 8nm x 35nm
Volume Size: 1250 x 1250 x 125
Annotation Completion Dates: 2023.12.11 (w4), 2024.04.12 (w36)
Authors: Yinda Chen, Haoyuan Shi, Xiaoyu Liu, Te Shi, Ruobing Zhang, Dong Liu, Zhiwei Xiong, Feng Wu
Copyright Holder: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

Acknowledgment Norms

Chinese Name: 合肥人工智能研究院
English Name: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

✅ To-Do List

📝 Open-source core code
📖 Write README for code usage
🗂️ Open-source pre-training dataset
⚖️ Upload pre-trained and fine-tuned weights
🧠 Release Wafer (MEC) dataset
🏆 Release evaluation scripts and benchmarks
🔧 Add support for natural image datasets

📝 Citation

If you find this code or dataset useful, please cite:

@inproceedings{chen2025tokenunify,
  title={TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation},
  author={Chen, Yinda and Shi, Haoyuan and Liu, Xiaoyu and Shi, Te and Zhang, RuoBing and Liu, Dong and Xiong, Zhiwei and Wu, Feng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}

🤝 Contributing

We welcome contributions to improve TokenUnify! Please submit issues and pull requests.

📧 Contact

For questions, contact: cyd0806@mail.ustc.edu.cn

⭐ If you find this work helpful, please give us a star! ⭐