MultiMolecule

May 31, 2026

Tip

Accelerate Molecular Biology Research with Machine Learning.

MultiMolecule is a one-stop ecosystem for molecular machine learning. It connects datasets, model implementations, reusable dataset and neural-network modules, the DanLing-based runner for training and evaluation, and task-oriented inference pipelines for RNA, DNA, and protein workflows.

Get Started

Install the latest stable release from PyPI:

pip install multimolecule

Run a registered pipeline through the Hugging Face transformers interface:

import multimolecule  # registers MultiMolecule models and pipelines
from transformers import pipeline

predictor = pipeline("rna-secondary-structure", model="multimolecule/ernierna-ss")
result = predictor("AUCAGCCUUCGUUCUGUAAACGG")
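The pipeline call above takes a plain RNA string. Sequences copied from other sources are sometimes lowercase, wrapped across lines, or written with the DNA letter T instead of U. The following is a minimal sketch of a pre-check you could run before calling the pipeline. `normalize_rna` and `RNA_ALPHABET` are hypothetical names used for illustration and are not part of MultiMolecule; whether the registered tokenizer already performs this normalization is not stated here.

```python
# Hypothetical helper, not part of MultiMolecule: tidy an RNA string
# before handing it to a pipeline.
RNA_ALPHABET = frozenset("ACGUN")  # assumption: plain bases plus N for unknown


def normalize_rna(sequence: str) -> str:
    """Remove whitespace, uppercase, map T to U, and reject other characters."""
    cleaned = "".join(sequence.split()).upper().replace("T", "U")
    if not cleaned:
        raise ValueError("empty sequence")
    unexpected = sorted(set(cleaned) - RNA_ALPHABET)
    if unexpected:
        raise ValueError(f"unexpected characters: {''.join(unexpected)}")
    return cleaned


print(normalize_rna("aucagccttcg uucug"))  # AUCAGCCUUCGUUCUG
```

The cleaned string can then be passed to `predictor(...)` exactly as in the example above.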

Load models directly when you need lower-level control:

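import multimolecule

# from_pretrained downloads the checkpoint on first use and reads it from the local cache afterwards
model = multimolecule.AutoModelForSequencePrediction.from_pretrained("multimolecule/basset")
# load the tokenizer from the same checkpoint so its vocabulary matches the model
tokenizer = multimolecule.AutoTokenizer.from_pretrained("multimolecule/basset")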

Install the latest source version when you need unreleased changes:

pip install git+https://github.com/DLS5-Omics/MultiMolecule

Explore

| Entry point | Use it for |
| --- | --- |
| data | Task-aware datasets, data loading, and multi-task sampling. |
| datasets | Curated biomolecular datasets and task metadata. |
| io | FASTA, DBN, BPSEQ, and bpRNA ST readers and writers. |
| models | Model cards and API references for supported architectures. |
| tokenisers | DNA, RNA, protein, and dot-bracket tokenisers. |
| pipelines | Task-focused inference workflows for supported biological tasks. |
| runner | Training, evaluation, and inference configuration. |
| modules | Reusable neural-network building blocks. |
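
For readers unfamiliar with the DBN and dot-bracket formats named in the table, here is a minimal pure-Python sketch of what such a record holds. It deliberately does not use the MultiMolecule `io` API, whose function names are not shown in this README. `parse_dbn` and `dot_bracket_to_pairs` are hypothetical names, the three-line record layout (header, sequence, structure) is an assumption about the common form of the format, and only `(`, `)`, and `.` are handled, so pseudoknot brackets such as `[` and `]` are rejected.

```python
# Illustrative sketch only; not the MultiMolecule io implementation.


def parse_dbn(text: str) -> tuple[str, str, str]:
    """Parse one record laid out as '>name', sequence, dot-bracket structure."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 3 or not lines[0].startswith(">"):
        raise ValueError("expected a header line, a sequence line, and a structure line")
    name, sequence, structure = lines[0][1:].strip(), lines[1], lines[2]
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return name, sequence, structure


def dot_bracket_to_pairs(structure: str) -> list[tuple[int, int]]:
    """Return 0-based (i, j) base pairs encoded by a dot-bracket string."""
    stack: list[int] = []  # positions of '(' still waiting for a partner
    pairs: list[tuple[int, int]] = []
    for position, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {position}")
            pairs.append((stack.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {position}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


record = ">hairpin\nGGGAAACCC\n(((...)))\n"
name, sequence, structure = parse_dbn(record)
print(name, dot_bracket_to_pairs(structure))  # hairpin [(0, 8), (1, 7), (2, 6)]
```

A stack is enough here because nested brackets pair the most recent unmatched `(` with the next `)`; crossing pairs would need one stack per bracket type.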

Community

  • Discourse: release announcements, usage questions, model requests, RFCs, and community discussion.
  • GitHub Issues: reproducible bugs, API issues, and implementation-tracked feature requests.
  • Hugging Face: released checkpoints, datasets, and demo Spaces.

Citation

Note

The artifacts distributed in this repository are part of the MultiMolecule project. If MultiMolecule supports your research, please cite the MultiMolecule project as follows:

@software{chen_2024_12638419,
  author    = {Chen, Zhiyuan and Zhu, Sophia Y.},
  title     = {MultiMolecule},
  doi       = {10.5281/zenodo.12638419},
  publisher = {Zenodo},
  url       = {https://doi.org/10.5281/zenodo.12638419},
  year      = 2024,
  month     = may,
  day       = 4
}

License

We believe openness is the Foundation of Research.

MultiMolecule is licensed under the GNU Affero General Public License.

For additional terms and clarifications, please refer to our License FAQ.

Please join us in building an open research community.

SPDX-License-Identifier: AGPL-3.0-or-later