OpenFF QCArchive Dataset Submission

June 15, 2026

Dataset Lifecycle

All datasets submitted to QCArchive via this repository conform to the Dataset Lifecycle.

See STANDARDS.md for submission standards. Datasets must be submitted as pull requests.

User Quickstart

  1. Ensure git-lfs is installed on your local machine: https://git-lfs.github.com/

  2. To submit a new dataset, begin by cloning this repository:

    export GIT_LFS_SKIP_SMUDGE=1
    git clone git@github.com:openforcefield/qca-dataset-submission.git
    

    This will clone the repo but avoid downloading existing LFS objects. If you wish to download all LFS objects, omit the export GIT_LFS_SKIP_SMUDGE=1 line.

  3. Once cloned, create and switch to a new branch from master, then create a new directory in qca-dataset-submission/submissions/:

    git -C qca-dataset-submission checkout -b <dataset-branch>
    mkdir qca-dataset-submission/submissions/YYYY-MM-DD-OpenFF-<DESCRIPTIVE-DATASET-NAME>-v1.0
    

    You will add all submission artifacts to this directory.

  4. Create and activate a new conda environment containing the basic submission-preparation requirements:

    conda env create -f qca-dataset-submission/devtools/prod-envs/qcarchive-user-submit.yaml
    conda activate qcarchive-user-submit
    

    You may also need to install OpenEye:
    conda install -c openeye openeye-toolkits

  5. Choose a starting notebook and README based on the type of dataset you wish to submit:

    Copy the notebook and README for the dataset you want into the directory you created.

    cp examples/<dataset-type>/* qca-dataset-submission/submissions/YYYY-MM-DD-OpenFF-<DESCRIPTIVE-DATASET-NAME>-v1.0
    
  6. Start up a Jupyter notebook with your new notebook:

    jupyter notebook qca-dataset-submission/submissions/YYYY-MM-DD-OpenFF-<DESCRIPTIVE-DATASET-NAME>-v1.0/generate-dataset.ipynb
    

    Edit the contents with appropriate metadata, read in your molecules using the cells appropriate for your input data, and make any other modifications your dataset requires.

  7. Copy the generated metadata components into the README. Write a reasonably detailed high-level description of the submission at the top.

  8. Commit the following files in the submission directory you made:

    • your input files; please compress them if possible with e.g. bzip2
    • generate-dataset.ipynb
    • dataset.pdf
    • dataset.smi
    • dataset.json.bz2
  9. Push your branch to GitHub:

    git -C qca-dataset-submission push origin <dataset-branch>
    
  10. Make a new PR for the branch. Validation will run automatically on your dataset.json.* file, indicating any potential issues prior to submission. Ask for help if you see validation failures you do not understand. Ping a reviewer in the comments.

  11. Once reviewed and approved, your submission will be merged and submitted to QCArchive! Computations specified by the submission will be performed on OpenFF-managed compute resources.
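Before committing, it can help to check the submission directory against the quickstart above. The following is a minimal sketch, not part of the repository's tooling: the function name `check_submission_dir` is hypothetical, and the name pattern is an assumption derived only from the `YYYY-MM-DD-OpenFF-<DESCRIPTIVE-DATASET-NAME>-v1.0` template in step 3 (existing folders in this repository do not all follow it).

```python
import re
from pathlib import Path

# Files that step 8 of the quickstart asks you to commit.
# Input files are not listed because their names vary per submission.
REQUIRED_FILES = (
    "generate-dataset.ipynb",
    "dataset.pdf",
    "dataset.smi",
    "dataset.json.bz2",
)

# Assumed pattern: YYYY-MM-DD-OpenFF-<DESCRIPTIVE-DATASET-NAME>-v<major>.<minor>
NAME_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}-OpenFF-[A-Za-z0-9-]+-v\d+\.\d+$")


def check_submission_dir(path):
    """Return a list of problems found; an empty list means nothing was flagged."""
    path = Path(path)
    problems = []
    if not NAME_PATTERN.match(path.name):
        problems.append(
            f"directory name {path.name!r} does not match the quickstart pattern"
        )
    for name in REQUIRED_FILES:
        if not (path / name).is_file():
            problems.append(f"missing {name}")
    return problems
```

Calling `check_submission_dir("qca-dataset-submission/submissions/<your-directory>")` returns the list of problems. This check is independent of the validation that runs automatically on the PR and does not replace it.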

Creating a compute expansion

If you have already computed a dataset but want to re-compute it with a new QCSpec (e.g. a new level of theory), you can do so using a compute expansion. This is faster than creating a new dataset, and explicitly links datasets with the same molecules and purpose. A compute expansion involves adding a file called compute.json to your original submission, which contains the dataset metadata (identical to the original dataset) and the new compute spec. This can be done manually or programmatically. The programmatic route is described below, with an example of the notebook and of the file.

  1. Create a new branch as described above, and navigate to the submission directory of the dataset you want to expand.
  2. Create a new Jupyter notebook called generate-compute.ipynb (example here).
  3. In the notebook, either download the original dataset and remove the molecules and original QCSpec, or re-create the dataset with the same name as the original and skip the molecule addition step.
    • See below for details about how changes to the dataset are propagated; note that the dataset name must be the same, and changes to any metadata except the compute tag and the QCSpec will be ignored when submitting the compute expansion.
    • Please note that the default compute_tag is openff; if you need to use a different one, please add it explicitly to the dataset at this step, as the compute.json file overrides the compute tag added manually to the PR. If you do need to change the compute tag after submission, update the label on the PR; the change will take effect the next time the error cycling action runs.
  4. Add the new QCSpec to the dataset, and save the dataset to compute.json (example here).
  5. Add the additional compute spec to the submission's README.md file.
  6. Add the generate-compute.ipynb and compute.json files to the submission's QCSubmit Manifest entry in the README.md file.
  7. Proof the submission and open a PR. Dataset validation will run automatically.
  8. Once the dataset is validated, request a review. Once approved, your compute expansion will be submitted!
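The rule that only the compute tag and the QCSpec are honored can be checked before opening the PR. The sketch below compares two plain metadata dictionaries and reports which changed fields would be ignored. The function name and the key names (`dataset_name`, `compute_tag`, `qc_specifications`) are assumptions for illustration; they are not a confirmed description of the compute.json schema.

```python
# Key names are assumptions for illustration, not a confirmed QCSubmit schema.
NAME_KEY = "dataset_name"
HONORED_KEYS = {"compute_tag", "qc_specifications"}


def review_compute_expansion(original, expansion):
    """Return the metadata keys whose changes would be ignored on submission.

    Raises ValueError if the dataset names differ, since a compute expansion
    must reuse the original dataset name.
    """
    if original[NAME_KEY] != expansion[NAME_KEY]:
        raise ValueError("dataset name must match the original submission")
    ignored = []
    for key in sorted(set(original) | set(expansion)):
        if key == NAME_KEY or key in HONORED_KEYS:
            continue  # the name is identical; honored keys are allowed to change
        if original.get(key) != expansion.get(key):
            ignored.append(key)
    return ignored
```

A non-empty result is a hint that an edit (for example to the description) will have no effect, which is usually a sign that the change belongs in a new dataset version instead.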

When the PR is merged, the following happens:

  • CI checks for compute*.json*, so files can be called anything so long as they follow that pattern.

  • This gets loaded into a QCSubmit dataset structure in CI (see lifecycle.py, SubmittableBase) and submitted to MolSSI with openff.qcsubmit.datasets.datasets._BaseDataset.submit()

  • submit() checks if the dataset already exists using only the dataset type and name. Changes in descriptions, other metadata, etc. don't affect anything. New/different molecules will also be ignored if the dataset name already exists.

  • submit() adds the specifications

  • submit() submits with the compute_tag and priority within the new compute.json.

  • Other info in the dataset, such as dataset_tags, is not incorporated into additional compute submissions, so changing it will not affect the dataset.
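The merge-time behaviour described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the code that CI or QCSubmit runs: `is_compute_file` simply applies the `compute*.json*` glob, and `ToyArchive` is a hypothetical in-memory stand-in for the server that mimics the "look up by type and name, then add specifications" rule.

```python
from fnmatch import fnmatchcase


def is_compute_file(filename):
    """True if the name follows the compute*.json* pattern that CI looks for."""
    return fnmatchcase(filename, "compute*.json*")


class ToyArchive:
    """Hypothetical in-memory stand-in for the server.

    Datasets are keyed by (dataset type, dataset name) only, mirroring the
    existence check described for submit().
    """

    def __init__(self):
        self.datasets = {}

    def submit(self, dataset_type, name, molecules, specs):
        key = (dataset_type, name)
        if key not in self.datasets:
            # First submission: molecules are stored with the dataset.
            self.datasets[key] = {"molecules": list(molecules), "specs": set()}
        # For an existing dataset the molecules argument is ignored;
        # only the specifications are added.
        self.datasets[key]["specs"].update(specs)
        return self.datasets[key]
```

In this model, submitting a second time with the same type and name but different molecules leaves the stored molecules unchanged and only grows the set of specifications.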

The Lifecycle of a Dataset Submission

All Open Force Field datasets submitted to QCArchive undergo a well-defined lifecycle.

Dataset Lifecycle

Each labeled rectangle in the lifecycle represents a state. A submission PR changes state according to the arrows. Changes in state may be performed by automation or manually by a human when certain criteria are met.

The lifecycle process is described below, with [bracketed] items indicating the agent of action, one of:

  • [GHA]: GitHub Actions
  • [Board]: GitHub Project Board
  • [Human]: A maintainer of the qca-dataset-submission repository.
  1. A PR is created against qca-dataset-submission by a submitter.

    • the template is filled out with informational sections according to the PR template
    • [GHA] validation operates on all dataset*.json files found in the PR; performs validation checks
      • comment made based on validation checks
      • reruns on each push
  2. Add card for the PR to Dataset Tracking board.

  3. When the submission is ready to be submitted to public QCArchive (validations pass, submitters and reviewers satisfied), PR is merged.

    • [Board] PR card will move to state "Queued for Submission" immediately.

    • [GHA] lifecycle-backlog will move PR card to state "Queued for Submission" if merged and in state "Backlog"

    • [GHA] lifecycle-submission will attempt to submit the dataset

      • if successful, will move card to state "Error Cycling"; add comment to PR
      • if failed, will keep card queued; add comment to PR; attempt again next execution
    • [Human] Submit worker jobs on a server to begin compute. If using Nautilus, carefully monitor utilization and scale down resources as jobs finish.

  4. Counts of COMPLETE, INCOMPLETE, and ERROR records are reported for Optimizations and TorsionDrives.

  5. PR will remain in state "Error Cycling" until moved to "Requires Scientific Review" or until all tasks COMPLETE

    • [Human] if errors appear persistent, move to state "Requires Scientific Review"
    • discussion should be had on PR for next version
    • [Human] once decided, state moved to "End of Life"
    • [Human] ensure all worker jobs have been shut down.
  6. [GHA] lifecycle-end-of-life will add tag 'end-of-life' to dataset in QCArchive for PR in "End of Life"

  7. [GHA] lifecycle-archived-complete will add tag 'archived-complete' to dataset in QCArchive for PR in "Archived/Complete"
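The state changes listed above can be summarized as a small transition table. This is a reading aid only, not the automation itself: the event names are hypothetical, and the transition from "Error Cycling" to "Archived/Complete" once all tasks are COMPLETE is an assumption inferred from steps 5 and 7.

```python
# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    ("Backlog", "merged"): "Queued for Submission",
    ("Queued for Submission", "submit_succeeded"): "Error Cycling",
    ("Queued for Submission", "submit_failed"): "Queued for Submission",
    ("Error Cycling", "errors_persistent"): "Requires Scientific Review",
    # Assumption: a fully COMPLETE dataset moves to "Archived/Complete".
    ("Error Cycling", "all_complete"): "Archived/Complete",
    ("Requires Scientific Review", "decided"): "End of Life",
}


def advance(state, event):
    """Return the next state; events with no listed transition leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="Backlog"):
    """Apply a sequence of events and return every state visited, in order."""
    visited = [state]
    for event in events:
        state = advance(state, event)
        visited.append(state)
    return visited
```

For example, `run(["merged", "submit_failed", "submit_succeeded", "all_complete"])` walks a card from "Backlog" through a failed and then a successful submission attempt to "Archived/Complete".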

Management Touchpoints

In addition to the states given above, there are additional touchpoints available for managing dataset submissions:

  1. The tracking label is the "on/off" switch for automation via GitHub Actions. To disable all automation on a submission PR, remove this label. To enable automation, add the label.

  2. Submission priority can be changed by adding one of the following labels:

    • priority-high: highest priority
    • priority-normal: normal priority
    • priority-low: lowest priority
  3. Submission routing to QCFractal managers on different compute resources can be accomplished with compute tags. Add a label like compute-<tagname> to set the compute tag for all QCArchive tasks associated with a submission. Be sure to coordinate with QCFractal manager admins to ensure your chosen compute tag is being served on the expected resources. This mechanism can also be used to "dead-letter" computations that are no longer desired by setting a compute tag that no manager will service.

  4. The order of a submission PR in a Dataset Tracking column matters. Submissions higher in a column will be operated on first by all GitHub Actions automation. For example, if you want to error cycle a submission before any others so it has a higher chance of being pulled by idle manager workers, place it at the top of the Error Cycling column.
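The label conventions above (tracking, priority-*, compute-<tagname>) can be read mechanically. The sketch below shows one way to interpret a PR's labels. The function name and the returned dictionary layout are hypothetical, and the choice that a later label overrides an earlier one of the same kind is an assumption, since the text does not say how conflicting labels are resolved.

```python
PRIORITY_LABELS = {
    "priority-high": "high",
    "priority-normal": "normal",
    "priority-low": "low",
}
COMPUTE_PREFIX = "compute-"


def parse_labels(labels):
    """Interpret PR labels; None means the corresponding label was absent."""
    settings = {"automation": False, "priority": None, "compute_tag": None}
    for label in labels:
        if label == "tracking":
            settings["automation"] = True
        elif label in PRIORITY_LABELS:
            # Assumption: if several priority labels are present, the last wins.
            settings["priority"] = PRIORITY_LABELS[label]
        elif label.startswith(COMPUTE_PREFIX) and len(label) > len(COMPUTE_PREFIX):
            settings["compute_tag"] = label[len(COMPUTE_PREFIX):]
    return settings
```

Labels that match none of the three conventions are ignored, so unrelated labels on the PR do not affect the result.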

Dude where's my Dataset?

Finding the source of a dataset in QCArchive can be difficult. Here we offer a mapping between each dataset in QCArchive and the folder that contains its inputs, along with a quick overview of some metadata and the status of the dataset. New datasets submitted using QCSubmit record where they were created: their metadata includes a long_description_url that points directly to their home folder in this repository.

Status

The status only refers to the default specification which is required for all of our datasets. Currently this is B3LYP-D3BJ/DZVP.

Key:

Complete: 100% of the default spec jobs have completed.

Error: some jobs in the dataset contain errors that may prevent them from finishing, for example a linear torsiondrive.

Running: the dataset is currently running and may have some incomplete jobs.
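As a reading aid, the key can be expressed as a function of the COMPLETE, INCOMPLETE, and ERROR counts for the default specification. This is a sketch only: the function name is hypothetical, and reporting "Error" ahead of "Running" when both errored and incomplete jobs exist is an assumption, since the key does not state a precedence.

```python
def dataset_status(complete, incomplete, error):
    """Map default-spec job counts to one of the three status labels."""
    if min(complete, incomplete, error) < 0:
        raise ValueError("counts must be non-negative")
    if complete + incomplete + error == 0:
        raise ValueError("dataset has no jobs for the default spec")
    if error > 0:
        # Assumption: errors take precedence over incomplete jobs.
        return "Error"
    if incomplete > 0:
        return "Running"
    return "Complete"
```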

Forcefield Release Datasets

| Forcefield | Repository | Datasets | Elements | Zenodo |
| --- | --- | --- | --- | --- |
| Release OpenFF 2.0.0 Sage | OpenFF Sage 2.0.0 | 2025-05-29-OpenFF-SMIRNOFF-Sage-2.0.0 | H, C, N, O, S, P, F, Cl, Br, I | QC Fitting Datasets for OpenFF SMIRNOFF Sage 2.0.0 |
| Release OpenFF 2.1.0 Sage | OpenFF Sage 2.1.0 | 2025-05-22-OpenFF-SMIRNOFF-Sage-2.1.0 | H, C, N, O, S, P, F, Cl, Br, I | QC Fitting Datasets for OpenFF SMIRNOFF Sage 2.1.0 |
| Release OpenFF 2.2.0 Sage | OpenFF Sage 2.2.0 | 2025-05-23-OpenFF-SMIRNOFF-Sage-2.2.0 | H, C, N, O, S, P, F, Cl, Br, I | QC Fitting Datasets for OpenFF SMIRNOFF Sage 2.2.0 |
| Release OpenFF 2.3.0 Sage | OpenFF Sage 2.3.0 | 2026-01-27-OpenFF-SMIRNOFF-Sage-2.3.0 | H, C, N, O, S, P, F, Cl, Br, I | QC Fitting Datasets for OpenFF SMIRNOFF Sage 2.3.0 |

Basic Datasets

These are currently used to compute properties of a minimum energy conformation (Hessians, wavefunctions, etc.), usually derived from completed optimization datasets.

| QCArchive Dataset | Folder | Description | Elements | Status |
| --- | --- | --- | --- | --- |
| OpenFF Optimization Set 1 | 2019-07-09-OpenFF-Optimization-Set | Hessian calculations. | Cl, S, C, F, O, H, N | Complete |
| OpenFF NCI250K Boron 1 | 2019-07-05 OpenFF NCI250K Boron 1 | Hessian calculations. | Cl, Br, S, C, F, B, O, H, N | Complete |
| OpenFF Discrepancy Benchmark 1 | 2019-07-05 eMolecules force field discrepancies 1 | Hessian calculation. | Cl, Br, S, C, F, P, I, O, H, N | Error |
| OpenFF Gen 2 Opt Set 1 Roche | 2020-03-20-OpenFF-Gen-2-Optimization-Set-1-Roche | Hessian calculation. | Cl, S, C, F, O, H, N | Complete |
| OpenFF Gen 2 Opt Set 2 Coverage | 2020-03-20-OpenFF-Gen-2-Optimization-Set-2-Coverage | Hessian calculations. | Cl, Br, S, C, F, P, I, O, H, N | Error |
| OpenFF Gen 2 Opt Set 3 Pfizer Discrepancy | 2020-03-20-OpenFF-Gen-2-Optimization-Set-3-Pfizer-Discrepancy | Hessian calculations. | Cl, F, C, S, O, H, N | Complete |
| OpenFF Gen 2 Opt Set 4 eMolecules Discrepancy | 2020-03-20-OpenFF-Gen-2-Optimization-Set-4-eMolecules-Discrepancy | Hessian calculations. | Cl, Br, S, C, F, P, I, O, H, N | Complete |
| OpenFF Gen 2 Opt Set 5 Bayer | 2020-03-20-OpenFF-Gen-2-Optimization-Set-5-Bayer | Hessian calculations. | Si, Cl, Br, F, C, S, O, H, N | Error |
| OpenFF VEHICLe Set 1 | 2019-07-02 VEHICLe optimization dataset | Hessian calculations. | S, C, O, H, N | Error |
| SMIRNOFF Coverage Set 1 | 2019-06-25-smirnoff99Frost-coverage | Hessian calculations. | Cl, Br, S, C, F, P, I, O, H, N | Error |
| OpenFF ESP Fragment Conformers v1.0 | 2022-01-16-OpenFF-ESP-Fragment-Conformers-v1.0 | ESP Calculations | N, Cl, C, H, P, Br, O, F, S | Running |
| OpenFF Theory Benchmarking Single Point Energies v1.0 | 2021-09-06-theory-bm-single-points | Single Point Energy dataset for the final optimized geometries from MP2/heavy-aug-cc-pVTZ torsiondrives. | Cl, F, C, S, O, H, N, P | Running |
| TorsionNet500 Single Points Dataset v1.0 | 2021-11-09-TorsionNet500-single-points | Single point energies of final geometries of TorsionNet500 dataset. | H, O, F, S, N, Cl, C | Running |
| SPICE DES Monomers Single Points Dataset v1.1 | 2021-11-15-QMDataset-DES-monomers-single-points | Single point energy calculation of DES monomers. | I, C, Br, P, Cl, H, S, O, F, N | Complete |
| SPICE Solvated Amino Acids Single Points Dataset v1.1 | 2021-11-08-QMDataset-Solvated-Amino-Acids-single-points | Single point energy calculation of solvated amino acids. | N, S, O, C, H | Complete |
| SPICE DES370K Single Points Dataset v1.0 | 2021-11-08-QMDataset-DES370K-single-points | SPICE single point dataset for ML applications. | N, O, Mg, H, F, K, Br, Na, P, Cl, I, Ca, S, Li, C | Complete |
| SPICE DES370K Single Points Dataset Supplement v1.0 | 2022-02-18-QMDataset-DES370K-single-points-supplement | SPICE single point dataset for ML applications. | F, H, Cl, S, I, Br, N, Li, O, C, Na | Running |
| SPICE Dipeptides Single Points Dataset v1.2 | 2021-11-08-QMDataset-Dipeptide-single-points | SPICE single point dataset for ML applications. | C, N, O, H, S | Complete |
| SPICE PubChem Set 1 Single Points Dataset v1.2 | 2021-11-08-QMDataset-pubchem-set1-single-points | SPICE single point dataset for ML applications. | O, Cl, N, C, P, Br, S, F, I, H | Running |
| SPICE PubChem Set 2 Single Points Dataset v1.2 | 2021-11-09-QMDataset-pubchem-set2-single-points | SPICE single point dataset for ML applications. | H, P, C, Cl, Br, N, F, S, O, I | Running |
| SPICE PubChem Set 3 Single Points Dataset v1.2 | 2021-11-09-QMDataset-pubchem-set3-single-points | SPICE single point dataset for ML applications. | N, C, S, Cl, Br, F, P, I, H, O | Running |
| SPICE PubChem Set 4 Single Points Dataset v1.2 | 2021-11-09-QMDataset-pubchem-set4-single-points | SPICE single point dataset for ML applications. | N, S, Br, O, C, F, H, I, Cl, P | Running |
| SPICE PubChem Set 5 Single Points Dataset v1.2 | 2021-11-09-QMDataset-pubchem-set5-single-points | SPICE single point dataset for ML applications. | F, H, S, Br, Cl, N, P, C, I, O | Running |
| SPICE PubChem Set 6 Single Points Dataset v1.2 | 2021-11-09-QMDataset-pubchem-set6-single-points | SPICE single point dataset for ML applications. | Cl, O, N, H, C, P, S, F, Br, I | Running |
| OpenFF ESP Industry Benchmark Set v1.1 | 2022-02-02-OpenFF-ESP-Industry-Benchmark-Set-v1.1-single-point | HF/6-31G* conformers of public industry benchmark molecules. | N, F, Cl, C, H, O, Br, P, S | Running |
| SPICE Ion Pairs Single Points Dataset v1.1 | 2022-06-08-QMDataset-ion-pairs | SPICE single point dataset for ML applications. | F, Cl, Li, Na, Br, K, I | Running |
| RNA Single Point Dataset v1.0 | 2022-07-07-RNA-basepair-triplebase-single-points | RNA single point dataset consisting of RNA basepairs and triple bases. | P, N, O, C, H | Running |
| RNA Trinucleotide Single Point Dataset v1.0 | 2022-10-21-RNA-trinucleotide-single-points | Single point energy calculations of RNA basepairs and triple bases | O, N, C, H, P | Running |
| RNA Nucleoside Single Point Dataset v1.0 | 2023-03-09-RNA-nucleoside-single-points | Single point energy calculations of RNA nucleosides without O5' hydroxyl atom | O, N, C, H | Running |
| OpenFF multi-Br ESP Fragment Conformers v1.1 | 2023-11-30-OpenFF-multi-Br-ESP-Fragment-Conformers-v1.1-single-point | Single point ESP calculations | Br, C, F, H, N, O, P, S | |
| MLPepper RECAP Optimized Fragments v1.0 | 2024-07-26-MLPepper-RECAP-Optimized-Fragments-v1.0 | Single point property calculations for charge models | P, B, Cl, Br, C, H, I, F, O, N, Si, S | |
| OpenFF NAGL2 ESP Timing Benchmark v1.0 | 2024-09-06-OpenFF-NAGL2-ESP-Timing-Benchmark-v1.0 | Single point ESP calculations for timing/memory benchmarking | P, S, N, C, Cl, F, Br, O, H, I | |
| OpenFF NAGL2 ESP Timing Benchmark v1.1 | 2024-09-18-OpenFF-NAGL2-ESP-Timing-Benchmark-v1.1 | Single point ESP calculations for timing/memory benchmarking | P, S, N, C, Cl, F, Br, O, H, I | |
| OpenFF Sulfur Hessian Training Coverage Supplement v1.0 | 2024-09-18-OpenFF-Sulfur-Hessian-Training-Coverage-Supplement-v1.0 | Additional Hessian training data for Sage sulfur and phosphorus parameters (from 'OpenFF Sulfur Optimization Training Coverage Supplement v1.0') | O, S, C, Cl, P, N, F, Br, H | |
| OpenFF Sulfur Hessian Training Coverage Supplement v1.1 | 2024-11-08-OpenFF-Sulfur-Hessian-Training-Coverage-Supplement-v1.1 | Additional Hessian training data for Sage sulfur and phosphorus parameters (from 'OpenFF Sulfur Optimization Training Coverage Supplement v1.0') | O, S, C, Cl, P, N, F, Br, H | |
| OpenFF Aniline Para Hessian v1.0 | 2024-10-07-OpenFF-Aniline-Para-Hessian-v1.0 | Hessian single points for the final molecules in the OpenFF Aniline Para Opt v1.0 dataset | O, Cl, S, Br, H, F, N, C | |
| OpenFF Aniline Para Hessian v1.1 | 2024-11-12-OpenFF-Aniline-Para-Hessian-v1.1 | Hessian single points for the final molecules in the OpenFF Aniline Para Opt v1.0 dataset, re-generated to preserve molecule IDs between opt and basic datasets. | O, Cl, S, Br, H, F, N, C | |
| OpenFF Gen2 Hessian Dataset Protomers v1.0 | 2024-10-07-OpenFF-Gen2-Hessian-Dataset-Protomers-v1.0 | Hessian single points for the final molecules in the OpenFF Gen2 Optimization Dataset Protomers v1.0 dataset | H, C, Cl, P, F, Br, O, N, S | |
| OpenFF Gen2 Hessian Dataset Protomers v1.1 | 2024-11-12-OpenFF-Gen2-Hessian-Dataset-Protomers-v1.1 | Hessian single points for the final molecules in the OpenFF Gen2 Optimization Dataset Protomers v1.0 dataset, re-generated to preserve molecule IDs between opt and basic datasets. | H, C, Cl, P, F, Br, O, N, S | |
| MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0 | 2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0 | Set of diverse iodine containing molecules with a number of calculated electrostatic properties. | Br, Cl, S, B, O, Si, C, N, I, P, H, F | |
| OpenFF Iodine Chemistry Hessian Dataset v1.0 | 2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0 | Hessian single points for the final molecules in the OpenFF Iodine Chemistry Optimization Dataset v1.0 dataset | I, F, Br, C, Cl, O, S, N, H | |
| Curated tmQM-xtb Dataset: T=100K Dataset Restricted to Pd, Zn, Fe, Cu v0.0 | 2025-03-17-Curated-tmQM-xtb-Dataset-T=100K-Dataset-Restricted-to-Pd-Zn-Fe-Cu-v0.0 | BP86/def2-TZVP Conformers for single metal complexes with Pd, Fe, Zn, Cu, Mg, Li and charge of {-1,0,+1} | Br, C, Cl, Cu, F, Fe, H, N, O, P, Pd, S, Zn | |
| OpenFF Cresset Additional Coverage Hessian v4.0 | 2025-03-31-OpenFF-Cresset-Additional-Coverage-Hessian-v4.0 | Hessian single points for the final molecules in the OpenFF Cresset Additional Coverage Optimizations v4.0 dataset | O, C, F, S, H, N, Br, Cl | |
| OpenFF Optimization Hessians 2019-07 to 2025-03 v4.0 | 2025-04-14-OpenFF-Optimization-Hessians-2019-07-to-2025-03-v4.0 | Hessian single points for the final molecules in OpenFF optimization datasets from 2019-07 to 2025-03 | S, H, O, Br, F, N, P, Cl, I, C | |
| OpenFF CX3-CX4 singlepoints v4.0 | 2025-05-21-OpenFF-CX3-CX4-singlepoints-v4.0 | Single-points of molecules where Sage 2.2.1 torsions t17 and t18 have been driven | Br, C, Cl, F, H, I, N, O, S | |
| MLPepper RECAP Optimized Fragments v1.1 | 2025-07-01-MLPepper-RECAP-Optimized-Fragments-v1.1 | Single point property calculations for charge models, expanded to include iodine | P, B, Cl, Br, C, H, I, F, O, N, Si, S | |
| tmQM xtb Dataset T=100K low-mw high-coordinate xtb mult=1 v0.0 | 2025-08-14-tmQM-xtb-Dataset-T=100K-low-mw-high-coordinate-xtb-mult=1-v0.0 | BP86/def2-TZVP Conformers for single metal complexes with Pd, Fe, Zn, Cu, and charge of {-1,0,+1} run in xtb with a multiplicity of 1 and in DFT run with a multiplicity of 1, 3, or 5. MW <= 600 Da, generally high coordinate, and a max of 30 geometry samples | Br, C, Cl, Cu, F, Fe, H, N, O, P, Pd, S, Zn | |
| tmQM xtb Dataset T=100K low-mw high-coordinate xtb mult=3 v0.0 | 2025-09-02-tmQM-xtb-Dataset-T=100K-low-mw-high-coordinate-xtb-mult=3-v0.0 | BP86/def2-TZVP Conformers for single metal complexes with Pd, Fe, Zn, Cu, and charge of {-1,0,+1} run in xtb with a multiplicity of 3 and in DFT run with a multiplicity of 1, 3, or 5. MW <= 600 Da, generally high coordinate, and a max of 30 geometry samples | Br, C, Cl, Cu, F, Fe, H, N, O, P, Pd, S, Zn | |
| tmQM xtb Dataset T=100K low-mw high-coordinate xtb mult=5 v0.0 | 2025-09-02-tmQM-xtb-Dataset-T=100K-low-mw-high-coordinate-xtb-mult=5-v0.0 | BP86/def2-TZVP Conformers for single metal complexes with Pd, Fe, Zn, Cu, and charge of {-1,0,+1} run in xtb with a multiplicity of 5 and in DFT run with a multiplicity of 1, 3, or 5. MW <= 600 Da, generally high coordinate, and a max of 30 geometry samples | Br, C, Cl, Cu, F, Fe, H, N, O, P, Pd, S, Zn | |
| OpenFF TMC Atom Energies v0.0 | 2025-10-15-OpenFF-TMC-Atom-Energies-v0.0 | Element energies at various multiplicities, and formal charges of +1, 0, and -1. Elements include C, H, P, S, O, N, F, Cl, B, Li, Na, K, Mg, Ca, Br, Pd, Fe, Zn, Cu, Rh, Ir, Pt, Ni, Cr, Ag, Ti at the following model chemistries: B3LYP-D3BJ/DZVP, BP86/def2-TZVP, wB97M-V/def2-TZVPD, wB97M-D3BJ/def2-TZVPPD. | C, H, P, S, O, N, F, Cl, B, Li, Na, K, Mg, Ca, Br, Pd, Fe, Zn, Cu, Rh, Ir, Pt, Ni, Cr, Ag, Ti | |
| tmQM xtb Dataset T=100K low-mw high-coordinate geom-mult=1 v0.0 | 2025-12-19-tmQM-xtb-Dataset-T=100K-low-mw-high-coordinate-geom-mult=1-v0.0 | BP86/def2-TZVP Conformers for single metal complexes with Pd, Fe, Zn, Cu, and charge of {-1,0,+1} and multiplicity of 1. MW <= 600 Da, generally high coordinate, and 10 geometry samples | Br, C, Cl, Cu, F, Fe, H, N, O, P, Pd, S, Zn | |
| tmQM xtb Dataset T=100K low-mw high-coordinate geom-mult=3 v0.0 | 2025-12-19-tmQM-xtb-Dataset-T=100K-low-mw-high-coordinate-geom-mult=3-v0.0 | BP86/def2-TZVP Conformers for single metal complexes with Pd, Fe, Zn, Cu, and charge of {-1,0,+1} and multiplicity of 3. MW <= 600 Da, generally high coordinate, and 20 geometry samples | Br, C, Cl, Cu, F, Fe, H, N, O, P, Pd, S, Zn | |

Optimization Datasets

These are currently used to find a minimum energy conformation of a molecule.

| QCArchive Dataset | Folder | Description | Elements | Status |
| --- | --- | --- | --- | --- |
| OpenFF Optimization Set 1 | 2019-05-16-Roche-Optimization_Set | Geometry optimizations of a set of Roche molecules for forcefield fitting. | Cl, S, C, F, O, H, N | Complete |
| SMIRNOFF Coverage Set 1 | 2019-06-25-smirnoff99Frost-coverage | An optimization dataset that exercises all parameters in Smirnoff99Frost. | Cl, Br, S, C, F, P, I, O, H, N | Error |
| OpenFF VEHICLe Set 1 | 2019-07-02 VEHICLe optimization dataset | VEHICLe (virtual exploratory heterocyclic library) dataset of 24,867 aromatic heterocyclic rings with expanded stereochemistry. | S, C, O, H, N | Error |
| OpenFF Discrepancy Benchmark 1 | 2019-07-05 eMolecules force field discrepancies 1 | A set of molecules whose optimized structures differ across forcefields. | Cl, Br, S, C, F, P, I, O, H, N | Error |
| OpenFF NCI250K Boron 1 | 2019-07-05 OpenFF NCI250K Boron 1 | This database is a subset of boron-containing compounds from the NCI250K (Release 1 - Oct 1999) compound dataset. | Cl, Br, S, C, F, B, O, H, N | Complete |
| OpenFF Ehrman Informative Optimization v0.2 | 2019-09-06-OpenFF-Informative-Set | This provides an optimization dataset based on an initial batch of Jordan Ehrman's analysis of eMolecules, pulling out molecules with minimized geometries which are substantially different in different force fields. | Cl, Br, S, C, F, P, I, O, H, N | Error |
| Pfizer discrepancy optimization dataset 1 | 2019-09-07-Pfizer-discrepancy-optimization-dataset-1 | This database is a subset of 100 challenging small molecule fragments where HF/minix followed by B3LYP/6-31G*//B3LYP/6-31G** differed substantially from OPLS3e. | Cl, F, C, S, O, H, N | Complete |
| FDA optimization dataset 1 | 2019-09-08-fda-optimization-dataset-1 | The ZINC15 FDA dataset was retrieved in mol2 format on Sun Sep 8 20:44:34 EDT 2019 via: http://zinc.docking.org/substances/subsets/fda.mol2?count=all | Cl, Br, F, C, S, P, I, O, H, N | Error |
| Kinase Inhibitors: WBO Distributions | 2019-11-27-kinase-inhibitor-optimization | Geometry optimization of kinase inhibitor conformers to explore WBO conformation dependency. | Cl, Br, S, C, F, P, I, O, H, N | Complete |
| OpenFF Gen 2 Opt Set 1 Roche | 2020-03-20-OpenFF-Gen-2-Optimization-Set-1-Roche | 2nd generation optimization dataset for bond and valence parameter fitting. | Cl, S, C, F, O, H, N | Complete |
| OpenFF Gen 2 Opt Set 2 Coverage | 2020-03-20-OpenFF-Gen-2-Optimization-Set-2-Coverage | 2nd generation optimization dataset for bond and valence parameter fitting. | Cl, Br, S, C, F, P, I, O, H, N | Error |
| OpenFF Gen 2 Opt Set 3 Pfizer Discrepancy | 2020-03-20-OpenFF-Gen-2-Optimization-Set-3-Pfizer-Discrepancy | 2nd generation optimization dataset for bond and valence parameter fitting. | Cl, F, C, S, O, H, N | Complete |
| OpenFF Gen 2 Opt Set 4 eMolecules Discrepancy | 2020-03-20-OpenFF-Gen-2-Optimization-Set-4-eMolecules-Discrepancy | 2nd generation optimization dataset for bond and valence parameter fitting. | Cl, Br, S, C, F, P, I, O, H, N | Complete |
| OpenFF Gen 2 Opt Set 5 Bayer | 2020-03-20-OpenFF-Gen-2-Optimization-Set-5-Bayer | 2nd generation optimization dataset for bond and valence parameter fitting. | Si, Cl, Br, F, C, S, O, H, N | Error |
| OpenFF Protein Fragments v1.0 | 2020-07-06-OpenFF-Protein-Fragments-Initial | This is the initial test of running constrained optimizations on various protein fragments prepared by David Cerutti. Here we just have ALA as the central residue. | H, C, O, N | Complete |
| OpenFF Protein Fragments v2.0 | 2020-08-12-OpenFF-Protein-Fragments-version2 | This is the full protein fragment dataset (version2) consisting of constrained optimizations on various protein fragments prepared by David Cerutti. We have 12 central residues which are capped with a combination of different terminal residues. | S, C, O, H, N | Error |
| OpenFF Sandbox CHO PhAlkEthOH v1.0 | 2020-09-18-OpenFF-Sandbox-CHO-PhAlkEthOH | The molecules are from the AlkEthOH and PhEthOH datasets originally used to build the smirnoff99Frosst parameters. The AlkEthOH was taken from here | H, C, O | Running |
| OpenFF Industry Benchmark Season 1 v1.0 | 2021-03-30-OpenFF-Industry-Benchmark-Season-1-v1.0 | The combination of all publicly chosen compound sets by industry partners from the OpenFF season 1 industry benchmark | N, F, Cl, C, H, O, Br, P, S | Error |
| OpenFF Industry Benchmark Season 1 v1.1 | 2021-06-04-OpenFF-Industry-Benchmark-Season-1-v1.1 | The combination of all publicly chosen compound sets by industry partners from the OpenFF season 1 industry benchmark | N, F, Cl, C, H, O, Br, P, S | Running |
| OpenFF Theory Benchmarking Constrained Optimization Set MP2 heavy-aug-cc-pVTZ v1.1 | 2020-11-25-theory-bm-set-mp2-heavy-aug-cc-pvtz | This is a Constrained Optimization dataset for benchmarking MP2/heavy-aug-cc-pVTZ. | | Running |
| OpenFF Industry Benchmark Season 1 - MM v1.1 | 2021-07-28-OpenFF-Industry-Benchmark-Season-1-MM-v1.1 | The combination of all publicly chosen compound sets by industry partners from the OpenFF season 1 industry benchmark; MM computations starting from QM-optimized geometries. | N, F, Cl, C, H, O, Br, P, S | Running |
| OpenFF RESP Polarizability Optimizations v1.0 | 2021-10-01-OpenFF-resppol-mp2-single-point | A data set used for training ESP-fitting based typed atomic polarizabilities with a direct approximation. | N, C, H, O | Running |
| OpenFF RESP Polarizability Optimizations v1.1 | 2021-10-01-OpenFF-resppol-mp2-single-point | A data set used for training ESP-fitting based typed atomic polarizabilities with a direct approximation. | N, C, H, O | Running |
| SPICE Dipeptides Optimization Dataset v1.0 | 2021-11-11-Dipeptide-optimization-set | Optimization set created from the smiles of SPICE Dipeptide dataset. | N, C, H, O, S | Running |
| OpenFF Gen 2 Optimization Dataset Protomers v1.0 | 2021-12-21-OpenFF-Gen2-Optimization-Set-Protomers | Optimization set created from the smiles of missing protomers in Gen 2 optimization sets. | O, F, S, Br, Cl, C, P, H, I, N | Running |
| OpenFF ESP Industry Benchmark Set v1.0 | 2022-02-02-OpenFF-ESP-Industry-Benchmark-Set-v1.0-optimization-set | HF/6-31G* conformers of public industry benchmark molecules. | N, F, Cl, C, H, O, Br, P, S | Running |
| OpenFF Protein Capped 1-mers 3-mers Optimization Dataset v1.0 | 2022-05-30-OpenFF-Protein-Capped-1-mers-3-mers-Optimization | Optimization dataset for protein capped 1-mers Ace-X-Nme and capped 3-mers Ace-Y-X-Y-Nme with Y = {Ala, Val} and X = 26 canonical amino acids with common protomers/tautomers (Ash, Cyx, Glh, Hid, Hip, and Lyn) | H, C, N, O, S | |
| OpenFF Iodine Chemistry Optimization Dataset v1.0 | 2022-07-27-OpenFF-iodine-optimization-set | Optimization set created from Gen1 and Gen2 molecules containing iodine | C, F, O, H, Br, Cl, N, I, S | |
| OpenFF multi-Br ESP Fragment Conformers v1.0 | 2023-11-02-OpenFF-multi-Br-ESP-Fragment-Conformers-v1.0 | Optimization set created from 2022-01-16-OpenFF-ESP-Fragment-Conformers-v1.0 by selecting molecules with multiple Cl atoms and replacing them with Br | Br, C, F, H, N, O, P, S | |
| XtalPi Shared Fragments OptimizationDataset v1.0 | 2024-01-30-xtalpi-shared-fragments-optimization-v1.0 | Representative optimization molecules used to fit XFF | C, H, Cl, Br, S, O, F, N, P | |
| XtalPi 20-percent Fragments OptimizationDataset v1.0 | 2024-04-02-xtalpi-20-percent-fragments-optimization-v1.0 | Larger (20%) representative subset of molecules used to fit XFF | Cl, P, Br, I, H, C, B, Si, O, N, F, S | |
| OpenFF Torsion Benchmark Supplement Optimization Dataset v1.0 | 2024-04-18-OpenFF-Torsion-Benchmark-Supplement-Optimization-Dataset-v1.0 | Additional optimizations for benchmarking Sage 2.2.0 proper torsions and new parameters from the torsion multiplicity work | H, C, N, O, F, P, S, Cl, Br | |
| OpenFF Torsion Multiplicity Optimization Training Coverage Supplement v1.0 | 2024-06-20-OpenFF-Torsion-Multiplicity-Optimization-Training-Coverage-Supplement-v1.0 | Additional optimization training data for Sage 2.2.0 proper torsions and new parameters from the torsion multiplicity work | C, Cl, S, O, H, P, N, Br | |
| OpenFF Torsion Multiplicity Optimization Benchmarking Coverage Supplement v1.0 | 2024-06-24-OpenFF-Torsion-Multiplicity-Optimization-Benchmarking-Coverage-Supplement-v1.0 | Additional optimization benchmarking data for Sage 2.2.0 proper torsions and new parameters from the torsion multiplicity work | Cl, H, I, S, O, N, Br, C, P | |
| OpenFF Iodine Fragment Opt v1.0 | 2024-09-10-OpenFF-Iodine-Fragment-Opt-v1.0 | B3LYP-D3BJ/DZVP optimized conformers for a variety of I-containing fragment molecules | C, O, I, S, F, Br, Cl, N, H | |
| OpenFF Sulfur Optimization Training Coverage Supplement v1.0 | 2024-09-11-OpenFF-Sulfur-Optimization-Training-Coverage-Supplement-v1.0 | Additional optimization training data for Sage sulfur and phosphorus parameters | C, S, F, O, H, Cl, Br, P, N | |
| OpenFF Sulfur Optimization Benchmarking Coverage Supplement v1.0 | 2024-09-18-OpenFF-Sulfur-Optimization-Benchmarking-Coverage-Supplement-v1.0 | Additional optimization benchmarking data for Sage sulfur and phosphorus parameters | S, P, Cl, C, N, O, H, Br, F | |
| OpenFF Lipid Optimization Training Supplement v1.0 | 2024-10-08-OpenFF-Lipid-Optimization-Training-Supplement-v1.0 | Additional optimization training data for Sage from representative LIPID MAPS fragments | I, Br, O, H, P, C, N, Cl, F, S | |
| OpenFF Lipid Optimization Benchmark Supplement v1.0 | 2024-10-30-OpenFF-Lipid-Optimization-Benchmark-Supplement-v1.0 | Additional optimization benchmarking data for Sage from representative LIPID MAPS fragments | O, H, C, Br, P, N, Cl, F, S, I | |
| OpenFF NAGL2 Training Optimization Dataset Part 1 v4.0 | 2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0 | Optimization dataset for NAGL2 training, part 1 | Cl, O, C, P, I, Br, B, S, N, F, H, Si | |
| OpenFF NAGL2 Training Optimization Dataset Part 2 v4.0 | 2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-2-v4.0 | Optimization dataset for NAGL2 training, part 2 | Si, B, O, I, S, Cl, N, H, C, P, F, Br | |
| OpenFF Organometallics Exploratory Optimization Dataset | 2024-12-03-OpenFF-Organometallics-Exploratory-Optimization-Dataset | Optimization training data for organometallic molecules | F, P, O, C, Zn, N, Ni, Pt, S, Pd, Mg, Br, Rh, Fe, H, Cl, B, Li | |
| OpenFF NAGL2 Training Optimization Dataset v4.0 | 2024-12-09-OpenFF-NAGL2-Training-Optimization-Dataset-v4.0 | Optimization dataset for NAGL2 training, combined and filtered | Si, B, O, I, S, Cl, N, H, C, P, F, Br | |
| SPICE Dipeptides Lowest E Conformer Optimization Dataset v4.0 | 2025-01-08-SPICE-Dipeptides-Lowest-E-Conformer-Optimization-Dataset-v4.0 | Optimization dataset for the lowest energy conformers of the Dipeptides subset of SPICE | H, S, C, O, N | |
| SPICE DES370k Monomers Lowest E Conformer Optimization Dataset v4.0 | 2025-01-08-SPICE-DES370k-Monomers-Lowest-E-Conformer-Optimization-Dataset-v4.0 | Optimization dataset for the lowest energy conformer of molecules in the DES370k monomer subset of SPICE | S, O, F, P, Br, N, H, C, Cl, I | |
| SPICE Dipeptides Partial Relaxation Dataset v4.0 | 2025-02-26-SPICE-Dipeptides-Partial-Relaxation-Dataset-v4.0 | Partial relaxation dataset for the Dipeptides subset of SPICE | H, S, C, O, N | |
| OpenFF Cresset Additional Coverage Optimizations v4.0 | 2025-03-06-OpenFF-Cresset-Additional-Coverage-Optimizations-v4.0 | Additional optimizations from Cresset molecules | S, Br, C, N, Cl, O, F, H | |
| OpenFF Protein PDB 4-mers v4.0 | 2025-03-05-OpenFF-Protein-PDB-4mer-v4.0 | Optimization dataset for peptide 4-mers extracted from PDB structures | C, N, O, H | |
SPICE DES370k Monomers Partial Relaxation Dataset v4.02025-03-12-SPICE-DES370k-Monomers-Partial-Relaxation-Dataset-v4.0Partial relaxation dataset for the DES370k monomers subset of SPICEH, C, Cl, I, F, O, S, Br, N, P
TM Benchmark Optimization Dataset Step 1 v0.02025-04-03-TM-Benchmark-Optimization-Dataset-Step-1-v0.0Diverse set of conformers for single metal complexes with Pd, Fe, Zn, Cu, Mg, Li and charge of {-1,0,+1}, with some organic molecules undergoing step 1, initial optimization for benchmarking purposesBr, C, Cl, Cu, F, Fe, H, Li, Mg, N, O, P, Pd, S, Zn
OpenFF Additional Generated ChEMBL Optimizations v4.02025-04-14-OpenFF-Additional-ChEMBL-Fragment-Optimizations-v4.0Diverse set of molecules to increase coverage of rare valence parameters in Sage 2.2.1F, Br, Cl, P, S, O, N, H, C, I
OpenFF Industry Benchmark Season 1 v1.22025-06-24-OpenFF-Industry-Benchmark-Season-1-v1.2The combination of all publicly chosen compound sets by industry partners from the OpenFF Industry Benchmark Season 1 with standard post-submission filters applied and unrealistic conformers removed.Br, C, Cl, F, H, N, O, P, SRunning
TM Benchmark Optimization Dataset Step 2 v0.02025-12-17-TM-Benchmark-Optimization-Dataset-Step-2-v0.0Diverse set of conformers for single metal complexes with Pd, Fe, Zn, Cu, Mg, Li and charge of {-1,0,+1}, with some organic molecules, all undergoing step 2, high level of theory final optimization.Br, C, Cl, Cu, F, Fe, H, Li, Mg, N, O, P, Pd, S, Zn
OpenFF NSP Optimization Set 1 Sulfur v4.02026-02-05-OpenFF-NSP-Optimization-Set-1-Sulfur-v4.0Assess coverage of various NSP chemistriesC, S, F, O, H, Cl, Br, P, N, I
OpenFF NSP Optimization Set 1 Phosphorus v4.02026-02-05-OpenFF-NSP-Optimization-Set-1-Phosphorus-v4.0Assess coverage of various NSP chemistriesH, Br, N, S, C, F, I, Cl, P, O
OpenFF NSP Optimization Set 1 Nitrogen v4.02026-02-05-OpenFF-NSP-Optimization-Set-1-Nitrogen-v4.0Assess coverage of various NSP chemistriesC, S, F, O, H, Cl, Br, P, N, I
OpenFF NSP Optimization Set 2 Phosphorus v4.02026-02-17-OpenFF-NSP-Optimization-Set-2-Phosphorus-v4.0Assess coverage of various NSP chemistriesH, Br, N, S, C, F, I, Cl, P, O
OpenFF NSP Optimization Set 2 Sulfur v4.02026-02-17-OpenFF-NSP-Optimization-Set-2-Sulfur-v4.0Assess coverage of various NSP chemistriesC, S, F, O, H, Cl, Br, P, N, I
OpenFF NSP Optimization Set 2 Nitrogen v4.02026-02-17-OpenFF-NSP-Optimization-Set-2-Nitrogen-v4.0Assess coverage of various NSP chemistriesC, S, F, O, H, Cl, Br, P, N, I
OpenFF Organometallic Complexes Architector Minimum Energy Structures v0.02026-04-06-OpenFF-Organometallic-Complexes-Architector-Minimum-Energy-Structures-v0.0BP86/def2-TZVP single metal complex optimizations with Pd, Fe, Zn, Cu, Li, and Mg and charge of {-1,0,+1} and multiplicities ranging from 1 to 6 and coordination ranging from 1 to 12, and MW <= 1005 Da.Br, C, Cu, Fe, H, Li, Mg, N, O, P, Pd, S, Zn
OpenFF SPICE2 Subset Optimization Dataset v4.02026-04-08-OpenFF-SPICE2-Subset-Optimization-Dataset-v4.0Improve optimization coverage for suspicious SPICE2 conformers across challenging charged chemistriesP, S, C, Br, Cl, N, F, H, I, O

TorsionDrive Datasets

These are currently used to perform a complete rotation of one or more selected bonds, with optimizations performed over a discrete set of angles.
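As a rough illustration of what "a discrete set of angles" means for a full rotation, the sketch below enumerates the grid points for one or more driven torsions. The function name `torsion_grid` and the 15-degree spacing are assumptions chosen for illustration; they are not taken from this repository or from any dataset listed here.

```python
from itertools import product

def torsion_grid(spacing_deg=15, n_torsions=1):
    """Return every grid point of a full rotation for n_torsions dihedrals.

    spacing_deg is an assumed example value, not a setting read from a dataset.
    """
    if 360 % spacing_deg != 0:
        raise ValueError("spacing must divide 360 evenly")
    # Cover (-180, 180] so that -180 and 180 (the same angle) are not both counted.
    angles = list(range(-180 + spacing_deg, 181, spacing_deg))
    # An N-dimensional drive visits every combination of angles.
    return list(product(angles, repeat=n_torsions))

grid_1d = torsion_grid(15)
grid_2d = torsion_grid(15, n_torsions=2)
print(len(grid_1d), len(grid_2d))  # 24 576
```

Each grid point corresponds to one constrained optimization, so a two-dimensional drive at the same spacing costs the square of the one-dimensional count.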

| QCArchive Dataset | Folder | Description | Elements | Status |
|---|---|---|---|---|
| Fragment Stability Benchmark | 2019-03-06-Fragmenter_Stability-Benchmark | Examination of different fragmentation schemes. | Cl, F, C, P, I, O, H, N | Error |
| OpenFF Group1 Torsions | 2019-05-01-OpenFF-Group1-Torsions | A collection of torsion drives for forcefield fitting. | Cl, F, C, S, O, H, N | Error |
| SMIRNOFF Coverage Torsion Set 1 | 2019-07-01-smirnoff99Frost-coverage-torsion | Set of small molecules that use all smirnoff99Frosst parameters. | Cl, Br, S, C, F, P, I, O, H, N | Error |
| OpenFF Substituted Phenyl Set 1 | 2019-07-25-phenyl-set | A set of substituted phenyl torsiondrives. | Cl, Br, F, C, I, O, H, N | Error |
| Pfizer discrepancy torsion dataset 1 | 2019-09-07-Pfizer-discrepancy-torsion-dataset-1 | This database is a subset of 100 challenging small molecule fragments where HF/minix followed by B3LYP/6-31G*//B3LYP/6-31G** differed substantially from OPLS3e. | Cl, F, C, S, O, H, N | Error |
| TorsionDrive Paper | 2019-11-07-TorsionDrive-Paper | Torsion Drives to explore wavefront propagation for the TorsionDrive paper. | C, H, O | Error |
| OpenFF Primary Benchmark 1 Torsion Set | 2019-12-05-OpenFF-Benchmark-Primary-1-torsion | Validation of optimized force field torsion parameters. | Cl, Br, F, C, S, O, H, N | Error |
| OpenFF Primary Benchmark 2 Torsion Set | 2020-01-17-OpenFF-Benchmark-Full-1-torsion | Validation of optimized force field torsion parameters. | Cl, Br, S, C, F, P, I, O, H, N | Error |
| OpenFF Group1 Torsions 2 | 2020-01-31-OpenFF-Group1-Torsions-2 | Generation of additional data for fitting of newly added torsion terms. | H, C, O, N | Complete |
| OpenFF Group1 Torsions 3 | 2020-02-10-OpenFF-Group1-Torsions-3 | Generation of additional data for fitting of t128 and t129. | H, C, O, N | Error |
| OpenFF Gen 2 Torsion Set 1 Roche | 2020-03-12-OpenFF-Gen-2-Torsion-Set-1-Roche | Design 2nd generation torsion dataset for valence parameter fitting. | F, C, S, O, H, N | Error |
| OpenFF Gen 2 Torsion Set 2 Coverage | 2020-03-12-OpenFF-Gen-2-Torsion-Set-2-Coverage | Design 2nd generation torsion dataset for valence parameter fitting. | Cl, Br, F, C, S, P, I, O, H, N | Error |
| OpenFF Gen 2 Torsion Set 3 Pfizer Discrepancy | 2020-03-12-OpenFF-Gen-2-Torsion-Set-3-Pfizer-Discrepancy | Design 2nd generation torsion dataset for valence parameter fitting. | S, C, F, O, H, N | Running |
| OpenFF Gen 2 Torsion Set 4 eMolecules Discrepancy | 2020-03-12-OpenFF-Gen-2-Torsion-Set-4-eMolecules-Discrepancy | Design 2nd generation torsion dataset for valence parameter fitting. | Cl, Br, F, C, S, P, I, O, H, N | Error |
| OpenFF Gen 2 Torsion Set 5 Bayer | 2020-03-12-OpenFF-Gen-2-Torsion-Set-5-Bayer | Design 2nd generation torsion dataset for valence parameter fitting. | Cl, Br, F, C, S, O, H, N | Error |
| OpenFF Gen 2 Torsion Set 6 supplemental | 2020-03-12-OpenFF-Gen-2-Torsion-Set-6-supplemental | Design 2nd generation torsion dataset for valence parameter fitting. | S, C, O, H, N | Error |
| OpenFF Gen 2 Torsion Set 1 Roche 2 | 2020-03-23-OpenFF-Gen-2-Torsion-Set-1-Roche-2 | Design 2nd generation torsion dataset for valence parameter fitting. | Cl, F, C, S, O, H, N | Error |
| OpenFF Gen 2 Torsion Set 2 Coverage 2 | 2020-03-23-OpenFF-Gen-2-Torsion-Set-2-Coverage-2 | Design 2nd generation torsion dataset for valence parameter fitting. | Cl, Br, F, C, S, P, I, O, H, N | Error |
| OpenFF Gen 2 Torsion Set 3 Pfizer Discrepancy 2 | 2020-03-23-OpenFF-Gen-2-Torsion-Set-3-Pfizer-Discrepancy-2 | Design 2nd generation torsion dataset for valence parameter fitting. | S, C, F, O, H, N | Complete |
| OpenFF Gen 2 Torsion Set 4 eMolecules Discrepancy 2 | 2020-03-23-OpenFF-Gen-2-Torsion-Set-4-eMolecules-Discrepancy-2 | Design 2nd generation torsion dataset for valence parameter fitting. | Cl, Br, F, C, S, P, I, O, H, N | Error |
| OpenFF Gen 2 Torsion Set 5 Bayer 2 | 2020-03-26-OpenFF-Gen-2-Torsion-Set-5-Bayer-2 | Design 2nd generation torsion dataset for valence parameter fitting. | Cl, Br, F, C, S, O, H, N | Error |
| OpenFF Gen 2 Torsion Set 6 supplemental 2 | 2020-03-26-OpenFF-Gen-2-Torsion-Set-6-supplemental-2 | Design 2nd generation torsion dataset for valence parameter fitting. | Br, S, C, F, O, H, N | Error |
| OpenFF Fragmenter Validation 1.0 | 2020-04-28-Fragmenter-test | Examination of different fragmentation schemes. | Cl, S, C, P, I, O, H, N | Error |
| OpenFF DANCE 1 eMolecules t142 v1.0 | 2020-06-01-DANCE-1-eMolecules-t142-selected | Molecules selected from the eMolecules database by DANCE to improve t142 parameterization in smirnoff99Frosst. | Cl, Br, F, C, S, O, H, N | Error |
| OpenFF Rowley Biaryl v1.0 | 2020-06-17-OpenFF-Biaryl-set | This is a TorsionDrive dataset consisting of biaryl torsions provided by Christopher Rowley. Originally used to benchmark Parsley, but could also be useful for fitting. | S, C, O, H, N | Running |
| OpenFF-benchmark-ligand-fragments-v1.0 | 2020-07-27-OpenFF-Benchmark-Ligands | This is a torsiondrive dataset created from the OpenFF FEP benchmark dataset. The ligands are fragmented before having key torsions driven. | Cl, Br, S, C, F, I, O, H, N | Running |
| OpenFF Theory Benchmarking Set B3LYP-D3BJ DZVP v1.0 | 2020-07-27-theory-bm-set-b3lyp-d3bj-dzvp | This is a TorsionDrive dataset consisting of 36 1-D torsions selected for benchmarking different QM levels. | Cl, F, C, S, P, O, H, N | Complete |
| OpenFF Theory Benchmarking Set B3LYP-D3BJ def2-TZVP v1.0 | 2020-07-30-theory-bm-set-b3lyp-d3bj-def2-tzvp | This is a TorsionDrive dataset consisting of 36 1-D torsions selected for benchmarking different QM levels. | Cl, F, C, S, P, O, H, N | Complete |
| OpenFF Theory Benchmarking Set B3LYP-D3BJ def2-TZVPD v1.0 | 2020-07-30-theory-bm-set-b3lyp-d3bj-def2-tzvpd | This is a TorsionDrive dataset consisting of 36 1-D torsions selected for benchmarking different QM levels. | Cl, F, C, S, P, O, H, N | Error |
| OpenFF Theory Benchmarking Set B3LYP-D3BJ def2-TZVPP v1.0 | 2020-07-30-theory-bm-set-b3lyp-d3bj-def2-tzvpp | This is a TorsionDrive dataset consisting of 36 1-D torsions selected for benchmarking different QM levels. | Cl, F, C, S, P, O, H, N | Complete |
| OpenFF Protein Fragments TorsionDrives v1.0 | 2020-09-16-OpenFF-Protein-Fragments-TorsionDrives | This is a protein fragment dataset consisting of torsion drives on various protein fragments prepared by David Cerutti. We have 12 central residues capped with a combination of different terminal residues. We drive the following angles for each fragment: omega, phi, psi, chi1 (if applicable), and chi2 (if applicable). | S, C, O, H, N | Error |
| OpenFF WBO Conjugated Series v1.0 | 2021-01-25-OpenFF-Conjugated-Series | This is a torsion drive dataset that consists of various chemistries that probe a range of conjugated bonds. The goal of this dataset is to develop WBO interpolated torsions for the OpenFF force field. | S, C, O, H, N | Error |
| OpenFF Amide Torsion Set v1.0 | 2021-03-23-OpenFF-Amide-Torsion-Set-v1.0 | Amides, thioamides and amidines diversely functionalized. | S, C, O, H, N | Running |
| OpenFF Aniline Para Opt v1.0 | 2021-04-02-OpenFF-Aniline-Para-Opt-v1.0 | Optimizations of diverse, para-substituted aniline derivatives. | Br, C, O, N, S, H, Cl, F | Running |
| OpenFF Gen3 Torsion Set v1.0 | 2021-04-09-OpenFF-Gen3-Torsion-Set-v1.0 | This dataset is a simple-molecule-only torsiondrive dataset, aiming to avoid the issue of torsion parameter contamination by large internal non-bonded interactions during a valence parameter optimization. Molecules with one effective rotating bond were generated by combining two simple substituents, which were identified by fragmenting small drug-like molecules. Torsions from the generated molecule set were selected using a clustering method, so that the dataset provides chemical diversity among the molecules training each torsion parameter. | F, N, H, Cl, P, S, O, Br, C | Running |
| OpenFF Aniline 2D Impropers v1.0 | 2021-03-29-OpenFF-Aniline-2D-Impropers-v1.0 | This dataset contains a set of aniline derivatives which have para-substituted groups of varying electron donating and withdrawing properties. This dataset was curated in an effort to improve and understand improper torsions in force fields. We will scan the improper and proper angle simultaneously to better understand the coupling and energetics of these torsions. | O, C, S, H, N | Running |
| OpenFF BCC Refit Study COH v2.0 | 2021-06-22-OpenFF-BCC-Refit-Study-COH-v2.0 | A data set curated for the initial stage of the ongoing OpenFF study which aims to co-optimize the AM1BCC bond charge correction (BCC) parameters against an experimental training set of density and enthalpy of mixing data points and a QM training set of electric field data. The initial data set is limited to only molecules composed of C, O, H. This limited scope significantly reduces the number of BCC parameters which must be retrained, thus allowing for easier convergence of the initial optimizations. The included molecules were combinatorially generated to cover a range of alcohol, ether, and carbonyl containing molecules. | O, C, S, H, N | Running |
| OpenFF-benchmark-ligand-fragments-v2.0 | 2021-08-10-OpenFF-JACS-Fragments-v2.0 | This is a torsiondrive dataset created from the OpenFF FEP benchmark dataset. The ligands are fragmented using openff-fragmenter with both ambertools and openeye before having key torsions driven. | S, N, Br, C, H, O, Cl, F, I | Running |
| OpenFF-Protein-Dipeptide-2D-TorsionDrive-v2.1 | 2021-11-18-OpenFF-Protein-Dipeptide-2D-TorsionDrive | Two-dimensional TorsionDrives on phi and psi for dipeptides of the 20 canonical amino acids and 6 alternate protomers/tautomers. | H, C, N, O, S | |
| OpenFF-Protein-Capped-1-mer-Sidechains-v1.3 | 2022-02-10-OpenFF-Protein-Capped-1-mer-Sidechains | Two-dimensional TorsionDrives on chi1 and chi2 for capped 1-mers of amino acids with a rotatable bond in the sidechain. | H, C, N, O, S | |
| OpenFF-Protein-Capped-3-mer-Backbones-v1.0 | 2022-05-30-OpenFF-Protein-Capped-3-mer-Backbones | Two-dimensional TorsionDrives on phi and psi for capped 3-mers Ace-Y-X-Y-Nme with Y = {Ala, Val}. | H, C, N, O, S | |
| OpenFF multiplicity correction torsion drive data v1.1 | 2022-04-29-OpenFF-multiplicity-correction-torsion-drive-data-v1.1 | A torsiondrive dataset created to correct multiplicity issues in the force field. | S, P, O, C, H, N | Running |
| OpenFF Protein Capped 3-mer Omega v1.0 | 2023-02-06-OpenFF-Protein-Capped-3-mer-Omega | TorsionDrives on omega for capped 3-mers Ace-Ala-X-Ala-Nme. | H, C, N, O, S | |
| XtalPi Shared Fragments TorsiondriveDataset v1.0 | 2024-01-30-xtalpi-shared-fragments-torsiondrive-v1.0 | Representative torsion scan molecules used to fit XFF | C, H, Cl, Br, S, O, F, N, P | |
| OpenFF Torsion Coverage Supplement v1.0 | 2024-02-29-OpenFF-Torsion-Coverage-Supplement-v1.0 | Additional TorsionDrives to improve coverage for Sage 2.1.0 proper torsions and new parameters from the torsion multiplicity work | C, Cl, F, H, N, O, S | |
| OpenFF RNA Dinucleoside Monophosphate TorsionDrives v1.0 | 2024-03-26-OpenFF-RNA-Dinucleoside-Monophosphate-TorsionDrives | TorsionDrives of non-ring backbone, glycosidic, and hydroxyl dihedrals in RNA XpY 2-mers. | H, C, N, O, P | |
| XtalPi 20-percent Fragments TorsiondriveDataset v1.0 | 2024-04-02-xtalpi-20-percent-fragments-torsiondrive-v1.0 | Torsion scans of larger representative subset (20%) of molecules used to fit XFF | O, Br, I, Si, B, C, P, S, Cl, H, N, F | |
| OpenFF Torsion Drive Supplement v1.0 | 2024-04-17-OpenFF-Torsion-Drive-Supplement-v1.0 | Additional TorsionDrives to expand training data for Sage 2.2.0 proper torsions and new parameters from the torsion multiplicity work | H, C, N, O, P, S | |
| OpenFF Torsion Multiplicity Torsion Drive Coverage Supplement v1.0 | 2024-06-14-OpenFF-Torsion-Multiplicity-Torsion-Drive-Coverage-Supplement-v1.0 | Additional torsion drive training data for Sage 2.2.0 proper torsions and new parameters from the torsion multiplicity work | N, Br, H, P, Cl, O, C, S | |
| OpenFF Phosphate Torsion Drives v1.0 | 2024-07-17-OpenFF-Phosphate-Torsion-Drives-v1.0 | Lipid-like phosphate torsions | C, S, N, H, O, P | |
| OpenFF Alkane Torsion Drives v1.0 | 2024-08-09-OpenFF-Alkane-Torsion-Drives-v1.0 | Alka/ene torsion drives | C, H | |
| OpenFF Cresset Additional Coverage TorsionDrives v4.0 | 2025-02-12-OpenFF-Cresset-Additional-Coverage-TorsionDrives-v4.0 | Additional torsiondrives from Cresset | S, Br, C, N, Cl, O, F, H | |
| OpenFF Additional Generated ChEMBL TorsionDrives 4.0 | 2025-04-10-OpenFF-Additional-ChEMBL-Fragment-TorsionDrives-v4.0 | Additional torsiondrives from ChEMBL fragments | O, Cl, Br, C, I, P, F, H, N, S | |
| OpenFF Additional Generated Guanidine Amidine Derivative and O-Linker TorsionDrives 4.0 | 2025-04-10-OpenFF-Additional-Generated-Guanidine-Amidine-Derivative-O-linkers-TorsionDrives-4.0 | Additional manually-generated torsiondrives | S, C, N, O, H | |
| TorsionNet500 Re-optimization TorsionDrives v4.0 | 2026-02-12-TorsionNet500-Re-optimization-TorsionDrives-v4.0 | TorsionNet500 re-optimized with OpenFF default spec | N, Cl, O, S, H, C, F | |
| OpenFF Lipid Torsion Drives v4.1 | 2026-03-23-OpenFF-Lipid-Torsion-Drives-v4.1 | Additional torsiondrive training data for lipid parameters | C, O, P, H | |
| OpenFF PEG Ether Fragments TorsionDrives v4.0 | 2026-06-15-OpenFF-PEG-Ether-Fragments-TorsionDrives-v4.0 | TorsionDrives of the central O-CH2-CH2-O glycol torsion in small poly(ethylene glycol) (PEG) ether fragments | C, H, O | |

GridOptimization Datasets

These are currently used to perform a scan of one or more internal coordinates (bond, angle, torsion), with optimizations performed over a discrete set of values.
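A minimal sketch of how such a scan expands into individual constrained optimizations is shown below. The `scans` structure, the atom indices, and the scanned values are all hypothetical and chosen only for illustration; they do not reflect the schema used by QCArchive or by any dataset in this repository.

```python
from itertools import product

# Hypothetical scan definition: each entry names an internal coordinate,
# the atom indices that define it, and the discrete values to visit.
scans = [
    ("bond", (0, 1), [1.0, 1.1, 1.2]),            # angstrom
    ("angle", (0, 1, 2), [100.0, 110.0, 120.0]),  # degrees
]

def grid_points(scans):
    """Return one tuple of target values per constrained optimization."""
    return list(product(*(values for _, _, values in scans)))

points = grid_points(scans)
print(len(points))  # 3 x 3 = 9 constrained optimizations
print(points[0])    # (1.0, 100.0)
```

Unlike a torsion drive, the scanned values here need not span a full rotation, so each coordinate carries its own explicit list of values.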

| QCArchive Dataset | Folder | Description | Elements | Status |
|---|---|---|---|---|
| OpenFF Trivalent Nitrogen Set 1 | 2019-06-28-Nitrogen-grid-optimization | Set of diverse trivalent nitrogen molecules for 1-D grid optimization. | Si, Cl, Br, F, C, S, P, B, I, O, H, N | Error |
| OpenFF Trivalent Nitrogen Set 2 | 2019-12-09-Nitrogen-grid-optimization-2d | Set of diverse trivalent nitrogen molecules for 2-D grid optimization | Si, Cl, Br, F, C, S, P, B, I, O, H, N | Error |
| OpenFF Trivalent Nitrogen Set 3 | 2020-01-15-Nitogen-grid-optimization-02-1dscans | Set of diverse trivalent nitrogen molecules for 1-D grid optimization, this is a secondary dataset | Cl, Br, S, C, F, O, H, N | Error |