About The Project
April 29, 2025
Root-mean-square deviation (RMSD) is one of the most common metrics for comparing three-dimensional chemical structures. The molecular-oriented RMSD with branch-and-bound (mobbRMSD) is an RMSD-based metric for 3D chemical structure similarity. mobbRMSD is formulated in molecular-oriented coordinates and uses the branch-and-bound method to obtain an exact solution. It can handle large and complex chemical systems, such as molecular liquids, solvation of solutes, and self-assembly of large molecules, which are difficult to handle with conventional methods.
Define the molecular-oriented coordinates as follows:

$$X = \{X_I\}_{I}, \qquad X_I \in \mathbb{R}^{d \times n_I \times M_I},$$

where $X_I$ is the coordinates of the $I$-th homologous molecular assembly, $d$ is the number of spatial dimensions, and $M_I$ and $n_I$ are the number of molecules and the number of atoms per molecule, respectively.

For a molecular-oriented coordinate pair $X$ and $Y$ consisting of $M$ molecules of $n$ atoms, the moRMSD is defined as follows:

$$\mathrm{moRMSD}(X, Y) = \min_{R,\, c,\, \mu,\, \nu} \sqrt{\frac{1}{Mn} \sum_{k=1}^{M} \sum_{j=1}^{n} \left\| x_{jk} - R\, y_{\nu_k(j)\, \mu(k)} - c \right\|^2},$$

where $x_{jk}$ and $y_{jk}$ are the Cartesian coordinates of the $j$-th atom in the $k$-th molecule of the molecular species, corresponding to $X$ and $Y$, respectively, $c$ is a translation vector, and $R$ is a rotation matrix. $\mu$ and $\nu_k$ are permutations on $\{1,\dots,M\}$ and $\{1,\dots,n\}$, respectively. $\nu_k$ takes the appropriate domain of definition corresponding to the molecular topology. Since $\mu$ and $\nu = (\nu_1,\dots,\nu_M)$ expand the solution space at factorial and exponential cost with respect to $M$, respectively, it is difficult to find a solution by brute force when the number of molecules is large.
mobbRMSD practically eliminates this difficulty by using the branch-and-bound method. See Background and Benchmark for details.
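To make the cost argument above concrete, the following sketch counts the candidate assignments and runs a naive exhaustive search over them. It is an illustration only, not mobbRMSD's algorithm: rotation and translation are held fixed, the function names (`search_space_size`, `brute_force_rmsd`) are hypothetical, and intramolecular permutations are written here as 0-based index tuples that include the identity.

```python
import itertools
import math


def search_space_size(n_mol, n_sym):
    """Count the combinations of one molecular permutation (n_mol! choices)
    with one intramolecular permutation per molecule (n_sym ** n_mol choices)."""
    return math.factorial(n_mol) * n_sym ** n_mol


def brute_force_rmsd(x, y, syms):
    """Minimum RMSD over all permutations, with rotation and translation fixed.

    x, y : list of molecules; each molecule is a list of atom coordinate tuples.
    syms : intramolecular permutations as 0-based index tuples (identity included).
    """
    n_mol = len(x)
    n_apm = len(x[0])
    best = math.inf
    # Outer loop: which molecule of y is matched to each molecule of x.
    for mol_perm in itertools.permutations(range(n_mol)):
        # Inner loop: one intramolecular permutation chosen per molecule.
        for choice in itertools.product(syms, repeat=n_mol):
            sd = 0.0
            for k in range(n_mol):
                yk = y[mol_perm[k]]
                for j in range(n_apm):
                    sd += math.dist(x[k][j], yk[choice[k][j]]) ** 2
            best = min(best, sd / (n_mol * n_apm))
    return math.sqrt(best)


# Two diatomic "molecules"; y is x with the molecules exchanged and
# the atoms of one molecule swapped.
x = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]]
y = [[(6.0, 0.0, 0.0), (5.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
rmsd = brute_force_rmsd(x, y, syms=[(0, 1), (1, 0)])
```

For four water molecules with one hydrogen swap each (two intramolecular permutations counting the identity), `search_space_size(4, 2)` gives 4! × 2^4 = 384 candidates; for ten molecules the count already exceeds 3.7 billion, which is why exhaustive enumeration does not scale.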
Getting Started
Prerequisites
- gfortran >= 9.4.0
- OpenBLAS (optional)
- OpenMP (optional)
To use the Python interface, you additionally need the following:
- python >= 3.8
- pip
Installation
You can build and install the package via pip:
pip install git+https://github.com/yymmt742/mobbrmsd.git
You can run some demonstrations via:
python -m mobbrmsd demo
Usage
The input is in JSON format. A simple example is as follows:
{
  "reference": "./path/to/file1.pdb",
  "target": "./path/to/file2.xyz",
  "mols": [
    {
      "n_apm": 2,
      "n_mol": 1,
      "name": "HydrogenFluoride"
    },
    {
      "n_apm": 3,
      "n_mol": 4,
      "sym": [[1, 3, 2]],
      "name": "Water"
    }
  ]
}
In this example, the system contains one hydrogen fluoride molecule and four water molecules. Intramolecular permutations resulting from swapping the hydrogen positions are specified for the water molecules. sym is a list of index arrays (list[int]) enumerating the intramolecular permutations, each represented as a substitution. The identity permutation (i.e., [0,1,... n_apm-1]) is ignored if you include it.
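As a rough illustration of the input described above, the sketch below builds the same configuration in Python, runs two basic sanity checks, and serializes it to JSON. The keys (`reference`, `target`, `mols`, `n_apm`, `n_mol`, `sym`, `name`) are taken from the example; the helper names `total_atoms` and `check_sym` are hypothetical and not part of mobbrmsd. The check deliberately does not assume an index base for `sym`: it only verifies that each entry has `n_apm` distinct indices.

```python
import json


def total_atoms(mols):
    """Total number of atoms the coordinate files are expected to contain."""
    return sum(m["n_apm"] * m["n_mol"] for m in mols)


def check_sym(mol):
    """Each sym entry must list n_apm distinct atom indices."""
    n_apm = mol["n_apm"]
    for perm in mol.get("sym", []):
        if len(perm) != n_apm or len(set(perm)) != n_apm:
            raise ValueError(f"invalid permutation {perm} for {mol.get('name')}")


config = {
    "reference": "./path/to/file1.pdb",
    "target": "./path/to/file2.xyz",
    "mols": [
        {"n_apm": 2, "n_mol": 1, "name": "HydrogenFluoride"},
        {"n_apm": 3, "n_mol": 4, "sym": [[1, 3, 2]], "name": "Water"},
    ],
}

for mol in config["mols"]:
    check_sym(mol)

n_atoms = total_atoms(config["mols"])  # 2*1 + 3*4
text = json.dumps(config, indent=2)    # write this string to the input file
```

Here `n_atoms` is 14, which matches the number of atoms in a system of one hydrogen fluoride and four waters.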
The file loading backend is MDTraj. See the MDTraj documentation for supported formats. Only coordinates are read from the file; information such as residues and atom types is not used. The file must list the Cartesian coordinates in the order specified in the JSON file, as follows:
> cat file1.pdb
ATOM 1 F HF A 0 X.XXX Y.YYY Z.ZZZ 1.00 0.00 F
ATOM 2 H HF A 0 X.XXX Y.YYY Z.ZZZ 1.00 0.00 H
ATOM 3 OH WAT A 1 X.XXX Y.YYY Z.ZZZ 1.00 0.00 O
ATOM 4 H1 WAT A 1 X.XXX Y.YYY Z.ZZZ 1.00 0.00 H
ATOM 5 H2 WAT A 1 X.XXX Y.YYY Z.ZZZ 1.00 0.00 H
ATOM 6 OH WAT A 2 X.XXX Y.YYY Z.ZZZ 1.00 0.00 O
ATOM 7 H1 WAT A 2 X.XXX Y.YYY Z.ZZZ 1.00 0.00 H
ATOM 8 H2 WAT A 2 X.XXX Y.YYY Z.ZZZ 1.00 0.00 H
ATOM 9 OH WAT A 3 X.XXX Y.YYY Z.ZZZ 1.00 0.00 O
ATOM 10 H1 WAT A 3 X.XXX Y.YYY Z.ZZZ 1.00 0.00 H
ATOM 11 H2 WAT A 3 X.XXX Y.YYY Z.ZZZ 1.00 0.00 H
ATOM 12 OH WAT A 4 X.XXX Y.YYY Z.ZZZ 1.00 0.00 O
ATOM 13 H1 WAT A 4 X.XXX Y.YYY Z.ZZZ 1.00 0.00 H
ATOM 14 H2 WAT A 4 X.XXX Y.YYY Z.ZZZ 1.00 0.00 H
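The ordering shown in the listing above can be sketched as a small generator. This is inferred from the example only (species in the order of `mols`, then molecule by molecule, with the atoms of each molecule contiguous); the function name `atom_order` is hypothetical and the indices it yields are 0-based for illustration.

```python
def atom_order(mols):
    """Yield (species name, molecule index, atom index) for each row of the
    coordinate file, in the order the rows are expected to appear."""
    for mol in mols:                      # species, in the order given in "mols"
        for k in range(mol["n_mol"]):     # molecules of that species
            for j in range(mol["n_apm"]): # atoms within one molecule
                yield mol["name"], k, j


mols = [
    {"n_apm": 2, "n_mol": 1, "name": "HydrogenFluoride"},
    {"n_apm": 3, "n_mol": 4, "name": "Water"},
]
rows = list(atom_order(mols))
```

With this input, `rows` has 14 entries: the first two belong to the hydrogen fluoride molecule and the remaining twelve to the four waters, three rows per molecule.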
You can run the calculations with the following commands:
python -m mobbrmsd run -i <input>
Background
Benchmark
Roadmap
- Add Usage
- Enable autovariance sorting
- Enable skip tree
- Compatible with compilers (intel)
- Compatible with compilers (nv)
- Add detailed documentation
- Add detailed documentation (Python interface)
- Add benchmarks
- Internalize lapack
See the open issues for a full list of proposed features (and known issues).
Contributing
This project is open source and we invite contributions. If you have a suggestion that would make this better, please fork the repo and create a pull request.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License.
See LICENSE.txt for more information.
Contact
YYMMT742 - yymmt@kuchem.kyoto-u.ac.jp
Reference
Further details are available from the following publications:
This project is based on the following papers:
Molecular superposition
- Kabsch, W. A solution for the best rotation to relate two sets of vectors. Acta Crystallogr., Sect. A 1976, 32, 922-923
- Theobald, D. L. Rapid calculation of RMSDs using a quaternion-based characteristic polynomial. Acta Crystallogr., Sect. A: Found. Crystallogr. 2005, 61, 478-480
- Coutsias, E. A.; Wester, M. J. RMSD and Symmetry. J. Comput. Chem. 2019, 40, 1496-1508
Linear assignment problem