About The Project

April 29, 2025


Root mean squared deviation (RMSD) is one of the most common metrics for comparing the similarity of three-dimensional chemical structures. The molecular-oriented RMSD with branch-and-bound (mobbRMSD) is an RMSD-based metric for 3D chemical structure similarity. mobbRMSD is formulated in molecular-oriented coordinates and uses the branch-and-bound method to obtain an exact solution. It can handle large and complex chemical systems, such as molecular liquids, solvation of solutes, and self-assembly of large molecules, which are difficult to handle with conventional methods.
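For reference, here is a sketch of the conventional RMSD with a fixed atom correspondence. It uses the Kabsch algorithm: remove translation by centering, then find the optimal rotation from an SVD. This is a minimal NumPy illustration. It assumes both inputs are (N, d) arrays whose rows are already paired atom by atom. It is not mobbRMSD's implementation, and the name `kabsch_rmsd` is hypothetical.

```python
import numpy as np


def kabsch_rmsd(x, y):
    """RMSD between paired point sets x and y (shape (N, d)) after the
    optimal translation and rotation of y onto x (Kabsch algorithm)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Optimal translation: superpose the centroids.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    # SVD of the d x d covariance matrix between the centered sets.
    u, _, vt = np.linalg.svd(yc.T @ xc)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    corr = np.eye(x.shape[1])
    if np.linalg.det(vt.T @ u.T) < 0:
        corr[-1, -1] = -1.0
    r = vt.T @ corr @ u.T
    # Apply R to every row of yc and average the squared residuals.
    diff = xc - yc @ r.T
    return float(np.sqrt((diff ** 2).sum() / len(x)))
```

mobbRMSD goes beyond this fixed-correspondence RMSD by also minimizing over molecule and atom permutations.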

Define molecular-oriented coordinates as follows:

$$
\mathbf{X} = \left\{ \mathbf{X}_{1},\mathbf{X}_{2},\ldots,\mathbf{X}_{K} \right\},
$$

where $\mathbf{X}_{k}\in\mathbb{R}^{d\times n^{(k)}\times M^{(k)}}$ holds the coordinates of the $k$-th homologous molecular assembly, $d$ is the number of spatial dimensions, and $M^{(k)}$ and $n^{(k)}$ are the number of molecules and the number of atoms per molecule, respectively.
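To illustrate this layout, a flat list of atom coordinates can be split into per-species arrays $\mathbf{X}_k$ of shape $(d, n^{(k)}, M^{(k)})$. The sketch below makes three assumptions: atoms are stored species by species and molecule by molecule, each species is given as an `(n_apm, n_mol)` pair, and NumPy is available. `to_molecular_oriented` is a hypothetical helper, not part of mobbrmsd.

```python
import numpy as np


def to_molecular_oriented(coords, mols):
    """Split (N_atoms, d) coordinates into arrays X_k of shape (d, n_k, M_k).

    mols: list of (n_apm, n_mol) pairs, one per species, in storage order.
    """
    coords = np.asarray(coords, dtype=float)
    blocks = []
    start = 0
    for n_apm, n_mol in mols:
        stop = start + n_apm * n_mol
        # (M_k * n_k, d) -> (M_k, n_k, d) -> (d, n_k, M_k)
        block = coords[start:stop].reshape(n_mol, n_apm, -1).transpose(2, 1, 0)
        blocks.append(block)
        start = stop
    if start != len(coords):
        raise ValueError("atom count does not match the species list")
    return blocks
```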

For a pair of molecular-oriented coordinates $\mathbf{X}$ and $\mathbf{X}'$ in which each species $k$ consists of $M^{(k)}$ molecules of $n^{(k)}$ atoms, the moRMSD is defined as follows:

$$
\text{moRMSD}\left(\mathbf{X},\mathbf{X}'\right) = \min_{\mathbf{R},c,\mu^{(k)},\nu_{I}^{(k)}} \sqrt{\frac{1}{\sum_{k=1}^{K}M^{(k)}n^{(k)}} \sum_{k=1}^{K} \sum_{I=1}^{M^{(k)}} \sum_{j=1}^{n^{(k)}} \left\|{x}^{(k)}_{j,I}-\mathbf{R}{{x}'}^{(k)}_{\nu^{(k)}_{I}(j),\mu^{(k)}(I)}-{c}\right\|^2},
$$
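To make the definition concrete, a brute-force evaluation for tiny systems can enumerate every $\mu^{(k)}$ and every choice of $\nu^{(k)}_I$. For each combination, it solves the inner minimization over $\mathbf{R}$ and $c$ with the Kabsch algorithm. This is a naive reference sketch, not the branch-and-bound method used by mobbRMSD. It assumes each species is given as an array of shape $(M^{(k)}, n^{(k)}, d)$, which is molecule-major rather than the $(d, n^{(k)}, M^{(k)})$ layout above. It also assumes the allowed atom permutations are 0-based tuples that include the identity. All names are hypothetical.

```python
import itertools

import numpy as np


def _kabsch_rmsd(x, y):
    """RMSD after optimally rotating/translating y onto x (rows paired)."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    u, _, vt = np.linalg.svd(yc.T @ xc)
    corr = np.eye(x.shape[1])
    if np.linalg.det(vt.T @ u.T) < 0:  # avoid improper rotations
        corr[-1, -1] = -1.0
    r = vt.T @ corr @ u.T
    return float(np.sqrt(((xc - yc @ r.T) ** 2).sum() / len(x)))


def brute_force_mormsd(X, Xp, syms):
    """Exhaustive moRMSD for tiny systems.

    X, Xp: lists over species k of arrays shaped (M_k, n_k, d).
    syms:  lists over species k of allowed 0-based atom permutations
           (tuples), each list including the identity.
    """
    per_species = []
    for xk, sk in zip(X, syms):
        m = len(xk)
        # Every pair (mu, (nu_1, ..., nu_M)) for this species.
        per_species.append(list(itertools.product(
            itertools.permutations(range(m)),
            itertools.product(sk, repeat=m))))
    best = np.inf
    for choice in itertools.product(*per_species):
        ref, tgt = [], []
        for xk, xpk, (mu, nus) in zip(X, Xp, choice):
            for i, nu in enumerate(nus):
                ref.append(xk[i])                 # atoms j of molecule I in X
                tgt.append(xpk[mu[i]][list(nu)])  # atoms nu_I(j) of molecule mu(I) in X'
        best = min(best, _kabsch_rmsd(np.concatenate(ref), np.concatenate(tgt)))
    return best
```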

where $x^{(k)}_{j,I}$ and ${x'}^{(k)}_{j,I}$ are the Cartesian coordinates of the $j$-th atom in the $I$-th molecule of molecular species $k$ in $\mathbf{X}$ and $\mathbf{X}'$, respectively, $c$ is a translation vector, and $\mathbf{R}$ is a rotation matrix. $\nu^{(k)}_{I}$ and $\mu^{(k)}$ are permutations on $\{1,\ldots,n^{(k)}\}$ and $\{1,\ldots,M^{(k)}\}$, respectively, and $\nu^{(k)}_{I}$ is restricted to the permutations allowed by the molecular topology. Because $\mu^{(k)}$ and $\nu^{(k)}_{I}$ enlarge the solution space factorially and exponentially in $M^{(k)}$, respectively, finding the solution by brute force is difficult when the number of molecules is large.
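The size of the discrete search space can be counted directly. Consider species $k$ with $s_k$ allowed intramolecular permutations, including the identity. It has $M^{(k)}!$ choices of $\mu^{(k)}$ and $s_k^{M^{(k)}}$ choices of $(\nu^{(k)}_1,\ldots,\nu^{(k)}_{M^{(k)}})$. The hypothetical helper below multiplies these counts over all species.

```python
from math import factorial


def search_space_size(species):
    """Number of discrete (mu, nu) combinations in the moRMSD minimization.

    species: iterable of (M_k, s_k) pairs, where s_k counts the allowed
    intramolecular permutations of species k, identity included.
    """
    total = 1
    for m, s in species:
        # M_k! molecule permutations times s_k**M_k per-molecule atom permutations
        total *= factorial(m) * s ** m
    return total
```

Take one hydrogen fluoride and four waters with the hydrogen swap allowed. `search_space_size([(1, 1), (4, 2)])` gives 4! × 2^4 = 384. Ten such waters already give 10! × 2^10 = 3,715,891,200.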

mobbRMSD mitigates this difficulty in practice by using the branch-and-bound method. See Background and Benchmark for details.
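The pruning idea behind branch-and-bound can be illustrated on the simpler linear assignment problem, which has a fixed cost matrix and no rotation. A partial assignment is abandoned as soon as an optimistic lower bound on its completion cannot beat the best complete solution found so far. Because only hopeless branches are cut, the result stays exact. This is a generic sketch using a simple row-minimum bound. It is not mobbRMSD's algorithm or bound, and the function name is hypothetical.

```python
import math


def assignment_branch_and_bound(cost):
    """Exact minimum-cost assignment of rows to columns of a square cost
    matrix via depth-first branch-and-bound. Returns (cost, columns)."""
    n = len(cost)
    best = [math.inf, None]  # incumbent cost and assignment

    def lower_bound(row, used, partial):
        # Optimistic bound: every remaining row takes its cheapest free column.
        bound = partial
        for r in range(row, n):
            bound += min(cost[r][c] for c in range(n) if c not in used)
        return bound

    def branch(row, used, partial, assign):
        if row == n:
            if partial < best[0]:
                best[0], best[1] = partial, assign[:]
            return
        # Prune: no completion of this partial assignment can beat the incumbent.
        if lower_bound(row, used, partial) >= best[0]:
            return
        for c in range(n):
            if c not in used:
                used.add(c)
                assign.append(c)
                branch(row + 1, used, partial + cost[row][c], assign)
                used.remove(c)
                assign.pop()

    branch(0, set(), 0.0, [])
    return best[0], best[1]
```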


Getting Started

Prerequisites

  • gfortran >= 9.4.0
  • OpenBLAS (optional)
  • OpenMP (optional)

To use the Python interface, you additionally need the following:

  • python >= 3.8
  • pip

Installation

You can build and install the package via

pip install git+https://github.com/yymmt742/mobbrmsd.git

You can run some demonstrations via

python -m mobbrmsd demo


Usage

The input is in JSON format. A simple example is as follows:

{
  "reference":"./path/to/file1.pdb",
  "target":"./path/to/file2.xyz",
  "mols":[
    {
     "n_apm":2,
     "n_mol":1,
     "name":"HydrogenFluoride"
    },
    {
     "n_apm":3,
     "n_mol":4,
     "sym":[[ 1, 3, 2]],
     "name":"Water"
    }
  ]
}
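If inputs are generated programmatically, the same structure can be written with Python's standard `json` module. This sketch uses only the keys shown in the example above. `build_input` is a hypothetical helper, and species without symmetry simply omit the `sym` key.

```python
import json


def build_input(reference, target, mols):
    """Assemble input JSON text from (name, n_apm, n_mol, sym) tuples.

    sym=None omits the "sym" key for that species.
    """
    entries = []
    for name, n_apm, n_mol, sym in mols:
        entry = {"n_apm": n_apm, "n_mol": n_mol, "name": name}
        if sym is not None:
            entry["sym"] = sym
        entries.append(entry)
    return json.dumps(
        {"reference": reference, "target": target, "mols": entries}, indent=2
    )


# Reproduces the example above: one hydrogen fluoride and four waters.
text = build_input(
    "./path/to/file1.pdb",
    "./path/to/file2.xyz",
    [("HydrogenFluoride", 2, 1, None), ("Water", 3, 4, [[1, 3, 2]])],
)
```

The resulting text can be saved to a file and passed via `-i`.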

In this example, the system contains one hydrogen fluoride molecule and four water molecules. For water, the intramolecular permutation that swaps the two hydrogen atoms is specified. sym is a list of index arrays (each a list[int]), each giving one intramolecular permutation in substitution form. The identity permutation (i.e., [0, 1, ..., n_apm-1]) is ignored if included.
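The meaning of an entry in sym can be illustrated by applying it to a molecule's atom list. This sketch assumes a permutation sends new position j to old atom perm[j], and it treats indices as 0-based by default. The example above writes `[1, 3, 2]`, while the identity is written `[0, 1, ..., n_apm-1]`. The hypothetical `one_based` switch therefore lets the sketch avoid assuming which convention mobbrmsd uses. Both function names are hypothetical.

```python
def allowed_permutations(n_apm, sym, one_based=False):
    """Identity plus the permutations listed in sym, as deduplicated
    0-based tuples. A listed identity is absorbed, not duplicated."""
    identity = tuple(range(n_apm))
    perms = [identity]
    for p in sym:
        q = tuple(i - 1 for i in p) if one_based else tuple(p)
        if sorted(q) != list(identity):
            raise ValueError(f"not a permutation of {n_apm} atoms: {p}")
        if q not in perms:
            perms.append(q)
    return perms


def permute_atoms(molecule, perm):
    """New atom j takes the coordinates (or label) of old atom perm[j]."""
    return [molecule[i] for i in perm]
```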

The file-loading backend is MDTraj; see its documentation for supported formats. Only coordinates are read from the file; information such as residues and atom types is not used. The file must contain the Cartesian coordinates in the order specified in the JSON file, as follows:

> cat file1.pdb
ATOM      1  F   HF  A   0       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           F
ATOM      2  H   HF  A   0       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           H
ATOM      3  OH  WAT A   1       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           O
ATOM      4  H1  WAT A   1       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           H
ATOM      5  H2  WAT A   1       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           H
ATOM      6  OH  WAT A   2       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           O
ATOM      7  H1  WAT A   2       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           H
ATOM      8  H2  WAT A   2       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           H
ATOM      9  OH  WAT A   3       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           O
ATOM     10  H1  WAT A   3       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           H
ATOM     11  H2  WAT A   3       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           H
ATOM     12  OH  WAT A   4       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           O
ATOM     13  H1  WAT A   4       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           H
ATOM     14  H2  WAT A   4       X.XXX   Y.YYY   Z.ZZZ  1.00  0.00           H  
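Because only coordinates are read, a mismatch between the file and the `mols` list is easy to miss. The small pre-check below computes the expected number of atoms. It also gives the (species, molecule, atom) label of every coordinate row in file order. The helper names are hypothetical.

```python
def expected_atom_count(config):
    """Atoms implied by config["mols"]: the sum of n_apm * n_mol."""
    return sum(m["n_apm"] * m["n_mol"] for m in config["mols"])


def atom_order(config):
    """(name, molecule index, atom index) for each coordinate row, in order."""
    return [
        (m["name"], i, j)
        for m in config["mols"]
        for i in range(m["n_mol"])
        for j in range(m["n_apm"])
    ]
```

With MDTraj installed, `mdtraj.load("file1.pdb").n_atoms` can be compared against `expected_atom_count(config)`, where `config` is loaded with `json.load`. For the example above, the expected count is 2 × 1 + 3 × 4 = 14, matching the 14 ATOM records.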

You can run the calculations with the following commands:

python -m mobbrmsd run -i <input>
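From a Python script, the same command can be launched with the standard `subprocess` module. This sketch only wraps the documented command line and returns its standard output as text. It makes no assumption about the output format, and the helper names are hypothetical.

```python
import subprocess
import sys


def mobbrmsd_command(input_path):
    """Argument list for `python -m mobbrmsd run -i <input>` with the
    current interpreter."""
    return [sys.executable, "-m", "mobbrmsd", "run", "-i", str(input_path)]


def run_mobbrmsd(input_path):
    # check=True raises CalledProcessError on a non-zero exit status.
    result = subprocess.run(
        mobbrmsd_command(input_path), check=True, capture_output=True, text=True
    )
    return result.stdout
```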


Background


Benchmark


Roadmap

  • Add Usage
  • Enable autovariance sorting
  • Enable skip tree
  • Compatible with compilers (intel)
  • Compatible with compilers (nv)
  • Add detailed documentation
  • Add detailed documentation (Python interface)
  • Add benchmarks
  • Internalize LAPACK

See the open issues for a full list of proposed features (and known issues).


Contributing

This project is open source and we invite contributions. If you have a suggestion that would make this better, please fork the repo and create a pull request.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request


License

Distributed under the MIT License. See LICENSE.txt for more information.


Contact

YYMMT742 - yymmt@kuchem.kyoto-u.ac.jp


