ArbAlign (v2)

June 19, 2026

Optimal alignment of arbitrarily ordered molecular isomers using the Kuhn-Munkres / Hungarian algorithm and Kabsch RMSD.

Background

When two molecular structures are isomers of each other, their atoms may be listed in any order. A naïve atom-by-atom RMSD is then meaningless unless the atoms are first matched optimally. ArbAlign:

  1. Groups atoms by element (or, optionally, by SYBYL type or MNA connectivity).
  2. For each group, finds the optimal one-to-one atom assignment via the Kuhn-Munkres (Hungarian) algorithm (scipy.optimize.linear_sum_assignment).
  3. Tries all 48 combinations of axis permutations (6) and sign flips (8) to escape local minima that arise from symmetric point groups.
  4. Applies the full Kabsch rotation to superpose the aligned structure onto the reference.
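The four steps above can be sketched with NumPy and SciPy as follows. This is a minimal illustration under stated assumptions, not ArbAlign's actual implementation: the function names (`kabsch_superpose`, `match_atoms`, `align_sketch`) are hypothetical, the squared-distance cost matrix is an assumption, and the sketch assumes the two structures already share a common axis frame up to axis swaps and sign flips.

```python
import itertools

import numpy as np
from scipy.optimize import linear_sum_assignment


def kabsch_superpose(P, Q):
    """Rotate and translate P (N x 3) onto Q (N x 3); return (rmsd, fitted P)."""
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - p0, Q - q0
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Force a proper rotation (det = +1) so the fit never mirrors P.
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    P_fit = Pc @ R
    rmsd = float(np.sqrt(((P_fit - Qc) ** 2).sum() / len(P)))
    return rmsd, P_fit + q0


def match_atoms(labels_a, coords_a, labels_b, coords_b):
    """Return perm so that atom perm[i] of B is assigned to atom i of A."""
    perm = np.empty(len(labels_a), dtype=int)
    for label in sorted(set(labels_a)):
        idx_a = np.array([i for i, l in enumerate(labels_a) if l == label])
        idx_b = np.array([i for i, l in enumerate(labels_b) if l == label])
        # Cost = squared distance between every A/B pair sharing this label
        # (an assumed cost function for this sketch).
        diff = coords_a[idx_a][:, None, :] - coords_b[idx_b][None, :, :]
        rows, cols = linear_sum_assignment((diff ** 2).sum(axis=2))
        perm[idx_a[rows]] = idx_b[cols]
    return perm


def align_sketch(labels_a, coords_a, labels_b, coords_b):
    """Search the 48 swap/flip combinations; return the best candidate."""
    A = coords_a - coords_a.mean(axis=0)
    B = coords_b - coords_b.mean(axis=0)
    best = (np.inf, None, None, None, None)
    for swap in itertools.permutations(range(3)):
        for reflection in itertools.product((1, -1), repeat=3):
            B_try = B[:, list(swap)] * np.asarray(reflection)
            perm = match_atoms(labels_a, A, labels_b, B_try)
            # B_try[perm] is B reordered to follow A's atom order.
            rmsd, fitted = kabsch_superpose(B_try[perm], coords_a)
            if rmsd < best[0]:
                best = (rmsd, swap, reflection, perm, fitted)
    return best
```

Calling `align_sketch(labels_a, coords_a, labels_b, coords_b)` returns the lowest RMSD found together with the swap, the reflection, the atom assignment, and the superposed coordinates (here in A's atom order, unlike `result.aligned`, which keeps B's order).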

If you use ArbAlign in published research, please cite:

Berhane Temelso, Joel M. Mabey, Toshiro Kubota, Nana Appiah-Padi, George C. Shields. J. Chem. Inf. Model. 2017, 57(5), 1045–1054. https://doi.org/10.1021/acs.jcim.6b00546


Installation

# Core (no optional dependencies)
pip install -e .

# With SYBYL-type and MNA connectivity labelling
pip install -e ".[openbabel]"

# Development (adds pytest)
pip install -e ".[dev]"

Requires Python 3.6+, NumPy ≥ 1.17, and SciPy ≥ 1.3.


Quick start

# Full search (48 axis combos, element-label matching — default)
arbalign Mol-A.xyz Mol-B.xyz

# Fast mode (skip axis search)
arbalign -s Mol-A.xyz Mol-B.xyz

# Ignore hydrogens
arbalign -n Mol-A.xyz Mol-B.xyz

# Match by SYBYL atom type (requires openbabel-wheel)
arbalign -b t Mol-A.xyz Mol-B.xyz

# Match by MNA connectivity (requires openbabel-wheel)
arbalign -b c Mol-A.xyz Mol-B.xyz

# Verbose output (print every candidate RMSD)
arbalign -v Mol-A.xyz Mol-B.xyz

You can also invoke the command-line interface as a module via python -m arbalign.


Python API

from arbalign import Molecule, align

mol_a = Molecule.from_xyz("Mol-A.xyz")
mol_b = Molecule.from_xyz("Mol-B.xyz")

result = align(mol_a, mol_b)
print(f"Best RMSD: {result.best_rmsd:.3f} Å")
print(f"Swap:      {result.swap}")
print(f"Reflect:   {result.reflection}")

# result.aligned is a Molecule in the original B atom order,
# with coordinates superposed onto A.
result.aligned.to_xyz("Mol-B-aligned.xyz")

Molecule

| Method / attribute | Description |
|---|---|
| Molecule.from_xyz(path, no_hydrogens=False) | Read XYZ file |
| mol.to_xyz(path, title=None) | Write XYZ file |
| mol.labels | list[str] of element/type labels |
| mol.coords | ndarray shape (N, 3) |
| mol.element_counts() | {element: count} |
| mol.unique_elements() | sorted unique labels |
| mol.indices_of(element) | indices matching label |
| mol.centroid() | geometric centroid |
| mol.centered() | copy translated to origin |
| mol.with_labels(new_labels) | copy with different labels |
| mol.sorted_copy() | (sorted_mol, orig_indices) |
| mol.validate_compatible(other) | raises ValueError on mismatch |
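To illustrate why a method like `sorted_copy()` returns the original indices alongside the sorted molecule, here is a small standalone sketch. It does not use the arbalign package, and the helper names `sorted_by_label` and `restore_order` are hypothetical. It sorts atoms by label and then uses the saved indices to put coordinates back in their original order.

```python
import numpy as np


def sorted_by_label(labels, coords):
    """Return (sorted_labels, sorted_coords, orig_indices), sorted by label."""
    labels = np.asarray(labels)
    # A stable sort keeps atoms with the same label in their original order.
    orig_indices = np.argsort(labels, kind="stable")
    return labels[orig_indices], coords[orig_indices], orig_indices


def restore_order(sorted_coords, orig_indices):
    """Undo the sort: row k of sorted_coords returns to row orig_indices[k]."""
    restored = np.empty_like(sorted_coords)
    restored[orig_indices] = sorted_coords
    return restored
```

Keeping the index array makes the sort reversible, which is one way a result can be reported in the input atom order after working on a sorted copy.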

align

align(mol_a, mol_b, simple=False, verbose=False) -> AlignResult

| AlignResult attribute | Description |
|---|---|
| initial_rmsd | Kabsch RMSD before any reordering |
| sorted_rmsd | Kabsch RMSD after sorting both by element |
| best_rmsd | Kabsch RMSD after optimal reordering + axes search |
| swap | Best axis permutation tuple e.g. (0, 2, 1) |
| reflection | Best sign-flip tuple e.g. (-1, 1, 1) |
| aligned | Molecule in original B order, superposed onto A |
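As a reading aid for the swap and reflection tuples, the sketch below shows one plausible interpretation: permute the x/y/z columns by `swap`, then multiply each column by the matching sign in `reflection`. The order of the two operations and the helper name `apply_isometry` are assumptions made for illustration, so check `align.py` for the convention ArbAlign actually uses.

```python
import itertools

import numpy as np


def apply_isometry(coords, swap, reflection):
    """Permute the x/y/z columns by `swap`, then flip signs by `reflection`."""
    return coords[:, list(swap)] * np.asarray(reflection)


# 6 axis permutations x 8 sign patterns = 48 candidate transformations.
isometries = [
    (swap, reflection)
    for swap in itertools.permutations(range(3))
    for reflection in itertools.product((1, -1), repeat=3)
]


def is_proper(swap, reflection):
    """True if the transformation is a pure rotation (determinant +1)."""
    matrix = np.eye(3)[:, list(swap)] * np.asarray(reflection)
    return round(float(np.linalg.det(matrix))) == 1
```

Under this convention, `apply_isometry(coords, (0, 2, 1), (-1, 1, 1))` maps a point (x, y, z) to (-x, z, y). Half of the 48 combinations are proper rotations and the other half are mirror images.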

Running tests

pytest tests/ -v

Repository layout

Arbalign-improved/
├── arbalign/
│   ├── __init__.py      # public API exports
│   ├── __main__.py      # python -m arbalign entry point
│   ├── cli.py           # argparse CLI (replaces ArbAlign-driver.py + ArbAlign.py)
│   ├── molecule.py      # Molecule class and XYZ I/O
│   ├── core.py          # Kabsch RMSD / superpose / rotation
│   ├── align.py         # Kuhn-Munkres + 48-isometry search
│   └── labeling.py      # SYBYL / MNA relabelling via OpenBabel
├── tests/
│   ├── conftest.py
│   ├── test_core.py
│   ├── test_molecule.py
│   ├── test_align.py
│   └── fixtures/        # Mol-A.xyz, Mol-B.xyz, 10-1.xyz, 10-2.xyz
└── pyproject.toml