Molecule configuration similarity database - ProkopHapala/FireCore GitHub Wiki

Introduction

To find the global minimum structure (i.e. global optimization) or to map the potential energy surface (PES) of a molecule, e.g. to estimate the configurational entropy or to prepare training configurations for fitting a force field, we need to evaluate a large number of molecular configurations. Most configuration-sampling algorithms visit the same regions of configuration space repeatedly, so it is useful to be able to check whether a new configuration is similar to those already sampled. This ability can significantly speed up some global optimization and configuration-sampling algorithms, and it helps to build more balanced and representative statistics of the PES (for force-field training and entropy estimation).

However, robust and accurate comparison of two general structures can be rather demanding. Since configuration sampling with a classical force field can easily generate millions of different structures, comparing each new structure with all structures held in memory may easily become a bottleneck. It therefore makes sense to think about accelerated schemes for matching structures and querying a database. These algorithms differ significantly depending on the assumptions made about symmetries and invariant degrees of freedom of the molecule.

Orientation invariance (molecule vs lattice)

Assuming a molecule with fixed topology and fixed ordering of atoms, we can calculate the distance between two molecular configurations $A$, $B$ simply as the Euclidean distance between the Cartesian atomic positions ${\vec r_i}$.

$$ \Delta_e(A,B) = \sqrt{ \sum_i{ | {\vec r}^A_{i} - {\vec r}^B_{i} |^2 }} $$
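As a minimal sketch of the formula above, the distance can be computed directly from two coordinate lists. The function name `euclidean_distance` and the plain list-of-tuples layout are illustrative assumptions, not part of FireCore.

```python
import math

def euclidean_distance(A, B):
    """Delta_e(A,B) for two configurations with identical atom ordering."""
    if len(A) != len(B):
        raise ValueError("configurations must have the same number of atoms")
    # Sum of squared differences over all atoms and all Cartesian components
    return math.sqrt(sum((a - b) ** 2
                         for ra, rb in zip(A, B)
                         for a, b in zip(ra, rb)))

# Two-atom toy example: the second atom moves from the x axis to the y axis
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
B = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
d_AB = euclidean_distance(A, B)   # sqrt(2)

# The same structure merely shifted by 1 along z already gives a non-zero distance
A_shifted = [(x, y, z + 1.0) for x, y, z in A]
d_shift = euclidean_distance(A, A_shifted)
```

Note that `d_shift` is non-zero although `A` and `A_shifted` are physically the same configuration.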

However, physical properties such as the energy of a molecule floating freely in vacuum do not depend on its position and orientation in space. A physically meaningful similarity metric should therefore not depend on them either. This can be achieved by bringing both molecules into the same orientation, which can be done in several ways:

  • Rigid fit of atomic coordinates - we can simply minimize the distance with respect to a translation vector $\vec d$ and a rotation matrix $\hat R$.

$$ \Delta_e(A,B) = \min_{ {\vec d},{\hat R} } \sqrt{ \sum_i{ | {\vec r}^A_{i} + {\vec d} - {\hat R}{\vec r}^B_{i} |^2 }} $$

This can also be done inside the VMD program. Nevertheless, it is a relatively slow iterative procedure.
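One possible way to carry out such a rigid fit is sketched below. It is not the procedure used by VMD or FireCore; it swaps in Horn's quaternion formulation, in which the optimal rotation is the dominant eigenvector of a symmetric 4x4 matrix built from the centered coordinates, and it finds that eigenvector with a simple shifted power iteration. All names (`centered`, `optimal_rotation`, `rigid_fit_distance`), the starting vector and the iteration limits are assumptions for illustration. Centering both structures on their centroids takes care of the translation $\vec d$.

```python
import math

def centered(X):
    """Shift coordinates so that their centroid is at the origin."""
    n = len(X)
    c = [sum(r[k] for r in X) / n for k in range(3)]
    return [[r[k] - c[k] for k in range(3)] for r in X]

def optimal_rotation(a, b, max_iter=1000, tol=1e-15):
    """Rotation matrix R minimizing sum_i |a_i - R b_i|^2 (a, b already centered)."""
    # Correlation matrix S[j][k] = sum_i b_i[j] * a_i[k]
    S = [[sum(bi[j] * ai[k] for ai, bi in zip(a, b)) for k in range(3)]
         for j in range(3)]
    (Sxx, Sxy, Sxz), (Syx, Syy, Syz), (Szx, Szy, Szz) = S
    # Horn's symmetric 4x4 matrix; its dominant eigenvector is the unit quaternion
    N = [
        [Sxx + Syy + Szz, Syz - Szy, Szx - Sxz, Sxy - Syx],
        [Syz - Szy, Sxx - Syy - Szz, Sxy + Syx, Szx + Sxz],
        [Szx - Sxz, Sxy + Syx, -Sxx + Syy - Szz, Syz + Szy],
        [Sxy - Syx, Szx + Sxz, Syz + Szy, -Sxx - Syy + Szz],
    ]
    # Shifting by the Frobenius norm makes all eigenvalues non-negative,
    # so power iteration converges to the largest one
    shift = math.sqrt(sum(v * v for row in N for v in row))
    q = [1.0, 0.0, 0.0, 0.0]            # identity rotation as fallback
    if shift > 0.0:
        q = [1.0, 0.3, 0.2, 0.1]        # arbitrary (assumed) starting vector
        for _ in range(max_iter):
            p = [sum(N[j][k] * q[k] for k in range(4)) + shift * q[j]
                 for j in range(4)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            converged = max(abs(u - v) for u, v in zip(p, q)) < tol
            q = p
            if converged:
                break
    q0, qx, qy, qz = q
    # Rotation matrix of the unit quaternion (q0, qx, qy, qz)
    return [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]

def rigid_fit_distance(A, B):
    """Distance between A and B after removing translation and rotation."""
    a, b = centered(A), centered(B)
    R = optimal_rotation(a, b)
    total = 0.0
    for ai, bi in zip(a, b):
        for j in range(3):
            rb = sum(R[j][k] * bi[k] for k in range(3))
            total += (ai[j] - rb) ** 2
    return math.sqrt(total)

# Toy check: A is B rotated about the z axis and shifted, so the fit should be near zero
B = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
angle = 0.7
A = [(math.cos(angle) * x - math.sin(angle) * y + 5.0,
      math.sin(angle) * x + math.cos(angle) * y - 3.0,
      z + 2.0) for x, y, z in B]
d_fit = rigid_fit_distance(A, B)

# A genuinely different shape (last atom displaced) keeps a finite distance
B_distorted = B[:3] + [(0.0, 0.0, 4.0)]
d_distorted = rigid_fit_distance(B_distorted, B)
```

The power iteration only approximates the optimum, and its convergence slows down when the two largest eigenvalues are close, so the iteration limit may need tuning for nearly symmetric structures.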

  • Moment matching - a faster way to obtain an approximate match is to calculate low moments of the atomic positions, such as
  1. the center of mass
  2. the principal axes of the ellipsoid of second moments (obtained by diagonalizing the matrix of second moments of the positions). This can be done independently for different atom types (e.g. carbon, oxygen, nitrogen).
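The moments listed above could be computed along the following lines. This is a sketch under assumptions: all atoms get unit mass (so the center of mass reduces to the plain centroid), the 3x3 matrix is diagonalized with cyclic Jacobi rotations to avoid external dependencies, and the helper names `second_moments`, `principal_axes` and `principal_moments_by_element` are hypothetical.

```python
import math

def second_moments(X):
    """Centroid and 3x3 matrix of second moments of the positions about it."""
    n = len(X)
    c = [sum(r[k] for r in X) / n for k in range(3)]
    M = [[sum((r[j] - c[j]) * (r[k] - c[k]) for r in X) / n for k in range(3)]
         for j in range(3)]
    return c, M

def matmul(P, Q):
    """Product of two 3x3 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def principal_axes(M, sweeps=50):
    """Eigenvalues (ascending) and unit eigenvectors of a symmetric 3x3 matrix."""
    A = [row[:] for row in M]
    V = [[float(i == j) for j in range(3)] for i in range(3)]
    scale = sum(v * v for row in A for v in row)
    for _ in range(sweeps):
        off = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
        if off <= 1e-30 * scale:
            break
        for p, q in ((0, 1), (0, 2), (1, 2)):
            # Rotation angle (|theta| <= pi/4) that zeroes the element A[p][q]
            d = A[q][q] - A[p][p]
            theta = 0.5 * math.atan2(2.0 * A[p][q] * math.copysign(1.0, d), abs(d))
            c, s = math.cos(theta), math.sin(theta)
            J = [[float(i == j) for j in range(3)] for i in range(3)]
            J[p][p] = J[q][q] = c
            J[p][q], J[q][p] = s, -s
            Jt = [[J[j][i] for j in range(3)] for i in range(3)]
            A = matmul(Jt, matmul(A, J))   # A <- J^T A J
            V = matmul(V, J)               # accumulate eigenvectors as columns
    order = sorted(range(3), key=lambda k: A[k][k])
    values = [A[k][k] for k in order]
    axes = [[V[i][k] for i in range(3)] for k in order]
    return values, axes

def principal_moments_by_element(elements, X):
    """Sorted eigenvalues of the second-moment matrix, separately per element."""
    result = {}
    for el in sorted(set(elements)):
        sub = [r for e, r in zip(elements, X) if e == el]
        _, M = second_moments(sub)
        result[el] = principal_axes(M)[0]
    return result

# Toy check: a rotated and shifted copy should give the same principal moments
elements = ["C", "C", "C", "O", "O"]
X = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0),
     (0.5, -1.0, 0.8), (3.0, 1.5, -0.6)]
angle = 0.7
Y = [(math.cos(angle) * x - math.sin(angle) * y + 5.0,
      math.sin(angle) * x + math.cos(angle) * y - 3.0,
      z + 2.0) for x, y, z in X]
desc_X = principal_moments_by_element(elements, X)
desc_Y = principal_moments_by_element(elements, Y)
```

The eigenvalues are unchanged by translation and rotation, so they can serve as a cheap first comparison. The axes themselves are defined only up to sign (and are not unique when eigenvalues coincide), which has to be resolved before they are used to orient two molecules in a common frame.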

Fixed topology and permutation invariance

Molecular symmetry