File formats - noinil/genesis_cg_tool GitHub Wiki

Table of Contents

  1. Topology file
    1. Syntax
    2. Directives
    3. Nonlocal interactions
  2. Molecule .itp file
    1. Directives
    2. Detailed formats of each block
      1. Atoms
      2. Bonds
      3. Angles
      4. Dihedrals
      5. Pairs
      6. PWMcos type of protein-DNA sequence-specific interactions
      7. Protein-DNA sequence-nonspecific hydrogen-bond
      8. IDR region information for HPS and KH models
  3. Coordinate file
  4. References

We adopted a GROMACS-like format for the topology and coordinate files used in CG simulations in GENESIS. Here we briefly describe these formats so that you can edit the files if necessary. Please pay attention to the differences between the GENESIS version and the original GROMACS format.

Topology file

The topology (.top) file can be generated by the genesis_cg_tools. In this file we define the molecular composition and the inter-molecular interactions. A typical .top file looks like this:

; common interaction parameters for CG models
#include "./param/atom_types.itp"
#include "./param/flexible_local_angle.itp"
#include "./param/flexible_local_dihedral.itp"
#include "./param/pair_energy_MJ_96.itp"

; Molecule topology (itp)
#include "./top/protein_A.itp"

[ system ]
protein A CG simulation

[ molecules ]
protein_A  100

[ cg_ele_chain_pairs ]
ON 1 - 100 : 1 - 100

Syntax

As can be seen, there are different types of lines in this file:

  • Comments: starting with ";".
  • "#include" lines: to include general parameters as well as the molecule-specific .itp files. Note that the path should be either absolute or relative to the ".top" file's directory.
  • Directives: encompassed by "[" and "]", indicating a specific block.
  • Normal lines: detailed information.

Warning: the .itp files included by the first four "#include" lines contain the general and default parameters of the existing CG models. Please DO NOT modify these files unless you are sure of what you are doing.
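The four line types listed above can be told apart with a few string checks. The following is a minimal sketch, not code from GENESIS or genesis_cg_tool: the function name `classify_line` and the returned labels are hypothetical, and it assumes that a comment occupies a whole line.

```python
import re

# Hypothetical helper for illustration; not part of GENESIS or genesis_cg_tool.
DIRECTIVE_RE = re.compile(r"^\[\s*(\w+)\s*\]$")      # spaces inside [ ] are optional
INCLUDE_RE = re.compile(r'^#include\s+"([^"]+)"$')


def classify_line(line):
    """Return a (kind, payload) tuple for one line of a .top or .itp file."""
    text = line.strip()
    if not text:
        return ("blank", None)
    if text.startswith(";"):
        # Assumption: a comment occupies the whole line.
        return ("comment", text[1:].strip())
    match = INCLUDE_RE.match(text)
    if match:
        return ("include", match.group(1))
    match = DIRECTIVE_RE.match(text)
    if match:
        return ("directive", match.group(1))
    # Anything else is a normal content line, split on whitespace.
    return ("data", text.split())


example = [
    "; Molecule topology (itp)",
    '#include "./top/protein_A.itp"',
    "[ molecules ]",
    "protein_A  100",
]
for raw in example:
    print(classify_line(raw))
```

A reader built on such a classifier would typically remember the most recent directive and interpret each following content line according to that block.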

Directives

In the following table we list the available directives:

| Directive | Type | Information |
|---|---|---|
| [ system ] | mandatory | System name. |
| [ molecules ] | mandatory | Molecule name-molecule number pairs. |
| [ xxx_chain_pairs ] | optional | Nonlocal interactions ("xxx" can be "cg_ele", "cg_KH", "pwmcos", or "pwmcosns"). |

Note: the directive format is flexible, and spaces inside the square brackets are optional.

Nonlocal interactions

The content lines of all [ xxx_chain_pairs ] blocks share the same format: SWITCH i - j : k - l (for inter- and intra-molecular interactions) or SWITCH i - j (for intra-molecular interactions only), where "SWITCH" is either "ON" or "OFF", and i, j, k, l are integers (chain indices).

Imagine an $n \times n$ "instance matrix" $A$ for a system consisting of $n$ chains. GENESIS reads this matrix to set up the intra- and inter-molecular nonlocal interactions. Specifically, for chain $i$ and chain $j$, if $A(i, j) = 1$, the nonlocal interaction is turned on for particles in these two chains, whereas if $A(i, j) = 0$, the interaction is turned off. Each type of nonlocal interaction (for example, electrostatics or protein-DNA sequence-specific recognition) has its own instance matrix. All elements of these matrices default to 0.

Now let's see what these lines in the topology file do:

  • ON i - j : k - l sets $A(c, d) = 1$ for all $i \le c \le j$ and $k \le d \le l$;
  • OFF i - j : k - l sets $A(c, d) = 0$ for all $i \le c \le j$ and $k \le d \le l$;
  • ON i - j sets $A(c, c) = 1$ for all $i \le c \le j$;
  • OFF i - j sets $A(c, c) = 0$ for all $i \le c \le j$.
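The four rules above can be replayed on a small system to see which matrix elements end up switched on. The sketch below is only an illustration of those rules, not the GENESIS implementation: the function name `build_instance_matrix` is hypothetical, chain indices are assumed to be 1-based, lines are assumed to be applied in file order so that later lines override earlier ones, and no symmetrization of $A$ is performed because the rules above do not state one.

```python
import re

# Hypothetical helper for illustration; not the GENESIS implementation.
PAIR_LINE_RE = re.compile(
    r"^(ON|OFF)\s+(\d+)\s*-\s*(\d+)(?:\s*:\s*(\d+)\s*-\s*(\d+))?$"
)


def build_instance_matrix(n_chains, lines):
    """Apply 'SWITCH i - j [: k - l]' lines to an n x n matrix of zeros."""
    matrix = [[0] * n_chains for _ in range(n_chains)]  # default: all off
    for line in lines:
        match = PAIR_LINE_RE.match(line.strip())
        if match is None:
            raise ValueError("unrecognized chain-pair line: %r" % line)
        value = 1 if match.group(1) == "ON" else 0
        i, j = int(match.group(2)), int(match.group(3))
        if match.group(4) is None:
            # 'SWITCH i - j': only the diagonal elements A(c, c).
            for c in range(i, j + 1):
                matrix[c - 1][c - 1] = value
        else:
            # 'SWITCH i - j : k - l': the whole block A(c, d).
            k, l = int(match.group(4)), int(match.group(5))
            for c in range(i, j + 1):
                for d in range(k, l + 1):
                    matrix[c - 1][d - 1] = value
    return matrix


# Turn everything on for 4 chains, then switch off the
# intra-chain interactions of chains 2 and 3.
for row in build_instance_matrix(4, ["ON 1 - 4 : 1 - 4", "OFF 2 - 3"]):
    print(row)
```

In the example from the beginning of this page, `ON 1 - 100 : 1 - 100` fills the whole 100 x 100 matrix for the "cg_ele" interaction with ones.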

Molecule .itp file

Usually we use one .itp file to store the information of a particular molecule. A typical one looks like this:

[ moleculetype ]
;name            nrexcl
ROM_cg                3

[ atoms ]
;       nr type     resnr  res atom   cg   charge     mass
         1  THR         1  THR   CA    1    0.000  101.110
         2  LYS         2  LYS   CA    1    1.000  128.170
         3  GLN         3  GLN   CA    1    0.000  128.140
...

[ bonds ]
;        i         j    f                eq              coef
         1         2    1        3.7869E-01        8.3680E+04
         2         3    1        3.8733E-01        8.3680E+04
         3         4    1        3.8304E-01        8.3680E+04
...

[ angles ]
;        i         j         k    f             eq           coef
         1         2         3    1     8.8380E+01     1.6736E+02
         2         3         4    1     9.1330E+01     1.6736E+02
         3         4         5    1     8.8309E+01     1.6736E+02
...

[ dihedrals ]
;        i         j         k         l    f             eq           coef            w/n
         1         2         3         4   32    -1.2595E+02     4.1840E+00              1
         2         3         4         5   32    -1.3173E+02     4.1840E+00              1
         3         4         5         6   32    -1.2827E+02     4.1840E+00              1
...

[ pairs ]
;        i         j         f             eq           coef
         1         5         2     6.2602E-01     4.1840E+00
         1         6         2     8.6459E-01     4.1840E+00
         1        55         2     8.2667E-01     4.1840E+00
...

Directives

| Directive | Type | Information |
|---|---|---|
| [ moleculetype ] | mandatory | Molecule name and nrexcl (number of particles considered as local). |
| [ atoms ] | mandatory | Atom (CG particle) information. |
| [ bonds ] | optional | Bond terms and parameters. |
| [ angles ] | optional | Angle or 1-3 terms and parameters. |
| [ dihedrals ] | optional | Dihedral angle terms and parameters. |
| [ pairs ] | optional | Nonbonded terms and parameters. |
| [ cg_IDR_HPS_region ] | optional | IDR region information (the HPS IDR model). [^1] |
| [ cg_IDR_KH_region ] | optional | IDR region information (the KH IDR model). [^1] |
| [ pwmcos ] | optional | Information for the PWMcos model of protein-DNA sequence-specific interactions. [^2] |
| [ pwmcosns ] | optional | Information for the protein-DNA sequence-nonspecific hydrogen-bond interactions. [^3] |

Detailed formats of each block

Atoms

The [ atoms ] block contains the following information for each CG particle:

  • particle number;
  • particle type;
  • residue number;
  • residue name;
  • particle name;
  • particle group (always 1);
  • particle charge (unit: $e^-$);
  • particle mass (unit: $amu$ ).

printf format: "%10d%5s%10d%5s%5s%5d %8.3f %8.3f\n".
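Python's `%` operator accepts the same conversion specifiers as the printf format above, so the format string can be reused directly when writing an [ atoms ] line by hand. This is a minimal sketch; the names `format_atom_line` and `parse_atom_line` are hypothetical, and the parser assumes that every field is separated by whitespace (which holds as long as the type, residue, and particle names are shorter than five characters).

```python
# printf format of the [ atoms ] block, copied from the text above.
ATOM_FMT = "%10d%5s%10d%5s%5s%5d %8.3f %8.3f\n"


def format_atom_line(nr, ptype, resnr, resname, name, charge, mass, group=1):
    """Write one [ atoms ] line; the particle group is always 1."""
    return ATOM_FMT % (nr, ptype, resnr, resname, name, group, charge, mass)


def parse_atom_line(line):
    """Read one [ atoms ] line back, assuming whitespace-separated fields."""
    f = line.split()
    return {
        "nr": int(f[0]), "type": f[1], "resnr": int(f[2]), "resname": f[3],
        "name": f[4], "group": int(f[5]),
        "charge": float(f[6]),  # unit: e
        "mass": float(f[7]),    # unit: amu
    }


line = format_atom_line(2, "LYS", 2, "LYS", "CA", 1.0, 128.17)
print(line, end="")
print(parse_atom_line(line))
```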

Bonds

The [ bonds ] block contains the following information for each bond potential:

  • particle index i;
  • particle index j;
  • function type;
  • equilibrium value (unit: $nm$);
  • force coefficient (unit: $kJ \cdot mol^{-1} nm^{-2}$).

| Type | Interaction type |
|---|---|
| 1 | $\sum_{i} k_{i}(r_{i}-r_{i,0})^{2}$ |
| 21 | $\sum_{i} k_{i}(r_{i}-r_{i,0})^{2} + 100 k_{i}(r_{i}-r_{i,0})^{4}$ |

printf format: "%10d%10d%5d%18.4E%18.4E\n".
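To get a feeling for the magnitude of the two bond terms, the formulas for function types 1 and 21 can be evaluated for a single bond. The sketch below transcribes the two expressions as they are written in the table (note that they carry no factor of 1/2); the function name `bond_energy` is hypothetical and the code is not taken from GENESIS.

```python
def bond_energy(ftype, r, r0, k):
    """Energy (kJ/mol) of one bond of length r (nm).

    r0 is the equilibrium value (nm) and k the force coefficient
    (kJ mol^-1 nm^-2), as stored in the [ bonds ] block.
    """
    dr = r - r0
    if ftype == 1:
        return k * dr**2
    if ftype == 21:
        return k * dr**2 + 100.0 * k * dr**4
    raise ValueError("unsupported bond function type: %d" % ftype)


# First bond of the example .itp file, stretched by 0.01 nm.
r0, k = 3.7869E-01, 8.3680E+04
print(bond_energy(1, r0 + 0.01, r0, k))
print(bond_energy(21, r0 + 0.01, r0, k))
```

For this 0.01 nm stretch the quartic term of type 21 adds about 1% to the harmonic energy, and its relative contribution grows as $100 (r - r_0)^2$ for larger deviations.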

Angles

The [ angles ] block contains the following information for each angle (or "1-3") potential:

  • particle index i;
  • particle index j;
  • particle index k;
  • function type;
  • equilibrium value;
  • force coefficient;
  • Gaussian width (only available for function type 21).

| Type | Interaction type | Parameters and units | printf format |
|---|---|---|---|
| 1 | $\sum_{i} k_{i} (\theta_{i} - \theta_{i, 0})^{2}$ | $\theta_{i, 0}$ ( $^\circ$ ); $k_{i}$ ( $kJ \cdot mol^{-1} rad^{-2}$ ) | "%10d%10d%10d%5d%15.4E%15.4E\n" |
| 21 | $\displaystyle\sum_{i} \epsilon_{i} \exp(\frac{-(r_{i} - r_{i,0})^{2}}{2w_{i}^{2}})$ | $r_{i, 0}$ ( $nm$ ); $\epsilon_{i}$ ( $kJ \cdot mol^{-1}$ ); $w_{i}$ ( $nm$ ) | "%10d%10d%10d%5d%15.4E%15.4E%15.4E\n" |
| 22 | $\displaystyle\sum_{i} -k_{B}T \ln \frac{P(\theta)}{\sin \theta }$ | | "%10d%10d%10d%5d\n" |

Dihedrals

The [ dihedrals ] block contains the following information for each dihedral angle potential:

  • particle index i;
  • particle index j;
  • particle index k;
  • particle index l;
  • function type;
  • equilibrium value;
  • force coefficient;
  • Gaussian width (for function type 21, 41) or periodicity (for function type 1, 32).

| Type | "Safe" type | Interaction type | Parameters and units | printf format |
|---|---|---|---|---|
| 1 | 32 | $\sum_{i} k_{i} (1 + \cos ( n (\phi_{i} - \phi_{i, 0}) ) )$ | $\phi_{i, 0}$ ( $^\circ$ ); $k_{i}$ ( $kJ \cdot mol^{-1}$ ); $n$ (1) | "%10d%10d%10d%10d%5d%15.4E%15.4E%15d\n" |
| 21 | 41 | $\displaystyle\sum_{i} -\epsilon_{i} \exp(\frac{-(\phi_i - \phi_{i,0})^2}{2\sigma_i^2})$ | $\phi_{i, 0}$ ( $^\circ$ ); $\epsilon_{i}$ ( $kJ \cdot mol^{-1}$ ); $\sigma_{i}$ ( $rad$ ) | "%10d%10d%10d%10d%5d%15.4E%15.4E%15.4E\n" |
| 22 | 52 | $\sum_{i} -k_{B}T \ln P(\phi)$ | | "%10d%10d%10d%10d%5d\n" |

The second column lists the function type of the corresponding "safe" dihedral angle potential, which is designed to avoid singularity problems in energy and force calculations [^4].

Pairs

The [ pairs ] block contains the following information for each nonbonded pairwise interaction (mainly used for Go-like native contacts):

  • particle index i;
  • particle index j;
  • function type;
  • equilibrium value of distance;
  • force coefficient.

| Type | Interaction type | Parameters and units | printf format |
|---|---|---|---|
| 2 | $\displaystyle\sum_{i} \epsilon_{i} ( 5\left(\frac{\sigma_i}{ r_i}\right)^{12} - 6\left(\frac{\sigma_i}{r_i}\right)^{10} )$ | $\sigma_{i}$ ( $nm$ ); $\epsilon_{i}$ ( $kJ \cdot mol^{-1}$ ) | "%10d%10d%10d%15.4E%15.4E\n" |
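The 12-10 form in the table has its minimum at $r_i = \sigma_i$, where the energy equals $-\epsilon_i$; this is why the "eq" column of the [ pairs ] block can be read as the native contact distance. The sketch below evaluates the expression for one pair to check this numerically. The function name `pair_energy_12_10` is hypothetical and the code is an illustration of the formula, not the GENESIS implementation.

```python
def pair_energy_12_10(r, sigma, eps):
    """Energy (kJ/mol) of one type-2 pair at distance r (nm).

    sigma is the equilibrium distance (nm) and eps the force
    coefficient (kJ/mol), as stored in the [ pairs ] block.
    """
    x = sigma / r
    return eps * (5.0 * x**12 - 6.0 * x**10)


# First contact of the example .itp file.
sigma, eps = 6.2602E-01, 4.1840E+00
for r in (0.60, sigma, 0.70):
    print(r, pair_energy_12_10(r, sigma, eps))
```

At `r = sigma` the function returns `-eps`, and the two neighbouring distances give higher energies, as expected for a minimum.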

PWMcos type of protein-DNA sequence-specific interactions

The [ pwmcos ] block contains the following information for each PWMcos-type interaction [^2]:

  • protein residue index;
  • function type (always 1);
  • equilibrium distance;
  • $\theta_1$ (angle Sugar - Base - $C_\alpha$ ) (unit: $^\circ$);
  • $\theta_2$ (angle between Base - $C_\alpha$ and Base ${}_{-1}$ - Base ${}_{+1}$ ) (unit: $^\circ$);
  • $\theta_3$ (angle between Base - $C_\alpha$ and $C_{\alpha,+1}$ - $C_{\alpha, -1}$ ) (unit: $^\circ$);
  • $\epsilon_A$ (unit: $k_B T$);
  • $\epsilon_C$ (unit: $k_B T$);
  • $\epsilon_G$ (unit: $k_B T$);
  • $\epsilon_T$ (unit: $k_B T$);
  • $\gamma$ (unit: 1);
  • $\epsilon'$ (unit: $kcal\cdot mol^{-1}$).

printf format: "%6d %3d %8.5f %8.3f %8.3f %8.3f%12.6f%12.6f%12.6f%12.6f%8.3f%8.3f \n".

Protein-DNA sequence-nonspecific hydrogen-bond

The [ pwmcosns ] block contains the following information for each protein-DNA hydrogen-bond interaction [^3]:

  • protein residue index;
  • function type (always 2);
  • equilibrium distance;
  • $\theta_1$ (angle Sugar - Base - $C_\alpha$) (unit: $^\circ$);
  • $\theta_3$ (angle between Base - $C_\alpha$ and $C_{\alpha,+1}$ - $C_{\alpha, -1}$) (unit: $^\circ$)
  • $\epsilon$ (unit: $kcal\cdot mol^{-1}$).

printf format: "%6d %3d %8.5f %8.3f %8.3f %8.3f \n".

IDR region information for HPS and KH models

The [ cg_IDR_HPS_region ] or [ cg_IDR_KH_region ] block contains the following information for the intrinsically disordered region [^1]:

  • IDR starting index i;
  • IDR ending index j.

printf format: "%10d %10d\n".

Coordinate file

The CG coordinates of molecules are written in the .gro files. An example file looks like this:

MOL_NAME, t =            0.000
    1000
    1  ALA   CA    1   8.2333   0.6870  18.6981   0.0000   0.0000   0.0000
    2  THR   CA    2   7.9130   0.8916  18.6893   0.0000   0.0000   0.0000
    3  LEU   CA    3   8.0104   1.0025  18.3395   0.0000   0.0000   0.0000
...
      100.0000         100.0000         100.0000

The first line holds the system information, and the second line gives the number of CG particles. The lines from the third to the second-to-last hold the coordinates of all particles, and the last line contains the box sizes.

Each coordinate line has the following information:

  • residue number;
  • residue name;
  • particle name;
  • particle number;
  • $x$ (unit: $nm$);
  • $y$ (unit: $nm$);
  • $z$ (unit: $nm$);
  • $vx$ (not used);
  • $vy$ (not used);
  • $vz$ (not used).

printf format: "%5d%5s%5s%5d %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f \n".

Note that in GENESIS we use a "fixed + free-style" .gro file: only the first 20 characters of each coordinate line (residue number, residue name, particle name, and particle number) are fixed-width, and the coordinates after them can be written in a relatively free format. For instance, a format string of "%5d%5s%5s%5d %18.6f %18.6f %18.6f %18.6f %18.6f %18.6f \n" is also acceptable.
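The "fixed + free-style" rule suggests a simple way to read a coordinate line: slice the first 20 characters by position and split the remainder on whitespace. The following is a minimal sketch under that reading; the function name `parse_gro_line` is hypothetical, and this is not the reader used by GENESIS.

```python
# The two format strings quoted in the text above.
GRO_FMT = "%5d%5s%5s%5d %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f \n"
GRO_FMT_WIDE = "%5d%5s%5s%5d %18.6f %18.6f %18.6f %18.6f %18.6f %18.6f \n"


def parse_gro_line(line):
    """Return (resnr, resname, name, nr, (x, y, z)) for one coordinate line."""
    # Fixed part: four fields of five characters each.
    resnr = int(line[0:5])
    resname = line[5:10].strip()
    name = line[10:15].strip()
    nr = int(line[15:20])
    # Free part: whitespace-separated numbers; velocities are not used.
    values = [float(v) for v in line[20:].split()]
    if len(values) < 3:
        raise ValueError("expected at least x, y, z after column 20")
    return resnr, resname, name, nr, tuple(values[:3])


atom = (1, "ALA", "CA", 1, 8.2333, 0.6870, 18.6981, 0.0, 0.0, 0.0)
print(parse_gro_line(GRO_FMT % atom))
print(parse_gro_line(GRO_FMT_WIDE % atom))
```

Both format strings produce lines that parse to the same values, because only the layout of the first 20 characters is relied upon.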

[^1]: Dignon, G. L., et al., PLoS Comput Biol 14, e1005941 (2018).

[^2]: Tan, C. & Takada, S., J Chem Theory Comput 14, 3877–3889 (2018).

[^3]: Niina, T., Brandani, G. B., Tan, C. & Takada, S., PLoS Comput Biol 13, e1005880 (2017).

[^4]: Tan, C., Jung, J., Kobayashi, C. & Sugita, Y., J Chem Phys 153, 044110 (2020).