File formats - noinil/genesis_cg_tool GitHub Wiki
We adapted GROMACS-like formats for the topology and coordinate files used in CG simulations in GENESIS. Here we briefly describe these formats so that you can edit the files if necessary. Please be careful about the differences between the GENESIS version and the original GROMACS one.
Topology file
The topology (`.top`) file can be generated by the `genesis_cg_tools`. In this file we define the molecular composition and the inter-molecular interactions. A typical `.top` file looks like this:
; common interaction parameters for CG models
#include "./param/atom_types.itp"
#include "./param/flexible_local_angle.itp"
#include "./param/flexible_local_dihedral.itp"
#include "./param/pair_energy_MJ_96.itp"
; Molecule topology (itp)
#include "./top/protein_A.itp"
[ system ]
protein A CG simulation
[ molecules ]
protein_A 100
[ cg_ele_chain_pairs ]
ON 1 - 100 : 1 - 100
Syntax
As can be seen, there are different types of lines in this file:
- Comments: starting with "`;`".
- "`#include`" lines: to include general parameters as well as the molecule-specific `.itp` files. Note that the path should be either absolute or relative to the `.top` file's directory.
- Directives: enclosed by "`[`" and "`]`", indicating a specific block.
- Normal lines: detailed information.
Warning: the `.itp` files included by the first four `#include` lines contain the general and default parameters of the existing CG models. Please DO NOT modify these files unless you are sure about what you are doing.
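The four line types listed above can be told apart with a few string checks. The following is a minimal sketch, not the parser used by GENESIS; the helper name `classify_top_line` and the category labels are hypothetical.

```python
import re

def classify_top_line(line):
    """Return the kind of one topology-file line (hypothetical helper)."""
    text = line.strip()
    if not text:
        return "blank"
    if text.startswith(";"):
        return "comment"
    if text.startswith("#include"):
        return "include"
    # A directive is a name enclosed in square brackets;
    # spaces inside the brackets are treated as optional here.
    if re.fullmatch(r"\[\s*\w+\s*\]", text):
        return "directive"
    return "normal"

sample = [
    "; Molecule topology (itp)",
    '#include "./top/protein_A.itp"',
    "[ system ]",
    "[molecules]",
    "protein_A 100",
]
kinds = [classify_top_line(s) for s in sample]
print(kinds)
```

Trailing comments on directive lines are not handled in this sketch.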
Directives
In the following table we list the available directives:
Directive | Type | Information |
---|---|---|
`[ system ]` | mandatory | System name. |
`[ molecules ]` | mandatory | Pairs of molecule name and number of molecules. |
`[ xxx_chain_pairs ]` | optional | Nonlocal interactions ("`xxx`" can be "`cg_ele`", "`cg_KH`", "`pwmcos`", or "`pwmcosns`"). |
Note: the directive format is free, and spaces are not mandatory inside the square brackets.
Nonlocal interactions
The content lines of the `[ xxx_chain_pairs ]` blocks share the same format: `SWITCH i - j : k - l` (for inter- and intra-molecular interactions) or `SWITCH i - j` (for intra-molecular interactions), where `SWITCH` can be `ON` or `OFF`, and `i`, `j`, `k`, `l` are integers (chain indices).
Imagine an $n\times n$ "instance matrix" $A$ for a system consisting of $n$ chains. GENESIS reads this matrix to set up the intra/inter-molecular nonlocal interactions. Specifically, for chain $i$ and chain $j$, if $A(i, j) = 1$, a type of nonlocal interaction is turned on for particles in these two chains; whereas if $A(i, j) = 0$, the interaction is turned off. Each type of nonlocal interaction (for example electrostatics or protein-DNA sequence-specific recognition) has its own instance matrix. The default element values of these matrices are all 0.
Now let's see what these lines in the topology file do:
- `ON i - j : k - l` sets $A(c, d) = 1$ for all $i \le c \le j$ and $k \le d \le l$;
- `OFF i - j : k - l` sets $A(c, d) = 0$ for all $i \le c \le j$ and $k \le d \le l$;
- `ON i - j` sets $A(c, c) = 1$ for all $i \le c \le j$;
- `OFF i - j` sets $A(c, c) = 0$ for all $i \le c \le j$.
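To make the four rules concrete, here is a minimal Python sketch that builds an instance matrix from a list of content lines. It is an illustration only: the function name `build_instance_matrix` is hypothetical, the lines are assumed to be applied in the order they appear, and the sketch sets only $A(c, d)$ exactly as written above (whether GENESIS also sets $A(d, c)$ is not stated here).

```python
def build_instance_matrix(n_chains, lines):
    """Apply ON/OFF lines to an n x n matrix that starts as all zeros."""
    # A[c-1][d-1] holds A(c, d); chain indices in the file are 1-based.
    A = [[0] * n_chains for _ in range(n_chains)]
    for line in lines:
        switch, _, spec = line.strip().partition(" ")
        value = 1 if switch.upper() == "ON" else 0
        # "i - j : k - l" gives two ranges, "i - j" gives one.
        ranges = []
        for part in spec.split(":"):
            lo, hi = (int(x) for x in part.split("-"))
            ranges.append((lo, hi))
        if len(ranges) == 2:
            (i, j), (k, l) = ranges
            for c in range(i, j + 1):
                for d in range(k, l + 1):
                    A[c - 1][d - 1] = value
        else:
            (i, j), = ranges
            for c in range(i, j + 1):
                A[c - 1][c - 1] = value  # diagonal (intra-chain) only
    return A

# Turn everything on for 4 chains, then switch off intra-chain terms of chains 1-2.
A = build_instance_matrix(4, ["ON 1 - 4 : 1 - 4", "OFF 1 - 2"])
for row in A:
    print(row)
```

In this example every element ends up as 1 except $A(1, 1)$ and $A(2, 2)$, which the second line resets to 0.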
Molecule .itp file
Usually we use one `.itp` file to store the information of a particular molecule. A typical one looks like this:
[ moleculetype ]
;name nrexcl
ROM_cg 3
[ atoms ]
; nr type resnr res atom cg charge mass
1 THR 1 THR CA 1 0.000 101.110
2 LYS 2 LYS CA 1 1.000 128.170
3 GLN 3 GLN CA 1 0.000 128.140
...
[ bonds ]
; i j f eq coef
1 2 1 3.7869E-01 8.3680E+04
2 3 1 3.8733E-01 8.3680E+04
3 4 1 3.8304E-01 8.3680E+04
...
[ angles ]
; i j k f eq coef
1 2 3 1 8.8380E+01 1.6736E+02
2 3 4 1 9.1330E+01 1.6736E+02
3 4 5 1 8.8309E+01 1.6736E+02
...
[ dihedrals ]
; i j k l f eq coef w/n
1 2 3 4 32 -1.2595E+02 4.1840E+00 1
2 3 4 5 32 -1.3173E+02 4.1840E+00 1
3 4 5 6 32 -1.2827E+02 4.1840E+00 1
...
[ pairs ]
; i j f eq coef
1 5 2 6.2602E-01 4.1840E+00
1 6 2 8.6459E-01 4.1840E+00
1 55 2 8.2667E-01 4.1840E+00
...
Directives
Directive | Type | Information |
---|---|---|
`[ moleculetype ]` | mandatory | Molecule name and `nrexcl`. |
`[ atoms ]` | mandatory | Atom (CG particle) information. |
`[ bonds ]` | optional | Bond terms and parameters. |
`[ angles ]` | optional | Angle or 1-3 terms and parameters. |
`[ dihedrals ]` | optional | Dihedral angle terms and parameters. |
`[ pairs ]` | optional | Nonbonded terms and parameters. |
`[ cg_IDR_HPS_region ]` | optional | IDR region information (the HPS IDR model). [^1] |
`[ cg_IDR_KH_region ]` | optional | IDR region information (the KH IDR model). [^1] |
`[ pwmcos ]` | optional | Information for the PWMcos model of protein-DNA sequence-specific interactions. [^2] |
`[ pwmcosns ]` | optional | Information for the protein-DNA sequence-nonspecific hydrogen-bond interactions. [^3] |
Detailed formats of each block
Atoms
The `[ atoms ]` block contains the following information for each CG particle:
- particle number;
- particle type;
- residue number;
- residue name;
- particle name;
- particle group (always 1);
- particle charge (unit: elementary charge $e$);
- particle mass (unit: amu).
`printf` format: `"%10d%5s%10d%5s%5s%5d %8.3f %8.3f\n"`.
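As an illustration of the fixed-width layout, the format string can be used directly with Python's printf-style `%` operator, which handles these conversion specifiers the same way as C. The values below are taken from the second row of the example `[ atoms ]` block; the variable names are arbitrary.

```python
# printf-style format string for one [ atoms ] line, copied from the text above.
ATOM_FMT = "%10d%5s%10d%5s%5s%5d %8.3f %8.3f\n"

# nr, type, resnr, res, atom, cg, charge, mass
atom_line = ATOM_FMT % (2, "LYS", 2, "LYS", "CA", 1, 1.000, 128.170)

print(repr(atom_line))
print(len(atom_line))  # total width including the trailing newline
```

Each field is right-aligned within its width, so the columns line up across particles as long as no value overflows its field.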
Bonds
The `[ bonds ]` block contains the following information for each bond potential:
- particle index `i`;
- particle index `j`;
- function type;
- equilibrium value (unit: $nm$);
- force coefficient (unit: $kJ \cdot mol^{-1} nm^{-2}$).

Type | Interaction type |
---|---|
1 | $\sum_{i} k_{i}(r_{i}-r_{i,0})^{2}$ |
21 | $\sum_{i} k_{i}(r_{i}-r_{i,0})^{2} + 100 k_{i}(r_{i}-r_{i,0})^{4}$ |

`printf` format: `"%10d%10d%5d%18.4E%18.4E\n"`.
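The two bond function types differ only by the quartic term. The sketch below evaluates the energy of a single bond from the formulas in the table; the function name `bond_energy` is hypothetical, and the sample equilibrium length of 0.38 nm is an assumed round number close to the values in the example `.itp` file.

```python
def bond_energy(r, r0, k, func_type=1):
    """Energy (kJ/mol) of one bond with length r (nm), per the table above."""
    dr = r - r0
    if func_type == 1:
        return k * dr**2
    if func_type == 21:
        # Harmonic term plus a quartic term with coefficient 100 * k.
        return k * dr**2 + 100.0 * k * dr**4
    raise ValueError("unsupported bond function type: %r" % func_type)

k = 8.3680e4          # kJ mol^-1 nm^-2, as in the example [ bonds ] block
r0 = 0.38             # nm, assumed equilibrium length for this illustration
e_harmonic = bond_energy(0.39, r0, k, func_type=1)
e_quartic = bond_energy(0.39, r0, k, func_type=21)
print(e_harmonic, e_quartic)
```

For a stretch of 0.01 nm the quartic term adds about 1% to the harmonic energy. Its relative weight grows with the square of the displacement.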
Angles
The `[ angles ]` block contains the following information for each angle (or "1-3") potential:
- particle index `i`;
- particle index `j`;
- particle index `k`;
- function type;
- equilibrium value;
- force coefficient;
- Gaussian width (only available for function type `21`).
Type | Interaction type | Parameters and units | printf format |
---|---|---|---|
1 | $\sum_{i} k_{i} (\theta_{i} - \theta_{i, 0})^{2}$ | $\theta_{i, 0}$ ( $^\circ$ ); $k_{i}$ ( $kJ \cdot mol^{-1} rad^{-2}$ ) | `"%10d%10d%10d%5d%15.4E%15.4E\n"` |
21 | $\displaystyle\sum_{i} \epsilon_{i} \exp(\frac{-(r_{i} - r_{i,0})^{2}}{2w_{i}^{2}})$ | $r_{i, 0}$ ( $nm$ ); $\epsilon_{i}$ ( $kJ \cdot mol^{-1}$ ); $w_{i}$ ( $nm$ ) | `"%10d%10d%10d%5d%15.4E%15.4E%15.4E\n"` |
22 | $\displaystyle\sum_{i} -k_{B}T \ln \frac{P(\theta)}{\sin \theta }$ | | `"%10d%10d%10d%5d\n"` |
Dihedrals
The `[ dihedrals ]` block contains the following information for each dihedral angle potential:
- particle index `i`;
- particle index `j`;
- particle index `k`;
- particle index `l`;
- function type;
- equilibrium value;
- force coefficient;
- Gaussian width (for function types `21`, `41`) or periodicity (for function types `1`, `32`).
Type | "Safe" type | Interaction type | Parameters and units | printf format |
---|---|---|---|---|
1 | 32 | $\sum_{i} k_{i} (1 + \cos ( n (\phi_{i} - \phi_{i, 0}) ) )$ | $\phi_{i, 0}$ ( $^\circ$ ); $k_{i}$ ( $kJ \cdot mol^{-1}$ ); $n$ (1) | `"%10d%10d%10d%10d%5d%15.4E%15.4E%15d\n"` |
21 | 41 | $\displaystyle\sum_{i} -\epsilon_{i} \exp(\frac{-(\phi_i - \phi_{i,0})^2}{2\sigma_i^2})$ | $\phi_{i, 0}$ ( $^\circ$ ); $\epsilon_{i}$ ( $kJ \cdot mol^{-1}$ ); $\sigma_{i}$ ( $rad$ ) | `"%10d%10d%10d%10d%5d%15.4E%15.4E%15.4E\n"` |
22 | 52 | $\sum_{i} -k_{B}T \ln P(\phi)$ | | `"%10d%10d%10d%10d%5d\n"` |
Column 2 lists the function types of the corresponding "safe dihedral angle" potentials, which are designed to avoid singularity problems in energy and force calculations [^4].
Pairs
The `[ pairs ]` block contains the following information for each nonbonded pairwise interaction (mainly used for Go-like native contacts):
- particle index `i`;
- particle index `j`;
- function type;
- equilibrium value of distance;
- force coefficient.
Type | Interaction type | Parameters and units | printf format |
---|---|---|---|
2 | $\displaystyle\sum_{i} \epsilon_{i} ( 5\left(\frac{\sigma_i}{ r_i}\right)^{12} - 6\left(\frac{\sigma_i}{r_i}\right)^{10} )$ | $\sigma_{i}$ ( $nm$ ); $\epsilon_{i}$ ( $kJ \cdot mol^{-1}$ ) | `"%10d%10d%10d%15.4E%15.4E\n"` |
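
The 12-10 form in the table has its minimum at $r = \sigma$, where the energy equals $-\epsilon$, which is why $\sigma$ is listed as the equilibrium distance. The short sketch below evaluates a single contact to show this; the function name `go_contact_energy` is hypothetical, and the numbers are taken from the first row of the example `[ pairs ]` block.

```python
def go_contact_energy(r, sigma, eps):
    """12-10 energy (kJ/mol) of one native contact at distance r (nm)."""
    x = sigma / r
    return eps * (5.0 * x**12 - 6.0 * x**10)

sigma = 6.2602e-01   # nm, equilibrium distance from the example
eps = 4.1840e+00     # kJ/mol, force coefficient from the example

e_min = go_contact_energy(sigma, sigma, eps)        # at the minimum
e_far = go_contact_energy(2.0 * sigma, sigma, eps)  # weakly attractive tail
e_near = go_contact_energy(0.9 * sigma, sigma, eps) # repulsive side
print(e_min, e_far, e_near)
```

At twice the equilibrium distance the attraction has already decayed to less than 1% of $\epsilon$, while a 10% compression makes the contact energy positive.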
PWMcos type of protein-DNA sequence-specific interactions
The `[ pwmcos ]` block contains the following information for each PWMcos-type interaction [^2]:
- protein residue index;
- function type (always 1);
- equilibrium distance;
- $\theta_1$ (angle Sugar - Base - $C_\alpha$) (unit: $^\circ$);
- $\theta_2$ (angle between Base - $C_\alpha$ and Base$_{-1}$ - Base$_{+1}$) (unit: $^\circ$);
- $\theta_3$ (angle between Base - $C_\alpha$ and $C_{\alpha,+1}$ - $C_{\alpha, -1}$) (unit: $^\circ$);
- $\epsilon_A$ (unit: $k_B T$);
- $\epsilon_C$ (unit: $k_B T$);
- $\epsilon_G$ (unit: $k_B T$);
- $\epsilon_T$ (unit: $k_B T$);
- $\gamma$ (unit: 1);
- $\epsilon'$ (unit: $kcal\cdot mol^{-1}$).

`printf` format: `"%6d %3d %8.5f %8.3f %8.3f %8.3f%12.6f%12.6f%12.6f%12.6f%8.3f%8.3f \n"`.
Protein-DNA sequence-nonspecific hydrogen-bond
The `[ pwmcosns ]` block contains the following information for each protein-DNA hydrogen-bond interaction [^3]:
- protein residue index;
- function type (always 2);
- equilibrium distance;
- $\theta_1$ (angle Sugar - Base - $C_\alpha$) (unit: $^\circ$);
- $\theta_3$ (angle between Base - $C_\alpha$ and $C_{\alpha,+1}$ - $C_{\alpha, -1}$) (unit: $^\circ$);
- $\epsilon$ (unit: $kcal\cdot mol^{-1}$).

`printf` format: `"%6d %3d %8.5f %8.3f %8.3f %8.3f \n"`.
IDR region information for HPS and KH models
The `[ cg_IDR_HPS_region ]` or `[ cg_IDR_KH_region ]` block contains the following information for the intrinsically disordered region [^1]:
- IDR starting index `i`;
- IDR ending index `j`.

`printf` format: `"%10d %10d\n"`.
Coordinate file
The CG coordinates of molecules are written in the `.gro` files. An example file looks like this:
MOL_NAME, t = 0.000
1000
1 ALA CA 1 8.2333 0.6870 18.6981 0.0000 0.0000 0.0000
2 THR CA 2 7.9130 0.8916 18.6893 0.0000 0.0000 0.0000
3 LEU CA 3 8.0104 1.0025 18.3395 0.0000 0.0000 0.0000
...
100.0000 100.0000 100.0000
The first line holds the system information and the second line gives the number of CG particles. The lines from the third to the second-to-last hold the coordinate information of all the particles, and the last line contains the box sizes.
Each coordinate line has the following information:
- residue number;
- residue name;
- particle name;
- particle number;
- $x$ (unit: $nm$);
- $y$ (unit: $nm$);
- $z$ (unit: $nm$);
- $vx$ (not used);
- $vy$ (not used);
- $vz$ (not used).
`printf` format: `"%5d%5s%5s%5d %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f \n"`.

Note that in GENESIS we use a "fixed + free-style" `.gro` file, which means that except for the first 20 characters, you can use a relatively free format for the coordinates. For instance, a format string of `"%5d%5s%5s%5d %18.6f %18.6f %18.6f %18.6f %18.6f %18.6f \n"` is also acceptable.
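One way to read such a "fixed + free-style" line is to slice the first 20 characters by column and split the remainder on whitespace. The sketch below follows that idea; it is not the GENESIS reader, and the function name `parse_gro_line` is hypothetical.

```python
def parse_gro_line(line):
    """Parse one coordinate line: fixed first 20 chars, free-format rest."""
    # Fixed part, laid out as "%5d%5s%5s%5d".
    resnr = int(line[0:5])
    resname = line[5:10].strip()
    name = line[10:15].strip()
    nr = int(line[15:20])
    # Free part: whitespace-separated numbers (x, y, z, then unused velocities).
    numbers = [float(tok) for tok in line[20:].split()]
    x, y, z = numbers[:3]
    return resnr, resname, name, nr, (x, y, z)

NARROW = "%5d%5s%5s%5d %8.4f %8.4f %8.4f %8.4f %8.4f %8.4f \n"
WIDE = "%5d%5s%5s%5d %18.6f %18.6f %18.6f %18.6f %18.6f %18.6f \n"
fields = (1, "ALA", "CA", 1, 8.2333, 0.6870, 18.6981, 0.0, 0.0, 0.0)

parsed_narrow = parse_gro_line(NARROW % fields)
parsed_wide = parse_gro_line(WIDE % fields)
print(parsed_narrow)
print(parsed_narrow == parsed_wide)
```

Both format strings give the same parsed record here because only the widths of the coordinate columns differ.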
[^1]: Dignon, G. L., et al., PLOS Comput Biol 14, e1005941 (2018).
[^2]: Tan, C. & Takada, S., J Chem Theory Comput 14, 3877–3889 (2018).
[^3]: Niina, T., Brandani, G. B., Tan, C. & Takada, S., PLOS Comput Biol 13, e1005880 (2017).
[^4]: Tan, C., Jung, J., Kobayashi, C. & Sugita, Y., J Chem Phys 153, 044110 (2020).