Methods - sb-ncbr/moleonline-web GitHub Wiki

MOLEonline algorithm

The computation of a tunnel is performed in several steps (see Figure 1), which are described in more detail in the MOLE 2.0 publication:

  1. computation of the Delaunay triangulation/Voronoi diagram of the atomic centres,
  2. construction of the molecular surface,
  3. identification of cavities,
  4. identification of possible channel start points,
  5. identification of possible channel endpoints,
  6. localisation of channels using Dijkstra’s shortest path algorithm,
  7. filtering of the localised channels.

Figure 1. Scheme of the channel computation by [MOLE 2.5](https://webchem.ncbr.muni.cz/Platform/App/Mole)

Calculation of channels

Once the starting and ending points have been defined for all cavities, the channel computation is completed by Dijkstra’s shortest path algorithm, which finds the channels between all pairs of starting and ending points (or stops at the molecular surface if the algorithm reaches it before it finds the channel exit). The edge weight function used in the algorithm considers both the edge length and the distance to the surface of the closest van der Waals (vdW) sphere.
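The path search can be illustrated with a minimal Python sketch. The exact weight function is not given on this page, so the form used here (edge length divided by the squared clearance, which makes wide passages cheap) is an assumption for illustration only, as are the function names and the toy graph format.

```python
import heapq

def edge_weight(length, clearance, eps=1e-6):
    # Assumed weight: short edges far from any vdW sphere are cheap.
    # The real MOLE weight function may differ.
    return length / (clearance * clearance + eps)

def shortest_channel(graph, start, end):
    """Dijkstra's algorithm over a graph of Voronoi vertices.

    graph maps a vertex to a list of (neighbour, edge_length, clearance),
    where clearance is the distance to the closest vdW sphere surface.
    Returns the list of vertices from start to end, or None.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == end:
            break
        for nbr, length, clearance in graph.get(node, ()):
            nd = d + edge_weight(length, clearance)
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if end not in dist:
        return None
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path

# Toy graph: A-B-D is short but narrow, A-C-D is longer but wide.
toy_graph = {
    "A": [("B", 1.0, 0.5), ("C", 2.0, 2.0)],
    "B": [("D", 1.0, 0.5)],
    "C": [("D", 2.0, 2.0)],
}
```

With this assumed weighting, `shortest_channel(toy_graph, "A", "D")` prefers the wider route through `C` even though it is geometrically longer.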

The channel centerline is represented as a 3D natural spline defined by the Voronoi diagram vertices that form the path found by Dijkstra’s algorithm.
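As an illustration of what a natural spline through the path vertices looks like, the following sketch interpolates each coordinate with a natural cubic spline (zero second derivative at both ends). It assumes uniform knots, one per vertex, so the parameter `t` runs from 0 to the number of vertices minus one; MOLE may parameterise the centerline differently, and the function names are hypothetical.

```python
def second_derivatives(y):
    """Second derivatives of a natural cubic spline with unit knot spacing."""
    n = len(y)
    m = [0.0] * n  # natural boundary conditions: m[0] = m[n-1] = 0
    k = n - 2      # number of interior unknowns
    if k < 1:
        return m
    rhs = [6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1]) for i in range(1, n - 1)]
    # Thomas algorithm for the tridiagonal system (1, 4, 1).
    c = [0.0] * k
    d = [0.0] * k
    c[0] = 1.0 / 4.0
    d[0] = rhs[0] / 4.0
    for i in range(1, k):
        denom = 4.0 - c[i - 1]
        c[i] = 1.0 / denom
        d[i] = (rhs[i] - d[i - 1]) / denom
    m[k] = d[k - 1]
    for i in range(k - 2, -1, -1):
        m[i + 1] = d[i] - c[i] * m[i + 2]
    return m

def eval_coordinate(y, m, t):
    """Evaluate one coordinate of the spline at parameter t in [0, n-1]."""
    i = min(int(t), len(y) - 2)
    u = t - i
    return (m[i] * (1.0 - u) ** 3 / 6.0 + m[i + 1] * u ** 3 / 6.0
            + (y[i] - m[i] / 6.0) * (1.0 - u)
            + (y[i + 1] - m[i + 1] / 6.0) * u)

def centerline_point(points, t):
    """Point on the centerline through the 3D path vertices (at least two)."""
    coords = list(zip(*points))  # separate x, y and z sequences
    return tuple(eval_coordinate(c, second_derivatives(c), t) for c in coords)
```

The spline passes exactly through every path vertex at integer values of `t`, and reduces to a straight line when the vertices are collinear and evenly spaced.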

The algorithm might find duplicate channels. Therefore, in the last post-processing step, the duplicate channels are removed according to Channel parameters.

Pore calculation

Pores are calculated differently from channels in the classical MOLE computation. The pore calculation is divided into two steps:

  1. finding the membrane region, and
  2. calculation of the pore.

The pore computation starts with defining the membrane region of the molecule. First, MOLE tries to find the structure in the Orientations of Proteins in Membranes (OPM) database (Lomize2006).

If the structure cannot be found in the OPM database, MOLE falls back to the MEMEMBED program (Nugent2013), which predicts the probable membrane region of the protein molecule.
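The lookup order described above amounts to a simple fallback. The sketch below only illustrates that control flow: `lookup_opm` and `predict_with_memembed` are hypothetical callables supplied by the caller, not real MOLE, OPM or MEMEMBED APIs.

```python
def find_membrane_region(structure_id, lookup_opm, predict_with_memembed):
    """Return (membrane_region, source) for a structure.

    lookup_opm(structure_id) is assumed to return None when the
    structure is not present in the OPM database.
    """
    region = lookup_opm(structure_id)
    if region is not None:
        return region, "OPM"
    # Not in OPM: fall back to a prediction.
    return predict_with_memembed(structure_id), "MEMEMBED"
```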

When the membrane region definition is finished, the pore endpoints are computed. The Beta structure option serves as a parameter for refining the membrane region prediction, which depends strongly on the secondary structure of the studied molecule.

2DProts algorithm

The computation of a 2D diagram by 2DProts is performed in the following steps (see Figure 2). A more detailed description is given in the 2DProts publication.

  1. Detection and annotation of SSEs
  2. Processing of individual domains
    1. Search for cluster start domain
    2. Joining strands into sheets and generating a 2D model for each sheet
    3. Division of helices and sheets into primary and secondary
    4. Placing all primary helices and sheets into the 2D diagram
    5. Adjustment of angles of primary helices and sheets in the 2D diagram
    6. Adding all secondary helices and sheets into the 2D diagram
    7. Adjustment of secondary SSE angles
  3. Drawing 2D diagrams

Figure 2. Model of a) 3D structure and b) 2D diagram of porcine Dipeptidyl Peptidase IV (PDB ID 1ORW).

Visualization of results

Radius of channel and pore

MOLEonline offers three representations of the channel/pore radius (Figure 3):

  • Radius – The radius of a sphere within the channel limited by the three closest atoms
  • Free Radius – The radius of a sphere within the channel limited by the three closest main chain atoms, to allow for sidechain flexibility
  • BRadius – Radius + RMSF calculated from B-factors of residues within individual layers

Figure 3. Detail of the channel profile.
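The BRadius definition can be sketched in a few lines of Python. The conversion from a B-factor to RMSF uses the standard isotropic relation B = (8π²/3)·RMSF². How MOLEonline aggregates the B-factors within a layer is not stated here, so taking their arithmetic mean is an assumption, and the function names are hypothetical.

```python
import math

def rmsf_from_bfactor(b_factor):
    # Isotropic relation: B = (8 * pi^2 / 3) * RMSF^2
    return math.sqrt(3.0 * b_factor / (8.0 * math.pi ** 2))

def bradius(radius, layer_bfactors):
    """Radius plus RMSF for one channel layer.

    layer_bfactors: B-factors (in square angstroms) of the residues in
    the layer. Averaging them is an assumption made for illustration.
    """
    mean_b = sum(layer_bfactors) / len(layer_bfactors)
    return radius + rmsf_from_bfactor(mean_b)
```

For orientation, a B-factor of about 26.3 Å² corresponds to an RMSF of 1 Å, so a layer with that mean B-factor would have a BRadius 1 Å larger than its Radius.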

Calculation of physicochemical properties

Once a channel has been computed, the set of amino acids lining it can be used to compute its physicochemical properties (Figure 4). The following properties are calculated from the composition of the lining amino acid side chains only:

  • Charge is calculated as a sum of charged amino acid residues (ARG, LYS, HIS = +1; ASP, GLU = -1)

  • Hydropathy is calculated as an average of the hydropathy index assigned to residues according to the method of Kyte and Doolittle (Kyte1982).

    • The hydropathy index is connected to hydrophilicity/hydrophobicity of amino acids (most hydrophilic is ARG = -4.5; most hydrophobic ILE = 4.5).
  • Hydrophobicity is calculated as an average of normalised hydrophobicity scales (Cid1992).

    • According to the hydrophobicity value, the most hydrophilic amino acid is GLU (-1.140), and ILE (1.810) is the most hydrophobic.
  • Polarity is calculated as an average of amino acid polarities assigned according to the method of Zimmerman et al. (Zimmerman1968).

    • Polarity ranges from completely nonpolar amino acids (ALA, GLY = 0.00) through polar residues (e.g. SER = 1.67) towards charged residues (GLU = 49.90, ARG = 52.00).
  • Lipophilicity (logP-scale) is calculated as octanol/water partition coefficients of Cβ fragments of side-chains and mainchain (-0.86) via www.chemicalize.org.

    • LogP values range from lipophobic fragments (ASN -1.03; SER -0.52; GLN -0.33; THR -0.16) through acids (ASP -0.22; GLU 0.48) and bases (ARG -0.08; HIS -0.01; LYS 0.7), to sulphur-containing (CYS 0.84; MET 1.48), aliphatic (ALA 1.08; VAL 1.8; PRO 1.8; LEU 2.08; ILE 2.24) to most lipophilic aromatic residues (TYR 2.18; PHE 2.49; TRP 2.59). GLY has no sidechain fragment, so its value is set to 0.
  • Lipophilicity (logD-scale) is calculated as octanol/water distribution coefficients of Cβ side-chain fragments and mainchain (-0.86) at pH 7.4 via www.chemicalize.org. The distribution coefficient takes into account the ionisation of compounds.

    • LogD values range from highly lipophobic charged fragments (ASP -3.00; ARG -2.49; GLU -2.12; LYS -1.91) through polar ones (ASN -1.03; SER -0.52; GLN -0.33; THR -0.16, HIS -0.11) to sulphur-containing (CYS 0.84; MET 1.48), aliphatic (ALA 1.08; VAL 1.8; PRO 1.8; LEU 2.08; ILE 2.24) to most lipophilic aromatic residues (TYR 2.18; PHE 2.49; TRP 2.59). GLY has no sidechain fragment, so its value is set to 0.
  • Solubility (logS-scale) is calculated as water solubility of Cβ side-chain fragments and mainchain (0.81) at pH 7.4 via www.chemicalize.org. The estimated logS value is the unit-stripped base-10 logarithm of the solubility measured in mol/litre. It measures how well individual residues can interact with water molecules.

    • LogS values range from the lowest solubility of aromatic residues (TRP -2.48; PHE -1.81; TYR -1.44), followed by aliphatic (LEU -1.79; ILE -1.85; PRO -1.3; VAL -1.3; ALA 0.59) and sulphur-containing residues (MET -0.72; CYS 0.16), up to polar ones (HIS -0.2; GLN 0.13; ASN 0.54; THR 0.77; SER 1.11), whereas charged residues attract the most water molecules (LYS 1.46; ARG 1.63; GLU 2.23; ASP 2.63). GLY has no sidechain fragment, so its value is set to 0.
  • Mutability is calculated as an average of the relative mutability index (Jones1992). Relative mutability is based on empirical substitution matrices between similar protein sequences.

    • It is high wherever an amino acid can be easily substituted for another, e.g. in the case of small polar amino acids (SER = 117, THR = 107, ASN = 104) or in the case of small aliphatic amino acids (ALA = 100, VAL = 98, ILE = 103). On the other hand, it is low when an amino acid plays an important structural role, such as in the case of aromatic amino acids (TRP = 25, PHE = 51, TYR = 50) or in the case of special amino acids (CYS = 44, PRO = 58, GLY = 50). A specific example of an amino acid with low relative mutability is the most abundant amino acid (LEU = 54), which has the highest probability of mutating back to itself.
  • Ionisable residues can also be viewed in the channel profile or directly as the selection on the visualised structure.


Figure 4. Detail of computed physicochemical properties of 1TQN.
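As a worked illustration of two of the properties listed above, the sketch below computes Charge as a sum and Hydropathy as an average over a list of lining residues. The charge assignments come from the list above and the hydropathy values are the Kyte and Doolittle indices. The function names and the handling of unknown residue names (ignored) are assumptions, not MOLEonline's actual behaviour.

```python
# Charges as listed above: ARG, LYS, HIS = +1; ASP, GLU = -1.
CHARGE = {"ARG": 1, "LYS": 1, "HIS": 1, "ASP": -1, "GLU": -1}

# Kyte and Doolittle hydropathy index.
KYTE_DOOLITTLE = {
    "ALA": 1.8, "ARG": -4.5, "ASN": -3.5, "ASP": -3.5, "CYS": 2.5,
    "GLN": -3.5, "GLU": -3.5, "GLY": -0.4, "HIS": -3.2, "ILE": 4.5,
    "LEU": 3.8, "LYS": -3.9, "MET": 1.9, "PHE": 2.8, "PRO": -1.6,
    "SER": -0.8, "THR": -0.7, "TRP": -0.9, "TYR": -1.3, "VAL": 4.2,
}

def channel_charge(lining_residues):
    """Sum of the formal charges of the lining residues."""
    return sum(CHARGE.get(name, 0) for name in lining_residues)

def channel_hydropathy(lining_residues):
    """Average hydropathy index, or None if no known residue is present."""
    values = [KYTE_DOOLITTLE[name] for name in lining_residues
              if name in KYTE_DOOLITTLE]
    if not values:
        return None
    return sum(values) / len(values)

# Hypothetical lining of a short channel.
lining = ["ARG", "ASP", "ILE", "LYS"]
```

For the hypothetical `lining` above, the charge is +1 (two positive residues, one negative) and the hydropathy is the mean of -4.5, -3.5, 4.5 and -3.9.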
