Input Files: definput.json, dmdinput.json, and phdinput.json

Input File Format

All input files are in JSON format. JSON is a standardized format, which makes the input parameters easy to parse. If you are familiar with Python, JSON will look familiar: a JSON file is essentially one large Python dictionary, with a few key differences:

Strings are enclosed in double quotes "" and not single quotes ''. That is, "hello" is a valid string and 'hello' is an invalid string.
Boolean values are lowercase, not capitalized as in Python. That is, true and false rather than True and False.
null is used in place of None.

Several common errors come from using ' ', True/False, or None in the input file, or from missing commas. Remember that commas must separate elements in a JSON file just as they do in a Python dictionary! You will often get an error message such as

json.decoder.JSONDecodeError: Expecting ',' delimiter: line 54 column 2 (char 639)

This means that the JSON parser reached line 54, column 2 expecting a comma, but found another entry instead. Typically this means that the line above (or the previous entry) is missing its comma, and one should be placed there.
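The mistakes described above can be reproduced with Python's standard json module. The following is a minimal sketch, not part of phd3; the keys "name", "flag", and "extra" are placeholders chosen for illustration and are not real phd3 parameters.

```python
import json

# Hypothetical input fragments; the keys are placeholders, not phd3 parameters.
VALID = '{\n  "name": "hello",\n  "flag": true,\n  "extra": null\n}'
MISSING_COMMA = '{\n  "name": "hello"\n  "flag": true\n}'
SINGLE_QUOTES = "{'name': 'hello'}"
PYTHON_LITERALS = '{"flag": True, "extra": None}'


def check(text):
    """Return None if text parses as JSON, else (message, line, column)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.msg, err.lineno, err.colno
    return None


for label, text in [
    ("valid", VALID),
    ("missing comma", MISSING_COMMA),
    ("single quotes", SINGLE_QUOTES),
    ("Python literals", PYTHON_LITERALS),
]:
    print(label, "->", check(text))
```

For the missing-comma case the parser reports line 3, where it found the unexpected entry, while the comma actually belongs at the end of line 2.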

Note that, just as in a Python dictionary, a value can take on various types. These types are strings, numbers, lists, dictionaries, and booleans. When describing each parameter on subsequent pages, we indicate its type in parentheses.
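To see how these types appear once a file has been loaded in Python, here is a small sketch using the standard json module. The keys are invented for illustration and do not correspond to actual phd3 parameters.

```python
import json

# Hypothetical entry names, one per value type.
text = """
{
  "a_string": "text",
  "a_float": 1.5,
  "an_integer": 3,
  "a_list": [1, 2],
  "a_dictionary": {"key": "value"},
  "a_boolean": false,
  "nothing": null
}
"""

params = json.loads(text)

# Map each key to the name of the Python type the parser produced.
types = {key: type(value).__name__ for key, value in params.items()}

for key, type_name in types.items():
    print(f"{key}: {type_name}")
```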

Specifying Atoms within a Protein

For both QM/DMD and DMD, it is often necessary to specify chains, residues, and even specific atoms within a protein. We have decided to assign each atom, residue, and chain a unique identifier based on the chain letter, residue number, and atom id. Chains are specified by their chain letter only. That is, "A" selects chain A. Residues are part of a chain and therefore require both a chain letter and a residue number. We use the : to separate these elements. So "A:45" would be residue 45 on chain A. Finally, the inclusion of an atom id selects a specific atom from a residue. Thus, "A:45:SG" would select atom SG (the sulfur atom of a cysteine residue) from residue 45 on chain A. Note that the atom id is also called the atom name and occupies columns 13-16 of each ATOM record in the pdb file.

ATOM      1  N   ILE A   1     113.272 103.403  75.327  1.00  0.00           N
ATOM      2  CA  ILE A   1     112.497 103.485  76.529  1.00  0.00           C
ATOM      3  C   ILE A   1     111.719 104.788  76.569  1.00  0.00           C
ATOM      4  O   ILE A   1     112.137 105.759  75.903  1.00  0.00           O
ATOM      5  CB  ILE A   1     113.385 103.323  77.780  1.00  0.00           C
ATOM      6  CG1 ILE A   1     114.340 104.459  78.055  1.00  0.00           C
ATOM      7  CG2 ILE A   1     112.491 103.098  79.010  1.00  0.00           C
ATOM      8  CD1 ILE A   1     114.563 104.779  79.510  1.00  0.00           C
ATOM      9  HA  ILE A   1     111.778 102.666  76.520  1.00  0.00           H
ATOM     10  HB1 ILE A   1     113.982 102.422  77.642  1.00  0.00           H

In the sample pdb above, we can see that the atom ids are N, CA, C, O, CB, CG1, etc.
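As an illustration of the identifier scheme, the sketch below splits a selection string on : and matches it against fixed-width ATOM records. It is not phd3's own parser. The names Selection, parse_selection, atom_fields, and matches are hypothetical, and the column positions assume standard PDB formatting (atom name in columns 13-16, chain letter in column 22, residue number in columns 23-26).

```python
from typing import NamedTuple, Optional

# Three records copied from the sample pdb above.
SAMPLE_PDB = [
    "ATOM      1  N   ILE A   1     113.272 103.403  75.327  1.00  0.00           N",
    "ATOM      2  CA  ILE A   1     112.497 103.485  76.529  1.00  0.00           C",
    "ATOM      6  CG1 ILE A   1     114.340 104.459  78.055  1.00  0.00           C",
]


class Selection(NamedTuple):
    chain: str
    residue: Optional[int] = None
    atom: Optional[str] = None


def parse_selection(text):
    """Split "A", "A:45", or "A:45:SG" into chain, residue number, and atom id."""
    parts = text.split(":")
    if not 1 <= len(parts) <= 3 or not all(parts):
        raise ValueError(f"invalid selection: {text!r}")
    residue = int(parts[1]) if len(parts) >= 2 else None
    atom = parts[2] if len(parts) == 3 else None
    return Selection(parts[0], residue, atom)


def atom_fields(line):
    """Return (chain, residue number, atom id) from a fixed-width ATOM record."""
    # Columns are 1-based in the PDB format, so columns 13-16 are line[12:16].
    return line[21], int(line[22:26]), line[12:16].strip()


def matches(selection, line):
    """True if the record belongs to the selected chain, residue, or atom."""
    chain, residue, atom = atom_fields(line)
    if chain != selection.chain:
        return False
    if selection.residue is not None and residue != selection.residue:
        return False
    if selection.atom is not None and atom != selection.atom:
        return False
    return True


selected = [line for line in SAMPLE_PDB if matches(parse_selection("A:1:CA"), line)]
print(selected)
```

A chain-only selection such as "A" matches every record on that chain, while "A:1:CA" narrows the match to a single atom.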

Specifying Atoms within a QM System

Though phd3 is built for running calculations on proteins, it can still perform QM calculations with TURBOMOLE on any system. When specifying atoms within the coord file or .xyz file, numerical indices (starting from 1) are used to specify the atom of interest. When multiple atoms are needed, such as for defining bonds between atoms, a comma , is used to delimit selections. Hence, "4,24" would select atoms 4 and 24.
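Because these indices start from 1 while Python lists start from 0, it can help to convert explicitly when post-processing a coordinate file. The sketch below is an assumption about how one might handle such a string in a script; parse_atom_indices and to_zero_based are hypothetical helper names, not phd3 functions.

```python
def parse_atom_indices(text):
    """Turn a comma-delimited selection such as "4,24" into 1-based integers."""
    indices = [int(part) for part in text.split(",")]
    if any(index < 1 for index in indices):
        raise ValueError(f"atom indices start from 1: {text!r}")
    return indices


def to_zero_based(indices):
    """Convert 1-based atom indices into 0-based Python list positions."""
    return [index - 1 for index in indices]


bond = parse_atom_indices("4,24")
print(bond, to_zero_based(bond))
```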