Input Files: definput.json, dmdinput.json, and phdinput.json - alexandrova-lab-ucla/phd3 GitHub Wiki
Input File Format
All input files are in JSON format. JSON is a standardized format, which makes the input parameters easy to parse. If you are familiar with Python, JSON will look familiar: a JSON file is essentially one large Python dictionary, with a few key differences:
- Strings are enclosed in double quotes "" and not single quotes ''. That is, "hello" is a valid string and 'hello' is an invalid string.
- Boolean values are lowercase, not capitalized. That is, use true and false rather than True and False.
- null is used in place of None.
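As a small illustration of the differences listed above, the following sketch parses a JSON fragment with Python's standard json module. The keys are made up for this example and are not actual phd3 parameters.

```python
import json

# Hypothetical input fragment: these keys are illustrative only
# and are not real phd3 parameters.
text = """
{
    "name": "hello",
    "restart": true,
    "verbose": false,
    "scratch": null
}
"""

# json.loads converts JSON true/false/null to Python True/False/None.
params = json.loads(text)
print(params)
```

Writing 'hello', True, or None directly in the file would make the parser raise an error instead.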
Several common errors come from using ' ', True/False, or None in the input file, or from missing commas. Remember that you need commas to separate elements in a JSON file, just as you would in a Python dictionary. Often, you will get an error message such as
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 54 column 2 (char 639)
This means that the JSON parser reached line 54, column 2 expecting a comma but found another entry instead. Typically, the line above (the previous entry) is missing its trailing comma, and one should be placed there.
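To check a file before running a calculation, one option is to catch the exception and report where the parser stopped. This is a minimal sketch using only the standard library; the helper name check_json and the sample text are hypothetical and not part of phd3.

```python
import json

# Hypothetical fragment with a missing comma after the first entry (line 2).
bad_text = '{\n    "first": 1\n    "second": 2\n}'


def check_json(text):
    """Return None if text parses, otherwise a short description of the error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno point at where the parser gave up, which is
        # usually the entry after the one that is missing its comma.
        return f"{err.msg} at line {err.lineno}, column {err.colno}"
    return None


print(check_json(bad_text))
```

Here the parser stops on the third line of the fragment, while the missing comma belongs at the end of the second line.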
Note that, just as in a Python dictionary, a value can take on various types: strings, numbers, lists, dictionaries, and booleans. When describing each parameter on subsequent pages, we indicate its type in parentheses.
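The sketch below shows how each of these types appears once a file is loaded in Python. The keys are hypothetical and chosen only so that there is one value of each type.

```python
import json

# Hypothetical keys, one per value type; not real phd3 parameters.
text = """
{
    "label": "run1",
    "steps": 40,
    "cutoff": 2.5,
    "atoms": [4, 24],
    "options": {"restart": false},
    "enabled": true
}
"""

# Map each key to the name of the Python type its value was parsed into.
types = {key: type(value).__name__ for key, value in json.loads(text).items()}
for key, name in types.items():
    print(f"{key}: {name}")
```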
Specifying Atoms within a Protein
For both QM/DMD and DMD, it is often necessary to specify chains, residues, and even specific atoms within a protein. We have decided to assign each atom, residue, and chain a unique identifier based on the chain letter, residue number, and atom id. Chains are specified by their chain letter only. That is, "A" selects chain A. Residues are part of a chain and therefore require both a chain letter and a residue number, separated by a colon (:). So "A:45" is residue 45 on chain A. Finally, including an atom id selects a specific atom from a residue. Thus, "A:45:SG" selects atom SG (the sulfur atom of a cysteine residue) from residue 45 on chain A. Note that the atom id is also called the atom name and occupies columns 13-16 of each ATOM record in a PDB file.
ATOM      1  N   ILE A   1     113.272 103.403  75.327  1.00  0.00           N
ATOM      2  CA  ILE A   1     112.497 103.485  76.529  1.00  0.00           C
ATOM      3  C   ILE A   1     111.719 104.788  76.569  1.00  0.00           C
ATOM      4  O   ILE A   1     112.137 105.759  75.903  1.00  0.00           O
ATOM      5  CB  ILE A   1     113.385 103.323  77.780  1.00  0.00           C
ATOM      6  CG1 ILE A   1     114.340 104.459  78.055  1.00  0.00           C
ATOM      7  CG2 ILE A   1     112.491 103.098  79.010  1.00  0.00           C
ATOM      8  CD1 ILE A   1     114.563 104.779  79.510  1.00  0.00           C
ATOM      9  HA  ILE A   1     111.778 102.666  76.520  1.00  0.00           H
ATOM     10  HB1 ILE A   1     113.982 102.422  77.642  1.00  0.00           H
In the sample PDB above, the atom ids are N, CA, C, O, CB, CG1, and so on.
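The selection syntax and the fixed-width PDB columns can be tied together with a short sketch. This is not how phd3 parses selections internally. The names Selection, parse_selection, and pdb_atom_fields are hypothetical, and the record string assumes standard fixed-width PDB spacing.

```python
from typing import NamedTuple, Optional


class Selection(NamedTuple):
    chain: str
    residue: Optional[int] = None
    atom: Optional[str] = None


def parse_selection(text: str) -> Selection:
    """Split "A", "A:45", or "A:45:SG" into chain, residue number, and atom id."""
    parts = text.split(":")
    if not 1 <= len(parts) <= 3 or not all(parts):
        raise ValueError(f"invalid selection: {text!r}")
    chain = parts[0]
    residue = int(parts[1]) if len(parts) >= 2 else None
    atom = parts[2] if len(parts) == 3 else None
    return Selection(chain, residue, atom)


def pdb_atom_fields(line: str) -> Selection:
    """Read chain, residue number, and atom name from a fixed-width ATOM record."""
    return Selection(
        chain=line[21],            # column 22
        residue=int(line[22:26]),  # columns 23-26
        atom=line[12:16].strip(),  # columns 13-16
    )


# One record from the sample, written with standard fixed-width spacing.
record = "ATOM      6  CG1 ILE A   1     114.340 104.459  78.055  1.00  0.00           C"

print(parse_selection("A:45:SG"))
print(pdb_atom_fields(record) == parse_selection("A:1:CG1"))
```

Python slices are 0-based and exclude the end index, so columns 13-16 correspond to line[12:16].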
Specifying Atoms within a QM System
Though phd3 is built for running calculations on proteins, it can still perform QM calculations with TURBOMOLE regardless of the system. When specifying atoms within the coord file or .xyz file, numerical indices (starting from 1) are used to specify the atom of interest. When multiple atoms are needed, such as for defining bonds between atoms, a comma (,) is used to delimit selections. Hence, "4,24" would select atoms 4 and 24.
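A comma-delimited selection can be turned into a list of indices with a few lines of Python. This is a sketch only: the function name parse_atom_indices and the example atom list are hypothetical, and phd3's own handling may differ.

```python
from typing import List


def parse_atom_indices(text: str) -> List[int]:
    """Convert a selection such as "4,24" into a list of 1-based atom indices."""
    indices = [int(item) for item in text.split(",")]
    if any(index < 1 for index in indices):
        raise ValueError("atom indices start from 1")
    return indices


selected = parse_atom_indices("4,24")
print(selected)

# Hypothetical element list standing in for the atoms of a coord or .xyz file.
elements = ["C", "H", "H", "O", "H"]
# Subtract 1 when indexing a Python list, which is 0-based.
print(elements[selected[0] - 1])
```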