data submission workflow - informatics-isi-edu/pdb-ihm GitHub Wiki
Details of which scripts need to be called and which tables each script needs to update.
mmCIF file processing (Step 2)
Installation
Installation Instruction for python-ihm
- pip install ihm
Installation Instruction for biopython (no longer needed)
- pip install biopython
Installation Instruction for mmcif
- yum install cmake
- pip install mmcif
Installation Instruction for rcsb utils
- pip install rcsb.utils.io
- pip install rcsb.utils.chemref
- pip install rcsb.utils.ec
- pip install rcsb.utils.seq
- pip install rcsb.utils.struct
- pip install rcsb.utils.taxonomy
- pip install rcsb.utils.multiproc
- pip install rcsb.utils.validation
- pip install rcsb.utils.config
Note: The above packages can be installed from PyPI.
In addition, the latest version of the py-rcsb_db.tar.gz file, which contains the rcsb/db directory already configured to run the required scripts, is available on the salilab server (managed by Arthur).
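A quick way to confirm the installation is to ask Python which of the distributions listed above are present. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is illustrative and not part of the pdb-ihm scripts, and the list simply mirrors the pip commands above.

```python
from importlib import metadata

# Distribution names copied from the pip install commands above.
REQUIRED = [
    "ihm",
    "mmcif",
    "rcsb.utils.io",
    "rcsb.utils.chemref",
    "rcsb.utils.ec",
    "rcsb.utils.seq",
    "rcsb.utils.struct",
    "rcsb.utils.taxonomy",
    "rcsb.utils.multiproc",
    "rcsb.utils.validation",
    "rcsb.utils.config",
]


def missing_packages(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_packages(REQUIRED):
        print(f"not installed: {name}")
```

The check only looks at installed package metadata; it does not verify that the py-rcsb_db tarball has been unpacked or configured.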
Installation Instruction for CifCheck
- Follow the instructions from RCSB Software Tools (only the first two steps, download and build) OR
- Install from source:
git clone --recurse-submodules https://github.com/rcsb/cpp-dict-pack.git
then
cd cpp-dict-pack
mkdir build
cd build
cmake .. -DMINIMAL_DICTS=ON
make
# The build generates a bin folder under build
Installation Instruction for IHMValidation
The deployment of IHMValidation pipeline requires several actions:
- Download pre-built binary image with 3rd party dependencies
- Pull IHMValidation code from github repo
- Create the necessary directory structure
The exact commands are available in the IHMValidation deployment script and have already been incorporated into the dev and prod deployment scripts.
Workflow detail
- Convert partial mmCIF (user uploaded file) to mmCIF using python-ihm:
# From the scripts/make-mmCIF directory run:
python3 make-mmcif.py input.cif
Note: This package is used to produce mmCIF that can be converted to JSON and loaded into ERMrest. It is not used to create mmCIF files in the submission workflow.
Requirements for this step:
- Biopython
- make-mmcif.py (provided by Brinda)
- Input CIF file (e.g., input.cif) uploaded by user
- Copy output.cif from the previous step to py-rcsb_db/rcsb/db/tests-validate/test-output/ihm-files
- Convert mmCIF to JSON using py-rcsb_db:
# From the scripts/make-json/py-rcsb_db directory run:
python3 rcsb/db/tests-validate/testSchemaDataPrepValidate-ihm.py
Note: Output JSON files are written to rcsb/db/tests-validate/test-output
Requirements for this step:
- Brinda will provide the following files, which need to be properly installed:
- a Python script, i.e., rcsb/db/tests-validate/testSchemaDataPrepValidate-ihm.py
- a YAML file, i.e., rcsb/db/config/exdb-config-example-ihm.yml
- a JSON file, i.e., CACHE/data_type_and_coverage/scan-ihm_dev-type-map.json
- the IHM dictionary file, i.e., ihm-extension.dic in CACHE/dictionaries
- Use JSON file to populate tables
- struct (editable)
- entity (editable)
- entity_poly (not editable)
- entity_poly_seq (not editable)
- pdbx_poly_seq_scheme (not editable)
- chem_comp (not editable)
- atom_type (not editable)
- struct_asym (not editable)
- ihm_entity_poly_segment (editable)
- ihm_struct_assembly (editable)
- ihm_struct_assembly_details (editable)
- ihm_model_representation (editable)
- ihm_model_representation_details (editable)
- ihm_modeling_protocol (editable)
- ihm_model_list (not editable)
- ihm_model_group (editable)
- ihm_model_group_link (editable)
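The table list above can be kept as a small lookup that also records which tables are editable. The sketch below assumes the JSON output is a single object keyed by category name; that layout is an assumption and should be checked against the files produced by py-rcsb_db. The names `TABLES` and `select_table_payloads` are illustrative only.

```python
import json

# Tables from the list above, mapped to whether they are editable.
TABLES = {
    "struct": True,
    "entity": True,
    "entity_poly": False,
    "entity_poly_seq": False,
    "pdbx_poly_seq_scheme": False,
    "chem_comp": False,
    "atom_type": False,
    "struct_asym": False,
    "ihm_entity_poly_segment": True,
    "ihm_struct_assembly": True,
    "ihm_struct_assembly_details": True,
    "ihm_model_representation": True,
    "ihm_model_representation_details": True,
    "ihm_modeling_protocol": True,
    "ihm_model_list": False,
    "ihm_model_group": True,
    "ihm_model_group_link": True,
}


def select_table_payloads(document):
    """Keep only the categories named in TABLES, in the order listed above.

    ASSUMPTION: `document` is a dict keyed by category name.
    """
    return {name: document[name] for name in TABLES if name in document}


if __name__ == "__main__":
    # Hypothetical document standing in for a parsed JSON output file.
    doc = json.loads('{"struct": [{"title": "example"}], "software": []}')
    for name, payload in select_table_payloads(doc).items():
        print(name, "editable" if TABLES[name] else "not editable", payload)
```

Categories that are not in the list (such as `software` in the example) are dropped rather than loaded.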
Upload File processing (Step 4)
- Check out a file from the Entry_Related_File table that hasn't been processed.
- Retrieve the file from Hatrac.
- Populate the file's corresponding table (selected by File_Type) with the file content. Make sure each row includes a foreign key to Entry_Related_File.
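The foreign-key requirement in the last step can be illustrated with a small sketch. It assumes the uploaded file is CSV text, that the foreign-key column is named `Entry_Related_File`, and that the file record is identified by a value such as `1-ABCD`; all three are assumptions for illustration, and fetching from hatrac and inserting into the target table are left out.

```python
import csv
import io


def rows_for_table(file_text, entry_related_file_id, fk_column="Entry_Related_File"):
    """Parse CSV text and tag every row with a reference to its source file record.

    ASSUMPTION: the content is CSV with a header row; `fk_column` is a
    hypothetical column name for the foreign key.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(file_text)):
        tagged = dict(row)
        tagged[fk_column] = entry_related_file_id
        rows.append(tagged)
    return rows


if __name__ == "__main__":
    sample = "id,value\n1,a\n2,b\n"
    for row in rows_for_table(sample, "1-ABCD"):
        print(row)
```

Tagging the rows before insertion means every row in the target table can be traced back to the uploaded file it came from.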
Export entry into mmCIF File (export)
mmCIF validator
- Get the mmCIF dictionary software suite from the RCSB software tools website.
- Follow steps 1 and 2 in the instructions for installation.
- The serialized sdb file (mmcif_ihm_vx.xx.sdb) can be obtained from the IHM-dictionary Git repository. Brinda will provide the version that needs to be used, since the Deriva data model is a few versions behind the current dictionary version.
- Execute the command for validating the mmCIF file (step 4):
./bin/CifCheck -f mmCIF_filename -dictSdb sdb_filename
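When the validator is called from a script, the command above can be assembled as an argument list. This is a minimal sketch: the `-f` and `-dictSdb` flags are taken from the command above, while the function names and the `build_dir` parameter are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def cifcheck_command(cif_path, sdb_path, build_dir="."):
    """Build the CifCheck argument list using the flags shown above."""
    binary = Path(build_dir) / "bin" / "CifCheck"
    return [str(binary), "-f", str(cif_path), "-dictSdb", str(sdb_path)]


def run_cifcheck(cif_path, sdb_path, build_dir="."):
    """Run CifCheck and return the CompletedProcess for the caller to inspect."""
    return subprocess.run(
        cifcheck_command(cif_path, sdb_path, build_dir),
        capture_output=True,
        text=True,
        check=False,
    )
```

Passing a list to `subprocess.run` avoids shell quoting problems when file names contain spaces.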
Generate validation report
To generate a validation report, run the following command as the pdbihm user from the /mnt/vdb1/pdbihm folder:
singularity exec --pid --bind IHMValidation/:/opt/IHMValidation,input:/ihmv/input,output:/ihmv/output,cache:/ihmv/cache ihmv_20231222.sif /opt/IHMValidation/ihm_validation/ihm_validator.py --output-root /ihmv/output --cache-root /ihmv/cache --force -f input/mmCIF_filename
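The same command can be built programmatically so that only the input file name varies. The sketch below reproduces the bind mounts, image name, and options from the command above; the function name `validation_command` is an illustrative assumption, and the caller is expected to run the result from the folder named above.

```python
def validation_command(cif_name, image="ihmv_20231222.sif"):
    """Build the singularity argument list for one input file."""
    # Host-to-container bind mounts, copied from the command above.
    binds = ",".join([
        "IHMValidation/:/opt/IHMValidation",
        "input:/ihmv/input",
        "output:/ihmv/output",
        "cache:/ihmv/cache",
    ])
    return [
        "singularity", "exec", "--pid", "--bind", binds, image,
        "/opt/IHMValidation/ihm_validation/ihm_validator.py",
        "--output-root", "/ihmv/output",
        "--cache-root", "/ihmv/cache",
        "--force",
        "-f", f"input/{cif_name}",
    ]


if __name__ == "__main__":
    print(" ".join(validation_command("mmCIF_filename")))
```

The returned list can be passed directly to `subprocess.run` with `cwd` set to the working folder.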