data submission workflow - informatics-isi-edu/pdb-ihm GitHub Wiki

Details of which scripts need to be called and which tables each script needs to update.

mmCIF file processing (Step 2)

Installation

Installation Instructions for python-ihm

pip install ihm

Installation Instructions for biopython (no longer needed)

pip install biopython

Installation Instructions for mmcif

yum install cmake
pip install mmcif

Installation Instructions for rcsb utils

pip install rcsb.utils.io
pip install rcsb.utils.chemref
pip install rcsb.utils.ec
pip install rcsb.utils.seq
pip install rcsb.utils.struct
pip install rcsb.utils.taxonomy
pip install rcsb.utils.multiproc
pip install rcsb.utils.validation
pip install rcsb.utils.config

Note: The above packages can be installed from PyPI.

In addition, the latest py-rcsb_db.tar.gz archive, which contains the rcsb/db directory already configured to run the required scripts, is available on the salilab server (managed by Arthur).
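As a quick sanity check after installation, the sketch below reports which of the pip packages listed above cannot be located by the Python interpreter. It assumes that each import name matches its pip package name (for example, ihm for python-ihm); the missing_modules helper and the REQUIRED_MODULES list are illustrative and not part of the repository. Biopython is left out because it is marked as no longer needed.

```python
import importlib.util

# Assumption: import names match the pip package names listed above.
REQUIRED_MODULES = [
    "ihm",
    "mmcif",
    "rcsb.utils.io",
    "rcsb.utils.chemref",
    "rcsb.utils.ec",
    "rcsb.utils.seq",
    "rcsb.utils.struct",
    "rcsb.utils.taxonomy",
    "rcsb.utils.multiproc",
    "rcsb.utils.validation",
    "rcsb.utils.config",
]


def missing_modules(names):
    """Return the names from `names` that the interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises when a parent package (e.g. rcsb.utils) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_modules(REQUIRED_MODULES):
        print(f"missing: {name}")
```

Running the script prints one line per package that still needs to be installed and prints nothing when everything is available.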

Installation Instructions for CifCheck

Follow the instructions from RCSB Software Tools (only the first two steps, download and build), OR
install from source: run git clone --recurse-submodules https://github.com/rcsb/cpp-dict-pack.git, then:

  cd cpp-dict-pack
  mkdir build
  cd build
  cmake .. -DMINIMAL_DICTS=ON
  make
  # This process generates a bin folder under build

Installation Instructions for IHMValidation

Deploying the IHMValidation pipeline requires several actions:

Download the pre-built binary image with third-party dependencies
Pull the IHMValidation code from the GitHub repo
Create the necessary directory structure

The exact commands are available in the IHMValidation deployment script and have already been incorporated into the dev and prod deployment scripts.

Workflow detail

Convert the partial mmCIF (user-uploaded file) to mmCIF using python-ihm:

# From the scripts/make-mmCIF directory run:

python3 make-mmcif.py input.cif

Note: This package is used to produce an mmCIF file that can be converted to JSON and loaded into ERMrest. It is not used to create mmCIF in the submission workflow.

Requirements for this step:

Biopython
make-mmcif.py (provided by Brinda)
Input CIF file (e.g., input.cif) uploaded by user

Copy output.cif from the previous step to py-rcsb_db/rcsb/db/tests-validate/test-output/ihm-files
Convert mmCIF to JSON using py-rcsb_db:

# From the scripts/make-json/py-rcsb_db directory run:

python3 rcsb/db/tests-validate/testSchemaDataPrepValidate-ihm.py

Note: The output JSON files are in rcsb/db/tests-validate/test-output.

Requirements for this step:

Brinda will provide the following files, which need to be properly installed:
- a Python script, i.e., rcsb/db/tests-validate/testSchemaDataPrepValidate-ihm.py
- a yml file, i.e., rcsb/db/config/exdb-config-example-ihm.yml
- a json file, i.e., CACHE/data_type_and_coverage/scan-ihm_dev-type-map.json
- the IHM dictionary file, i.e., ihm-extension.dic in CACHE/dictionaries
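Before running the conversion, it can help to stage the mmCIF file and confirm that the files listed above are in place. The following is a minimal sketch. It assumes that all four paths, including the CACHE ones, are relative to the py-rcsb_db checkout; the helper names missing_required_files and stage_mmcif are hypothetical.

```python
import shutil
from pathlib import Path

# Assumption: every path below is relative to the py-rcsb_db checkout.
REQUIRED_FILES = [
    "rcsb/db/tests-validate/testSchemaDataPrepValidate-ihm.py",
    "rcsb/db/config/exdb-config-example-ihm.yml",
    "CACHE/data_type_and_coverage/scan-ihm_dev-type-map.json",
    "CACHE/dictionaries/ihm-extension.dic",
]

# Directory that receives the mmCIF file from the previous step.
IHM_FILES_DIR = "rcsb/db/tests-validate/test-output/ihm-files"


def missing_required_files(root):
    """Return the required relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


def stage_mmcif(cif_path, root):
    """Copy the mmCIF file into the ihm-files directory and return its new path."""
    target_dir = Path(root) / IHM_FILES_DIR
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(cif_path, target_dir))
```

A caller would typically stop when missing_required_files returns a non-empty list, and otherwise call stage_mmcif with output.cif before running the conversion script.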

Use the JSON file to populate the following tables:

struct (editable)
entity (editable)
entity_poly (not editable)
entity_poly_seq (not editable)
pdbx_poly_seq_scheme (not editable)
chem_comp (not editable)
atom_type (not editable)
struct_asym (not editable)
ihm_entity_poly_segment (editable)
ihm_struct_assembly (editable)
ihm_struct_assembly_details (editable)
ihm_model_representation (editable)
ihm_model_representation_details (editable)
ihm_modeling_protocol (editable)
ihm_model_list (not editable)
ihm_model_group (editable)
ihm_model_group_link (editable)
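One way to organize this step is sketched below: the table list above is encoded as a mapping from table name to its editable flag, and rows are collected per table from a loaded JSON document. The layout of the JSON (each table name as a top-level key holding either one row or a list of rows) is an assumption, not something this page specifies, and loading the rows into the catalog is left out.

```python
# Table name -> editable flag, in the order listed above.
TABLES = {
    "struct": True,
    "entity": True,
    "entity_poly": False,
    "entity_poly_seq": False,
    "pdbx_poly_seq_scheme": False,
    "chem_comp": False,
    "atom_type": False,
    "struct_asym": False,
    "ihm_entity_poly_segment": True,
    "ihm_struct_assembly": True,
    "ihm_struct_assembly_details": True,
    "ihm_model_representation": True,
    "ihm_model_representation_details": True,
    "ihm_modeling_protocol": True,
    "ihm_model_list": False,
    "ihm_model_group": True,
    "ihm_model_group_link": True,
}


def rows_by_table(document, tables=TABLES):
    """Collect rows per table from one JSON document.

    Assumes each table is a top-level key whose value is a single row (dict)
    or a list of rows. Keys that are not in `tables` are ignored.
    """
    result = {}
    for table in tables:
        value = document.get(table)
        if value is None:
            continue
        result[table] = [value] if isinstance(value, dict) else list(value)
    return result
```

The document passed in would come from json.load on one of the generated JSON files; the editable flags can then drive how each table is configured.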

Upload File processing (Step 4)

Check out a file from the Entry_Related_File table that has not been processed.
Retrieve the file from Hatrac.
Populate the table that corresponds to the file (selected using File_Type) with the file content. Make sure each row carries a foreign key to Entry_Related_File.
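The row-preparation part of these steps might look like the sketch below. It assumes the uploaded file is CSV text and that the foreign-key column is named Entry_Related_File; both are illustrative assumptions, as are the function name and the example RID. Retrieving the file from Hatrac and inserting the rows into the target table are not shown.

```python
import csv
import io


def rows_with_file_reference(content, file_rid, fk_column="Entry_Related_File"):
    """Parse CSV text and tag every row with the originating file's key.

    `file_rid` identifies the Entry_Related_File record (hypothetical value);
    `fk_column` is an assumed name for the foreign-key column.
    """
    reader = csv.DictReader(io.StringIO(content))
    rows = []
    for row in reader:
        row = dict(row)
        row[fk_column] = file_rid  # link the row back to its source file
        rows.append(row)
    return rows


if __name__ == "__main__":
    example = "id,name\n1,first\n2,second\n"
    for row in rows_with_file_reference(example, "1-ABCD"):
        print(row)
```

Adding the key while parsing means no row can reach the target table without its reference to the source file.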

Export entry into mmCIF File (export)

mmCIF validator

Get the mmCIF dictionary software suite from the RCSB Software Tools website.
Follow steps 1 and 2 in the instructions for installation.
The serialized sdb file (mmcif_ihm_vx.xx.sdb) can be obtained from the IHM-dictionary Git repository. Brinda will provide the version that needs to be used, since the Deriva data model is a few versions behind the current dictionary version.
Execute the command for validating the mmCIF file (step 4): ./bin/CifCheck -f mmCIF_filename -dictSdb sdb_filename
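When the validator is called from a script, the command can be assembled as an argument list, as in this minimal sketch. Only the -f and -dictSdb options shown above are used; the function names are hypothetical, and how CifCheck reports its findings is not assumed here, so the caller receives the raw process result.

```python
import subprocess


def cifcheck_command(mmcif_file, sdb_file, cifcheck="./bin/CifCheck"):
    """Build the CifCheck argument list for one mmCIF file."""
    return [cifcheck, "-f", str(mmcif_file), "-dictSdb", str(sdb_file)]


def run_cifcheck(mmcif_file, sdb_file, cwd=None):
    """Run CifCheck from `cwd` (the build directory) and return the result.

    stdout and stderr are captured as text on the returned CompletedProcess.
    """
    return subprocess.run(
        cifcheck_command(mmcif_file, sdb_file),
        cwd=cwd,
        capture_output=True,
        text=True,
    )
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.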

Generate validation report

To generate a validation report, run the following command as the pdbihm user from the /mnt/vdb1/pdbihm folder:

singularity exec --pid --bind IHMValidation/:/opt/IHMValidation,input:/ihmv/input,output:/ihmv/output,cache:/ihmv/cache ihmv_20231222.sif /opt/IHMValidation/ihm_validation/ihm_validator.py --output-root /ihmv/output --cache-root /ihmv/cache --force -f input/mmCIF_filename
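For automation, the same command can be built from its parts so that only the file name and the image change between runs. This is a sketch that reproduces the command above; the validation_command helper is hypothetical, and the default image name is simply the one used on this page.

```python
import shlex


def validation_command(mmcif_filename, image="ihmv_20231222.sif"):
    """Build the singularity command that generates a validation report."""
    # Host paths are relative to the working directory (/mnt/vdb1/pdbihm).
    binds = ",".join([
        "IHMValidation/:/opt/IHMValidation",
        "input:/ihmv/input",
        "output:/ihmv/output",
        "cache:/ihmv/cache",
    ])
    return [
        "singularity", "exec", "--pid", "--bind", binds, image,
        "/opt/IHMValidation/ihm_validation/ihm_validator.py",
        "--output-root", "/ihmv/output",
        "--cache-root", "/ihmv/cache",
        "--force",
        "-f", f"input/{mmcif_filename}",
    ]


if __name__ == "__main__":
    # Print the command line for inspection before running it.
    print(shlex.join(validation_command("mmCIF_filename")))
```

The returned list can be passed to subprocess.run with cwd set to the folder that holds the IHMValidation, input, output, and cache directories.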