Data submission workflow - informatics-isi-edu/protein-database Wiki

This page details which scripts need to be called and which tables each script needs to update.

mmCIF file processing (Step 2)


  • To do:
    • Brinda is going to send the Git repo with instructions on how to install the library
    • Brinda will provide a hello world script similar to (to verify that the installation is correct)

Installation instructions for python-ihm

  1. pip install ihm

Installation instructions for biopython

  1. pip install biopython

Installation instructions for mmcif

  1. yum install cmake
  2. pip install mmcif

Installation instructions for rcsb

  1. pip install
  2. pip install rcsb.utils.chemref
  3. pip install
  4. pip install rcsb.utils.seq
  5. pip install rcsb.utils.struct
  6. pip install rcsb.utils.taxonomy
  7. pip install rcsb.utils.multiproc
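Until the hello world script from the to-do list is available, a minimal sketch like the one below can confirm that the packages above import cleanly. The module names (`ihm`, `Bio`, `mmcif`, and the `rcsb.utils.*` namespaces) are assumed import names for the pip packages listed here; the two rcsb packages whose names are missing above are left out.

```python
"""Sketch of an installation check: try to import each package installed above."""
import importlib

# ASSUMPTION: import names for the pip packages above (pip name in the comment).
# The two unnamed packages in the rcsb list are not included.
REQUIRED_MODULES = [
    "ihm",                   # pip install ihm (python-ihm)
    "Bio",                   # pip install biopython
    "mmcif",                 # pip install mmcif
    "rcsb.utils.chemref",
    "rcsb.utils.seq",
    "rcsb.utils.struct",
    "rcsb.utils.taxonomy",
    "rcsb.utils.multiproc",
]


def check_modules(names):
    """Return a dict mapping each module name to True if it imports, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:  # also covers ModuleNotFoundError
            status[name] = False
    return status


if __name__ == "__main__":
    # Print one line per module so missing installs are easy to spot.
    for name, ok in check_modules(REQUIRED_MODULES).items():
        print(f"{name:25s} {'OK' if ok else 'MISSING'}")
```

Any module reported as MISSING points to the install step that needs to be rerun.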

Brinda has provided the py-rcsb_db.tar.gz file, which contains the rcsb/db directory already configured to run the required scripts.

Workflow detail

  1. Convert the partial mmCIF file (uploaded by the user) to mmCIF using python-ihm:
# From the scripts/make-mmCIF directory run:

python3 input.cif

Note: This package is used to convert the uploaded mmCIF into mmCIF that can be converted to JSON and loaded into ERMrest. It is not used to create mmCIF in the submission workflow.

Requirements for this step:

  • Biopython
  • (provided by Brinda)
  • Input CIF file (e.g., input.cif) uploaded by user
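Step 1 can be wrapped in Python so it can be called from the rest of the pipeline. This is a sketch under stated assumptions: the conversion script's name is not given on this page, so it is passed in as the hypothetical `script` argument, and the script is assumed to write output.cif into its working directory (the file that the copy step uses).

```python
"""Sketch: run the step-1 conversion and confirm that output.cif was written."""
import subprocess
import sys
from pathlib import Path


def run_conversion(script, input_cif, workdir, expected_output="output.cif"):
    """Run `python3 <script> <input_cif>` from `workdir` and return the output path.

    script          -- hypothetical: path to the conversion script in
                       scripts/make-mmCIF (its name is not given on this page)
    expected_output -- ASSUMPTION: the script writes output.cif into workdir
    """
    workdir = Path(workdir)
    # sys.executable is the Python 3 interpreter running this helper;
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run([sys.executable, str(script), str(input_cif)], cwd=workdir, check=True)
    output = workdir / expected_output
    if not output.is_file():
        raise FileNotFoundError(f"conversion did not produce {output}")
    return output
```

Because the child process runs with `cwd=workdir`, relative `script` and `input_cif` paths are resolved from that directory, which matches the instruction to run from scripts/make-mmCIF.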
  2. Copy output.cif from the previous step to py-rcsb_db/rcsb/db/tests-validate/test-output/ihm-files
  3. Convert mmCIF to JSON using py-rcsb_db:
# From the scripts/make-json/py-rcsb_db directory run:

python3 rcsb/db/tests-validate/

Note: The output JSON files are written to rcsb/db/tests-validate/test-output.

Requirements for this step:

  • Brinda will provide the following files, which need to be installed correctly:
    • a Python script, i.e., rcsb/db/tests-validate/
    • a YAML file, i.e., rcsb/db/config/exdb-config-example-ihm.yml
    • a JSON file, i.e., CACHE/data_type_and_coverage/scan-ihm_dev-type-map.json
    • the IHM dictionary file, i.e., ihm-extension.dic in CACHE/dictionaries
  4. Use the JSON files to populate the following tables:
  • struct (editable)
  • entity (editable)
  • entity_poly (not editable)
  • entity_poly_seq (not editable)
  • pdbx_poly_seq_scheme (not editable)
  • chem_comp (not editable)
  • atom_type (not editable)
  • struct_asym (not editable)
  • ihm_entity_poly_segment (editable)
  • ihm_struct_assembly (editable)
  • ihm_struct_assembly_details (editable)
  • ihm_model_representation (editable)
  • ihm_model_representation_details (editable)
  • ihm_modeling_protocol (editable)
  • ihm_model_list (not editable)
  • ihm_model_group (editable)
  • ihm_model_group_link (editable)
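The copy, collect, and populate steps above can be sketched with the standard library. The directory paths and the table list (with editable flags) come from this page; the shape of the JSON documents (table names mapping to one row object or a list of row objects, possibly wrapped in a top-level list) is an assumption to check against the actual py-rcsb_db output. Loading the grouped rows into ERMrest is not shown.

```python
"""Sketch of workflow steps 2 to 4 using only the standard library."""
import json
import shutil
from pathlib import Path

# Paths from steps 2 and 3, relative to the py-rcsb_db root directory.
IHM_FILES_DIR = Path("rcsb/db/tests-validate/test-output/ihm-files")
JSON_OUTPUT_DIR = Path("rcsb/db/tests-validate/test-output")

# Tables from step 4 -> True if the table is editable.
TABLES = {
    "struct": True,
    "entity": True,
    "entity_poly": False,
    "entity_poly_seq": False,
    "pdbx_poly_seq_scheme": False,
    "chem_comp": False,
    "atom_type": False,
    "struct_asym": False,
    "ihm_entity_poly_segment": True,
    "ihm_struct_assembly": True,
    "ihm_struct_assembly_details": True,
    "ihm_model_representation": True,
    "ihm_model_representation_details": True,
    "ihm_modeling_protocol": True,
    "ihm_model_list": False,
    "ihm_model_group": True,
    "ihm_model_group_link": True,
}


def stage_output_cif(output_cif, py_rcsb_db_root):
    """Step 2: copy output.cif into the ihm-files directory; return the new path."""
    dest_dir = Path(py_rcsb_db_root) / IHM_FILES_DIR
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(output_cif, dest_dir))


def collect_json(py_rcsb_db_root):
    """Step 3: list JSON files directly under test-output (subfolders are not searched)."""
    return sorted((Path(py_rcsb_db_root) / JSON_OUTPUT_DIR).glob("*.json"))


def rows_by_table(json_paths):
    """Step 4: group rows by table, keeping only the tables listed in TABLES.

    ASSUMPTION: each file holds one document or a list of documents, and a
    document maps table names to a single row dict or a list of row dicts.
    """
    grouped = {name: [] for name in TABLES}
    for path in json_paths:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        documents = data if isinstance(data, list) else [data]
        for document in documents:
            for table, value in document.items():
                if table in grouped:
                    grouped[table].extend(value if isinstance(value, list) else [value])
    return grouped


def editable_tables():
    """Names of the tables marked editable in step 4."""
    return [name for name, editable in TABLES.items() if editable]
```

For example, `rows_by_table(collect_json(root))["entity"]` would return every entity row found in the output, and `editable_tables()` lists the tables that stay editable after loading.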

Upload file processing (Step 4)

  • Check out an unprocessed file from the Entry_Related_File table.
  • Retrieve the file from Hatrac.
  • Populate the file's corresponding table (determined by its File_Type) with the file content. Make sure that each inserted row has a foreign key to its Entry_Related_File record.
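The processing loop above could look roughly like the sketch below. ERMrest and Hatrac access is deliberately left out: `fetch_file`, `insert_rows`, and `mark_processed` are hypothetical callables to be implemented with a Deriva client. The column names `RID`, `File_URL`, and `Processed`, the CSV file format, and the FK column name `Entry_Related_File` are assumptions for illustration; only File_Type and the Entry_Related_File table come from this page.

```python
"""Sketch of uploaded-file processing with ERMrest/Hatrac access injected."""
import csv
import io


def parse_rows(file_text, entry_related_file_rid, fk_column="Entry_Related_File"):
    """Parse CSV text into row dicts and add the FK back to Entry_Related_File.

    ASSUMPTION: uploaded files are CSV with a header row; fk_column is a
    hypothetical column name.
    """
    reader = csv.DictReader(io.StringIO(file_text))
    return [{**row, fk_column: entry_related_file_rid} for row in reader]


def process_one(records, fetch_file, insert_rows, mark_processed, table_for_type):
    """Process the first unprocessed Entry_Related_File record, if there is one.

    records        -- iterable of Entry_Related_File rows (dicts)
    fetch_file     -- hypothetical: URL -> file text (e.g. a Hatrac download)
    insert_rows    -- hypothetical: (table name, rows) -> None (an ERMrest insert)
    mark_processed -- hypothetical: RID -> None (flag the record as done)
    table_for_type -- File_Type value -> name of the target table
    Returns (table, number of rows inserted), or None if nothing was pending.
    """
    for record in records:
        if record.get("Processed"):  # hypothetical status column
            continue
        text = fetch_file(record["File_URL"])  # hypothetical URL column
        table = table_for_type[record["File_Type"]]
        rows = parse_rows(text, record["RID"])
        insert_rows(table, rows)
        mark_processed(record["RID"])
        return table, len(rows)
    return None
```

A real implementation would also need to claim ("check out") the record before processing so that two workers do not handle the same file; that locking step is not shown.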

Export entry into mmCIF File (export)

mmCIF validator

  • Get the mmCIF dictionary software suite from the RCSB software tools website.
  • Follow steps 1 and 2 of the installation instructions.
  • The serialized sdb file (mmcif_ihm_vx.xx.sdb) can be obtained from the IHM-dictionary Git repository. Brinda will provide the version that needs to be used, since the Deriva data model is a few versions behind the current dictionary version.
  • Validate the mmCIF file (step 4 of the instructions) with: ./bin/CifCheck -f mmCIF_filename -dictSdb sdb_filename
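To script the validation, CifCheck can be called through `subprocess`. This sketch only builds and runs the command shown above and returns the captured output; how CifCheck reports problems (exit status, console output, or diagnostic files) is not covered here, so check the suite's documentation before relying on the result. The default `./bin/CifCheck` path assumes the command is run from the suite's install directory.

```python
"""Sketch: run CifCheck from Python so mmCIF validation can be scripted."""
import subprocess


def cifcheck_command(mmcif_file, sdb_file, cifcheck="./bin/CifCheck"):
    """Build the CifCheck command line: CifCheck -f <mmCIF> -dictSdb <sdb>."""
    return [str(cifcheck), "-f", str(mmcif_file), "-dictSdb", str(sdb_file)]


def validate(mmcif_file, sdb_file, cifcheck="./bin/CifCheck", cwd=None):
    """Run CifCheck and return the CompletedProcess with stdout/stderr captured.

    Raises FileNotFoundError if the CifCheck binary cannot be found.
    Interpreting the result is left to the caller (see the suite's docs).
    """
    return subprocess.run(
        cifcheck_command(mmcif_file, sdb_file, cifcheck),
        cwd=cwd,
        capture_output=True,
        text=True,
    )
```

For example, `validate("entry.cif", "mmcif_ihm_vx.xx.sdb")` runs the same command as the step above, with the sdb version supplied by Brinda.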