Custom Datasets for Mass UI - stjude/proteinpaint GitHub Wiki

This page contains draft content. Thank you for your patience as we make updates.

The information in this wiki is intended for experienced bioinformaticians and developers. Please note that the ProteinPaint team does not provide support or guidance for creating ETLs or formatting data source files unless a formal collaboration or contract is in place.

For details about contracts and licensing, please refer to the ProteinPaint licensing information from the St. Jude Office of Technology Licensing.

Mass UI Specs

Please refer to the mass UI README for the technical description of the mass UI.

Technical Requirements

Before setting up a custom dataset, you must:

  1. Understand Docker containers. Refer to the Docker documentation here for assistance.
  2. Create and maintain ETLs specific to your usage.

Please use our test data in this directory as an example of how to properly format data for the mass UI. This test data directory is mounted to the ppfull and ppserver containers.

Instructions

Step 1: Pull the Docker image

Pull a pinned version of the Docker image, referenced by digest.

docker pull <image-name>@<digest>

Note: Pulling our latest published image instead of a pinned version may break your setup.
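As a rough illustration of what "pinned" means here, the sketch below checks whether an image reference has the <image-name>@<digest> form before a deployment script uses it. The function name isPinnedByDigest and the example image name are hypothetical and not part of ProteinPaint; the check assumes a sha256 digest, which is 64 lowercase hexadecimal characters.

```javascript
// Hypothetical helper (not part of ProteinPaint): returns true only when the
// image reference is pinned by a sha256 digest, e.g. <image-name>@sha256:<64 hex chars>.
function isPinnedByDigest(imageRef) {
  return /^[^@\s]+@sha256:[a-f0-9]{64}$/.test(imageRef)
}

// Placeholder values for illustration only; substitute your own image name and digest.
const exampleDigest = 'sha256:' + 'a'.repeat(64)
const pinnedRef = 'example.org/ppfull@' + exampleDigest
const floatingRef = 'example.org/ppfull:latest'

console.log(isPinnedByDigest(pinnedRef))   // true
console.log(isPinnedByDigest(floatingRef)) // false
```

A check like this could guard a setup script against accidentally pulling a floating tag such as latest.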

Step 2: Download reference files

Download the reference and support files needed to run the container. Please see the instructions here for downloading the files. Note: These files currently cover only hg19 and hg38.

Step 3: Create Dataset File

Create the container/dataset directory and add a dataset js file to it. As a reference, see our test dataset configuration file. For more options and descriptions, please see the type definitions for dataset files here. Note: At this time, only js files are supported. Later this year, support will be extended to ts files.

Step 4: Set up serverconfig.json

Create a serverconfig.json, similar to container/ci/serverconfig.json. Add the custom dataset to the datasets array of the appropriate genome object. Here's an example:

"genomes": [
  {
    "name": "hg38",
    "species": "human",
    "file": "./genome/hg38.js",
    "datasets": [
      {
        "name": "Custom",
        "jsfile": "./dataset/custom.js"
      }
    ]
  }
] ...
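The same edit can be made programmatically. The following is a minimal sketch, assuming the serverconfig has already been parsed into an object with the genomes/datasets layout shown in the example; the addDataset helper is hypothetical and not part of ProteinPaint. It finds the genome by name and appends the dataset entry only if no entry with that name exists yet.

```javascript
// Hypothetical helper (not part of ProteinPaint): adds a dataset entry to the
// datasets array of the named genome, leaving existing entries untouched.
function addDataset(serverconfig, genomeName, dataset) {
  const genome = (serverconfig.genomes || []).find(g => g.name === genomeName)
  if (!genome) throw new Error(`genome "${genomeName}" not found in serverconfig`)
  // Create the datasets array if the genome object does not have one yet.
  if (!Array.isArray(genome.datasets)) genome.datasets = []
  // Skip the insert when a dataset with the same name is already registered.
  const exists = genome.datasets.some(d => d.name === dataset.name)
  if (!exists) genome.datasets.push(dataset)
  return serverconfig
}

// In-memory config mirroring the example; a real file would contain more keys.
const config = {
  genomes: [{ name: 'hg38', species: 'human', file: './genome/hg38.js' }]
}

addDataset(config, 'hg38', { name: 'Custom', jsfile: './dataset/custom.js' })
console.log(JSON.stringify(config, null, 2))
```

Calling addDataset a second time with the same dataset name leaves the config unchanged, so the helper is safe to rerun.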

Step 5: Run container

From the container directory, run ./run.sh ppfull:latest (or substitute a pinned image version from our GitHub repo). A detailed description for running the Docker container is available here.
