Tracks - stjude/proteinpaint GitHub Wiki

Using custom tracks in ProteinPaint

  • Tracks are defined as JSON objects.
  • Submit tracks from a genome browser, as shown in the workflow below.
  • All text values are case-sensitive.

custom track workflow



Color values can be given as common names such as red or green (https://en.wikipedia.org/wiki/Web_colors), as hex codes such as #FF0000, or as rgb()/rgba() strings such as rgb(255,0,0) and rgba(255,0,0,.5).

Example

Go to https://proteinpaint.stjude.org/, launch the hg19 genome browser, and paste in the following JSON text to add two tracks:

[
{"type":"bigwig","file":"hg19/hg19.100way.phastCons.bw","name":"UCSC phastCons 100ways","dotplotfactor":20,"height":100},
{"type":"bedj","file":"anno/refGene.hg19.gz","name":"RefSeq genes","translatecoding":1,"color":"#417D4C","stackheight":20}
]

View or debug JSON with https://jsonlint.com/
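
As a local alternative to an online linter, the track text can be checked before pasting. The sketch below uses only Python's standard json module; the helper name parse_tracks is hypothetical, and wrapping a single object into a list is an assumption made for convenience, not documented ProteinPaint behavior.

```python
import json

# The two-track example from this page, as the text pasted into the browser.
tracks_text = """
[
{"type":"bigwig","file":"hg19/hg19.100way.phastCons.bw","name":"UCSC phastCons 100ways","dotplotfactor":20,"height":100},
{"type":"bedj","file":"anno/refGene.hg19.gz","name":"RefSeq genes","translatecoding":1,"color":"#417D4C","stackheight":20}
]
"""

def parse_tracks(text):
    """Parse track JSON; raise ValueError naming the error position on bad input."""
    try:
        tracks = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err
    if not isinstance(tracks, list):
        tracks = [tracks]  # assumption: a single object is treated as one track
    return tracks

tracks = parse_tracks(tracks_text)
print([t["type"] for t in tracks])
```

A common cause of failure is curly quotes (“ ”) introduced by word processors; JSON only accepts straight double quotes.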

The JSON track objects can also be used with the embedding API.

Example:

{
    "name": "name of the track",
    "type": ,
    "url": "http://domain/file.gz",
    "indexURL": "http://domain/path/file.gz.tbi",
    "file": "path/to/file.gz", // use this when not using URL
    "toppad": 5,
    "bottompad": 5
}

"name": STR

  • A string as track name

"type": STR

  • Typecode of the track. Allowed values are:
    • bigwig
    • bigwigstranded
    • bedj
    • profilegenevalue
    • junction
    • mds3
    • bam
    • bampile
    • hicstraw

"file": STR "url": STR "indexURL": STR

  • Either “file” or “url” should be provided, but not both. When using “file”, provide the path to the track file relative to the data directory configured on the ProteinPaint server.
  • When using a URL for a tabix-indexed file, the index file is by default expected at the same URL as the .gz file, with the index suffix appended. If the index is hosted elsewhere, use the “indexURL” attribute to provide the URL of the index file.
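
The rule above can be sketched as a small check. This is an illustration for simple single-file tracks only, not ProteinPaint's own validation; the function name check_source is hypothetical, and the second check (that “indexURL” only makes sense together with “url”) is an inference from the text.

```python
def check_source(track):
    """Return a list of problems with the file/url/indexURL attributes of one track."""
    problems = []
    has_file = "file" in track
    has_url = "url" in track
    if has_file == has_url:
        # True for both "neither given" and "both given".
        problems.append('provide exactly one of "file" or "url"')
    if "indexURL" in track and not has_url:
        # Assumption: indexURL is only meaningful for URL-hosted tracks.
        problems.append('"indexURL" only applies when "url" is used')
    return problems

print(check_source({"type": "bedj", "name": "genes", "file": "anno/refGene.hg19.gz"}))
print(check_source({"type": "bedj", "name": "genes"}))
```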

"toppad": INT

  • Number of pixels as the padding space on the top, default: 5

"bottompad": INT

  • Number of pixels as the padding space at the bottom, default: 5

"hidden": 1

  • If set, the track will be hidden by default and can be found in the track menu (by clicking the “Tracks” button)

"mclassOverride": {}

  • Customize the appearance of mutations with the mclassOverride argument.
    • "className": STR creates the legend label to the left of the custom mutation
    • Define all the customizations using the class code in "classes": {}
      • "label": STR is the label used in the visualization and the legend
      • "color": STR the color within the visualization and legend, Use web color names, HEX colors (e.g. #ff0000), or rgb color codes (e.g. rgb(255,0,0)).
      • "desc": STR appears when clicking on the mutation in the legend
  • Example:
"mclassOverride": {
    "className": "Amino Acid Phosphosite",
    "classes":{
        "M": {
            "label": "Threonine",
            "color": "#2874A6",
            "desc": "phosphorylated amino acid"
        }
    }
}

Example: https://proteinpaint.stjude.org/?block=on&genome=hg19&bigwigfile=BigWig_Demo,proteinpaint_demo/hg19/bigwig/file.bw

{
    "name": "track name",
    "type": "bigwig",
    "file": "proteinpaint_demo/hg19/bigwig/file.bw",
    "scale": {
        "min": 0,
        "max": 100
    },
    "height": 100
}

bigWig track attributes:

scale: {}

  • min:number
  • max:number
    • Set a fixed scale range of the Y axis
  • percentile:number
    • Value is integer from 1 to 99, representing a percentile of all the data in the view range. Overrides min/max
  • auto:1
    • Set automatic scale, will override all other settings in “scale”

height:number

  • Bar plot height in number of pixels. If height is below 10, the track will be rendered as heatmap.
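
The precedence among the “scale” settings described above (auto over everything, percentile over min/max) can be summarized in a short sketch. The function scale_mode is hypothetical and only mirrors the wording of this page; it is not taken from the ProteinPaint code.

```python
def scale_mode(scale):
    """Return which setting in a bigWig "scale" object takes effect."""
    if scale.get("auto"):
        return "auto"          # overrides all other settings
    if "percentile" in scale:
        return "percentile"    # overrides min/max
    if "min" in scale and "max" in scale:
        return "fixed"
    return "unspecified"

print(scale_mode({"min": 0, "max": 100}))
print(scale_mode({"min": 0, "max": 100, "percentile": 95}))
print(scale_mode({"min": 0, "max": 100, "percentile": 95, "auto": 1}))
```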

pcolor:str

  • Bar color of the positive values

pcolor2:str

  • Rendering color for data points above Y axis maximum value.

ncolor:str

  • Bar color of negative values

ncolor2:str

  • Rendering color for data points below Y axis minimum value

dotplotfactor:int

  • Value is a positive integer, e.g. 5 or 10. When applied, the track requests 5 or 10 times more data points from the bigWig file and plots each point as a dot rather than a bar. One use case is checking (large-scale) CNV from a DNA sequencing coverage track.

Example: https://proteinpaint.stjude.org/?block=on&genome=hg19&bigwigfile=BigWig_Demo,proteinpaint_demo/hg19/bigwig/file.bw

The bigwigstranded track shows stranded RNA-seq coverage data as a pair of bigWig tracks, with the forward strand on top and the reverse strand on the bottom.

Screenshot 2024-07-15 at 9 39 00 AM
{
    "name": "stranded RNA-seq coverage",
    "type": "bigwigstranded",
    "strand1": {
        "file": "path/to/sample.forwardstrand.bw",
        "scale": {
            "min": 0,
            "max": 100
        },
        "height": 50
    },
    "strand2": {
        "file": "path/to/sample.reversestrand.bw",
        "scale": {
            "max": 0,
            "min": -100
        },
        "height": 50,
        "normalize": { "dividefactor": -1 }
    }
}
  • strand1 : {}
    • The bigWig track of the data from forward strand
    • For read-coverage data, the values in the forward-strand bigWig file should be positive
  • strand2 : {}
    • The bigWig track of the data from reverse strand
    • Both strands follow the bigWig track definition.

Note: for stranded bigWig files that use all positive values for both strands (e.g. sequencing read coverage), apply "normalize": { "dividefactor": -1 } to the reverse strand, as in the example above, so that its bars point down.

If the reverse-strand bigWig track has been prepared with negative values, the -1 factor is not needed.
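
The effect of the -1 factor can be illustrated with a few numbers. This sketch assumes, from the attribute name only, that “dividefactor” divides every value by the given factor; the helper apply_dividefactor and the coverage values are made up for illustration.

```python
def apply_dividefactor(values, dividefactor):
    """Divide each coverage value by the factor (assumed meaning of "dividefactor")."""
    return [v / dividefactor for v in values]

reverse_coverage = [12.0, 40.0, 7.5]  # all-positive reverse-strand coverage
flipped = apply_dividefactor(reverse_coverage, -1)
print(flipped)  # negative values fit the "min": -100, "max": 0 scale of strand2
```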

Example: https://proteinpaint.stjude.org/?appcard=pgv

The PGV track is a combination of genomic profiling results and gene-level numerical values over a set of samples. Here is a screenshot of chromatin states and the expression level for one gene over a set of samples.

Screenshot 2024-07-15 at 8 44 59 AM

A live example: https://pecan.stjude.org/proteinpaint/study/retina2017

Gene-value track

Example lines from a gene-value track:

chr1    3205900 3671498 {"sample":"sample1","value":1.83479,"gene":"Gm37363"}
chr1    3205900 3671498 {"sample":"sample1","value":1.87122,"gene":"Gm37180"}
chr1    3205900 3671498 {"sample":"sample2","value":2.10581,"gene":"Gm37329"}
chr1    3205900 3671498 {"sample":"sample3","value":3.21379,"gene":"Gm19938"}

The file has four columns:

  1. Chromosome name
  2. Start position of the gene (feature), 0-based
  3. Stop position of the gene (feature), 0-based
  4. JSON object
    • Required keys:
      • "sample": STR
      • "gene": STR
      • "value": FLOAT
    • Currently, only nonnegative values are supported

The chromosomal position should match the gene. Other types of genomic features can be described instead of genes, but the “gene” key should still be used for the moment.

Each line represents a numerical value for a gene in a sample. The gene chromosome positions are for indexing purpose.

The gene-value track should be compressed and indexed, and hosted in the same way as the JSON-BED track.
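
Lines in this format can be produced with a few lines of Python before sorting, compressing, and indexing. The sketch below follows the four-column layout described above; the function name gene_value_line is hypothetical.

```python
import json

def gene_value_line(chrom, start, stop, sample, gene, value):
    """Format one tab-separated line: chromosome, start, stop, JSON object."""
    if value < 0:
        raise ValueError("only nonnegative values are supported")
    obj = {"sample": sample, "value": value, "gene": gene}
    # Compact separators keep the JSON on one line without extra spaces.
    text = json.dumps(obj, separators=(",", ":"))
    return "\t".join([chrom, str(start), str(stop), text])

line = gene_value_line("chr1", 3205900, 3671498, "sample1", "Gm37363", 1.83479)
print(line)
```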

Track format with single type of gene-value

This format is suitable for just one type of gene-value in PGV track, e.g. expression.

Example:

{
"type":"profilegenevalue",
"genevaluetrack":{
    "file":"rhb/fpkm.db"
},
"genevaluetype":"FPKM",
"genevaluematchname":"sampleID",
"legendimg":{
    "file":"rhb/chromhmm.png"
},
"name":"RHB chromHMM",
"tracks":[
    {"type":"bedj",
     "file":"rhb/fq21.gz",
     "stackheight":20,
     "stackspace":1,
     "name":"sample1",
     "sampleID":"sample1"
    },
    {"type":"bigwig",
     "file":"path/to/file.bw",
     "height":20,
     "name":"sample2",
     "sampleID":"sample2",
     "scale":{"auto":1},
     "pcolor":"blue"
    },
    … more member tracks …
]
}
  • genevaluetrack
    • A JSON-BED file that stores numerical data per gene per sample.
    • When the file is stored on the ProteinPaint server, use “file” to provide the path. Otherwise, provide the URL with the “url” keyword.
    • The file follows JSON-BED format and needs to be compressed by bgzip and indexed by tabix.
    • Each line stores one numerical value for one gene in one sample. The JSON part should be an object with the following attributes:
      • gene
        • the name of the gene
      • sample
        • the name of the sample, should match one of the member tracks (but not required to be so)
      • value
        • the numerical value
  • genevaluetype
    • The type of the gene value, e.g. “FPKM”, will be displayed on the track
  • genevaluematchname
    • Optional.
    • If not provided:
      • The “name” of each member track will be used for sample name matching with the “genevaluetrack”
    • If provided:
      • Value can be arbitrary string, and will be an attribute for member tracks
      • E.g. “genevaluematchname”:”sampleID”, then the “sampleID” attribute should exist for member tracks, and the value of “sampleID” of each track will be used for sample name matching
  • legendimg
    • file
      • Relative path to a PNG image used as the legend of this track, e.g. a state-by-assay heatmap in the case of chromHMM tracks.
    • Image URL is not supported yet.
    • The image will be displayed in the LEGEND section at the bottom.
  • tracks : [ ]
    • An array of member tracks
    • Each member track is one JSON object, which must be properly defined according to its type. Supported types are:
      • JSON-BED
      • bigWig
    • Each member track must have the “name” attribute, the name value must be unique
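
The sample-matching rule for “genevaluematchname” can be sketched as follows. The function sample_key is a hypothetical helper that mirrors the description above; it is not ProteinPaint's implementation.

```python
def sample_key(pgv_track, member_track):
    """Return the sample name used to look up gene values for one member track."""
    attr = pgv_track.get("genevaluematchname")
    if attr is None:
        # Not provided: fall back to the member track's "name".
        return member_track["name"]
    # Provided: the named attribute must exist on every member track.
    return member_track[attr]

pgv = {"type": "profilegenevalue", "genevaluematchname": "sampleID"}
member = {"type": "bigwig", "name": "H3K27ac signal", "sampleID": "sample2"}
print(sample_key(pgv, member))
print(sample_key({"type": "profilegenevalue"}, member))
```

Using a separate matching attribute lets the display name of a member track differ from the sample name stored in the gene-value file.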

Track format for multiple types of gene values

This format should be used for multiple types of gene values in a PGV track, e.g. RNA expression + proteomics.

Notably, this format uses “genevaluetklst” to store a set of gene value tracks. Thus it’s also able to replace the single-track format discussed above.

{
"type":"profilegenevalue",
"name":"RHB chromHMM",
"genevaluetklst":[
    {
    "file":"path/to/fpkm_file.gz",
    "name":"Gene FPKM",
    "matchname":"RNAsampleID"
    },
    {
    "file":"path/to/proteomics_file.gz",
    "name":"Proteomics",
    "matchname":"PROTEINsampleID"
    }
],
"tracks":[  … member tracks … ]
}

Track format for having multiple types of data points in one of the gene-value track

This format applies to data such as protein phosphorylation, where the value types are specific to each gene. It may also support a fixed set of types.

{
"type":"profilegenevalue",
"name":"RHB chromHMM",
"genevaluetklst":[
    {
    "file":"path/to/phosphorylation.gz",
    "name":"phosphorylation",
    "multivaluekey":"site",
    "axistickformat":".0e"
    }
],
"tracks":[  … member tracks … ]
}

Example rows of the phosphorylation data file:

chr1    879582  894689  {"sample":"MAST 118","value":149697.325,"gene":"NOC2L","site":"S56"}
chr1    879582  894689  {"sample":"MAST 118","value":3463144.33,"gene":"NOC2L","site":"S673"}
chr1    879582  894689  {"sample":"MAST 118","value":4299136.975,"gene":"NOC2L","site":"S672"}
chr1    879582  894689  {"sample":"MAST 118","value":86552.805,"gene":"NOC2L","site":"S49"}
chr1    879582  894689  {"sample":"MAST 35","value":106901.32,"gene":"NOC2L","site":"S56"}

In such phosphorylation data, each row represents the phosphorylation value for one specific amino acid residue (denoted by the “site” key), in one gene, of one sample.

Note that the genomic coordinates are of genes.
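
To see how the “multivaluekey” separates values within one gene, the rows above can be grouped by gene and site. This is a reading aid under the assumption that the key named by “multivaluekey” identifies the value type; the function values_by_key is hypothetical.

```python
import json
from collections import defaultdict

rows = [
    'chr1\t879582\t894689\t{"sample":"MAST 118","value":149697.325,"gene":"NOC2L","site":"S56"}',
    'chr1\t879582\t894689\t{"sample":"MAST 118","value":3463144.33,"gene":"NOC2L","site":"S673"}',
    'chr1\t879582\t894689\t{"sample":"MAST 35","value":106901.32,"gene":"NOC2L","site":"S56"}',
]

def values_by_key(lines, multivaluekey):
    """Group values as {(gene, key value): {sample: value}}."""
    grouped = defaultdict(dict)
    for line in lines:
        chrom, start, stop, text = line.split("\t", 3)
        item = json.loads(text)
        grouped[(item["gene"], item[multivaluekey])][item["sample"]] = item["value"]
    return dict(grouped)

grouped = values_by_key(rows, "site")
print(grouped[("NOC2L", "S56")])
```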

Example: https://proteinpaint.stjude.org/?block=on&genome=hg19&junctionfile=RNA%20splice%20junction,proteinpaint_demo/hg19/junction/file.gz

{
    "type": "junction",
    "name": "sample junction",
    "file": "junction/targetALL/10-PANYGB-diagnosis-SJCOGALL010859_D2.gz",
    "categories": {
        "known": {
            "label": "Known",
            "color": "#9c9c9c"
        },
        "novel": {
            "label": "Novel",
            "color": "#cc0000"
        }
    }
}

“categories” specifies the rendering color for types of junctions.

The track can be used to represent RNA splice junction data from RNA-seq assays. Each file can contain information for 1 or more samples.

File format of single sample data

The file has 6 required columns:

  1. Chromosome name, e.g. “chr1”
  2. Start, 0-based position of the last nucleotide of the upstream exon
  3. Stop, 0-based position of the first nucleotide of the downstream exon
  4. Strand, +/-
  5. Type of junction; if not available, use an empty string
    a. It can be an arbitrary text value, e.g. “known” or “novel”. Any types used should be declared in the .categories{} attribute of the track so they can be distinguished by color.
  6. Read count, integer value
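
Many aligners and annotation files report exon boundaries as 1-based positions, so a conversion step is often needed. The sketch below assumes 1-based inputs and writes one line in the six-column layout above; the function and parameter names are hypothetical.

```python
def junction_line(chrom, exon_end_1based, next_exon_start_1based,
                  strand, junction_type, read_count):
    """Format one junction line with 0-based start and stop positions."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    start = exon_end_1based - 1          # last nucleotide of the first exon, 0-based
    stop = next_exon_start_1based - 1    # first nucleotide of the next exon, 0-based
    fields = [chrom, start, stop, strand, junction_type or "", int(read_count)]
    return "\t".join(str(f) for f in fields)

print(junction_line("chr1", 1000, 2001, "+", "known", 25))
```

If the input coordinates are already 0-based, the two subtractions should be dropped.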

Steps to convert the tabular text file to a junction track

$ sort -k1,1 -k2,2n textfile > textfile.sorted
$ bgzip textfile.sorted
$ tabix -p bed textfile.sorted.gz

This generates two files:

textfile.sorted.gz
textfile.sorted.gz.tbi

Put both files in the same directory on the server, and use the file path (or URL) to the .gz file for submitting.

File format for multiple samples

The first five columns are the same as single sample file. The 6th column is the read count for the first sample, the 7th column is the second sample, so that arbitrary number of samples can be represented in this way.

Optionally, provide a header line to denote sample names, e.g.:

#chr  start  stop  strand  type  sample1   sample2   ...

Header line must begin with “#”. Bgzip this file in the same way. To index this file, run tabix with additional parameter:

$ tabix -p bed -c "#" multisample.gz

Sample names like “sample1” and “sample2” in the header of above example can be replaced by JSON object strings, as a way of encoding additional information on samples in the track file.

{"patient":"SJACT001","sample":"SJACT001_D","sampletype":"DIAGNOSIS","diagnosis_group_short":"ST","diagnosis_group_full":"Solid Tumor","diagnosis_short":"ACT","diagnosis_full":"Adrenocortical Carcinoma","diagnosis_subtype_short":"TP53-mut","diagnosis_subtype_full":"TP53-mut"}

{"patient":"SJACT002","sample":"SJACT002_D","sampletype":"DIAGNOSIS","diagnosis_group_short":"ST","diagnosis_group_full":"Solid Tumor","diagnosis_short":"ACT","diagnosis_full":"Adrenocortical Carcinoma","diagnosis_subtype_short":"TP53-mut","diagnosis_subtype_full":"TP53-mut"}

This can allow plotting samples by different colors. To do so, add “cohortsetting” attribute to track object when using the embedding API:

runproteinpaint({

     … other parameters … 

     tracks:[
         {
            type:'junction',
            name:'track name',
            file:'path/to/file.gz',
            cohortsetting:{
                 uselevelidx:0,
                 cohort:{
                      levels:[
                           {
                               k:'diagnosis_group_short',
                               label:'cancer'
                           }
                      ]
                 }
            }
         }
    ],

})

Multiple junction tracks can be aggregated to show in one track, via the .tracks[ ] attribute:

{
"type":"junction",
"name":"sample junction",
"tracks":[
    {
      "sample":"sample1",
      "file":"path/to/sample1.gz"
    },
    {
      "sample":"sample2",
      "file":"path/to/sample2.gz"
    },
    … more samples …
],
"categories":{ … }
}

In the .tracks[ ], add one object for each member track.

When combining multiple junction tracks, each track can contain one or multiple samples.

Example: https://proteinpaint.stjude.org/?genome=hg38&gene=kras&mds3bcffile=BCF_Demo,hg38/clinvar.hg38.bcf.gz

Bampile track can be used to examine the very rare alleles from a high-depth (capture-based) DNA sequencing experiment. Example of bampile track:

{
  "name": "track name",
  "type": "bampile",
  "file": "path/to/file.gz",
  "url": "use url if the file is hosted on a web server"
}

File format

The file has 3 columns:

  1. Chromosome name
  2. Basepair position, 0-based
  3. JSON object
    a. Example: {"Raw_0":{"A":8,"C":11,"G":5,"T":4050},"Raw_30":{"A":4,"C":7,"G":4,"T":3966},"Raw_35":{"A":3,"C":7,"G":3,"T":3771},"Raw_38":{"A":1,"C":5,"G":3,"T":3334},"New_0":{"C":4,"T":2652},"New_30":{"C":4,"T":2633},"New_35":{"C":4,"T":2598},"New_38":{"C":2,"T":2382}}
    b. Keys are various grades
    c. For each grade, the read depth for each observed nucleotide is given in an object
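
To make the per-grade structure concrete, the sketch below reads one line and computes the total depth and allele fractions for a chosen grade. The chromosome and position in the sample line are made up, the JSON is shortened from the example above, and allele_fractions is a hypothetical helper.

```python
import json

# One bampile line: chromosome, 0-based position, JSON object (tab-separated).
line = 'chr13\t48941600\t{"Raw_0":{"A":8,"C":11,"G":5,"T":4050},"New_38":{"C":2,"T":2382}}'

def allele_fractions(bampile_line, grade):
    """Return (total depth, {nucleotide: fraction}) for one grade at one position."""
    chrom, pos, text = bampile_line.rstrip("\n").split("\t", 2)
    counts = json.loads(text)[grade]
    depth = sum(counts.values())
    return depth, {nt: n / depth for nt, n in counts.items()}

depth, fractions = allele_fractions(line, "Raw_0")
print(depth, round(fractions["T"], 4))
```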

Prepare a track

To prepare bampile track file from Xiaotu’s output in St. Jude internal system:

Node.js v6.0 and above is required. Linux binaries are available at https://nodejs.org/dist/v6.9.1/node-v6.9.1-linux-x64.tar.xz

(Do not load Node.js using “module load”, that’s outdated)

$ module load tabix
$ cd /research/dept/compbio/common/proteinpaint-dev/tp/ultra
$ node bampileparse.js SAMPLE.input.txt > tempfile
$ sort -k1,1 -k2,2n tempfile > SAMPLE
$ bgzip SAMPLE
$ tabix -b 2 -e 2 -s 1 SAMPLE.gz

This generates two files under “tp/ultra/” directory:

  • SAMPLE.gz
  • SAMPLE.gz.tbi

Link the file to ProteinPaint by URL for display:

http://proteinpaint-dev.stjude.org:3001/?block=on&genome=hg19&position=chr13:48941549-48941948&bampilefile=SAMPLE,ultra/SAMPLE.gz

In the URL parameter, set correct genome build version, initial display position, and name and path to the SAMPLE.gz file.

  • The file path is relative to the “tp/” directory and should not include “tp/” itself.

  • The name and the path are joined by a comma.
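
The URL can be assembled from its parts as sketched below. The helper bampile_url is hypothetical; it simply concatenates the parameters in the order used in the example URL above and does not percent-encode anything, so names with spaces or special characters would need extra handling.

```python
def bampile_url(host, genome, position, name, path):
    """Build a ProteinPaint URL that shows one bampile track."""
    if path.startswith("tp/"):
        raise ValueError('path should be relative to "tp/" and not include it')
    return (
        f"{host}/?block=on&genome={genome}"
        f"&position={position}&bampilefile={name},{path}"
    )

url = bampile_url(
    "http://proteinpaint-dev.stjude.org:3001",
    "hg19",
    "chr13:48941549-48941948",
    "SAMPLE",
    "ultra/SAMPLE.gz",
)
print(url)
```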

Additional attributes of the bampile track object:

  • fineheight
    • Height of the bar plot at the bottom showing low-frequency alleles; the Y scale uses a cutoff value as defined by “fineymax”
  • allheight
    • Height of the bar plot at the top showing the frequency of all alleles, with automatic scale for coverage
  • midpad
    • Padding distance between the top and bottom bar plots
  • fineymax
    • Y scale used by the bottom bar plot
  • usegrade
    • Name of the grade to use

Example: https://proteinpaint.stjude.org/?appcard=ai

AIcheck track format and usage

“aicheck” is a term coined by Xiaotu Ma, who also designed the visualization to show the allelic imbalance of the heterozygous SNP markers in a tumor genome as compared to this patient’s germline genome, as a way of indicating loss-of-heterozygosity.

Screenshot 2024-07-15 at 9 22 35 AM

Live example: https://proteinpaint.stjude.org/?appcard=ai

The JSON definition for an aicheck track is:

{
    "type": "aicheck",
    "file": "files/hg19/example/aicheck/SJBALL020340_D1_sorted.gz",
    "name": "SJBALL020340 tumor allelic imbalance"
}

Submit this JSON text to the custom track panel to make it work. See custom track guide for details.

Screenshot 2024-07-15 at 9 25 04 AM

Additional JSON parameters:

  • coveragemax: Integer
    • Maximum Y-axis value for coverage tracks of both tumor and normal; default: 100.
  • vafheight: Integer
    • Variant allele fraction track height for both tumor and normal; default: 50.
  • coverageheight: Integer
    • Coverage track height; default: 30.
  • rowspace: Integer
    • Vertical spacing between rows; default: 5.

The following two optional parameters filter markers.

  • gtotalcutoff: Integer
    • Minimum total germline coverage. Markers with values below the cutoff will be excluded.
  • gmafrestrict: float between 0 and 0.5
    • A limit on the germline B-allele fraction (BAF), so that only markers with BAF close to 50% are included. E.g. setting this parameter to 0.3 requires 0.3 <= BAF <= 0.7. Markers with BAF outside this range will be excluded.
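
The two filters can be sketched as a single predicate. This assumes the germline BAF is the germline alternative-allele read count divided by the germline total read count (the last two columns of the aicheck file); keep_marker is a hypothetical helper, not ProteinPaint code.

```python
def keep_marker(alt_germline, total_germline, gtotalcutoff=None, gmafrestrict=None):
    """Return True if a marker passes the optional germline filters."""
    if gtotalcutoff is not None and total_germline < gtotalcutoff:
        return False  # germline coverage below the cutoff
    if gmafrestrict is not None:
        if total_germline == 0:
            return False  # BAF undefined without germline reads
        baf = alt_germline / total_germline  # assumed definition of germline BAF
        if not (gmafrestrict <= baf <= 1 - gmafrestrict):
            return False
    return True

print(keep_marker(30, 60, gtotalcutoff=20, gmafrestrict=0.3))
print(keep_marker(1, 32, gmafrestrict=0.3))
```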

An aicheck track file has 6 tab-separated columns. The first line is an optional header.

Chr     Pos     MinD    TinD    MinN    TinN
chr1    10003   3       38      0       23
chr1    10007   2       60      1       32
chr1    10009   3       58      1       32
chr1    10016   3       79      1       65
chr1    10019   1       94      2       76
chr1    10020   3       102     1       82
chr1    10025   2       127     1       91
chr1    10025   2       129     1       92

Each row is one germline heterozygous marker. All markers are from the same patient. Columns:

  1. Chromosome name
  2. Position of the SNP
  3. Alternative allele read count in tumor DNA
  4. Total read count of this SNP in tumor DNA
  5. Alternative allele read count in germline DNA
  6. Total read count of this SNP in germline DNA

After you assemble the marker data into a text file, do the following to convert it to a track file:

$ sort -k1,1 -k2,2n input.file > input.sorted
$ bgzip input.sorted
$ tabix -c 'C' -s 1 -b 2 -e 2 input.sorted.gz

This generates two files, “input.sorted.gz” and “input.sorted.gz.tbi”. Put both of them in the same directory.

Example: https://proteinpaint.stjude.org/?genome=hg38&block=1&position=chr22:22569655-23013766&svcnvfpkmfile=TCGA_DLBC,svcnv,proteinpaint_demo/hg38/GP/TCGA_DLBC.CNV.gz,vcf,proteinpaint_demo/hg38/GP/TCGA_DLBC.vcf.gz,fpkm,proteinpaint_demo/hg38/GP/TCGA_DLBC.fpkm.gz

This launches a custom GenomePaint track. To access official tracks, see embedding API.

{
  "name": "track name",
  "type": "mdssvcnv",
  "file": "path/to/svcnv.gz",
  "checkexpressionrank": {
    "file": "hg38/tcga-gdc/SKCM/TCGA_SKCM.fpkm.gz"
  },
  "checkvcf": {
    "file": "hg38/tcga-gdc/SKCM/TCGA_SKCM.vcf.gz"
  }
}

Read the GenomePaint tutorial.

You can replace “file” with “url” in the above 3 places.

The following attributes can be applied in the track object, as detailed in the Embedding API.

  • singlesample:{}
  • isfull:true / isdense:true
  • sampleAttribute:{}
  • vcf:{}
  • hide_cnvgain:true
  • hide_cnvloss:true
  • cnv:{}
  • sampleset:[]

To supply sample assay tracks for a custom GenomePaint track, use the “sample2assaytrack” attribute. The value is an object of key-value pairs, where keys are sample names, and values are lists of assay tracks available for that sample.

A derivative of the “mdssvcnv” track is the multi-sample ASE track.

Example: https://proteinpaint.stjude.org/?appcard=ase

{
    type:'mdssvcnv',
    name:'Multi-sample ASE analysis',
    checkvcf:{
        file:'hg19/TARGET/DNA/test/oct3/sorted.vcf.gz',
    },
    checkrnabam:{
        samples:{
            SJALL015260_D1:{
                file:'hg19/TARGET/RNAbam/SJALL015260_D1.bam',
                totalreads:83388794,
            },
            SJALL015643_D1:{
                file:'hg19/TARGET/RNAbam/SJALL015643_D1.bam',
                totalreads:103477133,
            },
            … more BAM files …
        },
    },
}

Note this track type is deprecated. Use mds3 instead. The track object contains one single attribute:

{
  "mdsjsonfile": "path/to/your/dataset.json"
}

Example: https://proteinpaint.stjude.org/?appcard=ase

This track performs on-the-fly ASE analysis for a single sample. It can be built standalone, or spawned from a multi-sample track (a special mode of the mdssvcnv track).

{
    type:'ase',
    name:'My sample ASE',
    samplename:'my_sample_name',
    rnabamfile:'path/to/sample.rnaseq.bam',
    rnabamtotalreads: 103477133,
    vcffile:'path/to/SJALL015643_D1.gz',
},

The pediatric tumor gene expression is annotated with the ASE status. This is indicated using bar colors:

Screenshot 2024-07-24 at 3 08 00 PM

Hover over a bar to see details about the ASE status.

Screenshot 2024-07-24 at 3 09 22 PM

At the bottom of the tooltip, the ASE call is further explained with four fields:

  • #SNPs heterozygous in DNA

    • Total number of heterozygous SNPs in tumor genome over the gene body of this gene, as determined by tumor DNA sequencing
  • #SNPs showing ASE in RNA

    • Number of such heterozygous markers showing mono-allelic expression. A binomial test with p-value cutoff is used to determine if one heterozygous SNP sufficiently deviates from 50%, if so this SNP is “ASE”. There should be 0 ASE SNPs for bi-allelic expressing genes, and >=1 for mono-allelic expressing genes.
  • Mean delta of ASE SNPs

    • Mean value of (BAF-0.5) for all the markers. BAF: RNA B-allele frequency
  • Q-value

    • First, the binomial p-values of all ASE SNPs for a gene are combined into one value using geometric mean; then, the combined p-values for all genes from a tumor are multiple-test corrected, to obtain this q-value for each gene.
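
The per-SNP test and the per-gene combination can be sketched with the standard library. This is an illustration under stated assumptions, not the precomputed Cis-X procedure: the test is taken to be an exact two-sided binomial test against an allele fraction of 0.5, the p-value cutoff is not specified on this page, and the multiple-test correction that yields the q-value is not shown because its method is not named here. The function names are hypothetical.

```python
import math

def binomial_two_sided_p(alt_reads, total_reads):
    """Exact two-sided binomial p-value against an expected allele fraction of 0.5."""
    observed = math.comb(total_reads, alt_reads)
    # With p = 0.5 every outcome i has probability C(n, i) / 2**n, so the outcomes
    # at least as extreme as the observed one are those with C(n, i) <= C(n, k).
    extreme = sum(
        c for c in (math.comb(total_reads, i) for i in range(total_reads + 1))
        if c <= observed
    )
    return extreme / 2 ** total_reads

def geometric_mean(p_values):
    """Combine the p-values of a gene's ASE SNPs into one value."""
    return math.exp(sum(math.log(p) for p in p_values) / len(p_values))

print(binomial_two_sided_p(10, 10))   # all 10 RNA reads carry one allele
print(binomial_two_sided_p(5, 10))    # perfectly balanced
```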

GenomePaint uses a decision tree to determine the ASE status of a gene in a sample (mono-allelic, bi-allelic, uncertain). To customize the cutoff values, click the gene label and select “Customize ASE/OHE parameters”:

Screenshot 2024-07-24 at 3 13 34 PM

In the ASE decision tree, three cutoff values are customizable:

Screenshot 2024-07-24 at 3 14 08 PM

For ASE, values for the number of heterozygous SNPs, the number of ASE SNPs, mean delta, and q-value are precomputed (Cis-X, Yu et al., in submission) and cannot be recomputed on-the-fly.

“Automatic” genes

While you pan and zoom the view range, genes update automatically on the right in the expression column. It always shows the first gene in the view range.

When you zoom out and there are multiple genes in the view range, you may want to view some other gene rather than the default leftmost one. Click on the gene label on top of the rank axis and find a list of gene names from the view range. Choose a gene to change. By choosing a gene here, whenever this gene is in the view range, it will always be shown irrespective of its order of appearance.

Screenshot 2024-07-24 at 3 15 21 PM

“Fixed” genes

GenomePaint allows you to show the expression of multiple genes side-by-side:

Screenshot 2024-07-24 at 3 15 52 PM

The gene on the left is “automatic”. The gene on the right was added by the user and is always shown irrespective of the view range, hence the name “fixed” gene.

To add a fixed gene, click the gene label on top of the automatic expression rank axis to show the menu. Type into the search box on top of the menu to find matching gene names:

Screenshot 2024-07-24 at 3 16 00 PM

Select a gene from the list, and its expression rank will appear as a new column. More than one fixed gene can be added. The samples are aligned for both automatic and fixed genes.

To remove a fixed gene, click the gene label and select “Remove”.

Screenshot 2024-07-24 at 3 16 05 PM

As an example, while browsing the recurrently duplicated NOTCH1 MYC enhancer locus in TALL, the distal target gene MYC is not in view range, thus its expression is not shown automatically.

Screenshot 2024-07-24 at 3 16 10 PM

By adding MYC as a “fixed gene”, its expression is shown for TALL tumors with enhancer duplication:

Screenshot 2024-07-24 at 3 16 16 PM

Defines the Interaction track.

{
    name:"NALM6 in-situ Hi-C",
    type:"hicstraw",
    file:"files/hg19/nalm6/hic_Nalm6.inter.hic",
    percentile_max:95,
    mincutoff:0,
    pyramidup:1,
    enzyme:"MboI"
}

File format

A track file requires at least 3 columns, separated by tab:

  • Chromosome
  • Start position (0-based)
  • Stop position (not including the ending base)
  • Optional stringified JSON object

Each line is a genomic feature. Using a file with only the first 3 columns will produce a basic rendering of genomic segments.

https://proteinpaint.stjude.org/?genome=hg19&block=1&bedjfile=test,proteinpaint_demo/hg19/misc/rmsk.bed3.gz

Screenshot 2024-07-15 at 9 36 02 AM

Using a JSON object in the 4th column allows genomic features to be described richly, which offers more flexibility than the fixed columns of the BED format and its variants. See “JSON object for a BED item” below for the JSON object specification.

Prepare and use a track

$ sort -k1,1 -k2,2n [file] > [file.sorted]
$ mv [file.sorted] [file]
$ bgzip [file]
$ tabix -p bed [file].gz

To host the track on a web server: put both the .gz and .gz.tbi files in the same directory on the web server. Obtain the URL of the .gz file and submit it to the browser.

To host the track on the ProteinPaint server: put both the .gz and .gz.tbi files in the same directory under the server's data directory. Obtain the path to the .gz file relative to that directory and submit it to the browser.

Alternatively, a bigBed file can be used as the source file and will be parsed into JSON-BED format.

To host a bigBed track on a web server: put the bigBed file on the web server. Obtain the URL of the bigBed file and submit it to the browser.

To host a bigBed track on the ProteinPaint server: put the bigBed file under the server's data directory. Obtain the relative path to the bigBed file and submit it to the browser.

JSON object for a BED item

The content is a string representation of an object (key-value pairs). Examples:

{"name":"CTCF1","strand":"+"}

{"name":"MIR6859-1","isoform":"NR_106918","strand":"-","exon":[[17368,17436]],"rnalen":68}

Format requirements:

  • No line breaks in the JSON text
  • Must include braces {}
  • Use double quotes for strings
  • Don’t use quotes for numerical values
  • Keys are case-sensitive
  • These keys cannot be used in JSON and will be ignored: chr, start, stop, canvas
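
The format requirements above can be met by serializing each item with a standard JSON library. The sketch below uses Python's json module, which emits a single line with double-quoted strings and unquoted numbers by default; the helper bedj_line is hypothetical and drops the reserved keys up front, since they would be ignored anyway.

```python
import json

RESERVED_KEYS = {"chr", "start", "stop", "canvas"}

def bedj_line(chrom, start, stop, item=None):
    """Format one JSON-BED line; the 4th column is omitted when there is no item."""
    fields = [chrom, str(start), str(stop)]
    if item:
        cleaned = {k: v for k, v in item.items() if k not in RESERVED_KEYS}
        fields.append(json.dumps(cleaned, separators=(",", ":")))
    return "\t".join(fields)

print(bedj_line("chr1", 100, 200, {"name": "CTCF1", "strand": "+", "chr": "chr1"}))
print(bedj_line("chr1", 100, 200))
```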

Supported JSON keys:

"name": STR

  • Value is a string. For genes, “name” is gene symbol. When “itemurl_appendname” is specified, “name” is required to enable clicking on an item from track display and trigger a URL.

"isoform": STR

  • Gene isoform accession, e.g. NM_000546 or ENST00000269305. Both name and isoform can appear in the tooltip when hovering cursor over the track display.

"strand": STR

  • “+” or “-”. Unstranded if not provided.

"exon":[]

  • Array of two-number arrays, e.g. [[665562,665731],[665277,665335],[661138,665184]]. Must be present for genes. All positions are 0-based. The stop position of an exon is the nucleotide next to the last exonic nucleotide, similar to the UCSC BED format. Notes:
    • For coding genes: Value should be a union of UTRs and CDS.
    • For noncoding genes: Value should be all exons.
    • Exons in this array are ordered from 5’ to 3’.
    • Despite the presence of “utr5”, “utr3”, “coding”, “intron” attributes, the “exon” attribute is still required.

"intron":[]

  • Same format as “exon[]”. Required for native gene tracks. The stop position of an intron is the first nucleotide in the exon.

"utr5":[]

  • Same format as “exon[]”. Required for any 5’ UTRs in coding genes.

"utr3":[]

  • Same format as “exon[]”. Required for any 3’ UTRs in coding genes.

"rnalen": INT

  • Base-pair length of RNA transcript. Required for all genes.
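
Because exon stop positions are exclusive, the length of each exon is simply stop minus start. In the MIR6859-1 example shown earlier, the summed exon length equals the “rnalen” value; the sketch below checks this. Treating “rnalen” as the sum of exon lengths is an observation from that one example, not a rule stated on this page, and exon_total_length is a hypothetical helper.

```python
def exon_total_length(exons):
    """Sum the lengths of [start, stop] exons with 0-based start and exclusive stop."""
    return sum(stop - start for start, stop in exons)

item = {
    "name": "MIR6859-1",
    "isoform": "NR_106918",
    "strand": "-",
    "exon": [[17368, 17436]],
    "rnalen": 68,
}
print(exon_total_length(item["exon"]), item["rnalen"])
```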

"cdslen": INT

  • Base-pair length of coding region length of an mRNA transcript. Required for coding genes. Will include nucleotides from incomplete codon as defined by startCodonFrame. Do not use for noncoding genes.

"codingstart": INT

  • Genomic position of the smaller boundary of the coding sequence. Required for coding gene. Do not use for noncoding gene.

"codingstop": INT

  • Genomic position of the bigger boundary of the coding sequence. Required for coding gene.

"startCodonFrame": 1/2

  • Tells how many nucleotides the “start codon” of this transcript should be shifted for translation. In the case of IGKC, startCodonFrame=1 means it will borrow 1 nt from the previous IGKJ exons. So the first two nucleotides of IGKC will not be translated when looking at IGKJ alone.

"coding":[ [start,stop], … ]

  • Same as “exon[]”, for coding exons only. Set “translatecoding” to true in the track object to enable gene translation according to the coding exons as well as coding frame defined by the “coding:[]” attribute. The translation can happen when the browser is at sufficient resolution. The first element will include nucleotides from incomplete codon as defined by startCodonFrame.

"description": STR

  • Some text, e.g. gene function.

"color": STR

"exon2color": [ { start, stop, color }, … ]

  • Optional. Each element: { start, stop, color } Start and stop are 0-based. This will override the item color for the matching exon from “exon[]” array.

"category": STR

  • Value is string or integer. Value must be a key of the “categories” attribute of the track object.

"isoformonly": STR

  • Experimental fix to filter bed items by isoform, so that certain bed items will be shown under a specific isoform. Value is an isoform accession.

Declaring a track as a JSON object

{
"type":"bedj",
"name":"gene track",
"file":"anno/gencode.v24.hg19.gz",
"stackheight":14,
"stackspace":1
}

stackheight:20

  • Height of rows in number of pixels. All rows share the same height.
  • For gene tracks, this height will be the thickness of coding exons, while UTRs and noncoding exons will have height reduced by 4 pixels.

stackspace:1

  • Spacing distance between rows.

color:"blue"

  • Track rendering color for lines, boxes, and text labels. Per-item color defined in the track file will override this setting.

onerow:true

  • Value is “1” for true. Forces all items in the view range to be displayed in the same row, and item names will be hidden. Useful for making compact representation of certain tracks, e.g. chromHMM. Delete this attribute to cancel the effect.

categories:{}

  • List of categories. Each item in the track belongs to one category and is colored accordingly. E.g. {"1":{"color":"red","label":"type 1"}, "2":{"color":"blue","label":"type 2"}, … }

translatecoding:1

  • Will translate genes when the resolution is fine enough. This requires the .coding[] attribute in the JSON objects of BED items.

itemurl_appendname: URL

  • Allows clicking on an item from the track and open up a URL customized by the name of that item; item’s name will be appended to the end of the URL as the value of a parameter.
  • Example: given URL of “http://google.com?query=”. When clicked on an item named “HOX”, this URL will be triggered: http://google.com?query=HOX

hideItemNames: true

  • Do not show item names in the track

filterByName: "Item1\nItem2"

  • Multiple item names joined by line break. Will only show given items in the track.