Interaction Tracks: Hi-C and Arc

For examples of the Hi-C track, please visit the Hi-C card on our homepage. For examples of the Arc track, please visit the Arc card, also on our homepage.

Introduction to Hi-C and Arc Interaction Tracks

Hi-C and Arc interaction tracks are powerful tools for visualizing the three-dimensional structure of the genome.

Hi-C Tracks: Hi-C is a method that allows for the study of the three-dimensional architecture of genomes by coupling proximity-based ligation with massively parallel sequencing. The Hi-C tracks on ProteinPaint represent the frequency of interaction between different parts of the genome.

Arc Tracks: Arc tracks are a type of visualization that display physical interactions between different genomic regions. These interactions are represented as arcs, with the height of the arc indicating the frequency of the interaction.

Track data formats

A genomic interaction track can draw data from one of three sources:

  1. Juicebox *.hic file, for Hi-C experiments. Learn about Juicebox
  2. BED file, compressed and indexed
  3. Copy-paste text data

Juicebox Hi-C

This is the required data format for Hi-C interaction matrix information. The resulting file (*.hic) is a binary file. The file can be hosted at a publicly accessible URL, or under the directory of the ProteinPaint server. Note that an Arc track can also be created from a .hic file.

BED-like file

Create a BED-like file to display the Arc track. Example:

chr1    713192  714392  chr1    1091970 1094482 1
chr1    713192  714392  chr1    1141602 1143157 1
chr1    792622  794350  chr1    803459  806745  1
chr1    792622  794350  chr1    808880  812725  1
chr1    792622  794350  chr1    872223  875765  4
chr1    792622  794350  chr1    988940  990936  1

Columns:

  1. Chromosome of 1st region
  2. Start coordinate of 1st region
  3. Stop coordinate of 1st region
  4. Chromosome of 2nd region
  5. Start coordinate of 2nd region
  6. Stop coordinate of 2nd region
  7. Numerical value

Notes:

  • Separate columns by tabs
  • Coordinate positions are 0-based
  • Each A-B interaction must be represented by two lines:
    • First line: chrA startA stopA chrB startB stopB value
    • Second line: chrB startB stopB chrA startA stopA value
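
The two-line rule above can be sketched as a small script. This is a minimal illustration, not part of ProteinPaint: the helper name mirrorInteractions is hypothetical, and it assumes the input lists each interaction only once, with seven tab-separated columns.

```javascript
// Hypothetical helper (not a ProteinPaint function): emit both the A-B and
// the B-A line for every interaction, as the notes above require.
// Assumes each interaction appears only once in the input.
function mirrorInteractions(text) {
  const out = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue; // skip blank lines
    const cols = line.split("\t");
    if (cols.length !== 7) {
      throw new Error(`expected 7 tab-separated columns, got ${cols.length}: ${line}`);
    }
    const [chrA, startA, stopA, chrB, startB, stopB, value] = cols;
    out.push([chrA, startA, stopA, chrB, startB, stopB, value].join("\t"));
    out.push([chrB, startB, stopB, chrA, startA, stopA, value].join("\t"));
  }
  return out.join("\n") + "\n";
}

// One line taken from the example above.
const input = "chr1\t713192\t714392\tchr1\t1091970\t1094482\t1\n";
const mirrored = mirrorInteractions(input);
console.log(mirrored);
```

The output is not sorted, so it still needs the sort, bgzip, and tabix steps described in this section.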

To convert the text file into a track file, run the following:

$ sort -k1,1 -k2,2n file > sortedfile
$ bgzip sortedfile
$ tabix -p bed sortedfile.gz

This generates two files, “sortedfile.gz” and “sortedfile.gz.tbi”.

To host this track, put both files under the same directory on a web server and obtain the URL to the .gz file. Alternatively, put both files under the directory of the ProteinPaint server and obtain the path to the .gz file.

Text data, copy & paste

Data can also be copied and pasted into the genome browser to create an Arc track. Click on Tracks -> Add custom Track -> Interaction. Example:

chr5	98725000	98750000	chr5	99675000	99700000	0,255,255	12.0
chr5	20725000	20750000	chr5	34100000	34125000	0,255,255	13.0
chr5	21525000	21550000	chr5	29425000	29450000	0,255,255	17.0
chr5	177375000	177400000	chr5	178925000	178950000	0,255,255	18.0

Columns:

  1. Chromosome of 1st region
  2. Start coordinate of 1st region
  3. Stop coordinate of 1st region
  4. Chromosome of 2nd region
  5. Start coordinate of 2nd region
  6. Stop coordinate of 2nd region
  7. This field is not used
  8. Numerical value

Notes:

  • Separate columns by either tabs or spaces; multiple consecutive tabs or spaces are considered to be one separator
  • Each interaction is represented by one line, no duplication
  • Interactions can be intra- or inter-chromosomal.
  • Genomic positions are 1-based. Start and stop can be the same position.
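
To check pasted text against these rules before submitting it, something like the following could be used. It is only a sketch: parseInteractionText and the field names in the returned objects are hypothetical, and ProteinPaint's own parser may behave differently.

```javascript
// Hypothetical validator for the 8-column pasted format described above.
function parseInteractionText(text) {
  const rows = [];
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines
    // Runs of tabs or spaces count as a single separator.
    const f = trimmed.split(/\s+/);
    if (f.length !== 8) {
      throw new Error(`expected 8 columns, got ${f.length}: ${line}`);
    }
    const row = {
      chr1: f[0], start1: Number(f[1]), stop1: Number(f[2]),
      chr2: f[3], start2: Number(f[4]), stop2: Number(f[5]),
      // f[6] is the unused 7th column
      value: Number(f[7])
    };
    for (const [start, stop] of [[row.start1, row.stop1], [row.start2, row.stop2]]) {
      // Positions are 1-based; start and stop may be equal.
      if (!Number.isInteger(start) || !Number.isInteger(stop) || start < 1 || start > stop) {
        throw new Error(`invalid coordinates in line: ${line}`);
      }
    }
    if (!Number.isFinite(row.value)) {
      throw new Error(`invalid numerical value in line: ${line}`);
    }
    rows.push(row);
  }
  return rows;
}

// Two lines from the example above; the second uses spaces as separators.
const pasted =
  "chr5\t98725000\t98750000\tchr5\t99675000\t99700000\t0,255,255\t12.0\n" +
  "chr5 20725000 20750000 chr5  34100000 34125000 0,255,255 13.0\n";
const rows = parseInteractionText(pasted);
console.log(rows);
```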

Running the Hi-C track

Accessing Hi-C and Arc Interaction Tracks

Follow these steps to access the Hi-C and Arc interaction tracks:

  • Navigate to the ProteinPaint server and load your data.
  • Select the appropriate genome assembly from the dropdown in the header. Click on the genome browser button to the right.
  • Click on the "Tracks" button located in the top menu (“Tracks” > “Add custom track” > “Interaction”).
  • From the dropdown menu, select either "Add Hi-C Track" or "Add Arc Track" depending on your needs.
  • A popup window will appear. Here, choose the dataset you wish to visualize.

From a URL

Please see the Hi-C URL parameter documentation. All URL parameters are documented on this wiki page.

Embedded in website

Please see the examples in the Hi-C card. Click on the Code button to see the runproteinpaint() call. Please see the documentation on embedding for more information.

Genome View

To create a genome view, specify the genome parameter in the hic object (see code below). The genome parameter should be set to the genome version matching the reference in the file (e.g., 'hg19').

runproteinpaint({
  holder: document.getElementById('aaa'),
  parseurl: true,
  noheader: true,
  hic: {
    genome: 'hg19',
    file: "proteinpaint_demo/hg19/hic/hic_demo.hic",
    enzyme: "MboI"
  }
});

Chromosome pair View

To create a chromosome pair view (intra- or inter-chromosomal), specify the position1 and position2 parameters in the hic object. Set each parameter to one chromosome of the pair (e.g., 'chr1' and 'chr2').

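runproteinpaint({
  holder: document.getElementById('aaa'),
  parseurl: true,
  noheader: true,
  hic: {
    genome: 'hg19',
    file: "proteinpaint_demo/hg19/hic/hic_demo.hic",
    enzyme: "MboI",
    position1: "chr1", // first chromosome of the pair
    position2: "chr2" // second chromosome of the pair
  }
});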

Detailed View

To create a detailed view (a zoomed-in view of a chromosome pair), specify the position1 and position2 parameters with the regions of interest in the hic argument. Use the format chr##:####-#### as a guide. Example:

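runproteinpaint({
  holder: document.getElementById('aaa'),
  parseurl: true,
  noheader: true,
  hic: {
    genome: "hg19",
    file: "proteinpaint_demo/hg19/hic/hic_demo.hic",
    enzyme: "MboI",
    position1: "chr8:125470045-130470044", // first region of interest
    position2: "chr4:172643663-177643662" // second region of interest
  }
});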

Runproteinpaint Parameters

Track

Please see examples from the Hi-C card and Arc Track card by clicking the Code button.

Hic: In the tracks array, add an object with the hicstraw type. See a description of the available arguments in the example below:

{
    type: "hicstraw", //Required. 
    file: "proteinpaint_demo/hg19/hic/hic_demo.hic", //Either .file or .url must be present to run the track
    name: "Hi-C Demo", //Optional. The track name to show to the left
    percentile_max: 95, //Optional. Determines the maximum value to display as a percentile of the data.
    mincutoff: 1, // Optional. The minimum value to show.
    pyramidup: 1, //Optional. 1 or true points the heatmap up. 0 or false points the heatmap down.
    enzyme: "MboI", //Optional. Choose a restriction enzyme
    normalizationmethod: "VC" //Optional. Choose a normalization method present in the file
}
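
To give a feel for what a percentile-based maximum such as percentile_max: 95 means, here is a rough sketch. How ProteinPaint computes the cutoff internally is not described on this page; the nearest-rank method, the percentile helper, and the sample values below are all assumptions made for illustration.

```javascript
// Nearest-rank percentile: the smallest value such that at least p percent
// of the data are less than or equal to it. This is an assumed method, not
// necessarily the one ProteinPaint uses.
function percentile(values, p) {
  if (values.length === 0) throw new Error("no values");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Hypothetical interaction values, for illustration only.
const values = [3, 1, 120, 2, 5, 4, 7, 6, 9, 8];
const cap = percentile(values, 50);
// Values above the cap are drawn as if they were equal to the cap.
const capped = values.map(v => Math.min(v, cap));
console.log(cap, capped);
```

With a cap like this, a few very large values no longer dominate the color scale.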

Arc: In the tracks array, add an object with the hicstraw type. Unlike the example above, include mode_arc:true and mode_hm:false in the object. Here's an example:

{
    type: "hicstraw",
    bedfile: "proteinpaint_demo/hg19/arc/mango.gz",
    name: "Arc Track Demo",
    percentile_max: 99,
    mode_arc: true,
    mode_hm: false
}
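
The two track objects differ only in a few fields, so they can be built with small helpers. This is a sketch under stated assumptions: hicTrack and arcTrack are hypothetical names, not ProteinPaint functions, and only the field names shown in the examples above are used.

```javascript
// Hypothetical helpers; the field names come from the examples above.
function hicTrack({ file, url, ...options }) {
  // Either .file or .url must be present to run the track.
  if (!file && !url) throw new Error("either file or url must be present");
  const track = { type: "hicstraw", ...options };
  if (file) track.file = file;
  if (url) track.url = url;
  return track;
}

function arcTrack({ bedfile, ...options }) {
  if (!bedfile) throw new Error("bedfile must be present");
  // An Arc track is a hicstraw track with the arc mode on and the heatmap off.
  return { type: "hicstraw", bedfile, ...options, mode_arc: true, mode_hm: false };
}

const tracks = [
  hicTrack({
    file: "proteinpaint_demo/hg19/hic/hic_demo.hic",
    name: "Hi-C Demo",
    percentile_max: 95
  }),
  arcTrack({
    bedfile: "proteinpaint_demo/hg19/arc/mango.gz",
    name: "Arc Track Demo",
    percentile_max: 99
  })
];
console.log(JSON.stringify(tracks, null, 2));
```

The resulting array is what would go in the tracks array mentioned above.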

App

Please see examples from the Hi-C card by clicking the Code button.

hic:{...}: This argument launches the hic app with the genome, chromosome, and detail views.

.genome: This is the genome version to be visualized. Common values include 'hg19', 'hg38', etc.

.file: This is the path to the Hi-C data file. The file path must be under the 'tp' folder. For example, "tp/proteinpaint_demo/hg19/hic/hic_demo.hic".

.url: Provide the file via URL

.enzyme: This is the restriction enzyme used in the Hi-C experiment. Common values include 'MboI', 'HindIII', 'NcoI', 'NlaIII', etc. The available enzymes depend on the data in the Hi-C file.

.position1 and .position2: These are the regions to be visualized. They should be in the format "chr:start-stop", where "chr" is the chromosome, and "start" and "stop" are the start and stop positions of the region.
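
A position string can be checked before it is passed to runproteinpaint(). The sketch below is illustrative only: parsePosition is a hypothetical helper, and it accepts either a bare chromosome name (as in the chromosome pair view) or the "chr:start-stop" form.

```javascript
// Hypothetical helper: split "chr:start-stop" (or a bare "chr") into parts.
function parsePosition(position) {
  const match = /^([^:\s]+)(?::(\d+)-(\d+))?$/.exec(position);
  if (!match) throw new Error(`invalid position: ${position}`);
  const chr = match[1];
  if (match[2] === undefined) return { chr }; // whole chromosome
  const start = Number(match[2]);
  const stop = Number(match[3]);
  if (start > stop) throw new Error(`start is after stop: ${position}`);
  return { chr, start, stop };
}

const region = parsePosition("chr8:125470045-130470044");
const wholeChr = parsePosition("chr1");
console.log(region, wholeChr);
```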

.state: Set the normalization method and matrix type (described below) for one view or for each view. Example:

state: {
   chrpair: { nmeth: 'VC', matrixType: 'observed'}
}

Here's a brief explanation of these two options:

nmeth: The value must match one of the normalization methods present in the file. Common values include 'NONE', 'VC', 'VC_SQRT', 'KR', etc. The default value is 'NONE', which means no normalization is applied.

matrixType: Available options are observed, expected, oe (Observed/Expected), or log(oe) (Log(Observed/Expected)). The default value is observed.
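
As a minimal sketch, the state object for the chromosome pair view could be assembled with a check on matrixType. The helper chrpairState is hypothetical. Only the chrpair key appears in the example above, so no other view keys are assumed, and nmeth cannot be checked here because the valid methods depend on the file.

```javascript
// Matrix types listed above.
const MATRIX_TYPES = ["observed", "expected", "oe", "log(oe)"];

// Hypothetical helper: build the state object for the chrpair view,
// falling back to the documented defaults ('NONE' and 'observed').
function chrpairState(nmeth = "NONE", matrixType = "observed") {
  if (!MATRIX_TYPES.includes(matrixType)) {
    throw new Error(`matrixType must be one of: ${MATRIX_TYPES.join(", ")}`);
  }
  // nmeth must match a normalization method present in the .hic file,
  // which cannot be verified without reading the file.
  return { chrpair: { nmeth, matrixType } };
}

const state = chrpairState("VC", "observed");
const defaultState = chrpairState();
console.log(state, defaultState);
```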

User interface

ProteinPaint Hi-C Data Visualization Tutorial

This tutorial describes the process of visualizing Hi-C data using the ProteinPaint tool.

Step 1: Understanding the URL Parameters

The URL contains several parameters that define the visualization:

  • genome=hg19: This sets the genome version to 'hg19'.
  • hicfile=files/hg19/nbl-hic/hic_NB69.inter.hic: This sets the path to the Hi-C data file.
  • enzyme=MboI: This sets the restriction enzyme used in the Hi-C experiment to 'MboI'.

The final URL is https://proteinpaint.stjude.org/?genome=hg19&hicfile=files/hg19/nbl-hic/hic_NB69.inter.hic&enzyme=MboI
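
The same URL can be assembled from its parts. This is a small sketch: buildHicUrl is a hypothetical helper, and it assumes the parameter values are already URL-safe, so no percent-encoding is applied.

```javascript
// Hypothetical helper: join key=value pairs into a query string.
// Values are assumed to be URL-safe and are not percent-encoded.
function buildHicUrl(base, params) {
  const query = Object.entries(params)
    .map(([key, value]) => `${key}=${value}`)
    .join("&");
  return `${base}?${query}`;
}

const url = buildHicUrl("https://proteinpaint.stjude.org/", {
  genome: "hg19",
  hicfile: "files/hg19/nbl-hic/hic_NB69.inter.hic",
  enzyme: "MboI"
});
console.log(url);
```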

Step 2: Exploring the User Interface

The whole-genome view appears first. This view shows the interaction frequencies between different regions of the genome.

Step 3: Interacting with the Visualization (Genome view)

Interact with this visualization by clicking on different parts of it. Clicking on a specific region will zoom in on that region and show a more detailed view of the interaction frequencies. This can be either an inter-chromosomal view (showing interactions between different chromosomes) or an intra-chromosomal view (showing interactions within the same chromosome).

Step 4: Interacting with the Visualization (Chromosomal view)

Click on a specific region in the Chromosomal View to switch to the Detailed View. This view provides a more detailed look at the interactions within the selected region.

Step 5: Interacting with the Visualization (Detailed view)

In the Detailed View, see the specific interactions between different parts of the selected region. Use the options in the user interface to adjust the view, normalization method, matrix type, and min and max cutoff values. Use the button in the controls to open the detail view.

Step 6: Interacting with the Visualization (Horizontal view)

This view provides a linear, horizontal representation of the genomic region, which can be useful for seeing the distribution of interactions along the length of the region. The resolution of the view can be adjusted, up to the fragment level. Use the controls in the user interface to adjust the view, normalization method, matrix type, min and max cutoff values, and arc view.

Interpreting Hi-C and Arc Interaction Tracks

Once you've successfully loaded a Hi-C or Arc track on the ProteinPaint server, you'll be presented with a visual representation of genomic interactions. This section will guide you through understanding these visualizations.

Understanding the Arcs

The primary visual elements in both Hi-C and Arc tracks are arcs. Each arc represents an interaction between two genomic regions. The start and end points of the arc correspond to these interacting regions on the genome.

Frequency of Interactions

The height of each arc is not arbitrary. It corresponds to the frequency of the interaction it represents. Higher arcs indicate more frequent interactions between the genomic regions they connect. This frequency is determined based on the data from the Hi-C or other chromatin conformation capture experiments.

Color Coding

Depending on the specific settings of your ProteinPaint server, the arcs may also be color-coded. This color coding can provide additional information about the interactions, such as the strength of the interaction, the type of interaction, or other experiment-specific data. Be sure to refer to the legend or the specific documentation for your dataset to understand what these colors represent.

Interactions and Genomic Structure

The interactions represented by these arcs can provide valuable insights into the three-dimensional structure of the genome. Regions that frequently interact are likely to be close to each other in the physical structure of the chromosome. By studying these interactions, researchers can gain insights into the spatial organization of the genome and how this organization affects gene regulation, genetic diseases, and other important biological processes.

Conclusion

Hi-C and Arc interaction tracks are powerful tools for visualizing genomic structure. By understanding how to access and interpret these tracks, you can gain valuable insights into the three-dimensional organization of genomes.