Interaction Tracks: Hi-C and Arc
For examples of the Hi-C and Arc tracks, please visit the Hi-C card and the Arc card on our homepage.
Introduction to Hi-C and Arc Interaction Tracks
Hi-C and Arc interaction tracks are powerful tools for visualizing the three-dimensional structure of the genome.
Hi-C Tracks: Hi-C is a method that allows for the study of the three-dimensional architecture of genomes by coupling proximity-based ligation with massively parallel sequencing. The Hi-C tracks on ProteinPaint represent the frequency of interaction between different parts of the genome.
Arc Tracks: Arc tracks are a type of visualization that displays physical interactions between different genomic regions. These interactions are represented as arcs, with the height of the arc indicating the frequency of the interaction.
Track data formats
A genomic interaction track can draw data from one of three sources:
- Juicebox *.hic file, for Hi-C experiments. Learn about juicebox
- BED file, compressed and indexed
- Copy-paste text data
Juicebox Hi-C
This is a required data format for Hi-C interaction matrix information. The resulting file (*.hic) is a binary file. The file can be hosted using a publicly-accessible URL, or under the directory of the ProteinPaint server. Please note an Arc track can also be created from a .hic file.
BED-like file
Create a bed-like file to display the Arc track. Example:
chr1 713192 714392 chr1 1091970 1094482 1
chr1 713192 714392 chr1 1141602 1143157 1
chr1 792622 794350 chr1 803459 806745 1
chr1 792622 794350 chr1 808880 812725 1
chr1 792622 794350 chr1 872223 875765 4
chr1 792622 794350 chr1 988940 990936 1
Columns:
- Chromosome of 1st region
- Start coordinate of 1st region
- Stop coordinate of 1st region
- Chromosome of 2nd region
- Start coordinate of 2nd region
- Stop coordinate of 2nd region
- Numerical value
Notes:
- Separate columns by tabs
- Coordinate positions are 0-based
- Each A-B interaction must be represented by two lines:
- First line: chrA startA stopA chrB startB stopB value
- Second line: chrB startB stopB chrA startA stopA value
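The two-line rule above can be sketched as follows. This is a minimal illustration, not part of ProteinPaint: the function name toMirroredBedLines and the field names (chrA, startA, and so on) are hypothetical, chosen only to mirror the column descriptions.

```javascript
// Hypothetical helper (not a ProteinPaint API): expand each A-B interaction
// into the two mirrored, tab-separated lines the BED-like format requires.
function toMirroredBedLines(interactions) {
  const lines = [];
  for (const { chrA, startA, stopA, chrB, startB, stopB, value } of interactions) {
    // First line: region A, then region B.
    lines.push([chrA, startA, stopA, chrB, startB, stopB, value].join('\t'));
    // Second line: the same interaction with the two regions swapped.
    lines.push([chrB, startB, stopB, chrA, startA, stopA, value].join('\t'));
  }
  return lines;
}

// Usage: one interaction taken from the example rows above.
const mirrored = toMirroredBedLines([
  { chrA: 'chr1', startA: 713192, stopA: 714392, chrB: 'chr1', startB: 1091970, stopB: 1094482, value: 1 }
]);
console.log(mirrored.join('\n'));
```

The output is not yet sorted, so it still has to go through the sort, bgzip, and tabix steps.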
To convert a text file into a track file, run the following:
$ sort -k1,1 -k2,2n file > sortedfile
$ bgzip sortedfile
$ tabix -p bed sortedfile.gz
This generates two files, “sortedfile.gz” and “sortedfile.gz.tbi”.
To host this track, put both files under the same directory on a web server and obtain the URL to the .gz file. Alternatively, put both files under the directory of the ProteinPaint server and obtain the path to the .gz file.
Text data, copy & paste
Data can also be copied and pasted via the genome browser to create an Arc track. Click on Tracks -> Add custom track -> Interaction. Example:
chr5 98725000 98750000 chr5 99675000 99700000 0,255,255 12.0
chr5 20725000 20750000 chr5 34100000 34125000 0,255,255 13.0
chr5 21525000 21550000 chr5 29425000 29450000 0,255,255 17.0
chr5 177375000 177400000 chr5 178925000 178950000 0,255,255 18.0
Columns:
- Chromosome of 1st region
- Start coordinate of 1st region
- Stop coordinate of 1st region
- Chromosome of 2nd region
- Start coordinate of 2nd region
- Stop coordinate of 2nd region
- This field is not used
- Numerical value
Notes:
- Separate columns by either tabs or spaces; multiple consecutive tabs or spaces are considered to be one separator
- Each interaction is represented by one line, no duplication
- Interactions can be intra- or inter-chromosomal.
- Genomic positions are 1-based. Start and stop can be the same position.
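To check pasted text against the rules above before submitting it, a small parser along these lines may help. It is only a sketch: parseInteractionText and its output field names are hypothetical, and it assumes exactly the eight columns listed above.

```javascript
// Hypothetical validator for the copy-and-paste format (not a ProteinPaint API).
function parseInteractionText(text) {
  const interactions = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '') continue; // skip blank lines
    // Runs of tabs or spaces count as a single separator.
    const fields = line.split(/[ \t]+/);
    if (fields.length !== 8) {
      throw new Error(`Expected 8 columns, got ${fields.length}: ${line}`);
    }
    // The 7th column is not used, so it is skipped in the destructuring.
    const [chr1, start1, stop1, chr2, start2, stop2, , value] = fields;
    interactions.push({
      chr1, start1: Number(start1), stop1: Number(stop1),
      chr2, start2: Number(start2), stop2: Number(stop2),
      value: Number(value)
    });
  }
  return interactions;
}

// Usage: one row from the example above, with mixed separators.
console.log(parseInteractionText('chr5  98725000\t98750000 chr5 99675000 99700000 0,255,255 12.0'));
```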
Running the Hi-C track
Accessing Hi-C and Arc Interaction Tracks
Follow these steps to access the Hi-C and Arc interaction tracks:
- Navigate to the ProteinPaint server and load your data.
- Select the appropriate genome assembly from the drop down in the header. Click on the genome browser button to the right.
- Click on the "Tracks" button located in the top menu (“Tracks” > “Add custom track” > “Interaction”).
- From the dropdown menu, select either "Add Hi-C Track" or "Add Arc Track" depending on your needs.
- A popup window will appear. Here, choose the dataset you wish to visualize.
From a URL
Please see the Hi-C URL parameter documentation. All URL parameters are documented on this wiki page.
Embedded in website
Please see the examples in the Hi-C card. Click on the Code button to see the runproteinpaint() call. Please see the documentation on embedding for more information.
Genome View
To create a genome view, specify the genome parameter in the hic object (see code below). The genome parameter should be set to the genome version matching the reference in the file (e.g., 'hg19').
runproteinpaint({
    holder: document.getElementById('aaa'),
    parseurl: true,
    noheader: true,
    hic: {
        genome: 'hg19',
        file: "proteinpaint_demo/hg19/hic/hic_demo.hic",
        enzyme: "MboI"
    }
});
Chromosome pair View
To create a chromosome pair (intra- or inter-chromosomal) view, specify the position1 and position2 parameters in the hic object. Set each parameter to one chromosome of the pair (e.g., 'chr1' and 'chr2').
runproteinpaint({
    holder: document.getElementById('aaa'),
    parseurl: true,
    noheader: true,
    hic: {
        genome: 'hg19',
        file: "proteinpaint_demo/hg19/hic/hic_demo.hic",
        enzyme: "MboI",
        position1: "chr1",
        position2: "chr2"
    }
});
Detailed View
To create a detailed view (a zoomed-in view of a chromosome pair), specify the position1 and position2 parameters with the regions of interest in the hic argument. Use the format chr##:####-#### as a guide.
Example:
runproteinpaint({
    holder: document.getElementById('aaa'),
    parseurl: true,
    noheader: true,
    hic: {
        genome: "hg19",
        file: "proteinpaint_demo/hg19/hic/hic_demo.hic",
        enzyme: "MboI",
        position1: "chr8:125470045-130470044",
        position2: "chr4:172643663-177643662"
    }
});
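The position strings used by the chromosome pair and detailed views can be checked before they are passed to runproteinpaint(). The sketch below is a hypothetical helper (parsePosition is not a ProteinPaint function) and assumes only the two forms shown above: a bare chromosome name, or chr:start-stop.

```javascript
// Hypothetical helper: parse "chr1" or "chr8:125470045-130470044".
function parsePosition(position) {
  const match = /^([^:\s]+)(?::(\d+)-(\d+))?$/.exec(position);
  if (!match) {
    throw new Error(`Unrecognized position: ${position}`);
  }
  const [, chr, start, stop] = match;
  if (start === undefined) {
    return { chr }; // whole chromosome, as in the chromosome pair view
  }
  const region = { chr, start: Number(start), stop: Number(stop) };
  if (region.start > region.stop) {
    throw new Error(`Start is greater than stop: ${position}`);
  }
  return region; // a region, as in the detailed view
}

// Usage with the values from the examples above.
console.log(parsePosition('chr1'));
console.log(parsePosition('chr8:125470045-130470044'));
```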
Runproteinpaint Parameters
Track
Please see examples from the Hi-C card and Arc Track card by clicking the Code button.
Hic:
In the tracks array, add an object with the hicstraw type. See a description of the available arguments in the example below:
{
    type: "hicstraw", // Required.
    file: "proteinpaint_demo/hg19/hic/hic_demo.hic", // Either .file or .url must be present to run the track.
    name: "Hi-C Demo", // Optional. The track name to show on the left.
    percentile_max: 95, // Optional. The maximum value to display, as a percentile of the data.
    mincutoff: 1, // Optional. The minimum value to show.
    pyramidup: 1, // Optional. 1 or true points the heatmap up; 0 or false points it down.
    enzyme: "MboI", // Optional. The restriction enzyme.
    normalizationmethod: "VC" // Optional. A normalization method present in the file.
}
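Since the track cannot run without a data source, it can be convenient to build the track object through a small wrapper that enforces that rule. The following is a sketch only; makeHicTrack is a hypothetical helper, not part of ProteinPaint, and it checks nothing beyond the file-or-url requirement described above.

```javascript
// Hypothetical helper: build a hicstraw track object and fail early
// when neither .file nor .url is supplied.
function makeHicTrack(options) {
  const hasFile = typeof options.file === 'string' && options.file !== '';
  const hasUrl = typeof options.url === 'string' && options.url !== '';
  if (!hasFile && !hasUrl) {
    throw new Error('A hicstraw track needs either .file or .url');
  }
  // The type is set last so that it cannot be overridden by the caller.
  return { ...options, type: 'hicstraw' };
}

// Usage: the resulting array is what would go in the tracks argument.
const tracks = [
  makeHicTrack({
    file: 'proteinpaint_demo/hg19/hic/hic_demo.hic',
    name: 'Hi-C Demo',
    percentile_max: 95
  })
];
console.log(tracks);
```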
Arc:
In the tracks array, add an object with the hicstraw type. Unlike the example above, include mode_arc:true and mode_hm:false in the object. Here's an example:
{
    type: "hicstraw",
    bedfile: "proteinpaint_demo/hg19/arc/mango.gz",
    name: "Arc Track Demo",
    percentile_max: 99,
    mode_arc: true,
    mode_hm: false
}
App
Please see examples from the Hi-C card by clicking the Code button.
hic:{...}: This argument launches the hic app with the genome, chromosome pair, and detail views.
- .genome: This is the genome version to be visualized. Common values include 'hg19', 'hg38', etc.
- .file: This is the path to the Hi-C data file. The file path must be under the 'tp' folder. For example, "tp/proteinpaint_demo/hg19/hic/hic_demo.hic".
- .url: Provide the file via URL.
- .enzyme: This is the restriction enzyme used in the Hi-C experiment. Common values include 'MboI', 'HindIII', 'NcoI', 'NlaIII', etc. The available enzymes depend on the data in the Hi-C file.
- .position1 and .position2: These are the regions to be visualized. They should be in the format "chr:start-stop", where "chr" is the chromosome, and "start" and "stop" are the start and stop positions of the region.
- .state: Set the normalization method and matrixType (described below) for one or more views.
Example:
state: {
    chrpair: { nmeth: 'VC', matrixType: 'observed' }
}
Here's a brief explanation of these two options:
- nmeth: The value must match one of the normalization methods present in the file. Common values include 'NONE', 'VC', 'VC_SQRT', 'KR', etc. The default value is 'NONE', which means no normalization is applied.
- matrixType: Available options are observed, expected, oe (Observed/Expected), or log(oe) (Log(Observed/Expected)). The default value is observed.
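A state entry can be assembled with the documented defaults and checked against the listed matrix types before it is passed in. This is a sketch under assumptions: makeViewState and MATRIX_TYPES are hypothetical names, and the matrix type strings are taken literally from the list above. Whether a given nmeth is valid depends on the .hic file, so it is not checked here.

```javascript
// Matrix types as listed in the text above (assumed to be the literal values).
const MATRIX_TYPES = ['observed', 'expected', 'oe', 'log(oe)'];

// Hypothetical helper: build the settings for one view, using the
// documented defaults ('NONE' and 'observed') when arguments are omitted.
function makeViewState(nmeth = 'NONE', matrixType = 'observed') {
  if (!MATRIX_TYPES.includes(matrixType)) {
    throw new Error(`Unknown matrixType: ${matrixType}`);
  }
  return { nmeth, matrixType };
}

// Usage: the same state as in the example above.
const state = { chrpair: makeViewState('VC', 'observed') };
console.log(state);
```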
User interface
ProteinPaint Hi-C Data Visualization Tutorial
This tutorial describes the process of visualizing Hi-C data using the ProteinPaint tool.
Step 1: Understanding the URL Parameters
The URL contains several parameters that define the visualization:
- genome=hg19: This sets the genome version to 'hg19'.
- hicfile=files/hg19/nbl-hic/hic_NB69.inter.hic: This sets the path to the Hi-C data file.
- enzyme=MboI: This sets the restriction enzyme used in the Hi-C experiment to 'MboI'.
The final URL is https://proteinpaint.stjude.org/?genome=hg19&hicfile=files/hg19/nbl-hic/hic_NB69.inter.hic&enzyme=MboI
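To see how the URL breaks down into these parameters, it can be parsed with the standard URL API available in browsers and Node.js. The helper name readHicUrlParams is hypothetical; only the URL itself comes from this tutorial.

```javascript
// Hypothetical helper: read the query parameters of a ProteinPaint URL
// into a plain object using the standard URL API.
function readHicUrlParams(urlString) {
  const url = new URL(urlString);
  return Object.fromEntries(url.searchParams);
}

// Usage with the URL from this tutorial.
const params = readHicUrlParams(
  'https://proteinpaint.stjude.org/?genome=hg19&hicfile=files/hg19/nbl-hic/hic_NB69.inter.hic&enzyme=MboI'
);
console.log(params.genome, params.hicfile, params.enzyme);
```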
Step 2: Exploring the User Interface
The whole-genome view appears first. This view shows the interaction frequencies between different regions of the genome.
Step 3: Interacting with the Visualization (Genome view)
Interact with this visualization by clicking on different parts of it. Clicking on a specific region will zoom in on that region and show a more detailed view of the interaction frequencies. This can be either an inter-chromosomal view (showing interactions between different chromosomes) or an intra-chromosomal view (showing interactions within the same chromosome).
Step 4: Interacting with the Visualization (Chromosomal view)
Click on a specific region in the Chromosomal View to switch to the Detailed View. This view provides a more detailed look at the interactions within the selected region.
Step 5: Interacting with the Visualization (Detailed view)
In the Detailed View, see the specific interactions between different parts of the selected region. Use the options in the user interface to adjust the view, normalization method, matrix type, and min and max cut off values. Use the button in the controls to open the detail view.
Step 6: Interacting with the Visualization (Horizontal view)
This view provides a linear, horizontal representation of the genomic region, which can be useful for seeing the distribution of interactions along the length of the region. The resolution of the view can be adjusted, up to Fragments. Use the controls in the user interface to adjust the view, normalization method, matrix type, min and max cutoff values, and arc view.
Interpreting Hi-C and Arc Interaction Tracks
Once you've successfully loaded a Hi-C or Arc track on the ProteinPaint server, you'll be presented with a visual representation of genomic interactions. This section will guide you through understanding these visualizations.
Understanding the Arcs
The primary visual elements in both Hi-C and Arc tracks are arcs. Each arc represents an interaction between two genomic regions. The start and end points of the arc correspond to these interacting regions on the genome.
Frequency of Interactions
The height of each arc is not arbitrary. It corresponds to the frequency of the interaction it represents. Higher arcs indicate more frequent interactions between the genomic regions they connect. This frequency is determined based on the data from the Hi-C or other chromatin conformation capture experiments.
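As an illustration of the idea that higher values produce higher arcs, the sketch below maps a value to a pixel height with simple linear scaling and a cap. This is an assumption made for the example, not ProteinPaint's actual rendering logic; arcHeight, maxValue, and maxHeightPx are hypothetical names.

```javascript
// Illustration only: linear scaling with a cap. ProteinPaint's real
// scaling is not described in this document and may differ.
function arcHeight(value, maxValue, maxHeightPx) {
  if (maxValue <= 0) return 0;
  // Clamp the value into [0, maxValue] so outliers do not exceed the track height.
  const capped = Math.min(Math.max(value, 0), maxValue);
  return (capped / maxValue) * maxHeightPx;
}

// Usage: with a cap of 4, a value of 1 gets a quarter of the height of a value of 4.
console.log(arcHeight(1, 4, 100), arcHeight(4, 4, 100));
```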
Color Coding
Depending on the specific settings of your ProteinPaint server, the arcs may also be color-coded. This color coding can provide additional information about the interactions, such as the strength of the interaction, the type of interaction, or other experiment-specific data. Be sure to refer to the legend or the specific documentation for your dataset to understand what these colors represent.
Interactions and Genomic Structure
The interactions represented by these arcs can provide valuable insights into the three-dimensional structure of the genome. Regions that frequently interact are likely to be close to each other in the physical structure of the chromosome. By studying these interactions, researchers can gain insights into the spatial organization of the genome and how this organization affects gene regulation, genetic diseases, and other important biological processes.
Conclusion
Hi-C and Arc interaction tracks are powerful tools for visualizing genomic structure. By understanding how to access and interpret these tracks, you can gain valuable insights into the three-dimensional organization of genomes.