Embeddings visualization - ma-compbio/Higashi GitHub Wiki

In this documentation, all panels are marked in bold italic, and all components of the tool (buttons, sliders, menus) are marked in code format. As before, file names are also marked in code format.

Visualization of the embedding vectors

Let's first get familiar with the tools on the Embedding visualization panel:

  • Pan: when activated, dragging the mouse moves the coordinates of the plot.
  • Lasso select: when activated, dragging the mouse draws a lasso that selects dots.
  • Box select: when activated, dragging the mouse draws a rectangular box that selects dots.
  • Zoom: when activated, the mouse scroll wheel zooms in and out of the scatter plot.
  • Click select: when activated, the user can click on dots to select them.
  • Reset: when clicked, the scatter plot is restored to the default scale and range.
  • Save figure: when clicked, the user can save the scatter plot.
  • Display info when hover: when activated, hovering the mouse over a dot shows detailed information about the corresponding cell.

Define X and Y axis

By default, Higashi-vis uses the first two principal components (PC1 and PC2) to visualize the embeddings. In some cases, however, PC1 largely reflects read depth, batch effects, or even outliers. When that happens, use the x-axis/y-axis dropdown menus to assign the third principal component to the x or y axis instead.

Note: when using visualization methods such as TSNE/UMAP, if both the x and y axes are set to 1 or 2, the embeddings are projected to a 2-dimensional space. If either axis is set to 3, the embeddings are projected to a 3-dimensional space, and the selected dimensions are shown in the scatter plot.
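The rule in this note can be written down in a few lines. The sketch below only illustrates the rule and is not taken from Higashi-vis: the helper names `projection_plan` and `select_axes` and their return formats are hypothetical.

```python
def projection_plan(x_dim, y_dim):
    """Return (n_components, column indices) for the chosen axes.

    x_dim and y_dim are the 1-based dimensions picked in the x-axis and
    y-axis menus (1, 2, or 3). Hypothetical helper, for illustration only.
    """
    for dim in (x_dim, y_dim):
        if dim not in (1, 2, 3):
            raise ValueError(f"axis dimension must be 1, 2, or 3, got {dim}")
    # A 3-dimensional projection is needed only if either axis uses dimension 3.
    n_components = 3 if 3 in (x_dim, y_dim) else 2
    # Convert the 1-based menu choices to 0-based column indices.
    return n_components, (x_dim - 1, y_dim - 1)


def select_axes(projected, x_dim, y_dim):
    """Pick the two plotted columns from the rows of a 2-D or 3-D projection."""
    n_components, (xi, yi) = projection_plan(x_dim, y_dim)
    if any(len(row) != n_components for row in projected):
        raise ValueError("projection has an unexpected number of dimensions")
    return [(row[xi], row[yi]) for row in projected]
```

For example, `projection_plan(1, 2)` asks for a 2-dimensional projection and plots columns 0 and 1, while `projection_plan(1, 3)` asks for a 3-dimensional projection and plots columns 0 and 2.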

Choose visualization method

For scHi-C datasets with a large number of cells, PCA might not be the most appropriate way to visualize the embeddings. Higashi-vis also offers TSNE, UMAP, and other methods commonly used for visualizing single-cell datasets. Use the Vis method dropdown menu to choose the appropriate visualization method.

Change the color of the scatter plot

Use the color scheme dropdown menu to decide how to color the scatter plot. Higashi-vis automatically treats the information stored in the label_info.pickle file, which is part of the input dataset, as potential coloring schemes. Both discrete and continuous color schemes are supported. When a discrete color scheme is selected, the scatter plot is colored accordingly and the Statistics visualization panel shows a bar plot of the distribution of the labels. When the color scheme is continuous, the Statistics visualization panel shows a histogram of the distribution instead.
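As a rough illustration of the discrete/continuous distinction, the sketch below writes and reads a small label file and summarizes each label. It assumes that label_info.pickle holds a Python dictionary mapping each label name to a list with one entry per cell. The label names, the values, the float-based `is_continuous` heuristic, and the `summarize` helper are all hypothetical and are not Higashi-vis code.

```python
import os
import pickle
import tempfile
from collections import Counter

# Hypothetical label dictionary: one list per label, one entry per cell.
label_info = {
    "cell type": ["GM12878", "K562", "GM12878", "K562"],  # discrete labels
    "coverage": [0.12, 0.85, 0.40, 0.63],                  # continuous labels
}

# Round-trip through a pickle file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "label_info.pickle")
with open(path, "wb") as handle:
    pickle.dump(label_info, handle)
with open(path, "rb") as handle:
    loaded = pickle.load(handle)


def is_continuous(values):
    """Heuristic (an assumption): labels made only of floats are continuous."""
    return all(isinstance(v, float) for v in values)


def summarize(values, n_bins=2):
    """Per-label counts for discrete labels, histogram bin counts otherwise."""
    if not is_continuous(values):
        return dict(Counter(values))
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # avoid a zero width for constant labels
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so the maximum value falls in the last bin.
        counts[min(int((v - low) / width), n_bins - 1)] += 1
    return counts


summary = {name: summarize(values) for name, values in loaded.items()}
```

Here `summary["cell type"]` holds the counts a bar plot would show, and `summary["coverage"]` holds the bin counts a histogram would show.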

Besides the label information provided in label_info.pickle, Higashi-vis provides three extra coloring schemes:

  • kde: log pdf value from kernel density estimation
  • kde_ratio: log difference of pdf from kernel density estimation with different kernel bandwidth (which can be regarded as local density)
  • read_count: log10 of the read counts of the selected chromosome
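To make the three definitions concrete, here is a minimal sketch of how such values could be computed. It assumes a Gaussian kernel on a 2-D embedding; the bandwidths, the direction of the difference in `kde_ratio` (narrow minus wide), the toy data, and the `log_kde` helper are assumptions for illustration and may differ from what Higashi-vis does internally.

```python
import math


def log_kde(points, bandwidth):
    """Log density of a 2-D Gaussian KDE, evaluated at every input point."""
    n = len(points)
    norm = n * 2.0 * math.pi * bandwidth ** 2
    log_pdf = []
    for px, py in points:
        total = 0.0
        for qx, qy in points:
            sq_dist = (px - qx) ** 2 + (py - qy) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        # total >= 1 because each point contributes exp(0) to itself.
        log_pdf.append(math.log(total / norm))
    return log_pdf


# Hypothetical 2-D embedding: three cells in a tight cluster and one outlier.
embedding = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
read_counts = [10000, 45000, 8000, 30000]  # made-up per-cell read counts

# kde: log pdf from a single kernel density estimate (bandwidth is assumed).
kde = log_kde(embedding, bandwidth=0.5)
# kde_ratio: difference of log pdfs at two bandwidths (both values assumed).
kde_ratio = [a - b for a, b in zip(log_kde(embedding, 0.2), log_kde(embedding, 1.0))]
# read_count: log10 of the read counts.
read_count = [math.log10(c) for c in read_counts]
```

In this toy example the clustered cells receive a higher `kde` value than the isolated cell, which is the kind of contrast the coloring scheme makes visible.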

Change the size of the scatter plot

Use the scatter size slider to choose the size of the dots in the scatter plot.

Hover for more information

When the mouse hovers over an element such as a dot in the scatter plot or a bar in the Statistics visualization panel, more detailed information is displayed, such as the cell index or the number of cells in a specific group.

Update after 2021-04-01: If cell_name_higashi is provided in label_info.pickle, the cell name is also displayed when the mouse hovers over a dot in the scatter plot.

Drag and zoom

The scatter plot in Higashi-vis supports dragging and zooming in and out. Click the zoom button to activate zooming with the mouse scroll wheel. Click the reset button to return the scatter plot to its default scale.

Note: The reset button in the scatter plot and the green Reload button in the Control panel do different things. The former only changes the scale of the scatter plot without reloading the embeddings from disk, while the latter reloads the embeddings first. The Reload button is helpful when visualizing the embeddings in the middle of the training process to inspect whether the model has converged.