Visualize your WGCNA network in Cytoscape - Persilian/WGCNA GitHub Wiki

After completing the WGCNA network construction, it's time to visualize. Visualizing the entire network you have built rarely makes sense: it is computationally intensive, and a figure depicting tens of thousands of genes (nodes) will be messy and overwhelming. Therefore we want to select genes of interest, depict their connections, give them attributes (for example "cold up-regulated", or annotation with an interesting GO-term) and color this subnetwork according to these node-attributes.

Identify genes of interest

Obvious genes of interest are those that are differentially expressed in response to the treatments of an experiment. While differentially expressed genes (DEGs) are good candidates, they are not the only genes that act in response to a treatment. Constitutively expressed genes, whose expression does not change significantly in response to a treatment, are nonetheless part of pathways responding to treatments/environments.

In WGCNA stage 4 you can generate a gene-treatment correlation file, which lists the correlation of each gene in the network with each treatment for which you have expression data. Candidate genes responding to a specific treatment can then be extracted by deciding on a correlation cutoff (e.g. all genes with a correlation of >0.9 to a treatment, plus all differentially expressed genes in response to the same treatment). Note that these correlation values become more accurate the more samples/libraries you input into your network construction. Alternatively, you can look at the heatmaps from WGCNA stage 5, identify modules that correlate highly with your treatments and simply use all the genes from those modules. The gene-lists for all modules in the created network are stored in /Data/Gene_lists.

Once you have decided on how to select your candidate genes, make a simple "gene_list.txt" with them, of the form

gene1
gene2
gene3
etc.
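The selection rule described above (a correlation cutoff plus all DEGs for the same treatment) can be sketched in a few lines. This is a minimal illustration in Python rather than part of the WGCNA scripts; the function names, the nested-dictionary layout of the correlations and the example values are all assumptions made for the sketch.

```python
import os
import tempfile


def select_candidates(gene_treatment_cor, degs, treatment, cutoff=0.9):
    """Return genes correlating above `cutoff` with `treatment`, plus all DEGs."""
    correlated = {
        gene
        for gene, correlations in gene_treatment_cor.items()
        if correlations.get(treatment, 0.0) > cutoff
    }
    return sorted(correlated | set(degs))


def write_gene_list(genes, path):
    """Write one gene identifier per line, as expected for gene_list.txt."""
    with open(path, "w") as handle:
        for gene in genes:
            handle.write(gene + "\n")


# Hypothetical example data: gene -> treatment -> correlation.
gene_treatment_cor = {
    "gene1": {"cold": 0.95, "heat": -0.20},
    "gene2": {"cold": 0.40, "heat": 0.10},
    "gene3": {"cold": 0.91, "heat": 0.05},
    "gene4": {"cold": 0.10, "heat": 0.97},
}
cold_degs = {"gene2"}  # hypothetical DEGs for the cold treatment

candidates = select_candidates(gene_treatment_cor, cold_degs, "cold", cutoff=0.9)

# Written to a temporary directory here; in practice choose your own path.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "gene_list.txt")
    write_gene_list(candidates, path)
    with open(path) as handle:
        written = handle.read().splitlines()

print(candidates)
```

Here gene2 is kept although its correlation is low, because it is a DEG, while gene4 is dropped because it only correlates with a different treatment.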

Generate a node-attribute file

You may want to have some information tied to your candidate genes. For that you can generate a "node-attribute file", which is a table of all genes of your network, with genes as rows and any number of "attribute" columns. An attribute column can for example be "cold_upregulated", and each row would then state whether the gene is cold up-regulated (yes) or not (no). Apart from categorical variables, you can also have continuous variables as attributes, for example log-fold change and connectivity. Later in Cytoscape, you will be able to color and shape the nodes and edges of your network according to these attributes.
An example of how to format a node-attribute file can be found in WGCNA_node_attributes_formatting_example.R.

gene    attribute1    attribute2    attribute3
gene1   etc.    etc.    etc.
gene2   etc.    etc.    etc.
gene3   etc.    etc.    etc.
etc.
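As a rough illustration of the table layout above, the following Python sketch builds a tab-delimited node-attribute table with one categorical and one continuous column. It is not the referenced R example; the column names, helper names and values are assumptions chosen for the sketch.

```python
import csv
import io


def build_node_attributes(genes, cold_up_genes, log_fold_change):
    """Build one row per gene with a yes/no column and a numeric column."""
    rows = []
    for gene in genes:
        rows.append({
            "gene": gene,
            "cold_upregulated": "yes" if gene in cold_up_genes else "no",
            # Genes without a value get an empty cell.
            "log_fold_change": log_fold_change.get(gene, ""),
        })
    return rows


def write_table(rows, handle):
    """Write the rows as a tab-delimited table with a header line."""
    writer = csv.DictWriter(
        handle,
        fieldnames=["gene", "cold_upregulated", "log_fold_change"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical example data.
genes = ["gene1", "gene2", "gene3"]
cold_up_genes = {"gene1", "gene3"}
log_fold_change = {"gene1": 2.5, "gene2": -0.3, "gene3": 1.8}

buffer = io.StringIO()  # stands in for an opened output file
write_table(build_node_attributes(genes, cold_up_genes, log_fold_change), buffer)
table_lines = buffer.getvalue().splitlines()
print(buffer.getvalue())
```

Keeping the first column identical to the gene identifiers in the network makes it straightforward to match rows to nodes when the table is imported later.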

Extract edge- and node-files for your candidate genes

With Cytoscape_extract.R you can extract edge- and node-files of your candidate genes, compatible with Cytoscape. Just specify your "gene_list.txt" and load the adjacency matrix of your network.

Choose an adjacency threshold: edges (connections between nodes) with a value below that threshold will not be extracted. Usually we are interested in the stronger connections only, so applying a threshold between 0.5 and 0.8 is reasonable. This is also a good way to filter out weakly connected genes from your candidate gene list. The genes remaining in your network after thresholding can be extracted from the node-file, which is stored by default in /Data/cytoscape/. Just keep in mind that weakly connected genes are genes with very individual expression profiles (dissimilar from all other expression profiles) and may therefore still be interesting.
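To make the effect of the threshold concrete, here is a minimal Python sketch of the idea. It does not reproduce Cytoscape_extract.R; the function name, the pair-keyed dictionary used for the adjacency values and the example weights are assumptions. Keeping an edge that lies exactly on the threshold is a choice made in this sketch.

```python
def extract_edges(adjacency, genes, threshold):
    """Keep edges between candidate genes whose weight is not below the threshold.

    `adjacency` maps a pair (gene_a, gene_b) to the adjacency value of that pair.
    Returns the kept edges and the nodes that still have at least one edge.
    """
    genes = set(genes)
    edges = [
        (a, b, weight)
        for (a, b), weight in sorted(adjacency.items())
        if a in genes and b in genes and weight >= threshold
    ]
    nodes = sorted({gene for a, b, _ in edges for gene in (a, b)})
    return edges, nodes


# Hypothetical adjacency values for four candidate genes.
adjacency = {
    ("gene1", "gene2"): 0.82,
    ("gene1", "gene3"): 0.03,
    ("gene1", "gene4"): 0.00,
    ("gene2", "gene3"): 0.61,
    ("gene2", "gene4"): 0.02,
    ("gene3", "gene4"): 0.04,
}

edges, nodes = extract_edges(adjacency, ["gene1", "gene2", "gene3", "gene4"], 0.5)

# Print an edge table with the column names used in the Cytoscape import step.
print("fromNode\ttoNode\tweight")
for from_node, to_node, weight in edges:
    print(f"{from_node}\t{to_node}\t{weight}")
print(nodes)
```

In this example gene4 has no connection at or above 0.5, so it disappears from the node list, which is the filtering of weakly connected genes described above.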

Note: Use a threshold of at least 0.05 in order to exclude 0-weight connections. Including 0-weight connections will increase the size of your edge- and node-files dramatically and will have Cytoscape consume large amounts of RAM for depicting weak connections. A computer with 8 GB of RAM and an i5 CPU can handle about 50'000 edges in Cytoscape; anything above that will take an unreasonable amount of time.

Note: If you have few replicates (e.g. RNA-seq libraries) per treatment, it is possible that many genes will have very strong connections. This happens because the averages of the expression profiles of genes have large error bars, so that many expression profiles end up with high correlation values. In this case you will get large edge-files too, which you must reduce by applying adjacency thresholds of >0.9.
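One way to choose between the thresholds mentioned in these notes is to count how many edges would survive each of them before exporting anything. The Python sketch below illustrates this under assumed values; the function names, the example weights and the edge budget are hypothetical, and a real budget would be closer to the 50'000 edges mentioned above.

```python
def count_edges(weights, threshold):
    """Count the edges whose weight is not below the threshold."""
    return sum(1 for weight in weights if weight >= threshold)


def smallest_threshold_within_budget(weights, thresholds, max_edges):
    """Return the lowest threshold that keeps at most `max_edges` edges, or None."""
    for threshold in sorted(thresholds):
        if count_edges(weights, threshold) <= max_edges:
            return threshold
    return None


# Hypothetical edge weights; a real network has far more.
weights = [0.0, 0.02, 0.3, 0.55, 0.6, 0.92, 0.95, 0.99]

for threshold in (0.05, 0.5, 0.8, 0.9):
    print(threshold, count_edges(weights, threshold))

chosen = smallest_threshold_within_budget(weights, (0.05, 0.5, 0.8, 0.9), max_edges=3)
print(chosen)  # prints 0.8
```

Picking the lowest threshold that fits the budget keeps as many connections as the machine can handle, rather than discarding more edges than necessary.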

If you have a node-attribute file, you can specify it in Cytoscape_extract.R as well. Node-attribute files can be modified and loaded into Cytoscape at any time.

Import your network in Cytoscape

Open Cytoscape and import your network via File > Import > Network from File, choosing the edge-file you’ve created with Cytoscape_extract.R. Select the “fromNode” column as source node and the “toNode” column as target node, and leave the weight column as an “Edge Attribute”. Do not import the “direction”, “fromAltName” and “toAltName” columns unless you are working with a directed network and have alternative names for your nodes (e.g. alternative gene annotations).

Network layout

There is a plethora of layouts to choose from and even more settings to fine-tune these layouts. For weighted gene co-expression networks the “Prefuse Force Directed Layout” is a good start. Go to Layout > Settings > Layout Algorithm > Prefuse Force Directed layout.

The prefuse force directed layout is a good layout to intuitively depict the strength of connections between genes. Here, nodes will repel each other like electrons and at the same time they are pulled together through the edges that act like springs. Ultimately, the network will be depicted in a “minimal energy state”, clustering strongly connected nodes together and repelling weakly connected nodes.
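The spring-and-repulsion picture can be illustrated with a toy simulation. The Python sketch below is not Cytoscape's Prefuse implementation; it is a simplified model in which every pair of nodes repels with a force proportional to 1/distance², every edge acts as a spring whose pull is scaled by the edge weight, and positions are moved a small step along the net force in each iteration. All parameter names and values are assumptions made for the illustration.

```python
import math


def force_directed_layout(positions, edges, iterations=1000, step=0.05,
                          repulsion=1.0, spring_coefficient=1.0, spring_length=1.0):
    """Toy force-directed layout; returns node -> [x, y] after the given iterations."""
    pos = {node: list(xy) for node, xy in positions.items()}
    nodes = sorted(pos)
    for _ in range(iterations):
        force = {node: [0.0, 0.0] for node in nodes}
        # Every pair of nodes repels each other.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = repulsion / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Every edge pulls its two nodes towards the spring length.
        for a, b, weight in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = spring_coefficient * weight * (dist - spring_length)
            force[a][0] += pull * dx / dist
            force[a][1] += pull * dy / dist
            force[b][0] -= pull * dx / dist
            force[b][1] -= pull * dy / dist
        # Move each node a small step along its net force.
        for node in nodes:
            pos[node][0] += step * force[node][0]
            pos[node][1] += step * force[node][1]
    return pos


def distance(pos, a, b):
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


# Hypothetical three-gene network: one strong and one weak connection.
start = {"geneA": (1.0, 0.0), "geneB": (-0.5, 0.866), "geneC": (-0.5, -0.866)}
edges = [("geneA", "geneB", 0.9), ("geneB", "geneC", 0.1)]

layout = force_directed_layout(start, edges)
print(distance(layout, "geneA", "geneB"), distance(layout, "geneB", "geneC"))
```

Starting from an equilateral triangle, the strongly connected pair ends up closer together than the weakly connected pair, which is the behaviour the layout is meant to convey.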

To achieve a visually pleasing result you have to find parameters that fit your individual network. Let's assume a network with ca. 3'500 nodes and ca. 40'000 edges. In the prefuse force directed layout settings, specify the weight-column of your edge-file. Set the weight values to "normalized value". Set the minimum and maximum edge-weight to consider according to your needs. Here you can once more filter for stronger connections; note that the smallest edge-weight in your network is already the "adjacency threshold" you specified previously. It is recommended to set "Number of Iterations" to at least 1000, so the network can enter its "minimal energy state". After 1000 iterations the network usually doesn't change much, while additional iterations take longer to compute.

The Default spring coefficient represents the "stiffness" of the springs (edges), meaning a higher spring coefficient will result in a denser network. Explore this parameter for your network. The Default spring length can also be adjusted to make a network more or less dense. Default Node Mass can be set to 1000 in a network of ca. 3'500 nodes and 40'000 edges; this value can also be adjusted to increase or decrease network density.

Import node-attributes

Import the node-attribute file you have created previously via File > Import > Table from File. Now all nodes in your sub-network have attributes by which you can colour and shape them.

Style your network

In your Cytoscape user interface go to the “Style”-tab on the left. Here you have numerous options to fulfil your artistic dreams. To colour nodes, first set the default colour of all nodes to black. Continue by setting a bypass colour for specific groups of nodes. Do this by selecting a group of nodes in the “Node Table” at the bottom of the Cytoscape user interface: select rows, right click and click “select nodes from selected rows”. Then go to the “Fill Color” property and set a bypass colour. Keep in mind that a bypass colour will also overwrite the colour of nodes that you have coloured before.

You can also visualize continuous values (e.g. log-fold change) attributed to the nodes in your network. For example, you may want to scale the size of the nodes with their log-fold change. Tick “Lock node width and height” in order to show the “Size” property, which changes the size of a node. Click on the middle box (Map.) of the “Size” property to map values of a column from your node-attribute file. Choose a column with log-fold change and set the “Mapping Type” to “Continuous Mapping”. Click on the “Current Mapping” slope and drag the right side of the slope upwards until the largest size is 1-2 orders of magnitude above the smallest. This will make the nodes with the smallest log-fold change 10-100 times smaller than the nodes with the largest log-fold change. You can apply this strategy to every continuous value, and you can also apply it to other properties such as the label font size and the border colour.
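Conceptually, such a continuous mapping interpolates between a smallest and a largest node size. The Python sketch below shows a simplified two-point linear version of that idea; it is not Cytoscape's own code, and the function name, the size range (10 to 100, i.e. one order of magnitude) and the example values are assumptions.

```python
def continuous_size_mapping(value, value_min, value_max, size_min, size_max):
    """Linearly map `value` from [value_min, value_max] onto [size_min, size_max]."""
    if value_max == value_min:
        return size_min  # no spread in the data: every node gets the same size
    fraction = (value - value_min) / (value_max - value_min)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp values outside the range
    return size_min + fraction * (size_max - size_min)


# Hypothetical log-fold changes from a node-attribute file.
log_fold_change = {"gene1": 0.5, "gene2": 2.0, "gene3": 3.5}
lowest = min(log_fold_change.values())
highest = max(log_fold_change.values())

node_sizes = {
    gene: continuous_size_mapping(value, lowest, highest, size_min=10, size_max=100)
    for gene, value in log_fold_change.items()
}
print(node_sizes)  # {'gene1': 10.0, 'gene2': 55.0, 'gene3': 100.0}
```

With this range, the node with the largest log-fold change is drawn ten times larger than the node with the smallest one, and all other nodes fall in between.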

Congratulations, you have acquired some of the basic tools to generate and visualize networks. Feel free to explore more properties and network layouts that may convey your ideas better than the suggestions made here.