Markov Clustering - NETESOLUTIONS/ERNIE GitHub Wiki

We are exploring the use of Markov Clustering via the MCL algorithm for community detection in citation data. Apart from the source code in C distributed by van Dongen, implementations in R (MCL package) and Python (markov_clustering) are also available. Documentation and behavior of the C, R, and Python versions don't sync very well, so we use van Dongen's code. Reading the documentation is not only sensible, it's essential.

  1. Download the latest version of the software and unpack the .gz file with tar -xzvf.
  2. Navigate to the mcl-* directory and type ./configure.
  3. Then make install.

The package compiled with two warnings on OSX 10.15.4 (Apple has advice on how to delve deeper, but our reference environment is CentOS, so I didn't do anything about this) and installed various executables into /Users/your_user_id/local/bin (I have admin privileges, so I didn't use sudo), so you have to create aliases and/or add that directory to PATH with export. On our CentOS 7.x server, I used sudo make install; the package compiled without errors or warnings and installed various executables into /usr/local/bin.

Input file formats. The easy way is to generate a two column edgelist using labels and add an optional third column of edge-weights. Thus, I exported tsv files from an edge list in R. Headers should not be included as far as I can tell (I don't know whether one can have commented headers), e.g.

> write.table(x, file='x.tsv', quote=FALSE, sep='\t', row.names=FALSE, col.names=FALSE)
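For anyone preparing the edge list outside R, the same headerless, tab-separated layout can be sketched in Python. This is only an illustration under stated assumptions: the function name write_abc and the example labels and weights are made up for this page, not taken from our data or from the mcl distribution.

```python
import csv


def write_abc(edges, path):
    """Write (source, target[, weight]) tuples as a headerless, tab-separated file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for edge in edges:
            writer.writerow(edge)


# Hypothetical citation pairs with an optional third column of edge weights.
example_edges = [("a101", "a202", 1.0), ("a202", "a303", 0.5)]
write_abc(example_edges, "x.tsv")
```

No header row is written, which matches the advice above about leaving headers out.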

For such a file:

mcl input_file.abc --abc -o out.input_file should get your first run working (the file extension doesn't have to be abc and can be named anything sensible like .tsv). The output is a file where each line is a cluster of tab-separated labels.

The more complicated way, but eventually the more useful one, is to use the native matrix format. For that, you convert abc format to matrix format using mcxload:

mcxload --stream-mirror -abc data1.txt -o data1.mci -write-tab data1.tab

In the command above, data1.txt (abc format) is converted to data1.mci (native or matrix format) along with a dictionary that allows reconversion to labels (data1.tab).

To run mcl on native files skip the --abc flag. Thus,

mcl data.mci -I 1.4 (an inflation parameter of 1.4) would result in an output file named out.data.mci.I14. Finally, you can get some evaluation data by running clm in info mode on the input and output files, as in

clm info data.mci out.data.mci.I14

To convert back from matrix format to labels one uses mcxdump. For example,

mcxdump -icl out.top_sfweights.mci.I20 -tabr top_sfweights.tab -o dump.data.mci.I14

The resulting output can easily be read into R and converted to a list in which each element is a vector of labels corresponding to the contents of a cluster, and is thus easy to interpret in the original context of the problem.

# Each line of the dump is one cluster of whitespace-separated labels.
cluster_lines <- readLines('dump.data.mci.I14')
clusters <- lapply(cluster_lines, function(x) scan(text = x, what = character(), quiet = TRUE))
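The same step can be sketched in Python for anyone working outside R. The helper names read_clusters and label_to_cluster are hypothetical, and the small demo file below merely stands in for real mcxdump output. Like the scan call above, it splits each line on whitespace, so it assumes labels contain no spaces.

```python
import os
import tempfile


def read_clusters(path):
    """Return a list of clusters, each a list of labels, one cluster per line."""
    clusters = []
    with open(path) as handle:
        for line in handle:
            labels = line.split()
            if labels:  # ignore blank lines
                clusters.append(labels)
    return clusters


def label_to_cluster(clusters):
    """Map each label to the index of the cluster that contains it."""
    return {label: index for index, cluster in enumerate(clusters) for label in cluster}


# Hypothetical two-cluster file standing in for a real label dump.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "dump.demo")
    with open(demo_path, "w") as handle:
        handle.write("a101\ta202\ta303\na404\ta505\n")
    clusters = read_clusters(demo_path)

membership = label_to_cluster(clusters)
```

The inverted lookup is handy when cluster assignments need to be joined back onto the original citation records.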

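Checking that your input graph is symmetric. Use the following syntax, which loads the matrix, adds its transpose multiplied by -1, and writes the result to a file named check. That file should have the same dimensions in the header as the source file and, for a symmetric graph, should contain no non-zero entries.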

mcxi /matrix.mci lm tp -1 mul add /check wm
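A comparable check can be run on the labelled edge list before it is converted to matrix format. The sketch below is not what mcxi does internally; it is a rough label-level equivalent, and the function name asymmetric_edges and the sample edges are assumptions made for illustration.

```python
def asymmetric_edges(edges):
    """Return (source, target) pairs whose reverse edge is missing or weighted differently."""
    weights = {}
    for source, target, weight in edges:
        weights[(source, target)] = weight
    return sorted(
        (source, target)
        for (source, target), weight in weights.items()
        if weights.get((target, source)) != weight
    )


# Hypothetical edge lists: one symmetric, one not.
symmetric = [("a1", "a2", 1.0), ("a2", "a1", 1.0)]
lopsided = [("a1", "a2", 1.0), ("a2", "a1", 0.5), ("a2", "a3", 2.0)]

problems_symmetric = asymmetric_edges(symmetric)
problems_lopsided = asymmetric_edges(lopsided)
```

An empty result plays the same role as an all-zero check matrix: every edge has a mirror image with the same weight.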

Minor note: I was using scp (bigint) as labels and the R version didn't like it, so I prefixed each scp with the letter 'a' to ensure that it was treated as a string, and just carried the practice over into the C version.
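The prefixing trick is easy to reproduce and to undo. A minimal sketch, assuming integer identifiers; the function names and the sample values are hypothetical and are not real scp identifiers from our data.

```python
def prefix_label(value, prefix="a"):
    """Turn a numeric identifier into a string label by prepending a letter."""
    return f"{prefix}{value}"


def strip_prefix(label, prefix="a"):
    """Recover the integer identifier from a prefixed label."""
    return int(label[len(prefix):])


# Hypothetical identifiers, made up for illustration.
scps = [84990000123, 33750000456]
labels = [prefix_label(scp) for scp in scps]
restored = [strip_prefix(label) for label in labels]
```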

For everything else RTFM, which is a little cryptic; even the author of mcl acknowledges this:

A remark on the sloppy naming conventions used for mcl and its sibling utilities may be in order here. The prefix mcx is used for generic matrix functionality, the prefix clm is used for generic cluster functionality. The utility mcx is a general purpose interpreter for manipulating matrices (and graphs, sets, and clusterings). The set of all mcl siblings (cf. mclfamily) is loosely referred to as the mcl family, which makes use of the mcl libraries (rather than the mcx libraries). The full truth is even more horrible, as the mcl/mcx prefix conventions used in the C source code follow still other rules.
