Home - MASantos/Partanalyzer GitHub Wiki
Welcome to the Partanalyzer wiki!
The MAN file contains the full output of the --help option. It explains all options and commands and provides a few examples.
Full help (as of version alpha 1.0.)
partanalyzer (Partition Analyzer)
Usage:
partanalyzer [-h|--help] (Use --help for more details)
partanalyzer --version
partanalyzer [OPTIONS] COMMAND ARGS
OPTIONS
--debug
--verbose
-q , --quiet
-z, --pid-normalization [s|p|r|l]
-t , --format partition_format
--tab tab_file
--DIST_SUBSPROJECT
--beta beta_value
--mu mu_value
COMMANDS
Defining the algebra of partitions
(-i|-u) partition1 partition2 [partition1_offset (=2) ] [partition2_offset (=partition1_offset) ]
(-I|-U) [-ofs partition_offset (=2)] [-f partition_list | partition1 [ partition2 [ ... ]]]
For analyzing partitions
(-v|-e|-p) partition1 partition2 [partition1_offset (=2) ] [partition2_offset (=partition1_offset) ]
-c matrix-of-values partition1 [threshold (=-1.0)] [partition_offset (=2)]
-d matrix-of-values partition1 [partition_offset (=2)]
(-Q|-R|-T) [-ext extensivity] [-ofs partition_offset (=2)] [-f partition_list | partition1 [ partition2 [ ... ]]]
(-V|-E|-P) [-ofs partition_offset (=2)] [-f partition_list | partition1 [ partition2 [ ... ]]]
--pstat-sym [-ofs partition_offset (=2)] [-f partition_list | partition1 [ partition2 [ ... ]]]
(--ipot|--cpot|--jpot|--v-measure-h) entropy [-ext extensivity] [-ofs partition_offset (=2)] [-f partition_list | partition1 [ partition2 [ ... ]]]
(--mpot|--cmpot|--SSSA) entropy [-ext extensivity] [-ofs partition_offset (=2)] [-f partition_list | partition1 [ partition2 [ ... ]]]
(-C|-H) [-cons] [-ofs partition_offset (=2)] [-f partition_list | [partition1 [ partition2 [ ... ]]]
(-A|-S|--Info) [-ofs partition_offset (=2)] [-f partition_list | [partition1 [ partition2 [ ... ]]]
For creating partitions ( Clustering )
--cluster graph [-below|-above] [ threshold ]
--cluster-robust graph [-s #samples] [-n #neighbors] [-below|-above] [-V|-E|-R|-T|-J] [-ext extensivity]
--cluster-robust-self-consistently graph [-s #samples] [-n #neighbors] [-below|-above] [-V|-E|-R|-T|-J] [-ext extensivity]
For editing partitions
--part-extract-elements elements_file [-tab mcl_tab_file] partition [partition1_offset (=2) ]
--part-sort partition
--part-sort-rename partition [prefix]
--part-swap-names partition (requires use of --tab)
For converting between different partition formats
--toMCL [-tab mcl_tab_file] partition [partition1_offset (=2) ]
--toFREE partition [partition1_offset (=2) ]
--MCLtoPART [-tab mcl_tab_file] partition [partition1_offset (=2) ]
--MSAtoPART msa_file
For dealing with (fasta) sequence files
--seq-noclone-sequences fasta_sequence_file [reference_sequence_file]
For analyzing Multiple Sequence Alignments
--msa-seqid-stat [--positions positions_file] multiple_seq_alignment.fasta [multiple_seq_alignment.fasta2]
--msa-seqid-avg [-thr threshold=50] multiple_seq_alignment.fasta [multiple_seq_alignment.fasta2]
--msa-extract-positions positions_file multiple_seq_alignment.fasta
--msa-extract-sequences sequences_file multiple_seq_alignment.fasta
--msa-drop-sequences sequences_file multiple_seq_alignment.fasta
--msa-extract-sequences-by-id msa_file1 msa_file2 [minId maxId]
--msa-drop-sequences-by-id sequences_file msa_file [minId maxId]
--msa-extract-sequences-by-topid msa_file1 msa_file2 [count]
--msa-drop-sequences-by-topid msa_file1 msa_file [count]
--msa-map-partition partition multiple_seq_alignment.fasta [MSAformat]
--msa-print [-sort|-nosort] multiple_seq_alignment.fasta
--msa-redundant [-nsam nsam] [-nseq nseq] [-seed seed] multiple_seq_alignment.fasta
For dealing with -interaction- matrices (aka, undirected graph)
--edge-dist matrix-of-values [partition [partition_offset (=2)] ]
-m matrix-of-values1 matrix-of-values2
-r matrix-of-values1 matrix-of-values2 partition [partition_offset (=2)]
-l matrix-of-values1 matrix-of-values2
--prune-edges-above float graphfile
--prune-edges-below float graphfile
--print-matrix matrix-of-values
--graph-nodes matrix-of-values
partanalyzer aims to be a general program for analyzing (sets of) partitions. Here a partition is defined as in set theory (see http://en.wikipedia.org/wiki/Partition_of_a_set). It also allows rudimentary editing, as well as generation, of partitions.
Whenever many input files are expected, one can either list them as command line arguments, or list them in a file and use option -f to specify that file.
For calculating distances between partitions with different number of elements, use option --DIST_SUBSPROJECT right before any *stat command. Works only with a *stat distance command, i.e., not purity scores.
OPTIONS:
-z , --pid-normalization norm
Determines the normalization used for calculating percent sequence identities. The possible string values for norm are:
s , shorter-sequence
p , aligned-positions
r , aligned-residues
l , average-length
Default normalization is the average sequence length, l.
COMMANDS: Defining algebra of partitions
-i , --intersection , -m , --meet
Calculate the intersection of partition1 & partition2
-u , --union , -j , --join
Calculate the union of partition1 & partition2. This can be
seen as the algebraic optimal consensus partition covering
partition1 and partition2. Optimal means the most refined
partition that covers both.
-I , --Intersection , -M , --Meet
Calculate the intersection of all partitions provided
-U , --Union , -J , --Join
Calculate the union of all partitions provided. This can be
seen as the algebraic optimal consensus partition covering
each and every input partition. Here, optimal means the most
refined partition that covers any of the input partitions.
Remark: This algebraic consensus is very sensitive to outlier
partitions.
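The two algebraic operations above can be illustrated with a short Python sketch. It is not partanalyzer's implementation: the representation of a partition as a list of sets and the function names `meet` and `join` are assumptions made for illustration only.

```python
from itertools import chain


def meet(p1, p2):
    """Intersection: every non-empty pairwise intersection of clusters."""
    return [a & b for a in p1 for b in p2 if a & b]


def join(p1, p2):
    """Union: the most refined partition that covers both inputs.

    Clusters that share an element are merged transitively (union-find).
    """
    parent = {x: x for x in chain.from_iterable(chain(p1, p2))}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for cluster in chain(p1, p2):
        items = list(cluster)
        for other in items[1:]:
            parent[find(other)] = find(items[0])

    merged = {}
    for x in parent:
        merged.setdefault(find(x), set()).add(x)
    return list(merged.values())


p1 = [{"a", "b"}, {"c", "d"}, {"e"}]
p2 = [{"a"}, {"b", "c"}, {"d"}, {"e"}]
print(sorted(map(sorted, meet(p1, p2))))
print(sorted(map(sorted, join(p1, p2))))
```

In this toy case a single cluster {b, c} in the second partition is enough to fuse {a, b} and {c, d} in the join, which shows why the algebraic consensus is sensitive to outlier partitions.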
For analyzing partitions
-v , --vi-distance
Calculate VI distances between partition1 & partition2
-e , --edit-distance
Calculate the edit score distance between partition1 and
partition2
-p , --purity-scores
Calculates the purity scores of partition2 (the target)
against partition1 (the reference).
-c , --check-consistency-of-partition , --ccop [-tab tab_file]
Check cluster consistency according to the given matrix. If
partition and graph matrix label items differently, use the
option -tab to provide a tab file specifying the conversion.
(See below for the syntax of the matrix and tab file)
-d , --intra-inter-edge-dist
Calculate intra and inter cluster distribution of weights
according to the given matrix
-Q , --qstat [-ext extensivity] [-ref]
Calculates Tarantola distance for each pair of partitions.
For that it uses the Jeffrey's Qnorm based on Shannon Entropy.
With option -ref, the first partition is taken as a reference
and it calculates the distances of all against that one.
Default extensivity coefficient is 2.
-R , --rstat [-ext extensivity] [-ref]
Calculates Renyi distances for each pair of partitions.
With option -ref, the first partition is taken as a reference
and it calculates the distances of all against that one.
Default extensivity coefficient is 2.
-T , --tstat [-ext extensivity] [-ref]
Calculates Tsallis distances for each pair of partitions.
With option -ref, the first partition is taken as a reference
and it calculates the distances of all against that one.
Default extensivity coefficient is 2.
-B , --bstat [-ref]
Calculates the Boltzmann distance for each pair of partitions.
With option -ref, the first partition is taken as a reference
and it calculates the distances of all against that one.
-V , --vstat [-ref]
Calculates the VI distance for each pair of partitions.
With option -ref, the first partition is taken as a reference
and it calculates the distances of all against that one.
-E , --estat [-ref]
Calculates the Edit Score distance for each pair of partitions
With option -ref, the first partition is taken as a reference
and it calculates the distances of all against that one.
-P , --pstat [-ref | -target]
Calculates the purity scores (strict and lax) for each pair
of partitions. With option -ref, it calculates the purity
scores of all against the first one, which is taken as a
reference. With option -target, the first one is considered
the target and it calculates the scores of that one against
all others taken as reference.
--pstat-sym , --pstat-symmetric
Calculates arithmetic averages of purity strict and purity lax
scores for each pair of partitions.
-n , --ipot entropy [(-e|-ext) extensivity] [-ref]
Calculates (information theoretic) potential (entropy) of each
partition. The possible values for entropy are (short|long):
v | s | vonneumann | shannon
b | boltzmann
c | e | cardinality
r | renyi
t | tsallis
q | tarantola/jeffrey/tjqn
Both, long and short option names are valid.
Default extensivity coefficient is 2.
For cardinality potential, this coefficient will be used as a
gauge determining the card(1)=1+extensivity.
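As a rough illustration of three of the potentials listed above, the sketch below computes the Shannon, Renyi and Tsallis entropies from the cluster sizes of a partition. It assumes that the extensivity coefficient plays the role of the usual order parameter q of the Renyi and Tsallis entropies and that natural logarithms are used; neither is confirmed by the help text, and the function names are hypothetical.

```python
import math


def shannon(sizes):
    """Shannon entropy of the cluster-size distribution (natural log)."""
    n = sum(sizes)
    return -sum(s / n * math.log(s / n) for s in sizes)


def renyi(sizes, q=2.0):
    """Renyi entropy of order q (assumed to be the extensivity); q != 1."""
    n = sum(sizes)
    return math.log(sum((s / n) ** q for s in sizes)) / (1.0 - q)


def tsallis(sizes, q=2.0):
    """Tsallis entropy of order q (assumed to be the extensivity); q != 1."""
    n = sum(sizes)
    return (1.0 - sum((s / n) ** q for s in sizes)) / (q - 1.0)


sizes = [2, 2]  # a partition of 4 items into two clusters of size 2
print(shannon(sizes), renyi(sizes), tsallis(sizes))
```

For q approaching 1 both the Renyi and the Tsallis expressions tend to the Shannon entropy, which is why q = 1 is excluded here.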
--cpot, --conditional-potential entropy [-ext extensivity] [-ref]
Calculates conditional entropy for each pair of partitions.
The possible values for entropy and extensivity are the same
as for option --ipot.
--jpot, --joint-potential entropy [-ext extensivity] [-ref]
Calculates joint entropy for each pair of partitions.
The possible values for entropy and extensivity are the same
as for option --ipot.
--mpot, --mutual-potential entropy [-ext extensivity] [-ref]
--SA, --subadditivity entropy [-ext extensivity]
Calculates the mutual potential (mutual information)
for each pair of partitions. If positive, subadditivity holds.
The possible values for entropy and extensivity are the same
as for option --ipot.
--cmpot, --conditional-mutual-potential entropy [-ext extensivity]
--SSA, --strong-subadditivity entropy [-ext extensivity] [-ref]
Calculates the conditional mutual potential (conditional
mutual information) for each pair of partitions. If positive,
for all three partitions, then strong subadditivity holds.
The possible values for entropy and extensivity are the same
as for option --ipot.
--SSSA, --soft-strong-subadditivity entropy [-ext extensivity] [-ref]
Calculates a softer version of the strong subadditivity
condition for all three partitions. If positive, then the
potential acts as a norm and defines a metric, which thus
satisfies the triangular inequality.
The possible values for entropy and extensivity are the same
as for option --ipot.
--v-measure-h , --v-measure-harmonic entropy [-ext extensivity] [-ref]
Calculates the Vmeasure between each pair of partitions. This
measure is the one defined by Rosenberg, A. and Hirschberg, J.
in http://acl.ldc.upenn.edu/D/D07/D07-1043.pdf. Use global
option --beta for specifying relative weight of homogeneity
versus completeness. Default is equal weight, i.e., beta=1,
and thus the average between both is strictly a harmonic one.
The possible values for entropy and extensivity are the same
as for option --ipot.
--v-measure-a , --v-measure-arithmetic entropy [-ext extensivity] [-ref]
Analogous to --v-measure-h but using arithmetic mean between
homogeneity and completeness.
--v-measure-g , --v-measure-geometric entropy [-ext extensivity] [-ref]
Analogous to --v-measure-h but using geometric mean between
homogeneity and completeness.
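The three V-measure variants can be sketched as follows. This is only an illustration under stated assumptions: it uses the Shannon entropy exclusively (partanalyzer accepts other entropies), it follows the weighted harmonic mean of the cited paper, V_beta = (1+beta)*h*c / (beta*h + c), and it uses unweighted arithmetic and geometric means because the help text does not say how beta enters those two. Partitions are lists of sets and all names are hypothetical.

```python
import math


def entropy(counts, n):
    """Shannon entropy (natural log) of counts that sum to n."""
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def v_measures(reference, target, beta=1.0):
    """Homogeneity/completeness averages of target against reference."""
    n = sum(len(c) for c in reference)
    h_ref = entropy([len(c) for c in reference], n)
    h_tgt = entropy([len(k) for k in target], n)
    h_joint = entropy([len(c & k) for c in reference for k in target], n)
    # Conditional entropies: H(ref|tgt) = H(ref,tgt) - H(tgt), and vice versa.
    hom = 1.0 if h_ref == 0 else 1.0 - (h_joint - h_tgt) / h_ref
    com = 1.0 if h_tgt == 0 else 1.0 - (h_joint - h_ref) / h_tgt
    hom, com = max(hom, 0.0), max(com, 0.0)  # guard against rounding noise
    denom = beta * hom + com
    return {
        "harmonic": (1.0 + beta) * hom * com / denom if denom > 0 else 0.0,
        "arithmetic": (hom + com) / 2.0,
        "geometric": math.sqrt(hom * com),
    }


reference = [{"a", "b"}, {"c", "d"}]
print(v_measures(reference, [{"a", "b"}, {"c", "d"}]))
print(v_measures(reference, [{"a", "b", "c", "d"}]))
```

Merging everything into one cluster keeps completeness at 1 but drives homogeneity to 0, so the harmonic and geometric variants vanish while the arithmetic one stays at 0.5.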
-C , --cluster-stat [-ofs ofs] [-norm gaug] [-cons|-consensus]
For each item, determines the most frequent cluster where
it appears among all the clusters of all the given partitions.
It also prints its size and observed frequency (both, raw
count and %).
Option -ofs, see below, allows specifying a partition offset.
Option -norm gaug gauges the normalization used for
determining the %frequencies. By default these are
calculated by counting how many times the mode cluster
is found at each of the different partitions and then
dividing by the number of partitions N. With this
option, that count gets divided by N+gaug, where gaug
can be negative or positive.
Option -cons or -consensus will print the consensus partition
-A , --adjacency-stat , {--adjstat}
Determines the average adjacency matrix from the provided
partitions. The adjacency matrix of a partition is the graph
where edges (0 or 1 ) represent two elements belonging to the
same subfamily. The average adjacency matrix has edges with
continuous values in [0,1]. The output consists of a matrix of
values and a gray-scale image of it in PGM format.
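A minimal sketch of the average adjacency matrix described above, assuming all partitions cover the same set of items and that an item counts as adjacent to itself (so the diagonal is 1); the help text does not state either point. Partitions are lists of sets and `average_adjacency` is a hypothetical name. The PGM output is not reproduced.

```python
def average_adjacency(partitions):
    """Return (items, matrix) with matrix[i][j] = fraction of partitions
    in which items[i] and items[j] share a cluster."""
    items = sorted(x for cluster in partitions[0] for x in cluster)
    index = {x: i for i, x in enumerate(items)}
    counts = [[0] * len(items) for _ in items]
    for partition in partitions:
        for cluster in partition:
            for a in cluster:
                for b in cluster:
                    counts[index[a]][index[b]] += 1
    return items, [[c / len(partitions) for c in row] for row in counts]


pa = [{"a", "b"}, {"c"}]
pb = [{"a"}, {"b", "c"}]
items, avg = average_adjacency([pa, pb])
for name, row in zip(items, avg):
    print(name, row)
```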
-S , --split-merge-analysis , {--splitstat}
(Split-Merge plot)
Determines the overlap of each cluster to those of the
reference partition (the first). Possible values for
the overlap are:
-over fraction elements in common relative to the target cluster.
-cos cosine normalized similarity
It outputs:
-Confusion matrix (in % of the target clusters) taking
the first partition as reference and the second as target.
-number of overlaps for each target cluster
-Split-Merge image showing the CT. In addition it shows two
reference color bars: a bottom color bar representing the
perfect split transformations (black), the merge-only
(white) ones and those cases in between (different grey
levels); a right-most column shows whether these are perfect
matches (black) or not (white).
--Info , {--isPart , --isaPart , --is-partition} [-ofs ofs]
For each partition checks whether it is a sound partition
or not, i.e., whether all of its clusters are pair-wise
disjoint. With option -q, only error message will be printed
in case partition is not sound, otherwise it'll keep silent.
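The soundness check amounts to verifying that no item appears in more than one cluster. A small sketch, assuming a partition is given as a list of item lists; the function name is hypothetical and the real program's messages are not reproduced:

```python
def overlapping_items(partition):
    """Return the set of items that occur in more than one cluster."""
    seen, repeated = set(), set()
    for cluster in partition:
        for item in set(cluster):  # ignore repeats inside a single cluster
            if item in seen:
                repeated.add(item)
            seen.add(item)
    return repeated


sound = [["a", "b"], ["c"]]
unsound = [["a", "b"], ["b", "c"]]
print(overlapping_items(sound))    # empty set: clusters are pair-wise disjoint
print(overlapping_items(unsound))  # items shared by two clusters
```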
-H, --hasse-diagram
prints the local Hasse Diagram (graph) spanned by the
given partitions.
For creating partitions ( clustering )
--cluster graph [ [-below|-above] threshold ]
Defines clusters from the transitivity relation given by the graph's edges. If a threshold is provided, it first prunes the edges below the threshold. Example:
partanalyzer --cluster gf -below 0.7
partanalyzer --cluster gf 0.7
Both cases will first prune the edges below 0.7 and then obtain the clusters generated that way. For pruning above, we must use the explicit form
partanalyzer --cluster gf -above 0.7
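Clustering by transitivity is equivalent to taking the connected components of the pruned graph. The sketch below illustrates the idea with a union-find structure. It assumes that edges exactly at the threshold are kept and that nodes left without edges become singleton clusters; the help text specifies neither, and the function name and argument layout are hypothetical.

```python
def cluster(edges, threshold=None, prune="below"):
    """Connected components of (node_a, node_b, weight) edges after pruning."""

    def keep(weight):
        if threshold is None:
            return True
        return weight >= threshold if prune == "below" else weight <= threshold

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, weight in edges:
        root_a, root_b = find(a), find(b)  # registers both nodes even if pruned
        if keep(weight):
            parent[root_a] = root_b

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


graph = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.3)]
print(cluster(graph, 0.7))            # prune edges below 0.7
print(cluster(graph, 0.7, "above"))   # prune edges above 0.7
```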
--cluster-robust graph [-s #samples] [-n #neighbors] [-below|-above] [-V|-E|-R|-T|-J] [-ext extensivity]
Gives the most robust clustering with respect to edge pruning.
This is defined as the partition showing the smallest average
variability _after_ the phase transition. The average
variability is calculated as the average distance against those
partitions at its #neighbors nearest pruning thresholds
(#neighbors above; #neighbors below).
It repeatedly clusters the graph starting with a pruning
threshold equal to the lowest edge and increasing it by a fixed
amount until reaching the highest edge value. The total
number of samples determines each step increase of the threshold.
We may be pruning the edges above the threshold (as if the
latter were a temperature T) or below the threshold (1/T).
Defaults: #samples=10 ; Pruning=below ; Metric=shannon (-V)
#neighbors=2.
--RDC
--cluster-robust-self-consistently graph [-s #samples] [-n #neighbors] [-below|-above] [-V|-E|-R|-T|-J] [-ext extensivity]
As --cluster-robust, but it determines self-consistently the
largest possible number of samples. The latter is defined as
the largest for which each pruning interval removes at least
one edge. The method used is bisection and the provided
#samples is used as the seed for the search. All defaults as
for --cluster-robust.
For editing partitions
--part-extract-elements {--extract-elements} elements_file
elements_file lists the names of the elements to cull from the given partition.
--part-sort
Sorts the clusters by size, the larger on top. Ties are
sorted alphabetically by their first item. Within each
cluster, items are sorted alphabetically.
--part-sort-rename partition [prefix]
As --part-sort, but also renames the clusters consecutively
as C1, C2, etc. If a prefix string is supplied, use that
instead of C.
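The ordering and renaming rules of --part-sort and --part-sort-rename can be sketched like this, assuming non-empty clusters given as lists of item names and plain string comparison for "alphabetically"; the function name is hypothetical.

```python
def sort_and_rename(partition, prefix="C"):
    """Sort items within clusters, clusters by size (largest first, ties by
    first item), and label them prefix1, prefix2, ..."""
    clusters = [sorted(cluster) for cluster in partition]
    clusters.sort(key=lambda c: (-len(c), c[0]))
    return [(f"{prefix}{i}", c) for i, c in enumerate(clusters, start=1)]


partition = [["z", "y"], ["q"], ["c", "a", "b"], ["n", "m"]]
for name, members in sort_and_rename(partition, prefix="Fam"):
    print(len(members), name, *members)
```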
--part-swap-names partition
--part-swap-labels partition
Swaps elements' names present in partition by their new
names as found in the provided tab file. An element's name in
the partition will be changed if and only if there is a translation for
it found in the tab file; otherwise it will be left as it is.
Thus, it is not mandatory to provide a translation for all
elements. Requires the use of --tab to specify a tab file
providing the mapping between new and old names. See general
options.
For converting between different partition formats
--toMCL [-tab mcl_tab_file] converts partition from PART format
to MCL's format. If additional tab file is provided, output
will contain the specific label index given in the tab file.
--toFREE converts partition from PART format to FREE format.
--MCLtoPART [-tab mcl_tab_file]
converts partition from MCL format to PART format.
If additional tab file is provided, output will contain
items' labels, instead of simply their MCL index number.
--MSAtoPART
converts a MSA file in FASTA format containing the
labeling of clusters into a partition in PART format. Cluster
labels are expected in a separate line before the actual
set of sequences, i.e.,
%Group_A
>Sequence_A1
...
or
==Group_A
>Sequence_A1
...
Escape characters indicating cluster labels can be mixed in
the same file, although it's not recommended.
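The conversion described for --MSAtoPART can be sketched as a small line scanner. This is an illustration, not the program's parser: it assumes the item name is the first word after `>`, that sequences appearing before any cluster label are skipped, and that PART fields are separated by single spaces. The function name is hypothetical.

```python
def msa_to_part(lines):
    """Turn a cluster-annotated FASTA MSA into PART-format lines."""
    clusters = {}  # cluster name -> list of sequence names, in file order
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("=="):
            current = line[2:].strip()
        elif line.startswith("%"):
            current = line[1:].strip()
        elif line.startswith(">") and current is not None:
            words = line[1:].split()
            if words:
                clusters.setdefault(current, []).append(words[0])
    return [f"{len(items)} {name} {' '.join(items)}"
            for name, items in clusters.items()]


msa = [
    "%Group_A",
    ">Sequence_A1 some description",
    "MKV-LA",
    ">Sequence_A2",
    "MKVALA",
    "==Group_B",
    ">Sequence_B1",
    "MRV-LG",
]
print("\n".join(msa_to_part(msa)))
```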
For dealing with (fasta) sequence files
--drop-clone-sequences
--msa-noclone-sequences
--seq-noclone-sequences sequence_file (fasta)
Given a (fasta) sequence file or a fasta MSA file, remove all duplicate sequences. Here duplicate means literally that, namely, exactly the same string of characters. Therefore, it is not the same as having a pid=100%, but more stringent. If a second sequence file is provided, also drop sequences that are clones of any sequence in the second file.
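Clone removal as described above is an exact string comparison. A minimal sketch, assuming sequences are already parsed into (name, sequence) pairs and that the first occurrence of each duplicated sequence is the one kept; the help text does not say which copy survives, and the function name is hypothetical.

```python
def drop_clones(records, reference=()):
    """Keep the first record of every distinct sequence string, skipping
    any sequence that also occurs in the optional reference records."""
    seen = {sequence for _, sequence in reference}
    unique = []
    for name, sequence in records:
        if sequence not in seen:
            seen.add(sequence)
            unique.append((name, sequence))
    return unique


records = [("s1", "MKV-LA"), ("s2", "MKVALA"), ("s3", "MKV-LA"), ("s4", "MRVALG")]
print(drop_clones(records))
print(drop_clones(records, reference=[("r1", "MRVALG")]))
```

Because gaps are part of the compared string, "MKV-LA" and "MKVALA" are not clones of each other here.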
For analyzing Multiple Sequence Alignments
--msa-seqid-stat
--msa-seqid-stat [--positions file]
Given a multiple sequence alignment in fasta format, it
prints all pair-wise sequence identities. By default, it
calculates identities over the full sequence length. The
second version allows specifying the (reduced) set of positions
we want to consider in comparing sequences. These should be
specified in a file, separated by spaces, tabs, new lines,
etc. The positions are understood as columns of the MSA.
If two MSA are provided, it prints the sequence Id of the
first set against the second.
--msa-seqid-avg [-thr threshold ]
Similar to option --msa-seqid-stat, but prints for each sequence
statistics of its pair-wise sequence identity to all other
sequences. This consists of average Seq.Id, standard
deviation, variance, minimum Seq.Id, maximum Seq.Id, number
of pairs with Seq.Id > threshold, fraction of pairs with Seq.
Id. > threshold and total number of pairs.
Option -thr allows to provide a specific threshold to use.
default value is 50%. Values are floating numbers
within [0,100].
If two MSA are provided, it prints the sequence Id of the
first set against the second.
--msa-extract-positions positions_file msa_file
From the given MSA, extract only columns specified in file
positions_file.
--msa-extract-sequences sequences_file msa_file
--msa-drop-sequences sequences_file msa_file
From the given MSA, extract only sequences specified in file
sequences_file. This file contains a list of sequences names
The second form drops those sequences instead.
If a positions file is given, sequence Id's are calculated
considering only those columns of the MSA.
--msa-extract-sequences-by-id msa_file1 msa_file2 [minId maxId]
--msa-drop-sequences-by-id sequences_file msa_file [minId maxId]
From MSA msa_file1, extract sequences with an ID above minId
and at most maxId against any sequence of MSA msa_file2.
The second form drops those sequences instead. Default
values are minId=30 and maxId=100, i.e., homologous sequences.
If a positions file is given, sequence Id's are calculated
considering only those columns of the MSA. In this case minId
and maxId are mandatory and must come before positions_file.
--msa-extract-sequences-by-topid msa_file1 msa_file2 [count]
--msa-drop-sequences-by-topid sequences_file msa_file [count]
From MSA msa_file1, extract at most count most similar sequences
(seq.ID) to any sequence of MSA msa_file2.
The second form drops those sequences instead.
If a positions file is given, sequence Id's are calculated
considering only those columns of the MSA. In this case count
is mandatory and must come before positions_file.
--msa-redundant [-nsam nsam] [-nseq nseq] [-seed seed]
Duplicates sequences chosen at random in the given multiple
sequence alignment. Without options, only one is chosen.
Option -nsam Generate nsam samples of MSAs with nseq
duplicated sequences. Each sample is written in its
own directory.
-nseq Specify the number of sequences to duplicate.
-seed Specify the seed of the random number generator
All options are expected to be integer values. The value of
the seed is written within .seed_used allowing for repeated
experiments.
--msa-map-partition
Given a Partition and the original MSA, output the MSA
with the cluster annotation format of the SDPpred server.
MSAformat allows to specify the format of the output alignment
Possible formats are: FASTA[23]*, SPEER[23]*, GDE[23]* and
GSIM[23]*. Example: FASTA prints cluster information as a
line heading the sequence label line starting with `%'; using
FASTA2 prints the same but only clusters with 2 or more
elements are printed (3 or more if format is FASTA3). Idem
for the additional formats. SPEER prints the MSA appending the
clusters' sizes as a last line; GDE is analogous to FASTA
but uses `==' instead of `%'. Finally, GSIM adds cluster name
as the last string of the fasta label separated from it by `|'
--msa-print , --print-msa [-sort|-nosort]
Prints the given multiple sequence alignment. Useful for
debugging. With -sort, sequences are sorted alphabetically;
-nosort leaves them sorted as in the input file (default).
For dealing with -interaction- matrices
--edge-dist
For each node, prints the distribution of edge weights.
Information printed is: Node, average edge weight, standard
deviation, standard error, skewness, minimum edge value,
max edge value and sample size (number of edges).
If a partition is provided, it also prints the cluster size
and cluster name each node belongs to.
-m , --merge-graphs
Merge two graph matrices into one that contains both values
for each pair of items, i.e., the resulting graph looks like
stringA stringB float1 float2
... ... ... ...
where float1, float2 are the matrix values of matrix1 and
matrix2, respectively. Both matrices are expected to contain
the same set of pair of items, i.e., the same set of edges.
-r , --merge-graphs-color
as option -m, but in addition includes the name of the
cluster each pair of items belongs to. If they belong to
different clusters the label is "x". The label is NAN
if any of the items does not belong to any of the clusters
defined in the given partition. The format of the output is
float1 float2 clustername_AB stringA stringB
... ... ... ... ...
-l , --cull-edges
Culls from matrix of values edges specified in second file.
--prune-edges-below float graphfile
--prune-edges-above float graphfile
Removes all edges below or above the given threshold.
--graph-nodes graphfile
--matrix-nodes graphfile
Prints the list of nodes of the given interaction matrix.
--graph-print [-c col] , --matrix-print [-c col]
Print the given interaction matrix. For debugging. Integer
col specifies the column containing the edge values. Default:
col=3.
General options
--verbose
For debugging.
-q , --quiet
quiet mode. Do not print out comment lines (that start with
`#').
-t , --format {--fmt} [pfmt=input_partition_file_format]
Specify the default format expected for the input partitions.
Possible format values are: PART,MCL and FREE. See below.
As MCL is automatically recognized from the file content,
this option will be useful in two cases:
(1) to distinguish between PART and FREE input partitions,
(2) in combination with --tab, if the output (specified with
--oformat or the different format conversion options) is
different from the (input) format specified with -t, the
tabfile will be used for translating the labels of the
elements; however, if the _specified_ input and output
formats coincide, the original labels will be preserved.
Example: p.mcl is in MCL format; p.lst, in PART format.
partanalyzer -t PART --tab tbf -V p.mcl p.lst
this gives the distance between the two by using the tab
file on p.mcl, but NOT on p.lst.
Default input format is PART.
--oformat [pfmt=input_partition_file_format]
Specify the output format when printing partitions.
Default output format is PART.
File formats:
matrix-of-values (an undirected graph):
stringA stringB float
stringA stringC float
... ... ...
stringZ stringV float
tab file:
integer1 string1
integer2 string2
... ...
partition:
PART: (default, i.e., partition_offset=2)
sizeA clusterA_name item_1 item_2 ... item_sizeA
sizeB clusterB_name item_1 item_2 ... item_sizeB
... ... ... ... ...
or (partition_offset=1):
sizeA item_1 item_2 ... item_sizeA
... ... ... ...
FREE: (not yet implemented) (partition_offset=0)
item_1 item_2 ... item_sizeA
... ... ...
MCL : MCL's own matrix format for partitions. See MCL manual.
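To make the role of partition_offset concrete, the sketch below reads PART-style lines for offsets 2, 1 and 0. It is an illustration only: it assumes whitespace-separated fields, skips lines starting with `#` on the assumption that comment lines may also appear in input, and checks the declared size against the item count. The function name is hypothetical.

```python
def parse_part(lines, offset=2):
    """Parse partition lines; offset = number of leading non-item columns.

    offset=2: size name items...   offset=1: size items...   offset=0: items...
    Returns a list of (cluster_name_or_None, [items]).
    """
    partition = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank or comment line
        items = fields[offset:]
        name = fields[1] if offset >= 2 else None
        if offset >= 1 and int(fields[0]) != len(items):
            raise ValueError(f"declared size does not match items: {line!r}")
        partition.append((name, items))
    return partition


text = ["# example", "3 famA s1 s2 s3", "1 famB s4"]
print(parse_part(text))
print(parse_part(["2 s1 s2", "1 s3"], offset=1))
```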
License
partanalyzer Version alpha 1.0.
Copyright (c) Miguel A. Santos, May. 2008-2010 .Build Feb 19 2010
Licensed under the GNU GPL version 3 or later.
(see http://www.gnu.org/copyleft/gpl.html )
Examples: For the latest options, check the help from the program: ./partanalyze -h
Check the consistency of a given partition test.subfam.lst based on a matrix of interactions given by test-blast_pairwise_id, i.e., how large the intra-cluster values are compared to the inter-cluster ones.
./partanalyze -c test-blast_pairwise_id test.subfam.lst
or
./partanalyze --check-consistency-of-partition test-blast_pairwise_id test.subfam.lst
which also accepts an abbreviated form as
./partanalyze --ccop test-blast_pairwise_id test.subfam.lst
Calculate the VI distance between two partitions and between each of them and their intersection.
Definition of VI distance: Given two partitions P1 and P2, with cluster size distributions {n_k} and {n_k'} respectively, where k and k' are indexes to each of their corresponding clusters, and such that Sum_k n_k = Sum_k' n_k' = N, the VI distance is defined as
VI (P1,P2) = Sum_k n_k/N * log( n_k/N) + Sum_k' n_k'/N * log( n_k'/N) - 2 * Sum_k Sum_k' n_kk'/N * log( n_kk'/N)
where n_kk' is the number of items common to cluster k of P1 and cluster k' of P2 (terms with n_kk' = 0 contribute zero). This definition satisfies the triangular inequality, i.e., for any three partitions P1, P2 and P, it holds that VI (P1, P) + VI (P,P2) >= VI (P1,P2)
./partanalyze --vi-distance test.subfam.lst test.subfam.lst2
or simply
./partanalyze -v test.subfam.lst test.subfam.lst2
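The VI definition above translates almost term by term into code. The sketch below assumes natural logarithms (the help text does not state the base), partitions given as lists of sets over the same N items, and a hypothetical function name.

```python
import math


def vi_distance(p1, p2):
    """VI(P1,P2) = sum p_k log p_k + sum p_k' log p_k' - 2 sum p_kk' log p_kk'."""
    n = sum(len(cluster) for cluster in p1)

    def plogp(count):
        # Empty intersections contribute zero.
        return count / n * math.log(count / n) if count else 0.0

    first = sum(plogp(len(a)) for a in p1)
    second = sum(plogp(len(b)) for b in p2)
    joint = sum(plogp(len(a & b)) for a in p1 for b in p2)
    return first + second - 2.0 * joint


p1 = [{"a", "b"}, {"c", "d"}]
p2 = [{"a", "b", "c", "d"}]
print(vi_distance(p1, p1))  # identical partitions
print(vi_distance(p1, p2))  # splitting one cluster into two equal halves
```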
Print the intersection of 2 partitions test.subfam.lst and test.subfam.lst2
./partanalyze -i test.subfam.lst test.subfam.lst2
Performs the intersection of P1 and P2 as induced by the intersection operation on the underlying set (the one that contains all elements). This gives a new partition I such that each cluster of I is obtained as a non-empty intersection of one cluster of P1 and one of P2 (all against all).
Print the purity scores for partition1 (target) against partition2 (reference)
./partanalyze --purity-scores test.subfam.lst test.subfam.lst2
or simply
./partanalyze -p test.subfam.lst test.subfam.lst2
It outputs the purity strict and purity lax values.
Purity strict of P1 against P2 := the number of non-singleton clusters of P1 that are exactly identical to one of P2, divided by the number of non-singleton clusters of P2 (the reference).
Purity lax of P1 against P2 := the number of non-singleton clusters of P1 that are subsets of a cluster of P2, divided by the number of non-singleton clusters of P1 (the target).
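The two purity definitions can be written down directly. In this sketch partitions are lists of sets, the function name is hypothetical, and returning 0.0 when a denominator is zero is an assumption; the help text does not cover that case.

```python
def purity_scores(target, reference):
    """Return (purity_strict, purity_lax) of target against reference."""
    tgt = [frozenset(c) for c in target if len(c) > 1]     # non-singletons of P1
    ref = [frozenset(c) for c in reference if len(c) > 1]  # non-singletons of P2
    identical = sum(1 for c in tgt if c in ref)
    contained = sum(1 for c in tgt if any(c <= r for r in ref))
    strict = identical / len(ref) if ref else 0.0
    lax = contained / len(tgt) if tgt else 0.0
    return strict, lax


target = [{"a", "b"}, {"c", "d"}, {"e"}]
reference = [{"a", "b"}, {"c", "d", "e"}]
print(purity_scores(target, reference))
```

Here only {a, b} matches a reference cluster exactly, while both non-singleton target clusters are subsets of some reference cluster, so the lax score is higher than the strict one.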
For debugging: print the interaction matrix read by the program
./partanalyze --print-matrix test-blast_pairwise_id