Partanalyzer Help page - MASantos/Partanalyzer GitHub Wiki
Full help (as of version alpha 1.0.)

partanalyzer (Partition Analyzer)

Usage:
  partanalyzer [-h|--help]    (Use --help for more details)
  partanalyzer --version
  partanalyzer [OPTIONS] COMMAND ARGS

OPTIONS
  --debug
  --verbose
  -q, --quiet
  -z, --pid-normalization [s|p|r|l]
  -t, --format partition_format
  --tab tab_file
  --DIST_SUBSPROJECT
  --beta beta_value
  --mu mu_value
COMMANDS
Defining the algebra of partitions
  (-i|-u) partition1 partition2 [partition1_offset (=2)] [partition2_offset (=partition1_offset)]
  (-I|-U) [-ofs partition_offset (=2)] [-f partition_list | partition1 [partition2 [...]]]

For analyzing partitions
  (-v|-e|-p) partition1 partition2 [partition1_offset (=2)] [partition2_offset (=partition1_offset)]
  -c matrix-of-values partition1 [threshold (=-1.0)] [partition_offset (=2)]
  -d matrix-of-values partition1 [partition_offset (=2)]
  (-Q|-R|-T) [-ext extensivity] [-ofs partition_offset (=2)] [-f partition_list | partition1 [partition2 [...]]]
  (-V|-E|-P) [-ofs partition_offset (=2)] [-f partition_list | partition1 [partition2 [...]]]
  --pstat-sym [-ofs partition_offset (=2)] [-f partition_list | partition1 [partition2 [...]]]
  (--ipot|--cpot|--jpot|--v-measure-h) entropy [-ext extensivity] [-ofs partition_offset (=2)] [-f partition_list | partition1 [partition2 [...]]]
  (--mpot|--cmpot|--SSSA) entropy [-ext extensivity] [-ofs partition_offset (=2)] [-f partition_list | partition1 [partition2 [...]]]
  (-C|-H) [-cons] [-ofs partition_offset (=2)] [-f partition_list | partition1 [partition2 [...]]]
  (-A|-S|--Info) [-ofs partition_offset (=2)] [-f partition_list | partition1 [partition2 [...]]]
For creating partitions (Clustering)
  --cluster graph [-below|-above] [threshold]
  --cluster-robust graph [-s #samples] [-n #neighbors] [-below|-above] [-V|-E|-R|-T|-J] [-ext extensivity]
  --cluster-robust-self-consistently graph [-s #samples] [-n #neighbors] [-below|-above] [-V|-E|-R|-T|-J] [-ext extensivity]

For editing partitions
  --part-extract-elements elements_file [-tab mcl_tab_file] partition [partition1_offset (=2)]
  --part-sort partition
  --part-sort-rename partition [prefix]
  --part-swap-names partition    (requires use of --tab)

For converting between different partition formats
  --toMCL [-tab mcl_tab_file] partition [partition1_offset (=2)]
  --toFREE partition [partition1_offset (=2)]
  --MCLtoPART [-tab mcl_tab_file] partition [partition1_offset (=2)]
  --MSAtoPART msa_file

For dealing with (fasta) sequence files
  --seq-noclone-sequences fasta_sequence_file [reference_sequence_file]
For analyzing Multiple Sequence Alignments
  --msa-seqid-stat [--positions positions_file] multiple_seq_alignment.fasta [multiple_seq_alignment.fasta2]
  --msa-seqid-avg [-thr threshold=50] multiple_seq_alignment.fasta [multiple_seq_alignment.fasta2]
  --msa-extract-positions positions_file multiple_seq_alignment.fasta
  --msa-extract-sequences sequences_file multiple_seq_alignment.fasta
  --msa-drop-sequences sequences_file multiple_seq_alignment.fasta
  --msa-extract-sequences-by-id msa_file1 msa_file2 [minId maxId]
  --msa-drop-sequences-by-id sequences_file msa_file [minId maxId]
  --msa-extract-sequences-by-topid msa_file1 msa_file2 [count]
  --msa-drop-sequences-by-topid msa_file1 msa_file [count]
  --msa-map-partition partition multiple_seq_alignment.fasta [MSAformat]
  --msa-print [-sort|-nosort] multiple_seq_alignment.fasta
  --msa-redundant [-nsam nsam] [-nseq nseq] [-seed seed] multiple_seq_alignment.fasta

For dealing with -interaction- matrices (aka, undirected graph)
  --edge-dist matrix-of-values [partition [partition_offset (=2)]]
  -m matrix-of-values1 matrix-of-values2
  -r matrix-of-values1 matrix-of-values2 partition [partition_offset (=2)]
  -l matrix-of-values1 matrix-of-values2
  --prune-edges-above float graphfile
  --prune-edges-below float graphfile
  --print-matrix matrix-of-values
  --graph-nodes matrix-of-values
partanalyzer aims to be a general program for analyzing (sets of) partitions. Here a partition is defined as in mathematical set theory (see http://en.wikipedia.org/wiki/Partition_of_a_set). It also supports rudimentary editing, as well as generation, of partitions.
Whenever many input files are expected, one can either list them as command line arguments, or list them in a file and use option -f to specify that file.
For calculating distances between partitions with different number of elements, use option --DIST_SUBSPROJECT right before any *stat command. Works only with a *stat distance command, i.e., not purity scores.
OPTIONS:
  -z , --pid-normalization norm
      Determines the normalization used for calculating percent sequence identities. The possible string values for norm are:
        s , shorter-sequence
        p , aligned-positions
        r , aligned-residues
        l , average-length
      Default normalization is the average sequence length, l.
COMMANDS: Defining algebra of partitions
-i , --intersection , -m , --meet Calculate the intersection of partition1 & partition2
-u , --union , -j , --join Calculate the union of partition1 & partition2. This can be seen as the algebraic optimal consensus partition covering partition1 and partition2. Optimal means the most refined partition that covers both.
-I , --Intersection , -M , --Meet Calculate the intersection of all partitions provided
-U , --Union , -J , --Join Calculate the union of all partitions provided. This can be seen as the algebraic optimal consensus partition covering each and every input partition. Here, optimal means the most refined partition that covers any of the input partitions. Remark: This algebraic consensus is very sensitive to outlier partitions.
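As a rough illustration of the union (join) described above, here is a minimal Python sketch. It is not taken from partanalyzer's source: the function name `join` and the representation of a partition as a list of sets are assumptions made for this example. It merges any two items that share a cluster in at least one input partition, which yields the most refined partition covering all inputs.

```python
def join(partitions):
    """Most refined partition that covers every input partition (union-find sketch)."""
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for partition in partitions:
        for cluster in partition:
            items = list(cluster)
            for item in items:
                find(item)
            # Link every item of the cluster to the first one.
            for other in items[1:]:
                root_a, root_b = find(items[0]), find(other)
                if root_a != root_b:
                    parent[root_b] = root_a

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(group) for group in groups.values())


p1 = [{"a", "b"}, {"c"}, {"d"}]
p2 = [{"a"}, {"b", "c"}, {"d"}]
print(join([p1, p2]))
```

The sketch also shows why the remark about outliers holds: adding a single partition that places all items in one cluster collapses the join into that one cluster.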
For analyzing partitions
-v , --vi-distance Calculate VI distances between partition1 & partition2
-e , --edit-distance Calculate the edit score distance between partition1 and partition2
-p , --purity-scores Calculates the purity scores of partition2 (the target) against partition1 (the reference).
-c , --check-consistency-of-partition , --ccop [-tab tab_file] Check cluster consistency according to the given matrix. If the partition and the graph matrix label items differently, use the option -tab to provide a tab file specifying the conversion. (See below for the syntax of the matrix and tab file.)
-d , --intra-inter-edge-dist Calculate intra and inter cluster distribution of weights according to the given matrix
-Q , --qstat [-ext extensivity] [-ref] Calculates Tarantola distance for each pair of partitions. For that it uses the Jeffrey's Qnorm based on Shannon Entropy. With option -ref, the first partition is taken as a reference and it calculates the distances of all against that one. Default extensivity coefficient is 2.
-R , --rstat [-ext extensivity] [-ref] Calculates Renyi distances for each pair of partitions. With option -ref, the first partition is taken as a reference and it calculates the distances of all against that one. Default extensivity coefficient is 2.
-T , --tstat [-ext extensivity] [-ref] Calculates Tsallis distances for each pair of partitions. With option -ref, the first partition is taken as a reference and it calculates the distances of all against that one. Default extensivity coefficient is 2.
-B , --bstat [-ref] Calculates the Boltzmann distance for each pair of partitions. With option -ref, the first partition is taken as a reference and it calculates the distances of all against that one.
-V , --vstat [-ref] Calculates the VI distance for each pair of partitions. With option -ref, the first partition is taken as a reference and it calculates the distances of all against that one.
-E , --estat [-ref] Calculates the Edit Score distance for each pair of partitions. With option -ref, the first partition is taken as a reference and it calculates the distances of all against that one.
-P , --pstat [-ref | -target] Calculates the purity scores (strict and lax) for each pair of partitions. With option -ref, it calculates the purity scores of all against the first one, which is taken as a reference. With option -target, the first one is considered the target and it calculates the scores of that one against all others taken as reference.
--pstat-sym , --pstat-symmetric Calculates arithmetic averages of purity strict and purity lax scores for each pair of partitions.
-n , --ipot entropy [(-e|-ext) extensivity] [-ref] Calculates (information theoretic) potential (entropy) of each partition. The possible values for entropy are (short|long):
  v | s | vonneumann | shannon
  b | boltzmann
  c | e | cardinality
  r | renyi
  t | tsallis
  q | tarantola/jeffrey/tjqn
Both long and short option names are valid. Default extensivity coefficient is 2. For cardinality potential, this coefficient will be used as a gauge determining the card(1)=1+extensivity.
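For orientation, the textbook forms of three of these entropies can be sketched in Python as follows. This is an illustration under stated assumptions, not partanalyzer's implementation: it assumes each cluster is weighted by its relative size, that the extensivity coefficient plays the role of the order `q` in the Renyi and Tsallis formulas, and that natural logarithms are used. The function names are made up for this example.

```python
import math


def cluster_fractions(partition):
    """Relative cluster sizes n_k / N of a partition given as a list of sets."""
    total = sum(len(cluster) for cluster in partition)
    return [len(cluster) / total for cluster in partition]


def shannon(partition):
    """Shannon entropy: -sum_k p_k * log(p_k)."""
    return -sum(p * math.log(p) for p in cluster_fractions(partition))


def renyi(partition, q=2.0):
    """Renyi entropy of order q: log(sum_k p_k^q) / (1 - q); assumed q = extensivity."""
    if q == 1.0:
        return shannon(partition)  # limiting case
    return math.log(sum(p ** q for p in cluster_fractions(partition))) / (1.0 - q)


def tsallis(partition, q=2.0):
    """Tsallis entropy of order q: (1 - sum_k p_k^q) / (q - 1); assumed q = extensivity."""
    if q == 1.0:
        return shannon(partition)  # limiting case
    return (1.0 - sum(p ** q for p in cluster_fractions(partition))) / (q - 1.0)


two_halves = [{"a", "b"}, {"c", "d"}]
print(shannon(two_halves), renyi(two_halves), tsallis(two_halves))
```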
--cpot, --conditional-potential entropy [-ext extensivity] [-ref] Calculates conditional entropy for each pair of partitions. The possible values for entropy and extensivity are the same as for option --ipot.
--jpot, --joint-potential entropy [-ext extensivity] [-ref] Calculates joint entropy for each pair of partitions. The possible values for entropy and extensivity are the same as for option --ipot.
--mpot, --mutual-potential entropy [-ext extensivity] [-ref] --SA, --subadditivity entropy [-ext extensivity] Calculates the mutual potential (mutual information) for each pair of partitions. If positive, subadditivity holds. The possible values for entropy and extensivity are the same as for option --ipot.
--cmpot, --conditional-mutual-potential entropy [-ext extensivity] --SSA, --strong-subadditivity entropy [-ext extensivity] [-ref] Calculates the conditional mutual potential (conditional mutual information) for each pair of partitions. If positive, for all three partitions, then strong subadditivity holds. The possible values for entropy and extensivity are the same as for option --ipot.
--SSSA, --soft-strong-subadditivity entropy [-ext extensivity] [-ref] Calculates a softer version of the strong subadditivity condition for all three partitions. If positive, then the potential acts as a norm and defines a metric, which thus satisfies the triangular inequality. The possible values for entropy and extensivity are the same as for option --ipot.
--v-measure-h , --v-measure-harmonic entropy [-ext extensivity] [-ref] Calculates the V-measure between each pair of partitions. This measure is as defined by Rosenberg, A. and Hirschberg, J. in http://acl.ldc.upenn.edu/D/D07/D07-1043.pdf. Use global option --beta for specifying the relative weight of homogeneity versus completeness. Default is equal weight, i.e., beta=1, and thus the average between both is strictly a harmonic one. The possible values for entropy and extensivity are the same as for option --ipot.
--v-measure-a , --v-measure-arithmetic entropy [-ext extensivity] [-ref] Analogous to --v-measure-h but using arithmetic mean between homogeneity and completeness.
--v-measure-g , --v-measure-geometric entropy [-ext extensivity] [-ref] Analogous to --v-measure-h but using geometric mean between homogeneity and completeness.
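The three V-measure variants can be sketched in Python as below, using Shannon entropy and the homogeneity and completeness definitions of Rosenberg and Hirschberg. This is only an illustration: the function names and the list-of-sets representation are assumptions, and since the help text does not say how beta enters the arithmetic and geometric variants, the sketch applies beta to the harmonic mean only.

```python
import math


def _entropy(counts, total):
    """Shannon entropy of a list of counts that sum to total (zeros are skipped)."""
    return -sum(c / total * math.log(c / total) for c in counts if c)


def v_measure(reference, target, beta=1.0, mean="harmonic"):
    """V-measure of target against reference; both are lists of sets over the same items."""
    total = sum(len(cluster) for cluster in reference)
    h_ref = _entropy([len(c) for c in reference], total)
    h_tgt = _entropy([len(c) for c in target], total)
    h_joint = _entropy([len(r & t) for r in reference for t in target], total)

    # Conditional entropies follow from H(X|Y) = H(X,Y) - H(Y).
    homogeneity = 1.0 if h_ref == 0 else 1.0 - (h_joint - h_tgt) / h_ref
    completeness = 1.0 if h_tgt == 0 else 1.0 - (h_joint - h_ref) / h_tgt

    if mean == "arithmetic":
        return (homogeneity + completeness) / 2.0
    if mean == "geometric":
        return math.sqrt(homogeneity * completeness)
    denominator = beta * homogeneity + completeness
    if denominator == 0:
        return 0.0
    return (1.0 + beta) * homogeneity * completeness / denominator


reference = [{"a", "b"}, {"c", "d"}]
singletons = [{"a"}, {"b"}, {"c"}, {"d"}]
print(v_measure(reference, singletons))
```

In the usage example the target splits every reference cluster into singletons, so homogeneity is 1 while completeness drops to 0.5.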
-C , --cluster-stat [-ofs ofs] [-norm gaug] [-cons|-consensus] For each item, determines the most frequent cluster where it appears among all the clusters of all the given partitions. It also prints its size and observed frequency (both raw count and %). Option -ofs (see below) allows specifying a partition offset. Option -norm gaug gauges the normalization used for determining the % frequencies. By default these are calculated by counting how many times the mode cluster is found in each of the different partitions and then dividing by the number of partitions N. With this option, that count gets divided by N+gaug, where gaug can be negative or positive. Option -cons or -consensus will print the consensus partition.
-A , --adjacency-stat , {--adjstat} Determines the average adjacency matrix from the provided partitions. The adjacency matrix of a partition is the graph where edges (0 or 1) represent two elements belonging to the same subfamily. The average adjacency matrix has edges with continuous values in [0,1]. The output consists of a matrix of values and a gray-scale image of it in PGM format.
-S , --split-merge-analysis , {--splitstat} (Split-Merge plot) Determines the overlap of each cluster to those of the reference partition (the first). Possible values for the overlap are: -over fraction of elements in common relative to the target cluster. -cos cosine normalized similarity
It outputs: -Confusion matrix (in % of the target clusters) taking the first partition as reference and the second as target. -number of overlaps for each target cluster -Split-Merge image showing the CT. In addition it shows two reference color bars: a bottom color bar representing the perfect split transformations (black), the merge-only (white) ones and those cases in between (different grey levels); a right-most column shows whether these are perfect matches (black) or not (white).
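A minimal Python sketch of the average adjacency matrix follows. It is an illustration only: the function name is made up, partitions are assumed to be lists of sets, and the diagonal (an item paired with itself) is counted as always co-clustered, which may differ from what the program prints. PGM output is not covered.

```python
def average_adjacency(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    items = sorted({item for partition in partitions
                    for cluster in partition for item in cluster})
    index = {item: i for i, item in enumerate(items)}
    size = len(items)
    counts = [[0] * size for _ in range(size)]
    for partition in partitions:
        for cluster in partition:
            for a in cluster:
                for b in cluster:
                    counts[index[a]][index[b]] += 1
    matrix = [[count / len(partitions) for count in row] for row in counts]
    return items, matrix


p1 = [{"a", "b"}, {"c"}]
p2 = [{"a"}, {"b", "c"}]
labels, matrix = average_adjacency([p1, p2])
for label, row in zip(labels, matrix):
    print(label, row)
```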
--Info , {--isPart , --isaPart , --is-partition} [-ofs ofs] For each partition checks whether it is a sound partition or not, i.e., whether all of its clusters are pair-wise disjoint. With option -q, only error message will be printed in case partition is not sound, otherwise it'll keep silent.
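The soundness check described above amounts to verifying that no item occurs in more than one cluster. A small Python sketch, with hypothetical function names and clusters given as plain lists of item names:

```python
def shared_items(clusters):
    """Items that occur in more than one cluster (empty if the partition is sound)."""
    seen, shared = set(), set()
    for cluster in clusters:
        for item in set(cluster):  # ignore repeats inside a single cluster
            if item in seen:
                shared.add(item)
            seen.add(item)
    return sorted(shared)


def is_partition(clusters):
    """True if all clusters are pair-wise disjoint."""
    return not shared_items(clusters)


print(is_partition([["a", "b"], ["c"]]))
print(shared_items([["a", "b"], ["b", "c"]]))
```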
-H, --hasse-diagram prints the local Hasse Diagram (graph) spanned by the given partitions.
For creating partitions ( clustering ) --cluster graph [ [-below|-above] threshold ] Defines clusters from the transitivity relation given by the graph's edges. If a threshold is provided, it first prunes the edges below the threshold. Example:
  partanalyzer --cluster gf -below 0.7
  partanalyzer --cluster gf 0.7
Both cases will first prune the edges below 0.7 and then obtain the clusters generated that way. For pruning above we must use the explicit form
  partanalyzer --cluster gf -above 0.7
--cluster-robust graph [-s #samples] [-n #neighbors] [-below|-above] [-V|-E|-R|-T|-J] [-ext extensivity] Gives the most robust clustering with respect to edge pruning. This is defined as the partition showing the smallest average variability _after_ the phase transition. The average variability is calculated as the average distance against those partitions at its #neighbors nearest pruning thresholds (#neighbors above; #neighbors below). It repeatedly clusters the graph starting with a pruning threshold equal to the lowest edge and increasing it by a fixed amount until reaching the highest edge value. The total number of samples determines each step increase of the threshold. We may be pruning the edges above the threshold (as if the latter were a temperature T) or below the threshold (1/T). Defaults: #samples=10 ; Pruning=below ; Metric=shannon (-V) ; #neighbors=2.
--RDC --cluster-robust-self-consistently graph [-s #samples] [-n #neighbors] [-below|-above] [-V|-E|-R|-T|-J] [-ext extensivity] As --cluster-robust, but it determines self-consistently the largest possible number of samples. The latter is defined as the largest for which each pruning interval removes at least one edge. The method used is bisection and the provided #samples is used as the seed for the search. All defaults as for --cluster-robust.
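Clustering by the transitivity relation is equivalent to taking the connected components of the pruned graph. The Python sketch below illustrates this idea; it is not the program's code. The function name, the edge representation as `(node_a, node_b, weight)` tuples, the strict comparisons at the threshold, and the choice to keep fully pruned nodes as singleton clusters are all assumptions.

```python
def cluster(edges, threshold=None, prune="below"):
    """Connected components of the graph after pruning edges below/above threshold."""
    neighbors = {}
    for a, b, _ in edges:
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
    for a, b, weight in edges:
        if threshold is not None:
            if prune == "below" and weight < threshold:
                continue  # edge pruned
            if prune == "above" and weight > threshold:
                continue  # edge pruned
        neighbors[a].add(b)
        neighbors[b].add(a)

    clusters, visited = [], set()
    for start in sorted(neighbors):
        if start in visited:
            continue
        # Depth-first search collects everything reachable from start.
        stack, component = [start], []
        visited.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for other in neighbors[node]:
                if other not in visited:
                    visited.add(other)
                    stack.append(other)
        clusters.append(sorted(component))
    return clusters


edges = [("a", "b", 0.9), ("b", "c", 0.5), ("c", "d", 0.8)]
print(cluster(edges, 0.7))            # prune below 0.7
print(cluster(edges, 0.7, "above"))   # prune above 0.7
```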
For editing partitions --part-extract-elements {--extract-elements} elements_file elements_file lists the names of the elements to cull from the given partition
--part-sort Sorts the clusters by size, the larger on top. Ties are sorted alphabetically by their first item. Within each cluster, items are sorted alphabetically.
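The ordering described for --part-sort can be expressed as a two-level sort. A short Python sketch, with a hypothetical function name and clusters given as collections of item names:

```python
def sort_partition(clusters):
    """Largest clusters first; ties by first item; items sorted within each cluster."""
    ordered = [sorted(cluster) for cluster in clusters if cluster]
    return sorted(ordered, key=lambda cluster: (-len(cluster), cluster[0]))


print(sort_partition([{"z", "y"}, {"b", "a"}, {"m", "n", "o"}, {"c"}]))
```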
--part-sort-rename partition [prefix] As --part-sort, but also renames the clusters consecutively as C1, C2, etc. If a prefix string is supplied, it is used instead of C.
--part-swap-names partition --part-swap-labels partition Swaps elements' names present in partition by their new names as found in the provided tab file. An element's name in the partition will be changed if and only if there is a translation for it in the tab file; otherwise it will be left as it is. Thus, it is not mandatory to provide a translation for all elements. Requires the use of --tab to specify a tab file providing the mapping between new and old names. See general options.
For converting between different partition formats
--toMCL [-tab mcl_tab_file] converts partition from PART format to MCL's format. If additional tab file is provided, output will contain the specific label index given in the tab file.
--toFREE converts partition from PART format to FREE format.
--MCLtoPART [-tab mcl_tab_file] converts partition from MCL format to PART format. If additional tab file is provided, output will contain items' labels, instead of simply their MCL index number.
--MSAtoPART converts an MSA file in FASTA format containing the labeling of clusters into a partition in PART format. Cluster labels are expected in a separate line before the actual set of sequences, i.e.,
  %Group_A
  >Sequence_A1
  ...
or
  ==Group_A
  >Sequence_A1
  ...
Escape characters indicating cluster labels can be mixed in the same file, although it's not recommended.
For dealing with (fasta) sequence files --drop-clone-sequences --msa-noclone-sequences --seq-noclone-sequences sequence_file (fasta) Given a (fasta) sequence file or a fasta MSA file, remove all duplicate sequences. Here duplicate means literally that, namely, exactly the same string of characters. Therefore, it is not the same as having a pid=100%, but more stringent. If a second sequence file is provided, drop also sequences that are clones of any sequence in the second file.
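To make the notion of a clone concrete, here is a small Python sketch that drops exact string duplicates from FASTA records. It is an illustration, not the program's code: the function names are invented, and keeping the first occurrence of each duplicated sequence is an assumption about the behavior.

```python
def parse_fasta(text):
    """Return a list of (label, sequence) pairs from FASTA-formatted text."""
    records, label, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:], []
        elif line and label is not None:
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def drop_clones(records, reference=()):
    """Keep the first record of each distinct sequence string not found in reference."""
    seen = {sequence for _, sequence in reference}
    kept = []
    for label, sequence in records:
        if sequence not in seen:
            seen.add(sequence)
            kept.append((label, sequence))
    return kept


records = parse_fasta(">s1\nACGT\n>s2\nACGT\n>s3\nAC-T\n")
print(drop_clones(records))
```

Because the comparison is on the raw strings, `ACGT` and `AC-T` count as different sequences here, which matches the remark that this is more stringent than a pid of 100%.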
For analyzing Multiple Sequence Alignments
--msa-seqid-stat --msa-seqid-stat [--positions file] Given a multiple sequence alignment in fasta format, it prints all pair-wise sequence identities. By default, it calculates identities over the full sequence length. The second version allows specifying the (reduced) set of positions to consider when comparing sequences. These should be specified in a file, separated by spaces, tabs, new lines, etc. The positions are understood as columns of the MSA. If two MSAs are provided, it prints the sequence Id of the first set against the second.
--msa-seqid-avg [-thr threshold ] Similar to option --msa-seqid-stat, but prints for each sequence the statistics of its pair-wise sequence identity to all other sequences. These consist of average Seq.Id, standard deviation, variance, minimum Seq.Id, maximum Seq.Id, number of pairs with Seq.Id > threshold, fraction of pairs with Seq.Id > threshold and total number of pairs. Option -thr allows providing a specific threshold to use. Default value is 50%. Values are floating-point numbers within [0,100]. If two MSAs are provided, it prints the sequence Id of the first set against the second.
--msa-extract-positions positions_file msa_file From the given MSA, extract only columns specified in file positions_file.
--msa-extract-sequences sequences_file msa_file --msa-drop-sequences sequences_file msa_file From the given MSA, extract only sequences specified in file sequences_file. This file contains a list of sequence names. The second form drops those sequences instead. If a positions file is given, sequence Id's are calculated considering only those columns of the MSA.
--msa-extract-sequences-by-id msa_file1 msa_file2 [minId maxId] --msa-drop-sequences-by-id sequences_file msa_file [minId maxId] From MSA msa_file1, extract sequences with an ID above minId and at most maxId against any sequence of MSA msa_file2. The second form drops those sequences instead. Default values are minId=30 and maxId=100, i.e., homologous sequences. If a positions file is given, sequence Id's are calculated considering only those columns of the MSA. In this case minId and maxId are mandatory and must come before positions_file.
--msa-extract-sequences-by-topid msa_file1 msa_file2 [count] --msa-drop-sequences-by-topid sequences_file msa_file [count] From MSA msa_file1, extract at most count most similar sequences (seq.ID) to any sequence of MSA msa_file2. The second form drops those sequences instead. If a positions file is given, sequence Id's are calculated considering only those columns of the MSA. In this case count is mandatory and must come before positions_file.
--msa-redundant [-nsam nsam] [-nseq nseq] [-seed seed] Duplicates sequences chosen at random in the given multiple sequence alignment. Without options, only one is chosen. Options:
  -nsam Generate nsam samples of MSAs with nseq duplicated sequences. Each sample is written in its own directory.
  -nseq Specify the number of sequences to duplicate.
  -seed Specify the seed of the random number generator.
All options are expected to be integer values. The value of the seed is written within .seed_used, allowing for repeated experiments.
--msa-map-partition Given a Partition and the original MSA, output the MSA with the cluster annotation format of the SDPpred server. MSAformat allows specifying the format of the output alignment. Possible formats are: FASTA[23]*, SPEER[23]*, GDE[23]* and GSIM[23]*. Example: FASTA prints cluster information as a line heading the sequence label line starting with `%'; using FASTA2 prints the same but only clusters with 2 or more elements are printed (3 or more if format is FASTA3). Idem for the additional formats. SPEER prints the MSA appending the clusters' sizes as a last line; GDE is analogous to FASTA but uses `==' instead of `%'. Finally, GSIM adds the cluster name as the last string of the fasta label, separated from it by `|'.
--msa-print , --print-msa [-sort|-nosort] Prints the given multiple sequence alignment. Useful for debugging. With -sort, sequences are sorted alphabetically; -nosort leaves them sorted as in the input file (default).
For dealing with -interaction- matrices
--edge-dist For each node, prints the distribution of edge weights. Information printed is: Node, average edge weight, standard deviation, standard error, skewness, minimum edge value, max edge value and sample size (number of edges). If a partition is provided, it also prints the cluster size and cluster name each node belongs to.
-m , --merge-graphs Merge two graph matrices into one that contains both values for each pair of items, i.e., the resulting graph looks like
  stringA stringB float1 float2
  ...     ...     ...    ...
where float1, float2 are the matrix values of matrix1 and matrix2, respectively. Both matrices are expected to contain the same set of pairs of items, i.e., the same set of edges.
-r , --merge-graphs-color As option -m, but in addition includes the name of the cluster each pair of values belongs to. If they belong to different clusters the label is "x". The label is NAN if either item does not belong to any of the clusters defined in the given partition. The format of the output is
  float1 float2 clustername_AB stringA stringB
  ...    ...    ...            ...     ...
-l , --cull-edges Culls from matrix of values edges specified in second file.
--prune-edges-below float graphfile --prune-edges-above float graphfile Removes all edges below or above the given threshold.
--graph-nodes graphfile --matrix-nodes graphfile Prints the list of nodes of the given interaction matrix.
--graph-print [-c col] , --matrix-print [-c col] Print the given interaction matrix. For debugging. Integer col specifies the column containing the edge values. Default: col=3.
General options
--verbose For debugging.
-q , --quiet quiet mode. Do not print out comment lines (that start with `#').
-t , --format {--fmt} [pfmt=input_partition_file_format] Specify the default format expected for the input partitions. Possible format values are: PART, MCL and FREE. See below. As MCL is automatically recognized from the file content, this option will be useful in two cases: (1) to distinguish between PART and FREE input partitions, (2) in combination with --tab, if the output (specified with --oformat or the different format conversion options) is different from the (input) format specified with -t, the tab file will be used for translating the labels of the elements; however, if the _specified_ input and output formats coincide, the original labels will be preserved. Example: p.mcl is in MCL format; p.lst, in PART format.
  partanalyzer -t PART --tab tbf -V p.mcl p.lst
This gives the distance between the two by using the tab file on p.mcl, but NOT on p.lst. Default input format is PART.
--oformat [pfmt=input_partition_file_format] Specify the output format when printing partitions. Default output format is PART.
File formats:
matrix-of-values (an undirected graph):
  stringA stringB float
  stringA stringC float
  ...     ...     ...
  stringZ stringV float

tab file:
  integer1 string1
  integer2 string2
  ...      ...

partition:
  PART: (default, i.e., partition_offset=2)
    sizeA clusterA_name item_1 item_2 ... item_sizeA
    sizeB clusterB_name item_1 item_2 ... item_sizeB
    ...   ...           ...    ...    ...
  or (partition_offset=1):
    sizeA item_1 item_2 ... item_sizeA
    ...   ...    ...    ...
  FREE: (not yet implemented) (partition_offset=0)
    item_1 item_2 ... item_sizeA
    ...    ...    ...
  MCL : MCL's own matrix format for partitions. See MCL manual.
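For readers who want to load PART files in their own scripts, a minimal Python parser might look like the sketch below. It is not part of partanalyzer: the function name `parse_part` is invented, skipping lines that start with `#` is an assumption, and the `C1`, `C2`, ... names generated for partition_offset=1 are placeholders chosen for this example.

```python
def parse_part(text, offset=2):
    """Parse PART-formatted text into a dict mapping cluster name to its items."""
    if offset not in (1, 2):
        raise ValueError("this sketch handles partition_offset 1 or 2 only")
    clusters = {}
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumed: blank and comment lines carry no cluster
        fields = line.split()
        size = int(fields[0])
        # With offset 2 the second column is the cluster name; otherwise invent one.
        name = fields[1] if offset == 2 else "C%d" % (len(clusters) + 1)
        items = fields[offset:]
        if len(items) != size:
            raise ValueError("line %d: expected %d items, found %d"
                             % (number, size, len(items)))
        clusters[name] = items
    return clusters


example = "3 famA x1 x2 x3\n2 famB y1 y2\n"
print(parse_part(example))
```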
License
# partanalyzer Version alpha 1.0.
# Copyright (c) Miguel A. Santos, May. 2008-2010 .Build Feb 19 2010
# Licensed under the GNU GPL version 3 or later.
# (see http://www.gnu.org/copyleft/gpl.html )
#

Examples:
For the latest options, check the help from the program
  ./partanalyze -h

Check consistency of a given partition test.subfam.lst based on a matrix of interactions given by test-blast_pairwise_id. How large are the intra-cluster values compared to the inter-cluster ones?
  ./partanalyze -c test-blast_pairwise_id test.subfam.lst
or
  ./partanalyze --check-consistency-of-partition test-blast_pairwise_id test.subfam.lst
which also accepts an abbreviated form as
  ./partanalyze --ccop test-blast_pairwise_id test.subfam.lst
Calculate VI distance between two partitions and between each of them and their intersection. Definition of VI distance: Given two partitions P1 and P2, with cluster size distributions {n_k} and {n_k'} respectively, where k and k' are indexes to each of their corresponding clusters, and such that Sum_k n_k = Sum_k' n_k' = N, the VI distance is defined as
  VI (P1,P2) = Sum_k n_k/N * log( n_k/N) + Sum_k' n_k'/N * log( n_k'/N) - 2 * Sum_k Sum_k' n_kk'/N * log(n_kk'/N)
where n_kk' is the number of items common to cluster k of P1 and cluster k' of P2. This definition satisfies the triangle inequality, i.e., for any three partitions P1, P2 and P, it is
  VI (P1, P) + VI (P,P2) >= VI (P1,P2)
  ./partanalyze --vi-distance test.subfam.lst test.subfam.lst2
or simply
  ./partanalyze -v test.subfam.lst test.subfam.lst2
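The definition above translates almost term by term into Python. The following sketch is an illustration only: the function name, the list-of-sets representation, and the use of natural logarithms are assumptions, and empty overlaps are taken to contribute zero.

```python
import math


def vi_distance(p1, p2):
    """VI distance between two partitions (lists of sets) over the same N items."""
    total = sum(len(cluster) for cluster in p1)

    def f_log_f(count):
        # One term (n/N) * log(n/N); empty overlaps contribute nothing.
        if count == 0:
            return 0.0
        fraction = count / total
        return fraction * math.log(fraction)

    return (sum(f_log_f(len(a)) for a in p1)
            + sum(f_log_f(len(b)) for b in p2)
            - 2.0 * sum(f_log_f(len(a & b)) for a in p1 for b in p2))


halves = [{"a", "b"}, {"c", "d"}]
whole = [{"a", "b", "c", "d"}]
print(vi_distance(halves, whole))
```

Splitting one cluster of four items into two halves gives a distance of log 2, and the distance of a partition to itself is zero.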
Print the intersection of 2 partitions test.subfam.lst and test.subfam.lst2
  ./partanalyze -i test.subfam.lst test.subfam.lst2
Performs the intersection of P1 and P2 as induced by the intersection operation on the underlying set (the one that contains all elements). This gives a new partition I such that each cluster of I is obtained as an intersection of one cluster of P1 and one of P2 (all against all).
Print the purity scores for partition1 (target) against partition2 (reference)
  ./partanalyze --purity-scores test.subfam.lst test.subfam.lst2
or simply
  ./partanalyze -p test.subfam.lst test.subfam.lst2
It outputs the purity strict and purity lax values. Purity strict of P1 against P2 := the number of non-singleton clusters of P1 that are exactly identical to one of P2, divided by the number of non-singleton clusters of P2 (the reference). Purity lax of P1 against P2 := the number of non-singleton clusters of P1 that are subsets of a cluster of P2, divided by the number of non-singleton clusters of P1 (the target).
For debugging: print the interaction matrix read by the program
  ./partanalyze --print-matrix test-blast_pairwise_id
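The all-against-all construction of the intersection can be sketched in a few lines of Python. The function name `meet` and the list-of-sets representation are assumptions made for this illustration; empty overlaps are discarded.

```python
def meet(p1, p2):
    """Intersection of two partitions: all non-empty pairwise cluster overlaps."""
    return sorted(sorted(a & b) for a in p1 for b in p2 if a & b)


p1 = [{"a", "b", "c"}, {"d"}]
p2 = [{"a", "b"}, {"c", "d"}]
print(meet(p1, p2))
```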
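The two purity definitions can be written out in Python as follows. This is a sketch of the definitions as stated in the text, not of the program's code: the function name is invented, partitions are lists of sets, and returning 0.0 when a denominator is zero is an assumption.

```python
def purity_scores(target, reference):
    """Return (strict, lax) purity of target against reference, per the definitions."""
    target_clusters = [frozenset(c) for c in target if len(c) > 1]
    reference_clusters = [frozenset(c) for c in reference if len(c) > 1]

    # Strict: non-singleton target clusters identical to a reference cluster.
    strict_hits = sum(1 for c in target_clusters if c in reference_clusters)
    # Lax: non-singleton target clusters contained in some reference cluster.
    lax_hits = sum(1 for c in target_clusters
                   if any(c <= r for r in reference_clusters))

    strict = strict_hits / len(reference_clusters) if reference_clusters else 0.0
    lax = lax_hits / len(target_clusters) if target_clusters else 0.0
    return strict, lax


target = [{"a", "b"}, {"c", "d"}, {"e"}]
reference = [{"a", "b"}, {"c", "d", "e"}]
print(purity_scores(target, reference))
```

In the usage example one of the two non-singleton target clusters matches a reference cluster exactly, and both are contained in a reference cluster, so the scores are 0.5 (strict) and 1.0 (lax).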