Code Quality Benchmark - adrianzap/softwipe GitHub Wiki
To generate a benchmark, we executed softwipe on a collection of programs, most of them bioinformatics tools from the area of evolutionary biology. Some of the tools below (genesis, raxml-ng, repeatscounter, hyperphylo) were developed in our lab. The table below contains the code quality scores. Note that it is subject to change as we refine our scoring criteria and include more tools.
Softwipe scores for each category are assigned such that the "best" program in a category that is not an outlier obtains a score of 10 out of 10, and the "worst" non-outlier program obtains 0 out of 10. An outlier is defined as a value that lies outside of Tukey's fences.
All code quality categories use relative scores. For instance, we compute the number of compiler warnings per total Lines Of Code (LOC). These relative scores allow us to compare and rank the different programs in our benchmark. The overall score used for the ranking is simply the average over all score categories. You can find a detailed description of the scoring categories and of the tools included in our benchmark below.
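The scoring scheme described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not softwipe's actual code: the function names `tukey_fences` and `score`, and the linear quartile interpolation, are assumptions.

```python
# Sketch of the scoring scheme: outliers are excluded via Tukey's fences,
# then rates are linearly rescaled so the best non-outlier maps to 10
# and the worst non-outlier maps to 0.

def tukey_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    ordered = sorted(values)
    def quartile(q):
        # Simple linear-interpolation quantile (an assumption).
        pos = q * (len(ordered) - 1)
        lo, frac = int(pos), pos - int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + frac * (ordered[hi] - ordered[lo])
    q1, q3 = quartile(0.25), quartile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def score(rate, rates, higher_is_better=False):
    """Map one program's rate to [0, 10] relative to all rates."""
    lower, upper = tukey_fences(rates)
    inliers = [r for r in rates if lower <= r <= upper]
    best, worst = (max, min) if higher_is_better else (min, max)
    b, w = best(inliers), worst(inliers)
    if b == w:
        return 10.0
    s = 10.0 * (rate - w) / (b - w)
    return max(0.0, min(10.0, s))  # outliers are clamped into [0, 10]
```

With warning rates `[0.1, 0.2, 0.3, 0.4, 5.0]`, the value 5.0 lies outside the fences, so 0.1 scores 10.0, 0.4 scores 0.0, and the outlier 5.0 is clamped to 0.0.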
program | overall | relative score | compiler_and_sanitizer | assertions | cppcheck | clang_tidy | cyclomatic_complexity | lizard_warnings | unique_rate | kwstyle | infer | test_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|
genesis-0.24.0 | 9.0 | 9.1 | 9.9 | 8.7 | 8.4 | 9.2 | 9.0 | 9.4 | 8.2 | 8.2 | N/A | 10.0 |
fastspar | 8.3 | 8.6 | 9.6 | 2.0 | 9.9 | 9.9 | 8.8 | 7.9 | 8.8 | 6.4 | 9.7 | 10.0 |
axe-0.3.3 | 7.6 | 7.6 | 9.4 | 1.2 | 6.6 | 9.3 | 6.2 | 7.6 | 8.4 | 9.8 | N/A | 10.0 |
pstl | 7.5 | 7.1 | 10.0 | 0.4 | 8.0 | 5.6 | 9.3 | 9.9 | 6.3 | 8.4 | N/A | 10.0 |
raxml-ng_v1.0.1 | 7.5 | 7.8 | 9.9 | 4.2 | 6.6 | 9.0 | 7.9 | 6.6 | 4.0 | 9.2 | N/A | 10.0 |
kahypar | 7.3 | 7.6 | 6.7 | 2.4 | 8.0 | N/A | 9.2 | 9.6 | 3.3 | 9.1 | N/A | 10.0 |
bindash-1.0 | 7.2 | 6.9 | 8.3 | 8.8 | 5.8 | 7.1 | 8.7 | 9.5 | 8.2 | 8.5 | N/A | 0.0 |
ExpansionHunter-4.0.2 | 7.2 | 7.3 | 8.7 | 1.8 | 8.6 | 9.4 | 8.9 | 9.1 | 0.4 | 7.9 | N/A | 10.0 |
ripser-1.2.1 | 6.9 | 6.7 | 10.0 | 6.3 | 6.4 | 2.4 | 8.9 | 9.1 | 8.6 | 9.9 | 7.1 | 0.0 |
naf-1.1.0/unnaf | 6.8 | 7.3 | 9.9 | 4.0 | 9.8 | 10.0 | 6.9 | 7.5 | 7.2 | 3.3 | 9.5 | 0.0 |
virulign-1.0.1 | 6.8 | 7.0 | 9.1 | 3.4 | 9.4 | 9.0 | 7.3 | 5.8 | 7.5 | 9.3 | N/A | 0.0 |
naf-1.1.0/ennaf | 6.8 | 6.8 | 9.9 | 10.0 | 9.4 | 10.0 | 7.2 | 6.7 | 0.0 | 5.2 | 9.0 | 0.0 |
glucose-3-drup | 6.7 | 6.7 | 8.6 | 10.0 | 5.2 | 9.4 | 8.7 | 8.4 | 8.5 | 1.4 | N/A | 0.0 |
Treerecs-v1.2 | 6.7 | 6.6 | 5.8 | 1.8 | 6.7 | 8.6 | 9.0 | 9.0 | 1.6 | 7.5 | N/A | 10.0 |
dawg-1.2 | 6.6 | 6.6 | 10.0 | 0.0 | 6.3 | 10.0 | 8.4 | 8.1 | 7.9 | 9.1 | N/A | 0.0 |
RepeatsCounter | 6.6 | 6.1 | 7.7 | 0.0 | 7.0 | 6.8 | 9.0 | 10.0 | 9.3 | 9.5 | N/A | 0.0 |
samtools-1.11 | 6.5 | 6.6 | 8.6 | 1.2 | 7.4 | 9.1 | 3.8 | 2.2 | 8.2 | 6.3 | 8.1 | 9.9 |
bpp-4.3.8 | 6.4 | 6.4 | 9.8 | 9.3 | 7.1 | 8.9 | 2.8 | 2.0 | 6.6 | 9.3 | 7.9 | 0.0 |
swarm-3.0.0 | 6.3 | 6.1 | 10.0 | 0.3 | 9.3 | 3.8 | 8.0 | 7.7 | 4.3 | 9.9 | 10.0 | 0.0 |
usher-0.3.2 | 6.3 | 6.4 | 8.9 | 2.1 | 7.4 | 9.3 | 7.5 | 7.5 | 7.5 | 6.4 | N/A | 0.0 |
ntEdit-1.2.3 | 6.1 | 6.0 | 8.4 | 0.0 | 7.1 | 9.7 | 7.9 | 6.7 | 3.8 | 7.7 | 9.4 | 0.0 |
prank-msa | 5.9 | 6.2 | 5.3 | 5.1 | 9.9 | 9.0 | 7.0 | 6.6 | 1.4 | 5.8 | 9.0 | 0.0 |
IQ-TREE-2.0.6 | 5.9 | 5.5 | 2.3 | 2.5 | 4.7 | 7.8 | 8.2 | 7.7 | 5.3 | 6.6 | N/A | 7.7 |
emeraLD | 5.7 | 5.5 | 4.2 | 0.0 | 9.4 | 8.4 | 6.3 | 5.3 | 9.0 | 8.6 | N/A | 0.0 |
dna-nn-0.1 | 5.6 | 5.4 | 7.9 | 4.1 | 6.8 | 6.0 | 6.7 | 5.0 | 6.1 | 7.8 | N/A | 0.0 |
openmp | 5.5 | 5.4 | 5.8 | 0.9 | 0.2 | 1.5 | 8.1 | 7.3 | 7.6 | 8.3 | N/A | 10.0 |
HLA-LA | 5.5 | 5.5 | 7.9 | 10.0 | 4.1 | 9.5 | 5.0 | 4.1 | 2.9 | 3.1 | 8.0 | 0.0 |
BGSA-1.0 | 5.4 | 5.0 | 7.3 | 0.0 | 0.2 | 10.0 | 7.5 | 6.8 | 8.2 | 9.4 | 5.1 | 0.0 |
minimap2-2.17 | 5.3 | 4.9 | 6.8 | 2.6 | 5.2 | 6.6 | 6.1 | 5.2 | 8.0 | 5.1 | 7.6 | 0.0 |
ngsTools/ngsLD | 5.3 | 4.9 | 9.0 | 0.0 | 7.3 | 6.1 | 5.0 | 3.9 | 8.3 | 7.9 | N/A | 0.0 |
Seq-Gen-1.3.4 | 5.3 | 5.0 | 8.9 | 0.0 | 6.8 | 8.3 | 5.7 | 5.2 | 8.9 | 2.5 | 6.3 | 0.0 |
defor | 5.3 | 5.2 | 0.1 | 0.0 | 6.1 | 9.4 | 6.9 | 6.4 | 9.0 | 9.4 | N/A | 0.0 |
copmem-0.2 | 5.2 | 5.2 | 10.0 | 0.2 | 7.6 | 8.6 | 8.5 | 7.8 | 4.2 | 4.5 | 0.3 | 0.0 |
phyml-3.3.20200621 | 5.2 | 5.3 | 9.6 | 5.5 | 5.0 | 8.1 | 4.3 | 2.7 | 5.9 | 3.7 | 6.8 | 0.0 |
dr_sasa_n | 4.8 | 5.1 | 0.4 | 0.0 | 9.8 | 10.0 | 2.3 | 1.6 | 9.2 | 9.9 | N/A | 0.0 |
SF2 | 4.8 | 4.9 | 10.0 | 1.3 | 4.6 | 7.9 | 3.0 | 0.8 | 3.3 | 6.9 | 10.0 | 0.0 |
vsearch-2.15.1 | 4.7 | 4.4 | 7.1 | 0.0 | 8.2 | 1.1 | 5.0 | 3.9 | 5.6 | 9.7 | 6.6 | 0.0 |
clustal-omega-1.2.4 | 4.7 | 5.1 | 7.4 | 3.1 | 6.9 | 8.8 | 3.9 | 2.5 | 5.3 | 3.9 | N/A | 0.2 |
cellcoal-1.0.0 | 4.6 | 4.1 | 9.7 | 0.0 | 6.2 | 7.5 | 0.8 | 0.1 | 7.2 | 6.9 | 8.1 | 0.0 |
ms | 4.6 | 4.6 | 8.4 | 0.0 | 0.0 | 10.0 | 6.2 | 5.3 | 6.4 | 0.0 | 9.6 | 0.0 |
MrBayes-3.2.7a | 4.3 | 4.0 | 9.6 | 1.4 | 8.2 | 7.1 | 0.0 | 0.1 | 3.8 | 4.5 | 8.1 | 0.0 |
Gadget-2.0.7 | 4.2 | 4.2 | 10.0 | 0.0 | 0.0 | 10.0 | 0.4 | 0.1 | 5.4 | 9.1 | N/A | 3.0 |
prequal | 4.1 | 4.6 | 2.4 | 5.9 | 0.3 | 9.9 | 6.0 | 4.0 | 1.0 | 2.8 | 8.8 | 0.0 |
crisflash | 4.1 | 4.1 | 5.9 | 0.0 | 3.9 | 10.0 | 5.4 | 4.1 | 6.2 | 4.9 | 0.5 | 0.0 |
cryfa-18.06 | 3.9 | 4.1 | 6.2 | 2.0 | 0.0 | 9.7 | 5.9 | 5.5 | 6.0 | 0.0 | N/A | 0.0 |
athena-public-version-21.0 | 3.9 | 3.4 | 3.1 | 0.0 | 1.7 | 8.2 | 4.5 | 2.5 | 0.6 | 9.1 | 8.7 | 0.3 |
sumo | 3.8 | 3.8 | 0.0 | 1.2 | 6.6 | 9.4 | 8.0 | 7.4 | 0.0 | 0.5 | N/A | 0.7 |
PopLDdecay | 3.8 | 3.6 | 9.2 | 0.0 | 9.6 | 10.0 | 0.1 | 0.0 | 0.0 | 0.0 | 8.6 | 0.0 |
gargammel | 3.8 | 3.4 | 10.0 | 0.0 | 8.4 | 6.4 | 0.0 | 0.1 | 0.9 | 3.4 | 9.1 | 0.0 |
mafft-7.475 | 3.7 | 3.0 | 9.3 | 0.0 | 6.4 | 7.8 | 0.3 | 0.4 | 0.7 | 6.5 | 4.6 | 0.8 |
covid-sim-0.13.0 | 2.8 | 2.6 | 7.5 | 0.0 | 5.2 | 0.0 | 0.0 | 0.0 | 7.3 | 0.3 | N/A | 4.9 |
INDELibleV1.03 | 2.5 | 2.3 | 6.1 | 0.0 | 0.7 | 9.3 | 0.7 | 0.8 | 6.7 | 0.0 | 0.5 | 0.0 |
Tools included
Bioinformatics-related tools:
- indelible 1.03: simulates sequence data on phylogenetic trees (paper)
- ms: population genetics simulations (paper)
- mafft 7.429: multiple sequence alignment (paper)
- mrbayes 3.2.6: Bayesian phylogenetic inference (paper)
- bpp 3.4: multispecies coalescent analyses (paper)
- tcoffee: multiple sequence alignment (paper)
- prank 0.170427: multiple sequence alignment (paper)
- sf (SweepFinder): population genetics (paper)
- seq-gen 1.3.4: phylogenetic sequence evolution simulation (paper)
- dawg 1.2: phylogenetic sequence evolution simulation (github)
- repeatscounter: evaluates the quality of a data distribution for phylogenetic inference (github)
- raxml-ng 0.8.1: phylogenetic inference (paper)
- genesis 0.22.1: phylogeny library (github)
- minimap 2.17-r943: pairwise sequence alignment (paper)
- Clustal Omega 1.2.4: multiple sequence alignment (paper)
- samtools 1.9: utilities for processing SAM (Sequence Alignment/Map) files (paper)
- vsearch 2.13.4: metagenomics functions (paper, github)
- swarm 3.0.0: amplicon clustering (paper, github)
- phyml 3.3.20190321: phylogenetic inference (paper)
- IQ-TREE 1.6.10: phylogenetic inference (paper)
- cellcoal 1.0.0: coalescent simulation of single-cell NGS genotypes (github)
- treerecs 1.0: species- and gene-tree reconciliation (gitlab)
- HyperPhylo: judicious hypergraph partitioning, for creating a data distribution for phylogenetic inference (paper)
- HLA*LA: HLA (human leukocyte antigen) typing from linearly projected graph alignments (paper)
- Dna-nn 0.1: implements a proof-of-concept deep-learning model to learn relatively simple features on DNA sequences (paper)
- ntEdit 1.2.3: scalable genome sequence polishing (paper)
- lemon: framework for rapidly mining structural information from the Protein Data Bank (paper)
- DEFOR: depth- and frequency-based somatic copy number alteration detector (paper)
- naf 1.1.0: Nucleotide Archival Format for lossless reference-free compression of DNA sequences (paper)
- ngsLD: evaluating linkage disequilibrium using genotype likelihoods (paper)
- dr_sasa 0.4b: calculation of accurate interatomic contact surface areas for quantitative analysis of non-bonded molecular interactions (paper)
- Crisflash: software to generate CRISPR guide RNAs against genomes annotated with individual variation (paper)
- BGSA 1.0: global sequence alignment toolkit (paper)
- virulign 1.0.1: codon-correct alignment and annotation of viral genomes (paper)
- PopLDdecay 3.40: tool for linkage disequilibrium decay analysis (paper)
- fastspar 0.0.10: rapid and scalable correlation estimation for compositional data (paper)
- ExpansionHunter 3.1.2: tool to analyze variation in short tandem repeat regions (paper)
- bindash 1.0: fast genome distance estimation (paper)
- copMEM 0.2: finding maximal exact matches via sampling both genomes (paper)
- cryfa 18.06: secure encryption tool for genomic data (paper)
- emeraLD: rapid linkage disequilibrium estimation with massive datasets (paper)
- axe 0.3.3: rapid sequence read demultiplexing (paper)
- prequal: detecting non-homologous characters in sets of unaligned homologous sequences (paper)
- SCIPhI 0.1.7: mutation detection in tumor cells (github)
- UShER: a program that rapidly places new samples onto an existing phylogeny using maximum parsimony (github)
- gargammel: "a set of programs aimed at simulating ancient DNA fragments" (github)
Other tools:
- KaHyPar: hypergraph partitioning tool (website)
- Athena++: magnetohydrodynamics (paper)
- Gadget 2: simulations of cosmological structure formations (paper)
- Candy Kingdom: modular collection of SAT solvers and tools for structure analysis in SAT problems (github)
- glucose-3-drup: Glucose 3.0 (a SAT solver) with online DRUP proofs and proof traversal (github)
- CovidSim 0.13.0: COVID-19 microsimulation model developed by the MRC Centre for Global Infectious Disease Analysis at Imperial College London (github)
- Eclipse SUMO: traffic simulation package (github)
- LLVM OpenMP (github)
- LLVM Parallel-STL (github)
- Ripser 1.2.1: "code for the computation of Vietoris–Rips persistence barcodes" (github)
Scoring categories
- compiler and sanitizer: We compile each benchmark tool with the clang compiler, activating almost all warnings, and count the warnings. Each warning is assigned a weight of 1, 2, or 3, where 3 is the most dangerous (for instance, implicit type conversions that might result in precision loss are level 3 warnings). Each warning that occurs in the compilation adds its level (1, 2, or 3) to a weighted sum. Additionally, we execute the tool with the clang sanitizers (ASan and UBSan); any sanitizer findings are added to the weighted sum, with a default level of 3. The compiler and sanitizer score is calculated from this weighted sum of warnings per total LOC.
- assertions: The count of assertions (C-Style assert(), static_assert(), or custom assert macros, if defined) per total LOC.
- cppcheck: The count of warnings found by the static code analyzer cppcheck per total LOC. Cppcheck categorizes its warnings; we have assigned each category a weight, similarly to the compiler warnings.
- clang-tidy: The count of warnings found by the static code analyzer clang-tidy per total LOC. Clang-tidy categorizes its warnings; we have assigned each category a weight, similarly to the cppcheck and compiler warnings.
- cyclomatic complexity: The cyclomatic complexity is a software metric that quantifies the complexity/modularity of a program (see the Wikipedia article on cyclomatic complexity). We use the lizard tool to assess the cyclomatic complexity of our benchmark tools. Keep in mind that the above table does not contain the raw cyclomatic complexity values, but the scores, which rate all tools relative to each other.
- lizard warnings: The number of functions that are considered too complex, relative to the total number of functions. Lizard counts a function as "too complex" if its cyclomatic complexity, its length, or its parameter count exceeds a certain threshold value.
- unique rate: The fraction of unique code; a higher amount of code duplication decreases this value. The unique rate is obtained using lizard.
- kwstyle: The count of warnings found by the static code style analyzer KWStyle per total LOC. We configure KWStyle using the KWStyle.xml file that is shipped with softwipe.
- infer: We weight the warnings found by the static analyzer Infer and use the weighted-warnings-per-LOC rate to calculate a score.
- test count: We relate the amount of unit test LOC to the overall LOC count and compute the rate as test_code_loc/overall_loc. At the moment, the detection of unit test LOC is kept simple: files that have the keyword "test" in their path are declared test code files.
Analysis tool versions
For the benchmark we used the following analysis tool versions:
- clang 11.0.0
- clang-tidy 5.0.1
- cppcheck 2.1
- lizard 1.17.7
- kwstyle: latest git version (as of 25.02.2021)
- infer 0.17.0
Absolute values
For comparability, we provide the absolute values from which the above table is derived. The following table contains each program's total lines of pure code (by which the table is sorted), its total number of functions, and the absolute results for each scoring category. Note that these are already weighted results; for example, level 3 compiler warnings are counted as 3 warnings here.
program | loc | functions | compiler | sanitizer | assertions | cppcheck | clang_tidy | cyclomatic_complexity | lizard_warnings | unique_rate | kwstyle | infer | test_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
sumo | 514811 | 23788 | 1664573 | 0 | 995 | 9585 | 9057 | 4.6 | 1285 | 0.7079 | 43493 | N/A | 2563 |
IQ-TREE-2.0.6 | 220709 | 10930 | 93864 | 0 | 852 | 6368 | 13767 | 4.2 | 527 | 0.9098 | 5827 | N/A | 10994 |
Treerecs-v1.2 | 171121 | 10189 | 40533 | 0 | 483 | 3140 | 6861 | 2.4 | 210 | 0.845 | 3314 | N/A | 64810 |
dr_sasa_n | 146963 | 86 | 97039 | 0 | 0 | 182 | 22 | 11.5 | 15 | 0.9968 | 94 | N/A | 0 |
kahypar | 109786 | 9732 | 20475 | 0 | 417 | 1207 | N/A | 1.7 | 72 | 0.8796 | 751 | N/A | 72051 |
MrBayes-3.2.7a | 95597 | 962 | 1959 | 4 | 205 | 964 | 7969 | 22.6 | 287 | 0.8872 | 3984 | 242 | 0 |
openmp | 91040 | 3530 | 21406 | 0 | 127 | 6600 | 22104 | 4.4 | 196 | 0.9479 | 1157 | N/A | 16895 |
raxml-ng_v1.0.1 | 87135 | 2545 | 645 | 0 | 572 | 1633 | 2602 | 4.7 | 181 | 0.8903 | 531 | N/A | 13049 |
samtools-1.11 | 78959 | 2321 | 6414 | 0 | 151 | 1125 | 2038 | 9.5 | 364 | 0.9626 | 2264 | 200 | 7626 |
mafft-7.475 | 77251 | 932 | 3275 | 0 | 0 | 1558 | 4849 | 17.8 | 226 | 0.81 | 2098 | 534 | 405 |
ExpansionHunter-4.0.2 | 72944 | 3945 | 5275 | 5 | 208 | 584 | 1296 | 2.8 | 74 | 0.7912 | 1157 | N/A | 18758 |
phyml-3.3.20200621 | 70845 | 1609 | 1800 | 1 | 596 | 1942 | 3762 | 9.0 | 235 | 0.9188 | 3301 | 298 | 0 |
athena-public-version-21.0 | 65302 | 1509 | 24518 | 1 | 3 | 2990 | 3325 | 8.8 | 229 | 0.8005 | 463 | 111 | 131 |
genesis-0.24.0 | 62886 | 3855 | 266 | 0 | 859 | 567 | 1472 | 2.4 | 49 | 0.9608 | 885 | N/A | 7658 |
bpp-4.3.8 | 41109 | 793 | 499 | 0 | 646 | 668 | 1314 | 10.8 | 129 | 0.9305 | 210 | 110 | 0 |
clustal-omega-1.2.4 | 34160 | 883 | 4970 | 0 | 162 | 576 | 1133 | 9.4 | 133 | 0.9106 | 1557 | N/A | 42 |
vsearch-2.15.1 | 24384 | 506 | 4039 | 0 | 0 | 242 | 6409 | 8.2 | 62 | 0.9142 | 65 | 107 | 0 |
prank-msa | 24023 | 756 | 6334 | 0 | 188 | 12 | 660 | 6.0 | 54 | 0.8378 | 773 | 31 | 0 |
HLA-LA | 23811 | 462 | 2817 | 0 | 1653 | 753 | 337 | 8.3 | 55 | 0.872 | 1217 | 62 | 0 |
usher-0.3.2 | 22140 | 849 | 1405 | 1 | 72 | 319 | 456 | 5.3 | 45 | 0.9463 | 610 | N/A | 0 |
covid-sim-0.13.0 | 13200 | 124 | 1857 | 0 | 0 | 350 | 9280 | 32.5 | 42 | 0.9434 | 1255 | N/A | 433 |
Gadget-2.0.7 | 12589 | 148 | 0 | 0 | 0 | 1534 | 4 | 16.9 | 47 | 0.9117 | 83 | N/A | 257 |
fastspar | 11346 | 90 | 226 | 0 | 35 | 4 | 23 | 3.1 | 4 | 0.9779 | 310 | 4 | 9933 |
cellcoal-1.0.0 | 11000 | 66 | 189 | 0 | 0 | 229 | 793 | 14.7 | 21 | 0.9406 | 264 | 27 | 0 |
pstl | 10380 | 1162 | 0 | 0 | 7 | 115 | 1303 | 1.6 | 2 | 0.9247 | 128 | N/A | 6617 |
INDELibleV1.03 | 9697 | 216 | 2150 | 0 | 0 | 543 | 199 | 14.9 | 45 | 0.9321 | 4252 | 139 | 0 |
minimap2-2.17 | 8841 | 339 | 1599 | 0 | 35 | 236 | 859 | 7.1 | 34 | 0.9569 | 334 | 28 | 0 |
swarm-3.0.0 | 7092 | 212 | 0 | 0 | 3 | 26 | 1204 | 4.6 | 10 | 0.8945 | 7 | 0 | 0 |
dawg-1.2 | 7058 | 256 | 0 | 0 | 0 | 146 | 0 | 3.9 | 10 | 0.9539 | 47 | N/A | 0 |
PopLDdecay | 6557 | 57 | 292 | 0 | 0 | 16 | 3 | 19.5 | 20 | 0.4369 | 1418 | 12 | 0 |
SF2 | 5337 | 121 | 0 | 0 | 11 | 158 | 312 | 10.5 | 25 | 0.8789 | 129 | 0 | 0 |
glucose-3-drup | 4772 | 479 | 390 | 0 | 149 | 126 | 78 | 3.3 | 16 | 0.9705 | 318 | N/A | 0 |
dna-nn-0.1 | 4768 | 210 | 574 | 1 | 30 | 85 | 541 | 6.4 | 22 | 0.923 | 82 | N/A | 0 |
ngsTools/ngsLD | 4373 | 113 | 236 | 0 | 0 | 65 | 487 | 8.3 | 14 | 0.9643 | 69 | N/A | 0 |
Seq-Gen-1.3.4 | 3980 | 120 | 237 | 0 | 0 | 70 | 195 | 7.5 | 12 | 0.9828 | 222 | 19 | 0 |
gargammel | 3444 | 17 | 0 | 0 | 0 | 31 | 357 | 48.7 | 5 | 0.8183 | 169 | 4 | 0 |
crisflash | 3279 | 84 | 763 | 0 | 0 | 107 | 0 | 7.8 | 10 | 0.9238 | 128 | 47 | 0 |
copmem-0.2 | 3026 | 133 | 4 | 0 | 1 | 40 | 123 | 3.7 | 6 | 0.8939 | 125 | 48 | 0 |
axe-0.3.3 | 2781 | 60 | 94 | 0 | 5 | 53 | 54 | 7.0 | 3 | 0.9677 | 4 | N/A | 802 |
prequal | 2600 | 99 | 1083 | 0 | 23 | 179 | 4 | 7.2 | 12 | 0.8228 | 139 | 4 | 0 |
ntEdit-1.2.3 | 2365 | 87 | 213 | 0 | 0 | 38 | 23 | 4.8 | 6 | 0.8867 | 42 | 2 | 0 |
cryfa-18.06 | 2216 | 74 | 473 | 5 | 7 | 370 | 20 | 7.3 | 7 | 0.9213 | 372 | N/A | 0 |
ms | 2182 | 71 | 193 | 1 | 0 | 201 | 0 | 7.0 | 7 | 0.9263 | 641 | 1 | 0 |
emeraLD | 1642 | 51 | 524 | 0 | 0 | 5 | 74 | 6.8 | 5 | 0.988 | 18 | N/A | 0 |
bindash-1.0 | 1622 | 88 | 152 | 0 | 23 | 38 | 133 | 3.2 | 1 | 0.963 | 19 | N/A | 0 |
naf-1.1.0/unnaf | 1620 | 77 | 4 | 2 | 10 | 2 | 0 | 6.1 | 4 | 0.9415 | 80 | 1 | 0 |
naf-1.1.0/ennaf | 1615 | 73 | 7 | 1 | 78 | 5 | 0 | 5.7 | 5 | 0.6041 | 60 | 2 | 0 |
BGSA-1.0 | 1405 | 30 | 216 | 0 | 0 | 100 | 0 | 5.3 | 2 | 0.9621 | 7 | 9 | 0 |
virulign-1.0.1 | 1149 | 46 | 56 | 0 | 6 | 4 | 33 | 5.6 | 4 | 0.9464 | 6 | N/A | 0 |
ripser-1.2.1 | 1053 | 105 | 0 | 0 | 10 | 21 | 221 | 2.8 | 2 | 0.9742 | 1 | 4 | 0 |
defor | 695 | 27 | 602 | 0 | 0 | 15 | 11 | 6.2 | 2 | 0.9876 | 3 | N/A | 0 |
RepeatsCounter | 243 | 19 | 30 | 2 | 0 | 4 | 22 | 2.4 | 0 | 1.0 | 1 | N/A | 0 |
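The per-LOC rates that feed the scoring step can be reproduced directly from this table. A minimal sketch, assuming the rate is simply the weighted warning count divided by LOC (the function names here are illustrative, not softwipe's):

```python
# Deriving per-LOC rates from the absolute values above (illustrative;
# the real conversion happens inside softwipe's calculate_score_table.py).

def warning_rate(compiler, sanitizer, loc):
    """Weighted compiler + sanitizer warnings per line of pure code."""
    return (compiler + sanitizer) / loc

def assertion_rate(assertions, loc):
    """Assertions per line of pure code (here, higher is better)."""
    return assertions / loc

# Example: the fastspar row (226 weighted compiler warnings,
# 0 sanitizer warnings, 11346 LOC).
rate = warning_rate(226, 0, 11346)  # roughly 0.0199 warnings per LOC
```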
How to create the benchmark
To calculate this benchmark, the results of all softwipe runs must be saved into a results directory that has one subdirectory for each tool to be included in the benchmark. Most importantly, for each tool, the output of softwipe must be saved into a file called "softwipe_output.txt" inside that tool's subdirectory. For example, the directory structure has to look like this:
```
results/
results/tool1/
results/tool1/softwipe_output.txt
results/tool2/
results/tool2/softwipe_output.txt
...
```
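Given that layout, discovering which tools a results directory actually provides is straightforward. The helper below is a hypothetical sketch (the name `find_softwipe_outputs` is ours; softwipe's own script relies on a hard-coded list of folder names instead):

```python
import os

# Sketch: map each tool name to the path of its softwipe_output.txt,
# for every subdirectory of results_dir that contains one.

def find_softwipe_outputs(results_dir):
    outputs = {}
    for entry in sorted(os.listdir(results_dir)):
        subdir = os.path.join(results_dir, entry)
        output_file = os.path.join(subdir, "softwipe_output.txt")
        if os.path.isdir(subdir) and os.path.isfile(output_file):
            outputs[entry] = output_file
    return outputs
```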
Then, the script `calculate_score_table.py` can be used to parse all softwipe output files and generate a CSV file that contains all scores. The script requires the path to the results directory (`results/` in our example). The script contains a list called `FOLDERS` with the names of all subdirectories that will be included in the benchmark (`tool1`, `tool2`, etc. in our example). To add or remove a tool to/from the benchmark, edit this list.
The script recalculates all scores from the rates, rather than parsing the scores directly. This way, softwipe does not need to be rerun for all tools whenever the scoring functions change. The script simply uses softwipe's scoring functions from `scoring.py`. These scoring functions use the values calculated by the `compare_results.py` script, which are the best/worst values that are not outliers, as mentioned above.