5. Advanced topics - bbuchfink/diamond GitHub Wiki

Optimizing performance

Substantial gains in performance can be achieved by configuring the tool according to the needs of a given application instead of simply running it with default settings. Some points to consider:

  • Set the -e parameter (maximum expected value) as low as possible.

  • Set the -k parameter (number of target sequences to report alignments for) as low as possible. This will improve performance and reduce the use of temporary disk space and the size of the output files.

  • Configure the tabular output format to only report the output fields that are actually required (see output options).

  • Use the BLAST tabular format for machine-based processing of the output. Use the XML and SAM formats only if your downstream tools require them or if you prefer them as human-readable formats.

  • Use the option --compress 1 to automatically generate gzip-compressed output files.

  • Set the block size (-b) and index chunks (-c) parameters according to the available resources on your system.
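As a rough illustration of the points above, the following sketch combines them into one search call. The file names, the choice of `blastp`, the selected output fields, and all numeric values are placeholders chosen for this example, not recommendations; pick values that suit your data and hardware.

```shell
# Hypothetical file names; replace with your own database and query files.
DB=reference
QUERY=queries.faa
OUT=matches.tsv

# -e 1e-10, -k 5 : example values; use the lowest your analysis tolerates
# --outfmt 6 ... : BLAST tabular output restricted to five fields
# --compress 1   : gzip-compressed output
# -b 4 -c 1      : placeholder block size and index chunks
DIAMOND_ARGS="blastp -d $DB -q $QUERY -o $OUT -e 1e-10 -k 5 --outfmt 6 qseqid sseqid pident evalue bitscore --compress 1 -b 4 -c 1"

# Show the command that would be run.
echo "diamond $DIAMOND_ARGS"

# Run the search only when diamond and the query file are present.
# $DIAMOND_ARGS is left unquoted on purpose so the shell splits it into
# words; this assumes the file names contain no spaces.
if command -v diamond >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    diamond $DIAMOND_ARGS
fi
```

Keeping the arguments in one variable makes it easy to log the exact settings next to the results.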

Compiling with custom GCC

Download and compile GCC:

cd
wget ftp://ftp.gnu.org/gnu/gcc/gcc-10.2.0/gcc-10.2.0.tar.gz
tar xzf gcc-10.2.0.tar.gz
cd gcc-10.2.0
./contrib/download_prerequisites
cd ..
mkdir objdir
cd objdir
"$PWD/../gcc-10.2.0/configure" --prefix="$HOME/GCC-10.2.0" --disable-multilib --disable-bootstrap --enable-languages=c++
make -j 8
make install

Download and compile Diamond:

cd
git clone https://github.com/bbuchfink/diamond.git
cd diamond
mkdir bin
cd bin
export CC=$HOME/GCC-10.2.0/bin/gcc
export CXX=$HOME/GCC-10.2.0/bin/g++
cmake -DSTATIC_LIBGCC=ON -DSTATIC_LIBSTDC++=ON ..
make -j8
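After the build, it can be useful to check that the binary runs and that the static linking options took effect. The sketch below assumes a Linux system with `ldd` and the build location used above (`$HOME/diamond/bin`); the helper name `uses_shared_libstdcxx` is made up for this example.

```shell
# Assumes the build location used above.
BIN="$HOME/diamond/bin/diamond"

# Reads `ldd` output on stdin; succeeds if a shared libstdc++ is listed.
uses_shared_libstdcxx() {
    grep -qF 'libstdc++.so'
}

if [ -x "$BIN" ] && command -v ldd >/dev/null 2>&1; then
    # Print the version of the freshly built binary.
    "$BIN" version
    if ldd "$BIN" | uses_shared_libstdcxx; then
        echo "libstdc++ is linked dynamically"
    else
        echo "no shared libstdc++ dependency found"
    fi
else
    echo "diamond binary or ldd not found; nothing to check"
fi
```

If no shared libstdc++ dependency is reported, the binary should not need the custom GCC's runtime libraries on the machine where it is run.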

Database format versions

  • v0.9.25 to current produce format version 3 and accept format versions 2 and 3.
  • v0.9.19 to v0.9.24 produce and accept format version 2.
  • v0.9.0 to v0.9.18 produce and accept format version 1.
  • v0.8.12 to v0.8.38 produce and accept format version 0.
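The list above can be turned into a small lookup helper, for example to decide whether a database built by one release needs to be rebuilt for another. This is only a sketch: the function name `db_format_produced` is made up, it expects a plain `major.minor.patch` version string, and it assumes that "current" covers all releases from v1 onward.

```shell
# Print the database format version produced by a given DIAMOND release,
# following the list above. Prints "unknown" for releases it does not cover.
db_format_produced() {
    ver=${1#v}            # strip an optional leading "v"
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    if [ "$major" -ge 1 ]; then
        echo 3
    elif [ "$minor" -eq 9 ] && [ "$patch" -ge 25 ]; then
        echo 3
    elif [ "$minor" -eq 9 ] && [ "$patch" -ge 19 ]; then
        echo 2
    elif [ "$minor" -eq 9 ]; then
        echo 1
    elif [ "$minor" -eq 8 ] && [ "$patch" -ge 12 ] && [ "$patch" -le 38 ]; then
        echo 0
    else
        echo unknown
    fi
}

db_format_produced v2.0.8
db_format_produced v0.9.24
db_format_produced v0.8.12
```

Because only releases from v0.9.25 onward accept more than one format, a database in any format other than 2 or 3 can be read only by the release range that produced it.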

Small database optimization

Since v2.0.8, an optimization is available that increases performance for small (<10 MB) database files. To use it, run diamond makeidx -d <database file>, which creates a <database file>.dmnd.seed_idx file that can be used in subsequent alignment runs. Pass one of the sensitivity settings to this call (--mid-sensitive, --sensitive, --more-sensitive, --very-sensitive, --ultra-sensitive, or the default) and, optionally, the number of seed shapes (-s). The generated index file is specific to that sensitivity setting. Subsequent alignment runs must use the same sensitivity setting; otherwise, the behaviour is undefined.

To use the index file for an alignment run, set the option --target-indexed. The file is loaded using memory mapping and can be shared among processes.
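A minimal sketch of the two steps follows. The database and query names and the use of `blastp` are assumptions made for this example. The sensitivity flag is kept in a single variable so that the indexing call and the alignment call cannot drift apart.

```shell
# Hypothetical names; replace with your own small database and query file.
DB=small_db
QUERY=queries.faa

# One sensitivity setting shared by both steps.
SENS=--sensitive

INDEX_CMD="diamond makeidx -d $DB $SENS"
SEARCH_CMD="diamond blastp -d $DB -q $QUERY -o matches.tsv $SENS --target-indexed"

# Show the commands that would be run.
echo "$INDEX_CMD"
echo "$SEARCH_CMD"

# Run only when diamond and the query file are present. The command
# strings are expanded unquoted so the shell splits them into words;
# this assumes the file names contain no spaces.
if command -v diamond >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    $INDEX_CMD && $SEARCH_CMD
else
    echo "diamond or $QUERY not found; skipping"
fi
```

The index only needs to be built once per database and sensitivity setting; later runs can reuse it by repeating the search command alone.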

This optimization was developed for the Serratus project.