Using reper - nterhoeven/reper GitHub Wiki

Using reper

To run reper, you need the config file reper.conf and a command (for example, help). Then you can run reper like this:

reper help

With docker, the recommended way to run reper is:

# Run as the current user and mount the current working directory as /data; /data is the working directory inside the container, so files are created with the right permissions
docker run --user="$(id -u):$(id -g)" -it --rm -v "$(pwd)":/data nterhoeven/reper help

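You can create an alias for the docker command and then call reper as reper <command>. Define the alias with single quotes so that $(pwd), $(id -u) and $(id -g) are evaluated each time the alias is used, not once when it is defined:

alias reper='docker run --user="$(id -u):$(id -g)" -it --rm -v "$(pwd)":/data nterhoeven/reper'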
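If you prefer a shell function to an alias (for example because bash does not expand aliases in non-interactive scripts), the same docker command can be wrapped as a function. This is a sketch that mirrors the docker command above; the function simply forwards all of its arguments to reper inside the container, so it is called exactly like the alias.

```shell
#!/usr/bin/env bash
# Sketch: function equivalent of the reper alias. The user ID, group ID and
# current directory are looked up on every call, and all arguments
# (e.g. "help" or "assembly") are passed through to reper in the container.
reper() {
  docker run --user="$(id -u):$(id -g)" -it --rm \
    -v "$(pwd)":/data \
    nterhoeven/reper "$@"
}
```

Usage is the same as with the alias, e.g. `reper help` or `reper kmerCount`.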

With singularity, use:

singularity exec reper-singularity.img help

The reper config file

The config file is called reper.conf and must be readable in the directory from which reper is started (with the docker command above, this is the current directory mounted as /data). It consists of the following sections:

  • run-specific data
    • sequencing library names
    • sequencing depth
    • average read length
    • output directory
    • CPU and memory limits
  • dependencies
    • paths to required software and databases
  • step-specific options
    • names of intermediate files
    • parameter settings shared between steps

The first section must be adjusted for each run. The second section must be adjusted if you installed reper manually or want to use a different reference database (e.g. repbase). You rarely need to change the third section, as it contains settings used internally by reper.
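Because reper reads reper.conf from the directory it is started in, a quick pre-flight check can catch a missing or unreadable config before a long run starts. The function check_reper_conf below is hypothetical (not part of reper); it is a minimal sketch that only checks that the file exists and is readable, not that its contents are valid.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of reper): fail early if
# reper.conf is missing or unreadable in the given directory
# (default: the current directory, i.e. where reper would be started).
check_reper_conf() {
  local dir=${1:-.}
  if [[ ! -r "$dir/reper.conf" ]]; then
    echo "reper.conf not found or not readable in $dir" >&2
    return 1
  fi
}
```

For example, `check_reper_conf && reper kmerCount` only starts the pipeline when the config file is present.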

The reper commands

reper provides several commands, which are explained in this section.

configuration commands

These commands configure the reference databases. You probably want to use the same database for multiple reper runs. In that case, you can run the configuration commands once and set the path to the database directories in the config file (when using docker or singularity, make sure these directories are accessible inside the container).

  • configure-REdat
  • configure-refseq
  • configure-repbase

The default databases are the open access plant repeat database REdat and refseq for chloroplast and mitochondrial data. If these suit your analysis, you can download and configure them with configure-REdat and configure-refseq.

If you are not working with plant data, or want to use a different (e.g. species-specific) reference dataset, you can provide a fasta file of the sequences together with a blast database of them. To ensure the fasta file is parsed correctly, make sure the headers look like this:

>seqID|class|source

seqID is a unique identifier, class is the class assigned to the sequence, and source can be a species name or similar.
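Before using a custom reference, it can help to check that every header follows the >seqID|class|source layout and that seqIDs are unique. The function check_reper_headers below is a hypothetical helper (not part of reper), sketched with awk; it only checks the header layout, not whether the class names match anything reper expects. Building the blast database with makeblastdb from NCBI BLAST+, shown as a comment, is likewise an assumption about how the database is prepared.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of reper): report fasta headers that do not
# have exactly three non-empty '|'-separated fields (seqID|class|source),
# and report repeated seqIDs. Exits with status 1 if any problem is found.
check_reper_headers() {
  awk '
    /^>/ {
      n = split(substr($0, 2), f, "|")
      if (n != 3 || f[1] == "" || f[2] == "" || f[3] == "") {
        printf "line %d: malformed header: %s\n", NR, $0
        bad = 1
      } else if (f[1] in seen) {
        printf "line %d: duplicate seqID: %s\n", NR, f[1]
        bad = 1
      }
      seen[f[1]] = 1
    }
    END { exit bad ? 1 : 0 }
  ' "$1"
}

# Assumption: a nucleotide blast database built with BLAST+, e.g.
#   makeblastdb -in my_repeats.fa -dbtype nucl
```

For example, `check_reper_headers my_repeats.fa` prints nothing and succeeds when all headers are well-formed (my_repeats.fa is a placeholder name).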

Since repbase is a very popular database, reper contains a command to convert an existing repbase download into a reper-readable format. Please follow these instructions to configure repbase:

  • You need a copy of the fasta-formatted version of repbase
  • Create a list of file names (one per line) of species you want to include as reference
  • Run this command: reper configure-repbase files.list /path/to/repbase/RepBaseXX.YY.fasta. This will create a library and a blast database, files.list.fa (Docker/Singularity note: you probably have to mount the repbase directory in the container)
  • Edit the following line in reper.conf: classificationDB="$dbDir/repbase/files.list.fa"
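With docker, the repbase fasta lives outside the mounted working directory, so it needs a second mount. A minimal sketch, assuming docker is used as in the command at the top of this page: the hypothetical function reper_configure_repbase (not part of reper) mounts the directory containing the repbase fasta read-only at /repbase, a mount point chosen only for this example, and passes the in-container path to configure-repbase.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of reper): run configure-repbase in docker.
# $1 = list file in the current directory (e.g. files.list)
# $2 = path to the repbase fasta on the host
# The fasta's directory is mounted read-only at /repbase (arbitrary choice).
reper_configure_repbase() {
  local list=$1 fasta=$2 dir base
  dir=$(cd "$(dirname "$fasta")" && pwd) || return 1
  base=$(basename "$fasta")
  docker run --user="$(id -u):$(id -g)" -it --rm \
    -v "$(pwd)":/data \
    -v "$dir":/repbase:ro \
    nterhoeven/reper configure-repbase "$list" "/repbase/$base"
}
```

Called as `reper_configure_repbase files.list /path/to/repbase/RepBaseXX.YY.fasta`, this writes files.list.fa to the current directory, just like the plain command.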

pipeline commands

These commands start the pipeline at the specified step; all subsequent steps are then executed automatically. If something goes wrong and reper cannot finish, you can resume the run from the step after the last successful one. Example: the assembly step fails because it exceeded the memory limit. You can then increase the limit in the config file and restart reper by calling reper assembly. This saves time, because the kmerCount and kmerFilter steps are not rerun.

  • kmerCount
  • kmerFilter
  • assembly
  • cluster
  • classify
  • quantify
  • landscape
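The resume logic described above can be sketched as a small lookup. The helper reper_next_step is hypothetical (not part of reper) and assumes that the steps run in the order listed here; given the last step that finished successfully, it prints the step to restart from.

```shell
#!/usr/bin/env bash
# Pipeline steps in the order listed above (assumed to be the run order).
REPER_STEPS=(kmerCount kmerFilter assembly cluster classify quantify landscape)

# Hypothetical helper (not part of reper): print the step that follows the
# last successful step. Returns 1 if the pipeline already finished and 2 if
# the step name is unknown.
reper_next_step() {
  local last=$1 i
  for i in "${!REPER_STEPS[@]}"; do
    if [[ ${REPER_STEPS[i]} == "$last" ]]; then
      if (( i + 1 < ${#REPER_STEPS[@]} )); then
        printf '%s\n' "${REPER_STEPS[i+1]}"
        return 0
      fi
      echo "pipeline already complete" >&2
      return 1
    fi
  done
  echo "unknown step: $last" >&2
  return 2
}
```

In the memory example, `next=$(reper_next_step kmerFilter) && reper "$next"` runs `reper assembly`.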

other commands

These are supporting commands. print-env prints information about the current configuration, which is useful for debugging and for logging the setup of a run. help prints a help text.

  • print-env
  • help
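To keep a record of the setup alongside each run, print-env can be called right before the actual command. The wrapper reper_with_log below is hypothetical (not part of reper) and assumes reper can be called as a command, e.g. through the alias or a wrapper function; it writes both outputs to a timestamped log file while still showing them on screen, and returns the exit status of the requested command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of reper): log the configuration via
# print-env, then run the requested reper command, copying all output
# (stdout and stderr) into reper-YYYYmmdd-HHMMSS.log.
reper_with_log() {
  local log
  log="reper-$(date +%Y%m%d-%H%M%S).log"
  {
    echo "## reper print-env"
    reper print-env
    echo "## reper $*"
    reper "$@"
  } 2>&1 | tee "$log"
  # Exit status of the command group (i.e. of the requested reper command),
  # not of tee.
  return "${PIPESTATUS[0]}"
}
```

For example, `reper_with_log kmerCount` starts the pipeline and leaves the configuration and the run output in one log file.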