Quick start - nterhoeven/reper GitHub Wiki

Quick start

This is the tl;dr page. It shows you how to run reper on a plant data set using Docker.

Installation

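  1. download reper
git clone https://github.com/nterhoeven/reper.git
  2. get the Docker image, either by building it locally or by pulling it
cd reper
docker build -t nterhoeven/reper .   # tag the image so the alias below can find it
# or
docker pull nterhoeven/reper
  3. create an alias for reper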
alias reper="docker run --user=$(id -u):$(id -g) -it --rm -v $(pwd):/data nterhoeven/reper"
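
Because the alias is defined in double quotes, `$(pwd)` and `$(id -u)` are expanded once, when the alias is created, so `/data` always points at the directory you were in at that moment. If you would rather mount whichever directory you call reper from, a shell function is one possible alternative. This is a sketch using the same docker options as the alias, not something shipped with reper:

```shell
# Sketch: a function instead of an alias, so $(pwd) is evaluated on every call.
# Same docker options as the alias above; "$@" forwards the reper subcommand.
reper() {
  docker run --user="$(id -u):$(id -g)" -it --rm \
    -v "$(pwd):/data" nterhoeven/reper "$@"
}
```

If you already created the alias in the current shell, run `unalias reper` before defining the function, otherwise the shell expands the alias while parsing the definition.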

Prepare the run

  1. edit reper.conf to include the following info:
READS1='reads.fastq'  # the paired end sequencing reads
READS2='mates.fastq'
COVERAGE='10'         # the genomic coverage sequenced
READLENGTH=100        # the average read length

MAXTHREADS=20         # the maximum CPU threads allowed
MAXMEMORY='20G'       # the maximum memory allowed
  2. prepare the databases (since we are working on a plant data set here, we can use the defaults)
reper configure-refseq
reper configure-REdat
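
Before starting a run, it can help to confirm that the read files named in reper.conf actually exist. The following is a minimal sketch, not part of reper: the function name `check_reper_conf` is hypothetical, and it assumes that reper.conf consists of plain shell variable assignments (as in the example above) and that the read paths are relative to the current directory, which the alias mounts as `/data`.

```shell
# Sketch: sanity-check reper.conf before starting a run.
# Assumption: reper.conf holds plain shell variable assignments and can be sourced.
check_reper_conf() {
  conf="${1:-reper.conf}"
  # "." searches PATH for names without a slash, so make the path explicit.
  case "$conf" in
    */*) ;;
    *) conf="./$conf" ;;
  esac
  [ -f "$conf" ] || { echo "config not found: $conf" >&2; return 1; }
  # Source in a subshell so the variables do not leak into the caller.
  (
    . "$conf" || exit 1
    status=0
    for f in "$READS1" "$READS2"; do
      if [ ! -f "$f" ]; then
        echo "read file not found: $f" >&2
        status=1
      fi
    done
    exit "$status"
  )
}
```

Call `check_reper_conf` in the directory that holds reper.conf; it returns a non-zero status if the config or either read file is missing.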

Run reper

Now it is time to start the reper pipeline:

reper kmerCount

Results

The resulting repeat landscape is summarized in the three repeat-landscape* files. You can use the R script plot-landscape.R (which requires the ggplot2 package) to create two plots of the results.
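
Since plot-landscape.R depends on ggplot2, you may want to check that your R installation can load the package before running the script. This is a small sketch that assumes `Rscript` is on your PATH; the function name `check_ggplot2` is hypothetical and not part of reper.

```shell
# Sketch: check that R can load ggplot2 before running plot-landscape.R.
# requireNamespace() returns FALSE instead of raising an error when a package is missing.
check_ggplot2() {
  if Rscript -e 'quit(status = if (requireNamespace("ggplot2", quietly = TRUE)) 0 else 1)'; then
    echo "ggplot2 is available"
  else
    echo "ggplot2 not found; install it from within R using install.packages(\"ggplot2\")" >&2
    return 1
  fi
}
```

Note that the function also reports failure when `Rscript` itself is not installed, so check that first if the message is unexpected.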