Quick start

This is the tl;dr page. It shows how to run reper on a plant data set using Docker.

Installation

download reper

git clone https://github.com/nterhoeven/reper.git

get the Docker image

cd reper
docker build -t nterhoeven/reper .   # tag the image so the alias below can find it
# or 
docker pull nterhoeven/reper

create an alias for reper

alias reper="docker run --user=$(id -u):$(id -g) -it --rm -v $(pwd):/data nterhoeven/reper"
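
Because the alias is defined in double quotes, the shell expands `$(id -u)`, `$(id -g)` and `$(pwd)` once, when the alias is created. `/data` therefore always points at the directory you were in at that moment. If you would rather mount whatever directory you are in each time you call reper, a shell function is one option. This is a minimal sketch that reuses the image name and flags from the alias; the function form is an assumption and does not come from the reper documentation.

```shell
# Sketch of a function-based alternative to the alias above (not part of reper).
# The command substitutions run on every call, so the directory you are in
# when you invoke reper is the one mounted at /data.
reper() {
    docker run --user="$(id -u):$(id -g)" -it --rm \
        -v "$(pwd):/data" nterhoeven/reper "$@"
}
```

Use either the alias or the function, not both: in an interactive shell an alias with the same name takes precedence over the function.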

prepare the run

Edit reper.conf to set the following values:

READS1='reads.fastq'  # the paired end sequencing reads
READS2='mates.fastq'
COVERAGE='10'         # the sequencing coverage of the genome
READLENGTH=100        # the average read length

MAXTHREADS=20         # the maximum CPU threads allowed
MAXMEMORY='20G'       # the maximum memory allowed
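
Before starting a long run, it can help to confirm that the read files named in reper.conf actually exist. The sketch below assumes that reper.conf consists of plain shell variable assignments (as the snippet above suggests) and that the read paths are relative to the current directory. `check_reper_conf` is a hypothetical helper, not a reper command.

```shell
# Hypothetical helper, not part of reper: checks that the read files named
# in a config file exist. Assumes the file holds plain shell assignments.
# The body runs in a subshell so the sourced settings do not leak out.
check_reper_conf() (
    conf="${1:-reper.conf}"
    if [ ! -f "$conf" ]; then
        echo "config file not found: $conf" >&2
        exit 1
    fi
    # '.' searches PATH for a bare file name, so force an explicit path.
    case "$conf" in
        */*) ;;
        *) conf="./$conf" ;;
    esac
    . "$conf" || exit 1
    missing=0
    for f in "$READS1" "$READS2"; do
        if [ -z "$f" ] || [ ! -f "$f" ]; then
            echo "read file missing: ${f:-<unset>}" >&2
            missing=1
        fi
    done
    exit "$missing"
)
```

Run `check_reper_conf` in the directory that holds reper.conf. It returns a non-zero status if the config file or either read file is missing.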

prepare the databases (since we are working on a plant data set here, we can use the defaults)

reper configure-refseq
reper configure-REdat

run reper

Now it is time to start the reper pipeline:

reper kmerCount

results

The resulting repeat landscape is summarized in the three repeat-landscape* files. You can use the R script plot-landscape.R (requires the ggplot2 package) to create two plots of the results.