Quick start - nterhoeven/reper GitHub Wiki
Quick start
This is the tl;dr page. It shows how to run reper on a plant data set via Docker.
Installation
- download reper
git clone https://github.com/nterhoeven/reper.git
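- get the Docker image (build it locally or pull it from Docker Hub)
cd reper
docker build -t nterhoeven/reper .   # tag the image so the alias can find it
# or
docker pull nterhoeven/reper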
- create an alias for reper (define it in the directory that contains your data)
alias reper="docker run --user=$(id -u):$(id -g) -it --rm -v $(pwd):/data nterhoeven/reper"
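Because the alias is written in double quotes, $(pwd), $(id -u) and $(id -g) are expanded once, when the alias is defined. If the current directory should be picked up on every call instead, a shell function is one option. The following is a minimal sketch, not part of reper itself. It assumes a bash-like shell and reuses the image name and docker flags from the alias above.

```shell
# Sketch only: a function instead of the alias, so that $(pwd), $(id -u)
# and $(id -g) are evaluated each time reper is called.
unalias reper 2>/dev/null || true   # drop an existing alias of the same name

reper() {
    docker run --user="$(id -u):$(id -g)" -it --rm \
        -v "$(pwd):/data" nterhoeven/reper "$@"
}
```

Used this way, reper kmerCount mounts whatever directory you are in at the time of the call.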
Prepare the run
- edit reper.conf to include the following information:
READS1='reads.fastq' # the paired-end sequencing reads
READS2='mates.fastq'
COVERAGE='10' # the sequencing coverage of the genome
READLENGTH=100 # the average read length
MAXTHREADS=20 # the maximum CPU threads allowed
MAXMEMORY='20G' # the maximum memory allowed
- prepare the databases (since we are working on a plant data set here, we can use the defaults)
reper configure-refseq
reper configure-REdat
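Before starting a run, it can help to check that the files named in reper.conf actually exist. The helper below is a hypothetical sketch, not part of reper. It makes three assumptions: reper.conf consists of plain shell variable assignments (as in the listing above), the reads are uncompressed FASTQ files with four lines per record, and the check is run on the host from the directory that gets mounted as /data.

```shell
# Hypothetical helper, not part of reper: sanity-check reper.conf before a run.
check_reper_conf() {
    conf="${1:-reper.conf}"
    [ -f "$conf" ] || { echo "config not found: $conf" >&2; return 1; }
    # Make sure "." reads the given file instead of searching PATH.
    case "$conf" in */*) ;; *) conf="./$conf" ;; esac
    # Use a subshell so the config variables do not leak into the calling shell.
    (
        . "$conf"
        for f in "$READS1" "$READS2"; do
            [ -s "$f" ] || { echo "read file missing or empty: $f" >&2; exit 1; }
        done
        # Assumes uncompressed FASTQ: four lines per read.
        n1=$(wc -l < "$READS1")
        n2=$(wc -l < "$READS2")
        [ "$n1" -eq "$n2" ] || { echo "READS1 and READS2 differ in length" >&2; exit 1; }
        echo "ok: $((n1 / 4)) read pairs"
    )
}
```

Calling check_reper_conf reper.conf reports the number of read pairs, or returns a non-zero status with a message on stderr if something is missing.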
Run reper
Now it is time to start the reper pipeline:
reper kmerCount
Results
The resulting repeat landscape is summarized in the three repeat-landscape* files.
You can use the R script plot-landscape.R (needs ggplot2) to create two plots of the results.
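Before plotting, you may want to confirm that the summary files are present and that R can load ggplot2. The helper below is a hypothetical sketch, not part of reper. It assumes it is run in the directory that holds the results, and it only checks the prerequisites. How plot-landscape.R is invoked is not covered here.

```shell
# Hypothetical helper, not part of reper: check the prerequisites for plotting.
check_landscape_inputs() {
    rc=0
    # If nothing matches, the glob stays literal and the -e test fails.
    for f in repeat-landscape*; do
        if [ -e "$f" ]; then
            echo "found: $f"
        else
            echo "no repeat-landscape* files in $(pwd)" >&2
            rc=1
        fi
    done
    if command -v Rscript >/dev/null 2>&1; then
        # Exit status 0 if ggplot2 can be loaded, 1 otherwise.
        Rscript -e 'quit(save = "no", status = as.integer(!requireNamespace("ggplot2", quietly = TRUE)))' \
            || { echo "R package ggplot2 is not installed" >&2; rc=1; }
    else
        echo "Rscript not found on PATH" >&2
        rc=1
    fi
    return "$rc"
}
```

If ggplot2 is reported as missing, install.packages("ggplot2") from within R is the usual way to add it.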