4. DAY 6.12.2016 - mai0/Project_BB2491 GitHub Wiki

We got the data

Until today we had been waiting for our dataset, and today we finally got it!

Email from Lars:

...you can access the data in the file

    /proj/g2016025/nobackup/private/z4006c01.g.ipe.fq.bz2

In it, you can now find Illumina paired-end data from a sample (naturally) enriched in chloroplast DNA. This means that there are plenty of reads (a majority, actually) from the nuclear (and some mitochondrial) DNA, but it should only be possible to assemble the chloroplast. The genome coverage seems to be less than 0.1×, but the chloroplast coverage should be approaching 100×.

The data is compressed with bzip2, so you unpack it with bunzip2. The unpacked file is 2.7 GB and in FastQ format. The reads are short (about 75 nt) and the fragment size is ~300 bp. The reads are interleaved, which means that they come in pairs; you can see this from the identifiers. The first two reads have the identifiers

    @HWI_ST139:1:1:3619:1964#0/1
    @HWI_ST139:1:1:3619:1964#0/2

As you can see, it is the same ID, but with a different digit after the slash.

Have fun!
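A minimal Python sketch of how the file could be read, assuming the standard four-line FastQ layout (header, sequence, `+` line, quality) and that mates are always adjacent, as the email describes. It streams the archive with Python's `bz2` module instead of unpacking it to disk first. The function names (`read_fastq`, `pair_id`, `read_pairs`, `count_pairs`) are our own, hypothetical helpers, not part of any course tool.

```python
import bz2
from itertools import islice
from typing import Iterator, TextIO, Tuple

# One FastQ record: (header, sequence, quality); the '+' line is dropped.
Record = Tuple[str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a stream with exactly four lines per record."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return  # clean end of file
        if len(lines) < 4 or not lines[0].startswith("@") or not lines[2].startswith("+"):
            raise ValueError(f"malformed FastQ record: {lines!r}")
        yield lines[0], lines[1], lines[3]


def pair_id(header: str) -> Tuple[str, str]:
    """Split '@HWI_ST139:1:1:3619:1964#0/1' into ('HWI_ST139:1:1:3619:1964#0', '1')."""
    base, _, mate = header[1:].rpartition("/")
    return base, mate


def read_pairs(handle: TextIO) -> Iterator[Tuple[Record, Record]]:
    """Group an interleaved FastQ stream into (mate 1, mate 2) pairs, checking the IDs."""
    records = read_fastq(handle)
    for r1 in records:
        r2 = next(records, None)  # the mate must be the very next record
        if r2 is None:
            raise ValueError(f"unpaired last read: {r1[0]}")
        (id1, m1), (id2, m2) = pair_id(r1[0]), pair_id(r2[0])
        if id1 != id2 or (m1, m2) != ("1", "2"):
            raise ValueError(f"reads are not mates: {r1[0]} / {r2[0]}")
        yield r1, r2


def count_pairs(path: str) -> int:
    """Stream a bzip2-compressed interleaved FastQ file and count its mate pairs."""
    with bz2.open(path, "rt") as handle:
        return sum(1 for _ in read_pairs(handle))
```

For example, `count_pairs("/proj/g2016025/nobackup/private/z4006c01.g.ipe.fq.bz2")` would walk the whole file once and stop with an error if two neighbouring reads turn out not to be mates, which is a cheap sanity check before handing the data to an assembler.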
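The coverage figures in the email can be understood with the usual back-of-the-envelope estimate C = N·L/G, where N is the number of reads that come from a given genome, L is the read length and G is the genome size. Only reads originating from a genome count toward its coverage, which is how the same library can give below 0.1× for the large nuclear genome but close to 100× for the small chloroplast genome. A small sketch, assuming 75 nt reads and a hypothetical chloroplast genome size of 150 kb (an assumption, not something we have measured):

```python
import math


def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean coverage C = N * L / G."""
    return n_reads * read_length / genome_size


def reads_needed(coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads N such that N * L / G >= coverage."""
    return math.ceil(coverage * genome_size / read_length)


# Hypothetical numbers: 75 nt reads and an ASSUMED 150 kb chloroplast genome.
READ_LENGTH = 75
ASSUMED_CP_GENOME = 150_000

if __name__ == "__main__":
    n = reads_needed(100, READ_LENGTH, ASSUMED_CP_GENOME)
    print(n, expected_coverage(n, READ_LENGTH, ASSUMED_CP_GENOME))
```

Under these assumptions, about 200,000 chloroplast-derived reads would already give ~100× coverage, so the chloroplast can be well covered even when most reads in the file are nuclear.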