Getting our metagenomic datasets - AstrobioMike/JPL-HBCU-2020 GitHub Wiki

We will be working with 8 sample datasets of metagenomic reads. Illumina sequencing is often done as paired-end sequencing, which is the case for these datasets. This means there is a "forward" and a "reverse" read file for each sample (commonly designated "R1" and "R2", as in the file names below), giving 16 files in total for our 8 samples. The files also have "trimmed" in their names because the initial "raw" sequencing files have already been processed with a tool called Trimmomatic, which trims and filters reads based on their quality scores.
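Once the files are on disk, one quick way to see the paired-end layout for yourself is to count the reads in each file of a pair. The sketch below is only an illustration: `count_reads` is a hypothetical helper name, and it assumes the usual four-lines-per-record FASTQ layout. If the trimming was run in paired-end mode, the R1 and R2 files of a sample would be expected to report the same number of reads, but that is an assumption to check rather than something stated here.

```shell
# Hypothetical helper: count the reads in a gzipped FASTQ file.
# Assumes every record takes exactly four lines (header, sequence, "+", qualities).
count_reads() {
    # Decompress to stdout, count lines, and strip any padding wc may add.
    lines=$(gzip -dc "$1" | wc -l | tr -d ' ')
    echo $((lines / 4))
}
```

For example, `count_reads sample1_R1_trimmed.fastq.gz` and `count_reads sample1_R2_trimmed.fastq.gz` can be run one after the other and the two numbers compared.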

Downloading samples

The read files can be downloaded to our instances and unpacked with the following commands (they are ~5 GB in total):

curl -L -o metagenomic-read-files.tar.gz https://ndownloader.figshare.com/files/24079451

tar -xzvf metagenomic-read-files.tar.gz
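After unpacking, it can be worth confirming that all 16 files actually arrived, since a ~5 GB download can be interrupted. The following is a minimal sketch, not part of the original instructions: `check_read_files` is a hypothetical helper, and it assumes the files follow the `sampleN_R1/R2_trimmed.fastq.gz` naming listed under "Sample info" and sit directly in the directory you pass it. The source does not say whether the archive unpacks into the current directory or a subdirectory, so adjust the path accordingly.

```shell
# Hypothetical helper: report which of the 16 expected read files are missing.
# $1 is the directory to look in (defaults to the current directory).
check_read_files() {
    dir="${1:-.}"
    missing=0
    for i in 1 2 3 4 5 6 7 8; do
        for r in R1 R2; do
            f="${dir}/sample${i}_${r}_trimmed.fastq.gz"
            if [ ! -f "$f" ]; then
                echo "missing: $f"
                missing=$((missing + 1))
            fi
        done
    done
    echo "${missing} of 16 expected files missing"
    # Exit status is 0 only when every expected file was found.
    [ "$missing" -eq 0 ]
}
```

Running `check_read_files .` in the directory holding the reads prints one line per missing file followed by a summary. To also test that an individual file is an intact gzip archive, `gzip -t` can be run on it.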

Sample info

Sample	Associated files
Sample 1	sample1_R1_trimmed.fastq.gz, sample1_R2_trimmed.fastq.gz
Sample 2	sample2_R1_trimmed.fastq.gz, sample2_R2_trimmed.fastq.gz
Sample 3	sample3_R1_trimmed.fastq.gz, sample3_R2_trimmed.fastq.gz
Sample 4	sample4_R1_trimmed.fastq.gz, sample4_R2_trimmed.fastq.gz
Sample 5	sample5_R1_trimmed.fastq.gz, sample5_R2_trimmed.fastq.gz
Sample 6	sample6_R1_trimmed.fastq.gz, sample6_R2_trimmed.fastq.gz
Sample 7	sample7_R1_trimmed.fastq.gz, sample7_R2_trimmed.fastq.gz
Sample 8	sample8_R1_trimmed.fastq.gz, sample8_R2_trimmed.fastq.gz
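Because the file names follow a regular pattern, the forward and reverse files of each sample can be paired up in a loop rather than typed out by hand. The sketch below is one possible way to do that: `list_read_pairs` is a hypothetical helper, and it assumes the files live directly in the directory passed to it. It derives each R2 name from the matching R1 name and does not check that the R2 file exists.

```shell
# Hypothetical helper: print one tab-separated line per sample:
# sample name, forward (R1) file, reverse (R2) file.
# $1 is the directory to look in (defaults to the current directory).
list_read_pairs() {
    dir="${1:-.}"
    for r1 in "$dir"/sample*_R1_trimmed.fastq.gz; do
        # If the pattern matched nothing, the literal pattern is left in $r1; skip it.
        [ -e "$r1" ] || continue
        # Strip the directory and the shared suffix to recover e.g. "sample1".
        sample=$(basename "$r1" _R1_trimmed.fastq.gz)
        r2="${dir}/${sample}_R2_trimmed.fastq.gz"
        printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    done
}
```

Calling `list_read_pairs .` prints one line per sample, which can then be read back with `while IFS="$(printf '\t')" read -r sample r1 r2; do ...; done` to run a per-sample command on each pair.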