Getting our metagenomic datasets - AstrobioMike/JPL-HBCU-2020 GitHub Wiki
We will be working with 8 sample datasets of metagenomic reads. Illumina sequencing is often done as paired-end sequencing, which is the case for these datasets. This means there is a "forward" and a "reverse" read file for each sample (commonly designated "R1" and "R2", as in the file names below), giving 16 files in total for our 8 samples. The file names also include "trimmed" because the initial "raw" sequencing files have been processed with a tool called Trimmomatic, which trims and filters reads based on their quality scores.
Downloading samples
These can be downloaded to our instances and unpacked with the following commands (they are ~5 GB in total):
curl -L -o metagenomic-read-files.tar.gz https://ndownloader.figshare.com/files/24079451
tar -xzvf metagenomic-read-files.tar.gz
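After unpacking, it can be worth confirming that all 16 read files arrived. The following is a minimal sketch, not part of the original instructions: `count_read_files` is a hypothetical helper name, and it searches with `find` because where the archive places the files (the working directory or a subdirectory) is an assumption here.

```shell
# Hypothetical helper: count the trimmed read files found anywhere under a directory.
# Assumes the file names follow the pattern sampleN_R1_trimmed.fastq.gz / sampleN_R2_trimmed.fastq.gz.
count_read_files() {
    dir="${1:-.}"
    find "$dir" -type f -name 'sample*_R[12]_trimmed.fastq.gz' | wc -l | tr -d ' '
}

# Report whether the expected 16 files (8 samples x 2 reads files) are present
# under the current directory.
n=$(count_read_files .)
if [ "$n" -eq 16 ]; then
    echo "Found all 16 read files"
else
    echo "Expected 16 read files, found $n"
fi
```

If the count is lower than 16, re-running the `curl` and `tar` commands is a reasonable first step, since an interrupted download is a likely cause.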
Sample info
Sample | Associated files |
---|---|
Sample 1 | sample1_R1_trimmed.fastq.gz, sample1_R2_trimmed.fastq.gz |
Sample 2 | sample2_R1_trimmed.fastq.gz, sample2_R2_trimmed.fastq.gz |
Sample 3 | sample3_R1_trimmed.fastq.gz, sample3_R2_trimmed.fastq.gz |
Sample 4 | sample4_R1_trimmed.fastq.gz, sample4_R2_trimmed.fastq.gz |
Sample 5 | sample5_R1_trimmed.fastq.gz, sample5_R2_trimmed.fastq.gz |
Sample 6 | sample6_R1_trimmed.fastq.gz, sample6_R2_trimmed.fastq.gz |
Sample 7 | sample7_R1_trimmed.fastq.gz, sample7_R2_trimmed.fastq.gz |
Sample 8 | sample8_R1_trimmed.fastq.gz, sample8_R2_trimmed.fastq.gz |
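Because each sample has a forward and a reverse file, a quick sanity check is to confirm that both files exist and hold the same number of reads. The sketch below is illustrative only: `count_reads` and `check_pair` are hypothetical helper names, it assumes standard 4-line FASTQ records, and it assumes these are paired outputs in which R1 and R2 are expected to stay in sync. It decompresses every file, so it may take a while on ~5 GB of data.

```shell
# Count reads in a gzipped FASTQ file, assuming standard 4-line records.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l | tr -d ' ')
    echo $(( lines / 4 ))
}

# Hypothetical helper: check that a sample's R1 and R2 files both exist
# and contain the same number of reads. Returns non-zero on any problem.
check_pair() {
    dir="$1"
    sample="$2"
    r1="$dir/${sample}_R1_trimmed.fastq.gz"
    r2="$dir/${sample}_R2_trimmed.fastq.gz"
    if [ ! -f "$r1" ] || [ ! -f "$r2" ]; then
        echo "$sample: missing file"
        return 1
    fi
    n1=$(count_reads "$r1")
    n2=$(count_reads "$r2")
    if [ "$n1" -eq "$n2" ]; then
        echo "$sample: $n1 read pairs"
    else
        echo "$sample: mismatch (R1=$n1, R2=$n2)"
        return 1
    fi
}

# Check all eight samples, assuming the files sit in the current directory.
failed=0
for i in 1 2 3 4 5 6 7 8; do
    check_pair . "sample$i" || failed=$((failed + 1))
done
echo "$failed sample(s) with problems"
```

If the files were unpacked into a subdirectory instead, pass that path in place of `.` in the loop.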