Analysis 5: Binning - cecilia-andersson/Genome_Analysis_Project GitHub Wiki
First, I made a depth.txt file using BWA-MEM and samtools to use when binning with MetaBat. To do this, I first mapped the DNA reads (a fasta file) back to the contigs from the assembly using BWA-MEM. I then sorted and indexed the alignments using samtools, piping the BWA-MEM output straight into the sort step so that the large intermediate SAM files never had to be written to disk. To create the actual depth file, I used the 'jgi_summarize_bam_contig_depths' program that ships with MetaBat, which takes the contigs file from the assembly and the sorted BAM file of mapped reads as inputs.
After creating the depth file, I was able to run MetaBat using the contigs and the depth file as inputs.
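The steps above can be sketched as a small Python helper that only assembles the shell commands, so they can be reviewed before anything is run. This is a minimal sketch, not the exact commands used in this project: the file names (`contigs.fasta`, `reads.fasta`, `mapped.sorted.bam`, `bins/bin`), the thread count, and the function name `build_binning_commands` are all assumptions, and the flags should be checked against the installed versions of BWA, samtools, and MetaBat.

```python
import shlex


def build_binning_commands(contigs, reads, out_prefix="mapped",
                           depth_file="depth.txt", threads=4):
    """Return the shell commands for mapping, depth calculation, and binning.

    All paths are hypothetical placeholders; nothing is executed here.
    """
    q = shlex.quote
    bam = f"{out_prefix}.sorted.bam"
    return [
        # Index the assembled contigs so BWA can use them as the reference.
        f"bwa index {q(contigs)}",
        # Map the reads to the contigs and pipe directly into samtools sort,
        # so no intermediate SAM file is written to disk.
        f"bwa mem -t {threads} {q(contigs)} {q(reads)} "
        f"| samtools sort -@ {threads} -o {q(bam)} -",
        # Index the sorted BAM file.
        f"samtools index {q(bam)}",
        # Summarize per-contig coverage depth from the sorted BAM file.
        f"jgi_summarize_bam_contig_depths --outputDepth {q(depth_file)} "
        f"--referenceFasta {q(contigs)} {q(bam)}",
        # Bin the contigs using the assembly and the depth file.
        f"metabat2 -i {q(contigs)} -a {q(depth_file)} -o bins/bin",
    ]


if __name__ == "__main__":
    for cmd in build_binning_commands("contigs.fasta", "reads.fasta"):
        print(cmd)
```

Printing the commands first makes it easy to paste them into a job script or to run them one at a time and inspect each output before moving on.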
To think about:
- MetaBat uses information about the contig coverage and tetranucleotide frequency to classify contigs into bins. What are they? Why are they suitable features to use?

  Contig coverage: The average number of reads covering each base in a contig. This is a suitable feature for deciding which bin a contig belongs to because, in theory, all the contigs in one bin should have similar coverage: they come from the same organism, whose entire genome is present in the sample at the same abundance. For example, if one organism is very abundant in the sample, the contigs in its bin will have much higher coverage than the contigs in the bin of an organism that is rare in the sample.

  Tetranucleotide frequency: How often each possible four-base sequence (for example ACGT) occurs within a contig. This is a suitable feature because different microbes have different, characteristic biases in their tetranucleotide usage, and that signature is fairly consistent across a genome, so contigs from the same organism tend to have similar frequency profiles.
- Check how many contigs you have in your metagenome assembly. And look at how many contigs are in your bins.
- Do the numbers add up? Is this expected? What does it mean?

  The numbers do not quite add up, but this is expected: some contigs are not assigned to any bin. This can happen because a contig is chimeric, because no bin matches its coverage and tetranucleotide profile well, or because the contig is shorter than the minimum length that MetaBat considers for binning.
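The ideas in the list above can be illustrated with a short Python sketch: one function computes a tetranucleotide frequency profile for a sequence, and another lists the assembly contigs that appear in none of the bins. This is only an illustration under simplifying assumptions. It is not how MetaBat computes its features internally, the function names are made up for this example, and the toy sequences stand in for real FASTA files.

```python
from collections import Counter
from itertools import product


def read_fasta(lines):
    """Parse FASTA-formatted lines into a {contig_name: sequence} dict."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif line and name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def tetranucleotide_freqs(seq):
    """Relative frequency of each of the 256 possible 4-mers in seq."""
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Windows containing other characters (such as N) are ignored.
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


def unbinned_contigs(assembly, bins):
    """Names of assembly contigs that do not appear in any bin."""
    binned = set().union(*(b.keys() for b in bins))
    return sorted(set(assembly) - binned)


# Toy data standing in for the assembly and two bin FASTA files.
assembly = read_fasta([">c1", "ACGTACGT", ">c2", "AAAAAA", ">c3", "GGGCCC"])
bin1 = read_fasta([">c1", "ACGTACGT"])
bin2 = read_fasta([">c2", "AAAAAA"])

print(len(assembly), len(bin1) + len(bin2))   # contigs in assembly vs. bins
print(unbinned_contigs(assembly, [bin1, bin2]))
print(tetranucleotide_freqs(assembly["c1"])["ACGT"])
```

With real data, `read_fasta` could be given an open file handle for the assembly and for each bin file, and the difference between the two counts corresponds to the contigs returned by `unbinned_contigs`.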