Double phylogenetic placement of mixed samples (MISA) - KamilSJaron/k-mer-approaches-for-biodiversity-genomics GitHub Wiki

Using MISA for mixed genome skim analyses

Prepare the query and distances.

We will now place the mixed individual (a known hybrid, Saccharomyces pastorianus) onto the tree using the double-placement tool MISA.

cd $USERWORK
cd skmer-tutorial
mkdir mix-query
cp genomes/Saccharomyces_pastorianus/GCA_001515485.2_Saccharomyces_pastorianus_Weihenstephan_34_70_chromosomes_assembly_1.0_genomic.fna mix-query/Saccharomyces_pastorianus.fna

The file things.txt lists the true constituents of Saccharomyces pastorianus:

cat genomes/Saccharomyces_pastorianus/things.txt

Recall that yesterday, we used -a to add Saccharomyces cerevisiae to the reference set. Let us first infer a backbone tree that includes Saccharomyces cerevisiae.

# Update the distance matrix to include the added species Saccharomyces cerevisiae
skmer distance -t library/

# Build the full tree with Saccharomyces cerevisiae included
tsv_to_phymat.sh ref-dist-mat.txt ref-dist-mat-full.phy
fastme -i ref-dist-mat-full.phy -o full.tre
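
Before moving on, it may be worth confirming that Saccharomyces cerevisiae actually ended up in the tree. The following is a minimal sketch, assuming leaf labels use underscores (as in Saccharomyces_eubayanus). The helper `has_taxon` is hypothetical, and the toy Newick string, with a made-up topology and species list, stands in for full.tre.

```shell
# Hypothetical helper: succeed if a taxon label appears as a leaf in a Newick file.
# A leaf label is preceded by "(" or "," and followed by ":", "," or ")".
has_taxon() {
    grep -Eq "[(,]$1[,:)]" "$2"
}

# Toy stand-in for full.tre (made-up topology and branch lengths)
tree=$(mktemp)
echo '((Saccharomyces_cerevisiae:0.1,Saccharomyces_paradoxus:0.1):0.05,Saccharomyces_eubayanus:0.2,Saccharomyces_uvarum:0.2);' > "$tree"

if has_taxon Saccharomyces_cerevisiae "$tree"; then found=yes; else found=no; fi
if has_taxon Saccharomyces_mikatae "$tree"; then other=yes; else other=no; fi
echo "Saccharomyces_cerevisiae present: $found"
echo "Saccharomyces_mikatae present: $other"
rm -f "$tree"
```

On the real data the call would be `has_taxon Saccharomyces_cerevisiae full.tre`. Quoted labels or labels containing regular-expression characters are not handled by this sketch.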

Start by computing distances from the mixed query to the references.

# Run Skmer
skmer query -t mix-query/Saccharomyces_pastorianus.fna library/
# Convert output to .tsv file
convert_to_tsv.sh dist-saccharomyces_pastorianus.txt > dist-saccharomyces_pastorianus.tsv
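
Before placing the query, it can help to see which references are closest to it. The following is a minimal sketch, assuming the converted .tsv is a tab-separated table whose first row lists the reference names and whose second row starts with the query name followed by its distances. The toy file and its numbers are made up for illustration and are not results from this tutorial.

```shell
# Toy stand-in for dist-saccharomyces_pastorianus.tsv (assumed layout, made-up numbers)
tsv=$(mktemp)
printf '\tSaccharomyces_cerevisiae\tSaccharomyces_eubayanus\tSaccharomyces_paradoxus\n' > "$tsv"
printf 'Saccharomyces_pastorianus\t0.021\t0.008\t0.094\n' >> "$tsv"

# Pair each reference name (row 1) with its distance (row 2), then sort ascending.
# sort -n handles plain decimals; values in scientific notation would need extra care.
ranked=$(awk -F'\t' 'NR==1 {for (i=2; i<=NF; i++) name[i]=$i}
                     NR==2 {for (i=2; i<=NF; i++) print $i "\t" name[i]}' "$tsv" | LC_ALL=C sort -n)
echo "$ranked"

# The first line of the ranking is the closest reference
closest=$(echo "$ranked" | head -n 1 | cut -f2)
echo "closest reference: $closest"
rm -f "$tsv"
```

For a mixed sample, two references with small distances rather than one are a first hint of where the two placements may end up.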

Ignoring mixtures

Now place the sample onto the tree with APPLES, ignoring that it is a mixture.

run_apples.py -t backbone-fastme.tre -d dist-saccharomyces_pastorianus.tsv -o pastorianus-single.jplace
guppy tog pastorianus-single.jplace
nw_display pastorianus-single.tog.tre

Placement of mixed samples with both constituents present

Now run MISA on the full tree, which contains both constituents.

# Run MISA for phylogenetic double placement
run_misa.py -d dist-saccharomyces_pastorianus.tsv -t full.tre -o mixed-output-present.jplace

# Compare the output with the known constituents:
guppy tog mixed-output-present.jplace
nw_display full.tre
nw_display mixed-output-present.tog.tre

Placement of mixed samples with one constituent missing

Now let's try double placement when one of the constituents (Saccharomyces cerevisiae) is missing from the backbone.

nw_display backbone-fastme.tre

# Run MISA for phylogenetic double placement
run_misa.py -d dist-saccharomyces_pastorianus.tsv -t backbone-fastme.tre -o mixed-output.jplace

# Compare the output with the known constituents:
guppy tog mixed-output.jplace
cat genomes/Saccharomyces_pastorianus/things.txt
nw_display backbone-fastme.tre
nw_display mixed-output.tog.tre

You should see a result like the following, in which MISA correctly identifies the two parent species of Saccharomyces pastorianus.

  • Top: the full reference tree before removing Saccharomyces cerevisiae. The two blue branches are the known constituents of Saccharomyces pastorianus.
  • Bottom: the result of placing Saccharomyces pastorianus on the tree after removing Saccharomyces cerevisiae.

Placement of mixed samples with both constituents missing

Finally, prune the remaining constituent, Saccharomyces eubayanus, from the backbone so that neither constituent is present, then rerun MISA.

nw_prune backbone-fastme.tre Saccharomyces_eubayanus > backbone-noconst.tre
run_misa.py -d dist-saccharomyces_pastorianus.tsv -t backbone-noconst.tre -o mixed-output-noconst.jplace
guppy tog mixed-output-noconst.jplace
nw_display mixed-output-noconst.tog.tre
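
To double-check what the pruning step removed, the leaf sets of the two backbone trees can be compared. The following is a minimal sketch. The helper `leaves` is hypothetical, and the two toy Newick strings, with a made-up topology and species list, stand in for backbone-fastme.tre and backbone-noconst.tre.

```shell
# Hypothetical helper: print the leaf labels of a Newick file, one per line, sorted.
# A leaf label follows "(" or "," and ends at ":", "," or ")".
leaves() {
    grep -oE '[(,][^(),:;]+' "$1" | tr -d '(,' | LC_ALL=C sort
}

before=$(mktemp); after=$(mktemp)
# Toy stand-ins for backbone-fastme.tre and backbone-noconst.tre (made-up trees)
echo '((Saccharomyces_paradoxus:0.1,Saccharomyces_mikatae:0.1):0.05,Saccharomyces_eubayanus:0.2,Saccharomyces_uvarum:0.2);' > "$before"
echo '((Saccharomyces_paradoxus:0.1,Saccharomyces_mikatae:0.1):0.05,Saccharomyces_uvarum:0.4);' > "$after"

# comm -23 prints lines found only in the first sorted list:
# labels present before pruning but absent afterwards
leaves "$before" > "$before.lst"
leaves "$after" > "$after.lst"
removed=$(LC_ALL=C comm -23 "$before.lst" "$after.lst")
echo "removed by pruning: $removed"
rm -f "$before" "$after" "$before.lst" "$after.lst"
```

On the real data, pass backbone-fastme.tre and backbone-noconst.tre to `leaves` instead of the temporary files. Quoted Newick labels are not handled by this sketch.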