"Species tree" estimation - barrettlab/2021-Genomics-bootcamp GitHub Wiki

  1. Build gene trees with IQtree2
  2. Estimate "species tree" with ASTRAL5
  3. Summarize concordance and conflict with phyparts
  4. make figure with 'phypartspiechatrts.py'

install iqtree2

### cd /usr/local/bin
### sudo wget https://github.com/iqtree/iqtree2/releases/download/v2.1.3/iqtree-2.1.3-Linux.tar.gz
### tar -xvf iqtree-2.1.3-Linux.tar.gz
### cd /iqtree-2.1.3-Linux/bin
### sudo chmod 777 iqtree2
### sudo cp iqtree2 ../..
### cd /data/cbarrett/hybpiper_test

1. Check to see if iqtree is in path and works

iqtree2

Created a partitioned nexus file from all fasta alignments in TriFusion

Then, copy and cut the partition out of the resulting nexus file, and save as 'partition.txt'

Save the nexus file -- removed CghiesbreghtianaOUT and violacea2, both had > 90% missing data

Run command to run model selection separately for each locus and generate gene trees, using 28 cores and 1000 ultrafast bootstraps

This might take a while, so sit tight, or go do something else!

nohup iqtree2 -s Roystonea_concat_partition.nex.nex.nex -S partition.txt --prefix genes -nt 28 -bb 1000 &

2. Run astral-5

java -jar /usr/local/src/ASTRAL/astral.5.7.8.jar -i genes.treefile --outgroup OpalindanOUT -o loci_astral.tre 2> astral_log_loci.log

## reroot tree based on an outgroup
## Create a blank file with nano

nano outgroup.list
type 'OpalindanOUT'
ctrl+X and save

Now, run the reroot_trees python script

python3.8 python /data/cbarrett/hybpiper_test/reroot_trees.py loci.treefile outgroup.list > rerooted.treefile  

3. Run phyparts

See the following tutorial link And PhyParts link

java -jar /usr/local/bin/phyparts/target/phyparts-0.0.1-SNAPSHOT-jar-with-dependencies.jar -a 1 -v -d rerooted.treefile -m loci_astral.tre -o output_file_phyparts

### -a = the type of analysis
### -v = verbose
### -m = target "species tree" from ASTRAL
### -d = directory with gene trees; here it is just a single phylip file of all 749 trees

4. Run 'phypartspiecharts.py' to summarize pie charts on the "species tree"

phypartspycharts.py script

python phypartspiecharts.py loci_astral.tre output_file_phyparts 749 --svg_name piechart_tree.svg


### specify target "species tree"
### include output name from phyparts (I just used "output_file_phyparts")
### specify the number of gene trees, here it is 749

5. Coalescent analysis with SVDquartets

In Geneious, masked missing data at 10, 25, 50, and 75%, plus the original matrix

Exported nexus files from Geneious

Commands for running svdquartets in paup:

## start paup
paup4a168_ubuntu64

## read in a dataset, e.g.
execute Roystonea_contatenated_all.nex

## Remove 2 taxa with >90% missing data
delete CghiesbreghtianaOUT violacea2

## Define outgroup
outgroup OpalindanOUT

## Run svdquartets with 500 std bootstraps, coalescent model (default), sampling all quartets
svdq evalQuartets=all bootstrap=standard nreps=500 nthreads=28 treefile=Roystonea_contatenated_all_10.tre;

## Now, read in the trees you just calculated (may need to specify maxtrees limit, say, 10000)
execute Roystonea_contatenated_all_10.tre;

## Compute 50% majority rule consensus of trees
contree all/strict=no majrule=yes usetreewts=yes treefile=bootMajRule_10.tre;

## Now you can edit this tree in Geneious or FigTree

## Repeat for all other matrices

5A. Concatenated trees with IQtree, same model for entire alignment

iqtree2 -m MFP -s iqtree_all.nex -o OpalindanOUT -nt 28 -bb 1000 -pre iqtree_all

5b. Concatenated trees with IQtree, with model partition

iqtree2 -m MFP -s iqtree_all.nex -o OpalindanOUT -nt 28 -bb 1000 -pre iqtree_all

-m = test all possible models (single model for entire concatenated alignment
-s = input file
-o = outgroup
-nt = num threads
-bb = bootstrap reps
-pre = prefix for output

7. Concordance factors: gene and site concordance

## Use concat tree from IQtree:

iqtree2 -t iqtree_all.contree -s iqtree_all.nex --gcf genes.treefile --scf 10000 --prefix concord -T 28

## Use coalescent tree from ASTRAL

iqtree2 -t loci_astral.tre -s iqtree_all.nex --gcf genes.treefile --scf 10000 --prefix concord_astraltree -T 28

-t = target tree
-s = input alignment
--gcf = gene tree concordance factor (here a multi-nexus file with gene trees from astral)
--scf = number of replicate quartets for site concordance factor calculation