Wall lizard dataset - bpp/bpp-tutorial-geneflow GitHub Wiki
Sequence data
The Lacertidae are the family of wall lizards, also known as true lizards or sometimes simply lacertas, and are native to Afro-Eurasia. It is a diverse family with about 360 species in 39 genera, and it is the most common reptile group in Europe.
There have been several attempts to resolve the relationships among the lacertid genera using both genetic and morphological markers. However, most phylogenetic inference efforts to date have yielded unresolved or conflicting topologies. The most problematic relationships in the family are those among the 19 genera of Lacertini, which mostly occur across Eurasia. All attempts to resolve the relationships among these genera have produced topologies with short internal branches and long external branches, thus resembling a “bush”.
In this tutorial, we will use data from Garcia-Porta et al. (2019). The authors performed RNA-seq and identified orthologous sequences based on a previously compiled set of markers across vertebrates. The dataset contains 6,269 loci in total. The relevant phylogeny is shown in panel a of the figure below.
For practical reasons, we will use a subset of 50 loci for 20 samples, corresponding to 18 species and 17 genera. Compared to the original data shown in panel a of the figure above, we have removed the two most distant outgroups, Gallotia and Psammodromus, and two problematic taxa, Hellenolacerta and one of the Podarcis liolepis samples.
What makes a phylogeny difficult to infer?
- Short time separating speciation events (limited phylogenetic signal, incomplete lineage sorting, hybridization)
- Long time since speciation events (homoplasy, heterogeneous evolution)
Pre-processing
Prepare the folders that we will use throughout the tutorial:
mkdir lizard-exercise
cd lizard-exercise
mkdir data alignments concatenation phylo filtered_alignments
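If you expect to repeat the setup, a more forgiving variant may be useful. The sketch below is an alternative to the three commands above, to be run from the same starting directory. It assumes the same folder names and uses `mkdir -p`, which creates missing parent folders and does not fail when a folder already exists.

```shell
# Create the tutorial folders. -p creates parent folders as needed and
# does not fail if a folder already exists, so this is safe to re-run.
for d in data alignments concatenation phylo filtered_alignments; do
    mkdir -p "lizard-exercise/$d"
done

# Continue from inside the exercise folder, as in the commands above.
cd lizard-exercise
ls
```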
Go to the data folder, then download and uncompress the compressed FASTA files:
cd data
wget https://github.com/bpp/bpp-tutorial-geneflow/raw/refs/heads/main/first-day/fasta-files.tar.gz
tar -xvzf fasta-files.tar.gz
rm fasta-files.tar.gz
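A quick count after unpacking can catch an incomplete download. The sketch below makes two assumptions that are not stated on this page: that the archive unpacks directly into the current folder, and that each locus is stored in its own file named `locus_<n>.fasta`. If both hold, the count should match the 50 loci in the subset. The helper name `count_loci` is only illustrative.

```shell
# Count FASTA files in a folder (default: the current folder).
# Assumption: one file per locus, named locus_<n>.fasta.
count_loci() {
    find "${1:-.}" -maxdepth 1 -type f -name 'locus_*.fasta' | wc -l | tr -d ' '
}

n_loci=$(count_loci .)
echo "Found $n_loci locus files"
```

If the number differs from 50, re-running the download step is probably the simplest fix.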
Have a look at the data:
less -S locus_1.fasta
You can use alan to display one of the alignments in the terminal:
alan locus_1.fasta
Type :q to exit the alan viewer.
You will notice that the sequences are not aligned! We need to align them first before doing phylogenetic inference.
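One way to confirm this without a viewer is to compare sequence lengths: in an aligned file every sequence has the same length, gaps included. The sketch below is a minimal check under that assumption, using only `awk` and standard shell tools. The function names `seq_lengths` and `is_equal_length` are illustrative and not part of the tutorial. Note that equal lengths are necessary for an alignment but do not prove that a file is aligned.

```shell
# Print "<length><TAB><name>" for every sequence in one FASTA file.
# Sequences wrapped over several lines are summed into one length.
seq_lengths() {
    awk '
        /^>/ {
            if (name != "") print len "\t" name
            name = substr($0, 2); len = 0; next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (name != "") print len "\t" name }
    ' "$1"
}

# Succeed only if all sequences in the file share a single length.
is_equal_length() {
    distinct=$(seq_lengths "$1" | cut -f 1 | sort -u | wc -l | tr -d ' ')
    [ "$distinct" -eq 1 ]
}

# Example on the first locus, if it is present in the current folder.
if [ -f locus_1.fasta ]; then
    seq_lengths locus_1.fasta
    if is_equal_length locus_1.fasta; then
        echo "All sequences have the same length"
    else
        echo "Sequence lengths differ: the file is not aligned"
    fi
fi
```

The same check can be repeated on the output of the alignment step to verify that every locus was processed.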
Next: Alignment and filtering