Task - Pas-Kapli/tutorials GitHub Wiki
In this tutorial we delimit the species for two datasets:
- 57 cytochrome b sequences of the Branchiomma worms of the Sabellidae family.
- 517 cytochrome c oxidase subunit I sequences of the genus Carabus.
In the Branchiomma dataset, the pairwise distances form two distinct clusters, i.e. there is a clear barcoding gap.
In the Carabus dataset, by contrast, this distinction is much less clear.
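To make the idea of "two distinct clusters" of pairwise distances concrete, here is a minimal sketch in Python. It assumes the sequences are already aligned and uses uncorrected p-distances. The toy sequences and the helper names (`p_distance`, `pairwise_distances`, `largest_gap`) are hypothetical and are not part of the tutorial datasets or of any of the tools used later.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    Returns None if no site can be compared.
    """
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    return diffs / compared if compared else None


def pairwise_distances(seqs):
    """Sorted list of p-distances over all pairs in a {name: sequence} dict."""
    dists = (p_distance(seqs[i], seqs[j]) for i, j in combinations(seqs, 2))
    return sorted(d for d in dists if d is not None)


def largest_gap(dists):
    """Return the (lower, upper) pair of consecutive sorted distances
    that are furthest apart, a crude stand-in for the barcoding gap."""
    return max(zip(dists, dists[1:]), key=lambda pair: pair[1] - pair[0])


# Hypothetical toy alignment: two groups of two sequences each.
toy = {
    "a1": "ACGTACGTAC",
    "a2": "ACGTACGTAT",
    "b1": "TTGTACCTGC",
    "b2": "TTGTACCTGA",
}

dists = pairwise_distances(toy)
print(dists)               # [0.1, 0.1, 0.4, 0.5, 0.5, 0.5]
print(largest_gap(dists))  # (0.1, 0.4)
```

In this toy case the small within-group distances (0.1) are well separated from the between-group distances (0.4 to 0.5). When the two sets of distances overlap, the largest gap shrinks and no longer separates them cleanly.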
By the end of all the tutorials, we want to fill in the following two tables.
Table 1:
| Method | Nsp. Branchiomma | Nsp. Carabus |
|---|---|---|
| Crop ST = 97 | ? | ? |
| ABGD ST = 97 | ? | ? |
| Vsearch STcf = 97 | ? | ? |
| Vsearch STcsm = 97 | ? | ? |
| Vsearch STcs = 97 | ? | ? |
| multi ptp | ? | ? |
| single ptp | ? | ? |
| GMYC strict_clock | ? | ? |
Table 2:
| Method | Nsp. Branchiomma | Nsp. Carabus |
|---|---|---|
| Crop ST = 97 | ? | ? |
| Crop ST = 98 | ? | ? |
| Crop ST = 99 | ? | ? |
| ABGD ST = 97 | ? | ? |
| ABGD ST = 98 | ? | ? |
| ABGD ST = 99 | ? | ? |
| Vsearch STcf = 97 | ? | ? |
| Vsearch STcf = 98 | ? | ? |
| Vsearch STcf = 99 | ? | ? |
| Vsearch STcsm = 97 | ? | ? |
| Vsearch STcsm = 98 | ? | ? |
| Vsearch STcsm = 99 | ? | ? |
| Vsearch STcs = 97 | ? | ? |
| Vsearch STcs = 98 | ? | ? |
| Vsearch STcs = 99 | ? | ? |
| multi ptp | ? | ? |
| single ptp | ? | ? |
| GMYC strict_clock | ? | ? |
| GMYC LN_clock | ? | ? |
Based on the results, we want to gain some insight into the following questions:
- Is the performance of the methods affected by the barcoding gap?
- Does the similarity threshold affect the delimitation in each method? Are some methods more sensitive to the threshold than others?
- Is there a pattern in the performance of the methods?
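As a rough illustration of why the similarity threshold (ST) can change the number of delimited species, the sketch below counts clusters at ST = 97, 98 and 99 on a hypothetical toy alignment. It uses plain single-linkage clustering on uncorrected pairwise similarity. This is a deliberate simplification chosen for brevity, not the algorithm used by CROP, ABGD or VSEARCH, and all names and sequences here are assumptions.

```python
from itertools import combinations


def similar_enough(a, b, similarity_pct):
    """True if two aligned sequences are at least similarity_pct percent identical.

    Integer arithmetic avoids floating-point edge cases at the threshold.
    """
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    return compared > 0 and diffs * 100 <= (100 - similarity_pct) * compared


def count_clusters(seqs, similarity_pct):
    """Number of single-linkage clusters at the given similarity threshold."""
    parent = {name: name for name in seqs}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if similar_enough(seqs[a], seqs[b], similarity_pct):
            parent[find(a)] = find(b)
    return len({find(name) for name in seqs})


def mutate(seq, positions):
    """Substitute the base at each given position with a different base."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Hypothetical 100 bp toy alignment.
base = "ACGT" * 25
toy = {
    "s1": base,
    "s2": mutate(base, [0]),             # 1 difference from s1
    "s3": mutate(base, [0, 1, 2]),       # 3 differences from s1, 2 from s2
    "s4": mutate(base, range(50, 60)),   # at least 10 differences from all others
}

cluster_counts = {st: count_clusters(toy, st) for st in (97, 98, 99)}
print(cluster_counts)  # {97: 2, 98: 2, 99: 3}
```

In this toy example the cluster count is stable between 97 and 98 but increases at 99, because `s3` is no longer linked to `s2`. The real tools may respond to the threshold differently, which is what the tables above are meant to reveal.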