NextStrain - quipupe/from-assembly-to-nextstrain GitHub Wiki
Zika tutorial
https://nextstrain.org/docs/tutorials/zika
git clone https://github.com/nextstrain/zika-tutorial.git
Check the format of the metadata and of the sequences: https://github.com/nextstrain/zika-tutorial/tree/master/data
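Before running augur, it can help to confirm that every sequence name in the FASTA file also appears in the metadata. The sketch below is one way to do that with the Python standard library. It assumes the metadata is tab-separated with a `strain` column holding the sequence names; the sample data and the function names are made up for illustration.

```python
import csv
import io

def fasta_names(fasta_text):
    """Return the text after '>' on every FASTA header line."""
    return [line[1:].strip() for line in fasta_text.splitlines() if line.startswith(">")]

def metadata_strains(tsv_text, name_column="strain"):
    """Return the values of the name column (assumed to be 'strain') from a TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row[name_column] for row in reader]

def missing_from_metadata(fasta_text, tsv_text):
    """List sequence names that have no matching metadata row."""
    known = set(metadata_strains(tsv_text))
    return [name for name in fasta_names(fasta_text) if name not in known]

# Tiny made-up inputs, not the tutorial data.
example_fasta = ">sample_A\nACGT\n>sample_B\nACGA\n"
example_tsv = "strain\tdate\tcountry\nsample_A\t2016-01-01\tcolombia\n"
print(missing_from_metadata(example_fasta, example_tsv))  # ['sample_B']
```

To run it on the tutorial files, read `data/sequences.fasta` and `data/metadata.tsv` into strings with `open(...).read()` and pass them in.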
Create a results folder at the same level as the data folder:
mkdir -p results/
Filter samples
Use augur to filter the sequences:
augur filter \
--sequences data/sequences.fasta \
--metadata data/metadata.tsv \
--exclude config/dropped_strains.txt \
--output results/filtered.fasta \
--group-by country year month \
--sequences-per-group 20 \
--min-date 2012
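To make the effect of --group-by and --sequences-per-group more concrete, here is a rough Python sketch of grouped subsampling. It is not augur's implementation: augur's own choice of which sequences to keep within a group differs, whereas this sketch simply keeps the first ones it sees. The record layout and the function name are hypothetical.

```python
from collections import defaultdict

def subsample(records, group_by=("country", "year", "month"), per_group=20):
    """Keep at most per_group records for each combination of group_by values."""
    groups = defaultdict(list)
    for rec in records:
        # Derive year and month from an ISO-like date; missing parts become "XX".
        year, month = (rec["date"].split("-") + ["XX", "XX"])[:2]
        fields = dict(rec, year=year, month=month)
        groups[tuple(fields[k] for k in group_by)].append(rec)
    kept = []
    for members in groups.values():
        kept.extend(members[:per_group])  # simplification: first come, first kept
    return kept

# Made-up records: three share a country/year/month group, one is alone.
records = [
    {"strain": "s1", "country": "colombia", "date": "2016-01-10"},
    {"strain": "s2", "country": "colombia", "date": "2016-01-20"},
    {"strain": "s3", "country": "colombia", "date": "2016-01-25"},
    {"strain": "s4", "country": "brazil", "date": "2015-12-01"},
]
kept = subsample(records, per_group=2)
print([r["strain"] for r in kept])  # ['s1', 's2', 's4']
```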
These are my results. The input had 34 sequences, counted with grep -c ">" data/sequences.fasta:
1 sequences were dropped during filtering
1 of these were dropped because they were in config/dropped_strains.txt
0 of these were dropped because of their date (or lack of date)
0 of these were dropped because of subsampling criteria
33 sequences have been written out to results/filtered.fasta
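The numbers in this report can be cross-checked: input minus dropped should equal written (34 - 1 = 33). A minimal Python sketch of that check follows, with a header counter that mirrors the grep command for a well-formed FASTA file. The function names and the sample text are illustrative only.

```python
def count_fasta_records(fasta_text):
    """Count lines starting with '>', one per sequence in a well-formed FASTA."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))

def counts_are_consistent(n_input, n_dropped, n_written):
    """True when the filter report adds up."""
    return n_input - n_dropped == n_written

# Made-up three-record FASTA, not the tutorial data.
example = ">a\nACGT\n>b\nACGT\n>c\nACGT\n"
print(count_fasta_records(example))        # 3
print(counts_are_consistent(34, 1, 33))    # True
```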
Align with MAFFT
augur align \
--sequences results/filtered.fasta \
--reference-sequence config/zika_outgroup.gb \
--output results/aligned.fasta \
--fill-gaps
The --fill-gaps flag fills gaps in non-reference sequences with "N" characters.
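As an illustration of what that replacement looks like on aligned sequences, the following Python sketch swaps gap characters for N. It only mimics the visible effect described above and is not taken from augur's source; the alignment and names are made up.

```python
def fill_gaps(alignment):
    """Replace every '-' with 'N' in each aligned sequence."""
    return {name: seq.replace("-", "N") for name, seq in alignment.items()}

# Made-up two-sequence alignment.
aligned = {"sample_A": "ACG--TTA", "sample_B": "ACGTTTTA"}
filled = fill_gaps(aligned)
print(filled["sample_A"])  # ACGNNTTA
```

Treating gaps as N marks those positions as unknown bases rather than as deletions.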
Output
using mafft to align via: mafft --reorder --anysymbol --nomemsave --adjustdirection --thread 1 results/aligned.fasta.to_align.fasta 1> results/aligned.fasta 2> results/aligned.fasta.log
Katoh et al, Nucleic Acid Research, vol 30, issue 14
https://doi.org/10.1093%2Fnar%2Fgkf436
16bp insertion at ref position 0
AGTTGTTGATCTGTGT: ZKC2/2016
TCTGTGT: SMGC_1
AGTAGTTGATCTGTGT: EcEs062_16
AGTTGTTACTGTTGCT: VEN/UF_1/2016
GTTGTTGATCTGTGT: PRVABC59
GTGT: USA/2016/FLUR022
1bp insertion at ref position 61
T: 1_0087_PF, 1_0181_PF, 1_0199_PF, ZKC2/2016, SMGC_1, EcEs062_16, PAN/CDC_259359_V1_V3/2015, COL/FLR_00024/2015, COL/FLR_00008/2015,
VEN/UF_1/2016, Colombia/2016/ZC204Se, HND/2016/HU_ME59, Nica1_16, PRVABC59, USA/2016/FL022, BRA/2016/FC_6706, DOM/2016/BB_0433, DOM/2016/BB_0183,
DOM/2016/MA_WGS16_011, USA/2016/FLUR022, Aedes_aegypti/USA/2016/FL05, SG_027, SG_074, SG_056, Thailand/1610acTw
26bp insertion at ref position 10769
TGTGGGGAAATCCATGGGTCT: ZKC2/2016, PAN/CDC_259359_V1_V3/2015
TGTGGGGA: SMGC_1
TGTGGGGAAATCCATGGGAGATCGGA: EcEs062_16
TGTGGGGAAATCCATGGGTCTT: VEN/UF_1/2016
TGTGGGGAAATC: USA/2016/FLUR022
Trimmed gaps in KX369547.1 from the alignment
The reference genome is in config/zika_outgroup.gb
Build the phylogeny with IQ-TREE
augur tree \
--alignment results/aligned.fasta \
--output results/tree_raw.nwk
Calibrate the tree (build a time-resolved tree)
augur refine \
--tree results/tree_raw.nwk \
--alignment results/aligned.fasta \
--metadata data/metadata.tsv \
--output-tree results/tree.nwk \
--output-node-data results/branch_lengths.json \
--timetree \
--coalescent opt \
--date-confidence \
--date-inference marginal \
--clock-filter-iqd 4
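The --clock-filter-iqd 4 option asks for tips that deviate strongly from the molecular clock to be excluded. As far as I understand the TreeTime version used here, a tip is flagged when the absolute residual of its root-to-tip regression exceeds 4 interquartile distances. The Python sketch below shows that rule on made-up residuals; the function name and the input layout are assumptions, and the actual filter works on the tree itself.

```python
import statistics

def clock_outliers(residuals, n_iqd=4):
    """Return names whose |residual| exceeds n_iqd interquartile distances."""
    values = sorted(residuals.values())
    # 'inclusive' quartiles use linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqd = q3 - q1
    return [name for name, r in residuals.items() if abs(r) > n_iqd * iqd]

# Made-up residuals (divergence observed minus divergence expected from the clock).
residuals = {"a": -0.001, "b": 0.0, "c": 0.001, "d": 0.002, "e": 0.05}
print(clock_outliers(residuals))  # ['e']
```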
Reconstruct ancestral traits
augur traits \
--tree results/tree.nwk \
--metadata data/metadata.tsv \
--output results/traits.json \
--columns region country \
--confidence
OUTPUT
augur traits is using TreeTime version 0.7.6
Assigned discrete traits to 33 out of 33 taxa.
NOTE: previous versions (<0.7.0) of this command made a 'short-branch length assumption. TreeTime now optimizes the overall rate numerically and thus allows for long branches along which multiple changes accumulated. This is expected to affect estimates of the overall rate while leaving the relative rates mostly unchanged.
Assigned discrete traits to 33 out of 33 taxa.
NOTE: previous versions (<0.7.0) of this command made a 'short-branch length assumption. TreeTime now optimizes the overall rate numerically and thus allows for long branches along which multiple changes accumulated. This is expected to affect estimates of the overall rate while leaving the relative rates mostly unchanged.
Inferred ancestral states of discrete character using TreeTime:
Sagulenko et al. TreeTime: Maximum-likelihood phylodynamic analysis
Virus Evolution, vol 4, https://academic.oup.com/ve/article/4/1/vex042/4794731
results written to results/traits.json
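To get a quick look at what was inferred, the traits file can be summarized in Python. The sketch below assumes the node-data layout is a top-level "nodes" object mapping each node name to its inferred values (for example "region" and "country"); that layout, the function name, and the sample values are assumptions for illustration.

```python
from collections import Counter

def count_trait(node_data, trait):
    """Count how many nodes carry each value of the given trait."""
    return Counter(
        attrs[trait] for attrs in node_data["nodes"].values() if trait in attrs
    )

# Made-up node data in the assumed layout.
example = {
    "nodes": {
        "NODE_0000000": {"region": "Oceania", "country": "french_polynesia"},
        "tip_1": {"region": "South America", "country": "colombia"},
        "tip_2": {"region": "South America", "country": "brazil"},
    }
}
region_counts = count_trait(example, "region")
print(region_counts.most_common(1))  # [('South America', 2)]
```

For the real file, load it first with `json.load(open("results/traits.json"))`.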
Reconstruct ancestral sequences
augur ancestral \
--tree results/tree.nwk \
--alignment results/aligned.fasta \
--output-node-data results/nt_muts.json \
--inference joint
OUTPUT
augur ancestral is using TreeTime version 0.7.6
/home/pipo/.local/lib/python3.8/site-packages/treetime/aa_models.py:88: VisibleDeprecationWarning: Creating an ndarray from ragged nested
sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you
must specify 'dtype=object' when creating the ndarray
_BLOSUM45 = np.array([
Inferred ancestral sequence states using TreeTime:
Sagulenko et al. TreeTime: Maximum-likelihood phylodynamic analysis
Virus Evolution, vol 4, https://academic.oup.com/ve/article/4/1/vex042/4794731
ancestral mutations written to results/nt_muts.json
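The mutations in results/nt_muts.json are, as far as I can tell, stored per node as strings such as "A123G" (original base, position, new base). A small Python sketch for splitting such a string into its parts is shown below; the string format is an assumption and the function name is hypothetical.

```python
import re

# One letter (or gap), a position, and one letter (or gap).
MUTATION = re.compile(r"^([A-Z-])(\d+)([A-Z-])$")

def parse_mutation(mut):
    """Split a mutation string like 'A123G' into (from_base, position, to_base)."""
    match = MUTATION.match(mut)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mut!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt

print(parse_mutation("A123G"))  # ('A', 123, 'G')
```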
Translate mutations into amino acids
augur translate \
--tree results/tree.nwk \
--ancestral-sequences results/nt_muts.json \
--reference-sequence config/zika_outgroup.gb \
--output results/aa_muts.json
OUTPUT
Read in 13 features from reference sequence file
amino acid mutations written to results/aa_muts.json
Export the results in JSON format
augur export v2 \
--tree results/tree.nwk \
--metadata data/metadata.tsv \
--node-data results/branch_lengths.json \
results/traits.json \
results/nt_muts.json \
results/aa_muts.json \
--colors config/colors.tsv \
--lat-longs config/lat_longs.tsv \
--auspice-config config/auspice_config.json \
--output auspice/zika.json
Validating schema of 'results/aa_muts.json'...
Validating config file config/auspice_config.json against the JSON schema
Validating schema of 'config/auspice_config.json'...
Validating produced JSON
Validating schema of 'auspice/zika.json'...
Validating that the JSON is internally consistent...
Validation of 'auspice/zika.json' succeeded.
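As an extra sanity check after the export, the number of tips in the exported tree can be compared with the 33 sequences written by augur filter (the two may differ if later steps removed any tips). The Python sketch below assumes the exported JSON keeps the tree under a top-level "tree" key, with internal nodes listing their descendants under "children"; the sample tree and the function name are made up.

```python
def count_tips(node):
    """Count leaves in a nested tree where internal nodes have a 'children' list."""
    children = node.get("children", [])
    if not children:
        return 1
    return sum(count_tips(child) for child in children)

# Made-up tree in the assumed layout: one internal node below the root.
example_tree = {
    "name": "NODE_0000000",
    "children": [
        {"name": "tip_1"},
        {"name": "NODE_0000001", "children": [{"name": "tip_2"}, {"name": "tip_3"}]},
    ],
}
print(count_tips(example_tree))  # 3
```

For the real file, something like `count_tips(json.load(open("auspice/zika.json"))["tree"])` should work under the same assumption.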
View the results
The analysis above creates a folder called auspice.
auspice.js view --datasetDir auspice/
The results can be viewed locally at http://localhost:4000