NextStrain - quipupe/from-assembly-to-nextstrain GitHub Wiki

Zika tutorial

These notes follow the Nextstrain Zika tutorial: https://nextstrain.org/docs/tutorials/zika

git clone https://github.com/nextstrain/zika-tutorial.git
cd zika-tutorial

Review the format of the metadata and the sequences: https://github.com/nextstrain/zika-tutorial/tree/master/data

Create a results folder at the same level as the data folder:

mkdir -p results/

Filter samples

Use augur to filter the sequences:

augur filter \
  --sequences data/sequences.fasta \
  --metadata data/metadata.tsv \
  --exclude config/dropped_strains.txt \
  --output results/filtered.fasta \
  --group-by country year month \
  --sequences-per-group 20 \
  --min-date 2012
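
The --group-by and --sequences-per-group options above keep at most 20 sequences per (country, year, month) group, with year and month taken from the date column. To see how many sequences fall into each group before filtering, something like the following sketch can help. The toy table, its three columns, and the function name group_counts are assumptions made for illustration; this is not how augur implements subsampling.

```shell
set -eu

# Toy metadata table; column names and rows are illustrative assumptions.
tmp=$(mktemp)
printf 'strain\tdate\tcountry\n' > "$tmp"
printf 'A1\t2015-12-02\tColombia\n' >> "$tmp"
printf 'A2\t2015-12-20\tColombia\n' >> "$tmp"
printf 'B1\t2016-01-05\tBrazil\n' >> "$tmp"

# group_counts FILE: count rows per (country, year, month).
# Columns are located by header name, so their order does not matter.
group_counts() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      split($(col["date"]), d, "-")        # d[1] = year, d[2] = month
      n[$(col["country"]) "\t" d[1] "\t" d[2]]++
    }
    END { for (k in n) print n[k] "\t" k }
  ' "$1" | sort -k2
}

counts=$(group_counts "$tmp")
echo "$counts"
rm -f "$tmp"
```

If no group holds more than 20 sequences, subsampling has nothing to remove, which matches a report of 0 sequences dropped because of subsampling criteria.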

These are my results. The input contained 34 sequences, counted with:

grep -c ">" data/sequences.fasta

augur filter reported:

1 sequences were dropped during filtering
1 of these were dropped because they were in config/dropped_strains.txt                                                                                             
0 of these were dropped because of their date (or lack of date)                                                                                                     
0 of these were dropped because of subsampling criteria                                                                                                     
33 sequences have been written out to results/filtered.fasta 
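
The numbers above can be cross-checked by counting FASTA records before and after filtering (34 - 1 = 33 in this run). A minimal sketch on toy files follows. The helper name count_seqs and the temporary files are hypothetical; in the tutorial directory the same function would be pointed at data/sequences.fasta and results/filtered.fasta.

```shell
set -eu

# count_seqs FILE: number of FASTA records, i.e. lines that start with ">".
count_seqs() {
  grep -c '^>' "$1" || true
}

# Toy input with three records and toy output with two records.
workdir=$(mktemp -d)
printf '>s1\nACGT\n>s2\nACGA\n>s3\nACGG\n' > "$workdir/sequences.fasta"
printf '>s1\nACGT\n>s3\nACGG\n' > "$workdir/filtered.fasta"

before=$(count_seqs "$workdir/sequences.fasta")
after=$(count_seqs "$workdir/filtered.fasta")
dropped=$((before - after))
echo "before=$before after=$after dropped=$dropped"
rm -rf "$workdir"
```

Anchoring the pattern with ^ counts only header lines, which is slightly stricter than matching ">" anywhere on a line.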

Align with MAFFT

augur align \
  --sequences results/filtered.fasta \
  --reference-sequence config/zika_outgroup.gb \
  --output results/aligned.fasta \
  --fill-gaps

The --fill-gaps flag fills gaps in non-reference sequences with "N" characters.
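
To make the effect of --fill-gaps concrete, here is a toy stand-in that replaces "-" with "N" in every record except the one named as reference. It only mimics the behaviour described above on a two-sequence example; the function name fill_gaps and the record names are assumptions, and augur's own implementation may differ.

```shell
set -eu

# fill_gaps REF: read FASTA on stdin, replace "-" with "N" on sequence lines,
# leaving header lines and the record named REF untouched.
fill_gaps() {
  awk -v ref="$1" '
    /^>/ { name = substr($0, 2); print; next }   # remember the current record name
    name != ref { gsub(/-/, "N") }               # fill gaps in non-reference records
    { print }
  '
}

filled=$(printf '>ref\nAC-GT\n>sample1\nA--GT\n' | fill_gaps ref)
echo "$filled"
```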

Output

using mafft to align via:
    mafft --reorder --anysymbol --nomemsave --adjustdirection --thread 1 results/aligned.fasta.to_align.fasta 1> results/aligned.fasta 2> results/aligned.fasta.log

    Katoh et al, Nucleic Acid Research, vol 30, issue 14
    https://doi.org/10.1093%2Fnar%2Fgkf436
16bp insertion at ref position 0                                                                                                                                            
AGTTGTTGATCTGTGT: ZKC2/2016                                                                                                                                         
TCTGTGT: SMGC_1                                                                                                                                                     
AGTAGTTGATCTGTGT: EcEs062_16                                                                                                                                        
AGTTGTTACTGTTGCT: VEN/UF_1/2016                                                                                                                                     
GTTGTTGATCTGTGT: PRVABC59                                                                                                                                           
GTGT: USA/2016/FLUR022                                                                                                                                      
1bp insertion at ref position 61
T: 1_0087_PF, 1_0181_PF, 1_0199_PF, ZKC2/2016, SMGC_1, EcEs062_16, PAN/CDC_259359_V1_V3/2015, COL/FLR_00024/2015, COL/FLR_00008/2015, VEN/UF_1/2016, Colombia/2016/ZC204Se, HND/2016/HU_ME59, Nica1_16, PRVABC59, USA/2016/FL022, BRA/2016/FC_6706, DOM/2016/BB_0433, DOM/2016/BB_0183, DOM/2016/MA_WGS16_011, USA/2016/FLUR022, Aedes_aegypti/USA/2016/FL05, SG_027, SG_074, SG_056, Thailand/1610acTw
26bp insertion at ref position 10769                                                                                                                                        
TGTGGGGAAATCCATGGGTCT: ZKC2/2016, PAN/CDC_259359_V1_V3/2015                                                                                                         
TGTGGGGA: SMGC_1                                                                                                                                                    
TGTGGGGAAATCCATGGGAGATCGGA: EcEs062_16                                                                                                                              
TGTGGGGAAATCCATGGGTCTT: VEN/UF_1/2016                                                                                                                               
TGTGGGGAAATC: USA/2016/FLUR022                                                                                                                              
Trimmed gaps in KX369547.1 from the alignment  

The reference genome is located at config/zika_outgroup.gb.

Build the phylogeny with IQ-TREE

augur tree \
  --alignment results/aligned.fasta \
  --output results/tree_raw.nwk
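
A quick sanity check on the resulting Newick file is to count its tips and compare that with the number of sequences in the alignment. In a Newick string the number of tips equals the number of commas plus one, provided no tip name contains a comma. The sketch below uses a toy tree; the helper name count_tips and the temporary file are hypothetical, and in the tutorial directory the function would be pointed at results/tree_raw.nwk.

```shell
set -eu

# count_tips FILE: number of tips in a Newick tree (commas + 1).
# Assumes tip names contain no commas.
count_tips() {
  commas=$(tr -cd ',' < "$1" | wc -c)
  echo $((commas + 1))
}

# Toy tree with three tips: A, B and C.
tree=$(mktemp)
printf '((A:0.1,B:0.2):0.05,C:0.3);\n' > "$tree"

tips=$(count_tips "$tree")
echo "tips=$tips"
rm -f "$tree"
```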

Calibrate the tree

augur refine \
  --tree results/tree_raw.nwk \
  --alignment results/aligned.fasta \
  --metadata data/metadata.tsv \
  --output-tree results/tree.nwk \
  --output-node-data results/branch_lengths.json \
  --timetree \
  --coalescent opt \
  --date-confidence \
  --date-inference marginal \
  --clock-filter-iqd 4

Reconstruct ancestral traits

augur traits \
  --tree results/tree.nwk \
  --metadata data/metadata.tsv \
  --output results/traits.json \
  --columns region country \
  --confidence

OUTPUT
augur traits is using TreeTime version 0.7.6
Assigned discrete traits to 33 out of 33 taxa.

NOTE: previous versions (<0.7.0) of this command made a 'short-branch
length assumption. TreeTime now optimizes the overall rate numerically
and thus allows for long branches along which multiple changes
accumulated. This is expected to affect estimates of the overall rate
while leaving the relative rates mostly unchanged.
Assigned discrete traits to 33 out of 33 taxa.

NOTE: previous versions (<0.7.0) of this command made a 'short-branch
length assumption. TreeTime now optimizes the overall rate numerically
and thus allows for long branches along which multiple changes
accumulated. This is expected to affect estimates of the overall rate
while leaving the relative rates mostly unchanged.

Inferred ancestral states of discrete character using TreeTime:
    Sagulenko et al. TreeTime: Maximum-likelihood phylodynamic analysis
    Virus Evolution, vol 4, https://academic.oup.com/ve/article/4/1/vex042/4794731

results written to results/traits.json

Reconstruct ancestral sequences

augur ancestral \
  --tree results/tree.nwk \
  --alignment results/aligned.fasta \
  --output-node-data results/nt_muts.json \
  --inference joint

OUTPUT
augur ancestral is using TreeTime version 0.7.6
/home/pipo/.local/lib/python3.8/site-packages/treetime/aa_models.py:88: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  _BLOSUM45 = np.array([
Inferred ancestral sequence states using TreeTime:                                                                                                                          
Sagulenko et al. TreeTime: Maximum-likelihood phylodynamic analysis                                                                                                 
Virus Evolution, vol 4, https://academic.oup.com/ve/article/4/1/vex042/4794731                                                                                                                                                                                                                                                  
ancestral mutations written to results/nt_muts.json

Translate mutations into amino acids

augur translate \
  --tree results/tree.nwk \
  --ancestral-sequences results/nt_muts.json \
  --reference-sequence config/zika_outgroup.gb \
  --output results/aa_muts.json

OUTPUT
Read in 13 features from reference sequence file                                                                                                                    
amino acid mutations written to results/aa_muts.json 

Export the results in JSON format

augur export v2 \
  --tree results/tree.nwk \
  --metadata data/metadata.tsv \
  --node-data results/branch_lengths.json \
              results/traits.json \
              results/nt_muts.json \
              results/aa_muts.json \
  --colors config/colors.tsv \
  --lat-longs config/lat_longs.tsv \
  --auspice-config config/auspice_config.json \
  --output auspice/zika.json

OUTPUT
Validating schema of 'results/aa_muts.json'...
Validating config file config/auspice_config.json against the JSON schema                                                                                           
Validating schema of 'config/auspice_config.json'...                                                                                                                
Validating produced JSON                                                                                                                                            
Validating schema of 'auspice/zika.json'...                                                                                                                         
Validating that the JSON is internally consistent...                                                                                                                
Validation of 'auspice/zika.json' succeeded.
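
The export step depends on the four node-data files written by the earlier steps, so it can be worth confirming they all exist before running it. The sketch below does this on a throwaway directory in which one file is deliberately absent. The function name missing_inputs and the demo directory are hypothetical; only the file names come from the commands above.

```shell
set -eu

# missing_inputs FILE...: print every path that is not an existing regular file.
missing_inputs() {
  for f in "$@"; do
    [ -f "$f" ] || echo "$f"
  done
}

# Demo layout: three of the four node-data files are present.
demo=$(mktemp -d)
mkdir -p "$demo/results"
touch "$demo/results/branch_lengths.json" \
      "$demo/results/traits.json" \
      "$demo/results/nt_muts.json"

missing=$(cd "$demo" && missing_inputs \
  results/branch_lengths.json results/traits.json \
  results/nt_muts.json results/aa_muts.json)
echo "missing: ${missing:-none}"
rm -rf "$demo"
```

Run from the tutorial directory with the same four paths, an empty result means every node-data input for augur export is in place.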

View the results

The analysis above creates a folder called auspice. Serve it with:

auspice.js view --datasetDir auspice/

The results can then be viewed locally at http://localhost:4000