1. Data sources and definitions - typhoidgenomics/TyphiNET GitHub Wiki

Data sources

The TyphiNET dashboard displays data from publicly available whole-genome sequences of Salmonella Typhi.

  • The raw genome sequence data (fastq files) are generated by dozens of different laboratories around the world for research and routine public health purposes, and deposited in INSDC databases (DDBJ in Japan, EMBL-EBI in Europe, NCBI in USA).
  • The Global Typhoid Genomics Consortium (GTGC) curates the sequences and source information as outlined in this paper. This includes assembling the genomes using a consistent pipeline, applying quality filters, and gathering consistent metadata from data contributors via a standard template (bit.ly/typhiMeta). Importantly, this allows us to identify or confirm genome sets that come from non-targeted sampling frames and are therefore suitable for calculating national annual prevalences.
  • Genomes and curated metadata are uploaded to the Typhi Pathogenwatch platform, where they are subjected to standardised genotyping, to identify lineage variants (genotypes according to the GenoTyphi scheme) and AMR determinants (defined below). The genomes are also available for interactive analysis in Pathogenwatch.
  • TyphiNET pulls the resulting genotypes, AMR determinants and curated metadata from Pathogenwatch, and uses these to infer resistance phenotypes for specific drugs and combinations most relevant to Typhi (defined below). The resulting TyphiNET database can be downloaded directly, as described here. Each genome is annotated with INSDC accessions for the raw data, a citation (PubMed ID or preprint DOI), and details of the originating laboratory (see lab codes here) and contact person.

The TyphiNET dashboard filters the TyphiNET database to include only genomes from non-targeted sampling frames. It uses these genomes to calculate summary statistics, such as national and annual prevalences of AMR and genotype variants, and plots them according to user preferences, using the definitions detailed below.

Genotypes

Salmonella Typhi genotypes are defined by the GenoTyphi scheme [citations: Wong et al. 2016, Dyson & Holt 2021].

Scheme definitions are hosted at https://github.com/typhoidgenomics/genotyphi, which includes software to call genotypes.

TyphiNET sources genotype calls from Typhi Pathogenwatch, which assigns genotypes to uploaded genome assemblies using in-house code.

H58 genotypes

Haplotype 58 (H58) genotypes were originally defined using a different scheme that is no longer in use. However, the name is well known in the Typhi field as a common drug-resistant lineage, so TyphiNET provides a map view of the national prevalence of H58. This view includes all genotypes starting with the prefix 4.3.1 under the GenoTyphi framework (genotypes 4.3.1, 4.3.1.1, 4.3.1.2, 4.3.1.3 and associated sublineages).
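The prefix rule can be illustrated with a minimal Python sketch. The function names `is_h58` and `h58_prevalence` are hypothetical and are not taken from the TyphiNET code base; the sketch also assumes genotype labels are dot-separated strings, and matches on "4.3.1" followed by a dot so that an unrelated label that merely begins with the same characters would not be counted.

```python
H58_PREFIX = "4.3.1"


def is_h58(genotype: str) -> bool:
    """Return True if a GenoTyphi label is 4.3.1 or one of its sublineages."""
    label = genotype.strip()
    # Accept the exact label or any dotted descendant (4.3.1.1, 4.3.1.2, ...).
    # Requiring the trailing dot avoids matching a label such as "4.3.10".
    return label == H58_PREFIX or label.startswith(H58_PREFIX + ".")


def h58_prevalence(genotypes: list) -> float:
    """Fraction of genomes in a list of genotype labels that are H58."""
    if not genotypes:
        raise ValueError("no genomes supplied")
    return sum(is_h58(g) for g in genotypes) / len(genotypes)
```

For example, `h58_prevalence(["4.3.1", "2.0.1", "4.3.1.2", "3.1"])` counts two of the four genomes as H58.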

Dominant genotypes

The dominant genotype is defined as the most common genotype present in a setting, based on counts. Where two or more genotypes are equally common for a given country, one of them is selected at random for display on the map. National genotype frequencies are shown in tooltips when the mouse hovers over a country on the map.
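A minimal Python sketch of this rule, including the random tie-break, is given here. The function name `dominant_genotype` and the optional `rng` argument are assumptions made for illustration; the dashboard's actual implementation may differ.

```python
import random
from collections import Counter


def dominant_genotype(genotypes, rng=None):
    """Return the most common genotype label, breaking ties at random.

    `genotypes` is an iterable of genotype labels for one country.
    `rng` is an optional random.Random instance (hypothetical parameter),
    useful for making the tie-break reproducible.
    """
    counts = Counter(genotypes)
    if not counts:
        return None  # no genomes for this country
    top = max(counts.values())
    # Collect every genotype that shares the highest count.
    tied = sorted(g for g, n in counts.items() if n == top)
    chooser = rng if rng is not None else random
    return chooser.choice(tied)
```

Passing a seeded `random.Random` makes the choice repeatable between page loads, which the source does not say the dashboard does.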

Drug resistances

In Salmonella Typhi, there is extremely high concordance between known molecular mechanisms of drug resistance (genes and mutations) and antimicrobial susceptibility phenotypes as determined by public health reference laboratories [citations: Chattaway et al. 2021, Argimon et al. 2021].

TyphiNET sources resistance determinants from Typhi Pathogenwatch, which screens uploaded genome assemblies as described here.

Resistance mechanisms

Definitions of antimicrobial resistance variables, interpreted from molecular mechanisms, used in TyphiNET are as follows:

Ampicillin resistant: presence of a beta-lactamase gene (usually bla TEM-1).

Azithromycin resistant: presence of a non-synonymous mutation at codon 717 of the acrB gene (see Hooda et al. 2019 for details). Note that mobile azithromycin resistance genes (mph, ere) are also screened by Pathogenwatch but have yet to be found in Salmonella Typhi.

Chloramphenicol resistant: presence of ≥1 chloramphenicol acetyltransferase gene, including catA1, cmlA.

Ciprofloxacin non-susceptible (NS): presence of ≥1 QRDR mutation or ≥1 qnr gene.

Ciprofloxacin resistant (R): presence of either (i) ≥3 QRDR mutations, or (ii) qnr gene plus ≥1 QRDR mutation.

Ceftriaxone resistant: presence of ≥1 extended spectrum beta-lactamase (ESBL) gene, including bla SHV-12, CTX-M-15, CTX-M-55.

Sulphonamide resistant: presence of one or both of the sulphonamide dihydropteroate synthase genes sul1 or sul2.

Tetracycline resistant: presence of ≥1 tetracycline efflux pump gene, including tetA(D), tetA(C), tetA(B), tetA(A).

Trimethoprim resistant: presence of ≥1 dihydrofolate reductase gene, including dfrA1, dfrA5, dfrA7, dfrA14, dfrA15, dfrA17, dfrA18.

Trimethoprim-sulfamethoxazole resistant: presence of at least one sul gene and one dfr gene.
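The rules above can be read as a small set of boolean checks. The following Python sketch encodes them under several assumptions: the `Determinants` class and `call_resistance` function are hypothetical, the gene sets contain only the examples named in the definitions (the real screening lists are longer), the gene label spellings are illustrative, and ESBL genes are assumed to count as beta-lactamases for the ampicillin call.

```python
from dataclasses import dataclass, field

# Gene sets built only from the examples named in the definitions above.
ESBL = {"blaSHV-12", "blaCTX-M-15", "blaCTX-M-55"}
# Assumption: ESBLs are beta-lactamases, so they also count for ampicillin.
BETA_LACTAMASE = {"blaTEM-1"} | ESBL
CHLORAMPHENICOL = {"catA1", "cmlA"}
SUL = {"sul1", "sul2"}
TET = {"tetA(A)", "tetA(B)", "tetA(C)", "tetA(D)"}
DFR = {"dfrA1", "dfrA5", "dfrA7", "dfrA14", "dfrA15", "dfrA17", "dfrA18"}


@dataclass
class Determinants:
    """Hypothetical per-genome summary of detected AMR determinants."""
    genes: set = field(default_factory=set)  # acquired gene names
    qrdr_mutations: int = 0                  # number of QRDR mutations
    acrb_717_mutation: bool = False          # non-synonymous change at acrB codon 717


def call_resistance(d: Determinants) -> dict:
    """Infer per-drug resistance calls from molecular determinants."""
    def has(family):
        return bool(d.genes & family)

    has_qnr = any(g.startswith("qnr") for g in d.genes)
    sul, dfr = has(SUL), has(DFR)
    return {
        "ampicillin": has(BETA_LACTAMASE),
        "azithromycin": d.acrb_717_mutation,
        "chloramphenicol": has(CHLORAMPHENICOL),
        # NS: at least one QRDR mutation or at least one qnr gene.
        "ciprofloxacin_NS": d.qrdr_mutations >= 1 or has_qnr,
        # R: three or more QRDR mutations, or qnr plus at least one mutation.
        "ciprofloxacin_R": d.qrdr_mutations >= 3
        or (has_qnr and d.qrdr_mutations >= 1),
        "ceftriaxone": has(ESBL),
        "sulphonamides": sul,
        "tetracycline": has(TET),
        "trimethoprim": dfr,
        "trimethoprim_sulfamethoxazole": sul and dfr,
    }
```

Under these rules every ciprofloxacin R genome is also NS, so the two calls are nested rather than mutually exclusive.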

Resistance variable definitions

MDR (Multidrug-resistant): resistant to all three of ampicillin, chloramphenicol, and trimethoprim-sulfamethoxazole.

XDR (Extensively drug-resistant): MDR plus resistant to both ciprofloxacin and ceftriaxone.

QRDR (Quinolone Resistance Determining Region): specific sites in chromosomal genes at which mutations are associated with quinolone resistance: gyrA (codons 83 and 87), gyrB (codon 464) and parC (codons 80 and 84).

Pan-susceptible: No AMR-associated genes or mutations detected.
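As a rough illustration, the combined categories can be derived from per-drug calls as in the Python sketch below. The function name `classify_profile`, the dictionary keys, and the `n_determinants` argument are hypothetical. The sketch also assumes that "resistant to ciprofloxacin" in the XDR definition means the ciprofloxacin resistant (R) category rather than non-susceptible (NS), which the definitions do not state explicitly.

```python
def classify_profile(calls: dict, n_determinants: int) -> dict:
    """Derive MDR, XDR and pan-susceptible flags for one genome.

    `calls` maps drug names to True/False resistance calls (hypothetical keys).
    `n_determinants` is the total number of AMR genes and mutations detected.
    """
    def resistant(drug):
        return bool(calls.get(drug, False))

    # MDR: resistant to all three first-line drugs.
    mdr = (
        resistant("ampicillin")
        and resistant("chloramphenicol")
        and resistant("trimethoprim_sulfamethoxazole")
    )
    # XDR: MDR plus ciprofloxacin (assumed R category) and ceftriaxone.
    xdr = mdr and resistant("ciprofloxacin_R") and resistant("ceftriaxone")
    # Pan-susceptible: no AMR-associated genes or mutations at all.
    return {"MDR": mdr, "XDR": xdr, "pan_susceptible": n_determinants == 0}
```

Pan-susceptibility is tested on the determinant count rather than on the per-drug calls, because a determinant could be detected without mapping to any of the drugs listed here.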

Local vs travel

TyphiNET uses the Global Typhoid Genomics Consortium (GTGC) definition of 'country of origin', which assigns each genome to the country where the infection is presumed to have been acquired [citations: Carey et al. 2023, Ingle et al. 2019]. This is typically the country where the isolate was collected. However, in some countries where typhoid is not endemic, public health agencies collect individual case travel histories to determine the country where each infection originated. These cases are recorded as travel-associated, and the country attributed in TyphiNET is the presumed country of infection based on travel history, as opposed to the country where the isolate was collected.

For example, a Typhi genome sequenced by the public health agency in England, from a patient who recently returned from India and is thought to have acquired the infection there, will be recorded as travel-associated and originating in India. In TyphiNET, this genome will contribute to the case counts for India, not for England. If a case is recorded as travel-associated but the country of travel is not known, it will be recorded as having an unknown country of origin and will be excluded from the TyphiNET dashboard.

By default, the TyphiNET dashboard includes ALL cases with a known country of origin (whether from local or travel sources). Optionally, users can choose to restrict the view to either LOCAL (isolates collected locally within the specified country) or TRAVEL (isolates collected outside the specified country, associated with travel to that country) by toggling the filter buttons.
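The attribution and filtering logic described in this section can be sketched in Python as follows. The record fields (`country_collected`, `travel_associated`, `travel_country`) and the function names are hypothetical and chosen only for illustration; they are not the column names of the TyphiNET database.

```python
from typing import Optional


def country_of_origin(rec: dict) -> Optional[str]:
    """Presumed country of infection, or None when it is unknown."""
    if rec.get("travel_associated"):
        # Travel cases are attributed to the country of travel, if known.
        return rec.get("travel_country") or None
    # Otherwise the infection is attributed to the collection country.
    return rec.get("country_collected") or None


def filter_genomes(records, view="all"):
    """Keep genomes with a known origin, optionally local-only or travel-only."""
    if view not in {"all", "local", "travel"}:
        raise ValueError("view must be 'all', 'local' or 'travel'")
    kept = []
    for rec in records:
        if country_of_origin(rec) is None:
            continue  # unknown origin: excluded from the dashboard
        is_travel = bool(rec.get("travel_associated"))
        if view == "all":
            kept.append(rec)
        elif view == "travel" and is_travel:
            kept.append(rec)
        elif view == "local" and not is_travel:
            kept.append(rec)
    return kept
```

With the England and India example, a record with `country_collected="England"`, `travel_associated=True` and `travel_country="India"` is attributed to India and appears under the "all" and "travel" views but not under "local".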

Thresholds and cutoffs for plotted data

To ensure robust estimates of both genotype frequencies and drug resistance, minimum data thresholds are applied to some plots as detailed below.

Map view

Prevalence data are shown for countries where ≥20 genome sequences are available (after applying the current filters for time period and local/travel source). Countries with <20 genomes are shown as ‘Insufficient data’.
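A minimal Python sketch of the map threshold is given here, assuming the input has already been filtered by time period and local/travel source. The function name `map_prevalence` and the input shape, a list of `(country, has_trait)` pairs, are hypothetical; `None` stands in for the 'Insufficient data' state.

```python
from collections import defaultdict

MIN_GENOMES_MAP = 20  # threshold stated for the map view


def map_prevalence(genomes):
    """Percentage prevalence per country, or None below the threshold.

    `genomes` is an iterable of (country, has_trait) pairs, where has_trait
    is True if the genome carries the genotype or resistance of interest.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for country, has_trait in genomes:
        totals[country] += 1
        if has_trait:
            positives[country] += 1
    result = {}
    for country, n in totals.items():
        if n < MIN_GENOMES_MAP:
            result[country] = None  # shown as 'Insufficient data'
        else:
            result[country] = 100 * positives[country] / n
    return result
```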

Drug resistance trends

Annual resistance frequencies are shown for those years with ≥10 genome sequences (after applying the current filters for time period, country and local/travel source). Points are not displayed for years with <10 sequences.
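The per-year cutoff can be sketched in Python in the same spirit. The function name `annual_resistance` and the input shape, a list of `(year, is_resistant)` pairs for one drug after all filters have been applied, are assumptions made for illustration.

```python
from collections import defaultdict

MIN_GENOMES_PER_YEAR = 10  # threshold stated for the trend plot


def annual_resistance(genomes):
    """Percentage of resistant genomes per year, omitting sparse years.

    `genomes` is an iterable of (year, is_resistant) pairs. Years with fewer
    than MIN_GENOMES_PER_YEAR genomes are left out, so no point is plotted.
    """
    totals = defaultdict(int)
    resistant = defaultdict(int)
    for year, is_resistant in genomes:
        totals[year] += 1
        if is_resistant:
            resistant[year] += 1
    return {
        year: 100 * resistant[year] / n
        for year, n in sorted(totals.items())
        if n >= MIN_GENOMES_PER_YEAR
    }
```

Omitting sparse years from the result, rather than returning zero, keeps a year with too few genomes from being read as a year with no resistance.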