Amplicon sequencing - uic-ric/uic-ric.github.io GitHub Wiki

Registering your project and sending samples

  1. Register for an account on RRC CrossLabs
  2. Submit a service request for a new amplicon project
    • Fill out project descriptions and sample details in submission form.
    • The UIC Genomics Core will review your submission for approval, and contact you regarding any questions and to provide a quote (if needed).
    • If basic bioinformatics processing is requested, the bioinformatics cost of the quote will be included as a separate set of line items attached to the same request form.
  3. Ship samples to the UIC Genomics Core (instructions are included in the sample submission form on CrossLabs)
    • Include a print-out of the sample submission form with your shipment.

Project initiation

  1. The UIC Genomics Core will receive your samples and verify them against the sample submission form.
  2. Project is put into the amplicon sequencing queue and matched with other appropriate samples for a sequencing run.
  3. Amplification and library preparation are performed per project description.
  4. Gel images are generated to verify amplification. Customer will be contacted only if problems are observed (e.g., no amplification, multiple bands, bands of the wrong size).

Library QC

  1. Once the first and second PCR reactions have been completed (see the pamphlet or the Naqib et al. 2018 manuscript for workflow details), samples are pooled in equal volume for a QC sequencing run.
  2. Samples are sequenced on an Illumina MiniSeq (2x153 bases) mid-output run or an Illumina MiSeq (2x150 bases) Nano run. We then examine the distribution of reads/sample (see below) and re-pool.
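
One way to picture the re-pooling step is as a volume adjustment that is inversely proportional to each sample's read count from the QC run. The sketch below illustrates that idea only; it is an assumption, not the Genomics Core's actual procedure. The function name `repool_volumes`, the sample names, the read counts and the base volume are all hypothetical.

```python
def repool_volumes(qc_reads, base_volume_ul=5.0):
    """Return a hypothetical per-sample volume (in microliters) for re-pooling.

    Assumes samples were pooled at equal volume for the QC run, so a
    sample's read count is roughly proportional to its library concentration.
    Samples with zero reads are skipped because no ratio can be computed.
    """
    nonzero = {s: n for s, n in qc_reads.items() if n > 0}
    if not nonzero:
        return {}
    # Scale against the median read count so a typical sample keeps the base volume.
    counts = sorted(nonzero.values())
    mid = len(counts) // 2
    median = counts[mid] if len(counts) % 2 else (counts[mid - 1] + counts[mid]) / 2
    return {s: base_volume_ul * median / n for s, n in nonzero.items()}


# Hypothetical QC read counts per sample.
qc_reads = {"sample_A": 20000, "sample_B": 10000, "sample_C": 5000, "sample_D": 0}
volumes = repool_volumes(qc_reads)
for sample, volume in sorted(volumes.items()):
    print(f"{sample}: {volume:.2f} ul")
```

In this toy example, the over-represented sample receives less volume and the under-represented sample receives more, while the sample with no reads is left out for manual follow-up.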

Sequencing

  1. After re-pooling, samples are sequenced on the appropriate sequencing platform (MiniSeq mid-output 2x153; MiSeq V2 2x250; MiSeq V3 2x300; HiSeq2500 2x250; NovaSeq SP 2x250). Bolded kits are the most frequently used.
  2. Data are uploaded to Illumina’s BaseSpace cloud storage and computing environment.
  3. After inspection, data are then shared with the customer using their BaseSpace email address. Customers may download their data directly from BaseSpace.

Basic bioinformatics processing

If you have requested basic annotation of the sequence data, the Genomics Core (GQC) will share the data with the bioinformatics core (Research Informatics Core; RIC).

  • The information in your sample submission form will be shared, indicating which primers were used and what type of analysis pipeline should be used. The GQC will email you when your data are shared with RIC.
  • Bioinformatics services will be invoiced separately from the RIC.
  • The RIC will then process the sequence data through a custom QIIME pipeline. Processed data will be shared on the RRC Data Portal. When data are ready, you will receive a notification from the Data Portal.

Results

The following files are typically included in the basic processing results for amplicon sequencing data. Most projects will include all of these files, but depending on the options/parameters used to process your particular dataset, some may not be present. If you have any questions about the results from the basic processing, please contact the Research Informatics Core ([email protected]).

  • report.html - Basic processing report. This report includes the basic parameters/options for the various processing steps in the pipeline, along with basic statistics about the results of each step. The processing steps may include:
    • Read merging - The two reads of the pair for each DNA fragment/cluster are merged into a single sequence.
    • Sequence trimming - Sequences are trimmed to remove adapters, low-quality regions and ambiguous nucleotides.
    • Chimera checking - Sequences that are likely chimeric (an artifact of the PCR process) are removed.
    • Read count simplification - Sequence data are clustered, e.g. OTU, or de-noised, e.g. sub-OTU processing, to simplify the number of sequences and thereby reduce the complexity of the sequence table (sequence units and counts in each sample)
    • Taxonomic annotation - Representative sequences are given taxonomic annotations.
    • Data normalization - Filtering or normalization of the annotated sequence table.
  • biom-summary.txt - Summary statistics of the sequence table. This will include the number of features (OTUs, sub-OTU clusters or ASVs depending on the processing options), the density of the table (fraction of non-zero values) and total read counts for each sample.
  • rep_set_sequences.zip - Compressed FASTA file of representative sequences for OTUs, sub-OTU clusters or ASVs (amplicon sequence variants), depending on sequence processing options
  • rep_set_tax_assignments.txt - Taxonomic assignment of representative sequences
  • sequences.zip - ZIP archive of sequences for each sample, after merging, trimming and chimera checking
  • seq_table.biom - Sequence table, in BIOM format. This table will include the taxonomic annotations for each sequence unit and associated raw read counts for each sample.
  • taxa_raw_counts.xlsx - Excel spreadsheet of taxonomic summaries of the raw read counts from phylum to species level
  • taxa_raw_counts.zip - ZIP archive of taxonomic summaries, in tab-delimited tables and BIOM format, of the raw read counts from phylum to species level
  • taxa_relative.xlsx - Excel spreadsheet of taxonomic summaries of the relative read counts from phylum to species level
  • taxa_relative.zip - ZIP archive of taxonomic summaries, in tab-delimited tables and BIOM format, of the relative read counts from phylum to species level
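
To make the relationship between the raw-count and relative-count files concrete, the following sketch computes the kind of summary values described for biom-summary.txt and converts raw counts to per-sample fractions. It is a minimal illustration using plain dictionaries and made-up numbers, not the pipeline's code; the function names and feature/sample IDs are hypothetical, and real BIOM files would normally be read with a dedicated library.

```python
def summarize_table(counts):
    """Summarize a table given as {feature ID: {sample ID: raw read count}}."""
    samples = sorted({s for row in counts.values() for s in row})
    n_cells = len(counts) * len(samples)
    nonzero = sum(1 for row in counts.values() for s in samples if row.get(s, 0) > 0)
    totals = {s: sum(row.get(s, 0) for row in counts.values()) for s in samples}
    return {
        "features": len(counts),
        "samples": len(samples),
        "density": nonzero / n_cells if n_cells else 0.0,  # fraction of non-zero cells
        "reads_per_sample": totals,
    }


def relative_counts(counts):
    """Convert raw counts to per-sample fractions (each sample sums to 1)."""
    totals = summarize_table(counts)["reads_per_sample"]
    return {
        feature: {s: (n / totals[s] if totals[s] else 0.0) for s, n in row.items()}
        for feature, row in counts.items()
    }


# Hypothetical toy table: three features, two samples.
raw = {
    "OTU_1": {"S1": 60, "S2": 0},
    "OTU_2": {"S1": 30, "S2": 50},
    "OTU_3": {"S1": 10, "S2": 50},
}
summary = summarize_table(raw)
relative = relative_counts(raw)
print(summary)
print(relative["OTU_1"])
```

Because relative counts are computed per sample, removing or filtering features changes every remaining fraction in that sample, so any filtering should be done on the raw counts first.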

Example results

Synthetic Rhodanobacter Spike-In

For some projects, a synthetic spike-in will be added to the samples to outcompete (and possibly mask) any background DNA or to act as an internal standard. Typically the UIC Research Resources Center will use a synthetic spike-in in which the primer regions are derived from the 16S gene of a Rhodanobacter, with the regions between the primers replaced with sequences derived from a eukaryote. When processing samples with the Synthetic Spike-In, RIC will use a custom annotation reference that identifies the Spike-In sequences as Synthetic_Rhodanobacter_Spike-In. Be sure to remove the Spike-In sequence counts from your data before any analyses!
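
As a minimal sketch of that removal step, the example below drops every feature whose taxonomic annotation contains the Synthetic_Rhodanobacter_Spike-In label before any further analysis. The data layout (plain dictionaries), the function name `remove_spike_in`, the feature IDs and the taxonomy strings are assumptions made for illustration; only the spike-in label comes from the description above.

```python
SPIKE_IN_LABEL = "Synthetic_Rhodanobacter_Spike-In"  # label named in the text above


def remove_spike_in(counts, taxonomy, label=SPIKE_IN_LABEL):
    """Return a copy of the count table without spike-in features.

    counts:   {feature ID: {sample ID: raw read count}}
    taxonomy: {feature ID: taxonomy string}  (hypothetical layout)
    """
    return {
        feature: row
        for feature, row in counts.items()
        if label not in taxonomy.get(feature, "")
    }


# Hypothetical toy data; the taxonomy string format is an assumption.
raw = {
    "OTU_1": {"S1": 500, "S2": 800},
    "OTU_2": {"S1": 40, "S2": 10},
    "OTU_3": {"S1": 60, "S2": 90},
}
taxonomy = {
    "OTU_1": "Synthetic_Rhodanobacter_Spike-In",
    "OTU_2": "k__Bacteria; p__Firmicutes",
    "OTU_3": "k__Bacteria; p__Proteobacteria; g__Rhodanobacter",
}
filtered = remove_spike_in(raw, taxonomy)
print(sorted(filtered))
```

Matching on the full label rather than on "Rhodanobacter" alone keeps genuine Rhodanobacter sequences (OTU_3 in this toy data) in the table. Relative abundances should be recomputed after the spike-in counts are removed.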

Primer design

You will need to determine what to target in your experiment. For example, in 16S amplicon sequencing, different conserved segments of the 16S can be targeted by different combinations of primers. We strongly encourage working with the RIC and Genomics Core Facilities in determining the amplicon to sequence. Factors to consider include:

  • Longer amplicons will give higher taxonomic specificity.
  • Amplicons cannot be too long.
    • Amplicons need to be fully covered by paired-end sequences, with sufficient overlap to merge reads.
    • We recommend aiming for a minimum 50bp overlap between read ends to ensure that the majority of reads are merge-able, as some species may have longer variable regions in the targeted region. For example, if the expected amplicon size is 550bp, libraries should be sequenced at 2x300bp to give 600bp total length.
    • Refer to the table below for approximate amplicon sizes given different common primer choices.
  • Different amplicons will give quantitatively different results due to primer biases. Data generated from different amplicons will not be comparable.

Estimated amplicon lengths (bp), based on positions in the E. coli 16S gene. Columns are forward primers; rows are reverse primers.

  Reverse primer    8F      341F    515F    967F
  534R              526     193     -       -
  806R              798     465     291     -
  926R              918     585     411     -
  1046R             1038    705     531     79
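
The arithmetic behind the table and the overlap recommendation can be sketched as follows. Every numeric entry in the table equals the reverse primer position minus the forward primer position, and the read overlap for a paired-end run is twice the read length minus the amplicon length. The function names and the `MIN_OVERLAP` constant are hypothetical, and the calculation ignores any bases lost to quality trimming, so treat the results as rough estimates.

```python
MIN_OVERLAP = 50  # recommended minimum overlap (bp) from the text above


def amplicon_length(forward_pos, reverse_pos):
    """Estimated amplicon length (bp) from E. coli 16S primer positions."""
    return reverse_pos - forward_pos


def read_overlap(amplicon_bp, read_length):
    """Overlap (bp) between the two reads of a 2 x read_length paired-end run.

    A negative value means the reads do not meet and cannot be merged.
    """
    return 2 * read_length - amplicon_bp


length = amplicon_length(341, 806)  # 341F / 806R
results = {}
for read_length in (153, 250, 300):
    overlap = read_overlap(length, read_length)
    results[read_length] = overlap
    status = "ok" if overlap >= MIN_OVERLAP else "below recommended overlap"
    print(f"2x{read_length}: amplicon {length} bp, overlap {overlap} bp ({status})")
```

For the 341F/806R pair (465 bp), only the 2x300 configuration meets the 50 bp recommendation in this estimate. The 550 bp example in the text gives exactly 50 bp of overlap at 2x300.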

Cross-species variability relative to position in E. coli 16S gene. Commonly targeted variable regions (e.g., V1, V2) are indicated.
