Service portfolio - BU-ISCIII/BU-ISCIII GitHub Wiki

Introduction

BU-ISCIII focuses on the analysis of high-throughput sequencing (NGS) data within computational and functional genomics. There are three service categories:

  • Genomic data analysis: bioinformatic analysis of different kinds of data and experiments, with fixed input and output files.
  • Bioinformatics counseling: bioinformatic consulting for experiment and analysis design. If your research interest does not fit any of our fixed service offerings, request this service and we will evaluate a possible collaboration. We also offer support for training: course organization, internships, MSc/PhD theses, etc.
  • User support: we support software installation on Linux machines and deploy custom virtual machines on our server Bioinfo01 for researchers interested in performing their own analyses. We can also develop small code snippets for specific functionality a researcher may require.

Service portfolio

Genomic Data Analysis:

  • Pre-processing and quality analysis

  • Sequence quality analysis and host genome removal (seek_and_destroy)

  • Next Generation Sequencing data analysis

  • DNAseq / cDNAseq: Exome sequencing (WES) / Genome sequencing (WGS) / Targeted sequencing

    • Low-frequency variant detection and annotation for whole genomes or sequencing panels (e.g. retinoblastoma gene panel) (lowfreq_panel)

    • Eukarya: Variant calling and annotation for a sequencing panel (e.g. epidermolysis gene panel, mouse or rat gene panel) (exomeeb)

      The ExomeEB service uses the Nextflow pipeline sarek to detect variants in whole-genome or targeted sequencing data, in this case exome data from single samples. The output is then processed with the GATK toolkit and annotated with Ensembl's Variant Effect Predictor (VEP) and Exomiser, which add effect and inheritance-mode predictions and can target a specific list of genes if the researcher provides one.

      Below are the files that researchers NEED to provide when requesting the ExomeEB service.

      Required information for service request

      **Service Notes Description**

      When requesting a service in iskylims, researchers are required to provide pertinent details, including, if necessary, a list of targeted genes to analyse during Exomiser's annotation step.

      • targeted_regions.bed: a file in BED format with the genomic coordinates targeted by the sequencing protocol, one line per feature.
      chrom - Name of the chromosome or scaffold. Any valid seq_region_name can be used, and chromosome names can be given with or without the 'chr' prefix.
      chromStart - Start position of the feature in standard chromosomal coordinates (i.e. first base is 0).
      chromEnd - End position of the feature in standard chromosomal coordinates.
      orientation - Orientation (strand) of the feature, + or -.
      name - Label to be displayed under the feature. Optional.
      
      1       chromStart   chromEnd   orientation(+/-)       name
      1       chromStart   chromEnd   orientation(+/-)       name
      ...
      chrX       chromStart   chromEnd   orientation(+/-)       name
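The layout above can be checked before a file is submitted. The following is a minimal Python sketch, not part of the BU-ISCIII pipelines: the function name `check_bed_lines` and the feature names in the example are hypothetical, and it assumes whitespace-separated columns in the order shown in the example (chrom, chromStart, chromEnd, orientation, name), with everything after the third column treated as optional.

```python
def check_bed_lines(lines):
    """Return a list of (line_number, message) problems found in BED-like lines."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blank lines and common comment/header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            problems.append((number, "expected at least chrom, chromStart, chromEnd"))
            continue
        start, end = fields[1], fields[2]
        if not (start.isdecimal() and end.isdecimal()):
            problems.append((number, "chromStart and chromEnd must be non-negative integers"))
            continue
        if int(start) > int(end):
            problems.append((number, "chromStart must not be greater than chromEnd"))
        # Assumption: the fourth column, when present, is the orientation.
        if len(fields) >= 4 and fields[3] not in ("+", "-"):
            problems.append((number, "orientation must be '+' or '-'"))
    return problems


# Hypothetical example content; feature names are placeholders.
example = [
    "1\t100\t200\t+\tfeature_1",
    "chrX\t500\t400\t-\tfeature_2",  # start after end: reported
    "2\t10\tabc",                    # non-numeric end: reported
]
for number, message in check_bed_lines(example):
    print(f"line {number}: {message}")
```

The check is deliberately shallow: it does not verify that chromosome names exist in the reference genome or that coordinates fall within chromosome lengths.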
      
    • Eukarya (non-human): Variant calling, annotation and SNP-based outbreak analysis (e.g. diploid fungal outbreak) (TODO freebayes_outbreak)

    • Human: Exome sequencing for variant calling, annotation and inheritance filtering (e.g. Exome sequencing of a human trio (two parents and one child)) (exometrio)

      The Exometrio service uses the Nextflow pipeline sarek to detect variants in whole-genome or targeted sequencing data, in this case exome data from multiple related samples, usually relatives. The output is then processed with the GATK toolkit and annotated with Ensembl's Variant Effect Predictor (VEP) and Exomiser, which add effect and inheritance-mode predictions.

      Below are the files that researchers NEED to provide when requesting the Exometrio service.

      Required information for service request
      • targeted_regions.bed: a file in BED format with the genomic coordinates targeted by the sequencing protocol, one line per feature.
      chrom - Name of the chromosome or scaffold. Any valid seq_region_name can be used, and chromosome names can be given with or without the 'chr' prefix.
      chromStart - Start position of the feature in standard chromosomal coordinates (i.e. first base is 0).
      chromEnd - End position of the feature in standard chromosomal coordinates.
      orientation - Orientation (strand) of the feature, + or -.
      name - Label to be displayed under the feature. Optional.
      
      1       chromStart   chromEnd   orientation(+/-)       name
      1       chromStart   chromEnd   orientation(+/-)       name
      ...
      chrX       chromStart   chromEnd   orientation(+/-)       name
      
      • family.ped: A pedigree file following PED format.
      family.ped
      
      group   samplefather_samplefather   0       0       1       1
      group   samplemother_samplemother   0       0       2       1
      group   samplechildren_samplechildren   samplefather_samplefather   samplemother_samplemother   1       2
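In the PED format the six columns are family ID, individual ID, father ID, mother ID, sex (1 = male, 2 = female) and phenotype (1 = unaffected, 2 = affected), with 0 marking an unknown parent. The following is a minimal Python sketch of a consistency check for such a file; the function name `check_ped_lines` is hypothetical, and the sketch assumes case/control phenotype coding rather than quantitative phenotypes.

```python
def check_ped_lines(lines):
    """Return a list of problems found in six-column PED lines."""
    problems = []
    rows = []
    for number, raw in enumerate(lines, start=1):
        fields = raw.split()
        # Skip blank lines and comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) != 6:
            problems.append(f"line {number}: expected 6 columns, found {len(fields)}")
            continue
        rows.append((number, fields))

    # Map every individual ID to its declared sex to validate parent references.
    sex_by_id = {fields[1]: fields[4] for _, fields in rows}

    for number, (family, sample, father, mother, sex, phenotype) in rows:
        if sex not in ("0", "1", "2"):
            problems.append(f"line {number}: sex must be 0, 1 or 2")
        if phenotype not in ("-9", "0", "1", "2"):
            problems.append(f"line {number}: phenotype must be -9, 0, 1 or 2")
        for parent, expected_sex, role in ((father, "1", "father"), (mother, "2", "mother")):
            if parent == "0":
                continue  # unknown parent (founder)
            if parent not in sex_by_id:
                problems.append(f"line {number}: {role} {parent} is not listed as an individual")
            elif sex_by_id[parent] != expected_sex:
                problems.append(f"line {number}: {role} {parent} has unexpected sex code")
    return problems


# The trio from the example above.
trio = [
    "group\tsamplefather_samplefather\t0\t0\t1\t1",
    "group\tsamplemother_samplemother\t0\t0\t2\t1",
    "group\tsamplechildren_samplechildren\tsamplefather_samplefather\tsamplemother_samplemother\t1\t2",
]
print(check_ped_lines(trio) or "no problems found")
```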
      
    • Human: Whole genome sequencing for SNP variant calling, annotation and inheritance filtering (e.g. WGS of a human trio) (wgstrio)

      The WGStrio service uses the Nextflow pipeline sarek to detect variants in whole-genome or targeted sequencing data, in this case whole-genome data from multiple related samples, usually relatives. The output is then processed with the GATK toolkit and annotated with Ensembl's Variant Effect Predictor (VEP) and Exomiser, which add effect and inheritance-mode predictions.

      Below are the files that researchers NEED to provide when requesting the WGStrio service.

      Required information for service request
      • family.ped: A pedigree file following PED format.
      family.ped
      
      group   samplefather_samplefather   0       0       1       1
      group   samplemother_samplemother   0       0       2       1
      group   samplechildren_samplechildren   samplefather_samplefather   samplemother_samplemother   1       2
      
    • Fungal / bacteria / virus: Variant calling, annotation and SNP-based outbreak analysis (e.g. haploid fungal outbreak) (snippy)

    • Bacteria: De novo genome assembly and annotation (Assembly)

    • Bacteria: In-depth analysis of Mycobacterium species genomes (e.g. M. tuberculosis, M. bovis) (MTBSeq)

    • Bacteria: Plasmid analysis and characterization (PlasmidID)

      PlasmidID is a mapping-based, assembly-assisted plasmid identification tool that analyzes the data and provides a graphical solution for plasmid identification.

      PlasmidID is a computational pipeline that maps Illumina reads against plasmid database sequences. The k-mer filtered, most covered sequences are clustered by identity to avoid redundancy, and the longest are used as scaffolds for plasmid reconstruction. Reads are assembled and annotated by automatic and specific annotation. All information generated from the mapping, assembly, annotation and local alignment analyses is gathered and represented in a circular image, which allows the user to determine the plasmid composition of any bacterial sample.

      Below are the files that researchers NEED to provide when requesting the PlasmidID service.

      Required information for service request
      • As default annotation databases we use:
        • Antimicrobial resistance (AMR) genes: CARD database
        • Virulence genes: VirulenceFinder database
        • IS: NCBI sequences
        • Rep/INC genes: PlasmidFinder database (Carattoli et al. 2014)
      • If you want a specific database, provide a multi-FASTA file with the sequence features you want to annotate, or indicate a URL where we can download the resource.
    • Bacteria: Multi-Locus Sequence Typing (MLST), analysis of virulence factors, antimicrobial resistance, and plasmids characterization (characterization)

      The MLST service performs Multi-Locus Sequence Typing on the de novo assembled genomes of the samples. It uses chewBBACA to generate the schemas (if necessary) and perform the allele calling, and GrapeTree to generate the minimum spanning tree. You can ask for:

      • cgMLST (core-genome MLST): Set of loci that are present in the majority of strains for core genome (cg) MLST schemas.
      • wgMLST (whole-genome MLST): Set of loci that are present in at least one of the analyzed strains in the Schema Creation for whole genome MLST schemas.

      Below are the files that researchers NEED to provide when requesting the MLST service.

      Required information for service request
      • If the user wants a specific cgMLST/wgMLST schema, it needs to be provided.
    • Bacteria: Core genome or whole genome Multi-Locus Sequence Typing analysis (cg/wgMLST) (TODO wgmlst_chewbbaca)

    • Viral: Genomic reconstruction, variant calling and de novo assembly (viralrecon)

      Viralrecon is a bioinformatics analysis pipeline used to perform assembly and intrahost/low-frequency variant calling for viral samples. The pipeline supports both Illumina and Nanopore sequencing data. For Illumina short-reads the pipeline is able to analyse metagenomics data typically obtained from shotgun sequencing (e.g. directly from clinical samples) and enrichment-based library preparation methods (e.g. amplicon-based: ARTIC SARS-CoV-2 enrichment protocol; or probe-capture-based). For Nanopore data the pipeline only supports amplicon-based analysis obtained from primer sets created and maintained by the ARTIC Network. Some examples of viruses analyzed with this pipeline are SARS-CoV-2, mumps virus, monkeypox virus, West Nile virus, etc.

      Required information for service request
      For the pipeline to run correctly, some input documents are required:
      • Primers BED file. For amplicon-based methods, a BED file with primer coordinates is needed for the mapping step.

      • Primers FASTA file. Additionally, a FASTA file is necessary if de novo assembly is requested.

      • viralrecon_input.xlsx

        This document contains 3 different columns:

        • SampleID: Identifier assigned to each sample to be analyzed.
        • Reference: Reference genome (or sequence) to be used to perform the analysis of each sample in the pipeline.
        • Host: Specifies the host organism from which the sequenced sample was obtained.

        Notes:

        • At least one row for every sample must be included in the document.
        • If a sample is required to be analyzed against different references (individually), one row for each one is required.
        • For multifasta documents (e.g. fragmented genomes or custom documents) containing several references, their name should be specified in the Reference column.
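The notes above can be checked mechanically before submission. The following is a minimal Python sketch that assumes the sheet has been exported from viralrecon_input.xlsx to CSV text with the three column names listed above; the function name `check_sample_sheet`, the sample identifiers and the reference file names are hypothetical placeholders.

```python
import csv
import io

REQUIRED_COLUMNS = ("SampleID", "Reference", "Host")


def check_sample_sheet(csv_text, expected_samples):
    """Return a list of problems found in a CSV export of the sample sheet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    seen_pairs = set()
    seen_samples = set()
    # Data rows start on line 2, after the header.
    for row_number, row in enumerate(reader, start=2):
        for column in REQUIRED_COLUMNS:
            if not (row[column] or "").strip():
                problems.append(f"row {row_number}: empty {column}")
        # One row per sample/reference combination is expected.
        pair = (row["SampleID"], row["Reference"])
        if pair in seen_pairs:
            problems.append(f"row {row_number}: duplicated sample/reference pair {pair}")
        seen_pairs.add(pair)
        seen_samples.add(row["SampleID"])

    # Every sample to be analysed needs at least one row.
    for sample in expected_samples:
        if sample not in seen_samples:
            problems.append(f"sample {sample} has no row")
    return problems


# Hypothetical sheet: sampleA is analysed against two references individually.
sheet = (
    "SampleID,Reference,Host\n"
    "sampleA,reference_1.fasta,Human\n"
    "sampleA,reference_2.fasta,Human\n"
)
print(check_sample_sheet(sheet, ["sampleA", "sampleB"]))
```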
    • Viral Flu: Influenza fragment reconstruction and variant detection (IRMA)

  • mRNAseq: Transcriptome sequencing (mrnaseq)

    • Differential Gene Expression (DEG) (rnaseq)

      The RNAseq service performs quality control (QC) and trimming, then alignment and quantification with STAR and Salmon, respectively. After quantification, differential expression analysis is carried out with DESeq2.

      Below are the files that researchers NEED to provide when requesting the RNA-seq service.

      Required information for service request (genes)

      **Service Notes Description**

      When requesting a service in iskylims, researchers are required to provide pertinent details, including the type of NGS data intended for analysis. Please be specific when requesting the mRNA-seq service by indicating something like: 'mRNAseq for genes'.

      comparatives.txt

      The comparatives.txt file defines the experimental design for the analysis. It specifies the order, sense, and direction of the comparisons between sample groups. Each requested comparison should have a corresponding line in this file. The file consists of three columns without headings:

      1. Incremental index representing each comparison.
      2. Treatment group/s.
      3. Control group.

      Example:

      1       Treatment       Control
      2       Treatment       Control
      3       Treatment       Control
      4       Treatment1-Treatment2       Control1-Control2
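The three-column layout above can be parsed with a few lines of Python. This is a minimal sketch, not the code used by the service: the function name `parse_comparatives` and the group names in the example are hypothetical, and it assumes that a hyphen joins several groups on one side of a comparison, as the fourth example line suggests.

```python
def parse_comparatives(text):
    """Parse comparatives.txt content into (index, treatment_groups, control_groups)."""
    comparisons = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, found {len(fields)}: {raw!r}")
        index, treatment, control = fields
        # Assumption: a hyphen joins several groups into one side of the comparison.
        comparisons.append((int(index), treatment.split("-"), control.split("-")))

    # The first column is an incremental index: 1, 2, 3, ...
    indexes = [index for index, _, _ in comparisons]
    if indexes != list(range(1, len(indexes) + 1)):
        raise ValueError(f"indexes are not incremental from 1: {indexes}")
    return comparisons


# Hypothetical group names.
content = "1\tInfected\tMock\n2\tDrugA-DrugB\tMock1-Mock2\n"
for index, treatment, control in parse_comparatives(content):
    print(index, treatment, "vs", control)
```

Under this assumption, group names that themselves contain a hyphen would be split into several groups, so such names are best avoided.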

      clinical_data.txt

      The clinical_data.txt file is necessary for assigning sample names to comparison groups. This file comprises three columns:

      • Name: Sample name.

      • Group: Group to which the sample belongs.

      • Batch: Label that groups samples according to their batch.

        Example:

           Name    Group  Batch
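As an illustration of how this file maps samples to groups, the following minimal Python sketch reads the three columns and collects the sample names per group. The function name `samples_by_group` and the sample, group and batch names are hypothetical, and the sketch assumes whitespace-separated columns with the exact header shown above.

```python
from collections import defaultdict


def samples_by_group(text):
    """Return {group: [sample names]} from clinical_data.txt content."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or rows[0] != ["Name", "Group", "Batch"]:
        # Assumption: the file starts with the header shown in the example.
        raise ValueError("expected header: Name Group Batch")

    groups = defaultdict(list)
    for row in rows[1:]:
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, found {len(row)}: {row}")
        name, group, _batch = row
        groups[group].append(name)
    return dict(groups)


# Hypothetical content.
content = (
    "Name\tGroup\tBatch\n"
    "sample1\tTreatment\tbatch1\n"
    "sample2\tTreatment\tbatch2\n"
    "sample3\tControl\tbatch1\n"
)
print(samples_by_group(content))
```

The group names returned here are the ones that can be used in the second and third columns of comparatives.txt.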
    • Differential transcript expression (DET) (rnaseq)

      The RNAseq service performs quality control (QC) and trimming, then alignment and quantification with STAR and Salmon, respectively. After quantification, differential expression analysis is carried out with fishpond.

      Below are the files that researchers need to provide when requesting the RNA-seq service.

      Required information for service request (transcripts)

      **Service Notes Description**

      When requesting a service in iskylims, researchers are required to provide pertinent details, including the type of NGS data intended for analysis. Please be specific when requesting the mRNA-seq service by indicating something like: 'mRNAseq for transcripts'.

      comparatives.txt

      The comparatives.txt file defines the experimental design for the analysis. It specifies the order, sense, and direction of the comparisons between sample groups. Each requested comparison should have a corresponding line in this file. The file consists of three columns without headings:

      1. Incremental index representing each comparison.
      2. Treatment group/s.
      3. Control group.

      Example:

      1       Treatment       Control
      2       Treatment       Control
      3       Treatment       Control
      4       Treatment1-Treatment2       Control1-Control2

      clinical_data.txt

      The clinical_data.txt file is necessary for assigning sample names to comparison groups. This file comprises three columns:

      • Name: Sample name.

      • Group: Group to which the sample belongs.

      • Batch: Label that groups samples according to their batch.

        Example:

           Name    Group  Batch
    • Differential miRNA expression (DEM) (TODO mirnaseq)

    • Gene expression changes over a series of time points (TODO timeseries_rnaseq)

      The RNAseq service performs quality control (QC) and trimming, then alignment and quantification with STAR and Salmon, respectively. After quantification, differential expression analysis is carried out with ad-hoc software/scripts.

      Below are the files that researchers need to provide when requesting the RNA-seq service.

  • Metagenomics and targeted metagenomics

    • Taxonomy-based identification and classification of organisms in complex communities (TODO mag_met)
    • Alignment of de novo assembled contigs against a database with BLAST (TODO blast_nt)
    • Bacteria: 16S rRNA gene analysis to assess bacterial diversity (TODO 16s_metagenomics)
    • Viral: Detection and characterization of viral genomes within metagenomic data (pikavirus)
  • Bioinformatics consulting and training

    • Bioinformatics analysis consulting
    • In-house and external course organization
    • Student training in collaboration: Master's theses, research visits, etc.
  • User support

    • Installation and support of bioinformatic software on Linux OS
    • Installation of, and access to, virtual machines containing bioinformatic software on the Unit's server
    • Code snippets development