Detection of insecticide‐resistant genes in mosquitoes and malaria - BGIGPD/BestPractices4Pathogenomics GitHub Wiki
一、Detection of resistance genes in mosquitoes
Introduction
This course will guide you through detecting resistance genes in mosquitoes. With extensive insecticide use, mosquitoes are developing resistance to common treatments. By identifying and analyzing key resistance genes (such as P450, AChE, and VGSC), we can assess resistance levels in mosquito populations and improve control strategies. This course will introduce the detection workflow and relevant tools, helping you gain essential skills for resistance gene detection in mosquitoes.
FastD is a comprehensive platform to analyze insecticide resistance through two main mechanisms:
-
FastD_TR detects target-site mutations associated with insecticide resistance from RNA-Seq data.
- Target genes include:
AChE, VGSC, RyR, nAChR
- Target genes include:
-
FastD_MR detects overexpressed detoxification genes associated with insecticide resistance.
- Target genes include:
P450, GST, CCE
- Target genes include:
Step 1: Open Your Server and Set Up Working Directory
# Make your working directory
mkdir IR_mos
cd IR_mos
Step 2: Download FastD
-
Download FastD. Click on "FastD linux".
-
Transfer it to the
IR_mos
directory using WinSCP.
Alternatively, you can copy the file from my directory:
cp /home/liuji/IR/IR_mosq/FastD_v1.0_linux.tar.gz ./
cp /home/liuji/IR/IR_mosq/19BJYW1818.sam ./
tar -xzvf FastD_v1.0_linux.tar.gz
Step 3: Set Up the Conda Environment
Activate Conda if it's not already activated:
source ~/miniconda3/bin/activate
Install Required Software
-
Create a Conda Environment:
conda config --show channels#if you have these three channels,you don't run conda config -add * conda config --add channels conda-forge conda config --add channels bioconda conda config --add channels defaults conda create -n mq -c conda-forge r-base#if you have these three channels,just start to run this command conda activate mq
-
Install Required Packages:
conda install -c bioconda bowtie2 conda install -c conda-forge r-ggplot2 r-ggseqlogo r-deseq2 #if it fails, we can install in R directly.
Install R Packages (if needed)
R
# After entering the R console, run the following command
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
#Here a number will pop up to select the mirror source, here we select 18
BiocManager::install("DESeq2")
install.packages("ggplot2")
install.packages("ggseqlogo")
Quit and Save the R Environment
q()
# Save workspace image? [y/n/c]: y
Step 4: Input and Output
Input:
- The raw RNA-Seq reads are aligned to target gene sequences using Bowtie2, and the resulting
.sam
file is used as input.
Output:
result_mutation_scaned*
: Includes mutation detection and frequency in target genes. Each file contains codons corresponding to each mutation site in the reads.
Step 5: Run FastD_TR for Mutation Detection
Run the FastD_TR.pl
script with the following command:
perl FastD_v1.0_linux/FastD_TR.pl -s Aedes_albopictus -r VGSC AChE -i 19BJYW1818.sam
Step 6: Review Output Files
FastD generates results in the following folders:
-
Mutation_detected_details: Contains detailed mutation detection information, including mutation sites, types, and related genes for each sample.
-
Mutation_table: Summarizes resistance gene mutations in table format, providing an overview of detected mutations for statistical analysis.
-
Mutation_visualization: Provides visualizations of mutation distributions or proportions across samples.
Note: The software may not generate data in the Mutation_visualization
folder, but the results in the first two folders are the most essential.
二、Detection of drug resistance genes in plasmodium
Introduction
In this tutorial, I will show you how to use a Python script to compare the VCF file with an Excel file that contains mutation site information for malaria resistance genes. The goal is to extract the resistance-related genes present in the samples. This script helps identify resistance mutations and provides detailed information about the mutation sites, gene names, and amino acid changes.
Prerequisites
To run this script, ensure the following prerequisites are in place:
- Python environment: Python 3.7 or higher is recommended.
- Required Python libraries:
pandas
: for reading and processing Excel filesopenpyxl
: for reading.xlsx
files
Installing Required Libraries
You can install the necessary libraries using the following commands:
conda create -n malaria
conda activate malaria
pip install pandas openpyxl
Setting Up Your Working Directory
- Create and navigate to your working directory:
mkdir ~/IR_mala
cd ~/IR_mala
- Copy the necessary files into your working directory:
cp /home/liuji/IR/IR_mala/fetch_IR.py ./
cp /home/liuji/IR/IR_mala/test1_pf.vcf.gz ./
cp /home/liuji/IR/IR_mala/snp_marker.xlsx ./
- Unzip the VCF file:
gunzip test1_pf.vcf.gz
Running the Script
Once the files are in place, run the Python script to compare the VCF file with the Excel mutation data. Use the following command:
python fetch_IR.py test1_pf.vcf snp_marker.xlsx result1
This command will execute the script with the snp_marker.xlsx
file as input and output the results in a folder named result
.
Output Structure
The output will be structured in the following way:
result/
│
├── sample_1_result/ # Folder for each sample's results
│ ├── GeneA/ # Folder for GeneA
│ │ └── GeneA_mutations.csv # CSV file containing mutations of GeneA
│ ├── GeneB/ # Folder for GeneB
│ │ └── GeneB_mutations.csv # CSV file containing mutations of GeneB
│ └── ...
│
├── sample_2_result/ # Folder for another sample's results
│ ├── GeneA/ # Folder for GeneA
│ │ └── GeneA_mutations.csv # CSV file containing mutations of GeneA
│ ├── GeneB/ # Folder for GeneB
│ │ └── GeneB_mutations.csv # CSV file containing mutations of GeneB
│ └── ...
│
└── ...
In the output directory:
- Each sample's result will be stored in its own folder (e.g.,
sample_1_result
). - Inside each sample folder, there will be separate folders for each gene (e.g.,
GeneA
). - Each gene folder will contain a CSV file (e.g.,
GeneA_mutations.csv
) with detailed mutation information.
Exercise Section
1. FastD Exercise: Detect Overexpressed Genes
For this exercise, you will use the FastD_MR.pl tool to detect overexpressed genes related to insecticide resistance, such as P450, GST, and CCE. You will be working with a sam file named 19BJYW1818.sam
. The task is to identify mutations in resistance genes and analyze the overexpression data.
- Data to Use:
19BJYW1818.sam
- Genes of Interest: P450, GST, CCE
Note that FASTD_MR is used differently from FastD_TR, and remember to return to foreword understanding
2. Malaria Resistance Gene Detection: Python Script Exercise
In this exercise, you will use the Python script fetch_IR.py
to compare a VCF file (test2_pf.vcf.gz
) with mutation site information stored in the snp_marker.xlsx
file. The goal is to extract and analyze the resistance-related genes present in the sample.
- Data to Use:
test2_pf.vcf.gz
(VCF file with sample mutation data)snp_marker.xlsx
(Excel file containing mutation sites and gene information) The script will output CSV files with mutation details for the relevant genes in the sample. The result of FastD_MR lists the detoxifying enzyme genes which are selected to be the potential resistance causal genes.
These exercises will allow you to practice identifying resistance mutations in malaria samples using both the FastD_MR.pl tool and the Python script. The exercises will help you understand the mutation sites, gene names, and the amino acid changes related to resistance.