Fasta Tools - ZhaoL-Bio/BioToolKits GitHub Wiki
extract_fasta.py
This Python script filters a FASTA file and extracts the sequences of specific genes. Gene names are matched fuzzily, so "geneA" matches "geneA", "geneA.1", "geneA.2", and so on. An optional flag restricts the output to the longest sequence for each matched gene.
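The following minimal sketch illustrates the matching and longest-sequence behavior described above. It assumes that a record matches when its ID equals the gene name or begins with the gene name followed by a dot. The script's exact matching rule is not documented here. The helpers `matches_gene` and `select_records` are hypothetical. They operate on plain (ID, sequence) pairs rather than Biopython records.

```python
from typing import Dict, Iterable, List, Tuple


def matches_gene(record_id: str, gene: str) -> bool:
    """Hypothetical helper. Assumed rule: the ID is the gene name itself, or the
    gene name followed by a dot-delimited suffix such as ".1" or ".2"."""
    return record_id == gene or record_id.startswith(gene + ".")


def select_records(
    records: Iterable[Tuple[str, str]],
    genes: Iterable[str],
    longest: bool = False,
) -> List[Tuple[str, str]]:
    """Hypothetical helper: keep (id, sequence) pairs matching any gene.
    With longest=True, keep only the longest matching sequence per gene."""
    gene_list = list(genes)
    best: Dict[str, Tuple[str, str]] = {}  # gene -> longest (id, seq) seen so far
    kept: List[Tuple[str, str]] = []       # all matches, in input order
    for record_id, seq in records:
        for gene in gene_list:
            if matches_gene(record_id, gene):
                if longest:
                    current = best.get(gene)
                    if current is None or len(seq) > len(current[1]):
                        best[gene] = (record_id, seq)
                else:
                    kept.append((record_id, seq))
                break  # assign each record to the first gene it matches
    return list(best.values()) if longest else kept


records = [
    ("geneA", "ATG"),
    ("geneA.1", "ATGCCC"),
    ("geneA.2", "AT"),
    ("geneAB", "ATGATGATG"),
    ("geneB.1", "GG"),
]
print(select_records(records, ["geneA", "geneB"], longest=True))
# → [('geneA.1', 'ATGCCC'), ('geneB.1', 'GG')]
```

Requiring the dot prevents "geneA" from also matching an unrelated ID such as "geneAB", which a bare prefix test would accept. If the real script uses a looser rule, `matches_gene` is the only function that would need to change.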
Prerequisites
- Python 3.6 or higher
- Biopython library
Installation
pip install biopython
Usage
The script is run from the command line with the following syntax:
python extract_fasta.py <input_fasta> <gene_names_file> <output_fasta> [--longest]
- <input_fasta>: Path to the input FASTA file.
- <gene_names_file>: Path to a file containing a list of gene names to filter by, one per line.
- <output_fasta>: Path to the output file where filtered sequences will be saved.
- --longest: Optional flag; when set, only the longest sequence for each matched gene is written.
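The sketch below shows one way the command-line syntax above could be parsed with the standard library's `argparse`. It also shows how to read the gene-names file, which contains one name per line. The function names `build_parser` and `read_gene_names` are hypothetical and are not taken from the script. Skipping blank lines and trimming whitespace are also assumptions.

```python
import argparse
from pathlib import Path
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the documented syntax:
    extract_fasta.py <input_fasta> <gene_names_file> <output_fasta> [--longest]"""
    parser = argparse.ArgumentParser(
        prog="extract_fasta.py",
        description="Extract sequences for selected genes from a FASTA file.",
    )
    parser.add_argument("input_fasta", help="Path to the input FASTA file.")
    parser.add_argument("gene_names_file", help="File listing gene names, one per line.")
    parser.add_argument("output_fasta", help="Path for the filtered output FASTA file.")
    parser.add_argument(
        "--longest",
        action="store_true",  # False unless the flag is given
        help="Only output the longest sequence for each matched gene.",
    )
    return parser


def read_gene_names(path: str) -> List[str]:
    """Read one gene name per line; whitespace is trimmed and blank lines
    are skipped (assumed behavior)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]
```

For example, `build_parser().parse_args(["in.fa", "genes.txt", "out.fa", "--longest"])` returns a namespace with `longest` set to `True`. In a full script, the parsed paths would typically be passed to Biopython's `SeqIO.parse(path, "fasta")` for reading and `SeqIO.write(records, path, "fasta")` for writing.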