2. Filter SRRs - labbces/SpliceScape GitHub Wiki

Scripts

🟢 filter_metadata.py

The Python script filter_metadata.py filters the SRA metadata stored in the SQLite3 database generated by the get_metadata.py script. It can filter by minimum read length, by non-empty values in selected columns, by exact column values, and by strand information. The script writes the matching SRR accessions to a text file, one per line, and, when --create_table is given, also stores the filtered rows in a new table named filtered_sra_metadata.

The main purpose of this script is to offer a quick, accessible way to perform common filtering operations, especially for users who prefer not to write SQL. For more refined or complex filtering, we recommend running SQL queries directly on the database.
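
For filtering beyond the script's options, a direct query can be run with Python's built-in sqlite3 module. The following is a minimal sketch and not part of SpliceScape: the table name `sra_metadata` and the `read_length` column are assumptions (check the real schema first, for example with `.schema` in the sqlite3 shell), and an in-memory database with made-up rows stands in for the file produced by get_metadata.py.

```python
import sqlite3

# In-memory stand-in for database.db. The table name and the read_length
# column are assumed; the rows are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sra_metadata ("
    "sra_id TEXT, read_length REAL, strand_info TEXT, tissue TEXT, pmid TEXT)"
)
conn.executemany(
    "INSERT INTO sra_metadata VALUES (?, ?, ?, ?, ?)",
    [
        ("SRR0000001", 150.0, "PAIRED", "leaf", "12345"),
        ("SRR0000002", 75.0, "PAIRED", "root", None),
        ("SRR0000003", 101.0, "SINGLE", "leaf", None),
        ("SRR0000004", 125.0, None, "root", "67890"),
    ],
)

# A combined condition: long paired-end runs, OR any run that has a PMID.
query = """
    SELECT sra_id
    FROM sra_metadata
    WHERE (read_length >= ? AND strand_info = ?)
       OR (pmid IS NOT NULL AND pmid != '')
    ORDER BY sra_id
"""
selected = [row[0] for row in conn.execute(query, (100, "PAIRED"))]
conn.close()

# One accession per line, the same layout as the script's output file.
print("\n".join(selected))
```

To query a real database, replace `":memory:"` with the path to the database file and drop the CREATE and INSERT statements.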

  • Requirements:
| Category | Requirements |
|---|---|
| Python | Python 3 |
| Standard Libraries | argparse, sqlite3, sys |
| External Libraries | None |
  • Arguments:
| Argument | Function | Required | Example |
|---|---|---|---|
| -db, --database | Specifies the SQLite3 database file generated by get_metadata.py | Yes | -db database.db |
| -l, --read_length | Filter by minimum read length (default is 100) | No | -l 100.0 |
| -f, --filters | Filter rows where specified columns are not empty. Allowed columns: pmid, species_cultivar, species_genotype, treatment, dev_stage, tissue, age, source_name | No | -f pmid species_cultivar |
| -s, --strand | Filter rows where strand_info is PAIRED, SINGLE, or NULL | No | -s PAIRED SINGLE NULL |
| -e, --exact_filter | Filter by exact column=value pairs (can be used multiple times) | No | -e species_name Setaria_viridis |
| --create_table | Create the filtered_sra_metadata table and insert filtered data | No | --create_table |
| --output_file | Output file to save filtered sra_id values | Yes | --output_file filtered_sra_ids.txt |
| --verbose | Enables verbose output | No | --verbose |
  • Run example:
python3 filter_metadata.py -db database.db -l 100 -f pmid species_cultivar -s SINGLE NULL --create_table --output_file filtered_sra_ids.txt --verbose
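
To make the flag semantics concrete, the run example above can be read as a single SELECT with one condition per flag. The sketch below illustrates that reading under stated assumptions and is not the script's actual code: `build_query` is a hypothetical helper, the table name `sra_metadata` and the column `read_length` are guesses, "not empty" is interpreted as `IS NOT NULL AND != ''`, and the strand value `NULL` is interpreted as SQL NULL.

```python
# Columns accepted by -f/--filters, as listed in the arguments table.
ALLOWED_COLUMNS = {
    "pmid", "species_cultivar", "species_genotype", "treatment",
    "dev_stage", "tissue", "age", "source_name",
}


def build_query(min_read_length=100, non_empty=(), strands=()):
    """Return (sql, params) for a hypothetical sra_metadata table."""
    conditions = ["read_length >= ?"]  # -l / --read_length
    params = [min_read_length]

    for column in non_empty:  # -f / --filters
        # Column names cannot be bound as parameters, so check a whitelist.
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"column not allowed: {column}")
        conditions.append(f"({column} IS NOT NULL AND {column} != '')")

    strand_terms = []
    for strand in strands:  # -s / --strand
        if strand == "NULL":
            strand_terms.append("strand_info IS NULL")  # assumed meaning
        else:
            strand_terms.append("strand_info = ?")
            params.append(strand)
    if strand_terms:
        conditions.append("(" + " OR ".join(strand_terms) + ")")

    sql = "SELECT sra_id FROM sra_metadata WHERE " + " AND ".join(conditions)
    return sql, params


# Mirrors: -l 100 -f pmid species_cultivar -s SINGLE NULL
sql, params = build_query(100, ["pmid", "species_cultivar"], ["SINGLE", "NULL"])
print(sql)
print(params)
```

Values are passed as bound parameters, while column names are validated against a fixed list because SQLite placeholders cannot stand in for identifiers.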

Errors