2. Filter SRRs - labbces/SpliceScape GitHub Wiki
Scripts
filter_metadata.py
🟢 The Python script filter_metadata.py filters the SRA metadata stored in an SQLite3 database generated by get_metadata.py. It can filter by minimum read length, by non-empty values in selected columns, by exact column values, and by strand information. The filtered SRR accessions are written to a text file, one per line. With --create_table, the filtered rows are also stored in a new table, filtered_sra_metadata.
The script is meant as a quick, accessible way to run common filtering operations without writing SQL. For more refined or complex filtering, we recommend querying the database directly with SQL.
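
For filtering that goes beyond the script's options, a direct query could look like the sketch below. It is only an illustration: the column names `sra_id`, `strand_info`, `tissue`, and `pmid` appear on this page, but the table name `sra_metadata` and the column `read_length` are assumptions, and the rows are made-up demo data in an in-memory database. Check the real schema (for example with `.schema` in the `sqlite3` shell) before adapting it.

```python
import sqlite3

# Demo database in memory. The table name `sra_metadata` and the
# `read_length` column are ASSUMPTIONS, not taken from get_metadata.py.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sra_metadata ("
    "sra_id TEXT, read_length REAL, strand_info TEXT, tissue TEXT, pmid TEXT)"
)
conn.executemany(
    "INSERT INTO sra_metadata VALUES (?, ?, ?, ?, ?)",
    [
        ("SRR0000001", 150.0, "PAIRED", "leaf", "12345"),
        ("SRR0000002", 75.0, "PAIRED", "leaf", "12345"),
        ("SRR0000003", 125.0, "SINGLE", "root", None),
        ("SRR0000004", 101.0, "PAIRED", "young leaf", None),
    ],
)


def leaf_paired_runs(connection, min_length=100):
    """Return sra_id values for PAIRED runs whose tissue mentions 'leaf'."""
    query = (
        "SELECT sra_id FROM sra_metadata "
        "WHERE read_length >= ? AND strand_info = 'PAIRED' AND tissue LIKE ? "
        "ORDER BY sra_id"
    )
    # Values are passed as parameters instead of being pasted into the SQL.
    return [row[0] for row in connection.execute(query, (min_length, "%leaf%"))]


print(leaf_paired_runs(conn))
```

The pattern match on `tissue` is the kind of condition the script's options do not cover, which is where a hand-written query becomes useful.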
- Requirements:
Category | Requirements |
---|---|
Python | Python 3 |
Standard Libraries | `argparse`, `sqlite3`, `sys` |
External Libraries | None |
- Arguments:
Argument | Function | Required | Example |
---|---|---|---|
`-db`, `--database` | Specifies the SQLite3 database file generated by `get_metadata.py` | Yes | `-db database.db` |
`-l`, `--read_length` | Filter by minimum read length (default is 100) | No | `-l 100.0` |
`-f`, `--filters` | Filter rows where specified columns are not empty. Allowed columns: `pmid`, `species_cultivar`, `species_genotype`, `treatment`, `dev_stage`, `tissue`, `age`, `source_name` | No | `-f pmid species_cultivar` |
`-s`, `--strand` | Filter rows where `strand_info` is `PAIRED`, `SINGLE`, or `NULL` | No | `-s PAIRED SINGLE NULL` |
`-e`, `--exact_filter` | Filter by exact column=value pairs (can be used multiple times) | No | `-e species_name Setaria_viridis` |
`--create_table` | Create the `filtered_sra_metadata` table and insert filtered data | No | `--create_table` |
`--output_file` | Output file to save filtered `sra_id` values | Yes | `--output_file filtered_sra_ids.txt` |
`--verbose` | Enables verbose output | No | `--verbose` |
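
To make the semantics of these options more concrete, the sketch below shows one way they could translate into an SQL `WHERE` clause. This is not the script's actual source. The helper `build_where` is hypothetical, the column name `read_length` is an assumption, and so are the choices to compare read length with `>=` and to treat both `NULL` and the empty string as "empty".

```python
# Columns accepted by -f / --filters, as listed in the table above.
ALLOWED_COLUMNS = {
    "pmid", "species_cultivar", "species_genotype", "treatment",
    "dev_stage", "tissue", "age", "source_name",
}


def build_where(min_length=100, not_empty=(), strands=(), exact=()):
    """Hypothetical helper: return (where_sql, params) for the given filters."""
    clauses = ["read_length >= ?"]  # `read_length` is an assumed column name
    params = [min_length]

    # -f: keep rows where each listed column has a value.
    for column in not_empty:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"column not allowed: {column}")
        clauses.append(f"({column} IS NOT NULL AND {column} != '')")

    # -s: any of the listed strand values; NULL needs IS NULL, not '='.
    options = []
    for strand in strands:
        if strand == "NULL":
            options.append("strand_info IS NULL")
        else:
            options.append("strand_info = ?")
            params.append(strand)
    if options:
        clauses.append("(" + " OR ".join(options) + ")")

    # -e: exact matches. Column names cannot be bound as parameters,
    # so they are checked before being placed in the SQL text.
    for column, value in exact:
        if not column.isidentifier():
            raise ValueError(f"invalid column name: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)

    return " AND ".join(clauses), params


where_sql, params = build_where(
    100, ["pmid", "species_cultivar"], ["SINGLE", "NULL"]
)
print(where_sql)
print(params)
```

Under these assumptions, every option adds one condition and the conditions are joined with `AND`, so each extra flag can only narrow the result.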
- Run example:
python3 filter_metadata.py -db database.db -l 100 -f pmid species_cultivar -s SINGLE NULL --create_table --output_file filtered_sra_ids.txt --verbose
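
Since the output file holds one SRR accession per line, downstream steps can load it with a few lines of Python. The sketch below is a minimal example. The function name `read_srr_ids` is hypothetical, and the demo writes its own small file with made-up accessions instead of using real output.

```python
import tempfile
from pathlib import Path


def read_srr_ids(path):
    """Return the accessions listed in the file, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    # Demo with a temporary file standing in for filtered_sra_ids.txt.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "filtered_sra_ids.txt"
        demo.write_text("SRR0000001\nSRR0000004\n\n")
        ids = read_srr_ids(demo)
        print(f"{len(ids)} accessions loaded")
```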
Errors