bb_fastq_stats - ampinzonv/BB3 GitHub Wiki
bb_fastq_stats
Function: Generate basic statistics from a FASTQ file, with support for random subsampling.
WARNING: For large FASTQ files, bb_fastq_stats can take a very long time to output statistics because it does not natively subsample the library, meaning that it uses every sequence in your FASTQ file to calculate statistics. We advise subsampling the library first and then calculating the statistics:
bb_fastq_subsample --input file.fq --sample_size 5 --quiet | bb_fastq_stats --input -
The command above will randomly subsample 5% of the library and then calculate statistics.
Description
This function analyzes the reads in a FASTQ file and produces a statistical summary that includes:
- Total number of reads
- Total sequence length
- Minimum and maximum read length
- Average read length
- Percent of bases with quality ≥ Q20 and Q30
By default, it samples 10% of the reads to speed up processing. You can change this with --sample_size.
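As a rough illustration of how the metrics listed above can be derived, here is a minimal awk sketch. It is not the bb_fastq_stats implementation. The function name `fastq_stats_sketch` and the column order are hypothetical, and the sketch assumes four-line FASTQ records with Phred+33 quality encoding.

```shell
# Hypothetical sketch, not the actual bb_fastq_stats code.
# Assumes 4-line FASTQ records and Phred+33 quality encoding.
# Prints one tab-separated line:
#   reads  total_bases  min_len  max_len  mean_len  pct_q20  pct_q30
fastq_stats_sketch() {
  awk '
    BEGIN {
      # awk has no ord(), so build a character-to-ASCII-code lookup table.
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
      min = -1
    }
    NR % 4 == 2 {
      # Sequence line: update read count and length statistics.
      len = length($0)
      reads++
      total += len
      if (min < 0 || len < min) min = len
      if (len > max) max = len
    }
    NR % 4 == 0 {
      # Quality line: count bases at or above Q20 and Q30.
      n = length($0)
      for (i = 1; i <= n; i++) {
        q = ord[substr($0, i, 1)] - 33
        if (q >= 20) q20++
        if (q >= 30) q30++
      }
    }
    END {
      if (reads == 0 || total == 0) {
        print "0\t0\t0\t0\t0.00\t0.00\t0.00"
        exit
      }
      printf "%d\t%d\t%d\t%d\t%.2f\t%.2f\t%.2f\n",
        reads, total, min, max, total / reads,
        100 * q20 / total, 100 * q30 / total
    }
  ' "$@"
}
```

The function reads from standard input when called without arguments, for example `fastq_stats_sketch < reads.fastq`.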
Input
- A FASTQ file, plain or gzip-compressed.
- You can also use standard input with --input -.
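The three input forms described above (plain file, gzip-compressed file, and `-` for standard input) could be handled along these lines. This is only a sketch under assumptions: `read_fastq_sketch` is a hypothetical helper, compression is detected from the `.gz` suffix alone, and `gzip -dc` stands in for the platform-specific zcat/gzcat commands.

```shell
# Hypothetical helper, not part of BB3: write FASTQ text to STDOUT
# regardless of whether the input is STDIN, a plain file, or a .gz file.
read_fastq_sketch() {
  if [ "$1" = "-" ]; then
    # "-" means: pass standard input through unchanged.
    cat
  else
    case "$1" in
      *.gz) gzip -dc -- "$1" ;;  # decompress to STDOUT
      *)    cat -- "$1" ;;       # plain text file
    esac
  fi
}
```

For example, `read_fastq_sketch reads.fastq.gz | head -n 4` would print the first record of a compressed file.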
Output
- A single-line summary report (tabular) with key metrics.
Examples
Analyze a file with the default settings:
bb_fastq_stats --input reads.fastq
Use a 5% subsample:
bb_fastq_stats --input reads.fastq --sample_size 5
Save the output:
bb_fastq_stats --input reads.fastq --outfile stats.tsv
Use in a pipe:
cat reads.fastq | bb_fastq_stats --input -
Usage
bb_fastq_stats --input FILE [--outfile FILE] [--sample_size PCT] [--quiet] [--force]
Options
Option | Description |
---|---|
--input FILE | Input FASTQ file (or - for STDIN) (required) |
--outfile FILE | File to save output (optional, default: STDOUT) |
--sample_size PCT | Percent of reads to randomly sample (1–100, default: 10) |
--quiet | Suppress log messages |
--force | Overwrite output file if it exists |
Notes
- Compatible with compressed .gz FASTQ files (requires gzcat on macOS, zcat on Linux).
- Uses internal random shuffling to select sampled reads.
- Larger sample sizes yield more precise estimates.
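To make the idea of percentage-based read sampling concrete, here is a minimal sketch. It uses independent per-record random selection rather than the shuffling mentioned above, so it is a swapped-in technique and the number of reads kept is only approximately the requested percentage. The function name `fastq_subsample_sketch` and its seed argument are hypothetical, and four-line FASTQ records are assumed.

```shell
# Hypothetical sketch, not the BB3 implementation.
# Usage: fastq_subsample_sketch PCT [SEED] < reads.fastq
# Keeps each 4-line record independently with probability PCT/100.
fastq_subsample_sketch() {
  awk -v pct="$1" -v seed="${2:-42}" '
    BEGIN { srand(seed) }
    # On each header line, decide whether to keep the whole record.
    NR % 4 == 1 { keep = (rand() * 100 < pct) }
    # Print every line of a record that was selected.
    keep
  '
}
```

Because each record is kept independently, small files can return noticeably more or fewer reads than the nominal percentage, which is one reason larger samples give steadier estimates.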