Generating shuffled predictions
It can be helpful to compare the PICRUSt2 output tables with tables based on shuffling the predictions across all amplicon sequence variants (ASVs). The script shuffle_predictions.py was added in v2.4.0 to make this task easier. This script randomizes the ASV labels for all predicted genomes, so the individual predicted genomes themselves are unchanged; they are simply linked to different ASVs, and therefore to different ASV abundances across samples.
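To make the idea of label shuffling concrete, here is a toy sketch using standard command-line tools. It is not how shuffle_predictions.py is implemented; the table, the ASV names, and the EC columns are made up for illustration, and it assumes GNU coreutils (for shuf) is available.

```shell
set -eu

# Toy prediction table (hypothetical ASV names and EC counts), tab-separated.
workdir=$(mktemp -d)
printf 'sequence\tEC:1.1.1.1\tEC:2.7.7.7\n' >  "$workdir/toy.tsv"
printf 'ASV1\t1\t0\n'                      >> "$workdir/toy.tsv"
printf 'ASV2\t0\t2\n'                      >> "$workdir/toy.tsv"
printf 'ASV3\t3\t1\n'                      >> "$workdir/toy.tsv"

# Split the table into its header, its ASV labels, and its genome rows.
head -n 1 "$workdir/toy.tsv"              > "$workdir/header.tsv"
tail -n +2 "$workdir/toy.tsv" | cut -f 1  > "$workdir/labels.txt"
tail -n +2 "$workdir/toy.tsv" | cut -f 2- > "$workdir/genomes.tsv"

# Permute only the labels; the genome rows themselves are left untouched.
shuf "$workdir/labels.txt" > "$workdir/labels_shuffled.txt"

# Reattach the permuted labels to the unchanged genome rows.
{
  cat "$workdir/header.tsv"
  paste "$workdir/labels_shuffled.txt" "$workdir/genomes.tsv"
} > "$workdir/toy_shuffled.tsv"

cat "$workdir/toy_shuffled.tsv"
```

Every genome row still appears exactly once in the output, and every ASV label is still used exactly once; only the pairing between them changes.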
This is how you could run the command with the tutorial data:
shuffle_predictions.py -i EC_predicted.tsv.gz \
                       -o EC_predicted_shuffled \
                       -r 5 \
                       -s 131
Here -r specifies how many random replicates to make and -s specifies a random seed (131 in this example), so that the same shuffled tables are produced if the same seed is used again.
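One way to confirm that a seed gives reproducible output is to run the command twice into two different folders and compare the results. The helper below is a minimal sketch of that comparison. The function name tables_match and the folder names run_a and run_b are hypothetical. It compares decompressed contents rather than the .gz files themselves, since gzip headers can embed a timestamp.

```shell
# Return 0 if every *.tsv.gz in the first folder has a counterpart with
# identical decompressed content in the second folder, and 1 otherwise.
tables_match() {
  local dir_a="$1"
  local dir_b="$2"
  local f name
  for f in "$dir_a"/*.tsv.gz; do
    [ -e "$f" ] || return 1               # no tables found in dir_a
    name=$(basename "$f")
    [ -f "$dir_b/$name" ] || return 1     # counterpart is missing
    # Compare checksums of the decompressed tables, not of the archives.
    [ "$(gzip -dc "$f" | cksum)" = "$(gzip -dc "$dir_b/$name" | cksum)" ] || return 1
  done
  return 0
}

# Hypothetical usage (run_a and run_b are illustrative folder names):
#   shuffle_predictions.py -i EC_predicted.tsv.gz -o run_a -r 5 -s 131
#   shuffle_predictions.py -i EC_predicted.tsv.gz -o run_b -r 5 -s 131
#   tables_match run_a run_b && echo "same seed, same tables"
```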
The gene family and pathway-level prediction tables can then be generated from these shuffled tables by running the standard PICRUSt2 commands. Below is an example of how to quickly run metagenome_pipeline.py and pathway_pipeline.py on all shuffled tables with a bash loop.
# Make folders for shuffled output
mkdir EC_metagenome_out_shuffled
mkdir pathways_out_shuffled

for i in {1..5}; do

    # Define in and out file paths.
    EC_SHUFFLED="EC_predicted_shuffled/EC_predicted_shuf${i}.tsv.gz"
    OUT_META="EC_metagenome_out_shuffled/rep${i}"
    OUT_PATHWAYS="pathways_out_shuffled/rep${i}"

    # PICRUSt2 scripts to get prediction abundance tables for gene and pathway levels, respectively.
    metagenome_pipeline.py -i ../table.biom \
                           -m marker_predicted_and_nsti.tsv.gz \
                           -f "$EC_SHUFFLED" \
                           -o "$OUT_META" \
                           --strat_out

    pathway_pipeline.py -i "$OUT_META/pred_metagenome_contrib.tsv.gz" \
                        -o "$OUT_PATHWAYS" \
                        -p 1
done
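After a loop like this, it can be worth checking that every replicate actually produced its output before moving on. The following is a small sketch of such a check. The function name missing_reps is hypothetical, and it relies only on the rep<i> folder layout used in the loop above.

```shell
# Print the number of each replicate whose expected output file is missing
# or empty. Arguments: parent folder, file name expected inside each
# rep<i> folder, and the number of replicates.
missing_reps() {
  local parent="$1"
  local fname="$2"
  local n="$3"
  local i=1
  while [ "$i" -le "$n" ]; do
    # -s is true only if the file exists and has a size greater than zero.
    [ -s "$parent/rep$i/$fname" ] || echo "$i"
    i=$((i + 1))
  done
}

# Example usage with the folder and file names from the loop above:
#   missing_reps EC_metagenome_out_shuffled pred_metagenome_contrib.tsv.gz 5
```

If the function prints nothing, all replicates have a non-empty file with that name. The same check can be pointed at pathways_out_shuffled once the name of the pathway output file of interest is known.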
These shuffled tables are especially helpful to get a baseline for how the predicted functional data differentiates samples (e.g. based on ordination or differential abundance testing) when the predicted ASV genomes are assigned randomly.
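As a rough sketch of how such a baseline might be summarized, suppose you have computed the same summary statistic (for example, the number of significantly different ECs) on the real table and on each shuffled replicate. The helper below reports how many shuffled replicates reach or exceed the observed value. The function name and all of the numbers are hypothetical, and the statistic itself is assumed to be computed elsewhere.

```shell
# Report how many shuffled replicates give a statistic at least as large as
# the one observed with the real (unshuffled) predictions.
# Arguments: the observed value, then one value per shuffled replicate.
frac_shuffled_ge_observed() {
  local observed="$1"
  shift
  printf '%s\n' "$@" | awk -v obs="$observed" '
    $1 + 0 >= obs + 0 { ge++ }             # force a numeric comparison
    END { printf "%d/%d\n", ge, NR }'
}

# Hypothetical numbers: 120 significant ECs with the real predictions,
# and 35, 41, 122, 38, and 29 with five shuffled replicates.
frac_shuffled_ge_observed 120 35 41 122 38 29
# prints 1/5
```

With only five replicates this fraction is very coarse, so a larger value of -r would be needed for anything resembling an empirical p-value.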