Output files - mooreryan/ZetaHunter GitHub Wiki
ZetaHunter produces many output files. Here is the basic directory structure.
OTU calls
Final OTU calls
Final ZOTU calls for each sequence are in the file ZH_output/otu_calls/otu_calls.final.txt. In this file, every sequence is listed with its sample number (S#), ZOTU call, percent of maximum entropy, percent of masked bases, and any flags. Sample numbers can be mapped to the original file names using the sample_id_to_fname.txt file.
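A common first step is to count how many sequences in each sample received each ZOTU call. The sketch below does this under stated assumptions. It assumes otu_calls.final.txt is tab-separated with the columns in the order listed above (sequence, sample, ZOTU call, % max entropy, % masked, flags). It also assumes sample_id_to_fname.txt holds one "S#<TAB>file name" pair per line. Neither layout is confirmed here, so check both against your own output. The method names are hypothetical.

```ruby
# Sketch: tally final ZOTU calls per sample.
# ASSUMPTION: otu_calls.final.txt is tab-separated with columns
#   sequence, sample (S#), ZOTU call, % max entropy, % masked, flags
# ASSUMPTION: sample_id_to_fname.txt has "S#<TAB>file name" per line.

# Build a Hash of sample ID => original file name.
def parse_sample_names(lines)
  lines.each_with_object({}) do |line, names|
    id, fname = line.chomp.split("\t", 2)
    names[id] = fname if id && fname
  end
end

# Build a nested Hash: sample ID => { ZOTU call => sequence count }.
def tally_calls(lines)
  counts = Hash.new { |h, k| h[k] = Hash.new(0) }
  lines.each do |line|
    fields = line.chomp.split("\t")
    # Skip header or blank lines: the sample column must look like S1, S2, ...
    next unless fields.size >= 3 && fields[1].match?(/\AS\d+\z/)
    counts[fields[1]][fields[2]] += 1
  end
  counts
end

# Usage: ruby tally_calls.rb otu_calls.final.txt sample_id_to_fname.txt
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  calls_path, map_path = ARGV
  names = parse_sample_names(File.readlines(map_path))
  tally_calls(File.readlines(calls_path)).each do |sample, otus|
    # Most abundant ZOTU first within each sample.
    otus.sort_by { |otu, n| [-n, otu] }.each do |otu, n|
      puts [names.fetch(sample, sample), otu, n].join("\t")
    end
  end
end
```

The script name tally_calls.rb is only an example. The output has one line per sample and ZOTU pair, and it uses the original file name wherever the mapping file provides one.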
Closed-reference ZOTUs are indicated as ZetaOtu#. De novo classified OTUs are indicated as NewZetaOtu# and are ordered by abundance.
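The two label types can be told apart by prefix. Because "NewZetaOtu12" also contains the substring "ZetaOtu", both patterns must be anchored at the start of the label. The sketch below assumes labels are spelled exactly as the prefix plus a number, for example ZetaOtu3 or NewZetaOtu12. It also assumes that "ordered by abundance" means a lower number marks a more abundant de novo OTU. Both are assumptions, and the method names are hypothetical.

```ruby
# Sketch: classify ZOTU labels as closed-reference or de novo.
# ASSUMPTION: labels look like "ZetaOtu<n>" or "NewZetaOtu<n>".
CLOSED_REF = /\AZetaOtu(\d+)\z/
DE_NOVO    = /\ANewZetaOtu(\d+)\z/

def otu_kind(label)
  case label
  when CLOSED_REF then :closed_reference
  when DE_NOVO    then :de_novo
  else :other # e.g. an outgroup call
  end
end

# ASSUMPTION: NewZetaOtu numbering follows abundance (1 = most abundant),
# so sort on the numeric suffix. A plain string sort would put
# "NewZetaOtu10" before "NewZetaOtu2".
def de_novo_in_rank_order(labels)
  labels.select { |l| otu_kind(l) == :de_novo }
        .uniq
        .sort_by { |l| l[DE_NOVO, 1].to_i }
end
```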
Outgroups and other flags
The ZetaHunter database includes 23 outgroup sequences: representatives from each class of the Proteobacteria, plus one sequence each from Thermotogae and Aquificae. If a sequence has 97% or greater identity to an outgroup sequence in the ZetaHunter database, the OTU call is that outgroup and the flag OG_GTE_97 is applied (OutGroup, Greater Than or Equal to 97% identity).
If a sequence's closest match is an outgroup sequence but its identity to that outgroup is below 97%, the sequence is still classified as a NewZetaOtu but is given the flag OG_LT_97 (LT = Less Than). These sequences are likely not Zetaproteobacteria, though they could be novel Zetaproteobacteria near the base of the phylogenetic tree.
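The outgroup rule can be written as a small decision function. Only the 97% threshold and the flag names OG_GTE_97 and OG_LT_97 come from the description above. The Hit struct and the outgroup_decision method are hypothetical, and :de_novo is a placeholder meaning "send to de novo clustering", where the sequence later receives its NewZetaOtu# label.

```ruby
# Sketch of the outgroup rule. Hit and outgroup_decision are HYPOTHETICAL;
# the 97% cutoff and flag names come from the documentation.
OUTGROUP_PID_CUTOFF = 97.0

Hit = Struct.new(:name, :pid, :outgroup)

# Returns [call, flags], or nil when the closest hit is not an outgroup
# (those sequences follow the normal ZOTU assignment rules instead).
def outgroup_decision(hit)
  return nil unless hit.outgroup

  if hit.pid >= OUTGROUP_PID_CUTOFF
    [hit.name, ["OG_GTE_97"]]   # called as the outgroup itself
  else
    [:de_novo, ["OG_LT_97"]]    # becomes a NewZetaOtu, but flagged
  end
end
```

Exactly 97.0% falls on the OG_GTE_97 side, matching "greater than or equal" in the flag name.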
Sequences covering less than 75% of the Zetaproteobacteria 16S rRNA entropy (i.e., roughly 75% of the base positions carrying the information that separates ZOTUs) are given the FRAGMENT flag. Singletons are also flagged, as are chimeras (as identified by UCHIME). A single sequence may have multiple flags.
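One way to read the FRAGMENT rule is as entropy-weighted coverage: sum the entropy at alignment positions where the sequence has a base, then divide by the total entropy. The sketch below assumes that reading, with per-position entropy values for the alignment available as an array. It also assumes the chimera verdict comes from a separate UCHIME run. Apart from FRAGMENT, the flag names and all method names are hypothetical placeholders.

```ruby
# Sketch of the flagging rules. ASSUMPTIONS: per-position entropies are
# known, "covered" means the sequence has a base (not a gap) there, and the
# chimera verdict comes from UCHIME. SINGLETON/CHIMERA names are HYPOTHETICAL.
FRAGMENT_CUTOFF = 0.75

# Fraction of total alignment entropy that falls on covered positions.
def entropy_coverage(entropies, covered)
  total = entropies.sum.to_f
  return 0.0 if total.zero?
  entropies.zip(covered).sum { |e, c| c ? e : 0.0 } / total
end

# Collect every flag that applies; a sequence may get several.
def flags_for(entropies, covered, otu_size:, chimera:)
  flags = []
  flags << "FRAGMENT" if entropy_coverage(entropies, covered) < FRAGMENT_CUTOFF
  flags << "SINGLETON" if otu_size == 1
  flags << "CHIMERA" if chimera
  flags
end
```

Weighting by entropy means a sequence missing only low-information (conserved) positions can still clear the 75% cutoff.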
Biom output
The standard BIOM format is useful for comparing ZOTU composition across multiple samples. ZH_output/biom/biom.txt shows ZOTU abundance within each sample and can be used to create bar charts of ZOTU abundance. ZetaHunter automatically converts this file into node and edge files that can be loaded into Cytoscape as an OTU network. These files show how ZOTUs are connected within and between a user's samples. Filtering gives further control over which NewZetaOtus appear in the network. The supplied script (ZetaHunter/bin/biom_to_cytoscape.rb) removes NewZetaOtus with fewer than a user-defined minimum number of sequences, like so:
biom_to_cytoscape.rb biom.txt min_otu_size
The default min_otu_size is 1.
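The filter-then-network idea can be illustrated in a few lines. This is not the code of biom_to_cytoscape.rb, and the table layout is an assumption: a tab-separated file whose header row starts with "#OTU ID" followed by sample names, with each later row holding an OTU label and its per-sample counts. The edge list (one sample-to-OTU edge per nonzero count, weighted by that count) is one plausible network layout. It is not necessarily the format ZetaHunter writes, and all method names are hypothetical.

```ruby
# Sketch: drop small NewZetaOtus, then emit sample -> OTU edges.
# ASSUMPTION: "#OTU ID<TAB>S1<TAB>S2..." header, then "OTU<TAB>count..." rows.

def read_table(lines)
  rows = lines.map { |l| l.chomp.split("\t") }.reject(&:empty?)
  header = rows.find { |r| r.first == "#OTU ID" }
  raise "no '#OTU ID' header row" unless header
  counts = rows.reject { |r| r.first.start_with?("#") }
               .map { |r| [r.first, r.drop(1).map(&:to_i)] }
               .to_h
  [header.drop(1), counts]
end

# Keep every closed-reference ZetaOtu; drop NewZetaOtus whose total
# sequence count is below min_otu_size.
def filter_new_otus(counts, min_otu_size = 1)
  counts.reject { |otu, row| otu.start_with?("NewZetaOtu") && row.sum < min_otu_size }
end

# One [sample, otu, count] edge per nonzero cell.
def edges(samples, counts)
  counts.flat_map do |otu, row|
    row.each_with_index.select { |n, _| n > 0 }.map { |n, i| [samples[i], otu, n] }
  end
end

# Usage: ruby filter_edges.rb biom.txt [min_otu_size]
if __FILE__ == $PROGRAM_NAME && ARGV.size.between?(1, 2)
  samples, counts = read_table(File.readlines(ARGV[0]))
  min = (ARGV[1] || 1).to_i
  edges(samples, filter_new_otus(counts, min)).each { |e| puts e.join("\t") }
end
```

With the default of 1, no OTU present in the table is removed. Raising the value prunes rare NewZetaOtus from the network while leaving closed-reference ZOTUs untouched.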
Miscellaneous output
Another important output file is ZH_output/misc/closest_db_seqs.txt, which lists the closest database hits for novel NewZetaOtus. The closest hit determines the candidate OTU, while the percent identity (PID) to that hit determines whether the sequence receives that OTU as its final call (i.e., PID greater than the OTU cutoff) or is passed to de novo clustering.
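The closest-hit rule amounts to splitting the sequences into two groups: those whose hit is close enough to keep its database OTU, and those queued for de novo clustering. The ClosestHit struct and split_by_cutoff method are hypothetical. The cutoff is a parameter because the text only says "OTU cutoff". The text also says "greater than", so the comparison is strict. Check ZetaHunter's source if the exact boundary matters.

```ruby
# Sketch of the closest-hit rule. ClosestHit and split_by_cutoff are
# HYPOTHETICAL; the cutoff value must be supplied by the caller.
ClosestHit = Struct.new(:seq_id, :db_otu, :pid)

# Returns [final_calls, de_novo_queue]:
#   final_calls   - Hash of seq_id => database OTU (PID above the cutoff)
#   de_novo_queue - seq_ids passed on to de novo clustering
def split_by_cutoff(hits, cutoff)
  passed, failed = hits.partition { |h| h.pid > cutoff }
  final_calls = passed.map { |h| [h.seq_id, h.db_otu] }.to_h
  [final_calls, failed.map(&:seq_id)]
end
```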