Output file descriptions - agmcfarland/GeneGrouper GitHub Wiki
- Each row shows the group id, the number of members, and the min, mean, max, and standard deviation for a member's pair-wise dissimilarity to other members in the group, and the identity and coverage of a member's seed gene relative to the query gene.
- Each row shows the group id and each gene found in the representative group member.
- Each row shows the percentage of genomes from each taxon that have a member in a group. An asterisk indicates that at least one genome had more than one member in that group.
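The percentage-with-asterisk rule for the per-taxon table can be sketched as follows. This is a minimal illustration, not GeneGrouper's own code: it assumes group members are available as hypothetical (genome_id, taxon, group_id) tuples and that the denominator comes from a hypothetical genomes_per_taxon mapping (the number of genomes counted for each taxon).

```python
from collections import Counter, defaultdict


def taxon_group_percentages(members, genomes_per_taxon):
    """Percentage of genomes in each taxon with at least one member in each group.

    members: iterable of (genome_id, taxon, group_id) tuples, one per group member
             (a hypothetical input shape, not a GeneGrouper file format).
    genomes_per_taxon: hypothetical mapping of taxon -> number of genomes used
             as the denominator.
    Returns {(taxon, group_id): "50.0"}, with "*" appended when at least one
    genome in that taxon has more than one member in the group.
    """
    member_counts = Counter(members)  # members per (genome, taxon, group)
    genomes_with_member = defaultdict(set)
    has_multi = set()
    for (genome, taxon, group), n in member_counts.items():
        genomes_with_member[(taxon, group)].add(genome)
        if n > 1:
            has_multi.add((taxon, group))

    table = {}
    for (taxon, group), genomes in genomes_with_member.items():
        pct = 100.0 * len(genomes) / genomes_per_taxon[taxon]
        table[(taxon, group)] = f"{pct:.1f}" + ("*" if (taxon, group) in has_multi else "")
    return table


members = [
    ("genome_1", "taxon_A", 0),
    ("genome_1", "taxon_A", 0),  # a second member of group 0 in the same genome
    ("genome_2", "taxon_A", 0),
    ("genome_3", "taxon_B", 1),
]
print(taxon_group_percentages(members, {"taxon_A": 4, "taxon_B": 2}))
# {('taxon_A', 0): '50.0*', ('taxon_B', 1): '50.0'}
```

Counting members per (genome, taxon, group) first keeps the two questions separate: the set of genomes gives the percentage, and any count above one triggers the asterisk.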
This large table contains all data generated by the run. Each row is a gene that was found in a genome and has information about:
- Which genome it originates from, what contig it is on, and which group member it belongs to.
- Whether it is a pseudogene or not.
- The RefSeq locus tag, RefSeq gene name, and RefSeq product annotation.
- The GeneGrouper group it belongs to.
- The relative and pair-wise dissimilarity of the group member it belongs to.
- The position of the gene in the gene region.
- The start, end, and strand orientation of the gene.
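Because each row records which group member a gene belongs to and the gene's position in the gene region, the gene order of every member can be rebuilt from this table. The sketch below assumes the table is read as CSV with csv.DictReader and uses hypothetical column names (member_id, position, gene, pseudo); check the actual header of the file before adapting it.

```python
import csv
import io
from collections import defaultdict


def gene_order_by_member(rows, skip_pseudogenes=False):
    """Return {member_id: [gene, ...]} with genes sorted by position in the region.

    rows: dicts such as those yielded by csv.DictReader. The column names
    member_id, position, gene and pseudo are hypothetical placeholders.
    """
    regions = defaultdict(list)
    for row in rows:
        if skip_pseudogenes and row["pseudo"].strip().lower() in {"true", "1", "yes"}:
            continue
        regions[row["member_id"]].append((int(row["position"]), row["gene"]))
    # Sort each member's genes by their position within the gene region.
    return {member: [gene for _, gene in sorted(genes)] for member, genes in regions.items()}


example = io.StringIO(
    "member_id,position,gene,pseudo\n"
    "m1,2,geneB,False\n"
    "m1,1,geneA,False\n"
    "m1,3,geneC,True\n"
    "m2,1,geneA,False\n"
)
print(gene_order_by_member(csv.DictReader(example), skip_pseudogenes=True))
# {'m1': ['geneA', 'geneB'], 'm2': ['geneA']}
```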
- Each time the group inspection visualization is used, a key file listing each subgroup and all gene clusters that are members of that subgroup is saved in the subgroups folder.
- Each time the group inspection visualization is used, a file similar to group_regions.csv is saved in the subgroups folder.
- Contains the same information, but only for the representative of each subgroup.
- Contains the translated amino acid sequences of all genes from all gene regions. The header of each translated gene matches its orf_id in group_regions.csv.
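Since each sequence header matches an orf_id in group_regions.csv, sequences can be attached to table rows with a plain dictionary lookup. This sketch assumes the sequence file is in FASTA format with the orf_id as the first whitespace-delimited token of each header; the orf_id values and sequences in the example are made up.

```python
def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}, using the first token after '>' as the id."""
    sequences, current_id, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            current_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # a sequence may wrap over several lines
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return sequences


def attach_sequences(rows, sequences):
    """Return copies of group_regions.csv rows with a 'sequence' field looked up by orf_id."""
    return [dict(row, sequence=sequences.get(row["orf_id"])) for row in rows]


fasta_text = ">orf_1\nMKTAY\nIAKQR\n>orf_2\nMSLEQ\n"
rows = [{"orf_id": "orf_2"}, {"orf_id": "orf_1"}]
print(attach_sequences(rows, read_fasta(fasta_text.splitlines())))
# [{'orf_id': 'orf_2', 'sequence': 'MSLEQ'}, {'orf_id': 'orf_1', 'sequence': 'MKTAYIAKQR'}]
```

Rows whose orf_id has no matching header get sequence None, which makes mismatches easy to spot.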
Note: Some visualization files have a numbered suffix at the end because each visualization displays at most 30 groups. A search that produces 60 groups therefore has two of each visualization.
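The number of numbered files per visualization follows from the 30-groups-per-visualization limit. The sketch below assumes groups are split into consecutive blocks of 30 in display order; whether the suffixes start at 0 or 1 is not stated here, so file_index is 0-based purely for illustration.

```python
GROUPS_PER_VISUAL = 30  # each visualization displays at most 30 groups


def n_visual_files(n_groups):
    """Number of files per visualization type: ceil(n_groups / 30), in integer arithmetic."""
    return (n_groups + GROUPS_PER_VISUAL - 1) // GROUPS_PER_VISUAL


def file_index(group_rank):
    """0-based file index for the group at 0-based display position group_rank (assumed layout)."""
    return group_rank // GROUPS_PER_VISUAL


print(n_visual_files(60), n_visual_files(61), file_index(29), file_index(30))
# 2 3 0 1
```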
- Three-part visualization of the group search output. This visual shows how similar the seed gene of each gene cluster is to the query gene (left), the number of unique groups found (middle), how similar the gene clusters in a group are to each other (right), and how many gene clusters are in each group (far right).

- Heatmap showing the percentage of genomes in each taxon with at least one member in each group. An asterisk indicates that at least one genome in that taxon has more than one member in that group.

- The total number of genomes searched in each taxon (blue) and the number of genomes in that taxon from which at least one region was extracted (red).

- Three-part visualization of a group inspection output. The left panel shows the count of each unique subgroup architecture, the middle panel shows the subgroup architecture, and the right panel shows the dissimilarity of each subgroup's gene content relative to subgroup 0, which is also the group representative.
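The left and right panels of the group inspection visualization rest on two simple operations: counting how many members share each architecture, and scoring how different a subgroup's gene content is from subgroup 0. In the sketch below an architecture is assumed to be an ordered tuple of gene-cluster ids, and Jaccard distance on the sets of gene clusters stands in for the dissimilarity; it is not necessarily the measure GeneGrouper uses.

```python
from collections import Counter


def architecture_counts(architectures):
    """Count how many members share each unique architecture (ordered gene-cluster ids)."""
    return Counter(tuple(arch) for arch in architectures)


def jaccard_dissimilarity(arch_a, arch_b):
    """1 - shared / total over the sets of gene clusters in two architectures."""
    a, b = set(arch_a), set(arch_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


members = [("c1", "c2", "c3"), ("c1", "c2", "c3"), ("c1", "c3")]
counts = architecture_counts(members)
reference = ("c1", "c2", "c3")  # hypothetical subgroup 0 / group representative
for arch, n in counts.most_common():
    print(arch, n, round(jaccard_dissimilarity(arch, reference), 3))
# ('c1', 'c2', 'c3') 2 0.0
# ('c1', 'c3') 1 0.333
```

Keeping architectures as ordered tuples means two members with the same genes in a different order count as different architectures, while the set-based dissimilarity only reflects gene content.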

