RuleDescriptions - GeneMANIA/pipeline GitHub Wiki
Each rule in the build pipeline corresponds to a processing step, which may require other steps to be executed.
ALL
Build everything. This is the default target rule, and produces binary data files required by the GeneMANIA website.
ATTRIBUTES
Target rule for producing clean attribute files in text format.
MELT_ATTRIBUTES
Convert gene-attrib input files with ragged-length records into regular, tall thin tables with multiple records per gene.
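As an illustration of the melting step, the following minimal sketch turns ragged records into one (gene, attribute) row per pair. The tab separator, the gene-first column order, and the function name `melt_ragged` are assumptions made for this example, not the pipeline's actual implementation.

```python
def melt_ragged(lines, sep="\t"):
    """Turn 'gene<sep>attr1<sep>attr2...' records into (gene, attribute) rows.

    Assumes the gene is the first field and every further field is an attribute.
    """
    rows = []
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split(sep)]
        if not fields or not fields[0]:
            continue  # skip blank lines and records without a gene
        gene, attributes = fields[0], fields[1:]
        for attribute in attributes:
            if attribute:  # ignore empty fields
                rows.append((gene, attribute))
    return rows


# Hypothetical input: two ragged records and one blank line.
ragged = ["BRCA1\tdomainA\tdomainB\tdomainC", "TP53\tdomainB", ""]
tall = melt_ragged(ragged)
for gene, attribute in tall:
    print(gene, attribute, sep="\t")
```

Each gene with several attributes produces several output rows, which is what makes the table tall and thin.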
MELT_ATTRIBUTES2
Convert ragged attrib-gene input files into regular, tall thin tables.
TRANSPOSE_ATTRIBS
Convert attrib-gene pairs to gene-attrib pairs, the common format required for later processing.
MELT_ATTRIBUTES3
Convert gmt-format ragged input files into tall thin tables.
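A sketch of how gmt-format records could be melted, assuming the usual GMT layout of set name, description, then a variable number of gene symbols, all tab-separated. Emitting gene-first pairs and dropping the description column are assumptions for this example; `melt_gmt` is a hypothetical name.

```python
def melt_gmt(lines):
    """Turn GMT records (name, description, gene1, gene2, ...) into
    (gene, set_name) rows, one row per gene in each set."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # need a name, a description, and at least one gene
        set_name, genes = fields[0], fields[2:]
        for gene in genes:
            if gene:  # ignore empty fields
                rows.append((gene, set_name))
    return rows


# Hypothetical input with two gene sets of different lengths.
gmt_rows = melt_gmt(["SET1\tfirst set\tG1\tG2", "SET2\tna\tG2"])
```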
PROCESS_ATTRIBUTES
Remove records with unrecognized gene symbols from attribute files, producing new cleaned attribute files.
DEDUP_ATTRIBUTES
Remove duplicate attributes.
MAP_ATTRIBUTES_TO_IDS
Convert attributes represented by gene and attribute symbols to their equivalent internal genemania ids.
APPLY_ATTRIBUTE_ENUMERATION
Assign an internal genemania id to each attribute group.
UPDATE_ATTRIBUTE_DESCRIPTIONS
Create (attributeid, description) pairs for all attribute ids in the cleaned input, adding empty description fields where no description is given.
GENERIC_DB_ATTRIBUTE_GROUPS
Create the generic_db ATTRIBUTE_GROUPS.txt file.
GENERIC_DB_ATTRIBUTES
Create the generic_db ATTRIBUTES.txt file, containing the name/description of each attribute in each group.
GENERIC_DB_COPY_ATTRIBUTE_DATA
Copy attribute data files containing internal genemania ids to generic_db.
DIRECT_NETWORKS
Target rule for processing direct networks.
CLEANED_DIRECT_NETWORKS
Apply weight cleaning to direct networks, adding an implicit '1' weight if missing, and removing weights <=0.
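The cleaning rule described above can be sketched as follows. The tab-separated `geneA, geneB[, weight]` layout and the function name `clean_direct_network` are assumptions for illustration; only the two behaviours named in the description (implicit weight of 1, dropping weights <= 0) are taken from the text.

```python
def clean_direct_network(lines, sep="\t"):
    """Return (gene_a, gene_b, weight) tuples with cleaned weights."""
    cleaned = []
    for line in lines:
        fields = line.strip().split(sep)
        if len(fields) == 2:
            gene_a, gene_b = fields
            weight = 1.0  # implicit weight when the column is missing
        elif len(fields) == 3:
            gene_a, gene_b = fields[0], fields[1]
            weight = float(fields[2])
        else:
            continue  # not an interaction record
        if weight > 0:  # remove weights <= 0
            cleaned.append((gene_a, gene_b, weight))
    return cleaned


# Hypothetical input: one record without a weight, one valid, two to drop.
cleaned_edges = clean_direct_network(["A\tB", "A\tC\t0.5", "B\tC\t0", "C\tD\t-2"])
```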
PROCESS_DIRECT_NETWORKS
Apply genemania network normalization to direct networks.
BUILD_NETWORKS_CACHE
CacheBuilder: convert interaction networks to binary engine format.
ATTRIBUTE_DATA
AttributeBuilder: convert attributes to binary engine format.
POST_SPARSIFY
PostSparsifier: filter co-expression networks removing unsupported interactions.
NODE_DEGREES
NodeDegreeComputer: count interactions for each gene across the entire organism.
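A minimal sketch of the counting idea, assuming each network is available as a list of (gene_a, gene_b, weight) tuples and that an interaction contributes one count to each of its two genes in every network where it appears. The actual NodeDegreeComputer works on the binary engine data and may count differently; `node_degrees` is a hypothetical helper.

```python
from collections import Counter


def node_degrees(networks):
    """Count interactions per gene, summed over all networks."""
    degrees = Counter()
    for interactions in networks:
        for gene_a, gene_b, _weight in interactions:
            degrees[gene_a] += 1
            degrees[gene_b] += 1
    return degrees


# Two hypothetical networks for one organism.
network_1 = [("A", "B", 0.4), ("A", "C", 0.9)]
network_2 = [("A", "B", 0.1)]
degrees = node_degrees([network_1, network_2])
```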
ANNOTATION_DATA
AnnotationCacheBuilder: load functional annotation data into engine binary format.
FAST_WEIGHTING
FastWeightCacheBuilder: build precomputed data structures for GO-based network weighting.
ENRICHMENT_ANALYSIS
EnrichmentCategoryBuilder: build data structures for functional enrichment analysis.
DEFAULT_COEXPRESSION
DefaultNetworkSelector: select subset of co-expression networks to use as default networks.
PRECOMBINE_NETWORKS
NetworkPrecombiner: build combined networks for common queries that do not depend on the query list, such as single-gene queries.
TIDY_QUERIED_FUNCTIONS
Filter functional annotations removing unrecognized gene symbols.
ENRICHMENT_FUNCTIONS
Filter functional annotation categories by size for enrichment analysis.
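Size filtering of annotation categories can be sketched as below. The dictionary layout and the threshold values are hypothetical; the real limits are set by the pipeline configuration and are not stated here.

```python
def filter_categories_by_size(annotations, min_size, max_size):
    """Keep categories whose number of annotated genes lies within
    [min_size, max_size]. `annotations` maps category id -> set of genes."""
    return {
        category: genes
        for category, genes in annotations.items()
        if min_size <= len(genes) <= max_size
    }


# Hypothetical categories and thresholds, for illustration only.
annotations = {
    "GO:0000001": {"G1"},
    "GO:0000002": {"G1", "G2", "G3"},
    "GO:0000003": {"G1", "G2", "G3", "G4", "G5", "G6"},
}
kept = filter_categories_by_size(annotations, min_size=2, max_size=5)
```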
COMBINING_FUNCTIONS_BP
Filter functional annotation categories by size and branch for BP combining.
COMBINING_FUNCTIONS_MF
Filter functional annotation categories by size and branch for MF combining.
COMBINING_FUNCTIONS_CC
Filter functional annotation categories by size and branch for CC combining.
GENERIC_DB_FUNCTIONS
Build generic db file ONTOLOGY_CATEGORIES.txt listing sets of functional annotations available for enrichment analysis.
GENERIC_DB_FUNCTION_GROUPS
Build generic db file ONTOLOGIES.txt with names of function categories for display in enrichment analysis.
COPY_GOCAT_COMBINING_FILES
Copy functional annotation data files for GO based combining to generic db.
COPY_GOCAT_ENRICHMENT_FILES
Copy functional annotation data files for enrichment analysis to generic db.
GENERIC_DB_FUNCTIONS_ALL
Create flag file marking functional annotation data being ready in generic_db.
GENERIC_DB_TAGS
Create empty generic db file TAGS.txt; network tags are no longer supported.
GENERIC_DB_NETWORK_TAG_ASSOC
Create empty generic_db file NETWORK_TAG_ASSOC.txt; network tags are no longer supported.
GENERIC_DB_SCHEMA
Create generic db file SCHEMA.txt, listing fields in each file in generic db.
GENERIC_DB_STATISTICS
Create generic db file STATISTICS.txt containing the total interaction count and the dataset production date.
GENERIC_DB_ORGANISMS
Create generic db file ORGANISMS.txt containing each organism's descriptive metadata, such as scientific and common names.
GENERIC_DB_INTERACTIONS
Target rule for creating interaction data in generic_db format.
GENERIC_DB_COPY_INTERACTIONS
Copy network interaction files to generic_db.
MELT_RAW_IDENTIFIERS
Convert raw identifiers into id/symbol/source triplets, and remove genes with unneeded biotypes.
SCRUB_SYMBOLS
Target rule for producing cleaned gene identifier files.
APPLY_SYMBOL_SCRUBBING
Load all identifier input files containing id/symbol/source triplets and produce a single clean file, removing duplicates and clashes.
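One possible reading of the scrubbing step is sketched below. It assumes a "clash" means a symbol that maps to more than one gene id, and that symbols are compared case-insensitively; both are assumptions for this example, and the pipeline's actual clash rules may differ. `scrub_symbols` is a hypothetical name.

```python
from collections import defaultdict


def scrub_symbols(triplets):
    """Merge (gene_id, symbol, source) triplets into one clean list.

    Exact duplicates are dropped, then every symbol that points at more
    than one gene id (an assumed definition of a clash) is removed.
    """
    unique = list(dict.fromkeys(triplets))  # drop duplicates, keep order
    ids_by_symbol = defaultdict(set)
    for gene_id, symbol, _source in unique:
        ids_by_symbol[symbol.upper()].add(gene_id)
    return [t for t in unique if len(ids_by_symbol[t[1].upper()]) == 1]


# Hypothetical triplets: one duplicate and one clashing symbol ("ABC1").
triplets = [
    ("ID1", "TP53", "Gene Name"),
    ("ID1", "TP53", "Gene Name"),
    ("ID2", "ABC1", "Synonym"),
    ("ID3", "abc1", "Synonym"),
    ("ID2", "1234", "Entrez Gene ID"),
]
clean = scrub_symbols(triplets)
```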
IDENTIFIER_DESCRIPTIONS
Create table containing descriptions for only the clean gene symbols.
GENERIC_DB_NODES
Create generic db file NODES.txt, containing an id record for each unique gene (not symbol) in the system.
GENERIC_DB_GENES
Create generic db file GENES.txt containing all recognized identifier symbols.
GENERIC_DB_GENE_DATA
Create generic db file GENE_DATA.txt containing gene descriptions.
GENERIC_DB_GENE_NAMING_SOURCES
Create generic db file GENE_NAMING_SOURCES.txt enumerating all identifier source types such as Entrez ID, etc.
LUCENE_INDEX
Target rule for constructing Lucene index files.
LUCENE_CFG
Build a config file in format required by index construction program.
BUILD_LUCENE_INDEX
Build Lucene index from generic db files containing organism, network, attribute, and functional annotation metadata.
TABULATE_NETWORK_METADATA
Target rule for constructing a table containing network metadata.
APPLY_NETWORK_METADATA_TABULATION
Combine metadata from individual network config files into a single tabular file.
SET_MISSING_NETWORK_METADATA
Set default values where no network metadata was provided.
NETWORK_STATS_FILES
Target rule for computing interaction stats for all individual networks.
COMPUTE_NETWORK_STATS
Compute interaction stats for individual networks.
TABULATED_NETWORK_STATS
Target rule for constructing a table combining all individual network stats.
TABULATE_NETWORK_STATS
Combine individual network stats files into a single table.
INIT_PUBMED_CACHE
Create an empty pubmed data cache file if none exists.
FETCH_PUBMED_METADATA
Retrieve publication metadata from pubmed, where available. Create a new extended network metadata file adding the required fields from pubmed.
GENERATE_NETWORK_NAMES
Compute network names from publication metadata, if not given explicitly. Apply network name deduplication by adding letters 'A', 'B', etc. to networks with the same name and network group.
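The name deduplication can be sketched as follows. The space-separated suffix format and the (group, name) input layout are assumptions for this example; the text only states that letters are added to networks sharing both name and group.

```python
from collections import Counter, defaultdict
from string import ascii_uppercase


def dedup_network_names(networks):
    """Given (group, name) pairs, return names made unique within a group
    by appending 'A', 'B', ... to names that occur more than once.

    This sketch handles at most 26 networks with the same name and group.
    """
    totals = Counter(networks)
    seen = defaultdict(int)
    result = []
    for group, name in networks:
        key = (group, name)
        if totals[key] > 1:
            suffix = ascii_uppercase[seen[key]]
            seen[key] += 1
            result.append(f"{name} {suffix}")
        else:
            result.append(name)  # already unique within its group
    return result


# Hypothetical networks: the same name twice in one group, once in another.
names = dedup_network_names([
    ("Co-expression", "Smith-2010"),
    ("Co-expression", "Smith-2010"),
    ("Physical interactions", "Smith-2010"),
    ("Co-expression", "Lee-2012"),
])
```

Names that repeat only across different groups are left unchanged, since the rule applies to networks with the same name and the same network group.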
JOIN_NETWORK_INTERACTION_COUNTS
Incorporate network interaction counts into network metadata.
EXTRACT_NETWORKS
Create the generic db file NETWORKS.txt, listing all networks.
EXTRACT_NETWORK_GROUPS
Create generic db file NETWORK_GROUPS.txt.
EXTRACT_NETWORK_METADATA
Create generic db file NETWORK_METADATA.txt, containing publication references and descriptive data for each network.
PROFILES
Target rule for interaction networks created from profile data.
PROCESS_PROFILES_P2N
Convert profiles to networks via Pearson correlation.
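A minimal sketch of building a network from profiles with Pearson correlation. The dictionary input, the `threshold` parameter, and keeping only correlations above it are assumptions for illustration; the pipeline's own sparsification rules are not described here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # a constant profile has no defined correlation
    return sum(a * b for a, b in zip(dx, dy)) / denom


def profiles_to_network(profiles, threshold=0.0):
    """Return (gene_a, gene_b, r) for every gene pair with r > threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene_a], profiles[gene_b])
        if r > threshold:
            edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical expression profiles over three conditions.
profiles = {"G1": [1, 2, 3], "G2": [2, 4, 6], "G3": [3, 2, 1]}
edges = profiles_to_network(profiles)
```

In this example only the G1/G2 pair is positively correlated, so it is the only edge kept.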
PROCESS_PROFILES_NN
Apply genemania network normalization to networks created from profile data.
SHAREDNEIGHBOUR_NETWORKS
Target rule for interaction networks created from shared neighbour profile data.
PROCESS_SHAREDNEIGHBOUR_NETWORKS_P2N
Convert shared neighbour profiles to networks.
PROCESS_SHAREDNEIGHBOUR_NETWORKS_NN
Apply GeneMANIA network normalization to networks created from shared neighbour profiles.