RuleDescriptions - GeneMANIA/pipeline GitHub Wiki

Each rule in the build pipeline corresponds to a processing step, which may require other steps to be executed first.

ALL

Build everything. This is the default target rule, and produces binary data files required by the GeneMANIA website.

ATTRIBUTES

Target rule for producing clean attribute files in text format.

MELT_ATTRIBUTES

Convert gene-attrib input files with ragged-length records into regular, tall thin tables with multiple records per gene.
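As an illustration only, the melting step might look like the sketch below. It assumes tab-separated input where each record is a gene symbol followed by any number of attribute symbols; the function name `melt_gene_attribs` and the sample data are hypothetical and not taken from the pipeline.

```python
def melt_gene_attribs(lines, sep="\t"):
    """Turn ragged 'gene, attr1, attr2, ...' records into (gene, attribute) rows."""
    rows = []
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split(sep)]
        fields = [f for f in fields if f]  # drop empty cells
        if len(fields) < 2:
            continue  # a gene with no attributes contributes no rows
        gene, attribs = fields[0], fields[1:]
        rows.extend((gene, attrib) for attrib in attribs)
    return rows


# Hypothetical ragged input: records have different lengths.
ragged = ["GENE1\tattrA\tattrB\tattrC\n", "GENE2\tattrA\n", "GENE3\n"]
tall = melt_gene_attribs(ragged)
for gene, attrib in tall:
    print(gene, attrib, sep="\t")
```

Each output row holds exactly one gene and one attribute, so a gene with three attributes yields three rows.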

MELT_ATTRIBUTES2

Convert ragged attrib-gene input files into regular, tall thin tables.

TRANSPOSE_ATTRIBS

Convert attrib-gene to gene-attrib pairs, common format required for later processing.
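A minimal sketch of the transposition, assuming the tall table is held as a list of (attribute, gene) tuples; `transpose_pairs` is a hypothetical name.

```python
def transpose_pairs(attrib_gene_rows):
    """Swap the two columns of a tall (attribute, gene) table."""
    return [(gene, attrib) for attrib, gene in attrib_gene_rows]


attrib_gene = [("attrA", "GENE1"), ("attrA", "GENE2"), ("attrB", "GENE1")]
gene_attrib = transpose_pairs(attrib_gene)
```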

MELT_ATTRIBUTES3

Convert gmt-format ragged input files into tall thin tables.
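The sketch below shows one way such a conversion could work. It relies on the standard GMT layout (tab-separated; set name, description, then member genes) and assumes that the output is (gene, set name) pairs; the output column order and the name `melt_gmt` are assumptions, not taken from the pipeline.

```python
def melt_gmt(lines):
    """Turn GMT records (name, description, gene1, gene2, ...) into (gene, name) rows."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # no member genes on this line
        set_name, genes = fields[0], fields[2:]  # fields[1] is the description
        rows.extend((gene, set_name) for gene in genes if gene)
    return rows


# Hypothetical GMT input with two gene sets.
gmt = ["SET1\tfirst set\tGENE1\tGENE2\n", "SET2\tsecond set\tGENE2\n"]
gmt_rows = melt_gmt(gmt)
```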

PROCESS_ATTRIBUTES

Remove records with unrecognized gene symbols from attribute files, producing new cleaned attribute files.

DEDUP_ATTRIBUTES

Remove duplicate attributes.

MAP_ATTRIBUTES_TO_IDS

Convert attributes represented by gene and attribute symbols to their equivalent internal genemania ids.

APPLY_ATTRIBUTE_ENUMERATION

Assign an internal genemania id to each attribute group.

UPDATE_ATTRIBUTE_DESCRIPTIONS

Create (attributeid, description) pairs for all attribute ids in the cleaned input, adding empty description fields where no description is given.
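A small sketch of the pairing, assuming the known descriptions are available as a dictionary keyed by attribute id; the function name and the sample ids are hypothetical.

```python
def attribute_descriptions(attribute_ids, known_descriptions):
    """Pair every attribute id with its description, or '' when none is given."""
    return [(attr_id, known_descriptions.get(attr_id, "")) for attr_id in attribute_ids]


ids = [1, 2, 3]
known = {1: "kinase domain", 3: "zinc finger"}
pairs = attribute_descriptions(ids, known)
```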

GENERIC_DB_ATTRIBUTE_GROUPS

Create the generic_db ATTRIBUTE_GROUPS.txt file.

GENERIC_DB_ATTRIBUTES

Create the generic_db ATTRIBUTES.txt file, containing the name/description of each attribute in each group.

GENERIC_DB_COPY_ATTRIBUTE_DATA

Copy attribute data files containing internal genemania ids to generic_db.

DIRECT_NETWORKS

Target rule for processing direct networks.

CLEANED_DIRECT_NETWORKS

Apply weight cleaning to direct networks, adding an implicit '1' weight if missing, and removing weights <=0.
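The cleaning rule can be sketched as follows, assuming tab-separated records of the form gene, gene, optional weight. The function name `clean_weights` and the decision to skip records with any other field count are assumptions made for this example.

```python
def clean_weights(lines, sep="\t"):
    """Add an implicit weight of 1 where missing and drop weights <= 0."""
    cleaned = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) == 2:
            gene_a, gene_b = fields
            weight = 1.0  # implicit weight for unweighted interactions
        elif len(fields) == 3:
            gene_a, gene_b = fields[0], fields[1]
            weight = float(fields[2])
        else:
            continue  # assumption: skip malformed records
        if weight > 0:
            cleaned.append((gene_a, gene_b, weight))
    return cleaned


raw = ["G1\tG2\n", "G1\tG3\t0.5\n", "G2\tG3\t0\n", "G3\tG4\t-2\n"]
network = clean_weights(raw)
```

Here the unweighted G1/G2 record receives weight 1, while the zero and negative weights are removed.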

PROCESS_DIRECT_NETWORKS

Apply genemania network normalization to direct networks.

BUILD_NETWORKS_CACHE

CacheBuilder: convert interaction networks to binary engine format.

ATTRIBUTE_DATA

AttributeBuilder: convert attributes to binary engine format.

POST_SPARSIFY

PostSparsifier: filter co-expression networks, removing unsupported interactions.

NODE_DEGREES

NodeDegreeComputer: count interactions for each gene across the entire organism.
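To illustrate the idea (this is not the NodeDegreeComputer implementation), the sketch below counts, for each gene, the interactions it takes part in across several networks. It assumes each network is a list of (gene, gene, weight) tuples and that an interaction counts once for each of its two endpoints.

```python
from collections import Counter


def node_degrees(networks):
    """Count interactions per gene across all given networks."""
    degrees = Counter()
    for interactions in networks:
        for gene_a, gene_b, _weight in interactions:
            degrees[gene_a] += 1
            degrees[gene_b] += 1
    return degrees


# Two hypothetical networks for one organism.
networks = [
    [("G1", "G2", 1.0), ("G1", "G3", 0.5)],
    [("G2", "G3", 0.2)],
]
degrees = node_degrees(networks)
```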

ANNOTATION_DATA

AnnotationCacheBuilder: load functional annotation data into engine binary format.

FAST_WEIGHTING

FastWeightCacheBuilder: build precomputed data structures for GO-based network weighting.

ENRICHMENT_ANALYSIS

EnrichmentCategoryBuilder: build data structures for functional enrichment analysis.

DEFAULT_COEXPRESSION

DefaultNetworkSelector: select subset of co-expression networks to use as default networks.

PRECOMBINE_NETWORKS

NetworkPrecombiner: build combined networks for common queries that do not depend on the query list, such as single-gene queries.

TIDY_QUERIED_FUNCTIONS

Filter functional annotations removing unrecognized gene symbols.

ENRICHMENT_FUNCTIONS

Filter functional annotation categories by size for enrichment analysis.
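A size filter of this kind might be sketched as below. The category representation (a dictionary from category name to a set of genes) and the size limits used in the example are hypothetical; the actual thresholds are not stated here.

```python
def filter_categories_by_size(categories, min_size, max_size):
    """Keep only categories whose gene count lies within [min_size, max_size]."""
    return {
        name: genes
        for name, genes in categories.items()
        if min_size <= len(genes) <= max_size
    }


categories = {
    "GO:0000001": {"G1"},
    "GO:0000002": {"G1", "G2", "G3"},
    "GO:0000003": {"G1", "G2", "G3", "G4", "G5", "G6"},
}
# Hypothetical limits chosen only for this example.
kept = filter_categories_by_size(categories, min_size=2, max_size=5)
```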

COMBINING_FUNCTIONS_BP

Filter functional annotation categories by size and branch for BP combining.

COMBINING_FUNCTIONS_MF

Filter functional annotation categories by size and branch for MF combining.

COMBINING_FUNCTIONS_CC

Filter functional annotation categories by size and branch for CC combining.

GENERIC_DB_FUNCTIONS

Build generic db file ONTOLOGY_CATEGORIES.txt listing sets of functional annotations available for enrichment analysis.

GENERIC_DB_FUNCTION_GROUPS

Build generic db file ONTOLOGIES.txt with names of function categories for display in enrichment analysis.

COPY_GOCAT_COMBINING_FILES

Copy functional annotation data files for GO based combining to generic db.

COPY_GOCAT_ENRICHMENT_FILES

Copy functional annotation data files for enrichment analysis to generic db.

GENERIC_DB_FUNCTIONS_ALL

Create flag file marking functional annotation data being ready in generic_db.

GENERIC_DB_TAGS

Create empty generic db file TAGS.txt; network tags are no longer supported.

GENERIC_DB_NETWORK_TAG_ASSOC

Create empty generic_db file NETWORK_TAG_ASSOC.txt; network tags are no longer supported.

GENERIC_DB_SCHEMA

Create generic db file SCHEMA.txt, listing fields in each file in generic db.

GENERIC_DB_STATISTICS

Create generic db file STATISTICS.txt containing the total interaction count and the dataset production date.

GENERIC_DB_ORGANISMS

Create generic db file ORGANISMS.txt containing each organism's descriptive metadata, such as scientific and common names.

GENERIC_DB_INTERACTIONS

Target rule for creating interaction data in generic_db format.

GENERIC_DB_COPY_INTERACTIONS

Copy network interaction files to generic_db.

MELT_RAW_IDENTIFIERS

Convert raw identifiers into id/symbol/source triplets, and remove genes with unneeded biotypes.

SCRUB_SYMBOLS

Target rule for producing cleaned gene identifier files.

APPLY_SYMBOL_SCRUBBING

Load all identifier input files containing id/symbol/source triplets and produce a single clean file, removing duplicates and clashes.
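The sketch below illustrates one possible reading of this step. It assumes that a "clash" means the same symbol (compared case-insensitively) being attached to more than one gene id, and that all clashing records are dropped; both are assumptions, as are the function name and the sample source labels.

```python
from collections import defaultdict


def scrub_symbols(triplets):
    """Drop exact duplicate (id, symbol, source) triplets and clashing symbols."""
    unique = list(dict.fromkeys(triplets))  # remove duplicates, keep first-seen order
    ids_by_symbol = defaultdict(set)
    for gene_id, symbol, _source in unique:
        ids_by_symbol[symbol.upper()].add(gene_id)
    # Assumption: a symbol that points at several gene ids is ambiguous, so drop it.
    return [t for t in unique if len(ids_by_symbol[t[1].upper()]) == 1]


triplets = [
    ("ID1", "TP53", "Gene Name"),
    ("ID1", "TP53", "Gene Name"),    # exact duplicate
    ("ID1", "P04637", "Uniprot ID"),
    ("ID2", "ABC", "Synonym"),
    ("ID3", "abc", "Synonym"),       # clashes with ID2 when case is ignored
]
clean = scrub_symbols(triplets)
```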

IDENTIFIER_DESCRIPTIONS

Create table containing descriptions for only the clean gene symbols.

GENERIC_DB_NODES

Create generic db file NODES.txt, containing an id record for each unique gene (not symbol) in the system.

GENERIC_DB_GENES

Create generic db file GENES.txt containing all recognized identifier symbols.

GENERIC_DB_GENE_DATA

Create generic db file GENE_DATA.txt containing gene descriptions.

GENERIC_DB_GENE_NAMING_SOURCES

Create generic db file GENE_NAMING_SOURCES.txt enumerating all identifier source types, such as Entrez ID.

LUCENE_INDEX

Target rule for constructing Lucene index files.

LUCENE_CFG

Build a config file in format required by index construction program.

BUILD_LUCENE_INDEX

Build Lucene index from generic db files containing organism, network, attribute, and functional annotation metadata.

TABULATE_NETWORK_METADATA

Target rule for constructing a table containing network metadata.

APPLY_NETWORK_METADATA_TABULATION

Combine metadata from individual network config files into a single tabular file.

SET_MISSING_NETWORK_METADATA

Set default values where no network metadata was provided.

NETWORK_STATS_FILES

Target rule for computing interaction stats for all individual networks.

COMPUTE_NETWORK_STATS

Compute interaction stats for individual networks.

TABULATED_NETWORK_STATS

Target rule for constructing a table combining all individual network stats.

TABULATE_NETWORK_STATS

Combine individual network stats files into a single table.

INIT_PUBMED_CACHE

Create an empty pubmed data cache file if none exists.

FETCH_PUBMED_METADATA

Retrieve publication metadata from pubmed, where available. Create a new extended network metadata file adding the required fields from pubmed.

GENERATE_NETWORK_NAMES

Compute network names from publication metadata, if not given explicitly. Apply network name deduplication by adding letters 'A', 'B', etc. to networks with the same name and network group.
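The deduplication could be sketched as follows. It assumes networks are given as (group, name) tuples and that the letter is appended after a space; the separator, the function name, and the sample names are assumptions. With more than 26 networks sharing a name and group, this sketch leaves the extras unchanged.

```python
from collections import defaultdict
import string


def dedup_network_names(networks):
    """Append 'A', 'B', ... to names shared by several networks in the same group."""
    positions_by_key = defaultdict(list)
    for index, (group, name) in enumerate(networks):
        positions_by_key[(group, name)].append(index)

    names = [name for _group, name in networks]
    for (_group, name), positions in positions_by_key.items():
        if len(positions) > 1:
            for letter, index in zip(string.ascii_uppercase, positions):
                names[index] = f"{name} {letter}"
    return names


networks_meta = [
    ("Co-expression", "Smith-Jones-2010"),
    ("Co-expression", "Smith-Jones-2010"),
    ("Physical Interactions", "Smith-Jones-2010"),
    ("Co-expression", "Lee-2012"),
]
unique_names = dedup_network_names(networks_meta)
```

The same name in a different network group is left alone, since the rule only applies within a group.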

JOIN_NETWORK_INTERACTION_COUNTS

Incorporate network interaction counts into network metadata.

EXTRACT_NETWORKS

Create the generic db file NETWORKS.txt, listing all networks.

EXTRACT_NETWORK_GROUPS

Create generic db file NETWORK_GROUPS.txt.

EXTRACT_NETWORK_METADATA

Create generic db file NETWORK_METADATA.txt, containing publication references and descriptive data for each network.

PROFILES

Target rule for interaction networks created from profile data.

PROCESS_PROFILES_P2N

Convert profiles to networks via Pearson correlation.
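As a rough illustration of the technique (not the pipeline's implementation), the sketch below computes the Pearson correlation between every pair of gene profiles and keeps pairs above a cutoff as weighted interactions. The cutoff `min_corr`, the dictionary input format, and the function names are assumptions made for this example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treat as 0
    return cov / math.sqrt(var_x * var_y)


def profiles_to_network(profiles, min_corr=0.5):
    """Return (gene_a, gene_b, correlation) for pairs with correlation >= min_corr."""
    edges = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene_a], profiles[gene_b])
        if r >= min_corr:  # hypothetical cutoff, for illustration only
            edges.append((gene_a, gene_b, r))
    return edges


profiles = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],
    "G3": [4.0, 3.0, 2.0, 1.0],
}
edges = profiles_to_network(profiles)
```

In this toy data G1 and G2 are perfectly correlated and become an interaction, while G3 is anti-correlated with both and yields none.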

PROCESS_PROFILES_NN

Apply genemania network normalization to networks created from profile data.

SHAREDNEIGHBOUR_NETWORKS

Target rule for interaction networks created from shared neighbour profile data.

PROCESS_SHAREDNEIGHBOUR_NETWORKS_P2N

Convert shared neighbour profiles to networks.

PROCESS_SHAREDNEIGHBOUR_NETWORKS_NN

Apply GeneMANIA network normalization to networks created from shared neighbour profiles.