Getting the graph - googleinterns/data-dependency-graph-analysis GitHub Wiki
Introduction
There are two ways to get a graph. The first is to generate a random graph from a config (a YAML example is in the repo's configs folder). The second is to convert a proto that was created according to the schema into a networkx graph and save it in the correct format. This page describes the schema, the config, and the ways to create a supported graph.
Graph schema
The graph consists of the following entities:
- dataset
- system
- dataset collection
- system collection
- collection
- processing
- data integrity
Dataset entity-relationship schema can be seen below:
Graph config
The config file has four types of fields.
- Count - the number of nodes of a given type in the graph.
- Count_map - an int:int map, where the key is the number of elements in a group and the value is the number of groups with that many elements. For example, in dataset_count_map for dataset collections, 5:100 means that there are 100 dataset collections with 5 datasets each.
- Proba_map - an int:float map, where the value is the probability of the key. For example, in volatality_proba_map, 0:0.4 means that 40% of datasets are not volatile.
- Range - [int, int], the range of values for an attribute.
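The four field types can be read as in the minimal sketch below. It is not the repo's generator: the helper names are hypothetical, treating Range bounds as inclusive is an assumption, and only dataset_count_map and the 0:0.4 entry come from the examples above (the 2:10 entry, the 1:0.6 entry, and the [1, 10] range are made up for illustration).

```python
import random


def expand_count_map(count_map):
    """Expand {group_size: group_count} into a flat list of group sizes."""
    sizes = []
    for size, count in count_map.items():
        sizes.extend([size] * count)
    return sizes


def sample_from_proba_map(proba_map, n, rng):
    """Draw n keys, each chosen with the probability stored as its value."""
    keys = list(proba_map)
    weights = [proba_map[key] for key in keys]
    return rng.choices(keys, weights=weights, k=n)


def sample_from_range(bounds, n, rng):
    """Draw n integers from [low, high]; inclusive bounds are an assumption."""
    low, high = bounds
    return [rng.randint(low, high) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Count_map: 100 collections with 5 datasets, plus 10 with 2 (second entry is illustrative).
dataset_count_map = {5: 100, 2: 10}
group_sizes = expand_count_map(dataset_count_map)
# In this sketch every dataset belongs to exactly one collection.
dataset_count = sum(group_sizes)

# Proba_map: 0 (not volatile) with probability 0.4, 1 with probability 0.6 (illustrative).
volatility = sample_from_proba_map({0: 0.4, 1: 0.6}, dataset_count, rng)

# Range: one attribute value per dataset, drawn from a hypothetical [1, 10] range.
attribute_values = sample_from_range([1, 10], dataset_count, rng)
```

Expanding a count map first gives one group size per group, which is a convenient input for generating connections.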
Random graph generation
Random connections and attributes are generated based on the config.
The connection generator can create random one-to-many and many-to-many connections.
Many-to-many generation does not guarantee that the result matches the config exactly; it will most likely produce a similar distribution without very high outlier values.
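The difference between the two connection types can be sketched as follows. This is an illustration under stated assumptions, not the repo's connection generator: the function names, node ids, and sampling strategy are all hypothetical. In the one-to-many case every child gets exactly one parent, so the group sizes can be honoured exactly. In the many-to-many case the sketch fixes how many targets each source picks but leaves the number of sources per target to chance, which is one possible reason a generated graph only approximates a config.

```python
import random


def one_to_many(parent_ids, child_ids, group_sizes, rng):
    """Give parent i exactly group_sizes[i] children; each child has one parent."""
    if len(parent_ids) != len(group_sizes) or sum(group_sizes) != len(child_ids):
        raise ValueError("group sizes must cover every parent and every child")
    shuffled = list(child_ids)
    rng.shuffle(shuffled)
    edges = []
    start = 0
    for parent, size in zip(parent_ids, group_sizes):
        for child in shuffled[start:start + size]:
            edges.append((parent, child))
        start += size
    return edges


def many_to_many(source_ids, target_ids, out_degrees, rng):
    """Let source i pick out_degrees[i] distinct random targets."""
    targets = list(target_ids)
    edges = []
    for source, degree in zip(source_ids, out_degrees):
        degree = min(degree, len(targets))  # cannot pick more targets than exist
        for target in rng.sample(targets, degree):
            edges.append((source, target))
    return edges


rng = random.Random(1)  # fixed seed so the sketch is repeatable
datasets = ["d0", "d1", "d2"]

# Two collections holding 2 and 1 datasets respectively.
collection_edges = one_to_many(["c0", "c1"], datasets, [2, 1], rng)

# Two systems reading 2 and 3 datasets respectively; a dataset may be shared.
system_edges = many_to_many(["s0", "s1"], datasets, [2, 3], rng)
```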
python3 graph_generation/generate_from_config.py \
--output_file "output.graphml" \
--config_file "graph_generation/configs/config_15_09_20.yaml" \
--graph_type "networkx" \
--overwrite
Parameters
- output_file - path to the output file; use the .bin extension for a proto and the .graphml extension for a networkx graph
- config_file - path to a config file in YAML format
- graph_type - one of "proto" / "networkx"
- overwrite - defaults to False if not specified. If set, the existing graph is overwritten.
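The rules in the parameter list can be expressed as a small pre-flight check. The helper below is hypothetical and is not part of the repo. In particular, raising an error when the file exists and overwrite is not set is an assumption; the script itself may handle that case differently.

```python
from pathlib import Path

# Extensions documented above for each graph_type.
EXPECTED_SUFFIX = {"proto": ".bin", "networkx": ".graphml"}


def check_output_path(output_file, graph_type, overwrite=False):
    """Return the output path if it is safe to write, otherwise raise."""
    if graph_type not in EXPECTED_SUFFIX:
        raise ValueError(f"graph_type must be one of {sorted(EXPECTED_SUFFIX)}")
    path = Path(output_file)
    expected = EXPECTED_SUFFIX[graph_type]
    if path.suffix != expected:
        raise ValueError(f"{graph_type} output should end with {expected}")
    # Assumption: refuse to replace an existing graph unless overwrite is set.
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} exists; pass overwrite=True to replace it")
    return path
```

Calling check_output_path("output.graphml", "networkx") before a long generation run would catch a mismatched extension early.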
Generate from proto
If a graph has already been created with the proto schema in graph_generation/proto/config.proto, it can easily be converted to networkx format for later manipulation.
python3 graph_generation/proto_to_nx.py \
--proto_file "proto.bin" \
--nx_file "nx.graphml" \
--overwrite
Parameters
- proto_file - input proto file with the .bin extension
- nx_file - output file for the networkx graph; should have the .graphml extension
- overwrite - defaults to False if not specified. If set, the existing graph is overwritten.
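After conversion, a quick sanity check is to count the nodes and edges in the resulting .graphml file. Loading it with networkx's read_graphml is the natural route; the sketch below uses only the standard library so that it runs without extra dependencies. The sample document and its node ids are hypothetical and do not reflect the attributes the converter actually writes.

```python
import xml.etree.ElementTree as ET

# Namespace used by GraphML documents.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def summarize_graphml(xml_text):
    """Count the node and edge elements in a GraphML document."""
    root = ET.fromstring(xml_text)
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    return {"nodes": len(nodes), "edges": len(edges)}


# Tiny hand-written GraphML document standing in for a converted graph.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="dataset_1"/>
    <node id="system_1"/>
    <edge source="dataset_1" target="system_1"/>
  </graph>
</graphml>"""

summary = summarize_graphml(SAMPLE)
```

For a file on disk, the same function can be applied to the text returned by reading nx.graphml.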