Getting the graph - googleinterns/data-dependency-graph-analysis GitHub Wiki


There are two ways to get a graph. The first is to generate a random graph from a config (an example YAML file is in the repo's configs folder). The second is to convert a proto that was created according to the schema into a networkx graph and save it in the supported format. This page describes the schema, the config, and both ways of creating a supported graph.

Graph schema

A graph consists of the following entities:

  • dataset
  • system
  • dataset collection
  • system collection
  • collection
  • processing
  • data integrity

The dataset entity-relationship schema can be seen below:


Graph config

A config file has four types of fields:

  • Count - the number of nodes of a given type in the graph.
  • Count_map - an int:int map, where the key is the number of elements in a group and the value is the number of groups with that many elements. For example, in dataset_count_map for dataset collections, 5:100 means that there are 100 dataset collections with 5 datasets each.
  • Proba_map - a map from an attribute value to its probability (a float). For example, in volatality_proba_map, 0:0.4 means that 40% of datasets are not volatile.
  • Range - [int, int], the range of values for an attribute.
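The four field types can be illustrated with a minimal Python sketch. The field names, the helper functions, and the assumption that a Range is inclusive on both ends are all hypothetical and chosen for illustration; they are not taken from the repository's YAML or code.

```python
import random

# Hypothetical config fragment: one field per type described above.
config = {
    "dataset_collection_count": 3,             # Count
    "dataset_count_map": {5: 2, 2: 1},         # Count_map: group size -> number of groups
    "volatility_proba_map": {0: 0.4, 1: 0.6},  # Proba_map: value -> probability
    "dataset_slo_range": [1, 10],              # Range: [low, high], assumed inclusive
}


def expand_count_map(count_map):
    """Return one group size per group, e.g. {5: 2, 2: 1} -> [5, 5, 2]."""
    sizes = []
    for group_size, group_count in count_map.items():
        sizes.extend([group_size] * group_count)
    return sizes


def sample_from_proba_map(proba_map, n, rng):
    """Draw n values, each key chosen with its configured probability."""
    values = list(proba_map.keys())
    weights = list(proba_map.values())
    return rng.choices(values, weights=weights, k=n)


def sample_from_range(value_range, rng):
    """Draw one integer from the [low, high] range, both ends included."""
    low, high = value_range
    return rng.randint(low, high)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
group_sizes = expand_count_map(config["dataset_count_map"])
total_datasets = sum(group_sizes)
volatility = sample_from_proba_map(config["volatility_proba_map"], total_datasets, rng)
slo = sample_from_range(config["dataset_slo_range"], rng)
```

In this sketch the count map describes 3 groups, which matches the separate count field; a real config would need the same kind of consistency between its counts and its count maps.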

Random graph generation

Based on the config, random connections and attributes are generated.

The connection generator can create random one-to-many and many-to-many connections.

Many-to-many generation does not guarantee that the generated graph matches the config exactly. It will most likely produce a similar distribution, but without the very high-value outliers.
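The difference between the two connection types can be sketched as follows. This is not the repository's connection generator: the function names are hypothetical, and the many-to-many part uses a simple stub-matching approach chosen only to show why the requested counts may not be reproduced exactly.

```python
import random


def one_to_many(group_sizes, rng):
    """Assign each child to exactly one parent.

    group_sizes[i] is the number of children of parent i. Returns a dict
    parent_id -> list of child ids; child ids are shuffled before splitting.
    """
    children = list(range(sum(group_sizes)))
    rng.shuffle(children)
    connections, start = {}, 0
    for parent_id, size in enumerate(group_sizes):
        connections[parent_id] = children[start:start + size]
        start += size
    return connections


def many_to_many(left_degrees, right_degrees, rng):
    """Pair left and right nodes by matching shuffled "stubs".

    Each node contributes one stub per requested connection. Duplicate pairs
    collapse into a single edge, so realised degrees can fall below the request.
    """
    left_stubs = [i for i, d in enumerate(left_degrees) for _ in range(d)]
    right_stubs = [j for j, d in enumerate(right_degrees) for _ in range(d)]
    rng.shuffle(left_stubs)
    rng.shuffle(right_stubs)
    # zip stops at the shorter list, so surplus stubs are dropped.
    return set(zip(left_stubs, right_stubs))


rng = random.Random(42)
collections = one_to_many([5, 5, 2], rng)           # exact group sizes
edges = many_to_many([3, 3, 2], [2, 2, 2, 2], rng)  # approximate degrees
```

The one-to-many result always matches the requested group sizes. In the many-to-many result, nodes that request many connections are the most likely to lose some to duplicate pairs, which is one way high-value outliers can disappear.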

python3 graph_generation/ \
         --output_file "output.graphml" \
         --config_file "graph_generation/configs/config_15_09_20.yaml" \
         --graph_type "networkx"


  • output_file - path to the output file; use the .bin extension for a proto graph and .graphml for a networkx graph
  • config_file - path to a config file in YAML format
  • graph_type - one of "proto" or "networkx"
  • overwrite - defaults to False; if set, an existing graph at output_file is overwritten
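The relationship between graph_type, the file extension, and overwrite can be sketched as a small validation step. The helper check_output_path is hypothetical, and raising an error when the file already exists is an assumption; the source does not say how the script behaves in that case.

```python
import os

# Extensions taken from the argument descriptions above.
EXPECTED_EXTENSION = {"proto": ".bin", "networkx": ".graphml"}


def check_output_path(output_file, graph_type, overwrite=False):
    """Validate the output path before a graph is written (hypothetical helper)."""
    if graph_type not in EXPECTED_EXTENSION:
        raise ValueError(f"graph_type must be one of {sorted(EXPECTED_EXTENSION)}")
    expected = EXPECTED_EXTENSION[graph_type]
    if os.path.splitext(output_file)[1] != expected:
        raise ValueError(f"{graph_type} output should end with {expected}")
    # Assumed behaviour: refuse to replace an existing file unless asked to.
    if os.path.exists(output_file) and not overwrite:
        raise FileExistsError(f"{output_file} exists; set overwrite to replace it")
    return output_file
```

For example, check_output_path("output.graphml", "networkx") returns the path if no such file exists yet, while passing a .bin path with graph_type "networkx" raises a ValueError.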

Generate from proto

If a graph has already been created according to the proto schema in graph_generation/proto/config.proto, it can be converted to networkx format for later manipulation.

python3 graph_generation/ \
         --proto_file "proto.bin" \
         --nx_file "nx.graphml"


  • proto_file - input proto file with the .bin extension
  • nx_file - output file for the networkx graph; should have the .graphml extension
  • overwrite - defaults to False; if set, an existing graph is overwritten