Algorithm Report: Breadth‐First Graph Processing with Vertex Distances - JoseCanova/brainz GitHub Wiki

graph LR
  ReleaseLabel -- "1.0" --> Label
  ReleaseLabel -- "1.0" --> Release
  ReleaseLabel -- "2.0" --> LabelType
  ReleaseLabel -- "2.0" --> Area
  ReleaseLabel -- "2.0" --> ReleaseGroup
  ReleaseLabel -- "2.0" --> ArtistCredit
  ReleaseLabel -- "2.0" --> Language
  ReleaseLabel -- "2.0" --> ReleasePackaging
  ReleaseLabel -- "2.0" --> ReleaseStatus
  ReleaseLabel -- "3.0" --> AreaType
  ReleaseLabel -- "3.0" --> ReleaseGroupPrimaryType

  Label         -- "1.0" --> LabelType
  Label         -- "1.0" --> Area
  Label         -- "2.0" --> AreaType

  Release       -- "1.0" --> ReleaseGroup
  Release       -- "1.0" --> ArtistCredit
  Release       -- "1.0" --> Language
  Release       -- "1.0" --> ReleasePackaging
  Release       -- "1.0" --> ReleaseStatus

  ReleaseGroup  -- "1.0" --> ReleaseGroupPrimaryType
  ReleaseGroup  -- "1.0" --> ArtistCredit

  Isrc          -- "1.0" --> Recording
  Isrc          -- "2.0" --> ArtistCredit

  Recording     -- "1.0" --> ArtistCredit

  Medium        -- "1.0" --> Release
  Medium        -- "1.0" --> MediumFormat
  Medium        -- "2.0" --> ReleaseGroup
  Medium        -- "2.0" --> ArtistCredit
  Medium        -- "2.0" --> Language
  Medium        -- "2.0" --> ReleasePackaging
  Medium        -- "2.0" --> ReleaseStatus
  Medium        -- "3.0" --> ReleaseGroupPrimaryType

  Artist        -- "1.0" --> Gender
  Artist        -- "1.0" --> Area
  Artist        -- "1.0" --> ArtistType
  Artist        -- "2.0" --> AreaType

  Area          -- "1.0" --> AreaType

  ReleaseAlias  -- "1.0" --> ReleaseAliasType
  ReleaseAlias  -- "1.0" --> Release
  ReleaseAlias  -- "2.0" --> ReleaseGroup
  ReleaseAlias  -- "2.0" --> ArtistCredit
  ReleaseAlias  -- "2.0" --> Language
  ReleaseAlias  -- "2.0" --> ReleasePackaging
  ReleaseAlias  -- "2.0" --> ReleaseStatus
  ReleaseAlias  -- "3.0" --> ReleaseGroupPrimaryType

  Instrument    -- "1.0" --> InstrumentType

  RecordingAlias -- "1.0" --> RecordingAliasType
  RecordingAlias -- "1.0" --> Recording
  RecordingAlias -- "2.0" --> ArtistCredit

  Track         -- "1.0" --> Recording
  Track         -- "2.0" --> ArtistCredit

  ArtistAlias   -- "1.0" --> Artist
  ArtistAlias   -- "1.0" --> ArtistAliasType
  ArtistAlias   -- "2.0" --> Gender
  ArtistAlias   -- "2.0" --> Area
  ArtistAlias   -- "2.0" --> ArtistType
  ArtistAlias   -- "3.0" --> AreaType

  Work          -- "1.0" --> WorkType


Loading

Algorithm Report: Breadth-First Graph Processing with Vertex Distances

1. Introduction

This report analyzes the processGraphByBreadthFirst routine alongside its underlying Java classes (VertexDistance and VertexPair) and the graph model gleaned from your music-metadata relationships. It breaks down data structures, step-by-step execution, performance characteristics, and creative insights for possible extensions.

2. Core Data Structures

VertexPair<X,Y> Holds a source and target vertex of types X and Y.

VertexDistance<X,Y> Wraps a VertexPair<X,Y> and a Double distance, representing the shortest‐path weight between the two vertices.

Priority<V, Integer> Associates each graph entity with an integer priority used for dynamic updates.

JGraphT’s

DirectedGraph<Class<? extends BaseEntity>, PriorityEdge> — the main graph of music-metadata entities.

DijkstraShortestPath — prebuilt once for all‐pairs path‐weight queries.

BreadthFirstIterator — traverses the graph layer by layer from each root vertex.

3. Algorithm Overview

Initialize

Build JGraphT’s directed graph from brainzGraphModel.

Instantiate DijkstraShortestPath on the full graph.

Prepare an empty set vertexDistances to collect unique VertexDistance entries.

Per-Vertex BFS Traversal For each vertex v in the graph:

Create a BreadthFirstIterator rooted at v.

Maintain a local visited set of edges to avoid double-processing.

While the iterator has next:

Let next be the current vertex, and parent its BFS parent.

If their connecting edge hasn’t been visited:

Mark edge visited.

Increment a global visitFrequency[next].

Compute distance = dijkstra.getPathWeight(v, next).

Wrap into a new VertexDistance<>(distance, new VertexPair<>(v, next)).

If truly new, add it to vertexDistances and invoke printVertexDistance.

Dynamic Priority Updates After recording each new distance:

Lookup priorities pparent and pnext for the two endpoints.

Fetch the meta-model edge weight from parentMetaModel.getModelGraph().

If the edge weight equals exactly 2.0:

Recompute

pparent' = pnext.priority + 1

pnext' = pparent.priority + pnext.priority + 1

Update priorityMap with the new values.

4. Graph Distance Profile

Below is a summary of the distribution of vertex‐pair distances in your music-metadata graph.

Distance # of Edges Sample Edge Types 1.0 28 Release → ReleaseGroup, Label → Area 2.0 22 ReleaseLabel → LabelType, Medium → Language 3.0 8 ReleaseLabel → AreaType, Medium → ReleaseGroupPrimaryType This profile reveals most entities live one or two hops apart, with only a handful requiring three-hop connections.

5. Complexity Analysis

Let • V = number of vertices • E = number of edges

BFS per root O(V + E) for each starting vertex ⇒ O(V·(V + E)).

Path‐weight queries Each getPathWeight(v, next) invocation runs Dijkstra in O(E + V log V). In the worst case, you invoke it for nearly every edge in each BFS ⇒ O(E·(E + V log V)).

Overall Combines to roughly O(V·(V + E) + E·(E + V log V)). For dense graphs this can approach O(E²+EV log V), so performance tuning or caching all-pairs distances may help.

6. Creative Insights & Extensions

All-Pairs Precomputation Instead of running Dijkstra per edge, compute a Floyd–Warshall or Johnson’s algorithm once to fill a distance matrix in O(V³) or O(V² log V + V E), then lookup in O(1).

Weighted BFS Hybrid Use a multi-source Dijkstra or a 0-1 BFS if weights are 1.0 and 2.0, exploiting deque-based approaches to reach O(V + E).

Dynamic Visualization Integrate with Graphviz or Gephi to color nodes by average distance from a given root, or animate BFS layers as concentric shells.

Threshold Filtering Introduce a distance cutoff to ignore loosely-connected pairs (e.g. only distances ≤2.0), reducing memory footprint of vertexDistances.

Adaptive Priority Logic Generalize beyond a hardcoded weight==2.0 check—perhaps use any weight ≥ threshold to escalate priority, or employ exponential backoff for very distant connections.

7. Conclusion

The processGraphByBreadthFirst method elegantly fuses breadth-first traversal with shortest-path weighting to discover and record the “distance” between every ordered pair of entities. Although its brute-force Dijkstra calls can become costly for large graphs, strategic caching or hybrid BFS approaches can tame the complexity. Finally, the adaptive priority scheme based on meta-model edge weights opens pathways for dynamic reordering of processing order, which could be key in applications like incremental graph indexing or real-time metadata synchronization.

If you’d like concrete code examples for any of the suggested extensions—or a dive into visual layouts and performance benchmarks—just let me know!

graph LR

  Track -- "1.0" --> Recording
  Track -- "2.0" --> ArtistCredit
  Recording -- "1.0" --> ArtistCredit
  Isrc -- "1.0" --> Recording
  Isrc -- "2.0" --> ArtistCredit

  RecordingAlias -- "1.0" --> Recording
  RecordingAlias -- "1.0" --> RecordingAliasType
  RecordingAlias -- "2.0" --> ArtistCredit

  Label -- "1.0" --> Area
  Label -- "1.0" --> LabelType
  Label -- "2.0" --> AreaType
  Area -- "1.0" --> AreaType

  ArtistAlias -- "1.0" --> Artist
  ArtistAlias -- "1.0" --> ArtistAliasType
  ArtistAlias -- "2.0" --> Area
  ArtistAlias -- "2.0" --> ArtistType
  ArtistAlias -- "2.0" --> Gender
  ArtistAlias -- "3.0" --> AreaType

  Artist -- "1.0" --> Area
  Artist -- "1.0" --> ArtistType
  Artist -- "1.0" --> Gender
  Artist -- "2.0" --> AreaType

  ReleaseGroup -- "1.0" --> ReleaseGroupPrimaryType
  ReleaseGroup -- "1.0" --> ArtistCredit
  Release -- "1.0" --> ReleaseGroup
  Release -- "1.0" --> ReleasePackaging
  Release -- "1.0" --> ReleaseStatus
  Release -- "1.0" --> ArtistCredit
  Release -- "1.0" --> Language
  Release -- "2.0" --> ReleaseGroupPrimaryType

  ReleaseAlias -- "1.0" --> Release
  ReleaseAlias -- "1.0" --> ReleaseAliasType
  ReleaseAlias -- "2.0" --> ReleasePackaging
  ReleaseAlias -- "2.0" --> ReleaseGroup
  ReleaseAlias -- "2.0" --> ReleaseStatus
  ReleaseAlias -- "2.0" --> ArtistCredit
  ReleaseAlias -- "2.0" --> Language
  ReleaseAlias -- "3.0" --> ReleaseGroupPrimaryType

  ReleaseLabel -- "1.0" --> Label
  ReleaseLabel -- "1.0" --> Release
  ReleaseLabel -- "2.0" --> Area
  ReleaseLabel -- "2.0" --> LabelType
  ReleaseLabel -- "2.0" --> ReleasePackaging
  ReleaseLabel -- "2.0" --> ReleaseGroup
  ReleaseLabel -- "2.0" --> ReleaseStatus
  ReleaseLabel -- "2.0" --> ArtistCredit
  ReleaseLabel -- "2.0" --> Language
  ReleaseLabel -- "3.0" --> AreaType
  ReleaseLabel -- "3.0" --> ReleaseGroupPrimaryType

  Medium -- "1.0" --> MediumFormat
  Medium -- "1.0" --> Release
  Medium -- "2.0" --> ReleasePackaging
  Medium -- "2.0" --> ReleaseGroup
  Medium -- "2.0" --> ReleaseStatus
  Medium -- "2.0" --> ArtistCredit
  Medium -- "2.0" --> Language
  Medium -- "3.0" --> ReleaseGroupPrimaryType

  Work -- "1.0" --> WorkType

  Instrument -- "1.0" --> InstrumentType
Loading
digraph MusicMetadata {
  rankdir=LR;
  node [shape=ellipse, fontsize=10];
  edge [fontsize=9];

  "Track" -> "Recording"         [label="1.0"];
  "Track" -> "ArtistCredit"      [label="2.0"];
  "Recording" -> "ArtistCredit"  [label="1.0"];
  "Isrc" -> "Recording"          [label="1.0"];
  "Isrc" -> "ArtistCredit"       [label="2.0"];

  "RecordingAlias" -> "Recording"           [label="1.0"];
  "RecordingAlias" -> "RecordingAliasType"  [label="1.0"];
  "RecordingAlias" -> "ArtistCredit"        [label="2.0"];

  "Label" -> "Area"          [label="1.0"];
  "Label" -> "LabelType"     [label="1.0"];
  "Label" -> "AreaType"      [label="2.0"];
  "Area" -> "AreaType"       [label="1.0"];

  "ArtistAlias" -> "Artist"           [label="1.0"];
  "ArtistAlias" -> "ArtistAliasType"  [label="1.0"];
  "ArtistAlias" -> "Area"             [label="2.0"];
  "ArtistAlias" -> "ArtistType"       [label="2.0"];
  "ArtistAlias" -> "Gender"           [label="2.0"];
  "ArtistAlias" -> "AreaType"         [label="3.0"];

  "Artist" -> "Area"         [label="1.0"];
  "Artist" -> "ArtistType"   [label="1.0"];
  "Artist" -> "Gender"       [label="1.0"];
  "Artist" -> "AreaType"     [label="2.0"];

  "ReleaseGroup" -> "ReleaseGroupPrimaryType" [label="1.0"];
  "ReleaseGroup" -> "ArtistCredit"            [label="1.0"];

  "Release" -> "ReleaseGroup"            [label="1.0"];
  "Release" -> "ReleasePackaging"        [label="1.0"];
  "Release" -> "ReleaseStatus"           [label="1.0"];
  "Release" -> "ArtistCredit"            [label="1.0"];
  "Release" -> "Language"                [label="1.0"];
  "Release" -> "ReleaseGroupPrimaryType" [label="2.0"];

  "ReleaseAlias" -> "Release"                 [label="1.0"];
  "ReleaseAlias" -> "ReleaseAliasType"        [label="1.0"];
  "ReleaseAlias" -> "ReleasePackaging"        [label="2.0"];
  "ReleaseAlias" -> "ReleaseGroup"            [label="2.0"];
  "ReleaseAlias" -> "ReleaseStatus"           [label="2.0"];
  "ReleaseAlias" -> "ArtistCredit"            [label="2.0"];
  "ReleaseAlias" -> "Language"                [label="2.0"];
  "ReleaseAlias" -> "ReleaseGroupPrimaryType" [label="3.0"];

  "ReleaseLabel" -> "Label"                     [label="1.0"];
  "ReleaseLabel" -> "Release"                   [label="1.0"];
  "ReleaseLabel" -> "Area"                      [label="2.0"];
  "ReleaseLabel" -> "LabelType"                 [label="2.0"];
  "ReleaseLabel" -> "ReleasePackaging"          [label="2.0"];
  "ReleaseLabel" -> "ReleaseGroup"              [label="2.0"];
  "ReleaseLabel" -> "ReleaseStatus"             [label="2.0"];
  "ReleaseLabel" -> "ArtistCredit"              [label="2.0"];
  "ReleaseLabel" -> "Language"                  [label="2.0"];
  "ReleaseLabel" -> "AreaType"                  [label="3.0"];
  "ReleaseLabel" -> "ReleaseGroupPrimaryType"   [label="3.0"];

  "Medium" -> "MediumFormat"                 [label="1.0"];
  "Medium" -> "Release"                      [label="1.0"];
  "Medium" -> "ReleasePackaging"             [label="2.0"];
  "Medium" -> "ReleaseGroup"                 [label="2.0"];
  "Medium" -> "ReleaseStatus"                [label="2.0"];
  "Medium" -> "ArtistCredit"                 [label="2.0"];
  "Medium" -> "Language"                     [label="2.0"];
  "Medium" -> "ReleaseGroupPrimaryType"      [label="3.0"];

  "Work" -> "WorkType"           [label="1.0"];
  "Instrument" -> "InstrumentType" [label="1.0"];
}
⚠️ **GitHub.com Fallback** ⚠️