Algorithm Report: Breadth‐First Graph Processing with Vertex Distances - JoseCanova/brainz GitHub Wiki
graph LR
ReleaseLabel -- "1.0" --> Label
ReleaseLabel -- "1.0" --> Release
ReleaseLabel -- "2.0" --> LabelType
ReleaseLabel -- "2.0" --> Area
ReleaseLabel -- "2.0" --> ReleaseGroup
ReleaseLabel -- "2.0" --> ArtistCredit
ReleaseLabel -- "2.0" --> Language
ReleaseLabel -- "2.0" --> ReleasePackaging
ReleaseLabel -- "2.0" --> ReleaseStatus
ReleaseLabel -- "3.0" --> AreaType
ReleaseLabel -- "3.0" --> ReleaseGroupPrimaryType
Label -- "1.0" --> LabelType
Label -- "1.0" --> Area
Label -- "2.0" --> AreaType
Release -- "1.0" --> ReleaseGroup
Release -- "1.0" --> ArtistCredit
Release -- "1.0" --> Language
Release -- "1.0" --> ReleasePackaging
Release -- "1.0" --> ReleaseStatus
ReleaseGroup -- "1.0" --> ReleaseGroupPrimaryType
ReleaseGroup -- "1.0" --> ArtistCredit
Isrc -- "1.0" --> Recording
Isrc -- "2.0" --> ArtistCredit
Recording -- "1.0" --> ArtistCredit
Medium -- "1.0" --> Release
Medium -- "1.0" --> MediumFormat
Medium -- "2.0" --> ReleaseGroup
Medium -- "2.0" --> ArtistCredit
Medium -- "2.0" --> Language
Medium -- "2.0" --> ReleasePackaging
Medium -- "2.0" --> ReleaseStatus
Medium -- "3.0" --> ReleaseGroupPrimaryType
Artist -- "1.0" --> Gender
Artist -- "1.0" --> Area
Artist -- "1.0" --> ArtistType
Artist -- "2.0" --> AreaType
Area -- "1.0" --> AreaType
ReleaseAlias -- "1.0" --> ReleaseAliasType
ReleaseAlias -- "1.0" --> Release
ReleaseAlias -- "2.0" --> ReleaseGroup
ReleaseAlias -- "2.0" --> ArtistCredit
ReleaseAlias -- "2.0" --> Language
ReleaseAlias -- "2.0" --> ReleasePackaging
ReleaseAlias -- "2.0" --> ReleaseStatus
ReleaseAlias -- "3.0" --> ReleaseGroupPrimaryType
Instrument -- "1.0" --> InstrumentType
RecordingAlias -- "1.0" --> RecordingAliasType
RecordingAlias -- "1.0" --> Recording
RecordingAlias -- "2.0" --> ArtistCredit
Track -- "1.0" --> Recording
Track -- "2.0" --> ArtistCredit
ArtistAlias -- "1.0" --> Artist
ArtistAlias -- "1.0" --> ArtistAliasType
ArtistAlias -- "2.0" --> Gender
ArtistAlias -- "2.0" --> Area
ArtistAlias -- "2.0" --> ArtistType
ArtistAlias -- "3.0" --> AreaType
Work -- "1.0" --> WorkType
Algorithm Report: Breadth-First Graph Processing with Vertex Distances
This report analyzes the processGraphByBreadthFirst routine alongside its underlying Java classes (VertexDistance and VertexPair) and the graph model gleaned from your music-metadata relationships. It breaks down data structures, step-by-step execution, performance characteristics, and creative insights for possible extensions.
VertexPair<X,Y> Holds a source and target vertex of types X and Y.
VertexDistance<X,Y> Wraps a VertexPair<X,Y> and a Double distance, representing the shortest‐path weight between the two vertices.
Priority<V, Integer> Associates each graph entity with an integer priority used for dynamic updates.
JGraphT’s
DirectedGraph<Class<? extends BaseEntity>, PriorityEdge> — the main graph of music-metadata entities.
DijkstraShortestPath — prebuilt once for all‐pairs path‐weight queries.
BreadthFirstIterator — traverses the graph layer by layer from each root vertex.
Initialize
Build JGraphT’s directed graph from brainzGraphModel.
Instantiate DijkstraShortestPath on the full graph.
Prepare an empty set vertexDistances to collect unique VertexDistance entries.
Per-Vertex BFS Traversal For each vertex v in the graph:
Create a BreadthFirstIterator rooted at v.
Maintain a local visited set of edges to avoid double-processing.
While the iterator has next:
Let next be the current vertex, and parent its BFS parent.
If their connecting edge hasn’t been visited:
Mark edge visited.
Increment a global visitFrequency[next].
Compute distance = dijkstra.getPathWeight(v, next).
Wrap into a new VertexDistance<>(distance, new VertexPair<>(v, next)).
If truly new, add it to vertexDistances and invoke printVertexDistance.
Dynamic Priority Updates After recording each new distance:
Lookup priorities pparent and pnext for the two endpoints.
Fetch the meta-model edge weight from parentMetaModel.getModelGraph().
If the edge weight equals exactly 2.0:
Recompute
pparent' = pnext.priority + 1
pnext' = pparent.priority + pnext.priority + 1
Update priorityMap with the new values.
Distance # of Edges Sample Edge Types 1.0 28 Release → ReleaseGroup, Label → Area 2.0 22 ReleaseLabel → LabelType, Medium → Language 3.0 8 ReleaseLabel → AreaType, Medium → ReleaseGroupPrimaryType This profile reveals most entities live one or two hops apart, with only a handful requiring three-hop connections.
Let • V = number of vertices • E = number of edges
BFS per root O(V + E) for each starting vertex ⇒ O(V·(V + E)).
Path‐weight queries Each getPathWeight(v, next) invocation runs Dijkstra in O(E + V log V). In the worst case, you invoke it for nearly every edge in each BFS ⇒ O(E·(E + V log V)).
Overall Combines to roughly O(V·(V + E) + E·(E + V log V)). For dense graphs this can approach O(E²+EV log V), so performance tuning or caching all-pairs distances may help.
All-Pairs Precomputation Instead of running Dijkstra per edge, compute a Floyd–Warshall or Johnson’s algorithm once to fill a distance matrix in O(V³) or O(V² log V + V E), then lookup in O(1).
Weighted BFS Hybrid Use a multi-source Dijkstra or a 0-1 BFS if weights are 1.0 and 2.0, exploiting deque-based approaches to reach O(V + E).
Dynamic Visualization Integrate with Graphviz or Gephi to color nodes by average distance from a given root, or animate BFS layers as concentric shells.
Threshold Filtering Introduce a distance cutoff to ignore loosely-connected pairs (e.g. only distances ≤2.0), reducing memory footprint of vertexDistances.
Adaptive Priority Logic Generalize beyond a hardcoded weight==2.0 check—perhaps use any weight ≥ threshold to escalate priority, or employ exponential backoff for very distant connections.
7. Conclusion
The processGraphByBreadthFirst method elegantly fuses breadth-first traversal with shortest-path weighting to discover and record the “distance” between every ordered pair of entities. Although its brute-force Dijkstra calls can become costly for large graphs, strategic caching or hybrid BFS approaches can tame the complexity. Finally, the adaptive priority scheme based on meta-model edge weights opens pathways for dynamic reordering of processing order, which could be key in applications like incremental graph indexing or real-time metadata synchronization.
If you’d like concrete code examples for any of the suggested extensions—or a dive into visual layouts and performance benchmarks—just let me know!
graph LR
Track -- "1.0" --> Recording
Track -- "2.0" --> ArtistCredit
Recording -- "1.0" --> ArtistCredit
Isrc -- "1.0" --> Recording
Isrc -- "2.0" --> ArtistCredit
RecordingAlias -- "1.0" --> Recording
RecordingAlias -- "1.0" --> RecordingAliasType
RecordingAlias -- "2.0" --> ArtistCredit
Label -- "1.0" --> Area
Label -- "1.0" --> LabelType
Label -- "2.0" --> AreaType
Area -- "1.0" --> AreaType
ArtistAlias -- "1.0" --> Artist
ArtistAlias -- "1.0" --> ArtistAliasType
ArtistAlias -- "2.0" --> Area
ArtistAlias -- "2.0" --> ArtistType
ArtistAlias -- "2.0" --> Gender
ArtistAlias -- "3.0" --> AreaType
Artist -- "1.0" --> Area
Artist -- "1.0" --> ArtistType
Artist -- "1.0" --> Gender
Artist -- "2.0" --> AreaType
ReleaseGroup -- "1.0" --> ReleaseGroupPrimaryType
ReleaseGroup -- "1.0" --> ArtistCredit
Release -- "1.0" --> ReleaseGroup
Release -- "1.0" --> ReleasePackaging
Release -- "1.0" --> ReleaseStatus
Release -- "1.0" --> ArtistCredit
Release -- "1.0" --> Language
Release -- "2.0" --> ReleaseGroupPrimaryType
ReleaseAlias -- "1.0" --> Release
ReleaseAlias -- "1.0" --> ReleaseAliasType
ReleaseAlias -- "2.0" --> ReleasePackaging
ReleaseAlias -- "2.0" --> ReleaseGroup
ReleaseAlias -- "2.0" --> ReleaseStatus
ReleaseAlias -- "2.0" --> ArtistCredit
ReleaseAlias -- "2.0" --> Language
ReleaseAlias -- "3.0" --> ReleaseGroupPrimaryType
ReleaseLabel -- "1.0" --> Label
ReleaseLabel -- "1.0" --> Release
ReleaseLabel -- "2.0" --> Area
ReleaseLabel -- "2.0" --> LabelType
ReleaseLabel -- "2.0" --> ReleasePackaging
ReleaseLabel -- "2.0" --> ReleaseGroup
ReleaseLabel -- "2.0" --> ReleaseStatus
ReleaseLabel -- "2.0" --> ArtistCredit
ReleaseLabel -- "2.0" --> Language
ReleaseLabel -- "3.0" --> AreaType
ReleaseLabel -- "3.0" --> ReleaseGroupPrimaryType
Medium -- "1.0" --> MediumFormat
Medium -- "1.0" --> Release
Medium -- "2.0" --> ReleasePackaging
Medium -- "2.0" --> ReleaseGroup
Medium -- "2.0" --> ReleaseStatus
Medium -- "2.0" --> ArtistCredit
Medium -- "2.0" --> Language
Medium -- "3.0" --> ReleaseGroupPrimaryType
Work -- "1.0" --> WorkType
Instrument -- "1.0" --> InstrumentType
digraph MusicMetadata {
rankdir=LR;
node [shape=ellipse, fontsize=10];
edge [fontsize=9];
"Track" -> "Recording" [label="1.0"];
"Track" -> "ArtistCredit" [label="2.0"];
"Recording" -> "ArtistCredit" [label="1.0"];
"Isrc" -> "Recording" [label="1.0"];
"Isrc" -> "ArtistCredit" [label="2.0"];
"RecordingAlias" -> "Recording" [label="1.0"];
"RecordingAlias" -> "RecordingAliasType" [label="1.0"];
"RecordingAlias" -> "ArtistCredit" [label="2.0"];
"Label" -> "Area" [label="1.0"];
"Label" -> "LabelType" [label="1.0"];
"Label" -> "AreaType" [label="2.0"];
"Area" -> "AreaType" [label="1.0"];
"ArtistAlias" -> "Artist" [label="1.0"];
"ArtistAlias" -> "ArtistAliasType" [label="1.0"];
"ArtistAlias" -> "Area" [label="2.0"];
"ArtistAlias" -> "ArtistType" [label="2.0"];
"ArtistAlias" -> "Gender" [label="2.0"];
"ArtistAlias" -> "AreaType" [label="3.0"];
"Artist" -> "Area" [label="1.0"];
"Artist" -> "ArtistType" [label="1.0"];
"Artist" -> "Gender" [label="1.0"];
"Artist" -> "AreaType" [label="2.0"];
"ReleaseGroup" -> "ReleaseGroupPrimaryType" [label="1.0"];
"ReleaseGroup" -> "ArtistCredit" [label="1.0"];
"Release" -> "ReleaseGroup" [label="1.0"];
"Release" -> "ReleasePackaging" [label="1.0"];
"Release" -> "ReleaseStatus" [label="1.0"];
"Release" -> "ArtistCredit" [label="1.0"];
"Release" -> "Language" [label="1.0"];
"Release" -> "ReleaseGroupPrimaryType" [label="2.0"];
"ReleaseAlias" -> "Release" [label="1.0"];
"ReleaseAlias" -> "ReleaseAliasType" [label="1.0"];
"ReleaseAlias" -> "ReleasePackaging" [label="2.0"];
"ReleaseAlias" -> "ReleaseGroup" [label="2.0"];
"ReleaseAlias" -> "ReleaseStatus" [label="2.0"];
"ReleaseAlias" -> "ArtistCredit" [label="2.0"];
"ReleaseAlias" -> "Language" [label="2.0"];
"ReleaseAlias" -> "ReleaseGroupPrimaryType" [label="3.0"];
"ReleaseLabel" -> "Label" [label="1.0"];
"ReleaseLabel" -> "Release" [label="1.0"];
"ReleaseLabel" -> "Area" [label="2.0"];
"ReleaseLabel" -> "LabelType" [label="2.0"];
"ReleaseLabel" -> "ReleasePackaging" [label="2.0"];
"ReleaseLabel" -> "ReleaseGroup" [label="2.0"];
"ReleaseLabel" -> "ReleaseStatus" [label="2.0"];
"ReleaseLabel" -> "ArtistCredit" [label="2.0"];
"ReleaseLabel" -> "Language" [label="2.0"];
"ReleaseLabel" -> "AreaType" [label="3.0"];
"ReleaseLabel" -> "ReleaseGroupPrimaryType" [label="3.0"];
"Medium" -> "MediumFormat" [label="1.0"];
"Medium" -> "Release" [label="1.0"];
"Medium" -> "ReleasePackaging" [label="2.0"];
"Medium" -> "ReleaseGroup" [label="2.0"];
"Medium" -> "ReleaseStatus" [label="2.0"];
"Medium" -> "ArtistCredit" [label="2.0"];
"Medium" -> "Language" [label="2.0"];
"Medium" -> "ReleaseGroupPrimaryType" [label="3.0"];
"Work" -> "WorkType" [label="1.0"];
"Instrument" -> "InstrumentType" [label="1.0"];
}