Priority and Frequency Report during BFS - Dijkstra execution on processGraphByBreadthFirst method - JoseCanova/brainz GitHub Wiki
graph LR
ArtistCredit("ArtistCredit<br/>Freq: 9<br/>Pri: 31")
AreaType("AreaType<br/>Freq: 5<br/>Pri: 24")
Recording("Recording<br/>Freq: 3<br/>Pri: 18")
Area("Area<br/>Freq: 4<br/>Pri: 15")
Label("Label<br/>Freq: 1<br/>Pri: 14")
Release("Release<br/>Freq: 3<br/>Pri: 13")
RecordingAliasType("RecordingAliasType<br/>Freq: 1<br/>Pri: 12")
ReleaseAliasType("ReleaseAliasType<br/>Freq: 1<br/>Pri: 9")
ArtistType("ArtistType<br/>Freq: 2<br/>Pri: 8")
ReleaseGroup("ReleaseGroup<br/>Freq: 4<br/>Pri: 7")
Artist("Artist<br/>Freq: 1<br/>Pri: 5")
ArtistAliasType("ArtistAliasType<br/>Freq: 1<br/>Pri: 3")
MediumFormat("MediumFormat<br/>Freq: 1<br/>Pri: 3")
WorkType("WorkType<br/>Freq: 1<br/>Pri: 2")
InstrumentType("InstrumentType<br/>Freq: 1<br/>Pri: 2")
Language("Language<br/>Freq: 4<br/>Pri: 0")
ReleaseGroupPrimaryType("ReleaseGroupPrimaryType<br/>Freq: 5<br/>Pri: 0")
Gender("Gender<br/>Freq: 2<br/>Pri: 0")
LabelType("LabelType<br/>Freq: 2<br/>Pri: 0")
ReleasePackaging("ReleasePackaging<br/>Freq: 4<br/>Pri: 0")
ReleaseStatus("ReleaseStatus<br/>Freq: 4<br/>Pri: 0")
Recording --> ArtistCredit
Release --> ReleasePackaging
Release --> ReleaseStatus
Release --> Language
Release --> ReleaseGroup
Release --> ArtistCredit
ReleaseGroup --> ReleaseGroupPrimaryType
ReleaseGroup --> ArtistCredit
Label --> LabelType
Label --> Area
Area --> AreaType
Artist --> Area
Artist --> ArtistType
Artist --> Gender
style ArtistCredit fill:#ADD8E6,stroke:#333,stroke-width:2px;
style AreaType fill:#ADD8E6,stroke:#333,stroke-width:2px;
style Recording fill:#ADD8E6,stroke:#333,stroke-width:2px;
style Area fill:#ADD8E6,stroke:#333,stroke-width:2px;
style Label fill:#ADD8E6,stroke:#333,stroke-width:2px;
style Release fill:#ADD8E6,stroke:#333,stroke-width:2px;
style RecordingAliasType fill:#ADD8E6,stroke:#333,stroke-width:2px;
style ReleaseAliasType fill:#ADD8E6,stroke:#333,stroke-width:2px;
style ArtistType fill:#ADD8E6,stroke:#333,stroke-width:2px;
style ReleaseGroup fill:#ADD8E6,stroke:#333,stroke-width:2px;
style Artist fill:#ADD8E6,stroke:#333,stroke-width:2px;
style ArtistAliasType fill:#ADD8E6,stroke:#333,stroke-width:2px;
style MediumFormat fill:#ADD8E6,stroke:#333,stroke-width:2px;
style WorkType fill:#ADD8E6,stroke:#333,stroke-width:2px;
style InstrumentType fill:#ADD8E6,stroke:#333,stroke-width:2px;
style Language fill:#90EE90,stroke:#333,stroke-width:2px;
style ReleaseGroupPrimaryType fill:#90EE90,stroke:#333,stroke-width:2px;
style Gender fill:#90EE90,stroke:#333,stroke-width:2px;
style LabelType fill:#90EE90,stroke:#333,stroke-width:2px;
style ReleasePackaging fill:#90EE90,stroke:#333,stroke-width:2px;
style ReleaseStatus fill:#90EE90,stroke:#333,stroke-width:2px;
digraph G {
rankdir=LR; // Layout from Left to Right
// Define nodes and their attributes (only for visited entities)
// Format: "SimpleName" [label="SimpleName\nFreq: <Frequency>\nPri: <Priority>", style=filled, fillcolor="lightblue"];
"ArtistCredit" [label="ArtistCredit\nFreq: 9\nPri: 31", style=filled, fillcolor="lightblue"];
"AreaType" [label="AreaType\nFreq: 5\nPri: 24", style=filled, fillcolor="lightblue"];
"Recording" [label="Recording\nFreq: 3\nPri: 18", style=filled, fillcolor="lightblue"];
"Area" [label="Area\nFreq: 4\nPri: 15", style=filled, fillcolor="lightblue"];
"Label" [label="Label\nFreq: 1\nPri: 14", style=filled, fillcolor="lightblue"];
"Release" [label="Release\nFreq: 3\nPri: 13", style=filled, fillcolor="lightblue"];
"RecordingAliasType" [label="RecordingAliasType\nFreq: 1\nPri: 12", style=filled, fillcolor="lightblue"];
"ReleaseAliasType" [label="ReleaseAliasType\nFreq: 1\nPri: 9", style=filled, fillcolor="lightblue"];
"ArtistType" [label="ArtistType\nFreq: 2\nPri: 8", style=filled, fillcolor="lightblue"];
"ReleaseGroup" [label="ReleaseGroup\nFreq: 4\nPri: 7", style=filled, fillcolor="lightblue"];
"Artist" [label="Artist\nFreq: 1\nPri: 5", style=filled, fillcolor="lightblue"];
"ArtistAliasType" [label="ArtistAliasType\nFreq: 1\nPri: 3", style=filled, fillcolor="lightblue"];
"MediumFormat" [label="MediumFormat\nFreq: 1\nPri: 3", style=filled, fillcolor="lightblue"];
"WorkType" [label="WorkType\nFreq: 1\nPri: 2", style=filled, fillcolor="lightblue"];
"InstrumentType" [label="InstrumentType\nFreq: 1\nPri: 2", style=filled, fillcolor="lightblue"];
"Language" [label="Language\nFreq: 4\nPri: 0", style=filled, fillcolor="lightgreen"]; // Lower priority, different color
"ReleaseGroupPrimaryType" [label="ReleaseGroupPrimaryType\nFreq: 5\nPri: 0", style=filled, fillcolor="lightgreen"];
"Gender" [label="Gender\nFreq: 2\nPri: 0", style=filled, fillcolor="lightgreen"];
"LabelType" [label="LabelType\nFreq: 2\nPri: 0", style=filled, fillcolor="lightgreen"];
"ReleasePackaging" [label="ReleasePackaging\nFreq: 4\nPri: 0", style=filled, fillcolor="lightgreen"];
"ReleaseStatus" [label="ReleaseStatus\nFreq: 4\nPri: 0", style=filled, fillcolor="lightgreen"];
// Define edges (inferred from Vertex Distance = 1.0, and only if both source and target are visited)
"Recording" -> "ArtistCredit";
"Release" -> "ReleasePackaging";
"Release" -> "ReleaseStatus";
"Release" -> "Language";
"Release" -> "ReleaseGroup";
"Release" -> "ArtistCredit";
"ReleaseGroup" -> "ReleaseGroupPrimaryType";
"ReleaseGroup" -> "ArtistCredit";
"Label" -> "LabelType";
"Label" -> "Area";
"Area" -> "AreaType";
"Artist" -> "Area";
"Artist" -> "ArtistType";
"Artist" -> "Gender";
}
To generate the Graphviz dot file and visualize the relevant parts of your graph for a work breakdown structure, I employed the following strategy:
Understand the Goal: The primary goal was to visualize the dependencies and priorities, specifically excluding entities that were not visited during the frequency count, as these were the ones causing the null frequency issue.
Data Source Analysis (log_report.txt): I parsed log_report.txt to extract three pieces of information:
"Vertex Distance" Section: This section provided directed relationships and their shortest-path distances (e.g., "Vertex Distance: 1.0 between Isrc and Recording"). I treated each pair with a distance of 1.0 as a direct edge, since a shortest-path distance of 1.0 means the target is reached from the source in a single hop (assuming every edge has unit weight).
"Frequency Count" Section: This explicitly listed the 21 entities that were visited during the Breadth-First Search traversals and their respective visitFrequency counts. This list was crucial for determining which nodes to include in the final Graphviz output.
"Priority Queue" Section: This section listed all 31 entities in your graph along with their final calculated newPriority values after inverse frequency scaling.
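The "Vertex Distance" extraction can be sketched in Python as follows. This is a minimal sketch, not the code that produced the report: only the "Vertex Distance: 1.0 between Isrc and Recording" line format is quoted above, so the regular expression, the function name `parse_vertex_distances`, and the 2.0 sample line are assumptions made for illustration.

```python
import re

# Assumed line shape, modelled on the one example quoted from log_report.txt:
#   "Vertex Distance: 1.0 between Isrc and Recording"
VERTEX_DISTANCE = re.compile(
    r"Vertex Distance:\s*(?P<dist>\d+(?:\.\d+)?)\s+between\s+"
    r"(?P<src>\w+)\s+and\s+(?P<dst>\w+)"
)


def parse_vertex_distances(lines):
    """Return (source, target, distance) triples for every matching line."""
    triples = []
    for line in lines:
        match = VERTEX_DISTANCE.search(line)
        if match:
            triples.append((match["src"], match["dst"], float(match["dist"])))
    return triples


sample_log = [
    "Vertex Distance: 1.0 between Isrc and Recording",
    "Vertex Distance: 1.0 between Recording and ArtistCredit",
    "Vertex Distance: 2.0 between Isrc and ArtistCredit",  # hypothetical line
    "some unrelated log output",                           # ignored
]

distances = parse_vertex_distances(sample_log)
# Keep only the pairs that are one hop apart; these become candidate edges.
direct_edges = [(src, dst) for src, dst, dist in distances if dist == 1.0]
print(direct_edges)  # [('Isrc', 'Recording'), ('Recording', 'ArtistCredit')]
```

The "Frequency Count" and "Priority Queue" sections would need their own patterns, which are not shown here because their line formats are not quoted in this page.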
Entity Filtering (Work Breakdown Structure Scope):
To adhere to your request of "removing the 'entities' that are not being visited during the frequency count," I compared the list of 31 entities from the "Priority Queue" section against the 21 entities found in the "Frequency Count" section.
Only the 21 entities present in the "Frequency Count" section were selected for inclusion as nodes in the Graphviz diagram. The remaining 10 entities were excluded from the visualization, as they were the ones contributing to the null frequency problem.
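The filtering step amounts to keeping only the priority entries whose entity also has a frequency entry. Here is a minimal Python sketch under stated assumptions: the two dictionaries hold a small excerpt of the values shown in the diagram, `split_by_visit` is a hypothetical helper name, and the priority given to `Isrc` is a placeholder (the real value is not reproduced on this page).

```python
# Excerpt of the "Frequency Count" section: visited entities only.
frequency = {"ArtistCredit": 9, "AreaType": 5, "Recording": 3, "Language": 4}

# Excerpt of the "Priority Queue" section: every entity in the graph.
# The Isrc value is a placeholder used only to illustrate an unvisited entity.
priority = {
    "ArtistCredit": 31,
    "AreaType": 24,
    "Recording": 18,
    "Language": 0,
    "Isrc": 0,  # placeholder priority, not taken from the log
}


def split_by_visit(priority, frequency):
    """Split priority entries into visited ones and the names left out."""
    visited = {name: pri for name, pri in priority.items() if name in frequency}
    excluded = sorted(name for name in priority if name not in frequency)
    return visited, excluded


visited, excluded = split_by_visit(priority, frequency)
print(sorted(visited))  # ['AreaType', 'ArtistCredit', 'Language', 'Recording']
print(excluded)         # ['Isrc']
```

Applied to the full log, the same comparison would leave the 21 visited entities in `visited` and the other 10 in `excluded`.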
Node Construction:
For each of the 21 selected entities, I created a Graphviz node.
The node's label was formatted to clearly display its simple class name (e.g., ArtistCredit), its Frequency (from the "Frequency Count" section), and its final Priority (from the "Priority Queue" section).
Nodes with a nonzero final priority were given a lightblue fill color. Entities with a final priority of 0 (after scaling, indicating a relatively lower processing order) were given a lightgreen fill color to visually differentiate them.
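The node format can be reproduced with a small helper. This is an illustrative Python sketch rather than the original generator; the function name `dot_node` is an assumption, while the label layout and the two fill colors follow the DOT listing above.

```python
def dot_node(name, freq, pri):
    """Build one Graphviz node statement in the format used in the listing."""
    # Priority 0 nodes are lightgreen; every other visited node is lightblue.
    color = "lightgreen" if pri == 0 else "lightblue"
    # "\\n" emits a literal backslash-n, which Graphviz renders as a line break.
    label = f"{name}\\nFreq: {freq}\\nPri: {pri}"
    return f'"{name}" [label="{label}", style=filled, fillcolor="{color}"];'


print(dot_node("ArtistCredit", 9, 31))
print(dot_node("Language", 4, 0))
```

The first call yields the same statement as the `ArtistCredit` line in the DOT listing, and the second yields the `Language` line without its trailing comment.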
Edge Construction:
I iterated through all the "Vertex Distance: 1.0 between X and Y" lines in the log_report.txt.
For each such line, a directed edge from X to Y was inferred.
Crucially, an edge was only included in the Graphviz output if both its source node (X) and its target node (Y) were among the 21 selected, visited entities. This ensured that the visualization accurately reflected the dependencies only within the actively visited (and therefore prioritized) subgraph.
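The edge rule is a filter on both endpoints. A minimal Python sketch, assuming the distance-1.0 pairs have already been extracted from the log; `visited_edges` is a hypothetical name, and the inputs are a small excerpt of the entities shown in the diagram.

```python
def visited_edges(direct_pairs, visited):
    """Emit DOT edges only when both source and target were visited."""
    return [
        f'"{src}" -> "{dst}";'
        for src, dst in direct_pairs
        if src in visited and dst in visited
    ]


# Pairs at distance 1.0; Isrc is in the log but was not visited.
direct_pairs = [
    ("Isrc", "Recording"),
    ("Recording", "ArtistCredit"),
    ("Release", "ReleaseGroup"),
]
visited = {"Recording", "ArtistCredit", "Release", "ReleaseGroup"}

edges = visited_edges(direct_pairs, visited)
for edge in edges:
    print(edge)
# "Recording" -> "ArtistCredit";
# "Release" -> "ReleaseGroup";
```

The `Isrc` pair is dropped because its source is outside the visited set, which is why no `Isrc` node or edge appears in the diagram.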
This strategy allowed for a targeted visualization that highlights the portion of your graph that is actively processed and prioritized, making the dependencies and relative priorities of those entities clear.