# Analysis: Causes of Variation in `CsvFileProcessingPriority.processGraphByBreadthFirst` Results - JoseCanova/brainz GitHub Wiki

Context

  • The system uses an entity-relationship graph (JGraphT) to represent JPA entities and their relationships.
  • The CsvFileProcessingPriority class computes priorities for entities, using a breadth-first traversal of this graph.
  • Five rounds of output show significant variation in the computed priorities or counts for each entity.

Key Observations

  • The order and values for entities change significantly between runs.
  • The process relies on traversing a graph of entities where each entity is a vertex, and relationships (edges) may have weights.

Main Causes of Variation

1. Unordered Data Structures

  • The order in which vertices are stored and traversed is determined by the underlying graph or collection implementation.
  • If an unordered structure such as HashSet backs the vertex set, iteration order is not guaranteed. For elements that rely on identity hash codes, such as Class objects, it can change from one JVM run to the next.
  • This leads to different starting points and traversal sequences, resulting in different priority calculations.
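
One way to pin down the vertex order is sketched below. It assumes the vertices are entity classes and that sorting by fully qualified class name is an acceptable canonical order; `OrderedVertices` and `stableVertexSet` are hypothetical names, not part of the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only.
class OrderedVertices {

    // Sorts the discovered classes by fully qualified name and stores them
    // in a LinkedHashSet, so iteration always follows that sorted order.
    static Set<Class<?>> stableVertexSet(Iterable<Class<?>> discovered) {
        List<Class<?>> sorted = new ArrayList<>();
        discovered.forEach(sorted::add);
        sorted.sort(Comparator.comparing((Class<?> c) -> c.getName()));
        return new LinkedHashSet<>(sorted);
    }

    public static void main(String[] args) {
        // JDK classes stand in for JPA entity classes here.
        List<Class<?>> discovered = Arrays.asList(String.class, Integer.class, Long.class);
        stableVertexSet(discovered).forEach(c -> System.out.println(c.getName()));
    }
}
```

Whatever order the classes are discovered in, the resulting set iterates in the same sequence, so any traversal seeded from it starts from the same vertex.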

2. Non-Deterministic Graph Construction

  • If the graph is built dynamically (e.g., via reflection, metamodel scanning, or classpath scanning), the order of entity discovery and addition may vary.
  • This can lead to different graph shapes or edge directions between runs if not carefully controlled.
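
The idea of sorting entities before they enter the graph can be sketched as follows. To stay runnable without dependencies, plain `java.util` collections stand in for the JGraphT graph; the entity names and the `DeterministicGraphBuilder` class are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; plain collections stand in for the JGraphT graph.
class DeterministicGraphBuilder {

    // Each relationship is a {source, target} pair of entity names.
    static Map<String, Set<String>> build(List<String[]> relationships) {
        // 1. Collect every entity name in sorted order, whatever the discovery order was.
        Set<String> entities = new TreeSet<>();
        for (String[] r : relationships) {
            entities.add(r[0]);
            entities.add(r[1]);
        }
        // 2. Add vertices in that sorted order; LinkedHashMap remembers insertion order.
        Map<String, Set<String>> adjacency = new LinkedHashMap<>();
        for (String e : entities) {
            adjacency.put(e, new TreeSet<>());
        }
        // 3. Add edges; each TreeSet keeps its neighbor list sorted.
        for (String[] r : relationships) {
            adjacency.get(r[0]).add(r[1]);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        List<String[]> discovered = Arrays.asList(
                new String[] {"Release", "Artist"},
                new String[] {"Artist", "Area"});
        System.out.println(build(discovered));
    }
}
```

With a real JGraphT graph the same pattern would apply: sort the discovered entities first, then add vertices and edges in that order.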

3. Edge Weight and Attribute Discovery

  • Edge weights (1 or 2) are assigned based on field annotations (@NotNull, @NotBlank, etc.).
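  • If annotation processing or field discovery order is non-deterministic, the graph's weights can differ between executions. Reflection offers no help here: Class.getDeclaredFields() does not guarantee any particular order for the fields it returns.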
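
A minimal sketch of ordered field discovery follows. The `Required` annotation is a hypothetical stand-in for `@NotNull`/`@NotBlank` so the example runs without a validation library, `Release` is a made-up entity, and the mapping of weight 2 to annotated fields and weight 1 to the rest is an assumption, since the analysis does not say which weight goes with which case.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

class FieldWeights {

    // Hypothetical stand-in for validation annotations such as @NotNull.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Required {}

    // Hypothetical entity used only for this sketch.
    static class Release {
        @Required String title;
        String comment;
        @Required String artist;
    }

    // Maps field name -> weight, visiting fields in name order.
    // ASSUMPTION: weight 2 for annotated fields, 1 otherwise.
    static Map<String, Integer> weights(Class<?> entity) {
        Field[] fields = entity.getDeclaredFields();
        // getDeclaredFields() promises no order, so impose one explicitly.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Field f : fields) {
            if (f.isSynthetic()) {
                continue; // skip compiler- or tool-generated fields
            }
            result.put(f.getName(), f.isAnnotationPresent(Required.class) ? 2 : 1);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(weights(Release.class));
    }
}
```

Sorting by name is only one possible canonical order; what matters is that the same fields are always processed in the same sequence.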

4. Parallelism or Threading

  • If any processing runs in parallel or on multiple threads, and shared data structures are neither synchronized nor ordered, results may vary with thread scheduling.
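
Whether the project uses threads at all is not established here. If it does, the following sketch shows one way to keep parallel work while fixing the result order: sort the stream and collect it into a list, which preserves encounter order even for parallel streams. `ParallelButOrdered` and `describe` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelButOrdered {

    // Hypothetical per-entity work; any side-effect-free function would do.
    static String describe(String entity) {
        return entity + ":" + entity.length();
    }

    // Processes entities in parallel but returns results in sorted order.
    // Collectors.toList() keeps the stream's encounter order, unlike
    // adding to a shared list from forEach().
    static List<String> process(List<String> entities) {
        return entities.parallelStream()
                .sorted()
                .map(ParallelButOrdered::describe)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("Release", "Area", "Artist")));
    }
}
```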

5. Breadth-First Traversal Parent Assignment

  • The parent chosen for a node during breadth-first traversal depends on the order in which neighbors are visited.
  • If neighbor order is not fixed, parent assignments (and thus priority propagation) can be inconsistent.
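
The effect of neighbor order on parent assignment can be seen in a small breadth-first sketch. It uses a plain adjacency map instead of JGraphT, hypothetical entity names, and sorts each neighbor list before visiting it; it is an illustration of the principle, not the project's `processGraphByBreadthFirst` method.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Arrays;
import java.util.TreeSet;

class DeterministicBfs {

    // Returns child -> parent for every vertex reachable from root.
    // The root maps to null because it has no parent.
    static Map<String, String> parents(Map<String, List<String>> adjacency, String root) {
        Map<String, String> parent = new LinkedHashMap<>();
        parent.put(root, null);
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> raw = adjacency.getOrDefault(current, Collections.emptyList());
            // Visit neighbors in name order so the first discoverer of a
            // vertex, and therefore its parent, is always the same.
            for (String next : new TreeSet<>(raw)) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return parent;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("Release", Arrays.asList("Label", "Artist"));
        graph.put("Artist", Arrays.asList("Area"));
        graph.put("Label", Arrays.asList("Area"));
        System.out.println(parents(graph, "Release"));
    }
}
```

In this graph `Area` is reachable through both `Artist` and `Label`. Without the sort, its parent would depend on which of the two happens to be listed first under `Release`; with the sort, `Artist` is always visited first and becomes the parent.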

Recommendations for Determinism

| Problem | Solution |
| --- | --- |
| Unordered collections | Use ordered structures (LinkedHashSet, etc.) |
| Dynamic, unordered discovery | Sort entities before adding to the graph |
| Field/attribute discovery | Sort fields by name before processing |
| Edge/weight assignment | Ensure annotation processing is ordered |

Summary Table

| Cause | Manifestation in Results | Fix/Recommendation |
| --- | --- | --- |
| Unordered vertex set/graph | Different entity order per round | Use ordered collections/graphs |
| Dynamic, unordered entity discovery | Entities appear/disappear/reorder | Sort entities before processing |
| Non-deterministic edge/relationship order | Variations in parent assignments | Consistently build/traverse graph |

Direct Answer

The most likely main cause of the variation is the use of unordered data structures for the entity graph and its traversal. This makes the traversal order non-deterministic, so the computed priorities differ from run to run. To fix this, use ordered collections and make every step of graph construction deterministic.

