Pieter's Graph Design - davidlabee/Graph4Air GitHub Wiki
🧠 Thesis Goal: Exploring Spatial Aggregation in Road Segment Graphs
The goal of this thesis is to understand how the spatial resolution of road segment graphs affects the performance of Graph Neural Networks (GCNs and GATs) for predicting air pollution.
We start with a fine-grained graph in which each node represents a 50m road segment. The idea is to coarsen this graph to other aggregation levels, such as 100m, 200m, or 500m, and observe how the model's ability to learn and generalize changes.
By building multiple versions of the graph at different resolutions, we aim to answer:
- Does coarsening improve performance by reducing noise or over-smoothing?
- Is there an optimal segment size for pollution prediction?
- Can multi-resolution approaches combine the best of both fine and coarse views?
🧱 Approaches to Aggregating Road Segments
There are different ways to group or merge 50m road segments into larger units:
1. Index-Based Grouping
- Merge every n consecutive rows (e.g., 4 × 50m → 200m).
- Simple but risky: it assumes the row order matches the spatial order, which is not always true.
2. Distance-Based Sorting
- Sort all segments by their distance from a starting point (e.g., the distance from each segment's centroid to a reference point), then group them.
- Better than random order, but may not preserve connectivity in complex road networks.
3. Clustering-Based Aggregation
- Use spatial clustering (e.g., DBSCAN, KMeans) or grid-snapping to group segments within a certain radius.
- Allows flexibility but might break natural road continuity.
4. Touch-Based Growing (Used in This Project ✅)
- Start from one segment and grow a group by adding segments that touch it.
- Stops when a group reaches a target size (e.g., 4 segments = ~200m).
- Preserves spatial continuity and real-world road shapes.
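A minimal sketch of approach 1 makes its risk concrete: the group id is just the row position divided by n, so the result depends entirely on table order. The row dicts and their `id` field are hypothetical stand-ins for GeoDataFrame rows.

```python
from collections import defaultdict


def group_by_index(rows, n):
    """Approach 1: put every n consecutive rows into the same group.

    Spatial position is ignored entirely; the group id is position // n.
    """
    groups = defaultdict(list)
    for position, row in enumerate(rows):
        groups[position // n].append(row["id"])
    return dict(groups)


# Hypothetical toy table in arbitrary row order: "a" and "z" share a group only
# because they are neighboring rows, not because they are neighboring road segments.
rows = [{"id": "a"}, {"id": "z"}, {"id": "b"}, {"id": "c"}, {"id": "y"}]
print(group_by_index(rows, n=4))  # {0: ['a', 'z', 'b', 'c'], 1: ['y']}
```

Reordering the rows changes the groups even though the road network is unchanged, and the leftover row "y" becomes a singleton.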
🧪 coarsen_by_touching() Function (Current Method)
The coarsening method we use is a touch-based, greedy grouping algorithm. Here's how it works:
- Start with a single 50m road segment
- Grow a group by adding touching neighbors
- Stop when the group reaches a target size (e.g., 4 segments)
- Repeat for all unvisited segments
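The four steps above can be sketched as a breadth-first growth over a "touches" adjacency. This is an illustration, not the project's `coarsen_by_touching()` code: it assumes the adjacency (which segment ids touch which) has already been computed from the geometries, and the seed order (sorted ids) and breadth-first visiting order are assumptions that the real function may handle differently.

```python
from collections import deque


def grow_groups(adjacency, target_size):
    """Greedy touch-based grouping.

    adjacency: dict mapping each segment id to the set of ids it touches
               (assumed precomputed from the segment geometries).
    Returns a dict mapping segment id -> group id.
    """
    group_of = {}
    next_group = 0
    for seed in sorted(adjacency):            # start from each unvisited segment
        if seed in group_of:
            continue
        group_of[seed] = next_group
        members = 1
        frontier = deque([seed])
        # Grow through touching, still-unvisited neighbors until the target size.
        while frontier and members < target_size:
            current = frontier.popleft()
            for neighbor in sorted(adjacency[current]):
                if members >= target_size:
                    break
                if neighbor not in group_of:
                    group_of[neighbor] = next_group
                    members += 1
                    frontier.append(neighbor)
        next_group += 1
    return group_of


# Toy straight road of six 50m segments: 0-1-2-3-4-5.
chain = {i: {j for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)}
print(grow_groups(chain, target_size=4))  # {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
```

The last group ends up below the target size because it runs out of unvisited touching neighbors, which is why some post-processing of small groups is useful.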
Additional features:
- Groups of only 1 segment (singleton groups) are optionally merged into the nearest larger group
- Feature values (e.g., pollution, traffic) are averaged across the group
- The final result is a new GeoDataFrame with:
  - One row per coarsened unit
  - A group_size column (how many 50m segments it contains)
  - A merged geometry
- Optionally, the function can plot the result and return the group mappings
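The singleton merging and feature averaging described above could look roughly like the sketch below. Two assumptions are worth flagging: "nearest larger group" is approximated as the largest group among touching neighbors (the real function may measure distance differently), and geometry merging is left out to keep the example dependency-free. With GeoPandas, `GeoDataFrame.dissolve(by=..., aggfunc="mean")` is one way to union the geometries and average the numeric columns in a single step.

```python
from collections import Counter, defaultdict


def merge_singletons(group_of, adjacency):
    """Move every one-segment group into the largest group it touches.

    Assumption: "nearest larger group" = largest group among touching neighbors.
    Singletons with no touching larger group are left unchanged.
    """
    sizes = Counter(group_of.values())
    merged = dict(group_of)
    for seg, gid in group_of.items():
        if sizes[gid] != 1:
            continue
        candidates = {group_of[n] for n in adjacency[seg] if sizes[group_of[n]] > 1}
        if candidates:
            # Largest group wins; ties are broken by the lower group id.
            merged[seg] = max(candidates, key=lambda g: (sizes[g], -g))
    return merged


def aggregate(group_of, features):
    """Average numeric features per group and record how many segments it holds."""
    members = defaultdict(list)
    for seg, gid in group_of.items():
        members[gid].append(seg)
    rows = {}
    for gid, segs in sorted(members.items()):
        row = {"group_size": len(segs)}
        for name in features[segs[0]]:
            row[name] = sum(features[s][name] for s in segs) / len(segs)
        rows[gid] = row
    return rows


# Toy example: segments 0-3 form one group; segment 4 is a singleton touching 3.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
group_of = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1}
features = {s: {"pm25": 10.0 * (s + 1)} for s in range(5)}

merged = merge_singletons(group_of, adjacency)
print(aggregate(merged, features))  # {0: {'group_size': 5, 'pm25': 30.0}}
```

In this toy case, merging the singleton pushes the group above the target size of 4, which is one way groups larger than the target can arise.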
This approach allows us to build graph-ready inputs at 100m, 200m, or 500m resolution while keeping the real-world spatial structure intact.
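The coarsened GeoDataFrame supplies the node features, but a GCN or GAT also needs edges between the coarsened units. This page does not say how those edges are built, so the sketch below is an assumption: two units are linked if any of their member 50m segments touch, reusing the fine-grained adjacency and the segment-to-group mapping.

```python
def coarse_edges(group_of, adjacency):
    """Undirected edge list of the coarsened graph.

    Two coarsened units are linked if any of their member segments touch;
    touches inside a single unit (self-loops) are dropped.
    """
    edges = set()
    for seg, neighbors in adjacency.items():
        for other in neighbors:
            a, b = group_of[seg], group_of[other]
            if a != b:
                edges.add((min(a, b), max(a, b)))
    return sorted(edges)


# Toy network: a straight road 0-5 plus a side segment 6 touching segment 2.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3, 6}, 3: {2, 4},
             4: {3, 5}, 5: {4}, 6: {2}}
group_of = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 2}
print(coarse_edges(group_of, adjacency))  # [(0, 1), (0, 2)]
```

Each edge is listed once; many GNN libraries expect both directions, so the list may need to be mirrored before training.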
🗺️ Results - Target Size = 4 (~200m segments)
Coarsening Statistics:
| Statistic | Value |
|---|---|
| Count | 407 coarsened segments (~200m each) |
| Mean | 3.86 50m-segments per coarsened segment |
| Min | 2 50m-segments |
| Max | 6 50m-segments |
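Statistics like these can be computed directly from the group_size column. The sketch below is dependency-free and also reports the full size distribution, which is what would confirm an observation such as "most groups consist of 4 segments". The input sizes are toy values for illustration, not the 407 groups reported above.

```python
from collections import Counter
from statistics import mean


def summarize_group_sizes(group_sizes):
    """Count/mean/min/max of group sizes plus how many groups have each size."""
    return {
        "count": len(group_sizes),
        "mean": round(mean(group_sizes), 2),
        "min": min(group_sizes),
        "max": max(group_sizes),
        "distribution": dict(sorted(Counter(group_sizes).items())),
    }


# Toy sizes for illustration only.
print(summarize_group_sizes([4, 4, 4, 2, 3, 4, 6, 4, 4]))
# {'count': 9, 'mean': 3.89, 'min': 2, 'max': 6, 'distribution': {2: 1, 3: 1, 4: 6, 6: 1}}
```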
Observations:
- Most groups consist of 4 segments, which matches the target group size.
- The minimum group size is 2, likely because some groups run out of unvisited touching neighbors (e.g., at dead ends or among leftover segments) before reaching the target size.
- A few larger groups (up to 6) exist, likely caused by singleton merging or dense road connectivity.
⚠️ Limitations & Criticism of the Current Coarsening Method
While the coarsen_by_touching() method is practical and preserves physical connectivity, it has some important limitations to be aware of:
1. Non-linear Segment Grouping
The method grows groups by touching neighbors, without considering directionality or road curvature.
As a result, it may:
- Merge segments around a sharp bend or intersection
- Produce irregular shapes that are not always straight or linear
This is a potential issue because:
- Air pollution often follows the linear flow of traffic and wind, not necessarily physical connectivity
- Merging across corners may mix unrelated micro-environments, especially in dense urban areas
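A "sharp bend" can be quantified as the turning angle between two consecutive segments. This minimal sketch assumes each segment is a straight line given by its start and end coordinates in a projected CRS (metres), and that the second segment starts where the first ends; `turning_angle` is a hypothetical helper, not part of the current code.

```python
import math


def turning_angle(seg_a, seg_b):
    """Absolute change of direction, in degrees (0-180), from seg_a into seg_b.

    Segments are ((x_start, y_start), (x_end, y_end)); seg_b is assumed to
    start where seg_a ends (consistent digitizing direction).
    """
    (ax0, ay0), (ax1, ay1) = seg_a
    (bx0, by0), (bx1, by1) = seg_b
    ux, uy = ax1 - ax0, ay1 - ay0
    vx, vy = bx1 - bx0, by1 - by0
    # atan2(cross, dot) is the signed angle between the two direction vectors.
    return abs(math.degrees(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy)))


straight = turning_angle(((0, 0), (50, 0)), ((50, 0), (100, 0)))
corner = turning_angle(((0, 0), (50, 0)), ((50, 0), (50, 50)))
print(straight, corner)  # roughly 0 and 90 degrees
```

If a road is digitized in the opposite direction, a straight continuation shows up as about 180 degrees, so segments may need to be flipped (or compared as undirected headings) before applying a threshold.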
2. No Constraint on Geometry Shape
The algorithm does not enforce constraints like:
- Max turning angle between merged segments
- Road classification or road name consistency
- Straightness or axial continuity
This means some coarsened segments may combine a main road and a side alley, or segments that don't reflect a single pollution profile.
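One way to flag such irregular groups is a straightness measure: the largest perpendicular distance of any vertex from the straight line joining the start and end of the merged path. This sketch assumes a group's vertices can be ordered into a single path; groups that branch at an intersection have no single ordering, which is itself a sign of non-linearity. `max_deviation_from_chord` is a hypothetical helper name.

```python
import math


def max_deviation_from_chord(points):
    """Largest perpendicular distance of any vertex from the straight line
    joining the first and last vertex of a merged path (same units as coords)."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    chord = math.hypot(dx, dy)
    if chord == 0:
        # Path returns to its start: use the distance from the start point instead.
        return max(math.hypot(x - x0, y - y0) for x, y in points)
    # |cross product| / chord length = distance of (x, y) from the chord line.
    return max(abs(dx * (y - y0) - dy * (x - x0)) / chord for x, y in points)


straight_road = [(0, 0), (50, 0), (100, 0), (150, 0), (200, 0)]
l_shaped = [(0, 0), (50, 0), (100, 0), (100, 50), (100, 100)]
print(max_deviation_from_chord(straight_road))  # 0.0
print(round(max_deviation_from_chord(l_shaped), 1))  # 70.7
```

A ~200m group that turns a corner deviates by tens of metres, while a straight one stays near zero, so a threshold in metres could separate the two.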
💡 Possible Improvements
- Introduce angle-based filtering: only merge segments if the turning angle is below a threshold
- Use road network metadata (e.g., road name, direction) to restrict grouping
- Add a maximum deviation from a straight line as a merging constraint
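As a sketch of how the first two improvements could plug into touch-based growth, the example below only adds a touching segment when a merge rule allows it. It assumes each segment carries a hypothetical `name` (road name) and straight-line `coords`; `can_merge`, its 30-degree threshold, and the comparison of undirected headings between a segment and the one it attaches to are illustrative choices, not tested settings. A maximum-deviation check could be added to `can_merge` in the same way.

```python
import math
from collections import deque


def heading(seg):
    """Undirected direction of a segment in degrees, in [0, 180)."""
    (x0, y0), (x1, y1) = seg["coords"]
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0


def can_merge(seg_a, seg_b, max_angle=30.0):
    """Hypothetical rule: same road name and heading change within max_angle."""
    if seg_a["name"] != seg_b["name"]:
        return False
    diff = abs(heading(seg_a) - heading(seg_b))
    diff = min(diff, 180.0 - diff)  # undirected: 170 vs 10 degrees differ by 20
    return diff <= max_angle


def grow_constrained(segments, adjacency, target_size, max_angle=30.0):
    """Touch-based growth that only adds neighbors passing can_merge."""
    group_of, next_group = {}, 0
    for seed in sorted(segments):
        if seed in group_of:
            continue
        group_of[seed] = next_group
        members, frontier = 1, deque([seed])
        while frontier and members < target_size:
            current = frontier.popleft()
            for nb in sorted(adjacency[current]):
                if members >= target_size:
                    break
                if nb not in group_of and can_merge(segments[current], segments[nb], max_angle):
                    group_of[nb] = next_group
                    members += 1
                    frontier.append(nb)
        next_group += 1
    return group_of


# Toy T-junction: a main road (0, 1, 2, 4) and a side alley (3) touching it.
segments = {
    0: {"name": "Main", "coords": ((0, 0), (50, 0))},
    1: {"name": "Main", "coords": ((50, 0), (100, 0))},
    2: {"name": "Main", "coords": ((100, 0), (150, 0))},
    3: {"name": "Alley", "coords": ((100, 0), (100, 50))},
    4: {"name": "Main", "coords": ((150, 0), (200, 0))},
}
adjacency = {0: {1}, 1: {0, 2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
print(grow_constrained(segments, adjacency, target_size=4))
# {0: 0, 1: 0, 2: 0, 4: 0, 3: 1}
```

Here the alley stays in its own group, whereas unconstrained touch-based growth with the same seed and visiting order would have absorbed it into the main-road group. The trade-off is that stricter rules produce more small or singleton groups.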