Meeting notes week 3 - davidlabee/Graph4Air GitHub Wiki

🗓️ Meeting Notes – 29 April 2025, 14:00

Summary:
- David first introduced his thesis topic to Jules, explaining the goal of enhancing road network graphs for better air pollution modeling.
- He presented five different augmentation strategies developed and tested (road_graph_strategies_subset-2.ipynb).
Key Insights:
1. Feature category-based selectivity (cosine similarity on traffic, population, land use, morphology).
2. Sparse multi-modal agreement (≥3 feature domains agreeing).
3. Distance-constrained similarity edges (500 m–20 km range).
4. Top-K strongest similarity edges (K=2).
5. Soft augmentation with similarity as edge weights.
- Strategy 2 (Sparse Multi-Modal Agreement) currently shows the best balance between sparsity and meaningful connectivity.
Challenges:
- Avoid over-densifying the graph.
- Maintain functional similarity while preserving spatial structure.
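
Illustration of Strategy 2 (not taken from the notebook): a minimal sketch assuming each segment carries one feature vector per domain and that two segments "agree" in a domain when their cosine similarity reaches a threshold. The threshold of 0.9, the function names, and the toy data are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def agreement_edges(features, threshold=0.9, min_domains=3):
    """Propose an edge (i, j, n_agree) when at least `min_domains` feature
    domains have cosine similarity >= `threshold`.

    features: {node_id: {domain_name: vector}}
    threshold and min_domains are illustrative defaults, not tuned values.
    """
    edges = []
    for i, j in combinations(sorted(features), 2):
        shared = features[i].keys() & features[j].keys()
        n_agree = sum(
            1 for d in shared
            if cosine(features[i][d], features[j][d]) >= threshold
        )
        if n_agree >= min_domains:
            edges.append((i, j, n_agree))
    return edges


# Toy segments with the four feature domains mentioned in the notes.
segments = {
    "a": {"traffic": [1.0, 0.0], "population": [0.5, 0.5],
          "land_use": [1.0, 2.0], "morphology": [0.0, 1.0]},
    "b": {"traffic": [2.0, 0.0], "population": [1.0, 1.0],
          "land_use": [1.0, 2.1], "morphology": [1.0, 0.0]},
    "c": {"traffic": [0.0, 1.0], "population": [1.0, 0.0],
          "land_use": [2.0, 1.0], "morphology": [1.0, 0.0]},
}
edges = agreement_edges(segments)
print(edges)
```

The all-pairs loop is quadratic in the number of segments, so on the whole city it would likely need a distance filter or a nearest-neighbour index first, which is where the distance constraint of Strategy 3 could be combined with this one.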

Summary:
- Pieter walked through the baseline models (Baseline_50m_Thesis_(COPY).ipynb).
- GAT and GCN were trained on the raw 50 m-segment road graph without augmentation.
Key Insights:
- GAT already outperforms GCN in initial experiments.
- Likely reason: GAT uses attention mechanisms to weigh more important neighbors more heavily, improving information aggregation.
Challenges:
- Further analyze why attention benefits this context.
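
Illustration of the difference in neighbor weighting (not the baseline code): a simplified sketch of GAT-style attention coefficients for a single node, next to uniform weights that depend only on the number of neighbors. The learned linear map is omitted (treated as identity), GCN's degree-based normalization is simplified to a plain mean, and the attention vector `a` is a made-up example rather than a trained parameter.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_attention(h_center, h_neighbors, a):
    """Softmax-normalised attention coefficients for one node.

    The score for each neighbor is LeakyReLU(a . [h_center || h_neighbor]).
    The learned weight matrix W of a real GAT layer is left out here.
    """
    scores = []
    for h_n in h_neighbors:
        concat = list(h_center) + list(h_n)
        scores.append(leaky_relu(sum(w * x for w, x in zip(a, concat))))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_weights(h_neighbors):
    """Structure-only weights: every neighbor counts equally, whatever its features."""
    n = len(h_neighbors)
    return [1.0 / n] * n


h_center = [1.0, 0.0]
h_neighbors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
a = [0.0, 0.0, 2.0, -1.0]  # hypothetical attention vector

alphas = gat_attention(h_center, h_neighbors, a)
uniform = uniform_weights(h_neighbors)
print(alphas, uniform)
```

With these toy values the first neighbor receives the largest attention weight and the second the smallest, while the uniform scheme gives each neighbor one third. Inspecting learned attention weights per edge in the same way could be one route into the analysis mentioned above.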

Summary:
- Debated how to best split data for training and evaluation.
Key Insights:
- It is not yet clear whether traditional random splits are appropriate for this task.
- Our goal is interpolation across known road segments, not predicting truly unseen segments.
Challenges:
- Decide if a held-out test set is necessary.
- Alternatives: cross-validation over segments, internal validation (early stopping), etc.
- Emphasize achieving smooth, accurate visual maps over raw accuracy.
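
Illustration of the cross-validation alternative: a minimal sketch of k-fold splitting over segment IDs, assuming the full graph stays intact and only the labels of the validation segments are masked during training. The function name and the defaults (`k=5`, `seed=0`) are hypothetical.

```python
import random


def kfold_segment_splits(segment_ids, k=5, seed=0):
    """Yield (train_ids, val_ids) pairs so that each segment is used for
    validation exactly once across the k folds."""
    ids = list(segment_ids)
    if k < 2 or k > len(ids):
        raise ValueError("k must be between 2 and the number of segments")
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        val_ids = folds[i]
        train_ids = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train_ids, val_ids


splits = list(kfold_segment_splits(range(10), k=5))
for train_ids, val_ids in splits:
    print(len(train_ids), len(val_ids))
```

A purely random assignment like this places validation segments right next to training segments, which matches the interpolation goal but may give optimistic scores; a spatially blocked assignment would be the stricter variant if that turns out to matter.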

Summary:
- Jules and Pieter met earlier to discuss alternative node aggregation methods.
Key Ideas:
- Square-grid aggregation: group segments into grid cells and treat each cell as a node. This layer of nodes could capture patterns that are only observable at a coarse level. Later, the grid layer and the 50 m segment layer could be combined in a hierarchical GNN model that captures both low- and high-resolution patterns.
- Multi-graph approach: build separate graphs by functional categories (e.g., residential vs. highways). Suggested by Zhendong as faster to implement.
Challenges:
- Time overhead of square-grid aggregation.
- Deciding how to train/evaluate multiple specialized graphs.
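
Illustration of the square-grid idea: a minimal sketch assuming each segment is represented by its midpoint in a projected coordinate system with metre units. The 500 m cell size, the function names, and the toy values are hypothetical.

```python
import math
from collections import defaultdict


def assign_to_grid(segment_midpoints, cell_size_m=500.0):
    """Map each segment to the square grid cell containing its midpoint.

    segment_midpoints: {segment_id: (x, y)} in metres.
    Returns {(col, row): [segment_ids]}.
    """
    cells = defaultdict(list)
    for seg_id, (x, y) in segment_midpoints.items():
        cell = (math.floor(x / cell_size_m), math.floor(y / cell_size_m))
        cells[cell].append(seg_id)
    return dict(cells)


def cell_means(cells, values):
    """Average a per-segment value within each grid cell."""
    return {
        cell: sum(values[s] for s in members) / len(members)
        for cell, members in cells.items()
    }


midpoints = {"s1": (10.0, 20.0), "s2": (480.0, 499.0),
             "s3": (510.0, 20.0), "s4": (-5.0, 20.0)}
values = {"s1": 2.0, "s2": 4.0, "s3": 10.0, "s4": 1.0}

cells = assign_to_grid(midpoints)
means = cell_means(cells, values)
print(cells)
print(means)
```

The segment-to-cell membership returned by `assign_to_grid` is also what would define the cross-layer edges if the grid layer and the segment layer are later combined in a hierarchical model.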

David to benchmark the augmentation strategies, finalize the best one (currently Sparse Multi-Modal Agreement), and test it on the whole city.
Pieter to merge the current 50 m segments into 100 m and 200 m segments (possibly adding segment length as a feature if it varies a lot).
Group to:
- Describe the baseline model in the wiki, addressing the challenges and motivating the choices made.
- Define the final evaluation methodology (basic holdout vs. cross-validation, with or without early stopping).
- Look into using multiple graphs, e.g. one graph for residential areas, one for highways, etc.

Next Meeting:
6 May 2025, 14:30