Meeting notes week 4 - davidlabee/Graph4Air GitHub Wiki

🗓️ Meeting Notes – 29 April 2025, 14:00

Graph Augmentation: Avoid over-densifying the graph; ensure new edges reflect functional similarity while preserving spatial structure.
GAT vs. GCN: GAT outperforms GCN; investigate why attention helps in this context.
Train/Test Splitting: Random splits may not fit the use case; focus on spatial interpolation and map quality over raw accuracy.
Aggregation: Consider the road-segment aggregation for hierarchical models; a multi-graph split by road type may be a faster alternative.
Baseline Documentation: Clearly describe and justify baseline models and evaluation choices in the wiki.
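
To start on the GAT vs. GCN question, it helps to compare the neighbour weights each layer applies. The sketch below is a simplified illustration, not the project's model. Node features are scalars, the GAT weight matrix is fixed to 1, and the attention parameters `a_self` and `a_nbr` are hand-picked; a trained GAT learns them. The toy graph, the feature values and the function names are hypothetical.

```python
import math


def gcn_coefficients(adj):
    """GCN neighbour weights 1/sqrt(d_i * d_j), self-loops included.

    They depend only on node degrees, never on the node features.
    """
    nbrs = {i: set(js) | {i} for i, js in adj.items()}
    deg = {i: len(s) for i, s in nbrs.items()}
    return {i: {j: 1.0 / math.sqrt(deg[i] * deg[j]) for j in s}
            for i, s in nbrs.items()}


def gat_coefficients(adj, h, a_self, a_nbr, slope=0.2):
    """Single-head GAT-style weights for scalar features with W = 1.

    e_ij = LeakyReLU(a_self * h_i + a_nbr * h_j), followed by a softmax over
    the neighbours of i plus i itself, so each node's weights sum to 1.
    """
    def leaky_relu(x):
        return x if x > 0 else slope * x

    coeffs = {}
    for i, js in adj.items():
        scores = {j: leaky_relu(a_self * h[i] + a_nbr * h[j]) for j in set(js) | {i}}
        top = max(scores.values())  # subtract the max for numerical stability
        exp = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exp.values())
        coeffs[i] = {j: e / total for j, e in exp.items()}
    return coeffs


def aggregate(coeffs, h):
    """Weighted sum of neighbour features using the given coefficients."""
    return {i: sum(w * h[j] for j, w in row.items()) for i, row in coeffs.items()}


# Hypothetical toy graph: segment 0 touches a quiet street (1) and a highway (2).
adj = {0: [1, 2], 1: [0], 2: [0]}
h = {0: 20.0, 1: 22.0, 2: 60.0}  # e.g. a traffic-intensity feature
gcn = gcn_coefficients(adj)
gat = gat_coefficients(adj, h, a_self=0.0, a_nbr=-0.1)  # hand-picked, normally learned
gcn_out = aggregate(gcn, h)
gat_out = aggregate(gat, h)
```

GCN gives the quiet street and the highway the same weight because both neighbours have the same degree. The GAT weights differ because they depend on the features. If attention helps in this project, one plausible reason is that it lets a segment down-weight neighbours that are adjacent but behave differently. This could be checked by inspecting the attention weights of a trained model.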

Thought through the general GNN workflow → Thinking about the Model Architecture
Drafted the Model Architecture page for the Baseline Models → Baseline Model Architecture
Created a graph at a different aggregation level; shared progress and notes → Pieter's Graph Design

Drafted and refined the introduction section of the thesis report in LaTeX.
Conducted additional literature research on graph-augmentation techniques (similarity-based edges, spatial graphs).
Ran multiple parameter configurations of the augmentation on the full Amsterdam network; hit memory limits.
Implemented an optimized approach that compares only node pairs within a given spatial range; full results are still pending.
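
The spatial-range optimisation can be sketched with a bucket grid. Nodes are hashed into square buckets whose side equals the search radius, so each node is compared only with nodes in its own bucket and the eight surrounding ones. This is a minimal sketch under assumptions the notes do not state:
- coordinates are projected (metres);
- cosine similarity is the similarity measure;
- `radius`, `threshold` and the per-node cap `k` are hypothetical parameters. The cap `k` is one way to avoid over-densifying the graph.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def spatial_similarity_edges(coords, feats, radius, threshold, k):
    """Propose similarity edges only between nodes at most `radius` apart.

    Nodes are hashed into square buckets of side `radius`, so each node is
    compared with nodes in its own and the 8 adjacent buckets rather than
    with every other node. An edge is kept if it is among the `k` most
    similar candidates of at least one of its endpoints.
    """
    buckets = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        buckets[(int(x // radius), int(y // radius))].append(i)

    candidates = defaultdict(list)  # node -> [(similarity, other node)]
    for (bx, by), members in buckets.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in buckets.get((bx + dx, by + dy), []):
                    for i in members:
                        if j <= i or math.dist(coords[i], coords[j]) > radius:
                            continue  # visit each pair once, skip distant ones
                        s = cosine(feats[i], feats[j])
                        if s >= threshold:
                            candidates[i].append((s, j))
                            candidates[j].append((s, i))

    edges = set()
    for i, cands in candidates.items():
        for _, j in sorted(cands, reverse=True)[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Hypothetical toy input: projected coordinates in metres, 2-d features.
coords = [(0, 0), (50, 0), (1000, 1000), (60, 10)]
feats = [[1.0, 0.0], [1.0, 0.1], [1.0, 0.0], [0.0, 1.0]]
edges = spatial_similarity_edges(coords, feats, radius=100, threshold=0.9, k=2)
```

Node 2 has the same features as node 0 but lies outside the radius, so no edge is proposed between them. With real data, SciPy's `cKDTree.query_pairs(r)` returns the same within-radius index pairs and would likely be faster than pure-Python buckets.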

External validation: Zhendong to provide Palmes tube data for final validation, and possibly for use in training.
Train/Test Split: Use an 80% node mask for training; the model sees features for all nodes but labels only for the 80% in the mask.
Evaluation Metrics: Validate using the Palmes dataset; report MAE and RMSE.
Dataset Imbalance: Highway segments dominate; consider down-weighting over-represented road types in the loss.
Continue refining baseline GAT/GCN architectures and document choices in the wiki.
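
The 80% node mask and the imbalance idea can be sketched together in plain Python. This is a minimal sketch under several assumptions:
- the split is uniformly random over nodes;
- road type is the grouping used for reweighting;
- the weights follow an inverse-frequency scheme, `n_train / (n_types * count)`.

None of these choices are fixed in the notes, and the function names are hypothetical.

```python
import random
from collections import Counter


def node_masks(n, train_frac=0.8, seed=0):
    """Split n nodes into a random train mask (train_frac of nodes) and its complement.

    All nodes keep their features; the masks only decide whose labels are used.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    train = set(order[: round(train_frac * n)])
    train_mask = [i in train for i in range(n)]
    test_mask = [not m for m in train_mask]
    return train_mask, test_mask


def road_type_weights(road_types, train_mask):
    """Inverse-frequency loss weight per node, computed from training nodes only.

    Weight = n_train / (n_types * count(type)), so the weights average 1 over
    training nodes and an over-represented type (e.g. highways) counts less.
    Nodes outside the training mask get weight 0.
    """
    counts = Counter(t for t, m in zip(road_types, train_mask) if m)
    n_train = sum(counts.values())
    return [n_train / (len(counts) * counts[t]) if m else 0.0
            for t, m in zip(road_types, train_mask)]


def masked_weighted_mse(pred, target, train_mask, weights):
    """Weighted MSE over training nodes only (predictions exist for every node)."""
    num = sum(w * (p - y) ** 2
              for p, y, m, w in zip(pred, target, train_mask, weights) if m)
    den = sum(w for m, w in zip(train_mask, weights) if m)
    return num / den


# Hypothetical toy data: 8 highway segments, 2 residential ones.
road_types = ["highway"] * 8 + ["residential"] * 2
train_mask, test_mask = node_masks(len(road_types))
weights = road_type_weights(road_types, train_mask)
```

Predictions are computed for every node, so unlabelled nodes still contribute their features through message passing; only the loss is restricted to the training mask. In PyTorch Geometric this pattern is usually written by indexing the model output with a boolean `train_mask` before computing the loss.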

Discussed the external validation workflow with the Palmes data, which could either be integrated into training or held out.
Revisited train/test masking strategy and its impact on interpolation quality.
Confirmed MAE/RMSE as primary metrics.
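
For reference, MAE is the mean of |y_hat - y| and RMSE is the square root of the mean of (y_hat - y)^2. RMSE is therefore always at least as large as MAE, and it grows faster when a few sites have large errors. The sketch below scores predictions at external validation sites. It assumes each Palmes tube site has already been matched to one graph node, e.g. its nearest road segment. The matching rule, the dictionaries and all values are hypothetical.

```python
import math


def mae(pred, obs):
    """Mean absolute error: average of |prediction - observation|."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean squared error: penalises large errors more than MAE does."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def evaluate_at_sites(node_pred, site_to_node, site_obs):
    """Score predictions at the graph node matched to each external site.

    node_pred:    {node_id: predicted concentration}
    site_to_node: {site_id: node_id}, e.g. nearest road segment (assumed matching)
    site_obs:     {site_id: measured concentration}
    """
    sites = sorted(site_obs)
    pred = [node_pred[site_to_node[s]] for s in sites]
    obs = [site_obs[s] for s in sites]
    return {"MAE": mae(pred, obs), "RMSE": rmse(pred, obs)}


# Hypothetical values, for illustration only.
scores = evaluate_at_sites(
    node_pred={10: 32.0, 11: 28.0, 12: 44.0},
    site_to_node={"A": 10, "B": 11, "C": 12},
    site_obs={"A": 30.0, "B": 30.0, "C": 40.0},
)
```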

Pieter demonstrated his GitHub wiki updates on the baseline model architecture.
Pieter showcased progress on data aggregation; the current square-grid approach is functional.
David shared preliminary results of the optimized similarity algorithm (still running).
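
A square-grid aggregation can be sketched as follows. The sketch assumes each segment is represented by its centroid in a projected coordinate system (metres), and that occupied cells sharing a side are connected (rook adjacency). These assumptions, the cell size and the function names are illustrative and may differ from Pieter's implementation.

```python
from collections import defaultdict
from statistics import mean


def grid_aggregate(segments, cell_size):
    """Aggregate road segments into square cells of side cell_size.

    segments: iterable of (x, y, value), with (x, y) the segment centroid in a
    projected coordinate system (metres assumed). Returns
    {(col, row): (number_of_segments, mean_value)}.
    """
    cells = defaultdict(list)
    for x, y, value in segments:
        cells[(int(x // cell_size), int(y // cell_size))].append(value)
    return {cell: (len(vals), mean(vals)) for cell, vals in cells.items()}


def grid_edges(cells):
    """Connect occupied cells that share a side (rook adjacency), each pair once."""
    edges = set()
    for cx, cy in cells:
        for nb in ((cx + 1, cy), (cx, cy + 1)):
            if nb in cells:
                edges.add(((cx, cy), nb))
    return edges


# Hypothetical segments: (centroid x, centroid y, measured value).
segments = [(10, 10, 20.0), (90, 40, 30.0), (150, 20, 40.0), (10, 250, 50.0)]
cells = grid_aggregate(segments, cell_size=100)
edges = grid_edges(cells)
```

The segment counts per cell are a cheap first check on how evenly a given cell size spreads the data.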

Proposal: cluster nodes first to speed up the feature-similarity augmentation.
Zhendong will grant access to multiple CPU cores to speed up the pairwise similarity computation.
Zhendong to parallelize the similarity-computation function.
Instead of leaving nodes out, compute similarity across all node pairs.
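
Computing similarity across all node pairs parallelises naturally because each row of the pairwise comparison is independent. The sketch below assumes cosine similarity with a fixed threshold and uses a process pool from the Python standard library. The chunking scheme and the function names are hypothetical; this is not the project's similarity function.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def _cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def _similar_rows(job):
    """Worker: compare each row in [start, stop) with every later node."""
    feats, start, stop, threshold = job
    found = []
    for i in range(start, stop):
        for j in range(i + 1, len(feats)):
            s = _cosine(feats[i], feats[j])
            if s >= threshold:
                found.append((i, j, s))
    return found


def all_pairs_similarity(feats, threshold, workers=1, chunk=256):
    """Exact comparison of all n*(n-1)/2 node pairs, split into row chunks.

    With workers > 1 the chunks run in separate processes; in that case call
    this from under an `if __name__ == "__main__":` guard.
    """
    jobs = [(feats, s, min(s + chunk, len(feats)), threshold)
            for s in range(0, len(feats), chunk)]
    if workers == 1:
        parts = [_similar_rows(job) for job in jobs]
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            parts = list(pool.map(_similar_rows, jobs))  # map keeps job order
    return [pair for part in parts for pair in part]


# Hypothetical toy features; workers=1 runs the same jobs serially.
feats = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 2.0]]
pairs = all_pairs_similarity(feats, threshold=0.9, chunk=2)
```

Row `i` is compared with `n - i - 1` later nodes, so early chunks do more work; smaller chunks help balance the load. Each job also carries its own copy of `feats`. For the full network, handing the features to each worker once via the `initializer` argument of `ProcessPoolExecutor` would avoid repeated pickling. Clustering nodes first would shrink the work to within-cluster pairs.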

Continue working on this data aggregation method and start testing with the models.
Check the stats (distribution) of the current aggregation and try other options.
Consider an alternative graph construction that keeps Highway segments separate from the rest.
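
Keeping Highway segments separate could start by tagging each edge with the road types of its two endpoints. The sketch assumes each node (road segment) carries a single road-type label, and that the label `"highway"` identifies the segments to separate. Both assumptions and the function name are illustrative.

```python
from collections import defaultdict


def split_edges_by_road_type(edges, road_type, major="highway"):
    """Group edges into relation types: major-major, major-other, other-other.

    edges:     iterable of (u, v) node pairs (road segments)
    road_type: {node: road-type label}; which label counts as `major` is assumed
    """
    groups = defaultdict(list)
    for u, v in edges:
        a = major if road_type[u] == major else "other"
        b = major if road_type[v] == major else "other"
        groups["-".join(sorted((a, b)))].append((u, v))
    return dict(groups)


# Hypothetical toy network: two highway segments feeding into residential ones.
road_type = {0: "highway", 1: "highway", 2: "residential", 3: "residential"}
relations = split_edges_by_road_type([(0, 1), (1, 2), (2, 3)], road_type)
```

Each edge list can then become its own graph, or one relation type in a multi-relational GNN such as R-GCN, which learns a separate weight matrix per relation.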

Next Meeting: 13 May 2025, 11:00