Meeting notes week 6 - davidlabee/Graph4Air GitHub Wiki

🗓️ Meeting Notes – 29 April 2025, 14:00

External validation: Use Palmes tube data for final validation, and possibly include it in training.
Continue refining baseline GAT/GCN architectures and document choices in the wiki.
Think about dataset imbalance: Some road types dominate; consider penalizing over-represented classes in the loss.
Pieter: Think about alternative graph construction with Highway segments separate from the rest.
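
One possible way to penalize over-represented road types in the loss is inverse-frequency weighting. The sketch below is only an illustration in plain Python, not the project's implementation: the road type names, the function names and the "balanced" weighting formula (N / (K * count)) are assumptions chosen for the example.

```python
from collections import Counter


def inverse_frequency_weights(road_types):
    """Return one weight per sample, inversely proportional to class frequency.

    Uses the common "balanced" heuristic weight_c = N / (K * count_c),
    so the weights average to 1 over the dataset.
    """
    counts = Counter(road_types)
    n_samples = len(road_types)
    n_classes = len(counts)
    class_weight = {c: n_samples / (n_classes * k) for c, k in counts.items()}
    return [class_weight[c] for c in road_types]


def weighted_mse(y_true, y_pred, weights):
    """Weighted mean squared error, normalised by the sum of the weights."""
    total = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)


# Hypothetical, imbalanced sample: residential segments dominate.
road_types = ["residential"] * 6 + ["primary"] * 3 + ["motorway"]
weights = inverse_frequency_weights(road_types)
```

With these weights each road type contributes the same total weight to the loss, so errors on the single motorway segment count as much as errors on the six residential segments combined.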

Check the unit for NO2d data. Is it the same as in the research of the supervisors?
✅ Should we use a mask for the missing values instead of mean imputation + dropping?
✅ What is the effect of the current imputation function on performance? Should imputation be increased to include longer chains?
Put the created Colab notebooks on Github as well.
Fill in the Scoreboard page to document results.
✅ Try the cross validation to see if there is overfitting.
✅ Think about bias, overfitting and look at ways to prevent these.
Also consider other GNN Python packages.
✅ Do external validation on the Palmes dataset for all models. (could maybe be done during hyperparameter tuning)
If you are satisfied with the current graph structure look into hyperparameter tuning
Is there a way to take 'half' of the Palmes measurements into the graph structure (for training)?

Refine basic aggregation method using existing graph partitioning algorithms.
Optional: try multiscale graphs where aggregation happens based on road types.

For next week, prepare clear comparisons of the graph augmentation model under different parameter settings, showing its scores alongside the baseline model scores.

Implemented new method of using a mask for the missing values instead of mean imputation + dropping.
What is the effect of the current imputation function on performance? --> Scoreboard
Implemented k-fold cross validation. --> (show notebook)
Thought about bias, overfitting. --> Bias and Overfitting in Road‐Network GNNs
More on transductive vs inductive learning for GNNs --> Transductive vs Inductive Learning in GNNs
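
As a rough illustration of the masking idea, the sketch below computes the loss only over nodes that have an observed value, so missing values are neither imputed nor dropped from the graph. It assumes the mask is applied to the NO₂ targets; the function name and the example numbers are hypothetical and do not come from the project code.

```python
def masked_mse(y_pred, y_true, mask):
    """Mean squared error over nodes whose mask entry is True.

    Nodes with mask False (no measurement) stay in the graph for message
    passing but do not contribute to the loss.
    """
    errors = [(p - t) ** 2 for p, t, m in zip(y_pred, y_true, mask) if m]
    if not errors:
        raise ValueError("mask selects no nodes")
    return sum(errors) / len(errors)


# Hypothetical example: two of four road segments have no measurement.
y_true = [20.0, None, 35.0, None]
mask = [value is not None for value in y_true]
y_pred = [22.0, 30.0, 31.0, 28.0]

loss = masked_mse(y_pred, y_true, mask)
```

The same boolean mask can be reused at evaluation time so that scores are computed only on measured segments.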

Completed and documented the David's Graph Design wiki page, which explains the semantic edge augmentation method in detail, including full implementation, parameter tuning, and theoretical motivation.
Finished the Potentially handling outliers wiki page, describing how local residuals are used to detect and optionally exclude outlier NO₂ measurements.
Finalized the Palmes validation wiki page, which covers the spatial matching and validation of predictions against Palmes tubes NO₂ data across Amsterdam.
Uploaded all core graph augmentation, model training, and evaluation code to GitHub.
Refined the graph augmentation literature page with new structure and combined theoretical and applied references.

No need for outlier removal. We can try it during training but when comparing to other results, the dataset has to be identical.
Get started on hyperparameter tuning. It can take a long time. Send an email to Zhendong requesting access to the CPU cores.
...

In the thesis, explain also the physical reasons for choosing a multi-resolution model.
Continue working on the coarsening method, but don't make it too difficult.
Share the results from the multi-resolution model. Does it improve the baselines?

Apply outlier handling only on the training set, then evaluate on the original test set to ensure consistent comparisons across datasets.
Leverage Zhendong’s 16 CPU cores to parallelize your hyperparameter search and efficiently find the best parameters.
For external validation with the Palmes tube data, use the same R² metric (the squared Pearson correlation) to maintain consistency.
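
To make the metric choice concrete, the sketch below computes R² as the squared Pearson correlation and, for contrast, the coefficient of determination (1 - SS_res / SS_tot), which is also often called R². The two only agree in special cases, so it matters which one is reported. Function names and example values are illustrative assumptions.

```python
def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation between observations and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov ** 2 / (var_t * var_p)


def coefficient_of_determination(y_true, y_pred):
    """1 - SS_res / SS_tot; unlike pearson_r2 this penalises systematic bias."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical predictions that are perfectly correlated but biased by +10.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [20.0, 30.0, 40.0, 50.0]

r2_pearson = pearson_r2(observed, predicted)
r2_determination = coefficient_of_determination(observed, predicted)
```

In this example the squared Pearson correlation is 1 while the coefficient of determination is much lower, because only the latter is sensitive to the constant offset.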

Next Meeting: Tuesday 27th of May, 13:00 (Online on Teams)