Meeting notes week 6 - davidlabee/Graph4Air GitHub Wiki
🗓️ Meeting Notes – 29 April 2025, 14:00
🗒️ Open Feedback
- External validation: Use Palmes tube data for final validation, and possibly include it in training.
- Continue refining baseline GAT/GCN architectures and document choices in the wiki
- Think about dataset imbalance: Some road types dominate; consider penalizing over-represented classes in the loss.
- Pieter: Think about alternative graph construction with Highway segments separate from the rest.
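One possible way to penalize over-represented road types in the loss is to weight each sample by the inverse frequency of its road type. The sketch below is only an illustration in plain Python, not the project's implementation: the function names and the road type labels (`"residential"`, `"motorway"`) are hypothetical, and it assumes a regression loss (MSE) on the NO₂ targets.

```python
from collections import Counter


def inverse_frequency_weights(road_types):
    """Return one weight per sample, inversely proportional to class frequency."""
    counts = Counter(road_types)
    n_samples = len(road_types)
    n_classes = len(counts)
    # n_samples / (n_classes * count) makes every class contribute the same
    # total weight, and the weights sum to n_samples.
    return [n_samples / (n_classes * counts[t]) for t in road_types]


def weighted_mse(y_true, y_pred, weights):
    """Mean squared error in which each sample is scaled by its weight."""
    total = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)


# Hypothetical example: three residential segments and one motorway segment.
road_types = ["residential", "residential", "residential", "motorway"]
weights = inverse_frequency_weights(road_types)

y_true = [1.0, 1.0, 1.0, 1.0]
y_pred = [1.0, 1.0, 1.0, 3.0]  # only the rare motorway segment is mispredicted
loss = weighted_mse(y_true, y_pred, weights)
```

With these toy values the error on the rare class counts more heavily than it would in an unweighted MSE. In a PyTorch training loop the same weights could be applied as a per-node tensor multiplied into the squared error before averaging.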
🗒️ Last Meeting’s Feedback
General Notes
- Check the unit for NO2d data. Is it the same as in the research of the supervisors?
- ✅ Should we use a mask for the missing values instead of mean imputation + dropping?
- ✅ What is the effect of the current imputation function on performance? Should imputation be increased to include longer chains?
- Put the created Colab notebooks on GitHub as well.
- Fill in the Scoreboard page to document results.
- ✅ Try the cross validation to see if there is overfitting.
- ✅ Think about bias, overfitting and look at ways to prevent these.
- Also consider other GNN Python packages.
- ✅ Do external validation on the Palmes dataset for all models. (could maybe be done during hyperparameter tuning)
- If you are satisfied with the current graph structure, look into hyperparameter tuning.
- Is there a way to include 'half' of the Palmes measurements in the graph structure (for training)?
Feedback for Pieter
- Refine basic aggregation method using existing graph partitioning algorithms.
- Optional: try multiscale graphs where aggregation happens based on road types.
Feedback for David
- For next week create some good comparisons of the graph augmentation model with different parameters, scores and the baseline model scores.
🗒️ Progress since last meeting
Pieter
- Implemented new method of using a mask for the missing values instead of mean imputation + dropping.
- What is the effect of the current imputation function on performance? --> Scoreboard
- Implemented k-fold cross validation. --> (show notebook)
- Thought about bias, overfitting. --> Bias and Overfitting in Road‐Network GNNs
- More on transductive vs inductive learning for GNNs --> Transductive vs Inductive Learning in GNNs
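To illustrate the masking idea from the first item, here is a minimal sketch of a masked loss. It rests on an assumption the notes do not confirm: that the missing values are target measurements, marked as NaN, and that the loss should be computed only over nodes with an observed value. The function name `masked_mse` is hypothetical.

```python
import math


def masked_mse(y_true, y_pred):
    """MSE over observed targets only; NaN in y_true marks a missing value."""
    # Assumption: missing measurements are encoded as NaN in the targets.
    mask = [not math.isnan(t) for t in y_true]
    n_observed = sum(mask)
    if n_observed == 0:
        raise ValueError("no observed targets to compute the loss on")
    squared_errors = (
        (t - p) ** 2 for t, p, m in zip(y_true, y_pred, mask) if m
    )
    return sum(squared_errors) / n_observed


# The second node has no measurement, so its prediction is ignored.
targets = [10.0, float("nan"), 14.0]
predictions = [12.0, 99.0, 14.0]
masked_loss = masked_mse(targets, predictions)
```

Compared with mean imputation, unmeasured nodes stay in the graph for message passing but never contribute an artificial target to the loss.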
David
- Completed the David's Graph Design wiki page, which explains the semantic edge augmentation method in detail, including full implementation, parameter tuning, and theoretical motivation.
- Finished the Potentially handling outliers wiki page, describing how local residuals are used to detect and optionally exclude outlier NO₂ measurements.
- Finalized the Palmes validation wiki page, which covers the spatial matching and validation of predictions against Palmes tube NO₂ data across Amsterdam.
- Uploaded all core graph augmentation, model training, and evaluation code to GitHub.
- Refined the graph augmentation literature page with new structure and combined theoretical and applied references.
🗒️ New Meeting Notes
General Notes
- No need for outlier removal. We can try it during training but when comparing to other results, the dataset has to be identical.
- Get started on hyperparameter tuning. It can take a long time. Send an email to Zhendong to request access to the CPU cores.
- ...
Feedback for Pieter
- In the thesis, explain also the physical reasons for choosing a multi-resolution model.
- Continue working on the coarsening method, but don't make it too difficult.
- Share the results from the multi-resolution model. Does it improve on the baselines?
Feedback for David
- Apply outlier handling only on the training set, then evaluate on the original test set to ensure consistent comparisons across datasets.
- Leverage Zhendong’s 16 CPU cores to parallelize your hyperparameter search and efficiently find the best parameters.
- For external validation with the Palmes tube data, use the same R² metric (the squared Pearson correlation) to maintain consistency.
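Because "R²" can mean either the squared Pearson correlation or the coefficient of determination, the small sketch below shows how the two can differ on the same predictions. It is a plain-Python illustration with hypothetical function names and made-up numbers, not the project's evaluation code.

```python
def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation between observations and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


def coefficient_of_determination(y_true, y_pred):
    """1 - SS_res / SS_tot, which also penalizes systematic bias."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values: predictions are perfectly correlated but biased by +5.
observed = [10.0, 20.0, 30.0]
predicted = [15.0, 25.0, 35.0]
r2_pearson = pearson_r2(observed, predicted)                 # 1.0
r2_determination = coefficient_of_determination(observed, predicted)  # 0.625
```

The squared Pearson correlation ignores a constant offset, so reporting which definition is used keeps the Palmes validation comparable with the other results.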
Next Meeting: Tuesday 27th of May, 13:00 (Online on Teams)