Meeting notes week 6 - davidlabee/Graph4Air GitHub Wiki

🗓️ Meeting Notes – 29 April 2025, 14:00

🗒️ Open Feedback

  • External validation: Use Palmes tube data for final validation, and possibly include it in training.
  • Continue refining baseline GAT/GCN architectures and document choices in the wiki.
  • Think about dataset imbalance: Some road types dominate; consider penalizing over-represented classes in the loss.
  • Pieter: Think about alternative graph construction with Highway segments separate from the rest.
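
One way to penalize over-represented road types in the loss, as suggested above, is inverse-frequency sample weighting. The sketch below is only an illustration under stated assumptions: the function names, the toy road-type labels, and the use of a weighted MSE are hypothetical and are not taken from the project code.

```python
from collections import Counter


def inverse_frequency_weights(road_types):
    """Return one weight per sample, proportional to 1 / class frequency.

    Weights are normalised so that they average to 1 over the dataset,
    which keeps the loss on a scale comparable to the unweighted case.
    """
    counts = Counter(road_types)
    n_samples = len(road_types)
    n_classes = len(counts)
    # Each class ends up with the same total weight: n_samples / n_classes.
    return [n_samples / (n_classes * counts[rt]) for rt in road_types]


def weighted_mse(y_true, y_pred, weights):
    """Mean squared error where each sample's error is scaled by its weight."""
    total = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)


# Hypothetical toy data: three residential segments and one highway segment.
road_types = ["residential", "residential", "residential", "highway"]
weights = inverse_frequency_weights(road_types)
```

With these weights, an error on the single highway segment counts as much as the three residential segments combined. Whether this helps on the actual dataset would need to be checked against the unweighted baseline.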

🗒️ Last Meeting’s Feedback

General Notes

  • Check the unit for NO2d data. Is it the same as in the research of the supervisors?
  • ✅ Should we use a mask for the missing values instead of mean imputation + dropping?
  • ✅ What is the effect of the current imputation function on performance? Should imputation be extended to cover longer chains?
  • Put the created Colab notebooks on Github as well.
  • Fill in the Scoreboard page to document results.
  • ✅ Try the cross validation to see if there is overfitting.
  • ✅ Think about bias, overfitting and look at ways to prevent these.
  • Also consider other GNN python packages.
  • ✅ Do external validation on the Palmes dataset for all models. (could maybe be done during hyperparameter tuning)
  • If you are satisfied with the current graph structure, look into hyperparameter tuning.
  • Is there a way to take 'half' of the Palmes measurements into the graph structure (for training)?

Feedback for Pieter

  • Refine basic aggregation method using existing graph partitioning algorithms.
  • Optional: try multiscale graphs where aggregation happens based on road types.

Feedback for David

  • For next week, prepare clear comparisons of the graph augmentation model under different parameter settings, reporting its scores alongside the baseline model scores.

🗒️ Progress since last meeting

Pieter

David

  • Completed the "David's Graph Design" wiki page, which explains the semantic edge augmentation method in detail, including full implementation, parameter tuning, and theoretical motivation.
  • Finished the Potentially handling outliers wiki page, describing how local residuals are used to detect and optionally exclude outlier NO₂ measurements.
  • Finalized the Palmes validation wiki page, which covers the spatial matching and validation of predictions against Palmes tubes NO₂ data across Amsterdam.
  • Uploaded all core graph augmentation, model training, and evaluation code to GitHub.
  • Refined the graph augmentation literature page with new structure and combined theoretical and applied references.

🗒️ New Meeting Notes

General Notes

  • No need for outlier removal. We can try it during training but when comparing to other results, the dataset has to be identical.
  • Get started on hyperparameter tuning. It can take a long time. Email Zhendong to request access to the CPU cores.
  • ...
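
Since hyperparameter tuning can take a long time, the search can be spread over several CPU cores. The following is a minimal sketch using only the Python standard library. The search space, the parameter names `lr` and `hidden`, and the `evaluate` objective are placeholders invented for illustration; in practice `evaluate` would train one model and return its validation error.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product


def build_grid(search_space):
    """Expand {name: [values]} into a list of {name: value} configurations."""
    names = sorted(search_space)
    value_lists = (search_space[name] for name in names)
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def evaluate(config):
    """Placeholder objective (hypothetical): lower is better.

    Replace the body with training plus validation of one model.
    """
    return (config["lr"] - 0.01) ** 2 + (config["hidden"] - 64) ** 2 * 1e-6


def run_search(configs, max_workers=16):
    """Score every configuration in parallel and return the best one."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(evaluate, configs))
    best = min(range(len(configs)), key=scores.__getitem__)
    return configs[best], scores[best]


if __name__ == "__main__":
    grid = build_grid({"lr": [0.1, 0.01, 0.001], "hidden": [32, 64]})
    print(run_search(grid))
```

Each configuration runs in its own process, so this only pays off when a single training run is CPU-bound and does not already use all cores itself.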

Feedback for Pieter

  • In the thesis, explain also the physical reasons for choosing a multi-resolution model.
  • Continue working on the coarsening method, but don't make it too difficult.
  • Share the results from the multi-resolution model. Does it improve the baselines?

Feedback for David

  • Apply outlier handling only on the training set, then evaluate on the original test set to ensure consistent comparisons across datasets.
  • Leverage Zhendong’s 16 CPU cores to parallelize your hyperparameter search and efficiently find the best parameters.
  • For external validation with the Palmes tube data, use the same R² metric (the squared Pearson correlation) to maintain consistency.
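
Because "R²" can mean either the squared Pearson correlation or the coefficient of determination, it helps to pin down which one is reported. The sketch below computes both on made-up numbers; the function names and the toy values are hypothetical and not project results.

```python
def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation between observations and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


def coefficient_of_determination(y_true, y_pred):
    """1 - SS_res / SS_tot, which also penalises systematic bias."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values: predictions follow the trend but are offset by +5.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [15.0, 25.0, 35.0, 45.0]
r2_pearson = pearson_r2(observed, predicted)                 # 1.0
r2_determination = coefficient_of_determination(observed, predicted)  # 0.8
```

The squared Pearson correlation ignores a constant offset between predictions and measurements, while the coefficient of determination does not, so the two should not be mixed when comparing models.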

Next Meeting: Tuesday 27th of May, 13:00 (Online on Teams)