David's optimal parameter search

Hyperparameter Tuning with Optuna for Graph Neural Networks (GCN & GAT)

This document explains the hyperparameter tuning strategy, using Optuna to optimize GCN and GAT models for predicting NO₂ levels over road segments. It covers outlier handling, graph augmentation, model training, evaluation, and results.


📌 Step 1: Outlier Detection and Classification

Before model training, we identify and handle spatial outliers in the NO₂ measurements.

Detection

  • Each road segment is compared to the mean NO₂ value of its 1-hop neighbors in the graph.
  • The residual between a segment's value and its neighbors' mean is computed.
  • Outliers are those whose residual exceeds a threshold × MAD (Median Absolute Deviation), as sketched below.
  • The threshold differs by road type:
    • Highways (TRAFMAJOR > 20,000): threshold = 9.0
    • Others: threshold = 5.0
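
A minimal sketch of this detection step, assuming a NetworkX graph `G` over segments, a pandas Series `no2` of measured values, and a Series `traffic` holding TRAFMAJOR (names are illustrative, not the repository's code):

```python
import numpy as np
import networkx as nx
import pandas as pd

def detect_outliers(G: nx.Graph, no2: pd.Series, traffic: pd.Series,
                    major_thresh: float = 20_000,
                    k_major: float = 9.0, k_other: float = 5.0) -> pd.Series:
    """Flag segments whose NO2 deviates strongly from their 1-hop neighbor mean."""
    residuals = {}
    for node in G.nodes:
        neigh = list(G.neighbors(node))
        if neigh:
            residuals[node] = no2[node] - no2[neigh].mean()
    res = pd.Series(residuals)

    # Median Absolute Deviation of the residuals
    mad = np.median(np.abs(res - res.median()))

    # Road-type dependent threshold: highways tolerate larger deviations
    k = np.where(traffic.loc[res.index] > major_thresh, k_major, k_other)
    return np.abs(res) > k * mad
```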

Classification

Each outlier is classified as an error or a real extreme using three heuristics (combined in the sketch after this list):

  1. Palmes Tube Comparison
    Nearby official Palmes tube NO₂ measurements (within 50 m) are compared.
    If the outlier is more than 2× the Palmes value → classified as an error.

  2. Neighborhood Agreement
    If most neighbors within 200 m also have high NO₂ → classified as a real extreme.

  3. Traffic-based Prediction
    A linear model is trained to predict NO2d from TRAFNEAR.
    If the prediction error exceeds 3× the model's RMSE → error; otherwise → real extreme.

Only non-error segments are used in training to avoid bias.
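
The three heuristics could be combined roughly as follows; the helper inputs and the "also high" criterion in the second check are assumptions made for illustration:

```python
import numpy as np

def classify_outlier(no2_value, palmes_nearby, neighbor_values, traffic_pred, rmse):
    """Label a flagged segment as 'error' or 'real_extreme'.

    palmes_nearby   -- Palmes tube NO2 values within 50 m (may be empty)
    neighbor_values -- NO2 of neighboring segments within 200 m
    traffic_pred    -- prediction of a TRAFNEAR -> NO2d linear model for this segment
    rmse            -- RMSE of that linear model on its training data
    """
    # 1. Palmes tube comparison: far above the nearby reference measurements -> error
    if len(palmes_nearby) > 0 and no2_value > 2 * np.mean(palmes_nearby):
        return "error"

    # 2. Neighborhood agreement: most nearby segments are also high -> real extreme
    #    ("high" is taken here as within 20% of the flagged value; illustrative only)
    if len(neighbor_values) > 0 and np.mean(np.array(neighbor_values) > 0.8 * no2_value) > 0.5:
        return "real_extreme"

    # 3. Traffic-based prediction: large residual against the linear model -> error
    if abs(no2_value - traffic_pred) > 3 * rmse:
        return "error"
    return "real_extreme"
```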


🧠 Step 2: Graph Construction and Augmentation

A graph is created where nodes are road segments and edges represent physical connections or added similarity links:

  • Base Graph: Built from spatial adjacency (segments that touch each other).
  • Augmented Edges (see the sketch after this list):
    • Based on grouped land-use features.
    • The top-N nodes with the highest intensity in a group are selected.
    • KNN is applied to their normalized feature vectors.
    • Pairs are added as new edges if:
      • Cosine similarity exceeds the threshold.
      • Physical distance lies between the min and max bounds.
      • The nodes are not too close in the original graph (by hop distance).
      • The per-node edge cap and the global edge cap are respected.
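
A condensed sketch of how such similarity edges might be filtered, using scikit-learn's NearestNeighbors. It assumes node ids are integer row indices into a feature matrix `feats` and a coordinate array `coords`; the exact filter order and variable names are assumptions:

```python
import numpy as np
import networkx as nx
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import cosine_similarity

def augment_edges(G, feats, coords, top_nodes, neighbors=50, sim_thresh=0.95,
                  min_dist=100, max_dist=2000, hop_thresh=3,
                  max_edges=3000, per_node_cap=5):
    """Propose similarity edges between high-intensity nodes, respecting all constraints."""
    X = feats[top_nodes]                              # normalized feature vectors
    knn = NearestNeighbors(n_neighbors=neighbors + 1).fit(X)
    _, idx = knn.kneighbors(X)

    added, per_node = [], {n: 0 for n in top_nodes}
    for i, u in enumerate(top_nodes):
        for j in idx[i][1:]:                          # skip the self-match
            v = top_nodes[j]
            if len(added) >= max_edges:               # global cap on new edges
                return added
            if per_node[u] >= per_node_cap or per_node[v] >= per_node_cap:
                continue
            sim = cosine_similarity(X[i:i + 1], X[j:j + 1])[0, 0]
            dist = np.linalg.norm(coords[u] - coords[v])
            try:                                      # hop distance in the original graph
                hops = nx.shortest_path_length(G, u, v)
            except nx.NetworkXNoPath:
                hops = np.inf
            if sim >= sim_thresh and min_dist <= dist <= max_dist and hops > hop_thresh:
                added.append((u, v))
                per_node[u] += 1
                per_node[v] += 1
    return added
```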

🧪 Step 3: Model Training & Evaluation

Two GNN architectures were used (sketched below):

  • GCN: Graph Convolutional Network with 3 layers.
  • GAT: Graph Attention Network with 3 layers and 2 heads.
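
For reference, a 3-layer GCN and a 3-layer, 2-head GAT for node-level regression might look like this in PyTorch Geometric (hidden sizes are placeholders, not the tuned values):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, GATConv

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.conv3 = GCNConv(hidden, 1)               # one regression output per segment

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        return self.conv3(x, edge_index).squeeze(-1)

class GAT(torch.nn.Module):
    def __init__(self, in_dim, hidden=64, heads=2):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden, heads=heads)
        self.conv2 = GATConv(hidden * heads, hidden, heads=heads)
        self.conv3 = GATConv(hidden * heads, 1, heads=1)

    def forward(self, x, edge_index):
        x = F.elu(self.conv1(x, edge_index))
        x = F.elu(self.conv2(x, edge_index))
        return self.conv3(x, edge_index).squeeze(-1)
```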

Optuna Optimization

Optuna is used to optimize graph-augmentation hyperparameters such as the following (see the objective sketch after this list):

  • top_n: number of high-intensity nodes per group
  • neighbors: number of KNN neighbors
  • sim_thresh: similarity threshold for adding edges
  • min_dist, max_dist: spatial bounds for edge creation
  • hop_thresh: how close nodes can be in original graph
  • max_edges: global cap on new edges
  • per_node_cap: edge cap per node
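
A sketch of how these parameters could be sampled inside an Optuna objective; the ranges are illustrative and `train_and_evaluate` is a hypothetical stand-in for building the augmented graph and training the GNN:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    params = {
        "top_n":        trial.suggest_int("top_n", 500, 1000, step=500),
        "neighbors":    trial.suggest_int("neighbors", 30, 200, step=10),
        "sim_thresh":   trial.suggest_float("sim_thresh", 0.90, 0.995),
        "min_dist":     trial.suggest_int("min_dist", 50, 500),
        "max_dist":     trial.suggest_int("max_dist", 1000, 3000),
        "hop_thresh":   trial.suggest_int("hop_thresh", 2, 5),
        "max_edges":    trial.suggest_int("max_edges", 2000, 5000, step=500),
        "per_node_cap": trial.suggest_int("per_node_cap", 2, 8),
    }
    # Build the augmented graph, train the GNN, and return the test RMSE
    # (train_and_evaluate is a placeholder for the project's training routine).
    return train_and_evaluate(params)
```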

Why Optuna?

Optuna uses:

  • TPE (Tree-structured Parzen Estimator): a Bayesian optimizer that models promising vs. unpromising hyperparameter regions.
  • Early Stopping (Pruning): unpromising trials are aborted early (e.g., using HyperbandPruner) to save time.

This makes Optuna efficient for exploring large, high-dimensional spaces with limited resources.
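
A minimal study setup along these lines (the sampler and pruner settings here are assumptions, not necessarily the exact configuration used):

```python
import optuna

study = optuna.create_study(
    direction="minimize",                             # lower RMSE is better
    sampler=optuna.samplers.TPESampler(seed=42),
    pruner=optuna.pruners.HyperbandPruner(),
)
study.optimize(objective, n_trials=100)
print(study.best_trial.params, study.best_value)
```

Inside the objective, intermediate validation scores can be reported with `trial.report(score, step)` and checked with `trial.should_prune()`, which is what lets Hyperband abort weak trials early.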


Results of Hyperparameter Search

We ran 100 trials per model, each with a unique combination of graph-augmentation parameters. The RMSE in the tables below is measured on the test set for that trial (lower is better).

๐Ÿ” Top 5 GCN Trials

| Trial | RMSE | top_n | neighbors | sim_thresh | min_dist | max_dist | hop_thresh | max_edges | per_node_cap |
|-------|------|-------|-----------|------------|----------|----------|------------|-----------|--------------|
| 16 | 9.58 | 1000 | 120 | 0.9888 | 197 | 1152 | 5 | 3000 | 4 |
| 6 | 9.59 | 1000 | 50 | 0.9543 | 82 | 2980 | 2 | 3000 | 6 |
| 0 | 9.62 | 500 | 70 | 0.9806 | 297 | 1783 | 4 | 2000 | 4 |
| 87 | 9.62 | 1000 | 170 | 0.9906 | 462 | 1173 | 3 | 3000 | 5 |
| 98 | 9.62 | 1000 | 30 | 0.9477 | 482 | 2689 | 3 | 2000 | 4 |

๐Ÿ” Top 5 GAT Trials

| Trial | RMSE | top_n | neighbors | sim_thresh | min_dist | max_dist | hop_thresh | max_edges | per_node_cap |
|-------|------|-------|-----------|------------|----------|----------|------------|-----------|--------------|
| 40 | 9.40 | 1000 | 40 | 0.9173 | 187 | 1870 | 3 | 3000 | 7 |
| 33 | 9.40 | 1000 | 60 | 0.9470 | 332 | 2315 | 3 | 2500 | 3 |
| 37 | 9.40 | 1000 | 180 | 0.9834 | 167 | 1583 | 4 | 4500 | 2 |
| 44 | 9.42 | 1000 | 90 | 0.9932 | 180 | 1340 | 2 | 2000 | 7 |
| 25 | 9.42 | 1000 | 130 | 0.9215 | 464 | 1592 | 5 | 5000 | 8 |

Performance Visualization

We visualized all 200 trials across both models. The plot below shows RMSE per trial. You can see that while GAT generally outperforms GCN, both models achieve stable and competitive results across various configurations.

(Figure: RMSE over trials for GCN and GAT.)


Conclusion

  • GAT outperformed GCN slightly, achieving the lowest RMSE of ~9.40.
  • Optuna helped efficiently navigate a large search space.
  • Outlier filtering ensured cleaner training data and more robust evaluation.