David's optimal parameter search - davidlabee/Graph4Air GitHub Wiki
# Hyperparameter Tuning with Optuna for Graph Neural Networks (GCN & GAT)
This document explains the hyperparameter tuning strategy using Optuna for optimizing GCN and GAT models in predicting NO₂ levels over road segments. It includes outlier handling, graph augmentation, model training, evaluation, and results.
## Step 1: Outlier Detection and Classification
Before model training, we identify and handle spatial outliers in the NO₂ measurements.
### Detection
- Each road segment is compared to the mean NO₂ value of its 1-hop neighbors in the graph.
- The residual between a segment's value and its neighbors' mean is computed.
- Outliers are segments whose residual exceeds a threshold × MAD (Median Absolute Deviation).
- The threshold differs by road type:
  - Highways (`TRAFMAJOR` > 20,000): threshold = 9.0
  - Others: threshold = 5.0
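As a rough sketch of this detection rule (plain NumPy; the function name, toy graph, and values below are illustrative assumptions, not taken from the repository):

```python
import numpy as np

def flag_spatial_outliers(values, neighbor_lists, thresholds):
    """Flag segments whose value deviates from the mean of their
    1-hop neighbors by more than threshold * MAD of the residuals."""
    values = np.asarray(values, dtype=float)
    residuals = np.array([
        values[nbrs].mean() * -1.0 + values[i] if nbrs else 0.0
        for i, nbrs in enumerate(neighbor_lists)
    ])
    # MAD = median absolute deviation of the residuals
    mad = np.median(np.abs(residuals - np.median(residuals)))
    return np.abs(residuals) > np.asarray(thresholds, dtype=float) * mad

# Toy graph: segment 3 is a spike relative to its neighbors.
no2 = [20.0, 21.0, 19.0, 80.0]
nbrs = [[1, 2], [0, 2], [0, 1], [0, 1, 2]]
flags = flag_spatial_outliers(no2, nbrs, thresholds=[5.0, 5.0, 5.0, 5.0])
# only segment 3 is flagged
```

In the actual pipeline the per-segment threshold would be 9.0 or 5.0 depending on `TRAFMAJOR`, as described above.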
### Classification
Each outlier is classified as an *error* or a *real extreme* using three heuristics:

1. **Palmes Tube Comparison**: nearby official Palmes tube NO₂ measurements (within 50 m) are compared. If the outlier is > 2× the Palmes value, it is classified as an error.
2. **Neighborhood Agreement**: if most neighbors within 200 m also have high NO₂, it is classified as a real extreme.
3. **Traffic-based Prediction**: a linear model is trained to predict `NO2d` from `TRAFNEAR`. If the prediction error is > 3× the model's RMSE, the outlier is an error; otherwise it is a real extreme.
Only non-error segments are used in training to avoid bias.
## Step 2: Graph Construction and Augmentation
A graph is created where nodes are road segments and edges represent physical connections or added similarity links:
- Base Graph: built from spatial adjacency (segments that touch).
- Augmented Edges, based on grouped land-use features:
  - The top-N nodes with the highest intensity in each group are selected.
  - KNN is applied to their normalized feature vectors.
  - Candidate pairs are added as new edges if:
    - their cosine similarity exceeds a threshold,
    - their physical distance lies between the minimum and maximum bounds,
    - the nodes are not too close in the original graph (by hop distance), and
    - the per-node and global edge-count limits are respected.
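A simplified sketch of this augmentation loop (NumPy only; the hop-distance and global-cap checks are omitted for brevity, and all names, defaults, and toy values are illustrative assumptions):

```python
import numpy as np

def propose_augmented_edges(feats, coords, intensity, top_n=3, k=2,
                            sim_thresh=0.9, min_dist=1.0, max_dist=100.0,
                            per_node_cap=2):
    """Propose similarity edges among the top-N highest-intensity nodes.
    Hop-distance and global edge-cap checks are omitted in this sketch."""
    top = np.argsort(intensity)[::-1][:top_n]          # highest intensity first
    x = feats[top] / np.linalg.norm(feats[top], axis=1, keepdims=True)
    sim = x @ x.T                                      # cosine similarity
    edges, degree = [], {}
    for a in range(len(top)):
        order = np.argsort(sim[a])[::-1]               # most similar first
        for b in order[1:k + 1]:                       # skip self at order[0]
            if b == a:
                continue
            u, v = int(top[a]), int(top[b])
            d = float(np.linalg.norm(coords[u] - coords[v]))
            if (sim[a, b] >= sim_thresh and min_dist <= d <= max_dist
                    and degree.get(u, 0) < per_node_cap
                    and degree.get(v, 0) < per_node_cap
                    and (u, v) not in edges and (v, u) not in edges):
                edges.append((u, v))
                degree[u] = degree.get(u, 0) + 1
                degree[v] = degree.get(v, 0) + 1
    return edges

# Toy data: nodes 0 and 1 have near-identical features and high intensity.
feats = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.5, 0.5]])
coords = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0], [30.0, 0.0]])
intensity = np.array([10.0, 9.0, 8.0, 1.0])
new_edges = propose_augmented_edges(feats, coords, intensity)
```

Only the pair (0, 1) passes both the similarity and distance gates in this toy example.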
## Step 3: Model Training & Evaluation
Two GNN architectures were used:
- GCN: Graph Convolutional Network with 3 layers.
- GAT: Graph Attention Network with 3 layers and 2 heads.
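For intuition, a single forward pass of a dense 3-layer GCN looks like this (a NumPy sketch of the standard Kipf–Welling propagation rule, not the project's actual implementation, which would use a GNN library):

```python
import numpy as np

def gcn_forward(adj, feats, weights):
    """Dense GCN forward pass: H_{l+1} = ReLU(A_norm @ H_l @ W_l),
    where A_norm = D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    h = feats
    for i, w in enumerate(weights):
        h = a_norm @ h @ w
        if i < len(weights) - 1:                    # linear output layer
            h = np.maximum(h, 0.0)                  # ReLU
    return h

# Three layers, as in the models above: feature dims 2 -> 8 -> 8 -> 1
rng = np.random.default_rng(0)
adj = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
x = rng.normal(size=(3, 2))
ws = [rng.normal(size=s) for s in [(2, 8), (8, 8), (8, 1)]]
out = gcn_forward(adj, x, ws)                       # one prediction per node
```

GAT replaces the fixed normalized adjacency with learned attention coefficients per edge (here, 2 attention heads per layer).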
### Optuna Optimization
Optuna is used to optimize hyperparameters such as:

- `top_n`: number of high-intensity nodes per group
- `neighbors`: number of KNN neighbors
- `sim_thresh`: similarity threshold for adding edges
- `min_dist`, `max_dist`: spatial bounds for edge creation
- `hop_thresh`: how close nodes may be in the original graph
- `max_edges`: global cap on new edges
- `per_node_cap`: edge cap per node
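As a sketch, this search space could be declared inside an Optuna objective like so (the ranges shown are illustrative assumptions inferred from the result tables, not the ones actually used in the study):

```python
def suggest_augmentation_params(trial):
    """Draw one graph-augmentation configuration from the search space.
    `trial` is an optuna.Trial; all ranges here are illustrative only."""
    return {
        "top_n": trial.suggest_categorical("top_n", [500, 1000]),
        "neighbors": trial.suggest_int("neighbors", 30, 180, step=10),
        "sim_thresh": trial.suggest_float("sim_thresh", 0.90, 0.995),
        "min_dist": trial.suggest_int("min_dist", 50, 500),
        "max_dist": trial.suggest_int("max_dist", 1000, 3000),
        "hop_thresh": trial.suggest_int("hop_thresh", 2, 5),
        "max_edges": trial.suggest_int("max_edges", 2000, 5000, step=500),
        "per_node_cap": trial.suggest_int("per_node_cap", 2, 8),
    }
```

Inside the objective, these parameters would drive the graph augmentation, the model would be trained on the augmented graph, and the test RMSE would be returned as the trial score.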
### Why Optuna?
Optuna uses:
- TPE (Tree-structured Parzen Estimator): a Bayesian optimizer that models promising vs. unpromising hyperparameter regions.
- Early Stopping (Pruning): unpromising trials are aborted early (e.g., using `HyperbandPruner`) to save time.
This makes Optuna efficient for exploring large, high-dimensional spaces with limited resources.
## Results of the Hyperparameter Search
We ran a total of 100 trials per model, each with a unique combination of graph augmentation parameters. The score in the table below represents the RMSE on the test set for that trial (lower is better).
### Top 5 GCN Trials
Trial | RMSE | top_n | neighbors | sim_thresh | min_dist | max_dist | hop_thresh | max_edges | per_node_cap |
---|---|---|---|---|---|---|---|---|---|
16 | 9.58 | 1000 | 120 | 0.9888 | 197 | 1152 | 5 | 3000 | 4 |
6 | 9.59 | 1000 | 50 | 0.9543 | 82 | 2980 | 2 | 3000 | 6 |
0 | 9.62 | 500 | 70 | 0.9806 | 297 | 1783 | 4 | 2000 | 4 |
87 | 9.62 | 1000 | 170 | 0.9906 | 462 | 1173 | 3 | 3000 | 5 |
98 | 9.62 | 1000 | 30 | 0.9477 | 482 | 2689 | 3 | 2000 | 4 |
### Top 5 GAT Trials
Trial | RMSE | top_n | neighbors | sim_thresh | min_dist | max_dist | hop_thresh | max_edges | per_node_cap |
---|---|---|---|---|---|---|---|---|---|
40 | 9.40 | 1000 | 40 | 0.9173 | 187 | 1870 | 3 | 3000 | 7 |
33 | 9.40 | 1000 | 60 | 0.9470 | 332 | 2315 | 3 | 2500 | 3 |
37 | 9.40 | 1000 | 180 | 0.9834 | 167 | 1583 | 4 | 4500 | 2 |
44 | 9.42 | 1000 | 90 | 0.9932 | 180 | 1340 | 2 | 2000 | 7 |
25 | 9.42 | 1000 | 130 | 0.9215 | 464 | 1592 | 5 | 5000 | 8 |
### Performance Visualization
We visualized all 200 trials across both models. The plot below shows RMSE per trial. You can see that while GAT generally outperforms GCN, both models achieve stable and competitive results across various configurations.
![RMSE Over Trials]
## Conclusion
- GAT slightly outperformed GCN, achieving the lowest RMSE of ~9.40.
- Optuna helped efficiently navigate a large search space.
- Outlier filtering ensured cleaner training data and more robust evaluation.