Motion Transformer-based planning node for Autoware
MTR Trajectory Prediction Node (Autoware Compatible)
This node implements a PyTorch-based trajectory predictor for the ego vehicle using the MTR model (Motion Transformer with Global Intention Localization and Local Movement Refinement). It is fully compatible with Autoware's New Planning Framework.
🎯 Project Objectives
The primary goal of this node was to evaluate the practicality of integrating the MTR model into a planning or simulation pipeline. While MTR is primarily designed for trajectory prediction, this project sought to understand how well it could function — or be adapted — for planning-related tasks.
✅ Key Objectives
- **Model Integration Testing**: Integrate the MTR model into a ROS 2-based simulation setup to evaluate its predictions in a closed-loop environment.
- **Training Data Requirements**: Investigate the amount and type of training data required to achieve generalization in simulation, including:
  - Quantity: number of scenarios and data richness.
  - Diversity: straight vs. curved roads, intersections, roundabouts, etc.
  - Structure: need for semantic inputs (e.g., route, traffic light states).
- **Closed-loop Simulation Behavior**: Analyze how MTR behaves when its outputs directly influence the ego vehicle's motion, especially under edge cases or ambiguous scenes.
- **Model Limitations & Bottlenecks**: Identify practical limitations in training and deployment:
  - Hardware constraints (e.g., batch size, GPU memory).
  - Data preprocessing complexity.
  - Latency or inference time in real-time scenarios.
- **Placeholder Trajectory Generator**: Provide a trajectory generator that can be used as a placeholder while the selector framework is developed.
- **Baseline for Future Work**: Establish a reproducible foundation for future improvements, such as:
  - Route-conditioned prediction.
  - Selective trajectory sampling.
  - Model fusion with planning heuristics or reinforcement learning.
📌 Summary
This node served as a sandbox for experimentation — a means to quantify MTR’s strengths and weaknesses when stepping beyond its original scope as a predictor. The insights gained here will guide future iterations involving better conditioning, model fusion, and more principled dataset design.
🔧 Overview
The node predicts 6 trajectory modes for the ego vehicle using historical ego and agent data, as well as lanelet2 map information. It runs at 10 Hz, producing predictions that are consumed by downstream selectors and planners within Autoware.
*(Figure: system diagram; MTR node outputs highlighted in yellow.)*
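To ground the overview, below is a minimal sketch of the node's 10 Hz outer loop using rclpy. The class name and callback structure are hypothetical, and the tracked-objects subscription is omitted because the concrete message package varies across Autoware versions; only the standard `nav_msgs/Odometry` subscription is shown.

```python
# Minimal sketch of the node's 10 Hz outer loop (hypothetical class and
# helper names; preprocessing and inference are only stubbed here).
import rclpy
from rclpy.node import Node
from nav_msgs.msg import Odometry  # type of /localization/kinematic_state


class MtrPredictorNode(Node):
    def __init__(self):
        super().__init__("mtr_predictor")
        # Tracked-object subscription omitted: the message package differs
        # across Autoware versions.
        self.create_subscription(Odometry, "/localization/kinematic_state",
                                 self.on_kinematic_state, 10)
        self.create_timer(0.1, self.on_timer)  # 10 Hz inference tick

    def on_kinematic_state(self, msg: Odometry) -> None:
        self.ego_odom = msg  # latest ego state, consumed on the next tick

    def on_timer(self) -> None:
        # Placeholder: assemble the model inputs (see the table below),
        # run MTR, and publish the 6 predicted modes for downstream selectors.
        pass


def main() -> None:
    rclpy.init()
    rclpy.spin(MtrPredictorNode())


if __name__ == "__main__":
    main()
```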
📥 Inputs
Subscribed ROS 2 Topics
- `/perception/tracked_objects`
- `/localization/kinematic_state`
- `/map/vector_map` (Lanelet2)
Model Inputs
| Key | Description | Shape |
|---|---|---|
| `obj_trajs` | Past agent trajectories (target-centric, embedded) | [1, A, 11, 29] |
| `obj_trajs_mask` | Mask for valid steps | [1, A, 11] |
| `map_polylines` | Encoded lanelet2 polyline features | [1, L, 20, 9] |
| `map_polylines_mask` | Mask for valid polyline points | [1, L, 20] |
| `map_polylines_center` | Center points of each polyline | [1, L, 3] |
| `obj_trajs_last_pos` | Final positions of each agent | [1, A, 3] |
| `intention_points` | Candidate goals for ego | [1, 64, 2] |
| `track_index_to_predict` | Index of ego in the agent list | [1] |
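To make the expected shapes concrete, here is a hedged sketch of packing pre-processed arrays into the model's batch dict. The zero-filled arrays stand in for real embedded features, and `to_batch` is a hypothetical helper, not part of the node's actual API.

```python
import numpy as np
import torch


def to_batch(arrays: dict, device: str = "cuda") -> dict:
    """Add the leading batch dimension of 1 and move every array to the GPU."""
    return {k: torch.from_numpy(v).unsqueeze(0).to(device)
            for k, v in arrays.items()}


A, L = 12, 256  # example agent / polyline counts; both vary per frame
batch = to_batch({
    "obj_trajs":              np.zeros((A, 11, 29), np.float32),
    "obj_trajs_mask":         np.ones((A, 11), bool),
    "map_polylines":          np.zeros((L, 20, 9), np.float32),
    "map_polylines_mask":     np.ones((L, 20), bool),
    "map_polylines_center":   np.zeros((L, 3), np.float32),
    "obj_trajs_last_pos":     np.zeros((A, 3), np.float32),
    "intention_points":       np.zeros((64, 2), np.float32),
    "track_index_to_predict": np.array(0, np.int64),  # 0-dim; batched -> [1]
})
```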
🧠 Prediction Logic
The node generates 6 future trajectory modes for the ego vehicle. Each trajectory consists of:
- 80 future waypoints (0.1 s intervals → 8 seconds)
- Position (x, y) and velocity (vx, vy)
- Estimated yaw (computed post-inference, as sketched below)
Mode selection is delegated to downstream nodes.
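Because the model outputs only positions and velocities, yaw must be recovered after inference. Below is a minimal sketch using finite differences between consecutive waypoints; the node's exact method may differ.

```python
import numpy as np


def estimate_yaw(xy: np.ndarray) -> np.ndarray:
    """xy: [80, 2] waypoints of one mode -> [80] yaw angles in radians."""
    diffs = np.diff(xy, axis=0)                 # [79, 2] segment vectors
    yaw = np.arctan2(diffs[:, 1], diffs[:, 0])  # heading of each segment
    return np.append(yaw, yaw[-1])              # repeat last yaw at the end
```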
🧪 Training & Validation
The model was trained using synthetic T4 data, generated by:
- Automatically assigning start/goal poses in an urban map.
- Engaging Autoware and recording 1-minute rosbags.
- Extracting 10-second training scenes from these rosbags, yielding approximately 5,000 scenes.
- Duplicating scenes to grow the dataset to roughly 60,000 scenes.
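As a rough illustration of the scene-extraction step, the sketch below slides a 10-second window over a recording sampled at 10 Hz. The stride is an illustrative assumption; the actual pipeline lives in the separate training repository.

```python
# Hedged sketch: slice 10-second scenes out of a 1-minute recording.
# frames: per-timestep records at 10 Hz, so 100 frames = 10 s per scene.
def extract_scenes(frames: list, scene_len: int = 100, stride: int = 10) -> list:
    return [frames[i:i + scene_len]
            for i in range(0, len(frames) - scene_len + 1, stride)]
```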
A separate training repository processes this data, and extensive testing confirms that this node's pre-processing produces results identical to the training pipeline.
*(Figure: validation comparing the node's pre-processing against the training pipeline.)*
🗺️ Map Handling
The node uses a Lanelet2 map, which is converted into a Polyline format compatible with the model. Each polyline consists of:
- Up to 20 points
- 9 features per point (e.g., position, direction, type)
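A hedged sketch of this conversion is shown below. The feature layout (position, local direction, with the remaining columns reserved for type flags) is an illustrative assumption and does not reproduce the node's exact feature order.

```python
import numpy as np


def encode_polyline(points: np.ndarray, max_pts: int = 20, n_feat: int = 9):
    """points: [N, 3] centerline points -> ([max_pts, n_feat], [max_pts] mask).

    Assumes at least one point; longer centerlines are split into
    separate chunks before calling this.
    """
    n = min(len(points), max_pts)
    feat = np.zeros((max_pts, n_feat), np.float32)
    mask = np.zeros(max_pts, bool)
    feat[:n, :3] = points[:n]                                   # position
    dirs = np.diff(points[:n], axis=0, append=points[n - 1:n])  # [n, 3]
    feat[:n, 3:6] = dirs                                        # local direction
    # remaining columns would carry lane-type flags etc. (omitted here)
    mask[:n] = True
    return feat, mask
```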
⚠️ Important Notes:
- Map coverage affects prediction quality.
- Including or excluding sidewalks in the polyline map can significantly alter the MTR’s output. For example, if sidewalks are not represented, the model may avoid generating paths that cross them.
*(Figure: polyline representation of the lanelet2 map.)*
⚡ Performance
- Inference frequency: 10 Hz
- GPU latency: ~30–40 ms (on RTX 3090)
- CPU-only inference may struggle to sustain real-time rates
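The GPU latency figure can be sanity-checked with standard CUDA timing: because CUDA kernels launch asynchronously, the device must be synchronized before reading the clock. Here `model` and `batch` stand in for the loaded MTR checkpoint and the input dict sketched earlier.

```python
import time

import torch


@torch.no_grad()
def time_inference(model, batch, n_iters: int = 100) -> float:
    """Return the mean per-call latency in milliseconds."""
    for _ in range(10):                  # warm-up iterations
        model(batch)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        model(batch)
    torch.cuda.synchronize()             # wait for all kernels to finish
    return (time.perf_counter() - start) / n_iters * 1000.0
```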
🧪 Simulation Tests
Although the training process showed promising quantitative results, simulation performance was suboptimal. Most of the predicted trajectory modes were unsafe, often veering outside the map boundaries. While the MTR model appeared to learn general motion patterns from the training data, its lack of high-level guidance (e.g., route planning, lane-following commands, turn intentions) severely limited its real-world applicability in closed-loop simulation.
This aligns with the original design of MTR as a trajectory prediction model, not a planner. Without an explicit route or semantic command input, the ego vehicle lacks the context necessary for reliable decision-making during autonomous navigation.
*(Video: checkpoint_40.webm, example simulation output showing trajectory drift and mode dispersion.)*
⚠️ Key Observations
- Synthetic training data limitations: Training on naively generated synthetic scenarios (e.g., randomized start/goal pairs) introduced challenges in coverage, relevance, and diversity.
- Long training cycles: Training times were significant, especially given the volume of synthetic data required to achieve generalization.
- Scenario mining was lacking: No curriculum learning or hard-negative sampling was used, which likely contributed to poor behavior in corner cases.
- Instability in PSim datasets: Results were inconsistent, particularly in complex scenes such as curves and intersections.
- Hardware bottlenecks: Training was constrained to a batch size of 6 due to GPU memory limits (RTX 3090), which likely impacted stability and convergence.
- Checkpoint quality: Despite fine-tuning, the T4-generated model checkpoints did not generalize well and failed to produce consistent, safe trajectories in simulation.
💡 Takeaway
MTR is not a plug-and-play planner. Its predictions are highly sensitive to input signals, map structure, and training diversity. In safety-critical applications like motion planning, additional modules — such as goal conditioning, semantic intention inputs, or downstream selectors — are essential to ensure safe operation.
This aligns with the initial project objectives, which sought to assess whether MTR could be effectively used in a planning context. Through integration testing, closed-loop simulation, and training analysis, it became evident that while MTR captures general motion patterns, it lacks the architectural components and robustness required for direct deployment in a planning stack.
As such, the findings strongly support further exploration of:
- Route-conditioned or intention-aware variants of MTR,
- Dataset refinement to include more structured and diverse supervision,
- Hybrid approaches that combine prediction with rule-based or optimization-based selectors.