Home - tier4/new_planning_framework GitHub Wiki
The proposal to introduce a new framework into Autoware is based on the following two reasons:
- Limitations of the current architecture
- The shift toward AV 2.0 / End-to-End autonomous driving
The current Autoware architecture is based on the Autoware Architecture Proposal introduced 5 years ago.
However, at that time no ideal architecture had yet been established, and the proposal was considered the best design achievable given the technology of the day. As a result, several trade-offs were made, and it has become apparent that the architecture imposes inherent limits on the performance of the autonomous driving system. Present-day Autoware is widely used and is even progressing towards commercialization in limited operational design domains, which is commendable. However, for services in urban environments such as taxi services, improving modules within the current Autoware architecture may not be enough to achieve complete autonomy.
Autonomous driving has been an active area of research for several decades and gained significant attention following the DARPA Grand Challenge held between 2005 and 2007. Since the DARPA Challenge, the industry has adopted an approach of dividing the autonomous driving technology stack into components such as high-definition maps, localization, perception, planning, and control. While the field of perception has come to rely heavily on machine learning, much of behavior planning and simulation still depends on rule-based methods. Improvements in planning performance have primarily been achieved by having humans design increasingly detailed rules that govern autonomous vehicle behavior. It has long been believed that, with sufficiently accurate perception, classical robotics-based planning approaches could achieve human-level driving performance. This approach is referred to as "AV 1.0" or "Autonomy 1.0".

While AV 1.0 is sufficient for simple use cases, it is widely considered to struggle with rare events (the so-called "long tail"). For example, classical robotics-based driving behaviors cannot effectively model the complexity and diversity of real-world scenarios, often requiring re-tuning for each geographic region. Additionally, dividing the system into separate components can lead to information loss, making it difficult to respond appropriately to complex situations. In fact, we are currently facing challenges in the Autoware planning architecture, including issues related to information loss caused by component segmentation.
In recent public road testing using Autoware, many overrides have been observed in complex use cases and long-tail scenarios. Trying to address these with the AV 1.0 approach tends to overly complicate the software, which in turn accelerates quality degradation.
In recent years, AV 2.0 has been gaining attention as an approach to address these challenges. While there are various definitions of AV 2.0, in this proposal we define it as follows:
- AV 1.0: An approach in which humans design the methods for improving autonomous driving systems and enhance performance through engineering.
- AV 2.0: An approach where system improvements are driven by data, and the system evolves autonomously without direct human intervention in the improvement process.
In this sense, AV 2.0 represents a shift away from classical robotics and manually engineered designs, toward systems that can scale through data and learning. End-to-End machine learning approaches are considered highly important for realizing AV 2.0. Representative open-source projects include UniAD and VAD. These types of End-to-End autonomous driving systems have already been implemented in some commercial vehicles as Level 2 ADAS (Advanced Driver Assistance Systems).
The current Autoware planning architecture employs a parallel architecture that can switch according to the scenario, with a hierarchical architecture used primarily for lane-driving scenarios. Currently, most planning functions operate within this hierarchical architecture to handle lane driving. One shortcoming of the hierarchical architecture is the discrepancy in decision-making that arises from separating behavior planning and motion planning, an issue recognized as a challenge ever since the architecture was first proposed.

In practice, we have faced the shortcomings of the hierarchical architecture, such as implementing speed planning within behavior planning to improve decision-making accuracy. Additionally, based on the concept of micro-autonomy, we have modularized the Planning function into scene modules to enable the on/off switching and expansion of each function. However, in complex scenarios, the cooperation between scene modules becomes necessary, making it difficult for modules to operate independently. As a result, the current Autoware architecture faces increased functional demands in complex scenarios, leading to higher coupling between scene modules and between behavior planning and motion planning. This makes it challenging to switch or expand each function, resulting in more complex code, increased bugs, and higher maintenance costs.
To address this issue, we propose the following two points:
- Introduction of End-to-End autonomous driving
- Development of a framework that enables collaborative operation of various approaches, including End-to-End methods and traditional robotics-based approaches
There are various definitions of End-to-End (E2E) autonomous driving, and terms such as "Monolithic E2E," "Modular E2E," and "2-model E2E" are used differently across the industry. This is largely due to the lack of a clearly defined consensus on what exactly constitutes "E2E."
In this context, we define E2E as a system where both Perception and Planning are implemented using machine learning models. Based on this definition, we categorize E2E into the following two types:
- Modular / 2-model E2E: In this approach, components such as Perception and Planning are individually implemented using machine learning, and the interfaces between modules are explicitly defined. This is conceptually similar to the traditional Autoware architecture, where each module is simply replaced by a learned model.
- Monolithic E2E: In this approach, the entire functionality—from Perception to Planning—is implemented as a single, unified machine learning model.

The Modular / 2-model E2E approach offers a good balance between performance and interpretability, allowing for more flexible system development and deployment. Key benefits include:
- Well-suited for Level 4 autonomous driving
- Easier to perform testing at the component level
- Shorter iteration cycles for fixing issues and improving specific behaviors
- Higher explainability and interpretability
- Easier integration with traditional robotics-based methods

Monolithic E2E generally offers higher performance than the modular approach, and has shown notable success in Level 2 ADAS applications. However, it also comes with the following challenges:
- Component-level testing is difficult (requires end-to-end simulation for validation)
- Low interpretability due to the model’s black-box nature
- Harder to integrate with traditional robotics-based systems
In conclusion, both approaches have their pros and cons, and the optimal choice depends on the target use case and deployment environment. Therefore, our goal is to develop and support both Modular / 2-model E2E and Monolithic E2E systems in parallel.
The importance of End-to-End autonomous driving has been discussed in the previous sections. However, End-to-End approaches are not without their limitations. It has been pointed out that they struggle to make targeted improvements in specific problematic scenarios, and may even fail to handle simple use cases that traditional AV 1.0 approaches were capable of managing.
As such, every approach has its strengths and weaknesses, and no single method is optimal in all situations. The appropriate technology depends on the deployment environment and the priorities—whether it is ensuring minimum safety guarantees or maximizing overall driving performance.
For example:
- In-factory transport: Since edge cases can be eliminated operationally, rule-based systems that operate reliably as designed are effective.
- Level 4 autonomous bus/taxi: Require an approach that ensures minimum safety while also aiming to improve overall autonomous driving performance.
- ADAS (Advanced Driver Assistance Systems): Edge cases are handled by human override, so the focus is on improving general performance at low cost.
As these examples show, the requirements vary significantly depending on the use case, and no single approach can cover them all.
Autonomous driving systems encompass a wide range of methods—including machine learning, rule-based (hierarchical or parallel architectures), sampling-based, and optimization-based approaches—each with its own advantages and limitations. Depending on the context, the priorities may shift between explainability and safety guarantees, or overall performance and generalizability, requiring different techniques.
In light of these circumstances, we aim to build a framework that enables the flexible combination and selection of different approaches, allowing autonomous driving systems to be adapted to specific needs and environments. This framework will also support the cooperative operation of AV 1.0 approaches—reliable for simple use cases—and End-to-End approaches—robust in complex scenarios—towards the realization of AV 2.0.
Autonomous driving systems employ a variety of approaches, including machine learning, rule-based (hierarchical or parallel architectures), sampling-based, and optimization-based methods. Each has its own advantages and limitations. Furthermore, the requirements for an autonomous driving system vary depending on the operational environment. For example, in factory transport scenarios, guaranteed minimum safety (i.e., explainability) is prioritized. In contrast, for urban driving, overall driving performance becomes more important. Given these differences, it is difficult for a single architecture to satisfy all scenarios effectively.
To flexibly accommodate these diverse approaches, we propose a Generator-Selector Framework. This framework consists of two core components:
- Generator: Generates candidate trajectories that the vehicle can follow.
- Selector: Selects the safest and most optimal trajectory from among the candidates.

This architecture enables unified handling of different approaches while allowing safety and performance considerations to guide the final decision.
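As a rough sketch, the two roles can be expressed as interfaces. The `Trajectory` fields and all class names below are illustrative assumptions, not Autoware's actual types:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    """A candidate trajectory: (x, y) waypoints with a target speed per point."""
    waypoints: List[Tuple[float, float]]
    speeds: List[float]  # [m/s]


class Generator(ABC):
    """Produces candidate trajectories; the internal method is unconstrained."""

    @abstractmethod
    def generate(self, scene: dict) -> List[Trajectory]:
        ...


class Selector(ABC):
    """Picks the safest and best-scoring trajectory among all candidates."""

    @abstractmethod
    def select(self, candidates: List[Trajectory]) -> Trajectory:
        ...


class StraightLineGenerator(Generator):
    """Toy rule-based Generator: drive straight ahead at constant speed."""

    def generate(self, scene: dict) -> List[Trajectory]:
        points = [(float(i), 0.0) for i in range(10)]
        return [Trajectory(waypoints=points, speeds=[5.0] * 10)]


class FirstCandidateSelector(Selector):
    """Placeholder Selector that simply returns the first candidate."""

    def select(self, candidates: List[Trajectory]) -> Trajectory:
        return candidates[0]
```

Any module that satisfies the `Generator` interface, whether rule-based, optimization-based, or learned, can then be plugged into the same selection pipeline.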
The Generator can be any module capable of generating trajectories, regardless of its internal method. For example:
- Rule-based: Existing Autoware planners can be reused as-is.
- Optimization-based: Methods such as Model Predictive Control (MPC).
- Machine learning-based: End-to-End models or diffusion-based planners.
Multiple Generators can be executed in parallel or selectively activated depending on the context.
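A minimal sketch of what running several Generators in parallel could look like; the generator functions and the plain-list candidate representation are hypothetical placeholders:

```python
from concurrent.futures import ThreadPoolExecutor


def rule_based_generator(scene):
    """Hypothetical rule-based Generator: one straight-line candidate."""
    return [[(float(i), 0.0) for i in range(5)]]


def learned_generator(scene):
    """Hypothetical learned Generator: one gently curving candidate."""
    return [[(float(i), 0.1 * i) for i in range(5)]]


def run_generators(generators, scene):
    """Run all active Generators concurrently and pool their candidates."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda g: g(scene), generators)
    candidates = []
    for result in results:
        candidates.extend(result)
    return candidates


candidates = run_generators([rule_based_generator, learned_generator], scene={})
```

Selective activation then amounts to passing a different generator list depending on the context (for example, dropping learned Generators when no HD map is available).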
The Selector is responsible for two main functions:
- Safety Gate (Safety Assurance)
- Validates the output of black-box Generators (e.g., neural networks) to ensure a minimum level of safety.
- Examples:
- A dummy implementation that simply passes through
- A check using an HD map to ensure traffic signals are obeyed
- Ranking (Trajectory Evaluation and Selection)
- Evaluates and ranks the outputs from multiple Generators, then selects the best one.
- Examples:
- Use a robotics-based approach when an HD map is available; fall back to E2E models otherwise
- Score trajectories based on driving policies such as safety, comfort, or rule compliance
These Selector functions are implemented as plugins, allowing developers to switch or extend them easily depending on project needs or development stages.
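Putting the two functions together, a plugin-style Selector might be sketched as below. The gate and scoring functions, the threshold values, and the dict-based trajectory representation are all illustrative assumptions, not actual Autoware interfaces:

```python
def passthrough_gate(traj):
    """Dummy safety gate that accepts every trajectory."""
    return True


def speed_limit_gate(traj, limit=15.0):
    """Example safety gate: reject trajectories exceeding a speed limit [m/s]."""
    return max(traj["speeds"]) <= limit


def comfort_score(traj):
    """Example ranking policy: prefer smoother speed profiles."""
    speeds = traj["speeds"]
    jerkiness = sum(abs(b - a) for a, b in zip(speeds, speeds[1:]))
    return -jerkiness


def select(candidates, gates, score):
    """Filter candidates through every safety gate, then rank the survivors."""
    safe = [t for t in candidates if all(gate(t) for gate in gates)]
    if not safe:
        return None  # no candidate passed the safety gates
    return max(safe, key=score)


candidates = [
    {"id": "smooth", "speeds": [10.0, 10.0, 10.0]},
    {"id": "jerky", "speeds": [5.0, 14.0, 6.0]},
    {"id": "fast", "speeds": [20.0, 20.0, 20.0]},  # rejected by the speed gate
]
best = select(candidates, [passthrough_gate, speed_limit_gate], comfort_score)
# "smooth" wins: it passes both gates and has the smoothest speed profile
```

Swapping in a different gate list or scoring function then corresponds to switching Selector plugins per project or development stage.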
Reference: https://github.com/orgs/autowarefoundation/discussions/5033
As outlined in the proposal, the main development items are the Collaborative Framework, Modular E2E, and Monolithic E2E. While the development of a world model-based simulator may become necessary as we move forward, we will evaluate that need as the development progresses.

- https://github.com/tier4/new_planning_framework/wiki/Motion-Transformer%E2%80%90based-planning-node-for-Autoware
- https://github.com/tier4/new_planning_framework/wiki/Selector
We plan to further improve the performance and create a Pull Request to Autoware Universe by 9E.
https://github.com/autowarefoundation/autoware/issues/6292
For the Modular / 2-model E2E approach, we are testing a method called the Diffusion Planner. The ROS node implementation has been completed, and integration with the Autoware interface has also been finalized. In addition, data was collected in Japan and used to retrain the model.
We have also confirmed that the Autoware tutorial runs successfully. For more details, please refer to this page.
We plan to further improve the performance and create a Pull Request to Autoware Universe by 9E.
https://github.com/autowarefoundation/autoware/issues/6292
We are currently experimenting with two approaches: VAD and Diffusion Drive. After retraining both methods, we have successfully confirmed that their performance can be reproduced. At the moment, we are in the process of converting VAD and Diffusion Drive into ROS nodes and integrating them with Autoware. Next, we plan to train the models using the CARLA dataset and develop a tutorial that runs within CARLA. Once that is complete, we will create a Pull Request to Autoware Universe.