A* Car Go Far

What is A*

A* (pronounced "A star") is an algorithm that searches for the shortest path between two states. In this project, a state comprises the entire track environment plus the car's supplementary information: sensor distances, reward history, and crash detection. The goal state is the car (simulated as a point on the track) reaching the coordinates of the red blocks on the race track; no other environment variables matter for the goal test.
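As a concrete illustration, the information in a state and the goal test might be sketched as follows (a hypothetical representation; the actual environment stores this data in its own classes):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class CarState:
    """Hypothetical snapshot of everything A* needs about the car."""
    position: Tuple[float, float]   # (x, y) coordinates on the track
    angle: float                    # heading in degrees
    speed: float                    # current velocity
    sensors: Tuple[float, ...]      # distances reported by each sensor

def is_goal(state: CarState, goal: Tuple[float, float], tol: float = 5.0) -> bool:
    """Goal test: the car (a point) has reached the red blocks' coordinates."""
    (x, y), (gx, gy) = state.position, goal
    return abs(x - gx) <= tol and abs(y - gy) <= tol
```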

A* differentiates itself from other path-finding and graph-traversal algorithms by adding a heuristic function on top of the path cost. The algorithm takes into account the number of states traversed to get from one state to another as well as a user-defined heuristic that more accurately captures the goal of the simulation. For example, the heuristic could be the Manhattan distance between two points on the race track, or the standard deviation of the sensor distances reported by the car object.
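In symbols, A* orders states by f(n) = g(n) + h(n), where g(n) is the path cost accumulated so far and h(n) is the heuristic estimate of the remaining cost. A Manhattan-distance heuristic, for example, might be written as (a minimal sketch, treating positions as plain (x, y) tuples):

```python
def manhattan_distance(pos, goal):
    """h(n): Manhattan distance between the car's position and the goal."""
    return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

def total_cost(path_cost, pos, goal):
    """f(n) = g(n) + h(n): the value the priority queue is ordered by."""
    return path_cost + manhattan_distance(pos, goal)
```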

The algorithm loops over a priority queue of environment states, ordered by minimum total cost (path cost + heuristic). At each iteration it pops the state at the front of the queue, generates that state's children, and pushes them onto the queue, reordering as necessary. The search terminates either when a goal state is popped, at which point the optimal path from start to goal has been found, or when the queue empties, meaning no path exists. Using A*, we hope to automate optimal path finding on randomly generated tracks, using the list of states along the resulting path as training data for behavioral cloning or as a reference standard for evaluating our AI.
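The loop just described can be sketched generically as follows; the expand, heuristic, and is_goal callables are assumptions standing in for the project's actual interfaces:

```python
import heapq
import itertools

def a_star(start, is_goal, expand, heuristic):
    """Generic A*: returns the list of actions from start to a goal state."""
    counter = itertools.count()  # tie-breaker so heapq never compares states
    frontier = [(heuristic(start), next(counter), 0.0, start, [])]
    visited = set()
    while frontier:                      # empty queue => no path exists
        _, _, g, state, actions = heapq.heappop(frontier)
        if is_goal(state):
            return actions               # first goal popped is optimal for a consistent h
        if state in visited:
            continue
        visited.add(state)
        for action, child, step_cost in expand(state):
            g_child = g + step_cost
            f_child = g_child + heuristic(child)
            heapq.heappush(frontier,
                           (f_child, next(counter), g_child, child, actions + [action]))
    return None
```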

What we did

To accommodate A*, the simulation was extended with get_state and set_state functions, allowing the algorithm to jump to arbitrary points in time in the simulation. All of the state-space information the algorithm needs, such as the speed, position, and angle of the car as well as the goal state, was pulled from the existing interfaces.
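In outline, the two hooks let the search snapshot the simulation, try an action, and rewind, along the lines of the following sketch (the exact signatures in the repository may differ):

```python
def expand(env, state, actions):
    """Generate (action, child_state) pairs by replaying each action from `state`."""
    children = []
    for action in actions:
        env.set_state(state)   # jump the simulation back to this point in time
        env.step(action)       # advance one timestep under the chosen action
        children.append((action, env.get_state()))
    return children
```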

The straight-line distance from the car to the finish line of the track was used as the heuristic at each timestep. Nodes were initially expanded by applying each of the 9 possible actions to the car: all combinations of car.ACCEL, car.REST, car.DECEL for forward/backward movement and car.ACCEL_LEFT, car.REST, car.ACCEL_RIGHT for steering. It quickly became clear, however, that this action space was too large for the search to run in a reasonable amount of time, because each action moves the car by only a few pixels. Moving the car just a couple of blocks forward on the screen would require the algorithm to search through thousands of nodes.
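The nine-action space is just the Cartesian product of the two control axes; assuming the constants above live on a car object, it could be enumerated like this:

```python
from itertools import product

def primitive_actions(car):
    """All 9 combinations of (forward/backward, left/right) controls."""
    linear = [car.ACCEL, car.REST, car.DECEL]
    steering = [car.ACCEL_LEFT, car.REST, car.ACCEL_RIGHT]
    return list(product(linear, steering))  # 3 x 3 = 9 actions
```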

To alleviate this problem, the car's movements were abstracted into three larger units (forward, left, and right), each of which moves the car in appropriate increments in the desired direction. The resulting search tree was significantly quicker to search and yielded a final route that was close to optimal.
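One way to realize such macro-actions is to repeat a primitive action for a fixed number of timesteps; a sketch under that assumption (REPEAT and its value are illustrative, not the repository's actual code):

```python
REPEAT = 10  # primitive steps per macro-action; tune to the track's block size

def macro_step(env, state, primitive_action):
    """Apply one abstract action (forward/left/right) as REPEAT primitive steps."""
    env.set_state(state)
    for _ in range(REPEAT):
        env.step(primitive_action)
    return env.get_state()
```

Repeating each primitive action cuts the depth of the search tree by roughly a factor of REPEAT, at the cost of a coarser (slightly less optimal) final path.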

All of the code can be accessed here.

Results

The entire search took about thirty minutes to complete. The following shows A* searching for the optimal branch of the search tree midway through the search.

(Animation: A* branching through the search tree)

This is the final path that was found.

(Image: the final path found by A*)

Extensions

The following are some potential areas of exploration which may be of interest to the reader and could be implemented using the simulation and current A* implementation.

  • Adjustable heuristic dependent on the track environment: extend the current environment so the heuristic can incorporate sensor data read from the car.

  • Greedy Search: an algorithm that uses only the heuristic, ignoring accumulated path cost, to decide which state to expand next (see the sketch after this list).

  • Weighted A*: alter the heuristic so that it is weighted by a constant factor greater than 1 (the so-called detour index). In theory this speeds up the search at the cost of a slightly sub-optimal path (see the sketch after this list).

  • Behavioral Cloning: train an AI model to mimic human subcognitive skills by recording a subject performing a certain task. Instead of a real human subject, we can record optimal paths for several random tracks using A* and train a computer program on the resulting dataset; this can be thought of as a form of supervised learning.
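As a pointer for the Greedy Search and Weighted A* items above, both variants change only the priority computation relative to plain A*; a sketch:

```python
def priority(path_cost, h_value, weight=1.0, greedy=False):
    """Priority for the frontier queue, covering three variants:
    - weight == 1.0         : plain A*           f(n) = g(n) + h(n)
    - weight > 1.0 (detour) : Weighted A*        f(n) = g(n) + w * h(n)
    - greedy == True        : greedy best-first  f(n) = h(n)
    Weighted A* returns a path at most `weight` times the optimal cost;
    greedy best-first search carries no optimality guarantee at all.
    """
    if greedy:
        return h_value
    return path_cost + weight * h_value
```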