ForagerRL - gama-platform/gama GitHub Wiki
By Killian Trouillet
Welcome to the comprehensive tutorial on Reinforcement Learning with the GAMA platform. You will build a forager agent that learns to navigate toward food while avoiding obstacles – from a simple grid world to a continuous environment trained with Deep RL.
Build a tabular Q-Learning agent entirely in GAML, step by step:
- Step 1: The Grid World – Create the 10×10 environment with food and obstacles.
- Step 2: The Forager Agent – Define a simple agent that moves randomly.
- Step 3: Rewards and Episodes – Implement the reward system and simulation resets.
- Step 4: The Q-Table – Set up the agent's memory using `map<string, float>`.
- Step 5: Q-Learning Algorithm – Implement the Bellman equation and ε-greedy policy.
- Step 6: Visualization & Automatic Test – Add charts, heatmaps, and evaluate the learned policy.
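The tutorial implements the Q-table in GAML as a `map<string, float>` keyed by state–action strings. As a language-neutral reference, here is a hedged Python sketch of the two core pieces of Steps 4–5: the tabular Bellman update and the ε-greedy policy. The key format, action names, and hyperparameter values are illustrative assumptions, not the tutorial's exact code.

```python
import random
from collections import defaultdict

ACTIONS = ["north", "south", "east", "west"]   # illustrative action set
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1          # illustrative hyperparameters

# Q-table keyed by "state::action" strings, mirroring a GAML map<string, float>
q_table = defaultdict(float)

def choose_action(state: str) -> str:
    """ε-greedy: explore with probability ε, otherwise exploit the best known action."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[f"{state}::{a}"])

def q_update(state: str, action: str, reward: float, next_state: str) -> None:
    """Tabular Q-learning (Bellman) update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    key = f"{state}::{action}"
    best_next = max(q_table[f"{next_state}::{a}"] for a in ACTIONS)
    q_table[key] += ALPHA * (reward + GAMMA * best_next - q_table[key])
```

Translating this back to GAML is mostly mechanical: the `defaultdict` becomes a `map<string, float>` with an explicit default, and the update runs in a reflex after each move.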

In this part, we move from the grid world to a continuous environment and train a neural network using PPO via the gama-gymnasium Python bridge.
- Step 7: Introduction & The Continuous World – Why Deep RL? Architecture overview. Continuous world setup.
- Step 8: The GymAgent Bridge – The bridge species, spaces, and GAMA–Python communication.
- Step 9: Sensors, Movement & Rewards – Ray-cast sensors, velocity actions, reward shaping. Complete GAML model.
- Step 10: Headless Training with PPO – Python script, PPO explained, training process.
- Step 11: Testing in GAMA GUI – Load and visualize the trained policy. Summary.
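From the Python side, gama-gymnasium presents the GAMA simulation through the standard Gymnasium `reset`/`step` contract, which is all the PPO training script needs. The toy stand-in below sketches that contract with a continuous velocity action and a shaped reward; the arena size, observation, and reward function are illustrative assumptions, not the tutorial's exact model (which uses ray-cast sensors).

```python
import math
import random

class ForagerEnvSketch:
    """Minimal stand-in following the Gymnasium step/reset contract
    that gama-gymnasium exposes for a GAMA simulation."""

    def __init__(self, arena: float = 10.0):
        self.arena = arena
        self.agent = [0.0, 0.0]
        self.food = [8.0, 8.0]
        self.steps = 0

    def _obs(self):
        # Toy observation: vector toward the food (the real model observes ray-casts)
        return [self.food[0] - self.agent[0], self.food[1] - self.agent[1]]

    def reset(self, seed=None):
        if seed is not None:
            random.seed(seed)
        self.agent = [random.uniform(0, self.arena) for _ in range(2)]
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        # Continuous velocity action (vx, vy), position clipped to the arena
        self.agent[0] = min(max(self.agent[0] + action[0], 0.0), self.arena)
        self.agent[1] = min(max(self.agent[1] + action[1], 0.0), self.arena)
        self.steps += 1
        dist = math.dist(self.agent, self.food)
        terminated = dist < 0.5          # reached the food
        truncated = self.steps >= 200    # episode time limit
        reward = 10.0 if terminated else -0.01 * dist  # shaped toward the food
        return self._obs(), reward, terminated, truncated, {}
```

Because a PPO implementation such as Stable-Baselines3 only interacts with the environment through these two methods, swapping this sketch for the bridged GAMA environment changes nothing in the training loop.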
In this part, we extend the continuous world to multiple foragers that must cooperate: using gama-pettingzoo and independent PPO models, both agents must reach the food together.
- Step 12: From Single Agent to Multi-Agent – PettingZoo Parallel API, PetzAgent bridge, cooperative reward design.
- Step 13: The Multi-Agent GAML Model – Multi-forager species, observation sharing, reward logic.
- Step 14: Training Multiple Agents – Independent PPO, PetzSingleAgentEnv Gymnasium wrapper, alternating training rounds.
- Step 15: Testing & Tutorial Summary – GUI testing, success criteria, recap of all 3 parts.
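The practical difference from Part 2 is the PettingZoo Parallel API: every call takes and returns per-agent dictionaries instead of single values. The toy sketch below shows that shape with a shared cooperative reward (both foragers must reach the food); the agent names, one-dimensional state, and reward values are illustrative assumptions, not the tutorial's exact model.

```python
class CoopForagersSketch:
    """Toy two-agent environment in the shape of the PettingZoo Parallel API:
    reset/step exchange dicts keyed by agent name."""

    def __init__(self):
        self.agents = ["forager_0", "forager_1"]
        self.positions = {}

    def reset(self, seed=None):
        self.positions = {a: 0.0 for a in self.agents}
        observations = {a: self.positions[a] for a in self.agents}
        infos = {a: {} for a in self.agents}
        return observations, infos

    def step(self, actions):
        for agent, move in actions.items():
            self.positions[agent] += move
        # Cooperative success: BOTH agents must reach the food (position >= 5)
        done = all(p >= 5.0 for p in self.positions.values())
        rewards = {a: (10.0 if done else -0.1) for a in self.agents}  # shared reward
        observations = {a: self.positions[a] for a in self.agents}
        terminations = {a: done for a in self.agents}
        truncations = {a: False for a in self.agents}
        infos = {a: {} for a in self.agents}
        return observations, rewards, terminations, truncations, infos
```

The shared reward is what makes the task cooperative: neither agent is paid until both succeed, so each independent PPO model must learn a policy that works alongside the other's.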