# ForagerRL_step15
By Killian Trouillet
After training, we load the shared PPO model and run both foragers in the GAMA GUI.
Open GAMA normally. The GUI server runs on port 1000 by default.
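For orientation, the test script builds its environment on top of this port using the `GamaParallelEnv` class from Step 14. Below is a minimal sketch only: the import path and the constructor arguments are assumptions for illustration, so check `test_forager_petz.py` for the exact call.

```python
# Hypothetical sketch: import path and keyword arguments are assumptions,
# not the confirmed gama-pettingzoo API; see test_forager_petz.py.
from gama_pettingzoo import GamaParallelEnv

env = GamaParallelEnv(
    gaml_file="forager_petz.gaml",  # the Step-13 model with the PetzAgent bridge
    experiment="forager_exp",       # hypothetical experiment name
    port=1000,                      # default GAMA server port (see above)
)
```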
```python
from train_forager_petz import PPOAgent

agent = PPOAgent(state_dim=15, action_dim=2)
agent.load("saved_models/ppo_forager.pth")
```

One model, shared by both foragers, just like in training.
```python
import time

import numpy as np

obs, _ = env.reset()
done = False
while not done:
    actions = {}
    for agent_id in env.agents:
        # test=True -> deterministic evaluation: use the mean action
        action, _, _ = agent.select_action(
            np.array(obs[agent_id], dtype=np.float32),
            test=True,
        )
        actions[agent_id] = action
    obs, rewards, terminations, truncations, _ = env.step(actions)
    done = not env.agents or all(terminations.values()) or all(truncations.values())
    time.sleep(0.1)  # slow down so the GUI animation stays watchable
```

In the GAMA GUI:
- Blue `forager_0` navigates with its LIDAR cone around the left obstacles.
- Teal `forager_1` follows a slightly different path due to its starting position.
- The first to arrive at the green food turns orange and freezes.
- When both are orange, the episode ends: cooperative success.
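When the loop above exits, you can tell cooperative success apart from a time-out. A small sketch, reusing the variable names from the loop:

```python
# After the while loop: terminations/truncations hold the last step's flags.
if terminations and all(terminations.values()):
    print("Cooperative success: both foragers reached the food.")
else:
    print("Episode truncated (step limit) before both foragers arrived.")
```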
```sh
cd models/petz
python test_forager_petz.py
```

```
=======================================================
Smart Forager — MARL Test (gama-pettingzoo GUI)
=======================================================
Model loaded (shared by both foragers)
Running 1 cooperative test episodes...
Episode 1/1: ✓ COOPERATIVE SUCCESS! | Steps: 61
  forager_0: reward = 89.4
  forager_1: reward = 87.1
=======================================================
Test Results Summary
=======================================================
Episodes    : 1
Success Rate: 100%
Avg Steps   : 61
forager_0 avg reward: 89.4
forager_1 avg reward: 87.1
=======================================================
```
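The summary block is plain bookkeeping around the evaluation loop. A minimal sketch of how such stats can be aggregated; the `summarize` helper and the episode-record shape are ours, not the script's:

```python
from collections import defaultdict

def summarize(episodes):
    """episodes: list of records like
    {"success": True, "steps": 61,
     "rewards": {"forager_0": 89.4, "forager_1": 87.1}},
    where success means both foragers terminated (reached the food)
    before the step limit truncated the episode."""
    n = len(episodes)
    totals = defaultdict(float)
    for e in episodes:
        for agent_id, r in e["rewards"].items():
            totals[agent_id] += r
    print(f"Episodes    : {n}")
    print(f"Success Rate: {100 * sum(e['success'] for e in episodes) / n:.0f}%")
    print(f"Avg Steps   : {sum(e['steps'] for e in episodes) / n:.0f}")
    for agent_id in sorted(totals):
        print(f"{agent_id} avg reward: {totals[agent_id] / n:.1f}")
```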
**Part 1**

| Step | Concept introduced |
|---|---|
| 1 | Grid world with `grid` species |
| 2 | Forager agent with random movement |
| 3 | Reward function and episodes |
| 4 | Q-Table as `map<string, float>` |
| 5 | Q-Learning / Bellman equation |
| 6 | Charts, heatmap, test mode |
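As a quick refresher on the Part 1 core idea, here is the tabular Bellman update from Steps 4 and 5 sketched in Python, with a plain dict standing in for GAML's `map<string, float>` (the key encoding and the constants are illustrative):

```python
# Tabular Q-Learning: a dict plays the role of GAML's map<string, float>.
Q = {}
alpha, gamma = 0.1, 0.9  # learning rate and discount factor (illustrative)

def q_update(state, action, reward, next_state, actions=("N", "S", "E", "W")):
    key = f"{state}|{action}"  # e.g. "3,4|N" (illustrative encoding)
    best_next = max(Q.get(f"{next_state}|{a}", 0.0) for a in actions)
    old = Q.get(key, 0.0)
    # Bellman update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
    Q[key] = old + alpha * (reward + gamma * best_next - old)
```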
**Part 2**

| Step | Concept introduced |
|---|---|
| 7 | Continuous world, architecture overview |
| 8 | `GymAgent` bridge species |
| 9 | LIDAR ray-cast sensors, movement, reward shaping |
| 10 | Headless training with custom PyTorch PPO |
| 11 | GUI testing, deterministic evaluation |
**Part 3**

| Step | Concept introduced |
|---|---|
| 12 | PettingZoo Parallel API, `PetzAgent` bridge, cooperative rewards |
| 13 | Multi-agent GAML model, `as_map`, team obs, episode-end signal |
| 14 | Parameter-Shared PPO, batch inference, `GamaParallelEnv` directly |
| 15 | GUI testing, series recap |
| Concept | Part 1 | Part 2 | Part 3 |
|---|---|---|---|
| World | 10×10 grid | 100×100 continuous | Same |
| Agents | 1 | 1 | 2 |
| Actions | 4 discrete | 2D continuous `[dx, dy]` | Same |
| Sensors | Grid position | 8 LIDAR rays | 8 LIDAR + teammate pos |
| Algorithm | Q-Learning | PPO | Parameter-Shared PPO |
| Bridge | None | `GymAgent` | `PetzAgent` |
| Library | None | gama-gymnasium | gama-pettingzoo |
| RL Framework | None | PyTorch (custom PPO) | Same |
| Task | Solo food | Solo food | Cooperative food |
Key identifiers introduced in Part 3:

- GAML side: `PetzAgent`, `agents`, `possible_agents`, `observations`, `rewards`, `terminations`, `truncations`, `actions`, `update_data`, `as_map`, `all_match`, `contains_key`, episode end via `agents <- []`
- Python side: `GamaParallelEnv`, `env.agents`, `env.observation_space(agent_id)`, `env.action_space(agent_id)`, `env.reset()` → dict, `env.step(actions_dict)` → dict, parameter sharing, batch inference, `select_actions_batch()`, per-agent `RolloutBuffer`
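To make "parameter sharing" and "batch inference" concrete, here is a standalone sketch with a toy policy network (the real `PPOAgent` from Step 14 has more structure): every forager's observation is stacked into one tensor and pushed through the single shared network in one forward pass.

```python
import numpy as np
import torch
import torch.nn as nn

# One network serves every forager: this is parameter sharing.
# (Toy policy for illustration; 15 state dims and 2 action dims as in this series.)
shared_policy = nn.Sequential(nn.Linear(15, 64), nn.Tanh(), nn.Linear(64, 2))

def select_actions_batch(obs_dict):
    """Batch inference: stack all agents' observations and run a single
    forward pass through the shared network instead of one pass per agent."""
    ids = list(obs_dict)
    batch = torch.as_tensor(np.stack([obs_dict[i] for i in ids]),
                            dtype=torch.float32)
    with torch.no_grad():
        means = shared_policy(batch)  # mean actions, shape (n_agents, 2)
    return {agent_id: means[k].numpy() for k, agent_id in enumerate(ids)}
```

During training, each agent still keeps its own `RolloutBuffer`, but every transition updates the same shared weights.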
| File | Description |
|---|---|
| `models/petz/forager_petz.gaml` | GAMA model with `PetzAgent` bridge |
| `models/petz/train_forager_petz.py` | MARL training script (headless) |
| `models/petz/test_forager_petz.py` | Testing script (GUI visualization) |