ForagerRL_step12 - gama-platform/gama GitHub Wiki
By Killian Trouillet
In Part 2 we trained a single forager to navigate toward food in a continuous world. In this Part 3, we extend that to two foragers that must cooperate: the episode is only a success when both foragers have reached the food.
| Aspect | Part 2 (Gymnasium) | Part 3 (PettingZoo) |
|---|---|---|
| Library | `gama-gymnasium` | `gama-pettingzoo` |
| Bridge species | `GymAgent` | `PetzAgent` |
| Agents | 1 forager | 2 foragers |
| Python env | `gym.make(...)` | `GamaParallelEnv(...)` |
| Step input | Single action | Dict `{agent_id: action, ...}` |
| Step output | Single obs / reward | Dict of obs / rewards per agent |
| Task | Reach food | Both foragers reach food |
| Training | 1 PPO | 1 independent PPO per agent |
`gama-pettingzoo` implements PettingZoo's Parallel API, in which all agents act simultaneously at each step, as opposed to the sequential AEC API, where agents act one at a time.
```
Python                                       GAMA
  │                                           │
  │ env.reset()                               │
  │ ────────────────────────────────────►     │ Spawns both foragers
  │ ◄── {"forager_0": obs, "forager_1": obs}  │
  │                                           │
  │ env.step({                                │
  │   "forager_0": action_0,                  │
  │   "forager_1": action_1                   │
  │ })                                        │
  │ ────────────────────────────────────►     │ Both foragers move simultaneously
  │                                           │ Observations, rewards computed
  │ ◄── obs, rewards, terminations, truncs    │ (one dict per output)
  │                                           │
  │ (repeat until env.agents is empty)        │
```
End-of-episode signal: when all agents are done (terminated or truncated), GAMA sets `PetzAgent[0].agents <- []`, which makes `env.agents` empty in Python. This is the PettingZoo convention for episode end.
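The interaction loop above can be sketched with a stand-in environment that follows the same Parallel API conventions (dict-keyed observations and rewards, `env.agents` emptied at episode end). `ToyParallelEnv` below is a hypothetical mock for illustration, not the real `GamaParallelEnv`:

```python
# Minimal mock of a PettingZoo-style Parallel environment.
# ToyParallelEnv is a hypothetical stand-in, NOT the real GamaParallelEnv.

class ToyParallelEnv:
    def __init__(self, max_steps=3):
        self.possible_agents = ["forager_0", "forager_1"]
        self.agents = []
        self.max_steps = max_steps
        self._step = 0

    def reset(self):
        # Reactivate all agents and return one observation dict per agent
        self.agents = list(self.possible_agents)
        self._step = 0
        obs = {a: [0.0] * 15 for a in self.agents}
        infos = {a: {} for a in self.agents}
        return obs, infos

    def step(self, actions):
        # All agents act simultaneously; every output is a dict keyed by agent ID
        self._step += 1
        obs = {a: [0.0] * 15 for a in self.agents}
        rewards = {a: -0.01 for a in self.agents}
        done = self._step >= self.max_steps
        terminations = {a: False for a in self.agents}
        truncations = {a: done for a in self.agents}
        infos = {a: {} for a in self.agents}
        if done:
            self.agents = []   # PettingZoo convention: empty list signals episode end
        return obs, rewards, terminations, truncations, infos

env = ToyParallelEnv()
obs, infos = env.reset()
steps = 0
while env.agents:                          # loop until the agent list empties
    actions = {a: [0.0, 0.0] for a in env.agents}
    obs, rewards, terms, truncs, infos = env.step(actions)
    steps += 1
print(steps)  # 3
```

The same `while env.agents:` pattern drives the real training loop, with a policy producing each agent's action instead of the zero placeholder.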
The bridge species in gama-pettingzoo is called PetzAgent. All per-agent data flows through dictionaries keyed by agent ID:
```
species PetzAgent {
    list<string> agents;                       // active agents this step
    list<string> possible_agents;              // all agents ever
    map<string, unknown> observation_spaces;   // one entry per agent
    map<string, unknown> action_spaces;        // one entry per agent
    map<string, unknown> observations;         // populated by GAML
    map<string, float> rewards;
    map<string, bool> terminations;
    map<string, bool> truncations;
    map<string, unknown> infos;
    map<string, unknown> actions;              // populated by Python
    map<string, unknown> data;                 // assembled map

    // Called by gama-pettingzoo: receives JSON '{"forager_0":[dx,dy],...}'
    action set_actions(string json_str) {
        actions <- map(from_json(json_str));
        return "ok";
    }

    action update_data {
        data <- [
            "Observations"::observations,
            "Rewards"::rewards,
            "Terminations"::terminations,
            "Truncations"::truncations,
            "Infos"::infos
        ];
    }
}
```
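On the wire, the per-agent actions arrive as one JSON string, as the comment on `set_actions` shows. A sketch of the Python-side encoding using the standard `json` module (the exact payload layout is an assumption based on that comment):

```python
import json

# Hypothetical per-agent continuous actions, keyed by agent ID
actions = {"forager_0": [0.5, -0.25], "forager_1": [-1.0, 1.0]}

# Serialize to the '{"forager_0":[dx,dy],...}' shape expected by set_actions
payload = json.dumps(actions)

# GAML's from_json parses this back into a map on the GAMA side;
# round-tripping in Python shows the structure is preserved
decoded = json.loads(payload)
assert decoded == actions
print(payload)
```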
| Field | GymAgent | PetzAgent |
|---|---|---|
| Observation | `state` (unknown) | `observations` (map) |
| Reward | `reward` (float) | `rewards` (`map<string, float>`) |
| Termination | `terminated` (bool) | `terminations` (`map<string, bool>`) |
| Action | `next_action` | `actions` (map) |
| End signal | `terminated = true` | Set `agents <- []` |
| Update | `update_data` sends `data` | Same, but dict structure differs |
| Event | Reward | Rationale |
|---|---|---|
| Both at food | +100 each | Large cooperative bonus |
| This forager at food (waiting) | +0.5 | Reached the goal; avoids a negative gradient while waiting |
| Getting closer to food | +delta × 2.0 | Distance shaping (same as Part 2) |
| Each step alive | −0.01 | Encourages speed |
| Wall contact | −1.0 | Penalty when the agent bumps into or slides along a wall or obstacle |
| Timeout | −1.0 | Mild penalty |
The +0.5 waiting reward is critical: without it, an agent that has already reached the food would receive 0 every step, creating a gradient that could discourage reaching the food early.
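The reward table can be condensed into a small pure-Python function for intuition. This is a sketch of the same logic the GAML `compute_reward` action implements; the function name and argument names are illustrative:

```python
def forager_reward(dist, prev_dist, at_food, all_at_food, bumped, timed_out):
    """Mirror of the reward table: cooperative bonus, waiting bonus,
    distance shaping, per-step cost, wall penalty, timeout penalty."""
    if all_at_food:
        return 100.0              # both foragers reached the food
    if at_food:
        return 0.5                # waiting bonus: avoids a zero/negative gradient
    if timed_out:
        return -1.0               # mild timeout penalty
    delta = prev_dist - dist      # > 0 when moving toward the food
    reward = delta * 2.0 - 0.01   # distance shaping + step-alive cost
    if bumped:
        reward -= 1.0             # wall/obstacle contact penalty
    return reward

print(forager_reward(10.0, 12.0, False, False, False, False))  # 3.99
```

Note the order of the branches: the cooperative bonus dominates everything else, and the waiting bonus shields an agent that arrived early from the shaping and step-cost terms.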
```
┌────────────────────────────────────────┐     ┌──────────────────────────────┐
│ GAMA (Headless)                        │     │ Python                       │
│                                        │     │                              │
│ forager_petz.gaml                      │◄──► │ train_forager_petz.py        │
│  ├─ PetzAgent bridge                   │     │  ├─ GamaParallelEnv          │
│  ├─ forager_0 (blue, LIDAR sensors)    │     │  ├─ PPO model (forager_0)    │
│  ├─ forager_1 (teal, LIDAR sensors)    │     │  ├─ PPO model (forager_1)    │
│  ├─ Same continuous world as Part 2    │     │  └─ Episode-based training   │
│  └─ Cooperative reward logic           │     │                              │
│                                        │     │ test_forager_petz.py         │
│ WebSocket (port 1001)                  │     │  └─ Deterministic evaluation │
└────────────────────────────────────────┘     └──────────────────────────────┘
```
Requirements: `pip install gama-pettingzoo torch`, plus GAMA 2024.09+ with headless server support.
| File | Description |
|---|---|
| `models/petz/forager_petz.gaml` | GAMA model with PetzAgent bridge |
| `models/petz/train_forager_petz.py` | MARL training script (headless) |
| `models/petz/test_forager_petz.py` | Testing script (GUI visualization) |
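The training script keeps one independent learner per agent: each PPO model only ever sees its own observations, actions, and rewards. A library-free sketch of that bookkeeping (`DummyPolicy` is a hypothetical stand-in for a PPO model, not part of any library here):

```python
class DummyPolicy:
    """Hypothetical stand-in for a per-agent PPO model."""
    def __init__(self):
        self.buffer = []          # (obs, action, reward) transitions

    def act(self, obs):
        return [0.0, 0.0]         # placeholder continuous action in [-1, 1]^2

    def store(self, obs, action, reward):
        self.buffer.append((obs, action, reward))

agent_ids = ["forager_0", "forager_1"]
policies = {a: DummyPolicy() for a in agent_ids}   # one independent learner per agent

# One step of the episode loop: each policy acts on, and stores,
# only its own per-agent slice of the dicts returned by the env
obs = {a: [0.0] * 15 for a in agent_ids}
actions = {a: policies[a].act(obs[a]) for a in agent_ids}
rewards = {"forager_0": 0.5, "forager_1": -0.01}   # illustrative values
for a in agent_ids:
    policies[a].store(obs[a], actions[a], rewards[a])

print([len(policies[a].buffer) for a in agent_ids])  # [1, 1]
```

This "independent learners" design is the simplest MARL baseline: no parameter sharing and no centralized critic, just two single-agent PPO runs coupled only through the shared environment.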
forager_petz.gaml — the full cooperative multi-agent GAMA model:
```
/**
 * Name: SmartForagerPetz - Cooperative Multi-Agent Forager for PettingZoo
 * Author: Killian Trouillet
 * Description: Two cooperative foragers that must ALL reach the food.
 *              Uses the gama-pettingzoo PetzAgent bridge.
 * Tags: reinforcement-learning, pettingzoo, marl, cooperative, tutorial
 */
model SmartForagerPetz

global {
    float world_size <- 100.0;
    point food_location <- {95.0, 95.0};
    float food_radius <- 5.0;
    list<geometry> obstacles <- [];
    int max_steps <- 300;
    int current_step <- 0;
    int gama_server_port <- 0;
    list<string> agent_ids <- ["forager_0", "forager_1"];
    list<bool> agents_at_food <- [false, false];

    init {
        obstacles << square(10) at_location {25.0, 25.0};
        obstacles << square(10) at_location {35.0, 25.0};
        obstacles << square(10) at_location {25.0, 35.0};
        obstacles << square(10) at_location {65.0, 45.0};
        obstacles << square(10) at_location {75.0, 45.0};
        obstacles << square(10) at_location {75.0, 55.0};

        create PetzAgent {
            agents <- copy(agent_ids);
            possible_agents <- copy(agent_ids);
            action_spaces <- agent_ids as_map (each::[
                "type"::"Box", "low"::[-1.0, -1.0], "high"::[1.0, 1.0], "dtype"::"float"
            ]);
            observation_spaces <- agent_ids as_map (each::[
                "type"::"Box", "low"::list_with(15, 0.0), "high"::list_with(15, 1.0), "dtype"::"float"
            ]);
            loop id over: agent_ids { infos << id::[]; }
        }

        create forager with: (agent_id::"forager_0", agent_index::0, location::{15.0, 5.0});
        create forager with: (agent_id::"forager_1", agent_index::1, location::{5.0, 15.0});
        ask forager { do compute_observation; }
    }

    reflex main_loop {
        if (PetzAgent[0].actions = nil or empty(PetzAgent[0].actions)) { return; }
        current_step <- current_step + 1;
        ask forager { do apply_action; do compute_observation; do compute_reward; }
        if (agents_at_food all_match each) or (current_step >= max_steps) {
            PetzAgent[0].agents <- [];
        }
        ask PetzAgent { do update_data; }
    }
}

species PetzAgent {
    list<string> agents;
    list<string> possible_agents;
    map<string, unknown> observation_spaces;
    map<string, unknown> action_spaces;
    map<string, unknown> observations;
    map<string, float> rewards;
    map<string, bool> terminations;
    map<string, bool> truncations;
    map<string, unknown> infos;
    map<string, unknown> actions;
    map<string, unknown> data;

    action update_data {
        data <- [
            "Observations"::observations, "Rewards"::rewards,
            "Terminations"::terminations, "Truncations"::truncations, "Infos"::infos
        ];
    }
}

species forager {
    string agent_id;
    int agent_index;
    float sensor_range <- 30.0;
    float previous_distance <- 0.0;
    bool at_food <- false;
    float velocity_x <- 0.0;
    float velocity_y <- 0.0;
    float friction <- 0.8;
    float accel_force <- 1.5;
    float max_speed <- 3.0;
    bool bumped <- false;
    PetzAgent bridge <- PetzAgent[0];

    init { previous_distance <- location distance_to food_location; }

    action apply_action {
        bumped <- false;
        if (at_food) { return; }
        if (not (bridge.actions contains_key agent_id)) { return; }
        list act <- list(bridge.actions[agent_id]);
        if (length(act) < 2) { return; }
        float ax <- float(act[0]);
        float ay <- float(act[1]);
        velocity_x <- velocity_x * friction + ax * accel_force;
        velocity_y <- velocity_y * friction + ay * accel_force;
        float speed <- sqrt(velocity_x^2 + velocity_y^2);
        if (speed > max_speed) {
            velocity_x <- velocity_x * max_speed / speed;
            velocity_y <- velocity_y * max_speed / speed;
        }
        float move_x <- velocity_x;
        float move_y <- velocity_y;
        float new_x <- location.x + move_x;
        float new_y <- location.y + move_y;
        if (new_x < 1.0) { new_x <- 1.0; velocity_x <- 0.0; bumped <- true; }
        if (new_x > world_size - 1.0) { new_x <- world_size - 1.0; velocity_x <- 0.0; bumped <- true; }
        if (new_y < 1.0) { new_y <- 1.0; velocity_y <- 0.0; bumped <- true; }
        if (new_y > world_size - 1.0) { new_y <- world_size - 1.0; velocity_y <- 0.0; bumped <- true; }
        point new_loc <- {new_x, new_y};
        bool blocked <- false;
        loop obs over: obstacles { if (new_loc intersects obs) { blocked <- true; bumped <- true; break; } }
        if (!blocked) { location <- new_loc; }
        else {
            point slide_x <- {new_x, location.y};
            point slide_y <- {location.x, new_y};
            bool x_ok <- true;
            loop obs over: obstacles { if (slide_x intersects obs) { x_ok <- false; break; } }
            bool y_ok <- true;
            loop obs over: obstacles { if (slide_y intersects obs) { y_ok <- false; break; } }
            if (x_ok and y_ok) { location <- (abs(move_x) >= abs(move_y)) ? slide_x : slide_y; }
            else if (x_ok) { location <- slide_x; velocity_y <- 0.0; }
            else if (y_ok) { location <- slide_y; velocity_x <- 0.0; }
            else { velocity_x <- 0.0; velocity_y <- 0.0; }
        }
    }

    action compute_observation {
        float dist <- location distance_to food_location;
        float angle <- location towards food_location;
        list<float> sensors <- [];
        loop i from: 0 to: 7 {
            float ray_angle <- float(i) * 45.0;
            point ray_end <- {location.x + sensor_range * cos(ray_angle), location.y + sensor_range * sin(ray_angle)};
            geometry ray <- line([location, ray_end]);
            float closest <- 1.0;
            geometry border <- polyline([{0,0}, {world_size,0}, {world_size,world_size}, {0,world_size}, {0,0}]);
            if (ray intersects border) {
                geometry inter <- ray inter border;
                if (inter != nil) {
                    float d <- (location distance_to inter.location) / sensor_range;
                    closest <- min([closest, d]);
                }
            }
            loop obs over: obstacles {
                if (ray intersects obs) {
                    geometry inter <- ray inter obs;
                    if (inter != nil) { float d <- (location distance_to inter.location) / sensor_range; closest <- min([closest, d]); }
                }
            }
            add closest to: sensors;
        }
        float other_x <- 0.5; float other_y <- 0.5;
        forager other <- forager first_with (each.agent_id != agent_id);
        if (other != nil) { other_x <- other.location.x / world_size; other_y <- other.location.y / world_size; }
        bridge.observations[agent_id] <- [
            location.x / world_size, location.y / world_size,
            min([1.0, dist / (world_size * 1.414)]),
            (cos(angle) + 1.0) / 2.0, (sin(angle) + 1.0) / 2.0
        ] + sensors + [other_x, other_y];
    }

    action compute_reward {
        float dist <- location distance_to food_location;
        if (dist < food_radius and not at_food) { at_food <- true; agents_at_food[agent_index] <- true; }
        if (agents_at_food all_match each) {
            bridge.rewards[agent_id] <- 100.0; bridge.terminations[agent_id] <- true;
            bridge.truncations[agent_id] <- false; bridge.infos[agent_id] <- []; return;
        }
        if (at_food) {
            bridge.rewards[agent_id] <- 0.5; bridge.terminations[agent_id] <- false;
            bridge.truncations[agent_id] <- false; bridge.infos[agent_id] <- []; return;
        }
        if (current_step >= max_steps) {
            bridge.rewards[agent_id] <- -1.0; bridge.terminations[agent_id] <- false;
            bridge.truncations[agent_id] <- true; bridge.infos[agent_id] <- []; return;
        }
        float delta <- previous_distance - dist;
        float step_reward <- delta * 2.0 - 0.01;
        if (bumped) { step_reward <- step_reward - 1.0; }
        bridge.rewards[agent_id] <- step_reward;
        previous_distance <- dist;
        bridge.terminations[agent_id] <- false; bridge.truncations[agent_id] <- false; bridge.infos[agent_id] <- [];
    }

    aspect default {
        rgb agent_color <- agent_index = 0 ? #blue : rgb(0, 180, 180);
        draw circle(0.8) color: at_food ? #orange : agent_color;
        draw circle(sensor_range) color: rgb(agent_color.red, agent_color.green, agent_color.blue, 20)
            border: rgb(agent_color.red, agent_color.green, agent_color.blue, 60);
    }
}

experiment petz_env {
    parameter "communication_port" var: gama_server_port;
    output {
        display "Multi-Agent World" type: 2d {
            graphics "background" { draw rectangle(world_size, world_size) at: {world_size/2, world_size/2} color: rgb(240, 240, 240); }
            graphics "obstacles" { loop obs over: obstacles { draw obs color: rgb(80, 80, 80); } }
            graphics "food" { draw circle(food_radius) at: food_location color: rgb(50, 180, 50); }
            species forager;
        }
    }
}
```
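For reference, the 15-dimensional observation assembled in `compute_observation` can be reproduced in Python to check its layout and normalization (GAML's `towards`, `cos`, and `sin` work in degrees, which the sketch mirrors). The helper name and argument names are illustrative:

```python
import math

WORLD = 100.0  # matches world_size in the GAML model

def build_observation(x, y, food, sensors, other):
    """15 values: [x, y, dist, cos, sin] + 8 sensor rays + [other_x, other_y]."""
    assert len(sensors) == 8
    dist = math.hypot(food[0] - x, food[1] - y)
    angle = math.atan2(food[1] - y, food[0] - x)  # GAML 'towards', here in radians
    obs = [
        x / WORLD, y / WORLD,
        min(1.0, dist / (WORLD * 1.414)),   # distance normalized by ~the diagonal
        (math.cos(angle) + 1.0) / 2.0,      # heading components mapped to [0, 1]
        (math.sin(angle) + 1.0) / 2.0,
    ] + list(sensors) + [other[0] / WORLD, other[1] / WORLD]
    return obs

# forager_0's spawn point, food at {95, 95}, clear sensors, forager_1's spawn
obs = build_observation(15.0, 5.0, (95.0, 95.0), [1.0] * 8, (5.0, 15.0))
print(len(obs))  # 15
```

Every component lands in [0, 1], which is exactly what the `observation_spaces` declaration in the GAML `init` block promises (`low` all 0.0, `high` all 1.0, 15 entries).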