ForagerRL_step13
By Killian Trouillet
This step breaks down every section of `forager_petz.gaml`, explaining how the single-agent model from Part 2 becomes a cooperative multi-agent system.
The entire exchange between GAMA and Python goes through one `PetzAgent` instance. Define it exactly as follows; the field names are read by the library:
species PetzAgent {
    list<string> agents;
    list<string> possible_agents;
    map<string, unknown> observation_spaces;
    map<string, unknown> action_spaces;
    map<string, unknown> observations;
    map<string, float> rewards;
    map<string, bool> terminations;
    map<string, bool> truncations;
    map<string, map<string, unknown>> infos;
    map<string, unknown> actions;              // filled by Python each step
    map<string, map<string, unknown>> data;    // assembled by update_data

    // Called by gama-pettingzoo: receives JSON '{"forager_0":[dx,dy],...}'
    action set_actions(string json_str) {
        actions <- map(from_json(json_str));
        return "ok";
    }

    action update_data {
        data <- [
            "Observations"::observations,
            "Rewards"::rewards,
            "Terminations"::terminations,
            "Truncations"::truncations,
            "Infos"::infos
        ];
    }
}
Important: The `data` keys must be exactly `"Observations"`, `"Rewards"`, `"Terminations"`, `"Truncations"`, and `"Infos"`; those are the keys `GamaParallelEnv.step()` reads on the Python side.
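To make the protocol concrete, here is the shape `data` takes mid-episode for our two foragers (all values are illustrative):

data <- [
    "Observations"::["forager_0"::[0.15, 0.05 /* ...15 floats... */], "forager_1"::[0.05, 0.15 /* ... */]],
    "Rewards"::["forager_0"::0.42, "forager_1"::-0.01],
    "Terminations"::["forager_0"::false, "forager_1"::false],
    "Truncations"::["forager_0"::false, "forager_1"::false],
    "Infos"::["forager_0"::[], "forager_1"::[]]
];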
Instead of tedious per-agent assignments, we use GAML's `as_map` operator to build both space maps in a single expression:
create PetzAgent {
    agents <- copy(agent_ids);
    possible_agents <- copy(agent_ids);

    // Action: Box[-1,1]² (dx, dy), same for each agent
    action_spaces <- agent_ids as_map (each::[
        "type"::"Box", "low"::[-1.0, -1.0], "high"::[1.0, 1.0], "dtype"::"float"
    ]);

    // Observation: 15 floats in [0,1]
    observation_spaces <- agent_ids as_map (each::[
        "type"::"Box", "low"::list_with(15, 0.0), "high"::list_with(15, 1.0), "dtype"::"float"
    ]);

    loop id over: agent_ids { infos << id::[]; }
}
The `as_map` pattern (`list as_map (each::value)`) builds a `map<string, unknown>` from the list of agent IDs, and it scales cleanly to any number of agents.
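As a smaller illustration of the same pattern, this hypothetical line (not part of the model) would give every agent a zero reward entry:

// Builds ["forager_0"::0.0, "forager_1"::0.0] from the agent_ids list
map<string, float> zero_rewards <- agent_ids as_map (each::0.0);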
Each forager still uses the 13-value observation from Part 2, plus 2 values for the teammate's position:
| Index | Value | Description |
|---|---|---|
| 0–1 | own (x, y) | Normalized position |
| 2 | dist / max_dist | Distance to food |
| 3–4 | (cos, sin) | Food direction, normalized to [0,1] |
| 5–12 | 8 LIDAR sensors | Obstacle distances at 45° intervals |
| 13–14 | other (x, y) | Teammate's normalized position |
The 8 LIDAR sensors are unchanged from Part 2. Adding the teammate's position (indices 13–14) allows each agent to develop implicit coordination strategies.
Observations are written directly into `bridge.observations[agent_id]`, one map entry per agent:
bridge.observations[agent_id] <- [
    location.x / world_size,
    location.y / world_size,
    min([1.0, dist / (world_size * 1.414)]),
    (cos(angle) + 1.0) / 2.0,
    (sin(angle) + 1.0) / 2.0
] + sensors + [other_x, other_y];
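The `other_x` / `other_y` pair comes from a simple teammate lookup (this is the exact code from the full listing below); 0.5 serves as a neutral fallback when no teammate is found:

float other_x <- 0.5; float other_y <- 0.5; // neutral fallback
forager other <- forager first_with (each.agent_id != agent_id);
if (other != nil) {
    other_x <- other.location.x / world_size;
    other_y <- other.location.y / world_size;
}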
Actions come in through `bridge.actions`, a map of `{agent_id → [dx, dy]}`:
action apply_action {
    if (at_food) { return; } // frozen once at food
    if (not (bridge.actions contains_key agent_id)) { return; }
    list act <- list(bridge.actions[agent_id]);
    float dx <- float(act[0]);
    float dy <- float(act[1]);
    // ... same movement + collision logic as Part 2 ...
}
Frozen foragers: once `at_food = true`, the forager stops moving and receives a small +0.5 holding reward every step until its teammate arrives.
action compute_reward {
    float dist <- location distance_to food_location;

    // Just reached food
    if (dist < food_radius and not at_food) {
        at_food <- true;
        agents_at_food[agent_index] <- true;
    }

    // Cooperative success: BOTH at food
    if (agents_at_food all_match each) {
        bridge.rewards[agent_id] <- 100.0;
        bridge.terminations[agent_id] <- true;
        bridge.truncations[agent_id] <- false;
        bridge.infos[agent_id] <- [];
        return;
    }

    // Waiting for teammate
    if (at_food) {
        bridge.rewards[agent_id] <- 0.5;
        bridge.terminations[agent_id] <- false; ...
        return;
    }

    // Timeout
    if (current_step >= max_steps) {
        bridge.rewards[agent_id] <- -1.0;
        bridge.truncations[agent_id] <- true; ...
        return;
    }

    // Distance shaping
    float delta <- previous_distance - dist;
    bridge.rewards[agent_id] <- delta * 2.0 - 0.01;
    previous_distance <- dist;
    bridge.terminations[agent_id] <- false; ...
}
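To get a feel for the shaping term: a forager that moves 0.5 units closer to the food earns 0.5 × 2.0 − 0.01 = 0.99 on that step, one that stands still pays the living cost of −0.01, and (in the full listing below) bumping into a wall or obstacle subtracts a further 1.0. Progress toward the food is thus the only reliable source of positive reward before reaching it.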
PettingZoo uses an empty `env.agents` list to signal that the episode is over. We mirror this in GAML:
reflex main_loop {
    ...
    // After all foragers have acted and rewards are computed:
    if ((agents_at_food all_match each) or (current_step >= max_steps)) {
        PetzAgent[0].agents <- []; // ← tells Python the episode is done
    }
    ask PetzAgent { do update_data; }
}
This is the correct gama-pettingzoo convention (as seen in the Prison Escape example).
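This matters on the Python side because the canonical PettingZoo parallel-API loop is `while env.agents: ...`; once the list is emptied, that loop exits naturally after the final `step()`.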
aspect default {
    rgb agent_color <- (agent_index = 0) ? #blue : rgb(0, 180, 180);
    draw circle(0.8) color: at_food ? #orange : agent_color;
    draw circle(sensor_range)
        color: rgb(agent_color.red, agent_color.green, agent_color.blue, 20)
        border: rgb(agent_color.red, agent_color.green, agent_color.blue, 60);
}
- Blue → `forager_0` (normal) / Orange → reached food, waiting
- Teal → `forager_1` (normal) / Orange → reached food, waiting
Full `forager_petz.gaml`, combining everything introduced across steps 12 and 13:
/**
 * Name: SmartForagerPetz - Cooperative Multi-Agent Forager for PettingZoo
 * Author: Killian Trouillet
 * Description: Two cooperative foragers that must ALL reach the food.
 *              Uses the gama-pettingzoo PetzAgent bridge.
 * Tags: reinforcement-learning, pettingzoo, marl, cooperative, tutorial
 */
model SmartForagerPetz

global {
    float world_size <- 100.0;
    point food_location <- {95.0, 95.0};
    float food_radius <- 5.0;
    list<geometry> obstacles <- [];
    int max_steps <- 300;
    int current_step <- 0;
    int gama_server_port <- 0;
    list<string> agent_ids <- ["forager_0", "forager_1"];
    list<bool> agents_at_food <- [false, false];

    init {
        obstacles << square(10) at_location {25.0, 25.0};
        obstacles << square(10) at_location {35.0, 25.0};
        obstacles << square(10) at_location {25.0, 35.0};
        obstacles << square(10) at_location {65.0, 45.0};
        obstacles << square(10) at_location {75.0, 45.0};
        obstacles << square(10) at_location {75.0, 55.0};
        create PetzAgent {
            agents <- copy(agent_ids);
            possible_agents <- copy(agent_ids);
            action_spaces <- agent_ids as_map (each::[
                "type"::"Box", "low"::[-1.0, -1.0], "high"::[1.0, 1.0], "dtype"::"float"
            ]);
            observation_spaces <- agent_ids as_map (each::[
                "type"::"Box", "low"::list_with(15, 0.0), "high"::list_with(15, 1.0), "dtype"::"float"
            ]);
            loop id over: agent_ids { infos << id::[]; }
        }
        create forager with: [agent_id::"forager_0", agent_index::0, location::{15.0, 5.0}];
        create forager with: [agent_id::"forager_1", agent_index::1, location::{5.0, 15.0}];
        ask forager { do compute_observation; }
    }

    reflex main_loop {
        if (PetzAgent[0].actions = nil or empty(PetzAgent[0].actions)) { return; }
        current_step <- current_step + 1;
        ask forager { do apply_action; do compute_observation; do compute_reward; }
        if ((agents_at_food all_match each) or (current_step >= max_steps)) {
            PetzAgent[0].agents <- [];
        }
        ask PetzAgent { do update_data; }
    }
}
species PetzAgent {
    list<string> agents;
    list<string> possible_agents;
    map<string, unknown> observation_spaces;
    map<string, unknown> action_spaces;
    map<string, unknown> observations;
    map<string, float> rewards;
    map<string, bool> terminations;
    map<string, bool> truncations;
    map<string, map<string, unknown>> infos;
    map<string, unknown> actions;              // filled by Python each step
    map<string, map<string, unknown>> data;    // assembled by update_data

    // Called by gama-pettingzoo: receives JSON '{"forager_0":[dx,dy],...}'
    action set_actions(string json_str) {
        actions <- map(from_json(json_str));
        return "ok";
    }

    action update_data {
        data <- [
            "Observations"::observations, "Rewards"::rewards,
            "Terminations"::terminations, "Truncations"::truncations, "Infos"::infos
        ];
    }
}
species forager {
    string agent_id;
    int agent_index;
    float sensor_range <- 30.0;
    float previous_distance <- 0.0;
    bool at_food <- false;
    float velocity_x <- 0.0;
    float velocity_y <- 0.0;
    float friction <- 0.8;
    float accel_force <- 1.5;
    float max_speed <- 3.0;
    bool bumped <- false;
    PetzAgent bridge <- PetzAgent[0];

    init { previous_distance <- location distance_to food_location; }

    action apply_action {
        bumped <- false;
        if (at_food) { return; }
        if (not (bridge.actions contains_key agent_id)) { return; }
        list act <- list(bridge.actions[agent_id]);
        if (length(act) < 2) { return; }
        float ax <- float(act[0]);
        float ay <- float(act[1]);
        velocity_x <- velocity_x * friction + ax * accel_force;
        velocity_y <- velocity_y * friction + ay * accel_force;
        float speed <- sqrt(velocity_x^2 + velocity_y^2);
        if (speed > max_speed) {
            velocity_x <- velocity_x * max_speed / speed;
            velocity_y <- velocity_y * max_speed / speed;
        }
        float move_x <- velocity_x;
        float move_y <- velocity_y;
        float new_x <- location.x + move_x;
        float new_y <- location.y + move_y;
        if (new_x < 1.0) { new_x <- 1.0; velocity_x <- 0.0; bumped <- true; }
        if (new_x > world_size - 1.0) { new_x <- world_size - 1.0; velocity_x <- 0.0; bumped <- true; }
        if (new_y < 1.0) { new_y <- 1.0; velocity_y <- 0.0; bumped <- true; }
        if (new_y > world_size - 1.0) { new_y <- world_size - 1.0; velocity_y <- 0.0; bumped <- true; }
        point new_loc <- {new_x, new_y};
        bool blocked <- false;
        loop obs over: obstacles { if (new_loc intersects obs) { blocked <- true; bumped <- true; break; } }
        if (!blocked) { location <- new_loc; }
        else {
            point slide_x <- {new_x, location.y};
            point slide_y <- {location.x, new_y};
            bool x_ok <- true;
            loop obs over: obstacles { if (slide_x intersects obs) { x_ok <- false; break; } }
            bool y_ok <- true;
            loop obs over: obstacles { if (slide_y intersects obs) { y_ok <- false; break; } }
            if (x_ok and y_ok) { location <- (abs(move_x) >= abs(move_y)) ? slide_x : slide_y; }
            else if (x_ok) { location <- slide_x; velocity_y <- 0.0; }
            else if (y_ok) { location <- slide_y; velocity_x <- 0.0; }
            else { velocity_x <- 0.0; velocity_y <- 0.0; }
        }
    }
    action compute_observation {
        float dist <- location distance_to food_location;
        float angle <- location towards food_location;
        list<float> sensors;
        geometry border <- polyline([{0,0}, {world_size,0}, {world_size,world_size}, {0,world_size}, {0,0}]);
        loop i from: 0 to: 7 {
            float ray_angle <- float(i) * 45.0;
            point ray_end <- {location.x + sensor_range * cos(ray_angle), location.y + sensor_range * sin(ray_angle)};
            geometry ray <- line([location, ray_end]);
            float closest <- 1.0;
            if (ray intersects border) {
                geometry inter <- ray inter border;
                if (inter != nil) { float d <- (location distance_to inter.location) / sensor_range; closest <- min([closest, d]); }
            }
            loop obs over: obstacles {
                if (ray intersects obs) {
                    geometry inter <- ray inter obs;
                    if (inter != nil) { float d <- (location distance_to inter.location) / sensor_range; closest <- min([closest, d]); }
                }
            }
            add closest to: sensors;
        }
        float other_x <- 0.5; float other_y <- 0.5;
        forager other <- forager first_with (each.agent_id != agent_id);
        if (other != nil) { other_x <- other.location.x / world_size; other_y <- other.location.y / world_size; }
        bridge.observations[agent_id] <- [
            location.x / world_size, location.y / world_size,
            min([1.0, dist / (world_size * 1.414)]),
            (cos(angle) + 1.0) / 2.0, (sin(angle) + 1.0) / 2.0
        ] + sensors + [other_x, other_y];
    }
    action compute_reward {
        float dist <- location distance_to food_location;
        if (dist < food_radius and not at_food) { at_food <- true; agents_at_food[agent_index] <- true; }
        if (agents_at_food all_match each) {
            bridge.rewards[agent_id] <- 100.0; bridge.terminations[agent_id] <- true;
            bridge.truncations[agent_id] <- false; bridge.infos[agent_id] <- []; return;
        }
        if (at_food) {
            bridge.rewards[agent_id] <- 0.5; bridge.terminations[agent_id] <- false;
            bridge.truncations[agent_id] <- false; bridge.infos[agent_id] <- []; return;
        }
        if (current_step >= max_steps) {
            bridge.rewards[agent_id] <- -1.0; bridge.terminations[agent_id] <- false;
            bridge.truncations[agent_id] <- true; bridge.infos[agent_id] <- []; return;
        }
        float delta <- previous_distance - dist;
        float step_reward <- delta * 2.0 - 0.01;
        if (bumped) { step_reward <- step_reward - 1.0; }
        bridge.rewards[agent_id] <- step_reward;
        previous_distance <- dist;
        bridge.terminations[agent_id] <- false; bridge.truncations[agent_id] <- false; bridge.infos[agent_id] <- [];
    }

    aspect default {
        rgb agent_color <- (agent_index = 0) ? #blue : rgb(0, 180, 180);
        draw circle(0.8) color: at_food ? #orange : agent_color;
        draw circle(sensor_range) color: rgb(agent_color.red, agent_color.green, agent_color.blue, 20)
            border: rgb(agent_color.red, agent_color.green, agent_color.blue, 60);
    }
}
experiment petz_env {
    parameter "communication_port" var: gama_server_port;

    output {
        display "Multi-Agent World" type: 2d {
            graphics "background" { draw rectangle(world_size, world_size) at: {world_size/2, world_size/2} color: rgb(240, 240, 240); }
            graphics "obstacles" { loop obs over: obstacles { draw obs color: rgb(80, 80, 80); } }
            graphics "food" { draw circle(food_radius) at: food_location color: rgb(50, 180, 50); }
            species forager;
        }
    }
}
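A practical note: the experiment name (`petz_env`) and the `"communication_port"` parameter are the two hooks the Python side relies on; the gama-pettingzoo configuration from step 12 refers to both when it launches GAMA and connects to the simulation.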