ForagerRL_step13 - gama-platform/gama GitHub Wiki

13. The Multi-Agent GAML Model

By Killian Trouillet



This step breaks down every section of forager_petz.gaml, explaining how the single-agent model from Part 2 becomes a cooperative multi-agent system.

The PetzAgent Bridge Species

The entire exchange between GAMA and Python goes through one PetzAgent instance. Define it exactly as follows — the field names are read by the library:

species PetzAgent {
    list<string> agents;
    list<string> possible_agents;
    map<string, unknown> observation_spaces;
    map<string, unknown> action_spaces;

    map<string, unknown> observations;
    map<string, float>   rewards;
    map<string, bool>    terminations;
    map<string, bool>    truncations;
    map<string, unknown> infos;

    map<string, unknown> actions;   // filled by Python each step
    map<string, unknown> data;      // assembled by update_data

    // Called by gama-pettingzoo: receives JSON '{"forager_0":[dx,dy],...}'
    action set_actions(string json_str) {
        actions <- map(from_json(json_str));
        return "ok";
    }

    action update_data {
        data <- [
            "Observations"::observations,
            "Rewards"::rewards,
            "Terminations"::terminations,
            "Truncations"::truncations,
            "Infos"::infos
        ];
    }
}

Important: The data keys must be exactly "Observations", "Rewards", "Terminations", "Truncations", "Infos" — those are what GamaParallelEnv.step() reads on the Python side.
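To make the contract concrete, here is a small Python sketch (illustrative only, not the library's actual code) of both directions of the exchange: serializing the actions map that set_actions receives, and unpacking the data map that the Python side reads back. The helper name unpack_gama_data is ours, not part of gama-pettingzoo.

```python
import json

# Hypothetical per-agent actions, matching the Box[-1,1]^2 action space.
actions = {"forager_0": [0.8, -0.2], "forager_1": [-0.5, 1.0]}

# The payload set_actions receives is plain JSON:
payload = json.dumps(actions)

def unpack_gama_data(data):
    """Split the GAMA-side `data` map into the five parallel-API dicts.

    Relies on the exact capitalized keys listed above; a typo such as
    "observations" would raise a KeyError on the Python side.
    """
    return (data["Observations"], data["Rewards"], data["Terminations"],
            data["Truncations"], data["Infos"])
```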


Registering Spaces in init

Instead of tedious individual assignments, we use GAML's as_map operator to build both space maps in a single line:

create PetzAgent {
    agents          <- copy(agent_ids);
    possible_agents <- copy(agent_ids);

    // Action: Box[-1,1]² (dx, dy) — same for each agent
    action_spaces <- agent_ids as_map (each::[
        "type"::"Box", "low"::[-1.0, -1.0], "high"::[1.0, 1.0], "dtype"::"float"
    ]);

    // Observation: 15 floats in [0,1]
    observation_spaces <- agent_ids as_map (each::[
        "type"::"Box", "low"::list_with(15, 0.0), "high"::list_with(15, 1.0), "dtype"::"float"
    ]);

    loop id over: agent_ids { infos << id::[]; }
}

The as_map pattern (list as_map (each::value)) builds a map<string, unknown> from the list of agent IDs — clean and scalable to any number of agents.
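For readers more at home in Python, the as_map pattern is essentially a dict comprehension; this analogy (not library code) builds the same action-space structure:

```python
agent_ids = ["forager_0", "forager_1"]

# Python analogue of `agent_ids as_map (each::box_spec)`:
# one identical Box spec per agent id, however many agents there are.
box_spec = {"type": "Box", "low": [-1.0, -1.0], "high": [1.0, 1.0],
            "dtype": "float"}
action_spaces = {aid: dict(box_spec) for aid in agent_ids}
```

dict(box_spec) gives each agent its own copy, mirroring the fact that the GAML expression builds a fresh value per id.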


The Observation Vector: 15 Values

Each forager still uses the 13-value observation from Part 2, plus 2 values for the teammate's position:

Index   Value              Description
0–1     own (x, y)         Normalized position
2       dist / max_dist    Distance to food
3–4     (cos, sin)         Food direction, rescaled to [0, 1]
5–12    8 LIDAR sensors    Obstacle distances at 45° intervals
13–14   other (x, y)       Teammate's normalized position

The 8 LIDAR sensors are unchanged from Part 2. Adding the teammate's position (indices 13–14) allows each agent to develop implicit coordination strategies.

Observations are written directly into bridge.observations[agent_id] — a map entry:

bridge.observations[agent_id] <- [
    location.x / world_size,
    location.y / world_size,
    min([1.0, dist / (world_size * 1.414)]),
    (cos(angle) + 1.0) / 2.0,
    (sin(angle) + 1.0) / 2.0
] + sensors + [other_x, other_y];
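The same normalization arithmetic can be checked in a few lines of Python. This is an illustrative mirror of the GAML code, not part of the bridge; note that GAML's cos/sin take degrees while Python's take radians, so the angle is computed with atan2 here.

```python
import math

def build_observation(x, y, food_x, food_y, sensors, other_x, other_y,
                      world_size=100.0):
    """Mirror of the 15-value observation vector (illustrative).

    `sensors` is the 8-value LIDAR list, already normalized to [0, 1].
    """
    dist = math.hypot(food_x - x, food_y - y)
    angle = math.atan2(food_y - y, food_x - x)
    return [
        x / world_size,
        y / world_size,
        min(1.0, dist / (world_size * 1.414)),   # 1.414 ~ sqrt(2)
        (math.cos(angle) + 1.0) / 2.0,           # direction in [0, 1]
        (math.sin(angle) + 1.0) / 2.0,
    ] + sensors + [other_x / world_size, other_y / world_size]

obs = build_observation(15.0, 5.0, 95.0, 95.0, [1.0] * 8, 5.0, 15.0)
```

Every entry lands in [0, 1], which is exactly what the Box(low=0, high=1) observation space promises.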

Applying Actions

Actions come in through bridge.actions, a map of {agent_id → [dx, dy]}:

action apply_action {
    if (at_food) { return; }   // frozen once at food
    if (not (bridge.actions contains_key agent_id)) { return; }

    list act <- list(bridge.actions[agent_id]);
    float dx <- float(act[0]);
    float dy <- float(act[1]);
    // ... same movement + collision logic as Part 2 ...
}

Frozen foragers: once at_food = true, the forager stops moving and receives a small +0.5 holding reward every step until its teammate arrives.
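The elided "movement + collision logic" starts with a velocity integration step; here is an illustrative Python mirror of that update, using the same friction, acceleration, and speed-cap constants as the complete model below (the function name is ours):

```python
import math

def integrate_velocity(vx, vy, ax, ay,
                       friction=0.8, accel_force=1.5, max_speed=3.0):
    """Velocity update: decay old velocity, add the action's thrust,
    then clamp the resulting speed to max_speed."""
    vx = vx * friction + ax * accel_force
    vy = vy * friction + ay * accel_force
    speed = math.hypot(vx, vy)
    if speed > max_speed:
        vx, vy = vx * max_speed / speed, vy * max_speed / speed
    return vx, vy
```

Because friction is below 1 and the cap is 3.0, even an agent holding full thrust settles at a bounded speed instead of accelerating forever.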


Reward and Termination Logic

action compute_reward {
    float dist <- location distance_to food_location;

    // Just reached food
    if (dist < food_radius and not at_food) {
        at_food <- true;
        agents_at_food[agent_index] <- true;
    }

    // Cooperative success: BOTH at food
    if (agents_at_food all_match each) {
        bridge.rewards[agent_id]      <- 100.0;
        bridge.terminations[agent_id] <- true;
        bridge.truncations[agent_id]  <- false;
        bridge.infos[agent_id]        <- [];
        return;
    }

    // Waiting for teammate
    if (at_food) {
        bridge.rewards[agent_id] <- 0.5;
        bridge.terminations[agent_id] <- false; ...
        return;
    }

    // Timeout
    if (current_step >= max_steps) {
        bridge.rewards[agent_id] <- -1.0;
        bridge.truncations[agent_id] <- true; ...
        return;
    }

    // Distance shaping
    float delta <- previous_distance - dist;
    bridge.rewards[agent_id] <- delta * 2.0 - 0.01;
    previous_distance <- dist;
    bridge.terminations[agent_id] <- false; ...
}
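The shaping branch is worth checking numerically. This illustrative Python mirror (not bridge code) includes the bump penalty that appears in the complete model's compute_reward:

```python
def shaping_reward(previous_distance, dist, bumped=False):
    """Per-step shaping: +2.0 per unit of progress toward the food,
    a -0.01 time penalty, and -1.0 when the forager bumped a wall
    or obstacle this step."""
    reward = (previous_distance - dist) * 2.0 - 0.01
    if bumped:
        reward -= 1.0
    return reward
```

Moving one unit closer earns about +1.99; drifting half a unit away while bumping costs about -2.01, so wandering into walls is never profitable.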

Signaling Episode End

PettingZoo uses an empty env.agents list to signal that the episode is over. We mirror this in GAML:

reflex main_loop {
    ...
    // After all foragers have acted and rewards are computed:
    if (agents_at_food all_match each) or (current_step >= max_steps) {
        PetzAgent[0].agents <- [];  // ← tells Python the episode is done
    }
    ask PetzAgent { do update_data; }
}

This is the correct gama-pettingzoo convention (as seen in the Prison Escape example).
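The convention is easy to state in plain Python. The snippet below mirrors the GAML condition and the emptying of the agents list; it is a standalone illustration, not library code:

```python
# PettingZoo signals "episode over" with an empty env.agents list.
agents = ["forager_0", "forager_1"]
agents_at_food = [True, True]
current_step, max_steps = 120, 300

if all(agents_at_food) or current_step >= max_steps:
    agents = []          # equivalent of `PetzAgent[0].agents <- []`

episode_over = not agents
```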


Visualization

aspect default {
    rgb agent_color <- agent_index = 0 ? #blue : rgb(0, 180, 180);
    draw circle(0.8) color: at_food ? #orange : agent_color;
    draw circle(sensor_range)
        color:  rgb(agent_color.red, agent_color.green, agent_color.blue, 20)
        border: rgb(agent_color.red, agent_color.green, agent_color.blue, 60);
}
  • Blue → forager_0 (normal) / Orange → reached food, waiting
  • Teal → forager_1 (normal) / Orange → reached food, waiting

Complete Model

Full forager_petz.gaml — everything introduced across steps 12 and 13:

/**
 * Name: SmartForagerPetz - Cooperative Multi-Agent Forager for PettingZoo
 * Author: Killian Trouillet
 * Description: Two cooperative foragers that must ALL reach the food.
 *              Uses the gama-pettingzoo PetzAgent bridge.
 * Tags: reinforcement-learning, pettingzoo, marl, cooperative, tutorial
 */

model SmartForagerPetz

global {
 float world_size <- 100.0;
 point food_location <- {95.0, 95.0};
 float food_radius <- 5.0;
 list<geometry> obstacles <- [];
 int max_steps <- 300;
 int current_step <- 0;
 int gama_server_port <- 0;

 list<string> agent_ids <- ["forager_0", "forager_1"];
 list<bool> agents_at_food <- [false, false];

 init {
  obstacles << square(10) at_location {25.0, 25.0};
  obstacles << square(10) at_location {35.0, 25.0};
  obstacles << square(10) at_location {25.0, 35.0};
  obstacles << square(10) at_location {65.0, 45.0};
  obstacles << square(10) at_location {75.0, 45.0};
  obstacles << square(10) at_location {75.0, 55.0};

  create PetzAgent {
   agents          <- copy(agent_ids);
   possible_agents <- copy(agent_ids);
   action_spaces <- agent_ids as_map (each::[
    "type"::"Box", "low"::[-1.0, -1.0], "high"::[1.0, 1.0], "dtype"::"float"
   ]);
   observation_spaces <- agent_ids as_map (each::[
    "type"::"Box", "low"::list_with(15, 0.0), "high"::list_with(15, 1.0), "dtype"::"float"
   ]);
   loop id over: agent_ids { infos << id::[]; }
  }

  create forager with: [agent_id::"forager_0", agent_index::0, location::{15.0,  5.0}];
  create forager with: [agent_id::"forager_1", agent_index::1, location::{5.0, 15.0}];
  ask forager { do compute_observation; }
 }

 reflex main_loop {
  if (PetzAgent[0].actions = nil or empty(PetzAgent[0].actions)) { return; }
  current_step <- current_step + 1;
  ask forager { do apply_action; do compute_observation; do compute_reward; }
  if (agents_at_food all_match each) or (current_step >= max_steps) {
   PetzAgent[0].agents <- [];
  }
  ask PetzAgent { do update_data; }
 }
}

species PetzAgent {
 list<string> agents;
 list<string> possible_agents;
 map<string, unknown> observation_spaces;
 map<string, unknown> action_spaces;
 map<string, unknown>             observations;
 map<string, float>               rewards;
 map<string, bool>                terminations;
 map<string, bool>                truncations;
 map<string, map<string,unknown>> infos;
 map<string, unknown>             actions;
 map<string, map<string,unknown>> data;

 // Called by gama-pettingzoo: receives the JSON actions payload
 action set_actions(string json_str) {
  actions <- map(from_json(json_str));
  return "ok";
 }

 action update_data {
  data <- [
   "Observations"::observations, "Rewards"::rewards,
   "Terminations"::terminations, "Truncations"::truncations, "Infos"::infos
  ];
 }
}

species forager {
 string agent_id;
 int    agent_index;
 float  sensor_range      <- 30.0;
 float  previous_distance <- 0.0;
 bool   at_food           <- false;
 float  velocity_x        <- 0.0;
 float  velocity_y        <- 0.0;
 float  friction          <- 0.8;
 float  accel_force       <- 1.5;
 float  max_speed         <- 3.0;
 bool   bumped            <- false;
 PetzAgent bridge         <- PetzAgent[0];

 init { previous_distance <- location distance_to food_location; }

 action apply_action {
  bumped <- false;
  if (at_food) { return; }
  if (not (bridge.actions contains_key agent_id)) { return; }
  list act <- list(bridge.actions[agent_id]);
  if (length(act) < 2) { return; }
  float ax <- float(act[0]);
  float ay <- float(act[1]);
  velocity_x <- velocity_x * friction + ax * accel_force;
  velocity_y <- velocity_y * friction + ay * accel_force;
  float speed <- sqrt(velocity_x^2 + velocity_y^2);
  if (speed > max_speed) {
   velocity_x <- velocity_x * max_speed / speed;
   velocity_y <- velocity_y * max_speed / speed;
  }
  float move_x <- velocity_x;
  float move_y <- velocity_y;
  float new_x <- location.x + move_x;
  float new_y <- location.y + move_y;
  if (new_x < 1.0) { new_x <- 1.0; velocity_x <- 0.0; bumped <- true; }
  if (new_x > world_size - 1.0) { new_x <- world_size - 1.0; velocity_x <- 0.0; bumped <- true; }
  if (new_y < 1.0) { new_y <- 1.0; velocity_y <- 0.0; bumped <- true; }
  if (new_y > world_size - 1.0) { new_y <- world_size - 1.0; velocity_y <- 0.0; bumped <- true; }
  point new_loc <- {new_x, new_y};
  bool blocked <- false;
  loop obs over: obstacles { if (new_loc intersects obs) { blocked <- true; bumped <- true; break; } }
  if (!blocked) { location <- new_loc; }
  else {
   point slide_x <- {new_x, location.y};
   point slide_y <- {location.x, new_y};
   bool x_ok <- true;
   loop obs over: obstacles { if (slide_x intersects obs) { x_ok <- false; break; } }
   bool y_ok <- true;
   loop obs over: obstacles { if (slide_y intersects obs) { y_ok <- false; break; } }
   if (x_ok and y_ok) { location <- (abs(move_x) >= abs(move_y)) ? slide_x : slide_y; }
   else if (x_ok) { location <- slide_x; velocity_y <- 0.0; }
   else if (y_ok) { location <- slide_y; velocity_x <- 0.0; }
   else { velocity_x <- 0.0; velocity_y <- 0.0; }
  }
 }

 action compute_observation {
  float dist  <- location distance_to food_location;
  float angle <- location towards food_location;
  list<float> sensors <- [];
  loop i from: 0 to: 7 {
   float ray_angle <- float(i) * 45.0;
   point ray_end <- {location.x + sensor_range * cos(ray_angle), location.y + sensor_range * sin(ray_angle)};
   geometry ray <- line([location, ray_end]);
   float closest <- 1.0;
   geometry border <- polyline([{0,0}, {world_size,0}, {world_size,world_size}, {0,world_size}, {0,0}]);
   if (ray intersects border) {
    geometry inter <- ray inter border;
    if (inter != nil) { float d <- (location distance_to inter.location) / sensor_range; closest <- min([closest, d]); }
   }
   loop obs over: obstacles {
    if (ray intersects obs) {
     geometry inter <- ray inter obs;
     if (inter != nil) { float d <- (location distance_to inter.location) / sensor_range; closest <- min([closest, d]); }
    }
   }
   add closest to: sensors;
  }
  float other_x <- 0.5; float other_y <- 0.5;
  forager other <- forager first_with (each.agent_id != agent_id);
  if (other != nil) { other_x <- other.location.x / world_size; other_y <- other.location.y / world_size; }
  bridge.observations[agent_id] <- [
   location.x / world_size, location.y / world_size,
   min([1.0, dist / (world_size * 1.414)]),
   (cos(angle) + 1.0) / 2.0, (sin(angle) + 1.0) / 2.0
  ] + sensors + [other_x, other_y];
 }

 action compute_reward {
  float dist <- location distance_to food_location;
  if (dist < food_radius and not at_food) { at_food <- true; agents_at_food[agent_index] <- true; }
  if (agents_at_food all_match each) {
   bridge.rewards[agent_id] <- 100.0; bridge.terminations[agent_id] <- true;
   bridge.truncations[agent_id] <- false; bridge.infos[agent_id] <- []; return;
  }
  if (at_food) {
   bridge.rewards[agent_id] <- 0.5; bridge.terminations[agent_id] <- false;
   bridge.truncations[agent_id] <- false; bridge.infos[agent_id] <- []; return;
  }
  if (current_step >= max_steps) {
   bridge.rewards[agent_id] <- -1.0; bridge.terminations[agent_id] <- false;
   bridge.truncations[agent_id] <- true; bridge.infos[agent_id] <- []; return;
  }
  float delta <- previous_distance - dist;
  float step_reward <- delta * 2.0 - 0.01;
  if (bumped) { step_reward <- step_reward - 1.0; }
  bridge.rewards[agent_id] <- step_reward;
  previous_distance <- dist;
  bridge.terminations[agent_id] <- false; bridge.truncations[agent_id] <- false; bridge.infos[agent_id] <- [];
 }

 aspect default {
  rgb agent_color <- agent_index = 0 ? #blue : rgb(0, 180, 180);
  draw circle(0.8) color: at_food ? #orange : agent_color;
  draw circle(sensor_range) color: rgb(agent_color.red, agent_color.green, agent_color.blue, 20)
   border: rgb(agent_color.red, agent_color.green, agent_color.blue, 60);
 }
}

experiment petz_env {
 parameter "communication_port" var: gama_server_port;
 output {
  display "Multi-Agent World" type: 2d {
   graphics "background" { draw rectangle(world_size, world_size) at: {world_size/2, world_size/2} color: rgb(240, 240, 240); }
   graphics "obstacles" { loop obs over: obstacles { draw obs color: rgb(80, 80, 80); } }
   graphics "food" { draw circle(food_radius) at: food_location color: rgb(50, 180, 50); }
   species forager;
  }
 }
}
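With the experiment in place, the Python side drives it through the standard PettingZoo parallel loop. The real GamaParallelEnv construction (ports, experiment name) is covered elsewhere in this tutorial, so the sketch below substitutes a toy stub that merely follows the same reset/step contract; only the `while env.agents` loop shape carries over to the real environment.

```python
import random

class StubParallelEnv:
    """Toy stand-in following the PettingZoo parallel-API contract the
    bridge implements: reset() -> (obs, infos) and step(actions) ->
    (obs, rewards, terminations, truncations, infos). It is NOT the
    real GamaParallelEnv, just a shape-compatible stub."""

    possible_agents = ["forager_0", "forager_1"]

    def reset(self, seed=None):
        self.agents = list(self.possible_agents)
        self._t = 0
        obs = {a: [0.5] * 15 for a in self.agents}
        return obs, {a: {} for a in self.agents}

    def step(self, actions):
        self._t += 1
        done = self._t >= 5                      # pretend episode end
        obs = {a: [0.5] * 15 for a in self.agents}
        rewards = {a: 0.0 for a in self.agents}
        terms = {a: done for a in self.agents}
        truncs = {a: False for a in self.agents}
        infos = {a: {} for a in self.agents}
        if done:
            self.agents = []                     # empty list = episode over
        return obs, rewards, terms, truncs, infos

env = StubParallelEnv()
obs, infos = env.reset()
steps = 0
while env.agents:                                # standard parallel loop
    actions = {a: [random.uniform(-1, 1), random.uniform(-1, 1)]
               for a in env.agents}
    obs, rewards, terms, truncs, infos = env.step(actions)
    steps += 1
```

A random policy like this is a good smoke test before training: it exercises the full action/observation round trip and the empty-agents termination signal without any learning code.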