ForagerRL_step12 - gama-platform/gama GitHub Wiki

12. From Single Agent to Multi-Agent

By Killian Trouillet



Content

In Part 2 we trained a single forager to navigate toward food in a continuous world. In Part 3 we extend that setup to two foragers that must cooperate: the episode counts as a success only when both foragers have reached the food.

What Changes from Part 2

| Aspect | Part 2 (Gymnasium) | Part 3 (PettingZoo) |
| --- | --- | --- |
| Library | gama-gymnasium | gama-pettingzoo |
| Bridge species | `GymAgent` | `PetzAgent` |
| Agents | 1 forager | 2 foragers |
| Python env | `gym.make(...)` | `GamaParallelEnv(...)` |
| Step input | Single action | Dict `{agent_id: action, ...}` |
| Step output | Single obs / reward | Dict of obs / rewards per agent |
| Task | Reach food | Both foragers reach food |
| Training | 1 PPO | 1 independent PPO per agent |
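The shift in step signatures is the core API change. The snippet below is an illustrative sketch of the data shapes only; the values are placeholders, not real environment output:

```python
# Illustrative data shapes only; the values are placeholders, not real env output.

# Part 2 (Gymnasium): one agent, scalar/array outputs.
obs, reward, terminated, truncated, info = [0.1, 0.2], 0.5, False, False, {}

# Part 3 (PettingZoo Parallel API): every input and output becomes a dict
# keyed by agent id.
agent_ids = ["forager_0", "forager_1"]
actions      = {a: [0.0, 1.0] for a in agent_ids}   # all actions submitted together
observations = {a: [0.1, 0.2] for a in agent_ids}
rewards      = {a: 0.5 for a in agent_ids}
terminations = {a: False for a in agent_ids}
truncations  = {a: False for a in agent_ids}
infos        = {a: {} for a in agent_ids}
```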

The PettingZoo Parallel API

gama-pettingzoo implements PettingZoo's Parallel API, in which all agents act simultaneously at each step, as opposed to PettingZoo's sequential AEC (Agent-Environment-Cycle) environments, where agents act one at a time.

Python                                    GAMA
  │                                          │
  │  env.reset()                             │
  │ ────────────────────────────────────►    │  Spawns both foragers
  │  ◄── {"forager_0": obs, "forager_1": obs}│
  │                                          │
  │  env.step({                              │
  │    "forager_0": action_0,                │
  │    "forager_1": action_1                 │
  │  })                                      │
  │ ────────────────────────────────────►    │  Both foragers move simultaneously
  │                                          │  Observations, rewards computed
  │  ◄── obs, rewards, terminations, truncs  │  (one dict per output)
  │                                          │
  │  (repeat until env.agents is empty)      │

End-of-episode signal: When all agents are done (terminated or truncated), GAMA sets PetzAgent[0].agents <- [] — which makes env.agents empty in Python. This is the PettingZoo convention for episode end.
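On the Python side, this convention yields a simple rollout loop: keep stepping while `env.agents` is non-empty. The sketch below uses a stub environment that imitates the dict outputs and the empty-`agents` end signal; in the real scripts the stub is replaced by `GamaParallelEnv`:

```python
import random

class StubParallelEnv:
    """Minimal stand-in mimicking the PettingZoo Parallel conventions used by
    gama-pettingzoo: dict outputs per agent, and `agents` emptied at episode end.
    (Illustration only; the real environment is GamaParallelEnv.)"""
    def __init__(self, max_steps=5):
        self.possible_agents = ["forager_0", "forager_1"]
        self.max_steps = max_steps

    def reset(self):
        self.agents = list(self.possible_agents)
        self.t = 0
        return {a: [0.0] * 15 for a in self.agents}, {a: {} for a in self.agents}

    def step(self, actions):
        assert set(actions) == set(self.agents)  # one action per active agent
        self.t += 1
        done = self.t >= self.max_steps
        obs   = {a: [0.0] * 15 for a in self.agents}
        rew   = {a: 0.0 for a in self.agents}
        term  = {a: False for a in self.agents}
        trunc = {a: done for a in self.agents}
        infos = {a: {} for a in self.agents}
        if done:
            self.agents = []  # PettingZoo episode-end convention
        return obs, rew, term, trunc, infos

env = StubParallelEnv()
observations, infos = env.reset()
steps = 0
while env.agents:  # loop until the environment empties the agent list
    actions = {a: [random.uniform(-1, 1), random.uniform(-1, 1)]
               for a in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    steps += 1
```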


The PetzAgent Bridge

The bridge species in gama-pettingzoo is called PetzAgent. All per-agent data flows through dictionaries keyed by agent ID:

species PetzAgent {
    list<string> agents;                          // active agents this step
    list<string> possible_agents;                 // all agents ever
    map<string, unknown> observation_spaces;       // one entry per agent
    map<string, unknown> action_spaces;            // one entry per agent

    map<string, unknown> observations;            // populated by GAML
    map<string, float>   rewards;
    map<string, bool>    terminations;
    map<string, bool>    truncations;
    map<string, unknown> infos;

    map<string, unknown> actions;                 // populated by Python
    map<string, unknown> data;                    // assembled map

    // Called by gama-pettingzoo: receives JSON '{"forager_0":[dx,dy],...}'
    action set_actions(string json_str) {
        actions <- map(from_json(json_str));
        return "ok";
    }

    action update_data {
        data <- [
            "Observations"::observations,
            "Rewards"::rewards,
            "Terminations"::terminations,
            "Truncations"::truncations,
            "Infos"::infos
        ];
    }
}
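The JSON string passed to `set_actions` can be produced with Python's standard `json` module. A minimal sketch of the wire format (how the payload actually travels over the WebSocket is handled by gama-pettingzoo and omitted here):

```python
import json

# Per-agent continuous actions, keyed by agent id, matching the format in the
# comment above: '{"forager_0":[dx,dy],...}'.
actions = {"forager_0": [0.8, -0.2], "forager_1": [-0.5, 0.9]}

payload = json.dumps(actions)
# The GAML side parses this string back into a map with from_json(json_str).
decoded = json.loads(payload)
```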

Key differences from GymAgent (Part 2)

| Field | GymAgent | PetzAgent |
| --- | --- | --- |
| Observation | `state` (unknown) | `observations` (map) |
| Reward | `reward` (float) | `rewards` (map<string, float>) |
| Termination | `terminated` (bool) | `terminations` (map<string, bool>) |
| Action | `next_action` | `actions` (map) |
| End signal | `terminated <- true` | Set `agents <- []` |
| Update | `update_data` sends `data` | Same action, but the `data` maps are keyed per agent |

Cooperative Reward Design

| Event | Reward | Rationale |
| --- | --- | --- |
| Both at food | +100 each | Large cooperative bonus |
| This forager at food (waiting) | +0.5 | Goal reached; avoids a negative gradient while waiting for the partner |
| Getting closer to food | +delta × 2.0 | Distance shaping (same as Part 2) |
| Each step alive | −0.01 | Encourage speed |
| Wall friction | −1.0 | Penalty when the agent bumps or rubs against a wall or obstacle |
| Timeout | −1.0 | Mild penalty |

The +0.5 waiting reward is critical: without it, an agent that has already reached the food would receive 0 every step while its partner is still en route, creating a gradient that could discourage reaching the food early.
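For reference, the reward scheme can be transcribed as a small Python function. This is a sketch mirroring the table and the GAML `compute_reward` action shown later, with the per-agent flags passed in as plain arguments:

```python
def forager_reward(dist, prev_dist, at_food, all_at_food, bumped, timed_out):
    """Per-agent reward mirroring the cooperative scheme in the table above."""
    if all_at_food:
        return 100.0                   # both at food: cooperative bonus
    if at_food:
        return 0.5                     # reached goal, waiting for partner
    if timed_out:
        return -1.0                    # timeout penalty
    reward = (prev_dist - dist) * 2.0  # distance shaping
    reward -= 0.01                     # per-step cost, encourages speed
    if bumped:
        reward -= 1.0                  # wall/obstacle friction penalty
    return reward
```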


Architecture Overview

┌────────────────────────────────────────┐   ┌──────────────────────────────┐
│        GAMA (Headless)                 │   │          Python              │
│                                        │   │                              │
│  forager_petz.gaml                     │◄─►│  train_forager_petz.py       │
│  ├─ PetzAgent bridge                   │   │  ├─ GamaParallelEnv          │
│  ├─ forager_0 (blue, LIDAR sensors)    │   │  ├─ PPO model (forager_0)    │
│  ├─ forager_1 (teal, LIDAR sensors)    │   │  ├─ PPO model (forager_1)    │
│  ├─ Same continuous world as Part 2    │   │  └─ Episode-based training   │
│  └─ Cooperative reward logic           │   │                              │
│                                        │   │  test_forager_petz.py        │
│     WebSocket (port 1001)              │   │  └─ Deterministic evaluation │
└────────────────────────────────────────┘   └──────────────────────────────┘

Prerequisites

pip install gama-pettingzoo torch

GAMA 2024.09+ with headless server support.


Key Files

| File | Description |
| --- | --- |
| `models/petz/forager_petz.gaml` | GAMA model with the PetzAgent bridge |
| `models/petz/train_forager_petz.py` | MARL training script (headless) |
| `models/petz/test_forager_petz.py` | Testing script (GUI visualization) |

Complete Model

forager_petz.gaml — the full cooperative multi-agent GAMA model:

/**
 * Name: SmartForagerPetz - Cooperative Multi-Agent Forager for PettingZoo
 * Author: Killian Trouillet
 * Description: Two cooperative foragers that must ALL reach the food.
 *              Uses the gama-pettingzoo PetzAgent bridge.
 * Tags: reinforcement-learning, pettingzoo, marl, cooperative, tutorial
 */

model SmartForagerPetz

global {
	float world_size <- 100.0;
	point food_location <- {95.0, 95.0};
	float food_radius <- 5.0;
	list<geometry> obstacles <- [];
	int max_steps <- 300;
	int current_step <- 0;
	int gama_server_port <- 0;

	list<string> agent_ids <- ["forager_0", "forager_1"];
	list<bool> agents_at_food <- [false, false];

	init {
		obstacles << square(10) at_location {25.0, 25.0};
		obstacles << square(10) at_location {35.0, 25.0};
		obstacles << square(10) at_location {25.0, 35.0};
		obstacles << square(10) at_location {65.0, 45.0};
		obstacles << square(10) at_location {75.0, 45.0};
		obstacles << square(10) at_location {75.0, 55.0};

		create PetzAgent {
			agents          <- copy(agent_ids);
			possible_agents <- copy(agent_ids);
			action_spaces <- agent_ids as_map (each::[
				"type"::"Box", "low"::[-1.0, -1.0], "high"::[1.0, 1.0], "dtype"::"float"
			]);
			observation_spaces <- agent_ids as_map (each::[
				"type"::"Box", "low"::list_with(15, 0.0), "high"::list_with(15, 1.0), "dtype"::"float"
			]);
			loop id over: agent_ids { infos << id::[]; }
		}

		create forager (agent_id::"forager_0", agent_index::0, location::{15.0,  5.0});
		create forager (agent_id::"forager_1", agent_index::1, location::{5.0, 15.0});
		ask forager { do compute_observation; }
	}

	reflex main_loop {
		if (PetzAgent[0].actions = nil or empty(PetzAgent[0].actions)) { return; }
		current_step <- current_step + 1;
		ask forager { do apply_action; do compute_observation; do compute_reward; }
		if (agents_at_food all_match each) or (current_step >= max_steps) {
			PetzAgent[0].agents <- [];
		}
		ask PetzAgent { do update_data; }
	}
}

species PetzAgent {
	list<string> agents;
	list<string> possible_agents;
	map<string, unknown> observation_spaces;
	map<string, unknown> action_spaces;
	map<string, unknown>             observations;
	map<string, float>               rewards;
	map<string, bool>                terminations;
	map<string, bool>                truncations;
	map<string, map<string,unknown>> infos;
	map<string, unknown>             actions;
	map<string, map<string,unknown>> data;

	action update_data {
		data <- [
			"Observations"::observations, "Rewards"::rewards,
			"Terminations"::terminations, "Truncations"::truncations, "Infos"::infos
		];
	}
}

species forager {
	string agent_id;
	int    agent_index;
	float  sensor_range      <- 30.0;
	float  previous_distance <- 0.0;
	bool   at_food           <- false;
	float  velocity_x        <- 0.0;
	float  velocity_y        <- 0.0;
	float  friction          <- 0.8;
	float  accel_force       <- 1.5;
	float  max_speed         <- 3.0;
	bool   bumped            <- false;
	PetzAgent bridge         <- PetzAgent[0];

	init { previous_distance <- location distance_to food_location; }

	action apply_action {
		bumped <- false;
		if (at_food) { return; }
		if (not (bridge.actions contains_key agent_id)) { return; }
		list act <- list(bridge.actions[agent_id]);
		if (length(act) < 2) { return; }
		float ax <- float(act[0]);
		float ay <- float(act[1]);
		velocity_x <- velocity_x * friction + ax * accel_force;
		velocity_y <- velocity_y * friction + ay * accel_force;
		float speed <- sqrt(velocity_x^2 + velocity_y^2);
		if (speed > max_speed) {
			velocity_x <- velocity_x * max_speed / speed;
			velocity_y <- velocity_y * max_speed / speed;
		}
		float move_x <- velocity_x;
		float move_y <- velocity_y;
		float new_x <- location.x + move_x;
		float new_y <- location.y + move_y;
		if (new_x < 1.0) { new_x <- 1.0; velocity_x <- 0.0; bumped <- true; }
		if (new_x > world_size - 1.0) { new_x <- world_size - 1.0; velocity_x <- 0.0; bumped <- true; }
		if (new_y < 1.0) { new_y <- 1.0; velocity_y <- 0.0; bumped <- true; }
		if (new_y > world_size - 1.0) { new_y <- world_size - 1.0; velocity_y <- 0.0; bumped <- true; }
		point new_loc <- {new_x, new_y};
		bool blocked <- false;
		loop obs over: obstacles { if (new_loc intersects obs) { blocked <- true; bumped <- true; break; } }
		if (!blocked) { location <- new_loc; }
		else {
			point slide_x <- {new_x, location.y};
			point slide_y <- {location.x, new_y};
			bool x_ok <- true;
			loop obs over: obstacles { if (slide_x intersects obs) { x_ok <- false; break; } }
			bool y_ok <- true;
			loop obs over: obstacles { if (slide_y intersects obs) { y_ok <- false; break; } }
			if (x_ok and y_ok) { location <- (abs(move_x) >= abs(move_y)) ? slide_x : slide_y; }
			else if (x_ok) { location <- slide_x; velocity_y <- 0.0; }
			else if (y_ok) { location <- slide_y; velocity_x <- 0.0; }
			else { velocity_x <- 0.0; velocity_y <- 0.0; }
		}
	}

	action compute_observation {
		float dist  <- location distance_to food_location;
		float angle <- location towards food_location;
		list<float> sensors;
		loop i from: 0 to: 7 {
			float ray_angle <- float(i) * 45.0;
			point ray_end <- {location.x + sensor_range * cos(ray_angle), location.y + sensor_range * sin(ray_angle)};
			geometry ray <- line([location, ray_end]);
			float closest <- 1.0;
			geometry border <- polyline([{0,0}, {world_size,0}, {world_size,world_size}, {0,world_size}, {0,0}]);
			if (ray intersects border) {
				geometry inter <- ray inter border;
				if (inter != nil) {
					float d <- (location distance_to inter.location) / sensor_range;
					closest <- min([closest, d]);
				}
			}
			loop obs over: obstacles {
				if (ray intersects obs) {
					geometry inter <- ray inter obs;
					if (inter != nil) { float d <- (location distance_to inter.location) / sensor_range; closest <- min([closest, d]); }
				}
			}
			add closest to: sensors;
		}
		float other_x <- 0.5; float other_y <- 0.5;
		forager other <- forager first_with (each.agent_id != agent_id);
		if (other != nil) { other_x <- other.location.x / world_size; other_y <- other.location.y / world_size; }
		bridge.observations[agent_id] <- [
			location.x / world_size, location.y / world_size,
			min([1.0, dist / (world_size * 1.414)]),
			(cos(angle) + 1.0) / 2.0, (sin(angle) + 1.0) / 2.0
		] + sensors + [other_x, other_y];
	}

	action compute_reward {
		float dist <- location distance_to food_location;
		if (dist < food_radius and not at_food) { at_food <- true; agents_at_food[agent_index] <- true; }
		if (agents_at_food all_match each) {
			bridge.rewards[agent_id] <- 100.0; bridge.terminations[agent_id] <- true;
			bridge.truncations[agent_id] <- false; bridge.infos[agent_id] <- []; return;
		}
		if (at_food) {
			bridge.rewards[agent_id] <- 0.5; bridge.terminations[agent_id] <- false;
			bridge.truncations[agent_id] <- false; bridge.infos[agent_id] <- []; return;
		}
		if (current_step >= max_steps) {
			bridge.rewards[agent_id] <- -1.0; bridge.terminations[agent_id] <- false;
			bridge.truncations[agent_id] <- true; bridge.infos[agent_id] <- []; return;
		}
		float delta <- previous_distance - dist;
		float step_reward <- delta * 2.0 - 0.01;
		if (bumped) { step_reward <- step_reward - 1.0; }
		bridge.rewards[agent_id] <- step_reward;
		previous_distance <- dist;
		bridge.terminations[agent_id] <- false; bridge.truncations[agent_id] <- false; bridge.infos[agent_id] <- [];
	}

	aspect default {
		rgb agent_color <- agent_index = 0 ? #blue : rgb(0, 180, 180);
		draw circle(0.8) color: at_food ? #orange : agent_color;
		draw circle(sensor_range) color: rgb(agent_color.red, agent_color.green, agent_color.blue, 20)
			border: rgb(agent_color.red, agent_color.green, agent_color.blue, 60);
	}
}

experiment petz_env {
	parameter "communication_port" var: gama_server_port;
	output {
		display "Multi-Agent World" type: 2d {
			graphics "background" { draw rectangle(world_size, world_size) at: {world_size/2, world_size/2} color: rgb(240, 240, 240); }
			graphics "obstacles" { loop obs over: obstacles { draw obs color: rgb(80, 80, 80); } }
			graphics "food" { draw circle(food_radius) at: food_location color: rgb(50, 180, 50); }
			species forager;
		}
	}
}
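Finally, a sketch of the "one independent PPO per agent" bookkeeping on the Python side: each agent gets its own policy and its own trajectory buffer, keyed by agent id. The random policy and the buffer layout below are illustrative stand-ins for the real PPO models and rollout storage in `train_forager_petz.py`:

```python
import random
from collections import defaultdict

agent_ids = ["forager_0", "forager_1"]

# Stand-in policies: one per agent. A real script would hold one PPO model here.
policies = {a: (lambda obs: [random.uniform(-1, 1), random.uniform(-1, 1)])
            for a in agent_ids}

# Each agent accumulates its OWN transitions; the learners never share data.
buffers = {a: defaultdict(list) for a in agent_ids}

def record(agent, obs, action, reward):
    """Append one transition to that agent's private buffer."""
    buffers[agent]["obs"].append(obs)
    buffers[agent]["actions"].append(action)
    buffers[agent]["rewards"].append(reward)

# One illustrative step to show the bookkeeping:
observations = {a: [0.0] * 15 for a in agent_ids}
actions = {a: policies[a](observations[a]) for a in agent_ids}
rewards = {a: 0.0 for a in agent_ids}
for a in agent_ids:
    record(a, observations[a], actions[a], rewards[a])
# After an episode, each buffer would feed its own, independent PPO update.
```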