ForagerRL_step14 - gama-platform/gama GitHub Wiki
By Killian Trouillet
```sh
# Windows
gama-headless.bat -socket 1001
# Linux / macOS
./gama-headless.sh -socket 1001
```

Port `1000` is reserved for the GUI. Use any other port for headless training.
In Part 2, we trained a single forager with PPO. Here we have two foragers, but they share the same observation/action structure and the same goal, so we use parameter sharing: a single neural network controls both agents.
| Criterion | Parameter Sharing (ours) | Independent PPO | MADDPG |
|---|---|---|---|
| # Networks | 1 (shared) | 1 per agent | 1 actor + 1 centralized critic per agent |
| Data efficiency | Best — 2× data per update | Standard | Standard |
| Cooperation | Emerges naturally | Must emerge independently | Explicit |
| Implementation | Simple | Simple | Complex |
Each agent feeds its own observation (15 values, including the teammate's position) into the same network and receives its own action back. Because both agents contribute trajectory data to the same network, each update sees roughly twice as much experience.
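To make the weight sharing concrete, here is a minimal sketch of a shared actor-critic in PyTorch. The class name and layer sizes are illustrative assumptions, not the tutorial's actual network; the point is that both foragers' observations pass through the same weights in one batched forward pass.

```python
import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """One network serves every forager: same weights, per-agent inputs."""

    def __init__(self, state_dim=15, action_dim=2, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.actor_mean = nn.Linear(hidden, action_dim)  # continuous action means
        self.critic = nn.Linear(hidden, 1)               # state-value estimate

    def forward(self, obs):
        h = self.body(obs)
        return self.actor_mean(h), self.critic(h)

# Both agents' observations go through the same weights in one batch
net = SharedActorCritic()
obs_batch = torch.randn(2, 15)  # one row per forager
means, values = net(obs_batch)
print(means.shape, values.shape)  # torch.Size([2, 2]) torch.Size([2, 1])
```

Because there is only one set of weights, a gradient step computed from either agent's data improves the policy of both.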
Unlike Part 2 where we used gym.make(), here we use GamaParallelEnv directly — the PettingZoo Parallel API:
```python
from gama_pettingzoo.gama_parallel_env import GamaParallelEnv

env = GamaParallelEnv(
    gaml_experiment_path="path/to/forager_petz.gaml",
    gaml_experiment_name="petz_env",
    gama_ip_address="localhost",
    gama_port=1001,
)

obs, infos = env.reset()
# {"forager_0": array([...]), "forager_1": array([...])}
```

We query the shared network for all agents at once using `select_actions_batch()`:
```python
# Collect observations from all active agents
active = [a for a in AGENT_IDS if a in obs]
obs_list = [np.array(obs[a], dtype=np.float32) for a in active]

# One forward pass through the shared network
actions_np, log_probs, values = agent.select_actions_batch(obs_list)

# Build the actions dict for PettingZoo
actions_dict = {a: actions_np[i] for i, a in enumerate(active)}
```

This is the same approach used in the Pistonball benchmark.
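For reference, here is one plausible internal implementation of `select_actions_batch()`, a sketch assuming a diagonal-Gaussian policy head. The class body and architecture are illustrative, not the tutorial's actual `PPOAgent`; what matters is that the per-agent observations are stacked into a single tensor so one forward pass serves every agent.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.distributions import Normal

class PPOAgent:
    """Minimal sketch: only the pieces select_actions_batch needs."""

    def __init__(self, state_dim=15, action_dim=2, hidden=64):
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.actor_mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))  # shared action std
        self.critic = nn.Linear(hidden, 1)

    @torch.no_grad()
    def select_actions_batch(self, obs_list):
        # Stack per-agent observations into one batch: a single forward pass
        obs = torch.as_tensor(np.stack(obs_list), dtype=torch.float32)
        h = self.body(obs)
        dist = Normal(self.actor_mean(h), self.log_std.exp())
        actions = dist.sample()
        log_probs = dist.log_prob(actions).sum(dim=-1)  # joint log-prob per agent
        values = self.critic(h).squeeze(-1)
        return actions.numpy(), log_probs.numpy(), values.numpy()

agent = PPOAgent()
acts, lps, vals = agent.select_actions_batch([np.zeros(15), np.ones(15)])
print(acts.shape, lps.shape, vals.shape)  # (2, 2) (2,) (2,)
```

The returned arrays line up index-for-index with the `active` list, which is why the loop can rebuild the per-agent dict with `enumerate(active)`.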
```python
AGENT_IDS = ["forager_0", "forager_1"]
NUM_EPISODES = 500

agent = PPOAgent(state_dim=15, action_dim=2)
UPDATE_EVERY = 2048
total_steps = 0
agent_buffers = {a: RolloutBuffer() for a in AGENT_IDS}

for ep in range(1, NUM_EPISODES + 1):
    obs, _ = env.reset()
    step = 0
    done = False
    while not done and step < 300:
        active = [a for a in AGENT_IDS if a in obs]
        obs_list = [np.array(obs[a], dtype=np.float32) for a in active]
        actions_np, lps, vals = agent.select_actions_batch(obs_list)

        actions_dict = {}
        for i, a in enumerate(active):
            actions_dict[a] = actions_np[i]
            agent_buffers[a].states.append(torch.FloatTensor(obs_list[i]))
            agent_buffers[a].actions.append(torch.FloatTensor(actions_np[i]))
            agent_buffers[a].logprobs.append(torch.tensor(lps[i]))
            agent_buffers[a].values.append(torch.tensor(vals[i]))

        next_obs, rewards, terms, truncs, _ = env.step(actions_dict)
        for a in active:
            agent_buffers[a].rewards.append(rewards.get(a, 0.0))
            agent_buffers[a].dones.append(terms.get(a, False) or truncs.get(a, False))

        obs = next_obs
        step += 1
        total_steps += len(active)
        done = not env.agents or all(terms.get(a, False) for a in AGENT_IDS)

    # PPO update: all agents' data pooled into one gradient step
    if total_steps >= UPDATE_EVERY:
        agent.update(agent_buffers)
        agent_buffers = {a: RolloutBuffer() for a in AGENT_IDS}
        total_steps = 0
```

| Aspect | Part 2 (Gymnasium) | Part 3 (PettingZoo) |
|---|---|---|
| Environment | `gym.make()` | `GamaParallelEnv()` |
| Observations | Single array | Dict `{agent_id: array}` |
| Actions | Single array | Dict `{agent_id: array}` |
| Rollout buffers | One buffer | One buffer per agent |
| PPO update | `agent.update(buffer)` | `agent.update(agent_buffers)` (pools all agents' data) |
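The pooled update works because every buffer holds trajectories generated by the same shared policy, so they can be concatenated into one training batch. A minimal sketch of that pooling step follows; `pool_buffers` is a hypothetical helper illustrating what `agent.update(agent_buffers)` does internally, not the library's actual code.

```python
import torch

class RolloutBuffer:
    """Per-agent storage matching the fields filled in the training loop."""

    def __init__(self):
        self.states, self.actions, self.logprobs = [], [], []
        self.values, self.rewards, self.dones = [], [], []

def pool_buffers(agent_buffers):
    """Concatenate every agent's trajectory into one PPO training batch."""
    states = torch.stack([s for b in agent_buffers.values() for s in b.states])
    actions = torch.stack([a for b in agent_buffers.values() for a in b.actions])
    logprobs = torch.stack([lp for b in agent_buffers.values() for lp in b.logprobs])
    return states, actions, logprobs

# Fill two buffers with three dummy transitions each
bufs = {a: RolloutBuffer() for a in ("forager_0", "forager_1")}
for b in bufs.values():
    for _ in range(3):
        b.states.append(torch.zeros(15))
        b.actions.append(torch.zeros(2))
        b.logprobs.append(torch.tensor(0.0))

states, actions, logprobs = pool_buffers(bufs)
print(states.shape)  # torch.Size([6, 15]): 2 agents x 3 steps
```

Keeping one buffer per agent (rather than one flat buffer) matters for advantage estimation: returns must be computed along each agent's own trajectory before the data is pooled.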
```
Ep  10/500 | steps 300 | f0  -4.2  f1  -3.8 | success  0%
Ep  50/500 | steps 241 | f0  +8.1  f1  +6.4 | success  5%
Ep 100/500 | steps 127 | f0 +42.3  f1 +38.7 | success 30%
Ep 200/500 | steps  74 | f0 +81.4  f1 +79.2 | success 72%
Ep 300/500 | steps  59 | f0 +88.6  f1 +87.1 | success 85%
Ep 500/500 | steps  52 | f0 +91.2  f1 +90.4 | success 92%
```
```sh
cd models/petz
python train_forager_petz.py
```

The shared model is saved to `saved_models/ppo_forager.pth`.
| File | Description |
|---|---|
| `models/petz/forager_petz.gaml` | GAMA model with PetzAgent bridge |
| `models/petz/train_forager_petz.py` | MARL training script (headless) |
| `models/petz/test_forager_petz.py` | Testing script (GUI visualization) |
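The testing script reloads the saved weights before replaying the policy with visualization. A minimal save/load round-trip sketch, assuming the shared policy is an ordinary torch module; the architecture and file name here mirror the conventions above but are illustrative.

```python
import torch
import torch.nn as nn

def build_policy():
    # Illustrative stand-in for the shared forager network
    return nn.Sequential(nn.Linear(15, 64), nn.Tanh(), nn.Linear(64, 2))

# Training side: persist only the weights (state_dict), not the module object
net = build_policy()
torch.save(net.state_dict(), "ppo_forager.pth")

# Testing side: rebuild the same architecture, then load the weights
restored = build_policy()
restored.load_state_dict(torch.load("ppo_forager.pth"))
restored.eval()  # deterministic evaluation mode for the test script
```

Saving the `state_dict` rather than the whole module keeps the checkpoint portable: the test script only needs the same architecture definition, not the training code.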