# Step 14: Training Multiple Agents

*By Killian Trouillet*

## Starting GAMA Headless

**Windows**

```bat
gama-headless.bat -socket 1001
```

**Linux / macOS**

```sh
./gama-headless.sh -socket 1001
```

Port 1000 is reserved for the GUI. Use any other port for headless training.
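
Before launching a training run, you can confirm that the server is actually listening using Python's standard `socket` module. This is only a sketch, assuming the host and port from the command above:

```python
import socket

# Sanity check: is the GAMA headless server accepting connections
# on the port passed to -socket (1001 here)?
try:
    with socket.create_connection(("localhost", 1001), timeout=5):
        print("GAMA headless server is reachable on port 1001")
except OSError as e:
    print(f"Cannot reach GAMA on port 1001: {e}")
```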


## Parameter-Shared PPO

In Part 2, we trained a single forager with PPO. Here we have two foragers, but they share the same observation/action structure and the same goal. So we use parameter sharing: one single neural network for both agents.

### Why parameter sharing?

| Criterion | Parameter sharing (ours) | Independent PPO | MADDPG |
|---|---|---|---|
| # Networks | 1 (shared) | 1 per agent | Complex |
| Data efficiency | Best: 2× data per update | Standard | Standard |
| Cooperation | Emerges naturally | Must emerge independently | Explicit |
| Implementation | Simple | Simple | Complex |

Each agent feeds its own observation (15 values, including the teammate's position) into the same network and gets its own action back. Because both agents contribute trajectory data to the same network, it receives twice as much training data per environment step.
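
To make "one network for both agents" concrete, here is a minimal actor-critic sketch in PyTorch. The sizes (15 inputs, 2 continuous actions) come from the text; the hidden widths and layer layout are illustrative assumptions, not the tutorial's exact architecture:

```python
import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """One network; every forager's observation passes through it."""
    def __init__(self, state_dim=15, action_dim=2, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.actor_mean = nn.Linear(hidden, action_dim)      # action distribution mean
        self.critic = nn.Linear(hidden, 1)                   # state-value estimate
        self.log_std = nn.Parameter(torch.zeros(action_dim)) # learnable exploration noise

    def forward(self, obs):
        h = self.backbone(obs)
        return self.actor_mean(h), self.log_std.exp(), self.critic(h)
```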


## Using GamaParallelEnv Directly

Unlike Part 2, where we used gym.make(), here we instantiate GamaParallelEnv directly; it implements the PettingZoo Parallel API:

```python
from gama_pettingzoo.gama_parallel_env import GamaParallelEnv

env = GamaParallelEnv(
    gaml_experiment_path="path/to/forager_petz.gaml",
    gaml_experiment_name="petz_env",
    gama_ip_address="localhost",
    gama_port=1001,
)

obs, infos = env.reset()
# {"forager_0": array([...]), "forager_1": array([...])}
```
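
Assuming GamaParallelEnv follows the standard PettingZoo Parallel API, you can inspect the per-agent spaces and run one random step to check the dict-in/dict-out cycle:

```python
# Spaces are queried per agent id in the PettingZoo Parallel API.
for agent_id in env.agents:
    print(agent_id, env.observation_space(agent_id), env.action_space(agent_id))

# One random step: actions go in as a dict, results come out as dicts.
actions = {a: env.action_space(a).sample() for a in env.agents}
obs, rewards, terminations, truncations, infos = env.step(actions)
```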

### Batch Inference

We query the shared network for all agents at once using select_actions_batch():

```python
# Collect observations from all active agents
active = [a for a in AGENT_IDS if a in obs]
obs_list = [np.array(obs[a], dtype=np.float32) for a in active]

# One forward pass through the shared network
actions_np, log_probs, values = agent.select_actions_batch(obs_list)

# Build the actions dict for PettingZoo
actions_dict = {a: actions_np[i] for i, a in enumerate(active)}
```

This is the same approach used in the Pistonball benchmark.
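
The tutorial does not show the body of select_actions_batch. A plausible version, reusing the SharedActorCritic sketch from earlier (so the network's return signature is an assumption), stacks the observations and runs a single forward pass:

```python
import numpy as np
import torch

def select_actions_batch(network, obs_list):
    """Stack N observations, run one forward pass, return per-agent results."""
    obs = torch.as_tensor(np.stack(obs_list))          # shape (N, 15)
    with torch.no_grad():
        mean, std, values = network(obs)               # one shared forward pass
        dist = torch.distributions.Normal(mean, std)
        actions = dist.sample()                        # shape (N, 2)
        log_probs = dist.log_prob(actions).sum(-1)     # one scalar per agent
    return actions.numpy(), log_probs.numpy(), values.squeeze(-1).numpy()
```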


## The Training Loop

```python
import numpy as np
import torch

AGENT_IDS = ["forager_0", "forager_1"]  # agent ids from env.reset() above
NUM_EPISODES = 500
UPDATE_EVERY = 2048

agent = PPOAgent(state_dim=15, action_dim=2)

total_steps = 0
agent_buffers = {a: RolloutBuffer() for a in AGENT_IDS}

for ep in range(1, NUM_EPISODES + 1):
    obs, _ = env.reset()
    step = 0
    done = False

    while not done and step < 300:
        active = [a for a in AGENT_IDS if a in obs]
        obs_list = [np.array(obs[a], dtype=np.float32) for a in active]
        actions_np, lps, vals = agent.select_actions_batch(obs_list)

        actions_dict = {}
        for i, a in enumerate(active):
            actions_dict[a] = actions_np[i]
            agent_buffers[a].states.append(torch.FloatTensor(obs_list[i]))
            agent_buffers[a].actions.append(torch.FloatTensor(actions_np[i]))
            agent_buffers[a].logprobs.append(torch.tensor(lps[i]))
            agent_buffers[a].values.append(torch.tensor(vals[i]))

        next_obs, rewards, terms, truncs, _ = env.step(actions_dict)

        for a in active:
            agent_buffers[a].rewards.append(rewards.get(a, 0.0))
            agent_buffers[a].dones.append(terms.get(a, False) or truncs.get(a, False))

        obs = next_obs
        step += 1
        total_steps += len(active)
        done = not env.agents or all(terms.get(a, False) for a in AGENT_IDS)

        # PPO update: all agents' data pooled into one gradient step
        if total_steps >= UPDATE_EVERY:
            agent.update(agent_buffers)
            agent_buffers = {a: RolloutBuffer() for a in AGENT_IDS}
            total_steps = 0
```
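
The loop stores transitions in one RolloutBuffer per agent. The tutorial does not show its definition; a minimal version consistent with the attribute names used above is just six lists:

```python
from dataclasses import dataclass, field

@dataclass
class RolloutBuffer:
    """Per-agent trajectory storage; one list per tracked quantity."""
    states: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    logprobs: list = field(default_factory=list)
    values: list = field(default_factory=list)
    rewards: list = field(default_factory=list)
    dones: list = field(default_factory=list)
```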

### Key differences from Part 2

| Aspect | Part 2 (Gymnasium) | Part 3 (PettingZoo) |
|---|---|---|
| Environment | `gym.make()` | `GamaParallelEnv()` |
| Observations | Single array | Dict `{agent_id: array}` |
| Actions | Single array | Dict `{agent_id: array}` |
| Rollout buffers | One buffer | One buffer per agent |
| PPO update | `agent.update(buffer)` | `agent.update(agent_buffers)` pools all agents' data |
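
The pooling inside `agent.update(agent_buffers)` is the crux of parameter sharing. Here is a sketch of what the batching step might look like; the function name, GAMMA, and the plain discounted-return computation are illustrative assumptions, and the tutorial's actual update would then run the clipped PPO loss over this batch:

```python
import torch

GAMMA = 0.99  # illustrative discount factor

def pool_buffers(agent_buffers):
    """Concatenate every agent's rollout into one PPO training batch."""
    states, actions, logprobs, returns = [], [], [], []
    for buf in agent_buffers.values():
        # Returns are computed per agent, so episode boundaries
        # (buf.dones) never leak across agents.
        R, rets = 0.0, []
        for r, d in zip(reversed(buf.rewards), reversed(buf.dones)):
            R = r + GAMMA * R * (1.0 - float(d))
            rets.insert(0, R)
        states += buf.states
        actions += buf.actions
        logprobs += buf.logprobs
        returns += rets
    return (torch.stack(states), torch.stack(actions),
            torch.stack(logprobs), torch.tensor(returns))
```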

## What to Expect

Typical training progress looks like this:

```
  Ep   10/500 | steps 300 | f0  -4.2  f1  -3.8 | success  0%
  Ep   50/500 | steps 241 | f0  +8.1  f1  +6.4 | success  5%
  Ep  100/500 | steps 127 | f0 +42.3  f1 +38.7 | success 30%
  Ep  200/500 | steps  74 | f0 +81.4  f1 +79.2 | success 72%
  Ep  300/500 | steps  59 | f0 +88.6  f1 +87.1 | success 85%
  Ep  500/500 | steps  52 | f0 +91.2  f1 +90.4 | success 92%
```

## Running the Training

```sh
cd models/petz
python train_forager_petz.py
```

The shared model is saved to `saved_models/ppo_forager.pth`.
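
To reload the shared policy later (for example, in the GUI test script), the usual PyTorch pattern applies. The import path and the `.network` attribute are assumptions about the tutorial's PPOAgent, not confirmed API:

```python
import torch
from train_forager_petz import PPOAgent  # assumed import; adjust to your layout

# Restore the shared policy for evaluation (attribute name is illustrative).
agent = PPOAgent(state_dim=15, action_dim=2)
agent.network.load_state_dict(torch.load("saved_models/ppo_forager.pth"))
agent.network.eval()  # inference mode for evaluation runs
```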


## Key Files

| File | Description |
|---|---|
| `models/petz/forager_petz.gaml` | GAMA model with PetzAgent bridge |
| `models/petz/train_forager_petz.py` | MARL training script (headless) |
| `models/petz/test_forager_petz.py` | Testing script (GUI visualization) |