ForagerRL_step8 - gama-platform/gama GitHub Wiki

8. The GymAgent Bridge

By Killian Trouillet



How GAMA and Python Communicate

The gama-gymnasium library works as follows:

  1. GAMA runs a simulation and exposes a WebSocket server.
  2. Python connects via the Gymnasium interface (gym.make(...)).
  3. Each simulation step, Python sends an action → GAMA returns an observation + reward.

The glue on the GAMA side is a special species called GymAgent.


The GymAgent Species

This species is required by gama-gymnasium. It must define exactly these attributes, along with the set_action and update_data actions:

species GymAgent {
    map<string, unknown> action_space;
    map<string, unknown> observation_space;
    
    list<float> state;      // Current observation sent to Python
    float reward;           // Reward for this step
    bool terminated;        // True when goal is reached (episode ends)
    bool truncated;         // True when max steps exceeded (episode ends)
    map<string, unknown> info;   // Additional info (can be empty)
    
    list<float> next_action; // Action received FROM Python
    
    map<string, unknown> data;   // Container read by the bridge
    
    // Called by gama-gymnasium with a JSON string like '[0.32, -0.08]'
    // Generic: works for any action type and any number of dimensions
    action set_action(string json_str) {
        next_action <- list(from_json(json_str));
        return "ok";
    }
    
    action update_data() {
        data <- [
            "State"::state,
            "Reward"::reward,
            "Terminated"::terminated,
            "Truncated"::truncated,
            "Info"::info
        ];
    }
}

In short: each simulation cycle, GAMA waits for Python to send an action, executes one step, then returns the resulting observation and reward.
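
On the Python side, the action travels to GAMA as a JSON string. The decoding done by set_action above can be mirrored in plain Python with the standard json module (a minimal sketch; the parse_action name is ours, not part of the library):

```python
import json

def parse_action(json_str: str) -> list[float]:
    """Mirror of GymAgent.set_action: decode a JSON array of numbers
    into a flat list of floats, whatever its length."""
    return [float(v) for v in json.loads(json_str)]

print(parse_action('[0.32, -0.08]'))  # → [0.32, -0.08]
```

Because the decoding is length-agnostic, the same set_action works for a 2-value continuous action or a single discrete index.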

How it works, step by step

Python                              GAMA
  │                                  │
  │  env.reset()                     │
  │ ─────────────────────────────►   │  Creates GymAgent, initializes state
  │  ◄─── obs, info ──────────────   │
  │                                  │
  │  env.step(action)                │
  │ ── action ────────────────────►  │  Sets GymAgent.next_action = action
  │                                  │  Runs one simulation step (reflex)
  │                                  │  Forager reads next_action, moves
  │                                  │  Forager computes obs, reward, termination flags
  │                                  │  Calls update_data
  │  ◄─── obs, reward, terminated, truncated ──   │
  │                                  │
  │  (repeat until terminated or truncated)
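
The handshake above can be sketched as a plain-Python mock that shows the contract each side honors. This is illustrative only (no GAMA, no gymnasium; MockGamaEnv and all its internals are made-up names):

```python
class MockGamaEnv:
    """Illustrative stand-in for the GAMA side of the bridge (not the real
    gama-gymnasium class): reset() returns (obs, info) and step(action)
    returns (obs, reward, terminated, truncated, info)."""

    def __init__(self, max_steps=300):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        # GAMA side: creates the GymAgent and builds the initial state
        self.t = 0
        return [0.0] * 13, {}

    def step(self, action):
        # GAMA side: sets next_action, runs one cycle, calls update_data
        self.t += 1
        obs = [0.0] * 13                      # would come from compute_observation
        reward = 0.0                          # would come from compute_reward
        terminated = False                    # goal reached?
        truncated = self.t >= self.max_steps  # step limit hit?
        return obs, reward, terminated, truncated, {}

env = MockGamaEnv(max_steps=2)
obs, info = env.reset()
while True:
    obs, reward, terminated, truncated, info = env.step([0.5, -0.5])
    if terminated or truncated:
        break
```

The real library replaces each method body with a WebSocket round-trip to GAMA, but the reset/step signatures are exactly what the Gymnasium API expects.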

Attribute Reference

Attribute           Type           Direction        Description
action_space        map            GAMA → Python    Defines what actions Python can send
observation_space   map            GAMA → Python    Defines the observation format
next_action         list<float>    Python → GAMA    The action chosen by the Python agent (set via JSON)
state               list<float>    GAMA → Python    Current observation vector
reward              float          GAMA → Python    Reward for the last action
terminated          bool           GAMA → Python    Goal reached?
truncated           bool           GAMA → Python    Timed out?
info                map            GAMA → Python    Extra debug data

Defining the Spaces

In the global init, after creating the GymAgent, we define the action space and observation space:

init {
    // ... obstacles and forager creation ...
    
    create GymAgent;
    
    // Action: 2 continuous values in [-1, 1]
    GymAgent[0].action_space <- [
        "type"::"Box",
        "low"::[-1.0, -1.0],
        "high"::[1.0, 1.0],
        "dtype"::"float"
    ];
    
    // Observation: 13 continuous values in [0, 1]
    GymAgent[0].observation_space <- [
        "type"::"Box",
        "low"::list_with(13, 0.0),
        "high"::list_with(13, 1.0),
        "dtype"::"float"
    ];
}
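
For intuition, the same Box definitions can be written as plain Python dicts, together with a small bounds-checking helper (in_box is a hypothetical helper we add for illustration; the real library builds proper Gymnasium spaces from these maps):

```python
# Plain-dict mirror of the GAMA space definitions above (illustrative only)
action_space = {
    "type": "Box",
    "low":  [-1.0, -1.0],
    "high": [1.0, 1.0],
    "dtype": "float",
}
observation_space = {
    "type": "Box",
    "low":  [0.0] * 13,   # Python equivalent of GAML's list_with(13, 0.0)
    "high": [1.0] * 13,
    "dtype": "float",
}

def in_box(space, values):
    """True if values has the right length and every component
    sits within the Box bounds (hypothetical helper)."""
    return (len(values) == len(space["low"])
            and all(lo <= v <= hi
                    for lo, v, hi in zip(space["low"], values, space["high"])))

assert in_box(action_space, [0.32, -0.08])
assert not in_box(action_space, [1.5, 0.0])   # out of bounds
```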

Space Types

The gama-gymnasium library supports two main space types:

Type       GAMA Format                                    Python Equivalent   Use Case
Discrete   ["type"::"Discrete", "n"::4]                   Discrete(4)         Grid actions (up/down/left/right)
Box        ["type"::"Box", "low"::[...], "high"::[...]]   Box(low, high)      Continuous values

In Part 1, we would have used Discrete(4). Here we use Box because our forager moves with a continuous velocity vector.

Why 13 Observation Values?

We'll define the full observation vector in Step 9, but here's a preview:

Values 0-1:   Agent position (x, y)
Value 2:      Distance to food
Values 3-4:   Direction to food (cos, sin)
Values 5-12:  8 obstacle sensors
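
As a preview of how such a vector might be assembled (the exact formulas come in Step 9; the normalizations below are illustrative assumptions, and build_observation is a made-up name):

```python
import math

def build_observation(x, y, food_x, food_y, sensors, world_size=100.0):
    """Hypothetical sketch of the 13-value vector: position (2),
    distance (1), direction (2), obstacle sensors (8), all in [0, 1]."""
    dx, dy = food_x - x, food_y - y
    dist = math.hypot(dx, dy)
    max_dist = math.hypot(world_size, world_size)  # world diagonal
    angle = math.atan2(dy, dx)
    return [
        x / world_size, y / world_size,   # values 0-1: normalized position
        dist / max_dist,                  # value 2: normalized distance
        (math.cos(angle) + 1) / 2,        # values 3-4: direction,
        (math.sin(angle) + 1) / 2,        #   rescaled from [-1, 1] to [0, 1]
    ] + list(sensors)                     # values 5-12: 8 sensor readings

obs = build_observation(5.0, 5.0, 95.0, 95.0, [1.0] * 8)
assert len(obs) == 13
```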

The Main Loop

The global reflex connects everything:

reflex main_loop {
    // Guard: skip if Python hasn't sent an action yet (e.g. right after reset)
    if (empty(GymAgent[0].next_action)) { return; }
    ask forager[0] {
        do apply_action();         // Read action from Python, move
        do compute_observation();  // Build the observation vector
        do compute_reward();       // Compute reward + check termination
    }
}

The empty() guard is critical. When GAMA starts, next_action is still empty because Python hasn't sent anything yet. Without this check, the forager would try to read a non-existent action and fail.

Each simulation cycle:

  1. apply_action: Reads gym_agent.next_action, translates it to movement
  2. compute_observation: Builds the 13-value state vector, writes it to gym_agent.state
  3. compute_reward: Computes the reward, sets terminated/truncated, calls update_data
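
The first of these steps, turning a [-1, 1] action into a displacement, might look like this in sketch form (the max_speed scaling and the clamping to the world bounds are our assumptions, not yet part of the tutorial):

```python
def apply_action(x, y, action, max_speed=2.0, world_size=100.0):
    """Hypothetical translation of a Box action into movement:
    clip each component to [-1, 1], scale by max_speed,
    then clamp the new position to the world."""
    ax = max(-1.0, min(1.0, action[0]))  # defensive clip to the Box bounds
    ay = max(-1.0, min(1.0, action[1]))
    new_x = min(max(x + ax * max_speed, 0.0), world_size)
    new_y = min(max(y + ay * max_speed, 0.0), world_size)
    return new_x, new_y

assert apply_action(5.0, 5.0, [1.0, -1.0]) == (7.0, 3.0)
```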

Complete Model

The model at the end of this step adds GymAgent, the spaces, and the main_loop reflex to the Step 7 skeleton. The forager species actions (apply_action, compute_observation, compute_reward) are stubs — they will be fully implemented in Step 9.

/**
 * Name: SmartForagerGym - Step 8: The GymAgent Bridge
 * Author: Killian Trouillet
 * Description: Adds the GymAgent bridge species, space definitions,
 *              and main reflex loop. Forager actions are stubs for now.
 * Tags: reinforcement-learning, gymnasium, gymAgent, tutorial
 */

model SmartForagerGym

global {
	float world_size <- 100.0;
	point food_location <- {95.0, 95.0};
	float food_radius <- 5.0;
	list<geometry> obstacles <- [];
	int max_steps <- 300;
	int current_step <- 0;
	int gama_server_port <- 0;

	init {
		obstacles << square(10) at_location {25.0, 25.0};
		obstacles << square(10) at_location {35.0, 25.0};
		obstacles << square(10) at_location {25.0, 35.0};
		obstacles << square(10) at_location {65.0, 45.0};
		obstacles << square(10) at_location {75.0, 45.0};
		obstacles << square(10) at_location {75.0, 55.0};

		create forager number: 1 {
			location <- {5.0, 5.0};
		}

		// ── GymAgent bridge ──────────────────────────────────────────────
		create GymAgent;

		GymAgent[0].action_space <- [
			"type"::"Box",
			"low"::[-1.0, -1.0],
			"high"::[1.0, 1.0],
			"dtype"::"float"
		];

		GymAgent[0].observation_space <- [
			"type"::"Box",
			"low"::list_with(13, 0.0),
			"high"::list_with(13, 1.0),
			"dtype"::"float"
		];
	}

	reflex main_loop {
		if (empty(GymAgent[0].next_action)) { return; }
		ask forager[0] {
			do apply_action();
			do compute_observation();
			do compute_reward();
		}
	}
}

// ── GymAgent bridge species (required by gama-gymnasium) ─────────────────────
species GymAgent {
	map<string, unknown> action_space;
	map<string, unknown> observation_space;

	list<float> state;
	float reward;
	bool terminated;
	bool truncated;
	map<string, unknown> info;

	list<float> next_action;
	map<string, unknown> data;

	// Called by gama-gymnasium with a JSON string like '[0.32, -0.08]'
	// Generic: works for any action type and any number of dimensions
	action set_action(string json_str) {
		next_action <- list(from_json(json_str));
		return "ok";
	}

	action update_data() {
		data <- [
			"State"::state,
			"Reward"::reward,
			"Terminated"::terminated,
			"Truncated"::truncated,
			"Info"::info
		];
	}
}

// ── Forager species (actions implemented in Step 9) ──────────────────────────
species forager {
	GymAgent gym_agent <- GymAgent[0];
	float sensor_range <- 30.0;
	float previous_distance <- 0.0;

	init {
		previous_distance <- location distance_to food_location;
	}

	action apply_action() {
		// To be implemented in Step 9
	}

	action compute_observation() {
		// To be implemented in Step 9
	}

	action compute_reward() {
		// To be implemented in Step 9
	}

	aspect default {
		draw circle(3) color: #blue;
		draw circle(sensor_range) color: rgb(0, 0, 255, 20) border: rgb(0, 0, 255, 60);
	}
}

experiment gym_env {
	parameter "communication_port" var: gama_server_port;
	output {
		display "Continuous World" type: 2d {
			graphics "background" {
				draw rectangle(world_size, world_size)
					at: {world_size / 2, world_size / 2}
					color: rgb(240, 240, 240);
			}
			graphics "obstacles" {
				loop obs over: obstacles { draw obs color: rgb(80, 80, 80); }
			}
			graphics "food" {
				draw circle(food_radius) at: food_location color: rgb(50, 180, 50);
			}
			species forager;
		}
	}
}