ForagerRL_step8 - gama-platform/gama GitHub Wiki

8. The GymAgent Bridge

By Killian Trouillet



How GAMA and Python Communicate

The gama-gymnasium library works as follows:

  1. GAMA runs a simulation and exposes a WebSocket server.
  2. Python connects via the Gymnasium interface (gym.make(...)).
  3. Each simulation step, Python sends an action → GAMA returns an observation + reward.

The glue on the GAMA side is a special species called GymAgent.


The GymAgent Species

This species is required by gama-gymnasium. It must define exactly these attributes, along with the set_action and update_data actions:

species GymAgent {
    map<string, unknown> action_space;
    map<string, unknown> observation_space;
    
    list<float> state;      // Current observation sent to Python
    float reward;           // Reward for this step
    bool terminated;        // True when goal is reached (episode ends)
    bool truncated;         // True when max steps exceeded (episode ends)
    map<string, unknown> info;   // Additional info (can be empty)
    
    list<float> next_action; // Action received FROM Python
    
    map<string, unknown> data;   // Container read by the bridge
    
    // Called by gama-gymnasium with a JSON string like '[0.32, -0.08]'
    // Generic: works for any action type and any number of dimensions
    action set_action(string json_str) {
        next_action <- list(from_json(json_str));
        return "ok";
    }
    
    action update_data() {
        data <- [
            "State"::state,
            "Reward"::reward,
            "Terminated"::terminated,
            "Truncated"::truncated,
            "Info"::info
        ];
    }
}

In short: each simulation cycle, GAMA waits for Python to send an action, executes one step, then returns the resulting observation and reward.
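
On the Python side, the action travels to GAMA as a JSON string. The decoding done by set_action above can be mirrored in plain Python with the standard json module (a minimal sketch; the parse_action name is ours, not part of the library):

```python
import json

def parse_action(json_str: str) -> list[float]:
    """Mirror of GymAgent.set_action: decode a JSON array of numbers
    into a flat list of floats, whatever its length."""
    return [float(v) for v in json.loads(json_str)]

print(parse_action('[0.32, -0.08]'))  # → [0.32, -0.08]
```

Because the decoding is length-agnostic, the same set_action works for a 2-value continuous action or a single discrete index.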

How it works, step by step

Python                              GAMA
  │                                  │
  │  env.reset()                     │
  │ ─────────────────────────────►   │  Creates GymAgent, initializes state
  │  ◄─── obs, info ──────────────   │
  │                                  │
  │  env.step(action)                │
  │ ── action ────────────────────►  │  Sets GymAgent.next_action = action
  │                                  │  Runs one simulation step (reflex)
  │                                  │  Forager reads next_action, moves
  │                                  │  Forager computes obs, reward, termination flags
  │                                  │  Calls update_data
  │  ◄─── obs, reward, terminated, truncated ──   │
  │                                  │
  │  (repeat until terminated or truncated)
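
The handshake above can be sketched as a plain-Python mock that shows the contract each side honors. This is illustrative only (no GAMA, no gymnasium; MockGamaEnv and all its internals are made-up names):

```python
class MockGamaEnv:
    """Illustrative stand-in for the GAMA side of the bridge (not the real
    gama-gymnasium class): reset() returns (obs, info) and step(action)
    returns (obs, reward, terminated, truncated, info)."""

    def __init__(self, max_steps=300):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        # GAMA side: creates the GymAgent and builds the initial state
        self.t = 0
        return [0.0] * 13, {}

    def step(self, action):
        # GAMA side: sets next_action, runs one cycle, calls update_data
        self.t += 1
        obs = [0.0] * 13                      # would come from compute_observation
        reward = 0.0                          # would come from compute_reward
        terminated = False                    # goal reached?
        truncated = self.t >= self.max_steps  # step limit hit?
        return obs, reward, terminated, truncated, {}

env = MockGamaEnv(max_steps=2)
obs, info = env.reset()
while True:
    obs, reward, terminated, truncated, info = env.step([0.5, -0.5])
    if terminated or truncated:
        break
```

The real library replaces each method body with a WebSocket round-trip to GAMA, but the reset/step signatures are exactly what the Gymnasium API expects.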

Attribute Reference

Attribute           Type           Direction        Description
action_space        map            GAMA → Python    Defines what actions Python can send
observation_space   map            GAMA → Python    Defines the observation format
next_action         list<float>    Python → GAMA    The action chosen by the Python agent (set via JSON)
state               list<float>    GAMA → Python    Current observation vector
reward              float          GAMA → Python    Reward for the last action
terminated          bool           GAMA → Python    Goal reached?
truncated           bool           GAMA → Python    Timed out?
info                map            GAMA → Python    Extra debug data

Defining the Spaces

In the global init, after creating the GymAgent, we define the action space and observation space:

init {
    // ... obstacles and forager creation ...
    
    create GymAgent;
    
    // Action: 2 continuous values in [-1, 1]
    GymAgent[0].action_space <- [
        "type"::"Box",
        "low"::[-1.0, -1.0],
        "high"::[1.0, 1.0],
        "dtype"::"float"
    ];
    
    // Observation: 13 continuous values in [0, 1]
    GymAgent[0].observation_space <- [
        "type"::"Box",
        "low"::list_with(13, 0.0),
        "high"::list_with(13, 1.0),
        "dtype"::"float"
    ];
}
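
For intuition, the same Box definitions can be written as plain Python dicts, together with a small bounds-checking helper (in_box is a hypothetical helper we add for illustration; the real library builds proper Gymnasium spaces from these maps):

```python
# Plain-dict mirror of the GAMA space definitions above (illustrative only)
action_space = {
    "type": "Box",
    "low":  [-1.0, -1.0],
    "high": [1.0, 1.0],
    "dtype": "float",
}
observation_space = {
    "type": "Box",
    "low":  [0.0] * 13,   # Python equivalent of GAML's list_with(13, 0.0)
    "high": [1.0] * 13,
    "dtype": "float",
}

def in_box(space, values):
    """True if values has the right length and every component
    sits within the Box bounds (hypothetical helper)."""
    return (len(values) == len(space["low"])
            and all(lo <= v <= hi
                    for lo, v, hi in zip(space["low"], values, space["high"])))

assert in_box(action_space, [0.32, -0.08])
assert not in_box(action_space, [1.5, 0.0])   # out of bounds
```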

Space Types

The gama-gymnasium library supports two main space types:

Type       GAMA Format                                    Python Equivalent   Use Case
Discrete   ["type"::"Discrete", "n"::4]                   Discrete(4)         Grid actions (up/down/left/right)
Box        ["type"::"Box", "low"::[...], "high"::[...]]   Box(low, high)      Continuous values

In Part 1, we would have used Discrete(4). Here we use Box because our forager moves with a continuous velocity vector.

Why 13 Observation Values?

We'll define the full observation vector in Step 9, but here's a preview:

Values 0-1:   Agent position (x, y)
Value 2:      Distance to food
Values 3-4:   Direction to food (cos, sin)
Values 5-12:  8 obstacle sensors
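
As a preview of how such a vector might be assembled (the exact formulas come in Step 9; the normalizations below are illustrative assumptions, and build_observation is a made-up name):

```python
import math

def build_observation(x, y, food_x, food_y, sensors, world_size=100.0):
    """Hypothetical sketch of the 13-value vector: position (2),
    distance (1), direction (2), obstacle sensors (8), all in [0, 1]."""
    dx, dy = food_x - x, food_y - y
    dist = math.hypot(dx, dy)
    max_dist = math.hypot(world_size, world_size)  # world diagonal
    angle = math.atan2(dy, dx)
    return [
        x / world_size, y / world_size,   # values 0-1: normalized position
        dist / max_dist,                  # value 2: normalized distance
        (math.cos(angle) + 1) / 2,        # values 3-4: direction,
        (math.sin(angle) + 1) / 2,        #   rescaled from [-1, 1] to [0, 1]
    ] + list(sensors)                     # values 5-12: 8 sensor readings

obs = build_observation(5.0, 5.0, 95.0, 95.0, [1.0] * 8)
assert len(obs) == 13
```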

The Main Loop

The global reflex connects everything:

reflex main_loop {
    // Guard: skip if Python hasn't sent an action yet (e.g. right after reset)
    if (empty(GymAgent[0].next_action)) { return; }
    ask forager[0] {
        do apply_action();         // Read action from Python, move
        do compute_observation();  // Build the observation vector
        do compute_reward();       // Compute reward + check termination
    }
}

The empty() guard is critical. When GAMA starts, next_action is still empty because Python hasn't sent anything yet. Without this check, the forager would try to read a non-existent action and fail.

Each simulation cycle:

  1. apply_action: Reads gym_agent.next_action, translates it to movement
  2. compute_observation: Builds the 13-value state vector, writes it to gym_agent.state
  3. compute_reward: Computes the reward, sets terminated/truncated, calls update_data
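
The first of these steps, turning a [-1, 1] action into a displacement, might look like this in sketch form (the max_speed scaling and the clamping to the world bounds are our assumptions, not yet part of the tutorial):

```python
def apply_action(x, y, action, max_speed=2.0, world_size=100.0):
    """Hypothetical translation of a Box action into movement:
    clip each component to [-1, 1], scale by max_speed,
    then clamp the new position to the world."""
    ax = max(-1.0, min(1.0, action[0]))  # defensive clip to the Box bounds
    ay = max(-1.0, min(1.0, action[1]))
    new_x = min(max(x + ax * max_speed, 0.0), world_size)
    new_y = min(max(y + ay * max_speed, 0.0), world_size)
    return new_x, new_y

assert apply_action(5.0, 5.0, [1.0, -1.0]) == (7.0, 3.0)
```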

Complete Model

The model at the end of this step adds GymAgent, the spaces, and the main_loop reflex to the Step 7 skeleton. The forager species actions (apply_action, compute_observation, compute_reward) are stubs — they will be fully implemented in Step 9.

/**
 * Name: SmartForagerGym - Step 8: The GymAgent Bridge
 * Author: Killian Trouillet
 * Description: Adds the GymAgent bridge species, space definitions,
 *              and main reflex loop. Forager actions are stubs for now.
 * Tags: reinforcement-learning, gymnasium, gymAgent, tutorial
 */

model SmartForagerGym

global {
	float world_size <- 100.0;
	point food_location <- {95.0, 95.0};
	float food_radius <- 5.0;
	list<geometry> obstacles <- [];
	int max_steps <- 300;
	int current_step <- 0;
	int gama_server_port <- 0;

	init {
		obstacles << square(10) at_location {25.0, 25.0};
		obstacles << square(10) at_location {35.0, 25.0};
		obstacles << square(10) at_location {25.0, 35.0};
		obstacles << square(10) at_location {65.0, 45.0};
		obstacles << square(10) at_location {75.0, 45.0};
		obstacles << square(10) at_location {75.0, 55.0};

		create forager number: 1 {
			location <- {5.0, 5.0};
		}

		// ── GymAgent bridge ──────────────────────────────────────────────
		create GymAgent;

		GymAgent[0].action_space <- [
			"type"::"Box",
			"low"::[-1.0, -1.0],
			"high"::[1.0, 1.0],
			"dtype"::"float"
		];

		GymAgent[0].observation_space <- [
			"type"::"Box",
			"low"::list_with(13, 0.0),
			"high"::list_with(13, 1.0),
			"dtype"::"float"
		];
	}

	reflex main_loop {
		if (empty(GymAgent[0].next_action)) { return; }
		ask forager[0] {
			do apply_action();
			do compute_observation();
			do compute_reward();
		}
	}
}

// ── GymAgent bridge species (required by gama-gymnasium) ─────────────────────
species GymAgent {
	map<string, unknown> action_space;
	map<string, unknown> observation_space;

	list<float> state;
	float reward;
	bool terminated;
	bool truncated;
	map<string, unknown> info;

	list<float> next_action;
	map<string, unknown> data;

	// Called by gama-gymnasium with a JSON string like '[0.32, -0.08]'
	// Generic: works for any action type and any number of dimensions
	action set_action(string json_str) {
		next_action <- list(from_json(json_str));
		return "ok";
	}

	action update_data() {
		data <- [
			"State"::state,
			"Reward"::reward,
			"Terminated"::terminated,
			"Truncated"::truncated,
			"Info"::info
		];
	}
}

// ── Forager species (actions implemented in Step 9) ──────────────────────────
species forager {
	GymAgent gym_agent <- GymAgent[0];
	float sensor_range <- 30.0;
	float previous_distance <- 0.0;

	init {
		previous_distance <- location distance_to food_location;
	}

	action apply_action() {
		// To be implemented in Step 9
	}

	action compute_observation() {
		// To be implemented in Step 9
	}

	action compute_reward() {
		// To be implemented in Step 9
	}

	aspect default {
		draw circle(3) color: #blue;
		draw circle(sensor_range) color: rgb(0, 0, 255, 20) border: rgb(0, 0, 255, 60);
	}
}

experiment gym_env {
	parameter "communication_port" var: gama_server_port;
	output {
		display "Continuous World" type: 2d {
			graphics "background" {
				draw rectangle(world_size, world_size)
					at: {world_size / 2, world_size / 2}
					color: rgb(240, 240, 240);
			}
			graphics "obstacles" {
				loop obs over: obstacles { draw obs color: rgb(80, 80, 80); }
			}
			graphics "food" {
				draw circle(food_radius) at: food_location color: rgb(50, 180, 50);
			}
			species forager;
		}
	}
}