System UnrealPlugin BaseBridge MultiEnvironmentBridge - kcccr123/ue-reinforcement-learning GitHub Wiki

MultiEnvBridge

The MultiEnvBridge class extends BaseBridge to support environments with multiple simulation instances running in parallel. It is designed for training agents with batched environments to improve throughput and sample efficiency, and is intended to be used in conjunction with the MultiTcpConnection class.


Overview

This bridge allows a Python agent to send and receive batched action/observation data across several instances of the environment. Each environment operates independently but shares the same communication channel.

Upon connection, the bridge sends the following handshake:

CONFIG:OBS={ObservationSpaceSize};ACT={ActionSpaceSize};ENV_TYPE=MULTI;ENV_COUNT={NumEnvironments}

This tells the agent how many environments are running and how to interpret the incoming data.
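On the receiving side, the handshake can be split into key/value pairs. The field names (OBS, ACT, ENV_TYPE, ENV_COUNT) come from the format above; the parsing helper below is an illustrative sketch using plain std::string, not plugin source:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical client-side helper: split the CONFIG handshake into
// key/value pairs keyed by field name (OBS, ACT, ENV_TYPE, ENV_COUNT).
std::map<std::string, std::string> ParseHandshake(const std::string& Msg)
{
    std::map<std::string, std::string> Fields;
    // Strip the leading "CONFIG:" prefix if present.
    const std::string Body =
        Msg.rfind("CONFIG:", 0) == 0 ? Msg.substr(7) : Msg;
    std::stringstream Stream(Body);
    std::string Token;
    while (std::getline(Stream, Token, ';'))
    {
        const size_t Eq = Token.find('=');
        if (Eq != std::string::npos)
            Fields[Token.substr(0, Eq)] = Token.substr(Eq + 1);
    }
    return Fields;
}
```

With ENV_COUNT in hand, the agent knows how many per-environment slots to expect in every batched message.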


Usage

Subclass MultiEnvBridge and override the environment-specific callbacks to handle reward calculation, state serialization, reset logic, and action application for each environment instance.

Call InitializeEnvironments(int32 InNumEnvironments, bool bInferenceMode) before training begins to set up per-instance state.


Initialization and Configuration

void InitializeEnvironments(int32 InNumEnvironments, bool bInferenceMode)

Initializes the bridge for the requested number of environments.

  • If bInferenceMode is true, forces NumEnvironments to 1.
  • Initializes internal per-environment state arrays (e.g., bIsActionRunning).
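The two points above can be sketched as follows. Member names mirror the wiki (NumEnvironments, bIsActionRunning), but the struct body is a hypothetical illustration using standard-library types, not plugin source:

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch of the initialization logic described above.
struct FMultiEnvState
{
    int NumEnvironments = 0;
    std::vector<bool> bIsActionRunning;

    void InitializeEnvironments(int InNumEnvironments, bool bInferenceMode)
    {
        // Inference mode always drives a single environment (index 0).
        NumEnvironments = bInferenceMode ? 1 : InNumEnvironments;
        // One "action in flight" flag per environment, all initially idle.
        bIsActionRunning.assign(NumEnvironments, false);
    }
};
```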

FString BuildHandshake()

Returns a handshake string that includes observation/action dimensions and number of environments:

CONFIG:OBS={ObservationSpaceSize};ACT={ActionSpaceSize};ENV_TYPE=MULTI;ENV_COUNT={NumEnvironments}

Execution Control

void UpdateRL(float DeltaTime)

Main per-tick update loop:

  • Training Mode:
    • Receives a batch of actions from Python, each associated with an environment ID.
    • Parses and applies each action, or resets the environment if requested.
    • Waits until IsActionRunningForEnv(i) is false.
    • Computes reward and sends updated observation/reward/done for each environment.
  • Inference Mode:
    • Only operates on environment 0.
    • Continuously generates and applies actions using a local model.

The local model used in inference mode is set via the SetInferenceInterface method inherited from the parent class. See the BaseBridge and Inference Interface pages for more details.
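The send phase of the training loop described above can be sketched as composing one OBS/REW/DONE/ENV entry per finished environment and joining them with "||". The struct and helper names below are illustrative assumptions; only the wire format comes from this page:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-environment step result.
struct FEnvResult
{
    std::string Obs;  // serialized observation, e.g. "1,2,3"
    float Reward;
    bool bDone;
};

// Compose the batched result message sent back to the Python agent.
std::string BuildResultBatch(const std::vector<FEnvResult>& Results)
{
    std::ostringstream Out;
    for (size_t EnvId = 0; EnvId < Results.size(); ++EnvId)
    {
        if (EnvId > 0) Out << "||";
        Out << "OBS=" << Results[EnvId].Obs
            << ";REW=" << Results[EnvId].Reward
            << ";DONE=" << (Results[EnvId].bDone ? 1 : 0)
            << ";ENV=" << EnvId;
    }
    return Out.str();
}
```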


Environment Callbacks

float CalculateRewardForEnv(int32 EnvId, bool& bDone)

Compute and return the reward for environment EnvId. Set bDone = true if that environment’s episode should terminate.

FString CreateStateStringForEnv(int32 EnvId)

Serialize the full observation for environment EnvId.

EXPECTED FORMAT:

"<obs_0>,<obs_1>,...,<obs_n>"

Use a flat string of numerical values to represent each observation.
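A serializer producing that format might look like the sketch below. The helper name and the use of std::vector are illustrative assumptions standing in for your pawn's actual observation data:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Flatten one environment's observation vector into the
// "<obs_0>,<obs_1>,...,<obs_n>" format expected by the bridge.
std::string MakeStateString(const std::vector<float>& Observation)
{
    std::ostringstream Out;
    for (size_t i = 0; i < Observation.size(); ++i)
    {
        if (i > 0) Out << ",";
        Out << Observation[i];
    }
    return Out.str();
}
```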

void HandleResetForEnv(int32 EnvId)

Reset the state of environment EnvId to its initial configuration.

void HandleResponseActionsForEnv(int32 EnvId, const FString& actions)

Parse and apply the action string for environment EnvId. This string should be interpreted the same way as a single-agent action command.
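Since each action string is a flat list of values (mirroring the observation format), a typical first step is to split it into floats before applying them to the pawn. The helper below is an illustrative sketch, not plugin source:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated action string (e.g. "0.25,-1.0") into floats.
std::vector<float> ParseActionString(const std::string& Actions)
{
    std::vector<float> Values;
    std::stringstream Stream(Actions);
    std::string Token;
    while (std::getline(Stream, Token, ','))
        Values.push_back(std::stof(Token));
    return Values;
}
```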

bool IsActionRunningForEnv(int32 EnvId)

Return true if the environment is still processing an action. Once this returns false, the bridge computes the reward and sends the new observation for that environment.


TCP Communication

Inherits behavior from BaseBridge.

  • Receives a batch of actions formatted as:
ACT=...;ENV=0||ACT=...;ENV=1||...
  • Sends a batch of results formatted as:
OBS=...;REW=...;DONE=0;ENV=0||OBS=...;REW=...;DONE=1;ENV=1||...
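Either direction of the batched protocol above can be handled by splitting on the "||" separator and reading the ENV field of each entry. The helper names below are illustrative assumptions; the wire format matches this page:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a batched message on the "||" separator.
std::vector<std::string> SplitBatch(const std::string& Batch)
{
    std::vector<std::string> Parts;
    size_t Start = 0, Sep;
    while ((Sep = Batch.find("||", Start)) != std::string::npos)
    {
        Parts.push_back(Batch.substr(Start, Sep - Start));
        Start = Sep + 2;
    }
    Parts.push_back(Batch.substr(Start));
    return Parts;
}

// Extract the integer after "ENV=" in one batch entry (-1 if absent).
int ExtractEnvId(const std::string& Entry)
{
    const size_t Pos = Entry.find("ENV=");
    return Pos == std::string::npos ? -1 : std::stoi(Entry.substr(Pos + 4));
}
```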

Ticking Methods

Inherits tick behavior from BaseBridge.

  • Ticks every frame and runs UpdateRL() if active.
