MultiEnvBridge - kcccr123/ue-reinforcement-learning GitHub Wiki
The `MultiEnvBridge` class extends `BaseBridge` to support environments with multiple simulation instances running in parallel. It is designed for training agents in batched environments to improve throughput and sample efficiency, and is intended to be used in conjunction with the `MultiTcpConnection` class.
This bridge allows a Python agent to send and receive batched action/observation data across several instances of the environment. Each environment operates independently but shares the same communication channel.
Upon connection, the bridge sends the following handshake:

```
CONFIG:OBS={ObservationSpaceSize};ACT={ActionSpaceSize};ENV_TYPE=MULTI;ENV_COUNT={NumEnvironments}
```

This tells the agent how many environments are running and how to interpret the incoming data.
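On the agent side, the handshake can be split into its fields before any training data is exchanged. Below is a hypothetical Python helper (not part of the plugin) that parses the `CONFIG:` string shown above:

```python
def parse_handshake(config: str) -> dict:
    """Parse 'CONFIG:OBS=...;ACT=...;ENV_TYPE=MULTI;ENV_COUNT=...'."""
    if not config.startswith("CONFIG:"):
        raise ValueError("not a handshake message")
    # fields are ';'-separated KEY=VALUE pairs after the 'CONFIG:' prefix
    fields = dict(kv.split("=", 1) for kv in config[len("CONFIG:"):].split(";"))
    return {
        "obs_size": int(fields["OBS"]),
        "act_size": int(fields["ACT"]),
        "env_type": fields["ENV_TYPE"],
        "num_envs": int(fields["ENV_COUNT"]),
    }
```

For example, `parse_handshake("CONFIG:OBS=8;ACT=2;ENV_TYPE=MULTI;ENV_COUNT=4")` tells the agent to expect 4 environments with 8 observation values and 2 action values each.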
Subclass `MultiEnvBridge` and override the environment-specific callbacks to handle reward calculation, state serialization, reset logic, and action application for each environment instance.
Call `InitializeEnvironments(int32 InNumEnvironments, bool bInferenceMode)` before training begins to set up per-instance state.
Initializes the number of environments:
- If `bInferenceMode` is true, forces `NumEnvironments` to 1.
- Initializes internal state arrays (e.g., `bIsActionRunning`).
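The initialization rules above can be sketched as follows. This is an illustrative Python sketch of the described behavior, not the plugin's actual C++ implementation:

```python
def initialize_environments(num_environments: int, inference_mode: bool):
    """Mirror the documented InitializeEnvironments behavior."""
    if inference_mode:
        # inference mode always operates on a single environment
        num_environments = 1
    # one "action running" flag per environment instance
    is_action_running = [False] * num_environments
    return num_environments, is_action_running
```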
Returns a handshake string that includes the observation/action dimensions and the number of environments:

```
CONFIG:OBS={ObservationSpaceSize};ACT={ActionSpaceSize};ENV_TYPE=MULTI;ENV_COUNT={NumEnvironments}
```
Main training loop:
- Training Mode:
  - Receives a batch of actions from Python, each associated with an environment ID.
  - Parses and applies each action, or resets the environment if requested.
  - Waits until `IsActionRunningForEnv(i)` returns false.
  - Computes the reward and sends the updated observation/reward/done flag for each environment.
- Inference Mode:
  - Operates only on environment 0.
  - Continuously generates and applies actions using a local model.
The local model for inference mode is set via the `SetInferenceInterface` method inherited from the parent class. See the BaseBridge and Inference Interface pages for more details.
Compute and return the reward for environment `EnvId`. Set `bDone = true` if that environment's episode should terminate.
Serialize the full observation for environment `EnvId`.

Expected format:

```
"<obs_0>,<obs_1>,...,<obs_n>"
```

Use a flat string of numerical values to represent each observation.
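The serialization itself is a simple join of numeric values. Shown here as a hypothetical Python helper for clarity; in practice this callback is implemented in C++ in your `MultiEnvBridge` subclass:

```python
def make_state_string(obs):
    """Serialize a flat sequence of numbers as '<obs_0>,<obs_1>,...'."""
    return ",".join(str(float(v)) for v in obs)
```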
Reset the state of environment `EnvId` to its initial configuration.
Parse and apply the action string for environment `EnvId`. Interpret this string the same way you would a single-agent command.
Return `true` if the environment is still processing an action. When this returns `false`, the reward and new observation will be computed and sent.
Inherits behavior from `BaseBridge`.
- Receives a batch of actions formatted as: `ACT=...;ENV=0||ACT=...;ENV=1||...`
- Sends a batch of results formatted as: `OBS=...;REW=...;DONE=0;ENV=0||OBS=...;REW=...;DONE=1;ENV=1||...`
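The batched wire format uses `||` to separate per-environment records and `;` to separate fields within a record. The hypothetical agent-side helpers below (not part of the plugin) show how a Python client could build an action batch and parse a result batch under that assumption:

```python
def build_action_batch(actions):
    """Build 'ACT=...;ENV=0||ACT=...;ENV=1||...' from {env_id: action_str}."""
    return "||".join(f"ACT={a};ENV={i}" for i, a in sorted(actions.items()))

def parse_result_batch(payload):
    """Parse 'OBS=...;REW=...;DONE=...;ENV=...||...' into a dict keyed by env id."""
    results = {}
    for record in payload.split("||"):
        fields = dict(kv.split("=", 1) for kv in record.split(";"))
        results[int(fields["ENV"])] = {
            "obs": [float(x) for x in fields["OBS"].split(",")],
            "reward": float(fields["REW"]),
            "done": fields["DONE"] == "1",
        }
    return results
```

Keeping the two helpers symmetric makes it easy to verify that each environment ID sent in an action batch comes back in the corresponding result batch.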
Inherits tick behavior from `BaseBridge`. Ticks every frame and runs `UpdateRL()` if active.