System UnrealPlugin BaseBridge SingleEnvironmentBridge - kcccr123/ue-reinforcement-learning GitHub Wiki

SingleEnvBridge

The SingleEnvBridge class extends BaseBridge to support single-instance reinforcement learning environments in Unreal. It manages training and inference for a single agent and handles message exchange, control flow, and environment-specific overrides.


Overview

This bridge is designed for environments that only simulate one agent at a time. It runs the RL update loop based on tick cycles and communicates with a Python agent via a TCP connection.

On connection, the bridge sends the following handshake:

CONFIG:OBS={ObservationSpaceSize};ACT={ActionSpaceSize};ENV_TYPE=SINGLE

This informs the Python agent that only one environment instance is available.
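On the Python side, this handshake can be split on its delimiters to recover the space sizes. The sketch below is illustrative; `parse_handshake` is not part of the plugin, but the `CONFIG:` prefix and key names match the format shown above.

```python
def parse_handshake(msg: str) -> dict:
    """Parse 'CONFIG:OBS=<n>;ACT=<m>;ENV_TYPE=SINGLE' into a dict."""
    if not msg.startswith("CONFIG:"):
        raise ValueError(f"unexpected handshake: {msg!r}")
    fields = {}
    for pair in msg[len("CONFIG:"):].split(";"):
        key, _, value = pair.partition("=")
        fields[key] = value
    return {
        "obs_size": int(fields["OBS"]),      # observation space size
        "act_size": int(fields["ACT"]),      # action space size
        "env_type": fields["ENV_TYPE"],      # SINGLE for this bridge
    }
```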


Usage

This bridge should be subclassed to implement environment-specific logic. Override the callbacks to define how your agent observes the environment, takes actions, resets, and determines when an action has completed.


Initialization and Configuration

FString BuildHandshake()

Returns the handshake string shown above, with ENV_TYPE=SINGLE. Used to initialize the Python agent.

UBaseTcpConnection* CreateTcpConnection()

Returns a USingleTcpConnection object used to send and receive messages over TCP.

void Disconnect()

Closes the socket connection. Inherited from BaseBridge.


Execution Control

void UpdateRL(float DeltaTime)

Main update loop, run every tick:

  • Training Mode:
    • Receives action from Python.
    • Parses and applies the action or handles reset.
    • Waits until action completes via IsActionRunning().
    • Then sends OBS, REW, and DONE to the Python agent.
  • Inference Mode:
    • Executes a local model to compute actions.
    • Applies actions until complete.

The local model used in inference mode is set via the SetInferenceInterface method inherited from the parent class. See the BaseBridge and Inference Interface pages for more details.
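Engine details aside, the training-mode flow above can be sketched as a small state machine driven once per tick. Everything in this sketch is illustrative: the class, method names, and the `REW=`/`DONE=` reply format are assumptions, with comments noting which bridge callback each step stands in for.

```python
class TickLoopSketch:
    """Illustrative per-tick training loop; not part of the plugin."""

    def __init__(self, env):
        self.env = env
        self.waiting_for_action = True  # start by expecting a message

    def update(self, incoming):
        """One tick. Returns a message to send back, or None."""
        if self.waiting_for_action:
            if incoming is None:
                return None                      # nothing yet; try next tick
            if incoming == "RESET":
                self.env.reset()                 # HandleReset()
                return None
            self.env.apply_action(incoming)      # HandleResponseActions()
            self.waiting_for_action = False
            return None
        if self.env.action_running():            # IsActionRunning()
            return None                          # keep polling each tick
        reward, done = self.env.reward()         # CalculateReward()
        state = self.env.state_string()          # CreateStateString()
        self.waiting_for_action = True
        return f"{state};REW={reward};DONE={int(done)}"  # hypothetical format
```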


Environment Callbacks

float CalculateReward(bool& bDone)

Called once the action is finished. Returns the reward based on task completion or environment state.

  • Set bDone = true to signal the end of an episode.

FString CreateStateString()

Must serialize the environment state to a string.

EXPECTED FORMAT:

"<obs_0>,<obs_1>,...,<obs_n>"

Where each <obs_i> is a float representing part of the observation.
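A pair of helpers for this format might look like the following. These function names are illustrative, not plugin API; the fixed-precision formatting is one reasonable choice, not a requirement.

```python
def make_state_string(obs):
    """Serialize a list of floats as '<obs_0>,<obs_1>,...,<obs_n>'."""
    return ",".join(f"{v:.6f}" for v in obs)

def parse_state_string(s):
    """Parse the comma-separated observation string back into floats."""
    return [float(tok) for tok in s.split(",")]
```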

void HandleReset()

Resets the environment to its original state (e.g., agent position, timers, physics, etc.). Called after a RESET command is received.

void HandleResponseActions(const FString& actions)

Parses and applies the incoming agent action string.

  • Update movement, animation, or internal simulation logic as needed.
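As a sketch of the parsing half, a comma-separated action string could be mapped to named controls like so. The format and the `throttle`/`steering` names are assumptions for illustration; your HandleResponseActions override defines the real action layout.

```python
def parse_actions(actions: str):
    """Split '<a_0>,<a_1>' into named control values (illustrative mapping)."""
    throttle, steering = (float(tok) for tok in actions.split(","))
    return {"throttle": throttle, "steering": steering}
```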

bool IsActionRunning()

Returns true while the agent is still performing the last action. The update loop polls this every tick. Once it returns false, reward is calculated and state is sent.


TCP Communication

bool SendData(const FString& Data)

Sends the given message to the Python module via the TcpConnection object.

FString ReceiveData()

Waits for the next message from the Python side. The message may contain a command (e.g., RESET) or an action string.


Ticking Methods

void Tick(float DeltaTime)

Inherited from BaseBridge. Calls UpdateRL(DeltaTime).

bool IsTickable() const

Returns true if the bridge is active and the socket is valid.

TStatId GetStatId() const

Used by Unreal to track tick performance.

