# SingleEnvironmentBridge
The `SingleEnvBridge` class extends `BaseBridge` to support single-instance reinforcement learning environments in Unreal. It manages training and inference for a single agent and handles message exchange, control flow, and environment-specific overrides.
This bridge is designed for environments that only simulate one agent at a time. It runs the RL update loop based on tick cycles and communicates with a Python agent via a TCP connection.
On connection, the bridge sends the following handshake:
```
CONFIG:OBS={ObservationSpaceSize};ACT={ActionSpaceSize};ENV_TYPE=SINGLE
```
This informs the Python agent that only one environment instance is available.
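As a rough illustration, a handshake of this shape could be assembled with `FString::Printf`. The function name below is a placeholder, not the plugin's actual handshake builder (which is described later on this page).

```cpp
// Illustrative only: builds a handshake string of the shape shown above.
// BuildSingleEnvHandshake is a placeholder name, not the plugin's API.
FString BuildSingleEnvHandshake(int32 ObservationSpaceSize, int32 ActionSpaceSize)
{
    return FString::Printf(
        TEXT("CONFIG:OBS=%d;ACT=%d;ENV_TYPE=SINGLE"),
        ObservationSpaceSize, ActionSpaceSize);
}
```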
This bridge should be subclassed to implement environment-specific logic. Override the callbacks to define how your agent observes the environment, takes actions, resets, and determines when an action has completed.
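A minimal subclass sketch is shown below. Only `IsActionRunning()` is named on this page, so every other identifier (header names, class names, prefixes) is an assumption; consult the `SingleEnvBridge` and `BaseBridge` headers for the actual virtual functions to override.

```cpp
// Sketch of a hypothetical subclass; header and class names are assumed, and only
// IsActionRunning() is named on this wiki page, so check the plugin headers for
// the full set of virtual callbacks.
#include "CoreMinimal.h"
#include "SingleEnvBridge.h"          // assumed header name
#include "MyAgentBridge.generated.h"

UCLASS()
class UMyAgentBridge : public USingleEnvBridge   // class prefix and base spelling assumed
{
    GENERATED_BODY()

public:
    // Reports whether the last applied action is still playing out.
    virtual bool IsActionRunning() override;

    // Also override the environment-specific callbacks described below:
    // reward calculation, state-string serialization, reset handling,
    // and parsing/applying the incoming action string.
};
```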
The bridge also exposes connection helpers: one returns the handshake string shown above (environment type `SINGLE`), used to initialize the Python agent; one returns a `USingleTcpConnection` object used to send and receive messages over TCP; and one, inherited from `BaseBridge`, closes the socket connection.
`UpdateRL(DeltaTime)` is the main loop that runs during tick:

- Training Mode:
  - Receives an action from Python.
  - Parses and applies the action, or handles a reset.
  - Waits until the action completes via `IsActionRunning()`.
  - Then sends `OBS`, `REW`, and `DONE` to the Python agent.
- Inference Mode:
  - Executes a local model to compute actions.
  - Applies actions until complete.
The local model for inference mode is set via the `SetInferenceInterface` method inherited from the parent class. See the `BaseBridge` and Inference Interface pages for more details.
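For orientation, here is a condensed, hypothetical sketch of a single training-mode pass. Apart from `IsActionRunning()`, every helper it declares is a placeholder rather than the plugin's actual function, the `OBS`/`REW`/`DONE` message layout is illustrative rather than the plugin's wire format, and the real bridge spreads these steps over multiple ticks (step 3 is re-polled each tick until the action finishes).

```cpp
// Placeholders standing in for the bridge's real functions; only IsActionRunning()
// is named on this wiki page, and the message layout below is illustrative.
FString ReceiveMessage();                         // next message from Python
void    ResetEnvironment();                       // restore the initial state
void    ApplyAction(const FString& ActionStr);    // parse and apply the action
bool    IsActionRunning();                        // is the action still playing out?
float   ComputeReward(bool& bOutDone);            // reward plus episode-done flag
FString BuildObservationString();                 // "<obs_0>,...,<obs_n>"
void    SendToPython(const FString& Message);     // send over the TCP connection

// Condensed into one pass for readability; in the bridge, step 3 is re-polled
// every tick before step 4 runs.
void TrainingStepSketch()
{
    const FString Message = ReceiveMessage();     // 1. receive action or command

    if (Message.StartsWith(TEXT("RESET")))
    {
        ResetEnvironment();                       //    handle a reset request
        return;
    }

    ApplyAction(Message);                         // 2. apply the agent's action

    if (IsActionRunning())
    {
        return;                                   // 3. still running: poll again next tick
    }

    bool bDone = false;
    const float Reward = ComputeReward(bDone);    // 4. compute reward and done flag
    SendToPython(FString::Printf(                 //    illustrative OBS/REW/DONE message
        TEXT("OBS=%s;REW=%f;DONE=%d"),
        *BuildObservationString(), Reward, bDone ? 1 : 0));
}
```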
The reward callback is called once the action is finished. It returns the reward based on task completion or environment state.

- Set `bDone = true` to signal the end of an episode.
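As a hedged example, a distance-based reward might look like the sketch below. The goal-distance bookkeeping and thresholds are invented for illustration, and `bDone` is passed by reference only because the sketch is a free function; the page describes it as a flag you set.

```cpp
// Illustrative reward: small step penalty, terminal bonus when the goal is reached.
// DistanceToGoal and the thresholds are example values, not part of the plugin.
float ComputeRewardExample(float DistanceToGoal, bool& bOutDone)
{
    if (DistanceToGoal < 50.f)       // close enough: end the episode with a bonus
    {
        bOutDone = true;             // corresponds to setting bDone = true
        return 1.0f;
    }
    return -0.001f;                  // mild per-step penalty to encourage progress
}
```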
The observation callback must serialize the environment state to a string.

EXPECTED FORMAT: `"<obs_0>,<obs_1>,...,<obs_n>"`, where each `<obs_i>` is a float representing part of the observation.
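For instance, a pawn's location and velocity could be serialized into that comma-separated format as sketched below; the choice of observations and the function name are assumptions for illustration.

```cpp
// Illustrative serialization of a 6-float observation (location + velocity)
// into the "<obs_0>,<obs_1>,...,<obs_n>" format described above.
FString MakeObservationStringExample(const FVector& Location, const FVector& Velocity)
{
    return FString::Printf(TEXT("%f,%f,%f,%f,%f,%f"),
        Location.X, Location.Y, Location.Z,
        Velocity.X, Velocity.Y, Velocity.Z);
}
```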
The reset callback restores the environment to its original state (e.g., agent position, timers, physics). It is called after a `RESET` command.
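A reset might, for example, teleport the agent back to a stored spawn transform and clear its physics velocity, as in the sketch below; the pawn and spawn-transform handling are assumptions, not the plugin's API.

```cpp
// Illustrative reset logic: move the agent back to a stored spawn transform
// and zero out any accumulated physics velocity.
void ResetAgentExample(APawn* AgentPawn, const FTransform& SpawnTransform)
{
    if (!AgentPawn)
    {
        return;
    }

    AgentPawn->SetActorTransform(SpawnTransform, /*bSweep=*/false,
                                 nullptr, ETeleportType::TeleportPhysics);

    // Clear velocities on the root primitive, if the agent is physics-driven.
    if (UPrimitiveComponent* Root = Cast<UPrimitiveComponent>(AgentPawn->GetRootComponent()))
    {
        Root->SetPhysicsLinearVelocity(FVector::ZeroVector);
        Root->SetPhysicsAngularVelocityInDegrees(FVector::ZeroVector);
    }
}
```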
The action callback parses and applies the incoming agent action string.

- Update movement, animation, or internal simulation logic as needed.
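If the action arrives as a comma-separated string of floats (for example `"0.25,-1.0"`), it could be split and converted as sketched here; the exact action format depends on what your Python agent sends.

```cpp
// Illustrative parsing of a comma-separated action string such as "0.25,-1.0".
TArray<float> ParseActionStringExample(const FString& ActionStr)
{
    TArray<FString> Tokens;
    ActionStr.ParseIntoArray(Tokens, TEXT(","), /*InCullEmpty=*/true);

    TArray<float> Actions;
    Actions.Reserve(Tokens.Num());
    for (const FString& Token : Tokens)
    {
        Actions.Add(FCString::Atof(*Token));   // convert each token to a float
    }
    return Actions;
}
```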
`IsActionRunning()` returns `true` while the agent is still performing the last action. The update loop polls it every tick; once it returns `false`, the reward is calculated and the state is sent.
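One simple, hypothetical way to implement this check is to time-box each action: record when it was applied and report it as running until a fixed duration has elapsed. The parameters below are assumptions for illustration; a subclass might instead track a boolean set by its movement or animation code.

```cpp
// Illustrative time-boxed check: the action counts as "running" until
// ActionDurationSeconds have elapsed since it was applied. ActionStartTime
// would be recorded (e.g. from GetWorld()->GetTimeSeconds()) when the action
// is applied; both values are assumed members of your subclass.
bool IsActionStillRunningExample(double ActionStartTime,
                                 double NowSeconds,
                                 double ActionDurationSeconds = 0.2)
{
    return (NowSeconds - ActionStartTime) < ActionDurationSeconds;
}
```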
Sends the desired message to the Python module via the `TcpConnection` object.
Waits for the next message from the Python side, which may contain a command (`RESET`) or an action string.
The tick function, inherited from `BaseBridge`, calls `UpdateRL(DeltaTime)`.
Returns `true` if the bridge is active and the socket is valid. Used by Unreal to track tick performance.