System PythonModule GymWrappers Base SingleEnv - kcccr123/ue-reinforcement-learning GitHub Wiki

GymWrapperSingleEnv

GymWrapperSingleEnv is a concrete gymnasium.Env implementation that wraps a single Unreal simulation instance exposed through the Single‑Environment Bridge text protocol. It inherits all networking utilities from GymWrapperBase and adds the one‑to‑one translation logic required for reinforcement‑learning libraries such as Stable‑Baselines3.


Overview

GymWrapperSingleEnv communicates with Unreal over a TCP socket that the caller has already connected; the wrapper does not open the connection itself. During an episode it:

  • Sends an ACT=RESET command to obtain the initial state.
  • Serialises NumPy action vectors into comma‑separated strings (two‑decimal precision) and transmits them as ACT=<a0>,<a1>,….
  • Blocks on a response of the form: OBS=<o0>,<o1>,…;REW=<reward>;DONE=<0|1>
  • Converts the received data back into Gym‑compatible observations and returns the standard tuple (obs, reward, terminated, truncated, info).
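The two translation steps above can be sketched in plain Python. This is a minimal illustration of the message formats only, not the wrapper's actual code: the names format_action and parse_state are hypothetical, and the sketch returns plain lists where the wrapper would produce NumPy arrays.

```python
def format_action(action):
    """Serialise an action vector as 'ACT=<a0>,<a1>,...' with two decimals."""
    return "ACT=" + ",".join(f"{float(a):.2f}" for a in action)


def parse_state(message):
    """Split 'OBS=...;REW=...;DONE=...' into (obs, reward, done)."""
    # Each ';'-separated part is a KEY=value pair.
    fields = dict(part.split("=", 1) for part in message.strip().split(";"))
    obs = [float(x) for x in fields["OBS"].split(",")]
    reward = float(fields["REW"])
    done = fields["DONE"] == "1"
    return obs, reward, done


command = format_action([0.5, -1, 0.123])
state = parse_state("OBS=0.1,0.2;REW=1.5;DONE=1")
```

Because actions are rounded to two decimals before transmission, Unreal receives 0.12 for an action component of 0.123; any policy that depends on finer resolution would need a different encoding.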

Usage

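import socket

# GymWrapperSingleEnv comes from this project's Python module (import path not shown here).
sock = socket.create_connection(("127.0.0.1", 2000))
env = GymWrapperSingleEnv(sock, obs_shape=36, act_shape=4)

obs, _ = env.reset()
terminated = False
while not terminated:
    action = agent.act(obs)  # your policy (placeholder object)
    obs, reward, terminated, truncated, info = env.step(action)

env.close()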

Initialization and Configuration

__init__(sock, obs_shape=0, act_shape=0)

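| Parameter | Type          | Description                                |
|-----------|---------------|--------------------------------------------|
| sock      | socket.socket | Connected TCP socket to the Unreal bridge. |
| obs_shape | int           | Length of the flat observation vector.     |
| act_shape | int           | Length of the flat action vector.          |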

The constructor also sets:

  • self.observation_space = spaces.Box(-np.inf, np.inf, (obs_shape,), np.float32)
  • self.action_space = spaces.Box(-1.0, 1.0, (act_shape,), np.float32)

Gymnasium API

reset(seed=None, options=None) -> (obs, info)

Sends ACT=RESET, waits for the first state message, parses it, and returns the initial observation together with an empty info dict.

step(action) -> (obs, reward, terminated, truncated, info)

  1. Formats the NumPy action as a comma‑separated string.
  2. Calls send_data() with the resulting ACT=<...> payload.
  3. Waits on receive_data() for the next state.
  4. Uses _parse_state() to extract observation, reward, and termination flag.
  5. Returns the Gym‑standard tuple with truncated=False.
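The five steps above can be exercised without Unreal by pairing two local sockets and letting a thread play the bridge. This is a hedged sketch: fake_bridge, read_line, and step_once are hypothetical names, the reply is a fixed made-up state, and observations are plain lists rather than NumPy arrays.

```python
import socket
import threading


def read_line(sock):
    """Read bytes until a newline arrives; return the text without it."""
    data = bytearray()
    while not data.endswith(b"\n"):
        chunk = sock.recv(1)  # one byte at a time: simple, not fast
        if not chunk:
            raise ConnectionError("socket closed before newline")
        data += chunk
    return data[:-1].decode("utf-8")


def fake_bridge(conn, received):
    """Stand-in for the Unreal side: record one request, send a fixed state."""
    received.append(read_line(conn))
    conn.sendall(b"OBS=0.10,0.20;REW=1.00;DONE=0\n")


def step_once(sock, action):
    """Send one action and return (obs, reward, terminated, truncated, info)."""
    payload = "ACT=" + ",".join(f"{float(a):.2f}" for a in action)
    sock.sendall((payload + "\n").encode("utf-8"))
    fields = dict(part.split("=", 1) for part in read_line(sock).split(";"))
    obs = [float(x) for x in fields["OBS"].split(",")]
    # The protocol carries a single DONE flag, so truncated stays False.
    return obs, float(fields["REW"]), fields["DONE"] == "1", False, {}


client, server = socket.socketpair()
received = []
worker = threading.Thread(target=fake_bridge, args=(server, received))
worker.start()
result = step_once(client, [0.5, -0.25])
worker.join()
client.close()
server.close()
```

The same pattern is useful for unit-testing training code, since the policy loop can run against a scripted bridge before a real Unreal instance is available.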

close()

Inherited from GymWrapperBase. Invokes disconnect() to close the socket.


TCP Communication Helpers

send_data(msg) -> None

Appends \n, encodes as UTF‑8, and transmits with socket.sendall().

receive_data() -> str

Reads from the socket until \n is encountered and returns the full message without the delimiter.

disconnect() -> None

Closes the socket and sets self.sock = None.

All messages must terminate with \n so that receive_data can detect message boundaries.
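A minimal sketch of newline framing with the same three method names is shown below. The class name LineSocket and the internal buffer are assumptions made for illustration; the source does not show how GymWrapperBase handles bytes that arrive after the delimiter, so this is one possible approach rather than the library's implementation.

```python
import socket


class LineSocket:
    """Newline-framed text messages over a connected socket (illustrative)."""

    def __init__(self, sock):
        self.sock = sock
        self._buffer = b""  # bytes received beyond the last returned message

    def send_data(self, msg):
        # Append the delimiter, encode as UTF-8, and send everything.
        self.sock.sendall((msg + "\n").encode("utf-8"))

    def receive_data(self):
        # Keep reading until at least one complete message is buffered.
        while b"\n" not in self._buffer:
            chunk = self.sock.recv(4096)
            if not chunk:
                raise ConnectionError("peer closed the connection mid-message")
            self._buffer += chunk
        line, _, self._buffer = self._buffer.partition(b"\n")
        return line.decode("utf-8")

    def disconnect(self):
        if self.sock is not None:
            self.sock.close()
            self.sock = None


a, b = socket.socketpair()
left, right = LineSocket(a), LineSocket(b)
left.send_data("ACT=RESET")
left.send_data("ACT=0.10,0.20")
first = right.receive_data()
second = right.receive_data()
left.disconnect()
right.disconnect()
```

TCP is a byte stream, so a single recv() call may return part of a message or several messages at once. Keeping the leftover bytes in a buffer lets back-to-back messages be returned one at a time.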

