# RL with til_environment
This page introduces the basics of using the `til_environment` package to run your RL models.
## Setup

Navigate to the til-25 repo and install the `til_environment` package with `pip install -r requirements-dev.txt`.
## Usage

After installation, you should be able to access the RL environment as follows:

```python
from til_environment import gridworld

env = gridworld.env()
```
The `gridworld.env()` method accepts the following arguments:

- `env_wrappers`: A list of wrappers for the environment.
  - If `None`, defaults to a set of example wrappers where the input observation dict is flattened (`FlattenDictWrapper`) and the past 4 output frames are stacked together (`supersuit.frame_stack_v2` with `stack_size=4` and `stack_dim=-1`).
  - If you don't want to pass in any wrappers, pass in an empty list `[]`.
- `render_mode`: One of `"human"`, `"rgb_array"`, or `None`. `"human"` renders the environment in a pygame window. `"rgb_array"` returns the environment as an RGB pixel array, useful for recording videos of your agent's actions for debugging. `None` disables rendering, useful for training.
- `debug`: Whether to log additional debug information and show a debug panel during rendering.
- `novice`: If `True`, fixes the map layout to the one used by Novice teams throughout the competition. Advanced teams should set this to `False`, because your trained RL agent is expected to generalize to maps not known to you ahead of time.
- `rewards_dict`: Mapping of reward names to values. Useful for simple reward shaping.
- `window_size`: The size of the pygame render window, in pixels. Defaults to `768`.
## Running the environment

The environment is built with PettingZoo, a popular multi-agent reinforcement learning (MARL) environment library. Run it as follows:

```python
from til_environment import gridworld

env = gridworld.env(
    env_wrappers=[],      # Clear out the default env wrappers
    render_mode="human",  # Render the map; not visible on Workbench
    debug=True,           # Enable debug mode
    novice=True,          # Use the same map layout every time (for Novice teams only)
)
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        break
    else:
        # Insert your policy here
        action = env.action_space(agent).sample()

    env.step(action)

env.close()
```
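If you set `render_mode="rgb_array"`, you can collect rendered frames during the loop and stitch them into a video for debugging. The sketch below assumes the standard PettingZoo behaviour where `env.render()` returns an RGB array in this mode, and uses `imageio` (an assumed extra dependency, not part of `til_environment`) to write the video file; adapt it to whatever video library you prefer.

```python
import imageio  # assumed extra dependency for writing videos
from til_environment import gridworld

env = gridworld.env(env_wrappers=[], render_mode="rgb_array", novice=True)
env.reset(seed=42)

frames = []
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        break
    action = env.action_space(agent).sample()  # replace with your policy
    env.step(action)
    frames.append(env.render())  # RGB array of the current state

env.close()
imageio.mimsave("episode.mp4", frames, fps=10)  # stitch the collected frames into a video
```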
## Use during training

### Reward shaping

For simple reward shaping, create a new `rewards_dict` to pass in as a parameter to `gridworld.env(rewards_dict=YOUR_REWARDS_DICT)`.

For complex reward shaping, you may have to write a wrapper, or write your own class/function to provide additional rewards based on behaviours, actions, or state.

There is a `RewardNames` enum that lists all the possible `rewards_dict` key names, with the following behaviours (see the sketch after the table below for an example):
| `RewardNames` key | Corresponding behaviour |
|---|---|
| `GUARD_WINS` | Shared reward for all Guards if any Guard captures |
| `GUARD_CAPTURES` | For all "capturers", since it's possible for multiple Guards to capture |
| `SCOUT_CAPTURED` | For the Scout if captured |
| `SCOUT_RECON` | For the Scout when it collects recon |
| `SCOUT_MISSION` | For the Scout when it collects a mission |
| `WALL_COLLISION` | For an agent that collides with a wall |
| `AGENT_COLLIDER` | For an agent that collides into another |
| `AGENT_COLLIDEE` | For an agent that is collided into |
| `STATIONARY_PENALTY` | For an agent that takes the `Action.STAY` action |
| `GUARD_TRUNCATION` | For Guards if the round ends without a capture |
| `SCOUT_TRUNCATION` | For the Scout if the round ends without a capture |
| `GUARD_STEP` | For Guards when the step count increments |
| `SCOUT_STEP` | For the Scout when the step count increments |
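As a minimal sketch, the dictionary below rewards the Scout for collecting recon and penalises wall collisions and standing still, and is passed straight to `gridworld.env()`. The import path for `RewardNames` is an assumption based on this page (check the package source for the actual module), and the specific values are illustrative rather than recommended settings; whether the keys should be the enum members or their string values also depends on the enum definition.

```python
from til_environment import gridworld
from til_environment.gridworld import RewardNames  # assumed location; adjust to the actual module

# Illustrative values only. Keys not specified here keep their default rewards.
my_rewards = {
    RewardNames.SCOUT_RECON: 1.0,
    RewardNames.WALL_COLLISION: -1.0,
    RewardNames.STATIONARY_PENALTY: -0.2,
}

env = gridworld.env(rewards_dict=my_rewards, novice=True)
```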
### Writing an environment wrapper

Participants may wish to modify the observations received by their agent, or the function used to calculate its reward during training. This can be achieved by passing the default environment into a custom wrapper.

Custom wrappers can be created by inheriting from `BaseWrapper` as follows:
```python
import functools

from pettingzoo.utils.env import ActionType, AECEnv, AgentID, ObsType
from pettingzoo.utils.wrappers.base import BaseWrapper


class CustomWrapper(BaseWrapper[AgentID, ObsType, ActionType]):
    def __init__(
        self,
        env: AECEnv[AgentID, ObsType, ActionType],
    ):
        super().__init__(env)

    def reset(self, seed=None, options=None):
        # Initialise any per-episode state here
        super().reset(seed, options)

    def step(self, action: ActionType):
        # Track extra state or shape rewards here
        super().step(action)

    def observe(self, agent: AgentID) -> ObsType | None:
        # Modify the observation returned to the agent here
        obs = super().observe(agent)
        return obs

    @functools.lru_cache(maxsize=None)
    def observation_space(self, agent):
        # If you change the shape of observations, update the space to match
        space = super().observation_space(agent)
        return space
```
Wrap the environment using `env = CustomWrapper(env)`. Alternatively, `gridworld.env()` provides the `env_wrappers` argument for passing in a list of any custom environment wrappers you want applied for you: `gridworld.env(env_wrappers=[CustomWrapper])`.
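As a concrete (hypothetical) illustration of the pattern, the sketch below appends a running step counter to each agent's observation dict and extends the observation space to match, so a downstream `FlattenDictWrapper` still works. It assumes the raw observations are dicts and the observation space is a gymnasium `Dict` space, as described above; the `StepCountWrapper` name and the `step_count` key are made up for this example.

```python
import functools

import numpy as np
from gymnasium.spaces import Box, Dict
from pettingzoo.utils.env import ActionType, AECEnv, AgentID, ObsType
from pettingzoo.utils.wrappers.base import BaseWrapper


class StepCountWrapper(BaseWrapper[AgentID, ObsType, ActionType]):
    """Hypothetical example: append the current step count to each observation dict."""

    def __init__(self, env: AECEnv[AgentID, ObsType, ActionType]):
        super().__init__(env)
        self.num_steps = 0

    def reset(self, seed=None, options=None):
        self.num_steps = 0
        super().reset(seed, options)

    def step(self, action: ActionType):
        self.num_steps += 1
        super().step(action)

    def observe(self, agent: AgentID):
        obs = super().observe(agent)
        if obs is None:
            return None
        # Assumes the base observation is a dict, as described above
        return {**obs, "step_count": np.array([self.num_steps], dtype=np.int64)}

    @functools.lru_cache(maxsize=None)
    def observation_space(self, agent):
        space = super().observation_space(agent)
        # Assumes the base observation space is a gymnasium Dict space
        return Dict(
            {
                **space.spaces,
                "step_count": Box(0, np.iinfo(np.int64).max, shape=(1,), dtype=np.int64),
            }
        )
```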
For a collection of other PettingZoo environment wrappers, see PettingZoo's own Wrappers, as well as those provided as part of the SuperSuit package.
### Observation shaping and persistence

Either write a wrapper and put your logic in there, or write your own class/function to properly format the observations provided by the environment. Then, when deploying your agent in a Docker container for submission, remember to replicate your observation shaping/persistence logic in `rl/src/rl_manager.py`.

For example, after your `CustomWrapper` logic, you would likely want to flatten the agent observation to a 1D array for ease of training RL agents. You can thus pass both your `CustomWrapper` and the provided `FlattenDictWrapper` to the environment while training:
```python
from til_environment.flatten_dict import FlattenDictWrapper
from til_environment import gridworld


class CustomWrapper(BaseWrapper[AgentID, ObsType, ActionType]):
    ...  # your wrapper, as defined above


env = gridworld.env(env_wrappers=[CustomWrapper, FlattenDictWrapper])
```
Then, in your `rl_manager.py`, you would have something like this to replicate the `FlattenDictWrapper` logic:
```python
from gymnasium.spaces import flatten


class RLManager:
    def __init__(self):
        self.space = ...  # the (unflattened) observation space used during training
        self.model = ...  # your trained model

    def rl(self, observation: dict[str, int | list[int]]) -> int:
        ...
        # Flatten the raw observation dict the same way FlattenDictWrapper did during training
        obs = flatten(self.space, observation)
        return self.model.predict(obs)
```
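How you populate `self.space` is up to you. One option, sketched below under the assumption that `gridworld.env()` exposes the standard PettingZoo `observation_space(agent)` method and a `possible_agents` list, is to instantiate an unwrapped environment once and reuse its observation space; saving the space to disk alongside your model weights during training would work just as well.

```python
from til_environment import gridworld

# Build an unwrapped environment purely to recover the raw Dict observation space.
# Assumes all agents share the same observation space; verify this for your setup.
_env = gridworld.env(env_wrappers=[], render_mode=None)
space = _env.observation_space(_env.possible_agents[0])
```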