Exploring the Code - Skalwalker/MRLCommunication GitHub Wiki
This page is from Matheus Portela's Wiki
Code structure
Multiagent RL contains two packages that interact when executing a simulation:
- Controller: Code from the multiagentrl package that provides common features to be reused by any simulation. Contains the following files:
  - core.py: Main code, providing base classes for adapters, agents and controllers.
  - communication.py: Functionality to receive and send messages between processes and threads.
  - controller.py: Logic when running a simulation, receiving messages from the adapter and sending them to the agents.
  - exploration.py: Exploration algorithms reusable by any simulation, such as ε-greedy.
  - learning.py: Generic learning algorithms reusable by any simulation, such as Q-learning.
  - messages.py: Message classes to be exchanged between adapter and controller.
- Adapter: Code specific to each simulation, such as Windy World and Pac-Man. All adapters must be Python packages inside the experiments/ directory containing the following files in order to run a successful simulation:
  - adapter.py: Class inheriting from BaseExperiment, implementing simulation-specific logic to connect the simulation and Multiagent RL.
  - agents.py: Agents that will be instantiated by the simulation controller when running this simulation.
  - plot.py: Logic to plot the simulation results after running this simulation.
Controller modules
Controller
The controller module provides functionality to receive messages from the adapter and route them to the proper agents. This code is reusable by all simulations, hence it is stored in the multiagentrl package.
Learning
The learning module stores general-purpose reinforcement learning algorithms. Every RL algorithm must inherit from the BaseLearningAlgorithm class and implement two methods:
- learn(self, state, action, reward): Adapts according to the current state representation, the last performed action, and a numerical reward value.
- act(self, state): Selects an action for the current state.
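As a minimal sketch of this interface, the class below implements tabular Q-learning. BaseLearningAlgorithm is redefined here as a local stand-in so the snippet runs on its own; the real base class lives in the multiagentrl package and may differ. The class name TabularQLearning, its constructor parameters, and the choice to treat the state passed to learn as the state reached after the action are all assumptions made for illustration.

```python
from collections import defaultdict


class BaseLearningAlgorithm(object):
    """Local stand-in for the package's base class (assumed interface)."""

    def learn(self, state, action, reward):
        raise NotImplementedError

    def act(self, state):
        raise NotImplementedError


class TabularQLearning(BaseLearningAlgorithm):
    """Hypothetical tabular Q-learning over hashable states."""

    def __init__(self, actions, learning_rate=0.5, discount_factor=0.9):
        self.actions = list(actions)
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        self.q_values = defaultdict(float)  # (state, action) -> estimate
        self.previous_state = None

    def learn(self, state, action, reward):
        # Assumption: `state` is the state reached after performing `action`.
        if self.previous_state is not None:
            key = (self.previous_state, action)
            best_next = max(self.q_values[(state, a)] for a in self.actions)
            target = reward + self.discount_factor * best_next
            self.q_values[key] += self.learning_rate * (target - self.q_values[key])
        self.previous_state = state

    def act(self, state):
        # Greedy choice; ties go to the first action in self.actions.
        return max(self.actions, key=lambda a: self.q_values[(state, a)])


learner = TabularQLearning(actions=['left', 'right'])
learner.learn('s0', None, 0.0)     # first call only records the state
learner.learn('s1', 'right', 1.0)  # reward for taking 'right' in 's0'
print(learner.act('s0'))
```

Keeping act purely greedy leaves exploration to a separate exploration algorithm, which matches the split between the learning and exploration modules.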
Exploration
The exploration module stores general-purpose exploration algorithms. Every exploration algorithm must inherit from the BaseExplorationAlgorithm class and implement one method:
- explore(self, selected_action, legal_actions): Returns either the selected action or one of the legal actions for the current state.
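The explore contract can be illustrated with an ε-greedy sketch. BaseExplorationAlgorithm is a local stand-in so the snippet is self-contained, and the EpsilonGreedy class name, the exploration_rate parameter, and the optional rng argument are hypothetical, not taken from the package.

```python
import random


class BaseExplorationAlgorithm(object):
    """Local stand-in for the package's base class (assumed interface)."""

    def explore(self, selected_action, legal_actions):
        raise NotImplementedError


class EpsilonGreedy(BaseExplorationAlgorithm):
    """Hypothetical epsilon-greedy exploration."""

    def __init__(self, exploration_rate=0.1, rng=None):
        self.exploration_rate = exploration_rate
        # Accepting a random.Random instance makes runs reproducible.
        self.rng = rng or random.Random()

    def explore(self, selected_action, legal_actions):
        # With probability exploration_rate, pick a uniformly random legal action.
        if legal_actions and self.rng.random() < self.exploration_rate:
            return self.rng.choice(legal_actions)
        # Otherwise keep the action chosen by the learning algorithm.
        return selected_action


explorer = EpsilonGreedy(exploration_rate=0.2, rng=random.Random(0))
action = explorer.explore('up', ['up', 'down', 'left', 'right'])
```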
Communication
The communication module implements two classes: Server and Client. By using the ZeroMQ package, a client-server architecture is easily incorporated into the decision process cycle using recv and send methods to receive and send strings.
A server, configured with a TCP/IP address, may receive and answer messages from any number of clients. However, a client can only connect to a single server. Due to a ZeroMQ restriction, in this architecture the client must send a message first and then receive the server's reply. Should the server be unable to reply to the client, communication is lost.
The following code implements a client-server architecture where the client sends "Client data" and the server replies with "Server data":
# Server-side script
import communication

server = communication.TCPServer()

# Wait for the client's message, then reply to it.
received_data = server.receive()
print('Received "{}"'.format(received_data))
send_data = 'Server data'
server.send(send_data)
print('Sent "{}"'.format(send_data))

# Client-side script
import communication

client = communication.TCPClient()

# The client must send first, then wait for the server's reply.
send_data = 'Client data'
client.send(send_data)
print('Sent "{}"'.format(send_data))
received_data = client.receive()
print('Received "{}"'.format(received_data))
Server output:
Received "Client data"
Sent "Server data"
Client output:
Sent "Client data"
Received "Server data"
Messages
The messages module stores all kinds of messages used in the Pac-Man application. All messages inherit from BaseMessage and have a respective type.
For instance, AcknowledgementMessage signals that the server received the client's message but has no special reply.
ACKNOWLEDGMENT_MSG = 'Acknowledgment'


class AcknowledgementMessage(BaseMessage):
    def __init__(self):
        super(AcknowledgementMessage, self).__init__(type=ACKNOWLEDGMENT_MSG)
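To show how a type field like this might be used, the sketch below pairs the acknowledgement message with a second, purely hypothetical message and branches on the type when a message arrives. BaseMessage is a simplified stand-in that only stores the type; ExampleActionMessage, EXAMPLE_ACTION_MSG, and describe are invented for illustration and are not part of the messages module.

```python
class BaseMessage(object):
    """Simplified stand-in: only stores the type given by subclasses."""

    def __init__(self, type=None):
        self.type = type


ACKNOWLEDGMENT_MSG = 'Acknowledgment'
EXAMPLE_ACTION_MSG = 'ExampleAction'  # hypothetical type, for illustration


class AcknowledgementMessage(BaseMessage):
    def __init__(self):
        super(AcknowledgementMessage, self).__init__(type=ACKNOWLEDGMENT_MSG)


class ExampleActionMessage(BaseMessage):
    """Hypothetical message carrying an action as its payload."""

    def __init__(self, action):
        super(ExampleActionMessage, self).__init__(type=EXAMPLE_ACTION_MSG)
        self.action = action


def describe(message):
    """Branches on the message type, as a receiver might."""
    if message.type == ACKNOWLEDGMENT_MSG:
        return 'nothing to do'
    if message.type == EXAMPLE_ACTION_MSG:
        return 'perform {}'.format(message.action)
    raise ValueError('Unknown message type: {}'.format(message.type))


replies = [describe(AcknowledgementMessage()),
           describe(ExampleActionMessage('North'))]
```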
Adapter modules
Adapter
The adapter module contains an adapter class, inheriting from BaseExperiment, which controls the flow of the simulation. It must also contain the build_parser() function, to parse command-line arguments, and build_adapter_with_args(args), to build an adapter instance from the parsed arguments.
import argparse

from multiagentrl import core
from multiagentrl import messages


class ExampleExperiment(core.BaseExperiment):
    def __init__(self, learn_games, test_games):
        super(ExampleExperiment, self).__init__(
            learn_games=learn_games,
            test_games=test_games)

    def execute_game(self):
        # Simulation-specific logic for a single game goes here.
        pass


def build_parser():
    parser = argparse.ArgumentParser(description='Run example simulation.')
    parser.add_argument(
        '-l', '--learn-games', dest='learn_games', type=int, default=1,
        help='number of games to learn from')
    parser.add_argument(
        '-t', '--test-games', dest='test_games', type=int, default=1,
        help='number of games to test learned policy')
    return parser


def build_adapter_with_args(args):
    return ExampleExperiment(
        learn_games=args.learn_games,
        test_games=args.test_games)
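The two functions are meant to be chained: parse the command line, then build the adapter from the result. The sketch below shows that wiring with a hypothetical launch helper; how Multiagent RL actually invokes them is not shown on this page, and ExampleExperiment is reduced to a plain stand-in class so the snippet runs without the multiagentrl package.

```python
import argparse


class ExampleExperiment(object):
    """Stand-in adapter; the real one inherits from core.BaseExperiment."""

    def __init__(self, learn_games, test_games):
        self.learn_games = learn_games
        self.test_games = test_games


def build_parser():
    parser = argparse.ArgumentParser(description='Run example simulation.')
    parser.add_argument('-l', '--learn-games', dest='learn_games',
                        type=int, default=1)
    parser.add_argument('-t', '--test-games', dest='test_games',
                        type=int, default=1)
    return parser


def build_adapter_with_args(args):
    return ExampleExperiment(learn_games=args.learn_games,
                             test_games=args.test_games)


def launch(argv):
    """Hypothetical launcher: parse the arguments, then build the adapter."""
    args = build_parser().parse_args(argv)
    return build_adapter_with_args(args)


adapter = launch(['--learn-games', '10', '-t', '3'])
```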
Agents
The agents module contains agent classes to be used for action selection when the simulation is running. They must inherit from BaseControllerAgent and implement its virtual methods, namely start_game, finish_game, learn and act.
The following agent selects random actions for every simulation step.
import random

from multiagentrl import core


class RandomAgent(core.BaseControllerAgent):
    """Agent that randomly selects an action."""

    def __init__(self, agent_id, ally_ids, enemy_ids):
        super(RandomAgent, self).__init__(agent_id)

    def start_game(self):
        pass

    def finish_game(self):
        pass

    def learn(self, state, action, reward):
        pass

    def act(self, state, legal_actions, explore):
        # Returns None when there are no legal actions to choose from.
        if legal_actions:
            return random.choice(legal_actions)
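The sketch below drives such an agent through one game to make the role of each method concrete. The call order (start_game, then alternating act and learn, then finish_game) is an assumption about how the controller uses agents, not something this page specifies. BaseControllerAgent is a simplified stand-in, and run_game with its toy step-counter state is hypothetical.

```python
import random


class BaseControllerAgent(object):
    """Simplified stand-in for core.BaseControllerAgent (assumed)."""

    def __init__(self, agent_id):
        self.agent_id = agent_id


class RandomAgent(BaseControllerAgent):
    """Same behavior as the random agent, repeated so the sketch runs alone."""

    def start_game(self):
        pass

    def finish_game(self):
        pass

    def learn(self, state, action, reward):
        pass

    def act(self, state, legal_actions, explore):
        if legal_actions:
            return random.choice(legal_actions)


def run_game(agent, steps, legal_actions):
    """Hypothetical driver calling the agent methods in the assumed order."""
    agent.start_game()
    chosen = []
    state = 0
    for _ in range(steps):
        action = agent.act(state, legal_actions, explore=True)
        chosen.append(action)
        state += 1  # toy transition: the state is just a step counter
        agent.learn(state, action, reward=0.0)
    agent.finish_game()
    return chosen


actions = run_game(RandomAgent(agent_id=0), steps=3,
                   legal_actions=['North', 'South'])
```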
Plot
The plot module generates graphs from simulation results. It must contain the build_parser() function, to parse command-line arguments, and plot(args), to plot graphs from the parsed arguments.