Exploring the Code - Skalwalker/MRLCommunication GitHub Wiki

This page is from Matheus Portela's Wiki

Code structure

Multiagent RL contains two packages that interact when executing a simulation:

  • Controller: Code from the multiagentrl package that provides common features to be reused by any simulation. Contains the following files:
    • core.py: Main code, providing base classes for adapters, agents and controllers.
    • communication.py: Functionality to receive and send messages between processes and threads.
    • controller.py: Logic when running a simulation, receiving messages from the adapter and sending them to the agents.
    • exploration.py: Exploration algorithms reusable by any simulation, such as ε-greedy.
    • learning.py: Generic learning algorithms reusable by any simulation, such as Q-learning.
    • messages.py: Message classes to be exchanged between adapter and controller.
  • Adapter: Code specific to each simulation, such as Windy World and Pac-Man. All adapters must be Python packages inside the experiments/ directory, and each must contain the following files for a simulation to run successfully:
    • adapter.py: Class inheriting from BaseExperiment, implementing simulation-specific logic to connect the simulation and Multiagent RL.
    • agents.py: Agents that will be instantiated by the simulation controller when running this simulation.
    • plot.py: Logic to plot the simulation results after running this simulation.

Controller modules

Controller

The controller module provides functionality to receive messages from the adapter and route them to the proper agents. This code is reusable by all simulations, hence it is stored in the multiagentrl package.

Learning

The learning module stores general-purpose reinforcement learning algorithms. Every RL algorithm must inherit from the BaseLearningAlgorithm class and implement two methods:

  • learn(self, state, action, reward): Adapts according to the current state representation, the last performed action, and a numerical reward value.
  • act(self, state): Selects an action for the current state.
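
As a rough illustration of this interface, the sketch below implements a small tabular Q-learning algorithm. The BaseLearningAlgorithm class defined here is only a stand-in so the snippet runs on its own, and TabularQLearning, its constructor parameters and the q_values table are hypothetical names, not the code shipped in learning.py. It assumes that learn receives the state reached after the action was performed, so the previously seen state is remembered between calls.

```python
import collections


class BaseLearningAlgorithm(object):
    """Stand-in for the base class in the learning module (assumed interface)."""

    def learn(self, state, action, reward):
        raise NotImplementedError

    def act(self, state):
        raise NotImplementedError


class TabularQLearning(BaseLearningAlgorithm):
    """Hypothetical tabular Q-learning; states and actions must be hashable."""

    def __init__(self, actions, learning_rate=0.1, discount_factor=0.9):
        self.actions = list(actions)
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        self.q_values = collections.defaultdict(float)  # (state, action) -> value
        self.previous_state = None

    def learn(self, state, action, reward):
        # The first call has no previous state, so it only records the state.
        if self.previous_state is not None:
            key = (self.previous_state, action)
            best_next = max(self.q_values[(state, a)] for a in self.actions)
            target = reward + self.discount_factor * best_next
            self.q_values[key] += self.learning_rate * (target - self.q_values[key])
        self.previous_state = state

    def act(self, state):
        # Greedy choice; exploration is left to the exploration module.
        return max(self.actions, key=lambda a: self.q_values[(state, a)])
```

Keeping act purely greedy mirrors the split described on this page, where exploration algorithms live in their own module.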

Exploration

The exploration module stores general-purpose exploration algorithms. Every exploration algorithm must inherit from the BaseExplorationAlgorithm class and implement one method:

  • explore(self, selected_action, legal_actions): Returns either the selected action or one of the legal actions for the current state.
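
A minimal ε-greedy sketch of this interface might look like the following. BaseExplorationAlgorithm is redefined here as a stand-in so the snippet is self-contained, and the EpsilonGreedy class with its epsilon and rng parameters is an assumption made for illustration rather than the implementation in exploration.py.

```python
import random


class BaseExplorationAlgorithm(object):
    """Stand-in for the base class in the exploration module (assumed interface)."""

    def explore(self, selected_action, legal_actions):
        raise NotImplementedError


class EpsilonGreedy(BaseExplorationAlgorithm):
    """Hypothetical epsilon-greedy exploration."""

    def __init__(self, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        # An injectable random generator makes the behavior reproducible.
        self.rng = rng or random.Random()

    def explore(self, selected_action, legal_actions):
        # With probability epsilon, replace the selected action by a random
        # legal one; otherwise keep the action chosen by the learner.
        if legal_actions and self.rng.random() < self.epsilon:
            return self.rng.choice(legal_actions)
        return selected_action
```

With epsilon set to 0 the selected action is always returned, and with epsilon set to 1 a legal action is always sampled at random.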

Communication

The communication module implements two classes: Server and Client. By using the ZeroMQ package, a client-server architecture is easily incorporated into the decision process cycle using recv and send methods to receive and send strings.

A server, configured with a TCP/IP address, may receive and answer messages from any number of clients. However, a client can only connect to a single server. Due to a ZeroMQ restriction in this architecture, the client must send a message first and then receive the server's reply. Should the server be unable to reply to the client, communication is lost.

The following code implements a client-server architecture where the client sends Client data and the server replies Server data:

# Server-side script
import communication

server = communication.TCPServer()

received_data = server.receive()
print 'Received "{}"'.format(received_data)

send_data = 'Server data'
server.send(send_data)
print 'Sent "{}"'.format(send_data)

# Client-side script
import communication

client = communication.TCPClient()

send_data = 'Client data'
client.send(send_data)
print 'Sent "{}"'.format(send_data)

received_data = client.receive()
print 'Received "{}"'.format(received_data)

Server output:

Received "Client data"
Sent "Server data"

Client output:

Sent "Client data"
Received "Server data"

Messages

The messages module stores all kinds of messages used in the Pac-Man application. All messages inherit from BaseMessage and have a respective type.

For instance, AcknowledgementMessage is used to communicate that the server received the client message but has no special reply.

ACKNOWLEDGMENT_MSG = 'Acknowledgment'

class AcknowledgementMessage(BaseMessage):
    def __init__(self):
        super(AcknowledgementMessage, self).__init__(type=ACKNOWLEDGMENT_MSG)
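
The same pattern could be extended to messages that carry data. The sketch below is only an illustration: BaseMessage is a stand-in assumed to store the type string it receives, and EXAMPLE_REWARD_MSG, ExampleRewardMessage and describe are hypothetical names that do not exist in messages.py.

```python
class BaseMessage(object):
    """Stand-in for messages.BaseMessage; assumed to store the type string."""

    def __init__(self, type=None):
        self.type = type


EXAMPLE_REWARD_MSG = 'ExampleReward'  # hypothetical type constant


class ExampleRewardMessage(BaseMessage):
    """Hypothetical message carrying a payload alongside its type."""

    def __init__(self, agent_id, reward):
        super(ExampleRewardMessage, self).__init__(type=EXAMPLE_REWARD_MSG)
        self.agent_id = agent_id
        self.reward = reward


def describe(message):
    """Branch on the type string, as a receiver of messages might."""
    if message.type == EXAMPLE_REWARD_MSG:
        return 'agent {} got reward {}'.format(message.agent_id, message.reward)
    return 'unhandled message type: {}'.format(message.type)
```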

Adapter modules

Adapter

The adapter module contains an adapter class, inheriting from BaseExperiment, which controls the flow of the simulation. It must also contain the build_parser() function, to parse command-line arguments, and build_adapter_with_args(args), to build an adapter instance from the parsed arguments.

import argparse

from multiagentrl import core
from multiagentrl import messages


class ExampleExperiment(core.BaseExperiment):
    def __init__(self, learn_games, test_games):
        super(ExampleExperiment, self).__init__(
            learn_games=learn_games,
            test_games=test_games)

    def execute_game(self):
        pass


def build_parser():
    parser = argparse.ArgumentParser(description='Run example simulation.')
    parser.add_argument(
        '-l', '--learn-games', dest='learn_games', type=int, default=1,
        help='number of games to learn from')
    parser.add_argument(
        '-t', '--test-games', dest='test_games', type=int, default=1,
        help='number of games to test learned policy')
    return parser


def build_adapter_with_args(args):
    return ExampleExperiment(
        learn_games=args.learn_games,
        test_games=args.test_games)
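
To show how the two functions fit together, here is a self-contained sketch of a launcher. The launch function is hypothetical, and ExampleExperiment is reduced to a plain class (without BaseExperiment) so the snippet runs without the multiagentrl package; how the actual project invokes these functions is not shown on this page.

```python
import argparse


class ExampleExperiment(object):
    """Simplified stand-in for the adapter class (no BaseExperiment here)."""

    def __init__(self, learn_games, test_games):
        self.learn_games = learn_games
        self.test_games = test_games


def build_parser():
    parser = argparse.ArgumentParser(description='Run example simulation.')
    parser.add_argument(
        '-l', '--learn-games', dest='learn_games', type=int, default=1,
        help='number of games to learn from')
    parser.add_argument(
        '-t', '--test-games', dest='test_games', type=int, default=1,
        help='number of games to test learned policy')
    return parser


def build_adapter_with_args(args):
    return ExampleExperiment(
        learn_games=args.learn_games,
        test_games=args.test_games)


def launch(argv):
    """Hypothetical launcher: parse argv, then build the adapter from it."""
    args = build_parser().parse_args(argv)
    return build_adapter_with_args(args)


# Equivalent to passing "--learn-games 100 -t 15" on the command line.
adapter = launch(['--learn-games', '100', '-t', '15'])
```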

Agents

The agents module contains agent classes to be used for action selection when the simulation is running. They must inherit from BaseControllerAgent and implement its virtual methods, namely start_game, finish_game, learn and act.

The following agent selects random actions for every simulation step.

import random

from multiagentrl import core


class RandomAgent(core.BaseControllerAgent):
    """Agent that randomly selects an action."""
    def __init__(self, agent_id, ally_ids, enemy_ids):
        super(RandomAgent, self).__init__(agent_id)

    def start_game(self):
        pass

    def finish_game(self):
        pass

    def learn(self, state, action, reward):
        pass

    def act(self, state, legal_actions, explore):
        if legal_actions:
            return random.choice(legal_actions)
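
The sketch below illustrates one plausible order in which these four methods could be called during a game. It is an assumption for illustration only: BaseControllerAgent is a stand-in, RecordingRandomAgent and run_episode are hypothetical, and in the real project the controller calls agents in response to adapter messages rather than from a loop like this one.

```python
import random


class BaseControllerAgent(object):
    """Stand-in for core.BaseControllerAgent (assumed to store agent_id)."""

    def __init__(self, agent_id):
        self.agent_id = agent_id


class RecordingRandomAgent(BaseControllerAgent):
    """Random agent that also records which methods were called."""

    def __init__(self, agent_id):
        super(RecordingRandomAgent, self).__init__(agent_id)
        self.calls = []

    def start_game(self):
        self.calls.append('start_game')

    def finish_game(self):
        self.calls.append('finish_game')

    def learn(self, state, action, reward):
        self.calls.append('learn')

    def act(self, state, legal_actions, explore):
        self.calls.append('act')
        if legal_actions:
            return random.choice(legal_actions)


def run_episode(agent, steps, legal_actions):
    """Hypothetical driver loop with a toy integer state."""
    agent.start_game()
    state = 0
    for _ in range(steps):
        action = agent.act(state, legal_actions, explore=True)
        state += 1  # toy transition standing in for the simulation
        agent.learn(state, action, reward=0.0)
    agent.finish_game()


agent = RecordingRandomAgent(agent_id=0)
run_episode(agent, steps=2, legal_actions=['North', 'South'])
```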

Plot

The plot module generates graphs from simulation results. It must contain the build_parser() function, to parse command-line arguments, and plot(args), to plot graphs from the parsed arguments.
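
A minimal sketch of such a module is given below. Everything beyond the two required function names is an assumption: the -i and -o options, the one-score-per-line input file and the use of matplotlib are chosen purely for illustration and may differ from what the existing experiments do.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description='Plot example results.')
    # Both options are hypothetical; each experiment defines its own.
    parser.add_argument(
        '-i', '--input', dest='input_filename', default='results.txt',
        help='text file with one score per line')
    parser.add_argument(
        '-o', '--output', dest='output_filename', default='results.png',
        help='image file to write the graph to')
    return parser


def load_scores(filename):
    """Read one floating-point score per non-empty line."""
    with open(filename) as results_file:
        return [float(line) for line in results_file if line.strip()]


def plot(args):
    # Imported lazily so the parser can be used without matplotlib installed.
    import matplotlib
    matplotlib.use('Agg')  # render to a file without needing a display
    import matplotlib.pyplot as plt

    scores = load_scores(args.input_filename)
    plt.plot(range(1, len(scores) + 1), scores)
    plt.xlabel('Game')
    plt.ylabel('Score')
    plt.savefig(args.output_filename)
```

A caller would typically run plot(build_parser().parse_args()) to read the options from the command line.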