Backend server controller - Pyosch/powertac-server GitHub Wiki


The goal is to provide a mechanism that allows the Tournament Scheduler to configure and run games on multiple hosts. We assume that the Tournament Scheduler knows what games it wants to run, which brokers are assigned to which games, and which machine resources it has available to run games.

For each game, the Tournament Scheduler must assign an available machine, get a server started and configured on that machine, and gather up the resulting logfiles. When it starts, the server needs to receive its configuration, including the set of brokers to be included, the bootstrap data set to be used, and any other game settings. It may make sense to simply pass the server a URL served by the Tournament Scheduler and have the server request this information for each game.
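As a minimal sketch of the URL-based approach, the server could build a per-game request URL and parse the response into a small configuration object. The endpoint name `game-config`, the query parameter `game`, the JSON format, and all field names below are assumptions made for illustration; none of them is specified on this page.

```python
import json
from dataclasses import dataclass
from urllib.parse import urlencode, urljoin


@dataclass
class GameConfig:
    """Per-game settings a server would need at startup (field names are hypothetical)."""
    game_id: str
    brokers: list
    bootstrap_url: str
    properties: dict


def config_url(scheduler_base: str, game_id: str) -> str:
    """Build the (hypothetical) URL the server requests from the Tournament Scheduler."""
    return urljoin(scheduler_base, "game-config") + "?" + urlencode({"game": game_id})


def parse_config(payload: str) -> GameConfig:
    """Parse a JSON response body into a GameConfig; 'properties' is optional."""
    doc = json.loads(payload)
    return GameConfig(
        game_id=doc["game_id"],
        brokers=list(doc["brokers"]),
        bootstrap_url=doc["bootstrap_url"],
        properties=dict(doc.get("properties", {})),
    )
```

Passing only a URL keeps the start command short and leaves the Tournament Scheduler as the single source of truth for each game's settings.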

While a game is running, the server generates two logfiles: a trace file and a state record. These can be in excess of 100 MB each. Once a game is completed, these files need to be compressed and moved to a location where the Tournament Scheduler can see them and offer them for download.
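The compression step could look roughly like the following sketch, which streams a logfile through gzip so that a file of 100 MB or more is never held in memory at once. The choice of gzip, the `.gz` naming, and the destination directory are assumptions; the page does not say which format or location is used.

```python
import gzip
import shutil
from pathlib import Path


def compress_log(src: Path, dest_dir: Path) -> Path:
    """Gzip one logfile into dest_dir and return the path of the compressed copy."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / (src.name + ".gz")
    # copyfileobj streams in chunks, so large logs are not loaded into memory.
    with src.open("rb") as fin, gzip.open(target, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return target
```

A caller would run this once for the trace file and once for the state record, with `dest_dir` pointing at whatever space the Tournament Scheduler serves downloads from.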

If a server fails, the Tournament Scheduler must be able to detect the fact, clean up the mess, and try again. There are two obvious ways to detect failure:

  • Have an idea when the game should definitely be finished, and check to make sure the server has reached the end of its simulation and stopped. The trick is to know when the game should be finished.
  • Have the server compute its timeslot count and send it to the Tournament Scheduler when it starts, and then send a "heartbeat" message every 10 timeslots or so. The Tournament Scheduler would have to detect the fact that a heartbeat had NOT arrived on time (thus being required to prove a negative) and trigger the cleanup process.
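Both detection strategies can be sketched with one small bookkeeping object on the Tournament Scheduler side. The class and its parameters are hypothetical: `seconds_per_timeslot` and the `grace` multiplier are assumed tuning values, and timestamps are passed in explicitly so the logic can be checked without a real clock.

```python
from dataclasses import dataclass


@dataclass
class GameWatch:
    """Tracks one running game so a missing heartbeat can be noticed."""
    total_timeslots: int          # reported by the server when it starts
    seconds_per_timeslot: float   # assumed wall-clock length of a timeslot
    started_at: float             # time the game started, in seconds
    heartbeat_every: int = 10     # timeslots between heartbeats
    grace: float = 2.0            # tolerance multiplier before declaring failure
    last_timeslot: int = 0
    last_seen: float = 0.0

    def __post_init__(self):
        # Until the first heartbeat arrives, measure silence from the start time.
        self.last_seen = self.started_at

    def expected_end(self) -> float:
        """First strategy: the time by which the game should have finished."""
        return self.started_at + self.total_timeslots * self.seconds_per_timeslot

    def record_heartbeat(self, timeslot: int, now: float) -> None:
        self.last_timeslot = timeslot
        self.last_seen = now

    def overdue(self, now: float) -> bool:
        """Second strategy: True if no heartbeat arrived within the allowed window."""
        window = self.heartbeat_every * self.seconds_per_timeslot * self.grace
        return now - self.last_seen > window
```

The Tournament Scheduler would poll `overdue()` periodically for each running game and trigger cleanup and a retry when it returns true. The polling is what turns the absence of a message into an observable event.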


The general scheme is that the Tournament Scheduler keeps a database that includes pending games as well as the server resources available to run them. When it needs to start a server, kill one, or pull the logs from a completed simulation, it requests the corresponding action from the Jenkins service through a REST interface. Jenkins has the ability to run processes remotely, so it performs the requested actions.
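A request to Jenkins might be prepared as in the sketch below, which uses the `buildWithParameters` endpoint of the Jenkins remote access API to trigger a parameterized job. The job name and parameter names are hypothetical, authentication is omitted, and the request is only constructed here, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def jenkins_build_request(jenkins_base: str, job: str, params: dict) -> Request:
    """Prepare a POST that asks Jenkins to run a parameterized job."""
    base = jenkins_base.rstrip("/")
    url = f"{base}/job/{quote(job, safe='')}/buildWithParameters?{urlencode(params)}"
    # Jenkins expects build triggers as POST; the body can be empty.
    return Request(url, data=b"", method="POST")


# Hypothetical job and parameter names, for illustration only.
req = jenkins_build_request(
    "http://jenkins.example.org/",
    "start-server",
    {"machine": "host1", "gameId": "42"},
)
```

Under this assumption there would be one Jenkins job per action (start, kill, collect logs), each taking the target machine and game as parameters.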

Once a server starts up, it requests its configuration, bootstrap dataset, and broker credentials from the Tournament Scheduler (TS). It then computes the game length, informs the TS, and starts the sim. It may send "heartbeat" messages to the TS to allow for failure detection, and it sends a completion message to the TS when the sim is complete, before it exits.
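The order of messages the server sends to the TS during one game can be sketched as a generator. The message names and the tuple format are assumptions for illustration; the page does not define a wire format.

```python
def server_messages(game_length: int, heartbeat_every: int = 10):
    """Yield the messages a server would send to the TS over one game, in order."""
    # Sent once, after the server has computed the game length.
    yield ("game-length", game_length)
    for timeslot in range(1, game_length + 1):
        # The simulation of this timeslot would run here.
        if timeslot % heartbeat_every == 0:
            yield ("heartbeat", timeslot)
    # Sent once, just before the server exits.
    yield ("complete", game_length)


messages = list(server_messages(25))
# [("game-length", 25), ("heartbeat", 10), ("heartbeat", 20), ("complete", 25)]
```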

Notes