Producer Scheduler - indieprotocol/indie-core GitHub Wiki

Turn/Token producer scheduling algorithm

The algorithm which determines the order of producers is referred to as the producer scheduling algorithm.

However, Graphene has an additional requirement which is not taken into account by the solutions in the thread:

The membership and length of the list of producers may change over time.

So in this article I'll describe my solution.

Turns and tokens

The solution is based on terms of turns and tokens.

Newly inserted producers start out with a turn and a token.
In order for a producer to be scheduled, it must have a turn and a token.
The scheduler maintains a FIFO of producers without tokens.
If no producer has a turn, then the scheduler gives a turn to all producers. This is called "emitting a turn."
While less than half of the producers have tokens, give a token to the first producer in the FIFO and remove it from the FIFO.
Schedule a producer by picking randomly from all producers with both a turn and token.
When a producer is scheduled, it loses its turn and token.

The generic scheduler

The generic scheduler implements turns and tokens. It only depends on the C++11 stdlib and boost (not even using fc). Types provided by Graphene are template parameters.

The generic far future scheduler

The far future scheduler is implemented with the following rules:

Run until you emit a turn.
Record all producers produced.
Run until you emit a second turn.
The producers produced between the emission of the first turn (exclusive) and emission of the second turn (inclusive) are called the far future schedule.

Then the schedule for the rest of time is determined by repeating the future schedule indefinitely. The far future scheduler is required to give the scheduling algorithm bounded runtime and memory usage even in chains involving very long gaps.

Slots

Due to dynamic block interval, we must carefully keep in mind the difference between schedule slots and timestamps. A schedule slot number is a positive integer. A slot number of n represents the nth next block-interval-aligned timestamp after the head block.

Note that the mapping between slot numbers and timestamps will change if the block interval changes.

Scheduling blocks

When each block is produced, the blockchain must determine whether the scheduler needs to be run. If fewer than num_producers are scheduled, the scheduler will run until 2*num_producers are scheduled. A block in which the scheduler runs is called a scheduling block.

Changes in the set of active producers do not modify the existing schedule. Rather, they will be incorporated into new schedule entries when the scheduler runs in the next scheduling block. Thus, a producer that has lost an election may still produce 1-2 blocks. Such a producer is called a lame duck.

Near vs. far schedule

From a particular chain state, it must be possible to specify a mapping from slots to producers, called the total producer schedule. The total producer schedule is partitioned into a prefix, called the near schedule; the remainder is the far schedule.

When a block occurs, n entries are drained (removed) from the head of the total schedule, where n is the slot number of the new block according to its parent block.

If the block is a scheduling block, the total schedule is further transformed. The new near schedule contains 2*num_producers entries, with the previous near schedule as a prefix. The rest of the near schedule is determined by the current blockchain RNG.

The new far schedule is determined by running the far future scheduler, as described above. The far future scheduler also obtains entropy from the current blockchain RNG.

As an optimization, the implementation does not run the far future scheduler until a far-future slot is actually queried. With this optimization, the only circumstance under which validating nodes must run the far future scheduler is when a block gap longer than num_producers occurs (an extremely rare condition).

Minimizing impact of selective dropout

The ability of any single malicious producer to affect the results of the shuffle algorithm is limited because the RNG is based on bit commitment of the producers. However, a malicious producer is able to refuse to produce a block. A run of m consecutively scheduled malicious producers can independently make m independent choices of whether to refuse to produce a block. Basically they are able to control m bits of entropy in the shuffle algorithm's output.

It is difficult-to-impossible to entirely eliminate "the last person being evil" problem in trustless distributed RNG's. But we can at least mitigate this vector by rate-limiting changes to the total producer schedule to a very slow rate.

If every block schedules a producer, our adversary with m malicious producers gets m chances per round to selectively drop out in order to manipulate the shuffle order, allowing m attacks per round. If producers are only scheduled once per round, a selective dropout requires the malicious producer to produce the scheduling block, limiting the probability to m/n attacks per round.