What is Backbeat? - groupon/backbeat GitHub Wiki

Overview

Backbeat is an open source workflow service for processing asynchronous tasks across distributed systems. Originally developed by Groupon and its Financial Engineering Developer team, Backbeat allows applications to process tasks across N processes/machines while maintaining the state, order, and time in which tasks will run. Tasks can spawn other tasks and will be retried if the worker process errors out.

You would use Backbeat for:

  • Running tasks where state needs to be coordinated across several worker processes
  • Running tasks that depend on another task's success or failure
  • Running tasks that can fail and need to be automatically retried
  • Scheduling tasks that need to be run in the future, which may kick off other tasks
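To make the automatic retry idea concrete, here is a minimal sketch in Python. It is not Backbeat's implementation or API: `run_with_retries`, `flaky_task`, and the attempt limit are all hypothetical names and values chosen for illustration, and a real service would also wait between attempts.

```python
from typing import Callable, List


def run_with_retries(task: Callable[[], str], max_attempts: int = 3) -> str:
    """Run task, retrying on any exception up to max_attempts times."""
    errors: List[Exception] = []
    for _ in range(max_attempts):
        try:
            return task()
        except Exception as exc:  # a real worker would log and back off here
            errors.append(exc)
    raise RuntimeError(f"task failed after {max_attempts} attempts") from errors[-1]


attempts = {"count": 0}


def flaky_task() -> str:
    """Hypothetical task that errors out twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("worker errored out")
    return "done"


result = run_with_retries(flaky_task)
print(result, attempts["count"])
```

The caller only sees the final result; the two failed attempts are absorbed by the retry loop.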

Example

Let's say a customer purchases a brand new camera from Groupon Goods. Several tasks must run for this purchase to succeed. Backbeat helps you define a tree of tasks in something called a workflow, where each task is called a node. For this example we will refer to tasks as nodes. For Backbeat to work you need both a server and a client. The server holds the state of the workflow, and the client application performs the actual work.
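One way to picture a workflow is as a small tree structure. The sketch below is a hypothetical in-memory model in Python, not Backbeat's actual schema or client library; the `Workflow` and `Node` classes, their fields, and the state names are assumptions. The node names are taken from the steps in this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One task in a workflow tree (hypothetical model)."""
    name: str
    state: str = "pending"  # assumed lifecycle: pending -> running -> complete
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        child = Node(name)
        self.children.append(child)
        return child


@dataclass
class Workflow:
    """A tree of nodes attached to a subject unique to the client application."""
    subject: str
    nodes: List[Node] = field(default_factory=list)

    def signal(self, name: str) -> Node:
        # Signaling the workflow adds a new top-level node.
        node = Node(name)
        self.nodes.append(node)
        return node


def names(node: Node, depth: int = 0) -> List[str]:
    """Flatten a node and its descendants into indented names."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(names(child, depth + 1))
    return lines


workflow = Workflow(subject="Camera 703")
purchase = workflow.signal("Make Purchase")
success = purchase.add_child("Successful Purchase")
success.add_child("Notify Customer")
success.add_child("Notify Fulfillment Center")
print("\n".join(names(purchase)))
```

In this model the server would own the `Workflow` object and its node states, while the client would only receive node names and do the corresponding work.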

  1. The first step is to create the workflow. Your client application tells the Backbeat server when it wants to create a workflow. A workflow has a subject which is unique to your application. For this example let's create a workflow whose subject is the inventory unit for the camera that was purchased (Camera 703).
  2. The next step is to tell the workflow (from the client) which nodes we want to run. After a user on Groupon.com clicks to pay for their camera, we might send a signal called "Make Purchase".
  3. When the Backbeat server runs the "Make Purchase" node, it will tell the client "Hey, it's time to run the 'Make Purchase' node". For our scenario let's say that when the client is told to "Make Purchase" it first tries to reserve money on the customer's credit card. If successful, it marks Camera 703 in its database with purchased=true (this is important for idempotence) and then tells Backbeat to run a child node called "Successful Purchase".
  4. When it is time for "Successful Purchase" to run, the client may say that we want to run two "non-blocking" nodes: one called "Notify Customer" and one called "Notify Fulfillment Center". We define the nodes as non-blocking because we do not care if one node runs before the other. This is one reason why Backbeat is so useful. If we wanted to run a lot of non-blocking nodes (say 1000) but needed to do something after all 1000 of those nodes finished, we can run all 1000 of them across 1000 different processes/machines at the same time and be notified when they finish.
  5. When the "Notify Customer" and "Notify Fulfillment Center" nodes run, they send an email to the customer saying the purchase was successful and a message to the fulfillment center to have them ship the camera, respectively. The separate client processes that are running the nodes notify the Backbeat server that the nodes have completed, and then the parents are marked complete by the server.
  6. Extra credit: We could stop there, but let's say that instead of the purchase being successful it failed because the customer's credit card was declined. The entire process we just performed could be retried by signaling the workflow with another "Make Purchase" signal. Top-level nodes created by signaling a workflow are "blocking" by default. This means that if we sent 100 "Make Purchase" signals to the workflow at once, only one would run at a time.
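The difference between blocking and non-blocking nodes in the steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: threads stand in for separate client processes, and `run_blocking`, `run_non_blocking`, and the `completed` list are hypothetical names, not part of Backbeat.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

completed: List[str] = []  # order in which nodes are marked complete


def notify_customer() -> str:
    return "email sent"


def notify_fulfillment_center() -> str:
    return "shipment requested"


def run_non_blocking(parent: str, children: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Run all children concurrently, then mark the parent complete."""
    with ThreadPoolExecutor(max_workers=len(children)) as pool:
        futures = {name: pool.submit(work) for name, work in children.items()}
        # result() waits for that child, so this joins on every child.
        results = {name: future.result() for name, future in futures.items()}
    # Actual finish order is not guaranteed, so record children alphabetically.
    completed.extend(sorted(results))
    completed.append(parent)  # the parent completes only after every child
    return results


def run_blocking(signals: List[str], handler: Callable[[str], None]) -> None:
    """Top-level nodes are blocking: run one at a time, in signal order."""
    for name in signals:
        handler(name)


def make_purchase(name: str) -> None:
    run_non_blocking("Successful Purchase", {
        "Notify Customer": notify_customer,
        "Notify Fulfillment Center": notify_fulfillment_center,
    })
    completed.append(name)


run_blocking(["Make Purchase"], make_purchase)
print(completed)
```

The two notification nodes may run in either order, but "Successful Purchase" and then "Make Purchase" are only marked complete once both have finished.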

Sidenotes

Backbeat is super flexible

Defining the above workflow could be done in many different ways. For example, child nodes called Successful Purchase and Failed Purchase are unnecessary, but sometimes nice for readability and code organization. Without those two node definitions, the workflow would have performed just the same.

Idempotence is important when using Backbeat

In other words, each node needs to be defined so that running it multiple times does not change the result beyond what running it the first time produced. In step 3 we marked the camera as purchased in the database. We did this so that if the node failed for whatever reason after reserving money on the credit card and was then retried, the process running the node would check whether the camera was already purchased and would not charge the customer twice. This is not 100% foolproof. There is always the extremely rare case where the connection to the database is lost, or the process dies unexpectedly, immediately after charging the credit card. In that case Backbeat would retry the node and the customer could theoretically be charged twice.
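A minimal sketch of the idempotence check from step 3, written in Python. The in-memory `database` dict, the `charges` list, and the function names are hypothetical stand-ins for the client's real database and payment provider.

```python
from typing import Dict, List

database: Dict[str, Dict[str, bool]] = {"Camera 703": {"purchased": False}}
charges: List[str] = []  # stands in for calls to a payment provider


def reserve_money(unit: str) -> None:
    charges.append(unit)


def make_purchase(unit: str) -> None:
    """Idempotent handler: a retry after a recorded purchase does nothing."""
    if database[unit]["purchased"]:
        return  # already handled on an earlier attempt, so do not charge again
    reserve_money(unit)
    # A crash between these two steps is the rare double-charge window.
    database[unit]["purchased"] = True


make_purchase("Camera 703")
make_purchase("Camera 703")  # simulated retry of the same node
print(len(charges))
```

The second call returns early because the flag is already set, so the customer is charged once. The comment between the two steps marks where a crash would still allow a second charge on retry.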