Home - grantr/shortbus GitHub Wiki

Shortbus is a Change Data Capture (CDC) system for arbitrary, user-defined data streams. It is designed to be as resilient to failures as possible while still maintaining the following guarantees:

  • All messages are delivered at least once
  • No message is delivered out of sequence (k-sorted)
  • Streams are durable and highly available
  • Low latency
  • No single point of failure
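
At-least-once delivery means a subscriber can see the same message more than once, for example after a reconnect. Here is a minimal sketch of how a consumer might tolerate that, assuming every message carries a strictly increasing integer sequence number. The `seq` argument and the `DedupingConsumer` class are hypothetical and not part of Shortbus:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DedupingConsumer:
    """Applies each message at most once despite at-least-once delivery.

    Assumption (not from Shortbus): messages arrive in order and carry a
    strictly increasing integer sequence number.
    """
    apply: Callable[[Any], None]
    last_seq: int = -1  # highest sequence number applied so far

    def receive(self, seq: int, payload: Any) -> bool:
        # A redelivered message has a sequence number we have already passed.
        if seq <= self.last_seq:
            return False
        self.apply(payload)
        self.last_seq = seq
        return True


seen = []
consumer = DedupingConsumer(apply=seen.append)
# Message 1 is delivered twice; the second copy is ignored.
results = [consumer.receive(s, p) for s, p in [(0, "a"), (1, "b"), (1, "b"), (2, "c")]]
```

Persisting `last_seq` together with the applied data would let a restarted consumer resume without reapplying old messages.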

Shortbus is similar in guarantees and structure to Databus, a CDC system developed at LinkedIn. The implementations may differ substantially, because the implementation details of Databus have not been published.

Features

  • Optional stream persistence and snapshotting. Bootstrap new clients with a snapshot instead of needing to replay the entire stream. Useful for streams that represent databases or records with ids. (TODO)

What is it useful for?

User-space database replication and backup

Run a daemon that parses database transaction logs and sends updated records to Shortbus. Use it to replicate databases across WANs, or just for database backup. Use snapshots to bootstrap new database instances.
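
This page does not describe the Publisher API, so the following only sketches the shape such a daemon's publish step might take. The base URL, the `/streams/<name>` path, and the JSON body layout are all assumptions made for illustration; the Publisher API page is the place to look for the real interface.

```python
import json
import urllib.request


def build_publish_request(base_url: str, stream: str, record: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying one changed record.

    Hypothetical: the URL layout and JSON body are illustrative only.
    """
    body = json.dumps(record, sort_keys=True).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/streams/{stream}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One updated row, as a log-parsing daemon might emit it.
req = build_publish_request("http://shortbus.example", "users", {"id": 42, "name": "Ada"})
# Sending it would be: urllib.request.urlopen(req)
```

Because delivery is at least once, a daemon like this can retry a failed publish, provided consumers tolerate duplicates.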

Syndication to clients with intermittent connections

You have a local autocomplete database in your mobile app for lightning-fast search. How do you keep it updated? Send every name change to Shortbus. Clients connect on startup to get the latest deltas, or a snapshot if their data is too old. If you need real-time updates, just keep the connection open and new deltas will be pushed to clients.
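
The "deltas or snapshot" decision can be sketched with a toy in-memory stream. This is not how Shortbus is implemented; the `RetainedStream` class, its `retention` parameter, and the integer sequence numbers are assumptions for illustration:

```python
from typing import Dict, List, Tuple

Delta = Tuple[int, str, str]  # (seq, key, new_value)


class RetainedStream:
    """Toy stream that keeps only the most recent `retention` deltas (>= 1)."""

    def __init__(self, retention: int) -> None:
        self.retention = retention
        self.deltas: List[Delta] = []
        self.state: Dict[str, str] = {}  # current snapshot of all keys
        self.next_seq = 0

    def publish(self, key: str, value: str) -> None:
        self.deltas.append((self.next_seq, key, value))
        self.deltas = self.deltas[-self.retention:]  # drop the oldest deltas
        self.state[key] = value
        self.next_seq += 1

    def sync(self, last_seen: int):
        """Return deltas if the client can catch up, otherwise a snapshot."""
        oldest = self.deltas[0][0] if self.deltas else self.next_seq
        if last_seen + 1 >= oldest:
            return "deltas", [d for d in self.deltas if d[0] > last_seen]
        # The client is missing deltas that are no longer retained.
        return "snapshot", (dict(self.state), self.next_seq - 1)


stream = RetainedStream(retention=2)
stream.publish("u1", "Ada")     # seq 0
stream.publish("u2", "Grace")   # seq 1
stream.publish("u1", "Ada L.")  # seq 2

recent_client = stream.sync(last_seen=0)   # only missed seq 1 and 2
stale_client = stream.sync(last_seen=-1)   # missed seq 0, which was dropped
```

A client that receives a snapshot would replace its local database and remember the accompanying sequence number as its new `last_seen`.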

Event sourcing

You're running a shipping warehouse and you want to capture everything that ever happens to an item in the warehouse for auditing purposes. Whenever there's an update, push that change into Shortbus and get a persisted sequence with arbitrary lookback.
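
As an illustration of what a persisted sequence with lookback allows, the sketch below replays a list of events to answer "where was this item as of event N?". The event layout and the `location_at` helper are hypothetical, and the events are held in a plain list rather than fetched from Shortbus:

```python
from typing import List, Optional, Tuple

Event = Tuple[int, str, str]  # (seq, item_id, location)


def location_at(events: List[Event], item_id: str, as_of_seq: int) -> Optional[str]:
    """Replay events up to `as_of_seq` and return the item's location then."""
    location = None
    for seq, event_item, new_location in events:
        if seq > as_of_seq:
            break  # events are in sequence order, so nothing later matters
        if event_item == item_id:
            location = new_location
    return location


log = [
    (0, "crate-7", "receiving dock"),
    (1, "crate-9", "receiving dock"),
    (2, "crate-7", "aisle 12"),
    (3, "crate-7", "outbound truck"),
]

audit_answer = location_at(log, "crate-7", as_of_seq=2)
```

The in-sequence guarantee is what makes the early `break` safe: once an event past the cutoff appears, no earlier event can follow it.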

Any situation where sequenced data streams are needed

You can probably think of something!

More info

  • Publisher API: The HTTP API used by producers to publish transactions
  • Subscriber API: The HTTP API used by subscribers to consume transactions
  • Streams: The structure and semantics of streams
  • Snapshots: The structure and semantics of snapshots
  • Relays: Relay servers and stream consistency
  • Bootstrap Servers: Bootstrap servers and storage
  • Scenarios: Various scenarios and how they will be handled