001. Kafka Theory. Topics, partitions and offsets.

  • Topics: a particular stream of data
    • Similar to a table in a database (without all the constraints)
    • You can have as many topics as you want
    • A topic is identified by its name
  • Topics are split into partitions
    • Each partition is ordered
    • Each message within a partition gets an incremental ID, called an offset
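The relationship between topics, partitions and offsets can be sketched with a small in-memory model. This is only an illustration, not how Kafka is implemented: the `Topic` and `Partition` classes and the topic name `demo_topic` are hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An ordered, append-only log of messages (hypothetical model)."""
    messages: list = field(default_factory=list)

    def append(self, message: str) -> int:
        # The offset is simply the message's position in this partition's log.
        self.messages.append(message)
        return len(self.messages) - 1


@dataclass
class Topic:
    """A named stream of data split into a fixed number of partitions."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [Partition() for _ in range(self.num_partitions)]

    def append(self, partition: int, message: str) -> int:
        # Each partition keeps its own, independent offset sequence.
        return self.partitions[partition].append(message)


topic = Topic(name="demo_topic", num_partitions=3)
first = topic.append(0, "a")   # offset 0 in partition 0
second = topic.append(0, "b")  # offset 1 in partition 0
other = topic.append(1, "c")   # offset 0 in partition 1
print(first, second, other)    # prints: 0 1 0
```

Offsets start at 0 in every partition and grow by one per message, so an offset identifies a message only in combination with its partition.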

Topic example: trucks_gps

  • Say you have a fleet of trucks, each truck reports its GPS position to Kafka
  • You can have a topic trucks_gps that contains the position of all trucks
  • Each truck sends a message to Kafka every 20 seconds; each message contains the truck ID and the truck position (latitude and longitude)
  • We choose to create that topic with 10 partitions (arbitrary number)
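As a rough sketch of what one message in the trucks_gps example might look like, the snippet below serializes a GPS report as JSON. The JSON layout, the field names, the `make_gps_message` helper and the ID `truck-42` are assumptions for illustration; the notes above only say that a message carries the truck ID and its latitude and longitude.

```python
import json

TOPIC = "trucks_gps"
NUM_PARTITIONS = 10           # arbitrary number, as in the example above
REPORT_INTERVAL_SECONDS = 20  # each truck reports every 20 seconds


def make_gps_message(truck_id: str, latitude: float, longitude: float) -> bytes:
    """Serialize one GPS report as UTF-8 JSON (the format is an assumption)."""
    payload = {"truck_id": truck_id, "latitude": latitude, "longitude": longitude}
    return json.dumps(payload).encode("utf-8")


message = make_gps_message("truck-42", 52.52, 13.405)

# At one report every 20 seconds, a single truck produces this many messages a day.
reports_per_truck_per_day = 24 * 60 * 60 // REPORT_INTERVAL_SECONDS
print(reports_per_truck_per_day)  # prints: 4320
```

Using the truck ID as the message key would be a natural choice here, since it keeps all reports from one truck in a single partition and therefore in order.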

  • Offsets only have meaning within a specific partition
    • E.g. offset 3 in partition 0 doesn't represent the same data as offset 3 in partition 1
  • Order is guaranteed only within a partition (not across partitions)
  • Data is kept only for a limited time (default is one week)
  • Once the data is written to a partition, it can't be changed (immutability)
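  • Unless a key is provided, the producer chooses the partition for each message itself, spreading data across partitions with no guarantee about which one a message lands in (more on this later)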
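To illustrate why a key matters for ordering, here is a minimal sketch of hash-based partition selection. It assumes a simple "hash of the key modulo partition count" scheme. CRC32 is used as a stand-in so the example needs only the standard library; Kafka's Java client hashes key bytes with murmur2, so real partition numbers will differ. The `choose_partition` function and the truck IDs are hypothetical.

```python
import zlib

NUM_PARTITIONS = 10


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition by hashing it (CRC32 is a stand-in hash)."""
    return zlib.crc32(key) % num_partitions


# One in-memory log per partition, purely for illustration.
logs = {p: [] for p in range(NUM_PARTITIONS)}

reports = [("truck-1", "A"), ("truck-2", "B"), ("truck-1", "C"), ("truck-1", "D")]
for truck_id, position in reports:
    partition = choose_partition(truck_id.encode("utf-8"), NUM_PARTITIONS)
    logs[partition].append((truck_id, position))

# All reports keyed by "truck-1" land in one partition, in the order sent.
truck_1_partition = choose_partition(b"truck-1", NUM_PARTITIONS)
truck_1_positions = [pos for tid, pos in logs[truck_1_partition] if tid == "truck-1"]
print(truck_1_positions)  # prints: ['A', 'C', 'D']
```

Because ordering is guaranteed only within a partition, sending related messages with the same key is what keeps them in order relative to each other.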