BG Workers Summary - noobaa/noobaa-core GitHub Wiki

[WIP]

A short overview of the currently implemented background (BG) workers. They are all started from the main BG Workers Runner, and each one has its own delay between cycles.
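To illustrate the "each worker has its own delay" model, here is a minimal Python sketch of a runner that drives several workers on a virtual clock. `BgWorker`, its fields and `simulate_runner` are hypothetical names chosen for illustration, not the noobaa-core API; the real runner may schedule workers differently (for example, with each worker running its own independent loop).

```python
import heapq
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class BgWorker:
    # Hypothetical worker descriptor (not the noobaa-core API).
    name: str
    delay: float                   # seconds to wait between cycles
    run_cycle: Callable[[], None]  # performs one cycle of work


def simulate_runner(workers: List[BgWorker], horizon: float) -> List[Tuple[float, str]]:
    """Run every worker on a virtual clock until `horizon`; return (time, name) per cycle."""
    # Min-heap of (next due time, worker index); all workers are due at t=0.
    queue = [(0.0, i) for i in range(len(workers))]
    heapq.heapify(queue)
    timeline = []
    while queue:
        due, i = heapq.heappop(queue)
        if due > horizon:
            break  # the earliest pending cycle is past the horizon, so all others are too
        worker = workers[i]
        worker.run_cycle()
        timeline.append((due, worker.name))
        # Reschedule the worker after its own delay.
        heapq.heappush(queue, (due + worker.delay, i))
    return timeline
```

For example, a worker with a 60 s delay and one with a 90 s delay both run at t=0; after that, the first runs at 60, 120 and 180 and the second at 90 and 180.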

Stats Aggregator

The Stats Aggregator BG serves two roles:

  1. Collecting the statistics that are sent to phone home. It gathers anonymous data (number of buckets, types of policies used, etc.). This functionality is currently disabled.
  2. Collecting statistics and metrics to expose through the /metrics Prometheus REST API. In this flow, the aggregator distinguishes between a full cycle and a partial cycle. The partial cycle currently runs every 5 minutes, while the full cycle runs every 6 partial cycles (i.e., every 30 minutes). The difference between the two is mainly object stats and other heavy computations, which we don't want running every 5 minutes.
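The partial/full split can be sketched as a simple cycle counter. This is a minimal illustration, assuming that cycle 0 is a full cycle and that a full cycle collects the light metrics plus the heavy ones (such as object stats); `cycle_kind`, `run_stats_cycle` and the collector callbacks are hypothetical names, not the actual stats_aggregator code.

```python
from typing import Callable, Dict

PARTIAL_CYCLE_MINUTES = 5      # partial cycle interval (from the text)
PARTIAL_CYCLES_PER_FULL = 6    # one full cycle every 6 partial cycles (from the text)
FULL_CYCLE_MINUTES = PARTIAL_CYCLE_MINUTES * PARTIAL_CYCLES_PER_FULL  # 30 minutes


def cycle_kind(cycle_index: int) -> str:
    """Classify the n-th aggregator cycle (0-based) as 'full' or 'partial'."""
    # Assumption: cycle 0 is full, so complete metrics exist right after startup.
    return "full" if cycle_index % PARTIAL_CYCLES_PER_FULL == 0 else "partial"


def run_stats_cycle(cycle_index: int,
                    collect_light: Callable[[], Dict[str, int]],
                    collect_heavy: Callable[[], Dict[str, int]]) -> Dict[str, int]:
    """Collect light metrics on every cycle and heavy ones only on full cycles."""
    metrics = dict(collect_light())
    if cycle_kind(cycle_index) == "full":
        metrics.update(collect_heavy())  # e.g. object stats, too costly every 5 minutes
    return metrics
```

Keeping the heavy collectors behind a single `cycle_kind` check means the 5-minute cadence stays cheap, while the expensive figures are refreshed every 30 minutes.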

Mirror Writer

The Mirror Writer BG serves async replication for tiers whose placement policy is MIRROR. For each tier in the system that has MIRROR as its placement, the initial write is directed to one of the mirror groups (taking region locality into account, if regions were set). On each cycle, this BG goes over all the chunks created since its last updated timestamp and replicates them to the mirror groups defined within the tier.
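A minimal sketch of one Mirror Writer cycle, assuming each chunk records its creation time and the set of mirror groups that already hold a copy. `Chunk`, `mirror_writer_cycle` and the timestamp marker are hypothetical; adding a group name to `chunk.groups` stands in for the actual block replication.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Chunk:
    chunk_id: str
    created: float                                  # creation time (epoch seconds)
    groups: Set[str] = field(default_factory=set)   # mirror groups holding a copy


def mirror_writer_cycle(chunks: List[Chunk], tier_groups: List[str], last_marker: float) -> float:
    """Replicate chunks created after last_marker to all mirror groups; return the new marker."""
    new_marker = last_marker
    for chunk in sorted(chunks, key=lambda c: c.created):
        if chunk.created <= last_marker:
            continue  # already handled by a previous cycle
        missing = [g for g in tier_groups if g not in chunk.groups]
        for group in missing:
            chunk.groups.add(group)  # stands in for copying the chunk's blocks to `group`
        new_marker = max(new_marker, chunk.created)
    return new_marker
```

Persisting the returned marker is what lets the next cycle look only at newly created chunks instead of rescanning everything.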

MD Aggregator

The MD Aggregator collects statistics about objects and their mappings. While the Stats Aggregator is also a BG that collects statistics, objects and mappings need different handling than direct querying at scale. For this purpose, this BG runs 90 seconds behind the current system time and calculates the sizes of all new and deleted objects, chunks, and blocks in the system. It saves this data, along with the timestamp of its collection, in the DB, where it is later used by both the stats_aggregator and the various management clients (mainly the UI) to display statistics and other information.
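The lagging-window idea can be sketched as follows: each run only covers the interval between the previous collection timestamp and "now minus 90 seconds", adding the sizes of items created in that window and subtracting the sizes of items deleted in it. `Record`, `md_aggregate` and the in-memory record list are hypothetical stand-ins for the DB queries; only the 90-second lag comes from the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

LAG_SECONDS = 90  # the aggregator trails the current system time by 90 seconds


@dataclass
class Record:
    kind: str                        # "object", "chunk" or "block"
    size: int                        # bytes
    created: float                   # epoch seconds
    deleted: Optional[float] = None  # epoch seconds, or None if still alive


def md_aggregate(records: Iterable[Record], since: float, now: float) -> Tuple[Dict[str, int], float]:
    """Return per-kind size deltas over (since, now - LAG_SECONDS] and the new collection timestamp."""
    till = now - LAG_SECONDS
    if till <= since:
        return {}, since  # the lagging window has not advanced yet
    delta: Dict[str, int] = defaultdict(int)
    for r in records:
        if since < r.created <= till:
            delta[r.kind] += r.size  # new data in the window
        if r.deleted is not None and since < r.deleted <= till:
            delta[r.kind] -= r.size  # data deleted in the window
    return dict(delta), till
```

The returned timestamp plays the role of the saved collection time: the next run starts exactly where this one stopped, so no interval is counted twice.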

Usage Aggregator

Similarly to the MD Aggregator, the Usage Aggregator collects statistics about bandwidth and throughput in the system. The data is broken down by account and is later used in the Analytics shown in the UI and in the metrics exposed to Prometheus.
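A minimal sketch of the per-account breakdown, assuming usage arrives as samples of bytes read and written tagged with an account. `UsageSample` and `aggregate_usage` are hypothetical names; the real aggregator's record format and time bucketing are not described here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class UsageSample:
    account: str
    read_bytes: int
    write_bytes: int


def aggregate_usage(samples: Iterable[UsageSample]) -> Dict[str, Dict[str, int]]:
    """Sum read/write bytes per account."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"read_bytes": 0, "write_bytes": 0})
    for s in samples:
        t = totals[s.account]
        t["read_bytes"] += s.read_bytes
        t["write_bytes"] += s.write_bytes
    return dict(totals)
```

Dividing these totals by the aggregation interval would give a per-account throughput figure of the kind shown in the UI Analytics.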

Scrubber

The Scrubber is a crawler that goes over all the mappings in the system and verifies their correctness according to the policy set on the bucket and the desired state of the different resources. Examples of issues the Scrubber can fix include reclaiming chunks/blocks of deleted objects that were not reclaimed, and performing async replication to newly added mirror groups.
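The two example fixes can be sketched as a comparison between each mapping and a desired state. This is a simplified illustration, assuming a single set of desired mirror groups instead of a full per-bucket policy; `Mapping` and `scrub` are hypothetical, and the function only returns the actions it would take rather than performing them.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Mapping:
    chunk_id: str
    object_deleted: bool                           # owning object was deleted
    groups: Set[str] = field(default_factory=set)  # mirror groups holding blocks of this chunk


def scrub(mappings: List[Mapping], desired_groups: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (action, chunk_id, group) fix-ups needed to reach the desired state."""
    actions = []
    for m in mappings:
        if m.object_deleted:
            # Leftover blocks of a deleted object that were never reclaimed.
            for g in sorted(m.groups):
                actions.append(("reclaim", m.chunk_id, g))
            continue
        # Live chunk: replicate to any desired mirror group lacking a copy (e.g. a new group).
        for g in sorted(desired_groups - m.groups):
            actions.append(("replicate", m.chunk_id, g))
    return actions
```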

Bucket Reclaimer

The Bucket Reclaimer mainly serves the OBC (ObjectBucketClaim) flows. When an OBC is deleted, the corresponding bucket is marked as deleted: it no longer shows up in bucket lists and the data within it is no longer accessible. This BG then goes over all the objects within the bucket and deletes them.
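A minimal sketch of a reclaim cycle, assuming objects are deleted in bounded batches per cycle and that a deleted bucket's record is dropped once it is empty. `Bucket`, `reclaim_cycle` and `batch_size` are hypothetical; both the batching and the final removal of the bucket record are assumptions, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bucket:
    name: str
    deleted: bool = False
    objects: List[str] = field(default_factory=list)


def reclaim_cycle(buckets: Dict[str, Bucket], batch_size: int = 2) -> int:
    """Delete up to batch_size objects per deleted bucket; return how many were removed."""
    removed = 0
    for name in list(buckets):  # copy the keys: entries may be deleted while iterating
        bucket = buckets[name]
        if not bucket.deleted:
            continue  # live buckets are never touched
        batch = bucket.objects[:batch_size]
        del bucket.objects[:batch_size]  # stands in for deleting the objects' data
        removed += len(batch)
        if not bucket.objects:
            del buckets[name]  # assumption: the empty bucket record is dropped
    return removed
```

Bounding each cycle's work keeps a single huge deleted bucket from monopolizing the worker; the remaining objects are picked up on later cycles.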

Tier TTF

The Tier Time To Fill (TTF) BG runs on the tiering policies of all buckets with more than one tier, and aims to keep each tier except the last one in the chain able to serve writes for 1 hour. It does so by calculating the bandwidth observed within the last hour for that specific bucket and using it as a reference point for the free capacity the tier needs in order to support another hour of writing.
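The arithmetic behind the 1-hour target can be sketched as below, assuming the write rate observed over the last hour continues unchanged and is applied to every non-last tier of the bucket. `time_to_fill` and `free_capacity_deficits` are hypothetical helpers; how the BG actually frees capacity on a tier is not covered here.

```python
import math
from typing import List

HOUR_SECONDS = 3600


def time_to_fill(free_bytes: int, written_last_hour: int) -> float:
    """Seconds until a tier fills up if writes continue at last hour's rate."""
    if written_last_hour <= 0:
        return math.inf  # no recent writes: the tier never fills at this rate
    return free_bytes * HOUR_SECONDS / written_last_hour


def free_capacity_deficits(tier_free_bytes: List[int], written_last_hour: int) -> List[int]:
    """Bytes each non-last tier must free to absorb another hour of writes."""
    desired_free = written_last_hour  # one hour of writes at the observed rate
    # The last tier in the chain is excluded, as in the text.
    return [max(0, desired_free - free) for free in tier_free_bytes[:-1]]
```

For example, with 300 bytes written in the last hour, a first tier with 100 bytes free would fill in 1200 seconds and is 200 bytes short of the 1-hour target.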