Working with named caches - GlobalCyberAlliance/pdns GitHub Wiki

Creating a named cache

TODO

Setting up a data source

For the initial run, a named cache supports only CDB files as a data source. The reason is that CDB lookup times are effectively constant: a lookup takes about the same time whether the file holds 1 entry or 1 000 000 entries.

NOTE: The CDB format stores file positions as 32-bit values, so the total size of a CDB cannot exceed 4 GB.

There are two ways to assign a data source to a named cache:

  • Binding
  • Loading

Loading a named cache reads the entire contents of the data source and populates an in-memory cache. The benefit is that the named cache never has to resolve a lookup against the data source; the cost is a higher memory footprint.

For memory-constrained environments, the user can opt to bind the named cache to its data source instead. In this case, a lookup checks the in-memory cache first. If no entry is held in memory, the named cache performs the lookup against the data source; if that lookup yields a "hit", the named cache stores the lookup key and the resulting value in memory. An additional benefit of binding is that re-binding the named cache to another data source is very fast. Because re-binding starts from an empty in-memory cache, it may cause a "thundering herd" of cache misses, leading to a period of slightly elevated disk I/O.
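
The difference between the two modes can be modelled in a few lines of C++. This is a minimal, single-threaded sketch and not the dnsdist implementation: DataSource, NamedCacheModel, loadFrom, bindTo and memorySize are hypothetical names, and a std::unordered_map stands in for the CDB file.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a CDB file: a read-only key/value store that
// counts how often it is consulted.
struct DataSource {
    std::unordered_map<std::string, std::string> entries;
    mutable int reads = 0;

    std::optional<std::string> get(const std::string& key) const {
        ++reads;
        auto it = entries.find(key);
        if (it == entries.end()) return std::nullopt;
        return it->second;
    }
};

// Single-threaded model of a named cache (assumption, not the real class).
class NamedCacheModel {
public:
    // Loading: copy every entry into memory; the source is not kept.
    void loadFrom(const DataSource& src) {
        memory_ = src.entries;
        source_ = nullptr;
    }

    // Binding: start with an empty in-memory cache and keep the source.
    void bindTo(const DataSource& src) {
        memory_.clear();
        source_ = &src;
    }

    std::optional<std::string> lookup(const std::string& key) {
        auto it = memory_.find(key);
        if (it != memory_.end()) return it->second;   // in-memory hit
        if (source_ == nullptr) return std::nullopt;  // loaded: nothing else to ask
        auto value = source_->get(key);               // fall back to the data source
        if (value) memory_.emplace(key, *value);      // only hits are remembered
        return value;
    }

    std::size_t memorySize() const { return memory_.size(); }

private:
    std::unordered_map<std::string, std::string> memory_;
    const DataSource* source_ = nullptr;
};
```

In this model a bound cache touches the data source once per distinct key that exists, while a loaded cache never touches it after loadFrom returns.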

Binding

Binding a named cache to a data source is done through its bindToCDB(path) method:

nc = newNamedCache("default")
nc:bindToCDB("path/to/file.cdb")

Calling bindToCDB on a named cache that is currently bound to a data source purges the named cache's in-memory cache and closes any open file descriptors to the previous data source.

If bindToCDB is called on a named cache that was previously populated with the loadFromCDB method, the in-memory cache is purged as well.

Loading

Populating a named cache from a data source is done through its loadFromCDB(path) method:

nc = newNamedCache("default")
nc:loadFromCDB("path/to/file.cdb")

Calling loadFromCDB on a named cache that has already been populated builds a new in-memory cache first and only then purges the old one.

Querying

Basic lookups against a named cache are done through its lookup(domain) method:

local result = getNamedCache("default"):lookup("www.example.net")

for k, v in pairs(result) do
  print(k, v)
end

Keeping it atomic

While this detail is hidden from the user, a named cache's lookup method must operate on an atomically-loaded pointer to the in-memory cache and, if need be, on an atomically-loaded pointer to the data source. The atomic operations are what allow the named cache's data source and/or in-memory cache to be swapped out without blocking any queries.
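
As an illustration of the idea, the sketch below keeps the in-memory cache behind a std::shared_ptr that readers load atomically and writers store atomically. It is an assumption about how such a scheme could look, not the actual dnsdist code; AtomicCacheHolder, lookup and publish are hypothetical names. It uses the std::atomic_load and std::atomic_store overloads for std::shared_ptr from <memory>, which C++20 deprecates in favour of std::atomic<std::shared_ptr<T>>.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

using Cache = std::unordered_map<std::string, std::string>;

// Hypothetical holder for an immutable, fully populated in-memory cache.
class AtomicCacheHolder {
public:
    AtomicCacheHolder() : cache_(std::make_shared<Cache>()) {}

    // A reader takes one atomic snapshot of the pointer and works on that
    // snapshot only, so a concurrent swap cannot pull the cache away from it.
    std::optional<std::string> lookup(const std::string& key) const {
        std::shared_ptr<const Cache> snapshot = std::atomic_load(&cache_);
        auto it = snapshot->find(key);
        if (it == snapshot->end()) return std::nullopt;
        return it->second;
    }

    // A writer builds the replacement completely, then publishes it with a
    // single atomic store; readers see either the old or the new cache.
    void publish(Cache fresh) {
        std::shared_ptr<const Cache> next =
            std::make_shared<Cache>(std::move(fresh));
        std::atomic_store(&cache_, std::move(next));
    }

private:
    std::shared_ptr<const Cache> cache_;
};
```

The cache is treated as read-only here, which matches a loaded named cache; a bound named cache also inserts entries during lookups and would need extra synchronisation for those inserts.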

Cache swapping

Regardless of whether a named cache has been loaded from, or bound to, its data source, at no point will swapping its data source or in-memory cache block queries against it.

When a named cache that is currently bound to its data source is re-bound:

  • The new data source is "opened";
  • A pointer to the old data source is atomically loaded;
  • A mutex on the old data source should be locked to prevent any asynchronous lookup;
  • A pointer to the new data source is atomically stored in the named cache;
  • Any open file descriptors to the old data source are closed, and its mutex is unlocked;
  • A new, empty in-memory cache object is allocated;
  • A pointer to the named cache's current, in-memory cache is atomically loaded;
  • The named cache's pointer to its in-memory cache is atomically swapped so that it refers to the new cache;
  • The old in-memory cache is de-allocated.
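
The re-bind sequence above can be sketched as follows. This is a simplified model under stated assumptions: DataSource, MemoryCache and BoundCacheModel are hypothetical types, the caller is assumed to have already opened the new data source, and the explicit mutex on the old data source is not modelled. Instead, shared ownership keeps the old objects alive until the last lookup that holds them has returned.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an opened CDB file.
struct DataSource {
    std::unordered_map<std::string, std::string> entries;
};

// In-memory cache; the mutex guards inserts made by concurrent lookups.
struct MemoryCache {
    std::mutex mutex;
    std::unordered_map<std::string, std::string> entries;
};

class BoundCacheModel {
public:
    explicit BoundCacheModel(std::shared_ptr<const DataSource> opened)
        : source_(std::move(opened)), memory_(std::make_shared<MemoryCache>()) {}

    void rebind(std::shared_ptr<const DataSource> opened) {
        // Publish the new data source first ...
        std::shared_ptr<const DataSource> oldSource =
            std::atomic_exchange(&source_, std::move(opened));
        // ... then replace the in-memory cache with a new, empty one.
        std::shared_ptr<MemoryCache> oldMemory =
            std::atomic_exchange(&memory_, std::make_shared<MemoryCache>());
        // oldSource and oldMemory go out of scope here; each object is
        // destroyed once the last lookup holding a snapshot of it returns.
    }

    std::optional<std::string> lookup(const std::string& key) {
        // Snapshot the cache before the source (see the note below).
        std::shared_ptr<MemoryCache> memory = std::atomic_load(&memory_);
        std::shared_ptr<const DataSource> source = std::atomic_load(&source_);
        std::lock_guard<std::mutex> guard(memory->mutex);
        auto hit = memory->entries.find(key);
        if (hit != memory->entries.end()) return hit->second;
        auto found = source->entries.find(key);
        if (found == source->entries.end()) return std::nullopt;
        memory->entries.emplace(key, found->second);  // remember the hit
        return found->second;
    }

    std::size_t cachedEntries() const {
        std::shared_ptr<MemoryCache> memory = std::atomic_load(&memory_);
        std::lock_guard<std::mutex> guard(memory->mutex);
        return memory->entries.size();
    }

private:
    std::shared_ptr<const DataSource> source_;
    std::shared_ptr<MemoryCache> memory_;
};
```

In this model the writer swaps the data source before the cache, and the reader snapshots the cache before the data source. With that ordering a lookup can pair the old cache with the new source, which is harmless because the old cache is about to be discarded, but it cannot store a value from the old source in the new cache.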

When a named cache that is currently bound is loaded:

  • The new data source is opened;
  • A new in-memory cache object is allocated;
  • The new, in-memory cache object is populated from the new data source;
  • The named cache's pointer to its in-memory cache is atomically swapped with a pointer to the new, in-memory cache;
  • The old in-memory cache is de-allocated;
  • The currently-bound data source is closed.

When a named cache that is currently loaded is bound:

  • A new, empty, in-memory cache object is allocated;
  • The named cache's data source pointer is atomically stored to refer to the new data source;
  • The named cache's in-memory cache pointer is atomically swapped to refer to the new in-memory cache object;
  • The old in-memory cache is de-allocated.

When a named cache that is currently loaded is re-loaded:

  • A new, empty, in-memory cache object is allocated;
  • The new in-memory cache is populated from the new data source;
  • The named cache's in-memory cache pointer is atomically swapped to refer to the new in-memory cache;
  • The old in-memory cache is de-allocated.

Regardless of whether a bound cache is being re-bound or loaded, or whether a loaded cache is being re-loaded or bound, there are several details that should hold true:

  • An atomically incremented and decremented counter may track the number of currently active queries against the in-memory cache;
  • Similarly, a counter may track the number of currently active queries against the data source;
  • The old object can be de-allocated only once its counter (the in-memory cache's or the data source's) reaches 0 (zero).
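
One way to get this behaviour without hand-written counters is to let std::shared_ptr's reference count play the role of the "active queries" counter. The sketch below is an illustration under that assumption, not a description of the real implementation; MemoryCache, CountedCacheHolder, beginQuery, swap and destroyedCaches are hypothetical names, and the destructor counter exists only so that the example can observe when de-allocation happens.

```cpp
#include <atomic>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Counts destroyed caches so the example can observe de-allocation.
static std::atomic<int> destroyedCaches{0};

struct MemoryCache {
    std::unordered_map<std::string, std::string> entries;
    ~MemoryCache() { ++destroyedCaches; }
};

class CountedCacheHolder {
public:
    CountedCacheHolder() : cache_(std::make_shared<MemoryCache>()) {}

    // A query "checks out" the current cache. Holding the returned pointer
    // increments the reference count; dropping it decrements the count.
    std::shared_ptr<MemoryCache> beginQuery() const {
        return std::atomic_load(&cache_);
    }

    // Swap in a replacement. The holder gives up its own reference to the
    // old cache, which is destroyed only when no query still holds it.
    void swap(std::shared_ptr<MemoryCache> fresh) {
        std::atomic_store(&cache_, std::move(fresh));
    }

private:
    std::shared_ptr<MemoryCache> cache_;
};
```

A query that called beginQuery before a swap keeps the old cache alive until it releases its pointer; at that moment the count reaches zero and the old cache is destroyed.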