Working with named caches - GlobalCyberAlliance/pdns GitHub Wiki
Creating a named cache
TODO
Setting up a data source
For the initial run, a named cache only supports CDB files as a data source. The reason is that lookup times against a CDB are constant: a lookup costs the same whether the file holds 1 entry or 1 000 000 entries.
NOTE: Due to the CDB specification, the total size of a CDB cannot exceed 4GB.
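The 4GB ceiling comes from the 32-bit offsets the CDB format uses to address positions inside the file. A guard along the following lines could reject an oversized file before it is handed to a named cache. This is a minimal sketch in plain Lua; `cdbSizeOk` is a hypothetical helper and not part of the named cache API.

```lua
-- Hypothetical helper (not part of the named cache API): checks that a file
-- is small enough to be a valid CDB before it is bound or loaded.
local CDB_MAX_BYTES = 2^32  -- CDB addresses positions with 32-bit offsets

local function cdbSizeOk(path)
  local f, err = io.open(path, "rb")
  if not f then
    return false, err
  end
  local size = f:seek("end")  -- seeking to the end returns the size in bytes
  f:close()
  if not size then
    return false, "could not determine the size of " .. path
  end
  if size > CDB_MAX_BYTES then
    return false, "file is too large for the CDB format"
  end
  return true, size
end
```

A configuration script could call `cdbSizeOk(path)` first and only proceed to `bindToCDB(path)` or `loadFromCDB(path)` when it returns `true`.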
There are two ways to assign a data source to a named cache:
- Binding
- Loading
Loading a named cache from a data source reads the entire contents of the data source and populates an in-memory cache. The benefit of loading is that the named cache never has to resolve a lookup against the data source; the cost is a higher memory footprint.
For memory-constrained environments, the user can opt to bind the named cache to its data source. In this case, when a lookup is performed against the named cache, it checks its in-memory cache first. If there is no entry held in memory, the named cache performs the lookup against the data source; should that lookup yield a "hit", the named cache stores the lookup key and resulting value in memory. An additional benefit of binding is that re-binding the named cache to another data source is very fast, since the in-memory cache starts out empty. The trade-off is a possible "thundering herd" of cache misses right after the switch, leading to a period of slightly elevated disk I/O.
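The difference between the two modes can be modelled in a few lines of plain Lua. This is only a sketch of the behaviour described above, not the named cache implementation: a Lua table stands in for the CDB file, and `newSource`, `sourceGet`, `newLoadedCache`, `newBoundCache` and `lookup` are hypothetical names, as are the example values.

```lua
-- Stand-in for a CDB file: a plain Lua table plus a read counter.
-- Everything here is a hypothetical model, not the named cache implementation.
local function newSource(entries)
  return { entries = entries, reads = 0 }
end

local function sourceGet(source, key)
  source.reads = source.reads + 1  -- each call models one lookup on disk
  return source.entries[key]
end

-- Loading: copy every entry into memory up front; the source is not read again.
local function newLoadedCache(source)
  local memory = {}
  for key, value in pairs(source.entries) do
    memory[key] = value
  end
  return { memory = memory }
end

-- Binding: start with an empty in-memory cache and keep a handle to the source.
local function newBoundCache(source)
  return { memory = {}, source = source }
end

local function lookup(cache, key)
  local value = cache.memory[key]
  if value ~= nil then
    return value                 -- served from memory
  end
  if cache.source == nil then
    return nil                   -- loaded cache: memory holds everything
  end
  value = sourceGet(cache.source, key)
  if value ~= nil then
    cache.memory[key] = value    -- remember the hit for later lookups
  end
  return value
end

local source = newSource({ ["www.example.net"] = "blocked", ["example.org"] = "allowed" })
local bound = newBoundCache(source)
lookup(bound, "www.example.net")  -- miss in memory, hit in the source, then stored
lookup(bound, "www.example.net")  -- hit in memory
lookup(bound, "nope.example")     -- miss in both; nothing is stored
print(source.reads)               -- 2
```

In this model the bound cache only ever holds the keys that were actually queried, which is where the lower memory footprint comes from.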
Binding
Binding a named cache to a data source is done through its bindToCDB(path) method:
nc = newNamedCache("default")
nc:bindToCDB("path/to/file.cdb")
Calling bindToCDB on a named cache that is currently bound to a data source purges the named cache's in-memory cache and closes any open file descriptors to the previous data source.
If bindToCDB is called on a named cache that was previously populated with the loadFromCDB method, the in-memory cache is purged.
Loading
Populating a named cache from a data source is done through its loadFromCDB(path) method:
nc = newNamedCache("default")
nc:loadFromCDB("path/to/file.cdb")
Calling loadFromCDB on a named cache that has already been populated builds a new in-memory cache before purging the old one.
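The "populate first, purge afterwards" order can be sketched as follows. This is a hypothetical model in plain Lua: `cache.memory` plays the role of the in-memory cache and `readAll` stands in for iterating over every record of a CDB file. Keeping the old cache when the read fails is an assumption of this sketch; the text above only states the ordering.

```lua
-- Hypothetical model: build the replacement completely, then swap it in.
local function reload(cache, readAll)
  local fresh = {}
  local ok, err = pcall(function()
    for key, value in readAll() do
      fresh[key] = value
    end
  end)
  if not ok then
    return false, err   -- assumption: on failure the old cache stays in place
  end
  cache.memory = fresh  -- single assignment; the old table becomes garbage
  return true
end

local cache = { memory = { ["old.example"] = "1" } }

-- Successful reload: the old entries are replaced in one step.
reload(cache, function() return pairs({ ["new.example"] = "2" }) end)

-- Failed reload: the data source cannot be read, so nothing changes.
local ok = reload(cache, function() error("unreadable file") end)
print(ok, cache.memory["new.example"])
```

Because lookups only ever see `cache.memory` before or after the assignment, they never observe a half-populated cache.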
Querying
Basic lookups against a named cache can be done through its lookup(domain) method:
-- lookup returns a table; print each key/value pair it holds
local result = getNamedCache("default"):lookup("www.example.net")
for k, v in pairs(result) do
  print(k, v)
end
Keeping it atomic
While this detail is hidden from the user, when a named cache's lookup method is called, it must operate on an atomically-loaded pointer to the in-memory cache and, if need be, on an atomically-loaded pointer to the data source. The use of atomic operations is important because it allows the named cache's data source and/or in-memory cache to be swapped out without blocking any queries.
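The idea can be illustrated in plain Lua, with the caveat that Lua itself is single-threaded and has no atomic operations. In this hypothetical model a coroutine that yields in the middle of a lookup stands in for a query running on another thread, reading `cache.memory` into a local models the atomic load, and assigning to `cache.memory` models the atomic store.

```lua
-- Hypothetical model of "load the pointer once, then work on that object".
local cache = { memory = { ["www.example.net"] = "old" } }

local function lookup(key)
  local snapshot = cache.memory  -- models the atomic load of the pointer
  coroutine.yield()              -- a swap may happen while the query is in flight
  return snapshot[key]           -- the query keeps using the object it loaded
end

local query = coroutine.create(lookup)
coroutine.resume(query, "www.example.net")      -- query starts, then pauses

cache.memory = { ["www.example.net"] = "new" }  -- models the atomic store

local _, inFlight = coroutine.resume(query)     -- the paused query finishes

local later = coroutine.wrap(lookup)
later("www.example.net")                        -- a query started after the swap
local afterSwap = later()

print(inFlight, afterSwap)                      -- old   new
```

The query that was already running finishes against the object it loaded, while a query that starts after the swap sees the new one; neither waits for the other.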
Cache swapping
Regardless of whether a named cache has been loaded from, or bound to, its data source, at no point will the named cache ever block queries against itself.
When a named cache that is currently bound to its data source is re-bound:
- The new data source is "opened";
- A pointer to the old data source is atomically loaded;
- A mutex on the old data source should be locked to prevent any asynchronous lookup;
- A pointer to the new data source is atomically stored in the named cache;
- Any open file descriptors to the old data source are closed, and its mutex is unlocked;
- A new, empty in-memory cache object is allocated;
- A pointer to the named cache's current, in-memory cache is atomically loaded;
- The named cache's pointer to its in-memory cache is atomically swapped so that it refers to the new cache;
- The old in-memory cache is de-allocated.
When a named cache that is currently bound is loaded:
- The new data source is opened;
- A new in-memory cache object is allocated;
- The new, in-memory cache object is populated from the new data source;
- The named cache's pointer to its in-memory cache is swapped with a pointer to the new, in-memory cache;
- The old in-memory cache is de-allocated;
- The currently-bound data source is closed.
When a named cache that is currently loaded is bound:
- A new, empty, in-memory cache object is allocated;
- The named cache's data source pointer is atomically stored to refer to the new data source;
- The named cache's in-memory cache pointer is swapped out to refer to the new in-memory cache object;
- The old in-memory cache is de-allocated.
When a named cache that is currently loaded is re-loaded:
- A new, empty, in-memory cache object is allocated;
- The new in-memory cache is populated from the new data source;
- The named cache's in-memory cache pointer is atomically swapped to refer to the new in-memory cache;
- The old in-memory cache is de-allocated.
Regardless of whether a bound cache is being re-bound or loaded, or whether a loaded cache is being re-loaded or bound, there are several details that should hold true:
- There may be an atomically incremented and decremented counter that tracks the number of currently active queries against the in-memory cache;
- Similarly, a counter may track the current number of queries against the data source;
- The old object can be de-allocated only once the internal counter of the in-memory cache, or of the data source, reaches 0 (zero).
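The counting scheme in the list above can be sketched as follows. This is a hypothetical, single-threaded model in plain Lua: `newObject`, `acquire`, `release`, `swap` and `free` are illustrative names, and the `active` field would be an atomic counter in a multi-threaded implementation, which would also have to make "load the pointer and increment its counter" safe against a concurrent swap.

```lua
-- Hypothetical model: an object is freed only after it has been replaced
-- and its count of active queries has dropped to zero.
local function newObject(name)
  return { name = name, active = 0, retired = false, freed = false }
end

local function free(object)
  object.freed = true  -- stands in for closing descriptors or releasing memory
end

local holder = { current = newObject("cache-1") }

-- A query pins the object it loaded for as long as it runs.
local function acquire()
  local object = holder.current
  object.active = object.active + 1
  return object
end

local function release(object)
  object.active = object.active - 1
  if object.retired and object.active == 0 then
    free(object)  -- the last query out releases the retired object
  end
end

-- Swapping installs the new object at once; the old one is freed only when idle.
local function swap(newObj)
  local old = holder.current
  holder.current = newObj
  old.retired = true
  if old.active == 0 then
    free(old)
  end
  return old
end

local pinned = acquire()                 -- a query is in flight on cache-1
local old = swap(newObject("cache-2"))   -- new queries now see cache-2
local freedDuringQuery = old.freed
print(freedDuringQuery)                  -- false: one query still uses cache-1
release(pinned)
print(old.freed)                         -- true: the counter reached zero
```

The swap itself never waits for running queries; only the clean-up of the old object is deferred.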