Cache - serlo/documentation GitHub Wiki

The basic idea of a cache

Making a request to the database layer is quite slow, and serlo.org has so many users that our infrastructure would crash if we performed a database request for every visit to our site. Thus we store responses of the database layer in a cache which we can reuse for subsequent requests. Behind the scenes we use Redis for this job, which is a key-value database. This means that Redis stores values which can be addressed / modified / retrieved via a unique key.

Let's see how we use the cache in an example: When you perform a request to dataSources.serlo.getUuid({ id: 1 }), we first look in the cache whether we have already saved an older response for this request. In order to do so we have a function which assigns a unique key to each service request. In this case it is the key de.serlo.org/api/uuid/1. Note that given the key de.serlo.org/api/uuid/1 we know that the request must have been dataSources.serlo.getUuid({ id: 1 }) and vice versa. Thus service request <-> cache key is a 1:1 relationship.
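This 1:1 relationship can be illustrated with a pair of helper functions that translate between a request payload and its cache key. Note that the function names and types below are hypothetical, chosen for illustration; the real key-building code lives in the model:

```typescript
// Hypothetical helpers illustrating the 1:1 mapping between a service
// request and its cache key (not the actual implementation).
interface UuidRequest {
  id: number
}

// Serialize a getUuid request into its unique cache key.
function cacheKeyForUuidRequest(payload: UuidRequest): string {
  return `de.serlo.org/api/uuid/${payload.id}`
}

// Recover the original request from a cache key (the inverse direction).
function uuidRequestForCacheKey(key: string): UuidRequest | null {
  const match = /^de\.serlo\.org\/api\/uuid\/(\d+)$/.exec(key)
  return match ? { id: Number(match[1]) } : null
}
```

Because the mapping is invertible, a worker that only knows a cache key can always reconstruct which request produced it.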

Now we look into our cache whether we have stored an older response for dataSources.serlo.getUuid({ id: 1 }) under the key de.serlo.org/api/uuid/1. If so, we return it. If not, we perform a request to the database layer, store the returned value in the cache for later, and return it:

activity diagram: how the cache works
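The lookup-then-fetch flow in the diagram can be sketched as follows. A Map stands in for Redis, and fetchFromDatabaseLayer is a placeholder for the real HTTP request; both are assumptions for illustration:

```typescript
// Sketch of the cache-aside flow; a Map stands in for Redis.
const cache = new Map<string, unknown>()

async function fetchFromDatabaseLayer(id: number): Promise<unknown> {
  // Placeholder for the real request to the database layer.
  return { id, __typename: 'Article' }
}

async function getUuid(id: number): Promise<unknown> {
  const key = `de.serlo.org/api/uuid/${id}`

  // 1. Look for an earlier response under the request's unique key.
  if (cache.has(key)) return cache.get(key)

  // 2. Cache miss: ask the database layer ...
  const value = await fetchFromDatabaseLayer(id)

  // 3. ... store the response for later requests, and return it.
  cache.set(key, value)
  return value
}
```

After the first call, every further getUuid(1) is served from the cache without touching the database layer.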

How to keep the cache in sync with the database

Congratulations! :smile: We have a cache, and once we have stored a response in it we never have to repeat the same request twice. :tada: Right? ... Unfortunately this is not the case. :cry: There is a reason for the quote "There are only two hard things in Computer Science: cache invalidation and naming things." (Phil Karlton). Keeping the cache in sync with the database is painful, and invalid or outdated cache values have already caused a lot of problems. So, these are our strategies for updating the cache...

Updating the cache for mutations

Each mutation changes something. Thus, most mutations need to be reflected in an update of the cache as well. Therefore the createMutation() function for service endpoints in the model lets you specify an updateCache() function. This function is called after the mutation, with the mutation's arguments and its returned value, in order to update the cache:

createThread = createMutation({
  ...
  updateCache({ payload, value }) {
    // implement updates of the cache here
  }
})

See https://github.com/serlo/api.serlo.org/blob/main/packages/server/src/model/serlo.ts for some examples.
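The mechanism can be sketched as a tiny wrapper: run the mutation, then hand payload and value to updateCache(). The helper below is a minimal illustration under that assumption, not the real createMutation() from the model:

```typescript
// Minimal sketch of a createMutation() helper that calls updateCache()
// after the mutation succeeds; all names are illustrative.
const cache = new Map<string, unknown>()

interface MutationSpec<Payload, Value> {
  mutate: (payload: Payload) => Promise<Value>
  updateCache?: (args: { payload: Payload; value: Value }) => void | Promise<void>
}

function createMutation<Payload, Value>(spec: MutationSpec<Payload, Value>) {
  return async (payload: Payload): Promise<Value> => {
    const value = await spec.mutate(payload)
    // Reflect the mutation in the cache so later queries see fresh data.
    await spec.updateCache?.({ payload, value })
    return value
  }
}

// Example: a thread creation that writes the new object into the cache.
const createThread = createMutation({
  async mutate(payload: { objectId: number; content: string }) {
    // Placeholder for the real request to the database layer.
    return { id: 1000 + payload.objectId, content: payload.content }
  },
  updateCache({ value }) {
    cache.set(`de.serlo.org/api/uuid/${value.id}`, value)
  },
})
```

The key point is ordering: the cache is only touched after the mutation has succeeded, so a failed mutation never poisons the cache.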

Listeners at legacy serlo.org

We are still in the process of migrating from the legacy monolithic serlo.org system to our new infrastructure. Thus there are some mutating requests which are handled directly by the legacy system and are not passed through the API. In this case the API cannot update its cache directly and needs to be informed of changes by other means.

For this case the API exposes mutation endpoints for updating the cache, see ~/schema/cache/types.graphql:

type Mutation {
  _cache: _cacheMutation!
}

type _cacheMutation {
  # Updates the cache entry to a certain value
  set(input: CacheSetInput!): CacheSetResponse!

  # Deletes a cache entry so that it needs to be refetched
  remove(input: CacheRemoveInput!): CacheRemoveResponse!

  # Forces the API to update the given keys
  update(input: CacheUpdateInput!): CacheUpdateResponse!
}

In the legacy serlo.org system there are listeners watching for changes to the database which haven't been passed through the API (for example changing a taxonomy term). In case of a change they use the above endpoints to update the cache of the API or to invalidate the affected keys.

The SWR algorithm to update cache values

In order to update cache values regularly we use the stale-while-revalidate algorithm. Here we assign to each service endpoint a maximum age for its cache entries. When a cache value is older than this age, we say that the cache entry is stale. We still return it to keep responses fast; however, we update the cache entry in the background.

To do so we put the cache key into a job queue of cache entries which need to be updated. A pool of SWR queue workers takes keys from this queue and updates them. With this architecture we perform update requests sequentially rather than in parallel in case a lot of cache keys become stale at once. This ensures that we do not overload the database layer and the database with too many requests at once.

To be able to control the flow of requests and not overwhelm the backends with update requests, we decided to implement a parameter swrFrequency that lets us limit the percentage of stale cache keys that actually get put on the queue. For example: if it is set to 10%, a stale value is put on the queue in only 10% of the cases in which it is requested. It can be activated in times of high load and deactivated when it is no longer necessary.
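The sampling gate boils down to one probabilistic check before enqueueing. The function below is a sketch of that idea (the injectable random source is added only to make it testable; the real code presumably just uses Math.random()):

```typescript
// Sketch of the swrFrequency gate: only the given fraction of stale-key
// hits actually enqueue a background update.
function shouldEnqueue(
  swrFrequency: number, // 0 = never enqueue, 1 = always enqueue
  random: () => number = Math.random,
): boolean {
  return random() < swrFrequency
}
```

With swrFrequency = 0.1, roughly one in ten requests that hit a stale key triggers an update; the other nine just serve the stale value.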

Update cache values manually

Even with the above mechanisms we can end up with invalid cache entries due to bugs or other reasons. When a user reports such an error we might want to react instantly and not wait until the cache entry gets updated. In those cases you can update the cache manually. In order to do this you can perform a mutation against the _cache.remove endpoint:

  1. Go to https://serlo.org and log in
  2. Go to https://frontend.serlo.org/___graphql and perform the following request
mutation {
  _cache {
    # the exact input/response fields may differ; check types.graphql
    remove(input: { key: "<cache key you want to delete>" }) {
      success
    }
  }
}

Have a look at the definition for the service endpoint (like serlo.ts) to see what the cache key for a particular resource is.

Note that we allow this only for software developers at serlo.org. If you do not have permission to remove an invalid cache entry, go to https://community.serlo.org/channel/feature-requests-and-bugs and request it in this channel.