Clustering improvements: disk quota

Introduction

The current disk quota mechanism is based on Berkeley DB Java Edition (BDB). While very efficient, it has a few drawbacks in enterprise setups:

  • although possible via streaming replication, clustering it is not easy
  • enterprise systems might already have a database of choice that is the mandatory place to store data. The database in question is often a well-known relational one (Oracle, PostgreSQL, SQL Server), often centralized and itself clustered.

The purpose of this proposal is to make the disk quota usable in a wider range of cases by extending the implementation options and allowing the subsystem to work off a relational database.

Changes

The current gwc-diskquota module embeds in a single container three core elements:

  • the object model describing disk quotas, annotated for integration with BDB Java Edition
  • the BDB Java edition based storage engine, implementing the QuotaStore interface
  • the higher-level system that listens to tile cache changes, does the disk quota math and issues requests to the storage subsystem

In order to have a pluggable system with multiple implementations the following changes are proposed:

  • switch the model classes (LayerQuota, PageStats, Quota, TilePage, TileSet) to interfaces
  • have a parent gwc-diskquota module that contains no code, but only sub-modules
  • have a gwc-diskquota-core module containing the upper level system and the model classes
  • have a gwc-diskquota-bdb module containing the Berkeley DB implementation of the quota subsystem
  • have a gwc-diskquota-jdbc module with the code common to all relational database oriented implementations of the disk quota subsystem
  • have two modules, gwc-diskquota-oracle and gwc-diskquota-postgres, providing the specific implementation details for the Oracle and PostgreSQL databases respectively

Implementation

The JDBC-based disk quota system will follow the core/dialect pattern already used successfully in GeoTools JDBC data stores, where a core module provides all the core functionality while the database-specific code (if any) is contained in a Dialect class hierarchy, with one subclass per target database.
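As a rough illustration of the core/dialect split, the sketch below keeps the shared DDL in a base class and lets each database override only the pieces that differ. All class, table and column names here are hypothetical and chosen for the example; they are not the schema or the API of the proposed modules.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical base dialect: the names and the schema are illustrative only.
abstract class QuotaDialectSketch {

    /** Column type for 64-bit counters; subclasses override it where the database differs. */
    String bigIntType() {
        return "BIGINT";
    }

    /** Column type for variable-length strings of the given maximum length. */
    String stringType(int length) {
        return "VARCHAR(" + length + ")";
    }

    /** DDL statements the core module would run against an empty database. */
    List<String> createSchemaStatements() {
        return Arrays.asList(
                "CREATE TABLE TILEPAGE ("
                        + "PAGE_KEY " + stringType(320) + " PRIMARY KEY, "
                        + "TILESET_ID " + stringType(320) + " NOT NULL, "
                        + "NUM_HITS " + bigIntType() + " NOT NULL)",
                "CREATE INDEX TILEPAGE_TILESET ON TILEPAGE (TILESET_ID)");
    }
}

// PostgreSQL accepts the defaults, so this subclass overrides nothing.
class PostgresDialectSketch extends QuotaDialectSketch {
}

// Oracle uses its own type names for strings and large integers.
class OracleDialectSketch extends QuotaDialectSketch {
    @Override
    String bigIntType() {
        return "NUMBER(19)";
    }

    @Override
    String stringType(int length) {
        return "VARCHAR2(" + length + ")";
    }
}
```

With this layout the core code only ever calls `createSchemaStatements()` on whichever dialect was configured, and supporting another database means adding one more subclass.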

The core module will be implemented around Spring's JdbcTemplate and will be powered by a user-provided DataSource, so that users can either use a simple local connection pool or rely on the connection pooling abilities of the web container.

The implementation will use prepared statements heavily, as the disk quota upper level tends to send a lot of small update requests for specific pages and we need to save network/parsing/planning time on each of these requests. The upper levels already queue the changes so that they are sent either every 1000 changes or every 2 seconds, which means the storage subsystem is relieved from having to deal with load control heuristics.
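The queuing behaviour described above can be sketched as a small buffer that hands a batch to the store once it holds 1000 changes or once 2 seconds have passed. This is only a minimal sketch: the class and method names are hypothetical, the existing upper-level code may be structured differently, and the 2-second rule is interpreted here as the age of the oldest queued change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical buffer: names are illustrative, not GeoWebCache API.
class ChangeBufferSketch<T> {
    static final int MAX_CHANGES = 1000;       // flush once this many changes are queued
    static final long MAX_AGE_MILLIS = 2000L;  // or once the oldest queued change is this old

    private final List<T> pending = new ArrayList<>();
    private final Consumer<List<T>> store;     // receives each batch, e.g. to run a batched update
    private final LongSupplier clock;          // injectable so the timing rule can be tested
    private long firstQueuedAt;

    ChangeBufferSketch(Consumer<List<T>> store, LongSupplier clock) {
        this.store = store;
        this.clock = clock;
    }

    /** Queues one change and flushes if either threshold has been reached. */
    synchronized void add(T change) {
        if (pending.isEmpty()) {
            firstQueuedAt = clock.getAsLong();
        }
        pending.add(change);
        flushIfDue();
    }

    /** Meant to be called periodically as well, so a quiet buffer still gets flushed. */
    synchronized void flushIfDue() {
        if (pending.isEmpty()) {
            return;
        }
        boolean full = pending.size() >= MAX_CHANGES;
        boolean old = clock.getAsLong() - firstQueuedAt >= MAX_AGE_MILLIS;
        if (full || old) {
            store.accept(new ArrayList<>(pending)); // pass a copy so the batch is stable
            pending.clear();
        }
    }
}
```

Because batching happens before the store is called, a JDBC store behind such a buffer can simply execute each batch with one reused prepared statement and does not need its own throttling.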

If the target database is empty, the code will automatically create the necessary tables, relying on the dialect classes to set up the best possible index layout.

Backward compatibility

The default GWC implementation will keep on using the embedded BDB solution, posing no backward compatibility issues. Systems with a pre-existing cache moving from BDB to JDBC will be transparently migrated to the new storage by a background thread that will load information from the tiles on disk (that is, there is no direct migration of the statistics from BDB to JDBC).