Pluggable BlobStores - GeoWebCache/geowebcache GitHub Wiki

Introduction

BlobStore is the interface that handles the actual storage and retrieval of tile images. Currently the only implementation is FileBlobStore, which stores each tile as a single file following a legacy filesystem layout.

Only a single instance of FileBlobStore can exist, and the location of the stored tiles is tied to the location of the geowebcache.xml configuration file.

This proposal aims to incorporate the following enhancements while maintaining behavioral backwards compatibility with GWC versions prior to 1.8, and minimal API changes:

  • Decouple the location of the cache directory and the geowebcache.xml configuration file;
  • Allow for multiple cache base directories;
  • Allow for storage mechanisms other than the current FileBlobStore;
  • Allow for different storage mechanisms to coexist;
  • Allow choosing which "blob store" to save tiles to on a per "tile layer" basis.

Status

Pull request ready for review: https://github.com/GeoWebCache/geowebcache/pull/337. User documentation included in this commit: https://github.com/groldan/geowebcache/commit/b3562231b73a9b1e0e3057c8ea08e715129e35d0

Strategy

Given that BlobStore is an interface, we can create implementations that store tiles in a different format and/or storage backend. The problem is that GWC is designed to work with only one BlobStore, defined in the Spring configuration. The easiest path to having multiple BlobStores coexist is to keep that assumption but use a composite blob store that merely multiplexes tile operations to the actual blob stores configured in the geowebcache.xml configuration file. This gives the administrator the flexibility to configure several blob stores without touching the Spring configuration.

To do so, instead of changing the BlobStore API, we'll define a base configuration bean that holds the basic properties of a blob store (unique id, enabled flag, etc.), and concrete subclasses specialized in configuring and creating instances of the different blob store types.

With this in place, the only missing piece to glue everything together is the ability of a TileLayer to define which blob store its tiles shall be stored to, which is provided by adding a blobStoreId property to TileLayer. This property can be undefined (i.e. null), meaning the "default" blob store shall be used instead. By default, the "default" blob store is the same FileBlobStore as in versions prior to 1.8, created automatically following the same cache directory lookup mechanism. It can also be overridden by configuring a blob store in geowebcache.xml and setting its boolean default flag to true.
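
The lookup rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the GWC implementation: `StoreResolver`, `register` and `resolve` are hypothetical names, and the type parameter `S` stands in for a real BlobStore instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolves a layer's blobStoreId to a store,
// falling back to the default store when the id is null.
final class StoreResolver<S> {
    private final Map<String, S> storesById = new HashMap<>();
    private final S defaultStore;

    StoreResolver(S defaultStore) {
        this.defaultStore = defaultStore;
    }

    void register(String id, S store) {
        storesById.put(id, store);
    }

    // A null blobStoreId means "use the default blob store".
    S resolve(String blobStoreId) {
        if (blobStoreId == null) {
            return defaultStore;
        }
        S store = storesById.get(blobStoreId);
        if (store == null) {
            throw new IllegalArgumentException("No blob store with id '" + blobStoreId + "'");
        }
        return store;
    }
}
```

Failing on an unknown id is a choice made for this sketch; the text above only specifies the behavior for a null id.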

Changes

Implementation

Decouple blob store configuration from the wiring of application components, defining a base bean class for blob store configuration and delegating blob store construction to a factory method:

public abstract class BlobStoreConfig {
	/**
	 * @return the unique identifier for the blob store; which
	 *         {@link TileLayer#getBlobStoreId()} refers to.
	 */
	public String getId();
	/**
	 * @return whether the blob store is enabled ({@code true}) or not.
	 */
	public boolean isEnabled();
	/**
	 * Sets whether the blob store is enabled ({@code true}) or not.
	 */
	public void setEnabled(boolean enabled);
	/**
	 * @return whether the blob store defined by these settings is the default
	 *         one (i.e. the one used when
	 *         {@code TileLayer#getBlobStoreId() == null}, and hence used to
	 *         preserve backwards compatibility).
	 */
	public boolean isDefault();
	/**
	 * Sets whether the blob store defined by these settings is the default one
	 * (i.e. the one used when {@code TileLayer#getBlobStoreId() == null}, and
	 * hence used to preserve backwards compatibility).
	 */
	public void setDefault(boolean def);
	/**
	 * Factory method for this class of blobstore, configured as per this
	 * configuration object properties.
	 * <p>
	 * May only be called if {@link #isEnabled() isEnabled() == true}.
	 * 
	 * @throws StorageException
	 *             if the blob store can't be created with these configuration
	 *             settings
	 * @throws IllegalStateException
	 *             if {@link #isEnabled() isEnabled() == false} or
	 *             {@link #getId() getId() == null}
	 */
	public abstract BlobStore createInstance() throws StorageException;
}

And have concrete config beans for each kind of blob store:

public class FileBlobStoreConfig extends BlobStoreConfig {
    public String getBaseDirectory() {
        ...
    }
    ...
}
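
To illustrate the factory contract documented on createInstance(), here is a minimal, self-contained sketch. All type names (`SketchStore`, `SketchStoreConfig`, `SketchFileStoreConfig`) and the `doCreate()` hook are hypothetical stand-ins, not GWC classes. Enforcing the preconditions in a final template method is an assumption of this sketch; the proposal itself declares createInstance() abstract.

```java
// Hypothetical stand-in for the real BlobStore interface.
interface SketchStore {
    String describe();
}

// Hypothetical stand-in for BlobStoreConfig.
abstract class SketchStoreConfig {
    private String id;
    private boolean enabled = true;

    String getId() { return id; }
    void setId(String id) { this.id = id; }
    boolean isEnabled() { return enabled; }
    void setEnabled(boolean enabled) { this.enabled = enabled; }

    // Checks the documented preconditions, then delegates to the subclass.
    final SketchStore createInstance() {
        if (!enabled) {
            throw new IllegalStateException("blob store is disabled");
        }
        if (id == null) {
            throw new IllegalStateException("blob store has no id");
        }
        return doCreate();
    }

    protected abstract SketchStore doCreate();
}

// Hypothetical file-based configuration carrying a base directory.
final class SketchFileStoreConfig extends SketchStoreConfig {
    private final String baseDirectory;

    SketchFileStoreConfig(String baseDirectory) {
        this.baseDirectory = baseDirectory;
    }

    @Override
    protected SketchStore doCreate() {
        return () -> "file store at " + baseDirectory;
    }
}
```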

Introduce CompositeBlobStore, a BlobStore implementation that receives the list of configured blob stores from the XMLConfiguration and multiplexes tile operations to the blob store configured for each layer, or the default blob store if the layer has no explicit store id. Let the default blob store be created following the legacy cache location lookup mechanism in case no configured blob store is explicitly set as the default one.

public class CompositeBlobStore implements BlobStore {
    private Map<String, LiveStore> blobStores = new ConcurrentHashMap<>();
    private TileLayerDispatcher layers;
    private DefaultStorageFinder defaultStorageFinder;

    static final class LiveStore {
        private BlobStoreConfig config;

        private BlobStore liveInstance;

        public LiveStore(BlobStoreConfig config, @Nullable BlobStore store) {
            Preconditions.checkArgument((config.isEnabled() && store != null)
                    || !config.isEnabled());
            this.config = config;
            this.liveInstance = store;
        }
    }

    /**
     * Create a composite blob store that multiplexes tile operations to configured blobstores based
     * on {@link BlobStoreConfig#getId() blobstore id} and TileLayers
     * {@link TileLayer#getBlobStoreId() BlobStoreId} matches.
     * 
     * @param layers used to get the layer's {@link TileLayer#getBlobStoreId() blobstore id}
     * @param defaultStorageFinder to resolve the location of the cache directory for the legacy
     *        blob store when no {@link BlobStoreConfig#isDefault() default blob store} is given
     * @param configuration the configuration as read from {@code geowebcache.xml} containing the
     *        configured {@link XMLConfiguration#getBlobStores() blob stores}
     * @throws ConfigurationException if there's a configuration error like a store config having
     *         no id, or two store configs having the same id, or more than one store config being
     *         marked as the default one, or the default store is not
     *         {@link BlobStoreConfig#isEnabled() enabled}
     * @throws StorageException if the live {@code BlobStore} instance can't be
     *         {@link BlobStoreConfig#createInstance() created} for an enabled
     *         {@link BlobStoreConfig}
     */
    public CompositeBlobStore(TileLayerDispatcher layers,
            DefaultStorageFinder defaultStorageFinder, XMLConfiguration configuration)
            throws StorageException, ConfigurationException {

        this.layers = layers;
        this.defaultStorageFinder = defaultStorageFinder;
        this.blobStores = loadBlobStores(configuration.getBlobStores());
    }
...
}
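
The configuration errors listed in the constructor's Javadoc could be checked along these lines. This is a hedged sketch only: `StoreEntry` and `StoreConfigValidator` are hypothetical names, and `IllegalArgumentException` stands in for GWC's `ConfigurationException` to keep the example self-contained.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified view of one configured blob store.
final class StoreEntry {
    final String id;
    final boolean enabled;
    final boolean isDefault;

    StoreEntry(String id, boolean enabled, boolean isDefault) {
        this.id = id;
        this.enabled = enabled;
        this.isDefault = isDefault;
    }
}

final class StoreConfigValidator {
    private StoreConfigValidator() {}

    // Returns the id of the explicitly configured default store, or null if
    // no store is marked as default (the legacy store would then be used).
    static String validate(List<StoreEntry> entries) {
        Set<String> seenIds = new HashSet<>();
        String defaultId = null;
        for (StoreEntry entry : entries) {
            if (entry.id == null) {
                throw new IllegalArgumentException("blob store has no id");
            }
            if (!seenIds.add(entry.id)) {
                throw new IllegalArgumentException("duplicate blob store id: " + entry.id);
            }
            if (entry.isDefault) {
                if (defaultId != null) {
                    throw new IllegalArgumentException("more than one default blob store");
                }
                if (!entry.enabled) {
                    throw new IllegalArgumentException("default blob store is disabled");
                }
                defaultId = entry.id;
            }
        }
        return defaultId;
    }
}
```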

API

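Add a getBlobStoreId():String method to TileLayer. It may return null, meaning the layer's tiles are handled by the default blob store. Also expose the list of configured blob stores through GeoWebCacheConfiguration and XMLConfiguration.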

public abstract class TileLayer {
    /**
     * @return the identifier for the blob store that manages this layer tiles,
     *         or {@code null} if the default blob store shall be used
     */
    public String getBlobStoreId();
    ...
}

public class GeoWebCacheConfiguration {
    private List<BlobStoreConfig> blobStores;
 ...
    public List<? extends BlobStoreConfig> getBlobStores(){
    	return blobStores;
    }
 ...
}

public class XMLConfiguration implements Configuration {
    public List<? extends BlobStoreConfig> getBlobStores() {
        return gwcConfig.getBlobStores();
    }
 ...
}

Configuration

Replace FileBlobStore with CompositeBlobStore in geowebcache-core-context.xml:

<bean id="gwcBlobStore" class="org.geowebcache.storage.CompositeBlobStore" destroy-method="destroy">

Declare the needed changes in geowebcache.xsd:

  <xs:element name="gwcConfiguration">
...
        <xs:element name="formatModifiers" type="gwc:formatModifiers" minOccurs="0">
        ...
        </xs:element>
        <xs:element name="blobStores" minOccurs="0" maxOccurs="1">
          <xs:annotation>
            <xs:documentation xml:lang="en">
              The list of blob stores. BlobStores allow defining a storage mechanism and format, such as the legacy file system
              based storage, an Amazon S3 instance with a TMS-like key structure, etc; independently of where the tiles come from
              in the TileLayer configuration. 
            </xs:documentation>
          </xs:annotation>
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="gwc:blobstore" minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
...
  </xs:element>
  <xs:complexType name="AbstractBlobStore" abstract="true">
    <xs:sequence>
      <xs:element name="id" minOccurs="1" maxOccurs="1" type="xs:string" nillable="false">
        <xs:annotation>
          <xs:documentation>
            A blob store must have a unique identifier assigned through this element, which can be referenced
            by any number of TileLayer's 'blobStoreId'.
          </xs:documentation>
        </xs:annotation>
      </xs:element>
      <xs:element name="enabled" minOccurs="0" maxOccurs="1" type="xs:boolean" default="true">
        <xs:annotation>
          <xs:documentation>
            Defines whether the blob store is enabled (true) or disabled (false). Attempting to use
            a TileLayer whose blob store is disabled will result in a runtime exception.
          </xs:documentation>
        </xs:annotation>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="default" type="xs:boolean" default="false">
      <xs:annotation>
        <xs:documentation xml:lang="en">
          The default attribute can only be true for one of the configured blob stores.
          If no blob store is configured as the default one, then one will be created automatically
          following the legacy location discovery method of looking for the GEOWEBCACHE_CACHE_DIR environment
          variable, servlet context parameter, or JVM argument.
          Additionally, any layer that has no blobStoreId set will default to use the default blob store,
          whether it is defined in the configuration file, or created automatically using the legacy method.
          So, it is allowed that none of the configured blob stores has its 'default' attribute set to true,
          but it's a configuration error that more than one is set as the default one. In such case, an exception
          will be thrown at application startup.
        </xs:documentation>
      </xs:annotation>
    </xs:attribute>
  </xs:complexType>
  
  <xs:element name="blobstore" type="gwc:AbstractBlobStore">
  </xs:element>
  
  <xs:element name="FileBlobStore" substitutionGroup="gwc:blobstore">
    <xs:complexType>
      <xs:complexContent>
        <xs:extension base="gwc:AbstractBlobStore">
          <xs:sequence>
            <xs:element name="baseDirectory" type="xs:string" minOccurs="1" maxOccurs="1">
              
            </xs:element>
            <xs:element name="fileSystemBlockSize" type="xs:positiveInteger" minOccurs="0" maxOccurs="1" nillable="true">
            </xs:element>
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
...

Additional considerations

DiskQuota "disk block size"

The ability to have multiple caches on different file systems, and different backend storage mechanisms, reveals a design flaw in the disk-quota module. It assumes there's only one cache and that it's file-based, and hence DiskQuotaConfig has a diskBlockSize integer setting used to pad the size of each tile to a whole file system block size.

This should rather be the responsibility of the "backend" (FileBlobStore), with the disk-quota blob store listeners being notified of the actual tile size.

The proposal is to move this configuration setting back to FileBlobStoreConfig, and have FileBlobStore perform the padding to whole block size before calling the BlobStoreListeners, relieving the disk quota module from performing such padding to file system block size.
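
The padding itself is simple rounding up to the next multiple of the block size. A minimal sketch, assuming a hypothetical helper named `BlockPadding` (not part of GWC):

```java
// Hypothetical helper illustrating padding a tile size to whole file system blocks.
final class BlockPadding {
    private BlockPadding() {}

    // Rounds tileBytes up to the next multiple of blockSize.
    static long padToBlockSize(long tileBytes, int blockSize) {
        if (tileBytes < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("tileBytes must be >= 0 and blockSize > 0");
        }
        long blocks = (tileBytes + blockSize - 1) / blockSize;
        return blocks * blockSize;
    }
}
```

For example, with a 4096-byte block, a 5000-byte tile would be reported to the listeners as occupying 8192 bytes.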

Configuration UI

Manual configuration of geowebcache.xml (i.e. in the standalone GWC version) is straightforward. GeoServer's GWC plugin should be updated with the ability to:

  • configure several blob stores;
  • define which store is the default one;
  • define the blob store on a per GeoServerTileLayer basis.

This will be accomplished once this proposal is accepted.

ArcGISCacheLayer

ArcGISCacheLayer is the Frankenstein's monster of TileLayers. A TileLayer is supposed to be responsible for fetching tile images from a backend source like a WMS, leaving the actual storage to the BlobStore by means of ConveyorTile.retrieve() (although for historical reasons it is also a configuration object itself besides a business logic object, and cares about orthogonal concerns such as checking cache availability and expiration rules, but that's another story).

Instead, ArcGISCacheLayer bypasses the blob store and accesses the tiles on disk directly in getTile(final ConveyorTile tile), and doesn't support seeding.

Although outside the scope of this proposal, it would be good to consider, in the future, creating a BlobStore specialized in reading from and writing to an ArcGIS cache, hence decoupling the tile origin from the storage. That would mean we could save tiles obtained from WMS or GeoServer to an ArcGIS cache, instead of the cache having to be pre-generated in ArcGIS Server and served read-only by GWC.