
GSIP 145

jratike80 edited this page Nov 1, 2016 · 12 revisions

GSIP 145 - Back-up and Restore Extension for GeoServer Configuration

Overview

The goal of this proposal is to implement in GeoServer the ability to move configuration changes between GeoServer instances as well as to backup and restore the configuration itself.

We also propose to slightly refactor the GeoServer configuration workflow, in order to enable storing some environment variables (host names and addresses, request time-outs and/or network configurations, connection parameters to storages not configured as JNDI resources, such as databases or SOLR, and so on…) in a file external to the data directory, which GeoServer can use to parametrize data store and/or plugin configurations at runtime.

This would allow users to easily configure GeoServer in a certain environment (let's say a TEST environment), back up the configuration and restore it later in another environment with different settings (let's say a PROD environment). Moreover, users can save the configuration, test new changes and revert those potentially harmful to GeoServer.

Proposed By

Alessio Fabiani (GeoSolutions)

Assigned to Release

This proposal is for GeoServer 2.10 and later.

State

  • Under Discussion
  • In Progress
  • Completed
  • Rejected
  • Deferred

Proposal

Motivation

Let’s provide a few use cases from which the reader can better understand the requirements as well as the functionalities needed.

Use Case 1

Organization1 has deployed a cluster of GeoServer instances which is used as TEST, as well as another cluster which is used as PROD. They make extensive use of REST to create and configure new layers.

Organization1 is currently using a Single Master/Multiple Slaves clustering approach, configured in one of the following ways:

  1. Single Master/Multiple Slaves with:

    • Shared Data Directory
    • No Configuration on DBMS
    • No active clustering plugins.
    • The two systems (PROD and TEST) have been deployed on separate networks and don’t share anything (no data, no data directory, nothing).
  2. Single Master/Multiple Slaves with:

    • Shared Data Directory
    • Configuration on DBMS
    • Active clustering plugins
    • The two systems are on the same network and share the data, while the data directories are separate in order to introduce a validation step when changes made to TEST are transferred to the PROD system.

As a GeoServer user I want to be able to make configuration changes in TEST, test them to make sure they are fine and then promote them to PROD without having to replicate them one by one. We do assume, however, that whenever we move (a portion of) the configuration from TEST to PROD the two GeoServer versions are the same.

Ideally, I would like to be able to:

  1. Perform multiple configuration changes and to test them without propagating them to PROD instances.
  2. Accumulate and then propagate configuration changes from TEST to PROD when happy.
  3. Revert the configuration in TEST.
  4. Propagate the changes to PROD.
  5. Obtain a BACKUP file (which could be a ZIP archive containing JSON or XML files plus other external material) by performing a REST call on the TEST instance to export either a specific workspace configuration, or the whole configuration, to a specific URI (a file system path, an FTP site, or a REST endpoint).
  6. Perform a second REST call on the PROD instance to import the exported configuration from a specific URI (a file system path, an FTP site, or a REST endpoint).

Changes to the configuration might include changes to the GWC configuration, or actually be limited to it.

In principle it would be great if I could:

  • Perform a dry-run import on the PROD instance that would tell whether there are problems with the exported configuration and (possibly) where.
  • Support REST requests to the backup/restore endpoints that allow users to manually specify single workspaces to dump or replace, instead of performing a full backup.

Use case 2

Organization2 uses GeoServer in the back-end of another application to do the heavy lifting when it comes to maps and access to vector and raster data. They interact both with the GeoServer GUI and with the REST interface. Organization2 by default uses a single instance of GeoServer with the configuration persisted in a DBMS.

As a GeoServer user I want to be able to (prioritized order):

  1. Back up and restore the configuration of GeoServer to/from a specific URI (a file system path, an FTP site, or a REST endpoint).
  2. Revert changes in the GeoServer configuration by going back in time to a previous version of the configuration. This functionality should be available both in the GUI and via REST calls; in the front-end application we would allow reverting changes performed in GeoServer as part of reverting changes in the application itself. For instance, I would ask GeoServer to revert the changes created by importing a shapefile, or the changes created by deleting an existing store.

Supporting Different Environments

It is worth noting that we also want to address the case where a backup is restored in a different environment. Having a backup file for a certain configuration does not guarantee that it can be restored anywhere: for instance, if OrganisationX has a TEST environment on a different network and with a different physical configuration from the PROD one, it won’t be able to transfer the configuration “as is” between the two environments. While differences in DBMS connection parameters can be addressed via JNDI, and shapefiles and GeoTIFFs can be handled by using mount points, this is not so straightforward with other plugins and/or modules, like SOLR, SDE and so on.

Therefore, as part of this proposal we also want to address the issue of transferring a backup from one environment to another. The rationale is to introduce some “placeholders” into the persisted configuration and allow GeoServer to use “environment parameters” in order to resolve them at loading/reading time.

Design of the Solution

Moving a GeoServer configuration across environments can be painful because of differences in the environments themselves. JNDI and data mount points in the file system can mask a number of issues, but these approaches are not always usable/acceptable. Some configuration items, like those in the security subsystem, are simply impossible to delegate to JNDI (e.g., the location of the CAS server).

The backup of the service configuration will happen by crawling the existing configuration and using the existing REST XStream bindings to perform the conversion to XML (as they use links with names instead of ids, which should result in fewer dangling references if the target environment has been “hand-modified” for any reason). Working off the programming API will make the solution usable regardless of how the configuration is actually stored (file system, database).

The result will be a ZIP file (or a tar.gz, depending on a setting of the backup request - ZIP by default) with an internal structure similar to the data directory, containing either the full configuration, or just a single workspace’s worth of it (including its own local services, groups and styles).

The root will contain a metadata file with information about the backup, such as the date and whether it’s a workspace-specific or a global backup. For instance, the summary file could be an XML/JSON one containing sections as follows:

  • GeoServer Build Information
    • Geoserver version
    • Git Revision
    • Build Date…
  • GeoServer Basic info
    • Original data directory
    • JVM Version
    • Java Rendering Engine
    • Native JAI
    • Native JAI Imageio
  • Backup info
    • Start date-time
    • End date-time
    • Warnings
    • Global/Local
    • Workspace

While the zip is being prepared the global configuration lock will be held, preventing any change to the configuration while the backup copy is being generated. See later for more detailed info on how the configuration/services locks will be acquired.

Likewise, during restore the lock will be held, the configuration/catalog will be emptied, and a new one will be loaded from the zip file. In order to reduce the impact of the restore on clients we are looking into a solution where we would build a catalog in parallel and then swap it in quickly, as an empty catalog also breaks all OGC requests (while the backup only prevents config changes). See later for more detailed info on how the configuration/services locks will be acquired.

The dry-run restore support will parse the configuration files and try to assess if they are suitable for the current environment. A report will be generated at the end of the process.

API Change

Making GeoServer configuration easier to move across environments

The proposal is to allow GeoServer to handle placeholders in the configuration and replace them as needed with the provided input, in order to customize portions of the configuration itself. The placeholders would be configured in a separate file not subject to export/import (or subject only on demand). The location of this file will also be configurable via an environment variable, similarly to how the GeoServer data dir is set up (environment variable, system variable, servlet context parameter).

Besides the datastores, which already have a neat separation between what’s stored and what’s used, with a transformation in the middle, we’ll have to add extra methods/parameters to get the expanded or unexpanded version of a certain object.

Areas of work:

  • Expand variables in datastore configurations
  • Expand variables in service configurations
  • Expand variables in selected security subsystem bits
  • Expand variables in GWC disk quota configurations

The basic idea is to have a singleton, loaded at initialization time, able to transform placeholders defined by the user into real values and vice versa.

For instance, we could have environment variables like:

jdbc.driverClassName = "my_value"
jdbc.connectUrl = "my_value"

which we would like to map as:

${jdbc.driverClassName}
${jdbc.connectUrl}

into the configuration.
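
For instance, a persisted datastore configuration could then carry the placeholders among its connection parameters. An illustrative fragment of a datastore.xml follows; the entry keys shown are hypothetical and depend on the actual store type:

```xml
<connectionParameters>
  <entry key="driver">${jdbc.driverClassName}</entry>
  <entry key="url">${jdbc.connectUrl}</entry>
</connectionParameters>
```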

public class GeoServerEnvironment implements ApplicationContextAware, ApplicationListener {
    public static Serializable translate(Serializable template) {
        ...
    }
}
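
A minimal sketch of what the translate method could do is shown below. This is illustrative only: it drops the Spring wiring shown above, and the setProperty helper is an assumption standing in for the real initialization from the external environment properties file.

```java
import java.io.Serializable;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: the real GeoServerEnvironment would load its
// properties from the externally configured file at initialization time.
public class GeoServerEnvironment {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // stand-in for the properties loaded from the external file
    private static final Properties props = new Properties();

    public static void setProperty(String key, String value) {
        props.setProperty(key, value);
    }

    /** Replaces ${name} placeholders with their configured values. */
    public static Serializable translate(Serializable template) {
        if (!(template instanceof String)) {
            return template;
        }
        Matcher m = PLACEHOLDER.matcher((String) template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            // leave unresolved placeholders untouched
            m.appendReplacement(sb,
                    Matcher.quoteReplacement(value != null ? value : m.group(0)));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Non-string values pass through unchanged, and unresolved placeholders are kept as-is so that a dry-run can later report them.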

We’ll need to slightly modify the ResourcePool and Catalog connection parameter loading with something similar to this:

public class ResourcePool {
    ...
    public static <K,V> Map<K,V> getParams(Map<K,V> m, GeoServerResourceLoader loader) {
        @SuppressWarnings("unchecked")
        Map<K,V> params = Collections.synchronizedMap(new HashMap<K,V>(m));

        for (Entry<K,V> entry : params.entrySet()) {
            String key = (String) entry.getKey();
            Object value = GeoServerEnvironment.translate(entry.getValue());

    ...

DataStores and CoverageStores

The stores will preserve placeholders in the persisted configuration. The user will be able to use them via the Admin GUI edit pages as well as via the REST config.

The GeoServer API must be updated in order to translate the placeholders whenever there is the need to validate the connection parameters or use the store.

For instance, on the GUI side for the DataStores, when creating a new store the onSaveDataStore method must be reworked a bit in order to support the placeholder translations:

public class DataAccessNewPage extends AbstractDataAccessPage {
    …
    protected final void onSaveDataStore(final DataStoreInfo info, AjaxRequestTarget target)
            throws IllegalArgumentException {
        …
        DataStoreInfo expandedStore = catalog.getFactory().createDataStore();

        // using the class static method clone to populate the store
        clone(info, expandedStore);

where AbstractDataAccessPage.clone(final DataStoreInfo source, DataStoreInfo target), instead of blindly copying the connection parameters, should translate them:

protected void clone(final DataStoreInfo source, DataStoreInfo target) {
    target.getConnectionParameters().clear();
    for (Entry<String, Serializable> param : source.getConnectionParameters().entrySet()) {
        target.getConnectionParameters().put(param.getKey(),
                GeoServerEnvironment.translate(param.getValue()));
    }
}

The above also applies to the DataAccessEditPage.

One issue we’ll need to face is how to handle the Wicket validators that are built automatically from the store connection parameter types.

A possible solution would be to introduce a global configuration or environment variable called ALLOW_ENV_PARAMETRIZATION which toggles the default Wicket validators for the parameters. This would also allow us to edit connection parameter values using placeholders from the GUI.
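
A minimal sketch of how such a flag could be read follows. The lookup order shown (system property first, then environment variable) is an assumption; the real implementation would follow GeoServer's usual property lookup conventions.

```java
// Illustrative sketch: reads the proposed ALLOW_ENV_PARAMETRIZATION flag.
public class EnvParametrization {

    public static final String FLAG = "ALLOW_ENV_PARAMETRIZATION";

    /** Returns true when placeholder values may bypass the default Wicket validators. */
    public static boolean isEnabled() {
        // system property wins over the environment variable (assumed ordering)
        String value = System.getProperty(FLAG, System.getenv(FLAG));
        return Boolean.parseBoolean(value);
    }
}
```

The flag defaults to false, so existing deployments keep strict validation unless they opt in.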

Services

This needs to be done on a case-by-case basis: the XStreamServiceLoader implementations must take care of placeholder translation through the GeoServerEnvironment class.

Security Subsystem

Security configurations are loaded through the GeoServerSecurityManager loadConfig method.

The main issue here is that the XStream persister by default uses reflection and calls to the default constructor to instantiate the different SecurityConfig concrete classes.

One option could be to change the contract a bit, making the SecurityConfigs more similar to DataStoreInfo. In other words, instead of POJOs, each SecurityConfig would become a SecurityConfigInfo with its own list of connectionParameters.

The parameters will be translated using an approach similar to ResourcePool.getParams, while a SecurityConfigFactory will take care of building the final service configuration.

This will allow us to store placeholders on the XML file and get values from the GeoServerEnvironment as needed.

GWC DiskQuota

We will need to modify the GWC XStream configuration persisters. The idea is similar to the Security Subsystem one. The GWC disk quota configuration will become parameterized like the GeoServer DataStores, and methods like GeoServerExtensions.getProperty will take care of injecting the external environment variables from GeoServer into the GeoWebCache configuration. We envisage the use of a singleton GeoWebCacheEnvironment delegated to do that, similar to the GeoServerEnvironment one described above.

Configuration/Services Locking Strategy

Regardless of the operation performed, locks should never be retained for an unlimited time, and a failing backup or restore task must not block GeoServer forever.

Besides this, it should be possible to abort a backup or restore task via a “DELETE” REST call.

As a general principle:

  • During the backup operation, the configuration must be locked but the OGC services should still be accessible.
  • During the restore operation, both the configuration and OGC services must be locked.

Both statements above present several issues:

  1. Locking the configuration does not allow the process to be aborted, since the cancelling REST call won’t be executed until the lock has been released.

    We can use the same RestConfigurationLockCallback locking mechanism here, except that the “DELETE” REST call does not try to acquire a “write lock”. The backup context must correctly handle this contract, allow the workflow to be interrupted through listeners, and release the “write lock”.

    The same approach will be followed by a manager which periodically checks the status of the backup and restore workflows. If a single step takes too long (expiration time), or an intermediate status is a failure but the whole job has not been updated, it must stop the execution and release the “write lock”.

    Notice that this approach does not allow multiple backup operations to be executed at the same time.

  2. If the restore operation takes too much time, locking the server completely is not feasible.

    The proposed solution is to restore the catalog on a temporary instance leaving the server active (at least the OGC services) and swap the configuration once the restore has finished. In this way, the global lock will be held for the shortest possible time.

  3. It should be possible to notify users that the server is performing a write operation and that the configuration is currently locked, without hanging the UI pages.

    Currently the “write lock” is taken by the dispatcher callback before every page is rendered. In other words, a configuration lock freezes at least every secured GeoServer page. The approach of the WicketConfigurationLockCallback must be slightly changed here by:

    • Adding a “getHoldCount()” method to the GeoServerConfigurationLock singleton, allowing the UI to know at any time whether a write operation is in progress on the server.

    • Having the callback not block on the “write lock” but raise a flag instead, consequently allowing the GeoServerBasePage to render a warning message and invoke a visitor which will disable all the forms, submit buttons and ajax links.
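
The getHoldCount() idea above can be sketched as follows. This is a simplified stand-in: the internals of the actual GeoServerConfigurationLock may differ, and the hold counter here is an assumption used to give the UI a non-blocking check.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the proposed getHoldCount() addition, assuming the singleton
// wraps a ReentrantReadWriteLock and tracks outstanding write holds.
public class GeoServerConfigurationLock {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicInteger writeHolds = new AtomicInteger();

    public void lockWrite() {
        lock.writeLock().lock();
        writeHolds.incrementAndGet();
    }

    public void unlockWrite() {
        writeHolds.decrementAndGet();
        lock.writeLock().unlock();
    }

    /** Non-blocking check for the UI: a value > 0 means a write operation is in progress. */
    public int getHoldCount() {
        return writeHolds.get();
    }
}
```

With this in place, the Wicket callback can consult getHoldCount() instead of blocking, and render the warning/disabled state when it is positive.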

Backing up and other configuration bits

GeoServer has several configuration elements that are not centrally managed by the service config and catalog, including:

  • Extra referencing configurations (epsg.properties and associated files)
  • Logging configuration profiles
  • GetFeatureInfo freemarker templates
  • Some datastore extra configurations, like mosaic, app-schema, pre-generalized store, image jdbc config
  • Plugins config files like control-flow, monitoring
  • Icons and other files referred by the styles, WMS watermark, and the like

For those which cannot be managed through the system variables (and need to save extra data), we will create a set of pluggable objects that can handle those files, tentatively with these interfaces:

/**
 * Handles saving/restoring configuration that is not attached to a specific catalog resource
 **/
interface GlobalConfigArchiver {
     public void saveConfiguration(ZipOutputStream zos);
     public void loadConfiguration(ZipInputStream zis);     
}

/**
 * Handles configuration attached to a specific catalog element
 **/
interface CatalogConfigArchiver {
     public boolean canHandle(CatalogInfo ci);
     public void saveConfiguration(CatalogInfo ci, ZipOutputStream zos);
     public void loadConfiguration(CatalogInfo ci, ZipEntry ze, ZipInputStream zis);     
}
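
As an illustration, a hypothetical GlobalConfigArchiver for the control-flow plugin's properties file could look like the sketch below. The entry name and the in-memory string storage are assumptions; a real implementation would read and write through GeoServer's resource loader instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

interface GlobalConfigArchiver {
    void saveConfiguration(ZipOutputStream zos);
    void loadConfiguration(ZipInputStream zis);
}

// Hypothetical archiver for the control-flow plugin configuration; the
// properties are kept as an in-memory string purely for illustration.
class ControlFlowArchiver implements GlobalConfigArchiver {

    static final String ENTRY_NAME = "controlflow.properties";
    String properties = "";

    @Override
    public void saveConfiguration(ZipOutputStream zos) {
        try {
            // write the config file as a dedicated entry of the backup archive
            zos.putNextEntry(new ZipEntry(ENTRY_NAME));
            zos.write(properties.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void loadConfiguration(ZipInputStream zis) {
        try {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (ENTRY_NAME.equals(entry.getName())) {
                    // readAllBytes stops at the end of the current zip entry
                    properties = new String(zis.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each plugin would contribute one such archiver, and the backup machinery would simply iterate over all registered archivers while writing or reading the archive.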

It is worth noting that not all of the points listed above need pluggability. For instance, the logging subsystem uses system variables and does not need such extra helper classes to handle its configuration. The same applies to ImageMosaic (see later), where we assume that the folder structure remains the same across environments; i.e. the backup and restore will take care only of the catalog configuration.

Plugins like jms-clustering, monitoring, control-flow, geofence, remote-wps and so on need extra information on where their properties files are located. In fact, from the GeoServer catalog and the ResourceInfo there is no way to get that information.

ImageMosaic/ImagePyramid DataStores

Coverage plugins like ImageMosaic or ImagePyramid internally use a DataStore to access the mosaic granules index. They also save the physical location of the granules into the DataStore as a feature attribute.

Moving such coverages from one environment to another must face the following issues:

  1. The connection parameters of the DataStore (for instance if it’s a PostGIS or an Oracle DataStore) may be different between the two environments.
  2. The physical location of the granules (especially if we instructed the plugin to store absolute paths) may not coincide.

To solve the issues above using the current implementation of the GeoTools ImageMosaic plugin, we propose the following changes:

  • We need to inject a DataStore from outside (i.e. from GeoServer) into the GranuleCatalog instead of letting the CatalogManager build the “tileIndexStore” from the SPI and properties of the “datastore.properties” file. The GeoServer ResourcePool currently injects a CoverageReader hint into the ImageMosaic or ImagePyramid reader in order to pass down the ExecutorService. The idea is to allow GeoServer to inject into the reader a Repository (whenever one is available) with its DataStores through a new hint. A user willing to reuse an existing DataStore from the GranuleCatalog should provide the “id” of the store available in the Repository, instead of the whole list of connection parameters, as a property of the “datastore.properties” file. The ImageMosaic CatalogManager will be improved in order to look for the DataStore id in the Repository and provide it to the GranuleCatalog.
  • As far as the location of the data is concerned, particular care will have to be taken to make sure the two systems are configured in the same way or, as an alternative, that the index gets modified accordingly when moved across (which is out of the scope of this proposal, as we focus not on data but on the GeoServer configuration; although the index is not properly “data”, it is closer to the definition of data than to the definition of configuration).

UI/REST driving the backup/restore

The backup/restore subsystem will provide a user interface to perform backups and restores of the configuration.

The UI will consist of one page with two sections, one for backup and one for restore, allowing the user to choose the target/source file, with buttons to initiate the operation, some visual feedback as to what is being backed up/restored at the moment (no progress available), and the ability to cancel the operation.

The REST API will consist of a few resources meant to be used in an asynchronous fashion:

Resource                     Method  Parameters and Notes
/rest/br/backup/             POST    Post a JSON document with the backup parameters, see below.
/rest/br/backup/backupId     GET     Returns a JSON representation of the backup operation, see below.
/rest/br/backup/backupId     DELETE  Cancels the backup operation.
/rest/br/restore             POST    Post a JSON document with the restore parameters, see below.
/rest/br/restore/restoreId   GET     Returns a JSON representation of the restore operation, see below.
/rest/br/restore/restoreId   DELETE  Cancels the restore operation.

Creation of a new backup

{  
  "workspaces":[  
    "ws1",
    "ws2"
  ],
  "file":"/var/backups/gsBackup1.zip",
  "overwrite":true
}

The workspaces and overwrite parameters are optional: if workspaces is missing the backup is full, and unless overwrite is set, the backup will fail if the target file already exists.
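
A client could start a backup with a plain HTTP POST of the document above. The sketch below builds (but does not send) such a request with java.net.http; the base URL is an assumption:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class BackupRequestBuilder {

    /** Builds the POST request that starts a backup against /rest/br/backup/. */
    public static HttpRequest buildBackupRequest(String baseUrl, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/rest/br/backup/"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

Sending it with an authenticated HttpClient would return the asynchronous operation whose status can then be polled via GET, as described below.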

Representation of backup/restore operation

{  
  "status":"processing",
  "currentResource":"/myWorkspace/myStore/datastore.xml",
  "progress":0.35,
  "warnings":[  
    "Layer xyz requires style abc, but it was not found",
    "Store s123 cannot be connected to",
    "Feature type ft45 cannot be found in store s89"
  ]
}

The status field can be queued/processing/success/failure. In case of failure the document will contain an error message:

{  
  "status":"failed",
  "currentResource":"/myWorkspace/myStore/datastore.xml",
  "progress":0.35,
  "errorMessage":"Failed to restore resource /myWorkspace/myStore/datastore.xml with error xyz"
  "warnings":[  
    "Layer xyz requires style abc, but it was not found",
    "Store s123 cannot be connected to",
    "Feature type ft45 cannot be found in store s89"
  ]
}

Issuing a restore

{  
   "dry-run": true|false,
   "file":"/var/backups/gsBackup1.zip",
}

Dry run support

Checks to be performed:

  • The environment specific variables needed are all there
  • Check if all object references can be properly resolved (e.g. layers -> style, group -> layer and so on)
  • Stores can be connected to, feature types can be computed, grid coverages can be read
  • For styles, check if resources being referred (e.g., icons) are available

The result will be a report of the issues and warnings found. Issues are blockers with respect to performing the restore, while warnings will not block the restore but need to be evaluated since they might indicate potential problems.

Areas of work:

  • Machinery to run the checks
  • Changes to the UI to allow the dry run while doing a restore, or stand-alone, and showing a list of issues found
  • REST api to drive the dry-run

Issuing a dry-run restore

{  
   "dry-run": true,
   "file":"/var/backups/gsBackup1.zip",
}

Representation of a dry-run operation

{  
  "status":"processing",
  "currentResource":"/myWorkspace/myStore/datastore.xml",
  "progress":0.35
}

The status field can be queued/processing/success/failure. In case of failure the document will contain a list of error messages and warnings:

{  
  "status":"failed",
  "currentResource":"/myWorkspace/myStore/datastore.xml",
  "progress":0.35,
  "errors":[  
    "Layer xyz requires style abc, but it was not found",
    "Store s123 cannot be connected to",
    "Feature type ft45 cannot be found in store s89"
  ],
  "warnings":[  
    "Layer xyz requires style abc, but it was not found",
    "Store s123 cannot be connected to",
    "Feature type ft45 cannot be found in store s89"
  ]
}

Discussion

Voting

Project Steering Committee:

  • Alessio Fabiani: +1
  • Andrea Aime: +1
  • Ben Caradoc-Davies: +1
  • Brad Hards: +1
  • Christian Mueller: +1
  • Ian Turton: +1
  • Jody Garnett: +1
  • Jukka Rahkonen: +1
  • Kevin Smith:
  • Simone Giannecchini: +0
