Job Manager for GeoWebCache - lisasoft/geowebcache GitHub Wiki

The job manager work for GeoWebCache that LISAsoft has been working on is close to ready, so it's time to show it off to the community and start working on that pull request.

A summary of the original proposal has been copied to this wiki here: New Features - Early Discussion

Updated Documentation

The best way to summarize the changes is to point at the new documentation. This can be built from this clone of GeoWebCache. A zip of the built documentation was previously offered here for convenience, but it is out of date and no longer available.

The documentation has three new pages and an updated seed page. Check out the following (probably in the order presented below):

  • /html/webinterface/seed.html
  • /html/configuration/basemap.html
  • /html/webinterface/jobs.html
  • /html/rest/jobs.html

I restructured the REST API section a bit as well, and changed the index page of the web interface section.

Design Considerations

To help the pull request, I've listed some of the more interesting design considerations that went into the project here:

Jobs

GeoWebCache now has a concept of jobs. A job is a managed cache manipulation. Jobs are very similar to the existing tasks: a job can be a seed, reseed or truncate, and takes similar parameters (what to seed, how many threads, etc.). The difference is that a job is the summary of, and point of interaction for, all the tasks spawned by one seed or truncate effort. Jobs also support features such as scheduling and throttling across all threads. To be clear, a task is one thread of a job.
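The relationship between a job and its tasks can be sketched roughly as follows. This is only an illustration, not GeoWebCache's actual classes: `JobType`, `TaskSketch`, `JobSketch` and all of their fields and methods are hypothetical names, chosen to show a job acting as the summary of its per-thread tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; these are not GeoWebCache's real classes.
enum JobType { SEED, RESEED, TRUNCATE }

/** One thread's share of a job. */
class TaskSketch {
    final long tilesTotal;
    long tilesDone;

    TaskSketch(long tilesTotal) {
        this.tilesTotal = tilesTotal;
    }
}

/** Summary of, and single point of interaction for, every task of one effort. */
class JobSketch {
    final JobType type;
    private final List<TaskSketch> tasks = new ArrayList<>();

    JobSketch(JobType type) {
        this.type = type;
    }

    /** Registers one more thread of work under this job. */
    TaskSketch addTask(long tilesTotal) {
        TaskSketch task = new TaskSketch(tilesTotal);
        tasks.add(task);
        return task;
    }

    long tilesTotal() {
        return tasks.stream().mapToLong(t -> t.tilesTotal).sum();
    }

    long tilesDone() {
        return tasks.stream().mapToLong(t -> t.tilesDone).sum();
    }

    /** Progress across all tasks as a percentage; 0 when there is no work. */
    double percentDone() {
        long total = tilesTotal();
        return total == 0 ? 0.0 : 100.0 * tilesDone() / total;
    }
}
```

With two tasks of 100 and 300 tiles that have each finished 50, the job reports 100 of 400 tiles done, i.e. 25 percent.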

The JobStore

Parallel to the MetaStore, there is now a JobStore that houses two tables, jobs and job logs. Jobs are persisted to this store on creation, periodically during execution and on completion.

The job store closely follows the approach taken for the metastore, but is simpler. The metastore and the diskstore (which manages the tiles themselves) are jointly managed through a storage broker and support a locking mechanism; the job store doesn't use a storage broker.

The job store uses a simple JDBC wrapper, the same as the metastore. It was tempting to introduce Hibernate to handle object persistence, but I didn't want to introduce too many new dependencies and was worried about Java 1.5 compatibility, so I just copied how the metastore does things. I think eliminating the SQL strings and the manual object-to-SQL mapping would be a good improvement, but I really like the database upgrade mechanism that's in place, and it would be affected by such a change.

For now I've added a unit test specifically for the job store because it's a pretty fragile area.

Creating Jobs

The API for GeoWebCache has been extended to support jobs and job logs, and the seed form has been changed to create the job as part of creating the tasks.

The seed form has been improved to include new job settings like schedule and throttling. Two major new features of this form are:

  • Interactive Map - for selecting the region to seed. This uses OpenLayers and shows the layer to seed with the correct projection and zoom levels. A configurable basemap is used here.
  • Estimation - so you can get an idea of how big the job will be before starting it. This is supported by a REST service to do the estimation at the server. This was done so that in the future the estimation could take into account things like tiles that won't need to be seeded and any metrics that exist on tile generation time. At the moment 5 requests a second is the assumption.
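To give a feel for the arithmetic behind such an estimate, here is a rough sketch. It is not the server's actual estimation code: `SeedEstimateSketch` and its methods are hypothetical, and it assumes a square grid with 2^zoom tiles per side, a region given as fractions (0 to 1) of the full grid extent, one request per tile, and the 5 requests a second treated as an overall rate.

```java
// Hypothetical sketch; the grid layout and the one-request-per-tile rule are assumptions.
class SeedEstimateSketch {
    /** The assumed throughput mentioned above, taken here as an overall rate. */
    static final double ASSUMED_REQUESTS_PER_SECOND = 5.0;

    /** Tiles touched at one zoom level by a region given as fractions of the full extent. */
    static long tilesAtZoom(int zoom, double minX, double minY, double maxX, double maxY) {
        long n = 1L << zoom; // assumed: 2^zoom tiles per side
        long minCol = (long) Math.floor(minX * n);
        long maxCol = Math.min(n, (long) Math.ceil(maxX * n)) - 1;
        long minRow = (long) Math.floor(minY * n);
        long maxRow = Math.min(n, (long) Math.ceil(maxY * n)) - 1;
        long cols = Math.max(0L, maxCol - minCol + 1);
        long rows = Math.max(0L, maxRow - minRow + 1);
        return cols * rows;
    }

    /** Total tiles over an inclusive range of zoom levels. */
    static long tilesInRange(int zoomStart, int zoomStop,
                             double minX, double minY, double maxX, double maxY) {
        long total = 0;
        for (int zoom = zoomStart; zoom <= zoomStop; zoom++) {
            total += tilesAtZoom(zoom, minX, minY, maxX, maxY);
        }
        return total;
    }

    /** Time to seed the given number of tiles, rounded up to whole seconds. */
    static long estimatedSeconds(long tiles) {
        return (long) Math.ceil(tiles / ASSUMED_REQUESTS_PER_SECOND);
    }
}
```

Under these assumptions, seeding the full extent for zoom levels 0 to 2 covers 1 + 4 + 16 = 21 tiles, which comes to about 5 seconds at 5 requests a second.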

Basemap Configuration

The basemap used for the interactive map on the seed page is set up in the geowebcache XML config file. This takes an excerpt of OpenLayers JavaScript code that returns a layer. This may feel hacky, but there are a couple of good reasons to do it this way. One is that the seed form makes lots of information about the layer to seed available, so the code excerpt can make decisions about which basemap to use. In short, the basemap has to support the projection and zoom levels of the layer to seed.

Future of the Seed Form

There wasn't time to fully convert the seed form over to the ExtJS client, but this would be ideal. At the moment jobs can't be edited: the seed form doesn't support editing, and we still need to work out which properties of a job can be changed, and when, given the state of the job.

Future of the GWC UI

Once the seed form is moved over to the JavaScript, REST-only client, there are only a couple of features left that would need to move over (links to the capabilities docs, etc.). At that point the GeoWebCache service and client would be completely decoupled from each other and could easily be separate projects. This may be appropriate for licensing reasons, but it is probably simpler to leave them integrated.

How Jobs are Monitored

There is a new task that runs all the time in GeoWebCache: the JobMonitorTask. This task periodically (every 5 seconds at the moment) iterates through the running tasks and uses that information to update the jobs associated with those tasks. It then saves the updated jobs to the JobStore. This means progress is regularly persisted, according to how the JobStore is configured.
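A monitoring pass of this kind could look roughly like the sketch below. It is not the real JobMonitorTask: `JobMonitorSketch`, `RunningTask` and `ProgressStore` are hypothetical stand-ins, and the use of a `ScheduledExecutorService` for the periodic run is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these types are GeoWebCache's own.
class JobMonitorSketch {
    /** A running task: which job it belongs to and how far it has got. */
    static class RunningTask {
        final long jobId;
        volatile long tilesDone;

        RunningTask(long jobId) {
            this.jobId = jobId;
        }
    }

    /** Persistence hook standing in for the JobStore. */
    interface ProgressStore {
        void saveProgress(long jobId, long tilesDone);
    }

    final List<RunningTask> runningTasks = new CopyOnWriteArrayList<>();
    private final ProgressStore store;

    JobMonitorSketch(ProgressStore store) {
        this.store = store;
    }

    /** One monitoring pass: sum task progress per job, then persist each total. */
    void pollOnce() {
        Map<Long, Long> totals = new HashMap<>();
        for (RunningTask task : runningTasks) {
            totals.merge(task.jobId, task.tilesDone, Long::sum);
        }
        totals.forEach(store::saveProgress);
    }

    /** Runs pollOnce() repeatedly; the caller shuts the returned executor down. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(this::pollOnce, 0, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

Calling `start(5)` would mirror the 5 second interval described above. Keeping the pass in `pollOnce()` means it can also be run on demand, for example from a unit test.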

The Job Monitor also handles a few things to do with Jobs on startup of GeoWebCache. It finds jobs that it thinks were interrupted the last time GeoWebCache was running and will restart them. It also ensures all scheduled jobs are scheduled using Cron4J.
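The startup behaviour might be sketched like this. Everything here is hypothetical: the `State` values, the `Job` record, and the `Restarter` and `Scheduler` interfaces are stand-ins, and the rule that a job still marked RUNNING in the store was interrupted is an assumption about how "thinks were interrupted" could be decided.

```java
import java.util.List;

// Hypothetical sketch; the states and interfaces are illustrative assumptions.
class StartupRecoverySketch {
    enum State { READY, RUNNING, DONE }

    /** A persisted job record as it might be read back from the job store. */
    static class Job {
        final long id;
        final State state;
        final String schedule; // cron-style pattern, or null when unscheduled

        Job(long id, State state, String schedule) {
            this.id = id;
            this.state = state;
            this.schedule = schedule;
        }
    }

    /** Stand-in for whatever relaunches a job's tasks. */
    interface Restarter {
        void restart(Job job);
    }

    /** Stand-in for the scheduler that scheduled jobs are registered with. */
    interface Scheduler {
        void schedule(String pattern, Job job);
    }

    static void onStartup(List<Job> persisted, Restarter restarter, Scheduler scheduler) {
        for (Job job : persisted) {
            if (job.state == State.RUNNING) {
                // Assumption: still RUNNING in the store means the last shutdown interrupted it.
                restarter.restart(job);
            } else if (job.state == State.READY && job.schedule != null) {
                // Re-register jobs that are waiting on a schedule.
                scheduler.schedule(job.schedule, job);
            }
        }
    }
}
```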

Cron4J

The only dependency introduced to the project is Cron4J. Cron4J is an open source project licensed under LGPL. Scheduled tasks are registered with Cron4J using the JobScheduler class. Whenever a new job is created it is scheduled as well if appropriate.

One concern I have is that I couldn't find a good Maven repository to get Cron4J from. At the moment it uses http://www.gridgainsystems.com/maven2 but if possible I'd prefer to publish Cron4J somewhere else, perhaps the OSGeo repository.

The JavaScript Client

The client uses ExtJS 4 and provides a nice example framework for ExtJS 4 projects. There is one view, the Job List, and this client is generally referred to as the Job Manager page. This page shows information from the JobStore, accessed via REST. It also supports manipulating the jobs by copying, cancelling or deleting them.

Copying Jobs in the Client

At the moment jobs that aren't running can be cloned / restarted. This copies the job by doing a post of the same job to the server. This makes a new job with no log information that is ready to run, and will then run (unless it has associated schedule information).

This functionality is intended to change so that new jobs are created via the edit job form. Cloning a scheduled job shouldn't make a copy that will run under exactly the same conditions, because the copy would be redundant. Instead, cloning would open the new job form on the client (something that isn't in the client yet), and the job wouldn't be created on the server until you save it. Restarting a job may as well go through the same mechanism, to give the user an opportunity to verify the properties before they restart it.

The JavaScript GWC API

The GWC API is isolated into a single JS file. It's built on ExtJS 4 and wraps the REST API in ExtJS JavaScript. The API exposes an Ext.data.Store that you can bind grids and other components to for display. It also allows you to take actions on jobs (like saving or deleting them). The logs for jobs are exposed in a similar fashion.

The Job Manager is an example of making use of the API.

UI Editable Options

A recently needed feature was the ability to purge old jobs automatically. A system setting was added to control the timeframe for purging. Because we wanted the setting to be editable through the UI, and the UI is client-side-only JavaScript that interacts with a RESTful service, it made sense for settings to be changed through a RESTful interface.

There is now a RESTful endpoint to get and update GWC settings for the job manager. Currently there is only one setting but the mechanism is in place for more. Only UI alterable settings should use this mechanism - not all settings. These settings are stored in the JobStore instead of a configuration file.

A couple of other settings that came up (how often the job monitor polls running jobs, the schedule for running the old job purge task) are done the same way as the tile failure settings - through the context document or environment settings.

The JavaScript GWC API supports reading and changing settings, and the JobManager client makes use of the API to do this.