GSIP 155
GeoServer catalog optimizations for faster startup, faster runtime, better scalability with many layers.
Andrea Aime
This proposal is for GeoServer 2.11-beta.
- Under Discussion
- In Progress
- Completed
- Rejected
- Deferred
The current `DefaultCatalogFacade` is based on non-thread-safe data structures that can be corrupted by concurrent modification, and it performs a significant number of linear scans, which reduces its performance when many layers are configured.
The code reading the configuration from disk is also inefficient, paying a high price when the file system is not caching the information GeoServer needs to read during startup (directory contents, file contents).
GeoServer services and the UI typically look up catalog objects either by id (especially during startup) or by resource name (especially during OGC service operation).
The current `DefaultCatalogFacade` uses different types of data structures to handle catalog infos, in particular:
- A non-thread-safe multimap for stores and resources, fast to look up by id, but requiring linear scans to look up by name
- Concurrent maps keyed by name for workspaces and namespaces, requiring linear scans to look up by id
- Concurrent copy-on-write arrays for layers and layer groups, requiring linear scans for all types of access.
Scalable methods to access the catalog (listing by class and filter) are also not as efficient as they could be: they first look up the entire collection, wrap it in modification proxies, and only then filter it, generating a lot of garbage in the process.
Looking up a layer by name (typically in WMS) also incurs two linear scans, first looking up the `ResourceInfo` by name, and then looking up the layer associated with it.
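To illustrate why the order of operations matters for garbage generation, here is a minimal sketch contrasting "wrap everything, then filter" with "filter, then wrap only the matches". The class `ListingSketch`, its method names and the `wrap` function are hypothetical stand-ins, not GeoServer API, and the sketch assumes the filter gives the same answer on wrapped and unwrapped objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical sketch, not GeoServer code: two ways to list catalog items by filter. */
class ListingSketch {

    /** Wraps every item first, then filters: one wrapper is allocated per catalog item. */
    static <T> List<T> wrapThenFilter(List<T> items, UnaryOperator<T> wrap, Predicate<T> filter) {
        List<T> wrapped = new ArrayList<>(items.size());
        for (T item : items) {
            wrapped.add(wrap.apply(item));
        }
        List<T> result = new ArrayList<>();
        for (T item : wrapped) {
            if (filter.test(item)) {
                result.add(item);
            }
        }
        return result;
    }

    /** Filters first and wraps only the matches: one wrapper per returned item. */
    static <T> List<T> filterThenWrap(List<T> items, UnaryOperator<T> wrap, Predicate<T> filter) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            if (filter.test(item)) {
                result.add(wrap.apply(item));
            }
        }
        return result;
    }
}
```

With 10000 layers and a filter matching a single one, the first variant allocates 10000 wrappers and an intermediate list, while the second allocates one wrapper.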
The `DefaultCatalogFacade` has been rewritten to leverage a `CatalogInfoLookup` index class allowing keyed access both by identifier and by name. To reduce maintenance cost and simplify the code, the qualified name lookup is replaced by a lookup using a workspace/namespace id and the local name instead (keyed by `org.opengis.feature.type.Name`).
The layer lookup has also been reworked to perform a single key-based lookup.
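The idea of an index with keyed access both by id and by name can be sketched as follows. This is a simplified illustration, not the actual `CatalogInfoLookup` code: the class `DualKeyIndex`, its nested `NameKey` (a stand-in for `org.opengis.feature.type.Name`) and all method names are assumptions made for the example.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch: one index, two keyed views (by id, by scope id + local name). */
class DualKeyIndex<T> {

    /** Stand-in for a qualified name: workspace/namespace id plus local name. */
    static final class NameKey {
        final String scopeId;
        final String localName;

        NameKey(String scopeId, String localName) {
            this.scopeId = scopeId;
            this.localName = localName;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof NameKey)) {
                return false;
            }
            NameKey other = (NameKey) o;
            return Objects.equals(scopeId, other.scopeId)
                    && Objects.equals(localName, other.localName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(scopeId, localName);
        }
    }

    private final ConcurrentMap<String, T> byId = new ConcurrentHashMap<>();
    private final ConcurrentMap<NameKey, T> byName = new ConcurrentHashMap<>();

    /** Registers the value under both keys. The two puts are not atomic as a pair. */
    void add(String id, String scopeId, String localName, T value) {
        byId.put(id, value);
        byName.put(new NameKey(scopeId, localName), value);
    }

    /** Removes the value from both views. */
    void remove(String id, String scopeId, String localName) {
        byId.remove(id);
        byName.remove(new NameKey(scopeId, localName));
    }

    /** Constant-time lookup by identifier, no scan. */
    T findById(String id) {
        return byId.get(id);
    }

    /** Constant-time lookup by scope id and local name, no scan. */
    T findByName(String scopeId, String localName) {
        return byName.get(new NameKey(scopeId, localName));
    }
}
```

Each map is individually thread safe, but a reader can briefly observe a value in one view and not the other while `add` or `remove` is running; a real implementation has to decide whether that window is acceptable.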
On the file system side, `Resource` has been given a `getContents()` method returning the full content of a resource as a `byte[]`, implemented using `java.nio.file.Files.readAllBytes`, which is significantly faster when reading a small file on a cold file system cache. The `Resource.getType()` call has also been optimized to make a single system call instead of the three currently used by the code (this single change reduces the overall startup time on a cold file system cache by a factor of 3).
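As a rough illustration of the two file system changes, the sketch below reads a whole file with `Files.readAllBytes` and derives the resource type from a single attribute read. The class `ResourceIoSketch` and its `Type` enum are hypothetical stand-ins, and using `Files.readAttributes` to collapse the separate existence and type checks is an assumption about one possible approach, not necessarily what the actual `Resource` implementation does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical sketch, not the GeoServer Resource implementation. */
class ResourceIoSketch {

    /** Stand-in for the resource type. */
    enum Type { DIRECTORY, RESOURCE, UNDEFINED }

    /** Reads the full content of a (small) file in one call. */
    static byte[] getContents(Path path) throws IOException {
        return Files.readAllBytes(path);
    }

    /**
     * Determines the type from one attribute read, instead of separate
     * exists / isDirectory / isFile checks that each hit the file system.
     */
    static Type getType(Path path) {
        try {
            BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
            return attrs.isDirectory() ? Type.DIRECTORY : Type.RESOURCE;
        } catch (IOException e) {
            // Missing or unreadable path: treated as undefined in this sketch.
            return Type.UNDEFINED;
        }
    }
}
```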
A data directory previously developed to debug a startup time issue (already fixed) has been used to evaluate the performance improvements in the current catalog facade. The data directory uses:
- A single local PostGIS data store
- 10000 tables in it, all clones of "topp:states", configured as layers
- 10000 unique styles (the data dir has been created using the importer extension)
- 10000 GWC cached layers (the default)
The data directory is clearly limited in what it allows testing: it does not cover the case of many rasters, nor that of many stores, but it is nevertheless representative of a case with few stores and many layers.
The hardware, for reference, is:
- Intel Core i7 860
- 16GB RAM
- 2TB Seagate SSHD spinning disk hard drive with 32GB of SSD embedded cache
Three different GeoServer versions have been tested against this data directory:
- GeoServer 2.10.1, with no extra plugins
- GeoServer 2.10.1 with JDBCConfig, having the catalog stored in a local PostgreSQL database
- GeoServer master with the changes of this proposal
The first benchmark is a startup time with a cold file system cache, obtained by dropping all caches on a Linux machine, running `free && sync && echo 3 > /proc/sys/vm/drop_caches && free` as root before starting up GeoServer.
Version | Startup time secs |
---|---|
2.10.1 | 428 |
2.10.1 + JDBCConfig | 62 |
GSIP 155 | 68 |
The second one is a startup time with hot file system cache (run right after the previous one), with all the directory and file contents in memory:
Version | Startup time secs |
---|---|
2.10.1 | 39 |
2.10.1 + JDBCConfig | 49 |
GSIP 155 | 29 |
Note that the JDBCConfig case still has to load all the GWC configuration from local disk, and has to perform one or more database queries for each GWC layer being loaded.
Finally, a concurrent GetMap test has been performed using the following ApacheBench command, running GetMap on one of the many "topp:states" clones (picked in the middle of the lot) with 8 concurrent requests:
ab -n 400 -c 8 "http://localhost:8080/geoserver/the_states/wms?STYLES=&LAYERS=the_states%3Astates_5000&FORMAT=image%2Fpng&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&SRS=EPSG%3A4326&BBOX=-139.17181525,18.549281576172,-52.52945575,55.778420423828&WIDTH=768&HEIGHT=330"
All GeoServer instances were running with the Marlin rasterizer, to minimize the cost of producing the output map and make the catalog contribution more evident in the overall response times.
Version | Throughput req/sec | Avg resp. time ms |
---|---|---|
2.10.1 | 169 | 47 |
2.10.1 + JDBCConfig | 68 | 117 |
GSIP 155 | 233 | 34 |
Observations:
- The gap between 2.10.1 and GSIP-155 times is likely to grow wider as the number of layers increases, as the overhead is due to linear scans in the current `DefaultCatalogFacade`
- The gap between JDBCConfig and GSIP-155 times is likely to be mostly fixed, being due to the database query overhead, as both systems use indexes to speed up searches. In other words, expect a fixed overhead of roughly 80 ms on a machine like the one used for tests. While this overhead appears very large on the small dataset used for tests, it may not be as noticeable when rendering a map composed of many features or using expensive rendering features.
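As a consistency check on the table above: with a fixed concurrency $c$, ApacheBench's mean time per request and its throughput $X$ are tied by $T_{\text{avg}} = c / X$. Plugging in the measured throughputs with $c = 8$ reproduces the reported response times to within rounding, and shows where the figure of about 80 ms comes from:

```latex
\begin{aligned}
T_{\text{avg}} &= \frac{c}{X}, \qquad c = 8 \\
\text{2.10.1:} \quad & \frac{8}{169\ \text{req/s}} \approx 47.3\ \text{ms} \\
\text{2.10.1 + JDBCConfig:} \quad & \frac{8}{68\ \text{req/s}} \approx 117.6\ \text{ms} \\
\text{GSIP 155:} \quad & \frac{8}{233\ \text{req/s}} \approx 34.3\ \text{ms} \\
\text{JDBCConfig overhead:} \quad & 117\ \text{ms} - 34\ \text{ms} = 83\ \text{ms}
\end{aligned}
```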
Some tests had to be corrected, as they were saving a resource with one name and then saving the layer with a different one, without saving the resource again before doing so (`LayerInfo` does not store a name, it delegates it to the resource).
This is wrong, but the old catalog tolerated it when the layer being saved referred to a resource that was not wrapped in a modification proxy; the new one notices the inconsistency and throws a validation error.
This work lays the foundation for future improvements, including:
- Full concurrent catalog load at startup time. The concurrent nature of the new `DefaultCatalogFacade` allows loading its contents in parallel, thus allowing even shorter startup times (or tests with larger catalogs)
- Reduction of global catalog lock usage. The current catalog is protected by a global lock preventing concurrent usage of admin pages and REST calls. The new catalog should allow relaxing this. While dropping the lock altogether would be a mistake (the various catalog interactions performed in a single call are still not protected by a transaction), it should be possible to delay grabbing an exclusive lock until the first catalog update.
Project Steering Committee:
- Alessio Fabiani:
- Andrea Aime: +1
- Ben Caradoc-Davies: +1
- Brad Hards:
- Christian Mueller:
- Ian Turton: +1
- Jody Garnett: +1
- Jukka Rahkonen: +1
- Kevin Smith: +1
- Simone Giannecchini: +1