Implement a pure java Dbase indexing to optimize shapefile access - STEMLab/geotools GitHub Wiki

Proposal: Implement pure Java Dbase indexing to optimize shapefile access.

Description

The goal of this work is to implement pure Java Dbase indexing to optimize shapefile access. Currently, the shapefile provider reads every feature from the data file in order to resolve a filter whenever the filter contains no spatial or FID items.

I propose a new set of classes that resolve a filter using an optional Dbase-index file. The code also provides full integration of spatial and alphanumeric filters. It defines a small set of interfaces and classes that can be overridden for each required index format.

I have implemented a CDX-index file reader and a reader for a new open DBX-index file format, and I get much better performance than the current brute-force behavior (https://github.com/geotools/geotools/pull/1056#issuecomment-164694852). The code evaluates the filter against each node managed in the Dbase-index file and saves the valid records in a temporary buffer.
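The idea of evaluating the filter per index node and buffering the valid records can be sketched as follows. This is a minimal illustration, not the code from the pull request: the class `IndexScanSketch`, its methods, and the use of a `TreeMap` as a stand-in for an on-disk index are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical sketch: a sorted map (key -> record numbers) stands in for a Dbase index. */
final class IndexScanSketch {

    /**
     * Tests the predicate once per index key and buffers the record numbers
     * of every matching key. Only these records need to be read afterwards.
     */
    static BitSet matchingRecords(TreeMap<String, List<Integer>> index,
                                  Predicate<String> keyFilter) {
        BitSet buffer = new BitSet();
        for (Map.Entry<String, List<Integer>> node : index.entrySet()) {
            if (keyFilter.test(node.getKey())) {
                for (int recno : node.getValue()) {
                    buffer.set(recno);
                }
            }
        }
        return buffer;
    }

    /** Small in-memory index used only for illustration. */
    static TreeMap<String, List<Integer>> sampleIndex() {
        TreeMap<String, List<Integer>> index = new TreeMap<>();
        index.put("Madrid", Arrays.asList(0, 4));
        index.put("Paris", Arrays.asList(1));
        index.put("Rome", Arrays.asList(2, 3));
        return index;
    }
}
```

In this sketch the predicate runs once per distinct key rather than once per record, and the resulting `BitSet` tells the caller which records to fetch from the data file. A real index reader would walk the pages of the index file instead of an in-memory map.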

The code provides a filter visitor, ExpressionFilterExtractor, that resolves the expression for each key-value pair present in the tree. For full integration, a second filter visitor, ShapeFileExpressionFilterVisitor, adds spatial capabilities.

In addition, the IndexManager class now inherits from a new DbaseIndexManager class, which manages the alphanumeric indexing. This avoids editing the current IndexManager class as far as possible.

The Dbase-index managers are implemented in "unsupported" plugins. The first plugin supports CDX-index files generated with FoxPro (our client uses this format), and the second plugin uses a new open DBX-index manager that also supports creating these indexes. It is possible to write MDX, NDX or IDX file managers by implementing a small set of interfaces (http://www.clicketyclick.dk/databases/xbase/format/).

The pull request supports full integration of spatial and alphanumeric filters. It also fixes invalid results for filters that mix spatial and alphanumeric operators and whose branches do not share the same spatial envelope. Currently, the shapefile provider uses the total BBOX present in the query to filter all geometries, even those that belong to another child of the filter where the spatial operation must not be applied.
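The problem with a query-wide BBOX can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not the code from the pull request: `MixedFilterSketch`, `Env`, `Feature`, and `safeEnvelope()` are hypothetical names, and features are reduced to named points. The rule shown is that an envelope may be used as a pre-filter only if every possible match lies inside it: AND intersects the envelopes of its children, while OR must give up as soon as one child has no spatial restriction.

```java
/** Hypothetical model of filters that mix spatial and alphanumeric operators. */
final class MixedFilterSketch {

    static final class Feature {
        final String name; final double x, y;
        Feature(String name, double x, double y) { this.name = name; this.x = x; this.y = y; }
    }

    static final class Env {
        final double minX, minY, maxX, maxY;
        Env(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }
        boolean contains(double x, double y) {
            return x >= minX && x <= maxX && y >= minY && y <= maxY;
        }
        Env union(Env o) {
            return new Env(Math.min(minX, o.minX), Math.min(minY, o.minY),
                           Math.max(maxX, o.maxX), Math.max(maxY, o.maxY));
        }
        /** May yield an empty envelope (min > max), which contains no point. */
        Env intersection(Env o) {
            return new Env(Math.max(minX, o.minX), Math.max(minY, o.minY),
                           Math.min(maxX, o.maxX), Math.min(maxY, o.maxY));
        }
    }

    interface Filter {
        boolean matches(Feature f);
        /** Envelope every match must lie in, or null if matches may lie anywhere. */
        Env safeEnvelope();
    }

    static Filter bbox(Env e) {
        return new Filter() {
            public boolean matches(Feature f) { return e.contains(f.x, f.y); }
            public Env safeEnvelope() { return e; }
        };
    }

    static Filter nameEquals(String value) {
        return new Filter() {
            public boolean matches(Feature f) { return value.equals(f.name); }
            public Env safeEnvelope() { return null; } // no spatial restriction
        };
    }

    static Filter and(Filter a, Filter b) {
        return new Filter() {
            public boolean matches(Feature f) { return a.matches(f) && b.matches(f); }
            public Env safeEnvelope() {
                Env ea = a.safeEnvelope(), eb = b.safeEnvelope();
                if (ea == null) return eb;
                if (eb == null) return ea;
                return ea.intersection(eb);
            }
        };
    }

    static Filter or(Filter a, Filter b) {
        return new Filter() {
            public boolean matches(Feature f) { return a.matches(f) || b.matches(f); }
            public Env safeEnvelope() {
                Env ea = a.safeEnvelope(), eb = b.safeEnvelope();
                // One unrestricted branch makes the whole OR unrestricted.
                return (ea == null || eb == null) ? null : ea.union(eb);
            }
        };
    }

    /** (BBOX(0,0,10,10) AND name = 'x') OR name = 'y' */
    static Filter sample() {
        return or(and(bbox(new Env(0, 0, 10, 10)), nameEquals("x")), nameEquals("y"));
    }
}
```

For the filter built by `sample()`, a feature named `y` located at (50, 50) satisfies the filter through the second branch, yet it lies outside the only BBOX in the query. Pre-filtering with that BBOX would wrongly discard it, whereas `safeEnvelope()` returns `null` here, signalling that no spatial pre-filter may be applied.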

Status

Choose one of:

  • Under Discussion
  • In Progress
  • Completed
  • Rejected
  • Deferred

Voting:

  • Andrea Aime
  • Ben Caradoc-Davies
  • Christian Mueller
  • Ian Turton
  • Justin Deoliveira
  • Jody Garnett
  • Simone Giannecchini

Tasks

API Change

  • The IndexManager class inherits from the new DbaseIndexManager class.