featurecollection clean up motivation - STEMLab/geotools GitHub Wiki

Viewpoints

We have several viewpoints on the problem (each has an interface in GeoTools past that is similar to what is needed):

  • Implementors: Really need to return a Feature and get out of the way, of central importance is the ability to recognize when a user is done (in order to close any open connections or streams).
    • This viewpoint is captured by GeoTools 2.0 FeatureResults class and the use of FeatureReader
    • The fact that it implements a Collection places a great burden on implementors. Way too many methods some with non specific meaning? What does getSize() represent in a dynamic system where a FeatureCollection is defined by a query?
    • The fact that it implements Feature places a great burden on implementors (way too much crazy type stuff going on, and little opportunity for code reuse, and how to you cache derived attributes such as "bounds" anyways?)
  • Standards: Really need a FeatureCollection to be a FeatureCollection in the modeling sense. It must both be a Feature (ie something that can be drawn on a Map), have attribtues (often derived), have metadata in the form of FeatureCollectionType which documents the associations captured by the Feature members.
    • This viewpoint is best represented by GeoAPI FeatureCollection interface
  • Users: Really would like a normal Java Collection class back so they can stop learning new GeoTools stuff
  • Pragmatic: Right now it is hard to use GeoTools to work with real data (where some of it may be invalid)
    • The amount of nested try/catch/finally code needed to protect against invalid data means most projects do not get it right
  • Community: the general community wants us to figure out any interface for this idea and stick with it
    • This viewpoint is best represented by the Deegree project or gvSig

What we can Give Up

  • Justin Deoliveira: FeatureCollection extends Collection<Feature> and associated size(), addAll(), removeAll() etc... methods
  • Jody Garnett: FeatureVisitor
  • Jody Garnett: The idea of using FeatureCollection methods to replace Query. That is subCollection( Filter ), sort( SortBy ) and so on. Although please treat this as a last resort!
  • Jody Garnett: size(), getCount()
  • Jody Garnett: FeatureList
  • Jesse Eichar: FeatureCollection events
  • Jody Garnett: Use of FeatureCollection as the result of feaureSource.getFeatures() (if we really cannot handle the overhead of implementing Feature then lets not pretend we are producing a FeatureCollection in the ISO19107 sense)
  • Jody Garnett: Documentation and Users (if need be)

What we must Keep

  • Jody Garnett: FeatureCollection extends Feature, and associated getBounds()
  • Andrea Aime: Safty - we must not risk leaking resources (ie connections and io streams)
  • Jody Garnett: uDig and GeoServer
  • Jody Garnett: short code examples (we cannot depend on try/catch/finally and boiler plate code)

Motivation

Closing Iterators just does not Happen

We cannot implement Collection<Feature> and expect to stay happy. The combination of Java 5 syntatic sugar (like foreach / break) and IDE code completion means that users are forver writing code like the following:

return collection.iterator().next();
for( Feature feature : collection ){
    if( feature.getId() == selected ){
        return feature;
    }
}

I each case we have a problem - the user has not "closed" the iterator, leaving streams or connections in use. These kind of errors only show up in production (after an application has been running for days or weeks). This kind of thing is not discovered using unit testing. We simply cannot expose our users to this kind of risk.

Short term:

  • Add a finalizer that freaks out on unclosed iterators (so that test cases can reveal the problem).

Long term:

  • We need to make sure FeatureCollection does not extend Collection<Feature>, Andrea Aime "want the bug to be obvious in the code you're writing"
  • We want the api to be easy for people to learn, "Having the same methods as Collections will make it easier"

Working with Real (ie Invalid) Data

The amount of code needed to protect yourself from invalid data means that almost all client code breaks when presented with data that is out of sync with its indicated FeatureType.

FeatureCollection featureCollection = featureSource.getFeatures(filter);
Iterator iterator = null;
int count;
int problems;
try {
     for( iterator = features.iteator(); iterator.hasNext(); count++){
           try {
               Feature feature = (Feature) iterator.next();
               ...
           }
           catch( RuntimeException dataProblem ){
               problems++;
               if( iterator.hasNext() ) continue; // skip and try the next one
               throw dataProblem(); // must be a serious error
           }
     }
}
finally {
     featureCollection.close( iterator );
}
System.out.println("Skiped "+problems+" out of "+count );

Short term:

  • Provide a wrapper "SafeFeatureIterator"? That catches the above RuntimeExceptions and allows users to ignore the problem

Long term:

  • Switch to GeoAPI Feature which is capable of representing invalid data
  • Be forgiving by default, let users engage a "Strict" if needed

Timing and Transition

We are switching over to GeoAPI SimpleFeature, this represents our best (possibly only?) opportunity to address our problems with FeatureCollection. We should use the bad experience we have now and avoid introducing mistakes we know about into the GeoAPI interfaces.

Ideas

Stop supporting Collection<Feature> ✅

It really looks like we have to do this, there is too much magic IDE and Java 5 support for Collection. The API introduces too much overhead for implementors (hi Justin) and too much risk of not closing connections (hi everyone else).

Either as a helper method on FeatureCollection (as was done on FeatureResults previously):

interface FeatureCollection extends Feature {
    Collection<Feature> collection(); // in memory collection 
}

Or as a DataUtilities method:

Collection<Feature> DataUtilities.collection( FeatureCollection features );

In both cases the goals are the same

  • to free the implementor for the burden of implementing a dozen methods ( size(), retainAll() etc...) that may or may not be applicable
  • to protect against the use of iterator() without the possibility of close()

Split FeatureResults from FeatureCollection again ⛔

We could have FeatureSource.getFeatures( Query ) return a "FeatureResults" class again.

interface FeatureSource {
    FeatureResults getFeatures();
    FeatureResults getFeatures(Filter
    FeatureResults getFeatures(Query);
    FeatureType getSchema();
    ...
}
interface FeatureResults {
   FeatureCollection features();
   Collection<Feature<> collection(); // in memory
   FeatureType getSchema();
   BoundingBox getBounds();
   int getCount();
}

There is too much similarity between this interface and the responsibilities of FeatureSource, we should instead work on DataStore.getView( typeName, filter ) and ensure that it meets our needs of a data access specific API.

Visitor vs Iterator ✅

The concept of an Iterator you have to close is not that bad, the trade off between Iterator and Visitor is mostly one of style. There are a couple of things to take into account

  • Java developers are far more happy with Iterator than Visitor( They have been using Enumeration for almost 10 years now)
  • The use of Iterator does demand that we ask users to use try/finally or risk leaking resources, Visitor handles this problem in a much more clean fashion - we can be sure to close any connections used as it is the data structure class taking responsibility for traversal
  • Visitor is similar to the concept of a Closure being introduced for Java 7 (we may want to look at that JSR and ensure our method is compatible when these ideas when they show up?)
  • Iterator has the ability to "break" out of the loop at any point in time, this functionality is much loved (but we will need to try/finally to protect against it.

We need to make the following improvements to FeatureVisitor:

  • We can "break" out of the current feature collection at any time by returning false
  • We should be careful when optimizing for specific visitors (such as "min")
⚠️ **GitHub.com Fallback** ⚠️