CRS Authority Alternative Proposals - STEMLab/geotools GitHub Wiki

The issue (https://osgeo-org.atlassian.net/browse/GEOT-1286) is supposed to be used to track the alternatives, but since it does not let me add pictures, here we are.

  • Worker Pool ★ - use an object pool to manage workers and the connection lifecycle, and inject the cache into the workers
  • Fire and Forget - less code, but performance may suffer
  • Worker DAO Pool - treat the worker as a pure data access object that only retrieves definitions
  • Just do It - minimal design change, maximal code

The star above indicates the current proposal.

Worker DAO Pool

[Figure: runtime overview of the Worker DAO Pool design (crs-authority-alternative-proposals/RuntimeOverviewDAO.png)]

Pros:

  • Object Pool used to manage workers
  • Object Pool lifecycle methods used to notify worker of transitions so connection can be managed appropriately
  • Layered separation between the service and the pool of data access objects makes sense to Java EE developers (and looks normal)
  • The cache is limited to one object, which is responsible for both object creation and caching

Cons:

  • The DAO cannot be wrapped in any of the decorators we already have

Question: Do we really need this complexity?

desruisseaux: A disadvantage I see in OracleEPSGDefinitionDAO is that going through an intermediate Map seems to introduce a lot of complexity for no obvious benefit (unless I am missing the point). We are basically just converting a ResultSet into a Map. The ResultSet could be thought of as a kind of collection too, but converting it to a Map means that we need to define a whole bunch of HashMap keys. Do we really need all this complexity?

  • jgarnett: The point is the separation of concerns. Java EE developers like to see a layered architecture, and they would honestly ask me where the data access object is. The current worker is not a very good DAO because it does more than data access (it was recursive, created real objects, and so on), and this gets in the way of pooling workers easily. Now we have some reasons, and some benefits, for making the workers create real objects and share a cache, so I do not really mind.

The map of properties always holds metadata and is built the same way for all referencing objects. The HashMap keys for metadata are clearly defined, but they are the only ones that are. Beyond metadata we reach the "real" objects, and the story becomes a little different.

  • So in reviewing the sequence diagram for Datum, we chose too easy an example.

Allowing Multiple Users

OracleEPSGAuthority supports multiple threads: it uses ReferencingObjectCache to return previously constructed objects, and an ObjectPool of workers to supply definitions on a cache miss.

In detail: on a cache miss, a Semaphore is placed in the cache (for other threads to block on), and a worker is taken from the ObjectPool and asked to fetch the definition required for construction. The defined instance is then created (using the ReferencingObjectFactory, which uses an internal CanonicalSet to prevent duplicates) and placed into the cache. Finally, the Semaphore is asked to release any waiting threads, all of which now get a cache hit.
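The cache-miss sequence above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the OracleEPSGAuthority code: DefinitionDAO, Pending and DaoPoolAuthority are hypothetical names, a BlockingQueue stands in for the ObjectPool, a Function stands in for ReferencingObjectFactory, and a CountDownLatch is swapped in for the Semaphore because a single countDown() releases every waiting thread at once.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.function.Function;

/** Hypothetical DAO: it only retrieves a definition and never builds objects. */
interface DefinitionDAO {
    Map<String, String> fetchDefinition(String code);
}

/** Placeholder put in the cache while one thread constructs the object. */
final class Pending {
    final CountDownLatch done = new CountDownLatch(1); // plays the role of the Semaphore
    volatile Object value;                              // stays null if construction failed
}

final class DaoPoolAuthority {
    private final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();
    private final BlockingQueue<DefinitionDAO> pool;             // stand-in for ObjectPool
    private final Function<Map<String, String>, Object> factory; // stand-in for ReferencingObjectFactory

    DaoPoolAuthority(BlockingQueue<DefinitionDAO> pool, Function<Map<String, String>, Object> factory) {
        this.pool = pool;
        this.factory = factory;
    }

    Object create(String code) {
        Pending mine = new Pending();
        Object existing = cache.putIfAbsent(code, mine);
        if (existing instanceof Pending) {           // another thread is building it: block
            Pending other = (Pending) existing;
            awaitQuietly(other.done);
            return other.value;
        }
        if (existing != null) {
            return existing;                          // ordinary cache hit
        }
        DefinitionDAO dao = null;                     // cache miss: we own the placeholder
        boolean ok = false;
        try {
            dao = takeQuietly(pool);                  // borrow a worker
            Object built = factory.apply(dao.fetchDefinition(code));
            mine.value = built;
            cache.put(code, built);                   // replace the placeholder
            ok = true;
            return built;
        } finally {
            if (!ok) cache.remove(code, mine);        // let a later call retry
            if (dao != null) pool.offer(dao);         // hand the worker back
            mine.done.countDown();                    // wake every waiting thread
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    private static DefinitionDAO takeQuietly(BlockingQueue<DefinitionDAO> pool) {
        try {
            return pool.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while borrowing a worker", e);
        }
    }
}
```

Because the placeholder is published with putIfAbsent, only one thread borrows a DAO for a given code; the others simply wait on the latch. A production version would also need to propagate construction failures to the waiting threads instead of handing them null.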

Cache Handling

The cache has been isolated into a single class, ReferencingObjectCache. This class is responsible for storing strong references to objects already created and released to code outside of the referencing module. It is also responsible for storing weak references to temporary objects created while the find method runs.

This class is thread safe.
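A minimal sketch of the two kinds of entry ReferencingObjectCache keeps, assuming string codes as keys. The method names get, putReleased and putTemporary are hypothetical, and coarse synchronized methods are simply the easiest way to make the sketch thread safe; the real class may do this differently.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch only: strong references for objects released to callers,
 * weak references for temporaries created while running find.
 */
final class ReferencingObjectCache {
    private final Map<String, Object> released = new HashMap<>();
    private final Map<String, WeakReference<Object>> temporary = new HashMap<>();

    /** Returns the cached object, or null if absent or already garbage collected. */
    synchronized Object get(String code) {
        Object value = released.get(code);
        if (value != null) {
            return value;
        }
        WeakReference<Object> ref = temporary.get(code);
        if (ref == null) {
            return null;
        }
        value = ref.get();
        if (value == null) {
            temporary.remove(code);      // referent was collected; drop the stale entry
        }
        return value;
    }

    /** Object handed to code outside the module: keep it strongly reachable. */
    synchronized void putReleased(String code, Object value) {
        released.put(code, value);
        temporary.remove(code);          // promoted from temporary, if it was one
    }

    /** Temporary object from find: the garbage collector may reclaim it. */
    synchronized void putTemporary(String code, Object value) {
        if (!released.containsKey(code)) {
            temporary.put(code, new WeakReference<>(value));
        }
    }
}
```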

Connection Issues

OracleEPSGAuthority is the keeper of a dataSource, which is used when creating the OracleEPSGDefinitionDAO instances that populate the ObjectPool. Each OracleEPSGDefinitionDAO uses the dataSource to open a connection as needed, and keeps a cache of PreparedStatements against that connection.

ObjectPool lifecycle methods are implemented so that each OracleEPSGDefinitionDAO is notified when it passes out of constant use; at that point its PreparedStatements and connection are closed.
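The connection handling described above might look roughly like the sketch below. PooledDefinitionWorker and its passivate() method are hypothetical: passivate() stands for whichever ObjectPool lifecycle callback fires when a worker leaves constant use. Only the java.sql and javax.sql calls are real APIs.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;

/** Hypothetical pooled worker: lazy connection plus a per-connection statement cache. */
final class PooledDefinitionWorker {
    private final DataSource dataSource;
    private Connection connection;                                  // opened on first use
    private final Map<String, PreparedStatement> statements = new HashMap<>();

    PooledDefinitionWorker(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Returns a cached statement, opening the connection if needed. */
    PreparedStatement statement(String sql) throws SQLException {
        if (connection == null) {
            connection = dataSource.getConnection();
        }
        PreparedStatement ps = statements.get(sql);
        if (ps == null) {
            ps = connection.prepareStatement(sql);
            statements.put(sql, ps);
        }
        return ps;
    }

    /** Pool callback (hypothetical): worker left constant use, so close everything. */
    void passivate() throws SQLException {
        SQLException failure = null;
        for (PreparedStatement ps : statements.values()) {
            try {
                ps.close();
            } catch (SQLException e) {
                if (failure == null) failure = e; else failure.addSuppressed(e);
            }
        }
        statements.clear();
        if (connection != null) {
            try {
                connection.close();
            } catch (SQLException e) {
                if (failure == null) failure = e; else failure.addSuppressed(e);
            }
            connection = null;
        }
        if (failure != null) {
            throw failure;
        }
    }

    boolean isConnected() {
        return connection != null;
    }
}
```

Keeping statements per worker is what makes a pooled worker worth reusing: a worker borrowed again before it is passivated skips both the getConnection() and prepareStatement() calls.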

We will need to make use of a single worker (and use it to satisfy multiple definitions) when implementing the find method.

By providing hints to tune the ObjectPool we can allow an application to:

  • Ensure that fewer workers are in play than the number of Connections managed by the DataSource (so other Oracle modules do not starve)
  • Emulate the current 20-minute timeout behavior
  • Arrive at a compromise for J2EE applications (where a worker can free its connection the moment it is no longer in constant use)
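One way to express the three tuning goals as values is sketched below. WorkerPoolTuning, its fields and the idea of reserving a number of connections for other modules are hypothetical; they are not existing GeoTools Hints keys, and a real ObjectPool implementation would apply them through its own configuration.

```java
import java.time.Duration;

/** Hypothetical tuning values for the worker ObjectPool (not real GeoTools Hints). */
final class WorkerPoolTuning {
    /** Mirrors the current behavior: release a connection after 20 minutes idle. */
    static final Duration CURRENT_TIMEOUT = Duration.ofMinutes(20);
    /** J2EE compromise: release the connection as soon as the worker goes idle. */
    static final Duration RELEASE_IMMEDIATELY = Duration.ZERO;

    final int maxWorkers;
    final Duration idleTimeout;

    WorkerPoolTuning(int dataSourceMaxConnections, int reservedForOtherModules, Duration idleTimeout) {
        if (dataSourceMaxConnections < 1 || reservedForOtherModules < 0) {
            throw new IllegalArgumentException("invalid connection counts");
        }
        // Stay below the DataSource limit so other Oracle modules do not starve,
        // but always allow at least one worker.
        this.maxWorkers = Math.max(1, dataSourceMaxConnections - reservedForOtherModules);
        this.idleTimeout = idleTimeout;
    }

    /** True when a worker idle for this long should be passivated (connection closed). */
    boolean shouldRelease(Duration idleFor) {
        return idleFor.compareTo(idleTimeout) >= 0;
    }
}
```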

Worker Pool

  • Threads: Use an ObjectPool to manage multiple workers
  • Cache: Isolate the cache into a separate object, and inject it into the workers so they can perform their own cache check
  • Connection: Use ObjectPool lifecycle methods to close the connection, and tune the ObjectPool within the limits of the provided DataSource

[Figure: runtime overview of the Worker Pool design (crs-authority-alternative-proposals/RuntimeOverview.png)]

Question: How will two threads know that the same object is under construction?

The first thread will place a Semaphore into the Map, the second thread will block on it. See Cache Handling below.

Allowing Multiple Users

OracleEPSGAuthority supports multiple threads: it uses ReferencingObjectCache to return previously constructed objects, and an ObjectPool of workers to create new content on a cache miss.

To build compound objects the workers will need to share the cache (or use a back pointer to call the parent) - we have chosen to let them share the cache.
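Sharing the cache lets a worker resolve the parts of a compound object through the same lookups every other worker uses. The sketch below assumes hypothetical stub types (DatumStub, CrsStub) and a hypothetical SharedCacheWorker, passes the datum code in directly instead of reading it from the database, and omits the Semaphore placeholder: it simply lets the first published instance win via putIfAbsent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-ins for the real referencing objects. */
final class DatumStub {
    final String code;
    DatumStub(String code) { this.code = code; }
}

final class CrsStub {
    final String code;
    final DatumStub datum;
    CrsStub(String code, DatumStub datum) { this.code = code; this.datum = datum; }
}

/** A worker that checks the shared cache itself, including for the parts of compound objects. */
final class SharedCacheWorker {
    private final ConcurrentMap<String, Object> sharedCache;
    final AtomicInteger constructions = new AtomicInteger();  // counts real builds, for illustration

    SharedCacheWorker(ConcurrentMap<String, Object> sharedCache) {
        this.sharedCache = sharedCache;
    }

    DatumStub createDatum(String code) {
        String key = "datum:" + code;
        Object hit = sharedCache.get(key);
        if (hit != null) {
            return (DatumStub) hit;
        }
        constructions.incrementAndGet();
        DatumStub built = new DatumStub(code);                  // real code would query the database
        Object raced = sharedCache.putIfAbsent(key, built);
        return raced != null ? (DatumStub) raced : built;       // first published instance wins
    }

    /** datumCode is a parameter here; real code would read it from the CRS definition. */
    CrsStub createCrs(String code, String datumCode) {
        String key = "crs:" + code;
        Object hit = sharedCache.get(key);
        if (hit != null) {
            return (CrsStub) hit;
        }
        DatumStub datum = createDatum(datumCode);               // recursive step, same shared cache
        constructions.incrementAndGet();
        CrsStub built = new CrsStub(code, datum);
        Object raced = sharedCache.putIfAbsent(key, built);
        return raced != null ? (CrsStub) raced : built;
    }
}
```

A back pointer to the parent would achieve the same deduplication, at the cost of every nested lookup going through the authority.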

Cache Handling

The cache has been isolated into a single class, ReferencingObjectCache. This class is responsible for storing strong references to objects already created and released to code outside of the referencing module. It is also responsible for storing weak references to temporary objects created while the find method runs.

This class is thread safe, and populated by the workers when they create objects.

Connection Issues

OracleEPSGAuthority is the keeper of a dataSource, which is used when creating the OracleEPSGDefinitionDAO instances that populate the ObjectPool. Each OracleEPSGDefinitionDAO uses the dataSource to open a connection as needed, and keeps a cache of PreparedStatements against that connection.

ObjectPool lifecycle methods are implemented so that each OracleEPSGDefinitionDAO is notified when it passes out of constant use; at that point its PreparedStatements and connection are closed.

We will need to make use of a single worker (and use it to satisfy multiple definitions) when implementing the find method.

By providing hints to tune the ObjectPool we can allow an application to:

  • Ensure that fewer workers are in play than the number of Connections managed by the DataSource (so other Oracle modules do not starve)
  • Emulate the current 20-minute timeout behavior
  • Arrive at a compromise for J2EE applications (where a worker can free its connection the moment it is no longer in constant use)

Fire and Forget

  • Threads: Create a FactoryUsingOracleSQL as needed in a "Fire and Forget" manner
  • Cache: Workers can keep a back pointer, and call the parent to check the cache
  • Connection: Workers must limit connection use to just the duration of the executeStatement

[Figure: runtime overview of the Fire and Forget design (crs-authority-alternative-proposals/RuntimeOverviewFireAndForget.png)]
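A fire-and-forget worker might be sketched as follows. The class names and the SQL (a hypothetical epsg_names table with a name column) are assumptions for illustration only. The worker keeps a back pointer to the parent for cache checks, and try-with-resources keeps the Connection open only for the duration of one statement execution. Nothing here caches the PreparedStatement, which is exactly the performance concern listed under Cons.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.sql.DataSource;

/** Parent: owns the cache and the DataSource, and hands out throw-away workers. */
final class FireAndForgetAuthority {
    final Map<String, String> cache = new ConcurrentHashMap<>();
    final DataSource dataSource;

    FireAndForgetAuthority(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    String createName(String code) throws SQLException {
        // A fresh worker per request; it becomes garbage as soon as this returns.
        return new OneShotWorker(this).createName(code);
    }
}

/** Hypothetical stand-in for a FactoryUsingOracleSQL created per request. */
final class OneShotWorker {
    private final FireAndForgetAuthority parent;   // back pointer used to check the cache

    OneShotWorker(FireAndForgetAuthority parent) {
        this.parent = parent;
    }

    String createName(String code) throws SQLException {
        String cached = parent.cache.get(code);
        if (cached != null) {
            return cached;
        }
        String sql = "SELECT name FROM epsg_names WHERE code = ?";   // hypothetical table and column
        // The Connection lives only for this one statement execution.
        try (Connection connection = parent.dataSource.getConnection();
             PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setString(1, code);
            try (ResultSet rs = statement.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("No definition for code " + code);
                }
                String name = rs.getString(1);
                String raced = parent.cache.putIfAbsent(code, name);
                return raced != null ? raced : name;   // keep whichever instance was cached first
            }
        }
    }
}
```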

Pros:

  • Simple, minimal code
  • Follows Java EE best practice

Cons:

  • No guarantee of a prepared statement cache, so performance may be terrible in some configurations

Are the prepared statements worth it?

Make no mistake about it: the design would get a lot simpler, and would follow normal Java EE best practice, if we were not holding on to our connections. We could throw away the workers the moment we had used them to handle one request.

  • The prepared statements are worth it when we are creating a lot of objects - as with the find method
  • They are supposed to be 2 to 3 times faster; this expectation is in line with our experience for these kinds of requests

However, simply using a prepared statement and throwing it away afterwards may still give us some of the performance gain.

We should perform some measurements before evaluating further.

Why are you doing this? Why not use the DataSource to cache prepared statements?

They can do that? Wow, they can.

Well, this would be very cool. For the GeoTools out-of-the-box experience we could use DBCP and let it manage this problem, leaving our code simple. But our Java EE customers want to use the DataSource provided by the application container, so we are in a bit of a conflict here. Still, if we manage the statements ourselves in an ObjectPool, and the DataSource keeps them alive even longer, we get the best of both worlds.

Just do It

  • Threads: Create a Collection of Workers and manage it ourselves, keeping track of each Thread so that recursive calls reuse the same worker
  • Cache: Workers can keep a back pointer, and call the parent to check the cache
  • Connection: Choose a connection policy strategy object based on a new Hint
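The thread tracking and the connection policy strategy can be sketched as below. ThreadBoundWorkers, Worker and ConnectionPolicy are hypothetical names, the Hint that would select the policy is not shown, and Worker merely flips a flag where real code would open or close a JDBC Connection. The collection is unbounded in this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical strategies a new Hint could select. */
enum ConnectionPolicy {
    KEEP_OPEN,          // keep the connection between requests
    CLOSE_WHEN_IDLE     // close it as soon as the outermost call returns
}

/** Hypothetical worker; the flag stands in for an open JDBC Connection. */
final class Worker {
    boolean connectionOpen;
    void open()  { connectionOpen = true; }
    void close() { connectionOpen = false; }
}

/** A self-managed collection of workers keyed by the Thread using them. */
final class ThreadBoundWorkers {
    private final Deque<Worker> idle = new ArrayDeque<>();
    private final Map<Thread, Worker> inUse = new HashMap<>();
    private final Map<Thread, Integer> depth = new HashMap<>();
    private final ConnectionPolicy policy;

    ThreadBoundWorkers(ConnectionPolicy policy) {
        this.policy = policy;
    }

    /** Returns this thread's worker, so recursive calls get the same one. */
    synchronized Worker acquire() {
        Thread thread = Thread.currentThread();
        Worker worker = inUse.get(thread);
        if (worker == null) {
            worker = idle.isEmpty() ? new Worker() : idle.pop();
            if (!worker.connectionOpen) {
                worker.open();
            }
            inUse.put(thread, worker);
        }
        depth.merge(thread, 1, Integer::sum);
        return worker;
    }

    /** Call once per acquire(); the worker is returned when the outermost call ends. */
    synchronized void release() {
        Thread thread = Thread.currentThread();
        Integer current = depth.get(thread);
        if (current == null) {
            throw new IllegalStateException("release() without acquire()");
        }
        if (current > 1) {
            depth.put(thread, current - 1);      // still inside a recursive call
            return;
        }
        depth.remove(thread);
        Worker worker = inUse.remove(thread);
        if (policy == ConnectionPolicy.CLOSE_WHEN_IDLE) {
            worker.close();
        }
        idle.push(worker);
    }
}
```

The depth counter is what distinguishes a recursive call from a new request on the same thread: only the outermost release() applies the connection policy and returns the worker to the idle collection.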