crs authority alternative proposals - STEMLab/geotools GitHub Wiki
The issue (https://osgeo-org.atlassian.net/browse/GEOT-1286) is supposed to be used to track alternatives, but since it does not let me attach pictures, here we are.
- #Worker Pool - :yellow_heart: use an object pool to manage workers and connection lifecycle, inject cache into workers
- #Fire and Forget - less code, but performance may suffer
- #Worker DAO Pool - treat worker as a pure data access object only retrieving definitions
- #Just do It - minimal design change, maximal code
The yellow marker above indicates the current proposal.
Worker DAO Pool
- Threads: CRS Authority Allowing Multiple Users#Recomendation Use an ObjectPool of FactoryUsingOracleSQL
- Cache: Limit cache use to OracleEPSGAuthority, limit the scope of workers to only providing the definition (not object creation)
- Connection: Use ObjectPool lifecycle methods to close connection, tune ObjectPool within limits of provided DataSource
crs-authority-alternative-proposals/RuntimeOverviewDAO.png
Pros:
- Object Pool used to manage workers
- Object Pool lifecycle methods used to notify worker of transitions so connection can be managed appropriately
- Layered separation between the service and the pool of data access objects makes sense to Java EE developers (and looks normal)
- The cache is limited to one object, that has responsibility for object creation and caching
Cons:
- DAO cannot be wrapped in any of the decorators we have hanging around
Question: Do we really need this complexity?
desruisseaux: A disadvantage that I see with OracleEPSGDefinitionDAO is that going through an intermediate Map
seems to introduce a lot of complexity for no obvious benefit (unless I am missing the point). I
mean, we are basically just converting a ResultSet into a Map. The ResultSet could be thought of as a
kind of collection too, but converting it to a Map means that we need to define a whole bunch of
HashMap keys. Do we really need all this complexity?
- jgarnett: The point is the separation of concerns. The Java EE guys like to see a layered
architecture and they would honestly ask me where the data access object is. The current worker
is not a very good DAO as it does more stuff (it was recursive, created real objects and so
on), and this gets in the way of being able to make a pool of them easily. Now we have some reasons, and some benefits, for making the workers create real objects and share a cache, so I do not really mind.
The map of properties is always metadata and is built in the same way for all referencing objects. The HashMap keys for metadata are clearly defined, but they are the only ones clearly defined. After metadata, we reach "real" objects and the story becomes a little bit different.
- So in reviewing the sequence diagram for Datum we chose too easy an example.
Allowing Multiple Users
OracleEPSGAuthority allows multiple threads, making use of ReferencingObjectCache in order to
return objects previously constructed and an ObjectPool of workers to supply definitions in
the event of a cache miss.
In detail: In the event of a cache miss a Semaphore is placed in the cache (for others to block
on) and a worker is taken from the ObjectPool and asked to fetch the definition required for
construction. The defined instance is constructed (using the ReferencingObjectFactory, which uses an
internal CanonicalSet to prevent duplicates) and placed into the cache. The Semaphore is then asked to
release any waiting threads (all of which can now get a cache hit).
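The sequence above can be sketched roughly as follows. This is a minimal illustration and not the GeoTools implementation: `CacheMissSketch`, `Pending` and the function-based "worker" are hypothetical names, the pool is a plain `BlockingQueue` rather than an ObjectPool, and a one-shot `CountDownLatch` stands in for the Semaphore described in the text.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

/** Hypothetical stand-in for the authority: a shared cache in front of a pool of workers. */
class CacheMissSketch {

    /** Placeholder stored in the cache while an object is under construction. */
    private static final class Pending {
        final CountDownLatch done = new CountDownLatch(1);
        volatile Object value; // stays null if construction failed

        Object await(String code) {
            try {
                done.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for " + code, e);
            }
            if (value == null) {
                throw new IllegalStateException("construction failed for " + code);
            }
            return value;
        }
    }

    private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();
    // Assumption: each worker is modelled as a function from authority code to definition text.
    private final BlockingQueue<Function<String, String>> pool = new LinkedBlockingQueue<>();

    CacheMissSketch(int workers, Function<String, String> lookup) {
        for (int i = 0; i < workers; i++) {
            pool.add(lookup);
        }
    }

    Object get(String code) {
        Object found = cache.get(code);
        if (found == null) {
            Pending mine = new Pending();
            found = cache.putIfAbsent(code, mine);
            if (found == null) {
                return construct(code, mine); // this thread won the race
            }
        }
        return (found instanceof Pending) ? ((Pending) found).await(code) : found;
    }

    private Object construct(String code, Pending mine) {
        Function<String, String> worker = null;
        try {
            worker = pool.take(); // borrow a worker, blocking if none is free
            String definition = worker.apply(code);
            Object created = "Object[" + definition + "]"; // stands in for the factory call
            mine.value = created;
            cache.put(code, created); // replace the placeholder with the real object
            return created;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a worker", e);
        } finally {
            if (mine.value == null) {
                cache.remove(code, mine); // failed: let a later call retry
            }
            if (worker != null) {
                pool.add(worker); // return the worker to the pool
            }
            mine.done.countDown(); // release every thread blocked on the placeholder
        }
    }
}
```

In this sketch a failed construction removes the placeholder so a later request can retry; whether waiting threads should fail or retry is a design decision the text above leaves open.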
Cache Handling
The cache has been isolated into a single class - ReferencingObjectCache. This class is
responsible for storing strong references to objects already created and released to code outside of
the referencing module. It is also responsible for storing weak references to temporary objects
created during the use of the find method.
This class is thread safe.
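A minimal sketch of the strong/weak split described above, assuming a simple two-map layout. `TwoTierCacheSketch` and its method names are hypothetical and are not the ReferencingObjectCache API.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical cache: strong references for released objects, weak ones for temporaries. */
class TwoTierCacheSketch {
    private final ConcurrentMap<String, Object> released = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, WeakReference<Object>> temporary = new ConcurrentHashMap<>();

    /** Objects handed to code outside the module are held strongly. */
    void putReleased(String code, Object object) {
        released.put(code, object);
        temporary.remove(code);
    }

    /** Objects created only while searching (e.g. during find) may be garbage collected. */
    void putTemporary(String code, Object object) {
        if (!released.containsKey(code)) {
            temporary.put(code, new WeakReference<>(object));
        }
    }

    /** Returns the cached object, or null on a miss or when a temporary has been collected. */
    Object get(String code) {
        Object strong = released.get(code);
        if (strong != null) {
            return strong;
        }
        WeakReference<Object> reference = temporary.get(code);
        if (reference == null) {
            return null;
        }
        Object weak = reference.get();
        if (weak == null) {
            temporary.remove(code, reference); // drop the stale entry
        }
        return weak;
    }
}
```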
Connection Issues
OracleEPSGAuthority is the keeper of a dataSource which is used when creating
OracleEPSGDefinitionDAO objects to populate the ObjectPool. Each OracleEPSGDefinitionDAO uses the
dataSource to create a connection as needed, and also keeps a cache of PreparedStatements
against that connection.
ObjectPool lifecycle methods are implemented, allowing OracleEPSGDefinitionDAO objects to be
notified when they pass out of constant use; at this point their PreparedStatements and connection
are closed.
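The lifecycle idea can be sketched as below, under stated assumptions: `SketchConnection` is a hypothetical, tiny stand-in for a JDBC connection so the example runs without a database, `LifecycleWorkerSketch` is not the real DAO, and `onIdle()` is an invented hook name for whatever lifecycle callback the chosen ObjectPool provides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical, minimal stand-in for a JDBC connection. */
interface SketchConnection {
    String lookup(String sql, String code);
    void close();
}

/** Hypothetical worker: opens its connection lazily and drops it when the pool reports it idle. */
class LifecycleWorkerSketch {
    private final Supplier<SketchConnection> dataSource;
    // Stands in for the cache of PreparedStatements kept against the connection.
    private final Map<String, String> preparedSql = new HashMap<>();
    private SketchConnection connection;

    LifecycleWorkerSketch(Supplier<SketchConnection> dataSource) {
        this.dataSource = dataSource;
    }

    synchronized String fetchDefinition(String table, String code) {
        if (connection == null) {
            connection = dataSource.get(); // connect only when first needed
        }
        String sql = preparedSql.computeIfAbsent(
                table, t -> "SELECT * FROM " + t + " WHERE CODE = ?");
        return connection.lookup(sql, code);
    }

    /** Hook the pool would call when the worker passes out of constant use. */
    synchronized void onIdle() {
        preparedSql.clear();
        if (connection != null) {
            connection.close();
            connection = null;
        }
    }

    synchronized boolean isConnected() {
        return connection != null;
    }
}
```

After `onIdle()` the worker remains usable; the next request simply reconnects.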
We will need to make use of a single worker (and use it to satisfy multiple definitions) when implementing the find method.
By providing hints to tune the ObjectPool we can allow an application to:
- Ensure that fewer workers are in play than the number of Connections managed by the DataSource (so other Oracle modules do not starve)
- Emulate the current 20 min timeout behavior
- Arrive at a compromise for J2EE applications (where a worker can free its connection the moment it is no longer in constant use)
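As a rough illustration of how such hints might translate into pool settings, the sketch below derives a worker limit and an idle timeout. `PoolTuningSketch` and its factory methods are hypothetical; the actual hint names and ObjectPool settings are not specified in this page.

```java
import java.time.Duration;

/** Hypothetical holder for the pool limits an application could request through hints. */
final class PoolTuningSketch {
    final int maxWorkers;
    final Duration idleTimeout;

    private PoolTuningSketch(int maxWorkers, Duration idleTimeout) {
        this.maxWorkers = maxWorkers;
        this.idleTimeout = idleTimeout;
    }

    /** Keeps the worker count strictly below the number of connections in the DataSource. */
    static PoolTuningSketch of(int dataSourceConnections, int requestedWorkers, Duration idleTimeout) {
        if (dataSourceConnections < 2 || requestedWorkers < 1) {
            throw new IllegalArgumentException("need at least 2 connections and 1 worker");
        }
        int ceiling = dataSourceConnections - 1; // leave one connection for other modules
        return new PoolTuningSketch(Math.min(requestedWorkers, ceiling), idleTimeout);
    }

    /** Emulates the 20 minute timeout mentioned above. */
    static PoolTuningSketch legacyTimeout(int dataSourceConnections, int requestedWorkers) {
        return of(dataSourceConnections, requestedWorkers, Duration.ofMinutes(20));
    }

    /** J2EE-style compromise: free the connection as soon as the worker is idle. */
    static PoolTuningSketch releaseWhenIdle(int dataSourceConnections, int requestedWorkers) {
        return of(dataSourceConnections, requestedWorkers, Duration.ZERO);
    }

    /** True once a worker has been idle long enough to close its connection. */
    boolean shouldClose(Duration idleFor) {
        return idleFor.compareTo(idleTimeout) >= 0;
    }
}
```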
Worker Pool
- Threads: Use ObjectPool to manage multiple workers
- Cache: Isolate cache into a separate object, and inject it into workers so they can perform their own cache check
- Connection: Use ObjectPool lifecycle methods to close connection, tune ObjectPool within limits of provided DataSource
crs-authority-alternative-proposals/RuntimeOverview.png
Question: How would two threads know that the same object is under construction?
The first thread places a Semaphore into the Map, and the second thread blocks on it. See Cache Handling below.
Allowing Multiple Users
OracleEPSGAuthority allows multiple threads, making use of ReferencingObjectCache in order to
return objects previously constructed and an ObjectPool of workers to create new content in
the event of a cache miss.
To build compound objects the workers will need to share the cache (or use a back pointer to call the parent) - we have chosen to let them share the cache.
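A small sketch of what sharing the cache means for compound objects, assuming the cache is simply injected through the constructor. `SharedCacheWorkerSketch` is a hypothetical name and plain strings stand in for real Datum and CRS objects.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical worker that checks an injected, shared cache before building anything. */
class SharedCacheWorkerSketch {
    private final ConcurrentMap<String, Object> cache; // injected; shared by all workers
    int built; // how many objects this worker constructed itself

    SharedCacheWorkerSketch(ConcurrentMap<String, Object> cache) {
        this.cache = cache;
    }

    Object createDatum(String code) {
        return cached(code, () -> "Datum(" + code + ")");
    }

    /** A compound object: the datum component is resolved through the shared cache first. */
    Object createCrs(String code, String datumCode) {
        Object datum = createDatum(datumCode);
        return cached(code, () -> "CRS(" + code + ", " + datum + ")");
    }

    private Object cached(String code, Supplier<Object> builder) {
        Object existing = cache.get(code);
        if (existing != null) {
            return existing;
        }
        built++;
        Object fresh = builder.get();
        Object raced = cache.putIfAbsent(code, fresh); // another worker may have been faster
        return raced != null ? raced : fresh;
    }
}
```

With two workers sharing one map, a datum built by the first worker as part of a CRS is a cache hit for the second.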
Cache Handling
The cache has been isolated into a single class - ReferencingObjectCache. This class is
responsible for storing strong references to objects already created and released to code outside of
the referencing module. It is also responsible for storing weak references to temporary objects
created during the use of the find method.
This class is thread safe, and is populated by the workers when they create objects.
Connection Issues
OracleEPSGAuthority is the keeper of a dataSource which is used when creating
OracleEPSGDefinitionDAO objects to populate the ObjectPool. Each OracleEPSGDefinitionDAO uses the
dataSource to create a connection as needed, and also keeps a cache of PreparedStatements
against that connection.
ObjectPool lifecycle methods are implemented, allowing OracleEPSGDefinitionDAO objects to be
notified when they pass out of constant use; at this point their PreparedStatements and connection
are closed.
We will need to make use of a single worker (and use it to satisfy multiple definitions) when implementing the find method.
By providing hints to tune the ObjectPool we can allow an application to:
- Ensure that fewer workers are in play than the number of Connections managed by the DataSource (so other Oracle modules do not starve)
- Emulate the current 20 min timeout behavior
- Arrive at a compromise for J2EE applications (where a worker can free its connection the moment it is no longer in constant use)
Fire and Forget
- Threads: Create a FactoryUsingOracleSQL as needed in a "Fire and Forget" manner
- Cache: Workers can keep a back pointer, and call the parent to check the cache
- Connection: Workers must limit connection use to just the duration of the executeStatement
crs-authority-alternative-proposals/RuntimeOverviewFireAndForget.png
Pros:
- Simple, minimal code
- Follows Java EE best practice
Cons:
- Not assured of a prepared statement cache ... so performance may be terrible in some configurations
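A minimal sketch of the fire-and-forget shape, under stated assumptions: `FireAndForgetSketch` and `BorrowedConnection` are hypothetical names, and the connection interface is a tiny stand-in for a JDBC connection borrowed from the container's DataSource so the example runs without a database.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical authority that creates a throwaway worker for every request. */
class FireAndForgetSketch {
    /** Hypothetical stand-in for a connection borrowed from the DataSource. */
    interface BorrowedConnection extends AutoCloseable {
        String execute(String code);

        @Override
        void close(); // narrowed: no checked exception in this sketch
    }

    private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();
    private final Supplier<BorrowedConnection> dataSource;

    FireAndForgetSketch(Supplier<BorrowedConnection> dataSource) {
        this.dataSource = dataSource;
    }

    Object create(String code) {
        return new Worker(this).create(code); // the worker is garbage after this call
    }

    private static final class Worker {
        private final FireAndForgetSketch parent; // back pointer used for the cache check

        Worker(FireAndForgetSketch parent) {
            this.parent = parent;
        }

        Object create(String code) {
            Object cached = parent.cache.get(code);
            if (cached != null) {
                return cached;
            }
            Object fresh = "Object[" + fetch(code) + "]"; // stands in for object construction
            Object raced = parent.cache.putIfAbsent(code, fresh);
            return raced != null ? raced : fresh;
        }

        /** The connection is held only for the duration of the statement. */
        private String fetch(String code) {
            try (BorrowedConnection connection = parent.dataSource.get()) {
                return connection.execute(code);
            }
        }
    }
}
```

Because nothing outlives the request, any prepared statement caching in this arrangement would have to come from the DataSource itself.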
Are the prepared statements worth it?
Make no mistake about it - the design would get a lot simpler, and would be normal Java EE best practice, if we were not holding on to our connections. We could throw away the workers the moment we had used them to handle one request.
- The prepared statements are worth it when we are creating a lot of objects - as with the find method
- They are supposed to be 2 to 3 times faster; this expectation is in line with our experience for these kinds of requests
However, simply using a prepared statement and throwing it away afterwards may allow us some of the performance gain.
We should perform some measurements before evaluating further.
Why are you doing this? Use the DataSource to cache prepared statements.
They can do that? Wow they can ...
- DBCP (i.e. Tomcat): http://jakarta.apache.org/commons/dbcp/configuration.html
- C3P0 (i.e. JBoss): http://www.hibernate.org/214.html
- OC4J: has instructions...
Well, this would be very cool. For the GeoTools out-of-the-box experience we could use DBCP and let it manage this problem, leaving our code simple. But our Java EE customers want to use the DataSource provided by the application container, so we are in a bit of a conflict here. Still, if we manage them ourselves in an ObjectPool, and the DataSource keeps them alive even longer, we get the best of both worlds.
Just do It
- Threads: Create a Collection of Workers and manage it ourselves, keeping track of the Thread so we recursively use the same worker
- Cache: Workers can keep a back pointer, and call the parent to check the cache
- Connection: Choose a connection policy strategy object based on a new Hint
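The thread tracking in the first bullet could look roughly like this. It is a sketch only: `ThreadBoundWorkersSketch` is a hypothetical name, the worker does no real work, and the connection policy strategy from the last bullet is not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical hand-managed collection of workers, one per calling thread. */
class ThreadBoundWorkersSketch {
    static final class Worker {
        final int id;
        int requests; // only touched by the owning thread

        Worker(int id) {
            this.id = id;
        }
    }

    private final ConcurrentMap<Thread, Worker> workers = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    /** Returns the worker bound to the calling thread, creating it on first use. */
    Worker current() {
        return workers.computeIfAbsent(
                Thread.currentThread(), t -> new Worker(nextId.incrementAndGet()));
    }

    /** Builds a nested description; each level of recursion goes back through current(). */
    String create(String code, int components) {
        Worker worker = current();
        worker.requests++;
        String inner = components > 0 ? create(code + "." + components, components - 1) : "";
        return "w" + worker.id + ":" + code + (inner.isEmpty() ? "" : "[" + inner + "]");
    }

    /** Must be called when the thread is done, otherwise the map keeps the thread alive. */
    void release() {
        workers.remove(Thread.currentThread());
    }
}
```

The explicit `release()` call hints at the "maximal code" cost of this option: the bookkeeping an ObjectPool would otherwise provide has to be written and maintained by hand.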