Data Model - mark222/SSDir GitHub Wiki

Data Model

The first, and to some perhaps the most surprising, architectural feature is the lack of a database. SSDir has no underlying SQL database for the directory data. The original design did in fact take a traditional approach with a series of tables, foreign keys, and a full set of mapping classes to read and write the tables to/from Plain Old Java Objects (POJOs). As the design quickly evolved, all the usual O/R mapping problems reared up, including the need for intermediate key tables for N:N relationships, the difficulty of changing the data model, and the deployment complexity of the database software itself. The Hibernate ORM tool offered some relief, but brought complexities of its own.

After a bit of reflection, it became clear that an SQL database was vast overkill for the application. The primary thing that SQL brings is very efficient searching (queries). But the directory has only a few, very trivial queries that are actually required to implement the user functions. No complex joins or elaborate subsetting is needed. The Java code to implement the required search functions is trivial. Storing data in POJOs and persisting it directly to disk resulted in a huge reduction in the complexity of the application.
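To illustrate how little code such a search needs, here is a minimal sketch of a last-name prefix lookup over in-memory POJOs. The `Member` and `MemberDirectory` classes, their fields, and the `findByLastNamePrefix` method are hypothetical names chosen for this example; they are not taken from the SSDir source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical POJO for illustration; the real SSDir model classes may differ.
class Member {
    private final String firstName;
    private final String lastName;

    Member(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
}

// Hypothetical in-memory directory that keeps every member in a plain list.
class MemberDirectory {
    private final List<Member> members = new ArrayList<>();

    void add(Member m) { members.add(m); }

    // Case-insensitive prefix match on last name, done as a simple linear scan.
    List<Member> findByLastNamePrefix(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<Member> result = new ArrayList<>();
        for (Member m : members) {
            if (m.getLastName().toLowerCase(Locale.ROOT).startsWith(p)) {
                result.add(m);
            }
        }
        return result;
    }
}
```

A linear scan like this is usually adequate when the whole data set is a few thousand objects already held in memory, which is the situation described here.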

A second realization was that the amount of data to be managed and searched is very small by today's computing standards. The entire directory for a 5,000-member church takes up less than 100MB of Java object memory (excluding the photos). A major reduction in complexity is achieved by using an in-memory data model implemented as a set of POJOs. Client calls are serviced extremely fast because no disk I/O is required for any directory data except photos. It is useful to realize that 99% of directory access is read-only (updates are rare), so a model optimized for reading and searching works well. The use of POJOs for the data model allows rich, full-featured Java objects that manage the data and implement useful functions directly in the data model.

Concurrency

Since the directory does support online updates, the model must be safe and robust for updates from concurrent users. Each user executes as a thread in the web server. Making each object in the model thread-safe is difficult to do properly and can result in complex locking (and deadlock) scenarios that are hard to predict. Instead, a holistic approach is taken whereby a thread servicing a user request declares its intent to READ or WRITE the model before it accesses the model in any way. A model manager (the Dir object) manages these thread-based access requests in a multi-reader, single-writer style. Since 99% of all requests are read-only, almost all requests can safely execute on the model concurrently. When a thread requests write access, it is blocked until there are no current readers, and then is given exclusive access to the model. When that thread completes its model updates, the updated model is marked for persistence to disk (see Persistence), and then the model is made available to any waiting reader threads.

The model concurrency is managed by the Dir.lockForRead() and Dir.lockForWrite() methods.
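The multi-reader, single-writer scheme described above could be sketched with the standard `java.util.concurrent.locks.ReentrantReadWriteLock`. This is only an illustration under assumptions: the method names `lockForRead()` and `lockForWrite()` come from this page, but the class name `DirSketch`, the `unlockRead()` and `unlockWrite()` methods, the `dirty` flag, and the choice of `ReentrantReadWriteLock` are all hypothetical and may not match how Dir is actually implemented.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: a stand-in for the Dir model manager, not the SSDir source.
class DirSketch {
    // Fair mode: a reader arriving while a writer is waiting queues behind it,
    // so a steady stream of readers cannot starve the writer.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    // Hypothetical flag standing in for "marked for persistence to disk".
    private volatile boolean dirty = false;

    // Many threads may hold the read lock at the same time.
    void lockForRead() { lock.readLock().lock(); }

    // Hypothetical release method; the page does not name one.
    void unlockRead() { lock.readLock().unlock(); }

    // Blocks until no readers (and no other writer) hold the lock.
    void lockForWrite() { lock.writeLock().lock(); }

    // Hypothetical release method: flag the model as changed, then let readers in.
    void unlockWrite() {
        dirty = true;
        lock.writeLock().unlock();
    }

    boolean isDirty() { return dirty; }
}

// Typical call pattern: declare intent, touch the model, always release.
class DirSketchDemo {
    static String readName(DirSketch dir, String[] model) {
        dir.lockForRead();
        try {
            return model[0];
        } finally {
            dir.unlockRead();
        }
    }

    static void rename(DirSketch dir, String[] model, String newName) {
        dir.lockForWrite();
        try {
            model[0] = newName;
        } finally {
            dir.unlockWrite();
        }
    }
}
```

Releasing the lock in a `finally` block matters here: a request thread that fails without releasing a read lock would block every later writer indefinitely.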