Concepts Database - UBOdin/mimir GitHub Wiki

Querying the Database

Adding features to Mimir will invariably lead to the need to query the database. In general, all queries should go through db.query(...), which takes either a SQL Query string or an operator tree. Operator trees are preferable, as SQL parsing is expensive and can lead to errors like SQL injection attacks.

For example, to compute a join of R and S on R.B = S.B, you might write:

db.query(
  Select(
    Comparison(Cmp.Eq, Var("R_B"), Var("S_B")),
    Join(
      db.getTableOperator("R"),
      db.getTableOperator("S")
    )
  )
) { results =>
  // .. do something with results
}

We're working on a DSL to make writing these queries a little easier.

Results and Caching

The results object will be valid only within the block passed to db.query(). You can return results from it, but any subsequent calls on results will fail with the error:

   Attempting to read from closed iterator.  This is probably because you're trying to return 
   an unmaterialized iterator from Database.query

TLDR: If you see this happening, check to see if you're returning any collections outside of a query block and use toIndexedSeq to create a materialized version of the collection before it leaves the query block.

Why This Is Happening: JDBC ResultSets need to be closed in order to release the associated resources (Garbage Collection won't do that). We've had too many bugs stemming from a ResultSet not being closed correctly, so by fiat Database.query will do this for you now.

Unfortunately, Scala evaluates iterator operations lazily. Any of the following methods will not actually do any computation (at least not immediately):

  • toIterator
  • map
  • filter
  • toSeq In other words, these methods eventually modify their inputs, but not until they actually have to. Values will not be read out of the ResultSet until they're actually needed. As soon as you leave the query block, the ResultSet is closed, and it's no longer possible to read from it. Even toSeq only gives you a lazily populated sequence object. The most straightforward way to guarantee a materialized result is to use toIndexedSeq, which produces a fully materialized representation of everything left in the iterator.

Views

Because of the need to support Virtual C-Tables (discussed below), Mimir completely subverts the underlying database's view functionality. In later revisions it would be nice to mirror Mimir views back onto the backend database, but for now that's not feasible.

The mimir.views.ViewManager class (aka db.views) is responsible for creating, destroying, and persisting views and associated metadata. Views are persisted in the backend database in a table called MIMIR_VIEWS, and the view definition is stored as an Operator AST serialized and encoded in Base64. We tried using toString and OperatorParser in the past, but these are incredibly slow.