SeeDB Outline - aerohead/streaming-analytics GitHub Wiki

SeeDB applies to a database D with a snowflake schema. The primary table is a "fact" table that holds a collection of IDs. Each ID is a foreign key that references the primary key of a "dimension" table, with one dimension table per type of ID.

Parameters

Dimension attributes, A: attributes to group by
Measure attributes, M: attributes to aggregate
Aggregate functions, F: functions applied to M, such as count(), sum(), avg()

SeeDB groups D by a dimension attribute a in A and aggregates a measure attribute m in M with a function f in F. The output is a two-column table that can be visualized as a bar chart or line plot. To date, SeeDB handles only one aggregation at a time, but it could be extended to multiple aggregations, which would produce an n-column output table.
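The group-by and aggregate step can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not SeeDB's implementation: the table `sales`, its columns `region` and `amount`, the sample rows, and the helper `view` are all hypothetical.

```python
import sqlite3

# Hypothetical data: one dimension attribute (region) and one measure (amount).
rows = [("east", 10.0), ("west", 5.0), ("east", 20.0), ("west", 15.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


def view(conn, a, m, f):
    """Group the hypothetical sales table by a and aggregate m with f.

    Returns a two-column table as a list of (group, aggregate) tuples.
    Identifiers are interpolated directly into the SQL text, which is
    acceptable for a sketch but should be validated in real code.
    """
    sql = f"SELECT {a}, {f}({m}) FROM sales GROUP BY {a} ORDER BY {a}"
    return conn.execute(sql).fetchall()


result = view(conn, "region", "amount", "sum")
print(result)
```

Each tuple in `result` corresponds to one bar in a bar chart: the first element is the group label and the second is the aggregated value.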

SeeDB assumes that the user provides a query Q, which is a select-project-join query on the database. SELECT keeps a subset of the rows in the fact table, PROJECT keeps a subset of its columns, and JOIN appends the remaining dimension tables. The result is a more focused subset of the data, called D_Q, for which SeeDB recommends visualizations with high "utility".
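As a rough sketch of how a query Q produces D_Q, the example below builds a tiny schema in sqlite3 and materializes D_Q as a view. The tables `fact`, `store`, and `product`, their columns, the sample rows, and the filter on `amount` are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (store_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact (store_id INTEGER, product_id INTEGER, amount REAL);
INSERT INTO store VALUES (1, 'east'), (2, 'west');
INSERT INTO product VALUES (10, 'toys'), (11, 'books');
INSERT INTO fact VALUES (1, 10, 4.0), (1, 11, 7.0), (2, 10, 9.0);
""")

# Q as a select-project-join query:
#   SELECT  -> the WHERE clause keeps a subset of fact rows
#   PROJECT -> only the amount column of the fact table is kept
#   JOIN    -> the dimension tables are appended through the ID columns
conn.execute("""
CREATE TEMP VIEW d_q AS
SELECT s.region, p.category, f.amount
FROM fact AS f
JOIN store AS s ON s.store_id = f.store_id
JOIN product AS p ON p.product_id = f.product_id
WHERE f.amount >= 5.0
""")

d_q = conn.execute("SELECT * FROM d_q ORDER BY region").fetchall()
print(d_q)
```

Keeping D_Q as a view means later group-by queries can run against `d_q` without repeating the join and filter.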

Each visualization is an "aggregate over group-by" query, determined by the aggregation (a measure m in M and a function f in F) and the grouping (a dimension attribute a in A). The domains of these parameters define the search space over which SeeDB optimizes the utility function.
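The search space can be enumerated as the Cartesian product of A, M, and F, giving |A| x |M| x |F| candidate views. The sketch below only generates the candidate queries; it does not score them, since the utility function is not defined here. The attribute names, the table name `d_q`, and the helper `candidate_views` are hypothetical.

```python
from itertools import product

# Hypothetical parameter domains.
A = ["region", "category"]      # dimension attributes
M = ["amount", "quantity"]      # measure attributes
F = ["count", "sum", "avg"]     # aggregate functions


def candidate_views(A, M, F, table="d_q"):
    """Yield ((a, m, f), sql) for every combination in A x M x F."""
    for a, m, f in product(A, M, F):
        yield (a, m, f), f"SELECT {a}, {f}({m}) FROM {table} GROUP BY {a}"


views = list(candidate_views(A, M, F))
print(len(views))
for key, sql in views[:3]:
    print(key, sql)
```

Because the number of candidates grows multiplicatively with the size of each domain, any practical search would need to score or prune these candidates rather than render all of them.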