Home - accandme/openplum GitHub Wiki

Distributed relational databases are appealing when we want to store large data sets, since data is partitioned across several nodes that can each manage its own, smaller share of it. However, this approach comes with its own challenges. With partitioned data, it is difficult to know how, given an arbitrary query, it must be executed on the different nodes in order to produce the correct result. It is not possible to simply execute the query on each node individually and combine the results, since the query might be calling for data that is available on some nodes but not on the others. Aggregation and nested queries provide further challenges. Because of all this, careful planning and intelligent pre-shuffling of the data is required. It is not surprising then that there are no open-source relational distributed database systems capable of executing SQL queries with aggregates, as of the time of writing this report.

This project is an attempt at building such a system.