Background - akvo/akvo-core-services GitHub Wiki

The Akvo tools (FLOW, RSR, OpenAid and Akvopedia) can all be characterised as data systems: systems that answer questions based on information that was acquired in the past. All tools share these common tasks:

Capture - devices, web interfaces, external sources (IATI)
Store - store data
Analyse - aggregate data by properties, trends
Deliver - dashboards, widgets, websites, reports, other export formats

Data systems

This is how we understand data, queries, and views:

Data - is the information that can't be derived from anything else. Data serves as the axioms from which everything else derives.
Queries - are questions you ask of your data. For example, you query your financial transaction history to determine your current bank account balance.
Views - are information that has been derived from your base data. They are built to assist with answering specific types of queries.
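
As a minimal sketch of how the three terms relate, the bank account example above could look like this in Python. The `Transaction` type, the sample amounts, and the function names are hypothetical and chosen only for illustration; they are not taken from any Akvo codebase.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One immutable fact: money moved into (+) or out of (-) an account."""
    account: str
    amount: int  # hypothetical unit: cents


# Data: the base facts, which cannot be derived from anything else.
DATA = (
    Transaction("alice", 10_000),
    Transaction("alice", -2_500),
    Transaction("bob", 4_000),
    Transaction("alice", 500),
)


def build_balance_view(transactions):
    """View: balance per account, derived entirely from the base data."""
    balances = defaultdict(int)
    for tx in transactions:
        balances[tx.account] += tx.amount
    return dict(balances)


def query_balance(view, account):
    """Query: 'what is the current balance of this account?'"""
    return view.get(account, 0)


BALANCE_VIEW = build_balance_view(DATA)
print(query_balance(BALANCE_VIEW, "alice"))  # 8000
```

The view exists only to make one type of query cheap. It can be thrown away and rebuilt from the data at any time.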

Each data system should manage:

the storage and querying of data with a lifetime measured in years,
encompassing every version of the applications to ever exist,
every hardware failure,
and every human mistake ever made.

To be able to do this, the desired properties of the data system are:

Robust and human fault-tolerant - immutable data, possibility of recomputation
Low latency reads and updates - people expect changes to data to propagate immediately
Scalable
General - compute arbitrary views on datasets
Extensible - adding a new view should be easy
Allows ad hoc queries
Minimal maintenance
Debuggable
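
The first and fifth properties can be illustrated with a small sketch, assuming an append-only store and views that are plain functions over it. `FactLog` and both view functions are hypothetical names used for illustration only. Because the stored facts are never modified, a view built with a mistake can be corrected by fixing the function and recomputing, and a new view is just another function over the same facts.

```python
class FactLog:
    """Append-only store: facts can be added but never changed or removed."""

    def __init__(self):
        self._facts = []

    def append(self, fact):
        self._facts.append(fact)

    def facts(self):
        # Return an immutable snapshot so callers cannot alter the log.
        return tuple(self._facts)


def projects_per_country_buggy(facts):
    """View with a human mistake: country codes are compared case-sensitively."""
    counts = {}
    for _project_id, country in facts:
        counts[country] = counts.get(country, 0) + 1
    return counts


def projects_per_country(facts):
    """Corrected view: country codes are normalised before counting."""
    counts = {}
    for _project_id, country in facts:
        key = country.upper()
        counts[key] = counts.get(key, 0) + 1
    return counts


log = FactLog()
log.append(("p1", "KE"))
log.append(("p2", "ke"))
log.append(("p3", "NL"))

wrong_view = projects_per_country_buggy(log.facts())
# The base data is untouched, so the view can simply be recomputed.
fixed_view = projects_per_country(log.facts())
print(wrong_view)
print(fixed_view)
```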

Common goals

Common goals to achieve this include:

don't reinvent the wheel; use existing tech where available
reusable core services that can be shared
favour simplicity & composability
find tech that can reduce sys admin needs

Current size of data

These are the typical sizes of data in the various systems:

OpenAid: 70.000 projects = ~700.000 transactions
RSR: 1500 projects, 1000 orgs, 3000 updates
FLOW: 250.000 surveyInstances = ~5.000.000 facts
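
The ratios implied by these figures can be worked out with a short calculation. This sketch assumes the "." in the numbers above is a thousands separator; the variable names are illustrative.

```python
# Figures copied from the list above; "." is read as a thousands separator.
openaid_projects = 70_000
openaid_transactions = 700_000
flow_survey_instances = 250_000
flow_facts = 5_000_000

transactions_per_project = openaid_transactions / openaid_projects
facts_per_survey_instance = flow_facts / flow_survey_instances

print(transactions_per_project)   # 10.0
print(facts_per_survey_instance)  # 20.0
```

So the approximate figures correspond to about 10 transactions per OpenAid project and about 20 facts per FLOW survey instance.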

Audit and versioning

The data systems we create need to support auditing and versioning for these reasons:

Access historic data
Correct human errors
Undo actions
Validity of data

Examples of systems using versioning are: Transifex, WordPress, Google Docs
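
One way to meet the requirements listed above is to store every change as a new version and never overwrite an old one. The sketch below is a minimal illustration of that idea, not a description of how any of the Akvo tools or the systems named above implement versioning. `Version` and `VersionedRecord` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One immutable entry in the audit trail of a record."""
    number: int
    value: str
    author: str


class VersionedRecord:
    """Keeps every version of a value; edits and undos only append."""

    def __init__(self):
        self._versions = []

    def save(self, value, author):
        version = Version(len(self._versions) + 1, value, author)
        self._versions.append(version)
        return version

    def current(self):
        return self._versions[-1] if self._versions else None

    def at(self, number):
        """Access historic data by version number (1-based)."""
        if not 1 <= number <= len(self._versions):
            raise IndexError(f"no version {number}")
        return self._versions[number - 1]

    def history(self):
        """Audit trail: who saved what, in order."""
        return tuple(self._versions)

    def revert_to(self, number, author):
        """Undo by saving an old value as a new version; nothing is deleted."""
        return self.save(self.at(number).value, author)


record = VersionedRecord()
record.save("Water point repaired", "anna")
record.save("Water point repared", "ben")  # human error: typo
record.revert_to(1, "anna")                # correct it without losing history

print(record.current().value)
print(len(record.history()))
```

Because reverting appends a version instead of deleting one, the mistaken edit stays visible in the history, which supports both auditing and later checks on the validity of the data.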