Background - akvo/akvo-core-services GitHub Wiki
The Akvo tools (FLOW, RSR, OpenAid and Akvopedia) can all be characterised as data systems: systems that answer questions based on information acquired in the past. All tools share these common tasks:
- Capture - devices, web interfaces, external sources (IATI)
- Store - persist the captured data
- Analyse - aggregate data by properties and identify trends
- Deliver - dashboards, widgets, websites, reports, other export formats
Data systems
This is how we understand data, queries, and views:
- Data - information that cannot be derived from anything else. Data serves as the axioms from which everything else is derived.
- Queries - questions you ask of your data. For example, you query your financial transaction history to determine your current bank account balance.
- Views - information derived from your base data. Views are built to help answer specific types of queries.
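The bank-balance example can be sketched to show the difference between answering a query directly from the data and answering it from a precomputed view. This is an illustration only: the `Transaction` type, the functions and the account names are hypothetical, and amounts are kept in cents to avoid floating-point rounding.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Transaction:
    """Hypothetical base data: one immutable entry in the transaction history."""
    account: str
    amount: int  # in cents; positive = deposit, negative = withdrawal


def balance_query(history: Iterable[Transaction], account: str) -> int:
    """Query: answer the question by scanning the raw data every time."""
    return sum(t.amount for t in history if t.account == account)


def build_balance_view(history: Iterable[Transaction]) -> Dict[str, int]:
    """View: derive every balance once, so later queries become lookups."""
    view: Dict[str, int] = {}
    for t in history:
        view[t.account] = view.get(t.account, 0) + t.amount
    return view


history = [
    Transaction("alice", 10_000),
    Transaction("bob", 5_000),
    Transaction("alice", -2_500),
]

print(balance_query(history, "alice"))     # 7500
balances = build_balance_view(history)
print(balances["alice"], balances["bob"])  # 7500 5000
```

The query rescans all the data each time it is asked; the view does that work up front, at the cost of being rebuilt or updated when new data arrives.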
Each data system should manage the storage and querying of data over a lifetime measured in years, surviving:
- every version of the applications that will ever exist,
- every hardware failure,
- every human mistake ever made.
To be able to do this, the desired properties of the data system are:
- Robust and human fault-tolerant - data is immutable and views can be recomputed
- Low-latency reads and updates - people expect changes to data to propagate immediately
- Scalable - keeps performing as data volume and load grow
- General - compute arbitrary views on datasets
- Extensible - adding a new view should be easy
- Allows ad hoc queries
- Minimal maintenance
- Debuggable
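To illustrate why immutable data plus recomputation gives human fault tolerance, here is a minimal sketch under stated assumptions: the `Fact` record, the survey values and both view functions are hypothetical. A bug in the first version of the view logic double-counts a duplicated submission; because the raw facts were never modified, fixing the code and recomputing the view repairs the result.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Fact:
    """Hypothetical immutable fact: one answer recorded in a survey."""
    survey_id: str
    question: str
    value: int


# Master dataset: append-only; existing facts are never changed in place.
master: Tuple[Fact, ...] = (
    Fact("s1", "households", 12),
    Fact("s2", "households", 8),
    Fact("s2", "households", 8),  # duplicate submission, e.g. a device retry
)


def households_view_v1(facts: Iterable[Fact]) -> int:
    """Buggy view logic: counts the duplicate submission twice."""
    return sum(f.value for f in facts if f.question == "households")


def households_view_v2(facts: Iterable[Fact]) -> int:
    """Fixed view logic: keeps one (the last recorded) value per (survey_id, question)."""
    last = {(f.survey_id, f.question): f.value for f in facts}
    return sum(v for (_, q), v in last.items() if q == "households")


print(households_view_v1(master))  # 28 (wrong)
print(households_view_v2(master))  # 20, recomputed from the same raw data
```

Had the buggy code overwritten the stored values instead of only producing a view, the correct answer could not be recovered.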
Common goals
Common goals to achieve this include:
- avoid reinventing the wheel: use existing technology where available
- build reusable core services that can be shared across the tools
- favour simplicity and composability
- choose technology that reduces system administration needs
Current size of data
These are the typical data sizes in the various systems:
- OpenAid: 70,000 projects, ~700,000 transactions
- RSR: 1,500 projects, 1,000 organisations, 3,000 updates
- FLOW: 250,000 surveyInstances, ~5,000,000 facts
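The figures above imply rough average record counts per top-level unit. A small sketch of that arithmetic (the helper `records_per_unit` is hypothetical, and the inputs are the approximate numbers from the list):

```python
def records_per_unit(units: int, records: int) -> float:
    """Average number of records per top-level unit (e.g. transactions per project)."""
    return records / units


# Approximate figures from the list above: (units, records)
sizes = {
    "OpenAid (transactions per project)": (70_000, 700_000),
    "FLOW (facts per surveyInstance)": (250_000, 5_000_000),
}

for label, (units, records) in sizes.items():
    print(f"{label}: ~{records_per_unit(units, records):.0f}")
```

This gives roughly 10 transactions per OpenAid project and 20 facts per FLOW survey instance, with both datasets in the order of millions of records.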
Audit and versioning
The data systems we create need to support auditing and versioning for these reasons:
- Access historic data
- Correct human errors
- Undo actions
- Verify the validity of data
Examples of systems that use versioning are Transifex, WordPress and Google Docs.
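A minimal sketch of how an append-only store could cover the audit and versioning needs listed above: every write is kept as a revision with its author, historic values stay readable, and an undo is itself a new write. The `VersionedStore` class, its methods and the key `project/42/budget` are hypothetical, not an existing Akvo API.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Revision:
    """One immutable entry in the audit log (hypothetical schema)."""
    key: str
    value: Any
    author: str


class VersionedStore:
    """Append-only store: writes add revisions, nothing is overwritten."""

    def __init__(self) -> None:
        self.log: List[Revision] = []

    def put(self, key: str, value: Any, author: str) -> int:
        """Append a revision and return its version number."""
        self.log.append(Revision(key, value, author))
        return len(self.log) - 1

    def get(self, key: str, as_of: Optional[int] = None) -> Any:
        """Return the value of `key` at version `as_of` (latest if None)."""
        end = len(self.log) if as_of is None else as_of + 1
        for rev in reversed(self.log[:end]):
            if rev.key == key:
                return rev.value
        raise KeyError(key)

    def history(self, key: str) -> List[Revision]:
        """Audit trail: every revision ever written for `key`, oldest first."""
        return [rev for rev in self.log if rev.key == key]

    def undo(self, version: int, author: str) -> int:
        """Undo a revision by appending the value it replaced (a compensating write)."""
        key = self.log[version].key
        previous = self.get(key, as_of=version - 1)  # KeyError if there was no earlier value
        return self.put(key, previous, author)


store = VersionedStore()
v0 = store.put("project/42/budget", 1000, author="alice")
v1 = store.put("project/42/budget", 10000, author="bob")  # mistaken extra zero
store.undo(v1, author="alice")                            # restores 1000

print(store.get("project/42/budget"))            # 1000
print(store.get("project/42/budget", as_of=v1))  # 10000 (historic value)
print([r.author for r in store.history("project/42/budget")])  # ['alice', 'bob', 'alice']
```

Because undo appends a compensating revision instead of deleting one, the history records both the mistake and its correction, which is what auditing needs.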