Background - akvo/akvo-core-services GitHub Wiki

The Akvo tools (FLOW, RSR, OpenAid and Akvopedia) can all be characterised as data systems: systems that answer questions based on information acquired in the past. All tools share these common tasks:

  • Capture - devices, web interfaces, external sources (IATI)
  • Store - persist the captured data
  • Analyse - aggregate data by properties, trends
  • Deliver - dashboards, widgets, websites, reports, other export formats

Data systems

This is how we understand data, queries, and views:

  • Data - information that can't be derived from anything else. Data serves as the axioms from which everything else is derived.
  • Queries - questions you ask of your data. For example, you query your financial transaction history to determine your current bank account balance.
  • Views - information that has been derived from your base data. Views are built to assist with answering specific types of queries.
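The bank account example can be sketched in a few lines of Python. This is only an illustration of the three terms, not code from any Akvo tool: the `Transaction` type, the account names and the function names are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One piece of base data: it cannot be derived from anything else."""
    account: str
    amount: int  # in cents; positive = deposit, negative = withdrawal


# Data: the raw transaction history (hypothetical sample values).
transactions = [
    Transaction("alice", 10_000),
    Transaction("alice", -2_500),
    Transaction("bob", 4_000),
]


def build_balance_view(data):
    """View: balances per account, derived entirely from the base data."""
    balances = {}
    for tx in data:
        balances[tx.account] = balances.get(tx.account, 0) + tx.amount
    return balances


def query_balance(view, account):
    """Query: 'what is the current balance of this account?'"""
    return view.get(account, 0)


balance_view = build_balance_view(transactions)
print(query_balance(balance_view, "alice"))
```

The view exists only to make the balance query cheap. It can be thrown away and rebuilt from the transactions at any time, whereas the transactions themselves cannot be rebuilt from the balances.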

Each data system should manage the storage and querying of data with a lifetime measured in years, a lifetime that encompasses:

  • every version of the application ever to exist,
  • every hardware failure,
  • and every human mistake ever made.

To be able to do this, the desired properties of the data system are:

  • Robust and human fault-tolerant - immutable data, possibility of recomputation
  • Low latency reads and updates - people expect changes to data to propagate immediately
  • Scalable
  • General - compute arbitrary views on datasets
  • Extensible - adding a new view should be easy
  • Allows ad hoc queries
  • Minimal maintenance
  • Debuggable
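As a rough illustration of how immutable data and recomputation support several of these properties, the sketch below keeps facts in an append-only log and derives every view with a plain function. The `FactLog` class, the fact fields (`kind`, `project`) and the view functions are hypothetical names chosen for the example; they do not describe an existing Akvo service.

```python
from collections import Counter


class FactLog:
    """Append-only store: facts can be added, never changed or removed."""

    def __init__(self):
        self._facts = []

    def append(self, fact):
        # Store a copy so later changes by the caller cannot alter the log.
        self._facts.append(dict(fact))

    def facts(self):
        # Hand out copies so readers cannot alter the log either.
        return tuple(dict(f) for f in self._facts)


def updates_per_project(facts):
    """A view: number of 'update' facts per project."""
    return Counter(f["project"] for f in facts if f["kind"] == "update")


def facts_per_kind(facts):
    """A second view; adding it needed no change to the stored data."""
    return Counter(f["kind"] for f in facts)


log = FactLog()
log.append({"kind": "update", "project": "p1"})
log.append({"kind": "update", "project": "p1"})
log.append({"kind": "update", "project": "p2"})
log.append({"kind": "survey", "project": "p2"})

# Views are recomputed from the full log; a buggy view function can be
# fixed and simply run again, because the underlying facts never changed.
views = {
    "updates_per_project": updates_per_project(log.facts()),
    "facts_per_kind": facts_per_kind(log.facts()),
}
print(views)
```

Recomputing every view from scratch is the simplest possible strategy and would not by itself give low latency reads on larger datasets; it is shown here only to make the relationship between immutable facts and derived views concrete.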

Common goals

Common goals to achieve this include:

  • don't reinvent the wheel; use existing tech where available
  • reusable core services that can be shared
  • favour simplicity & composability
  • find tech that can reduce sys admin needs

Current size of data

These are the typical sizes of data in the various systems:

  • OpenAid: 70,000 projects = ~700,000 transactions
  • RSR: 1,500 projects, 1,000 orgs, 3,000 updates
  • FLOW: 250,000 surveyInstances = ~5,000,000 facts

Audit and versioning

The data systems we create need to support auditing and versioning for these reasons:

  • Access historic data
  • Correct human errors
  • Undo actions
  • Validity of data

Examples of systems that use versioning: Transifex, WordPress, Google Docs.
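One possible way to meet these requirements is to store every change as a new version instead of overwriting the old value. The sketch below is a minimal illustration under that assumption; the `Version` and `VersionedField` names, and the sample values, are hypothetical and not taken from any of the systems mentioned.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    """One immutable entry in the audit trail of a field."""
    number: int
    value: str
    author: str
    timestamp: datetime


class VersionedField:
    """Keeps every value a field has ever had, together with who set it."""

    def __init__(self):
        self._versions = []

    def set(self, value, author):
        version = Version(
            number=len(self._versions) + 1,
            value=value,
            author=author,
            timestamp=datetime.now(timezone.utc),
        )
        self._versions.append(version)
        return version

    def current(self):
        return self._versions[-1].value if self._versions else None

    def history(self):
        # Access to historic data: all versions, oldest first.
        return tuple(self._versions)

    def undo(self, author):
        # Undo is recorded as a new version that restores the previous
        # value, so the mistaken version stays visible in the history.
        if len(self._versions) < 2:
            raise ValueError("nothing to undo")
        return self.set(self._versions[-2].value, author)


title = VersionedField()
title.set("Water point survey", "alice")
title.set("Water pint survey", "bob")  # a human error
title.undo("alice")                    # corrected without losing the trail
print(title.current(), len(title.history()))
```

Because `undo` appends instead of deleting, the audit trail shows both the mistake and its correction. In this simple design, calling `undo` twice in a row restores the value that was just undone; a real system would probably need a more careful undo model.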