E3. DATA STANDARDS - colouring-cities/manual GitHub Wiki

Introduction

The CCRP looks to maximise trustworthiness, accuracy and suitability of data and to minimise malicious use. Collaborative work is undertaken across the CCRP on standardisation of data.


Data nomenclature

Data nomenclature concerns naming conventions across datasets, which are essential for standardising data and for facilitating data sharing and analysis across countries. For example, can a date be entered as 1938, 1930s, c1930 or early 20th century, or as just one of these? Is an address collected separately under building name, number, street, city and area code, or as a whole? Consistent nomenclature can also reduce errors, speed up searches, support segmentation and prevent duplication.
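To illustrate why a single date convention matters, the following is a minimal sketch of how the four date forms mentioned above could be reduced to one comparable representation, an inclusive range of years. The function name normalise_date, the five-year margin for "c" dates and the splitting of a century into thirds are all assumptions made for illustration; they are not CCRP rules and are not taken from the colouring-core repository.

```python
import re
from typing import Optional, Tuple

# Hypothetical margin applied to "circa" dates; an assumption, not a CCRP rule.
CIRCA_MARGIN = 5

# Hypothetical split of a century into thirds (offsets from the century start).
CENTURY_PARTS = {"early": (0, 33), "mid": (34, 66), "late": (67, 99), None: (0, 99)}


def normalise_date(text: str) -> Optional[Tuple[int, int]]:
    """Return an inclusive (earliest, latest) year range, or None if unrecognised."""
    s = text.strip().lower()

    # Exact year, e.g. "1938"
    m = re.fullmatch(r"(\d{4})", s)
    if m:
        year = int(m.group(1))
        return (year, year)

    # Decade, e.g. "1930s"
    m = re.fullmatch(r"(\d{3})0s", s)
    if m:
        start = int(m.group(1)) * 10
        return (start, start + 9)

    # Approximate year, e.g. "c1930" or "c. 1930"
    m = re.fullmatch(r"c\.?\s*(\d{4})", s)
    if m:
        year = int(m.group(1))
        return (year - CIRCA_MARGIN, year + CIRCA_MARGIN)

    # Century with an optional qualifier, e.g. "early 20th century".
    # For simplicity the 20th century is treated as 1900 to 1999.
    m = re.fullmatch(r"(?:(early|mid|late)\s+)?(\d{1,2})(?:st|nd|rd|th)\s+century", s)
    if m:
        start = (int(m.group(2)) - 1) * 100
        lo, hi = CENTURY_PARTS[m.group(1)]
        return (start + lo, start + hi)

    return None
```

Storing a year range rather than free text means that entries such as "1938" and "1930s" can be compared and searched together, at the cost of having to agree on how wide each vague form should be.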

The CCRP operates a two-tier system. Tier 1 comprises over 100 datasets which use naming conventions applicable to all countries and which are collected for cross-country analysis. Nomenclature for these is recorded in the CCRP colouring-core open source code repository. Wherever possible, existing naming conventions that are relevant to individual datasets and were specifically developed to drive cooperation and analysis across countries are identified and used, as with the Global Earthquake Model's construction system taxonomy.

Tier 2 consists of datasets that are specific to individual national platforms, code for which is located in country-level platform repositories. For example, under the Typology section, the 'Basic typology' dropdown offers options describing approximate height and density of the block, using naming conventions applicable to any country, whereas under 'Architectural style' the naming conventions will likely be relevant only to a single country and therefore cannot be compared across countries. (The collection of age data by year, under the 'Age and History' category, does however allow buildings from any architectural period and any country to be compared.) In the same way, under 'Land Use', Tier 1 collects around 10 classes of data covering activities known to occur in all countries, whereas Tier 2 incorporates one or more national land use classification systems, each with its own naming conventions. For example, a convenience store may also be called a cold store, party store, bodega, carry out, mini-market, corner shop, deli, milk bar, dairy, superette, dépanneur or dep.
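The relationship between the two tiers for land use can be sketched as a lookup from local (Tier 2) terms to a shared (Tier 1) class. This is only an illustration: the class label "retail" and the function name to_tier1 are hypothetical, since the actual Tier 1 class names are not listed on this page. The local terms are the convenience store examples given above.

```python
from typing import Optional

# Hypothetical Tier 1 class label; the real CCRP class names are not given here.
TIER1_RETAIL = "retail"

# Local (Tier 2) terms from the examples in the text, all describing
# a convenience store.
CONVENIENCE_STORE_TERMS = {
    "convenience store", "cold store", "party store", "bodega", "carry out",
    "mini-market", "corner shop", "deli", "milk bar", "dairy", "superette",
    "dépanneur", "dep",
}


def to_tier1(local_term: str) -> Optional[str]:
    """Map a national land use term to a shared Tier 1 class, if one is known."""
    # Lower-case and collapse whitespace so "Corner  Shop" matches "corner shop".
    key = " ".join(local_term.lower().split())
    if key in CONVENIENCE_STORE_TERMS:
        return TIER1_RETAIL
    return None
```

Under this approach each national platform keeps its own vocabulary, and cross-country comparison is done only on the shared class that the local terms map to.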


The Locus Charter

The Locus Charter provides ethical standards for data collection and use, developed by the American Geographical Society's EthicalGEO Initiative in collaboration with governments, organisations and individuals. The CCRP works towards these standards (please see also the CCRP Ethical frameworks for more detail):

  • Realise Opportunities
  • Understand Impacts
  • Do no harm
  • Protect the vulnerable
  • Address Bias
  • Minimise intrusion
  • Minimise data
  • Protect Privacy
  • Prevent identification of individuals
  • Provide accountability

See also the Open Data Institute's Data Ethics Canvas under the CCRP Ethical framework page.


The Open Data Charter

These are open data standards which the CCRP works towards:

  • Open by default
  • Timely and comprehensive
  • Accessible and usable
  • Comparable and interoperable
  • For improved governance and citizen engagement
  • For inclusive development and innovation

FAIR Guiding Principles for scientific data management and stewardship

These are scientific data standards which the CCRP works towards (see also how to go FAIR):

  • Findability
  • Accessibility
  • Interoperability
  • Reusability

National personal data protection regulations

These are personal data standards which the CCRP maintains and which differ in each country/world region. As an example, in the UK/Europe these fall under the General Data Protection Regulation (GDPR). The main data collected within the Colouring London prototype relevant to GDPR principles are user email addresses (required to reset passwords) and actual user names if users choose to provide them, though this is actively discouraged. GDPR principles of user identification are also applied by the CCRP to building owners and to the interior of people's homes. Great care is needed when handling domestic data, which will likely relate, as it does in the UK, to over 90% of all buildings. GDPR principles relate to:

  • Lawfulness
  • Fairness
  • Purpose limitation
  • Data minimisation
  • Accuracy
  • Storage limitation
  • Integrity
  • Confidentiality (security)
  • Accountability

Country level, Government Data Quality Dimensions

This is an example of national government data quality dimensions/characteristics (against which data quality is measured), as adhered to in the UK. (CCRP partners to insert appropriate national dimensions/standards here.)

  • Accuracy: when data reflects reality e.g. correct names/addresses that are up-to-date;
  • Completeness: when all data for a particular use is present and able to be used;
  • Uniqueness: when data appears only once in a dataset and duplication is avoided;
  • Consistency: when data values do not conflict with other values within a record or across datasets;
  • Timeliness: the time between when information is expected and when it is readily available for use;
  • Validity: the extent to which the data conforms to the expected format, type and range e.g. email addresses having an @ symbol.
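Three of the dimensions above (completeness, uniqueness and validity) lend themselves to simple automated checks. The following is a minimal sketch only, assuming records are held as plain dictionaries. The function names and the field names building_id and email are hypothetical, and the email test is the deliberately simple "contains an @ symbol" rule from the list above rather than a full address validator.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]


def completeness(records: List[Record], required: List[str]) -> float:
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 1.0
    complete = sum(
        all((r.get(field) or "").strip() for field in required) for r in records
    )
    return complete / len(records)


def duplicates(records: List[Record], key: str) -> List[str]:
    """Values of `key` that appear in more than one record (uniqueness check)."""
    counts = Counter(r[key] for r in records if key in r)
    return sorted(value for value, n in counts.items() if n > 1)


def invalid_emails(records: List[Record]) -> List[str]:
    """Email values that lack an @ symbol (validity check)."""
    return [r["email"] for r in records if "email" in r and "@" not in r["email"]]
```

Accuracy, consistency and timeliness generally need reference data or timestamps to assess, so they are not covered by this sketch.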

Please see also the CCRP Ethical framework for other ethical frameworks referred to in relation to CCRP development, mission and governance.