D. RESEARCH GOALS & METHODS

In the process of being edited. Editor: Alan Turing Institute (P. Hudson).

Summary of Research Goals

The Colouring Cities Research Programme (CCRP) is an academic research programme run at the Alan Turing Institute (UK), designed to support the development of open data platforms about the building stock, and to promote and advance reproducible, ethical and collaborative data science. The key research question posed by the programme is as follows:

Can open-source platforms providing open data on the composition, performance and dynamics of building stocks, able to test diverse methods of geospatial data capture and visualisation and actively engage built environment stakeholders in their development and maintenance, be cost-effectively set up and managed by research institutions, to effect a step-change in the amount and quality of data on stocks available for integration, synthesis, dissemination, and analysis, necessary to support their sustainable and resilient development at global level?

This section summarises the research stages relating to the design, construction, reproduction and evaluation of Colouring Cities platforms. These stages relate, firstly, to R&D carried out on the Colouring London prototype between 2015 and 2019 at University College London; secondly, to the validation of open platform code by international academic institutions; and thirdly, to the setting up, in collaboration with international partners, of the Colouring Cities Research Programme. Information on planned development steps for 2023/4 is also recorded in this section.

The CCRP's main research goals can be summarised as follows:

  • To produce permanent platforms that:
    • Release open geospatial data on the building stock, of a type, format and quality able to support research into the improvement of the sustainability, resilience, quality and efficiency of the building stock and the United Nations' New Urban Agenda;
    • Release open data able to improve understanding of the stock as a dynamic system - its composition, performance and quality, and its (past, present and predicted) dynamic behaviour;
    • Release data of a scale and quality able to support the use of AI and machine learning approaches in data analysis, simulation and forecasting;
    • Maximise accessibility for all stakeholders;
    • Increase opportunities for open knowledge sharing about stocks, and for collaborative research between diverse stakeholders at national and international level;
    • Increase opportunities for integrating diverse geospatial data capture methods and testing feedback loops between these;
    • Promote openness, inclusivity and trustworthiness in data science, and drive up ethical standards relating to building attribute data and its application;
    • Create open-source code able to be easily reproduced across countries to support the efficient sharing of data, skills and knowledge in support of global sustainability goals;
    • Are sufficiently adaptable to provide open national databases which, combined, provide data on the global stock, able to be built on in perpetuity.

Overview of Design Stages

CCRP design stages are as follows. Please note that these are not linear:

Completed/In progress

  • Design Stage 1: Colouring London prototype: conceptualisation, assessment of need, and identification of required data types (2015-18);

  • Design Stage 2: Colouring London prototype design, funding applications, platform construction & building footprint access and integration, launch and open code release (2015-18);

  • Design Stage 3: Live prototype testing and experimentation with data capture and verification methods (2015 to current);

  • Design Stage 4: Assessment of platform value/forking and validation of prototype open code by international academic partners (2019 to current);

  • Design Stage 5: Set-up of the Colouring Cities Research Programme to allow co-working on building attribute data capture, collation, release and analysis; development of the academic model, protocols and manual; and expansion of academic partnerships (2020 to current);

  • Design Stage 6: Shift from forking of prototype code by academic partners to co-working on core code, checking of core data, and joint identification of potential data applications and impact (2021 to current).

Planned 2023-5

  • Design Stage 7: Expansion of the academic programme and ongoing incremental improvements to platform code;

  • Design Stage 8: In parallel with Stage 7, co-analysis of data captured by CCRP partners, and open publication of research. Joint work to include testing of AI and machine learning approaches to improve understanding of the stock as a dynamic system, and to identify underlying rules of operation, patterns and locked-in cycles of behaviour;

  • Design Stage 9: Integration of data animations and 3D rule-based open models into CCRP platforms, experimentation with platforms as digital twins able to simulate long-term change, and integration of open data for other areas of national infrastructure;

  • Design Stage 10: Dissemination and use of open data at global level to support the UN Sustainable Development Goals in perpetuity.

CCRP Design Stage diagram, 2022. Black borders indicate current focus.


Design Stage 1. Prototype conceptualisation, assessment of need and identification of required data types

Overview and objectives

  • The objective of the first design stage was to develop a theoretical framework providing a rationale and context for the choice of data types, data structuring, data capture methods, visualisation and open release, and to demonstrate the grounding of these choices in established ideas. This involved an extensive literature review of 350 papers, websites and other relevant documents, undertaken between 2015 and 2018 at the Centre for Advanced Spatial Analysis, University College London, and funded by an Engineering and Physical Sciences Research Council doctoral grant.

Method

  • Assessment, using academic publications, of the potential impact of open building attribute data release in relation to the United Nations' Sustainable Development Goal 11 and the New Urban Agenda, which look to make cities and human settlements more inclusive, safe, resilient and sustainable;
  • Review of literature relating to types of data commonly used/required to analyse sustainability in building stocks. For general references see here. For relevance to sustainability science see here.
  • Review of literature relating to types of data used/required to analyse urban complexity and stocks as dynamic systems, and to computational approaches able to identify underlying rules of operation that could improve forecasting models, with particular reference to the value of building age/lifespan data, see here.
  • Review of literature relating to methods of capturing dynamics data, see here.
  • Identification and assessment of data types typically sought by built environment stakeholders involved in building design, construction and management (e.g. government, industry, the third sector and the public), using the UK as an example. See consultees here.
  • Review of literature on the global status of London and the suitability of the city as the UK prototype, see here.
  • Set-up of specialist group session involving UCL experts in urban/London history, urban complexity and energy analysis to assess specific issues relating to capture of building age/lifespan/dynamics data, see here.
  • Review of literature exploring public engagement in data science, see here.
  • Review of The Building Exploratory action research model for collaboratively built knowledge exchange centres about building stocks, see here.
  • Review of open availability of geospatial building attribute data types identified as of relevance for London/UK, see here.
  • Review of spatial data maps/platforms visualising current and historical data on stocks at city/national level in the UK, and examples of building-level colour-coded building attribute maps ADD LINK.
  • Review of examples of successful open database initiatives and their governance models, see here;
  • Selection of potential methods of building attribute data capture to be tested (see Design Stage 3 below).
  • Identification of and application for suitable funding sources for technical costs of Colouring London prototype development, see here.
  • Development of prototype name, likely to stimulate interest from diverse stakeholders in actively engaging in knowledge exchange about buildings and cities, see here.
  • Exploration of the potential of platforms to act as digital twins producing microspatial lifespan simulations and forecasting models, using a conceptual model, see here.

Resources/data accessed

  • United Nations Sustainable Development Goals
  • Academic publications and other documents, including those relating to building attribute requirements for studies into building sustainability, diversity and urban resilience, lifespan extension, urban complexity, urban morphology, rule-based spatial modelling, and automated methods of attribute data generation
  • UK government information on climate change targets and goals, and the UK response to the New Urban Agenda draft
  • UK public authority websites and resources holding open and restricted current building attribute data (including central government departments involved in property taxation, mapping, housing, planning, heritage, energy and environmental data provision)
  • Other UK organisations holding open and restricted building attribute data (third sector and private sector)
  • UK and international public mapping websites providing information on urban areas at building level (including UK local authority GIS sites providing planning information, plus some international local authority sites, e.g. Geoportal BCN (Barcelona))
  • International property tax building age visualisations
  • Information on London as a global hub
  • Manually coloured urban morphology maps (including those by Hugo Hassinger, M.R.G. Conzen and the Hackney Society)
  • Historical maps of London 1786-1990 and colour-coded building maps of London 1800-1950 (including Goad insurance, Booth poverty, and London County Council bomb damage maps)
  • Design records for the Building Exploratory's model for local physical/digital knowledge exchange centres on stocks
  • Expert interviews with UK/London stakeholders 2016-2022.

Outputs

  • Selection and tabulation of over 50 draft subcategories;
  • Selection and graphic representation of 12 main data categories;
  • Choice of London as the prototype city;
  • Identification of the building footprint as the basic building block of platforms - the highest quality, most up-to-date and comprehensive footprint data being required to produce data on buildings that is as comprehensive, detailed, up-to-date and accurate as possible;
  • Identification of the need to include short-term and long-term dynamics data;
  • Identification of key data capture methods;
  • Identification of need for features and mechanisms that facilitate diverse audience engagement including the use of colour as a motivator for data contribution.
  • Selection of open GIS based interface using building footprints to capture, collate and visualise attribute data at building level;
  • Selection of prototype name.

Design Stage 2: Prototype/demonstration model design, construction and testing.

Overview and objectives

The objective of Design Stage 2 was to set up and test a live prototype/demonstration model. The prototype, [Colouring London](https://colouringlondon.org/), was built and tested at University College London from 2017, and its first iteration was released live in 2018, with all open code released on GitHub from 2017. This stage also involved the successful submission of funding applications to fund software engineering time, and negotiation with the UK's national mapping agency, Ordnance Survey, regarding access to the most comprehensive and highest quality building footprint data available. The prototype was, and continues to be, developed as an action research initiative, with features created incrementally, consulted on and tested live, and a network of interested parties gradually developed. Since 2020, R&D work on the prototype has been undertaken at the Alan Turing Institute, and from 2023 core Colouring Cities code will be contributed by all Colouring Cities partners. (NB: Some interface and design features and subcategories designed in 2017 are still to be implemented owing to the scale of coding time required. This has led to a backlog of GitHub issues, including new feature design briefs, issues arising from feature implementation, and other issues arising over time. Co-working on Colouring Cities code across international teams will help accelerate this process; see Design Stage 6.)

Method:

  • Negotiation with UK's national mapping agency, Ordnance Survey, regarding special agreement with regard to use of restricted, comprehensive, updated building footprint data (OS MasterMap) for Greater London, for open building attribute data capture purposes;
  • Negotiation with the Greater London Authority regarding access to OSMasterMap via its public sector mapping agreement Licence (2016);
  • Applications for first stage funding for appointment of a technical architect to build platform, plus design support to work on interface graphic design (2017);
  • Technical set-up of GitHub open code repository and working environment (Tom Russell at UCL);
  • Training of non-technical researchers on GitHub - re raising issues and design briefs, and co-working on interface feature development;
  • Technical transformation of OS MasterMap vectorised footprint data into raster tiles, to protect OS MasterMap copyright and allow data capture, collation and visualisation to begin (see the sketch after this list);
  • Fine-tuning of the main category selection by the research lead, and its representation, working with a graphic designer, as a simple, striking colour graphic able to operate both as the interface keypad visual and as the platform logo;
  • Mock-up of main interface pages, working with the graphic designer;
  • Design of the logo, working with the graphic designer;
  • Co-working with technical architect on feasibility of proposed interface design and adjustment where necessary;
  • Additional consultation with stakeholders;
  • Securing of logos from public institutions to increase trust in platform;
  • Technical implementation of mock-up designs;
  • Application for, and securing of second stage funding to support ongoing work on prototype by core team;
  • Live release of the platform (2018), followed by public release, testing and implementation of new feature designs;
  • Application and securing of specific feature design related grants;
  • Draft design for the 'Showcase' section enabling information and visuals showing the application of Colouring London data, and similar types of data to be easily uploaded and accessed.
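The footprint-to-raster-tile step noted in the list above can be illustrated with a minimal sketch, assuming Python with geopandas and rasterio; the file names, raster size and single-raster simplification are illustrative assumptions rather than the production Colouring London pipeline.

```python
# Minimal sketch: rasterise vector building footprints so they can be served
# as map tiles without exposing the original (licence-restricted) geometry.
# File name and raster size are illustrative assumptions only.
import geopandas as gpd
import rasterio
from rasterio import features
from rasterio.transform import from_bounds

footprints = gpd.read_file("footprints.gpkg")      # hypothetical input file
footprints = footprints.to_crs(epsg=3857)          # web mercator, as used for web tiles

minx, miny, maxx, maxy = footprints.total_bounds
width, height = 4096, 4096                         # single raster for brevity
transform = from_bounds(minx, miny, maxx, maxy, width, height)

# Burn each footprint into the raster; value 1 = building present.
raster = features.rasterize(
    ((geom, 1) for geom in footprints.geometry),
    out_shape=(height, width),
    transform=transform,
    fill=0,
    dtype="uint8",
)

with rasterio.open(
    "footprints.tif", "w", driver="GTiff",
    height=height, width=width, count=1, dtype="uint8",
    crs="EPSG:3857", transform=transform,
) as dst:
    dst.write(raster, 1)
# A tiler such as gdal2tiles could then cut footprints.tif into XYZ tiles.
```

Serving rasterised tiles rather than the raw vector geometry is one way to respect the footprint licence while still allowing attribute data to be captured and visualised at building level.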

Resources/data

  • OS MasterMap footprints for London
  • Open OS map layers for Britain
  • ESRI ArcGIS licensed software
  • InDesign licensed software
  • GitHub open repository
  • For testing of data applications by research collaborators: UCL Energy Institute 3D building stock model for London; VU.CITY 3D model of London; 3D Procedural London (Dr Flora Roumpani).

Outputs

  • Colouring London operational prototype platform https://colouringlondon.org/;
  • Colouring London GitHub site providing open-source code: https://github.com/colouring-cities/colouring-london;
  • Unique agreement with Ordnance Survey to use highest quality building footprints available for Greater London to experiment with data capture and release method;
  • Colouring London Logo design;
  • Activation of all 12 main data categories and over 50% of planned subcategories;
  • Archiving of iterations on GitHub;
  • Securing of first and second stage funding at UCL (first stage: Historic England; second stage: MacArthur Foundation, and Geospatial Commission/Innovate UK).

Planned outputs/improvements 2022/3

  • Activation of all remaining planned subcategories;
  • Scaling up of Colouring London prototype testing to Colouring Britain;
  • Transition from development of Colouring London prototype code by The Alan Turing Institute to a new model, managed by Turing, in which features and core open code are worked on collaboratively by CCRP partners (see also Design Stage 5).



Design Stage 3. Testing and evaluation of data capture and verification methods

Overview and objectives

Since 2015, four methods of capture and integration of data on stocks have begun to be tested. Further discussion on these methods is provided in the Data Capture section of the manual. Methods tested are as follows:

  • Bulk upload of existing open datasets (current composition and performance data);
  • Crowdsourcing/manual upload of new data building-by-building (current composition, performance and dynamics data);
  • Automated generation of large-scale new data using inference or other computational methods (current composition and dynamics data);
  • Live streaming of public datasets using existing APIs (current composition, performance and short-term dynamics data);
  • In addition, significant research has been undertaken on data verification methods.

Work has not been linear and methods continue to be experimented with. A key area of future work, identified as necessary to improve data accuracy and coverage, is the development of feedback loops between methods. Methods tested to date, as well as plans for 2022/23 testing, are summarised below.

3.1 Bulk upload of existing datasets

Overview and objectives

Identification and upload of existing open datasets was the obvious starting point for efficient integration of building attribute data into the prototype platform. Open datasets, where available, continue to be manually uploaded and visualised as code for new subcategories is developed. However, in the UK few relevant open datasets have been identified (see Section N3: https://github.com/colouring-cities/manual/wiki/N3.-Appendix-3.-Colouring-London:-Open-building-attribute-data-availability-UK-London-2021). Where they do exist, cleaning is often required. This may range from relatively straightforward reformatting (ADD GitHub link for EPCs) to more complex processes such as the disaggregation of protected/listed building entries. Though open code has been produced to facilitate upload, it has been concluded that a queuing system for open datasets is required, allowing anyone to add a link to a dataset for integration to the queue, but with datasets moderated/quality controlled and uploaded by CCRP partners and/or their academic collaborators. The aim is to maximise quality and coverage as quickly as possible, to help prevent high quality data within the platform from being overwritten with poorer quality data, and to reduce the possibility of malicious activity. Spreading moderation responsibilities across academic and other public institutions is considered the most efficient way of maximising data quality and volume while minimising platform costs.
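As a rough illustration of the proposed queuing system, the Python sketch below shows the intended flow: anyone can add a dataset link to the queue, but a moderator from a partner institution reviews it before upload. Class names, statuses and fields are illustrative assumptions, not the implemented Colouring Cities schema.

```python
# Minimal sketch of a moderated queue for bulk dataset uploads.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class DatasetSubmission:
    url: str                      # link to the open dataset
    attribute: str                # e.g. "construction age", "energy rating"
    submitted_by: str
    status: str = "pending"       # pending -> approved / rejected -> uploaded
    moderator: Optional[str] = None
    note: Optional[str] = None
    submitted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class ModerationQueue:
    def __init__(self) -> None:
        self._items: List[DatasetSubmission] = []

    def submit(self, submission: DatasetSubmission) -> None:
        """Anyone can add a dataset link to the queue."""
        self._items.append(submission)

    def pending(self) -> List[DatasetSubmission]:
        return [s for s in self._items if s.status == "pending"]

    def review(self, submission: DatasetSubmission, moderator: str,
               approve: bool, note: str = "") -> None:
        """A moderator approves or rejects a submission before upload."""
        submission.status = "approved" if approve else "rejected"
        submission.moderator = moderator
        submission.note = note

# Example: a contributor suggests an EPC extract; a partner reviews it.
queue = ModerationQueue()
queue.submit(DatasetSubmission(url="https://example.org/epc-extract.csv",
                               attribute="energy rating",
                               submitted_by="contributor_42"))
queue.review(queue.pending()[0], moderator="ccrp_partner", approve=True)
```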

Method

  • Rapid assessment of open data availability for UK/London for prototype development;
  • Detailed assessment of open data availability for the UK/London for prototype development (including consultation with stakeholders);
  • Development of relationships with public institutions that either hold necessary but restricted datasets (e.g. Ordnance Survey), or hold specific datasets that require extensive work to map (e.g. Historic England), and exploration of ways of releasing/accessing data in novel ways that help meet the goals of both parties;
  • Work on open code to facilitate bulk upload;
  • Development of national networks of academic and other public institutions interested in moderating datasets within their remit or relevant to their research;
  • Co-working with national institutions to integrate complex datasets and assess the value of collaborative working (e.g. work with Historic England on listed building data (2020/21), and with the University of Birmingham on Energy Performance Certificate data (current));
  • Initial exploration of the potential of an automatic queuing system for open bulk uploads.
  • Evaluation and design of relevant new platform features.

Resources/data accessed

Outputs

  • Assessment of open building attribute data availability for London and UK;
  • Open code on GitHub supporting open dataset integration - ADD ref;
  • Integration of open datasets into the Colouring London platform;
  • UK informal network of academic/public institutions interested in regional/national engagement (including Universities of Loughborough, Birmingham, Portsmouth and Bradford).

Planned outputs/improvements 2022/3

  • Open code for automatic queuing system for data moderation (due 2023);
  • Operational moderator system for Colouring Britain;
  • Construction of CCRP template for moderator networks and testing with CCRP partners;
  • Use of examples of international CCRP platforms integrating open government datasets to drive open release of similar datasets in countries where access restrictions still apply.

Open bulk upload of UK DEC data (non-domestic energy performance ratings).


3.2 Manual upload of new data at the microscale/building level: crowdsourcing and data quality assessment (2015-2017)

Overview and objectives

Crowdsourcing of building attribute data, at building level, was identified in Design Stage 1 as a critical step in the process of high quality data capture, owing to high levels of fragmentation of knowledge/information about the building stock, and to the fact that detailed information on specific building attributes such as building age, as well as user feedback on quality and information on long-term dynamics, can only be collected on a building-by-building basis. Testing of manual upload at building level was undertaken in two stages. The first stage (2015-17) involved pre-prototype data capture for five data types using ArcGIS. The purpose was to assess the feasibility of capturing/crowdsourcing specific data types within the Colouring London prototype; to define and address potential problems; to assess feature requirements to support contributors' knowledge and needs; and to generate a sufficiently large sample to test semi-automated data generation. This stage also looked to assess whether high-quality building attribute data could be easily generated through crowdsourcing at low cost. It also produced a series of colour-coded data visualisations of London's buildings, used during the consultation process to interest diverse stakeholders and potential audiences in engaging with platform development, and generated a unique sample of building age data for London, the value of which was then assessed in various academic and commercial research contexts. Since the release of the live prototype platform in 2018, work has focused on responding to feedback on the live site and on the upload of new subcategories and improvement features. So far this stage of testing has been delayed by difficulty in accessing sufficient software engineering time to implement all data category features.

Method

Pre-prototype set-up testing

  • Selection of five data types for capture at building level, for which UK open datasets were not available (as part of data review as noted in 2b.1): Building age, location, number of storeys, current land use and original/historical land use/building typology. Selection based on relevance to 3D stock modelling for energy and procedural modelling purposes, to resilience assessment and analysis of role of diversity of form, and to study of urban complexity within urban science;
  • Accessing of OSMasterMap building footprints, and ArcGIS software, via UCL academic licences from Ordnance Survey (the UK's national mapping agency) and ESRI respectively;
  • Capture of new building level data for c21,000 buildings in the London Borough of Camden and c2,000 buildings in the adjacent City of Westminster over a two year period;
  • Visualisation of captured data for building age, storeys, land use and typology data;
  • Assessment of usefulness of building age data captured by selected academic and commercial partners;
  • Assessment of opportunities and ethical issues relating to citizen input on building quality/performance;
  • Review of academic studies on crowdsourcing to gain insight into contributor motivation and needs;
  • Evaluation and design of relevant new platform features to support crowdsourcing and the verification of attribute data, particularly by local civic and conservation groups monitoring physical composition and the impact of physical change at community level.

Post-prototype set-up testing

Resources/data accessed:

  • Ordnance Survey Master Map building footprints for the London Borough of Camden, and the City of Westminster;
  • Arc GIS software licenced by University College London from ESRI;
  • Historical gazetteers, scholarly historical guidebooks and historical maps (including Pevsner Guidebooks, Survey of London Volumes, National Heritage List for England, Camden history publications);
  • Google Maps/Google Street View;
  • Bing Maps;
  • Ordnance Survey Open Maps (road names).

Outputs

  • Open datasets containing c100,000 open data entries relating to building age, location, number of storeys, current land use and original/historical land use/building typology for c21,000 buildings, uploaded to the platform;
  • 3,000+ longitudinal data entries tracking lifespans and demolition rates (1875, 1890, 1960, 2016) for two London boroughs;
  • Comparative footprint data for 2009 and 2019 for 1 km areas showing plot sprawl within four London boroughs;
  • Historical OS maps for London (1875, 1890, 1960);
  • Crowdsourced contributions ADD stats - including over 40,000 age data entries from a single contributor account;
  • Colour-coded visualisations of captured open building attribute data for use in consultation;
  • [Academic paper showing application of captured age data in energy modelling](https://www.creds.ac.uk/publications/building-stock-energy-modelling-in-the-uk-the-3dstock-method-and-the-london-building-stock-model/);
  • Academic paper reference showing application of age and land use data in 3D Procedural models/3D rule based modelling;
  • Integration of data into VU.CITY commercial 3D planning models and mobile app.

Planned outputs/improvements 2022/3

  • Development of model for academic/institutional moderator networks to facilitate harnessing of stakeholder knowledge and increase crowdsourced input (see also bulk upload above);
  • Improvement of interface features to make the contribution of data more interesting, particularly in relation to verification;
  • Co-working with CCRP partners to improve global crowdsourcing networks and core platform crowdsourcing features;
  • Testing of integration of historical building footprints on Colouring London, to facilitate crowdsourcing of dynamics data from historians;
  • Testing of visual prompts with CCRP partners.

Building age data from a UCL study combined with data crowdsourced using Colouring London from professional and non-professional historians.


3.3 Automated data capture and extraction (2018-19)

Overview and objectives

A number of relevant methods of automated data capture were identified through the literature review in Design Stage 1. The key attraction of data generated through automated extraction and inference is the speed and scale at which attribute data can be made available, and the efficiency and cost effectiveness of the approach. This is particularly relevant to countries where limited open building attribute data are available and where the costs of data are high. Automated processes are also important in gaining insights into relationships between building attributes, required to inform the underlying rules of operation necessary to improve accuracy in simulations and long-term forecasting models.

Methods considered included the use of computer vision (i.e. the automated extraction of data from images); inference of unknown attributes from known attributes (e.g. inferring the roof shape for all buildings with an identical building footprint and a certain age); inference from historical data (e.g. the use of historical street network data to infer land use and building age); and rule-based approaches (e.g. the development of algorithms that automatically classify and group all building footprints of a certain length and width, from which land use and typology can be inferred).

The assessment concluded that automated methods were essential for rapid coverage and important in developing draft datasets, but that automatically generated data needed to be verified wherever possible, at building level, by experts within the crowd. This feedback loop was also identified as critical in terms of improving the effectiveness of algorithms in capturing targeted types of attribute data.
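A minimal sketch of the rule-based footprint grouping described above is given below, assuming Python with geopandas; the thresholds, labels and file name are illustrative assumptions only, and any inferred values would still need verification by experts within the crowd (see 3.5).

```python
# Minimal sketch: suggest a draft typology/land-use label for each building
# footprint from its approximate length, width and area.
import geopandas as gpd

def draft_typology(footprint) -> str:
    """Classify a single footprint polygon from its bounding-box dimensions."""
    minx, miny, maxx, maxy = footprint.bounds
    length = max(maxx - minx, maxy - miny)
    width = min(maxx - minx, maxy - miny)
    area = footprint.area
    if area > 2000:                      # very large plate: likely non-domestic
        return "non-domestic (presumed)"
    if width < 8 and 4 < length < 25:    # narrow, repeated plots: terraced housing
        return "terraced house (presumed)"
    if area < 400:
        return "detached/semi-detached house (presumed)"
    return "unclassified"

# Hypothetical input; reprojected to metres (British National Grid) so that
# lengths and areas are in real-world units.
footprints = gpd.read_file("footprints.gpkg").to_crs(epsg=27700)
footprints["inferred_typology"] = footprints.geometry.apply(draft_typology)
print(footprints["inferred_typology"].value_counts())
```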

Method

  • Review of current studies testing building attribute generation using inference;
  • Identification and integration of available open inferred datasets;
  • Use of historical network data to infer geolocation of non-domestic land use, see here;
  • Use of historical network data to infer building age, see here;
  • Use of footprint configuration to infer high street building locations, see here;
  • Use of building age to infer 4D attributes with IOER/Colouring Germany, see here.
  • Use of building age to infer 3D attributes, see here
  • Working with the Alan Turing Institute's Computer Vision for Digital Heritage Special Interest Group to identify national and international partnerships interested in supporting the extraction of historical data, and particularly building footprints, using computer vision, to support the integration of large-scale datasets able to facilitate crowdsourcing of historical/dynamics data from historians;
  • Testing of feedback loops between automated approaches and manual crowdsourcing to improve data quality using age data;

Resources/data

Outputs

  • Integration of adjacency data from the UCL Energy Institute into Colouring London's 'Typology' category;
  • Integration of 750,000 building age data entries into Colouring London's 'Age' category;
  • Sample of inferred age data (datasets generated using historic network data) verified using manual methods - ADD APPENDIX REF;
  • New Colouring London subcategory, 'Unclassified, presumed residential', within the land use category - allows data presumed to represent non-domestic stock, generated through automated processes, to be uploaded ready for feedback confirming the classification or flagging the need for method correction;
  • Academic paper on use of age data in 3D procedural modelling.

Planned outputs/Improvements 2022/3

  • Upload and checking of inferred data for multiple attributes across countries;
  • Experimentation with vectorisation of historical maps and generation of open historical footprints using the Colouring London prototype;
  • Development of algorithms with research collaborators to geolocate and classify typologies, using building footprint data, across countries.

Adjacency data, courtesy of the UCL Energy Institute, uploaded to Colouring London.


3.4 Testing of live streaming of public datasets (2022)

Overview and objectives

For certain datasets, where APIs already exist, live streaming of datasets will be possible. This maximises efficiency, especially as it allows the most up-to-date version of data held by government and other owners to be accessed by users. This is particularly relevant where data are statutory or require frequent updating, e.g. planning application data, demolition data, energy performance certificate data, and land use data. For live streaming of demolition data and updated building footprint data, additional work is also necessary to explore the automatic (but moderated) transfer of attribute data for demolished buildings into the 'Dynamics'/site history section. Testing of live streaming of planning data is currently being undertaken in the UK through a partnership between the Alan Turing Institute and Loughborough University.
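A minimal sketch of this live-streaming pattern, assuming Python with the requests library, is shown below; the endpoint, parameters and field names are hypothetical placeholders rather than the actual local government API used in the Loughborough collaboration (see GitHub Issue 884 for the real feature design).

```python
# Minimal sketch: fetch planning application records from a public API page
# by page, so the platform always shows the data owner's latest version.
import requests

API_URL = "https://planning.example.gov.uk/api/applications"  # hypothetical endpoint

def fetch_planning_applications(bbox, page_size=100):
    """Yield planning application records inside a bounding box, page by page."""
    page = 1
    while True:
        response = requests.get(
            API_URL,
            params={"bbox": ",".join(map(str, bbox)),
                    "page": page, "page_size": page_size},
            timeout=30,
        )
        response.raise_for_status()
        records = response.json().get("results", [])
        if not records:
            break
        yield from records
        page += 1

# Example: refresh application statuses for a small area on demand.
for application in fetch_planning_applications(bbox=(-0.10, 51.50, -0.09, 51.51)):
    print(application.get("reference"), application.get("status"))
```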

Method

  • Testing of live streaming of planning data for London into Colouring London, in collaboration with Loughborough University, using an existing local government API to provide access to live, visualised, up-to-date planning data showing the status of planning applications. For information on feature design, progress and open code see GitHub Issue 884 (current/2022);
  • Discussion with national public partners regarding integration of APIs enabling live streaming/updating of relevant government data.

Outputs

Planned outputs/improvements 2022/3

  • Test live streaming of demolished buildings data into the Dynamics category (planned 2023)
  • Live stream to other relevant APIs (planned 2023);
  • Test live updating of OSMM building footprints for UK prototype (planned 2023);
  • Co-working on features for live streaming/updating of public data with international CCRP partners.

Colouring London planning section showing live streamed data.


3.5 Verification

Overview and objectives

'How do you ensure accuracy of the data?' is the question most commonly asked by Colouring Cities consultees. Development of features and processes able to help increase the accuracy of Colouring Cities data is a core area of research for the CCRP. However, platforms also place responsibility on individual users with regard to the suitability of data, as the level of accuracy required will vary considerably depending on whether data are used, for example, in a primary school project or to justify a change in government policy.

Method

  • Identify and integrate features relating to data provenance developed and successfully tested by open database initiatives such as Wikipedia 'Source' information and OpenStreetMap 'Edit history';
  • Experimentation with greater visibility of last editor;
  • Experimentation with uncertainty measures e.g. subcategories indicating for example earliest and latest possible construction dates;
  • Experimentation with disclaimers and Menu information on accuracy for platform users;
  • Integration of a verification button for data entries (a minimal sketch of the kind of record structure involved is given after this list);
  • Integration of a verification copy tool;
  • Experimentation with feedback loops between automated dataset generation using inference, and verification/correction by experts within the crowd;
  • Identification of data subcategories where accuracy disclaimers are specifically required.
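As indicated in the list above, the Python sketch below shows one way a single attribute entry could carry the provenance and verification features described: source, edit history, a verification count, and an uncertainty range (earliest/latest possible construction dates). Class and field names are illustrative assumptions rather than the Colouring Cities database schema.

```python
# Minimal sketch of an attribute record with provenance and verification.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class Edit:
    editor: str
    value: object
    source: str                  # e.g. URL or bibliographic reference
    edited_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class AttributeEntry:
    building_id: int
    attribute: str                         # e.g. "construction_year"
    value: object = None
    earliest: Optional[int] = None         # uncertainty bounds, where relevant
    latest: Optional[int] = None
    edit_history: List[Edit] = field(default_factory=list)
    verified_by: List[str] = field(default_factory=list)

    def edit(self, editor: str, value: object, source: str) -> None:
        """Record a new value, keeping the full edit history and source."""
        self.edit_history.append(Edit(editor, value, source))
        self.value = value
        self.verified_by.clear()           # a changed value needs re-verification

    def verify(self, user: str) -> None:
        """One click of the verification button by a signed-in user."""
        if user not in self.verified_by:
            self.verified_by.append(user)

    @property
    def verification_count(self) -> int:
        return len(self.verified_by)

# Example: an inferred construction year is corrected, then verified.
age = AttributeEntry(building_id=101, attribute="construction_year",
                     earliest=1890, latest=1910)
age.edit(editor="inference_pipeline", value=1900, source="inferred from street network")
age.edit(editor="local_historian", value=1897, source="published building history")
age.verify(user="conservation_group_member")
print(age.value, age.verification_count, len(age.edit_history))
```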

Outputs

  • Edit history, source, verification, and uncertainty features integrated into Colouring London, e.g. see 'Age' (https://colouringlondon.org/view/age);
  • Menu information page on data accuracy;
  • Data accuracy disclaimers relating to planning information;
  • Automated data sample checked using manual verification.

Planned outputs/improvements 2022/3

  • Verification copy tool;
  • New/improved verification features designed with CCRP partners;
  • Improved data accuracy through greater use of feedback loops, tested with CCRP partners.

Examples of accuracy features shown on the Colouring London interface.


Design Stage 4: Validation of reproducible prototype code, and assessment of potential impact by international academic partners (2019 to current)

Overview and objectives

Design Stage 4 involved testing the ease of reproducibility of Colouring London open-source code and assessing its potential impact for research into the building stock looking to improve its sustainability, resilience, efficiency and quality. Open-source code for the London prototype also needed to be validated to ensure the system architecture could meet the operational needs of platform managers and users in other cities. During this design phase, carried out in 2018/19 at UCL and from 2020 at the Alan Turing Institute, potential impact and applications, common barriers to set-up, and governance models were explored, along with the need for additional features and for protocols for collaborative working. It was concluded at this point that an academic reproduction model would be rolled out.

Testing was originally planned to be carried out in collaboration with academic partners in other UK cities. However, international testing, prior to UK testing, was pursued owing to difficulties in extending the formal agreement with Ordnance Survey for access to UK building footprints for the country as a whole; to awareness of the need to maintain research and funding momentum, which waiting for UK permissions could have thrown off course; and to immediate interest from academic colleagues in Beirut in 2018/19 in testing prototype code, offering immediate opportunities for testing knowledge and data exchange across countries. It was also identified that countries with greater ease of access to footprints and property tax databases (often the richest source of building level data on stock composition) could assist in opening up these areas in the UK, by demonstrating the impact of open data release on scientific research.

Though progress on Beirut's platform has been affected by issues impacting at national level since that time, and its development paused, in 2020 a longstanding collaboration with Loughborough University led to the University of Bahrain coming on board. This was followed by interest from individual academics, representing academic departments/institutions involved in very different aspects of building stock research, in Australia (2020), Greece (2021), Indonesia (2021), Germany (2021, though IOER had been involved with aspects of prototype development since 2018), and Colombia (2022). Discussions with academic departments resulting in an initial commitment to set up and test demonstration platforms take time; discussions with Switzerland (2021), China (2021) and Sweden (2021) are ongoing, with existing academic partners also looking to bring other countries on board. Though each academic partner has joined with a view to advancing specific research areas, there is an understanding that departments will work collaboratively to ensure that all types of data captured in the London prototype, and others where relevant, will be collected across countries to allow these to be co-analysed and shared. The speed of development varies considerably and has largely depended on the responsiveness of national institutions. From 2020, testing was also combined with the set-up of the Colouring Cities Research Programme described in Design Stage 5.

Method

  • Assessment of the feasibility of testing Colouring London prototype code with UK cities (negotiation with Ordnance Survey re use of OS MasterMap polygons at national level is ongoing);
  • Discussion with the American University of Beirut's Urban Lab (AUB) in 2018/19 (via IOER), with particular focus on testing the value of open code in the analysis of urban complexity, urban science and the study of urban dynamics;
  • Colouring Beirut team set-up, securing of demo funding, and forking of Colouring London code;
  • Citation of the Colouring Beirut platform in energy and urban science contexts (see also Outputs below);
  • Discussions with the University of Bahrain's Urban and housing lab and cultural ministry (via Loughborough University) with particular focus on testing open code value for heritage research, housing and planning;
  • Colouring Bahrain team set-up, securing of demo funding, and forking of prototype open code;
  • Discussions with the University of New South Wales (via CASA/UCL) with a particular focus on testing open code value in context of the Australian Housing Data Analytics platform;
  • Colouring Australia team set-up, securing of demo funding, and forking of prototype open code;
  • Discussions with the Technical University of Athens (via Loughborough University) - with an interest in providing open data relevant to diverse stakeholders involved in planning, heritage, sustainability and energy, housing and vacancy, construction, public space design, and disaster scenarios;
  • Colouring Athens team set-up, securing of demo funding, and forking of prototype code;
  • Discussions with King's College London (UK) (via CASA/UCL) and the Institut Teknologi Bandung (Indonesia) re testing open code value in democratising building attribute data, capturing data relevant for urban simulations and modelling, and testing a model for the Global South;
  • Colouring Bandung team set-up, securing of demo funding, forking of prototype code;
  • Discussions with Empa, Switzerland (via University of Oxford) re testing value of prototype code in context of energy analysis in Zurich;
  • Discussions with the Colour Research Institute, China Academy of Art re involvement in colour research;
  • Discussions with The Leibniz Institute for Ecological Urban and Regional Development (Colouring London collaborators since 2018) re testing the value of prototype code in the co-creation of platforms involving citizens and specialists to answer scientifically and socially relevant questions, and in providing training or validation data for AI-based mapping approaches and analyses of the building stock in relation to issues of energy efficiency, heat load, materials and retrofitting;
  • Discussions with Mälardalen University (Department of Energy, Building and Environment) (via AUB);
  • Discussions with The District University of Bogotá (Universidad Distrital Francisco José de Caldas) (via the Alan Turing Institute) re testing the value of prototype code in increasing public engagement with existing large-scale databases providing open building attribute data;
  • Colouring Bogotá team set-up, securing of demo funding and forking of prototype code;
  • Colouring Sweden team set-up;
  • Discussions with UNSW and IOER regarding support for additional countries coming on board;
  • Alan Turing Institute provision of support through co-ordination of PI and software engineering meetings for international collaborators, provision of free access to the Colouring Cities domain name, free one-to-one sessions with PIs and engineers when required, inclusion on the Turing website and dedicated pages within the Open Manual, and facilitation of joint publications and funding bids;
  • Development of methods of working.

Outputs

  • Academic collaborations in Lebanon, Bahrain, Australia, Germany, Greece, Colombia, Indonesia relating to building stock research;
  • Funded demonstration platforms testing Colouring London code under development in Lebanon, Bahrain, Australia, Germany, Greece, Colombia, Indonesia. See here;
  • Additional academic collaborations, including those being discussed with potential partners in Switzerland, Sweden and China, and potential collaborations identified in Finland, Vietnam and Bangladesh;
  • Academic publications, articles and citations arising from testing. See here.

Current development flow diagram.


Design Stage 5: Colouring Cities Research Programme set-up

Overview and objectives

Design Stage 5, which began in 2020, relates to the development of the Colouring Cities Research Programme (CCRP) as a result of testing of Colouring London code by international academic partners. This process has led the Alan Turing Institute to focus on:

  • a) testing ways of producing a sustainable, global, interconnected network of Colouring Cities open databases/data platforms, able to be overseen by academia at low cost, that are co-developed and collaboratively maintained with communities, government, industry and other stakeholders to produce big data on building stocks, at global level, of the highest quality possible;
  • b) developing an international academic research programme that looks to undertake multidisciplinary analysis of data, across countries and disciplines, and to gain, through experimentation with AI and machine learning, and the use of data simulations and visualisations, insights into building attribute relationships, patterns, cycles and underlying rules of operation, necessary to improve the sustainability, resilience, efficiency and quality of the global building stock (Design Stage 6).

This section describes the first steps taken to build the interconnected network, which include the formalisation of working practices within the CCRP academic group and more collaborative working on prototype platform code and data applications.

Method

  • Clear branding of the CCRP academic programme to differentiate testing of Colouring London code from other applications resulting from open release of code on GitHub;

  • Production of CCRP logo for exclusive use by CCRP partners;

  • Set-up of CCRP webpage on the Alan Turing Institute website to publicise the CCRP academic programme and to identify CCRP partners and provide evidence of membership for national-level fundraising/consultation programmes;

  • Development of the CCRP Open Manual on GitHub, drafted and initially edited by The Alan Turing Institute, to provide background information for international academic collaborators/stakeholders, and updated information on publications showing impact and applications;

  • Set-up of dedicated CCRP Open Manual pages on GitHub for participating countries, edited by academic collaborators, recording teams, methods, funding etc., to distribute responsibility for maintaining and updating progress, incentivise collaboration across countries, support new country sign-up, and provide examples of academic engagement relevant to countries considering joining the CCRP;

  • Publication of CCRP protocols in the CCRP Open Manual, addressing eligibility, common objectives, responsibilities, issues relating to data ethics, openness and inclusivity, project values etc.;

  • Set-up of a procedure for sign-up to these protocols by all institutions joining the CCRP and setting up Colouring Cities platforms, and provision of a free support package for new members offering 1:1 meetings with Turing to discuss interest in and feasibility of engagement and issues relating to demonstration platform set-up, with a requirement for letters from academic collaborators confirming eligibility and agreement to the protocols;

  • Governance model (first iteration);

  • Identification of impacts;

  • Setting out of resources provided by the Alan Turing Institute;

  • Set-up of regular international meetings of principal investigators and engineering teams, co-ordinated by Turing, Loughborough University and the University of Oxford;

  • Development of international software engineering collaboration - contributions to the GitHub open code repository and regular sharing of software engineering expertise;

  • International research collaboration - collation of multidisciplinary knowledge, support for funding bids, joint academic publications;

  • Testing of a low-cost operational model;

  • Co-working on the Open Manual;

  • Ongoing fundraising to secure the core team;

  • See also: https://github.com/colouring-cities/manual/wiki/C.-Governance,-sustainable-development,-risks-&-funding. This stage looks to demonstrate that large-scale, high quality datasets can be produced and compared across countries, and that the research programme design facilitates and supports international, multidisciplinary knowledge exchange on building stocks.

Resources reviewed in this stage included:

  1. Existing ethical frameworks and protocols for open data collection;
  2. Open data crowdsourcing platforms (including Wikipedia, the Survey of London's Whitechapel map site, OpenStreetMap, the New York Public Library, Historic England and Know Your Place);
  3. Open data ethical frameworks (including the Open Data Institute, the Centre for Digital Built Britain, GitHub, the Open Knowledge Foundation and OpenStreetMap).

A further outcome of this stage is the development of a single GitHub repository for co-working on core open-source code as well as sharing of country-specific code. This will accelerate completion of core prototype components, allow improvements and additions to be implemented more quickly, and enable international research engineering time and expertise to be shared more efficiently. Differences in ethical issues across countries, and more efficient methods of code development, were also explored, along with awareness that centralisation of the technical management of platforms at individual country level was more efficient, and also necessary to prevent further data fragmentation and to move towards national open databases for stocks as a whole.

CCRP governance and collaborative maintenance model. Image courtesy of the Alan Turing Institute.

Long-term development flow. Image produced by Ed Chalstrey, 2022.


Design Stage 6: Identification and joint recording of applications and impact.

ADD

Impact diagram.

See also Section M and individual country pages.


Planned Design Stages 7-9 2023/4

Further information on planned design stages will be added here

Design Stage 7: Co-analysis of data captured with CCRP partners, and open publication of research. Joint papers to include use of AI and machine learning approaches to understanding the stock as a dynamic system, including locked-in patterns and rules of operation (Planned 2023/4);

Design Stage 8: Integration of data animations and simulations, and development and integration of 3D rule-based open models (Planned 2023/4);

Design Stage 9: Open-source code development to support integration of open data for other areas of national infrastructure (Planned 2023/4).