A1. FAQs

What is the project about? What is its purpose?

The Colouring Cities Research Programme (CCRP) has been set up to facilitate knowledge and data sharing about buildings at national and international level. The aim of the research programme is to increase stock quality, sustainability, efficiency and resilience, to support the United Nations' Sustainable Development Goals, and to assist communities and other stakeholders in this process. The CCRP also looks to effect a step-change in the amount and type of building-level attribute data available for use in scientific research and in AI and machine learning, to advance understanding of the stock as a complex dynamic system. The CCRP does this by bringing together international academic institutions that specialise in building research to a) develop and manage connected Colouring Cities platforms, b) harness and collate knowledge and data held by diverse stakeholders at country level, c) co-work on reproducible open-source code for these platforms, and d) tackle common problems found across countries relating to data fragmentation, incompleteness, quality, formatting, range, geographic coverage, granularity, security and accessibility, and publish findings. The CCRP also promotes The Turing Way, which looks to advance reproducible, ethical and collaborative data science. See also here.

What is the idea behind the Colouring Cities Research Programme?

We believe that highly visual data platforms, managed by academia, that capture and release open spatial data on the composition, operation and dynamics of building stock at building level, are essential to help improve the quality, efficiency and sustainability of national stocks and to help meet global sustainability goals. We also believe that the most efficient way to collect relevant data, to analyse urban patterns, and to solve complex problems relating to building quality, longevity and operation is through reproducible open platforms, designed and operated by stakeholders, that visualise and share data at building level and test multiple data capture methods.

How have you chosen the data categories?

Data categories have been selected based on: a) initial extensive consultation with communities and built environment stakeholders at the Building Exploratory (1996-2001); b) cross-sector consultation at the Centre for Advanced Spatial Analysis, University College London (2015-2019); c) extensive literature review at UCL of data types used in sustainability science and urban modelling (2015-18); and d) feedback from UK and international academic advisors and collaborators collated by the Alan Turing Institute (2020-). Examples of specific applications for which data are being collected, including Colouring Australia's Housing Data Analytics platform (AHDAP), are shown here. Categories are added on an ongoing basis in discussion with international partners and with stakeholders at country level.

When did the project start?

The concept of free mapping platforms sharing comprehensive data on the composition, quality and history of the stock at building level, to improve local area quality, efficiency and sustainability, was first explored in the 1990s at the Building Exploratory charity in London. The Exploratory prototype was built over 6 years by local community members in Hackney, London, working with stakeholders. It operated as a model for a low-cost, creative knowledge-sharing centre about local stocks, built by local communities in collaboration with stakeholders from science and technology, the humanities and the arts, and the property industry. The Exploratory tested a new type of public GIS interface that layered current and historical spatial datasets at building scale, and allowed users to zoom down onto their homes using aerial and selected streetview images, to drive discussion on the past, present and future of the stock.

Between 2014 and 2020 the Colouring London prototype was developed and tested at the Centre for Advanced Spatial Analysis (CASA), University College London. Within this, findings from the multisector building knowledge exchange model and the GIS interface design, both tested at the Building Exploratory, were merged with a) research undertaken in 2010 into methods of creating city-scale, building-level building age maps to support retrofit targeting, and into 3D evolution animations; b) the open data movement and its principles, collaborative maintenance systems and open source repositories; and c) findings from the Survey of London's work with CASA mapping knowledge crowdsourced from historians (https://surveyoflondon.org/); as well as input from many sectors and disciplines and from a community-led testing programme. In 2016 permission was given by Ordnance Survey to use its footprint data to capture and map building attribute data, and the platform was developed and tested over the next 4 years at UCL. In 2020 the project moved to The Alan Turing Institute - the UK's National Institute of Data Science and Artificial Intelligence - where the Colouring Cities Research Programme was set up to support reproduction of open code and testing of the platform at international scale.

Who are the significant partners in Colouring Cities?

The Colouring Cities Research Programme (CCRP), for which Colouring London is the prototype, currently works with academic partners across nine countries: Australia, Bahrain, Britain, Colombia, Greece, Germany, Lebanon, Indonesia and Sweden. More information on these and on partner protocols can be found here. Within each country, academic leads then collaborate with multiple stakeholders. See also.

How was the project financed?

Colouring London's initial development was mainly funded by the UK's Engineering and Physical Sciences Research Council (EPSRC), Historic England and Innovate UK.

International development of the CCRP and work on the prototype since 2020 has been funded by The Alan Turing Institute and by the UK government's AI for Science and Government funding programme (ASG).

International Colouring Cities platforms set up by academic partners are funded at country level. Further details of this funding can be found in section M of the open manual. (In future, information for all countries will be collated.)

What have been some of the biggest challenges?

Retention of high quality research software engineers is one of the biggest problems. We are now looking at a model where engineering expertise is eventually pooled across countries. Securing a long initial period of funded research time to develop and test the first stage of the prototype - a period in which impact was difficult to measure - was also a challenge. We needed, as at the Exploratory, to build incrementally and to create deep, extensive and trusted foundations and networks, so that platforms could then support rapid, multidisciplinary and geographically extensive knowledge sharing and data analysis. Working in academia has allowed the CCRP to expand incrementally in a way that is affordable and beneficial to all partners.

You have said that London is still in the experimental phase, even after seven years. Why is that? And is it a "bad thing"?

No, it’s a good thing! We still feel we are in the experimental phase in London, even after over 7 years of development and over 4 years live, as the individual categories have each taken so long to design, consult on and build. What is brilliant is that our international partners are testing and advancing research in many new areas. We are beginning to publicise Colouring London much more widely and will set up Colouring Britain in 2023. The first iteration of the Colouring London idea (using mapping interfaces to integrate and visualise data on the composition and history of building stocks at building level) was developed in the late 1990s at The Building Exploratory charitable trust in London, which was also built collaboratively and took over six years to set up in its basic structure. It wasn’t until 2015/16, through working with Ordnance Survey in the UK and with Tom Russell at UCL/Oxford, that we were able to get comprehensive, high quality building footprints for London up online (an essential building block of Colouring Cities platforms, which act as visual mini filing cabinets for all our data) and begin to connect the concept of open databases/mapping platforms - combining information on the composition, performance and history/dynamics of stocks - to the many advances made in the meantime by the open data movement. So you have to be patient with these things. You also have to have reliable, ongoing funding: in this type of research it is more helpful to have less funding, committed over 3-5 years, giving you time to collaborate across sectors and lever in help-in-kind of much higher value (in a way that benefits all sides), than to receive funding as a lump sum that has to be used quickly, grows teams much too fast, and then runs out. Negotiation and development of trust and collaboration with so many stakeholders also takes a lot of time.

We have been a very small team for many years and there have been so many features we have needed to build, and issues we have needed to think about, to make the platform make sense, that it has not felt ready to publicise until now. We still have several hundred outstanding issues on GitHub and are also constantly wanting to add new ideas, which is exciting but can be frustrating when features have been designed and we know they are useful but we just can’t get them out to users because of a lack of engineering time. Hence the international engineering pool is so important, but it has taken many years to begin to set up. We went live in 2018, officially launched in 2019, and it is only in 2023 that I think we will really begin to see numbers rise significantly. Our initial main push for 2023 is with the historic environment community, as this is the sector best set up to voluntarily provide high quality data relevant to multiple categories.

How do you motivate citizens to participate in the project?

The interface is specifically designed for citizens as well as professional stakeholders. The use of colour, and the physical process of colouring in, is essential to this. When a user clicks a building and adds data, colour is used to thank them and to help them see how each data entry they add is a critical piece in the visual puzzle. Colour needs to appear instantaneously for this to work properly and to retain user interest. It is also used to show that we can work collectively, whoever we are, on a single canvas to provide the information necessary to help solve complex urban issues. We expect to learn much from Colouring Dresden's work next year, funded by IOER's recent prize, in terms of blockers to and facilitators of citizen participation, which will be fed back into the core open code design.

Are users required to enter personal details?

Users are actively discouraged from providing the CCRP with personal data. Users are not required to add personal information to use the site. The site is free to view. For those wishing to edit, only a username and password are required. The email address is optional. If provided, this allows us to send the user an email to reset their password if they forget it and, in exceptional circumstances, to contact a user directly if it looks like they are misusing the site, for example to let them know if we plan to disable or remove their account.

The email address is stored in a database which can only be accessed from within the local network of the Colouring London application server, which is only accessible to select developers working on the project. Users of the Colouring London site can only access their own information (there is no "admin panel" or other kind of user with special access) and connect to the site using standard HTTPS encrypted communications. Documentation of the crypt and gen_salt functions used can be found here.
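For readers unfamiliar with these functions, the sketch below illustrates how PostgreSQL's pgcrypto crypt() and gen_salt() are commonly used to store and check password hashes from an application server via parameterised queries. The table and column names (users, username, pass_hash) and the choice of the bcrypt ('bf') algorithm are illustrative assumptions only, not the project's actual schema or configuration.

```typescript
// Minimal sketch, assuming a hypothetical "users" table with "username" and
// "pass_hash" columns and the pgcrypto extension enabled; not the project's
// actual schema. Only the salted hash is stored, never the plain-text password.
import { Pool } from 'pg';

const pool = new Pool(); // connection settings taken from standard PG* environment variables

// Register a user, storing a bcrypt hash produced by crypt() with a fresh salt.
async function createUser(username: string, password: string): Promise<void> {
  await pool.query(
    `INSERT INTO users (username, pass_hash)
     VALUES ($1, crypt($2, gen_salt('bf')))`,
    [username, password]
  );
}

// Check a login attempt by re-hashing the supplied password against the stored
// hash, which embeds its own salt.
async function checkPassword(username: string, password: string): Promise<boolean> {
  const result = await pool.query(
    `SELECT pass_hash = crypt($2, pass_hash) AS ok
     FROM users
     WHERE username = $1`,
    [username, password]
  );
  return result.rows.length > 0 && result.rows[0].ok === true;
}
```

The key point is that hashing happens inside the database and only the hash is stored; the optional email address plays no part in authentication itself.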

It would actually be useful for us to know more about the sectors users come from, etc., so we can ensure in the long run that we maximise accessibility for diverse groups, but we think security is more important, and so for the moment we will continue to consult directly with representatives from as many stakeholder groups as possible instead. We also don’t collect data on the inside of homes, as we think this is personal space and needs much more discussion.

How can the quality of data collected through Citizen Science be verified?

We cannot vouch for the accuracy of data; we can only provide indicators of accuracy and uncertainty that users can employ to assess the suitability of data for their own purposes - whether for a policy document or a school project. We will continue to look at ways of improving data quality. The main ways we are currently doing this are:

  • Integrating four methods of data capture: i) moderated open bulk upload from reliable sources; ii) crowdsourcing knowledge from communities and experts at building level; iii) computational generation using inference, undertaken as part of research initiatives with academic collaborators; and iv) streaming of official data.
  • Creating feedback loops between these processes, which also improve data accuracy and speed of data capture. For example, you can use automated methods to create a draft of, say, building age data - we’re doing this at present for London using vectorised historical street data - and then ask expert historians and building conservation groups to use the map as a canvas to collectively verify these entries one by one.
  • Providing a visible edit history - as in OpenStreetMap.
  • Providing verification buttons. (At the moment this button can only be clicked once per user and simply logs whether another user agrees with an edit. We still have work to do on this. For example, what happens if an entry has multiple verifications but is then edited? Though all verifications will be recorded in the edit history, the viewer will still only see the new edit with no verifications, rather than, say, a previous entry with 10 verifications, which is likely to be more reliable. This is an example of one of the hundreds of issues we grapple with that don’t have easy answers, and why working across countries and academic teams, and getting feedback from multiple sectors, is so important. A minimal sketch of this rule is given after this list.)
  • Providing source links.
  • Including uncertainty measures - e.g. earliest or latest possible dates.
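As a purely illustrative sketch (in-memory, with hypothetical names, not the platform's actual data model), the snippet below shows one way to express the "one verification per user per current value" rule mentioned above, and makes visible the open question of what happens to verification counts when an entry is edited:

```typescript
// Hypothetical, simplified model of per-attribute verification; illustrative only.
type AttributeKey = string; // e.g. "building:1234/construction_date" (made-up key format)

interface AttributeState {
  value: string;                                        // current attribute value
  verifiedBy: Set<string>;                              // user ids who verified this exact value
  history: { value: string; verifications: number }[];  // past values kept for the edit history
}

const attributes = new Map<AttributeKey, AttributeState>();

// A user may verify the current value at most once.
function verify(key: AttributeKey, userId: string): boolean {
  const state = attributes.get(key);
  if (!state || state.verifiedBy.has(userId)) return false;
  state.verifiedBy.add(userId);
  return true;
}

// Editing archives the old value with its verification count and resets the
// visible count to zero - exactly the design question raised in the bullet above.
function edit(key: AttributeKey, newValue: string): void {
  const state = attributes.get(key);
  if (state) {
    state.history.push({ value: state.value, verifications: state.verifiedBy.size });
    state.value = newValue;
    state.verifiedBy = new Set();
  } else {
    attributes.set(key, { value: newValue, verifiedBy: new Set(), history: [] });
  }
}
```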

What have been the most important lessons learned so far?

  • that everyone has knowledge of how well buildings and local areas operate and that we need to find better ways of capturing statistical data on this from multiple sources, including from citizens
  • that spatial statistics are critical when analysing building stocks as their attributes, performance and dynamic behaviour are affected by where they are built
  • that online, visual, open data platforms that map data on stocks at building level, for all buildings, can support urban problem solving in a highly efficient way
  • that the countries we work with are all interested in collecting similar types of data - though with local variations, of course - and that understanding differences and similarities is critical in advancing collaborative research into sustainability at global level
  • that an academic governance model has been shown to work well for cases such as stocks where data are highly fragmented, with knowledge and datasets held by many sectors and disciplines and by communities themselves, and where trust and strong ethical frameworks are critical
  • that large-scale datasets produced by the CCRP, combining manual and automated methods, have the potential to exploit the ability of AI to rapidly provide insights into data patterns and into potential blockers to, and facilitators of, sustainable development
  • that ethical issues relating to large-scale data collection at building level must be prioritised and more openly discussed, with significant issues foreseen in relation to the collection of open data on the interiors of homes and its future use in 3D city models and games. (Domestic buildings make up over 90% of UK properties.)

The project is now being implemented in other countries. How far does The Alan Turing Institute support rollout?

It is important to stress that members of the CCRP international research group represent academic departments that specialise in specific areas of building stock research. As such, all CCRP collaborators bring research expertise, as well as engineering skills, to the table. Our interest is in working with CCRP members to ensure that we collaboratively create a model in which CCRP members feel that the benefits of membership and international collaboration exceed the value of the effort and funding put in.

CCRP resources provided to CCRP academic partners:

  • Use of the Colouring Cities Research Programme logo
  • Inclusion on the Alan Turing Institute website
  • Free use of the Colouring Cities domain name
  • Dedicated partner page on the Colouring Cities Open Manual with editing rights
  • Access to online CCRP PI and engineering meetings managed by The Alan Turing Institute (see meeting programme below)
  • Software engineering guidance for research software engineering teams for demo platform development
  • Help-in-kind support for funding applications which relate to CCRP platform setup
  • Opportunities to co-work on platform code and share engineering expertise
  • Opportunities to co-work on content and interface design and share research and stakeholder expertise
  • Opportunities to co-work on additional platform tools as well as animations and simulations of data and 3D/4D open models
  • Opportunities to co-work on data analysis across countries and to experiment with AI and machine learning approaches
  • Opportunities to co-work on research papers
  • Opportunities for joint publicity
  • Opportunities to co-work on international funding bids

What's next for the project? Any future plans or collaborations?

We are currently working with our international partners to set up CCRP hubs, supporting Colouring Cities open data platforms, in a number of global regions. We are starting to co-work on academic papers across countries and beginning to co-work on improving core platform code. Once a sufficiently large global database has been developed, we will also be using AI and machine learning approaches to gain insights into data patterns.

The CCRP is currently managed by Turing - in 2022 under a UK Artificial Intelligence for Science and Government (ASG) grant and in 2023 under core funding. We are now beginning to help set up hubs for global regions which operate using the Turing model and simply provide informal support for academic departments within these regions wishing to reproduce CCRP open code. Each country finds its own funding, ideally starting with small amounts, with academic departments able to use platforms to answer research questions relevant to them. We are interested in agile working and in developing supportive, flexible networks, not in creating new bodies or administration unnecessarily and wasting money. We think that we can run our global hubs at minimal cost, with the academic institutions/countries supporting these initially offering some help in kind, though we will need to apply for international research grants to support the process, to allow it to be mainly self-managing while still overseen in case of issues, and to maximise research outputs and impact from it.

How difficult is it to sustain a high quality software engineering team?

High quality engineering time will be a significant expense for any data platform unless methods of sharing expertise can be found - as demonstrated by collaborative maintenance initiatives such as OpenStreetMap. However, many of the wonderful engineers we have worked with, who have been very interested in open data and academic research, do not have money as a major motivator. Furthermore, though they will of course be interested in the project, they will also have many people wanting to work with them and are likely also to want to advance their own initiatives and ideas. Research software engineers are ideal for Colouring Cities as these engineers come from academia, so the research aspect is already a priority for them. Their work is also already coordinated by academic departments, so we recommend that our CCRP partners collaborate with such teams wherever possible. The more our open code is tested, and the more countries and academic departments come on board, the more diverse the group of great engineers that becomes involved. The idea is that, though most CCRP work will involve engineers developing high quality databases for their own countries, some time can be given by each to co-work on Colouring Cities core open code. If there are 20 countries in the pool, that is potentially a lot of engineering time, and PI time, available to improve core code features that then benefit and speed up work for everyone. Furthermore, if one team in a country has questions or issues, others can help answer these. This is already happening, and the Software Sustainability Institute funded our technical architecture advisor Tom Russell to coordinate engineering discussions with CCRP international engineers last year. It is also worth noting that Colouring Bogota is testing a fantastic model driven by engineering students.

Is the Colouring Cities Research Programme (CCRP) there to support the Colouring London project, or how would you describe the relationship between the two?

The CCRP has evolved out of the development of Colouring London. Colouring London has to date operated as the open code prototype for CCRP international partners. However, from this year we are changing this model. Colouring London will be renamed and expanded to become Colouring Britain. Core code from Colouring London has now been copied into a new repository called core-colouring cities. Features and issues in this GitHub repository will now be co-worked on by engineers from all participating CCRP countries. Turing will continue to oversee code additions and changes.

You want to use Artificial Intelligence in the future - what will that look like?

Our engineers are already beginning to test the use of AI in the development of our open code, and this year we will begin to develop algorithms to support rapid, large-scale inference of specific types of data based on footprint size, shape and configuration. What we are really interested in, in future, is working with academic partners to use AI to begin to identify underlying patterns and relationships within the data. But this can only happen once we have collected enough data, within and across countries, for such patterns and cycles to be identified. This is what CCRP platforms are set up, in this first stage, to do. We’re interested in, for example, better understanding the relationship between building form and construction and urban health; looking at deprivation cycles in urban areas over more than 100 years and identifying whether areas of high deprivation largely remain in the same areas (as they do in London) despite a century of policies; beginning to understand why retrofit is occurring more rapidly in some geographic locations than others; and seeing which patterns are common across countries and which are specific to a single nation. But we also need to do this in a way that always maximises the security and privacy of international platform users and of building occupiers. So a lot more discussion is needed, and a lot more work undertaken, around ethical issues as we go.