E1. DATA - colouring-cities/manual GitHub Wiki

Live Editor polly64.

Introduction

The Colouring Cities Research Programme (CCRP) develops public databases and data capture platforms that collect, collate, visualise, verify and release high quality spatial data on national building stocks. Data are supplied by diverse stakeholders, including academia, industry, government, the third sector, and citizens, using a range of data capture methods. The aim is to provide the minimum amount of data required to support UN Sustainable Development Goals and the UN New Urban Agenda. Personal data are not collected.

Data categories, classes and formats are standardised across national platforms to allow comparative analysis of many types of datasets. Standardisation also enables very large-scale datasets to be generated, so that AI and machine approaches can be used to gain insights into the stock as a complex dynamic system and to help identify cycles and patterns, underlying rules of dynamic behaviour, and sustainable trajectories.

Open code for core repository data classes is managed by The Alan Turing Institute and contributed to by CCRP academic partners. Open code for local data classes, i.e. those specific to individual countries or regions within them, is managed by academic partners within country level CCRP repositories.

CCRP platforms:

use a common language and promote a common vision
use a standard interface that maximises access to over 100 classes of spatial building attribute data
share information consistently across the CCRP system
test and integrate a range of data capture methods, and test feedback loops between these, to improve data accuracy and maximise stakeholder engagement
use consistent validation rules (i.e. the data type needs to be a date, text, integer, part of a list, etc.) and consistent metrics and formats
explore the relationships between attributes, and use attributes to infer others
experiment with a range of data verification methods
prioritise the well-being and security of platform users and building occupiers in data collection
are never used to collect personal data.
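
The consistent validation rules mentioned in the list above (a value must be a date, text, an integer, part of a list, etc.) can be illustrated with a minimal sketch. The field names, the example roof type list and the rule table below are assumptions made for illustration only; they are not taken from the CCRP code base.

```python
from datetime import date

# Assumed example list for a "part of a list" rule; not the CCRP list.
ROOF_TYPES = {"Flat", "Pitched", "Other"}


def is_year(value):
    """True for an integer year between 1 and the current year."""
    return (isinstance(value, int) and not isinstance(value, bool)
            and 1 <= value <= date.today().year)


def is_positive_number(value):
    """True for a positive integer or float (booleans are rejected)."""
    return (isinstance(value, (int, float)) and not isinstance(value, bool)
            and value > 0)


def in_list(allowed):
    """Build a rule that accepts only values from a fixed list."""
    return lambda value: value in allowed


# Hypothetical rule table: field name -> rule.
RULES = {
    "construction_year": is_year,
    "height_m": is_positive_number,
    "roof_type": in_list(ROOF_TYPES),
}


def validate(record):
    """Return the sorted names of fields in `record` that break their rule."""
    return sorted(name for name, value in record.items()
                  if name in RULES and not RULES[name](value))
```

Calling `validate({"construction_year": "1901", "height_m": 9.5, "roof_type": "Flat"})` would flag only `construction_year`, because the year was supplied as text rather than as an integer.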

Data quality criteria and standards

The CCRP follows the open data standards and frameworks listed below, against which quality is measured, in order to maximise trustworthiness, accuracy and suitability for purpose:

The Open Data Charter (https://opendatacharter.net/principles/)

Open by default
Timely and comprehensive
Accessible and useable
Comparable and interoperable
For improved governance and citizen engagement
For inclusive development and innovation

FAIR Guiding Principles for scientific data management and stewardship; how to go FAIR:

Findability
Accessibility
Interoperability
Reusability

The Locus Charter

Subject area: Geospatial data. Oversight: formed through collaboration between the Benchmark Initiative (Ordnance Survey and Omidyar Network) and the American Geographical Society's EthicalGEO program. https://ethicalgeo.org/wp-content/uploads/2021/03/Locus_Charter_March21.pdf

UK Data Quality Dimensions. These are:

Accuracy: when data reflects reality e.g. correct names/addresses that are up-to-date;
Completeness: when all data for a particular use is present and able to be used;
Uniqueness: when data appears only once in a dataset and duplication is avoided;
Consistency: when data values do not conflict with other values within a record or across datasets;
Timeliness: the time between when information is expected and when it is readily available for use;
Validity: the extent to which the data conforms to the expected format, type and range, e.g. email addresses having an @ symbol.

Collaborative work by CCRP international partners on the quality and standardisation of data categories and formats across CCRP platforms is ongoing. Further information on data capture methods and cross referencing is available below.

Data Divisions

Building attribute data captured by the CCRP sits within the wider domain of physical infrastructure data. Main categories have been chosen to maximise ease of access for all stakeholders, and be flexible enough to allow ongoing adjustments prompted by results of ongoing platform testing/stakeholder feedback. Main categories have been selected and structured in such a way as to meet the following requirements identified over the lifespan of the research programme:

To provide comprehensive data on national stocks (composition, performance, and short and long-term dynamics) that are accessible, and relevant, to academia, government, industry, the third sector, and citizens;
To develop a multifunctional platform that not only provides open building attribute data but also supports stakeholders and communities as a whole to work collaboratively to accelerate the move towards better quality, zero carbon stocks and to support disaster situations;
To test and integrate global classification systems wherever possible to maximise platform usefulness and data interoperability/sharing/accessibility across countries;
To increase understanding of the stock as a complex dynamic system;
To better understand the underlying relationships between individual building attributes, and between building typologies and their location.

Three divisions of building attribute data are collected. These relate to:

i) The current composition of the stock and spatial location of building typologies
ii) Building quality and performance
iii) Building lifespan and the short-term and long-term dynamic behaviour of stocks.

Division 1: Data on the current composition of the stock

Division 1 comprises data on the characteristics and spatial location of all buildings within the current stock, to build a complete picture of current stock composition. Data are collected on building coordinates and IDs, typologies, land use, ownership type, age, size, and construction system and materials.

Division 2: Data on performance, state and quality (socio-cultural, economic and environmental)

Division 2 comprises data on how well these buildings work in socio-cultural, economic and environmental terms. This information is essential to improve monitoring and decision-making relating to stocks in the context of global sustainability goals (see Sections A and B). Data collected include those relating to energy performance, to community views on how well buildings work at local and city level, to professional awards, and to those (current and historical) involved in a building's design and development, to incentivise developers to construct better performing/longer lasting buildings. New types of dataset (inferred from other data collected) are also proposed. These include ratings for adaptability and potential for lifespan extension, repairability, and ease of retrofit. Division 2 also addresses the need for live capture of data on the structural state of buildings in emergency situations caused by natural hazards and climate change (e.g. earthquakes, wildfires, flooding) or by human interventions such as bomb explosions/war.

Division 3: Data on the dynamic behaviour of typologies and sites

Division 3 comprises data on historical change in stocks to enable current change to be more easily tracked (What? Where? How much? How fast?), and cycles, patterns and constraints/opportunities for the future to be better understood. It is critically important to understand the operation of the stock as a system, and the underlying rules of change in relation to its component parts, in order to improve long-term strategies and to provide both data and rules for stock forecasting models that are as accurate as possible. Data collected include short-term dynamics data, under planning, on new builds and demolitions; designation data (indicating locations with development constraints/where churn in the stock is consciously slowed); and long-term dynamics data on lifespans of all demolished buildings on sites (involving the capture from historians of construction and demolition dates, and location information, for all demolished buildings ever built), from which typology/building survival rates and reasons for resilience/vulnerability can be assessed. Information on dynamic characteristics (and anticipated future dynamic behaviour) of the urban tissue (i.e. the type of building-plot-street network configuration) is also captured.

The 12-category-grid

Data relating to these three 'Divisions' are grouped within 12 main data categories. These represent the smallest number of categories found to comfortably structure the 100+ classes of data collected by CCRP platforms (required to help meet objectives set out under 'Divisions'), and additional data classes as and when needed, while still allowing the categories to be configured in a clear, simple, visually attractive way. The 12 data categories are grouped in a simple, coloured, 4 x 3 grid, which also forms the CCRP logo and operates as an interactive keypad on the user interface. Each main category contains a series of subcategories within which the data classes sit. CCRP main categories are as follows:

'Location'
'Land Use'
'Typology'
'Age & History'
'Size'
'Construction'
'Street Context'
'Team'
'Planning Controls'
'Energy Performance'
'Resilience'
'Community'

The relationship of Divisions to Main Categories is as follows:

Current stock composition data: Found under 'Location', 'Land Use', 'Typology', 'Age and History', 'Size', 'Construction' & 'Street Context'
Building performance, state & quality data: Found under 'Team', 'Energy Performance', 'Resilience' & 'Community'
Dynamic behaviour data: Found under 'Planning Controls' and 'Age and History'.
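
As a rough illustration, the relationship of Divisions to main categories can be held as a simple lookup table. This is only a sketch: the dictionary and function names are hypothetical, and the category names follow the 12-category grid listed earlier ('Planning Controls' is the grid name for the planning category).

```python
# Division -> main categories, following the mapping described in the text.
DIVISION_CATEGORIES = {
    "Current stock composition": [
        "Location", "Land Use", "Typology", "Age and History",
        "Size", "Construction", "Street Context",
    ],
    "Building performance, state and quality": [
        "Team", "Energy Performance", "Resilience", "Community",
    ],
    "Dynamic behaviour": ["Planning Controls", "Age and History"],
}


def divisions_for(category):
    """Return every division that draws on the given main category."""
    return [division for division, categories in DIVISION_CATEGORIES.items()
            if category in categories]


def all_categories():
    """Return the distinct main categories in first-seen order."""
    seen = []
    for categories in DIVISION_CATEGORIES.values():
        for category in categories:
            if category not in seen:
                seen.append(category)
    return seen
```

Note that 'Age and History' serves two divisions, so `divisions_for("Age and History")` returns both the composition and the dynamic behaviour divisions, while `all_categories()` still yields the 12 grid categories.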

The coloured grid offers a simple, accessible portal to a vast array of spatial data. The grid has undergone many iterations since 2016 with main categories fine tuned and data classes added.

To maximise efficiency and clarity, the CCRP looks to identify and provide access to the minimum number of data classes required to support UN Sustainable Development Goals and the UN New Urban Agenda, and to enable cross country analysis. An unexpected degree of agreement on the categories and classes required to achieve this has been noted during consultation. Consultation has also identified the need, within each platform, for additional bespoke data classes to be included to accommodate local variations. The format has proven to be very flexible, with restructuring, renaming, reordering, and merging of main categories, and the relocation of classes across categories, found to be very straightforward. Only in a few cases has it been seen as necessary to duplicate classes across categories for ease of use. Though main category titles have remained relatively constant since project conception in 2016, CCRP partner discussion has resulted in some adjustments being made. These include the merging of 'Age' and 'Dynamics' under 'Age and History' (recommended by Colouring Australia/UNSW); the splitting of the 'Sustainability' section into 'Energy Performance' and 'Resilience' (recommended by Colouring Germany/IOER); and the reverting to the original 'Street Context' from 'Streetscape' (recommended by Colouring Indonesia/King's College).

The method used for main category and data class choice is as follows:

Initial assessment of existing UK public mapping platforms providing data on the building stock (2016);
Review of over 300 academic papers and documents produced within the context of urban science, sustainability science, urban morphology, urban theory, planning, building conservation, urban history and UK building data classification (2014-2019);
Consultation with over 60 stakeholders, including two public exhibitions and a community testing programme (2016-19);
In-house platform category testing and adjustment (from 2018);
Ongoing discussion with CCRP international partners, and national and international organisations/agencies (from 2018).

The 12-category-grid is also designed to be able to be nested within more comprehensive grids made up of an unlimited number of cells to provide a simple portal to other types of relevant open data. An example is shown below.

[Image: big picture 2]

CCRP data classes

Within the 12 subject categories, over 100 classes of spatial data are collected at building level across all countries. These are considered 'universal' classes. In addition, CCRP open source code may be produced at country level to support national/regional variations or required additions. This code may also be reproduced by other platforms where relevant.

Classes include quantitative data such as building height (m), storeys (no. of storeys), footprint area (m2), road width (m) and lifespan (no. of years); categorical data such as building name, builder's or architect's name and land use; and qualitative data (non-narrative/statistical only) such as how well people think a typology works, here using dropdowns or Yes/No options.

Though classes may be easily added to, reordered, renamed, and repositioned within new main categories, changes to the questions posed and to data formats need to be limited as much as possible to prevent data wastage. To achieve this, extensive consultation on each additional class has to be undertaken with experts in the relevant field. This helps maximise the ongoing usefulness of, and continuity of formatting for, all classes of data.

Below are the classes of data for which CCRP code is available, together with their data formats. Data on source type, source link/s and number of verifications are also collected for every class.
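
One way to picture how a value is stored together with its source type, source links and verification count is the small record type below. It is a sketch under assumptions: the class name `AttributeValue`, its field names and the example values are hypothetical and do not describe the CCRP database schema.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class AttributeValue:
    """One data class value for one building, with provenance metadata."""
    data_class: str                      # e.g. "Building height (m)"
    value: Any
    source_type: Optional[str] = None    # assumed free label, e.g. "Survey"
    source_links: List[str] = field(default_factory=list)
    verifications: int = 0

    def verify(self):
        """Record one more independent verification and return the new count."""
        self.verifications += 1
        return self.verifications


# Usage: a height value verified by two users.
height = AttributeValue("Building height (m)", 12.5, source_type="Survey")
height.verify()
height.verify()
```

Keeping the provenance fields next to the value means every data class carries the same three pieces of metadata, whatever its format.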

Single function and multifunctional categories

The initial seven categories in the 12-category-grid can be described as single function categories and contain data relating to current building characteristics and building context. 'Location' data are placed first in the grid as they are required to geolocate all other attribute data; without building footprint geometries and coordinates, Colouring Cities platforms cannot be built. Placed second in the grid is current 'Land Use', noted during UK prototype consultation to be the most commonly used building attribute data type. Land use is followed by 'Typology' and then 'Age and History'. Data relating to both categories are predicted to increase in value owing to their importance in inferring 3D form and detail, and building resilience and lifespan. Building age, and information on demolished buildings, can also be used in data animations showing changes to the age and form of the stock over time. These are followed by 'Size' and 'Construction', with data on size, construction and materials helping to flesh out the detail of each building's physical form. 'Street Context' completes the single function categories, providing data on the relationship between the building, plot and street.

The remaining five categories, outlined in black below, operate in a different way. Each provides attribute data but also has one or more additional functions used to help promote sustainability goals. These (along with the integration and testing of multiple data capture methods, and of feedback loops between these) distinguish Colouring Cities platforms from other building attribute databases and visualisation platforms. Platforms are also designed to operate in future as digital twins, within which a dynamic connection exists between the digital platform and the stock itself to help drive the sustainability agenda, and where data can be rapidly turned into insights that improve decision making. Examples of multifunctionality within categories include spatiotemporal tracking of building quality produced by individual developers to drive up building standards ('Team'); streaming of the status of planning applications/current change, and live engagement of communities with the planning system ('Planning Controls'); live capture of the structural status of buildings in emergency situations ('Resilience'); and tracking of losses/gains to community space over time ('Community').

[Image: data category mixed]

Each of these categories is described briefly below:

1. Location

https://colouringlondon.org/view/location

Current Location data subcategories: Addresses; Property/footprint IDs and Coordinates
Data classes: Building name (domestic); Building name (non-domestic); Number; Street name; Address line 2; Town/City; Area code; Building footprint ID; Property Reference Number; OpenStreetMap ID; Centroid coordinates; Alternative building footprint links
Security and privacy concerns: High

Building footprints (in the form of vectorised polygons) provide the basic building blocks for all Colouring Cities platforms. These act as mini filing cabinets, as in other types of Geographic Information System (GIS), within and through which attribute data can be collected and stored, collated, corrected, visualised and disseminated. Coloured footprints, rather than coloured points, also make data much easier for platform users to read, and facilitate community discussion on data patterns, which can supplement computational approaches. Building polygons/footprints provide information/data on spatial location, building geometry, ground floor/total area, building perimeter, and number, size and shape of walls. They can also be used to help infer other characteristics, such as age, height and 3D form. Extrusions from footprints can also be used to generate simple 3D models.

Footprints are also necessary to undertake detailed spatial analysis and modelling of the stock. To ensure data captured with Colouring Cities platforms are of maximum use in understanding the relationship of the physical form of cities to socio-economic and environmental performance, additional types of spatial reference data are also required. These include centroid coordinates for footprint polygons, unique property reference numbers where applicable, building and street number and area code information, national mapping agency polygon IDs, and OpenStreetMap IDs. In Colouring Cities platforms all are collected to maximise interoperability and usefulness of building attribute data. Spatial reference data also have other uses, such as enabling basic features within the user interface, for example a zoom facility to specific buildings of interest.
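
To make concrete how a footprint polygon yields area, perimeter, wall count and centroid coordinates, the sketch below applies the standard shoelace formula to a simple polygon. It assumes a single ring without holes, with vertices given as (x, y) pairs in a projected coordinate system measured in metres; the function name and the output keys are illustrative only.

```python
import math


def footprint_metrics(ring):
    """Area, perimeter, wall count and centroid of a simple polygon.

    `ring` is a list of (x, y) vertices in metres; the first vertex is
    not repeated at the end.
    """
    n = len(ring)
    twice_area = 0.0
    cx = cy = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]          # wrap round to close the ring
        cross = x0 * y1 - x1 * y0           # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    if twice_area == 0:
        raise ValueError("degenerate footprint with zero area")
    signed_area = twice_area / 2.0          # sign depends on vertex order
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    return {
        "area_m2": abs(signed_area),
        "perimeter_m": perimeter,
        "walls": n,
        "centroid": centroid,
    }
```

For a 10 m by 5 m rectangle, `footprint_metrics([(0, 0), (10, 0), (10, 5), (0, 5)])` gives an area of 50 m2, a perimeter of 30 m, four walls and a centroid at (5, 2.5). Multiplying the area by a building height would give the volume of a simple extruded 3D model.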

The more accurate and comprehensive the footprint data, the greater the usefulness of the data. The highest quality building footprints are commonly held by national mapping agencies. Where access to national mapping data is restricted, charged for, or not available, other open datasets will need to be used. In recent years open footprints have become more widely available owing to: the release, in a number of countries (mainly since 2015), of open property tax datasets; increased interest from OpenStreetMap in collating building footprint datasets; and open datasets now produced by Microsoft and Google using satellite imagery and AI, producing block level footprints at global scale. This has radically changed the level of interest in granular building attribute data and significantly increased opportunities to understand and analyse stocks. It has also raised a number of security and privacy issues which now need to be addressed.

Building level footprints are only used in CCRP platforms to capture, collate, verify and visualise non-personal open attribute data at building level. CCRP platforms do not release open footprint datasets at building level. This is due to security and privacy concerns regarding the use of footprints to collect and visualise personal data relating to building occupants, e.g. relating to income or health. Colouring Cities national platforms are asked to keep building level footprints behind firewalls, regardless of whether these are released by public bodies as open data. Data relating to the interior of buildings are also considered by the CCRP to be private. Visualisation at building level of, for example, energy ratings, energy use and heat loss is seen as a grey area requiring further discussion, with visualisation currently recommended at block rather than building level, again even where these data are publicly released at building/property level.

Security and privacy issues also exist in terms of the crowdsourcing of address data. Free text boxes are not permitted on CCRP platforms where linked to a building location. Address data may therefore only be entered as moderated bulk uploads/live streams.

Specific research questions of interest in relation to the capture of location data include: How can we address privacy and security concerns regarding potential use of open building footprints at building level to capture and visualise personal data? What types of open location data are needed to maximise the potential for multidisciplinary/multi-sector analysis? What computational approaches are most useful in increasing access to high quality open building footprints? For further information on sources of building footprints used in CCRP platforms please see here.

[Image: location]

2. Land Use

https://colouringlondon.org/view/use

Current Use data subcategories: General land use; Specific land use
Data classes: Residential/Non-residential or Mixed Use; Specific land use activity
Security and privacy concerns: Medium to low

Land use is one of the most commonly used types of data in urban analysis and is used in many areas of research and practice, including planning, energy, housing, property taxation and economic analysis. However, comprehensive open land use data are still rarely available at building level other than where property tax datasets have been released. Data on mixed-use buildings are also particularly difficult to access.

This category captures data on the current type, or types, of activity occurring in each building. Owing to the fact that housing will be the dominant use in all CCRP platforms (making up for example over 93% of taxable properties in a country such as the UK) a default setting of 'unverified residential' is initially used.

Land use classification systems vary from country to country, with in some cases multiple systems operating simultaneously. The CCRP uses a generalised land use taxonomy that enables land use data to be shared across all countries as part of core code, as well as country specific land use classifications. Generalised land use classes are as follows: Unverified Residential; Verified Residential; Retail; Industry and Business; Community services; Recreation & Leisure; Transport; Utilities & Infrastructure; Defence; Agriculture; Minerals; Vacant and Derelict. An 'Unclassified presumed non-residential' class is also included to allow for rapid estimation/draft colouring of non-residential areas using computational approaches (method to be added).
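
The generalised taxonomy and the 'unverified residential' default described above might be represented as follows. This is a sketch only: the constant and function names and the `general_land_use` record key are hypothetical, and the grouping of the class names (for example treating 'Industry and Business' as a single class) is an assumption about how the list in the text should be read.

```python
# Generalised land use classes as listed in the text (grouping assumed).
GENERAL_LAND_USES = [
    "Unverified Residential",
    "Verified Residential",
    "Retail",
    "Industry and Business",
    "Community services",
    "Recreation & Leisure",
    "Transport",
    "Utilities & Infrastructure",
    "Defence",
    "Agriculture",
    "Minerals",
    "Vacant and Derelict",
    "Unclassified presumed non-residential",
]

DEFAULT_LAND_USE = "Unverified Residential"


def general_land_use(record):
    """Return a building's general land use, applying the default if unset."""
    value = record.get("general_land_use")
    if value is None:
        return DEFAULT_LAND_USE
    if value not in GENERAL_LAND_USES:
        raise ValueError(f"unknown general land use: {value!r}")
    return value
```

A building with no land use entered is therefore treated as 'Unverified Residential' until a contributor supplies a value from the list, and values outside the list are rejected rather than stored.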

Security issues relating to the availability of land use data at building level mainly relate to the location of utilities. For this reason utilities are visualised under a single 'Utilities and infrastructure' heading.

Specific research questions of interest in relation to land use data include: How much land is used in a city or country for a specific type of activity? Can certain mixtures/configurations of land uses in specific locations help make cities more efficient, resilient and/or sustainable? Does loss of these mixtures have the opposite effect? Are specific land use patterns consistent across cities, and if so how can rules be developed to improve the accuracy of land use forecasting models?

[Image: current use]

3. Typology

https://colouringlondon.org/view/type

Current Typology data subcategories: Base type; Local typology; Original building use; Roof type; Adjacency/position; 3D open procedural model link
Security and privacy issues rating: Low

The type of activities, and number of people, a building is designed to hold, as well as the social and technical context of the historical period in which it is built, will all affect its 3D form. Building typology relates to the way buildings are grouped according to their use, shape and size/morphology, and/or architectural style. Some typological descriptions will be common to all countries, e.g. detached low-rise buildings, and some specific to only one, e.g. the Victorian terraced house (specific to Britain).

Geolocated data on building typologies can allow the stock to be quickly divided according to similar characteristics, and for 3D form, rough dimensions/volumes, relationship to adjacent buildings, roof shape, materials and methods of construction to all be inferred.

This allows for computational approaches to be applied relatively easily to accelerate data coverage, especially when footprint and age data are also available. Typology data is also particularly valuable when combined with lifespan data (see 'Dynamics') in allowing both vulnerable and resilient typologies to be geolocated, for the depletion of finite typology reserves to be more easily monitored, for appropriate methods of retrofit (and retrofit budgets) to be more precisely targeted, and for the potential for buildings to adapt and extend their lifespan over long time periods to be assessed. The development of typology 'rules' as part of this process is particularly relevant to the generation of 3D and 4D rule-based open simulation models designed to test planning and energy scenarios (see also Section U.3 and Section x). The typology section currently looks to integrate a range of typology descriptions in order to allow for typology classification across countries, and also at local level. These are mainly drawn from urban morphology and are discussed further in section x. Further information on the development of the Colouring London Type category, and issues relating to access to open Type data in the UK, can be found in Appendix 4(3).

Dynamic tissue classification: The 'Dynamic tissue' classification builds on Brenda Case Scheer's work in urban morphology, which associates specific attributes and dynamic qualities with specific configurations of street, plot and building, though tissue types are renamed (see Section x). The tissue types are seen as having great potential to allow relative rates of change between specific, easily locatable types of urban tissue to be predicted, improving the accuracy of forecasting models and simulations.

The three dynamic tissue types for which data are collected are: i) 'Old street network' tissue, represented by the oldest/preindustrial street networks along which footfall/trade has passed for the greatest number of years, and where in cities such as London the bulk of commercial and mixed use lies (image top); ii) 'Street infill' tissue, represented mainly by domestic buildings (which comprise the bulk of the stock) and which fills in the gaps formed by the 'old street network' routes (image middle); and iii) 'Large parcel' tissue, represented by large land parcels (e.g. >5,000m2) which often contain their own access routes, as used by industrial estates, housing estates, hospitals, large schools, airports, depots, universities etc. These have also been shown in Colouring London research (and by Stanilov and Batty, Whitehand and others) to have different locations, and rhythms of change/demolition, depending on their land use (image bottom). An example of the location of 'Old street network' tissue for London is shown below.

[Image: elastic tissue]

Further information on the development of the Colouring London Dynamics category, and issues relating to access to open Dynamics data in the UK, can be found in Appendix 4(11).

[Image: Type]

4. Age and History

https://colouringlondon.org/view/age

Age and history is split into 3 sections:

Building age: which includes Main construction date; Earliest and latest construction dates; Extension dates; Facade date; Cladding date (if applicable); Last retrofit date and Historical source links
Lifespan data: which captures pairs of demolition and construction dates of all buildings ever built on a site, as well as links to historical information
Survival data: which compares current and historical building footprints to allow capture of information on the number of buildings surviving from specific time points.
Security and privacy issues rating: Low

Building Age: Age data forms one of the most important datasets in the Colouring Cities platform and is the data type, along with building footprint data, on which most time has been spent during prototype development. In Colouring Cities platforms, core construction date is the main type of data captured, though facade date, cladding date and main extension date are all also sought. Importantly, age is collected by individual year (with uncertainty accommodated through capturing earliest and latest possible dates). This has been found to be extremely important in addressing current issues with the lack of interoperability of age data, which arise from frequent capture using different date intervals (i.e. morphological periods, building regulation dates etc.). Interoperable age data are essential when analysing age within, and across, countries, and when inferring building characteristics.
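
The interoperability point can be shown with a short sketch: a date captured by year, with earliest and latest bounds, can be assigned to any interval scheme after the fact, whereas a date captured only as an interval cannot be converted to a different scheme. The function name and the second period in the usage example are hypothetical; only the Victorian range (1837-1901) comes from the text.

```python
def assign_period(earliest, latest, periods):
    """Return the label of the period containing the whole date range.

    `periods` is a list of (label, start_year, end_year) tuples with
    inclusive bounds. None is returned when the uncertainty range is not
    contained in any single period of the scheme.
    """
    if earliest > latest:
        raise ValueError("earliest year must not be after latest year")
    for label, start, end in periods:
        if start <= earliest and latest <= end:
            return label
    return None


# Usage: one record re-binned under a scheme chosen after data capture.
# The 'Early 20th century' range is an invented example.
scheme = [("Victorian", 1837, 1901), ("Early 20th century", 1902, 1939)]
label = assign_period(1890, 1895, scheme)
```

Here `label` is 'Victorian', while a building dated 1895 to 1905 would return None because its uncertainty range straddles two periods in this scheme.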

Capturing data by year significantly increases the usefulness of the data. The more precise the dating, the more accurate other inferred information about the building is likely to be. Along with footprint data, spatial age data captured by year can be used to generate 3D typology information, and to indicate materials, construction methods and standards, building dimensions, original interior layouts, roof shapes and even original internal and external detailing, with greater precision than by interval or using typology descriptions alone. A Victorian terraced house in Britain built in 1837 or 1901 will, for example, have a different range of possible detail and 3D form. Age data can also be used to infer protection/designation constraints, and potential for lifespan extension/adaptation within plots. It also provides essential baseline data from which lifespans of both extant and demolished buildings, and vulnerabilities and survival and urban metabolism rates, can be calculated. Age data has many other applications, ranging from the geolocation of potential health hazards in housing (e.g. toxic materials, damp, steepness of stairs), to the development of resilience to demolition in stocks (see also Section U.4 and Section x for further discussion) and risk assessment re earthquake damage (ref GEM). Facade data, when mapped against core construction date, can also indicate the extent of coring out/rapid energy and waste flows occurring in historic centres (through facade-only retention), which cannot be discerned at street level. Age data collected within Colouring London has been evaluated in academic papers in the context of 3D procedural model generation (https://www.tandfonline.com/doi/abs/10.1080/17567505.2018.1517142) and energy analysis (https://journal-buildingscities.org/articles/10.5334/bc.52/print/).
Methods of inferring building age using historical street network data, and the crowdsourcing of building age in collaboration with the historic environment sector are also discussed in Sections x and x. Further information on the development of the Colouring London Age category, and issues relating to access to open Age data in the UK can be found in Appendix 4(4).

[Image: age top]

Lifespan data: Lifespan data is extremely difficult to capture. It is however extremely important in understanding typology resilience and in the calculation and prediction of material energy flows (see Section B). In Colouring Cities platforms, construction and demolition dates are collected at building level from those with historical knowledge of buildings and localities, with a view to data capture for all demolished buildings ever constructed on each site. Lifespans are automatically calculated once construction and demolition dates are entered. Uncertainty measures (i.e. earliest and latest possible dates) are included. Weblinks provide further (generally text/image based) information about each site. Users are also asked to state whether demolished buildings were completely contained within the present-day plot boundary and, if not, to suggest approximately how much of each would have been contained, as earlier buildings would sometimes have spanned multiple contemporary plots. Lifespan data can be used to update the life expectancy information for typologies/age cohorts within the 'Sustainability' category, to allow less resilient typologies and locations to be identified. NB Typology/historical land use subcategory also needs to be added. This category is also designed to engage and celebrate the vast body of knowledge on stocks and their history held by historians and the community, and to use this to inform scientific research.
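
The automatic lifespan calculation, including the earliest/latest uncertainty bounds, could look roughly like the sketch below. This is an assumption about one reasonable way to do it, not a description of the platform's actual code; the function name and the tuple layout are hypothetical.

```python
def lifespan_years(construction, demolition):
    """Return (shortest, longest) possible lifespan in years.

    Each argument is an (earliest_year, latest_year) pair. The shortest
    lifespan pairs the latest construction date with the earliest
    demolition date; the longest pairs the earliest construction date
    with the latest demolition date.
    """
    c_early, c_late = construction
    d_early, d_late = demolition
    if c_early > c_late or d_early > d_late:
        raise ValueError("earliest year must not be after latest year")
    if d_late < c_early:
        raise ValueError("demolition cannot precede construction")
    shortest = max(0, d_early - c_late)   # overlapping ranges floor at 0
    longest = d_late - c_early
    return shortest, longest
```

For a building constructed between 1880 and 1890 and demolished between 1960 and 1965, `lifespan_years((1880, 1890), (1960, 1965))` returns (70, 85); when both dates are known exactly, the two bounds coincide.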

dynamics 2

Survival data: As well as collecting geolocated lifespan data, additional data are sought to support the analysis of survival/mortality rates in relation to original stock reserves (e.g. the number of Victorian buildings (1837-1901) surviving compared to all Victorian buildings ever built) to inform resilience strategies and identify areas of vulnerability and wastage within stocks. Spatial data are sought on percentage survival of original age/typology cohorts, and on their exact location. This section provides an option for users to overlay current polygons onto historical maps (raster format initially) and for surviving/matching buildings to be colour coded. This can be done manually by amateur or professional historians checking the age of buildings from plan shape and current streetview images. Plans to use computer vision to generate vectorised footprints (and road networks) from historical maps, currently being discussed within the Turing Computer Vision and Digital Heritage Special Interest Group and Turing's Living with Machines programme (https://livingwithmachines.ac.uk/computer-vision-for-digital-heritage/), would allow for rapid matching and counting of footprints/buildings, with historians checking polygons where matches could not be found or where errors were thought to have occurred. This feedback loop would also help improve computational methods. Below (top) is an example of current footprints overlaid onto a historical map, with coloured dots, rather than coloured polygons as proposed, used to show matching/non-matching of polygons. Below this is an example of current footprints coloured in by polygon, with purple here denoting buildings demolished since 1960, and grey surviving buildings.
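As an illustration of the percentage survival measure for an age cohort, the following sketch assumes each building record holds a construction year and, where applicable, a demolition year. The record keys (`built`, `demolished`) and the function name are hypothetical, and the sample records are invented for illustration only.

```python
from typing import Iterable, Optional


def cohort_survival(buildings: Iterable[dict], start: int, end: int) -> Optional[float]:
    """Percentage of buildings constructed in [start, end] that still survive.

    Returns None when no buildings in the cohort have been recorded.
    """
    cohort = [b for b in buildings if start <= b["built"] <= end]
    if not cohort:
        return None
    # A building with no recorded demolition year is treated as surviving.
    surviving = sum(1 for b in cohort if b.get("demolished") is None)
    return 100.0 * surviving / len(cohort)


# Invented records, for illustration only.
records = [
    {"built": 1840, "demolished": None},
    {"built": 1865, "demolished": 1962},
    {"built": 1880, "demolished": None},
    {"built": 1899, "demolished": None},
    {"built": 1935, "demolished": None},  # outside the Victorian cohort
]
victorian_survival = cohort_survival(records, 1837, 1901)
```

The result is only as complete as the record of demolished buildings: if demolished buildings are missing from the data, the survival percentage will be overstated.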

5. Construction

https://colouringlondon.org/view/construction

Current Construction data subcategories:

Core material; Secondary materials; Main roof covering; Construction system; Foundation type

Security and privacy issues rating: Medium

In the Construction category, data on core construction systems, construction materials and secondary materials, roof coverings, and foundation types are collected. Here Global Earthquake Model (GEM) classifications are currently being integrated. For vintage cohorts, simpler construction systems will often mean that construction characteristics and materials can be relatively easily inferred. However, for post-war buildings, where rapid developments in technology have resulted in much greater diversity of materials and systems, more detailed information on construction systems and materials for individual buildings is required. (NB extensions and additions also need to be addressed; ADD on this.) Data are particularly important for understanding embodied energy in stocks and calculating energy and material waste flows; managing housing supply in terms of anticipating resilience to system failure/demolition; assessing potential risk (and facilitating live data capture) with regard to earthquakes and to climate change related events (e.g. extreme heat, fire, flooding); and assessing community well-being in terms of risk to health (from, say, material toxicity) or community displacement (from demolition of unsafe stock). Further information on the development of the Colouring London Construction category, and issues relating to access to open Construction data in the UK can be found in Appendix 4(5).

construction

6. Size

https://colouringlondon.org/view/size

Current Size data subcategories:

Storeys-Core number; Storeys-basements; Storeys; Attic; Height to apex; Height to eaves; Floor area - ground; Floor area - total; Frontage width; Opening area; Number of units

Security and privacy issues rating: Low

In the Size category, data collected (metres) includes building height (apex and eaves), storeys, floor area, frontage width, and number of units. These support sustainability research by increasing accuracy in calculations of energy use/building volume, embodied energy and material waste and energy flows through demolition etc. Size data also allows for the quantification and geolocation of domestic and non-domestic floorspace available for specific activities, critical to urban analysis and monitoring, and building valuation. Frontage width is also collected as it is useful for studies relating to accessibility, and permeability of the urban tissue (add ref). Further information on the development of the Colouring London Size category, and issues relating to access to open Size data in the UK can be found in Appendix 4(6).

size

7. Street Context

https://colouringlondon.org/view/streetscape

Current Streetscape data subcategories:

Distance from public green space; Distance to closest tree in street; Number of trees in garden?; Green walls/roof; Total area of plot; Plot dimensions; FAR ratio; Plot geometry link; Land ownership parcel link; Land ownership type (overlap with 'Community'); Street width; Average width of pavement; Street network geometry link

Security and privacy issues rating: Low

Streetscape data, in the first instance, provides information on the context of buildings, including proximity to greenery and greenspace; distance to, and average widths of, pavements and streets; garden size, and percentage built on; modal height of buildings; and block permeability based on number of doors opening onto the street front (check if we want this). Links to open plot geometry and land parcel geometry are also included within this section. These are important in the analysis of urban density, accessibility and the potential of buildings/sites/typologies for adaptation within plot constraints (see Section x). Land parcel and plot data are also important in inferring dynamic tissue type (see 'Dynamics'), which provides information on likely relative rates of change, constraints on adaptability and potential resilience to demolition. Street network data are needed for procedural typology generation, and to measure accessibility from the street and capacity for adaptability/densification/resilience (add refs). Street networks can also be used to infer land use and dynamic tissue type (see section x). Further information on the development of the Colouring London Streetscape category, and issues relating to access to open Streetscape data in the UK can be found in Appendix 4(7).
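The FAR and 'percentage built on' measures mentioned in this category combine Size and Streetscape attributes. A minimal sketch follows, assuming FAR is taken in its usual sense of total floor area divided by plot area, that 'percentage built on' is the building footprint as a share of the plot, and that all areas are in square metres. The function and parameter names are hypothetical and do not reflect the platform's code.

```python
def floor_area_ratio(total_floor_area_m2: float, plot_area_m2: float) -> float:
    """Floor area ratio (FAR): total floor area divided by plot area."""
    if plot_area_m2 <= 0:
        raise ValueError("plot area must be positive")
    if total_floor_area_m2 < 0:
        raise ValueError("floor area cannot be negative")
    return total_floor_area_m2 / plot_area_m2


def percentage_built_on(footprint_area_m2: float, plot_area_m2: float) -> float:
    """Share of the plot covered by the building footprint, as a percentage."""
    if plot_area_m2 <= 0:
        raise ValueError("plot area must be positive")
    if not 0 <= footprint_area_m2 <= plot_area_m2:
        raise ValueError("footprint must lie between 0 and the plot area")
    return 100.0 * footprint_area_m2 / plot_area_m2


# Illustrative values: a 200 m2 plot with an 80 m2 footprint and 300 m2 of total floor area.
far = floor_area_ratio(300.0, 200.0)
built_on = percentage_built_on(80.0, 200.0)
```

Both measures depend on plot geometry as well as building attributes, which is one reason the links to open plot and land parcel geometry matter for density analysis.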

streetscape

8. Team

https://colouringlondon.org/view/team

Current Team data subcategories:

Is data on main building or a major extension?; When built (link to 'Age'); Developer?; Designer type (Landowner, speculative builder/developer, volume house builder, the state, architectural firm, engineering firm, other); Designer name; Links to further info

The Team category, in the first instance, collects data on those designing and constructing the building. This includes information on developers (including builder-owners), builders and architects, both for core build and for major extensions (landowners?). In the case of mature industrial cities such as London, where 55% of surviving buildings were built before 1940 (VOA, 2020), information on teams for the majority of buildings, if recorded at all, will be held within historical records. The inclusion of data on teams was also noted as inadvertently increasing the attractiveness of the platform to historic environment specialists, who are also interested in adding and editing data on building age, dynamics, original use, typology and designation. This is of particular relevance owing to the importance of this sector in maximising quality and keeping datasets up-to-date at local level (see Section E).

As well as recording data on teams, the Team section is also specifically tested as a new type of tool designed to increase transparency in building standards and incentivise developers to accelerate the move away from the longstanding focus on new build towards reuse and high quality adaptation. The platform offers opportunities to connect developers with spatially located buildings within their construction portfolios and to track these portfolios over time using performance data. Data on building awards and certificates of excellence are also included to support this process. Further information on the development of the Colouring London Team category, and issues relating to access to open Team data in the UK can be found in Appendix 4(8).

Security and privacy issues rating: Low - add issues with free text collection

9. Planning

https://colouringlondon.org/view/planning

Security and privacy issues rating: Low

Current Planning data subcategories:

Planning portal link; Planning status (live? recently approved? or work complete?); Planned for demolition?; In a protected area?; On a protected building list? (Grade/rank; ID number; risk list ID; official weblink); World Heritage list ID; Regional historic environment record ID; In other area of architectural/historic/scientific importance?; Listed as of local importance (+link)

'Planning' is designed to provide direct links for each building to regulatory information on current planning applications, proposed demolitions, statutory protections and designations imposing potential constraints on development, and to historical planning applications. These data are, other than demolition data, often the only data on building stock released by local authority planning departments at building level and are, in some cases, visualised as part of their public GIS interfaces. However, spatial data on the live status of all planning applications is not currently available and, where data can be accessed, status is not easy to extract. This means that communities and developers have little idea of exactly what change is being proposed in their local area, region or in the country as a whole, nor what stage of approval an application is at. The Colouring London planning section tests a live planning visualisation/traffic tool which streams planning data and visualises application status. It also allows communities to upload data on where they think/know development is planned to occur, before it is formally announced, to allow time for detailed challenges to be prepared if required. Further information on the development of the Colouring London Planning category, and issues relating to access to open Planning data in the UK can be found in Appendix 4(9).

planning

10. Energy Performance

https://colouringlondon.org/view/sustainability (possible name change to 'Resilience')

Current sustainability data subcategories:

Building Sustainability approved rating; Energy performance approved rating (domestic/non-domestic); Date of last significant retrofit (link to 'Age' only); Expected lifespan for type; Repairability rating for type; Adaptability potential within plot

Security and privacy issues rating: Medium

This brings together data on energy performance including energy performance certificates and sustainable construction certification.

11. Resilience

https://colouringlondon.org/view/dynamics

Security and privacy issues rating: Medium

Though all of the 12 main categories capture data to support sustainable development goals, a dedicated category has been included for 'resilience'. As well as collating and adding to data captured within other main categories, it introduces new types of rating considered necessary to forecast sustainable and resilient performance, e.g. ‘a repairability rating’, ‘an adaptability rating’ and information on ‘typology life expectancy’, with max, min and average lifespans for typologies also included. These are currently included as unactivated subcategories to stimulate discussion. A diversity rating for street blocks, in terms of the range of age and land use included within blocks, is also proposed owing to the identified relationship, described by Jane Jacobs in the 1960s, between building age diversity and resilience (see Section x). The category will also include an emergency planning feature designed to capture data on structural stability/building state in emergency situations. Further information on the development of the Colouring London Sustainability category, and issues relating to access to open Sustainability data in the UK can be found in Appendix 4(10).

12. Community

https://colouringlondon.org/view/community

Current Community data subcategories:

Do users like the building and think it contributes to the city?; Do users think this type of building is generally worth keeping?; Is this building protected from demolition?; Do you think this building should be protected from demolition?; Are activities open to the community currently taking place in the building?; Has the building ever been used for community activities in the past?; Has the building always been used for community activities?; Is the building in public/community ownership?

Security and privacy issues rating: Medium

The Community category provides statistical data on community views on the quality and effectiveness of buildings. The aim here is to collect data on performance and design to help optimise the operation and quality of local stocks and to ensure sustainable local planning strategies take account of what citizens think works and what doesn't. The section also demonstrates the scale of knowledge on stocks held at community level and provides an important entry point to the platform for citizens and users of all ages and abilities, especially those that may not realise they hold important information on the stock. The section was also designed as a tool to help community planning groups specifically, identified as one of the most important stakeholder groups in long term platform maintenance, to highlight well-functioning buildings under threat of demolition, with a demolition threat tool also planned for the Planning category. Users are asked if they think buildings work well, answering with a simple yes/no. The more agreement there is, the deeper the colour. A dislike button was not included owing to its potential to encourage cyberbullying and the author’s view that it would counteract efforts to build trust and a constructive, safe community space. Screenshots of building ‘likes’ can then be used, along with captured data, to provide statistical evidence to local planning authorities of community interest in a building’s retention and reuse. Data on commonly ‘liked’ typologies can then be analysed against physical attributes, location and performance.

The Community section is also used to track the loss and gain over time of buildings considered to be local ‘assets’ (be these pubs, places of worship or dedicated community facilities) and to capture data on ownership. From consultation and the literature review, land ownership data was found to be controversial in the UK, with many arguments regarding privacy circulating around this. However, ownership type, e.g. private individual, private company, state, charitable, community etc. (rather than owner name), is important in developing sustainability strategies. In London the default is set to ownership by private individuals (as, in the case of residential buildings in the land use category, this will be the most common state), with focus placed both on verification of these data and on other, less common, ownership types. Further information on the development of the Colouring London Community category, and issues relating to access to open Community data in the UK can be found in Appendix 4(12).

![community](https://user-images.githubusercontent.com/42236514/160645849-7aaf8486-3aa4-4bd3-b00c-71a1d8f9c91c.JP

Method & History

Steps 1 and 2: Drawing from the Building Exploratory prototype, and assessing relevant public GIS platforms

Draft data categories for the Colouring London prototype were first based on those selected by the author for the Building Exploratory's public GIS system, in 1998, to provide free information on the London Borough of Hackney's building stock. Here data relating to the three divisions discussed above were collated: on current composition (e.g. Ordnance Survey landline footprint, English Heritage designation data, ownership data for housing association and local authority stock); on state/performance/quality (e.g. streetview images of the state of housing association stock, available when footprints were clicked); and on dynamics (e.g. historical footprints for multiple temporal intervals from the late 19th century onwards, purchased by all local authorities for contamination mapping). Background mapping information, e.g. roads/greenspaces/street names, was also included, with raster images of microspatial colour-coded bomb damage maps from the 1940s layered at a second stage.

Data were mapped at building level and borough scale, and were mainly supplied from local authority and housing association datasets (though commissions for data animations/interactives from the Centre for Advanced Spatial Analysis (CASA) at UCL (using designation data), and Cambridge Environmental Research Consultants (24 hour pollution animation), were included in the public exhibition). At this time Ordnance Survey footprints were not permitted to be either viewed or downloaded by the public. Datasets were only permitted to be viewed on terminals linked to the Council's intranet. This represented the first public viewing of the majority of the featured datasets, and the first time that spatial data, at building level, on the composition, quality and dynamics of the stock had been made available in such a way as to use data from the past to begin to explain the present state of the stock, to help stimulate greater engagement and discussion and improve its quality in future. Critically, the focus on building level data, the range of data types, the layering of data, and the emphasis on visualisation and colour mapping were necessary to engage diverse stakeholders capable, together, of improving the building stock, and also to capture feedback from residents, industry, government, academia and the third sector on types of data to be collected to produce a free, innovative and useful public information tool.

Between 2014 and 2016 an assessment was made to identify any advances made in publicly accessible GIS platforms providing building attribute data at building level. No publicly accessible platforms were found to exist in the UK dedicated to the provision of data on the stock, and none were identified in or outside the UK, through the literature review, providing the three divisions of data sought (see also Section 3

Data categories were also influenced by research undertaken by Hudson between 2003 and 2010. This included the design of a 4D animation of a 1km area of Hackney spanning 250 years, built with Steve Evans (https://www.youtube.com/watch?v=px_qakrZQ4w), which identified types of data required to produce time series animations and simulations and related problems with collection; and a series of free animations, also with Evans, on the history of fossil fuel use in Britain, which also tested the use of 3D spatial data visualisations to provide public information on carbon emissions. Between 2009 and 2010 Hudson also worked with Louis Jobst and the London Borough of Camden to assess methods of capturing age data from digitised historical maps to geolocate typologies, and to support local authorities in more rapid, accurate and efficient targeting of retrofit budgets and retrofit methods (see also Section C).

Step 3: Literature review

This basic content framework, for both the interface and the data types of interest, was then fleshed out through an extensive review of academic literature between 2014 and 2017, funded by a doctoral research grant from the UK's Engineering and Physical Sciences Research Council (EPSRC). The review specifically looked at the types of data needed to analyse and forecast resilience and sustainability in the stock. Findings from this review are recorded in Section B. Here it was identified that, for this research, data was urgently needed, across countries, for all buildings in the stock at building level, on current building characteristics, performance and quality, rate and scale of change/churn, and capacity for adaptation and survival. Specific data types of value identified were then extracted and began to be structured within a simple main data category graphic. The main limitation of the literature review was that, though over 300 sources were referred to (see references), this was still known to represent only a small percentage of research on data needs in relation to sustainability of the stock.

The main output from the process was an awareness that datatypes identified from on a number of additional data types of data lifespans. A key finding from this step was that ongoing consultation was also required with each stakeholder group/field of expertise to ensure that as many data subcategories as possible could be identified.

Step 4: First stage consultation and initial platform set up (2015-2018)

First stage consultation on Colouring London was undertaken in parallel with the set up of the prototype platform by Tom Russell (see Section x). The consultation strategy was designed to:

identify issues with proposed data categories
extend the knowledge base and identify new categories required
build a network reaching all relevant stakeholder audiences
develop levels of trust, goodwill and interest necessary to operate and sustain a collaborative maintenance system (see Section x)
stimulate ongoing feedback on proposed content, interface design and the wider value of the platform to ensure development in as useful, efficient, effective and sustainable a way as possible
address three issues raised by Morphet and Morphet (ref) in relation to potential problems with New Urban Agenda delivery in the UK: institutional fear of data release, difficulty in normalising relevant data upload, and the sustainability of data release initiatives

This initial stage of consultation with specialist groups commonly involved discussions of two hours or more with specific stakeholders, using, in the first instance between 2015 and 2017, Excel sheets to describe/promote discussion of proposed categories and subcategories, followed later by the addition of colour graphics for the category grid, and finally by the live prototype platform once released in 2018. This was initially undertaken with: researchers working in urban science (at CASA) re data required in analysing cities as complex systems; researchers working in energy research (with UCL Energy Institute, UCL Institute of Environmental Design and Engineering, and the Building Research Establishment) re data required in energy analysis and monitoring, and retrofit; historic environment specialists (including Historic England and amenity societies) working on conservation/building lifespan extension; historians able to provide lifespan data (e.g. Survey of London, and the Institute of Historical Research); and community planning groups on data required to challenge negative change to local areas. The latter two involved consultation with the Better Archway Forum (interviews) and Somers Town Neighbourhood Forums, as well as community workshops funded by Historic England. Two consultation exhibitions (in 2015 and 2018) were also hosted by Alan Baxter Associates, where many national amenity societies and small London organisations are based. Between 2017 and 2021 consultation was expanded, as and when stakeholders were identified, to include individuals and organisations from more academic departments, the construction industry, local and central government, and the voluntary and community sectors. Around sixty organisations and individuals were involved in commenting on the interface and data categories. A list of consultees and partners and the type of contribution offered is provided in Appendix x.

Specific steps used in consultation were as follows:

Identify stakeholders/consultees through relevant writing/projects or recommendation by other consultees;
Approach consultees to set up meeting ideally with recommended contact, emphasising the project as an open data project run by an academic host
Conduct face to face interviews (approx 2 hours)
Identify which categories and features are of particular interest to each stakeholder and why;
Discuss what adjustments/additions could be made to existing categories to support their work.
Identify any related interface improvements to attract specific audiences, e.g. non-technical category descriptions, non-technical interface, inclusion of option to add links to text based information and images
Request information on ideal data formats
Work out how stakeholders might best like to contribute (such as through the provision of bulk data, coding support, voluntary contributions from members, expert advice, platform launch venues, publicity or funding).
Co-develop ideas for a long-term win–win scenario for each stakeholder group to encourage ongoing contributions and collaborative maintenance of data categories, and where relevant possible joint projects/funding applications.

As noted above, an unexpected degree of agreement on relevant data types for inclusion was noted during UK and international consultation. Very few specific requests for additional subcategories were made at this stage, other than from Historic England, which asked for more detailed designation subcategories. A notable limitation of the informality and ad hoc nature of the first stage consultation process was that consultation meetings were not recorded. This has meant that, though all consultees were/are credited on the platform, recommendations leading to any category adjustments were not tracked. Methods to facilitate and improve recording of stakeholder input into category adjustments in future are outlined below.

Step 5: Release live, first stage testing and adjustment

As part of the iterative process of design, main categories were repeatedly renamed, moved and edited (see Appendix 2). New subcategories were also introduced, and categories removed or temporarily reintroduced with a slightly different focus between 2016 and 2019. All changes are recorded in Colouring Cities open code within the GitHub repositories. Renaming has included the following: the ‘Design/Build’ section has been renamed ‘Team’; the ‘Street Front’ section has been expanded and renamed ‘Streetscape’; the Greenery section has been merged with ‘Streetscape’; the ‘Protection’ section has been expanded and renamed ‘Planning’; the ‘Demolitions’ section has been expanded and renamed ‘Dynamics’; the ‘Like me?’ section has been expanded and renamed ‘Community’; and a new section ‘Sustainability’ has been added. The process of additions/adjustments to main category names and subcategories is recorded within the GitHub archives. In addition, many weeks of testing data upload for activated categories resulted in numerous minor tweaks to text and position (recorded on GitHub).

Steps 6-8

Subcategory change caused by shifts to open data policy/new data availability: For example in terms of 'Location' data new subcategories for London have been able to be added over the years owing to a gradual change in UK government attitude towards the economic value of open data (add Geospatial Commission ref). Here Ordnance Survey polygon/building footprint reference information could be included from the outset but centroid coordinates (held in Unique Property Reference Numbers) allowing captured open data to be mapped were not released until 2020.
Subcategory change caused by detailed consultation with experts in relevant sectors: For example in terms of 'Age' data, subcategories were added in response to historic environment sector concerns that different age facades and major accretions should also be recorded; the sector also saw the option to add sources as extremely important in ensuring accurate data was provided. In time more feedback from the general public on ideas for new categories or requests for clarity are expected.
Subcategory change caused by ethical requirements relating to data release and the privacy and security of platform users, and of building occupiers and owners: The category most affected by this has been 'Community', which allows the public and other stakeholders to colour buildings they think perform well, and where numerous iterations of questions have occurred owing to concerns regarding potential threats to inclusivity and, in the worst case scenario, geospatial bullying. (This is further discussed under 'Phrasing of subcategory questions' in Chapter x: 'Interface Design'.)
Awareness of the opportunity to release new types of data in future, to support sustainable development, and that these could still have value as proposed subcategories without being activated: This was most relevant to the 'Sustainability' category, where information such as repairability, adaptability and longevity of buildings was considered extremely useful, and able to be inferred by combining current attribute subcategories such as age, typology, footprint and plot size with 'Dynamics' data such as dynamic tissue type, and construction and demolition dates for typologies of demolished buildings.

Of key importance to this iterative process of data selection within the prototype is an awareness that, although subcategories may be swapped within main category headings, and be added or removed, changes to subcategories may cause newly collected data to no longer match previously captured data, making the latter redundant. Though agility and the ability to adapt are critical characteristics for this type of platform, it should be stressed that work on data subcategory selection needs largely to be carried out up front, using the literature review and a focused stakeholder consultation approach carried out over a long period, to minimise risks in this regard. The current second phase of platform testing with CCRP partners will provide detailed information on the extent to which these categories translate across countries, and on other categories and subcategories that may need to be included.

The current method used for category adjustment and alteration builds on learning from the above stages, through the 'Discussion' forum and GitHub issues, which have since been implemented. Not all subcategories have been released. This is largely a result of the significant cost of, and lack of, software engineering time for implementation. The amount of engineering time required to select, design, build, consult on and release data subcategories was initially significantly underestimated by the author. Methods of better recording and crediting contributor design input through the platform’s feedback system, as well as through GitHub, are currently being looked at.

This section is currently being edited. Open code for our Showcase section will in future allow users of Colouring Cities data, or of other similar attribute data (including restricted data), to both upload and access data visualisations and papers/report/website links to applications.

showcasepage2017

Specific data applications/areas currently being explored by the CCRP, as part of academic research collaborations, include:

energy and waste flow analysis
participatory planning
stock dynamics modelling
3D rule based model generation
public health in relation to building form, age and construction
tracking long-term deprivation and demolitions cycles
live assessment of stock condition in disaster situations
open housing stock auditing
collaborative improvement of government building attribute, planning and performance data
community feedback on typology quality
city walkability
computer vision and historical map vectorisation
open footprint accuracy
use of feedback loops between automated methods and crowdsourcing to improve data accuracy
development of algorithms to automatically geolocate typologies and land uses
governance models for data sharing architectures relating to physical infrastructure
use and value of colour and design in collaboratively maintained physical infrastructure platforms
provision of visual canvases for the collection of spatial statistics on city/site evolution by historians and local communities
mechanisms for collecting and sharing statistical data on city/site evolution between the humanities and science
building attribute data standards
data ethics in relation to building attribute data platforms