I. DATA - colouring-cities/manual GitHub Wiki

Live Editor polly64.

Introduction and data quality

The Colouring Cities Research Programme (CCRP) explores the value of public databases and data capture platforms that facilitate the collection and collation of comprehensive, high quality spatial data on national building stocks. The prototype tests a sustainable, collaborative maintenance model, and academic research initiative, that collates, visualises and releases open spatial data supplied by diverse stakeholders (in academia, industry, government, the third sector, and the public) using a number of data capture methods. It also explores whether such platforms can be used simultaneously to improve stock efficiency, drive up building quality, and support a whole-of-society approach to urban sustainability and complex problem solving.

The process of data capture, collation and classification within CCRP platforms is designed to produce a network of international platforms containing databases in which data categories and formats are standardised to allow for comparative analysis across countries. One of the key aims of the CCRP is to produce data at volume to support the use of AI in gaining insights into the stock as a complex dynamic system and into its cycles, patterns and rules of behaviour.

CCRP platforms:

  • use a common language
  • share information consistently across the CCRP system
  • make combining information simple and more streamlined
  • promote a shared vision
  • test feedback methods between data capture methods to improve data accuracy and maximise stakeholder engagement
  • use consistent validation rules (i.e. the data type needs to be a date, text, integer, part of a list, etc.) and consistent metrics and formats
  • look for the smallest number of data categories that need to be captured across countries to support research goals
  • explore the relationships between attributes and how known attributes can be used to infer others
  • do not collect personal data, avoid free text entries, and prioritise the well-being and security of platform users and building occupiers in data collection
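The idea of consistent validation rules in the list above can be illustrated with a minimal Python sketch. The field names (`construction_year`, `height_m`, `land_use`) and the allowed land use values are hypothetical and chosen only for illustration; they are not taken from the Colouring Cities codebase.

```python
from datetime import date

# Illustrative list only; a real platform would use its own classification.
ALLOWED_LAND_USE = {"Residential", "Retail", "Industrial"}


def is_year(value):
    """True for an integer year that is positive and not in the future."""
    return (isinstance(value, int) and not isinstance(value, bool)
            and 0 < value <= date.today().year)


def is_positive_number(value):
    """True for a strictly positive int or float."""
    return (isinstance(value, (int, float)) and not isinstance(value, bool)
            and value > 0)


def in_list(allowed):
    """Build a check that accepts only values from a fixed list."""
    return lambda value: value in allowed


# Hypothetical rule table: field name -> check function.
RULES = {
    "construction_year": is_year,
    "height_m": is_positive_number,
    "land_use": in_list(ALLOWED_LAND_USE),
}


def validate(record):
    """Return the names of fields whose values break a rule.

    Fields that have no rule, or are absent from the record, are ignored.
    """
    return [name for name, check in RULES.items()
            if name in record and not check(record[name])]


good = {"construction_year": 1890, "height_m": 9.5, "land_use": "Residential"}
bad = {"construction_year": "1890", "height_m": -1, "land_use": "Castle"}
print(validate(good))
print(validate(bad))
```

Keeping the rules in one table means the same checks can be applied to every data capture method, which is the point of the consistency requirement.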

The CCRP prototype, Colouring London, uses the six data quality dimensions recommended in UK government guidelines as the characteristics against which quality is measured, in order to maximise trustworthiness and suitability for purpose. These are:

  • Accuracy: when data reflect reality, e.g. correct names/addresses that are up-to-date;
  • Completeness: when all data for a particular use are present and able to be used;
  • Uniqueness: when data appear only once in a dataset and duplication is avoided;
  • Consistency: when data values do not conflict with other values within a record or across datasets;
  • Timeliness: when data are available when expected and needed;
  • Validity: the extent to which the data conform to the expected format, type and range, e.g. email addresses having an @ symbol.

Collaborative work on quality and standards in relation to CCRP platforms will be undertaken by CCRP international partners during 2023.

Data architecture and category selection

Building data captured by the CCRP sits within the wider domain of physical infrastructure data. Three divisions of building data are collected. These relate to i) the current composition of the stock, ii) its quality and performance, and iii) its short-term and long-term dynamic behaviour. All divisions have been identified in CCRP research as necessary to answer the following questions: What types of buildings currently exist in the stock? How well do they perform? How are they likely to perform in future, based on current attributes and past behaviour? And how can we, as a community of stakeholders, maximise their efficiency, quality, resilience and sustainable performance, now and for the future?

  • Division 1: Data on the Current composition of the stock. Division 1 comprises data on the characteristics and spatial location of all buildings within the current stock, to build a complete picture of current stock composition. Research questions include: What types of buildings exist in the city/country and how are these being used today? How many of each type are there and where are they located? How can the 3D form of each building, and its relationship to adjacent buildings, best be described? What is the size of buildings? What are they made of? How are they built? And what is the type of ownership (e.g. state, charitable, private (company), private (individual/other)), which will affect all decision making?

  • Division 2: Data on Performance, state and quality (socio-cultural, economic and environmental). Division 2 comprises data on how well these buildings work in socio-cultural, economic and environmental terms. This information is essential to improve monitoring and decision-making relating to stocks in the context of global sustainability goals (see Sections A and B). Data collected include those relating to energy performance, to community views on how well buildings work at local and city level, to professional awards, and to those (current and historical) involved in a building's design and development, with the aim of incentivising developers to construct better performing/longer lasting buildings. New types of dataset (inferred from other data collected) are also proposed. These include ratings for adaptability and potential for lifespan extension, repairability, and ease of retrofit. Division 2 also addresses the need for live capture of data on the structural state of buildings in emergency situations, whether caused by natural hazards and climate change related events (e.g. earthquakes, wildfires, flooding) or by human interventions such as bomb explosions/war.

  • Division 3: Data on the Dynamic behaviour of typologies and sites. Division 3 comprises data on historical change in stocks to enable current change to be more easily tracked (What? Where? How much? How fast?), and cycles, patterns and constraints/opportunities for the future to be better understood. It is critically important to understand the operation of the stock as a system, and the underlying rules of change in relation to its component parts, in order to improve long-term strategies and to provide both data and rules for stock forecasting models that are as accurate as possible. Data collected include short-term dynamics data, under 'Planning', on new builds and demolitions; designation data (indicating locations with development constraints/where churn in the stock is consciously slowed); and long-term dynamics data on lifespans of all demolished buildings on sites (involving the capture from historians of construction and demolition dates, and location information, for all demolished buildings ever built), from which typology/building survival rates and reasons for resilience/vulnerability can be assessed. Information on the dynamic characteristics (and anticipated future dynamic behaviour) of the urban tissue (i.e. the type of building-plot-street network configuration) is also captured.

  • Main data categories within Divisions: Data relating to these three Divisions are distributed across 12 subject areas, which represent the main data categories. These are described as 'Location', 'Land Use', 'Typology', 'Age & History', 'Size', 'Construction', 'Street Context', 'Team', 'Planning', 'Energy Performance', 'Resilience' and 'Community'. They are structured visually in a 3 x 4 grid as shown below. Each is designed to provide quick access to a specific type of information considered significant enough to warrant its own category.

Choice of main subject categories

The selecting and structuring of the main data categories has been the most complex aspect of content design. 12 main subject categories have been selected into which all data subcategories, relating to each of the three divisions, can be easily stored.

The distribution of data relating to each of the three Divisions across 12 main categories is as follows:

  • Current stock composition: 'Location', 'Land Use', 'Typology', 'Age and History', 'Size', 'Construction' & 'Street Context'
  • Building performance, state & quality: 'Team', 'Energy Performance', 'Resilience' & 'Community'
  • Dynamic behaviour: 'Planning' and 'Age and History'.
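The distribution above can be written down as a small lookup structure. The following Python sketch is illustrative only: the variable and function names are hypothetical, and only the division and category names come from the list above.

```python
# Division -> main categories, as listed in the text above.
DIVISIONS = {
    "Current stock composition": [
        "Location", "Land Use", "Typology", "Age and History",
        "Size", "Construction", "Street Context",
    ],
    "Building performance, state & quality": [
        "Team", "Energy Performance", "Resilience", "Community",
    ],
    "Dynamic behaviour": ["Planning", "Age and History"],
}


def divisions_for(category):
    """Return every division that draws on the given main category."""
    return [name for name, cats in DIVISIONS.items() if category in cats]


# The unique categories across all divisions.
all_categories = sorted({c for cats in DIVISIONS.values() for c in cats})

print(len(all_categories))
print(divisions_for("Age and History"))
```

'Age and History' appears under two divisions, so the thirteen list entries reduce to twelve unique main categories.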

Categories are limited in number, to maximise ease of access for all stakeholders, and highly flexible, to allow ongoing adjustments prompted by incremental testing and stakeholder feedback. They have been selected and structured in such a way as to meet the following requirements identified over the six-year research programme:

  1. To provide data on the current stock and its performance and dynamics in a way relevant and accessible to all stakeholders, including residents.
  2. To develop categories so as to provide a free multifunctional tool that not only provides data but also helps stakeholders, and communities as a whole, accelerate the move towards better quality, zero carbon stocks.
  3. To test and integrate global classification systems wherever possible to maximise platform usefulness and data interoperability/sharing/accessibility across countries.
  4. To increase understanding of the stock as a complex dynamic system to improve long-term forecasting of sustainability, resilience and risk.
  5. To use simple graphic representation not only to maximise accessibility, but also to begin to unravel the underlying relationships between building characteristics, understand potential constraints to sustainable trajectories, and explore the logic of the stock.

Category selection has been the result of six years of research, which continues today, undertaken in the following stages: a) assessment of existing UK public mapping platforms providing data on the building stock; b) a review of over 300 academic papers and documents produced within the context of urban science, sustainability science, urban morphology, urban theory, planning, building conservation, urban history and UK building data classification; c) consultation with over 60 stakeholders, including two public exhibitions and a community testing programme; d) in-house platform category testing and adjustment since 2018; and e) discussion with CCRP international partners, and national and international organisations/agencies, 2018-21.

Though subject titles have remained relatively constant since project conception in 2016, increased international CCRP partner discussion has resulted in adjustments being made. The most significant have been the merging of 'Age' and 'Dynamics' under 'Age and History' (as recommended by Colouring Australia/UNSW); the splitting of the 'Sustainability' section into 'Energy Performance' and 'Resilience' (as recommended by Colouring Germany/IOER); and reverting to 'Street Context' over 'Streetscape' (as recommended by Colouring Indonesia/King's College).

Within the 12 subject categories, over fifty subcategories of spatial data are collected at building level. These are at various stages of activation. They include quantitative data captured across, for example, the 'Size', 'Streetscape' and 'Dynamics' categories (e.g. building height (m), storeys (no. storeys), footprint area (m2), road width (m), lifespan (no. years)); categorical data captured across, for example, the 'Location', 'Use' and 'Team' categories (e.g. building name, builder's or architect's name, land use); and qualitative data (non-narrative/statistical only) captured within the 'Community' section on how well people think a building or typology works, here using dropdowns or Yes/No options. Data subcategories are discussed in detail on individual category/subject area pages in Sections U1-12.

Single function categories

The initial seven categories in the data category grid represent single function data types describing specific building and street attributes. Location data are placed first as they geolocate all other attribute data, and as building footprint geometries form the main building blocks for Colouring Cities platforms, without which they cannot be built. Placed second in the grid is current use, noted during UK prototype development to be the most commonly used building attribute data type. Use is followed by building Typology and then Age, both predicted to rise rapidly in value owing to their importance in the grouping of characteristics necessary to infer 3D form, volume and detail, for use in energy and urban analysis and in 3D and 4D rule-based simulation models (see Section x). The construction and materials category follows, with the Size category fleshing out the detail of each building's physical form. Finally, 'Streetscape' provides data on the relationship between the building, plot and street. Each of these categories is described briefly below:

1. Location


Current Location data subcategories:

  • Building address; Area code; Footprint ID; Unique Property Reference Number; OpenStreetMap ID; Centroid coordinates

Building footprints (in the form of vectorised polygons) provide the basic building blocks for all Colouring Cities platforms. These act as mini filing cabinets, as in other types of Geographic Information System (GIS), within and through which attribute data can be collected and stored, collated, corrected, visualised and disseminated. Coloured footprints, rather than coloured points, also make data much easier for stakeholders to read, allowing visual insights to supplement those made using computational methods. Building polygons/footprints provide information/data on spatial location, building geometry, ground floor/total area, building perimeter, and number, size and shape of walls. They can also be used to help infer other characteristics, such as age, height and 3D form. Extrusions from footprints can also be used to generate simple 3D models.
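As a rough illustration of how much can be derived from a footprint polygon alone, the Python sketch below computes area, perimeter, centroid and wall count with the shoelace formula, plus the volume of a simple extrusion. It assumes a single ring with no holes, given as (x, y) vertices in a projected coordinate system measured in metres; the function names are hypothetical and this is not the platform's own implementation.

```python
import math


def footprint_metrics(vertices):
    """Area, perimeter, centroid and edge count of a simple polygon.

    vertices: list of (x, y) tuples in metres, in ring order,
    without repeating the first vertex at the end.
    Raises ZeroDivisionError for a degenerate (zero-area) ring.
    """
    n = len(vertices)
    twice_area = 0.0
    cx = cy = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0          # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    signed_area = twice_area / 2.0         # sign depends on ring direction
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    return {
        "area_m2": abs(signed_area),
        "perimeter_m": perimeter,
        "centroid": centroid,
        "wall_count": n,                   # one wall segment per edge
    }


def extruded_volume(vertices, height_m):
    """Volume of a flat-roofed block extruded from the footprint."""
    return footprint_metrics(vertices)["area_m2"] * height_m


rectangle = [(0, 0), (10, 0), (10, 5), (0, 5)]
print(footprint_metrics(rectangle))
print(extruded_volume(rectangle, 6))
```

A flat-roofed extrusion is the crudest possible 3D model; roof shape and typology data would be needed to refine it.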

Footprints are also necessary to undertake detailed spatial analysis and modelling of the stock. To ensure data captured with Colouring Cities platforms are of maximum use in understanding the relationship of the physical form of cities to socio-economic and environmental performance, additional types of spatial reference are also required. These include centroid coordinates for footprint polygons, unique property reference numbers where applicable, building and street number and area code information, national mapping agency polygon IDs, and OpenStreetMap IDs. In Colouring Cities platforms all are collected to maximise interoperability and usefulness of building attribute data. Spatial reference data also has other uses, such as supporting basic features within the user interface, for example a zoom facility to specific buildings of interest. Lack of building numbers also increases the potential for users to enter data into the wrong building footprint, especially for densely packed streets where footprints are similar.

Open access to comprehensive footprint data and open address data is therefore extremely important. The open provision of building footprints by national mapping agencies varies from country to country, with footprints often released alongside open property tax datasets. In recent years OpenStreetMap (OSM) has begun to increase its focus on footprint capture. Microsoft/Bing's application of artificial intelligence to satellite imagery has meant that open footprints for the US, Canada, Uganda and Australia can now be integrated into OSM, alongside open street-network data, national mapping agency data (where available), and other types of open, crowdsourced geoinformation (Microsoft, 2019; 2020). The OSM/Microsoft partnership is important in that it means that footprints are likely to become increasingly available at global scale. OSM building IDs are included as one of Colouring London's location data subcategories.

However, though OSM footprints will be available for cities, it is unlikely that, in terms of geometric accuracy, comprehensive geographic coverage and regularity of updating, these will compare with the quality of national mapping agency data that are systematically updated, such as the OSMM data maintained by Ordnance Survey. The more accurate and comprehensive the footprint data, the more accurate the analysis and modelling has the potential to be. Though the use of semi-restricted national mapping agency footprints is tested in the UK prototype, other approaches, including the use of OSM footprints, are being tested in other countries by other CCRP partners (see other country pages - include ref).

Specific research questions of interest in relation to the capture of location data include: What types of open location data are needed to maximise the potential for multidisciplinary/multi-sector analysis of captured data? How easy is it to access accurate, comprehensive, regularly updated open building footprints across countries, and what computational approaches are most relevant in increasing accessibility? Further information on the development of the Colouring London Location category, and issues relating to access to open Location data in the UK, can be found in Section N.3.


2. Land Use


Current Use data subcategories:

  • Land use order/s; Land use group/s

Land use is one of the most commonly used types of data in urban analysis and is relevant to the monitoring and management of buildings in areas such as planning, energy research, housing provision and property taxation. Despite this, comprehensive open land use data are still not available at building level in countries such as the UK. This category captures data on the current type, or types, of activity in each building. Owing to the fact that housing will be the dominant use in all CCRP platforms (making up over 93% of taxable properties in the UK), emphasis is placed on domestic data verification (rather than domestic data upload), and on the mapping of non-residential use. Several land use classifications are used in the UK, which require harmonisation. None are viewed as fit for purpose. We are currently testing the National Land Use Database classification system which, though not yet implemented in the UK, is the most comprehensive. We are also looking to work with CCRP partners to develop a land use classification system that functions across countries but is tailored to provide detailed information on the local context, including mixing of uses.

Specific research questions of interest include: How much land is used for specific types of activity? Can recognisable patterns of spatial distribution be identified within and across countries for specific types of land use? Can certain mixtures of land use in specific locations make cities more efficient, resilient and/or sustainable, and does loss of these mixtures do the opposite? Are these patterns consistent across cities and countries and, if so, how can rules be developed to improve the accuracy of land use forecasting models? Further information on the development of the Colouring London Current Use category, and issues relating to access to open Current Use data in the UK, can be found in Appendix 4(2).


3. Typology


Current Typology data subcategories:

  • Base type; Local typology; Original building use; Roof type; Adjacency/position; 3D open procedural model link

The type of activities and number of people a building was originally designed to hold, as well as social and technical changes occurring within its period of construction, will affect its size, 3D form, construction system, materials and room layout. Geolocated data on building typology (such as 'Victorian terraced house') can allow the stock not only to be quickly divided according to similar characteristics, but also for 3D form, rough dimensions/volumes, relationship to adjacent buildings, roof shape, materials and methods of construction all to be inferred. This allows computational approaches to be applied relatively easily to accelerate data coverage, especially when footprint and age data are also available. Typology data is also particularly valuable when combined with lifespan data (see 'Dynamics') in allowing both vulnerable and resilient typologies to be geolocated, the depletion of finite typology reserves to be more easily monitored, appropriate methods of retrofit (and retrofit budgets) to be more precisely targeted, and the potential for buildings to adapt and extend their lifespan over long time periods to be assessed. The development of typology 'rules' as part of this process is particularly relevant to the generation of 3D and 4D rule-based open simulation models designed to test planning and energy scenarios (see also Section U.3 and Section x). The typology section currently looks to integrate a range of typology descriptions in order to allow for typology classification across countries, and also at local level. These are mainly drawn from urban morphology and are discussed further in section x. Further information on the development of the Colouring London Type category, and issues relating to access to open Type data in the UK, can be found in Appendix 4(3).


4. Age and History


Age and history is split into 3 sections:

  • Building age: which includes Main construction date; Earliest and latest construction dates; Extension dates; Facade date; Cladding date (if applicable); Last retrofit date and Historical source links
  • Lifespan data: which captures pairs of construction and demolition dates for all buildings ever built on a site, as well as links to historical information
  • Survival data: which compares current and historical building footprints to allow capture of information on the number of buildings surviving from specific time points.

Building Age: Age data forms one of the most important datasets in the Colouring Cities platform and is the data type, along with building footprint data, on which most time has been spent during prototype development. In Colouring Cities, core construction date is the main type of data captured, though facade date, cladding date and main extension date are all also sought. Importantly, age is collected by individual year (with uncertainty accommodated through capturing earliest and latest possible dates). This has been found to be extremely important in addressing current problems with the lack of interoperability of age data, which arise because age is frequently captured using different date intervals (e.g. morphological periods, building regulation dates). Interoperable age data are essential when analysing age within, and across, countries, and when inferring building characteristics.

Capturing data by year significantly increases the usefulness of the data. The more precise the dating, the more accurate other inferred information about the building is likely to be. Along with footprint data, spatial age data captured by year can be used to generate 3D typology information, and to indicate materials, construction methods and standards, building dimensions, original interior layouts, roof shapes and even original internal and external detailing with greater precision than by interval or using typology descriptions alone. A Victorian terraced house in Britain built in 1837 will, for example, have a different range of possible detail and 3D form from one built in 1901. Age data can also be used to infer protection/designation constraints, and potential for lifespan extension/adaptation within plots. It also provides essential baseline data from which lifespans of both extant and demolished buildings, and vulnerabilities and survival and urban metabolism rates, can be calculated. Age data has many other applications, ranging from the geolocation of potential health hazards in housing (e.g. toxic materials, damp, steepness of stairs) and the development of resilience to demolition in stocks (see also Section U.4 and Section x for further discussion) to risk assessment for earthquake damage (ref GEM). Facade data, when mapped against core construction date, can also indicate the extent of coring out/rapid energy and waste flows occurring in historic centres (through facade-only retention), which cannot be discerned at street level. Age data collected within Colouring London has been evaluated in academic papers in the context of 3D procedural model generation (https://www.tandfonline.com/doi/abs/10.1080/17567505.2018.1517142; NB produce open version) and energy analysis (https://journal-buildingscities.org/articles/10.5334/bc.52/print/).
Methods of inferring building age using historical street network data, and the crowdsourcing of building age in collaboration with the historic environment sector are also discussed in Sections x and x. Further information on the development of the Colouring London Age category, and issues relating to access to open Age data in the UK can be found in Appendix 4(4).
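The interoperability argument for capturing age by year can be sketched in Python: a record holding a main year plus earliest and latest possible years can be recast into any interval scheme after the fact, whereas data captured by interval cannot be converted back to years. The `AgeRecord` class and the three-period scheme below are hypothetical; only the Victorian range (1837-1901) is taken from this page.

```python
from dataclasses import dataclass


@dataclass
class AgeRecord:
    main_year: int       # best estimate of construction year
    earliest_year: int   # earliest possible construction year
    latest_year: int     # latest possible construction year


# Hypothetical interval scheme: (label, first year, last year), inclusive.
PERIODS = [
    ("pre-1837", 0, 1836),
    ("Victorian", 1837, 1901),
    ("post-1901", 1902, 9999),
]


def period_for(year, periods=PERIODS):
    """Label of the period containing the year, or None if none matches."""
    for label, start, end in periods:
        if start <= year <= end:
            return label
    return None


def possible_periods(record, periods=PERIODS):
    """All periods that overlap the record's earliest-latest range."""
    return [label for label, start, end in periods
            if start <= record.latest_year and record.earliest_year <= end]


house = AgeRecord(main_year=1899, earliest_year=1895, latest_year=1905)
print(period_for(house.main_year))
print(possible_periods(house))
```

Swapping in a different `periods` list (for example one based on building regulation dates) needs no change to the stored data, which is the advantage of year-level capture.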


Lifespan data: Lifespan data is extremely difficult to capture. It is, however, extremely important in understanding typology resilience and in the calculation and prediction of material and energy flows (see Section B). In Colouring Cities platforms construction and demolition dates are collected at building level from those with historical knowledge of buildings and localities, with a view to data capture for all demolished buildings ever constructed on each site. Lifespans are automatically calculated once construction and demolition dates are entered. Uncertainty measures (i.e. earliest and latest possible dates) are included. Weblinks provide further (generally text/image based) information about each site. Users are also asked to state whether demolished buildings were completely contained within the present-day plot boundary and, if not, to suggest approximately how much of each would have been contained, as earlier buildings would sometimes have spanned multiple contemporary plots. Lifespan data can be used to update the life expectancy information for typologies/age cohorts within the 'Sustainability' section to allow less resilient typologies and locations to be identified. NB Typology/historical land use subcategory also needs to be added. This category is also designed to engage and celebrate the vast body of knowledge on stocks and their history held by historians and the community, and to use this to inform scientific research.
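The automatic lifespan calculation and its uncertainty bounds can be sketched as follows. This is a minimal Python illustration under the assumption that all dates are whole years; the function names are hypothetical and the platform's actual calculation may differ.

```python
def lifespan_years(construction_year, demolition_year):
    """Lifespan from a single pair of construction and demolition years."""
    if demolition_year < construction_year:
        raise ValueError("demolition year precedes construction year")
    return demolition_year - construction_year


def lifespan_range(earliest_construction, latest_construction,
                   earliest_demolition, latest_demolition):
    """Shortest and longest lifespan consistent with the uncertainty bounds.

    Shortest: built as late as possible, demolished as early as possible.
    Longest: built as early as possible, demolished as late as possible.
    """
    shortest = max(0, earliest_demolition - latest_construction)
    longest = latest_demolition - earliest_construction
    return shortest, longest


print(lifespan_years(1880, 1965))
print(lifespan_range(1875, 1885, 1960, 1970))
```

Reporting the range alongside the single figure keeps the uncertainty in the source dates visible in any downstream life expectancy estimate.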


  • Survival data: As well as collecting geolocated lifespan data, additional data are sought to support the analysis of survival/mortality rates in relation to original stock reserves (e.g. number of Victorian buildings (1837-1901) surviving compared to all Victorian buildings ever built) to inform resilience strategies and identify areas of vulnerability and wastage within stocks. Spatial data are sought on percentage survival of original age/typology cohorts, and on their exact location. This section provides an option for users to overlay current polygons onto historical maps (raster format initially) and for surviving/matching buildings to be colour coded. This can be done manually by amateur or professional historians checking the age of buildings from plan shape and current street view images. Plans to use Computer Vision to generate vectorised footprints (and road networks) from historical maps, currently being discussed within the Turing Computer Vision and Digital Heritage Special Interest Group and Turing's Living with Machines programme (https://livingwithmachines.ac.uk/computer-vision-for-digital-heritage/), would allow for rapid matching and counting of footprints/buildings, and for checking by historians of polygons where matches could not be found or where errors were thought to have occurred. This feedback loop would also help improve computational methods. Three colours will be used for these maps: i) 'survivals/footprints that match', ii) 'footprints that don't match/demolitions' and iii) 'not sure'. Below (top) is an example of current footprints overlaid onto an historical map, but with coloured dots, rather than coloured polygons as proposed, used to show matching/non-matching of polygons. Below this is an example of current footprints coloured in by polygon, with purple here denoting buildings demolished since 1960, and grey surviving buildings.
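Once the footprints on a historical map have been marked with one of the three proposed statuses, a percentage survival figure for the cohort follows directly. The Python sketch below is a hypothetical illustration: the status labels are shortened versions of the three colour classes, and excluding 'not sure' footprints from the rate is an assumption rather than a documented rule.

```python
from collections import Counter

# Shortened labels for the three colour classes described in the text.
STATUSES = ("match", "no_match", "not_sure")


def survival_summary(statuses):
    """Count each status and compute percentage survival.

    statuses: one label per footprint on the historical map.
    The rate uses only decided footprints (match or no_match);
    it is None when nothing has been decided yet.
    """
    counts = Counter(statuses)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown status values: {sorted(unknown)}")
    decided = counts["match"] + counts["no_match"]
    survival_pct = 100.0 * counts["match"] / decided if decided else None
    return {
        "counts": {s: counts[s] for s in STATUSES},
        "survival_pct": survival_pct,
    }


print(survival_summary(["match", "match", "no_match", "not_sure"]))
```

The count of 'not sure' footprints is returned as well, since it indicates how much checking by historians remains.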


  • Dynamic tissue classification: The third type of data collected relates to types of 'Dynamic tissue'. The classification builds on Brenda Case Scheer's work in urban morphology associating specific attributes and dynamic qualities with specific configurations of street, plot and building, though tissue types are renamed (see Section x). They are seen as having great potential to allow relative rates of change between specific, easily locatable types of urban tissue to be predicted, improving the accuracy of forecasting models and simulations.

The three dynamic tissue types for which data are collected are: i) 'Old street network' tissue, represented by the oldest/preindustrial street networks along which footfall/trade has passed for the greatest number of years, and where, in cities such as London, the bulk of commercial and mixed use lies (image top); ii) 'Street infill' tissue, represented mainly by domestic buildings (which comprise the bulk of the stock), which fill in the gaps formed by the 'old street network' routes (image middle); and iii) 'Large parcel' tissue, represented by large land parcels (e.g. >5,000m2) which often contain their own access routes, as used by industrial estates, housing estates, hospitals, large schools, airports, depots, universities etc. These have also been shown in Colouring London research (and by Stanilov and Batty, Whitehand and others) to have different locations, and rhythms of change/demolition, depending on their land use (image bottom). An example of the location of 'Old street network' tissue for London is shown below.
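A very rough rule-based reading of the three tissue types is sketched below in Python. The 5,000 m2 threshold is the example figure quoted above; the two inputs, the function name and the order in which the rules are applied are all assumptions made for illustration, not the classification method used in Colouring London research.

```python
# Example threshold from the text; a real classification may use another value.
LARGE_PARCEL_THRESHOLD_M2 = 5000


def classify_tissue(parcel_area_m2, fronts_old_street_network):
    """Assign one of the three dynamic tissue types to a land parcel.

    parcel_area_m2: area of the land parcel in square metres.
    fronts_old_street_network: True if the parcel fronts a street that
    belongs to the oldest/preindustrial street network.
    Assumed precedence: parcel size is tested before street frontage.
    """
    if parcel_area_m2 > LARGE_PARCEL_THRESHOLD_M2:
        return "large parcel"
    if fronts_old_street_network:
        return "old street network"
    return "street infill"


print(classify_tissue(12000, True))
print(classify_tissue(300, True))
print(classify_tissue(150, False))
```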

Further information on the development of the Colouring London Dynamics category, and issues relating to access to open Dynamics data in the UK, can be found in Appendix 4(11).

5. Construction


Current Construction data subcategories:

  • Core material; Secondary materials; Main roof covering; Construction system; Foundation type

In the Construction category, data on core construction systems, construction materials and secondary materials, roof coverings, and foundation types are collected. Here Global Earthquake Model (GEM) classifications are currently being integrated. For vintage cohorts, simpler construction systems will often mean that construction characteristics and materials can be relatively easily inferred. However, for post-war buildings, where rapid developments in technology have resulted in much greater diversity of materials and systems available, more detailed information on construction systems and materials for individual buildings is required (NB extensions and additions also need to be addressed; ADD on this). Data are particularly important for understanding embodied energy in stocks and calculating energy and material waste flows; managing housing supply in terms of anticipating resilience to system failure/demolition; assessing potential risk (and facilitating live data capture) with regard to earthquakes and to climate change related events (e.g. extreme heat, fire, flooding etc.); and supporting community well-being in terms of risk to health (from, say, material toxicity) or community displacement (from demolition of unsafe stock). Further information on the development of the Colouring London Construction category, and issues relating to access to open Construction data in the UK, can be found in Appendix 4(5).


6. Size


Current Size data subcategories:

  • Storeys - core number; Storeys - basements; Storeys - attic; Height to apex; Height to eaves; Floor area - ground; Floor area - total; Frontage width; Opening area; Number of units

In the Size category, data collected include building height in metres (apex and eaves), storeys, floor area, frontage width, and number of units. These support sustainability research by increasing accuracy in calculations of energy use/building volume, embodied energy, and material waste and energy flows through demolition etc. Size data also allow for the quantification and geolocation of domestic and non-domestic floorspace available for specific activities, critical to urban analysis and monitoring, and building valuation. Frontage width is also collected as it is useful for studies relating to accessibility, and permeability of the urban tissue (add ref). Further information on the development of the Colouring London Size category, and issues relating to access to open Size data in the UK, can be found in Appendix 4(6).


7. Street Context


Current Streetscape data subcategories:

  • Distance from public green space; Distance to closest tree in street; Number of trees in garden?; Green walls/roof; Total area of plot; Plot dimensions; FAR ratio; Plot geometry link; Land ownership parcel link; Land ownership type (overlap with 'Community'); Street width; Average width of pavement; Street network geometry link

Streetscape data, in the first instance, provide information on the context of buildings, including proximity to greenery and greenspace; distance to, and average widths of, pavements and streets; garden size, and percentage built on; modal height of buildings; and block permeability based on number of doors opening onto the street front (check if we want this). Links to open plot geometry and land parcel geometry are also included within this section. These are important in the analysis of urban density, accessibility and the potential of buildings/sites/typologies for adaptation within plot constraints (see Section x). Land parcel and plot data are also important in inferring dynamic tissue type (see 'Dynamics'), which provides information on likely relative rates of change, constraints on adaptability and potential resilience to demolition. Street network data are needed for procedural typology generation, to measure accessibility from the street, and to measure capacity for adaptability/densification/resilience (add refs). Street networks can also be used to infer land use and dynamic tissue type (see section x). Further information on the development of the Colouring London Streetscape category, and issues relating to access to open Streetscape data in the UK can be found in Appendix 4(7).
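As an illustration of how two of the plot-level measures mentioned above (FAR and percentage built on) could be derived from captured Size and Streetscape attributes, a minimal Python sketch follows. It uses the standard definition of floor area ratio (total floor area divided by plot area). The record structure and field names are hypothetical and are not taken from the Colouring Cities schema.

```python
from dataclasses import dataclass


@dataclass
class PlotRecord:
    """Hypothetical record; field names are illustrative, not the platform schema."""
    plot_area_m2: float          # total area of plot
    footprint_area_m2: float     # ground floor area of the building(s) on the plot
    total_floor_area_m2: float   # floor area summed across all storeys


def floor_area_ratio(plot: PlotRecord) -> float:
    """FAR = total floor area across all storeys / plot area."""
    if plot.plot_area_m2 <= 0:
        raise ValueError("plot area must be positive")
    return plot.total_floor_area_m2 / plot.plot_area_m2


def percentage_built_on(plot: PlotRecord) -> float:
    """Share of the plot covered by building footprint, as a percentage."""
    if plot.plot_area_m2 <= 0:
        raise ValueError("plot area must be positive")
    return 100.0 * plot.footprint_area_m2 / plot.plot_area_m2


# Invented example values: a 200 m2 plot with an 80 m2 footprint over three storeys.
example = PlotRecord(plot_area_m2=200.0, footprint_area_m2=80.0, total_floor_area_m2=240.0)
print(floor_area_ratio(example))     # 1.2
print(percentage_built_on(example))  # 40.0
```

Both measures depend on plot area, which is why links to open plot geometry matter for density analysis.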


Multifunctional categories: Colouring Cities platforms in the context of digital twins

The remaining five categories, outlined in black below, operate in a different way. Each provides attribute data but also has one or more additional functions used to help promote sustainability goals. These, and feedback loops relating to data capture methods (described in Section E), distinguish Colouring Cities platforms from basic digital databases or geographic information systems. Platforms contain databases but are also designed to operate as digital twins, in which a dynamic connection exists between the digital platform and the stock itself that helps drive the sustainability agenda, and where data are turned into insights that support improved decision making. https://www.cdbb.cam.ac.uk/files/gemini_papers_-_what_are_connected_digital_twins.pdf

data category mixed

8. Team


Current Team data subcategories:

  • Is data on main building or a major extension?; When built (link to 'Age'); Developer?; Designer type (Landowner, speculative builder/developer, volume house builder, the state, architectural firm, engineering firm. other); Designer name; links to further info

The Team category, in the first instance, collects data on those designing and constructing the building. This includes information on developers (including builder-owners), builders and architects, both for core build and for major extensions (landowners?). In the case of mature industrial cities such as London, where 55% of surviving buildings were built before 1940 (VOA, 2020), information on teams for the majority of buildings, if recorded at all, will be held within historical records. (The inclusion of data on teams was also noted as inadvertently increasing the attractiveness of the platform to historic environment specialists, who are also interested in adding and editing data on building age, dynamics, original use, typology and designation. This is of particular relevance owing to the importance of this sector in maximising quality and keeping datasets up-to-date at local level (see Section E).)

As well as recording data on teams, the Team section is also specifically tested as a new type of tool designed to increase transparency in building standards and incentivise developers to accelerate the move away from the longstanding focus on new build towards reuse and high quality adaptation. The platform offers opportunities to connect developers with spatially located buildings within their construction portfolios and to track these portfolios over time using performance data. Data on building awards and certificates of excellence are also included to support this process. Further information on the development of the Colouring London Team category, and issues relating to access to open Team data in the UK can be found in Appendix 4(8).

9. Planning


Current Planning data subcategories:

  • Planning portal link; Planning status (live? recently approved? or work complete?); Planned for demolition?; In a protected area?; On a protected building list? (Grade/rank; ID number; risk list ID; official weblink); World Heritage list ID; Regional historic environment record ID; In other area of architectural/historic/scientific importance?; Listed as of local importance (+link)

'Planning' is designed to provide direct links for each building to regulatory information on current planning applications, proposed demolitions, statutory protections and designations imposing potential constraints on development, and to historical planning applications. These data are, other than demolition data, often the only data on building stock released by local authority planning departments at building level and, in some cases, visualised as part of their public GIS interfaces. However, spatial data on the live status of all planning applications are not currently available, and where data can be accessed, status is not currently easy to extract. This means that communities and developers have little idea of exactly what change is being proposed in their local area, region or in the country as a whole, nor what stage of approval an application is at. The Colouring London planning section tests a live planning visualisation/traffic tool which streams planning data and visualises application status. It also allows communities to upload data on where they think/know development is planned to occur, before it is formally announced, to allow time for detailed challenges to be prepared if required. Further information on the development of the Colouring London Planning category, and issues relating to access to open Planning data in the UK can be found in Appendix 4(9).


10. Energy Performance

https://colouringlondon.org/view/sustainability (possible name change to 'Resilience')

Current sustainability data subcategories:

  • Building Sustainability approved rating; Energy performance approved rating (domestic/non-domestic); Date of last significant retrofit (link to 'Age' only); Expected lifespan for type; Repairability rating for type; Adaptability potential within plot

This brings together data on energy performance including energy performance certificates and sustainable construction certification.

11. Resilience


Though all of the 12 main categories capture data to support sustainable development goals, a dedicated category has been included for 'resilience'. As well as collating, and adding to, data captured within other main categories, it also introduces new types of rating considered necessary to forecast sustainable and resilient performance, e.g. ‘a repairability rating’, ‘an adaptability rating’ and information on ‘typology life expectancy’, with maximum, minimum and average lifespans for typologies also included. These are currently included as unactivated subcategories to stimulate discussion. A diversity rating for street blocks, in terms of the range of age and land use included within blocks, is also proposed owing to the identified relationship, described by Jane Jacobs in the 1960s, between building age diversity and resilience (see Section x). The category will also include an emergency planning feature designed to capture data on structural stability/building state in emergency situations. Further information on the development of the Colouring London Sustainability category, and issues relating to access to open Sustainability data in the UK can be found in Appendix 4(10).
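The proposed diversity rating for street blocks is not yet activated and no formula is given for it. Purely to support discussion, the Python sketch below shows one very simple way such a rating could be summarised: the spread of construction years and the number of distinct land uses within a block. The function name, input format and choice of measures are all assumptions for illustration, not the platform's method.

```python
from typing import Optional


def block_diversity(
    buildings: list[tuple[Optional[int], Optional[str]]]
) -> dict[str, int]:
    """Summarise age and land use diversity for one street block.

    `buildings` is a hypothetical list of (year_built, land_use) pairs;
    either value may be None where the attribute has not been captured.
    """
    years = [year for year, _ in buildings if year is not None]
    land_uses = {use for _, use in buildings if use}
    # Age range: gap in years between oldest and newest dated building.
    age_range = max(years) - min(years) if years else 0
    return {"age_range_years": age_range, "distinct_land_uses": len(land_uses)}


# Invented example block with one undated building.
block = [
    (1890, "residential"),
    (1935, "retail"),
    (2004, "residential"),
    (None, "office"),
]
print(block_diversity(block))
# {'age_range_years': 114, 'distinct_land_uses': 3}
```

Missing values are skipped rather than guessed, so a block with sparse data returns a lower rating than it may deserve; any real rating would need to report data coverage alongside the score.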

12. Community


Current Community data subcategories:

  • Do users like building and think it contributes to the city?; Do users think this type of building is generally worth keeping?; Is this building protected from demolition?; Do you think this building should be protected from demolition?; Are activities open to the community currently taking place in the building?; Has the building ever been used for community activities in the past?; Has the building always been used for community activities?; is the building in public/community ownership

The Community category provides statistical data on community views on the quality and effectiveness of buildings. The aim here is to collect data on performance and design to help optimise the operation and quality of local stocks and to ensure sustainable local planning strategies take account of what citizens think works and what doesn't. The section also demonstrates the scale of knowledge on stocks held at community level and provides an important entry point to the platform for citizens and users of all ages and abilities, especially those that may not realise they hold important information on the stock. The section was also designed as a tool to help community planning groups specifically (identified as one of the most important stakeholder groups in long term platform maintenance) highlight well-functioning buildings under threat of demolition, with a demolition threat tool also planned for the Planning category. Users are asked if they think the buildings work well, to which they respond with a simple yes/no answer. The more agreement there is the deeper the colour. A dislike button was not included owing to its potential to encourage cyberbullying and the author’s view that it would counteract efforts to build trust and a constructive, safe community space. Screenshots of building ‘likes’ can then be used, along with captured data, to provide statistical evidence to local planning authorities of community interest in a building’s retention and reuse. Data on commonly ‘liked’ typologies can then be analysed against physical attributes, location and performance.
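The rule that more agreement produces a deeper colour can be sketched as a simple mapping from a building's 'like' count to a colour ramp. The Python example below is a minimal illustration only: the two end colours, the cap of 10 likes and the function name are assumptions, and the platform's actual colour scheme may differ.

```python
LIGHT = (255, 240, 240)  # assumed colour for a building with no likes
DEEP = (128, 0, 0)       # assumed colour for a building at or above the cap


def like_colour(like_count: int, max_count: int = 10) -> str:
    """Return a hex colour that deepens as more users agree.

    Counts cannot be negative because there is no dislike option.
    Counts above `max_count` are clamped so the ramp saturates.
    """
    if like_count < 0:
        raise ValueError("like_count cannot be negative")
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    intensity = min(like_count, max_count) / max_count  # 0.0 to 1.0
    # Linear interpolation of each RGB channel between the two end colours.
    r, g, b = (
        round(light + (deep - light) * intensity)
        for light, deep in zip(LIGHT, DEEP)
    )
    return f"#{r:02x}{g:02x}{b:02x}"


print(like_colour(0))   # #fff0f0
print(like_colour(10))  # #800000
```

Clamping at a cap keeps the map legible where a few buildings attract far more responses than the rest.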

The Community section is also used to track the loss and gain over time of buildings considered to be local ‘assets’ (be these pubs, places of worship or dedicated community facilities) and to capture data on ownership. From consultation and the literature review, land ownership data was found to be controversial in the UK, with many arguments regarding privacy circulating around this. However, ownership type, e.g. private individual, private company, state, charitable, community etc. (rather than owner name), is important in developing sustainability strategies at specific. In London the default position is set to ownership by private individuals (as, as in the case of residential buildings in the land use category, this will be the most common state), with focus placed both on verification of these data and on other, less common, ownership types. Further information on the development of the Colouring London Community category, and issues relating to access to open Community data in the UK can be found in Appendix 4(12).


12-category-grid: accessibility and flexibility

Colouring Cities' 12 main data categories are grouped in a simple coloured 4 x 3 category grid, which forms the platform logo and the main interactive keypad for the user interface (see also Section x. Interface design). Short, single word names for main categories are used. The use of colour in the grid and of clear text and graphics has been found to be extremely important in attracting interest and engagement from the very wide range of stakeholders the platform needs to reach to both capture and maintain relevant information on the stock of the highest standard (see data capture methods Section x and Interface Design, Section x). The 12-category-grid has undergone many iterations over the past five years, with names of categories, and subcategories held within them, repeatedly fine tuned. Although at first glance the grid appears perhaps over simple and rigid, its design anticipates the need for continuous micro-adjustments to be made on an ongoing basis in future, to enable the most efficient structuring of data to be achieved through trial and error and as learning from stakeholders is continuously absorbed. It has proven to be a highly flexible tool in which it is easy to adjust, add, move, rename and merge main categories of data as and when required, in response to testing and consultation. However, it is important to note that whilst restructuring, renaming and reordering of main categories has been proven to be relatively straightforward, causing minimal disruption, as has the relocation of subcategories across categories, changes to the questions posed in relation to subcategories, and to the format of the data, should be minimised as far as possible. Extensive consultation on additional subcategories required should be undertaken at the earliest stage to maximise the ongoing usefulness and interoperability of all collected data.
Issues with funding platform engineering time during the project's first stage of development in fact proved beneficial in this area, in that incremental development of the front end meant that more time was allowed for content discussion.

An unexpected degree of agreement on relevant data types for inclusion was noted during consultation, including from CCRP international partners. This indicated that a limited number of data subcategories could potentially be identified to support resilience and sustainability within stocks in any country, provided that accommodation for local variations could be made. This approach also addressed Morphet and Morphet’s highlighting of the need to collect the bare minimum of data necessary to both support New Urban Agenda goals and maximise effectiveness and efficiency (add ref).

The 12-category-grid is also designed to be able to be nested within more comprehensive grids made up of an unlimited number of cells to provide a simple portal to other types of relevant open data. An example is shown below.

big picture 2

Method & History

Steps 1 and 2: Drawing from the Building Exploratory prototype, and assessing relevant public GIS platforms

Draft data categories for the Colouring London prototype were first based on those selected by the author for the Building Exploratory's public GIS system, in 1998, to provide free information on the London Borough of Hackney's building stock. Here data relating to the three divisions discussed above were collated: on current composition (e.g. Ordnance Survey landline footprint, English Heritage designation data, ownership data for housing association and local authority stock); on state/performance/quality (e.g. streetview images of state of housing association stock available when footprints were clicked); and on dynamics (e.g. historical footprints for multiple temporal intervals from the late 19th century onwards, purchased by all local authorities for contamination mapping). Background mapping information, e.g. roads/greenspaces/streetnames, was also included, with raster images of microspatial colour-coded bomb damage maps from the 1940s layered at a second stage.

Data were mapped at building level and borough scale, and were mainly supplied from local authority and housing association datasets (though commissions for data animations/interactives from the Centre for Advanced Spatial Analysis (CASA) at UCL (using designation data), and Cambridge Environmental Research Consultants (24 hour pollution animation) were included in the public exhibition). At this time Ordnance Survey footprints were not permitted to be either viewed or downloaded by the public. Datasets were only permitted to be viewed on terminals linked to the Council's intranet. This represented the first public viewing of the majority of the featured datasets, and the first time that spatial data, at building level, on the composition, quality and dynamics of the stock had been made available in such a way as to use data from the past to begin to explain the present state of the stock, to help stimulate greater engagement and discussion and improve its quality in future. Critically, the focus on building level data, the range of data types, the layering of data, and the emphasis on visualisation and colour mapping were necessary to engage diverse stakeholders capable, together, of improving the building stock, but also to capture feedback from residents, industry, government, academia and the third sector on types of data to be collected to produce a free, innovative, useful public information tool.

Between 2014 and 2016 an assessment was made to identify any advances made in publicly accessible GIS platforms providing building attribute data at building level. No publicly accessible platforms were found to exist in the UK dedicated to the provision of data on the stock, and none were identified in or outside the UK, through the literature review, providing the three divisions of data sought (see also Section 3).

Data categories were also influenced by research undertaken by Hudson between 2003 and 2010. This included the design of a 4D animation of a 1km area of Hackney spanning 250 years, built with Steve Evans (https://www.youtube.com/watch?v=px_qakrZQ4w), which identified types of data required to produce time series animations and simulations and related problems with collection; and a series of free animations, also with Evans, on the history of fossil fuel use in Britain, which also tested the use of 3D spatial data visualisations to provide public information on carbon emissions. Between 2009 and 2010 Hudson also worked with Louis Jobst and the London Borough of Camden to assess methods of capturing age data from digitised historical maps to geolocate typologies, and to support local authorities in more rapid, accurate and efficient targeting of retrofit budgets and retrofit methods (see also Section C).

Step 3: Literature review

This basic content framework, for both the interface and the data types of interest, was then fleshed out through an extensive review of academic literature between 2014 and 2017. This review, which was funded by a doctoral research grant from the UK's Engineering and Physical Sciences Research Council (EPSRC), specifically looked at the types of data needed to analyse and forecast resilience and sustainability in the stock. Findings from this review are recorded in Section B. Here it was identified that for this research data were urgently needed, across countries, for all buildings in the stock at building level, on current building characteristics, performance and quality, rate and scale of change/churn, and capacity for adaptation and survival. Specific data types of value identified were then extracted and began to be structured within a simple main data category graphic. The main limitation of the literature review was that, though over 300 sources were referred to (see references), this was still known to represent only a small percentage of research on data needs in relation to sustainability of the stock.

The main output from the process was an awareness that datatypes identified from on a number of additional data types of data lifespans. A key finding from this step was that ongoing consultation was also required with each stakeholder group/field of expertise to ensure as many data subcategories as possible were able to be identified.

Step 4: First stage consultation and initial platform set up (2015-2018)

First stage consultation on Colouring London was undertaken in parallel with the set up of the prototype platform by Tom Russell (see Section x). The consultation strategy was designed to:

  • identify issues with proposed data categories
  • extend the knowledge base and identify new categories required
  • build a network reaching all relevant stakeholders’ audiences
  • develop levels of trust, goodwill and interest necessary to operate and sustain a collaborative maintenance system (see Section x).
  • stimulate ongoing feedback on proposed content, interface design and the wider value of the platform to ensure development in as useful, efficient, effective and sustainable a way as possible.
  • address three issues raised by Morphet and Morphet (ref) in relation to potential problems with New Urban Agenda delivery in the UK: institutional fear of data release, difficulty in normalisation of relevant data upload, and data release initiative sustainability.

This initial stage of consultation with specialist groups commonly involved 2 hour + discussions with specific stakeholders, using, in the first instance between 2015 and 2017, Excel sheets to describe/promote discussion of proposed categories and subcategories, followed later by the addition of colour graphics for the category grid, and finally by the live prototype platform once released in 2018. This was initially undertaken with: researchers working in urban science (at CASA) re data required in analysing cities as complex systems; researchers working in energy research (with UCL Energy Institute, UCL Institute of Environmental Design and Engineering, and the Building Research Establishment) re data required in energy analysis and monitoring, and retrofit; historic environment specialists (including Historic England and amenity societies) working on conservation/building lifespan extension; historians able to provide lifespan data (e.g. Survey of London, and the Institute of Historical Research); and community planning groups on data required to challenge negative change to local areas. The latter two involved consultation with the Better Archway Forum (interviews) and Somers Town Neighbourhood Forums, as well as community workshops funded by Historic England. Two consultation exhibitions (in 2015 and 2018) were also hosted by Alan Baxter Associates, where many national amenity societies and small London organisations are based. Between 2017 and 2021 consultation was expanded, as and when stakeholders were identified, to include individuals and organisations from more academic departments, the construction industry, local and central government, and the voluntary and community sectors. Around sixty organisations and individuals were involved in commenting on the interface and data categories. A list of consultees and partners and the type of contribution offered is provided in Appendix x.

Specific steps used in consultation were as follows:

  1. Identify stakeholders/consultees through relevant writing/projects or recommendation by other consultees;
  2. Approach consultees to set up meeting ideally with recommended contact, emphasising the project as an open data project run by an academic host
  3. Conduct face to face interviews (approx 2 hours)
  4. Identify which categories and features are of particular interest to each stakeholder and why;
  5. Discuss what adjustments/additions could be made to existing categories to support their work.
  6. Identify any related interface improvements to attract specific audiences, e.g. non-technical category descriptions, non-technical interface, inclusion of option to add links to text based information and images
  7. Request information on ideal data formats
  8. Work out how stakeholders might best like to contribute (such as through the provision of bulk data, coding support, voluntary contributions from members, expert advice, platform launch venues, publicity or funding).
  9. Co-develop ideas for a long-term win–win scenario for each stakeholder group to encourage ongoing contributions and collaborative maintenance of data categories, and where relevant possible joint projects/funding applications.

As noted above, an unexpected degree of agreement on relevant data types for inclusion was noted during UK and international consultation. Very few specific requests for additional subcategories were made at this stage, other than from Historic England, which asked for more detailed designation subcategories. A notable limitation of the informality and ad hoc nature of the first stage consultation process was that consultation meetings were not recorded. This has meant that, though all consultees were/are credited on the platform, recommendations leading to any category adjustments were not tracked. Methods to facilitate and improve recording of stakeholder input into category adjustments in future are outlined below.

Step 5: Release live, first stage testing and adjustment

As part of the iterative process of design, main categories were repeatedly renamed, moved and edited (see Appendix 2). New subcategories were also introduced, and categories removed or temporarily reintroduced with a slightly different focus, between 2016 and 2019. All changes are recorded in Colouring Cities open code within the GitHub repositories. Renaming has included the following: the ‘Design/Build’ section has been renamed ‘Team’; the ‘Street Front’ section has been expanded and renamed ‘Streetscape’; the ‘Greenery’ section has been merged with ‘Streetscape’; the ‘Protection’ section has been expanded and renamed ‘Planning’; the ‘Demolitions’ section has been expanded and renamed ‘Dynamics’; the ‘Like me?’ section has been expanded and renamed ‘Community’; and a new section, ‘Sustainability’, has been added. The process of additions/adjustments to main category names and subcategories is recorded within the GitHub archives. In addition, many weeks of testing data upload for activated categories resulted in numerous minor tweaks to text and position (recorded on GitHub).

Steps 6-8

  • Subcategory change caused by shifts to open data policy/new data availability: For example in terms of 'Location' data new subcategories for London have been able to be added over the years owing to a gradual change in UK government attitude towards the economic value of open data (add Geospatial Commission ref). Here Ordnance Survey polygon/building footprint reference information could be included from the outset but centroid coordinates (held in Unique Property Reference Numbers) allowing captured open data to be mapped were not released until 2020.

  • Subcategory change caused by detailed consultation with experts in relevant sectors: For example in terms of 'Age' data, subcategories were added in response to historic environment sector concerns that different age facades and major accretions should also be recorded; the sector also saw the option to add sources as extremely important in ensuring accurate data was provided. In time more feedback from the general public on ideas for new categories or requests for clarity is expected.

  • Subcategory change caused by ethical requirements relating to data release and the privacy and security of platform users, and of building occupiers and owners. The category most affected by this has been 'Community', which allows the public and other stakeholders to colour buildings they think perform well, and where numerous iterations of questions have occurred owing to concerns regarding potential threats to inclusivity and, in the worst case scenario, geospatial bullying. (This is further discussed under 'Phrasing of subcategory questions' in Chapter x: 'Interface Design'.)

  • Awareness of the opportunity to release new types of data in future to support sustainable development, and that these could still have value as proposed subcategories without being activated. This was most relevant to the 'Sustainability' category, where information on repairability, adaptability and longevity of buildings was considered extremely useful, and able to be inferred by combining current attribute subcategories such as age, typology, footprint and plot size with 'Dynamics' data such as dynamic tissue type, and construction and demolition dates for typologies of demolished buildings.

Of key importance to this iterative process of data selection within the prototype is an awareness that, although subcategories may be swapped within main category headings, and be added or removed, changes to subcategories may cause newly collected data to no longer match previously captured data, and may cause these data to become redundant. Though agility and ability to adapt are critical characteristics for this type of platform, it should be stressed that work on data subcategory selection needs to be largely carried out up front, using the literature review and a focused stakeholder consultation approach carried out over a long period, to minimise risks in this regard. The current second phase of platform testing with CCRP partners will provide detailed information on the extent to which these categories translate across countries, and on other categories and subcategories that may need to be included.

The current method used in relation to category adjustment and alterations builds on learning from the above stages through the 'Discussion' forum and GitHub issues, which have since been implemented. Not all subcategories have been released. This is largely a result of the significant cost of, and lack of sufficient, software engineering time for implementation. The amount of engineering time required to select, design, build, consult on and release data subcategories was initially significantly underestimated by the author. Methods of better recording and crediting contributor design input through the platform’s feedback system, as well as through GitHub, are currently being looked at.