E2. DATA CAPTURE METHODS choice and discussion - colouring-cities/manual GitHub Wiki

UNDER CONSTRUCTION Page not yet active

editor @pollyhudson

Combining official live streaming and visualisation of planning application status with crowdsourced data to improve planning efficiency and transparency.

Case study project/proof of concept: 'Colouring London Planning Visualisation (CLPV) Tool: Live Streaming and Visualisation of Planning Application Data'

  • The CLPV Tool project has received an award from Loughborough University’s Enterprise Projects Group (EPG), with funding from the Engineering and Physical Sciences Research Council (EPSRC) Impact Acceleration Account (IAA).

Purpose of Colouring London Planning Visualisation (CLPV) Tool

  • To provide open code for a Colouring Cities feature that increases accessibility to open planning data, demonstrated for London using data released by the Greater London Authority (GLA)
  • To enable communities, developers, local authorities and others to gain instant access to visual information on the location and status of planning applications, as well as weblinks to application details
  • To collect information on the location, rate and type of new build and demolitions occurring, to provide live data on urban metabolism and on energy and waste flows
  • To provide greater transparency on what is being approved or rejected and why, to understand how this links to sustainable planning policies, and to show what types of finite resource are being lost
  • To identify where appeals are occurring, e.g. locations of conflict between communities and/or local authorities and developers, to widen debate
  • To provide live open data on housing completions

First stage design brief for UK test case

  • Work out how to locate live applications (Greater London Authority planning data) on the map. Issues arising: there is no open address data; should applications be mapped to the UPRN, a building footprint centroid or the footprint itself? There could be multiple building footprints on a site. Should an INSPIRE polygon be used, or a point on the UPRN centroid plus a coloured land parcel?
  • Create a system that changes colour as planning status changes, as follows:
  • Click if you think a planning application is likely to be submitted here (PINK); create a 'date of entry' box; create a verification button; create an automatic colour fade to a duller tone after 3 months; create pop-up text saying 'colour needs to be refreshed after 3 months'; add to both the COMMUNITY section and the PLANNING section; create pop-up text saying 'click PLANNING category for visualisation of live planning applications'
  • Has a planning application been submitted? (Turn parcel/footprint YELLOW.) Live stream from the GLA planning hub, but also have manual verification. Should we have a manual backup option if the live stream doesn't work?
  • If it is a listed building or conservation area application, add a colour highlight (check how easy this is to extract from GLA applications)
  • Will most of the building be demolished? Yes/no. Probably has to be crowdsourced (need to check); no colour, but add a verification button (not visualised)
  • Is the application approved? If Yes, GREEN. If No/Rejected, RED. Live stream from the GLA planning hub, but also have manual verification
  • Has approved work begun? If Yes, change to GREEN; if No, leave ORANGE
  • Add date (not visualised)
  • Is there an appeal? Add a RED OUTLINE to the GREEN approval. Appeals are important as they are likely to show significant community concern
  • If there are multiple appeals, thicken or change the outline with each appeal; multiple appeals show very serious community concern
  • Is the new work completed and the building in use? Change to TURQUOISE (colour shows for, say, 6 months only and then changes to a dulled colour, so we can see historical applications from a specific date?)
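The status-to-colour rules in the brief amount to a small lookup plus a time-based fade and an appeal outline. The Python below is a hypothetical sketch of that logic, not the CLPV implementation: the status keys, the 90- and 182-day approximations of "3 months" and "say 6 months", and the one-unit-per-appeal outline width are all assumptions. The "work begun" step is left out because the brief has not yet settled its colour, and the listed building/conservation area highlight is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Fill colours from the brief; the status keys are hypothetical names.
STATUS_COLOURS = {
    "likely": "PINK",          # crowdsourced: an application is likely here
    "submitted": "YELLOW",
    "approved": "GREEN",
    "rejected": "RED",
    "completed": "TURQUOISE",  # work completed and building in use
}

# ASSUMPTION: "3 months" and "say 6 months" approximated as 90 and 182 days.
FADE_AFTER = {
    "likely": timedelta(days=90),
    "completed": timedelta(days=182),
}


@dataclass
class ParcelStatus:
    status: str
    status_date: date  # date the current status was entered
    appeals: int = 0


def parcel_style(p: ParcelStatus, today: date) -> dict:
    """Return display hints for one parcel/footprint on a given day."""
    fill = STATUS_COLOURS[p.status]  # a KeyError flags an unknown status
    fade_after = FADE_AFTER.get(p.status)
    faded = fade_after is not None and (today - p.status_date) > fade_after
    # Appeals against an approval add a RED outline.
    # ASSUMPTION: the outline width grows by one unit per appeal.
    outline_width = p.appeals if p.status == "approved" else 0
    return {
        "fill": fill,
        "faded": faded,
        "outline": "RED" if outline_width > 0 else None,
        "outline_width": outline_width,
    }


print(parcel_style(ParcelStatus("approved", date(2022, 3, 1), appeals=2), date(2022, 4, 1)))
# {'fill': 'GREEN', 'faded': False, 'outline': 'RED', 'outline_width': 2}
```

Keeping the colours and fade periods in plain dictionaries means they can be adjusted as the brief's open questions (for example, how long approvals stay visible) are settled.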

Key Questions arising

  • What should we be visualising: parcel or footprint? Points should be avoided as they are hard to see.
  • How long should approved applications be visualised for?
  • How is historical data stored?
  • Is streamed data comprehensive/adequate?
  • Signpost limitations of live streaming for users
  • Test whether data can be streamed using code from another regional authority

See image (link no longer works): London Borough of Hackney. Find My Nearest (Map). Web. Accessed March 21, 2021. https://hackney.gov.uk/find-my-nearest-map.

Launching planning application data streaming service in another location

To launch this service in another city, the following are necessary:

  • update the Colouring Cities platform to a version that includes live streaming of this data (note: this is not yet available and is being implemented)
  • planning permission data must be made available to the public
  • data must be available in a machine-readable form (for example, scanned documents require an enormous effort to process)
  • a custom downloader must be written for the local data source
  • if the local permission lifecycle differs from the one already implemented, special support will need to be implemented
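A custom downloader naturally splits into fetching records from the local source and normalising them to the lifecycle the platform already supports. The Python below sketches only the normalisation half, under stated assumptions: the platform state names, `LocalRecord`, `normalise` and the example status labels are all hypothetical. An unmapped local status raises an error, which marks the point where special lifecycle support would be needed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical lifecycle states already supported by the platform (illustrative only).
PLATFORM_STATES = {"submitted", "approved", "rejected", "appealed", "completed"}


@dataclass
class LocalRecord:
    """One application as returned by a (hypothetical) city-specific downloader."""
    application_id: str
    local_status: str
    uprn: Optional[str] = None


class UnsupportedStatusError(ValueError):
    """A local status has no platform equivalent, so the lifecycle needs special support."""


def normalise(records: Iterable[LocalRecord],
              status_map: Dict[str, str]) -> List[Tuple[str, str]]:
    """Translate local status labels into platform states."""
    unknown_targets = set(status_map.values()) - PLATFORM_STATES
    if unknown_targets:
        raise ValueError(f"status map targets unknown states: {sorted(unknown_targets)}")
    result = []
    for rec in records:
        key = rec.local_status.strip().lower()  # tolerate case and whitespace differences
        if key not in status_map:
            raise UnsupportedStatusError(
                f"{rec.application_id}: unmapped status {rec.local_status!r}")
        result.append((rec.application_id, status_map[key]))
    return result


# Invented labels for an imaginary authority; a real map would come from its data dictionary.
EXAMPLE_MAP = {
    "pending consideration": "submitted",
    "granted": "approved",
    "refused": "rejected",
}

print(normalise([LocalRecord("X/22/0001", "Granted ")], EXAMPLE_MAP))
# [('X/22/0001', 'approved')]
```

Failing loudly on unknown statuses, rather than silently dropping them, keeps gaps between a new city's lifecycle and the existing one visible.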

Data source in London

Data are provided by the Planning London Datahub.

Impact of data quality on design decisions

Data are available only for part of London: only some boroughs provide data at all.

Listed applications include detailed information split into separate fields. For visualisation, the location and geometry of the affected area are especially crucial. Unfortunately, the property that should hold the geometry of the affected area (`polygon`) is always empty. This makes representing the data much more problematic.

The data also include centroid locations. Unfortunately, in several cases these were clearly wrong, placing the location in the Atlantic Ocean, far from London.

For example, the following extract from one entry exhibits both a missing polygon and an invalid location:

          "uprn": "000128045097",
          "centroid": {
            "lon": -7.55716,
            "lat": 49.766807
          },
          "polygon": null,
          "id": "Kingston-20_02589_FUL",
          "description": "Application for Variation of Condition 2 (Approved Plans) of Planning Permission ref: 16/13202/FUL (Erection of a terrace of 4 dwellings with associated car parking and landscaping on the main part of the site; and the erection of a replacement and enlarged office building following demolition of existing buildings) dated 23/01/2017. Amendments sought are to change floor to floor heights\r\r\r",

The data also include UPRN identifiers, which implicitly locate places. In many cases these allow objects to be located even without valid centroid data or polygon geometry.

As a result, the data should be augmented with the location implied by the UPRN, at least where centroid data are clearly invalid.
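Deciding where centroid data are "clearly invalid" needs a test. A cheap first check is whether each centroid falls inside a rough box around Greater London. The coordinates in the extract above appear to coincide with easting 0, northing 0 of the British National Grid, which would be consistent with missing coordinates having been converted as zeros; this is an inference, not something confirmed by the data provider. The Python below is a minimal sketch; the bounding-box values are approximate assumptions, and a box test only catches gross errors, not points placed in the wrong borough.

```python
from typing import Any, Mapping

# ASSUMPTION: rough WGS84 bounding box around Greater London, with a small margin.
MIN_LON, MAX_LON = -0.52, 0.34
MIN_LAT, MAX_LAT = 51.28, 51.70


def centroid_is_plausible(entry: Mapping[str, Any]) -> bool:
    """True if the entry has a centroid inside the assumed London box."""
    centroid = entry.get("centroid") or {}
    lon, lat = centroid.get("lon"), centroid.get("lat")
    if lon is None or lat is None:
        return False
    return MIN_LON <= lon <= MAX_LON and MIN_LAT <= lat <= MAX_LAT


kingston = {"id": "Kingston-20_02589_FUL",
            "centroid": {"lon": -7.55716, "lat": 49.766807}}
print(centroid_is_plausible(kingston))  # False
```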

However, some entries contain no valid UPRN:

          "centroid": {
            "lon": -7.55716,
            "lat": 49.766807
          },
          "id": "Tower_Hamlets-PA_22_00889_NC",
          "uprn": null,

The data provider has been asked about the missing data.
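Taken together, the extracts above suggest a fallback order for placing each application: use the polygon if present, otherwise a plausible centroid, otherwise the point implied by the UPRN, and flag the rest. The Python below sketches that order under stated assumptions: the bounding box is approximate, `uprn_points` is a hypothetical lookup from UPRN to longitude/latitude (it could, for instance, be built from an open UPRN dataset), and leading zeros in UPRN strings are assumed not to be significant.

```python
from typing import Any, Dict, Mapping, Optional, Tuple

LonLat = Tuple[float, float]

# ASSUMPTION: rough WGS84 bounding box around Greater London, with a small margin.
MIN_LON, MAX_LON = -0.52, 0.34
MIN_LAT, MAX_LAT = 51.28, 51.70


def _in_london(lon: float, lat: float) -> bool:
    return MIN_LON <= lon <= MAX_LON and MIN_LAT <= lat <= MAX_LAT


def resolve_location(entry: Mapping[str, Any],
                     uprn_points: Dict[int, LonLat]) -> Tuple[str, Optional[Any]]:
    """Return (source, location) using the first usable source, in order:
    polygon, plausible centroid, point implied by the UPRN.

    `uprn_points` is a hypothetical UPRN -> (lon, lat) lookup.
    """
    if entry.get("polygon"):
        return "polygon", entry["polygon"]
    centroid = entry.get("centroid") or {}
    lon, lat = centroid.get("lon"), centroid.get("lat")
    if lon is not None and lat is not None and _in_london(lon, lat):
        return "centroid", (lon, lat)
    raw_uprn = entry.get("uprn")
    if raw_uprn is not None and str(raw_uprn).strip().isdigit():
        # ASSUMPTION: leading zeros ("000128045097") are padding, so compare as integers.
        point = uprn_points.get(int(str(raw_uprn).strip()))
        if point is not None and _in_london(*point):
            return "uprn", point
    return "unresolved", None
```

Entries that come back as `unresolved`, such as the Tower Hamlets extract with a null UPRN and an implausible centroid, have no machine-recoverable location and would need manual placement or a correction from the data provider.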

Maximising data accuracy and quality

CCRP platforms experiment with the capture of spatial building attribute data at building level and at city or national scale, and work, over time, to provide datasets of the highest quality possible. To improve data quality, and particularly accuracy and the interoperability of systems, specific features are required to engage expert groups with knowledge of particular aspects of buildings (e.g. energy specialists, residents, historians). This also requires different types of data capture method. For example, computational approaches using inference and bulk uploads can accelerate data capture and increase the scale of data available, but manual additions and enrichment, verification and updating at local level are also vital to maximise reliability. Platforms also require features designed to help stakeholders spend minimal time and resources enriching, maintaining and accessing data, and to make this process as enjoyable as possible. Detailed stakeholder consultation, diverse research collaborations and the involvement of specialists in art, design and communication are therefore a critical part of the development process.

Currently being edited by poly@64

Introduction

Extensive and detailed stakeholder consultation and engagement is an essential part of the set-up, management and maintenance of any Colouring Cities platform. Stakeholder input is considered critical to platform sustainability and success. Engagement across sectors and disciplines, and with communities, is an integral part of platform design, not an add-on, and is viewed as the most efficient way to produce the quality of data required.

Colouring Cities tests an adapted version of the highly successful collaborative maintenance model operated at global level by open data platforms such as Wikipedia and OpenStreetMap. Here, data are supplied, checked and updated by platform users. Contributors also manage the platforms [ADD]. The key difference between these initiatives and Colouring Cities is that they are run as foundations, not research programmes (see also Governance Section x), and are not designed to capture data that allow specific research questions to be answered. Furthermore, they have no concept of a platform being 'complete', whereas in Colouring Cities the aim is to colour in the whole map, and indeed multiple maps, because comprehensive, granular attribute data is what is sought. This allows as full a picture as possible of the stock as a complex dynamic system, facilitating analysis at multiple levels.

Once maps are fully coloured in, constant enrichment and updating of Colouring Cities data must occur. The idea is to develop data capture methods that allow all the pieces of the puzzle to be fitted for each data type, and ultimately for the city as a whole. Once the most efficient method has been identified, it can be used to update and enrich core data. Wikipedia and OpenStreetMap, by contrast, have no data that must be filled in to completion, and so do not need to identify specific types of expertise; as a result, specific types of expertise are not specifically sought on those platforms.

Steps

  • identify issues with proposed data categories
  • extend the knowledge base and identify new categories required
  • build a network reaching all relevant stakeholders’ audiences
  • develop the levels of trust, goodwill and interest necessary to operate and sustain a collaborative maintenance system (see Section x).
  • stimulate ongoing feedback on proposed content, interface design and the wider value of the platform, to ensure it develops in as useful, efficient, effective and sustainable a way as possible.
  • address three issues raised by Morphet and Morphet (ref) in relation to potential problems with New Urban Agenda delivery in the UK: institutional fear of data release, difficulty in normalising relevant data for upload, and the sustainability of data release initiatives.
  1. Identify stakeholders/consultees through relevant writing/projects or recommendation by other consultees;
  2. Approach consultees to set up a meeting, ideally with the recommended contact, emphasising that the project is an open data project run by an academic host;
  3. Conduct face-to-face interviews (approx. 2 hours);
  4. Identify which categories and features are of particular interest to each stakeholder and why;
  5. Discuss what adjustments/additions could be made to existing categories to support their work;
  6. Identify any related interface improvements to attract specific audiences, e.g. non-technical category descriptions, a non-technical interface, and the option to add links to text-based information and images;
  7. Request information on ideal data formats;
  8. Work out how stakeholders might best like to contribute (such as through the provision of bulk data, coding support, voluntary contributions from members, expert advice, platform launch venues, publicity or funding);
  9. Co-develop ideas for a long-term win-win scenario for each stakeholder group to encourage ongoing contributions and collaborative maintenance of data categories and, where relevant, possible joint projects or funding applications.

As noted above, an unexpected degree of agreement on relevant data types for inclusion was found during UK and international consultation. Very few specific requests for additional subcategories were made at this stage, other than from Historic England, which asked for more detailed designation subcategories. A notable limitation of the informal, ad hoc nature of the first-stage consultation process was that consultation meetings were not recorded. This has meant that, though all consultees were/are credited on the platform, recommendations leading to category adjustments were not tracked. Methods to facilitate and improve the recording of stakeholder input into category adjustments in future are outlined below.

The consultation method designed to build the foundations of the collaborative maintenance network involved:

  • identifying stakeholders;
  • understanding which categories and features were of particular interest, and why;
  • discussing how stakeholders thought the interface could be improved to help support their work and engage their specific audience/s;
  • requesting information on ideal data formats;
  • working out how stakeholders might best like to contribute (such as through the provision of bulk data, technical support, voluntary contributions from members, expert advice, launch venues, publicity or funding);
  • developing a win-win scenario for each partner body (as successfully tested at the Building Exploratory).

Addressing data fragmentation by harnessing stakeholders' expert knowledge

Issues identified:

  • No integrated platforms currently exist that are designed to capture data on the composition, performance/quality and/or dynamics of the stock in a way that caters for multiple stakeholders
  • Building stock stakeholders are very diverse, ranging from academic researchers and government policy makers working in housing, planning, energy and conservation, to building industry specialists, residents and schools.
  • Stakeholders across countries appear to be similar
  • Stakeholders need data on stocks for diverse purposes e.g. research papers, policy documents, planning applications, school projects
  • Stakeholders find free, high quality, accurate, relevant data extremely difficult to find.
  • All stakeholders hold some expert knowledge on the stock, but often in relation to very specific aspects of buildings, or to buildings in specific locations.
  • Stakeholders won't put in extensive time/expertise unless they trust the project, believe they/their sectors will directly benefit from the project, and where applicable consider that their business models are enhanced rather than compromised
  • Stakeholders could potentially monitor and maintain specific datasets relevant to their area of expertise (e.g. historic environment departments or agencies could oversee data on protected assets, and energy departments or research bodies could oversee energy-related data) if the benefits are significant enough and the platform is tailored to suit their needs. Community planning groups are rare in that they often hold expert knowledge of the composition, quality and dynamics of buildings, though at a very local level.

Solutions currently being tested:

  • Extensive consultation is required to get stakeholders on board

  • Time is required to develop relationships and to understand the expertise and data needs of stakeholder groups, so as to produce win-win scenarios

  • Integrated centralised platforms able to scale to national level can be built easily and inexpensively by academic hosts to allow stakeholders to:

    • contribute information very easily without requiring any technical knowledge
    • see other stakeholders' contributions visualised to motivate upload and understand the gaps
    • use the platform to curate datasets of interest to them/on which they have expert knowledge, and to see them enriched by others
    • become involved at their own speed
    • trust platforms with their data
    • credit stakeholders for their involvement through the edit history, leaderboard and/or Who's involved page
Suitability of specific data capture methods for specific data types

Four types of data capture method were identified during the literature review and public consultation process. These are:

  • Bulk upload of existing open datasets
  • Crowdsourcing
  • Computational generation using inference
  • Live Streaming

Potential for combining these methods to improve data quality was also identified and is discussed further below. This multi-pronged approach to building attribute capture and dissemination does not appear to have been tested before, but during testing it has been identified as the most effective way to produce the highest-quality data possible, at the fastest speed, for the city as a whole, and to update and maintain it. Bulk upload collation, computational generation and crowdsourcing methods were tested between 2016 and 2019, both before and after platform development. Live streaming has not yet been tested; however, a proposal for live streaming and visualisation of planning data (providing information on short-term dynamics) within the Colouring London prototype has been submitted to Loughborough University and, if successful, will be undertaken in 2022.

Bulk uploads, benefits and limitations

The advantage of bulk uploads from trusted sources is that their quality has been checked and is the responsibility of, or of interest to, the contributing organisation. This is the case, for example, with the UK open datasets released by government agencies/departments and tested on Colouring London, which included protected building data (Historic England) and Energy Performance Certificates (Department for Levelling Up, Housing and Communities). It was also true of data donated by academic departments, such as adjacency data and cleaned LiDAR/height data (provided by UCL Energy Institute and originally accessed from the Environment Agency). An additional advantage is that these datasets are already known to be of interest and relevance to specific audience groups.

Open data availability, however, varies considerably across countries (see Appendix 2), an area of particular interest in CCRP research. The disadvantage of bulk uploads in countries such as the UK is that open datasets are very limited. This is most evident with regard to current building attribute data. In nations where property tax databases are accessible, these are highly significant as bulk uploads, as they appear to be the richest source of data on stock composition for many countries. They are open in the US, the Netherlands, Slovenia and Iceland, for example, but heavily restricted, even to academia and government departments, in Britain. Britain's property tax database, for example, contains building-level records for multiple attributes for around 27 million taxable buildings (i.e. virtually the entire stock), offering well over 100 million data points. These data must also be updated and maintained to a standard high enough to withstand legal challenges to taxation decisions. Colouring Cities platforms able to access property tax data will therefore be able to advance Stock composition categories and subcategories far faster than those that cannot; for the latter, alternative methods of data capture are required.

Co-working on bulk uploads is also extremely important in the context of a proposed collaborative maintenance system, in which individual bodies take responsibility for the upkeep of specific datasets of value to them, and in which CCRP platform hosts can provide support for national and regional partners wishing to improve the quality and visibility of their data and to see it integrated into a national public system. This approach continues to be tested through a collaboration between Turing and Historic England, which has explored the disaggregation and cleaning of data on protected assets (focusing on listed buildings) in London for upload onto the Colouring London prototype. The work was necessary for the platform to allow data to be mapped at building level. From Historic England's point of view, the work had the potential to make its data more useful, accessible and accurate. The work originally tested an automated approach involving address matching. However, accuracy levels were found to be insufficient, and it was agreed that the most efficient way to maximise accuracy was to run it as a combined bulk upload/crowdsourcing approach, in which point data is uploaded as a guide for crowdsourcing, with local civic groups asked to add and verify data (see below).
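The combined approach can be pictured as a spatial match that only proposes links for people to confirm. The Python below is a hypothetical illustration, not the Turing/Historic England workflow: it assigns each listed-building point to any footprint containing it using the standard even-odd ray-casting test, and returns the results as candidates for verification. The names, the planar-coordinate assumption and the brute-force search are all assumptions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]   # planar coordinates, e.g. British National Grid metres (assumed)
Polygon = List[Point]         # simple polygon ring, vertices in order


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Even-odd ray-casting test: count edge crossings of a ray cast to the right of pt."""
    x, y = pt
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        if (yi > y) != (yj > y):  # edge straddles the ray's y, so yj != yi
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def candidate_matches(points: Dict[str, Point],
                      footprints: Dict[str, Polygon]) -> Dict[str, List[str]]:
    """Map each footprint id to the listed-entry ids whose point falls inside it.

    Results are candidates for crowdsourced verification, not confirmed matches.
    Brute force over all pairs; a spatial index would be needed at city scale.
    """
    matches: Dict[str, List[str]] = {fid: [] for fid in footprints}
    for pid, pt in points.items():
        for fid, poly in footprints.items():
            if point_in_polygon(pt, poly):
                matches[fid].append(pid)
    return matches
```

A footprint with several candidate points, or a point that matches no footprint, is exactly the kind of case that local civic groups would be asked to resolve.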

Moderation of bulk uploads by CCRP was judged essential to prevent malicious uploads and the overwriting of existing data. A queuing system, allowing datasets to be submitted for checking and then uploaded in-house, is proposed. Examples of existing bulk datasets gathered to date are shown in Table 9.5. These were either identified by the in-house team or recommended by partners (although not all of those listed have yet been uploaded).
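The proposed queuing system can be sketched as a first-in, first-out queue in which nothing is published until a moderator approves it, and approved data never silently replaces existing values. The Python below is a minimal, hypothetical sketch; `Submission`, `ModerationQueue`, the building-id-to-value schema and the conflict list are illustrative assumptions, not the platform's design.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple


@dataclass
class Submission:
    dataset_id: str
    submitter: str
    rows: Dict[str, str]      # building id -> attribute value (hypothetical schema)
    status: str = "pending"


class ModerationQueue:
    """Hypothetical moderation queue: nothing is published until a moderator approves it."""

    def __init__(self) -> None:
        self._pending: Deque[Submission] = deque()
        self.published: Dict[str, str] = {}
        # (building id, existing value, proposed value, dataset id) awaiting manual checks
        self.conflicts: List[Tuple[str, str, str, str]] = []

    def submit(self, sub: Submission) -> None:
        self._pending.append(sub)

    def review_next(self, approve: bool) -> Submission:
        """Take the oldest pending submission and approve or reject it."""
        sub = self._pending.popleft()
        sub.status = "approved" if approve else "rejected"
        if approve:
            self._apply(sub)
        return sub

    def _apply(self, sub: Submission) -> None:
        for building_id, value in sub.rows.items():
            existing = self.published.get(building_id)
            if existing is not None and existing != value:
                # Never silently overwrite existing data; hold the clash for review.
                self.conflicts.append((building_id, existing, value, sub.dataset_id))
            else:
                self.published[building_id] = value


q = ModerationQueue()
q.submit(Submission("ds-1", "partner-a", {"b1": "1900"}))
q.submit(Submission("ds-2", "partner-b", {"b1": "1850", "b2": "1930"}))
q.review_next(approve=True)
q.review_next(approve=True)
print(q.published)   # {'b1': '1900', 'b2': '1930'}
print(q.conflicts)   # [('b1', '1900', '1850', 'ds-2')]
```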

Computational generation using inference: benefits and limitations

Computational approaches offer huge opportunities for rapidly populating many categories in open data platforms on the stock. In the context of computer vision, satellite imagery has been used in the extraction of land-use and height data for a number of years (Ehrlich et al., 2012). More recently, AI has been used to extract building façade data from Google Street View (Law et al., 2019), and to support extraction and analysis at a much more detailed scale (Postadjiana et al., 2017; Schlosser, 2020). Light imaging, detection and ranging (LiDAR) data has also been used to generate height data for cities and low-cost, rapid 3D stock visualisations, as shown in Figure 1.5, and to infer storey number (Evans et al., 2017).
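The simplest form of storey inference from height divides a building's height by a typical floor-to-floor height. The Python below is a minimal sketch of that idea only; it is not the method of Evans et al., and the 3.0 m default is an assumption. Real floor heights vary with building age and use, which is one reason Table 9.6 proposes inferring storeys from age data and footprint.

```python
import math


def estimate_storeys(height_m: float, floor_to_floor_m: float = 3.0) -> int:
    """Rough storey count from a building height, e.g. a LiDAR-derived value.

    ASSUMPTION: one uniform floor-to-floor height (3.0 m by default). Real values
    vary with building age and use, so this is a crude first estimate only.
    """
    if height_m <= 0 or floor_to_floor_m <= 0:
        raise ValueError("heights must be positive")
    # Round half up (Python's round() uses round-half-to-even).
    return max(1, math.floor(height_m / floor_to_floor_m + 0.5))


for h in (4.0, 9.0, 16.0):
    print(h, estimate_storeys(h))
```

Estimates like these would be natural targets for the crowdsourced checking described below, particularly for attics and basements.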

Automated classification methods, in which training data are used to teach computers to recognise and classify building features using rules predefined by expert knowledge, have been used since 2004 in heat analysis in Germany to predict the geolocation of specific types and ages of building (Neidhart and Sester, 2004). Machine learning has also been used to classify street-block types from topographical maps (Meinel et al., 2009) and has been applied to historical maps to study the persistence, resilience and change of form over long periods of time (Hecht et al., 2015; 2019; Herold, 2015). Computer-assisted approaches to volume data generation, based on interdependencies and mathematical relationships between building components, have also been experimented with since the late 1960s within the science of built form (Steadman, 2016). In the UK, for example, large-scale building adjacency datasets (showing whether a building is detached, semi-detached etc.) have been created using this approach (Orford and Ratcliffe, 2007; Smith and Crooks, 2010; Evans et al., 2017). In the international context, Berghauser Pont et al. reference examples of automated classification work relating to buildings, streets and plots (Berghauser Pont et al., 2019). Significant potential is considered to exist: computational generation is relevant to most data categories and is likely to become the main data production and updating method for most data types. Since 2018, 4.7 million edits have been made to Colouring London, of which all but 200,000 have been generated using computational approaches. As previously discussed, expert checking through crowdsourced feedback loops is also essential in order to maximise quality and accuracy. Crowdsourcing is also necessary to capture historical data that cannot be extracted using automated processes. In Table 9.4, data generation methods are summarised for each Colouring London grid category, along with target uploaders.
Table 9.4 Main Colouring London data upload methods for core category types

| Category | Main anticipated upload method | Main anticipated data providers |
|---|---|---|
| Location | Bulk uploads | OS |
| Current Use | Computational generation | In-house (Colouring London team) |
| | Crowdsourcing | Communities, schools and other |
| Type | Computational generation | In-house |
| | Crowdsourcing | Historic environment sector |
| Age | Computational generation | In-house |
| | Crowdsourcing | Historic environment sector and construction industry |
| Size | Computational generation | In-house |
| | Crowdsourcing (minor) | Construction industry (new build) and historic environment sector |
| Construction | Computational generation | In-house |
| | Bulk uploads | Construction industry, energy research |
| | Crowdsourcing | Historic environment sector and construction industry |
| Streetscape | Computational generation | In-house |
| | Crowdsourcing (minor) | Citizens |
| Team | Crowdsourcing | Historic environment sector |
| | Bulk uploads | Award bodies |
| Planning | Bulk uploads | Planning and historic environment sector |
| | Crowdsourcing (for demolitions) | Citizens |
| | Live-streaming (proposed) | GLA |
| Sustainability | Computational generation | In-house |
| | Bulk uploads | UCL Energy Institute, MHCLG, BEIS |
| | Crowdsourcing (minor) | Citizens |
| Dynamics | Crowdsourcing | Historic environment sector |
| Community | Crowdsourcing | Citizens |

9.13.2 Bulk upload of existing open datasets

Table 9.5 Colouring London, examples of bulk uploads

| Category | Dataset | Source | Cleaning required |
|---|---|---|---|
| Location | Open TOIDs | OS | No |
| | Open UPRNs | OS | Some |
| | OSM IDs | OSM | No |
| Current use | Land use (NLUD) group | Camden/Westminster sample | No |
| Type | Type (NLUD) group | Camden/Westminster sample | No |
| Age | Type (NLUD) group | Author’s Camden/Westminster sample | No |
| Size | Height | Environment Agency (prepared by UCL Energy Institute) | Yes |
| Construction | - | - | |
| Streetscape | - | - | |
| Team | BREEAM awards | BRE, RIBA, etc. | |
| Planning | Listed buildings (aggregated) | Historic England | Yes |
| | Conservation areas | Historic England and individual London local authorities | No |
| Sustainability | DEC | MHCLG | No |
| | EPCs | MHCLG | Yes |
| Dynamics | n/a | n/a | n/a |
| Community | n/a | n/a | n/a |

9.13.3 Computational generation using inference

Most attribute data to date has been generated, and will likely continue to be generated for Colouring London overall, using automated methods and inference. In Summer 2020, 750,000 age data points produced by Flora Roumpani using Kiril Stanilov’s road network data and the author’s method were uploaded onto Colouring London. Algorithms for mixed-use retail data geolocation are also being worked on with Stephan Law. Historic environment sector ecosystem links required to facilitate the development of expert checking feedback loops are being developed with Historic England. Computationally generated height and adjacency data have also kindly been provided by UCL Energy Institute, coordinated by Steve Evans and Dominic Humphrey. Examples of open datasets proposed for computational generation during the second stage are starred in Table 9.6.

Table 9.6 Colouring London, potential for computational uploads

| Category | Dataset | Generator | Derived from/using |
|---|---|---|---|
| Location | OSM centroid coordinates | In-house | OSM/open UPRN |
| Current Use | Domestic buildings default | In-house | Made default for all buildings |
| | Retail buildings* | S. Law (ATI) with author | Inferred using OSMM and historical network data |
| Type | Building adjacency type | UCL Energy Institute | UCL Energy Institute data |
| Age | Construction date | In-house: F. Roumpani (FR) with PH | Inferred from historical network data |
| Size | Storeys* | In-house (FR and PH) | To be inferred using age data and footprint |
| | Roof shape* | In-house (FR and PH) | |
| Construction | Construction system* | In-house (FR and PH) | |
| | Main materials* | In-house (FR and PH) | |
| | Secondary materials* | In-house (FR and PH) | |
| Streetscape | Modal height | UCL Energy Institute | UCL Energy Institute |
| | Pavement width (average)* | N. Palominos | N. Palominos |
| | Street width* | N. Palominos | N. Palominos |
| | Gardens/greenery* | tbc | tbc |
| | Proximity to public green spaces* | Colouring London spatial join | OS open greenspace data |
| | Block permeability* | tbc | tbc |
| Team | n/a | n/a | n/a |
| Planning | n/a | n/a | n/a |
| Sustainability | Lifespan and repairability* | tbc | tbc |
| Dynamics | n/a | n/a | n/a |
| Community | n/a | n/a | n/a |

9.13.4 Crowdsourcing and the importance of the historic environment sector

Crowdsourcing has been identified within this study as of considerable value for checking the accuracy of computationally generated building attribute data, especially in areas such as age. It has also been found to be important in generating types of building attribute data that cannot be generated using automated methods, such as data relating to dynamics or to whether communities think particular buildings operate well, and where a specific degree of precision is required (as in the case of age data). Crowdsourcing is also necessary if a whole-of-society approach to emissions reduction and urban sustainability is to be advanced, as it allows many different types of audience to become practically involved easily. Of the crowdsourced entries to Colouring London so far, age is the data type most commonly uploaded. Indeed, as the Leaderboard shows, over 40,000 age data entries have been made by a single user (spot checks found these to be highly accurate, although sources are not included). Colouring London age data (produced through computational generation and crowdsourcing) has also been described by UCL Energy Institute as the age data of choice for its London Building Stock Model (Steadman et al., 2020). Table 9.6 highlights categories and subcategories to which crowdsourcing has been identified as of particular relevance.
Table 9.6 Colouring London, data categories relevant to crowdsourcing

| Category | Subcategories to which crowdsourcing is relevant | Main anticipated data sources |
|---|---|---|
| Location | Building Name only | Diverse user groups |
| Land use | All subcategories | Diverse user groups, including schools |
| Type | Original Use; Date of Last Change of Use | Historic environment sector |
| Age | All subcategories, combined with computational generation | Historic environment sector and construction industry |
| Size | Storeys (especially attics and basements), combined with computational generation | Historic environment sector |
| Construction | Enhancement of computationally generated data | Historic environment sector and construction industry |
| Streetscape | Enhancement of computationally generated data | Diverse user groups |
| Team | All subcategories | Historic environment sector and construction industry (new build) |
| Planning | Demolitions threats and completions, and completions of new build only | Historic environment sector, residents, commercial owners |
| Sustainability | Retrofit only | Residents and commercial owners |
| Dynamics | All categories | Historic environment sector |
| Community | All categories | Residents |

For crowdsourcing platforms to be successful, voluntary contributions of data must be of mutual benefit: the submitter gains satisfaction from giving data while the platform benefits from receiving it (Estellés-Arolas and González-Ladrón-de-Guevara, 2012). This requires clear, well-thought-out mechanisms to deal not only with the management and storage of collected data but also with issues such as malicious engagement and the tailing-off of user interest (Lauriault and Mooney, 2014). Attracting and sustaining high-quality, reliable editors is known to be difficult. Wikipedia records that only 0.4% of its users are also contributors and that only 0.04% of these are regular contributors (Wikipedia, 2019). The vast majority of edits on Wikipedia and OSM are also made by men (Wikipedia, 2021; Gardner et al., 2019).
The key to harnessing the vast body of expertise held within the historic environment and community planning sector (shown in Table 9.6), and to facilitating a whole-of-society approach as found in the development of the Building Exploratory, is to design an easy-to-use, attractive platform in which high-quality colour graphic design and simplicity are core components, and in which editing tools are all specifically designed for non-technical audiences interested in adding data on buildings and their history. Examples of features designed in response to requests from this sector include the lifespan feature within the Dynamics section (for lifespan data), the Team and Community categories, and weblinks allowing additional historical information to be provided. (Issues of lack of building numbers and the need for better integration of images were raised but have not yet been addressed.)
Work has begun to develop a schools land-use data upload programme and to engage construction industry bodies in supplying information for the Construction and Team sections. However, most effort to date has been given to linking into and working with London’s trusted historic environment and community planning network, as this network holds expert knowledge of local building stocks and their evolution and has the skills and motivation to help build and maintain a high-quality building attribute database for the city. In 2016, the author highlighted opportunities for the historic environment sector to collaborate with data scientists and to radically reposition itself at the forefront of the sustainable and intelligent city debate, owing to its ‘unrivalled knowledge of the building stock and its evolution and of the impact of change’ (Historic England, 2016). Since this time, Historic England has committed strategic grants to Colouring London totalling £150,000, indicating the potential its sees the project as having in delivering conservation sector goals. As discussed in Chapter 2, questions of key interest and relevance to this sector are sustainable city development through incremental adaptation; retention of uniqueness; building reuse and lifespan extension; demolition tracking and designation. All of these are dealt with by Colouring London. The voluntary conservation movement, which has operated in London for over a century (as noted in Chapter 5), is at local level extremely well organised, well informed, practical, creative, forward-thinking, community-focused and effective. Data contributions are likely to be sustained with few prompts. Consultations with community planning groups in Camden and Islington have shown substantial interest in using Colouring London to record and visualise data. 
Colouring London in turn offers the historic environment community a unique tool, allowing it, without any technical know-how, and through the simple, collaborative, and enjoyable process of colouring buildings, to create and manage a public database on the evolution and current composition of the stock. This database is relevant to this community’s work in monitoring and managing older buildings, in celebrating built heritage, and in promoting lifespan extension and reuse. It also allows historians, for the first time, to help scientists build more accurate models and simulations on the performance and dynamics of the stock. Coleman et al. describe drivers of ‘deliberate’ (as opposed to ‘accidental’) crowdsourced uploads as including altruism; professional or personal interest; intellectual stimulation; protection or enhancement of personal investment; social reward; enhanced personal reputation; finding an outlet for creative and independent self-expression; and pride of place (which could simply mean adding detail about your street or building on the map) (Coleman et al., 2009). All of these drivers are relevant to the historic environment sector. In Table 9.7, historic environment groups, and those interested in this area, are classified, following Coleman et al., as Neophyte contributors (having interest and time but no background); interested amateurs (having begun background reading); expert amateurs (knowing a lot but not earning a living from it); expert professional contributors (having studied and practiced a subject and relying on it for a living, and that can be used in relation to opinions); and expert authority contributors (having an established record, reputation and livelihood that can be lost if their credibility is even temporarily damaged). Of these, the expert amateur class, represented by local amenity societies, is considered likely to provide the most data.
Table 9.7 Grouped contributor classes for Colouring London relevant to the collection of building age data. Relevant historic environment stakeholders and potential contributors to building-attribute data based on Coleman et al. classifications Stakeholder classification Home owners/occupiers of older properties Neophyte (with a small percent overlapping other categories) Primary and secondary school children involved in local history studies Heritage site visitors Students of architecture, architectural history, planning, or historic building conservation Expert amateur Local arts, architecture and history societies/groups Historic-building and Blue Badge guides Neighbourhood fora Local civic societies, local amenity societies and community-led planning groups Historic-building and preservation trusts National amenity societies Architectural historians (outside academia) Expert professional Independent heritage organisations (e.g. National Trust) Specialist educators (university/school/freelance) Architectural journalists Construction industry professionals (practising and retired) Professional bodies – RIBA, RICS, Royal Institute of Engineering etc. Local authority officers in planning, housing, conservation University departments/research relating to building conservation; historic building surveys and architectural history; urban morphology; architecture; surveying; civil engineering Expert authority Government departments and advisory bodies, including Historic England, VOA, MHCLG and BEIS

5.4.2.4 Computational classification of dynamic tissue types

Scheer’s dynamic tissue classification offers a simple method for spatially grouping buildings, streets and plots within cities according to their dynamic behaviour, and it can be used within open building attribute data platforms. For this, data on land use, building type (especially adjacency) and specific street characteristics are required: for example, the location of arterial routes, to locate ‘elastic’ and ‘static’ tissue, and of parcels containing access roads, to locate ‘campus’ tissue. Once dynamic tissue types have been geolocated, a dynamic classification can be assigned to each building in the city. This indicates likely change to its physical form over time, and the rhythm and relative speed at which this change is predicted to occur. The potential value of dynamic classification data for stock forecasting models, and for the procedural models discussed below, is considerable. Dynamic classifications can also be verified using recent historical planning data and/or longitudinal data (e.g. vectorised footprints, as described in Stanilov and Batty’s 2011 study). In Chapter 8, semi-automated methods of geolocating dynamic tissue types for London are tested.
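The Python sketch below roughly illustrates how such a classification might be attached to building records. It assigns one of Scheer’s three tissue types from two boolean flags. The `Building` fields, the rule order (campus checked before elastic) and the function name `classify_tissue` are hypothetical assumptions made for this example. A fuller version would also draw on land use and building type data, and would derive the flags from geolocated street and parcel data.

```python
from dataclasses import dataclass
from enum import Enum


class TissueType(Enum):
    """Scheer's three dynamic tissue types, as named in the text."""
    STATIC = "static"
    ELASTIC = "elastic"
    CAMPUS = "campus"


@dataclass
class Building:
    # Hypothetical attributes; field names are illustrative only.
    building_id: str
    fronts_arterial_route: bool     # would be derived from street network data
    parcel_has_access_road: bool    # would be derived from plot/parcel data


def classify_tissue(b: Building) -> TissueType:
    """Assign a tissue type using simplified, hypothetical rules:
    parcels served by access roads -> campus; buildings fronting
    arterial routes -> elastic; everything else -> static."""
    if b.parcel_has_access_road:
        return TissueType.CAMPUS
    if b.fronts_arterial_route:
        return TissueType.ELASTIC
    return TissueType.STATIC


buildings = [
    Building("a", fronts_arterial_route=True, parcel_has_access_road=False),
    Building("b", fronts_arterial_route=False, parcel_has_access_road=False),
    Building("c", fronts_arterial_route=False, parcel_has_access_road=True),
]
print({b.building_id: classify_tissue(b).value for b in buildings})
# {'a': 'elastic', 'b': 'static', 'c': 'campus'}
```

The precedence order matters where a campus parcel also fronts an arterial route. Treating campus first is one plausible choice, not a rule taken from Scheer.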

5.5 Typology subcategories proposed for open data platforms

The chapter concludes with a second draft table, Table 5.3, which proposes subcategories for the ‘Type’ category within the prototype open data platform. It is based on discussion in the second section of this chapter, and in Chapter 6 it is merged with Table 5.2 above. The category is complex and needs further detailed consultation with the organisations and disciplines represented within the workshop. Its main development is therefore postponed to the prototype’s second development stage.

Table 5.3 Examples of data subcategories proposed for the typology section

| Open typology subcategories proposed (all to include dropdown options) | Sample descriptions |
|---|---|
| Base form (after Steadman et al., 2000) | e.g. Daylit (sidelit) cellular strip 1–4 storeys |
| Base form density classification (after Berghauser Pont and Haupt, 2005) | e.g. Low rise compact strip development blocks |
| Local typology description (after regional urban morphology classification) | e.g. London Victorian terraced house |
| Adjacency description | e.g. Terraced, Semi-detached or detached |
| Parasite forms (after Steadman et al., 2000) | e.g. Basement |
| Dynamic mutations observed (after Kostourou, 2021) | e.g. Extrude + extend |
| Dynamic tissue classification (after Scheer, 2010) | e.g. ‘Static’ |
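Every proposed typology subcategory is to include dropdown options. One way to picture this is as a set of fields, each restricted to an agreed list of values. The Python sketch below is hypothetical: `DropdownField`, the field keys and `validate_entry` are illustrative names. The option lists are limited to the sample descriptions in Table 5.3 and the three tissue types named in the text, pending the consultation described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DropdownField:
    """A typology subcategory whose values are restricted to a dropdown list."""
    label: str
    after: str                  # classification the options follow, if any
    options: tuple[str, ...]

    def is_valid(self, value: str) -> bool:
        # Only values offered in the dropdown are accepted.
        return value in self.options


# Illustrative option lists only; full lists would come from consultation.
TYPOLOGY_FIELDS = {
    "adjacency": DropdownField(
        "Adjacency description", "", ("Terraced", "Semi-detached", "Detached")
    ),
    "dynamic_tissue": DropdownField(
        "Dynamic tissue classification", "Scheer, 2010", ("Static", "Elastic", "Campus")
    ),
}


def validate_entry(field_key: str, value: str) -> bool:
    """Check a user-submitted value against the dropdown for that subcategory."""
    spec = TYPOLOGY_FIELDS.get(field_key)
    return spec is not None and spec.is_valid(value)


print(validate_entry("adjacency", "Terraced"))      # True
print(validate_entry("adjacency", "Bungalow"))      # False
print(validate_entry("dynamic_tissue", "Elastic"))  # True
```

Restricting entries to fixed options keeps crowdsourced typology data comparable across buildings, at the cost of needing the option lists agreed in advance.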

Problems of complexity

Data Capture methods

Bulk third party open data uploads

Crowdsourcing

Computational approaches using inference and feedback loops

Live streaming

9.13 Data production methods

9.13.1 Summary of main data production methods

Colouring London is a collaborative maintenance project that uses four methods of data capture, one of which is the crowdsourcing and public editing of data. As discussed in previous chapters, the other three are bulk uploads of existing datasets, large-scale computational generation, and live streaming. This multipronged approach is considered the most effective way to produce the highest-quality data possible, at the fastest speed, for the city as a whole, and to update and maintain it. Bulk upload collation, computational generation and crowdsourcing have all been tested in this study and are commented on below. Live streaming of planning data with the GLA is proposed for the second development stage. The relevance of each method to the twelve main data categories is shown in Table 9.4. Of the four methods, bulk upload collation is considered likely to produce the smallest amount of data for Colouring London until OSMM and VOA property tax data are released. Computational generation is considered relevant to most data categories and is likely to become the main data production and updating method for most data types. Since 2018, 4.7 million edits to Colouring London have been made, of which all but 200,000 have been generated using computational approaches. As previously discussed, expert checking through crowdsourced feedback loops is also essential in order to maximise quality and accuracy. Crowdsourcing is also necessary to capture historical data that cannot be extracted using automated processes. In Table 9.4, data generation methods are summarised for each Colouring London grid category, along with target uploaders.
Table 9.4 Main Colouring London data upload methods for core category types

| Category | Main anticipated upload method | Main anticipated data providers |
|---|---|---|
| Location | Bulk uploads | OS |
| Current Use | Computational generation | In-house (Colouring London team) |
| | Crowdsourcing | Communities, schools and other |
| Type | Computational generation | In-house |
| | Crowdsourcing | Historic environment sector |
| Age | Computational generation | In-house |
| | Crowdsourcing | Historic environment sector and construction industry |
| Size | Computational generation | In-house |
| | Crowdsourcing (minor) | Construction industry (new build) and historic environment sector |
| Construction | Computational generation | In-house |
| | Bulk uploads | Construction industry, energy research |
| | Crowdsourcing | Historic environment sector and construction industry |
| Streetscape | Computational generation | In-house |
| | Crowdsourcing (minor) | Citizens |
| Team | Crowdsourcing | Historic environment sector |
| | Bulk uploads | Award bodies |
| Planning | Bulk uploads | Planning and historic environment sector |
| | Crowdsourcing (for demolitions) | Citizens |
| | Live-streaming (proposed) | GLA |
| Sustainability | Computational generation | In-house |
| | Bulk uploads | UCL Energy Institute, MHCLG, BEIS |
| | Crowdsourcing (minor) | Citizens |
| Dynamics | Crowdsourcing | Historic environment sector |
| Community | Crowdsourcing | Citizens |

9.13.2 Bulk upload of existing open datasets

Bulk upload was the first data production method tested for Colouring London. The first datasets to be uploaded were age and land-use data from the Camden/Westminster sample. Other bulk uploads include open TOIDs from OS, conservation area data from London local authorities, aggregated listed building data from Historic England, and BREEAM data from the BRE. Tower block data has also been offered by Edinburgh University. The surprising finding was how few open bulk uploads of relevant data were available for London.
Bulk uploads from trusted sources have two advantages. Their quality is likely to have been monitored or to be under ongoing monitoring, and they are already known to be of interest and relevance to a specific audience group. The disadvantage is that at present such uploads are limited, although OSMM, VOA property tax data and other national datasets may be released in the future. Some bulk uploads, such as the NHLE, also require significant cleaning before they can be integrated. A bulk upload queuing system, allowing datasets to be more easily pre-checked and uploaded in-house, is planned. Examples of existing bulk datasets gathered to date are shown in Table 9.5. These were either identified by the in-house team or recommended by partners, although not all of those listed below have yet been uploaded.

Table 9.5 Colouring London, examples of bulk uploads

| Category | Dataset | Source | Cleaning required |
|---|---|---|---|
| Location | Open TOIDs | OS | No |
| | Open UPRNs | OS | Some |
| | OSM IDs | OSM | No |
| Current use | Land use (NLUD) group | Camden/Westminster sample | No |
| Type | Type (NLUD) group | Camden/Westminster sample | No |
| Age | Type (NLUD) group | Author’s Camden/Westminster sample | No |
| Size | Height | Environment Agency (prepared by UCL Energy Institute) | Yes |
| Construction | - | - | |
| Streetscape | - | - | |
| Team | BREEAM awards | BRE, RIBA, etc. | |
| Planning | Listed buildings (aggregated) | Historic England | Yes |
| | Conservation areas | Historic England and individual London local authorities | No |
| Sustainability | DEC | MHCLG | No |
| | EPCs | MHCLG | Yes |
| Dynamics | n/a | n/a | n/a |
| Community | n/a | n/a | n/a |

9.13.3 Computational generation using inference

Most attribute data to date has been generated using automated methods and inference, and most attribute data for Colouring London overall is likely to be generated the same way. In summer 2020, 750,000 age data points were uploaded onto Colouring London. They were produced by Flora Roumpani using Kiril Stanilov’s road network data and the author’s method. Algorithms for mixed-use retail data geolocation are also being worked on with Stephen Law.
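The Python sketch below gives a deliberately simplified picture of network-based age inference. Each building centroid is tested against a series of dated street network snapshots. The earliest snapshot with a street segment within a buffer distance is returned as a proxy for when the building’s frontage was laid out. This is not the method used to produce the 750,000 uploaded data points. The snapshot years, the 50 m threshold, the use of centroids rather than footprints, and the function names are all assumptions for illustration. Coordinates are assumed to be in a projected system measured in metres (e.g. British National Grid).

```python
import math
from typing import Optional

Point = tuple[float, float]
Segment = tuple[Point, Point]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b (projected metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: treat as a single point
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def first_network_year(centroid: Point,
                       networks: dict[int, list[Segment]],
                       max_dist: float = 50.0) -> Optional[int]:
    """Earliest snapshot year with a segment within max_dist of the centroid,
    or None if no snapshot has one."""
    for year in sorted(networks):
        if any(point_segment_distance(centroid, a, b) <= max_dist
               for a, b in networks[year]):
            return year
    return None


# Toy cumulative snapshots: each year includes all streets mapped earlier.
old_route = ((0.0, 0.0), (1000.0, 0.0))
new_street_1830 = ((500.0, 0.0), (500.0, 800.0))
new_street_1880 = ((800.0, 690.0), (1000.0, 690.0))
networks = {
    1786: [old_route],
    1830: [old_route, new_street_1830],
    1880: [old_route, new_street_1830, new_street_1880],
}

centroids = {"A": (200.0, 20.0), "B": (530.0, 400.0),
             "C": (900.0, 700.0), "D": (5000.0, 5000.0)}
results = {bid: first_network_year(c, networks) for bid, c in centroids.items()}
print(results)  # {'A': 1786, 'B': 1830, 'C': 1880, 'D': None}
```

Under this assumption, a building whose nearest street first appears in the 1830 snapshot was probably built after 1786, when that street was not yet mapped. Buildings on the oldest routes get only a weak constraint, which is one reason crowdsourced and expert checking of such estimates remains important.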
Historic environment sector ecosystem links, required to facilitate the development of expert checking feedback loops, are being developed with Historic England. Computationally generated height and adjacency data have also kindly been provided by UCL Energy Institute, coordinated by Steve Evans and Dominic Humphrey. Examples of open datasets proposed for computational generation during the second stage are starred in Table 9.6.

Table 9.6 Colouring London, potential for computational uploads

| Category | Dataset | Generator | Derived from/using |
|---|---|---|---|
| Location | OSM centroid coordinates | In-house | OSM/open UPRN |
| Current Use | Domestic buildings default | In-house | Made default for all buildings |
| | Retail buildings* | S. Law (ATI) with author | Inferred using OSMM and historical network data |
| Type | Building adjacency type | UCL Energy Institute | UCL Energy Institute data |
| Age | Construction date | In-house: F. Roumpani (FR) with PH | Inferred from historical network data |
| Size | Storeys* | In-house (FR and PH) | To be inferred using age data and footprint |
| | Roof shape* | In-house (FR and PH) | |
| Construction | Construction system* | In-house (FR and PH) | |
| | Main materials* | In-house (FR and PH) | |
| | Secondary materials* | In-house (FR and PH) | |
| Streetscape | Modal height | UCL Energy Institute | UCL Energy Institute |
| | Pavement width (average)* | N. Palominos | N. Palominos |
| | Street width* | N. Palominos | N. Palominos |
| | Gardens/greenery* | tbc | tbc |
| | Proximity to public green spaces* | Colouring London spatial join | Use OS open greenspace data |
| | Block permeability* | tbc | tbc |
| Team | n/a | n/a | n/a |
| Planning | n/a | n/a | n/a |
| Sustainability | Lifespan and repairability* | tbc | tbc |
| Dynamics | n/a | n/a | n/a |
| Community | n/a | n/a | n/a |

9.13.4 Crowdsourcing and the importance of the historic environment sector

This study has found crowdsourcing to be of considerable value for checking the accuracy of computationally generated building attribute data, especially in areas such as age. It has also been found to be important for generating building attribute data that cannot be produced using automated methods. Examples include data relating to dynamics, or to whether communities think particular buildings operate well. It also matters where a specific degree of precision is required, as with age data. Crowdsourcing is also necessary if a whole-of-society approach to emissions reduction and urban sustainability is to be advanced, as it allows many different types of audience to become practically involved with ease. Of the crowdsourced entries made to Colouring London so far, age is the most commonly uploaded data type. Indeed, as the Leaderboard shows, over 40,000 age data entries have been made by a single user. Spot checks have found these entries to be highly accurate, although sources are not included. The UCL Energy Institute has also described Colouring London age data (produced through computational generation and crowdsourcing) as the age data of choice for its London Building Stock Model (Steadman et al., 2020). Table 9.6 highlights categories and subcategories to which crowdsourcing has been identified as being of particular relevance.
Table 9.6 Colouring London, data categories relevant to crowdsourcing

| Category | Subcategories to which crowdsourcing is relevant | Main anticipated data sources |
|---|---|---|
| Location | Building name only | Diverse user groups |
| Land use | All subcategories | Diverse user groups, including schools |
| Type | Original Use; Date of Last Change of Use | Historic environment sector |
| Age | All subcategories, combined with computational generation | Historic environment sector and construction industry |
| Size | Storeys (especially attics and basements), combined with computational generation | Historic environment sector |
| Construction | Enhancement of computationally generated data | Historic environment sector and construction industry |
| Streetscape | Enhancement of computationally generated data | Diverse user groups |
| Team | All subcategories | Historic environment sector and construction industry (new build) |
| Planning | Demolition threats and completions, and completions of new build only | Historic environment sector, residents, commercial owners |
| Sustainability | Retrofit only | Residents and commercial owners |
| Dynamics | All categories | Historic environment sector |
| Community | All categories | Residents |

For crowdsourcing platforms to be successful, voluntary contributions of data must be of mutual benefit: the submitter gains satisfaction from giving data and the platform benefits from receiving it (Estellés-Arolas and González-Ladrón-de-Guevara, 2012). This requires clear and well-thought-out mechanisms to deal not only with the management and storage of collected data but also with issues such as malicious engagement and the tailing-off of user interest (Lauriault and Mooney, 2014). Attracting and sustaining high-quality, reliable editors is known to be difficult. Wikipedia records that only 0.4% of its users are also contributors and that only 0.04% of these are regular contributors (Wikipedia, 2019). The vast majority of edits on Wikipedia and OSM are also made by men (Wikipedia, 2021; Gardner et al., 2019).
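The quality-management concerns raised above include malicious engagement, unsourced entries, and the need for expert checking of computationally generated data. The small Python sketch below makes them concrete as a review queue for crowdsourced age edits. The `AgeEdit` structure, the 25-year tolerance and the three routing outcomes are hypothetical choices for illustration, not Colouring London’s actual moderation rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgeEdit:
    """A crowdsourced construction-date entry (hypothetical structure)."""
    building_id: str
    year: int
    source: Optional[str] = None  # e.g. a weblink or archive reference


def triage(edit: AgeEdit, computed_year: Optional[int], tolerance: int = 25) -> str:
    """Route a crowdsourced age edit using illustrative rules:
    - 'accept' if it agrees with the computed estimate within `tolerance` years;
    - 'expert_review' if no estimate exists, or it disagrees but cites a source;
    - 'flag' if it disagrees and cites nothing (possible error or vandalism).
    """
    if computed_year is None:
        return "expert_review"
    if abs(edit.year - computed_year) <= tolerance:
        return "accept"
    return "expert_review" if edit.source else "flag"


computed = {"b1": 1880, "b2": 1880, "b3": 1880, "b4": None}
edits = [
    AgeEdit("b1", 1875),
    AgeEdit("b2", 1700, source="estate map reference"),
    AgeEdit("b3", 2020),
    AgeEdit("b4", 1930),
]
for e in edits:
    print(e.building_id, triage(e, computed[e.building_id]))
# b1 accept
# b2 expert_review
# b3 flag
# b4 expert_review
```

Routing sourced disagreements to experts, rather than rejecting them, reflects the point that crowdsourcing is needed precisely where automated estimates are weakest.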
The key to harnessing the vast body of expertise held within the historic environment and community planning sector (shown in Table 9.6) is to design an easy-to-use, attractive platform. High-quality colour graphic design and simplicity should be core components, and all editing tools should be designed specifically for non-technical audiences interested in adding data on buildings and their history. The same design is also key to facilitating a whole-of-society approach like that found in the development of the Building Exploratory. Features designed in response to requests from this sector include the lifespan feature within the Dynamics section, the Team and Community categories, and weblinks allowing additional historical information to be provided. (Issues relating to the lack of building numbers and the need for better integration of images have been raised but have not yet been addressed.)
Work has begun to develop a schools land-use data upload programme and to engage construction industry bodies in supplying information for the Construction and Team sections. However, most effort to date has gone into linking into and working with London’s trusted historic environment and community planning network. This network holds expert knowledge of local building stocks and their evolution, and has the skills and motivation to help build and maintain a high-quality building attribute database for the city. In 2016, the author highlighted opportunities for the historic environment sector to collaborate with data scientists and to radically reposition itself at the forefront of the sustainable and intelligent city debate, owing to its ‘unrivalled knowledge of the building stock and its evolution and of the impact of change’ (Historic England, 2016). Since then, Historic England has committed strategic grants to Colouring London totalling £150,000, indicating the potential it sees in the project for delivering conservation sector goals. As discussed in Chapter 2, questions of key interest and relevance to this sector include:
• sustainable city development through incremental adaptation;
• retention of uniqueness;
• building reuse and lifespan extension;
• demolition tracking and designation.

All of these are dealt with by Colouring London. London’s voluntary conservation movement has operated for over a century (as noted in Chapter 5). At local level it is extremely well organised, well informed, practical, creative, forward-thinking, community-focused and effective. Data contributions are likely to be sustained with few prompts. Consultations with community planning groups in Camden and Islington have shown substantial interest in using Colouring London to record and visualise data.
Colouring London in turn offers the historic environment community a unique tool. Without any technical know-how, and through the simple, collaborative and enjoyable process of colouring buildings, the community can create and manage a public database on the evolution and current composition of the stock. This database is relevant to the community’s work in monitoring and managing older buildings, in celebrating built heritage, and in promoting lifespan extension and reuse. It also allows historians, for the first time, to help scientists build more accurate models and simulations of the performance and dynamics of the stock. Coleman et al. describe the drivers of ‘deliberate’ (as opposed to ‘accidental’) crowdsourced uploads as including:
• altruism;
• professional or personal interest;
• intellectual stimulation;
• protection or enhancement of personal investment;
• social reward;
• enhanced personal reputation;
• finding an outlet for creative and independent self-expression;
• pride of place, which could simply mean adding detail about your street or building on the map (Coleman et al., 2009).

All of these drivers are relevant to the historic environment sector. In Table 9.7, historic environment groups, and those interested in this area, are classified following Coleman et al. into five contributor classes:
• neophyte contributors (having interest and time but no background);
• interested amateurs (having begun background reading);
• expert amateurs (knowing a lot but not earning a living from it);
• expert professional contributors (having studied and practised a subject, relying on it for a living, and being liable to be sued in relation to their opinions);
• expert authority contributors (having an established record, reputation and livelihood that can be lost if their credibility is even temporarily damaged).

Of these, the expert amateur class, represented by local amenity societies, is considered likely to provide the most data.
Table 9.7 Grouped contributor classes for Colouring London relevant to the collection of building age data: relevant historic environment stakeholders and potential contributors to building-attribute data, based on Coleman et al. classifications

| Stakeholder | Classification |
|---|---|
| Home owners/occupiers of older properties | Neophyte (with a small percentage overlapping other categories) |
| Primary and secondary school children involved in local history studies | |
| Heritage site visitors | |
| Students of architecture, architectural history, planning, or historic building conservation | Expert amateur |
| Local arts, architecture and history societies/groups | |
| Historic-building and Blue Badge guides | |
| Neighbourhood fora | |
| Local civic societies, local amenity societies and community-led planning groups | |
| Historic-building and preservation trusts | |
| National amenity societies | |
| Architectural historians (outside academia) | Expert professional |
| Independent heritage organisations (e.g. National Trust) | |
| Specialist educators (university/school/freelance) | |
| Architectural journalists | |
| Construction industry professionals (practising and retired) | |
| Professional bodies – RIBA, RICS, Royal Institute of Engineering etc. | |
| Local authority officers in planning, housing, conservation | |
| University departments/research relating to building conservation; historic building surveys and architectural history; urban morphology; architecture; surveying; civil engineering | Expert authority |
| Government departments and advisory bodies, including Historic England, VOA, MHCLG and BEIS | |

Experimenting with data capture using semi-automated methods and methods of collecting longitudinal data

8.1 Introductory remarks

This chapter brings together sample data captured in Chapter 6 and findings from earlier chapters to begin exploring automated methods of data generation. The aim is to allow the prototype platform to release large-scale open age datasets from the outset. Simple methods of analysing dynamic behaviour are also tested, using Camden/Westminster sample data and data generated from the historical maps described below. These illustrate how quickly insights can be gained into stock composition and dynamics where microspatial data are available. Only limited experimentation has been possible, owing to time constraints and the scale of work involved in platform set-up (including stakeholder consultation; software engineering, fundraising and management; and interface design). Of the methods tested, those described below are considered to warrant further research. The chapter describes exploratory work relating to:
• semi-automated methods of generating age and land-use data;
• rapid assessment methods relating to the speed of densification in plots in London;
• methods testing a building’s propensity to convert its use or to host mixed uses;
• the use of age diversity as a potential indicator of resilience, and of vulnerability to demolition;
• the value of longitudinal data in identifying locked-in cycles of demolition in London spanning over 100 years.

Within the first four areas, methods of data capture relating to ‘elastic’, ‘static’ and ‘campus’ dynamic tissue types for London are also described.

8.2 Generating open land-use data using semi-automated methods & geolocating ‘elastic’ tissue in London

The association of retail with high accessibility routes in London has already been recorded in a number of studies (Hillier, 1996; Crook and Smith, 2010; Stanilov and Batty, 2011; Masucci et al., 2013).
The geolocation of non-domestic and mixed-use buildings along high accessibility routes has been illustrated by the LBSM. Stanilov and Batty have also identified links between high accessibility routes and preurban networks (Chapter 3). In their work with Masucci, they describe these as forming the ‘backbone’ of London (Masucci, 2013, p.3). Scheer (Chapter 4) has shown that high accessibility routes are characterised by ‘elastic’ tissue, a heterogeneous dynamic tissue type along which retail clusters. Torma et al. (Chapter 4) have visualised, at building level, the ongoing cycle of incremental demolition and construction operating on London’s suburban high streets and along this type of route. Two related patterns were also identified and illustrated in Chapter 6, using the Camden/Westminster sample, and in work with Kalla, described in Chapter 4. These are the association of retail with mixed use, and the spatial distribution of retail/residential buildings along high accessibility/preurban routes. In Figure 8.1, preurban routes for Greater London are visualised in three colours: red for the Roman period, blue for the medieval period, and yellow for the post-medieval period up to 1786. (The choice of 1786 as a cut-off point for the preurban period is based on Stanilov and Batty’s 2011 definition. That definition is itself based on the availability of London-wide maps from around the turn of the 18th century.) Saxon sites, from which many of London’s high streets were to spring, are shown as black dots. Roman, Saxon and medieval data were provided by Peter Rauxloh at the Museum of London Archaeology (MOLA), and 1786 data by Kiril Stanilov. These networks can be compared with OS’s London high street distribution (Ordnance Survey, 2019a), as shown in Figure 8.2.

In this first experiment, the hypothesis tested was that preurban networks could be used to generate open datasets for London relating to retail and mixed land use. During collection, mixed-use buildings in the Camden/Westminster sample were observed to often contain retail use. These buildings were first mapped in ArcGIS alongside 1786 network data. In Figure 8.3, mixed use, shown in red, can be seen clustering along the preurban network, shown in turquoise (much of which overlaps with much earlier medieval routes). In contrast, little mixed use appears to cluster along 1830 streets, shown in yellow (1830 being the next year for which Stanilov’s street network data are available). An anomaly was, however, found in Park Street, a small, diverse shopping street forking west off Camden High Street, shown boxed in Figure 8.3. Here mixed retail/residential buildings were found to cluster along the 1830 rather than the 1786 route. It was therefore hypothesised that the route might in fact be much older. It had perhaps been omitted from the map from which Stanilov’s 1786 network data were generated. To test this, MOLA’s medieval archaeology map for Greater London (Museum of London Archaeology, 2015) was checked. As shown in the inset image in Figure 8.3, Park Street does in fact follow the shortest path between two Saxon settlements, likely to have developed over time into a well-used route. MOLA’s maps were also used to check a second anomaly slightly to the north of Park Street. This was also found to run along the shortest path between two Saxon sites. Given the observed spatial relationship between retail/residential buildings and the preurban network, the next hypothesis tested was that preurban networks could be used to rapidly generate open retail and mixed-use data for Greater London. As mixed-use data for Greater London were not available to the study for verification, the hypothesis could be tested only in relation to retail.
This was done by using OS’s AddressBase Plus product (accessed 7 December 2017) to measure the percentage of retail buildings in Greater London clustering along the preurban routes. In 2017 the AddressBase Plus dataset for London comprised 240,451 OSMM footprints classed as ‘commercial’ (classification code CR). The commercial class is in fact very broad and includes many types of non-domestic land use (Ordnance Survey, 2021f). From the commercial dataset, 102,357 OSMM polygons with the secondary description ‘retail’ were extracted. Buffers of 50m and 75m were then generated in ArcGIS from the centreline of Stanilov’s 1786 network for the whole of London. The 50m buffer was designed to capture buildings facing directly onto commercial roads, and the 75m buffer to also capture buildings set immediately to their rear. A spatial join was then carried out between the AddressBase Plus ‘commercial’ OSMM polygons and each of the two preurban network buffers, and then repeated for the AddressBase Plus ‘retail’ polygons. The method was also repeated for the 1830 and 1880 networks so that clustering could be compared with later routes. The number of ‘commercial’ and ‘retail’ polygons falling within each size of buffer, for each network date, was then tabulated, as shown in Table 8.1. The spatial join was kindly carried out by Bin Chi at CASA, as the join was quick to execute and Chi had already secured permission to access the OS AddressBase Plus product and downloaded OSMM polygons for the whole of Greater London.

Table 8.1 Number of commercial and retail OSMM AddressBase Plus polygons found within London within a 50m and 75m buffer of the 1786, 1830 and 1880 networks

| Polygon type | All 1786 network: no. OSMM polygons within 50m buffer | All 1786 network: no. OSMM polygons within 75m buffer | All 1830 network: no. OSMM polygons within 50m buffer | All 1830 network: no. OSMM polygons within 75m buffer | All 1880 network: no. OSMM polygons within 50m buffer | All 1880 network: no. OSMM polygons within 75m buffer | Total OSMM polygons assessed in Greater London |
|---|---:|---:|---:|---:|---:|---:|---:|
| Commercial | 156,606 | 169,729 | 172,724 | 184,614 | 190,296 | 201,735 | 240,451 |
| Retail | 79,107 | 82,580 | 85,337 | 88,151 | 90,439 | 93,082 | 102,357 |

Percentage of commercial and retail polygons falling within fifty- and seventy-five-metre buffers of Stanilov’s vectorised street network data for Greater London, 1786, 1830 and 1880

| Street network | % commercial polygons in 50m buffer | % commercial polygons in 75m buffer | % retail polygons in 50m buffer | % retail polygons in 75m buffer |
|---|---:|---:|---:|---:|
| 1786 network | 65.13% | 70.59% | 77.29% | 80.68% |
| 1830 network | 71.83% | 76.78% | 83.37% | 86.12% |
| 1880 network | 79.14% | 83.90% | 88.36% | 90.94% |
| % falling along new streets built 1786–1830 | 6.70% | 6.19% | 6.08% | 5.44% |
| % falling along new streets built 1786–1880 | 14.01% | 13.31% | 11.07% | 10.26% |

Nearly 81% of all polygons classified in AddressBase Plus as ‘retail’ were found to fall within a seventy-five-metre buffer of the 1786 network, and 77% within the fifty-metre buffer. For ‘commercial’ buildings as a whole this was slightly lower, with 71% found within the seventy-five-metre buffer and 65% within the fifty-metre buffer. Only around 6% more ‘commercial’ and ‘retail’ OSMM polygons were found on new road segments built between 1786 and 1830, for either buffer size, and only 10–14% more on segments built between 1786 and 1880. The results confirmed that vectorised data for preurban routes have potential value in supporting the generation of open retail datasets. The method is considered sufficiently robust to warrant testing in other cities, and is easily reproducible where late-18th-century maps (from which street networks can be georeferenced and vectorised) and OSMM footprints are available. It is also considered useful for the rapid geolocation of ‘elastic’ tissue, and for the capture and release of data on dynamic properties of buildings built along these routes (Chapter 4). Suggested second-stage work includes more detailed analysis of the number of pure retail buildings, compared with mixed-use retail buildings, falling along preurban routes. To support this, the author is currently working with Stephen Law at the Alan Turing Institute to design an algorithm using the 1786 network data and ‘rules’ relating to footprint size, shape and relation to plot.
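The buffer-and-count procedure behind Table 8.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ArcGIS workflow used in the study: each OSMM polygon is reduced to a single centroid, the street network is a list of straight segments in a projected coordinate system measured in metres, and a polygon counts as inside a buffer when its centroid lies within the buffer distance of any segment (an ArcGIS spatial join on full polygon geometry can give slightly different counts). All function names are illustrative. The counts are those reported in Table 8.1; the two ‘new streets’ rows of the table appear to be differences between the rounded percentages, which is what the sketch computes.

```python
from math import hypot

Point = tuple[float, float]
Segment = tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Shortest distance (projected units, e.g. metres) from a point to a straight segment."""
    (x, y), ((x1, y1), (x2, y2)) = p, seg
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: distance to its single point
        return hypot(x - x1, y - y1)
    # Project the point onto the segment, clamping to the segment's end points.
    t = max(0.0, min(1.0, ((x - x1) * dx + (y - y1) * dy) / length_sq))
    return hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def count_within_buffer(centroids: list[Point], network: list[Segment], buffer_m: float) -> int:
    """Count polygons (assumed represented by centroids) within buffer_m of any network segment."""
    return sum(
        1 for c in centroids
        if any(point_segment_distance(c, s) <= buffer_m for s in network)
    )


def share(count: int, total: int) -> float:
    """Percentage of all polygons of one type, rounded to two decimal places."""
    return round(100 * count / total, 2)


# Counts reported in Table 8.1: (polygon type, network year, buffer in metres) -> polygons in buffer.
TOTALS = {"commercial": 240_451, "retail": 102_357}
COUNTS = {
    ("commercial", 1786, 50): 156_606, ("commercial", 1786, 75): 169_729,
    ("commercial", 1830, 50): 172_724, ("commercial", 1830, 75): 184_614,
    ("commercial", 1880, 50): 190_296, ("commercial", 1880, 75): 201_735,
    ("retail", 1786, 50): 79_107, ("retail", 1786, 75): 82_580,
    ("retail", 1830, 50): 85_337, ("retail", 1830, 75): 88_151,
    ("retail", 1880, 50): 90_439, ("retail", 1880, 75): 93_082,
}


def pct(kind: str, year: int, buffer_m: int) -> float:
    """Share of all polygons of `kind` within the buffer of the `year` network."""
    return share(COUNTS[(kind, year, buffer_m)], TOTALS[kind])


def added_since_1786(kind: str, year: int, buffer_m: int) -> float:
    """Extra share captured by streets built after 1786 (difference of rounded percentages)."""
    return round(pct(kind, year, buffer_m) - pct(kind, 1786, buffer_m), 2)
```

For example, `pct("retail", 1786, 75)` gives 80.68 and `added_since_1786("commercial", 1880, 50)` gives 14.01, matching the table. At Greater London scale, a spatial index or a GIS tool would replace the brute-force loop in `count_within_buffer`.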
Verification of such an algorithm is possible using OS’s high street dataset for London, available to universities under a public mapping agreement licence. Tracking fluctuations in the proportion of retail, and in the size of retail buildings, falling along preurban routes, and the socio-economic impact of this, will also be important in future, particularly in the light of recent planning policies which will encourage the replacement of high street retail in London with residential stock (Gardiner and Hopkirk, 2020). An obvious question arises from the findings above: where are new shops, which have a natural affinity to the preurban routes along which retail uses have clustered for centuries, meant to go in future?

8.3 Measuring volatility and transformation of land uses

The capacity of land uses to convert and mix has been described by Jacobs, Page and Moffatt (Chapter 3) as an important characteristic of resilient and sustainable urban form. The LBSM has illustrated the extent to which such mixing is occurring in London’s stock today, and work with Matthias Kalla (Chapter 4) has shown the propensity for residential use to transition to mixed retail and residential use along preurban routes/‘elastic’ tissue. Conversion to pure commercial use along ‘elastic’ tissue, where such tissue is denser (as in the historic core), was also observed during Camden/Westminster sample collection. This section presents a method for measuring the ‘volatility’ of different building typologies, and their propensity to adapt through mixing or conversion. This involves a comparison of ‘original land use’ and ‘current land use’ data from the Camden/Westminster sample. An assessment is also made of the likelihood of conversion occurring along preurban routes. NLUD ‘Order’ and ‘Group’ categories for ‘original use’ and ‘current land use’ data were first extracted from the Camden/Westminster sample, with only current land-use groups represented by over fifty OSMM polygons included.
Data were then exported into an Excel pivot table, and the percentage change between original use and current use was calculated for all land-use types. Buildings were sorted from highest to lowest percentage change to show the propensity of each land use for conversion. These were then grouped into those falling within a 50m buffer of the 1786 network for the sample area and those not. Results are shown in Table 8.2 (by land use [NLUD ‘Order’]) and Table 8.3 (by building type [NLUD ‘Group’]). Residential stock can be seen to be particularly stable and resistant to change, with 90% of polygons (of a sample of 15,714) retaining their residential use within the whole sample, regardless of date of construction. This is in line with Scheer’s description of ‘static’ tissue as being exceptionally stable. Where change did occur, it was concentrated along preurban routes, where, as Kalla had recorded (Chapter 4), front gardens had often been built over and ground floors converted to retail use. This is demonstrated by the fact that only 69% of dwelling polygons in the sample area built along these routes (Table 8.3) have sustained pure residential use over time, compared with 90% for the sample as a whole. In Tables 8.2 and 8.3, 100% of original retail/residential use, i.e. where shops were purpose-built with dwellings above (in some cases also used for offices or storage), remained unchanged by 2016. This reflects the observation (made during sample data collection) that purpose-built retail/residential buildings are a particularly stable, flexible and useful typology which should be retained where possible. The most unstable land uses in the sample related to manufacturing buildings (factories) and storage buildings (warehouses), where only 39% and 30% of all polygons respectively retained their original use. Both were found to have converted to a range of activity uses, including offices, dwellings and cultural centres.
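The pivot-table calculation can be expressed as a small grouping routine. The sketch below is a hypothetical Python equivalent of the Excel pivot table, not the author’s workbook: each record (the field names are assumptions) holds a polygon’s original use, its current (2016) use and whether it falls within the 50m buffer of the 1786 network, and the output mirrors the columns of Tables 8.2 and 8.3, with `None` standing in for ‘n/a’.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Polygon:
    original_use: str      # NLUD 'Order' (or 'Group') of the original use
    current_use: str       # NLUD category recorded for current (2016) use
    in_1786_buffer: bool   # True if the polygon lies within 50m of the 1786 network


# (use, % unchanged in buffer, % unchanged in whole sample, n in buffer, n in whole sample)
Row = tuple[str, Optional[int], int, int, int]


def retention_table(polygons: list[Polygon]) -> list[Row]:
    """Percentage of polygons whose current use equals their original use, per original use."""
    # tallies[use] = [unchanged in buffer, total in buffer, unchanged overall, total overall]
    tallies: dict[str, list[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for p in polygons:
        t = tallies[p.original_use]
        unchanged = int(p.original_use == p.current_use)
        t[2] += unchanged
        t[3] += 1
        if p.in_1786_buffer:
            t[0] += unchanged
            t[1] += 1

    rows: list[Row] = []
    for use, (kept_buf, n_buf, kept_all, n_all) in tallies.items():
        pct_buf = round(100 * kept_buf / n_buf) if n_buf else None  # 'n/a' in the tables
        rows.append((use, pct_buf, round(100 * kept_all / n_all), n_buf, n_all))

    # Most stable first; uses with no polygons in the buffer go last.
    rows.sort(key=lambda r: (r[1] is None, -(r[1] or 0), -r[2]))
    return rows
```

For a toy sample of ten residential polygons, three of them in the buffer with one converted to residential/retail, the routine returns 67% retained within the buffer and 90% across the whole sample, the same kind of contrast reported above for dwellings along preurban routes.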
Table 8.2 Volatility of land uses within the Camden/Westminster sample by land use (NLUD Order)

| Polygon description by land use (NLUD ‘Order’) | % polygons with same original and current use (2016), within 1786 50m buffer only | % polygons with same original and current use (2016), within whole sample | No. polygons assessed within 1786 50m buffer | No. polygons assessed in whole sample |
|---|---:|---:|---:|---:|
| INDUSTRY AND BUSINESS/RECREATION AND LEISURE | 100 | 100 | 1 | 1 |
| INDUSTRY AND BUSINESS/RETAIL | 100 | 100 | 301 | 572 |
| INDUSTRY AND BUSINESS/TRANSPORT | 100 | 100 | 1 | 11 |
| RESIDENTIAL/INDUSTRY AND BUSINESS/RETAIL | 100 | 100 | 14 | 40 |
| RESIDENTIAL/INDUSTRY AND BUSINESS/TRANSPORT | 100 | 100 | 4 | 5 |
| RESIDENTIAL/RETAIL | 100 | 100 | 1,036 | 1,428 |
| TRANSPORT | 97 | 97 | 244 | 1,080 |
| RETAIL | 94 | 92 | 205 | 329 |
| RECREATION AND LEISURE | 93 | 97 | 59 | 132 |
| COMMUNITY SERVICES | 88 | 91 | 425 | 868 |
| INDUSTRY AND BUSINESS | 85 | 81 | 635 | 1,195 |
| UTILITIES AND INFRASTRUCTURE | 84 | 82 | 64 | 106 |
| RESIDENTIAL | 72 | 90 | 4,054 | 15,714 |
| DEFENCE | 20 | 83 | 5 | 40 |
| AGRICULTURE | 0 | 0 | 8 | 11 |
| RESIDENTIAL/INDUSTRY AND BUSINESS | 0 |  | 1 | 8 |
| RESIDENTIAL/TRANSPORT | 0 | 0 | 120 | 238 |
| TRANSPORT/RETAIL | 0 | 17 | 6 | 6 |
| RESIDENTIAL/UTILITIES AND INFRASTRUCTURE | n/a | 100 | n/a | 1 |
| COMMUNITY SERVICES/RETAIL | n/a | 100 | n/a | 3 |
| MIXED\* | n/a | 100 | n/a | 2 |
| RESIDENTIAL OR INDUSTRY AND BUSINESS/RETAIL (discounted) | n/a | n/a | 107 | 109 |
| UNKNOWN | n/a | n/a | 13 | 29 |
| Totals | n/a | n/a | 7,303 | 21,928 |

\*Discounted as data requires further disaggregation

Table 8.3 Volatility of land uses within the Camden/Westminster sample by building typology (NLUD Group)

| Polygon description by building type (NLUD ‘Group’) | % polygons with same original and current use (2016), within 1786 50m buffer only | % polygons with same original and current use (2016), within whole sample | No. polygons assessed within 1786 50m buffer | No. polygons assessed in whole sample |
|---|---:|---:|---:|---:|
| DWELLINGS/SHOPS | 100 | 100 | 1,022 | 1,414 |
| OFFICES/SHOPS | 100 | 100 | 301 | 572 |
| TRANSPORT TERMINALS | 100 | 100 | 37 | 86 |
| WHOLESALE DISTRIBUTION | 100 | 100 | 2 | 81 |
| DWELLINGS INSTITUTIONS | 100 | 100 | 39 | 62 |
| VEHICLE STORAGE | 96 | 98 | 207 | 977 |
| SHOPS | 97 | 97 | 88 | 135 |
| HOTELS, BOARDING AND GUEST HOUSES | 95 | 96 | 61 | 75 |
| EDUCATION | 89 | 94 | 104 | 327 |
| PLACES OF WORSHIP | 84 | 89 | 50 | 113 |
| DWELLINGS | 69 | 90 | 3,952 | 15,574 |
| COMMUNITY SERVICES | 86 | 88 | 198 | 284 |
| PUBLIC HOUSES AND BARS | 117 | 87 | 91 | 194 |
| MEDICAL AND HEALTHCARE SERVICES | 100 | 85 | 73 | 144 |
| OFFICES | 86 | 80 | 398 | 676 |
| MANUFACTURING | 31 | 39 | 172 | 338 |
| STORAGE | 16 | 30 | 63 | 98 |
| DWELLINGS/VEHICLE STORAGE | 0 | 0 | 120 | 237 |

The method was concluded to be of value in stimulating discussion on mixed-use stock, and on the type and location of buildings able to extend their lifespans through mixing uses and conversion. The method was quick and easy. Its main limitation was that buildings were not weighted by age: the greater time available for older buildings to convert needs to be taken into account, as does designation, which will inhibit change. Availability of open land-use and original-use data for the whole of Greater London would mean that the method could be tested incorporating all land uses found in the city, not just those represented in the sample. Future comparison of data across cities and countries would be of particular value to understanding whether universal rules in relation to mixing and conversion may exist.

8.4 Generating open age data using semi-automated methods & geolocating ‘static’ tissue in London

In this section, a method is developed to geolocate ‘static’ residential tissue in London and to estimate its date of construction. ‘Static’ tissue makes up most of Greater London; Smith and Crooks describe the city as being essentially suburban in character, ‘dominated by residential functions, with small-scale local retail and service centres’ (Smith and Crooks, 2010, p.39).
Masucci, Stanilov and Batty point out how spaces between the preurban/‘elastic’ network in London have been gradually filled by residential tissue, mainly from the centre outwards until constrained by the Green Belt (Masucci et al., 2013). These residential streets function primarily as access routes to and from homes, rather than, as in the case of ‘elastic’ tissue, as connectors between commercial locations. In Figure 8.4, the growth of London between 1786 and 2010 is shown by visualising Stanilov’s historical street network data chronologically. A detailed web of routes can already be seen by 1786. Between 1786 and 1830, densification can be seen concentrated around the periphery of the ancient core (though expansion of medieval villages is not visible at this scale). Between 1830 and 1900, development can be seen occurring across Inner London, reflecting growth in population, and in house building, during the Industrial Revolution. In the interwar period, rapid suburban development is visible in Outer London. From the post-war period onwards, clear patterns of development can no longer be seen. In Figure 8.5, the location of new roads constructed during each survey interval is made clearer by erasing each network from the next, in chronological order. This is done using the ArcGIS ‘Erase’ tool: for example, the 1786 network is first erased from the 1830 network to leave only roads built between 1786 and 1829, then the 1830 network is erased from the 1880 network, and so on. The process took only a few hours and generated a dataset showing the position of all new roads built in Greater London within eight temporal intervals, from 1786 to 2010. Based on Masucci et al.’s finding that residential tissue fills the spaces between older, high-accessibility routes, it was considered reasonable to assume that all routes other than pre-1786 routes would be likely to represent residential streets.
Furthermore, it was assumed that the majority of dwellings built along these streets could reasonably be expected to have been constructed at approximately the same time. Based on these assumptions, it was hypothesised that if a 50m buffer was used to capture OSMM polygons/buildings falling along these streets, buildings could be approximately dated simply by assigning the street network date interval to each polygon/building. For example, a construction date interval of 1900–1919 would be assigned to buildings built along Stanilov’s 1920 network. This was also viewed as a useful and quick method of geolocating ‘static’ tissue in London. The method was kindly tested for the whole of London by Flora Roumpani, using Stanilov’s street network data for 1900 (representing the interval 1880–1899), 1920 (the 1900–1919 interval) and 1940 (the 1920–1939 interval). This generated around 750,000 building age dates. These were later uploaded to Colouring London, as described in Chapter 8, and a sample is shown in Figure 8.6. On the platform, estimated age is taken as the beginning of the interval: for buildings along the 1940 network, for example, estimated age is given as 1920, with earliest and latest possible construction dates of 1920 and 1939 respectively. Accuracy was checked in two limited ways owing to time constraints. Firstly, a rapid visual assessment was made using EDINA Digimap Ancient Roam, to see whether the age generated for the selected time intervals roughly corresponded with blocks of new development appearing on historical maps within the corresponding interval dates. Areas in the north, south, west and east of London were spot-checked for each of the three time intervals. Secondly, a number of individual road segments in Outer London were analysed to gain some idea of the scale of error that might be occurring owing to infill and redevelopment since the main construction phase. Analysis of Birkbeck Road, shown in Figure 8.7, gives an idea of the time-consuming process of manually checking and correcting data.
Here, pale blue represents buildings for which age has been auto-generated, using the above method, for the period 1880–1899. Buildings in other colours, which were also initially pale blue, have all required correction. Those in purple fall into the next age interval but also represent new build on greenfield sites. Those in orange and yellow represent later replacement buildings built between 1960 and 1999. Dates were checked by first comparing the earliest and latest EDINA OS Historical Roam maps, this being considered the quickest method available (as identified in Chapter 5) of estimating approximate date. The author also used facade images from Google Street View, and knowledge of map footprint shape, to assess whether buildings had been replaced. For Inner London areas, London County Council (LCC) bomb damage maps were also considered useful. The dating method was not as detailed as that described in Chapter 6. From these cursory assessments, it was observed that small-scale, piecemeal rebuilding and infilling between semi-detached and small detached buildings was relatively common in Outer London. In Inner London (based on observation of the Camden/Westminster sample only), patterns of redevelopment were found to be different. Demolition of nineteenth- and early-twentieth-century streets tended to have occurred in groups, often as a result of bomb damage or the insertion of social housing estates. Where older terraces had survived, they also appeared often to have retained their integrity as a unit, with redevelopment and infilling seemingly inhibited by the tight row structure and shared party walls. Where small-scale demolition did occur, end-of-terrace buildings appeared most vulnerable. Mid-terrace demolition in the sample, when checked against LCC bomb damage maps, was also often found to have been a result of bomb damage.
The test concluded that relatively reliable baseline volume age data for ‘static’ residential tissue in London could be generated extremely quickly using the semi-automated method. Roumpani estimates that it took around half a day to fully process data for each age interval, ready for upload to Colouring London. However, it was concluded that, to develop the most accurate datasets possible, all data may have to be checked to pick up later small-scale infill and rebuilding. This process was found to be extremely time-consuming and to require expert knowledge: correcting the Birkbeck Road sample alone took around half a day, i.e. the same time it took Roumpani to auto-generate approximate data for an entire age interval. Finding ways of encouraging historians and building specialists to check data collaboratively at local level, for the city as a whole, was therefore confirmed as critical. Once auto-generated data have been checked by a network of historic environment specialists working at local level, open age data of unprecedented accuracy and quality can then be produced relatively quickly for London. Furthermore, open training data, able to be used in computational methods designed to reduce reliance on expert checking of age in other UK cities, will also be made available. Provided that Krenz’s method of spatially tracking demolition, or alternative approaches to demolition tracking, can be applied, the age dataset’s shelf life will be unlimited, with historians also able to fine-tune entries as and when new information becomes available. Three future areas of work arose.
The first relates to tapping into the historic environment ecosystem to encourage checking of age entries at the earliest opportunity; the second, to assessing auto-generated data against the Camden/Westminster sample, for which the age of buildings has already been checked manually; the third, to exploring ways in which feedback loops between historians and data scientists could be supported and extended to include algorithm design. As Flora Roumpani has suggested, algorithms able to identify and classify buildings in ‘static’ tissue that, based on footprint shape and position, do not match their neighbours would be very useful. However, though algorithms could potentially limit inaccuracies and narrow down the number of buildings needing manual verification, expert checking and enrichment of data categories, and the adding of links to sources of age information, will still be required.
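The erase-and-date procedure of Section 8.4 can be sketched as follows. This is a simplified, hypothetical illustration, not Roumpani’s implementation: the ArcGIS ‘Erase’ tool operates on geometry, whereas here each network snapshot is assumed to be a set of identical, hashable segments, so erasing one snapshot from the next becomes a set difference. A building (again reduced to a centroid) within 50m of a road that first appears in the network surveyed in year Y is given the interval from the previous survey year to Y - 1, so the 1920 network yields 1900–1919. Where several new roads fall within the buffer, the nearest is used; that tie-break is an assumption, as the text does not specify one. Roads already present in the earliest snapshot receive no date, consistent with treating pre-1786 routes as ‘elastic’ rather than residential tissue.

```python
from math import hypot
from typing import Optional

Point = tuple[float, float]
Segment = tuple[Point, Point]
Interval = tuple[int, int]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Shortest distance from a point to a straight segment (projected units, e.g. metres)."""
    (x, y), ((x1, y1), (x2, y2)) = p, seg
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return hypot(x - x1, y - y1)
    t = max(0.0, min(1.0, ((x - x1) * dx + (y - y1) * dy) / length_sq))
    return hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def new_segments_by_year(networks: dict[int, set[Segment]]) -> dict[int, set[Segment]]:
    """'Erase' each snapshot from the next: segments first appearing in each survey year."""
    years = sorted(networks)
    return {later: networks[later] - networks[earlier] for earlier, later in zip(years, years[1:])}


def date_interval(year: int, years: list[int]) -> Interval:
    """Roads new in the `year` snapshot date to [previous survey year, year - 1]."""
    ordered = sorted(years)
    i = ordered.index(year)
    if i == 0:
        raise ValueError("the earliest snapshot has no preceding survey year")
    return ordered[i - 1], year - 1


def assign_age_ranges(centroids: list[Point], networks: dict[int, set[Segment]],
                      buffer_m: float = 50.0) -> list[Optional[Interval]]:
    """Give each building the date interval of the nearest new road within buffer_m, if any."""
    years = sorted(networks)
    new_roads = new_segments_by_year(networks)
    ranges: list[Optional[Interval]] = []
    for c in centroids:
        best: Optional[tuple[float, Interval]] = None  # (distance, interval)
        for year, segments in new_roads.items():
            for s in segments:
                d = point_segment_distance(c, s)
                if d <= buffer_m and (best is None or d < best[0]):
                    best = (d, date_interval(year, years))
        ranges.append(best[1] if best else None)
    return ranges
```

Following the platform convention described above, the estimated age of a building is then the first year of its interval, e.g. 1900 for `(1900, 1919)`. The survey years used in any test of this sketch (such as 1880, 1900 and 1920) are a subset of Stanilov’s, chosen for illustration.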

Key types of open building attribute data identified as requiring capture

Taken together, studies in sustainability science, urban science and the science of form enable four main types of accessible, comprehensive, accurate and granular data to be identified. These are:

  • comprehensive microspatial data on stock composition
  • comprehensive microspatial data on stock performance (socio-cultural, economic and environmental)
  • comprehensive microspatial data on current dynamic behaviour, e.g. new builds, adaptation and demolitions
  • microspatial time-series data showing stock evolution, e.g. construction, demolition, change of use and obsolescence of typologies over time

Specific types of data required within relevant scientific studies, or extrapolated from them, were then identified, as listed below. Data on:

  • Location data
  • Land use
  • Typology
  • Age/Construction date
  • Construction system/materials, and repair and retrofit potential
  • Dimensions
  • Street context
  • Longevity/lifespan (construction and demolition dates for demolished typologies)
  • Sustainable Performance (e.g. energy rating, quality marks, citizen feedback on quality of operation)
  • Potential for lifespan extension/incremental adaptation

Significant gaps in data on developer performance, and in community feedback on the quality of building operation, were identified as part of this process.
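As a rough illustration of how the attribute categories listed above might sit together in a single per-building record, the following Python dataclass uses one hypothetical field per category. It is not the Colouring London data model; the field names and types are assumptions. Recording both construction and demolition dates allows longevity/lifespan to be derived rather than stored separately, and listing empty fields shows which categories remain to be captured for a building.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BuildingRecord:
    """Hypothetical per-building record: one optional field per data category listed above."""
    location: Optional[tuple[float, float]] = None   # e.g. footprint centroid coordinates
    land_use: Optional[str] = None
    typology: Optional[str] = None
    construction_date: Optional[int] = None          # age/construction date (year)
    construction_system: Optional[str] = None        # system/materials, repair and retrofit potential
    dimensions: Optional[dict[str, float]] = None    # e.g. {"height_m": 9.5, "footprint_m2": 60.0}
    street_context: Optional[str] = None
    demolition_date: Optional[int] = None            # with construction_date, gives lifespan
    performance: Optional[dict[str, str]] = None     # e.g. energy rating, quality marks, feedback
    adaptation_potential: Optional[str] = None       # potential for lifespan extension/adaptation

    def lifespan_years(self) -> Optional[int]:
        """Lifespan of a demolished building, if both dates are known."""
        if self.construction_date is None or self.demolition_date is None:
            return None
        return self.demolition_date - self.construction_date

    def missing_categories(self) -> list[str]:
        """Names of categories still to be captured for this building."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```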

Capturing and making these types of data available offers multiple opportunities. These include:

  • provision of access to free spatial data on the stock for the first time, for use in energy analysis, housing, planning, architecture, health analysis etc. by academia, industry and government
  • better tracking of stock performance (socio-cultural, economic and environmental)
  • support for energy analysis particularly through age data provision, and increasing efficiency of retrofit programmes through release of typology data
  • geolocation of areas of resilience and vulnerability/risk in urban areas, and exploration of the role of diversity in developing resilience, and of locked-in negative patterns/cycles, using time-series/lifespan data
  • provision of spatiotemporal data to aid accuracy in predictive modelling
  • provision of volume data for use in exploitation of AI and machine learning techniques to analyse complex underlying patterns impeding/supporting sustainability
  • tracking quality of developer/landowner performance over time
  • advancement and celebration of a whole-of-society approach to urban sustainability through collaborative maintenance of data and visualisation of richness and diversity in our cities

## Crowdsourcing introduction

Developing a sustainable management model: direction, consultation and collaborative maintenance

In Colouring London, VGI and Citizen Science are to a large extent integrated, with VGI data used, in part, to help solve a set of defined scientific problems. Haklay argues that VGI is increasingly seen as valuable to scientific progress, with hundreds of scientific papers resulting from Christmas Bird Counts and other long-running volunteer monitoring programmes (Haklay 2010; 2012: 105–122). Lauriault and Mooney also highlight the role of Citizen Science in monitoring resources (Lauriault and Mooney, 2014). ‘Non-scientists’ are increasingly involved in generating high-quality scientific databases, or in assessing existing scientific data (Irwin, 2019), though opinions differ on the benefits of a more top-down approach versus one more loosely aligned to a centralised structure (ibid). Zooniverse, the biggest citizen science project in the UK, with one million volunteers, allows users to assist a variety of scientific projects, from identifying galaxies to transcribing Shakespeare’s texts. Here scientists can set a project for volunteers and, over time, appoint ‘moderators’ from within the volunteer teams to facilitate discussion, promote the project and contribute to the research (Zooniverse, 2019). Other examples of UK citizen science projects include Cell Slider, for which volunteers have analysed over 2 million images of cancer cells; Dark Sky Meter, where citizens provide data on light pollution; and PatientsLikeMe, where patients share data on the side effects and impact of drugs (Nesta, 2019). Haklay argues that Citizen Science challenges the notion that only dedicated, full-time researchers can produce scientific knowledge or solve complex scientific problems, and that it has helped shift the way professional and scientific organisations perceive volunteered data (Haklay, 2013).
He cites an example of the way in which a long-standing problem relating to the structure of a retroviral protease was solved through a collaboration between scientists and (non-scientist) expert players of the computer game Foldit (Khatib et al. 2011, quoted in Haklay, 2013). Irwin argues that Citizen Science is ‘growing bigger, more ambitious and more networked’ (Irwin, 2019), highlighting the need to embed it in the routine of science and to use it to support the policy-making process: ‘The movement is surfing wider societal forces, including a thirst for data; the rise of connectedness and low-cost sensor technologies; and a push to improve the transparency and accessibility of science. Increasingly, government institutions and international organizations are getting in on the action’ (ibid).

7.4.3.6 Contributor types

Coleman et al. provide important insights into designing for volunteer contributors. They discuss how the generation of ‘deliberate’, as opposed to ‘accidental’, users, and greater creative collaboration with them, was likely to characterise the next generation of VGI (Coleman et al. 2009). Drivers for contributions included altruism; professional or personal interest; intellectual stimulation; protection or enhancement of personal investment; social reward; enhanced personal reputation; and an outlet for creative and independent self-expression and pride of place (which could simply mean showing that your street or building is on the map) (ibid). The authors characterised contributions as either ‘constructive’ or ‘damaging’. Constructive motivations were identified as including legitimate new content, constructive amendments, validation and repair, and minor edits and format changes. Damaging motivations were identified as mischief, bias, and malice and/or criminal intent (ibid). Coleman et al.
identified five classes of VGI contributor (not necessarily mutually exclusive) and looked at the type of data offered by different types of contributor, contributor characteristics (such as robot/human), frequency, quality and reputational liability (i.e. whether they could be sued if opinions proved incorrect, inadequate or libellous) (Coleman et al., 2009). In Table 7.5 the five contributor types identified by the authors are tabulated, and their likely degree of contribution is mapped against their level of knowledge. Figure 7.5 reproduces Coleman et al.’s graphic of the relation between the knowledge of Geographic Information held by different contributor types and their degree of VGI contribution.

| Contributor description | Attributes |
|---|---|
| Neophyte | No formal background in the subject, but with the interest, time and willingness to offer an opinion. |
| Interested Amateur | Interest in the subject; has begun background reading and consulting, and is gaining experience in appreciating the subject. |
| Expert Amateur | May know a lot about the subject and practise it passionately, but does not earn a living from it. |
| Expert Professional | Has studied and practises a subject and relies on it for a living, and may be sued if the products, opinions and/or recommendations are considered inadequate, incorrect or libellous. |
| Expert Authority | Has widely studied and long practised a subject, to the point where he or she is recognised to possess an established record of high-quality products, services and/or well-informed opinions, and stands to lose that reputation, and perhaps livelihood, if that credibility is even temporarily lost. |

| Examples of relevant contributors to the Colouring London platform | Coleman et al. contributor classification |
|---|---|
| Residents (not working as experts) | Neophyte |
| Schoolchildren | Neophyte |
| Construction professionals | Expert professional |
| Local authority planning and building control | Expert professional |
| Expert academic or government advisors (e.g. The Survey of London, Historic England, VOA) | Expert authority |

Table 7.5 (Top) Contributor descriptions (Coleman, Georgiadou and Labonte, 2009). (Bottom) Colouring London data generation and sourcing approaches

Figure 7.5 Coleman, Georgiadou and Labonte, 2009

Examples of ‘neophytes’ relevant to Colouring London are non-expert residents and schoolchildren. In both cases, significant work is anticipated to be required to harness the input and interest of this group but, as shown both by Citizen Science projects and by the 1920s land survey, significant rewards may be gleaned as long as the specific types of data targeted for collection are carefully considered. For example, targeting schools to collect age data would make little sense when the same or less effort could be spent accessing specialist knowledge from the historic buildings sector, which would generate much more accurate data at a faster rate. Owing to the need to support ‘neophyte’ contributions from schools, an education advisory group for Colouring London, comprising organisations with pan-London (or UK) reach (including Historic England, the RIBA and IHS), was set up by the author. Examples of Interested and Expert Amateur involvement in Colouring London are discussed in detail below in the context of the heritage and conservation sector. Expert professional contributors will include architects, surveyors and local authority staff; expert authorities will include those working in academia and in government within specific research and policy areas.

7.5 Core audience groups and partnerships

Core audience groups are summarised in Table 7.4 (see also Appendix 7.2). The requirements of audiences, and the potential benefits for them, were identified during the consultation process, within which a range of partnerships was developed, each designed to create a win-win scenario for both parties. All organisations consulted were invited to identify new or existing data categories or features necessary to, or of potential value to, their work and the work of the audiences they represented, and to make recommendations on how the platform could be made more interesting and relevant to their own audience group. The aim of this process was to maximise the extent and quality of the knowledge base; to build a network allowing relevant audiences to be reached to use, monitor and enrich data; and, where applicable, to gather advice on data formatting and data standards. Logos offered by organisations provide quality assurance marks that can be used to attract other partnerships and collaborations. A key objective was to maximise the capacity of the platform to sustain itself in the long term by gaining buy-in from as many sectors as possible and opening up opportunities to tap into existing funding streams. Significant input into subcategory selection was gained at this stage. The priority order for partnerships can be summarised as follows: academic (sustainability science/urban science, historical research); conservation (policy/sustainable development); open data ethics; community-led planning/heritage sector; central, regional and local authorities delivering the sustainable planning, housing and energy agenda; professional bodies representing the construction sector; schools; commercial construction industry; housing providers; commercial property industry; tourist industry.
The order reflects the author’s prediction of the scale of contribution of each type of partnership to the overall sustainable objectives and longevity of the platform. Core partners are listed in the introduction above. Information on others can be accessed at https://www.pages.colouring.london/whoisinvolved.

Core user/audience groups anticipated for Colouring London

| User / audience group | Benefits for platform | Benefits for partners |
|---|---|---|
| Academic researchers (sustainability science/urban science, historical research) | Expert knowledge on data required to support sustainability objectives; evidence of value through research applications; expert input into data categories; association of platform with objective, academic standards; access to potential research funding. | Open spatial statistics on characteristics of the current building stock at the microscale, and on its long-term history, able to be used to describe and measure the current stock and energy and waste flows more accurately. Also provides data for resilience analysis and for use in rule-based computational models that seek to predict the long-term dynamic behaviour of the stock. |
| Building conservation sector (government) | Expert knowledge of the evolution of the building stock. Access to data on designated assets. Potential joint research funding. | Better monitoring of built heritage. Supports evidence-based research on the socio-economic value of older stock. Highlights predictive knowledge of the conservation sector. Celebrates heritage and encourages public participation. Promotes the value of conserving both buildings and archive/historical sources. Supports grass-roots action to reuse local buildings of value; promotes sustainable policies at local level. |
| Heritage sector/community-led planning groups | Expert knowledge of the evolution of the building stock. | Supports delivery of sustainable development at grass-roots level. |
| Institutions involved in open data standards | Expert advice on open data standards. Use of the Open Data Institute’s Data Ethics Canvas. Associates the platform with the highest data ethics standards, essential to develop and maintain user trust. | Promotes the value of open data. Promotes the ODI’s open data ethical framework. Helps increase and drive open data release. |
| Central government departments (planning, housing, energy, business) | Expert knowledge of data quality and standards relating to planning, housing and energy. Access to large-scale datasets. Potential access to grants linked to delivery of specific policy objectives. | Free monitoring, visualisation and promotion of sustainability standards. Promotion of the UK low carbon agenda and the wider UN Sustainable Development Goals and New Urban Agenda (NUA). Increased transparency in planning, as required by the National Planning Policy Framework. Detailed data on the composition of the housing stock and anticipated lifespans, able to increase the accuracy and availability of housing data for use in analysing housing quality and supply. Open data on non-domestic land uses relevant to business and industry performance analysis. Open code able to reproduce the platform in other cities. Visual test bed for promoting the opening up of geospatial data. |
| GLA and London/local authorities | Use of OSMM licence. Expert advice on regional policies designed to deliver London’s sustainability goals. Large-scale data input and monitoring. Collaboration with the GLA allows close work on methods of increasing integration between Colouring London, the live planning portal and statistical data generated by the LBSM. Potential access to grants linked to delivery of specific policy objectives. | Free tool for London’s local authorities supporting measurement, monitoring and analysis of London’s stock. Promotes transparency in planning, ease of access to planning information and public engagement at an early stage. Promotes London’s low carbon agenda. Allows testing of performance methods to support behavioural change by local authorities (regarding release of data), industry (in building quality and reduction of energy and waste flows) and residents (in relation to carbon reduction). |
| London schools | Allows for delivery of education about urban sustainability, necessary to drive behavioural change over the next decade. | Free, visual, educational resource on London, cities and the building stock, relevant to cross-curricular study. Demonstration of the role children can play in the sustainability agenda. |
| London housing suppliers, commercial construction and property industries | Data input on the building stock. | Provision of free, geolocated spatial statistics/attribute data on London’s domestic and non-domestic stock. Pre-application access to data on buildings seen as important to communities. Celebration of high-quality sustainable new build. |
| International city authorities/research institutions | Application of data in international research. Application of open platform code in an international context. International publicity. | Free access to data on a major global city. Access to open code allowing the platform to be reproduced free of charge. |
| London/UK tourist industry; international visitors to London | Publicity. Data input. | Free visual resource for visitors providing information on buildings in London, including its heritage. |
| London residents | Data input. | Free visual resource providing information on buildings in London, planning and regulation. Offers a simple way to help support London’s sustainability objectives. |

Table 7.4 Core user groups identified during HE consultation

### 7.5 Exploiting knowledge within the heritage, conservation and community-led planning sectors

#### 7.5.1 Extent of knowledge

Translating users into contributors, and then sustaining their contributions, is recognised in crowdsourcing studies as difficult. In Wikipedia, under 0.4% of users are also contributors, and only 0.04% are regular uploaders (Wikipedia, 2019). Haklay observes that “You can get a lot of people for a short time investment, or very few people for a deep and intensive engagement, but you can’t get everyone doing it all the time,” (Irwin, 2019). The relevance of Colouring London to the conservation and heritage sectors and to community-led planning, and their importance to the platform as the foundation for much of its knowledge base, have been discussed above. The willingness of these sectors to offer voluntary time to support heritage-related projects, and the scale of their potential contribution, are significant. In its Heritage Counts: Heritage Indicators report for 2018, Historic England records 69.8 million visits to historic properties in 2017, an increase of 4% on 2016 (Historic England, 2018). It also records high levels of membership of heritage organisations, with the National Trust recording over 4.9 million members and English Heritage just over a million (ibid.), and high levels of heritage volunteering, with the National Trust attracting over 60,000 volunteers and over 615,000 historic environment volunteers recorded for England as a whole (ibid.). Historic England also records 192 building preservation trusts (ibid.). These figures do not include national amenity societies, the hundreds of local amenity and civic societies (of which there were 320 in England in 2014; Hewitt, 2014), or community-led planning groups, all of which are actively involved in monitoring and managing change in local areas. An illustration of the range of potential contributors, classified according to Coleman et al.’s contributor classification, is given in Table 7.6.
The main attributes identified as of particular relevance to Colouring London, and common to volunteers and specialists working within the above sectors, can be summarised as follows: an interest in or passion for older buildings; an interest in historical research and the observation of buildings over time; an understanding of the contribution of older stock to societies; an understanding of the value of evidence-based research, data accuracy and the importance of recording data sources; an interest in protecting buildings of historic and/or architectural significance, and of local value, from demolition, and in extending the lifespans of older buildings through adaptive reuse; and an interest in contributing voluntary time for the common good. The contribution drivers of altruism, professional or personal interest, intellectual stimulation, social reward and pride of place were all seen to apply.

Examples of relevant potential contributors in the context of building conservation, heritage and community led-planning Coleman et al. contributor classification Visitors to heritage sites Neophyte Heritage enthusiast Interested Amateur (may overlap with expert amateurs) Students of architecture, architectural history, planning, historic building conservation.
Local arts and architecture societies/groups Historic building and blue badge guides Expert Amateurs (may overlap with Expert professionals) Community-led planning groups Local civic societies and local amenity societies Historic building trusts Neighbourhood fora Local history clubs National amenity societies Architectural historians (outside academia) Architects, engineers and surveyors involved in historic buildings Expert Professional Academics working within building conservation, historic building surveys and architectural history, urban morphology, architecture, surveying, civil engineering Expert Authority Local authority historic building/conservation officers Historic England and other government funded staff involved in historic building/conservation research

Table 7.6. Conservation sector contributor classification

#### 7.5.2 Supporting input from the conservation sector

Owing to the importance of harnessing this knowledge base and exploiting the potential scale of interest and voluntary support within London’s conservation sector (particularly as the capital has the highest concentration of heritage assets in the UK, and all contributor classes identified by Coleman et al. were represented), a specific design strategy was developed to engage the above audiences and encourage them to volunteer, so as to facilitate ‘deliberate’ and sustained uploads. This process had in fact already begun during the preparation of the Heritage Protection Commissions grant application, in which data categories were checked and adjusted to ensure relevance to the historic building sector and, as previously discussed, ‘Like me?’ was included as a new type of tool allowing communities to highlight buildings under threat for others to see, and allowing the predictive knowledge of local communities to be captured.
However, during the interface design process a more detailed consultation was found to be needed, with the author working with representatives from Historic England, the Survey of London and local planning groups in Archway and Somers Town. A community engagement and testing programme was run in Somers Town, in which community members used the platform to develop a conservation area proposal giving them a greater say in managing change in the area, and from which a user video was also created. Through consultation, a number of issues were identified. The design solutions developed by the author and implemented by Russell can be tested on a much wider scale after launch. The first issue identified was that, whereas in heritage-related crowdsourcing projects such as the Survey of London’s Whitechapel project and Historic England’s ‘Enriching the List’ images and text could be added, in Colouring London they could not. In the case of images, this was owing to data storage concerns, though methods of integrating images were seen as an important next stage. In the case of text, this was firstly because the platform was set up to collect statistical data, and secondly because free text could not be added without moderation in view of the security issues discussed below. The difficulty was to find a way of encouraging volunteers to translate their knowledge, as well as information derived from historical images, records and texts, into statistical data. Haklay stressed that explicit and relevant returns and rewards are essential to motivate users to contribute and edit data, and to encourage volunteers to sustain their contributions.
The solution employed was to encourage users to work individually or collaboratively (on a single computer, or on separate computers in separate places) to ‘colour in’ data for their local areas, just as they would colour a map by hand, except that here data needed to be added to a subcategory to make the building change colour. This use of colour as a motivator, reward and essential feature in visualising and making sense of data was found to be critical in engaging users, as hypothesised by the author from the earliest design phase. Two important findings arose from the testing process: firstly, that buildings needed to colour instantaneously to provide sufficient motivation to add further data; and secondly, that colour itself was important in driving uploads. Also of relevance to these audiences was that involvement allowed them to become part of a revival, for the first time since World War II, of a long historical tradition of colour-coded map making in London at building/block level and city scale. Further discussion of this tradition and of this use of colour is provided in the design section below. Other key issues included the importance of an option enabling weblinks to be added to connect users to images of the building or its history. The inclusion of this option also addressed another concern, namely the inability to record the complex history of a building, and its extensions and alterations, owing to the simplification of OSMM polygons and the fact that users were working in 2D (a problem discussed in the context of resilience analysis in Chapters 5 and 6). As well as a weblink option, separate subcategories for façade age and for major extensions and refurbishments were included, together with simplified age categories, supplementing construction year, in which age was collected by decade and century.
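The decade and century categories described above can be derived mechanically from a construction year, which is one way a map could recolour a building as soon as a value is entered. The Python sketch below is illustrative only: the function names and the `CENTURY_COLOURS` palette are hypothetical and do not reproduce Colouring London's actual colour scheme. Centuries are labelled using the popular convention in which 1900 falls in the 20th century.

```python
# Hypothetical sketch: derive simplified decade/century age categories from a
# construction year and choose a colour band for immediate map feedback.
# The palette is illustrative only, not Colouring London's actual scheme.

# Hypothetical colour bands keyed by the first year of each century (1800 = 1800-1899).
CENTURY_COLOURS = {
    1700: "#f0e6c8",
    1800: "#e8a95b",
    1900: "#c24e2e",
    2000: "#5b2a86",
}
FALLBACK_COLOUR = "#cccccc"  # used when no band is defined for the year


def decade_of(year: int) -> str:
    """Decade label, e.g. 1873 -> '1870s'."""
    return f"{year // 10 * 10}s"


def century_of(year: int) -> str:
    """Ordinal century label using the popular convention (1900 -> '20th century')."""
    n = year // 100 + 1
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix} century"


def colour_for(year: int) -> str:
    """Colour band for the century containing the year."""
    return CENTURY_COLOURS.get(year // 100 * 100, FALLBACK_COLOUR)


if __name__ == "__main__":
    for y in (1873, 1900, 2005):
        print(y, decade_of(y), century_of(y), colour_for(y))
```

Because the colour is a pure function of the stored value, it can be recomputed the moment a contributor saves an edit, which matches the testing finding that buildings needed to colour instantaneously.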
A multiple copy and paste tool was also designed to support and encourage ‘colouring in’ of data for local areas as a whole, and to try to move away from one-off contributions. This tool was also necessary to prevent building-by-building uploads becoming laborious. As noted in Chapter 5, this type of tool was more relevant to areas of homogeneity, as in domestic suburbs, than to the historic centre, or to preurban routes or preurban London ‘villages’, where diversity of age and land use was high. However, the tool, built by Russell, differed from the type of ‘calculate’ tool used for this procedure in ArcGIS in that large-scale selection was not possible. This was seen by the author as necessary to minimise and discourage malicious or lazy intervention, with large data uploads having to be submitted using a moderated bulk upload tool. A need for introductory videos, with case studies on the many ways in which the site could be of relevance to the conservation, heritage and community planning sectors, was also raised. The need for a more concerted approach to harnessing this sector was also identified, as was the opportunity to build on existing heritage networks to create a database for London allowing Colouring London to advertise to and engage volunteers more easily. In August 2019 the author also negotiated a £7,000 grant extension with Historic England to develop such a database and to run ‘The London Building Age Map project’, to produce a full draft of the building age map by December 2020. As well as engaging the targeted sectors and groups, the idea was to promote the richness and diversity of London's heritage and to celebrate the sector’s body of knowledge on the building stock. At the same time, the project was designed to generate open age data for use in sustainability science, urban science and urban analysis as a whole, and to provide base data for use in longitudinal simulations within procedural models.
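The copy and paste behaviour described above (targets chosen building by building, no large-area selection, large uploads routed through moderation) can be sketched as follows. This is a hypothetical illustration, not Russell's implementation: the `Building` type, the `copy_paste` function and the `MAX_PASTE_TARGETS` threshold are all assumptions, and a real system would persist edits and the moderation queue in a database.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: larger requests are diverted to moderated bulk upload.
MAX_PASTE_TARGETS = 20


@dataclass
class Building:
    building_id: int
    attributes: dict = field(default_factory=dict)


@dataclass
class PasteResult:
    applied: list  # ids written immediately
    queued: list  # ids held back for moderation


def copy_paste(source, targets, keys, moderation_queue):
    """Copy selected attributes from `source` to individually chosen `targets`.

    There is deliberately no area-selection input: callers must supply each
    target building explicitly. Requests above MAX_PASTE_TARGETS are not
    written; they are appended to `moderation_queue` for review instead.
    """
    values = {k: source.attributes[k] for k in keys if k in source.attributes}
    ids = [t.building_id for t in targets]
    if len(targets) > MAX_PASTE_TARGETS:
        moderation_queue.append({"source": source.building_id, "targets": ids, "values": values})
        return PasteResult(applied=[], queued=ids)
    for target in targets:
        target.attributes.update(values)
    return PasteResult(applied=ids, queued=[])
```

For example, copying `year_built` from one house to four neighbouring houses in a terrace writes the value immediately, whereas a request covering 25 buildings would only be queued for review.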

#### 7.4.2 Conservation sector engagement

To these ten categories, a further two were added following detailed negotiations with Historic England. In Spring 2016, Historic England commissioned an article from the author for its Conservation Bulletin on the potential future role of the conservation sector in supporting the energy and sustainable city agenda. In it the author highlighted the opportunity available ‘for those working in building conservation, building history, and the production and conservation of historical spatial data to radically reposition themselves at the forefront of the intelligent cities debate. Their USP is an unrivalled knowledge of the building stock and its evolution and of the impact of change. As a city’s stock forms its largest, most complex and most valuable resource, detailed data relating to finite components, and changes to them, are likely to become increasingly highly prized’ (Historic England, 2016, p.9). Subsequent discussions between the author and Historic England explored how a mutually beneficial partnership could be developed in relation to the proposed Colouring London platform. A number of clear overlaps were identified. Firstly, in relation to the promotion of principles of sustainable urban development within the stock long followed by the conservation sector, including lifespan extension through the repair and adaptive reuse of buildings, incremental development, the fostering of the uniqueness of urban areas, and the use of long-term planning horizons in strategic work. Secondly, in relation to the use of GIS in exploring the morphology and evolution of the building stock. From the 1990s, pioneering data collection work was carried out by Historic England (then English Heritage) within its Historic Landscape Characterisation programme (Thomas, 2006).
Though data formatting and aggregation in urban areas make these data difficult to use in scientific applications, the ambition of the project and its harnessing of GIS technology to capture and map dynamic change were seen to align closely with Colouring London’s objectives. Thirdly, in Historic England’s role as the overall custodian of designated stock, responsible for demolition policy relating to it. Fourthly, in its curation of the largest and most detailed open database on the stock for England and London, the National Heritage List for England (NHLE). Fifthly, in its curation of historical records and archive collections relating to the building stock. Sixthly, in the interest of its vast body of volunteers, involved in conservation, heritage and community planning, in older buildings and in building and local area history. Seventhly, in its close links with bodies and groups within the conservation sector working to identify assets of benefit to the community and urban landscape, protect these from demolition, and/or repair and care for them. Here it is important to note, as illustrated by the theories of Jacobs and Alexander discussed in Chapter 2 and evidenced by the websites of civic and amenity societies and planning groups, that the conservation sector in general is not primarily focused on preventing change but on reducing the loss of reusable finite resources and on ensuring that new-build insertions benefit, rather than detract from, the local area. Over the last decade, Historic England has also changed its message from one of building protection to one of managed change (Historic England, 2019a). It is also at this grass-roots level that predictive knowledge of the long-term operation and socio-economic value of older buildings is held.
Historic England was also, at the time of discussion, involved in the development of two open platform/crowdsourcing projects: the first, an open source GIS platform to improve documentation of, and provide greater access to, the Greater London Historic Environment Record (GLHER), which provides information on historic assets in London; and the second, its ‘Enriching the List’ crowdsourcing programme, designed to capture voluntary contributions from the heritage sector to correct, update and enrich the NHLE (Historic England, 2019b). Owing to the number of overlaps and common objectives identified, a proposal for collaboration prepared by the author was recommended for submission to the Heritage Protection Commission (Appendix 7.2). The proposal was specifically tailored to meet Historic England’s corporate aims and strategic objectives in its 2016 Heritage Information Access Strategy. These included accelerating and facilitating research into sustainable development; placing knowledge of the historical evolution of the city at the heart of this work; demonstrating the critical importance of the extensive and unique knowledge pool held within the building conservation, historical or archival research and local community planning sectors; promoting the historic environment and its social and economic value, its richness and diversity; and engaging with the whole community to foster the widest possible sense of ownership of buildings and place. The costs of setting up the Colouring London platform were argued to demonstrate exceptional value for money, owing to the unprecedented access to OSMM polygons and the fact that the costs of the long-term management of the project would be the responsibility of UCL. The preparation of the application also involved extensive consultation with academic departments, local authorities, community-led planning groups, civic societies, heritage providers, schools and professionals in the property and construction industries.
As a result, the broader, multipurpose value of the platform became apparent, including: the provision of previously inaccessible data for many areas of urban analysis and research; the supply of free data for the construction industry to use in predevelopment context analysis; the creation of an access point for regulatory information on planning and protection; a tool able to track building performance; a tool to support public engagement within the planning process; and a free education resource for the city and an international platform celebrating London’s unique heritage. As a result of the consultation, four further categories were added: ‘Designer/Builder’, to record the names of development teams; ‘Protection’ (later to become ‘Planning’), to include regulatory information on protected heritage assets and links to the NHLE and the GLHER; ‘Street Design’, to provide information on the immediate context of the building; and ‘Like me’. The latter was specifically proposed to help capture local knowledge on buildings of long-term value to local areas and to enable communities to collectively highlight buildings under threat in advance of development. The feature was therefore not originally designed to collect data for use in the scientific analysis of the stock, but instead to increase democratic engagement in the planning system. It was realised that this could also help to attract users of all ages and abilities, introducing a hitherto missing element of fun. A dislike button was not considered, as its potential to encourage cyberbullying was seen to run counter to the guiding principles of user trust and of the development of a constructive, safe space for knowledge exchange. The grant was approved in 2017, allowing Russell’s work on the back end of the platform to begin.

## The Historic Environment Sector

The historic environment sector, in the UK context, has been identified as the most important sector to target to improve building attribute data quality and to harness the power of crowdsourced, volunteered data contributions. Within the historic environment and conservation sector, the principles of building lifespan extension and resource conservation have been actively and systematically advanced for many years; in the case of the UK, for well over a century (Delafons, 1997). Here the interdependence of parts within the stock as a system, the long-term knock-on effects of rapid change, the need for long-term planning horizons, the avoidance of finite resource depletion, the tracking of survival, mortality and extinction rates, and the promotion of incremental adaptation to maintain and create diverse, sustainable and resilient systems are all well understood. However, despite this sector being in the vanguard of sustainable stock development, the vast knowledge base it holds remains little exploited by the scientific community (Hudson, 2016). Awareness of the need to capture data on the evolution of stocks, on the way areas develop and on how typologies mutate over time is, however, now beginning to change the research landscape, and a spotlight is likely to be increasingly placed on historic environment sector knowledge in coming years. At the same time, interest in a more rigorous, scientific approach to data curation, collection and analysis, as long championed within urban morphology, is also growing within the sector itself (National Trust for Historic Preservation, 2014). Mechanisms to support the historic environment sector in adding and checking data, as well as to encourage download and analysis, therefore need to be built into open data platforms in order to harness its knowledge, particularly on the composition of the stock, on its operation (how well buildings work in local areas for communities) and on its dynamic behaviour.
Owing to the relevance of the sector’s input to so many data areas, specific attention needs to be paid to its needs.

## Architecture

Opportunities also exist to exploit the growing interest in adaptation and refurbishment of urban stocks in architecture, where debate and interest have traditionally focused largely on new build. In 2020 Oliver Wainwright, writing on architect-driven initiatives in the Netherlands that track material reserves, posed the question, ‘What if every existing building had to be preserved, adapted and reused, and new buildings could only use what materials were already available? Could we continue to make and remake our cities out of what is already there?’ (Wainwright, 2020). In 2016 Daniel Abramson, in his book 'Obsolescence', explored the economic drivers in the US behind the promotion of building obsolescence from the 1920s, and the misconception that new buildings will always outperform old (Abramson, 2016). In 'Buildings Must Die', published in 2014, Cairns and Jacobs address why building mortality and decay have been so little explored in architecture (Cairns and Jacobs, 2014), citing Kevin Lynch’s argument, from 1960, that future remodelling, demolition or dismantling of buildings should always be considered at the building design stage (ibid.). Stewart Brand, in 'How Buildings Learn', argued in 1994 that architecture should be rebranded as ‘the design-science of the life of buildings’, with reuse placed at the forefront of its work (Brand, 1994, p. 210). He described the benefits of incremental extension and adaptation, and of the process of trial and error in building, stating that ‘slow is healthy’ (ibid.). Brand believed that ‘designers should study the present the way historians study the past- diachronically, in terms of changes over time’ (ibid., p. 210). He also called for much greater discussion of building vulnerability, mortality and failure: ‘We need failure analysis that is systemic over the full scope of building-related activities and the entire life of buildings’ (ibid., p. 214).
The need for a wider discussion of failure was also raised by Stephen Groák, who in 1990 described an ‘epidemic’ of building failure ‘in the past 30 or so years’ (Groák, 1990, p. 163). Longevity, reuse and adaptability were also all advocated in the 1960s by William Howell. In his 1967 review of Jane Jacobs’s RIBA lecture of the same year, he stated, ‘We cannot afford the waste of national resources involved in knocking down buildings with good life in them and which in any case may be housing social and economic activities which would not survive transplantation…we are never in our foreseeable economic future going to be able to afford to treat buildings like Kleenex’ (HKPA, 2016). Opportunities now exist to harness knowledge held within the historic environment sector and the construction industry (and the growing interest of the former in data analytics, and of the latter in building reuse) to bring together knowledge from science, the humanities and the arts to support the development of open building attribute data platforms. This requires partnerships to be developed across sectors, and features to be specifically designed to support sector engagement, as tested in Chapter 9.

Hi Robert, issues of data quality are being written up/drafted in the manual, which can then be built on with the CCRP group: https://github.com/colouring-cities/manual/wiki. I am working on this now, but they will end up in section F.

Re quality control, the main mechanisms we have, using age as an example, are as follows:

a) bulk uploads are moderated by us

b) manual entries can only be made building by building. This can be sped up using the copy and paste tool. We have specifically chosen not to use the ArcGIS 'highlight large area and paste' option for this reason

c) the following will indicate the reliability of data: name of editor/edit history; type of source; source link

d) the verification buttons tell you how many other users agree with the date

e) we are building a network of specialist users (see CLHEAG) to check and enrich data. Local planning groups and local civic societies are set up specifically to oversee change in local areas. It is therefore in their interest to verify and monitor data as well as to enrich it

f) for age, we are looking to cross-check data generated using a number of methods. These include: upload from an unknown user; upload from a known expert group; upload using historical street network inference; upload using UK government energy performance certificate data; and upload (if ever released) of VOA property tax data

g) it is possible that we might have a feature allowing all dates ever entered for a building to be viewed at once, each linked to the editor's name

h) we are making the last editor's name more visible

i) we may include an image of the facade, but would need to do this in a way that keeps storage light and avoids linking to commercial products, e.g. Google Street View, whose terms and conditions could change at any stage (as they have for analysing Street View data)

k) we will probably include typology dropdown diagrams in the 'Type' section, so these will also act as an additional form of verification
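Point f) above, cross-checking age data generated by several methods, could work along the following lines. In this hypothetical Python sketch each method has already been reduced to a single construction year for the building; the method names and the ten-year `TOLERANCE_YEARS` are assumptions. The median is used as the reference because a single badly wrong upload moves it far less than it would move a mean.

```python
from statistics import median

# Hypothetical tolerance: estimates within ten years of the median count as agreeing.
TOLERANCE_YEARS = 10


def cross_check(estimates):
    """Compare construction-year estimates for one building from several methods.

    `estimates` maps a method name (e.g. "expert_group", "epc") to a year.
    Returns the median year, the methods lying outside the tolerance, and
    whether all methods agree.
    """
    if not estimates:
        return {"median": None, "outliers": [], "consistent": False}
    reference = median(estimates.values())
    outliers = sorted(
        method for method, year in estimates.items()
        if abs(year - reference) > TOLERANCE_YEARS
    )
    return {"median": reference, "outliers": outliers, "consistent": not outliers}


if __name__ == "__main__":
    print(cross_check({
        "unknown_user": 1890,
        "expert_group": 1885,
        "street_network_inference": 1880,
        "epc": 1930,
    }))
```

In this example the EPC-derived year would be flagged as the outlier, so the building could be routed to specialist users for checking rather than any value being overwritten automatically.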

I am also interested in feedback loops between the automated processes and manual checking, and in how to avoid overriding specialist input. We are trying to move towards a system which allows you to download, say, age data and assess its reliability yourself using all the above information.
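As a rough illustration of assessing reliability after download, the sketch below combines several of the indicators listed above (source type, source link, verification count and edit history) into a single score that a data user could filter on. The weights, field names and threshold are hypothetical choices for illustration, not values defined by Colouring London; a user would tune or replace them for their own analysis.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights per source type; a data user would set their own.
SOURCE_WEIGHTS = {
    "expert_group": 3,
    "epc": 2,
    "voa": 2,
    "street_network_inference": 1,
    "unknown_user": 0,
}


@dataclass
class AgeRecord:
    building_id: int
    year: int
    source_type: str
    source_link: Optional[str]
    verifications: int  # other users who agreed with the value
    edit_count: int  # number of edits in the attribute's history


def reliability(record: AgeRecord) -> int:
    """Combine the indicators into one score (higher = more trustworthy)."""
    score = SOURCE_WEIGHTS.get(record.source_type, 0)
    if record.source_link:
        score += 2  # a citable source can be checked independently
    score += min(record.verifications, 5)  # cap so verification counts cannot dominate
    if record.edit_count > 3:
        score -= 1  # frequent rewrites may indicate a disputed value
    return score


def filter_reliable(records, threshold=4):
    """Keep records whose score meets a user-chosen threshold."""
    return [r for r in records if reliability(r) >= threshold]
```

Keeping the score as a transparent sum of named indicators, rather than a single opaque flag, lets a user see why a record was kept or dropped and adjust the weights for their own purposes.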