Week 08 (W1 Jan11) Crimes in the UK - Rostlab/DM_CS_WS_2016-17 GitHub Wiki

Week 8 (W1 Jan11) Crimes in the UK

Note: For unknown abbreviations and terms, please consult the glossary. If anything is missing, just create an issue or write us an email and we will add it. A list of our data attributes can be found here.

Content

  1. Descriptive statistics
  2. New datasets
  3. Dark map

Descriptive statistics:

In the past two weeks we continued computing descriptive statistics on the crime data, location accuracy, and demographic influences. These will help us implement our prediction goals. To keep the descriptive statistics part of the lab in one uniform, central place, for readers and for ourselves later, we have collected all statistics computed so far in the following document.

New datasets:

In addition to defining our prediction goals and finalizing the descriptive statistics, this week we also added two datasets. The first covers the police forces and is motivated by one of our prediction goals: assessing how well police forces solve crimes of different types. It contains detailed information about each police force, such as its formation year, annual budget, the size of the area it covers, and the number of stations it has. It also contains the number of police officers per force (absolute and per 10,000 population), broken down by rank, gender, ethnicity, staff and community support officers, and special constables. The figures were compiled by the Home Office as of 31 March 2013.

The second dataset extends the previously included census data on Local Authorities. It provides the same information at a finer granularity, namely LSOA and MSOA. The data come from the 2011 Census, which offers 748 different statistics, of which 406 have demographic data for LSOAs and 596 for MSOAs. We merged all tables into a single table with all the LSOA data and another with all the MSOA data. At the moment we have thousands of features, so the next step is to run a feature selection algorithm to choose the 150 most useful ones.
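The merge and selection steps could look roughly like the sketch below. It assumes each census table is a mapping from area code to feature values, and it uses plain variance ranking only as a stand-in, since the feature selection algorithm has not been chosen yet. The area codes, feature names, and helper functions are all hypothetical.

```python
from statistics import pvariance


def merge_tables(tables):
    """Merge several {area_code: {feature: value}} tables into one wide table."""
    merged = {}
    for table in tables:
        for area_code, features in table.items():
            merged.setdefault(area_code, {}).update(features)
    return merged


def top_k_by_variance(merged, k):
    """Rank features by variance across areas and keep the k highest.

    Variance ranking is only a placeholder for whichever feature
    selection algorithm is eventually used.
    """
    feature_names = sorted({name for row in merged.values() for name in row})
    scores = {}
    for name in feature_names:
        values = [row[name] for row in merged.values() if name in row]
        scores[name] = pvariance(values) if len(values) > 1 else 0.0
    ranked = sorted(feature_names, key=lambda n: scores[n], reverse=True)
    return ranked[:k]


# Hypothetical toy tables keyed by made-up LSOA codes.
age_table = {
    "E01000001": {"median_age": 41.0},
    "E01000002": {"median_age": 35.0},
}
housing_table = {
    "E01000001": {"households": 800.0, "constant": 1.0},
    "E01000002": {"households": 650.0, "constant": 1.0},
}

lsoa_table = merge_tables([age_table, housing_table])
selected = top_k_by_variance(lsoa_table, k=2)
print(selected)
```

Raw variance depends on the scale of each feature, so a real run would likely normalize the columns first or use a selection method tied to the prediction target.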

Dark map:

As mentioned earlier in the Wiki, the actual geographical coordinates of crimes in our dataset were mapped to the nearest of 750k anonymous points (center points of streets, public places, etc.) to protect the privacy of the victims. This inaccuracy could negatively affect our approach of learning more about the environment around crimes by aggregating Stop-and-Searches and points of interest within 500m of each crime.
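To make the 500m aggregation concrete, here is a minimal sketch that counts points of interest within a given radius of a crime location using the haversine great-circle distance. The coordinates are invented for illustration, and `count_within` is a hypothetical helper, not part of our pipeline.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def count_within(crime, points, radius_m=500):
    """Count how many points lie within radius_m of the crime location."""
    lat, lon = crime
    return sum(1 for p_lat, p_lon in points
               if haversine_m(lat, lon, p_lat, p_lon) <= radius_m)


# Invented coordinates in central London, for illustration only.
crime = (51.5074, -0.1278)
points_of_interest = [
    (51.5080, -0.1280),  # roughly 70 m away
    (51.5110, -0.1278),  # roughly 400 m north
    (51.5200, -0.1278),  # roughly 1.4 km north
]

nearby = count_within(crime, points_of_interest)
print(nearby)
```

Because the published crime coordinates are snapped to anonymous points, a count like this is only as reliable as the snapping distance is small compared with the radius.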

This week we were able to measure the inaccuracy of crime locations, as shown in the dark maps. On an initially black map of England, Ireland and Wales, we drew a 20km² green square around every crime with geographical coordinates and then took the union of the resulting areas. In urban areas such as London, we found no dark “holes” in the unified green area. This means that any crime that happened (or will happen) in London is mapped to one of the master points, and the actual coordinates of the crime are therefore at most 20km² inaccurate. As the image below shows, almost all crimes in the UK are at most 20km² inaccurate.
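The dark map idea can be sketched on a coarse grid as follows. This is an illustration under stated assumptions, not the code used to produce the images: coordinates are assumed to be planar and in kilometres, the square is parameterized by its side length, and `dark_map` and `dark_fraction` are hypothetical helpers.

```python
def dark_map(points_km, width_km, height_km, side_km, cell_km=1.0):
    """Rasterize the union of squares centred on each point.

    Returns a grid of booleans: True for "green" (covered) cells,
    False for dark cells. A cell counts as covered when its centre
    lies inside at least one square.
    """
    cols = int(width_km / cell_km)
    rows = int(height_km / cell_km)
    half = side_km / 2
    grid = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        cy = (r + 0.5) * cell_km
        for c in range(cols):
            cx = (c + 0.5) * cell_km
            grid[r][c] = any(abs(cx - x) <= half and abs(cy - y) <= half
                             for x, y in points_km)
    return grid


def dark_fraction(grid):
    """Share of grid cells that stay dark (not covered by any square)."""
    cells = [cell for row in grid for cell in row]
    return cells.count(False) / len(cells)


# Two invented crime locations on a 10 km x 10 km toy map.
crimes = [(2.0, 2.0), (8.0, 8.0)]
grid = dark_map(crimes, width_km=10, height_km=10, side_km=4)

# Print the map: '#' is green (covered), '.' is dark.
for row in reversed(grid):
    print("".join("#" if cell else "." for cell in row))
print(dark_fraction(grid))
```

Checking every cell against every point is quadratic and only suitable for a toy example; for millions of crimes a spatial index or an image-drawing library would be the more practical choice.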

Dark map with 20km² squares (in green) around crime coordinates.

Afterwards we drew the same map with different square sizes around the crime coordinates, as seen in the images below. Almost all crimes are at most 2km² inaccurate. From squares of 1km² downwards, large dark areas start to appear in the map. We could see that our 500m accuracy assumption holds in almost all urban areas.
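Comparing several square sizes does not require redrawing the map each time. As a sketch, assuming planar coordinates in kilometres, one can compute for every grid cell the Chebyshev distance to the nearest crime once; a cell is then covered by squares of side s exactly when that distance is at most s/2. The function names and the toy data are hypothetical.

```python
def nearest_chebyshev_km(points_km, width_km, height_km, cell_km=1.0):
    """For each grid cell centre, the Chebyshev distance to the nearest point."""
    cols = int(width_km / cell_km)
    rows = int(height_km / cell_km)
    distances = []
    for r in range(rows):
        cy = (r + 0.5) * cell_km
        for c in range(cols):
            cx = (c + 0.5) * cell_km
            distances.append(min(max(abs(cx - x), abs(cy - y))
                                 for x, y in points_km))
    return distances


def coverage_by_side(distances, sides_km):
    """Fraction of cells covered ("green") for each square side length."""
    return {side: sum(d <= side / 2 for d in distances) / len(distances)
            for side in sides_km}


# Two invented crime locations on a 10 km x 10 km toy map.
crimes = [(2.0, 2.0), (8.0, 8.0)]
distances = nearest_chebyshev_km(crimes, width_km=10, height_km=10)
coverage = coverage_by_side(distances, sides_km=[20, 10, 4, 2])

for side, share in coverage.items():
    print(f"side {side} km: {share:.0%} of the map is green")
```

The grid cell size should be clearly smaller than the smallest square being tested, otherwise the coverage values are dominated by rasterization effects.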

Dark map with 10km² squares (in green) around crime coordinates.

Dark map with 2km² squares (in green) around crime coordinates.

Dark map with 1km² squares (in green) around crime coordinates.

Dark map with 500 meter squares (in green) around crime coordinates.

An inaccuracy of 500 meters is a reasonable assumption when analyzing big cities.

Dark map with 200 meter squares (in green) around crime coordinates.

An inaccuracy of 200 meters is too small an assumption for some cities.

Dark map with 100 meter squares (in green) around crime coordinates.

An inaccuracy of just 100 meters or less is too small an assumption even for London.

Link to our presentation