Week 11 (W5 Feb) Crimes in the UK - Rostlab/DM_CS_WS_2016-17 GitHub Wiki


Note: For unknown abbreviations and terms, please consult the glossary. If anything is missing, just create an issue or write us an email and we will add it. A list of our data attributes can be found here.

Stop and search probability

This week we compared how likely people of different age ranges and ethnicities are to be stopped and searched.

### Data preparation

An analysis of stop and search (s&s) at MSOA level was not possible because the data was too sparse, so we took a different approach. We inspected the s&s data by plotting it on a map, as you can see below. From this we concluded that the data should be aggregated at a coarser granularity. For this analysis we chose the 5 biggest UK cities for which we have the most data. We aggregated the s&s data in these cities using LSOAs and discarded the rest.

Next, we obtained the demographic data for those cities by aggregating over LSOAs, as we did with the s&s data.
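The aggregation step can be sketched roughly as follows. This is a minimal illustration and not our actual pipeline: the LSOA codes, the `LSOA_TO_CITY` lookup, and the record layout are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical LSOA-to-city lookup; the codes are placeholders, not real data.
LSOA_TO_CITY = {
    "E01000001": "London",
    "E01000002": "London",
    "E01000003": "Leeds",
}

# Hypothetical s&s records: (LSOA code, group label such as ethnicity or age range).
records = [
    ("E01000001", "group_a"),
    ("E01000001", "group_b"),
    ("E01000002", "group_b"),
    ("E01000003", "group_c"),
    ("E01999999", "group_a"),  # outside the chosen cities, will be dropped
]


def aggregate_by_city(records, lsoa_to_city):
    """Count stops per (city, group), discarding LSOAs not in the lookup."""
    counts = Counter()
    for lsoa, group in records:
        city = lsoa_to_city.get(lsoa)
        if city is None:
            continue  # record lies outside the selected cities
        counts[(city, group)] += 1
    return counts


city_counts = aggregate_by_city(records, LSOA_TO_CITY)
```

The same function could be applied to demographic rows if they are expanded to one entry per LSOA and group, although summing population counts directly would be more natural there.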

Which ethnicity is more likely to be stopped?

Our approach is to compare the distribution of the demographic data with the distribution of the s&s data. In the picture below you can see the demographic distribution by ethnicity.

The next picture shows the distribution of the s&s data by ethnicity. Here we can see, for instance, that in London there are more stops and searches of Black people, although the Black population is much smaller than the white population.

By comparing the two pictures, one can observe some differences in distribution. However, to observe the proportional differences in stops and searches, we normalized the counts by population. The figure below depicts the normalized number of s&s per 10,000 people by ethnicity. It can be observed that in London the probability of a Black or Asian person being stopped and searched is more than double that of a white person.
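The normalization can be illustrated with a short sketch. The numbers and the group labels below are made up purely for illustration and do not come from our data; `rate_per_10k` is a hypothetical helper name.

```python
def rate_per_10k(stops, population):
    """Stops and searches per 10,000 residents of a group."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 10_000 * stops / population


# Made-up counts for one city, for illustration only.
stops = {"group_a": 300, "group_b": 250, "group_c": 120}
population = {"group_a": 600_000, "group_b": 150_000, "group_c": 200_000}

# Normalized s&s rate for every group.
rates = {g: rate_per_10k(stops[g], population[g]) for g in stops}

# Ratio of each group's rate to a chosen reference group.
ratios = {g: rates[g] / rates["group_a"] for g in rates}
```

In this toy example `group_a` has the most stops in absolute terms, but after normalization `group_b` has the highest rate, which is the kind of effect the normalized figure is meant to expose.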

In which age ranges are people stopped more often?

As with ethnicity, we also analyzed the influence of age. In the picture below you can see the demographic distribution by age range.

The next picture shows the distribution of the s&s data by age range. It can be observed that each city shows the same stop and search pattern, which may indicate that the police forces follow similar principles and communicate with each other. We also expected people in the 18-24 age range to be stopped and searched the most, because this is a sensitive period in life in which most people do not yet have much money, education, or life experience.

Comparing the two pictures above reveals some differences in distribution. However, to observe the proportional differences in stops and searches, we normalized the counts by population. The figure below depicts the normalized number of s&s per 10,000 people by age range. Newcastle, the smallest of the five chosen cities, shows a markedly different distribution from the other cities: there are many more stops and searches of teenagers than of people in the 18-24 age range.

Outlier detection

As mentioned in last week's wiki, our outlier detection algorithm (using a one-class SVM) detected 673 MSOAs whose criminal behavior is abnormal given their demographic data and points of interest. Our next step was to find a way to understand the factors that make those 673 MSOAs outliers.

This week we worked on a concept for reaching this goal, which is mainly based on clustering. The concept is visualized in the figure below. The image on the left shows a plot of our MSOAs using only demographic data as features. Note that the red point is an outlier with respect to criminal behavior, which is not included as a feature at this stage. We then clustered the MSOAs according to the demographic data, as seen in the right image. This shows which demographic clusters the outliers belong to when criminal behavior is ignored, and gives us a baseline against which to compare the outliers.

After assigning each MSOA to exactly one cluster, we added the criminal behavior features and computed the average of each of them per cluster; these averages serve as our comparison measure for outliers. As we can see in the image below, in cluster one the outliers deviate most in the criminal behavior features "total crime types", "Number Anti-Social behavior", "Number other theft", "Number Shoplifting", and "Number Robbery". These features therefore play the biggest role in defining MSOA outliers.
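The comparison against cluster averages can be sketched as follows. This is only an illustration under stated assumptions: the rows, the feature names, and the cluster labels are invented, the cluster assignment is taken as given, and the average is computed over all MSOAs in a cluster, outliers included.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (msoa_id, cluster, is_outlier, crime features).
rows = [
    ("M1", 0, False, {"shoplifting": 10.0, "other_theft": 20.0}),
    ("M2", 0, False, {"shoplifting": 14.0, "other_theft": 22.0}),
    ("M3", 0, True, {"shoplifting": 60.0, "other_theft": 30.0}),
    ("M4", 1, False, {"shoplifting": 5.0, "other_theft": 8.0}),
    ("M5", 1, True, {"shoplifting": 7.0, "other_theft": 40.0}),
]


def cluster_means(rows):
    """Average each crime feature over all MSOAs in the same cluster."""
    grouped = defaultdict(lambda: defaultdict(list))
    for _, cluster, _, feats in rows:
        for name, value in feats.items():
            grouped[cluster][name].append(value)
    return {c: {n: mean(v) for n, v in f.items()} for c, f in grouped.items()}


def outlier_deviations(rows, means):
    """Difference between each outlier's features and its cluster average."""
    return {
        msoa: {n: v - means[cluster][n] for n, v in feats.items()}
        for msoa, cluster, is_outlier, feats in rows
        if is_outlier
    }


means = cluster_means(rows)
deviations = outlier_deviations(rows, means)
```

Features with the largest deviations for a given outlier are the candidates for explaining why that MSOA was flagged. With features on very different scales, dividing each deviation by the cluster's standard deviation would likely make them more comparable.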

The image below plots the outliers on a map of the UK. It is no surprise to see big cities like London, Leeds, or Hull. However, it was interesting to see that smaller, less well-known places like Blackpool and Brighton and Hove are also identified as outliers, mainly due to the factors listed above.

Link to our presentation