2012 Toronto Police Report Analysis - louislau66/Toronto_Police_Field_Report_Python GitHub Wiki

1 Introduction

Police sometimes stop a person at random for questioning. Police define this as “A simple ‘meet-and-greet’ communication between a police officer and a community member, wherein a limited exchange of information may occur.” Opponents say the practice unfairly targets racial minorities and amounts to racial profiling. The data comes from the Field Information Reports (FIRS) filed in 2012 (http://data.torontopolice.on.ca/pages/firs). I want to check whether racial profiling exists.

2 Objective

My approach to this project is to explore the FIRS, then try to answer the following questions:

  • In which months did police file the highest and lowest numbers of reports?
  • What is the nature of the contacts?
  • What is the distribution of FIRS per day?
  • What are the distributions of gender, age, time of day, and skin color?
  • Does racial profiling exist?

3 FIRS report structure

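FIRS consists of the following 15 fields: CONTACTID, TPS_PATROL_ZONE, NATURE_OF_CONTACT, CONTACT_DATE, CONTACT_TIME, CONTACT_YEAR, AGE, SEX, BIRTH_PLACE, SKIN_COLOUR, YEAR_MONTH_OF_BIRTH, UNIQUE_PERSON_ID, HOME_CITY, HOME_PATROL_ZONE, and FID.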

4 Missing Values

The report consists of 404293 records. Three columns (BIRTH_PLACE, HOME_CITY, and HOME_PATROL_ZONE) have a large amount of missing data, so I excluded them from my analysis.

    RangeIndex: 404293 entries, 0 to 404292
    Data columns (total 15 columns):
    CONTACTID              404293 non-null int64
    TPS_PATROL_ZONE        404293 non-null int64
    NATURE_OF_CONTACT      404293 non-null object
    CONTACT_DATE           404293 non-null object
    CONTACT_TIME           404293 non-null object
    CONTACT_YEAR           404293 non-null int64
    AGE                    404293 non-null int64
    SEX                    96760 non-null object
    BIRTH_PLACE            100780 non-null object
    SKIN_COLOUR            361341 non-null object
    YEAR_MONTH_OF_BIRTH    397525 non-null object
    UNIQUE_PERSON_ID       403098 non-null float64
    HOME_CITY              284066 non-null object
    HOME_PATROL_ZONE       205676 non-null float64
    FID                    404293 non-null int64
    dtypes: float64(2), int64(5), object(8)
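To make "a large amount of missing data" concrete, the missing share of each excluded column can be worked out from the non-null counts in the summary. This is a small sketch using only the numbers reported there; the variable names are my own.

```python
# Non-null counts copied from the column summary in this section.
total_records = 404293
non_null = {
    "BIRTH_PLACE": 100780,
    "HOME_CITY": 284066,
    "HOME_PATROL_ZONE": 205676,
}

# Percentage of records with no value in each column.
missing_pct = {
    column: round(100 * (total_records - count) / total_records, 1)
    for column, count in non_null.items()
}

for column, pct in missing_pct.items():
    print(f"{column}: {pct}% missing")
```

With the data loaded in a pandas DataFrame, `df.isna().mean()` gives the same shares directly, and `df.drop(columns=[...])` removes the chosen columns.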

5 Cleaning and Transforming the Dataset

  • The ‘AGE’ column contains many abnormal values, likely caused by typos. I replaced all ages above 120 with null. (For example, the data shows police questioning a 941-year-old man for trespassing on June 15th, 2012!)
  • I extracted the month from the ‘CONTACT_DATE’ column for later analysis.
  • I segmented the ‘AGE’ column into 5 groups (0-13, 14-30, 31-47, 48-64, and 65+ years) for easier analysis.
  • I divided the ‘CONTACT_TIME’ column into 6 segments (0-4am, 4-8am, 8am-12pm, 12-4pm, 4-8pm, and 8pm-0am) for analysis.
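The cleaning steps above could be sketched in pandas roughly as follows. The rows are synthetic, and the date format (`YYYY-MM-DD`), the time format (`HH:MM`), and the new column names `MONTH`, `AGE_GROUP`, and `TIME_SEGMENT` are assumptions for illustration, not taken from the real file.

```python
import pandas as pd

# Synthetic rows; the date and time formats are assumptions.
df = pd.DataFrame({
    "AGE": [12, 25, 941, 50, 70],
    "CONTACT_DATE": ["2012-06-15", "2012-01-03", "2012-06-15",
                     "2012-12-24", "2012-03-09"],
    "CONTACT_TIME": ["23:10", "03:45", "14:00", "07:59", "08:00"],
})

# Treat implausible ages as missing instead of dropping the rows.
df["AGE"] = df["AGE"].where(df["AGE"] <= 120)

# Month of contact (1-12).
df["MONTH"] = pd.to_datetime(df["CONTACT_DATE"]).dt.month

# Five age groups; each bin includes its upper edge (pd.cut default).
age_bins = [-1, 13, 30, 47, 64, 120]
age_labels = ["0-13", "14-30", "31-47", "48-64", "65+"]
df["AGE_GROUP"] = pd.cut(df["AGE"], bins=age_bins, labels=age_labels)

# Six four-hour segments; right=False puts 08:00 in "8am-12pm".
hour = pd.to_datetime(df["CONTACT_TIME"], format="%H:%M").dt.hour
time_labels = ["0-4am", "4-8am", "8am-12pm", "12-4pm", "4-8pm", "8pm-0am"]
df["TIME_SEGMENT"] = pd.cut(hour, bins=[0, 4, 8, 12, 16, 20, 24],
                            labels=time_labels, right=False)

print(df)
```

Rows with a nulled age simply get no age group, so they still count toward the month and time-of-day summaries.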

6 Descriptive Analysis

6.1 FIRS by month

Police filed fewer reports in winter than during the rest of the year.
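A count of reports per month, and the busiest and quietest months, could be obtained along these lines. The dates are synthetic and their format is an assumption; the variable names are illustrative.

```python
import pandas as pd

# Synthetic contact dates; the ISO date format is an assumption.
dates = pd.Series([
    "2012-01-05", "2012-01-20",
    "2012-07-04", "2012-07-10", "2012-07-11",
    "2012-12-30",
])

# Number of reports in each calendar month, ordered by month.
month = pd.to_datetime(dates).dt.month
reports_per_month = month.value_counts().sort_index()

busiest_month = reports_per_month.idxmax()
quietest_month = reports_per_month.idxmin()
print(reports_per_month)
print("busiest:", busiest_month, "quietest:", quietest_month)
```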

6.2 FIRS per day distribution

The number of FIRS per day looks roughly normally distributed, with a mean of around 1125.
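As a rough sketch, the per-day counts behind such a histogram come from counting reports per contact date. The dates below are synthetic. The last lines are a back-of-the-envelope check that assumes every record falls in 2012 (a leap year with 366 days); it lands close to the mean read off the plot.

```python
import pandas as pd

# Synthetic contact dates: 3, 5 and 4 reports on three days.
dates = pd.Series(["2012-03-01"] * 3 + ["2012-03-02"] * 5 + ["2012-03-03"] * 4)

# One count per calendar day; this is the series a histogram would plot.
reports_per_day = dates.value_counts()
mean_per_day = reports_per_day.mean()
print(mean_per_day)

# Sanity check with the real record count, assuming all records are from 2012.
total_records = 404293
days_in_2012 = 366
overall_daily_average = round(total_records / days_in_2012, 1)
print(overall_daily_average)
```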

6.3 The Nature of contacts

Top 5 reasons for FIRS:

  • General Investigation
  • Radio Call
  • Traffic Stop
  • Vehicle Related
  • Bail Compliance Check-No Violation
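A ranking like this is typically produced with `value_counts` on the NATURE_OF_CONTACT column. In this sketch the category names come from the list above, but the counts are made up.

```python
import pandas as pd

# Category names from the list above; the counts are invented.
nature = pd.Series(
    ["General Investigation"] * 5
    + ["Radio Call"] * 4
    + ["Traffic Stop"] * 3
    + ["Vehicle Related"] * 2
    + ["Bail Compliance Check-No Violation"] * 1
)

# value_counts sorts by frequency, so head(n) gives the top n reasons.
top_reasons = nature.value_counts().head(3)
print(top_reasons)
```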

6.4 Distribution by Age groups

People in the age groups of 14-30 and 31-47 are more likely to be stopped by police for FIRS.

6.5 Gender distribution

Men are 3 times more likely to be stopped by police than women.

6.6 Time of day distribution

The highest number of FIRS was filed between 8 pm and midnight. The lowest number was filed between 4 am and 8 am.

6.7 Distribution of skin color

24% of FIRS were for people identified as Black, a group that makes up only 8.9% of the population of the GTA (Wikipedia). This looks like evidence that racial profiling is real.
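One way to express this gap is as a ratio of the two shares. The sketch below uses only the two percentages quoted above; the term "disparity ratio" and the variable names are my own.

```python
# Shares quoted in the text.
share_of_reports = 0.24      # FIRS for people identified as Black
share_of_population = 0.089  # Black share of the GTA population

# How many times larger the report share is than the population share.
disparity_ratio = share_of_reports / share_of_population
print(round(disparity_ratio, 2))
```

A ratio of 1 would mean the group appears in FIRS in proportion to its population share; here it is roughly 2.7.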

6.8 Age by skin color

The age distribution by skin color shows that the average age for Black and Brown (25-26) is almost 10 years younger than for White (35). Given the large size of the dataset, this difference is significant.
**Police are targeting a much younger age group among visible minorities than among White people.**
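Average age per group can be computed with a `groupby` on SKIN_COLOUR. The rows below are synthetic and were chosen only to mirror the averages reported above; they are not real records.

```python
import pandas as pd

# Synthetic rows that mirror the reported averages (25, 26 and 35).
df = pd.DataFrame({
    "SKIN_COLOUR": ["Black", "Black", "Brown", "Brown", "White", "White"],
    "AGE": [24, 26, 25, 27, 33, 37],
})

# Mean age per skin colour; missing ages are skipped automatically.
mean_age = df.groupby("SKIN_COLOUR")["AGE"].mean()
age_gap = mean_age["White"] - mean_age["Black"]
print(mean_age)
print("gap in years:", age_gap)
```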