Visualising Crime Data with R: How Different Charts Help Uncover Hidden Patterns
Why Data Visualisation Matters
In the age of big data, making sense of complex datasets is key, and that is where visualisation comes in. Graphical representations of data help aggregate and communicate findings clearly and reveal patterns and outliers that might be missed in tabular views. For data mining, detecting trends, and generating insights, using the most appropriate visual can make all the difference.
In data science, visualisation is not only a powerful communication tool but also a critical part of exploratory data analysis. During the early stages of modelling, visual techniques allow data professionals to quickly identify patterns, outliers, and relationships between variables — an essential step in selecting the most appropriate statistical methods for analysis. The ability to turn raw data into clear, intuitive visuals is also essential for explaining results to stakeholders and building effective dashboards for non-technical users.
This wiki demonstrates how different visualisation types can aggregate and reveal crime trends across different regions of New Zealand between 2017 and 2024. While the analysis focuses on crime data, the methods are broadly applicable to other datasets. All code was developed in R using open-source libraries and is reproducible using this R notebook.
Tools & Libraries
The analysis uses the New Zealand Police’s Victimisation Demographics dataset, which covers victimisation cases between 2017 and 2024. The data was processed and analysed in RStudio with the following key libraries:
- library(tidyverse) # Includes ggplot2, dplyr, tidyr, etc.
- library(janitor) # For cleaning data (e.g. column names)
- library(scales) # For formatting numbers
- library(e1071) # For skewness calculation
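As a minimal sketch of the setup step, the snippet below loads the libraries and standardises column names with janitor. The tiny `raw` table, its column names, and its values are placeholders invented for illustration; the real export's columns may differ.

```r
library(tidyverse)  # ggplot2, dplyr, tidyr, etc.
library(janitor)    # clean_names()

# Hypothetical raw extract: column names and values are placeholders,
# not taken from the actual Police dataset.
raw <- tibble(
  `Year Month`      = c("2017-01", "2017-02"),
  `Police District` = c("Auckland City", "Canterbury"),
  `Victimisations`  = c(10, 12)
)

# clean_names() converts headers to snake_case so they are easy to reference.
crime <- raw %>% clean_names()
names(crime)
```

Consistent snake_case names make the later `group_by()` and `aes()` calls less error-prone than names containing spaces.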
Visualising Aggregated Annual Trends
Bar Chart: Total Victimisation by Year
The first chart is a bar plot showing the number of victimisation cases per year across all regions. This is a simple but effective way to spot macro-level trends, because bar charts make it easy to compare totals across categories (in this case, years). Peaks in 2022 and 2024 stand out clearly, suggesting that external factors may have driven higher recorded victimisation in those years.
2022 recorded the highest number of cases, which may reflect the socio-economic conditions post-COVID-19.
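A chart like this could be produced roughly as follows. This is a sketch under assumptions, not the notebook's code: the `crime` data frame, its column names (`year`, `region`, `victimisations`), and all counts are invented placeholders.

```r
library(dplyr)
library(ggplot2)
library(scales)

# Placeholder records (one row per region-year); counts are invented.
crime <- data.frame(
  year           = rep(2017:2019, each = 2),
  region         = rep(c("Region A", "Region B"), times = 3),
  victimisations = c(100, 150, 120, 160, 140, 190)
)

# Aggregate to one total per year across all regions.
yearly <- crime %>%
  group_by(year) %>%
  summarise(total = sum(victimisations), .groups = "drop")

# geom_col() draws bar heights from the pre-computed totals.
p_bar <- ggplot(yearly, aes(x = factor(year), y = total)) +
  geom_col(fill = "steelblue") +
  scale_y_continuous(labels = label_comma()) +
  labs(x = "Year", y = "Victimisations", title = "Total victimisation by year")
p_bar
```

Treating the year as a factor keeps one bar per year and avoids fractional tick marks on the x-axis.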
Understanding Data Distributions
Histogram: Distribution of Victimisation Counts
A histogram provides a visual summary of numerical data by grouping values into bins and displaying their frequencies. This reveals key distribution characteristics, such as normality or skewness, which in turn guide the choice of statistical methods. In this analysis, the histogram below shows how annual victimisation counts are distributed. The right-skewed distribution indicates that certain years (notably 2022) had considerably higher victimisation counts, highlighting potential outliers or shifts in underlying patterns.
With a mean of 259,194.6 and a median of 255,948, the distribution is moderately right-skewed (skewness = 0.8). This suggests that a few years, such as 2022 and 2024, had a significantly higher victimisation count than average.
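The summary statistics and histogram could be computed along these lines. The yearly totals below are invented placeholders (not the published figures), and the object names are assumptions; the point is only that a mean above the median together with a positive `skewness()` value is what a right skew looks like numerically.

```r
library(ggplot2)
library(e1071)  # skewness()

# Placeholder yearly totals: invented values, not the real dataset.
yearly <- data.frame(
  year  = 2017:2024,
  total = c(10, 11, 12, 12, 13, 14, 15, 25)
)

mean_total   <- mean(yearly$total)
median_total <- median(yearly$total)
skew         <- skewness(yearly$total)  # > 0 indicates a right skew

# Histogram of the yearly totals, with the mean marked as a dashed line.
p_hist <- ggplot(yearly, aes(x = total)) +
  geom_histogram(bins = 5, fill = "steelblue", colour = "white") +
  geom_vline(xintercept = mean_total, linetype = "dashed") +
  labs(x = "Victimisations per year", y = "Number of years")
p_hist
```

With only a handful of yearly observations, the shape of the histogram depends heavily on the number of bins, so the numeric summary is a useful cross-check.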
Comparing Categories Across Time and Locations
Stacked Bar Chart: Victimisation by Region Over Time
The stacked bar chart below displays annual victimisation totals by region, showing areas with persistently high or low counts. This visualisation not only highlights geographic crime hotspots but also tracks shifts in regional contributions to national trends over time.
Auckland City, Canterbury, and Counties/Manukau show higher victimisation counts. Spatial analysis (e.g., heat maps) could offer deeper insights, but it is outside the scope of this discussion.
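A stacked bar chart of this kind can be sketched by mapping the region to the fill colour. The data frame, region labels, and counts here are hypothetical placeholders.

```r
library(ggplot2)

# Placeholder region-year counts; all values are invented.
crime <- data.frame(
  year           = rep(2017:2018, each = 3),
  region         = rep(c("Region A", "Region B", "Region C"), times = 2),
  victimisations = c(100, 60, 40, 120, 70, 50)
)

# Each bar is one year; segments are regions stacked on top of each other,
# so the bar height equals the yearly total.
p_stack <- ggplot(crime, aes(x = factor(year), y = victimisations, fill = region)) +
  geom_col(position = "stack") +
  labs(x = "Year", y = "Victimisations", fill = "Region")
p_stack
```

Switching to `position = "fill"` would show each region's share of the yearly total instead of absolute counts, which can make shifts in regional contribution easier to read.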
Line Plot: Regional Trends Over Time
This multi-line chart plots regional trends separately, highlighting growth or decline in victimisation over time per region. Line plots excel at showing temporal trends. For example, they can show if crime is rising consistently in a specific area, or if there are outliers.
Counties/Manukau, Auckland City, and Canterbury have sharper increases than other regions, especially in 2022 and 2024.
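The sketch below draws one line per region and also computes the year-on-year change, which is one simple way to quantify a "sharper increase". The data frame, column names, and counts are assumptions made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Placeholder region-year counts; all values are invented.
crime <- data.frame(
  region         = rep(c("Region A", "Region B"), each = 3),
  year           = rep(2017:2019, times = 2),
  victimisations = c(100, 110, 150, 80, 82, 85)
)

# Year-on-year change within each region (NA for the first year).
changes <- crime %>%
  arrange(region, year) %>%
  group_by(region) %>%
  mutate(change = victimisations - dplyr::lag(victimisations)) %>%
  ungroup()

# One line per region; colour distinguishes the regions.
p_line <- ggplot(crime, aes(x = year, y = victimisations,
                            colour = region, group = region)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 2017:2019) +
  labs(x = "Year", y = "Victimisations", colour = "Region")
p_line
```

In this toy data, Region A's change column rises from 10 to 40 while Region B's stays in single digits, which matches the steeper line in the plot.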
Exploring Categorical Breakdowns
Grouped Bar Chart: Crime Type by Region
A grouped bar chart was used to compare the prevalence of different crime types across regions. Categorical breakdowns like this are essential for targeted strategies, because they show whether a region deals more with theft, assaults, or property crimes.
Counties/Manukau leads in theft from retail premises crimes and serious assault resulting in injury.
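A grouped (side-by-side) bar chart can be sketched with `position = "dodge"`. The region labels, crime-type labels, and counts below are hypothetical; the `leaders` table is an added illustration of how the leading region per crime type might be read off programmatically.

```r
library(dplyr)
library(ggplot2)

# Placeholder counts per region and crime type; all values are invented.
by_type <- data.frame(
  region         = rep(c("Region A", "Region B"), each = 2),
  crime_type     = rep(c("Theft", "Assault"), times = 2),
  victimisations = c(50, 20, 30, 40)
)

# Region with the highest count for each crime type.
leaders <- by_type %>%
  group_by(crime_type) %>%
  slice_max(victimisations, n = 1) %>%
  ungroup()

# position = "dodge" places the crime-type bars next to each other per region.
p_grouped <- ggplot(by_type, aes(x = region, y = victimisations, fill = crime_type)) +
  geom_col(position = "dodge") +
  labs(x = "Region", y = "Victimisations", fill = "Crime type")
p_grouped
```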
Multi-Line Plot: Crime Type Trends in Top Regions
A multi-line plot was used to track the most common crime types over time in the five regions with the highest counts. Seeing how crime types evolve year by year gives context to policy or economic shifts. For instance, a spike in retail theft in 2022 could be aligned with higher economic pressure post-COVID.
Most frequent crime types across each region between 2017 and 2024.
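One possible way to build such a plot is to rank regions by total count, keep the top few, and facet by region. The sketch keeps the top two regions only because the placeholder data is tiny; the data frame, labels, and counts are all invented.

```r
library(dplyr)
library(ggplot2)

# Placeholder counts per region, year, and crime type; all values are invented.
crime <- data.frame(
  region         = rep(c("Region A", "Region B", "Region C"), each = 4),
  year           = rep(rep(2017:2018, each = 2), times = 3),
  crime_type     = rep(c("Theft", "Assault"), times = 6),
  victimisations = c(50, 20, 60, 25,   # Region A
                     30, 10, 35, 12,   # Region B
                      5,  2,  6,  3)   # Region C
)

# Rank regions by their overall total and keep the top n (2 here, 5 in the article).
top_regions <- crime %>%
  group_by(region) %>%
  summarise(total = sum(victimisations), .groups = "drop") %>%
  slice_max(total, n = 2) %>%
  pull(region)

# One panel per top region, one line per crime type.
p_types <- crime %>%
  filter(region %in% top_regions) %>%
  ggplot(aes(x = year, y = victimisations, colour = crime_type)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ region) +
  scale_x_continuous(breaks = 2017:2018) +
  labs(x = "Year", y = "Victimisations", colour = "Crime type")
p_types
```

Faceting keeps each region's lines on its own panel, which tends to be easier to read than putting every region and crime type combination on a single set of axes.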
Statistical Summary of Crime Types
Box Plot: Spread and Outliers of Key Crime Types
A box plot was used to analyse and compare the top three crime types across the five regions with the highest victimisation counts. The box plot displays key statistical measures, including median values, interquartile ranges (IQR), and outliers. By illustrating the central tendency, spread, and skewness of the data, it allows quick comparison of distributions across regions and highlights regions with higher dispersion or unusual values that may require further investigation.
Auckland City, Canterbury, Counties/Manukau, and Waikato show high variability (wide IQR). Waikato and Counties/Manukau have outliers, indicating surges in specific years. All regions have right-skewed distributions (median near Q1), meaning most years have lower counts with occasional spikes. Theft (excluding motor vehicles) is stable overall, but Canterbury has the highest median and variability; other regions show narrower distributions, suggesting consistent incident rates over time.
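The outliers in a box plot follow the usual 1.5 × IQR rule, which can be reproduced by hand as a check. In this sketch the data frame, region labels, and counts are invented placeholders, and `flagged` is a hypothetical helper table.

```r
library(dplyr)
library(ggplot2)

# Placeholder yearly counts for two regions; all values are invented.
crime <- data.frame(
  region         = rep(c("Region A", "Region B"), each = 6),
  year           = rep(2017:2022, times = 2),
  victimisations = c(10, 11, 12, 13, 14, 40,   # Region A: one spike year
                     20, 21, 22, 23, 24, 25)   # Region B: steady
)

# Flag values beyond 1.5 * IQR from the quartiles, per region.
flagged <- crime %>%
  group_by(region) %>%
  mutate(
    q1 = quantile(victimisations, 0.25, names = FALSE),
    q3 = quantile(victimisations, 0.75, names = FALSE),
    is_outlier = victimisations < q1 - 1.5 * (q3 - q1) |
                 victimisations > q3 + 1.5 * (q3 - q1)
  ) %>%
  ungroup()

# geom_boxplot() draws the median, the IQR box, whiskers, and outlier points.
p_box <- ggplot(crime, aes(x = region, y = victimisations)) +
  geom_boxplot() +
  labs(x = "Region", y = "Victimisations per year")
p_box
```

In the toy data only Region A's spike year is flagged, so it would appear as a single point above the upper whisker.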
Summary
Visualisations are more than just pretty graphics. Especially when working with big data, they help spot trends and outliers faster, identify relationships between variables, and communicate findings clearly to non-technical audiences.
Each graphic has its specific purpose: showing overall trends (bar/line charts), exploring distributions (histograms/box plots), or examining categorical comparisons (stacked/grouped bars). Choosing the right chart is crucial to understanding the data and helping stakeholders make informed decisions.
Resources
- Dataset: New Zealand Police Open Data, Victimisations demographics
Disclaimer:
Please note that this article is intended to demonstrate the application of various visual types in data science. It does not aim to provide a comprehensive interpretation of the dataset used. For official statistics and more detailed analysis, visit the New Zealand Police Victimisations Demographics portal.