Week 01 (W46 Nov16) Terrorism Database - Rostlab/DM_CS_WS_2016-17 GitHub Wiki

0 - Summary & index:

During this week, the team focused on exploratory analysis of the data. We set up a private GitHub repository and used R and Python for the analysis. Some analysis was possible in R, but it proved to be time-inefficient. We computed frequencies, plots and wordclouds, which gave us material for brainstorming. The main findings of the week are:

  • Terrorist attacks have increased over the last years, or at least their reporting has.
  • A huge number of terrorist attacks occurred in the Middle East region.
  • The groups responsible for the most terrorism events are the Taliban, Shining Path (SL), Farabundo Marti National Liberation Front, and Islamic State of Iraq and the Levant.
  • To handle some of the missing values, we are going to merge related attributes.

Wiki status - concluded

Index

  • 1 - Weekly work
    • 1.1 - Dataset organization & description
      • 1.1.1 - Dataset summary
      • 1.1.2 - Dataset description
    • 1.2 - Descriptive analysis of the instances and features
      • 1.2.1 - Numeric Attributes
      • 1.2.2 - Geo-referencial Attributes
      • 1.2.3 - Text Attributes
    • 1.3 - Missing values
    • 1.4 - Outliers
    • 1.5 - Insights and ideas
    • 1.6 - Presentation
    • 1.7 - Perceived Feedback

1 - Weekly work

1.1 - Dataset organization & description

1.1.1 - Dataset summary

The Global Terrorism Database provides a set of instances that describe terrorist attacks worldwide from 1970 until 2015. The topic is very popular nowadays because of the terrorist attacks occurring in the past years. Although it is a hard task, predicting the occurrence of terrorist attacks is the most thrilling question that this dataset can provide insight into.

1.1.2 - Dataset description

  • Size: 73.9 MB
  • Attributes: 137
  • Rows: 156 773
  • Format: xlsx
  • Incidents from 1970 to 2015

1.2 - Descriptive analysis of the instances and features

To better understand the dataset, we collected some key metrics, focusing on frequencies and plots for the main variables.
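The frequency counts behind the plots in this section can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the team's actual analysis code; the sample rows and the field names `year`, `month` and `day` are hypothetical stand-ins for the dataset's date attributes.

```python
from collections import Counter

# Hypothetical sample rows; the real dataset has 156 773 rows.
events = [
    {"year": 2014, "month": 1, "day": 1},
    {"year": 2014, "month": 1, "day": 15},
    {"year": 2014, "month": 7, "day": 1},
    {"year": 2015, "month": 12, "day": 15},
    {"year": 2013, "month": 2, "day": 9},
]


def frequency(rows, field):
    """Count how many events share each value of `field`."""
    return Counter(row[field] for row in rows)


events_per_year = frequency(events, "year")
events_per_month = frequency(events, "month")
events_per_day = frequency(events, "day")

# most_common() sorts by descending count, a convenient order for bar plots.
print(events_per_year.most_common())
```

The same helper works for any categorical attribute (country, region, attack type), which is why it takes the field name as a parameter.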

1.2.1 - Numeric Attributes

Year of the attacks

Most terrorism-related events occurred in 2014, not 2015. Hopefully the decline from 2014 to 2015 continues.

Number of Events per year

Month of the attacks

December is the most peaceful month of the year, with fewer events than even February.

Number of Events per month

Month day of the attacks

The 1st and 15th of the month are frequent dates for attacks.

Number of Events per day

1.2.2 - Geo-referencial Attributes

Country of the attack

Iraq has more terror incidents than any other country. Pakistan, India and Afghanistan complete the top 4.

Number of Events per country (TOP 20)

Region of the attack

There have been more incidents of terror in the Middle East and North Africa than in the Indian Subcontinent.

Number of Events per region

Nationality of targets

Iraq, Pakistan, India and Afghanistan again have high numbers.

Number of Events per country (TOP 20)

Scatter Map for the Number of Incidents

The scatter plot on the world map shows the incidents that have occurred since 1970. As the map makes clear, Iraq, India, Pakistan, and Afghanistan are the most affected.

Scatter Map for the Number of Incidents

Heat Map as per the Number of Incidents

The heat map also shows the intensity of attacks/incidents. Among the most severely affected countries are India, Pakistan, Afghanistan, the Philippines, Iraq, Syria, and many countries in Europe.

Heat Map as per the Number of Incidents

Scatter Map for the Number of Deaths

The scatter plot depicts the number of deaths in individual attacks, with color indicating the magnitude. Red marks events with more than 50 deaths, blue marks events with between 10 and 50 deaths, and yellow marks events with fewer than 10.
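The color coding described above amounts to a simple binning of the death count. The sketch below is one way to write it; the function name `death_color` is hypothetical, and placing the boundary values 10 and 50 in the blue bin is an assumption, since the text does not say which side they fall on.

```python
def death_color(n_killed):
    """Map a death count to the color used on the scatter map."""
    if n_killed > 50:
        return "red"      # more than 50 deaths
    if n_killed >= 10:
        return "blue"     # 10 to 50 deaths (boundaries assumed inclusive)
    return "yellow"       # fewer than 10 deaths


for deaths in (0, 9, 10, 50, 51, 300):
    print(deaths, death_color(deaths))
```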

Scatter Map for the Number of Deaths

Heat Map as per the Number of Deaths

The color signifies the number of deaths, which increases as the color changes from yellow to blue to red.

Heat Map as per the Number of Deaths

1.2.3 - Text Attributes

For the text attributes, we generated wordclouds. Several variables were analysed; the most interesting findings are presented here:

Summary of the attack

Summary

Motive for the attack

Motive

weapdetail - Weapon detail

Weapon

target2 - target of the attack 2

target2

target3 - target of the attack 3

target3

Some other wordclouds were also computed:

Alternative text for describing the attack

Alternative text

propcomment - Property Damage Comments

Prop comment

Ransom note

Ransom note

addnotes - Additional notes

This field is used to capture additional relevant details about the attack.

Additional notes

Scite1 - First Source Citation

Scite1

Scite2 - Second Source Citation

Scite2

Scite3 - Third Source Citation

Scite3

corp2 - Name of Second Entity

corp2

corp3 - Name of Third Entity

corp3

1.2.4 - Attack Descriptives

Type of Attack

Bombing/Explosion is the most frequent type, followed by Armed Assault and Assassination.

Attack Types

Target of Attack

Private Citizens and Property are the most common targets. Military, Police and Governments are frequently targeted too.

Attack Targets

Is the attack part of a set of events

Is Multiple

Is the Responsibility Uncertain

For most attacks, the responsible group is known with certainty. Maybe we can predict the group in the cases where responsibility is uncertain.

Attack Responsibility Uncertain

Is it a Suicide Attack

Maybe we can predict whether the attacker will commit suicide.

Attack Is Suicide

Group responsible

The groups responsible for the most terrorism events are the Taliban, Shining Path (SL), Farabundo Marti National Liberation Front, and Islamic State of Iraq and the Levant.

Group Responsible

Value of Property damaged

Most events caused less than $1 million worth of damage.

Damage caused

  • 1 = Catastrophic (likely > $1 billion)
  • 2 = Major (likely > $1 million but < $1 billion)
  • 3 = Minor (likely < $1 million)
  • 4 = Unknown

Number of Attackers

Number of attackers

Number of people killed

Spikes in the data at round numbers like 50 and 100 are suspicious.

Number of people killed

Number of terrorists killed

Spikes in the data at round numbers like 50 and 100 are suspicious.

Number of terrorists killed
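One way to check for such round-number spikes is to compare the frequency of each multiple of 50 with the frequencies of its immediate neighbors. The sketch below is only an illustration of that idea: the function name, the `step` and `factor` parameters, and the neighbor-based criterion are all assumptions, not a method described by the team.

```python
from collections import Counter


def round_number_spikes(values, step=50, factor=2):
    """Return multiples of `step` that occur more than `factor` times
    as often as the average of their two neighboring values."""
    counts = Counter(values)
    spikes = []
    for value in sorted(counts):
        if value == 0 or value % step != 0:
            continue
        neighbor_avg = (counts.get(value - 1, 0) + counts.get(value + 1, 0)) / 2
        if counts[value] > factor * neighbor_avg:
            spikes.append(value)
    return spikes


# Hypothetical casualty counts with extra mass at 50 and 100.
killed = [49, 50, 50, 50, 50, 51, 99, 100, 100, 100, 7, 3]
print(round_number_spikes(killed))
```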

Number of terrorists captured

Terrorists are very rarely captured alive.

Number of terrorists captured

Success of the Attack

The criterion for success varies with the type of attack.

Is Successful

Preferred weapon

Dynamite/Explosives and Firearms are the most preferred.

Preferred weapon

Outcome of Hostage/Kidnapping situations

In most cases the hostages are released, but alarmingly, the second most frequent outcome is that the hostages are killed.

Outcome of hostage scenario

1.3 - Missing values

Major missing data: events for the year 1993 are not available.

Looking at the ratio of missing values per column, some columns have very high rates (>90%). However, these high rates occur in columns that only apply to very specific situations (e.g. 4th firearm used in the attack, outcome of hostage situation). This suggests that some attributes need to be merged for the analysis.
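The per-column missing ratio mentioned above can be computed as in the following sketch. It uses plain Python dictionaries rather than the team's actual tooling; the column names `attacktype` and `hostage_outcome` and the generated sample rows are hypothetical, and treating both `None` and the empty string as missing is an assumption.

```python
def missing_ratio(rows, columns):
    """Return, per column, the fraction of rows where the value is missing."""
    total = len(rows)
    ratios = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        ratios[col] = missing / total if total else 0.0
    return ratios


# Hypothetical sample: only 1 of 20 events involved hostages.
rows = [{"attacktype": "Bombing/Explosion", "hostage_outcome": None} for _ in range(19)]
rows.append({"attacktype": "Hostage Taking", "hostage_outcome": "Released"})

ratios = missing_ratio(rows, ["attacktype", "hostage_outcome"])
mostly_missing = [col for col, ratio in ratios.items() if ratio > 0.9]
print(ratios, mostly_missing)
```

Columns that end up in `mostly_missing` are candidates either for merging with related attributes or for analysis restricted to the rows where they apply.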

1.4 - Outliers

Given the nature and source of the data, the team's first appraisal was that this dataset contains no outliers. However, it could be interesting to find out whether any terrorist attacks deviate markedly from the rest.

1.5 - Insights and ideas

Related Variables:

  • Match countries with types of attacks
  • Match countries with weapons
  • Target with motive, target with type of attack

Clustering:

  • Cluster countries based on their attacks and the attacks' characteristics
  • Cluster terrorist groups based on their attacks
  • Cluster attacks; the oldest one in a cluster may have been an inspiration. Which characteristics mattered?
  • Cluster based on the targets

Possible prediction tasks:

  • Based on the information currently known about an attack, predict the group that carried it out.
  • Based on the information currently known about an attack, predict how many victims it will have.

Visualization:

  • Wordclouds
  • Timelapse of the incidents happening

Tasks:

  • For every attribute, check the number of available instances versus NAs and see whether the attribute is going to help us
  • Merge attributes: target2 and target3.
  • Add missing values for 1993 from other sources. Wikipedia has a list of events in 1993. Two major events in 1993: the World Trade Center bombing (6 dead, 1042 injured) and the Bombay serial bombing (257 dead, 713 injured). Unfortunately, the list will not be exhaustive.
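For the attribute-merging task in the list above, one possible approach is to collect the non-missing values of the sparse fields into a single list per event. This is a sketch of that idea only; how the merge should actually be done is still open, and the helper name `merge_targets` and the sample rows are hypothetical.

```python
def merge_targets(row, fields=("target2", "target3")):
    """Collect the non-missing values of the given fields into one list."""
    return [row[f] for f in fields if row.get(f) not in (None, "")]


# Hypothetical rows: most events have no second or third target.
rows = [
    {"target2": "Police patrol", "target3": None},
    {"target2": None, "target3": None},
    {"target2": "Market", "target3": "Bus station"},
]
for row in rows:
    row["extra_targets"] = merge_targets(row)
    print(row["extra_targets"])
```

A merged list column is missing far less often than `target3` on its own, since an empty list is a valid value rather than an NA.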

Other:

  • Relate incidents with events that happened during that time
  • Host kinds - distribution of dates?
  • Dates, do specific groups use a day/week every single time? Do we have a pattern?
  • Pattern in the dates of the terrorist attacks
  • See the weekdays
  • If an attack was unsuccessful, did they try again later?
  • Property damage: analyze it
  • See if the addnotes field contains useful information
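The weekday question in the list above needs a weekday derived from each event's year, month and day. A minimal sketch with the standard library follows; the function name is hypothetical, and returning `None` for incomplete or invalid dates (for example a day or month recorded as 0) is an assumption about how unknown dates might appear in the data.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_name(year, month, day):
    """Return the weekday name, or None if the date is incomplete or invalid."""
    try:
        return WEEKDAYS[date(year, month, day).weekday()]
    except ValueError:
        return None


# Hypothetical (year, month, day) triples; the last one has an unknown day.
dates = [(2016, 11, 16), (2000, 1, 1), (2016, 11, 23), (2014, 6, 0)]
per_weekday = Counter(weekday_name(*d) for d in dates)
print(per_weekday)
```

Grouping the same counter by responsible group would show whether specific groups favor particular days.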

1.6 - Presentation

https://docs.google.com/presentation/d/1XM7Byvb_pBvdXQxRDtKeBdcpYTeBTto2OkslM8JNZP4/edit?usp=sharing

1.7 - Perceived Feedback

  • In the analysis of the text features, it is important to consider the length of the text (for example, compute the average and standard deviation of the length of each text field)

  • Relate the occurrence of attacks to a list of historical events (is there any database with the most important political/military incidents?)

  • In the maps check on other variables:

    • add different colors to distinguish between attacks in different decades
    • with different weapons
    • with different groups
  • Among the prediction goals, we can also consider the time and place of a new attack.
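The text-length statistics suggested in the feedback above can be sketched as follows. The function name and sample texts are hypothetical; skipping missing values, measuring length in characters, and using the population standard deviation (`pstdev`) rather than the sample one are assumptions.

```python
from statistics import mean, pstdev


def length_stats(texts):
    """Return (mean, standard deviation) of the lengths of non-missing texts."""
    lengths = [len(text) for text in texts if text]
    if not lengths:
        return None
    return mean(lengths), pstdev(lengths)


# Hypothetical values of one text field; None and "" count as missing.
summaries = ["Bomb exploded near a market.", None, "", "Gunmen attacked a convoy."]
print(length_stats(summaries))
```

Running this per text attribute gives a quick indication of which fields carry enough text to be worth a wordcloud or further text mining.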