Week 01 (W46 Nov16) Terrorism Database - Rostlab/DM_CS_WS_2016-17 GitHub Wiki
0 - Summary & index:
During this week, the team focused on exploratory analysis of the data. We set up a private GitHub repository and used R and Python for the analysis. Some of the analysis was possible in R, but it proved to be time-inefficient. We computed frequencies, plots and wordclouds, which gave us material for brainstorming. The three main findings of the week, followed by one planned next step, are:
- Terrorist attacks have increased in recent years, or at least reports of them have.
- A large number of terrorist attacks occurred in the Middle East region.
- The groups responsible for the most terrorism events are the Taliban, Shining Path (SL), the Farabundo Marti National Liberation Front, and the Islamic State of Iraq and the Levant.
- To address some of the missing values, we are going to merge related attributes.
Wiki status - concluded
Index
- 1 - Weekly work
  - 1.1 - Dataset organization & description
    - 1.1.1 - Dataset summary
    - 1.1.2 - Dataset description
  - 1.2 - Descriptive analysis of the instances and features
    - 1.2.1 - Numeric Attributes
    - 1.2.2 - Geo-referenced Attributes
    - 1.2.3 - Text Attributes
  - 1.3 - Missing values
  - 1.4 - Outliers
  - 1.5 - Insights and ideas
  - 1.6 - Presentation
  - 1.7 - Perceived Feedback
1 - Weekly work
1.1 - Dataset organization & description
1.1.1 - Dataset summary
The Global Terrorism Database provides a set of instances that describe terrorist attacks worldwide from 1970 until 2015. The topic is very popular nowadays because of the terrorist attacks of recent years. Although it is a hard task, predicting the occurrence of terrorist attacks is the most compelling question that this dataset can provide insight into.
1.1.2 - Dataset description
- Size: 73.9 MB
- Attributes: 137
- Rows: 156 773
- Format: xlsx
- Incidents from 1970 to 2015
1.2 - Descriptive analysis of the instances and features
To better understand the dataset, we collected some basic metrics. We focused on frequencies and plots for the main variables.
1.2.1 - Numeric Attributes
Year of the attacks
The year with the most terrorism-related events is 2014, not 2015. Hopefully the downward trend continues.
Month of the attacks
December is the most peaceful month of the year. Its incident count is lower than that of February.
Month day of the attacks
The 1st and the 15th of the month are frequent dates for attacks.
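Frequencies like the ones above could be computed along the following lines. This is only a minimal sketch, not the code used for the plots: the `incidents` sample is made up for illustration, and the real values would come from the year, month and day columns of the dataset.

```python
from collections import Counter

# Hypothetical sample of incidents as (year, month, day) tuples.
# The real values would be read from the dataset's date columns.
incidents = [
    (2014, 1, 1),
    (2014, 1, 15),
    (2014, 6, 15),
    (2015, 2, 1),
    (2015, 12, 24),
]


def frequency(records, position):
    """Count how often each value occurs at the given tuple position."""
    return Counter(record[position] for record in records)


by_year = frequency(incidents, 0)
by_month = frequency(incidents, 1)
by_day = frequency(incidents, 2)

# most_common(n) lists the n values with the highest counts.
print(by_year.most_common(1))
print(by_day.most_common(2))
```

The same helper works for any other categorical attribute, as long as the records are reduced to the value of interest first.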
1.2.2 - Geo-referenced Attributes
Country of the attack
Iraq has more terror incidents than any other country. Pakistan, India and Afghanistan complete the top 4.
Region of the attack
There have been more incidents of terror in the Middle East and North Africa than in the Indian Subcontinent.
Nationality of targets
Iraq, Pakistan, India and Afghanistan again have high numbers.
Scatter Map for the Number of Incidents
The scatter plot on the world map depicts the number of incidents that have occurred since 1970. As the map shows, Iraq, India, Pakistan, and Afghanistan are the most affected.
Heat Map as per the Number of Incidents
The heat map also shows the intensity of the attacks/incidents. Among the most severely affected countries are India, Pakistan, Afghanistan, the Philippines, Iraq, Syria, and many countries in Europe.
Scatter Map for the Number of Deaths
The scatter plot depicts the number of deaths in individual attacks. The color encodes the number of deaths in an event: red marks events with more than 50 deaths, blue marks events with between 10 and 50 deaths, and yellow marks events with fewer than 10 deaths.
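The color coding described above can be sketched as a small binning function. This is an illustration only; in particular, it assumes that the boundary values 10 and 50 fall into the blue bin, which the description does not state explicitly.

```python
def death_color(deaths):
    """Map the number of deaths in one event to a marker color.

    Assumption: events with exactly 10 or exactly 50 deaths are blue.
    """
    if deaths > 50:
        return "red"
    if deaths >= 10:
        return "blue"
    return "yellow"


# Hypothetical death counts for a handful of events.
for deaths in (0, 9, 10, 50, 51):
    print(deaths, death_color(deaths))
```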
Heat Map as per the Number of Deaths
The color signifies the number of deaths: the number increases as the color changes from yellow to blue to red.
1.2.3 - Text Attributes
For the text attributes, wordclouds were generated. We analysed several variables; the most interesting results are presented here:
Summary of the attack
Motive for the attack
weapdetail - Weapon detail
target2 - target of the attack 2
target3 - target of the attack 3
Some other wordclouds were also computed:
Alternative text for describing the attack
propcomment - Property Damage Comments
Ransom note
addnotes - Additional notes
This field is used to capture additional relevant details about the attack.
Scite1 - First Source Citation
Scite2 - Second Source Citation
Scite3 - Third Source Citation
corp2 - Name of Second Entity
corp3 - Name of Third Entity
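A wordcloud is essentially a plot of word frequencies, so the counting step behind the clouds listed above can be sketched as follows. This is a minimal illustration under stated assumptions: the tiny `STOPWORDS` set and the sample `summaries` are made up, and the rendering of the cloud itself is left out.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "was", "were", "by", "near"}


def word_frequencies(texts):
    """Count lowercase words across all texts, skipping missing values."""
    counts = Counter()
    for text in texts:
        if not text:
            continue  # missing or empty field
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts


# Hypothetical attack summaries, including one missing value.
summaries = [
    "Assailants detonated a bomb in the market.",
    "A bomb exploded near the police station.",
    None,
]
freq = word_frequencies(summaries)
print(freq.most_common(3))
```

The resulting counts can be handed to any wordcloud renderer, with the word size proportional to the count.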
1.2.4 - Attack Descriptives
Type of Attack
'Bombing/Explosion' is the most frequent type, followed by 'Armed Assault' and 'Assassination'.
Target of Attack
Private Citizens and Property are the most common targets. Military, Police and Government are also targeted frequently.
Is the attack part of a set of events
Is the Responsibility Uncertain
For most events, the responsibility of the group is certain. Maybe we can predict the responsible group in the cases where it is not.
Is it a Suicide Attack
Maybe we can predict whether the attacker will commit suicide.
Group responsible
The groups responsible for the most terrorism events are the Taliban, Shining Path (SL), the Farabundo Marti National Liberation Front, and the Islamic State of Iraq and the Levant.
Value of Property damaged
Most events caused less than $1 million worth of damage. The attribute is coded as follows:
- 1 = Catastrophic (likely > $1 billion)
- 2 = Major (likely > $1 million but < $1 billion)
- 3 = Minor (likely < $1 million)
- 4 = Unknown
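To make frequency tables of this attribute readable, the numeric codes can be translated into their labels before counting. The sketch below uses the coding given above; the `sample_codes` list is made up, and treating any value outside 1 to 4 as missing is an assumption.

```python
from collections import Counter

# Labels taken from the coding of the property damage attribute.
PROPERTY_DAMAGE_LABELS = {
    1: "Catastrophic (likely > $1 billion)",
    2: "Major (likely > $1 million but < $1 billion)",
    3: "Minor (likely < $1 million)",
    4: "Unknown",
}


def label_counts(codes):
    """Count events per damage label, ignoring values outside the coding."""
    counts = Counter(code for code in codes if code in PROPERTY_DAMAGE_LABELS)
    return {PROPERTY_DAMAGE_LABELS[code]: n for code, n in counts.items()}


# Hypothetical codes for seven events, one of them missing.
sample_codes = [3, 3, 4, 2, 3, None, 1]
damage_counts = label_counts(sample_codes)
for label, n in damage_counts.items():
    print(f"{label}: {n}")
```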
Number of Attackers
Number of people killed
Spikes in the data at round numbers like 50 and 100 are suspicious.
Number of terrorists killed
Spikes in the data at round numbers like 50 and 100 are suspicious.
Number of terrorists captured
Terrorists are very rarely captured alive.
Success of the Attack
Based on the type of attack, the criterion for success varies.
Preferred weapon
Dynamite/Explosives and Firearms are most preferred.
Outcome of Hostage/Kidnapping situations
In most cases the hostages are released, but alarmingly, the second most frequent outcome is that the hostages are killed.
1.3 - Missing values
Major missing data: events for the year 1993 are not available.
Regarding the ratios of missing values in the dataset's columns, some columns had high rates of missing values (>90%). However, these high rates occurred in columns that only apply to very specific situations (e.g. 4th firearm used in the attack, outcome of hostage situation). This suggests that some related columns need to be merged for the analysis.
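The per-column missing ratios can be computed along these lines. This is a minimal sketch rather than the code used for the numbers above: the column names and rows are made up, and it assumes that both `None` and the empty string count as missing.

```python
def missing_ratios(rows, columns):
    """Return the share of missing values per column, from 0.0 to 1.0."""
    total = len(rows)
    ratios = {}
    for column in columns:
        missing = sum(1 for row in rows if row.get(column) in (None, ""))
        ratios[column] = missing / total if total else 0.0
    return ratios


# Hypothetical rows; the column names are illustrative only.
rows = [
    {"weapon_1": "Firearm", "weapon_4": None, "hostage_outcome": None},
    {"weapon_1": "Explosive", "weapon_4": None, "hostage_outcome": "Released"},
    {"weapon_1": "Firearm", "weapon_4": "", "hostage_outcome": None},
    {"weapon_1": None, "weapon_4": None, "hostage_outcome": None},
]
ratios = missing_ratios(rows, ["weapon_1", "weapon_4", "hostage_outcome"])

# Columns above the 90% threshold mentioned in the text.
sparse_columns = [column for column, ratio in ratios.items() if ratio > 0.9]
print(ratios)
print(sparse_columns)
```

Columns that end up in `sparse_columns` are candidates for merging with related columns rather than for outright removal.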
1.4 - Outliers
Given the nature and source of the data, the team considered, as a first appraisal, that there are no outliers in this dataset. However, it could be interesting to find out whether any terrorist attacks deviate markedly from the rest of the attacks.
1.5 - Insights and ideas
Related Variables:
- Match countries with types of attacks
- Match countries with weapons
- Target with motive, target with type of attack
Clustering:
- Cluster countries based on their attack and its characteristics
- Cluster terrorist groups based on their attacks
- Cluster attacks; the oldest attack in a cluster may have been an inspiration for the others. Which characteristics mattered?
- Clustering based on the targets
Possible prediction tasks:
- Based on the information currently known about an attack, predict the group that carried it out.
- Based on the information currently known about an attack, predict how many victims it will have.
Visualization:
- Wordclouds
- Timelapse of the incidents happening
Tasks:
- For every attribute, check the number of missing values (NAs) and decide whether the attribute is going to help us
- Merge attributes: target2 and target3.
- Add missing values for 1993 from other sources. Wikipedia has a list of events in 1993. Two major events in 1993 were the World Trade Center bombing (6 dead, 1042 injured) and the Bombay serial bombings (257 dead, 713 injured). Unfortunately, the list will not be exhaustive.
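The merge of `target2` and `target3` mentioned in the tasks could look like the sketch below. It is one possible approach under stated assumptions: the merged representation (a list of the non-missing, de-duplicated values) and the sample rows are our illustration, not a decision already taken.

```python
def merge_targets(row, columns=("target2", "target3")):
    """Collect the non-missing target values of one incident into a list."""
    merged = []
    for column in columns:
        value = row.get(column)
        # Skip missing values and avoid listing the same target twice.
        if value not in (None, "") and value not in merged:
            merged.append(value)
    return merged


# Hypothetical incidents with sparsely filled target columns.
print(merge_targets({"target2": "Police station", "target3": None}))
print(merge_targets({"target2": None, "target3": None}))
print(merge_targets({"target2": "Market", "target3": "Market"}))
```

A single merged column has far fewer missing entries than `target3` alone, at the cost of losing the original ordering of the targets.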
Other:
- Relate incidents with events that happened during that time
- Host kinds - distribution of dates?
- Dates, do specific groups use a day/week every single time? Do we have a pattern?
- Pattern in the dates of the terrorist attacks
- See the weekdays
- If an attack was unsuccessful, did they try again later?
- Property damage: analyze it
- See if the add-notes has information
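For the weekday idea in the list above, the weekday has to be derived from the year, month and day of each incident. The following is a minimal sketch with made-up dates; it assumes that an unknown month or day is stored as 0, so such dates are skipped.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_name(year, month, day):
    """Return the weekday name, or None if the date is incomplete or invalid."""
    try:
        return WEEKDAYS[date(year, month, day).weekday()]
    except (ValueError, TypeError):
        return None


# Hypothetical incident dates; the last one has an unknown month and day.
dates = [(2014, 1, 1), (2014, 1, 8), (2014, 1, 15), (2014, 1, 3), (2014, 0, 0)]
names = [weekday_name(y, m, d) for (y, m, d) in dates]
weekday_counts = Counter(name for name in names if name is not None)
print(weekday_counts.most_common())
```

Grouping the same counts by responsible group would show whether specific groups prefer a particular weekday.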
1.6 - Presentation
https://docs.google.com/presentation/d/1XM7Byvb_pBvdXQxRDtKeBdcpYTeBTto2OkslM8JNZP4/edit?usp=sharing
1.7 - Perceived Feedback
- In the analysis of the text features, it is important to consider the length of the text (for example, compute the average and standard deviation of the length of the text fields).
- Relate the occurrence of attacks to a list of historical events (is there any database with the most important political/military incidents?).
- In the maps, check other variables:
  - add different colors to distinguish between attacks in different decades
  - with different weapons
  - with different groups
- In the goals of prediction, we can also consider the time and place of a new attack.
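The feedback on text length can be addressed with a small helper such as the one below. It is a sketch under stated assumptions: length is measured in characters, missing texts are ignored, the population standard deviation is used, and the sample texts are made up.

```python
from statistics import mean, pstdev


def length_stats(texts):
    """Return (mean, standard deviation) of the text lengths in characters.

    Missing or empty texts are ignored. Returns None if nothing is left.
    """
    lengths = [len(text) for text in texts if text]
    if not lengths:
        return None
    return mean(lengths), pstdev(lengths)


# Hypothetical values of one text field, including a missing entry.
field_values = ["abcd", "ab", None, "abcdef"]
average_length, std_length = length_stats(field_values)
print(average_length, std_length)
```

Using `statistics.stdev` instead of `pstdev` would give the sample standard deviation, which is slightly larger for small fields.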