Dataset Global Terrorism Database - Rostlab/DM_CS_WS_2016-17 GitHub Wiki
Dataset Global Terrorism Database
- Proposer: @paulafortuna
- Final Team:
  - @paulafortuna
  - @msandim
  - @kapoorabhishek24
  - @avradips
- Votes: 1. @brosequartz, 2. @msandim, 3. @avradips, 4. @vivek-sethia, 5. @kapoorabhishek24, 6. @muhammadasad1, 7. @ashwarypande, 8. @ishaanraj
Summary
The Global Terrorism Database provides a set of instances and features that describe terrorist attacks worldwide from 1970 to 2015. The topic has drawn wide attention because of the attacks of recent years. Although it is a hard task, predicting the occurrence of terrorist attacks is the most compelling question this dataset can shed light on.
Prediction Goals
- Describe terrorism and terrorist attacks (locations, time, type of attack, terrorist groups)
- Predict terrorist attacks (location, time, type of attack)
Weekly Progress
- Week 01 (W46-Nov16) Terrorism Database -- Main findings:
  - Terrorist attacks have increased in recent years, or at least their reporting has.
  - A large share of terrorist attacks occurred in the Middle East region.
  - The groups responsible for the most terrorism events are the Taliban, Shining Path (SL), the Farabundo Marti National Liberation Front, and the Islamic State of Iraq and the Levant.
  - To handle some of the missing values, we are going to merge related attributes.
- Week 02 (W47-Nov23) Terrorism Database -- Main findings:
  - The locations of terrorist attacks shift over time.
  - There is a large number of terrorist groups, and their activity changes over time.
  - Around 40 features have a substantial number of missing values.
- Week 03 (W48-Nov30) Terrorism Database -- Main findings:
  - The targets of terrorism differ between parts of the globe.
  - Some instances in our dataset are flagged as uncertain with respect to being a terrorist attack.
- Week 04 05 (W49 50 Dec14) Terrorism Database -- Main findings:
  - Around half of the attacks have no identified perpetrator.
  - It is possible to cluster the terrorist groups based on target type, attack type, and weapons used.
  - In the coming weeks, we should start on predictive analysis tasks.
- Week 06 (W51 Dec21) Terrorism Database -- Main findings:
  - We started our prediction phase and discussed a few of the topics that we want to consider.
- Week 07 (W02 Jan11) Terrorism Database -- Main findings:
  - A preliminary experiment in predicting the number of victims achieved good results.
  - In the future it is also important to know how to predict the terrorist group.
- Week 08 (W03 Jan18) Terrorism Database -- Main findings:
  - For our first prediction task, we provide a model that predicts with good accuracy whether a terrorist attack will cause fatalities. However, predicting the precise number of victims is a more challenging task.
  - For our second prediction task, creating a model for each region proved to be a good strategy. The resulting models reach an accuracy of around 0.80.
  - Relating terrorism to demographic data is a more difficult prediction task. We are still searching for a demographic dataset that fits our Terrorism Dataset.
- Week 09 (W04 Jan25) Terrorism Database -- Main findings:
  - We were able to improve the prediction of the number of kills; however, we are still far from getting a good model.
  - The terrorist group prediction achieved good accuracy, even when distinguishing between many possible terrorist groups in some regions.
  - We found a dataset of demographic data that can help us in the prediction tasks.
- Week 10 (W05 Feb01) Terrorism Database -- Main findings:
  - We were able to achieve good results in the terrorist group prediction task, and from our model we concluded that 50% of the attacks in the Middle East may have been conducted by ISIS and Al-Qaida.
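The "one model per region" strategy from Week 08 can be illustrated with a minimal sketch. The wiki does not say which learning algorithm the team used, so a simple frequency table stands in for the real model here. The field names (`region`, `attack_type`, `target_type`, `group`) and the toy rows are assumptions made for illustration, not GTD columns or GTD data.

```python
from collections import Counter, defaultdict


def fit_region_models(rows):
    """Fit one frequency-table model per region.

    Each regional model maps an (attack type, target type) pair to a
    Counter of the group names observed with that pair in the region.
    """
    models = defaultdict(lambda: defaultdict(Counter))
    for row in rows:
        key = (row["attack_type"], row["target_type"])
        models[row["region"]][key][row["group"]] += 1
    return models


def predict_group(models, region, attack_type, target_type):
    """Return the most frequent group for these features in the region.

    Returns None when the region or the feature pair was never seen.
    """
    region_model = models.get(region)
    if region_model is None:
        return None
    counts = region_model.get((attack_type, target_type))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy rows with placeholder names (hypothetical, not taken from the GTD).
toy_rows = [
    {"region": "Region X", "attack_type": "Bombing", "target_type": "Police", "group": "Group A"},
    {"region": "Region X", "attack_type": "Bombing", "target_type": "Police", "group": "Group A"},
    {"region": "Region X", "attack_type": "Bombing", "target_type": "Police", "group": "Group B"},
    {"region": "Region Y", "attack_type": "Bombing", "target_type": "Police", "group": "Group C"},
]

models = fit_region_models(toy_rows)
prediction_x = predict_group(models, "Region X", "Bombing", "Police")
prediction_y = predict_group(models, "Region Y", "Bombing", "Police")
```

The same features lead to different predictions in different regions, which is the point of splitting the model by region. A real experiment would swap the frequency table for a proper classifier and evaluate it on held-out data.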
Long Description
1 - Dataset Description
- Size: 73.9 MB
- Attributes: 120
- Rows: 156773
- Format: xlsx
- Incidents from 1970 to 2015
- Instance Example:
  - Place: Germany, Hanover
  - Description: Assailants set fire to refugee housing near Hanover, Lower Saxony state, Germany. There were no reported casualties in the attack. No group claimed responsibility for the incident.
  - Latitude, Longitude: 52.375892, 9.73201
2 - Attributes categories
- Incident Date
- Region
- Country
- State/Province
- City
- Latitude and Longitude (beta)
- Perpetrator Group Name
- Tactic used in attack
- Nature of the target (type and sub-type, up to three targets)
- Identity, corporation, and nationality of the target (up to three nationalities)
- Type of weapons used (type and sub-type, up to three weapons types)
- Whether the incident was considered a success
- If and how a claim(s) of responsibility was made
- Amount of damage, and more narrowly, the amount of United States damage
- Total number of fatalities (persons, United States nationals, terrorists)
- Total number of injured (persons, United States nationals, terrorists)
- Indication of whether the attack is international or domestic
The attributes span several data types: categories, text, coordinates, booleans, numeric values, and timestamps.
A detailed description is given in the GTD Codebook [2] (p. 14).
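To make the mix of data types concrete, here is a minimal sketch of how a single incident might be represented as a typed record. The class and field names are hypothetical and chosen for readability. They are not the column names defined in the Codebook, and only a handful of the 120 attributes are shown. The example values come from the instance example in the dataset description; fields that the example does not state are left unset.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Incident:
    """Illustrative record covering the data types listed above."""

    country: str                           # category
    city: str                              # category
    summary: str                           # free text
    latitude: Optional[float] = None       # coordinate
    longitude: Optional[float] = None      # coordinate
    incident_date: Optional[date] = None   # timestamp
    group_name: Optional[str] = None       # category, often unknown
    success: Optional[bool] = None         # boolean
    fatalities: Optional[int] = None       # numeric

    def has_coordinates(self) -> bool:
        """True when both latitude and longitude are present."""
        return self.latitude is not None and self.longitude is not None


# The instance example from the dataset description; the date, success flag,
# and fatality count are not given there, so they stay None.
example = Incident(
    country="Germany",
    city="Hanover",
    summary="Assailants set fire to refugee housing near Hanover, "
            "Lower Saxony state, Germany.",
    latitude=52.375892,
    longitude=9.73201,
)
```

Making almost every field optional reflects the missing values reported in the weekly findings.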
Dataset collector
The Global Terrorism Database (GTD) is maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) [1].
Data Quality
The main attributes (date, location, and summary) are complete in more than 90,000 instances.
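A completeness figure like this one can be computed by counting the rows in which every main attribute is present. The sketch below shows one way to do that, together with a small helper that merges related attributes by taking the first non-missing value, in the spirit of the Week 01 plan for handling missing values. The attribute names, the definition of "missing" (None or an empty string), and the toy rows are all assumptions for illustration. The wiki does not state how the team actually computed this number or which attributes they merged.

```python
MAIN_ATTRIBUTES = ("date", "location", "summary")  # hypothetical column names


def is_missing(value):
    """Treat None and blank strings as missing (an assumed convention)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def count_complete(rows, attributes=MAIN_ATTRIBUTES):
    """Count the rows in which every listed attribute is present."""
    return sum(
        all(not is_missing(row.get(name)) for name in attributes)
        for row in rows
    )


def coalesce(row, candidates):
    """Return the first non-missing value among related attributes, else None."""
    for name in candidates:
        value = row.get(name)
        if not is_missing(value):
            return value
    return None


# Toy rows with placeholder values (not taken from the GTD).
toy_rows = [
    {"date": "2015-01-01", "location": "City A", "summary": "Toy summary."},
    {"date": "2015-01-02", "location": "", "summary": "Toy summary."},
    {"date": None, "location": "City B", "summary": "Toy summary."},
]

complete = count_complete(toy_rows)

# Fall back from the most specific place name to a coarser one.
place = coalesce(
    {"city": "", "province": "Province P", "country": "Country C"},
    ("city", "province", "country"),
)
```

Only the first toy row has all three main attributes, so it is the only one counted as complete.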