Home - Gabya06/GlobalTerrorism GitHub Wiki

Welcome to the GlobalTerrorism wiki!

## Steps taken to clean the data so far:

  1. Keep only the six countries of interest: Iraq, Pakistan, Afghanistan, India, Philippines and United States. Together these make up ~65% of the entire dataset.

  2. Remove columns in which more than half of the rows are NA.

  3. Remove columns whose names contain "_txt", since the same information is already coded numerically.

  4. Remove other columns that hold detailed location information or character data: "eventid", "provstate", "city", "latitude", "longitude", "specificity", "location", "summary", "targsubtype1", "motive", "weapdetail", "propcomment", "scite1", "scite2", "dbsource"

  5. Of the columns that include international information, remove those where more than half the rows are missing or incomplete: "INT_LOG", "INT_IDEO", "INT_ANY"

  6. Assign numeric values to gname (perpetrator group name), corp1 (the corporate entity or government agency that was targeted) and target1 (the specific person, building or installation targeted), then remove the columns with the corresponding character information.

  7. Rearrange the columns so that the label, "Success", is at the far right.
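
The steps above can be sketched in base R roughly as follows. This is only an illustration on an invented toy data frame, not the code used for this project: the data values, the variable names `gtd`, `top6`, `na_share` and `drop_cols`, and the lowercase label column `success` are all assumptions. Step 5 is omitted because it would be handled the same way as step 4.

```r
# Toy stand-in for the real data; every value here is invented.
gtd <- data.frame(
  eventid     = 1:8,
  country_txt = c("Iraq", "India", "France", "Pakistan",
                  "Iraq", "United States", "Peru", "Afghanistan"),
  country     = c(95, 92, 69, 153, 95, 217, 159, 4),
  success     = c(1, 1, 0, 1, 0, 1, 1, 1),
  gname       = c("A", "B", "A", "C", "B", "A", "C", "B"),
  nkill       = c(3, NA, 0, 2, 1, 0, NA, 5),
  ransom      = c(NA, NA, NA, NA, NA, 1, NA, NA),
  city        = c("x", "y", "z", "x", "y", "z", "x", "y"),
  stringsAsFactors = FALSE
)

# Step 1: keep only the six countries of interest.
top6 <- c("Iraq", "Pakistan", "Afghanistan", "India",
          "Philippines", "United States")
gtd <- gtd[gtd$country_txt %in% top6, ]

# Step 2: drop columns where more than half of the rows are NA.
na_share <- colMeans(is.na(gtd))
gtd <- gtd[, na_share <= 0.5, drop = FALSE]

# Step 3: drop the "_txt" columns (their numeric counterparts are kept).
gtd <- gtd[, !grepl("_txt", names(gtd), fixed = TRUE), drop = FALSE]

# Step 4: drop identifier, location and free-text columns (shortened list).
drop_cols <- c("eventid", "city")
gtd <- gtd[, !(names(gtd) %in% drop_cols), drop = FALSE]

# Step 6: replace a character column with a numeric index.
gtd$gname.index <- as.integer(factor(gtd$gname))
gtd$gname <- NULL

# Step 7: move the label column to the far right.
gtd <- gtd[, c(setdiff(names(gtd), "success"), "success")]
```

Using `factor()` for step 6 assigns indices in alphabetical order of the group names; any other stable mapping would serve equally well.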

## Perform subset selection using regsubsets from the leaps package:

  1. Using 20 features:

Feature selection based on the highest adjusted R-squared:

`coef(reg.model, max.adjR)`

| Term | Coefficient |
| --- | --- |
| (Intercept) | 1.45E+01 |
| iyear | -6.80E-03 |
| imonth | -1.28E-03 |
| extended | -4.36E-02 |
| country | -2.12E-04 |
| region | 1.35E-02 |
| doubtterr | 3.85E-02 |
| multiple | 1.46E-02 |
| suicide | -5.73E-02 |
| attacktype1 | 5.95E-02 |
| claimed | 1.41E-02 |
| weaptype1 | -2.57E-02 |
| weapsubtype1 | -3.59E-03 |
| nkill | 5.94E-03 |
| nkillus | 2.24E-02 |
| nwound | -3.73E-04 |
| property | -3.17E-03 |
| ishostkid | -3.21E-02 |
| corp.index | -8.39E-06 |
| gname.index | 1.55E-04 |
| target1.index | -2.16E-06 |
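
For readers who want to reproduce this kind of output, here is a minimal sketch of how `reg.model` and `max.adjR` might be obtained. It assumes the `leaps` package is installed and runs on simulated data: the data frame `toy`, its columns `x1` to `x4`, and the helper name `reg.summary` are hypothetical, and `nvmax = 4` stands in for the 20 features used here.

```r
library(leaps)

# Simulated data: only x1 and x2 actually influence the response.
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
toy$success <- 0.5 + 0.8 * toy$x1 - 0.6 * toy$x2 + rnorm(n, sd = 0.5)

# Best-subset search over models with up to nvmax predictors.
reg.model <- regsubsets(success ~ ., data = toy, nvmax = 4)
reg.summary <- summary(reg.model)

# Model size with the highest adjusted R-squared.
max.adjR <- which.max(reg.summary$adjr2)

# Coefficients of the best model of that size.
coef(reg.model, max.adjR)
```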

AIC returns the same features; BIC, however, returns these:

| Term | Coefficient |
| --- | --- |
| (Intercept) | 1.33E+01 |
| iyear | -6.18E-03 |
| country | -2.14E-04 |
| region | 1.32E-02 |
| doubtterr | 3.97E-02 |
| suicide | -5.74E-02 |
| attacktype1 | 5.79E-02 |
| claimed | 1.47E-02 |
| weaptype1 | -2.38E-02 |
| weapsubtype1 | -3.56E-03 |
| nkill | 5.29E-03 |
| property | -3.17E-03 |
| ishostkid | -4.59E-02 |
| corp.index | -8.74E-06 |
| gname.index | 1.65E-04 |
| target1.index | -2.23E-06 |

Compared with the adjusted R-squared selection, BIC leaves out five features: imonth, extended, multiple, nkillus and nwound.
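
A comparison like this can be made programmatically. The sketch below is an illustration on simulated data, assuming the `leaps` package is installed; the names `toy`, `fit`, `s`, `best`, `feats_adjr2` and `feats_bic` are hypothetical. It picks the best model size under each criterion reported by `summary()` and lists the terms that the adjusted R-squared model keeps but the BIC model leaves out.

```r
library(leaps)

# Simulated data: x1 and x2 matter, x3 barely, x4 to x6 not at all.
set.seed(2)
n <- 300
toy <- data.frame(matrix(rnorm(n * 6), ncol = 6))
names(toy) <- paste0("x", 1:6)
toy$success <- 1 + 0.9 * toy$x1 - 0.7 * toy$x2 + 0.1 * toy$x3 + rnorm(n)

fit <- regsubsets(success ~ ., data = toy, nvmax = 6)
s <- summary(fit)

# Best model size under each criterion:
# adjusted R-squared is maximised, Cp and BIC are minimised.
best <- c(adjr2 = which.max(s$adjr2),
          cp    = which.min(s$cp),
          bic   = which.min(s$bic))

# Terms kept under each criterion.
feats_adjr2 <- names(coef(fit, best[["adjr2"]]))
feats_bic   <- names(coef(fit, best[["bic"]]))

# Terms in the adjusted R-squared model that BIC leaves out.
setdiff(feats_adjr2, feats_bic)
```

BIC penalises each extra term more heavily than adjusted R-squared does, which is why it tends to settle on a smaller model.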

Adjusted R2 BIC Cp