Analysis on oil price effects on Alberta employment - adamlee001/analytics GitHub Wiki

Analysis on oil price effects on Alberta employment

Using R

<Read on Google doc> It is 2016. A lot of people seem to be in agreement that Alberta is suffering economically due to the collapse of oil price since mid-2015.

Using public data for oil prices and Alberta unemployment rates, we can try to analyze if the above sentiment holds.

Null hypothesis: Oil price does not influence employment rate in Alberta Alternative hypothesis: Oil price has influence on employment rate in Alberta

Unemployment data is downloaded in CSV format from http://economicdashboard.albertacanada.com/Unemployment. This actually contains much more than just Alberta data, however. So, we will just grab the first 4 columns of the data frame:

unemp <- read.csv("Unemployment_Rate.csv",header=T,as.is=T) View(unemp) albertaUnemploymentRate <- unemp[,1:4] View(albertaUnemploymentRate)

str(albertaUnemploymentRate) 'data.frame': 2880 obs. of 4 variables: $ When : chr "01/01/1976" "01/01/1976" "01/01/1976" "01/01/1976" ... $ AgeGroup: chr "15 years and over" "15 years and over" "15 to 24 years" "15 to 24 years" ... $ Sex : chr "Males" "Females" "Males" "Both sexes" ... $ Alberta : num 3.8 5.2 8 8.1 8.3 4.3 3.7 6.9 6.8 6.8 ...

head(albertaUnemploymentRate) When AgeGroup Sex Alberta 1 01/01/1976 15 years and over Males 3.8 2 01/01/1976 15 years and over Females 5.2 3 01/01/1976 15 to 24 years Males 8.0 4 01/01/1976 15 to 24 years Both sexes 8.1 5 01/01/1976 15 to 24 years Females 8.3 6 01/01/1976 15 years and over Both sexes 4.3 tail(albertaUnemploymentRate) When AgeGroup Sex Alberta 2875 12/01/2015 15 years and over Both sexes 7.0 2876 12/01/2015 15 years and over Males 7.9 2877 12/01/2015 15 to 24 years Both sexes 11.1 2878 12/01/2015 15 years and over Females 5.8 2879 12/01/2015 15 to 24 years Females 8.2 2880 12/01/2015 15 to 24 years Males 13.6

Data is a bit too detailed. Let’s see if we can get the average for each month. This means we ignore the age groups and genders.

a <- albertaUnemploymentRate

by1 <- a$When fby1 <- factor(by1,exclude = "") fby1 [1] 01/01/1976 01/01/1976 01/01/1976 01/01/1976 01/01/1976 01/01/1976 [7] 02/01/1976 02/01/1976 02/01/1976 02/01/1976 02/01/1976 02/01/1976 …

We drop columns AgeGroup and Sex:

aa <- a[,-2:-3] head(aa) When Alberta 1 01/01/1976 3.8 2 01/01/1976 5.2 3 01/01/1976 8.0 4 01/01/1976 8.1 5 01/01/1976 8.3 6 01/01/1976 4.3

Apply average to field Alberta grouped by column When:

aaa <- aggregate(aa$Alberta, by=list(fby1), FUN = "mean") head(aaa) Group.1 x 1 01/01/1976 6.283333 2 01/01/1977 6.566667 3 01/01/1978 6.283333 4 01/01/1979 5.600000 5 01/01/1980 5.833333 6 01/01/1981 5.350000

class(aaa) [1] "data.frame" ?names names(aaa) [1] "Group.1" "x"
names(aaa) <- c("Date","Alberta")

names(aaa) [1] "Date" "Alberta" head(aaa) Date Alberta 1 01/01/1976 6.283333 2 01/01/1977 6.566667 3 01/01/1978 6.283333 4 01/01/1979 5.600000 5 01/01/1980 5.833333 6 01/01/1981 5.350000

Now, we translate the Date column as actual dates:

str(aaa) 'data.frame': 480 obs. of 2 variables: $ Date : Factor w/ 480 levels "01/01/1976","01/01/1977",..: 1 2 3 4 5 6 7 8 9 10 ... $ Alberta: num 6.28 6.57 6.28 5.6 5.83 ...

Well done!

Dates are still Factor. Let’s now convert them to real dates. Preserver aaa by work on a copy aaaa. aaaa <- aaa

str(aaaa) 'data.frame': 480 obs. of 2 variables: $ Date : Factor w/ 480 levels "01/01/1976","01/01/1977",..: 1 2 3 4 5 6 7 8 9 10 ... $ Albeta: num 6.28 6.57 6.28 5.6 5.83 ... aaaa$Date <- strptime(aaaa$Date,"%m/%d/%Y") str(aaaa) 'data.frame': 480 obs. of 2 variables: $ Date : POSIXlt, format: "1976-01-01" "1977-01-01" ... $ Albeta: num 6.28 6.57 6.28 5.6 5.83 ...

The records as they are now:

head(aaaa) Date Albeta 1 1976-01-01 6.283333 2 1977-01-01 6.566667 3 1978-01-01 6.283333 4 1979-01-01 5.600000 5 1980-01-01 5.833333 6 1981-01-01 5.350000

The dates are not properly ordered. We need to sort the data frame by this column. We will put the good data in a new data frame, just in case.

alberta <- aaaa[order(aaaa$Date),] head(alberta) Date Albeta 1 1976-01-01 6.283333 41 1976-02-01 5.316667 81 1976-03-01 5.883333 121 1976-04-01 4.883333 161 1976-05-01 5.916667 201 1976-06-01 5.566667

Now the dates are properly sorted.

The field name Date rather gets in the way as Date also implies a Date object class. Let’s rename the columns as follows:

names(alberta) <- c("When","Alberta")

str(alberta) 'data.frame': 480 obs. of 2 variables: $ When : POSIXlt, format: "1976-01-01" "1976-02-01" ... $ Alberta: num 6.28 5.32 5.88 4.88 5.92 ...

We still have too much information. We only want data from 2006 to 2015 (ten years).

alberta1 <- subset(alberta,When >= as.POSIXlt("2006-01-01")) head(alberta1) When Alberta 31 2006-01-01 6.000000 71 2006-02-01 4.733333 111 2006-03-01 4.533333 151 2006-04-01 4.783333 191 2006-05-01 5.216667 231 2006-06-01 4.850000

tail(alberta1) When Alberta 280 2015-07-01 8.700000 320 2015-08-01 8.200000 360 2015-09-01 9.100000 400 2015-10-01 9.033333 440 2015-11-01 9.700000 480 2015-12-01 8.933333

Now we have 2006 to 2015 monthly data.

The time series plot of the unemployment data is shown below:

albertaTS <- ts(alberta1$Alberta,start=c(2006,1),frequency=12) plot.ts(albertaTS)

Download oil price data from https://research.stlouisfed.org/fred2/series/DCOILWTICO/downloaddata For comparison with unemployment, we chose 2006 to 2015 as well.

oildata <- read.csv("DCOILWTICO.csv",header=T,as.is=T)

head(oildata) DATE VALUE 1 2000-01-01 27.26 2 2000-02-01 29.37 3 2000-03-01 29.84 4 2000-04-01 25.72 5 2000-05-01 28.79 6 2000-06-01 31.82

We just want to make sure we have the same Date field as unemployment for the oil price data

o <- oildata

str(o) 'data.frame': 192 obs. of 2 variables: $ DATE : chr "2000-01-01" "2000-02-01" "2000-03-01" "2000-04-01" ... $ VALUE: chr "27.26" "29.37" "29.84" "25.72" ... names(o) <- c("When","OilPrice") str(o) 'data.frame': 192 obs. of 2 variables: $ When : chr "2000-01-01" "2000-02-01" "2000-03-01" "2000-04-01" ... $ OilPrice: chr "27.26" "29.37" "29.84" "25.72" ... o$When <- strptime(o$When,"%Y-%m-%d") str(o) 'data.frame': 192 obs. of 2 variables: $ When : POSIXlt, format: "2000-01-01" "2000-02-01" ... $ OilPrice: chr "27.26" "29.37" "29.84" "25.72" ... str(alberta1) 'data.frame': 120 obs. of 2 variables: $ When : POSIXlt, format: "2006-01-01" "2006-02-01" ... $ Alberta: num 6 4.73 4.53 4.78 5.22 ...

oildata1 <- subset(o,(When >= as.POSIXlt("2006-01-01") & When <= as.POSIXlt("2015-12-01")))

str(oildata1) 'data.frame': 120 obs. of 2 variables: $ When : POSIXlt, format: "2006-01-01" "2006-02-01" ... $ OilPrice: chr "65.49" "61.63" "62.69" "69.44" ...

oo <- oildata1 oildata1$OilPrice <- as.numeric(oo$OilPrice) str(oildata1) 'data.frame': 120 obs. of 2 variables: $ When : POSIXlt, format: "2006-01-01" "2006-02-01" ... $ OilPrice: num 65.5 61.6 62.7 69.4 70.8 ...

Now we have both columns properly as dates and numbers.

Let’s plot the oil price data:

oildataTS <- ts(oildata1$OilPrice,start=c(2006,1),frequency=12) plot.ts(oildataTS)

Does oil price has influence on unemployment in Alberta?

Laying side by side they do not tell much.

So these series must be coexisting to mean something.

plot(oildata1$OilPrice,alberta$Alberta)

Let’s merge the two data frames as one for easier control.

ao <- cbind(oildata1,alberta)

head(ao) When OilPrice When Alberta 73 2006-01-01 65.49 2006-01-01 6.000000 74 2006-02-01 61.63 2006-02-01 4.733333 75 2006-03-01 62.69 2006-03-01 4.533333 76 2006-04-01 69.44 2006-04-01 4.783333 77 2006-05-01 70.84 2006-05-01 5.216667 78 2006-06-01 70.95 2006-06-01 4.850000

plot(ao$OilPrice,ao$Alberta)

This looks identical to the last plot so all is well.

aoRegression <- lm(ao$Alberta~ao$OilPrice) str(aoRegression) List of 13 $ coefficients : Named num [1:119] 7.517 -1.5 0.883 -0.95 0.8 ... ..- attr(, "names")= chr [1:119] "(Intercept)" "ao$OilPrice100.54" "ao$OilPrice100.80" "ao$OilPrice100.82" ... $ residuals : Named num [1:120] 6.30e-18 9.35e-19 4.40e-19 5.05e-19 9.09e-19 ... ..- attr(, "names")= chr [1:120] "1" "2" "3" "4" ... $ effects : Named num [1:120] -77.78 -1.088 1.296 -0.534 1.219 ... ..- attr(, "names")= chr [1:120] "(Intercept)" "ao$OilPrice100.54" "ao$OilPrice100.80" "ao$OilPrice100.82" ... $ rank : int 119 $ fitted.values: Named num [1:120] 6 4.73 4.53 4.78 5.22 ... ..- attr(, "names")= chr [1:120] "1" "2" "3" "4" ... $ assign : int [1:119] 0 1 1 1 1 1 1 1 1 1 ... $ qr :List of 5 ..$ qr : num [1:120, 1:119] -10.9545 0.0913 0.0913 0.0913 0.0913 ... .. ..- attr(, "dimnames")=List of 2 .. .. ..$ : chr [1:120] "1" "2" "3" "4" ... .. .. ..$ : chr [1:119] "(Intercept)" "ao$OilPrice100.54" "ao$OilPrice100.80" "ao$OilPrice100.82" ... .. ..- attr(, "assign")= int [1:119] 0 1 1 1 1 1 1 1 1 1 ... .. ..- attr(, "contrasts")=List of 1 .. .. ..$ ao$OilPrice: chr "contr.treatment" ..$ qraux: num [1:119] 1.09 1.01 1.01 1.01 1.01 ... ..$ pivot: int [1:119] 1 2 3 4 5 6 7 8 9 10 ... ..$ tol : num 1e-07 ..$ rank : int 119 ..- attr(, "class")= chr "qr" $ df.residual : int 1 $ contrasts :List of 1 ..$ ao$OilPrice: chr "contr.treatment" $ xlevels :List of 1 ..$ ao$OilPrice: chr [1:119] "100.27" "100.54" "100.80" "100.82" ... $ call : language lm(formula = ao$Alberta ~ ao$OilPrice) $ terms :Classes 'terms', 'formula' length 3 ao$Alberta ~ ao$OilPrice .. ..- attr(, "variables")= language list(ao$Alberta, ao$OilPrice) .. ..- attr(, "factors")= int [1:2, 1] 0 1 .. .. ..- attr(, "dimnames")=List of 2 .. .. .. ..$ : chr [1:2] "ao$Alberta" "ao$OilPrice" .. .. .. ..$ : chr "ao$OilPrice" .. ..- attr(, "term.labels")= chr "ao$OilPrice" .. ..- attr(, "order")= int 1 .. ..- attr(, "intercept")= int 1 .. ..- attr(, "response")= int 1 .. ..- attr(, ".Environment")=<environment: R_GlobalEnv> .. ..- attr(, "predvars")= language list(ao$Alberta, ao$OilPrice) .. ..- attr(, "dataClasses")= Named chr [1:2] "numeric" "character" .. .. ..- attr(, "names")= chr [1:2] "ao$Alberta" "ao$OilPrice" $ model :'data.frame': 120 obs. of 2 variables: ..$ ao$Alberta : num [1:120] 6 4.73 4.53 4.78 5.22 ... ..$ ao$OilPrice: chr [1:120] "65.49" "61.63" "62.69" "69.44" ... ..- attr(, "terms")=Classes 'terms', 'formula' length 3 ao$Alberta ~ ao$OilPrice .. .. ..- attr(, "variables")= language list(ao$Alberta, ao$OilPrice) .. .. ..- attr(, "factors")= int [1:2, 1] 0 1 .. .. .. ..- attr(, "dimnames")=List of 2 .. .. .. .. ..$ : chr [1:2] "ao$Alberta" "ao$OilPrice" .. .. .. .. ..$ : chr "ao$OilPrice" .. .. ..- attr(, "term.labels")= chr "ao$OilPrice" .. .. ..- attr(, "order")= int 1 .. .. ..- attr(, "intercept")= int 1 .. .. ..- attr(, "response")= int 1 .. .. ..- attr(, ".Environment")=<environment: R_GlobalEnv> .. .. ..- attr(, "predvars")= language list(ao$Alberta, ao$OilPrice) .. .. ..- attr(, "dataClasses")= Named chr [1:2] "numeric" "character" .. .. .. ..- attr(*, "names")= chr [1:2] "ao$Alberta" "ao$OilPrice"

  • attr(*, "class")= chr "lm"

plot(ao$OilPrice,ao$Alberta) abline(aoRegression)

We can interpret that oil price has impact on unemployment rate: Unemployment Rate is higher when oil price drops.