Aggregate Time Series Data - vaibhavdobriyal/R_Tutorials GitHub Wiki
This is complicated since we are talking about a time series related aggregation. Please explore if you can use aggregate, apply,lapply or sapply before going into loops. By the way if you are proficient with SQL then sqldf may be your friend also !
Interesting article on loops https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r
Example showing aggregation of data on time dimension.
> names(dat) <- c("utcTimestamp","start", "end", "usage","reading_interval", "status", "meterID", "alarm","is_imputed" ,"localAcquisitionTime")
> head(dat)
utcTimestamp start end usage reading_interval status meterID alarm is_imputed localAcquisitionTime
1 2015-11-12T00:00:00Z +00:00 0 3558 15 192 334_101 NA false 2015-11-12T00:00:00Z
2 2015-11-12T00:15:00Z +00:00 0 4048 15 192 334_101 NA false 2015-11-12T00:15:00Z
3 2015-11-12T00:30:00Z +00:00 0 3609 15 192 334_101 NA false 2015-11-12T00:30:00Z
4 2015-11-12T00:45:00Z +00:00 0 3706 15 192 334_101 NA false 2015-11-12T00:45:00Z
5 2015-11-12T01:00:00Z +00:00 0 3867 15 192 334_101 NA false 2015-11-12T01:00:00Z
6 2015-11-12T01:15:00Z +00:00 0 3000 15 192 334_101 NA false 2015-11-12T01:15:00Z
>
> dat <- subset(dat,meterID=="334_4")
>
> ##################aggregating data to 30 minutes level to reduce some data###################
> dat$localAcquisitionTime <- strptime(dat$localAcquisitionTime,format = "%Y-%m-%dT%H:%M:%SZ")
> library(xts)
> dat.xts <- xts(x = dat[, c(4)], order.by = dat[, "localAcquisitionTime"])
> head(dat.xts)
[,1]
2014-10-27 00:00:00 5153
2014-10-27 00:10:00 9608
2014-10-27 00:20:00 8955
2014-10-27 00:30:00 9522
2014-10-27 00:40:00 9446
2014-10-27 00:50:00 9653
> names(dat.xts) <- c("usage")
> #defining the level to aggregate - 30 minutes in our case
> ep <- endpoints(dat.xts, "minutes", 30)
> #finally performing the aggregation
> dat.xts30mins <- period.apply(dat.xts,ep,sum)
> head(dat.xts30mins)
usage
2014-10-27 00:20:00 23716
2014-10-27 00:50:00 28621
2014-10-27 01:20:00 28994
2014-10-27 01:50:00 28176
2014-10-27 02:20:00 27271
2014-10-27 02:50:00 27560
> dat30min <- (data.frame(localDateTime=index(dat.xts30mins), coredata(dat.xts30mins)))
Aggregating on two dimension
Ran into another issue when the logic of algo changed and required aggregation for every meter and date.
for (i in 1: nrow(meterIDList))
{
subsetDf <- subset(dat[,c("meterID","localDateTime","usage")],meterID==meterIDList[i,])
subsetDf <- subsetDf[,c("localDateTime","usage")]
subsetDf$localDateTime <- strptime(subsetDf$localDateTime,format = "%Y-%m-%dT%H:%M:%SZ")
#converting to XTS object
subset_xts <- xts(x = subsetDf[, c("usage")], order.by = subsetDf[, "localDateTime"])
names(subset_xts) <- c("usage")
#defining the level to aggregate - 30 minutes in our case
ep <- endpoints(subset_xts, "minutes", 30)
#finally performing the aggregation
subset_xts_30agg <- period.apply(subset_xts,ep,sum,na.rm=T)
#converting back to dataframe from timeseries
subset_dat30min <- (data.frame(localDateTime=index(subset_xts_30agg), coredata(subset_xts_30agg)))
subset_dat30min$meterID=meterIDList[i,]
#adding to the datframe with data for all meters -> contains meterID, usage and localtime
aggregated_dat <- rbind(aggregated_dat,subset_dat30min)
}
Sample input data
localDateTime usage meterID
2015-10-01T00:00:00Z 4 334_20
2015-10-01T00:15:00Z 4330 334_20
2015-10-01T00:30:00Z 4465 334_20
2015-10-01T00:45:00Z 5404 334_20
2015-10-01T01:00:00Z 5577 334_20
2015-10-01T01:15:00Z 5134 334_20
2015-10-01T01:30:00Z 4857 334_20
2015-10-01T01:45:00Z 4859 334_20
2015-10-01T02:00:00Z 4946 334_20
2015-10-01T02:15:00Z 5182 334_20
2015-10-01T02:30:00Z 5388 334_20
Sample Output data
2015-10-01 00:15:00 4334
2015-10-01 00:45:00 9869
2015-10-01 01:15:00 10711
2015-10-01 01:45:00 9716
2015-10-01 02:15:00 10128
Not a very clean solution but it works. Definitely there will be a better way of doing this - faster and efficient