Methodology - dssg/energywise GitHub Wiki

The report produced by the tool is to be read as follows.

Our tool consumes a building's energy data, runs a series of analyses, and generates a report that walks building managers through the results. This guide explains how to read the report and how we performed specific analyses. (A PDF version of this report guide is available here.)

Summary

The report shows a detailed analysis of the energy consumption for a building. The summary shows the physical characteristics of the building (left), a histogram of energy usage by number of hours (top right), and the monthly energy usage (bottom right).

Average Behaviour

Average Day: This plot shows the hourly energy usage, averaged over the year (blue). Typically, average hourly consumption during the week (green) is much higher than hourly consumption during the weekend (red). The vertical error bars represent one standard deviation.
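As a rough illustration of how such a profile can be computed, the sketch below averages a flat list of hourly readings by hour of day and reports one standard deviation per hour. The function name `average_day` and the assumption that the series starts at midnight with no gaps are ours, not taken from the tool.

```python
from statistics import mean, pstdev


def average_day(hourly_kwh):
    """Return a list of 24 (mean, std) pairs, one per hour of the day.

    Assumption: the series starts at midnight and has no missing hours,
    so reading i belongs to hour i % 24.
    """
    by_hour = [[] for _ in range(24)]
    for i, kwh in enumerate(hourly_kwh):
        by_hour[i % 24].append(kwh)
    return [(mean(vals), pstdev(vals)) for vals in by_hour]


# Two synthetic days: a flat 10 kWh day followed by a flat 20 kWh day.
readings = [10.0] * 24 + [20.0] * 24
profile = average_day(readings)
```

The weekday and weekend curves could be produced the same way by filtering the readings by day of week before averaging.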

Daily average behaviour

Average Week: This plot shows the daily energy usage profile, averaged over the year.

Weekly average behaviour

Types of Days

We use a machine learning algorithm known as k-means (http://en.wikipedia.org/wiki/K-means) to identify and profile the most commonly occurring patterns in daily energy consumption. For example, in the figures to the right, we identify three types of days: 77 days out of the year have a profile similar to the blue line, showing little variation in energy consumption; 167 days have a profile similar to the green line, with a significantly higher energy consumption from 4am – 5pm; the remaining 122 days of the year (red) show a very similar peak.

The figure on the bottom shows the days during the year on which these occur. There is a clear cyclical pattern between the blue days on the one hand and the green and red days on the other. This may indicate scheduling effects (for instance, weekends versus weekdays may have different profiles). We also notice that the red days, with increasing energy consumption, tend to occur only during the summer months, from May to September.
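To make the clustering step more concrete, here is a minimal k-means sketch over 24-hour daily profiles, written with the standard library only. It is not the tool's implementation: the function name `kmeans`, the squared Euclidean distance, and the deterministic initialization (centers picked at evenly spaced positions after sorting days by total consumption) are assumptions chosen to keep the example reproducible.

```python
def kmeans(profiles, k, iterations=50):
    """Cluster daily profiles (lists of 24 hourly kWh values) into k groups.

    Assumption: centers are initialized deterministically from days sorted
    by total consumption; real implementations usually use random or
    k-means++ initialization.
    """
    ordered = sorted(profiles, key=sum)
    n = len(ordered)
    picks = [round(i * (n - 1) / max(k - 1, 1)) for i in range(k)]
    centers = [list(ordered[i]) for i in picks]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each day goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in profiles
        ]
        # Update step: each center becomes the mean of its member days.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Synthetic example: flat days versus days with a daytime peak.
flat = [5.0] * 24
peaked = [5.0] * 8 + [20.0] * 10 + [5.0] * 6
days = [flat, peaked, flat, peaked, peaked, flat]
centers, labels = kmeans(days, k=2)
```

Counting how many days carry each label gives figures like the "77 days" and "167 days" quoted above, and plotting each center gives the corresponding profile line.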

Types of days

Effect of Schedule

Box plots let us study the distribution of energy consumption and compare normal hours of operation with after hours. They are also useful for identifying outliers and for capturing the variation within each group. In the two box plots provided, we capture the variance in energy consumption due to scheduling, for weekdays and weekends.

As seen in the figure on the right, the box plot consists of various components.

Inter-quartile range: The middle “box” represents the middle 50% of kWh values. The range from the lower quartile to the upper quartile is referred to as the inter-quartile range.

Whiskers: The “whiskers” above and below each box give additional information about the spread of the data, stretching over a wider range of kWh than the box itself. Whiskers are vertical lines that end in a horizontal stroke. The upper and lower whiskers represent the kWh values outside the middle 50%.

Median: The median (middle quartile) marks the mid-point of the data and is shown by the line that divides the box into two parts. Half the kWh readings are greater than or equal to this value and half are less.

Upper quartile: Seventy-five percent of the kWh values fall below the upper quartile.

Lower quartile: Twenty-five percent of the kWh values fall below the lower quartile.
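The components above can be computed directly from a list of readings. The following sketch uses `statistics.quantiles` from the Python standard library. The whisker rule (whiskers extend to the most extreme readings within 1.5 times the inter-quartile range of the box) is a common plotting convention that we assume here; the report does not state which rule its plots use.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Return the quartiles, whisker ends, and outliers for a list of kWh values.

    Assumption: whiskers follow the common 1.5 * IQR convention.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Nine ordinary readings and one unusually high reading.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

In this example the reading of 30 falls above the upper fence, so it would be drawn as an individual outlier point rather than as part of the whisker.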

Schedule box plots

Energy Usage vs Temperature

Plot on Left: Hourly energy consumption vs. hourly temperature queried from the nearest weather station (weather data courtesy of Weather Underground, http://www.wunderground.com/).

Plot on Right: Average daily energy consumption vs. average daily temperature queried from the nearest weather station.

Possible relationship and interpretations:

  • ‘V’ shape is a common occurrence in the literature, meaning that the building is highly sensitive to temperature. At low temperatures, a significantly higher amount of energy is used to heat the building; similarly, as temperatures rise, energy must be used to cool the building.

v shape

  • Lack of a clearly defined structure indicates that the energy used for heating and cooling varies widely at every temperature. This may indicate that 1) the heating and cooling systems are used inefficiently, and/or both may be on at the same time, or 2) the building is insensitive to outside temperature, because of a constant outside temperature, a large thermal mass, or because temperature management is not a concern (for example, in some warehouses and storage facilities).

no structure

  • Flat structure at low temperatures usually indicates natural gas heating, in which case only the baseline load is maintained at those temperatures.

inverted L
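One simple way to check which of these shapes a building follows is to average consumption within temperature bins and look at where the minimum falls. The sketch below is only an illustration: the function name `mean_kwh_by_temperature_bin`, the 10-degree bin width, and the synthetic readings are all assumptions, and temperature units are whatever the weather data provides.

```python
import math
from collections import defaultdict


def mean_kwh_by_temperature_bin(temperatures, kwh, width=10):
    """Return {bin_start: mean kWh} for readings grouped by temperature bin."""
    groups = defaultdict(list)
    for temp, energy in zip(temperatures, kwh):
        bin_start = math.floor(temp / width) * width
        groups[bin_start].append(energy)
    return {b: sum(v) / len(v) for b, v in sorted(groups.items())}


# Synthetic 'V'-shaped building: consumption grows away from 60 degrees.
temps = [30, 35, 60, 65, 90, 95]
usage = [10 + abs(t - 60) for t in temps]
binned = mean_kwh_by_temperature_bin(temps, usage)
lowest_bin = min(binned, key=binned.get)
```

A minimum in a middle bin with higher means on both sides suggests a ‘V’ shape, while roughly equal means at the cold end suggest the flat structure described above.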

Energy Usage vs Sunlight

We use the position of the sun as a proxy for the amount of sunlight hitting a building. Using the address of the building, we calculate the latitude and longitude, which are then used to infer the altitude of the sun (its position in the sky) at each hour throughout each day.
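For readers curious about what such a calculation involves, the sketch below estimates solar altitude from latitude, day of year, and local solar time using a textbook approximation for the sun's declination. It is not the method used by the tool, which is not described here. It also assumes the hour is already expressed in local solar time, which is where longitude would enter in a fuller calculation. The function name and the example latitude are illustrative.

```python
import math


def solar_altitude_deg(latitude_deg, day_of_year, solar_hour):
    """Approximate altitude of the sun above the horizon, in degrees.

    Assumptions: solar_hour is local solar time (12.0 = solar noon), and
    the declination uses the simple cosine approximation for a 365-day year.
    """
    declination = math.radians(-23.44) * math.cos(
        2.0 * math.pi * (day_of_year + 10) / 365.0
    )
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    latitude = math.radians(latitude_deg)
    sin_altitude = (
        math.sin(latitude) * math.sin(declination)
        + math.cos(latitude) * math.cos(declination) * math.cos(hour_angle)
    )
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_altitude))))


# Illustrative latitude of 41.9 degrees north, around the June solstice.
noon_altitude = solar_altitude_deg(41.9, 172, 12.0)
midnight_altitude = solar_altitude_deg(41.9, 172, 0.0)
```

Negative altitudes mean the sun is below the horizon, so they can be treated as zero sunlight when used as a proxy.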

Depending on operating schedules as well as architectural features and location, there may or may not be a relationship between energy consumption and sunlight. We can identify three different cases:

  1. Non-stop, on a 24-hour schedule (gas stations, convenience stores)
  2. 9-5 schedule (office buildings, dentists)
  3. Open during evenings only (bars, nightclubs)

Plot on Left: Heat map of hourly energy consumption vs. amount of sunlight during the day

Plot on Right: Average daily energy consumption vs. average daily amount of sunlight

Possible relationship and interpretations:

  • Flat structure indicates that the building does not rely on natural sunlight for lighting and/or heating. This may be due to architectural features or the size of the building (larger buildings or high rises with little access to natural sunlight will depend more on artificial lighting throughout the day).

No sun

  • Positive linear relationship typically occurs when buildings have access to natural light, and/or their schedules correspond to peak hours of natural sunlight. For example, in the plot to the right, we observe two distinct patterns: a flat line along the x-axis, as well as a strong positive linear relationship. The two patterns may illustrate the effects of seasonality, as the amount of overall sunlight during winter and summer months varies significantly. Before sunrise, a baseline load is observed, typically representing systems that are never turned off. After the sun rises, we see a consistent increase in energy consumption, which may be indicative of a daily schedule in which operations begin in the morning and continue into the evening, when lights also need to be turned on.

Sun

It is important to note that any relationship observed between energy consumption and temperature or sunlight is not necessarily one of causation, but may simply be one of correlation. That is, the amount of sunlight and/or fluctuations in temperature may often be correlated with increases in energy consumption; however, we are not able to declare causation in all cases.

Spikes

We identify the top six instances that register the highest hourly jump (increase) in energy consumption. Mathematically, this is equivalent to calculating the discrete first derivative of hourly energy consumption (the difference between consecutive hourly readings) and taking its largest values.

In the example on the right, we see that the spike identified occurred between 9 and 10 pm on 2/27/2011.

It may be useful to observe whether there is a tendency for spikes to occur at a certain time of the day, or be associated with sunlight and/or temperature.
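The spike search described in this section can be sketched in a few lines: take the difference between each reading and the previous one, then keep the largest increases. The function name `top_spikes` and the synthetic readings are illustrative.

```python
def top_spikes(hourly_kwh, n=6):
    """Return the n largest hour-to-hour increases as (index, jump) pairs.

    The index is the position of the later reading, so a result of (6, 9.0)
    means consumption rose by 9.0 kWh between readings 5 and 6.
    """
    jumps = [
        (hourly_kwh[i] - hourly_kwh[i - 1], i)
        for i in range(1, len(hourly_kwh))
    ]
    jumps.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, jump) for jump, i in jumps[:n]]


readings = [5.0, 5.0, 6.0, 12.0, 11.0, 11.0, 20.0, 19.0]
spikes = top_spikes(readings, n=2)
```

Mapping each returned index back to its timestamp gives the date and hour of the spike, which can then be compared against temperature and sunlight.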

Spikes

Distributions of Load Fluctuations

This heat map represents the distribution of changes in energy consumption by hour of the day. Mathematically, this represents the first derivative, or the change in energy consumption from hour to hour. The color represents the density of the points; the length of the bar gives an indication of the approximate variation registered for each hour. For example, if we observe many fluctuations of a particular magnitude, they would be indicated by a red color. Lower frequencies are represented by blue. Points below the line represent negative changes (decreases in energy consumption); points above the line represent positive changes (increases in energy consumption).

In the figure to the right, we observe significantly less variation after 3pm, indicating that the building may be winding down operations. The largest increases in hourly energy consumption are registered between 12 and 2am. We can also observe trending behaviour spanning multiple hours. For example, mornings typically register an increase in load, reflecting the increase in energy needed to maintain daily activities.
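The counts behind a heat map of this kind can be approximated by grouping hour-to-hour changes by hour of day and by magnitude bin. The sketch below is a simplified stand-in for the plot's underlying data. The function name, the 1 kWh bin width, and the assumption that the series starts at midnight with no gaps are ours.

```python
import math
from collections import Counter


def fluctuation_counts(hourly_kwh, bin_width=1.0):
    """Count hour-to-hour changes per (hour of day, magnitude bin).

    Assumption: reading i belongs to hour i % 24. A change is attributed
    to the hour of the later reading. Negative bins are decreases.
    """
    counts = Counter()
    for i in range(1, len(hourly_kwh)):
        change = hourly_kwh[i] - hourly_kwh[i - 1]
        counts[(i % 24, math.floor(change / bin_width))] += 1
    return counts


# Two identical synthetic days: load rises at 8am and falls at 5pm.
day = [5.0] * 8 + [9.5] * 9 + [5.0] * 7
counts = fluctuation_counts(day * 2)
```

Cells with high counts correspond to the red regions of the heat map, and cells with low counts to the blue regions.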

Load

Outliers

We identify six anomalous days and six anomalous weeks throughout the year using MAD (Mean Absolute Deviation) from the average day, after centering the data. The blue line indicates energy usage in kWh, the red line indicates temperature over the same time period, and the dotted grey line indicates the estimated amount of sunlight that hits the building. The plots use a double axis, with energy consumption shown on the right and temperature on the left.

The figure on the right shows the week in which the peak occurs. We observe that sunlight is correlated with the increased energy consumption peak, and the peak occurs during the day.
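One possible reading of the scoring step is sketched below: compute the average day, subtract it from every day, and rank days by the mean absolute value of what remains. The exact centering used by the tool is not spelled out here, so this interpretation, along with the function name `anomalous_days`, should be treated as an assumption.

```python
def anomalous_days(daily_profiles, n=6):
    """Return the n days that deviate most from the average day.

    Each result is (day_index, score), where score is the mean absolute
    deviation of that day's 24 readings from the average day profile.
    """
    n_days = len(daily_profiles)
    average_day = [sum(col) / n_days for col in zip(*daily_profiles)]
    scores = []
    for index, day in enumerate(daily_profiles):
        deviation = sum(abs(v - a) for v, a in zip(day, average_day)) / len(day)
        scores.append((deviation, index))
    scores.sort(reverse=True)
    return [(index, deviation) for deviation, index in scores[:n]]


# Three ordinary days at 10 kWh and one unusual day at 30 kWh.
days = [[10.0] * 24, [10.0] * 24, [30.0] * 24, [10.0] * 24]
worst = anomalous_days(days, n=1)
```

The same scoring could be applied to weeks by using 168-hour profiles instead of 24-hour ones.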

Outliers

Holidays

These plots highlight the energy consumption during Federal holidays, as well as on the day before and the day after each holiday.

It is important to isolate these days for a number of reasons. First, they can provide an estimate of the ‘phantom’ load, or baseline load, of the building. Depending on whether a holiday falls on a weekday or a weekend, it may also lead to deviations from the usual cyclical patterns registered throughout the year. When isolating outliers or peaks, it thus becomes important to differentiate between anomalous behaviors that can be explained and those that have no simple explanation.

Holidays

Peaks: Times in the top 1%

We identify the top nine hourly peaks in energy consumption, and show the relationship to sunlight and temperature.

In the example on the right, we see that the maximum energy consumption occurs at approximately 6am on May 31st, before sunrise, at the lowest temperature reading for the day.
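Selecting the hours in the top 1% can be sketched as a simple ranking. The function name `top_percent_hours` and the rounding choice (always keeping at least one hour, rounding the count up) are assumptions for illustration.

```python
import math


def top_percent_hours(hourly_kwh, percent=1.0):
    """Return indices of the readings in the top `percent` of consumption,
    ordered from the highest reading downwards."""
    n_top = max(1, math.ceil(len(hourly_kwh) * percent / 100.0))
    ranked = sorted(
        range(len(hourly_kwh)), key=lambda i: hourly_kwh[i], reverse=True
    )
    return ranked[:n_top]


# 200 synthetic readings that increase steadily: the top 1% is two hours.
readings = [float(v) for v in range(200)]
peaks = top_percent_hours(readings)
```

Taking the first nine indices of the result gives the nine highest hourly peaks, which can then be looked up against temperature and sunlight for the same hours.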

It may be useful to identify whether there is a pattern between the times of day at which the peaks occur:

  • Are they mostly in the mornings, afternoons, or evenings?
  • What is their relationship to temperature and/or sunlight? For instance, peaks during low temperatures may indicate a stress on heating, while peaks during high temperatures indicate stress on load due to AC systems. This may provide a good indicator for what system should be investigated for potential inefficiencies.
  • Did anything unusual occur on those days (increased production, failure of a particular system, large purchase of computers, etc.)?

top one

Note: the plots for the last two sections are not complete at this time.

Extreme Days

The three plots on this page show:

  1. the average daily profile for the building
  2. the day registering the highest total energy consumption out of the year
  3. the day registering the lowest total energy consumption out of the year
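Finding the three profiles listed above amounts to averaging across days and comparing daily totals. The following is a minimal sketch with an illustrative function name, using lists of 24 hourly readings per day.

```python
def extreme_days(daily_profiles):
    """Return (average_profile, index_of_highest_day, index_of_lowest_day).

    Days are ranked by their total consumption over 24 hours.
    """
    n_days = len(daily_profiles)
    average_profile = [sum(col) / n_days for col in zip(*daily_profiles)]
    totals = [sum(day) for day in daily_profiles]
    highest = max(range(n_days), key=totals.__getitem__)
    lowest = min(range(n_days), key=totals.__getitem__)
    return average_profile, highest, lowest


days = [[10.0] * 24, [4.0] * 24, [16.0] * 24]
average_profile, highest, lowest = extreme_days(days)
```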

Raw Data

Temperature over time: shows daily temperatures over the course of the year, including imputed values (marked in red).

Energy usage over time: annual energy usage in kWh; the red and black dotted lines represent the 5th and 95th percentiles, respectively.

Energy usage in the frequency domain: we transform the energy signal into the frequency domain via the Discrete Fourier Transform. This information is used in our internal analysis, and may safely be skipped if you are not already familiar with the mathematics behind it.
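For readers who do want a feel for the frequency-domain view, the sketch below computes a naive Discrete Fourier Transform of a synthetic hourly signal and finds its dominant cycle. It uses a direct O(n²) sum for clarity rather than a fast transform, and the function name and synthetic signal are illustrative, not taken from the tool.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return the magnitudes of DFT bins 0 .. n/2 for a real-valued signal."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


# Three days of hourly readings with a daily cycle around a 10 kWh baseline.
signal = [10.0 + 3.0 * math.cos(2.0 * math.pi * t / 24.0) for t in range(72)]
magnitudes = dft_magnitudes(signal)

# Skip bin 0 (the average level) and find the strongest repeating pattern.
dominant_bin = max(range(1, len(magnitudes)), key=magnitudes.__getitem__)
period_hours = len(signal) / dominant_bin
```

Here the dominant bin corresponds to a period of 24 hours, which is the daily cycle built into the synthetic signal. On real data, strong peaks at 24 and 168 hours would point to daily and weekly schedules.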