Sample ~ Array of Struct - miniboxing/ildl-plugin GitHub Wiki

If you haven't read the [[introduction|Tutorial--Introduction]] and [[first example|Tutorial--Example-(Part-1)]] already, it's a good time to do so, as this benchmark description assumes familiarity with the ildl-plugin.

This section will show how to optimize data structures across different libraries. Case in question, we optimize Vector[SensorReadings], where the Vector and SensorReading classes are defined independently, unaware of each other. This results in a suboptimal data layout, which the ildl plugin can fix, for speedups of up to 5.7x.

The benchmark files are available in the ildl-plugin repository, in the tests/benchmarks/src/ildl/benchmark/aos2soa directory. If you have imported the ildl-* projects in the Scala IDE, this benchmark is available under the ildl-benchnarks project, in the src directory, in package ildl.benchmark.aos2soa.

Problem

The current benchmark is inspired by a presentation given by Lee Mighdoll during ScalaDays 2015 in San Francisco on fast visualization tools for sensor data. Here's the presentation he gave: https://www.parleys.com/tutorial/visualize-things

Let's imagine you have a sensor and it records different parameters over the day. It could measure temperature, humidity, light, sound, vibration or other parameters. Each recording is stored on a local server in the form of entries:

+-----------+------------------------+----------------+
| timestamp | event/measurement type | sensor reading |
+-----------+------------------------+----------------+

While some readings occur regularly, like temperature and humidity, others only occur when some external trigger event occurs, such as, for example, a strong noise or vibration. In this case, the readings are not necessarily uniform (we can't say that every 5th reading is a temperature reading). Instead, we have a log of measurements, stored as an array:

+-----+-----+-----+-----
|  o  |  o  |  o  | ...
+- | -+- | -+- | -+-----
   |     |     |
   |     |     v    ...
   |     |   entry3
   |     |
   |     v
   |   entry2
   |
   v
 entry1

Now let's examine the following method that reads sensor data and averages the readings for a certain (potentially sparse) event. Let's imagine we ask the question: "What was the average temperature last night?" To answer this question, we have to traverse the array of entries and, for each one, check if the measurement is a temperature reading (and not humidity, ...) and, if it is, add the measurement to our computed sum and count.

def getAverageDirect(data: SensorReadings, event: Long): Double = {
   var i = 0
   var acc = 0.0
   var cnt = 0
   val size = data.length
    while (i < size) {
     if (data.event(i) == event) {
       acc += data.reading(i)
       cnt = 0
     }
     i += 1
   }
   if (cnt != 0)
     acc / cnt
   else
     Float.NaN
}

Although the code is written as efficiently as possible, using while loops, we can optimize it even further.

Solution

Currently, the data is stored as an array of pointers to each event:

+-----+-----+-----+-----
|  o  |  o  |  o  | ...
+- | -+- | -+- | -+-----
   |     |     |
   |     |     v    ...
   |     |   entry3
   |     |
   |     v
   |   entry2
   |
   v
 entry1

A much better way to store it would be to split each field in its own array:

+--------+--------+--------+-----
| time1  | time2  | time3  | ...
+--------+--------+--------+-----

+--------+--------+--------+-----
| event1 | event2 | event3 | ...
+--------+--------+--------+-----

+----------+----------+----------+-----
| reading1 | reading2 | reading3 | ...
+----------+----------+----------+-----

Now, we only need to traverse the types array, one element after another. When we encounter a temperature event, only then we access the reading array and add the sensor value to our sum and count one more entry (so we can output the average temperature, not the sum of temperatures). The gist of this transformation is to transform an array of tuples (or struct) to a tuple (or struct) of arrays. It's important to have this transformation across both data loading and reading, as we don't want to pay for the cost of transforming form one representation to another.

Effectively what the programmer should do is to transform SensorReadings to:

case class SensorReadingsSoA(arrayOfTimestamps: Array[Long],
                             arrayOfEvents:     Array[Long],
                             arrayOfReadings:   Array[Double])

Of course, we can change the existing code to transform the arrays and the processing code manually. But can we do it using the ildl-plugin? Sure we can.

Transformation

SensorReadings ===> SensorReadingsSoA
 \                        ^
  \______________________/
  ArrayOfStructToStructOfArray

We use the ArrayOfStructToStructOfArray transformation both when loading the data (well, in our case randomly producing it) and when traversing the data structure. With this, we can obtain the speed benefits without rewriting any of our code. In the artifact, you will notice that aside from the adrt annotation, there are no differences between the createDataDirect and createDataSoA, and, respectively, the getAverageDirect and getAverageSoA methods.

The AverageTemperature.scala file contains the benchmarked code.

Running the Benchmark

Okay, what are the results?

Running the benchmark can be done in sbt:

$ cd ildl-plugin
$ sbt 'ildl-benchmarks/runMain ildl.benchmark.aos2soa.BenchmarkRunner'

We analyzed this transformation in two scenarios:

  • If the temperature readings are randomly spaced, assuming there are many triggered readings
  • If the temperature readings are evenly spaced, assuming a calm residential setting.

These are the results we obtained:

The results obtained on our test machine are:

| Structure       |        | Time | Speedup | Garbage* | GC time* | Garbage+ | GC time+ |
|                 |        | (ms) |         | (MB)     | (ms)     | (MB)     | (ms)     |
| --------------- | ------ | ---- | ------- | -------- | -------- | -------- | -------- |
| array of struct | random | 55.5 |    none |        0 |        0 |      451 |       15 |
| struct of array | random | 30.4 |    1.8x |        0 |        0 |      435 |       13 |
| array of struct | even   | 32.5 |    none |        0 |        0 |      454 |       16 |
| struct of array | even   |  5.7 |    5.7x |        0 |        0 |      433 |       19 |
  • * GC pause time during the benchmark run
  • + GC pause time between the benchmark runs (mainly preparing the data)

As expected, the transformation paid off better if the temperature readings are regular, but even if they are not, we can still obtain a good speedup of 1.8x.

Note: In the paper we only included the random distribution, but now we plan to include both the random and even readings.

From Here:

Congratulations! You've now seen all our sample transformations!

⚠️ **GitHub.com Fallback** ⚠️