TickTockDB v0.20.0 query performance evaluation (RPI5, ARM64bit) - ytyou/ticktock GitHub Wiki

Table of Contents

4.1 Data not cached

4.2 Data already cached

4.3 Comparison of data not cached and already cached

5. InfluxDB v2 Query performance

5.1 Data not cached

5.2 Comparison of data not cached and already cached

6. Comparison of TickTockDB v0.20.0 and InfluxDB v2

7. Conclusion

1. Introduction

TickTockDB v0.20.0 was released on 4/21/2024. It dramatically improves read performance thanks to three changes:

The binary format of data is changed. Data are split into buckets;
Raw data is rolled up to hourly and daily data points;
Memory usage is optimized.

In this wiki, we want to show you how TickTockDB v0.20.0 performs in reads, especially large queries, on RPI5. We will also compare it with InfluxDB v2.

2. IoTDB-benchmark Introduction

We selected IoTDB-benchmark for performance evaluation. Basically IoTDB-benchmark simulates multiple groups (identified as metrics/measurements) of devices (as tags/fields), each of which has a list of sensors (as tags/fields).

Please refer to README and the introduction in the previous wiki for details.

3. Experiment settings

3.1. Hardware

The figure shows a RaspberryPI5, a Single Board Computer (SBC) with

Broadcom BCM2712 2.4GHz quad-core 64-bit Arm Cortex-A76 CPU, with cryptography extensions, 512KB per-core L2 caches and a 2MB shared L3 cache
8GB LPDDR4X-4267 SDRAM,
802.11 b/g/n wireless LAN,
512GB v30 extreme SD card (SanDisk),
OS: Raspbian 12 (bookworm),
Cost: $80 in Canakit.com (board only, not including the SD card)

We run IoTDB-benchmark on an Ubuntu laptop with a 12-core AMD Ryzen5 5600H CPU and 20GB of memory. To minimize network bottlenecks, we connect the laptop to the RaspberryPI5 directly with a network cable. We assign static IPs to the RaspberryPI5 and the laptop by running, e.g., on the RaspberryPI5:

sudo ip ad add 10.0.0.5/24 dev eth0

3.2. Software

3.2.1 TickTockDB

Version: 0.20.0

Most configs are default except the following. You can call config.sh to find out.

ylin30@raspberrypi:~/ticktock $ ./admin/config.sh
{
  "tsdb.timestamp.resolution": "millisecond"
}

Please raise the open-file limit to a very high number. See this instruction.

For comparison, we picked InfluxDB since it is the most popular TSDB and the de facto TSDB on RaspberryPI.

3.2.2 InfluxDB

Version: 2.6

Config: default

3.2.3 IoTDB-benchmark

Version: main
Sample config:

for TickTockDB: config.properties

for InfluxDB: config.properties

Important settings in the config:

Read-Write ratio: reads(0%) and writes(100%).

Number of groups (i.e., metrics/measurements): 1k.

Number of devices (i.e., tags/fields) : 10k.

Number of sensors (i.e., tags) per device: 10

1 data point/minute

The above configs simulate a list of clients collecting data points (DEVICE_NUMBER * 10 sensors per device, 1 data point per minute per sensor) and sending them to TickTockDB/InfluxDB back to back, which is known as a backfill scenario.
For TT, we backfilled 5 years of data.
For InfluxDB, we only backfilled 6 months of data because its low throughput made backfilling too slow (6 days for 6 months of data). We also had to reduce the devices from 10k to 1k. We expect InfluxDB to perform better with this smaller cardinality since it uses much less disk space. Nevertheless, TT still beats InfluxDB in this case.
We measure how long it takes to query 1 metric, which has 100 sensors (i.e., time series) = 10 devices * 10 sensors/device, in different settings, e.g., different downsamplings (30m-avg, 12h-avg, 1d-avg) and different time ranges (e.g., 1 month to 5 years).
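To make the scale of the TickTockDB run concrete, the short Python sketch below derives the series counts from the settings listed above. It assumes the 10k devices are spread evenly over the 1k groups, which matches the 10 devices per metric stated here. The variable and function names are ours and are not part of IoTDB-benchmark.

```python
# Workload size implied by the benchmark settings (TickTockDB run).
GROUPS = 1_000             # metrics/measurements
DEVICES = 10_000           # assumed to be spread evenly over the groups
SENSORS_PER_DEVICE = 10
MINUTES_PER_DAY = 24 * 60  # one data point per sensor per minute

devices_per_metric = DEVICES // GROUPS
series_per_metric = devices_per_metric * SENSORS_PER_DEVICE
total_series = DEVICES * SENSORS_PER_DEVICE


def raw_points_per_metric(days):
    """Raw data points one metric accumulates over the given number of days."""
    return series_per_metric * MINUTES_PER_DAY * days


print(series_per_metric)           # time series returned by a one-metric query
print(total_series)                # time series stored in total
print(raw_points_per_metric(365))  # raw points behind a one-year, one-metric query
```

Under these assumptions, a one-year query over a single metric covers roughly 52.6 million raw data points unless it can be answered from smaller, pre-aggregated data.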

4. TickTockDB v0.20.0 query performance

In general, if we repeat a query several times, the first run is slower than the later runs because data need to be loaded from disk into the caches of the OS and/or the TSDB. So we present two different scenarios: data not cached (i.e., the first run) and data already cached (i.e., the later runs).

An example query is:

[yi-IdeaPad ~]$ time curl -v -s 'http://10.0.0.5:6182/api/query?start=1420070400&end=1422748800&m=avg:1d-avg:g2_73'

For those who are not familiar with TickTockDB's APIs (i.e., OpenTSDB's APIs), here is how to read it. The query above returns the average (i.e., avg:) values of all time series of metric g2_73 from start=1420070400 (i.e., 01/01/2015-00:00:00z) to end=1422748800 (i.e., 02/01/2015-00:00:00z). The 1d-avg means that all data points of one time series (i.e., sensor) within each day are downsampled to one data point with an average value.

Notice there are 100 time series (i.e., 10 devices * 10 sensors/device) in one metric and 1000 metrics (i.e., 1000 groups) in our experimental setup. The total size of the 5 years of data on disk is 339GB, which takes up most of the 512GB MicroSD card. In the future we will test higher cardinality on X86, where we can plug in much larger disks.
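As a cross-check of the timestamps and the shape of the `m` parameter, here is a minimal Python sketch that rebuilds the same URL. The helper names `epoch_seconds` and `query_url` are hypothetical and only illustrate the query-string layout described above. They are not part of any TickTockDB client.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def epoch_seconds(year, month, day):
    """Unix timestamp (seconds) of midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def query_url(host, metric, start, end, downsample="1d-avg", aggregator="avg", port=6182):
    """Assemble an /api/query URL in the aggregator:downsample:metric form."""
    params = {"start": start, "end": end, "m": f"{aggregator}:{downsample}:{metric}"}
    # Keep ':' unescaped so the result reads like the hand-written curl example.
    return f"http://{host}:{port}/api/query?{urlencode(params, safe=':')}"


url = query_url("10.0.0.5", "g2_73", epoch_seconds(2015, 1, 1), epoch_seconds(2015, 2, 1))
print(url)
```

Changing `downsample` to `30m-avg` or `12h-avg` gives the other query variants measured in this wiki.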

ylin30@pi5:~/ticktock $ du -hl -d 1 ./data
68G     ./data/2015
68G     ./data/2016
68G     ./data/2017
6.9M    ./data/WAL
68G     ./data/2019
68G     ./data/2018
339G    ./data
ylin30@pi5:~/ticktock $
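For a rough sense of storage density, the sketch below divides the reported disk usage by the number of data points the setup should have produced. It is a back-of-the-envelope estimate under several assumptions: every one of the 100k series wrote exactly one point per minute from 2015 through 2019, the "339G" printed by du means GiB, and the total includes any rollup and metadata files.

```python
from datetime import date

SERIES = 100_000            # 10k devices * 10 sensors, from the settings
DISK_BYTES = 339 * 1024**3  # "339G" as printed by du -h (GiB assumed)

# Calendar days covered by the year directories 2015 through 2019.
days = (date(2020, 1, 1) - date(2015, 1, 1)).days
points = SERIES * days * 24 * 60  # one point per minute per series
bytes_per_point = DISK_BYTES / points

print(days)
print(f"{bytes_per_point:.2f} bytes per data point")
```

Under these assumptions the result is a little under 1.5 bytes per data point. Treat it as an order-of-magnitude figure only.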

4.1 Data not cached

To keep data from being already cached by the OS, we reboot the RPI5 before the tests. For each data point in the following graphs, we run 6 queries to get low, high and average response times. Each query uses a different metric name (e.g., g2_73 in the example above) so that its data are not in cache.

The figure above shows 3 lines, representing downsample granularities of 30-minute average (30m-avg), 12-hour average (12h-avg), and 1-day average (1d-avg). In all 3 lines, query response time scales almost linearly with the query time range: the longer the range, the longer the query takes.

Notice that we use stock charts, so each point in the graphs has low, high and average values. We measure each point 6 times to get the low, high and average. As you can see, the standard deviation is very small, which means TickTockDB responds consistently.

Another observation is that 1d-avg is the fastest and 30m-avg is the slowest. 12h-avg is in between but much closer to 1d-avg. For example, querying 1 year of data takes 19.34 seconds with 30m-avg, 0.54 seconds with 12h-avg, and 0.11 seconds with 1d-avg. This is because 12h-avg and 1d-avg take advantage of the data rollup feature in TickTockDB v0.20.0.
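The measurement procedure (6 runs per point, a different metric each time, then low, high and average) can be sketched in Python as follows. `measure`, `run_query` and `fake_query` are hypothetical names. `run_query` stands for whatever issues the HTTP request, for example a wrapper around the curl command shown earlier. The stand-in used here only sleeps, so the sketch runs without a database, and the metric names other than g2_73 are illustrative.

```python
import time
from statistics import mean


def measure(run_query, metric_names):
    """Time one query per metric and summarize low, high and average seconds."""
    timings = []
    for metric in metric_names:
        started = time.perf_counter()
        run_query(metric)
        timings.append(time.perf_counter() - started)
    return {
        "low": min(timings),
        "high": max(timings),
        "avg": mean(timings),
        "runs": len(timings),
    }


def fake_query(metric):
    """Placeholder for a real request; sleeps briefly instead of querying."""
    time.sleep(0.01)


# Six different metrics, so no run benefits from data cached by an earlier run.
metrics = [f"g2_{n}" for n in range(70, 76)]
print(measure(fake_query, metrics))
```

Using a different metric per run matters only for the uncached scenario. For the cached scenario the same metric would be queried repeatedly and the first run discarded.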

TickTockDB supports two levels of rollup data, hourly and daily. If the downsample granularity of a query is 1 day or larger, it reads the pre-rolled-up daily data, which has the smallest size. Otherwise, if the granularity is 1 hour or larger, it reads the pre-rolled-up hourly data, which is smaller than the original data. It reads the original data only if the downsample granularity is smaller than 1 hour.

The figure shows that the data rollup feature is very effective at reducing query response time. Querying 5 years of data of 100 time series takes only 0.28 seconds with 1d-avg and 2.84 seconds with 12h-avg. TickTockDB only supports avg/min/max/sum/count aggregation in data rollup. Also note that daily rollup is enabled by default on X86 but not on ARM (e.g., RaspberryPI). In our tests, we manually ran <ticktock/admin/rollup.sh> multiple times to prepare the daily rollup data.
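The selection rule can be illustrated with a small Python sketch. It is not TickTockDB's actual implementation: `interval_seconds` and `rollup_tier` are hypothetical helpers, and the boundaries are assumed to be inclusive (a 1d-avg query reads daily data, a 1h-avg query reads hourly data), which is consistent with the 1d-avg and 12h-avg timings reported in this wiki.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def interval_seconds(downsample):
    """Parse a downsample spec such as '30m-avg' into its interval in seconds."""
    match = re.fullmatch(r"(\d+)([smhd])-\w+", downsample)
    if match is None:
        raise ValueError(f"unsupported downsample spec: {downsample!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def rollup_tier(downsample):
    """Pick the smallest data set that can answer the query (assumed rule)."""
    seconds = interval_seconds(downsample)
    if seconds >= UNIT_SECONDS["d"]:
        return "daily"
    if seconds >= UNIT_SECONDS["h"]:
        return "hourly"
    return "raw"


for spec in ("30m-avg", "12h-avg", "1d-avg"):
    print(spec, "->", rollup_tier(spec))
```

The three downsamplers used in this evaluation map to raw, hourly and daily data respectively, which is why their response times differ so widely.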

4.2 Data already cached

The figure above shows query response time when data are already cached. We repeated each query multiple times and measured its response time in runs other than the first one. The pattern of the graphs is similar to the case of data not cached, i.e.,

The longer the range of a query, the longer its response time. The scaling is close to linear.
1d-avg is the fastest, 30m-avg is the slowest, and 12h-avg is in between.

One noticeable difference is that a query is much faster if its data are already in cache, especially in the 30m-avg case. Let's talk about that next.

4.3 Comparison of data not cached and already cached

4.3.1 30m-avg

The above figure compares 30m-avg queries with and without data in cache. For example, querying 1 year of data takes 19.34 seconds if no data are in cache versus 3.36 seconds if the data are already in cache. We can therefore estimate that about 15.98 (=19.34-3.36) seconds are spent loading data from disk. So we conclude that the majority of a query's response time is spent on disk IO rather than on computation in the CPU. The data rollup feature speeds up queries simply because it reduces data size and thus IO time.

Note that 30m-avg queries do not use the data rollup feature since their downsample granularity is smaller than 1 hour. Still, a query is reasonably fast if its data are already in cache, especially when the query range is not too long, e.g., 3.36 seconds for 1 year and 6.61 seconds for 2 years. However, querying 5 years of data takes 86.20 seconds, similar to the case of data not cached. We think the reason is that 5 years of data is too large to be kept in cache, so the OS is likely to evict it. Thus there is no difference in response time even if the query is repeated; its data have to be read from disk each time.
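The estimate can be expressed as a fraction of the uncached response time. The sketch below uses the two 30m-avg numbers quoted for 1 year of data. `io_share` is a hypothetical helper, and it relies on the simplifying assumption that the cached run measures pure compute time.

```python
def io_share(cold_seconds, warm_seconds):
    """Fraction of the cold (uncached) response time attributed to loading data."""
    return (cold_seconds - warm_seconds) / cold_seconds


share = io_share(cold_seconds=19.34, warm_seconds=3.36)
print(f"{share:.1%} of the uncached response time is attributed to disk IO")
```

With these inputs, roughly 83% of the uncached response time goes to reading data from disk.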

4.3.2 12h-avg

We already know that, with the 12h-avg downsampler, TickTockDB executes queries very fast even when data are not cached. For example, it takes only 2.84 seconds on average to query 5 years of data. When data are in cache, TickTockDB performs even better, taking only 0.54 seconds on average.

4.3.3 1d-avg

The 1d-avg downsampler behaves similarly to 12h-avg. All data points are below half a second. Since the queries are so fast, the range between their low and high response times is relatively wider than with 12h-avg and 30m-avg.

5. InfluxDB v2 Query performance

Similar to TickTockDB, we present two scenarios for InfluxDB: data not cached and data already cached. We only backfilled 6 months of data to InfluxDB since its throughput was too low and it would take too long to backfill 5 years of data.

An example query is:

curl --request POST http://localhost:8086/api/v2/query?org=test \
--header 'Authorization: Token token' \
--header 'Accept: application/csv' \
--header 'Content-type: application/vnd.flux' \
--data 'from(bucket:"test")
        |> range(start:1420070460, stop:1427760000)
        |> filter(fn: (r) => r["_measurement"] == "g2_4")
        |> group(columns:["_measurement"])
        |> aggregateWindow(every: 12h, fn: mean)'
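The other downsample variants differ only in the `every:` argument and the time range. A minimal Python sketch for generating the Flux script is shown below. `flux_query` is a hypothetical helper that only builds the text passed to `--data` and does not send anything to InfluxDB.

```python
FLUX_TEMPLATE = """from(bucket:"{bucket}")
        |> range(start:{start}, stop:{stop})
        |> filter(fn: (r) => r["_measurement"] == "{measurement}")
        |> group(columns:["_measurement"])
        |> aggregateWindow(every: {window}, fn: mean)"""


def flux_query(measurement, start, stop, window, bucket="test"):
    """Fill in the Flux script used above for one measurement and window size."""
    return FLUX_TEMPLATE.format(
        bucket=bucket, start=start, stop=stop, measurement=measurement, window=window
    )


# Same time range as the example, once per downsample granularity.
for window in ("30m", "12h", "1d"):
    print(flux_query("g2_4", 1420070460, 1427760000, window))
    print()
```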

5.1 Data not cached

Similar to TT, 30m-avg is the slowest and 1d-avg is the fastest. 12h-avg is very close to 1d-avg. For example, querying 6 months of data takes 3.60 seconds with 12h-avg and 3.10 seconds with 1d-avg, while it takes 20.59 seconds with 30m-avg, much longer than with 12h-avg and 1d-avg.

Query response time scales roughly linearly with query range for 12h-avg and 1d-avg. With 12h-avg, it takes 0.50 (1x), 1.55 (about 3 x 0.50), and 3.60 (about 7 x 0.50) seconds to query 1, 3, and 6 months of data, respectively. But 30m-avg does not scale linearly. For example, it takes 1.15 seconds to query 1 month of data but 20.59 seconds (about 18 x 1.15 seconds) to query 6 months of data.
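One way to check linearity is to divide each response time by the number of months queried: for a linear relationship the per-month cost stays roughly constant. The Python sketch below does this with the InfluxDB numbers quoted above. The helper names are ours.

```python
def seconds_per_month(samples):
    """Map {months: seconds} to {months: seconds spent per month queried}."""
    return {months: seconds / months for months, seconds in samples.items()}


def spread(samples):
    """Ratio of the highest to the lowest per-month cost (1.0 = perfectly linear)."""
    costs = seconds_per_month(samples).values()
    return max(costs) / min(costs)


influx_12h = {1: 0.50, 3: 1.55, 6: 3.60}  # months -> seconds, data not cached
influx_30m = {1: 1.15, 6: 20.59}

print(f"12h-avg spread: {spread(influx_12h):.2f}")
print(f"30m-avg spread: {spread(influx_30m):.2f}")
```

The 12h-avg per-month cost varies by about 20% across ranges, while the 30m-avg per-month cost roughly triples between 1 and 6 months.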

5.2 Comparison of data not cached and already cached

We find that whether data are cached or not makes no difference to query response time in InfluxDB v2. We show the 12h-avg case as an example in the figure above, and skip the 30m-avg and 1d-avg cases.

6. Comparison of TickTockDB v0.20.0 and InfluxDB v2

We only compare them in the case of data not cached. As shown in the above sections, when data are in cache (i.e., if you repeat a query) TickTockDB becomes much faster while InfluxDB's performance stays about the same. The gap between TickTockDB and InfluxDB would therefore be wider in the case of data in cache.

6.1 Downsampler 30m-avg

When querying only 1 month of data, both TickTockDB and InfluxDB are very fast, 1.62 and 1.15 seconds, respectively. InfluxDB is even faster than TickTockDB.

When querying 6 months of data, TickTockDB takes only 9.45 seconds while InfluxDB takes 20.59 seconds, which is much slower. TickTockDB can even query 1 year of data within 20 seconds.

6.2 Downsampler 12h-avg

The figure above shows that TickTockDB is consistently faster than InfluxDB with the 12h-avg downsampler. When querying 1 month of data, TickTockDB takes 0.12 seconds and InfluxDB 0.50 seconds on average. When querying 6 months of data, TickTockDB takes 0.29 seconds and InfluxDB 3.60 seconds on average. TickTockDB is about 12x faster than InfluxDB.

6.3 Downsampler 1d-avg

The figure above shows that TickTockDB is consistently faster than InfluxDB with the 1d-avg downsampler. When querying 1 month of data, TickTockDB takes 0.07 seconds and InfluxDB 0.48 seconds on average. When querying 6 months of data, TickTockDB takes 0.06 seconds and InfluxDB 3.10 seconds on average. TickTockDB is about 51x faster than InfluxDB.
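The speedup factors in this section are plain ratios of the 6-month response times. The sketch below recomputes them from the numbers reported above. The dictionary layout and the `speedup` helper are ours.

```python
# Response times in seconds for 6 months of uncached data, as reported above.
SIX_MONTHS = {
    "30m-avg": {"ticktockdb": 9.45, "influxdb": 20.59},
    "12h-avg": {"ticktockdb": 0.29, "influxdb": 3.60},
    "1d-avg": {"ticktockdb": 0.06, "influxdb": 3.10},
}


def speedup(times):
    """How many times longer InfluxDB takes than TickTockDB."""
    return times["influxdb"] / times["ticktockdb"]


for downsample, times in SIX_MONTHS.items():
    print(f"{downsample}: {speedup(times):.1f}x")
```

With these inputs the ratios come out at about 2.2x, 12.4x and 51.7x.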

7. Conclusion

TickTockDB v0.20.0 improves query performance by rolling up data to hourly and daily data points.
It takes just 0.24 seconds for TickTockDB to query 5 years of data aggregating 100 time series with the 1d-avg downsampler, and 2.84 seconds with the 12h-avg downsampler.
If a query uses a downsampler smaller than 1 hour (e.g., 30m-avg), TickTockDB reads the original data, which is much slower than reading rolled-up data. It takes TickTockDB 9.45 seconds to read just 6 months of data of 100 time series. We suggest using 1h-avg or larger downsamplers if your queries span more than 6 months. Notice that there are 4380 hours in 6 months, which is already more than the 3840-pixel horizontal resolution of a 4K monitor (=3840 * 2160).
TickTockDB also performs much better if data are already in cache (i.e., a query is repeated). For example, caching brings the response time of querying 5 years of data down from 2.84 seconds to 0.53 seconds with 12h-avg. It demonstrates that the majority of query response time is spent on disk IO.
We compared TickTockDB with InfluxDB on RaspberryPI5. TickTockDB v0.20.0 is much faster than InfluxDB v2. When data are not cached,
- with 30m-avg, TickTockDB is 2 times faster than InfluxDB (9.45 vs 20.59 seconds to query 6 months of data, respectively).
- with 12h-avg, TickTockDB is 12 times faster than InfluxDB (0.29 vs 3.60 seconds to query 6 months of data, respectively).
- with 1d-avg, TickTockDB is 51 times faster than InfluxDB (0.06 vs 3.10 seconds to query 6 months of data, respectively).
The gap between TickTockDB and InfluxDB will be wider when data are already in cache.
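The remark about 4380 hours versus a 4K monitor can be generalized: for a given range and downsample interval, count the points returned per time series and compare that with the horizontal pixels available. The sketch below is illustrative only. `points_per_series` is a hypothetical helper and 6 months is taken as 182.5 days.

```python
SCREEN_WIDTH_4K = 3840  # horizontal pixels of a 3840 * 2160 monitor


def points_per_series(range_days, interval_hours):
    """Number of downsampled points one series yields over the range."""
    return int(range_days * 24 / interval_hours)


for label, hours in [("30m", 0.5), ("1h", 1), ("12h", 12), ("1d", 24)]:
    count = points_per_series(182.5, hours)
    verdict = "more points than pixels" if count > SCREEN_WIDTH_4K else "fits on screen"
    print(f"{label}: {count} points per series, {verdict}")
```

At 6 months, both 30m and 1h intervals already produce more points per series than a 4K screen has horizontal pixels, so coarser downsamplers lose no visible detail in a chart.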