# InfluxDB Setup (Weitspringer/squirrel-hpc wiki)
This section covers what you need to know to use Squirrel with InfluxDB.

To get started, we need a running InfluxDB instance. We provide a local Docker Compose setup, which is used in this tutorial, but any other InfluxDB instance works as well.
- Create a `docker-compose.yaml` file and insert the following content:

  ```yaml
  version: '3'
  services:
    influxdb:
      image: influxdb:2
      container_name: influxdb
      hostname: influxdb
      volumes:
        - type: volume
          source: influxdb2-data
          target: /var/lib/influxdb2
        - type: volume
          source: influxdb2-config
          target: /etc/influxdb2
      ports:
        - "8086:8086"
      networks:
        - squirrel

  volumes:
    influxdb2-data:
    influxdb2-config:

  networks:
    squirrel:
      driver: bridge
  ```
- Execute `docker compose up -d`.
- In your browser, go to `localhost:8086`.
You are now asked to set up your InfluxDB instance: name your user, choose a password, name the organization, and create a bucket. You can choose any values here, but we recommend creating a bucket called `squirrel`.
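To get a feel for how GCI measurements end up in this bucket, here is a minimal sketch of InfluxDB's line protocol, the text format every write ultimately uses. The measurement and field names mirror those produced by the Telegraf setup later on this page; the helper itself is illustrative (floats only — real line protocol also handles integer fields with an `i` suffix and quoted strings).

```python
from datetime import datetime, timezone

def to_line_protocol(measurement, tags, fields, ts):
    """Build a single InfluxDB line-protocol record with a nanosecond timestamp.

    Illustrative sketch: handles float fields only.
    """
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    ns = int(ts.timestamp()) * 10**9  # line protocol expects nanoseconds
    return f"{measurement},{tag_str} {field_str} {ns}"

line = to_line_protocol(
    "electricity_maps",              # same measurement name the Telegraf config uses
    {"zone": "DE"},
    {"carbonIntensity": 421.0},
    datetime(2023, 1, 1, tzinfo=timezone.utc),
)
```

In practice you will rarely build these lines by hand — Telegraf and Squirrel's CLI do it for you — but the format is handy to know when inspecting the bucket in the InfluxDB UI.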
> **Important**
> Always ensure that `config/squirrel.cfg` is up to date!
Copy the API token after the setup step, as you might need it later on.
We will use InfluxDB to store lifecycle grid carbon intensity (GCI) data from Electricity Maps.
Electricity Maps also provides hourly data from previous years for free. Their data portal contains data for all available energy zones.
Once you have downloaded the data, you can use the `python -m cli electricitymaps ingest-history <path-to-downloaded-csv>` command of Squirrel's Typer interface to load it into InfluxDB.
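If you want to inspect the downloaded history before ingesting it, the CSV can be parsed with a few lines of Python. This is only a sketch: the column names (`Datetime (UTC)`, `Zone Id`, `Carbon Intensity gCO2eq/kWh (LCA)`) are assumptions based on the Electricity Maps export format — verify them against your actual download.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical excerpt of an Electricity Maps hourly CSV export.
# The column names are assumptions; check them against your download.
SAMPLE = """\
Datetime (UTC),Zone Id,Carbon Intensity gCO2eq/kWh (LCA)
2023-01-01 00:00:00,DE,421.0
2023-01-01 01:00:00,DE,433.5
"""

def parse_history(csv_text: str):
    """Yield (timestamp, zone, gci) tuples from an Electricity Maps history CSV."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.strptime(row["Datetime (UTC)"], "%Y-%m-%d %H:%M:%S")
        ts = ts.replace(tzinfo=timezone.utc)  # export timestamps are UTC
        yield ts, row["Zone Id"], float(row["Carbon Intensity gCO2eq/kWh (LCA)"])

rows = list(parse_history(SAMPLE))
```

The `ingest-history` command performs this parsing and the InfluxDB writes for you; the sketch just shows what the command consumes.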
Now we will use Telegraf to fetch hourly lifecycle grid carbon intensity data from Electricity Maps and store it in InfluxDB. Feel free to check the official InfluxData blog post on running InfluxDB and Telegraf using Docker.

Please set up a free API token for Electricity Maps. You can set up multiple Telegraf instances for multiple energy zones.
We use the following `docker-compose.yaml` (replace the placeholder values accordingly):
```yaml
version: '3'
services:
  influxdb:
    image: influxdb:2
    container_name: influxdb
    hostname: influxdb
    volumes:
      - type: volume
        source: influxdb2-data
        target: /var/lib/influxdb2
      - type: volume
        source: influxdb2-config
        target: /etc/influxdb2
    ports:
      - "8086:8086"
    networks:
      - squirrel

  telegraf:
    image: telegraf:latest
    container_name: telegraf
    depends_on:
      - influxdb
    volumes:
      # Mount for telegraf config
      - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
    environment:
      - INFLUX_URL=http://influxdb:8086
      - INFLUX_TOKEN=<influx-api-token>
      - INFLUX_ORG=<your-org>
      - INFLUX_BUCKET=<your-bucket>
      - EMAPS_TOKEN=<electricity-maps-api-token>
      - EMAPS_URL=https://api.electricitymap.org/v3/carbon-intensity/history?zone=<your-energy-zone>
    networks:
      - squirrel

volumes:
  influxdb2-data:
  influxdb2-config:

networks:
  squirrel:
    driver: bridge
```
Here is a template for the Telegraf configuration file `telegraf.conf`. Telegraf queries the Electricity Maps API at the `interval` set in the `[agent]` section; the template below uses `10s`, which is convenient for testing, while a longer interval such as `15m` is sufficient for hourly GCI data. The results are written into your bucket as `electricity_maps` measurements.
```toml
# Configuration for telegraf agent
[agent]
## Default data collection interval for all inputs
interval = "10s"
## Rounds collection interval to 'interval'
## ie, if interval="10s" then always collect on :00, :10, :20, etc.
round_interval = true

## Telegraf will send metrics to outputs in batches of at most
## metric_batch_size metrics.
## This controls the size of writes that Telegraf sends to output plugins.
metric_batch_size = 100

## Maximum number of unwritten metrics per output. Increasing this value
## allows for longer periods of output downtime without dropping metrics at the
## cost of higher maximum memory usage.
metric_buffer_limit = 100

## Collection jitter is used to jitter the collection by a random amount.
## Each plugin will sleep for a random time within jitter before collecting.
## This can be used to avoid many plugins querying things like sysfs at the
## same time, which can have a measurable effect on the system.
collection_jitter = "2s"

## Default flushing interval for all outputs. Maximum flush_interval will be
## flush_interval + flush_jitter
flush_interval = "30s"
## Jitter the flush interval by a random amount. This is primarily to avoid
## large write spikes for users running a large number of telegraf instances.
## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
flush_jitter = "1s"

## By default or when set to "0s", precision will be set to the same
## timestamp order as the collection interval, with the maximum being 1s.
## ie, when interval = "10s", precision will be "1s"
## when interval = "250ms", precision will be "1ms"
## Precision will NOT be used for service inputs. It is up to each individual
## service input to set the timestamp at the appropriate precision.
## Valid time units are "ns", "us" (or "µs"), "ms", "s".
precision = "1s"

## Log at debug level.
# debug = false
## Log only error level messages.
quiet = false

## Name of the file to be logged to when using the "file" logtarget. If set to
## the empty string then logs are written to stderr.
logfile = ""

## The logfile will be rotated after the time interval specified. When set
## to 0 no time based rotation is performed. Logs are rotated only when
## written to, if there is no log activity rotation may be delayed.
# logfile_rotation_interval = "0d"

## The logfile will be rotated when it becomes larger than the specified
## size. When set to 0 no size based rotation is performed.
# logfile_rotation_max_size = "0MB"

## Maximum number of rotated archives to keep, any older logs are deleted.
## If set to -1, no archives are removed.
# logfile_rotation_max_archives = 5

## Pick a timezone to use when logging or type 'local' for local time.
## Example: America/Chicago
# log_with_timezone = ""

## Override default hostname, if empty use os.Hostname()
hostname = "telegraf"
## If set to true, do not set the "host" tag in the telegraf agent.
omit_hostname = false

[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
urls = ["${INFLUX_URL}"]

## Token for authentication.
token = "${INFLUX_TOKEN}"

## Organization is the name of the organization you wish to write to; must exist.
organization = "${INFLUX_ORG}"

## Destination bucket to write into.
bucket = "${INFLUX_BUCKET}"

## The value of this tag will be used to determine the bucket. If this
## tag is not set the 'bucket' option is used as the default.
# bucket_tag = ""

## If true, the bucket tag will not be added to the metric.
# exclude_bucket_tag = false

## Timeout for HTTP messages.
# timeout = "5s"

## Additional HTTP headers
# http_headers = {"X-Special-Header" = "Special-Value"}

## HTTP Proxy override, if unset values the standard proxy environment
## variables are consulted to determine which proxy, if any, should be used.
# http_proxy = "http://corporate.proxy:3128"

## HTTP User-Agent
# user_agent = "telegraf"

## Content-Encoding for write request body, can be set to "gzip" to
## compress body or "identity" to apply no encoding.
# content_encoding = "gzip"

## Enable or disable uint support for writing uints influxdb 2.0.
# influx_uint_support = false

## Optional TLS Config for use on HTTP connections.
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false

# Read formatted metrics from one or more HTTP endpoints
[[inputs.http]]
## One or more URLs from which to read formatted metrics
urls = ["${EMAPS_URL}"]
headers = {"auth-token" = "${EMAPS_TOKEN}"}
data_format = "json"
name_override = "electricity_maps"
tagexclude = ["url", "host"]
json_query = "history"
json_name_key = "carbonIntensity"
tag_keys = ["zone", "emissionFactorType"]
json_time_key = "datetime"
json_time_format = "2006-01-02T15:04:05Z07:00"
```
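To see what the `[[inputs.http]]` settings do to an Electricity Maps response, the mapping can be mimicked in a few lines of Python. The response shape below is an assumption modeled on the v3 `carbon-intensity/history` endpoint — check it against a real API response before relying on the field names.

```python
import json
from datetime import datetime

# Hypothetical Electricity Maps history response; the exact fields are
# assumptions and should be checked against a real v3 API response.
RESPONSE = json.dumps({
    "zone": "DE",
    "history": [
        {"zone": "DE", "carbonIntensity": 421,
         "datetime": "2023-01-01T00:00:00Z", "emissionFactorType": "lifecycle"},
        {"zone": "DE", "carbonIntensity": 433,
         "datetime": "2023-01-01T01:00:00Z", "emissionFactorType": "lifecycle"},
    ],
})

def to_metrics(payload: str):
    """Mimic the Telegraf JSON parser settings from the config above."""
    metrics = []
    for entry in json.loads(payload)["history"]:      # json_query = "history"
        metrics.append({
            "measurement": "electricity_maps",        # name_override
            "tags": {k: entry[k] for k in ("zone", "emissionFactorType")},  # tag_keys
            "fields": {"carbonIntensity": entry["carbonIntensity"]},
            # json_time_format "2006-01-02T15:04:05Z07:00" is Go's RFC 3339 layout
            "time": datetime.fromisoformat(entry["datetime"].replace("Z", "+00:00")),
        })
    return metrics

metrics = to_metrics(RESPONSE)
```

Each dictionary corresponds to one point Telegraf writes into your bucket: the carbon intensity as a field, zone and emission factor type as tags, and the API's own timestamp instead of the collection time.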
You can use the stored historical data to write forecasted GCI data into InfluxDB. While you can rely on the built-in scheduler to compute forecasts on demand, storing them beforehand can speed up job submission by 50%.
For calculating forecasts on a range of data in the past, you can use the following command:

```shell
python -m cli forecast range-to-influx --help
```

For calculating the forecast for the next hours (as configured in `config/squirrel.cfg`), use:

```shell
python -m cli forecast to-influx --help
```
The latter command can be executed regularly, e.g. with a cron job, to continuously store forecasts using the built-in forecasting.
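To illustrate the general idea behind forecasting from stored history, here is a naive hour-of-day average: for each of the next hours, predict the mean GCI observed at that hour on past days. This is only a sketch of the concept — Squirrel's built-in forecasting may use a different method entirely.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Illustrative only: a naive hour-of-day average over historical GCI values.
# Squirrel's built-in forecasting may work differently.
def forecast_next_hours(history, hours=3, now=None):
    """history: iterable of (datetime, gci). Returns [(datetime, forecast_gci)]."""
    by_hour = defaultdict(list)
    for ts, gci in history:
        by_hour[ts.hour].append(gci)
    now = now or datetime.now(timezone.utc)
    start = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    out = []
    for i in range(hours):
        ts = start + timedelta(hours=i)
        values = by_hour.get(ts.hour, [])
        out.append((ts, sum(values) / len(values) if values else None))
    return out

# Two synthetic days of hourly history where GCI rises with the hour of day.
history = [
    (datetime(2023, 1, d, h, tzinfo=timezone.utc), 400 + 10 * h)
    for d in (1, 2) for h in range(24)
]
fc = forecast_next_hours(history, hours=2,
                         now=datetime(2023, 1, 3, 10, 30, tzinfo=timezone.utc))
```

Writing such forecast points into InfluxDB ahead of time is what lets the scheduler skip the computation at submission time.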