InfluxDB Setup

This section covers what you need to know to use Squirrel with InfluxDB.

Getting started

To get started, we need a running InfluxDB instance. We provide a local Docker Compose setup, which is used in this tutorial, but any other InfluxDB deployment works as well.

  1. Create a docker-compose.yaml file.
  2. Insert the following content:
version: '3'

services:
  influxdb:
    image: influxdb:2
    container_name: influxdb
    hostname: influxdb
    volumes:
      - type: volume
        source: influxdb2-data
        target: /var/lib/influxdb2
      - type: volume
        source: influxdb2-config
        target: /etc/influxdb2
    ports:
      - "8086:8086"
    networks:
      - squirrel

volumes:
  influxdb2-data:
  influxdb2-config:

networks:
  squirrel:
    driver: bridge
  3. Execute docker compose up -d.
  4. In your browser, go to localhost:8086.

You will be asked to set up your InfluxDB instance: name your user, choose a password, name the organization, and create a bucket. You can choose any values here, but we recommend naming the bucket squirrel.
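If you prefer a non-interactive setup (e.g. for scripting), the influx CLI inside the container can perform the same onboarding. A minimal sketch, assuming the container name influxdb from the compose file above; the credentials are examples you should replace:

docker exec influxdb influx setup \
  --username admin \
  --password <choose-a-password> \
  --org <your-org> \
  --bucket squirrel \
  --force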

Important

Always ensure that config/squirrel.cfg is up to date!
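For reference, the InfluxDB connection details belong in config/squirrel.cfg. The exact section and key names are defined by Squirrel, so treat the following as a hypothetical sketch and consult the sample config in the repository:

[influxdb]
url = http://localhost:8086
org = <your-org>
bucket = squirrel
token = <influx-api-token>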

Copy the API token shown after the setup step, as you will need it later on.
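To verify that the token works, you can check connectivity with the influxdb-client Python package (a quick sketch; install it with pip install influxdb-client and replace the placeholders with your values):

from influxdb_client import InfluxDBClient

# Placeholders: replace with the values from your setup.
client = InfluxDBClient(
    url="http://localhost:8086",
    token="<influx-api-token>",
    org="<your-org>",
)
print(client.ping())  # True if the instance is reachable
# Listing buckets requires a valid token, so this doubles as an auth check.
for bucket in client.buckets_api().find_buckets().buckets:
    print(bucket.name)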


Persist historical grid carbon intensity data

We will use InfluxDB to store lifecycle grid carbon intensity (GCI) data from Electricity Maps.

Data portal downloads

Electricity Maps also provides hourly data from previous years for free. Their data portal contains data for all available energy zones.

Once you have downloaded the data, you can use the python -m cli electricitymaps ingest-history <path-to-downloaded-csv> command of Squirrel's Typer interface to load it into InfluxDB.
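For example, assuming a hypothetical download of the 2023 hourly data for Germany:

python -m cli electricitymaps ingest-history ~/Downloads/DE_2023_hourly.csv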

The last 24 hours

Now we will use Telegraf to fetch hourly lifecycle grid carbon intensity data from Electricity Maps and store it in InfluxDB. Feel free to check the official blog post on running InfluxDB and Telegraf using Docker.

Please set up a free API token for Electricity Maps.

You can set up multiple Telegraf instances for multiple energy zones. We use the following docker-compose.yaml (replace the placeholder values accordingly):

version: '3'

services:
  influxdb:
    image: influxdb:2
    container_name: influxdb
    hostname: influxdb
    volumes:
      - type: volume
        source: influxdb2-data
        target: /var/lib/influxdb2
      - type: volume
        source: influxdb2-config
        target: /etc/influxdb2
    ports:
      - "8086:8086"
    networks:
      - squirrel
  telegraf:
    image: telegraf:latest
    container_name: telegraf
    depends_on:
      - influxdb
    volumes:
      # Mount for telegraf config
      - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
    environment:
      - INFLUX_URL=http://influxdb:8086
      - INFLUX_TOKEN=<influx-api-token>
      - INFLUX_ORG=<your-org>
      - INFLUX_BUCKET=<your-bucket>
      - EMAPS_TOKEN=<electricity-maps-api-token>
      - EMAPS_URL=https://api.electricitymap.org/v3/carbon-intensity/history?zone=<your-energy-zone>
    networks:
      - squirrel

volumes:
  influxdb2-data:
  influxdb2-config:

networks:
  squirrel:
    driver: bridge
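Note that Telegraf expects the telegraf.conf described next to exist before the container starts. Once both files are in place, bring the stack up and check that Telegraf starts cleanly (standard Docker commands):

docker compose up -d
docker logs -f telegraf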

Here is a template for the Telegraf configuration file telegraf.conf. The query frequency is controlled by the interval setting in the [agent] section; the template below uses 10 seconds, which is handy for testing, but you may want a longer interval (e.g. 15 minutes) for regular operation, since Electricity Maps updates its data hourly. The results will be written into your bucket as "electricity_maps" measurements.

# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "10s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 100

  ## Maximum number of unwritten metrics per output.  Increasing this value
  ## allows for longer periods of output downtime without dropping metrics at the
  ## cost of higher maximum memory usage.
  metric_buffer_limit = 100

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "2s"

  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "30s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "1s"

  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s.
  ##   ie, when interval = "10s", precision will be "1s"
  ##       when interval = "250ms", precision will be "1ms"
  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  precision = "1s"

  ## Log at debug level.
  # debug = false
  ## Log only error level messages.
  quiet = false

  ## Name of the file to be logged to when using the "file" logtarget.  If set to
  ## the empty string then logs are written to stderr.
  logfile = ""

  ## The logfile will be rotated after the time interval specified.  When set
  ## to 0 no time based rotation is performed.  Logs are rotated only when
  ## written to, if there is no log activity rotation may be delayed.
  # logfile_rotation_interval = "0d"

  ## The logfile will be rotated when it becomes larger than the specified
  ## size.  When set to 0 no size based rotation is performed.
  # logfile_rotation_max_size = "0MB"

  ## Maximum number of rotated archives to keep, any older logs are deleted.
  ## If set to -1, no archives are removed.
  # logfile_rotation_max_archives = 5

  ## Pick a timezone to use when logging or type 'local' for local time.
  ## Example: America/Chicago
  # log_with_timezone = ""

  ## Override default hostname, if empty use os.Hostname()
  hostname = "telegraf"
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false

[[outputs.influxdb_v2]]
  ## The URLs of the InfluxDB cluster nodes.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  ##   ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
  urls = ["${INFLUX_URL}"]

  ## Token for authentication.
  token = "${INFLUX_TOKEN}"

  ## Organization is the name of the organization you wish to write to; must exist.
  organization = "${INFLUX_ORG}"

  ## Destination bucket to write into.
  bucket = "${INFLUX_BUCKET}"

  ## The value of this tag will be used to determine the bucket.  If this
  ## tag is not set the 'bucket' option is used as the default.
  # bucket_tag = ""

  ## If true, the bucket tag will not be added to the metric.
  # exclude_bucket_tag = false

  ## Timeout for HTTP messages.
  # timeout = "5s"

  ## Additional HTTP headers
  # http_headers = {"X-Special-Header" = "Special-Value"}

  ## HTTP Proxy override. If unset, the standard proxy environment
  ## variables are consulted to determine which proxy, if any, should be used.
  # http_proxy = "http://corporate.proxy:3128"

  ## HTTP User-Agent
  # user_agent = "telegraf"

  ## Content-Encoding for write request body, can be set to "gzip" to
  ## compress body or "identity" to apply no encoding.
  # content_encoding = "gzip"

  ## Enable or disable uint support for writing uints to InfluxDB 2.0.
  # influx_uint_support = false

  ## Optional TLS Config for use on HTTP connections.
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

# Read formatted metrics from one or more HTTP endpoints
[[inputs.http]]
  ## One or more URLs from which to read formatted metrics
  urls = ["${EMAPS_URL}"]
  headers = {"auth-token" = "${EMAPS_TOKEN}"}
  data_format = "json"
  name_override = "electricity_maps"
  tagexclude = ["url", "host"]
  json_query = "history"
  json_name_key = "carbonIntensity"
  tag_keys = ["zone", "emissionFactorType"]
  json_time_key = "datetime"
  json_time_format = "2006-01-02T15:04:05Z07:00"
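Once Telegraf has been running for a while, you can verify that points are arriving, for instance with the influxdb-client Python package. A sketch, assuming the recommended squirrel bucket; the measurement, field, and tag names follow the configuration above, so adjust them if you changed anything:

from influxdb_client import InfluxDBClient

client = InfluxDBClient(
    url="http://localhost:8086",
    token="<influx-api-token>",
    org="<your-org>",
)

# Query the last 24 hours of "electricity_maps" points from the bucket.
flux = '''
from(bucket: "squirrel")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "electricity_maps")
  |> filter(fn: (r) => r._field == "carbonIntensity")
'''
for table in client.query_api().query(flux):
    for record in table.records:
        print(record.get_time(), record.get_value(), record.values.get("zone"))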

Persist forecasted grid carbon intensity data

You can use the stored historical data to write forecasted GCI data into InfluxDB. While the built-in scheduler can also compute forecasts on demand, storing them beforehand can speed up job submission by 50%.

Range forecast

To calculate forecasts on a range of past data, you can use the following command:

python -m cli forecast range-to-influx --help

Current forecast

To calculate the forecast for the upcoming hours (as configured in config/squirrel.cfg), use:

python -m cli forecast to-influx --help

This command can be executed regularly, e.g. via a cron job, to continuously store forecasts using the built-in forecasting.
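For example, a crontab entry along these lines would store a fresh forecast at the start of every hour (the repository path is hypothetical; adjust it and your Python environment as needed):

0 * * * * cd /path/to/squirrel-hpc && python -m cli forecast to-influx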
