ProPE Job monitoring specifications
Below is the job-related information as used by ClusterCockpit. I find it important to separate data related to job accounting from information required for job performance monitoring. There can be links between the two, but in my opinion they should not be mixed up.
- job id
- user id
- project
- cluster name
- number of nodes - (redundant but useful for simplifying queries)
- job state - (running, aborted, finished successfully, etc.)
- start time - epoch time in s
- stop time - epoch time in s
- walltime - (to evaluate used vs requested time)
- node and CPU list - node and CPU IDs that specify the used compute resources (if nodes are only used exclusively, the CPU list can be omitted)
- tag list - array of tags; tags are pairs of <tag type>:<tag name>, both can be any string
Optional:
- duration - seconds (redundant but useful to have this available for queries)
- job script
Performance footprints in the form of job-wide metric averages. These are useful to analyse or sort jobs according to HPM metrics. The footprint metrics can be freely configured in ClusterCockpit, but one could also settle on a fixed set. A sketch of a complete job record follows the list below.
- mem_capacity_avg
- flops_any_avg
- mem_bw_avg
- ib_bw_avg
- lustre_bw_avg
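As an illustration, here is a minimal sketch of such a job record as a Python dict. The field names follow the lists above; the concrete values, units, and the node/CPU list representation are made-up examples rather than a fixed ClusterCockpit schema.

```python
# Hypothetical job record combining accounting metadata and the
# performance footprint; all values are invented for illustration.
job = {
    "job_id": 4091782,
    "user_id": "ab12cdef",
    "project": "prope",
    "cluster": "emmy",
    "num_nodes": 2,                   # redundant but useful for queries
    "job_state": "finished",          # running, aborted, finished, ...
    "start_time": 1532512800,         # epoch time in s
    "stop_time": 1532523600,          # epoch time in s
    "walltime": 43200,                # requested walltime (units assumed: s)
    "duration": 10800,                # optional, redundant to stop - start
    "nodes": {                        # node -> CPU IDs; the CPU list can be
        "e0101": [0, 1, 2, 3],        # omitted for exclusively used nodes
        "e0102": [0, 1, 2, 3],
    },
    "tags": [                         # <tag type>:<tag name> pairs
        {"type": "project", "name": "prope"},
        {"type": "app", "name": "gromacs"},
    ],
    # performance footprint: job-wide metric averages
    "mem_capacity_avg": 42.5,   # GB per node (assumed unit)
    "flops_any_avg": 120.3,     # GFlop/s per node (assumed unit)
    "mem_bw_avg": 38.1,         # GB/s per node (assumed unit)
    "ib_bw_avg": 0.9,           # GB/s per node (assumed unit)
    "lustre_bw_avg": 0.1,       # GB/s per node (assumed unit)
}
```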
The user-related information consists of the following (a small sketch follows the list):
- user id - string
- user name - string
- email - string
- is active - boolean, indicates if user is active or if this is an inactive account
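A corresponding user record could look like this (again only a sketch; the values are invented):

```python
# Hypothetical user record matching the fields listed above.
user = {
    "user_id": "ab12cdef",
    "user_name": "Jane Doe",
    "email": "jane.doe@example.org",
    "is_active": True,   # False for inactive accounts
}
```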
I removed additional fields such as group and project, as I believe a performance project usually goes together with a single user. This could be discussed, though. The current handling in ClusterCockpit would add a tag project:<project name> to the job, so all jobs of a project can be grouped.
In the ProPE project the focus is on metrics that mainly quantify resource utilization.
The list below specifies the name of each metric, what it means, the smallest granularity it is valid for, and how its values can be acquired. If one uses likwid-perfctr for measuring HPM metrics, all metrics below can be acquired with two performance groups: MEM_DP and FLOPS_SP (a sketch of how flops_any can be derived from them follows the list).
- cpu_used - CPU core utilization (between 0 and 1) / cpu level / kernel fs
- ipc - avg ipc of active cores (cores executing instructions) / cpu level / HPM
- mem_used - memory capacity used / node level / kernel fs
- mem_bw - memory bandwidth / socket level / HPM
- flops_any - total flop rate with DP flops scaled up / cpu level / HPM
- rapl_power - CPU power consumption / socket level / HPM
- lustre_bw - total lustre fs bandwidth / node level / kernel fs
- ib_bw - total infiniband or omnipath bandwidth / node level / kernel fs
- gpu_used - GPU utilization / GPU level / NVML (NVIDIA GPUs only)
- gpu_mem_used - GPU memory capacity used / GPU level / NVML (NVIDIA GPUs only)
- gpu_power - GPU power consumption / GPU level / NVML (NVIDIA GPUs only)
- clock - avg core frequency / cpu level / HPM
- flops_sp - SP flop rate / cpu level / HPM
- flops_dp - DP flop rate / cpu level / HPM
- eth_read_bw - Ethernet read bandwidth / node level / kernel fs
- eth_write_bw - Ethernet write bandwidth / node level / kernel fs
- lustre_read_bw - Lustre read bandwidth / node level / kernel fs
- lustre_write_bw - Lustre write bandwidth / node level / kernel fs
- lustre_read_req - Lustre read requests / node level / kernel fs
- lustre_write_req - Lustre write requests / node level / kernel fs
- lustre_inodes - Lustre inodes / node level / kernel fs
- lustre_accesses - Lustre open/close operations / node level / kernel fs
- lustre_fsync - Lustre fsync / node level / kernel fs
- lustre_create - Lustre create / node level / kernel fs
- ib_read_bw - Infiniband, Omnipath read bandwidth / node level / kernel fs
- ib_write_bw - Infiniband, Omnipath write bandwidth / node level / kernel fs
- ib_congestion - Infiniband, Omnipath congestion / node level / kernel fs
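As a small reasoning aid, here is a sketch of how flops_any could be combined from the SP and DP flop rates provided by the two likwid performance groups mentioned above. The factor of 2 for "DP flops scaled up" is an assumption about the intended scaling; the exact metric names reported by likwid-perfctr are not used here.

```python
def flops_any(flops_sp, flops_dp):
    """Combine SP and DP flop rates into one comparable rate.

    'DP flops scaled up' is read here as counting one DP flop as two
    SP-equivalent flops, so jobs become comparable regardless of the
    floating-point precision they use.
    """
    return flops_sp + 2.0 * flops_dp

# Example: 80 GFlop/s SP plus 20 GFlop/s DP -> 120 GFlop/s flops_any
print(flops_any(80.0, 20.0))
```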
InfluxDB uses its own nomenclature for the database schema, but there are corresponding structures in relational-database terms. In InfluxDB a database is structured into measurements; a measurement has tags (strings) and fields (numbers). A measurement in InfluxDB is similar to a table in SQL, a tag is a column with an index optimized for queries, and fields are regular columns without an index.
For low-overhead reporting and storage of metrics in InfluxDB it makes sense to group related metrics into one measurement. All fields in one measurement point must have the same timestamp. For the timestamp granularity, seconds should be the right choice.
One could, for example, use the topological entities within a node as measurements (a sketch of writing such points follows the list):
- cpu
  - tags: host, cpu
  - fields: load, cpi, flops_any, clock
- socket
  - tags: host, socket
  - fields: rapl_power, mem_bw
- node
  - tags: host
  - fields: mem_used, lustre_bw, ib_bw
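A minimal sketch of writing such points with the influxdb Python client (InfluxDB 1.x API). The database name, host name, and all metric values are placeholders; the measurement layout is the one proposed above.

```python
import time

from influxdb import InfluxDBClient  # influxdb-python client (InfluxDB 1.x)

client = InfluxDBClient(host="localhost", port=8086, database="prope")
now = int(time.time())  # one timestamp per reporting interval, in seconds

points = [
    {   # per-CPU measurement: tags identify the topological entity
        "measurement": "cpu",
        "tags": {"host": "e0101", "cpu": "0"},
        "time": now,
        "fields": {"load": 1.0, "cpi": 0.8, "flops_any": 2.1, "clock": 2.4},
    },
    {   # per-socket measurement
        "measurement": "socket",
        "tags": {"host": "e0101", "socket": "0"},
        "time": now,
        "fields": {"rapl_power": 95.0, "mem_bw": 38.5},
    },
    {   # per-node measurement
        "measurement": "node",
        "tags": {"host": "e0101"},
        "time": now,
        "fields": {"mem_used": 42.5, "lustre_bw": 0.1, "ib_bw": 0.9},
    },
]

# write with seconds precision, matching the timestamp granularity above
client.write_points(points, time_precision="s")
```

Batching the per-CPU, per-socket, and per-node points into one write call keeps the reporting overhead low.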
As there are many additional I/O- and network-related metrics, it could make sense to create extra measurements for them:
- network
  - tags: host
  - fields: ib_read_bw, ib_write_bw, eth_read_bw, eth_write_bw
- fileIO
  - tags: host
  - fields: lustre_read_bw, lustre_write_bw, lustre_read_requests, lustre_write_requests, lustre_create, lustre_open, lustre_close, lustre_seek, lustre_fsync
The following InfluxDB measurements are currently used in Dresden's ProPE database (one measurement per data source; a sketch of querying a job footprint from this schema follows the list):
- cpu
  - tags: hostname, cpu
  - fields: used
- infiniband
  - tags: hostname
  - fields: bw
- likwid_cpu
  - tags: hostname, cpu
  - fields: cpi, flops_any
- likwid_socket
  - tags: hostname, cpu
  - fields: mem_bw, rapl_power
- lustre[_scratch|highiops] (Dresden has two Lustre file systems)
  - tags: hostname
  - fields: read_bw, write_bw, read_requests, write_requests, create, open, close, seek, fsync
- memory
  - tags: hostname
  - fields: used
- nvml
  - tags: hostname, gpu
  - fields: gpu_used, mem_used, power, temp
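To connect this schema back to the job footprints described at the top, here is a hedged sketch of computing flops_any_avg for one job from the likwid_cpu measurement. The database name, node names, and time window are placeholders, and the simple mean over all samples and CPUs is only one possible definition of the footprint average.

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="prope")

# job metadata as provided by job accounting (all values are placeholders)
start, stop = 1532512800, 1532523600          # epoch times in s
hosts = ["taurusi6001", "taurusi6002"]        # node list of the job

# restrict to the job's nodes and its runtime window
host_filter = " OR ".join("hostname = '%s'" % h for h in hosts)
query = (
    "SELECT mean(flops_any) FROM likwid_cpu "
    "WHERE (%s) AND time >= %ds AND time <= %ds" % (host_filter, start, stop)
)

result = client.query(query)
for point in result.get_points():
    print("flops_any_avg:", point["mean"])
```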