
DevOps Monitoring

Introduction

Purpose

References

Vocabulary

  • Baggage - Contextual information that is propagated along with the trace context. TODO
    • It seems to be application data that is sent down the trace span so the data is also available in the sub-spans.
      • e.g. 'clientId'
      • so your application code does not have to pass it along explicitly for the sub-spans to have it.
    • instrumentations automatically propagate baggage for you. ref
    • Warning: Sensitive Baggage items can be shared with unintended resources, like third-party APIs ref
    • TODO move this to the coding part: To add baggage entries to attributes, you need to explicitly read the data from baggage and add it as attributes to your spans, metrics, or logs (see the sketch after this vocabulary list).
  • Collector - Receives, processes, and exports telemetry. OpenTelemetry Collector - Explanation & Setup
    • Benefits
      • Prevents vendor lock-in
      • Consolidates all telemetry
      • Allows you to filter out sensitive data
      • Provides reliability and efficiency - can even handle retries
      • Optimizes costs
      • Offers self-observability
  • Grafana - Data visualization
  • Logs - a timestamped message emitted. ref
    • Logging is no longer the first tool you reach for, but it’s still the last mile of truth
    • Use logs for data that is:
      • too verbose for traces
      • too detailed to emit on every span
      • e.g.
        • SQL-like query payloads
        • Validation errors with full field lists
        • Business rule decisions
        • Serialized inputs/outputs (redacted)
  • Loki - log storage
  • Metrics - Multiple datapoints over time. e.g. CPU usage.
  • Mimir - Grafana's horizontally scalable, Prometheus-compatible metrics store.
  • otel-lgtm - the Grafana LGTM stack: Loki (logs), Grafana (visualization), Tempo (traces) and a metrics backend (the M is Mimir; the grafana/otel-lgtm image ships Prometheus, see the startup log below).
  • OTLP - OpenTelemetry Protocol. Implemented over gRPC and HTTP, encoded with protocol buffers.
  • Prometheus - time series database
  • Reliability - Is the service doing what users expect it to be doing? ref
  • SLI - Service Level Indicator.
  • SLO - Service Level Objective.
  • Span - represents a unit of work or operation within a trace.
    • e.g. a database query, an HTTP call, or a function execution
  • spanId - unique identifier of a span within a trace.
  • Telemetry - Metrics, Traces, Logs
  • Tempo - Trace storage
  • Trace - tracks the flow of requests over multiple services.
    • the entire journey of a single request or operation as it flows through a system.
    • A trace is made of one or more spans.
  • traceId - identifier shared by all spans that belong to the same trace.
  • unmarshal message - deserialize a message (e.g. JSON or Protobuf) into internal data structures.
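
As the Baggage TODO above notes, baggage is not added to spans automatically; it has to be read and copied into attributes explicitly. A minimal sketch with the OpenTelemetry Python SDK; the clientId key is the example from the Baggage entry:

# Sketch: explicitly copying a baggage entry into a span attribute.
from opentelemetry import baggage, context, trace

tracer = trace.get_tracer("baggage-demo")

# Put a value into baggage; it now travels with the context
# (and across services, if a propagator is configured).
ctx = baggage.set_baggage("clientId", "client-42")
token = context.attach(ctx)
try:
    with tracer.start_as_current_span("sub-operation") as span:
        # Baggage is NOT added to spans automatically; copy it by hand.
        client_id = baggage.get_baggage("clientId")
        if client_id is not None:
            span.set_attribute("clientId", client_id)
finally:
    context.detach(token)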

Overview

  • Telemetry source
  • intermediate nodes
    • collectors
    • telemetry backends
graph LR;
      client-->|telemetry|backend;
      backend<-->|queries|frontend;
  • client - Infrastructure or applications.
  • backend - processes and stores data.
  • frontend - visualization.

otlp Client

  • OpenTelemetry Protocol Specification

  • The client sends telemetry in request messages; a single request may batch multiple data points.

    • the server sends an ack (response) per request.
  • The client may send multiple requests before receiving the acks.

  • The server sends a status code in the ack. See OTLP specification

  • When retrying, the client SHOULD implement an exponential backoff strategy (see the sketch after this list).

  • OTLP allows backpressure signaling from the server, e.g. a retryable status telling the client to slow down.

  • OTLP/gRPC - The default network port is 4317.

  • OTLP/HTTP

    • The default network port is 4318
    • either binary or json format
    • uses HTTP POST
    • uses HTTP/1.1 or HTTP/2.
      • should be able to fall back to HTTP/1.1
  • Telemetry data can be sent to multiple servers.
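
A minimal sketch of the retry behavior described above, for an OTLP/HTTP export; the payload is assumed to be an already-encoded OTLP request body (real SDK exporters handle the encoding and retries for you):

# Sketch: exponential backoff with jitter around an OTLP/HTTP export.
import random
import time

import requests

OTLP_ENDPOINT = "http://localhost:4318/v1/traces"  # default OTLP/HTTP port

def export_with_backoff(payload: bytes, max_attempts: int = 5) -> bool:
    delay = 1.0
    for _ in range(max_attempts):
        resp = requests.post(
            OTLP_ENDPOINT,
            data=payload,
            headers={"Content-Type": "application/x-protobuf"},
        )
        if resp.status_code == 200:         # full success
            return True
        if resp.status_code in (429, 503):  # backpressure: retryable
            retry_after = resp.headers.get("Retry-After")
            wait = float(retry_after) if retry_after else delay
            time.sleep(wait + random.uniform(0, wait / 10))  # add jitter
            delay *= 2                      # exponential backoff
            continue
        return False                        # non-retryable error
    return False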

Explaining Spans

Includes:

  • Name - the name of the operation the span represents
  • Parent span ID - (empty for root spans)
  • Start and End Timestamps - when the span's work began and ended
  • Span Context -
    • trace id
    • span id
    • Trace Flags - a binary encoding containing information about the trace
    • Trace State - a list of key-value pairs that can carry vendor-specific trace information
    • Baggage - TODO: is the baggage key-value list part of the Span Context?
  • status code ?
  • status message ?
  • Attributes - key-value pairs of metadata that you can use to annotate a Span with information about the operation it is tracking. ref (see the sketch after this list)
    • You can add attributes to spans during or after span creation. Prefer adding attributes at span creation to make the attributes available to SDK sampling.
      • TODO what is SDK sampling?
      • Keys must be non-null string values.
      • Values must be a non-null string, boolean, floating point value, integer, or an array of these values.
  • Span Events - a structured log message (or annotation) on a Span, typically used to denote a meaningful, singular point in time during the Span’s duration. ref
    • are span events part of a span or a separate thing?
    • attributes vs events ref
      • If the timestamp in which the operation completes is meaningful or relevant, attach the data to a span event.
      • If the timestamp isn’t meaningful, attach the data as span attributes.
  • Span Links - exist so that you can associate one span with one or more spans, implying a causal relationship. ref
  • Span Status - TODO This seems to be the actual status of the span, not related to application success. ref
    • Error - some error occurred in the operation it tracks.
      • For example, this could be due to an HTTP 500 error on a server handling a request.
    • Ok - the span was explicitly marked as error-free by the developer of an application.
    • Unset - the operation it tracked successfully completed without an error.
  • Span Kind - provides a hint to the tracing backend as to how the trace should be assembled. See also SpanKind
    • Client - span represents a synchronous (not queued) outgoing remote call such as an outgoing HTTP request or database call.
    • Consumer - represent the processing of a job created by a producer and may start long after the producer span has already ended.
    • Internal - operations which do not cross a process boundary.
    • Producer - creation of a job which may be asynchronously processed later. TODO expand it.
    • Server - a synchronous incoming remote call such as an incoming HTTP request or remote procedure call.
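
A minimal sketch (OpenTelemetry Python SDK) tying the pieces above together on one span: attributes at creation, an event, a link, status, and kind. Names and values are placeholders; a configured TracerProvider is assumed (see the otel-lgtm section below):

# Sketch: span anatomy - name, kind, attributes, link, event, status.
from opentelemetry import trace
from opentelemetry.trace import Link, SpanKind, Status, StatusCode

tracer = trace.get_tracer("span-anatomy-demo")

# An earlier span whose context we will causally link to.
with tracer.start_as_current_span("earlier-work") as earlier:
    earlier_ctx = earlier.get_span_context()

with tracer.start_as_current_span(
    "HTTP GET /users",                # Name
    kind=SpanKind.CLIENT,             # Span Kind: outgoing remote call
    attributes={"user.id": "42"},     # prefer attributes at creation
    links=[Link(earlier_ctx)],        # Span Link: causal association
) as span:
    # Span Event: a meaningful, singular point in time during the span.
    span.add_event("cache.miss", {"cache.key": "users:42"})
    try:
        pass  # ... the actual outgoing call would go here ...
        span.set_status(Status(StatusCode.OK))     # mark error-free
    except Exception as exc:
        span.record_exception(exc)
        span.set_status(Status(StatusCode.ERROR))  # Span Status: Error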

Installing and configuring

Set-up the host

Debugging

  • sudo tcpdump -nn -i eno1 port 5601

ELK

  • sudo vi /etc/sysctl.conf
    • vm.max_map_count=262144
  • docker pull sebp/elk:5615
  • docker run --publish-all=true --name elk docker.io/sebp/elk:5615
  • docker ps
    • 5044 - logstash
    • 5601 - kibana
    • 9200 - elasticsearch - REST
    • 9300 - elasticsearch - for nodes communication
  • docker inspect 8a57a403e858 | grep IPAddress

Grafana

  1. docker pull grafana/grafana
  2. docker run --publish-all=true --name grafana grafana/grafana
  3. connect with a browser to grafana
  4. log in as admin with password admin
  5. change the password

Add elasticsearch as source to grafana

  • click add resource
  • HTTP
    • url: http://172.17.0.2:9200
      • The 172.17.0.2 is the docker network IP address
      • the 9200 is the container's actual port, even if the published port is set to 32772
    • Access: Server
  • Elasticsearch details
    • Index Name: [pipelineinfo-]YYYY.MM.DD
    • Pattern: Daily

Grafana dashboards for prometheus

  • find grafana dashboards at Dashboards

  • 1806 - node exporter

  • 12693 - haproxy (doesn't show any data)

  • 12538 - libvirt

  • 10530 - smartctl

  • click dashboards

  • click new

  • click import

  • enter the ID

  • click load

  • Select 'prometheus' as the data source

  • click import

Graphite

  1. docker run --publish-all=true -e COLLECTD=1 --name graphite graphiteapp/graphite-statsd

Prometheus

Installing prometheus agents

sudo apt install -y prometheus-haproxy-exporter prometheus-node-exporter prometheus-libvirt-exporter
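
A minimal prometheus.yml scrape-config sketch for the exporters above; the target ports are the usual defaults (node exporter 9100, haproxy exporter 9101) and may differ on your system:

# Sketch: scrape the locally installed exporters.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
  - job_name: haproxy
    static_configs:
      - targets: ['localhost:9101']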

Installing prometheus container

  1. docker pull prom/prometheus

Data Observability

Data Observability pillars

See:

  • Freshness: Freshness seeks to understand how up-to-date your data tables are, as well as the cadence at which your tables are updated (see the sketch after this list).
    • Freshness is particularly important when it comes to decision making; after all, stale data is basically synonymous with wasted time and money.
  • Distribution: Distribution, in other words, a function of your data’s possible values, tells you if your data is within an accepted range.
    • Data distribution gives you insight into whether or not your tables can be trusted based on what can be expected from your data.
  • Volume: Volume refers to the completeness of your data tables and offers insights on the health of your data sources. If 200 million rows suddenly turns into 5 million, you should know.
  • Schema: Changes in the organization of your data, in other words, schema, often indicates broken data. Monitoring who makes changes to these tables and when is foundational to understanding the health of your data ecosystem.
  • Lineage: When data breaks, the first question is always “where?” Data lineage provides the answer by telling you which upstream sources and downstream ingestors were impacted, as well as which teams are generating the data and who is accessing it. Good lineage also collects information about the data (also referred to as metadata) that speaks to governance, business, and technical guidelines associated with specific data tables, serving as a single source of truth for all consumers.
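
A minimal sketch of what automated freshness and volume checks could look like; the table name (events), column (updated_at, ISO-8601 with UTC offset), and thresholds are hypothetical:

# Sketch: freshness + volume checks against a SQL table.
import sqlite3
from datetime import datetime, timedelta, timezone

def check_table_health(conn, table, max_age, min_rows):
    problems = []
    last_update, row_count = conn.execute(
        f"SELECT MAX(updated_at), COUNT(*) FROM {table}"
    ).fetchone()
    # Freshness: was the table updated recently enough?
    if last_update is None or (
        datetime.now(timezone.utc) - datetime.fromisoformat(last_update)
    ) > max_age:
        problems.append(f"{table}: stale (last update: {last_update})")
    # Volume: did the row count fall off a cliff?
    if row_count < min_rows:
        problems.append(f"{table}: only {row_count} rows, expected >= {min_rows}")
    return problems

conn = sqlite3.connect("warehouse.db")  # hypothetical database
print(check_table_health(conn, "events", timedelta(hours=24), min_rows=1000))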

Open telemetry

  • Traces

  • Context - the propagated state; carries the current span context (trace-id and span-id) and baggage.

  • Span - start and end of a "function" call (I think)

  • Span Context - the part of a span that is serialized and propagated alongside Distributed Context and Baggage.

    • trace-id, span-id, trace flags, trace state.
  • span-id - identifies this span within the trace.

  • trace-id - identifies the whole trace; shared by all spans in it.

  • Tracer Provider - a factory for Tracers; usually initialized once during application setup.

  • parent-id - parent span id.

  • Trace Exporters - send traces to a consumer, e.g. OpenTelemetry Collector

  • Context Propagation - With Context Propagation, Spans can be correlated with each other and assembled into a trace, regardless of where Spans are generated (see the sketch after this list).

  • Trace Semantic Conventions

  • Span Events - seems to be a singular event, compared to a span that has a start and a finish.

    • If the timestamp in which the operation completes is meaningful or relevant, attach the data to a span event.
    • If the timestamp isn’t meaningful, attach the data as span attributes.
  • Span Status - Unset, Ok, or Error; see the Span Status entry under Explaining Spans above.

  • Span Kind -

  • TODO IMPORTANT - Semantic conventions - Standardized naming for seamless analysis

    • so that both you and vendors use the same name for the same thing like e.g. 'user.id'
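
A minimal sketch of context propagation over HTTP headers (OpenTelemetry Python SDK). inject() writes the current trace context (the traceparent header) into a carrier dict; the receiving service extract()s it so its spans join the same trace. The downstream URL is hypothetical:

# Sketch: propagate trace context across an HTTP call.
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer("propagation-demo")

# Client side: inject the current span context into outgoing headers.
with tracer.start_as_current_span("call-downstream"):
    headers = {}
    inject(headers)  # adds e.g. 'traceparent: 00-<trace-id>-<span-id>-01'
    requests.get("http://downstream:8080/work", headers=headers)

# Server side (in the downstream service): extract and continue the trace.
def handle_request(incoming_headers: dict):
    ctx = extract(incoming_headers)
    with tracer.start_as_current_span("do-work", context=ctx):
        pass  # spans here share the caller's trace-id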

Fluent bit

Fluent bit installation

  • Installing with Helm Chart

  • helm repo add fluent https://fluent.github.io/helm-charts

  • helm search repo fluent

  • helm upgrade --install fluent-bit fluent/fluent-bit

Fluent bit configuration

  • [INPUT]

    • Name - name of the plugin
    • Tag - name of your tag
      • used in e.g. filter
  • [SERVICE] - this is the fluentbit service

  • yaml - fluent bit also supports a YAML config format (see the sketch after this list)

    • env
    • service
    • pipeline
      • inputs
      • filters
      • outputs
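
A minimal sketch of that YAML layout (fluent-bit 2.x and later); the dummy/grep/stdout plugins are just placeholders:

# Sketch: smallest useful YAML pipeline.
service:
  flush: 1
  log_level: info
pipeline:
  inputs:
    - name: dummy            # Name - name of the plugin
      tag: app.demo          # Tag - referenced by filters/outputs below
  filters:
    - name: grep
      match: app.*
      regex: message dummy   # keep records whose 'message' matches 'dummy'
  outputs:
    - name: stdout
      match: '*'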

Plugins?

  • BackPressure

    • Mem_Buf_Limit
  • Monitoring

    • expose metrics (in json)
      • /api/v1/uptime
      • /api/v1/metrics
        • curl http://10.42.204.112:2020/api/v1/metrics | jq
      • /api/v1/metrics/prometheus - similar to metrics but in the prometheus format
      • /api/v1/health
  • Expect - enables us to validate that the data is formatted as expected.

  • Fluentbit metrics

  • Health

  • Nginx exporter metrics

  • Node exporter metrics

  • Prometheus scrape metrics

    • Seems like this will do prometheus scrapes TODO investigate
  • StatsD

  • Windows Exporter metrics

    • node exporter for windows?
  • open telemetry input plugin

    • otlp http
    • TCP 4318
  • filter plugins?

    • Expect
    • GeoIP
    • Grep
    • k8s
    • record modifier
    • rewrite tag
    • throttle
    • modify
    • Nest
    • Lua scripts
  • output plugins

    • Prometheus remote write
    • prometheus exporter
    • OpenSearch
    • OpenTelemetry
    • Loki
    • stdout

Calyptia - will visualize the configuration of fluentbit

fluentbit parsers

nested json

{
  "resourceLogs": [
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": {
              "stringValue": "logs-basic-example"
            }
          }
        ]
      },
      "scopeLogs": [
        {
          "scope": {
            "name": "opentelemetry-log-appender",
            "version": "0.3.0"
          },
          "logRecords": [
            {
              "timeUnixNano": null,
              "time": null,
              "observedTimeUnixNano": 1712651426487589000,
              "observedTime": "2024-04-09 08:30:26.487",
              "severityNumber": 9,
              "severityText": "INFO",
              "body": {
                "stringValue": "Hello from logs-basic-example"
              },
              "attributes": [],
              "droppedAttributesCount": 0
            }
          ]
        }
      ]
    }
  ]
}

fluentbit filters

[FILTER]
    Name         parser
    Parser       simple_json
    Match        json.*
    Key_Name     msg
    Reserve_Data On
    Preserve_Key On
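
The simple_json parser referenced above is not built in; a minimal sketch of a matching definition in the parsers file (the name just has to match the Parser line in the filter above):

[PARSER]
    Name    simple_json
    Format  json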

with filter

{"create":{"_index":"logstash-2024.04.12"}}
{"@timestamp":"2024-04-12T09:10:58.305Z","log":"2024-04-12T09:10:58.305209409Z stderr F {\"level\":\"Info\",\"ts\":1712913058305,\"msg\":\"{ \\\"filename\\\": \\\"src/main.rs\\\", \\\"line\\\": 34, \\\"data\\\": \\\"time info\\\" }\"}"}
{"create":{"_index":"logstash-2024.04.12"}}
{"@timestamp":"2024-04-12T09:11:08.306Z","log":"2024-04-12T09:11:08.305943113Z stderr F {\"level\":\"Info\",\"ts\":1712913068305,\"msg\":\"{ \\\"filename\\\": \\\"src/main.rs\\\", \\\"line\\\": 34, \\\"data\\\": \\\"time info\\\" }\"}"}

without filter

{"@timestamp":"2024-04-12T09:19:08.321Z","log":"2024-04-12T09:19:08.320983428Z stderr F {\"level\":\"Info\",\"ts\":1712913548320,\"msg\":\"{ \\\"filename\\\": \\\"src/main.rs\\\", \\\"line\\\": 34, \\\"data\\\": \\\"time info\\\" }\"}"}

Fluentd

POC Stacks

grafana/otel-lgtm

  • docker run -p 3000:3000 -p 4317:4317 -p 4318:4318 --rm -it grafana/otel-lgtm

    • ports
      • 3000 - Grafana. admin/admin
      • 4317 - OpenTelemetry gRPC endpoint
      • 4318 - OpenTelemetry HTTP endpoint
  • View Traces in Grafana (a sketch for sending a test trace follows these steps)

    • Open Grafana: Go to http://localhost:3000 in your browser.
    • Navigate to Explore: Click on the Explore icon (compass icon) in the left sidebar.
    • Select Tempo: In the data source dropdown at the top, select Tempo.
    • Search for Traces:
      • Click on Search to find recent traces.
      • You can filter by service name, span name, or trace ID.
    • View Trace Details: Click on a trace ID to see the waterfall view, which shows span timing, service interactions, and any associated logs or errors.
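
A minimal sketch for sending a test trace into the stack, using the OpenTelemetry Python SDK (assumes the opentelemetry-sdk and opentelemetry-exporter-otlp-proto-http packages are installed):

# Sketch: export one trace to the otel-lgtm container's OTLP/HTTP port,
# then find it in Grafana -> Explore -> Tempo.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider(
    resource=Resource.create({"service.name": "lgtm-demo"})
)
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")
    )
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("lgtm-demo")
with tracer.start_as_current_span("hello-lgtm") as span:
    span.set_attribute("user.id", "42")

provider.shutdown()  # flush spans before the script exits

Example startup output from the container: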
Waiting for the OpenTelemetry collector and the Grafana LGTM stack to start up...
Running Tempo v2.9.0 logging=false
Running Prometheus v3.9.1 logging=false
Running OpenTelemetry Collector v0.143.1 logging=false
Running Grafana v12.3.1 logging=false
Running Loki v3.6.3 logging=false
Running Pyroscope v1.17.1 logging=false
Tempo is up and running. Startup time: 27 seconds
Pyroscope is up and running. Startup time: 39 seconds
Loki is up and running. Startup time: 40 seconds
Prometheus is up and running. Startup time: 44 seconds
Otelcol is up and running. Startup time: 54 seconds
Grafana is up and running. Startup time: 103 seconds
Total startup time: 103 seconds

Startup Time Summary:
---------------------
Grafana: 103 seconds
Loki: 40 seconds
Prometheus: 44 seconds
Tempo: 27 seconds
Pyroscope: 39 seconds
OpenTelemetry collector: 54 seconds
Total: 103 seconds
The OpenTelemetry collector and the Grafana LGTM stack are up and running. (created /tmp/ready)
Open ports:
 - 4317: OpenTelemetry GRPC endpoint
 - 4318: OpenTelemetry HTTP endpoint
 - 3000: Grafana (http://localhost:3000). User: admin, password: admin
 - 4040: Pyroscope endpoint
 - 9090: Prometheus endpoint

Troubleshooting

Troubleshooting grafana

Grafana getting 502 when trying to connect to elasticsearch

I had to use the IP address on the docker container network and the actual port 9200, not the one published on the public network.

http://172.17.0.2:9200

ELK Troubleshooting

vm.max_map_count

  • sudo sysctl -w vm.max_map_count=262144
    • required for the ELK stack to run

Troubleshooting fluentbit

illegal_argument_exception

Remove or Replace the _type Parameter: Since _type is no longer supported, you should either remove it entirely or replace it with an appropriate parameter, depending on your Elasticsearch version. In Elasticsearch 7.x and later, documents are stored in a single _doc type by default, so you can simply remove the Type parameter altogether.

    [OUTPUT]
        Name es
        Match kube.*
        Host elasticsearch-master
        Suppress_Type_Name on
        Logstash_Format On
        Retry_Limit False
        Type _doc

    [OUTPUT]
        Name es
        Match host.*
        Host elasticsearch-master
        Suppress_Type_Name on
        Logstash_Format On
        Logstash_Prefix node
        Retry_Limit False
        Type _doc
[2024/04/08 09:11:14] [error] [output:es:es.0] HTTP status=400 URI=/_bulk, response:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Action/metadata line [1] contains an unknown parameter [_type]"}],"type":"illegal_argument_exception","reason":"Action/metadata line [1] contains an unknown parameter [_type]"},"status":400}

[2024/04/08 09:11:14] [ warn] [engine] failed to flush chunk '1-1712566940.860717429.flb', retry in 532 seconds: task_id=171, input=tail.0 > output=es.0 (out_id=0)
[2024/04/08 09:11:14] [ warn] [engine] failed to flush chunk '1-1712566838.861255530.flb', retry in 538 seconds: task_id=53, input=tail.0 > output=es.0 (out_id=0)

failed to flush chunk '1-1712829235.737404026.flb', retry in 8 seconds: task_id=1, input=tail.0 > output=es.0 (out_id=0)

[2024/04/11 10:22:26] [debug] [input chunk] update output instances with new chunk size diff=990, records=1, input=tail.0
[2024/04/11 10:22:26] [debug] [task] created task=0x7fb55a239f20 id=3 OK
[2024/04/11 10:22:26] [debug] [upstream] KA connection #125 to elasticsearch-master:9200 has been assigned (recycled)
[2024/04/11 10:22:26] [debug] [output:es:es.0] task_id=3 assigned to thread #0
[2024/04/11 10:22:26] [debug] [http_client] not using http_proxy for header
[2024/04/11 10:22:26] [debug] [output:es:es.0] HTTP Status=200 URI=/_bulk
[2024/04/11 10:22:26] [debug] [output:es:es.0] Elasticsearch response
{"errors":false,"took":51,"items":[{"create":{"_index":"logstash-2024.04.11","_id":"vUWuzI4B-yGqFaeXHvYb","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":556031,"_primary_term":2,"status":201}}]}
[2024/04/11 10:22:26] [debug] [upstream] KA connection #125 to elasticsearch-master:9200 is now available
[2024/04/11 10:22:26] [debug] [out flush] cb_destroy coro_id=183
[2024/04/11 10:22:26] [debug] [task] destroy task=0x7fb55a239f20 (task_id=3)
[2024/04/11 10:22:27] [debug] [output:es:es.0] task_id=2 assigned to thread #1
[2024/04/11 10:22:27] [debug] [upstream] KA connection #122 to elasticsearch-master:9200 has been assigned (recycled)
[2024/04/11 10:22:27] [debug] [http_client] not using http_proxy for header
[2024/04/11 10:22:27] [debug] [output:es:es.0] HTTP Status=200 URI=/_bulk
[2024/04/11 10:22:27] [debug] [upstream] KA connection #122 to elasticsearch-master:9200 is now available
[2024/04/11 10:22:27] [debug] [out flush] cb_destroy coro_id=183
[2024/04/11 10:22:27] [debug] [retry] re-using retry for task_id=2 attempts=2
[2024/04/11 10:22:27] [ warn] [engine] failed to flush chunk '1-1712830937.946819883.flb', retry in 9 seconds: task_id=2, input=tail.0 > output=es.0 (out_id=0)
[2024/04/11 10:22:27] [debug] [input:tail:tail.0] inode=312543, /var/log/containers/vanilla-log-7c7bdb9545-jvrqt_default_vanilla-log-58da14eb59c2e88f5dd1389f42fd80de64c21d14804a5c166d0bca5be163305d.log, events: IN_MODIFY
[2024/04/11 10:22:27] [debug] [input chunk] update output instances with new chunk size diff=818, records=1, input=tail.0
[2024/04/11 10:22:28] [debug] [task] created task=0x7fb55a239c00 id=3 OK
[2024/04/11 10:22:28] [debug] [upstream] KA connection #123 to elasticsearch-master:9200 has been assigned (recycled)
[2024/04/11 10:22:28] [debug] [output:es:es.0] task_id=3 assigned to thread #0
[2024/04/11 10:22:28] [debug] [http_client] not using http_proxy for header
[2024/04/11 10:22:28] [debug] [output:es:es.0] HTTP Status=200 URI=/_bulk
[2024/04/11 10:22:28] [debug] [upstream] KA connection #123 to elasticsearch-master:9200 is now available
[2024/04/11 10:22:28] [debug] [out flush] cb_destroy coro_id=184
[2024/04/11 10:22:28] [debug] [retry] new retry created for task_id=3 attempts=1
[2024/04/11 10:22:28] [ warn] [engine] failed to flush chunk '1-1712830947.947063732.flb', retry in 9 seconds: task_id=3, input=tail.0 > output=es.0 (out_id=0)

add

        Trace_Error       On
        Trace_Output      On

to

    [OUTPUT]
        Name es
        Match kube.*
        Host elasticsearch-master
        Logstash_Format On
        Retry_Limit False
        Trace_Error       On
        Trace_Output      On
        Suppress_Type_Name on
        Type _doc
[2024/04/11 21:10:55] [ info] [input:tail:tail.0] inotify_fs_add(): inode=311564 watch_fd=1 name=/var/log/containers/log-json-simple-7996b6c769-pdnb9_default_log-json-simple-0f06eff0c18da53700b06390f264163aac4652e4ae3acf6de395b1911dbc92f8.log
{"create":{"_index":"node-2024.04.11"}}
{"@timestamp":"2024-04-11T21:10:56.292Z","PRIORITY":"6","SYSLOG_FACILITY":"3","_UID":"0","_GID":"0","_CAP_EFFECTIVE":"1ffffffffff","_SELINUX_CONTEXT":"unconfined\n","_MACHINE_ID":"16f08a93c1904a3191927197e0cfbffb","_HOSTNAME":"worker1","_SYSTEMD_SLICE":"system.slice","_TRANSPORT":"stdout","SYSLOG_IDENTIFIER":"kubelet","_COMM":"kubelet","_EXE":"/usr/bin/kubelet","_CMDLINE":"/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.9","_SYSTEMD_CGROUP":"/system.slice/kubelet.service","_SYSTEMD_UNIT":"kubelet.service","_PID":"822","_BOOT_ID":"1f405ae4c27d4f77b6aa322c97c976a7","_STREAM_ID":"c3ffcb16b5db48e0afffcab8690165d4","_SYSTEMD_INVOCATION_ID":"1b8e62ec16bb4d28b770bd2adfd3d194","MESSAGE":"I0411 21:10:56.292107     822 pod_startup_latency_tracker.go:102] \"Observed pod startup duration\" pod=\"default/fluent-bit-bvc68\" podStartSLOduration=4.405401576 podStartE2EDuration=\"7.292045832s\" podCreationTimestamp=\"2024-04-11 21:10:49 +0000 UTC\" firstStartedPulling=\"2024-04-11 21:10:51.470213906 +0000 UTC m=+47501.068491933\" lastFinishedPulling=\"2024-04-11 21:10:54.356858132 +0000 UTC m=+47503.955136189\" observedRunningTime=\"2024-04-11 21:10:56.291283103 +0000 UTC m=+47505.889560930\" watchObservedRunningTime=\"2024-04-11 21:10:56.292045832 +0000 UTC m=+47505.890323619\""}
{"create":{"_index":"logstash-2024.04.11"}}
{"@timestamp":"2024-04-11T21:10:57.820Z","log":"2024-04-11T21:10:57.820318828Z stderr F {\"level\":\"Info\",\"ts\":1712869857819,\"msg\":\"Hello from logs-basic-example\"}","kubernetes":{"pod_name":"log-json-simple-7996b6c769-pdnb9","namespace_name":"default","pod_id":"39c7e953-7648-4d4a-99ee-45ff147e6c69","labels":{"app":"log-json-simple","pod-template-hash":"7996b6c769"},"annotations":{"cni.projectcalico.org/containerID":"1064c90cebfad72b17cb9abee89ed5e1c1965df1cffbe7b6c5ef309ea0fae422","cni.projectcalico.org/podIP":"10.42.235.138/32","cni.projectcalico.org/podIPs":"10.42.235.138/32"},"host":"worker1","container_name":"log-json-simple","docker_id":"0f06eff0c18da53700b06390f264163aac4652e4ae3acf6de395b1911dbc92f8","container_hash":"192.168.1.102:5000/log-json-simple@sha256:2bf291c81781a92c5ad7a5bbbfc7ba80974d928d07a681cd51a8708ff5100687","container_image":"192.168.1.102:5000/log-json-simple:0.1.0"}}
[2024/04/11 21:10:57] [error] [output:es:es.0] error: Output
{"errors":true,"took":0,"items":[{"create":{"_index":"logstash-2024.04.11","_id":"kWD_zo4B-yGqFaeX2nsq","status":400,"error":{"type":"document_parsing_exception","reason":"[1:325] object mapping for [kubernetes.labels.app] tried to parse field [app] as object, but found a concrete value"}}}]}
[2024/04/11 21:10:57] [ warn] [engine] failed to flush chunk '1-1712869857.820601044.flb', retry in 6 seconds: task_id=0, input=tail.0 > output=es.0 (out_id=0)
{
  "@timestamp": "2024-04-11T21:10:56.292Z",
  "PRIORITY": "6",
  "SYSLOG_FACILITY": "3",
  "_UID": "0",
  "_GID": "0",
  "_CAP_EFFECTIVE": "1ffffffffff",
  "_SELINUX_CONTEXT": "unconfined\n",
  "_MACHINE_ID": "16f08a93c1904a3191927197e0cfbffb",
  "_HOSTNAME": "worker1",
  "_SYSTEMD_SLICE": "system.slice",
  "_TRANSPORT": "stdout",
  "SYSLOG_IDENTIFIER": "kubelet",
  "_COMM": "kubelet",
  "_EXE": "/usr/bin/kubelet",
  "_CMDLINE": "/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.9",
  "_SYSTEMD_CGROUP": "/system.slice/kubelet.service",
  "_SYSTEMD_UNIT": "kubelet.service",
  "_PID": "822",
  "_BOOT_ID": "1f405ae4c27d4f77b6aa322c97c976a7",
  "_STREAM_ID": "c3ffcb16b5db48e0afffcab8690165d4",
  "_SYSTEMD_INVOCATION_ID": "1b8e62ec16bb4d28b770bd2adfd3d194",
  "MESSAGE": "I0411 21:10:56.292107     822 pod_startup_latency_tracker.go:102] \"Observed pod startup duration\" pod=\"default/fluent-bit-bvc68\" podStartSLOduration=4.405401576 podStartE2EDuration=\"7.292045832s\" podCreationTimestamp=\"2024-04-11 21:10:49 +0000 UTC\" firstStartedPulling=\"2024-04-11 21:10:51.470213906 +0000 UTC m=+47501.068491933\" lastFinishedPulling=\"2024-04-11 21:10:54.356858132 +0000 UTC m=+47503.955136189\" observedRunningTime=\"2024-04-11 21:10:56.291283103 +0000 UTC m=+47505.889560930\" watchObservedRunningTime=\"2024-04-11 21:10:56.292045832 +0000 UTC m=+47505.890323619\""
}
{
  "errors": true,
  "took": 0,
  "items": [
    {
      "create": {
        "_index": "logstash-2024.04.11",
        "_id": "kWD_zo4B-yGqFaeX2nsq",
        "status": 400,
        "error": {
          "type": "document_parsing_exception",
          "reason": "[1:325] object mapping for [kubernetes.labels.app] tried to parse field [app] as object, but found a concrete value"
        }
      }
    }
  ]
}