EFK Deployment using Helm

caprivm ([email protected])

Description

This page describes the steps needed to install the EFK (Elasticsearch, Fluentd, Kibana) stack for Magma services. The Elasticsearch installation is based on a cluster of one master node and two data nodes. Fluentd is used to take data from the AGW and store it in Elasticsearch, while Kibana is used as a visualization tool.

All the steps in this section have been tested on a deployment machine that meets the following requirements.

| Feature | Value |
| --- | --- |
| OS used | Ubuntu 18.04 LTS |
| vCPU | 2 |
| RAM (GB) | 4 |
| Disk (GB) | 50 |
| Home user | ubuntu |
| Kubernetes namespace | magma |

The contents of the page are:

- Prerequisites
- Add the necessary Helm repositories
- Install the Elasticsearch Cluster
- Install Fluentd
- Install Kibana
- Verify installation
- Troubleshooting

Prerequisites

Before starting this guide, you should have the following tools installed: a running Kubernetes cluster, kubectl, and Helm. All of them are used by the commands below.

Add the necessary Helm repositories

The installation is based on Helm, so a few repos are required:

helm repo add stable https://charts.helm.sh/stable
helm repo add elastic https://helm.elastic.co
helm repo update
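
You can confirm the repositories were added and that the Elasticsearch chart is visible before installing anything:

helm repo list
helm search repo elastic/elasticsearch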

Install the Elasticsearch Cluster

The Elasticsearch installation consists of a cluster of one master node and two data nodes, deployed with the following upgrade/installation steps. The helm upgrade --install command installs the chart if the release does not already exist. elasticsearch-curator is also installed to clean up and perform basic maintenance on the indices in the Elasticsearch cluster.

helm -n magma upgrade --install elasticsearch elastic/elasticsearch --values facebook_values_elasticsearch_master_x.yaml
helm -n magma upgrade --install elasticsearch-data elastic/elasticsearch --values facebook_values_elasticsearch_data_x.yaml
helm -n magma upgrade --install elasticsearch-data2 elastic/elasticsearch --values facebook_values_elasticsearch_data2_x.yaml
helm -n magma upgrade --install elasticsearch-curator stable/elasticsearch-curator --values facebook_values_elasticsearch_curator_x.yaml
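
After the four commands complete, confirm that the releases exist and are in the deployed state:

helm -n magma list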

Below are examples for each values file. Adjust the values according to your environment or deployment needs.

Elasticsearch master node

facebook_values_elasticsearch_master_x.yaml
esJavaOpts: -Xmx2G -Xms2G
imageTag: 7.12.0
minimumMasterNodes: 1
replicas: 1
resources:
  limits:
    cpu: "2"
    memory: 4Gi
  requests:
    cpu: "2"
    memory: 4Gi
roles:
  data: "false"
  ingest: "false"
  master: "true"
  ml: "false"
  remote_cluster_client: "false"
volumeClaimTemplate:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn    # <--- Your storageClass
rbac:
  create: true
antiAffinity: "soft"
service:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: elasticsearch.magma.svc.cluster.local    # <--- Your external-dns configuration
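
The volumeClaimTemplate and service annotation above are environment-specific; longhorn and the external-dns hostname are only examples. Before installing, you can confirm that the storage class you reference actually exists in your cluster:

kubectl get storageclass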

Elasticsearch data node 1

facebook_values_elasticsearch_data_x.yaml
esJavaOpts: -Xmx4G -Xms4G
imageTag: 7.12.0
minimumMasterNodes: 1
nodeGroup: data
replicas: 1
resources:
  limits:
    cpu: "1"
    memory: 8Gi
  requests:
    cpu: "1"
    memory: 8Gi
roles:
  data: "true"
  ingest: "true"
  master: "false"
  ml: "false"
  remote_cluster_client: "false"
volumeClaimTemplate:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: longhorn     # <--- Your storageClass

Elasticsearch data node 2

facebook_values_elasticsearch_data2_x.yaml
esJavaOpts: -Xmx4G -Xms4G
imageTag: 7.12.0
minimumMasterNodes: 1
nodeGroup: data2
replicas: 1
resources:
  limits:
    cpu: "1"
    memory: 8Gi
  requests:
    cpu: "1"
    memory: 8Gi
roles:
  data: "true"
  ingest: "true"
  master: "false"
  ml: "false"
  remote_cluster_client: "false"
volumeClaimTemplate:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: longhorn     # <--- Your storageClass
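
Once the master and both data releases are deployed, a quick way to confirm that all three nodes joined the same cluster is the _cat/nodes API. This assumes the elasticsearch-master service is resolvable from your machine, as in the troubleshooting section below:

curl -XGET http://elasticsearch-master.magma.svc.cluster.local:9200/_cat/nodes?v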

Elasticsearch curator

facebook_values_elasticsearch_curator_x.yaml
cronjob:
  schedule: "0 0 * * *"
  annotations: {}
  labels: {}
  concurrencyPolicy: ""
  failedJobsHistoryLimit: ""
  successfulJobsHistoryLimit: ""
  jobRestartPolicy: Never

configMaps:
  action_file_yml: |-
    ---
    actions:
      1:
        action: delete_indices
        description: "Clean up ES by deleting old indices"
        options:
          timeout_override:
          continue_if_exception: False
          disable_action: False
          ignore_empty_list: True
        filters:
        - filtertype: age
          source: name
          direction: older
          timestring: '%Y.%m.%d'
          unit: days
          unit_count: 7
          field:
          stats_result:
          epoch:
          exclude: False
      2:
        action: delete_indices
        description: "Clean up ES by magma log indices if it consumes more than 75% of volume"
        options:
          timeout_override:
          continue_if_exception: False
          disable_action: False
          ignore_empty_list: True
        filters:
        - filtertype: pattern
          kind: prefix
          value: magma-
        - filtertype: space
          disk_space: 10
          use_age: True
          source: creation_date
  config_yml: |-
    ---
    client:
      hosts:
        - elasticsearch-master   # <--- Your elasticsearch master service
      port: 9200
      use_ssl: false
    logging:
      loglevel: "INFO"

Install Fluentd

Fluentd is used to take information from the AGW and store it in the Elasticsearch cluster. Fluentd integrates with the td-agent-bit service on the AGW to obtain logs, CDRs, etc., which means that if that service is not running, Fluentd will not receive any data (see the check after the command below). Its installation is done from the stable repository:

helm -n magma upgrade --install fluentd stable/fluentd --values facebook_values_fluentd_x.yaml
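
Since Fluentd depends on td-agent-bit running on the AGW, it is worth confirming that service before chasing issues on the Kubernetes side. A hedged check, assuming the AGW runs td-agent-bit as a magma systemd unit (the unit name may differ in your AGW release):

sudo systemctl status magma@td-agent-bit    # run on the AGW, not the deployment machine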

Below is an example for the values file. Adjust the values according to your environment or deployment needs.

Fluentd

facebook_values_fluentd_x.yaml
configMaps:
  forward-input.conf: |-
    <source>
      @type forward
      port 24224
      bind 0.0.0.0
      <transport tls>
        ca_path /certs/certifier.pem
        cert_path /certs/fluentd.pem
        private_key_path /certs/fluentd.key
        client_cert_auth true
      </transport>
    </source>
  output.conf: |-
    <match eventd>
      @id eventd_elasticsearch
      @type elasticsearch
      @log_level info
      include_tag_key true
      host "#{ENV['OUTPUT_HOST']}"
      port "#{ENV['OUTPUT_PORT']}"
      scheme "#{ENV['OUTPUT_SCHEME']}"
      ssl_version "#{ENV['OUTPUT_SSL_VERSION']}"
      logstash_format true
      logstash_prefix "eventd"
      reconnect_on_error true
      reload_on_failure true
      reload_connections false
      log_es_400_reason true
      <buffer>
        @type file
        path /var/log/fluentd-buffers/eventd.kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever
        retry_max_interval 30
        chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
        queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
        overflow_action block
      </buffer>
    </match>
    <match **>
      @id elasticsearch
      @type elasticsearch
      @log_level info
      include_tag_key true
      host "#{ENV['OUTPUT_HOST']}"
      port "#{ENV['OUTPUT_PORT']}"
      scheme "#{ENV['OUTPUT_SCHEME']}"
      ssl_version "#{ENV['OUTPUT_SSL_VERSION']}"
      logstash_format true
      logstash_prefix "magma"
      reconnect_on_error true
      reload_on_failure true
      reload_connections false
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever
        retry_max_interval 30
        chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
        queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
        overflow_action block
      </buffer>
    </match>
extraVolumeMounts:
- mountPath: /certs
  name: certs
  readOnly: true
extraVolumes:
- name: certs
  secret:
    defaultMode: 420
    secretName: orc8r-secrets-certs
output:
  host: elasticsearch-master
  port: 9200
  scheme: http
rbac:
  create: false
replicaCount: 2
service:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: fluentd.magma.svc.cluster.local
  ports:
  - containerPort: 24224
    name: "forward"
    protocol: TCP
  type: LoadBalancer
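
After the release is deployed, you can verify that the forward input is exposed through the LoadBalancer service and that both replicas start without TLS or buffer errors (resource names follow the release name used above):

kubectl -n magma get svc fluentd
kubectl -n magma logs deployment/fluentd --tail=20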

Install Kibana

Kibana is used as a visualization tool for the events generated from the AGW to the Orchestrator. In this case, kibana-oss is used, which contains only open-source code. Through the graphical interface you can create dashboards and/or custom views of information related to logs, CDRs, etc. Its installation is done from the stable repository:

helm -n magma upgrade --install kibana stable/kibana --values facebook_values_kibana_x.yaml

Below is an example for the values file. Adjust the values according to your environment or deployment needs.

Kibana

facebook_values_kibana_x.yaml
image:
  repository: docker.elastic.co/kibana/kibana-oss
  tag: 7.10.2
  pullPolicy: IfNotPresent

env:
  LOGGING_VERBOSE: "false"
  ELASTICSEARCH_HOSTS: http://elasticsearch-master:9200
  # SERVER_PORT: 5601

files:
  kibana.yml:
    server.name: kibana
    server.host: "0"
    elasticsearch.hosts: http://elasticsearch-master:9200

dashboardImport:
  enabled: true
  timeout: 60
  dashboards:
    k8s: https://raw.githubusercontent.com/monotek/kibana-dashboards/master/k8s-fluentd-elasticsearch.json

service:
  type: LoadBalancer
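
Once the release is deployed, you can reach the GUI through the external IP of the LoadBalancer service, or with a port-forward when no external IP is available. A sketch, assuming the chart's default external service port of 443:

kubectl -n magma get svc kibana
kubectl -n magma port-forward svc/kibana 5601:443    # then browse to http://localhost:5601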

Verify installation

Wait for the EFK stack to be deployed and check the status of the pods:

kubectl -n magma get pods | grep -E 'elasticsearch|fluentd|kibana'
# NAME                                             READY   STATUS      RESTARTS   AGE
# elasticsearch-curator-1630198800-p89mw           0/1     Completed   0          2d20h
# elasticsearch-curator-1630285200-l5scp           0/1     Completed   0          44h
# elasticsearch-curator-1630368000-zfk4t           0/1     Completed   0          21h
# elasticsearch-data-0                             1/1     Running     0          8d
# elasticsearch-data2-0                            1/1     Running     0          8d
# elasticsearch-master-0                           1/1     Running     0          8d
# fluentd-55ff7685f4-djxkp                         1/1     Running     0          11d
# fluentd-55ff7685f4-jcx67                         1/1     Running     0          11d
# kibana-6b9594695d-s7wrk                          1/1     Running     0          20h
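
With all pods running, a final sanity check is the cluster health, which should report a green status and three nodes:

curl -XGET http://elasticsearch-master.magma.svc.cluster.local:9200/_cluster/health?pretty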

Troubleshooting

This section collects some known errors and tips for identifying problems in the deployment.

Kibana

Some issues in Kibana.

Kibana is not ready yet

Based on: https://discuss.elastic.co/t/error-kibana-server-is-not-ready-yet/156834

When you try to access the Kibana GUI, the error Kibana server is not ready yet may appear. To fix this error, delete the Kibana indices that were created in Elasticsearch and restart the Kibana pod:

curl -XDELETE http://elasticsearch-master.magma.svc.cluster.local:9200/.kibana*
curl -XGET http://elasticsearch-master.magma.svc.cluster.local:9200/.kibana*      # <--- To validate
kubectl -n magma delete pod kibana-<id>

Elasticsearch

Some issues in Elasticsearch.

Check available space to store data

A useful command in the Elasticsearch cluster is one that shows how much space is left to store data. You can check it with:

curl -XGET http://elasticsearch-master.magma.svc.cluster.local:9200/_cat/allocation?v
# shards disk.indices disk.used disk.avail disk.total disk.percent host           ip             node
#      3       71.8mb   432.4mb     48.5gb     48.9gb            0 10.233.67.49   10.233.67.49   elasticsearch-data2-0
#      3       71.6mb   432.1mb     48.5gb     48.9gb            0 10.233.114.130 10.233.114.130 elasticsearch-data-0
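
To see which indices are consuming that space, you can also list them sorted by size:

curl -XGET 'http://elasticsearch-master.magma.svc.cluster.local:9200/_cat/indices?v&s=store.size:desc'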