EFK Deployment using Helm
caprivm ([email protected])
This page describes the steps needed to install the EFK (Elasticsearch, Fluentd, Kibana) stack for Magma services. The elasticsearch installation is based on a cluster of 1 master node and 2 data nodes, fluentd is used to take data from the AGW and store it in Elasticsearch, and kibana is used as a visualization tool.
All the steps described in this section have been tested on a Deployment Machine with the following requirements.
Feature | Value |
---|---|
OS Used | Ubuntu 18.04 LTS |
vCPU | 2 |
RAM (GB) | 4 |
Disk (GB) | 50 |
Home user | ubuntu |
Kubernetes namespace | magma |
The contents of the page are:
- Description
- Prerequisites
- Install the Elasticsearch Cluster
- Install Fluentd
- Install Kibana
- Verify installation
- Troubleshooting
Before starting this guide, you should have installed the following tools. You can check the adjacent links if you haven't already:
The installation is based on Helm, so a few repos are required:
helm repo add stable https://charts.helm.sh/stable
helm repo add elastic https://helm.elastic.co
helm repo update
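To confirm that the charts are now visible locally, you can search the repositories, for example:
helm search repo elastic/elasticsearch
helm search repo stable/elasticsearch-curator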
The Elasticsearch installation consists of a cluster of 1 master node and 2 data nodes. To deploy it, consider the following upgrade/installation steps. The helm upgrade --install
command installs the chart if it does not already exist. In addition, the elasticsearch-curator
is installed to clean up and perform basic maintenance on the indices in the elasticsearch
cluster.
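If the magma namespace used throughout this guide does not exist yet, create it before installing the charts:
kubectl create namespace magma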
helm -n magma upgrade --install elasticsearch elastic/elasticsearch --values facebook_values_elasticsearch_master_x.yaml
helm -n magma upgrade --install elasticsearch-data elastic/elasticsearch --values facebook_values_elasticsearch_data_x.yaml
helm -n magma upgrade --install elasticsearch-data2 elastic/elasticsearch --values facebook_values_elasticsearch_data2_x.yaml
helm -n magma upgrade --install elasticsearch-curator stable/elasticsearch-curator --values facebook_values_elasticsearch_curator_x.yaml
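You can then list the releases in the namespace to confirm that the four charts were deployed, for example:
helm -n magma list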
Below are examples for each values file. Adjust the values according to your environment or deployment need.
Elasticsearch master node
facebook_values_elasticsearch_master_x.yaml
esJavaOpts: -Xmx2G -Xms2G
imageTag: 7.12.0
minimumMasterNodes: 1
replicas: 1
resources:
  limits:
    cpu: "2"
    memory: 4Gi
  requests:
    cpu: "2"
    memory: 4Gi
roles:
  data: "false"
  ingest: "false"
  master: "true"
  ml: "false"
  remote_cluster_client: "false"
volumeClaimTemplate:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn # <--- Your storageClass
rbac:
  create: true
antiAffinity: "soft"
service:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: elasticsearch.magma.svc.cluster.local # <--- Your external-dns configuration
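Note that storageClassName (longhorn in these examples) must match a StorageClass that already exists in your cluster; you can list the available classes with:
kubectl get storageclass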
Elasticsearch data node 1
facebook_values_elasticsearch_data_x.yaml
esJavaOpts: -Xmx4G -Xms4G
imageTag: 7.12.0
minimumMasterNodes: 1
nodeGroup: data
replicas: 1
resources:
  limits:
    cpu: "1"
    memory: 8Gi
  requests:
    cpu: "1"
    memory: 8Gi
roles:
  data: "true"
  ingest: "true"
  master: "false"
  ml: "false"
  remote_cluster_client: "false"
volumeClaimTemplate:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: longhorn # <--- Your storageClass
Elasticsearch data node 2
facebook_values_elasticsearch_data2_x.yaml
esJavaOpts: -Xmx4G -Xms4G
imageTag: 7.12.0
minimumMasterNodes: 1
nodeGroup: data2
replicas: 1
resources:
  limits:
    cpu: "1"
    memory: 8Gi
  requests:
    cpu: "1"
    memory: 8Gi
roles:
  data: "true"
  ingest: "true"
  master: "false"
  ml: "false"
  remote_cluster_client: "false"
volumeClaimTemplate:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: longhorn # <--- Your storageClass
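Once the master and both data releases are running, you can optionally verify that all three nodes have joined the cluster, for example:
curl -XGET http://elasticsearch-master.magma.svc.cluster.local:9200/_cat/nodes?v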
Elasticsearch curator
facebook_values_elasticsearch_curator_x.yaml
cronjob:
  schedule: "0 0 * * *"
  annotations: {}
  labels: {}
  concurrencyPolicy: ""
  failedJobsHistoryLimit: ""
  successfulJobsHistoryLimit: ""
  jobRestartPolicy: Never
configMaps:
  action_file_yml: |-
    ---
    actions:
      1:
        action: delete_indices
        description: "Clean up ES by deleting old indices"
        options:
          timeout_override:
          continue_if_exception: False
          disable_action: False
          ignore_empty_list: True
        filters:
        - filtertype: age
          source: name
          direction: older
          timestring: '%Y.%m.%d'
          unit: days
          unit_count: 7
          field:
          stats_result:
          epoch:
          exclude: False
      2:
        action: delete_indices
        description: "Clean up ES by magma log indices if it consumes more than 75% of volume"
        options:
          timeout_override:
          continue_if_exception: False
          disable_action: False
          ignore_empty_list: True
        filters:
        - filtertype: pattern
          kind: prefix
          value: magma-
        - filtertype: space
          disk_space: 10
          use_age: True
          source: creation_date
  config_yml: |-
    ---
    client:
      hosts:
        - elasticsearch-master:9200 # <--- Your elasticsearch master service and port
      port: 9200
      use_ssl: false
    logging:
      loglevel: "INFO"
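The curator runs as a Kubernetes CronJob according to the schedule above ("0 0 * * *", i.e. once a day). If you do not want to wait for the schedule, you can trigger a one-off run manually; assuming the CronJob created by the chart is named elasticsearch-curator (check with kubectl -n magma get cronjobs):
kubectl -n magma create job --from=cronjob/elasticsearch-curator elasticsearch-curator-manual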
Fluentd is used to take information from the AGW and store it in the elasticsearch
cluster. Fluentd integrates with the td-agent-bit
service on the AGW to collect logs, CDRs, etc., which means that if that service is not running, Fluentd will not receive the necessary information. Its installation is done from the stable
repository:
helm -n magma upgrade --install fluentd stable/fluentd --values facebook_values_fluentd_x.yaml
Below is an example for the values file. Adjust the values according to your environment or deployment need.
Fluentd
facebook_values_fluentd_x.yaml
configMaps:
  forward-input.conf: |-
    <source>
      @type forward
      port 24224
      bind 0.0.0.0
      <transport tls>
        ca_path /certs/certifier.pem
        cert_path /certs/fluentd.pem
        private_key_path /certs/fluentd.key
        client_cert_auth true
      </transport>
    </source>
  output.conf: |-
    <match eventd>
      @id eventd_elasticsearch
      @type elasticsearch
      @log_level info
      include_tag_key true
      host "#{ENV['OUTPUT_HOST']}"
      port "#{ENV['OUTPUT_PORT']}"
      scheme "#{ENV['OUTPUT_SCHEME']}"
      ssl_version "#{ENV['OUTPUT_SSL_VERSION']}"
      logstash_format true
      logstash_prefix "eventd"
      reconnect_on_error true
      reload_on_failure true
      reload_connections false
      log_es_400_reason true
      <buffer>
        @type file
        path /var/log/fluentd-buffers/eventd.kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever
        retry_max_interval 30
        chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
        queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
        overflow_action block
      </buffer>
    </match>
    <match **>
      @id elasticsearch
      @type elasticsearch
      @log_level info
      include_tag_key true
      host "#{ENV['OUTPUT_HOST']}"
      port "#{ENV['OUTPUT_PORT']}"
      scheme "#{ENV['OUTPUT_SCHEME']}"
      ssl_version "#{ENV['OUTPUT_SSL_VERSION']}"
      logstash_format true
      logstash_prefix "magma"
      reconnect_on_error true
      reload_on_failure true
      reload_connections false
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever
        retry_max_interval 30
        chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
        queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
        overflow_action block
      </buffer>
    </match>
extraVolumeMounts:
  - mountPath: /certs
    name: certs
    readOnly: true
extraVolumes:
  - name: certs
    secret:
      defaultMode: 420
      secretName: orc8r-secrets-certs
output:
  host: elasticsearch-master
  port: 9200
  scheme: http
rbac:
  create: false
replicaCount: 2
service:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: fluentd.magma.svc.cluster.local
  ports:
    - containerPort: 24224
      name: "forward"
      protocol: TCP
  type: LoadBalancer
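The forward input listens on TCP port 24224 and is exposed through a LoadBalancer service, so the td-agent-bit service on the AGW must be able to reach that address. Assuming the service is named fluentd after the release, you can check the external address assigned to it with:
kubectl -n magma get svc fluentd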
Kibana is used as a visualization tool for the events generated from the AGW to the Orchestrator. In this case kibana-oss
is used, which contains only open source code. Through the graphical interface you can create dashboards and/or custom views of information related to logs, CDRs, etc. Its installation is done from the stable
repository:
helm -n magma upgrade --install kibana stable/kibana --values facebook_values_kibana_x.yaml
Below is an example for the values file. Adjust the values according to your environment or deployment need.
Kibana
facebook_values_kibana_x.yaml
image:
  repository: docker.elastic.co/kibana/kibana-oss
  tag: 7.10.2
  pullPolicy: IfNotPresent
env:
  LOGGING_VERBOSE: "false"
  ELASTICSEARCH_HOSTS: http://elasticsearch-master:9200
  # SERVER_PORT: 5601
files:
  kibana.yml:
    server.name: kibana
    server.host: "0"
    elasticsearch.hosts: http://elasticsearch-master:9200
dashboardImport:
  enabled: true
  timeout: 60
  dashboards:
    k8s: https://raw.githubusercontent.com/monotek/kibana-dashboards/master/k8s-fluentd-elasticsearch.json
service:
  type: LoadBalancer
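Kibana is also exposed through a LoadBalancer service and listens on port 5601 by default. Assuming the service is named kibana after the release, you can look up its external address with:
kubectl -n magma get svc kibana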
Wait for the deployment of the EFK stack and check the status of the pods:
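For example, assuming everything was deployed in the magma namespace:
kubectl -n magma get pods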
# NAME READY STATUS RESTARTS AGE
# elasticsearch-curator-1630198800-p89mw 0/1 Completed 0 2d20h
# elasticsearch-curator-1630285200-l5scp 0/1 Completed 0 44h
# elasticsearch-curator-1630368000-zfk4t 0/1 Completed 0 21h
# elasticsearch-data-0 1/1 Running 0 8d
# elasticsearch-data2-0 1/1 Running 0 8d
# elasticsearch-master-0 1/1 Running 0 8d
# fluentd-55ff7685f4-djxkp 1/1 Running 0 11d
# fluentd-55ff7685f4-jcx67 1/1 Running 0 11d
# kibana-6b9594695d-s7wrk 1/1 Running 0 20h
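You can also check the overall health of the elasticsearch cluster; with one master and two data nodes the status should normally be green:
curl -XGET http://elasticsearch-master.magma.svc.cluster.local:9200/_cluster/health?pretty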
This section collects some known errors and tips for identifying problems in the deployment.
Some issues in Kibana.
Based on: https://discuss.elastic.co/t/error-kibana-server-is-not-ready-yet/156834
When you try to access the Kibana GUI, the error Kibana server is not ready yet
appears. To fix this error, delete the Kibana indices that were created in elasticsearch
and restart the Kibana pod:
curl -XDELETE http://elasticsearch-master.magma.svc.cluster.local:9200/.kibana*
curl -XGET http://elasticsearch-master.magma.svc.cluster.local:9200/.kibana* # <--- To validate
kubectl -n magma delete pod kibana-<id>
Some issues in Elasticsearch.
A useful command for the elasticsearch
cluster is to check how much space is left to store information. You can do it through:
curl -XGET http://elasticsearch-master.magma.svc.cluster.local:9200/_cat/allocation?v
# shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
# 3 71.8mb 432.4mb 48.5gb 48.9gb 0 10.233.67.49 10.233.67.49 elasticsearch-data2-0
# 3 71.6mb 432.1mb 48.5gb 48.9gb 0 10.233.114.130 10.233.114.130 elasticsearch-data-0
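To see which indices exist and how much space each one uses (for example, to confirm that the curator is deleting old magma-* indices), you can list them with:
curl -XGET http://elasticsearch-master.magma.svc.cluster.local:9200/_cat/indices?v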