Log archive - project-hatohol/hatohol GitHub Wiki
This document describes how to construct a log archive system with Hatohol.
In this document, "the log archive system" means a log archive system built with Hatohol.
Some users (such as carriers) need to keep old logs.
Zabbix stores logs in an RDBMS. This is inefficient because stored logs go unused in most cases: they are consulted only when a serious incident occurs.
Logs are large. Storing them in an RDBMS is costly in CPU, network, I/O and disk space.
The log archive system also integrates with the Log search system to support searching archived logs.
Hatohol uses Fluentd to build a better log archive system.
The log archive system is designed around the fact that archived logs aren't used in most cases. It doesn't store logs in an RDBMS; it just stores them on the file system.
The same logs are stored on two or more nodes to protect against data loss.
The log archive system introduces "log routing nodes" that copy logs and distribute them to suitable "log archive nodes".
The "log routing nodes" cluster is not expected to change, but the "log archive nodes" cluster is: log archive nodes are added to store new logs.
If monitoring target nodes had to know about "log archive nodes", their configurations would need to change whenever the archive cluster changed. There are many monitoring target nodes, so changing their configurations is troublesome.
With "log routing nodes", monitoring target nodes only need to know about the "log routing nodes". Because the "log routing nodes" cluster doesn't change, the monitoring target nodes' configurations don't change either.
Users just maintain "log routing nodes" and "log archive nodes". They don't need to update monitoring target nodes' configurations frequently.
Here is the log archive system:
+-------------+ +-------------+ +-------------+ Monitoring
|Fluentd | |Fluentd | |Fluentd | target
+-------------+ +-------------+ +-------------+ nodes
collects and collects and collects and
forwards logs forwards logs forwards logs
| | |
| secure connection | |
| | |
\/ \/ \/
+-------------+ +-------------+
|Fluentd | |Fluentd | Log routing nodes
+-------------+ +-------------+
copies and copies and
distributes distributes
logs logs
| \ | \
| secure connection | \
| \ | \
\/ _\| \/ _\|
+-------------+ +-------------+ +-------------+
|Fluentd | |Fluentd | |Fluentd | Log archive nodes
+-------------+ +-------------+ +-------------+
store logs store logs store logs
You need to set up the following node types:
- Log archive node
- Log routing node
- Monitoring target node
The following subsections describe how to set up each node type.
You need to set up Fluentd on all nodes. This section describes the common setup procedure.
Fluentd recommends installing ntpd so that timestamps are valid.
See also: Before Installing Fluentd | Fluentd
Install and run ntpd:
% sudo yum install -y ntp
% sudo chkconfig ntpd on
% sudo service ntpd start
Install Fluentd:
% curl -L http://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh
% sudo chkconfig td-agent on
See also: Installing Fluentd Using rpm Package | Fluentd
Note: td-agent is a Fluentd distribution provided by Treasure Data, Inc. It provides an init script, so it is suitable for server use.
Confirm that the host name is valid:
% hostname
node1.example.com
If the host name isn't valid, you can set it as follows:
% sudo vi /etc/sysconfig/network
(Change HOSTNAME= line.)
% sudo /sbin/shutdown -r now
% hostname
node1.example.com
(Confirm your host name.)
On each log archive node, install the following Fluentd plugins:
% sudo td-agent-gem install fluent-plugin-secure-forward
% sudo td-agent-gem install fluent-plugin-sort
% sudo td-agent-gem install fluent-plugin-record-reformer
% sudo td-agent-gem install fluent-plugin-forest
Configure Fluentd:
% sudo mkdir -p /var/spool/td-agent/buffer/
% sudo chown -R td-agent:td-agent /var/spool/td-agent/
% sudo mkdir -p /var/log/archive/
% sudo chown -R td-agent:td-agent /var/log/archive/
Create /etc/td-agent/td-agent.conf:
<source>
type secure_forward
shared_key fluentd-secret
self_hostname "#{Socket.gethostname}"
cert_auto_generate yes
</source>
<match raw.**>
type sort
add_tag_prefix sorted.
buffer_type file
buffer_path /var/spool/td-agent/buffer/sort
flush_interval 60
</match>
<match sorted.raw.**>
type record_reformer
enable_ruby false
tag archive.${tag_parts[1]}.${tag_parts[3]}.${tag_suffix[4]}.${tag_parts[2]}
</match>
<match archive.raw.log.**>
type forest
remove_prefix archive.raw.log
escape_tag_separator /
subtype file
<template>
path /var/log/archive/${escaped_tag}
compress gz
format single_value
append true
flush_interval 60
</template>
</match>
A log archive node expects message tags in the following format:
raw.${type}.log.${host_name}
For example:
raw.messages.log.node1
raw.apache2.log.node2
Messages are stored under a path of the following form:
/var/log/archive/${host_name}/${type}
Logs are compressed with gzip.
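The mapping from tag to path can be sketched as a small shell function. This is only an illustration: `archive_path` is a hypothetical helper, not part of Hatohol or Fluentd. It assumes that the tag is rearranged as in the record_reformer section above and that every `.` in the remaining tag becomes `/`, which is how `escape_tag_separator /` is read here.

```shell
# Hypothetical helper: predict the archive path for a tag of the form
# raw.${type}.log.${host_name}. Not part of Hatohol or Fluentd.
archive_path() {
  tag=$1
  case $tag in
    raw.*.log.?*) ;;  # expected shape
    *) echo "unexpected tag: $tag" >&2; return 1 ;;
  esac
  rest=${tag#raw.}       # e.g. "messages.log.node1"
  type=${rest%%.log.*}   # e.g. "messages"
  host=${rest#*.log.}    # e.g. "node1"
  # Assumption: as with "escape_tag_separator /", every "." becomes "/".
  printf '/var/log/archive/%s\n' "$(printf '%s.%s' "$host" "$type" | tr . /)"
}

archive_path raw.messages.log.node1   # /var/log/archive/node1/messages
archive_path raw.apache2.log.node2    # /var/log/archive/node2/apache2
```

Under the same assumption, a host name that contains dots would be spread over nested directories, so it may be worth checking the resulting layout on a test node first.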
Fluentd doesn't guarantee message order, so archived logs may be out of order. The fluent-plugin-sort plugin reduces how often this happens. fluent-plugin-sort buffers received messages before they are stored to files and sorts the buffered messages when the buffer is flushed. This reduces out-of-order logs but isn't perfect; some logs may still be out of order.
You can reduce out-of-order logs further by increasing the flush_interval parameter in the type sort configuration. With a larger value, more messages are sorted at once, which reduces out-of-order logs. However, a large flush interval delays log archiving. Tune the parameter carefully.
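The trade-off can be illustrated with a rough simulation. This is a sketch, not how fluent-plugin-sort is implemented: `flush_sorted` and `sample` are hypothetical names. Each input line holds an arrival time, an event time and a one-word message, and messages that arrive within the same flush window are sorted by event time.

```shell
# Hypothetical simulation; input lines are "<arrival_sec> <event_sec> <message>".
# Messages arriving in the same flush window are sorted by event time,
# and the windows are written out in order.
flush_sorted() {
  interval=$1
  awk -v i="$interval" '{ print int($1 / i), $2, $3 }' |
    sort -k1,1n -k2,2n |
    awk '{ printf "%s%s", (NR > 1 ? " " : ""), $3 } END { print "" }'
}

# Four messages; "c" arrives before "b" although "b" happened earlier.
sample() {
  cat <<'EOF'
1 1 a
2 3 c
5 2 b
6 4 d
EOF
}

sample | flush_sorted 3    # a c b d  ("b" misses the window that held "c")
sample | flush_sorted 10   # a b c d  (all four are sorted together)
```

With the larger interval the late message is still in the buffer when sorting happens, but nothing is written until that longer window ends.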
Restart Fluentd so that it is running with the new configuration:
% sudo service td-agent restart
On each log routing node, install the following Fluentd plugins:
% sudo td-agent-gem install fluent-plugin-secure-forward
% sudo td-agent-gem install fluent-plugin-forest
Configure Fluentd:
% sudo mkdir -p /var/spool/td-agent/buffer/
% sudo chown -R td-agent:td-agent /var/spool/td-agent/
Create /etc/td-agent/td-agent.conf:
<source>
type secure_forward
shared_key fluentd-secret
self_hostname "#{Socket.gethostname}"
cert_auto_generate yes
</source>
<match raw.*.log.**>
type forest
subtype secure_forward
<template>
shared_key fluentd-secret
self_hostname "#{Socket.gethostname}"
buffer_type file
buffer_path /var/spool/td-agent/buffer/secure-forward-${escaped_tag}
flush_interval 1
</template>
<case raw.*.log.node1.example.com>
<server>
host archiver1.example.com
</server>
</case>
<case raw.*.log.node2.example.com>
<server>
host archiver2.example.com
</server>
</case>
<case **>
<server>
host archiver2.example.com
</server>
</case>
</match>
A log routing node expects message tags in the following format:
raw.${type}.log.${host_name}
For example:
raw.messages.log.node1
raw.apache2.log.node2
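The routing decision made by the `<case>` blocks in the configuration above can be sketched with a shell `case` statement. `archiver_for` is a hypothetical helper. The sketch assumes that the first matching `<case>` wins, and note that a shell `*` also matches `.`, unlike a single `*` in a Fluentd tag pattern.

```shell
# Hypothetical helper mirroring the <case> blocks of the routing configuration.
# Assumption: the first matching pattern decides the destination.
archiver_for() {
  case $1 in
    raw.*.log.node1.example.com) echo archiver1.example.com ;;
    raw.*.log.node2.example.com) echo archiver2.example.com ;;
    *)                           echo archiver2.example.com ;;  # like <case **>
  esac
}

archiver_for raw.messages.log.node1.example.com   # archiver1.example.com
archiver_for raw.apache2.log.node2.example.com    # archiver2.example.com
archiver_for raw.messages.log.node9.example.com   # archiver2.example.com
```

Adding a log archive node then means adding one more pattern on the log routing nodes only; nothing changes on the monitoring target nodes.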
All log routing nodes must be able to resolve the names of all log archive nodes. In this example, the log archive nodes are archiver1.example.com and archiver2.example.com.
Confirm that a log routing node can resolve names of all log archive nodes:
% ping -c 1 archiver1.example.com
% ping -c 1 archiver2.example.com
If you get the following error message, you must configure your DNS or edit your /etc/hosts.
ping: unknown host archiver1.example.com
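When there are many names to check, the same test can be wrapped in a loop. The following is a minimal sketch: `resolve` and `unresolved_hosts` are hypothetical helper names, and it assumes `getent` is available, as it is on most Linux systems with glibc.

```shell
# resolve NAME: succeed if NAME resolves via DNS or /etc/hosts.
# Assumption: getent is installed.
resolve() {
  getent hosts "$1" >/dev/null 2>&1
}

# unresolved_hosts NAME...: print every name that does not resolve.
# The exit status is non-zero if at least one name failed.
unresolved_hosts() {
  failed=0
  for name in "$@"; do
    if ! resolve "$name"; then
      echo "$name"
      failed=1
    fi
  done
  return "$failed"
}

# Example (prints nothing when every name resolves):
#   unresolved_hosts archiver1.example.com archiver2.example.com
```

Unlike ping, this checks name resolution only; it does not show whether the host is reachable.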
Once the log routing node can resolve the names of all log archive nodes, restart Fluentd:
% sudo service td-agent restart
On each monitoring target node, install the following Fluentd plugins:
% sudo td-agent-gem install fluent-plugin-secure-forward
% sudo td-agent-gem install fluent-plugin-config-expander
Configure Fluentd:
% sudo mkdir -p /var/spool/td-agent/buffer/
% sudo chown -R td-agent:td-agent /var/spool/td-agent/
% sudo chmod g+r /var/log/messages
% sudo chgrp td-agent /var/log/messages
Create /etc/td-agent/td-agent.conf:
<source>
type config_expander
<config>
type tail
path /var/log/messages
pos_file /var/log/td-agent/messages.pos
tag raw.messages.log.${hostname}
format none
</config>
</source>
<match raw.*.log.**>
type copy
<store>
type secure_forward
shared_key fluentd-secret
self_hostname "#{Socket.gethostname}"
buffer_type file
buffer_path /var/spool/td-agent/buffer/secure-forward-router
flush_interval 1
<server>
host router1.example.com
</server>
<server>
host router2.example.com
</server>
</store>
</match>
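The tag that this configuration attaches to each message follows the raw.${type}.log.${host_name} convention that the other node types expect. As a quick check of what a given node will send, the tag can be built by hand. `raw_tag` is a hypothetical helper, and it assumes the `hostname` command prints the same name that `${hostname}` expands to in the configuration.

```shell
# Hypothetical helper: build the tag a monitoring target node attaches,
# following the raw.${type}.log.${host_name} convention.
raw_tag() {
  type=$1
  host=${2:-$(hostname)}   # default to this machine's host name
  printf 'raw.%s.log.%s\n' "$type" "$host"
}

raw_tag messages node1.example.com   # raw.messages.log.node1.example.com
raw_tag apache2 node2                # raw.apache2.log.node2
```

Calling `raw_tag messages` without a host name on a monitoring target node shows the tag to look for in the `<case>` patterns on the log routing nodes.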
The monitoring target node configuration for the log archive system can be shared with the configuration for the [log search system](Log search).
You can share the configuration by adding another <store> subsection to the <match raw.*.log.**> section:
<match raw.*.log.**>
type copy
<store>
type secure_forward
shared_key fluentd-secret
self_hostname "#{Socket.gethostname}"
buffer_type file
buffer_path /var/spool/td-agent/buffer/secure-forward-parser
flush_interval 1
<server>
host parser1.example.com
</server>
<server>
host parser2.example.com
</server>
</store>
<store>
# ...
</store>
</match>
All monitoring target nodes must be able to resolve the names of all log routing nodes (and all log parsing nodes). In this example, the log routing nodes are router1.example.com and router2.example.com.
Confirm that a monitoring target node can resolve names of all log routing nodes:
% ping -c 1 router1.example.com
% ping -c 1 router2.example.com
If you get the following error message, you must configure your DNS or edit your /etc/hosts.
ping: unknown host router1.example.com
Once the monitoring target node can resolve the names of all log routing nodes (and all log parsing nodes), restart Fluentd:
% sudo service td-agent restart