Log archive - project-hatohol/hatohol GitHub Wiki

This document describes how to construct log archive system with Hatohol.

In this document, "the log archive system" means the log archive system built with Hatohol.

Motivation

Some users (such as carriers) need to keep old logs.

Zabbix stores logs in an RDBMS. This is inefficient because the stored logs are rarely used: they are typically read only when a serious incident occurs.

Logs are large. Storing them in an RDBMS costs a lot of CPU, network, I/O, and disk space.

The log archive system also integrates with the log search system to support searching archived logs.

Hatohol uses Fluentd to build the log archive system.

Architecture

The log archive system is designed around the fact that archived logs are rarely used. It doesn't store logs in an RDBMS; it simply stores them on the file system.

The same logs are stored on two or more nodes to protect against data loss.

The log archive system introduces "log routing nodes" that copy logs and distribute them to suitable "log archive nodes".

The "log routing nodes" cluster will not change, but the "log archive nodes" cluster will: "log archive nodes" will be added to store new logs.

If monitoring target nodes had to know about "log archive nodes", their configurations would need to change whenever that cluster changes. Because there are many monitoring target nodes, changing their configurations is a burden.

With "log routing nodes", monitoring target nodes only need to know about the "log routing nodes". Since the "log routing nodes" cluster doesn't change, the monitoring target nodes' configurations don't change either.

Users only maintain "log routing nodes" and "log archive nodes". They don't need to change monitoring target nodes' configurations frequently.

Here is the log archive system:

+-------------+ +-------------+ +-------------+ Monitoring
|Fluentd      | |Fluentd      | |Fluentd      | target
+-------------+ +-------------+ +-------------+ nodes
collects and    collects and    collects and
forwards logs   forwards logs   forwards logs
|                     |               |
| secure connection   |               |
|                     |               |
\/                    \/              \/
+-------------+ +-------------+
|Fluentd      | |Fluentd      | Log routing nodes
+-------------+ +-------------+
copies and      copies and
distributes     distributes
logs            logs
|             \       |     \
| secure connection   |      \
|               \     |       \
\/              _\|   \/      _\|
+-------------+ +-------------+ +-------------+
|Fluentd      | |Fluentd      | |Fluentd      | Log archive nodes
+-------------+ +-------------+ +-------------+
store logs      store logs      store logs

How to set up

You need to set up the following node types:

  • Log archive node
  • Log routing node
  • Monitoring target node

The following subsections describe how to set up each node type.

All nodes

You need to set up Fluentd on all nodes. This section describes the common set-up procedure.

Fluentd recommends installing ntpd so that timestamps are accurate.

See also: Before Installing Fluentd | Fluentd

Install and run ntpd:

% sudo yum install -y ntp
% sudo chkconfig ntpd on
% sudo service ntpd start

Install Fluentd:

% curl -L http://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh
% sudo chkconfig td-agent on

See also: Installing Fluentd Using rpm Package | Fluentd

Note: td-agent is a Fluentd distribution provided by Treasure Data, Inc. It includes an init script, which makes it suitable for server use.

Confirm that the host name is valid:

% hostname
node1.example.com

If the host name isn't valid, you can set it as follows:

% sudo vi /etc/sysconfig/network
(Change HOSTNAME= line.)
% sudo /sbin/shutdown -r now
% hostname
node1.example.com
(Confirm your host name.)

Log archive node

Install the following Fluentd plugins:

% sudo td-agent-gem install fluent-plugin-secure-forward
% sudo td-agent-gem install fluent-plugin-sort
% sudo td-agent-gem install fluent-plugin-record-reformer
% sudo td-agent-gem install fluent-plugin-forest

Configure Fluentd:

% sudo mkdir -p /var/spool/td-agent/buffer/
% sudo chown -R td-agent:td-agent /var/spool/td-agent/
% sudo mkdir -p /var/log/archive/
% sudo chown -R td-agent:td-agent /var/log/archive/

Create /etc/td-agent/td-agent.conf:

<source>
  type secure_forward
  shared_key fluentd-secret
  self_hostname "#{Socket.gethostname}"
  cert_auto_generate yes
</source>

<match raw.**>
  type sort
  add_tag_prefix sorted.
  buffer_type file
  buffer_path /var/spool/td-agent/buffer/sort
  flush_interval 60
</match>

<match sorted.raw.**>
  type record_reformer
  enable_ruby false

  tag archive.${tag_parts[1]}.${tag_parts[3]}.${tag_suffix[4]}.${tag_parts[2]}
</match>

<match archive.raw.log.**>
  type forest
  remove_prefix archive.raw.log
  escape_tag_separator /
  subtype file
  <template>
    path /var/log/archive/${escaped_tag}
    compress gz
    format single_value
    append true
    flush_interval 60
  </template>
</match>

A log archive node expects message tags in the following format:

raw.${type}.log.${host_name}

For example:

raw.messages.log.node1
raw.apache2.log.node2

Messages are stored to paths in the following format:

/var/log/archive/${host_name}/${type}

Logs are compressed by gzip.
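The tag rewriting and path construction above can be sketched in Python. This is an illustrative model, not the plugins' actual implementation; the helper names are hypothetical, and it assumes a single-part host name (a dotted host name expands into nested directories because escape_tag_separator turns each remaining dot into /).

```python
def rewrite_tag(tag):
    # Mimics the record_reformer rule:
    #   archive.${tag_parts[1]}.${tag_parts[3]}.${tag_suffix[4]}.${tag_parts[2]}
    # where the incoming tag is sorted.raw.${type}.log.${host_name}
    parts = tag.split(".")
    return "archive.{0}.{1}.{2}.{3}".format(
        parts[1], parts[3], ".".join(parts[4:]), parts[2])

def archive_path(tag):
    # Mimics the forest section: remove_prefix archive.raw.log,
    # then escape_tag_separator '/' turns the remaining dots into
    # directory separators under /var/log/archive/
    escaped = tag[len("archive.raw.log."):].replace(".", "/")
    return "/var/log/archive/" + escaped

tag = rewrite_tag("sorted.raw.messages.log.node1")
# tag == "archive.raw.log.node1.messages"
archive_path(tag)
# == "/var/log/archive/node1/messages"
```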

Fluentd doesn't guarantee message order, so archived logs may be out of order. The fluent-plugin-sort plugin reduces such cases.

fluent-plugin-sort buffers received messages before they are stored to files, and sorts the buffered messages when the buffer is flushed. This reduces out-of-order logs but isn't perfect: some logs may still be out of order.

You can further reduce out-of-order logs by increasing the flush_interval parameter in the type sort configuration. With a larger value, more messages are sorted at once, which reduces out-of-order logs. However, a large flush interval delays archiving. Tune this parameter carefully.
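The buffer-then-sort behavior and the flush_interval trade-off can be sketched as follows. This is a simplified model (the function name and event format are made up for illustration), not fluent-plugin-sort's actual code.

```python
def flush_sorted(events, flush_interval):
    """Simulate buffer-then-sort flushing.

    events: list of (arrival_time, log_timestamp) in arrival order.
    Returns log timestamps in the order they would be archived.
    """
    out, buf = [], []
    next_flush = flush_interval
    for arrival, timestamp in events:
        if arrival >= next_flush:        # flush interval has elapsed
            out += sorted(buf)           # sort the buffer, then flush it
            buf = []
            next_flush += flush_interval
        buf.append(timestamp)
    out += sorted(buf)                   # final flush
    return out

# Messages arrive out of order (timestamp 3 before 2, 5 before 4):
events = [(0.1, 1), (0.2, 3), (0.3, 2), (1.1, 5), (1.2, 4)]
flush_sorted(events, 1)      # -> [1, 2, 3, 4, 5]: the buffer restores order
flush_sorted(events, 0.25)   # -> [1, 3, 2, 5, 4]: flushed before 2 and 4 arrive
```

A longer interval lets late messages land in the same buffer as their neighbors, at the cost of delaying the archive by up to that interval.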

Start (or restart) Fluentd:

% sudo service td-agent restart

Log routing node

Install the following Fluentd plugins:

% sudo td-agent-gem install fluent-plugin-secure-forward
% sudo td-agent-gem install fluent-plugin-forest

Configure Fluentd:

% sudo mkdir -p /var/spool/td-agent/buffer/
% sudo chown -R td-agent:td-agent /var/spool/td-agent/

Create /etc/td-agent/td-agent.conf:

<source>
  type secure_forward
  shared_key fluentd-secret
  self_hostname "#{Socket.gethostname}"
  cert_auto_generate yes
</source>

<match raw.*.log.**>
  type forest
  subtype secure_forward

  <template>
    shared_key fluentd-secret
    self_hostname "#{Socket.gethostname}"

    buffer_type file
    buffer_path /var/spool/td-agent/buffer/secure-forward-${escaped_tag}
    flush_interval 1
  </template>
  <case raw.*.log.node1.example.com>
    <server>
      host archiver1.example.com
    </server>
  </case>
  <case raw.*.log.node2.example.com>
    <server>
      host archiver2.example.com
    </server>
  </case>
  <case **>
    <server>
      host archiver2.example.com
    </server>
  </case>
</match>

A log routing node expects message tags in the following format:

raw.${type}.log.${host_name}

For example:

raw.messages.log.node1
raw.apache2.log.node2

All log routing nodes must be able to resolve the names of all log archive nodes. In this example, those names are archiver1.example.com and archiver2.example.com.
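The `<case>` selection in the routing configuration above can be sketched in Python. This is an approximation for illustration only: in Fluentd match patterns, `*` matches exactly one tag part and `**` matches zero or more parts, and this simplified translation covers only the patterns used in this example.

```python
import re

# The <case> sections from the routing configuration, in order:
CASES = [
    ("raw.*.log.node1.example.com", "archiver1.example.com"),
    ("raw.*.log.node2.example.com", "archiver2.example.com"),
    ("**", "archiver2.example.com"),  # catch-all, like <case **>
]

def pattern_to_regex(pattern):
    # '*' matches one dot-separated tag part; '**' matches anything
    regex = re.escape(pattern)
    regex = regex.replace(r"\*\*", ".*")
    regex = regex.replace(r"\*", r"[^.]+")
    return regex

def route(tag):
    # Return the archive host chosen by the first matching <case>
    for pattern, host in CASES:
        if re.fullmatch(pattern_to_regex(pattern), tag):
            return host
    return None

route("raw.messages.log.node1.example.com")  # -> "archiver1.example.com"
route("raw.messages.log.node3.example.com")  # -> "archiver2.example.com" (catch-all)
```

Logs from hosts without a dedicated `<case>` fall through to the catch-all, so adding a new log archive node only requires a new `<case>` on the routing nodes.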

Confirm that a log routing node can resolve names of all log archive nodes:

% ping -c 1 archiver1.example.com
% ping -c 1 archiver2.example.com

If you get the following error message, you must configure your DNS or edit your /etc/hosts:

ping: unknown host archiver1.example.com

Start Fluentd only after the log routing node can resolve the names of all log archive nodes:

% sudo service td-agent restart

Monitoring target node

Install the following Fluentd plugins:

% sudo td-agent-gem install fluent-plugin-secure-forward
% sudo td-agent-gem install fluent-plugin-config-expander

Configure Fluentd:

% sudo mkdir -p /var/spool/td-agent/buffer/
% sudo chown -R td-agent:td-agent /var/spool/td-agent/
% sudo chmod g+r /var/log/messages
% sudo chgrp td-agent /var/log/messages

Create /etc/td-agent/td-agent.conf:

<source>
  type config_expander
  <config>
    type tail
    path /var/log/messages
    pos_file /var/log/td-agent/messages.pos
    tag raw.messages.log.${hostname}
    format none
  </config>
</source>

<match raw.*.log.**>
  type copy

  <store>
    type secure_forward
    shared_key fluentd-secret
    self_hostname "#{Socket.gethostname}"

    buffer_type file
    buffer_path /var/spool/td-agent/buffer/secure-forward-router
    flush_interval 1

    <server>
      host router1.example.com
    </server>
    <server>
      host router2.example.com
    </server>
  </store>
</match>

The monitoring target node configuration for the log archive system can be shared with the configuration for the [log search system](Log search).

You can share the configuration by adding more <store> subsections to the <match raw.*.log.**> section:

<match raw.*.log.**>
  type copy

  <store>
    type secure_forward
    shared_key fluentd-secret
    self_hostname "#{Socket.gethostname}"

    buffer_type file
    buffer_path /var/spool/td-agent/buffer/secure-forward-parser
    flush_interval 1

    <server>
      host parser1.example.com
    </server>
    <server>
      host parser2.example.com
    </server>
  </store>

  <store>
    # ...
  </store>
</match>

All monitoring target nodes must be able to resolve the names of all log routing nodes (and all log parsing nodes). In this example, the log routing node names are router1.example.com and router2.example.com.

Confirm that a monitoring target node can resolve names of all log routing nodes:

% ping -c 1 router1.example.com
% ping -c 1 router2.example.com

If you get the following error message, you must configure your DNS or edit your /etc/hosts:

ping: unknown host router1.example.com

Start Fluentd only after the monitoring target node can resolve the names of all log routing nodes (and all log parsing nodes):

% sudo service td-agent restart