# Logstash
Logstash is run from a config (`.conf`) file and consists of 3 basic sections:

- input
- filter
- output
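A minimal skeleton (just a sketch; the stdin/stdout plugins here are placeholders) shows how the three sections fit together:

```
input {
  stdin { }
}

filter {
  # filters go here
}

output {
  stdout { }
}
```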
e.g. input from a file:
```
file {
  type         => "mule-ee_log"
  path         => [ "/home/mule/mule-enterprise-standalone-3.4.0/logs/mule_ee.log*" ]
  sincedb_path => "/home/logstash"
}
```
Managing multi-line entries, such as an XML document or a stack trace, with the multiline filter:
```
multiline {
  type    => "mule-ee_log"
  pattern => "%{DATESTAMP}"
  negate  => true
  what    => "previous"
}
```
Grok filters then extract fields from each event:

```
grok {
  pattern => "%{LOGLEVEL:loglevel}"
}
grok {
  pattern => "%{DATESTAMP:logdate}"
}
grok {
  pattern => "%{JAVACLASS:javaclass}"
}
grok {
  pattern => "(?<CorrelationID>(?<=<CorrelationID>)(\d*?)(?=</CorrelationID>))"
}
grok {
  pattern => "(?<MessageID>(?<=<MessageID>)(\d*?)(?=</MessageID>))"
}
grok {
  pattern => "(?<MessageType>(?<=<MessageType>)(.*?)(?=</MessageType>))"
}
```
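Newer Logstash releases deprecate the grok `pattern` option in favour of `match`; a sketch of the same pattern-based extractions collapsed into a single grok (field names as above) could look like:

```
grok {
  match => {
    "message" => [
      "%{LOGLEVEL:loglevel}",
      "%{DATESTAMP:logdate}",
      "%{JAVACLASS:javaclass}"
    ]
  }
  # try every pattern rather than stopping at the first match
  break_on_match => false
}
```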
Note: filters can be conditional:
```
filter {
  if [message] =~ /<ROOT/ {
    grok {
      match => [ "message",
        'number="(?<number>\d+)" number2="(?<number2>\d+)"'
      ]
    }
  } else if [message] =~ /<EVENT / {
    grok {
      match => [ "message", 'name="(?<name>[^"]+)"' ]
    }
  }
}
```
Useful grok references (a quick local test harness is sketched below):

- testing grok patterns: http://grokconstructor.appspot.com/do/match
- grok filter docs: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
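For quick local testing, a stdin-to-stdout pipeline (a sketch; swap in whatever pattern you are debugging) prints each parsed event with the rubydebug codec:

```
input {
  stdin { }
}

filter {
  grok {
    # placeholder pattern - replace with the one under test
    match => [ "message", "%{LOGLEVEL:loglevel}" ]
  }
}

output {
  stdout { codec => rubydebug }
}
```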
A fuller example, reassembling multi-line XML documents with the multiline codec and parsing them with the xml filter:

```
input {
  file {
    path           => "/home/richard/apps/logstash-2.0.0-beta3/testXml.log"
    start_position => "beginning"
    codec          => multiline {
      pattern => "^<PhysicalAssetMessage(.*)>"
      negate  => true
      what    => "previous"
    }
  }
}

filter {
  # mutate {
  #   strip => "message"
  # }
  xml {
    source => "message"
    target => "physicalasset"
    # set to true to strip the ns schema from PhysicalAssetMessage
    remove_namespaces => false
    # remove any redundant fields left over
    #remove_field => [ "@version", "@timestamp", "tags", "host", "path", "type", "message" ]
  }
}

output {
  stdout {
  }
  elasticsearch {
    index => "physicalasset"
  }
}
```
- code adapted from a Logstash cookbook
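For reference, a hypothetical `testXml.log` entry that the multiline codec above would stitch into a single event (every line that does not start with `<PhysicalAssetMessage` is appended to the previous line):

```xml
<!-- hypothetical sample document, field names chosen to match the grok examples -->
<PhysicalAssetMessage xmlns="urn:example:physicalasset">
  <CorrelationID>12345</CorrelationID>
  <MessageID>67890</MessageID>
  <MessageType>AssetCreated</MessageType>
</PhysicalAssetMessage>
```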
Elastic seems to have several alternative shippers (the Beats) in the works:

- filebeat - log file shipper
- packetbeat - network packet logging/analyzer
- topbeat - process/CPU statistics shipper
- getting started guide
Filebeat is a Go binary that can be downloaded as per the getting started guide (curl) or fetched with `go get github.com/elastic/filebeat`.

The Filebeat command line is used to start and stop the shipper and to point it at a config file. Note: by default Filebeat picks up its config from `/etc/filebeat/filebeat.yml`.
```sh
sudo /etc/init.d/filebeat start
```
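While testing it can be handy to run Filebeat in the foreground with the standard libbeat flags `-c` (config file) and `-e` (log to stderr); the config path below is an assumption based on a package install:

```sh
# run in the foreground, logging to stderr (config path is an assumption)
filebeat -e -c /etc/filebeat/filebeat.yml
```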
Filebeat uses an indentation-sensitive YAML file for configuration (spaces only; YAML does not allow tabs). A default file to ship all Linux log files would be:
```yaml
filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # Each - is a prospector. Below are the prospector specific configurations
    -
      # Paths that should be crawled and fetched. Glob based paths.
      # For each file found under this path, a harvester is started.
      paths:
        - /var/log/*.log
        # - c:\programdata\elasticsearch\logs\*

      # Type of the files. Based on this the way the file is read is decided.
      # The different types cannot be mixed in one prospector.
      #
      # Possible options are:
      # * log: Reads every line of the log file (default)
      # * stdin: Reads the standard in
      type: log

      # Optional additional fields. These fields can be freely picked
      # to add additional information to the crawled log files for filtering
      #fields:
      #  level: debug
      #  review: 1

      # Ignore files which were modified more than the defined timespan in the past.
      # Time strings like 2h (2 hours), 5m (5 minutes) can be used.
      #ignore_older:

      # Scan frequency in seconds.
      # How often these files should be checked for changes. In case it is set
      # to 0s, it is done as often as possible. Default: 10s
      #scan_frequency: 10s

      # Defines the buffer size every harvester uses when fetching the file
      #harvester_buffer_size: 16384

      # Always tail on log rotation. Disabled by default.
      # Note: this may skip entries
      #tail_on_rotate: false

      # Configure the file encoding for reading files with international characters.
      # Supported encodings:
      #   plain, utf-8, utf-16be-bom, utf-16be, utf-16le, big5, gb18030, gbk, hzgb2312,
      #   euckr, eucjp, iso2022jp, shiftjis, iso8859-6e, iso8859-6i, iso8859-8e,
      #   iso8859-8i
      #encoding: plain

output:
  elasticsearch:
    # Enable Elasticsearch as output
    enabled: true

    # The Elasticsearch cluster
    hosts: ["http://localhost:9200"]

    # Comment this option if you don't want to store the topology in
    # Elasticsearch. The default is false.
    # This option makes sense only for Packetbeat.
    save_topology: true

    # Optional index name. The default is filebeat and generates
    # filebeat-YYYY.MM.DD indexes.
    index: "filebeat"
```