Grok - ignacio-alorre/ElasticSearch GitHub Wiki
The Grok filter parses unstructured data into structured fields.
Grok uses regular expressions behind the scenes.
Many common expressions are already predefined in Logstash's Grok filter, so we can use their pattern names instead of writing those complicated strings of characters ourselves. For example:
Value: [email protected]
Regex Pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,3})$
Predefined Grok Pattern: %{EMAILADDRESS:client_email}
This Grok pattern matches an email address in the input and stores it in a field named client_email.
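As a minimal sketch, the predefined pattern could be used in a Logstash filter block like this. It assumes the raw line arrives in the default message field; the field name client_email is the one used above.

```conf
filter {
  grok {
    # Grok patterns are not anchored unless we add ^ or $ ourselves,
    # so the email address may appear anywhere in the line.
    match => { "message" => "%{EMAILADDRESS:client_email}" }
  }
}
```

Events that do not match are tagged with _grokparsefailure by default, which makes it easy to spot lines the pattern does not cover.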
We can debug the behaviour of the Grok filter at this link
For example, for the input
[email protected] DEBUG A simple log
with the filter
%{NGUSER:email} %{LOGLEVEL:logLevel} %{GREEDYDATA:logMessage}
the output would be as follows (the extra NGUSERNAME key comes from the nested pattern that NGUSER is built on):
{
  "email": [
    [
      "[email protected]"
    ]
  ],
  "NGUSERNAME": [
    [
      "[email protected]"
    ]
  ],
  "logLevel": [
    [
      "DEBUG"
    ]
  ],
  "logMessage": [
    [
      "A simple log"
    ]
  ]
}
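Then a Grok filter needs to be added to the Logstash pipeline configuration. The example below expects each line to start with an ISO 8601 timestamp: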
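input {
  file {
    path => "/home/student/03-grok-examples/sample.log"
    # Read the file from the start instead of only tailing new lines
    start_position => "beginning"
    # Do not persist the read position, so the file is re-read on every run
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    # Split each raw line into a timestamp, a log level, and the remaining text
    match => { "message" => ['%{TIMESTAMP_ISO8601:time} %{LOGLEVEL:logLevel} %{GREEDYDATA:logMessage}'] }
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "demo-grok"
  }
  # Also print each event to the console
  stdout {}
}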
Source
######################
Grok is a filter for parsing unstructured data into structured data.
It uses regular expressions to match patterns in the input.
Inside the grok block there is a match option keyed by message, the field that holds each incoming event in its raw form.
The value assigned to message in our filter block is where we define what each piece of data is called and which pattern it must match.
In the example from the video [reference to later add images], the first pattern is IP, which matches the first part of the log line (an IP address). The match block has IP:c, so Grok assigns that value to the key c in the structured output. The next pattern matches a WORD, which is assigned to m for method.
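The match described above can be sketched as the following filter block. This is a reconstruction from the description only, not the exact configuration shown in the video; it assumes a log line that begins with an IP address followed by a space and the HTTP method, for instance a hypothetical line such as `203.0.113.7 GET /index.html`.

```conf
filter {
  grok {
    # %{IP:c}   stores the leading IP address under the key "c"
    # %{WORD:m} stores the next word (the method) under the key "m"
    match => { "message" => "%{IP:c} %{WORD:m}" }
  }
}
```

Anything after the second token is left unparsed here; appending a pattern such as %{GREEDYDATA:rest} (the name rest is only illustrative) would capture the remainder of the line.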
Mutate is another plugin you are likely to use when filtering data. It is very versatile because it supports many transformations, such as renaming, converting, or removing fields. In this example's filter block, a Grok filter first extracts the fields, and a mutate block then operates on the results.
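A minimal sketch of a Grok filter followed by a mutate filter might look like this. The short keys c and m come from the example above; the new names client_ip and method, and the decision to drop the raw message, are assumptions made purely for illustration.

```conf
filter {
  grok {
    # Extract the IP address and the method from the raw line
    match => { "message" => "%{IP:c} %{WORD:m}" }
  }
  mutate {
    # Give the short keys more descriptive names (hypothetical names)
    rename => {
      "c" => "client_ip"
      "m" => "method"
    }
    # Drop the raw line once its parts have been extracted
    remove_field => ["message"]
  }
}
```

Filters run in the order they appear in the configuration, so the mutate block only sees c and m after Grok has created them.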
This website, Grok Constructor, will help us build Grok patterns: