Lab 4.1 Configuring Logstash - Hsanokklis/2023-2024-Tech-journal GitHub Wiki
Helpful Info
The Public IPv4 address will change every new session.
Current IPv4 address in use: 3.89.90.137
To access your instance (replace public-ip with the current public IPv4 address):
ssh -i hannelore-elk-key.pem ubuntu@public-ip
Private IPv4 address: 172.31.87.23
When you next log in to your system, make sure to start everything again.
- Start in this order: Elasticsearch, Logstash, Kibana
To get to the Kibana dashboard, type http://public-ip:5601 into your browser.
Helpful Vocab!
Logstash pipeline
A Logstash pipeline has two required elements, input and output, and one optional element, filter. The input plugins consume data from a source, the filter plugins modify the data as you specify, and the output plugins write the data to a destination.
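The input → filter → output flow described above can be pictured as a loop over events. The sketch below is only a mental model in Python, not how Logstash is implemented: an "event" is assumed to be a plain dict, and the names run_pipeline and Filter are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable

Event = dict
Filter = Callable[[Event], Event]


def run_pipeline(
    inputs: Iterable[Event],
    filters: list[Filter],
    output: Callable[[Event], None],
) -> None:
    """Send each input event through every filter in order, then to the output."""
    for event in inputs:
        for apply_filter in filters:  # an empty list models the optional filter section
            event = apply_filter(event)
        output(event)


# Usage: a stdin-like list of events, one uppercase filter, and a list as the output.
printed: list[Event] = []
run_pipeline(
    [{"message": "hello world"}],
    [lambda e: {**e, "message": e["message"].upper()}],
    printed.append,
)
print(printed)  # [{'message': 'HELLO WORLD'}]
```

Passing an empty filter list gives the two-element (input and output only) pipeline used in Step 1.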
Lab-Prep
For now, we will stop Filebeat, Metricbeat, and Logstash on your test system:
- sudo systemctl stop filebeat
- sudo systemctl stop metricbeat
- sudo systemctl stop logstash
It also seems we might need some more memory in our AWS instances as we add to our Elastic Stack server:
- Power off/stop your server
- From the AWS Console, change the instance type to t2.large
How to change the instance type:
- Go to Instance state
- Choose Stop
- Actions ---> Instance settings ---> Change instance type
- Restart your server/instance and you should be in good shape!
Step 1: Configuring a Basic Pipeline
First, let’s test the Logstash installation by running the most basic Logstash pipeline, which uses only the two required elements: an input and an output. For example:
cd /usr/share/logstash
bin/logstash -e 'input { stdin { } } output { stdout {} }'
The -e flag enables you to specify a configuration directly from the command line. Specifying configurations at the command line lets you quickly test configurations without having to edit a file between iterations. The pipeline in the example takes input from the standard input, stdin, and moves that input to the standard output, stdout, in a structured format.
After starting Logstash, wait until you see "Pipeline started" and "The stdin plugin is now waiting for input:" (or "Successfully started Logstash API endpoint"), and then enter hello world at the command prompt:
hello world
{
"@version" => "1",
"@timestamp" => 2020-12-07T15:00:35.568Z,
"host" => "agelk",
"message" => "hello world"
}
Logstash adds timestamp, version, and host information to the message. Exit Logstash by pressing CTRL-D in the shell where Logstash is running.
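To see what those extra fields are, here is a rough Python imitation of how the stdin pipeline wraps each line in the hello world output above. It is an illustrative sketch only: make_event is a hypothetical helper, and real Logstash events carry more internal state than this dict.

```python
import socket
from datetime import datetime, timezone


def make_event(line: str) -> dict:
    """Wrap one raw input line with the extra fields shown in the hello world output."""
    now = datetime.now(timezone.utc)
    return {
        "@version": "1",
        # ISO 8601 UTC timestamp with millisecond precision, e.g. 2020-12-07T15:00:35.568Z
        "@timestamp": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "host": socket.gethostname(),  # name of the machine that received the event
        "message": line.rstrip("\n"),  # the original text, minus the trailing newline
    }


event = make_event("hello world\n")
print(event["message"])  # hello world
```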
Congratulations! You’ve created and run a basic Logstash pipeline. Next, you learn how to create a more realistic pipeline.
Submit Screenshot of basic pipeline
Step 2: Basic File Parsing with Logstash
Download some sample data for testing
- cd /root (or your home directory, e.g. /home/user)
- wget https://download.elastic.co/demos/logstash/gettingstarted/logstash-tutorial.log.gz
- gunzip logstash-tutorial.log.gz
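If wget or gunzip is not available, the two download steps above can be reproduced with the Python standard library. This is a sketch under the assumption that Python 3 is installed on the instance; unlike gunzip, this gunzip helper keeps the original .gz file.

```python
from __future__ import annotations

import gzip
import shutil
import urllib.request
from pathlib import Path

SAMPLE_URL = "https://download.elastic.co/demos/logstash/gettingstarted/logstash-tutorial.log.gz"


def download(url: str = SAMPLE_URL, dest: str = "logstash-tutorial.log.gz") -> Path:
    """Equivalent of the wget step: save the compressed sample log locally."""
    urllib.request.urlretrieve(url, dest)
    return Path(dest)


def gunzip(src: str | Path) -> Path:
    """Equivalent of the gunzip step, except that the .gz file is kept."""
    src_path = Path(src)
    dest = src_path.with_suffix("")  # logstash-tutorial.log.gz -> logstash-tutorial.log
    with gzip.open(src_path, "rb") as compressed, open(dest, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return dest


# Usage (needs network access):
# log_file = gunzip(download())
```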
This is a sample Apache access log we will use for testing.
Next, you create a Logstash configuration pipeline that uses the Beats input plugin to receive events from Beats.
The following text represents the skeleton of a configuration pipeline:
# The # character at the beginning of a line indicates a comment. Use
# comments to describe your configuration.
input {
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
}
This skeleton is non-functional, because the input and output sections don’t have any valid options defined.
To get started, copy and paste the skeleton configuration pipeline into a file named first-pipeline.conf in your home Logstash directory (/usr/share/logstash).
Next, configure your Logstash instance to use the Beats input plugin by adding the following lines to the input section of the first-pipeline.conf file:
beats {
port => "5044"
}
You’ll configure Logstash to write to Elasticsearch later. For now, you can add the following line to the output section so that the output is printed to stdout when you run Logstash:
stdout { codec => rubydebug }
When you’re done, the contents of first-pipeline.conf should look like this:
input {
beats {
port => "5044"
}
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
stdout { codec => rubydebug }
}
To verify your configuration, run the following command:
bin/logstash -f first-pipeline.conf --config.test_and_exit
The --config.test_and_exit option parses your configuration file and reports any errors.
Configure Filebeat to send Test Log to Logstash
We will create a test Filebeat YAML file to use for this sample.
Open a new terminal window (we want Filebeat and Logstash in separate windows).
Create a new file in /etc/filebeat called test-filebeat.yml with the following lines. Make sure paths points to the example Apache log file, logstash-tutorial.log, that you downloaded earlier (NOTE: if you copy/paste, please make sure that the indentation is correct!):
filebeat.inputs:
- type: log
  paths:
    - /path/to/file/logstash-tutorial.log
output.logstash:
  hosts: ["your_server_IP:5044"]
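Because YAML indentation is easy to break when copying and pasting, one option is to generate the file contents. The Python sketch below builds the same test-filebeat.yml text; the function name filebeat_test_config and the example log path are hypothetical, and the server IP is the lab's private address.

```python
def filebeat_test_config(log_path: str, logstash_host: str, port: int = 5044) -> str:
    """Return test-filebeat.yml contents with the indentation Filebeat expects."""
    lines = [
        "filebeat.inputs:",
        "- type: log",
        "  paths:",                         # two spaces: belongs to the "- type: log" item
        f"    - {log_path}",                # four spaces: one entry in the paths list
        "output.logstash:",
        f'  hosts: ["{logstash_host}:{port}"]',
    ]
    return "\n".join(lines) + "\n"


cfg = filebeat_test_config("/home/ubuntu/logstash-tutorial.log", "172.31.87.23")
print(cfg)
# Writing it needs root, e.g.:
# pathlib.Path("/etc/filebeat/test-filebeat.yml").write_text(cfg)
```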
Start Logstash and Filebeat to send and parse the test log
Starting Logstash
For the test, from the Logstash terminal, start Logstash with the following command:
- bin/logstash -f first-pipeline.conf --config.reload.automatic
The --config.reload.automatic option enables automatic config reloading so that you don’t have to stop and restart Logstash every time you modify the configuration file.
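Automatic reloading works by periodically checking the config file for changes (by default every 3 seconds, controlled by the config.reload.interval setting). The Python class below is a toy analogue of that idea, assuming change detection by modification time; it illustrates the concept and does not claim to be how Logstash itself detects changes.

```python
import os


class ConfigWatcher:
    """Toy analogue of automatic config reloading: report when a file's mtime changes."""

    def __init__(self, path: str):
        self.path = path
        self.last_mtime = os.stat(path).st_mtime_ns

    def changed(self) -> bool:
        """Return True once per modification, then False until the file changes again."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self.last_mtime:
            self.last_mtime = mtime
            return True
        return False


# Usage sketch (reload_pipeline is hypothetical):
# watcher = ConfigWatcher("first-pipeline.conf")
# while True:
#     if watcher.changed():
#         reload_pipeline()
#     time.sleep(3)
```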
As Logstash starts up, you might see one or more warning messages about Logstash ignoring the pipelines.yml file. You can safely ignore this warning. The pipelines.yml file is used for running multiple pipelines in a single Logstash instance. For the examples shown here, you are running a single pipeline.
Starting Filebeat
From the Filebeat terminal window, enter the following:
sudo filebeat -e -c test-filebeat.yml -d "publish"
If your pipeline is working correctly, you should see a series of events like the following written to the console:
{
"@timestamp" => 2017-11-09T01:44:20.071Z,
"offset" => 325,
"@version" => "1",
"beat" => {
"name" => "My-MacBook-Pro.local",
"hostname" => "My-MacBook-Pro.local",
"version" => "6.0.0"
},
"host" => "My-MacBook-Pro.local",
"prospector" => {
"type" => "log"
},
"input" => {
"type" => "log"
},
"source" => "/path/to/file/logstash-tutorial.log",
"message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
"tags" => [
[0] "beats_input_codec_plain_applied"
]
}
...
Submit Screenshot of Logstash output
Parsing Web Logs with the Grok Filter Plugin
Now you have a working test pipeline that reads log lines from Filebeat. However, you’ll notice that the format of the log messages is not ideal. You want to parse the log messages to create specific, named fields from the logs. To do this, you’ll use the grok filter plugin.
The grok filter plugin is one of several plugins that are available by default in Logstash. The grok filter plugin enables you to parse the unstructured log data into something structured and queryable.
Because the grok filter plugin looks for patterns in the incoming log data, configuring the plugin requires you to make decisions about how to identify the patterns that are of interest to your use case. A representative line from the web server log sample looks like this:
83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
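The IP address at the beginning of the line is easy to identify, as is the timestamp in brackets. To parse the data, you can use the %{COMBINEDAPACHELOG} grok pattern, which structures lines from the Apache log using the following schema (information: field name):
- IP address: clientip
- User ID: ident
- User authentication: auth
- Timestamp: timestamp
- HTTP verb: verb
- Request body: request
- HTTP version: httpversion
- HTTP status code: response
- Bytes served: bytes
- Referrer URL: referrer
- User agent: agent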
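Grok patterns are named, reusable regular expressions. To show roughly what %{COMBINEDAPACHELOG} does to the sample line, here is a simplified Python regular expression with named groups. It is an approximation written for this example, not the actual grok pattern (which is more permissive); the field names match the ones that appear in the grok output later in this lab, and the fallback tag mirrors grok's default _grokparsefailure tag.

```python
import re

# Simplified stand-in for %{COMBINEDAPACHELOG}; the real grok pattern handles more edge cases.
COMBINED_RE = re.compile(
    r'^(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[^"]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'(?P<referrer>"[^"]*") (?P<agent>"[^"]*")$'
)


def parse_combined(message: str) -> dict:
    """Return the original message plus named fields, or a failure tag if it does not match."""
    match = COMBINED_RE.match(message)
    if match is None:
        return {"message": message, "tags": ["_grokparsefailure"]}
    # Like grok, every captured value is a string (e.g. response is "200", not 200).
    return {"message": message, **match.groupdict()}
```

Note that referrer and agent keep their surrounding double quotes, just as they do in the Logstash output.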
Edit the first-pipeline.conf file and replace the entire filter section with the following text:
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}"}
}
}
When you’re done, the contents of first-pipeline.conf should look like this:
input {
beats {
port => "5044"
}
}
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}"}
}
}
output {
stdout { codec => rubydebug }
}
Save your changes. Because you’ve enabled automatic config reloading, you don’t have to restart Logstash to pick up your changes. However, you do need to force Filebeat to read the log file from scratch. To do this, go to the terminal window where Filebeat is running and press Ctrl+C to shut down Filebeat. Then delete the Filebeat registry directory. For example, run:
sudo rm -rf /var/lib/filebeat/registry
Since Filebeat stores the state of each file it harvests in the registry, deleting the registry forces Filebeat to read all the files it’s harvesting from scratch.
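Why does deleting the registry cause a full re-read? The registry records how far into each file Filebeat has already read. The Python sketch below models that with a byte offset saved to a JSON file; the registry format and the harvest function are hypothetical simplifications, not Filebeat's real on-disk format.

```python
from __future__ import annotations

import json
import os


def harvest(log_path: str, registry_path: str) -> list[str]:
    """Return lines added since the offset stored in registry_path, then save the new offset."""
    offset = 0
    if os.path.exists(registry_path):  # no registry means "start from the beginning"
        with open(registry_path) as f:
            offset = json.load(f).get(log_path, 0)
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
        new_offset = f.tell()
    with open(registry_path, "w") as f:
        json.dump({log_path: new_offset}, f)
    return data.decode("utf-8").splitlines()
```

Calling harvest twice returns the lines once and then an empty list; removing the registry file makes the next call return every line again, which is the effect of the rm -rf step above.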
Next, restart Filebeat with the following command:
- sudo filebeat -e -c test-filebeat.yml -d "publish"
There might be a slight delay before Filebeat begins processing events if it needs to wait for Logstash to reload the config file.
After Logstash applies the grok pattern, the events will have the following JSON representation:
{
"request" => "/presentations/logstash-monitorama-2013/images/kibana-search.png",
"agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
"offset" => 325,
"auth" => "-",
"ident" => "-",
"verb" => "GET",
"prospector" => {
"type" => "log"
},
"input" => {
"type" => "log"
},
"source" => "/path/to/file/logstash-tutorial.log",
"message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
"tags" => [
[0] "beats_input_codec_plain_applied"
],
"referrer" => "\"http://semicomplete.com/presentations/logstash-monitorama-2013/\"",
"@timestamp" => 2017-11-09T02:51:12.416Z,
"response" => "200",
"bytes" => "203023",
"clientip" => "83.149.9.216",
"@version" => "1",
"beat" => {
"name" => "My-MacBook-Pro.local",
"hostname" => "My-MacBook-Pro.local",
"version" => "6.0.0"
},
"host" => "My-MacBook-Pro.local",
"httpversion" => "1.1",
"timestamp" => "04/Jan/2015:05:13:42 +0000"
}
Notice that the event includes the original message, but the log message is also broken down into specific web request fields such as clientip, verb, request, response, referrer, and agent.
Submit Screenshot of Logstash output with Apache specific fields
Enhancing Your Data with the Geoip Filter Plugin
In addition to parsing log data for better searches, filter plugins can derive supplementary information from existing data. As an example, the geoip plugin looks up IP addresses, derives geographic location information from the addresses, and adds that location information to the logs.
Configure your Logstash instance to use the geoip filter plugin by adding the following lines to the filter section of the first-pipeline.conf file:
geoip {
source => "clientip"
}
The geoip plugin configuration requires you to specify the name of the source field that contains the IP address to look up. In this example, the clientip field contains the IP address.
Since filters are evaluated in sequence, make sure that the geoip section is after the grok section of the configuration file and that both the grok and geoip sections are nested within the filter section.
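The ordering rule follows directly from how the filters chain: geoip reads clientip, and clientip only exists after grok has run. The Python sketch below demonstrates this with a hypothetical in-memory lookup table standing in for the real GeoIP database (its values are copied from the sample output in this lab); grok_clientip is a deliberately tiny stand-in for grok, and the _geoip_lookup_failure tag mirrors the plugin's default failure tag.

```python
# Hypothetical stand-in for the GeoIP database, using values from the sample output.
GEO_TABLE = {
    "83.149.9.216": {
        "city_name": "Moscow",
        "country_name": "Russia",
        "country_code2": "RU",
        "location": {"lat": 55.7485, "lon": 37.6184},
    },
}


def grok_clientip(event: dict) -> dict:
    """Tiny stand-in for grok: take the first token of the message as clientip."""
    return {**event, "clientip": event["message"].split(" ", 1)[0]}


def geoip(event: dict, source: str = "clientip") -> dict:
    """Add a geoip object looked up from the source field."""
    if source not in event:
        return event  # in this sketch, nothing to look up yet means the event passes through
    info = GEO_TABLE.get(event[source])
    if info is None:
        return {**event, "tags": event.get("tags", []) + ["_geoip_lookup_failure"]}
    return {**event, "geoip": {"ip": event[source], **info}}


def run_filters(event: dict, filters: list) -> dict:
    """Apply filters in order, like the filter section of first-pipeline.conf."""
    for apply_filter in filters:
        event = apply_filter(event)
    return event


line = {"message": "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] ..."}
right = run_filters(line, [grok_clientip, geoip])  # grok first: geoip data is added
wrong = run_filters(line, [geoip, grok_clientip])  # geoip first: no clientip yet
print(right["geoip"]["city_name"], "geoip" in wrong)  # Moscow False
```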
When you’re done, the contents of first-pipeline.conf should look like this:
input {
beats {
port => "5044"
}
}
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}"}
}
geoip {
source => "clientip"
}
}
output {
stdout { codec => rubydebug }
}
Save your changes. To force Filebeat to read the log file from scratch, as you did earlier, shut down Filebeat (press Ctrl+C), delete the registry folder again (/var/lib/filebeat/registry), and then restart Filebeat with the following command:
sudo filebeat -e -c test-filebeat.yml -d "publish"
Notice that the event now contains geographic location information:
{
"request" => "/presentations/logstash-monitorama-2013/images/kibana-search.png",
"agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
"geoip" => {
"timezone" => "Europe/Moscow",
"ip" => "83.149.9.216",
"latitude" => 55.7485,
"continent_code" => "EU",
"city_name" => "Moscow",
"country_name" => "Russia",
"country_code2" => "RU",
"country_code3" => "RU",
"region_name" => "Moscow",
"location" => {
"lon" => 37.6184,
"lat" => 55.7485
},
"postal_code" => "101194",
"region_code" => "MOW",
"longitude" => 37.6184
},
...
Submit Screenshot showing GeoIP info included in Logstash output
Step 3: Indexing Your Data into Elasticsearch
Now that the web logs are broken down into specific fields, you’re ready to get your data into Elasticsearch.
The Logstash pipeline can index the data into an Elasticsearch cluster. Edit the first-pipeline.conf file and replace the entire output section with the following text. This will add the logs to your existing filebeat index:
output {
elasticsearch {
hosts => [ "your_ip:9200" ]
index => "%{[@metadata][beat]}-%{[@metadata][version]}"
}
}
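The index setting uses Logstash's %{...} field-reference syntax: for events coming from Filebeat, [@metadata][beat] and [@metadata][version] are filled in by the Beats input, giving an index name such as filebeat-7.17.15. The Python sketch below imitates that substitution; sprintf and get_field are hypothetical helpers. It leaves an unresolvable reference untouched, which is why an index literally named %{[@metadata][beat]}-... usually means events did not arrive from a Beat.

```python
import re

FIELD_REF = re.compile(r"%\{([^}]+)\}")     # matches %{...}
PATH_PART = re.compile(r"\[([^\]]+)\]")      # matches each [key] in a reference


def get_field(event: dict, ref: str):
    """Follow a reference such as [@metadata][beat] (or a bare top-level name) into nested dicts."""
    value = event
    for key in PATH_PART.findall(ref) or [ref]:
        value = value[key]
    return value


def sprintf(template: str, event: dict) -> str:
    """Replace each %{...} with the referenced value; leave unresolved references as-is."""
    def substitute(match):
        try:
            return str(get_field(event, match.group(1)))
        except (KeyError, TypeError):
            return match.group(0)
    return FIELD_REF.sub(substitute, template)


event = {"@metadata": {"beat": "filebeat", "version": "7.17.15"}}
print(sprintf("%{[@metadata][beat]}-%{[@metadata][version]}", event))  # filebeat-7.17.15
```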
At this point, your first-pipeline.conf file has input, filter, and output sections properly configured, and looks something like this:
input {
beats {
port => "5044"
}
}
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}"}
}
geoip {
source => "clientip"
}
}
output {
elasticsearch {
hosts => [ "your_ip:9200" ]
index => "%{[@metadata][beat]}-%{[@metadata][version]}"
}
}
Save your changes. To force Filebeat to read the log file from scratch, as you did earlier, shut down Filebeat (press Ctrl+C), delete the registry directory, and then restart Filebeat with the following command:
sudo filebeat -e -c test-filebeat.yml -d "publish"
Testing Your Pipeline
Now that the Logstash pipeline is configured to index the data into an Elasticsearch cluster, you can query Elasticsearch.
Make sure all of your programs are started (Elasticsearch, Logstash, and Kibana) before you query.
To see a list of available indexes, use this query: curl 'your_ip:9200/_cat/indices?v'
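The _cat/indices API can also return JSON (format=json), which is easier to filter in a script than the ?v text table. The Python sketch below lists only the Filebeat indices; the host and helper names are assumptions for illustration, and only filebeat_indices is exercised without a live cluster.

```python
from __future__ import annotations

import json
import urllib.request


def list_indices(host: str, port: int = 9200) -> list[dict]:
    """Fetch one JSON object per index from the _cat/indices API."""
    url = f"http://{host}:{port}/_cat/indices?format=json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def filebeat_indices(rows: list[dict]) -> list[str]:
    """Keep only index names that start with filebeat-, sorted."""
    return sorted(row["index"] for row in rows if row["index"].startswith("filebeat-"))


# Usage against the lab's server (private IP from Helpful Info):
# print(filebeat_indices(list_indices("172.31.87.23")))
```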
Try a test query to Elasticsearch based on the fields created by the grok filter plugin. Use the name of your Filebeat index in the URL:
curl -XGET 'your_server_ip:9200/your_filebeat_index/_search?pretty&q=clientip:83.149.9.216'
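In this lab, the Filebeat index is filebeat-7.17.15-2023.12.12-000001, so use the command below (keep the single quotes; without them the shell treats & as a command separator and the q= part of the query is lost):
curl -XGET '172.31.87.23:9200/filebeat-7.17.15-2023.12.12-000001/_search?pretty&q=clientip:83.149.9.216'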
83.149.9.216 is an IP address from the sample Apache log, so this query should return results.
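The same URI search can be issued from Python, which sidesteps shell quoting of & entirely because the query string is built and URL-encoded in code. This is a sketch assuming the lab's private IP and index name; search_url and search are hypothetical helpers, and only the URL construction runs without a live cluster.

```python
import json
import urllib.request
from urllib.parse import urlencode


def search_url(host: str, index: str, query: str, port: int = 9200) -> str:
    """Build a URI-search URL; urlencode escapes characters such as ':' in the query."""
    params = urlencode({"pretty": "true", "q": query})
    return f"http://{host}:{port}/{index}/_search?{params}"


def search(host: str, index: str, query: str, port: int = 9200) -> list:
    """Run the search and return the matching hits."""
    with urllib.request.urlopen(search_url(host, index, query, port)) as resp:
        body = json.load(resp)
    return body["hits"]["hits"]


INDEX = "filebeat-7.17.15-2023.12.12-000001"
print(search_url("172.31.87.23", INDEX, "clientip:83.149.9.216"))
# hits = search("172.31.87.23", INDEX, "geoip.city_name:Moscow")
```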
Try another search for the geographic information derived from the IP address.
curl -XGET '172.31.87.23:9200/filebeat-7.17.15-2023.12.12-000001/_search?pretty&q=geoip.city_name:Moscow'
This searches for data entries whose geoip.city_name is Moscow.
Submit Screenshot of ElasticSearch results
You can also explore the Filebeat data in Kibana: under Discover, select the Filebeat index pattern, and you should be able to query for the recent logs from your logstash-tutorial.log file.