Lab 4.1 Configuring Logstash - Hsanokklis/2023-2024-Tech-journal GitHub Wiki

Helpful Info

The public IPv4 address changes with every new session. Current IPv4 address in use: 3.89.90.137

To access your instance: ssh -i hannelore-elk-key.pem ubuntu@public-ip

Private IPv4 address is : 172.31.87.23

The next time you log in to your system, make sure to start everything again.

Start in this order: Elasticsearch, Logstash, Kibana

To get to the Kibana dashboard, type http://public-ip:5601 into your browser.
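The start order listed above can be captured in a small script. This is a minimal sketch, assuming the services run as systemd units named elasticsearch, logstash, and kibana (only the logstash unit name is confirmed by the commands in these notes). DRY_RUN is a hypothetical variable added here so the script prints the commands by default instead of running them.

```shell
#!/bin/sh
# Services in the order these notes ask for.
# Assumption: these are the systemd unit names on the server.
SERVICES="elasticsearch logstash kibana"

# With DRY_RUN=1 (the default here) the commands are only printed.
start_stack() {
    for svc in $SERVICES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "sudo systemctl start $svc"
        else
            sudo systemctl start "$svc"
        fi
    done
}

start_stack
```

Setting DRY_RUN=0 before calling start_stack would run the commands for real.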

Helpful Vocab!

Logstash pipeline

A Logstash pipeline has two required elements, input and output, and one optional element, filter. The input plugins consume data from a source, the filter plugins modify the data as you specify, and the output plugins write the data to a destination.
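As a rough analogy only (this is not how Logstash is implemented), a shell pipe has the same three roles. The function names below are made up for illustration: one command produces an event, one modifies it, and one writes it out.

```shell
# input: produce an event
event_in() { printf 'hello world\n'; }
# filter (optional): modify the event
event_filter() { tr 'a-z' 'A-Z'; }
# output: write the event to a destination (here, stdout)
event_out() { cat; }

with_filter="$(event_in | event_filter | event_out)"
without_filter="$(event_in | event_out)"
echo "$with_filter"
echo "$without_filter"
```

Leaving out the middle stage still gives a working pipe, which mirrors the filter element being optional.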

Lab-Prep

For now, we will stop filebeat, metricbeat, and logstash on your test system

sudo systemctl stop filebeat
sudo systemctl stop metricbeat
sudo systemctl stop logstash

It also seems we might need some more memory in our AWS instances as we add to our Elastic stack server

Power off/stop your server.
From the AWS Console, change the instance type to t2.large.

How to change instance

Go to Instance state

Choose stop

Actions ---> Instance settings ---> Change instance type

Restart your server/instance and you should be in good shape!
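The same stop, resize, and start sequence can probably be scripted with the AWS CLI instead of the console. This is a sketch under assumptions: the lab itself only uses the console, the AWS CLI would need to be installed and configured, and the instance ID below is a hypothetical placeholder. The script only prints the commands so they can be reviewed first.

```shell
# Hypothetical placeholder; replace with your own instance ID.
INSTANCE_ID="i-0123456789abcdef0"
NEW_TYPE="t2.large"

# Print the AWS CLI calls for stop -> wait -> resize -> start.
resize_commands() {
    echo "aws ec2 stop-instances --instance-ids $1"
    echo "aws ec2 wait instance-stopped --instance-ids $1"
    echo "aws ec2 modify-instance-attribute --instance-id $1 --instance-type Value=$2"
    echo "aws ec2 start-instances --instance-ids $1"
}

resize_commands "$INSTANCE_ID" "$NEW_TYPE"
```

The instance type can only be changed while the instance is stopped, which is why the wait step comes before the resize.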

Step 1: Configuring a Basic Pipeline

First, test your Logstash installation by running the most basic Logstash pipeline:

cd /usr/share/logstash
bin/logstash -e 'input { stdin { } } output { stdout {} }'

The -e flag enables you to specify a configuration directly from the command line. Specifying configurations at the command line lets you quickly test configurations without having to edit a file between iterations. The pipeline in the example takes input from the standard input, stdin, and moves that input to the standard output, stdout, in a structured format.

After starting Logstash, wait until you see "Pipeline Started - The stdin plugin is now waiting for input" or "Successfully started API Endpoint", and then enter hello world at the command prompt:

hello world

{
"@version" => "1",
"@timestamp" => 2020-12-07T15:00:35.568Z,
"host" => "agelk",
"message" => "hello world"
}

Logstash adds timestamp and host information to the message. Exit Logstash by pressing Ctrl+D in the shell where Logstash is running.
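For repeated tests it can be handy to pipe the message in rather than type it. This is a minimal sketch, assuming Logstash lives in /usr/share/logstash as in this lab; run_basic_pipeline is a helper name made up here. When the piped input ends, Logstash sees the same end-of-input that Ctrl+D sends.

```shell
# Same pipeline definition as the -e example above.
PIPELINE='input { stdin { } } output { stdout {} }'

# Send one message through the pipeline without typing it interactively.
# The subshell keeps the cd from changing the caller's directory.
run_basic_pipeline() {
    (
        cd /usr/share/logstash || exit 1
        printf '%s\n' "$1" | bin/logstash -e "$PIPELINE"
    )
}

# Usage on the lab server: run_basic_pipeline 'hello world'
```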

Congratulations! You’ve created and run a basic Logstash pipeline. Next, you learn how to create a more realistic pipeline.

`Submit Screenshot of basic pipeline`

Step 2: Basic File Parsing with Logstash

Download some sample data for testing

cd ~    # your home directory, e.g. /root or /home/user
wget https://download.elastic.co/demos/logstash/gettingstarted/logstash-tutorial.log.gz
gunzip logstash-tutorial.log.gz

This is a sample Apache access log we will use for testing
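Before feeding the log to Logstash, a quick summary helps confirm that the download unpacked correctly and gives a line count to compare against the number of events later. This is a sketch: summarize_log is a helper name made up here, and the three demo lines are invented test data, not lines from the real tutorial log.

```shell
# Print the line count and the number of distinct client IPs in an access log.
summarize_log() {
    lines="$(wc -l < "$1" | tr -d ' ')"
    ips="$(awk '{ print $1 }' "$1" | sort -u | wc -l | tr -d ' ')"
    echo "lines=$lines unique_client_ips=$ips"
}

# Self-check on a tiny made-up file (not the real tutorial log).
demo_log="$(mktemp)"
printf '%s\n' \
    '10.0.0.1 - - [04/Jan/2015:05:13:42 +0000] "GET / HTTP/1.1" 200 10 "-" "curl"' \
    '10.0.0.2 - - [04/Jan/2015:05:13:43 +0000] "GET /a HTTP/1.1" 200 20 "-" "curl"' \
    '10.0.0.1 - - [04/Jan/2015:05:13:44 +0000] "GET /b HTTP/1.1" 404 0 "-" "curl"' \
    > "$demo_log"
summarize_log "$demo_log"
```

On the lab server the call would be summarize_log logstash-tutorial.log from the directory where the file was unpacked.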

Next, you create a Logstash configuration pipeline that uses the Beats input plugin to receive events from Beats.

The following text represents the skeleton of a configuration pipeline:

# The # character at the beginning of a line indicates a comment. Use
# comments to describe your configuration.
input {
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
}

This skeleton is non-functional, because the input and output sections don’t have any valid options defined.

To get started, copy and paste the skeleton configuration pipeline into a file named first-pipeline.conf in your Logstash home directory (/usr/share/logstash).

Next, configure your Logstash instance to use the Beats input plugin by adding the following lines to the input section of the first-pipeline.conf file:

    beats {
        port => "5044"
    }

You’ll configure Logstash to write to Elasticsearch later. For now, you can add the following line to the output section so that the output is printed to stdout when you run Logstash:

stdout { codec => rubydebug }

When you’re done, the contents of first-pipeline.conf should look like this:

input {
    beats {
        port => "5044"
    }
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
    stdout { codec => rubydebug }
}

To verify your configuration, run the following command:

bin/logstash -f first-pipeline.conf --config.test_and_exit

The --config.test_and_exit option parses your configuration file and reports any errors.
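Copy/paste mistakes in the config often show up as a missing brace. The sketch below writes the pipeline with a heredoc and counts braces as a quick sanity check; it does not replace --config.test_and_exit, which does the real parsing. CONF_DIR and count_char are names made up here, and the file goes to a scratch directory rather than /usr/share/logstash.

```shell
# Scratch location for the sketch; the lab keeps the file in /usr/share/logstash.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
CONF_FILE="$CONF_DIR/first-pipeline.conf"

cat > "$CONF_FILE" <<'EOF'
input {
    beats {
        port => "5044"
    }
}
output {
    stdout { codec => rubydebug }
}
EOF

# Count how often one character appears in a file.
count_char() {
    tr -cd "$1" < "$2" | wc -c | tr -d ' '
}

open_braces="$(count_char '{' "$CONF_FILE")"
close_braces="$(count_char '}' "$CONF_FILE")"

if [ "$open_braces" -eq "$close_braces" ]; then
    echo "braces balanced ($open_braces pairs)"
else
    echo "brace mismatch: $open_braces open, $close_braces close" >&2
fi
```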

Configure Filebeat to send Test Log to Logstash

We will create a test filebeat YAML file to use for this sample

Open a new terminal window (we will want Filebeat and Logstash in separate windows).

Create a new file in /etc/filebeat called test-filebeat.yml with the following lines. Make sure paths points to the example Apache log file, logstash-tutorial.log, that you downloaded earlier (NOTE - if you copy/paste please make sure that indentations are correct!):

filebeat.inputs:
- type: log
  paths:
    - /path/to/file/logstash-tutorial.log 
output.logstash:
  hosts: ["your_server_IP:5044"]
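Since indentation matters in YAML, a quick check for tab characters can save a confusing Filebeat startup error. This is a sketch that writes the same file to a temporary location (the lab puts it in /etc/filebeat) and looks for tabs; has_tabs is a helper name made up here, and the path and server IP are still the placeholders from above.

```shell
YML_FILE="$(mktemp)"

cat > "$YML_FILE" <<'EOF'
filebeat.inputs:
- type: log
  paths:
    - /path/to/file/logstash-tutorial.log
output.logstash:
  hosts: ["your_server_IP:5044"]
EOF

# YAML does not allow tab characters for indentation.
# Succeeds (exit status 0) if the file contains at least one tab.
has_tabs() {
    grep -q "$(printf '\t')" "$1"
}

if has_tabs "$YML_FILE"; then
    echo "tabs found: replace them with spaces"
else
    echo "no tabs found"
fi
```

Filebeat also has a test config subcommand (filebeat test config -c test-filebeat.yml) that should catch syntax problems on the server itself.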

Start Logstash and Filebeat to send and parse the test log

Starting Logstash

For the test, from the Logstash terminal, start Logstash with the following command:

bin/logstash -f first-pipeline.conf --config.reload.automatic

The --config.reload.automatic option enables automatic config reloading so that you don’t have to stop and restart Logstash every time you modify the configuration file.

As Logstash starts up, you might see one or more warning messages about Logstash ignoring the pipelines.yml file. You can safely ignore this warning. The pipelines.yml file is used for running multiple pipelines in a single Logstash instance. For the examples shown here, you are running a single pipeline.

Starting Filebeat

From the Filebeat terminal window, enter the following:

sudo filebeat -e -c test-filebeat.yml -d "publish"

If your pipeline is working correctly, you should see a series of events like the following written to the console:

{
    "@timestamp" => 2017-11-09T01:44:20.071Z,
        "offset" => 325,
      "@version" => "1",
          "beat" => {
            "name" => "My-MacBook-Pro.local",
        "hostname" => "My-MacBook-Pro.local",
         "version" => "6.0.0"
    },
          "host" => "My-MacBook-Pro.local",
    "prospector" => {
        "type" => "log"
    },
    "input" => {
        "type" => "log"
    },
        "source" => "/path/to/file/logstash-tutorial.log",
       "message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ]
}
...

`Submit Screenshot of Logstash output`

Parsing Web Logs with the Grok Filter Plugin

Now you have a working test pipeline that reads log lines from Filebeat. However, you’ll notice that the format of the log messages is not ideal. You want to parse the log messages to create specific, named fields from the logs. To do this, you’ll use the grok filter plugin.

The grok filter plugin is one of several plugins that are available by default in Logstash. The grok filter plugin enables you to parse the unstructured log data into something structured and queryable.

Because the grok filter plugin looks for patterns in the incoming log data, configuring the plugin requires you to make decisions about how to identify the patterns that are of interest to your use case. A representative line from the web server log sample looks like this:

83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"

The IP address at the beginning of the line is easy to identify, as is the timestamp in brackets. To parse the data, you can use the %{COMBINEDAPACHELOG} grok pattern, which structures lines from the Apache log into named fields: clientip, ident, auth, timestamp, verb, request, httpversion, response, bytes, referrer, and agent.

Edit the first-pipeline.conf file and replace the entire filter section with the following text:

filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
}
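To get a feel for what the grok filter extracts, the sketch below splits the sample line into the same field names using awk. It is only a rough stand-in written for these notes, not the real %{COMBINEDAPACHELOG} pattern: it splits on double quotes and spaces, assumes a well-formed line, and strips the quotes around referrer and agent (the Logstash output in this lab keeps them). parse_apache_line is a made-up helper name.

```shell
# One line from the sample log (copied from this page).
SAMPLE_LINE='83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"'

# Rough stand-in for grok: split on double quotes, then on spaces.
parse_apache_line() {
    printf '%s\n' "$1" | awk -F'"' '{
        split($1, head, " ")      # clientip, ident, auth, then the timestamp
        split($2, req, " ")       # verb, request, HTTP/version
        split($3, status, " ")    # response, bytes
        start = index($1, "[")
        stop  = index($1, "]")
        print "clientip="    head[1]
        print "ident="       head[2]
        print "auth="        head[3]
        print "timestamp="   substr($1, start + 1, stop - start - 1)
        print "verb="        req[1]
        print "request="     req[2]
        print "httpversion=" substr(req[3], 6)
        print "response="    status[1]
        print "bytes="       status[2]
        print "referrer="    $4
        print "agent="       $6
    }'
}

parse_apache_line "$SAMPLE_LINE"
```

When a line does not match the pattern, the real grok filter tags the event with _grokparsefailure instead of adding fields; this sketch has no such safety net.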

When you’re done, the contents of first-pipeline.conf should look like this:

input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
}
output {
    stdout { codec => rubydebug }
}

Save your changes. Because you’ve enabled automatic config reloading, you don’t have to restart Logstash to pick up your changes. However, you do need to force Filebeat to read the log file from scratch. To do this, go to the terminal window where Filebeat is running and press Ctrl+C to shut down Filebeat. Then delete the Filebeat registry. For example, run:

sudo rm -rf /var/lib/filebeat/registry

Since Filebeat stores the state of each file it harvests in the registry, deleting the registry forces Filebeat to read all the files it’s harvesting from scratch.

Next, restart Filebeat with the following command:

sudo filebeat -e -c test-filebeat.yml -d "publish"

There might be a slight delay before Filebeat begins processing events if it needs to wait for Logstash to reload the config file.

After Logstash applies the grok pattern, the events will have the following representation (shown in the rubydebug output format, which is close to but not strictly JSON):

{
        "request" => "/presentations/logstash-monitorama-2013/images/kibana-search.png",
          "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
         "offset" => 325,
           "auth" => "-",
          "ident" => "-",
           "verb" => "GET",
     "prospector" => {
        "type" => "log"
    },
     "input" => {
        "type" => "log"
    },
         "source" => "/path/to/file/logstash-tutorial.log",
        "message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
           "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
       "referrer" => "\"http://semicomplete.com/presentations/logstash-monitorama-2013/\"",
     "@timestamp" => 2017-11-09T02:51:12.416Z,
       "response" => "200",
          "bytes" => "203023",
       "clientip" => "83.149.9.216",
       "@version" => "1",
           "beat" => {
            "name" => "My-MacBook-Pro.local",
        "hostname" => "My-MacBook-Pro.local",
         "version" => "6.0.0"
    },
           "host" => "My-MacBook-Pro.local",
    "httpversion" => "1.1",
      "timestamp" => "04/Jan/2015:05:13:42 +0000"
}

Notice that the event includes the original message, but the log message is also broken down into specific web request fields such as clientip, verb, request, response, referrer, and agent.

`Submit Screenshot of Logstash output with Apache specific fields`

Enhancing Your Data with the Geoip Filter Plugin

In addition to parsing log data for better searches, filter plugins can derive supplementary information from existing data. As an example, the geoip plugin looks up IP addresses, derives geographic location information from the addresses, and adds that location information to the logs.

Configure your Logstash instance to use the geoip filter plugin by adding the following lines to the filter section of the first-pipeline.conf file:

    geoip {
        source => "clientip"
    }

The geoip plugin configuration requires you to specify the name of the source field that contains the IP address to look up. In this example, the clientip field contains the IP address.

Since filters are evaluated in sequence, make sure that the geoip section is after the grok section of the configuration file and that both the grok and geoip sections are nested within the filter section.
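Why the order matters can be shown with a toy shell pipe. This is an illustration only, with made-up stage names (toy_grok, toy_geoip) and a fake "lookup" that just copies the IP; it is not how the real plugins work. The point is that the second stage needs a field that only exists after the first stage has run.

```shell
# Each toy stage reads events on stdin and writes them to stdout,
# so the order of the pipe is the order of evaluation.
toy_grok() {
    # Adds a clientip field taken from the first word of the message.
    awk '{ print "clientip=" $1 " message=" $0 }'
}
toy_geoip() {
    # Can only enrich events that already carry a clientip field.
    awk '{
        if ($1 ~ /^clientip=/) print "geoip.ip=" substr($1, 10) " " $0
        else print "geoip-skipped " $0
    }'
}

line='83.149.9.216 - - [04/Jan/2015:05:13:42 +0000]'
after_grok="$(printf '%s\n' "$line" | toy_grok | toy_geoip)"
before_grok="$(printf '%s\n' "$line" | toy_geoip)"
echo "$after_grok"
echo "$before_grok"
```

Run after toy_grok, the geoip stage finds clientip and adds its field; run on the raw line, it has nothing to look up.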

When you’re done, the contents of first-pipeline.conf should look like this:

input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
    geoip {
        source => "clientip"
    }
}
output {
    stdout { codec => rubydebug }
}

Save your changes. To force Filebeat to read the log file from scratch, as you did earlier, shut down Filebeat (press Ctrl+C), delete the registry folder again (/var/lib/filebeat/registry), and then restart Filebeat with the following command:

sudo filebeat -e -c test-filebeat.yml -d "publish"

Notice that the event now contains geographic location information:

{
        "request" => "/presentations/logstash-monitorama-2013/images/kibana-search.png",
          "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
          "geoip" => {
              "timezone" => "Europe/Moscow",
                    "ip" => "83.149.9.216",
              "latitude" => 55.7485,
        "continent_code" => "EU",
             "city_name" => "Moscow",
          "country_name" => "Russia",
         "country_code2" => "RU",
         "country_code3" => "RU",
           "region_name" => "Moscow",
              "location" => {
            "lon" => 37.6184,
            "lat" => 55.7485
        },
           "postal_code" => "101194",
           "region_code" => "MOW",
             "longitude" => 37.6184
    },
    ...

`Submit Screenshot showing GeoIP info included in Logstash output`

Step 3: Indexing Your Data into Elasticsearch

Now that the web logs are broken down into specific fields, you’re ready to get your data into Elasticsearch.

The Logstash pipeline can index the data into an Elasticsearch cluster. Edit the first-pipeline.conf file and replace the entire output section with the following text. This will add the logs to your existing filebeat index:

output {
    elasticsearch {
        hosts => [ "your_ip:9200" ]
        index => "%{[@metadata][beat]}-%{[@metadata][version]}"
    }
}
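The index setting is a template that Logstash fills in per event from metadata the Beat sends along. The sketch below mimics that substitution with shell variables, using the Filebeat version that shows up in the index name later in these notes. The date and counter suffix on the listed index is presumably added by index lifecycle management on the Elasticsearch side; that part is an assumption, not something these notes confirm.

```shell
# Values Filebeat sends in event metadata (taken from the index name
# that appears later in these notes).
beat="filebeat"
version="7.17.15"

# Shell equivalent of "%{[@metadata][beat]}-%{[@metadata][version]}".
index_name="${beat}-${version}"
echo "$index_name"

# The index listed by _cat/indices carries an extra suffix.
listed_index="filebeat-7.17.15-2023.12.12-000001"
case "$listed_index" in
    "$index_name"-*) match="yes" ;;
    *)               match="no" ;;
esac
echo "listed index starts with $index_name: $match"
```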

At this point, your first-pipeline.conf file has input, filter, and output sections properly configured, and looks something like this:

input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
    geoip {
        source => "clientip"
    }
}
output {
    elasticsearch {
        hosts => [ "your_ip:9200" ]
        index => "%{[@metadata][beat]}-%{[@metadata][version]}"
    }
}

Save your changes. To force Filebeat to read the log file from scratch, as you did earlier, shut down Filebeat (press Ctrl+C), delete the registry directory, and then restart Filebeat with the following command:

sudo filebeat -e -c test-filebeat.yml -d "publish"

Testing Your Pipeline

Now that the Logstash pipeline is configured to index the data into an Elasticsearch cluster, you can query Elasticsearch.

Make sure all of your programs (Elasticsearch, Logstash, Kibana) are started before running the queries.

To see a list of available indexes, use this query: curl 'your_ip:9200/_cat/indices?v'

Try a test query to Elasticsearch based on the fields created by the grok filter plugin. Use the name of your filebeat index in the URL

curl -XGET 'your_server_ip:9200/your_filebeat_index/_search?pretty&q=clientip:83.149.9.216'

Filebeat index is filebeat-7.17.15-2023.12.12-000001

Use: curl -XGET '172.31.87.23:9200/filebeat-7.17.15-2023.12.12-000001/_search?pretty&q=clientip:83.149.9.216'

83.149.9.216 is an IP found in the sample Apache logs - so should yield results.

Try another search for the geographic information derived from the IP address.

curl -XGET '172.31.87.23:9200/filebeat-7.17.15-2023.12.12-000001/_search?pretty&q=geoip.city_name:Moscow'

This searches for entries whose geoip.city_name is Moscow.
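The search URLs contain an & character, which the shell treats as "run in the background" unless the URL is quoted. A small helper that builds the URL and keeps it in a quoted variable avoids that trap. This is a sketch using the host and index from these notes; search_url is a helper name made up here.

```shell
ES_HOST="172.31.87.23:9200"
ES_INDEX="filebeat-7.17.15-2023.12.12-000001"

# Build a URI search for one field:value pair.
# Arguments: host, index, field, value.
search_url() {
    printf '%s/%s/_search?pretty&q=%s:%s\n' "$1" "$2" "$3" "$4"
}

ip_url="$(search_url "$ES_HOST" "$ES_INDEX" clientip 83.149.9.216)"
city_url="$(search_url "$ES_HOST" "$ES_INDEX" geoip.city_name Moscow)"

# Print the commands; the single quotes keep & away from the shell.
echo "curl -XGET '$ip_url'"
echo "curl -XGET '$city_url'"
```

On the server, curl -XGET "$ip_url" (with the double quotes) runs the query directly.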

`Submit Screenshot of ElasticSearch results`

You can also explore the Filebeat data in Kibana:

Under Discover-Filebeat, you should be able to query for the recent logs from your logstash-tutorial.log file.