Lab 1 Setting up Elastic in AWS - Hsanokklis/2023-2024-Tech-journal GitHub Wiki
The Public IPv4 address changes with every new session.
Current public IPv4 address in use: 54.173.196.221
Private IPv4 address: 172.31.87.23
Part 1:
- Configure an Ubuntu Server in AWS that you can SSH into
Part 2:
- Install Elasticsearch on that server
- Install Logstash on that server
- Configure a test pipeline from Logstash to Elasticsearch
- Install Kibana
- Successfully query log data using Kibana
CLICK Start Lab when going into the box.
The boxes time out every 4 hours, and each reboot assigns a new public IP address (so keep that in mind).
When you create your key pair, AWS stores the public key on Amazon's side and downloads the private key to your Downloads folder.
When we interact with the instance from Champlain, we have to SSH to the public IPv4 address:
ssh -i .\hannelore-elk.pem ubuntu@'ipaddress'
Security groups are essentially firewalls that are attached to your instance
https://awsacademy.instructure.com/courses/60764
Use the Services search bar and search for “EC2”
Click EC2 to get to the EC2 Dashboard
Click on the Instances-Instances Menu on the left pane
Click the Orange “Launch Instances” on the Right
On the “Launch an Instance” screen
- Name your server “your_name ELK Server”
- Select Quick Start Ubuntu as the OS Image
- Select t2.medium as the Instance Type
Important: Key Pair (login)
- Choose “Create new key pair”
- Key pair name: yourname-elk-key
- Type: RSA
- Format: .pem
Click Create Key Pair and the .pem file should download to your computer.
This file is like a password: you will need it to log into your server, so please keep track of it!
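SSH will also refuse a private key that other users can read ("UNPROTECTED PRIVATE KEY FILE"), so it's worth locking the .pem down right after download. A minimal sketch using this lab's key name (a stand-in file is created here so the commands run anywhere; on your machine the real file is in ~/Downloads):

```shell
# Stand-in for the downloaded key; use the real file from ~/Downloads.
touch hannelore-elk-key.pem
# Restrict the key to owner read/write only, as ssh requires.
chmod 600 hannelore-elk-key.pem
stat -c '%a' hannelore-elk-key.pem   # prints 600
```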
Network Settings:
- Create security group
- Check Allow SSH Traffic from anywhere 0.0.0.0/0
- Check Allow HTTPS Traffic from Internet
- Check Allow HTTP Traffic From Internet
Leave the defaults for "Advanced Details"
Configure Storage --> update it to 30GB
Launch the Instance at the bottom of the page
You should see a green bar saying your instance launched successfully.
You should see your instance running! --->
ssh -i hannelore-elk-key.pem ubuntu@54.173.196.221
the public ip might change the next time I open the instance
Update Security Group (aka firewall) for Elastic
- Final prep step - in the AWS EC2 Console - find the security group attached to your instances
- Update the security group to allow the Elasticsearch and Kibana ports
- Edit Inbound Rules for the Security Group assigned to your ubuntu Server
- Add Custom TCP port 5601 with source Any IPv4 (this is Kibana)
- Add Custom TCP port 9200 with source Any IPv4 (this is Elasticsearch)
This message should pop up once you edit your security rules
From SSH on your Ubuntu server:
First, you need to add Elastic's signing key so that the downloaded packages can be verified (skip this step if you've already installed packages from Elastic).
(When connecting, you need to cd to your Downloads folder first so that ssh can find the key.)
Connected to Ubuntu!
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
For Debian-based OSes like Ubuntu, we then need to install the apt-transport-https package:
sudo apt-get update
sudo apt-get install apt-transport-https
The next step is to add the repository definition to your system (one line command)
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
Alternatively, to install a version of Elasticsearch that contains only features licensed under Apache 2.0 (aka OSS Elasticsearch), add this repository definition instead (again, one line; note that tee -a appends, so add only one of the two):
echo "deb https://artifacts.elastic.co/packages/oss-7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
All that’s left to do is to update your repositories and install Elasticsearch:
sudo apt-get update
sudo apt-get install elasticsearch
Elasticsearch configurations are done using a configuration file that allows you to configure general settings (e.g. node name), as well as network settings (e.g. host and port), where data is stored, memory, log files, and more.
For our example, since we are installing Elasticsearch in AWS, it is a good best practice to bind Elasticsearch to the private IP (like 172.31.something):
sudo apt install micro (a friendlier terminal text editor, so the config file is actually readable)
sudo micro /etc/elasticsearch/elasticsearch.yml
Edit the file and uncomment the network.host line in the Network section (delete the #). Set it to the Private IP of your AWS server, e.g. network.host: 172.31.87.23
sudo service elasticsearch start
To confirm that everything is working as expected, use curl to port 9200 on your server:
curl http://your_private_ip:9200/
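The response from port 9200 is a small JSON banner; a quick sanity check is to grep out the version number. A sketch over a made-up, abridged sample banner (on the server, pipe `curl -s http://your_private_ip:9200/` into the same grep instead of the here-doc):

```shell
# Sample banner (hypothetical values); replace the here-doc with real curl output.
cat <<'EOF' | grep -o '"number" *: *"[^"]*"'
{ "name" : "ip-172-31-87-23", "cluster_name" : "elasticsearch",
  "version" : { "number" : "7.17.9" } }
EOF
```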
Logstash requires Java 8 or Java 11 to run so we will start the process of setting up Logstash with:
sudo apt-get install default-jre
Verify java is installed:
java -version
With output something like:
Since we already defined the Elastic repository, all we have to do to install Logstash is run:
sudo apt-get install logstash
Before you run Logstash, you will need to configure a data pipeline.
3.1 Download Sample data
We will use some sample data to send from Logstash to Elasticsearch (aka "shipping" data). For the purpose of this lab, there is some sample data containing Apache access logs.
It can be downloaded from: https://raw.githubusercontent.com/agoldstein333/Files/main/apache-daily-access.log
Download it using wget on your Ubuntu server. wget is a command to retrieve files via HTTP when there is no GUI/browser.
Make a directory for your data
- `sudo mkdir /logstash`
cd into that directory
cd /logstash
use wget to download the data to that directory
wget https://raw.githubusercontent.com/agoldstein333/Files/main/apache-daily-access.log
Change owner/group of that directory to the “logstash” user
sudo chown -R logstash /logstash
sudo chgrp -R logstash /logstash
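The two commands above can also be combined with chown's owner:group syntax (`sudo chown -R logstash:logstash /logstash`). A sketch of that syntax, demonstrated on a scratch directory with the current user standing in for logstash so it runs without sudo:

```shell
demo_dir=$(mktemp -d)                       # scratch stand-in for /logstash
touch "$demo_dir/apache-daily-access.log"
# owner:group in one command; on the server: sudo chown -R logstash:logstash /logstash
chown -R "$(id -un):$(id -gn)" "$demo_dir"
stat -c '%U' "$demo_dir/apache-daily-access.log"   # prints the owning user
```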
TROUBLESHOOTING: I kept getting an error saying that the user logstash could not be found, so I troubleshot to figure out whether the logstash user actually existed.
First I made sure the package was installed with
apt list --installed | grep logstash
and it was indeed installed.
I attempted to make a new user with
sudo adduser logstash
and the system told me there was already a user with that name. Then I checked
cat /etc/passwd
to see if logstash was there. I ended up taking out the / when specifying the file path and it worked! I think I made the directory wrong. I then removed the directory and remade it the correct way so it matched the lab instructions.
Permissions changed:
3.2 Create Logstash Configuration File
Next, create a Logstash configuration file to ingest the sample apache logs and output to elasticsearch
The new file will be /etc/logstash/conf.d/apache-01.conf
sudo nano /etc/logstash/conf.d/apache-01.conf
Enter the following Logstash configuration (NOTE - this file uses Logstash's own pipeline config syntax, so structure matters: every opening brace needs a matching closing brace) AND
- Make sure to change the path to the location of your data file (probably something like:
/logstash/apache-daily-access.log
) - Make sure to change the elasticsearch IP to the Private IP of your server
input {
file {
path => "your_data_file"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
geoip {
source => "clientip"
}
}
output {
elasticsearch {
#stdout {}
hosts => ["your_private_ip:9200"]
}
}
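To see roughly what the %{COMBINEDAPACHELOG} grok pattern extracts per event, here is a plain-shell stand-in on one sample log line (the real parsing is done by Logstash's grok filter; the line below is adapted and abridged from the sample output later in this lab):

```shell
# One combined-format Apache access log line (abridged sample).
line='200.222.44.110 - - [30/Nov/2022:17:22:20 +0000] "GET /category/software HTTP/1.1" 200 111 "-" "Mozilla/5.0"'
clientip=${line%% *}                                   # first field -> clientip
verb=$(echo "$line" | sed -E 's/.*"([A-Z]+) .*/\1/')   # HTTP method -> verb
echo "$clientip $verb"                                 # prints: 200.222.44.110 GET
```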
3.3 Test Configuration File
Before starting Logstash - test the configuration file to make sure there are no errors/issues. You can do this by running logstash from the command line:
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/apache-01.conf
Validation message at the end of the output
3.4 Start Logstash
If all goes well, start Logstash with:
sudo service logstash start
If it works, a new Logstash index will be created in Elasticsearch
You can directly call the Elasticsearch api with curl using GET /_cat/indices
to see if the index was created:
Private IP is 172.31.87.23
curl http://your_private_ip:9200/_cat/indices?v
TROUBLESHOOTING: This command wasn't working. It kept saying that the connection timed out. Here is what I did to solve this:
- I rebooted the instance
- I restarted my computer
- I checked to see if both logstash and elasticsearch were installed with
apt list --installed | grep logstash
and
apt list --installed | grep elasticsearch
- I then checked the status of elasticsearch with
systemctl status elasticsearch
and it said it was dead, so here was the problem. I ran
sudo systemctl start elasticsearch
and we were back in business baby!
If the “doc.count” is zero, wait a few minutes and check again as it can take a bit for the data to be indexed
If it stays at 0, there is likely an issue with importing the apache log file.
It could be:
- Incorrect path in your config file
- Incorrect permissions on the apache log sample file
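One way to keep an eye on docs.count is to pull just the index name and count out of the `_cat/indices?v` table. A sketch over a made-up sample row (on the server, pipe `curl -s "http://your_private_ip:9200/_cat/indices?v"` into the same awk instead of the here-doc):

```shell
# In ?v output, the index name is column 3 and docs.count is column 7.
cat <<'EOF' | awk 'NR>1 && $3 ~ /^logstash-/ {print $3, $7}'
health status index                      uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2023.11.14-000001 abc1 1   1   10000      0            5mb        5mb
EOF
```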
*Once the document count is showing up, you can again use curl and the elasticsearch api (GET /_search) to see the data - replacing “your_index” with the name of your index - something like logstash-2023.08.13-000001
your_index should be logstash-2023.11.14-000001
make sure to change the IP address to your private IP which is 172.31.87.23
curl http://172.31.87.23:9200/logstash-2023.11.14-000001/_search?pretty=true
You should now see the log data in semi-structured/JSON form - like:
"_index" : "logstash-2023.08.13-000001",
"_type" : "_doc",
"_id" : "INK88IkBHSuKG7wH-wyZ",
"_score" : 1.0,
"_source" : {
"response" : "200",
"httpversion" : "1.1",
"agent" : "\"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.46 Safari/535.11\"",
"request" : "/category/software",
"bytes" : "111",
"geoip" : {
"location" : {
"lon" : -43.0811,
"lat" : -22.9201
},
"region_code" : "RJ",
"longitude" : -43.0811,
"continent_code" : "SA",
"latitude" : -22.9201,
"country_code3" : "BR",
"timezone" : "America/Sao_Paulo",
"city_name" : "Rio de Janeiro",
"region_name" : "Rio de Janeiro",
"country_name" : "Brazil",
"ip" : "200.222.44.110",
"postal_code" : "24000",
"country_code2" : "BR"
},
"auth" : "-",
"host" : "ip-172-31-83-57",
"ident" : "-",
"@version" : "1",
"@timestamp" : "2022-11-30T17:22:20.000Z",
"referrer" : "\"http://www.google.com/search?ie=UTF-8&q=google&sclient=psy-ab&q=Software&oq=Software&aq=f&aqi=g-vL1&aql=&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&biw=2401&bih=503\"",
"timestamp" : "30/Nov/2022:17:22:20 +0000",
"verb" : "GET",
"path" : "/logstash/sample-data",
"message" : "200.222.44.110 - - [30/Nov/2022:17:22:20 +0000] \"GET /category/software HTTP/1.1\" 200 111 \"http://www.google.com/search?ie=UTF-8&q=google&sclient=psy-ab&q=Software&oq=Software&aq=f&a
qi=g-vL1&aql=&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&biw=2401&bih=503\" \"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.46 Safari/535.11\"",
"clientip" : "200.222.44.110"
}
Use a simple apt command to install Kibana:
sudo apt-get install kibana
Open up the Kibana configuration file with vim or nano at: /etc/kibana/kibana.yml, and make sure you have the following configurations defined - replace with the IP of your Ubuntu system.
sudo micro /etc/kibana/kibana.yml
server.port: 5601
server.host: '<YourPrivateIP>'
elasticsearch.hosts: ["http://<YourPrivateIP>:9200"]
Start Kibana with:
sudo service kibana start
NOTE: It might take a few minutes for Kibana to actually start - even though it says the service is running
Open up Kibana in your laptop/workstation browser with: http://Public_IP_of_Ubuntu:5601. You will be presented with the Kibana home page.
http://54.173.196.221:5601
TROUBLESHOOTING: my kibana website was not connecting. Here are the steps I took:
- I checked to see if Kibana, logstash and elasticsearch were all on and running (and they were)
- I checked the logs of kibana to see if it actually was running with `sudo tail -f /var/log/kibana/kibana.log` and I got this error message:
{"type":"log","@timestamp":"2023-11-14T15:31:51+00:00","tags":["error","elasticsearch-service"],"pid":1460,"message":"Unable to retrieve version information from Elasticsearch nodes. connect ECONNREFUSED 127.0.0.1:9200"}
- From this message I figured there was an issue in the configuration file, so I went back in with `sudo micro /etc/kibana/kibana.yml`
Turns out I didn't uncomment the alterations I made with the private IP address, so Kibana could not find Elasticsearch and thus was not able to connect in the web browser.
The highlighted lines are the ones I uncommented:
You can also see the server status by going to http://YourPublicIP:5601/status
4.2 Add an Index Pattern to display the Logstash Index
In Kibana, click the 3 horizontal lines in the upper left to open the left navigation pane
Go to Stack Management → Kibana -> Index Patterns - select “Create Index Pattern”
Enter “logstash-*” as the index pattern, and in the next step select @timestamp as your Time Filter field.
Hit Create index pattern, and you are ready to analyze the data.
4.3 Use Kibana to query data
In the Navigation Menu (3 lines) Go to the Analytics- Discover tab
Use the time filter in the Upper Right to find a time window that includes the Apache Log data. (In testing, the file I downloaded went back to November 2022)
4.4 Query the Data
Explore the data and some of the key-value pairs that are in the documents. Notice that not all documents have all the same attributes - but there is consistency in the key name where the data is present.
Also note the GeoIP data - this was added as part of the logstash processing. Using a GeoIP database, it looked up the location of the Client_IP address from the original data.
Deliverable: Use the Search Bar in the upper left and make 3 different queries using the keys in the data
Searching for users with the country code AR
(Argentina)
Link to search for country codes: https://www.nationsonline.org/oneworld/country_code_list.htm
Searching for a googlebot from the US
Searching for requests that are 126 bytes
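The three searches above can be written in the Kibana search bar (default KQL syntax) using field names from the sample document shown earlier, for example:

```
geoip.country_code2 : "AR"
agent : *Googlebot* and geoip.country_code2 : "US"
bytes : 126
```

These field names come from this lab's grok/geoip output; note that in the sample document `bytes` was indexed from a string value, so quoting the number (`bytes : "126"`) may be needed depending on the index mapping.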
Links for reference:
Screenshot of Discover Window showing Apache logs through logstash