Lab 2 Elasticsearch Queries - Hsanokklis/2023-2024-Tech-journal GitHub Wiki

Helpful Info

The Public IPv4 address will change every new session. Current IPv4 address in use: 3.92.222.131

To access your instance: ssh -i hannelore-elk-key.pem ubuntu@<public IPv4 address>

Private IPv4 address is : 172.31.87.23

The next time you log in to your system, make sure to start everything again

  • Start in this order: Elasticsearch, Logstash, Kibana
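On an Ubuntu install managed by systemd, restarting the stack in that order might look like the sketch below (assuming the default service names elasticsearch, logstash, and kibana):

```shell
# Start the ELK stack in dependency order (Elasticsearch first).
sudo systemctl start elasticsearch
sudo systemctl start logstash
sudo systemctl start kibana

# Confirm Elasticsearch is answering before moving on
# (replace the IP with your server's private IPv4 address).
curl -XGET '172.31.87.23:9200/_cluster/health?pretty'
```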

Helpful Vocab!

Node

  • Any time that you start an instance of Elasticsearch, you are starting a node.
  • If you are running a single node of Elasticsearch, then you have a cluster of one node.
  • Every node in the cluster can handle HTTP and transport traffic by default.

Link: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#:~:text=Any%20time%20that%20you%20start,and%20transport%20traffic%20by%20default.

Cluster

  • a collection of connected nodes

Link: https://logz.io/blog/elasticsearch-cluster-tutorial/#:~:text=What%20is%20an%20Elasticsearch%20cluster,the%20nodes%20in%20the%20cluster.

Shards

  • Shards are building blocks representing a subset of the data stored in an index
  • a shard is a Lucene index defined and stored within a node
  • the collection of one or more shards represents an Elasticsearch index
  • They are used to distribute data horizontally across the cluster nodes/members

Link: https://www.baeldung.com/java-shards-replicas-elasticsearch
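To see how shards are laid out on your node(s), you can ask the _cat API (a sketch; replace the IP with your server's private IPv4 address):

```shell
# List every shard, the index it belongs to, its state, and which node holds it.
curl -XGET '172.31.87.23:9200/_cat/shards?v'
```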

Lucene shards and Index

  • Lucene or Apache Lucene is an open-source Java Library used as a search engine
  • Elasticsearch is built on top of Lucene
  • Elasticsearch converts Lucene into a distributed system/search engine for scaling horizontally.

Link: https://opster.com/guides/elasticsearch/glossary/elasticsearch-lucene/#:~:text=Lucene%20or%20Apache%20Lucene%20is,search%20engine%20for%20scaling%20horizontally.

Horizontal Scaling

  • increasing the capacity of a system by adding additional machines (nodes), as opposed to increasing the capability of the existing machines.

Link: https://www.cockroachlabs.com/blog/vertical-scaling-vs-horizontal-scaling/#:~:text=handle%20heavier%20workloads.-,What%20is%20horizontal%20scaling%3F,is%20also%20called%20scaling%20out.

Step 1: Using API to create Indices and Add Documents

Indexing is the process of adding data to Elasticsearch: when you feed data into Elasticsearch, it is stored in an Apache Lucene index under the hood

Imagine you have some log data or configuration files with the following info:

image

image

We can use the API to create indices for that!

View the current indexes in your Elasticsearch installation (replace localhost with your IP):

curl -XGET '172.31.87.23:9200/_cat/indices?v&pretty'

image

  • There should be a bunch of system created ones that start with a leading "."
  • And you should see the Logstash index we created the other day!

image

Use curl, as in the following examples, to craft HTTP requests that send POSTs and PUTs to the Elasticsearch service

Create "app" index and add the user John with id=4

curl -X PUT 'localhost:9200/app/users/4' -H 'Content-Type: application/json' -d '
{
  "id": 4,
  "username": "john",
  "last_login": "2023-01-25 12:34:56"
}
'

image

  • replace localhost with the private IP of your server
  • -H specifies an HTTP header, in this case setting the Content-Type to JSON
  • -d is the body of the message which, as you should remember from studying HTTP, contains the parameters and values sent to the server
  • uses PUT because the id is specified in the URL
  • app is the index, which will be created if it does not exist
  • users is the document type

Should get a response like:

{"_index":"app","_type":"users","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

image

Create the “logs” index and add an entry for a log in by John

curl -XPOST 'localhost:9200/logs/my_app' -H 'Content-Type: application/json' -d'
{
	"timestamp": "2023-01-24 12:34:56",
	"message": "User logged in",
	"user_id": 4,
	"admin": false
}
'

image

  • replace localhost with the private IP of your server
  • -H specifies an HTTP header, in this case setting the Content-Type to JSON
  • -d is the body of the message which, as you should remember from studying HTTP, contains the parameters and values sent to the server
  • uses POST so that Elasticsearch auto-assigns an id
  • logs is the index, which will be created if it does not exist
  • my_app is the document type

Should get a response like:

{"_index":"logs","_type":"my_app","_id":"ZsWdJ2EBir6MIbMWSMyF","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

image

Review the list of indices again - you should see “logs” and “app” showing as indices:

curl -XGET '172.31.87.23:9200/_cat/indices?v&pretty'

image

Add more documents to the “app” index

Add 2 more users to the app index using a curl command similar to the steps above

NOTE: Make sure to change the id number (N) in both the URL and the JSON doc when using PUT or it will overwrite your current John entry

  • localhost:9200/app/users/N where N is the id you want to use
  • One can be from the table above (Jane, Robert, or Emily)
  • One should be one that you make up
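As a sanity check on the overwrite behavior noted above: re-PUTting the same id does not error, it silently replaces the document and bumps _version. A sketch, reusing the John entry from earlier:

```shell
# First PUT creates the document ("result":"created", "_version":1).
curl -X PUT '172.31.87.23:9200/app/users/4' -H 'Content-Type: application/json' -d '
{"id": 4, "username": "john", "last_login": "2023-01-25 12:34:56"}
'

# A second PUT to the same id overwrites it ("result":"updated", "_version":2).
curl -X PUT '172.31.87.23:9200/app/users/4' -H 'Content-Type: application/json' -d '
{"id": 4, "username": "john", "last_login": "2023-01-26 00:00:00"}
'
```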

Adding emily with the id = 12

curl -X PUT '172.31.87.23:9200/app/users/12' -H 'Content-Type: application/json' -d '
{
  "id": 12,
  "username": "emily",
  "last_login": "2023-01-25 12:34:56"
}
'

image

Output:

image

Adding greg with the id = 26

curl -X PUT '172.31.87.23:9200/app/users/26' -H 'Content-Type: application/json' -d '
{
  "id": 26,
  "username": "greg",
  "last_login": "2023-01-25 12:34:56"
}
'

image

Output:

image

Add 2 more entries into the “logs” index for the 2 users you added

Use the curl from the “logs” example above and the Access table as a guide

  • Entries can include messages User Logged In, User Logged Out, Incorrect Password

Message Entry for emily

curl -XPOST '172.31.87.23:9200/logs/my_app' -H 'Content-Type: application/json' -d'
{
	"timestamp": "2023-01-24 12:34:56",
	"message": "User logged in",
	"user_id": 12,
	"admin": false
}
'

image

Output:

image

Message Entry for greg

curl -XPOST '172.31.87.23:9200/logs/my_app' -H 'Content-Type: application/json' -d'
{
	"timestamp": "2023-01-24 12:34:56",
	"message": "Password Incorrect",
	"user_id": 26,
	"admin": false
}
'

image

Output:

image

Step 2: Elasticsearch API Queries

Now that the data is indexed in Elasticsearch, we can start searching and analyzing it. The simplest query is to fetch items with a URI Search.

A URI (Uniform Resource Identifier) search can be used to query an Elasticsearch cluster. You can pass a simple query to Elasticsearch using the q query parameter. The following query will search your whole cluster for documents with a name field equal to "travis":

  • curl 'localhost:9200/_search?q=name:travis'

https://logz.io/blog/elasticsearch-queries/#:~:text=URI%20Search,%E2%80%9Clocalhost%3A9200%2F_search%3F

Via the Elasticsearch REST API, use GET to retrieve the doc for id 4

curl -XGET '172.31.87.23:9200/app/users/4?pretty'

response should be like this:

{
  "_index" : "app",
  "_type" : "users",
  "_id" : "4",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "id" : 4,
    "username" : "john",
    "last_login" : "2023-01-25 12:34:56"
  }
}

image

  • The fields starting with an underscore are all meta fields of the result. The _source object is the original document that was indexed.
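If you only want the original document back without the meta fields, Elasticsearch also exposes a _source endpoint on the same path (a sketch against the same John document):

```shell
# Returns just the stored JSON body, without the _index/_type/_id/_version wrapper.
curl -XGET '172.31.87.23:9200/app/users/4/_source?pretty'
```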

Use GET to do searches by calling the _search API endpoint with the “q” parameter. The following will return documents that include the keyword “logged”.

Request:

curl -XGET '172.31.87.23:9200/_search?q=logged'

image

Response:

{"took":173,"timed_out":false,"_shards":{"total":16,"successful":16,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"logs","_type":"my_app","_id":"ZsWdJ2EBir6MIbMWSMyF","_score":0.2876821,"_source":
{
    "timestamp": "2023-01-24 12:34:56",
    "message": "User logged in",
    "user_id": 4,
    "admin": false
}
}]}}

image

The result contains a number of extra fields that describe both the search and the result. Here’s a quick rundown:

  • took: The time in milliseconds the search took
  • timed_out: if the search timed out
  • _shards: The number of Lucene shards searched, and their success and failure rates
  • hits: the actual results, along with meta information for the results

The search above is known as a URI Search, and it is the simplest way to query Elasticsearch. By providing only a word, ES will search all of the fields of all documents for that word (in this case, “logged”)

Using the following as an example, try some more specific searches by using Lucene queries:

curl -XGET '172.31.87.23:9200/_search?q=username:john&pretty'

  • using &pretty will present the output in a more human-readable format, which is important for the more complex queries later on
  • username:john – Looks for documents where the username field is equal to “john”
  • john* – Looks for documents that contain terms that start with john and is followed by zero or more characters such as “john,” “johnb,” and “johnson”
  • john? – Looks for documents that contain terms that start with john followed by only one character. Matches “johnb” and “johns” but not “john.”

Create 2 Queries that will return some of the records you created

SUBMIT: Screenshot of your 2 successful queries

Query 1: Looking for username=greg

curl -XGET '172.31.87.23:9200/_search?q=username:greg&pretty'

image

Query 2: Searching for documents where the username starts with emily and is followed by zero or more characters

curl -XGET '172.31.87.23:9200/_search?q=username:emily*&pretty'

image

I also tried searching for documents that contain terms that start with emily followed by only one character

curl -XGET '172.31.87.23:9200/_search?q=username:emily?&pretty'

image

This one came up with no results, as you can see by the hit value of 0.

Step 3: Elasticsearch Query DSL

In addition to URI searches, Elasticsearch also provides a request body search with a Query DSL for more advanced searches. There is a wide array of options available in these kinds of searches, and you can mix and match different options to get the results you need.

Query DSL contains two kinds of clauses:

  • leaf query clauses, which look for a value in a specific field
  • compound query clauses, which can contain one or several leaf query clauses
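For example, a compound bool query can combine several leaf clauses in one request (a sketch, assuming the logs index created in Step 1):

```shell
# bool combines leaf clauses: "must" behaves like AND, "must_not" excludes matches.
curl -XGET '172.31.87.23:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must":     [ { "match": { "message": "logged" } } ],
      "must_not": [ { "term":  { "admin": true } } ]
    }
  }
}
'
```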

Test Query DSL with an example (again, replace localhost with your IP address)

curl -XGET '172.31.87.23:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_phrase": {
      "message": "User logged in"
    }
  }
}
'

image

Come up with a simple Query DSL to return a record that you created

curl -XGET '172.31.87.23:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_phrase": {
      "message": "Password Incorrect"
    }
  }
}
'

image

I used this link for help: https://www.tutorialspoint.com/elasticsearch/elasticsearch_query_dsl.htm

curl -XGET '172.31.87.23:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "user_id": "26"
    }
  }
}
'

image

TROUBLESHOOTING: I wanted to find a user with a certain user ID, but I was having trouble figuring out what to search on.

The first step I took was to figure out which index the records I made were stored in, which I did with this command:

curl -X GET "172.31.87.23:9200/logs/_search?pretty" -H 'Content-Type: application/json' -d '{
  "query": {
    "match_all": {}
  }
}' 

By looking at all my records I figured out that I should have been using user_id in my search query instead of id or uid because that is specifically what it is identified as.

image

Step 4: Elasticsearch Query Challenges

Elasticsearch URI Searches

Go to your Kibana page (http://your_public_ip:5601) and review the log entries for the apache logs ingested via logstash (Analytics-Discover)

image

Identify some of the parameters (fields) in the log entry (such as "message", "referrer", "verb", "clientip", "request")

  • message: *

image

  • referrer: *

image

  • clientip: *

image

  • request: *

image

  • Use curl with the _search api call to query for specific parameter/value entries in the Apache log.
  • Remember - syntax will be like curl -XGET 'ip_of_server:9200/_search?q=parameter:value&pretty'

Submit Screenshots of at least two different successful URI queries of apache log data

First successful query

curl -XGET '172.31.87.23:9200/_search?q=referrer:/catergory/toys&pretty'

image

Second successful query

curl -XGET '172.31.87.23:9200/_search?q=verb:GET&pretty'

image

Elasticsearch Query DSL

Elasticsearch Query DSL (Domain Specific Language) is a query language used to interact with Elasticsearch. It allows users to construct complex queries to retrieve specific data from an Elasticsearch index. Components of Elasticsearch Query DSL include:

  • Query Clauses: These clauses define the conditions that documents must meet to be considered a match.
  • Filter Clauses: Filters are used to narrow down the search results based on specific criteria without affecting the relevance scores.
  • Aggregation Clauses: Aggregations allow you to perform data analysis on the result set, such as calculating averages, sums or other statistical operations on the data.

A search consists of one or more queries that are combined and sent to Elasticsearch. You can use the search API to search and aggregate data stored in Elasticsearch.
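As a sketch of the aggregation clauses mentioned above (assuming the Apache logs carry a numeric bytes field, as in the default logstash Apache pattern):

```shell
# "size": 0 suppresses the document hits; only the aggregation result is returned.
curl -XGET '172.31.87.23:9200/logstash-*/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "avg_bytes": { "avg": { "field": "bytes" } }
  }
}
'
```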

  • Review the data that is returned from your queries against the Apache logs
  • Demonstrate 2 examples of using a Query DSL to retrieve data from the Apache logs
    • can use curl or Kibana Console (from Kibana: Management - DevTools)

Using the Dev Tools to search for log entries whose bytes field (response size) is 126

GET /logstash-2023.11.14-000001/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"bytes": {"value": "126"}}}
      ]
    }
  }
}

image

Using the Dev tools to search for documents with the country code IT (for Italy)

GET /logstash-2023.11.14-000001/_search
{
  "query": {
      "match": {
        "geoip.country_code3": "IT"
      }
  }
}

image