Lab 2 Elasticsearch Queries - Hsanokklis/2023-2024-Tech-journal GitHub Wiki

Helpful Info

The Public IPv4 address will change every new session. Current IPv4 address in use: 3.92.222.131

To access your instance: ssh -i hannelore-elk-key.pem ubuntu@<public IPv4 address>

Private IPv4 address is : 172.31.87.23

The next time you log in to your system, make sure to start everything again

  • Start in this order: Elasticsearch, Logstash, Kibana
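On an Ubuntu install managed by systemd, restarting the stack in that order might look like the sketch below (assuming the default service names elasticsearch, logstash, and kibana):

```shell
# Start the ELK stack in dependency order (Elasticsearch first).
sudo systemctl start elasticsearch
sudo systemctl start logstash
sudo systemctl start kibana

# Confirm Elasticsearch is answering before moving on
# (replace the IP with your server's private IPv4 address).
curl -XGET '172.31.87.23:9200/_cluster/health?pretty'
```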

Helpful Vocab!

Node

  • Any time that you start an instance of Elasticsearch, you are starting a node.
  • If you are running a single node of Elasticsearch, then you have a cluster of one node.
  • Every node in the cluster can handle HTTP and transport traffic by default.

Link: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#:~:text=Any%20time%20that%20you%20start,and%20transport%20traffic%20by%20default.

Cluster

  • a collection of connected nodes

Link: https://logz.io/blog/elasticsearch-cluster-tutorial/#:~:text=What%20is%20an%20Elasticsearch%20cluster,the%20nodes%20in%20the%20cluster.

Shards

  • Shards are building blocks representing a subset of the data stored in an index
  • a shard is a Lucene index defined and stored within a node
  • the collection of one or more shards represents an Elasticsearch index
  • They are used to distribute data horizontally across the cluster nodes/members

Link: https://www.baeldung.com/java-shards-replicas-elasticsearch
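To see how shards are laid out on your node(s), you can ask the _cat API (a sketch; replace the IP with your server's private IPv4 address):

```shell
# List every shard, the index it belongs to, its state, and which node holds it.
curl -XGET '172.31.87.23:9200/_cat/shards?v'
```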

Lucene shards and Index

  • Lucene or Apache Lucene is an open-source Java Library used as a search engine
  • Elasticsearch is built on top of Lucene
  • Elasticsearch converts Lucene into a distributed system/search engine for scaling horizontally.

Link: https://opster.com/guides/elasticsearch/glossary/elasticsearch-lucene/#:~:text=Lucene%20or%20Apache%20Lucene%20is,search%20engine%20for%20scaling%20horizontally.

Horizontal Scaling

  • increasing the capacity of a system by adding additional machines (nodes), as opposed to increasing the capability of the existing machines.

Link: https://www.cockroachlabs.com/blog/vertical-scaling-vs-horizontal-scaling/#:~:text=handle%20heavier%20workloads.-,What%20is%20horizontal%20scaling%3F,is%20also%20called%20scaling%20out.

Step 1: Using API to create Indices and Add Documents

Indexing is the process of adding data to Elasticsearch: when you feed data into Elasticsearch, it is stored in an Apache Lucene index under the hood

Imagine you have some log data or configuration files with the following info:

image

image

We can use the API to create indices for that!

View the current indexes in your Elasticsearch installation (replace localhost with your IP):

curl -XGET '172.31.87.23:9200/_cat/indices?v&pretty'

image

  • There should be a bunch of system created ones that start with a leading "."
  • And you should see the Logstash index we created the other day!

image

Use curl, as in the following examples, to craft HTTP requests that send POSTs and PUTs to the Elasticsearch service

Create "app" index and add the user John with id=4

curl -X PUT 'localhost:9200/app/users/4' -H 'Content-Type: application/json' -d '
{
  "id": 4,
  "username": "john",
  "last_login": "2023-01-25 12:34:56"
}
'

image

  • replace localhost with the private IP of your server
  • -H specifies an HTTP header, in this case setting the Content-Type to JSON
  • -d is the body of the message which, as you should remember from studying HTTP, contains the parameters and values sent to the server
  • uses PUT because the id is specified in the URL
  • app is the index, which will be created if it does not exist
  • users is the document type

Should get a response like:

{"_index":"app","_type":"users","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

image

Create the “logs” index and add an entry for a log in by John

curl -XPOST 'localhost:9200/logs/my_app' -H 'Content-Type: application/json' -d'
{
	"timestamp": "2023-01-24 12:34:56",
	"message": "User logged in",
	"user_id": 4,
	"admin": false
}
'

image

  • replace localhost with the private IP of your server
  • -H specifies an HTTP header, in this case setting the Content-Type to JSON
  • -d is the body of the message which, as you should remember from studying HTTP, contains the parameters and values sent to the server
  • uses POST so that Elasticsearch auto-assigns an id
  • logs is the index, which will be created if it does not exist
  • my_app is the document type

Should get a response like:

{"_index":"logs","_type":"my_app","_id":"ZsWdJ2EBir6MIbMWSMyF","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

image

Review the list of indices again - you should see “logs” and “app” showing as indices:

curl -XGET '172.31.87.23:9200/_cat/indices?v&pretty'

image

Add more documents to the “app” index

Add 2 more users to the app index using a curl command similar to the steps above

NOTE: Make sure to change the id number (N) in both the URL and the JSON doc when using PUT or it will overwrite your current John entry

  • localhost:9200/app/users/N where N is the id you want to use
  • One can be from the table above (Jane, Robert, or Emily)
  • One should be one that you make up
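As a sanity check on the overwrite behavior noted above: re-PUTting the same id does not error, it silently replaces the document and bumps _version. A sketch, reusing the John entry from earlier:

```shell
# First PUT creates the document ("result":"created", "_version":1).
curl -X PUT '172.31.87.23:9200/app/users/4' -H 'Content-Type: application/json' -d '
{"id": 4, "username": "john", "last_login": "2023-01-25 12:34:56"}
'

# A second PUT to the same id overwrites it ("result":"updated", "_version":2).
curl -X PUT '172.31.87.23:9200/app/users/4' -H 'Content-Type: application/json' -d '
{"id": 4, "username": "john", "last_login": "2023-01-26 00:00:00"}
'
```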

Adding emily with the id = 12

curl -X PUT '172.31.87.23:9200/app/users/12' -H 'Content-Type: application/json' -d '
{
  "id": 12,
  "username": "emily",
  "last_login": "2023-01-25 12:34:56"
}
'

image

Output:

image

Adding greg with the id = 26

curl -X PUT '172.31.87.23:9200/app/users/26' -H 'Content-Type: application/json' -d '
{
  "id": 26,
  "username": "greg",
  "last_login": "2023-01-25 12:34:56"
}
'

image

Output:

image

Add 2 more entries into the “logs” index for the 2 users you added

Use the curl from the “logs” example above and the Access table as a guide

  • Entries can include messages User Logged In, User Logged Out, Incorrect Password

Message Entry for emily

curl -XPOST '172.31.87.23:9200/logs/my_app' -H 'Content-Type: application/json' -d'
{
	"timestamp": "2023-01-24 12:34:56",
	"message": "User logged in",
	"user_id": 12,
	"admin": false
}
'

image

Output:

image

Message Entry for greg

curl -XPOST '172.31.87.23:9200/logs/my_app' -H 'Content-Type: application/json' -d'
{
	"timestamp": "2023-01-24 12:34:56",
	"message": "Password Incorrect",
	"user_id": 26,
	"admin": false
}
'

image

Output:

image

Step 2: Elasticsearch API Queries

Now that the data is indexed in Elasticsearch, we can start searching and analyzing it. The simplest query is to fetch items with a URI Search.

A URI (Uniform Resource Identifier) search can be used to query an Elasticsearch cluster. You can pass a simple query to Elasticsearch using the q query parameter. The following query will search your whole cluster for documents with a name field equal to "travis":

  • curl 'localhost:9200/_search?q=name:travis'

https://logz.io/blog/elasticsearch-queries/#:~:text=URI%20Search,%E2%80%9Clocalhost%3A9200%2F_search%3F

Via the Elasticsearch REST API, use GET to retrieve the doc for id 4

curl -XGET '172.31.87.23:9200/app/users/4?pretty'

response should be like this:

{
  "_index" : "app",
  "_type" : "users",
  "_id" : "4",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "id" : 4,
    "username" : "john",
    "last_login" : "2023-01-25 12:34:56"
  }
}

image

  • The fields starting with an underscore are all meta fields of the result. The _source object is the original document that was indexed.
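If you only want the original document back without the meta fields, Elasticsearch also exposes a _source endpoint on the same path (a sketch against the same John document):

```shell
# Returns just the stored JSON body, without the _index/_type/_id/_version wrapper.
curl -XGET '172.31.87.23:9200/app/users/4/_source?pretty'
```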

Use GET to do searches by calling the _search API endpoint with the “q” parameter. The following will return documents that include the keyword “logged”.

Request:

curl -XGET '172.31.87.23:9200/_search?q=logged'

image

Response:

{"took":173,"timed_out":false,"_shards":{"total":16,"successful":16,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"logs","_type":"my_app","_id":"ZsWdJ2EBir6MIbMWSMyF","_score":0.2876821,"_source":
{
    "timestamp": "2023-01-24 12:34:56",
    "message": "User logged in",
    "user_id": 4,
    "admin": false
}
}]}}

image

The result contains a number of extra fields that describe both the search and the result. Here’s a quick rundown:

  • took: The time in milliseconds the search took
  • timed_out: if the search timed out
  • _shards: The number of Lucene shards searched, and their success and failure rates
  • hits: the actual results, along with meta information for the results

The search above is known as a URI Search, and it is the simplest way to query Elasticsearch. By providing only a word, ES will search all of the fields of all documents for that word (in this case, “logged”)

Using the following as an example, try some more specific searches by using Lucene queries:

curl -XGET '172.31.87.23:9200/_search?q=username:john&pretty'

  • using &pretty will present the output in a more human-readable format, which is important for the more complex queries later on
  • username:john – Looks for documents where the username field is equal to “john”
  • john* – Looks for documents that contain terms that start with john and is followed by zero or more characters such as “john,” “johnb,” and “johnson”
  • john? – Looks for documents that contain terms that start with john followed by only one character. Matches “johnb” and “johns” but not “john.”

Create 2 Queries that will return some of the records you created

SUBMIT: Screenshot of your 2 successful queries

Query 1: Looking for username=greg

curl -XGET '172.31.87.23:9200/_search?q=username:greg&pretty'

image

Query 2: Searching for documents where the username starts with emily and is followed by zero or more characters

curl -XGET '172.31.87.23:9200/_search?q=username:emily*&pretty'

image

I also tried searching for documents that contain terms that start with emily followed by only one character

curl -XGET '172.31.87.23:9200/_search?q=username:emily?&pretty'

image

This one came up with no results, as you can see by the hit value of 0.

Step 3: Elasticsearch Query DSL

In addition to URI searches, Elasticsearch also provides a request body search with a Query DSL for more advanced searches. There is a wide array of options available in these kinds of searches, and you can mix and match different options to get the results you need.

Query DSL contains two kinds of clauses:

  • leaf query clauses, which look for a value in a specific field
  • compound query clauses, which can contain one or several leaf query clauses
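For example, a compound bool query can combine several leaf clauses in one request (a sketch, assuming the logs index created in Step 1):

```shell
# bool combines leaf clauses: "must" behaves like AND, "must_not" excludes matches.
curl -XGET '172.31.87.23:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must":     [ { "match": { "message": "logged" } } ],
      "must_not": [ { "term":  { "admin": true } } ]
    }
  }
}
'
```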

Test Query DSL with an example (again, replace localhost with your IP address)

curl -XGET '172.31.87.23:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_phrase": {
      "message": "User logged in"
    }
  }
}
'

image

Come up with a simple Query DSL to return a record that you created

curl -XGET '172.31.87.23:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_phrase": {
      "message": "Password Incorrect"
    }
  }
}
'

image

I used this link for help: https://www.tutorialspoint.com/elasticsearch/elasticsearch_query_dsl.htm

curl -XGET '172.31.87.23:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "user_id": "26"
    }
  }
}
'

image

TROUBLESHOOTING: I wanted to find a user with a certain user ID, but I was having trouble figuring out what to search on.

The first step I took was to figure out which index the records I made were stored in, which I did with this command:

curl -X GET "172.31.87.23:9200/logs/_search?pretty" -H 'Content-Type: application/json' -d '{
  "query": {
    "match_all": {}
  }
}' 

By looking at all my records I figured out that I should have been using user_id in my search query instead of id or uid because that is specifically what it is identified as.

image

Step 4: Elasticsearch Query Challenges

Elasticsearch URI Searches

Go to your Kibana page (http://your_public_ip:5601) and review the log entries for the apache logs ingested via logstash (Analytics-Discover)

image

Identify some of the parameters (fields) in the log entry (such as "message", "referrer", "verb", "clientip", "request")

  • message: *

image

  • referrer: *

image

  • clientip: *

image

  • request: *

image

  • Use curl with the _search api call to query for specific parameter/value entries in the Apache log.
  • Remember - syntax will be like curl -XGET 'ip_of_server:9200/_search?q=parameter:value&pretty'

Submit Screenshots of at least two different successful URI queries of apache log data

First successful query

curl -XGET '172.31.87.23:9200/_search?q=referrer:/catergory/toys&pretty'

image

Second successful query

curl -XGET '172.31.87.23:9200/_search?q=verb:GET&pretty'

image

Elasticsearch Query DSL

Elasticsearch Query DSL (Domain Specific Language) is a query language used to interact with Elasticsearch. It allows users to construct complex queries to retrieve specific data from an Elasticsearch index. Components of Elasticsearch Query DSL include:

  • Query Clauses: These clauses define the conditions that documents must meet to be considered a match.
  • Filter Clauses: Filters are used to narrow down the search results based on specific criteria without affecting the relevance scores.
  • Aggregation Clauses: Aggregations allow you to perform data analysis on the result set, such as calculating averages, sums or other statistical operations on the data.

A search consists of one or more queries that are combined and sent to Elasticsearch. You can use the search API to search and aggregate data stored in Elasticsearch.
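As a sketch of the aggregation clauses mentioned above (assuming the Apache logs carry a numeric bytes field, as in the default logstash Apache pattern):

```shell
# "size": 0 suppresses the document hits; only the aggregation result is returned.
curl -XGET '172.31.87.23:9200/logstash-*/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "avg_bytes": { "avg": { "field": "bytes" } }
  }
}
'
```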

  • Review the data that is returned from your queries against the Apache logs
  • Demonstrate 2 examples of using a Query DSL to retrieve data from the Apache logs
    • can use curl or Kibana Console (from Kibana: Management - DevTools)

Using the Dev Tools to search for log entries whose bytes field (response size) is 126

GET /logstash-2023.11.14-000001/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"bytes": {"value": "126"}}}
      ]
    }
  }
}

image

Using the Dev tools to search for documents with the country code IT (for Italy)

GET /logstash-2023.11.14-000001/_search
{
  "query": {
      "match": {
        "geoip.country_code3": "IT"
      }
  }
}

image