Lab 2 Elasticsearch Queries - Hsanokklis/2023-2024-Tech-journal GitHub Wiki
Helpful Info
The Public IPv4 address will change every new session.
Current IPv4 address in use: 3.92.222.131
To access your instance:
ssh -i hannelore-elk-key.pem ubuntu@<public_ip>
Private IPv4 address: 172.31.87.23
When you next log in to your system, make sure to start everything again
- Start in this order: Elasticsearch, Logstash, Kibana
Helpful Vocab!
Node
- any time you start an instance of Elasticsearch, you are starting a node.
- if you are running a single node of Elasticsearch, then you have a cluster of one node.
- Every node in the cluster can handle HTTP and transport traffic by default
Cluster
- a collection of connected nodes
[Shards](https://www.baeldung.com/java-shards-replicas-elasticsearch)
- Shards are building blocks representing a subset of the data stored in the index
- a shard is a Lucene index defined and stored within a node
- the collection of one or more shards represents an Elasticsearch index
- They are used to distribute data horizontally across the cluster nodes/members
Lucene shards and Index
- Lucene, or Apache Lucene, is an open-source Java library used as a search engine
- Elasticsearch is built on top of Lucene
- Elasticsearch converts Lucene into a distributed system/search engine for scaling horizontally.
Horizontal Scaling
- increasing the capacity of a system by adding additional machines (nodes), as opposed to increasing the capability of the existing machines.
Step 1: Using API to create Indices and Add Documents
Indexing is the process of adding data to Elasticsearch. The name comes from the fact that when you feed data into Elasticsearch, it is stored in Apache Lucene indexes.
Imagine you have some log data or configuration files with the following info:
We can use the API to create indices for that!
View the current indexes in your Elasticsearch installation (replace localhost with your IP):
curl -XGET '172.31.87.23:9200/_cat/indices?v&pretty'
- There should be a bunch of system-created ones that start with a leading "."
- And you should see the Logstash index we created the other day!
Use curl as in the following examples to send POST and PUT requests to the Elasticsearch service
Create the "app" index and add the user John with id=4
curl -X PUT 'localhost:9200/app/users/4' -H 'Content-Type: application/json' -d '
{
"id": 4,
"username": "john",
"last_login": "2023-01-25 12:34:56"
}
'
- Replace localhost with the private IP of your server
- `-H` specifies an HTTP header, in this case setting the `Content-Type` to JSON
- `-d` is the body of the message, which (as you should remember from studying HTTP) contains the parameters and values sent to the server
- Uses `PUT` as the id is specified
- `app` is the index, which will be created if it does not exist
- `users` is the type of document
Should get a response like:
{"_index":"app","_type":"users","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
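The response is plain JSON, so it can be checked programmatically. A quick Python sketch (using the exact response above) pulls out the fields that matter:

```python
import json

# The JSON response returned by the PUT request above
response = '{"_index":"app","_type":"users","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}'

doc = json.loads(response)
print(doc["result"])              # prints "created"; a repeat PUT to the same id returns "updated"
print(doc["_index"], doc["_id"])  # prints: app 4
```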
Create the "logs" index and add an entry for a login by John
curl -XPOST 'localhost:9200/logs/my_app' -H 'Content-Type: application/json' -d'
{
"timestamp": "2023-01-24 12:34:56",
"message": "User logged in",
"user_id": 4,
"admin": false
}
'
- Replace localhost with the private IP of your server
- `-H` specifies an HTTP header, in this case setting the `Content-Type` to JSON
- `-d` is the body of the message, which (as you should remember from studying HTTP) contains the parameters and values sent to the server
- Uses `POST` as it should auto-assign an `id`
- `logs` is the index, which will be created if it does not exist
- `my_app` is the type of document
Should get a response like:
{"_index":"logs","_type":"my_app","_id":"ZsWdJ2EBir6MIbMWSMyF","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
Review the list of indices again - and should see “logs” and “app” showing as indices:
curl -XGET '172.31.87.23:9200/_cat/indices?v&pretty'
Add more documents to the “app” index
Add 2 more users to the app index using a curl command similar to the steps above
NOTE: Make sure to change the id number (N) in both the URL and the JSON doc when using PUT or it will overwrite your current John entry
`localhost:9200/app/users/N` where N is the id you want to use
- One can be from the table above (Jane, Robert, or Emily)
- One should be one that you make up
Adding emily with id = 12
curl -X PUT '172.31.87.23:9200/app/users/12' -H 'Content-Type: application/json' -d '
{
"id": 12,
"username": "emily",
"last_login": "2023-01-25 12:34:56"
}
'
Output:
Adding greg with id = 26
curl -X PUT '172.31.87.23:9200/app/users/26' -H 'Content-Type: application/json' -d '
{
"id": 26,
"username": "greg",
"last_login": "2023-01-25 12:34:56"
}
'
Output:
Add 2 more entries into the “log” index for the 2 users you added
Use the curl in Step 4 and the Access table above as a guide
- Entries can include messages such as `User Logged In`, `User Logged Out`, `Incorrect Password`
Message Entry for emily
curl -XPOST '172.31.87.23:9200/logs/my_app' -H 'Content-Type: application/json' -d'
{
"timestamp": "2023-01-24 12:34:56",
"message": "User logged in",
"user_id": 12,
"admin": false
}
'
Output:
Message Entry for greg
curl -XPOST '172.31.87.23:9200/logs/my_app' -H 'Content-Type: application/json' -d'
{
"timestamp": "2023-01-24 12:34:56",
"message": "Password Incorrect",
"user_id": 26,
"admin": false
}
'
Output:
Step 2: Elasticsearch API Queries
Now that the data is indexed in Elasticsearch, we can start searching and analyzing it. The simplest query is to fetch items with a URI Search.
A URI, or Uniform Resource Identifier, can be used to search an Elasticsearch cluster. You can pass a simple query to Elasticsearch using the `q` query parameter. The following query will search your whole cluster for documents with the name field equal to "travis":
curl 'localhost:9200/_search?q=name:travis'
Use `GET` via the Elasticsearch REST API to retrieve the doc for id 4:
curl -XGET '172.31.87.23:9200/app/users/4?pretty'
The response should look like this:
{
"_index" : "app",
"_type" : "users",
"_id" : "4",
"_version" : 1,
"found" : true,
"_source" : {
"id" : 4,
"username" : "john",
"last_login" : "2023-01-25 12:34:56"
}
}
- The fields starting with an underscore are all meta fields of the result. The `_source` object is the original document that was indexed.
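Since `_source` holds the original document, a small Python sketch (fed the GET response above) can strip the meta fields and recover just the data that was indexed:

```python
import json

# GET response for /app/users/4 (from above)
response = """
{ "_index": "app", "_type": "users", "_id": "4", "_version": 1, "found": true,
  "_source": { "id": 4, "username": "john", "last_login": "2023-01-25 12:34:56" } }
"""

doc = json.loads(response)
if doc["found"]:
    user = doc["_source"]    # the original document, minus the meta fields
    print(user["username"])  # prints: john
```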
Use GET to do searches by calling the _search API endpoint with the “q” parameter. The following will return documents that include the keyword “logged”.
Request:
curl -XGET '172.31.87.23:9200/_search?q=logged'
Response:
{"took":173,"timed_out":false,"_shards":{"total":16,"successful":16,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"logs","_type":"my_app","_id":"ZsWdJ2EBir6MIbMWSMyF","_score":0.2876821,"_source":
{
"timestamp": "2023-01-24 12:34:56",
"message": "User logged in",
"user_id": 4,
"admin": false
}
}]}}
The result contains a number of extra fields that describe both the search and the result. Here’s a quick rundown:
- `took`: The time in milliseconds the search took
- `timed_out`: Whether the search timed out
- `_shards`: The number of Lucene shards searched, and their success and failure rates
- `hits`: The actual results, along with meta information for the results
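To make those fields concrete, here is a Python sketch that walks the search response shown above (trimmed to the same fields) and prints the timing plus each hit's score and original document:

```python
import json

# Search response for ?q=logged (the same structure as the response above)
response = '''
{"took": 173, "timed_out": false,
 "_shards": {"total": 16, "successful": 16, "skipped": 0, "failed": 0},
 "hits": {"total": 1, "max_score": 0.2876821,
          "hits": [{"_index": "logs", "_type": "my_app", "_id": "ZsWdJ2EBir6MIbMWSMyF",
                    "_score": 0.2876821,
                    "_source": {"timestamp": "2023-01-24 12:34:56",
                                "message": "User logged in",
                                "user_id": 4, "admin": false}}]}}
'''

result = json.loads(response)
print(f"took {result['took']} ms, {result['hits']['total']} hit(s)")
for hit in result["hits"]["hits"]:
    # each hit carries a relevance score plus the original indexed document
    print(hit["_score"], hit["_source"]["message"])
```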
The search above is known as a URI Search, and is the simplest way to query Elasticsearch. By providing only a word, ES will search all of the fields of all the documents for that word (which was `logged`)
Using the following as an example, try some more specific searches by using Lucene queries:
curl -XGET '172.31.87.23:9200/_search?q=username:john&pretty'
- Using `&pretty` will present the output in a more human-readable format. Important for more complex queries later on
- `username:john` – Looks for documents where the username field is equal to "john"
- `john*` – Looks for documents that contain terms that start with john followed by zero or more characters, such as "john," "johnb," and "johnson"
- `john?` – Looks for documents that contain terms that start with john followed by only one character. Matches "johnb" and "johns" but not "john."
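One gotcha with URI searches: the `q` value has to be URL-encoded when it contains spaces or reserved characters, or curl/Elasticsearch will misparse it. A sketch using Python's standard library (any URL encoder works the same way; the IP is the private address from this lab):

```python
from urllib.parse import quote

base = "172.31.87.23:9200/_search"   # private IP of the Elasticsearch server
query = 'message:"User logged in"'   # a phrase query containing spaces and quotes

# quote() percent-encodes the colon, quotes, and spaces
url = f"{base}?q={quote(query)}&pretty"
print(url)
# prints: 172.31.87.23:9200/_search?q=message%3A%22User%20logged%20in%22&pretty
```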
Create 2 Queries that will return some of the records you created
SUBMIT: Screenshot of your 2 successful queries
Query 1: Looking for username=greg
curl -XGET '172.31.87.23:9200/_search?q=username:greg&pretty'
Query 2: Searching for documents that contain terms starting with emily followed by zero or more characters
curl -XGET '172.31.87.23:9200/_search?q=username:emily*&pretty'
I also tried searching for documents that contain terms that start with emily followed by only one character:
curl -XGET '172.31.87.23:9200/_search?q=username:emily?&pretty'
This one came up with no results, as you can see by the `hits` total of 0.
Step 3: Elasticsearch Query DSL
In addition to URI searches, Elasticsearch also provides a request body search with a Query DSL for more advanced searches. There is a wide array of options available in these kinds of searches, and you can mix and match different options to get the results you need.
Query DSL contains two kinds of clauses:
- leaf query clauses that look for a value in a specific field
- compound query clauses (which might contain one or several leaf query clauses).
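For example, a `bool` compound clause can wrap several leaf clauses. This sketch builds one as a Python dict (just to show the nesting; the field names are the ones from the "logs" documents created in Step 1) and prints the JSON that would go in the `-d` body of a curl request:

```python
import json

# Compound bool query: the message must match the phrase AND admin must be false.
query = {
    "query": {
        "bool": {
            "must": [
                {"match_phrase": {"message": "User logged in"}},  # leaf clause
                {"term": {"admin": False}},                       # leaf clause
            ]
        }
    }
}

print(json.dumps(query, indent=2))
```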
Test Query DSL with an example (again, replace localhost with your IP address)
curl -XGET '172.31.87.23:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match_phrase": {
"message": "User logged in"
}
}
}
'
Come up with a simple Query DSL to return a record that you created
curl -XGET '172.31.87.23:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match_phrase": {
"message": "Password Incorrect"
}
}
}
'
I used this link for help: https://www.tutorialspoint.com/elasticsearch/elasticsearch_query_dsl.htm
curl -XGET '172.31.87.23:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"term": {
"user_id": "26"
}
}
}
'
TROUBLESHOOTING: I wanted to find a user with a certain user ID, but I was having trouble finding what to search to do this.
The first step I took was trying to figure out what Index the records I made were stored and I did this with this command:
curl -X GET "172.31.87.23:9200/logs/_search?pretty" -H 'Content-Type: application/json' -d '{
"query": {
"match_all": {}
}
}'
By looking at all my records I figured out that I should have been using `user_id` in my search query instead of `id` or `uid`, because that is the exact field name in the documents.
Step 4: Elasticsearch Query Challenges
Elasticsearch URI Searches
Go to your Kibana page (http://your_public_ip:5601) and review the log entries for the apache logs ingested via logstash (Analytics-Discover)
Identify some of the parameters (fields) in the log entry (such as "message", "referrer", "verb", "clientip", "request"
message: *
referrer: *
clientip: *
request: *
- Use curl with the _search api call to query for specific parameter/value entries in the Apache log.
- Remember - syntax will be like
curl -XGET 'ip_of_server:9200/_search?q=parameter:value&pretty'
Submit Screenshots of at least two different successful URI queries of apache log data
First successful query
curl -XGET '172.31.87.23:9200/_search?q=referrer:/catergory/toys&pretty'
Second successful query
curl -XGET '172.31.87.23:9200/_search?q=verb:GET&pretty'
Elasticsearch Query DSL
- Do some research on example Query DSL queries
- The Query DSL on this page is a good starting point https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html
Elasticsearch Query DSL (Domain Specific Language) is a query language used to interact with Elasticsearch. It allows users to construct complex queries to retrieve specific data from an Elasticsearch index. Components of Elasticsearch Query DSL include:
- Query Clauses: These clauses define the conditions that documents must meet to be considered a match.
- Filter Clauses: Filters are used to narrow down the search results based on specific criteria without affecting the relevance scores.
- Aggregation Clauses: Aggregations allow you to perform data analysis on the result set, such as calculating averages, sums or other statistical operations on the data.
A search consists of one or more queries that are combined and sent to Elasticsearch. You can use the search API to search and aggregate data stored in Elasticsearch.
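As a sketch of an aggregation clause: this request body would count Apache log entries per HTTP verb. (The `verb.keyword` field name is an assumption based on the default Logstash/Elasticsearch text-plus-keyword mapping; adjust it to match your index mapping.)

```python
import json

# Terms aggregation: bucket the Apache log entries by HTTP verb.
# "verb.keyword" is assumed from the default Logstash mapping, not verified here.
body = {
    "size": 0,          # return no hits, just the aggregation buckets
    "aggs": {
        "verbs": {
            "terms": {"field": "verb.keyword"}
        }
    }
}

# This JSON would be sent as the -d body of a _search request
print(json.dumps(body, indent=2))
```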
- Review the data that is returned from your queries against the Apache logs
- Demonstrate 2 examples of using a Query DSL to retrieve data from the Apache logs
- can use curl or Kibana Console (from Kibana: Management - DevTools)
Using the Dev tools to search for log entries with a response size of 126 bytes
GET /logstash-2023.11.14-000001/_search
{
"query": {
"bool": {
"must": [
{"term": {"bytes": {"value": "126"}}}
]
}
}
}
Using the Dev tools to search for documents with the country code IT (for Italy)
GET /logstash-2023.11.14-000001/_search
{
"query": {
"match": {
"geoip.country_code3": "IT"
}
}
}