ElasticSearch (2023) - MacKittipat/note-developer GitHub Wiki

Run ElasticSearch and Kibana

1. Run in Docker

# Create the shared network first; both containers join it via --net elastic
docker network create elastic
docker run --name mac-elasticsearch -e "discovery.type=single-node" -e "xpack.security.enabled=false" -e "xpack.security.http.ssl.enabled=false" --net elastic -p 9200:9200 -it docker.elastic.co/elasticsearch/elasticsearch:8.8.0
docker run --name mac-kibana --net elastic -p 5601:5601 docker.elastic.co/kibana/kibana:8.8.0

xpack.security.enabled=false lets clients access http://localhost:9200 without a username and password.
xpack.security.http.ssl.enabled=false disables TLS on the HTTP layer, so the endpoint is served over HTTP instead of HTTPS.

2. Open Kibana

http://localhost:5601/

3. Configure Kibana

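Enter the Elasticsearch URL: http://mac-elasticsearch:9200 (the container name resolves because both containers are on the elastic network)
Enter the verification code printed in the terminal running Kibana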

What is ElasticSearch

Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.
Elasticsearch is a distributed document store.
When a document is stored, it is indexed and becomes fully searchable in near real-time, typically within 1 second. Elasticsearch uses a data structure called an inverted index, which supports very fast full-text searches.
An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.
- There will be an inverted index for each full-text field per index. So if you have an index containing documents that contain five full-text fields, you will have five inverted indices.
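
To make the idea concrete, here is a minimal sketch of an inverted index for a single full-text field. It is not how Elasticsearch implements it internally; the function name `build_inverted_index` and the sample documents are made up for illustration, and the "analysis" is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique lowercase word to the sorted IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical documents: document ID -> value of one full-text field
docs = {
    1: "quick brown fox",
    2: "lazy brown dog",
    3: "quick dog",
}

inverted = build_inverted_index(docs)
print(inverted["brown"])  # [1, 2]
print(inverted["quick"])  # [1, 3]
```

Looking up a word is a single dictionary access, which is why full-text search does not need to scan every document.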

Architecture

Elasticsearch is distributed.
A node is a single server (physical or virtual) running Elasticsearch.
A cluster is a collection of nodes.
An index is split into shards, and the shards are spread across the nodes of the cluster.
Because an index is distributed across multiple shards, a query against it is executed in parallel on the shards. The results from each shard are then gathered, merged, and sent back to the client.
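
The scatter-gather flow described above can be sketched roughly as follows. This is a toy model, not Elasticsearch code: the `shards` list, `search_shard`, and `search_index` are hypothetical names, each shard is just a dictionary of documents, and a thread pool stands in for the nodes that hold the shards.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index split into three shards; each shard holds some of the documents.
shards = [
    {1: "quick brown fox", 4: "red fox"},
    {2: "lazy brown dog"},
    {3: "quick dog", 5: "brown bear"},
]


def search_shard(shard, word):
    """Return the IDs of the documents in one shard that contain the word."""
    return [doc_id for doc_id, text in shard.items() if word in text.split()]


def search_index(shards, word):
    """Query every shard in parallel, then merge the per-shard hits into one result."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = list(pool.map(lambda shard: search_shard(shard, word), shards))
    return sorted(doc_id for hits in per_shard for doc_id in hits)


print(search_index(shards, "brown"))  # [1, 2, 5]
```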

Mapping

Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.
Types of mapping
- Dynamic mapping
  - Documents are indexed without defining the mappings first; Elasticsearch infers the field types from the data.
- Explicit mapping
  - The mapping is defined before the document is indexed.
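
As an illustration of an explicit mapping, the sketch below builds the JSON body that would be sent when creating an index (for example with `PUT /<index-name>` in Kibana Dev Tools). The field names `title`, `status`, and `created_at` are hypothetical and not part of these notes.

```python
import json

# Request body for creating an index with an explicit mapping.
# The field names below are hypothetical examples.
explicit_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},        # analyzed, for full-text search
            "status": {"type": "keyword"},    # stored as-is, for exact matches
            "created_at": {"type": "date"},
        }
    }
}

print(json.dumps(explicit_mapping, indent=2))
```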

Text Analysis

Text analysis is the process of converting unstructured text, like the body of an email or a product description, into a structured format that’s optimized for search.
Elasticsearch performs text analysis when indexing or searching text fields.
Text analysis consists of the following steps:
- Tokenization
  - Breaking a text down into smaller chunks, called tokens. In most cases, these tokens are individual words.
- Normalization
  - Normalizing tokens into a standard format.
  - Example
    - Quick can be lowercased: quick.
    - foxes can be stemmed, or reduced to its root word: fox.
    - jump and leap are synonyms and can be indexed as a single word: jump.
Text analysis is performed by an analyzer.

Analyzer

An analyzer consists of zero or more character filters, exactly one tokenizer, and zero or more token filters, applied in that order.
Character filters
- A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters.
- For instance, it can strip HTML elements like <b> from the stream.
Tokenizer
- The tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens.
- For instance, a whitespace tokenizer breaks text into tokens whenever it sees any whitespace. It would convert the text "Quick brown fox!" into the terms [Quick, brown, fox!].
Token filter
- A token filter receives the token stream and may add, remove, or change tokens.
- For example, a lowercase token filter converts all tokens to lowercase.
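
The three stages can be sketched as plain functions chained together. This only mimics the pipeline; the function names are made up, and the real HTML-strip character filter, whitespace tokenizer, and lowercase token filter in Elasticsearch handle many more cases.

```python
import re


def strip_html(text):
    """Character filter: remove HTML tags such as <b> from the raw text."""
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenize(text):
    """Tokenizer: split the character stream into tokens on whitespace."""
    return text.split()


def lowercase(tokens):
    """Token filter: convert every token to lowercase."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Run character filter -> tokenizer -> token filter, in that order."""
    return lowercase(whitespace_tokenize(strip_html(text)))


print(analyze("<b>Quick</b> brown fox!"))  # ['quick', 'brown', 'fox!']
```

Note that the whitespace tokenizer keeps the trailing "!" on the last token, matching the [Quick, brown, fox!] example above.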

Index and Search Analysis

Text analysis occurs at two points:
- Index time
  - When a document is indexed, any text field values are analyzed.
- Search time
  - When running a full-text search on a text field, the query string (the text the user is searching for) is analyzed.
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-index-search-time.html
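
A small sketch of why analysis happens at both points: if the field value and the query string go through the same analyzer, their tokens line up even when the original casing differs. The `analyze` function here is a simplified stand-in (whitespace split plus lowercase), not a real Elasticsearch analyzer.

```python
def analyze(text):
    """Simplified analyzer: split on whitespace and lowercase each token."""
    return [token.lower() for token in text.split()]


# Index time: the field value is analyzed and the resulting terms are stored.
indexed_terms = set(analyze("The QUICK brown fox"))

# Search time: the query string is analyzed with the same analyzer.
query_terms = analyze("Quick Fox")

# Query terms that are found among the indexed terms.
matches = [term for term in query_terms if term in indexed_terms]
print(matches)  # ['quick', 'fox']
```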

Field Data Type

Text
- Text fields are analyzed: they are passed through an analyzer, which converts the string into a list of individual terms before it is indexed.
- Analyzing text field values is what makes full-text search possible.
- Best suited for unstructured but human-readable content, such as the body of an email or the description of a product.
Keyword
- Often used in sorting, aggregations, and term-level queries
- Keyword strings are left as-is for filtering and sorting
- Best suited for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags.
- Use it when you need exact-match queries.

Query

Full text queries

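Full-text queries let you search analyzed text fields. By default, the query string is processed with the same analyzer that was applied to the field at index time.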

Term-level queries

Term-level queries do not analyze search terms. Instead, term-level queries match the exact terms stored in a field.
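They can be used to query numbers, booleans, dates, and text.
A term query can run against analyzed or non-analyzed fields; it simply looks up the exact term in the inverted index. On an analyzed text field, this means the search term must match a token as it was stored after analysis (for example, lowercased).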
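
The difference between the two query families can be sketched with a toy index. Everything here is illustrative: `term_query` and `match_query` are hypothetical helper functions, not the Elasticsearch query DSL, and `analyze` is a simplified analyzer that only splits on whitespace and lowercases.

```python
def analyze(text):
    """Simplified analyzer: split on whitespace and lowercase each token."""
    return [token.lower() for token in text.split()]


# A text field is analyzed before indexing; a keyword field is stored as-is.
text_index = set(analyze("Quick Brown Fox"))  # {'quick', 'brown', 'fox'}
keyword_index = {"Quick Brown Fox"}


def term_query(index_terms, term):
    """Term-level lookup: the search term is NOT analyzed."""
    return term in index_terms


def match_query(index_terms, query):
    """Full-text lookup: the query string is analyzed first."""
    return any(token in index_terms for token in analyze(query))


print(term_query(text_index, "Quick"))               # False: index holds 'quick'
print(term_query(text_index, "quick"))               # True
print(term_query(keyword_index, "Quick Brown Fox"))  # True: exact stored value
print(match_query(text_index, "Quick"))              # True: query is lowercased first
```

This is why a term query for "Quick" against a text field can come back empty even though the document clearly contains the word.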

Courses

https://www.youtube.com/watch?v=gS_nHTWZEJ8&list=PL_mJOmq4zsHZYAyK606y7wjQtC0aoE6Es
