ElasticSearch (2023) - MacKittipat/note-developer GitHub Wiki
Run ElasticSearch and Kibana
1. Run in Docker
docker network create elastic
docker run --name mac-elasticsearch -e "discovery.type=single-node" -e "xpack.security.enabled=false" -e "xpack.security.http.ssl.enabled=false" --net elastic -p 9200:9200 -it docker.elastic.co/elasticsearch/elasticsearch:8.8.0
docker run --name mac-kibana --net elastic -p 5601:5601 docker.elastic.co/kibana/kibana:8.8.0
- `xpack.security.enabled=false` lets clients access http://localhost:9200 without being asked for a username and password.
- `xpack.security.http.ssl.enabled=false` disables TLS, so we can use HTTP instead of HTTPS.
2. Open Kibana
3. Configure Kibana
What is ElasticSearch
- Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured
- Elasticsearch is a distributed document store.
- When a document is stored, it is indexed and becomes fully searchable in near real-time, typically within 1 second. Elasticsearch uses a data structure called an inverted index that supports very fast full-text searches.
- An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.
- There will be an inverted index for each full-text field per index. So if you have an index containing documents that contain five full-text fields, you will have five inverted indices.
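To make the inverted index idea concrete, here is a minimal Python sketch. It is a toy model, not how Elasticsearch (Lucene) stores data internally; the function name `build_inverted_index` and the sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique lowercased word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical documents for a single full-text field.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick dog",
}

index = build_inverted_index(docs)
print(index["quick"])  # [1, 3]
print(index["dog"])    # [2, 3]
```

Finding the documents that contain a word is then a single dictionary lookup, which is why full-text search on an inverted index is fast.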
Architecture
- ElasticSearch is distributed.
- A node is a server (either physical or virtual).
- A cluster is a collection of nodes.
- An index is split into shards, which are distributed across the nodes. A query against an index is executed in parallel across all of its shards. The results from each shard are then gathered and sent back to the client.
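The scatter-and-gather flow described above can be sketched in plain Python. This is only an illustration under simple assumptions: the `shards` list, the `search_shard` and `search` functions, and the naive term-count score are all hypothetical and not Elasticsearch APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each one holds a slice of the index's documents.
shards = [
    {1: "quick brown fox", 2: "lazy dog"},
    {3: "quick dog", 4: "brown bear"},
    {5: "quick quick fox"},
]


def search_shard(shard, term):
    """Return (doc_id, score) pairs for one shard; the score is a naive term count."""
    hits = []
    for doc_id, text in shard.items():
        count = text.split().count(term)
        if count:
            hits.append((doc_id, count))
    return hits


def search(term, size=10):
    """Scatter the query to every shard in parallel, then gather and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = list(pool.map(lambda shard: search_shard(shard, term), shards))
    merged = [hit for hits in per_shard for hit in hits]
    # Highest score first; ties are broken by document id.
    merged.sort(key=lambda hit: (-hit[1], hit[0]))
    return merged[:size]


print(search("quick"))  # [(5, 2), (1, 1), (3, 1)]
```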
Mapping
- Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.
- Types of mapping
  - Dynamic mapping
    - Documents are indexed without defining the mapping first; Elasticsearch infers the field types from the document.
  - Explicit mapping
    - The mapping is defined before the document is indexed.
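A rough sketch of the two approaches in Python. `explicit_mapping` shows the shape of a request body one might send when creating an index (the field names are made up). `infer_field_type` is a deliberately simplified stand-in for dynamic mapping; real Elasticsearch applies more rules, for example date detection and a `keyword` sub-field for strings.

```python
# Explicit mapping: field types are declared up front (field names are hypothetical).
explicit_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "status": {"type": "keyword"},
            "price": {"type": "float"},
        }
    }
}


def infer_field_type(value):
    """Toy stand-in for dynamic mapping: guess a field type from a JSON value."""
    if isinstance(value, bool):  # check bool first, because bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    raise ValueError(f"unsupported value in this sketch: {value!r}")


def infer_mapping(doc):
    """Build a mapping-like dict from the first document seen."""
    return {"properties": {name: {"type": infer_field_type(v)} for name, v in doc.items()}}


doc = {"title": "Quick brown fox", "views": 42, "published": True}
print(infer_mapping(doc))
```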
Text Analysis
- Text analysis is the process of converting unstructured text, like the body of an email or a product description, into a structured format that’s optimized for search.
- Elasticsearch performs text analysis when indexing or searching text fields.
- Text analysis consists of the following processes:
  - Tokenization
    - Breaking a text down into smaller chunks, called tokens. In most cases, these tokens are individual words.
  - Normalization
    - Normalizing tokens into a standard format.
    - Examples
      - `Quick` can be lowercased: `quick`.
      - `foxes` can be stemmed, or reduced to its root word: `fox`.
      - `jump` and `leap` are synonyms and can be indexed as a single word: `jump`.
- Text analysis is performed by an analyzer.
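The tokenization and normalization steps can be mimicked with a few lines of Python. The `STEMS` and `SYNONYMS` tables below are hypothetical lookup tables that cover only the words from the example; a real stemmer or synonym filter is rule-based or dictionary-based and far more complete.

```python
# Hypothetical lookup tables, just large enough for the example.
STEMS = {"foxes": "fox"}
SYNONYMS = {"leap": "jump"}


def normalize(token):
    """Lowercase a token, then apply the toy stemming and synonym tables."""
    token = token.lower()
    token = STEMS.get(token, token)
    token = SYNONYMS.get(token, token)
    return token


def analyze(text):
    """Tokenize on whitespace, then normalize every token."""
    return [normalize(token) for token in text.split()]


print(analyze("Quick foxes leap"))  # ['quick', 'fox', 'jump']
```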
Analyzer
- An analyzer consists of character filters, a tokenizer, and token filters.
- Character filters
  - A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters.
  - For instance, it can strip HTML elements like `<b>` from the stream.
- Tokenizer
  - A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens.
  - For instance, a `whitespace` tokenizer breaks text into tokens whenever it sees any whitespace. It would convert the text `"Quick brown fox!"` into the terms `[Quick, brown, fox!]`.
- Token filter
  - A token filter receives the token stream and may add, remove, or change tokens.
  - For example, a `lowercase` token filter converts all tokens to lowercase.
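The three stages can be chained as in the sketch below. The Python function names are stand-ins chosen to echo the built-in `html_strip`, `whitespace`, and `lowercase` components; they are not Elasticsearch APIs, and the regex-based HTML stripping is a simplification.

```python
import re


def html_strip_char_filter(text):
    """Character filter: remove HTML tags such as <b> from the character stream."""
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenizer(text):
    """Tokenizer: split the character stream on whitespace."""
    return text.split()


def lowercase_token_filter(tokens):
    """Token filter: convert every token to lowercase."""
    return [token.lower() for token in tokens]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run character filters, then the tokenizer, then token filters, in that order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "<b>Quick</b> brown fox!",
    char_filters=[html_strip_char_filter],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase_token_filter],
)
print(tokens)  # ['quick', 'brown', 'fox!']
```

Note that `fox!` keeps its punctuation, because a whitespace tokenizer only splits on whitespace.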
Index and Search Analysis
Field Data Type
- Text
  - Text fields are analyzed: they are passed through an analyzer that converts the string into a list of individual terms before it is indexed.
  - Text field values are analyzed for full-text search.
  - Best suited for unstructured but human-readable content, such as the body of an email or the description of a product.
- Keyword
  - Often used in sorting, aggregations, and term-level queries.
  - Keyword strings are left as-is for filtering and sorting.
  - Best suited for structured content, such as IDs, email addresses, hostnames, status codes, zip codes, or tags.
  - Use when you want an exact match query.
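A small sketch of the difference between the two field types. `index_text` and `index_keyword` are hypothetical helpers, and the lowercase-and-split analyzer is a simplification of what a real analyzer does.

```python
from collections import Counter


def index_text(value):
    """Text field: the value is analyzed into individual lowercase terms."""
    return value.lower().split()


def index_keyword(value):
    """Keyword field: the whole value is kept as a single, unmodified term."""
    return [value]


value = "New York"
print(index_text(value))     # ['new', 'york']
print(index_keyword(value))  # ['New York']

# Because keyword values are stored as-is, counting by exact value
# (similar in spirit to an aggregation) is straightforward.
statuses = ["shipped", "pending", "shipped"]
status_counts = Counter(statuses)
print(status_counts["shipped"])  # 2
```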
Query
Full text queries
- Enable you to search analyzed text fields.
Term-level queries
- Term-level queries do not analyze search terms. Instead, they match the exact terms stored in a field.
- Can be used to query numbers, booleans, dates, and text.
- A term query can be run against analyzed or non-analyzed fields. It just looks for the term in the inverted index.
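The difference between the two query families shows up when the search term and the indexed term differ in case. The sketch below is a toy model: `term_query` and `match_query` are hypothetical Python functions, not client library calls, and the analyzer only lowercases and splits on whitespace.

```python
def analyze(text):
    """Toy analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


# Inverted index for a hypothetical analyzed text field.
docs = {1: "Quick brown fox", 2: "Lazy dog"}
inverted = {}
for doc_id, text in docs.items():
    for term in analyze(text):
        inverted.setdefault(term, set()).add(doc_id)


def term_query(term):
    """Term-level query: look the search term up as-is, without analysis."""
    return sorted(inverted.get(term, set()))


def match_query(text):
    """Full-text query: analyze the search text, then look up each resulting term."""
    hits = set()
    for term in analyze(text):
        hits |= inverted.get(term, set())
    return sorted(hits)


print(term_query("Quick"))   # [] because the index only holds the lowercased term
print(term_query("quick"))   # [1]
print(match_query("Quick"))  # [1]
```

This is why a term query on an analyzed text field can miss documents that a full-text query finds: the stored terms have been normalized, but the search term has not.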