ElasticSearch (2023) - MacKittipat/note-developer GitHub Wiki

Run Elasticsearch and Kibana

1. Run in Docker

docker network create elastic
docker run --name mac-elasticsearch -e "discovery.type=single-node" -e "xpack.security.enabled=false" -e "xpack.security.http.ssl.enabled=false" --net elastic -p 9200:9200 -it docker.elastic.co/elasticsearch/elasticsearch:8.8.0
docker run --name mac-kibana --net elastic -p 5601:5601 docker.elastic.co/kibana/kibana:8.8.0
  • xpack.security.enabled=false lets clients access http://localhost:9200 without being asked for a username and password
  • xpack.security.http.ssl.enabled=false disables TLS on the HTTP layer, so we can use HTTP instead of HTTPS
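
Once the container is running, a quick way to confirm that security and TLS are really off is to request the root endpoint over plain HTTP. The sketch below assumes the port mapping from the command above; the helper name cluster_info is made up for illustration.

```python
import json
import urllib.request


def cluster_info(base_url="http://localhost:9200", timeout=2.0):
    """Return the JSON body of GET <base_url>, or None if it cannot be fetched."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP error statuses;
        # ValueError covers a malformed URL or a body that is not valid JSON.
        return None


if __name__ == "__main__":
    info = cluster_info()
    if info is None:
        print("Elasticsearch is not reachable on http://localhost:9200")
    else:
        print("Connected to Elasticsearch", info.get("version", {}).get("number"))
```

If security were still enabled, the node would answer with an HTTP error status and the helper would return None.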

2. Open Kibana at http://localhost:5601

3. Configure Kibana

What is Elasticsearch

  • Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.
  • Elasticsearch is a distributed document store.
  • When a document is stored, it is indexed and becomes fully searchable in near real-time, typically within 1 second (the default refresh interval). Elasticsearch uses a data structure called an inverted index that supports very fast full-text searches.
  • An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.
    • There will be an inverted index for each full-text field per index. So if you have an index containing documents that contain five full-text fields, you will have five inverted indices.
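
To make the inverted index idea concrete, here is a minimal sketch in plain Python. It is only an illustration of the data structure, not how Elasticsearch (or Lucene) implements it, and the sample documents are made up.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique lowercased word to the sorted IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Made-up sample documents, keyed by document ID.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}

inverted = build_inverted_index(docs)
print(inverted["quick"])        # [1, 3]
print(inverted["the"])          # [1, 2]
print(inverted.get("cat", []))  # []
```

Finding every document that contains a word is a single dictionary lookup, which is why full-text search stays fast as the number of documents grows.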

Architecture

  • Elasticsearch is distributed.
  • A node is a server (either physical or virtual)
  • A cluster is a collection of nodes
  • An index is split into shards, which are spread across the nodes of the cluster. Because the index is distributed across multiple shards, a query against it is executed in parallel on the shards. The results from each shard are then gathered and sent back to the client.
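
The scatter and gather flow can be sketched in a few lines of Python. This is a toy model: the CRC32-based routing and the in-memory "shards" are assumptions made for illustration, not Elasticsearch's actual routing or storage.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(doc_id, num_shards):
    """Toy routing: pick a shard from a stable hash of the document ID."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_shards


def index_docs(docs, num_shards):
    """Distribute documents over num_shards in-memory shards."""
    shards = [{} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = text
    return shards


def search_shard(shard, word):
    """Return the IDs of documents in one shard that contain the word."""
    return [doc_id for doc_id, text in shard.items() if word in text.lower().split()]


def search(shards, word):
    """Scatter the query to every shard in parallel, then gather and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = pool.map(lambda shard: search_shard(shard, word), shards)
        return sorted(doc_id for hits in per_shard for doc_id in hits)


# Made-up sample documents.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
    4: "a quick reply",
}
shards = index_docs(docs, num_shards=3)
print(search(shards, "quick"))  # [1, 3, 4]
```

The merged result is the same regardless of which shard each document landed on; only the amount of work per shard changes.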

Mapping

  • Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.
  • Type of mapping
    • Dynamic mapping
      • The document is indexed without defining a mapping first; Elasticsearch infers each field's type from the value.
    • Explicit mapping
      • Mapping is defined before the document is indexed
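
The difference between the two can be sketched as follows. The explicit mapping is the kind of body sent when creating an index; the index name "products" and its fields are hypothetical. The infer_type function is a simplified approximation of the default dynamic mapping rules (it ignores date detection, for example), not the real implementation.

```python
import json

# Explicit mapping: field types are declared up front, for example in the
# body of a create-index request such as PUT /products (hypothetical index).
explicit_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "sku": {"type": "keyword"},
            "price": {"type": "float"},
            "in_stock": {"type": "boolean"},
        }
    }
}


def infer_type(value):
    """Simplified approximation of how dynamic mapping picks a field type."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"  # Elasticsearch also adds a keyword sub-field by default
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


# Made-up sample document.
doc = {"name": "Blue kettle", "sku": "K-100", "price": 19.5, "in_stock": True}
inferred = {field: infer_type(value) for field, value in doc.items()}

print(json.dumps(explicit_mapping, indent=2))
print(inferred)
```

Note that dynamic mapping treats sku as text, while the explicit mapping declares it as keyword. Cases like this are the usual reason to define mappings explicitly.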

Text Analysis

  • Text analysis is the process of converting unstructured text, like the body of an email or a product description, into a structured format that’s optimized for search.
  • Elasticsearch performs text analysis when indexing or searching text fields.
  • Text analysis consists of the following processes
    • Tokenization
      • Breaking a text down into smaller chunks, called tokens. In most cases, these tokens are individual words.
    • Normalization
      • Normalize tokens into a standard format
      • Example
        • Quick can be lowercased: quick.
        • foxes can be stemmed, or reduced to its root word: fox.
        • jump and leap are synonyms and can be indexed as a single word: jump.
  • Text analysis is performed by an analyzer
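
The normalization examples above can be reproduced with a toy sketch. The lookup tables stand in for a real stemmer and a real synonym filter and only cover the sample words, so treat them as assumptions for illustration.

```python
# Toy lookup tables: stand-ins for a real stemmer and synonym filter.
STEMS = {"foxes": "fox", "jumps": "jump", "leaps": "leap"}
SYNONYMS = {"leap": "jump"}


def tokenize(text):
    """Break the text into tokens on whitespace."""
    return text.split()


def normalize(token):
    """Lowercase, then stem, then map synonyms onto a single word."""
    token = token.lower()
    token = STEMS.get(token, token)
    return SYNONYMS.get(token, token)


def analyze(text):
    return [normalize(token) for token in tokenize(text)]


print(analyze("Quick foxes leap"))  # ['quick', 'fox', 'jump']
print(analyze("Foxes jump") == analyze("fox leaps"))  # True
```

Because both the indexed text and the search text end up as the same normalized tokens, a search for "fox leaps" can match a document that says "Foxes jump".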

Analyzer

  • An analyzer consists of character filters, a tokenizer, and token filters, applied in that order
  • Character filter
    • Receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters.
    • For instance, strip HTML elements like <b> from the stream.
  • Tokenizer
    • Receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens
    • For instance, a whitespace tokenizer breaks text into tokens whenever it sees any whitespace. It would convert the text "Quick brown fox!" into the terms [Quick, brown, fox!].
  • Token filter
    • Receives the token stream and may add, remove, or change tokens.
    • For example, a lowercase token filter converts all tokens to lowercase
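
The three stages can be chained as plain functions. This is a simplified model of the pipeline: the regex-based html_strip below is a rough stand-in for the built-in character filter of the same name, not its actual behavior.

```python
import re


def html_strip(text):
    """Character filter: remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenizer(text):
    """Tokenizer: split the character stream on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Token filter: convert every token to lowercase."""
    return [token.lower() for token in tokens]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run character filters, then the tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "<b>Quick</b> brown fox!",
    char_filters=[html_strip],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase_filter],
)
print(tokens)  # ['quick', 'brown', 'fox!']
```

Against a running node, a comparable experiment would be a POST to _analyze with "char_filter": ["html_strip"], "tokenizer": "whitespace", "filter": ["lowercase"], and the sample text in "text".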

Index and Search Analysis

Field Data Type

  • Text
    • Text fields are analyzed: they are passed through an analyzer that converts the string into a list of individual terms before it is indexed
    • Text field values are analyzed for full-text search
    • Best suited for unstructured but human-readable content, such as the body of an email or the description of a product
  • Keyword
    • Often used in sorting, aggregations, and term-level queries
    • Keyword strings are left as-is for filtering and sorting
    • Best suited for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags.
    • Use when you want an exact match query
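
A small sketch of what ends up in the index for each type. The mapping uses a multi-field so the same value is indexed both ways; the field names "title" and "raw" are hypothetical, and the lowercase-and-split function is only a toy stand-in for a real analyzer.

```python
# Hypothetical mapping: "title" is indexed as text for full-text search,
# and again as "title.raw" (keyword) for exact matches, sorting, and aggregations.
mapping = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},
            }
        }
    }
}


def terms_for_text(value):
    """Toy analyzer: a text field is broken into lowercased terms."""
    return value.lower().split()


def terms_for_keyword(value):
    """A keyword field is indexed as a single, unmodified term."""
    return [value]


value = "Stainless Steel Kettle"
print(terms_for_text(value))     # ['stainless', 'steel', 'kettle']
print(terms_for_keyword(value))  # ['Stainless Steel Kettle']
```

The text version can be found by searching for any single word, while the keyword version only matches the complete original string.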

Query

Full text queries

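  • Full-text queries let you search analyzed text fields. The query string is processed with the same analyzer that was applied to the field at index time.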

Term-level queries

  • Term-level queries do not analyze search terms. Instead, term-level queries match the exact terms stored in a field.
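  • Can be used to query numbers, booleans, dates, and text
  • A term query can be run against analyzed or non-analyzed fields; it simply looks up the exact term in the inverted index. On an analyzed text field this can miss, because the stored terms were already changed by the analyzer (for example, lowercased).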
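
The difference between a full-text query and a term-level query can be simulated against a toy inverted index. The two request bodies show the general shape of a match and a term query on a hypothetical "title" field; the lookup functions are a simplified model for illustration, not Elasticsearch's implementation.

```python
# Request bodies for a hypothetical "title" text field.
match_body = {"query": {"match": {"title": "Quick Dog"}}}
term_body = {"query": {"term": {"title": {"value": "Quick"}}}}


def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index from analyzed document text."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index, value):
    """Term-level: look up the value exactly as given, with no analysis."""
    return sorted(index.get(value, set()))


def match_query(index, text):
    """Full-text: analyze the search text, then match documents containing any term."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return sorted(hits)


# Made-up sample documents.
index = build_index({1: "Quick brown fox", 2: "Lazy dog"})
print(term_query(index, "Quick"))       # []
print(term_query(index, "quick"))       # [1]
print(match_query(index, "Quick Dog"))  # [1, 2]
```

The term query for "Quick" finds nothing because the index only holds the lowercased term "quick", while the match query analyzes its input first and finds both documents.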
