Elasticsearch Cheat Sheet - gpawade/gpawade.github.io GitHub Wiki
Elasticsearch is a highly scalable, open-source search engine with a REST API.
- Distributed, scalable, and highly available
- Real-time search and analytics capabilities
- Sophisticated RESTful API
- Built on top of Apache Lucene
Default port - localhost:9200
Near Real Time (NRT): there is a slight latency (normally one second) between the time you index a document and the time it becomes searchable.
A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which by default is "elasticsearch".
A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities. Just like a cluster, a node is identified by a name which by default is a random Marvel character name that is assigned to the node at startup. In a single cluster, you can have as many nodes as you want.
An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data.
Within an index, you can define one or more types. A type is a logical category/partition of your index whose semantics are completely up to you. In general, a type is defined for documents that have a set of common fields. For example, let’s assume you run a blogging platform and store all your data in a single index. In this index, you may define a type for user data, another type for blog data, and yet another type for comments data. (Mapping types were deprecated in Elasticsearch 6.0 and removed in later releases.)
A document is a basic unit of information that can be indexed.
An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone. To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards.
Sharding is important for two primary reasons:
- It allows you to horizontally split/scale your content volume
- It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), thus increasing performance/throughput
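A rough illustration of how sharding spreads documents: conceptually, Elasticsearch picks a primary shard as `hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID. The Python sketch below mimics that idea. `route_to_shard` is a hypothetical helper, and `zlib.crc32` is only a stand-in for the Murmur3 hash Elasticsearch uses, so the shard numbers will not match a real cluster.

```python
import zlib
from collections import Counter

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard for a document ID (crc32 is a stand-in hash, not Murmur3)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# Spread 1000 hypothetical document IDs over 5 primary shards.
distribution = Counter(route_to_shard(str(i), 5) for i in range(1000))
for shard, count in sorted(distribution.items()):
    print(f"shard {shard}: {count} docs")
```

Because the shard is derived from the number of primary shards, that number cannot simply be changed after index creation without moving documents.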
$ elasticsearch   # or run it as a Windows service
base url - http://localhost:9200
- /_cat --- entry point of the cat API; lists every available _cat endpoint (for example /_cat/health for cluster health)
- /_cat/nodes?v --- list of nodes in our cluster
- /_cat/indices?v --- list of indices
- /_cat/plugins --- list of installed plugins
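- Green - everything is OK (all primary and replica shards are allocated)
- Yellow - all primary shards are allocated, but some replicas are not
- Red - some primary shards are not allocated, so some data is unavailable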
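The three colours can be read as a function of shard allocation. A minimal Python sketch, assuming the counts of unassigned primary and replica shards are already known; `cluster_status` is a hypothetical helper, not part of any Elasticsearch client:

```python
def cluster_status(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Map shard allocation counts to the green/yellow/red health colour."""
    if unassigned_primaries > 0:
        return "red"      # some data cannot be served at all
    if unassigned_replicas > 0:
        return "yellow"   # all data is available, but redundancy is reduced
    return "green"        # every primary and replica is allocated

print(cluster_status(0, 0))
print(cluster_status(0, 5))
print(cluster_status(1, 5))
```

A single-node cluster with replicas configured typically stays yellow, since a replica is never allocated on the same node as its primary.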
$ curl -XPUT 'localhost:9200/blog?pretty'
$ curl -XPUT 'http://localhost:9200/blog/user/dilbert' -H 'Content-Type: application/json' -d '{ "name" : "Dilbert Brown" }'
$ curl -XPUT 'http://localhost:9200/blog/post/1' -H 'Content-Type: application/json' -d '
{
  "user": "dilbert",
  "postDate": "2011-12-15",
  "body": "Search is hard. Search should be easy.",
  "title": "On search"
}'
$ curl -XPUT 'http://localhost:9200/blog/post/2' -H 'Content-Type: application/json' -d '
{
  "user": "dilbert",
  "postDate": "2011-12-12",
  "body": "Distribution is hard. Distribution should be easy.",
  "title": "On distributed search"
}'
# Get
$ curl -XGET 'http://localhost:9200/blog/user/dilbert?pretty=true'
$ curl -XGET 'http://localhost:9200/blog/post/1?pretty=true'
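The same index and get calls can be issued from code. A minimal Python sketch using only the standard library; it builds the requests without sending them. `BASE_URL` assumes a node on localhost, and `build_index_request` / `build_get_request` are hypothetical helper names:

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a node listening locally

def build_index_request(index, doc_type, doc_id, doc):
    """Build (but do not send) the PUT request that indexes one document."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def build_get_request(index, doc_type, doc_id):
    """Build (but do not send) the GET request that fetches one document."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}?pretty=true", method="GET"
    )

put_req = build_index_request("blog", "user", "dilbert", {"name": "Dilbert Brown"})
get_req = build_get_request("blog", "user", "dilbert")
print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
# With a node running, urllib.request.urlopen(put_req) would send the request.
```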
$ curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -H 'Content-Type: application/json' -d '
{
  "doc": { "name": "Jane Doe" }
}'
Elasticsearch provides the ability to perform index, update, and delete operations in batches using the _bulk API.
POST /customer/internal/_bulk
{ "index": { "_id": "1" } }
{ "name" : "john" }
{ "index": { "_id": "2" } }
{ "name" : "doe" }
The example above indexes two documents (ID 1 - john, ID 2 - doe) in one bulk operation.
POST /customer/external/_bulk?pretty
{ "update": { "_id": "1" } }
{ "doc": { "name": "John Doe becomes Jane Doe" } }
{ "delete": { "_id": "2" } }
The example above updates the first document and then deletes the second document.
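A bulk body is newline-delimited JSON: one action line, optionally followed by one source line, and the whole body must end with a newline. A small Python sketch of how such a body could be assembled; `build_bulk_body` is a hypothetical helper, not a client API:

```python
import json

def build_bulk_body(actions):
    """Serialize (action, metadata, source) triples into an NDJSON bulk body.

    `source` is None for actions that carry no payload line (delete).
    """
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ("index", {"_id": "1"}, {"name": "john"}),
    ("update", {"_id": "1"}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ("delete", {"_id": "2"}, None),
])
print(body, end="")
```

When sending such a body over HTTP, the matching Content-Type is application/x-ndjson.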
$ curl 'localhost:9200/blog/_search?q=user:ganesh&pretty=true'
This returns a result like:
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "1",
      "_score" : 0.30685282,
      "_source":{"user":"ganesh", "title":"on search", "body":"search is hard"}
    } ]
  }
}
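The matching documents sit under hits.hits, each with its original JSON in `_source`. A short Python sketch of reading such a response, using the sample result above as an inline string; `extract_sources` is a hypothetical helper:

```python
import json

response_text = """
{
  "took": 7,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.30685282,
    "hits": [
      {"_index": "blog", "_type": "post", "_id": "1", "_score": 0.30685282,
       "_source": {"user": "ganesh", "title": "on search", "body": "search is hard"}}
    ]
  }
}
"""

def extract_sources(response: dict) -> list:
    """Return the original documents (_source) of all hits, in ranked order."""
    return [hit["_source"] for hit in response["hits"]["hits"]]

response = json.loads(response_text)
titles = [doc["title"] for doc in extract_sources(response)]
print(titles)
```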
We can also use the JSON query DSL:
$ curl 'localhost:9200/blog/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query" : {
    "match" : { "user" : "ganesh" }
  }
}'
GET /bank/_search?q=*&sort=account_number:asc&pretty
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort" : [
    { "account_number" : "asc" }
  ]
}
// matches addresses containing the term `mill` or `lane`
GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}
// matches addresses containing the phrase `mill lane`
GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
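The difference between match and match_phrase can be imitated on a plain Python list. This is a simplified emulation, not how Elasticsearch evaluates queries: `tokenize` is a crude stand-in for the standard analyzer, and the sample addresses are made up.

```python
def tokenize(text: str) -> list:
    """Lowercase and split on whitespace (crude stand-in for the analyzer)."""
    return text.lower().split()

def match(field_value: str, query: str) -> bool:
    """True if the field contains ANY of the query terms (match defaults to OR)."""
    terms = set(tokenize(field_value))
    return any(term in terms for term in tokenize(query))

def match_phrase(field_value: str, query: str) -> bool:
    """True if the query terms appear next to each other, in order."""
    tokens, phrase = tokenize(field_value), tokenize(query)
    return any(tokens[i:i + len(phrase)] == phrase
               for i in range(len(tokens) - len(phrase) + 1))

addresses = ["990 Mill Road", "198 Mill Lane", "12 Lane Mill Court"]  # hypothetical data
match_hits = [a for a in addresses if match(a, "mill lane")]
phrase_hits = [a for a in addresses if match_phrase(a, "mill lane")]
print(match_hits)
print(phrase_hits)
```

Here match keeps all three addresses, while match_phrase keeps only the one where the two terms are adjacent and in order.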
GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}
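from is a zero-based offset and size is the number of hits to return, so from=10 with size=10 asks for the second page of ten. A tiny Python sketch of that arithmetic; `page_params` is a hypothetical helper:

```python
def page_params(page: int, page_size: int = 10) -> dict:
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"from": (page - 1) * page_size, "size": page_size}

# Request body for the second page of 10 results.
body = {"query": {"match_all": {}}, **page_params(page=2)}
print(body)
```

If neither parameter is given, Elasticsearch uses from=0 and size=10.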
GET /bank/_search
{
  "query": { "match": { "account_number": 20 } }
}
GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}
// return all accounts containing "mill" and "lane" in address
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
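Bool queries are plain nested JSON, so they are easy to assemble in code. A minimal Python sketch; `bool_query` is a hypothetical helper that omits clauses that were not supplied (should is another standard bool clause, included here as an optional argument):

```python
import json

def bool_query(must=None, must_not=None, should=None) -> dict:
    """Assemble a bool query, leaving out clauses that were not supplied."""
    clauses = {"must": must, "must_not": must_not, "should": should}
    return {"query": {"bool": {name: items
                                for name, items in clauses.items() if items}}}

# Accounts of people aged 40 who do not live in Idaho.
body = bool_query(
    must=[{"match": {"age": "40"}}],
    must_not=[{"match": {"state": "ID"}}],
)
print(json.dumps(body, indent=2))
```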
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
The aggregation above is similar in concept to the SQL statement: SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
Note that we set size=0 to suppress search hits, because we only want to see the aggregation results in the response.
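The GROUP BY analogy can be made concrete with an in-memory emulation. This Python sketch only mirrors the shape of a terms aggregation result (buckets of key and doc_count, largest first); the sample accounts and the `terms_aggregation` helper are hypothetical:

```python
from collections import Counter

accounts = [  # hypothetical sample documents
    {"account_number": 1, "state": "ID"},
    {"account_number": 2, "state": "TX"},
    {"account_number": 3, "state": "ID"},
    {"account_number": 4, "state": "AL"},
    {"account_number": 5, "state": "ID"},
    {"account_number": 6, "state": "TX"},
]

def terms_aggregation(docs, field, size=10):
    """Count documents per field value and return the top `size` buckets."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": count}
            for key, count in counts.most_common(size)]

buckets = terms_aggregation(accounts, "state")
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"])
```

Like the real terms aggregation, the sketch returns only the top buckets (10 by default), ordered by descending document count.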
$ sudo bin/elasticsearch-plugin install <plugin-name>
$ bin/elasticsearch-plugin list   # bin/plugin list on Elasticsearch 2.x
- x-pack
- head
- Marvel - https://www.elastic.co/downloads/marvel
Default X-Pack login (username / password): elastic / changeme
$ bin/kibana
Open source centralized logging manager
- Elasticsearch.Net
Elasticsearch.Net is a very low-level, dependency-free client.