Es test - zhongjiajie/zhongjiajie.github.com GitHub Wiki

Es-test

Elasticsearch provides data manipulation and search capabilities in near real time. By default, you can expect a one second delay (refresh interval) from the time you index/update/delete your data until the time that it appears in your search results. This is an important distinction from other platforms like SQL wherein data is immediately available after a transaction is completed.

Elasticsearch's REST API follows a common pattern for accessing data, which can be summarized as follows:

<HTTP Verb> /<Index>/<Type>/<ID>

using the same

When indexing, the ID part is optional. If not specified, Elasticsearch will generate a random ID and then use it to index the document. The actual ID Elasticsearch generates (or whatever we specified explicitly in the previous examples) is returned as part of the index API call.

Note that when no ID is specified, the POST verb is used instead of PUT.
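The verb rule above can be sketched as a small shell helper. index_request is a hypothetical function, not part of Elasticsearch or curl; it only prints the request line for the 6.x _doc type used in these notes (PUT needs an ID, POST lets Elasticsearch generate one).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the index API request line for a document.
# With an ID -> PUT /<index>/_doc/<id>; without -> POST /<index>/_doc,
# in which case Elasticsearch generates the ID and returns it in the response.
index_request() {
  local index="$1" id="${2:-}"
  if [ -n "$id" ]; then
    printf 'PUT /%s/_doc/%s\n' "$index" "$id"
  else
    printf 'POST /%s/_doc\n' "$index"
  fi
}
```

For example, index_request customer 1 prints PUT /customer/_doc/1, while index_request customer prints POST /customer/_doc.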

Note though that Elasticsearch does not actually do in-place updates under the hood. Whenever we do an update, Elasticsearch deletes the old document and then indexes a new document with the update applied to it in one shot.

Updates can also be performed by using simple scripts. This example uses a script to increment the age by 5:

curl -X POST "localhost:9200/customer/_doc/1/_update?pretty" -H 'Content-Type: application/json' -d' { "script" : "ctx._source.age += 5" } '

In the above example, ctx._source refers to the current source document that is about to be updated.
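A sketch of the same update with the increment passed as a script parameter instead of being hard-coded. update_increment_body is a hypothetical helper that only prints the JSON body; params.amount is read by the Painless script, and keeping the script source constant lets Elasticsearch reuse one compiled script for different amounts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a _update body that adds <amount> to <field>
# through a Painless script. The amount travels in "params", so the script
# source stays identical for every call.
update_increment_body() {
  local field="$1" amount="$2"
  printf '{"script":{"source":"ctx._source.%s += params.amount","params":{"amount":%s}}}\n' \
    "$field" "$amount"
}
```

Usage, piping the body into curl (-d @- reads it from stdin):

update_increment_body age 5 | curl -X POST "localhost:9200/customer/_doc/1/_update?pretty" -H 'Content-Type: application/json' -d @-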

curl -X DELETE "localhost:9200/customer/_doc/2?pretty"

See the _delete_by_query API to delete all documents matching a specific query. Note that deleting a whole index is much more efficient than deleting all of its documents with the Delete By Query API.

Bulk request throws an error in Elasticsearch 6.1.1 when the body is not proper newline-delimited JSON: each action and each document must be on its own line, and the body must end with a newline. Keep the line breaks when writing the body inline, and use --data-binary rather than -d when sending it from a file, since curl -d strips newlines from file input.

As a quick example, the following call indexes two documents (ID 1 - John Doe and ID 2 - Jane Doe) in one bulk operation:

curl -X POST "localhost:9200/customer/_doc/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
'

This example updates the first document (ID of 1) and then deletes the second document (ID of 2) in one bulk operation:

curl -X POST "localhost:9200/customer/_doc/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'
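Generating the bulk body instead of typing it keeps the required newlines intact. A minimal sketch, assuming documents that only have a name field and sequential IDs starting at 1; bulk_index_body is hypothetical and inserts names verbatim, so they must not contain double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an NDJSON _bulk body that indexes one
# {"name": ...} document per argument, with IDs 1..N.
# Every action line is followed by its source line, and each printf ends
# with '\n', so the body always ends with the newline _bulk requires.
bulk_index_body() {
  local id=1 name
  for name in "$@"; do
    printf '{"index":{"_id":"%s"}}\n{"name":"%s"}\n' "$id" "$name"
    id=$((id + 1))
  done
}
```

Usage (--data-binary sends stdin unchanged, whereas -d @- would strip the newlines):

bulk_index_body "John Doe" "Jane Doe" | curl -X POST "localhost:9200/customer/_doc/_bulk?pretty" -H 'Content-Type: application/x-ndjson' --data-binary @-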

The Bulk API does not fail due to failures in one of the actions. If a single action fails for whatever reason, it will continue to process the remainder of the actions after it. When the bulk API returns, it will provide a status for each action (in the same order it was sent in) so that you can check if a specific action failed or not.
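The bulk response also carries a top-level errors flag that is true when at least one action failed, so a script can check it before walking the per-action items. A rough sketch with grep; bulk_has_errors is hypothetical and assumes the flag appears as "errors": true or "errors" : true (compact or ?pretty output). A real JSON tool such as jq is more robust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a _bulk response on stdin and succeed (exit 0)
# if the top-level "errors" flag is true, i.e. at least one action failed.
# Per-action results are in "items", in the same order the actions were sent.
bulk_has_errors() {
  grep -Eq '"errors"[[:space:]]*:[[:space:]]*true'
}
```

For example, curl ... _bulk ... | bulk_has_errors && echo "some bulk actions failed".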

The REST API for search is accessible from the _search endpoint. This example returns all documents in the bank index

curl -X GET "localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty"

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "query": { "match_all": {} }, "sort": [ { "account_number": "asc" } ] } '

  • took – time in milliseconds for Elasticsearch to execute the search
  • timed_out – tells us if the search timed out or not
  • _shards – tells us how many shards were searched, as well as a count of the successful/failed searched shards
  • hits – search results
  • hits.total – total number of documents matching our search criteria
  • hits.hits – actual array of search results (defaults to first 10 documents)
  • hits.sort - sort key for results (missing if sorting by score)
  • hits._score and max_score - ignore these fields for now

It is important to understand that once you get your search results back, Elasticsearch is completely done with the request and does not maintain any kind of server-side resources or open cursors into your results. This is in stark contrast to many other platforms such as SQL wherein you may initially get a partial subset of your query results up-front and then you have to continuously go back to the server if you want to fetch (or page through) the rest of the results using some kind of stateful server-side cursor.

Using the query DSL (domain-specific language)

Specify the number of documents returned with size (default 10):

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "query": { "match_all": {} }, "size": 1 } '

The from parameter (0-based) specifies the offset of the first document to return, and the size parameter specifies how many documents to return starting at from. This is useful when paging through search results. If from is not specified, it defaults to 0.

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "query": { "match_all": {} }, "from": 10, "size": 10 } '
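Paging usually starts from a 1-based page number, so the offset is from = (page - 1) * size; page 2 with size 10 gives from 10, as in the example above. page_body is a hypothetical helper that only prints the search body.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a match_all search body for a 1-based page.
# from = (page - 1) * size; size defaults to 10 like the search API.
page_body() {
  local page="$1" size="${2:-10}"
  printf '{"query":{"match_all":{}},"from":%d,"size":%d}\n' $(( (page - 1) * size )) "$size"
}
```

Usage: page_body 2 10 | curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d @-. Keep in mind that from + size cannot exceed the index.max_result_window setting (10000 by default), so this style of paging is meant for shallow pages.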

By default, the full JSON document is returned as part of all searches. This is referred to as the source (_source field in the search hits). If we don’t want the entire source document returned, we have the ability to request only a few fields from within source to be returned.

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "query": { "match_all": {} }, "_source": ["account_number", "balance"] } '
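A sketch that builds the same _source-filtered body from a list of field names. source_fields_body is hypothetical; it assumes plain field names without quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a match_all search body whose "_source" array
# lists only the given fields, e.g. source_fields_body account_number balance.
source_fields_body() {
  local list="" f
  for f in "$@"; do
    # Append "field", adding a comma only after the first entry.
    list="${list:+$list,}\"$f\""
  done
  printf '{"query":{"match_all":{}},"_source":[%s]}\n' "$list"
}
```

Usage: source_fields_body account_number balance | curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d @-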

Documents where account_number = 20

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "query": { "match": { "account_number": 20 } } } '

address contains 'mill'

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "query": { "match": { "address": "mill" } } } '

address contains 'mill' or 'lane'

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "query": { "match": { "address": "mill lane" } } } '

address contains the phrase 'mill lane'

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "query": { "match_phrase": { "address": "mill lane" } } } '

address contains both 'mill' and 'lane'

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "query": { "bool": { "must": [ { "match": { "address": "mill" } }, { "match": { "address": "lane" } } ] } } } '

address contains 'mill' or 'lane'

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "query": { "bool": { "should": [ { "match": { "address": "mill" } }, { "match": { "address": "lane" } } ] } } } '

address contains neither 'mill' nor 'lane'

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "query": { "bool": { "must_not": [ { "match": { "address": "mill" } }, { "match": { "address": "lane" } } ] } } } '
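The three bool variants above differ only in the occurrence key: must (every clause matches, AND), should (with no must or filter clause, at least one matches, OR) and must_not (no clause matches). A sketch of a hypothetical helper that builds any of them from one field and several terms:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a bool query with one "match" clause per term
# on a single field. occur is must, should or must_not.
bool_match_body() {
  local occur="$1" field="$2" clauses="" term
  shift 2
  for term in "$@"; do
    clauses="${clauses:+$clauses,}{\"match\":{\"$field\":\"$term\"}}"
  done
  printf '{"query":{"bool":{"%s":[%s]}}}\n' "$occur" "$clauses"
}
```

Usage: bool_match_body must address mill lane | curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d @-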


mapping

create mapping

curl -X PUT "localhost:9200/test?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "location": {
          "type": "geo_point",
          "ignore_malformed": true
        }
      }
    }
  }
}
'

update mapping

curl -X PUT "localhost:9200/twitter/_mapping/_doc" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "email": {
      "type": "keyword"
    }
  }
}
'
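A sketch that builds a create-index body like the one above from name:type pairs. mapping_body is hypothetical and only emits the type of each field (no extra options such as ignore_malformed). Note that the update mapping API can add new fields but cannot change the type of an existing field; that requires creating a new index and reindexing into it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a 6.x create-index body with a "_doc" mapping,
# e.g. mapping_body id:keyword location:geo_point.
mapping_body() {
  local props="" pair
  for pair in "$@"; do
    # ${pair%%:*} is the field name, ${pair#*:} is the field type.
    props="${props:+$props,}\"${pair%%:*}\":{\"type\":\"${pair#*:}\"}"
  done
  printf '{"mappings":{"_doc":{"properties":{%s}}}}\n' "$props"
}
```

Usage: mapping_body id:keyword location:geo_point | curl -X PUT "localhost:9200/test?pretty" -H 'Content-Type: application/json' -d @-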

score

Earlier we skipped over a detail called the document score (the _score field in the search results). The score is a numeric value that measures how well a document matches the query, relative to the other documents: the higher the score, the more relevant the document; the lower the score, the less relevant it is.

But queries do not always need to produce scores, in particular when they are only used to filter the document set. Elasticsearch detects these situations and automatically optimizes query execution so that it does not compute useless scores.

Range query

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "query": { "bool": { "must": { "match_all": {} }, "filter": { "range": { "balance": { "gte": 20000, "lte": 30000 } } } } } } '
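The range clause above sits under bool filter, so it runs in filter context: it only includes or excludes documents and does not affect _score. A sketch of a hypothetical helper producing that body; the bounds are inserted unquoted, so this assumes a numeric field (date bounds would need quoting and a format).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a match_all query with a numeric range filter
# (inclusive bounds) in filter context, where no score is computed for it.
range_filter_body() {
  local field="$1" gte="$2" lte="$3"
  printf '{"query":{"bool":{"must":{"match_all":{}},"filter":{"range":{"%s":{"gte":%s,"lte":%s}}}}}}\n' \
    "$field" "$gte" "$lte"
}
```

Usage: range_filter_body balance 20000 30000 | curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d @-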

reindex

The most basic form of _reindex just copies documents from one index to another. This copies documents from the people_route_1 index into the people_route index:

curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "people_route_1"
  },
  "dest": {
    "index": "people_route"
  }
}
'

In Elasticsearch, you have the ability to execute searches returning hits and at the same time return aggregated results separate from the hits all in one response. This is very powerful and efficient in the sense that you can run queries and multiple aggregations and get the results back of both (or either) operations in one shot avoiding network roundtrips using a concise and simplified API.

To start with, this example groups all the accounts by state, and then returns the top 10 (default) states sorted by count descending (also default)

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state.keyword" } } } } '

Note that we set size=0 to not show search hits because we only want to see the aggregation results in the response.

Building on the previous aggregation, this example calculates the average account balance by state (again only for the top 10 states sorted by count in descending order)

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state.keyword" }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } } '

Notice how we nested the average_balance aggregation inside the group_by_state aggregation. This is a common pattern for all the aggregations. You can nest aggregations inside aggregations arbitrarily to extract the pivoted summarizations you require from your data.

Building on the previous aggregation, let’s now sort on the average balance in descending order

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state.keyword", "order": { "average_balance": "desc" } }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } } '
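The last three examples share one shape: terms buckets, an avg sub-aggregation inside each bucket, and an order that refers to that sub-aggregation by name. A sketch of a hypothetical helper for that shape; the aggregation names group_by and average are hypothetical stand-ins for group_by_state and average_balance.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an aggregation-only body (size 0) that groups by
# <group_field>, averages <avg_field> per bucket, and orders buckets by that
# average (desc by default).
terms_avg_body() {
  local group_field="$1" avg_field="$2" order="${3:-desc}"
  printf '{"size":0,"aggs":{"group_by":{"terms":{"field":"%s","order":{"average":"%s"}},"aggs":{"average":{"avg":{"field":"%s"}}}}}}\n' \
    "$group_field" "$order" "$avg_field"
}
```

Usage: terms_avg_body state.keyword balance | curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d @-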

This example demonstrates how we can group by age brackets (ages 20-29, 30-39, and 40-49), then by gender, and then finally get the average account balance, per age bracket, per gender
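The brackets 20-29, 30-39 and 40-49 come from how the range aggregation treats each range: from is included and to is excluded, so { "from": 20, "to": 30 } covers 20 <= age < 30. A local simulation of that rule (not an Elasticsearch call); age_bracket is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mimic the range aggregation's from <= value < to rule
# for the brackets 20-30, 30-40 and 40-50. Prints the bracket, or fails when
# the age falls outside every range (such a document lands in no bucket).
age_bracket() {
  local age="$1" from
  for from in 20 30 40; do
    if [ "$age" -ge "$from" ] && [ "$age" -lt $((from + 10)) ]; then
      printf '%d-%d\n' "$from" $((from + 10))
      return 0
    fi
  done
  return 1
}
```

For example, age 29 falls in 20-30 while age 30 already falls in 30-40.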

curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d' { "size": 0, "aggs": { "group_by_age": { "range": { "field": "age", "ranges": [ { "from": 20, "to": 30 }, { "from": 30, "to": 40 }, { "from": 40, "to": 50 } ] }, "aggs": { "group_by_gender": { "terms": { "field": "gender.keyword" }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } } } } '

curl -X DELETE "localhost:9200/test?pretty"
curl -X GET "localhost:9200/_cat/indices?v"
curl -X PUT "localhost:9200/test?pretty" -H 'Content-Type: application/json' -d' { "mappings": { "_doc": { "properties": { "id": { "type": "keyword" }, "location": { "type": "geo_point", "ignore_malformed": true } } } } } '
curl -X PUT "localhost:9200/test/_doc/1?pretty" -H 'Content-Type: application/json' -d' { "id": "John Doe", "location": "23.45,103.65" } '
curl -X PUT "localhost:9200/test/_doc/2?pretty" -H 'Content-Type: application/json' -d' { "id": "John Doe", "location": "" } '
curl -X PUT "localhost:9200/test/_doc/3?pretty" -H 'Content-Type: application/json' -d' { "id": "John Doe", "location": null } '
curl -X GET "localhost:9200/test/_doc/_search?pretty" -H 'Content-Type: application/json' -d' { "size": 1, "query": { "match_all" : {} } } '
curl -X POST "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d' { "type": "fs", "settings": { "location": "/tmp/snapshots" } } '
curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_test" -H 'Content-Type: application/json' -d ' { "indices": "test" } '
curl -XGET "localhost:9200/_snapshot/my_backup/snapshot_test?pretty"
curl -X POST "localhost:9200/_snapshot/drug_monitor_backup?pretty" -H 'Content-Type: application/json' -d' { "type": "fs", "settings": { "location": "/tmp/srv/snapshots" } } '

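-- backup and restore
curl -X GET "localhost:9200/_snapshot/drug_monitor_backup/people_route?pretty"
curl -X GET "localhost:9200/_snapshot?pretty"
curl -X GET "localhost:9200/_snapshot/my_backup?pretty"
curl -XPOST "localhost:9200/_snapshot/my_backup/people_route/_restore" -H 'Content-Type: application/json' -d ' { "indices": "people_route", "rename_replacement": "people_route" } '

rename_replacement only takes effect together with rename_pattern. To restore people_route under its original name, the existing people_route index must be closed or deleted first, because a snapshot cannot be restored into an open index.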

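The docs give many recommendations for running Docker in production mode; quite a lot of configuration has to be changed to achieve high performance.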

Common options

  • ?pretty: the returned JSON will be pretty-formatted (use it for debugging only!).
  • ?human: with ?human=true, statistics are returned in a human-readable format (e.g. "1h" or "1kb"); the default, ?human=false, makes sense when the stats are consumed by a monitoring tool rather than read by humans.
  • filter_path: reduces the response returned by Elasticsearch to the listed fields.
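filter_path takes a comma-separated list of response paths (the * wildcard is also accepted, e.g. hits.hits._source.*). A sketch of a hypothetical helper that appends it to a URL, using ? or & depending on whether the URL already has a query string:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append filter_path=<p1>,<p2>,... to a URL.
with_filter_path() {
  local url="$1" paths
  shift
  # Join the remaining arguments with commas ("$*" uses the first IFS char).
  paths=$(IFS=,; printf '%s' "$*")
  case "$url" in
    *\?*) printf '%s&filter_path=%s\n' "$url" "$paths" ;;
    *)    printf '%s?filter_path=%s\n' "$url" "$paths" ;;
  esac
}
```

Usage: curl -X GET "$(with_filter_path 'localhost:9200/bank/_search?pretty' took hits.hits._id)"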

docker run --name elasticsearch -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "path.repo=/tmp" -e "http.host=0.0.0.0" -e "transport.host=0.0.0.0" -e "xpack.security.enabled=false" elasticsearch:6.5.1
docker run -p 9400:9200 -p 9401:9300 -e "http.host=0.0.0.0" -e "transport.host=0.0.0.0" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:5.2.2

Spring Data Elasticsearch startup error: the PeopleRoute entity declares both an id and an _id field, both are treated as the document ID property, and only one is allowed:

Unsatisfied dependency expressed through field 'peopleRouteRepository'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'peopleRouteRepository': Invocation of init method failed; nested exception is org.springframework.data.mapping.model.MappingException: Attempt to add id property private java.lang.String com.ly.entity.trailAna.PeopleRoute.id but already have property private java.lang.String com.ly.entity.trailAna.PeopleRoute._id registered as id

elasticsearch script

Check with an if first to prevent an NPE (NullPointerException):

curl -X POST "localhost:9200/people_route/_update_by_query?pretty" -H 'Content-Type: application/json' -d' { "script": { "source": "if(ctx._source.station != null) {ctx._source.district = ctx._source.station.substring(0, 6)}" }, "query": { "match_all": {} } } '

curl -X GET "localhost:9200/people_route/_doc/_search?pretty" -H 'Content-Type: application/json' -d ' { "size": 0, "aggs": { "group_by_date": { "range": { "field": "starttime", "format": "yyyyMMdd", "ranges": [ { "from": "20170101", "to": "20180101" }, { "from": "20180101", "to": "20190101" } ] }, "aggs": { "group_by_district": { "terms": { "field": "district", "size": 100 } } } } } } '

curl -X GET "localhost:9200/people_route/_search" -H 'Content-Type: application/json' -d' { "size": 0, "aggs": { "group_by_district": { "terms": { "field": "district_zh", "size": 100 } } } } '
curl -X POST "localhost:9200/people_route/_delete_by_query" -H 'Content-Type: application/json' -d' { "query" : { "bool" : { "filter" : { "terms" : { "district_zh" : ["白云", "天河", "海珠", "黄埔", "花都", "南沙", "番禺", "从化", "增城"] } } } } } '
curl -X POST "localhost:9200/people_route/_delete_by_query" -H 'Content-Type: application/json' -d'
curl -X GET "localhost:9200/people_route/_search" -H 'Content-Type: application/json' -d' { "query": { "range" : { "starttime" : { "format": "yyyyMMdd", "gte" : "20170101", "lt" : "20180101" } } } } '

curl -X GET "localhost:9200/my_index/_mapping?pretty"
curl -X DELETE "localhost:9200/my_index?pretty"
curl -X PUT "localhost:9200/my_index?pretty" -H 'Content-Type: application/json' -d' { "mappings": { "_doc": { "properties": { "group": { "type": "keyword" }, "user": { "type": "nested" } } } } } '
curl -X GET "localhost:9200/my_index/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "match_all": {} } } '

curl -X POST "localhost:9200/my_index/_doc?pretty" -H 'Content-Type: application/json' -d' { "group" : "fans", "user" : [{ "first" : "John", "last" : "Smith" }] } '
curl -X GET "localhost:9200/my_index/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "bool": { "must": [ {"match": {"user.first": "John"}} ] } } } '

curl -X GET "localhost:9200/my_index/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "bool": { "must": [ { "nested": { "path": "user", "query": { "bool": { "must": [ { "match": { "user.first": "John" }} ] } } } } ] } } } '
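Because user is mapped as nested, its objects are indexed as separate hidden documents, so a plain match on user.first finds nothing and the query has to be wrapped in a nested query naming the path. A sketch of a hypothetical helper that produces a minimal nested body (without the surrounding bool used above):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a nested query that runs a match on
# <field> = <value> inside the nested objects at <path>.
nested_match_body() {
  local path="$1" field="$2" value="$3"
  printf '{"query":{"nested":{"path":"%s","query":{"match":{"%s":"%s"}}}}}\n' \
    "$path" "$field" "$value"
}
```

Usage: nested_match_body user user.first John | curl -X GET "localhost:9200/my_index/_search?pretty" -H 'Content-Type: application/json' -d @-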

Test queries against people_route

curl -X GET "localhost:9200/people_route/_search?pretty" -H 'Content-Type: application/json' -d' { "size": 1, "query": { "bool": { "must": { "term": { "district": "440111" } } } }, "aggs": { "groupByStation": { "terms": { "field": "station" } } } } '

curl -X GET "localhost:9200/people_route/_search?pretty" -H 'Content-Type: application/json' -d' { "size": 1, "query": { "match_all": {} } } '

curl -X POST "localhost:9200/cars/_search?size=0" -H 'Content-Type: application/json' -d' { "aggs" : { "type_count" : { "cardinality" : { "field" : "type" } } } } '

curl -X GET "localhost:9200/people_route/_doc/_search?pretty" -H 'Content-Type: application/json' -d' { "size": 0, "aggs": { "groupByDistrict": { "terms": { "field": "district_zh" } }, "distinctTotal": { "cardinality": { "field": "id" } } } } '

curl -X GET "localhost:9200/people_route/_search?pretty" -H 'Content-Type: application/json' -d' { "aggs": { "groupByDate": { "range": { "field": "starttime", "format": "yyyyMMdd", "ranges": [ { "from": "20160101", "to": "20170101" }, { "from": "20170101", "to": "20180101" } ] }, "aggs": { "groupByDistrict": { "terms": { "field": "district_zh" }, "aggs": { "distinctTotal": { "cardinality": { "field": "district_zh" } } } } } } } } '

curl -X GET "localhost:9200/people_route/_search?pretty" -H 'Content-Type: application/json' -d' { "size": 11, "query": { "bool": { "must": [ { "bool": { "should": [ { "term": { "drug_type": "涉毒" } }, { "term": { "drug_type": "吸毒" } } ] } }, { "nested": { "query": { "bool": { "must": [ { "term": { "id_attr.sex": "M" } }, { "bool": { "should": [ { "term": { "id_attr.province": "62" } }, { "term": { "id_attr.province": "43" } } ] } }, { "range": { "id_attr.birthday": { "from": "19500101", "to": "19900101", "format": "yyyyMMdd", "include_lower": true, "include_upper": true } } } ] } }, "path": "id_attr" } } ] } }, "aggs": { "groupByDate": { "range": { "field": "starttime", "format": "yyyyMMdd", "ranges": [ { "from": "20160101", "to": "20170101" }, { "from": "20170101", "to": "20180101" } ] }, "aggs": { "groupByDistrict": { "terms": { "field": "district_zh" } }, "distinctTotal": { "cardinality": { "field": "id", "precision_threshold": 40000 } } } } } } '

// for testing
curl -X GET "localhost:9200/people_route/_search?pretty" -H 'Content-Type: application/json' -d' { "size": 0, "query": { "bool": { "must": [ { "bool": { "should": [ { "term": { "drug_type": "吸毒" } }, { "term": { "drug_type": "涉毒" } }, { "term": { "drug_type": "NIS" } } ] } }, { "nested": { "query": { "bool": { "must": [ { "term": { "id_attr.sex": "M" } }, { "bool": { "should": [ { "term": { "id_attr.province": "43" } }, { "term": { "id_attr.province": "62" } } ] } }, { "range": { "id_attr.birthday": { "from": "19700101", "to": "20180101", "format": "yyyyMMdd", "include_lower": true, "include_upper": true } } } ] } }, "path": "id_attr" } } ] } }, "aggs": { "groupByDistrict": { "terms": { "field": "district_zh" } }, "groupByid": { "cardinality": { "field": "id" } } } } '


curl -X PUT "localhost:9200/people_route?pretty" -H 'Content-Type: application/json' -d' { "mappings": { "_doc": { "properties": { "district": { "type": "keyword" }, "district_zh": { "type": "keyword" }, "drug_type": { "type": "keyword" }, "endtime": { "type": "date", "format": "yyyy-MM-dd||strict_date_time" }, "id": { "type": "keyword" }, "id_attr": { "type": "nested", "properties": { "birthday": { "type": "date", "format": "yyyy-MM-dd" }, "city": { "type": "keyword" }, "district": { "type": "keyword" }, "province": { "type": "keyword" }, "sex": { "type": "keyword" } } }, "intime": { "type": "date", "format": "yyyy-MM-dd||strict_date_time" }, "location": { "type": "geo_point", "ignore_malformed": true }, "name": { "type": "text" }, "outtime": { "type": "date", "format": "yyyy-MM-dd||strict_date_time" }, "route_addr": { "type": "text" }, "route_id": { "type": "keyword" }, "route_name": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }, "route_type": { "type": "keyword" }, "starttime": { "type": "date", "format": "yyyy-MM-dd||strict_date_time" }, "station": { "type": "keyword" }, "station_zh": { "type": "keyword" } } } } } '


curl -X POST "localhost:9200/people_route_1/_doc?pretty" -H 'Content-Type: application/json' -d' { "intime": "2013-08-27T11:44:39.000+08:00", "route_id": "4401120150", "route_name": "广州市黄埔区科尔酒店", "outtime": "9999-12-31T00:00:00.000+08:00", "district_zh": "黄埔区", "endtime": "2018-01-01T20:51:00.000+08:00", "starttime": "2017-12-31T21:19:00.000+08:00", "id_attr": { "birthday": "1992-10-02", "province": "44", "city": "4409", "district": "440902", "sex": "F" }, "drug_type": [ "", "吸毒NIS" ], "route_type": "酒店", "district": "440112", "name": "郑秀香", "station": "440112540000", "station_zh": "南岗派出所", "route_addr": "广州市黄埔区黄埔东路2943号之五", "location": "23.09054,113.52518", "id": "440902199210023724" } '

curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d' { "source": { "index": "people_route_1" }, "dest": { "index": "people_route" } } '

curl -X GET "localhost:9200/people_route/_doc/_search?pretty" -H 'Content-Type: application/json' -d' { "size": 1, "query": { "match_all" : {} } } '

curl -X POST 'http://localhost:9200/people_route_1/_delete_by_query?pretty' -H 'Content-Type: application/json' -d ' { "query": { "bool": { "must": [ { "range": { "starttime": { "format": "yyyyMMdd", "gt": "20170101" } } } ] } } } '
