Basic knowledge for using - accgetter/Elasticsearch GitHub Wiki

Basic data structures

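Name	Meaning
index	Corresponds to a database in an RDB
type	Corresponds to a table in an RDB
document	Corresponds to a record (row) in an RDB

This analogy only fits older versions of Elasticsearch. Since 6.0 an index can hold just one type, and later versions removed mapping types entirely.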

Analyzer

An analyzer defines how strings are processed. It specifies two kinds of processing:

1. tokenizer (how text is split into tokens)

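Name	Description
n-gram	Splits the text into overlapping runs of N characters (N ≥ 1); for example, the 2-grams of "abc" are "ab" and "bc"
Morphological analysis	Uses a dictionary to split the text into meaningful words (e.g. kuromoji for Japanese)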
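To make the difference between the two tokenizers concrete, here is a minimal Python sketch of both strategies. It is a simplification and does not reproduce Elasticsearch's implementation. The n-gram function uses a single fixed N, whereas Elasticsearch's ngram tokenizer takes min_gram/max_gram. The dictionary splitter is a toy greedy longest-match over a hypothetical four-word dictionary. Real morphological analyzers such as kuromoji pick the best segmentation using a cost model.

```python
def ngram_tokenize(text: str, n: int) -> list[str]:
    """Return every run of n consecutive characters (tokens overlap)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def dictionary_tokenize(text: str, dictionary: set[str]) -> list[str]:
    """Toy greedy longest-match segmentation (not how kuromoji works).

    At each position, take the longest dictionary word starting there;
    fall back to a single character when no word matches.
    """
    longest = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary or length == 1:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical dictionary, used only for this illustration.
WORDS = {"東京", "都", "東京都", "検索"}

print(ngram_tokenize("東京都", 2))                # ['東京', '京都']
print(dictionary_tokenize("東京都で検索", WORDS))  # ['東京都', 'で', '検索']
```

Note that the 2-grams of 東京都 (Tokyo Metropolis) include 京都 (Kyoto). N-gram indexing can therefore match substrings that are not actual words in the text, and dictionary-based analysis avoids this.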

2. filter (how tokens are normalized)

Name	Description
lowercase	Converts tokens to lowercase
uppercase	Converts tokens to uppercase
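An analyzer chains the two kinds of processing. The tokenizer runs first, and then each filter rewrites the token stream in the order it is listed. The Python sketch below models that pipeline with plain functions. The Analyzer class and whitespace_tokenize are hypothetical names for illustration and are not Elasticsearch APIs.

```python
from dataclasses import dataclass, field
from typing import Callable

Tokenizer = Callable[[str], list[str]]
TokenFilter = Callable[[list[str]], list[str]]


def whitespace_tokenize(text: str) -> list[str]:
    """Split on runs of whitespace (a stand-in for a real tokenizer)."""
    return text.split()


def lowercase(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]


def uppercase(tokens: list[str]) -> list[str]:
    return [t.upper() for t in tokens]


@dataclass
class Analyzer:
    """Hypothetical model: one tokenizer followed by a list of filters."""
    tokenizer: Tokenizer
    filters: list[TokenFilter] = field(default_factory=list)

    def analyze(self, text: str) -> list[str]:
        tokens = self.tokenizer(text)
        # Filters are applied in the order they are listed.
        for token_filter in self.filters:
            tokens = token_filter(tokens)
        return tokens


analyzer = Analyzer(whitespace_tokenize, [lowercase])
print(analyzer.analyze("Elasticsearch Basic Knowledge"))
# ['elasticsearch', 'basic', 'knowledge']
```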

Mapping

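Mapping defines how data is represented in Elasticsearch; in other words, it is the schema definition. The example below uses pre-5.0 syntax: from Elasticsearch 5.0, the string type with analyzed/not_analyzed is replaced by the text and keyword types. The custom analyzers it references (kuromoji_analyzer and ngram_analyzer) must also be defined in the index's analysis settings.
Example mapping definition:

{
  "mappings": {
    "company": {
      "_source": {
        "enabled": true
      },
      "_all": {
        "enabled": true,
        "analyzer": "kuromoji_analyzer"
      },
      "properties": {
        "id": {
          "type": "integer",
          "index": "not_analyzed"
        },
        "name": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "ngram_analyzer"
        }
      }
    }
  }
}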
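The index setting in the mapping decides which terms reach the inverted index. The Python sketch below imitates the example mapping with a toy inverted index. The id field is indexed as a single exact term, as with not_analyzed. The name field is split into overlapping 2-grams, standing in for ngram_analyzer. Because that analyzer's real settings are not shown on this page, N=2 plus lowercasing is an assumption. All function names are hypothetical.

```python
from collections import defaultdict


def bigrams(text: str) -> list[str]:
    # Assumed stand-in for "ngram_analyzer": lowercase, then overlapping 2-grams.
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_index(docs: dict[int, dict]) -> dict[str, dict[str, set[int]]]:
    """Map field -> term -> set of document ids."""
    index: dict[str, dict[str, set[int]]] = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in docs.items():
        # "not_analyzed": the whole value becomes one term.
        index["id"][str(doc["id"])].add(doc_id)
        # "analyzed": the value is split into terms by the analyzer.
        for term in bigrams(doc["name"]):
            index["name"][term].add(doc_id)
    return index


def search_name(index, query: str) -> set[int]:
    """Return ids of documents whose name contains every bigram of the query."""
    terms = bigrams(query)
    if not terms:
        return set()
    postings = [index["name"].get(t, set()) for t in terms]
    return set.intersection(*postings)


docs = {1: {"id": 1, "name": "Elastic"}, 2: {"id": 2, "name": "Plastic"}}
index = build_index(docs)
print(search_name(index, "last"))  # {1, 2}: partial matches work on "name"
print(search_name(index, "elas"))  # {1}
print(index["id"].get("1"))        # {1}: "id" only matches the exact value
```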

_source: Controls whether the original JSON document is kept. It is enabled by default.

"_source": {
  "enabled": true
}
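The practical effect of _source.enabled can be sketched with a toy in-memory store in Python. Each document is always analyzed into searchable terms, but the original JSON is kept only when source_enabled is true. ToyIndex and its methods are hypothetical. Elasticsearch behaves in a similar way: when _source is disabled, the document remains searchable, but hits no longer carry the original document.

```python
import json


class ToyIndex:
    """Hypothetical store illustrating what "_source": {"enabled": ...} controls."""

    def __init__(self, source_enabled: bool = True):
        self.source_enabled = source_enabled
        self.terms: dict[str, set[str]] = {}  # term -> ids (always built)
        self.sources: dict[str, str] = {}     # id -> original JSON (optional)

    def index(self, doc_id: str, doc: dict) -> None:
        # Indexing for search happens regardless of the _source setting.
        for value in doc.values():
            for term in str(value).lower().split():
                self.terms.setdefault(term, set()).add(doc_id)
        # The original JSON is stored only when _source is enabled.
        if self.source_enabled:
            self.sources[doc_id] = json.dumps(doc, ensure_ascii=False)

    def search(self, term: str) -> list[dict]:
        hits = []
        for doc_id in sorted(self.terms.get(term.lower(), set())):
            source = self.sources.get(doc_id)
            hits.append({"_id": doc_id,
                         "_source": json.loads(source) if source else None})
        return hits


with_source = ToyIndex(source_enabled=True)
without_source = ToyIndex(source_enabled=False)
for store in (with_source, without_source):
    store.index("1", {"name": "Acme Corp"})

print(with_source.search("acme"))     # [{'_id': '1', '_source': {'name': 'Acme Corp'}}]
print(without_source.search("acme"))  # [{'_id': '1', '_source': None}]
```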

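_all: The _all field concatenates the values of all other fields into a single string. To search it, pass the values separated by spaces. The field is analyzed and indexed, but its value itself cannot be retrieved. The _all field was deprecated in Elasticsearch 6.0 and removed in 7.0.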

"_all": {
  "enabled": true,
  "analyzer": "kuromoji_analyzer"
}
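The behavior described for _all can be imitated in Python as follows. The values of the other fields are joined with spaces into one string, that string is analyzed into terms, and the joined text itself is not kept. The lowercase-plus-whitespace analysis is an assumption standing in for kuromoji_analyzer. The query uses OR semantics (any term matches), which is assumed here because it is the default operator of Elasticsearch's match query. build_all_terms and matches_all are hypothetical helpers.

```python
def build_all_terms(doc: dict) -> set[str]:
    """Concatenate every field value with spaces and return the analyzed terms.

    Only the terms are returned; the joined string itself is discarded,
    mirroring the fact that the _all value cannot be retrieved.
    """
    combined = " ".join(str(value) for value in doc.values())
    # Stand-in analyzer: lowercase + whitespace split (not kuromoji).
    return set(combined.lower().split())


def matches_all(all_terms: set[str], query: str) -> bool:
    """A space-separated query matches if any of its terms is in _all (OR)."""
    return any(term in all_terms for term in query.lower().split())


doc = {"id": 42, "name": "Acme Corp", "city": "Tokyo"}
terms = build_all_terms(doc)
print(sorted(terms))                     # ['42', 'acme', 'corp', 'tokyo']
print(matches_all(terms, "tokyo acme"))  # True
print(matches_all(terms, "osaka"))       # False
```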