elasticsearch rails - mindpin/docs GitHub Wiki

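elasticsearch-rails - A Chinese Search Solution for Rails

Elasticsearch is an open-source, distributed, RESTful search engine built on Lucene. Designed for cloud deployments, it provides real-time search and aims to be stable, reliable, fast, and easy to install and use. Data is indexed as JSON over HTTP.

When we build a website or application and want to add search, we quickly run into a hard truth: search is difficult. We want a search solution that is fast, with zero configuration and a completely schema-free model. We want to index data simply by sending JSON over HTTP, we want the search server to be always available, and we want to start on one machine and scale to hundreds. We want real-time search, simple multi-tenancy, and a solution built for the cloud. Elasticsearch aims to solve all of these problems and more.

About Elasticsearch

Installing Elasticsearch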

OS X

brew install elasticsearch
elasticsearch --config=/usr/local/opt/elasticsearch/config/elasticsearch.yml

Visit http://localhost:9200. If the request succeeds, the installation is complete.
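
The same check can be scripted. The following is a minimal sketch using only Ruby's standard library; the method name `elasticsearch_info` is hypothetical, and the default URL assumes the local node started above.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical helper: returns the parsed JSON from the Elasticsearch root
# endpoint, or nil when the node cannot be reached or answers with an error.
def elasticsearch_info(base_url = "http://localhost:9200")
  uri = URI.parse(base_url)
  response = Net::HTTP.start(uri.host, uri.port, open_timeout: 2, read_timeout: 2) do |http|
    http.get("/")
  end
  response.is_a?(Net::HTTPSuccess) ? JSON.parse(response.body) : nil
rescue StandardError
  # Connection refused, timeouts, and malformed bodies all count as "not reachable".
  nil
end

info = elasticsearch_info
if info
  puts "Elasticsearch is up, version #{info.dig('version', 'number')}"
else
  puts "Elasticsearch is not reachable"
end
```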

Adding Elasticsearch full-text search to a Rails project

Native integration example

Add the following to the Gemfile:

gem 'elasticsearch-model'
gem 'elasticsearch-rails'

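Note: elasticsearch-model ships with pagination support. If your Gemfile includes a pagination gem such as will_paginate or kaminari, list it before elasticsearch-model and elasticsearch-rails.

Add the following code to each model that needs search: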

class University < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
end

With the modules included, we can write a search method:

def self.search(query)
  __elasticsearch__.search(query)
end

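This is a very simple search that passes its argument straight through as the query. We can use the query DSL to tailor queries to our business needs. The following example searches for records whose status is 1 and whose name field matches the given term: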

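def self.search_filter(params)
  # Note: `filtered` is Elasticsearch 1.x syntax; it was deprecated in 2.0 and removed in 5.0.
  __elasticsearch__.search(
    query: {
      filtered: {
        filter: {
          bool: {
            # `must` takes an array; repeating the "must" key would silently drop the first clause.
            must: [
              { term:  { status: 1 } },
              { query: { match: { name: params } } }
            ]
          }
        }
      }
    }
  )
end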
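
The `filtered` query used above no longer exists in Elasticsearch 5.0 and later, where the same intent is expressed with a `bool` query that has a `filter` clause. The following is a minimal sketch of that form; the helper name `status_name_query` is hypothetical, and it only builds the request body without contacting Elasticsearch.

```ruby
# Hypothetical helper: builds a bool query that scores on `name`
# and filters (without scoring) on `status`.
def status_name_query(name, status: 1)
  {
    query: {
      bool: {
        must:   [{ match: { name: name } }],
        filter: [{ term: { status: status } }]
      }
    }
  }
end

puts status_name_query("Peking").inspect
```

Inside the model, the resulting hash could be passed to `__elasticsearch__.search` in place of the `filtered` body.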

Next, define the index mapping on the model for Elasticsearch to use:

mapping dynamic: false do
  indexes :name
  indexes :tag
end

Moving on: a model can be serialized to JSON, and elasticsearch-model uses the as_indexed_json method for this. We can write it like so:

def as_indexed_json(options={})
  self.as_json(
    only: [:id, :name, :description, :status],   
    include: { tags: { only: [:name]}}
  )
end
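
As a rough illustration of the document shape this produces, the sketch below rebuilds the same structure by hand in plain Ruby. The `UniversityRow` and `TagRow` structs and all sample values are made up for the example; they stand in for ActiveRecord objects and are not part of the project.

```ruby
require "json"

# Hypothetical stand-ins for ActiveRecord objects, for illustration only.
TagRow        = Struct.new(:name, :weight)
UniversityRow = Struct.new(:id, :name, :description, :status, :city, :tags)

# Mirrors as_json(only: [...], include: { tags: { only: [:name] } }) by hand:
# top-level attributes outside `only` are dropped, and each tag keeps just its name.
def indexed_document(university)
  {
    "id"          => university.id,
    "name"        => university.name,
    "description" => university.description,
    "status"      => university.status,
    "tags"        => university.tags.map { |tag| { "name" => tag.name } }
  }
end

row = UniversityRow.new(1, "Example University", "A sample record", 1, "Nowhere",
                        [TagRow.new("public", 3), TagRow.new("research", 5)])
puts JSON.pretty_generate(indexed_document(row))
```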

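The include part handles associations, while only selects the model's own attributes. With these changes the model-level search is essentially complete. If you search now, though, you will probably get no results, because the data still has to be imported into Elasticsearch. Use this command: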

rake environment elasticsearch:import:model CLASS='your_model_name' FORCE=y
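
For this rake task to be available, the import tasks shipped with elasticsearch-rails need to be loaded first. A minimal sketch, assuming the file is placed at lib/tasks/elasticsearch.rake (the file name is an assumption; any .rake file under lib/tasks should work):

```ruby
# lib/tasks/elasticsearch.rake (file name is an assumption)
# Loads the elasticsearch:import:* tasks provided by the elasticsearch-rails gem.
require 'elasticsearch/rails/tasks/import'
```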

Integration example using concerns

app/models/concerns/searchable.rb

module Searchable
  extend ActiveSupport::Concern 

  included do
    include Elasticsearch::Model
    include Elasticsearch::Model::Callbacks

    Searchable.enabled_models.add(self)
  end

  def self.enabled_models
    @_enabled_models ||= Set.new
  end

  def as_indexed_json(options={})
    as_json(except: [:id, :_id])
  end
end

app/models/concerns/standard_search.rb

module StandardSearch
  extend ActiveSupport::Concern

  included do
    include Searchable
  end

  module ClassMethods
    def standard_search(q)
      param = {
        :query => {
          :multi_match => {
            :fields   => standard_fields,
            :type     => "cross_fields",
            :query    => q,
            :analyzer => "standard",
            :operator => "and"
          }
        }
      }

      self.search(param).records.all
    end

    def standard(*fields)
      standard_fields.merge fields

      settings :index => {:number_of_shards => 1} do
        mappings :dynamic => "false" do
          fields.each do |f|
            indexes f, :analyzer => "chinese"
          end
        end
      end
    end

    def standard_fields
      @_standard_fields ||= Set.new
    end
  end
end

app/models/concerns/pinyin_search.rb

module PinyinSearch
  extend ActiveSupport::Concern

  included do
    include StandardSearch

    delegate :pinyin_fields, :to => :class

    before_save :save_pinyin_fields
  end

  private

  def save_pinyin_fields
    self.pinyin_fields.each do |field|
      value = self.send field

      self.send "#{field}_pinyin=", PinYin.of_string(value).join
      self.send "#{field}_abbrev=", PinYin.abbr(value)
    end
  end

  module ClassMethods
    def pinyin(*fields)
      standard(*fields)

      ext_fields = fields.select do |field|
        self.fields.include?(field.to_s) &&
        self.fields[field.to_s].type == String
      end.each do |f|
        pinyin_fields_from(f).each do |fd|
          field fd, :type => String
        end

        index_pinyin_field(f)
      end

      pinyin_fields.concat ext_fields
    end

    def pinyin_analysis
      {
        :analyzer => {
          :pinyin => {
            :type      => "custom",
            :tokenizer => "lowercase",
            :filter    => ["kc_ngram"]
          }
        },

        :filter => {
          :kc_ngram => {
            :type     => "nGram",
            :min_gram => 1,
            :max_gram => 128
          }
        }
      }
    end

    def pinyin_search(q)
      fields = pinyin_fields.map {|f| pinyin_fields_from(f)}.flatten 

      param = {
        :query => {
          :multi_match => {
            :fields   => fields,
            :type     => "phrase",
            :query    => q,
            :analyzer => "standard"
          }
        }
      }

      self.search(param).records.all
    end

    def pinyin_fields
      @_pinyin_fields ||= []
    end

    private

    def pinyin_fields_from(field)
      %W[#{field}_pinyin #{field}_abbrev]
    end

    def index_pinyin_field(field)
      ext_fields = pinyin_fields_from(field)

      settings :index => {:number_of_shards => 1}, :analysis => self.pinyin_analysis do
        mappings :dynamic => "false" do
          ext_fields.each do |f|
            indexes f, :analyzer => "pinyin"
          end
        end
      end
    end
  end
end

app/models/<model that needs pinyin search>.rb

class KnowledgeNetStore::Point
  include PinyinSearch
  # Fields to include in full-text search
  pinyin :name
end

Analysis of usage in existing projects

KnowledgeCamp (hereafter KC)

KnowledgeNetStore::Point.class_eval do
  include PinyinSearch
  pinyin :name
end

That is, search is used on mind-map nodes, with pinyin search (in addition to standard search) supported on the name field.
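
To see why partial pinyin input can match, recall the `pinyin` analyzer defined in pinyin_search.rb: a `lowercase` tokenizer followed by an nGram filter with min_gram 1 and max_gram 128. The sketch below imitates that pipeline in plain Ruby. It is an approximation for ASCII input only, the helper names are hypothetical, and it assumes the stored `name_pinyin` value is a string such as "beijing".

```ruby
# Rough stand-in for the "lowercase" tokenizer: split on non-letters and downcase.
def lowercase_tokens(text)
  text.downcase.scan(/[a-z]+/)
end

# All substrings of `token` whose length is between min_gram and max_gram.
def ngrams(token, min_gram, max_gram)
  grams = []
  (0...token.length).each do |start|
    (min_gram..max_gram).each do |len|
      break if start + len > token.length
      grams << token[start, len]
    end
  end
  grams
end

# Terms that would end up in the index for one field value (approximation).
def pinyin_index_terms(text, min_gram: 1, max_gram: 128)
  lowercase_tokens(text).flat_map { |t| ngrams(t, min_gram, max_gram) }.uniq
end

terms = pinyin_index_terms("BeiJing")
puts terms.include?("bei")   # a prefix of the pinyin is indexed as a term
puts terms.include?("jing")  # so is any inner substring
puts terms.include?("bj")    # not a substring; initials rely on the *_abbrev field
```

Because every substring is indexed, a query term such as "jing" matches directly. The price is a larger index, since a token of length n yields on the order of n² grams.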

pinIdea

app/models/concerns/searchable.rb

module Searchable
  extend ActiveSupport::Concern 

  included do
    include Elasticsearch::Model

    __elasticsearch__.client = Elasticsearch::Client.new host: "http://localhost:9200", log: true

    Searchable.enabled_models.add(self)

    after_create  {Indexer.perform_async(:index,  self.id.to_s, self.class.name)}
    after_update  {Indexer.perform_async(:update, self.id.to_s, self.class.name)}
    after_destroy {Indexer.perform_async(:delete, self.id.to_s, self.class.name)}
  end

  def self.enabled_models
    @_enabled_models ||= Set.new
  end

  def as_indexed_json(options={})
    as_json(except: [:id, :_id])
  end

  module ClassMethods
    def custom_analysis
      {
        :analyzer => {
          :chargram => {
            :type => :custom,
            :tokenizer => :chargram,
            :filter => [:lowercase]
          }
        },
        :tokenizer => {
          :chargram => {
            :type => :nGram,
            :min_gram => 1,
            :max_gram => 20,
            :token_chars => [:letter, :digit]
          }
        }
      }
    end
  end
end

app/models/concerns/vote_search_config.rb

module VoteSearchConfig
  extend ActiveSupport::Concern 

  included do
    include Searchable

    settings :index => {:number_of_shards => 1}, :analysis => custom_analysis do
      mappings :dynamic => "false" do
        indexes :title, :analyzer => :chargram
      end
    end
  end

  def as_indexed_json(options={})
    as_json(only: [:title])
  end

  module ClassMethods
    def page_search(query,page = 1, per = 20)
      page = 1 if page.blank?
      param = {
        :from => (page.to_i - 1) * per,  # `from` is a document offset, not a page number
        :size => per,
        :query => {
          :multi_match => {
            :type  => :best_fields, 
            :query => query,
            :fields => [:title]
          }
        },

        :highlight => {
          :pre_tags => ["<em class='highlight'>"],
          :post_tags => ["</em>"],
          :fields => {:title=>{}}
        }
      }

      Vote.search(param)
    end

  end

end
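
Elasticsearch's `from` parameter is a document offset rather than a page number, so a page number has to be converted as (page - 1) * per. A minimal sketch of that conversion, with a hypothetical helper name `page_window`, which also tolerates the nil or string values that typically arrive from request parameters:

```ruby
# Hypothetical helper: converts a 1-based page number into from/size values.
def page_window(page, per = 20)
  page = page.to_i          # nil and "" become 0, "3" becomes 3
  page = 1 if page < 1      # fall back to the first page
  { from: (page - 1) * per, size: per }
end

puts page_window(1).inspect
puts page_window("3", 10).inspect
```

The returned hash can be merged into the search body. If Kaminari or will_paginate is loaded, the pagination helpers on the elasticsearch-model response may make this manual arithmetic unnecessary.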

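pinIdea's search was developed before KC's, and it is visibly messy, which makes it hard for someone who has just taken over to understand and modify.

Reviewing and improving the search module

Feature review

From these two projects we can see that, beyond basic full-text search, we also want to support pinyin search with Elasticsearch.

Looking at possible extensions, we may also want the following settings: pre_tags and post_tags (markup placed before and after matched terms, used for highlighting) and multi-field search (no example in the existing projects).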
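
Since the existing projects have no multi-field example, here is a minimal sketch of a request body that combines multi-field search with highlighting. The helper name `highlighted_search_body` and the `:description` field are hypothetical; the tag defaults are copied from the pinIdea code above.

```ruby
# Hypothetical helper: builds a multi_match query over several fields and
# asks Elasticsearch to highlight matches in each of them.
def highlighted_search_body(query, fields,
                            pre_tag: "<em class='highlight'>", post_tag: "</em>")
  {
    query: {
      multi_match: { query: query, fields: fields }
    },
    highlight: {
      pre_tags:  [pre_tag],
      post_tags: [post_tag],
      # One empty options hash per field means "highlight with default settings".
      fields: fields.each_with_object({}) { |field, acc| acc[field] = {} }
    }
  }
end

puts highlighted_search_body("ruby", [:title, :description]).inspect
```

Every field listed would also need to be present in the index mapping, because the mappings in this document use dynamic: false.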

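Other extensions can be added as actual needs arise.

Improvements

In terms of improvements, the existing functionality already covers our needs and nothing needs extending for now. Simply converting it into a Rails Engine is enough.

It would be better still to declare the Elasticsearch-related gems as dependencies in the gemspec.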
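
A minimal sketch of what such a gemspec might look like. The gem name, version, and summary are placeholders, and the `ruby-pinyin` dependency is an assumption about where the `PinYin` constant used in pinyin_search.rb comes from.

```ruby
# mindpin_search.gemspec (hypothetical file and gem name)
spec = Gem::Specification.new do |s|
  s.name    = "mindpin_search"   # placeholder name
  s.version = "0.0.1"            # placeholder version
  s.summary = "Standard and pinyin search concerns packaged as a Rails Engine"
  s.files   = Dir["{app,lib}/**/*"]

  s.add_dependency "rails"
  s.add_dependency "elasticsearch-model"
  s.add_dependency "elasticsearch-rails"
  s.add_dependency "ruby-pinyin"   # assumption: provides PinYin.of_string / PinYin.abbr
end

puts spec.dependencies.map(&:name).inspect
```

With the dependencies declared here, a host application would only need to add the engine gem itself to its Gemfile.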

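We could also add the corresponding controller actions, or provide a default search page and a JSON response.

Going further, we could provide helpers that generate the search box and the form submission.

Conclusion

We recommend building a Rails Engine gem, mainly to simplify integration and reduce repeated development time.

Its main features would be standard search and pinyin search (both already implemented).

More complex extensions can be added to the gem step by step as specific needs arise.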
