# Setting up Elasticsearch on a Heroku Rails app
In this tutorial, I'll demonstrate how to integrate Elasticsearch with a Rails app running on the Heroku platform. It's simple to do, so keep reading. I don't have a demo app to share for this tutorial, so you're expected to follow along with your own app.
## Getting the appropriate gems
I recommend using the following gems (add these to your Gemfile):
- `elasticsearch-model`
- `elasticsearch-rails`
You'll need to install the right version of these gems, since their versions parallel the version of Elasticsearch; see https://github.com/elastic/elasticsearch-rails for details. Since the Elasticsearch addon on Heroku provisions the latest release of Elasticsearch (6.2.4 at the time of this writing), you'll need to set the branch to "master".
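Here is a minimal sketch of the Gemfile entries, assuming you pull both gems straight from the master branch of the GitHub repository:

```ruby
# Gemfile: track master so the gem versions match the Elasticsearch
# release (6.x) that the Heroku addon provisions.
gem 'elasticsearch-model', git: 'https://github.com/elastic/elasticsearch-rails.git', branch: 'master'
gem 'elasticsearch-rails', git: 'https://github.com/elastic/elasticsearch-rails.git', branch: 'master'
```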
## The Elasticsearch addon on Heroku
Elastic Cloud is a SaaS offering from Elastic, a company founded in 2010 in Amsterdam. The Heroku addon you seek is called foundelasticsearch, and it uses Elastic Cloud to manage a cluster of Elasticsearch nodes. On the Heroku platform, enable this addon with the service plan that you need:
```bash
heroku addons:create foundelasticsearch:dachs-standard
```
Here I've selected the most basic plan. For a list of all plans, see the Heroku documentation.
At this point, we have an Elasticsearch cluster running on a single node. A new configuration variable named FOUNDELASTICSEARCH_URL has been added, which you can inspect as follows:
```bash
heroku config:get FOUNDELASTICSEARCH_URL
```
What you'll see after running that command is a URL that points to your Elasticsearch cluster. You can access your Elastic Cloud dashboard for the addon as follows:
```bash
heroku addons:open foundelasticsearch
```
I recommend doing that now so that you can explore, and also enable Kibana (the GUI for exploring and monitoring your Elasticsearch cluster). You won't be charged more for taking advantage of this benefit, and quite frankly, I'm not sure why it's disabled by default. To enable Kibana, select the "Configuration" tab in the top menu bar of the dashboard, then scroll to the bottom where you'll find the Kibana section. Then, click the "Enable" button.
## The Elasticsearch username and password
The default username is "elastic". You need to create a password for your Elasticsearch cluster, which you can do from the Elastic Cloud dashboard. While there, select the "Shield" menu item at the top. Next, under the section named "Reset password", select the "Reset" button. It'll auto-generate a password for you. Copy it for the next step, which is to add configuration variables for the username and password; these will be used for HTTP Basic authentication when accessing your Elasticsearch cluster in API calls.
Set the following configuration variables for Heroku:
```bash
heroku config:set ES_USER=elastic
heroku config:set ES_PW=${password}
heroku config:set PROD=1
```

where `${password}` is the password that you just obtained in the Elastic Cloud dashboard. Note that you can name these environment variables however you want. Notice that I added an extra config variable called PROD. We'll use it to distinguish between the production and development environments so that you can still connect to a local Elasticsearch cluster from your development environment.
## Adding Elasticsearch support to your models
Let's go ahead and set up one of our models so that it can be indexed in Elasticsearch. Below, I'm using an example model called Book.

```ruby
class Book < ApplicationRecord
end
```
That's the model outline at least; imagine that you have all your typical application logic in between. Now, add the support for Elasticsearch so that the class file looks like this:

```ruby
require 'elasticsearch/model'

class Book < ApplicationRecord
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
end
```
You'll want to add those lines of code to all models that you plan to index. They extend your model to support Elasticsearch, mainly through methods attached to the `Elasticsearch::Model::Proxy` class, which are accessible from your model class and its instances through the `__elasticsearch__` method. Some of the most useful methods, however, are accessible directly on the model class itself, such as the `search` method. Executing `Book.search("some book")` in the rails console right now won't get you far at all, though, as nothing is indexed yet; that'll serve as the interlude to the next section.
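For later reference, here is how a search response can be consumed once documents are indexed. This is a minimal sketch, assuming a `Book` model with a `title` attribute:

```ruby
response = Book.search("some book")
response.results.map(&:title)  # raw Elasticsearch hits, read from _source
response.records.to_a          # the matching ActiveRecord objects
```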
## Indexing your models
The one big part that is still missing is telling our Rails app how to find our Elasticsearch cluster. We'll create an initializer to do that. Initializers are executed at app boot-up time and are stored in `config/initializers`. Create an initializer at `config/initializers/elasticsearch.rb` and insert the following:
if ENV.include?("PROD")
Elasticsearch::Model.client = Elasticsearch::Client.new hosts: [{
host: ENV["FOUNDELASTICSEARCH_URL"].split("https://")[1],
user: ENV["ES_USER"],
port: "9243",
password: ENV["ES_PW"],
scheme: "https"
}]
else
Elasticsearch::Model.client = Elasticsearch::Client.new host: "http://localhost:9200"
end
Originally, I had written my initializer with only:

```ruby
Elasticsearch::Model.client = Elasticsearch::Client.new url: "https://#{URI::encode(ENV['ES_USER'])}:#{URI::encode(ENV['ES_PW'])}@#{ENV['FOUNDELASTICSEARCH_URL'].split('https://')[1]}"
```

But that has two major issues:

- Passing authentication credentials in the URL is deprecated, and
- it doesn't let me connect to a local Elasticsearch cluster when running my Rails app in development mode.

Thus, the if/else version above is the right way to go.
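To sanity-check the initializer, you can ping the cluster from a rails console. A quick check, using the standard elasticsearch-ruby client methods:

```ruby
# In a rails console: confirm the initializer built a reachable client.
Elasticsearch::Model.client.ping            # => true if the cluster responds
Elasticsearch::Model.client.cluster.health  # cluster name, status, node count
```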
Before pushing all your changes to your production Heroku app instance, let's test things out locally. You'll want to locally export all of the configuration variables that you set for Heroku, except for the PROD one:
```bash
export ES_USER=elastic
export ES_PW=${password}
export FOUNDELASTICSEARCH_URL=${value_from_heroku}
```
Alternatively, you can set them in the `.env` file that is used when running the `heroku local` command. Be sure to have Elasticsearch installed and running locally before proceeding. Next, open up a rails console and run:
```ruby
Book.import
```
You should get an error stating that the index doesn't exist yet. You can fix this like so:

```ruby
Book.import(force: true)
```

The `force: true` option deletes the index if it already exists and recreates it before importing.
In another terminal window, you can confirm that your index has been created:

```bash
curl -u ${ES_USER}:${ES_PW} ${FOUNDELASTICSEARCH_URL}/_cat/indices?v
```
Alternatively, you can go the easy route and check in Kibana.
Now that you have your books index set up, you can search it using `Book.search`. If things are working well, you can go ahead and push your changes to production. You may want to first delete the index, however, so you don't mix test data into your production index:

```bash
curl -XDELETE -u ${ES_USER}:${ES_PW} ${FOUNDELASTICSEARCH_URL}/books
```
After you push your changes to your production instance, you'll need to run the import command again on the Book model. You can do this by opening a console on your production app with `heroku run console` and then entering the import command.
Creating indices by hand can quickly become tedious if you have several models. To simplify things, I wrote a couple of methods that you can copy and paste into a rails console session. The `import_into_elasticsearch()` method can then be called to create indices for all models that need one.
```ruby
def get_all_models
  # Omits the auto-generated HABTM join models.
  Rails.application.eager_load!
  ActiveRecord::Base.descendants.reject { |m| m.name.starts_with?("HABTM") }
end

def import_into_elasticsearch
  models = get_all_models()
  models.each do |m|
    if m.respond_to?(:import)
      puts "Importing model #{m.name}."
      m.import(force: true)
    end
  end
end

import_into_elasticsearch()
```
This only attempts to create an index for models whose classes respond to the `import` method, which comes into existence when you add Elasticsearch support to a model as shown earlier.
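If you'd rather not paste this into a console every time, the same logic can be wrapped in a rake task. Here's a sketch; the task name and file path are my own choices, not something provided by the addon or the gems:

```ruby
# lib/tasks/elasticsearch.rake (hypothetical): rebuild the indices for all
# importable models, e.g. via `heroku run rake elasticsearch:import`.
namespace :elasticsearch do
  desc "Recreate Elasticsearch indices for all models that support import"
  task import: :environment do
    Rails.application.eager_load!
    models = ActiveRecord::Base.descendants.reject { |m| m.name.starts_with?("HABTM") }
    models.each do |m|
      next unless m.respond_to?(:import)
      puts "Importing model #{m.name}."
      m.import(force: true)
    end
  end
end
```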
## Examples with curl
1. Search the construct_tags index for documents whose name field equals YFP:

    ```bash
    curl -u ${ES_USER}:${ES_PW} ${FOUNDELASTICSEARCH_URL}/construct_tags/_search?q=name:YFP
    ```
2. Get overall cluster health and see the number of nodes in the cluster:

    ```bash
    curl -u ${ES_USER}:${ES_PW} ${FOUNDELASTICSEARCH_URL}/_cat/health?v
    ```
3. Get a list of all indices:

    ```bash
    curl -u ${ES_USER}:${ES_PW} ${FOUNDELASTICSEARCH_URL}/_cat/indices?v
    ```
4. Get the settings for the biosamples index:

    ```bash
    curl -u ${ES_USER}:${ES_PW} ${FOUNDELASTICSEARCH_URL}/biosamples/_settings?pretty
    ```

    ```json
    {
      "biosamples" : {
        "settings" : {
          "index" : {
            "creation_date" : "1528487039151",
            "number_of_shards" : "5",
            "number_of_replicas" : "1",
            "uuid" : "68tyZBDrTsyneW00HPKFxw",
            "version" : {
              "created" : "6020499"
            },
            "provided_name" : "biosamples"
          }
        }
      }
    }
    ```
5. Change the refresh rate from the default of 1s to 3s:

    ```bash
    curl -H 'Content-Type: application/json' -XPUT -u ${ES_USER}:${ES_PW} ${FOUNDELASTICSEARCH_URL}/_settings -d '{ "index" : { "refresh_interval" : "3s" } }'
    ```

    If you re-run the settings command from number 4 above, you'll now see that the key "refresh_interval" is present and set to "3s".
6. Get stats on the segments in each shard of an index:

    ```bash
    curl -u ${ES_USER}:${ES_PW} ${FOUNDELASTICSEARCH_URL}/biosamples/_segments?pretty
    ```
7. List the available commands of the cat API:

    ```bash
    curl -u ${ES_USER}:${ES_PW} ${FOUNDELASTICSEARCH_URL}/_cat
    ```
8. Get help on a particular cat command:

    ```bash
    curl -u ${ES_USER}:${ES_PW} ${FOUNDELASTICSEARCH_URL}/_cat/allocation?help
    ```
9. Get the cluster settings, with defaults:

    ```bash
    curl -o cluster_stats.json -u ${ES_USER}:${ES_PW} ${FOUNDELASTICSEARCH_URL}/_cluster/settings?include_defaults=true
    ```

    For example, in this output you can see the value for `indices.memory.index_buffer_size`, which defaults to 10% of the JVM heap size.
10. Get node stats:

    ```bash
    curl -o node_stats.json -u ${ES_USER}:${ES_PW} ${FOUNDELASTICSEARCH_URL}/_nodes/stats
    ```

    The jvm section is interesting here. For example, you can see how much heap is allocated (`jvm.mem.heap_max_in_bytes`) and what percentage of that is used (`jvm.mem.heap_used_percent`).
11. Sort all indices by document count in descending order:

    ```bash
    curl -u ${ES_USER}:${ES_PW} ${FOUNDELASTICSEARCH_URL}/_cat/indices?s=docs.count:desc
    ```

    Note that adding the field headers with &v didn't work for me when sorting. (If you do combine query parameters, be sure to quote the URL; otherwise the shell interprets the unquoted `&`.)