Lesson: Explore Objects in Fedora and Solr - samvera/hydra-works GitHub Wiki

Goals

  • See how objects are indexed into Solr
  • See how and where objects and metadata are stored in Fedora
  • Set indexers to manage how your metadata is indexed into Solr
  • Re-index objects into Solr (update the Solr index based on any changes to an object)

Explanation

Now that you've created objects and saved them in Fedora, you also want to be able to search for them in Solr. ActiveFedora makes it easy to get your metadata into Solr and manage if/when/how your metadata is indexed.

Step 1: See what your collection looks like in Fedora and Solr

If we go to the URI of your collection in a browser (for example, http://127.0.0.1:8984/rest/dev/col-1), we should see what it looks like in Fedora. In the properties section you should see the attributes we set:

[Screenshot: the collection's properties as displayed in Fedora]

Let's also confirm that this collection has been indexed into Solr. If you followed the example and used col-1 as your collection's ID, the Solr page will be http://localhost:8983/solr/hydra-development/select?q=col-1. The generic pattern is http://localhost:8983/solr/hydra-development/select?q=XXX, where you replace XXX with the id from your console session. The page should look like the sample below. Note that, at this point, the title has not been indexed in Solr as a field of its own. You only get fields like system_create_dtsi, system_modified_dtsi, id, object_profile_ssm, and has_model_ssim. In Step 6 we will modify our Collection model to add the collection metadata to the Solr document.

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">66</int>
    <lst name="params">
      <str name="q">col-1</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="0.12829255">
    <doc>
      <date name="system_create_dtsi">2015-12-21T01:24:55Z</date>
      <date name="system_modified_dtsi">2015-12-21T01:32:22Z</date>
      <arr name="has_model_ssim">
        <str>Collection</str>
      </arr>
      <str name="id">col-1</str>
      <arr name="object_profile_ssm">
        <str>{"id":"col-1","title":"Works by Edgar Allan Poe","head_id":[],"tail_id":[]}</str>
      </arr>
      <arr name="member_ids_ssim">
        <str>work-1</str>
        <str>col-2</str>
        <str>work-2</str>
      </arr>
      <long name="_version_">1512687586453225472</long>
      <date name="timestamp">2015-12-21T01:35:20.483Z</date>
      <float name="score">0.12829255</float>
    </doc>
  </result>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields"/>
    <lst name="facet_ranges"/>
    <lst name="facet_intervals"/>
    <lst name="facet_heatmaps"/>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <bool name="correctlySpelled">true</bool>
    </lst>
  </lst>
</response>
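
The URL pattern described above can also be built in Ruby. The following is a minimal sketch, assuming the same host, port, and core name (hydra-development) used in this lesson; the helper name solr_select_url is hypothetical and is not part of ActiveFedora or Solr.

```ruby
require "uri"

# Base URL of the Solr core used in this lesson's development environment.
SOLR_SELECT_BASE = "http://localhost:8983/solr/hydra-development/select"

# Hypothetical helper: build the select URL for a query (here, an object id).
# URI.encode_www_form escapes any characters that are not URL-safe.
def solr_select_url(query, base: SOLR_SELECT_BASE)
  "#{base}?#{URI.encode_www_form(q: query)}"
end

puts solr_select_url("col-1")
puts solr_select_url("work-1")
```

Pasting either printed URL into a browser should show the same kind of response as the sample above, provided Solr is running locally.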

Step 2: See what your bibliographic work looks like in Fedora and Solr

If we go to the URI of your bibliographic work in a browser (for example, http://127.0.0.1:8984/rest/dev/work-1), we should see what it looks like in Fedora. In the properties section you should see the attributes we set:

[Screenshot: the bibliographic work's properties as displayed in Fedora]

Let's also confirm that this bibliographic work has been indexed into Solr. If you followed the example and used work-1 as your bibliographic work's ID, the Solr page will be http://localhost:8983/solr/hydra-development/select?q=work-1. The generic pattern is http://localhost:8983/solr/hydra-development/select?q=XXX, where you replace XXX with the id from your console session. The page should look like the sample below. Note that, at this point, the title, author, and abstract have not been indexed in Solr as fields of their own. You only get fields like system_create_dtsi, system_modified_dtsi, id, object_profile_ssm, and has_model_ssim. In Step 6 we will modify our BibliographicWork model to add the bibliographic work metadata to the Solr document.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
    <lst name="params">
      <str name="q">work-1</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="0.12829255">
    <doc>
      <date name="system_create_dtsi">2015-12-21T01:28:00Z</date>
      <date name="system_modified_dtsi">2015-12-21T01:32:29Z</date>
      <arr name="has_model_ssim">
        <str>BibliographicWork</str>
      </arr>
      <str name="id">work-1</str>
      <arr name="object_profile_ssm">
        <str>{"id":"work-1","head":[],"tail":[],"title":"The Raven","author":"Poe, Edgar Allan","abstract":"A lonely man tries to ease his 'sorrow for the lost Lenore', by distracting his mind with old books of 'forgotten lore'."}</str>
      </arr>
      <arr name="member_ids_ssim">
        <str>fileset-1</str>
        <str>work-2</str>
        <str>fileset-2</str>
      </arr>
      <long name="_version_">1512687848300478464</long>
      <date name="timestamp">2015-12-21T01:36:17.797Z</date>
      <float name="score">0.12829255</float>
    </doc>
  </result>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields"/>
    <lst name="facet_ranges"/>
    <lst name="facet_intervals"/>
    <lst name="facet_heatmaps"/>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <bool name="correctlySpelled">true</bool>
    </lst>
  </lst>
</response>
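
To make the point about missing fields concrete, here is a small sketch that checks which metadata fields a Solr document lacks. The hash is a hand-transcribed subset of the work-1 sample above, and missing_fields is a hypothetical helper written for this illustration only.

```ruby
# A subset of the work-1 Solr document, transcribed by hand from the
# sample response above (object_profile_ssm and score are left out).
WORK_DOC = {
  "system_create_dtsi"   => "2015-12-21T01:28:00Z",
  "system_modified_dtsi" => "2015-12-21T01:32:29Z",
  "has_model_ssim"       => ["BibliographicWork"],
  "id"                   => "work-1",
  "member_ids_ssim"      => ["fileset-1", "work-2", "fileset-2"]
}.freeze

# Hypothetical helper: return the expected field names that the
# document does not contain.
def missing_fields(doc, expected)
  expected.reject { |name| doc.key?(name) }
end

p missing_fields(WORK_DOC, %w[title_tesim author_tesim abstract_tesim])
```

Before the model changes in Step 6, all three names are reported as missing. After re-indexing in Step 7, the same check against the refreshed document should come back empty.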

Step 3: See what your bibliographic file set looks like in Fedora and Solr

If we go to the URI of your bibliographic file set in a browser (for example, http://127.0.0.1:8984/rest/dev/fileset-1), we should see what it looks like in Fedora. In the properties section you should see the attributes we set:

[Screenshot: the bibliographic file set's properties as displayed in Fedora]

Let's also confirm that this bibliographic file set has been indexed into Solr. If you followed the example and used fileset-1 as your bibliographic file set's ID, the Solr page will be http://localhost:8983/solr/hydra-development/select?q=fileset-1. The generic pattern is http://localhost:8983/solr/hydra-development/select?q=XXX, where you replace XXX with the id from your console session. The page should look like the sample below. Note that, at this point, the title has not been indexed in Solr as a field of its own. You only get fields like system_create_dtsi, system_modified_dtsi, id, object_profile_ssm, and has_model_ssim. In Step 6 we will modify our BibliographicFileSet model to add the bibliographic file set metadata to the Solr document.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">5</int>
    <lst name="params">
      <str name="q">fileset-1</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="0.12829255">
    <doc>
      <date name="system_create_dtsi">2015-12-21T01:30:35Z</date>
      <date name="system_modified_dtsi">2015-12-21T01:30:35Z</date>
      <arr name="has_model_ssim">
        <str>BibliographicFileSet</str>
      </arr>
      <str name="id">fileset-1</str>
      <arr name="object_profile_ssm">
        <str>{"id":"fileset-1","head":[],"tail":[],"title":"The Raven pdf"}</str>
      </arr>
      <long name="_version_">1512687863018291200</long>
      <date name="timestamp">2015-12-21T01:32:29.725Z</date>
      <float name="score">0.12829255</float>
    </doc>
  </result>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields"/>
    <lst name="facet_ranges"/>
    <lst name="facet_intervals"/>
    <lst name="facet_heatmaps"/>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <bool name="correctlySpelled">true</bool>
    </lst>
  </lst>
</response>

Step 4: Start the Rails console

Start the Rails console by typing the following in a terminal window.

$ rails console

(Or you can abbreviate this as rails c.)

You should see something like Loading development environment (Rails 4.2.0). Now you're in a "REPL", an interactive Ruby console that has all of your Rails application's code and configuration loaded.

You need to get back the objects you were working with in the previous lesson.

col = Collection.find("col-1")
=> #<Collection id: "col-1", head: [], tail: [], title: "Works by Edgar Allan Poe">

bw = BibliographicWork.find("work-1")
=> #<BibliographicWork id: "work-1", head: [], tail: [], title: "The Raven", author: "Poe, Edgar Allan", abstract: "A lonely man tries to ease his 'sorrow for the lost Lenore', by distracting his mind with old books of 'forgotten lore'.">

bf = BibliographicFileSet.find("fileset-1")
=> #<BibliographicFileSet id: "fileset-1", head: [], tail: [], title: "The Raven pdf">

Step 5: In the console, see how your collection, bibliographic work, and bibliographic file set metadata are indexed into Solr

The to_solr method generates the Solr document for your objects. To see the full Solr document for the collection, bibliographic work, and bibliographic file set we created, call:

col.to_solr
=> {"system_create_dtsi"=>"2016-06-13T13:53:45Z", "system_modified_dtsi"=>"2016-06-13T14:18:45Z", "has_model_ssim"=>["Collection"], :id=>"col-1", "object_profile_ssm"=>"{\"id\":\"col-1\",\"head\":[],\"tail\":[],\"title\":\"Works by Edgar Allan Poe\"}", "member_ids_ssim"=>["work-1", "col-2", "work-2"], "object_ids_ssim"=>[], "collection_ids_ssim"=>[]}

bw.to_solr
=> {"system_create_dtsi"=>"2016-06-13T14:11:50Z", "system_modified_dtsi"=>"2016-06-13T14:19:15Z", "has_model_ssim"=>["BibliographicWork"], :id=>"work-1", "object_profile_ssm"=>"{\"id\":\"work-1\",\"head\":[],\"tail\":[],\"title\":\"The Raven\",\"author\":\"Poe, Edgar Allan\",\"abstract\":\"A lonely man tries to ease his 'sorrow for the lost Lenore', by distracting his mind with old books of 'forgotten lore'.\"}", "member_ids_ssim"=>["fileset-1", "work-2", "fileset-2"], "object_ids_ssim"=>[]}

bf.to_solr
=> {"system_create_dtsi"=>"2016-06-13T14:16:40Z", "system_modified_dtsi"=>"2016-06-13T14:19:16Z", "has_model_ssim"=>["BibliographicFileSet"], :id=>"fileset-1", "object_profile_ssm"=>"{\"id\":\"fileset-1\",\"head\":[],\"tail\":[],\"title\":\"The Raven pdf\"}", "member_ids_ssim"=>[], "object_ids_ssim"=>[]}

As you can see, the author, title, and abstract values are not in the Solr document except in object_profile_ssm, which is a serialization of the entire object. This serialization can be used to re-populate the object from Solr instead of Fedora.
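
Because object_profile_ssm holds plain JSON, its contents can be read back with Ruby's standard JSON library. The sketch below only demonstrates that the serialization is parseable; it is not how ActiveFedora itself rebuilds objects, and profile_attributes is a hypothetical helper name.

```ruby
require "json"

# The object_profile_ssm value for work-1, copied from the to_solr
# output above (with the Ruby string escaping removed).
PROFILE_JSON = <<~'JSON'
  {"id":"work-1","head":[],"tail":[],"title":"The Raven","author":"Poe, Edgar Allan","abstract":"A lonely man tries to ease his 'sorrow for the lost Lenore', by distracting his mind with old books of 'forgotten lore'."}
JSON

# Hypothetical helper: turn the serialized profile back into a Hash.
def profile_attributes(profile_json)
  JSON.parse(profile_json)
end

attrs = profile_attributes(PROFILE_JSON)
puts attrs["title"]
puts attrs["author"]
```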

Once you're done, exit the console by typing exit

Step 6: Change how your collection, bibliographic work, and bibliographic file set metadata are indexed into Solr

To make the Collection model index its title, you need to reopen app/models/collection.rb and change the class to look like this:

class Collection < ActiveFedora::Base
  include Hydra::Works::CollectionBehavior
  property :title, predicate: ::RDF::Vocab::DC.title, multiple: false do |index|
    index.as :stored_searchable
  end
end

To make the BibliographicWork model index the author, title, and abstract fields, you need to reopen app/models/bibliographic_work.rb and change the class to look like this:

class BibliographicWork < ActiveFedora::Base
  include Hydra::Works::WorkBehavior
  property :title, predicate: ::RDF::Vocab::DC.title, multiple: false do |index|
    index.as :stored_searchable
  end
  property :author, predicate: ::RDF::Vocab::DC.creator, multiple: false do |index|
    index.as :stored_searchable
  end
  property :abstract, predicate: ::RDF::Vocab::DC.abstract, multiple: false do |index|
    index.as :stored_searchable
  end
end

To make the BibliographicFileSet model index the title field, you need to reopen app/models/bibliographic_file_set.rb and change the class to look like this:

class BibliographicFileSet < ActiveFedora::Base
  include Hydra::Works::FileSetBehavior
  property :title, predicate: ::RDF::Vocab::DC.title, multiple: false do |index|
    index.as :stored_searchable
  end
end

Note: Because we have changed our Ruby code, we need to restart the Rails console so that it reloads all of the code, including our latest changes.

Now restart the Rails console and load the objects we previously created:

col = Collection.find("col-1")
=> #<Collection id: "col-1", title: "Works by Edgar Allan Poe", head_id: "col-1/members/40cee069-3ca6-4447-a531-a4f6555e8ca0", tail_id: "col-1/members/40cee069-3ca6-4447-a531-a4f6555e8ca0">

bw = BibliographicWork.find("work-1")
=> #<BibliographicWork id: "work-1", title: "The Raven", author: "Poe, Edgar Allan", abstract: "A lonely man tries to ease his 'sorrow for the lost Lenore', by distracting his mind with old books of 'forgotten lore'.", head_id: nil, tail_id: nil>

bf = BibliographicFileSet.find("fileset-1")
=> #<BibliographicFileSet id: "fileset-1", title: "The Raven pdf", head_id: nil, tail_id: nil>

Check that the collection's title appears in the Solr document as title_tesim.

col.to_solr
=> {"system_create_dtsi"=>"2016-06-13T13:53:45Z", "system_modified_dtsi"=>"2016-06-13T14:18:45Z", 
    "has_model_ssim"=>["Collection"], :id=>"col-1", 
    "object_profile_ssm"=>"{\"id\":\"col-1\",\"head\":[],\"tail\":[],\"title\":\"Works by Edgar Allan Poe\"}", 
    "title_tesim"=>["Works by Edgar Allan Poe"], "member_ids_ssim"=>["work-1", "col-2", "work-2"], 
    "object_ids_ssim"=>[], "collection_ids_ssim"=>[]}

Check that the work's title, author, and abstract appear in the Solr document as title_tesim, author_tesim, and abstract_tesim, respectively.

bw.to_solr
=> {"system_create_dtsi"=>"2016-06-13T14:11:50Z", "system_modified_dtsi"=>"2016-06-13T14:19:15Z", 
    "has_model_ssim"=>["BibliographicWork"], :id=>"work-1", 
    "object_profile_ssm"=>"{\"id\":\"work-1\",\"head\":[],\"tail\":[],\"title\":\"The Raven\",\"author\":\"Poe, Edgar Allan\",\"abstract\":\"A lonely man tries to ease his 'sorrow for the lost Lenore', by distracting his mind with old books of 'forgotten lore'.\"}", 
    "title_tesim"=>["The Raven"], "author_tesim"=>["Poe, Edgar Allan"], 
    "abstract_tesim"=>["A lonely man tries to ease his 'sorrow for the lost Lenore', by distracting his mind with old books of 'forgotten lore'."], 
    "member_ids_ssim"=>["fileset-1", "work-2", "fileset-2"], "object_ids_ssim"=>[]}

Check that the file set's title appears in the Solr document as title_tesim.

bf.to_solr
=> {"system_create_dtsi"=>"2016-06-13T14:16:40Z", "system_modified_dtsi"=>"2016-06-13T14:19:16Z", 
    "has_model_ssim"=>["BibliographicFileSet"], :id=>"fileset-1", 
    "object_profile_ssm"=>"{\"id\":\"fileset-1\",\"head\":[],\"tail\":[],\"title\":\"The Raven pdf\"}", 
    "title_tesim"=>["The Raven pdf"], "member_ids_ssim"=>[], "object_ids_ssim"=>[]}

Now when you call to_solr on a Collection, BibliographicWork, or BibliographicFileSet, it returns a Solr document with a title_tesim field (and, for a BibliographicWork, author_tesim and abstract_tesim fields) containing your title, author, and abstract values. These are the field names that we will add to Blacklight's queries in Lesson: Make Blacklight Return Search Results.

Step 7: Re-index objects in Solr

At this point, the to_solr method returns a Solr document with the newly indexed fields, but calling it does not update the Solr index. Now we'll call the update_index method, which republishes the Solr document with the changes we've made.

col.update_index
=> {"responseHeader"=>{"status"=>0, "QTime"=>13}}

bw.update_index
=> {"responseHeader"=>{"status"=>0, "QTime"=>10}}
 
bf.update_index
=> {"responseHeader"=>{"status"=>0, "QTime"=>10}} 
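
The difference between building a document and publishing it can be illustrated with a toy model. This is not ActiveFedora code: ToyRecord is a hypothetical class, and a plain Hash stands in for the Solr index. It only mirrors the behavior described above, where to_solr computes a document and update_index writes it to the index.

```ruby
# Toy stand-in for a model object; this is NOT ActiveFedora code.
class ToyRecord
  attr_reader :id
  attr_accessor :title

  def initialize(id, title, index)
    @id = id
    @title = title
    @index = index # a plain Hash playing the role of the Solr index
  end

  # Build the document from the current attributes; nothing is sent anywhere.
  def to_solr
    { "id" => id, "title_tesim" => [title] }
  end

  # Write the freshly built document into the index.
  def update_index
    @index[id] = to_solr
  end
end

index = {}
record = ToyRecord.new("col-1", "Works by Edgar Allan Poe", index)

record.to_solr            # builds a document, index is untouched
puts index.key?("col-1")  # the index has no entry yet

record.update_index       # now the document is stored in the index
p index["col-1"]["title_tesim"]
```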

If you refresh the document result from Solr (http://localhost:8983/solr/hydra-development/select?q=col-1) you should see that the title field has been added to the Solr document in the index:

<arr name="title_tesim">
    <str>Works by Edgar Allan Poe</str>
</arr>

If you refresh the document result from Solr (http://localhost:8983/solr/hydra-development/select?q=work-1) you should see that the fields declared with index.as have been added to the Solr document:

<arr name="title_tesim">
    <str>The Raven</str>
</arr>
<arr name="author_tesim">
    <str>Poe, Edgar Allan</str>
</arr>
<arr name="abstract_tesim">
    <str>A lonely man tries to ease his 'sorrow for the lost Lenore', by distracting his mind with old books of 'forgotten lore'.</str>
</arr>

If you refresh the document result from Solr (http://localhost:8983/solr/hydra-development/select?q=fileset-1) you should see that the title field has been added to the Solr document:

<arr name="title_tesim">
    <str>The Raven pdf</str>
</arr>

Aside: The strange suffixes on the field names are provided by solrizer. You can read about them in the solrizer documentation and the Hydra-Head documentation on the Solr Schema. In short, the _tesim suffix tells Solr to treat the values as text in the English language that should be stored, indexed, and allowed to be multivalued. This _tesim suffix is a useful catch-all that gets your searches working predictably with minimal fuss. As you encounter cases where you need to index your content in more nuanced ways, you can change these suffixes to achieve different results in Solr.
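
The naming convention can be sketched as a small decoder. This is an illustration only, not solrizer's API: describe_suffix is a hypothetical helper, the "te" code follows the description above, and the "s" (string) and "dt" (date) type codes are assumptions inferred from field names seen in this lesson, such as has_model_ssim and system_create_dtsi.

```ruby
# Type codes, checked in this order so that "te" wins over shorter codes.
# "s" and "dt" are assumptions based on field names in this lesson.
TYPE_CODES = {
  "te" => "text (English)",
  "dt" => "date",
  "s"  => "string"
}.freeze

# Flags that follow the type code.
FLAG_CODES = {
  "s" => "stored",
  "i" => "indexed",
  "m" => "multivalued"
}.freeze

# Hypothetical helper: explain the suffix of a Solr field name, or return
# nil when the name has no suffix this sketch understands.
def describe_suffix(field_name)
  suffix = field_name[/_([a-z]+)\z/, 1]
  return nil unless suffix

  type_code = TYPE_CODES.keys.find { |code| suffix.start_with?(code) }
  return nil unless type_code

  flags = suffix.delete_prefix(type_code).chars.map { |c| FLAG_CODES[c] }
  return nil if flags.include?(nil)

  { type: TYPE_CODES[type_code], flags: flags }
end

p describe_suffix("title_tesim")
p describe_suffix("system_create_dtsi")
p describe_suffix("id")
```

Consult the solrizer and Hydra-Head documentation mentioned above for the authoritative list of suffixes; this sketch covers only the ones that appear in this lesson.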

Why don't the collection, bibliographic work, and bibliographic file set show up in Blacklight?

Now your objects are indexed properly, but they won't show up in Blacklight's search results until you've turned off access controls and added the appropriate fields to Blacklight's queries.

Step 8: Commit your changes

Now that we've got our models working, it's a great time to commit to git:

git add .
git commit -m "Add solr indexing to collection, work, and file set models"

Next Step

Go on to Lesson: Make Blacklight Return Search Results or return to the Dive into Hydra-Works page.
