Lesson: Add attached files - samvera/hydra-works GitHub Wiki
Goals
- Attaching file sub-resources to models
- See where files are stored in Fedora objects and how to retrieve them
Explanation
So far, we've only added metadata to our objects. Now let's attach a file with some content. For example, for our BibliographicFileSet model this could be an image of the bibliographic resource's cover or a PDF of its content; for the PageFileSet model it could be an image or PDF of a single page.
In this case, we'll add a file that stores a PDF of a page.
Steps
Step 1: In the console, upload a content file to a PageFileSet
When we originally built the PageFileSet model, we added a property named text to hold the text of the page. If you have electronic versions of the pages, however, you will want to upload a file instead. The following shows how to attach a content file; later steps show how to create derivatives of it.
Because our PageFileSet model includes the behaviors of a file set, it is ready to have page content uploaded. Each file you want to upload goes into a separate file set. A file set holds one uploaded content file and any number of derivatives of that content, for example a thumbnail image and a full-text file. The following shows an example of uploading a content file.
require 'open-uri'
pf1 = PageFileSet.find('page-1')
=> #<PageFileSet id: "page-1", page_number: 1, text: "Once upon a midnight dreary...", head_id: nil, tail_id: nil>
file1 = URI.open("https://github.com/projecthydra-labs/hydra-works/wiki/raven_files/TheRaven_page1.pdf","r")
=> #<Tempfile:/var/folders/cm/zq5vgsj946n5hws81m85h5fr0000gn/T/open-uri20150922-869-2uceq0>
Hydra::Works::UploadFileToFileSet.call(pf1, file1)
=> #<PageFileSet id: "page-1", page_number: 1, text: "Once upon a midnight dreary...", head_id: nil, tail_id: nil>
pf1.save
=> true
pf1.files
=> [#<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/a64557f8-1c74-4cf0-9d55-3acaebf98bc7" >]
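To make the "one content file per file set" rule concrete, here is a minimal plain-Ruby sketch of the structure described above. The classes FileSetSketch and SketchFile are hypothetical illustrations, not Hydra::Works classes; in the real model the content file and its derivatives all show up in pf1.files.

```ruby
# Hypothetical plain-Ruby model of a file set: NOT the Hydra::Works API.
# It only illustrates the rule above: one uploaded content file per file set,
# plus any number of derivatives such as a thumbnail or extracted text.
SketchFile = Struct.new(:role, :name)

class FileSetSketch
  attr_reader :original, :derivatives

  def initialize
    @original = nil
    @derivatives = []
  end

  # Only one uploaded content file is allowed; a second upload belongs in a new file set.
  def attach_original(name)
    raise ArgumentError, "file set already holds a content file" if @original

    @original = SketchFile.new(:original, name)
  end

  # Derivatives are unrestricted in number.
  def add_derivative(role, name)
    @derivatives << SketchFile.new(role, name)
  end

  # All files in the set, content file first (compact drops a missing original).
  def files
    [@original, *@derivatives].compact
  end
end

page1 = FileSetSketch.new
page1.attach_original("TheRaven_page1.pdf")
page1.add_derivative(:thumbnail, "thumbnail.jpg")
page1.files.map(&:role) # => [:original, :thumbnail]
```

Trying to attach a second content file raises an error in this sketch, mirroring the advice to put each uploaded file in its own file set.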
NOTE: There are several ways to create a file object that the UploadFileToFileSet service accepts. See the documentation in the header of the service definition file for the complete list. At the time of writing, the accepted content objects are documented as:
# @param [IO,File,Rack::Multipart::UploadedFile, #read] object that will be the contents. If file responds to :mime_type or :original_name, those will be called to provide technical metadata.
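Since the documentation above says any object responding to #read is accepted, and that #mime_type and #original_name are used for technical metadata when present, one option is to wrap a plain IO so it supplies that metadata up front (which may avoid having to correct the MIME type afterwards, as in Step 3). This is a minimal sketch; the UploadableFile class is hypothetical, and if your version of the service calls other IO methods (for example #size or #path), delegate those as well.

```ruby
require 'stringio'

# Hypothetical wrapper (not part of Hydra::Works): pairs any readable IO with the
# technical metadata the UploadFileToFileSet note says it will use if present.
class UploadableFile
  attr_reader :mime_type, :original_name

  def initialize(io, mime_type:, original_name:)
    @io = io
    @mime_type = mime_type
    @original_name = original_name
  end

  # Delegate reads to the wrapped IO so the object satisfies the #read contract.
  def read(*args)
    @io.read(*args)
  end

  # Allow the content to be re-read from the beginning.
  def rewind
    @io.rewind
  end

  def close
    @io.close
  end
end

# Stand-in content so the sketch runs without a real PDF.
upload = UploadableFile.new(StringIO.new("%PDF-1.4 sample"),
                            mime_type: "application/pdf",
                            original_name: "TheRaven_page1.pdf")
upload.read(4) # => "%PDF"

# In the Rails console you might instead wrap a local file:
# file1 = UploadableFile.new(File.open("/path/to/a/local/file.pdf", "rb"),
#                            mime_type: "application/pdf",
#                            original_name: "file.pdf")
# Hydra::Works::UploadFileToFileSet.call(pf1, file1)
```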
If you want to upload a local file rather than one from a URL, you can issue the following commands:
pf1 = PageFileSet.find('page-1')
file1 = File.open("/path/to/a/local/file.pdf", "rb")
Hydra::Works::UploadFileToFileSet.call(pf1, file1)
pf1.save
pf1.files
Step 2: View the contents from Fedora
Copy the URL returned by pf1.files and paste it into your browser. You may need to enter your Fedora username and password (default: fedoraAdmin/fedoraAdmin).
NOTE: Some browsers will recognize that this is a PDF file and display it appropriately. Others may try to open it as text, in which case you will need to open it with a PDF viewer such as Adobe Reader.
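As an alternative to the browser, here is a minimal sketch using Ruby's standard Net::HTTP library that fetches a Fedora file URL with HTTP basic authentication and saves it locally. The method names fedora_get_request and download_fedora_file are hypothetical, and the sketch assumes your Fedora instance accepts basic authentication with the default credentials mentioned above.

```ruby
require 'net/http'
require 'uri'

# Builds an authenticated GET request for a Fedora resource URL.
# fedoraAdmin/fedoraAdmin are only the default credentials; pass your own if changed.
def fedora_get_request(url, user: "fedoraAdmin", password: "fedoraAdmin")
  request = Net::HTTP::Get.new(URI(url))
  request.basic_auth(user, password)
  request
end

# Sends the request, writes the body to a local file in binary mode,
# and returns the Content-Type Fedora reports for the file.
def download_fedora_file(url, destination, **credentials)
  uri = URI(url)
  request = fedora_get_request(url, **credentials)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    response = http.request(request)
    raise "GET #{url} failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    File.binwrite(destination, response.body)
    response["Content-Type"]
  end
end

# Example (requires a running Fedora; paste the URL printed by pf1.files):
# download_fedora_file("http://127.0.0.1:8983/fedora/rest/dev/page-1/files/...", "page1.pdf")
```

Returning the Content-Type is a convenient way to check which MIME type Fedora has recorded for the file.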
Step 3: Fix the MIME type set by GitHub
If you used open-uri to open the file directly from GitHub, the file's MIME type is incorrectly set to "application/octet-stream". We will change it to "application/pdf" before continuing with derivatives.
f1 = pf1.files.first
=> #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/2520d9d5-4631-4f20-9fce-f40eb1bd1095" >
f1.mime_type
=> "application/octet-stream"
f1.mime_type = 'application/pdf'
=> "application/pdf"
pf1.files.first.mime_type
=> "application/pdf"
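Rather than hard-coding the MIME type, you could infer it from the file's leading bytes: PDF files begin with the signature "%PDF-". This is a minimal sketch; the helper name sniff_mime_type is hypothetical, and it recognizes only PDF, falling back to the generic "application/octet-stream" for everything else.

```ruby
require 'stringio'

# Hypothetical helper: infer the MIME type from a file's first bytes rather than
# trusting the Content-Type reported by the web server. Only PDF is recognized;
# anything else falls back to "application/octet-stream".
PDF_SIGNATURE = "%PDF-".b

def sniff_mime_type(io)
  io.rewind
  header = io.read(PDF_SIGNATURE.bytesize).to_s.b # read returns nil on empty input
  io.rewind # leave the IO ready to be read again from the start
  header == PDF_SIGNATURE ? "application/pdf" : "application/octet-stream"
end

sniff_mime_type(StringIO.new("%PDF-1.4 ...")) # => "application/pdf"
sniff_mime_type(StringIO.new("<html>"))       # => "application/octet-stream"

# In the console (assumes file1 from Step 1 is still open):
# f1.mime_type = sniff_mime_type(file1)
```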
Step 4: Generate standard derivatives
NOTE: This step could not be made to work during the 1-25-2017 (Hydra 11) update of this tutorial.
There are dependencies that must be installed before you can generate a thumbnail. See hydra-derivatives for the dependency list and other useful information about working with the hydra-derivatives gem.
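Before running create_derivatives, it can help to confirm that the required command-line tools are on your PATH. This is a minimal sketch with a Unix-style PATH lookup; the method names are hypothetical, and the tool names (ImageMagick's convert and Ghostscript's gs) are assumptions, so take the authoritative list from the hydra-derivatives documentation.

```ruby
# Hypothetical pre-flight check (Unix-style PATH lookup, no Windows extensions):
# report which command-line tools cannot be found before running create_derivatives.
ASSUMED_DERIVATIVE_TOOLS = %w[convert gs].freeze # assumed: ImageMagick, Ghostscript

# Returns the full path of the first executable named `name` on PATH, or nil.
def find_executable_on_path(name)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Returns the subset of tool names that could not be found.
def missing_tools(names = ASSUMED_DERIVATIVE_TOOLS)
  names.reject { |name| find_executable_on_path(name) }
end

missing = missing_tools
puts(missing.empty? ? "All assumed tools found" : "Missing: #{missing.join(', ')}")
```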
Once the dependencies have been installed, type the following in the Rails console to generate a thumbnail.
pf1.create_derivatives
=> [{:label=>:thumbnail, :format=>"jpg", :size=>"338x493", :object=>#<PageFileSet id: "page-1", head: [], tail: [], page_number: 1, text: "Once upon a midnight dreary...">}]
pf1.files
=> [#<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/50a56242-6ab0-4234-bfcb-b6321dfeec6f" >,
#<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/d0fad131-50d3-4136-b670-9880c3a0e0f2" >]
pf1.thumbnail
=> #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/d0fad131-50d3-4136-b670-9880c3a0e0f2" >
NOTE: At the time of writing, create_derivatives only creates a thumbnail for PDF files. To see what create_derivatives generates for other file types, see the #create_derivatives method in lib/hydra/works/models/concerns/file_set/derivatives.rb.
Step 5: View the thumbnail from Fedora
Copy the URL returned by pf1.thumbnail and paste it into your browser. You may need to enter your Fedora username and password (default: fedoraAdmin/fedoraAdmin).
Step 6: Generate full text derivative
Warning: The full-text derivative service appears to have been removed, so the commands below may no longer work. TODO: Look into whether there is an alternate way to generate full text.
To generate the full-text derivative, type the following in the Rails console.
extracted_text = Hydra::Works::FullTextExtractionService.run(pf1)
=> # all the text for page 1
pf1.build_extracted_text
=> #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/7d761246-e65b-48d1-b6c3-0e9537cdf5f2" >
pf1.extracted_text.content = extracted_text
=> # all the text for page 1
pf1.save
=> true
pf1.extracted_text
=> #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/7d761246-e65b-48d1-b6c3-0e9537cdf5f2" >
NOTE: The process for generating derivatives is under review and will likely change such that all derivatives are generated through the hydra-derivatives gem.
Step 7: View the extracted text from Fedora
Copy the URL returned by pf1.extracted_text and paste it into your browser. You may need to enter your Fedora username and password (default: fedoraAdmin/fedoraAdmin).
Step 8: Add files for other pages
If you like, this is a good time to use the same process to add the remaining pages of The Raven to the other page file sets.
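If the remaining pages follow the same naming pattern as page 1 ('page-N' file set ids and TheRaven_pageN.pdf files), the repetition can be driven by a small list of pairs. This is a minimal sketch: the page_uploads helper is hypothetical, the naming pattern and the page range 2..3 are assumptions (the page count is not given here), so check that each URL exists before relying on it.

```ruby
# Hypothetical sketch: pair each page file set id with a source PDF URL,
# ASSUMING the remaining pages follow the same naming pattern as page 1.
BASE_URL = "https://github.com/projecthydra-labs/hydra-works/wiki/raven_files"

def page_uploads(page_numbers)
  page_numbers.map do |n|
    ["page-#{n}", "#{BASE_URL}/TheRaven_page#{n}.pdf"]
  end
end

pairs = page_uploads(2..3) # assumed range; adjust to the pages you created

# In the Rails console (with require 'open-uri' as in Step 1), the pairs
# could drive the same calls used for page 1:
# pairs.each do |id, url|
#   page = PageFileSet.find(id)
#   Hydra::Works::UploadFileToFileSet.call(page, URI.open(url))
#   page.save
# end
```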
Next Step
Proceed to BONUS Lesson: Generate Rails Scaffolding for Creating and Editing, or explore the other Dive into Hydra-Works tutorial bonus lessons.