Derivatives - OregonDigital/oregondigital GitHub Wiki
Force Generate Derivatives
From console
Running derivatives directly in the console can surface some errors that don't register in Resque's Failed listing. Note that long-running derivative jobs can cause the shell session to time out.
item = ActiveFedora::Base.find(pid)
item.create_derivatives
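When several items need regenerating, the console call above can be wrapped so that one failure does not stop the rest. This is only a minimal sketch: `force_derivatives` and its `finder:` parameter are hypothetical names, not part of the application. The finder is passed in so the helper itself does not depend on ActiveFedora being loaded.

```ruby
# Hypothetical console helper (not part of oregondigital): run derivatives
# inline for each pid and collect errors instead of stopping at the first one.
# `finder` is any callable returning an object that responds to
# #create_derivatives, e.g. ActiveFedora::Base.method(:find).
def force_derivatives(pids, finder:)
  pids.each_with_object({}) do |pid, errors|
    begin
      finder.call(pid).create_derivatives
    rescue StandardError => e
      # Keep the class and message so the failure can be inspected later.
      errors[pid] = "#{e.class}: #{e.message}"
    end
  end
end
```

In the Rails console this could be called as `force_derivatives([pid], finder: ActiveFedora::Base.method(:find))`. It returns a hash of pid to error message, which is empty if every item succeeded.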
Using Resque and workers
From Rails console:
Resque.enqueue(CreateDerivativesJob, pid)
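To queue many items at once, the same call can be looped. The following is a sketch under assumptions: `enqueue_derivatives` is a hypothetical helper, and the job class and queue are passed in as arguments so the snippet does not require Resque to be loaded.

```ruby
# Hypothetical helper: enqueue one derivatives job per distinct pid.
# `queue` is anything responding to #enqueue(job_class, pid); in the Rails
# console that would be Resque, with CreateDerivativesJob as `job`.
# Returns the list of pids that were enqueued.
def enqueue_derivatives(pids, job:, queue:)
  pids.uniq.each { |pid| queue.enqueue(job, pid) }
end
```

From the Rails console this would look like `enqueue_derivatives(pids, job: CreateDerivativesJob, queue: Resque)`.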
TIFs Process
- TIF file extracted from 'content' datastream in Fedora, saved to tmp/
- vips is used to create a pyramidal TIF file, which is saved to media/pyramidal-tiffs
PDFs Process
- PDF file extracted from 'content' datastream in Fedora, saved to tmp/
- Each page is extracted (using docsplit with graphicsmagick and ghostscript) and resized, saved to media/document_pages
- Thumbnail is extracted from page 1 jpg file, saved to /media/thumbnails
- Each page is saved as external datastream reference on Fedora object
- Each page's height/width is scanned and saved for leafMetadata datastream on Fedora object
- hOCR is extracted from the PDF by pdftotext and saved as external datastream on Fedora object
pdftotext -enc UTF-8 '/usr/local/railsuser/tmp/workers/oregondigital:fx71d3855-content.020201214-55795-8mhtc5.pdf' 'document_pages/5/5/oregondigital-fx71d3855/ocr.html' -bbox
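When rerunning this step by hand from Ruby, passing the arguments as an array avoids any shell quoting of the paths. A minimal sketch: `pdftotext_command` is a hypothetical helper, and the flags are only those shown in the command above.

```ruby
# Hypothetical helper: build the pdftotext argv shown above.
# Passing an array to Kernel#system bypasses the shell, so paths containing
# spaces or quotes need no escaping.
def pdftotext_command(pdf_path, html_path)
  ["pdftotext", "-enc", "UTF-8", pdf_path, html_path, "-bbox"]
end

# Example (requires pdftotext on PATH; the paths are placeholders):
#   ok = system(*pdftotext_command("tmp/input.pdf", "tmp/ocr.html"))
#   warn "pdftotext failed" unless ok
```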
Videos Process
- video extracted from 'content' datastream in Fedora, saved to tmp/
- ffmpeg is run on video, to create 320x240 mp4
ffmpeg -i "/usr/local/railsuser/tmp/workers/oregondigital:rn301143j-content.020201207-26311-1unkjsh.mp4" -s 320x240 -vcodec libx264 -acodec libfdk_aac -g 30 -b:v 345k -ac 2 -ab 96k -ar 44100 -v 0 -nostats -y /oregondigital/media/video/j/3/oregondigital-rn301143j.mp4
- thumbnail is extracted from mp4 file, saved to /media/thumbnails
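The ffmpeg command above runs with `-v 0 -nostats`, which silences nearly all of ffmpeg's logging. When a video derivative fails, rerunning the command by hand without those two flags may show the underlying error. A sketch, assuming a hypothetical helper named `ffmpeg_command`; every other flag is copied from the command above.

```ruby
# Hypothetical helper: build the ffmpeg argv from the command above.
# With quiet: false the "-v 0 -nostats" flags are left out so ffmpeg's own
# messages are visible when the command is run by hand.
def ffmpeg_command(input, output, quiet: true)
  cmd = ["ffmpeg", "-i", input,
         "-s", "320x240", "-vcodec", "libx264", "-acodec", "libfdk_aac",
         "-g", "30", "-b:v", "345k", "-ac", "2", "-ab", "96k", "-ar", "44100"]
  cmd += ["-v", "0", "-nostats"] if quiet
  cmd + ["-y", output]
end

# Example (requires ffmpeg built with libfdk_aac; paths are placeholders):
#   system(*ffmpeg_command("tmp/input.mp4", "tmp/output.mp4", quiet: false))
```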
Troubleshooting
- If the item show page displays the video player but no thumbnail loads or no video plays, confirm that the derivative video was created successfully. There is no check that the file exists before trying to load the player.
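Since the player does not check for the file, one way to confirm the derivative is to look for it on disk. The sketch below infers the path layout from the example output path on this page (j/3/oregondigital-rn301143j.mp4): the pid with `:` replaced by `-`, bucketed under its last and second-to-last characters. That layout, and the names `derivative_relative_path` and `derivative_present?`, are assumptions rather than the application's documented behavior.

```ruby
# Assumption: layout inferred from the example path on this page, e.g.
#   oregondigital:rn301143j -> j/3/oregondigital-rn301143j.mp4
def derivative_relative_path(pid, ext)
  slug = pid.tr(":", "-")
  File.join(slug[-1], slug[-2], "#{slug}.#{ext}")
end

# True when the expected derivative file exists and is not empty.
def derivative_present?(media_dir, pid, ext)
  path = File.join(media_dir, derivative_relative_path(pid, ext))
  File.file?(path) && File.size(path) > 0
end
```

For the video example above, the check would be `derivative_present?("/oregondigital/media/video", "oregondigital:rn301143j", "mp4")`.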
Audio Process
- audio extracted from 'content' datastream in Fedora, saved to tmp/