Derivatives - OregonDigital/oregondigital GitHub Wiki

Force Generate Derivatives

From console

Running derivatives directly from the console can surface errors that never appear in Resque's Failed list. Be aware that a long-running derivatives job can outlast the shell session and cause it to time out.

item = ActiveFedora::Base.find(pid)
item.create_derivatives
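Since the point of the console route is to see errors, it can help to wrap the call so the exception class, message, and first few backtrace lines are printed together. The following is only a sketch: `try_derivatives` is a hypothetical helper name, not part of the application, and it assumes nothing about `item` beyond it responding to `create_derivatives`.

```ruby
# Hypothetical console helper (not part of oregondigital).
# `item` is anything that responds to create_derivatives, for example the
# object returned by ActiveFedora::Base.find(pid).
def try_derivatives(item)
  item.create_derivatives
  :ok
rescue StandardError => e
  # Print a compact report to stderr and hand the exception back for inspection.
  warn "#{e.class}: #{e.message}"
  warn Array(e.backtrace).first(10).join("\n")
  e
end
```

In the console this would be called as `try_derivatives(ActiveFedora::Base.find(pid))`; it returns `:ok` on success or the rescued exception object otherwise.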

Using Resque and workers

From Rails console:

Resque.enqueue(CreateDerivativesJob, pid)
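When several items need regenerating, the same call can be looped over a list of pids. A minimal sketch, assuming a hypothetical helper named `enqueue_derivatives`; the queue is passed in as an argument so the helper only relies on an `enqueue(job_class, pid)` method like the one shown above.

```ruby
# Hypothetical helper (not part of oregondigital): enqueue one job per pid.
# `queue` is anything that responds to enqueue(job_class, pid); in the Rails
# console that would be Resque itself.
def enqueue_derivatives(pids, job_class, queue)
  # Normalize input: strip whitespace, drop blanks, skip duplicates.
  cleaned = pids.map { |pid| pid.to_s.strip }.reject(&:empty?).uniq
  cleaned.each { |pid| queue.enqueue(job_class, pid) }
end
```

From the Rails console this would look like `enqueue_derivatives(pids, CreateDerivativesJob, Resque)`, where `pids` is an array of pid strings. The return value is the cleaned list of pids that were enqueued.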

TIFs Process

  1. TIF file extracted from 'content' datastream in Fedora, saved to tmp/
  2. vips is used to create a pyramidal TIF file, which is saved to media/pyramidal-tiffs

PDFs Process

  1. PDF file extracted from 'content' datastream in Fedora, saved to tmp/
  2. Each page is extracted (using docsplit with GraphicsMagick and Ghostscript), resized, and saved to media/document_pages
  3. Thumbnail is extracted from page 1 jpg file, saved to /media/thumbnails
  4. Each page is saved as external datastream reference on Fedora object
  5. Each page's height/width is scanned and saved for leafMetadata datastream on Fedora object
  6. hOCR is extracted from the PDF by pdftotext and saved as external datastream on Fedora object
pdftotext -enc UTF-8 '/usr/local/railsuser/tmp/workers/oregondigital:fx71d3855-content.020201214-55795-8mhtc5.pdf' 'document_pages/5/5/oregondigital-fx71d3855/ocr.html' -bbox
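To reproduce step 6 by hand when debugging, the same command can be run from Ruby with its output captured. This is a sketch rather than the application's own code: `pdftotext_args` and `run_pdftotext` are hypothetical names, and the flags are copied from the example command above. Passing the arguments as an array avoids shell quoting problems with the colon and other characters in the temporary file names.

```ruby
require "open3"

# Hypothetical helpers (not part of oregondigital).
# Build the argument list, mirroring the example command above.
def pdftotext_args(pdf_path, html_path)
  ["pdftotext", "-enc", "UTF-8", pdf_path, html_path, "-bbox"]
end

# Run pdftotext without a shell and return [combined stdout/stderr, success?].
# Raises Errno::ENOENT if pdftotext is not installed or not on the PATH.
def run_pdftotext(pdf_path, html_path)
  output, status = Open3.capture2e(*pdftotext_args(pdf_path, html_path))
  [output, status.success?]
end
```

If the second element of the result is `false`, the first element holds whatever pdftotext printed, which is often the quickest way to see why the text layer is missing.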

Videos Process

  1. Video is extracted from 'content' datastream in Fedora, saved to tmp/
  2. ffmpeg is run on the video to create a 320x240 mp4
ffmpeg -i "/usr/local/railsuser/tmp/workers/oregondigital:rn301143j-content.020201207-26311-1unkjsh.mp4" -s 320x240 -vcodec libx264 -acodec libfdk_aac -g 30 -b:v 345k -ac 2 -ab 96k -ar 44100 -v 0 -nostats -y /oregondigital/media/video/j/3/oregondigital-rn301143j.mp4
  3. Thumbnail is extracted from the mp4 file, saved to /media/thumbnails

Troubleshooting

  • If the item show page displays the video player but no thumbnail loads or no video plays, confirm that the derivative video was created successfully. The application does not check that the file exists before trying to load the player.
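A quick way to confirm the derivative from the console is to compute the expected path and check that a non-empty file is there. The sketch below makes two assumptions that should be verified against the deployment: the base directory is taken from the ffmpeg example above, and the two bucket directories (`j/3` for `oregondigital:rn301143j`) are inferred from that single example to be the last two characters of the id in reverse order. Both function names are hypothetical.

```ruby
# Hypothetical helpers (not part of oregondigital).
# ASSUMPTION: bucket directories are the last two characters of the id,
# reversed, as inferred from the example ffmpeg output path.
def video_derivative_path(pid, base = "/oregondigital/media/video")
  name = pid.tr(":", "-")                # oregondigital:rn301143j -> oregondigital-rn301143j
  buckets = name[-2, 2].reverse.chars    # "3j" -> ["j", "3"]
  File.join(base, *buckets, "#{name}.mp4")
end

# True only if the path is a regular file with at least one byte in it.
def derivative_present?(path)
  File.file?(path) && File.size(path) > 0
end
```

For example, `derivative_present?(video_derivative_path(pid))` returning `false` suggests the ffmpeg step failed or wrote an empty file, and the derivatives should be regenerated.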

Audio Process

  1. Audio is extracted from 'content' datastream in Fedora, saved to tmp/