Bulk Import from CSV - OregonDigital/ControlledVocabularyManager GitHub Wiki

Bulk Import skips the Review queue, on the assumption that the new terms have already been vetted and reviewed. Only users with the Admin role should run it.

Sample CSV (gwilliams-ons_01-40.csv):

vocabulary:uri,id:id_hash,label,type,alternate_name,comment
http://opaquenamespace.org/ns/creator,BaileyLD,"Bailey, L. D.",http://www.w3.org/2004/02/skos/core#PersonalName,,Photographer in US Forest Service collections; found in OSU Gerald W. Williams Photographs Collection (P 329).
http://opaquenamespace.org/ns/creator,BakerAlbertB,"Baker, Albert B.",http://www.w3.org/2004/02/skos/core#PersonalName,,"US Forest Service District Ranger, Umatilla National Forest; photographer found in OSU Gerald W. Williams Photographs Collection (P 329)."

Notes and Tips

You can set id:id_hash to the exact value you want, as in the Sample CSV. If you keep the column but leave the value blank, the system will generate a new 9-character alphanumeric value.

More fields are supported, such as date and same_as, but they are not in the sample. The column names need to match the defined property names exactly. Check the bottom of the Rake task file for what is supported: https://github.com/OregonDigital/ControlledVocabularyManager/blob/master/lib/tasks/csv_to_jsonld.rake
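Because the names have to match exactly, a quick check of the header row before converting can catch a stray space or a misspelled column. The following is a minimal sketch, not part of the project: the function name `unknown_columns` is hypothetical, and `KNOWN_COLUMNS` contains only the names mentioned on this page, so it is an assumption rather than the authoritative list (the Rake task file is).

```python
import csv
import io

# Column names that appear on this wiki page only. This is an ASSUMPTION,
# not the authoritative list; the Rake task file defines what is supported.
KNOWN_COLUMNS = {
    "vocabulary:uri", "id:id_hash", "label", "type",
    "alternate_name", "comment", "date", "same_as",
}


def unknown_columns(csv_text, known=KNOWN_COLUMNS):
    """Return the header names that are not an exact match for a known name."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in known]


sample = "vocabulary:uri,id:id_hash,label,type,alternate_name,comment\n"
print(unknown_columns(sample))             # []
print(unknown_columns("label,Same As\n"))  # ['Same As']
```

Pass your own set as `known` once you have confirmed the supported names in the Rake task file.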

If there are more than 50 new entities, it's best to split the CSV into chunks of about 50 (keeping the header row in each) and run them separately. There is no maximum size for a single file, but chunks of this size fit the server's resources well. Also, if a bulk ingest fails partway through, some entries may already have been loaded successfully. You'll have to fix the error, then remove the successful entries from the CSV and generate the JSON-LD again.
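Splitting by hand gets tedious for large files, so here is a rough sketch of one way to do it. It is not part of the project: the function name `split_csv`, the `_partNN` file naming, and the `skip_ids` parameter are all hypothetical. It writes chunks of 50 data rows, repeats the header row in each chunk, and can drop rows whose id:id_hash was already ingested by an earlier, partially successful run.

```python
import csv
from pathlib import Path


def split_csv(src, rows_per_chunk=50, skip_ids=()):
    """Write src's data rows to numbered chunk files, each with the header row.

    skip_ids holds id:id_hash values that were already ingested and should
    be left out. Returns the list of chunk paths that were written.
    """
    src = Path(src)
    with src.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        id_col = header.index("id:id_hash")
        # Skip blank lines and rows whose ID is in skip_ids.
        rows = [r for r in reader if r and r[id_col] not in skip_ids]

    paths = []
    for start in range(0, len(rows), rows_per_chunk):
        number = start // rows_per_chunk + 1
        part = src.with_name(f"{src.stem}_part{number:02d}.csv")
        with part.open("w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out, lineterminator="\n")
            writer.writerow(header)  # every chunk keeps the header row
            writer.writerows(rows[start:start + rows_per_chunk])
        paths.append(part)
    return paths
```

For example, `split_csv("/home/[username]/Downloads/gwilliams-ons_01-40.csv", skip_ids={"BaileyLD"})` would write the chunk files next to the original CSV, leaving out the BaileyLD row. Rows with a blank id:id_hash are always kept, since the system assigns their IDs on load.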

Steps

  1. Open Terminal window

  2. If not already done, follow the "Local Development Setup" steps (except the last one, for 'sunspot', since Solr isn't needed for this)

  3. Start Blazegraph: bundle exec rake triplestore_adapter:blazegraph:start

  4. Run Rake task to convert CSV to JSON-LD. Input and output paths can be anywhere, but need to be full paths: bundle exec rake transform:csv_to_jsonld[/home/[username]/Downloads/gwilliams-ons_01-40.csv,/home/[username]/Downloads/gwilliams-ons_01-40.jsonld]

  5. Open new JSON-LD file in text editor

  6. Visit http://opaquenamespace.org/load_rdf in browser (log in first if not already)

  7. Paste entire contents of JSON-LD file into text box

  8. Click Load. Processing may take 60 seconds or more.

  • If successful, it will redirect to the first new term.
  • If not successful, an error message should be shown. If the ID already exists, and you want to skip it and deal with it later, you can delete the entry from the JSON-LD file and try again.

At this point, the new terms have not yet been added to the search index. Request that a reindex be run.
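When converting several chunk files, step 4 can be scripted. The sketch below only builds the command from step 4 and checks that both paths are full paths, as that step requires. The helper name `csv_to_jsonld_command` is hypothetical and not part of the project; the comma check is there because Rake separates task arguments with commas.

```python
from pathlib import PurePosixPath


def csv_to_jsonld_command(csv_path, jsonld_path):
    """Build the argument list for the step 4 conversion task."""
    for p in (csv_path, jsonld_path):
        if not PurePosixPath(p).is_absolute():
            raise ValueError(f"not a full path: {p}")
        if "," in p:
            # Rake splits task arguments on commas, so a comma in a path
            # would be read as an extra argument.
            raise ValueError(f"path contains a comma: {p}")
    return ["bundle", "exec", "rake",
            f"transform:csv_to_jsonld[{csv_path},{jsonld_path}]"]
```

The returned list can be passed to `subprocess.run(cmd, check=True, cwd=...)` from the application directory. Passing a list rather than a single string keeps the shell from interpreting the square brackets.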