Importing Data in OpenRefine - smith-special-collections/sc-documentation GitHub Wiki

Importing Data

Common Steps for Word Documents

Taking the following steps in Word before importing will save a lot of trouble down the line.

Optionally, to preserve italics during transformation, wrap the italicized text in the appropriate tags before importing. Note that this process can introduce problems in complex inventories, so use it with caution.

  1. Click on "Replace"
  2. Click "More" on the bottom left
  3. With the cursor in the first box ("Find what"), click on "Format" on the bottom left and select "Font"
  4. Select "Italic" in the "Font style" menu
  5. Click OK
  6. Add <title render="italic">^&</title> to the second box ("Replace with"); ^& stands for the text that was found
  7. Click "Replace All"
  8. With the cursor in the first box, click the "No Formatting" button on the bottom
  9. Change the first box to <title render="italic"></title> to find empty tag pairs
  10. Delete the contents of the second box
  11. Click "Replace All"
  12. Change the first box to <title render="italic"> </title> to find tags that wrap only a space
  13. Press spacebar in the second box
  14. Click "Replace All"
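
Steps 9-14 can also be checked or repeated outside Word once the text has been exported. The following is a minimal Python sketch, not part of the documented workflow: the function name `clean_italic_tags` and the sample string are made up for illustration, and it assumes the tags were inserted exactly as written in step 6.

```python
OPEN_TAG = '<title render="italic">'
CLOSE_TAG = "</title>"


def clean_italic_tags(text: str) -> str:
    """Mirror steps 9-14 on a plain string (hypothetical helper)."""
    # Steps 9-11: remove tag pairs that have nothing between them.
    text = text.replace(OPEN_TAG + CLOSE_TAG, "")
    # Steps 12-14: a tag pair wrapping a single space becomes a plain space.
    text = text.replace(OPEN_TAG + " " + CLOSE_TAG, " ")
    return text


if __name__ == "__main__":
    # Made-up sample line containing one real title and two leftover tag pairs.
    sample = (
        'Review of <title render="italic">Beloved</title>'
        '<title render="italic"> </title>'
        '<title render="italic"></title>1987'
    )
    print(clean_italic_tags(sample))
```

Like the Word steps, this only handles completely empty tags and tags around a single space; tags wrapping several spaces or a tab would need additional passes.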

Word Inventories with Tables

Some Word inventories have data already in tables. This makes working with the data much easier.

  1. Launch OpenRefine
  2. Select Clipboard
  3. Copy the entire table in Word and paste it into OpenRefine
  4. Click Next
  5. Give the project an appropriate name on the top right
  6. Select Create Project

Note: Depending on how the data was entered, it may be useful to convert the inventory to Excel for preliminary cleanup before bringing it into OpenRefine.

Word Inventories without Tables

SSC accession inventories were created in Word without tables. These require some preparation before we can import them into OpenRefine.

First, let's get rid of the text that repeats on each page in most accession inventories.

  1. Click on "Replace"
  2. Copy and paste the line with the collection name into the first box, and leave the second box empty
  3. Click "Replace All"
  4. Repeat steps 2-3 for each of the other repeated lines of text that appear before the start of the inventory
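
For inventories that have already been saved as plain text, the same cleanup can be sketched in Python. This is only an illustration under stated assumptions: `strip_repeated_lines` is a hypothetical helper, and the collection name and accession number in the sample are invented. Unlike Word's "Replace All", which leaves an empty paragraph behind, this version drops each matching line entirely.

```python
from typing import Iterable


def strip_repeated_lines(text: str, repeated: Iterable[str]) -> str:
    """Drop every line that exactly matches one of the repeated header lines."""
    # Ignore surrounding whitespace when comparing; skip empty entries so
    # that blank lines in the inventory are never removed by accident.
    unwanted = {line.strip() for line in repeated if line.strip()}
    kept = [line for line in text.splitlines() if line.strip() not in unwanted]
    return "\n".join(kept)


if __name__ == "__main__":
    # Invented sample: a two-line page header repeated before each page's content.
    inventory = (
        "Jane Doe Papers\n"
        "Accession 2010-S-01\n"
        "Box 1\n"
        "Letters\n"
        "Jane Doe Papers\n"
        "Accession 2010-S-01\n"
        "Box 2"
    )
    headers = ["Jane Doe Papers", "Accession 2010-S-01"]
    print(strip_repeated_lines(inventory, headers))
```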

Word files contain a lot of extra encoding, so save the document as plain text:

  1. Click on File > Save As
  2. Select desired location
  3. Select "Plain Text" under Save as type
  4. Click Save
  5. Click OK

Now we can load it into OpenRefine using the following steps:

  1. Launch OpenRefine
  2. Select Browse...
  3. Select text file
  4. Click Next
  5. Under "Parse data as" click on "CSV / TSV / separator-based files"
  6. Uncheck the boxes labeled "Parse next," "Use character," and "Store blank rows."
  7. Select Create Project on the top right
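
To get a rough idea of what these settings produce before creating the project, the import can be approximated in Python. This is a sketch under several assumptions, not a description of OpenRefine's actual parser: `preview_rows` is a hypothetical helper, the separator is assumed to be a tab, whitespace-only lines are treated as blank, and the sample text is invented.

```python
from typing import List


def preview_rows(text: str, separator: str = "\t") -> List[List[str]]:
    """Approximate the rows produced by the import settings in step 6."""
    rows = []
    for line in text.splitlines():
        # "Store blank rows" unchecked: blank lines are skipped.
        if not line.strip():
            continue
        # "Use character" unchecked: quotation marks get no special handling,
        # so the line is split on the separator and nothing else.
        rows.append(line.split(separator))
    # "Parse next" unchecked: the first line stays a data row, not a header.
    return rows


if __name__ == "__main__":
    # Invented sample with a blank line and one tab-separated line.
    sample = 'Box 1\n\nFolder 1\t"Letters", 1920-1925\nFolder 2'
    for row in preview_rows(sample):
        print(row)
```

Lines that contain the separator end up in more than one column, which is worth checking for in the plain text file before importing.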