OpenRefine Tips - DDMAL/linkedmusic-datalake GitHub Wiki
OpenRefine Reconciliation Fundamentals
- Refer to the OpenRefine Manual's Reconciling page for a full description of OpenRefine's reconciliation functions.
- Useful YouTube link: YouTube Tutorial
- We currently use only the Wikidata reconciliation service for all works.
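For orientation, here is a minimal sketch of what a reconciliation request roughly looks like under the Reconciliation Service API convention: a JSON batch of queries keyed `q0`, `q1`, and so on, sent as a `queries` form parameter. It only builds the request and does not send it; OpenRefine does all of this for you. The endpoint URL, the helper names, and the example type `Q5` (Wikidata's "human" class) are assumptions for illustration.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Wikidata reconciliation service; compare it with the
# service URL listed in your own OpenRefine "Start reconciling..." dialog.
WIKIDATA_RECON_ENDPOINT = "https://wikidata.reconci.link/en/api"


def build_recon_queries(values, type_qid=None, limit=3):
    """Build a batch of queries keyed q0, q1, ... (Reconciliation API style)."""
    queries = {}
    for i, value in enumerate(values):
        query = {"query": value, "limit": limit}
        if type_qid is not None:
            # Restrict candidates to one Wikidata class, e.g. Q5 ("human").
            query["type"] = type_qid
        queries[f"q{i}"] = query
    return queries


def encode_recon_request(queries):
    """Form-encode the batch as the `queries` parameter of a POST body."""
    return urlencode({"queries": json.dumps(queries)})
```

For example, `encode_recon_request(build_recon_queries(["Hildegard von Bingen"], "Q5"))` produces a body that could be POSTed with `urllib.request.urlopen(WIKIDATA_RECON_ENDPOINT, data=body.encode())`, assuming the endpoint above is correct.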
Augment OpenRefine's Capacity
- Refer to the OpenRefine Manual's Increasing memory allocation section.
- You must increase the memory allocation before working with large files.
- The recommended size is 8 GB (8192 MB); work on a machine with enough RAM to give OpenRefine that much.
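On Linux and macOS the memory setting lives in OpenRefine's `refine.ini` file as a `REFINE_MEMORY=...` line (Windows uses a different file; see the manual section). A minimal sketch of setting it to the recommended 8192 MB, assuming that line format; the helper name is hypothetical, and you should back up `refine.ini` and restart OpenRefine after editing it.

```python
import re

# Assumed line format in refine.ini, e.g. "REFINE_MEMORY=1400M".
ACTIVE = re.compile(r"^REFINE_MEMORY=.*$", re.MULTILINE)
COMMENTED = re.compile(r"^#\s*REFINE_MEMORY=.*$", re.MULTILINE)


def set_refine_memory(ini_text, megabytes=8192):
    """Return ini_text with REFINE_MEMORY set to `megabytes` MB.

    Replaces any active REFINE_MEMORY line; otherwise re-enables the first
    commented-out one; otherwise appends a new line. REFINE_MIN_MEMORY is
    left untouched because it does not match either pattern.
    """
    new_line = f"REFINE_MEMORY={megabytes}M"
    if ACTIVE.search(ini_text):
        return ACTIVE.sub(lambda _m: new_line, ini_text)
    if COMMENTED.search(ini_text):
        return COMMENTED.sub(lambda _m: new_line, ini_text, count=1)
    # No existing setting: append one, keeping the file newline-terminated.
    sep = "\n" if ini_text and not ini_text.endswith("\n") else ""
    return ini_text + sep + new_line + "\n"
```

Usage: read `refine.ini`, pass its text through `set_refine_memory`, and write the result back.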
Extracting and Applying Steps
Extracting
- You can extract a sequence of steps in OpenRefine to streamline future reconciliation of the same dataset.
- In the Undo / Redo tab, click Extract... and select the sequence of steps you want to keep.
- After you click Export, a .json file should appear in your browser's downloads folder. It contains all the steps you chose; keep it for later use.
Do not edit this file by hand unless you understand OpenRefine's operation-history format.
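To check what an extracted file contains without editing it, you can read it and list its steps. A minimal sketch, assuming the export is a JSON array of operation objects, each with an `"op"` identifier (such as `"core/recon"`) and a human-readable `"description"`; the function name is hypothetical.

```python
import json


def summarize_history(history_json):
    """Return (op, description) pairs from an extracted OpenRefine history.

    Read-only: the JSON text is parsed, never modified or written back.
    Missing keys are reported as "?" (op) or "" (description).
    """
    operations = json.loads(history_json)
    return [(op.get("op", "?"), op.get("description", "")) for op in operations]
```

Usage, with a hypothetical file name: `with open("history.json") as f: for op, desc in summarize_history(f.read()): print(op, desc)`.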
Applying
- You can apply the saved sequence of steps to redo reconciliation or to reconcile an updated version of the same dataset, provided its column names match those of the original project.
- In the Undo / Redo tab, click Apply... and choose the .json file that contains the steps you want to perform on your project.
- Wait until OpenRefine finishes; the steps are applied in the order they were extracted.
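Because applying only works when the column names match, it can help to check an updated dataset's header against the saved steps first. A minimal sketch, assuming operations name their input column under `columnName`, `baseColumnName`, or `oldColumnName` and any column they create under `newColumnName`; these key choices and the function name are assumptions, so extend `INPUT_COLUMN_KEYS` if your history uses other keys.

```python
import json

# Assumed keys under which common OpenRefine operations name an input column.
INPUT_COLUMN_KEYS = ("columnName", "baseColumnName", "oldColumnName")


def missing_columns(history_json, dataset_columns):
    """Return input columns referenced by the history but absent from the data.

    A column created by an earlier step (its "newColumnName") counts as
    available to later steps, so a rename followed by a reconcile passes.
    Results are in first-seen order, without duplicates.
    """
    available = set(dataset_columns)
    missing = []
    for op in json.loads(history_json):
        for key in INPUT_COLUMN_KEYS:
            name = op.get(key)
            if name is not None and name not in available and name not in missing:
                missing.append(name)
        new_name = op.get("newColumnName")
        if new_name is not None:
            available.add(new_name)
    return missing
```

An empty result suggests the steps can be applied; otherwise, rename the listed columns in the new project before applying.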