FAQ - tfmorris/OpenRefine GitHub Wiki

Frequently Asked Questions

This is a collection of frequently asked questions about OpenRefine. Feel free to ask your own question on the OpenRefine mailing list; we'll try to answer it to the best of our abilities and add it to this list.

Questions with Long Answers

Questions with Short Answers

I have a question. Where do I ask?

Send your question to the OpenRefine mailing list.

I've found a bug or want a new feature. What should I do?

Consider discussing it on the mailing list first. This usually helps characterize the issue, which makes for a better bug report or feature request that you can then file on the issue tracker.

How do I change the workspace directory that Refine uses for its project storage?

If you run Refine from the terminal, you can point to the workspace directory through the -d parameter, e.g.,

  ./refine -p 3333 -i 0.0.0.0 -m 6000M -d /where/you/want/the/workspace

OpenRefine opens my browser then waits a long time trying to connect to the server at 165.135.222.96 or another address I don't recognize.

Try running Refine on a specific IP address: ./refine -i 127.0.0.1. (Windows users must use a slash character for the option, like C:\>refine /i 127.0.0.1:8088)

Double-check your Chrome or Firefox proxy settings. In Firefox, go to Options > Advanced > Network > Connection > Settings and switch from "use system proxy settings" to "auto-detect proxy settings for this network".

If you get a "Network Error (tcp_error)" message in your browser, you might also try unchecking "automatically detect settings" and adding an exception to your firewall rules to allow 127.0.0.1 (or whichever IP address you decide to run Refine on).

What syntax of regular expression (regex) does OpenRefine support?

The regular expression syntax is that of Java regex, not of JavaScript. See GREL Regular Expressions.

What GREL syntax should I use to construct URLs correctly and avoid HTTP errors and other pitfalls, for instance when working with JSON strings within a URL or creating a HYPERLINK?

A good practice is to use single quotes (') for the strings of your Refine expression and reserve double quotes (") for the parts that belong to the URL syntax. Also make sure to escape() the cell values you use, where necessary.

EXAMPLES:

'https://www.googleapis.com/freebase/v1/mqlread?query={"mid":null,"/type/object/key":{"namespace":"/authority/fmd/model","value":"'+escape(cells.ModelName.value, "url")+'"}}'
'=HYPERLINK("http://listings.listhub.net/pages/BHAMMLSAL/' + value + '",' + value + ')'
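
The same quoting pattern can be sketched outside Refine, for example in Python. This is only an illustration under stated assumptions: mqlread_url and BASE_URL are hypothetical names, the URL and query shape are copied from the first GREL example above, and Python's urllib.parse.quote stands in for GREL's escape(value, "url"), so the two may encode some characters (such as spaces) differently.

```python
from urllib.parse import quote

# Copied from the GREL example above; not a statement about a live service.
BASE_URL = 'https://www.googleapis.com/freebase/v1/mqlread?query='


def mqlread_url(model_name):
    """Build the query URL for one cell value (hypothetical helper)."""
    # Single-quoted Python strings leave the JSON double quotes untouched,
    # mirroring the single-quote / double-quote split recommended above.
    prefix = '{"mid":null,"/type/object/key":{"namespace":"/authority/fmd/model","value":"'
    suffix = '"}}'
    # Percent-encode the cell value so characters such as '&' or spaces
    # cannot break the URL. safe='' means '/' is encoded as well.
    return BASE_URL + prefix + quote(model_name, safe='') + suffix


if __name__ == '__main__':
    print(mqlread_url('A&B C'))
```

Only the cell value is escaped here, as in the GREL expression; the fixed JSON skeleton is passed through unchanged.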

When is it going to be possible to load things into the Freebase main graph, directly or indirectly?

Short answer: we're working on this. Long answer: Refinery.

How can I delete a whole row or several rows?

Follow these steps:

  1. Flag (or star) the row(s) you want to delete.
  2. In the dropdown menu above the flag, choose Facet > Facet by flag (or Facet by star if you starred the rows).
  3. In the facet that opens, select the 'true' option.
  4. In the same dropdown menu above the flag, choose Edit Rows > Remove all matching rows.

How do I make a Text Facet show more than 2000 choices?

From version 2.0, you can go to http://127.0.0.1:3333/preferences and set the facet limit using the preference key "ui.browsing.listFacet.limit". If you are trying to discover duplicates, see the next question.

How do I find duplicates in a column?

Several options:

  • Create a Text Facet on the column, then click "Sort by: count" in the facet. Any choice with a count of 2 or more is a duplicate.
  • Use the facetCount() function in a custom facet, e.g. facetCount(value, 'value', 'column name') > 1, and select 'true' to show all rows that have duplicates.
  • From version 2.1, there is a shortcut for this: Facet → Customized facets → Duplicates facet.
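
To make the logic behind the facetCount() option concrete, here is a minimal Python sketch of the same idea: count how often each value occurs in the column, then mark every row whose value occurs more than once. It is not Refine's implementation; duplicate_flags is a hypothetical name and the column is assumed to be a plain list of cell values.

```python
from collections import Counter


def duplicate_flags(values):
    """Return one boolean per row: True if the row's value occurs more than once."""
    # Rough analogue of facetCount(value, 'value', 'column name'):
    # how many rows in the column share each value.
    counts = Counter(values)
    # Rough analogue of the "> 1" test that the 'true' facet choice selects.
    return [counts[v] > 1 for v in values]


if __name__ == '__main__':
    column = ['apple', 'pear', 'apple', 'fig']
    for value, is_dup in zip(column, duplicate_flags(column)):
        print(value, is_dup)
```

Note that this compares values exactly, so 'Apple' and 'apple ' would not count as duplicates of 'apple'.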

Can OpenRefine be used as a piece of a larger ETL pipeline?

Not right now (version 2.0), but we're open to, and intrigued by, the possibility of using Refine in 'batch mode' if the community finds it a useful feature to have. It's worth pointing out that not all Refine features can work unsupervised and without human interaction (clustering, for example), but some can, and for those it makes sense.

You may also be interested in scripting a Refine server programmatically. Here is some discussion and a project:
