Open Refine is a powerful desktop tool for cleaning up or transforming messy tabular data, and can be an invaluable tool for working with large datasets. If your data comes in from the field with Fulcrum and needs some modifications to be combined with other data, or to be imported into another location, Refine can help to do mass edits to datasets.
The first thing to do if you want to do some cleanup is get Refine installed on your platform. There are Windows, Mac, and Linux versions available. Head over to the OpenRefine documentation and follow the instructions there to get it running. Once it’s installed, we can begin to get Fulcrum data in for some cleanup and tweaking.
The easiest way to move your data into Open Refine is to use Fulcrum’s CSV export. In the following example, I have an app for capturing data on marine facilities; things like boat launch sites, marinas, and beaches. Here’s a quick snapshot of my data in Fulcrum:
I’ve exported my collected data to a CSV, and opened it in Microsoft Excel:
You can see in the two highlighted columns, address and city, have some data problems I’d like to cleanup. The address field is all entered in lowercase, which I want corrected, and the cities are all entered in inconsistent formats. With Open Refine, the addresses can be corrected to be titlecased to capitalize the first letters in each word, and the cities need to be deduplicated and merged. Let’s import the data into Refine and do some transformations.
Opening Open Refine will launch the application in your web browser, and you’ll see a dashboard of datasets you’ve imported. If you have none yet, select New Project to start a new import.
Browse to your exported CSV file and click Next. Refine will parse your data and try to autodetect the import settings. Give the project a title, make sure CSV is selected as the data type, and click Create Project.
Refine will parse the data and take you to your new project:
Refine has many built in transformation functions for manipulating data, as well as “facets”, which allow you to change how you’re looking at the data, instead of just rows and columns.
We’ll start by titlecasing the address field with a built in cell transform. Click the dropdown arrow on your column, select Edit cells > Common transforms > To titlecase. Your data was now transformed and all words are capitalized.
I also want to dedupe some of the city name data. There are a bunch of slightly different variations of the same city name that need to be combined and standardized. Refine has a feature called Facets that allows you to look at the unique data values in a given column. Clicking the column and selecting Facet will show a list in the sidebar of all unique values in that column, along with a count of matching rows.
In my example, there are several variations of “St. Petersburg” that I want to correct. Hovering over each and clicking “edit” allows you to edit the value for several columns at once. You can make wide corrections across lots of aspects of your data with this feature, cleaning it up and making it much simpler to use for reporting with tools like Microsoft Access, or visualization in ArcGIS.
There are dozens and dozens of other functions and transforms that can be performed with Open Refine. Explore around and see what else you can do with data you’ve collected in your Fulcrum account.