I stumbled across Google Refine yesterday while writing a story on a Farmers Market API. Google Refine is the rebirth of what was formerly known as Freebase Gridworks, which Google picked up in its 2010 acquisition of Metaweb.
Google Refine is a powerful desktop tool for cleaning up, refining and normalizing datasets. Some of the tools it provides out of the box do things I've spent months writing custom scripts to do, and it's all available through a web-based GUI.
Google Refine isn't just for cleaning up messy data. You can also use it to transform data that you gathered or harvested and normalize it into standard, tabular data sets you can use in other systems. It provides a powerful regular expression system that lets you quickly transform HTML and other formats into columns and rows that make sense of the data, in a format that can be used by anyone.
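To give a feel for what that kind of regular expression transformation does, here is a minimal sketch in plain Python. It is not Google Refine's own expression language, and the HTML fragment, the pattern and the column names (`name`, `city`, `state`) are all made up for illustration.

```python
import re

# Hypothetical HTML fragment of the kind you might harvest from a web page.
html = """
<li><b>Downtown Market</b> - Portland, OR</li>
<li><b>Riverside Growers</b> - Eugene, OR</li>
"""

# One named group per output column (assumed layout: name - city, state).
ROW_PATTERN = re.compile(
    r"<li><b>(?P<name>[^<]+)</b>\s*-\s*(?P<city>[^,<]+),\s*(?P<state>[A-Z]{2})</li>"
)


def html_to_rows(text):
    """Return one dict per matching <li>, keyed by column name."""
    return [match.groupdict() for match in ROW_PATTERN.finditer(text)]


rows = html_to_rows(html)
for row in rows:
    print(row["name"], row["city"], row["state"], sep="\t")
```

Each match becomes a row and each named group becomes a column, which is the same basic idea as splitting a messy text column into clean ones.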
Beyond data refinement and transformation, you can also add to your data using Google Refine. For example, you can do address lookups and other web service calls using the data you already have, to find new values and make your data more complete and usable for whatever goal you have in mind.
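The idea of adding a new column from a lookup can be sketched roughly like this in Python. This is only an illustration of the concept, not how Google Refine implements it: the `enrich` helper, the `fake_lookup` function and the approximate coordinates are assumptions standing in for a real address lookup web service.

```python
def enrich(rows, lookup, source_column, new_column):
    """Return copies of rows with new_column set to lookup(row[source_column]).

    Results are cached so each distinct value is looked up only once,
    which matters when the lookup is a slow web service call.
    """
    cache = {}
    enriched = []
    for row in rows:
        key = row[source_column]
        if key not in cache:
            cache[key] = lookup(key)
        enriched.append({**row, new_column: cache[key]})
    return enriched


# Stand-in for a real geocoding service; the coordinates are approximate
# and purely illustrative.
FAKE_GEOCODER = {"Portland, OR": (45.52, -122.68)}


def fake_lookup(address):
    """Hypothetical lookup: returns (lat, lon) or None if the address is unknown."""
    return FAKE_GEOCODER.get(address)


markets = [
    {"name": "Downtown Market", "address": "Portland, OR"},
    {"name": "Mystery Market", "address": "Nowhere"},
]
result = enrich(markets, fake_lookup, "address", "lat_lon")
```

In practice you would swap `fake_lookup` for a function that calls whatever service you trust, and rows the service can't resolve simply end up with an empty value you can filter on later.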
Google Refine is a tool any Microsoft Excel-savvy user can take advantage of. You don't have to be a programmer to use it. It provides a very powerful, spreadsheet-like interface that allows you to clean, transform and evolve any data set you want. Google Refine runs in a web interface, but it is actually a download available for Windows or Mac that runs on your own machine, so your data doesn't have to be uploaded into the cloud.
I think Google Refine should be standard issue for anyone in charge of working with data in any business, non-profit, or government office. It's a perfect tool for any data journalist or scientist.