The Data Science Toolkit

The Data Science Toolkit is a collection of open source tools wrapped in an easy-to-use REST/JSON interface, and available for download as a virtual machine image.

Some of the tools included are Boilerpipe, GeoIQ/Schuyler Erle's Geocoder, and Geodict.

The Data Science Toolkit was assembled by Pete Warden in an attempt to get these important data tools into the hands of more developers. The toolkit provides:
  • Independence - Never worry about the provider going offline, or charging once you're hooked.
  • Security - Run on your intranet, so customer information stays within the firewall.
  • Scalability - No API limits. Run a cluster of as many instances as you need.
You can play with a sandbox he has set up, review the documentation, or grab the VM and launch an Amazon EC2 instance using public AMI ami-9e7d8ff7.
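
Since the tools sit behind a REST/JSON interface, calling one from a script could look roughly like the sketch below. Treat it as an illustration under stated assumptions, not the toolkit's documented client: the base URL `http://localhost:8080` is a placeholder for wherever your own instance runs, the endpoint name `street2coordinates` and the path-style argument are assumptions to check against the documentation, and the helper names `build_url` and `call_toolkit` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Placeholder address (assumption): point this at your own VM or EC2 instance.
BASE_URL = "http://localhost:8080"


def build_url(base_url, endpoint, argument):
    """Join the base URL, an endpoint name, and a percent-encoded argument."""
    return "{}/{}/{}".format(
        base_url.rstrip("/"),
        endpoint.strip("/"),
        quote(argument, safe=""),
    )


def call_toolkit(base_url, endpoint, argument, timeout=10):
    """Send a GET request and decode the JSON body of the response."""
    url = build_url(base_url, endpoint, argument)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the request URL; no network access is needed.
    # The endpoint name is an assumption, so verify it before calling call_toolkit.
    print(build_url(BASE_URL, "street2coordinates", "1600 Pennsylvania Ave, Washington, DC"))
```

Keeping the base URL as a parameter means the same script can target the public sandbox, a VM on your intranet, or any instance in a cluster without other changes.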