I had a chance to play with the AlchemyAPI I came across today. AlchemyAPI is a semantic tagging and text mining Application Programming Interface (API).
I have about 10K web pages from which I want to extract the top keywords and key phrases, so that I can pull meaning out of the words on each page.
AlchemyAPI provides nine methods:
- Named Entity Extraction - Identifies people, companies, organizations, cities, geographic features and other entities within content provided.
- Topic Categorization - Applies a categorization for the content provided.
- Language Detection - Provides language detection for the content provided.
- Concept Tagging - Applies concept tags to the content provided.
- Keyword Extraction - Provides topic / keyword / tag extraction for the content provided.
- Text Extraction / Web Page Cleaning - Provides mechanism to extract the page text from web pages.
- Structured Content Scraping - Ability to mine structured data from web pages.
- Microformats Parsing / Extraction - Extraction of hCard, adr, geo, and rel formatted content from any web page.
- RSS / ATOM Feed Detection - Provides RSS / ATOM feed detection on any web page.
I'm only using the keyword extraction and named entity extraction for what I am doing. The whole API provides some great tools to quickly harvest, scrape and process content from the open Internet.
Their API is extremely easy to use, and you can be up and running, harvesting and processing pages, in about 10 minutes.
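To give a feel for what a keyword extraction call might look like, here is a minimal Python sketch. It is not taken from the AlchemyAPI documentation: the base URL, the `URLGetRankedKeywords` endpoint name, the `apikey` / `url` / `outputMode` parameters, and the response shape (`status`, `statusInfo`, `keywords` with `text` and `relevance`) are assumptions based on how I understand the REST interface to be laid out, so check them against the official docs before relying on them. The helper names `build_request_url`, `top_keywords`, and `fetch_keywords` are my own.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: base URL of the URL-processing calls; verify against the docs.
BASE_URL = "http://access.alchemyapi.com/calls/url/"


def build_request_url(endpoint, page_url, api_key):
    """Build a GET URL asking the service to process one web page."""
    query = urlencode({"apikey": api_key, "url": page_url, "outputMode": "json"})
    return BASE_URL + endpoint + "?" + query


def top_keywords(response_text, limit=5):
    """Return up to `limit` (keyword, relevance) pairs, most relevant first.

    Assumes a JSON body shaped like:
    {"status": "OK", "keywords": [{"text": "...", "relevance": "0.93"}, ...]}
    """
    data = json.loads(response_text)
    if data.get("status") != "OK":
        raise ValueError("request failed: " + str(data.get("statusInfo", "unknown error")))
    pairs = [(k["text"], float(k["relevance"])) for k in data.get("keywords", [])]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:limit]


def fetch_keywords(page_url, api_key, limit=5):
    """Call the (assumed) keyword endpoint for one page and parse the result."""
    url = build_request_url("URLGetRankedKeywords", page_url, api_key)
    with urllib.request.urlopen(url) as resp:
        return top_keywords(resp.read().decode("utf-8"), limit)
```

With a valid key, something like `fetch_keywords("http://example.com/", "YOUR_API_KEY")` could then be run in a loop over the 10K pages. Keeping URL building and response parsing in separate functions makes it easy to check both without touching the network.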