AlchemyAPI (IBM Bluemix Watson) - Texera/texera GitHub Wiki

Wiki Page Author: Qing Tang

Reviewed by: Chen Li

Overview

IBM AlchemyAPI provides a hosted service: developers send data to IBM's servers and receive the analysis results. It does not allow developers to upload a large number of their own records for bulk analysis.

AlchemyLanguage (https://alchemy-language-demo.mybluemix.net) is a collection of text analysis functions that derive semantic information from user content. Developers can input text, HTML, or a public URL, and leverage sophisticated natural language processing techniques to get a quick high-level understanding of the content, and obtain detailed insights such as directional sentiment from an entity to an object.

How to use

PART 1: Online Demo

Tutorial: http://www.ibm.com/watson/developercloud/doc/alchemylanguage/tutorials.shtml

We can run an analysis either by pasting the body of text into the text box or by entering the URL of a website.

Then select either the "public" or the "custom" model. The "public" model is trained on English websites and news content, and the "custom" model is trained on traffic incident reports. Click "Analyze", and the system will show the results.

AlchemyLanguage displays the results in several ways: entities, keywords, concepts, taxonomy, document emotion, targeted emotion, document sentiment, targeted sentiment, typed relations, and relations. Title, authors, publication date, language, text extraction, and feeds are available for URL analysis only. Results are displayed in JSON format.

PART 2: Using Python

Before using the API, we have to make sure to do the following:

  1. Install Python (version 3.0 or later).

  2. Install the requests module by entering one of the following commands in a command line:

    pip3.4 install requests
    python3.4 -m pip install requests

    (Simply change 3.4 to the Python version we are using if it is not 3.4.)

    If we don't have pip installed, we can install it by following the instructions at http://docs.python-guide.org/en/latest/starting/installation/

Instructions

Adapted from: http://www.alchemyapi.com/developers/getting-started-guide/using-alchemyapi-with-python#config-sdk

  1. Get an API key
  • Register an account on IBM Bluemix, go to the Catalog, and find Watson under Services.
  • Go into AlchemyAPI and create an instance by clicking the "create" button at the bottom right.
  • Go to the dashboard, find "AlchemyAPI" under "Services", and click it.
  • Go to the "Service Credentials" section and create a new credential. A new credential will appear under "Service Credentials". Click "View Credentials".
  • Copy or write down the API key.
  2. Clone the Python SDK from GitHub.

  3. Configure the Python SDK to use the API key. Be sure to run the following command before using this tool, where YOUR_API_KEY is the key copied in step 1:

    python alchemyapi.py YOUR_API_KEY

  4. Run the example:

    python example.py

    After typing the command above, we should be able to see the sample format and output of the different functions.

  5. Write our own program. We can create a new .py file and start writing our own program using the AlchemyAPI tools. Be sure to create the API object before calling those methods:

from alchemyapi import AlchemyAPI

alchemyapi = AlchemyAPI()
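
Once the object exists, each SDK method takes a flavor ('text', 'html', or 'url'), the content itself, and an optional dictionary of options, and returns a dictionary parsed from the JSON response. Below is a minimal sketch of one call; my_text, the sentiment method, and the printed fields are illustrative and follow the patterns shown in the SDK's example.py.

my_text = 'IBM Watson won the Jeopardy! challenge in 2011.'

# Every call returns a parsed JSON dictionary; check 'status' before reading results.
response = alchemyapi.sentiment('text', my_text)
if response['status'] == 'OK':
    print('Document sentiment:', response['docSentiment']['type'])
else:
    print('Error:', response.get('statusInfo'))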

Features Provided

Here we only include information about the functions used in pull request https://github.com/Texera/texera/pull/269 . More functions can be found at http://www.ibm.com/watson/developercloud/doc/alchemylanguage/

Concepts - They identify concepts with which the input text is associated, based on other concepts and entities that are present in that text. Concept-related API functions understand how concepts relate, and can identify concepts that are not directly referenced in the text. For example, if an article mentions CERN and the Higgs boson, the Concepts API functions will identify Large Hadron Collider as a concept, even if that term is not mentioned explicitly in the page. Concept tagging enables higher-level analysis of input content than just basic keyword identification.
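
As a hedged sketch, the concepts function can be called through the Python SDK as follows, reusing the alchemyapi object and my_text string from the setup above (the maxRetrieve option and the result fields are assumptions based on the SDK's example.py and the AlchemyLanguage documentation):

# Ask for at most five concepts associated with the text.
response = alchemyapi.concepts('text', my_text, {'maxRetrieve': 5})
if response['status'] == 'OK':
    for concept in response['concepts']:
        print(concept['text'], concept['relevance'])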

Entities - They return items such as persons, places, and organizations that are present in the input text. Entity extraction adds semantic knowledge to content to help understand the subject and context of the text that is being analyzed. The entity extraction techniques used by the AlchemyLanguage service are based on sophisticated statistical algorithms and NLP technology, and are unique in the industry with their support for multilingual analysis, context-sensitive disambiguation, and quotations extraction. We can specify a custom model in our request to identify a custom set of entity types in our content, enabling domain-specific entity extraction.
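
A sketch of entity extraction with per-entity sentiment enabled (the sentiment option and the field names are assumptions drawn from the SDK's example.py; a custom model, if one has been trained, would also be passed through the options dictionary):

# Extract entities and attach a sentiment analysis to each one.
response = alchemyapi.entities('text', my_text, {'sentiment': 1})
if response['status'] == 'OK':
    for entity in response['entities']:
        print(entity['text'], entity['type'], entity['relevance'])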

Keywords - They are important topics in our content that are typically used when indexing data, generating tag clouds, or when searching. The AlchemyLanguage service automatically identifies supported languages (see Language below) in our input content, and then identifies and ranks keywords in that content. Sentiment can also be associated with each keyword by using the AlchemyLanguage sentiment analysis capabilities.
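
A sketch of keyword extraction, again with sentiment attached to each keyword (the field names are assumptions following the SDK's example.py):

response = alchemyapi.keywords('text', my_text, {'sentiment': 1})
if response['status'] == 'OK':
    for keyword in response['keywords']:
        print(keyword['text'], keyword['relevance'], keyword['sentiment']['type'])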

Language - It detects the natural language in which input text, HTML, or web-based content is written. Language identification functions can identify English, German, French, Italian, Portuguese, Russian, Spanish, and Swedish. These functions enable applications to categorize or filter content based on the language in which it was written.

Microformats - It processes microformat information that is included in the HTML of some webpages to add semantic information and to enable easier scanning and processing of those pages by software. The information extracted from web pages by the Microformats method can be used for tasks such as webpage categorization and content discovery.
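
To illustrate the language function, here is a sketch of detecting the language of a public web page (the URL is a placeholder and the 'language' field name is an assumption based on the SDK's example.py):

# The 'url' flavor asks the service to fetch and analyze the page itself.
response = alchemyapi.language('url', 'http://www.example.com/')
if response['status'] == 'OK':
    print('Detected language:', response['language'])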