Microsoft Computer Vision API - bounswe/bounswe2017group3 GitHub Wiki

The cloud-based Computer Vision API provides developers with access to advanced algorithms for processing images and returning information. After you upload an image or specify an image URL, the Microsoft Computer Vision algorithms analyze the visual content in different ways based on inputs and user choices.

##What is it used for?

  • Tag images based on content.
  • Categorize images.
  • Identify the type and quality of images.
  • Detect human faces and return their coordinates.
  • Recognize domain-specific content.
  • Generate descriptions of the content.
  • Use optical character recognition to identify text found in images.
  • Distinguish color schemes.
  • Flag adult content.
  • Crop photos to be used as thumbnails.

##Requirements

  • Supported input methods: Raw image binary in the form of an application/octet-stream or image URL.
  • Supported image formats: JPEG, PNG, GIF, BMP.
  • Image file size: Less than 4MB.
  • Image dimension: Greater than 50 x 50 pixels.
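
The requirements above can be checked on the client before an image is sent. The following is a minimal sketch, not part of the API itself: the function name `check_image` is hypothetical, the caller is assumed to already know the image's format, byte size, and pixel dimensions, and "4MB" is assumed to mean 4 * 1024 * 1024 bytes.

```python
# Limits taken from the requirements list; the exact byte count for "4MB"
# is an assumption.
MAX_BYTES = 4 * 1024 * 1024
MIN_SIDE = 50
ALLOWED_FORMATS = {"jpeg", "png", "gif", "bmp"}


def check_image(fmt, size_bytes, width, height):
    """Return a list of requirement violations (an empty list means OK)."""
    problems = []
    if fmt.lower() not in ALLOWED_FORMATS:
        problems.append("unsupported format: " + fmt)
    if size_bytes >= MAX_BYTES:
        problems.append("file must be smaller than 4MB")
    if width <= MIN_SIDE or height <= MIN_SIDE:
        problems.append("image must be larger than 50 x 50 pixels")
    return problems


print(check_image("PNG", 120_000, 640, 480))
print(check_image("tiff", 5 * 1024 * 1024, 40, 40))
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.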

##Tagging Images

Computer Vision API returns tags based on more than 2000 recognizable objects, living beings, scenery, and actions. In cases where tags may be ambiguous or not common knowledge, the API response provides “hints” to clarify the meaning of the tag in the context of a known setting. A collection of content tags forms the foundation for an image “description” displayed as human readable language formatted in complete sentences. After uploading an image or specifying an image URL, Computer Vision API’s algorithms output a number of tags based on the objects, living beings and actions identified in the image.
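
To illustrate how a client might consume such tags, here is a small sketch that keeps only the tags above a confidence cutoff. The response shape (a `tags` list whose items carry `name`, `confidence`, and an optional `hint`) is an assumption for illustration, and `confident_tags` is a hypothetical helper.

```python
def confident_tags(response, min_confidence=0.5):
    """Return tag names whose confidence is at least min_confidence."""
    tags = response.get("tags", [])
    return [t["name"] for t in tags if t.get("confidence", 0.0) >= min_confidence]


# Hand-written sample data, not real API output.
sample = {
    "tags": [
        {"name": "grass", "confidence": 0.99},
        {"name": "outdoor", "confidence": 0.87},
        {"name": "ball", "confidence": 0.31, "hint": "sport"},
    ]
}

print(confident_tags(sample))
```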

##Categorizing Images

Computer Vision API returns the taxonomy-based categories defined in previous versions. These categories are organized as a taxonomy with parent/child hereditary hierarchies. All categories are in English. They can be used alone or in combination with our new models.

Example category names returned in a response:

  • people
  • people_crowd
  • animal_dog
  • outdoor_mountain
  • food_bread
##Identifying Image Types

There are several ways to categorize images. Computer Vision API can set a boolean flag to indicate whether an image is black and white or color, and it uses the same method to indicate whether an image is a line drawing. It can also indicate whether an image is clip art and, if so, rate its quality on a scale of 0-3.

##Line drawing type

Detects whether an image is a line drawing or not.

##Faces

Detects human faces within a picture and generates the face coordinates, the rectangle for the face, gender, and age. These visual features are a subset of metadata generated for face. For more extensive metadata generated for faces (facial identification, pose detection, and more), use the Face API.
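
As a sketch of how the face metadata might be used, the snippet below converts each face rectangle into corner coordinates suitable for drawing a box. The field names (`faces`, `faceRectangle`, `left`, `top`, `width`, `height`, `age`, `gender`) are assumptions about the response shape, and `face_boxes` is a hypothetical helper.

```python
def face_boxes(response):
    """Return one dict per face with its attributes and (x1, y1, x2, y2) box."""
    boxes = []
    for face in response.get("faces", []):
        rect = face["faceRectangle"]
        boxes.append({
            "age": face.get("age"),
            "gender": face.get("gender"),
            "box": (
                rect["left"],
                rect["top"],
                rect["left"] + rect["width"],
                rect["top"] + rect["height"],
            ),
        })
    return boxes


# Hand-written sample data, not real API output.
sample = {
    "faces": [
        {"age": 31, "gender": "Female",
         "faceRectangle": {"left": 120, "top": 40, "width": 80, "height": 90}},
    ]
}

print(face_boxes(sample))
```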

##Domain-Specific Content

Specialized information can be implemented as a standalone method or in combination with the high level categorization. There are two options for making use of the domain-specific models:

###Option One - Scoped Analysis

Analyze only a chosen model, by invoking an HTTP POST call. For this option, if you know which model you want to use, you just specify the model’s name, and you only get information relevant to that model.

###Option Two - Enhanced Analysis

Analyze to provide additional details related to categories from the 86-category taxonomy. This option is available for use in applications where users want to get generic image analysis in addition to details from one or more domain-specific models.
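
The two options differ mainly in the URL that receives the HTTP POST. The sketch below only builds the URLs and sends nothing. The base address (region `westus`, version `v1.0`), the path layout, the `visualFeatures` and `details` query parameters, and the model name `celebrities` are all assumptions for illustration; check the current API reference before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumed region and API version; replace with the values for your subscription.
BASE = "https://westus.api.cognitive.microsoft.com/vision/v1.0"


def scoped_url(model):
    """Option One: analyze with a single named domain-specific model."""
    return f"{BASE}/models/{quote(model)}/analyze"


def enhanced_url(models, features=("Categories",)):
    """Option Two: generic analysis plus details from domain-specific models."""
    query = urlencode(
        {"visualFeatures": ",".join(features), "details": ",".join(models)},
        safe=",",  # keep comma-separated lists readable
    )
    return f"{BASE}/analyze?{query}"


print(scoped_url("celebrities"))
print(enhanced_url(["celebrities"]))
```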

##Generating Descriptions

Computer Vision API’s algorithms analyze the content found in an image, which in turn forms the foundation for a “description” displayed as human readable language in complete sentences. The description summarizes what is found in the image. Computer Vision API’s algorithms generate a number of descriptions based on the objects identified in the image. Each description is evaluated and given a confidence score.
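
Because several candidate descriptions may come back, each with its own confidence score, a client usually wants the highest-scoring one. A minimal sketch follows, assuming the descriptions arrive under `description.captions` with `text` and `confidence` fields (an assumed shape); `best_caption` is a hypothetical helper.

```python
def best_caption(response):
    """Return the text of the highest-confidence caption, or None if there is none."""
    captions = response.get("description", {}).get("captions", [])
    if not captions:
        return None
    return max(captions, key=lambda c: c["confidence"])["text"]


# Hand-written sample data, not real API output.
sample = {
    "description": {
        "captions": [
            {"text": "a dog in a field", "confidence": 0.62},
            {"text": "a dog running on grass", "confidence": 0.91},
        ]
    }
}

print(best_caption(sample))
```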

##Perceiving Color Schemes

The Computer Vision algorithm extracts colors from an image. The colors are analyzed in three different contexts: foreground, background, and whole. They are grouped into 12 dominant accent colors (black, blue, brown, gray, green, orange, pink, purple, red, teal, white, and yellow). Depending on the colors in an image, simple black and white or accent colors may be returned in hexadecimal color codes.
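
Since accent colors come back as hexadecimal codes, a client often needs to convert them to RGB values before using them in a UI. The sketch below does that conversion; the field names in `summarize_colors` (`dominantColorForeground`, `dominantColorBackground`, `accentColor`, `isBWImg`) are assumptions about the response shape, and both functions are hypothetical helpers.

```python
def hex_to_rgb(code):
    """Convert a six-digit hex color such as 'BB6D10' to an (r, g, b) tuple."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected a six-digit hex color, got " + repr(code))
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def summarize_colors(color):
    """Pick out the foreground, background, and accent information."""
    return {
        "foreground": color["dominantColorForeground"],
        "background": color["dominantColorBackground"],
        "accent_rgb": hex_to_rgb(color["accentColor"]),
        "black_and_white": color["isBWImg"],
    }


# Hand-written sample data, not real API output.
sample = {
    "dominantColorForeground": "Brown",
    "dominantColorBackground": "Black",
    "accentColor": "BB6D10",
    "isBWImg": False,
}

print(summarize_colors(sample))
```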

##Flagging Adult Content

Among the various visual categories is the adult and racy group, which enables detection of pornographic materials and restricts the display of images containing sexual content. The filter for adult and racy content detection can be set on a sliding scale to accommodate the user’s preference.
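
One way to picture the sliding scale is as a threshold applied to the scores in the response. The sketch below assumes the response carries `adultScore` and `racyScore` values between 0 and 1 (an assumed shape); `should_block` and its `threshold` parameter are hypothetical.

```python
def should_block(adult, threshold=0.5):
    """Return True if either score reaches the threshold.

    A lower threshold is stricter (blocks more images).
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return (adult.get("adultScore", 0.0) >= threshold
            or adult.get("racyScore", 0.0) >= threshold)


# Hand-written sample data, not real API output.
sample = {"adultScore": 0.08, "racyScore": 0.42}

print(should_block(sample, threshold=0.5))
print(should_block(sample, threshold=0.3))
```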

##Generating Thumbnails

Computer Vision API has a smart cropping feature that helps when different user experience layouts need different thumbnail sizes. A thumbnail is a small representation of a full-size image.

After an image is uploaded, the Computer Vision API algorithm analyzes the objects within the image, crops it to fit the requirements of the “region of interest” (ROI), and generates a high quality thumbnail. The output gets displayed within a special framework as seen in the illustration below. The generated thumbnail can be presented in a different aspect ratio than that of the original image to accommodate a user’s needs.

The thumbnail algorithm works as follows:

  1. Removes distracting elements from the image and recognizes the main object, the “region of interest” (ROI).
  2. Crops the image based on identified “region of interest”.
  3. Changes the aspect ratio to fit the target thumbnail dimensions.
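
Steps 2 and 3 above can be sketched as simple geometry. This is not the service's actual algorithm: it assumes the ROI is already known as a `(left, top, width, height)` rectangle, and it simply picks the largest crop with the target aspect ratio that fits in the image, centered on the ROI as far as the image bounds allow. The function name `crop_box` is hypothetical.

```python
def crop_box(img_w, img_h, roi, target_w, target_h):
    """Return (left, top, width, height) of a crop with the target aspect ratio."""
    target_ratio = target_w / target_h

    # Largest rectangle with the target ratio that still fits inside the image.
    crop_w = min(img_w, img_h * target_ratio)
    crop_h = crop_w / target_ratio

    # Center of the region of interest.
    cx = roi[0] + roi[2] / 2
    cy = roi[1] + roi[3] / 2

    # Center the crop on the ROI, then clamp it to the image bounds.
    left = min(max(cx - crop_w / 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h / 2, 0), img_h - crop_h)
    return (round(left), round(top), round(crop_w), round(crop_h))


# A wide image with the ROI near the right edge, cropped for a square thumbnail.
print(crop_box(1000, 500, (700, 100, 200, 200), 100, 100))
```

The resulting crop would then be scaled down to `target_w` x `target_h` by an image library of your choice.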
