ML: Problems - dudycooly/1235 GitHub Wiki

Problems

  • Spam Detection: Given email in an inbox, identify those email messages that are spam and those that are not. Having a model of this problem would allow a program to leave non-spam emails in the inbox and move spam emails to a spam folder. We should all be familiar with this example.
  • Credit Card Fraud Detection: Given credit card transactions for a customer in a month, identify those transactions that were made by the customer and those that were not. A program with a model of this decision could refund those transactions that were fraudulent.
  • Digit Recognition: Given zip codes handwritten on envelopes, identify the digit for each handwritten character. A model of this problem would allow a computer program to read and understand handwritten zip codes and sort envelopes by geographic region.
  • Product Recommendation: Given a purchase history for a customer and a large inventory of products, identify those products in which that customer will be interested and likely to purchase. A model of this decision process would allow a program to make recommendations to a customer and motivate product purchases. Amazon has this capability. Facebook, Google+ and LinkedIn do something similar when they recommend users to connect with after you sign up.
  • Medical Diagnosis: Given the symptoms exhibited in a patient and a database of anonymized patient records, predict whether the patient is likely to have an illness. A model of this decision problem could be used by a program to provide decision support to medical professionals.
  • Stock Trading: Given the current and past price movements for a stock, determine whether the stock should be bought, held or sold. A model of this decision problem could provide decision support to financial analysts.
  • Customer Segmentation: Given the pattern of behaviour by a user during a trial period and the past behaviours of all users, identify those users that will convert to the paid version of the product and those that will not. A model of this decision problem would allow a program to trigger customer interventions to persuade the customer to convert early or better engage in the trial.

Most talked about Real World Problems in 2018

Problem Types

The aim of solving an ML problem is to make predictions (output) based on observations from existing data (input), using its characteristics (features).

The predictions (output) could be

  • a discrete value, such as a category or class
  • a continuous (real) value
  • relationships between different subsets of the given data

We may or may not have the information we want to predict (the label) for the existing data.

Hence, ML problems can be grouped accordingly:

Classification:

A problem where existing data that already has a class/category assigned is used to predict the class/category of new data.

e.g.

  1. spam/non-spam
  2. fraud/non-fraud
  3. analyzing an image to determine if it contains a car or a person
  4. analyzing medical data to determine if a certain person is in a high risk group for a certain disease or not.
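As a minimal sketch of the idea, the snippet below classifies a new email by copying the label of its closest labelled example (a 1-nearest-neighbour rule). The two features (number of links, number of exclamation marks), the toy data and the function names are assumptions made up for illustration; a real spam filter would use far richer features and a properly evaluated model.

```python
# Toy labelled data (made up for illustration): each email is described by
# two features, (number of links, number of exclamation marks), plus a label.
training_data = [
    ((0, 0), "non-spam"),
    ((1, 0), "non-spam"),
    ((1, 1), "non-spam"),
    ((7, 5), "spam"),
    ((8, 6), "spam"),
    ((9, 4), "spam"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(features, labelled_examples):
    """Return the label of the closest labelled example (1-nearest-neighbour)."""
    _, label = min(
        labelled_examples,
        key=lambda example: squared_distance(features, example[0]),
    )
    return label


print(classify((6, 5), training_data))  # spam
print(classify((0, 1), training_data))  # non-spam
```

The output is always one of the classes already present in the labelled data, which is what makes this a classification problem.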

Regression:

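**Alert:** Regression algorithms are different from regression problems. For example, logistic regression is a regression algorithm by name, but it is used for classification problems.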

Here the prediction is on a continuous scale: the existing data is mapped to real values (think floating point) rather than to discrete values.

e.g.

  1. predicting the stock price of a company
  2. predicting the temperature tomorrow based on historical data.

In both classification and regression, the existing data is labelled with the attribute whose value is being predicted for the new data.
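For the regression case, a minimal sketch is a straight line fitted by ordinary least squares and then used to predict a real value for an unseen input. The temperature readings are made up (and deliberately perfectly linear so the result is easy to check by hand), and `fit_line` is a hypothetical helper name, not something from a library.

```python
# Made-up labelled data: day index -> observed temperature in degrees Celsius.
days = [1, 2, 3, 4, 5]
temperatures = [20.0, 21.0, 22.0, 23.0, 24.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(days, temperatures)

# Predict a continuous value for a day that is not in the existing data.
prediction = slope * 6 + intercept
print(round(prediction, 2))  # 25.0
```

Unlike a classifier, the prediction is not restricted to values already seen in the data; any real number can come out.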

Clustering:

When data is not labelled, the challenge is to divide the data into groups based on similarity and other measures of natural structure in the data.

e.g.

  1. Organising pictures by faces without names, where the human user has to assign names to the groups, the way Google groups your photos based on face similarity and asks you to confirm the picks
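One common way to group unlabelled data by similarity is k-means, sketched below in plain Python. The 2D points stand in for some hypothetical numeric description of each face, and the starting centroids are chosen by hand; both are assumptions for illustration, not how any particular photo service works.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, centroids, iterations=10):
    """Group points around centroids; returns one list of points per cluster."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster
            else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters


# Unlabelled, made-up data: no names or categories are attached to the points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

clusters = k_means(points, centroids=[points[0], points[3]])
for index, cluster in enumerate(clusters):
    print(index, cluster)
```

The algorithm only produces anonymous groups (cluster 0, cluster 1); giving them meaningful names is still left to a human, as in the photo example.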

Rule Extraction:

Data is used as the basis for extracting propositional rules (if-then business rules) that capture statistically supportable relationships between attributes in the data, as opposed to predicting a value.

e.g.

  1. discovery of the relationship between the purchase of beer and diapers (this is data mining folklore; true or not, it's illustrative of the desire and opportunity).
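A rough sketch of how such a rule could be checked against data uses two standard association-rule measures, support and confidence. The shopping baskets below are invented for illustration and say nothing about whether the beer-and-diapers story is true.

```python
# Invented shopping baskets, one set of items per transaction.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"diapers", "milk"},
    {"bread", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule under test: "if beer then diapers".
rule_support = support({"beer", "diapers"}, transactions)
rule_confidence = confidence({"beer"}, {"diapers"}, transactions)

print(rule_support)               # 0.4
print(round(rule_confidence, 2))  # 0.67
```

Here 2 of the 5 baskets contain both items (support 0.4), and 2 of the 3 baskets with beer also contain diapers (confidence about 0.67). A rule-extraction method would search for rules whose support and confidence exceed chosen thresholds.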