9.1.Introduction to Machine Learning - sj50179/IBM-Data-Science-Professional-Certificate GitHub Wiki

In this module, you will learn about applications of Machine Learning in different fields such as health care, banking, telecommunication, and so on. You’ll get a general overview of Machine Learning topics such as supervised vs unsupervised learning, and the usage of each algorithm. You will also understand the advantages of using Python libraries for implementing Machine Learning models.

Learning Objectives

Give examples of Machine Learning in various industries.
Outline the steps machine learning uses to solve problems.
Provide examples of various techniques used in machine learning.
Describe the Python libraries for Machine Learning.
Explain the differences between Supervised and Unsupervised algorithms.
Describe the capabilities of various algorithms.

What is Machine Learning?

Introduction to Machine Learning

What is machine learning?

Machine learning is the subfield of computer science that gives "computers the ability to learn without being explicitly programmed."

Arthur Samuel, American pioneer in the field of computer gaming and artificial intelligence

Machine learning algorithms, inspired by the human learning process, iteratively learn from data, and allow computers to find hidden insights.

Example of machine learning

Machine Learning already shapes many services that people use every day:

Netflix and Amazon use Machine Learning to produce suggestions that you might enjoy. This is similar to how your friends might recommend a television show to you, based on their knowledge of the types of shows you like to watch.
Banks use machine learning to predict the probability of default for each applicant, and then approve or refuse the loan application based on that probability.
Telecommunication companies use their customers’ demographic data to segment them, or predict if they will unsubscribe from their company the next month.

There are many other applications of machine learning that we see in daily life, such as chatbots, logging into our phones using face recognition, or even computer games. Each of these uses different machine learning techniques and algorithms.

Major machine learning techniques

Regression/Estimation
- Predicting continuous values
  - Example: predicting the price of a house based on its characteristics, or estimating the CO2 emissions from a car’s engine
Classification
- Predicting the item class/category of a case
  - Example: if a cell is benign or malignant, or whether or not a customer will churn.
Clustering
- Finding the structure of data; summarization
  - Example: can find similar patients, or can be used for customer segmentation in the banking field
Associations
- Associating frequent co-occurring items/events
  - Example: grocery items that are usually bought together by a particular customer
Anomaly detection
- Discovering abnormal and unusual cases
  - Example: it is used for credit card fraud detection
Sequence mining
- Predicting next events
  - Example: click-stream (Markov Model, HMM)
Dimension Reduction
- Reducing the size of data (PCA)
Recommendation systems
- Recommending items
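
As a small illustration of the regression item in the list above, the following sketch fits a straight line to a handful of house sizes and prices using ordinary least squares. The data values and the names `fit_line` and `predict` are assumptions made up for this example, and a real project would more likely use a library such as scikit-learn.

```python
# Simple linear regression (ordinary least squares) with one feature.
# The house sizes and prices below are made-up illustration data.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Estimate a continuous value for a new case."""
    return slope * x + intercept


sizes = [50.0, 80.0, 100.0, 120.0]      # square metres (hypothetical)
prices = [150.0, 240.0, 300.0, 360.0]   # thousands (hypothetical)

slope, intercept = fit_line(sizes, prices)
estimate = predict(90.0, slope, intercept)
print(slope, intercept, estimate)
```

The output is a continuous number rather than a category, which is what separates regression from classification.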

Difference between artificial intelligence, machine learning, and deep learning

AI components - AI tries to make computers intelligent in order to mimic the cognitive functions of humans
- Computer vision
- Language processing
- Creativity
- etc.
Machine learning - Machine Learning is the branch of AI that covers the statistical part of artificial intelligence. It teaches the computer to solve problems by looking at hundreds or thousands of examples, learning from them, and then using that experience to solve the same problem in new situations.
- Classification
- Clustering
- Neural network
- etc.
Revolution in ML - Deep Learning is a very special field of Machine Learning where computers can actually learn and make intelligent decisions on their own. Deep learning involves a deeper level of automation in comparison with most machine learning algorithms.
- Deep learning

Python for Machine Learning

Python libraries for machine learning

NumPy
- a math library for working with N-dimensional arrays in Python. It lets you do numerical computation efficiently, because operations apply to whole arrays at once instead of looping over elements in plain Python.
- For example, you need to know NumPy for working with arrays, dictionaries, functions, datatypes, and images.
SciPy
- a collection of numerical algorithms and domain specific toolboxes, including signal processing, optimization, statistics and much more. SciPy is a good library for scientific and high performance computation.
matplotlib
- a very popular plotting package that provides 2D plotting, as well as 3D plotting.
pandas
- a very high-level Python library that provides high performance easy to use data structures. It has many functions for data importing, manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and timeseries.
scikit-learn
- a collection of algorithms and tools for machine learning.
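
To make the NumPy item above more concrete, here is a minimal sketch of array arithmetic without explicit loops. It assumes NumPy is installed, and the array values are arbitrary illustration data.

```python
import numpy as np

# A 2-D array (2 rows, 3 columns); values are arbitrary illustration data.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

doubled = a * 2              # element-wise arithmetic, no explicit loop
col_means = a.mean(axis=0)   # mean of each column

print(a.shape)
print(doubled)
print(col_means)
```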

More about scikit-learn

Free software machine learning library
Classification, Regression, and Clustering algorithms
Works with NumPy and SciPy
Great documentation
Easy to implement

Most of the tasks that need to be done in a machine learning pipeline are implemented already in scikit-learn including pre-processing of data, feature selection, feature extraction, train test splitting, defining the algorithms, fitting models, tuning parameters, prediction, evaluation, and exporting the model.

scikit-learn functions

Basically, machine-learning algorithms benefit from standardization of the dataset. If your dataset contains outliers or fields on different scales, you have to fix them.

The pre-processing package of scikit-learn provides several common utility functions and transformer classes to change raw feature vectors into a form suitable for modeling.

from sklearn import preprocessing
X = preprocessing.StandardScaler().fit(X).transform(X)
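
To show what standardization does to a single feature, the sketch below computes z-scores by hand with the standard library. It is not scikit-learn's implementation, only the same arithmetic: subtract the mean and divide by the population standard deviation. The `standardize` helper and the `ages` column are assumptions for illustration.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


ages = [20.0, 30.0, 40.0, 50.0, 60.0]   # hypothetical feature column
z = standardize(ages)
print(z)
```

After this step the column has mean 0 and standard deviation 1, so features measured on different scales become comparable.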

You have to split your dataset into train and test sets to train your model and then test the model's accuracy separately.

scikit-learn can split arrays or matrices into random train and test subsets for you in one line of code.

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33)

Then you can set up your algorithm. For example, you can build a classifier using a support vector classification algorithm.

We call our estimator instance clf and initialize its parameters.

from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100.)

Now you can train your model by passing the training set to the fit method. The clf model then learns to classify unknown cases.

clf.fit(X_train, Y_train)

Then we can use our test set to run predictions, and the result tells us what the class of each unknown value is.

yhat = clf.predict(X_test)

You can also use different metrics to evaluate your model's accuracy.

For example, you can use a confusion matrix to show the results.

from sklearn.metrics import confusion_matrix
print(confusion_matrix(Y_test, yhat, labels=[1, 0]))
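
To clarify how such a matrix is laid out, the following sketch counts the cells by hand. It assumes the usual convention that rows follow the true label and columns follow the predicted label, both in the order given by `labels`. The helper `confusion_counts` and the two label lists are hypothetical.

```python
def confusion_counts(y_true, y_pred, labels):
    """Rows follow the true label, columns the predicted label, both in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


y_true = [1, 1, 1, 0, 0, 0, 0]   # hypothetical actual classes
y_pred = [1, 1, 0, 0, 0, 0, 1]   # hypothetical predictions

cm = confusion_counts(y_true, y_pred, labels=[1, 0])
print(cm)
```

With `labels=[1, 0]`, the top-left cell counts cases of class 1 that were predicted as class 1, and the bottom-right cell counts cases of class 0 that were predicted as class 0.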

And finally, you save your model.

import pickle
s = pickle.dumps(clf)

The most important point to remember is that the entire process of a machine learning task can be done in a few lines of code using scikit-learn. It is possible to do all of this using only the NumPy or SciPy packages, but it would not be as easy. And of course, it takes much more code if you implement all of these tasks in pure Python.
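
Put together, the steps above might look like the following end-to-end sketch. It assumes scikit-learn is installed. The synthetic data from `make_classification`, the `random_state=0` arguments, and the `restored` variable are additions for illustration and are not part of the original walkthrough.

```python
import pickle

from sklearn import preprocessing, svm
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic two-class data stands in for a real dataset (assumption).
X, Y = make_classification(n_samples=200, n_features=5, random_state=0)

# Standardize the features to zero mean and unit variance.
X = preprocessing.StandardScaler().fit(X).transform(X)

# Hold out a third of the rows for testing.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=0
)

# Set up and train a support vector classifier.
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, Y_train)

# Predict the held-out rows and summarize the results.
yhat = clf.predict(X_test)
cm = confusion_matrix(Y_test, yhat, labels=[1, 0])
print(cm)

# Save the model to bytes, then load it back and check it behaves the same.
s = pickle.dumps(clf)
restored = pickle.loads(s)
same = (restored.predict(X_test) == yhat).all()
print(same)
```

The sketch keeps the same order as the walkthrough and scales the data before splitting. A common alternative is to fit the scaler on the training set only, so that no information from the test set reaches the model.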

Supervised vs Unsupervised

What is supervised learning?

How do we supervise a machine learning model?
- We do this by "teaching the model"; that is, we load the model with knowledge so that it can predict future instances.
How exactly do we teach a model?
- We teach the model by training it with some data from a labeled dataset. It's important to note that the data is labeled.
What does a labeled dataset look like?
- Take the cancer dataset as an example: it holds historical data for patients, and the class of each row is already known.

Types of supervised learning

Classification is the process of predicting discrete class labels or categories.
Regression is the process of predicting continuous values.

What is unsupervised learning?

We do not supervise the model, but we let the model work on its own to discover information that may not be visible to the human eye.
The unsupervised algorithm trains on the dataset, and draws conclusions on unlabeled data.
Generally speaking, unsupervised learning has more difficult algorithms than supervised learning since we know little to no information about the data, or the outcomes that are to be expected.
Unsupervised learning techniques:
- Dimension reduction
  - Dimensionality reduction, and/or feature selection, play a large role in this by reducing redundant features to make the classification easier.
- Density estimation
  - Density estimation is a very simple concept that is mostly used to explore the data to find some structure within it.
- Market basket analysis
  - Market basket analysis is a modeling technique based upon the theory that if you buy a certain group of items, you're more likely to buy another group of items.
- Clustering
  - Clustering is considered to be one of the most popular unsupervised machine learning techniques used for grouping data points, or objects that are somehow similar.
  - Cluster analysis has many applications in different domains, whether it be a bank's desire to segment its customers based on certain characteristics, or helping an individual to organize and group his or her favorite types of music.
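
As a rough illustration of the market basket analysis item in the list above, the sketch below counts how often pairs of items appear in the same basket. The baskets are hypothetical, and counting pairs is only the simplest starting point, not a full association-rule method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per purchase.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

No labels are involved here: the pattern comes from the purchases alone, which is why this technique counts as unsupervised.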

What is clustering?

Clustering is grouping of data points or objects that are somehow similar by:
- Discovering structure
- Summarization
- Anomaly detection
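
One way to see grouping by similarity in action is the sketch below, which uses k-means on one-dimensional data. The text above does not name k-means; it is chosen here as one common clustering algorithm. The function `kmeans_1d`, the customer ages, and the starting centers are assumptions for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    centers = list(centers)
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


ages = [21, 23, 25, 60, 62, 64]   # hypothetical customer ages
centers, groups = kmeans_1d(ages, centers=[20.0, 30.0])
print(centers)
print(groups)
```

The two groups summarize six data points with two representative values, which matches the structure discovery and summarization uses listed above.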

Supervised vs Unsupervised learning

Supervised Learning

Classification:
- Classifies labeled data
Regression:
- Predicts trends using previous labeled data
Has more evaluation methods than unsupervised learning
Controlled environment

Unsupervised Learning

Clustering:
- Finds patterns and groupings from unlabeled data
- Has fewer evaluation methods than supervised learning
- Less controlled environment

Intro to Machine Learning

TOTAL POINTS 9

Question 1

Supervised learning deals with unlabeled data, while unsupervised learning deals with labelled data.

~~True~~
False

Correct

Question 2

The "Regression" technique in Machine Learning is a group of algorithms that are used for:

Predicting a continuous value; for example predicting the price of a house based on its characteristics.
~~Prediction of class/category of a case; for example a cell is benign or malignant, or a customer will churn or not.~~
~~Finding items/events that often co-occur; for example grocery items that are usually bought together by a customer.~~

Correct

Question 3

When comparing Supervised with Unsupervised learning, is this sentence True or False?

In contrast to Supervised learning, Unsupervised learning has more models and more evaluation methods that can be used in order to ensure the outcome of the model is accurate.

~~True~~
False

Correct
