Inductive vs Analytical Learning
Abstract
The machine learning procedure follows the scientific paradigm of induction and deduction. In the inductive step we learn a model from raw data (the so-called training set), and in the deductive step the model is used to predict the behavior of new data. (The word "prediction" is used loosely here, because the model itself, e.g. a Bayesian network, can consist of two kinds of statements: hypotheses and evidence. We can predict the evidence (facts) when the hypotheses are given, or infer the most probable explanation when the diagnosis of possible causes is probabilistic; both of these tasks are deductive uses of the model.)
Introduction
Inductive learning
Inductive learning is the process of learning by example, where a system or machine tries to induce a general rule from a set of training data or observations.
Given:
- Instance space X
- Hypothesis space H
- Training examples D of some target function f: D = {⟨x1, f(x1)⟩, …, ⟨xn, f(xn)⟩}

Determine:
- A hypothesis from H consistent with the training examples D.
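To make this formulation concrete, here is a minimal, illustrative Python sketch (not from the original text): the hypothesis space is a tiny set of conjunctive attribute constraints, and the attribute names and toy data are invented for the illustration.

```python
# Toy illustration of the inductive learning problem: enumerate a small
# hypothesis space H of conjunctive constraints and return every hypothesis
# consistent with the training examples D.
from itertools import product

# D = {<x, f(x)>}: each x is (Sky, Temperature, Humidity), f(x) is True/False.
D = [
    (("Sunny", "Warm", "Normal"), True),
    (("Sunny", "Warm", "High"),   True),
    (("Rainy", "Cold", "High"),   False),
]

# H: one constraint per attribute, either a required value or "?" (don't care).
value_choices = [sorted({x[i] for x, _ in D}) + ["?"] for i in range(3)]
H = list(product(*value_choices))

def predicts_positive(h, x):
    """h classifies x as positive iff every non-'?' constraint is matched."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, D):
    """h is consistent with D iff it agrees with f(x) on every training example."""
    return all(predicts_positive(h, x) == fx for x, fx in D)

# Determine: hypotheses from H consistent with D.
print([h for h in H if consistent(h, D)])
# e.g. ('Sunny', '?', '?'), ('Sunny', 'Warm', '?'), ('?', 'Warm', '?'), ...
```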
Analytic learning
Analytical learning stems from the idea that when not enough training examples are provided, it may be possible to “replace” the “missing” examples by prior knowledge and deductive reasoning.
Motivation for Analytic learning
- Using prior knowledge and deductive reasoning to augment the information given by the training examples
- Explanation-based learning
- Two variants: the background knowledge is, or is not, complete and correct
- Reach higher accuracy with fewer examples
Given:
- Instance space X
- Hypothesis space H
- Training examples D of some target function f: D = {⟨x1, f(x1)⟩, …, ⟨xn, f(xn)⟩}
- Domain theory B for explaining the training examples

Determine:
- A hypothesis from H consistent with both the training examples D and the domain theory B.
We say B "explains" ⟨x, f(x)⟩ if x ∧ B ⊢ f(x), and h is "consistent with" B if B does not entail ¬h.
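As a minimal sketch of what "B explains ⟨x, f(x)⟩" means, the toy Python code below treats the domain theory as propositional Horn clauses and checks entailment by forward chaining; all rule and fact names are invented for the illustration.

```python
# Toy illustration of "B explains <x, f(x)>": the facts describing x, together
# with the domain theory B, entail f(x). B is a list of propositional Horn
# clauses (head, body); entailment is checked by forward chaining.

def entails(facts, rules, goal):
    """Return True if facts plus rules derive goal (propositional forward chaining)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return goal in known

# Domain theory B (toy rules).
B = [
    ("Lighter",     ["LowDensity", "SmallVolume"]),
    ("SafeToStack", ["Lighter"]),
]

# One training example: description of x, with target concept f(x) = SafeToStack.
x = ["LowDensity", "SmallVolume", "Red"]
print(entails(x, B, "SafeToStack"))   # True: B "explains" this positive example
```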
Inductive and Analytical Learning
| | Inductive learning | Analytical learning |
|---|---|---|
| Goal | Hypothesis fits data | Hypothesis fits domain theory |
| Justification | Statistical inference | Deductive inference |
| Advantages | Requires little prior knowledge | Learns from scarce data |
| Pitfalls | Scarce data, incorrect bias | Imperfect domain theory |
History/evolution of the approach with related work
Science courses are traditionally taught deductively. The instructor first teaches students relevant theory and mathematical models, then moves on to textbook exercises, and eventually—maybe—gets to real-world applications. Often the only motivation students have to learn the material, beyond grades, is the vague promise that it will be important later in the curriculum or in their careers. Failure to connect course content to the real world has repeatedly been shown to contribute to students leaving the sciences (Seymour and Hewitt 1997; Kardash and Wallace 2001).
A better way to motivate students is inductive teaching, in which the instructor begins by presenting students with a specific challenge, such as experimental data to interpret, a case study to analyze, or a complex real-world problem to solve. Students grappling with these challenges quickly recognize the need for facts, skills, and conceptual understanding, at which point the teacher provides instruction or helps students learn on their own. Bransford, Brown, and Cocking (2000) survey extensive neurological and psychological research that provides strong support for inductive teaching methods. The literature also demonstrates that inductive methods encourage students to adopt a deep approach to learning (Ramsden 2003; Norman and Schmidt 1992; Coles 1985) and that the challenges provided by inductive methods serve as precursors to intellectual development (Felder and Brent 2004).
Algorithm with mathematical notations and diagram
ID3 Algorithm. Ross Quinlan created the Iterative Dichotomiser 3 (ID3) algorithm in 1986. It is based on Hunt's algorithm and is a simple decision tree learning algorithm that classifies objects through an iterative, inductive approach. The tree is built by a top-down search: at each node, every remaining attribute is examined and the one that best splits the data is selected. The metric used for attribute selection, which is the central part of classifying a given set, is information gain. Information gain measures how relevant each "question" (attribute test) is, which minimizes the number of questions needed to classify the learning set; ID3 chooses the splitting attribute with the highest information gain. The idea of measuring information by entropy goes back to Claude Shannon in 1948.

ID3 has a bias towards the trees it generates: it prefers shorter trees, with low-entropy (high-gain) attributes placed near the top of the tree. When building tree models, ID3 accepts only categorical attributes, and it constructs the decision tree serially. In the presence of noise, however, ID3 does not give accurate results, so the data has to be thoroughly preprocessed before it is used for building the tree model. The resulting decision trees are mostly used for decision-making purposes.
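The entropy and information gain measures referred to above are usually defined as follows (standard formulas; the symbols S, A, p_i and S_v are introduced here and do not appear in the original text):

Entropy(S) = − Σ_i p_i log₂ p_i

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

where p_i is the proportion of examples in S belonging to class i, and S_v is the subset of S for which attribute A takes value v. Choosing the attribute with the highest gain is equivalent to choosing the split with the lowest weighted entropy.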
ID3 algorithm as presented
1. For each uncategorized attribute, compute its entropy with respect to the categorized attribute (the conclusion).
2. Select the attribute with the lowest entropy (equivalently, the highest information gain).
3. Divide the data into sets according to the attribute's values. For example, if the attribute "Size" is chosen and its values are "big", "medium" and "small", three sets are created, one per value.
4. Construct a tree with one branch per set. For the example above, three branches are created: "big", "medium" and "small".
5. Repeat step 1 for each branch, removing the already selected attribute and using only the data that falls into that branch's set.
6. Stop when there are no more attributes to consider, or when all the data in a set has the same conclusion, for example when every record has "Result" = yes. (A Python sketch of these steps is given below.)
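The following sketch illustrates the steps above on an invented toy data set (the attribute names echo the "Size"/"Result" wording of the example); it is a simplified illustration, not a production ID3 implementation.

```python
# Toy ID3 sketch following the steps above.
from collections import Counter
from math import log2

def entropy(rows, target):
    """Entropy of the target attribute (the "conclusion") over a set of rows."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def split_entropy(rows, attr, target):
    """Weighted entropy of the target after splitting on attr (lower is better)."""
    total = len(rows)
    result = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        result += (len(subset) / total) * entropy(subset, target)
    return result

def id3(rows, attrs, target):
    classes = {r[target] for r in rows}
    if len(classes) == 1:                 # all data in this set share one conclusion
        return classes.pop()
    if not attrs:                         # no attributes left: return the majority class
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    best = min(attrs, key=lambda a: split_entropy(rows, a, target))
    tree = {best: {}}
    for v in {r[best] for r in rows}:     # one branch per value of the chosen attribute
        subset = [r for r in rows if r[best] == v]
        tree[best][v] = id3(subset, [a for a in attrs if a != best], target)
    return tree

data = [
    {"Size": "big",    "Color": "red",  "Result": "yes"},
    {"Size": "big",    "Color": "blue", "Result": "yes"},
    {"Size": "medium", "Color": "red",  "Result": "no"},
    {"Size": "small",  "Color": "blue", "Result": "no"},
]
print(id3(data, ["Size", "Color"], "Result"))
# -> {'Size': {'big': 'yes', 'medium': 'no', 'small': 'no'}}
```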
Prolog-EBG
- Prolog-EBG produces justified general hypotheses.
- The explanation of how the examples satisfy the target concept determines which example attributes are relevant: those mentioned in the explanation.
- Regressing the target concept to determine its weakest preimage allows more general constraints on the values of the relevant features to be derived.
- Each learned Horn clause corresponds to a sufficient condition for satisfying the target concept.
- The generality of the learned Horn clauses depends on the formulation of the domain theory and on the sequence in which the training examples are presented.
- Prolog-EBG implicitly assumes that the domain theory is correct and complete.
Prolog-EBG(TargetConcept, Examples, DomainTheory)
- LearnedRules ← {}
- Pos ← the positive examples from Examples
- For each PositiveExample in Pos that is not covered by LearnedRules, do:
  1. Explain: Explanation ← an explanation (proof), in terms of DomainTheory, that PositiveExample satisfies TargetConcept
  2. Analyze: SufficientConditions ← the most general set of features of PositiveExample that satisfy TargetConcept according to Explanation
  3. Refine: LearnedRules ← LearnedRules + NewHornClause, where NewHornClause is of the form TargetConcept ← SufficientConditions
- Return LearnedRules
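Below is a heavily simplified, propositional Python sketch of the Explain/Analyze/Refine loop. Real Prolog-EBG works with first-order Horn clauses and computes the weakest preimage by regressing the target concept through the proof; here that step is approximated by collecting the example facts actually used in a propositional proof, and all rule and fact names are invented for the illustration.

```python
# Heavily simplified, propositional sketch of the Explain/Analyze/Refine loop.

def explain(facts, rules, goal, used):
    """Backward chaining; records the example facts used in the proof."""
    if goal in facts:
        used.add(goal)
        return True
    for head, body in rules:
        if head == goal and all(explain(facts, rules, g, used) for g in body):
            return True
    return False

def covered(facts, learned_rules):
    """An example is covered if some learned rule body holds for its facts."""
    return any(all(f in facts for f in body) for body in learned_rules)

def prolog_ebg(target_concept, positive_examples, domain_theory):
    learned_rules = []                                # each rule body is a set of facts
    for example in positive_examples:
        if covered(example, learned_rules):
            continue
        used = set()
        if explain(example, domain_theory, target_concept, used):   # 1. Explain
            sufficient_conditions = used                             # 2. Analyze
            learned_rules.append(sufficient_conditions)              # 3. Refine
    return learned_rules

domain_theory = [
    ("SafeToStack", ["Lighter"]),
    ("Lighter",     ["LowDensity", "SmallVolume"]),
]
positives = [
    ["LowDensity", "SmallVolume", "Red", "OwnedByFred"],
    ["LowDensity", "SmallVolume", "Blue"],
]
print(prolog_ebg("SafeToStack", positives, domain_theory))
# -> [{'LowDensity', 'SmallVolume'}], i.e. TargetConcept <- LowDensity ∧ SmallVolume
```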
Example problem with the solution
SafeToStack(x, y) Learning Problem

Given:
- Instances: pairs of physical objects
- Hypotheses: sets of Horn clause rules, e.g.,
  SafeToStack(x, y) ← Volume(x, vx) ∧ Type(y, Box)
- Training examples: a typical example is SafeToStack(Obj1, Obj2), described by
  On(Obj1, Obj2), Type(Obj1, Box), Type(Obj2, Endtable), Color(Obj1, Red), Owner(Obj1, Fred), Owner(Obj2, Louise), Density(Obj1, 0.3), Material(Obj1, Cardbd)
- Domain theory:
  SafeToStack(x, y) ← ¬Fragile(y)
  SafeToStack(x, y) ← Lighter(x, y)
  Lighter(x, y) ← Wt(x, wx) ∧ Wt(y, wy) ∧ Less(wx, wy)

Determine: A hypothesis from H consistent with the training examples and the domain theory.
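As a sketch of how the solution is obtained (the domain theory listed here is abbreviated; Mitchell's full version of this problem also includes rules deriving an object's weight from its volume, density and type), the explanation of the training example uses the second and third rules:

SafeToStack(Obj1, Obj2)
  ← Lighter(Obj1, Obj2)                                (rule 2)
  ← Wt(Obj1, w1) ∧ Wt(Obj2, w2) ∧ Less(w1, w2)         (rule 3)

Regressing SafeToStack(x, y) through this explanation (the Analyze step of Prolog-EBG) keeps only the features the proof actually depends on, giving a justified Horn clause of the general form

SafeToStack(x, y) ← Wt(x, wx) ∧ Wt(y, wy) ∧ Less(wx, wy)

which is further specialized by however the weights wx and wy are derived from the example's attributes (e.g. from Density, Volume and Type in the full domain theory).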
Applications
Credit risk assessment
X: Properties of the customer and the proposed purchase
F(X): Approve the purchase or not
Disease diagnosis
X: Properties of the patient (symptoms, lab tests)
F(X): Disease (possibly with a recommended therapy)
Face recognition
X: Bitmap picture of a person's face
F(X): Name of the person
Automatic steering
X: Bitmap picture of the road surface in front of the car
F(X): Degrees to turn the vehicle
Discussion
Both inductive and analytical learning mechanisms will be needed to cover the range of learning exhibited by humans and other intelligent systems. Analytical mechanisms are required in order to scale up to learning complex concepts, and to handle situations in which available training data is limited. Inductive mechanisms are required in order to learn in situations where prior knowledge is incomplete or incorrect.
Explanation-based neural network (EBNN) learning provides a robust combination of inductive and analytical learning. Experimental results demonstrate that EBNN can learn to control a mobile robot, from noisy data including vision, sonar, and laser range sensors, and based on approximate knowledge that was previously learned by the robot itself. Given strong prior knowledge, EBNN learns from considerably less data than pure induction (exemplified by the neural network Backpropagation algorithm). As the accuracy of this prior knowledge decreases, EBNN’s ability to generalize degrades gracefully until it reaches the same level of performance as pure induction.
Landmark research papers in the topic
- A theory and methodology of inductive learning: http://www.sciencedirect.com/science/article/pii/0004370283900164
- Inductive learning algorithms and representations for text categorization: https://dl.acm.org/citation.cfm?id=288651
- Class-dependent discretization for inductive learning from continuous and mixed-mode data: http://ieeexplore.ieee.org/abstract/document/391407/
- Machine learning in automated text categorization: https://dl.acm.org/citation.cfm?id=505283
- Collaborative Learning: Cognitive and Computational Approaches. Advances in Learning and Instruction Series: https://eric.ed.gov/?id=ED437928
Resources
Books
- Machine Learning by Tom M. Mitchell
- Machine Learning: Fundamental Algorithms for Supervised and Unsupervised Learning With Real-World Applications by Joshua Chapmann
- Machine Learning: An Introduction To Supervised & Unsupervised Learning Algorithms by Michael Colins
- Markov Models Supervised and Unsupervised Machine Learning: Mastering Data Science & Python by William Sullivan
- Machine Learning for Absolute Beginners: A Plain English Introduction by Oliver Theobald
Code:
Relevant Software
- Weka: https://www.cs.waikato.ac.nz/ml/weka/
- Rattle GUI: https://cran.r-project.org/package=rattle
- KNIME: https://www.knime.com/downloads
- Wolfram Alpha : https://www.wolframalpha.com/
Tutorials:
- https://pdfs.semanticscholar.org/61d0/e02d025635d7befc9fff9e9ee362289e0461.pdf
- https://link.springer.com/article/10.1007/BF00849079
Video lectures:
- https://www.youtube.com/watch?v=WYYe93__FpI
- https://www.youtube.com/watch?v=pqXASFHUfhs&t=130s
- https://www.youtube.com/watch?v=-RlLVQYhJt8
- https://youtu.be/EWmCkVfPnJ8
References
- https://link.springer.com/article/10.1007/BF00849079
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.64.3240&rep=rep1&type=pdf
- https://www.d.umn.edu/~rmaclin/cs5751/notes/Chapter12-6PerPage.pdf
- KBANN paper: http://citeseer.nj.nec.com/towell94knowledgebased.html
- TangentProp paper: http://research.microsoft.com/~patrice/PS/tang_prop.ps
- EBNN paper: http://citeseer.nj.nec.com/mitchell92explanationbased.html
- http://citeseer.nj.nec.com/pazzani92utility.html