Inductive vs Analytical Learning
Abstract
The machine learning procedure follows the scientific paradigm of induction and deduction. In the inductive step we learn a model from raw data (the so-called training set), and in the deductive step the model is used to predict the behavior of new data. (The word "prediction" is used loosely here, because the model itself, e.g. a Bayesian network, can consist of two kinds of statements: hypotheses and evidence. We can predict the evidence (facts) when the hypotheses are given, or infer the most probable explanation when the diagnosis of possible causes is probabilistic; both of these tasks are deductive uses of the model.)
Introduction
Inductive learning
Inductive learning is the process of learning by example, where a system or machine tries to induce a general rule from a set of training data or observations.
Given:
- Instance space X
- Hypothesis space H
- Training examples D of some target function f: D = {⟨x1, f(x1)⟩, …, ⟨xn, f(xn)⟩}

Determine:
- A hypothesis from H consistent with the training examples D.
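To make this formulation concrete, here is a minimal, illustrative Python sketch (not from the original text): the hypothesis space is a tiny set of conjunctive attribute constraints, and the attribute names and toy data are invented for the illustration.

```python
# Toy illustration of the inductive learning problem: enumerate a small
# hypothesis space H of conjunctive constraints and return every hypothesis
# consistent with the training examples D.
from itertools import product

# D = {<x, f(x)>}: each x is (Sky, Temperature, Humidity), f(x) is True/False.
D = [
    (("Sunny", "Warm", "Normal"), True),
    (("Sunny", "Warm", "High"),   True),
    (("Rainy", "Cold", "High"),   False),
]

# H: one constraint per attribute, either a required value or "?" (don't care).
value_choices = [sorted({x[i] for x, _ in D}) + ["?"] for i in range(3)]
H = list(product(*value_choices))

def predicts_positive(h, x):
    """h classifies x as positive iff every non-'?' constraint is matched."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, D):
    """h is consistent with D iff it agrees with f(x) on every training example."""
    return all(predicts_positive(h, x) == fx for x, fx in D)

# Determine: hypotheses from H consistent with D.
print([h for h in H if consistent(h, D)])
# e.g. ('Sunny', '?', '?'), ('Sunny', 'Warm', '?'), ('?', 'Warm', '?'), ...
```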
Analytic learning
Analytical learning stems from the idea that when not enough training examples are provided, it may be possible to “replace” the “missing” examples by prior knowledge and deductive reasoning.
Motivation for Analytic learning
- Using prior knowledge and deductive reasoning to augment the information given by the training examples
- Explanation-based learning
- Two variants: the background knowledge is, or is not, complete and correct
- Reach higher accuracy with fewer examples
Given:
- Instance space X
- Hypothesis space H
- Training examples D of some target function f: D = {⟨x1, f(x1)⟩, …, ⟨xn, f(xn)⟩}
- Domain theory B for explaining the training examples

Determine:
- A hypothesis from H consistent with both the training examples D and the domain theory B.
We say B "explains" ⟨x, f(x)⟩ if x ∧ B ⊢ f(x), and h is "consistent with" B if B does not entail ¬h.
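As a minimal sketch of what "B explains ⟨x, f(x)⟩" means, the toy Python code below treats the domain theory as propositional Horn clauses and checks entailment by forward chaining; all rule and fact names are invented for the illustration.

```python
# Toy illustration of "B explains <x, f(x)>": the facts describing x, together
# with the domain theory B, entail f(x). B is a list of propositional Horn
# clauses (head, body); entailment is checked by forward chaining.

def entails(facts, rules, goal):
    """Return True if facts plus rules derive goal (propositional forward chaining)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return goal in known

# Domain theory B (toy rules).
B = [
    ("Lighter",     ["LowDensity", "SmallVolume"]),
    ("SafeToStack", ["Lighter"]),
]

# One training example: description of x, with target concept f(x) = SafeToStack.
x = ["LowDensity", "SmallVolume", "Red"]
print(entails(x, B, "SafeToStack"))   # True: B "explains" this positive example
```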
Inductive and Analytical Learning
| | Inductive learning | Analytical learning |
|---|---|---|
| Goal | Hypothesis fits data | Hypothesis fits domain theory |
| Justification | Statistical inference | Deductive inference |
| Advantages | Requires little prior knowledge | Learns from scarce data |
| Pitfalls | Scarce data, incorrect bias | Imperfect domain theory |
History/evolution of the approach with related work
Science courses are traditionally taught deductively. The instructor first teaches students relevant theory and mathematical models, then moves on to textbook exercises, and eventually—maybe—gets to real-world applications. Often the only motivation students have to learn the material, beyond grades, is the vague promise that it will be important later in the curriculum or in their careers. Failure to connect course content to the real world has repeatedly been shown to contribute to students leaving the sciences (Seymour and Hewitt 1997; Kardash and Wallace 2001).
A better way to motivate students is inductive teaching, in which the instructor begins by presenting students with a specific challenge, such as experimental data to interpret, a case study to analyze, or a complex real-world problem to solve. Students grappling with these challenges quickly recognize the need for facts, skills, and conceptual understanding, at which point the teacher provides instruction or helps students learn on their own. Bransford, Brown, and Cocking (2000) survey extensive neurological and psychological research that provides strong support for inductive teaching methods. The literature also demonstrates that inductive methods encourage students to adopt a deep approach to learning (Ramsden 2003; Norman and Schmidt 1992; Coles 1985) and that the challenges provided by inductive methods serve as precursors to intellectual development (Felder and Brent 2004).
Algorithm with mathematical notations and diagram
ID3 Algorithm. Ross Quinlan created the Iterative Dichotomiser 3 (ID3) algorithm in 1986. It is based on Hunt's algorithm and is a simple decision tree learning algorithm that classifies objects through an iterative, inductive approach. The tree is built by a top-down search: at each node, every remaining attribute is examined and the one that best splits the data is selected. The metric used for attribute selection, which is the central part of classifying a given set, is information gain. Information gain measures how relevant each "question" (attribute test) is, which minimizes the number of questions needed to classify the learning set; ID3 chooses the splitting attribute with the highest information gain. The idea of measuring information by entropy goes back to Claude Shannon in 1948.

ID3 has a bias towards the trees it generates: it prefers shorter trees, with low-entropy (high-gain) attributes placed near the top of the tree. When building tree models, ID3 accepts only categorical attributes, and it constructs the decision tree serially. In the presence of noise, however, ID3 does not give accurate results, so the data has to be thoroughly preprocessed before it is used for building the tree model. The resulting decision trees are mostly used for decision-making purposes.
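The entropy and information gain measures referred to above are usually defined as follows (standard formulas; the symbols S, A, p_i and S_v are introduced here and do not appear in the original text):

Entropy(S) = − Σ_i p_i log₂ p_i

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

where p_i is the proportion of examples in S belonging to class i, and S_v is the subset of S for which attribute A takes value v. Choosing the attribute with the highest gain is equivalent to choosing the split with the lowest weighted entropy.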
ID3 algorithm as presented
1. For each uncategorized attribute, compute its entropy with respect to the categorized attribute (the conclusion).
2. Select the attribute with the lowest entropy (equivalently, the highest information gain).
3. Divide the data into sets according to the attribute's values. For example, if the attribute "Size" is chosen and its values are "big", "medium" and "small", three sets are created, one per value.
4. Construct a tree with one branch per set. For the example above, three branches are created: "big", "medium" and "small".
5. Repeat step 1 for each branch, removing the already selected attribute and using only the data that falls into that branch's set.
6. Stop when there are no more attributes to consider, or when all the data in a set has the same conclusion, for example when every record has "Result" = yes. (A Python sketch of these steps is given below.)
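The following sketch illustrates the steps above on an invented toy data set (the attribute names echo the "Size"/"Result" wording of the example); it is a simplified illustration, not a production ID3 implementation.

```python
# Toy ID3 sketch following the steps above.
from collections import Counter
from math import log2

def entropy(rows, target):
    """Entropy of the target attribute (the "conclusion") over a set of rows."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def split_entropy(rows, attr, target):
    """Weighted entropy of the target after splitting on attr (lower is better)."""
    total = len(rows)
    result = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        result += (len(subset) / total) * entropy(subset, target)
    return result

def id3(rows, attrs, target):
    classes = {r[target] for r in rows}
    if len(classes) == 1:                 # all data in this set share one conclusion
        return classes.pop()
    if not attrs:                         # no attributes left: return the majority class
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    best = min(attrs, key=lambda a: split_entropy(rows, a, target))
    tree = {best: {}}
    for v in {r[best] for r in rows}:     # one branch per value of the chosen attribute
        subset = [r for r in rows if r[best] == v]
        tree[best][v] = id3(subset, [a for a in attrs if a != best], target)
    return tree

data = [
    {"Size": "big",    "Color": "red",  "Result": "yes"},
    {"Size": "big",    "Color": "blue", "Result": "yes"},
    {"Size": "medium", "Color": "red",  "Result": "no"},
    {"Size": "small",  "Color": "blue", "Result": "no"},
]
print(id3(data, ["Size", "Color"], "Result"))
# -> {'Size': {'big': 'yes', 'medium': 'no', 'small': 'no'}}
```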
Prolog-EBG
- Prolog-EBG produces justified general hypotheses.
- The explanation of how the examples satisfy the target concept determines which example attributes are relevant: those mentioned in the explanation.
- Regressing the target concept to determine its weakest preimage allows more general constraints on the values of the relevant features to be derived.
- Each learned Horn clause corresponds to a sufficient condition for satisfying the target concept.
- The generality of the learned Horn clauses depends on the formulation of the domain theory and on the sequence in which the training examples are presented.
- Prolog-EBG implicitly assumes that the domain theory is correct and complete.
Prolog-EBG(TargetConcept, Examples, DomainTheory)
- LearnedRules ← {}
- Pos ← the positive examples from Examples
- For each PositiveExample in Pos that is not covered by LearnedRules, do:
  1. Explain: Explanation ← an explanation (proof), in terms of DomainTheory, that PositiveExample satisfies TargetConcept
  2. Analyze: SufficientConditions ← the most general set of features of PositiveExample that satisfy TargetConcept according to Explanation
  3. Refine: LearnedRules ← LearnedRules + NewHornClause, where NewHornClause is of the form TargetConcept ← SufficientConditions
- Return LearnedRules
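Below is a heavily simplified, propositional Python sketch of the Explain/Analyze/Refine loop. Real Prolog-EBG works with first-order Horn clauses and computes the weakest preimage by regressing the target concept through the proof; here that step is approximated by collecting the example facts actually used in a propositional proof, and all rule and fact names are invented for the illustration.

```python
# Heavily simplified, propositional sketch of the Explain/Analyze/Refine loop.

def explain(facts, rules, goal, used):
    """Backward chaining; records the example facts used in the proof."""
    if goal in facts:
        used.add(goal)
        return True
    for head, body in rules:
        if head == goal and all(explain(facts, rules, g, used) for g in body):
            return True
    return False

def covered(facts, learned_rules):
    """An example is covered if some learned rule body holds for its facts."""
    return any(all(f in facts for f in body) for body in learned_rules)

def prolog_ebg(target_concept, positive_examples, domain_theory):
    learned_rules = []                                # each rule body is a set of facts
    for example in positive_examples:
        if covered(example, learned_rules):
            continue
        used = set()
        if explain(example, domain_theory, target_concept, used):   # 1. Explain
            sufficient_conditions = used                             # 2. Analyze
            learned_rules.append(sufficient_conditions)              # 3. Refine
    return learned_rules

domain_theory = [
    ("SafeToStack", ["Lighter"]),
    ("Lighter",     ["LowDensity", "SmallVolume"]),
]
positives = [
    ["LowDensity", "SmallVolume", "Red", "OwnedByFred"],
    ["LowDensity", "SmallVolume", "Blue"],
]
print(prolog_ebg("SafeToStack", positives, domain_theory))
# -> [{'LowDensity', 'SmallVolume'}], i.e. TargetConcept <- LowDensity ∧ SmallVolume
```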
Example problem with the solution
SafeToStack(x, y) Learning Problem

Given:
- Instances: pairs of physical objects
- Hypotheses: sets of Horn clause rules, e.g.,
  SafeToStack(x, y) ← Volume(x, vx) ∧ Type(y, Box)
- Training examples: a typical example is SafeToStack(Obj1, Obj2), described by
  On(Obj1, Obj2), Type(Obj1, Box), Type(Obj2, Endtable), Color(Obj1, Red), Owner(Obj1, Fred), Owner(Obj2, Louise), Density(Obj1, 0.3), Material(Obj1, Cardbd)
- Domain theory:
  SafeToStack(x, y) ← ¬Fragile(y)
  SafeToStack(x, y) ← Lighter(x, y)
  Lighter(x, y) ← Wt(x, wx) ∧ Wt(y, wy) ∧ Less(wx, wy)

Determine: A hypothesis from H consistent with the training examples and the domain theory.
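As a sketch of how the solution is obtained (the domain theory listed here is abbreviated; Mitchell's full version of this problem also includes rules deriving an object's weight from its volume, density and type), the explanation of the training example uses the second and third rules:

SafeToStack(Obj1, Obj2)
  ← Lighter(Obj1, Obj2)                                (rule 2)
  ← Wt(Obj1, w1) ∧ Wt(Obj2, w2) ∧ Less(w1, w2)         (rule 3)

Regressing SafeToStack(x, y) through this explanation (the Analyze step of Prolog-EBG) keeps only the features the proof actually depends on, giving a justified Horn clause of the general form

SafeToStack(x, y) ← Wt(x, wx) ∧ Wt(y, wy) ∧ Less(wx, wy)

which is further specialized by however the weights wx and wy are derived from the example's attributes (e.g. from Density, Volume and Type in the full domain theory).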
Applications
Credit risk assessment
X: Properties of the customer and the proposed purchase
F(X): Approve the purchase or not
Disease diagnosis
X: Properties of the patient (symptoms, lab tests)
F(X): Disease (possibly with a recommended therapy)
Face recognition
X: Bitmap picture of a person's face
F(X): Name of the person
Automatic steering
X: Bitmap picture of the road surface in front of the car
F(X): Degrees to turn the vehicle
Discussion
Both inductive and analytical learning mechanisms will be needed to cover the range of learning exhibited by humans and other intelligent systems. Analytical mechanisms are required in order to scale up to learning complex concepts, and to handle situations in which available training data is limited. Inductive mechanisms are required in order to learn in situations where prior knowledge is incomplete or incorrect.
Explanation-based neural network (EBNN) learning provides a robust combination of inductive and analytical learning. Experimental results demonstrate that EBNN can learn to control a mobile robot, from noisy data including vision, sonar, and laser range sensors, and based on approximate knowledge that was previously learned by the robot itself. Given strong prior knowledge, EBNN learns from considerably less data than pure induction (exemplified by the neural network Backpropagation algorithm). As the accuracy of this prior knowledge decreases, EBNN’s ability to generalize degrades gracefully until it reaches the same level of performance as pure induction.
Landmark research papers in the topic
- A theory and methodology of inductive learning: http://www.sciencedirect.com/science/article/pii/0004370283900164
- Inductive learning algorithms and representations for text categorization: https://dl.acm.org/citation.cfm?id=288651
- Class-dependent discretization for inductive learning from continuous and mixed-mode data: http://ieeexplore.ieee.org/abstract/document/391407/
- Machine learning in automated text categorization: https://dl.acm.org/citation.cfm?id=505283
- Collaborative Learning: Cognitive and Computational Approaches. Advances in Learning and Instruction Series: https://eric.ed.gov/?id=ED437928
Resources
Books
- Machine Learning by Tom M. Mitchell
- Machine Learning: Fundamental Algorithms for Supervised and Unsupervised Learning With Real-World Applications by Joshua Chapmann
- Machine Learning: An Introduction To Supervised & Unsupervised Learning Algorithms by Michael Colins
- Markov Models Supervised and Unsupervised Machine Learning: Mastering Data Science & Python by William Sullivan
- Machine Learning for Absolute Beginners: A Plain English Introduction by Oliver Theobald
Code:
Relevant Software
- Weka: https://www.cs.waikato.ac.nz/ml/weka/
- Rattle GUI: https://cran.r-project.org/package=rattle
- KNIME: https://www.knime.com/downloads
- Wolfram Alpha : https://www.wolframalpha.com/
Tutorials:
- https://pdfs.semanticscholar.org/61d0/e02d025635d7befc9fff9e9ee362289e0461.pdf
- https://link.springer.com/article/10.1007/BF00849079
Video lectures:
- https://www.youtube.com/watch?v=WYYe93__FpI
- https://www.youtube.com/watch?v=pqXASFHUfhs&t=130s
- https://www.youtube.com/watch?v=-RlLVQYhJt8
- https://youtu.be/EWmCkVfPnJ8
References
- https://link.springer.com/article/10.1007/BF00849079
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.64.3240&rep=rep1&type=pdf
- https://www.d.umn.edu/~rmaclin/cs5751/notes/Chapter12-6PerPage.pdf
- KBANN paper: http://citeseer.nj.nec.com/towell94knowledgebased.html
- TangentProp paper: http://research.microsoft.com/~patrice/PS/tang_prop.ps
- EBNN paper: http://citeseer.nj.nec.com/mitchell92explanationbased.html
- http://citeseer.nj.nec.com/pazzani92utility.html