Frequent Pattern Mining - niranjv/ml-notes GitHub Wiki
- Overview
- Methods
- FP-Growth
- Association Rules
- PrefixSpan
- Apriori
- Eclat
Overview
Explore frequent pattern discovery in Python, R, and Spark
Problems
- Association Rules (with confidence score) - Order of items is not considered
- Sequential pattern mining - Order of items matters
Metrics
- Support
- Confidence
- Lift
- Conviction
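The four metrics above can be illustrated with a minimal pure-Python sketch. It assumes the usual textbook definitions: support as the fraction of transactions containing an itemset, confidence as support(A ∪ B) / support(A), lift as confidence / support(B), and conviction as (1 - support(B)) / (1 - confidence). The function names and the toy `baskets` data are hypothetical and not taken from any library.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline support (1.0 = independent)."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


def conviction(transactions, antecedent, consequent):
    """(1 - support(consequent)) / (1 - confidence); infinite when the rule never fails."""
    conf = confidence(transactions, antecedent, consequent)
    if conf == 1:
        return float("inf")
    return (1 - support(transactions, consequent)) / (1 - conf)


# Toy data (hypothetical): five shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "milk"],
]
```

For the rule bread -> milk on this toy data, support is 3/5, confidence is 3/4, lift is below 1 (bread buyers are slightly less likely than average to buy milk), and conviction is 0.8.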
Methods
FP-growth
- Finds frequent itemsets without generating candidate itemsets, so it avoids enumerating all possible item combinations
- Implementations
- Spark: FPGrowth.train(), FPGrowthModel
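To make the idea concrete, here is a small single-machine sketch of FP-growth in plain Python. It does not use Spark; the `Node` class and the function names are hypothetical, and it uses an absolute minimum count rather than a fractional minimum support. It builds an FP-tree, then mines each item's conditional pattern base recursively.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs; return (header table, item counts)."""
    freq = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            freq[item] += count
    freq = {item: c for item, c in freq.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, count in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Return {itemset: count} for every frequent itemset that extends `suffix`."""
    header, freq = build_tree(weighted_transactions, min_count)
    result = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        result[itemset] = freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        result.update(fp_growth(conditional, min_count, itemset))
    return result


def mine_frequent_itemsets(transactions, min_count):
    """Convenience wrapper: each transaction is a list of items with weight 1."""
    return fp_growth([(t, 1) for t in transactions], min_count)
```

Only frequent items ever enter a tree, and each recursion works on a smaller conditional database, which is how the algorithm avoids testing explicit candidate itemsets.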
Association Rules
Association rule learning - discover 'interesting' relations between variables
Implementations
- Spark: AssociationRules, FPGrowth.FreqItemset
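As a rough illustration of how rules can be derived from frequent itemsets, the sketch below splits each itemset into an antecedent and a consequent and keeps the splits whose confidence clears a threshold. This is plain Python, not Spark's AssociationRules; `generate_rules` and the toy counts are hypothetical. It assumes the input contains every subset of each frequent itemset, which holds for the complete output of a frequent-itemset miner.

```python
from itertools import combinations


def generate_rules(freq_itemsets, min_confidence):
    """Return (antecedent, consequent, confidence) triples.

    `freq_itemsets` maps frozenset -> count and is assumed to contain every
    non-empty subset of each itemset it lists.
    """
    rules = []
    for itemset, count in freq_itemsets.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                # confidence = count(A and B together) / count(A)
                conf = count / freq_itemsets[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, conf))
    return rules


# Toy counts (hypothetical): "b" always appears together with "a".
toy_counts = {
    frozenset(["a"]): 4,
    frozenset(["b"]): 3,
    frozenset(["a", "b"]): 3,
}
toy_rules = generate_rules(toy_counts, min_confidence=0.8)
```

With a threshold of 0.8, only b -> a survives here: its confidence is 3/3, while a -> b reaches only 3/4.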
PrefixSpan
- For sequential pattern mining
Implementations
- Spark: PrefixSpan
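The prefix-projection idea can be sketched in a few lines of plain Python. This is a simplified version that assumes each sequence element is a single item (the full algorithm also handles elements that are itemsets), and it is not Spark's PrefixSpan; the function name and toy sequences are hypothetical.

```python
def prefixspan(sequences, min_count):
    """Return (pattern, count) pairs for frequent sequential patterns.

    Simplification: every sequence is a list of single items.
    """
    results = []

    def mine(prefix, projected):
        # `projected` holds (sequence index, start position) pairs: the
        # suffix of each sequence that follows the current prefix.
        counts = {}
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] = counts.get(item, 0) + 1  # once per sequence
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            pattern = prefix + [item]
            results.append((pattern, count))
            # Project each suffix past the first occurrence of `item`.
            narrowed = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        narrowed.append((sid, pos + 1))
                        break
            mine(pattern, narrowed)

    mine([], [(i, 0) for i in range(len(sequences))])
    return results


# Toy data (hypothetical): four event sequences.
toy_sequences = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
    ["a", "b"],
]
toy_patterns = prefixspan(toy_sequences, min_count=3)
```

Because order matters, the pattern a followed by c is frequent here (three sequences), while b followed by c is not, even though b and c co-occur in two sequences.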
Apriori
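For comparison with FP-growth, here is a minimal level-wise sketch of Apriori in plain Python, assuming an absolute minimum count. The function name is hypothetical. It generates candidate k-itemsets from frequent (k-1)-itemsets, prunes candidates that have an infrequent subset, and counts the rest with a pass over the data.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for all itemsets seen in at least `min_count` transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent
```

The explicit candidate sets and the repeated passes over the transactions are the costs that FP-growth is designed to avoid.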
Eclat - Equivalence Class Transformation
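A minimal sketch of the Eclat approach in plain Python follows, assuming an absolute minimum count; the function name is hypothetical. It stores the data vertically (each item maps to the set of transaction IDs containing it) and grows itemsets depth-first by intersecting those ID sets, so the count of an itemset is just the size of an intersection.

```python
def eclat(transactions, min_count):
    """Return {itemset: count} using depth-first intersection of transaction-ID sets."""
    # Vertical layout: item -> set of IDs of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # `candidates` is a list of (item, tidset) pairs; each tidset is already
        # restricted to the transactions that contain `prefix`.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            narrowed = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    narrowed.append((other, shared))
            extend(itemset, narrowed)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent
```

All extensions that share the same prefix are handled together in one recursive call, which corresponds to the "equivalence class" in the algorithm's name.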
References
- FP-Growth algorithm: Mining frequent patterns without candidate generation, Han, et al., 2000, SIGMOD
- Parallel FP-Growth: PFP - Parallel FP-Growth for Query Recommendation
- Apriori algorithm: Fast Algorithms for Mining Association Rules
- PrefixSpan algorithm: Mining sequential patterns by pattern-growth: the PrefixSpan approach
- Spark docs, Feb 2017: Frequent Pattern Mining - RDD-based API
- Databricks blog, Apr 2015: New MLlib Algorithms in Apache Spark 1.3: FP-Growth and Power Iteration Clustering