MLlib - AshokBhat/ml GitHub Wiki

About

  • MLlib is Apache Spark's scalable machine learning library.

Algorithms

  • Classification: logistic regression, naive Bayes,...
  • Regression: generalized linear regression, survival regression,...
  • Decision trees, random forests, and gradient-boosted trees
  • Recommendation: alternating least squares (ALS)
  • Clustering: K-means, Gaussian mixtures (GMMs),...
  • Topic modeling: latent Dirichlet allocation (LDA)
  • Frequent itemsets, association rules, and sequential pattern mining

Workflow utilities

  • Feature transformations: standardization, normalization, hashing,...
  • ML Pipeline construction
  • Model evaluation and hyper-parameter tuning
  • ML persistence: saving and loading models and Pipelines

See also