Hackathon 2017 base API
Machine
Supervised learning, unsupervised learning
Methods
- Machine fit(Features, Labels)
- Machine fit(Features)  # might do "empty" labels
- Labels predict(Features)
- no predict_log_proba
- no fit_predict
Examples
With labels
- SVM
- KRR
- KNN
- Metric Learning
No labels
- GMM
- KDE
- KMeans
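A rough usage sketch of the proposed interface (SVM, KMeans and the features()/labels() factories are illustrative names, not the current bindings):
feats = features(X)  # factory, see Features section below
lab = labels(y)  # hypothetical labels factory, analogous to features()
svm = SVM()
svm.fit(feats, lab)  # supervised: Features + Labels
predictions = svm.predict(features(X_test))
kmeans = KMeans()
kmeans.fit(feats)  # unsupervised: Features only
cluster_ids = kmeans.predict(features(X_test))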
Transformer
Transformer fit(Features)  # might have labels here as well
Features transform(Features)
Examples
- PCA
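Sketch of Transformer usage (PCA as above, names illustrative):
pca = PCA()
pca.fit(feats)  # learns the projection; could also accept labels
feats_low_dim = pca.transform(feats)  # returns new Features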
Features
No more splitting into various feature types; a global factory method generates them:
Features features(Matrix)
Features features(SparseMatrix)
Features features(FileStream)
Features features(ArrowBuffer)
Features features(Strings)
Option:
Features add_transformer(Transformer)  # this impacts all downstream API calls for features
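Sketch of the factory in use (variable names and ZeroMean are illustrative; nothing here is final):
feats = features(X)  # dense matrix
feats_sparse = features(X_sparse)  # sparse matrix
feats_text = features(list_of_strings)  # strings
feats = feats.add_transformer(ZeroMean())  # option above: applied lazily downstream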
Meta Machines
Wrap every Machine type in Shogun
Pipeline
To chain preprocessors and machines
Pipeline : Machine
Pipeline with(Transformer)
Composite composite()
Machine then(Machine)  # accepts the thing that should be wrapped
trans = ZeroMean()
trans.fit(feats)
svm = SVM()
svm.C = ...
pipeline().with(trans, IS_FITTED).then(svm) # this returns a Machine interface
some cool stuff
Composite : Pipeline
Composite with(Machine)
Machine then(Machine)  # accepts the thing that should be wrapped
pipeline().with(trans)
.composite()
.with(kernel_machine('LibSVM'))
.with(distance_machine('NearestNeighbor')) # averaging multiple predictions
.then(Bagging) # returns Machine API
Kernel
Stateless
matrix(Features, Features)
matrix(Features, Features, idx a, idx b)
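Sketch, assuming a GaussianKernel exposing the stateless interface above:
k = GaussianKernel(sigma=1)
K = k.matrix(feats_a, feats_b)  # full kernel matrix between two feature sets
K_sub = k.matrix(feats_a, feats_b, idx_a, idx_b)  # entry/block selected by the indices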
Distance
Same interface as Kernel
Testing
float64_t test(Features, Labels)  # two/three-sample test, independence test via labels
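Sketch of a two-sample test with this signature (QuadraticTimeMMD is an existing Shogun test, used here only as an example; the labels encode which sample each point belongs to):
mmd = QuadraticTimeMMD()
result = mmd.test(feats, sample_labels)  # float64_t, e.g. a test statistic or p-value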
Not part of interface
- optimization
- NNs (just expose fit/predict, but they are actually Keras underneath); we have a Machine/Transformer that wraps Keras, a cool GSoC project. Delete NN code (apart from RBM, DBN)
Distribution
fit(Features)
log_pdf()
score()  # gradient of log density
gmm = GMM()
gmm.fit(feats) # runs EM
gmm.predict(feats_test) # returns cluster index (multiclass)
gmm.as(Distribution).log_pdf(feats_test) # returns probabilities
Lazy evaluation of auxiliary methods, e.g. Gaussian process probabilities that are not computed during "fit"
gp.sets("param1", ...)
gp.sets("param2", ...)
gp.train(feats, labels).gets("crazy_covariance")
In a Jupyter notebook:
gp.fit()
>>> GaussianProcessesRegressor(kernel=GaussianKernel(sigma=1), crazy_covariance=Lazy())
hidden in train():
setsMatrix('crazy_covariance', this.computeCrazyCovariance, param1, param2)
GP method
Matrix computeCrazyCovariance(param, param)
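A minimal pure-Python illustration of the lazy idea (not Shogun code): train() stores a thunk, and gets() only triggers the computation on first access:
class Lazy:
    def __init__(self, fn, *args):
        self.fn, self.args = fn, args
        self.computed, self.value = False, None
    def get(self):
        if not self.computed:
            self.value, self.computed = self.fn(*self.args), True  # computed once, on first access
        return self.value
# hidden in train(): params['crazy_covariance'] = Lazy(compute_crazy_covariance, param1, param2)
# gets('crazy_covariance') then calls .get(), so the matrix is only computed when asked for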