Hackathon 2017 base API
Machine
Supervised learning, unsupervised learning
Methods
- Machine fit(Features, Labels)
- Machine fit(Features)  # might do "empty" labels
- Labels predict(Features)
- no predict_log_proba
- no fit_predict
Examples
With labels
- SVM
- KRR
- KNN
- Metric Learning
No labels
- GMM
- KDE
- KMeans
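A rough usage sketch of the proposed interface (SVM, KMeans and the features()/labels() factories are illustrative names, not the current bindings):
feats = features(X)  # factory, see Features section below
lab = labels(y)  # hypothetical labels factory, analogous to features()
svm = SVM()
svm.fit(feats, lab)  # supervised: Features + Labels
predictions = svm.predict(features(X_test))
kmeans = KMeans()
kmeans.fit(feats)  # unsupervised: Features only
cluster_ids = kmeans.predict(features(X_test))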
Transformer
Transformer fit(Features)  # might have labels here as well
Features transform(Features)
Examples
- PCA
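Sketch of Transformer usage (PCA as above, names illustrative):
pca = PCA()
pca.fit(feats)  # learns the projection; could also accept labels
feats_low_dim = pca.transform(feats)  # returns new Features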
Features
No more splitting into various feature types; a global factory method generates them:
Features features(Matrix)
Features features(SparseMatrix)
Features features(FileStream)
Features features(ArrowBuffer)
Features features(Strings)
Option:
Features add_transformer(Transformer)  # this impacts all downstream API calls for features
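Sketch of the factory in use (variable names and ZeroMean are illustrative; nothing here is final):
feats = features(X)  # dense matrix
feats_sparse = features(X_sparse)  # sparse matrix
feats_text = features(list_of_strings)  # strings
feats = feats.add_transformer(ZeroMean())  # option above: applied lazily downstream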
Meta Machines
Wrap every Machine type in Shogun
Pipeline
To chain preprocessors and machines
Pipeline : Machine
Pipeline with(Transformer)
Composite composite()
Machine then(Machine)  # accepts the thing that should be wrapped
trans = ZeroMean()
trans.fit(feats)
svm = SVM()
svm.C = ...
pipeline().with(trans, IS_FITTED).then(svm) # this returns a Machine interface
some cool stuff
Composite : Pipeline
Composite with(Machine)
Machine then(Machine)  # accepts the thing that should be wrapped
pipeline().with(trans)
.composite()
.with(kernel_machine('LibSVM'))
.with(distance_machine('NearestNeighbor')) # averaging multiple predictions
.then(Bagging) # returns Machine API
Kernel
Stateless
matrix(Features, Features)
matrix(Features, Features, idx a, idx b)
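Sketch, assuming a GaussianKernel exposing the stateless interface above:
k = GaussianKernel(sigma=1)
K = k.matrix(feats_a, feats_b)  # full kernel matrix between two feature sets
K_sub = k.matrix(feats_a, feats_b, idx_a, idx_b)  # entry/block selected by the indices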
Distance
Same interface as Kernel
Testing
float64_t test(Features, Labels)  # two/three-sample test, independence test via labels
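Sketch of a two-sample test with this signature (QuadraticTimeMMD is an existing Shogun test, used here only as an example; the labels encode which sample each point belongs to):
mmd = QuadraticTimeMMD()
result = mmd.test(feats, sample_labels)  # float64_t, e.g. a test statistic or p-value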
Not part of interface
- optimization
- NNs (just expose fit/predict, but they are actually Keras underneath); we have a Machine/Transformer that wraps Keras, a cool GSoC project. Delete NN code (apart from RBM, DBN)
Distribution
fit(Features)
log_pdf()
score()  # gradient of log density
gmm = GMM()
gmm.fit(feats) # runs EM
gmm.predict(feats_test) # returns cluster index (multiclass)
gmm.as(Distribution).log_pdf(feats_test) # returns probabilities
Lazy evaluation of auxiliary methods, e.g. Gaussian process probabilities that are not computed during "fit"
gp.sets("param1", ...)
gp.sets("param2", ...)
gp.train(feats, labels).gets("crazy_covariance")
In a Jupyter notebook:
gp.fit()
>>> GaussianProcessesRegressor(kernel=GaussianKernel(sigma=1), crazy_covariance=Lazy())
hidden in train():
setsMatrix('crazy_covariance', this.computeCrazyCovariance, param1, param2)
GP method
Matrix computeCrazyCovariance(param, param)
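A minimal pure-Python illustration of the lazy idea (not Shogun code): train() stores a thunk, and gets() only triggers the computation on first access:
class Lazy:
    def __init__(self, fn, *args):
        self.fn, self.args = fn, args
        self.computed, self.value = False, None
    def get(self):
        if not self.computed:
            self.value, self.computed = self.fn(*self.args), True  # computed once, on first access
        return self.value
# hidden in train(): params['crazy_covariance'] = Lazy(compute_crazy_covariance, param1, param2)
# gets('crazy_covariance') then calls .get(), so the matrix is only computed when asked for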