How to choose Kernel - SoojungHong/MachineLearning GitHub Wiki

First of all, what is a kernel? A kernel is a similarity function that you, as the domain expert, provide to a machine learning algorithm. It takes two inputs and returns a score for how similar they are.

Given two objects, the kernel outputs a similarity score. The objects can be anything, from two integers or two real-valued vectors to trees, provided that the kernel function knows how to compare them.

Arguably the simplest example is the linear kernel, also called the dot product. Given two vectors, the similarity is the length of the projection of one vector onto the other, multiplied by the length of that other vector.
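As a minimal sketch of the linear kernel, assuming NumPy is available (the function name `linear_kernel` and the sample vectors are illustrative, not from the original text):

```python
import numpy as np

def linear_kernel(x, y):
    """Linear kernel: the plain dot product of two vectors."""
    return float(np.dot(x, y))

# Illustrative vectors, chosen only for this example.
x = np.array([3.0, 4.0])
y = np.array([2.0, 0.0])

k = linear_kernel(x, y)

# Length of the projection of x onto y.
proj_len = np.dot(x, y) / np.linalg.norm(y)

# The dot product equals the projection length times the norm of y.
print(k, proj_len * np.linalg.norm(y))
```

The two printed values agree, which shows that the dot product is the projection length scaled by the length of the vector being projected onto.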

Another interesting example is the Gaussian kernel. Given two vectors, the similarity decreases as the distance between them grows, at a rate set by the radius parameter σ. In other words, the distance between two objects is "reweighted" by this radius.
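The effect of σ can be seen in a small sketch. It assumes the common form exp(-||x - y||² / (2σ²)) for the Gaussian kernel; the function name `gaussian_kernel` and the sample points are illustrative.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = np.sum((x - y) ** 2)
    return float(np.exp(-sq_dist / (2.0 * sigma ** 2)))

# Illustrative points: one close to x, one far from x.
x = np.array([0.0, 0.0])
near = np.array([0.5, 0.0])
far = np.array([3.0, 0.0])

# A small sigma makes similarity drop off quickly with distance;
# a large sigma keeps even distant points fairly similar.
for sigma in (0.5, 1.0, 5.0):
    print(sigma, gaussian_kernel(x, near, sigma), gaussian_kernel(x, far, sigma))
```

Note that scikit-learn parameterizes the same kernel with `gamma` instead of σ, where `gamma` corresponds to 1 / (2σ²).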

The success of learning with kernels (at least for SVMs) depends very strongly on the choice of kernel. You can see a kernel as a compact representation of your knowledge about the classification problem, and it is very often problem specific. A kernel is a way of computing the dot product of two vectors x and y in some (possibly very high dimensional) feature space, which is why kernel functions are sometimes called "generalized dot products".

Suppose we have a mapping $\varphi: \mathbb{R}^n \to \mathbb{R}^m$ that brings our vectors in $\mathbb{R}^n$ to some feature space $\mathbb{R}^m$. Then the dot product of $x$ and $y$ in this space is $\varphi(x)^T \varphi(y)$. A kernel is a function $k$ that corresponds to this dot product, i.e. $k(x, y) = \varphi(x)^T \varphi(y)$.

Why is this useful? Kernels give a way to compute dot products in some feature space without even knowing what this space is or what $\varphi$ is.

For example, consider a simple polynomial kernel $k(x, y) = (1 + x^T y)^2$ with $x, y \in \mathbb{R}^2$. This doesn't seem to correspond to any mapping function $\varphi$; it's just a function that returns a real number. Assuming that $x = (x_1, x_2)$ and $y = (y_1, y_2)$, let's expand this expression:

$$k(x, y) = (1 + x^T y)^2 = (1 + x_1 y_1 + x_2 y_2)^2 = 1 + x_1^2 y_1^2 + x_2^2 y_2^2 + 2 x_1 y_1 + 2 x_2 y_2 + 2 x_1 x_2 y_1 y_2$$
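The expanded sum is itself a dot product of two 6-dimensional vectors: taking φ(x) = (1, x1², x2², √2·x1, √2·x2, √2·x1·x2), and the same for y, gives φ(x)ᵀφ(y) term by term. The sketch below checks this numerically; the function names `poly_kernel` and `phi` and the sample vectors are illustrative.

```python
import numpy as np

def poly_kernel(x, y):
    """Polynomial kernel k(x, y) = (1 + x^T y)^2, computed in the input space."""
    return float((1.0 + np.dot(x, y)) ** 2)

def phi(x):
    """Explicit feature map into R^6 whose dot product reproduces poly_kernel."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, x1 ** 2, x2 ** 2, s * x1, s * x2, s * x1 * x2])

# Illustrative 2-D vectors.
x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_direct = poly_kernel(x, y)               # never leaves R^2
k_mapped = float(np.dot(phi(x), phi(y)))   # dot product in the 6-D feature space

print(k_direct, k_mapped)
```

Both values agree (up to floating-point rounding), yet the kernel version never builds the 6-dimensional vectors, which is the practical appeal of working with k directly.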

References: https://www.quora.com/What-are-kernels-in-machine-learning-and-SVM-and-why-do-we-need-them and https://stats.stackexchange.com/questions/152897/how-to-intuitively-explain-what-a-kernel-is?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

How to choose Kernel

With so many kernels to choose from, how can you decide which one to use? As a rule of thumb, you should always try the linear kernel first (remember that LinearSVC is much faster than SVC(kernel="linear")), especially if the training set is very large or if it has plenty of features. If the training set is not too large, you should try the Gaussian RBF kernel as well; it works well in most cases. Then, if you have spare time and computing power, you can also experiment with a few other kernels using cross-validation and grid search, especially if there are kernels specialized for your training set’s data structure.
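The rule of thumb above might look roughly like this with scikit-learn. This is only a sketch: the toy dataset from `make_moons`, the parameter grid, and the 5-fold setting are assumptions chosen for illustration, not recommendations from the original text.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Hypothetical toy data, used only for illustration.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Step 1: linear baseline with LinearSVC, scored by 5-fold cross-validation.
linear_clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
linear_score = cross_val_score(linear_clf, X, y, cv=5).mean()

# Step 2: Gaussian RBF kernel, tuning C and gamma with a small grid search.
rbf_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.1, 1, 10]}
search = GridSearchCV(rbf_clf, param_grid, cv=5)
search.fit(X, y)

print(f"linear: {linear_score:.3f}")
print(f"rbf:    {search.best_score_:.3f} with {search.best_params_}")
```

Other kernels can be compared the same way by adding `"svc__kernel"` to the grid, at the cost of more training runs.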