Chapter 3, Classification
-
MNIST dataset, which is a set of 70,000 small images of digits handwritten by high school students and employees of the US Census Bureau. Each image is labeled with the digit it represents. This set has been studied so much that it is often called the “Hello World” of Machine Learning: whenever people come up with a new classification algorithm, they are curious to see how it will perform on MNIST. Whenever someone learns Machine Learning, sooner or later they tackle MNIST.
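A minimal sketch of loading the dataset with scikit-learn; the use of fetch_openml (and the dataset id 'mnist_784') is an assumption here, since these notes do not show the loading step:

```python
from sklearn.datasets import fetch_openml
import numpy as np

# Download MNIST from OpenML: 70,000 images, each stored as a 784-pixel (28x28) row
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist["data"], mnist["target"].astype(np.uint8)  # labels arrive as strings

# MNIST is already split: the first 60,000 images form the training set
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
```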
-
First reshape the 784-pixel feature vector of an image into a 28x28 array, then use plt.imshow to plot it.
plt.imshow(some_digit_image, cmap=matplotlib.cm.binary, interpolation="nearest")
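A runnable version of that plotting step, assuming X comes from the loading sketch above; the index 36000 is an arbitrary choice:

```python
import matplotlib
import matplotlib.pyplot as plt

some_digit = X[36000]                          # one row = one 784-pixel image
some_digit_image = some_digit.reshape(28, 28)  # back to a 2D 28x28 array

plt.imshow(some_digit_image, cmap=matplotlib.cm.binary, interpolation="nearest")
plt.axis("off")
plt.show()
```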
-
Shuffle the training set; this will guarantee that all cross-validation folds will be similar (you don’t want one fold to be missing some digits). Moreover, some learning algorithms are sensitive to the order of the training instances, and they perform poorly if they get many similar instances in a row. Shuffling the dataset ensures that this won’t happen. Note that some data should not be shuffled, however; in time series data, for example, the dependency structure between consecutive instances matters.
import numpy as np

shuffle_index = np.random.permutation(60000)  # a random ordering of the 60,000 training indices
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]
-
Performance measures for classifiers: the confusion matrix, precision/recall, and the receiver operating characteristic (ROC) curve.
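A short sketch of computing these measures with scikit-learn on the "is this digit a 5?" binary task; the SGDClassifier and the variable names are assumptions borrowed from the book's running example, with X_train/y_train taken from the shuffle step above:

```python
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, precision_score, recall_score, roc_curve, roc_auc_score

y_train_5 = (y_train == 5)               # binary target: "is this digit a 5?"
sgd_clf = SGDClassifier(random_state=42)

# Out-of-fold predictions: each instance is scored by a model that never saw it
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
print(confusion_matrix(y_train_5, y_train_pred))
print(precision_score(y_train_5, y_train_pred), recall_score(y_train_5, y_train_pred))

# The ROC curve needs decision scores rather than hard class predictions
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3, method="decision_function")
fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)
print(roc_auc_score(y_train_5, y_scores))
```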
-
Use GridSearchCV to search for good hyperparameter values via cross-validation.
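A hedged sketch of such a grid search; the KNeighborsClassifier and this particular parameter grid are assumptions chosen for illustration, not something these notes prescribe:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Candidate hyperparameter values to try in every combination
param_grid = [{"weights": ["uniform", "distance"], "n_neighbors": [3, 4, 5]}]

knn_clf = KNeighborsClassifier()
grid_search = GridSearchCV(knn_clf, param_grid, cv=3, scoring="accuracy")
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)
print(grid_search.best_score_)
```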
-
Multi-label Classification
In some cases you may want your classifier to output multiple classes for each instance. For example, consider a face-recognition classifier: what should it do if it recognizes several people in the same picture? Of course it should attach one label per person it recognizes. Say the classifier has been trained to recognize three faces, Alice, Bob, and Charlie; then when it is shown a picture of Alice and Charlie, it should output [1, 0, 1] (meaning “Alice yes, Bob no, Charlie yes”). Such a classification system that outputs multiple binary labels is called a multilabel classification system.
Training works the same as for a single label: just pass a target array with one column per label, as in the sketch below (not every classifier supports multilabel targets, but KNeighborsClassifier does).
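A minimal multilabel sketch, assuming X_train/y_train from above; the two labels (digit >= 7, digit is odd) and the choice of KNeighborsClassifier are illustrative assumptions following the book's example:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two binary labels per digit: "large" (7, 8 or 9) and "odd"
y_train_large = (y_train >= 7)
y_train_odd = (y_train % 2 == 1)
y_multilabel = np.c_[y_train_large, y_train_odd]

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_multilabel)

knn_clf.predict([some_digit])   # e.g. array([[False, True]]): not large, but odd
```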