LAB 3 - VyshnaviPerla/PYTHON GitHub Wiki

LAB ASSIGNMENT 3

1. Linear Discriminant Analysis

Implementation:

Imported matplotlib and LinearDiscriminantAnalysis, along with KNeighborsClassifier. Assigned the Iris data and target to variables and split them into training and testing sets. Predicted the test data as y_pred and plotted the result using three different colors, with the title LDA OF IRIS DATASET.
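A minimal sketch of these steps, assuming scikit-learn's bundled Iris data. The 20% test size, `random_state=0`, `n_neighbors=5`, the choice to run the classifier on the LDA-projected features, and the helper name `plot_lda` are assumptions made for illustration and are not stated above.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assign the data and target to variables.
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and testing sets (test size is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Project the four features onto two discriminant axes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)

# Classify in the projected space (assumed role of KNeighborsClassifier).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_lda, y_train)
y_pred = knn.predict(X_test_lda)
accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", accuracy)


def plot_lda(points, labels, names):
    """Scatter the projected points, one color per class (hypothetical helper)."""
    fig, ax = plt.subplots()
    for color, cls, name in zip(("red", "green", "blue"), (0, 1, 2), names):
        mask = labels == cls
        ax.scatter(points[mask, 0], points[mask, 1], color=color, label=name)
    ax.set_title("LDA OF IRIS DATASET")
    ax.legend()
    return fig
```

Calling `plot_lda(X_test_lda, y_pred, iris.target_names)` followed by `plt.show()` draws the predicted test points in three colors.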

2. SVM classification

Implementation:

The dataset is loaded and split into two parts: 20% is held out as test data and the remainder is used as training data. An SVM model with a linear kernel is then fit to the training data, and the same is done with an RBF kernel. The linear and RBF kernels give similar accuracy, with only a slight difference, and the accuracies can be made equal by changing the C and gamma values. I would expect the RBF kernel to be the better fit, since class 2 and class 3 of the Iris data are not linearly separable, which makes a linear kernel less than ideal in principle. In this run, however, the linear kernel has the highest accuracy. The accuracy should improve if more data is added to the dataset.
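The comparison can be sketched as follows, assuming scikit-learn's `SVC` on the Iris data. The 20% test split comes from the description; `random_state=0`, the default `C=1.0` and `gamma="scale"`, and the helper name `kernel_accuracy` are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def kernel_accuracy(kernel, C=1.0, gamma="scale"):
    """Fit an SVC with the given kernel and return its test accuracy."""
    model = SVC(kernel=kernel, C=C, gamma=gamma)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


linear_acc = kernel_accuracy("linear")
rbf_acc = kernel_accuracy("rbf")
print("linear kernel accuracy:", linear_acc)
print("RBF kernel accuracy:   ", rbf_acc)
```

Passing other values, for example `kernel_accuracy("rbf", C=10, gamma=0.1)`, shows how C and gamma move the RBF result relative to the linear one. The gamma argument has no effect on the linear kernel.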

3. LEMMATISATION AND BIGRAM

Implementation:

Placed the input text file in the IDE and read the text from it. Installed and imported nltk to perform tokenization, lemmatization, and bigram extraction. Took the output of tokenization and displayed the bigrams.
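A minimal sketch of the three steps with nltk. The function names and the file name `input.txt` are hypothetical, and `wordpunct_tokenize` is used here only because it needs no extra download; `word_tokenize` would serve equally well once its tokenizer data is installed. Lemmatization needs the WordNet corpus, fetched with `nltk.download("wordnet")`.

```python
from nltk import bigrams
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import wordpunct_tokenize


def tokenize(text):
    """Split raw text into word and punctuation tokens."""
    return wordpunct_tokenize(text)


def lemmatize(tokens):
    """Reduce each token to its WordNet lemma (requires the 'wordnet' corpus)."""
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(token) for token in tokens]


def make_bigrams(tokens):
    """Pair each token with the one that follows it."""
    return list(bigrams(tokens))


# Example usage (file name is an assumption):
# with open("input.txt", encoding="utf-8") as handle:
#     tokens = tokenize(handle.read())
# print(lemmatize(tokens))
# print(make_bigrams(tokens))
```

The bigrams are built from the tokenizer output, matching the order of steps described above.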

4. K-NEAREST NEIGHBOUR

Implementation:

The goal is to perform K-means clustering on the chosen dataset. We run K-means clustering with different numbers of clusters, say 1 and 50, and note the changes that occur as the number of clusters changes. Accuracy changes drastically between the two cases. This behaviour does not apply to every dataset, however; it is confined to this one, which may be a limitation. With a different dataset, we cannot guarantee the same variation in accuracy for these numbers of clusters.
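A minimal sketch of the comparison, under several assumptions: scikit-learn's `KMeans`, the Iris data as the dataset (the text does not name one), and accuracy measured by assigning each cluster its majority class, since the text does not say how accuracy was computed. The helper name `cluster_accuracy` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Dataset choice is an assumption; labels are used only for scoring.
X, y = load_iris(return_X_y=True)


def cluster_accuracy(n_clusters):
    """Cluster X, label each cluster by its majority class, return accuracy."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    correct = 0
    for cluster_id in np.unique(labels):
        members = y[labels == cluster_id]
        # Samples matching the cluster's most common class count as correct.
        correct += np.bincount(members).max()
    return correct / len(y)


acc_1 = cluster_accuracy(1)
acc_50 = cluster_accuracy(50)
print("accuracy with 1 cluster:  ", acc_1)
print("accuracy with 50 clusters:", acc_50)
```

With this measure a single cluster can only ever match the largest class, while many small clusters are each nearly pure, so the score rises with the cluster count by construction. How large the gap is depends on the dataset.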