Python Lab Assignment 1 - saisampathkumar/Python_Lab GitHub Wiki
Submitted By:
Sai Sampath Kumar Raigiri (32)
Anvesh Mandadi (24)
Pavankumar Manchala (22)
Tasks:
- Suppose you have a list of tuples as follows: [('John', ('Physics', 80)), ('Daniel', ('Science', 90)), ('John', ('Science', 95)), ('Mark', ('Maths', 100)), ('Daniel', ('History', 75)), ('Mark', ('Social', 95))]
Create a dictionary with names as keys and, as values, lists of (subject, marks) tuples in sorted order:
{'John': [('Physics', 80), ('Science', 95)], 'Daniel': [('History', 75), ('Science', 90)], 'Mark': [('Maths', 100), ('Social', 95)]}
To solve this, we take the list of tuples as input and define an empty dictionary. A loop then visits each entry: if the student is not yet in the dictionary, the subject and score are stored under a new key for that student. If the student is already present, the dictionary entry for that key (the student name) is updated with the new subject and score.
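The loop described above can be sketched as follows. This is a minimal illustration rather than the submitted code; the function name `group_scores` and the use of `setdefault` and `sorted` are assumptions made for the example.

```python
def group_scores(records):
    """Group (name, (subject, marks)) pairs by name and sort each list."""
    grouped = {}
    for name, score in records:
        # strip() guards against stray whitespace around a name
        grouped.setdefault(name.strip(), []).append(score)
    # Tuples sort by subject first, then by marks
    return {name: sorted(scores) for name, scores in grouped.items()}


records = [
    ("John", ("Physics", 80)),
    ("Daniel", ("Science", 90)),
    ("John", ("Science", 95)),
    ("Mark", ("Maths", 100)),
    ("Daniel", ("History", 75)),
    ("Mark", ("Social", 95)),
]
result = group_scores(records)
print(result)
```

Because dictionaries preserve insertion order, the keys appear in the order the students are first seen: John, Daniel, Mark.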
The output of the above code is:
- Given a string, find the longest sub-strings without repeating characters along with the length as a tuple.
Input: "pwwkew"
Output: (wke,3), (kew,3)
The code that generates tuples of each longest sub-string without repeating characters and its length is as follows:
The program walks through the characters of the given string and compares each one with a temporary variable, result, which is used to build the tuples. If the character is not already in result, it is appended and the count is incremented. At the end, the resulting sub-string and its count are stored as a tuple.
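One way to implement this idea is a sliding window, sketched below. It is not necessarily the approach used in the submitted code; the function name `longest_unique_substrings` and the `last_seen` dictionary are assumptions for illustration.

```python
def longest_unique_substrings(text):
    """Return every longest substring of `text` with no repeated character,
    paired with its length, in order of first appearance."""
    best_len = 0
    best = []
    start = 0        # left edge of the current window
    last_seen = {}   # character -> index of its most recent occurrence
    for end, ch in enumerate(text):
        # If ch already occurs inside the window, move the left edge past it
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = end
        window = text[start:end + 1]
        if len(window) > best_len:
            best_len, best = len(window), [window]
        elif len(window) == best_len and window not in best:
            best.append(window)
    return [(sub, best_len) for sub in best]


print(longest_unique_substrings("pwwkew"))  # [('wke', 3), ('kew', 3)]
```

The window always holds the longest repeat-free substring ending at the current index, so every substring of maximal length is found in a single pass.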
The output of the program is as below:
- Write a Python program to create any one of the following management systems.
a) Airline Booking Reservation System (e.g. classes Flight, Person, Employee, Passenger etc.)
b) Library Management System (e.g. Student, Book, Faculty, Department etc.)
For this task, we developed both systems. In this
- Create Multiple Regression by choosing a data set of your choice (again before evaluating, clean the data set with the EDA learned in the class). Evaluate the model using RMSE and R2 and also report if you saw any improvement before and after the EDA.
The code to perform linear regression on the chosen data set is shown below:
We used the Diabetes dataset. We first evaluated the performance of a Linear Regression model on it by computing the RMSE and R2 scores before performing any EDA (Exploratory Data Analysis). We then performed EDA: finding the correlation between features, dropping the unwanted fields, and replacing the null values. Finally, we evaluated the performance again using the RMSE and R2 scores.
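For reference, the two evaluation metrics can be computed by hand as in the sketch below. The helper names `rmse` and `r2_score` and the sample values are made up for illustration; they are not results from the Diabetes dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical targets and predictions, for illustration only
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
print(round(rmse(y_true, y_pred), 4))      # 0.7071
print(round(r2_score(y_true, y_pred), 4))  # 0.9
```

A lower RMSE and an R2 closer to 1 both indicate a better fit, which is how the before-EDA and after-EDA models can be compared.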
The RMSE and R2 before EDA was performed:
EDA (Exploratory Data Analysis): Here the null, undefined, and infinite values are either replaced or dropped from the data. Replacement uses the mean or maximum value of the feature.
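The mean-replacement step can be sketched in plain Python as below. The helper names `is_missing` and `impute_with_mean` and the sample column are hypothetical; the submitted code may have used a data-frame library instead.

```python
import math


def is_missing(value):
    """Treat None, NaN, and +/-infinity as missing."""
    return value is None or math.isnan(value) or math.isinf(value)


def impute_with_mean(column):
    """Replace missing entries with the mean of the valid entries."""
    valid = [v for v in column if not is_missing(v)]
    if not valid:
        raise ValueError("column has no valid values to average")
    mean = sum(valid) / len(valid)
    return [mean if is_missing(v) else v for v in column]


# Hypothetical feature column, for illustration only
column = [1.0, None, 3.0, float("nan"), float("inf"), 5.0]
cleaned = impute_with_mean(column)
print(cleaned)  # [1.0, 3.0, 3.0, 3.0, 3.0, 5.0]
```

Replacing with the maximum instead of the mean only changes the statistic computed from `valid`.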
After performing EDA:
RMSE and R2 values:
- Pick any data set from the data set sheet in the class sheet or online which includes both numeric and non-numeric features.
a) Perform exploratory data analysis on the data set (like Handling null values, removing the features not correlated to the target class, encoding the categorical features, ...)
b) Apply the three classification algorithms Naive Bayes, SVM and KNN on the chosen data set and report which classifier gives the better result.
For this task we chose the Heart data set, fitted the three classifiers on it, and compared their accuracy. Before training the models, we performed EDA to remove the null values and unwanted fields. We then partitioned the data set into training and test sets.
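The partitioning and the accuracy comparison can be illustrated with the sketch below. The function names, the 25% test ratio, the fixed seed, and the toy rows are all assumptions for the example and are not taken from the submitted code.

```python
import random


def train_test_split(rows, labels, test_ratio=0.25, seed=0):
    """Shuffle indices with a fixed seed, then split into train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(rows) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx, seq):
        return [seq[i] for i in idx]

    return (pick(train_idx, rows), pick(test_idx, rows),
            pick(train_idx, labels), pick(test_idx, labels))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data, for illustration only
rows = [[i, i * 2] for i in range(8)]
labels = [i % 2 for i in range(8)]
X_train, X_test, y_train, y_test = train_test_split(rows, labels)
print(len(X_train), len(X_test))             # 6 2
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Each classifier is trained on the training part only and scored on the held-out test part, so the three accuracies are comparable.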
The Naive Bayes classifier is trained and tested. The accuracy is as follows:
The KNN classifier uses k = 2 neighbours, a value chosen for better accuracy. The output is given below:
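To show what a KNN prediction with k = 2 involves, here is a minimal sketch in plain Python. The function `knn_predict`, the Euclidean distance metric, and the toy points are assumptions for illustration; this is not the submitted code. Note that with an even k the two neighbours can disagree, and this sketch then falls back to the label of the nearer one.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=2):
    """Predict the majority label among the k nearest training points."""
    # Sort training points by Euclidean distance to the query
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tie, most_common keeps first-seen order, so the nearer neighbour wins
    return votes.most_common(1)[0][0]


# Toy training set, for illustration only
train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_y = [0, 0, 1, 1]
print(knn_predict(train_X, train_y, (1.2, 1.4)))  # 0
print(knn_predict(train_X, train_y, (5.5, 5.0)))  # 1
```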
The Linear SVM classifier is trained and tested.
- Choose any data set of your choice. Apply K-means on the data set and visualize the clusters using matplotlib or seaborn. a) Report which K is the best using the elbow method. b) Evaluate with silhouette score or other scores relevant for unsupervised approaches (before applying clustering clean the data set with the EDA learned in the class)
For the K-means clustering, we find the suitable k value using the Elbow method.
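The elbow method plots the within-cluster sum of squares (inertia) against k and looks for the point where the curve flattens. The sketch below computes that quantity with a bare-bones Lloyd's algorithm; the deterministic seeding, the function names, and the toy points are assumptions for illustration and differ from what a library implementation does.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    # Assumption: seed centroids with evenly spaced points so runs are repeatable
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


def inertia(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Toy data with three visible groups, for illustration only
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]
wcss = {}
for k in range(1, 5):
    centroids, labels = kmeans(points, k)
    wcss[k] = inertia(points, centroids, labels)
    print(k, round(wcss[k], 2))
```

On this toy data the inertia drops sharply up to k = 3 and barely changes afterwards, so the elbow sits at k = 3.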
EDA is performed on the data set to replace the null values with the mean, and k-means is then applied. The clustering is evaluated using the Silhouette score:
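As a reference for what the Silhouette score measures, the sketch below computes it by hand: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the point's score is (b - a) / max(a, b). The function name and the toy points are assumptions for illustration, and the sketch assumes at least two clusters and no set of entirely identical points.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data, for illustration only
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(pts, [0, 0, 1, 1])  # well-separated grouping
bad = silhouette_score(pts, [0, 1, 0, 1])   # grouping that mixes the two sides
print(good > bad)  # True
```

Scores range from -1 to 1; values near 1 indicate compact, well-separated clusters, while negative values suggest points assigned to the wrong cluster.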