Project Exam 1 - Murarishetti-Shiva-Kumar/Python-Deep-Learning-Programming GitHub Wiki

Team 3:

Name Class ID Email
Jagruthi Bobbala 06 [email protected]
Lavanya Gadde 12 [email protected]
Sravani Garikapati 13 [email protected]
Shiva Kumar Murarishetti 31 [email protected]

1) It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. Dataset for fraud detection.

Description of the dataset

1) Time: number of seconds elapsed between this transaction and the first transaction in the dataset

2) V1-V28: result of a PCA dimensionality reduction applied to protect user identities and sensitive features

a) Apply any classification algorithm of your choice (KNN, Naïve Bayes, SVM, Random Forest, …) and report the performance.

image image image image image image
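As a rough illustration of part (a), the sketch below trains a Random Forest and reports per-class metrics. It does not use the exam dataset: `make_classification` generates a synthetic, heavily imbalanced stand-in, and the feature count, class ratio, and hyperparameters are assumptions chosen only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fraud data (assumption): 30 numeric features,
# with the positive ("fraud") class making up only a few percent of the rows.
X, y = make_classification(n_samples=5000, n_features=30, n_informative=10,
                           weights=[0.98, 0.02], random_state=0)

# Stratify so the train and test splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows of the confusion matrix are true classes, columns are predictions.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, digits=3, zero_division=0))
```

With data this skewed, overall accuracy is dominated by the majority class, so precision and recall for class 1 are usually the more informative numbers to report.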

b) Visualize the number of samples per class (This is a binary classification, 0: Non-Fraud and 1: Fraud) and report your observation

image
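A minimal sketch of the per-class count plot for part (b). The `labels` array is hypothetical (randomly generated with roughly 0.5% positives), and the output file name is arbitrary; with the real data the label column would be used instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical labels (assumption): about 0.5% of 10,000 transactions are fraud.
labels = (rng.random(10000) < 0.005).astype(int)

# Count how many samples fall in each class.
classes, counts = np.unique(labels, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))

fig, ax = plt.subplots()
ax.bar(["0: Non-Fraud", "1: Fraud"], counts)
ax.set_yscale("log")  # log scale keeps the tiny fraud bar visible
ax.set_ylabel("Number of samples (log scale)")
ax.set_title("Samples per class")
fig.savefig("class_counts.png")
```

A log-scaled y-axis is one design choice among several; on a linear axis the minority bar can be almost invisible, which is itself a useful observation about the imbalance.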

c) This dataset is unbalanced, meaning we don’t have an equal number of samples per class. Consequently, we need specific techniques when dealing with an unbalanced dataset. Please study one of the techniques or challenges we face when working with unbalanced datasets and discuss it briefly.

reference

image image image
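One commonly used technique for part (c) is random oversampling of the minority class. The sketch below is an illustration under assumptions, not necessarily the technique discussed in the screenshots: the data are synthetic, and logistic regression is used only as a simple model to compare minority-class recall before and after oversampling.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic imbalanced data (assumption): only a few percent positives.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.97, 0.03], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

# Oversample the minority class in the training split only, so the
# test split keeps the original class distribution.
X_maj = X_train[y_train == 0]
X_min = X_train[y_train == 1]
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=1)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([np.zeros(len(X_maj), dtype=int),
                        np.ones(len(X_min_up), dtype=int)])

for name, (Xa, ya) in {"original": (X_train, y_train),
                       "oversampled": (X_bal, y_bal)}.items():
    model = LogisticRegression(max_iter=1000).fit(Xa, ya)
    recall = recall_score(y_test, model.predict(X_test))
    print(f"{name}: minority-class recall = {recall:.3f}")
```

Resampling before the train/test split would leak duplicated minority rows into the test set, which is why only the training portion is rebalanced here.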

2) Apply K-means on the dataset in this link and visualize the clusters using matplotlib or seaborn.

Description of the dataset:

1) Customer_id

2) Age

3) Annual Income

4) Spending score

Our goal is to cluster our customers into buying groups based on their Annual Income and Spending Score.

a. Report which K is the best using the elbow method.

image
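The elbow method in part (a) can be sketched as follows. The customer data here are hypothetical: five Gaussian blobs in an (annual income, spending score) plane stand in for the real file, so the location of the elbow in this example says nothing about the actual dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical customers (assumption): columns are annual income and
# spending score, drawn around five made-up group centers.
centers = np.array([[25, 20], [25, 80], [55, 50], [85, 20], [85, 80]])
X = np.vstack([c + rng.normal(scale=5, size=(40, 2)) for c in centers])

# Fit K-means for K = 1..10 and record the within-cluster sum of squares.
ks = range(1, 11)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

for k, w in zip(ks, inertias):
    print(f"K={k:2d}  inertia={w:10.1f}")
```

The "elbow" is the K after which the inertia stops dropping sharply; plotting `inertias` against `ks` with matplotlib makes that bend easier to see than the printed table.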

b. Evaluate with the silhouette score or other scores relevant for unsupervised approaches (clean the dataset before applying clustering, if needed).

image
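A possible way to carry out part (b), again on hypothetical blob data rather than the real customer file. The cleaning step (dropping rows with missing values, then standardizing) is an assumption about what "clean the dataset" might involve.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical (annual income, spending score) data, as an assumption.
centers = np.array([[25, 20], [25, 80], [55, 50], [85, 20], [85, 80]])
X = np.vstack([c + rng.normal(scale=5, size=(40, 2)) for c in centers])
X[3, 1] = np.nan  # simulate one missing spending score

# Cleaning: drop rows with missing values, then put both columns on one scale.
X_clean = X[~np.isnan(X).any(axis=1)]
X_scaled = StandardScaler().fit_transform(X_clean)

# The silhouette score needs at least 2 clusters; values lie in [-1, 1].
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)
    print(f"K={k}  silhouette={scores[k]:.3f}")

best_k = max(scores, key=scores.get)
print("best K by silhouette:", best_k)
```

A higher silhouette score means points sit closer to their own cluster than to the nearest other cluster, so it gives a second opinion on the K suggested by the elbow plot.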

c. Can you interpret the clustering result that you have visualized?

image

3) Use the dataset in this link. Predict the temperature using the weather details specified in the columns.

Description of the dataset:

This dataset has 12 columns and we want to predict “Temperature (C)” using other independent variables.

a) Apply some exploratory data analysis to draw insights from the data.

image image
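A small sketch of the kind of exploratory checks that fit part (a). Only the column name "Temperature (C)" comes from the task; the other columns, their values, and the relationship between them are made up for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
humidity = rng.uniform(0.2, 1.0, n)
# Hypothetical relationship (assumption): drier air is warmer, plus noise.
temperature = 30 - 25 * humidity + rng.normal(0, 2, n)

df = pd.DataFrame({
    "Temperature (C)": temperature,
    "Humidity": humidity,                      # hypothetical column
    "Wind Speed (km/h)": rng.uniform(0, 40, n)  # hypothetical column
})

# Basic summary statistics and a missing-value count per column.
print(df.describe().round(2))
print(df.isna().sum())

# Correlation of every feature with the target, weakest to strongest.
corr = df.corr()["Temperature (C)"].drop("Temperature (C)").sort_values()
print(corr)
```

Features with a correlation near zero are weak linear predictors on their own, which helps decide which columns to plot against the target first.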

b) Visualize the data and draw the model line.

image image image image image image
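Drawing the model line for part (b) might look like the sketch below. It fits a one-feature linear regression on synthetic data; the choice of humidity as the single predictor and the output file name are assumptions made for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Synthetic data (assumption): temperature falls as humidity rises.
humidity = rng.uniform(0.2, 1.0, 300)
temperature = 30 - 25 * humidity + rng.normal(0, 2, 300)

# scikit-learn expects a 2-D feature matrix, hence the reshape.
model = LinearRegression().fit(humidity.reshape(-1, 1), temperature)

# Evaluate the fitted line on an evenly spaced grid for a clean plot.
grid = np.linspace(humidity.min(), humidity.max(), 100).reshape(-1, 1)

fig, ax = plt.subplots()
ax.scatter(humidity, temperature, s=8, alpha=0.5, label="samples")
ax.plot(grid.ravel(), model.predict(grid), color="red", label="fitted line")
ax.set_xlabel("Humidity")
ax.set_ylabel("Temperature (C)")
ax.legend()
fig.savefig("model_line.png")

print("slope:", round(model.coef_[0], 2), "intercept:", round(model.intercept_, 2))
```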

c) Evaluate the model and try to interpret the performance that you get.

image image
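For part (c), two common regression metrics are RMSE and R². The sketch below computes both on a held-out split of synthetic data; the three features and their coefficients are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic regression problem (assumption): three features, unit-variance noise.
X = rng.normal(size=(400, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(0, 1, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# RMSE is in the units of the target; R^2 is the share of variance explained.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"RMSE = {rmse:.3f}")
print(f"R^2  = {r2:.3f}")
```

For a temperature target, RMSE reads directly as a typical error in degrees Celsius, while an R² close to 1 indicates that the features explain most of the variation in the held-out data.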

4) Use the dataset in this link and apply classification on that.

Description of the dataset:

The dataset is a spam dataset and has two columns:

  1. class

  2. text

First clean the text data, then apply the techniques we have learned for transforming text data into numeric format (TF-IDF, CountVectorizer, …).

image image image image image image image image
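The cleaning and vectorizing steps can be sketched as below. The `clean_text` helper and the two sample messages are hypothetical; the exact cleaning rules used in the screenshots are not recoverable, so lowercasing, URL removal, and stripping non-letters are assumptions.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer

def clean_text(text):
    """Hypothetical cleaner: lowercase, drop URLs, keep letters only."""
    text = text.lower()
    text = re.sub(r"http\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)          # remove digits and punctuation
    return re.sub(r"\s+", " ", text).strip()       # collapse repeated spaces

# Made-up sample messages (assumption).
texts = [
    "WINNER!! Claim your FREE prize now at http://example.com",
    "Are we still meeting for lunch at 12?",
]
cleaned = [clean_text(t) for t in texts]
print(cleaned)

# TF-IDF turns each message into a weighted bag-of-words vector.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(cleaned)
print(X.shape)
print(sorted(vectorizer.vocabulary_))
```

`CountVectorizer` can be swapped in for `TfidfVectorizer` with the same `fit_transform` call if raw term counts are preferred over TF-IDF weights.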

Evaluate the model and try to interpret the result.

image image image
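An evaluation along these lines could use a confusion matrix and a per-class report. The corpus below is a tiny made-up set of messages, and Multinomial Naive Bayes is an assumed classifier, so the printed numbers only demonstrate the mechanics and carry no meaning about the real spam dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical corpus (assumption): 8 spam and 8 ham messages.
spam = ["win a free prize now", "free cash claim now", "urgent prize winner",
        "claim your free reward", "cheap loans apply now", "win cash today",
        "free entry to win", "exclusive offer claim prize"]
ham = ["see you at lunch", "call me when home", "meeting moved to monday",
       "thanks for the notes", "are you coming tonight", "happy birthday friend",
       "running late see you soon", "can you send the report"]
texts = spam + ham
labels = ["spam"] * len(spam) + ["ham"] * len(ham)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0)

# The pipeline vectorizes and classifies in one object, so the vectorizer
# is fitted on the training messages only.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Fix the label order so the matrix is always 2x2: rows are true classes.
cm = confusion_matrix(y_test, y_pred, labels=["ham", "spam"])
print(cm)
print(classification_report(y_test, y_pred, labels=["ham", "spam"], zero_division=0))
```

When interpreting the result for a spam filter, the off-diagonal cells matter differently: ham predicted as spam hides a legitimate message, while spam predicted as ham only lets a nuisance through.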

5) Pick any dataset online for a classification problem that includes both numeric and non-numeric features. Dataset used: bank.zip

a. Perform exploratory data analysis on the dataset (anything of your choice that gives insight into the dataset).

image image image

b. Apply the three classification algorithms Naïve Bayes, SVM and KNN on the chosen dataset and report which classifier gives the best result.

image image image image image image
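Comparing the three classifiers on mixed numeric and categorical features might be sketched as follows. The DataFrame is a hypothetical stand-in for the bank data: the column names, categories, and target rule are all invented, and 5-fold cross-validated accuracy is an assumed comparison metric.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 600
# Hypothetical mixed-type data (assumption), not the real bank.zip contents.
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "balance": rng.normal(1500, 1000, n),
    "job": rng.choice(["admin", "technician", "services"], n),
    "marital": rng.choice(["single", "married"], n),
})
# Made-up target that depends on one numeric and one categorical column.
score = df["balance"] + 800 * (df["job"] == "admin") + rng.normal(0, 300, n)
y = (score > 1800).astype(int)

# Scale numeric columns and one-hot encode categorical ones.
# sparse_threshold=0 forces dense output, which GaussianNB requires.
preprocess = ColumnTransformer(
    [("num", StandardScaler(), ["age", "balance"]),
     ("cat", OneHotEncoder(handle_unknown="ignore"), ["job", "marital"])],
    sparse_threshold=0)

models = {"Naive Bayes": GaussianNB(),
          "SVM": SVC(),
          "KNN": KNeighborsClassifier(n_neighbors=5)}

results = {}
for name, estimator in models.items():
    pipe = make_pipeline(preprocess, estimator)
    results[name] = cross_val_score(pipe, df, y, cv=5).mean()
    print(f"{name}: mean accuracy = {results[name]:.3f}")
```

Putting the preprocessing inside each pipeline means scaling and encoding are refitted on every training fold, which avoids leaking information from the validation fold into the comparison.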

c. Try SVM with a linear and a non-linear kernel and report which one gives better performance.

image image image image
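A minimal comparison of a linear and a non-linear (RBF) kernel, assuming a toy two-moons dataset in place of the bank data. The result on this toy problem reflects its curved class boundary and does not predict which kernel wins on the real dataset.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data (assumption): two interleaving half-circles, not linearly separable.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

scores = {}
for kernel in ("linear", "rbf"):
    # SVMs are sensitive to feature scale, so standardize inside the pipeline.
    pipe = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores[kernel] = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{kernel}: mean accuracy = {scores[kernel]:.3f}")
```

If the two kernels score about the same on a given dataset, the linear kernel is often the more practical choice because it trains faster and is easier to interpret.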