Project Exam 1 - Murarishetti-Shiva-Kumar/Python-Deep-Learning-Programming GitHub Wiki

Team 3:

Name	Class ID	Email
Jagruthi Bobbala	06	[email protected]
Lavanya Gadde	12	[email protected]
Sravani Garikapati	13	[email protected]
Shiva Kumar Murarishetti	31	[email protected]

1) Credit card companies must be able to recognize fraudulent transactions so that customers are not charged for items they did not purchase. Dataset: fraud detection.

Description of the dataset:

1) Time: number of seconds elapsed between this transaction and the first transaction in the dataset

2) V1-V28: principal components obtained from PCA dimensionality reduction, applied to protect user identities and sensitive features

a) Apply any classifier of your choice (KNN, Naïve Bayes, SVM, Random Forest, …) and report its performance.

b) Visualize the number of samples per class (this is binary classification: 0 = non-fraud, 1 = fraud) and report your observations.

c) This dataset is imbalanced, meaning the classes do not have equal numbers of samples, so specific techniques are needed when working with it. Study one such technique, or one of the challenges that imbalanced datasets pose, and discuss it briefly.

reference
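A minimal sketch covering parts a-c, assuming scikit-learn, NumPy and matplotlib are installed. The fraud CSV is not bundled here, so `make_classification` generates a hypothetical stand-in with 30 numeric features and about 1% positives; with the real file you would load it and use its label column as `y`. The sketch counts and plots samples per class, trains a Random Forest, reports per-class precision/recall/F1, and then shows random oversampling of the minority class as one imbalance technique (the `random_oversample` helper is hypothetical, written for illustration).

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 30 numeric features, ~1% positive (fraud) class.
X, y = make_classification(
    n_samples=10000, n_features=30, n_informative=10,
    weights=[0.99, 0.01], flip_y=0, random_state=42,
)

# b) Count and plot samples per class (0 = non-fraud, 1 = fraud).
counts = np.bincount(y)
print("samples per class:", counts.tolist())
plt.bar(["0: Non-Fraud", "1: Fraud"], counts)
plt.ylabel("Number of samples")
plt.title("Samples per class")
plt.savefig("class_counts.png")
plt.close()

# a) Train a classifier; stratify so the rare class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred, digits=3, zero_division=0))
# Accuracy alone is misleading here: always predicting 0 already scores this much.
print("majority-class baseline accuracy:", counts[0] / counts.sum())

# c) Random oversampling of the minority class, applied to the training
# split only so that no test rows leak into training.
def random_oversample(X, y, seed=0):
    """Hypothetical helper: resample every class (with replacement) up to the largest class size."""
    rng = np.random.default_rng(seed)
    classes, class_counts = np.unique(y, return_counts=True)
    target = class_counts.max()
    idx = []
    for c, n in zip(classes, class_counts):
        c_idx = np.flatnonzero(y == c)
        extra = rng.choice(c_idx, size=target - n, replace=True)
        idx.append(np.concatenate([c_idx, extra]))
    idx = rng.permutation(np.concatenate(idx))
    return X[idx], y[idx]

X_bal, y_bal = random_oversample(X_train, y_train)
print("balanced training counts:", np.bincount(y_bal).tolist())
clf_bal = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_bal, y_bal)
print("fraud-class F1, original vs oversampled:",
      f1_score(y_test, y_pred, zero_division=0),
      f1_score(y_test, clf_bal.predict(X_test), zero_division=0))
```

Random oversampling only duplicates existing minority rows, so the model can overfit to them; SMOTE (from the imbalanced-learn package) or `class_weight="balanced"` on the classifier are common alternatives worth comparing.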

2) Apply K-means to the dataset in this link and visualize the clusters using matplotlib or seaborn.

Description of the dataset:

1) Customer_id

2) Age

3) Annual Income

4) Spending Score

The goal is to cluster customers into buying groups based on their Annual Income and Spending Score.
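A minimal sketch of the clustering and visualization step, assuming scikit-learn and matplotlib. The customer file is not included here, so `make_blobs` generates hypothetical (Annual Income, Spending Score) pairs; with the real data you would select those two columns instead and leave `Customer_id` out, since it is an identifier rather than a feature. Features are standardized first because K-means relies on Euclidean distance, and K = 5 is an assumption for this example only.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for (Annual Income, Spending Score); not the real dataset.
centers = [(25, 20), (25, 80), (55, 50), (85, 20), (85, 80)]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=6.0, random_state=0)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

k = 5  # assumed here; the elbow method and silhouette score are how K should be chosen
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Map centroids back to the original units so they can be plotted and read directly.
centroids = scaler.inverse_transform(kmeans.cluster_centers_)

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="tab10", s=20)
plt.scatter(centroids[:, 0], centroids[:, 1], c="black", marker="X", s=150, label="centroids")
plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.legend()
plt.savefig("kmeans_clusters.png")
plt.close()
print(centroids.round(1))
```

For interpretation, each centroid printed in the original units can be read as a segment profile, for example a group with high income but a low spending score.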

a. Using the elbow method, report which K is best.

b. Evaluate the clustering with the silhouette score or other metrics suited to unsupervised learning (clean the dataset before clustering if needed).

c. Can you interpret the clustering results you have visualized?
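For parts a and b, a sketch that compares candidate values of K on the same kind of hypothetical stand-in data (generated with `make_blobs`, not the real customer file). It records the within-cluster sum of squares (`inertia_`) for the elbow plot and the mean silhouette coefficient (`silhouette_score`) for each K ≥ 2; the silhouette is undefined for a single cluster, so K = 1 is skipped for that metric. With the real file, cleaning would come first, for example dropping rows with missing values and the identifier column.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for (Annual Income, Spending Score).
centers = [(25, 20), (25, 80), (55, 50), (85, 20), (85, 80)]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=6.0, random_state=0)
X = StandardScaler().fit_transform(X)

ks = range(1, 11)
inertias, silhouettes = [], {}
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squared distances
    if k >= 2:
        silhouettes[k] = silhouette_score(X, km.labels_)

best_k = max(silhouettes, key=silhouettes.get)
print("inertia by K:", [round(v, 1) for v in inertias])
print("best K by silhouette:", best_k, round(silhouettes[best_k], 3))

# Elbow plot: look for the K after which inertia stops dropping sharply.
plt.plot(list(ks), inertias, marker="o")
plt.xlabel("K")
plt.ylabel("Inertia (within-cluster SSE)")
plt.savefig("elbow.png")
plt.close()
```

The elbow is read visually from the plot, while the silhouette gives a single number per K in the range -1 to 1, where values closer to 1 indicate tighter, better-separated clusters.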

3) Using the dataset in this link, predict the temperature from the weather details given in the other columns.

Description of the dataset:

This dataset has 12 columns, and the goal is to predict “Temperature (C)” from the other (independent) variables.
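A minimal EDA sketch with pandas, assuming the data has been loaded into a DataFrame. Apart from `"Temperature (C)"`, the column names below (`Humidity`, `Wind Speed (km/h)`, `Pressure (millibars)`) and all values are hypothetical placeholders generated for illustration; with the real file you would read it with `pd.read_csv` instead. The sketch checks shape, summary statistics, missing values, and each column's correlation with the target.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
# Hypothetical stand-in data; only "Temperature (C)" comes from the task description.
humidity = rng.uniform(0.2, 1.0, n)
df = pd.DataFrame({
    "Humidity": humidity,
    "Wind Speed (km/h)": rng.uniform(0, 40, n),
    "Pressure (millibars)": rng.normal(1015, 8, n),
    "Temperature (C)": 35 - 25 * humidity + rng.normal(0, 2, n),
})
df.loc[::50, "Pressure (millibars)"] = np.nan  # inject a few gaps to show the missing-value check

print(df.shape)
print(df.describe().T[["mean", "std", "min", "max"]])
print("missing values per column:\n", df.isna().sum())

# Correlation of every numeric column with the target, strongest (by magnitude) first.
corr = df.corr(numeric_only=True)["Temperature (C)"].drop("Temperature (C)")
print(corr.sort_values(key=abs, ascending=False))
```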

a) Apply exploratory data analysis (EDA) to draw insights from the data.

b) Visualize the data and draw the fitted model (regression) line.

c) Evaluate the model and interpret the performance you obtain.
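For parts b and c, a sketch that fits scikit-learn's `LinearRegression` on a single hypothetical predictor (`Humidity`, with synthetic values) so the model line can be drawn, then reports R², MAE and RMSE on a held-out split. Using one predictor is a simplification for plotting; the task itself allows all independent variables.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
humidity = rng.uniform(0.2, 1.0, 500).reshape(-1, 1)  # hypothetical predictor
temperature = 35 - 25 * humidity.ravel() + rng.normal(0, 2, 500)  # hypothetical target in degrees C

X_train, X_test, y_train, y_test = train_test_split(
    humidity, temperature, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"slope={model.coef_[0]:.2f} intercept={model.intercept_:.2f}")
print(f"R2={r2:.3f} MAE={mae:.2f} RMSE={rmse:.2f}")

# Scatter of the test data with the fitted model line on top.
xs = np.linspace(humidity.min(), humidity.max(), 100).reshape(-1, 1)
plt.scatter(X_test, y_test, s=10, alpha=0.6, label="test data")
plt.plot(xs, model.predict(xs), color="red", label="model line")
plt.xlabel("Humidity")
plt.ylabel("Temperature (C)")
plt.legend()
plt.savefig("regression_line.png")
plt.close()
```

R² is the share of the target's variance the model explains, while MAE and RMSE are in degrees Celsius, so they read directly as a typical prediction error.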

4) Apply classification to the dataset in this link.

Description of the dataset:

The dataset is a spam dataset with two columns:

class
text

First clean the text data, then apply the techniques we have learned for transforming text into numeric form (TF-IDF, CountVectorizer, …).

Evaluate the model and interpret the results.
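For the evaluation step, a sketch that chains TF-IDF and multinomial Naïve Bayes in a scikit-learn pipeline and reports a confusion matrix and per-class precision/recall on held-out messages. The tiny labeled corpus is hypothetical and only demonstrates the mechanics; scores on it say nothing about the real spam dataset. Because spam data is often imbalanced, precision and recall for the spam class are usually more informative than accuracy alone.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled messages as (class, text), mirroring the two dataset columns.
train = [
    ("spam", "win a free prize now"),
    ("spam", "free entry claim your cash prize"),
    ("spam", "urgent you won a free ticket call now"),
    ("spam", "claim free cash reward today"),
    ("ham", "are we still meeting for lunch"),
    ("ham", "can you send me the class notes"),
    ("ham", "see you at the meeting tomorrow"),
    ("ham", "thanks for lunch today"),
]
test = [
    ("spam", "claim your free prize"),
    ("ham", "lunch tomorrow with the class"),
    ("spam", "you won cash call now"),
    ("ham", "send the meeting notes"),
]
y_train, X_train = zip(*train)
y_test, X_test = zip(*test)

# TF-IDF features feed directly into multinomial Naive Bayes.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes, in the order ham, spam.
print(confusion_matrix(y_test, y_pred, labels=["ham", "spam"]))
print(classification_report(y_test, y_pred, zero_division=0))
```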

5) Pick any dataset online for a classification problem that includes both numeric and non-numeric features. Dataset: bank.zip
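Because such a dataset mixes numeric and non-numeric columns, the categorical ones need encoding before SVM, KNN or Naïve Bayes can use them. A sketch with pandas on a small hypothetical frame: the column names `age`, `balance`, `job`, `marital` and the target `y` are placeholders, not read from bank.zip. It separates the column types, summarizes each, checks the class balance, and one-hot encodes the categoricals with `pd.get_dummies`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
# Hypothetical stand-in for a mixed-type classification dataset.
df = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "balance": rng.normal(1500, 800, n).round(0),
    "job": rng.choice(["admin", "technician", "services", "retired"], n),
    "marital": rng.choice(["single", "married", "divorced"], n),
    "y": rng.choice(["no", "yes"], n, p=[0.85, 0.15]),
})

numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = [c for c in df.columns if c not in numeric_cols and c != "y"]

print(df[numeric_cols].describe())
for col in categorical_cols:
    print(df[col].value_counts(normalize=True).round(2), "\n")

# Class balance of the target, and mean numeric values per class.
print(df["y"].value_counts())
print(df.groupby("y")[numeric_cols].mean())

# One-hot encode categorical features so every model input column is numeric.
X = pd.get_dummies(df.drop(columns="y"), columns=categorical_cols, dtype=int)
print(X.columns.tolist())
```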

a. Perform exploratory data analysis on the dataset (anything of your choice that gives insight into the data).

b. Apply three classification algorithms (Naïve Bayes, SVM, and KNN) to the chosen dataset and report which classifier gives the best result.

c. Try SVM with a linear and a non-linear kernel and report which one performs better.
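For parts b and c, a sketch with scikit-learn on hypothetical mixed-type data (the columns and the rule that generates `y` are made up so there is signal to learn). A `ColumnTransformer` scales numeric columns and one-hot encodes the categorical one, then Gaussian Naïve Bayes, SVM and KNN are compared by 5-fold cross-validated accuracy. The kernel comparison uses `make_moons`, a toy dataset whose classes are not linearly separable, so the RBF kernel is expected to win there; on the real bank data either kernel could do better, and the same loop can be run through the preprocessing pipeline instead.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
# Hypothetical mixed-type data; y depends on age, balance and job.
df = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "balance": rng.normal(1500, 800, n),
    "job": rng.choice(["admin", "technician", "retired"], n),
})
score = 0.05 * (df["age"] - 45) + 0.001 * (df["balance"] - 1500) + (df["job"] == "retired") * 1.5
df["y"] = (score + rng.normal(0, 0.5, n) > 0.5).astype(int)

X, y = df.drop(columns="y"), df["y"]
preprocess = ColumnTransformer(
    [("num", StandardScaler(), ["age", "balance"]),
     ("cat", OneHotEncoder(handle_unknown="ignore"), ["job"])],
    sparse_threshold=0,  # always return a dense array, which GaussianNB requires
)

# b) Compare the three classifiers with the same preprocessing.
models = {
    "Naive Bayes": GaussianNB(),
    "SVM (RBF)": SVC(),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
}
results = {}
for name, clf in models.items():
    pipe = make_pipeline(preprocess, clf)
    results[name] = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}: mean accuracy {results[name]:.3f}")
print("best:", max(results, key=results.get))

# c) Linear vs non-linear (RBF) kernel on data that is not linearly separable.
Xm, ym = make_moons(n_samples=300, noise=0.2, random_state=0)
for kernel in ["linear", "rbf"]:
    acc = cross_val_score(SVC(kernel=kernel), Xm, ym, cv=5).mean()
    results[f"moons-{kernel}"] = acc
    print(f"SVM {kernel} kernel on moons: {acc:.3f}")
```

Scaling matters for SVM and KNN because both depend on distances between samples; Gaussian Naïve Bayes is less sensitive to it but is kept in the same pipeline so all three see identical inputs.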