Project Exam 1 - Murarishetti-Shiva-Kumar/Python-Deep-Learning-Programming GitHub Wiki
Team 3:
Name | Class ID | Email |
---|---|---|
Jagruthi Bobbala | 06 | [email protected] |
Lavanya Gadde | 12 | [email protected] |
Sravani Garikapati | 13 | [email protected] |
Shiva Kumar Murarishetti | 31 | [email protected] |
Dataset for fraud detection.
1) It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.
Description of the dataset:
1) Time: Number of seconds elapsed between this transaction and the first transaction in the dataset
2) V1-V28: Result of a PCA dimensionality reduction to protect user identities and sensitive features
a) Apply any classification algorithm of your choice (KNN, Naïve Bayes, SVM, Random Forest, …) and report the performance.
b) Visualize the number of samples per class (this is a binary classification; 0: Non-Fraud and 1: Fraud) and report your observations.
c) This dataset is unbalanced, meaning we don’t have an equal number of samples per class. Consequently, we need specific techniques when dealing with an unbalanced dataset. Please study one of the techniques or challenges we face when working with unbalanced datasets and discuss it briefly.
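One common way to handle the imbalance described in (c) is class weighting combined with a stratified split. The following is a minimal sketch, not the team's solution: it assumes scikit-learn is available and uses a synthetic dataset from `make_classification` (about 2% positives) as a stand-in for the fraud data, since the real file is not reproduced here. The choice of Random Forest and all parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fraud data (assumption): roughly 2% positives.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=0
)

# Samples per class (task b): index 0 = Non-Fraud, index 1 = Fraud.
class_counts = np.bincount(y)
print("samples per class:", class_counts)

# Stratify so the train and test splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# class_weight="balanced" weights each class inversely to its frequency,
# so errors on the rare Fraud class cost more during training.
clf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Accuracy is misleading on unbalanced data; report per-class metrics instead.
print(classification_report(y_test, y_pred, digits=3, zero_division=0))
fraud_recall = recall_score(y_test, y_pred)
print("recall on the Fraud class:", round(fraud_recall, 3))
```

Per-class precision and recall are reported because a model that always predicts Non-Fraud would already reach about 98% accuracy on data like this. Resampling (oversampling the minority class or undersampling the majority class) is another option that could be discussed for (c).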
2) Apply K-means on the dataset in this link and visualize the clusters using matplotlib or seaborn.
Description of the dataset:
1) Customer_id
2) Age
3) Annual Income
4) Spending score
Our goal is to cluster customers into buying groups based on their Annual Income and Spending Score.
a. Report which K is the best using the elbow method.
b. Evaluate with the silhouette score or other scores relevant to unsupervised approaches (clean the dataset before applying clustering, if needed)
c. Can you interpret the clustering result that you have visualized?
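Steps (a) and (b) can be sketched as follows. This is an illustrative example under stated assumptions, not the reported result: it uses scikit-learn and a synthetic two-feature dataset from `make_blobs` standing in for the Annual Income and Spending Score columns, and the range of K values is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in (assumption) for the two clustering features,
# Annual Income and Spending Score.
X, _ = make_blobs(n_samples=200, centers=5, n_features=2, random_state=42)

ks = list(range(2, 10))
inertias = []      # within-cluster sum of squares, used for the elbow method
silhouettes = []   # mean silhouette score per K, in the range [-1, 1]
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)
    silhouettes.append(silhouette_score(X, km.labels_))

for k, inertia, sil in zip(ks, inertias, silhouettes):
    print(f"k={k}  inertia={inertia:.1f}  silhouette={sil:.3f}")

# The elbow is read off the inertia curve by eye (where it stops dropping
# sharply); the silhouette score gives a numeric cross-check.
best_k = ks[int(np.argmax(silhouettes))]
print("k with the highest silhouette score:", best_k)
```

Plotting `ks` against `inertias` with matplotlib gives the usual elbow chart. On real customer data the features may need scaling first, since K-means relies on Euclidean distance.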
3) Use the dataset in this link. Predict the temperature using the weather details specified in the columns.
Description of the dataset:
This dataset has 12 columns and we want to predict “Temperature (C)” using other independent variables.
a) Apply some Exploratory Data Analysis to draw some insight from the data
b) Visualize the data and draw the model line
c) Evaluate the model and try to interpret the performance that you get
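The fit-and-evaluate part of this task can be sketched with a simple linear regression. The example below is only a sketch under assumptions: it generates synthetic humidity and temperature values (the names `humidity` and `temperature` and the linear relationship are made up for illustration) rather than loading the weather dataset, and it uses a single predictor so the model line is easy to draw.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): temperature falls as humidity rises, plus noise.
rng = np.random.default_rng(0)
humidity = rng.uniform(0.2, 1.0, size=500)
temperature = 30.0 - 25.0 * humidity + rng.normal(0.0, 2.0, size=500)

X = humidity.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
X_train, X_test, y_train, y_test = train_test_split(
    X, temperature, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R^2: share of variance explained; RMSE: typical error in degrees C.
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"slope={model.coef_[0]:.2f}  intercept={model.intercept_:.2f}")
print(f"R^2={r2:.3f}  RMSE={rmse:.2f}")

# Two end points are enough to draw the fitted line over a scatter plot.
line_x = np.array([[X.min()], [X.max()]])
line_y = model.predict(line_x)
```

With matplotlib, `plt.scatter(X_test, y_test)` followed by `plt.plot(line_x, line_y)` would draw the model line for (b). Interpreting (c) then comes down to whether the RMSE is small relative to the temperature range and how much variance R² leaves unexplained.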
4) Use the dataset in this link and apply classification on that.
Description of the dataset:
The dataset is a spam dataset and has two columns:
- class
- text
First clean the text data, then apply the techniques we have learned for transforming text data into numeric format (TFIDF, Count_Vectorizer, …).
Evaluate the model and try to interpret the result
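The clean, vectorize, and classify steps can be sketched as a single scikit-learn pipeline. This is a minimal sketch under assumptions: the eight messages below are invented for illustration and are far too few for a meaningful evaluation, `clean_text` is a hypothetical helper, and Multinomial Naïve Bayes is just one possible classifier.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def clean_text(text):
    """Lowercase, replace anything that is not a letter or space, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Tiny made-up corpus (assumption) with the same two columns: class and text.
texts = [
    "WIN a FREE prize now!!!",
    "Free entry: claim your cash prize",
    "URGENT! You won cash, call now",
    "Claim free tickets, text WIN",
    "Are we still meeting for lunch today?",
    "I'll call you after class",
    "Can you send me the lecture notes?",
    "See you at the library tonight",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# TfidfVectorizer applies clean_text to each document before tokenizing,
# then turns the tokens into TF-IDF weighted numeric features.
model = make_pipeline(TfidfVectorizer(preprocessor=clean_text), MultinomialNB())
model.fit(texts, labels)

predictions = model.predict(["free cash prize, claim now", "lunch after class today?"])
print(list(predictions))
```

On the real dataset, a held-out test split with `classification_report` would give the precision and recall needed to interpret the result; swapping `TfidfVectorizer` for `CountVectorizer` lets the two representations be compared.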
bank.zip
5) Pick any dataset online for a classification problem that includes both numeric and non-numeric features.
a. Perform exploratory data analysis on the dataset (anything of your choice that gives insight into the dataset)
b. Apply the three classification algorithms Naïve Bayes, SVM and KNN on the chosen dataset and report which classifier gives the best result.
c. Try SVM with linear and non-linear kernels and report which one gives better performance
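Parts (b) and (c) can be sketched by running each classifier through the same preprocessing and cross-validation. The example is a sketch under assumptions: the data frame is synthetic, the column names `age`, `balance`, and `job` are hypothetical, and the RBF kernel stands in for "non-linear". It assumes scikit-learn and pandas are installed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in (assumption) with two numeric columns and one categorical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "balance": rng.normal(1000.0, 500.0, size=n),
    "job": rng.choice(["admin", "technician", "student"], size=n),
})
score = df["balance"] + 300.0 * (df["job"] == "student") + rng.normal(0.0, 200.0, size=n)
y = (score > 1100.0).astype(int)

# Scale numeric features and one-hot encode the non-numeric one.
# sparse_threshold=0 forces a dense output, which GaussianNB requires.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age", "balance"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["job"]),
    ],
    sparse_threshold=0,
)

models = {
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM (linear)": SVC(kernel="linear"),
    "SVM (rbf)": SVC(kernel="rbf"),
}

# Mean 5-fold cross-validated accuracy for each classifier.
results = {}
for name, clf in models.items():
    pipe = Pipeline([("prep", preprocess), ("clf", clf)])
    results[name] = cross_val_score(pipe, df, y, cv=5).mean()

for name, acc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: {acc:.3f}")
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are fit only on each training fold, which avoids leaking test-fold statistics into the comparison.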