VTU Viva Questions and Answers for Data Science and Its Applications Laboratory (21AD62) - FarhaKousar1601/DATA-SCIENCE-AND-ITS-APPLICATION-LABORATORY-21AD62- GitHub Wiki
Module 1: Introduction to Python/R and Basic Data Visualization
Q1: What is Python?
A1: Python is a high-level, interpreted programming language known for its simplicity and readability, commonly used for data analysis, web development, and machine learning.
Q2: What are the main data types in Python?
A2: The main data types in Python are int, float, str, list, tuple, dict, and set.
Q3: What is a bar chart?
A3: A bar chart is a graphical representation of data using rectangular bars where the length of each bar is proportional to the value it represents.
Q4: How do you install Python libraries?
A4: Python libraries can be installed using the pip package manager with the command pip install library_name.
Q5: What is the purpose of Matplotlib?
A5: Matplotlib is a Python library used for creating static, interactive, and animated visualizations.
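To tie Q3 and Q5 together, here is a minimal sketch of drawing a bar chart with Matplotlib; the language names and percentages are illustrative data, not from the lab manual, and the Agg backend is selected so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no display needed
import matplotlib.pyplot as plt

languages = ["Python", "R", "Julia"]  # illustrative categories
usage = [70, 20, 10]                  # illustrative values

fig, ax = plt.subplots()
bars = ax.bar(languages, usage, color="steelblue")  # bar length is proportional to the value
ax.set_ylabel("Usage (%)")
ax.set_title("Sample bar chart")
fig.savefig("bar_chart.png")
```

Each bar's height equals the value it represents, which is exactly the property the definition in Q3 describes.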
Module 2: Data Cleaning and Manipulation
Q6: What is data cleaning?
A6: Data cleaning involves identifying and correcting (or removing) errors and inconsistencies in data to improve its quality.
Q7: What are missing values?
A7: Missing values are data entries that are not recorded or are null, and handling them is crucial for accurate data analysis.
Q8: What is data manipulation?
A8: Data manipulation is the process of changing data to make it more organized and easier to analyze.
Q9: What is Pandas?
A9: Pandas is a Python library used for data manipulation and analysis, providing data structures like DataFrame and Series.
Q10: How do you handle missing data in Pandas?
A10: Missing data in Pandas can be handled using methods like dropna() to remove missing values or fillna() to replace them with specific values.
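The two methods named in A10 can be shown on a small made-up DataFrame (the column names and values here are purely illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 31],
    "city": ["Mysuru", "Bengaluru", None],
})

dropped = df.dropna()  # keep only rows with no missing values
filled = df.fillna({   # replace missing values column by column
    "age": df["age"].mean(),  # impute with the column mean
    "city": "unknown",        # impute with a placeholder label
})
print(dropped)
print(filled)
```

dropna() shrinks the table (only one complete row survives), while fillna() keeps every row and substitutes chosen values, which is why the right choice depends on how much data you can afford to lose.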
Module 3: Supervised Learning Techniques
Q11: What is supervised learning?
A11: Supervised learning is a type of machine learning where the model is trained on labeled data.
Q12: What is logistic regression?
A12: Logistic regression is a statistical method used for binary classification, predicting the probability of a binary outcome.
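A tiny scikit-learn sketch of the binary classification described in A12; the "hours studied vs. pass/fail" data is invented for illustration and deliberately well separated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative toy data: hours studied -> pass (1) / fail (0)
X = np.array([[1], [2], [3], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

print(model.predict([[1.5], [9.5]]))       # predicted class labels
print(model.predict_proba([[9.5]])[0, 1])  # predicted probability of class 1
```

Note that predict_proba() returns the probability of the binary outcome, which is the quantity logistic regression actually models; predict() just thresholds it at 0.5.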
Q13: What is a support vector machine (SVM)?
A13: SVM is a supervised learning algorithm used for classification and regression tasks by finding the optimal hyperplane to separate classes.
Q14: What is overfitting?
A14: Overfitting occurs when a model learns the training data too well, including its noise and outliers, and performs poorly on new, unseen data.
Q15: How can you prevent overfitting?
A15: Overfitting can be prevented by using techniques like cross-validation, pruning, regularization, and using a simpler model.
Module 4: Decision Trees and Clustering Methods
Q16: What is a decision tree?
A16: A decision tree is a model used for classification and regression that splits data into branches based on feature values to make decisions.
Q17: What is entropy in a decision tree?
A17: Entropy is a measure of the randomness or disorder in a dataset, used to determine how a decision tree splits the data.
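The entropy definition in A17 is easy to compute directly; this small sketch uses the Shannon entropy formula, H = -sum(p * log2(p)), over the class proportions of a node's labels.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return sum(-p * math.log2(p) for p in probs)

print(entropy(["yes", "yes", "no", "no"]))    # maximally mixed node
print(entropy(["yes", "yes", "yes", "yes"]))  # pure node
```

A 50/50 split gives entropy 1.0 bit (maximum disorder) and a pure node gives 0.0; a decision tree prefers the split whose children reduce this value the most (information gain).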
Q18: What is clustering?
A18: Clustering is an unsupervised learning technique used to group similar data points together based on their features.
Q19: What is K-means clustering?
A19: K-means clustering is an unsupervised method that partitions data into K clusters by assigning each point to the nearest cluster centroid (the mean of the points in that cluster) and iteratively updating the centroids.
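A minimal scikit-learn sketch of K-means on two well-separated, made-up blobs of points (the coordinates are illustrative only):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of 2-D points (illustrative data)
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment for each point
print(km.cluster_centers_)  # the mean (centroid) of each cluster
```

Each centroid in cluster_centers_ is literally the mean of its assigned points, which is where the algorithm gets its name.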
Q20: What is the difference between supervised and unsupervised learning?
A20: Supervised learning uses labeled data to train models, while unsupervised learning uses unlabeled data to find patterns and relationships.
Module 5: Web Scraping Project
Q21: What is web scraping?
A21: Web scraping is the automated process of extracting data from websites.
Q22: What tools are commonly used for web scraping?
A22: Common tools for web scraping include BeautifulSoup, Scrapy, and Selenium.
Q23: What is BeautifulSoup?
A23: BeautifulSoup is a Python library used for parsing HTML and XML documents to extract data.
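A minimal BeautifulSoup sketch of the parsing described in A23, run on an inline HTML snippet so no network request is needed (the markup and class name are invented for illustration):

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>Headlines</h1>
  <ul>
    <li class="story">Python 3 released</li>
    <li class="story">Pandas tips</li>
  </ul>
</body></html>
"""

# Parse the document and extract the text of every matching element
soup = BeautifulSoup(html, "html.parser")
stories = [li.get_text() for li in soup.find_all("li", class_="story")]
print(stories)
```

In a real scraper the html string would come from an HTTP response; the parsing step stays the same.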
Q24: What are the ethical considerations of web scraping?
A24: Ethical considerations include respecting the website's terms of service, avoiding overloading the server with requests, and not scraping personal or sensitive data.
Q25: What is an API?
A25: An API (Application Programming Interface) is a set of rules that allows different software entities to communicate with each other.
Q26: How can you handle dynamic content when web scraping?
A26: Dynamic content can be handled using tools like Selenium, which can interact with JavaScript and simulate browser actions.
Q27: What is the importance of data preprocessing?
A27: Data preprocessing is crucial for preparing raw data into a suitable format for analysis, ensuring accuracy and consistency.
Q28: What is a DataFrame in Pandas?
A28: A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
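The DataFrame properties listed in A28 can be seen in a few lines; the column names and marks below are illustrative:

```python
import pandas as pd

# Build a DataFrame from a dict of columns (labeled axes: row index + column names)
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "marks": [82, 74, 91],
})
print(df.shape)            # (rows, columns) -> two-dimensional
print(df["marks"].mean())  # columns can hold different (heterogeneous) types
```

Size mutability means rows and columns can be added or dropped after construction, e.g. df["passed"] = df["marks"] > 50.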
Q29: What is feature scaling?
A29: Feature scaling is a method used to normalize the range of independent variables or features in data, typically using techniques like Min-Max scaling or Standardization.
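Both techniques named in A29 can be written directly from their formulas; the helper names and the sample heights below are illustrative:

```python
import numpy as np

def min_max_scale(x):
    """Rescale values linearly into the [0, 1] range."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Centre to mean 0 and scale to unit standard deviation (z-scores)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

heights = [150, 160, 170, 180]  # illustrative feature values
print(min_max_scale(heights))
print(standardize(heights))
```

Min-Max scaling bounds the feature to [0, 1], while standardization produces z-scores with mean 0 and standard deviation 1; scikit-learn offers the same operations as MinMaxScaler and StandardScaler.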
Q30: What is cross-validation?
A30: Cross-validation is a technique used to evaluate the performance of a model by partitioning the data into a training set and a test set multiple times to ensure the model's generalizability.
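The repeated train/test partitioning described in A30 can be sketched as a plain k-fold index generator (a hand-rolled illustration; in practice scikit-learn's KFold or cross_val_score does this for you):

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n samples."""
    # Distribute n samples over k folds as evenly as possible
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))               # this fold is held out
        train = [i for i in range(n) if i not in set(test)]   # the rest is trained on
        yield train, test
        start += size

for train, test in k_fold_indices(6, 3):
    print(train, test)
```

Every sample appears in a test fold exactly once, so averaging the k scores gives a less optimistic estimate of generalization than a single train/test split.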