Titanic dataset: data cleaning and validation
Author: Jesús Ros Solé
Objective
The objective of this project is to understand the factors that contributed to passenger survival in the Titanic disaster and to build a model that estimates each passenger's survival probability. In more detail, we will:
- Apply data cleaning and validation techniques to prepare the data for analysis.
- Analyze the resulting dataset to discover correlations between features and to compare different passenger groups.
- Build a model to estimate the survival probability of Titanic passengers.
Content
- data
  - raw: raw datasets as obtained from Kaggle.
  - clean: clean dataset produced as output of the report.
  - submission: submission file with the predicted labels for test file.
- doc
  - report.pdf: pdf output of the report.
- src
  - report.Rmd: source code to generate the report.
- README.md: this file.