Titanic dataset: data cleaning and validation

Author: Jesús Ros Solé

Objective

The objective of this project is to understand the factors that contributed to surviving the Titanic disaster and to build a model that estimates the survival probability of passengers. In more detail, we will:

  • Apply data cleaning and validation techniques to prepare the data for analysis.
  • Analyze the resulting dataset to discover correlations between features and to compare different passenger groups.
  • Build a model to estimate the survival probability of Titanic passengers.
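
As a rough illustration of the first two steps, the sketch below fills in missing ages and compares survival rates between passenger groups. It is not the code from the report, which is written in R Markdown. The toy records, the median-imputation choice, and the helper names `impute_median_age` and `survival_rate_by` are assumptions made for this example. The column names `Sex`, `Age` and `Survived` follow the Kaggle Titanic files. The modelling step is not shown.

```python
from statistics import median

# Hypothetical toy records; the real Kaggle files have many more rows and columns.
passengers = [
    {"Sex": "female", "Age": 29.0, "Survived": 1},
    {"Sex": "male", "Age": None, "Survived": 0},
    {"Sex": "male", "Age": 40.0, "Survived": 0},
    {"Sex": "female", "Age": None, "Survived": 1},
    {"Sex": "male", "Age": 22.0, "Survived": 1},
]


def impute_median_age(rows):
    """Return copies of rows with missing Age replaced by the median known age."""
    known = [r["Age"] for r in rows if r["Age"] is not None]
    fill = median(known)
    return [{**r, "Age": fill if r["Age"] is None else r["Age"]} for r in rows]


def survival_rate_by(rows, key):
    """Return the fraction of survivors for each distinct value of `key`."""
    totals, survived = {}, {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + 1
        survived[r[key]] = survived.get(r[key], 0) + r["Survived"]
    return {group: survived[group] / totals[group] for group in totals}


# Cleaning step: no record is left without an age.
clean = impute_median_age(passengers)

# Analysis step: compare survival between passenger groups.
rates = survival_rate_by(clean, "Sex")
print(rates)
```

Median imputation is only one possible treatment for missing values. The report may handle them differently.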

Content

  • data
    • raw: raw datasets as obtained from Kaggle.
    • clean: clean dataset produced as output of the report.
    • submission: submission file with the predicted labels for the test file.
  • doc
    • report.pdf: PDF output of the report.
  • src
    • report.Rmd: source code to generate the report.
  • README.md: this file.