Lab 4, Part 1 - data-ppf/data-ppf.github.io GitHub Wiki

Bouk and 'smoothing' mortality curves

context

  • hypothesis: age is the most important factor
  • random GeoCities website as the data source
  • activity: 4 minutes to find birth/death statistics on a random website

example: obtaining, scrubbing, exploring, modeling, interpreting

  • obtaining & scrubbing data: example of real-world data analysis

    • using lynx, awk, grep on the data set
    • problems with PDFs
    • time-zone problems to watch for
    • lots of shell-script fu on websites
    • hypothesis of singers vs. poets
  • exploratory data analysis

    • load
    • compute probability
    • start plotting
    • some basic intuitions around insurance
    • hazard, survival probability
    • then one can start making money
    • calculate survival (running product of per-period survival probabilities)
  • modeling the data

    • straight line as model/design choice
    • many procedures to draw trend lines
    • subjective design choices
    • example of dangers of complicated models
    • "negative consequences of choosing a crazy model"
  • interpreting / goals

    • example of Herschel: no clear rule for picking the sinuosity of the curve
    • marketing choice: beauty
    • making money
    • boss likes a wiggly curve
    • "modern machine learning": prediction primary goal
    • alternate: maximize success metric (e.g., profit, sales...)
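
sketch: hazard, survival, straight-line model

The exploratory and modeling steps in the outline above (compute a death probability per age interval, take survival as a running product, then draw a straight line as the simplest model) might look roughly like this in Python. This is only a minimal sketch: the counts are made-up toy numbers, not the data used in the lab, and the function names `hazards`, `survival`, and `fit_line` are hypothetical helpers introduced for illustration.

```python
from itertools import accumulate
from operator import mul

# Made-up toy counts (an assumption, NOT the lab's data): people alive at
# the start of each age interval, and deaths during that interval.
alive = [1000, 900, 720, 432]
deaths = [100, 180, 288, 216]


def hazards(alive, deaths):
    """Per-interval probability of dying, q = deaths / alive."""
    return [d / n for d, n in zip(deaths, alive)]


def survival(hazard):
    """Probability of surviving through each interval:
    the running product of (1 - q) over all intervals so far."""
    return list(accumulate((1.0 - q for q in hazard), mul))


def fit_line(xs, ys):
    """Ordinary least-squares straight line y = a + b * x.
    Choosing a straight line at all is a design choice, not a given."""
    xs, ys = list(xs), list(ys)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


q = hazards(alive, deaths)         # hazard per interval
s = survival(q)                    # survival curve
a, b = fit_line(range(len(q)), q)  # straight-line "smoothing" of the hazard

for i, (qi, si) in enumerate(zip(q, s)):
    print(f"interval {i}: hazard={qi:.3f} survival={si:.3f} line={a + b * i:.3f}")
```

Swapping `fit_line` for a more flexible curve would track the toy points more closely, which is exactly the subjective choice the modeling notes warn about.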