Lab 4, Part 1 - data-ppf/data-ppf.github.io GitHub Wiki
Bouk and 'smoothing' mortality curves
context
- hypothesis that age is the most important factor
- random GeoCities website as the data source
- activity: 4 minutes to find birth / death statistics on a random website
example: obtaining, scrubbing, exploring, modeling, interpreting
obtaining & scrubbing data: example of real-world data analysis
- using lynx, awk, grep on the data set
- problems with PDFs
- time zone problems to watch for
- lots of shell-script fu on web sites
- hypothesis of singers vs. poets
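The scrubbing step above can be sketched in Python as a rough stand-in for the lynx / grep / awk pipeline (this is not the pipeline used in the lab). The sample text, its line layout, the names in it, and the `scrub` helper are all made up for illustration; a real page dump would need its own pattern.

```python
import re

# Hypothetical text, roughly what a text dump of a web page might look like.
# The names, years, and line layout are invented for this sketch.
SAMPLE = """\
Jane Doe (singer) 1901-1967
John Roe (poet) b. 1880, d. 1921
Navigation | Home | Guestbook
Ann Poe (poet) 1910-1932
"""

# One pattern does the grep-like filtering and the awk-like field extraction:
# a name, an occupation in parentheses, then a birth year and a death year.
PATTERN = re.compile(
    r"^(?P<name>[^(]+)\((?P<job>\w+)\)\D*(?P<born>\d{4})\D+(?P<died>\d{4})\s*$"
)

def scrub(text):
    """Return one record per line that looks like a person entry."""
    rows = []
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match is None:
            continue  # navigation bars, blank lines, other page clutter
        born, died = int(match["born"]), int(match["died"])
        if died < born:
            continue  # drop obviously inconsistent records
        rows.append({
            "name": match["name"].strip(),
            "job": match["job"],
            "age_at_death": died - born,
        })
    return rows

rows = scrub(SAMPLE)
```

Grouping `rows` by `job` would then give the raw material for a singers vs. poets comparison.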
exploratory data analysis
- load the data
- compute probabilities
- start plotting
- some basic intuitions around insurance
- hazard, survival probability
- then you can start making money
- calculate survival (product of the per-age survival probabilities)
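A minimal sketch of the hazard and survival calculation, assuming the data has already been reduced to a list of integer ages at death. The function names and the toy input are hypothetical, and the break-even premium is only one simple way to read the "basic intuitions around insurance" bullet.

```python
from itertools import accumulate
from operator import mul

def life_table(ages_at_death):
    """Return (hazard, survival) lists indexed by age.

    hazard[x]   = P(die at age x | alive at the start of age x)
    survival[x] = P(still alive after age x) = product of (1 - hazard[k]), k <= x
    """
    oldest = max(ages_at_death)
    deaths = [0] * (oldest + 1)
    for age in ages_at_death:
        deaths[age] += 1

    hazard = []
    at_risk = len(ages_at_death)  # everyone is alive at age 0
    for died in deaths:
        hazard.append(died / at_risk)
        at_risk -= died

    # Survival is the running product of the per-age survival probabilities.
    survival = list(accumulate((1 - h for h in hazard), mul))
    return hazard, survival

def break_even_premium(hazard, age, payout):
    """Expected payout of a one-year policy: an assumed, simplified pricing rule."""
    return hazard[age] * payout

# Toy input, invented for illustration.
hazard, survival = life_table([1, 1, 2, 3])
```

With this toy input, half of the people alive at age 1 die at that age, so a one-year policy paying 100 at age 1 breaks even at a premium of 50; anything charged above that is where the money is made.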
modeling the data
- straight line as model/design choice
- many procedures to draw trend lines
- subjective design choices
- example of dangers of complicated models
- "negative consequences of choosing a crazy model"
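To make the modeling trade-off above concrete, here is a small sketch that fits two models to the same points: a least-squares straight line, and a polynomial that passes through every point exactly. The data points are invented, and the interpolating polynomial is only a stand-in chosen here for "a crazy model"; the notes do not say which model was used in class.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def fit_interpolating_polynomial(xs, ys):
    """Lagrange polynomial through every point: zero training error by design."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

# Made-up, roughly linear data (y is close to x with a little noise).
XS = [0, 1, 2, 3, 4, 5]
YS = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]

line = fit_line(XS, YS)
wiggly = fit_interpolating_polynomial(XS, YS)
```

On these points the line extrapolates to about 8 at x = 8, while the polynomial, despite fitting every observed point perfectly, gives roughly -150. Both are design choices; the second shows how a perfect fit can still have bad consequences.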
interpreting / goals
- example of Herschel: not clear how to pick the curve's sinuosity
- marketing choice: beauty
- making money
- boss likes the curve wiggly
- "modern machine learning": prediction as the primary goal
- alternative: maximize a success metric (e.g., profit, sales...)
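The difference between the last two goals can be sketched as choosing among candidate models with two different loss functions. Everything here is hypothetical: the candidate predictions, the actual values, and the asymmetric `business_cost` rule (under-predicting assumed to cost three times as much as over-predicting) are invented to show that the two goals can disagree.

```python
def mean_squared_error(predicted, actual):
    """Pure prediction accuracy: average squared gap."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def business_cost(predicted, actual, under=3.0, over=1.0):
    """Assumed success metric: under-predicting hurts more than over-predicting."""
    total = 0.0
    for p, a in zip(predicted, actual):
        gap = p - a
        total += over * gap if gap >= 0 else under * (-gap)
    return total

def pick(candidates, actual, loss):
    """Return the name of the candidate with the smallest loss."""
    return min(candidates, key=lambda name: loss(candidates[name], actual))

# Invented held-out outcomes and two invented models' predictions for them.
ACTUAL = [10, 12, 14]
CANDIDATES = {
    "A": [10, 12, 11],  # exact twice, then under-predicts by 3
    "B": [12, 14, 16],  # always over-predicts by 2
}

best_for_prediction = pick(CANDIDATES, ACTUAL, mean_squared_error)
best_for_business = pick(CANDIDATES, ACTUAL, business_cost)
```

Here model A wins on squared error (3 vs. 4) while model B wins on the assumed business cost (6 vs. 9), so which model is "best" depends on which goal was chosen first.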