Final Project Requirements - adparker/GADSLA_1403 GitHub Wiki

For the Data Science final project, students will work individually or in small groups to analyze a problem in their field of interest using tools from the course.

Address a data-related problem in your professional field or in a field you're interested in. Pick a subject that you're passionate about; if you're strongly interested in the subject matter it'll be more fun for you and you'll probably produce a better project! You can also choose a Kaggle competition, or model yours after one.

In the course of the project, you will complete the following tasks and questions:

  1. Gather, preprocess and visualize a dataset. What can you learn from a high-level analysis?
  2. Apply modeling techniques (regression, recommendation, classification, etc.) and data analysis principles (cross-validation, caution against overfitting, etc.) and report your results.
  3. Plan out how you would implement what you’ve done in a live system. Where would the data live? How would it be represented? How would end users access it? How often would you have to redo your analysis?
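As an illustration of the cross-validation and overfitting principles named in task 2, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the data is synthetic, the helper names (`k_fold_indices`, `fit_line`, `fit_nearest`, `cross_validate`) are hypothetical, and the two models were chosen only because one of them (1-nearest-neighbour) memorises its training data. It is not a required approach for the project; in practice you would likely use the modeling library from the course.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return lambda x: a * x + b


def fit_nearest(xs, ys):
    """1-nearest-neighbour: memorises the training set, so it overfits easily."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(predict, xs, ys):
    """Mean squared error of a predict function on the given points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validate(fit, xs, ys, k=5, seed=0):
    """Return (mean training MSE, mean validation MSE) over k folds."""
    train_scores, val_scores = [], []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        vx, vy = [xs[i] for i in held_out], [ys[i] for i in held_out]
        model = fit(tx, ty)  # fit on the training folds only
        train_scores.append(mse(model, tx, ty))
        val_scores.append(mse(model, vx, vy))  # score on the held-out fold
    return sum(train_scores) / k, sum(val_scores) / k


# Synthetic data (an assumption): a noisy straight line y = 2x + 1.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

line_train, line_val = cross_validate(fit_line, xs, ys)
nn_train, nn_val = cross_validate(fit_nearest, xs, ys)

print(f"linear fit:   train MSE {line_train:.2f}, validation MSE {line_val:.2f}")
print(f"1-nearest nb: train MSE {nn_train:.2f}, validation MSE {nn_val:.2f}")
```

The point of the comparison is that the nearest-neighbour model looks perfect when scored on the data it was fit to, so its training error says nothing about how it will do on new data. Only the held-out (validation) error gives a fair comparison between the two models, which is why results reported for the project should come from data the model did not see during fitting.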

Vet your project with the instructional team to make sure the scope is suitable for this course.

Outline (Due March 25th, Present & Discuss March 27th)

  • The problem you are solving
  • A description of the data set and how you will obtain it
  • Your hypothesis
  • The statistical methods you plan to use and why, if you already know
  • The business applications you expect your findings to have
  • Any related or prior work (by you or others) that you can point to

Presentations (Last Day of Class, May 15th)

On the last day of class, each student will give a 5 to 7 minute presentation that summarizes their results. The presentations should target a non-technical audience, along the lines of a TED talk. If you're in a group, each student should have a turn to speak, for closer to 5 minutes each.

What to cover in the presentation:

  • Overview of problem and hypothesis
  • Overview of data
  • Any visualizations or overview you created
  • Modeling techniques used and why
  • What decisions your findings allow you to make
  • Your implementation plan (or any hurdles there would be)

Grading

  • Excellent
    • Presentation is engaging, clear, and informative, describing the project, approach, and conclusions, and is suitable for a non-technical audience.
  • Good
    • Presentation is as above but falls short on one of engaging, clear, or informative.
  • Fair
    • Student's presentation fails on two out of three of engaging, clear, and informative.
  • Poor
    • Student's presentation fails on all three or is off-topic with respect to their paper.

Additional open-ended feedback will be provided to each student.

Written Report (Or Well-Annotated IPython Notebook)

Students will also submit a short paper with code, or a well-annotated IPython notebook, that describes the project’s technical details. If you could package your code (and data) in a way that's reproducible (on GitHub or something similar), that would be great. The report should target a technical audience, like a technical blog post: something you can point to in a job interview.

What to cover in the report:

  • Description of problem and hypothesis.
  • Detailed description of your data set.
    • How did you decide what features to use in your analysis?
    • What challenges did you face in terms of obtaining and organizing the data?
    • What did you learn from the initial exploration phase?
  • Describe what kinds of statistical methods you used, and perhaps others you considered but did not use, and how you decided what to use.
  • What business applications do your findings have?
  • Describe the implementation plan in detail, from data ingestion to how end users would access the results.

Grading

  • Excellent
    • Report demonstrates thorough understanding of statistical techniques, data management, and the application of these in programming, and is clearly communicated to a reasonably technical audience.
  • Good
    • Report demonstrates above knowledge, but lacks some necessary rigor, detail, and/or exploratory depth or is not well communicated.
  • Fair
    • Report demonstrates some learning of principles taught in class, but is clearly lacking in rigor and/or depth.
  • Poor
    • Report is incomplete or does not conclusively demonstrate understanding of statistics or programming.

Additional open-ended feedback will be provided to each group.

Important Dates

Deliverables and deadlines:

  • Outline of project: March 25th
  • Class discussion of outline: March 27th
  • Final presentation and report: May 15th

The instructors will be checking in with you periodically to make sure you are making good progress on your projects. Please use office hours to obtain additional help.