Final Project Requirements - adparker/GADSLA_1403 GitHub Wiki
For the Data Science final project, students will work individually or in small groups to analyze a problem in their field of interest using tools from the course.
Address a data-related problem in your professional field or in a field you're interested in. Pick a subject that you're passionate about; if you're strongly interested in the subject matter it'll be more fun for you and you'll probably produce a better project! You can also choose a Kaggle competition, or model yours after one.
In the course of the project, you will complete the following tasks and questions:
- Gather, preprocess and visualize a dataset. What can you learn from a high-level analysis?
- Apply modeling techniques (regression, recommendation, classification, etc.) and data analysis principles (cross-validation, caution against overfitting, etc.) and report your results.
- Plan out how you would implement what you’ve done in a live system. Where would the data live? How would it be represented? How would end-users access it? How often would you have to re-do your analysis?
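As a rough illustration of the cross-validation principle mentioned in the tasks above, here is a minimal sketch that uses only the Python standard library. The data is synthetic, and the helper names (`k_fold_indices`, `fit_line`, `cross_val_mse`) are hypothetical, chosen for this example only; in your project you would more likely use a library such as scikit-learn and your own data set.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_val_mse(xs, ys, k=5, seed=0):
    """Return the held-out mean squared error for each of k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the training rows only, then score on the held-out rows.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        scores.append(statistics.fmean(errors))
    return scores


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]

scores = cross_val_mse(xs, ys, k=5)
print("per-fold MSE:", [round(s, 3) for s in scores])
print("mean MSE:", round(statistics.fmean(scores), 3))
```

If the held-out error is much worse than the error on the training rows, that is a common sign of overfitting and worth discussing in your results.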
Vet your project with the instructional team to make sure the scope is suitable for this course.
Outline (Due March 25th, Present & Discuss March 27th)
- Problem you are solving;
- Description of data set and how you will obtain it;
- Hypothesis;
- Statistical methods you plan to use and why, if you know;
- Business applications you think your findings will have;
- Any related or prior work (by you or others) that you can point to.
Presentations (Last Day of Class, May 15th)
On the last day of class, each student will give a 5 to 7 minute presentation summarizing their results. The presentations should target a non-technical audience, along the lines of a TED talk. If you're in a group, each student should have a turn to speak, for closer to 5 minutes each.
What to cover in the presentation:
- Overview of problem and hypothesis
- Overview of data
- Any visualizations or overview you created
- Modeling techniques used and why
- What decisions your findings allow you to make
- Discuss your implementation plan (or any hurdles there would be)
Grading
- Excellent
  - Presentation is engaging, clear, and informative, describing the project, approach, and conclusions, and is suitable for a non-technical audience.
- Good
  - Presentation is as above but falls short on one of engaging, clear, or informative.
- Fair
  - Presentation fails on two out of three of engaging, clear, and informative.
- Poor
  - Presentation fails on all three or is off-topic with respect to the student's paper.
Additional open-ended feedback will be provided to each student.
Written Report (Or Well-Annotated IPython Notebook)
Students will also submit a short paper with code, or a well-annotated IPython notebook, that describes the project’s technical details. If you can, package your code (and data) so that the work is reproducible, for example in a GitHub repository. The report should target a technical audience, in the style of a technical blog post: something you can point to in a job interview.
What to cover in the report:
- Description of problem and hypothesis.
- Detailed description of your data set.
- How did you decide what features to use in your analysis?
- What challenges did you face in terms of obtaining and organizing the data?
- What did you learn from the initial exploration phase?
- Describe what kinds of statistical methods you used, and perhaps others you considered but did not use, and how you decided what to use.
- What business applications do your findings have?
- Describe the implementation plan in detail, from data ingestion to how end-users would access the results.
Grading
- Excellent
  - Report demonstrates thorough understanding of statistical techniques, data management, and the application of these in programming, and is clearly communicated to a reasonably technical audience.
- Good
  - Report demonstrates the above knowledge, but lacks some necessary rigor, detail, or exploratory depth, or is not well communicated.
- Fair
  - Report demonstrates some learning of principles taught in class, but is clearly lacking in rigor or depth.
- Poor
  - Report is incomplete or does not conclusively demonstrate understanding of statistics or programming.
Additional open-ended feedback will be provided to each group.
Important Dates
Deliverable | Deadline |
---|---|
Outline of project | March 25th |
Class Discussion of outline | March 27th |
Final Presentation & Report | May 15th |
The instructors will be checking in with you periodically to make sure you are making good progress on your projects. Please use office hours to obtain additional help.