Business Understanding - mdengo/cab-survey GitHub Wiki

Identifying our Business Goals

This project is a mandatory component of the course Introduction to Data Science, held in the autumn semester of 2018 at the University of Tartu. The goal of the project is to apply the techniques learned in the course independently and successfully to a self-chosen data set. Hence, the overall business goal lies with the course instructors. We can only assume that their goal is to educate the students and to have them pass the course with a substantial gain in knowledge.

Assessing our Situation

Inventory of Resources

Our resources comprise the brainpower of the team members, the team’s private technical equipment, and the equipment available to University of Tartu students in general. Our time budget for the whole project is 26 days (counted from 22 November), or 21 days after the project pitch on 28 November. For software, we will rely on the data science packages of the Python programming language.

Requirements, Assumptions and Constraints

We assume our data set is relatively clean and can be mined with the technologies taught in the lectures. Unfortunately, our computing resources are limited. Furthermore, we need a project schedule and continuous reporting according to CRISP-DM to track our workflow and detect errors.

Risks and Contingencies

As no kernels have been published on kaggle.com for this particular data set yet, it may turn out not to be relevant or interesting enough. However, this is very unlikely, and even if our insights are not significant, they are still insights and will be sufficient for passing the course. The team members’ schedules might prevent the team from meeting the deadline, but good time management should handle this.

Terminology

Our glossary can be found here

Costs and Benefits

This project will cost no money, but it will cost the team’s time and energy. In return, our benefits will be a gain in knowledge and some interesting insights into Indian society, which might even help development organizations or the Indian administration. Finally, completing any kind of project will strengthen our competencies and satisfy our eagerness to create.

Defining our Data Mining Goals

Data Mining Goals

The main goal is to derive a reasonable hypothesis from the data set by applying suitable data mining methods. In addition, the team needs to deliver the project code for data preparation, modeling, evaluation and deployment, as well as a summary of the project report in the form of a presentation poster.

Data Mining Success Criteria

The success of our project is judged by the instructors Meelis Kull and Mikk Puustusmaa and the rest of the Introduction to Data Science team. The formal evaluation criteria were given in the project kickoff lecture. Every team member receives the team grade, which is at most 20 points. Up to 10 points are awarded for presentation quality, which means our poster must contain all necessary content, such as the main results, the applied data science methods, the motivation and the objectives. Up to 10 more points are awarded for technical quality, which means stating a clear objective in our report, gaining relevant insight from the chosen relevant data, and executing everything time-efficiently.