Weekly Schedule of Observer - Kelvin-Zhong/Click-Through-Rate-Prediction GitHub Wiki
Week 7 Feb. 22 Meeting
- Get hands-on with the dataset
- Preprocessing of the data
 A. Use the first 6 days as the training dataset and the 7th day as the testing dataset.
 B. Keep the label click and the following 6 attributes: banner_pos, site_category, app_category, device_ip, device_model, device_type.
 (The IP address should be parsed into a location code.)
- Apply algorithms to the data
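Steps A and B of the preprocessing could be sketched roughly as follows. This is only a minimal sketch under assumptions: it assumes each record carries an `hour` field in YYMMDDHH format (the notes do not mention this column), and `KEEP` and `split_by_day` are hypothetical names used for illustration. Parsing the IP address into a location code is left out.

```python
# Columns named in the notes: the label plus six attributes.
KEEP = ["click", "banner_pos", "site_category", "app_category",
        "device_ip", "device_model", "device_type"]

def split_by_day(rows, n_train_days=6):
    """Split rows (dicts) into (train, test) by calendar day.

    Assumption: every row has an `hour` string formatted YYMMDDHH,
    so the first six characters identify the day.
    """
    rows = list(rows)
    days = sorted({r["hour"][:6] for r in rows})
    train_days = set(days[:n_train_days])
    # The day right after the training days is the test day, if present.
    test_day = days[n_train_days] if len(days) > n_train_days else None

    train, test = [], []
    for r in rows:
        slim = {c: r[c] for c in KEEP}  # drop every other column
        day = r["hour"][:6]
        if day in train_days:
            train.append(slim)
        elif day == test_day:
            test.append(slim)
    return train, test
```

The function keeps everything in memory, so it is meant for trying the idea on a sample; the full dataset would need a streaming or chunked version.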
Week 4 Feb. 1 Meeting
- Proposal and Presentation
 Check the schedule and requirements for the proposal and presentation (time limit, summary format, etc.).
- Paper
 Since the paper has a lot of content and the presentation and summary cannot be long, we will select only a few parts of the paper to present and cover in the summary.
 Week 5: look through the paper, then meet to assign the related work and other parts to different teammates.
 Week 6: all teammates work on the paper presentation and write the summary.
- Project
 After finishing the paper, work on the project.
 Idea:
 (1) First, reduce the dimensionality by filtering out useless features using frequent pattern mining.
 (2) Then apply another classification algorithm to the filtered features.
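One possible reading of step (1) is sketched below; it is an assumption, not the method the team settled on. It treats each pair {feature=value, click=1} as a pattern, computes its support over all rows, and keeps a feature only if at least one of its patterns is frequent. The function names and the `min_support` threshold are hypothetical, and `click` is assumed to be the string "1" or "0" as read from a CSV file.

```python
from collections import Counter

def frequent_click_patterns(rows, features, min_support=0.1):
    """Return {feature: [(value, support, confidence), ...]} for the
    patterns {feature=value, click=1} with support >= min_support."""
    n = len(rows)
    value_counts = Counter()   # occurrences of (feature, value)
    click_counts = Counter()   # occurrences of (feature, value) with a click
    for r in rows:
        for f in features:
            key = (f, r[f])
            value_counts[key] += 1
            if r["click"] == "1":
                click_counts[key] += 1

    patterns = {f: [] for f in features}
    for (f, v), c in click_counts.items():
        support = c / n
        if support >= min_support:
            confidence = c / value_counts[(f, v)]
            patterns[f].append((v, support, confidence))
    return patterns

def select_features(rows, features, min_support=0.1):
    """Keep only the features that take part in a frequent click pattern."""
    patterns = frequent_click_patterns(rows, features, min_support)
    return [f for f in features if patterns[f]]
```

Under this reading, a feature whose values are almost all distinct (for example a raw IP address) never reaches the support threshold and is filtered out, while a low-cardinality feature such as banner_pos is likely to survive.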
Week 4 Jan. 27
- Check and get familiar with the project and paper:
 Project: Predict whether a mobile ad will be clicked. (https://www.kaggle.com/c/avazu-ctr-prediction)
 Paper: Dynamics of News Events and Social Media Reaction. (http://disi.unitn.it/~themis/publications/kdd14.pdf)
 Download the "test" part of the dataset and have a look. (The "train" part is very large, so handle it after the meeting.)
- Meeting:
 The meeting is on Sunday (Jan. 31) evening. Agenda: discuss the division of work, write the proposal, and exchange opinions on the project and paper.
- Tools for implementation:
 Unless something unexpected comes up, we will use Python as our primary programming language. If you don't know Python, spend 10 minutes on the "Python 101" part of this webpage (https://course.ie.cuhk.edu.hk/~engg4030/tutorial/tutorial3/).
 Also, IPython and IPython Notebook are enhanced interactive environments for Python, which are more convenient for development.
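For "having a look" at a dataset file in Python without loading it all, something like the following may be enough. It is a minimal sketch using only the standard library; `peek` is a hypothetical helper, and the in-memory sample with its values is made up to keep the example self-contained.

```python
import csv
import io
from itertools import islice

def peek(fileobj, n=5):
    """Return the header and the first n data rows without reading the rest."""
    reader = csv.reader(fileobj)
    header = next(reader)
    return header, list(islice(reader, n))

# Tiny in-memory stand-in for the real file (values are made up).
sample = io.StringIO("id,click,banner_pos\n1,0,0\n2,1,1\n3,0,0\n")
header, rows = peek(sample, n=2)
print(header)
print(rows)
```

With a real file, the same function can be called on an opened file object, for example `with open(path, newline="") as f: peek(f)`, where `path` points to wherever the downloaded file was unpacked.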
Comment:
Kelvin: The dataset has more than 10 million records, so we may run into some difficulties processing data this large.
As for the paper: it has too much content. If the presentation is short, we don't need to present everything.
As for the project: we can start from the "All 0.5 Benchmark" (sample submission), which scores 0.6931472.
About the teamwork assignment: 3 people for the paper and 3 people for the project; after finishing the paper work, the paper group joins the project work.
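The 0.6931472 score of the "All 0.5 Benchmark" matches what the logarithmic loss gives when every prediction is 0.5, namely ln 2. The sketch below checks this, assuming the score is the mean logarithmic loss; `log_loss` here is our own small helper rather than a library function, and the labels are made up.

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Mean logarithmic loss for binary labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0 and 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [0, 1, 0, 0, 1]  # made-up labels; the result does not depend on them
print(round(log_loss(labels, [0.5] * len(labels)), 7))  # 0.6931472
```

Because the loss at 0.5 is the same for either label, any model we build should score below this value to be better than the benchmark.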