CS290 Spring 2016 - Texera/texera GitHub Wiki

CS290: Text Analytics in the Big Data Era

Spring 2016, Department of Computer Science, UC Irvine

Goal:

Gain hands-on experience building a system that manages large amounts of text information.
Study research challenges related to text and data management.
Form teams to do a group project; learn tools and skills for managing a software project.

Schedule

No.	Date	Topics	Todos
01	03/28/2016	Introduction, SystemT Overview (by Instructor and Zuozhi)	Bid on tasks, form teams, GitHub warm-up
02	04/04/2016	Task assignments, [Lucene Overview](https://docs.google.com/presentation/d/1P9HUFFW72ogqdEZf07r5Y7_gM9JK6Wu8UVgH0bGNkF0/edit?usp=sharing) (by Team 1)	Lucene sample program, design phase
03	04/11/2016	ScanOperator (Team 1), Data Store (Team 1), Development environment (Team 2), progress report (all teams)	Design phase, operator interface, test cases
04	04/18/2016	Token-based fuzzy operator (Team 5), progress report (all teams)	Operator interface, test cases
05	04/25/2016	[Stanford NLP](https://docs.google.com/presentation/d/1ek18Zr0OqQ0RONj8D7W2aSGs9sz1etnf9bEnWTEA2ag/edit?usp=sharing) (Team 7), progress report (all teams)	Test cases, implementation
06	05/02/2016	[Regex Matching](https://docs.google.com/presentation/d/1F3Xboeb_azHSjWbJ2Cl36kGHpIeo_6-lI24XwXjq_hA/edit#slide=id.g12e478a39d_0_10) (Team 3), progress report (all teams)	Implementation
07	05/09/2016	[Fuzzy Tokenizer] (Foobar) (Team 2), progress report (all teams)	Implementation, Documentation
08	05/16/2016	Progress report (all teams)	Finishing Implementation, Starting Documentation

Course schedule:

Prerequisites:

Commitment: 10 hours per week, 2 units

Software Tools:

Tasks (Welcome to propose your own):

Related Projects:

Project Management:

Form teams to do tasks. Each team has 1 or 2 members.
Write test cases first.
If possible, start with the simplest solution (even if it's scan-based), then develop a more advanced one.
Be prepared to make adjustments during the course of the project.
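
As an illustration of the "test cases first, simplest solution first" workflow above, here is a minimal sketch in Python. It is not part of the course code base: the function name `scan_keyword_match`, its signature, and the sample documents are all hypothetical. The test is written against the intended behavior, and the baseline that satisfies it simply scans every document with no index.

```python
from typing import Iterable, List, Tuple


def scan_keyword_match(docs: Iterable[str], keyword: str) -> List[Tuple[int, int]]:
    """Return (doc_index, char_offset) for each case-insensitive keyword hit.

    Hypothetical scan-based baseline: every document is read linearly,
    no index is built. Offsets assume lowercasing keeps string length
    unchanged, which holds for ASCII text.
    """
    if not keyword:
        return []
    needle = keyword.lower()
    matches: List[Tuple[int, int]] = []
    for doc_index, text in enumerate(docs):
        haystack = text.lower()
        start = haystack.find(needle)
        while start != -1:
            matches.append((doc_index, start))
            # Resume one character later so overlapping hits are found too.
            start = haystack.find(needle, start + 1)
    return matches


def test_scan_keyword_match() -> None:
    # Written before the implementation: pins down the expected behavior.
    docs = ["Text analytics at scale", "No match here", "text, TEXT"]
    assert scan_keyword_match(docs, "text") == [(0, 0), (2, 0), (2, 6)]
    assert scan_keyword_match(docs, "lucene") == []
    assert scan_keyword_match(docs, "") == []


test_scan_keyword_match()
```

Once a baseline like this passes, a more advanced (for example, index-based) solution can be developed against the same test cases.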

Project Protocol:

Do not add large files to git. Check the GitHub guidance for details.
Write high-quality code.
Do high-quality peer reviews.
Write good documentation using the GitHub wiki. Each wiki page lists its authors and reviewers with their email addresses.
Drawing diagrams: Use Google Drawings. Add diagram source files to Google Drive and change the ownership to "texeraproject AT gmail.com". Add authors to each diagram, and include the source file link on the wiki. Here is an example.
Use the "sandbox/" folder on git only for your own experiments. Name your folder under "sandbox/" in the format "[firstname]-[lastname]" (all lower case).
Use GitHub Issues to manage tasks and bugs.
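
The sandbox naming rule above can be sketched as a small Python helper. The function `sandbox_folder` and the name "Jane Doe" are hypothetical and only illustrate the "[firstname]-[lastname]" format; how to handle middle names or multi-word names is not specified here, so the sketch leaves them as given.

```python
def sandbox_folder(first_name: str, last_name: str) -> str:
    """Build 'sandbox/<firstname>-<lastname>' in all lower case.

    Hypothetical helper: it only trims surrounding whitespace and
    lowercases the two parts; it does not normalize multi-word names.
    """
    first = first_name.strip().lower()
    last = last_name.strip().lower()
    if not first or not last:
        raise ValueError("both first and last name are required")
    return f"sandbox/{first}-{last}"


# Illustrative name, not a real team member.
print(sandbox_folder("Jane", "Doe"))
```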


Sandeep Reddy Madugala	Rajesh Yarlagadda	Sudeep Meduri


Kishore Narendran	Shiladitya Sen


Zuozhi Wang	Shuying


Akshay Jain	Prakul Agarwal


Varun Bharill	Parag Sarogi


Jinggang Diao	Flavio Bayer	Qing Tang


Feng Hong	Yang Jiao