Phase II Roadmap - roeiba/WikiRep GitHub Wiki
Website for simple usage
- Be able to run tests on the server with good data visualization
- Simple environment options
Database
- Test a number of databases to overcome memory limits
- Choose a language-independent database format for data representation
Flexibility
We want to be able to choose between different implementations of:
- Stemming algorithms
- Pruning algorithms
- Comparison algorithms
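
One way to make these choices swappable is a small name-to-function registry, so that an implementation can be selected from configuration. The following is a minimal sketch for stemmers only; `STEMMERS`, `register_stemmer`, `stem_all`, and both toy stemmers are hypothetical names for illustration and are not part of the existing WikiRep code. The same pattern could be repeated for pruning and comparison algorithms.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a configurable name to a stemming function.
STEMMERS: Dict[str, Callable[[str], str]] = {}


def register_stemmer(name: str):
    """Decorator that records a stemmer under the given name."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        STEMMERS[name] = func
        return func
    return decorator


@register_stemmer("identity")
def identity_stemmer(word: str) -> str:
    # Baseline that leaves every word unchanged.
    return word


@register_stemmer("strip_s")
def strip_s_stemmer(word: str) -> str:
    # Toy rule for illustration only; not a real stemming algorithm.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def stem_all(words: List[str], stemmer_name: str) -> List[str]:
    """Stem every word with the implementation selected by name."""
    stem = STEMMERS[stemmer_name]
    return [stem(w) for w in words]
```

With this layout, switching implementations is a matter of passing a different name, for example `stem_all(tokens, "strip_s")`.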
Pruning
- We want to reduce the size of the feature space by removing articles that are too short or too specific
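
A minimal sketch of such a pruning pass, under stated assumptions: articles are already tokenized, "too short" means fewer than `min_tokens` tokens, and "too specific" means that too few of an article's distinct terms occur in any other article. The function name, the thresholds, and this definition of specificity are all assumptions made for illustration; the roadmap does not fix them.

```python
from collections import Counter
from typing import Dict, List


def prune_articles(
    articles: Dict[str, List[str]],
    min_tokens: int = 3,                # assumed threshold for "too short"
    min_shared_fraction: float = 0.5,   # assumed threshold for "too specific"
) -> Dict[str, List[str]]:
    """Return only the articles that are neither too short nor too specific."""
    # Document frequency: in how many articles does each term occur?
    doc_freq: Counter = Counter()
    for tokens in articles.values():
        doc_freq.update(set(tokens))

    kept: Dict[str, List[str]] = {}
    for title, tokens in articles.items():
        if len(tokens) < min_tokens:
            continue  # too short
        distinct = set(tokens)
        if not distinct:
            continue  # nothing to compare against
        # Terms that also occur in at least one other article.
        shared = sum(1 for term in distinct if doc_freq[term] > 1)
        if shared / len(distinct) < min_shared_fraction:
            continue  # too specific
        kept[title] = tokens
    return kept
```

Real thresholds would need to be tuned against the corpus; the defaults here are small only so that the behavior is easy to check on toy data.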
Performance
- We want to be able to measure the performance of comparing and parsing, so that we can improve it
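
As a rough illustration, stage timings could be collected with a small context manager built on the standard library. The names `timings`, `timed`, and `summary` and the stage labels are hypothetical; this is a sketch of one possible approach, not the project's measurement code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, List

# Hypothetical store: stage name -> list of elapsed times in seconds.
timings: Dict[str, List[float]] = defaultdict(list)


@contextmanager
def timed(stage: str):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage].append(time.perf_counter() - start)


def summary() -> Dict[str, float]:
    """Return the mean duration in seconds for each recorded stage."""
    return {stage: sum(values) / len(values) for stage, values in timings.items()}
```

Usage would look like `with timed("parsing"): ...` around the parsing call and `with timed("comparing"): ...` around the comparison call, followed by `summary()` to see where time is spent.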
Research
- Find the best stemming, pruning, and comparing algorithms
Stemmers
- Find different stemmer implementations and test them
Comparison methods
- Compare different metrics
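
To make the comparison concrete, here is a minimal sketch of two candidate metrics over sparse term-weight vectors stored as dictionaries. Cosine similarity and Jaccard similarity are chosen here only as familiar examples; the roadmap does not name specific metrics, and the function names and the vector representation are assumptions.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for empty or all-zero vectors
    return dot / (norm_a * norm_b)


def jaccard_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Overlap of the key sets, ignoring the weights."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:
        return 0.0  # convention for two empty vectors
    return len(keys_a & keys_b) / len(union)
```

Because both functions share the same signature, they can be run over the same pairs of vectors and their rankings compared directly.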
Metadata
- Use Wikipedia metadata, such as links, for pruning and comparing