Phase II Roadmap - roeiba/WikiRep GitHub Wiki

Website for simple usage

  • Be able to run tests on the server with good data visualization
  • Simple environment options

Database

  • Test a number of databases to overcome memory limits
  • Choose a language-independent database format for data representation

Flexibility

We want to be able to choose between different implementations of:

  • Stemming algorithms
  • Pruning algorithms
  • Comparison algorithms
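
One possible way to make these algorithms interchangeable is a small registry that maps a name to a callable. The sketch below is only an illustration: `STEMMERS`, `identity_stemmer`, `strip_s_stemmer`, and `stem_tokens` are hypothetical names, not part of the WikiRep code base, and the toy stemmer stands in for a real one. The same pattern could be repeated for pruning and comparison algorithms.

```python
from typing import Callable, Dict, List

# Hypothetical type alias: a stemmer maps one word to its stem.
Stemmer = Callable[[str], str]


def identity_stemmer(word: str) -> str:
    """Baseline that leaves the word unchanged."""
    return word


def strip_s_stemmer(word: str) -> str:
    """Toy stemmer: drops one trailing 's' from words longer than 3 characters."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


# Registry of available implementations, selected by name (assumed design).
STEMMERS: Dict[str, Stemmer] = {
    "identity": identity_stemmer,
    "strip_s": strip_s_stemmer,
}


def stem_tokens(tokens: List[str], stemmer_name: str) -> List[str]:
    """Apply the stemmer registered under `stemmer_name` to every token."""
    stemmer = STEMMERS[stemmer_name]
    return [stemmer(token) for token in tokens]
```

With this layout, switching the stemmer is a configuration change (`stem_tokens(tokens, "strip_s")` versus `stem_tokens(tokens, "identity")`) rather than a code change.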

Pruning

  • We want to reduce the size of the feature space by removing articles that are too short or too specific
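
A minimal sketch of such a filter, under stated assumptions: the roadmap does not define "too short" or "too specific", so here "too short" means fewer than `min_tokens` tokens, and "too specific" means that less than `min_shared_ratio` of an article's distinct terms appear in any other article. The function name, both thresholds, and the input layout (title mapped to a token list) are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def prune_articles(
    articles: Dict[str, List[str]],
    min_tokens: int = 50,           # assumed threshold for "too short"
    min_shared_ratio: float = 0.5,  # assumed threshold for "too specific"
) -> Dict[str, List[str]]:
    """Return only the articles that pass both the length and specificity checks."""
    # Document frequency: in how many articles does each term occur?
    doc_freq: Counter = Counter()
    for tokens in articles.values():
        doc_freq.update(set(tokens))

    kept: Dict[str, List[str]] = {}
    for title, tokens in articles.items():
        if len(tokens) < min_tokens:
            continue  # too short
        distinct = set(tokens)
        if not distinct:
            continue  # nothing to compare
        # Terms that also occur in at least one other article.
        shared = sum(1 for term in distinct if doc_freq[term] > 1)
        if shared / len(distinct) < min_shared_ratio:
            continue  # too specific under the assumed definition
        kept[title] = tokens
    return kept
```

Both thresholds would need tuning against real Wikipedia data; the defaults here are placeholders.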

Performance

  • We want to be able to measure the performance of comparing and parsing, so we can improve it
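
One lightweight way to collect such measurements is a timing context manager built on the standard library. This is a sketch, not the project's profiling setup: `timed`, `timings`, `summarize`, and the stage names are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, List

# Wall-clock durations in seconds, grouped by stage name.
timings: Dict[str, List[float]] = defaultdict(list)


@contextmanager
def timed(stage: str):
    """Record how long the enclosed block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage].append(time.perf_counter() - start)


def summarize() -> Dict[str, Dict[str, float]]:
    """Return call count and total seconds for every recorded stage."""
    return {
        stage: {"calls": len(durations), "total_s": sum(durations)}
        for stage, durations in timings.items()
    }
```

Usage would look like `with timed("parse"): ...` around the parsing step and `with timed("compare"): ...` around the comparison step, followed by `summarize()` at the end of a run. For finer-grained analysis, the standard `cProfile` module is an alternative.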

Research

  • Find the best stemming, pruning, and comparison algorithms

Stemmers

  • Find different stemmer implementations and test them

Compare methods

  • Compare different metrics
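
As an illustration of what comparing metrics could involve, the sketch below implements two common candidates, cosine similarity and Jaccard similarity, over sparse term-weight vectors. The roadmap does not name specific metrics, so this choice, the function names, and the dict-based vector layout are all assumptions.

```python
import math
from typing import Dict

# Hypothetical representation: a sparse vector mapping term -> weight.
Vector = Dict[str, float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a: Vector, b: Vector) -> float:
    """Overlap of the term sets, ignoring weights; 0.0 if both are empty."""
    terms_a, terms_b = set(a), set(b)
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)
```

Because both functions share one signature, they can be run over the same pairs of documents and their rankings compared directly.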

Metadata

  • Use Wikipedia metadata, such as links, for pruning and comparing