CLI (Command Line Interface)

Development mode: execute python main.py

Makedump command

$ python main.py makedump -h

Description

Downloads the specified articles from the Wikipedia site, 
merges them into one file, and compresses the result as a Wikipedia dump file.
article1: article's canonical name on the Wikipedia webpage
dump: output file name (if not specified, [[defaults]] are used)

positional arguments:

article         List of articles for download

optional arguments:

-h, --help      show this help message and exit
-dump DUMPFILE  output file name
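
For example, downloading two articles into a compressed dump (the article titles and output file name here are illustrative placeholders):

$ python main.py makedump "Dog" "Cat" -dump wikidump.bz2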

Parse command

$ python main.py parse --help

Description

Parses dump file into WikiDocuments in XML representation.

optional arguments:

-h, --help            show this help message and exit
-d DUMP, --dump DUMP  input dump file path
-o OUTPUT, --output OUTPUT
                    output XML file path
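
For example, assuming a dump produced by the makedump step (file names are illustrative):

$ python main.py parse --dump wikidump.bz2 --output wikiparsed.xml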

Build command

$ python main.py build -h

Description

Iterates over all WikiDocuments found in 'src', 
builds a words database (DatabaseWrapper), and saves it to 'dst'.

optional arguments:

-h, --help  show this help message and exit
--src SRC   Source XML containing the output of the 'parse' command
--dst DST   output wdb filename
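
For example, continuing from the parse step (file names are illustrative):

$ python main.py build --src wikiparsed.xml --dst wikibuild.wdb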

Get Value command

$ python main.py get_value -h

Description

Calculates the text's vector in the Wikipedia concept space 
according to the words database at 'wikibuild.wdb'.

positional arguments:

text             text for value calculation

optional arguments:

-h, --help       show this help message and exit
--dbpath DBPATH  word database path
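
For example, with a sample text and the database built above (both illustrative):

$ python main.py get_value "dogs are loyal animals" --dbpath wikibuild.wdb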

Compare command

$ python main.py compare -h

Description

Compares two texts according to the words database at 'wikibuild.wdb'.

optional arguments:

-h, --help       show this help message and exit
--dbpath DBPATH  word database path
--text1 TEXT1    first text
--text2 TEXT2    second text
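
For example (the sample texts and database path are illustrative):

$ python main.py compare --dbpath wikibuild.wdb --text1 "dogs are loyal animals" --text2 "cats are independent animals"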

Production mode: executable wiki_rep (an end-to-end pipeline example follows this list)

  • wiki_rep makedump article1 [article2] […] [-dump wikidump.bz2]

    Downloads the specified articles from the Wikipedia site, 
    merges them into one file, and compresses the result as a Wikipedia dump file.
    article1: article's canonical name on the Wikipedia webpage
    dump: output file name (if not specified, [[defaults]] are used)
  • wiki_rep download [-src url] [-o|--output wikidump.bz2]

    Downloads a Wikipedia dump file from the Wikipedia site.
    url:  URL of the dump file (if not specified, [[defaults]] are used)
    src:  dump file name on disk (if not specified, [[defaults]] are used)
  • wiki_rep parse [-d|--dump wikidump.bz2] [-o|--output wikiparsed.xml]

    Parses the dump file into WikiDocuments in XML representation, each of which contains:
        doc_id: Wikipedia concept's ID
        title: Wikipedia concept's title
        text: clean text
        rev_id: Wikipedia concept's revision
  • wiki_rep build src dest

    Iterates over all WikiDocuments found in 'src', builds a words database (DatabaseWrapper), and saves it to 'dest'.
    src: source XML containing the output of the 'parse' command
    dest: output destination
  • wiki_rep compare text1 text2 [--word_db wikibuild.wdb]

    Compares two texts according to the words database at 'wikibuild.wdb'.
    
  • wiki_rep get_value text [--word_db wikibuild.wdb]

    Calculates the text's vector in the Wikipedia concept space according to the words database at 'wikibuild.wdb'.
  • wiki_rep analyze text [--word_db wikibuild.wdb]

    To be defined
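
As referenced above, the following sketch chains the commands into a full pipeline, assuming the wiki_rep executable is on the PATH; the article titles, file names, and texts are illustrative placeholders:

$ wiki_rep makedump "Dog" "Cat" -dump wikidump.bz2
$ wiki_rep parse --dump wikidump.bz2 --output wikiparsed.xml
$ wiki_rep build wikiparsed.xml wikibuild.wdb
$ wiki_rep get_value "dogs are loyal animals" --word_db wikibuild.wdb
$ wiki_rep compare "dogs are loyal animals" "cats are independent animals" --word_db wikibuild.wdb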
    