CLI - Command Line Interface
$ python main.py makedump -h
Description
Downloads the specified articles from the Wikipedia site, merges them into one file, and compresses it as a Wikipedia dump file.
article1: article's canonical name on the Wikipedia webpage
dump: output file name (if not specified, [[defaults]] are used)
positional arguments:
article List of articles for download
optional arguments:
-h, --help       show this help message and exit
-dump DUMPFILE   output file name
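For example, to download two articles and write them to a custom dump file (the article titles and output file name here are illustrative):
$ python main.py makedump "Dog" "Cat" -dump animals_dump.bz2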
$ python main.py parse --help
Description
Parses the dump file into WikiDocuments in XML representation.
optional arguments:
-h, --help                  show this help message and exit
-d DUMP, --dump DUMP        input dump file path
-o OUTPUT, --output OUTPUT  output XML file path
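For example, to parse a previously created dump into its XML representation (file names are illustrative):
$ python main.py parse --dump wikidump.bz2 --output wikiparsed.xml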
$ python main.py build -h
Description
Iterates over all WikiDocuments found in 'src', builds a words database (DatabaseWrapper), and saves it to 'dst'.
optional arguments:
-h, --help   show this help message and exit
--src SRC    source XML containing the output of the 'parse' command
--dst DST    output wdb file name
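For example, to build a words database from the parsed XML (file names are illustrative):
$ python main.py build --src wikiparsed.xml --dst wikibuild.wdb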
$ python main.py get_value -h
Description
Calculates the text's vector in Wikipedia concept space according to the words database at 'wikibuild.wdb'.
positional arguments:
text text for value calculation
optional arguments:
-h, --help        show this help message and exit
--dbpath DBPATH   words database path
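For example, to compute the concept-space vector of a short text against an existing database (the sample text and database path are illustrative):
$ python main.py get_value "support vector machines" --dbpath wikibuild.wdb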
$ python main.py compare -h
Description
Compares two texts according to the words database at 'wikibuild.wdb'.
optional arguments:
-h, --help        show this help message and exit
--dbpath DBPATH   words database path
--text1 TEXT1     first text
--text2 TEXT2     second text
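For example, to compare two short texts using an existing database (the sample texts and database path are illustrative):
$ python main.py compare --dbpath wikibuild.wdb --text1 "neural networks" --text2 "deep learning"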
- wiki_rep makedump article1 [article2] […] [-dump wikidump.bz2]
  Downloads the specified articles from the Wikipedia site, merges them into one file, and compresses it as a Wikipedia dump file.
  article1: article's canonical name on the Wikipedia webpage
  dump: output file name (if not specified, [[defaults]] are used)
- wiki_rep download [-src url] [-o|--output wikidump.bz2]
  Downloads a Wikipedia dump file from the Wikipedia site.
  url: URL of the dump file (if not specified, [[defaults]] are used)
  src: dump file name on disk (if not specified, [[defaults]] are used)
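  For example, assuming the standard Wikimedia dump location (the exact URL and output file name here are illustrative):
  wiki_rep download -src https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2 -o wikidump.bz2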
- wiki_rep parse [-d|--dump wikidump.bz2] [-o|--output wikiparsed.xml]
  Parses the dump file into WikiDocuments in XML representation, each of which contains:
  doc_id: Wikipedia concept's ID
  title: Wikipedia concept's title
  text: clean text
  rev_id: Wikipedia concept's revision
- wiki_rep build src dest
  Iterates over all WikiDocuments found in 'src', builds a words database (DatabaseWrapper), and saves it to 'dest'.
  src: source XML containing the output of the 'parse' command
  dest: output destination
- wiki_rep compare text1 text2 [--word_db wikibuild.wdb]
  Compares two texts according to the words database at 'wikibuild.wdb'.
- wiki_rep get_value text [--word_db wikibuild.wdb]
  Calculates the text's vector in Wikipedia concept space according to the words database at 'wikibuild.wdb'.
- wiki_rep analyze text [-word_db wikibuild.wdb]
  To be defined.
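Taken together, a typical end-to-end run using the main.py interface might look like this (article names, texts, and file names are illustrative):
$ python main.py makedump "Dog" "Cat" -dump wikidump.bz2
$ python main.py parse --dump wikidump.bz2 --output wikiparsed.xml
$ python main.py build --src wikiparsed.xml --dst wikibuild.wdb
$ python main.py compare --dbpath wikibuild.wdb --text1 "dog breeds" --text2 "cat behavior"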