CLI - Command Line Interface
$ python main.py makedump -h
Description
Downloads the specified articles from the Wikipedia site, merges them into one file, and compresses it as a Wikipedia dump file.
article1: article's canonical name on the Wikipedia webpage
dump: output file name (if not specified, [[defaults]] are used)
positional arguments:
article List of articles for download
optional arguments:
-h, --help       show this help message and exit
-dump DUMPFILE   output file name
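For example, to download two articles and write them to a custom dump file (the article titles and output file name here are illustrative):
$ python main.py makedump "Dog" "Cat" -dump animals_dump.bz2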
$ python main.py parse --help
Description
Parses the dump file into WikiDocuments in XML representation.
optional arguments:
-h, --help                  show this help message and exit
-d DUMP, --dump DUMP        input dump file path
-o OUTPUT, --output OUTPUT  output XML file path
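For example, to parse a previously created dump into its XML representation (file names are illustrative):
$ python main.py parse --dump wikidump.bz2 --output wikiparsed.xml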
$ python main.py build -h
Description
Iterates over all WikiDocuments found in 'src', builds a words database (DatabaseWrapper), and saves it to 'dst'.
optional arguments:
-h, --help   show this help message and exit
--src SRC    source XML containing the output of the 'parse' command
--dst DST    output wdb file name
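For example, to build a words database from the parsed XML (file names are illustrative):
$ python main.py build --src wikiparsed.xml --dst wikibuild.wdb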
$ python main.py get_value -h
Description
Calculates the text's vector in Wikipedia concept space according to the words database at 'wikibuild.wdb'.
positional arguments:
text text for value calculation
optional arguments:
-h, --help        show this help message and exit
--dbpath DBPATH   words database path
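For example, to compute the concept-space vector of a short text against an existing database (the sample text and database path are illustrative):
$ python main.py get_value "support vector machines" --dbpath wikibuild.wdb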
$ python main.py compare -h
Description
Compares two texts according to the words database at 'wikibuild.wdb'.
optional arguments:
-h, --help        show this help message and exit
--dbpath DBPATH   words database path
--text1 TEXT1     first text
--text2 TEXT2     second text
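For example, to compare two short texts using an existing database (the sample texts and database path are illustrative):
$ python main.py compare --dbpath wikibuild.wdb --text1 "neural networks" --text2 "deep learning"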
- wiki_rep makedump article1 [article2] […] [-dump wikidump.bz2]
  Downloads the specified articles from the Wikipedia site, merges them into one file, and compresses it as a Wikipedia dump file.
  article1: article's canonical name on the Wikipedia webpage
  dump: output file name (if not specified, [[defaults]] are used)
- wiki_rep download [-src url] [-o|--output wikidump.bz2]
  Downloads a Wikipedia dump file from the Wikipedia site.
  url: URL of the dump file (if not specified, [[defaults]] are used)
  src: dump file name on disk (if not specified, [[defaults]] are used)
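  For example, assuming the standard Wikimedia dump location (the exact URL and output file name here are illustrative):
  wiki_rep download -src https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2 -o wikidump.bz2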
- wiki_rep parse [-d|--dump wikidump.bz2] [-o|--output wikiparsed.xml]
  Parses the dump file into WikiDocuments in XML representation, each of which contains:
  doc_id: Wikipedia concept's ID
  title: Wikipedia concept's title
  text: clean text
  rev_id: Wikipedia concept's revision
- wiki_rep build src dest
  Iterates over all WikiDocuments found in 'src', builds a words database (DatabaseWrapper), and saves it to 'dest'.
  src: source XML containing the output of the 'parse' command
  dest: output destination
- wiki_rep compare text1 text2 [--word_db wikibuild.wdb]
  Compares two texts according to the words database at 'wikibuild.wdb'.
- wiki_rep get_value text [--word_db wikibuild.wdb]
  Calculates the text's vector in Wikipedia concept space according to the words database at 'wikibuild.wdb'.
- wiki_rep analyze text [-word_db wikibuild.wdb]
  To be defined.
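Taken together, a typical end-to-end run using the main.py interface might look like this (article names, texts, and file names are illustrative):
$ python main.py makedump "Dog" "Cat" -dump wikidump.bz2
$ python main.py parse --dump wikidump.bz2 --output wikiparsed.xml
$ python main.py build --src wikiparsed.xml --dst wikibuild.wdb
$ python main.py compare --dbpath wikibuild.wdb --text1 "dog breeds" --text2 "cat behavior"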