Import - momo54/DSMW GitHub Wiki
Importing pages in DSMW
If you have a large set of pages that you want to import into DSMW, the MediaWiki configuration matters. Remember that all pages have to be processed before they can be used with DSMW.
- The import of a large XML file can fail because of the PHP, Apache, or MediaWiki setup (see import large XML dump). To avoid this, I configured my php.ini (/etc/php5/apache2/php.ini on Ubuntu) as follows:
max_execution_time = 2000
max_input_time = 2000
memory_limit = -1
post_max_size = 80M
upload_max_filesize = 80M
The documentation also says to set:
; Default timeout for socket based streams (seconds)
default_socket_timeout = 2000
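To double-check that the directives above are really set in the file you edited, a small helper can read them back. This is a minimal sketch, not part of DSMW or MediaWiki: the function name `ini_value` is hypothetical, and it only reads the file given as argument, it does not ask PHP what it actually loaded.

```shell
# Print the last uncommented value of a directive in a php.ini-style file.
# ini_value is a hypothetical helper name used for illustration only.
#   $1 = path to the ini file, $2 = directive name
ini_value() {
    awk -F= -v key="$2" '
        /^[ \t]*;/ { next }                      # skip comment lines
        {
            name = $1
            gsub(/[ \t]/, "", name)              # strip blanks around the name
            if (name == key) {
                val = $0
                sub(/^[^=]*=[ \t]*/, "", val)    # drop "name = "
                sub(/[ \t]*$/, "", val)          # drop trailing blanks
                found = val                      # a later line overrides an earlier one
            }
        }
        END { if (found != "") print found }
    ' "$1"
}

# Example (path taken from the text above):
#   ini_value /etc/php5/apache2/php.ini memory_limit
```

Note that the command-line PHP used by the maintenance scripts may read a different php.ini than Apache does; `php --ini` lists the configuration files the CLI loads, so the same check can be repeated on that file.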
- Next, I imported a large XML file through the 'Special:Import' page of standard MediaWiki. This page is available when logged in as WikiSysop. No particular output is generated during the import, just a "waiting for localhost" message from the browser. It can take a long time. More documentation is available at importing XML Dumps. In the end I preferred to import using importDump.php:
molli@molli-VirtualBox:/var/www/mw1.16.4/maintenance$ php importDump.php /home/molli/dumpXML.xml
100 (158.80 pages/sec 158.80 revs/sec)
200 (183.75 pages/sec 183.75 revs/sec)
300 (201.51 pages/sec 201.51 revs/sec)
400 (211.86 pages/sec 211.86 revs/sec)
...
9000 (8.15 pages/sec 8.15 revs/sec)
9100 (8.17 pages/sec 8.17 revs/sec)
Done!
You might want to run rebuildrecentchanges.php to regenerate
the recentchanges page.
- The first time I ran that, I got the PHP error "Maximum function nesting level of '100' reached". I changed the file "/etc/php5/conf.d/xdebug.ini", adding "xdebug.max_nesting_level=1000":
zend_extension=/usr/lib/php5/20090626+lfs/xdebug.so
xdebug.remote_enable=1
xdebug.remote_handler=dbgp
xdebug.remote_mode=req
xdebug.remote_host=127.0.0.1
xdebug.remote_port=9000
xdebug.max_nesting_level=1000
- Next I ran:
molli@molli-VirtualBox:/var/www/mw1.16.4/maintenance$ php rebuildrecentchanges.php
Loading from page and revision tables...
$wgRCMaxAge=7862400 (91 days)
Updating links and size differences...
Loading from user, page, and logging tables...
Flagging bot account edits...
Flagging auto-patrolled edits...
Deleting feed timestamps.
Done.
molli@molli-VirtualBox:/var/www/mw1.16.4/maintenance$
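To judge whether an import is complete, it can help to know how many pages the dump contains and compare that with the counter printed by importDump.php. The sketch below assumes a standard MediaWiki XML export, where each page starts with a `<page>` tag on its own line; `count_pages` is a hypothetical helper name.

```shell
# Count the pages in a MediaWiki XML dump.
# Assumes one opening <page> tag per line, as in standard exports;
# count_pages is a hypothetical helper name used for illustration only.
#   $1 = path to the XML dump
count_pages() {
    grep -c '<page>' "$1"
}

# Example (dump path taken from the text above):
#   count_pages /home/molli/dumpXML.xml
```

If the number is clearly higher than the last counter shown by importDump.php, some pages were probably skipped or the import stopped early.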
- The results of an import can be surprising. For example, I don't see some of the imported pages. Some complications can appear during the import (see Complications).
- The imported history of a page can be older than that of a similar local page (this is often the case for the main page).
- Next, you must run the "article update" by running DSMW/maintenance/ArticleUpdate.php:
molli@molli-VirtualBox:/var/www/mw1.16.4/extension/DSMW/maintenance$ php ArticleUpdate.php
...
processing (9187,9192) : Chicken chimi in the oven 4.
processing (9188,9192) : Recipe section title.
processing (9189,9192) : Constraint section title.
processing (9190,9192) : Substitution 1 in My strawberry pie.
processing (9191,9192) : News.
processing (9192,9192) : TODO list.
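The three maintenance steps above (import, rebuild recent changes, article update) can be chained in one function so that a failure stops the sequence. This is only a sketch: `dsmw_import`, `MW_ROOT` and `PHP_BIN` are hypothetical names, and the default paths are copied from the prompts shown above, so they need to be adapted to your installation.

```shell
# Run the import steps in order, stopping at the first failure.
# dsmw_import, MW_ROOT and PHP_BIN are hypothetical names for illustration.
#   $1 = path to the XML dump
dsmw_import() {
    dump="$1"
    mw_root="${MW_ROOT:-/var/www/mw1.16.4}"   # MediaWiki root, from the prompts above
    php_bin="${PHP_BIN:-php}"                 # PHP command-line interpreter

    # 1. Import the XML dump into MediaWiki
    (cd "$mw_root/maintenance" && "$php_bin" importDump.php "$dump") || return 1

    # 2. Regenerate the recent changes page
    (cd "$mw_root/maintenance" && "$php_bin" rebuildrecentchanges.php) || return 1

    # 3. Let DSMW process the imported pages
    #    (directory as shown in the prompt above; adjust if yours differs)
    (cd "$mw_root/extension/DSMW/maintenance" && "$php_bin" ArticleUpdate.php) || return 1
}

# Example:
#   dsmw_import /home/molli/dumpXML.xml
```

Each step runs in a subshell, so the caller's working directory is left unchanged.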