title: From Wordpress to Flatpress
link: https://onemoretech.wordpress.com/2010/05/06/from-wordpress-to-flatpress/
author: lembobro
description:
post_id: 163
created: 2010/05/06 02:40:25
created_gmt: 2010/05/06 02:40:25
comment_status: open
post_name: from-wordpress-to-flatpress
status: publish
post_type: post

From Wordpress to Flatpress

Following more late nights than I really wanted to invest, the eldapo blog has been converted from Wordpress to Flatpress, making it entirely database free for the first time since it was published in 2004.

Doing this conversion was an educational experience. It was also quite humbling. In the end, a combination of a PHP script supplied by the Flatpress project, a good text editor with sorting capability, and a couple of simple Perl scripts got the job done.

Given the amount of effort involved, the obvious question is: why?

I moved from Blogger to Wordpress a number of years ago, after finding the former too confining for my creative tastes. Although I did learn enough about Wordpress to do some minor tweaks to themes and other superficial aspects, the content of this blog was always more important to me than its appearance.

The one enduring issue I had with Wordpress was its dependence on a database. Given the relatively small number of entries on this blog (approximately 700), it’s hard to justify using a db to store what is substantially a collection of unformatted text. So off I went looking for a halfway decent blogging package that didn’t need a database.

As a result, I came upon Flatpress and began to gather the knowledge and tools I’d need to convert to it from Wordpress.

Before beginning my conversion efforts I set up a new Wordpress instance with its own database on my home workstation. I then used Wordpress’s built-in export and import facility to get the entry data from my Internet host.

Initially I thought the best way to migrate my entries from the old blog to the new would be to do an export from Wordpress into its own XML format. I was able to find a Python script that would parse that into separate .html files. After some tweaking to overcome what must have been a typo in the script, I made some slight modifications to insert the post’s title and date into each page.

After several abortive attempts I ultimately found the task of scripting the conversion of these .html files into Flatpress’s own special page format too time consuming to pursue further. Instead, after searching the Flatpress project wiki, I found the wpimport script. This is a PHP script that directly queries the backend database of a Wordpress blog and converts the data obtained into individual Flatpress entries. One of the nice things about this script is that it automatically preserves any existing categories and the assignment of entries to them.

The only problem with this approach is that Wordpress saves successive drafts of posts in separate database records, so a heavily edited post will have several database entries. The first time I imported one of my blogs into Flatpress using wpimport, I was faced with a dozen entries for a single article, including one for each image. My original solution to this problem was to manually prune the unwanted entries on the command line. Later, I discovered that the number of extra entries converted could be substantially reduced by doing a Wordpress export to XML from my local instance and then re-importing it. Even after this there was some cleanup to do. I found that running the Flatpress re-indexing routine would give me a list of entry IDs and subject lines that I could sort by subject, which put duplicates next to each other. I then removed from that list any entries I wanted to keep, usually after a visual inspection of the entry file to make sure I really wanted it. What remained was passed through a script that deleted the unwanted entries.

Here’s the code for that:

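```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Text::ParseWords;

# CSV of entries to delete, one "entryid,title" pair per line.
my $infile = "/tmp/tobedeleted.csv";
my $WEBDIR = '/var/www/html/eldapo/wp-content/content';

open my $fh, '<', $infile or die "Cannot open $infile: $!";

while (<$fh>) {
    chomp;
    next unless length;

    # Only the entry id is used; the title is there for human review.
    my ($entryid, $title) = parse_line(',', 0, $_);
    next unless defined $entryid && length $entryid;

    # Locate the entry file anywhere under the content tree and delete it.
    find(sub {
        return unless $_ eq "$entryid.txt";
        unlink $File::Find::name
            or warn "could not delete $File::Find::name: $!\n";
    }, $WEBDIR);
}

close $fh;
```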
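
The sorting and visual inspection that produced the delete list were done by hand in a text editor. As a rough sketch of how duplicate subjects might be flagged automatically instead, the following groups `entryid,title` lines by title and reports every title that occurs more than once. The input format is assumed to match what the delete script reads; the sub name `find_duplicate_titles` and the sample entry ids are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical helper: takes "entryid,title" lines and returns a hash ref
# mapping each title that appears more than once to its list of entry ids.
sub find_duplicate_titles {
    my @lines = @_;
    my %ids_by_title;

    for my $line (@lines) {
        chomp $line;
        next unless length $line;
        my ($entryid, $title) = parse_line(',', 0, $line);
        next unless defined $entryid && defined $title;
        push @{ $ids_by_title{$title} }, $entryid;
    }

    # Keep only the titles that have more than one entry.
    my %dupes;
    for my $title (keys %ids_by_title) {
        my @ids = @{ $ids_by_title{$title} };
        $dupes{$title} = \@ids if @ids > 1;
    }
    return \%dupes;
}

# Made-up sample data in the assumed format.
my @sample = (
    'entry100506-024025,"From Wordpress to Flatpress"',
    'entry100506-023011,"From Wordpress to Flatpress"',
    'entry100501-101500,"Another post"',
);

my $dupes = find_duplicate_titles(@sample);
for my $title (sort keys %$dupes) {
    print "$title: @{ $dupes->{$title} }\n";
}
```

A report like this only narrows down the candidates. Deciding which copy of an entry to keep would still need the visual check described above.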

Handling the re-mapping of image URLs was pretty straightforward. All I had to do was write a little Perl script to find each img URL in each converted entry and rewrite it to point to the new location for my images (to keep things simple this was a single, flat directory, wp-content/images).

Here’s my code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $HOME = $ENV{'HOME'};
my $infile = "$HOME/tmp/eldapo_images.txt";
my $WEBDIR = '/var/www/html/eldapo/wp-content/content';

# Collect every line containing an img tag, prefixed by the file it came from.
`find $WEBDIR -name "*.txt" | xargs grep 'img src' > $infile`;

open my $fh, '<', $infile or die "Cannot open $infile: $!";

while (my $line = <$fh>) {
    chomp $line;

    # Pull the /YY/MM/ directory and the entry file name out of the grep output.
    next unless $line =~ m{(/\d{2}/\d{2}/)(entry\d{6}-\d{6}\.txt)};
    my ($prepath, $entryid) = ($1, $2);

    my $path = $WEBDIR . $prepath . $entryid;
    next unless -e $path;

    # Edit the entry in place, keeping a temporary "~" backup while doing so.
    local @ARGV = ($path);
    local $^I   = "~";

    while (<>) {
        s{http://eldapo\.lembobrothers\.com/wp-content/uploads/\d{4}/\d{2}}{/wp-content/images}g;
        print;
    }
    unlink("$path~") or warn "could not delete $path~: $!\n";
    print $entryid, " processed\n";
}
close $fh;
```
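
The substitution at the heart of that script can be tried in isolation before touching any entry files. This is a minimal sketch, assuming the old uploads lived under `wp-content/uploads/YYYY/MM` as the pattern implies; the sub name `remap_image_urls` and the sample file name are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrites dated upload URLs to the flat images directory
# and returns the modified copy, leaving the input untouched.
sub remap_image_urls {
    my ($text) = @_;
    $text =~ s{http://eldapo\.lembobrothers\.com/wp-content/uploads/\d{4}/\d{2}}{/wp-content/images}g;
    return $text;
}

# Made-up sample line; the image file name is illustrative only.
my $before = '<img src="http://eldapo.lembobrothers.com/wp-content/uploads/2009/11/screenshot.png" />';
print remap_image_urls($before), "\n";
```

For the sample line this prints `<img src="/wp-content/images/screenshot.png" />`. Text without a matching URL passes through unchanged, so running the rewrite twice over the same entry does no harm.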

With all of my entries processed, I ran a final re-index and archived my local copy of the blog. After that it was just a matter of uploading it to my Internet server and unarchiving (after first renaming my original Wordpress blog folder).

Once I had my new blog in place, it was time to do a little configuring. Apart from downloading and installing a new theme (Leggero 3c+1), I followed these suggestions to disable comments globally. It’s not that I don’t enjoy comments, on other people’s blogs. Moderating comments, even on a blog as little travelled as this one, is just one chore too many for this overworked sysadmin.

Copyright 2004-2019 Phil Lembo