20six blog downloader script

January 16, 2006 on 11:16 pm | In Coding | 6 Comments

As discussed yesterday, here’s the (mostly) finished script.

It could do with being checked on Windows; I expect I’ll do that at work tomorrow.

You’ll need PHP 4.3 or above, which you can get from php.net. Then edit BlogBackupScript.php so that it contains your 20six username and password, and run it!

Good luck!

Download here: blog downloader script (52Kb zip file)

Edit: Or try the brand new Windows version (requires .NET Framework 2) here: Blog Downloader Windows Version 2 (47Kb zip file). It now does everything the script version does!
Edit: The Windows version of PHP can be downloaded here.

I’ve had a report of timeouts on downloading some blogs. I’ll see if there’s a fix for that. I may have to give up using the weblog API for downloading the posts and just grab the web pages directly (nuts!)

Edit: Important note for Windows

The default install of PHP seems to set up a 30 second time limit on your scripts; this means you can’t download a big blog without changing it. Fortunately it’s quite simple to do so:

  1. Find your php.ini file – probably it is in C:\WINDOWS\PHP.INI – and open it in Wordpad or your favourite text editor.
  2. Find the line that reads
    max_execution_time = 30

    and change the limit to be 0. This should stop the script from being shut off after 30 seconds.

  3. You might also want to put a semi-colon in front of the line starting ‘memory_limit’ to turn off the limit set there; if you still run out of memory, setting memory_limit = -1 removes the limit altogether.
  4. Save php.ini
  5. Try your script again – it should now continue downloading your blog.
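
If you would rather not touch php.ini, the same limits can usually be lifted from inside a script instead. A minimal sketch, assuming these lines are added by hand near the top of BlogBackupScript.php (the script as downloaded may not contain them) and that PHP is not running in safe mode, where set_time_limit() has no effect:

```php
<?php
// Assumption: placed near the top of the script, before any downloading starts.

// 0 means "no time limit", the same as max_execution_time = 0 in php.ini.
set_time_limit(0);

// -1 means "no memory limit"; ini_set() returns false if the setting
// could not be changed on this installation.
ini_set('memory_limit', '-1');
```

These settings only last for the current run, so php.ini itself stays as it was.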

A good way of checking your blog is to take the XML files and open them in Internet Explorer or Firefox; the browser will report an error if a file isn’t complete.
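
The same check can be done from PHP, which is handy if there are a lot of files. This is only a sketch: check_xml_file() is a made-up helper name, not something in the downloader script, and it only tests that a file is well-formed XML, not that every post made it in.

```php
<?php
// Hypothetical helper, not part of BlogBackupScript.php.
// Returns '' if the file parses as well-formed XML, otherwise a short
// description of the first problem the parser found.
function check_xml_file($path)
{
    $data = file_get_contents($path);
    if ($data === false) {
        return "could not read $path";
    }

    $parser = xml_parser_create();
    // The third argument (true) tells the parser this is the whole document,
    // so a file that was cut off part-way is reported as an error.
    if (xml_parse($parser, $data, true)) {
        return '';
    }

    return sprintf(
        '%s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}
```

To use it, call check_xml_file() on each archived file and print whatever comes back; an empty string means the file parsed cleanly.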

Really big blog?

If you have over 1000 posts in your blog you also need to edit the script slightly; find the line that reads

$numposts = 1000;

and add a 0 on the end so that it will download up to 10,000 posts.
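
For context, the MetaWeblog API fetches posts with metaWeblog.getRecentPosts, whose last argument is the number of posts to return. Presumably that is where $numposts ends up, though that is an assumption about the script rather than something checked against it. A rough sketch of what such a request body looks like, using a hypothetical helper name and placeholder blog ID, username and password:

```php
<?php
// Hypothetical helper: builds the XML-RPC request body for
// metaWeblog.getRecentPosts(blogid, username, password, numberOfPosts).
function build_recent_posts_request($blogId, $username, $password, $numPosts)
{
    // The first three arguments are sent as XML-RPC strings.
    $strings = '';
    foreach (array($blogId, $username, $password) as $value) {
        $strings .= '<param><value><string>'
            . htmlspecialchars($value)
            . "</string></value></param>\n";
    }

    // The post count is sent as an XML-RPC int.
    return '<?xml version="1.0"?>' . "\n"
        . "<methodCall>\n"
        . "<methodName>metaWeblog.getRecentPosts</methodName>\n"
        . "<params>\n"
        . $strings
        . '<param><value><int>' . (int) $numPosts . "</int></value></param>\n"
        . "</params>\n"
        . "</methodCall>\n";
}

// Placeholder values for illustration only.
$numposts = 10000;
$body = build_recent_posts_request('myblog', 'myname', 'secret', $numposts);
```

The body would then be POSTed as text/xml to the blog’s XML-RPC address, which isn’t given here.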

Backing up from 20six

January 16, 2006 on 10:13 am | In Coding | 2 Comments

My old blogging ‘platform’ was 20six.co.uk, who were pretty reasonable for a free service and had some excellent community building features such as giving other users ‘sweeties’ and tracking the latest posts in your friends’ blogs.

Unfortunately they are just about to change over to a new technology and will lose many of the great community features (allegedly); also they will break the image links (although the images themselves will apparently remain).

In order to rescue my old blog – and more to the point, Bobble’s old blog which is far bigger – I spent most of Sunday playing around with code that could archive the old blog, images and comments. I’m nearly there now, having written some code that can save the blog entries and images. Comments I will sort out tonight, and then post the code here for others to use.

A quick warning: the code is written in PHP, so you will have to install that first. Also I’ve developed it on the Mac so it might not work on Windows – I’ll try testing it tomorrow at work to iron out any Windows/Mac bugs.

The format that blogs are saved in is RSS – the same format that you can ‘subscribe’ to using Bloglines or applications like NetNewsWire. Some applications (e.g. NetNewsWire) understand RSS files as well as websites, so you can still browse your old blog entries from the archived files.

Ultimately the aim will be to take the RSS archive and have another bit of code to import it into the new blog software that I’m running on this site (WordPress). But that’s a task for another day.

Quick bit of technical detail: I’m using PHP 4.x which was already installed on my Mac. I’m using the MetaWeblog API to get the blog entries from 20six, then downloading the images locally and changing the blog entries to point to the local images. Soon I will add comment downloading and will embed the old comments into the blog entries themselves.
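
To give a feel for the image step, here is a rough sketch of rewriting the image links in one post. It is not the code from the script: localise_image_links() is a made-up name, the regular expression is a simple one that only looks at img tags with quoted src attributes, and the ‘images’ folder is just an example.

```php
<?php
// Hypothetical helper, not taken from BlogBackupScript.php.
// Returns array(rewritten HTML, map of original URL => local path).
function localise_image_links($html, $localDir)
{
    $map = array();

    // Collect the src of every <img> tag that uses a quoted attribute.
    preg_match_all('/<img[^>]+src=["\']([^"\']+)["\']/i', $html, $matches);

    foreach (array_unique($matches[1]) as $url) {
        // Drop any ?query or #fragment, then keep just the file name.
        // Note: two images with the same file name would collide here.
        $name = basename(preg_replace('/[?#].*$/', '', $url));
        $map[$url] = $localDir . '/' . $name;
    }

    if (!$map) {
        return array($html, $map);
    }

    // strtr() swaps every occurrence of each URL for its local path,
    // including any plain links to the same image.
    return array(strtr($html, $map), $map);
}

// Fetching the files is left to the caller, for example with
// copy($url, $localPath), which needs allow_url_fopen to be enabled.
```

Keeping the rewriting separate from the downloading means the link changes can be checked on a saved post without going anywhere near the network.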

If you are really lucky I might put the code onto this website so that it can do the hard work for you; I’ll give that a try later tonight as well.


This work is licensed under a Creative Commons Attribution 2.5 License.