November 2nd in SEO, Webmaster, WordPress by charlie.

Mapping URLs When Importing Movable Type to WordPress

Sometimes you can’t do things the easy way – you just gotta bear down and do the drudgery. This is me, getting it done. And no, HELL NO, it wasn’t easy. It was long and hard. And the results were totally worth it!

Just now wrapping up a huge project migrating from Movable Type 3.17 to WordPress 2.8.5, and it wasn’t so bad, all things considered.

This blog has 20 authors, 102 categories, and 4,787 posts with about 108,000 comments at the time of migration.

There was the small matter of manually removing over 76,000 spam comments. I put in a lot of time researching spam removal, but comment spam tools only work as comments are submitted, so I spent a lot of hours in MySQL doing comment spam search and destroy. That sure wasn’t worth an ‘engineering solution’, and an engineering solution wouldn’t have been possible anyway.

And yeah, if folks I work with had wanted to reduce the number of categories or create super categories, I would have been oh so sad, but they’re sane and reasonable people for the most part and quite happy with the end result.

Really it was all hard work but pretty straightforward – except for mapping URLs. That was a little trickier.

The old site uses an outdated /archives/postid.html URL format for posts, like /archives/998483.html, and the new blog uses the WordPress pretty permalinks – /year/month/day/post-title/. The archives and categories are also getting friendlier URLs.

I definitely need to map old URLs to new URLs.

I’m not losing years of SEO value or going 404 on 4,787 posts and hundreds of monthly archive and category pages. Not happening. I needed a way to map URLs and create 301 redirects at the same time.

I tried Redirection but had too many problems to make it work.

I decided to use HTACCESS, and in the end it turned out to be a really simple solution.

Exporting Movable Type to WordPress

There are only a few good posts about exporting Movable Type to WordPress, and pretty much everyone says to use the stock Movable Type exporter and WordPress Movable Type/Typepad importer, but they miss the point of maintaining URLs.

When you migrate from Movable Type to WordPress you need the Post ID to map each post to its new URL, and the old Movable Type exporter doesn’t give you this.

The people who’ve thought it through all link over to the Mudita Journal: Importing from MT to WordPress, where Joshua Zader gives some seriously good How-To on migrating Movable Type to WordPress. He specifically explains how to get the Post IDs exported from Movable Type, and also how to import them into WordPress.

So I made the change to the Movable Type export file and downloaded a massive .HTML file from Movable Type into Firefox, saved it, opened it in Textpad and was ecstatic to see all the posts with Post IDs and comments.

The file was huge, so I broke it out into smaller files, each with about 100k lines, so I’d be importing files of around 4-5MB. This is much less prone to errors than importing 25MB files.
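If you'd rather not split the file by hand, here's a minimal sketch of how it could be scripted. This is not what I actually ran. It assumes each entry in the export file ends with a line of eight dashes (`--------`), so check your own export before trusting it. The names `split_entries` and `write_chunks` are made up for this example.

```python
from typing import Iterable, Iterator, List

ENTRY_END = "--------"  # ASSUMPTION: end-of-entry delimiter in the export file


def split_entries(lines: Iterable[str], max_lines: int = 100_000) -> Iterator[List[str]]:
    """Yield chunks made of whole entries, each roughly max_lines long at most."""
    chunk: List[str] = []
    entry: List[str] = []
    for line in lines:
        entry.append(line)
        if line.strip() == ENTRY_END:
            # The entry is complete. Start a new chunk if adding it would overflow.
            if chunk and len(chunk) + len(entry) > max_lines:
                yield chunk
                chunk = []
            chunk.extend(entry)
            entry = []
    # Keep any trailing lines that were never closed by a delimiter.
    chunk.extend(entry)
    if chunk:
        yield chunk


def write_chunks(src_path: str, prefix: str, max_lines: int = 100_000) -> List[str]:
    """Write chunks to prefix-001.txt, prefix-002.txt, ... and return the paths."""
    paths = []
    with open(src_path, encoding="utf-8") as src:
        for i, chunk in enumerate(split_entries(src, max_lines), start=1):
            path = f"{prefix}-{i:03d}.txt"
            with open(path, "w", encoding="utf-8") as out:
                out.writelines(chunk)
            paths.append(path)
    return paths
```

The point of splitting on the delimiter instead of on a fixed line count is that no post or comment thread gets cut in half between two import files.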

I also got the existing Movable Type database in a MySQL dump file, and imported that onto my server because I don’t have access to the blog’s existing database servers. (This turned out to be a huge bonus move as I explain below.)

My next step was to replace the WordPress Movable Type/TypePad import file with Joshua’s file for Movable Type – I FTP’d his file into /wp-admin/import and imported a block of test data. Content displayed in Category and Archive pages, but I wasn’t able to view individual posts – instead I got an undisplayable content error.

Frustrating.

I’m NOT a PHP coder so I’m not about to muck about trying to figure out Joshua’s script, and nobody has written anything on this that I could find, so I was pretty screwed.

And then I realized that I have the IDs and Post Titles in my original Movable Type database.

Brainblast

So I EXPORTED that data to a CSV file, popped that open in Excel, did a little spreadsheet magic, a little copy/paste/search and replace in TextPad, and created the entire URL mapping by hand in less than 2 hours. And it was FUN too!

Now here’s where I got all slick, thinking how I’ve got the data, I can just magic up a solution with the Redirection plugin for WordPress and I’m good to go.

But alas, it was not meant to be. (Look at me, “alas” – I kill myself sometimes.)

I tried and tried using Redirection, but I just could NOT make it work. I tried importing the redirects into the plugin but it could never upload the CSV cleanly. I also tried importing the redirects into MySQL, but Redirection wouldn’t recognize what I’d done. I LOVE Redirection, but I’m not entering 5,000 URLs by hand.

It’s right about now when I realize how totally screwed I truly am.

Stepping into the World of HTACCESS

Redirection would have been sweet because I wouldn’t actually need to know the full path to the new URL, I’d just need the title or part of the title, Redirection is THAT good. But no.

So I read up on using HTACCESS and I swear to God I’m beginning to understand it…but then I realize that nobody talks about wildcards on the new URL, so I ask one of the networking gurus I work with and he tells me, without a doubt, that you need to know the full URL you’re redirecting to.

And then I realized that not only do I have the IDs and Post Titles in my original database, I also have the Publish Date/Times!

I’m creating the WordPress pretty URLs by hand!

So, back to Excel, do some sorting, do some cell formatting, some more copy and paste, and voila, I’ve got thousands of solid redirects in my HTACCESS file.

Here’s what it looks like.

Mapping URLs with HTACCESS

First of all, I’m in WordPress and mod_rewrite is enabled already:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

and my HTACCESS file is in my site root, so I don’t need to restate that. Also, I’m doing one-to-one redirects, so I don’t need regex syntax; the plain Redirect directive (which comes from mod_alias, not mod_rewrite) does the job. So here’s what it all comes down to – 4,978 lines that look like this:

redirect 301 /original-path/filename.html http://www.newdomain/2009/10/29/new-filename/

That, my friend, is a thing of beauty. Simple, written in English so anybody can understand it, and it works!!
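For anyone who would rather script the Excel and TextPad step, here's a rough sketch of the same idea in Python. It is not what I used. It assumes a CSV export with columns named `entry_id`, `entry_title` and `entry_created_on` (dates like `2009-10-29 08:15:00`), and the `slugify` function is only an approximation of how WordPress builds a post slug, not WordPress's own routine. The function names are made up for this example, and `http://www.newdomain` is the same placeholder as in the line above.

```python
import csv
import io
import re
from datetime import datetime


def slugify(title: str) -> str:
    """Rough approximation of a WordPress post slug (NOT WordPress's own code)."""
    slug = title.lower()
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)  # drop punctuation and symbols
    slug = re.sub(r"[\s-]+", "-", slug)       # collapse runs of spaces/hyphens
    return slug.strip("-")


def redirect_line(entry_id: str, title: str, created_on: str,
                  base: str = "http://www.newdomain") -> str:
    """Build one 'redirect 301' line for an old /archives/<id>.html URL."""
    # ASSUMPTION: dates are exported as 'YYYY-MM-DD HH:MM:SS'.
    date = datetime.strptime(created_on, "%Y-%m-%d %H:%M:%S")
    new_url = f"{base}/{date:%Y/%m/%d}/{slugify(title)}/"
    # The ID is used exactly as it appears in the CSV (no padding is added).
    return f"redirect 301 /archives/{entry_id}.html {new_url}"


def build_redirects(csv_text: str) -> list:
    """Turn CSV text (with a header row) into a list of redirect lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [redirect_line(r["entry_id"], r["entry_title"], r["entry_created_on"])
            for r in reader]


if __name__ == "__main__":
    sample = ("entry_id,entry_title,entry_created_on\n"
              "998483,Hello World!,2009-10-29 08:15:00\n")
    print("\n".join(build_redirects(sample)))
```

Spot-check the generated slugs against the live site before pasting anything into HTACCESS. Titles with unusual characters, or duplicate titles on the same day, may not come out the way WordPress actually named them.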

And of course I’m also using Smart 404 and Redirection to catch any 404s that come up on me, cuz I know I’ll have them.

Downsides? My HTACCESS file is about 512k, which is a small file, but there are now 4,978 lines of redirects in it, and since Apache reads HTACCESS on every request I know that’s going to hurt performance. I’m on a fast dedicated server but still, I don’t like taking that hit. And of course there’s no reporting like Redirection would give me, so I’m left to log files, which is fine, I don’t mind.
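Since log files are all the reporting I get, here's a small sketch of how the old-URL hits could be counted. It assumes the access log is in the usual Apache common or combined format and that the old posts all live under `/archives/`. The name `count_redirect_hits` is made up for this example.

```python
import re
from collections import Counter
from typing import Iterable

# ASSUMPTION: common/combined log format, e.g.
# 1.2.3.4 - - [02/Nov/2009:10:00:00 -0500] "GET /archives/1.html HTTP/1.1" 301 245
REQUEST = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_redirect_hits(log_lines: Iterable[str]) -> Counter:
    """Count 301 responses per old /archives/ path."""
    hits: Counter = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if not match:
            continue  # skip lines that do not look like a GET/HEAD request
        path = match.group("path")
        if match.group("status") == "301" and path.startswith("/archives/"):
            hits[path] += 1
    return hits
```

Feed it an open log file, then call `most_common(20)` on the result to see which old URLs still get the most traffic.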

After all these years managing websites, searching logfiles, coding, reporting, all that, I just can’t get past how key Excel and TextPad are. Give me TextPad and Excel and I can rule the world!

charlie
