Migrating from WordPress to Hugo

In preparation for my move from WordPress to Hugo, I read a few blog posts on the subject to make sure I wouldn’t run into a brick wall. After all, Google had already indexed over 3000 posts covering the subject in detail, so what could possibly go wrong?

My new Hugo-based site on the left, my old WordPress installation on the right.

Avoiding obvious pitfalls

My main concern with the migration from WordPress to Hugo was to avoid introducing a new URL structure. Such a change would force search engines to re-index my site and would also mess up my subscribers’ RSS feeds.

Spoiler: I would end up partially failing my most prioritized task.

Exporting content from WordPress

This process is already thoroughly documented on many different blogs, and there is also a section available on the topic on Hugo’s official website. Personally, I used the WordPress to Jekyll plugin, but I’ll leave it at that and dive straight into the unexpected issues I experienced.

My WordPress site uses the “month and name” permalink structure, written as /%year%/%monthnum%/%postname%/, so the first order of business was to replicate that setting in Hugo.

Looking at the Hugo documentation, the solution was as simple as adding the following setting to the default site configuration:

# config.toml 
[permalinks]
  posts = "/:year/:month/:title/"

This change worked as intended, or so I believed at first glance. However, there are subtle differences between how WordPress and Hugo handle unreserved characters in a URI.

For instance, WordPress replaces any period (.) found inside a word in the title with a dash (-). Hugo, on the other hand, preserves it. A great number of my permalinks contain version numbers, so an unintentional mess was already at hand:

# Hugo canonical url
/2019/03/how-to-enable-tls-1.3-on-gentoo-linux/

# WordPress canonical url
/2019/03/how-to-enable-tls-1-3-on-gentoo-linux/

In addition to getting HTTP 404 (not found) errors from search engine traffic, this blunder also had the effect of feeding duplicated content to feed readers:

An additional canonical URL introduces duplicate content to the RSS feed.
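A rough way to spot the affected posts before they go live is to model that one difference in a few lines of Python. This is only a sketch: the function name is made up for illustration, and it covers nothing beyond the period-to-dash behaviour described above, not the rest of WordPress's slug handling.

```python
import re

def wordpress_style_path(hugo_path):
    """Replace every period that sits between two word characters with a dash.

    Assumption: this models only the single difference described in the post.
    """
    return re.sub(r"(?<=\w)\.(?=\w)", "-", hugo_path)

hugo_path = "/2019/03/how-to-enable-tls-1.3-on-gentoo-linux/"
old_path = wordpress_style_path(hugo_path)
if old_path != hugo_path:
    print("needs an explicit url:", old_path)
```

Any path the function changes is a post whose old WordPress link would otherwise break.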

The rather simple solution to this problem was to specify the permalink (url) manually in the front matter of each affected post, which restored the old links:

title: "How to enable TLS 1.3 on Gentoo Linux"
url: /2019/03/how-to-enable-tls-1-3-on-gentoo-linux/
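With many affected posts, adding that line by hand gets tedious. The following is a minimal sketch of how it could be scripted instead. It assumes YAML front matter delimited by `---` lines, which may not match your export, and `add_url` is a hypothetical helper, not part of Hugo.

```python
def add_url(document, url):
    """Insert a 'url:' line just before the closing '---' of YAML front matter.

    Returns the text unchanged if there is no front matter or a url is already set.
    """
    lines = document.split("\n")
    if lines[0].strip() != "---":
        return document
    closing = [i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"]
    if not closing:
        return document
    end = closing[0]
    if any(line.startswith("url:") for line in lines[1:end]):
        return document
    lines.insert(end, "url: " + url)
    return "\n".join(lines)

post = '---\ntitle: "How to enable TLS 1.3 on Gentoo Linux"\n---\nBody text\n'
print(add_url(post, "/2019/03/how-to-enable-tls-1-3-on-gentoo-linux/"))
```

Running it twice on the same text leaves the second pass a no-op, so it is safe to re-run over a whole content directory.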

Rewrite rules

Both Hugo and WordPress add tags and categories as default taxonomies. However, Hugo constructs its URLs using the plural form, while WordPress uses the singular form:

# Hugo
https://blog.paranoidpenguin.net/categories/android/
https://blog.paranoidpenguin.net/tags/android/
https://blog.paranoidpenguin.net/index.xml (rss)

# WordPress
https://blog.paranoidpenguin.net/category/android/
https://blog.paranoidpenguin.net/tag/android/
https://blog.paranoidpenguin.net/feed/ (rss)

Additionally, I had previously implemented hierarchical taxonomies with WordPress, giving me the following neat structure:

- GNU/Linux 
-- Slackware Linux
-- CentOS

- Microsoft
-- Office 365
-- Windows

I decided to abandon hierarchical taxonomies altogether and simply redirect requests for parent and child categories to one or the other, depending on what makes the most sense.

As an example: I want all requests for
/category/gnu-linux/slackware-linux/
to be permanently redirected to
/categories/slackware-linux/

# Nginx rewrite rules for hierarchical taxonomies
rewrite ^/category/gnu-linux/slackware-linux/$ https://$server_name/categories/slackware-linux/ permanent;
rewrite ^/category/gnu-linux/centos/$ https://$server_name/categories/centos/ permanent;
rewrite ^/category/microsoft/office-365/$ https://$server_name/categories/microsoft/ permanent;
rewrite ^/category/microsoft/windows/$ https://$server_name/categories/microsoft/ permanent;

# Nginx general rewrite rules for taxonomies and feeds
rewrite ^/tag/(.*) https://$server_name/tags/$1 permanent;
rewrite ^/category/(.*) https://$server_name/categories/$1 permanent;
rewrite ^/feed$ https://$server_name/index.xml permanent;
rewrite ^/feed/(.*) https://$server_name/index.xml permanent;
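The order of these rules matters: the specific hierarchical rules have to come before the general /category/ rule, or the general rule would catch those requests first. One way to sanity-check the mapping without touching the server is to mirror the rules in Python. This sketch returns paths only and leaves out the scheme and $server_name; `redirect_target` is a hypothetical helper, and Python's regex engine is standing in for Nginx's, which is close enough for patterns this simple.

```python
import re

# (pattern, replacement) pairs mirroring the rewrite rules above, in the same order.
RULES = [
    (r"^/category/gnu-linux/slackware-linux/$", "/categories/slackware-linux/"),
    (r"^/category/gnu-linux/centos/$", "/categories/centos/"),
    (r"^/category/microsoft/office-365/$", "/categories/microsoft/"),
    (r"^/category/microsoft/windows/$", "/categories/microsoft/"),
    (r"^/tag/(.*)", r"/tags/\1"),
    (r"^/category/(.*)", r"/categories/\1"),
    (r"^/feed$", "/index.xml"),
    (r"^/feed/(.*)", "/index.xml"),
]

def redirect_target(path):
    """Return the path the first matching rule redirects to, or None if none match."""
    for pattern, replacement in RULES:
        match = re.search(pattern, path)
        if match:
            return match.expand(replacement)
    return None

for old in ("/category/gnu-linux/slackware-linux/", "/category/android/", "/feed/"):
    print(old, "->", redirect_target(old))
```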

RSS feeds

My WordPress installation offers a limited number of items per feed, presented in a full-text format. Hugo’s default, on the other hand, is an unlimited listing of all your content (posts and pages in my case), presented in a summary format.

I’m not a fan of forcing subscribers to visit my site in order to access the full content, so I’ll be using my own custom rss.xml template. It’s easier than it sounds: just copy Hugo’s RSS template and add it to theme/layouts/_default/rss.xml.

In addition to showing the full content instead of just a preview, I’ll also limit the number of items listed, and exclude anything but posts.

Modified rss.xml template on the left, original on the right.

# Show the 25 latest items only from content/posts
{{ range first 25 (where .Pages "Section" "posts") }} 

# Show the full content instead of a summary
{{ .Content | html }}

Now, with this configuration, I don’t have any feed for pages, which is fine by me. Should I change my mind, I could always add another rss.xml template at something like theme/layouts/pages/rss.xml.
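After rebuilding the site, a quick look at the generated feed confirms that the template changes took effect. Below is a small sketch using Python's standard XML parser. The sample feed is hand-written for illustration and stands in for the real index.xml, and `feed_items` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def feed_items(rss_xml):
    """Return (link, description) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("link", default=""), item.findtext("description", default=""))
        for item in root.iter("item")
    ]

# Tiny hand-written feed standing in for the generated index.xml.
sample = """<rss version="2.0"><channel>
  <title>Example</title>
  <item><link>https://example.org/2019/03/first/</link><description>Full text of the first post.</description></item>
  <item><link>https://example.org/2019/03/second/</link><description>Full text of the second post.</description></item>
</channel></rss>"""

items = feed_items(sample)
assert len(items) <= 25  # the limit set by "first 25" in the template
for link, description in items:
    print(link, len(description))
```

Suspiciously short descriptions would suggest the feed is still serving summaries rather than full content.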

Sitemap

Hugo will create a sitemap including all your content, which is fine, but I’ve meticulously marked my taxonomy pages with a noindex tag, so I would rather exclude them from the sitemap altogether.

Again, copying Hugo’s sitemap template and adding it to theme/layouts/sitemap.xml allows me to customize the output.

Modified sitemap.xml template on the left, original on the right.

# Exclude taxonomies from the listing 
{{ if not .Data.Singular }}
...
{{ end }}
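To verify that the taxonomy pages are really gone, the generated sitemap can be scanned for /tags/ and /categories/ URLs. This is a sketch under a few assumptions: the sitemap uses the standard sitemaps.org namespace, the sample document is hand-written for illustration, and `taxonomy_urls` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def taxonomy_urls(sitemap_xml, prefixes=("/tags/", "/categories/")):
    """Return every sitemap URL whose path starts with one of the given prefixes."""
    root = ET.fromstring(sitemap_xml)
    locs = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]
    return [url for url in locs if urlparse(url).path.startswith(prefixes)]

# Hand-written sitemap that still contains taxonomy pages.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/2019/03/a-post/</loc></url>
  <url><loc>https://example.org/tags/android/</loc></url>
  <url><loc>https://example.org/categories/android/</loc></url>
</urlset>"""

print(taxonomy_urls(sample))
```

An empty list means the exclusion worked; anything else names the pages that slipped through.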

Hmmm that doesn’t look quite right

What I failed to take into consideration was how much time I would have to spend fixing the layout and markup of my old posts. I did figure I would have to spend some time adding Open Graph and Twitter Card metadata to my posts, but I did not anticipate so many pages being mangled beyond recognition.

Content nested and mangled beyond recognition after being exported using the WordPress to Jekyll plugin.

When you have more than 200 posts to go through, that translates to a lot of work. In the end, I found myself deleting a lot of old posts rather than fixing them.

Parting notes

I can’t really blame Hugo for any of my mishaps. I should have spent more time reading the documentation and less time playing around with the different templates. But then again, where’s the fun in that?