Flying to Pelican

October 10, 2025

This site now gets built by Pelican. Formerly, it lived on WordPress(.com), but WordPress software updates over the last several years made writing there intolerable.

Moving a site is easy if you don’t care about fidelity. If you do care, it’s not hard, but tedious; what follows is an exhausting tour of how I moved the blog you’re reading now. It might contain helpful tips if you are also moving from WordPress to Pelican, but the directions won’t be useful verbatim on your site. This also is not a primer on technical skills: I won’t explain how to use a command line, nor the regular expressions that describe most find-replace operations.¹

Export

Export xml from WordPress for pelican-import to ingest.

Beware that the WordPress xml export contains private data, specifically commenter email addresses. Scrub the email addresses before² committing to source control:

Find	`<wp:comment_author_email>.*?</wp:comment_author_email>`
Replace	`<wp:comment_author_email>[scrubbed]</wp:comment_author_email>`

I also scrub ip addresses. It’s a bit silly to consider ip addresses private, but some people do, and I see no point keeping them.

Find	`<wp:comment_author_IP>.*?</wp:comment_author_IP>`
Replace	`<wp:comment_author_IP>[scrubbed]</wp:comment_author_IP>`

Your own email address will also appear; search for the @ symbol to check what’s left.

The other commenter identifiers are things that WordPress displays with each comment, so they’re public already.

Scrubbing the non-public identifiers that belong to other people puts my mind at rest about booby traps in history, but only because I don’t plan to make this blog’s source repo public. If I planned to make this site’s source public, I would not save the WordPress xml in history at all: it still contains non-public data, such as drafts, and while I don’t think it contains anything sensitive, it would be hard to say for sure.

Import

Using Pelican 4.9.1, I run quickstart before import, but I think only the second command actually is necessary:

pelican-quickstart  # probably I wasn't meant to run this first

pelican-import --wpfile --wp-attach --dir-page [backup xml]

I remove the auto-generated Makefile, publishconf.py, and tasks.py. For deployment, I wrote a shell script to delete drafts and rsync --checksum the output to my host, as recommended.

The importer also created an .rst file of “custom styles,” I guess because I had custom css in WordPress. Delete it:

rm content/wp-global-styles-pub-2ftwentysixteen.rst

Add a 404 page as documented in Pelican custom 404.

To avoid spurious 404s, create an empty file content/robots.txt and add robots.txt to pelicanconf STATIC_PATHS.

For testing, I publish to a subdomain blogtest.josiahulfers.com as well as running locally. Locally, I run Pelican with:

pelican -dlr

Without -d, it’s easy to view outdated output, either because I refresh too quickly after making a change, or because an .rst syntax error caused page regeneration to fail.

Scanning the existing site

Use wget to create an easily-searched copy of my existing site:

wget --mirror 'https://josiahulfers.com'

This copy makes it easy to list most links that need redirection:

find josiahulfers.com -name index.html

The above finds a handful of links that I’ll ignore as they appear to be pages dedicated to specific attachments. The rest do need redirects. With the wget artifact index.html removed and the segments replaced by Pelican equivalents, the patterns are:

/category/{category.slug}/
/category/{category.slug}/feed/
/tag/{tag.slug}/
/tag/{tag.slug}/feed/
/{year}/{month}/{day}/{article.slug}/
/{year}/{month}/{day}/{article.slug}/feed/  (article comments)
/{year}/{month}/  (pelican handles this without redirect)
/author/ulfers/
/author/ulfers/feed/
/author/ulfers/page/{n}/  (paginated)
/page/{n}/ (paginated lists of posts)
/reading-list/
/reading-list/feed/
/about/
/about/feed/
/feed/

A couple patterns aren’t found by wget, but still need redirects or rewrites:

R /feed/atom
R /{year}/{month}/{day}/  (articles from that day)

Pelican’s *_ARCHIVE_SAVE_AS settings replicate year and month archive patterns. Since I almost never publish more than once a day, I don’t consider the day indexes like josiahulfers.com/2018/05/31/ useful, so I redirect them to the month.
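For the year and month archives, a pelicanconf.py fragment along these lines should reproduce the WordPress paths. The setting names are Pelican’s; the values are an assumption about what fits this site. Pelican fills in `{date:%Y}`-style fields with `str.format`, passing each archive’s date.

```python
from datetime import date

# Assumed pelicanconf.py values mirroring WordPress's /{year}/ and /{year}/{month}/ archives.
YEAR_ARCHIVE_SAVE_AS = '{date:%Y}/index.html'
YEAR_ARCHIVE_URL = '{date:%Y}/'
MONTH_ARCHIVE_SAVE_AS = '{date:%Y}/{date:%m}/index.html'
MONTH_ARCHIVE_URL = '{date:%Y}/{date:%m}/'

# Illustration only (not needed in the config): what a May 2018 archive becomes.
print(MONTH_ARCHIVE_SAVE_AS.format(date=date(2018, 5, 31)))  # 2018/05/index.html
```

Leaving DAY_ARCHIVE_SAVE_AS at its empty default means Pelican writes no day archives, which fits redirecting those URLs to the month.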

I’ll put redirects in an Apache config file that lives in source control as josiahulfers.vhost and gets copied to the server by my deployment script; .htaccess would be an alternative. The basic redirects are:

# One for each page, for example
RedirectMatch "^/about/?$" /pages/about.html

# I am the only author
RedirectMatch "^/author/" /

# I won't use pagination
RedirectMatch "^/page/\d+/?" /

# Flattened paths to articles, plus .html extensions
RedirectMatch "^/\d{4}/\d{2}/\d{2}/([^/]+)/?$" "/$1.html"

Instead of redirecting the last, Pelican could replicate the deeper directory structure for articles, but I don’t think it adds much, and I prefer that the output html structure more closely match the flat source structure in the “content” directory. That flat source structure, plus setting Pelican’s SLUGIFY_SOURCE = 'basename', ensures slugs are unique, which will be useful later.

More redirects appear below as needed; the order in this article will not necessarily match the order necessary in Apache config.
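Before deploying, the basic redirects above can be checked against old paths. A minimal Python sketch, assuming Python’s `re` treats these simple patterns the way Apache’s PCRE does, and that the first matching rule wins (mod_alias’s documented behavior for redirects in the same context); `redirect_target` is a hypothetical helper, not part of Apache or Pelican:

```python
import re

# The RedirectMatch rules, in config order: (regex, target) pairs.
REDIRECTS = [
    (r"^/about/?$", "/pages/about.html"),
    (r"^/author/", "/"),
    (r"^/page/\d+/?", "/"),
    (r"^/\d{4}/\d{2}/\d{2}/([^/]+)/?$", "/$1.html"),
]

def redirect_target(path):
    """Return where the rules would send `path`, or None if no rule matches.

    Approximates RedirectMatch: rules are tried in order, the first match
    wins, and $N in the target is replaced by capture group N.
    """
    for pattern, target in REDIRECTS:
        match = re.search(pattern, path)
        if match:
            return re.sub(r"\$(\d)", lambda m: match.group(int(m.group(1))) or "", target)
    return None

for old in ["/about/", "/2015/01/01/example-post/", "/page/2/", "/2018/05/"]:
    print(old, "->", redirect_target(old))
```

Paths that should fall through to Pelican, like the month archive, come back as None.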

Categories and tags

Some of my old articles are in “categories,” but “tags” can do anything “categories” can do, so I don’t use “categories” anymore. This makes WordPress categorize my newer articles in the “Uncategorized” category. Pelican’s default theme assumes that articles have a category, so I set the default category in config: DEFAULT_CATEGORY = 'Uncategorized'. Later, I’ll remove the category-related parts of the theme and this won’t be required.

Also, stock Pelican doesn’t allow articles to be in multiple categories, but WordPress did; I use editmetadata.py to rewrite .rst file metadata, moving categories to tags.
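editmetadata.py isn’t reproduced here. As a rough sketch of the same kind of edit, the Python below merges :category: into :tags:. It assumes pelican-import wrote multiple categories comma-separated on a single :category: line and that both fields appear only in the header; `move_categories_to_tags` is a hypothetical name, and dropping Uncategorized is an illustrative choice.

```python
def split_list(value):
    """Split a comma-separated metadata value into stripped, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]

def field_index(lines, name):
    """Index of the first line starting with :name:, or None."""
    prefix = f":{name}:"
    return next((i for i, line in enumerate(lines) if line.startswith(prefix)), None)

def move_categories_to_tags(text, drop=("Uncategorized",)):
    """Move the :category: values of one .rst file into its :tags: field."""
    lines = text.split("\n")
    cat_idx = field_index(lines, "category")
    if cat_idx is None:
        return text
    cats = [c for c in split_list(lines[cat_idx][len(":category:"):]) if c not in drop]
    tag_idx = field_index(lines, "tags")
    if tag_idx is None:
        if cats:
            lines[cat_idx] = ":tags: " + ", ".join(cats)  # category line becomes tags line
        else:
            del lines[cat_idx]
    else:
        tags = split_list(lines[tag_idx][len(":tags:"):])
        lines[tag_idx] = ":tags: " + ", ".join(tags + [c for c in cats if c not in tags])
        del lines[cat_idx]
    return "\n".join(lines)
```

Applied per file, this would be something like `path.write_text(move_categories_to_tags(path.read_text()))`.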

WordPress redirects /category/whatever/ to /tag/whatever (assuming the whatever tag exists). For Apache, I use:

Redirect /category/uncategorized /
Redirect /category/ /tag/

# Apart from the .html suffix, Pelican's /tag/ pattern matches WordPress
RedirectMatch "^/tag/([^/.]+)/page/\d+/?" "/tag/$1/"
RedirectMatch "^/tag/([^/.]+)/?$" "/tag/$1.html"

Relocate pictures and attachments

I don’t care for the folder structure that put images into folders reflecting a date. I move them and skip redirects since I never meant for them to be viewed out-of-context:

find content/images -type f | wc -l
find content/20* -name '*.png' | wc -l
find content/20* -name '*.jpg' | wc -l
find content/20* -name '*.gif' | wc -l

find content/20* -name '*.png' -exec mv {} content/images \;
find content/20* -name '*.jpg' -exec mv {} content/images \;
find content/20* -name '*.gif' -exec mv {} content/images \;

Verify the count to ensure there were no collisions:

find content/images -type f | wc -l
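Counting catches collisions only after the fact, since mv silently overwrites a file of the same name. A hypothetical pre-check in Python lists any file name that appears more than once. It assumes the same folder layout and the same three extensions as the find commands above; `find_collisions` is an illustrative name.

```python
from collections import defaultdict
from pathlib import Path

# Same extensions as the find -name patterns (case-sensitive, like find).
IMAGE_SUFFIXES = {".png", ".jpg", ".gif"}

def find_collisions(content_dir):
    """Return {file name: [paths]} for names that would collide in content/images."""
    content = Path(content_dir)
    by_name = defaultdict(list)
    images = content / "images"
    if images.is_dir():
        # mv puts files directly into images/, so only its top level matters.
        for path in images.iterdir():
            if path.is_file():
                by_name[path.name].append(path)
    for dated in sorted(content.glob("20*")):
        if not dated.is_dir():
            continue
        for path in dated.rglob("*"):
            if path.is_file() and path.suffix in IMAGE_SUFFIXES:
                by_name[path.name].append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}

for name, paths in find_collisions("content").items():
    print(name, *paths)
```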

Then replace the paths in .rst files, using Pelican’s internal link format:

Find	`image:: \{static\}\d{4}/\d\d/(.*)`
Replace	`image:: {static}images/\1`

Find	`:target: \{static\}\d{4}/\d\d/(.*)`
Replace	`:target: {static}images/\1`

Find	`<\{static\}\d{4}/\d\d/(.*\.(png|jpg|gif))`
Replace	`<{static}images/\1`
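The three find-replace pairs above can also run as a script over content/*.rst. A Python sketch; `fix_image_paths` is a hypothetical helper, and the patterns use Python `re` syntax with an escaped dot before the extension:

```python
import re

# (pattern, replacement) pairs; Python's re treats {static} in the replacement literally.
IMAGE_PATH_FIXES = [
    (r"image:: \{static\}\d{4}/\d\d/(.*)", r"image:: {static}images/\1"),
    (r":target: \{static\}\d{4}/\d\d/(.*)", r":target: {static}images/\1"),
    (r"<\{static\}\d{4}/\d\d/(.*\.(png|jpg|gif))", r"<{static}images/\1"),
]

def fix_image_paths(text):
    """Point dated {static} image paths at the flat images/ folder."""
    for pattern, replacement in IMAGE_PATH_FIXES:
        text = re.sub(pattern, replacement, text)
    return text
```

For each file, `path.write_text(fix_image_paths(path.read_text()))` would apply it in place.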

Look for remaining attachments; there are just a couple:

find content/20* -type f

I create a content/attachments folder and add it to Pelican’s STATIC_PATHS, then move the last files and delete the nuisance year folders.
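Gathered in one place, the settings mentioned so far might look like this pelicanconf.py excerpt. It is a sketch, not a complete config. Pelican copies files under STATIC_PATHS to the output with their relative paths kept, and its default STATIC_PATHS is just `['images']`.

```python
# Assumed pelicanconf.py excerpt gathering settings mentioned so far.
DEFAULT_CATEGORY = 'Uncategorized'   # until the theme's category parts are removed
SLUGIFY_SOURCE = 'basename'          # slug from file name, unique in the flat content dir

STATIC_PATHS = [
    'images',       # Pelican's default entry
    'attachments',  # the few leftover non-image uploads
    'robots.txt',   # the empty file that heads off spurious 404s
]
```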

Fixing rst

Lots of markup fails to translate from WordPress to reStructuredText. Most obvious are a pile of warnings that appear upon generating the site immediately after import; I fix each by hand.

Html 5

Pelican doesn’t plan to use the html 5 writer until version 5.0, and I’m on 4.11. That could be a big breaking change, and I’d rather start using html 5 as soon as possible to avoid rework, but I don’t see a readily available solution. There’s a Pelican plugin that subs in the Docutils html5_polyglot writer, but best not to look too closely at it since it’s not licensed for any use. Writing my own plugin would be simple enough: docs for pelican.readers.RstReader explicitly mention how to do it, but I’ll postpone that to another day.

Links

Links that should be relative became absolute; instead, they should use Pelican’s internal link format that begins with {filename}.

Find	`<https://josiahulfers\.com/\d{4}/\d\d/\d\d/([^>]+)/>`
Replace	`<{filename}\1.rst>`
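A Python sketch of the same link rewrite, assuming (as the find-replace does) that each WordPress slug matches an .rst file name; `relativize_links` is a hypothetical name:

```python
import re

# Absolute links to WordPress article URLs, capturing the slug.
ARTICLE_LINK = re.compile(r"<https://josiahulfers\.com/\d{4}/\d\d/\d\d/([^>]+)/>")

def relativize_links(text):
    """Rewrite those links to Pelican's {filename} form, e.g. <{filename}slug.rst>."""
    return ARTICLE_LINK.sub(r"<{filename}\1.rst>", text)
```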

I do a similar find-replace for links to https://josiahulfers.com/tag.

Image links were broken in pages files and one broken image link showed up as [gallery…].

Images

I remove the pointless ?w=[number] parameters on image links. The following is a little too broad and hits a link it should have left alone.

Find	`image:: ([^?]+)\?\S+`
Replace	`image:: \1`

There are wp-image classes and other extraneous classes I won’t use:

sed -i '/:class:.*wp-image/d' content/*.rst
sed -i '/:class:.*wp-image/d' content/pages/*.rst

While I’m at it, I also remove the :target: attribute and explicit image dimensions (:height: and :width:), since I don’t think most images deserve to be links and the dimensions distorted the images. Some articles might be better laid out with some images scaled differently, but that is not readily automated, so I plan to revisit it along with removing the image substitutions.

Substitutions, used by the importer for most images, make Docutils render awkward html when the images aren’t inline:

<p>… text … </p>
<p><img… </p>
<p>… more text … </p>

I’d rather the <img> not be the solo content of a <p>, and using an ordinary image directive instead of a substitution would do that, but is not a change easily done with find-replace.

Some image substitutions picked up a trailing backslash, for example, |Diagram of measurements|\ Android …. These aren’t the only examples of useless or harmful extra backslashes; a couple show up as errors when generating html, but most don’t. I deal with them as I find them, and later will search for all occurrences to eliminate the stragglers.

In rst, the image directive doubles as a video embed in html output, but I’ll replace those uses with raw:: html, below.

Video embeds

WordPress automatically renders YouTube share links (links to youtu.be) as embeds, but they remain the original links in the source. To make them embeds in rst:

Find	`https://youtu\.be/(\S+)`
Replace	`.. raw:: html\n \n <iframe src="https://www.youtube.com/embed/\1" title="TODO: FILL BY HAND" allowfullscreen></iframe>`

I also translate the t=… url parameters to start=…, expressed in seconds, since the query string parameters for youtu.be don’t match youtube.com/embed.
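A Python sketch that converts a share link to its embed URL in one step, including the t= to start= translation. It assumes t= values take the forms `90`, `90s`, or `1m30s`; `embed_url` and `to_seconds` are hypothetical names.

```python
import re
from urllib.parse import parse_qs, urlsplit

def to_seconds(t):
    """Convert a youtu.be t= value such as '90', '90s' or '1m30s' to seconds."""
    if t.isdigit():
        return int(t)
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", t)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized time: {t!r}")
    h, m, s = (int(g) if g else 0 for g in match.groups())
    return h * 3600 + m * 60 + s

def embed_url(share_url):
    """Turn https://youtu.be/ID?t=... into https://www.youtube.com/embed/ID?start=..."""
    parts = urlsplit(share_url)
    url = f"https://www.youtube.com/embed/{parts.path.lstrip('/')}"
    t = parse_qs(parts.query).get("t")
    if t:
        url += f"?start={to_seconds(t[0])}"
    return url
```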

It bugs me that embedding a YouTube video causes the browser to load Google Garbage in the background. The iframe attribute loading=lazy defers loading until the frame is likely to become visible, and putting the frame in a details element, so it starts out hidden, makes browsers defer loading YouTube until the reader opens it. I add the autoplay query parameter so opening the details element starts the video:

<details><summary>[video title]</summary>
   <iframe src="…?autoplay=1" loading=lazy …></iframe>
</details>

I do similar for one Vimeo embed.

Desktop Chrome doesn’t autoplay YouTube videos shown in this way, but that’s not so bad: it just means you need two clicks to play it instead of one. The problem is particular to YouTube, as desktop Chrome does autoplay the Vimeo video. Similarly, because lazy loading of iframes is a young feature, older browsers will need two clicks to play the video: one to reveal it and one to click play.

Because I did not give the iframe allow=autoplay, browsers should not play it immediately (and invisibly) on page load, but write me if you find a counterexample.

Strikethrough

Weirdly, reStructuredText lacks strikethrough, causing <del> tags to become, for example, [STRIKEOUT:YouTube] in Print text messages: video edition. Search for these and replace them with substitutions plus raw html directives:

… my first |YouTube| …

.. |YouTube| raw:: html

   <del>YouTube</del>

Footnotes

Footnotes became a pattern like \ `1 <#footnote-1>`__; I replace with rst-style footnotes.

Find	\\ `\d+ <(#footnote-\d+)>`__
Replace	`\x20[\1]_` (\x20 is leading space)

Then go to each modified article and manually fix the list of footnotes at its end.
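The footnote find-replace as Python, for reference; `convert_footnote_refs` is a hypothetical name. The replacement produces an auto-numbered reST footnote reference whose label is the old anchor name:

```python
import re

# Imported footnote references look like: text\ `1 <#footnote-1>`__
FOOTNOTE_REF = re.compile(r"\\ `\d+ <(#footnote-\d+)>`__")

def convert_footnote_refs(text):
    # [#footnote-1]_ is an auto-numbered reST footnote reference labeled "footnote-1".
    return FOOTNOTE_REF.sub(r" [\1]_", text)
```

Each reference then needs a matching `.. [#footnote-1]` style definition at the end of the article, which is the manual part.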

To make footnotes render as superscript in the text takes the Docutils footnote-references setting. This could go in Pelican’s DOCUTILS_SETTINGS, but I put it in docutils.conf.
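For comparison, the DOCUTILS_SETTINGS route would look something like this in pelicanconf.py, where Pelican hands the dict to Docutils as extra settings. `footnote_references` is the Docutils html writer setting; `'brackets'` is its other accepted value.

```python
# Assumed pelicanconf.py alternative to docutils.conf.
DOCUTILS_SETTINGS = {
    'footnote_references': 'superscript',  # render footnote references as superscript
}
```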

One footnote doesn’t include the footnote- prefix in the anchor, so it looks like `1 <#1>`__. Fix it by hand.

“Smart” glyphs

Docutils has a smart_quotes setting to convert easily-typed Ascii to Unicode.

Source	Render
`"Double 'single' quotes"`	“Double ‘single’ quotes”
`isn't`	isn’t
`'tis 'twas`	‘tis ‘twas (technically incorrect)
`...`	…
`--`	–
`---`	—

Rendering ’em pedantically requires using the actual Unicode character in the source, or a unicode directive, but I’ve never needed such a thing on this site before now.

I’ve apparently been writing my dashes wrong on WordPress for some time. In places where I intend an em dash, I’ve been writing a single hyphen surrounded by spaces: word - word, which WordPress renders as an en dash. I should replace space-surrounded hyphens with ---, but a simple find-replace would wrongly target minus signs in code. There are enough of these that I’ll leave them as hyphens for now and come back to fix them later.

Blockquote citations

WordPress structures blockquotes as:

<blockquote>
   <p>
      <cite>

Although sensible, this technically violates the spec:

Attribution for the quotation, if any, must be placed outside the blockquote element… [it] is not part of the quote and therefore doesn’t belong inside the blockquote itself.

—Whatwg on blockquote

Perhaps because it’s technically invalid, Pandoc didn’t translate WordPress block quotes to the reStructuredText syntax for block quotes with attribution, which ought to look like this:

Although sensible, this technically violates the spec:

   Attribution... must be placed outside the blockquote element.

   -- `Whatwg on blockquote`_

To find what needs fixing, I grep <cite in the WordPress xml backup and fix them by hand.
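To list the affected posts by slug instead of grepping, a Python sketch could scan the export with ElementTree. It assumes the WXR 1.2 namespace URIs (older exports may use a different wp: version), and `posts_with_cite` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespaces used in WordPress WXR exports (the wp: version may differ).
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wp": "http://wordpress.org/export/1.2/",
}

def posts_with_cite(xml_text):
    """Return the slug of each exported item whose HTML body contains <cite."""
    found = []
    for item in ET.fromstring(xml_text).iter("item"):
        body = item.findtext("content:encoded", default="", namespaces=NS)
        if "<cite" in body:
            found.append(item.findtext("wp:post_name", default="", namespaces=NS))
    return found
```

`ET.fromstring` also accepts bytes, so the export file can be read with `Path(...).read_bytes()`.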

The html that reStructuredText renders is also non-compliant, though I expect this to change when switching to the Docutils html 5 writer. For now, it renders:

<blockquote>
   <p>Quoted text
   <p>
      <a>Attribution

If a <cite> were present, Whatwg would insist I’m misusing it here:

A person’s name is not the title of a work — even if people call that person a piece of work — and the element must therefore not be used to mark up people’s names.

—Whatwg on cite

They want the structure to be:

<figure>
   <blockquote>
   <figcaption>
      [author] <cite>

Some places where I use blockquotes might be better styled as pull quotes or sidebars. For now, I’ll leave them be, but might come back to change them later.

Shortcodes

Shortcodes are WordPress markup that looks like html tags, but with square brackets. Search for the opening of the end tag \[/ to find those not translated to reStructuredText. I find several.

Video shortcodes

Replace [wpvideo] shortcode — a self-hosted video rather than an embed — with raw:: html. I could use the rst image directive instead; it accepts .mp4, but I don’t see how to add captions or poster image and, until the html 5 writer, it renders an <object> tag instead of <video>. Maybe there’s a way to get captions and the poster image, but I see little advantage to using the rst directive instead of raw html for videos.

Raw html does require that the link be absolute, since Pelican doesn’t adapt the {static} placeholder in raw html. Later, I’ll change the theme not to display article summaries, so this won’t matter, but it’s good practice anyway.

Search \battachments/ — folder created above — to find other places that might be affected.

Code blocks

Blocks of [code] lost indentation. They became line blocks like this:

| [code language="java"]
| public String poke(String stooge)
| throws StoogeNotFoundException {
| if (stooges.contains(stooge)) {
| return "Woopwoopwoopwoop";
| } else {
| throw new StoogeNotFoundException("Wise guy, eh");
| }
| }
| [/code]

I fix them by hand: the easiest way is to visit the original WordPress article in a browser and copy-paste. Pelican does support syntax highlighting, but I won’t use it, as I’ve come to see it as a distraction in articles like these.

The import also added backslashes before *, _ and \ characters in code blocks. Copy-pasting the original code mostly fixes these, but some are left over outside blocks like the above. Search for the following literal strings (not regex):

\\
\*
\_
\ (backslash followed by space; some are necessary, most aren’t)

Extraneous backslashes plus unwanted linebreaks also appeared in the Truth-Importance blocks in Exception Rules:

| **Always keep the cause when chaining exceptions**
| **Truth:**\ *high
  *\ **Importance:**\ *high*

Figures

WordPress renders images with a [caption] as figures, but instead of rst figures, they became ordinary image substitutions.

Find	`\[caption\b.+?\|(.*?)\|\s(.*?)\[/caption\]`
Replace	`.. figure::\n :alt: \1\n \n \2`

Then move the related elements manually, so they no longer use substitutions.

Pelican doesn’t render these as <figure> elements yet, but presumably will when switched to the html 5 writer.

Theme

I copy Pelican’s built-in “simple” theme to a theme folder in my working dir and set THEME='theme' in Pelican config to use it. Even the “simple” theme can be substantially simplified, since I need few of its customization options. By the end, I’ll modify it so heavily it’s nearly unrecognizable.

cp -R venv-blog/lib/python3.11/site-packages/pelican/themes/simple theme

Theming is mostly a matter of ordinary html, css, and taste, but a few things are notable.

The pages template used publication date, thus requiring :date: metadata in pages rst files. This would be too easy to forget when I change them, so I remove it from the template, which in turn lets me remove it from the .rst files.³
Remove the lines showing author since it’s always me, and delete the relevant metadata from the articles.
Change DEFAULT_DATE_FORMAT to month day, year.
Remove default summaries (snips of the first part of each article); I’d prefer showing well-chosen summaries on the index page, but the auto-generated summaries are clumsy.
Remove “translations” from the theme as I’ll probably never localize this.
Remove pagination as I doubt I’ll write enough for pagination to become useful.
Set the *_SAVE_AS settings for archives, authors, and categories to empty, making those parts of the theme unnecessary and deletable.

The simple theme writes an <h2> for both the article (or page) title and the first level of sub-headings in the article content, meaning the article title gets the same heading level as sections within the article. The Docutils initial_header_level setting could correct it, but presumably would also affect the feed, which might seem weird to some. Plus, my gut says that the article title is the most important title, so I make the article heading <h1> in the template instead of changing Docutils settings; H1 is also the level of the site title, but that’s ok by me.

To get a “top tags” feature, I write a custom Jinja filter, since I don’t see a way to do the equivalent of this with builtin Jinja filters:

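def top_tags(tags):
    # tags: Pelican's list of (tag, articles) pairs; keep the 12 most-used tags
    return sorted(tags, key=lambda pair: len(pair[1]), reverse=True)[:12]

# In pelicanconf.py: make the function available to templates as a filter
JINJA_FILTERS = {'top_tags': top_tags}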

Feeds

The pelicanconf.py generated by the quickstart command turns off some feed generation with a comment that says “feed generation is usually not desired when developing,” but I don’t see why not. I remove these.

With find . -name 'feed' I can locate feed patterns in the wget download. Cleaned up, they are:

`/tag/{slug}/feed`
`/{year}/{month}/{day}/{article_slug}/feed`	Comments for the article
`/{page_slug}/feed`
`/category/{slug}/feed`
`/comments/feed`
`/author/ulfers/feed`
`/feed`

I will only preserve the last — the main feeds — as I don’t expect to publish enough for the more targeted feeds to be useful. I set a pile of feed settings:

`RSS_FEED_SUMMARY_ONLY`	`False`
`[CATEGORY|TAG|AUTHOR]_FEED_*`	`None`. Not useful for my little site.
`FEED_ATOM` and `FEED_RSS`	Docs say required, but the feeds I want get generated without them. I leave them out.
`FEED_ALL_ATOM` and `FEED_ALL_RSS`	`'feeds/all.atom'` and `'feeds/all.rss'`. The .rss and .atom extensions appear in `/etc/mime.types`, making Apache set Content-Type header correctly.
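In pelicanconf.py, the table above comes out roughly as follows. The setting names are Pelican’s; treat the grouping and comments as a sketch:

```python
# Assumed pelicanconf.py feed settings matching the table.
RSS_FEED_SUMMARY_ONLY = False
FEED_ALL_ATOM = 'feeds/all.atom'  # extensions Apache maps to the right Content-Type
FEED_ALL_RSS = 'feeds/all.rss'

# Per-category, per-tag, and per-author feeds: not generated.
CATEGORY_FEED_ATOM = CATEGORY_FEED_RSS = None
TAG_FEED_ATOM = TAG_FEED_RSS = None
AUTHOR_FEED_ATOM = AUTHOR_FEED_RSS = None
```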

I remove the Rss feed <link> tag from the theme’s base.html because when I point a feed reader to the home page, the reader shows a choice of either Rss or Atom, an unnecessary and confusing choice as there’s no material difference for my site. At risk of speculative generality, I keep atom over rss because if I later define article summaries, atom can show them. This differs from the WordPress default, rss.

I assume that some badly-built feed readers won’t handle 302 redirects well, so I use an internal redirect to keep the feed url change transparent to existing readers:

RewriteRule "^/feed/?$" "/feeds/all.rss"
RewriteRule "^/feed/atom/?$" "/feeds/all.atom"

On the slim chance that anybody followed the topic feeds, I redirect them as well:

RedirectMatch "/feed/?$" /feed/
RedirectMatch "/feed/atom/?$" /feed/atom/

Unlike Pelican, which uses tag uris, WordPress uses feed item identifiers that look like links:

<item>
   <link>https://josiahulfers.com/2025/07/25/sunset-on-wordpress/</link>
   <guid isPermaLink="false">http://josiahulfers.com/?p=2339</guid>
   …

Feed readers shouldn’t assume they can link to the <guid>, but some probably do. I will, therefore, redirect links from the guid tags, but first, I’ll find a way to keep the old guids in the feed.

Keeping feed items unread

Have you ever seen a slew of old articles suddenly become “unread” in your feed reader? I have, and it’s probably because somebody switched blog hosts.

I assume feed readers vary in how they remember which articles are read and which are unread, but there are only a few logical ways to record that status: the most obvious is to use item identifiers (<guid> in rss and <id> in atom) as keys to look up unread status. The second most obvious strategy would be to key unread status off the <link> value. If I’m right, I’ll need to carry these from WordPress to the imported articles, or my followers will see old articles appear unread.

I write a plugin using article metadata; run as a script, this plugin module imports feed identifiers from WordPress xml, storing them as :feed_guid: and :feed_link: metadata in each article. This would be better done in Pelican core, so I ask Pelican’s forum about it, to no reply so far.
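A minimal sketch of the import half, not the plugin itself. It assumes the WXR 1.2 `wp` namespace, that WordPress post slugs match .rst basenames, and that each .rst header is the first run of :field: lines; `feed_ids` and `add_feed_metadata` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

WP_NS = {"wp": "http://wordpress.org/export/1.2/"}  # WXR 1.2; older exports differ
FIELD = re.compile(r":[\w-]+:(\s|$)")  # a reST field line such as ":date: 2015-01-01"

def feed_ids(xml_text):
    """Map each exported post's slug to its WordPress (guid, link) pair."""
    ids = {}
    for item in ET.fromstring(xml_text).iter("item"):
        if item.findtext("wp:post_type", namespaces=WP_NS) != "post":
            continue  # pages and attachments aren't feed items
        slug = item.findtext("wp:post_name", namespaces=WP_NS)
        ids[slug] = (item.findtext("guid", default=""), item.findtext("link", default=""))
    return ids

def add_feed_metadata(rst_text, guid, link):
    """Insert :feed_guid: and :feed_link: after the header's run of field lines.

    Raises StopIteration if the text has no field lines at all.
    """
    lines = rst_text.split("\n")
    start = next(i for i, line in enumerate(lines) if FIELD.match(line))
    end = start
    while end + 1 < len(lines) and FIELD.match(lines[end + 1]):
        end += 1
    lines[end + 1:end + 1] = [f":feed_guid: {guid}", f":feed_link: {link}"]
    return "\n".join(lines)
```

Stopping at the end of the first run of fields keeps body lines such as inline `:math:` roles from being mistaken for metadata.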

For a small plugin like this, Pelican’s PLUGINS setting is simpler than auto-discovery, and I like it better anyway for its explicitness.

Examining a few feed readers to check my guess, I see that one has an option to mark changed articles as unread. Could there be other tags relevant to marking old posts unread? Atom’s <updated> tag, perhaps?

The “atom:updated” element is a Date construct indicating the most recent instant in time when an entry or feed was modified in a way the publisher considers significant.

—Rfc4287

To me, moving to another host is “significant,” but what readers consider significant would be a better standard.

Pelican uses :modified: metadata to set <updated>, so I wonder if pelican-import should copy the last mtime from <wp:post_modified> in WordPress xml. This could make the mod time technically correct, but seems near-pointless as I don’t show mod time in my theme and if it’s ever important, I can get it from source control. Without mod time set, Pelican falls back to the publication time, so dropping mod time moves the updated feed metadata toward the past, not the future, and I can tolerate the chance that feed readers see that as a change. It’s particularly unlikely to be a problem because WordPress doesn’t make Atom discoverable, so probably nobody uses this Atom feed, and Rss has no equivalent.

If a feed reader with an option like “mark updated articles unread” has a different strategy, such as to look at article content, there’s little I can do about it. Preserving every aspect of the feed is impractical.

With feed guids now written into the .rst files, I can add redirects for the WordPress identifiers that look like links, on the assumption that some readers will link to these. Using Apache config to match url query parameters is more complicated than other redirects. I do it in two steps, first an internal redirect:

RewriteCond %{QUERY_STRING} "p=([0-9]+)"
RewriteRule "^/$" "/wordpressfeed/%1?"

Then a 302 redirect for each article:

egrep '^:feed_guid:.*' content/*.rst \
| sed -E 's~content/([^/]+)\.rst.*\?p=([0-9]+)~RewriteRule "^/wordpressfeed/\2$"\t/\1.html [R]~' \
>> josiahulfers.vhost

Which creates a pile of redirects like this, for example:

RewriteRule "^/wordpressfeed/1510$" /why-not-pdf.html [R]
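The same generation step as a Python sketch, in case the sed quoting gets unwieldy. `rewrite_rules` is a hypothetical helper, and it assumes one `:feed_guid:` line per file:

```python
import re
from pathlib import Path

# Matches the :feed_guid: line and captures the WordPress post number.
GUID = re.compile(r"^:feed_guid:.*\?p=(\d+)", re.MULTILINE)

def rewrite_rules(rst_files):
    """Yield one redirect rule per article whose feed guid has a ?p= number."""
    for path in sorted(Path(p) for p in rst_files):
        match = GUID.search(path.read_text(encoding="utf-8"))
        if match:
            yield f'RewriteRule "^/wordpressfeed/{match.group(1)}$"\t/{path.stem}.html [R]'

if __name__ == "__main__":
    print("\n".join(rewrite_rules(Path("content").glob("*.rst"))))
```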

Finally, I set FEED_MAX_ITEMS = 10 in Pelican config, to avoid the feed suddenly including more articles than it previously did.

Link testing

Using the archive created by wget, I can create a pageful of links I expect to be found:

find josiahulfers.com -name 'index.html' \
   | sed 's_^_- http://blogtest._g' \
   | sed s/index.html//g \
   >> content/pages/linktest.rst

And the same without the trailing slashes:

find josiahulfers.com -name 'index.html' \
   | sed 's_^_- http://blogtest._g' \
   | sed 's_/index.html__g' \
   >> content/pages/linktest.rst

Add feed item identifiers that look like links:

egrep -ho '^:feed_guid:.*' content/*.rst | sed -E 's/.*(http.*)/- \1/g' \
>> content/pages/linktest.rst

Download the original sitemap.xml and add those links:

egrep -o 'loc>[^\<]+</' josiahulfers.com/sitemap.xml \
| sed -E 's_.*https://([^<]+).*_- http://blogtest.\1_g' \
>> content/pages/linktest.rst

The sitemap includes a bunch of links to attachments like /wp-content/uploads/2012/07/winxp-after.png. I could redirect these to images, but they weren’t visible enough for wget to find them and I never intended them to be shared outside the context of the article that linked them anyway, so I think it’s fine to break these links.

There’s a Pelican plugin that can generate sitemaps, but sitemaps seem pointless to me.

Test redirected links with wget:

wget --spider -rl 1 -H --no-verbose 'http://localhost:8000/pages/linktest.html'

Test all internal links:

wget --spider -rl 0 --no-verbose 'http://localhost:8000/'

The above doesn’t report certificate errors. Perhaps there’s a way to make it do so, but I didn’t look into that.

Search

There’s no client-side search like Sphinx has.

I’d prefer not to rely on third-party indexing, but it’s the easiest option for search. DuckDuckGo could make this easier if they allowed a url parameter to restrict results to a site, but they don’t. I could do it with a little js, but I haven’t needed JavaScript so far, so I’ll use Apache redirects. In WordPress, the s=… query parameter searches, so the analogous Apache config incantation is:

RewriteCond %{QUERY_STRING} "\bs=([^&]*)"
RewriteRule "^/?$" "https://duckduckgo.com/?q=%1+site:josiahulfers.com" [NE,R]

Math

I sometimes write mathematical formulae, in Calculations for pinch-to-zoom, for example. These used $latex notation in WordPress:

Find doubled backslashes \$latex.*\\\\, replace with single backslash
Replace math blocks ^(\s*)\$latex\s+([^$]+?)\s*\$$ with \1.. math::\1 \1 \2
Search for backticks in inline math :math:`[^`]*`[^$]*\$ (found none, so next substitution is safe)
Replace inline math \$latex\s+([^$]+?)\s*\$ with :math:`\1`
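The math steps could also run as one Python function per file. A sketch, assuming Python’s `re.MULTILINE` stands in for the per-line anchors; `convert_latex` is a hypothetical name, and the backtick check stays manual:

```python
import re

def convert_latex(text):
    """Apply the math find-replace steps to one .rst file's text."""
    # Undo doubled backslashes, but only inside $latex ... $ spans.
    text = re.sub(r"\$latex[^$]*\$", lambda m: m.group(0).replace("\\\\", "\\"), text)
    # A $latex ... $ alone on its line becomes a math directive. When that line
    # follows a blank line, group 1 captures the newline, which the replacement
    # reuses to start the directive's following lines.
    text = re.sub(r"^(\s*)\$latex\s+([^$]+?)\s*\$$", r"\1.. math::\1 \1 \2", text,
                  flags=re.MULTILINE)
    # Whatever $latex ... $ remains is inline and becomes the :math: role.
    return re.sub(r"\$latex\s+([^$]+?)\s*\$", r":math:`\1`", text)
```

Order matters: the block conversion has to run before the inline one, or the inline pattern would claim the block formulas too.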

Might be ok to use MathML now. It was removed from Chrome in 2013, “because the code [was] not yet production ready,” but implemented ten years later, presumably when somebody noticed that none of the Chrome code is production-ready, so what’s the difference? If I want to use it, Docutils has a setting to enable MathML; for now, however, less than two years of Chrome support doesn’t seem mature enough.

Allowing comments

I won’t consider Disqus. Fossil has a built-in forum, and I suppose I could easily enough create a per-article forum thread and iframe it into each article or something like that.

But either is overkill; I already have a system for receiving and moderating comments: email. I write too infrequently or too poorly to inspire many comments, so reviewing and manually posting whatever people send won’t be hard.

I’ll create a file of comments per article, if there are any comments. The comment files go in the templates directory, which is weird, but the easiest way to allow embedding with Jinja:

<details><summary>Comments on <em>{{ article.title }}</em></summary>
{% include ['comments/' + article.slug + '.comments.html', 'comments/_empty.html'] %}
</details>

Iframing would avoid the oddity of putting content in the templates directory, but would break links to particular comments, such as an informative note on usb interference.

To preserve the old comments, Python script copy-wp-comments.py creates comment files from the WordPress xml export.
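copy-wp-comments.py itself isn’t shown here; the following is a minimal sketch of the same idea, not that script. It assumes the WXR 1.2 `wp` namespace and keeps only comments whose `wp:comment_approved` is `1`. It trusts comment bodies as already-filtered HTML (WordPress filters commenter markup when comments are saved) and adds `<p>` tags at blank lines, roughly as WordPress does at display time. `comments_by_slug` is a hypothetical name.

```python
import html
import xml.etree.ElementTree as ET
from collections import defaultdict

WP_NS = {"wp": "http://wordpress.org/export/1.2/"}  # WXR 1.2; older exports differ

def wp_text(element, tag):
    return element.findtext(tag, default="", namespaces=WP_NS)

def comments_by_slug(xml_text):
    """Render approved comments as HTML fragments, keyed by post slug."""
    pages = defaultdict(list)
    for item in ET.fromstring(xml_text).iter("item"):
        slug = wp_text(item, "wp:post_name")
        for comment in item.findall("wp:comment", namespaces=WP_NS):
            if wp_text(comment, "wp:comment_approved") != "1":
                continue  # skip spam, trash and unmoderated comments
            author = html.escape(wp_text(comment, "wp:comment_author"))
            date = html.escape(wp_text(comment, "wp:comment_date"))
            body = "".join(
                f"<p>{para.strip()}</p>"
                for para in wp_text(comment, "wp:comment_content").split("\n\n")
                if para.strip()
            )
            # id="comment-N" keeps WordPress-style #comment-N anchors working.
            pages[slug].append(
                f'<article id="comment-{wp_text(comment, "wp:comment_id")}">'
                f"<p>{author}, {date}</p>\n{body}\n</article>"
            )
    return {slug: "\n".join(parts) for slug, parts in pages.items()}
```

Writing each value to a `comments/<slug>.comments.html` file in the theme’s templates directory would presumably line up with the include above.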

Writing comments as html will be a minor nuisance, so maybe someday I’ll write a Pelican plugin to translate comments from .rst. On the other hand, if translating comments to html ever becomes burdensome enough to be worth that effort, I’ll more likely switch to a traditional commenting system.

I already use WordPress to forward various aliases @josiahulfers.com to an account that I monitor, so I add another: comments@josiahulfers.com and a mailto link at the bottom of the article template. I’ll keep the forwarding in WordPress.com for now, but I’ll need to use something else if I drop them as registrar.

Transfer

With the end in sight, I write a final post, Sunset on WordPress, to inform email subscribers that I’ll no longer be mailing them new articles. Then I copy that last article’s content to the appropriate rst file, create a redirect for its feed guid url, and add its links to linktest.rst.

Last, I switch to my new host (details not particular to Pelican), remove the blogtest subdomain from linktest.rst, and re-test links.

Notes

[1]	Uses of regular expressions do not mean I endorse parsing irregular grammars like xml with regex! Use with supervision.

[2]	I had to shun the first WordPress xml archive I checked in because I didn’t originally notice the email addresses.

[3]	The `article.html` template renders date, but does not fail when articles with `draft` status omit date. Omitting date on articles with `published` status, however, does cause an error.