RSS! If you're one of the dozen-or-so bots that crawl this website weekly, you've probably seen this coming for a while. I've come back to RSS time and time again because it's great! It can be a welcome relief from traditional social media, and I highly recommend trying it. While reading through an online discussion a few weeks back, I came upon a comment that really struck me (sadly, that comment is lost to time). Essentially, the commenter noted that those who see a lack of RSS in the world should be the change! So... I decided to add RSS to this blog!
Thankfully, instead of listening to the voices of reason, I listened to a random blog post (also lost to time) from someone who said they hand-crafted their RSS feed. Although this sounded like a lot, I know many older protocols/specs are actually quite simple, so I decided to look into integrating this into the generator script that already builds this site. After looking at a few examples, I got to work adding Yet Another Feature to my growing script!
In short, RSS is really just an .xml file that must contain only a handful of tags. Being me, I decided to aim for the few tags that were needed and would add something of value to this blog, forgoing such tags as lastBuildDate and ttl. This left me with something like:
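A minimal sketch of such a skeleton, written as a shell heredoc so it fits alongside a Bash generator. Every title, link, date, and description below is a placeholder rather than a value from the real feed, and the function name print_feed_skeleton is made up for illustration. RSS 2.0 requires title, link, and description on the channel; the item tags shown are the ones discussed in this post.

```shell
#!/usr/bin/env bash
# Print a minimal RSS 2.0 skeleton.
# All values are placeholders, not the contents of the real feed.
print_feed_skeleton() {
  cat <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/blog/</link>
    <description>Placeholder channel description</description>
    <item>
      <title>Placeholder post title</title>
      <link>https://example.com/blog/placeholder-post.html</link>
      <pubDate>Mon, 01 Mar 2021 00:00:00 GMT</pubDate>
      <description>Placeholder post description</description>
    </item>
  </channel>
</rss>
EOF
}

print_feed_skeleton
```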
To make that simple structure even easier, everything but the item-related tags is static (since my blog's base information doesn't change from week to week). This means I could leave the top half in the ./parts/assets/ folder and further keep the content out of the ./generator.sh script! Sadly, this is where it stops getting easy and starts getting hack-y.
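One way the static-top-half idea could look in practice is sketched below. The function name assemble_feed and its three arguments are hypothetical, and it assumes the static header file opens the rss and channel tags without closing them, so the script only has to append the items and the closing tags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stitch a feed together from a static header file
# and a file of generated <item> blocks.
assemble_feed() {
  local header_file=$1 items_file=$2 out_file=$3
  {
    cat "$header_file"               # static top half (channel title, link, ...)
    cat "$items_file"                # one <item>...</item> block per post
    printf '  </channel>\n</rss>\n'  # close the tags the header left open
  } > "$out_file"
}
```

A call might look like `assemble_feed ./parts/assets/rss-header.xml items.tmp feed.xml`, where all three file names are invented for the example.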
To actually insert the blog posts and all their details into the item tags, I turned to simple but inelegant solutions. In the script, while we're processing each file, we check whether the file is in a directory called "blog" and, if so, subject it to the RSS code.
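The directory check could be sketched like this. It assumes "in a directory called blog" means the file's immediate parent directory; is_blog_post and the example paths are made up for illustration.

```shell
#!/usr/bin/env bash
# Succeed when the file's immediate parent directory is named "blog".
# (Assumption: only the direct parent matters, not deeper ancestors.)
is_blog_post() {
  local file=$1
  [ "$(basename "$(dirname "$file")")" = "blog" ]
}

# Example paths are hypothetical; they do not need to exist on disk.
for file in ./site/blog/hello.html ./site/about.html; do
  if is_blog_post "$file"; then
    echo "$file: gets the RSS treatment"
  else
    echo "$file: skipped"
  fi
done
```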
The easiest thing to parse out is the link. For this, we ask for the base blog URL to be configured in the config options at the top of the script (e.g., "https://example.com/blog/"). Then we concatenate that with the basename of the file, and that's it!
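As a rough sketch of that concatenation: the variable name BLOG_BASE_URL, the function make_link, and the example file path are all assumptions, not names taken from the real script.

```shell
#!/usr/bin/env bash
# Base URL as it might appear in the config section (placeholder value).
BLOG_BASE_URL="https://example.com/blog/"

# Build a post's <link> value from the base URL plus the file's basename.
make_link() {
  printf '%s%s\n' "$BLOG_BASE_URL" "$(basename "$1")"
}

make_link ./site/blog/adding-rss.html
# prints: https://example.com/blog/adding-rss.html
```

Note that this only works cleanly if the configured base URL ends with a slash.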
Then come the worse-but-workable title and pubDate tags. For each of these I ended up with similar grep | cut | cut commands. grep is great, and here I used it to find all lines containing the text "h2". This is premised on the idea that I'll never include another set of h2 tags before the title... we'll see if that holds true! This (hopefully singular) line is passed to cut twice, which strips off first one part and then another, depending on whether we're looking for the title or the date.
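The exact markup of the heading line isn't shown here, so the sketch below invents one: it assumes the title and date share a single line shaped like `<h2>Title<span>Date</span></h2>`. The delimiters and field numbers only make sense for that hypothetical layout, and get_title and get_pubdate are illustrative names. The date in the sample is already in the RFC 822 style that pubDate expects.

```shell
#!/usr/bin/env bash
# ASSUMED (hypothetical) heading layout, all on one line:
#   <h2>Adding RSS<span>Mon, 01 Mar 2021 00:00:00 GMT</span></h2>

# Title: take the text after the first '>' and cut it off at the next '<'.
get_title() {
  grep 'h2' "$1" | cut -d'>' -f2 | cut -d'<' -f1
}

# Date: take the text after the second '>' and cut it off at the next '<'.
get_pubdate() {
  grep 'h2' "$1" | cut -d'>' -f3 | cut -d'<' -f1
}

# Demo on a throwaway file.
post=$(mktemp)
printf '%s\n' '<body>' '<section>' \
  '<h2>Adding RSS<span>Mon, 01 Mar 2021 00:00:00 GMT</span></h2>' \
  '<p>Hello</p>' > "$post"
get_title "$post"    # prints: Adding RSS
get_pubdate "$post"  # prints: Mon, 01 Mar 2021 00:00:00 GMT
rm -f "$post"
```

If a second line containing "h2" ever sneaks in, both functions will print two lines, which is exactly the fragility described above.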
Finally, our greatest sin: the description tag. For a while I ran the feed with the pubDate also acting as the description (for simplicity), but it just doesn't look good in my RSS reader! There are three generally accepted 'standards' for what to put in this field.
Technically it's not escaping but entity-encoded HTML that's allowed in the description. Interestingly, there are two ways to do it: convert the tags yourself or use a CDATA section. Since I'm not keen on turning this Bash script into a proto-compiler, I opted for a CDATA wrapper. This is actually pretty easy in practice: all you do is wrap your post HTML with the CDATA start/end sequences, like these great examples. That said, the gotcha is that things can get messy when you start nesting. In theory this is avoided by not showing off your cool CDATA-wrapping techniques inside a CDATA-wrapped section of XML, but I'll err on the side of caution at this point.
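A minimal sketch of the CDATA wrapping, with hypothetical function names. The escape_cdata step is a common workaround for the nesting gotcha (a literal "]]>" inside the content gets split across two CDATA sections); it is an addition for illustration, not something the real script is known to do.

```shell
#!/usr/bin/env bash
# Split any literal "]]>" in stdin so it cannot end the CDATA section early.
escape_cdata() {
  sed 's/]]>/]]]]><![CDATA[>/g'
}

# Wrap HTML read from stdin in a <description> element using CDATA.
wrap_description() {
  printf '<description><![CDATA[%s]]></description>\n' "$(escape_cdata)"
}

printf '<p>Hello, <em>RSS</em>!</p>\n' | wrap_description
# prints: <description><![CDATA[<p>Hello, <em>RSS</em>!</p>]]></description>
```

The HTML inside stays untouched, which is the whole appeal over entity-encoding every angle bracket by hand.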
Sadly, CDATA doesn't get rid of all the HTML that doesn't need to exist, so I added extra fragility by running the post content through tail and head first, stripping off the body, section, and h2 lines/tags.
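Roughly, the trimming could look like the sketch below. The line counts are assumptions: it supposes each post starts with exactly three wrapper lines (body, section, h2) and ends with exactly two closing lines, which may not match the real files. strip_wrapper is a made-up name, and the negative count passed to head relies on GNU coreutils.

```shell
#!/usr/bin/env bash
# ASSUMED (hypothetical) post layout:
#   line 1: <body>      line 2: <section>      line 3: <h2>...</h2>
#   ...content...
#   second-to-last line: </section>            last line: </body>
strip_wrapper() {
  # tail drops the first 3 lines; head drops the last 2 (GNU head only).
  tail -n +4 "$1" | head -n -2
}

# Demo on a throwaway file.
post=$(mktemp)
printf '%s\n' '<body>' '<section>' '<h2>Adding RSS</h2>' \
  '<p>First paragraph.</p>' '<p>Second paragraph.</p>' \
  '</section>' '</body>' > "$post"
strip_wrapper "$post"
rm -f "$post"
```

Any post with an extra blank line at the top or bottom would throw the counts off, hence "extra fragility".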
Alrighty, that's all I've got to confess for today. If you'd like to see what the above results in, here's the direct link to the feed file. As usual, I had a great time learning about the ins-and-outs of this specification and really appreciate that it looks good in my feed reader. I'd encourage anyone who has control over such things to consider supporting RSS again. I don't yet miss it, but probably will some day.
Happy Hacking!
- Chris