Choosing the API and defining some ADTs

As I said in the previous article, for wp2o2b, the plan is:

So, the task is to download all the articles in my existing sites, reformat them into org-mode files with appropiate metadata for org2blog, store them locally in a hierarchy that mirrors the one on the server.

WordPress implements an XML-RPC interface for accessing your blog programatically. It supports older legacy styles of access (Blogger, MovableType, and metaWeblog), but recommends that for new development you work with their new API which, incidentally, has the nice benefit of being well-documented.

(Much of this new API was introduced in version 3.4, released in June, 2012—so it’s only six months old at this point. Normally I would hesitate to depend on something that new, if I cared about wide applicability, but WordPress is one of those things where I think you should be keeping up with releases, if only for security reasons, so I don’t perceive it as too much of a limitation.)

The first thing we have to do is implement a data structure for holding a post. Where in a dynamic language, you’d might just get back a big wodge of XML and pick at it as necessary in Haskell, you need to define a data type to hold your results.

So we’ll start there.

Working from the definition of a post in the API documentation what we end up with is something like:

data WPPost = WPPost {
  pPostId :: String,
  pPostTitle :: String,
  pPostDate :: CalendarTime,
  pPostDateGmt :: CalendarTime,
  pPostModified :: CalendarTime,
  pPostModifiedGmt :: CalendarTime,
  pPostStatus :: String,
  pPostType :: String,
  pPostFormat :: String,
  pPostName :: String,
  pPostAuthor :: String,
  pPostPassword :: String,
  pPostExcerpt :: String,
  pPostContent :: String,
  pPostParent :: String,
  pPostMimeType :: String,
  pLink :: String,
  pGuid :: String,
  pMenuOrder :: Int,
  pCommentStatus :: String,
  pPingStatus :: String,
  pSticky :: Bool,
  pPostThumbnail :: [WPMediaItem],
  pTerms :: [WPTerm],
  pCustomFields :: [WPCustomField]
} deriving Show

This refers to a few other structs that we’ve defined—the process is pretty straightforward, so I’m not going to go over it.

Then we need to take this type and give Haskell a way to convert back and forth from it to XML-RPC. The HaXR page on the Haskell wiki link to some example code, and the Network.XmlRpc docs include a little explanation on how to do this.

If your XML-RPC structure has names that are sufficiently unique to map to record names without conflicts, and you’re comfortable with Template Haskell, you could just do:

$(asXmlRpcStruct ''WPPost)

However, if you have an aversion to Template Haskell, or (perhaps more likely) you have field names that are generic enough to present significant conflicts (id, or type or some such) you will have to do it by hand by defining an XmlRpcType instance for your constructor. That ends up looking like:

instance XmlRpcType WPPost where
  toValue struct = toValue $ [("post_id", toValue (pPostId struct)),
                              ("post_title", toValue (pPostTitle struct)),
                              ("post_date", toValue (pPostDate struct)),
                              ("post_date_gmt", toValue (pPostDateGmt struct)),
                              ("post_modified", toValue (pPostModified struct)),
                              ("post_modified_gmt", toValue (pPostModifiedGmt struct)),
                              ("post_status", toValue (pPostStatus struct)),
                              ("post_type", toValue (pPostType struct)),
                              ("post_format", toValue (pPostFormat struct)),
                              ("post_name", toValue (pPostName struct)),
                              ("post_author", toValue (pPostAuthor struct)),
                              ("post_password", toValue (pPostPassword struct)),
                              ("post_excerpt", toValue (pPostExcerpt struct)),
                              ("post_content", toValue (pPostContent struct)),
                              ("post_parent", toValue (pPostParent struct)),
                              ("post_mime_type", toValue (pPostMimeType struct)),
                              ("link", toValue (pLink struct)),
                              ("guid", toValue (pGuid struct)),
                              ("menu_order", toValue (pMenuOrder struct)),
                              ("comment_status", toValue (pCommentStatus struct)),
                              ("ping_status", toValue (pPingStatus struct)),
                              ("sticky", toValue (pSticky struct)),
                              ("post_thumbnail", toValue (pPostThumbnail struct)),
                              ("terms", toValue (pTerms struct)),
                              ("custom_fields", toValue (pCustomFields struct))]
  fromValue v = do
    struct <- fromValue v
    a <- getField "post_id" struct
    b <- getField "post_title" struct
    c <- getField "post_date" struct
    d <- getField "post_date_gmt" struct
    e <- getField "post_modified" struct
    f <- getField "post_modified_gmt" struct
    g <- getField "post_status" struct
    h <- getField "post_type" struct
    i <- getField "post_format" struct
    j <- getField "post_name" struct
    k <- getField "post_author" struct
    l <- getField "post_password" struct
    m <- getField "post_excerpt" struct
    n <- getField "post_content" struct
    o <- getField "post_parent" struct
    p <- getField "post_mime_type" struct
    q <- getField "link" struct
    r <- getField "guid" struct
    s <- getField "menu_order" struct
    t <- getField "comment_status" struct
    u <- getField "ping_status" struct
    v <- getField "sticky" struct
    w <- getField "post_thumbnail" struct
    x <- getField "terms" struct
    y <- getField "custom_fields" struct
    return WPPost {
      pPostId = a,
      pPostTitle = b,
      pPostDate = c,
      pPostDateGmt = d,
      pPostModified = e,
      pPostModifiedGmt = f,
      pPostStatus = g,
      pPostType = h,
      pPostFormat = i,
      pPostName = j,
      pPostAuthor = k,
      pPostPassword = l,
      pPostExcerpt = m,
      pPostContent = n,
      pPostParent = o,
      pPostMimeType = p,
      pLink = q,
      pGuid = r,
      pMenuOrder = s,
      pCommentStatus = t,
      pPingStatus = u,
      pSticky = v,
      pPostThumbnail = w,
      pTerms = x,
      pCustomFields = y }
  getType _ = TStruct

Yeah, so that’s the obvious way to do it. And boy is it tedious—I need to figure out a better way to make this happen. because that’s a lot of pointless boilerplate.

It seems to me that I should somehow be able to define a small data structure and then pull the necessary bits out just once, rather than having to repeat everything at least twice. I guess that’s the purpose that the Template Haskell code serves, but I need more power.

Oh, well, it’s done for the moment.


I want to emphasize here that at this point, I’m just trying to get things done. I am intrigued by the theoretical underpinnings of Haskell (although my understanding of most of them is…shallow at the very least), but I’m also a working programmer—I need to be able to be productive. I want the benefits that I think Haskell has to provide—static typing to keep me from making as many dumb mistakes, good performance—but I have to be able to produce actual code for those things to be worth anything.

At the same time, I recognize that what I’ve just done probably represents a small chunk of technical debt. I’d love to learn enough to be able to pay it off.

Beginning of wp2o2b

I recently made the decision to start doing my blogging using org2blog. I’ve been using WordPress to host a couple of blogs (Radios Appear and Gurave, as well as my friend Chet’s blog Miscellaneous Heathen) for the last couple of years, but I’ve always done my writing in Emacs (using Textile for formatting), and then simply cut-and-pasted what I wrote into WordPress’s text entry box: of course a more integrated solution based entirely in Emacs sounded attractive.

And, indeed, I like the results. In addition to the original two blogs, I’m using this workflow to write my articles about exploring Emacs (Do You Even Lisp?) as well as this blog (I gave some serious thought to Brent Yorgey’s Blog Literately package, appropriately written in Haskell, but first, the Debian Haskell situation is in a lot of flux right now and I didn’t want to have to work that hard to package it and second, a little consistency of tooling is always welcome).

Being the sort of compulsive neat-freak that I am, though, I decided that it wasn’t enough to just create new articles using org2blog—what I really wanted was to export all the existing articles in my long-term blogs into my org2blog setup.

I was originally going to do this using Perl—it’s been my go-to language for the last 17 years, I can whip up what I want in it faster than in anything else, and I’ve been doing it long enough to write code that is readable after the fact. Besides, I already have a script for doing some WordPress interaction, so it would be a great starting point.

And then I decided that I was going to do it in Haskell (which I have come to realize was probably an even better choice than I initially expected—I’ll talk about that later).

So, the task is to download all the articles in my existing sites, reformat them into org-mode files with appropiate metadata for org2blog, store them locally in a hierarchy that mirrors the one on the server.

I think it’s going to be easier than I expected.