Simplify FAIL

So I said yesterday that I really wanted to find a better way to do our by-hand mapping of XML-RPC structs, because doing it by hand—and, specifically, repeating a bunch of information multiple times—was tedious, error-prone and ugly. Here’s a smaller struct we’re working with, for WPCustomField—smaller, but it’s still a bunch of boilerplate:

data WPCustomField = WPCustomField {
  cfId :: String,
  cfKey :: String,
  cfValue :: String
} deriving Show

instance XmlRpcType WPCustomField where
  toValue struct = toValue $ [("id", toValue (cfId struct)),
                              ("key", toValue (cfKey struct)),
                              ("value", toValue (cfValue struct))]
  fromValue v = do
    struct <- fromValue v
    a <- getField "id" struct
    b <- getField "key" struct
    c <- getField "value" struct
    return WPCustomField {
      cfId = a,
      cfKey = b,
      cfValue = c }
  getType _ = TStruct

So I started thinking about it. It seemed obvious to me that I would want to start with a list of tuples, each one mapping an XML-RPC attribute name to an accessor function. I wanted a list because I was going to need to preserve the ordering so that I could feed the values to the data constructor in the proper order.

So I did this:

cfMapping = [("id", cfId),
             ("key", cfKey),
             ("value", cfValue)]

This seemed simple enough that I didn’t bother to write a type declaration, or let ghc-mod do it for me—in which case I might have seen the upcoming problem.

At first I thought my biggest limitation was going to be the fact that I couldn’t see a way to transform the fromValue function—while I could map over the entries in cfMapping, I didn’t see how I was going to be able to take the resulting list and give it to the WPCustomField data constructor.

Then it hit me: I could fold over the list and partially apply the data constructor to each value in turn, so that when we got to the end of the list, we’d have an actual value.

Boy did I feel proud of that solution.1

Figuring that I had that problem licked, I decided to first rewrite the toValue function for the WPCustomField structure. I wrote a function that would map over our mapping list, and return the sort of list we were looking for:

mapToValue mapping struct = toValue $ map aListToValue mapping
    where aListToValue (key, accessor) = (key, toValue (accessor struct))

By making sure that the struct was the last thing handed in, I even got to write the new toValue function in a point-free style:

toValue = mapToValue cfMapping

So I compiled it and it ran. “Great!” I thought. “This is going to be easy!” And then I hit WPEnclosure:

data WPEnclosure = WPEnclosure {
  eUrl :: String,
  eLength :: Int,
  eType :: String
} deriving Show

Many of you will see what is wrong immediately. I hinted at the direction of the problem when I mentioned that I didn’t bother to write a type signature for the cfMapping list—because once you see it, and look at WPEnclosure, I think it becomes obvious what the problem is:

cfMapping :: [(String, WPCustomField -> String)]

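That’s right: the eLength field being an Int among String fields means that the tuples for WPEnclosure would have different types, (String, WPEnclosure -> String) for two of them and (String, WPEnclosure -> Int) for the third, and a Haskell list can only hold elements of a single type. FAIL.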
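One possible way around the mismatch, sketched here under assumptions rather than taken from the actual wp2o2b code: do the toValue conversion while building the list, so that every entry has the type WPEnclosure -> Value no matter what the field itself holds. The Value type and ToValue class below are simplified stand-ins for HaXR's, included only so the snippet compiles on its own, and eMapping and enclosureToValue are hypothetical names.

```haskell
-- Simplified stand-ins for HaXR's Value type and XmlRpcType class, so the
-- sketch compiles without the library (assumptions, not the real API).
data Value
  = ValueInt Int
  | ValueString String
  | ValueArray [Value]
  | ValueStruct [(String, Value)]
  deriving (Eq, Show)

class ToValue a where
  toValue :: a -> Value
  -- Lets String ([Char]) encode as a single string, the same trick Show
  -- uses with showList; other lists fall back to an array.
  listToValue :: [a] -> Value
  listToValue = ValueArray . map toValue

instance ToValue Int where
  toValue = ValueInt

instance ToValue Char where
  toValue c = ValueString [c]
  listToValue = ValueString

instance ToValue a => ToValue [a] where
  toValue = listToValue

data WPEnclosure = WPEnclosure {
  eUrl :: String,
  eLength :: Int,
  eType :: String
} deriving Show

-- Every entry has the same type, whatever the field's own type is,
-- because the conversion happens when the list is built.
eMapping :: [(String, WPEnclosure -> Value)]
eMapping = [("url", toValue . eUrl),
            ("length", toValue . eLength),
            ("type", toValue . eType)]

-- Like the article's mapToValue, minus the per-field toValue call, which
-- has already been composed into each entry.
mapToValue :: [(String, r -> Value)] -> r -> Value
mapToValue mapping struct =
  ValueStruct [(key, encode struct) | (key, encode) <- mapping]

-- Still point-free, as before.
enclosureToValue :: WPEnclosure -> Value
enclosureToValue = mapToValue eMapping
```

The cost is that the list no longer exposes the raw accessors, only their encoded results, so it helps with toValue but does nothing for fromValue.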

So, for the moment I’m going to put this cleanup on hold, and just proceed with the hand-rolled instances.

Footnotes:

1 In retrospect, I see that this wouldn’t work, because the types of each of those partially applied functions would not be the same, so the accumulator couldn’t typecheck. Oh, well, pride goeth before the fall and all that.
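The idea of feeding the constructor one value at a time can still be made to typecheck, just not with a fold. A minimal sketch, assuming a cut-down Value type and using Either String for errors: the applicative operators <$> and <*> apply the constructor one argument at a time, and each step is allowed to have a different type. The names Struct, getString, getInt and enclosureFromStruct are hypothetical; getString and getInt play the role that getField plays in the instances above.

```haskell
-- Simplified stand-in for HaXR's Value type (an assumption, so the sketch
-- compiles without the library).
data Value
  = ValueInt Int
  | ValueString String
  deriving (Eq, Show)

type Struct = [(String, Value)]

-- Hypothetical helpers playing the role of getField, one per field type.
getString :: String -> Struct -> Either String String
getString name struct = case lookup name struct of
  Just (ValueString s) -> Right s
  Just other           -> Left (name ++ ": expected a string, got " ++ show other)
  Nothing              -> Left ("missing field: " ++ name)

getInt :: String -> Struct -> Either String Int
getInt name struct = case lookup name struct of
  Just (ValueInt n) -> Right n
  Just other        -> Left (name ++ ": expected an int, got " ++ show other)
  Nothing           -> Left ("missing field: " ++ name)

data WPEnclosure = WPEnclosure {
  eUrl :: String,
  eLength :: Int,
  eType :: String
} deriving (Eq, Show)

-- Each <*> feeds one more argument to the constructor. The intermediate
-- types differ (Either String (Int -> String -> WPEnclosure), and so on),
-- which is fine here because no single accumulator type is required.
enclosureFromStruct :: Struct -> Either String WPEnclosure
enclosureFromStruct struct =
  WPEnclosure
    <$> getString "url" struct
    <*> getInt "length" struct
    <*> getString "type" struct
```

The first failing lookup wins: if "length" is missing, the result is the Left from getInt and the later fields are never consulted for their values.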

Choosing the API and defining some ADTs

As I said in the previous article, for wp2o2b, the plan is:

So, the task is to download all the articles in my existing sites, reformat them into org-mode files with appropriate metadata for org2blog, and store them locally in a hierarchy that mirrors the one on the server.

WordPress implements an XML-RPC interface for accessing your blog programmatically. It supports legacy styles of access (Blogger, MovableType, and metaWeblog), but recommends that for new development you work with their new API which, incidentally, has the nice benefit of being well-documented.

(Much of this new API was introduced in version 3.4, released in June, 2012—so it’s only six months old at this point. Normally I would hesitate to depend on something that new, if I cared about wide applicability, but WordPress is one of those things where I think you should be keeping up with releases, if only for security reasons, so I don’t perceive it as too much of a limitation.)

The first thing we have to do is implement a data structure for holding a post. Where in a dynamic language you might just get back a big wodge of XML and pick at it as necessary, in Haskell you need to define a data type to hold your results.

So we’ll start there.

Working from the definition of a post in the API documentation, what we end up with is something like:

data WPPost = WPPost {
  pPostId :: String,
  pPostTitle :: String,
  pPostDate :: CalendarTime,
  pPostDateGmt :: CalendarTime,
  pPostModified :: CalendarTime,
  pPostModifiedGmt :: CalendarTime,
  pPostStatus :: String,
  pPostType :: String,
  pPostFormat :: String,
  pPostName :: String,
  pPostAuthor :: String,
  pPostPassword :: String,
  pPostExcerpt :: String,
  pPostContent :: String,
  pPostParent :: String,
  pPostMimeType :: String,
  pLink :: String,
  pGuid :: String,
  pMenuOrder :: Int,
  pCommentStatus :: String,
  pPingStatus :: String,
  pSticky :: Bool,
  pPostThumbnail :: [WPMediaItem],
  pTerms :: [WPTerm],
  pCustomFields :: [WPCustomField]
} deriving Show

This refers to a few other structs that we’ve defined—the process is pretty straightforward, so I’m not going to go over it.

Then we need to take this type and give Haskell a way to convert back and forth between it and XML-RPC. The HaXR page on the Haskell wiki links to some example code, and the Network.XmlRpc docs include a little explanation of how to do this.

If your XML-RPC structure has names that are sufficiently unique to map to record names without conflicts, and you’re comfortable with Template Haskell, you could just do:

$(asXmlRpcStruct ''WPPost)

However, if you have an aversion to Template Haskell, or (perhaps more likely) you have field names that are generic enough to present significant conflicts (id, type, or some such), you will have to do it by hand by defining an XmlRpcType instance for your type. That ends up looking like:

instance XmlRpcType WPPost where
  toValue struct = toValue $ [("post_id", toValue (pPostId struct)),
                              ("post_title", toValue (pPostTitle struct)),
                              ("post_date", toValue (pPostDate struct)),
                              ("post_date_gmt", toValue (pPostDateGmt struct)),
                              ("post_modified", toValue (pPostModified struct)),
                              ("post_modified_gmt", toValue (pPostModifiedGmt struct)),
                              ("post_status", toValue (pPostStatus struct)),
                              ("post_type", toValue (pPostType struct)),
                              ("post_format", toValue (pPostFormat struct)),
                              ("post_name", toValue (pPostName struct)),
                              ("post_author", toValue (pPostAuthor struct)),
                              ("post_password", toValue (pPostPassword struct)),
                              ("post_excerpt", toValue (pPostExcerpt struct)),
                              ("post_content", toValue (pPostContent struct)),
                              ("post_parent", toValue (pPostParent struct)),
                              ("post_mime_type", toValue (pPostMimeType struct)),
                              ("link", toValue (pLink struct)),
                              ("guid", toValue (pGuid struct)),
                              ("menu_order", toValue (pMenuOrder struct)),
                              ("comment_status", toValue (pCommentStatus struct)),
                              ("ping_status", toValue (pPingStatus struct)),
                              ("sticky", toValue (pSticky struct)),
                              ("post_thumbnail", toValue (pPostThumbnail struct)),
                              ("terms", toValue (pTerms struct)),
                              ("custom_fields", toValue (pCustomFields struct))]
  fromValue v = do
    struct <- fromValue v
    a <- getField "post_id" struct
    b <- getField "post_title" struct
    c <- getField "post_date" struct
    d <- getField "post_date_gmt" struct
    e <- getField "post_modified" struct
    f <- getField "post_modified_gmt" struct
    g <- getField "post_status" struct
    h <- getField "post_type" struct
    i <- getField "post_format" struct
    j <- getField "post_name" struct
    k <- getField "post_author" struct
    l <- getField "post_password" struct
    m <- getField "post_excerpt" struct
    n <- getField "post_content" struct
    o <- getField "post_parent" struct
    p <- getField "post_mime_type" struct
    q <- getField "link" struct
    r <- getField "guid" struct
    s <- getField "menu_order" struct
    t <- getField "comment_status" struct
    u <- getField "ping_status" struct
    v <- getField "sticky" struct
    w <- getField "post_thumbnail" struct
    x <- getField "terms" struct
    y <- getField "custom_fields" struct
    return WPPost {
      pPostId = a,
      pPostTitle = b,
      pPostDate = c,
      pPostDateGmt = d,
      pPostModified = e,
      pPostModifiedGmt = f,
      pPostStatus = g,
      pPostType = h,
      pPostFormat = i,
      pPostName = j,
      pPostAuthor = k,
      pPostPassword = l,
      pPostExcerpt = m,
      pPostContent = n,
      pPostParent = o,
      pPostMimeType = p,
      pLink = q,
      pGuid = r,
      pMenuOrder = s,
      pCommentStatus = t,
      pPingStatus = u,
      pSticky = v,
      pPostThumbnail = w,
      pTerms = x,
      pCustomFields = y }
  getType _ = TStruct

Yeah, so that’s the obvious way to do it. And boy is it tedious. I need to figure out a better way to make this happen, because that’s a lot of pointless boilerplate.

It seems to me that I should somehow be able to define a small data structure and then pull the necessary bits out just once, rather than having to repeat everything at least twice. I guess that’s the purpose that the Template Haskell code serves, but I need more power.
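For illustration only, here is one shape such a "small data structure" could take. This is a sketch under assumptions, not something HaXR provides: a hypothetical Codec type pairs an encoder with a decoder for each field, and an Applicative instance glues fields together so that each XML-RPC name and accessor is written exactly once. The Value type is a simplified stand-in for HaXR's, and every other name here (Codec, field, stringField, intField, enclosureCodec) is made up for the example. A smaller struct, WPEnclosure, keeps it short.

```haskell
-- Simplified stand-in for HaXR's Value type (an assumption).
data Value
  = ValueInt Int
  | ValueString String
  | ValueStruct [(String, Value)]
  deriving (Eq, Show)

type Struct = [(String, Value)]

-- Hypothetical description of part of a record r: how to write it out,
-- and how to read a value of type a back in.
data Codec r a = Codec
  { encodeWith :: r -> Struct
  , decodeWith :: Struct -> Either String a
  }

instance Functor (Codec r) where
  fmap f (Codec enc dec) = Codec enc (fmap f . dec)

instance Applicative (Codec r) where
  pure x = Codec (const []) (const (Right x))
  Codec encF decF <*> Codec encX decX =
    Codec (\r -> encF r ++ encX r) (\s -> decF s <*> decX s)

-- One field: the two conversions, the XML-RPC name, and the accessor.
field :: (a -> Value) -> (Value -> Maybe a) -> String -> (r -> a) -> Codec r a
field wrap unwrap name accessor = Codec enc dec
  where
    enc r = [(name, wrap (accessor r))]
    dec s = case lookup name s of
      Nothing -> Left ("missing field: " ++ name)
      Just v  -> maybe (Left ("wrong type for field: " ++ name)) Right (unwrap v)

stringField :: String -> (r -> String) -> Codec r String
stringField = field ValueString unwrap
  where
    unwrap (ValueString s) = Just s
    unwrap _               = Nothing

intField :: String -> (r -> Int) -> Codec r Int
intField = field ValueInt unwrap
  where
    unwrap (ValueInt n) = Just n
    unwrap _            = Nothing

data WPEnclosure = WPEnclosure {
  eUrl :: String,
  eLength :: Int,
  eType :: String
} deriving (Eq, Show)

-- Each name and accessor appears exactly once.
enclosureCodec :: Codec WPEnclosure WPEnclosure
enclosureCodec =
  WPEnclosure
    <$> stringField "url" eUrl
    <*> intField "length" eLength
    <*> stringField "type" eType

enclosureToValue :: WPEnclosure -> Value
enclosureToValue = ValueStruct . encodeWith enclosureCodec

enclosureFromValue :: Value -> Either String WPEnclosure
enclosureFromValue (ValueStruct s) = decodeWith enclosureCodec s
enclosureFromValue other = Left ("expected a struct, got " ++ show other)
```

With something like this in place, the toValue and fromValue methods of an XmlRpcType instance could presumably both be derived from the one codec, although wiring it to HaXR's real types is left untested here.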

Oh, well, it’s done for the moment.


I want to emphasize here that at this point, I’m just trying to get things done. I am intrigued by the theoretical underpinnings of Haskell (although my understanding of most of them is…shallow at the very least), but I’m also a working programmer—I need to be able to be productive. I want the benefits that I think Haskell has to provide—static typing to keep me from making as many dumb mistakes, good performance—but I have to be able to produce actual code for those things to be worth anything.

At the same time, I recognize that what I’ve just done probably represents a small chunk of technical debt. I’d love to learn enough to be able to pay it off.

Beginning of wp2o2b

I recently made the decision to start doing my blogging using org2blog. I’ve been using WordPress to host a couple of blogs (Radios Appear and Gurave, as well as my friend Chet’s blog Miscellaneous Heathen) for the last couple of years, but I’ve always done my writing in Emacs (using Textile for formatting), and then simply cut-and-pasted what I wrote into WordPress’s text entry box: of course a more integrated solution based entirely in Emacs sounded attractive.

And, indeed, I like the results. In addition to the original two blogs, I’m using this workflow to write my articles about exploring Emacs (Do You Even Lisp?) as well as this blog (I gave some serious thought to Brent Yorgey’s Blog Literately package, appropriately written in Haskell, but first, the Debian Haskell situation is in a lot of flux right now and I didn’t want to have to work that hard to package it and second, a little consistency of tooling is always welcome).

Being the sort of compulsive neat-freak that I am, though, I decided that it wasn’t enough to just create new articles using org2blog—what I really wanted was to export all the existing articles in my long-term blogs into my org2blog setup.

I was originally going to do this using Perl—it’s been my go-to language for the last 17 years, I can whip up what I want in it faster than in anything else, and I’ve been doing it long enough to write code that is readable after the fact. Besides, I already have a script for doing some WordPress interaction, so it would be a great starting point.

And then I decided that I was going to do it in Haskell (which I have come to realize was probably an even better choice than I initially expected—I’ll talk about that later).

So, the task is to download all the articles in my existing sites, reformat them into org-mode files with appropriate metadata for org2blog, and store them locally in a hierarchy that mirrors the one on the server.

I think it’s going to be easier than I expected.

Welcome to Functional Paradise!

I first started programming in 1980 on a Heathkit H-8 in Basic (Benton Harbor) when I was 10, and in the intervening 30-odd years I’ve programmed more-or-less competently in Basic, Modula-2, C, Turbo Pascal, Clipper, C++, Rexx, Bourne Shell, Perl, PHP and JavaScript.

I’ve also been exposed to many other languages—Forth, COBOL, Fortran, Java, C#, Ruby, Python, and so forth. By exposed I mean I can read and perhaps divine the intent of simple, straightforward code, but wouldn’t easily understand complex code or write anything of any significance.

And of course I’ve probably forgotten most of that stuff anyway—I have no memory of even the barest syntax of Modula-2, say—not that I suppose it matters much: if you look at that list there’s not a whole lot to strongly distinguish one language from another.

Sure, some are strongly typed and some are weakly typed and some are compiled and others aren’t, but in the end they’re all imperative languages (sometimes with functional features baked in), and the ones that I’m really familiar with all fall into the general ALGOL family of languages.

Which is why Haskell has been so interesting to study and attempt to use—which is what I’ve been doing for the last year or so. And I still feel like I’m taking baby steps.

Like a parent dutifully taking home videos of those baby steps, I’ve decided to do a little log of my work with Haskell, both to remind me of what I’ve accomplished and perhaps entertain others.