Replacing Nothing values with Just values in a nested structure

This is what I expect will become the first of a series of posts where I detail some of the solutions I’ve arrived at, based on the test programs that I’ve written to work through the problem.

The situation I was running into was this:

I was writing tests for some parsing code. The output of the parse was a nested data structure. Some (well, many) parts of the data structure were optional. One in particular was giving me a hard time, though—it was a default being set from a dynamic value being passed into the parser. Obviously I had to get that same dynamic value into the test data as well, in order for them to match.

(I will certainly admit that there might be better ways to structure all this, but this is what I have for the moment.)

Anyway, at the time I run a test, I have the test case (which has both the inputs and the expected outputs), and I have the dynamic value, and I needed to insert that dynamic value into the expected outputs before comparing with the actual outputs.

Oh, and did I mention that there was a list of possible outputs, all of which needed to be massaged?

So, I created the following fake data structure that, nonetheless, mirrors the attributes of my test data structure that I care about.

Anyway, the lens library makes these sorts of traversal-with-replacement bits of code quite concise. In this example code, we use over to apply the function that is its second argument (which will either simply pass along an existing Just value, or return a constant value, and wrap either in another Just) to all of the entities that are described by its second argument—that is the containerItem contained in each entry in the collectionItems list.

There is a part of me that suspects that there’s an even easier way to do this—it seems like there should be some way for me to narrow the traversal to only pick up items with the value of Nothing (using the _Nothing prism), and then simply setting those items to the default value. But the obvious way of constructing that (adding _Nothing to the traversal, and dropping the fromMaybe in favor of a simple Just value) did not typecheck.

{-# LANGUAGE OverloadedStrings, TemplateHaskell #-}module Main whereimport Control.Lensimport Data.Maybeimport Data.Textdata Item = Item {
  _itemName :: Text} deriving (Show)
makeLenses ''Item
makePrisms ''Item

data Container = Container {
  _containerName :: Text,
  _containerItem :: Maybe Item} deriving (Show)
makeLenses ''Container
makePrisms ''Container

data Collection = Collection {
  _collectionName :: Text,
  _collectionItems :: ![Container]
} deriving (Show)
makeLenses ''Collection
makePrisms ''Collection

collection :: Collectioncollection = Collection "Outermost" [Container "A" (Just (Item "A")),
                                     Container "B" (Nothing),
                                     Container "C" (Just (Item "C")),
                                     Container "D" (Nothing),
                                     Container "E" (Just (Item "E")),
                                     Container "F" (Just (Item "F")),
                                     Container "G" (Just (Item "G"))]

makeDefault :: Collection -> CollectionmakeDefault =  over (collectionItems.each.containerItem) (Just . fromMaybe (Item "Bar"))

Simplify FAIL

So I said yesterday that I really wanted to find a better way to do our by-hand mapping of XML-RPC structs, because doing it by hand—and, specifically, repeating a bunch of information multiple times—was tedious, error-prone and ugly. Here’s a smaller struct we’re working with, for WPCustomField—smaller, but it’s still a bunch of boilerplate:

data WPCustomField = WPCustomField {
  cfId :: String,
  cfKey :: String,
  cfValue :: String
} deriving Show

instance XmlRpcType WPCustomField where
  toValue struct = toValue $ [("id", toValue (cfId struct)),
                              ("key", toValue (cfKey struct)),
                              ("value", toValue (cfValue struct))]
  fromValue v = do
    struct <- fromValue v
    a <- getField "id" struct
    b <- getField "key" struct
    c <- getField "value" struct
    return WPCustomField {
      cfId = a,
      cfKey = b,
      cfValue = c }
  getType _ = TStruct

So I started thinking about it. It seemed obvious to me that I would want to start with a list of tuples—each tuple establishing a mapping from XML-RPC attribute name to accessor function, and put in a list because I was going to need to keep their ordering in order to feed them to the data constructor in the proper order.

So I did this:

cfMapping = [("id", cfId),
             ("key", cfKey),
             ("value", cfValue)];

This seemed simple enough that I didn’t bother to write a type declaration, or let ghc-mod do it for me—in which case I might have seen the upcoming problem.

At first I thought my biggest limitation was going to be the fact that I couldn’t see a way to transform the fromValue function—while I could map over the entries in cfMapping, I didn’t see how I was going to be able to take the resulting list and give it to the WPCustomField data constructor.

Then it hit me—I could fold over the list, and partially apply each value to the Data constructor, so that when we got to the end of the list, we’d have an actual value.

Boy did I feel proud of that solution.1

Figuring that I had that problem licked, I decided to first rewrite the toValue function for the WPCustomField structure. I wrote a function that would map over our mapping list, and return the sort of list we were looking for:

mapToValue mapping struct = toValue $ map aListToValue mapping
    where aListToValue (key, accessor) = (key, toValue (accessor struct))

By making sure that the struct was the last thing handed in, I even got to write the new toValue function in a point-free style:

toValue = mapToValue cfMapping

So I compiled it and it ran. “Great!” I thought. “This is going to be easy!” And then I hit WPEnclosure:

data WPEnclosure = WPEnclosure {
  eUrl :: String,
  eLength :: Int,
  eType :: String
} deriving Show

Many of you will see what is wrong immediately. I hinted at the direction of the problem when I mentioned that I didn’t bother to write a type signature for the cfMapping list—because once you see it, and look at WPEnclosure, I think it becomes obvious what the problem is:

cfMapping :: [(String, WPCustomField -> String)]

That’s right—the eLength field being an Int among String fields means that we’ve got heterogenous tuple types for the WPEnclosure type. FAIL.

So, for the moment I’m going to put this cleanup on hold, and just proceed with the hand-rolled instances.

Footnotes:

1 In retrospect, I see that this wouldn’t work, because the types of each of those partially applied functions would not be the same, so the accumulator couldn’t typecheck. Oh, well, pride goeth before the fall and all that.

Choosing the API and defining some ADTs

As I said in the previous article, for wp2o2b, the plan is:

So, the task is to download all the articles in my existing sites, reformat them into org-mode files with appropiate metadata for org2blog, store them locally in a hierarchy that mirrors the one on the server.

WordPress implements an XML-RPC interface for accessing your blog programatically. It supports older legacy styles of access (Blogger, MovableType, and metaWeblog), but recommends that for new development you work with their new API which, incidentally, has the nice benefit of being well-documented.

(Much of this new API was introduced in version 3.4, released in June, 2012—so it’s only six months old at this point. Normally I would hesitate to depend on something that new, if I cared about wide applicability, but WordPress is one of those things where I think you should be keeping up with releases, if only for security reasons, so I don’t perceive it as too much of a limitation.)

The first thing we have to do is implement a data structure for holding a post. Where in a dynamic language, you’d might just get back a big wodge of XML and pick at it as necessary in Haskell, you need to define a data type to hold your results.

So we’ll start there.

Working from the definition of a post in the API documentation what we end up with is something like:

data WPPost = WPPost {
  pPostId :: String,
  pPostTitle :: String,
  pPostDate :: CalendarTime,
  pPostDateGmt :: CalendarTime,
  pPostModified :: CalendarTime,
  pPostModifiedGmt :: CalendarTime,
  pPostStatus :: String,
  pPostType :: String,
  pPostFormat :: String,
  pPostName :: String,
  pPostAuthor :: String,
  pPostPassword :: String,
  pPostExcerpt :: String,
  pPostContent :: String,
  pPostParent :: String,
  pPostMimeType :: String,
  pLink :: String,
  pGuid :: String,
  pMenuOrder :: Int,
  pCommentStatus :: String,
  pPingStatus :: String,
  pSticky :: Bool,
  pPostThumbnail :: [WPMediaItem],
  pTerms :: [WPTerm],
  pCustomFields :: [WPCustomField]
} deriving Show

This refers to a few other structs that we’ve defined—the process is pretty straightforward, so I’m not going to go over it.

Then we need to take this type and give Haskell a way to convert back and forth from it to XML-RPC. The HaXR page on the Haskell wiki link to some example code, and the Network.XmlRpc docs include a little explanation on how to do this.

If your XML-RPC structure has names that are sufficiently unique to map to record names without conflicts, and you’re comfortable with Template Haskell, you could just do:

$(asXmlRpcStruct ''WPPost)

However, if you have an aversion to Template Haskell, or (perhaps more likely) you have field names that are generic enough to present significant conflicts (id, or type or some such) you will have to do it by hand by defining an XmlRpcType instance for your constructor. That ends up looking like:

instance XmlRpcType WPPost where
  toValue struct = toValue $ [("post_id", toValue (pPostId struct)),
                              ("post_title", toValue (pPostTitle struct)),
                              ("post_date", toValue (pPostDate struct)),
                              ("post_date_gmt", toValue (pPostDateGmt struct)),
                              ("post_modified", toValue (pPostModified struct)),
                              ("post_modified_gmt", toValue (pPostModifiedGmt struct)),
                              ("post_status", toValue (pPostStatus struct)),
                              ("post_type", toValue (pPostType struct)),
                              ("post_format", toValue (pPostFormat struct)),
                              ("post_name", toValue (pPostName struct)),
                              ("post_author", toValue (pPostAuthor struct)),
                              ("post_password", toValue (pPostPassword struct)),
                              ("post_excerpt", toValue (pPostExcerpt struct)),
                              ("post_content", toValue (pPostContent struct)),
                              ("post_parent", toValue (pPostParent struct)),
                              ("post_mime_type", toValue (pPostMimeType struct)),
                              ("link", toValue (pLink struct)),
                              ("guid", toValue (pGuid struct)),
                              ("menu_order", toValue (pMenuOrder struct)),
                              ("comment_status", toValue (pCommentStatus struct)),
                              ("ping_status", toValue (pPingStatus struct)),
                              ("sticky", toValue (pSticky struct)),
                              ("post_thumbnail", toValue (pPostThumbnail struct)),
                              ("terms", toValue (pTerms struct)),
                              ("custom_fields", toValue (pCustomFields struct))]
  fromValue v = do
    struct <- fromValue v
    a <- getField "post_id" struct
    b <- getField "post_title" struct
    c <- getField "post_date" struct
    d <- getField "post_date_gmt" struct
    e <- getField "post_modified" struct
    f <- getField "post_modified_gmt" struct
    g <- getField "post_status" struct
    h <- getField "post_type" struct
    i <- getField "post_format" struct
    j <- getField "post_name" struct
    k <- getField "post_author" struct
    l <- getField "post_password" struct
    m <- getField "post_excerpt" struct
    n <- getField "post_content" struct
    o <- getField "post_parent" struct
    p <- getField "post_mime_type" struct
    q <- getField "link" struct
    r <- getField "guid" struct
    s <- getField "menu_order" struct
    t <- getField "comment_status" struct
    u <- getField "ping_status" struct
    v <- getField "sticky" struct
    w <- getField "post_thumbnail" struct
    x <- getField "terms" struct
    y <- getField "custom_fields" struct
    return WPPost {
      pPostId = a,
      pPostTitle = b,
      pPostDate = c,
      pPostDateGmt = d,
      pPostModified = e,
      pPostModifiedGmt = f,
      pPostStatus = g,
      pPostType = h,
      pPostFormat = i,
      pPostName = j,
      pPostAuthor = k,
      pPostPassword = l,
      pPostExcerpt = m,
      pPostContent = n,
      pPostParent = o,
      pPostMimeType = p,
      pLink = q,
      pGuid = r,
      pMenuOrder = s,
      pCommentStatus = t,
      pPingStatus = u,
      pSticky = v,
      pPostThumbnail = w,
      pTerms = x,
      pCustomFields = y }
  getType _ = TStruct

Yeah, so that’s the obvious way to do it. And boy is it tedious—I need to figure out a better way to make this happen. because that’s a lot of pointless boilerplate.

It seems to me that I should somehow be able to define a small data structure and then pull the necessary bits out just once, rather than having to repeat everything at least twice. I guess that’s the purpose that the Template Haskell code serves, but I need more power.

Oh, well, it’s done for the moment.


I want to emphasize here that at this point, I’m just trying to get things done. I am intrigued by the theoretical underpinnings of Haskell (although my understanding of most of them is…shallow at the very least), but I’m also a working programmer—I need to be able to be productive. I want the benefits that I think Haskell has to provide—static typing to keep me from making as many dumb mistakes, good performance—but I have to be able to produce actual code for those things to be worth anything.

At the same time, I recognize that what I’ve just done probably represents a small chunk of technical debt. I’d love to learn enough to be able to pay it off.