Getting our ducks in a row

I have plenty of ideas of how to proceed, but I like to actually implement things in a fairly methodical fashion, especially if I’m doing it in public.

I spent some of the day before yesterday getting a GitHub project set up; you can now find all the code associated with this project at https://github.com/mdorman/couch-simple. I was pleased to find that TravisCI provides access to a CouchDB instance, which means that our testing story can be very clear—unwanted dependencies on the configuration of my local CouchDB server won’t be an issue.

With that in mind, I pushed a skeleton project up the night before last. Now real development can begin.

The core of the library

If you’re interacting with CouchDB, you have to speak HTTP+JSON.

Even if you send an Accept header that says you only accept text/plain values (which, I should point out, the Couch documentation says is a valid option), it will send you back JSON…that is just marked as text/plain.

The HTTP requirement is even more inescapable—the only port open speaks HTTP, Full Stop.

With regard to the JSON requirement, I don’t think there’s any real competition: the aeson library is far and away the most popular JSON library, to the extent that one might be forgiven for not realizing that there were any others. Although I know the json package exists, I’ve never had occasion to so much as read its documentation.

What to use for HTTP is a little less clear, but not, to my eyes, much.

If we go back to my prior statement of requirements, I included:

  • choice in streaming library

While I am most familiar with the Conduit ecosystem (largely because I was using couchdb-conduit, so this might change!), others might prefer to work with Pipes, or, I suppose, io-streams.

Now that doesn’t necessarily mean anything—if I’m doing a request in some function being called as part of someone’s stream processing, I don’t necessarily need to be integrated with them—usually I just need to write a wrapper that waits for requests, makes them, then yields them.

And for the most part if I’m producing a streaming view, for instance, it’s the same thing—I just need to yield each result as I get it; I don’t need to be somehow intrinsically tied to the framework.
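The "just yield each result as I get it" idea can be sketched without committing to any streaming library at all. This is only an illustration using base: Row, RowSource, fakeView and collectRows are names invented here, and the rows are hard-coded rather than parsed from a CouchDB response.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Hypothetical stand-in for one row of a view result.
newtype Row = Row String deriving (Eq, Show)

-- | A library-agnostic producer: it hands each row to a caller-supplied
-- action as soon as the row is available, and never builds the full list.
type RowSource m = (Row -> m ()) -> m ()

-- | Pretend source. Real code would parse rows off the HTTP response body
-- incrementally; here the rows are simply hard-coded.
fakeView :: Monad m => RowSource m
fakeView yield = mapM_ (yield . Row) ["doc-1", "doc-2", "doc-3"]

-- | One possible consumer, in plain IO: collect the rows in arrival order.
collectRows :: RowSource IO -> IO [Row]
collectRows source = do
  ref <- newIORef []
  source (\row -> modifyIORef' ref (row :))
  reverse <$> readIORef ref
```

Because the producer is polymorphic in the monad, a Conduit or Pipes user could in principle hand it that library’s own yield; a plain IO caller can pass print, or use collectRows fakeView to gather everything.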

But still, the http-client package has streaming wrappers for all three of the mentioned libraries—which suggests that it’s easily compatible—it’s got baked-in support for connection pooling, and good support for incremental input (which is a lot of what we need at this low level).
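As a rough sketch of what the HTTP+JSON core might look like with http-client and aeson: the function names below (decodeBody, fetchJson, serverInfo) are made up for illustration, and the URL assumes a CouchDB listening on localhost at its default port, 5984. The body is decoded as JSON no matter what Content-Type the server claimed.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (Value, decode)
import qualified Data.ByteString.Lazy as L
import Network.HTTP.Client
  ( Manager, defaultManagerSettings, httpLbs, newManager
  , parseRequest, requestHeaders, responseBody )

-- | Decode a response body as JSON, ignoring the Content-Type it came with.
decodeBody :: L.ByteString -> Maybe Value
decodeBody = decode

-- | GET a URL through a shared Manager (which is what provides the
-- connection pooling) and decode whatever comes back.
fetchJson :: Manager -> String -> IO (Maybe Value)
fetchJson manager url = do
  request <- parseRequest url
  let request' = request { requestHeaders = [("Accept", "application/json")] }
  response <- httpLbs request' manager
  pure (decodeBody (responseBody response))

-- | Assumption: a CouchDB server on localhost:5984.
serverInfo :: IO (Maybe Value)
serverInfo = do
  manager <- newManager defaultManagerSettings
  fetchJson manager "http://localhost:5984/"
```

Note that httpLbs reads the whole body into memory; for views and the _changes feed, the incremental interface would be the one to reach for.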

So that is the basis upon which I’m going to build what I’m currently thinking I will call couchdb-simple.

Initiating a new project

I’ve been working on a personal project in my spare time for…an awfully long time now. Although a huge part of what it does is necessarily server-based, I want the UI of the project to be offline-mobile-first—that is, in addition to mobile first, it must be able to run offline seamlessly, etc.

I don’t have the time or the energy or expertise to seriously consider building my own infrastructure for doing such things, so I’ve elected to go with CouchDB+PouchDB as my data storage solution. I will admit to a tiny bit of concern about PouchDB—not so much whether it’s good, but whether it’s capable of handling the data storage needs I envision. Still, even if it’s not, it provides a starting point, and there are other options (TouchDB, or Couchbase-Lite or whatever it’s called right now) that I can consider.

On the server side—in the stuff I am definitely doing in Haskell, as opposed to the client-side where I would like to be able to use Haskell, but might compromise if necessary—I want, first and foremost, a well-maintained database library.

Unfortunately, of the five libraries to interface with CouchDB, four haven’t been updated in at least two years, and even the one that has is a major version behind on one of its primary dependencies. So none of these, IMHO, represent a well-maintained option.

I have, to this point, been using couchdb-conduit (with a bunch of patches I’ve maintained to keep it compatible with current libs), but I’ve recently run across an issue whose workaround is annoying enough—trying to handle exceptions when calling a routine from within a segment of a conduit—that I think I’m just going to write my own.

So, my first potentially-public Haskell library. It’s actually a little intimidating.

My first step, I think, is to identify what I want:

  • easy access to the CouchDB API

    It’s actually pretty important to me that this mirror the official API: it allows me to refer to it as documentation, it gives me (and others) a good guide to relative completeness, it makes tracking any changes easier, etc. It gives me a built-in structure.

  • good type guarantees for correctness

    Especially when some of these calls end up feeling like long strings of parameters, I want to make sure the compiler will tell me when I leave one out.

  • to process streaming outputs in a streaming fashion

    Most access to individual records and what-not doesn’t require actual streaming—really, just incremental processing of what is ultimately going to be one result.

    But when you want to handle a bunch of records coming out of a view, or you want to hook into the _changes feed? Streaming must be available.

  • choice in streaming library

    At this point, I am more acquainted with Conduit, no question, but I would like to make sure not to exclude users of the Pipes library from being able to provide their own streaming option on top of this.

  • implicit parameters (host, port, database) most of the time

    Most of the time, I just want to stuff all the connection stuff into a Reader instance and never have to mention it again…

  • explicit parameters when I need it

    …except sometimes, when I really need to do something odd in the middle of a bunch of other stuff.

  • well maintained

    Even if I have to do it myself.
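To make the implicit/explicit parameter wish a bit more concrete, here is a minimal sketch that uses the plain function monad from base as a stand-in for Reader. Everything in it (Connection, Couch, docUrl, withDatabase, threeUrls) is hypothetical and only builds URLs; it is not the eventual API.

```haskell
-- | Hypothetical connection settings; the field names are illustrative.
data Connection = Connection
  { connHost     :: String
  , connPort     :: Int
  , connDatabase :: String
  } deriving (Eq, Show)

-- | A computation that reads the connection implicitly. This is just the
-- function monad from base, standing in for Reader.
type Couch a = Connection -> a

-- | Build the URL of a document using the implicit connection.
docUrl :: String -> Couch String
docUrl docId conn =
  "http://" ++ connHost conn ++ ":" ++ show (connPort conn)
    ++ "/" ++ connDatabase conn ++ "/" ++ docId

-- | Run one computation against a different database (like Reader's local).
withDatabase :: String -> Couch a -> Couch a
withDatabase db action conn = action conn { connDatabase = db }

-- | Implicit most of the time, explicit for one call in the middle.
threeUrls :: Couch [String]
threeUrls = do
  a <- docUrl "alpha"
  b <- withDatabase "archive" (docUrl "beta")
  c <- docUrl "gamma"
  pure [a, b, c]
```

Applying threeUrls to a Connection yields the three URLs, with only the middle one pointing at the "archive" database.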

So, that’s what I want to achieve. I think it is all achievable, but I’m going to start small and try and build up to it. The great thing is that I already have a body of code that’s currently using couchdb-conduit that I can use as a development testbed.

Replacing Nothing values with Just values in a nested structure

This is what I expect will become the first of a series of posts where I detail some of the solutions I’ve arrived at, based on the test programs that I’ve written to work through the problem.

The situation I was running into was this:

I was writing tests for some parsing code. The output of the parse was a nested data structure. Some (well, many) parts of the data structure were optional. One in particular was giving me a hard time, though—it was a default being set from a dynamic value being passed into the parser. Obviously I had to get that same dynamic value into the test data as well, in order for them to match.

(I will certainly admit that there might be better ways to structure all this, but this is what I have for the moment.)

Anyway, at the time I run a test, I have the test case (which has both the inputs and the expected outputs), and I have the dynamic value, and I needed to insert that dynamic value into the expected outputs before comparing with the actual outputs.

Oh, and did I mention that there was a list of possible outputs, all of which needed to be massaged?

So, I created the following fake data structure that, nonetheless, mirrors the attributes of my test data structure that I care about.

Anyway, the lens library makes these sorts of traversal-with-replacement bits of code quite concise. In this example code, we use over to apply the function that is its second argument (which either passes along the contents of an existing Just value or substitutes a constant, and wraps the result in a Just) to all of the entities that are described by its first argument, that is, the containerItem contained in each entry in the collectionItems list.

There is a part of me that suspects that there’s an even easier way to do this—it seems like there should be some way for me to narrow the traversal to only pick up items with the value of Nothing (using the _Nothing prism), and then simply setting those items to the default value. But the obvious way of constructing that (adding _Nothing to the traversal, and dropping the fromMaybe in favor of a simple Just value) did not typecheck.

{-# LANGUAGE OverloadedStrings, TemplateHaskell #-}
module Main where

import Control.Lens
import Data.Maybe (fromMaybe)
import Data.Text (Text)

data Item = Item {
  _itemName :: Text
} deriving (Show)
makeLenses ''Item
makePrisms ''Item

data Container = Container {
  _containerName :: Text,
  _containerItem :: Maybe Item
} deriving (Show)
makeLenses ''Container
makePrisms ''Container

data Collection = Collection {
  _collectionName :: Text,
  _collectionItems :: ![Container]
} deriving (Show)
makeLenses ''Collection
makePrisms ''Collection

collection :: Collection
collection = Collection "Outermost" [Container "A" (Just (Item "A")),
                                     Container "B" (Nothing),
                                     Container "C" (Just (Item "C")),
                                     Container "D" (Nothing),
                                     Container "E" (Just (Item "E")),
                                     Container "F" (Just (Item "F")),
                                     Container "G" (Just (Item "G"))]

-- Replace every missing containerItem with a default Item
makeDefault :: Collection -> Collection
makeDefault = over (collectionItems.each.containerItem) (Just . fromMaybe (Item "Bar"))

main :: IO ()
main = print (makeDefault collection)
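On the _Nothing idea mentioned above: _Nothing is a Prism' (Maybe a) (), so what it focuses on is the unit value (), and a () cannot be set to Just anything, which would explain the type error. Here are two variants that should typecheck, sketched against trimmed copies of the types (String instead of Text); fillWithAlt, fillWithFiltered and sample are names invented for this example.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Control.Applicative ((<|>))
import Control.Lens
import Data.Maybe (isNothing)

newtype Item = Item { _itemName :: String } deriving (Eq, Show)

data Container = Container
  { _containerName :: String
  , _containerItem :: Maybe Item
  } deriving (Eq, Show)
makeLenses ''Container

data Collection = Collection
  { _collectionName  :: String
  , _collectionItems :: [Container]
  } deriving (Eq, Show)
makeLenses ''Collection

-- | Keep an existing Just; otherwise fall back to the default.
fillWithAlt :: Item -> Collection -> Collection
fillWithAlt d = over (collectionItems . each . containerItem) (<|> Just d)

-- | Narrow the traversal to the Nothing slots, then set only those.
fillWithFiltered :: Item -> Collection -> Collection
fillWithFiltered d =
  set (collectionItems . each . containerItem . filtered isNothing) (Just d)

-- | A two-entry sample: one filled slot, one empty slot.
sample :: Collection
sample = Collection "Outermost"
  [ Container "A" (Just (Item "A"))
  , Container "B" Nothing
  ]
```

One caveat on the second variant: the lens documentation warns that filtered is only a lawful traversal when the update does not change the outcome of the predicate, which this one deliberately does. For a single set it behaves as expected, but the (<|>) version avoids the question entirely.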

Using user authentication with CouchDB and couchdb-conduit

I’ve started getting back into working on my Sekrit Haskell Application, for which I plan on using a CouchDB back-end.

This left me with a choice between and . I went with couchdb-conduit for a couple of reasons.

A large part of it was simply that is a well-supported, fairly well-documented base for building this sort of higher-level library. The other part was that the older CouchDB library was using the library, and I wanted to use .

(I freely concede that some of this is about libraries that are mindshare-winners, rather than necessarily the best from a technical standpoint; I recognize that may have a better theoretical foundation than conduit, and I don’t have a clear idea which is more sophisticated between json and aeson.)

Anyway, I found myself having to contribute to bring it up to date with the 1.0 release of conduit. And then yesterday, I noticed that . But at this point, I believe that things are set up to work.

So, just for reference, here’s the code I wrote to actually do something. I can create a database, named for the user, with authentication open to just that user, with the passed in password:

data UserCredentials = UserCredentials {
    credentialEmail :: ByteString, -- ^The email address of the new user
    credentialPassword :: ByteString  -- ^The password for the new user
} deriving (Show)

connection :: CouchConnection
connection = def {couchLogin = "administrator", couchPass = "ThisIsn'tReallyThePassword"}

-- Database name derived from the email address, most general part first
userDb :: ByteString -> ByteString
userDb = (intercalate "/") . reverse . splitWith (`elem` "@.")

-- Document id for the user's record in the _users database
authId :: ByteString -> ByteString
authId email = concat ["org.couchdb.user:", email]

authRecord :: UserCredentials -> Value
authRecord (UserCredentials email password) = object ["name" .= email, "roles" .= ([] :: [ByteString]), "type" .= ("user" :: ByteString), "password" .= password]

createUserDB :: UserCredentials -> IO ()
createUserDB credentials =
  runCouch connection $ do
    _ <- couchPut "_users" (authId $ credentialEmail credentials) "" [] (authRecord credentials)
    couchPutDB_ (userDb $ credentialEmail credentials)
    couchSecureDB (userDb $ credentialEmail credentials) [] [] [] [(credentialEmail credentials)]
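To see what the two naming helpers produce, here is a String-based rendition of userDb and authId that runs with nothing but base. The local splitWith is a stand-in for the ByteString function of the same name (it differs only in returning [""] rather than [] for empty input), and the email address used in the note below is made up.

```haskell
import Data.List (intercalate)

-- | Split a string at every character satisfying the predicate, dropping
-- the separators (a String stand-in for ByteString's splitWith).
splitWith :: (Char -> Bool) -> String -> [String]
splitWith p s = case break p s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitWith p rest

-- | Turn an email address into a database name, most general part first.
userDb :: String -> String
userDb = intercalate "/" . reverse . splitWith (`elem` "@.")

-- | The document id CouchDB expects for a user record in _users.
authId :: String -> String
authId email = "org.couchdb.user:" ++ email
```

With these definitions, userDb "user@example.com" evaluates to "com/example/user", and authId "user@example.com" to "org.couchdb.user:user@example.com".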

It’s taken me a while to feel comfortable enough with Haskell to get to where I could write this code, but now that it’s done, I’m impressed with how straightforward it ends up being.

Simplify FAIL

So I said yesterday that I really wanted to find a better way to do our by-hand mapping of XML-RPC structs, because doing it by hand—and, specifically, repeating a bunch of information multiple times—was tedious, error-prone and ugly. Here’s a smaller struct we’re working with, for WPCustomField—smaller, but it’s still a bunch of boilerplate:

data WPCustomField = WPCustomField {
  cfId :: String,
  cfKey :: String,
  cfValue :: String
} deriving Show

instance XmlRpcType WPCustomField where
  toValue struct = toValue $ [("id", toValue (cfId struct)),
                              ("key", toValue (cfKey struct)),
                              ("value", toValue (cfValue struct))]
  fromValue v = do
    struct <- fromValue v
    a <- getField "id" struct
    b <- getField "key" struct
    c <- getField "value" struct
    return WPCustomField {
      cfId = a,
      cfKey = b,
      cfValue = c }
  getType _ = TStruct

So I started thinking about it. It seemed obvious to me that I would want to start with a list of tuples—each tuple establishing a mapping from XML-RPC attribute name to accessor function, and put in a list because I was going to need to keep their ordering in order to feed them to the data constructor in the proper order.

So I did this:

cfMapping = [("id", cfId),
             ("key", cfKey),
             ("value", cfValue)]

This seemed simple enough that I didn’t bother to write a type declaration, or let ghc-mod do it for me—in which case I might have seen the upcoming problem.

At first I thought my biggest limitation was going to be the fact that I couldn’t see a way to transform the fromValue function—while I could map over the entries in cfMapping, I didn’t see how I was going to be able to take the resulting list and give it to the WPCustomField data constructor.

Then it hit me—I could fold over the list, and partially apply each value to the Data constructor, so that when we got to the end of the list, we’d have an actual value.

Boy did I feel proud of that solution.1

Figuring that I had that problem licked, I decided to first rewrite the toValue function for the WPCustomField structure. I wrote a function that would map over our mapping list, and return the sort of list we were looking for:

mapToValue mapping struct = toValue $ map aListToValue mapping
    where aListToValue (key, accessor) = (key, toValue (accessor struct))

By making sure that the struct was the last thing handed in, I even got to write the new toValue function in a point-free style:

toValue = mapToValue cfMapping

So I compiled it and it ran. “Great!” I thought. “This is going to be easy!” And then I hit WPEnclosure:

data WPEnclosure = WPEnclosure {
  eUrl :: String,
  eLength :: Int,
  eType :: String
} deriving Show

Many of you will see what is wrong immediately. I hinted at the direction of the problem when I mentioned that I didn’t bother to write a type signature for the cfMapping list—because once you see it, and look at WPEnclosure, I think it becomes obvious what the problem is:

cfMapping :: [(String, WPCustomField -> String)]

That’s right: the eLength field being an Int among String fields means that we’ve got heterogeneous tuple types for the WPEnclosure type. FAIL.

So, for the moment I’m going to put this cleanup on hold, and just proceed with the hand-rolled instances.
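For the toValue direction, one possible way around the heterogeneous tuples is to push the conversion into each tuple, so that every entry has the type (String, struct -> Value) regardless of the field’s own type. The sketch below uses a tiny stand-in Value type and ToValue class instead of the real XML-RPC library, and the attribute names for WPEnclosure ("url", "length", "type") are assumptions.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- | Hypothetical stand-in for the XML-RPC value type.
data Value = VString String | VInt Int | VStruct [(String, Value)]
  deriving (Eq, Show)

-- | Hypothetical stand-in for the conversion half of XmlRpcType.
class ToValue a where
  toValue :: a -> Value

instance ToValue String where
  toValue = VString

instance ToValue Int where
  toValue = VInt

data WPEnclosure = WPEnclosure
  { eUrl    :: String
  , eLength :: Int
  , eType   :: String
  } deriving (Eq, Show)

-- | Composing toValue onto each accessor makes every tuple the same type.
eMapping :: [(String, WPEnclosure -> Value)]
eMapping =
  [ ("url",    toValue . eUrl)
  , ("length", toValue . eLength)
  , ("type",   toValue . eType)
  ]

-- | Build a struct value by running every converter against the record.
mapToValue :: [(String, a -> Value)] -> a -> Value
mapToValue mapping struct =
  VStruct [ (key, convert struct) | (key, convert) <- mapping ]
```

This only helps with serialising; going the other way still needs something that can feed differently-typed fields to the data constructor.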

Footnotes:

1 In retrospect, I see that this wouldn’t work, because the types of each of those partially applied functions would not be the same, so the accumulator couldn’t typecheck. Oh, well, pride goeth before the fall and all that.
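The partial-application idea from the footnote does have a well-typed cousin: applicative style, where each step is allowed to have a different type because the steps are chained with <*> rather than folded over a list. A minimal sketch, with a made-up getField that works on a plain association list of Strings and reports failure through Either instead of the library’s own error monad:

```haskell
data WPCustomField = WPCustomField
  { cfId    :: String
  , cfKey   :: String
  , cfValue :: String
  } deriving (Eq, Show)

-- | Hypothetical stand-in for getField: look a key up or report it missing.
getField :: String -> [(String, String)] -> Either String String
getField key struct =
  maybe (Left ("missing field: " ++ key)) Right (lookup key struct)

-- | Feed the fields to the constructor one at a time. After each <*> the
-- partially applied constructor has a new type, which is fine here.
fromStruct :: [(String, String)] -> Either String WPCustomField
fromStruct struct =
  WPCustomField
    <$> getField "id" struct
    <*> getField "key" struct
    <*> getField "value" struct
```

The field names still appear once per direction, so this trims the do-block and the record syntax rather than removing the repetition altogether.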

Choosing the API and defining some ADTs

As I said in the previous article, for wp2o2b, the plan is:

So, the task is to download all the articles in my existing sites, reformat them into org-mode files with appropriate metadata for org2blog, and store them locally in a hierarchy that mirrors the one on the server.

WordPress implements an XML-RPC interface for accessing your blog programmatically. It supports older legacy styles of access (Blogger, MovableType, and metaWeblog), but recommends that for new development you work with their new API which, incidentally, has the nice benefit of being well-documented.

(Much of this new API was introduced in version 3.4, released in June, 2012—so it’s only six months old at this point. Normally I would hesitate to depend on something that new, if I cared about wide applicability, but WordPress is one of those things where I think you should be keeping up with releases, if only for security reasons, so I don’t perceive it as too much of a limitation.)

The first thing we have to do is implement a data structure for holding a post. Where in a dynamic language you might just get back a big wodge of XML and pick at it as necessary, in Haskell you need to define a data type to hold your results.

So we’ll start there.

Working from the definition of a post in the API documentation, what we end up with is something like:

data WPPost = WPPost {
  pPostId :: String,
  pPostTitle :: String,
  pPostDate :: CalendarTime,
  pPostDateGmt :: CalendarTime,
  pPostModified :: CalendarTime,
  pPostModifiedGmt :: CalendarTime,
  pPostStatus :: String,
  pPostType :: String,
  pPostFormat :: String,
  pPostName :: String,
  pPostAuthor :: String,
  pPostPassword :: String,
  pPostExcerpt :: String,
  pPostContent :: String,
  pPostParent :: String,
  pPostMimeType :: String,
  pLink :: String,
  pGuid :: String,
  pMenuOrder :: Int,
  pCommentStatus :: String,
  pPingStatus :: String,
  pSticky :: Bool,
  pPostThumbnail :: [WPMediaItem],
  pTerms :: [WPTerm],
  pCustomFields :: [WPCustomField]
} deriving Show

This refers to a few other structs that we’ve defined—the process is pretty straightforward, so I’m not going to go over it.

Then we need to take this type and give Haskell a way to convert back and forth from it to XML-RPC. The HaXR page on the Haskell wiki links to some example code, and the Network.XmlRpc docs include a little explanation on how to do this.

If your XML-RPC structure has names that are sufficiently unique to map to record names without conflicts, and you’re comfortable with Template Haskell, you could just do:

$(asXmlRpcStruct ''WPPost)

However, if you have an aversion to Template Haskell, or (perhaps more likely) you have field names that are generic enough to present significant conflicts (id, or type, or some such), you will have to do it by hand by defining an XmlRpcType instance for your type. That ends up looking like:

instance XmlRpcType WPPost where
  toValue struct = toValue $ [("post_id", toValue (pPostId struct)),
                              ("post_title", toValue (pPostTitle struct)),
                              ("post_date", toValue (pPostDate struct)),
                              ("post_date_gmt", toValue (pPostDateGmt struct)),
                              ("post_modified", toValue (pPostModified struct)),
                              ("post_modified_gmt", toValue (pPostModifiedGmt struct)),
                              ("post_status", toValue (pPostStatus struct)),
                              ("post_type", toValue (pPostType struct)),
                              ("post_format", toValue (pPostFormat struct)),
                              ("post_name", toValue (pPostName struct)),
                              ("post_author", toValue (pPostAuthor struct)),
                              ("post_password", toValue (pPostPassword struct)),
                              ("post_excerpt", toValue (pPostExcerpt struct)),
                              ("post_content", toValue (pPostContent struct)),
                              ("post_parent", toValue (pPostParent struct)),
                              ("post_mime_type", toValue (pPostMimeType struct)),
                              ("link", toValue (pLink struct)),
                              ("guid", toValue (pGuid struct)),
                              ("menu_order", toValue (pMenuOrder struct)),
                              ("comment_status", toValue (pCommentStatus struct)),
                              ("ping_status", toValue (pPingStatus struct)),
                              ("sticky", toValue (pSticky struct)),
                              ("post_thumbnail", toValue (pPostThumbnail struct)),
                              ("terms", toValue (pTerms struct)),
                              ("custom_fields", toValue (pCustomFields struct))]
  fromValue v = do
    struct <- fromValue v
    a <- getField "post_id" struct
    b <- getField "post_title" struct
    c <- getField "post_date" struct
    d <- getField "post_date_gmt" struct
    e <- getField "post_modified" struct
    f <- getField "post_modified_gmt" struct
    g <- getField "post_status" struct
    h <- getField "post_type" struct
    i <- getField "post_format" struct
    j <- getField "post_name" struct
    k <- getField "post_author" struct
    l <- getField "post_password" struct
    m <- getField "post_excerpt" struct
    n <- getField "post_content" struct
    o <- getField "post_parent" struct
    p <- getField "post_mime_type" struct
    q <- getField "link" struct
    r <- getField "guid" struct
    s <- getField "menu_order" struct
    t <- getField "comment_status" struct
    u <- getField "ping_status" struct
    v <- getField "sticky" struct
    w <- getField "post_thumbnail" struct
    x <- getField "terms" struct
    y <- getField "custom_fields" struct
    return WPPost {
      pPostId = a,
      pPostTitle = b,
      pPostDate = c,
      pPostDateGmt = d,
      pPostModified = e,
      pPostModifiedGmt = f,
      pPostStatus = g,
      pPostType = h,
      pPostFormat = i,
      pPostName = j,
      pPostAuthor = k,
      pPostPassword = l,
      pPostExcerpt = m,
      pPostContent = n,
      pPostParent = o,
      pPostMimeType = p,
      pLink = q,
      pGuid = r,
      pMenuOrder = s,
      pCommentStatus = t,
      pPingStatus = u,
      pSticky = v,
      pPostThumbnail = w,
      pTerms = x,
      pCustomFields = y }
  getType _ = TStruct

Yeah, so that’s the obvious way to do it. And boy is it tedious; I need to figure out a better way to make this happen, because that’s a lot of pointless boilerplate.

It seems to me that I should somehow be able to define a small data structure and then pull the necessary bits out just once, rather than having to repeat everything at least twice. I guess that’s the purpose that the Template Haskell code serves, but I need more power.

Oh, well, it’s done for the moment.


I want to emphasize here that at this point, I’m just trying to get things done. I am intrigued by the theoretical underpinnings of Haskell (although my understanding of most of them is…shallow at the very least), but I’m also a working programmer—I need to be able to be productive. I want the benefits that I think Haskell has to provide—static typing to keep me from making as many dumb mistakes, good performance—but I have to be able to produce actual code for those things to be worth anything.

At the same time, I recognize that what I’ve just done probably represents a small chunk of technical debt. I’d love to learn enough to be able to pay it off.

Beginning of wp2o2b

I recently made the decision to start doing my blogging using org2blog. I’ve been using WordPress to host a couple of blogs (Radios Appear and Gurave, as well as my friend Chet’s blog Miscellaneous Heathen) for the last couple of years, but I’ve always done my writing in Emacs (using Textile for formatting), and then simply cut-and-pasted what I wrote into WordPress’s text entry box: of course a more integrated solution based entirely in Emacs sounded attractive.

And, indeed, I like the results. In addition to the original two blogs, I’m using this workflow to write my articles about exploring Emacs (Do You Even Lisp?) as well as this blog (I gave some serious thought to Brent Yorgey’s Blog Literately package, appropriately written in Haskell, but first, the Debian Haskell situation is in a lot of flux right now and I didn’t want to have to work that hard to package it and second, a little consistency of tooling is always welcome).

Being the sort of compulsive neat-freak that I am, though, I decided that it wasn’t enough to just create new articles using org2blog—what I really wanted was to export all the existing articles in my long-term blogs into my org2blog setup.

I was originally going to do this using Perl—it’s been my go-to language for the last 17 years, I can whip up what I want in it faster than in anything else, and I’ve been doing it long enough to write code that is readable after the fact. Besides, I already have a script for doing some WordPress interaction, so it would be a great starting point.

And then I decided that I was going to do it in Haskell (which I have come to realize was probably an even better choice than I initially expected—I’ll talk about that later).

So, the task is to download all the articles in my existing sites, reformat them into org-mode files with appropriate metadata for org2blog, and store them locally in a hierarchy that mirrors the one on the server.

I think it’s going to be easier than I expected.