Hakyll, Contexts and Metadata

Created: 24th July 2018

I recently switched my website to the Hakyll static site generator. Hakyll is written in Haskell, and consequently you have to write your site configuration in Haskell too. I’m going to discuss an aspect of Hakyll I found pretty difficult at first: the Context monoid. Note, I’m using Hakyll 4. A Context lets you pass data to a template, and it is one of the arguments of the loadAndApplyTemplate function.

Hakyll gives you a defaultContext already. It does things like put the content in the body variable and give you access to YAML metadata. There is a caveat with the YAML metadata though. It doesn’t work with lists or nested data.

Because Context is a monoid, it can be combined using <>. a <> b will create a new Context where a is queried first for a variable, and if it’s not found then b is queried. It’s left-biased.
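As a minimal sketch of that left bias, assuming a Hakyll 4 release where Context has a Semigroup instance (the name postCtx and the title text are made up for illustration), a constField placed on the left shadows any title that defaultContext would otherwise supply:

```haskell
import Hakyll

-- Hypothetical context: "title" is answered by constField, which sits on
-- the left, so defaultContext is only consulted for every other variable.
-- (On older GHC versions you may need: import Data.Monoid ((<>)).)
postCtx :: Context String
postCtx = constField "title" "Overridden title" <> defaultContext
```

Swapping the two operands would let metadata from the page win instead, with the constant acting as a fallback.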

Let’s look a bit more at what a Context is:

newtype Context a = Context
    { unContext :: String -> [String] -> Item a -> Compiler ContextField
    }

In other words, you can think of a Context as a function that takes three parameters and gives you a Compiler ContextField. The first parameter is the name of the variable, the second is a list of arguments and the third is the Item that is being compiled. We’ll get back to what an Item and a Compiler ContextField are in a bit.

When you refer to a variable $a$ in a template, it will pass the string "a" as the first parameter. However, you can also pass arguments, as if you’re calling a function e.g. $a("b", "c")$, in which case the second parameter will be the list ["b", "c"].

Let’s get back to the Item data type and the Compiler monad. These are at the heart of Hakyll. Throughout Hakyll, data is passed around as values of the Item type. A file starts off as an Item String, which contains the raw file contents as a String. The item also contains the item identifier, but it doesn’t contain the item metadata. To access the metadata, you need to be inside the Compiler monad. Why? Well, the getMetadata function takes an item identifier and returns the metadata, but it returns m Metadata, where m is an instance of MonadMetadata. Guess what’s an instance of MonadMetadata? Compiler.

The Compiler monad is also central. Functions such as pandocCompiler and loadAndApplyTemplate produce a Compiler (Item String). We want something that produces data that can be used by the template, so we want a Compiler ContextField.
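To show where these pieces meet, here is a rough sketch of a typical rule. The pattern "posts/*" and the template path are assumptions for illustration, not something a site has to use:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Hakyll

-- Hypothetical rule: compile every post with pandoc, then hand the
-- resulting Item String to a template together with a Context.
postRules :: Rules ()
postRules = match "posts/*" $ do
    route $ setExtension "html"
    compile $ pandocCompiler
        >>= loadAndApplyTemplate "templates/post.html" defaultContext
        >>= relativizeUrls
```

Every step after compile runs in the Compiler monad, which is why the Context passed to loadAndApplyTemplate is able to look things up in metadata.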

To give a concrete example, here’s a Context that will simply return b for the variable a:

simpleCtx :: Context a
simpleCtx = Context f
    where f "a" _ _  = return $ StringField "b"
          f _ _ _ = empty

empty (from Control.Applicative) just means “I have nothing to offer for this variable name”. Let’s do a more interesting example. Let’s say you want to embed YouTube videos easily. YouTube videos have an ID, visible at the end of their URL after /watch?v=. Suppose the ID is dQw4w9WgXcQ. Wouldn’t it be cool to type $youtube("dQw4w9WgXcQ")$ and have the video embedded? Here’s an example of how that could be achieved:

youtubeCtx :: Context a
youtubeCtx = Context f
    where f "youtube" [id] _  = return $ StringField (makeHTML id)
          f _ _ _ = empty
          makeHTML id = "<iframe width=\"560\" height=\"315\"" 
                            ++ " src=\"https://youtube.com/embed/"
                            ++ id ++ "\"></iframe>"

But actually you don’t even have to do that, since Hakyll provides a convenience function, functionField. Still, it’s good to know what’s going on behind the scenes:

youtubeContext :: Context a
youtubeContext = functionField "youtube" f
    where f [id] _ = return $ "<iframe width=\"560\" height=\"315\"" 
                            ++ " src=\"https://youtube.com/embed/"
                            ++ id ++ "\"></iframe>"
          f _ _ = empty

Note that with functionField, f no longer returns a Compiler ContextField but a Compiler String (which is put into a StringField). Let’s do something slightly more advanced. Let’s take advantage of the fact that Compiler is an instance of MonadMetadata and actually use the metadata for something. Let’s have a function len that returns the length of a string in metadata:

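-- Assumes aeson < 2.0, where Object is a HashMap; newer versions of aeson
-- use Data.Aeson.KeyMap instead.
import qualified Data.HashMap.Strict as Map
import Data.Aeson (Value (..))
import Data.Text (unpack, pack)

lenContext :: Context a
lenContext = functionField "len" f
    where f [id] item
            = do
                m <- getMetadata (itemIdentifier item)
                str <- ensureExists (Map.lookup (pack id) m)
                        >>= decodeString
                return $ show (length str)
          -- Any other number of arguments is reported as a compiler error.
          f _ _ = fail "len expects exactly one argument"
          ensureExists (Just v) = return v
          ensureExists _ = fail "Key doesn't exist"
          decodeString (String t) = return (unpack t)
          decodeString _ = fail "Expected string"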
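If you only need string values, a shorter variant is possible. The sketch below assumes a Hakyll 4 version that exports getMetadataField' (which, as far as I know, fails in the Compiler monad when the key is missing); the name lenCtx is made up so it doesn’t clash with the definition above:

```haskell
import Hakyll

-- Same idea as the "len" function above, but letting Hakyll perform the
-- metadata lookup and the conversion to String.
lenCtx :: Context a
lenCtx = functionField "len" f
  where
    f [key] item =
        show . length <$> getMetadataField' (itemIdentifier item) key
    f args _ =
        fail ("len: expected one argument, got " ++ show (length args))
```

The manual version is still worth understanding, because the built-in helpers stop being enough once nested objects come into play.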

Note that in Hakyll Metadata is an alias for Object which is provided in the aeson package (and re-exported in the yaml package):

data Value = Object !Object
           | Array !Array
           | String !Text
           | Number !Scientific
           | Bool !Bool
           | Null
             deriving (Eq, Read, Show, Typeable, Data, Generic)

type Object = HashMap Text Value

Another thing to note is the use of fail. This is better than something like error, as it integrates with Hakyll’s logging system and won’t crash your entire program. You can use fail because Compiler defines it to raise a compiler error (through its Monad instance, or MonadFail on newer GHC versions). Fun fact: Compiler is also an instance of MonadError, which comes from the mtl package. Additionally, Value and Object use Text instead of String, which is why I needed to use unpack and pack to convert between them.

One of the pieces used to form defaultContext is metadataField. The issue with it, as I mentioned above, is that it doesn’t take nested properties into account. Let’s create our own version that does. Note I’m going to use the split package, which provides splitOn, so I can go from a.b.c to ["a", "b", "c"]. This context only works when the YAML value being looked up is a string (i.e. not an array or object).

import Data.List.Split (splitOn)

metadataCtx :: Context a
metadataCtx = Context f
    where f t _ item 
            = do
                m <- getMetadata (itemIdentifier item)
                StringField <$> (f' (splitOn "." t) m)
          f' [x] m = ensureExists (Map.lookup (pack x) m)
                            >>= decodeString
          f' (x:xs) m = ensureExists (Map.lookup (pack x) m)
                            >>= decodeObject 
                            >>= f' xs
          ensureExists (Just v) = return v
          ensureExists _ = fail "Key doesn't exist"
          decodeString (String t) = return (unpack t)
          decodeString _ = fail "Expected string"
          decodeObject (Object obj) = return obj
          decodeObject _ = fail "Expected object"
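The traversal itself is easier to see without Hakyll and aeson in the way. The following is a toy model only: Val stands in for aeson’s Value, Either String for the Compiler monad, and splitDots for splitOn ".". All of these names are invented for the illustration.

```haskell
import qualified Data.Map as Map

-- A toy stand-in for aeson's Value, just enough to show the traversal.
data Val = Str String | Obj (Map.Map String Val)

-- Split "a.b.c" into ["a", "b", "c"] without the split package.
splitDots :: String -> [String]
splitDots s = case break (== '.') s of
    (x, [])       -> [x]
    (x, _ : rest) -> x : splitDots rest

-- Walk the path one key at a time; Left carries the reason for failure.
lookupPath :: [String] -> Map.Map String Val -> Either String String
lookupPath [] _ = Left "Empty path"
lookupPath [k] m = case Map.lookup k m of
    Just (Str s) -> Right s
    Just _       -> Left "Expected string"
    Nothing      -> Left "Key doesn't exist"
lookupPath (k : ks) m = case Map.lookup k m of
    Just (Obj o) -> lookupPath ks o
    Just _       -> Left "Expected object"
    Nothing      -> Left "Key doesn't exist"

-- Sample metadata: a top-level title and a nested author.name.
exampleMeta :: Map.Map String Val
exampleMeta = Map.fromList
    [ ("title", Str "Hello")
    , ("author", Obj (Map.fromList [("name", Str "Ada")]))
    ]
```

With this model, lookupPath (splitDots "author.name") exampleMeta walks into the nested object, while asking for "author" alone is rejected because the value found is an object rather than a string.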

So far we have only covered StringField, but ContextField has another constructor called ListField. This can be used with for-loops in templates, like so:

<ul>
$for(list)$
<li>$value$</li>
$endfor$
</ul>

Let’s look at how a ListField is defined:

data ContextField
    = StringField String
    | forall a. ListField (Context a) [Item a]

One thing you might not be familiar with is the forall a. This is different from doing data ContextField a. If you did data ContextField a, then a ContextField String would be a different type from a ContextField [String], but here regardless of the type of the value contained, it’s a single type.
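A small standalone sketch may help here. It has nothing to do with Hakyll’s actual definitions: Field, TextField and ItemsField are invented names, and the Show constraint is only there so the example has something to do with the hidden type.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- The element type a is hidden inside ItemsField, so Field takes no
-- type parameter.
data Field
    = TextField String
    | forall a. Show a => ItemsField [a]

-- Pattern matching brings the hidden type back into scope, but all we
-- know about it is what the constructor promised (here: Show).
describe :: Field -> String
describe (TextField s)   = s
describe (ItemsField xs) = unwords (map show xs)

-- Lists of Int and lists of Bool live side by side in one list of Field.
fields :: [Field]
fields = [TextField "hi", ItemsField [1, 2, 3 :: Int], ItemsField [True]]
```

Hakyll’s ListField follows the same pattern, except that what it packs next to the hidden type is a Context a that knows how to read each Item a.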

So a ListField is formed from a list of items that are iterated over, and a Context a. This context provides the variables inside the loop. To understand this, consider that the most common use of a ListField is when creating something like an archives page, which lists other pages so people can access them. Therefore, the list of items is the list of pages, which can be obtained using loadAll. The Context provided can then load the relevant data from each page’s metadata, to provide the variables in the body of the loop. To make this easy, there is a convenience function listField:

listField :: String -> Context a -> Compiler [Item a] -> Context b

To use it you might do something like this:

posts <- loadAll "posts/*"
let ctx = listField "posts" defaultContext (return posts) `mappend` defaultContext
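In context, those two lines live inside a compile block. The sketch below shows one possible archive page, where the output file name, the template path and the "Archive" title are all assumptions made for the example:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Hakyll

-- Hypothetical archive page that lists every post, newest first.
archiveRules :: Rules ()
archiveRules = create ["archive.html"] $ do
    route idRoute
    compile $ do
        posts <- recentFirst =<< loadAll "posts/*"
        let ctx = listField "posts" defaultContext (return posts)
                  <> constField "title" "Archive"
                  <> defaultContext
        -- The page has no source file, so start from an empty body.
        makeItem ""
            >>= loadAndApplyTemplate "templates/archive.html" ctx
            >>= relativizeUrls
```

Inside $for(posts)$ the template sees the variables of the inner defaultContext, so $title$ and $url$ refer to each post rather than to the archive page.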

But suppose you want to do something more complex. For instance suppose you have a list of scripts that you want to insert into every page. Suppose you want to be able to control them from your Haskell configuration. You could store them in a list:

scripts = ["/scripts/react.js", "/scripts/redux.js"]

You could then use them as a list field like this:

let ctx = listField "scripts" scriptCtx (mapM makeItem scripts)

makeItem just creates an item whose body is what’s passed to it. Note that these items don’t have any metadata attached to them. We now need to define a scriptCtx that will take an Item String whose itemBody is the script, and provide it under some key.

scriptCtx :: Context String
scriptCtx = Context f
    where f "src" [] item = return $ StringField (itemBody item)
          f _ _ _ = empty

Now we can do:

$for(scripts)$
<script src="$src$"></script>
$endfor$
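For this particular case the hand-written context isn’t strictly needed, because Hakyll’s bodyField exposes an item’s body under a name of your choosing. A minimal sketch, where scriptsCtx is a made-up name:

```haskell
import Hakyll

scripts :: [String]
scripts = ["/scripts/react.js", "/scripts/redux.js"]

-- bodyField "src" plays the role of the hand-written scriptCtx: inside
-- the loop, $src$ expands to the body of the current item.
scriptsCtx :: Context a
scriptsCtx = listField "scripts" (bodyField "src") (mapM makeItem scripts)
```

Writing the Context by hand remains useful when the loop body needs more than one variable per item.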

That’s all well and good, but what if the list wasn’t stored in our Haskell file, but in metadata? For instance:

scripts: ["/scripts/react.js", "/scripts/redux.js"]

In this case we would have to write our own custom context:

import qualified Data.Vector as Vector

context :: Context a
context = Context f
    where f "scripts" [] item =
           do
             m <- getMetadata (itemIdentifier item)
             scripts <- ensureExists (Map.lookup (pack "scripts") m)
                        >>= decodeArray
                        >>= mapM decodeString
                        >>= mapM makeItem
             return $ ListField scriptCtx scripts
          -- Any other variable is left to the contexts combined with this one.
          f _ _ _ = empty

          ensureExists (Just v) = return v
          ensureExists Nothing = fail "Key doesn't exist"
          decodeArray (Array v) = return $ Vector.toList v
          decodeArray _ = fail "Expected list"
          decodeString (String s) = return (unpack s)
          decodeString _ = fail "Expected string"

This seems like an awful lot of effort to go through to deal with a list of strings in metadata. However, it’s possible to generalise it so it works for all lists of strings in your metadata. For my website, I have a context that I use instead of metadataField. It has three advantages:

  • Dealing with a.b.c as discussed above
  • Dealing with lists of objects
  • Dealing with lists of strings
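To give a flavour of the last point, here is one way the “lists of strings” case could be sketched using helpers Hakyll already exports (listFieldWith and lookupStringList). This is an illustration under those assumptions rather than the context my site uses, and stringListField is a made-up name:

```haskell
import Hakyll

-- Hypothetical helper: expose the metadata list stored under `key` as a
-- list field of the same name, with each element available as `var`.
stringListField :: String -> String -> Context a
stringListField key var = listFieldWith key (bodyField var) $ \item -> do
    meta <- getMetadata (itemIdentifier item)
    case lookupStringList key meta of
        Just xs -> mapM makeItem xs
        Nothing -> fail ("No list of strings under " ++ show key)

-- Covers the scripts example: $for(scripts)$ ... $src$ ... $endfor$
scriptsFromMetadata :: Context a
scriptsFromMetadata = stringListField "scripts" "src"
```

Because a failing field simply lets the next context in a <> chain answer, a page without the list falls through to whatever comes after this context.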