Joshua Potter
Tagless Final Parsing
25 Dec 2021
In his introductory text, Oleg Kiselyov discusses the tagless final strategy for implementing DSLs. The approach permits leveraging the strong typing provided by some host language in a performant way. This post combines key thoughts from a selection of papers and code written on the topic. We conclude with an implementation of an interpreter for a toy language that runs quickly, remains strongly-typed, and can be extended without modification.
Introduction
To get started, let’s write what our toy language will look like.
digit = ? any number between 0-9 ? ;
(* We ignore any leading 0s *)
integer = digit, { digit } ;
e_integer = [ e_integer, ("+" | "-") ]
, ( "(", e_integer, ")" | integer )
;
boolean = "true" | "false" ;
e_boolean = [ e_boolean, ("&&" | "||") ]
, ( "(", e_boolean, ")" | boolean )
;
expr = e_integer | e_boolean ;
The above expresses addition, subtraction, conjunction, and disjunction, to be interpreted in the standard way. All operations are left-associative, with parentheses (()) overriding the default precedence. Our goal is to use megaparsec to interpret programs in this language, raising errors on malformed or invalidly-typed inputs. We will evaluate interpreter performance on inputs such as those generated by
echo {1..10000000} | sed 's/ / + /g' > input.txt
To support inputs like the above, we ideally want to mix lazy and strict functionality. For example, we should lazily load in a file but strictly evaluate its contents. Certain languages struggle without this more nuanced evaluation strategy, e.g.
$ (echo -n 'print(' && echo -n {1..10000000} && echo ')') |
sed 's/ / + /g' > input.py
$ python3 input.py
Segmentation fault: 11
Note it’d actually be more efficient on our end to not use megaparsec at all! The library loads the entirety of the to-be-parsed text into memory, but using it will hopefully be more representative of projects out in the wild.
Initial Encoding
How might we instinctively choose to tackle an interpreter of our language? As one might expect, megaparsec already has all the tooling needed to parse expressions. We can represent our expressions and results straightforwardly like so:
data Expr
= EInt Integer
| EBool Bool
| EAdd Expr Expr
| ESub Expr Expr
| EAnd Expr Expr
| EOr Expr Expr
deriving (Show)
data Result = RInt Integer | RBool Bool deriving (Eq)
We use a standard ADT to describe the structure of our program,
nesting the same data type recursively to represent precedence. For
instance, we would expect a (so-far fictional) function
parse
to give us
>>> parse "1 + 2 + 3"
EAdd (EAdd (EInt 1) (EInt 2)) (EInt 3)
We can then evaluate expressions within our Expr type. Notice we must wrap our result within some Error-like monad, since the ADT does not necessarily correspond to a well-typed expression in our language.
asInt :: Result -> Either Text Integer
asInt (RInt e) = pure e
asInt _ = Left "Could not cast integer."

asBool :: Result -> Either Text Bool
asBool (RBool e) = pure e
asBool _ = Left "Could not cast boolean."
toResult :: Expr -> Either Text Result
toResult (EInt e) = pure $ RInt e
toResult (EBool e) = pure $ RBool e
toResult (EAdd lhs rhs) = do
  lhs' <- toResult lhs >>= asInt
  rhs' <- toResult rhs >>= asInt
  pure $ RInt (lhs' + rhs')
toResult (ESub lhs rhs) = do
  lhs' <- toResult lhs >>= asInt
  rhs' <- toResult rhs >>= asInt
  pure $ RInt (lhs' - rhs')
toResult (EAnd lhs rhs) = do
  lhs' <- toResult lhs >>= asBool
  rhs' <- toResult rhs >>= asBool
  pure $ RBool (lhs' && rhs')
toResult (EOr lhs rhs) = do
  lhs' <- toResult lhs >>= asBool
  rhs' <- toResult rhs >>= asBool
  pure $ RBool (lhs' || rhs')
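As a quick sanity check of toResult (assuming we also derive Show for Result, which we omitted above), a well-typed and an ill-typed expression behave as follows:

>>> toResult (EAdd (EInt 1) (EInt 2))
Right (RInt 3)
>>> toResult (EAnd (EInt 1) (EInt 2))
Left "Could not cast boolean."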
With the above in place, we can begin fleshing out our first parsing attempt. A naive solution may look as follows:
parseNaive :: Parser Result
parseNaive = expr >>= either (fail . unpack) pure . toResult
 where
  expr = E.makeExprParser term
    [ [binary "+" EAdd, binary "-" ESub]
    , [binary "&&" EAnd, binary "||" EOr]
    ]

  binary name f = E.InfixL (f <$ symbol name)

  term = parens expr <|> EInt <$> integer <|> EBool <$> boolean
There are certainly components of the above implementation that raise eyebrows (at least in the context of large inputs), but they could potentially be overlooked depending on how well the parser runs. Running parseNaive against input.txt shows little promise though:
$ cabal run --enable-profiling exe:initial-encoding -- input.txt +RTS -s
50000005000000
...
4126 MiB total memory in use (0 MB lost due to fragmentation)
...
Total time 25.541s ( 27.121s elapsed)
...
Productivity 82.3% of total user, 78.9% of total elapsed
A few immediate takeaways from this exercise. This solution:

- has to run through the generated AST twice: first to build the AST and second to type-check and evaluate the code (i.e. toResult).
- is resource intensive. It consumes a large amount of memory (approximately 4 GiB in the above run) and is far too slow. A stream of 10,000,000 integers should be quick and cheap to sum together.
- is unsafe. Packaged as a library, there are no typing guarantees we leverage from our host language. For example, the expression EAnd (EInt 1) (EInt 2) is valid.
- suffers from the expression problem. Developers cannot extend our language with new operations without having to modify the source.
Let’s tackle each of these problems in turn.
Single Pass
Our naive attempt separated parsing from evaluation because they
fail in different ways. The former raises a
ParseErrorBundle
on invalid input while the latter raises
a Text
on mismatched types. Ideally we would have a
single error type shared by both of these failure modes.
Unfortunately, this is not so simple - the Operator
types
found within makeExprParser
have incompatible function
signatures. In particular, InfixL
is defined as
InfixL :: m (a -> a -> a) -> Operator m a
instead of
InfixL :: (a -> a -> m a) -> Operator m a
To work around these limitations, we define our binary functions to operate on and return values of type Either Text Expr. We then raise a Left on type mismatch, deferring error reporting at the Parser level until the current expression is fully parsed:
parseSingle :: Parser Result
parseSingle = expr >>= either (fail . unpack) pure
 where
  expr = E.makeExprParser term
    [ [binary "+" asInt EInt EAdd, binary "-" asInt EInt ESub]
    , [binary "&&" asBool EBool EAnd, binary "||" asBool EBool EOr]
    ]

  binary name cast f bin = E.InfixL do
    void $ symbol name
    pure $ \lhs rhs -> do
      lhs' <- lhs >>= cast
      rhs' <- rhs >>= cast
      toResult $ bin (f lhs') (f rhs')

  term = parens expr <|>
         Right . RInt <$> integer <|>
         Right . RBool <$> boolean
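As a rough sketch of the behaviour we are after (assuming Parser is the usual Parsec Void Text alias, that our lexers consume trailing whitespace, and that Result derives Show), type mismatches now surface as ordinary parse failures:

>>> M.parseMaybe parseSingle "1 + 2 + 3"
Just (RInt 6)
>>> M.parseMaybe parseSingle "1 && 2"
Nothing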
Resource Usage
Though our above strategy avoids two explicit passes through the AST, laziness means we still build up this intertwined call stack without actually evaluating anything until we interpret the returned Result. The generated runtime statistics are actually worse as a result:
cabal run --enable-profiling exe:initial-encoding -- \
-m single +RTS -s input.txt
We’ll use a heap profile to make the implementation’s shortcomings more obvious:
cabal run --enable-profiling exe:initial-encoding -- \
-m single +RTS -hr input.txt
To iterate faster, reduce the number of integers in
input.txt
. The resulting heap profile should remain
proportional to the original. Just make sure the program runs for long
enough to actually perform meaningful profiling. This section works on
inputs of 100,000 integers.
The generated heap profile, broken down by retainer set, looks as follows:
Our top-level expression parser continues growing in size until the program nears completion, presumably when the expression is finally evaluated. To further pinpoint the leaky methods, we re-run the heap generation by cost-centre stack to get:
Even our straightforward lexing functions are holding on to memory. The laziness and left-associative nature of our grammar hint that we may be dealing with a thunk stack reminiscent of foldl. This implies we could fix the issue by evaluating our data strictly, e.g. with deepseq. The problem is unfortunately deeper than this though. Consider the definition of InfixL’s parser:
pInfixL :: MonadPlus m => m (a -> a -> a) -> m a -> a -> m a
pInfixL op p x = do
  f <- op
  y <- p
  let r = f x y
  pInfixL op p r <|> return r
Because this implementation uses the backtracking alternative
operator (<|>
), we must also hold onto each
<|> return r
in memory “just in case” we need to
use it. Since we are working with a left-factored
grammar, we can drop the alternatives and apply strictness with a
custom expression parser:
parseStrict :: Parser Result
parseStrict = term >>= expr
 where
  expr t = do
    op <- M.option Nothing $ Just <$> ops
    case op of
      Just OpAdd -> nest t asInt EInt EAdd
      Just OpSub -> nest t asInt EInt ESub
      Just OpAnd -> nest t asBool EBool EAnd
      Just OpOr  -> nest t asBool EBool EOr
      _          -> pure t

  nest :: forall a
        . Result
       -> (Result -> Either Text a)
       -> (a -> Expr)
       -> (Expr -> Expr -> Expr)
       -> Parser Result
  nest t cast f bin = do
    t' <- term
    a <- either (fail . unpack) pure do
      lhs <- cast t
      rhs <- cast t'
      toResult $ bin (f lhs) (f rhs)
    a `deepseq` expr a

  term = do
    p <- M.option Nothing $ Just <$> symbol "("
    if isJust p then (term >>= expr) <* symbol ")" else
      RInt <$> integer <|> RBool <$> boolean
This implementation runs marginally faster and uses a constant amount of memory:
ARR_WORDS
corresponds to the ByteString
constructor and is unavoidable so long as we use megaparsec.
To get a better sense of where the runtime tends to reside, let’s re-run our newly strict implementation on our 10,000,000-integer file alongside a time profile:
echo {1..10000000} | sed 's/ / + /g' > input.txt
cabal run --enable-profiling exe:initial-encoding -- \
-m strict +RTS -p input.txt
COST CENTRE     MODULE                      SRC                                                          %time  %alloc

space           Text.Megaparsec.Lexer       Text/Megaparsec/Lexer.hs:(68,1)-(71,44)                       54.9    57.3
decimal         Text.Megaparsec.Char.Lexer  Text/Megaparsec/Char/Lexer.hs:363:1-32                        24.7    21.4
lexeme          Text.Megaparsec.Lexer       Text/Megaparsec/Lexer.hs:87:1-23                              12.6    15.7
ops             Parser.Utils                src/Parser/Utils.hs:(69,1)-(74,3)                              2.6     2.7
toResult        Parser.Initial              src/Parser/Initial.hs:(62,1)-(79,29)                           2.3     2.2
run             Main                        app/Main.hs:(42,1)-(54,58)                                     1.6     0.6
readTextDevice  Data.Text.Internal.IO       libraries/text/src/Data/Text/Internal/IO.hs:133:39-64          1.3     0.0
Surprisingly, the vast majority of time (roughly 95%) is spent parsing. As such, we won’t worry ourselves about runtime any further.
Type Safety
Let’s next explore how we can empower library users with a stricter
version of the Expr
monotype. In particular, we want to
prohibit construction of invalid expressions. A common strategy is to
promote our ADT to a GADT:
data GExpr a where
GInt :: Integer -> GExpr Integer
GBool :: Bool -> GExpr Bool
GAdd :: GExpr Integer -> GExpr Integer -> GExpr Integer
GSub :: GExpr Integer -> GExpr Integer -> GExpr Integer
GAnd :: GExpr Bool -> GExpr Bool -> GExpr Bool
GOr :: GExpr Bool -> GExpr Bool -> GExpr Bool
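To see the added safety in action, well-typed terms construct as before, while the ill-typed expression from earlier is now rejected at compile time (error message abbreviated):

wellTyped :: GExpr Integer
wellTyped = GAdd (GInt 1) (GInt 2)

-- Rejected by GHC:
-- illTyped = GAnd (GInt 1) (GInt 2)
--   • Couldn't match type ‘Integer’ with ‘Bool’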
By virtue of working with arbitrary, potentially mis-typed input,
we must eventually perform some form of type-checking on our
input. To get the GADT representation working requires additional
machinery like existential datatypes and Rank2Types
. The
end user of our library reaps the benefits though, acquiring a
strongly-typed representation of our AST:
data Wrapper = forall a. Show a => Wrapper (GExpr a)

parseGadt :: Parser Wrapper
parseGadt = term >>= expr
 where
  ...

  nest :: forall b
        . Show b
       => Wrapper
       -> (forall a. GExpr a -> Either Text (GExpr b))
       -> (b -> GExpr b)
       -> (GExpr b -> GExpr b -> GExpr b)
       -> Parser Wrapper
  nest (Wrapper t) cast f bin = do
    Wrapper t' <- term
    case (cast t, cast t') of
      (Right lhs, Right rhs) -> do
        let z = eval $ bin lhs rhs
        z `seq` expr (Wrapper $ f z)
      (Left e, _) -> fail $ unpack e
      (_, Left e) -> fail $ unpack e
We hide the full details here (refer to the linked Github
repository for the full implementation), but note the changes are
minimal outside of the signatures/data types required to make our
existentially quantified type work. Users of the parser can now unwrap
the Wrapper
type and resume like normal.
This is an arguable "improvement" though, considering convenience takes a dramatic hit: it is awkward working with the Wrapper type.
Expression Problem
Lastly comes the expression problem, one that is fundamentally unsolvable given our current implementation. By nature of (G)ADTs, all data types are closed; they cannot be extended without breaking the static guarantees we get around e.g. pattern matching. To fix this (and other problems still present in our implementation), we contrast our initial encoding with that of tagless final.
Tagless Final
Let’s re-think our GADT example above and refactor it into a typeclass:
class Symantics repr where
eInt :: Integer -> repr Integer
eBool :: Bool -> repr Bool
eAdd :: repr Integer -> repr Integer -> repr Integer
eSub :: repr Integer -> repr Integer -> repr Integer
eAnd :: repr Bool -> repr Bool -> repr Bool
eOr :: repr Bool -> repr Bool -> repr Bool
This should look familiar. Occurrences of GExpr have been substituted by a parameter repr of kind * -> *. Kiselyov describes typeclasses used this way as a means of defining a class of interpreters. An example interpreter could look like evaluation from before:
newtype Eval a = Eval {runEval :: a}

instance Symantics Eval where
  eInt = Eval
  eBool = Eval
  eAdd (Eval lhs) (Eval rhs) = Eval (lhs + rhs)
  eSub (Eval lhs) (Eval rhs) = Eval (lhs - rhs)
  eAnd (Eval lhs) (Eval rhs) = Eval (lhs && rhs)
  eOr (Eval lhs) (Eval rhs) = Eval (lhs || rhs)
To belabor the similarities between the two a bit further, compare the following two examples side-by-side:
let expr = foldl' eAdd (eInt 0) $ take count (eInt <$> [1..])
let expr = foldl' EAdd (EInt 0) $ take count (EInt <$> [1..])
Outside of some capitalization, the encodings are exactly the same. Considering these similarities, it’s clear type safety is not a concern like it was with Expr. This new paradigm also allows us to write both an implementation that parallels the memory usage of the above, as well as a proper solution to the expression problem. Following along, though, first requires a quick detour into Leibniz equalities.
Leibniz Equality
Leibniz equality states that two objects are equivalent provided they can be substituted in all contexts without issue. Consider the definition provided by the eq package:
newtype a := b = Refl {subst :: forall c. c a -> c b}
Here we are saying two types a and b actually refer to the same type if it turns out that, simultaneously,

• Maybe a ~ Maybe b
• Either String a ~ Either String b
• Identity a ~ Identity b
• ...

and so on, where ~ refers to equality
constraints. Let’s clarify further with an example. Suppose we had
types A
and B
and wanted to ensure they were
actually the same type. Then there must exist a function
subst
with signature c A -> c B
for
all c
. What might that function look like? It turns
out there is only one acceptable choice: id
.
This might not seem particularly valuable at first glance, but what it does permit is a means of proving equality at the type-level and in a way Haskell’s type system respects. For instance, the following is valid:
>>> import Data.Eq.Type ((:=)(..))
>>> :set -XTypeOperators
-- Type-level equality is reflexive. I can prove it since
-- `id` is a suitable candidate for `subst`.
>>> refl = Refl id
>>> :t refl
refl :: b := b
-- Haskell can verify our types truly are the same.
>>> a = refl :: (Integer := Integer)
>>> a = refl :: (Integer := Bool)
<interactive>:7:5: error:
    • Couldn't match type ‘Integer’ with ‘Bool’
We can also prove less obvious things at the type-level and take advantage of this later on. As an exercise, consider how you might express the following:
functionEquality :: a1 := a2
-> b1 := b2
-> (a1 -> b1) := (a2 -> b2)
That is, if a1
and a2
are the same types,
and b1
and b2
are the same types, then
functions of type a1 -> b1
can equivalently be
expressed as functions from a2 -> b2
.
Answer
I suggest reading from the bottom up to better understand why this works.
import Data.Eq.Type ((:=)(..), refl)

newtype F1 t b a = F1 {runF1 :: t := (a -> b)}
newtype F2 t a b = F2 {runF2 :: t := (a -> b)}

functionEquality :: forall a1 a2 b1 b2
                  . a1 := a2
                 -> b1 := b2
                 -> (a1 -> b1) := (a2 -> b2)
functionEquality
  (Refl s1)   -- s1 :: forall c. c a1 -> c a2
  (Refl s2)   -- s2 :: forall c. c b1 -> c b2
  = runF2     -- (a1 -> b1) := (a2 -> b2)
  . s2        -- F2 (a1 -> b1) a2 b2
  . F2        -- F2 (a1 -> b1) a2 b1
  . runF1     -- (a1 -> b1) := (a2 -> b1)
  . s1        -- F1 (a1 -> b1) b1 a2
  . F1        -- F1 (a1 -> b1) b1 a1
  $ refl      -- (a1 -> b1) := (a1 -> b1)
Dynamics
Within our GADTs example, we introduced data type
Wrapper
to allow us to pass around GADTs of internally
different type (e.g. GExpr Integer
vs.
GExpr Bool
) within the same context. We can do something
similar via Dynamics
in the tagless final world. Though
we could use the already available dynamics
library, we’ll build our own for exploration’s sake.
First, let’s create a representation of the types in our grammar:
class Typeable repr where
pInt :: repr Integer
pBool :: repr Bool
newtype TQ t = TQ {runTQ :: forall repr. Typeable repr => repr t}
instance Typeable TQ where
  pInt = TQ pInt
  pBool = TQ pBool
TQ
takes advantage of polymorphic constructors to
allow us to wrap any compatible Typeable
member function
and “reinterpret” it as something else. For example, we can create new
Typeable
instances like so:
newtype AsInt a = AsInt (Maybe (a := Integer))
instance Typeable AsInt where
  pInt = AsInt (Just refl)
  pBool = AsInt Nothing

newtype AsBool a = AsBool (Maybe (a := Bool))

instance Typeable AsBool where
  pInt = AsBool Nothing
  pBool = AsBool (Just refl)
We can then use TQ
to check if something is of the
appropriate type:
>>> import Data.Maybe (isJust)
>>> tq = pInt :: TQ Integer
>>> case runTQ tq of AsInt a -> isJust a
True
>>> case runTQ tq of AsBool a -> isJust a
False
Even more interestingly, we can bundle this type representation
alongside a value of the corresponding type, yielding our desired
Dynamic
:
import qualified Data.Eq.Type as EQ
data Dynamic repr = forall t. Dynamic (TQ t) (repr t)
class IsDynamic a where
type' :: forall repr. Typeable repr => repr a
lift' :: forall repr. Symantics repr => a -> repr a
cast' :: forall repr t. TQ t -> Maybe (t := a)
instance IsDynamic Integer where
  type' = pInt
  lift' = eInt
  cast' (TQ t) = case t of AsInt a -> a

instance IsDynamic Bool where
  type' = pBool
  lift' = eBool
  cast' (TQ t) = case t of AsBool a -> a

toDyn :: forall repr a. IsDynamic a => Symantics repr => a -> Dynamic repr
toDyn = Dynamic type' . lift'

fromDyn :: forall repr a. IsDynamic a => Dynamic repr -> Maybe (repr a)
fromDyn (Dynamic t e) = case t of
  (cast' -> r) -> do
    r' <- r
    pure $ EQ.coerce (EQ.lift r') e
>>> a = toDyn 5 :: Dynamic Expr
>>> runExpr <$> fromDyn a :: Maybe Integer
Just 5
By maintaining a Leibniz equality within our Typeable instances (e.g. AsInt), we can internally coerce the wrapped value into the actual type we care about. With this background information in place, we can finally devise an expression parser similar to the one we wrote using the initial encoding:
parseStrict :: forall repr
             . NFData (Dynamic repr)
            => Symantics repr
            => Parser (Dynamic repr)
parseStrict = term >>= expr
 where
  expr :: Dynamic repr -> Parser (Dynamic repr)
  expr t = do
    op <- M.option Nothing $ Just <$> ops
    case op of
      Just OpAdd -> nest t eAdd OpAdd
      Just OpSub -> nest t eSub OpSub
      Just OpAnd -> nest t eAnd OpAnd
      Just OpOr  -> nest t eOr OpOr
      _          -> pure t

  nest :: forall a
        . IsDynamic a
       => Dynamic repr
       -> (repr a -> repr a -> repr a)
       -> Op
       -> Parser (Dynamic repr)
  nest t bin op = do
    t' <- term
    case binDyn bin t t' of
      Nothing -> fail $ "Invalid operands for `" <> show op <> "`"
      Just a  -> a `deepseq` expr a

  term :: Parser (Dynamic repr)
  term = do
    p <- M.option Nothing $ Just <$> symbol "("
    if isJust p then (term >>= expr) <* symbol ")" else
      toDyn <$> integer <|> toDyn <$> boolean
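Note binDyn is defined in the full repository rather than reproduced here; a plausible sketch of it, given the IsDynamic machinery above, casts both operands to the expected operand type and re-wraps the result:

binDyn :: forall repr a
        . IsDynamic a
       => (repr a -> repr a -> repr a)
       -> Dynamic repr
       -> Dynamic repr
       -> Maybe (Dynamic repr)
-- Cast both dynamics to `repr a`; a type mismatch yields Nothing.
binDyn bin lhs rhs = do
  lhs' <- fromDyn lhs
  rhs' <- fromDyn rhs
  pure . Dynamic type' $ bin lhs' rhs'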
Interpretations
So far we’ve seen very little benefit from switching to this strategy, despite the level of complexity the change introduces. Here we’ll pose a question that hopefully makes at least one benefit more obvious. Suppose we wanted to interpret the parsed expression in two different ways. First, we want a basic evaluator, and second we want a pretty-printer. In our initial encoding strategy, the evaluator has already been defined:
eval :: GExpr a -> a
We say eval
is one possible interpreter over
GExpr
. It takes in GExpr
s and reduces them
into literal values. How would a pretty printer work? One candidate
interpreter could look as follows:
pPrint :: GExpr a -> Text
pPrint (GInt e) = pack $ show e
pPrint (GBool e) = pack $ show e
pPrint (GAdd lhs rhs) = "(" <> pPrint lhs <> " + " <> pPrint rhs <> ")"
...
Unfortunately, using this definition requires fundamentally changing how our GADT parser works. parseGadt currently makes certain optimizations based on the fact that only eval has been needed so far, reducing the expression as we traverse the input stream. Generalizing the parser to take in any function of signature forall b. (forall a. GExpr a) -> b (i.e. the signature of some generic interpreter) would force us to retain memory or accept additional arguments to make our function generic enough to accommodate.
On the other hand, our tagless final approach expects multiple
interpretations from the outset. We can define a newtype
like
newtype PPrint a = PPrint {runPPrint :: Text}

instance Symantics PPrint where
  eInt = PPrint . pack . show
  eBool = PPrint . pack . show
  eAdd (PPrint lhs) (PPrint rhs) = PPrint $ "(" <> lhs <> " + " <> rhs <> ")"
  ...
and interpret our
parseStrict :: forall repr. Symantics repr => Parser (Dynamic repr)
Parser
as a Dynamic PPrint
instead of a
Dynamic Eval
without losing any previously acquired
gains.
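For instance, with only the instances shown so far, the same polymorphic expression can be rendered or evaluated at will:

>>> runPPrint (eAdd (eInt 1) (eAdd (eInt 2) (eInt 3)))
"(1 + (2 + 3))"
>>> runEval (eAdd (eInt 1) (eAdd (eInt 2) (eInt 3)))
6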
Expression Revisited
There does exist a major caveat with our tagless final interpreters. If we hold only a single Dynamic value, how exactly are we able to interpret it in multiple ways? After all, a Dynamic cannot simultaneously be of type Dynamic Eval and Dynamic PPrint. What we’d like to be able to do is maintain a generic Dynamic repr and reinterpret it at will.
One solution comes in the form of another newtype
around a polymorphic constructor:
newtype SQ a = SQ {runSQ :: forall repr. Symantics repr => repr a}
instance Symantics SQ where
  eInt e = SQ (eInt e)
  eBool e = SQ (eBool e)
  eAdd (SQ lhs) (SQ rhs) = SQ (eAdd lhs rhs)
  eSub (SQ lhs) (SQ rhs) = SQ (eSub lhs rhs)
  eAnd (SQ lhs) (SQ rhs) = SQ (eAnd lhs rhs)
  eOr (SQ lhs) (SQ rhs) = SQ (eOr lhs rhs)
We can then run evaluation and pretty-printing on the same entity:
data Result = RInt Integer | RBool Bool
runBoth :: Dynamic SQ -> Maybe (Result, Text)
runBoth d = case fromDyn d of
  Just (SQ q) -> pure
    ( case q of Eval a -> RInt a
    , case q of PPrint a -> a
    )
  Nothing -> case fromDyn d of
    Just (SQ q) -> pure
      ( case q of Eval a -> RBool a
      , case q of PPrint a -> a
      )
    Nothing -> Nothing
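As a small sketch (assuming Show instances for the types above), running runBoth on a hand-built Dynamic, which in practice would come from our parser, yields both interpretations at once:

>>> runBoth (toDyn (5 :: Integer))
Just (RInt 5,"5")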
This has an unintended side effect though. By using SQ, we effectively close our type universe. Suppose we now wanted to extend our Symantics type with a new multiplication operator (*). We could do so by writing a new typeclass:
class MulSymantics repr where
eMul :: repr Integer -> repr Integer -> repr Integer
instance MulSymantics Eval where
  eMul (Eval lhs) (Eval rhs) = Eval (lhs * rhs)

instance MulSymantics PPrint where
  eMul (PPrint lhs) (PPrint rhs) = PPrint $ "(" <> lhs <> " * " <> rhs <> ")"
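Crucially, the new operator composes with the operators we already had without touching any of their definitions:

>>> runEval (eMul (eInt 2) (eAdd (eInt 3) (eInt 4)))
14
>>> runPPrint (eMul (eInt 2) (eInt 3))
"(2 * 3)"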
Naturally we also need to extend our parser to be aware of the new operator as well. To avoid diving yet further into the weeds, we do not do that here.
But this typeclass is excluded from our SQ
type. We’re
forced to write yet another SQ
-like wrapper, e.g.
newtype MSQ a = MSQ {runMSQ :: forall repr. MulSymantics repr => repr a}
just to keep up. This in turn forces us to redefine all functions
that operated on SQ
.
Copy Symantics
We can reformulate this more openly, abandoning any sort of Rank2 constructors within our newtypes by choosing to track multiple representations simultaneously:
data SCopy repr1 repr2 a = SCopy (repr1 a) (repr2 a)
instance (Symantics repr1, Symantics repr2)
=> Symantics (SCopy repr1 repr2) where
  eInt e = SCopy (eInt e) (eInt e)
  eBool e = SCopy (eBool e) (eBool e)
  eAdd (SCopy a1 a2) (SCopy b1 b2) = SCopy (eAdd a1 b1) (eAdd a2 b2)
  eSub (SCopy a1 a2) (SCopy b1 b2) = SCopy (eSub a1 b1) (eSub a2 b2)
  eAnd (SCopy a1 a2) (SCopy b1 b2) = SCopy (eAnd a1 b1) (eAnd a2 b2)
  eOr (SCopy a1 a2) (SCopy b1 b2) = SCopy (eOr a1 b1) (eOr a2 b2)

instance (MulSymantics repr1, MulSymantics repr2)
    => MulSymantics (SCopy repr1 repr2) where
  eMul (SCopy a1 a2) (SCopy b1 b2) = SCopy (eMul a1 b1) (eMul a2 b2)
As we define new classes of operators on our Integer
and Bool
types, we make SCopy
an instance of
them. We can then “thread” the second representation throughout our
function calls like so:
runEval' :: forall repr
          . Dynamic (SCopy Eval repr)
         -> Maybe (Result, Dynamic repr)
runEval' d = case fromDyn d :: Maybe (SCopy Eval repr Integer) of
  Just (SCopy (Eval a) r) -> pure (RInt a, Dynamic pInt r)
  Nothing -> case fromDyn d :: Maybe (SCopy Eval repr Bool) of
    Just (SCopy (Eval a) r) -> pure (RBool a, Dynamic pBool r)
    Nothing -> Nothing

runPPrint' :: forall repr
            . Dynamic (SCopy PPrint repr)
           -> Maybe (Text, Dynamic repr)
runPPrint' d = case fromDyn d :: Maybe (SCopy PPrint repr Text) of
  Just (SCopy (PPrint a) r) -> pure (a, Dynamic pText r)
  Nothing -> Nothing

runBoth' :: forall repr
          . Dynamic (SCopy Eval (SCopy PPrint repr))
         -> Maybe (Result, Text, Dynamic repr)
runBoth' d = do
  (r, d') <- runEval' d
  (p, d'') <- runPPrint' d'
  pure (r, p, d'')
Notice each function places a Dynamic repr
of unknown
representation in the last position of each return tuple. The caller
is then able to interpret this extra repr
as they wish,
composing them in arbitrary ways (e.g. runBoth'
).
Limitations
The expression problem is only partially solved with our Dynamic strategy. If for instance we wanted to add a new literal type, e.g. a String, we would unfortunately need to append to the Typeable and Dynamic definitions to support it. The standard dynamics package only allows monomorphic values, so in this sense we are stuck. If we only need to add additional functionality over the existing set of types though, we can extend at will.
Conclusion
I was initially hoping to extend this post further with a discussion around explicit sharing as noted here, but this post is already getting too long. I covered only a portion of the topics Oleg Kiselyov wrote about, but covered at least the majority of topics I’ve so far been exploring in my own personal projects. I will note that the tagless final approach, while certainly useful, also does add a fair level of cognitive overhead in my experience. Remembering the details around dynamics especially is what prompted me to write this post to begin with.