Anti tutorial on Origami - nedervold/origami GitHub Wiki

Here is an anti-tutorial on Origami. Since I'm the author of Origami, it's not a completely uninformed approach, but as the first user, it's the best you'll find so far.

Origami has to do with data transformation; my task for this anti-tutorial will be to write a filter for Java source files that converts classes to interfaces. We'll be using the language-java package from Hackage.

Set-up

Setting up the Cabal project

The first step is to set up a Cabal project. We add language-java and origami to the Build-Depends: section of the Cabal file. I prefer to run in a sandbox, so I run cabal sandbox init. I then run cabal install --dependencies-only to download and install the dependencies.

I create two files, Main.hs and JavaFold.hs. Main.hs contains only

module Main where

import JavaFold()

main :: IO ()
main = return ()

JavaFold.hs contains

{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TemplateHaskell #-}
module JavaFold where

import Data.Origami
import Language.Java.Syntax

$(buildFolds [] [] [])

I run cabal build and am told that no constructors are used and that the resulting fold would be empty. True enough.

Discovering the fold family

Looking at the Haddocks for Language.Java.Syntax, I see that the top-level construct (and what the parser reads) is CompilationUnit, so it will be our root datatype. I change the line to $(buildFolds [''CompilationUnit] [] []). Now let's see what happens.

I run cabal build. And...it fails. I'm told that Data.Maybe.Maybe is used in functor position but wasn't declared. I change the splice to $(buildFolds [''CompilationUnit] [''[], ''Maybe] []). Let's try again.

Now I'm told that Traversable is not visible at the splice. I import Data.Traversable and try building one more time.

Now I'm told that Char# is not visible. Why would we be using it or any type in GHC.Prim? There's a stack trace, and I see that we got to it through Ident, which is a wrapper around String. (And I find my first bug.) So the discovery process is being too zealous at digging down into my code in search of datatypes. I don't need to see inside Strings, so I'll declare them as atomic. The splice now reads $(buildFolds [''CompilationUnit] [''[], ''Maybe] [''String]).

I try to build and learn that 2-tuples are used in Annotation, so we add ''(,) to the functor section and try to build again. The discovery process hits bottom again, this time at Int# because Integers are used in Java Literals. Looking at Literal, I see a number of datatypes that I don't need to look inside, so I add them all to the atomic list.

After another attempt at building, I'm warned about Int# again because Int is used in Exp's constructor ArrayCreate. I declare ''Int as atomic too and the splice now looks like

$(buildFolds [''CompilationUnit]
             [''[], ''(,), ''Maybe]
             [''Bool, ''Char, ''Double, ''Int, ''Integer, ''String])

I try to build one last time and finally it succeeds!

Simplifying the fold family

I now have a viable fold family for Java syntax trees and have a ton of code in JavaFold.hs generated. I run cabal haddock and look at the resulting documentation.

The Fold declaration begins:

data Fold annotation arrayIndex arrayInit assignOp block blockStmt catch classBody classDecl classType compilationUnit constructorBody decl elementValue enumBody enumConstant exp explConstrInv fieldAccess forInit formalParam ident importDecl interfaceBody interfaceDecl lhs literal memberDecl methodBody methodInvocation modifier name op packageDecl primType refType stmt switchBlock switchLabel type typeArgument typeDecl typeParam varDecl varDeclId varInit wildcardBound
    = Fold {mkAbstract :: modifier,
            mkActualType :: (refType -> typeArgument),
            mkAdd :: op,
            mkAddA :: assignOp,
            mkAnd :: op,
            mkAndA :: assignOp,
            ⁞

It seems odd that there are so many constructors with no arguments. I look at the definitions of Op and AssignOp and see that they are basically enumerations with no internal data, as is PrimType. I'd rather consider them as atomic, as there's no point in looking inside them and doing so simplifies the code. So one last modification to the splice:

$(buildFolds [''CompilationUnit]
             [''[], ''(,), ''Maybe]
             [''Bool, ''Char, ''Double, ''Int, ''Integer, ''String,
              ''AssignOp, ''Op, ''PrimType])

yields a (slightly) simpler Fold:

data Fold annotation arrayIndex arrayInit block blockStmt catch classBody classDecl classType compilationUnit constructorBody decl elementValue enumBody enumConstant exp explConstrInv fieldAccess forInit formalParam ident importDecl interfaceBody interfaceDecl lhs literal memberDecl methodBody methodInvocation modifier name packageDecl refType stmt switchBlock switchLabel type typeArgument typeDecl typeParam varDecl varDeclId varInit wildcardBound
    = Fold {mkAbstract :: modifier,
            mkActualType :: (refType -> typeArgument),
            mkAnnotation :: (annotation -> modifier),
            mkArrayAccess :: (arrayIndex -> exp),
            mkArrayCreate :: (type -> [exp] -> Int -> exp),
            mkArrayCreateInit :: (type -> Int -> arrayInit -> exp),
            ⁞

Filtering

Writing the filter framework

Now that we've got the Origami framework set up, we write the framework for the filter. We modify Main.hs to

import Language.Java.Parser
import Language.Java.Pretty

main :: IO ()
main = do
    src <- getContents
    let ecu = parser compilationUnit src
    case ecu of
        Left pe -> print pe
        Right cu -> print $ filter cu

filter :: CompilationUnit -> CompilationUnit
filter = id

I try running a Java file through it and I get pretty-printed source out. Yay. Since I'll be using folds to transform the data, I redefine filter to be

filter = foldCompilationUnit idFold

with no change in the output, since folding over the idFold just replaces all the constructors with themselves. I do eventually want to modify the compilation unit, so I move idFold out as a new variable that I'll modify:

filter = foldCompilationUnit xform

xform :: Fold Annotation ArrayIndex ArrayInit Block BlockStmt Catch ClassBody ClassDecl ClassType CompilationUnit ConstructorBody Decl ElementValue EnumBody EnumConstant Exp ExplConstrInv FieldAccess ForInit FormalParam Ident ImportDecl InterfaceBody InterfaceDecl Lhs Literal MemberDecl MethodBody MethodInvocation Modifier Name PackageDecl RefType Stmt SwitchBlock SwitchLabel Type TypeArgument TypeDecl TypeParam VarDecl VarDeclId VarInit WildcardBound
xform = idFold

Thinking about how to filter

MORE TO DO