Anti tutorial on Origami - nedervold/origami GitHub Wiki
Here is an anti-tutorial on Origami. Since I'm the author of Origami, it's not a completely uninformed approach, but as the first user, it's the best you'll find so far.
Origami has to do with data transformation; my task for this anti-tutorial will be to write a filter for Java source files that converts classes to interfaces. We'll be using the language-java package from Hackage.
Set-up
Setting up the Cabal project
The first step is to set up a Cabal project. We add language-java
and origami
to the Build-Depends:
section of the Cabal file. I prefer to run in a sandbox, so I run cabal sandbox init
. I then run cabal install --dependencies-only
to download and install the dependencies.
I create two files, Main.hs
and JavaFold.hs
. Main.hs
contains only
module Main where
import JavaFold()
main :: IO ()
main = return ()
JavaFold.hs
contains
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TemplateHaskell #-}
module JavaFold where
import Data.Origami
import Language.Java.Syntax
$(buildFolds [] [] [])
I run cabal build
and am told that no constructors are used and that the resulting fold would be empty. True enough.
Discovering the fold family
Looking at the Haddocks for Language.Java.Syntax
, I see that the top-level construct (and what the parser reads) is CompilationUnit
, so it will be our root datatype. I change the line to $(buildFolds [''CompilationUnit] [] [])
. Now let's see what happens.
I run cabal build
. And...it fails. I'm told that Data.Maybe.Maybe
is used in functor position but wasn't declared. I change the splice to $(buildFolds [''CompilationUnit] [''[], ''Maybe] [])
. Let's try again.
Now I'm told that Traversable
is not visible at the splice. I import Data.Traversable
and try building one more time.
Now I'm told that Char#
is not visible. Why would we be using it or any type in GHC.Prim
? There's a stack trace, and I see that we got to it through Ident
, which is a wrapper around String
. (And I find my first bug.) So the discovery process is being too zealous at digging down into my code in search of datatypes. I don't need to see inside String
s, so I'll declare them as atomic. The splice now reads $(buildFolds [''CompilationUnit] [''[], ''Maybe] [''String])
.
I try to build and learn that 2-tuples are used in Annotation
, so we add ''(,)
to the functor section and try to build again. The discovery process hits bottom again, this time at Int#
because Integer
s are used in Java Literal
s. Looking at Literal
, I see a number of datatypes that I don't need to look inside, so I add them all to the atomic list.
After another attempt at building, I'm warned about Int#
again because Int
is used in Exp
's constructor ArrayCreate
. I declare ''Int
as atomic too and the splice now looks like
$(buildFolds [''CompilationUnit]
[''[], ''(,), ''Maybe]
[''Bool, ''Char, ''Double, ''Int, ''Integer, ''String])
I try to build one last time and finally it succeeds!
Simplifying the fold family
I now have a viable fold family for Java syntax trees and have a ton of code in JavaFold.hs
generated. I run cabal haddock
and look at the resulting documentation.
The Fold
declaration begins:
data Fold annotation arrayIndex arrayInit assignOp block blockStmt catch classBody classDecl classType compilationUnit constructorBody decl elementValue enumBody enumConstant exp explConstrInv fieldAccess forInit formalParam ident importDecl interfaceBody interfaceDecl lhs literal memberDecl methodBody methodInvocation modifier name op packageDecl primType refType stmt switchBlock switchLabel type typeArgument typeDecl typeParam varDecl varDeclId varInit wildcardBound
= Fold {mkAbstract :: modifier,
mkActualType :: (refType -> typeArgument),
mkAdd :: op,
mkAddA :: assignOp,
mkAnd :: op,
mkAndA :: assignOp,
⁞
It seems odd that there are so many constructors with no arguments. I look at the definitions of Op
and AssignOp
and see that they are basically enumerations with no internal data, as is PrimType
. I'd rather consider them as atomic, as there's no point in looking inside them and doing so simplifies the code. So one last modification to the splice:
$(buildFolds [''CompilationUnit]
[''[], ''(,), ''Maybe]
[''Bool, ''Char, ''Double, ''Int, ''Integer, ''String,
''AssignOp, ''Op, ''PrimType])
yields a (slightly) simpler Fold
:
data Fold annotation arrayIndex arrayInit block blockStmt catch classBody classDecl classType compilationUnit constructorBody decl elementValue enumBody enumConstant exp explConstrInv fieldAccess forInit formalParam ident importDecl interfaceBody interfaceDecl lhs literal memberDecl methodBody methodInvocation modifier name packageDecl refType stmt switchBlock switchLabel type typeArgument typeDecl typeParam varDecl varDeclId varInit wildcardBound
= Fold {mkAbstract :: modifier,
mkActualType :: (refType -> typeArgument),
mkAnnotation :: (annotation -> modifier),
mkArrayAccess :: (arrayIndex -> exp),
mkArrayCreate :: (type -> [exp] -> Int -> exp),
mkArrayCreateInit :: (type -> Int -> arrayInit -> exp),
⁞
Filtering
Writing the filter framework
Now that we've got the Origami framework set up, we write the framework for the filter. We modify Main.hs
to
import Language.Java.Parser
import Language.Java.Pretty
main :: IO ()
main = do
src <- getContents
let ecu = parser compilationUnit src
case ecu of
Left pe -> print pe
Right cu -> print $ filter cu
filter :: CompilationUnit -> CompilationUnit
filter = id
I try running a Java file through it and I get pretty-printed source out. Yay. Since I'll be using folds to transform the data, I redefine filter
to be
filter = foldCompilationUnit idFold
with no change in the output, since folding over the idFold
just replaces all the constructors with themselves. I do eventually want to modify the compilation unit, so I move idFold
out as a new variable that I'll modify:
filter = foldCompilationUnit xform
xform :: Fold Annotation ArrayIndex ArrayInit Block BlockStmt Catch ClassBody ClassDecl ClassType CompilationUnit ConstructorBody Decl ElementValue EnumBody EnumConstant Exp ExplConstrInv FieldAccess ForInit FormalParam Ident ImportDecl InterfaceBody InterfaceDecl Lhs Literal MemberDecl MethodBody MethodInvocation Modifier Name PackageDecl RefType Stmt SwitchBlock SwitchLabel Type TypeArgument TypeDecl TypeParam VarDecl VarDeclId VarInit WildcardBound
xform = idFold
Thinking about how to filter
MORE TO DO