Assignment 5: Type Inference and Constraint Solving (210 points)

In Assignment 4, you wrote an interpreter for the Nano programming language, which resembles a (very, very small) cousin of Haskell. This interpreter lets you execute Nano programs and see what their results are.

However, some Nano programs don't really make sense: for example, what should a program like 3 && True, or (\x -> x) && False, or 3 + head mean? Should these programs even have a meaning? In fact, our Nano interpreter will throw an error for them. Wouldn't it be great if we could tell if a Nano program was going to fail before we ran it?

By now, you will have seen Haskell report type errors for this kind of situation. For example, you can use the :t command in GHCi to ask what the type of any Haskell expression is -- without actually evaluating the expression. This is called type checking, and it's a form of static analysis: static, because it occurs before runtime, and analysis, because we are studying a program to obtain some information about it. Type checking can't catch all errors; but if your program is considered well-typed by the type checker, then you're guaranteed that it won't exhibit the kinds of errors that your type system is meant to catch.

In this assignment, you will implement a type checker for Nano, just like the :t command in GHCi. If a program fails to type-check, you will be able to identify the error (e.g., if your program tried to add 3 and False); and if a program does type-check, you will be able to say what its type actually is.

As a bonus, your checker is designed in such a way that it can support type polymorphism, so that generic functions like \x -> x and map can be defined without having to write down specialized copies for every type of data you want to invoke them on.

Note: Start early! Type inference has many subtle points that can take a while to get right.

Submission Instructions

This assignment may be done either individually, or as a team of two.

If you choose to work as a team of two, both teammates will be expected to be able to understand and explain all of the submitted code. So you need to work on all the problems collaboratively (e.g., Alice and Bob pair program on all problems), rather than splitting it up (e.g., Alice does problem 1, Bob does problem 2, ...). As a result, working on a team may actually be harder than working alone (e.g., if your teammate isn't great at communication and collaboration). We recommend working alone unless you have already identified a teammate who you know well and are confident that you can communicate and collaborate with well.

When you accept the assignment on GitHub Classroom, you'll be asked to create a team or join an existing team.

If you want to work individually, just create a new team for yourself (and name your team however you like).
If you want to work together with another student, then decide which person will create the team, and then have the other person join the newly created team. Both of you will need to accept the assignment on GitHub Classroom, but the first person will create the team and the second person will join the team. The team will then work together in a single repository. Do not join someone's team without getting their permission first!

Even if you work on a team, each teammate still needs to individually submit the assignment to the autograder and will individually receive a grade.

When you are ready to submit your solutions to the autograder, follow these instructions step-by-step:

You must create a file INTEGRITY.md at the top level of your assignment repository in which you give credit to all sources you used while working on the assignment. Thorough citation is the way to avoid running afoul of accusations of misconduct.
- If you work as a team, then your team's INTEGRITY.md file should include the names of both teammates, and should explain how you collaborated on all the assignment problems.
- So that you have an idea of the level of detail that is expected, there is an example INTEGRITY.md file that you can reference.
- If you didn't reference any external sources, just say so in INTEGRITY.md.
Commit your changes and push your Git repository. (You should be doing this incrementally as you work on the assignment anyway.)
Submit your solutions using the autograder client:
```
python3 -m autograder.run.submit \
    INTEGRITY.md \
    src/Language/Nano/TypeCheck.hs \
    --user <email> \
    --pass <password> \
    --course <course>
```
where <email> is your UCSC email address, <password> is your autograder password, and <course> is an identifier specific to this course and section, which you can find on the Canvas assignment description for this assignment.

Running the autograder.run.submit command will submit your code to the autograder for evaluation and print out the results. Have patience -- it might take a minute or two to see output from the autograder. You can submit as many times as you like before the deadline, and only the last submission will be counted.

The autograder client has other commands that you can use to interact with the autograder. In particular, you can run python3 -m autograder.run.peek to see the results of your last submission, and python3 -m autograder.run.history to see the results of all your submissions. Refer to the autograder client documentation for more information.

If you cannot get your implementation of a function to compile by the time you submit your solutions, leave the broken function defined as error ..., with your partial solution enclosed below as a comment. Since all of your solutions are are compiled together, a compile-time error with one solution would prevent all of them from being graded!

Solving the Problems

In this assignment, you will edit the src/Language/Nano/TypeCheck.hs file. This file contains a set of functions with bodies that simply throw an error (error "TBD: ..."); your mission is to complete those functions according to the problem descriptions below. You should only need to modify the parts of the files that say error "TBD: ..." with suitable Haskell implementations. However, you may define your own helper functions if it helps you; and if you're asked to fill in a function definition, such as:

f xs = error "TBD: ..."

you are also allowed to split this definition into multiple equations, like so:

f []     = ...
f (x:xs) = ...

Most problems require relatively little code ranging from 2 to 10 lines. Some functions, particularly unify and infer, will have many clauses due to pattern matching; then each clause should individually require relatively little code. If any clause requires more than that, you can be sure that you need to rethink your solution.

Checking your Solutions

As you complete each problem, you should run stack test to check for various possible issues with your solution. If any tests fail, you will not get full credit. Conversely, you should assume that the autograder may run some additional tests that are not included with the assignment; so having a clean stack test does not necessarily mean that your assignment will get full points. You should only trust the numbers reported by the autograder when you submit your assignment.

When you run stack test, the end of the output should include the lines

All N tests passed (...)
OVERALL SCORE = ... / ...

K out of N tests failed
OVERALL SCORE = ... / ...

If your output does not have one of the above, your code will receive a zero.

The other lines will give you a readout for each test. You are encouraged to try to understand the testing code, but you will not be graded on this.

You may also find it useful to use the Haskell interpreter, GHCi, to test your code by hand. To do this, run the stack ghci command, which should wait for further input on a ghci> prompt. You can enter Haskell expressions at this prompt to evaluate them, including expressions that use functions defined in the assignment. The ghci documentation has a fuller introduction to GHCi.

Problem descriptions

Data Structures and Overview

This assignment builds upon our previous implementation of Nano from HW4, so most data types in Types.hs are the same as they were before. We have added the following definitions to support type inference.

Types

We will use the following Haskell data types to represent Nano's types and poly-types. (A "poly-type" is a polymorphic type, like the type of the identity function, \x -> x: its type is forall a . a -> a. (Haskell also has polymorphic types, but you don't usually write the forall a . part.)

data Type
  = TInt             -- Int
  | TBool            -- Bool
  | Type :=> Type    -- function type: T1 -> T2
  | TVar TId         -- type variable: a, b, c
  | TList Type       -- list type: [T]

data Poly = Mono Type        -- mono-type
          | Forall TVar Poly -- polymorphic type

Above, TId is just a type alias for String, used to represent type variable names:

type TId = String

For example, a Nano type Int -> Int is represented inside your type checker as:

TInt :=> TInt

whereas a polymorphic Nano type forall a . List a -> a is represented as:

Forall "a" (Mono (TList (TVar "a") :=> TVar "a"))

(or simply forall "a" (list "a" :=> "a") using convenience functions forall and list, also defined in Types.hs).

Type Environments

A type environment is a dictionary that maps program variables to poly-types. As before, the dictionary is implemented as a list of pairs:

type TypeEnv = [(Id, Poly)]

For example, a type environment that maps the program variable x to the type Int, and maps the program variable f to the type Int -> Int, is represented as:

[("x", Mono TInt), ("f", Mono (TInt :=> TInt))]

Type Substitutions

A type substitution (or just "substitution" for short) is also a dictionary, but it maps type variables to types they should be replaced with.

type Subst = [(TId, Type)]

For example, a substitution that maps the type variable a to the type Int, maps the type variable b to the type Int -> [Int], and maps the type variable c to the type d (where d is itself a type variable), is represented as:

[("a", TInt), ("b", TInt :=> TList TInt), ("c", TVar "d")]

When you implement operations on substitutions, you can assume that the keys (type variables) in a substitution are unique. Conversely, in code that uses operations on substitutions, you should guarantee that this is the case.

Part 1: Warm-up (70 points)

In this problem, you will implement some helper functions for your type checker. These functions will be useful in later parts of the assignment.

Problem 1(a): Free type variables (20 points)

First, implement two functions that compute the list of free type variables in a type and in a poly-type, in the order they appear in the type.

A type variable is free if it is not bound by an enclosing forall binder. For example, in the poly-type forall a . a -> [b], the type variable a is bound, but b is free, so freeTVars should return ["b"]:

ghci> freeTVars (forall "a" ("a" :=> "b"))
["b"]

The list that freeTVars returns should not contain duplicate elements. For example, even though a occurs free twice in a -> a -> b, freeTVars should still return only ["a", "b"].

To enable overloading (using a function with the same name for types and poly-types), the freeTVars function is part of a type class HasTVars.

-- | Type variables of a type
instance HasTVars Type where
  freeTVars :: Type -> [TId]
  freeTVars t     = error "TBD: type freeTVars"

-- | Free type variables of a poly-type (remove forall-bound vars)
instance HasTVars Poly where
  freeTVars :: Poly -> [TId]
  freeTVars s     = error "TBD: poly freeTVars"

As always, you are allowed (and encouraged) to replace freeTVars t and freeTVars s with multiple equations for different patterns.

Hint

For the Type instance of freeTVars, You should pattern-match against each of the constructors for the Type type: TInt, TBool, :=>, and so on. Not all type constructors have free type variables. For example, TInt doesn't. In that case, you should return [].

The trickiest case is the :=> constructor; in that case, you'll need to make a couple of recursive calls to freeTVars and combine the results somehow, remembering to handle duplicates.

Hint

For the Poly instance of freeTVars, You should pattern-match against each of the constructors for the Poly type. Call freeTVars recursively as needed. forall-bound variables aren't free, so remember to remove them!

Hint

Throughout this assignment, you are allowed to use any library functions, as long as you don't add new import statements. In particular, some useful functions from the List library include union and delete (look them up on Hoogle). Also, keep in mind that the List library is imported qualified:

import qualified Data.List as L

So whenever you want to use a function from that library, you have to prefix its name with L. (e.g., L.delete).

When you are done, you should get the following behavior:

ghci> freeTVars TInt
[]

ghci> freeTVars (forall "a" (list "a" :=> "a"))
[]

ghci> freeTVars (TVar "a")
["a"]

ghci> freeTVars (list (list "a"))
["a"]

ghci> freeTVars (TVar "a" :=> TVar "b")
["a", "b"]

ghci> freeTVars (TVar "a" :=> TVar "a")
["a"]

ghci> freeTVars (forall "a" (TVar "a" :=> TVar "b"))
["b"]

ghci> freeTVars (forall "a" (TVar "a" :=> TVar "a"))
[]

Problem 1(b): Substitutions (50 points)

Recall from the description above that a substitution maps type variables to types. For example, a substitution that maps the type variable a to the type Int, maps the type variable b to the type Int -> [Int], and maps the type variable c to the type d (where d is itself a type variable), is represented as:

[("a", TInt), ("b", TInt :=> TList TInt), ("c", TVar "d")]

Implement a function to look up what a type variable maps to in a substitution:

lookupTVar :: TVar -> Subst -> Type

and a function to remove the mapping for a given type variable from a substitution (recall that all type variables in a substitution are supposed to be unique):

removeTVar :: TVar -> Subst -> Subst

When you are done you should get the following behavior:

ghci> lookupTVar "a" [("a", TInt)]
Int

-- if the specified type variable isn't in the substitution,
-- lookupTVar should just return it (as a type)
ghci> lookupTVar "a" [("b", TInt)]
a

ghci> removeTVar "a" [("a", TInt)]
[]

ghci> removeTVar "a" [("b", TInt)]
[("b",Int)]

Hint

A Subst is just a list. You know how to traverse these in Haskell. Hint: use recursion!

Hint

In removeTVar, you're building a list that (potentially) leaves out an element from the original list.

Next, use lookupTVar and removeTVar to write two functions that apply a substitution to a type and to a poly-type (both named apply), and return the resulting type. Applying a substitution to a type means replacing any free type variables in the type with whatever the substitution maps them to. You should take care to leave bound type variables alone. For example, if you're applying the substitution [("a",TInt)] to the type forall a . [a], then the resulting type is still forall a . [a].

instance Substitutable Type where
  apply :: Subst -> Type -> Type
  apply sub t         = error "TBD: ..."

instance Substitutable Poly where
  apply :: Subst -> Poly -> Poly
  apply sub p         = error "TBD: ..."

Once you have implemented this functionality, you should get the following behavior:

ghci> apply [("a",TInt)] (list "a")
[Int]

ghci> apply [("a",TInt)] (list "b")
[b]

ghci> apply [("a",list TInt), ("b", TInt)] ("b" :=> "a")
(Int) -> [Int]

ghci> apply [("a",TInt)] (forall "a" (list "a"))
forall a . [a]

ghci> apply [("b", TInt)] (forall "a" ("a" :=> "b"))
forall a . (a) -> Int

Hint

Use removeTVar to deal with bound type variables in the implementation of apply.

Finally, use apply to implement the function extendSubst :: Subst -> TVar -> Type -> Subst, which extends a substitution with a new mapping from a type variable to a type.

The behavior of extendSubst is rather subtle:

If the existing substitution contains a mapping for a given type variable, and the type you're adding contains free occurrences of that type variable, then extendSubst should replace those occurrences with whatever the existing substitution maps them to. For example, if the current substitution is [("a", Int)] and you are adding a mapping from b to [a], then the substitution becomes [("b",[Int]),("a",Int)] -- not [("b",[a]),("a",Int)]
If you are adding a mapping for a given type variable, and any types in the existing substitution contain occurrences of that type variable, then extendSubst should replace those occurrences with whatever the new mapping says they should be. For example, if the current substitution is [("a", [b]] and you are adding a mapping from b to Bool, then the substitution becomes [("b",Bool),("a",[Bool])] -- not [("b",Bool),("a",[b])].

When you are done, you should get the following behavior:

ghci> extendSubst [("a", TInt)] "b" TBool
[("b",Bool), ("a",Int)]

ghci> extendSubst [("a", TInt)] "b" (list "a")
[("b",[Int]),("a",Int)]

ghci> extendSubst [("a", list "b")] "b" TBool
[("b",Bool), ("a",[Bool])]

Hint

Your implementation of extendSubst should call apply twice.

Part 2: Unification (50 points)

Unification is a crucial part of type inference. Unification means coming up with a substitution that makes two types the "same".

For instance, consider the types a -> Int and Bool -> b. To make these types the same, the type variable a should be replaced with Bool, and the type variable b should be replaced with Int. Therefore, the substitution [(a, Bool), (b, Int)] is a unifier for these two type.

Of course, some types can't be unified. For example, there is no substitution that is a unifier for the types Int and Bool. More interestingly, there is no substitution that is a unifier for the types a and [a]. Therefore, some attempts at unification will fail.

In a nutshell, the type inference algorithm that we'll implement works by:

recursively traversing the program, assembling expression types from the types of their sub-expressions;
when some type is not yet known, it picks a fresh type variable to represent this type;
whenever two types must be the same, the algorithm tries to unify them and figure out what these type variables actually stand for. If unification fails, then type inference fails.

A fresh type variable is a type variable that is distinct from all type variables the algorithm has generated before. Our implementation will name the fresh type variables a0, a1, .... In order to remember the index of the first unused fresh variable, we introduce a new data type called InferState:

data InferState = InferState {
    stSub :: Subst -- ^ current substitution
  , stCnt :: Int   -- ^ number of fresh type variables generated so far
}

An InferState is a Haskell record, which we haven't seen before. A record is like a tuple, but its fields have names. An InferState has two named fields: stSub, which represents the current substitution, and stCnt, which represents the number of fresh type variables represented so far. It represents the "current state" of the unification and inference algorithms. It remembers how many fresh type variables the algorithm has generated so far, alongside the "current substitution".

The provided code defines initInferState for you, which contains an empty substitution and a stCnt of 0.

Problem 2(a): Unification for type variables (20 points)

Implement a function unifyTVar :: InferState -> TId -> Type -> InferState. unifyTVar takes an InferState (containing a substitution) st, a type variable a, and a type t, and tries to produce a new InferState (containing an updated substitution that is a unifier for a and t.
If it succeeds, it records the new type assignment by extending the state st. If it fails, it throws an exception.

You can use the provided helper function extendState to extend a substitution inside a state.

Assumption: In the implementation of unifyTVar, you can assume that a does not appear in the domain of stSub st (i.e., among the type variables it maps); remember that code that uses unifyTVar should guarantee this property.

When you are done, you should get the following behavior (recall that initInferState contains an empty substitution):

ghci> stSub (unifyTVar initInferState "a" (list "b"))
[("a",[b])]

ghci> stSub (unifyTVar initInferState "a" "a")
[]

ghci> stSub (unifyTVar initInferState "a" (TInt :=> TBool))
[("a",(Int) -> Bool)]

ghci> stSub (unifyTVar initInferState "a" ("a" :=> TBool))
*** Exception: Error {errMsg = "type error: cannot unify a and (a) -> Bool (occurs check)"}

ghci> stSub (unifyTVar initInferState "a" (list "a"))
*** Exception: Error {errMsg = "type error: cannot unify a and [a] (occurs check)"}

Hint

The implementation of unifyTVar is quite simple. There are three cases to handle:

Unifying a type variable with itself (for instance, unifying a with a) succeeds, and doesn't change the existing substitution.
Attempting to unify a type variable with a type containing a free occurrence of that type variable somewhere in its structure (for instance, unifying a with [a], or unifying a with a -> Int) fails, and is known as the "occurs check", so you should throw an exception. To get an exception like the one above, you could use throw (Error ("type error: cannot unify " ++ a ++ " and " ++ show t ++ " (occurs check)")) (but any error message containing the string "type error" will pass tests).
Unifying a type variable with a type not containing a free occurrence of that type variable succeeds, and you should use extendState.

Problem 2(b): Unification for types (30 points)

With unifyTVar done, you can now implement a function unify :: InferState -> Type -> Type -> InferState. unify takes an InferState containing a substitution, and two types t1 and t2. It tries to unify t1 and t2, extending the substitution as needed to unify them, or fails and throws an exception.

Assumption: In the implementation of unify, you can assume that any type variables in t1 and t2 do not appear in the domain of stSub st; remember that code that uses unify should guarantee this property.

Here are some examples of how unify should behave:

ghci> stSub (unify initInferState TInt TInt)
[]

ghci> stSub (unify initInferState TInt TBool)
*** Exception: Error {errMsg = "type error: cannot unify Int and Bool"}

ghci> stSub (unify initInferState "a" "a")
[]

ghci> stSub (unify initInferState "a" "b")
[("a",b)]

ghci> stSub (unify initInferState ("a" :=> TInt) (TBool :=> "b"))
[("b",Int),("a",Bool)]

ghci> stSub (unify initInferState (list "a") (list (list "a")))
*** Exception: Error {errMsg = "type error: cannot unify a and [a] (occurs check)"}

ghci> stSub (unify initInferState (list "a" :=> "b") ("c" :=> TBool))
[("b",Bool),("c",[a])]

ghci> stSub (unify initInferState (list "a" :=> "b") (TInt :=> TBool))
*** Exception: Error {errMsg = "type error: cannot unify [a] and Int"}

ghci> stSub (unify initInferState ("a" :=> "b") (TInt :=> TBool))
[("b",Bool),("a",Int)]

ghci> stSub (unify initInferState ("a" :=> (TBool :=> "b")) ((TInt :=> "c") :=> ("d" :=> list "e")))
[("b",[e]),("d",Bool),("a",(Int) -> c)]

Hint

Here are some hints on how to implement the trickiest parts of `unify`:

When either Type argument is a TVar, then delegate to unifyTVar.
The trickiest case is when both Type arguments are functions, e.g., argType :=> resultType. In this situation, you should:
1. Unify the argTypes.
2. Apply the newfound substitution to the resultTypes. You should already know what function does this.
3. Unify the resultTypes.

Part 3: Type Inference (90 points)

Now we have everything in place to implement type inference!

Problem 3(a): Mono-types (20 points)

Let's first consider the fragment of the Nano language without:

binary operators
conditionals
polymorphism
recursion
lists

Implement the function infer :: InferState -> TypeEnv -> Expr -> (InferState, Type) for this fragment.

The infer function takes arguments st (an InferState), gamma (a TypeEnv), and e (an Expr). It attempts to infer the type of e in type environment gamma given that the current state is st. If e imposes constraints on the types of its sub-expressions, infer calls unify to enforce these constraints. If all unification steps succeed, infer returns a tuple of the InferState (possibly extended with new type assignments generated during unification) and the inferred type of e.

For this part of the assignment, you can assume that the type environment maps all variables to mono-types, i.e., Mono t.

In particular, you should observe the following behavior:

ghci> typeOfString "True"
Bool

ghci> typeOfString "1"
Int

ghci> typeOfString "(\\x -> x) 5"
Int

ghci> typeOfString "(\\x -> x + 1) True"
*** Exception: Error {errMsg = "type error: cannot unify Int and Bool"}

ghci> typeOfString "let x = 5 in x + 3"
Int

Hint

There are six cases of infer that you'll need to implement: EInt, EBool, EVar, ELam, EApp, and ELet.

EInt and EBool should be easy.
EVar will require you to look up the type of a variable in the type environment, using lookupVarType. The only tricky thing to deal with here is that a TypeEnv contains Poly types, and you just want a Type. To pass test cases like those shown above (and all of the other test cases for 3(a)), you can write a simple helper function that takes a Poly and returns a Type. In the Mono case, this function should return the inner value of type Type, and in the Forall case, you could throw an error or leave it as undefined for now. (You'll need to revisit this decision when implementing part 3(b) and beyond, but for this part of the assignment, it's fine.)
For the (ELam x body) case, remember what the typing rule for lambda expressions looks like. Your result type will be something of the shape t1 :=> t2. You'll need to:

Create a fresh type variable for the type of the function's argument, x. (Remember to update the InferState to count this newly created type variable.)
Extend the type environment with x and the newly created type variable.
Infer the type of the function body in the extended type environment.
Since step 3 may update the substitution, you'll need to apply the updated substitution to the argument type variable to get the final argument type. (For example, consider a function like \x -> x + 1. This function has type Int -> Int, but the only way that we know that the type of the argument x is Int is because of the constraint imposed by + in the body, which says that both of its arguments must be of type Int. So this last step is crucial. Without it, you'll end up inferring something like (a0) -> Int as the type of \x -> x + 1. That's a sure sign that you've forgotten this last step.)

For the (EApp e1 e2) case, remember what the typing rule for application expressions looks like. e1 should be of function type, e2 should be the same as the argument type of e1, and the type of the entire expression should be the same as the result type of e1. You'll need to:

Infer the type of e1, the function.
Infer the type of e2, the argument (using the InferState that you got from step 1).
Create a fresh type variable for the function's result type (which will be the type of the entire application expression). (Remember to update the InferState to count this newly created type variable.)
Unify the inferred type of e1 with the function type that it ought to be (making use of the inferred type of e2 and the type variable that you created). This step will ensure that the specified argument is of the appropriate type to be called with the specified function. If you leave out this step, then your type checker will not reject programs like (\x -> x + 1) True as ill-typed!
Since the previous steps may update the substitution, you'll need to apply the updated substitution to the type variable you created to get the type of the entire application expression. (For example, consider an application like (\x -> True) 3. This expression has type Bool, but the only reason we know that is because of the constraint imposed by True in the body of the function. If you leave out this step, the type that you infer for this expression will still be an uninstantiated type variable.

For the (ELet x e1 e2) case, remember what the typing rule for let expressions looks like. You'll need to:

Infer the type of e1.
Create a fresh type variable for x's type. (Remember to update the InferState to count this newly created type variable.)
Extend the type environment with the newly created type variable.
Unify the inferred type of e1 with the type of x.
Since the previous steps may update the substitution, you'll need to apply the updated substitution to the type environment.
Finally, infer the type of e2 in the extended type environment, giving you the type of the entire let expression.

When you are finished with this part of the assignment, you should be passing tests 3atest1.hs through 3atest4.hs, and the above examples should also work.

Problem 3(b): Polymorphism (20 points)

In this part of the assignment, you will add support for polymorphism.

Variables in the type environment can now have polymorphic types, like forall a . a -> a.
Whenever we use a variable, we have to instantiate its poly-type, i.e., replace all bound type variables with fresh free type variables.
Whenever we define a variable using a let-expression, we have to generalize its type into a poly-type.

First implement the function generalize :: TypeEnv -> Type -> Poly.

generalize gamma t generalizes the type t into a poly-type, i.e., binds all its type variables that are not free in gamma with a Forall:

When you are done, you should observe the following behavior:

ghci> generalize [] ("a" :=> "a")
forall a . (a) -> a

ghci> generalize [("x", Mono "a")] ("a" :=> "a")
(a) -> a

We've provided for you an implementation of the function instantiate :: Int -> Poly -> (Int, Type), which is the opposite of generalize: instantiate n s instantiates a poly-type s by replacing its bound type variables with fresh type variables. Here, n is the index of the first unused fresh type variable.

ghci> instantiate 0 (Forall "a" (Forall "b" (Mono ("a" :=> "b"))))
(2,(a0) -> a1)

Finally, modify your implementation of infer to support polymorphism, using generalize and instantiate.

When you are done, your implementation should be able to type-check the following example, in which the same function (the identity function, id) is used in two different places, on arguments of different types:

ghci> typeOfString "let id = \\x -> x in let y = id 5 in id (\\z -> z + y)"
(Int) -> Int

Hint

You'll need to use instantiate in just one place: when you look up the type of a variable in a type environment (which will happen in the EVar case of infer). You'll also need to use generalize in just one place: when inferring the type of a let-bound variable, you'll want to generalize it (so that it can be later instantiated in different ways). Without generalization, the above example that applies the function id to arguments of different types will fail to type-check.

Problem 3(c): Built-in functions (20 points)

Instead of implementing separate type inference for binary operators, conditionals, and lists, we decided to represent them as built-in functions and reuse the type inference for variables and function application.

Fill in the types of built-in functions inside preludeTypes (we have pre-filled the one for +). Remember that you are not allowed to use a0, a1, ... as bound type variables, since those are reserved for use as free type variables by the algorithm.

Problem 3(d): Recursion (30 points)

Modify your implementation of infer to support recursive function definitions. Once you are done, all the provided tests should pass.

Hint

Recall how in Assignment 4, to support recursion, when evaluating a let-bound expression, we had to extend the environment to include the variable that the expression itself was bound to. For example, to evaluate an expression like let f = \n -> if n == 0 then 1 else n * f (n - 1) in f 5, the subexpression if n == 0 then 1 else n * f (n - 1) needed to be evaluated in an extended environment that contained a binding for f.

For type inference, you'll need to do something similar. In the ELet x e1 e2 case of infer, when inferring the type of e1, you'll need to make sure to do so in an extended type environment that has a binding for x. (You're probably already extending the type environment with a binding for x and using that extended type environment to infer the type of e2, but to support recursive functions, you'll need to also use the extended type environment to infer the type of e1, because x may occur in e1.)

Development and Debugging Tips

As with the previous assignments, you can use the stack ghci command to start the GHCi interpreter with your code loaded. We very strongly recommend using GHCi to develop and debug your code.

In GHCi, we recommend using the following provided functions to develop and debug your code as you work on the assignment:

`typeOfExpr`

The typeOfExpr function takes an expression, represented as a Nano AST of type Expr, and infers the type of the expression, passing the preludeTypes type environment.

ghci> typeOfExpr (EBin Plus (EInt 1) (EInt 2))
Int

`typeOfString`

Since it would be annoying to always have to write out ASTs for Nano programs, there's also a typeOfString helper function that takes a string, parses it into a Nano expression, and then infers the type of the expression.

For example, this will work once you correctly implement the EBool case of infer:

ghci> typeOfString "True"
Bool

Most Nano programs will exercise more parts of infer. For example, adding two numbers requires the EApp case of infer to be implemented (since 1 + 2 is an EBin expression, and the EBin case, which is implemented for you, leverages the EApp case).

Once you've implemented the EApp case, this will work:

>>> typeOfString "1 + 2"
Int

Once you've implemented EApp and ELet, this will work:

>>> typeOfString "let x = 1 in x + 2"
Int

Note that backslashes (\) are escape characters in Haskell strings, so you need to escape them with another backslash to write lambda expressions. For example, the lambda expression \x -> x + 1 should be written as "\\x -> x + 1".

Once you've implemented the EApp and ELam cases of infer, this should work:

ghci> typeOfString "\\x -> x + 1"
(Int) -> Int

`typeOfFile`

You can write your own Nano programs, save them in files, and call typeOfFile to type-check them.

The typeOfFile function takes a file path, reads the file's contents as a string, parses the string into a Nano AST, and type-checks the AST using infer, passing the preludeTypes type environment. This function is useful for testing your code with files.

>>> typeOfFile "tests/input/3atest1.hs"
Bool

README - UCSC-CSE-114A/05-types GitHub Wiki

Assignment 5: Type Inference and Constraint Solving (210 points)

Submission Instructions

Solving the Problems

Checking your Solutions

Problem descriptions

Data Structures and Overview

Types

Type Environments

Type Substitutions

Part 1: Warm-up (70 points)

Problem 1(a): Free type variables (20 points)

Problem 1(b): Substitutions (50 points)

Part 2: Unification (50 points)

Problem 2(a): Unification for type variables (20 points)

Problem 2(b): Unification for types (30 points)

Part 3: Type Inference (90 points)

Problem 3(a): Mono-types (20 points)

Problem 3(b): Polymorphism (20 points)

Problem 3(c): Built-in functions (20 points)

Problem 3(d): Recursion (30 points)

Development and Debugging Tips

`typeOfExpr`

`typeOfString`

`typeOfFile`

⚠️ GitHub.com Fallback ⚠️

README - UCSC-CSE-114A/05-types GitHub Wiki

Assignment 5: Type Inference and Constraint Solving (210 points)

Submission Instructions

Solving the Problems

Checking your Solutions

Problem descriptions

Data Structures and Overview

Types

Type Environments

Type Substitutions

Part 1: Warm-up (70 points)

Problem 1(a): Free type variables (20 points)

Problem 1(b): Substitutions (50 points)

Part 2: Unification (50 points)

Problem 2(a): Unification for type variables (20 points)

Problem 2(b): Unification for types (30 points)

Part 3: Type Inference (90 points)

Problem 3(a): Mono-types (20 points)

Problem 3(b): Polymorphism (20 points)

Problem 3(c): Built-in functions (20 points)

Problem 3(d): Recursion (30 points)

Development and Debugging Tips

typeOfExpr

typeOfString

typeOfFile

⚠️ **GitHub.com Fallback** ⚠️

`typeOfExpr`

`typeOfString`

`typeOfFile`

⚠️ GitHub.com Fallback ⚠️