Semantic Analysis

What does this phase do?

"The semantic analysis phase of a compiler connects variable definitions to their uses, checks that each expression has a correct type."

Actually, the excerption above is only a part of the whole in Tiger Book. But i think it describes this phase better, so i just let the rest go.

More deeply, the restriction given by Syntax Analysis phase is not enough since it's too general. Take a small code line written in C for example:

int func(int a) {...}

It's obviously a function definition, and its syntax may looks like this:

TypeId FuncId(TypeId VarId...) {...}

TypeID, FuncId, VarId all stand for identifiers, ... stand for more literal characters and (), {} stand for themselves.

We can see that the formats are loosely restricted by language syntax. For example, VarId are just IDs, so that any ID can be passed to the function no matter whether it stands for a variable. This is because language must have flexibility to allow developers to explore possibilities in code. However, for a language which has a type system, the correctness of types occurred must be checked. So that the compiler will throw an error when we pass a Int variable while a Short is expected.

In fact, much of the work in this phase is about type-checking.

What is environment?

Environment here stands for information collections. There are two major environments in tiger language:

Type Environment
Function and Variable Environment

A type environment gathers information about types and function and variable environment gathers information about functions and variables.

Why Type Environment

In static programming languages like c++, users have to given the variable types before declaring variables. Thus, compilers have to check if the given type is already defined and if it satisfy the context. Naturally, there is a necessary to keep all the types users declared so far for later checking.

Why Function and Variable Environment

Like type environment, we have to keep information about variables and functions.

For variables, we need only to keep their types. Because when passing variables to functions or used as expressions, we need to check it for type consistency.

For functions, we need to keep their parameter types and return types. Because when using them, we need to pass variables and their return values can be assigned to variables. Naturally, type checking occurs.

Recursive Declarations(Types or Functions)

One powerful feature tiger supports is recursive declarations. However, there are to kinds of recursions:

Recursive
Mutually recursive

Recursive

Recursive is a recursion which recursive itself. For example, the following type is a recursive type:

type intList = { hd : int, tl : intList }

This type contains a int field and a field whose type is itself. Functions are similar.

Mutually recursive

Generally, every type must be constructed by some already defined types or built-in types. However, mutually recursive types are exceptions.

Mutually recursive is a recursion declared by a consecutive sequence of type or function declarations without intervening non-type or non-function declarations. For example, the following is a valid mutually recursive type:

type tree = { key : int, children : treelist }
type treelist = { hd : tree, tl : treelist }

But the following is not a valid one(because type tree and treelist are separated by a variable declaration):

type tree = { key : int, children : treelist }
var varTemp = 0;
type treelist = { hd : tree, tl : treelist }

Each type recursion cycle must pass through a record or array type. Thus, the following is not a valid one:

type b = c
type c = b

implementation

Since recursive declarations contains types or function names we haven't completely defined yet, implement them in one pass is not possible. So, we have to do two passing, the first process the headers(names for types, parameters, return type and names for functions) of them and keep enough information, the second process the bodies(part in the right side of = for types, bodies for functions) of them.

Tiger Semantics

In the following descriptions, any words ends with id is a identifier, and any words starts with exp is an expression.

Expressions

There are many kinds of expressions in tiger:

L-Values
Function Call
Nil
Integer Literal
String Literal
Array Creation
Record Creation
Arithmetic
Parentheses
No Value
Sequencing
For
If-Then-Else
If-Then
Comparison
Boolean Operate
Assignment
Let
While
Negation
Break

L-Values

There are three kind of left-values : Variable, Record Field and Array Subscript.

Syntax : lvalue

Variable

Syntax :
- id
Semantics :
- A variable ID.

Record Field

Syntax :
- lvalue . id
Semantics :
- lvalue is a left value which represents a record-type variable.
- id is a field ID in record type denoted by lvalue.

Array Subscript

Syntax :
- lvalue [ exp ]
Semantics :
- lvalue is a left value which represents an array-type variable.
- exp is an integer-result expression.

Function Call

Syntax :
- id () or id (exp [,exp])
Note :
- Function parameters are calculated in reverse order.
Semantics :
- id is a function ID.
- exp's result type must match the types of the calling function in order.(If the formal parameter type is a record type, then exp's result type can be nil)

Nil

Syntax :
- nil
Semantics :
- nil can only denote a value of record type.
- nil must be used in a context where the type can be determined.

Integer Literal

None.

String Literal

None.

Array Creation

Syntax :
- type-id [exp1] of exp2
Semantics :
- type-id denotes an array type.
- exp1's result type must be of integer type.
- exp2's result type must be the same of array element.(If the array element type is a record type, then exp2's result type can be nil)

Record Creation

Syntax :
- type-id { } or type-id { id=exp, {,id=exp}}
Semantics :
- type-id denotes a record type.
- ids must be field IDs of type type-id in order.
- exps' result types must match the field types of type type-id in order.(If the field type is a record type, then exp's result type can be nil)

Arithmetic

Syntax :
- exp + exp
- exp - exp
- exp / exp
- exp * exp
Semantics :
- exp's result must be of integer type.

Parentheses

Syntax :
- ( exp )
Semantics :
- exp is semantic-checked.

No Value

Syntax :
- ()
Semantics :
- () denotes no value and thus, no result type.

Sequencing

Syntax :
- ( exp;exp;...exp )
Semantics :
- All expressions are semantic-checked in order.
- The result type of a sequence is the result type(if any) yielded by the last of the expressions.

For

Syntax :
- for id := exp1 to exp2 do exp3
Semantics :
- exp1 and exp2's result types must be of integer type.
- exp3 must has no return value or return type.
- id is a new variable ID implicitly declared by the for statement, whose scope covers only exp3. And it cannot be assigned to.

If-Then-Else

Syntax :
- if exp1 then exp2 else exp3
Semantics :
- exp1's result type must be of integer type.
- exp2 and exp3's result type must be the same, or both have no result type.

If-Then

Syntax :
- if exp1 then exp2
Semantics :
- exp1's result type must be of integer type.
- exp2 must has no result type.

Comparison

Syntax :
- exp = exp
- exp <> exp
- exp > exp
- exp < exp
- exp >= exp
- exp <= exp
Semantics :
- Two exp must has same result type.
- exps can be integer or string type expressions.
- When operators are = or <>, exps can be of record type or array type.(When one exp is of record type, the result type of another can be nil)

Boolean Operate

Syntax :
- exp & exp
- exp | exp
Semantics :
- exp must be of integer type.

Assignment

Syntax :
- lvalue := exp
Semantics :
- lvalue and exp's result type must be same.(If lvalue is of a record type, then exp's result type can be nil)

Let

Syntax :
- let decs in expseq end
Note :
- expseq is a sequence of zero or more expressions, separated by semicolons.
Semantics :
- decs are all semantic-checked.
- expseq are all semantic-checked.

While

Syntax :
- while id := exp1 to exp2 do exp3
Note :
- id is a implicitly declared variable whose scope is only the while expression.
Semantics :
- The result type of exp1 and exp2 must be integer type.
- exp3 must have no return value.

Negation

Syntax :
- -exp
Semantics :
- The result type of exp must be integer.

Break

Syntax :
- break
Semantics :
- break must be within a while or for.

Declarations

Syntax :
- decs -> { dec }
Note :
- { dec } stands for a possible empty sequence of dec)

There are three kinds of declarations:

Variable Declaration
Type Declaration
Subroutine Declaration

Variable Declaration

Syntax :
- vardec

Variable declarations have to formats:

Short Form
Long Form

Short Form

Syntax :
- var id := exp
Semantics :
- exp must has result type and can be nil.

Long Form

Syntax :
- var id : type-id := exp
Semantics :
- The type denoted by type-id must have already been declared.
- exp's result type must be the same as type-id.(If type-id is of a record type, then exp's result type can be nil)

Type Declaration

Syntax :
- tydec

There are three types of type declarations:

Named
Record
Array

Named Type

Syntax :
- type type-id1 = type-id2
Semantics :
- type-id2 must be a already defined type ID.
- type-id1 cannot be the same as type-id2.(Which means a self recursive type declaration)

Record

Syntax :
- type type-id1 { tyfields }
- tyfields -> empty
- tyfields -> id:type-id2 { ,id:type-id3 }
Semantics :
- Types denoted by type-ids in tyfields must have already been defined.

Array

Syntax :
- type type-id1 = array of type-id2
Semantics :
- type-id2 must be a already defined type ID.
- type-id1 cannot be the same as type-id2.(Which means a self recursive type declaration)

Subroutine Declaration

Syntax :
- fundec

In tiger, there are two kinds of subroutines:

Function
Procedure

Here, i use subroutine to describe what named funtion in C language, since functions in tiger mean C functions that have return value and procedures in tiger mean C functions that have no return value.

Procedure

Syntax :
- function id ( tyfields ) = exp
Semantics :
- Types in tyfields must have already been defined.
- exp must have no result type(namely no return value).

Function

Syntax :
- function id ( tyfields ) : type-id = exp
Semantics :
- Types in tyfields must have already been defined.
- Function return type type-id must have already been defined.
- The result type of exp must be the same with type-id.(If type-id is a record type, then the result type of exp can be nil)

Sources

Support for major semantic analysis:

mySemantic.h
mySemantic.c

Support for recursive declarations:

recursiveDecs.h
recursiveDecs.c

Support for break:

breakChecker.h
breakChecker.c

Support for for:

forChecker.h
forChecker.c

Support for Type, Variable and Function environment:

myEnvironment.h
myEnvironment.c

Support for type equality checking:

typeChecker.h
typeChecker.c

Support for error report:

myError.h
myError.c

Strategy problems in Error Detection

When looking through the mySemantic.c file, you will discover many boolean variables used. This is because i originally want to detect as many error as possible while parsing the tiger sources, and i use boolean variables to keep the results which will be used for later error reporting.

But, i finally found it meaningless. If you want to continue error detection while one was already found, then you must do error correction which is lacked in this compiler. So the whole error messages are mixed by some wrong's, and it may confuse users. However, users can definitely trust the first one.