Semantic Analysis - lgxZJ/Tiger-Compiler GitHub Wiki

Semantic Analysis

What does this phase do?

"The semantic analysis phase of a compiler connects variable definitions to their uses, checks that each expression has a correct type."

Actually, the excerption above is only a part of the whole in Tiger Book. But i think it describes this phase better, so i just let the rest go.

More deeply, the restriction given by Syntax Analysis phase is not enough since it's too general. Take a small code line written in C for example:

int func(int a) {...}

It's obviously a function definition, and its syntax may looks like this:

TypeId FuncId(TypeId VarId...) {...}

TypeID, FuncId, VarId all stand for identifiers, ... stand for more literal characters and (), {} stand for themselves.

We can see that the formats are loosely restricted by language syntax. For example, VarId are just IDs, so that any ID can be passed to the function no matter whether it stands for a variable. This is because language must have flexibility to allow developers to explore possibilities in code. However, for a language which has a type system, the correctness of types occurred must be checked. So that the compiler will throw an error when we pass a Int variable while a Short is expected.

In fact, much of the work in this phase is about type-checking.

What is environment?

Environment here stands for information collections. There are two major environments in tiger language:

  • Type Environment
  • Function and Variable Environment

A type environment gathers information about types and function and variable environment gathers information about functions and variables.

Why Type Environment

In static programming languages like c++, users have to given the variable types before declaring variables. Thus, compilers have to check if the given type is already defined and if it satisfy the context. Naturally, there is a necessary to keep all the types users declared so far for later checking.

Why Function and Variable Environment

Like type environment, we have to keep information about variables and functions.

For variables, we need only to keep their types. Because when passing variables to functions or used as expressions, we need to check it for type consistency.

For functions, we need to keep their parameter types and return types. Because when using them, we need to pass variables and their return values can be assigned to variables. Naturally, type checking occurs.

Recursive Declarations(Types or Functions)

One powerful feature tiger supports is recursive declarations. However, there are to kinds of recursions:

  1. Recursive
  2. Mutually recursive

Recursive

Recursive is a recursion which recursive itself. For example, the following type is a recursive type:

type intList = { hd : int, tl : intList }

This type contains a int field and a field whose type is itself. Functions are similar.

Mutually recursive

Generally, every type must be constructed by some already defined types or built-in types. However, mutually recursive types are exceptions.

Mutually recursive is a recursion declared by a consecutive sequence of type or function declarations without intervening non-type or non-function declarations. For example, the following is a valid mutually recursive type:

type tree = { key : int, children : treelist }
type treelist = { hd : tree, tl : treelist }

But the following is not a valid one(because type tree and treelist are separated by a variable declaration):

type tree = { key : int, children : treelist }
var varTemp = 0;
type treelist = { hd : tree, tl : treelist }

Each type recursion cycle must pass through a record or array type. Thus, the following is not a valid one:

type b = c
type c = b

implementation

Since recursive declarations contains types or function names we haven't completely defined yet, implement them in one pass is not possible. So, we have to do two passing, the first process the headers(names for types, parameters, return type and names for functions) of them and keep enough information, the second process the bodies(part in the right side of = for types, bodies for functions) of them.

Tiger Semantics

In the following descriptions, any words ends with id is a identifier, and any words starts with exp is an expression.

Expressions

There are many kinds of expressions in tiger:

  1. L-Values
  2. Function Call
  3. Nil
  4. Integer Literal
  5. String Literal
  6. Array Creation
  7. Record Creation
  8. Arithmetic
  9. Parentheses
  10. No Value
  11. Sequencing
  12. For
  13. If-Then-Else
  14. If-Then
  15. Comparison
  16. Boolean Operate
  17. Assignment
  18. Let
  19. While
  20. Negation
  21. Break

L-Values

There are three kind of left-values : Variable, Record Field and Array Subscript.

  • Syntax : lvalue
Variable
  • Syntax :

    • id
  • Semantics :

    • A variable ID.
Record Field
  • Syntax :

    • lvalue . id
  • Semantics :

    • lvalue is a left value which represents a record-type variable.
    • id is a field ID in record type denoted by lvalue.
Array Subscript
  • Syntax :

    • lvalue [ exp ]
  • Semantics :

    • lvalue is a left value which represents an array-type variable.
    • exp is an integer-result expression.

Function Call

  • Syntax :

    • id () or id (exp [,exp])
  • Note :

    • Function parameters are calculated in reverse order.
  • Semantics :

    • id is a function ID.
    • exp's result type must match the types of the calling function in order.(If the formal parameter type is a record type, then exp's result type can be nil)

Nil

  • Syntax :

    • nil
  • Semantics :

    • nil can only denote a value of record type.
    • nil must be used in a context where the type can be determined.

Integer Literal

None.

String Literal

None.

Array Creation

  • Syntax :

    • type-id [exp1] of exp2
  • Semantics :

    • type-id denotes an array type.
    • exp1's result type must be of integer type.
    • exp2's result type must be the same of array element.(If the array element type is a record type, then exp2's result type can be nil)

Record Creation

  • Syntax :

    • type-id { } or type-id { id=exp, {,id=exp}}
  • Semantics :

    • type-id denotes a record type.
    • ids must be field IDs of type type-id in order.
    • exps' result types must match the field types of type type-id in order.(If the field type is a record type, then exp's result type can be nil)

Arithmetic

  • Syntax :

    • exp + exp
    • exp - exp
    • exp / exp
    • exp * exp
  • Semantics :

    • exp's result must be of integer type.

Parentheses

  • Syntax :

    • ( exp )
  • Semantics :

    • exp is semantic-checked.

No Value

  • Syntax :

    • ()
  • Semantics :

    • () denotes no value and thus, no result type.

Sequencing

  • Syntax :

    • ( exp;exp;...exp )
  • Semantics :

    • All expressions are semantic-checked in order.
    • The result type of a sequence is the result type(if any) yielded by the last of the expressions.

For

  • Syntax :

    • for id := exp1 to exp2 do exp3
  • Semantics :

    • exp1 and exp2's result types must be of integer type.
    • exp3 must has no return value or return type.
    • id is a new variable ID implicitly declared by the for statement, whose scope covers only exp3. And it cannot be assigned to.

If-Then-Else

  • Syntax :

    • if exp1 then exp2 else exp3
  • Semantics :

    • exp1's result type must be of integer type.
    • exp2 and exp3's result type must be the same, or both have no result type.

If-Then

  • Syntax :

    • if exp1 then exp2
  • Semantics :

    • exp1's result type must be of integer type.
    • exp2 must has no result type.

Comparison

  • Syntax :

    • exp = exp
    • exp <> exp
    • exp > exp
    • exp < exp
    • exp >= exp
    • exp <= exp
  • Semantics :

    • Two exp must has same result type.
    • exps can be integer or string type expressions.
    • When operators are = or <>, exps can be of record type or array type.(When one exp is of record type, the result type of another can be nil)

Boolean Operate

  • Syntax :

    • exp & exp
    • exp | exp
  • Semantics :

    • exp must be of integer type.

Assignment

  • Syntax :

    • lvalue := exp
  • Semantics :

    • lvalue and exp's result type must be same.(If lvalue is of a record type, then exp's result type can be nil)

Let

  • Syntax :

    • let decs in expseq end
  • Note :

    • expseq is a sequence of zero or more expressions, separated by semicolons.
  • Semantics :

    • decs are all semantic-checked.
    • expseq are all semantic-checked.

While

  • Syntax :

    • while id := exp1 to exp2 do exp3
  • Note :

    • id is a implicitly declared variable whose scope is only the while expression.
  • Semantics :

    • The result type of exp1 and exp2 must be integer type.
    • exp3 must have no return value.

Negation

  • Syntax :

    • -exp
  • Semantics :

    • The result type of exp must be integer.

Break

  • Syntax :

    • break
  • Semantics :

    • break must be within a while or for.

Declarations

  • Syntax :
    • decs -> { dec }
  • Note :
    • { dec } stands for a possible empty sequence of dec)

There are three kinds of declarations:

  1. Variable Declaration
  2. Type Declaration
  3. Subroutine Declaration

Variable Declaration

  • Syntax :
    • vardec

Variable declarations have to formats:

  1. Short Form
  2. Long Form
Short Form
  • Syntax :

    • var id := exp
  • Semantics :

    • exp must has result type and can be nil.
Long Form
  • Syntax :

    • var id : type-id := exp
  • Semantics :

    • The type denoted by type-id must have already been declared.
    • exp's result type must be the same as type-id.(If type-id is of a record type, then exp's result type can be nil)

Type Declaration

  • Syntax :
    • tydec

There are three types of type declarations:

  1. Named
  2. Record
  3. Array
Named Type
  • Syntax :

    • type type-id1 = type-id2
  • Semantics :

    • type-id2 must be a already defined type ID.
    • type-id1 cannot be the same as type-id2.(Which means a self recursive type declaration)
Record
  • Syntax :

    • type type-id1 { tyfields }
    • tyfields -> empty
    • tyfields -> id:type-id2 { ,id:type-id3 }
  • Semantics :

    • Types denoted by type-ids in tyfields must have already been defined.
Array
  • Syntax :

    • type type-id1 = array of type-id2
  • Semantics :

    • type-id2 must be a already defined type ID.
    • type-id1 cannot be the same as type-id2.(Which means a self recursive type declaration)

Subroutine Declaration

  • Syntax :
    • fundec

In tiger, there are two kinds of subroutines:

  1. Function
  2. Procedure

Here, i use subroutine to describe what named funtion in C language, since functions in tiger mean C functions that have return value and procedures in tiger mean C functions that have no return value.

Procedure
  • Syntax :

    • function id ( tyfields ) = exp
  • Semantics :

    • Types in tyfields must have already been defined.
    • exp must have no result type(namely no return value).
Function
  • Syntax :

    • function id ( tyfields ) : type-id = exp
  • Semantics :

    • Types in tyfields must have already been defined.
    • Function return type type-id must have already been defined.
    • The result type of exp must be the same with type-id.(If type-id is a record type, then the result type of exp can be nil)

Sources

Support for major semantic analysis:

  • mySemantic.h
  • mySemantic.c

Support for recursive declarations:

  • recursiveDecs.h
  • recursiveDecs.c

Support for break:

  • breakChecker.h
  • breakChecker.c

Support for for:

  • forChecker.h
  • forChecker.c

Support for Type, Variable and Function environment:

  • myEnvironment.h
  • myEnvironment.c

Support for type equality checking:

  • typeChecker.h
  • typeChecker.c

Support for error report:

  • myError.h
  • myError.c

Strategy problems in Error Detection

When looking through the mySemantic.c file, you will discover many boolean variables used. This is because i originally want to detect as many error as possible while parsing the tiger sources, and i use boolean variables to keep the results which will be used for later error reporting.

But, i finally found it meaningless. If you want to continue error detection while one was already found, then you must do error correction which is lacked in this compiler. So the whole error messages are mixed by some wrong's, and it may confuse users. However, users can definitely trust the first one.

⚠️ **GitHub.com Fallback** ⚠️