général
curriculum vitae
contact

CARDIWEB
2007
2006
2005

Etudes
les projets
les sites web
les exposés

liens
OTB World
EPITA
PCN
CARDIWEB

 

COMPILATEUR TIGER

This document presents the EPITA version of the Tiger project. It was last updated 5 February 2003.

Copyright © 2000, 2001, 2002, 2003 Akim Demaille.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover texts and with the no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License."

Reports

Everything exposed in this document is expected to be known.


Node:Top, Next:, Up:(dir)

Compilation: Projects and Reports

This document was last edited on 5 February 2003. Make sure the sections have been updated for your promotion.

This document (split and beautiful or in one simple piece) details the various tasks the "Compilation" students must complete. It includes both the practical part (Tiger compiler) and the theoretical part (reports).


Node:Tarballs, Next:, Previous:Top, Up:Top

Tarballs

There are a few mandatory requirements over the tarballs.

Delivery Dates

The tarballs must be made via make distcheck, and put in the requested location (see Student Tarballs). This tarball will be copied at the exact moment of the first delivery date, and the list of project which were found/not found is reported in epita.cours.compile.

If your tarball was not fetched, send it immediately to yaka@epita.fr and akim@epita.fr, with [Tiger]: bardec_f-tc-n.tar.bz2 as Subject. If you are quick enough (within a couple of hours), there won't be penalties. Run this:

     mutt -c akim@epita.fr \
          -s "[Tiger]: bardec_f-tc-n.tar.bz2" \
          -a bardec_f-tc-n.tar.bz2 \
          yaka@epita.fr </dev/null
     

Warning: Do not use elm to send your tarballs. Each time someone tried, it failed. And anyway, just never use elm.

Within a couple of hours after delivery date, a program evaluates your tarball. The results are available on the web, as bardec_f-tc-n.log. You should read this file as soon as possible, do not wait for the marks to be computed to react. If you realize there were stupid mistakes (such as displaying /* escapes */ instead of /* escaping */), send immediately (the day of the first delivery date) another tarball. This will cost you a patch, which is typically -1.

If you have massive failures, there is a second delivery date, which costs super_patch, typically -3. Beyond the second delivery date, no additional change will be accepted. Not that the second delivery is not automated: you have to send immediately to yaka@epita.fr and akim@epita.fr, with [Tiger]: bardec_f-tc-n.tar.bz2 as Subject.

Given Tarballs

The naming scheme for provided tarballs is different from the scheme you must follow (see Student Tarballs). Our naming scheme looks like 2004-tc-2.0.tar.bz2. If we update the tarballs, they will be named 2004-tc-2.x.tar.bz2. But your tarball must be named login-tc-2.tar.bz2, even if you send a second version of your project.

We also provide some test cases. There are few of them, and you will have to write your own tests. Writing tests is part of the project. Do not just copy test cases from other groups, as you will not understand why they were written.

The initial test suite is available for download at tests.tgz. It contains the following directories:

good
This programs are correct.
scan
These programs have lexical errors.
parse
These programs have syntax errors.
type
These programs contain type mismatches.

Project Layout

Here is the mandatory layout of the tarball:

AUTHORS
In the top level of the distribution, there must be a file AUTHORS which contents is as follows:
          Fabrice Bardèche     <bardec_f@epita.fr>
          Jean-Paul Sartre     <sartre_j@epita.fr>
          Jean-Paul Deux       <deux_j@epita.fr>
          Jean-Paul Belmondo   <belmon_j@epita.fr>
          

The group leader is the first in the list. Do not include emails other than those of EPITA. I repeat: give the 6_1@epita.fr address. Note that the file AUTHORS is automatically distributed, but pay attention to the spelling.

ChangeLog
Optional. The list of the changes made in the compiler, with the dates and names of the people who worked on it. See the Emacs key binding C-x 4 a.
README
Various free information.
src/
All the sources are in this directory.
tests/
Test files. Do not include this directory in the tarball, I already have my own test files, there is no need for you to give it to me again. Save my disk!
src/tc
Your compiler.
src/tc.cc
Main entry, option processing. Called, the driver.
src/symbol/
Namespace symbol:
symbol.hh
The handling of the symbols.
table.hh
The handling of generic symbol tables, i.e., it is independent of functions, types and variables.

src/ast/
Namespace ast. The abstract syntax.
src/parse/
Namespace parse: parsetiger.yy, scantiger.ll, libparse.hh which prototypes what tc.cc needs to know about the module parse.
src/type/
Namespace type. Type checking.
libtype.hh
The interface for the driver.
types.hh
The definition of all the types. You are free to use whatever layout you wish (several files); we have a single types.hh file.
type-entry.hh
Definitions of type::TypeEntry, type::VarEntry, and type::FunEntry, used in type::TypeEnv to associate data to types, variables, and functions (obviously).
type-env.hh
The types environment, comprising three symbol tables: types, functions, and variables.

src/translate/
Namespace translate. Translation to intermediate code translation. It includes:
libtranslate.hh
The interface.
libtranslate.cc
The compiled module.
translation.hh
Helping routines to perform the translations. For instance, it contains Exp *simpleVar (const Access &access, const Level &level), Exp *callExp (const temp::Label &label, std::list<Exp *> args) etc. which are routines that produce some Tree::Exp.
translate-visitor.hh
Implements the class TranslateVisitor which performs the IR generation thanks to translation.hh. It must not be polluted with translation details: it is only coordinating the AST traversal with the invocation of translation routines. For instance, here is the translation of a ast::SimpleVar:
               virtual void visit (const SimpleVar& e)
               {
                 const Access &access = _env.var_access_get (e.name_get ());
                 _exp = simpleVar (access, *_level);
               }
               

Student Tarballs

Each group must provide a tarball. If bardec_f is the head of your group, the tarball must be ~bardec_f/c/rendu/tiger/bardec_f-tc-n.tar.bz2 where n is the number of the "release". The following commands must work properly:

     $ bunzip2 -cd bardec_f-tc-n.tar.bz2 | tar xvf -
     $ cd bardec_f-tc-n
     $ ./configure CC=gcc-3.2 CXX=g++-3.2
     $ make
     $ cd src
     $ ./tc /tmp/test.tig
     

where gcc-3.2 and g++-3.2 are the name of the GNU C and C++ Compiler 3.2 executables.

Your tarball must be done via make distcheck. If you just run make dist, then you might forget to include some files in the distribution. If you don't even run make dist, then not only some files might be missing, but you have no guarantee that the tarball will compile elsewhere (not to mention that we don't care about object files etc.).

Do not deliver the tests that we gave: we already have them.

Any tarball which is not built thanks to make distcheck (this is easy to see: they include files we don't want, and don't contain some files we need...) will be penalized with at least ### tarball_not_clean.

Code Sanity

The code you deliver must be clean. In particular, when some code is provided, and you have to fill in the blanks denoted by FIXME: Some code has been deleted.. Sometimes you will have to write the code from scratch.

In any case, dead code and dead comments must be removed. You are free to leave comments spotting places where you fixed a FIXME:, but never leave a fixed FIXME: in your code. Nor any irrelevant comment.

The official compiler for this project, is GNU C++ Compiler, 3.2 or higher.


Node:Evaluation, Next:, Previous:Tarballs, Up:Top

Evaluation

Some stages are evaluated only by a program, and others are evaluated both by humans, and a program.

Automated Evaluation

Each stage of the compiler will be evaluated by an automatic corrector. As soon as the tarball are delivered, the logs are available on http://www.lrde.epita.fr/~akim/compil, in the directory corresponding to your promotion and stage. For instance, 2004 students ought to read http://www.lrde.epita.fr/~akim/compil/2004/4/bardec_f-tc-4.log.

We stress that automated evaluation enforces the requirements: you must stick to what is being asked. For instance, for T3 it is explicitly asked to display something like:

     var /* escaping */ i : int := 2
     

so if you display any of the following outputs

     var i : int /* escaping */ := 2
     var i /* escaping */ : int := 2
     var /* Escapes */ i : int := 2
     

be sure to fail all the tests, even if the computation is correct.

If you find some unexpected errors (your project does compile with the reference compiler, some files are missing, your output is slightly incorrect etc.) immediately send a new tarball to yaka@epita.fr with [Tiger] as prefix of the subject. This corresponds to ### patch.

Do not wait for the final marks to be computed, this is extremely irritating, and doomed to failure. You must understand that (i) you increase our workload, and (ii) anyway this is the wrong approach, the Tiger Compiler is a big project which must be continuously improved.

If, anyway, you send a tarball to fix your problems long after the initial date, you will be flagged as ### super_late, which impact on the mark is quite bad...

During the Examination

When you are defending your projects, here are a few rules to follow:

Don't talk
Don't talk unless you are asked to: when a person is asked a question, s/he is the only one to answer. You must not talk to each other either: often, when one cannot answer a question, the question is asked to another member. It is then obvious why the members of the group shall not talk.
Don't touch the screen
Don't touch my display! You have nice fingers, but I don't need their prints on my screen.
Tell the truth
If there is something the examiner must know (someone did not work on the project at all, some files are coming from another group etc.), say it immediately, for, if we discover that by ourselves, you will be severely sanctioned.
Learn
It is explicitly stated that you can not have worked on a stage provided this was an agreement with the group. But it is also explicitly stated that you must have learned what was to be learned from that compiler stage, which includes C++ techniques, Bison and Flex mastering, object oriented concepts, design patterns and so forth.
Complain now!
If you don't agree with the notation, say it immediately. Private messages about "this is unfair: I worked much more than bardec_f but his grade is better than mine" are thrown away.

Conversely, there is something I wish to make clear: I, Akim, and the other examiners, will probably be harsh (maybe even very harsh), but this does not mean I disrespect you, or judge you badly.

You are here to defend your project and knowledge, I'm here to stress them, to make sure they are right. Learning to be strong under pressure is part of the exercise. Don't burst into tears, react! Don't be shy, that's not the proper time: you are selling me something, and I will never buy something from someone who cries when I'm criticizing his product.

You should also understand that human examination is the moment where we try to evaluate who, or what group, needs help. We are here to diagnose your project and provide solutions to your problems. If you know there is a problem in your project, but you failed to fix it, tell it to the examiner! Work with her/him to fix your project.

Human Evaluation

The point of this evaluation is to measure:

the quality of the code
How clean it is, amount of code duplication, bad hacks, standards violations (e.g., stderr is forbidden in proper C++ code) and so forth. It also aims at detecting cheaters, who will be severely punished (mark = -42).
the knowledge each member acquired
While we do not require that each member worked on a stage, we do require that each member (i) knows how the stage works and (ii) has perfectly understood the (C++, Bison etc.) techniques needed to implement the stage.

In results in an evaluation file (login-tc-stage.eval, e.g., bardec_f-tc-2.eval), structured as follows.

The format is free, except some lines starting with specific markers, when they are at the first column. The first marker is *, used to denote the login of the group head:

     * bardec_f
     ----------------------------------------
     2002-03-07: T2 par akim
     tc-check: Summary:
     tc-check: == Testing ./tc --parse-trace -l --ast-display ==
     Successes: 177/177

     Suppléments:
     - --stdin pour parser stdin.
     - Le parser fait attention à ses leaks, même sur les Symbol.
     - --gcc-ast
     - ChangeLog
     - Ne meurt pas quand on lui donne un répertoire en entrée.

     Le code est joli, et en 80 col !
     Plutôt que des listes vides, passent 0.
     

As you can see, the person who evaluated is allowed to put whatever comment comes to her mind. Specific features, specific failures, explanations to help the students to fix their project should be included. Each time an entry is made, there should be the date, and who wrote it (2002-03-07: T2 par akim).

Then, the mark of the project, aka, the note of the group:

     ### Note T2 = 20
     

and optionally:

### late
The code was delivered reasonably late (the same day).
### super_late
More than one day.
### patch
Within 24 hours from the first delivery date, the group submitted another (hopefully) fixed tarball.
### super_patch
Before the second delivery date, the group submitted another tarball.
### not_compile
It doesn't compile, and the examiner had to change the code.
### tarball_not_clean
Clear enough: bad name, bad content etc.
### cheater
The group has cheated. The examiner is then asked to write as many details as possible, and to flag the group as cheaters. The mark is likely to be negative, and exclusion from EPITA is possible.
### reevaluation
The group had a mark < 8, and is now being reexamined.
### bonus
The group has implemented additional features, such as XML output etc. and the project works properly. We do not care about extra features on top of a broken project.
Important note to the examiners: Note field.

The examiner should not take (too much) the automated tests into account to decide the mark. This is because the mark is computed later, taking this into account, so don't do it twice. Similarly, a very nice project, which is super late, shall be flagged as super late, but should have a good ## Note entry.

Important note to the examiners: broken tarballs.

If you fixed the tarball (### not_compile, or ### tarball_not_clean or whatever modification, you must run make distcheck again, and replace the tarball they delivered with the new one. Do not keep the old tarball, do not install it in a special place: just replace the first tarball with it, but say so in the eval file.

The rationale is simple: only tarballs pass the tests, and every tarball must be able to pass the tests. If you don't do that, then someone else will have to do it again.

The next bits is the evaluation of each member of the group:

     ----------------------------------------
     ## bardec_f:100:120
     Parser, locations, scanner.  A pratiquement e'crit tout T2.

     ##sartre_j:100:40
     AST, Visitor
     Pourtant ne comprend rien au Visitor.  Se fout de ma gueule.

     ##deux_j:80:80
     Pas de problème C++.
     FIXME: Sa note de T1 est à revoir (resoutenance).

     ##belmon_j:100:100
     Actions, les delete dans le parser.
     A implémenté la sortie en AST compatible avec GCC.
     

The formalism is ## login:bonus1:bonus2. If the stage is T2, then bonus1 refers to a bonus on T1, and bonus2 on T2. For T4, it covers T3 and T4, and so forth.

The values can be "negative" (80, 60, 40, 0), or positive if a member deserves it (100, 120, or even more if justifiable).

There should be information about who did what.

Marks Computation

Because the Tiger Compiler is a project with stages, the computation of the marks depends on the stages too. To spell it out explicitly:

A stage is penalized by bad results on tests performed for previous stages.

It means, for instance, that a T3 compiler will be exercised on T1, T2, and T3. If there are still errors on T1 and T2 tests, they will pessimize the result of T3 tests. The older the errors are, the more expensive they are.

As an example, here are the formulas to compute the global success rate of T3 and T5:

     global-rate-T3 := rate-T3 * (+ 2 * rate-T1
                                  + 1 * rate-T2) / 3
     global-rate-T5 := rate-T5 * (+ 4 * rate-T1
                                  + 3 * rate-T2
                                  + 2 * rate-T3
                                  + 1 * rate-T4) / 10
     

Because a project which fail half of the time is not a project that deserves half of 20, the global-rate is elevated to 1.7 before computing the mark:

     mark-T3 := roundup (power (global-rate-T3, 1.7) * 20 - malus-T3, 1)
     

where roundup (x, 1) is x rounded up to one decimal (roundup (15, 1) = 15, roundup (15.01, 1) = 15.1).

When the project is also evaluated by a human, power is not used. Rather, the success rate modifies the mark given by the examiner:

     mark-T2 := roundup (eval-T2 * global-rate-T2 - malus-T2, 1)
     

Reevaluation

It happens that some groups have big problems with their project. Here is how to solve these issues.

But first of all, you must know that cheating, stealing another group's project, is not a solution, because:

  • you won't learn. Do not forget that the Tiger Compiler is only an exercise: having a good grade is less important than understanding and being efficient when you will be hired by a company.
  • we will certainly notice you are not able to answer our questions.
  • when we find cheaters, they pay it very severely (### cheater).

Because the Tiger Compiler is a long project, you continuously have to improve it, but marks starting from 8 to higher are definitive. If your mark is less than 8, can be re-examined: you must provide a better tarball, and in the case of human evaluated stages, another audition will be made.

This new audition will be flagged as ### reevaluation. It cannot be better than 12/20, even if your project is excellent: a good project out of date cannot be judged better than a reasonable good project on time.

Pay attention that when providing an updated tarball for reexamination of stage n, it must follow the naming scheme of stage n, even if the contents goes further. For instance, it is perfectly valid to ask for a T2 reevaluation with a compiler that implements T4, but the tarball must be bardec_f-tc-2.tar.bz2.

You have to ask for a reevaluation, we will not look after you.


Node:Compiler Stages, Next:, Previous:Evaluation, Up:Top

Compiler Stages

The compiler will be written in several steps, described below.


Node:T0, Next:, Up:Compiler Stages

T0, Naive Scanner and Parser

This section has been updated for EPITA-2005.

T0 is a weak form of T1: the scanner and the parser are written, but there is a set of simplifications:

the hierarchy
We don't need several directories, you can program in the top level of the package.
the build
The GNU Build System is not used: there is no need for Autoconf, Automake etc.
no driver
The driver is reduced to its simplest form:
     int
     main (int argc, const char *argc[])
     {
       assert (argc == 1);
       yyin = fopen (argv[1]);
       assert (yyin);
       return !!yyparse ();
     }
     

i.e., there is no support for options at all.

SCAN, PARSE
There is no support for options, but that does not mean that you cannot use the tracing/debugging features from Flex/Bison. It is required that you bind tracing to the environment variables SCAN and PARSE. I.e., running
     PARSE=1 ./tc foo.tig
     

will set yydebug to 1, which causes the traces of the parsing to be displayed.

YYPRINT
There is no requirement to implement YYPRINT support.
yylval
All the values are supported: strings, integers and even symbols. Nevertheless, symbols are returned as plain strings for the time being: the class symbol::Symbol is to be implemented in T1.


Node:T0 Code to Write, Up:T0

T0 Code to Write

You must write

scantiger.ll
The scanner.
parsetiger.yy
The parser, and maybe main if you wish.
tc.cc (optionally)
The driver, i.e., main. Putting it into parsetiger.yy is OK in T0.
Makefile
This file is mandatory. Running make must build the binary tc.

The requirements on the tarball are the same as usual, see Tarballs.


Node:T1, Next:, Previous:T0, Up:Compiler Stages

T1, Scanner and Parser

T1 delivery is Friday, February 14th 2003 at noon.
This section is updated for EPITA-2005.

Scanner and parser are properly running, but the abstract syntax tree is not built yet. Differences with T0 include:

GNU Build System
Autoconf, Automake are used.
Options, Tasks
The compiler supports basic options via in the Task module.
locations
The locations are computed are properly reported in the error messages


Node:T1 Samples, Next:, Up:T1

T1 Samples

The only information the compiler will give is about lexical and syntax errors.

If there are no errors, the compiler shuts up, and exits successfully:

     /* an array type and an array variable */
     let
       type  arrtype = array of int
       var arr1 : arrtype := arrtype [10] of 0
     in
       arr1[2]
     end
     
File 1: test01.tig
     $ tc test01.tig

     =>0
     
Example 2: tc test01.tig

If there are lexical errors, the exit status is 2, and a an error message is output on the standard error output. Note that its format is standard: file, (precise) location, and then the message.

     1
      /* This comments starts at /* 2.2 */
     
File 3: unterminated-comment.tig
     $ tc unterminated-comment.tig

     error-->unterminated-comment.tig:2.2-3.0: unexpected end of file in a comment
     =>2
     
Example 4: tc unterminated-comment.tig

If there are syntax errors, the exit status is set to 3:

     let var a : nil := ()
     in
       1
     end
     
File 5: type-nil.tig
     $ tc type-nil.tig

     error-->type-nil.tig:1.13-15: syntax error, unexpected "nil", expecting "identifier"
     error-->Parsing Failed
     =>3
     
Example 6: tc type-nil.tig

If there are errors which are non lexical, nor syntactic (Windows will not pass by me):

     $ tc C:/TIGER/SAMPLE.TIG

     error-->cannot open `C:/TIGER/SAMPLE.TIG': No such file or directory
     =>1
     
Example 7: tc C:/TIGER/SAMPLE.TIG

The option --parse-trace, which relies on Bison's %debug directive, and the use of YYPRINT, must work properly:

     $ cat foo.tig
     a + "a"
     $ ./tc --parse-trace foo.tig
     Starting parse
     Entering state 0
     Reading a token: Next token is 258 (ID a)
     Shifting token 258 (ID), Entering state 2
     Reading a token: Next token is 270 (PLUS)
     Reducing via rule 74 (line 318), ID  -> varid
     state stack now 0
     Entering state 16
     Reducing via rule 34 (line 196), varid  -> lvalue
     state stack now 0
     Entering state 13
     Next token is 270 (PLUS)
     Reducing via rule 33 (line 192), lvalue  -> exp
     state stack now 0
     Entering state 12
     Next token is 270 (PLUS)
     Shifting token 270 (PLUS), Entering state 30
     Reading a token: Next token is 257 (STRING a)
     Shifting token 257 (STRING), Entering state 1
     Reducing via rule 15 (line 151), STRING  -> exp
     state stack now 0 12 30
     Entering state 65
     Reading a token: Now at end of input.
     Reducing via rule 27 (line 182), exp PLUS exp  -> exp
     state stack now 0
     Entering state 12
     Now at end of input.
     Reducing via rule 1 (line 106), exp  -> program
     state stack now 0
     Entering state 159
     Now at end of input.
     Shifting token 0 ($), Entering state 160
     Now at end of input.
     $ echo $?
     =>0
     

Note that (i), it cannot see that the variable is not declared nor that there is a type checking error, since type checking... is not implemented, and (ii), the output might be slightly different, depending upon the version of Bison you use. But what matters is that one can see the items: ID a, STRING a.


Node:T1 Given Code, Next:, Previous:T1 Samples, Up:T1

T1 Given Code

Some code is provided: 2005-tc-1.0.tar.bz2. It contains:

src/common.hh
Used throughout the project.
src/tc.cc
The compiler driver (option handling etc.).
src/parse/scantiger.ll
The scanner.
src/parse/parsetiger.yy
The parser.
src/ast/position.hh
Keeping track of a point (cursor) in a file.
src/ast/location.hh
Keeping track of a range (two cursors) in a (or two) file.


Node:T1 Code to Write, Next:, Previous:T1 Given Code, Up:T1

T1 Code to Write

Be sure to read Flex and Bison documentations and tutorials, see Flex & Bison.

src/scantiger.ll
The scanner must be completed. It must be able to read strings, identifiers etc. Strings and identifiers will be stored as C++ std::string. See the following code for the basics.
          ...
          \"          yylval->str = new std::string (); BEGIN STATE_STRING;

          <STATE_STRING>{ /* Handling of the strings.  Initial " is eaten. */
               \" {
                 BEGIN INITIAL;
                 return STRING;
               }
          ...
               \\x[0-9a-fA-F]{2}  {
                 yylval->str->append (1, strtol (yytext + 2, 0, 16));
               }
          ...
          }
          

The locations are tracked.

src/parsetiger.yy
The grammar must be complete, but do not implement actions. You have to implement support for --parse-trace (see T1 Samples). Pay special attention to the display of strings and identifiers.

Bison will certainly complain because of a type clash for some actions. For instance, if you have given a type to STRING, but none to exp, then it will choke on:

          exp: STRING;
          

because it actually means

          exp: STRING { $$ = $1; };
          

which is not type coherent. So write this instead:

          exp: STRING {};
          

src/ast/position.hh
The class ast::Position is completed.
src/ast/location.hh
The class ast::Location must be completed.
src/symbol/symbol.hxx
The class symbol::Symbol keeps a single copy of identifiers, so that (i) we save space, and (ii) symbol comparison is fast. The file src/symbol/symbol.hh describes the interface of the class symbol::Symbol, but the implementation is to be written in src/symbol/symbol.hxx.


Node:T1 FAQ, Previous:T1 Code to Write, Up:T1

T1 FAQ

Options
The first version of argp we used (1.1) was buggy: on ./tc foo.tig -A, it considered -A was a file, not an option. This bug is now fixed. The patch is from Niels Möller:
     diff -u -r1.16 -r1.17
     --- argp-parse.c	18 Feb 2001 22:40:03 -0000	1.16
     +++ argp-parse.c	4 Feb 2003 19:52:30 -0000	1.17
     @@ -1021,6 +1021,8 @@
      	  *arg_ebadkey = 1;
      	  if (parser->first_nonopt != parser->last_nonopt)
      	    {
     +	      exchange(parser);
     +
      	      /* Start processing the arguments we skipped previously. */
      	      parser->state.next = parser->first_nonopt;

     

Go into argp/, run patch -p0, paste the patch, type <CTRL>-d.


Node:T2, Next:, Previous:T1, Up:Compiler Stages

T2, Building the Abstract Syntax Tree

This section has been updated for EPITA-2004. Last update: 2002-02-14.

At the end of this stage, the compiler can build abstract syntax trees of Tiger programs and pretty-print them.


Node:T2 Samples, Next:, Up:T2

T2 Samples

Here are a few examples of expected features.

The parser builds abstract syntax trees that can be output by a pretty-printing module:

     /* define a recursive function */
     let
       /* calculate n! */
       function fact (n : int) : int =
         if  n = 0
           then 1
           else n * fact (n - 1)
     in
       fact (10)
     end
     
File 8: simple-fact.tig
     $ tc --ast-display simple-fact.tig

     
     /* == Abstract Syntax Tree. == */
     let
        function fact (n : int) : int =
           if (n = 0)
              then 1
              else (n * fact ((n - 1)))
     in
        fact (10)
     end
     
Example 9: tc -ast-display simple-fact.tig

The output from your pretty-printer must be valid Tiger code, and be equivalent to the input.

By valid, we mean that any Tiger compiler must be able to parse with success your output. Pay attention to the banners such as == Abstract...: you should use comments: /* == Abstract... */. There is one exception: you are not asked to escape quotes and backslashes when printing a literal string.

By equivalent, we mean that except for syntactic sugar, the output and the input are equal. Syntactic sugar refers to &, |, unary -, and in some cases if then:

     % cat boolean-1.tig
     1 = 1 & 2 = 2
     % ./tc --ast-display boolean-1.tig
     /* == Abstract Syntax Tree. == */
     if  (1 = 1)
        then  (2 = 2)
        else 0
     

For loops must be properly displayed, i.e., although we use a ast::VarDec for the index of the loop, you must not display var:

     /* valid for and let */
     let
             var a := 0
     in
             for i :=0 to 100 do (a :=a+1; ())
     end
     
File 10: for-loop.tig
     $ tc --ast-display for-loop.tig

     
     /* == Abstract Syntax Tree. == */
     let
        var a := 0
     in
        for i := 0 to 100 do
           (
              a := (a + 1);
              ()
           )
     end
     
Example 11: tc -ast-display for-loop.tig


Node:T2 Given Code, Next:, Previous:T2 Samples, Up:T2

T2 Given Code

Some code is provided: 2004-tc-2.0.tar.bz2. It contains:

src/ast
Implementation of the abstract syntax tree. The file README gives an overview of the involved class hierarchy.
src/ast/location.hh, position.hh
Classes used to hold locations computed from Tiger programs.
src/ast/visitor.hh
Abstract base class of the compiler's visitor hierarchy.
src/ast/default-visitor.hh
Implementation of the DefaultVisitor class, which walks the abstract syntax tree, doing nothing. It is mainly used as a basis for deriving other visitors.
src/ast/print-visitor.hh
Implementation of the PrintVisitor class, which performs pretty-printing in the tiger compiler.
src/symbol.*
Have a look at how this class is implemented. Ask questions if you don't understand it. You will not be given questions about it in T2, but T3, there will be.


Node:T2 Code to Write, Next:, Previous:T2 Given Code, Up:T2

T2 Code to Write

What is to be done:

src/parse/scantiger.ll
The scanner must now use symbol::Symbol instead of std::string for identifiers. Of course, the parser must be adjusted too.

The scanner must be updated to keep track of locations of tokens in Tiger programs. To adjust your scanner, you are strongly encouraged to use YY_USER_ACTION, and also the yylex prologue:

          ...
          %%
          %{
            // Everything here is run each time yylex is invoked.
          %}
          "if"    return IF;
          ...
          %%
          ...
          

Have a look at the scanner and parser chapters of this draft.

src/parse/parsetiger.yy
The parser must be updated to ensure error recovery. The grammar must be changed to process declarations by chunks.

In Tiger, the following program is invalid:

          let function foo () = ()
              function foo () = ()
              var      foo    := 0
          in
            ()
          end
          

while the following code is valid:

          let function foo () = ()
              var      foo    := 0
              function foo () = ()
          in
            ()
          end
          

this is because declarations are cut in "chunks" of declarations of the same kind. In the first example, there are two chunks: one chunk of two function declarations and one variable declaration. In the second example: there are three chunks: one chunk containing only one function declaration, one with one variable declaration, and the third one, with a single function declaration again.

The rule is "a chunk cannot define twice the same name".

In order to implement this easily, you must adjust your grammar so that declarations are parsed by chunks. Pay special attention to the implementation of ast::FunctionDecs, ast::VarDecs, and ast::TypeDecs (which are all implemented thanks to ast::AnyDecs): they are these chunks of declaration. Therefore, an ast::LetExp uses a list of chunks.

src/ast
The abstract syntax tree module must be completed. Basically, this means there should remain no FIXME: anywhere in the code we gave. Several files are missing (fieldvar.hh nilexp.hh intexp.hh stringexp.hh callexp.hh assignexp.hh whileexp.hh breakexp.hh arrayexp.hh). See src/ast/README for additional information on the missing classes.
src/ast/location.hh, position.hh
The classes Position and Location must be completed.
src/ast/default-visitor.hh
The DefaultVisitor class must be completed, and must be able to walk whole abstract syntax trees. Do not forget that your DefaultVisitor must be a sound basis for your further work on the Tiger compiler.
src/ast/print-visitor.hh
The PrintVisitor class must be completed, and able to pretty-print any Tiger program.


Node:T2 FAQ, Previous:T2 Code to Write, Up:T2

T2 FAQ

bison
Be sure to read its dedicated section: Flex & Bison.
escapes (escapes_get, etc.)
These are place holders for T3. You don't have to complete them. Actually, the easiest is probably just to comment these functions out. Or better yet: #if 0/#endif.


Node:T3, Next:, Previous:T2, Up:Compiler Stages

T3, Computing the Escaping Variables

This section was updated for Tiger 2004. The project will be taken on Friday, March 15th, at noon.

At the end of this stage, the compiler must be able to compute and display the escaping variables. Be sure to read the chapter "Escapes" in the lecture notes. Its parser is equipped with error recovery.


Node:T3 Samples, Next:, Up:T3

T3 Samples

This example demonstrates the computation and display of escaping variables/formals.

     let
        var escaping := "I rule the world!"
        var not_escaping := "Peace on Earth for humans of good will."
        function print_escaping () =
           print (escaping)
     in
        (not_escaping; print_escaping ())
     end
     
File 12: variable-escapes.tig
     $ tc --escapes-display variable-escapes.tig

     
     /* == Escapes. == */
     let
        var /* escaping */ escaping := "I rule the world!"
        var /* escaping */ not_escaping := "Peace on Earth for humans of good will."
        function print_escaping () =
           print (escaping)
     in
        (
           not_escaping;
           print_escaping ()
        )
     end
     
Example 13: tc -escapes-display variable-escapes.tig
     $ tc --ast-display --escapes-compute --escapes-display variable-escapes.tig

     
     /* == Abstract Syntax Tree. == */
     let
        var escaping := "I rule the world!"
        var not_escaping := "Peace on Earth for humans of good will."
        function print_escaping () =
           print (escaping)
     in
        (
           not_escaping;
           print_escaping ()
        )
     end
     /* == Escapes. == */
     let
        var /* escaping */ escaping := "I rule the world!"
        var not_escaping := "Peace on Earth for humans of good will."
        function print_escaping () =
           print (escaping)
     in
        (
           not_escaping;
           print_escaping ()
        )
     end
     
Example 14: tc -ast-display -escapes-compute -escapes-display variable-escapes.tig

You are strongly encouraged to run your compiler on merge.tig and to study its output.

Another part of T3 is the improvement of your parser: it must be robust to some forms of errors. On the following input:

     (
       1;
       (2, 3);
       (4, 5);
       6
     )
     

it finds two errors: it does not stop on the first one:

     % ./tc error.tig
     error-->error.tig:3.5: parse error, unexpected ",", expecting ";"
     error-->error.tig:4.5: parse error, unexpected ",", expecting ";"
     

Of course, the exit status still reveals the parse error.

Be sure that your error recovery does not break the rest of the compiler...


Node:T3 Code To Write, Previous:T3 Samples, Up:T3

T3 Code To Write

src/parse/parsetiger.yy
Implement error recovery. There should be at least three uses of the token error. Read the Bison documentation about it.
src/parse/parsetc-class.hh
The current version of Bison no longer outputs this file. Be sure to remove it from your project: src/parse/parsetiger.hh alone is enough.
ast::VarExp
What is it used for? What is its meaning? Find another means to do the same things, but without it. Hence destroy it, and remove its file: ast must no longer use it.
ast::PrintVisitor
Be sure to display the /* escaping */ flag where needed, and only where needed. If you don't pay attention, you might display meaningless flags due to implementation details.
symbol::Table< class Entry_T >
This item is irrelevant for Tiger 2004, as the code was accidently delivered to you. Nevertheless, if you're willing to understand templates in C++, you are encouraged to try writing your own version, without taking inspiration from the code we gave.

Write the class template symbol::Table in src/symbol/table.hh which is a table of symbols dedicated to storing some data which type is Entry_T * (i.e., it maps a symbol::Symbol to an Entry_T *). You are encouraged to implement something simple, based on stacks (see std::list) of maps (see std::map). We used to promote hash tables, but STL does not include hash tables: this is a GNU (SGI actually) extension.

symbol::Table is implemented as a class template because we will use it with several other kinds of Entry_T in both the type::TypeVisitor and translate::TranslateVisitor.

symbol::Table must provide this interface:

scope_begin () void
Open a new scope.

scope_end () void
Close the last scope, forgetting everything since the latest scope_begin ().

put (Symbol key, Entry_T & value) void
Associate value to key in the current scope.

get (Symbol key) const Entry_T *
If key was associated to some Entry_T in the open scopes, return the most recent insertion. Otherwise return the empty pointer.

print (std::ostream & ostr) const void
Send the content of this table on ostr in a readable manner, the top of the stack being displayed last.

ast::EscapeVisitor
Write the class ast::EscapeVisitor in src/ast/escape-visitor.hh.

You are suggested to implement three additional classes in this file:

Definition
An abstract class which is used to instantiate the template class symbol::Table into Table <Definition>.

escape_set (void) virtual void
Sets the escape to true.

int _depth Variable
Depth at which this object has been created.

depth_get () const int
Returns the depth associated to this Definition object.

VariableDefinition
Inherits from Definition. It has one additional attribute, a VarDec &. The method escape_set is implemented, and when invoked, set the escapes flags of the corresponding VarDec.
FormalDefinition
Inherits from Definition. To be designed by yourself. Do not forget that the ast class used to register formals is used elsewhere, and it would be a pity that your implementation makes no difference... Be sure to write a test that verifies that your implementation is not abused. I have one such test...

Equip ast
All the sites where variables and formals (i.e., the arguments of the functions being defined, not being used) are introduced must be equipped with the escape_get and escape_set methods. Most probably the code was already given, and is using const_casts; try to use mutable instead.

Modify the code so that each definition of an escaping variable/formal is preceded by the comment /* escaping */ if the flag display_escapes_p is true. See the item "Driver" for an example.

Driver
The driver, src/tc.cc, must be enhanced to support two additional options: --escapes-compute, which triggers the ast::EscapeVisitor, and --escapes-display, which sets display_escapes_p to true. This is not a modification of --ast-display, it is a genuine second ast::PrintVisitor which is invoked.

Please, note that --escapes-display does not imply --escapes-compute (see T3 Samples).


Node:T4, Next:, Previous:T3, Up:Compiler Stages

T4, Type Checking

T4 delivery is Friday, April 12th 2002 at noon.
At the end of this stage, the compiler type checks Tiger programs. Clear error messages are required.


Node:T4 Samples, Next:, Up:T4

T4 Samples

Here are a few examples:

     $ echo '1 + "2"' | ./tc -
     $ echo '1 + "2"' | ./tc - --types-check
     error-->standard input:1.1-7: type mismatch
     error-->  right operand type: string
     error-->  expected type: int
     
     $ echo '1 + a' | ./tc -
     $ echo '1 + a' | ./tc - --types-check
     error-->standard input:1.5: undeclared variable: a
     
     $ echo 'if 1 then 2' | ./tc -
     $ echo 'if 1 then 2' | ./tc - --types-check
     error-->standard input:1.1-11: type mismatch
     error-->  then clause type: int
     error-->  else clause type: void
     


Node:T4 Given Code, Next:, Previous:T4 Samples, Up:T4

T4 Given Code

src/type/libtype.hh
src/type/libtype.cc
The interface of the Type module. It exports a simple procedure, type_check.
src/type/types.hh
The definition of all the types. The type type::Nil is given in full.
src/type/type-env.hh
The environment used by the type::TypeVisitor.


Node:T4 Code to Write, Previous:T4 Given Code, Up:T4

T4 Code to Write

What is to be done.

src/type
New directory. Don't forget to update the SUBDIRS variable in src/Makefile.am, and the AC_CONFIG_FILES invocation in configure.ac.
src/type/types.hh
type::String, type::Int, and type::Void are to be implemented. It's dead easy. Using templates would be particularly appreciated to factor the code between the four singleton classes.

type::Named is almost entirely given.

type::Array is even simpler than the four Singletons.

type::Record is somewhat incomplete.

Pay attention to the implementation of type::Type::assignable_to and type::Type::comparable_to.

src/type/type-entry.hh
This file is really a empty nutshell, so we give it complete so that you concentrate on things that matter. Nonetheless you will be asked questions on this file, so have a look at it.
src/type/type-env.hh
The constructor of type::TypeEnv must be completed: it must fill the environment with the definition of builtin types and functions. See the Tiger Reference Manual.

The handling of types is left as an example, you still have to implement the variables and functions support.

type::TypeVisitor
Of course this is the most tricky part. I hope there are enough comments in there so that you understand what is to be done. Please, post your questions and help me improve it.
Driver
The driver must run the type checker when given --types-check. Note the plural, it spells --types-check, not --type-check.


Node:T5, Next:, Previous:T4, Up:Compiler Stages

T5, Translating to the High Level Intermediate Representation

At the end of this stage, all the errors of previous stages must have been fixed, and the compiler translates the AST into the high level intermediate representation, HIR for short.

T5 first delivery is Friday, June 14th, at noon.
Second delivery date is Monday, June 17th, at noon.


Node:T5 Samples, Next:, Up:T5

T5 Samples

T5 can be started (and should be started if you don't want to finish it in a hurry) by first making sure your compiler can handle code that uses no variables. Then, you can complete your compiler to support more and more Tiger features.

T5 Primitive Samples

This example is probably the simplest Tiger program.

     0
     
File 15: 0.tig
     $ tc --hir-display 0.tig

     
     /* == High Level Intermediate representation. == */
     # Routine: Main Program
     label l`Main'
     # Prologue
     # Body
     sxp
         const 0
     # Epilogue
     label end

     
Example 16: tc -hir-display 0.tig

You should then probably try to make more difficult programs with literals only. Arithmetics is one of the easiest tasks.

     1 + 2 * 3
     
File 17: arith.tig
     $ tc --hir-display arith.tig

     
     /* == High Level Intermediate representation. == */
     # Routine: Main Program
     label l`Main'
     # Prologue
     # Body
     sxp
         binop (+)
             const 1
             binop (*)
                 const 2
                 const 3
     # Epilogue
     label end

     
Example 18: tc -hir-display arith.tig

You should use havm to exercise your output.

     $ tc --hir-display arith.tig >arith.hir

     =>0
     
Example 19: tc -hir-display arith.tig >arith.hir
     $ havm arith.hir

     =>0
     
Example 20: havm arith.hir

Unfortunately, without actually printing something, you won't see the final result, which means you need to implement calls. Fortunately, you can ask havm for a verbose execution:

     $ havm --trace arith.hir

     error-->call ( name Main )
     error-->const 1
     error-->const 2
     error-->const 3
     error-->binop (*) 2 3
     error-->binop (+) 1 6
     error-->sxp 7
     
Example 21: havm -trace arith.hir

If you look carefully, you will find an sxp 7 in there...

Then you are encouraged to implement control structures.

     if 101 then 102 else 103
     
File 22: if-101.tig
     $ tc --hir-display if-101.tig

     
     /* == High Level Intermediate representation. == */
     # Routine: Main Program
     label l`Main'
     # Prologue
     # Body
     sxp
         eseq
         seq
             cjump ne
                 const 101
                 const 0
                 name l0
                 name l1
             label l0
             move
                 temp t0
                 const 102
             jump
                 name l2
             label l1
             move
                 temp t0
                 const 103
             jump
                 name l2
             label l2
         seq end
             temp t0
     # Epilogue
     label end

     
Example 23: tc -hir-display if-101.tig

And even more difficult control structure uses:

     while 101 & 102
       do (if 103 then break)
     
File 24: while-101.tig
     $ tc --hir-display while-101.tig

     
     /* == High Level Intermediate representation. == */
     # Routine: Main Program
     label l`Main'
     # Prologue
     # Body
     seq
         label l7
         cjump ne
             eseq
             seq
                 cjump ne
                     const 101
                     const 0
                     name l0
                     name l1
                 label l0
                 move
                     temp t0
                     const 102
                 jump
                     name l2
                 label l1
                 move
                     temp t0
                     const 0
                 jump
                     name l2
                 label l2
             seq end
                 temp t0
             const 0
             name l8
             name l3
         label l8
         seq
             cjump ne
                 const 103
                 const 0
                 name l4
                 name l5
             label l4
             jump
                 name l3
             jump
                 name l6
             label l5
             sxp
                 const 0
             jump
                 name l6
             label l6
         seq end
         jump
             name l7
         label l3
     seq end
     # Epilogue
     label end

     
Example 25: tc -hir-display while-101.tig
T5 Builtin Calls Samples

But the game becomes more interesting when you implement function calls (which is easier than compiling functions). print_int is probably the first builtin to implement:

     (print_int (101); print ("\n"))
     
File 26: print-101.tig
     $ tc --hir-display print-101.tig > print-101.hir

     =>0
     
Example 27: tc -hir-display print-101.tig > print-101.hir
     $ havm print-101.hir

     
     101
     
Example 28: havm print-101.hir

Complex values, arrays and records, also need calls to the runtime system:

let type list = { h: int, t: list }
    var list := list { h = 1, t = list { h = 2, t = nil } }
in
  print_int (list.t.h); print ("\n")
end
File 29: print-list.tig
     $ tc --hir-display print-list.tig

     
     /* == High Level Intermediate representation. == */

     label l0
     	"
     "
     # Routine: Main Program
     label l`Main'
     # Prologue
     move
         temp t2
         temp fp
     move
         temp fp
         temp sp
     move
         temp sp
         binop (-)
             temp sp
             const 4
     # Body
     seq
         move
             mem
                 temp fp
             eseq
             seq
                 move
                     temp t1
                     call
                         name l`malloc'
                         const 8
                     call end
                 move
                     mem
                         binop (+)
                             temp t1
                             const 0
                     const 1
                 move
                     mem
                         binop (+)
                             temp t1
                             const 4
                     eseq
                     seq
                         move
                             temp t0
                             call
                                 name l`malloc'
                                 const 8
                             call end
                         move
                             mem
                                 binop (+)
                                     temp t0
                                     const 0
                             const 2
                         move
                             mem
                                 binop (+)
                                     temp t0
                                     const 4
                             const 0
                     seq end
                         temp t0
             seq end
                 temp t1
         seq
             sxp
                 call
                     name l`print_int'
                     mem
                         binop (+)
                             mem
                                 binop (+)
                                     mem
                                         temp fp
                                     const 4
                             const 0
                 call end
             sxp
                 call
                     name l`print'
                     name l0
                 call end
         seq end
     seq end
     # Epilogue
     move
         temp sp
         temp fp
     move
         temp fp
         temp t2
     label end

     
Example 30: tc -hir-display print-list.tig
     $ tc --hir-display print-list.tig > print-list.hir

     =>0
     
Example 31: tc -hir-display print-list.tig > print-list.hir
     $ havm print-list.hir

     
     2
     
Example 32: havm print-list.hir
T5 Samples with Variables

Here is an example which demonstrates the usefulness of information about escapes: in the first case, all the variables are expected to escape, hence are stored on the stack, while in the second case, we know they don't escape, hence they can be stored in a temporary.

     let var a := 1
         var b := 2
         var c := 3
     in
       a := 2;
       c := a + b + c;
       print_int (c);
       print ("\n")
     end
     
File 33: vars.tig
     $ tc --hir-display vars.tig

     
     /* == High Level Intermediate representation. == */

     label l0
     	"
     "
     # Routine: Main Program
     label l`Main'
     # Prologue
     move
         temp t0
         temp fp
     move
         temp fp
         temp sp
     move
         temp sp
         binop (-)
             temp sp
             const 12
     # Body
     seq
         move
             mem
                 temp fp
             const 1
         move
             mem
                 binop (+)
                     temp fp
                     const -4
             const 2
         move
             mem
                 binop (+)
                     temp fp
                     const -8
             const 3
         seq
             move
                 mem
                     temp fp
                 const 2
             move
                 mem
                     binop (+)
                         temp fp
                         const -8
                 binop (+)
                     binop (+)
                         mem
                             temp fp
                         mem
                             binop (+)
                                 temp fp
                                 const -4
                     mem
                         binop (+)
                             temp fp
                             const -8
             sxp
                 call
                     name l`print_int'
                     mem
                         binop (+)
                             temp fp
                             const -8
                 call end
             sxp
                 call
                     name l`print'
                     name l0
                 call end
         seq end
     seq end
     # Epilogue
     move
         temp sp
         temp fp
     move
         temp fp
         temp t0
     label end

     
Example 34: tc -hir-display vars.tig
     $ tc --escapes-compute --hir-display vars.tig

     
     /* == High Level Intermediate representation. == */

     label l0
     	"
     "
     # Routine: Main Program
     label l`Main'
     # Prologue
     # Body
     seq
         move
             temp t0
             const 1
         move
             temp t1
             const 2
         move
             temp t2
             const 3
         seq
             move
                 temp t0
                 const 2
             move
                 temp t2
                 binop (+)
                     binop (+)
                         temp t0
                         temp t1
                     temp t2
             sxp
                 call
                     name l`print_int'
                     temp t2
                 call end
             sxp
                 call
                     name l`print'
                     name l0
                 call end
         seq end
     seq end
     # Epilogue
     label end

     
Example 35: tc -escapes-compute -hir-display vars.tig
     $ tc --escapes-compute --hir-display vars.tig >vars.hir

     =>0
     
Example 36: tc -escapes-compute -hir-display vars.tig >vars.hir
     $ havm vars.hir

     
     7
     
Example 37: havm vars.hir

The second example uses a simple function.

     let function fact (i: int) : int =
         if i = 0 then 1
                  else i * fact (i - 1)
     in
       print_int (fact (15));
       print ("\n")
     end
     
File 38: fact15.tig
     $ tc --hir-display fact15.tig

     
     /* == High Level Intermediate representation. == */
     # Routine: fact
     label l0
     # Prologue
     move
         temp t1
         temp fp
     move
         temp fp
         temp sp
     move
         temp sp
         binop (-)
             temp sp
             const 8
     move
         mem
             temp fp
         temp i0
     move
         mem
             binop (+)
                 temp fp
                 const -4
         temp i1
     # Body
     move
         temp rv
         eseq
         seq
             cjump eq
                 mem
                     binop (+)
                         temp fp
                         const -4
                 const 0
                 name l1
                 name l2
             label l1
             move
                 temp t0
                 const 1
             jump
                 name l3
             label l2
             move
                 temp t0
                 binop (*)
                     mem
                         binop (+)
                             temp fp
                             const -4
                     call
                         name l0
                         mem
                             temp fp
                         binop (-)
                             mem
                                 binop (+)
                                     temp fp
                                     const -4
                             const 1
                     call end
             jump
                 name l3
             label l3
         seq end
             temp t0
     # Epilogue
     move
         temp sp
         temp fp
     move
         temp fp
         temp t1
     label end


     label l4
     	"
     "
     # Routine: Main Program
     label l`Main'
     # Prologue
     # Body
     seq
         sxp
             call
                 name l`print_int'
                 call
                     name l0
                     temp fp
                     const 15
                 call end
             call end
         sxp
             call
                 name l`print'
                 name l4
             call end
     seq end
     # Epilogue
     label end

     
Example 39: tc -hir-display fact15.tig
     $ tc --hir-display fact15.tig >fact15.hir

     =>0
     
Example 40: tc -hir-display fact15.tig >fact15.hir
     $ havm fact15.hir

     
     2004310016
     
Example 41: havm fact15.hir


Node:T5 Given Code, Next:, Previous:T5 Samples, Up:T5

T5 Given Code

Code that was delivered:

src/temp/temp.hh
So called temporaries are pseudo-registers: we may allocate as many temporaries as we want. Eventually the register allocator will map those temporaries to either an actual register, or it will allocate a slot in the activation block (aka frame) of the current function.
src/temp/label.hh
We need labels for JUMPs, for functions, strings etc.
src/tree
The implementation of the intermediate representation. The file README should give enough explanations to understand how it works.

Reading the corresponding explanations in Appel's book is mandatory.

It is worth noting that contrary to A. Appel, just as we did for ast, we use n-ary structures. For instance, where Appel uses a binary SEQ, we have an n-ary SEQ which allows us to put as many statements as we want.

To avoid gratuitous name clashes, what Appel denotes EXP is denoted SXP (Statement Expression), implemented in translate::Sxp.

Please, pay extra attention to the fact that there are temp::Temp used to create unique temporaries (similar to symbol::Symbol), and tree::Temp which is the intermediate representation instruction denoting a temporary (hence a tree::Temp needs a temp::Temp). Similarly, on the one hand, there is temp::Label which is used to create unique labels, and on the other hand there are tree::Label which is the IR statement to define to a label, and tree::Name used to refer to a label (typically, a tree::Jump needs a tree::Name which in turn needs a temp::Label).

src/translate/level.hh, src/translate/level.cc
They implement translate::Level, which are wrappers around frame::Frame. A Frame knows only what are the "variables" it contains, but for a language such as Tiger, where you may need to access the variables of the "parent function", you must be able to go into the parent's frame: this is the role of Level.

Level are Frame with, in addition, a link (parallel to the static link) to the level of the enclosing function.

src/translate/level-entry.hh
All the information that the environment must keep about variables and functions.
src/translate/level-env.hh
The levels environment, containing LevelVarEntry's and LevelFunEntry's. We don't need to store information related to types here.
src/translate/translation.hh
This file contains functions used by the translate::TranslateVisitor to translate the AST into IR. They handle all the unCx etc. magic.
src/translate/translate-visitor.hh
The translate::TranslateVisitor which walks the AST and builds the IR tree.


Node:T5 Code to Write, Next:, Previous:T5 Given Code, Up:T5

T5 Code to Write

You are encouraged to try first very simple examples: nil, 1 + 2, "foo" < "bar" etc. Then consider supporting variables, and finally handle the case of the functions.

Driver
The driver must always perform the translation, but displays the result iff the option --hir-display was given. Obviously, an input that has not been type-checked cannot be translated, so all the T5 options (--hir-display, and optionally --hir-use-ifexp) imply --types-check.
Type Checking
Beware that there is a new builtin: print_int.
src/translate/fragment.hh
It implements translate::Fragment, an abstract class, translate::DataFrag to store the literal strings, and translate::ProcFrag to store the routines.

There remains to implement translate::ProcFrag::print which outputs the routine themselves plus the glue code (allocating the frame etc.).

src/translate/level-env.hh
Code is missing. In particular, bear in mind that the initial environment is not empty...
src/translate/translation.hh
There are many holes to fill.
src/translate/translate-visitor.hh
There are holes to fill.


Node:T5 Options, Previous:T5 Code to Write, Up:T5

T5 Options

This section documents possible extensions you could implement in T5.

T5 Optimizing Cascading If

This optimization aims at optimizing the number of jumps needed to compute nested if. It is easy to implement, and therefore will be granted with a reasonable bonus.

It consists in implementing the option --hir-use-ifexp, which causes the compiler to use translate::IfThenElseExp instead of the simpler scheme based on translate::Cx, Nx, and Ex only.

Warning: If you support the option --hir-use-ifexp then you will be tested on translate::IfThenElseExp. If this options does nothing in your compiler, then be sure to get 0 on all the corresponding tests.

First, here is the output without --hir-use-ifexp, and the performances reported by havm:

     1 = 1 & 2 = 2
     
File 42: boolean.tig
     $ tc --hir-display boolean.tig

     
     /* == High Level Intermediate representation. == */
     # Routine: Main Program
     label l`Main'
     # Prologue
     # Body
     sxp
         eseq
         seq
             cjump eq
                 const 1
                 const 1
                 name l0
                 name l1
             label l0
             move
                 temp t0
                 eseq
                 seq
                     move
                         temp t1
                         const 1
                     cjump eq
                         const 2
                         const 2
                         name l3
                         name l4
                     label l4
                     move
                         temp t1
                         const 0
                     label l3
                 seq end
                     temp t1
             jump
                 name l2
             label l1
             move
                 temp t0
                 const 0
             jump
                 name l2
             label l2
         seq end
             temp t0
     # Epilogue
     label end

     
Example 43: tc -hir-display boolean.tig
     $ tc --hir-display boolean.tig >boolean-1.hir

     =>0
     
Example 44: tc -hir-display boolean.tig >boolean-1.hir
     $ havm --profile boolean-1.hir

     error-->/* Profiling.  */
     error-->fetches from temporary : 2
     error-->fetches from memory    : 0
     error-->binary operations      : 0
     error-->function calls         : 1
     error-->stores to temporary    : 2
     error-->stores to memory       : 0
     error-->jumps                  : 1
     error-->conditional jumps      : 2
     error-->/* Execution time.  */
     error-->number of cycles : 17
     
Example 45: havm -profile boolean-1.hir

and here is the same, but with --hir-use-ifexp (pay attention that it is passed before --hir-display):

     $ tc --hir-use-ifexp --hir-display boolean.tig

     
     /* == High Level Intermediate representation. == */
     # Routine: Main Program
     label l`Main'
     # Prologue
     # Body
     seq
         cjump eq
             const 1
             const 1
             name l0
             name l1
         label l0
         seq
             sxp
                 const 2
             sxp
                 const 2
         seq end
         jump
             name l2
         label l1
         sxp
             const 0
         label l2
     seq end
     # Epilogue
     label end

     
Example 46: tc -hir-use-ifexp -hir-display boolean.tig
     $ tc --hir-use-ifexp --hir-display boolean.tig >boolean-2.hir

     =>0
     
Example 47: tc -hir-use-ifexp -hir-display boolean.tig >boolean-2.hir
     $ havm --profile boolean-2.hir

     error-->/* Profiling.  */
     error-->fetches from temporary : 0
     error-->fetches from memory    : 0
     error-->binary operations      : 0
     error-->function calls         : 1
     error-->stores to temporary    : 0
     error-->stores to memory       : 0
     error-->jumps                  : 1
     error-->conditional jumps      : 1
     error-->/* Execution time.  */
     error-->number of cycles : 12
     
Example 48: havm -profile boolean-2.hir

Of course, if you did not implement translate::IfThenElseExp support, --hir-use-ifexp must cause tc to exit 64.

T5 Bound Checking

Implementing bound checking is quite simple: it consists in having the program die when the program accesses an invalid subscript in an array. For instance, the following code is "succeeds" with a non-bound-checking compiler.

     let type int_array = array of int
         var  size  := 2
         var  arr1  := int_array [size] of 0
         var  arr2  := int_array [size] of 0
         var  two   := 2
         var  m_one := -1
     in
       arr1[two]   := 3;
       arr2[m_one] := -1;

       print_int (arr1[1]);
       print ("\n");
       print_int (arr2[0]);
       print ("\n")
     end
     
File 49: bound-violation.tig
     $ tc --hir-display bound-violation.tig >bound-violation.hir

     =>0
     
Example 50: tc -hir-display bound-violation.tig >bound-violation.hir
     $ havm bound-violation.hir

     
     -1
     3
     
Example 51: havm bound-violation.hir

When run with --bound-checking, have your compiler produce code that dies on such cases.

T5 Optimizing Static Links

Warning: this optimization is difficult to do it perfectly, and therefore, expect a big bonus.

In a first and conservative extension, the compiler considers that all the functions (but the builtins!) need a static link. This is correct, but inefficient: for instance, the traditional fact function will spend almost as much time handling the static link, than its real argument.

Some functions need a static link, but don't need to save it on the stack. For instance, in the following:

     let var foo := 1
         function foo () : int = foo
     in
       foo ()
     end
     

the function foo does need a static link to access the variable foo, but does not need to store its static link on the stack.

It is suggested to address these problems in the following order:

  1. Implement the detection of functions that do not need a static link (see exercise 6.5 in "Modern Implementation of Compilers"), but still consider any static link escapes.
  2. Adjust the output of --escapes-display to display /* escaping sl */ before the first formal argument of the functions (declarations) that need the static link:
              $ tc --escapes-display fact.tig
              /* == Escapes. == */
              let
                 function fact (/* escaping sl *//* escaping */ n : int) : int =
                    if  (n = 0)
                       then 1
                       else  (n * fact ( (n - 1)))
              in
                 fact (10)
              end
    
              $ tc --escapes-display --escapes-com fact.tig
              /* == Escapes. == */
              let
                 function