Implementing Type-Checking, CSC430, Winter 2024

8.12

Implementing Type-Checking, CSC430, Winter 2024🔗

1 Goal🔗

For this assignment, you must extend Assignment 5 by implementing mutable arrays. These arrays are instead of boxes. Just as the previous assignments asked you to generalize the book’s solution from one argument to many arguments, this assignment generalizes from a single box to an array.

As we’ve discussed in class, this code will use no side effects in its implementation; that is, it’s implementing state without using it. The key technology here is SPS, Store Passing Style.

In addition, you will be adding mutable bindings, and a simple type system.

2 Guidelines🔗

For this and all remaining assignments, every function you develop must come with the following things:

A commented header line that expresses the result of the function in terms of its inputs, written in English. Be as precise as you can within the space of a line or two.
A type declaration (possibly inline), specifying the input and output types.
Test cases. A function without test cases is incomplete. Write the test cases first, please.

For this assignment, you must develop your solutions using the typed/racket language. If you haven’t seen them, you might be interested in these Hints on Using Typed Racket in CPE 430.

Your test cases must use the check-equal?, check-=, or check-exn forms.

Your solution should take the form of a single file.

Hand in your solution using the handin server. For help with the handin server, please see the course web page.

2.1 Handling Errors🔗

All of your error messages must contain the string "OAZO". Essentially, this allows my test cases to distinguish errors correctly signaled by your implementation from errors in your implementation. To be more specific: any error message that doesn’t contain the string "OAZO" will be considered to be an error in your implementation.

Additionally, your error messages should be actually helpful. Since you are the primary consumer of your own error messages, making these error messages good in the first place should reduce your overall development time. There are two parts to this: first, the error message should include text that actually indicates what the programmer did wrong. Second, include the text of the user’s program, so they (actually you) can figure out how to fix it. See lab 3 for an example of how to do this. (Apologies in advance if I renumber the labs and fail to update this paragraph....)

2.2 Progress Toward Goal comment🔗

Graders are happier when they know what to expect. Your final submission should start with a short one- or two-line comment indicating how far you got through the project. Ideally, this would just be: “Full project implemented.” But if you only implemented, say, squazz and blotz, and didn’t get to frob or dringo, please indicate this in the comment, so that we don’t spend all our time searching for bits that aren’t there.

3 The Assignment🔗

This assignment will build on the Assignment 5 interpeter. You don’t want to try to build this on top of Assignment 6, ditch those extra I/O primitives.

This assignment makes us pay in sweat and tears for every form in the language.

Fortunately, we simplified the language significantly in Assignment 5, when we stopped representing binops as separate language forms. Also, we represented true and false as variable references. All of these simplify the language.

Best of all, we implemented let as syntactic sugar; that means that we won’t need to make any changes to it, either. Yay!

3.1 New Values🔗

Add a value that represents the null value. It will be used as the result of mutation operations. The serialize function should produce "null" when called with a null value.

The array value is also a new kind of value, but I’m going to delay discussing it for a few paragraphs.

3.2 New Forms🔗

Implement these new forms as primitive bindings in a top-level environment, just like your existing operations such as + and -

arr: creates a fresh array of the given size, with all cells filled with the given value. So, for instance, {arr 34 0.0} would create an array of 34 cells, all containing 0.0. It is illegal to create an array with fewer than one cell.
aref : returns an element of an array. So, for instance, {aref p 15} would return the contents of cell 15 of the array named by p. If the array does not have that many elements, you must signal an error. The first element has index 0.
aset : the aset form is for arrays. So, for instance, {aset p 15 {f 6}} would set element 15 of the array named by p to be the result of calling f with 6. It must return nullV. The first element of an array has index 0.
seq : evaluates a sequence of expressions, returning the last one. So, for instance, {seq {f 9} p} would evaluate {f 9}, then return the value of p. It’s illegal to call seq with fewer than one argument.
substring : accepts a string and a start and end position, and returns the corresponding substring. Use Racket’s substring to implement this function.

There’s one more form, and this one can’t be implemented as a primitive function:

:= : the := form is used to mutate bindings. So, for instance, {l := 9} will change the binding of l to contain 9. It must return nullV. It is an error to mutate a variable that is not already bound.

In order to allow mutable bindings and arrays, you’ll need to add a store, and rewrite your interpreter in store-passing style, as we discussed in class.

In order to allow mutable bindings, you will have to change the type of the environment; rather than mapping names to values, it will map names to locations. That is, "every" binding will be a reference to the store.

3.3 Order of Evaluation🔗

In a language with mutation, programmers can observe the order of evaluation of function call arguments. For this language, all forms must perform left-to-right evaluation.

3.4 Array Values🔗

In this assignment, arrays will contain only real numbers. This is has nothing to do with mutation, but instead with type checking. In order to support arrays of varying types, we need parameterized types, for instance array-of-numbers and array-of-strings. This construction complicates type checking somewhat, so we’re simplifying the assignment and allowing only numbers as array members.

The serialize function must handle arrays. It should simply return the string "#<array>" for an array.

4 Syntax of OAZO8🔗

A OAZO8 program consists of a single expression.

The concrete syntax of the OAZO8 expressions with these additional features can be captured with the following EBNFs.

Expr	=	Num
	\|	id
	\|	String
	\|	{id := Expr}
	\|	{if Expr then Expr else Expr}
	\|	{let [[id : ty] <- Expr] ... Expr}
	\|	{anon {[ty id] ...} : Expr}
	\|	{seq Expr ...}
	\|	{Expr Expr ...}

ty	=	num
	\|	bool
	\|	str
	\|	void
	\|	{ty ... -> ty}
	\|	numarray

operator	=	+
	\|	-
	\|	*
	\|	/
	\|	num-eq?
	\|	str-eq?
	\|	<=
	\|	substring
	\|	arr
	\|	aref
	\|	aset
	\|	alen

... where an id is not let, :=, if, then, else, :, <-, seq.

4.1 Primitives🔗

procedure
(+ a b) → num
a : num
b : num

Compute a + b.

procedure
(- a b) → num
a : num
b : num

Compute a - b

procedure
(* a b) → num
a : num
b : num

Compute a * b

procedure
(/ a b) → num
a : num
b : num

If b is not zero, compute a/ b.

procedure
(<= a b) → boolean
a : num
b : num

Return true if a is less than or equal to b.

procedure
(num-eq? a b) → boolean
a : num
b : num

Return true if a is equal to b.

procedure
(str-eq? a b) → boolean
a : str
b : str

Return true if a is equal to b.

procedure
(arr-eq? a b) → boolean
a : array
b : array

Return true if a and b refer to the same array.

procedure
(arr size default) → numarray
size : num
default : num

Return a new array of size size whose elements are all initially set to default, which must be a number.

procedure
(aref arr idx) → num
arr : numarray
idx : num

Given an array and an index, return the value in the array at the given index.

procedure
(aset arr idx newval) → Void
  arr : numarray
  idx : num
  newval : num

Given an array and an index and a new value, mutate the array at the given index to contain the new value, and return void.

procedure
(alen arr) → num
arr : numarray

Given an array, return its size.

value
true : boolean

the literal boolean representing true.

value
false : boolean

the literal boolean representing false.

5 Suggested Implementation Strategy🔗

Store-passing style is a nijlpaard. Here’s how I’d get started.

First thing, strip down the language. In the interp function, comment out everything except the evaluation of numbers and applications of primitive functions. Add an other rule that signals an "unimplemented" error for all other forms. Make sure that your test cases for binops still work.

Time to add the store! Formulate the define-types and type aliases that you’ll need for stores (just like the book’s, or use immutable hash tables if you prefer). Change the Env type to map names to locations. Choose a representation for a Value-combined-with-store, and change the type of the interp function so that it accepts a store and returns a Value-combined-with-store. Update your test cases so that they pass a store in, and expect the v*s including the answer and the store. Rewrite the interp rules for numbers and primitive applications so that they thread the store through the computation as they should.

With luck and some effort, you should be able to get those binop programs working again.

If this takes you a lot of time and effort, don’t worry: this might be the hardest part of the assignment. Once you get the hang of transforming code into store-passing style, it will get easier. Check to make sure that each store is used exactly once (with the exception of the mutation operations you’ll add later).

Next, I would add identifiers and functions; these are both one-line changes. Now you can re-enable your tests involving these items.

Next, I would add the mutation operations. Note that I’m advising you to add these before adding your other language forms (if, application of closures) back in. First off, I think I’d add an allocate helper function that accepts a store, a number of locations to be allocated, and a value to place in all of them, and returns two things; the base location, and an extended store.

Design your store so that the "next allocated" location is derived directly from the store. It could be a separate counter that’s part of a define-type, or it could be a function that just scans the store to find a new address. Don’t use the new-loc defined by the book; it makes testing quite painful.

Next, I would add a new arrayV value to the set of values. This will require a bunch of extra clauses in various places (serialize, for instance). Note that the representation of arrays is up to you, but it had probably better include a location and a length.

Following this, I would add the arr operation. At this point, you should be able to create arrays, and the result of interpretation should include a store that contains lots of new allocated locations.

At this point, I would go back and add test cases that create arrays as subexpressions of the arr-eq? binop; check that the allocations happen in the right order. As you go forward, you’ll want to use this technique to check order of evaluation for all of your forms.

At this point, it starts to make less difference what order you add language forms in. I think I would probably wait on applications, just because there will be lots of opportunities for mistakes.

5.1 Type Checking🔗

Implement a type checker for your language. Note that since functions must now come annotated with types for arguments, you will need to have a type parser that parses types. You should definitely make a separate function for this. Note that the types of functions are extended to handle multiple arguments. So, for instance, the type {num str -> bool} refers to a function that accepts two arguments, a number and a string, and returns a boolean.

All type rules are standard.

5.1.1 Operators🔗

Type-checking binops is more or less as you might expect. For instance, a + should receive two numbers, and will produce a number. The <= operator will take two numbers, and return a boolean.

The equal? operator is a bit different. Specifically, we don’t have subtyping, and we treat the equality operator as a function in the environment, so it must have a single type. In order to simplify our lives, we split it into three equality operators; one that only works for numbers, called num-eq?, with type {num num -> bool}, one that only works for strings, called str-eq?, with type {str str -> bool}, and one that only works for arrays, called array-eq?, with type {array array -> bool}.

6 Suggested Type checking Implementation Strategy🔗

Here are some of the steps that I followed. I wrote test cases for every step before implementing it.

6.1 Type Checking for Existing Language🔗

Add a data definition for Types.
Add a parser for the language of types.
Update your test cases to include types for the function declaration and where forms. At the same time, update your parser to accept the updated forms, parse the types, and just throw them away. This step is an important one, because it suddenly takes you from passing basically no test cases at all to passing a whole bunch of them.
Develop a type checker that only works for constants.
Add a definition for Type Environments, mapping symbols to types.
Extend your type checker to handle variables and if expressions.
Extend your type checker to handle applications. (Since you don’t have types for LamC terms yet, your test cases should just use primitive functions.)
Extend your AST nodes for LamC nodes to include types with the parameters, modify parser to store types in the AST node rather that just throwing them away.
Extend your type checker to handle LamC terms.
Extend your parser and type checker to handle the modified let syntax, update test cases.
Add your type checker to top-interp. Type checking should happen after parsing and before interpretation. The result of type checking is ignored; if it doesn’t signal an error, that’s all you’re looking for.

At this point, you’re in good shape; your assignment should now pass a significant fraction of the test cases.

7 Interface🔗

Make sure that you include the following functions, and that they match this interface:

procedure
(parse s) → ExprC
s : Sexp

Parses an expression.

procedure
(parse-type s) → Ty
s : Sexp

Parse a type.

procedure
(type-check e env) → Ty
e : ExprC
env : TEnv

Type-check an expression.

value
base-tenv : TEnv

The base type environment.

procedure
(interp e env sto) → Value-And-Store
  e : ExprC
  env : Environment
  sto : Store

Interprets an expression, with a given environment and store, returning both the resulting value and the new store.

procedure
(top-interp s) → string
s : s-expression

Combines parsing, type-checking, interpretation, and serialization.

1	Goal
2	Guidelines
3	The Assignment
4	Syntax of OAZO8
5	Suggested Implementation Strategy
6	Suggested Type checking Implementation Strategy
7	Interface