The GENERATOR language is used to write programs which process list/tree structures. These structures may have been produced by SYNTAX rules or by other GENERATOR functions.

The GENERATOR language has many special features oriented toward the coding of the semantic phase of a language processor. How ever, many of the language constructs are general list and symbol manipulation functions, allowing SLIC to be used for many AI applications. If the GENERATOR language program is a part of a compiler the output is PSEUDO instruction lists.

A program written in the GENERATOR language is made up of functions and/or procedures called generators. The syntax of a generator may be described as follows:

GENERATOR = ID TRANSFORM $TRANSFORM;

A generator is as a set of transforms that operate on the input. Each transform is made up of an input pattern and an action:

TRANSFORM = PATTERN '=' ACTION;

PATTERN = '[' (PARAMETER_DESCRIPTION / .EMPTY) ']';

Each transform describes a pattern and action to be performed for a specific input. For example:

FACTORIAL[0] = .RETURN 1;

[N] = .RETURN FACTORIAL[N-1] * N;

FACTORIAL is a generator with one input parameter. If the input parameter matches the first description--namely, zero a value of one is returned. Otherwise, the input parameter is given the name N, and the value returned will be the value of the expression FACTORIAL(N-1) * N.

An example that evaluates a numerical expressions:

digit: '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0';

num .. digit $digit MakeNum[];

exp = term (('+':add | '-':sub) term !2)  print[*1];

term = factor (('*':mul | '/':div) factor !2);

print[xpr] = output[eval[xpr], con];

eval     [add[x,y]] =  return eval(x) + eval(y);

         [sub[x,y]] =  return eval(x) - eval(y);

         [mul[eval(x),eval(y)]] =  return x * y;

         [div[eval(x),eval(y)]] =  return x / y;

         [isnumeric(x)] = return x;

Note in the above that eval is called from the action in case of add and sub node transforms. The return values from eval are then added or subtracted. In the case of mul and div eval is called in the pattern of their transforms. This is a special function call. When a function call is made in a pattern the unnamed pattern element is passed to the function and the arguments of the function are the returned values of the function. In the above case eval(x) of the mul and div transforms take the unnamed left branch of the mul or div tree as it's single argument and return x as the result. Like wise eval(y) does the same for the right branch of the tree.

A programming language may be described in terms of its basic data types and the operators that may be performed on them. These operators and data types are combined to form expressions, which direct the performance of operations and return as a value the result of those operations. In the GENERATOR language, these expressions may be composed of operations involving a variety of data types.

Some of these data types, such as integer, Boolean and string, are familiar from more conventional programming languages and need little explanation. Some, such as gensym, and list, are less common, and it would be well to discuss them.

Tokens were encountered in the discussion of the SYNTAX language and defined as the "words" of the source language being compiled. In the GENERATOR language, this class of data is divided into three subsets: numbers, symbols, and strings. A symbol is that token which is created by the MakeSymbol function. Strings are a data type like symbol consisting of a collection of characters; however, unlike symbol they have no dictionary entry and cannot have attributes. String and symbol resemble what in conventional programming languages is called a "string", or a "literal". In the GENERATOR language, however a symbol is more than a collection of characters; it is an entity which may have attributes assigned to it.

The gensym is a data type essential to a compiler that is used to describe the processing of a language. It is a symbol generated by a compiler, a unique name that the compiler builder employs to provide a branch point or data storage location for which no name exists in the source program being compiled.

In a real sense each of these data types--numeric, Boolean, symbol, gensym, or string--is a subtype of the single class of data treated in the GENERATOR language. This class of data is the atom, so called because it is the smallest meaningful unit of information. Atoms may be either numeric or non-numeric, and all atoms may (in the appropriate context) have a Boolean meaning. Atoms may be combined into structures of arbitrary complexity, called lists (or trees).

An atom may appear in a variety of guises with a variety of characteristics. A numeric atom, for example, is the only type of atom upon which arithmetic may be performed. Of the nonnumeric atoms, only symbols and gensyms may have property lists. These lists contain attributes assigned to the atom by the programmer and are comparable to a compiler's or assemblers symbol table or dictionary of the identifiers used in a source program. A dictionary might contain such information about the items entered into it as their print name, memory address, data type, size, etc.

It must be emphasized that every single item of data in the GENERATOR language is an atom. The Boolean constant .TRUE is an atom. The constant .NIL, which is used to represent the Boolean value .FALSE, is an atom, even though it also represents the list of zero elements (the empty list) and is thus an exception to the normal rule that a list is not an atom (and vice versa). Anything whose value is not .NIL has the value .TRUE.

A GENERATOR language variable may reference or contain any type of atom, tree or list. Variables do not have any specific type associated with them, and nothing about a variable suggests the nature of what it contains, It may contain a number at one point in time, then the value .TRUE, then a string, then another number, etc. The GENERATOR language implements operators and functions that permit the programmer to interrogate a variable as to its nature at a specific time.

The classic assembler or compiler builds a symbol table to associate the various symbols encountered in processing a source program with the value of the location counter at the time the label is defined.

The symbol table is a simple function that maps names into addresses, generally by a table lookup operation. Since there is only one value associated with each symbol, there is no need to specifically name the attributes LOCATION or to identify the contents of the location counter as the value of each symbol's LOCATION attribute, since neither can have any other meaning.

In more sophisticated language processors, such as those produced with SLIC many attributes, as well as values for these attributes, become associated with each symbol of the source program. For example X might have the attribute LOCATION with a value 8000, the attribute CLASS with the value VARIABLE, the attribute TYPE with the value REAL.

Such a collection of attribute-value pairs associated with a symbol is called a property list. The collection of symbols encountered during operation of a compiler, each with its associated property list, is called the dictionary.

In order to allow a symbol to have nested scope there is a symbol tables stack. A new symbol table instance is created using PushSymbols() and released using PopSymbols(). These functions return the current level of the symbol table nesting. ScopeLimit(n) is used to restrict symbols to scope n and above. These functions are used to force MakeSymbol to create symbols in the current scope. If it is not apparent why one would use these functions you should be browsing somewhere else.

Back to SLIC main page