David N. Welton
davidw@dedasys.com
2006-03-15
Have you ever wondered what it's like to create your own programming language? This article covers the design, and above all, implementation of the Hecl programming language. It explains, in detail, how we go about taking some text in a file, and making the computer carry out the instructions written therein. Since Hecl is not a fancy, complicated language, but rather a small, simple, humble language, it should be pretty easy to follow. If you are working on your PhD in computer science, you probably won't find this interesting, as you most likely already know more than I do about the subject. But for everyone else, I hope to provide an interesting glimpse into what goes into the making of a programming language.
A form created with Hecl, running on the J2ME emulator.
As I said above, Hecl is a small, simple language. It's that way for a reason - first of all, because I wanted to have it up and running quickly, but most of all, because I wanted something that would run in the space constraints of my J2ME mobile phone. Furthermore, despite - or perhaps because of - having a simple design, Hecl is quite flexible and extensible, meaning that it could grow into a lot of different niches. And, of course, it makes a good subject for an article, rather than the entire book that would be necessary to describe more complicated languages.
I started working on Hecl when a couple of things happened: I bought a cell phone that ran J2ME, and I decided it was probably time to learn some Java. This was a fortuitous combination, as I'm not very good at learning things without a project to hack on as an incentive, and the cell phone provided just what I needed. Java doesn't hold a lot of fascination for me - it's got some nice aspects, but most of my programming needs are covered either by dynamic languages like Tcl or Ruby, or C for the fast, low-level bits. However, J2ME is a very interesting space to work in, because it's already out there on millions of phones, and many of the phones that support it can't run anything else. The Symbian platform offers a bit more - you can port regular C code to it, but when I purchased my phone, that was still limited to relatively expensive models. Time will tell which platform will win, but in the meanwhile, I liked the idea of writing something that will work on millions of devices. And while there are some other scripting languages for Java, there only a few others that can run on the J2ME platform, because it imposes quite a few limits on what you can do with Java. For example, the reflection API, which is heavily used by languages like Groovy, does not exist.
The Hecl code that creates the above form.
With a goal in mind, and fingers itchy to starting laying down code, my last stop was to consider the language's overall design. In reality, this wasn't so hard. I really like the Tcl programming language, especially its syntax, which borrows more from Lisp than a lot of people realize or give it credit for. Tcl is, on the surface, a pretty simple language, which in turn allows it to be incredibly flexible. It's possible to write an object (classes, inheritance, and so on) system in Tcl itself, and even to add new control structures (like 'if' and 'while') that are written in Tcl itself! It is also an easy language to learn - it "scales down", something that I place a lot of value on because it means that the language is accessible to more people, even if they aren't great programmers or don't have a degree in computer science. Tcl isn't perfect, but since I'm not trying to write a Tcl clone in Java, I am also free to take what I like, improve what I don't like, and be creative where I need to be. Once again, the decision to abandon Tcl compatibility was, in part, driven by the need to keep things small, and in part to maximize the fun value of the project - it's more exciting to work without pre-established limits. There is already a version of Tcl written in Java called Jacl, but it is far too large for my phone's scarce memory.
The rest of this article deals with the technical details of the Hecl language. For those interested in seeing exactly how the code described fits into the rest of the system, it's possible to browse Hecl's code on line here: https://github.com/davidw/hecl - in particular the core classes, located here: https://github.com/davidw/hecl/tree/master/core/org/hecl
So - how does the computer recognize and act on some scribblings in a file? The first thing that Hecl must do is read what's in the file, and turn it into small chunks that it can make sense of. This portion of Hecl is the parser, defined in Parse.java. As mentioned previously, Hecl is a simple language. So in order to more easily discuss what's happening, let's look at a bit of Hecl code:
set txt [getprop $tf text]
if { ne $txt "" } {
# a comment
stringitem label "You wrote:" text $txt
puts "hello" ; puts "world" ;
}
Hecl's parser only recognizes a few characters as special: "" [] {} $ # ; and newline. The parser reads each line of text, and breaks it into pieces according to simple rules:
puts "The variable a is
equal to $a"
foo = bar()
, in Hecl, you
would do set foo [bar]
."foo\[bar"
would contain a literal left bracket instead of trying to evaluate
bar as a command, as an unquoted brace would do.That's all! Hecl's syntax rules can be described by just 8 rules. Even if that looks a little complex, think about what it would take you to describe the syntax of a language like Ruby, Python, or C (or Perl, for real masochists!)
With that in mind, let's look at the snippet of code above. The first line can be broken up like so:
set
.
txt
[getprop $tf text]
. Hecl
doesn't evaluate it immediately though, it just stores it and makes a note that it
should be evaluated.The second bit of code, also a single command, is similar.
if
,
followed by two arguments, which happen to contain more Hecl code:
ne $txt ""
# a comment
stringitem
label "You wrote:" text $txt
puts "hello" ; puts "world" ;
This is important to understand,
because it's very different from how many languages work. In Hecl,
if
is an ordinary command just
like any other, whereas in C or Python it is part of the syntax,
rather than a function that users can define or modify.
In any case, what Hecl does as it reads in the source code is create an "internal representation" of the various commands and variables it is reading about. It does this so that it can store the whole thing, and then run it when it's ready, rather than trying to execute it piece by piece. This speeds things up some when code gets executed multiple times - the code in a while loop, for example - because Hecl doesn't have to re-parse it each time. When taken at as a whole, the parsed Hecl program could be viewed as a tree, with each node a part of the program as described below.
This brings up a discussion of how Hecl stores values, which we need to cover before we come back to our discussion of parsing and Hecl's internal representation.
Since the term "object" has been used far too much in the
realm of computer science, for Hecl I decided to revert to the equally
vague, but more anglo-saxon term "thing". All values in
Hecl are covered by the Thing
class, which holds another
value internally depending on what type of value it's holding. As far
as more or less standard types of things, Hecl has
StringThings
, IntThings
,
ListThings
, and HashThings
, as well as
DoubleThings
for non-J2ME versions of Hecl. These are all
pretty standard data types/structures found in most scripting
languages. [1] Hecl also has an ObjectThing
, which can
contain any Java object, with the caveat that if it's transformed into
its string representation, it will lose the reference to the object it
contains.
Where things get tricky is in the internal types, where Hecl has
CodeThings
, GroupThings
, and
SubstThings
, which, along with plain old
StringThings
, cover the internal representation of Hecl
code. SubstThings
are the simplest. When the parser
encounters a variable with a dollar sign or ampersand, it creates a
SubstThing
to place in Hecl's internal view of the
script. To understand how this case works, consider the following
script:
set i 0
while { < $i 100 } {
puts $i
incr $i
}
Obviously, Hecl needs to reexamine the value contained in 'i' each time through the script rather than substituting it once! SubstThing lets Hecl do that, because it knows it's dealing with something that really stands for something else - "$i" in this case refers to an integer, but a different one each time through the loop, so it must be substituted each time through.
Things get a bit more complex, though, because Hecl, like many scripting languages, lets you do "string interpolation". Consider this example:
puts "The value of 'i' is still $i and 1 + 1 is still [+ 1 1]"
We use the GroupThing
class to handle this case. As
the name suggests, it's a group of other things, as shown below:
GroupThing
s
evaluate their components when Hecl needs to know what the GroupThing
itself contains.
Things are implemented in Java like so: a Thing
class
that contains a RealThing
, which is its actual
value:
public class Thing {
public RealThing val;
CodeThing
, GroupThing
,
StringThing
and all the others implement the
RealThing
interface, so that Hecl can deal with them more
or less interchangeably. The implementation of IntThing is a good
starting point, because it's pretty easy to understand what an
IntThing is and does.
public class IntThing implements RealThing {
private int val;
We see that an IntThing holds an int - the real value that we are actually interested in after all this indirection! There is a convenience method in IntThing for the creation of new Things representing integers:
public static Thing create(int i) {
return new Thing(new IntThing(i));
}
Which gets used a lot like so: Thing foo =
IntThing.create(56);
However, in terms of importance, the
heart of what makes Hecl's types dynamic are two methods:
private static void setIntFromAny(Thing thing) throws HeclException {
RealThing realthing = thing.val;
if (!(realthing instanceof IntThing)) {
thing.setVal(new IntThing(thing.getStringRep()));
}
}
public static int get(Thing thing) throws HeclException {
setIntFromAny(thing);
IntThing getint = (IntThing) thing.val;
return getint.val;
}
What's happening here is that when you need to get an int from a
Thing
, you do call get like so: int foo =
IntThing.get(foothing);
as you can see, this code examines the
thing's internal value, and in case it's not already an
IntThing
, turns it into a string and tries to parse it
into an integer. Of course, that might fail, because if you try and
get an integer out of a string "bleagh", it's just not going
to work out! That's why both methods are capable of throwing
HeclException
, which we'll cover shortly.
Prior to understanding how Hecl code is actually evaluated, let's return to our discussion of parsing for just a moment. Now that you know how Hecl actually stores things internally, it's easy to understand what happens when Hecl code is parsed - it gets turned into Things:
puts "foo bar" -> StringThing (puts) and a GroupThing (foo bar).
puts $i -> StringThing (puts) and a SubstThing (i)
set i [+ $i 10] -> StringThing (set), StringThing (i), followed by a
CodeThing. The CodeThing is parsed up like so:
+ $i 10 -> StringThing (+), SubstThing (i), StringThing(10)
You may be wondering why the 10 above is a StringThing
when it's pretty obvious that it's actually a number and intended to
be used like one. Your intuition is correct, but Hecl initially
parses everything as strings, and only transforms them into other
types of things when the code is evaluated.
To understand how Hecl is actually evaluated once we have a
representation stored in memory, we need to examine
CodeThings
, which are the real motor of Hecl. You've
probably guessed from the name that they contain code, which is indeed
correct. CodeThings
are divided into
"stanzas", which are equivalent to individual commands. For
instance, in the following code, the actual code would be transformed
into a CodeThing
with two stanzas, puts
$i
and incr $i
. Each time the
CodeThing
is asked to run, it loops through the Stanzas,
and in turn, calls their run
method.
The actual evaluation takes place in the stanzas. To Hecl, the first stanza looks something like this:
StringThing (puts) SubstThing (i)
In order to do something with that, it needs to look up
"puts" as a command. The Interp class contains a hash table
that maps command names such as "puts", "if",
"proc", "+", "list" and so forth, to the
respective Java classes that implement that code. The
"puts" command maps to the PutsCmd
class, which
is very, very simple:
class PutsCmd implements Command {
public void cmdCode(Interp interp, Thing[] argv)
throws HeclException {
System.out.println(argv[1].getStringRep());
}
}
All command classes implement the
Command interface, and therefore must have a cmdCode
method. But let's not get ahead of ourselves...
The stanza that contains the puts
command also has something to pass to it as an argument, in this case
a SubstThing
. For each of the command's
arguments in the stanza, Hecl checks what type it is. If it's
something simple like an IntThing
or
StringThing
, it gets passed on to the
command sans modification. SubstThing
s
GroupThing
s and other CodeThing
s
are recursively called in order to find out what value they actually
represent, so that the actual command might be called like so:
puts 23
The arguments (including the command
name) are passed to the cmdCode
method
as an array of Things: Thing[] argv
array. In the case of PutsCmd
, all it
does is print out the string representation of the first argument.
This is how all Hecl commands work, even if
and
while
, which take code as arguments, evaluate it, and act
accordingly.
This simple strategy isn't blazingly fast, but it is very, very flexible. You can replace code on the fly, rename commands, and do all kinds of other interesting tricks. Additionally, it's easy to implement this system in the constrained J2ME environment.
Astute observers will have noted that when we covered SubstThings above, we didn't really explain how they work. How does Hecl go from a variable name - say, "foo", to a value contained therein, like 42? Hecl uses a simple hash table lookup scheme where each "stack level" has its own hash table that associates names to values. What this means is that inside a 'proc', a new hash table is created to hold local values. To see this in action:
set a 0
puts "a is $a"
proc foo {} {
set a 10
puts "in foo, a is $a"
}
foo
puts "a is $a"
when run, gives us:
a is 0
in foo, a is 10
a is 0
In terms of program flow, you can see in the above figure that at
1, the global variable lookup table is in effect. When program
evaluation enters the foo
procedure, the global lookup
table is replaced by a new one for the procedure being executed. At
3, when the procedure exits, the hash table that was created in the
interpreter to hold its variables is set to null and left for Java's
garbage collector to clean up. The stack itself is implemented with
the convenient java.util.Stack
class.
The variable hash table lookup routines are contained in the
Interp
class (along with the command lookup hash
table).
Indeed, the Interp
class is the central data structure
that holds everything together. It contains the program's state in
terms of variables, existing commands and procedures, exceptions, and
so on. For Java developers looking to include Hecl in their own code,
one of the handy aspects of Interp
is that you can have
more than one of them at a time. Imagine, for instance, a server that
evaluates Hecl code for clients. Each authenticated client could have
their own separate Hecl interpreter without interfering with the
others!
The most commonly used methods in an interpreter are those which exist for modifying its state.
Make a command visible to Hecl:
interp.commands.put("newcommandname", new ClassThatImplementsCommand());
Create and set a variable:
interp.setVar("temperature", new Thing(99));
Evaluate code:
interp.eval(new Thing("puts {hello world}"));
Set the result from a command:
interp.result = new Thing("ok, like, that worked, and stuff");
So far, we've described how things should work. Life, however, has a habit of not going exactly as we'd planned, and that applies to Hecl programs as well. We need a way to deal with errors. Java itself has a featureful exception mechanism, so we take advantage of that in Hecl. Anything that could cause an error in Hecl throws a HeclException, so that this code, in Stanza.java, can catch that and deal with it appropriately:
try {
interp.result = new Thing("");
tmpcommand.cmdCode(interp, newargv);
This is the actual command being executed.
} catch (HeclException e) {
if (newargv[0] != null) {
e.where(newargv[0].getStringRep());
}
throw e;
We tell the exception what command it happened in, so that we can get a stack trace to show the chain of commands that failed.
} catch (Exception e) {
String msg;
msg = e.getMessage();
if (msg == null) {
msg = "(null exception of type " + e.getClass() + ")";
} else {
msg = "Exception of type " + e.getClass() + ": " + msg;
}
throw new HeclException(msg);
}
And in the case of a regular Java exception, we catch it and create a
new HeclException
, so that it can be dealt with in Hecl
itself. Hecl has a command, catch
, that can be used to
catch exceptions and deal with them programmatically. It works by
evaluating the code passed to it, catching any exceptions that code
may have thrown, and then simply returning those exceptions to the
caller of the command in the form of a Hecl variable. In other
words:
set cmds " some code to evaluate "
catch $cmds result
This will evaluate the code in cmds
and place any results
in the result variable. For programs that must not halt, even on
errors, catch affords the programmer a means by which he or she may
attempt to handle errors gracefully.
Exceptions have another purpose, besides errors. If you think about a while loop, it might look like this:
while { true } {
if { > $somevar 100 } { break }
do_other stuff
incr $somevar
}
How does the break
command work? It stops execution of
the loop when it's run, yet it's not an error. It throws a
HeclException
itself, tagging it as a break
exception: throw new HeclException(HeclException.BREAK)
.
At that point, it's up to commands implementing loops to catch it.
Here's what the while command looks like:
while (Thing.isTrue(interp.eval(argv[1]))) {
try {
interp.eval(argv[2]);
} catch (HeclException e) {
if (e.code == HeclException.BREAK) {
break;
} else if (e.code == HeclException.CONTINUE) {
} else {
throw e;
}
}
}
Remember, while is a command with two arguments - the condition, and
the code. As long as the condition is true, the loop keeps executing
- each time through, the code is evaluated. However, we also look out
for the two special exceptions and act on them. Break causes the loop
to end, while continue causes the loop to short-circuit back to the
evaluation of the condition. If, however, the
HeclException
is not one of these special types, the
error is thrown again and percolates its way up the stack.
As discussed at the beginning of this article, Hecl isn't meant as a replacement for Java, but rather to work alongside it. It's unlikely that an entire software system would be written entirely in Hecl at this point in time (although it's likely to evolve in this direction), so it's important to discuss the integration of Hecl and Java. There are two ways of doing this:
If you have a program written in Java, and you'd like to make it scriptable, adding a Hecl interpreter to it is very simple. This would be an appropriate strategy if you wanted to let users change some variables, or create callbacks that are executed at certain points in the life cycle of your program. A simple example might be a web server that evaluates Hecl scripts when certain URL's are requested.
The place to look at for an example of this technique is the Hecl
command line itself, specifically the file
commandline/Hecl.java
. Here's the main
method:
public static void main(String[] args) {
try {
int i;
Interp interp = new Interp();
/* Add the standard packages in. */
new HeclFile().loadModule(interp);
new org.hecl.fp.HeclFloat().loadModule(interp);
new org.hecl.load.HeclLoad().loadModule(interp);
Vector argv = new Vector();
for (i = 0; i < args.length; i++) {
argv.addElement(new Thing(args[i]));
}
interp.setVar("argv", ListThing.create(argv));
if (args.length > 0) {
HeclFile.sourceFile(interp, args[0]);
} else {
Hecl.commandLine(interp);
}
} catch (Exception e) {
e.printStackTrace();
}
}
Simple enough. First, the interpreter is created, some modules loaded
in, and the command line arguments examined. If there is a command
line argument, it's treated as the name of a file containing Hecl code
to evaluate, which is done via the HeclFile.sourceFile
method.
If there aren't any command line arguments, Hecl runs the
commandLine
method, a "REPL", or
read-eval-print loop, reading Hecl commands from standard input,
evaluating them, printing the result, and so on. That code is pretty
simple too:
while (true) {
System.out.print("hecl> ");
System.out.flush();
line = buff.readLine();
/* Exit on end of file. */
if (line == null) {
System.exit(0);
}
try {
interp.eval(new Thing(line));
if (interp.result != null &&
Compare.compareString(interp.result,
new Thing("")) != 0) {
System.out.println(interp.result);
}
} catch (HeclException he) {
System.out.println(he);
}
}
The key points are that to utilize Hecl code from Java, you need to initialize the interpreter, and then use it to evaluate code.
The other model is to write your program in Hecl, and add Java extensions to perform specific tasks. For instance, to create the charts for some programming language popularity statistics that I ran (http://www.dedasys.com/articles/language_popularity.html) I wrote the program in Hecl, and wrote a Hecl extension that provides an interface to the JFreeChart library.
This model is very popular with mature languages like Tcl, Ruby and Python, where the scripting language contains all the functionality necessary to write a fully functioning system, and extensions (usually written in C), are needed only to deal with specific external libraries or devices.
In any case, it's easy to do with Hecl - the basic trick is to
write a class that implements the
org.hecl.modules.HeclModule
interface, like so:
public class JFreeChartHecl implements Command, HeclModule {
public void cmdCode(Interp interp, Thing[] argv) throws
HeclException {
... command implementation ...
}
public void loadModule(Interp interp) throws HeclException {
interp.commands.put("barchart", this);
interp.commands.put("savechart", this);
}
public void unloadModule(Interp interp) throws HeclException {
interp.commands.remove("barchart");
interp.commands.remove("savechart");
}
}
In this case, the module also implements a command - the basic goal of any extension to Hecl will be to add one or more commands that provide access to some external library or resource
This can then be compiled into a .jar file, and then loaded into Hecl with the load command:
load JFreeChartHecl {
/home/davidw/hecllib/build/jfreechart/
/home/davidw/downloads/jfreechart/lib/jfreechart-1.0.0-rc1.jar
/home/davidw/downloads/jfreechart/lib/jcommon-1.0.0-rc1.jar
}
Which tries to load the named class and its dependencies from the
specified URL's. Simple (local) files are transformed into URL's. In
this case, it tries to load JFreeChart, which it finds in the
jfreechart/build
directory of hecllib. That in turn
requires code from jfreechart and jcommon, which are loaded from the
respective jar files.
As you've seen from this tour through Hecl, an interpreter is an involved piece of software that requires many decisions. But that's what makes it so much fun - there is a lot of room both for both clever technical solutions, and elegant design - although as they say, "beauty is in the eye of the beholder"! In any case, Hecl is a small and simple enough language that anyone with a bit of Java knowledge should be able to use it for their own experiments. The software, by the way, is covered by the very liberal Apache license, Version 2 (which can be found here http://www.apache.org/licenses/LICENSE-2.0), meaning that you can even include it in proprietary, commercial offerings, so by all means, use it in your own software.
I aim to keep Hecl small and simple enough to run on as many cell phones as possible, meaning that the core itself will stay as small as possible. I am a big believer in the "make it, make it right, make it fast" strategy, and we are still making Hecl, so there is lots of room for people to help out in terms of adding features and extensions, and, since Hecl is at such an early stage, even experimenting with the language itself. Creating a programming language has been very challenging, creative endeavor, and those interested in participating in shaping Hecl's future are welcome.