What has to happen in all hygienic macro is a fake reconstruction of lexical scope at compile time. It's really like Epicycles: http://csep10.phys.utk.edu/astr161/lect/retrograde/aristotle... ('The "solution" to these problems came in the form of a mad, but clever proposal').
The only satisfying solution is the one adopted by Kernel, and it is to not quote, but rather include first-class values in the generated source. This way, you leverage the existing lexical scope machinery to achieve hygiene - as $DEITY intended.
It is true that symbols are unhygienic, but if you use boxes instead, then you get perfect hygiene. Unlike function/vau values, boxes can exist entirely at compile-time, so they're suitable for compilation to JavaScript.
For a working example, check out my language Nulan, which compiles to JavaScript and has hygienic macros that use boxes:
So the idea behind boxes is that you can create boxes at compile-time for runtime stuff that doesn't exist yet?
Note that the same idea can work for Vau first-class values: at compile-time you simply create a faux value for each non-compile-time top-level (or other) definition. Macros can then insert them into generated code, even though they are not the real runtime object.
Yes, at least in Nulan the boxes only exist at compile-time: at runtime they're normal JS variables. Of course, if you were writing a VM, the boxes could also exist at runtime, which would be faster than having to lookup a symbol: you would just unbox it!
And yes, I suppose it would be possible to use boxes in some way with vaus, especially because modern vau-based languages use lexical scope. This doesn't completely solve the "vau are slow" problem, but it at least avoids a symbol lookup for every single variable.
In fact, come to think of it... I had been playing around with this idea, but I had forgotten about it. The basic idea is to create a first-class environment/namespace, but rather than mapping symbols to values, it would map symbols to boxes.
The idea is that, this namespace exists both at compile-time and run-time, so that at compile-time it uses the namespace to replace symbols with boxes (thus gaining speed and the ability to statically verify things at compile-time), but using "eval" at runtime would first lookup the symbol in the namespace (which returns a box), and would then unbox it.
If done properly, I think such a system would appear to behave similarly to the normal environment system as used by Kernel, but it could potentially be faster, because most symbols can be replaced at compile-time with boxes. In any case, it's an implementation detail, unless the language decides to expose the boxes underneath.
My plan is to use these ideas not with fexprs, but with procedural macros, thereby gaining fully static expansion at compile-time, because I want a type checker, and type checking fexprs is currently not possible to my knowledge.
The macroexpansion pass would register fake bindings (very similar in spirit to your boxes) for all runtime definitions in the compile-time environment.
Macros then include these fake objects at compile-time (just as fexprs would with the real objects at runtime) into the generated code.
Thereby, I hope to maintain the pleasurable hygiene properties of fexprs in a language with preprocessing macros.
Well then, that sounds exactly like what Nulan does. I'm glad to see these ideas spreading, whether independently discovered or not.
Since I've already explored these ideas, I would like to mention a couple things.
---
What you call "fake bindings", Nulan calls boxes. I like this name because it's short, unique, and intuitive. A box is simply a data structure that can hold a single item and supports getting that item and setting that item.
In other words, putting something into the box, and taking something out of the box. But changing the contents of the box doesn't change the box itself, obviously. And because they're mutable, boxes are equal only to themself. So even if two boxes hold the same value, they might not be equal to eachother.
Interestingly, boxes are exactly equivalent to gensyms:
(define foo (gensym)) ; Create the box
(eval foo) ; Get the value of the box
(eval `(set! ,foo 1)) ; Set the value of the box
And gensyms are equal only to themself. So they satisfy all the conditions of boxes, and in Nulan, gensyms actually are boxes! So another way of thinking about it is that Nulan replaces all symbols with gensyms.
Yet another way to look at it is that boxes are equivalent to pointers in C, but now we're straying a bit far away from Lisp...
Because I wanted Nulan to be hygienic by default, I decided to use boxes for everything. In addition, they're a first-class entity. That is, Nulan lets you run arbitrary code at compile-time, and also gives you access to boxes at compile-time.
And the way that Nulan implements macros is... basically, a "macro" is simply a box that has a ¯o property. When Nulan sees a box as the first element of a list, it checks if the box has a ¯o property (which should be a function), and if so, it calls it.
But boxes can have other properties as well, for instance &pattern lets you define custom pattern-matching behavior, &get is called when the box isn't a macro, and &set is called when assigning to the box:
This means the same box can have different behavior depending on its context. And because these things are tied to the box rather than the symbol, everything is perfectly hygienic and behaves as you expect.
Boxes also have other properties as well, like whether they're local or global, and whether they were defined at compile/run-time. And the Nulan IDE (http://pauan.github.io/nulan/doc/tutorial.html) understands boxes as well, so that if you click on a box, it will highlight it. So even though two variables may have the same name, the Nulan IDE knows they're two different boxes.
By the way, in Nulan, the correct way to write the "when" macro is like this:
$mac when -> test @body
'if test
| ,@body
Basically, Nulan doesn't have "quasiquote". Instead, "quote" supports "unquote" and "unquote-splicing". In addition, "quote" returns boxes rather than symbols, so it is by default hygienic.
If you want to write an unhygienic macro, you can use the "sym" function, which takes a string and converts it into a symbol:
$mac aif -> test @rest
w/box it = sym "it"
'w/box it = test
if it ,@rest
I prefer this way of doing things because it reduces the amount of syntax (no quasiquote), it means everything is hygienic by default, and it allows you to break hygiene, but you have to explicitly use the "sym" function. Because unhygienic macros are rare, I think this is a good tradeoff.
Also, you might have noticed that the above two macros don't use "unquote". As a convenience, when using "quote", if the box is local (bound inside a function), it will automatically "unquote" it, meaning that these two are equivalent:
w/box foo = 5
'foo
w/box foo = 5
',foo
The reason for having "unquote" at all is for splicing in expressions:
Actually, I don't mind unhygienic macros that much, so "hygienic macros" isn't the primary reason I love boxes. The main reason I like them is because they make it really easy to implement hyper-static scope. And once your language has hyper-static scope, it's very easy to make an amazingly simple, concise, powerful, and flexible namespace system.
So boxes really kill multiple birds with one stone, in a really simple to understand and simple to implement way. They may not have the generality of vau or functions, but they're a practical data structure that does a lot for little cost.
I can understand the logic for not using quotes in favor of simpler techniques for hygenic macros, but I still like having them as short hand for really common list manipulations. Or even just an easy way to make a symbol. Of course, I've also always wanted something like python's list and *dict for splicing a list onto the end of an argument set without having to do something ugly like (apply f (join (list a b) c)). (f a b @c) looks so much nicer to me...
Anyway, aside from aesthetics, boxes really do seem useful. Another thought that just occurred to me is that they should enable trivial implementation of setforms and "out args".
I.e., if (car (cdr (obj key))) returned a box, you would be able to just call '= on it, with no thought for how you found the value.
And for out args, all you need to do is have double-boxes for function arguments. If you access the outer box, 'get is passed through to the inner box and you just receive the value. If you just call '= on the outer box, the value gets shadowed like normal, because 'set would not be passed through by default. If you want to modify the external variable, all you have to do is unbox it and call 'set on the original box.
I feel like there might be a similar way to enable dynamically scoped functions with first class environments, but I'm not sure.
I've been thinking about this some more, and it seems to me that boxes in the way Nulan uses them make for a completely different evaluation paradigm than the traditional metacircular evaluation.
In the traditional evaluation scheme, using environments, symbols, read, eval, and apply, the process is as follows:
1) read turns text into code (lists of symbols and values)
2) eval looks up a symbol in the current environment, or expands a macro, and calls apply to get the value of a function
3) apply calls eval to get the value of the arguments, and then returns the value of the function
In a compile-time based box scenario, the environments go away and read/eval change to something like this:
1) parse the new code into lists of appropriately boxed values, and expand macros
2) recursively apply functions to values to get the final result.
More of a traditional compile/run separation. The main difference to note in the box formulation is that there is no such thing as an environment. The 'hyperstatic scope' that Pauan keeps mentioning is another way of saying, 'there is no scope', only variables.
Thus, there is no real way to make variations on the scoping scheme using hyperstatic scope, because that's all a read/compile time feature. Dynamic scope is impossible without tracking environments, which are otherwise unnecessary.
Now, if one were to use both first-class environments and boxes, one can theoretically have the best of both worlds, with only a little extra overhead, and most of that at compile time. Flexible scoping becomes possible by specifically referencing the variable in the environment instead of using the default boxed values. They would still be boxes, just referenced by symbol in the environment table.
Now what I'm wondering is whether there is any useful equivalence to be found between a box and an environment. Heck, just make a box an object that has a few hard coded slots (that can be accessed by array offset) and a hash table for the rest, and voila! It can replace cons cells and regular hash tables too :P All we need is a reason for making everything use the same core structure...
Yes, that's intentional. Because I wanted to compile to fast JavaScript, I chose an evaluation model that has a strict separation between compile-time and run-time.
---
"The 'hyperstatic scope' that Pauan keeps mentioning is another way of saying, 'there is no scope', only variables."
Actually, there is still scope. After all, functions still create a new scope. It's more correct to say that, with hyper-static scope, every time you create a variable, it creates a new scope:
box foo = 1 # set foo to 1
def bar -> foo # a function that returns foo
box foo = 2 # set foo to 2
bar; # call the function
In Arc, the call to "bar" would return 2. In Nulan, it returns 1. The reason is because the function "bar" is still referring to the old variable "foo". The new variable "foo" shadowed the old variable, rather than replacing it like it would in Arc. That's what hyper-static scope means.
From the compiler's perspective, every time you call "box", it creates a new box. Thus, even though both variables are called "foo", they're separate boxes. In Arc, the two variables "foo" would be the same box.
---
"Thus, there is no real way to make variations on the scoping scheme using hyperstatic scope, because that's all a read/compile time feature. Dynamic scope is impossible without tracking environments, which are otherwise unnecessary."
If by "dynamic scope" you mean like Arc where globals are overwritten, then that's really easy to do. As an example of that, check out Arc/Nu, which also uses boxes:
In the Arc/Nu compiler, it literally only takes a single line of code to switch between Arc's dynamic scope and hyper-static scope.
If by "dynamic scope" you mean like Emacs Lisp, then... actually that should be really easy as well. You would just replace the same symbol with the same box, and then use box mutation at run-time. Of course, at that point I don't think there'd be any benefit over run-time environments... In any case, I like lexical scope, and boxes work well for lexical scope.
Also, although Nulan has hyper-static scope, it does have dynamic variables. The way that it works in Nulan is, you create a box like normal...
box foo = 1
...and then you can dynamically change that variable within a code block:
w/box! foo = 2
...
Within the "w/box!" block, the variable "foo" is 2, but outside, it is 1. You can do this with any variable in Nulan.
---
"Now what I'm wondering is whether there is any useful equivalence to be found between a box and an environment."
No. And that's a good thing. I've found one of the major benefits of boxes is that they're a single stand-alone entity: each variable corresponds to a single box. In the environment model, you have a hash table which maps symbols to values, so you have a single data structure representing many variables.
The reason I prefer each variable being represented by a separate box is that it makes namespaces really really easy to design and implement. For instance, in Arc/Nu you can selectively import only certain variables:
(w/include (foo bar qux)
(import some-file))
This is really easy with boxes: you simply grab the "foo", "bar", and "qux" boxes and import them. But with environments, it's a lot harder, because the environment for the file "some-file" may contain all kinds of variables that you don't want to import.
You could take the values from the environment and copy them into the current environment, but now any changes made won't show up. With boxes, the changes show up. I don't think environments work well for namespace systems. But boxes work wonderfully well.
---
"It can replace cons cells and regular hash tables too :P"
I've thought about ways to replace hash tables with boxes, but I don't think it'd be especially helpful. Hash tables serve a different purpose from boxes, so it makes sense for a language to have both data structures. If you want a box, just use a box. If you want a hash table, just use a hash table.
Interestingly enough, both Nulan and Arc/Nu have a giant hash table at compile-time that maps symbols to boxes. This is basically the compile-time equivalent of the run-time environments that languages like Kernel have. Creating a new scope or changing the existing scope is as easy as changing this hash table, which makes it really easy to play around with different namespace systems.
"The main difference to note in the box formulation is that there is no such thing as an environment."
Doesn't step 1 need to use environment(s)? How else would it turn text into a structure that contains references to previously defined values?
---
"The 'hyperstatic scope' that Pauan keeps mentioning is another way of saying, 'there is no scope', only variables."
The hyper-static global environment (http://c2.com/cgi/wiki?HyperStaticGlobalEnvironment) is essentially a chain of local scopes, each one starting at a variable declaration and continuing for the rest of the commands in the program (or just the file). I think the statement "there is no scope" does a better job of describing languages like Arc, where all global variable references using the same name refer to the same variable.
---
"Flexible scoping becomes possible by specifically referencing the variable in the environment instead of using the default boxed values. They would still be boxes, just referenced by symbol in the environment table."
I don't understand. If we're generating an s-expression and we insert a symbol, that's because we want that symbol to be looked up in the evaluation environment. If we insert a box, that's because we want to look up the box's element during evaluation. Are you suggesting a third thing we could insert here?
Perhaps if the goal is to make this as dynamic as possible, the inserted value should be an object that takes the evaluation environment as a parameter, so that it can do either of the other behaviors as special cases. I did something as generalized as this during Penknife's compilation phase (involving the compilation environment), and this kind of environment-passing is also used by Kernel-like fexprs (vau-calculus) and Christiansen grammars.
---
"Now what I'm wondering is whether there is any useful equivalence to be found between a box and an environment."
I would say yes, but I don't think this is going to be as profound as you're expecting, and it depends on what we mean by this terminology.
I call something an environment when it's commonly used with operations that look vaguely like this:
String -> VariableName
(Environment, VariableName) -> Value
(Environment, Ast) -> Program
Meanwhile, I call something a box when it's primarily used with an operation that looks vaguely like this:
Box -> Value
This might look meaningless, but it provides a clean slate so we can isolate some impurity in the notion of "operation" itself. When a mutable box is unboxed, it may return different values at different times, depending on the most recent value assigned to that box. Other kinds of boxes include Racket parameters, Racket continuation marks, Racket promises, JVM ThreadLocals, and JVM WeakReferences, each with its own impure interface.
When each entry of an environment needs to have its own self-contained impure behavior, I typically model the environment as a table which maps variable names to individual boxes. The table's get operation is pure (or at least impure in a simpler way), and the boxes' get operation is is impure.
You were wondering about equivalences, and I have two in mind: For any environment, (Environment, VariableName) is a box. For any box, we can consider that box to be a degenerate environment where the notion of "variable name" includes only a single name.
I think he's talking about run-time environments a la Kernel, Emacs Lisp, etc.
Nulan and Arc/Nu use a compile-time environment to replace symbols with boxes at compile-time. But that feels quite a bit different in practice from run-time environments (it's faster too).
---
"Are you suggesting a third thing we could insert here?"
Once again, I think he's referring to run-time environments. Basically, what he's saying is that you would use boxes at compile-time (like Nulan), but you would also have first-class environments with vau. The benefit of this system is that it's faster than a naive Kernel implementation ('cause of boxes), but you still have the full dynamicism of first-class run-time environments. I suspect there'll be all kinds of strange interactions and corner cases though.
---
Slightly off-topic, but... I would like to point out that if the run-time environments are immutable hash tables of boxes, you effectively create hyper-static scope, even if everything runs at run-time (no compile-time).
On the other hand, if you create the boxes at compile-time, then you can create hyper-static scope even if the hash table is mutable (the hash table in Nulan is mutable, for instance).
I'm not sure I understand your argument... aside from macros, how often do you use symbols? The only other case I can think of is using symbols as the keys of hash tables:
(= foo (obj))
(foo 'bar)
But in Nulan, hash tables use strings as keys, so that's no problem. As for your "really common list manipulation"... my system is actually better for that. Compare:
(cons a b) ; Arc 3.1
`(,a ,@b) ; Arc 3.1
(list a b c) ; Arc 3.1
`(,a ,b ,c) ; Arc 3.1
(join (list a) b (list c)) ; Arc 3.1
`(,a ,@b ,c) ; Arc 3.1
'a ,@b # Nulan
{a @b} # Nulan
'a b c # Nulan
{a b c} # Nulan
'a ,@b c # Nulan
{a @b c} # Nulan
---
"(f a b @c) looks so much nicer to me..."
I agree. Nulan supports @ splicing for both pattern matching and creating lists:
# a is the first argument
# b is everything except a and c
# c is the last argument
-> a @b c ...
# works for list destructuring too!
# this function accepts a single argument, which is a list
# a is the first element of the list
# b is everything in the list except a and c
# c is the last element of the list
-> {a @b c} ...
# in addition to working for function arguments, it also works for assignment
# a is 1
# b is {2 3 4}
# c is 5
box {a @b c} = {1 2 3 4 5}
# and of course you can nest it as much as you like
box {a {b c {d} @e}} = {1 {2 3 {4} 5 6 7}}
# equivalent to (join (list a) b (list c)) in Arc
{a @b c}
# equivalent to (apply foo bar qux) in Arc
foo bar @qux
# equivalent to (apply foo (join bar (list qux)))
foo @bar qux
---
"I.e., if (car (cdr (obj key))) returned a box, you would be able to just call '= on it, with no thought for how you found the value."
In Nulan, you could do what you're talking about, except it would all have to be done at compile-time, because boxes only exist at compile-time. In addition, you wouldn't be able to set the box directly, you would instead generate code that sets the box at run-time, i.e. macros.
Nulan has the restrictions that it does because I wanted to compile to really fast JavaScript. Other languages (like Arc) don't have that restriction, so it should be possible to design a language that has boxes at run-time, in which case your idea should work.
I actually really like symbols, and the existence of a symbol type in lisp is one of my favorite features. Technically in Nulan a 'symbol' is replaced by a 'box', and if your box had a string property called "name" that held the name of the variable they would probably be interchangeable.
Either way, I agree that your list tools are generally more useful than quote/unquote. The main things I use it for are 1) to get a literal symbol and 2) to get something like list splicing. It looks rather arcane and adds clutter though, so in most other cases I wouldn't use it.
If there was another way to just splice in a list in the middle of the code the way Nulan seems to, I might not feel as attached to it.
"I actually really like symbols, and the existence of a symbol type in lisp is one of my favorite features."
I too like symbols. But I think if you examine why you like symbols, you'll realize that you like them because... most other languages don't have a first-class way to refer to variables. But in Lisp you can, using symbols.
Well, Nulan has both symbols (representing unhygienic variables), and boxes (representing hygienic variables). It's just that hygienic variables are so much better in so many situations that there's not much reason to make it easy to create symbols in Nulan.
---
"Technically in Nulan a 'symbol' is replaced by a 'box', and if your box had a string property called "name" that held the name of the variable they would probably be interchangeable."
Yes boxes have a name property. This is currently only used when printing the box. Yes you could convert from a box to a symbol, but Nulan doesn't do this, because I haven't found a reason to.
Actually, there is ONE situation I've encountered where I would have liked to use symbols... in my playlist program, I have a list of file extensions which are considered "audio". Here's how it looks in Arc:
'(flac flv mid mkv mp3 mp4 ogg ogm wav webm wma)
In Nulan, that would have to be written like this:
This relates to yet another idea I think I've actually mentioned before.
Namely, making the lists that form the code/ast have reverse links, so you can mutate above the macro call level, instead of just insert arbitrary code in place. This wouldn't be feasible for general lists, as it is possible for a sub-list to be referenced in more than one place, but for code, each piece is generally considered unique even if it looks the same.
Anyway, this would allow for affects ranging from splicing to arithmetic and other much more evil and nefarious but possibly useful effects. I haven't thought through all of the implications, and I bet most of them are negative, but it would still be interesting to consider.
An implementation of intermediate splicing would be something like:
I wrote the rest of this post thinking you were talking about the first one, but right at the end I realized I wasn't so sure. :)
---
"Namely, making the lists that form the code/ast have reverse links, so you can mutate above the macro call level, instead of just insert arbitrary code in place."
I'll make an observation so you can see if it agrees with what you're thinking of: The expression "above the macro call level" will always be a function call or a special form, never a macro call. If it were a macro call, we'd be expanding that call instead of this one.
---
For these purposes, it would be fun to have a cons-cell-like data structure with three accessors: (car x), (cdr x), and (parent x). The parent of x is the most recent cons-with-parent to have been constructed or mutated to have x as its car. If this construction or mutation has never happened, the parent is nil.
Then we can have macros take cons-with-parent values as their argument lists, and your macro would look like this:
Unfortunately, if we call (list 1 2 (splice 3 4) 5), then when the splice macro calls (parent list), it'll only see ((splice 3 4) 5). If it calls (parent (parent list)), it'll see nil.
---
Suppose we have a more comprehensive alternative that lets us manipulate the entire surrounding expression. I'll formulate it without the need to use conses-with-parents or mutation:
; We're defining a macro called "splice".
; The original code we're replacing is expr.
; We affect 1 level of code, and our macro call is at location (i).
(mac-deep splice expr (i)
(let (before ((_ . args) . after)) (cut expr i)
(join before args after)))
If I were to implement an Arc-like language that supported this, it would have some amusingly disappointing consequences:
As to using JSON source. Yeah, it's a bit ugly, and you have to type more than with S-expressions. BUT: the typing overhead is constant, so I don't really worry about it. After all, the real typing savings come from the macros that are enabled by this syntax.
What I really like about it is that by using JSON as syntax, Wat is JavaScript on a fundamental level. Wat can coexist within JS files, heck even within single JS functions. The ease of deployment enabled by this is immense.
Yes, that is a nice feature. I also like the fact that it results in an extremely small base, and more separation-of-concerns. Someone else can write an sexp to json parser rather easily.
I'm just lazy enough to wish that someone else had already done so :P
"I'm just lazy enough to wish that someone else had already done so :P"
It's funny, I've done exactly that recently, with the same goal of getting away from JavaScript with as little cruft as possible. My goal was to make a hackish little language (nevertheless better than JS) and then make a better language on top of it.
Awkwardly, by the time I had a nice reader, I realized I didn't have any particular semantics in mind for the little language! So I decided to work on the semantics I really cared about instead. :-p Although this semantics is still incomplete, the reader is there whenever I'm ready for it. Maybe it could come in handy for Wat.
It's part of Era (https://github.com/rocketnia/era), and specifically it depends on the two files era-misc.js and era-reader.js.
It's yet another read-table-based lisp reader. Every syntax, including lists, symbols, and whitespace, is a reader macro. My implementation of the symbol syntax is a bit nontraditional, because I also intend to use it for string literals.
I specifically designed it so it can run stand-alone, without any dependencies on the rest of Nulan. It takes a string and returns a JS array of strings, numbers, and Symbols. You can easily write a simple function to transform the output so that it's compatible with wat.
You're also free to modify it, though I'm not sure what license to use for Nulan. It'll probably be either the BSD/ISC license or Public Domain.