We'll all be waiting ^^. How'd you implement closures? As a structure or just an array? Boehm might get confused in the array case if you're using the first entry of the array as a type tag (which isn't a pointer). Or maybe not; I haven't studied Boehm GC very well.
As for GC: what kind did you write? Copying or mark-and-sweep? If it's marking, I'd suggest a mark-and-don't-sweep collector. I think most incremental and thread-friendly modern GCs are copying, though.
Edit: as for me doing the macro hacking stuff, well, it looks like I'm all hacked out. Hehehe^^
Hmm... I'm not very good at terminology, but I'm almost sure it's a mark-and-sweep. The implementation relies on the system malloc. Every time some memory is required, the user calls gc_malloc. This function calls malloc, stores the returned pointer in an array, and returns that pointer. Once the array is full (we're not talking about consumed memory yet, but about the number of pointers handed out, so it can break down if you build very big objects), collection is performed: everything not reachable from the stack (or recursively from reachable objects) is freed. It has to be improved, but for now it works.
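In C, the allocator looks roughly like this (the table size and the details of gc_collect are just for illustration, not the exact code):

/* rough sketch of the allocator described above */
#include <stdlib.h>

#define GC_TABLE_SIZE 4096            /* arbitrary, for illustration */
static void *gc_table[GC_TABLE_SIZE];
static size_t gc_count = 0;

static void gc_collect(void) {
  /* mark everything reachable from the stack (and, recursively, from
     reachable objects), free the rest, then compact gc_table and
     reset gc_count - elided here */
}

void *gc_malloc(size_t n) {
  if (gc_count == GC_TABLE_SIZE)      /* pointer table full:       */
    gc_collect();                     /* collect before allocating */
  void *p = malloc(n);
  gc_table[gc_count++] = p;           /* remember the new pointer  */
  return p;
}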
I implemented closures as an array of longs. Very easy to deal with. The first element is the type tag, the second is the goto label, the third is the size of the array (we need it for garbage collection), and all the others are the arguments (well, they are objs, but objs are implemented as longs).
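In C, building one of these closures would look something like this (the tag value and the helper's name are made up for the sketch; gc_malloc is the allocator above):

/* illustrative sketch only; the tag value and name are invented */
#include <stddef.h>

typedef long obj;

#define T_CLOSURE 1L            /* assumed type-tag value */

void *gc_malloc(size_t n);      /* the allocator sketched above */

obj *make_closure(long label, long nargs, obj *args) {
  obj *clo = gc_malloc((3 + nargs) * sizeof(obj));
  clo[0] = T_CLOSURE;           /* type tag                          */
  clo[1] = label;               /* goto label to jump to when called */
  clo[2] = 3 + nargs;           /* array size, for the collector     */
  for (long i = 0; i < nargs; i++)
    clo[3 + i] = args[i];       /* the captured arguments            */
  return clo;
}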
It does indeed seem to be a mark-and-sweep. Generally, though, most GCs manage the heap themselves: they allocate one big chunk of memory via malloc() and then allocate from that.
"Mark" means to determine if a memory area is accessible. Usually this means setting some sort of bit or variable for each memory area. After you've marked all reachable memory, you perform a "sweep": any unmarked memory is freed.
A slightly more efficient algorithm than mark-and-sweep is mark-and-don't-sweep (obviously, because you skip the "sweep" step), but it requires us to manage the heap directly. Here's an explanation:
Each memory area in the heap has a "free/in-use" bit. This bit's sense of "free" can vary. For example, at any one time, all "free/in-use" bits may have the meaning:
0 = FREE
1 = IN-USE
At another time, however, the meaning might be:
0 = IN-USE
1 = FREE
The magic here is the way the free/in-use bit is interpreted by the memory manager.
Now suppose we need to allocate a chunk of memory that is too large for the free area the alloc pointer currently points at:
      |-------| <- I need something this big
+---+-----+--------------+---+------------+---------------+-------+
| 1 |  0  |      1       | 1 |     0      |       1       |   1   |
+---+-----+--------------+---+------------+---------------+-------+
      ^
      alloc pointer
Obviously, we have to skip the free memory that's too small. However, let me introduce an invariant: everything to the left of the alloc pointer must be in-use. So if ever we skip free memory that's too small, we still mark it in-use, but we don't return it (obviously, it's too small!). Instead we continue over to the next free memory and see if that is large enough, and so on.
In this case the very next portion of memory is available:
      |-------| <- I need something this big
+---+-----+--------------+---+-------+----+---------------+-------+
| 1 |  1  |      1       | 1 |   1   | 0  |       1       |   1   |
+---+-----+--------------+---+-------+----+---------------+-------+
                                |       ^ alloc pointer
                                v
                            returned
...and afterwards... uhh... well.... we just allocate as normal, except the meaning of 0/1 of the free/in-use bit has flipped. "Don't sweep". Thus our sweep step is part of our allocation.
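In code, the allocation loop might look roughly like this (the heap layout and names are invented to illustrate the algorithm, not taken from any real collector):

/* sketch only: block layout and names are invented */
#include <stddef.h>

typedef struct block {
  size_t        size;     /* payload size in bytes  */
  int           mark;     /* the free/in-use bit    */
  struct block *next;     /* next block in the heap */
} block;

static block *alloc_ptr;  /* invariant: everything to its left is in-use */
static int    in_use = 1; /* current sense of the mark bit; it flips at
                             each mark phase, so formerly in-use blocks
                             become "free" without any sweep */

void *dont_sweep_alloc(size_t n) {
  for (block *b = alloc_ptr; b != NULL; b = b->next) {
    if (b->mark != in_use) {    /* a free block */
      b->mark = in_use;         /* preserve the invariant, even if we
                                   end up skipping this block */
      if (b->size >= n) {       /* big enough: return it (splitting off
                                   any leftover space is elided) */
        alloc_ptr = b->next;
        return b + 1;
      }
    }                           /* too small, or already in-use: skip */
  }
  return NULL;  /* heap exhausted: mark reachable blocks, flip in_use,
                   reset alloc_ptr, and try again */
}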
As an aside: I've started writing an 'eval function for use with macros in arc2c. This is done by creating a new "eval" function using (make-eval) in make-eval.arc. It's not done yet though.
My plan is that for each compilation run, we (make-eval) a new version of 'eval. Why? Because we want to protect the global environment.
For example, the user code might want to use the following macro:
(mac xe body
  `(tag (div class 'xe)
     ,@body))
Unfortunately, 'xe is a function defined and used by arc2c. If we were to simply 'eval all 'mac forms, then user code could trash arc2c.
Instead, we create a "protected" eval. This eval, when used, will prevent writes to global variables. Instead, writes to global variables will mutate a global-variable table, not the "real" global variables.
However, it's not done yet; there are a bunch of TODOs floating around. And unfortunately, I might not be able to work on this for a week. Or maybe two weeks, or maybe a month.
A friend of mine has a pretty big personal Real Life(TM) problem (it involves, like nearly every big personal RL problem, a member of the opposite sex). I'll need to help him for now. Sorry.
(the guy will, usa embassy willing, be in san francisco, california, usa a month from now. he's had to borrow quite a bit from his friends too, so we're all pretty tapped out and can't accompany him. err. just wondering if someone near there could keep an eye on him.)
The code for the 'eval interpreter is on github. Anyone who wants to try continuing it is welcome to do so. You're even welcome to completely dismantle that bit and start some other way of doing macros.
That looks clever, and not too complicated... I'll try to implement it when I have enough time... As a matter of fact, dealing with heap space myself would let me reduce the size of closure objects (I wouldn't need to know the # of arguments they hold anymore).
Well, good luck with your friend, and see you soon!
That works, but it only eliminates half as much nesting. Would it work for more with first-class macros? (I'm just going to expand it into a compose call because I don't want to make it all one symbol)
Exactly what do you mean? Macros seem to work in functional position, at least.
arc> (and:or nil t)
t
> But really, what improvement in readability do you get with the 'block style?
I agree with you on this. 'block doesn't really remove parens, it just moves them around and removes the nested structure. According to pg's Ansi Common Lisp, "Lisp programmers read and write code by indentation, not by parentheses." I find nested code to be more readable than flat code for exactly that reason, even if the parens stack up at the end of the block.
Which is exactly what I meant by "macros seem to work in functional position." But almkglor's original statement, "compose does not work with macros", does not take into account the special treatment of 'compose in functional position, which is why I was confused.
And if I am not mistaken, first class macros, once implemented, could be used in other situations like your example above.
Looking at almkglor's comment, it appears to indicate that a literal 'compose doesn't work; when it's optimized out, it works fine. In other words, everyone is vehemently agreeing with each other ;)
And of course, first class macros could do this. But we don't have those yet…
I find it more readable personally, and definitely easier to write, if there are fewer parentheses at the end of the block. Also, the lack of indentation keeps things lined up nicely, without spilling them off the right edge of the screen.
But really, the block macro defines context (in fact, I considered calling it "context"). That is the semantic meaning. It gives you a list of expressions that define the context for the rest of the list.
I personally find it easier to read, but I am curious about whether other people do, and why or why not. You said the details of 'repeat don't look special anymore. Do you think they should?
Unless I've messed up the parentheses, those should be equivalent. I think the problem is that expand-metafn-call is being called by setforms when it shouldn't be. (Or alternatively, it shouldn't give up and die.) The example works if you first turn expand-metafn-call into a nop:
No, it was a mistake by me: Anarki works similarly to ArcN in this case. Sorry for muddying the waters. ^^
The problem appears to be that the current '= was meant for use with the older ssyntax, which supports only : for composition. I'll take a better look later, maybe; I'm just home from work, cooling off (very hot in the Philippines right now).
The "invert" thing is really confusing, I didn't add docstrings to all the '= related stuff because of that. Me, I say refactor '=. LOL
Edit: Fixed on the git.
(def metafn (x)
  (set x (ssyntax x))
  (and (acons x) (in (car x) 'compose 'complement)))
True. I'm trying to hack the macro stuff, to not much effect. Erk.
In fact quite a bit of arc.arc is now compilable, although you do have to transform (def ...) to (set .... (fn ...)) manually. So really what's needed now is macros. I'm also trying to think of how best to implement optional args and destructuring args - probably by building them out of rest arguments (for optional args) and 'let forms (for destructures).
Macros should be easier to implement once the compiler is able to compile itself, because this way the compiler and the compiled macro have the same internal representation of data structures, so passing arguments between the two shouldn't be too hard.
There are several uses of macros in the compiler, unfortunately. In particular the 'def macro is too much of a convenience. So in order for the compiler to easily compile itself, it first has to implement macros. Chicken, meet egg.
Ah heck, maybe I should just use 'eval now and implement a compiled 'eval interpreter later that can interpret code and yet allow interpreted code to call compiled code and vice versa.
In fact I already have a bit of a sketch for this (which is necessary if we want to allow compiled programs to use 'eval). Basically: put interpreted '(fn ...) forms into an 'interpreted-fn annotated type together with the surrounding environment, then add an entry to the 'calls* table (via defcall, say) for 'interpreted-fn pointing to, say, a $$interpreted-fn-apply function, which binds the parameters into an environment table and calls the 'eval interpreter.
Of course this requires some changes in the base system: we need at the very least a %symeval primitive which, when given a symbol, returns its global binding, and a %symset primitive which modifies a symbol's global binding; and obviously we need a link from each symbol to the GLOBAL() array (dynamically creating new containers for newly created symbols - if a global isn't in the GLOBAL() array, the compiled code will never read it anyway; only the interpreted code ever will).
The rest of the interpreter is just a standard scheme interpreter, the only real support we need is to be able to call compiled-from-interpreted and interpreted-from-compiled, and the reading and binding of global symbols, including those that aren't in the GLOBAL table.
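In C, the bridge might look roughly like this (every name here - interp_fn, env_bind, eval_interp - is invented for the sketch):

/* sketch of the 'interpreted-fn bridge; all names are invented */
typedef long obj;

typedef struct {
  long type;    /* T_INTERPRETED_FN tag                 */
  obj  code;    /* the '(fn ...) form, kept as a list   */
  obj  env;     /* the captured interpreted environment */
} interp_fn;

obj env_bind(obj env, obj fnform, int argc, obj *argv); /* assumed */
obj eval_interp(obj fnform, obj env);                   /* assumed */

/* registered (via defcall, say) in the calls* table for the
   'interpreted-fn type: bind the arguments into a new environment
   frame, then hand the body to the 'eval interpreter */
obj interpreted_fn_apply(obj f, int argc, obj *argv) {
  interp_fn *fn = (interp_fn *) f;
  obj env = env_bind(fn->env, fn->code, argc, argv);
  return eval_interp(fn->code, env);
}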
Ouch, my head hurts. And sacado's the one doing the Unicode strings. LOL
Would it be worth implementing 'def directly? This would give a lot more functionality right away. This could be temporary until macros are implemented.
Possibly. There's a bunch of "macro" transformations in xe.arc, possibly I'm just a bit too lazy to think. However I don't like depending on those transforms, I want to do it "properly"
For the global vars problem, a solution could be to associate top level values directly with the symbol, this way a symbol would consist of three values: its string representation, its global value (initially a special 'unbound value') and a property list.
The current style has an optimization where all globals are simply referenced directly from an array in O(1). I'd rather have symbols point to entries in this array, because symbol-as-global-variable lookups are expected to be completely nonexistent if 'eval isn't involved in the program anyway (who uses 'eval in a language with 'read?). Only newly created symbols must have freshly allocated variable slots, and only for the benefit of 'eval'ed code - we already know the global variables in the compiled code, because the compiler needs that info anyway.
Basically:
typedef struct {
  long type; /*T_SYM*/
  char* stringform;
#ifdef EVAL_USED
  obj* binding;
#endif
} symbol;

int main(){
  /* compiler-generated only if eval is used */
  obj sym; symbol* sympt;
  sym = SYM2OBJ("globalvar0");
  sympt = (symbol*) sym;
  sympt->binding = &GLOBAL(0);
  sym = SYM2OBJ("globalvar1");
  sympt = (symbol*) sym;
  sympt->binding = &GLOBAL(1);
  ...
}
This way the current performance is retained (global variable lookups are O(1)).
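With that layout, the %symeval and %symset primitives become trivial - a sketch, using the symbol struct above:

obj symeval(obj sym) {
  return *((symbol *) sym)->binding;        /* read a global binding  */
}

obj symset(obj sym, obj val) {
  return *((symbol *) sym)->binding = val;  /* write a global binding */
}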
I don't know how well this solution will hold up once support for dynamic load (e.g. from the REPL) has to be implemented, because you'll have to keep an index of the last global variable created across different compilation sessions. With threads it gets even more complicated (a mutex on the index?). With symbols it would be simpler to implement dynamic loading or defining a global var from the REPL. The price paid is slightly slower access to global variables, because two memory references are necessary for every reference to a global var. Global variable lookups are still O(1) though, e.g. sym->binding for read access and sym->binding = value for write access.
> the last global variables created across different compilation sessions
I don't understand this part. I was proposing that 'eval would be an interpreter, not a compiler. My intention was that compiled code would be statically generated (the way it's done now), so 'eval cannot possibly compile code. It would be a compiled interpreter of Arc. arc2c is a static compiler, so 'eval won't ever add compiled code; the best it can do is create an 'interpreted-fn object that contains an interpreted function's code (as a list) and the enclosing interpreted environment.
So a dynamic load would just interpret the expressions in the file being loaded:
(set load
  (fn (f)
    (w/infile s f
      (whilet e (read s)
        (eval e)))))
'eval would be able to access the global variable table indirectly via the symbols and %symeval/%symset.
Basically, 'eval would be compiled to something like this:
(set eval
  (fn (e (o env nil))
    (if (isa e 'sym)
        (if env (lookup-environment env e)
            (%symeval e))
        (...))))
Also: if the compiled code doesn't reference a global, it won't be in the GLOBAL() array. The reason is simple: the compiled code won't reference it, ever. If 'globalvar isn't in GLOBAL(), then it does not exist in the compiled code, and that doesn't matter - the compiled code never referenced that global, so it will never use an index into the GLOBAL() array to refer to it. The interpreted code might, but that's why we have an indirect reference through the symbol.
Also, when I say O(1), I mean O(1) with the number one, as in only one layer of indirection (an index within a table). If global bindings are kept with the symbol only, then all global accesses - even precompiled ones - need (1) to find the symbol and (2) get the binding, for a total of O(2).
In other words: 'compile-file compiles, but it creates a new executable which is never connected to the process that ran 'compile-file. 'eval just interprets, and if the interpreted code mutates a global of the program, then that global gets mutated for real, in the program (what are you doing using 'eval on untrusted code anyway?). But if the interpreted code mutates a global that is never used in the program, it just creates a new global variable, one which is never referenced by the program (by definition, because the program never used it).
I thought 'eval compiled code, loaded it, and then executed it. I was mistaken. With the compiled code completely static, your strategy is better than assigning values to symbols.
Hmm. After a bit of research, it seems that some VM's handle this by using pthreads a few times (presumably one per core), then having each pthread spawn several green threads.
This seems like a good solution if we can arrange to spawn a new pthread for blocking I/O as well, maybe by intercepting the I/O functions and moving the caller of the I/O operation into its own pthread. This hybrid would make the compiler code not very clean, though, because it would have to handle both green threads and pthreads.
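Here's a rough sketch of handing one blocking read to its own pthread; a real scheduler would yield to other green threads instead of joining immediately, and all the names are illustrative:

/* illustrative sketch: push a blocking read() into its own pthread */
#include <pthread.h>
#include <unistd.h>

typedef struct {
  int     fd;
  char   *buf;
  size_t  n;
  ssize_t result;
} io_req;

static void *io_worker(void *arg) {
  io_req *r = arg;
  r->result = read(r->fd, r->buf, r->n);  /* block here, not in the
                                             green-thread scheduler */
  return NULL;
}

ssize_t green_read(int fd, char *buf, size_t n) {
  io_req r = { fd, buf, n, -1 };
  pthread_t t;
  pthread_create(&t, NULL, io_worker, &r);
  /* a real scheduler would run other green threads here and poll for
     completion; joining immediately is just to keep the sketch short */
  pthread_join(t, NULL);
  return r.result;
}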
Here's an alternative using Anarki scanners (or if you prefer, you can always use coerce, which is cheaper on memory; however, scanners are lazier and, if you need to process less than maybe 1/4 of the original string, are cheaper in memory too):
(def shortify (i s)
  (let shorten nil
    (= shorten
       (fn (i ss)
         (if (is i 0)
             nil
             (when ss
               (pr:car ss)
               (if (and (caris ss #\newline) (is (cadr ss) #\newline))
                   (let ss (cdr ss)
                     (while (caris ss #\newline)
                       (prn)
                       (zap cdr ss))
                     (shorten (- i 1) ss))
                   (shorten i (cdr ss)))))))
    ; replace with (coerce s 'cons) to taste
    (tostring:shorten i (scanner-string s))))
Untested of course, and you better make sure the parens match, I don't have access to vi right now ~.~ ^^.
Edit: s/l/ss/, because l (small letter L) looks like 1 (number one)
Edit2: err. Actually I thought i would be number of paras. Oh well. At least now you have more generic code ^^:
Same with defop. Accessing the GET/POST arguments to an operation is done via (arg req "foo"). It would have been worlds nicer if I could just define something like
Macros are processed before anything else - before anything is even executed:
your code
|
v
macro expander
|
v
Arc core
So if part of your execution is assigning the macro, then it won't work.
In your first case, this is what happens:
arc> (set my-plus (annotate 'mac (fn (x y) `(+ ,x ,y))))
|
v
macro expansion
(hmm, I don't see any macros
to expand... oh well!)
|
v
(set my-plus (annotate 'mac (fn (x y) `(+ ,x ,y))))
|
v
Arc core
(okay, so I construct a fn,
annotate it with 'mac, then
store it in my-plus... done!)
arc> (my-plus 19 71)
|
v
macro expansion
(hmm.. oh! my-plus is a macro!
lemme expand that...)
|
v
(+ 19 71)
|
v
Arc core
(okay, so I add 19 and 71....
hey, the answer is 90!)
However, for your second case:
arc>((fn () (set my-plus (annotate 'mac (fn (x y) `(+ ,x ,y)))) (my-plus 19 71)))
|
v
macro expander
(meh, I don't see any macros,
how boring...)
|
v
((fn () (set my-plus (annotate 'mac (fn (x y) `(+ ,x ,y)))) (my-plus 19 71)))
|
v
Arc core
(okay, so there's this function
I have to execute... load
my-plus with some object I don't
know about... hey! Why am I
trying to call some object I
don't know about?? Error handler!
Tell the boss there's a problem!)
|
v
Error: "Function call on inappropriate object #3(tagged mac #<procedure>) (19 71)"
In other words: local macros don't exist. On the Anarki, there's this hacky thing in "lib/macrolet.arc" which simulates local macros, called 'maclet, 'macwith etc - be warned though, it's guaranteed to be buggy.
Awesome, thanks for the explanation. Is this a language design choice, or is there some fundamental reason that it must be so? I ask because the ((fn () ... )) case works in rainbow - but I would like to have a set of tests that behave the same way on ac.scm as well as on rainbow - this might ultimately prove useful to other arc implementors too ...
The fundamental reason is really the problem of how to implement macros.
Most Lisps are targeted towards compilation. So what happens is really like this:
your code
|
v
macro expander
|
v
compiler
This means that if you define this code:
(mac my-add (x y)
  `(+ ,x ,y))

(def my-function (x y z)
  (my-add (my-add x y) z))
Then the macro expander will expand the code to
(set my-function
  (fn (x y z) (+ (+ x y) z)))
But if we really, really wanted to have macros as first class, then how would the following code get compiled?
(def my-function (x y)
  (my-oper x y))

(mac my-oper (x y)
  `(+ ,x ,y))
(pr (my-function x y))

(mac my-oper (x y)
  `(- ,x ,y))
(pr (my-function x y)) ;exactly the same call, completely different meaning

(mac my-oper (x y)
  `(* ,x ,y))
(pr (my-function x y)) ;exactly the same call, completely different meaning
If we supported macros as first-class objects, then a "compiled" program would have to recompile itself while running, because the macros might have changed between invocations. In such a case, you might as well have just stuck with the interpreted version.
The problem isn't intractable (you could do JIT compilation and check whether the macro in the variable is still the same as the older macro you used), but it's not easy either. And besides, most people don't find a need to redefine macros anyway. So most Lisps just punt: the rule is, the macro must exist before the Lisp reads the code.
But couldn't you decide that any macro was only evaluated once (so the my-oper in my-function wouldn't change), but macros were searched for in the lexical namespace anyway? This would mean that any call with a lexical in the functional position would have to be checked for macro-expansions at runtime, of course, but it would be slightly more reasonable.
Yes, but again: compilation during runtime. Meaning (most likely) some sort of JIT. Wanna try to implement this? You could start by hacking this onto pg's arc-to-scheme implementation.
Okay. Be careful to still be able to properly handle environments, without actually turning it into an Arc interpreter.
pg's ArcN is really an Arc-to-Scheme compiler. And I dearly suspect that this was the main reason for not implementing first-class macros. Macros are intended to work on the code before the compiler does, so having true first-class macros is a little difficult.
2) Many functions have been decoupled from their primitives. Functions that need access to primitives must be declared as library functions in lib-ac.scm.arc.
For most cases, functions will be inlined anyway, so the resulting code will be practically the same as in previous versions, but at least the current version could do something like: