Just a question, but I wonder if you could give eds write access to the repo? And of course, if there's anyone else out there who would like to contribute, just let us know, I think sacado would be glad to accommodate you ^^
I'd like to have access too. Currently I don't have much free time to contribute, but if I find some time i'll be glad to contribute. My github username is, surprisingly, 'stefano'.
1) pg is rumored to have actually implemented Arc several times, on various Lisps. I would assume he went for the more portable approach, one which would work on both CL and Scheme.
2) There "should be" no real difference between continuations and closures anyway. For example, in arc2c, everything is converted to CPS anyway.
The portability thing was my initial thought on why he chose to eschew first class continuations for a closure-based CPS architecture. For some reason I decided to read over his "Lisp in Web Applications" essay again and I noticed that the mechanism he was using in Arc was described in perfect detail in his essay, and that got me thinking--was he just using this method because that's what's been in Arc all along? Are there other advantages outside of portability to different implementation languages that can be derived from using closures vs native continuations? Now that Arc is finally available for human consumption and running on mzscheme is there any barrier to porting the srv.arc application to a native continuations-based architecture?
These are just the thoughts I have wondering around in my head as I read through the Arc source. I think Arc is valuable for two reasons: 1) as a language to get things done, 2) as a crash course in language and library design. The second attribute of Arc is the one I find most interesting as it's rare that we get to look into the design of a language and its libraries at such an early and still uncluttered state. Add to that the fact that it's written in Scheme and you have a perfect case study for stimulating conversation.
Anyway, just my thoughts. I appreciate the response, and I'm hoping to get a few more, perhaps even one from Paul himself.
Basically instead of a cons cell (as suggested by stefano) I use a new structure, the "sharedvar", which is just a container for an obj. These also means that the actual closure objects are immutable after creation.
Only local variables that are ever set are put in sharedvar's, other local variables are kept in the flat closure. However I don't analyse for variables that are set only once, or which aren't shared by other closures yet.
Yeah, this is the approach taken by chez, mlton, and many recent compilers of functional languages.
And all the optimization stuff(unboxing of "sharedvar", inlining, type inference, known function analysis, unused variable elimination, etc) can be made agressive by only global flow analysis, which is rather time-consuming. But I'm curious to know how far a relatively conservative compiler which doesn't do any flow analysis is able to go. If stalin performs a little worse, but compiles 10x times faster, I believe that much more people would use it.
Sharedvar unboxing is a little difficult, since there are two types of local variables: those in closures, and those in parameters:
(let kept ()
(set keeper
(fn (x)
(set kept (cons x kept)))))
; vs.
(set fooer
(fn (x)
(set x (rev x))
(do-something x)))
So basically we need two types of local-variable-set primitives: one for closures-variable-set (first case above) and another for parameter-variable-set (second case)
Re: Stalin - is it that slow? Meaning an order of magnitude improvement of time is needed to make it comfortable?
Edit:
Type inference: well I can't think of a good way of getting type inference generically, but certainly it's possible for e.g. '+. '+ requires that all parameters are either numbers, or all strings, or all lists, and if we can determine that one parameter is of a specific type, we can put the checking that the other parameters are of that type and immediately bind the + to the specific type.
For example if we have %n+ for numeric addition, %s-join for string concatenation, and %l-join for list concats:
(+ x y z)
=>
(+ x y z) ; can't determine type
(+ 1 x)
=>
(%n+ 1 (let check x
(if (is (type check) 'num)
check
(err "+: type mismatch"))))
(+ (list 1 2 3) x)
=>
(%l-join (list 1 2 3)
(let check x
(if (is (type check) 'cons)
check
(err "+: type mismatch"))))
Stalin might be the most optimizing but slowest functional language compiler ever written.
Sharedvar unboxing is not an important issue because it doesn't make much difference in efficiency. Most scheme programs don't update local variables very often. The most useful optimization parts are(in my opinion): Special treatment of let and letrec, inlining and known function detection.
General ML-style type inference for scheme is impossible. What we can do is to infer as more types as possible.
1)Trasforming let and letrec to ((fn (...) ...) ...) is not efficient. First, it would allocate a closure. Second, it would perform a function call. Instead, the variable bound by let and letrec should be allocated on the stack, and no function calls are needed.
2)For example, in:
(f x)
If f is statically known, and f's environment is null or is the same as the environment of the calling site, then the function call should be a direct jump. It eliminates the cost of (1)global fetching of 'f, (2)extracting the information of the address and environment, (3)switching the environment, (4)an indirect jump.
By known function detection, tail recursive functions can be compiled to exactly the same code as loops in imperative languages do.
1) Given that everything is transformed to CPS, pretty much everything - including sequences I think - ends up being a function call. In fact, only jumps exist at all.
Not sure about how the closure-conversion works. This may be workable, would you be willing to work on this?
2) I get this now, although I'm not sure how to translate this into the current Arc2c output. I'll discuss the current arc2c calling convention and you tell me how workable your proposal is.
------
Currently, arc2c output works out like this:
1) Each function is simply a case in a large switch statement:
jump: switch(pc){
case 0:
...code for function 0...
case 1:
...code for function 1...
}
2) There exists a "stack" which is not the C-stack:
3) At the start of each function (with the exception of function 0, which is the top-level), the stack contains a [0] closure for the current function, [1] the continuation, and [2+] the Arc parameters. This is assured by the calling function.
4) Functions are passed around as closure structures. The first eleemnt of the closure structure is a non-encoded number, representing the case for that function, while the rest is simply an array of closure variables.
5) Functions simply use the stack for temporary scratch space. For example this is how (+ 1 2) would compile to:
PUSH(FIX2OBJ(1));
PUSH(FIX2OBJ(2));
ADD();
6) Just prior to calling a function, the calling function pushes the parameters in order: [0] closure (the function to call), [1] continuation [2+] arguments. The number of elements N for the call is computed by the compiler
7) Then at the function call, the calling function copies the top N elements of the stack into the bottommost N elements, and assures that sp = &stack[N]. Then it sets the C-variable pc to the closure's function field, and does a C goto jump;
Well, I only have experience of writing direct-style compilers, not CPS-style ones, so my advice needs to be adapted.
But from mechanism of the current arc2c output you showed above, I see many places for improvement:
1)In a function:
(fn (x y z ...) (g A B C D ...)),
if B doesn't rely on x, C doesn't rely on x and y, D doesn't rely on x, y and z...etc, the calling function could avoid copying elements to the bottom. Instead, it moves the stack pointer to the bottom first, and then pushes the arguments.
2)For functions having no environments, we don't have to push a full closure, we just have to push pc.
3)For known functions, we just do a C goto jump not to the jump label, but to the (case n), because C cases are in fact labels.
Finally, in my opinion, a CPS-style compiler is no longer a better choice nowadays. It complicates the source, the debugging information and the (human) analysis of the program structure. Since we are already using a separate stack that is different to C's, continuations can be implemented in direct-style compilers as easily as in CPS-style ones. And codegen for direct-style compilers is just slightly more difficult, which isn't an issue. In addition, a naive direct-style compiler performs much better than a naive CPS-style one. The latter needs a source simplifying step to eliminate unnecessary closures and function calls produced by CPS conversion.
1) personally I think this is a rare case, but I could be wrong
2) arc2c closures are very lightweight: it's just a simple array of obj(s), with the first obj being the pc. So in effect for functions having no environment, we are pushing a pointer to the pc.
That said, closures are also used to represent functions that can be passed around. Unfortunately closures are currently untyped, so we expect the current closure style to be changed.
Also we need to support the possibility that a "function" being called isn't really a function: after all table syntax is just (tb key). And this is perfectly valid Arc:
(let sometable (table)
(each k lst
(= sometable.k (generate-something k)))
(map sometable ; yes, we're passing a table as if it were a function!
foolst))
3) I was actually thinking of this too, although I haven't gotten around to it.
re: CPS: I wouldn't really know. Me, I'm just hacking around at the transformations before the CPS and Closure conversions. Because of the somewhat modular construction of arc2c, in theory you could write a drop-in replacement for CPS and Closure conversions, as well as code generator, and we can then put either CPS or the direct style as options, maybe.
1)It's not a rare case. It's important for speed improvement for most of useful programs. For example, map & foreach, which are used quite often, can be optimized by not copying data on stack.
A non-optimizing compiler leads easily to a "fast enough" executable. Without optimizations I think the compiled code would be 7x~10x slower than C.
Edit: I've tried the Fibonacci "benchmark" on a simple compiler i'm writing: it takes 0.2 seconds to compile the program and to compute the 32snd Fibonacci's number. On the current Arc interpreter it takes ~5 seconds.
Your compiler might be much slower if it's with true scheme numbers, + operator as a function(not a primitive operator) and stack overflow checking. These features are currently supported by the arc interpreter on mzscheme.
If you can correctly eliminate function calls on +, your compiler is an optimizing one, not non-optimizing...
I've tried the same example putting a function call and a test around every arithmetic operation, and execution time went from ~0.2s to ~0.26s, not a big difference, although a few optimization will probably be necessary for something more complex than fibonacci's example.
Is the function call overhead so small? I didn't realize.^^
But there are other issues: the fib example is not a very good benchmark suit, because in C, general recursion is not a common paradigm. If we compare C loops to Arc tail recursive calls generated by a simple compiler instead of comparing C recursions to Arc recursions, I believe that the difference will be much larger. Because C compiler writers have spent at least 20 years on optimizing loops...
That's absolutely true. Reaching C speed with high level languages such as Lisp it's very very difficult. CMUCL and SBCL reach roughly the speed of C, but they've been developed for a long time.
As of loops speed vs. tail recursion speed, the difference shouldn't be too big.
Stalin performs as better as C in numerical programs and many other benchmarks. The most exciting thing is that unlike CL, stalin doesn't need type declarations to guide optimizations. It would infer as much type information as possible. The problems is that it compiles too slow and it's not maintained anymore.
Naively implemented tail recursions is still not fast, because many common loop optimizations can't be directly applied to them unless you eliminate the function calls and regard them as true goto's. It's a rough task because the global flow analysis is needed for eliminating as many calls as we can.
Re: symbols - perhaps it's better to leave them in UTF-8, then only convert them to UTF-32 if and only if they are coerced to strings.
I suggest making tables work first before the really weird numbers ^^
Edit: there's also the problem of making I/O ports redirect to strings under construction. 'tostring is pretty much the idiomatic way of creating strings in Arc.
Edit2: As an aside, here's what I intend to do with ac.scm built-ins:
1) Remove ac.scm xdef'ed functions from the mac* stuff in xe.arc
2) Create a special lib-ac.scm.arc, where we can access primitives:
(set car
(fn (x)
(%car x))) ;will access primitive %car, but only in lib-ac.scm.arc
Code that isn't in lib-ac.scm.arc will not be able to access the primitives %foo.
The above will of course allow code like:
(map car (pair lst))
3) Finish up my inliner, so that user code of the form:
Yep. I left symbols encoded in UTF-8. When a string is output to anything (included when transformed to a symbol) it is translated to UTF-8. And I was planning to implement tables before the numeric tower. I think there's more fun in tables rather than in numbers :)
Okay. Although I'd like to ask if you can implement the Anarki 'defcall and 'call* too?
Basically this means that instead of updating the pc C-var at END_JUMP(), our jump: C-label does the updating. If it's an ordinary function, extract pc from CLOSURE_REF(LOCAL(0),0). If it's not, lookup its type in the call* table, and rearrange the stack:
(obj k . ind)
=>
((call* (type obj)) k obj . ind)
This also means that closures need type tags now too.
Ok, I'll work on this too. I need to type closures now anyway, as I am implementing full support for things like pr and type.
Btw, about type tags and the pointer-as-fixnum hack : I read a paper about the implementation of Lua (for tips about tables implementation). In Lua, everything, including numbers, is implemented as a structure containing first the type, then the data for the actual object (somebody already mentioned this in the forum). The reason is that the ANSI C standard does not allow the pointer-as-fixnum trick : you cannot know for sure that addresses will have a 0 low bit. In practice, it works on the most common architectures, but it's not fully portable (which is a problem given Lua's target). Hmm, you were right, maybe later we could add another version of codegen that would be slower and more memory-consuming but completely portable.
True, which is why I was always a bit leery of the trick. Besides, if you're going to implement bignums anyway, you might as well start off "fixnums" as very small bignums ^^. LOL. Of course there's the obvious problem that most applications won't use bignums, and in applications that do, most numbers still aren't bignums.
But then, if you add two fixnums together and the result won't fit in a fixnum...
As an aside, in the current mzscheme implementation, it seems fixnums are type 'int and everything else is type 'num
Finally: if you need to access call* , it may be possible to determine its position in GLOBAL() and add a C-global CALL_STAR:
You're welcome. You can prolly review bits of the code via github if you don't have access to your own computer this week.
Wonder how eds is doing on arc2c? BTW have you requested to mentor his GSoC application? If you already did I'll withdraw my request.
The bit about strings is - how do we represent them? UTF-8? UTF-32? As an array or list of characters?
Arc's underlying mzscheme divides strings into code points; each code point is representable by a single 32-bit number (I think). An individual "character" in mzscheme is thus a code point (from what I gather), although in Unicode a character could be represented by several code points (or so I hear).
Now the point is that the following is quite valid:
arc> (= p "asdf")
"asdf"
arc> (= (p 0) #\c)
#\c
arc> p
"csdf"
So obviously access to individual characters should be easy, and replacing individual characters should probably not cause us to mess too much with memory. This almost prevents the use of UTF-8, even though all I/O will pretty much just use UTF-8.
Yes, I think UTF-8 would be a disaster with modifiable strings. mzscheme uses UCS-4 (UTF-32) internally, and that would be the simplest approach. If you are willing to ignore Unicode characters > 65536, then UCS-2 would be okay with half the memory usage. When you talk about a character represented by several code points, are you talking about Unicode surrogates for characters > 65536? (Oversimplifying, two UCS-2 surrogate characters are used to represent one Unicode code point > 65536.) I think you'd be better off with UTF-32 than UTF-16 and surrogates, as surrogates look like a nightmare that you'd only want if you need backwards compatibility, see Java's character support: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character....
Oh, Unicode combining characters and normalization. I classify that as "somebody else's problem." Specifically, if you're writing a font rendering engine, it's your problem. If you're writing an Arc compiler, it's not your problem. If you want complete Unicode library support in your language (like MzScheme's normalization functions string-normalize-nfd, etc.), then you just use an existing library such as ICU, and it's not your problem. ICU: http://www-306.ibm.com/software/globalization/icu/index.jsp
I've been following up on the forum threads but I haven't had time to actually read the code yet. (And the last time I checked, the arc2c executable gave me a segfault.)
The current arc2c output assumes that you have a proper Boehm GC installation, but since I can't seem to get a good install here (prolly something to do with being AMD64 again) I just disable the GC for now.
I finally got arc2c to work (without GC as you suggested). One note though: apparently arc2c relies on rm-global.arc, but doesn't load it by default. So under the current version of the compiler, 'compile-file will error until you (load "rm-global.arc").
Oops. Must have forgotten to add it to arc2c.arc then ^^. Unfortunately I won't be able to fix this until maybe 7 hours from now T.T, haha, I'm in the office ^^
The thing about GC working: well, you need to somehow download the development version of Boehm GC, and, well, that's what's stopping me for now T.T
The server will need to specify arbitrary content-types eventually, and that's way easier than supporting xhtml, so specifying an arbitrary MIME type seems like the obvious solution to me. Also, the official MIME type for SVG is image/svg+xml, so trying to use xhtml both for SVG and HTML seems like a fragile hack.
Thanks for the great replies- I was actually also wondering how the RSS feed in news.arc deals with this... Is there some hack in there for overriding the Content-Type or are web browsers more forgiving for RSS feeds? I couldn't find any specific handling in news.arc for this issue on inspection the other day...
Firefox's Live HTTP Headers plugin tells me that news.ycombinator.com/rss has "Content-Type: text/html; charset=utf-8". I imagine RSS clients are not very strict about what they accept.
Actually I'm in the Philippines. Our holiday is actually supposed to be tomorrow, but our president has a tendency to move holidays near weekends to give long weekends.
LOL. I suppose it's because the populace is mostly dissatisfied with the president, and the president is trying to appease the populace? Those are the conditions in our country anyway ^^
1) hmm. Interesting. Can't think of how to do inlining yet though.
As an aside, my intent was that library functions in a specially defined library file can access primitives %car etc., but not other code - user code can use %car etc. for their own purpose without clashing with the primitives, if only for compatibility with Arc.
2) Yes, this seems correct. And there's also 'eval. Yes, eval's not often used, but still...
5) The problem is using green threads with blocking I/O. Obviously in a server if one thread is blocked by I/O, other threads should still continue. It's really the threads/IO interaction that's bothering me.
Edit: which reminds me - currently closure structures are untyped, meaning we can't safely get the type of a function.
Some background first: the compiler first puts all top-level expressions as parts of a do-block. For much of the compilation run (until it reaches CPS transformation) the compiler represents the entire program in this do-block.
I intend that the library's will simply be inserted at the front of the do-block's list of subexpressions.
The inline transformation phase then iterates over the top-level elements of the topmost do-block. If a top-level element is an assignment to a global variable, we attempt to determine if the assignment is eligible for inlining.
To determine if the assignment is eligible for inlining, we check if it's assigning a function. Since this is a top-level block, the function cannot close over any variables. Then we detect if the function's parameters are referenced 0 or 1 times (if referenced more than that, we can't safely inline it without putting it in a let-block - which creates a function anyway, so no point inlining). Note that we can actually allow the function to reference itself via the global, since we won't remove the assignment to the global.
If we determine that a global is eligible for inlining, we add the symbol and its function to a table of inlinable functions.
Now here's the hard part: we also have to ensure that the global can be safely inlined. If a global is assigned to exactly once, then it could.
While scanning, we check if the global is already in the inlineable set. If it is, we add the global in the banned set. This means that redefining a global will prevent it from being inlined:
(set global
(fn () t))
(prn:global) ; t
(set global
(fn () nil))
(prn:global) ; nil
; cannot safely inline
If a top-level expression isn't an assignment to a global, we scan through its subexpressions for an assignment to a global. For each global assignment, we add the global in the banned set. This prevents us from inlining non-trivially inlineable stuff:
(let c nil
(set reader
(fn () c))
(set writer
(fn (v) (set c v))))
After this scan through, we have a set of inlinable functions and a set of banned-from-inlining. We remove from the inlineable set those that are in the banned set. Then we perform inlining.
Inlining is then done this way: We scan the entire syntax tree and search for function applications, where the function position is a reference to a global variable in our final inlineable set. If it is, we then replace the application with a copy of the function's contents (the function's contents are always placed in a do-block, incidentally). We scan through the copy and look for references to the function's parameters, replacing the parameters with the appropriate expression in the function application. For vararg inlining, we may use the %cons primitives directly to build the vararg parameter.
The assignment to the global is retained. However, we can then repeat the unused-global-removal step (or move that step after this step) to remove the actual non-inlined version if it's not being passed as a function.
1. Local functions which have enclosing environments are harder to inline. If a function's environment is different to the caller's environment, we should replace all its free variables to references to its environment. For simplicity, you can inline only the combinators(functions which have no free variables).
2. When inlining, we should rewrite the parameters only if they are free to the function body, not bound by other local functions in the body.
1. I'm not proposing yet to inline local functions, especially those that close on environments. However, what algorithm would you propose for inlining local functions?
As an aside, closure-conversion makes the creation of environments explicit. Perhaps an inlining step can be added after closure-conversion?
I see. This is actually handled implicitly in the compiler's AST structure: during the conversion from the list form to the AST form, each local variable is given a unique ID:
(fn (x y) (g x y (fn (x) (h x))))
=>
(fn (x@1 y@2) (g x@1 y@2 (fn (x@3) (h x@3))))
; approximation of the AST structure, the AST
; is really a table of properties
So mindless replacement of the inlined version will simply replace x@1, not x@3.
Hmm. Turns out this is a real issue, but for a different reason: since local variables are given unique ID's, we should actually replace local variable ID's for subfunctions when a function is inlined several times: