I've taken a look at CLN and it seems to be a C++ library. I suppose this means we need to create some sort of set of wrapper functions and wrapper data types to handle this for us.
Also since allocation is all from the heap it seems we can use Boehm GC. Limited stack allocations (for leaf functions) would be cool too ^^. Edit: Oops. Since we don't actually end the function until at HALT(), it seems we can't free memory allocated via alloca().
I can see rejecting .. to disallow access above the tree, but rejecting / doesn't make any sense to me. So we have to just plop everything down in one directory? If so, that's ridiculous. I think I need to be less critical of the forkers.
Hehehehe. It's not that we're deliberately forking off - it's more that:
1. pg has been talking about this language for years
2. he has built up a big fan club
3. he released arc in a "not-quite-done" state and unleashed a firestorm of third-party fixes
4. he hasn't updated arc again for more than a month or so.
So yes, although we're not deliberately forking off, I fear this is just what will happen unless pg updates us all again within the next few months, or puts major parts of Anarki in Arc3.
Within Anarki all the file serving was done out of the special docroot directory (at least it was a month ago). I'm not sure if disallowing "../" is what we want or if it what we really want is to ensure that all file serving comes from that dir.
Come to think of it, it seems that collections-in-function-position would make function calls difficult and/or slightly slower. And then there's the annotate thingy.
Hmm. I think to properly support annotate we really need to attach two types to the object: the real type of the representation and the user-declared type of the representation.
Attaching too much information to an object leads to serious problems. Think about fixnums: they're usually implemented using 2 bits out of 32 as a type tag. This way you don't have to allocate them on the heap. If you attach an extra information you have to put everything on the heap (very slow for fixnums).
As of collections-in-function-position instead of returningm for example, and hash-table one could return a fuction that acts as an interface towards the real hash-table.
> Attaching too much information to an object leads to serious problems. Think about fixnums: they're usually implemented using 2 bits out of 32 as a type tag. This way you don't have to allocate them on the heap. If you attach an extra information you have to put everything on the heap (very slow for fixnums).
Then let's not attach both types to the same object. pg's implementation creates a new object and attaches the type to that object, leaving the original representation object (type id and all) untouched. Basically we have an annotation type which just contains the attached type and the original object, leaving the type id bits untouched.
> This doesn't handle setting values, but I think that a few macros could solve the problem.
Yes. I implemented fixnums with the last bit as a tag-bit. If it's 0, everything else is a fixnum. If not, well, it's something else... Maybe nil, t or a reference to something else.
Your idea for callable collections seems quite good, too. But I'm not there yet.
Note that I've actually taken advantage of this fact; my w/html macro, in order to support a better tag class attribute syntax, will process tags of the form div.class#id, which will break if the reader auto-transforms this to (div class#id)
If you're interested, the Anarki lib/arcformat.arc is actually a start at parsing Arc syntax; we just have to change the filters so that instead of emitting html, we emit Arc objects wrapped in singleton lists. It uses treeparse.
treeparse is pretty good, and I'm sure a full Arc parser can be built on treeparse.
Looks like I need to speed up writing my treeparse tutorial ^^
As for parsing special syntax, an Arc version of ssyntax would work: we just need a symbol->string conversion (meaning full 'coerce support) and seek ~!.: in the substring.
edit: how did you handle continuations and tail call opts?
Ah, I'll have a closer look at arcformat then. Might be good to have a reader :)
Continuations & tail-calls are treated the way Steele proposed it in the lambda papers.
The following code :
(set f (fn (n)
(if (< n 2)
n
(+ (f (- n 1)) (f (- n 2))))))
(prn (f 30))
is first translated so as to have unique names for variables and so that primitive operations are prefixed with a % (we won't treat them the same way as other operations in the next steps) :
Then we perform continuation-passing-style transformation : each function call is given, as an extra parameter, the portion of code (the continuation) it will send its result to : functions do not return, they call another function with the value they calculated :
(Sorry for the indentation, I don't feel like indenting this by hand, but you get the idea I guess ?)
Generating C code from there is then easy. It is almost only Marc feeley's code, based on Steele's ideas a few years ago. I only changed the code so that it understands Arc code and primitives instead of Scheme's...
I assume that for code blocks 3 and 4, 'let is the Lisp 'let, not the Arc one? (let ((var val)) ...) not (let var val ...)?
Also on code block 1 you show (prn (f 30)) but in code block 2 you seem to pass (f <continuation> 20)? I assume this is just a mistake?
I managed to understand the transformation up to code block 3, but not code block 4; what does '%closure do? Allocate heap memory for the closure? Also, what's the syntax for "goto"?
Finally: how about the equivalent C code?
LOL, maybe I should just wait for you to push it on the git and experiment with it myself.
Aruu, okay okay I finally actually looked at the presentation docs, which I probably should have looked at first. One of the things that threw me off was 'self - I was thinking of Arc's sense of 'self as in 'afn!
The output C code looks suspiciously like assembly language to me. Perhaps we can also target a temporary assembly syntax so that we can do minimal peephole opts, such as convert stuff like PUSH(x); y = TOS(); to MOVE(x,y);. Can't wait to actually see this code on the git ^^.
P.S. Given that the transformations are (gasp!) syntactic, it might actually be possible to implement the entire compiler as (gasp gasp!) a treeparse parser (or at the very least a piped chain of treeparse parsers) ^^.
Maybe treeparse would be the right thing to use... The code is getting uglier everytime I try to add a new primitive / special form... I dunno...
As for the generated code, yes, it's a lot like a portable assembly code. There are certainly easy optimizations to perform on it, but as for now, it's working and that's a lot :)
And yes, let is the traditional one -- with tons of parens everywhere.
I'll go through the code later to see what can be done. Certainly the AST looks representable as plain lists to me, although I haven't fully analyzed it yet.
As an aside compile-file could be restructured like the following:
(def compile-file (filename)
(compile-ast (parse-file filename) (+ (strip-ext filename) ".c")))
; to allow programmatic access
(def compile-ast (ast dest)
; chain of conversions
(let chain
(list
(list cps-convert "CPS-CONVERSION")
(list closure-convert "CLOSURE-CONVERSION"))
; do reduction
(let final-ast
(reduce
(fn (ast (f desc))
(let new-ast (f ast)
(prn "----------------- AST after " desc)
(prn (source new-ast))
new-ast))
chain ast)
(prn "-------------------- C Code:")
(w/outfile f dest
(w/stdout f
(prn:liststr:code-generate final-code))))))
This should allow easy insertion of any steps (e.g. optimization steps) along the way.
In fact the chain list should probably be better use `(,), so that we can support flags or suchlike for optimizations:
For that matter my concern is with the expansion of PUSH and POP:
PUSH(x); y = POP();
=>
*sp++ = x; y = *--sp;
Can gcc peephole the above, completely eliminating the assignment to the stack (which is unnecessary in our case after all)?
y = x; //desired target
Without somehow informing gcc to the contrary, gcc will assume that writing to * sp is significant, even though in our "higher-level" view the assignment to * sp is secondary to transferring the data between two locations.
Well, with full optimizations on gcc (-O3), it doesn't change anything (at least in execution time, I didn't compare generated machine codes). Wow, gcc seems really clever. Now that I know how hard it is to implement a compiler, I can only applaud :)
If there's no need to be physically there, I'm willing to mentor you, in case sacado is somehow disqualified or is otherwise unable to mentor you (I'll probably ask sacado to unofficially mentor my mentoring you, though, sort of a meta-mentor). Anyway I've just applied, although I'll gladly defer to sacado (it'll be mostly his code anyway) if he is able to mentor you.
Note that due to various circumstances, I won't be able to leave my country until about a year or so, so if my physical presence is needed, sorry!
> If you try to use a tag attribute that Arc doesn't know about, it puts a comment in the HTML <!-- ignoring mytag for a-->. Error reporting in the HTML itself seems like a bad idea.
I think it doesn't - in my experience these warnings occur when I load the file, and are printed on stderr. Not sure though, I haven't actually checked the code.
In any case I'd recommend the use of my w/html macro, in whtml.arc, which IMO is more consistent:
(w/html
('head
('title
(pr "This is My Page!")))
('body
('h1 (pr "This is My Page!"))
('.content
('p
(pr "This is my content!"))
('p
(pr "Another paragraph of my content."))
('p
(pr "The third paragraph of my content!")))
('.footer
(pr "This is where I put my footer string!"))))
.foobar is equivalent to (div class "foobar"), and if you want to put tags with other attributes, just put it in a () form (same syntax as 'tag). So you can create an image by ('(img src "foo.gif"))
w/html also supports the use of #id stuff, but because of the Arc parser, anything that starts with # will not work - you have to either also add a class, or explicitly put the div.
The problem is the inherent difference between binding x in f where it is defined as opposed to binding x in f where it is called.
Dynamic binding fails in this condition:
(let x 42
(def set-x (new-x)
(= x (+ new-x 1)))
(def f ()
x))
The point is that x does not have a local or a global lifetime. Instead, it has a local or a global scope. The lifetime of every variable is like that of any object: when everything that can access it is lost, it gets garbage collected. That's even more consistent: not only are values automatically managed, locations are automatically managed too.
Global variables are always accessible and therefore will never be destroyed. But local variables might die as soon as the function exits, or it might live on. The only functions that have the right to access them, however, are those that are defined specifically to be within its scope. This means we have automatic protection.
Granted, you want to release this protection; presumably you have some plan to have programmers explicitly ask for the protection. I've programmed in a language which supports both dynamic and syntactic binding, and I always prefer using the syntactic binding. This is because while hacking into some other function's global is cool, some day you are going to do this accidentally, and when that day comes, you'll know why a library that used to work suddenly fails.
> This is because while hacking into some other function's global is cool, some day you are going to do this accidentally, and when that day comes, you'll know why a library that used to work suddenly fails.
Isn't that one of the goals of Arc? A hackable language?
That's a good reason if Arc is designed for
average programmers, but pg says it is explicitly
designed for good programmers.
This, it seems to me, is a very different use of "hack." It's more akin to forbidding (scar 'blue 'red), though not quite as severe: the latter is something nonsensical, and though you could hack together a meaning for it, it would certainly break down. The former case hacks around the definition of a function, but is also certain to break. These uses of "hack" are the ugly kind, closer to "kludge", not the beautiful or powerful kind.
LOL. This is where the semi-deliberate delicious ambiguity of "hack is kludge! what you mean? xy is x times y but 23 is not 2 times 3?" vs "hack is elegant! what you mean? e^(i*pi) = -1?" rears its ugly head.
Right. So tell me, suppose I have a very, very old piece of code, a classic, like this:
(def map1 (f xs)
" Return a sequence with function f applied to every element in sequence xs.
See also [[map]] [[each]] [[mappend]] [[andmap]] [[ormap]] "
(if (no xs)
nil
(cons (f (car xs)) (map1 f (cdr xs)))))
Now suppose in the future some random guy creates an application focused on cars. Since the guy knows he won't use 'car and 'cdr (car? cdr? pfft. that's so low-level! use proper let destructuring, foo!), he's quite willing to reuse the name 'car as a variable:
(let car (get-user-car)
(map1 color-wheel car!wheels))
Oops! map1's car function got overridden!
This is a rather contrived example, but hey, believe me: it'll happen, and it won't be as obvious as that. That's why dynamic variables are explicit in CL. That's why the default binding will be static.