As an aside, there already exists a parser combinator library for Arc; my position is that it would be preferable to use that if possible.
Also, the tokenizer function could be redone as a scanner using (scanner 'car (car-expression) 'cdr (cdr-expression)). You can even have each state of the tokenizer as a separate sub-function and model state transfers as tail calls between them. You could do something like this:
  (def tokenize-scanner (s (o ind 0))
    (with ((reading-comment reading-unquote ...) nil
           cur-token nil
           next-ind  ind)
      ; each tokenizer state is a sub-function; switching state
      ; is just a tail call to the next state's function
      (= reading-comment
         (fn () ... (default-state)))
      ...
      ; run the state machine until one token is complete
      (default-state)
      ; the 'cdr expression is delayed, so the rest of the input
      ; is only tokenized when someone asks for it
      (scanner 'car cur-token
               'cdr (tokenize-scanner s next-ind))))
It might also be good to abstract away the generator portion and provide the generator as a library (arguably, though, scanners are already monadic generators).
It might also be useful to have the reader read from a list or scanner; in my opinion, that is how it will be done in SNAP and/or arc2c.
Still, I'm not above actually using your code ^^ Good job!
I've been browsing the arki source for examples, but I'm completely out of my depth there ...
One aspect of the token generator is that it sometimes recognises two tokens simultaneously. In other words, when it sees the right paren in

  ... foo)

it recognises "foo" and the right paren all at once. Perhaps this is the wrong way to do it, and I should be using 'peekc instead. But I suppose I can do this with a scanner:
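Something like the following rough sketch, perhaps (hypothetical; it reuses s, next-ind, and tokenize-scanner from the sketch above, and represents the second token as the symbol 'right-paren purely for illustration): the first token becomes the 'car, and the second goes into the very next scanner cell.

  ; when the state machine has recognised both "foo" and the right
  ; paren, emit the first token now and the second as the next cell
  (scanner 'car "foo"
           'cdr (scanner 'car 'right-paren
                         'cdr (tokenize-scanner s next-ind)))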
Note that the expressions in the 'scanner form are delayed, i.e. they are not evaluated until a 'car or 'cdr is performed on your scanner, and they are evaluated only once.
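A tiny sketch of that behaviour (the prn is only there to show when the expression actually runs):

  (= s (scanner 'car (do (prn "computing car") 42)
                'cdr nil))
  (car s)  ; prints "computing car" the first time, returns 42
  (car s)  ; returns 42 again, without printing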
edit: an important note: scanners have the exact read semantics of lists, so simply zapping 'cdr on a scanner will not advance the scanner itself; it will only advance the place you are zapping.
There's no need to use 'peekc or similar: all you need is to use stuff like 'cadr, 'caddr.
Because scanners have the exact read semantics of lists, you can use such things as 'each, 'map, etc. on them. Just don't write to them using 'scar, 'scdr, or 'sref.
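For example, a small sketch (a hand-built two-element scanner, assuming the list operations behave on scanners as described above):

  (= s (scanner 'car 'foo
                'cdr (scanner 'car 'bar
                              'cdr nil)))
  (cadr s)     ; => bar -- look ahead without 'peekc
  (zap cdr s)  ; advances the variable s, not the scanner itself
  (car s)      ; => bar
  (each x s    ; 'each walks the rest just like a list
    (prn x))   ; prints bar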
If you want to emulate lists, you can do something like:
  (def my-cons (a d)
    (scanner 'car a
             'cdr d))
Of course, since a and d are just variable references, there's little point in delaying their evaluation.
edit2: Here's how you might make a generator:
  (def generator (f v)
    (scanner 'car v
             'cdr (generator f (f v))))
(= b (generator [+ _ 1] 0))
(car b)
=> 0
(cadr b)
=> 1
(cadr:cdr b)
=> 2
'map, 'keep, and a few other functions become lazy when applied to scanners, so you can safely use them on an infinite-series generator.
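For instance, a small sketch reusing the generator above:

  (= naturals (generator [+ _ 1] 0))
  (= squares (map [* _ _] naturals))  ; lazy on scanners, so this returns at once
  (car squares)
  => 0
  (cadr squares)
  => 1
  (cadr:cdr squares)
  => 4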
I feel that it's almost as good as the other solution I presented, but I wonder what others think. One advantage it has is that it's the non-Arc code that has the special syntax, unlike the 'marcup version, where it's the Arc code that has the special syntax.
Apparently the mockup 'marcup is a bit more popular ^^
Would anyone prefer the 'marcup version over the current, existing 'w/html?
If you want to be technical, a continuation is a form of closure anyway. The difference is mostly the intent of the closure: if it's meant to be the continuation of control flow, it's a continuation.
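A small hedged illustration of that distinction (make-adder is just a made-up name for the example):

  ; an ordinary closure: captures the surrounding variable n
  (def make-adder (n)
    (fn (x) (+ x n)))

  ((make-adder 2) 40)
  => 42

  ; a continuation: the closure that 'ccc hands you captures
  ; "the rest of the computation", here (+ 1 _)
  (+ 1 (ccc (fn (k) (k 41))))
  => 42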
> But there's a bigger question: ever since I started writing web apps, I've heard the mantra "keep no state on the server". Arc's continuation or closure thing looks like it's totally breaking the rules. "What about scalability!!", as we say in javaland. So someone's got it all wrong, and why doesn't Hacker News fall over more often?
As far as I know, Hacker News runs on just one server. Presumably it's pretty well tuned, and besides, there may be lots of hackers there, but I'm sure its readership is much smaller than, say, Friendster's, which has to solve far larger scalability problems.
This may also very well be the reason why Yahoo rewrote Viaweb ^^
> the server needs to store 30 separate closures
Closures are cheap ^^. For example, in arc2c a closure has one cell (a cell is 4 bytes on 32-bit, 8 bytes on 64-bit) for the type, one cell for the length, one cell for the function, and one additional cell for each captured variable. Often there are just one or two captured variables, so on a 64-bit system you're looking at around 40 bytes per closure.
On the other hand, a string might take up just as much space and would result in less clear code.
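To spell out the arithmetic above (closure-bytes is a hypothetical helper, not part of arc2c):

  ; type cell + length cell + function cell + one cell per captured variable
  (def closure-bytes (captured (o cell-size 8))
    (* cell-size (+ 3 captured)))

  (closure-bytes 2)    ; => 40 bytes: two captured vars, 64-bit cells
  (closure-bytes 2 4)  ; => 20 bytes: two captured vars, 32-bit cells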
The good news is that it doesn't hang any more on unbalanced parens. In fact, it highlights them in bright red! What I'd like, though, is auto-insertion of right parens and auto-surrounding of selected text. TextMate does this and it's really nice; DrScheme doesn't seem to. It's really convenient for turning (expr) into (do (expr) ...).