New wart interpreter: lisp-1, fexprs
4 points by akkartik 4912 days ago | 32 comments
http://github.com/akkartik/wart/tree/unstable

I've moved off Common Lisp entirely.

Credit to many here, especially diiq for his quoted-param syntax for suppressing evaluation[1], and evanrmurphy for getting me thinking about fexprs[2]. I spent a while looking at Factor as well before deciding to bite the bullet and write an interpreter from scratch. Picolisp was my model for the low level details at the start.

It barely exists at this point. def and mac exist, but no keyword or optional args. The basic evaluation model is done, and the ref-counting garbage-collector is rock-solid. Half the code is tests.

[1] http://github.com/diiq/eight, http://arclanguage.org/item?id=10719

[2] http://arclanguage.org/item?id=13585



3 points by rocketnia 4912 days ago | link

I think it would be nifty to have no symbols in the language, just strings which evaluate like symbols and (quote ...) for suppressing that. You mentioned not reserving | for symbols (http://arclanguage.org/item?id=14882), but essentially we can reserve " instead.

  wart> '"car"
  car
  wart> "car"
  #<fn:car [native code]>, or whatever
I suppose this is related to how PicoLisp treats ".
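
The rule proposed above (bare strings evaluate like symbols, and quote suppresses that) could be sketched roughly as follows. This is Python rather than wart, and the names `evaluate` and `env` plus the tuple encoding of forms are assumptions made purely for illustration, not wart's actual implementation.

```python
def evaluate(expr, env):
    """Evaluate expr, treating bare strings as variable references."""
    if isinstance(expr, str):
        # A bare string is looked up, the way a symbol would be.
        return env[expr]
    if isinstance(expr, tuple):
        if expr[0] == "quote":
            # (quote x) suppresses evaluation and yields x itself.
            return expr[1]
        # Any other tuple is a call: evaluate the head and the arguments.
        fn = evaluate(expr[0], env)
        args = [evaluate(arg, env) for arg in expr[1:]]
        return fn(*args)
    # Numbers and other atoms evaluate to themselves.
    return expr


# Hypothetical global environment holding one native function.
env = {"car": lambda pair: pair[0]}
```

Under these assumptions, `evaluate(("quote", "car"), env)` yields the string `"car"`, while `evaluate("car", env)` yields the function bound to it, mirroring the two REPL lines above.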

---

"The basic evaluation model is done"

Any thoughts about continuations, TCO, or concurrency? Not that you necessarily need any of those, but I don't expect you personally to be satisfied until you have them all. :)

Also, specialization (as in, partial evaluation) is likely to be your friend in an fexpr language, as far as performance goes. You've probably heard this from me before, but I think one of Factor's potential advantages is the way it helps guarantee the immutability of many variables, so that a hypothetical specialization process could inline them as constants.

Furthermore, I've found that module system design kinda gets a tentacle into the evaluation semantics, at least at the top level. But you've had different module ideals than I have, so I won't be surprised if you're already on the best track for what you want. :)

Fortunately, hygiene design won't muck things up. I bet you practically get hygiene for free once you're using fexprs with lexical scope. ^_^ (No need for PicoLisp's transient symbols. @_@ )

---

"no keyword or optional args"

Since you're interpreting the argument list at run time, I think it makes sense to design keyword and optional args according to whatever's most straightforward to analyze and execute (the way s-expressions are straightforward for value expressions). If it's inconvenient to write in that syntax, no worries; people can use macros, or you can redesign the reader just for the occasion.

---

Guess I've made a bit of a Christmas list here, lol. XD

Don't worry about it too much. I'm just reacting with some wild brainstorming while it's still an appropriate time for me to do so. :-p

-----

1 point by akkartik 4912 days ago | link

I love it; I did ask to be talked to, after all :)

"I bet you practically get hygiene for free once you're using fexprs with lexical scope. ^_^"

Yes, that was explicitly a hypothesis.

I also wanted to try to eliminate interpreter errors as much as possible. car/cdr of non-cons is nil, sym coerced to num is 0, etc. If the interpreter never throws errors it lets you build other languages atop it without abstraction leakage. And it forces me to be more disciplined in writing tests.

"Any thoughts about continuations, TCO, or concurrency?"

I'm hoping that parsimonious memory use[1] will let me just run lots of server processes in parallel for throughput. I know concurrency isn't quite the same thing :), but it hasn't been on my radar much. I'm kinda hoping outsourcing storage to some external (erlang) program will get me out of jail on that one. For the foreseeable future wart is just for webapps because that's all I want to build.

I don't understand continuations well enough to implement them. I'm not sold on call/cc (and certainly not delimited continuations) but I do want yield and coroutines somehow[2]. I just haven't at all thought about how to build them. I don't even know what sort of support I want for early returns[3].

TCO is just an optimization and has nothing to do with the evaluation model ^_^. In general I haven't thought about optimization at all (I'll keep partial eval in mind, thanks). My first milestone is to get to the point where I can replace the Common Lisp version with this interpreter while keeping the code simple and readable. After that I may strike out from that base camp to more speculative experiments for readable concurrency or optimization or continuations. I have this pipe dream of being able to add optimizations from within lisp, without changing the underlying C.

But I also want to build stuff with the interpreter :)

---

What would unifying strings and syms accomplish? I never understood why picolisp tried to do that. They are indeed similarly implemented[4], but that's wholly accidental. I find escaping and spaces inside symbols to be really weird[5]. And auto-converting to/from "strings" like in shell or Tcl is super ugly; I like to be clear in my mind about what's a string and what isn't. But maybe it's just my C++ roots showing.

---

[1] But not too parsimonious ;) Cons cells currently use nearly twice as much memory as they need to for alignment reasons.

[2] http://arclanguage.org/item?id=14221

[3] I made one abortive -- and poorly named -- attempt: http://github.com/akkartik/wart/commit/0fd8569716cecba126627...

[4] One difference: strings are not interned.

[5] What's more, it sucks to use up '|' for such an infrequently-used feature.

-----

1 point by rocketnia 4912 days ago | link

"If the interpreter never throws errors it lets you build other languages atop it without abstraction leakage."

I don't see the advantage. If calling 'car on a non-cons gives me a value that I can't tell was caused by an error, then if I want a leak-proof abstraction I'll do anything I can to detect the error in a separate step from calling 'car, just as I would if 'car raised an exception.

An exception with an error message is at least nice for detecting that there's a problem and troubleshooting it, even if the message is as obscure as "Can't take car of 1".

---

"TCO is just an optimization and has nothing to do with the evaluation model ^_^."

Tell that to my lathe.js and chops.js, which I'm having trouble getting to work on Firefox because its JavaScript stack is small. XD I have a lot of ifsomething( ... ) utilities that take callbacks and tail-call them (CPS), and that pattern just doesn't seem to work out. :-p

Anyway, I'm not complaining about JavaScript necessarily. I knew about the lack of TCO from the outset. Whatever way this forces me to redesign lathe.js will help with Penknife, which I don't expect to have TCO either (for straightforwardness's sake, and because unfortunately Penknife/Lathe/'extend rulebooks don't call their rules in tail position anyway).
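
The stack problem described above is not specific to JavaScript. Here is a hedged Python sketch of the same pattern: a callback-passing function whose "tail calls" still consume stack frames, next to a trampolined variant that returns thunks instead. The function names are invented for illustration and this is not how lathe.js is written.

```python
import sys


def count_down_cps(n, k):
    """CPS without TCO: each 'tail call' still adds a Python stack frame."""
    if n == 0:
        return k("done")
    return count_down_cps(n - 1, k)


def count_down_thunked(n, k):
    """Same logic, but every tail call is returned as a zero-argument thunk."""
    if n == 0:
        return lambda: k("done")
    return lambda: count_down_thunked(n - 1, k)


def trampoline(step):
    """Run thunks in a loop so the stack depth stays constant.

    Assumes the final result is not itself callable.
    """
    while callable(step):
        step = step()
    return step


# Deeper than the interpreter's recursion limit allows.
DEPTH = sys.getrecursionlimit() * 2
```

With these definitions, `count_down_cps(DEPTH, lambda v: v)` raises `RecursionError`, while `trampoline(count_down_thunked(DEPTH, lambda v: v))` returns `"done"`. The cost is that every function in the chain has to cooperate with the trampoline.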

---

"I don't even know what sort of support I want for early returns."

That can be implemented in a standard library as long as there are exceptions that can hold arbitrary contents. I implement early returns in lathe.js's point(), for instance--which, since it isn't a standard part of JS, means it doesn't necessarily interact well with other people's exception-handling code. :-p
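
As a rough illustration of early returns built on exceptions that carry arbitrary contents, here is a Python sketch. The name `point` echoes lathe.js's point(), but the code is an assumption about the general technique, not a port of that library.

```python
class _EarlyReturn(Exception):
    """Carries an arbitrary value out to the matching point() call."""

    def __init__(self, token, value):
        super().__init__()
        self.token = token
        self.value = value


def point(body):
    """Call body(ret); calling ret(value) inside makes point return value."""
    token = object()  # unique per call, so nested points don't get confused

    def ret(value):
        raise _EarlyReturn(token, value)

    try:
        return body(ret)
    except _EarlyReturn as exc:
        if exc.token is token:
            return exc.value
        raise  # belongs to an enclosing point; keep unwinding


def first_negative(numbers):
    """Hypothetical usage: leave the loop as soon as a negative shows up."""
    def body(ret):
        for n in numbers:
            if n < 0:
                ret(n)
        return None

    return point(body)
```

The interaction problem mentioned above shows up here too: any intervening `except Exception:` in someone else's code would swallow the early return.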

---

"I don't understand continuations well enough to implement them."

What you're doing with pushing and popping lexical and dynamic scopes seems about halfway to continuations. If you also push expressions on a stack and evaluate them in a separate loop, the combination of scope stacks and an expression stack is basically a continuation. To capture a continuation, copy the stack(s); to call it, restore the stack(s) and push a return value. Something like that anyway. :-p (If you don't understand continuations from a language user perspective, I guess I don't expect this explanation to help.)
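
To make the "copy the stack(s), restore the stack(s), push a return value" idea concrete, here is a small Python sketch for arithmetic expressions only. It keeps an expression ("work") stack and a value stack and leaves out scope stacks entirely. The `("capture", n)` form, `evaluate`, and `resume` are invented for this sketch and say nothing about how wart works.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def run(work, values, saved):
    """Pop items off the work stack until it is empty; return the result."""
    while work:
        item = work.pop()
        if isinstance(item, int):
            values.append(item)
        elif item[0] == "apply":
            right = values.pop()
            left = values.pop()
            values.append(OPS[item[1]](left, right))
        elif item[0] == "capture":
            # Capturing a continuation: copy both stacks as they are now.
            saved.append((list(work), list(values)))
            values.append(item[1])  # the value produced the first time through
        else:
            op, left_expr, right_expr = item
            # Schedule: evaluate left, then right, then apply the operator.
            work.append(("apply", op))
            work.append(right_expr)
            work.append(left_expr)
    return values.pop()


def evaluate(expr):
    """Evaluate expr; also return any continuations captured on the way."""
    saved = []
    return run([expr], [], saved), saved


def resume(continuation, value):
    """Calling a continuation: restore the stacks and push a return value."""
    work, values = continuation
    return run(list(work), list(values) + [value], [])
```

For example, evaluating `("+", 1, ("*", ("capture", 2), 10))` gives 21 and saves one continuation. Resuming that continuation with 5 gives 51, and because the stacks are copied it can be resumed again with a different value.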

I haven't tried to implement partial evaluation/specialization yet, but I expect it to kinda work the same way. To specialize a function, start some fresh stack(s) for it with special dummy values in place of the arguments, somehow step the calculation forward as far as possible (without doing I/O--including variable lookup--or doing any inspection of the dummy values), and copy the stack(s) back off as the result. I think it'd be pretty difficult. ><;

---

"What would unifying strings and syms accomplish?"

For me it's more like: What does separating them accomplish? The way I use Arc, they're both interned, immutable, finite sequences of arbitrary characters. The fact that they have different types means the meaning of a single character sequence can be overloaded ("nil", especially), but for the most part I just coerce them back and forth to appease the language.

---

"But maybe it's just my C++ roots showing."

I didn't know you had C++ roots until now. :)

-----

1 point by akkartik 4912 days ago | link

"If calling 'car on a non-cons gives me a value that I can't tell was caused by an error, then if I want a leak-proof abstraction I'll do anything I can to detect the error in a separate step from calling 'car, just as I would if 'car raised an exception."

Yeah you're right, I was making no sense. Or at least I've forgotten the concrete situation I was (over) reacting to.

Wart does throw a message: 'car of non-cons: 1'. It's just a warning, it doesn't halt execution, but I'm still contradicting myself by that decision.

-----

1 point by Pauan 4911 days ago | link

I'll note that in my code I frequently have to use errsafe, because the thing may not be a cons. For instance, suppose you had a table called "foo". I've been using this pattern a lot recently:

  (if (foo:car x) ...)
But that will break if x isn't a cons, so I have to do this:

  (if (errsafe:foo:car x) ...)
So I don't think it's necessarily a bad idea for car/cdr to return a value rather than throw an error... but I can see why some people would prefer it to throw an error. For me personally, I think my code would benefit more from car/cdr returning nil, rather than throwing an error.
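
The trade-off can be sketched in Python as follows, with a cons cell modeled as a 2-tuple and `foo` as a plain dict. All names here are hypothetical stand-ins for the Arc forms above, not Arc's own definitions.

```python
def is_cons(x):
    """A cons cell is modeled as a 2-tuple for this sketch."""
    return isinstance(x, tuple) and len(x) == 2


def car_strict(x):
    """Raises on a non-cons, like the car that needs errsafe around it."""
    if not is_cons(x):
        raise TypeError(f"Can't take car of {x!r}")
    return x[0]


def car_total(x):
    """Returns None (standing in for nil) on a non-cons instead of raising."""
    return x[0] if is_cons(x) else None


def errsafe(thunk):
    """Run thunk and turn any error into None, roughly like Arc's errsafe."""
    try:
        return thunk()
    except Exception:
        return None


foo = {"a": "found"}  # hypothetical table
```

With the strict version, the lookup has to be wrapped as `errsafe(lambda: foo.get(car_strict(x)))`. With the total version, `foo.get(car_total(x))` is enough, at the price of not being able to tell a non-cons from a cons whose car is nil.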

-----

1 point by rocketnia 4911 days ago | link

I'd use this:

  (if (foo acons&car.x) ...)
...except maybe I wouldn't, 'cause I think Rainbow treats a&b.c as (andf a b.c).

-----

1 point by akkartik 4912 days ago | link

"they're both interned, immutable, finite sequences of arbitrary characters."

There aren't constraints against it, but I tend to think of strings as binding-less things. They're closer to numbers in that respect.

I often use syms when I mean strings (one less keystroke, no need to hit <shift>). But I only convert strings to syms inside macros.

-----

1 point by Pauan 4912 days ago | link

"[5] What's more, it sucks to use up '|' for such an infrequently-used feature."

Yeah, I agree. When I want a symbol with weird characters in it, I'd rather just escape it, like so:

  foo\ bar
Instead of:

  |foo bar|
The biggest reason I've found for using symbols with odd characters is when I'm using them as string substitutes. For instance, in a hypothetical Arc library that parses stuff, you might want to be able to write this:

  (prefix new ...
          !   ...
          ~   ...
          +   ...
          -   ...)
Rather than this:

  (prefix "new" ...
          "!"   ...
          "~"   ...
          "+"   ...
          "-"   ...)
Just looks a lot cleaner. Aside from that, I haven't had much reason to use || or symbol escapes... besides abusing it for fun, of course:

  (def |(prn "hello world")| ()
    (prn "goodbye world"))

  (|(prn "hello world")|) -> "goodbye world"

-----

1 point by akkartik 4912 days ago | link

Yeah there's a whole series of articles about symbol abuse at http://kazimirmajorinc.blogspot.com (at and just before http://kazimirmajorinc.blogspot.com/2011/07/implementing-dat...)

-----

1 point by akkartik 4911 days ago | link

  (prefix new ...
          !   ...
          ~   ...
          +   ...
          -   ...)
You should be able to just do that without any sym-escaping, right? Perhaps quote should suppress ssyntax expansion?

The only reason for escaping seems to be spaces. And really weird characters like quotes.

-----

1 point by Pauan 4911 days ago | link

"The only reason for escaping seems to be spaces. And really weird characters like quotes."

Yes. For instance, in arc2js:

  (prefix    new     new\      17
             del     delete\   15
             no      !         14)

  (bin       mod     %         13
             and     &&        5
             or      \|\|      3)

  (binsep    do      \,        1)
Note the escape for , and || and spaces. I still find this much more readable than using strings. But you don't need || for that... you can just use \ escaping. So I think it's a very good idea to free up || for something else.

---

As for preventing ssyntax expansion... right now, ssyntax is handled in arc.arc, so it's only expanded if you explicitly call ssexpand (or if a function like setforms calls ssexpand)... so in the example above, I simply don't call "ssexpand", and it all works out fine.

However, if ssyntax were handled by the reader, then there should definitely be a way to prevent it from expanding... quote might be a good way of doing that.

-----

1 point by rocketnia 4911 days ago | link

"I still find this much more readable than using strings."

I don't! XD I don't have a problem with you finding it readable, but for me, if I have something with spaces in it, especially if the space is at the end, I surround it with some kind of delimiter:

  (prefix    new     "new "    17 ...)
  (prefix    new     |new |    17 ...)
That way if I write my code this way...

  (prefix new new\  17
          del delete\  15
          no ! 14)
...then I won't be tempted to "fix" the stray whitespace to this:

  (prefix new new\ 17
          del delete\ 15
          no ! 14)
In fact, I don't think I use \ at all, and sometimes I've thought it could be cool to free up for something else. It is a non-shifted punctuation mark.

-----

1 point by Pauan 4911 days ago | link

"In fact, I don't think I use \ at all, and sometimes I've thought it could be cool to free up for something else. It is a non-shifted punctuation mark."

Yeah, but... \ is already used in strings for escaping, so using it for symbol escaping is quite natural (read: consistent). If we freed up \ then how would we escape symbols? Unless you're saying symbols shouldn't be escaped, and we should be limited in the number of things we can put into a symbol...

-----

1 point by rocketnia 4910 days ago | link

"If we freed up \ then how would we escape symbols?"

We'd escape them with ||. Inside ||, we could still use \ for escaping things like |, even if it doesn't have that meaning on the outside.

-----

1 point by Pauan 4910 days ago | link

No no no, the whole point is that we're trying to free up | because it's a useful character... more useful than \ in my opinion, especially since \ is so incredibly commonly used for escaping.

-----

1 point by rocketnia 4907 days ago | link

Then how about using \ as a delimiter?

  (prefix new \new \ 17
          del \delete \ 15
          no ! 14)
To reiterate a bit, the reason I wouldn't use \ the way you do is that "\ " at the end of a symbol makes the space blend in with the whitespace around it. For an example other than this one, consider that putting such a symbol at the end of a line might introduce bugs when people remove all trailing whitespace in a file.

-----

2 points by Pauan 4907 days ago | link

"Then how about using \ as a delimiter?"

As said, that's totally inconsistent with the way that \ works everywhere else, including within strings. I'm not entirely sure how to fix this problem adequately without keeping | in the language.

But... you did mention making strings and symbols identical... then we would just use "" to delimit symbols with strange characters in them. Of course then we'd be giving up mutable strings, but is that really so bad? That's how we use them most of the time anyways. I actually rather like that idea. It would also make (is "foo" "foo") consistent, whereas right now it isn't.

But... this has some amusing implications. I had mentioned earlier that I wanted to make strings identical to a list of characters... so you can call car and cdr on a string. But if strings and symbols are identical, then that basically means symbols would be... a list of characters. So, symbols would be mutable. And (car 'foo) would be #\f. And symbols would no longer be interned... Pretty crazy stuff.

---

"To reiterate a bit, the reason I wouldn't use \ the way you do is that "\ " at the end of a symbol makes the space blend in with the whitespace around it. For an example other than this one, consider that putting such a symbol at the end of a line might introduce bugs when people remove all trailing whitespace in a file."

Yeah, I know. I just decided that for my particular use case, I was willing to accept the practical disadvantage in exchange for increased readability and one less character. To put it bluntly, I prefer the way "new\ " looks, compared to "|new |". I do agree that delimiters are overall better, though.

-----

1 point by rocketnia 4907 days ago | link

"totally inconsistent with the way that \ works everywhere else, including within strings"

Where else does it work, besides strings? Regexes? If you're talking about quoting code from other languages (not counting TeX and other not-so-C-inspired languages which already use backslash for other purposes :-p ), yes, it can be nice to have some consistency between layers of escaping.

Sometimes I really don't like the exponential growth that results from that:

  ""
  "\"\""
  "\"\\\"\\\"\""
  "\"\\\"\\\\\\\"\\\\\\\"\\\"\""
  "\"\\\"\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\"\\\"\""
  "\"\\\"\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\"\\\"\""
Well, it's not quite as bad as that. If we have \x escapes, it can become quadratic:

  ""
  "\"\""
  "\"\\\"\\\"\""
  "\"\\\"\\\\x22\\\\x22\\\"\""
  "\"\\\"\\\\x22\\\\x5Cx22\\\\x5Cx22\\\\x22\\\"\""
  "\"\\\"\\\\x22\\\\x5Cx22\\\\x5Cx5Cx22\\\\x5Cx5Cx22\\\\x5Cx22\\\\x22\\\"\""
And \u escapes (supported by JSON) become quadratic in the same way:

  ""
  "\"\""
  "\"\\\"\\\"\""
  "\"\\\"\\\\\\\"\\\\\\\"\\\"\""
  "\"\\\"\\\\\\\"\\\\\\\\u0022\\\\\\\\u0022\\\\\\\"\\\"\""
  "\"\\\"\\\\\\\"\\\\\\\\u0022\\\\\\\\u005Cu0022\\\\\\\\u005Cu0022\\\\\\\\u0022\\\\\\\"\\\"\""
IMO, it's too bad we don't just have quadratic growth from the start:

  ""
  "\q\q"
  "\q\-q\-q\q"
  "\q\-q\--q\--q\-q\q"
  "\q\-q\--q\---q\---q\--q\-q\q"
  "\q\-q\--q\---q\----q\----q\---q\--q\-q\q"
However, the exponential growth is a bit easier to keep track of than these quadratic options, I think, and that's a bit more important.

What I really think could be cool is for every language to be Penknife-like, with string syntaxes accepting any nested square brackets inside. Then we'd get to spend almost no editing effort and yet have linear growth:

  s[]
  s[s[]]
  s[s[s[]]]
  s[s[s[s[]]]]
  s[s[s[s[s[]]]]]
  s[s[s[s[s[s[]]]]]]
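
The growth rates for the plain backslash scheme and the bracket scheme can be checked with a short Python sketch. The helper names are made up, and `quote_brackets` only imitates the shape of the Penknife-like syntax above without parsing anything.

```python
def quote_backslash(text):
    """Wrap text in double quotes, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def quote_brackets(text):
    """Wrap text in s[...]; balanced nested brackets need no escaping."""
    return "s[" + text + "]"


def nested_lengths(quote, innermost, depth):
    """Lengths of innermost, quote(innermost), quote(quote(innermost)), ..."""
    lengths = []
    current = innermost
    for _ in range(depth):
        lengths.append(len(current))
        current = quote(current)
    return lengths
```

`nested_lengths(quote_backslash, '""', 6)` gives `[2, 6, 14, 30, 62, 126]`, since every character of the previous level gets escaped and two delimiters are added. `nested_lengths(quote_brackets, "s[]", 6)` gives `[3, 6, 9, 12, 15, 18]`.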
---

"But... you did mention making strings and symbols identical... then we would just use "" to delimit symbols with strange characters in them."

That's why I brought it up. Akkartik mentioned not reserving |, and I don't like the symbol-string divide anyway, so I figured it could kill two birds with one stone.

Actually there's another option. We can take advantage of the string syntax to write odd symbols using some kind of #s"hey, I'm a symbol" reader macro... at the cost of giving up #. (We could also use #s[hey, I'm a symbol].)

---

"we'd be giving up mutable strings"

Well, I did mention that I use strings as though they're immutable and interned. I do this mainly because of 'is comparing strings by content, and because I like the thought that every string literal reuses the same instance, but I also just don't see the point of mutating strings.

---

"I wanted to make strings identical to a list of characters [...] So, symbols would be mutable."

Yeah, I wouldn't suggest doing symbol=string and string=list at the same time. It's pretty crazy to have mutable symbols, since that makes them more confusing to use as hash table keys. Given the choice, I'd ditch string=list first (and I'm sure you expected me to say that).

-----

1 point by Pauan 4906 days ago | link

"Actually there's another option. We can take advantage of the string syntax to write odd symbols using some kind of #s"hey, I'm a symbol" reader macro... at the cost of giving up #. (We could also use #s[hey, I'm a symbol].)"

No, you don't give up # because Racket's reader already uses # for various stuff. For instance, you can use #rx"foo(?=bar)" to write compiled regexps. Now, if we were writing a reader in Arc, then we'd need to decide whether to use # or not... which I guess we'd have to do anyways, if we were going to unreserve |

I think using #s is fine, except then I'd have to write #s"new " which is even worse than |new |. In that case we might as well remove the distinction between strings and symbols...

---

"... but I also just don't see the point of mutating strings."

I think it makes sense to have mutable strings if you see them as a list of characters. But Arc doesn't treat them as a list of characters... so they're this weird data type that's sorta in between symbols and lists...

If Arc treated them as a list of characters, then mutability makes sense, and then we'd have to use `iso` rather than `is`, which also makes sense. But given the way Arc treats strings right now, I think it's better to see them as a special subtype (supertype?) of symbols, in which case I agree that mutability doesn't make sense.

---

"Yeah, I wouldn't suggest doing symbol=string and string=list at the same time."

But... but my beautiful, crazy, and absurd plans! How can you dismiss them so easily?! :P

---

"...since that makes them more confusing to use as hash table keys."

Why? We can already use mutable lists as hash table keys...

-----

1 point by rocketnia 4906 days ago | link

"No, you don't give up # because Racket's reader already uses # for various stuff."

Just because the cost is paid once rather than once for each feature doesn't mean it isn't a cost. If we didn't use # for anything else, we could use it in symbols too. (Penknife allows # in symbols, for instance.)

If we take Racket as a starting point, then # is a heavily overloaded escape character and pretty hard to get rid of. How about changing it to \? :-p Then characters would be written not as #\a but as \\a --or maybe \c"a", if we insist not to use \ for anything but escaping.

---

"Why? We can already use mutable lists as hash table keys..."

Are you saying that makes sense? XD

Here's a http://tryarc.org/ session:

  arc> (= foo (table))
  #hash()
  arc> (= bar list.2)
  (2)
  arc> (= foo.bar 'woo)
  woo
  arc> foo
  #hash(((2 . nil) . woo))
  arc> (= bar.0 3)
  3
  arc> foo.bar
  nil
  arc> (each (k v) foo prn.v)
  woo
  #hash(((3 . nil) . woo))
  arc> (each (k v) foo (prn foo.k))
  nil
  #hash(((3 . nil) . woo))
  arc> (= (foo list.2) 'boo)
  boo
  arc> (= bar.0 2)
  2
  arc> (each (k v) foo prn.v)
  boo
  woo
  #hash(((2 . nil) . woo) ((2 . nil) . boo))
  arc> keys.foo
  ((2) (2))
  arc> (each k keys.foo (prn foo.k))
  woo
  woo
  nil
This behavior seems to indicate that the table saves the hash of a key when it's entered and then doesn't update that hash when the key is mutated, and that it (probably) stores any keys with equal hashes so that they're checked for full equality in the order they were put in. However, the Racket docs just call the behavior "unpredictable" (http://docs.racket-lang.org/reference/hashtables.html), and that's just fine for a behavior that isn't here for any API-like reason, just here because it's an easier and more efficient implementation than more paranoid alternatives.
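
The same kind of surprise can be reproduced in Python, whose dict also records a key's hash at insertion time. The `ListKey` class below is a hypothetical mutable key with a deliberately simple content-based hash. It is an analogue of the session above, not a model of Racket's tables.

```python
class ListKey:
    """A mutable key that hashes and compares by its current contents."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        return isinstance(other, ListKey) and self.items == other.items

    def __hash__(self):
        # Deliberately simple content hash (an assumption of this sketch).
        return sum(self.items)

    def __repr__(self):
        return f"ListKey({self.items})"


table = {}
bar = ListKey([2])
table[bar] = "woo"   # stored under the hash of [2]
bar.items[0] = 3     # mutate the key while it sits in the table
```

After the mutation, `bar in table`, `ListKey([2]) in table`, and `ListKey([3]) in table` are all false, yet iterating over `table.values()` still finds `"woo"`. Setting `bar.items[0]` back to 2 makes the entry reachable again.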

Anyway, I suspect you never meant mutable keys made that much sense, just that they make sense as long as we don't actually mutate them. :-p I'm fine with that, just like I'm fine with treating strings as immutable. (I guess one of the paranoid alternatives is to freeze anything used as a table key.)

---

That said, there's a completely different side to this topic I haven't gotten to: On Jarc, every table key is implicitly converted to a string for lookup purposes. I've brought up the point that it would be more consistent if Jarc simulated Racket 'equal? (which pg-Arc uses), but I don't really believe either of these things is the right thing to do, since they compare tagged values by their implementation content. So for my own purposes, I already consider it unpredictable what happens if I use non-symbol table keys, let alone mutable lists.

-----

1 point by Pauan 4906 days ago | link

"Just because the cost is paid once rather than once for each feature doesn't mean it isn't a cost. If we didn't use # for anything else, we could use it in symbols too. (Penknife allows # in symbols, for instance.)"

I know, which is why I mentioned that we would probably need to write an Arc reader in order to unreserve |. And if we did that, we would then also need to decide whether to use # or something else.

---

"If we take Racket as a starting point, then # is a heavily overloaded escape character and pretty hard to get rid of. How about changing it to \? :-p Then characters would be written not as #\a but as \\a --or maybe \c"a", if we insist not to use \ for anything but escaping."

Using \ as an escape does seem better than using # to be honest. If we were to write a reader in Arc (which I support, by the way), I think it'd be interesting to try out \ rather than #

---

"Are you saying that makes sense? XD"

I like it better than Python's "you can only store hashable things in keys" approach. Or JavaScript's insane "all keys are coerced to a string, so all Objects are put into the same key" approach.

---

"Anyway, I suspect you never meant mutable keys made that much sense, just that they make sense as long as we don't actually mutate them. :-p I'm fine with that, just like I'm fine with treating strings as immutable."

Yeah, something like that. I suspect Racket behaves that way for performance reasons, but ideally I think it'd be nice if it did update the hashes, so that mutable keys would behave as expected.

---

"(I guess one of the paranoid alternatives is to freeze anything used as a table key.)"

That's basically Python's approach: only hashable things are allowed as keys in objects.

-----

2 points by rocketnia 4905 days ago | link

"I know, which is why I mentioned[...]"

Yeah, I know. ^_^

---

"I like it better than Python's "you can only store hashable things in keys" approach."

Well... you can only use hashable things as keys in Racket too. (How else would hash tables work?) Almost everything in Racket is hashable, but it's possible to make things that kinda aren't, like this:

  arc> ($:define-struct foo (dummy) #:property prop:equal+hash (list (lambda args #t) (lambda args (error "Can't primary hash this.")) (lambda args (error "Can't secondary hash this."))))
  #<void>
  arc> (= my-table (table))
  #hash()
  arc> (= my-foo $.make-foo!dummy-val)
  #<foo>
  arc> (= my-table.my-foo 2)
  Error: "Can't primary hash this."
This is using the 'equal? hash. I don't know if there's a way to make something in Racket that's unhashable with regard to 'eq? or 'eqv?.

You can get at these hash values directly using 'equal-hash-code and friends. http://docs.racket-lang.org/reference/hashtables.html

-----

1 point by Pauan 4905 days ago | link

"Well... you can only use hashable things as keys in Racket too."

Hm... odd, I remember in Python, that you weren't able to use things like lists (which are mutable) as keys... but I guess I remember wrong because I just tried it and it works. Not sure what the problem was that I had with Python's hash tables...

---

By the way, I feel like hash table keys should only use eq, not equal...

-----

1 point by rocketnia 4905 days ago | link

"By the way, I feel like hash table keys should only use eq, not equal..."

Racket has eq-based tables too, and I wouldn't want 'equal? for weak table keys. However, if I want to use a list or something as a key, it's likely because I want to look something up using multiple values, in which case 'equal? is useful. (In those cases, the ability to intern something into an immutable data structure 'equal? to it would probably be just as useful....)

-----

1 point by Pauan 4906 days ago | link

By the way, I just realized something... symbols, when evaluated, are basically a variable reference... but strings are self-evaluating. So if strings == symbols... then what should (+ "foo" 1 2 3) do? Should it treat the "foo" as a variable reference, or a literal? Would we then need to use '"foo" instead?

So... because symbols evaluate to variables, and strings are self-evaluating, I don't really see a way to use the same syntax for both of them. If we decide that "foo" is equivalent to 'foo then that means we cannot have variable names with weird stuff in them.

On the other hand, if we decide "foo" is equivalent to foo, then that means we can no longer use string literals without quoting them, which is pretty funky. So any solution we choose will be right half the time.

Thus.. we would need to have a separate syntax for them, like using #s"foo" for symbols, and "foo" for strings...

---

Hm... crazy brainstorming: global symbols that don't refer to anything are self-evaluating. That would solve the problem at the cost of being really crazy and confusing. :P

In any case, if strings == symbols, then "foo" would basically just be a shorthand for '#s"foo" (note the quote)

-----

1 point by rocketnia 4906 days ago | link

"Would we then need to use '"foo" instead?"

That's pretty much what I used for my original example at http://arclanguage.org/item?id=14883. If I thought (quote ...) around strings was a deal breaker, the topic would have never come up. :-p

---

"Hm... crazy brainstorming: global symbols that don't refer to anything are self-evaluating."

Isn't that the same idea as http://arclanguage.org/item?id=13823? Or are you having this be determined at compile time rather than run time? That could be an interesting spin on things.

-----

2 points by Pauan 4906 days ago | link

"Isn't that the same idea as http://arclanguage.org/item?id=13823? Or are you having this be determined at compile time rather than run time? That could be an interesting spin on things."

Yes. To be more specific, it's the same idea as point #3 in that post. But it doesn't matter whether it's done at compile time or runtime. The end result is the same: if you have a string literal "foo" and somebody defines a global variable foo, then suddenly your program's behavior changes drastically. Which is why I called it "really crazy and confusing".

---

So, right now, my opinion is that "foo" should basically mean the same thing as (quote \s"foo") where \s"" is syntax for creating a symbol with odd characters in it. That approach isn't 100% ideal in every circumstance[1], but it should be overall the best in the general case.

There is one question, though: should symbols or strings be the primitive? In other words, should it be (isa "foo" 'sym) or (isa "foo" 'string) ?

Personally, though the term "string" is more familiar, I'm actually leaning toward sym. The word "string" is pretty confusing, when you think about it, but "symbol" is a very reasonable thing to call a sequence of characters.

---

As a side effect of making symbols eq to strings... we would also make this work:

  (coerce 'foo 'cons) -> (\\f \\o \\o)

I'm using \\f to mean the same thing as #\f by the way.

Hm... what if we removed chars from the language? What's the purpose of them, really?

---

P.S. Somewhat related: PHP apparently sometimes treats strings as variable references:

http://stackoverflow.com/questions/1995113/strangest-languag...

---

* [1]: I'd have to write "new " rather than new\ (a backslash followed by a space). How horrible. :P

-----

2 points by rocketnia 4905 days ago | link

"But it doesn't matter whether it's done at compile time or runtime. The end result is the same: if you have a string literal "foo" and somebody defines a global variable foo, then suddenly your program's behavior changes drastically."

If it's done at compile time, it's a bit better: The program's behavior doesn't change at all if someone defines a global variable 'foo after your code has been loaded.

---

"So, right now, my opinion is that "foo" should basically mean the same thing as (quote \s"foo")..."

My first impression seeing reader syntaxes like \s"foo" is that the reader will recursively read "foo" and then convert the result somehow.

I guess it could read "foo" as (quote foo) and convert that result using 'cadr, lol. :-p

---

"The word "string" is pretty confusing, when you think about it, but "symbol" is a very reasonable thing to call a sequence of characters."

Huh, nice observation. ^_^ I think I've called strings "texts" sometimes, so mayhaps that's another option.

---

"I'm using \\f to mean the same thing as #\f by the way."

I see that as using \ in a non-escaping way. I don't have a problem with using it that way, but I don't have a problem with using it as a delimiter either.

---

"Hm... what if we removed chars from the language? What's the purpose of them, really?"

Would you have a symbol be a sequence of length-one symbols, and have every length-one symbol be its own element? Anyway, I don't have any opinion about this. :-p

---

"P.S. Somewhat related: PHP apparently sometimes treats strings as variable references"

Jarc does the same thing if you call a symbol. I don't really have an opinion about this either.

-----

1 point by Pauan 4905 days ago | link

"Huh, nice observation. ^_^ I think I've called strings "texts" sometimes, so mayhaps that's another option."

Sure, but traditionally Lisp has used the term "symbol", and even languages like Ruby have picked up on it. And there's another thing. Symbols in Lisp are often used as variable names. In that context, the word "text" doesn't make much sense, but the word "symbol" still makes perfect sense. So I still vote for "symbol", even though I think "text" is more reasonable than "string".

---

"Would you have a symbol be a sequence of length-one symbols, and have every length-one symbol be its own element? Anyway, I don't have any opinion about this. :-p"

Yes. :P At least, when coerced to a list. This would let us get rid of two similar-but-not-quite-the-same data types: strings and chars. It annoys me that I have to use (tokens foo #\newline) rather than (tokens foo "\n").

I don't really see much purpose or benefit of having a distinction between chars and strings... Python and JavaScript get by just fine with only strings, for instance.

In addition, I find myself juggling between symbols and strings a lot... converting to a sym when I want Arc to see it as an identifier, and as a string when I want to do things like index into it, or coerce it to a list... or when Arc code requires a string rather than a sym, etc... The more I think about it, the better it seems to unify strings/symbols/characters.
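
To illustrate the juggling described above, here is a rough sketch of the conversions involved. It assumes stock Arc 3.1 behavior, and the variable s is only a placeholder for illustration.

```arc
; Assumes stock Arc 3.1; the variable s is illustrative only.
(= s "foo")

(sym s)             ; string -> symbol, for use as an identifier
(string 'foo)       ; symbol -> string, for indexing or splitting
(s 0)               ; indexing a string yields a char, not a string
(coerce s 'cons)    ; a list of chars: (#\f #\o #\o)

; The separator here is a char rather than a one-character string.
(tokens "a\nb" #\newline)   ; ("a" "b")
```

With strings, symbols, and chars unified, each of these conversions would presumably become unnecessary.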

-----

2 points by rocketnia 4905 days ago | link

I thought the point of the term "symbol" was to signify something that was used as a name. I do think it's a tad more evocative to call every string a symbol, but it feels a bit like calling every cons cell a form.

Inasmuch as I have any desire to be traditional, I'll call strings strings. :-p

-----

3 points by Pauan 4912 days ago | link

"and evanrmurphy for getting me thinking about fexprs[2]"

I've been coming to the conclusion that although macros are great, they have some pretty severe limitations, due to them always being expanded at compile-time. So I'd be very interested to see if fexprs would be a good fit for Arc.

What I think would be neat is if macros were expanded at compile-time as much as possible, and when they can't be expanded at compile-time, they'll revert back to interpreting at run-time. Then, you can get full macro speed most of the time, but it would still allow you to do things like, say, use `apply` on a macro, or pass it as an argument to `map`, etc.

Alternatively, you could just not have macros, only having fexprs, and then try to optimize fexprs in various ways. I doubt it'd be as fast as expanding macros at compile-time, but... who knows, it might still be Fast Enough(tm).

Anyways, I think it's good for us to experiment with fexprs more, and see how things turn out.

---

In fact, because of the severe problems with macros, I independently came up with a pattern that rocketnia had already discovered: thunks. By wrapping every argument to a function in a thunk, you basically turn them into ghetto fexprs. So rather than doing this:

  (foo a b c)

You'd do this:

  (foo (fn () a) (fn () b) (fn () c))

But obviously that's a total pain, so what I did was rename "foo" to "foofn", and then have a macro called "foo". So that way, this:

  (foo a b c)

Would expand into this:

  (foofn (fn () a) (fn () b) (fn () c))

As near as I can tell, this is pretty much hygienic macros, since foofn is an ordinary function that has lexical scope. It's not nearly as clean or expressive as fexprs, but it's a cool little hack that can help work around the limitations of macros.

But, if we get to the point where a significant number of macros are basically just expanding into thunks... then I think we'd be better off having fexprs.
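
To make the foo/foofn pattern concrete, here is a minimal sketch in Arc. The names my-if and my-if-fn are hypothetical, chosen only for illustration, and the sketch assumes nothing beyond stock def, mac, and quasiquote; it is not taken from any real library.

```arc
; Hypothetical names: my-if-fn and my-if are illustrative only.

; The function half: every argument arrives as a zero-argument
; function (a thunk), so nothing runs until it is called.
(def my-if-fn (testf thenf elsef)
  (if (testf) (thenf) (elsef)))

; The macro half: wraps each argument expression in (fn () ...).
(mac my-if (test-expr then-expr else-expr)
  `(my-if-fn (fn () ,test-expr) (fn () ,then-expr) (fn () ,else-expr)))

; Only the chosen branch runs, so the (err ...) call is never evaluated.
(my-if (is 1 1) 'yes (err "never evaluated"))

; The function half is an ordinary value, so it can be passed to apply.
(apply my-if-fn (list (fn () nil) (fn () 'a) (fn () 'b)))
```

The macro still can't be passed around, but the function half can, as long as the caller writes the thunks by hand.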

-----

2 points by bogomipz 4910 days ago | link

About thunks: this is actually a situation where statically typed languages have the potential to do something interesting that is hard in dynamically typed systems.

If foo takes an argument that should be a zero arity function that returns a number, and you write:

  (foo (+ x y))

The compiler could recognize that the expression you used for the argument can be converted into a function with the correct signature. I think I have heard of languages on the JVM that do this, though I'm not sure which.

Also note that Smalltalk-80 is very much designed around the principle of thunks. E.g.:

  a > b ifTrue: [Transcript show: 'A is great!']

-----