Arc Forum | "The only reason for escaping seems to be spaces. And really weird characters li...

Arc Forum

1 point by Pauan 5472 days ago | link | parent

"The only reason for escaping seems to be spaces. And really weird characters like quotes."

Yes. For instance, in arc2js:

  (prefix    new     new\      17
             del     delete\   15
             no      !         14)

  (bin       mod     %         13
             and     &&        5
             or      \|\|      3)

  (binsep    do      \,        1)

Note the escape for , and || and spaces. I still find this much more readable than using strings. But you don't need || for that... you can just use \ escaping. So I think it's a very good idea to free up || for something else.

---

As for preventing ssyntax expansion... right now, ssyntax is handled in arc.arc, so it's only expanded if you explicitly call ssexpand (or if a function like setforms calls ssexpand)... so in the example above, I simply don't call "ssexpand", and it all works out fine.

However, if ssyntax were handled by the reader, then there should definitely be a way to prevent it from expanding... quote might be a good way of doing that.

1 point by rocketnia 5472 days ago | link

"I still find this much more readable than using strings."

I don't! XD I don't have a problem with you finding it readable, but for me, if I have something with spaces in it, especially if the space is at the end, I surround it with some kind of delimiter:

  (prefix    new     "new "    17 ...)
  (prefix    new     |new |    17 ...)

That way if I write my code this way...

  (prefix new new\  17
          del delete\  15
          no ! 14)

...then I won't be tempted to "fix" the stray whitespace to this:

  (prefix new new\ 17
          del delete\ 15
          no ! 14)

In fact, I don't think I use \ at all, and sometimes I've thought it could be cool to free up for something else. It is a non-shifted punctuation mark.

-----

1 point by Pauan 5472 days ago | link

"In fact, I don't think I use \ at all, and sometimes I've thought it could be cool to free up for something else. It is a non-shifted punctuation mark."

Yeah, but... \ is already used in strings for escaping, so using it for symbol escaping is quite natural (read: consistent). If we freed up \ then how would we escape symbols? Unless you're saying symbols shouldn't be escaped, and we should be limited in the number of things we can put into a symbol...

-----

1 point by rocketnia 5472 days ago | link

"If we freed up \ then how would we escape symbols?"

We'd escape them with ||. Inside ||, we could still use \ for escaping things like |, even if it doesn't have that meaning on the outside.

-----

1 point by Pauan 5472 days ago | link

No no no, the whole point is that we're trying to free up | because it's a useful character... more useful than \ in my opinion, especially since \ is so incredibly commonly used for escaping.

-----

1 point by rocketnia 5469 days ago | link

Then how about using \ as a delimiter?

  (prefix new \new \ 17
          del \delete \ 15
          no ! 14)

To reiterate a bit, the reason I wouldn't use \ the way you do is that "\ " at the end of a symbol makes the space blend in with the whitespace around it. For an example other than this one, consider that putting such a symbol at the end of a line might introduce bugs when people remove all trailing whitespace in a file.

-----

2 points by Pauan 5469 days ago | link

"Then how about using \ as a delimiter?"

As said, that's totally inconsistent with the way that \ works everywhere else, including within strings. I'm not entirely sure how to fix this problem adequately without keeping | in the language.

But... you did mention making strings and symbols identical... then we would just use "" to delimit symbols with strange characters in them. Of course then we'd be giving up mutable strings, but is that really so bad? That's how we use them most of the time anyways. I actually rather like that idea. It would also make (is "foo" "foo") consistent, whereas right now it isn't.

But... this has some amusing implications. I had mentioned earlier that I wanted to make strings identical to a list of characters... so you can call car and cdr on a string. But if strings and symbols are identical, then that basically means symbols would be... a list of characters. So, symbols would be mutable. And (car 'foo) would be #\f. And symbols would no longer be interned... Pretty crazy stuff.

---

"To reiterate a bit, the reason I wouldn't use \ the way you do is that "\ " at the end of a symbol makes the space blend in with the whitespace around it. For an example other than this one, consider that putting such a symbol at the end of a line might introduce bugs when people remove all trailing whitespace in a file."

Yeah, I know. I just decided that for my particular use case, I was willing to accept the practical disadvantage in exchange for increased readability and one less character. To put it bluntly, I prefer the way "new\ " looks, compared to "|new |" I do agree that delimiters are overall better, though.

-----

1 point by rocketnia 5468 days ago | link

"totally inconsistent with the way that \ works everywhere else, including within strings"

Where else does it work, besides strings? Regexes? If you're talking about quoting code from other languages (not counting TeX and other not-so-C-inspired languages which already use backslash for other purposes :-p ), yes, it can be nice to have some consistency between layers of escaping.

Sometimes I really don't like the exponential growth that results from that:

  ""
  "\"\""
  "\"\\\"\\\"\""
  "\"\\\"\\\\\\\"\\\\\\\"\\\"\""
  "\"\\\"\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\"\\\"\""
  "\"\\\"\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\"\\\"\""

Well, It's not quite as bad as that. If we have \x escapes, it can become quadratic:

  ""
  "\"\""
  "\"\\\"\\\"\""
  "\"\\\"\\\\x22\\\\x22\\\"\""
  "\"\\\"\\\\x22\\\\x5Cx22\\\\x5Cx22\\\\x22\\\"\""
  "\"\\\"\\\\x22\\\\x5Cx22\\\\x5Cx5Cx22\\\\x5Cx5Cx22\\\\x5Cx22\\\\x22\\\"\""

And \u escapes (supported by JSON) become quadratic in the same way:

  ""
  "\"\""
  "\"\\\"\\\"\""
  "\"\\\"\\\\\\\"\\\\\\\"\\\"\""
  "\"\\\"\\\\\\\"\\\\\\\\u0022\\\\\\\\u0022\\\\\\\"\\\"\""
  "\"\\\"\\\\\\\"\\\\\\\\u0022\\\\\\\\u005Cu0022\\\\\\\\u005Cu0022\\\\\\\\u0022\\\\\\\"\\\"\""

IMO, it's too bad we don't just have quadratic growth from the start:

  ""
  "\q\q"
  "\q\-q\-q\q"
  "\q\-q\--q\--q\-q\q"
  "\q\-q\--q\---q\---q\--q\-q\q"
  "\q\-q\--q\---q\----q\----q\---q\--q\-q\q"

However, the exponential growth is a bit easier to keep track of than these quadratic options, I think, and that's a bit more important.

What I really think could be cool is for every language to be Penknife-like, with string syntaxes accepting any nested square brackets inside. Then we'd get to spend almost no editing effort and yet have linear growth:

  s[]
  s[s[]]
  s[s[s[]]]
  s[s[s[s[]]]]
  s[s[s[s[s[]]]]]
  s[s[s[s[s[s[]]]]]]

---

"But... you did mention making strings and symbols identical... then we would just use "" to delimit symbols with strange characters in them."

That's why I brought it up. Akkartik mentioned not reserving |, and I don't like the symbol-string divide anyway, so I figured it could kill two birds with one stone.

Actually there's another option. We can take advantage of the string syntax to write odd symbols using some kind of #s"hey, I'm a symbol" reader macro... at the cost of giving up #. (We could also use #s[hey, I'm a symbol].)

---

"we'd be giving up mutable strings"

Well, I did mention that I use strings as though they're immutable and interned. I do this mainly because of 'is comparing strings by content, and because I like the thought that every string literal reuses the same instance, but I also just don't see the point of mutating strings.

---

"I wanted to make strings identical to a list of characters [...] So, symbols would be mutable."

Yeah, I wouldn't suggest doing symbol=string and string=list at the same time. It's pretty crazy to have mutable symbols, since that makes them more confusing to use as hash table keys. Given the choice, I'd ditch string=list first (and I'm sure you expected me to say that).

-----

1 point by Pauan 5468 days ago | link

"Actually there's another option. We can take advantage of the string syntax to write odd symbols using some kind of #s"hey, I'm a symbol" reader macro... at the cost of giving up #. (We could also use #s[hey, I'm a symbol].)"

No, you don't give up # because Racket's reader already uses # for various stuff. For instance, you can use #rx"foo(?=bar)" to write compiled regexps. Now, if we were writing a reader in Arc, then we'd need to decide whether to use # or not... which I guess we'd have to do anyways, if we were going to unreserve |

I think using #s is fine, except then I'd have to write #s"new " which is even worse than |new | In that case we might as well remove the distinction between strings and symbols...

---

"... but I also just don't see the point of mutating strings."

I think it makes sense to have mutable strings if you see them as a list of characters. But Arc doesn't treat them as a list of characters... so they're this weird data type that's sorta in between symbols and lists...

If Arc treated them as a list of characters, then mutability makes sense, and then we'd have to use `iso` rather than `is`, which also makes sense. But given the way Arc treats strings right now, I think it's better to see them as a special subtype (supertype?) of symbols, in which case I agree that mutability doesn't make sense.

---

"Yeah, I wouldn't suggest doing symbol=string and string=list at the same time."

But... but my beautiful, crazy, and absurd plans! How can you dismiss them so easily?! :P

---

"...since that makes them more confusing to use as hash table keys."

Why? We can already use mutable lists as hash table keys...

-----

1 point by rocketnia 5468 days ago | link

"No, you don't give up # because Racket's reader already uses # for various stuff."

Just because the cost is paid once rather than once for each feature doesn't mean it isn't a cost. If we didn't use # for anything else, we could use it in symbols too. (Penknife allows # in symbols, for instance.)

If we take Racket as a starting point, then # is a heavily overloaded escape character and pretty hard to get rid of. How about changing it to \? :-p Then characters would be written not as #\a but as \\a --or maybe \c"a", if we insist not to use \ for anything but escaping.

---

"Why? We can already use mutable lists as hash table keys..."

Are you saying that makes sense? XD

Here's a http://tryarc.org/ session:

  arc> (= foo (table))
  #hash()
  arc> (= bar list.2)
  (2)
  arc> (= foo.bar 'woo)
  woo
  arc> foo
  #hash(((2 . nil) . woo))
  arc> (= bar.0 3)
  3
  arc> foo.bar
  nil
  arc> (each (k v) foo prn.v)
  woo
  #hash(((3 . nil) . woo))
  arc> (each (k v) foo (prn foo.k))
  nil
  #hash(((3 . nil) . woo))
  arc> (= (foo list.2) 'boo)
  boo
  arc> (= bar.0 2)
  2
  arc> (each (k v) foo prn.v)
  boo
  woo
  #hash(((2 . nil) . woo) ((2 . nil) . boo))
  arc> keys.foo
  ((2) (2))
  arc> (each k keys.foo (prn foo.k))
  woo
  woo
  nil

This behavior seems to indicate that the table saves the hash of a key when it's entered and then doesn't update that hash when the key is mutated, and that it (probably) stores any keys with equal hashes so that they're checked for full equality in the order they were put in. However, the Racket docs just call the behavior "unpredictable" (http://docs.racket-lang.org/reference/hashtables.html), and that's just fine for a behavior that isn't here for any API-like reason, just here because it's an easier and more efficient implementation than more paranoid alternatives.

Anyway, I suspect you never meant mutable keys made that much sense, just that they make sense as long as we don't actually mutate them. :-p I'm fine with that, just like I'm fine with treating strings as immutable. (I guess one of the paranoid alternatives is to freeze anything used as a table key.)

---

That said, there's a completely different side to this topic I haven't gotten to: On Jarc, every table key is implicitly converted to a string for lookup purposes. I've brought up the point that it would be more consistent if Jarc simulated Racket 'equal? (which pg-Arc uses), but I don't really believe either of these things is the right thing to do, since they compare tagged values by their implementation content. So for my own purposes, I already consider it unpredictable what happens if I use non-symbol table keys, let alone mutable lists.

-----

1 point by Pauan 5468 days ago | link

"Just because the cost is paid once rather than once for each feature doesn't mean it isn't a cost. If we didn't use # for anything else, we could use it in symbols too. (Penknife allows # in symbols, for instance.)"

I know, which is why I mentioned that we would probably need to write an Arc reader in order to unreserve | And if we did that, we would then also need to decide whether to use # or something else.

---

"If we take Racket as a starting point, then # is a heavily overloaded escape character and pretty hard to get rid of. How about changing it to \? :-p Then characters would be written not as #\a but as \\a --or maybe \c"a", if we insist not to use \ for anything but escaping."

Using \ as an escape does seem better than using # to be honest. If we were to write a reader in Arc (which I support, by the way), I think it'd be interesting to try out \ rather than #

---

"Are you saying that makes sense? XD"

I like it better than Python's "you can only store hashable things in keys" approach. Or JavaScript's insane "all keys are coerced to a string, so all Objects are put into the same key" approach.

---

"Anyway, I suspect you never meant mutable keys made that much sense, just that they make sense as long as we don't actually mutate them. :-p I'm fine with that, just like I'm fine with treating strings as immutable."

Yeah, something like that. I suspect Racket behaves that way for performance reasons, but ideally I think it'd be nice if it did update the hashes, so that mutable keys would behave as expected.

---

"(I guess one of the paranoid alternatives is to freeze anything used as a table key.)"

That's basically Python's approach: only hashable things are allowed as keys in objects.

-----

2 points by rocketnia 5467 days ago | link

"I know, which is why I mentioned[...]"

Yeah, I know. ^_^

---

"I like it better than Python's "you can only store hashable things in keys" approach."

Well... you can only use hashable things as keys in Racket too. (How else would hash tables work?) Almost everything in Racket is hashable, but it's possible to make things that kinda aren't, like this:

  arc> ($:define-struct foo (dummy) #:property prop:equal+hash (list (lambda args #t) (lambda args (error "Can't primary hash this.")) (lambda args (error "Can't secondary hash this."))))
  #<void>
  arc> (= my-table (table))
  #hash()
  arc> (= my-foo $.make-foo!dummy-val)
  #<foo>
  arc> (= my-table.my-foo 2)
  Error: "Can't primary hash this."

This is using the 'equal? hash. I don't know if there's a way to make something in Racket that's unhashable with regard to 'eq? or 'eqv?.

You can get at these hash values directly using 'equal-hash-code and friends. http://docs.racket-lang.org/reference/hashtables.html

-----

1 point by Pauan 5466 days ago | link

"Well... you can only use hashable things as keys in Racket too."

Hm... odd, I remember in Python, that you weren't able to use things like lists (which are mutable) as keys... but I guess I remember wrong because I just tried it and it works. Not sure what the problem was that I had with Python's hash tables...

---

By the way, I feel like hash table keys should only use eq, not equal...

-----

1 point by rocketnia 5466 days ago | link

"By the way, I feel like hash table keys should only use eq, not equal..."

Racket has eq-based tables too those, and I wouldn't want 'equal? for weak table keys. However, if I want to use a list or something as a key, it's likely because I want to look something up using multiple values, in which case 'equal? is useful. (In those cases, the ability to intern something into an immutable data structure 'equal? to it would probably be just as useful....)

-----

1 point by Pauan 5468 days ago | link

By the way, I just realized something... symbols, when evaluated, are basically a variable reference... but strings are self-evaluating. So if strings == symbols... then what should (+ "foo" 1 2 3) do? Should it treat the "foo" as a variable reference, or a literal? Would we then need to use '"foo" instead?

So... because symbols evaluate to variables, and strings are self-evaluating, I don't really see a way to use the same syntax for both of them. If we decide that "foo" is equivalent to 'foo then that means we cannot have variable names with weird stuff in them.

On the other hand, if we decide "foo" is equivalent to foo, then that means we can no longer use string literals without quoting them, which is pretty funky. So any solution we choose will be right half the time.

Thus.. we would need to have a separate syntax for them, like using #s"foo" for symbols, and "foo" for strings...

---

Hm... crazy brainstorming: global symbols that don't refer to anything are self-evaluating. That would solve the problem at the cost of being really crazy and confusing. :P

In any case, if strings == symbols, then "foo" would basically just be a shorthand for '#s"foo" (note the quote)

-----

1 point by rocketnia 5468 days ago | link

"Would we then need to use '"foo" instead?"

That's pretty much what I used for my original example at http://arclanguage.org/item?id=14883. If I thought (quote ...) around strings was a deal breaker, the topic would have never come up. :-p

---

"Hm... crazy brainstorming: global symbols that don't refer to anything are self-evaluating."

Isn't that the same idea as http://arclanguage.org/item?id=13823? Or are you having this be determined at compile time rather than run time? That could be an interesting spin on things.

-----

2 points by Pauan 5468 days ago | link

"Isn't that the same idea as http://arclanguage.org/item?id=13823? Or are you having this be determined at compile time rather than run time? That could be an interesting spin on things."

Yes. To be more specific, it's the same idea as point #3 in that post. But it doesn't matter whether it's done at compile time or runtime. The end result is the same: if you have a string literal "foo" and somebody defines a global variable foo, then suddenly your program's behavior changes drastically. Which is why I called it "really crazy and confusing".

So, right now, my opinion is that "foo" should basically mean the same thing as (quote \s"foo") where \s"" is syntax for creating a symbol with odd characters in it. That approach isn't 100% ideal in every circumstance[1], but it should be overall the best in the general case.

There is one question, though: should symbols or strings be the primitive? In other words, should it be (isa "foo" 'sym) or (isa "foo" 'string) ?

Personally, though the term "string" is more familiar, I'm actually leaning toward sym. The word "string" is pretty confusing, when you think about it, but "symbol" is a very reasonable thing to call a sequence of characters.

---

As a side effect of making symbols eq to strings... we would also make this work:

  (coerce 'foo 'cons) -> (\\f \\o \\o)

I'm using \\f to mean the same thing as #\f by the way.

Hm... what if we removed chars from the language? What's the purpose of them, really?

---

P.S. Somewhat related: PHP apparently sometimes treats strings as variable references:

http://stackoverflow.com/questions/1995113/strangest-languag...

---

* [1]: I'd have to write "new " rather than new\ How horrible. :P

-----

2 points by rocketnia 5466 days ago | link

"But it doesn't matter whether it's done at compile time or runtime. The end result is the same: if you have a string literal "foo" and somebody defines a global variable foo, then suddenly your program's behavior changes drastically."

If it's done at compile time, it's a bit better: The program's behavior doesn't change at all if someone defines a global variable 'foo after your code has been loaded.

---

"So, right now, my opinion is that "foo" should basically mean the same thing as (quote \s"foo")..."

My first impression seeing reader syntaxes like \s"foo" is that the reader will recursively read "foo" and then convert the result somehow.

I guess it could read "foo" as (quote foo) and convert that result using 'cadr, lol. :-p

---

"The word "string" is pretty confusing, when you think about it, but "symbol" is a very reasonable thing to call a sequence of characters."

Huh, nice observation. ^_^ I think I've called strings "texts" sometimes, so mayhaps that's another option.

---

"I'm using \\f to mean the same thing as #\f by the way."

I see that as using \ in a non-escaping way. I don't have a problem with using it that way, but I don't have a problem with using it as a delimiter either.

---

"Hm... what if we removed chars from the language? What's the purpose of them, really?"

Would you have a symbol be a sequence of length-one symbols, and have every length-one symbol be its own element? Anyway, I don't have any opinion about this. :-p

---

"P.S. Somewhat related: PHP apparently sometimes treats strings as variable references"

Jarc does the same thing if you call a symbol. I don't really have an opinion about this either.

-----

1 point by Pauan 5466 days ago | link

"Huh, nice observation. ^_^ I think I've called strings "texts" sometimes, so mayhaps that's another option."

Sure, but traditionally Lisp has used the term "symbol", and even languages like Ruby have picked up on it. And there's another thing. Symbols in Lisp are often used as variable names. In that context, the word "text" doesn't make much sense, but the word "symbol" still makes perfect sense. So I still vote for "symbol", even though I think "text" is more reasonable than "string".

---

"Would you have a symbol be a sequence of length-one symbols, and have every length-one symbol be its own element? Anyway, I don't have any opinion about this. :-p"

Yes. :P At least, when coerced to a list. This would let us get rid of two similar-but-not-quite-the-same data types: strings and chars. It annoys me that I have to use (tokens foo #\newline) rather than (tokens foo "\n")

I don't really see much purpose or benefit of having a distinction between chars and strings... Python and JavaScript get by just fine with only strings, for instance.

In addition, I find myself juggling between symbols and strings a lot... converting to a sym when I want Arc to see it as an identifier, and as a string when I want to do things like index into it, or coerce it to a list... or when Arc code requires a string rather than a sym, etc... The more I think about it, the better it seems to unify strings/symbols/characters.

-----

2 points by rocketnia 5466 days ago | link

I thought the point of the term "symbol" was to signify something that was used as a name. I do think it's a tad more evocative to call every string a symbol, but it feels a bit like calling every cons cell a form.

Inasmuch as I have any desire to be traditional, I'll call strings strings. :-p

-----