Arc Forum | almkglor's comments
2 points by almkglor 6420 days ago | link | parent | on: Are strings useful ?

I personally think that strings and symbols should be separate, largely because of their different uses.

That said, I did recently write an app that used symbols as de facto strings; the text file the user edits as a configuration/task-description file was just an s-expr format. The app wasn't written in Lisp or anything like it, which is why I settled for using symbols as strings (to make things easier on my reader function - I just didn't have a separate readable string type).

Given that experience, well, I really must insist that having separate string and symbol types is better. In fact, in that app the config/taskdesc file was just a glorified association list (where the value is the 'cdr of each entry, not the 'cadr)!

As for strings being lists/arrays of characters, yes, that's a good idea. We might hack the writer so that it scans each list it finds, checks whether every element is a character, and if so prints it as a string. We might also add an 'astring function which does this check (obviously with circular-list protection) to use in place of [isa _ 'string].
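
A rough sketch of what I mean, against plain arc2 (the cycle check is just the usual tortoise-and-hare trick; the details are of course up for debate):

  ; t only for a proper, non-circular list whose elements are all chars
  (def astring (x)
    (with (slow x  n 0)
      ((afn (p)
         (if (no p)                     t    ; clean end: treat it as a string
             (atom p)                   nil  ; dotted tail
             (~isa (car p) 'char)       nil  ; non-character element
             (and (> n 0) (is p slow))  nil  ; lapped the trailing pointer: circular
             (do (++ n)
                 (if (even n) (= slow (cdr slow)))
                 (self (cdr p)))))
       x)))
The writer could then call astring on each list before deciding how to print it.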

-----

2 points by bogomipz 6412 days ago | link

I think the strongest reason for separate strings and symbols is that you don't want all strings to be interned - that would just kill performance.

About lists of chars: rather than analyzing lists every time to see if they are strings, what about tagging them? I've mentioned before that I think Arc needs better support for user-defined types built from cons cells. Strings would be one such specialized, typed use of lists.
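
Tagging pretty much falls out of 'annotate already; here's a minimal sketch using nothing beyond arc2's annotate/rep/type (the names lstring and lstring-rep are made up for the example):

  ; wrap a list of chars so it carries its own type tag
  (def lstring (cs)
    (annotate 'lstring cs))

  ; recover the underlying char list
  (def lstring-rep (s)
    (rep s))

  ; (type (lstring '(#\h #\i)))  => lstring
With that, the writer and 'isa could dispatch on the tag instead of scanning the whole list.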

Also, how do you feel about using symbols of length 1 to represent characters? The number one reason I can see not to is if you want chars to be Unicode and symbols to be ASCII-only.

-----

2 points by sacado 6412 days ago | link

Symbols, ASCII only? No way. I'm writing my code in French, and I'm now used to calling things the right way, i.e. with accents. "modifié" means "modified", "modifie" means "modifies"; those are not the same thing, and I want to be able to distinguish between the two. Without accents, you can't.

Furthermore, that would mean coercing symbols into strings would be impossible (or at least the 1:1 mapping would not be guaranteed anymore).
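
For instance, here is the round trip that would be lost (roughly what I'd expect at an arc2 prompt; I haven't pasted an actual session):

  arc> (coerce 'modifié 'string)
  "modifié"
  arc> (coerce "modifié" 'sym)
  modifié
With ASCII-only symbols, any name with an accent would break that 1:1 mapping.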

-----

2 points by stefano 6412 days ago | link

From the implementation point of view, representing characters as symbols is a real performance issue, because you would have to allocate every character on the heap, and a single character would then take more than 32 bytes of memory.

-----

2 points by sacado 6412 days ago | link

I think that's an implementation detail. You could still keep the character type under the hood, but write characters as "x" (or 'x) instead of #\x and make (type c) return 'string (or 'sym).

Or, if you take the problem the other way, you could say "length-1 symbols are quite frequent and shouldn't take too much memory -- let's represent them in a special way so they only take 4 bytes".

-----

1 point by stefano 6411 days ago | link

This would require some kind of automatic type conversions (probably at runtime), but characters-as-symbols seems doable without the overhead I thought it would lead to.

-----

2 points by almkglor 6421 days ago | link | parent | on: MySQL needed?

I suggest that you study lib/file-table.arc.

I introduced file-table.arc here: http://arclanguage.com/item?id=3762

The series of "create your own collection":

http://arclanguage.com/item?id=3595 Suggest PG: Settable function objects

http://arclanguage.com/item?id=3698 Create your own collection in Arc: settable functions now implemented on arc-wiki.git

http://arclanguage.com/item?id=3762 Create your own collection: use directories as if they were tables with file-table

http://arclanguage.com/item?id=3858 Create your own collection: bidirectional tables

http://arclanguage.com/item?id=5254 Create your own collection: cached-table

You might also be interested in Arki, the wiki in Anarki, which is the first application that used file-table (and treeparse, and cached-table, and many other Anarki-only extensions): http://arclanguage.com/item?id=5053 and http://arclanguage.com/item?id=5227

-----

1 point by almkglor 6421 days ago | link | parent | on: Anarki tut: urexample fails

Hmm. It's the /tmp/ thing. I'll try to look into this maybe 9-10 hours from now, unless someone beats me to it.

-----

1 point by almkglor 6422 days ago | link | parent | on: I think its called introspection...

None yet in Arc, not sure about mzscheme.

-----

2 points by almkglor 6422 days ago | link | parent | on: arc2c update

re GC: this looks interesting: http://use.perl.org/~chromatic/journal/36212?from=rss

Personally I think memory should be managed by refcounts, with a full GC only when the cyclic garbage adds up. However, adding refcounts is somewhat harder, since every state-mutating 'set, 'sref, 'scar, 'scdr, and 'cons needs to decrement the refcount of the object being replaced and increment the refcount of the new one.
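
Just to make the bookkeeping concrete, here's roughly what each mutator would have to do, sketched in Arc itself rather than in arc2c's generated C (refcounts*, rc-incref, rc-decref and rc-scar are names I'm inventing purely for the illustration):

  (= refcounts* (table))

  (def rc-incref (x)
    (= (refcounts* x) (+ 1 (or (refcounts* x) 0))))

  (def rc-decref (x)
    (let n (- (or (refcounts* x) 1) 1)
      (if (is n 0)
          (wipe (refcounts* x))   ; count hit zero: the runtime would free x here
          (= (refcounts* x) n))))

  ; what 'scar would have to become:
  (def rc-scar (cell new)
    (rc-decref (car cell))        ; the old car loses a reference
    (rc-incref new)               ; the new car gains one
    (scar cell new))
The real thing would of course be inline counter tweaks in the C runtime, not table lookups.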

I also suppose that currently the time taken by GC isn't actually very big yet, since all we've been compiling are a few dorky, simple bits of Arc code.

-----

3 points by kens 6422 days ago | link

I thought the big problem with refcounting was circular data structures. (Arc supports those, but I haven't seen anything actually use them.)

Which brings to mind the Lisp koan (http://www.lisp.org/humor/ai-koans.html):

One day a student came to Moon and said, "I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons." Moon patiently told the student the following story-

    "One day a student came to Moon and said, "I understand how to make a better garbage collector...

-----

1 point by almkglor 6422 days ago | link

Which is why there's a backup GC. The point, however, is that circular data structures are pretty rare anyway. Always optimize for the common case.

-----

1 point by sacado 6422 days ago | link

Hmm, interesting. I'm not really fond of refcounting, though: it makes FFIs or C extensions really hard. That's what I don't like about Python's FFI. You always have to think: "Do I have to increment the refcount? Decrement it? Leave it alone?" If you don't get it exactly right, you get random bugs. The sad story is that Python makes C programming harder than it already is.

By contrast, playing with mzscheme's or Lua's FFI is a real pleasure: you don't have to bother with GC at all. Sometimes you even get your malloced objects collected for you.

But if we can centralize the refcount operations in a single place (or a very small number of them), I'm OK... Their discussion of stack_push / stack_pop is rather inspiring...

For information: on a GC-relatively-intensive program (mainly calculating (fib 40), which generates a lot of garbage), with a heap of 50 million possible references and a total run time of 228000 ms, I got the following GC numbers:

  total time : 177ms, nb cycles : 17, max time : 42ms, avg time : 10 ms
That's far from perfect, of course, but it doesn't look so bad to me.

Btw, docstrings are a real performance killer: they are useless at runtime, but they get allocated on the heap and fill it up really quickly (ah, recursive functions...). We should add something to the compiler that strips these unused immediate values out of functions.

-----

3 points by almkglor 6422 days ago | link

> Btw, docstrings are a real performance killer: they are useless at runtime, but they get allocated on the heap and fill it up really quickly (ah, recursive functions...). We should add something to the compiler that strips these unused immediate values out of functions.

Really? You've tried it? Because docstrings are supposed to be removed by the unused global removal step.

Edit: I just did. This is a bug.

Edit2: Fixed and on the git.

-----

1 point by stefano 6422 days ago | link

> Btw, docstrings are a real performance killer: they are useless at runtime, but they get allocated on the heap and fill it up really quickly (ah, recursive functions...

Are you saying that you allocate a docstring on every function call?

-----

1 point by sacado 6422 days ago | link

Well, for the moment, yes. Every object appearing in the program has to be allocated (it's not an optimizing compiler yet). Useless objects are not detected, so every time the compiler sees a string it generates code to allocate it, and the string is freed on the next GC cycle. Every time you call the function, that code is executed. Well, that's an easy optimisation, so I'll work on it very soon I guess.

-----

1 point by stefano 6422 days ago | link

Yes, it's not difficult. You just have to find all the constant values, create a global var for each constant, assign the constant's value to that global var at load time, and substitute each occurrence of the constant with the corresponding global var.
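
In Arc terms the transformation would look something like this (greet and greet-str* are made-up names; the real pass would work on the compiler's intermediate representation rather than on source):

  ; before: the string literal is rebuilt on every call
  (def greet (name)
    (prn "hello, " name))

  ; after hoisting: the literal is allocated once, at load time
  (= greet-str* "hello, ")
  (def greet (name)
    (prn greet-str* name))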

-----

1 point by binx 6422 days ago | link

Refcounting performs a lot worse than a generational GC. When dealing with many deep data structures, it gets even worse. And a simple generational GC is not very hard to implement.

-----

2 points by almkglor 6422 days ago | link

Well, then: how about a limited form of refcounting, solely for closures used as continuations?

We could even leave off true refcounting, and instead just set a flag on a closure-as-continuation if it's used in 'ccc.

-----

1 point by almkglor 6422 days ago | link | parent | on: Learningcurve of LISP?

LOL. Considering that my boss thinks I'm an engineer (and hired me as such), this really hurts!

-----

1 point by absz 6422 days ago | link

Eh. It's attitude, not job description :) And anyway, you should probably take that with a relatively enormous grain of salt, as small sample sizes aren't conducive to accurate data.

-----


LOL

-----

1 point by almkglor 6423 days ago | link | parent | on: arc2c update

  -#define CAR() { if (TOS() != NILOBJ) {pair * p = (pair *) POP(); PUSH((obj)(p->car)); }}
  -#define CDR() { if (TOS() != NILOBJ) {pair * p = (pair *) POP(); PUSH((obj)(p->cdr)); }}
  +#define CAR() { pair * p = (pair *) POP(); PUSH((obj)(p->car)); }
  +#define CDR() { pair * p = (pair *) POP(); PUSH((obj)(p->cdr)); }
Just wondering: Why was this changed? Shouldn't it be that (car nil) == nil?

-----

1 point by sacado 6423 days ago | link

oh, oh, that's a mistake, sorry. I forgot to merge this with my own code. I'll do it on the next commit, if you don't do it before me...

-----

1 point by almkglor 6423 days ago | link

I'll change it back then ^^

Edit: done

-----

2 points by almkglor 6423 days ago | link | parent | on: Will/Can Arc create a LISP-revival?

Where pg fails.... --warning-blatant-self-promotion-- Anarki! Whee!

Hmm. We probably need a "Report on Anarki" as a spec of standard Anarki, particularly 'call* and 'defcall, which may very well be the most important extensions in Anarki.
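
For anyone who hasn't run into it, the gist (from memory, so treat the details as approximate) is that call* is a table mapping a type symbol to a handler which receives the object plus the call arguments, so annotated objects become callable:

  ; a made-up 'point type; calling a point returns its coordinates
  (= (call* 'point)
     (fn (self) (rep self)))

  (= p (annotate 'point '(3 4)))
  (p)   ; => (3 4), because p's type has an entry in call*
I believe defcall is essentially sugar over filling in the call* entry.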

-----

25 points by almkglor 6423 days ago | link | parent | on: Will/Can Arc create a LISP-revival?

Let me tell you a story about a language called "Stutter". It's a full m-expr language. Function calls have the syntax f[x], math quite sensibly uses a + b * c notation, etc. The only weird thing is that assigning a variable uses the set special form - set[x 42] - because of some weird stuff in the parser that is only tangentially related to our discussion.

Now I'll introduce to you a special syntax in Stutter. In Stutter, () introduces a data constant called an array, which is just like any other sequential data collection in any other language. So it's possible to do something like this in Stutter:

  set[var  (1 2 3 4)]
Stutter is a dynamically-typed language, and arrays can contain strings, numbers, or even arrays:

  set[var (1 "hello" ("sub-array" 4 ))]
Let me introduce to you something new about Stutter. In Stutter, variable names are themselves data types. Let's call them labels (this is slightly related to why we have a set[] special form). And like any other data type, they can be kept in arrays:

  set[var (hello world)]
Remember that () introduces an array constant. So (hello world) will be an array of two labels, hello and world. It won't suddenly become (42 54) or anything else even if hello is 42 and world is 54. The variable's name is a label, but the label is not the variable (at least not in the array syntax; it was a bit of a bug in the original implementation, but some guys went and made code that used labels in that manner and it got stuck in the language spec).

The array is just like any other array in any other language. You can concatenate them like so:

  set[var append[(array 1) (array 2)]]
  => now var is (array 1 array 2)
You can add an element in front of it like so:

  set[var cons[1 (some array)] ]
  => now var is (1 some array)
Array access syntax is not very nice, but it does exist:

  nth[1 (this is the array)]
  => this returns the label "is"
You could create an empty array with:

  nil[]
And you could create an array with a single element with:

  array["hello"]
  => returns the array ("hello")

Oh, and remember those guys I told you about, the ones who abused labels in the array syntax? Well, they created a sort-of Stutter interpreter, in Stutter. However, they sucked at parsing, so instead of accepting files or strings or stuff like that, their Stutter interpreter accepted arrays. They were going to write the parser later, but they just really sucked at parsing.

They called their Stutter interpreter "lave", because they were hippie wannabes and were tripping at the time they were choosing the name. It was supposed to be "love", but like I said, they were tripping.

Of course, since lave accepted arrays, it couldn't get at the nice f[x] syntax. So they decided that the first element of an array would be the function name as a label. f[x] would become the array (f x).

lave had some limitations. For example, instead of Stutter's nice infix syntax a + b, lave needed (plus a b). Fortunately lave included a plus[x y] function which was simply:

  define plus[x y]
   x + y
So how come these guys became so influential? You see, Stutter's BDFL is a bit of a lazy guy. He is so lazy that he didn't even bother to fix up the syntax for if-then-else. In fact, there was no if-then-else. What was present instead was a ridiculously ugly cond syntax:

  cond[
    { x == y
         ...your then code...
    }
    ;yes, what can I say, Stutter's BDFL is lazy
    { !(x == y)
         ...your else code...
    }
  ]
lave's creators pointed out that you could in fact represent the above code, in lave-interpretable arrays, as:

  (cond
    ( (eq x y)
      ...your then code...)
    ( (not (eq x y))
      ...your else code...))
Then they created a new Stutter function which would accept 3 arrays, like so:

  if[
    (eq x y)
    (...your then code...)
    (...your else code...)]
And then it would return the cond array above.

It looked like this:

  define if[c then else]
    append[
        (cond)
        cons[c
          array[then]]
        cons[ append[(not) array[c] ]
          array[else]]]
You could then use an if-then-else syntax like this:

  lave[
    if[ (eq x y)
         (...your then code...)
         (...your else code...)
     ]
  ]
Then they thought, hmm, maybe we can integrate this into our lave function. So they wisely decided to create a new feature in lave, called "orcam". I think it was supposed to be "okra", but unfortunately I asked them about it while they were tripping, so maybe I just got confused.

Basically, you could tell lave that certain Stutter functions would be treated specially in their lave-syntax. These functions would have the "orcam" property set in some private data of lave. Instead of just running the function, lave would extract the array components, pass them to the function, and then run whatever array that function returned. So you could simply say:

  lave_set_orcam_property[(if)]
  lave[
   (if (eq x y)
      (...your then code...)
      (...your else code...)
   )
  ]
Because of this, people started creating all sorts of orcam-property-functions. For example, there was only a while loop in the language (lazy, lazy). Someone created an orcam-property-function called for:

  define for[s c u code]
    append[ (begin) //begin{} is just a compound statement
        cons[ s
            append[(while)
                cons[c
                    cons[ code array[u]]
                ]
            ]
        ]
    ]
So you could do:

  for[(set i 0) (less i 42) (preincrement i)
    (begin (print i))]
And it would look like:

  (begin
    (set i 0)
    (while (less i 42)
       (begin (print i))
       (preincrement i)
    )
  )
So if you wanted something like a C for loop you could do:

  lave_set_orcam_property[(for)]
  lave[
    (for (set i 0) (less i 42) (preincrement i)
        (begin
             (print i)
        )
    )
  ]
It was particularly difficult to create nice orcam-property-functions, but it was easier than trying to get Stutter's BDFL to move.

Soon after lave's creators added orcam-property-functions, Stutter's BDFL decided to do something about the language. He was always bothered about the bug in Stutter array syntax where something like (hello world) would return, well, the array (hello world), instead of sensibly returning an array with the values of hello and world. So he introduced the `, syntax. An array constant prefixed with "`" would have a special meaning. It would not be completely a constant. Instead, when it saw a "," Stutter would evaluate the expression following the comma and insert that element into the array being created. So `(,hello ,world) could now become (42 54), if hello was 42 and world was 54.

Some of the top orcam-property-function writers realized that Stutter's new `, syntax would really, really help. For example, instead of the kludgy, virtually unreadable if code, you could just write:

  define if[c then else]
   `(cond
        (,c ,then)
        ((not ,c) ,else)
    )
And you could also define for as:

  define for[s c u code]
     `(begin
          ,s
         (while ,c
              ,code
              ,u
          )
      )
However, you weren't limited to just the `, syntax. It was usually the best choice, but if there was a lave-expression array you wanted that couldn't exactly be given by "`,", you could still use the good old append[] and cons[] functions. In fact, for really complex lave-expression arrays, a combination of the new `, syntax and the old append[] and cons[] functions would do quite well.

Because of this, creating orcam-property-functions became easier and their power skyrocketed. Other languages which automatically evaluated variable labels in their arrays couldn't imitate it (so what was originally a bug - that (hello world) did not evaluate the variables hello and world - became a feature). Worse, those other languages' arrays sometimes couldn't themselves contain arrays, or even hold elements of different types.

Their arrays just weren't powerful enough to hold code, so other languages never managed to create a powerful orcam-property syntax.

Eventually, people were writing Stutter programs like so:

  lave[
   (define (fn x)
      (if (less x 1)
          1
          (times x (fn (minus x 1)))
      )
   )
  ]
And so, Stutter's BDFL decided to be even lazier and simply wrote Stutter as:

  while[true]{
    print[ lave[ read[] ] ]
  }
so that nobody had to keep writing "lave[]".

-----
