Arc Forum | almkglor's comments

Arc Forum

new | comments | leaders | submit | almkglor's comments

4 points by almkglor 6365 days ago | link | parent | on: Poll: Which library should we focus on first?

Interesting, you actually defined it mathematically ^^

It might also be useful to create a lib/mz-fast-math.arc which just imports 'sin, 'cos, 'tan from mzscheme using '$

-----

1 point by jmatt 6364 days ago | link

I think that's a good idea.

-----

1 point by almkglor 6365 days ago | link | parent | on: SNAP: Shared Nothing Arc Processes the VM discussi...

Err, no. Hehe. Although I believe some Lisps actually implemented their memory managers (GC!) in Lisp itself ^^, but I think it was back when Lisp was more imperative (i.e. when (prog ...) with labels was more popular).

Hmm. Interesting idea, although I can't quite imagine how to do this.

-----

1 point by shader 6354 days ago | link

"interesting" doesn't convey much information. Does it mean positive interesting, as in you'd be interested in trying it? Or does it mean negative interesting, as in your interested in why I would want such a crazy feature in the first place? :)

-----

1 point by almkglor 6354 days ago | link

> Does it mean positive interesting, as in you'd be interested in trying it?

This one. If it were the negative interesting, I'd ask you directly "what for?" ^^

Hmm. Although it might be difficult to do with a copying GC.

As an aside, I'm thinking of exposing the "sharedvar" objects into arc, basically they would be called the "container" type. An arc-on-mzscheme implementation would be:

  (require "lib/settable-fn.arc")
  (def container ((o val))
    (add-attachments
      '= (fn (v) (= val v))
      (annotate 'container
        (fn () val))))

Basically it would be used this way:

  (= foo (container 0))
  (foo)
  => 0
  (= (foo) 42)
  (foo)
  42

Basically assigning to it would be equivalent to setting the pointer, and (foo) would be equivalent to (* foo) in C.

I thought it might be useful in a C FFI (when I get there in, say, T=inf). If a C function accepts a int*, for example, where it would put an int, you would pass in a container, which would automagically get the int.

-----

1 point by shader 6354 days ago | link

Sounds like a reasonable syntax. Would the value in a container in a container be accessed with ((foo))? It fits, but it might get old after a while.

Also, how similar would these "containers" be to pointers? It doesn't seem like it would be as useful, because I doubt you can increment/decrement a container.

So, what are the problems we might encounter if we actually use this?

-----

1 point by almkglor 6354 days ago | link

The value pointed to by the container would be accessed with (foo), just a single layer of parens. The container's pointer would be mutated as (= (foo) ...) though. Oh well.

Edit: Oh, you said "container in a container". Yes, ((foo)) it is.

> It doesn't seem like it would be as useful, because I doubt you can increment/decrement a container

This functionality would be in a scanner, although a scanner, technically, would be just an iterator.

For instance, I fully intend to implement string scanners and input file scanners on the C++-side in SNAP; they would be somewhat "safe" wrappers around C++ iterators, which are just a generalization on the pointer. so (car foo) would be *foo and (cdr foo) would be (foo + 1).

> what are the problems we might encounter if we actually use this?

None at all, if you're talking about just containers and scanners. A container is just a reference to an Arc object. A scanner would just be an Arc-safe wrapper around an iterator.

-----

1 point by almkglor 6304 days ago | link

http://okmij.org/ftp/Scheme/pointer-as-closure.txt

-----

1 point by almkglor 6367 days ago | link | parent | on: Rethinking macros

You know, what I'm really thinking is that placing the annotations in the argslist looks horrible. The body itself is not bad, but the argslist is really, really horrible.

How about this instead:

  (mac while (cond . body)
    (w/hof ( () ,cond
             () ,@body)
      ((afn ()
         (when (cond)
           (body)
           (self))))))

Where w/hof expands to:

  (do
    ; inefficient since we set it everytime the macro is called, but...
    (set gs42 ; set so it won't say "redefined"
      (fn (cond body)
        ((afn ()
           (when (cond)
             (body)
             (self))))))
    `(gs42 (fn () ,cond) (fn () ,@body)))

The above strikes me as being easier to implement.

Of course unfortunately it won't work properly for variable capture.

Edit: Hmm, maybe it'll actually be able to support some variable capture:

  (mac awhile (v cond . body)
    (w/hof ( ()   ,cond
             (,v) ,@body)
      ((afn ()
         (let tmp (cond)
           (when tmp
             (body tmp)
             (self)))))))

HOF-style anaphora:

  (mac afn (params . body)
    (w/hof ( (self ,@params) ,@body )
      (let self nil
        (= self
           (fn rest
             (apply body self rest)))
        self)))

-----

1 point by rkts 6367 days ago | link

Well, we can cut down on the parentheses in the parameter list. This looks ok to me:

  (defho while ([test] . [f])
    ...)

  (defho awhen (x . [f it])
    ...)

As to w/hof, it may be easier to implement, but it seems kind of hard to use.

-----

1 point by almkglor 6367 days ago | link

> As to w/hof, it may be easier to implement, but it seems kind of hard to use.

Then perhaps you can define 'defho in terms of 'mac and 'w/hof, maybe? The Arc solution: throw more macros at it.

Regarding (x . [f it]) :

In canonical arc2, [f it] is (fn (_) (f it)), so (x . [f it]) becomes (x fn (_) (f it)) . In Anarki, [f it] is (make-br-fn (f it)), so (x . [f it]) becomes (x make-br-fn (f it))

This makes the use of [] problematical.

'w/hof is trivially implementable, 'defho much less so due to the syntax.

w/hof is also more general, at the expense of being more talkative.

In fact it might be better to do things this way:

  (mac while (cond . body)
    (w/hof (cond (fn () ,cond)
            body (fn () ,@body))
      ((afn ()
         (when (cond)
           (body)
           (self))))))

This allows a few shortcuts in anaphora:

  (mac aif (cond then (o else))
    (w/hof (cond ,cond
            then (fn (it) ,then)
            else (fn () ,else))
      (if cond
          (then cond)
          (else))))

-----

1 point by rkts 6366 days ago | link

> 'w/hof is trivially implementable, 'defho much less so due to the syntax.

Oh. I haven't bothered to learn how to hack the syntax in Arc, so I'll take your word on that.

-----

1 point by almkglor 6367 days ago | link | parent | on: Poll: Priorities

http://arclanguage.com/newpoll

Here's a list started by stefano btw:

http://github.com/nex3/arc/wikis/libraries-todo-list

-----

1 point by almkglor 6367 days ago | link | parent | on: Rethinking macros

> Now, what did I forget that makes this obviously a bad idea?

Interesting, so we can get something that runs at compile time, evals the args (which we presumably evaluate in the runtime environment) and returns a value.

Oh, you can't get the runtime environment at compile time yet? Hmm.

> If I understand it right, the [] syntax requires a function of a certain signature. Interesting idea.

No, it transforms it to a function with that signature:

  (defho foo ([hmm (it)])
    hmm)
  =>
  (mac foo (hmm)
    `(foo-internal (fn (it) ,hmm)))
  (def foo-internal (hmm)
    hmm)

  (macex '(foo it))
  =>
  (foo-internal (fn (it) it))

-----

1 point by shader 6367 days ago | link

lol, I realize that you could try and apply those three at the same time, but the "compile time" modifier was supposed to be used by the programmer to tell arc that it was safe to do just such a thing. Other wise, it would be evaluated at run time. So while you could use those three at once, it would like you say have an undefined consequence.

Basically, I thought that it might be useful to have functions that didn't eval all their args. At that point, they're only a few steps away from macros, and I thought it would be cool if we could unify them.

>transforms to a function Just shows I didn't understand it right. :) That makes more sense, but I'm still not sure I understand it. Oh well.

Does your p-m:def support matching based on function sigs?

-----

1 point by almkglor 6367 days ago | link

> Basically, I thought that it might be useful to have functions that didn't eval all their args.

  (def foo-internal (x y z)
    (do-something x y z))

  (mac foo (x y z)
    ; eval x, don't eval y, eval z
    `(foo ,x ',y ,z))

> Does your p-m:def support matching based on function sigs?

Yes, although I'm not sure I quite understand what you're asking.

For example factorial works out like this:

  (p-m:def factorial
    "a factorial function"
    (1)  1
    (N)  (* N (factorial:- N 1)))

-----

1 point by shader 6367 days ago | link

Actually, what I meant by match on function sigs was: while writing a higher order function that wants a function as an input, can you require that it be a function that's signature implies one var, two vars, a list etc?

As to the non-evaling functions, I suppose we could have a macro that defined a macro / function combination; the resulting macro would expand into a call to the function with ' before each item in the arg list.

Oh, and I think that `(foo ,x ... is supposed to be `(foo-internal ,x ... Otherwise you have an infinitely recursive macro. :)

-----

1 point by almkglor 6367 days ago | link

> can you require that it be a function that's signature implies one var, two vars, a list etc?

Nope ^^

> Oh, and I think that `(foo ,x ... is supposed to be `(foo-internal ,x ... Otherwise you have an infinitely recursive macro. :)

Yep ^^

-----

1 point by almkglor 6367 days ago | link | parent | on: Rethinking macros

Well, if the final syntax doesn't actually change (i.e. higher-order function while still looks like macro while) then what difference does it make just how, exactly, the base system works?

Other than to, say, make it difficult for hackers who've gotten used to `(, ,@) syntax?

Personally, I don't quite get what the [] and *v things are supposed to mean. Can you explain them?

As an aside, it might be better to implement it after all - I honestly don't see much difference for the end-user (you're still using macros anyway), but I do see some difficulty for the macro writer.

-----

1 point by rkts 6367 days ago | link

The point is that it makes a large class of macros easier to define. Really, compare my examples with the definitions in arc.arc; the former are much shorter and easier to understand than the latter.

My examples are from the core language, obviously, but the suggestion would benefit anyone wanting to extend the language with their own macros.

Placing a parameter inside brackets just says that the argument should be expanded into an fn. The thing after the parameter name is the parameter list that the fn should have. As to the asterisk, I don't know how to explain that better than I already have.

-----

1 point by almkglor 6367 days ago | link

Well, it would be a good addition to Anarki if you can actually implement it, and can prove your point that it's easier to use than using macros - obviously to do that, you can't just rewrite existing code, you have to write new code that uses your version of macros and see if it does indeed work better.

Further, using arc.arc as a basis is pointless; the reason those macros are there is because it's a waste of time to rewrite them. Most of the macros I've written in Arc depend on new syntax, not just rearranging expressions: look at p-m: and w/html , which I doubt are possible in ordinary HOF style.

So yes: while it certainly looks interesting, it doesn't seem to leverage the "common enough" theme quite enough.

Also, I somehow feel that what you really want are hygienic macros, which would look much more similarly to what you're doing.

-----

1 point by rkts 6367 days ago | link

Yes, hygienic macros could work too, provided we devise a good way to hack anaphora onto them. Another possible solution would be to just have a very concise syntax for function literals. Either way, I think that the rampant use of (unhygienic) macros where they are not necessary is a problem and needs to be addressed somehow. Especially since Arc is a Lisp-1.

-----

2 points by almkglor 6367 days ago | link

> hack anaphora onto them

Why not just leave unhygienic macros for the anaphora?

> Another possible solution would be to just have a very concise syntax for function literals.

Bingo. cref the discussion on currying some months back.

-----

1 point by rkts 6367 days ago | link

Because unhygienic macros are a pain in the ass. Surely I'm not the only one who thinks this?

-----

1 point by applepie 6367 days ago | link

Maybe you'd like them more if you called them "macros" instead of "unhygienic macros" ;)

No, really, macros, being essentially compilers, give you enough power to build everything you'd ever want into Lisp, including "hygienic" macros, and even to build facilities to write hof-like things in less painful ways.

Maybe they're a pain in the ass if you don't "go meta", in the same way computers are a pain in the ass if you don't build OSes and compilers first.

-----

1 point by stefano 6367 days ago | link

The real point in favor of unhygienic macros is that they are less constraining and, personally, I find them easier to write and read than hygienic macros. I don't find a bad idea to have both hygienic macros and unhygienic macros.

-----

1 point by rkts 6366 days ago | link

Sigh...

Obviously I like unhygienic macros when they are necessary. The problem is, I keep writing higher-order functions and getting tired of typing the "fn" over and over, and then I have to convert the function to a macro, making it twice as long and hard to read. Hygienic macros help, but they still are not as easy to write as plain functions.

I know I'm not the only person who has this problem, but maybe I'm the only one who realizes I have this problem. I see people on the Internet raving about the AMAZING POWER of macros, and most of their examples are just higher-order functions with some small cosmetic changes. Most of the macros in Arc are of the same kind.

I'm not denying that macros are powerful. I just think there is a gross inefficiency in using them where you shouldn't have to, just because of minor syntactic concerns.

I wanted to solve this with a short, clean syntax for function literals, but I haven't been able to come up with one and neither, apparently, has anyone else. So instead I decided to try something that would generate macros out of HOFs.

I thought this would be evident from my post, but apparently it wasn't.

-----

1 point by shader 6366 days ago | link

Maybe we could use { args | body } ? I don't think the braces are taken.

Now, maybe that's a bad idea; instead, we could redefine the brackets, so that a pipe divides args and body, and if it doesn't have a pipe, it defaults to binding _ ? I don't know how hard that would be, or some variation on the concept, but it would be a bit shorter than (fn (args) (body)), if you don't want to type that all the time.

And how exactly does 'w/hof work, as defined so far? And if it's "standard" now, why not just implement it?

-----

3 points by almkglor 6366 days ago | link

Since Anarki defines [ ... ] as (make-br-fn ...), it's actually possible to have the syntax:

  [params || body]

Which would be:

  (make-br-fn (params || body))

(The double bar is needed because a single | is reserved for weird symbols)

By simply redefining make-br-fn, you can redefine the syntax of [ ... ] to an extent

-----

1 point by shader 6366 days ago | link

I wondered if redefining [...] might be possible. It seems to me that the new double bar syntax is practically a super-set of the old one: if it doesn't have the double bar, just treat it like the old bracket function and define _ to be the argument; otherwise use the argument list provided. Including an empty list, I would hope.

Any word on how hard that would be?

-----

1 point by almkglor 6366 days ago | link

  (let old (rep make-br-fn)
    (= make-br-fn
       (annotate 'mac
         (fn (l)
           (if (some '|| l)
               (do
                 your-stuff)
               (old l))))))

Be careful of supersets: someone's code might unexpectedly break ^^

-----

1 point by shader 6366 days ago | link

Isn't that to be expected in an evolving open source language :)

Do you think || is the best choice, or something else?

-----

2 points by rkts 6366 days ago | link

I think it's a bad choice, personally. I'm not crazy about the single pipe either, but || is awful.

Tangent: this may be a dumb question, but do we really need the pipe character for symbols? I know I've never used it. Why not disallow spaces (and the like) in symbols, and free the pipe for new syntax?

-----

2 points by shader 6366 days ago | link

If you don't like the pipe, then recommend something :)

Other possibilities, in no particular order:

  [ # ]
  [ - ]
  [ = ]
  [ -> ]
  [ : ]
  [ => ]
  [ > ]
  [ ~ ]
  [ % ]
  [ ! ]
  [ $ ]
  [ ^ ]
  [ & ]
  [ * ]
  [ @ ]
  [ + ]
  [ | ]
  [ || ]
  [ ? ]

Most of those are either bad looking or already taken. Anything stand out as a good / ok / not bad choice?

-----

2 points by almkglor 6366 days ago | link

:, =>, and -> don't look bad.

# can't be redefined

Let's try some mockups:

  [ a b c :
    (/ (+ (- b) (sqrt (+ (* b b) (* 4 a c))))
       (* 2 a))]

  [: (thunk this)]

  [ a b c ->
    (/ (+ (- b) (sqrt (+ (* b b) (* 4 a c))))
       (* 2 a))]

  [-> (thunk this)]

  [ a b c =>
    (/ (+ (- b) (sqrt (+ (* b b) (* 4 a c))))
       (* 2 a))]

  [=> (thunk this)]

-----

1 point by shader 6351 days ago | link

So, did we ever make a decision about this? Does someone who knows more than I do about this want to implement it?

Also, is there a way to compose or nest these lambda shortcuts? Or would that make this almost impossible to implement?

-----

1 point by almkglor 6351 days ago | link

Nesting doesn't seem impossible: the reader, I think, will handle nesting as:

  [foo [bar]]

  (make-br-fn (foo (make-br-fn (bar))))

As for implementation, it's easy:

  (given ; this gives us access to the old
         ; implementation of [] syntax; it
         ; is used when we don't find the
         ; separator
         old (rep make-br-fn)
         ; use a variable to easily change
         ; the separator
         separator ': ;for experimentation
    (= make-br-fn
       ; a macro is just a function that has
       ; been tagged (or annotated) with the
       ; symbol 'mac
       (annotate 'mac
         ; the reader reads [...] as
         ; (make-br-fn (...))
         (fn (rest)
               ; find the separator
           (if (some separator rest)
               ; note the use of the s-variant givens
               ; the "s" at the end of the name of givens
               ; means that the variables are specifically
               ; bound in order, and that succeeding variables
               ; may refer to earlier ones
               (givens ; scans through the list, returning
                       ; an index for use with split
                       ; (no built-in function does this)
                       scan
                       (fn (l)
                         ((afn (l i)
                            (if (caris l separator)
                                i
                                (self (cdr l) (+ i 1))))
                          l 0))
                       ; now do the scan
                       i (scan rest)
                       ; this part destructures a two-element
                       ; list
                       (params body)
                         ; used to get around a bug in split
                         (if (isnt i 0)
                             (split rest i)
                             (list nil rest))
                 ; it just becomes an ordinary function
                              ; body includes the separator,
                              ; so we also cut it out
                 `(fn ,params ,@(cut body 1)))
               ; pass it to the old version of make-br-fn
               ; if a separator was not found
               (old rest))))))

Edit: tested. Also reveals a bug in split: (split valid_list 0) == (split valid_list 1)

  (= foo [ i :
           [ : i]])

  ((foo 42))

edit2: p.s. probably not really easy much after all^^. As a suggestion, (help "stuff") is good at finding stuff.

edit3: added comments

-----

1 point by shader 6351 days ago | link

Hmm. It doesn't seem to work with the older version. If I try ([+ _ 10] 3) it complains: "reference to undefined identifier: ___"

It used to complain "#<procedure>: expects 1 argument, given 3: + _ 10", but something seems to have changed between updates :)

-----

1 point by almkglor 6351 days ago | link

Have you tried restarting Arc and then repasting the code?

Probably some dirt left from older versions ^^

-----

1 point by shader 6365 days ago | link

I agree, those aren't bad.

I think that out of those, : makes the most sense. They all make logical sense with arg lists, but : looks the best without any.

-----

1 point by almkglor 6366 days ago | link

> Tangent: this may be a dumb question, but do we really need the pipe character for symbols? I know I've never used it. Why not disallow spaces (and the like) in symbols, and free the pipe for new syntax?

Wanna start implementing a reader for Arc?

-----

1 point by rkts 6365 days ago | link

Looks like this is configurable in MzScheme. Do

  (read-accept-bar-quote #f)

-----

1 point by rkts 6366 days ago | link

How would you write a function with no arguments?

-----

1 point by shader 6366 days ago | link

leave the part before the pipe empty, I suppose

  {| body}

it might need a space between the brace and the pipe, but I don't know

-----

1 point by shader 6367 days ago | link

Pardon my ignorance, but what is anaphora as it relates to macros?

-----

2 points by almkglor 6367 days ago | link

It's a macro which automagically binds a name. For example, 'afn:

  (afn ()
    (your-code))
  =>
  (let self nil
    (= self
      (fn ()
        (your-code))))

Or aif:

  (aif x
    (your-code))
  =>
  (let it x
    (if it
      (your-code)))

-----

2 points by almkglor 6368 days ago | link | parent | on: Poll: Priorities

> I have an itch for a personal-finance-management system

Interesting. Want me to help you, uh, scratch that? ^^

My mom's an accountant who absolutely hates accounting, which is the main reason I've been fascinated by it^^

-----

2 points by almkglor 6368 days ago | link | parent | on: SNAP: Shared Nothing Arc Processes the VM discussi...

Nicely enough, although I haven't managed to push some of my more recent changes on the github (for some inexplicable reason our internet connection is very slow...).

There are quite a few comments in the code but no real top-level description of the code yet. Here's a first attempt:

The basic class structure for memory management is:

  Generic (abstract/non-instantiatable base class)
   +- .... all Arc objects

  Heap (abstract/non-instantiatable base class)
  +-Globals (concrete)
  +-Process (concrete)

  Semispace

  Heap contains a pointer to Semispace, "s"
  Heap contains a vector of Semispace pointers, "other_spaces"
  Heap has the following dynamic functions:
    get_root_set()
      - gets the root set of this heap, i.e. if it's a
        Process, the process's stack and cache of globals

  A Semispace is defined as a memory area containing 0 or
  more Generic objects.  Each Generic may have 0 or more
  references to other Generic objects.

  A Semispace may be held by only one Heap.

  Each Generic may refer only to Generic objects in the
  same Semispace, or to another Semispace held by the
  Heap, i.e. can only refer to Generic objects in the same
  Heap.

  When a Heap wants to transfer an object to another Heap,
  it must copy all those objects into a new Semispace
  (one which it, conceptually, does not "hold" except
  during initialization), then adds it to the destination
  Heap's "other_spaces" vector (obviously locking is
  required).  While holding the lock it must also add
  the copied memory to the mailbox or wherever in the
  destination's root set it can be held.

  During a GC, the Heap holds its lock, then computes
  a "good" size for a new Semispace, creates it, then
  copies the root set to the new Semispace.  Then it
  releases all its Semispace objects (in "s" and
  "other_spaces")

So, err. that's memory management so far. ^^

-----

1 point by almkglor 6367 days ago | link

Here's a further bit, about ToPointerLock:

  ToPointerLock is a RAII class which protects the
  to-pointers of objects.

  The to-pointers are used for tracing and copying.

  Each Generic includes a to-pointer.  This is private to
  the Generic class, and is not accessible except via
  a ToPointerLock.

  During copying, the copy routine gets a reference
  from the to-copy set.  Then it queries the ToPointerLock's
  to() function on the reference.  If it returns a
  non-NULL pointer, the reference is changed to that
  value.  Otherwise, the copy routine clones the object
  to the destination Semispace, then sets the to-pointer
  via the ToPointerLock's pointto() function, and then
  changes the reference.

  ToPointerLock keeps track of modified to-pointers.  If
  something throws and the ToPointerLock goes out of scope,
  it will reset the to-pointers back to NULL (so that
  the Heap will remain valid).

  The ToPointerLock can be told to forget about resetting
  the to-pointers back to NULL, for example after a
  successful GC, where the memory areas will be reclaimed
  anyway and the to-pointer values don't matter.  This is
  done by calling the good() function.

-----

2 points by almkglor 6359 days ago | link

Here's a widdle bit more, about Executor's, BytecodeSequence's, and Bytecode's:

  An Arc function is represented by the Generic-derived
  Closure class.  The Closure class is composed of a
  function variable and a vector of Generic pointers.

  The Closure's function variable is a pointer to an
  Executor.  An Executor is usually just a wrapper around
  an _executor_label variable, but a derived class called
  ArcExecutor includes a BytecodeSequence as well.

  The implementation of the _executor_label type will
  depend on whether you are using GCC or not.  If you're
  using GCC, then _executor_label is a void* and will be
  a pointer to a lebel in the source code.  Otherwise it
  will be an enumeration, which will be used in a switch
  statement for standard C.

  (GCC supports an extension to C / C++ where the pointer
  to a source code label can be taken using &&label.  In
  standard C / C++ this is not allowed.  This is used to
  speed up interpretation so we don't have to go through
  a switch statement, insted we just use indirect jumps)

  A BytecodeSequence is just a sequence of Bytecode's (the
  exact format is not yet well defined - might be a linked
  list for simplicity, or might do some hackery to put
  them into a vector).  Most Bytecode's are (like
  Executor's) just a wrapper around a _bytecode_label type,
  which (like _executor_label) will be either a void* or an
  enum depending on your C++ compiler.

  However there are two additional variants of Bytecode.
  One is the Bytecode with BytecodeSequence variant, called
  a SeqBytecode.  A SeqBytecode is used to represent an
  inner function prior to creation from a closure:

    arc2c AST:
    (%closure (fn (self k) (k (%closure-ref self 0)))
              x)
    SNAP symbolcode:
    ( (fn (check-vars 2)
          (local 1 #|k|#)
          (local 0#|self|#) (number 0) closure-ref
          (apply 2))
      (local 2 #|x|#)
      closure
    )

  It's also used to represent an if:

    arc2c AST:
    (if x
        (k 0)
        (k 1))
    SNAP symbolcode:
    ( (local 2 #|x|#)
      ; the if contains only the then branch
      ; when flow of control enters the if, it won't
      ; leave the branch!
      (if
         (local 1 #|k|#)
         (number 0)
         (apply 2)
      );else
      (local 1 #|k|#)
      (number 1)
      (apply 2)
      )

  Another kind of Bytecode is the IntBytecode, which are
  used to represent symbolcodes with parameters, such as
  local and apply above (in fact a majority of Bytecode's
  will be IntBytecode's)

  I might also need bytecodes for strings, although I
  might just insert a step in the arc2c compilation
  code to transform them (as well as quoted lists) into
  the following code:

    (def foo ()
      (pr "foo"))
    =>
    (let quoted (string (cons #\f (cons #\o (cons #\o nil))))
      (def foo ()
        (pr quoted)))

  Character literals would probably also be transformed
  to IntBytecode's where the int is the character's
  unicode.

  Then there are the symbol literals ^^, which I haven't
  thought of yet.

-----

1 point by almkglor 6357 days ago | link

Okay, I've been (re)thinking about bytecodes a bit.

The current arc2c code generator essentially outputs a stack-based "bytecode", with the bytecodes being directly output as C macros.

However I've been reading some good things about register-based bytecodes recently. In essence register-based bytecodes have the advantage of describing more operations per bytecode, which reduces the number of dispatches in a function - i.e. we have fewer but longer instructions, unlike a stack-based bytecode system which has simpler but more instructions.

Now from my analysis of arc2c, in the final AST output (just prior to code generation) each function's body is composed of just one thing. It can either be a function call, or an if.

An if statement would then have either a variable reference or a primitive call in its condition, and its then and else branches are either function calls or if statements of the same type.

The component sub-expressions of a function call would then be either variable references or a primitive call (function literals at this time have been converted to primitive calls to %closure, and any nested function calls will have been transformed away by CPS conversion).

So it seems that pretty much everything can be transformed to variable references.

Hmm...

Edit: Waa!! I'm being forced to study MFC against my will!! LOL

-----

1 point by almkglor 6354 days ago | link

If anyone at all's still interested, I've pushed some notes on the bytecode system. I've also started to adapt bits of the arc2c compiler (a few changes were necessary in the final cps and closure-conversion changes) and creating the bytecode generator (unpushed though).

There's an untested symbolcode-to-bytecode converter in the executors source of SNAP too.

The VM is a somewhat register, mostly stack machine now. A few bytecodes were modified/added to allow direct access from a local variable instead of just modifying the stack top. So a sequence {(local N) bytecode} becomes {(bytecode-local-push N)}, removing one bytecode dispatch. It still pushes on the stack top though, but then given that there are relatively few "addressing modes" so to speak, it's not bad.

I suspect it will make JIT, if ever, a bit worse though.

-----

2 points by almkglor 6349 days ago | link

Of course nobody actually cares anyway, but SNAP's bytecode engine now seems to be working.

The tests/executors_test.cpp contains the test for the bytecode/executor engine. It's very basic - it just converts a list-encoded symbolcode/bytecode to a function, then executes that function.

The symbolcode calls a builtin function, '$ (which I'm reserving for implementation-specific Arc stuff). Since this is CPS, the builtin function returns by calling back to a bytecoded/interpreted function.

Assuming you have boost in a standard include, and g++, you can try it out by downloading/gitting from http://github.com/AmkG/snap/tree/master , then go to the tests/ directory and:

  ./compile executors_test.cpp
  ./a.out

IIRC the only thing I've used so far in boost are the shared_ptr and scoped_ptr, which are probably the most portable parts of boost (and don't require compiling a library).

It's tested on 8.04 Ubuntu with the versions of g++ and boost included in it (don't remember exact versions, and don't currently have access to my machine).

This will probably end up being how SNAP initializes:

  1. Construct a bytecode list for compilation by the
     executor system.  This bytecode list will be derived
     from a version of arc2c compiling itself with the
     assumption that it's safe to use macros in the same
     environment.
  2. Convert the bytecode list to a function.  Since this
     requires the executor system, which assumes that it
     normally executes interpreted code and only
     occassionally built-in CPS functions, this conversion
     will need to use some temporary global variables to
     access bits of the executor system that are normally
     accessible only to interpreted code.
  3. Execute the function.  This will end up being similar
     to how interpreted code is normally executed, except
     that we don't have to go through work queues etc.

And of course... a few more tests would be necessary for the executor system, then we have to consider how to handle a worker thread pool to execute processes, and also to disable the use of a worker thread pool if we want to use a single OS thread.

-----

1 point by almkglor 6337 days ago | link

Update...

Currently I'm waffling on I/O. Asynchronous I/O is hard, let's go redditing...

Originally the plan was that there would be a separate builtin type for I/O ports, which would be just shared_ptr wrappers around AsyncPort objects.

Of course there's the rather minor problem though when several separate Arc processes try to access the port.

And there's also the minor problem where in Linux some (most?) filesystems assume they're so fast that they will always return "ready" for select() or poll(). Haha.

Currently I'm trying to implement ports as separate C++ processes with their own message queues.

And just suddenly now, it dawned unto me that this time there's the minor problem where I/O is performed on reads. A read might consume a variable number of characters; the problem is, which variable number of characters? How do I determine, before I even look at anything, how many characters to consume? Since I only get one queue entry, and another process might get the next entry? Take for example the 'read function: it knows it will start a list when it sees #\(, but it can't know how many characters, total, the list will be. We can't "release" the port until we actually complete the 'read function, because if we do, another process might read from the port and start reading random characters, meaning we (this process and the other proess) both end up not reading anything useful.

I'm not making sense I think.

So in the end I may very well have AsyncPort objects in the Arc-side, and when reading pass in an entire function which will consume characters, and then when the function returns a value (or throws an error), then we "know" that the port is released and another process can grab it by passing in a new function.

-----

1 point by stefano 6337 days ago | link

For the read problem, the shared I/O port should have a buffer containing the characters read, and an index on this buffer for every process that read data from it. This way every process can act as if it were the only one to read data because it will read starting from its own index. This doesn't work for writing data, of course, so output port shouldn't be shared.

-----

1 point by almkglor 6336 days ago | link

Suppose process A reads up to the second-to-the-last character. Then suppose it passes the port to process B. Process B does a read. What does it read? First character or last character? What would the programmer reasonably expect?

-----

2 points by stefano 6335 days ago | link

Both the options seem reasonable to me. If B wants to read the first character probably he would open a new port, so when B receives a port from A he should read from the last character. What if now A tries to read from the port? To keep the behavior of the virtual machine consistent (always copy) A should read the last character, and not EOF. If this didn't happen, B would have successfully changed the state of something belonging to A. In conclusion, every process should have their own index in the buffer, and when a port is passed from A to B, B gets a new index initialized with the value of A's index.

-----

1 point by almkglor 6335 days ago | link

Huh. Didn't think of that.

Anyway, the internet connection at home is down, I've got girl problems again, and I am in the mood to sleep before I start shooting everybody (especially since I suck at first player shooters, so playing those games just pisses me off). Don't anybody go around expecting anything decent for SNAP this weekend, not that there was anything decent there anyway.

-----

2 points by almkglor 6333 days ago | link

http://www.monkey.org/~provos/libevent/ Hmm. Possibly useful, we might be able to use its event_loop(). Possibly we can implement a central I/O process written in C++ which simply waits for I/O, then notifies the calling process that actually performs I/O.

As an aside, the problem with having separate buffer pointers for each process in each input port is that it stops being orthogonal to output ports, where it's simply impossible to have separate buffer pointers.

So I think I'm putting it back to "I/O ports are really processes", where the I/O port process is an Arc process that acts as a serializing mutex around the I/O port.

-----

1 point by almkglor 6330 days ago | link

Oh man, asynch I/O is hard...

As an aside, the newer (but presumably less well-developed) libev http://software.schmorp.de/pkg/libev.html supports nice timeouts and child process monitoring, which would really help in implementing 'sleep and 'system.

Further, I'm also thinking of ways of reusing continuation closures.

In arc2c many continuation closures can be eliminated because the compiler can inline global functions with impunity (simply by detecting if a global variable is assigned with a function exactly in one place, at the top-level). These remove the need to construct continuations for many cases, such as calling the basic 'car and 'cdr functions (they are inlined to the 'car and 'cdr primitives), which in most cases are defined only once, in the library.

However in SNAP we want to support full dynamism, so we can't do inlining for many cases and we must actually perform a CPS-style call, which requires constructing a continuation closure.

A continuation closure, once invoked, can usually be discarded immediately; in fact, the continuation closure's data can even be completely overwritten.

The only exception here is with 'ccc. So, what I'm thinking is, we add a new type of closure, a k-closure, which is just a subtype of standard closures. It contains an additional bool, specifying if it can be safely reused (the default).

When 'ccc is invoked, it clears that bool. In addition, it must also search through the entire existing "stack" of k-closures, clearing all their reusable flags.

A continuation function can then simply reuse its own k-closure while constructing a new continuation, unless the reusable flag is cleared.

-----

1 point by almkglor 6327 days ago | link

Okay, I've implemented the continuations idea above. I also tested it a little, and tested a beta version of the arc to bytecode compiler (that's now on the git). Testing revealed a hidden bug in the executors framework ^^ specifically the first bytecode in each function did not get executed. Since the most usual first bytecode is a check of the number of parameters, I didn't see the error ^^

I'm thinking of also implementing the continuations idea above in arc2c.

Edit: Oh, and since we're on the subject of continuations: I don't know, but the lack of a full 'dynamic-wind support in Arc seems rather, err, well, puzzling. What's supported is about half of 'dynamic-wind, i.e. the half that handles exiting the 'dynamic-wind; it's exposed as the ac.scm 'after. This means that if someone creates a generator library and does something like:

  (defgen foo (f)
    (w/infile p f
      (yield (read p))))

... it won't actually work, because once 'yield calls outside of the context of 'w/infile (via a continuation), the file is closed and can't be reopened (because 'w/infile doesn't use an "before" handler, just an "after" handler).

What I'm wondering about is: is this actually OK? If I implement just "after" handlers on continuations, is that "good enough" for Arc?

-----

1 point by stefano 6329 days ago | link

How do you safely inline functions?

Suppose you have a file with these definitions:

  (def f () 8)
  (def g () (f))

and you compile it. arc2c would then inline f when called by g. Suppose now that I load the compiled file from the repl and then type:

  (def f () 5)

now calling g from the repl should return 5, but f were inlined before, so it will return 8. Does arc2c handle this kind of problems? If it does, how is this implemented? I'm asking because I wanted to do a similar optimization in nyac, but this problem blocked me.

-----

1 point by almkglor 6329 days ago | link

Simple: arc2c doesn't have a REPL ^^

Basically, it expects all definitions to be in the file. It only inlines definitions that are:

1. Done at the top-level, including loaded and required files

2. Defined only once (i.e. assigned to only once)

Basically this is largely an optimization of the arc.arc and ac.scm libraries.

This optimization is useable only if 'eval is not ever used. If 'eval is ever used, this optimization will have to be disabled.

-----

2 points by stefano 6329 days ago | link

I wonder if there is a way to make that optimization work also when eval is used, maybe by tracking a dependecies list for every function, because it is really helpful. To make an example: the '- function takes a variable number of args, so all the arguments passed to it must be consed. With inlining it would be possible to avoid consing the arguments when they are known to be 1 or 2. To give you an idea of how much consing the args costs, take the fibonacci example: to calculate the 32nd number on NYAC when consing '- args it takes (on my computer) ~3.4 secs, without consing '- args it takes ~0.6 secs. This is a huge difference.

-----

1 point by almkglor 6326 days ago | link

> With inlining it would be possible to avoid consing the arguments when they are known to be 1 or 2.

Hmm.

Since I control the VM anyway, it may be possible for me to add some sort of feature/bytecode/built-in-function that can perform reduction of arguments without consing (i.e. just by allocating a continuation function and reusing it). It would be possible to also have it "short-circuit" so to speak, prevent it from allocating a new closure and just pass its own called continuation directly to the child function.

Basically it would be like this:

  (with (f0 (fn () (handle-0-arguments))
         f1 (fn (a) (handle-1-argument a))
         f2 (fn (a b) (handle-2-arguments a b)))
    (fn rest ; not actually expanded into a list
      ; this will be implemented in C
      (if
        (no rest)
          (f0) ; pass the continuation
        (no (cdr rest))
          (f1 (car rest))
        (no (cdr:cdr rest))
          (f2 (car rest) (cadr rest))
        ; perform reduction
          (ccc
            ; enclose these variables
            (with (a (f2 (car rest) (cadr rest))
                   rest (cdr:cdr rest))
              ; also enclose the continuation
              (afn (k)
                (zap a (f2 (car rest) (cadr rest)))
                    ; rest will be an index into the
                    ; closure variables
                (if (zap cdr rest)
                    (self k)
                    (k a)))))))))

Oh and here's a blog of SNAP: http://snapvm.blogspot.com/

-----

2 points by stefano 6326 days ago | link

So handle-n-arguments is a special form? Exposing such a special form would break compatibility with both arcN.tar and Anarki, but this seems a good solution.

> Oh and here's a blog of SNAP: http://snapvm.blogspot.com/

Nice! This way informations about SNAP will be no more spreaded in the forum. I suggest copying everything already present in the forum about SNAP into the blog.

-----

1 point by almkglor 6325 days ago | link

It won't be exposed as a special form - rather it will be exposed as a bytecode, and the symbolcode-to-bytecode compiler will be exposed as a function that the implementation specific '$ will return. This should reduce the conceptual footprint of extensions to Arc that I add strictly for efficiency, versus actual axioms.

Basically we just grab $ as a sort of dispatcher for implementation-specific stuff (much like it is used to access mzscheme in Anarki), and put anything that is more-or-less implementation-specific there.

-----

1 point by almkglor 6327 days ago | link

True; I'm thinking of something like using a dependencies list like you suggested, and doing dynamic recompilation (inlining the code) when a function call is done a particular number of times. I then keep a reference to the original code, in case some member of the dependencies list is modified.

The problem here however lies in the severely parallel nature of SNAP. If I'm recompiling code, then I either make a (process-local!) copy, or I force everything to use (comparatively slow!) locks just to read code. The better choice is the process-local copy but it has the drawback that optimizations that run on one process won't get passed to another T.T unless SNAP is run in single-worker-thread mode.

-----

2 points by shader 6348 days ago | link

I do too care. It's just that I'm not intelligent/experienced enough to understand what you're doing in a way that would allow me to help. I have very little experience with pl design, and even less with compiler/vm implementation (read, none). However, I'm very interested in how it turns out in case I could learn something and especially because I want to use it when you're finished.

So please, keep working on it, and updating us about the changes. And it's not like there's anything else going on around here for you to distract us from anyway.

-----

1 point by almkglor 6348 days ago | link

Well, this is my first time hacking together a VM anyway ^^. And arc2c is the first time I've hacked on a compiler, and I didn't even start arc2c.

Here are some interesting bits and pieces about VM's that I got via google trawl:

http://www.sidhe.org/~dan/blog/ - Squawks of the Parrot, a blog made by one of the first developers of Parrot VM. SNAP does things very differently from Parrot, partly because of its shared-nothing constraints, partly because this is a lisplike, and mostly because the Parrot VM is just something ^^.

http://blog.mozilla.com/dmandelin/2008/06/03/squirrelfish/ - about the squirrelfish VM. This is the main reason why I decided to use bytecode instead of AST traversal, and why I made a bit of an effort to reduce dependence on the stack (by introducing the -local-push and -clos-push variants of some bytecodes). It also has a bit about direct threading, which is portably implementable only on GNU compilers, and is the fastest portable implementation of bytecode interpreters (faster methods require assembly coding, or at least (in the case of dynamic inlining) some knowledge of what type of code can be copied without modification on what type of processor).

http://research.sun.com/self/papers/papers.html - a bunch of papers about Self, a smalltalk-like language and its implementation. I hear tell it's the most powerful VM on the planet, and its results are what is fueling Sun's recent JVM optimizations. The page itself does not contain the papers, but you can search for their titles and have a good chance of finding them online (or at least, when citeseer is up ^^)

And some more discussion:

Dynamic inlining is a sort of "poor man's JIT". Instead of generating a bytecode sequence as a vector of pointers-to-labels (as in direct threading), we directly copy the bytecode's implementation into the stream.

JIT is still the fastest way to implement an interpreter.

-----

1 point by almkglor 6348 days ago | link

Added a few more tests.

Also, the g++ version I've got is 4.2.3, and libboost is 1.34.1 . Ubuntu here is the 64-bit version, 8.04

-----

5 points by almkglor 6368 days ago | link | parent | on: Poll: Priorities

There's kens' excellent http://arcfn.com , as well as the Anarki-only (help ...) : use (help function-name) for help on a specific function and (help "regular expression") to search function names and documentation.

re: native compiler: possibly a gcc front-end might work?

-----

5 points by almkglor 6371 days ago | link | parent | on: PLT Scheme 400 officially out -- anybody tried Arc...

Arc instead of Scheme:

1. It's very similar to Cadence Skill, which is the Lisplike we use in the office: CL-style list-based macros, t and nil, lexical variables (at least Skill++)

2. PG fanboy

3. highest listed karma in "leaders"

4. ssyntax

-----