Arc Forum | almkglor's comments

There is indeed a plan to make arc2c self-compilable, and to implement a REPL on top of an arc2c that's also capable of interpreting code.

-----

4 points by shader 6371 days ago | link

In that case, what needs to be done to finish it, or at least make it "self-compilable"? And then what do we need to do to skip C? Or is that even reasonable?

-----

3 points by almkglor 6371 days ago | link

> In that case, what needs to be done to finish it, or at least make it "self-compilable"?

Macros, a reader (can be implemented in Arc using lib/treeparse.arc), symeval, I/O

> And then what do we need to do to skip C? Or is that even reasonable?

SNAP

-----

3 points by almkglor 6370 days ago | link

Regarding I/O: it's difficult to do I/O in the presence of green threads. Obviously when one green thread is blocked by I/O the other threads must continue running, meaning we need some sort of async I/O. Most of the async I/O libraries I've seen so far seem to concentrate more on sockets than I/O in general.

-----

2 points by shader 6370 days ago | link

What are your ideas for handling async I/O? I know there are system-specific ways, but I don't think that's a good idea.

We could have separate OS threads for the I/O, one for each major I/O subsystem: HDDs, network, keyboard, etc. We could write special I/O methods for the green threads that request the I/O from the scheduler. The scheduler adds the request to the appropriate I/O thread's queue. When the operation completes, it either unblocks the green thread, invokes the provided callback, or passes a SNAP-style message.

I don't know if that would work very well. It doesn't seem very scalable, and at the very least it's complicated. It would be best, of course, if we could find a cross-platform async I/O lib. But it still might be a good idea to make the scheduler responsible for I/O, whatever we use. I'm sure you know more about the whole process than I do.

Do you know how any other systems do it? Does Erlang have async I/O? How does the JVM get its async I/O? The Wikipedia page claims that it is possible to synthesize async I/O from polling and interrupts (supposedly the JVM does that), but I don't know how standard and portable across systems those are. http://en.wikipedia.org/wiki/Non-blocking_I/O
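
A minimal C++ sketch of the queue-per-subsystem scheme described above (illustrative only: std::thread stands in for whatever threading lib would actually be used, and names like IoRequest and IoWorker are made up):

  #include <functional>
  #include <queue>
  #include <mutex>
  #include <condition_variable>
  #include <thread>

  struct IoRequest {
    std::function<void()> perform;  // the blocking operation itself
    std::function<void()> on_done;  // unblock the green thread / callback / message
  };

  // One of these per major I/O subsystem (disk, network, ...)
  class IoWorker {
    std::queue<IoRequest> q;
    std::mutex m;
    std::condition_variable cv;
    bool stop;
    std::thread t;
  public:
    IoWorker() : stop(false), t([this]{ run(); }) {}
    void submit(IoRequest r) {  // called by the scheduler
      { std::lock_guard<std::mutex> l(m); q.push(std::move(r)); }
      cv.notify_one();
    }
    ~IoWorker() {
      { std::lock_guard<std::mutex> l(m); stop = true; }
      cv.notify_one();
      t.join();
    }
  private:
    void run() {
      for(;;) {
        IoRequest r;
        {
          std::unique_lock<std::mutex> l(m);
          cv.wait(l, [this]{ return stop || !q.empty(); });
          if(q.empty()) return;
          r = std::move(q.front()); q.pop();
        }
        r.perform();  // blocks only this I/O thread
        r.on_done();  // completion: the scheduler unblocks the green thread
      }
    }
  };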

-----

2 points by almkglor 6370 days ago | link

> What are your ideas for handling async I/O?

Here's a list of ideas:

nil

> Does Erlang have async I/O?

I assume it uses async I/O: For a long time Erlang had only green threads and didn't actually use OS threads. From what little I could grok of the Erlang source, it seems it has a set of OS-specific "drivers" for I/O, but these drivers seem to be extremely basic. It seems to have some sort of queue, but I can't figure out how it handles the queue (i.e. how it prevents the queue from blocking normal operation).

Each port is apparently modeled as a process that happens to have the ability to resume/suspend other processes. When the port gets its timeslice (?) it goes through its queue, does one I/O request (?), then resumes the process that asked for that I/O (I think).

In fact the scheme you described seems, approximately, what Erlang does. I think. I haven't found any particularly good discussions on the Erlang source.

Then there's some note somewhere in erl_async too that if threads (OS threads?) are unsupported, then the "async" I/O becomes synchronous (?).

> How does the JVM get its async I/O?

I think that again, this is OS-dependent. After all, JVM is a virtual machine, so it's up to the JVM implementation to handle this. Figuring out how a random JVM implementation does this probably means digging through C source.

-----

2 points by stefano 6370 days ago | link

To do async I/O there are two ways: use OS threads, or use green threads and OS-specific asynchronous I/O calls. Either way, the VM should, obviously, abstract away the OS-dependent features. Using OS threads seems easier to me.
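
A rough sketch of what that abstraction might look like (illustrative C++; the interface and the backend names are hypothetical):

  #include <cstddef>
  #include <functional>

  // The VM codes against this one interface; each OS gets a backend.
  struct AsyncIo {
    typedef std::function<void(std::size_t bytes, int err)> Callback;
    virtual void read(int fd, void* buf, std::size_t len, Callback done) = 0;
    virtual void write(int fd, const void* buf, std::size_t len, Callback done) = 0;
    virtual ~AsyncIo() {}
  };

  // Hypothetical backends, picked at build time:
  //   EpollIo (Linux), KqueueIo (BSD), IocpIo (Windows),
  //   ThreadPoolIo (fallback: OS threads doing blocking calls)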

-----

1 point by almkglor 6369 days ago | link

Both ways seem quite OS-specific to me ^^

Hmm. This will have to be done in arc2c too.

-----

1 point by stefano 6369 days ago | link

I don't think there is any way to avoid using OS-specific functions for async I/O. Abstracting this shouldn't be very hard, though.

-----

1 point by shader 6368 days ago | link

Well, there is a Boost library for cross-platform threading. We could just use that and "simulate" async by using threads to offload the synchronous I/O. Now, maybe that's a bad idea, and we need to use some abstracted OS-specific async libraries, but it would work.
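
For instance, something like this (a sketch only; std::async stands in here for the Boost.Thread machinery, and the function is made up):

  #include <cstddef>
  #include <cstdio>
  #include <future>

  // "Simulated" async: the blocking read runs on its own thread;
  // the caller waits on (or polls) the future instead of blocking
  // the scheduler.
  std::future<long> read_file_async(const char* path) {
    return std::async(std::launch::async, [path]{
      std::FILE* f = std::fopen(path, "rb");
      if(!f) return -1L;
      long n = 0; char buf[4096];
      while(std::size_t got = std::fread(buf, 1, sizeof buf, f))
        n += (long)got;
      std::fclose(f);
      return n;  // total bytes read
    });
  }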

Boost also seems to have plenty of libraries that would be useful for SNAP. How is that going anyway, almkglor? I should probably ask questions pertaining to that subject in its own thread.

Pretty quiet around here. Hopefully that means everyone's busy ;)

-----

1 point by almkglor 6368 days ago | link

> Well, there is a Boost library for cross-platform threading. We could just use that and "simulated" async using threads to offload the synchronous io.

Boost also has asio, but it appears to be concentrated more on socket I/O than I/O in general.

What I would like is to have a general-purpose asynchronous I/O library, which would handle all details of asynchronicity.

Basically, I intend to allow SNAP to be compiled in two modes:

1. Single worker thread, multiple processes. For cases where the OS doesn't have decent threads (dorky embedded systems? LOL, quite a few of the embedded systems I've seen recently actually have decent threads).

2. Multiple worker OS-level threads, single runqueue.

Async I/O is needed for 1, and would be nice for 2: in case 2, if a worker thread blocks on synchronous I/O, the others will still fetch processes to run from the runqueue, although the blocked worker thread is one less thread that can work.

However in case 1, there's just one thread, so it can't block.
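
A bare-bones sketch of mode 2 (illustrative C++, not SNAP's actual scheduler):

  #include <deque>
  #include <mutex>
  #include <condition_variable>
  #include <thread>

  struct Process { virtual void run_slice() = 0; virtual ~Process() {} };

  // Single shared runqueue; workers pull green processes from it.
  class RunQueue {
    std::deque<Process*> q;
    std::mutex m;
    std::condition_variable cv;
  public:
    void push(Process* p) {
      { std::lock_guard<std::mutex> l(m); q.push_back(p); }
      cv.notify_one();
    }
    Process* pop() {
      std::unique_lock<std::mutex> l(m);
      cv.wait(l, [this]{ return !q.empty(); });
      Process* p = q.front(); q.pop_front();
      return p;
    }
  };

  // If run_slice blocks on synchronous I/O, only this worker stalls;
  // the other workers keep draining the queue.
  void worker(RunQueue& rq) {
    for(;;) rq.pop()->run_slice();
  }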

-----


> I wonder if the aim of the vm is to be faster than the standard Arc implementation, or just to have a means of reifying the continuation and thus be able to stop a process to do process management.

More of a way of taking advantage of multiple processors/cores while not having to do a lot of process management. Of course, there's not much point in using multiple processors anyway if the VM itself is slow ^^

> I would go for simplicity and copy always. Unless Arc had some kind of immutable data structure and the vm were able to copy just the mutable parts.

I agree. I did have a plan for a sometimes-sharing, copy-on-write-and-if-receiver-has-a-shared-copy VM, but it felt more complex than necessary.

It would probably be useful to also have binary blobs like in Erlang.
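
A sketch of what such a binary might look like (illustrative C++; a real multithreaded VM would want an atomic refcount):

  #include <cstddef>
  #include <cstdlib>
  #include <cstring>

  // Erlang-style binary: immutable bytes, shared by refcount
  // instead of being copied on every message send.
  struct Binary {
    std::size_t refs;
    std::size_t len;
    unsigned char data[1];  // C-style trick: extra bytes follow the struct
  };

  Binary* binary_new(const void* src, std::size_t len) {
    Binary* b = (Binary*)std::malloc(sizeof(Binary) + len);
    b->refs = 1; b->len = len;
    std::memcpy(b->data, src, len);
    return b;
  }
  // Sending is just a refcount bump: safe because the bytes never mutate.
  Binary* binary_share(Binary* b) { ++b->refs; return b; }
  void binary_release(Binary* b) { if(--b->refs == 0) std::free(b); }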

-----

3 points by almkglor 6372 days ago | link | parent | on: Reader Macros?

Not officially, although you could hack into the underlying Scheme via $

-----


'car, 'cdr, 'list .... global variables all. Private copies for each process? All of arc.arc in private copies for each process?

How about when the user, say, loads lib/scanner.arc and then expects that existing running processes are now able to handle scanners?

There's a reason I suggested using some sort of cache for these.

-----

2 points by shader 6372 days ago | link

If they're immutable, then great. But using shared memory to transfer state is flawed, and can cause race conditions, etc. That's why we're using the actor model to begin with, right?

I suppose they can't really be immutable, or we couldn't do hot code loading. How does Erlang do it?

-----

1 point by almkglor 6372 days ago | link

> But using shared memory to transfer state is flawed,

cough let the programmer second-guess you cough

> How does Erlang do it?

It doesn't, actually. Functions are still effectively global variables, ones which you can't write to except via the c() BIF.

-----

4 points by shader 6372 days ago | link

> cough...

Heh, you're right. So, in that mindset, why not give a lot of nice macros for controlling share vs copy, and make the default be copy? Then programmers could control nearly everything. Of course, they could always hack on your vm if they really wanted tons of control.

But still, concurrent writing to a global variable sounds dangerous.

I kind of like the idea of them being "registered processes." I'll have to do some more thinking on that.

> It doesn't, actually...

Yes, that answers some of the question, but I was a bit more interested in how they implemented their hot code loading. The code still exists for a while as the existing processes continue to use it. But they eventually phase out the functions and swap to the new ones.

IMHO, hot code loading is a very nifty feature. Combined with remote REPL makes it especially useful. I don't know how well current lisps support hot swap, but I don't think it can work effectively without a concurrent system.

-----

3 points by almkglor 6372 days ago | link

> Yes, that answers some of the question, but I was a bit more interested in how they implemented their hot code loading

It's in the OTP library actually. For example, they have a standard gen_server module. The gen_server would look approximately like this in SNAP:

  (def gen-server (fun state)
    (<==
      ('request pid tag param)
        (let (state . response) (fun state param)
          (==> pid (list 'response tag response))
          (gen-server fun state))
      ; hot code swapping!!
      ('upgrade new-fun)
        (gen-server new-fun state)
      ('stop)
        t))
So yes: hot swapping just means sending in a message with the new code ^^

It's actually more complex than that - they generally make messages include a version number so that nodes with new versions can communicate to nodes with older versions. This versioning is, in fact, part and parcel of the gen_server series of functions. Requests to servers are made via functions which send a message (with tags and other metadata such as versions abstracted away) and wait for a receive.

I think what they say is, the programmers good at concurrency write the gen_server and other parts of the OTP, while the average programmers write application code ^^

Much of Erlang isn't implemented in the VM level ^^

-----

3 points by shader 6372 days ago | link

It makes sense that they wouldn't do that at the vm level. Your code even makes sense, though I thought "let" only assigned one variable.

I'm still not quite able to read arc fluently, so any explanations of the subtleties that I likely missed will always be appreciated. Come to think of it, any explanations of any code would be nice, as the thoughts and reasons behind code don't always come out in the source itself. And I also like learning new things :)

-----

2 points by almkglor 6372 days ago | link

  (def gen-server (fun state)
    (<==
I'm using pattern matching. Although Arc doesn't actually have pattern-matching built in, someone wrote a pattern matching library a long time ago using macros: http://arclanguage.com/item?id=2556 http://arclanguage.org/item?id=1825 . The modern evolution uses something like p-m:def to define a pattern-matching function, p-m:fn to create an anonymous pattern-matching function, etc.

      ('request pid tag param)
The pattern above means "match a 4-element list, whose first element is the symbol 'request, and which has 3 more elements that we'll call pid, tag, and param".

        (let (state . response) (fun state param)
This is a destructuring. It simply means that (fun state param) should return a cons cell, with the 'car of the cell being placed in state and the 'cdr being placed in response. So we expect fun to return something like (cons state response)

          (==> pid (list 'response tag response))
Note the use of 'tag here. We expect that 'tag would be a 'gensym'ed symbol, and is used in pattern matching so that the client can receive the message it's looking for.

          (gen-server fun state))
Plain recursion.

      ; hot code swapping!!
      ('upgrade new-fun)
        (gen-server new-fun state)
This is it, really: just give a new function.

      ('stop)
        t))

-----

2 points by cchooper 6372 days ago | link

Functions (that aren't closures) can be safely cached because they're immutable. If we assume arc.arc is part of the 'spec' (and hence itself immutable) then we can safely link each process to the same functions, but give each one its own global bindings, maybe?

-----

2 points by almkglor 6372 days ago | link

> arc.arc is part of the 'spec' (and hence itself immutable)

But ac.scm itself is not immutable. Cf. lib/scanner.arc, which redefines 'car and 'cdr (which are in ac.scm). If ac.scm, which is even more basic than arc.arc, is itself not immutable, then why should arc.arc be immutable?

So no.

In arc2c functions are represented by closures. Pointers to closures are effectively handles to the actual function code.

Now the function code is immutable (that's how arc2c does it - after all, all the code has to be written in C). When a function is redefined, we create a new closure, which contains a pointer to the new function code (which was already compiled and thus immutable), then assign that closure to the global variable.

Basically my idea for a cache would also have an incremented version number:

  #include <map>
  #include <string>
  #include <utility>

  class SymbolAtom : public Atom {
  private:
          std::string name;
          Generic* value;     // current global binding
          size_t version;     // bumped on each assignment to the global
  public:
          friend class Process;
  };

  class Process : public Heap {
  /*blah blah heap stuff...*/
  private:
          // per-process cache: symbol -> (version seen, process-local copy)
          std::map<Atom*, std::pair<size_t, Generic*> > g_cache;
  public:
          Generic* get_global(Atom* a){
                  std::map<Atom*, std::pair<size_t, Generic*> >::iterator i;
                  i = g_cache.find(a);
                  // not in cache: copy into this process's heap
                  if(i == g_cache.end()){
                          Generic* mycopy = a->value->clone(*this);
                          g_cache[a] = std::pair<size_t, Generic*>(a->version, mycopy);
                          return mycopy;
                  } else {
                          std::pair<size_t, Generic*>& ipp = i->second;
                          // version unchanged: our copy is still good
                          if(a->version == ipp.first){
                                  return ipp.second;
                          } else {
                                  // global was reassigned: recache
                                  Generic* mycopy = a->value->clone(*this);
                                  g_cache[a] = std::pair<size_t, Generic*>(a->version, mycopy);
                                  return mycopy;
                          }
                  }
          }
  };

-----


Well, first: does reading a global variable at some point create a copy or not?

Having to do the checking on each write has the slight disadvantage of slowing down writes. For that matter, we still have to consider how heaps are organized.

For example, in my current stop-and-copy, always-copy-on-message-send scheme, each heap has several semispaces. Semispaces conceptually belong to only one heap. One of the semispaces is the current semispace, the one that is being allocated into. The others are semispaces that were received during message sending.

When a message is sent, it is copied into its own semispace, then the semispace is added to the list of "other semispaces" held by the heap.

During a GC, the GC copies all semispaces into a single new semispace, then the heap releases all the other semispaces and sets the new semispace as the current semispace.

In theory it should be possible to slightly reduce the GC overhead by copying only some (but not all) semispaces into the newest semispace: this is basically an incremental GC.
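
In sketch form (illustrative C++, with the actual object evacuation elided):

  #include <cstddef>
  #include <vector>

  struct Semispace {
    char* base; char* alloc; char* end;
    explicit Semispace(std::size_t sz)
      : base(new char[sz]), alloc(base), end(base + sz) {}
    ~Semispace() { delete [] base; }
  };

  class Heap {
    Semispace* current;                // the one being allocated into
    std::vector<Semispace*> received;  // arrived via message sends
  public:
    Heap() : current(new Semispace(4096)) {}
    void receive(Semispace* msg) { received.push_back(msg); }
    void gc() {
      Semispace* fresh = new Semispace(next_size());
      // ... evacuate live objects from current and received into fresh ...
      for(std::size_t i = 0; i < received.size(); ++i) delete received[i];
      received.clear();
      delete current;
      current = fresh;
    }
    std::size_t next_size() {  // toy growth policy
      return 2 * (std::size_t)(current->end - current->base);
    }
  };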

Edit: As an aside, the advantage of leaving it "undefined" is so that we can change the implementation of global variables without setting anything in stone. "What, you kept a table in a global variable and expected to use that to communicate with other processes? Well, sorry, but that won't necessarily work (or not work)"

-----


Okay, here's a github public repo:

http://github.com/AmkG/snap/tree/master

-----


From: http://arclanguage.com/item?id=7124

> you could use a Bloom filter to determine if the object is present on the other process.

Hmm. I did think of using Bloom filters, but then I would also end up "copying unnecessarily" anyway for some N% of false positives. Not to mention that I'd still have to build k different hash functions for pointers. So I decided, darn it, just copy all.

> How much are you implementing on the c++ side?

Memory management, worker threads (processes are green but are distributed among a set of worker threads), async I/O (technically Arc-side won't get async I/O, just sync, but then the Arc-side can launch a process to do the actual I/O and send a message to the original process that the I/O's done.) Async I/O is always a problem (even in arc2c); it doesn't seem to be very portable.

Might do a very very basic reader in C, maybe just enough to read numbers, symbols, and lists of same. I want to do the actual reader in Arc and use raymyers' treeparse ^^.

Also, I have to design the bytecode.

The arc2c "machine" is really a simple pushdown stack machine, except that function calls will simply reset the stack to the top N entries (where N is the number of arguments to the function call, including the function/object to call and the continuation).

My intent is to have the external representation of the bytecode use a stack machine similar to the arc2c stack machine. Internally, if the implementation can, it is free to recompile the bytecode to a register bytecode, which I hear is slightly faster because the number of instructions is reduced. It is free to use superinstructions, inline (but only if it's sure it can inline - in particular code that references, say, the 'car global function, should be able to adjust if 'car is redefined to, say, work on scanners), etc etc.
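
The "reset the stack to the top N entries" part, in sketch form (illustrative C++):

  #include <cstddef>
  #include <vector>

  typedef void* Obj;  // stand-in for the VM's object reference

  struct Stack {
    std::vector<Obj> s;
    // Before jumping to the callee: keep only the top n entries
    // (function, continuation, args), sliding them to the bottom
    // and discarding the caller's frame.
    void call_prep(std::size_t n) {
      std::size_t base = s.size() - n;
      for(std::size_t i = 0; i < n; ++i) s[i] = s[base + i];
      s.resize(n);
    }
  };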

-----

2 points by shader 6372 days ago | link

Concerning Bloom filters, they are only helpful if knowing whether a particular process already has a reference matters when deciding to copy. So, it's useful for the case when you're sending an object to a process. Otherwise ref counting is better, because you aren't broadcasting "do you have this?" messages.

Bloom filters are probably only a good idea if your copy is a large one, rather than just a small variable.

Here's an interesting idea: for extremely large messages, have some sort of "lazy copying" where the process only copies pieces just before it uses them, or when there isn't a cost associated with the transfer. That way large copies don't slow down your system. You'd still increment the reference counter, if you were using one, and any mutations would have to change a different copy.

Actually, a good question is what the relative costs of sharing vs. copying are. I would think that for small things copying is less expensive than keeping track of who has it, etc., just to avoid copying. For bigger items, it might be worth avoiding a premature copy. I don't know.

Edit 1: About the k hash functions, I think that if there is low enough correlation between bytes in the result, you can break up one hash function into several. Also, most good hash functions (the ones that might be too slow for this purpose) have enough of a difference in the result with slight changes to the input that you could add just 1 to the value several times and get a dozen hashes. So you could use:

  (hash val)
  (hash (+ val 1))
  (hash (+ val 2))
  ...
and get "3" hash functions. If you break each hash into bytes as well, and each produced multiple bytes, you could easily have 6, 9, or 27 "hash functions"

-----

2 points by almkglor 6372 days ago | link

> So, it's useful for the case when you're sending an object to a process.

Yes, but it is in fact the only case where we copy objects anyway (even reading a global variable would be effectively sending the object in the global to the process in question)

> for extremely large messages...

  #define N    I don't know
  if(msg.total_size() > N){
    msg.share(process);
  } else {
    msg.copy(process);
  }
I think this will end up having twice the overhead though: you need mechanisms to support both sharing with copy-on-write-and-if-receiver-already-has-it and copying.

Take note also that sending messages is recursive: we might be sending a list and thus end up sending several items.

> I would think that for small things copying is less expensive than keeping track of who has it, etc., just to avoid copying.

This was the justification in the Erlang BEAM and JAM machines to always copy.

> About the k hash functions...

Cool, I didn't think of that!

-----

2 points by shader 6372 days ago | link

So then, should we just always copy?

Unless we make some special immutable types as mentioned above; then we could share those. Maybe I'm just prematurely optimizing, and should just give up on trying to find ways to avoid the copy. Life is much simpler just copying. :)

How about just starting with always-copy, and later, when we have performance data and experience using it, we can code exceptions? I don't know how easy it would be to code such exceptions, but things would be less likely to break if we avoided making these decisions without empirical evidence.

> twice the overhead...

I didn't think it would create any extra mechanisms, just a short-circuit rule when deciding to copy or not. Rather than checking the ref counts each time the item is mutated or sent, if it is smaller than a certain amount, we know it was copied.

Now, maybe ref counts are faster than total_size(), but I don't know.

Certainly, always copying would avoid this whole overhead business. :)

-----

8 points by almkglor 6375 days ago | link | parent | on: Qi's Amazing Macros

  (load "foo.arc" your-macro-expander-here)
Anarki only

-----

5 points by rincewind 6374 days ago | link

load would be even more useful if you were able to supply your own reader as the second argument.

-----

4 points by almkglor 6374 days ago | link

  (def load-my-reader (file read)
    (push current-load-file* load-file-stack*)
    (= current-load-file* file)
    (after
      (w/infile f file
        (whilet e (read f)
          (eval (hook e))))
      (do (= current-load-file* (pop load-file-stack*)) nil)))

-----

1 point by Bogart 6374 days ago | link

Wouldn't this force macros and code to be in separate files, though?

-----

1 point by almkglor 6374 days ago | link

It forces macro-expanders and code to be in separate files. You can still put macros in the code file, as long as your macro-expander can understand them.

-----

3 points by almkglor 6375 days ago | link | parent | on: Arc Programming Assignment

  (def f (n k)
    (if
      (is n k 0)
        1
      (and (even n) (odd k))
        0
      ; else
        (f (round:/ n 2) (round:/ k 2)))) ; works if n and k are ints

-----

3 points by eds 6375 days ago | link

Not quite, since 'round returns the nearest even integer on halves. s/round/trunc/g gives the correct solution.

  arc> (round 4.5)
  4
  arc> (round 5.5)
  6
  arc> (trunc 4.5)
  4
  arc> (trunc 5.5)
  5

-----

2 points by bOR_ 6375 days ago | link

Ah.. I remember reading about the reasoning behind that (Round-to-even)

from: http://en.wikipedia.org/wiki/Rounding

  Although it is customary to round the number 4.5 up to 5, 
  in fact 4.5 is no nearer to 5 than it is to 4 (it is 0.5 
  away from both). When dealing with large sets of 
  scientific or statistical data, where trends are 
  important, traditional rounding on average biases the data
  upwards slightly. Over a large set of data, or when many 
  subsequent rounding operations are performed as in digital
  signal processing, the round-to-even rule tends to reduce 
  the total rounding error, with (on average) an equal 
  portion of numbers rounding up as rounding down. This 
  generally reduces upwards skewing of the result.

-----

2 points by almkglor 6375 days ago | link | parent | on: (delist somelist)

'apply

  (apply rand-choice (list 1 2 3))

-----

2 points by bOR_ 6375 days ago | link

Had tried apply, but that doesn't work in arc2. I get a

  "Function call on inappropriate object #3(tagged mac #<procedure>) (1 2 3)"
Might be inevitable for me to move over towards anarki, if it works there. :P

-----

3 points by absz 6375 days ago | link

Actually, in looking over the functions again, it turns out that there is already a function that does what you want: random-elt. That should do what you need it to do.

As for your problem (and no, it doesn't work on Anarki[1]), the problem is that rand-choice is a macro, not a function, and apply only works on functions. The only solution is a terrible hack:

  (def mapply (macro . parms)
    " Applies the macro `macro' to `parms', which are not quoted.
      See also [[mac]] [[apply]] "
    (eval:apply (rep macro) (join (butlast parms) (last parms))))
But don't use that!

[1]: But you should use Anarki anyway :)

-----

2 points by bOR_ 6375 days ago | link

Thanks for the reminder not to apply on macros, and for finding random-elt. I need to learn how to read ;).

random-elt does exactly what I want.

(btw.. small consistency issue in the naming here? rand-choice vs. random-elt)

-----

1 point by absz 6375 days ago | link

To further clarify, if you've programmed in Ruby or Python: (apply func ... arglist) is equivalent to func(..., * arglist) in those languages.

-----
