Arc Forum: sacado's comments
1 point by sacado 6439 days ago | link | parent | on: arc2c : new version, very soon on the git

Excellent idea :)

-----

4 points by almkglor 6438 days ago | link

Wrote a code walker; it's on the git.

Basic usage:

  (def your-function (argument-data)
    (code-walk a-variable argument-data
      (if
        (something-you-need-to-process a-variable)
          `(stuff ,(car a-variable) ,(cdr a-variable))
        ; else return the variable as-is
          a-variable)))
code-walk automatically skips quoted (') forms as well as the constant parts of a quasiquote `(...), but resumes processing when it sees a comma or comma-at inside it, as in `( ,...).

See the to-3-if sample.

Also, I removed the err in 'source: since part of codegen now emits code in list form rather than as an AST, 'source just passes its argument through if it's not an AST.

Edit: also added ssyntax support

-----

1 point by sacado 6439 days ago | link | parent | on: arc2c : new version, very soon on the git

OK, I'll rename arc2c's union to something else, or remove it and use (union is a b).

-----

1 point by sacado 6439 days ago | link | parent | on: arc2c : new version, very soon on the git

Sorry about that. I'll fix it this evening (GMT+1). For now, you can still find debut.arc in Anarki. It hasn't changed since last time.

-----

2 points by almkglor 6439 days ago | link

I can't find debut.arc in Anarki either.

Some more Anarki-related questions:

1. Do you intend to use Anarki-specific features? I'm asking so that I know whether I can use Anarki features in any mods.

2. Do you still intend to use the arc2c copy on Anarki, say as a stable branch?

-----

1 point by sacado 6439 days ago | link

Yep, I just checked. In fact, "debut.arc" was renamed to "structs.arc". Just change the reference in "arc2c.arc" and it will work.

1. Yes, probably. I only work with the Anarki version, so when I find something useful in the source, I don't even know whether it's "official" or not.

2. Yes. Using Anarki as a repository for the stable versions of arc2c seems like a good idea.

-----

1 point by almkglor 6439 days ago | link

Another question: Anarki has a CONVENTIONS file. Will we be following it too? One Anarki convention that arc2c doesn't currently follow concerns tabs and indentation. Is it OK to change the tabs to follow Anarki conventions, or do you prefer the current indentation?

-----

1 point by sacado 6439 days ago | link

Oh, sure. I never really checked the CONVENTIONS file, but I think it's better to indent that way (it's the tradition, after all :). I don't do it naturally because it's not the way I learnt it, but I should follow the conventions anyway. So, OK, let's change the tabs.

-----

3 points by sacado 6439 days ago | link | parent | on: arc2c : new version, very soon on the git

That's an excellent idea. I've just started a git repository at http://github.com/sacado/arc2c and published the new version there. If I configured it correctly, anyone can push modifications to it. I have 2 github invitations left, by the way.

-----

2 points by almkglor 6439 days ago | link

  ERROR: Write permission to sacado/arc2c denied to AmkG.
  fatal: The remote end hung up unexpectedly
  error: failed to push to 'git@github.com:sacado/arc2c.git'

-----

1 point by sacado 6439 days ago | link

Hmm... Thanks for the information. I can't see how to say "this git is open to everyone" in the config menus. Any idea?

-----

1 point by almkglor 6439 days ago | link

Hmm. Can't seem to do this either. Erk.

I suppose nex3 specifically asked github to give Anarki special treatment? ^^

-----

2 points by sacado 6439 days ago | link

Anyway, I can add anyone who asks me to. You are now an authorized "committer", btw.

-----

2 points by almkglor 6439 days ago | link

Thanks. I'll push something in an hour or two; I'm at the office.

The push I was going to make was simply to wrap some of the private functions in codegen.arc in a 'let form.

-----

1 point by skenney26 6439 days ago | link

I'd appreciate an invite if you still have any left.

-----

1 point by sacado 6439 days ago | link

Done.

-----

1 point by skrishna23 6439 days ago | link

Can I have one, please? TIA.

-----

1 point by almkglor 6439 days ago | link

What's your email address?

-----

1 point by skrishna23 6439 days ago | link

my user id at yahoo.com

-----

1 point by almkglor 6439 days ago | link

Sent!

-----

4 points by sacado 6440 days ago | link | parent | on: arc-to-c : soon on the git

Hey, by the way, I have a working GC now. It's terribly slow, but it's my first one, so I like it anyway. Yes, I know I should have used Boehm (no time lost implementing a slow, buggy, informally-specified GC), but after all, I don't get to play at implementing compilers every day, so... It will be replaced when it needs to be (once Google SoC is over, we'll have a new GC!)

-----

1 point by almkglor 6440 days ago | link

LOL.

Code or it didn't happen ^^. I'd like to see it so I can hack through it too.

-----

3 points by sacado 6441 days ago | link | parent | on: arc-to-c : soon on the git

Yep. For the moment, I'm treating them as builtins. They will have to be implemented the way you mention, that's for sure. I guess the easiest way would be to declare them both ways: as primitives (i.e., + is translated to %+ wherever it is encountered) and as built-in functions, the way ccc is defined.

-----

3 points by almkglor 6441 days ago | link

Hmm, that's true, you do need to handle stuff like:

  (map + (list 1 2 3) (list 9 8 7))
Haruu, compiling dynamic languages is hard... ^^
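A minimal C sketch of declaring + both ways: a direct call like (+ a b) can compile to an inline primitive (standing in for %+), while the value handed to map is an ordinary function wrapping that primitive. The names PRIM_ADD, fn_add and map2 are hypothetical, plain long stands in for arc2c's tagged values, and arrays stand in for lists.

```c
#include <assert.h>

/* Inline primitive: what a direct call like (+ a b) could compile to (the %+ case). */
#define PRIM_ADD(a, b) ((a) + (b))

/* Pointer type for a first-class binary function, such as + passed to map. */
typedef long (*binop)(long, long);

/* First-class +: an ordinary function that wraps the primitive. */
long fn_add(long a, long b) { return PRIM_ADD(a, b); }

/* Hypothetical two-list map over arrays: out[i] = f(xs[i], ys[i]). */
void map2(binop f, const long *xs, const long *ys, long *out, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        out[i] = f(xs[i], ys[i]);
}
```

With xs = {1, 2, 3} and ys = {9, 8, 7}, map2(fn_add, xs, ys, out, 3) fills out with 10, 10, 10, which corresponds to (map + (list 1 2 3) (list 9 8 7)).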

For that matter, it shouldn't be so difficult - basically if the compiler ever sees (set + ...) anywhere (where + is a global, at least), it disables the use of %+.

Possibly, we could add another step in the transformation process (which is why I suggested the structure in http://arclanguage.com/item?id=5598 to allow easy insertion/deletion of steps).

Basically the new step just traverses the structure (without mutating it) and creates a list of built-ins to disable.

Edit: or better - it traverses the structure and replaces + with %+, until it reaches a top-level node which has (set + ...), so that code which doesn't use the redefined + will still refer to the builtin %+.
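The "replace + with %+ until a top-level (set + ...)" pass can be sketched in C over a deliberately simplified program representation: each top-level form is a NULL-terminated array of token strings. That representation and the names sets_plus and rewrite_plus are assumptions for illustration; arc2c works on real list/AST structures, and a real pass would also have to skip quoted data and locally bound + (e.g. a fn parameter named +), which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified representation (assumption): each top-level form is a
   NULL-terminated array of tokens, e.g. "(", "+", "1", "2", ")". */

/* Does the form contain the token sequence ( set + ? */
static int sets_plus(const char **form) {
    for (size_t i = 0; form[i] && form[i + 1] && form[i + 2]; i++)
        if (strcmp(form[i], "(") == 0 && strcmp(form[i + 1], "set") == 0 &&
            strcmp(form[i + 2], "+") == 0)
            return 1;
    return 0;
}

/* Rewrite "+" to "%+" in every top-level form before the first one that
   redefines +; returns the index of that form, or nforms if there is none. */
int rewrite_plus(const char **forms[], int nforms) {
    for (int f = 0; f < nforms; f++) {
        if (sets_plus(forms[f]))
            return f;                    /* + may be redefined from here on */
        for (size_t i = 0; forms[f][i]; i++)
            if (strcmp(forms[f][i], "+") == 0)
                forms[f][i] = "%+";      /* refer to the builtin primitive */
    }
    return nforms;
}
```

For a program (+ 1 2) (set + -) (+ 3 4), only the first form is rewritten; the pass stops at the second form, so the third keeps referring to the possibly redefined +.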

-----

1 point by sacado 6442 days ago | link | parent | on: arc-to-c : soon on the git

For macros, that's what I thought too. Anyway, we'll need eval for completeness. I don't know how it will behave with compiled code, though, and I know it's not an easy problem, since Stalin Scheme simply ignored it.

I haven't thought about the other points yet :)

-----

1 point by sacado 6442 days ago | link | parent | on: arc-to-c : soon on the git

Actually, GC is more complicated than I thought: references to objects are not always pointers (i.e., addresses of something allocated on the heap). Fixnums, for example, but also t and nil, are a special kind of reference (for fixnums, the last bit is 0, meaning "the actual value of reference x is x >> 1"). And these values have to be collected from time to time too: the fib example runs out of memory quite fast without a single malloc, since it only uses fixnums.
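The tagging described above can be sketched in C as follows. It follows the thread's scheme (last bit 0 means fixnum, with value x >> 1; heap references stored as (addr << 1) + 1, as sacado shows later in the thread), but the function names are hypothetical, and uintptr_t is used instead of int so that a full address fits. Right-shifting a negative value is implementation-defined in C; GCC and Clang shift arithmetically.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference type: one machine word (uintptr_t rather than int,
   so that a full address fits). */
typedef uintptr_t ref;

/* Fixnums: last bit 0, value shifted left by one. */
ref make_fixnum(intptr_t x) { return (uintptr_t) x << 1; }
int is_fixnum(ref r) { return (r & 1) == 0; }
/* Arithmetic right shift of a negative value is implementation-defined in C
   (GCC and Clang shift arithmetically). */
intptr_t fixnum_value(ref r) { return (intptr_t) r >> 1; }

/* Heap references: last bit 1, stored as (addr << 1) + 1. The top address bit
   is lost, and a conservative GC no longer sees the real address. */
ref make_pointer(void *p) { return ((uintptr_t) p << 1) + 1; }
void *pointer_value(ref r) {
    assert(!is_fixnum(r));
    return (void *) (r >> 1);
}
```

Under this encoding nil and t would also need odd (non-fixnum) bit patterns; this sketch leaves them out.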

I started writing a naïve one yesterday (you know, the kind of thing that was state of the art 50 years ago) and it's a little easier than I expected. At least it doesn't core dump anymore. I'm in infinite loops now :)

-----

2 points by binx 6442 days ago | link

Boehm's GC is conservative, which means it treats every value on the stack that could be a pointer as a pointer. So there may be some space leakage, but it's acceptable. In any case, many compilers targeting C use Boehm's GC, and many of them are in practical use now.

-----

2 points by sacado 6442 days ago | link

Yes, but the problem is that even the references that are actual pointers do not really hold a pointer address, since such a reference must not end with a 0 bit (that would mark it as a fixnum). You would do something like:

  ref = ((uintptr_t) malloc(sizeof(cons_cell)) << 1) + 1;
That way, ref knows it is pointing to something (as (ref & 1) == 1), and it knows the address of that thing (it is ref >> 1), but this is totally invisible to Boehm's GC, I suppose. I guess it would think "hmm, the address 01001000 does not seem to be pointed to by anyone. The only pointer I can see around is 10010001. Obviously not the same thing. Let's free 01001000!"

But maybe I'm missing something, I never used Boehm except in toy apps.

-----

3 points by binx 6442 days ago | link

Oh, in order to use Boehm's GC, you should store references as normal pointers, and tag the lowest two bits of fixnums and immediate constants with 01/10/11.
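A minimal C sketch of that layout, assuming heap objects are at least 4-byte aligned so that real pointers keep 00 in their low two bits and can be stored untouched. Which of 01/10/11 goes to fixnums and which to immediates is an assumption here (01 for fixnums, 10 for nil and t), and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical word-sized reference; the low two bits are the tag. */
typedef uintptr_t obj;

enum { TAG_PTR = 0, TAG_FIXNUM = 1, TAG_IMMEDIATE = 2, TAG_MASK = 3 };

/* Hypothetical immediate constants (tag 10), distinguished by the upper bits. */
#define OBJ_NIL ((obj) ((0u << 2) | TAG_IMMEDIATE))
#define OBJ_T   ((obj) ((1u << 2) | TAG_IMMEDIATE))

int obj_tag(obj o) { return (int) (o & TAG_MASK); }

/* Fixnums carry tag 01 in the low bits. */
obj make_fixnum(intptr_t x) { return ((uintptr_t) x << 2) | TAG_FIXNUM; }
intptr_t fixnum_value(obj o) { return (intptr_t) o >> 2; } /* arithmetic shift on GCC/Clang */

/* Heap objects keep their plain address (tag 00), so a conservative collector
   such as Boehm's still recognizes them. Needs at least 4-byte alignment. */
obj make_pointer(void *p) {
    assert(((uintptr_t) p & TAG_MASK) == 0);
    return (obj) p;
}
void *pointer_value(obj o) { return (void *) o; }
```

The design choice is simply to move the tagging cost from pointers to fixnums: pointer dereferences need no decoding, while fixnum arithmetic needs a shift.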

Another solution is to write your own object memory, allocation functions and GC yourself.

-----

3 points by almkglor 6442 days ago | link

This is true; we'll have to change tags so that real objects are actual pointers. The alternative is to make everything a pointer, i.e. even "numbers" are really pointers to numbers.

I think it's easier to swap the meaning of numbers with objects rather than build our own GC.

As for infinite loops in GC - do you make sure not to scan a memory area you've already scanned?

-----

2 points by binx 6442 days ago | link

Yes, and if he wants to write a GC, I advise him to start with a semi-space copying GC, which is simpler and has no recursion. That way you don't have to reserve an extra bit for mark-and-sweep, and you need no GC stack for depth-first traversal.
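A semi-space copying GC of the kind binx describes is usually done with Cheney's algorithm. Below is a minimal C sketch for a heap of cons cells only, under these assumptions: fixnums have low bit 1, nil is 0, every other value is a cell address; the two semispaces are fixed-size static arrays; callers pass all roots explicitly. All names are hypothetical. The forwarding check reuses the old cell's car (a from-space car that points into to-space can only be a forwarding address), so there is no mark bit and no traversal stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: low bit 1 = fixnum, 0 = nil, anything else is the
   address of a cons cell (cells are word-aligned, so their low bit is 0). */
typedef uintptr_t value;
typedef struct cell { value car, cdr; } cell;

#define HEAP_CELLS 1024
static cell space_a[HEAP_CELLS], space_b[HEAP_CELLS];
static cell *from_space = space_a, *to_space = space_b;
static size_t alloc_next = 0;   /* next free cell in from_space */
static size_t copy_next = 0;    /* next free cell in to_space while collecting */

value fix(intptr_t x) { return ((uintptr_t) x << 1) | 1; }
intptr_t unfix(value v) { return (intptr_t) v >> 1; }  /* arithmetic shift on GCC/Clang */

static int is_cell(value v) { return v != 0 && (v & 1) == 0; }

static int in_to_space(value v) {
    return is_cell(v) && v >= (uintptr_t) to_space &&
           v < (uintptr_t) (to_space + HEAP_CELLS);
}

value cons(value car, value cdr) {
    assert(alloc_next < HEAP_CELLS);    /* a real runtime would collect here */
    cell *c = &from_space[alloc_next++];
    c->car = car;
    c->cdr = cdr;
    return (value) c;
}

value car(value v) { return ((cell *) v)->car; }
value cdr(value v) { return ((cell *) v)->cdr; }
void set_cdr(value v, value d) { ((cell *) v)->cdr = d; }
size_t heap_used(void) { return alloc_next; }

/* Copy one cell to to_space, or return its new address if already copied. */
static value forward(value v) {
    if (!is_cell(v))
        return v;                       /* fixnums and nil are not moved */
    cell *old = (cell *) v;
    if (in_to_space(old->car))
        return old->car;                /* already copied: follow forwarding address */
    cell *copy = &to_space[copy_next++];
    *copy = *old;
    old->car = (value) copy;            /* leave the forwarding address behind */
    return (value) copy;
}

/* Cheney collection: copied-but-unscanned cells in to_space form the work
   queue, so there is no recursion and no explicit stack. */
void collect(value *roots, size_t nroots) {
    size_t scan = 0;
    copy_next = 0;
    for (size_t i = 0; i < nroots; i++)
        roots[i] = forward(roots[i]);
    while (scan < copy_next) {
        to_space[scan].car = forward(to_space[scan].car);
        to_space[scan].cdr = forward(to_space[scan].cdr);
        scan++;
    }
    cell *tmp = from_space;
    from_space = to_space;
    to_space = tmp;
    alloc_next = copy_next;
}
```

Garbage is never touched, so collection time is proportional to live data only, and cycles are handled by the forwarding check.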

-----

1 point by sacado 6442 days ago | link

I don't really like making everything a pointer. I don't know of any Lisp implementation that works this way (not even the slowest ones). It implies a performance hit every time you use a small integer value (with -2G < small < 2G). Anyway, binx's idea seems good...

Well, about infinite loops, I don't know; I stopped yesterday when my head was starting to burn... And gcc kept complaining about pointers being of the wrong type (the way it is implemented, with tagged references, I have to cheat a lot). Oh, come on, gcc, who do you think you are, an Ada compiler?

-----

4 points by binx 6442 days ago | link

You don't need to make everything a pointer, but you can make everything a structure of 12 or 16 bytes. Lua takes this approach and it performs rather well.

Here's the Lua way:

  struct obj {
    int type;
    union { double num; void *ptr; } data;
  };

By this means, you can achieve architecture independence.

-----

3 points by almkglor 6441 days ago | link

This approach is the best. In fact, it has to be done, because not all computers have sizeof(int) == sizeof(void*); certainly my AMD64 running a 64-bit kernel doesn't.

Note that we can actually abstract away the representation of objects from the binary bits that represent them. For instance, I would recommend that objects use the following representation (as noted by binx):

  #include <limits.h>  // INT_MIN, INT_MAX
  #include <stdint.h>  // uint32_t

  typedef struct s_obj{
    enum e_type {
      other = 0,
      number = 1,
      symbol = 2,
      character = 3
    }
    type;
    union u_data{
      void* other;
      int number;
      void* symbol;
      uint32_t character; // Unicode code point
    } data;
  } obj;
  #define OBJ_TYPE(o) ((o).type)
  #define OBJ2FIXNUM(o) ((o).data.number)
  #define FIXNUM_MIN INT_MIN // from <limits.h>, standard C
  #define FIXNUM_MAX INT_MAX
  //others defined similarly
However, for machines with the following characteristics:

1. 32-bit pointers and 32-bit ints

2. malloc() always returns an address at a 32-bit boundary (i.e. every four bytes, or with the lowest two bits zero)

We can then use:

  typedef int obj;
  #define OBJ_TYPE(o)  ((o) & 0x03)
  #define OBJ2FIXNUM(o)  ((o) >> 2)
  #define FIXNUM_MIN (-(1 << 29))  // 30 value bits remain after the 2 tag bits
  #define FIXNUM_MAX ((1 << 29) - 1)
Thus we are able to have portability while still gaining optimizations for some architectures.
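To get both at once, the representation can be chosen at compile time. A sketch, assuming the C99 <stdint.h> limit macros are available; FIXNUM2OBJ is a hypothetical constructor that does not appear in the code above, and tag 1 for numbers follows the enum e_type values. The compact branch builds fixnums with multiplication rather than a left shift, because left-shifting a negative int is undefined in C.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#if defined(UINTPTR_MAX) && UINTPTR_MAX == 0xFFFFFFFFu && UINT_MAX == 0xFFFFFFFFu
/* 32-bit ints and pointers: compact tagged word, low two bits are the tag
   (assumes malloc returns 4-byte-aligned addresses, so pointers have tag 0). */
typedef int obj;
#define OBJ_TYPE(o)    ((o) & 0x03)
#define OBJ2FIXNUM(o)  ((o) >> 2)          /* arithmetic shift on GCC/Clang */
#define FIXNUM2OBJ(x)  ((x) * 4 + 1)       /* tag 1 = number */
#define FIXNUM_MIN     (-(1 << 29))
#define FIXNUM_MAX     ((1 << 29) - 1)
#else
/* Anything else: the portable struct representation. */
typedef struct {
    int type;                              /* 1 = number */
    union { void *other; int number; } data;
} obj;
#define OBJ_TYPE(o)    ((o).type)
#define OBJ2FIXNUM(o)  ((o).data.number)
#define FIXNUM2OBJ(x)  ((obj) { .type = 1, .data.number = (x) })
#define FIXNUM_MIN     INT_MIN
#define FIXNUM_MAX     INT_MAX
#endif
```

Code written against OBJ_TYPE, OBJ2FIXNUM and FIXNUM2OBJ compiles unchanged under either branch; only the bit-level layout differs.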

-----

1 point by sacado 6440 days ago | link

In fact, maybe it's not a problem even on machines where ints are not as big as pointers. You can treat each value as an int pointer, and, when you need a fixnum, do

  (int) my_int_ptr >> 1
If ints are bigger than pointers, that's okay: you lose some possible bits, but at least it works (for example, if ints are 6 bytes and addresses 4 bytes, only the 4 least significant bytes are useful; the other 2 are lost). If pointers are bigger than ints, it still works, but this time the lost space is in the pointer representation.

The only thing is that fixnums don't have an architecture-independent size, but that's not a problem in practice anyway. The only constraint is that the biggest fixnum is

  2**(8 * min(sizeof(int), sizeof(int*)) - 2) - 1
The last-bit-is-1-for-fixnum trick works on every architecture except machines where addresses are 8 bits wide. Well, maybe we can ignore those?

-----

3 points by almkglor 6440 days ago | link

Actually, it is a problem: arc2c output crashes on my AMD64. I'm trying to hack out a replacement, but it's not working yet.

The problem is that, apparently, the pointer bits beyond the width of an int are not zero, so assigning a pointer to an int chops off significant bits.

-----

1 point by sacado 6440 days ago | link

Oh... Maybe a simple fix would be to change int to long. I know the standard doesn't guarantee it, but in practice I think longs and pointers are always the same size. mzscheme obviously works that way: every object is a pointer that you can cast to a long if it is a fixnum.

-----

2 points by almkglor 6440 days ago | link

This seems to be correct on my AMD64:

  #include<stdio.h>
  
  int main(void){
        printf("sizeof(int) = %zu\n", sizeof(int));
        printf("sizeof(long) = %zu\n", sizeof(long));
        printf("sizeof(void*) = %zu\n", sizeof(void*));
        return 0;
  }

  sizeof(int) = 4
  sizeof(long) = 8
  sizeof(void*) = 8
However, I think there may be processors/architectures where sizeof(long) != sizeof(void*). My attempt at a fix was to use a union, but something's wrong with the way closures are handled in it: closures don't seem to have a type associated with them, so I'm not exactly sure how they're supposed to be done.

Edit: I'll probably need to search through the C language specs, though. I'm not sure, but it might be standardized that sizeof(long) >= sizeof(void*).
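For what it's worth, the C standard does not guarantee sizeof(long) >= sizeof(void*): 64-bit Windows, for instance, keeps long at 32 bits while pointers are 64 bits. One hedged way to make the assumption explicit is a C11 compile-time check, so the build fails on such platforms instead of the generated program crashing. The helper names ptr_to_long and long_to_ptr are hypothetical.

```c
#include <assert.h>

/* C11: refuse to build where a long cannot hold a pointer (e.g. 64-bit Windows). */
_Static_assert(sizeof(long) >= sizeof(void *),
               "tagged references assume a long can hold a pointer");

/* Round-trip a pointer through long; only meaningful where the check holds.
   The conversions are implementation-defined but behave as expected on
   mainstream compilers. */
long ptr_to_long(void *p) { return (long) p; }
void *long_to_ptr(long l) { return (void *) l; }
```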

-----

2 points by absz 6439 days ago | link

What you want are the two typedefs intptr_t and uintptr_t. Each is defined as an integer type big enough to hold a pointer (the former signed, the latter unsigned); they aren't mandatory, though, so you might have

  #if defined(__COMPILER1__) || defined(__COMPILER2__)
    typedef intptr_t fixnum;
  #else
    typedef long fixnum;
  #endif
This would guarantee you correct behaviour if intptr_t were defined, and probably give you the correct behaviour even if it weren't.
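A variant of this that needs no list of compilers: <stdint.h> is part of C99, and since intptr_t is optional there, its limit macro INTPTR_MAX is defined exactly when the type is provided. The sketch below keys on that macro; the helper names are hypothetical. Where intptr_t exists, C99 guarantees that converting a void* to it and back yields a pointer that compares equal to the original.

```c
#include <assert.h>
#include <stdint.h>

/* INTPTR_MAX is defined exactly when <stdint.h> provides intptr_t. */
#ifdef INTPTR_MAX
typedef intptr_t fixnum;
#else
typedef long fixnum;   /* fallback: usually, but not always, pointer-sized */
#endif

fixnum ptr_to_fixnum(void *p) { return (fixnum) p; }
void *fixnum_to_ptr(fixnum f) { return (void *) f; }
```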

-----

1 point by almkglor 6439 days ago | link

From this?

  #include<stdint.h>
Hmm. Interesting. How well supported is it?

-----

1 point by sacado 6439 days ago | link

Thanks for the information! I'll try it.

-----

1 point by stefano 6441 days ago | link

Making a fixnum a pointer to allocated memory containing the value is terribly slow.

The infinite loop could be caused by two objects that reference each other (e.g. a references b and b references a), so the GC keeps going from a to b and back. If that is the problem, the solution is: if an object has already been marked, don't follow it.
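A minimal C sketch of that rule in a recursive mark phase; the object layout (two reference slots plus a mark flag) and the name mark are hypothetical, not arc2c's. The recursion itself needs C stack proportional to the depth of the structure, which is the cost binx's copying-GC suggestion avoids.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap object with up to two outgoing references and a mark flag. */
typedef struct object {
    struct object *refs[2];
    int marked;
} object;

/* Mark everything reachable from o and return how many objects were newly
   marked. Already-marked objects are not followed, so cycles terminate. */
size_t mark(object *o) {
    if (o == NULL || o->marked)
        return 0;
    o->marked = 1;
    return 1 + mark(o->refs[0]) + mark(o->refs[1]);
}
```

With a referencing b and b referencing a, marking from a visits each object once and returns 2; marking again returns 0.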

Note: I've not read the source yet.

-----

1 point by sacado 6442 days ago | link

Hmm... That would work because addresses are 4 bytes long, so they always end with 00, right? That's a nice idea :) But it wouldn't work on an architecture with addresses shorter than 4 bytes, would it? There aren't many nowadays, I guess, but I suppose that should be added as an assertion in the code...
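Such an assertion can sit in the allocation path. A sketch, assuming a 2-bit tag; tagged_alloc and TAG_BITS are hypothetical names. What makes the low bits zero is the alignment malloc guarantees (suitable for any object type, which in practice is at least 8 bytes on mainstream 32- and 64-bit platforms), not the width of the address. Note that assert disappears under NDEBUG, so a production runtime would check for NULL explicitly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Number of low address bits the tagging scheme needs to be zero. */
#define TAG_BITS 2

/* Hypothetical allocation wrapper: checks that malloc's result leaves the tag
   bits free before it is used as a tagged reference. */
void *tagged_alloc(size_t size) {
    void *p = malloc(size);
    assert(p != NULL);
    assert(((uintptr_t) p & ((1u << TAG_BITS) - 1)) == 0);
    return p;
}
```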

-----

0 points by stefano 6441 days ago | link

You have to be sure that every allocated object is at least at a 4-byte boundary for the pointer to end with 00.

-----

1 point by ambi 6432 days ago | link

Do you know about ALL_INTERIOR_POINTERS? From gc/doc/README:

"Any objects not intended to be collected must be pointed to either from other such accessible objects, or from the registers, stack, data, or statically allocated bss segments. Pointers from the stack or registers may point to anywhere inside an object. The same is true for heap pointers if the collector is compiled with ALL_INTERIOR_POINTERS defined, or GC_all_interior_pointers is otherwise set, as is now the default."

-----

2 points by sacado 6442 days ago | link | parent | on: arc-to-c : soon on the git

It's on the git now. It now includes fixnums and symbols.

To use it, load arc2c.arc, then run (compile-file "foo.arc"). It will then show you all the transformations it performs and generate a file, foo.c. Compile it with your favorite C compiler, and you're done.

Currently implemented:

- +, -, *, <, >, <=, >=, is, isnt, prn on fixnums
- is, isnt, prn on symbols (including t and nil)
- let, if, fn, set, and, or, quote

Basically, that's all. A few warnings: if you call a nonexistent function, your program will crash without any message. And remember there is no GC: once you run out of preallocated memory, the program dies.

-----

3 points by sacado 6442 days ago | link | parent | on: arc-to-c : soon on the git

Yes. I implemented fixnums with the last bit as a tag bit. If it's 0, the remaining bits are a fixnum. If not, well, it's something else... maybe nil, t, or a reference to something else.

Your idea for callable collections seems quite good, too. But I'm not there yet.

-----
