Create a website-building app, in Arc, which will:
1. give tips on the fly (i.e. while the user is building the site) for properly writing for the web, as per useit.com
2. automatically include a decent search engine
3. make it easy to modify pages
4. include history for page changes (visible only to the website builders I suppose)
5. include an archive for older pages (e.g. for events etc.)
6. make it easy to link to other pages on the website, including back links (i.e. if a new page is related to an older, existing page, you can specify a back link on the new page which creates a link from the older page to the new page)
7. ???
8. Profit
P.S. When I said website-building, no, not a blog or programmer-oriented site: I mean a business site, one which doesn't suck like all the other b2b/b2c sites.
A very nasty lack is that you can have an operation that always redirects, or an operation that always generates HTML (and only HTML; the server automatically assumes it's HTML without asking), but not one which might redirect or might generate HTML (or generate something else, like a bitmap from, say, a ray tracer or ray caster ^^).
> I'm pretty sure that harvest-fnids will die in the split if there are < 2000 fnids and > 18000 timed-fnids, but I haven't verified this.
Haruu, this conjecture seems correct.
Edit: Fixed(?) and on the git. Somebody had better check my code though.
>Is lack of consistency something that bothers other people, or do I have the wrong mindset here?
It bothers me too. For that reason I'm practically rebuilding bits and pieces of the server (e.g. w/html, cached-table).
Basically the only things that can't be redefined on Anarki are:
quote
quasiquote
if
compile
fn
set
lset
This also means that everything else can be redefined - including +, -, *, /, is, isnt, car, cdr, scar, scdr, sref... Sure, it's not exactly "recommended", but exploratory, exploratory...
For instance, scanners redefine car and cdr, and settable-fn's redefine sref.
I propose instead that the compiler detect whether the program actually redefines +, -, *, /, is, isnt, etc. If they aren't redefined (which will be 99.9999% of the time), then the compiler can treat them as builtins. Otherwise, the compiler must treat them as globals and insert their definitions into the code (the same way ccc is inserted); then when the program redefines one, probably using something like (let old car (= car (fn (c) (prn "foo!") (old c)))), the "builtin" is stored as a function in a global, which the program can refer to and replace.
Yep. For the moment, I'm treating them as builtins. They will have to be implemented the way you mention, that's for sure. I guess the easiest way would be to declare them both ways: as primitives (i.e., + translated to %+ when it is met) and as built-in functions, the way ccc is defined.
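To make the two representations concrete, here's a minimal C sketch; everything in it (PRIM_ADD, builtin_add, global_add, the plain-long fixnum) is illustrative guesswork, not actual arc2c output:

/* hypothetical sketch, not arc2c's real output */
typedef long obj;   /* pretend fixnums are plain longs here */

/* the %+ primitive: open-coded at direct call sites like (+ a b) */
#define PRIM_ADD(a, b) ((a) + (b))

/* the built-in function: used when + escapes as a value, e.g. (map + ...) */
static obj builtin_add(obj a, obj b) { return PRIM_ADD(a, b); }

/* the global '+' slot: redefining + overwrites this, and the
   compiler then stops emitting PRIM_ADD at call sites */
static obj (*global_add)(obj, obj) = builtin_add;

Direct calls stay fast, while the function pointer keeps + available as a first-class value and as something the program can replace.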
Hmm, that's true, you do need to handle stuff like:
(map + (list 1 2 3) (list 9 8 7))
Haruu, compiling dynamic languages is hard... ^^
For that matter, it shouldn't be so difficult - basically if the compiler ever sees (set + ...) anywhere (where + is a global, at least), it disables the use of %+.
Possibly, we could add another step in the transformation process (which is why I suggested the structure in http://arclanguage.com/item?id=5598 to allow easy insertion/deletion of steps).
Basically the new step just traverses the structure (without mutating it) and creates a list of built-ins to disable.
Edit: or better - it traverses the structure and replaces + with %+, until it reaches a top-level node which has (set + ...), so that code which doesn't use the redefined + will still refer to the builtin %+.
This is true; we'll have to change tags so that real objects are actual pointers. The alternative is to make everything a pointer, i.e. even "numbers" are really pointers to numbers.
I think it's easier to swap the meaning of numbers with objects rather than build our own GC.
As for infinite loops in GC - do you make sure not to scan a memory area you've already scanned?
Yes, and if he wants to write a GC, I advise him to start with a semi-space copying GC, which is simpler and has no recursion. That way you don't have to reserve an extra bit for mark-and-sweep, and you need no GC stack for depth-first traversal.
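For reference, here's roughly what such a collector looks like in C: a Cheney-style sketch under simplifying assumptions (every object is a header plus pointer-only fields; names like obj_hdr and collect are made up):

#include <stddef.h>
#include <string.h>

typedef struct obj_hdr {
    size_t nfields;           /* number of pointer fields that follow */
    struct obj_hdr *forward;  /* forwarding pointer once copied, else NULL */
} obj_hdr;

static char *from_space, *to_space, *alloc_ptr, *scan_ptr;

/* copy one object to to-space, or return where it already moved */
static obj_hdr *copy(obj_hdr *o) {
    if (o == NULL) return NULL;
    if (o->forward) return o->forward;   /* already copied: no loops */
    size_t bytes = sizeof(obj_hdr) + o->nfields * sizeof(obj_hdr *);
    obj_hdr *n = (obj_hdr *)alloc_ptr;
    alloc_ptr += bytes;
    memcpy(n, o, bytes);
    n->forward = NULL;
    o->forward = n;                      /* leave a forwarding address */
    return n;
}

static void collect(obj_hdr **roots, size_t nroots) {
    scan_ptr = alloc_ptr = to_space;
    for (size_t i = 0; i < nroots; i++)
        roots[i] = copy(roots[i]);
    /* breadth-first: the scan pointer chasing the allocation pointer
       replaces both the mark bit and the depth-first GC stack */
    while (scan_ptr < alloc_ptr) {
        obj_hdr *o = (obj_hdr *)scan_ptr;
        obj_hdr **f = (obj_hdr **)(o + 1);
        for (size_t i = 0; i < o->nfields; i++)
            f[i] = copy(f[i]);
        scan_ptr += sizeof(obj_hdr) + o->nfields * sizeof(obj_hdr *);
    }
    char *t = from_space; from_space = to_space; to_space = t;  /* flip */
}

The forwarding pointer is what makes cycles terminate: an object is copied at most once, and every later visit just returns its new address.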
I don't really like making everything a pointer. I don't know any Lisp implementation working this way (even the slowest ones). It really implies a performance hit every time you use a small integer value (with -2G < small < 2G). Anyway, binx's idea seems to be good...
Well, about infinite loops, I don't know, I stopped yesterday when my head was starting to burn... And gcc always complaining about pointers being of the wrong type (the way it is implemented, with tagged references, I have to cheat a lot). Oh, come on, gcc, who do you think you are, an Ada compiler?
You don't need to make everything a pointer, but you can make everything a structure of 12 or 16 bytes. Lua takes this approach and it performs rather well.
Here's the Lua way:
struct obj {
    int type;
    union {
        double num;
        void *ptr;
    } data;  /* the union needs a member name to be usable */
};
By this means, you can achieve architecture independence.
This approach is the best. In fact this has to be done, because not all computers have sizeof(int) == sizeof(void *); certainly my AMD64 running on a 64-bit kernel doesn't.
Note that we can actually abstract away the representation of objects from the actual binary bits that represent it. For instance I would recommend that objects use the following representation (as noted by binx):
typedef struct s_obj {
    enum e_type {
        other = 0,
        number = 1,
        symbol = 2,
        character = 3
    } type;
    union u_data {
        void *other;
        int number;
        void *symbol;
        Uchar character; //unicode character (assumes a Uchar typedef)
    } data;
} obj;
#define OBJ_TYPE(o) ((o).type)
#define OBJ2FIXNUM(o) ((o).data.number)
#define FIXNUM_MIN INT_MIN //from <limits.h>
#define FIXNUM_MAX INT_MAX
//others defined similarly
However, for machines with the following characteristics:
1. 32-bit pointers and 32-bit int's
2. malloc() always returns an address at a 32-bit boundary (i.e. every four bytes, or with the lowest two bits zero)
the struct is unnecessary: the low bits of the pointer can carry the tag directly, as discussed below.
In fact, maybe it's not a problem even on machines where ints are not as big as pointers. You can treat each value as an int pointer, and, when you need the fixnum's value, do
((int) my_int_ptr) >> 1
If ints are bigger than pointers, that's okay: you lose possible bits, but at least it works (for example, if ints are 6 bytes and addresses 4 bytes, only the 4 least significant bytes are useful; the other 2 are lost). If pointers are bigger than ints, it still works, but this time the lost space is in the pointer representation.
The only thing is that fixnums don't have an architecture-independent size, but that's not a problem in practice anyway. The only constraint is that the biggest fixnum is about
2**(8*min(sizeof(int), sizeof(int*)) - 2)
The last-bit-is-1-for-fixnum trick works on every architecture except machines where addresses are only 8 bits. Well, maybe we can ignore those?
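In C, the whole trick fits in a few macros. A minimal sketch (names illustrative), assuming malloc alignment keeps the low bit of real pointers zero and that >> on negative values is an arithmetic shift:

#include <stdint.h>

typedef intptr_t obj;  /* one machine word, same width as a pointer */

#define IS_FIXNUM(o)   ((o) & 1)
#define MK_FIXNUM(n)   ((obj)(((intptr_t)(n) << 1) | 1))
#define FIXNUM_VAL(o)  ((intptr_t)(o) >> 1)   /* arithmetic shift assumed */
#define MK_PTR(p)      ((obj)(p))             /* low bit already 0 */
#define PTR_VAL(o)     ((void *)(o))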
Actually it is a problem: arc2c output crashes on my AMD64. I'm trying to hack out a replacement, but it's not working yet.
The problem is that apparently the bits of pointers that happen to be beyond the bits of ints are not zero, so assigning a pointer to an int chops off the significant bits.
Oh... Maybe a simple fix would be to change int to long. I know this is not standard behavior, but in practice I think long and pointers are always the same size. mzscheme obviously works that way: every object is a pointer that you can cast to a long if it is a fixnum.
However I think there may be processors/archs where sizeof(long) != sizeof(void *). My attempt at a fix was to use a union, but something's wrong with the way closures are handled in my fix attempt - closures don't seem to have a type associated with them, so I'm not exactly sure how they're supposed to be done.
Edit: I'll probably need to search through the C language specs, though - I'm not sure, but it might be standardized that sizeof(long) >= sizeof(void *)
What you want are the two typedefs intptr_t and uintptr_t. Each one is defined to be an integer big enough to hold a pointer (the former being signed, the latter unsigned); they aren't mandatory, though, so you might have to supply a fallback on platforms that lack them.
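Here's the property that matters, as a tiny self-contained check (where intptr_t exists, C99 guarantees the void*-to-intptr_t round trip compares equal):

#include <stdint.h>
#include <assert.h>

int main(void) {
    int x = 42;
    void *p = &x;
    intptr_t word = (intptr_t)p;  /* big enough to hold any void * */
    assert((void *)word == p);    /* round trip is exact */
    return 0;
}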
Making a fixnum a pointer to allocated memory containing the value is terribly slow.
The infinite loop could be caused by two objects that reference each other, e.g. a references b and b references a, so the GC keeps going from a to b and from b to a. If this is the problem, the solution is: if an object has already been marked, don't follow it.
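In C that fix is just a mark bit tested before recursing; a sketch with a made-up two-field node layout:

typedef struct node {
    int marked;
    struct node *left, *right;  /* pretend every object has two fields */
} node;

static void mark(node *o) {
    if (o == NULL || o->marked)  /* already marked: stop, breaking a->b->a */
        return;
    o->marked = 1;
    mark(o->left);
    mark(o->right);
}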
Basically 'marcup would accept a list representing an abstract syntax tree for the HTML code to be generated.
Then I decided to implement w/html instead, which was almost as good ^^; >.< In fact it's arguably better, since the copious ' marks denote non-Arc code (i.e. HTML tags), as opposed to the marcup style above where , marks denote Arc code.
(def foo ()
  (my-mac))

(mac my-mac ()
  '(prn "foo!"))

(foo) ;this last will also depend on whether you're on Anarki or ArcN
So basically, you need to determine if a global would be assigned with a macro somehow, save the macro-code, and at macro-expansion time detect the macros and execute the macro-code in, say, an Arc interpreter.
In principle, whether a global variable will be assigned a macro is impossible to determine. Arc doesn't distinguish the run-time global environment from the expand-time global environment, so you still have to carry the source information to keep the semantics of Arc-n...
Yes you can - remember that macros must be assigned to global variables before they are used.
So what we do is, we use our own 'eval (this should be easy, this is a lisp after all). However, this 'eval's 'set must determine if it's assigning to a global or a local (again, this should be easy, since it has to know what to write where). Then it checks if a global assign is that of a macro.
For each top-level form, we 'eval the form and check if any global assign is a macro assign. Now we know if a macro has been assigned to a global. Then we compile the form.
The alternative is simply to check if a form has 'mac anywhere in it. If it has (mac ...) or (annotate 'mac ...) anywhere, we run this in 'eval instead of compiling that form.
Really, though, we don't need to keep the semantics of Arc completely. We can add a few rules for macro writers: always use 'mac. Don't use anything other than the built-in functions; or if you really need a non-builtin function for your macro, put it in a 'let form together with your macro, and put two copies of the 'let form (one for the expansion-time evaluator, one for the compiled code).
Alternatively, we could just make a REPL for Arc, in Arc, write it in a form suitable for compilation, and then write the compiler in a form compatible with the REPL and arcN.
   ___________
  v           \
REPL       compiler
  \___________^
Besides: having some macros is better than having no macros.
You can't eval each top-level form at compile-time. They may have side-effects, may expect user input, and may take weeks to terminate (for example, a program doing ray-tracing or nuclear-explosion simulation).
Explicitly distinguishing expand-time and run-time is necessary for real static compilation. If you don't want to distinguish them, you are inevitably forced to compile at run-time.
> You can't eval each top-level form at compile-time. They may have side-effects, may expect user input, and may take weeks to terminate (for example, a program doing ray-tracing or nuclear-explosion simulation).
And you're compiling (not executing) a top-level form that does those things?
All right then - make the limitation that macros must be defined in their own top-level forms, and if a top-level form ever contains 'mac in a function position, or (annotate 'mac ...), then it's executed and any macros it assigns to globals are treated as such: macros. If it terminates in weeks, too bad.
Alternatively just write an interpreted REPL and include the ability to compile code, but not include macros in compilation - macros must be loaded into the REPL, then the actual runnable code is compiled (and the compiled code's macros are ignored).
1) We could add a "fail" continuation which is called upon error. By default the "fail" continuation will be the same fail continuation as the caller, but for forms such as 'errsafe and 'after, we replace the "fail" continuation.
2) Alternatively, we use a global (or thread-local...) variable to store the current fail handler. Then 'errsafe and friends will store the current value of the fail handler in a local, replace it with its own, and then run its subforms; afterwards (whether it returned via fail or normal continuation) it restores the previous value of the fail handler.
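A rough C sketch of option 2 using setjmp/longjmp (all names made up; a real implementation would make current_fail thread-local):

#include <setjmp.h>
#include <stdio.h>

static jmp_buf *current_fail;   /* the currently installed fail handler */

static void err(const char *msg) {
    printf("error: %s\n", msg);
    longjmp(*current_fail, 1);  /* jump to whoever installed the handler */
}

/* roughly what (errsafe body) could compile to */
static int errsafe_demo(void) {
    jmp_buf *saved = current_fail;  /* save the caller's handler */
    jmp_buf here;
    int failed = setjmp(here);
    if (!failed) {
        current_fail = &here;       /* install our own */
        err("boom");                /* the body; this one always fails */
    }
    current_fail = saved;           /* restore on both exit paths */
    return failed ? 0 : 1;
}

int main(void) {
    jmp_buf top;
    if (setjmp(top)) return 1;      /* top-level fail handler */
    current_fail = &top;
    printf("errsafe returned %d\n", errsafe_demo());
    return 0;
}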
Idea: macros
1) We could piggyback off the existing Arc and just use repeated 'macex until we reduce down to 'fn and 'if and function calls.
1.1) Then for metacompilation, we could create two levels of Arc: an interpreted Arc and a compiled Arc. Macros are done in interpreted Arc while full programs are compiled. Basically interpreted Arc would simply be an 'eval for Arc, in Arc.
Idea: Arc built-in library
1) In the sample we already have a "built-in" which is in fact added to the source to be compiled: ccc or call/cc. We could extend/generalize this to include a set of built-ins that are added to the source to be compiled, maybe like so:
; assuming car and cdr are true built-ins
(def-arc2c-builtin map1 (k f l)
  (if l
      (map1
        (fn (rest-v)              ; continuation receiving the mapped rest
          (f
            (fn (v)               ; continuation receiving (f (car l))
              (cons k v rest-v))  ; CPS cons: passes (cons v rest-v) to k
            (%car l)))
        f (%cdr l))
      (k nil)))
> In the sample we already have a "built-in" which is in fact added to the source to be compiled: ccc or call/cc. We could extend/generalize this to include a set of built-ins that are added to the source to be compiled.
I suppose the set of functions to be included would be those in arc.arc and libs.arc, in which case you would just compile arc.arc and libs.arc before compiling the target file, right?
Basically, ccc is handled by detecting if it is used. If it's not used, it's not inserted. When I was talking about extending this, this is what I was referring to: adding code which is defined as "inserted if used".
Simply inserting the entire arc.arc code will add bloat, because most programs don't even use most of the arc.arc code. Given the level of documentation of arc functions (arcfn.com notwithstanding), it is more than likely that the user will not use arc.arc code. So it's better if we simply insert the code if it is used.
Also, it may be better to use arc2c specific code, to take advantage of certain peculiarities in how code is generated. For example, if you decide to use unrolled lists, map1 and friends are better off allocating the full list and then iterating over the values.
This is fine if the compiler is static like Stalin, but if you omit parts of arc.arc that aren't used, you run the risk of not being able to deal with code known only at run time (e.g. (loop:print:eval:read)).
Come now, if you want real support for eval, you'll also need to include the compiler:
(eval `(fn (x) ,@my-variable)) ;how will it build the function?
The alternative is to punt: if there's ever an 'eval, then include arc.arc completely and make 'eval an interpreter. It would somehow use 'symeval to look up globals (and execute global functions as compiled functions), and when it encounters a function, it would build it as an interpreted function (obviously allowing interpreted code to call compiled functions and vice versa).
Basically you create a virtual function:
(def eval (e (o env))
  (if (caris e 'fn)
       (add-attachment 'environment env
         (annotate 'virtual-function
           (cdr e)))
      ...))
(defcall virtual-function (f . args)
  (with (env              (get-attachment 'environment f)
         (arglist . body) (rep f))   ; the (cdr e) stored above
    ; has to be nondestructive
    (zap add-args-to-environment env arglist args)
    (each e body
      (eval e env))))
For macros, that's what I thought too. Anyway, we'll need eval for completeness. I don't know how it will behave with compiled code, though, and I know it's not an easy problem, as Stalin scheme simply ignored it.
For the other points, I haven't thought about them yet :)