The port for HTTP should really be port 80. However, I think most of the people here who have deployed the Arc server put Apache (serving on port 80) between the real world and the Arc server (serving on port 8080). This is generally done because nobody trusts the Arc server to be hackproof yet, especially not the Anarki version, which has been touched by quite a few people and may not be perfectly secure (but at least it has a few more abilities than the Arc2 version).
If you're willing to risk it, then try (nsv 80), which forces the Arc server to listen on 80; you might need to run as root though.
I don't have root access, so naturally (nsv 80) gives a "listen on 80 failed (Permission denied..." error.
Apache is up, but I'm not sure what I can do with it. Is there a simple redirect I need to set up?
To be precise, I've loaded arc2 into a directory [mydomain]/news corresponding to news.[mydomain].com, with news.arc specifying "news.[mydomain].com" as the site URL; I launch the Arc REPL from there ("mzscheme -m -f as.scm") and do "(nsv)"... to no avail.
Try accessing news.[mydomain].com:8080. If you have some sort of terminal access on your remote computer, try lynx http://news.mydomain.com:8080/
As for the Apache <-> Arc talking thing, you'll have to wait for someone else to answer that question; I honestly don't know, because I haven't deployed one yet.
The problem with transforming each Arc function into a C function is the stack space consumed at each call. Sure, gcc does tail call optimization, but not all C compilers are gcc.
Chicken fixes this by keeping track of the stack and GC'ing it when it's full; I'm not sure about Bigloo, but I hear it has a pretty good Scheme function == C function correspondence, so it might be useful to see how it solves the tail call optimization problem.
That said, the current execute() function accepts a pc argument, which specifies which Arc function it begins with. It may be possible to pass the target pc together with a new stack to a new thread, but I don't know pthreads.
I think you could trust every decent C compiler to do tail call optimization, but if you want full portability then the Arc function -> C function mapping doesn't work. pthreads requires you to pass a pointer to a C function, i.e., an address to jump to. We could wrap each thread in a C function and leave all the other functions as they are currently implemented. The C function would just call execute() with the right parameters. If execute() doesn't use global variables, there won't be any race conditions.
Well, I've started moving bits of execute()'s state from globals to locals. However, I do still access a few global variables, specifically the quoted-constants array (those created by 'foo and '(a b c d), etc.); this table is initialized at startup (before the first call to execute()). I would suppose this read-only array is okay to access?
As for wrapping them in C functions: the problem is that the most basic Arc threading function isn't 'thread, it's 'new-thread, which accepts a function as input. 'thread is defined as:
(mac thread body
  `(new-thread (fn () ,@body)))
In theory, new-thread could be called with any function.
Sure, it won't happen most of the time, since most people will sensibly use the simpler 'thread, but exploratory programming being what it is, someone eventually will.
It would be possible to implement if pthreads or whatnot can pass even just a single pointer to the newly-threaded function; if it can't (why not?), then our alternative is to create a separate C stub for each Arc function, each of which just calls the execute() function with the correct number.
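Something like this generated stub, I mean (just a sketch; execute()'s signature is my guess from this discussion, not the real code, and pc 42 is an arbitrary example):

  extern void execute(int pc, void *arg);

  /* generated trampoline for the Arc function compiled at pc 42 */
  static void *arc_fn_42_entry(void *arg) {
      execute(42, arg);
      return NULL;
  }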
---------
Edit: okay, I did a little eensy-weensy bit of research on pthreads, and it seems that pthreads can pass a pointer to the called C function.
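To make that concrete, here's roughly what I have in mind; just a sketch, assuming execute() takes the starting pc plus a single argument (the actual signature may differ):

  #include <pthread.h>
  #include <stdlib.h>

  extern void execute(int pc, void *closure);  /* hypothetical signature */

  struct thread_args {
      int pc;         /* which compiled Arc function to start in */
      void *closure;  /* the function object passed to new-thread */
  };

  /* single C entry point shared by every Arc thread */
  static void *arc_thread_entry(void *p) {
      struct thread_args *a = p;
      execute(a->pc, a->closure);
      free(a);
      return NULL;
  }

  /* what a compiled (new-thread f) might boil down to */
  pthread_t spawn_arc_thread(int pc, void *closure) {
      pthread_t t;
      struct thread_args *a = malloc(sizeof *a);
      a->pc = pc;
      a->closure = closure;
      pthread_create(&t, NULL, arc_thread_entry, a);
      return t;
  }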
This could work. As for global variables: if they are read-only, there won't be any problem. But what happens if you load two different modules (with their constants) in parallel? Maybe loading a file should be made atomic, or at least part of it, such as the initialization of constant values.
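Something as simple as a single load lock around the constant-initialization step would probably do; a sketch (the function names here are made up):

  #include <pthread.h>

  static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;

  /* serialize the part of module loading that fills in the
     quoted-constants array, so parallel loads can't interleave */
  void load_module_constants(void (*init_constants)(void)) {
      pthread_mutex_lock(&load_lock);
      init_constants();
      pthread_mutex_unlock(&load_lock);
  }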
Yes, true. It would also be a pretty simple solution, IMO, although potentially fraught with danger, especially when checking for dead threads.
In any case, I'm currently thinking of adding a pass where, if a global variable is never read (only assigned to), it is simply removed.
Basically: extract the set of global variables that are read, and the set that are written. If the two sets differ, eliminate every assignment to a global that is written but never read (replacing each with its (car ast!subx)). Then eliminate any hanging constants: naked fn definitions and other constants that sit in non-tail positions of sequences. Repeat until the set of globals that are written is a subset of the set of globals that are read; if the two sets still aren't equal, raise an unbound-variable error (i.e., a global is read but never assigned to, which as you mentioned would segfault). Not perfect, of course, because reading a global before it is assigned will still crash, but better than nothing.
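To illustrate the fixed-point part with a toy model (it drops whole assignments instead of replacing them with their (car ast!subx), and it caps the program at 64 globals so the sets fit in bitmasks):

  #include <stdint.h>
  #include <stdio.h>

  /* toy statement: assigns one global (or none) and reads a set of them */
  struct stmt {
      int target;      /* global index being assigned, or -1 */
      uint64_t reads;  /* bitmask of globals this statement reads */
      int live;        /* 1 while the statement is kept */
  };

  void eliminate_dead_globals(struct stmt *s, int n) {
      for (;;) {
          uint64_t read = 0, written = 0;
          for (int i = 0; i < n; i++) {
              if (!s[i].live) continue;
              read |= s[i].reads;
              if (s[i].target >= 0) written |= 1ULL << s[i].target;
          }
          uint64_t dead = written & ~read;  /* written but never read */
          if (!dead) {
              if (read & ~written)          /* read but never written */
                  fprintf(stderr, "unbound global variable\n");
              return;
          }
          for (int i = 0; i < n; i++)
              if (s[i].live && s[i].target >= 0
                  && (dead & (1ULL << s[i].target)))
                  s[i].live = 0;  /* drop the dead assignment */
      }
  }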
This appears to be correct. However, what kind of asynchronous I/O do you need?
It may be possible to write some sort of semaphore-like object that synchronizes across threads; alternatively, I think you can probably query the state of a thread.
Also, how do you want async I/O to behave? Do you want a callback style, where a callback function is called once the I/O completes or fails? Or an async I/O that aborts if it doesn't complete within the necessary time?
Either of the above solutions is implementable using threads.
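For the callback style, a minimal sketch of what I mean, built from a blocking read in its own POSIX thread (error handling elided):

  #include <pthread.h>
  #include <stdlib.h>
  #include <unistd.h>

  struct aio_req {
      int fd;
      void (*cb)(int fd, char *buf, ssize_t n);  /* n < 0 means failure */
  };

  static void *aio_worker(void *p) {
      struct aio_req *r = p;
      char buf[4096];
      ssize_t n = read(r->fd, buf, sizeof buf);  /* blocks only this thread */
      r->cb(r->fd, buf, n);   /* callback must copy what it wants to keep */
      free(r);
      return NULL;
  }

  void async_read(int fd, void (*cb)(int, char *, ssize_t)) {
      struct aio_req *r = malloc(sizeof *r);
      r->fd = fd;
      r->cb = cb;
      pthread_t t;
      pthread_create(&t, NULL, aio_worker, r);
      pthread_detach(t);  /* fire and forget */
  }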
Actually, I can live without asynchronous I/O (something like the POSIX aio_* functions). Just nonblocking reads and select/poll would be enough. I have some experience with threads and believe they're just not worth the trouble in most cases. I find Twisted Python's approach (with the reactor and deferreds) much more elegant, so I wanted to implement something similar in Arc, but it seems I can't, at least not in the obvious way.
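That is, something along these lines (plain POSIX poll; just a sketch):

  #include <poll.h>

  /* wait until one of nfds sockets becomes readable;
     returns the count of ready fds, 0 on timeout, -1 on error */
  int wait_readable(struct pollfd *pfds, int nfds, int timeout_ms) {
      for (int i = 0; i < nfds; i++)
          pfds[i].events = POLLIN;
      return poll(pfds, nfds, timeout_ms);
  }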
Yes, thanks, that would work. But it means that you need exactly N suspended threads while you're waiting for data from N sockets. A thread pool helps a lot, but in the worst case you still need all N threads.
Thread management is done automatically for you by the underlying mzscheme. A bit more research seems to suggest that mzscheme can actually handle blocking I/O even though it uses green threads, and it can also use OS threads if compiled with support for them.
MzScheme does not use OS threads. A long time ago it did, but it is pretty hard to make it work on Windows, OS X and Linux at the same time. Also context switching becomes cheaper without OS threads.
If you have found anything on OS threads in "Inside MzScheme", it must be in the section on how to embed MzScheme into a C program (with its own thread).
Oh, is it fairly common for a Scheme implementation to use green threads? If so, maybe I really shouldn't worry about it. I was only concerned because I thought it was expensive to have one system thread per connection.
Wow, congratulations, you are much more productive than I am! I won't work much on the compiler this week, as I am away from my own computer, so I guess many things will have changed by the next time I explore the code :)
I've been circling this problem for some time now, getting nowhere fast.
The problem is that the current style of implementing closures - where local and closure variables are copied into the closure structure - strikes me as a premature optimization. A useful one: if closures are not nested, all variable references are O(1). But for shared mutable variables we need a different strategy. I'm trying to figure out some way of retaining the current closure style for variables that are not mutated, and using a different one for closure variables that are.
BTW, how come no one else seems to want to work on the compiler? It seems that aside from you, I'm the only committer?
Well, I'm watching with fascination, but it's over my head. I'm working through Essentials of Programming Languages (Friedman, Wand, and Haynes), though, so maybe soon ;)
I'm not convinced of that. It appears to be a macro for building a cache of various things with a timeout. It doesn't appear to touch the stories* table.
On an unrelated note, can anyone explain the memory model/threading mechanism interaction to me? It seemed at first that it was a single-threaded server, but after looking at srv.arc, it's clear that multiple threads are being spawned by the server process, yet the stories*, profs*, and votes* tables don't seem to be rebuilt from permanent storage, which implies they can be shared between threads. Which, I guess, goes back to my original question. So this isn't unrelated at all.
Hmm. After another cursory glance, yes, this appears about correct.
As for memory model: global variables are always shared between threads. The underlying implementation automatically locks tables when mutating them AFAIK.
As for "sucking up memory": I suppose the thinking is that a single post may well take less than half a kilobyte, and modern computers have gigabytes of memory, so...
That was my thought too. I suppose I may have been overestimating the traffic to HN, or the size of the server, or both. I guess I wanted a definitive answer about having to restart the server periodically. That seems to be the case, but it doesn't seem to be too bad.
This seems to be a pretty good model for a moderate-traffic app with data that gets modified infrequently. I don't think I'm quite ready to switch to Arc just for that, but I was wondering: is this a sensible approach in any other language, say, Python? Any thoughts?
Well, Arc is still starting out. In more established languages such approaches are unnecessary: there are libraries that handle caching of data properly, releasing unused objects for garbage collection once they haven't been accessed for some time, and rebuilding them from the permanent store if they have been disposed of. In Arc this isn't implemented yet, so that approach works fine.
That said, you might be interested in the so-called "Anarki" repository, which contains some of the elements I and others have built to make the server work a little better. For example: being able to serve files from subdirectories of your Arc installation, instead of only the installation directory itself; table-like data structures for caching data, or for persistent disk-based data; a slightly more extensible language, with some of the more common methods of extension already prepackaged in macros; etc.
Can you give an example of such libraries, off the top of your head?
I am basically interested in small multi-user applications that don't sit on top of relational databases. There really isn't much information out there on the matter. Everyone seems to want to use a database, even for the simplest things. I suppose that, given the pedigree of Arc, this flat-file storage business seems sensible enough.
Anyway, thanks for the pointer to Anarki. I'll take a look at it.
For that matter, most languages prefer databases because file storage operations don't have, umm, structure.
In fact the canonical Arc web app, news.arc, stores a list structure in its "flat" files. Thus for simple apps, where entities only have a few not-very-complex fields, textual representations of lists seem to be enough.
In other languages, however, the "array" syntax (which is approximately what lists are in Arc) is usually not readable by a built-in function a la Lisp's 'read. Their array syntax is also usually not the center of attention, unlike in Lisp, where the code syntax is itself the "array" syntax.
1) I use a new structure, the sharedvar, which doesn't correspond to any Arc structure. See the other post which had me blinking stupidly at stefano's transformation before I realized how cool it was: http://arclanguage.com/item?id=5784
2) This new structure is untyped (i.e., it doesn't have a type tag, unlike the pair and symbol structures). I intend to make shared variables "seamless" to the upper Arc, though, so this should be fine...
3) My original name for this structure was closure-settable, which I shortened to sharedvar instead.
Making the new structure untyped shouldn't create problems, but it could make development difficult: if you have a bug in the transformer or in something related to it, you'll probably get a segfault instead of a clean "Not a sharedvar" error. The check could then be removed when the code is released.
Hmm. So far the transformation seems correct anyway; really, the transform is quite simple and appears mathematically sound. Also, because of the way the compile-file driver is structured, it would be possible to remove or replace each individual transform.
Which reminds me, we need to have a proper error continuation too.
The base idea is pretty good, although the 'cons cell (which is composed of a type id, a car pointer, and a cdr pointer) can be replaced with a smaller "closure-settable" cell, which would be an untyped object composed of a single pointer:
(fn (x)
  ; % means it's a primitive
  (let x (%closure-settable x)
    (list
      (fn () (%closure-settable-read x))
      (fn (v) (%closure-settable-write x v)))))
Because the closure variable abstraction is not accessible from the high-level Arc language, the closure-settable doesn't need to be type-tagged.
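In C the cell could be as small as this (a sketch; the names are illustrative, not from the actual source):

  #include <stdlib.h>

  /* untyped one-pointer cell: no type tag, just the boxed value */
  typedef struct { void *val; } closure_settable;

  static closure_settable *cs_new(void *init) {
      closure_settable *c = malloc(sizeof *c);
      c->val = init;
      return c;
  }
  static void *cs_read(closure_settable *c)           { return c->val; }
  static void  cs_write(closure_settable *c, void *v) { c->val = v; }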