I also had a problem with tab and also think that it should be fixed.
Btw, what do I need to load to get the base language without any web stuff? Is there an official "base" part? As.scm loads all the libraries for me. Do I need to load only arc.arc to get the base language?
I guess these issues will eventually be fixed. These are only bugs, not design issues.
As for is-a vs. has-a, I was wondering, isn't the biggest mistake considering that a piece of data only has one type ? That is, isa & type are broken ?
Considering values have a list of types (instead of a single type) would fix a few problems. For example, what is the type of nil ? Is it a symbol or a list ? Yes, it is both, but to do so, pg had to define the alist function ! (type nil) returns sym and cannot let you know nil is a list. What is '(1 2 3) ? Is it a cons or a list ? It is a list, but its type is only cons. type should return a list (it is already possible through annotate) and the definition of isa should be :
And the system would remain very generic (even more than now). Thus, almkglor's scanners would work with each (provided the definitions of car and cdr were adapted) :
(type s) -> '(cons scanner)
(isa s 'cons) -> t
Other representations would be possible, including the ones we already discussed here. The name isa does not necessarily mean we have an is-a semantic (and not has-a). Once again, each item in type is an annotation and can mean many things that are left to the programmer (it can be a class, a class type, a bunch of functions, whatever).
But the "objects-have-one-type" model seems broken. At least for lists, which is quite annoying for a Lisp !
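To make the "list of types" idea concrete, here is a minimal sketch in Python rather than Arc. It is not the Arc definition of isa; the names Tagged, types_of and isa are hypothetical, and the tags chosen for built-in values are assumptions made only for illustration.

```python
class Tagged:
    """A value annotated with one or more type tags (hypothetical helper)."""
    def __init__(self, types, rep):
        # Accept a single tag or a sequence of tags.
        self.types = (types,) if isinstance(types, str) else tuple(types)
        self.rep = rep

def types_of(x):
    """Return every type tag carried by x, instead of a single one."""
    if isinstance(x, Tagged):
        return x.types
    if x is None or x == []:
        return ("sym", "list")      # nil is both a symbol and a list
    if isinstance(x, list):
        return ("cons", "list")     # a non-empty list is a cons and a list
    return (type(x).__name__,)

def isa(x, typ):
    """True if typ is among the tags of x."""
    return typ in types_of(x)

s = Tagged(("cons", "scanner"), "foo bar")
print(isa(s, "cons"), isa(s, "scanner"), isa(None, "list"))
```

With this shape, a predicate like alist becomes an ordinary isa test, and a scanner can claim to be a cons without hiding that it is also a scanner.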
"objects-have-one-type" model helps with nex3's defm:
(defm something ((t f scanner))
  (do-something-on-scanner f))
(defm something ((t f cons))
  (do-something-on-real-cons-cell f))
In this case, if we pass an object that is-a scanner and is-a cons, which method gets called?
This is the main reason I'm advocating is-a and has-a separation. We can say that an object is-a 'cons cell and has-a scanner. If something requires that an object is-a real, element-pointer-and-next-pointer 'cons cell, as opposed to something that requires that an object has-a 'car and 'cdr, we can make the distinction.
So we can say that an object is-a 'cons cell - it's what it really is, what it's implemented with. However, a 'cons cell has-a scanner interface, and if it's proper it has-a list interface, etc.
Separating is-a and has-a could also be useful for optimization of basic parts.
For example, a basic non-optimized diff algorithm might operate on has-a 'scanner, and use 'car and 'cdr operations. However, a string-scanner is really just a wrapper around a string, an index into the string, and the string's length. Each 'string-scanner object contains three slots: one for the string, one for the index, and one for the length. Each 'cdr on a string-scanner therefore creates another three-slot object.
Now suppose we have a version of the diff algo which specifically detects if an object is-a string-scanner. It destructures the string-scanner into the string, index, and length, and instead of carrying around a triple of (string, index, length) it only carries the index, keeping the string and length in local variables. This reduces memory consumption to one-third.
(Note that the diff algo I posted a while back actually keeps entire sections of the list, in order to properly scan through their differences; that is, it keeps several scanners)
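A rough Python sketch of that trade-off, under the assumption that a string-scanner is just a (string, index, length) triple. All names here are hypothetical, and a simple character count stands in for the diff algorithm.

```python
from typing import NamedTuple, Optional

class StringScanner(NamedTuple):
    """Hypothetical scanner: a string, an index into it, and its length."""
    string: str
    index: int
    length: int

def string_scanner(s: str) -> Optional[StringScanner]:
    # An empty string yields None, playing the role of nil.
    return StringScanner(s, 0, len(s)) if s else None

def car(sc: StringScanner) -> str:
    return sc.string[sc.index]

def cdr(sc: StringScanner) -> Optional[StringScanner]:
    # Each cdr allocates a fresh three-slot object.
    nxt = sc.index + 1
    return StringScanner(sc.string, nxt, sc.length) if nxt < sc.length else None

def count_generic(sc, ch):
    """has-a path: uses only car and cdr."""
    n = 0
    while sc is not None:
        if car(sc) == ch:
            n += 1
        sc = cdr(sc)
    return n

def count_fast(sc, ch):
    """is-a path: destructure once, then carry only the index."""
    if sc is None:
        return 0
    string, i, length = sc
    n = 0
    while i < length:
        if string[i] == ch:
            n += 1
        i += 1
    return n
```

Both functions return the same answer; the second simply avoids building a new triple per step.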
Hmm, I think there are 2 really different concepts here :
- type declaration of real-implementation (what you call "is-a"),
- type declaration in the sense of "capabilities" an object has (what you call has-a).
I think they should really be distinguished. The former is about optimized compilation, the latter about which functions can be applied to a given object.
But optimization is linked to variables (e.g. "in this block n always holds an integer, s always holds a string and l is always a cons") and does not need to be declared until you want to compile something.
In contrast, capabilities are linked to values (e.g., "n, s and l are all scanners, they all have scanner capabilities, you can apply car and cdr to all of them". This is currently true, but could change if the values referenced by n, s or l change). These are mandatory, and have to be known dynamically (this is not a declaration in the static sense; they can even change later). When you apply car to a variable, you must know whether its attached value can answer it (and, if so, how).
(= str (string-scanner "foo bar baz"))
(type str)
-> (scanner string)
(def scan (s)
  (if (no s)
    ""
    (cons (foo (car s)) (cdr s))))
There, the values held by s are considered as a scanner and a string, that is, car and cdr can be applied to them. A dispatch algorithm is applied to them at the moment we need it. If, at any moment, an object held by s does not support car or cdr, we get an error. Until we want more speed, that's enough.
Now suppose we want more. All we have to do is :
(def scan (s)
  (istype string-scanner s
    (if (no s)
      ""
      (cons (foo (car s)) (cdr s)))))
That way, for optimization purposes, we state that s only holds string-scanner objects. It does not even have to care about the annotations you added to the value (or values) held by s. If an object held by s is not really a string-scanner, well, anything could happen.
I might be wrong, but I think super-optimizing CL compilers work that way. You tell them "optimize that function, and btw, this var always holds strings, don't even bother checking and dispatching the right function".
Quite accurate. The main thing is that I think people should use has-a for everyday programming, and only use is-a if absolutely necessary, e.g. optimization.
My second proposal, probably lost somewhere in the confusion, is that has-a information would be connected to an object's is-a type.
Yes, but then it does not work with the core functions (notably each & friends), as they rely on isa. Renaming your function isa does not work either (that would be too easy) : atom and some call isa themselves, so you get into an infinite loop. Btw, obj is a macro (at least in Anarki, I don't know if it's present in the official Arc2), so your code has a red flag on it (although it does seem to work).
I tried this, it does work :
(redef isa (x typ)
  (isa-rec type.x typ))
(def isa-rec (types typ)
  (if
    (no types) nil
    (is types typ) t
    (isnt type.types 'cons) nil
    (is (car types) typ) t
    (isa-rec (cdr types) typ)))
(isa nil 'sym) -> t
(isa (annotate '(sym list) nil) 'sym) -> t
(isa (annotate '(sym list) nil) 'list) -> t
(isa (annotate '(sym list) nil) 'cons) -> nil
And now the funny part :
(redef car (x)
  (if (isa x 'int)
    (if (> rep.x 0)
      'a
      nil)
    (old x)))
(redef cdr (x)
  (if (isa x 'int)
    (if (> rep.x 0)
      (annotate type.x (- rep.x 1))
      nil)
    (old x)))
Both car and cdr now work on objects of type 'int (and objects of type 'cons, as before). If an object is both an int and a cons, its int side takes precedence. Every operation requiring 'cons cells or calling car and cdr can now be overridden to use ints instead (an int being a list whose car is the symbol 'a and whose cdr is that number minus 1).
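The same trick can be sketched in Python to see what "an int is a list of 'a" amounts to. This is an illustration only: Python has no annotate or rep, so plain ints are used; cons cells are modelled as (head, tail) tuples with None as nil; and treating 0 as the end of the list in walk is an assumption of this sketch.

```python
def car(x):
    """car extended to ints: any positive int has 'a' as its head."""
    if isinstance(x, int):
        return "a" if x > 0 else None
    return None if x is None else x[0]

def cdr(x):
    """cdr extended to ints: the tail of n is n - 1."""
    if isinstance(x, int):
        return x - 1 if x > 0 else None
    return None if x is None else x[1]

def walk(x):
    """Collect elements using only car and cdr; stops at nil or at the int 0."""
    out = []
    while x is not None and x != 0:
        out.append(car(x))
        x = cdr(x)
    return out

print(walk(3))                  # ints behave like lists of 'a'
print(walk((1, (2, None))))     # real cons cells still work
```

Any function written purely in terms of car and cdr, like walk here, picks up the new behaviour without being changed.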
The problem is that each calls acons and alist, but they are defined in terms of (is (type x) 'cons) instead of (isa x 'cons). Once you redefine them, it works fine.
pg, do you still accept suggestions about the core functions ? Shouldn't acons and alist be defined with isa instead of is ? That would let us redefine them more easily. Btw, what do you think of all these discussions about types ?
My current major objection is the size of the headlines, which I think should be larger. Alternatively make the summaries shorter. How do you generate the summaries?
Also, the bottommost part appears too ragged for me. Some amount of measurement may be possible.
A final suggestion: perhaps make the highest-ranked 1 or 2 headlines near the top of the page in even larger text (in addition to the headline+summary currently on the page). The top headlines in large text should maybe be just headlines, no summary, but with an anchor link to the headline+summary version in the rest of the page. Basically, something a little like the banner headline on the front page of a newspaper.
Thanks for the feedback. The summaries are put in manually when the comment is posted, so nothing fancy. In practice one just has to remember to grab some text before using the bookmarklet (which fills in the rest).
A lot of people have said that the ragged bottom takes away a bit from the look and feel. There doesn't seem to be an easy way to fix that. The best I can come up with is to try and estimate the line height of each story, and if it exceeds a certain amount then truncate it, but only if it is on a page w/ other stories.
I'll play around with the top-top headlines idea. It is in line with the look and feel of newspapers.
Thanks for the suggestions!
-----
Edit: The headlines are now bigger. It is better, thanks!
I moved them from 1em to 1.25em after trying it via firebug.
Regarding summaries, I think it would be possible to actually pull some text using Arc (although currently there seems to be no decent way to open a client connection, and certainly there don't seem to be any libraries for client-side HTTP). Of course the summarization would have to be done too. Hmm.
What I could do is if you use the bookmarklet, then it will pull in a best guess as to what the summary is and put it into the box. Then the user can edit that if they want. That wouldn't be too hard.
With ajax I could do the same thing once the user fills in the url.
IMO the hard part is the "best guess". ^^ I've been looking for papers about summarization and haven't found much. Hmm. Maybe look at the title and try to fetch words around words in the title, i.e. use the title's terms as search terms.
There is an easy way and a hard way to do it. For most articles, if you take the first element in the DOM that is a paragraph and contains above a certain number of words, then I am guessing that would most times be the leader paragraph.
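The easy way could look roughly like the following Python sketch, which uses only the standard-library html.parser. The threshold of 20 words and the names ParagraphCollector and guess_summary are arbitrary choices for illustration, and real pages would likely need more care (scripts, nested markup, site-specific layouts).

```python
from html.parser import HTMLParser

class ParagraphCollector(HTMLParser):
    """Collects the text of each <p> element in document order."""
    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = None              # None while outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()             # an unclosed <p> ends at the next <p>
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def _flush(self):
        if self._buf is not None:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buf).split())
            self.paragraphs.append(text)
            self._buf = None

    def close(self):
        super().close()
        self._flush()

def guess_summary(html, min_words=20):
    """Return the first paragraph with at least min_words words, or None."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    for p in parser.paragraphs:
        if len(p.split()) >= min_words:
            return p
    return None
```

The result would only be a best guess to pre-fill the summary box, which the user can then edit.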
The second easy way is to do the above most of the time, but have some site-specific things that are used instead.
You could also use some classifying software to ID the proper paragraph. You could have a training set of all the descriptions that have been on the site before, and find text that most matches that text, and use that. Or find the first bit of text that matches beyond a certain threshold, and use that.
The hardest way is to automatically generate a summary. I work in the automated document analysis business, and this is indeed pretty hard to do.
If I'm right, there are many concepts behind Unicode :
A Unicode string is a sequence of codepoints (or characters).
A codepoint (or character) is a numeric id that can be represented in many ways (UTF-32 : always 4 bytes ; UTF-8 : only one byte for codepoints < 128 ; from 2 to 4 bytes for codepoints >= 128 ; ...).
A glyph is what you display on the screen : a Unicode string is displayed as a sequence of glyphs. A glyph can be a single character, or the combination of 2 or more characters.
e.g., 10 glyphs on the screen can be represented as 11 characters (the last two being composed into a single glyph). Depending on the underlying "physical" encoding, these 11 characters can occupy 44 bytes (UTF-32) (with O(1) access to substrings) or, say, 25 bytes (UTF-8) (with O(n) access to substrings), or ...
In a few words : characters have a meaning in Unicode, but they don't match well with bytes (or with the physical representation in general) and, sometimes, with the way things are displayed on the screen.
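A small Python illustration of the three levels (glyphs, codepoints, bytes), using a combining accent as the example. The approx_glyphs helper is a hypothetical and deliberately rough estimate: it only skips combining marks and does not implement full grapheme segmentation.

```python
import unicodedata

# "cafe" followed by COMBINING ACUTE ACCENT: 4 glyphs on screen, 5 codepoints.
s = "cafe\u0301"

codepoints = len(s)                        # number of codepoints
utf8_bytes = len(s.encode("utf-8"))        # 1 byte per ASCII char, 2 for U+0301
utf32_bytes = len(s.encode("utf-32-le"))   # always 4 bytes per codepoint, no BOM

def approx_glyphs(text):
    """Rough glyph count: codepoints that are not combining marks."""
    return sum(1 for c in text if unicodedata.combining(c) == 0)

# NFC normalization merges "e" + accent into the single codepoint U+00E9.
composed = unicodedata.normalize("NFC", s)

print(codepoints, utf8_bytes, utf32_bytes, approx_glyphs(s), len(composed))
```

The same text thus has different lengths depending on whether one counts glyphs, codepoints, or bytes in a given encoding.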
Yep. As '() () and nil are equivalent, maybe #\x and "x" should be equivalent too (and so that would work with #\null too : it is the only way to read it (I think), but it is used as a 1-char-long string).
Wow, I think you won the prize for the longest post so far. And it is even a very clever one, actually. And I think your view and almkglor's are not so far from each other.
You state that there should only be the few basic types currently defined in Arc. Paul's idea was to eventually get rid of strings (they are a special kind of list) and even numbers (they are a special kind of list too...). But he finally didn't, and won't, at least for numbers. He also said that this view (as few basic types as possible) finally forced him to develop a basic type system (with annotate, type, rep and isa) to distinguish between the raw list '(a a a a a) as the number 5, as the string "aaaaa" or as an actual list of 5 symbols.
One way or another, you need explicit types if you want some kind of dynamic typing. Assembly language works the opposite way : e.g. you state (explicitly or not) that the arg of your function is a number. If the user gave you what he considers a string, too bad for him, because you can't distinguish between them. That means your function can't be polymorphic and you are stuck in an even more constrained space than with user types.
You need an isa function (call it isa or hasa, never mind for now). For example, car should be given a list, and nothing else. To ensure this, you have to check its type. If you want to redefine car so as to take scanners, generators, ... into consideration, that's easy too : just define your own version of car : if arg is a list, call the original car, else arg is a generator à la Python, so funcall it :
(let _car car
  (def car (x)
    (if (isa x 'list)
      (_car x)
      (x))))
Ok, you're right so far, cchooper : predefined types are enough for these situations, and that's what we should do in such cases. Now imagine we want to deal with lists, generators and arrays defined through FFI. You have to distinguish between the latter two, but how can you do that ? Encapsulating them in a cons whose car is a discriminant between both types will not work here, as a cons isa 'list. That's why you need a way to define these new types, and that is what annotate is for.
(let _car car
  (def car (x)
    (if (isa x 'list)
          (_car x)
        (isa x 'generator)
          ((rep x))
        (isa x 'array)
          (a-get x 1)
        (err "Not a valid type for car : " type.x))))
Now, about the distinction between isa and hasa. The wonderful thing about annotate is that it is very generic ! It does not provide you with a way to say your data is of a specific type, it lets you annotate your data with whatever data you want ! The fact that it works with isa is a side effect actually, annotate does not care about isa. You can annotate with a symbol for sure, but also a string, a number, a list, a macro, a closure, a continuation, whatever !
That means you can do something this way, for example :
And you've got an object system where the car function is embedded into the data when you don't apply it to lists. That's it, you used annotate as it is now defined to create a has-a behavior. Almkglor has got many other funny ideas with typing, and I think all of them can be implemented simply with annotate and encapsulating old definitions of core functions and axioms into usertype-aware ones.
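One way to picture "the car function embedded into the data" is the Python sketch below. It is not Arc code and not a reconstruction of any particular post: Annotated is a hypothetical stand-in for annotate, and using a dict of functions as the tag is just one possible convention.

```python
class Annotated:
    """Hypothetical stand-in for annotate: any tag plus a representation."""
    def __init__(self, tag, rep):
        self.tag = tag    # here: a dict of functions, but it could be anything
        self.rep = rep

def car(x):
    """car that falls back to behaviour carried by the annotation (has-a)."""
    if isinstance(x, (list, tuple)):
        return x[0] if x else None
    if isinstance(x, Annotated) and isinstance(x.tag, dict) and "car" in x.tag:
        # The data brings its own car; no central list of types is needed.
        return x.tag["car"](x.rep)
    raise TypeError("Not a valid type for car: %r" % (x,))

# A generator-like object: its car calls the wrapped function.
gen = Annotated({"car": lambda f: f()}, lambda: 42)
# An array-like object: its car reads the first stored item.
arr = Annotated({"car": lambda d: d["items"][0]}, {"items": [7, 8]})

print(car([1, 2, 3]), car(gen), car(arr))
```

New kinds of objects can then be added without touching car again, which is the point of the has-a approach.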
Maybe Arc needs a few more facilities right there (for example, having to use rep on annotated data is, I think, the biggest mistake of that type system. Please, pg, correct this !), but I think we've got everything we need. Almkglor only proposes a few macros and discipline in libraries, but this can be done with the current language definition (and ignored by everybody but him :)
> Wow, I think you won the price of the longest post so far.
Well someone has to load-test this thing!
> And it is even a very clever one, actually.
Thanks :)
> having to use rep on annotated data is, I think, the biggest mistake of that type system.
It is exactly this problem that got me thinking about types. I've been tempted to create a few types with annotate already, but each time I stopped because I didn't want to have to reimplement every function to work on my new type. Each time, I found a different solution to the problem that didn't require new types, which started me thinking "hey, perhaps we don't need new types after all!"
But you're right that you'll always need new types eventually. The solution you suggested is, I think, the right one. It's a bit like the object system in ANSI Common Lisp (pg even used hash tables to store the object's methods!) but it uses annotate to associate methods with objects.
So I'll modify my position and say that you should avoid creating new types, but if you have to do it, duck typing is the way to go.
I particularly like that one : "I expect type names will ordinarily be symbols, but they don't have to be. Either argument can be of any type. I can't imagine why users would want to have type labels other than symbols, but I also can't see any reason to prevent it."
You know, the first thing I thought when I read that x years ago was "You could pass around a load of functions as the type to do polymorphism"! I wonder if everyone has the same thought.
I'm using Arc at my work for a webapp. Really not a web site, just an app with a web frontend. I first used Arc0 (yep !). Switching to Arc1 broke a few minor things (the thread function changed). Switching to Arc2 did not cause any problem.
This is a single Arc process. As I needed it to be secured (through SSL), I use an Apache frontend. Requests on the SSL port are redirected on the 8080 port, which is Arc's port. Accessing directly this port (that is, not through SSL), is forbidden (see defsop macro).
I'm using Ubuntu 7.10's version of Mzscheme (360). I don't have any persistence problem (this is only a frontend to the underlying filesystem, and logins/passwords are handled by Arc's web server). I don't have any problem with access time, scaling and such things as I have only very few users and as it is a local application.
Adding / removing / adapting features in this app is a real pleasure. I never felt so productive in developing a webapp. Now, of course, this is a small one, I don't know how it feels to develop, say, a Hacker News forum...
At first sight, I totally agree with you. Characters seem even less useful as, Arc strings being Unicode, you can't use characters to manipulate raw bytes of data.