Arc Forum
Ask AF: Ordered Tables?
3 points by kinnard 2162 days ago | 33 comments
What would it take to add ordered tables[1] to Arc? The behavior is different from that of tables and alists:

  arc> (= atist '((a "up") (b "down")))
  '((a "up") (b "down"))

  arc> atist.0
  '(a "up")

  arc> atist!a
  <: contract violation
    expected: real?
    given: 'a

  arc> (= testble {a "up" b "down"})
  arc> testble!a
  "up"

  arc> testble.0
  'nil
An ordered table behaves differently from alists on index access, returning a value rather than a key-value pair, and also supports key lookup returning a value:

  arc> (= grail (otable a "up" b "down"))
  arc> grail.0
  "up"
  arc> grail!a
  "up"
Q: If special syntax support for alists were added, would a!k return a key-value pair (`assoc`) or just a value (`alref`)?

[1] https://en.wikipedia.org/wiki/Associative_containers



3 points by aw 2161 days ago | link

> Q: If special syntax support for alists were added, would a!k return a key-value pair (`assoc`) or just a value (`alref`)?

One option is to extend calling lists so that calling a list with a number would continue to do the same thing (return the item at that position), while calling a list with a non-number would treat the list as an association list and do a lookup on that key.

Then no special syntax is needed because the standard `alist!x` would work, as it expands into `(alist 'x)`.

Naturally, this means that you couldn't do a lookup in your alist with a number using the `!` syntax, but that might be OK for you if you aren't using numbers in keys in the alists that you want to use the `!` syntax with.

This option wouldn't return the alist value on an index access though, as it would continue to return the association pair.
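The first option can be sketched roughly as follows. This is a Python analogy rather than Arc, and the class name `CallableAlist` is made up for illustration. It also assumes the key lookup returns just the value (as `alref` does), which the proposal above doesn't spell out:

```python
# Python analogy (not Arc): a list of (key, value) pairs that can be called.
class CallableAlist:
    def __init__(self, pairs):
        self.pairs = list(pairs)

    def __call__(self, arg):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(arg, int) and not isinstance(arg, bool):
            return self.pairs[arg]       # a number indexes and returns the whole pair
        for key, value in self.pairs:    # anything else is treated as a key
            if key == arg:
                return value             # assumption: return the value, like alref
        return None                      # a miss returns None, standing in for nil


atist = CallableAlist([("a", "up"), ("b", "down")])
```

Here `atist(0)` gives the pair `("a", "up")` and `atist("a")` gives `"up"`. As noted, a number can never be used as a key through this calling convention.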

Another option would be to create a new type (e.g. using `annotate`) and then have calling objects of that type do what you want (for example, in Anarki you could use `defcall`).

-----

2 points by kinnard 2160 days ago | link

So reasoning anaphorically:

testible.clef means look up the value stored at what clef evaluates to in testible

testible!clef means look up the value stored at clef in testible

testible.0 means look up the value stored (at what 0 evaluates to) in testible

testible!0 means look up the value stored at 0 in testible

atist.0 means look up the value stored at what 0 evaluates to in atist

atist!0 means look up the value stored at 0 in atist

0 of course being an atom evaluates to 0

But what if you want it to evaluate to something like the key stored at 0? If quote means something like "don't evaluate" and unquote means something like "do evaluate", then one could reason that atist,0 means look up the value stored at what 0 evaluates to (do evaluate it) in atist:

  arc> atist.0
    '(a "up")
  
  arc> atist!0
    '(a "up")
  
  arc> atist,0
    "up"
! and . behave the same with alists and numbers. While it's inconvenient if you want to access the key, it makes sense.

Is this reasonable?

Edit: this could work for alists and insertion-ordered tables. It's not obvious how testible!0 and testible.0 should behave, since numbers can and should be able to be keys, so one can imagine a situation where the behavior would be like so:

  arc> (= oble {1 "dream" 2 "bigger" 0 "awake"})
    {1 "dream" 2 "bigger" 0 "awake"}
  arc> oble.0
    "awake"
  arc> oble!0
    "awake"
  arc> oble,0
    "dream"

-----

2 points by akkartik 2159 days ago | link

Pasting from my chat to you:

"Immediate reaction: this is a bad idea. Commas mean something specific in Lisp. And having `0` mean different things in different contexts is a recipe for disaster.

"I prefer aw's proposal above. Support list indexing, support alist lookup, don't support alist lookup by integer keys. Not the end of the world, people can just use `alref` in that situation."

-----

3 points by i4cu 2161 days ago | link

Just out of curiosity, what's your use case for needing ordered tables?

If it's just to look up an entry by index position, IMHO that's not really a good trade for inefficient, memory-heavy tables.

In my experience, more often than not, people think they need it when really they don't - and actually they're worse off for using it.

-----

3 points by kinnard 2161 days ago | link

It's for an application called pipeline: https://github.com/Kinnardian/pipeline

Meant to operate as a generalized pipeline for items to move through.

Could function as a basis for a kanban or as a sales management system.

In which cases the "stacks" as I call the stages in a pipeline would be:

"todo" "fortnight" "week" "today" "done"

"attention" "interest" "decision" "action" "love" "refer"

Respectively.

I was working on converting the whole thing from lists (https://github.com/Kinnardian/pipeline/commits/templatize) when I realized that tables are not order preserving (I can't believe this never came up working with json).

Order definitely matters, and I'm envisioning functionality like repeating tasks that, once moved to the "done" stack or marked done, wrap around to `pipe.0` (imagine something that must be done once a month or once a week). I suppose I could tag items as monthly or weekly, in which case they'd get appended to pipe!month or pipe!week as soon as they're done.

Another example is the `(createItem)` function, which should take an item and an optional "stack" parameter. If no stack parameter is given, items should be appended to the zeroth stack, `pipe.0`, identified by index because it will be named by different keys in different pipelines.

Another example is a `(push)` function which pushes an item from its current stage to the "next" stage. With order this is simple.
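A minimal sketch of these operations, in Python rather than Arc, assuming a pipeline is modeled as an ordered list of (stack name, items) pairs. The names `make_pipeline`, `create_item`, and the `wrap` flag are hypothetical and not taken from the pipeline repo:

```python
# Hypothetical model: a pipeline is an ordered list of (stack_name, items) pairs.
def make_pipeline(names):
    return [(name, []) for name in names]


def create_item(pipe, item, stack=None):
    """Append item to the named stack, or to the first stack if none is given."""
    if stack is None:
        pipe[0][1].append(item)          # position 0, whatever its name is
        return
    for name, items in pipe:
        if name == stack:
            items.append(item)
            return
    raise KeyError(stack)


def push(pipe, item, wrap=False):
    """Move item to the next stack; return that stack's name."""
    for pos, (_, items) in enumerate(pipe):
        if item in items:
            nxt = pos + 1
            if nxt == len(pipe):
                if not wrap:
                    raise ValueError("item is already in the last stack")
                nxt = 0                  # repeating tasks wrap around to the start
            items.remove(item)
            pipe[nxt][1].append(item)
            return pipe[nxt][0]
    raise KeyError(item)
```

For example, after `pipe = make_pipeline(["todo", "fortnight", "week", "today", "done"])` and `create_item(pipe, "taxes")`, calling `push(pipe, "taxes")` moves the item into "fortnight". Because items are appended, order within a stack is preserved too.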

EDIT: Order even matters within a stack. In the current kanban system I have a bunch of things I need to do today, the top priorities are actually at the top. I wouldn't want them getting all disheveled.

. . .(please excuse my disheveled codebase)

-----

3 points by rocketnia 2159 days ago | link

"I can't believe this never came up working with json"

I think order-preserving tables are pretty useful for JSON -- particularly to help ensure implementation details aren't leaking even in a distributed system where JSON is being passed through multiple parsers and serializers -- but it's worth noting that even JSON objects are specified to be unordered and are commonly treated as such. (RFC 8259: "An object is an unordered collection ...")

Even JavaScript's standard JSON.parse(...) doesn't preserve order the way you might like it to. Here's what I get in Firefox right now:

  > JSON.stringify(JSON.parse("{\"a\": 10, \"1\": 20}"))
  "{\"1\":20,\"a\":10}"
This is because JavaScript objects historically didn't preserve any particular iteration order, and although they have more consistent cross-browser behavior these days, they seem to have settled on some rules like iterating through numeric properties like "1" before other properties like "a". (From what I've found, I see people saying this behavior is a standard as of ES2015, but I don't see it in the ES2018 spec.)

JavaScript Maps are a different story; they're a newer addition to the language, and they're strictly specified to preserve insertion order. Other languages use dictionaries which preserve order, including Groovy (where [a: 1, b: 2] is a LinkedHashMap literal), Ruby, and... you've already found details on Python.

Even in Groovy, JavaScript (Maps), and Ruby, the only way to observe the order of their collections is to iterate from the beginning, and the only way to influence it is to insert entries at the end. This means any substantial interactions with the order of these collections will be just about as inefficient and inconvenient as if we had converted the whole collection to an association list, processed it that way, and converted it back. Essentially, it seems the order is preserved only to prevent programs from having language-implementation-dependent bugs, and maybe to help with recognizing values in debug output, not because it's useful for programming.
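Python's built-in dict (insertion-ordered as a language guarantee since 3.7) illustrates the same limitation. In this sketch the helper names `nth_value` and `insert_at` are made up for the example. Reading the nth entry means walking from the start, and inserting anywhere but the end means rebuilding through a list of pairs:

```python
from itertools import islice


def nth_value(d, n):
    """Return the value at insertion position n by walking from the start (O(n))."""
    if n < 0:
        raise IndexError(n)
    try:
        return next(islice(d.values(), n, None))
    except StopIteration:
        raise IndexError(n) from None


def insert_at(d, pos, key, value):
    """Return a new dict with key placed at pos, rebuilt via a list of pairs."""
    pairs = [(k, v) for k, v in d.items() if k != key]
    pairs.insert(pos, (key, value))
    return dict(pairs)


grail = {"a": "up", "b": "down"}
```

With this, `nth_value(grail, 0)` is `"up"`, and `insert_at(grail, 0, "c", "left")` yields a dict whose keys iterate as c, a, b.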

Every one of these languages makes it easy to communicate the intent of an ordered collection by using lists or arrays of some sort. I'm pretty sure I've seen this lead to association lists even in JSON, particularly for things like database query results:

  [
    {"id":..., "name": ..., "occupation": ...},
    {"id":..., "name": ..., "occupation": ...}
  ]
When the order matters, I think this is the kind of approach to prefer, even if it takes a few more abstractions to make key lookup convenient.
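As a small Python sketch of that approach: the record values below are invented placeholders, and `find_by` is a hypothetical helper, not a library function. The list keeps the order, and one short function restores convenient key lookup:

```python
import json

# Placeholder data, made up for illustration; the order of the list is the order that matters.
rows = json.loads('[{"id": 2, "name": "b"}, {"id": 1, "name": "a"}]')


def find_by(records, field, value):
    """Return the first record whose field equals value, or None."""
    for record in records:
        if record.get(field) == value:
            return record
    return None
```

Positional access is plain list indexing (`rows[0]`), while `find_by(rows, "id", 1)` does the keyed lookup.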

-----

1 point by kinnard 2159 days ago | link

What do you think of this: http://arclanguage.org/item?id=21066

-----

2 points by rocketnia 2159 days ago | link

It could be me, but I have trouble imagining any situation where I would need this syntax.

- If I'm not doing both kinds of lookup several times in the same part of the code, the lookup styles don't both need to be concise at the same time. Different parts of the code can convert to different representations that are concise to use in that context.

- If it's the root layer of a data structure, then I can simply use two different local variables to refer to that data, one of which always treats it as a list and one of which always treats it as a table.

- Unless I'm processing some specific indices, I'll usually want to iterate through the whole data structure, so I usually wouldn't be using any indexing operations in the first place.

- If I know what specific indexes I want at development time, then I would usually use an unordered table (for easy indexing) or a fixed-length list (for easy destructuring).

- If I'm performing any two lookups for the same reason, they can usually use the same expression in the code.

So I would have to be writing some code where I'm performing several different lookups of specific ordered indexes and specific chosen keyed indexes, all of which are for distinct reasons I know at development time but few of which are under indexes I know at development time. Moreover, each lookup must be two or more layers deep into a nested data structure.

This seems to me like a very specific situation. I suppose maybe it might come up more often in data-driven applications that have many scripted extensions which are specialized to certain configurations of data, but it seems to me even those would rarely care about specific ordered indexes.

Even what you've described about your application makes it sound like you only need to look things up by ordered indexes when the user wants to move an item from one stack to the next (or to the first). If that one part of your code is verbose, I recommend not worrying about it. If the rest of your code is verbose, how about converting your alists to tables whenever you enter that code?

-----

3 points by rocketnia 2159 days ago | link

I'm sorry, you have a very clear idea in mind of the data structure you want, and it's very reasonable to want to know how to build it in a way that gives you convenient syntax for it.

I think I'm just trying to encourage you to get to know other techniques because I think that's the easiest way to start. Building a custom data structure in Arc is not something many people go to the trouble to do, so it's a bit hard to come up with a good example.

I don't think you need any new ssyntaxes for this new data structure.

Say you have an ordered dictionary value `db` and you want convenient syntaxes to look up items by index (where 0 should give you the first item) and by key (where 0 should give you the item with key 0). One thing you might be able to do is this:

  db!by-key.0
  db!by-pos.0
To get these lookups working, you only need to use Anarki's `defcall`. Here's a stub that might get you started:

  (defcall ordered-dict (self val)
    (case val
      by-key (fn (key) ...)
      by-pos (fn (pos) ...)
      (err "Expected by-key or by-pos")))
Unfortunately, this doesn't allow you to assign:

  (= db!by-key.0 "val")
  (= db!by-pos.0 "val")
To get these to work, you must know a little bit about how Arc implements assignment. The above two calls essentially turn into this:

  (sref db!by-key "val" 0)
  (sref db!by-pos "val" 0)
This means you need to extend `sref` using Anarki's `defextend` to make these work.

But we've defined db!by-key and db!by-pos to be procedures that only know how to get the value, not how to set it. We need them to return something that knows how to do both.

So let's define another type, 'getset, which contains a getter function and a setter function.

  (def make-getset (get set)
    (annotate 'getset (list get set)))
  
  (def getset-get (gs . args)
    (apply rep.gs.0 args))
  
  (def getset-set (gs val . args)
    (apply rep.gs.1 val args))
We need it to be callable and settable:

  (defcall getset (self . args)
    (apply getset-get self args))

  (defextend sref (self val . args) (isa self 'getset)
    (apply getset-set self val args))
Now we can refactor our (defcall ordered-dict ...) definition to put in implementations for setting:

  (defcall ordered-dict (self val)
    (case val
      
      by-key
      (make-getset
        (fn (key) ...)
        ; sref passes the new value first, so setters take (val key)
        (fn (val key) ...))
      
      by-pos
      (make-getset
        (fn (pos) ...)
        (fn (val pos) ...))
      
      (err "Expected by-key or by-pos")))
You should pick some representation to use (annotate 'ordered-dict ...) with (or whatever other name you like for this type) so you can implement all four of these methods.

Note that we don't need to do (defextend sref ...) for ordered-dict values here because the `sref` calls don't ever get passed the table directly. You would only need that if you wanted (= db!by-key "val") or (= db.0 "val") to do something, but those look like buggy code to me.
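For comparison, here is the same shape sketched in Python rather than Anarki. This is only an analogy under stated assumptions: the names `_View` and `OrderedDictSketch` are invented, and the representation (a list of keys plus a plain dict) is just one possible choice. Each accessor pairs a getter with a setter, much like the getset type:

```python
class _View:
    """Pairs a getter with a setter, playing the role of the getset type."""

    def __init__(self, get, set_):
        self._get, self._set = get, set_

    def __getitem__(self, index):
        return self._get(index)

    def __setitem__(self, index, value):
        self._set(index, value)


class OrderedDictSketch:
    def __init__(self, pairs=()):
        self._keys = []       # keys in insertion order
        self._values = {}     # key -> value
        for key, value in pairs:
            self._set_by_key(key, value)
        self.by_key = _View(self._get_by_key, self._set_by_key)
        self.by_pos = _View(self._get_by_pos, self._set_by_pos)

    def _get_by_key(self, key):
        return self._values[key]

    def _set_by_key(self, key, value):
        if key not in self._values:
            self._keys.append(key)    # new keys go at the end
        self._values[key] = value

    def _get_by_pos(self, pos):
        return self._values[self._keys[pos]]

    def _set_by_pos(self, pos, value):
        # Raises IndexError if pos is out of range; positions are never created here.
        self._values[self._keys[pos]] = value
```

With `db = OrderedDictSketch([(1, "dream"), (2, "bigger"), (0, "awake")])`, `db.by_key[0]` is `"awake"` while `db.by_pos[0]` is `"dream"`, and both forms accept assignment.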

So far, so good, but you probably want to do more with ordered dictionaries than getting and setting.

There's at least one more utility you might like to `defextend` in Anarki for use with ordered dictionaries: `walk`. A few Anarki utilities like `each` internally call `walk` to iterate over collections, so if you extend `walk`, you can get several Anarki utilities to work with ordered dictionaries all at once.

You may also want to make various other utilities for coercing between ordered dictionaries and other Anarki values, such as tables and alists. That way you can easily use Anarki's existing table and alist utilities when they work for your situation.

Note that this approach doesn't make it possible to use the {...} curly brace syntax you and shawn have been talking about. Since there are potentially many kinds of tables, I'm not sure whether giving them all read syntaxes really makes sense; we might start getting into some obscure punctuation. :-p

I hope that's a little bit more helpful advice for your situation. I tried to write up something like this the other day and got carried away implementing it all myself, but then I realized it might not be what you really wanted to use here. It didn't even take an approach as convenient to use as this one. Maybe this can give you enough of a starting point to reach a point you're happy with.

-----

3 points by i4cu 2158 days ago | link

What about using '#' for by-pos and '?' for by-key via the reader?

  db#0
  db?0

-----

3 points by i4cu 2161 days ago | link

I agree that order matters. It matters in pretty much every application that uses data. Even HN has data ordered by newest, rank (best) etc. And Arc has a tonne of functionality to support sorting and comparing data to let you do that very thing.

But I don't see the need to order data in your application equating to the need for tables that support constant insertion order.

I'm not going to say what you're doing is wrong, because it's not, but I am going to say it's non-standard, and I think there are other ways to manage your data that don't require ordered tables (which have downsides with data growth). It's just my opinion, but there's not much you can't do with the standard storing of table records and maintaining of indexes.

That said if your data load is always low with little to no growth it could very well be a good fit.

-----

2 points by kinnard 2161 days ago | link

This may have no bearing on how tables are implemented in arc but a more efficient implementation happened to be ordered:

https://morepypy.blogspot.com/2015/01/faster-more-memory-eff...

"One surprising part is that the new design, besides being more memory efficient, is ordered by design: it preserves the insertion order."

-----

1 point by i4cu 2160 days ago | link

> "being more memory efficient"

It's only more memory efficient than its previous incarnation.

It's an interesting implementation though. The use of sparse arrays to index the entries is compelling. Measuring this kind of stuff can be non-trivial though as you also have to account for gc (and/or compaction) at different times within your application. There are always trade-offs.
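To make the idea concrete, here is a toy Python sketch of that layout: a sparse index array pointing into a dense, append-only entry list. It is not PyPy's actual code; it is insert-only, uses plain linear probing, and the class name `CompactDict` is invented. Iterating the dense list is what yields insertion order for free:

```python
class CompactDict:
    """Toy insert-only hash table: sparse index array plus dense entry list."""

    def __init__(self, size=8):
        # size must be a power of two so that masking works as a modulo
        self._indices = [None] * size    # sparse: slot -> position in _entries
        self._entries = []               # dense: (hash, key, value) in insertion order

    def _slot(self, key, h):
        """Linear-probe; return the slot holding key, or the first free slot."""
        mask = len(self._indices) - 1
        i = h & mask
        while True:
            pos = self._indices[i]
            if pos is None:
                return i
            entry_hash, entry_key, _ = self._entries[pos]
            if entry_hash == h and entry_key == key:
                return i
            i = (i + 1) & mask

    def __setitem__(self, key, value):
        h = hash(key)
        i = self._slot(key, h)
        pos = self._indices[i]
        if pos is not None:
            self._entries[pos] = (h, key, value)    # overwrite keeps the position
            return
        self._indices[i] = len(self._entries)
        self._entries.append((h, key, value))
        if len(self._entries) * 3 >= len(self._indices) * 2:
            self._grow()                            # keep the index array under 2/3 full

    def _grow(self):
        # Only the small index array is rebuilt; the entries stay where they are.
        self._indices = [None] * (len(self._indices) * 2)
        for pos, (h, key, _) in enumerate(self._entries):
            self._indices[self._slot(key, h)] = pos

    def __getitem__(self, key):
        pos = self._indices[self._slot(key, hash(key))]
        if pos is None:
            raise KeyError(key)
        return self._entries[pos][2]

    def __iter__(self):
        return (key for _, key, _ in self._entries)
```

Deletion, which is where much of the real-world complexity and the gc/compaction cost mentioned above would show up, is deliberately left out.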

Personally I haven't compared the various implementations (clojure vs. racket vs. python) such that I can give you any real insight. I know clojure's (or rather java's) array-maps are costly when performing iterative lookups within large data sets because I've benchmarked it.

The best bet is to see how an implementation works for your app with your data.

-----

1 point by kinnard 2160 days ago | link

> "The microbenchmarks that we did show large improvements on large and very large dictionaries (particularly, building dictionaries of at least a couple 100s of items is now twice faster) and break-even on small ones (between 20% slower and 20% faster depending very much on the usage patterns and sizes of dictionaries)."

-----

2 points by i4cu 2159 days ago | link

relative to... its previous incarnation.

Microbenchmarks against a previous version only means they've made a relative improvement.

The benchmarks that would matter to us (or at least me) are:

1. how does it compare to an equivalent implementation without having to maintain insertion order.

2. how does it hold up under stress (larger data sets, with heavy load where gc/compaction have to occur)

Obviously none of this should matter to you as you've said your data load is low with no growth. So Bob's your uncle.

-----

1 point by kinnard 2161 days ago | link

I do think the data load for this would always be low.

I feel like implementing tables with indices would basically be like implementing ordered dictionaries (new nomenclature I've arrived at per: http://arclanguage.org/item?id=21037 ), no?

-----

2 points by akkartik 2162 days ago | link

That Wikipedia page seems to only talk about C++. I just checked and C++'s `map` doesn't support indexing by integer: http://www.cplusplus.com/reference/map/map.

So I don't think this is something anybody supports:

  arc> (= grail (otable a "up" b "down"))
  arc> grail.0
  "up"
Ordered associative containers merely have a well-defined order for iterating over. And the order has nothing to do with the order in which elements were inserted.
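The difference between the two senses of "ordered" is easy to see in a short Python sketch (Python 3.7+ assumed, where dicts keep insertion order; the variable names are just for illustration):

```python
inserted = {}
for key in ["b", "c", "a"]:
    inserted[key] = key.upper()

# Order in which the keys were inserted.
insertion_order = list(inserted)    # ['b', 'c', 'a']

# Order by the keys themselves, which is how a C++ std::map iterates.
key_order = sorted(inserted)        # ['a', 'b', 'c']
```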

So what you're asking for is interesting and plausible, but I don't think "ordered tables" is quite the precise label for it.

-----

2 points by kinnard 2162 days ago | link

Damn. I thought I might've gotten that mixed up. Should've checked.

I think clojure's array-maps do it but with some limitations:

https://clojure.org/reference/data_structures#ArrayMaps

http://arclanguage.org/item?id=21014

-----

2 points by kinnard 2161 days ago | link

Another example and probably better name is Ordered Dict:

https://pymotw.com/2/collections/ordereddict.html

https://docs.microsoft.com/en-us/dotnet/api/system.collectio...

-----

1 point by kinnard 2161 days ago | link

Dicts are ordered as a feature in Python 3.6^:

https://stackoverflow.com/questions/39980323/are-dictionarie...

-----

1 point by kinnard 2161 days ago | link

Sadly: https://stackoverflow.com/a/44687752/1236793

-----

1 point by akkartik 2161 days ago | link

How about AutobiographicalDict? ^_^

More serious suggestion: Journaling Dict. Maybe the Arc type could be `jtable`?

'order' is pretty ambiguous here. The fact that some places use the word to mean what we mean doesn't seem like sufficient reason to follow suit.

-----

1 point by kinnard 2160 days ago | link

For some reason I can't escape associating journal and "jour" (day) in French, journaling having to do with something done daily.

Maybe ledger => ltable which is insertion-ordered would be better but that has all sorts of other implications.

-----

1 point by kinnard 2161 days ago | link

I mean 'insertion order'

-----

1 point by akkartik 2161 days ago | link

But do you see that other interpretations are possible?

-----

1 point by kinnard 2161 days ago | link

Yes like a variety of consistent sort orders

-----

1 point by akkartik 2161 days ago | link

Not quite. The distinction I'm drawing is between ordering the elements based on their intrinsic properties, and ordering the elements based merely on the order they're inserted in.

-----

1 point by kinnard 2160 days ago | link

Do you mean other possible orderings like those dependent on the implementation of the map/table/obj structure?

Or orderings dependent on keyvalue pairs not just keys?

-----

2 points by i4cu 2160 days ago | link

I think he's just stating their use of the term 'order' is a poor choice when the word 'order' is in fact non-specific. i.e. People use 'order' to categorize things that are stable and have order predictability, but let's not pin the term 'order' to a single variant such as the insertion order.

-----

2 points by kinnard 2160 days ago | link

Know a more succinct term for insertion order?

-----

2 points by i4cu 2160 days ago | link

Well he gave you a pretty good 'name' which is the topic (semantics I know :).

'jtable'

Journaling implies logging by insertion order.

so jtable, or log-table are good no?

This is kind of niggly stuff. Normally we create something before debates ensue about naming... :)

-----

3 points by kinnard 2160 days ago | link

Ah, I overlooked that! I'm unfamiliar with that usage of journal in this context. Just looked it up.

-----

1 point by kinnard 2160 days ago | link

I agree. It's a big jump for me to try to implement an insertion-ordered table w/e it ends up being called.

And it's unclear what the behavior should be: http://arclanguage.org/item?id=21066

-----