3 points by kinnard 2161 days ago | link | parent

It's for an application called pipeline: https://github.com/Kinnardian/pipeline

Meant to operate as a generalized pipeline for items to move through.

Could function as a basis for a kanban or as a sales management system.

In which cases the "stacks" as I call the stages in a pipeline would be:

"todo" "fortnight" "week" "today" "done"

"attention" "interest" "decision" "action" "love" "refer"

Respectively.

I was working on converting the whole thing from lists to tables (https://github.com/Kinnardian/pipeline/commits/templatize) when I realized that tables are not order-preserving (I can't believe this never came up working with json).

Order definitely matters, and I'm envisioning functionality like repeating tasks that, once moved to the "done" stack or marked done, wrap around to `pipe.0`. (Imagine something that must be done once a month or once a week.) I suppose I could tag items as monthly or weekly, in which case they'd get appended to pipe!month or pipe!week as soon as they're done.

Another example is the `(createItem)` function, which should take an item and an optional "stack" parameter. If no stack parameter is given, items should be appended to the zeroth stack, `pipe.0`, identified by index because it will be named by different keys in different pipelines.

Another example is a `(push)` function which pushes an item from its current stage to the "next" stage. With order this is simple.

EDIT: Order even matters within a stack. In the current kanban system I have a bunch of things I need to do today, and the top priorities are actually at the top. I wouldn't want them getting all disheveled.
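The behaviour described above can be sketched with an ordered list of stacks. This is a JavaScript illustration rather than Arc, and every name in it (`makePipeline`, `createItem`, `push`, the `name`/`items` fields) is hypothetical. It also assumes one reading of "wrap around": pushing from the last stack moves the item to the first.

```javascript
// Hypothetical sketch: a pipeline is an ordered array of named stacks.
function makePipeline(names) {
  return names.map((name) => ({ name, items: [] }));
}

// Append an item to the named stack, or to the zeroth stack if none is given.
function createItem(pipe, item, stackName) {
  const stack = stackName === undefined
    ? pipe[0]
    : pipe.find((s) => s.name === stackName);
  if (!stack) throw new Error(`No stack named ${stackName}`);
  stack.items.push(item);
}

// Move an item to the next stack; from the last stack, wrap around to the first.
// (Wrapping on push is an assumption made for this sketch.)
function push(pipe, item) {
  const i = pipe.findIndex((s) => s.items.includes(item));
  if (i === -1) throw new Error("Item is not in the pipeline");
  pipe[i].items.splice(pipe[i].items.indexOf(item), 1);
  pipe[(i + 1) % pipe.length].items.push(item);
}

const pipe = makePipeline(["todo", "fortnight", "week", "today", "done"]);
createItem(pipe, "pay rent");             // lands in pipe[0], whatever it is named
createItem(pipe, "file taxes", "today");
push(pipe, "file taxes");                 // "today" -> "done"
```

Because the stacks live in an array, "next stage" is just the next index, and items within a stack keep the order in which they were appended.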

. . .(please excuse my disheveled codebase)



3 points by rocketnia 2159 days ago | link

"I can't believe this never came up working with json"

I think order-preserving tables are pretty useful for JSON -- particularly to help ensure implementation details aren't leaking even in a distributed system where JSON is being passed through multiple parsers and serializers -- but it's worth noting that even JSON objects are specified to be unordered and are commonly treated as such. (RFC 8259: "An object is an unordered collection ...")

Even JavaScript's standard JSON.parse(...) doesn't preserve order the way you might like it to. Here's what I get in Firefox right now:

  > JSON.stringify(JSON.parse("{\"a\": 10, \"1\": 20}"))
  "{\"1\":20,\"a\":10}"

This is because JavaScript objects historically didn't preserve any particular iteration order, and although they have more consistent cross-browser behavior these days, they seem to have settled on some rules like iterating through numeric properties like "1" before other properties like "a". (From what I've found, I see people saying this behavior is a standard as of ES2015, but I don't see it in the ES2018 spec.)

JavaScript Maps are a different story; they're a newer addition to the language, and they're strictly specified to preserve insertion order. Other languages use dictionaries which preserve order, including Groovy (where [a: 1, b: 2] is a LinkedHashMap literal), Ruby, and... you've already found details on Python.
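A small sketch of the difference between plain objects and Maps, using the same two keys as the JSON example. The variable names are only for illustration.

```javascript
// A plain object: integer-like keys are listed before other string keys.
const obj = {};
obj.a = 10;
obj["1"] = 20;

// A Map: keys are listed in the order they were inserted.
const map = new Map();
map.set("a", 10);
map.set("1", 20);

const objectKeys = Object.keys(obj); // ["1", "a"]
const mapKeys = [...map.keys()];     // ["a", "1"]
```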

Even in Groovy, JavaScript (Maps), and Ruby, the only way to observe the order of their collections is to iterate from the beginning, and the only way to influence it is to insert entries at the end. This means any substantial interactions with the order of these collections will be just about as inefficient and inconvenient as if we had converted the whole collection to an association list, processed it that way, and converted it back. Essentially, it seems the order is preserved only to prevent programs from having language-implementation-dependent bugs, and maybe to help with recognizing values in debug output, not because it's useful for programming.
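To make the point concrete, here is a sketch of inserting an entry in the middle of a JavaScript Map. The helper name `insertAt` is hypothetical. Since a Map only appends, the helper has to round-trip through an array of entries, which is the association-list detour described above.

```javascript
// Hypothetical helper: return a new Map with [key, value] placed at position pos.
function insertAt(map, pos, key, value) {
  const entries = [...map].filter(([k]) => k !== key); // Map -> association list
  entries.splice(pos, 0, [key, value]);                // edit the list in place
  return new Map(entries);                             // association list -> Map
}

const stages = new Map([["todo", []], ["done", []]]);
const withToday = insertAt(stages, 1, "today", []);
const stageNames = [...withToday.keys()]; // ["todo", "today", "done"]
```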

Every one of these languages makes it easy to communicate the intent of an ordered collection by using lists or arrays of some sort. I'm pretty sure I've seen this lead to association lists even in JSON, particularly for things like database query results:

  [
    {"id":..., "name": ..., "occupation": ...},
    {"id":..., "name": ..., "occupation": ...}
  ]

When the order matters, I think this is the kind of approach to prefer, even if it takes a few more abstractions to make key lookup convenient.
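The "few more abstractions" might look like the following sketch. The helper names `findById` and `indexById` are hypothetical, and the row values are made-up placeholders.

```javascript
// Placeholder rows standing in for a query result; the values are made up.
const rows = [
  { id: 7, name: "first", occupation: "placeholder" },
  { id: 3, name: "second", occupation: "placeholder" },
];

// Linear lookup: fine for short lists, and the array order is untouched.
function findById(list, id) {
  return list.find((row) => row.id === id);
}

// For many lookups in one part of the code, build a throwaway table keyed by id.
function indexById(list) {
  return new Map(list.map((row) => [row.id, row]));
}

const byId = indexById(rows);
const second = byId.get(3); // the same object as rows[1]
```

The array stays the source of truth for order, and the table is a view that can be rebuilt whenever it is needed.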

-----

1 point by kinnard 2159 days ago | link

What do you think of this: http://arclanguage.org/item?id=21066

-----

2 points by rocketnia 2159 days ago | link

It could be me, but I have trouble imagining any situation where I would need this syntax.

- If I'm not doing both kinds of lookup several times in the same part of the code, the lookup styles don't both need to be concise at the same time. Different parts of the code can convert to different representations that are concise to use in that context.

- If it's the root layer of a data structure, then I can simply use two different local variables to refer to that data, one of which always treats it as a list and one of which always treats it as a table.

- Unless I'm processing some specific indices, I'll usually want to iterate through the whole data structure, so I usually wouldn't be using any indexing operations in the first place.

- If I know what specific indexes I want at development time, then I would usually use an unordered table (for easy indexing) or a fixed-length list (for easy destructuring).

- If I'm performing any two lookups for the same reason, they can usually use the same expression in the code.

So I would have to be writing some code where I'm performing several different lookups of specific ordered indexes and specific chosen keyed indexes, all of which are for distinct reasons I know at development time but few of which are under indexes I know at development time. Moreover, each lookup must be two or more layers deep into a nested data structure.

This seems to me like a very specific situation. I suppose maybe it might come up more often in data-driven applications that have many scripted extensions which are specialized to certain configurations of data, but it seems to me even those would rarely care about specific ordered indexes.

Even what you've described about your application makes it sound like you only need to look things up by ordered indexes when the user wants to move an item from one stack to the next (or to the first). If that one part of your code is verbose, I recommend not worrying about it. If the rest of your code is verbose, how about converting your alists to tables whenever you enter that code?

-----

3 points by rocketnia 2159 days ago | link

I'm sorry, you have a very clear idea in mind of the data structure you want, and it's very reasonable to want to know how to build it in a way that gives you convenient syntax for it.

I think I'm just trying to encourage you to get to know other techniques because I think that's the easiest way to start. Building a custom data structure in Arc is not something many people go to the trouble to do, so it's a bit hard to come up with a good example.

I don't think you need any new ssyntaxes for this new data structure.

Say you have an ordered dictionary value `db` and you want convenient syntaxes to look up items by index (where 0 should give you the first item) and by key (where 0 should give you the item with key 0). One thing you might be able to do is this:

  db!by-key.0
  db!by-pos.0

To get these lookups working, you only need to use Anarki's `defcall`. Here's a stub that might get you started:

  (defcall ordered-dict (self val)
    (case val
      by-key (fn (key) ...)
      by-pos (fn (pos) ...)
      (err "Expected by-key or by-pos")))

Unfortunately, this doesn't allow you to assign:

  (= db!by-key.0 "val")
  (= db!by-pos.0 "val")

To get these to work, you must know a little bit about how Arc implements assignment. The above two calls essentially turn into this:

  (sref db!by-key "val" 0)
  (sref db!by-pos "val" 0)

This means you need to extend `sref` using Anarki's `defextend` to make these work.

But we've defined db!by-key and db!by-pos to be procedures that only know how to get the value, not how to set it. We need them to return something that knows how to do both.

So let's define another type, 'getset, which contains a getter function and a setter function.

  (def make-getset (get set)
    (annotate 'getset (list get set)))
  
  (def getset-get (gs . args)
    (apply rep.gs.0 args))
  
  (def getset-set (gs val . args)
    (apply rep.gs.1 val args))

We need it to be callable and settable:

  (defcall getset (self . args)
    (apply getset-get self args))

  (defextend sref (self val . args) (isa self 'getset)
    (apply getset-set self val args))

Now we can refactor our (defcall ordered-dict ...) definition to put in implementations for setting:

  (defcall ordered-dict (self val)
    (case val
      
      by-key
      (make-getset
        (fn (key) ...)
        ; getset-set passes the new value first, then the key
        (fn (val key) ...))
      
      by-pos
      (make-getset
        (fn (pos) ...)
        ; likewise: new value first, then the position
        (fn (val pos) ...))
      
      (err "Expected by-key or by-pos")))

You should pick some representation to use (annotate 'ordered-dict ...) with (or whatever other name you like for this type) so you can implement all four of these functions.
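As one possible representation, here is a JavaScript sketch of the same getter/setter-pair idea. It is an analogue for illustration, not Anarki code. The class name `OrderedDict`, the `byKey`/`byPos` properties, and the choice of an array of [key, value] entries are all assumptions.

```javascript
// Hypothetical OrderedDict: an array of [key, value] entries kept in order.
class OrderedDict {
  constructor(entries = []) {
    this.entries = entries.map(([k, v]) => [k, v]);
  }

  // Analogue of db!by-key: a getter/setter pair addressed by key.
  get byKey() {
    return {
      get: (key) => {
        const entry = this.entries.find(([k]) => k === key);
        return entry === undefined ? undefined : entry[1];
      },
      set: (key, val) => {
        const entry = this.entries.find(([k]) => k === key);
        if (entry) entry[1] = val;           // existing key keeps its position
        else this.entries.push([key, val]);  // new keys go at the end
      },
    };
  }

  // Analogue of db!by-pos: a getter/setter pair addressed by position.
  get byPos() {
    return {
      get: (pos) => {
        const entry = this.entries[pos];
        return entry === undefined ? undefined : entry[1];
      },
      set: (pos, val) => {
        if (pos < 0 || pos >= this.entries.length) {
          throw new RangeError(`No entry at position ${pos}`);
        }
        this.entries[pos][1] = val;
      },
    };
  }
}

const db = new OrderedDict([[5, "five"], [0, "zero"]]);
const keyZero = db.byKey.get(0); // "zero": the entry whose key is 0
const posZero = db.byPos.get(0); // "five": the first entry
db.byPos.set(0, "FIVE");         // overwrites the value stored under key 5
```

Lookups by key are linear here, which seems acceptable for the small data loads discussed in this thread. A larger structure would probably want a table from keys to positions as well.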

Note that we don't need to do (defextend sref ...) for ordered-dict values here because the `sref` calls don't ever get passed the table directly. You would only need that if you wanted (= db!by-key "val") or (= db.0 "val") to do something, but those look like buggy code to me.

So far, so good, but you probably want to do more with ordered dictionaries than getting and setting.

There's at least one more utility you might like to `defextend` in Anarki for use with ordered dictionaries: `walk`. A few Anarki utilities like `each` internally call `walk` to iterate over collections, so if you extend `walk`, you can get several Anarki utilities to work with ordered dictionaries all at once.

You may also want to make various other utilities for coercing between ordered dictionaries and other Anarki values, such as tables and alists. That way you can easily use Anarki's existing table and alist utilities when they work for your situation.
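JavaScript has a loosely similar hook: a type that implements the iteration protocol works with `for...of`, spread, and `new Map(...)` all at once, much as extending `walk` would feed `each`. The sketch below shows that analogue together with the two coercions. The class name `WalkableDict` is hypothetical, and none of this is Anarki code.

```javascript
// Hypothetical ordered dictionary that only knows how to be iterated.
class WalkableDict {
  constructor(entries = []) {
    this.entries = entries.map(([k, v]) => [k, v]);
  }

  // Implementing the iteration protocol once enables many built-in utilities.
  *[Symbol.iterator]() {
    for (const [k, v] of this.entries) yield [k, v];
  }
}

const stacks = new WalkableDict([["todo", ["a"]], ["done", []]]);

const names = [];
for (const [name] of stacks) names.push(name); // ordered traversal

const asTable = new Map(stacks); // coerce to a table for keyed lookup
const asAlist = [...stacks];     // coerce to an association list
```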

Note that this approach doesn't make it possible to use the {...} curly brace syntax you and shawn have been talking about. Since there are potentially many kinds of tables, I'm not sure whether giving them all read syntaxes really makes sense; we might start getting into some obscure punctuation. :-p

I hope that's a little bit more helpful advice for your situation. I tried to write up something like this the other day and got carried away implementing it all myself, but then I realized it might not be what you really wanted to use here. It didn't even take an approach as convenient to use as this one. Maybe this can give you enough of a starting point to reach a point you're happy with.

-----

3 points by i4cu 2158 days ago | link

What about using '#' for by-pos and '?' for by-key via the reader?

  db#0
  db?0

-----

3 points by i4cu 2161 days ago | link

I agree that order matters. It matters in pretty much every application that uses data. Even HN has data ordered by newest, rank (best) etc. And Arc has a tonne of functionality to support sorting and comparing data to let you do that very thing.

But I don't see the need to order data in your application equating to the need for tables that support constant insertion order.

I'm not going to say what you're doing is wrong, because it's not, but I am going to say it's non-standard, and I think there are other ways to manage your data that don't require ordered tables (which have downsides with data growth). It's just my opinion, but there's not much you can't do with the standard storing of table records and maintaining of indexes.

That said if your data load is always low with little to no growth it could very well be a good fit.
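One way to read "table records plus indexes" is sketched below in JavaScript. The names `records`, `order`, `addRecord`, `moveTo`, and `inOrder` are hypothetical, and the record contents are placeholders. Records sit in an ordinary table keyed by id, and a separate array of ids carries the order.

```javascript
// Records live in a table keyed by id; order is kept in a separate index.
const records = new Map();
const order = [];

// Store a record; a new id is appended to the end of the order index.
function addRecord(id, record) {
  if (!records.has(id)) order.push(id);
  records.set(id, record);
}

// Reposition a record by editing only the index, never the table.
function moveTo(id, pos) {
  const from = order.indexOf(id);
  if (from === -1) throw new Error(`Unknown id ${id}`);
  order.splice(from, 1);
  order.splice(pos, 0, id);
}

// Read the records back in index order.
function inOrder() {
  return order.map((id) => records.get(id));
}

addRecord("a", { title: "placeholder a" });
addRecord("b", { title: "placeholder b" });
addRecord("c", { title: "placeholder c" });
moveTo("c", 0); // index is now c, a, b
```

Several indexes (by priority, by due date) can point at the same table, which is harder to get from a single insertion-ordered table.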

-----

2 points by kinnard 2161 days ago | link

This may have no bearing on how tables are implemented in arc but a more efficient implementation happened to be ordered:

https://morepypy.blogspot.com/2015/01/faster-more-memory-eff...

"One surprising part is that the new design, besides being more memory efficient, is ordered by design: it preserves the insertion order."

-----

1 point by i4cu 2160 days ago | link

> "being more memory efficient"

It's only more memory efficient than its previous incarnation.

It's an interesting implementation though. The use of sparse arrays to index the entries is compelling. Measuring this kind of stuff can be non-trivial though as you also have to account for gc (and/or compaction) at different times within your application. There are always trade-offs.
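A much-simplified sketch of that layout, as I understand it from the linked post: a sparse index array maps hash slots to positions in a dense, insertion-ordered entries array. The class `CompactDict` is hypothetical and assumes string keys, linear probing, and no deletion. It is not PyPy's actual code.

```javascript
// Simplified compact, insertion-ordered hash table (string keys, no deletion).
class CompactDict {
  constructor() {
    this.index = new Array(8).fill(-1); // sparse: slot -> position in entries, or -1
    this.entries = [];                  // dense: [hash, key, value] in insertion order
  }

  static hash(key) {
    let h = 0;
    for (let i = 0; i < key.length; i++) h = (h * 31 + key.charCodeAt(i)) >>> 0;
    return h;
  }

  // Find the slot holding key, or the empty slot where it would be placed.
  slotFor(key, h) {
    const mask = this.index.length - 1;
    let slot = h & mask;
    while (this.index[slot] !== -1 && this.entries[this.index[slot]][1] !== key) {
      slot = (slot + 1) & mask;
    }
    return slot;
  }

  get(key) {
    const pos = this.index[this.slotFor(key, CompactDict.hash(key))];
    return pos === -1 ? undefined : this.entries[pos][2];
  }

  set(key, value) {
    const h = CompactDict.hash(key);
    const slot = this.slotFor(key, h);
    if (this.index[slot] !== -1) {
      this.entries[this.index[slot]][2] = value; // existing key keeps its position
      return;
    }
    this.index[slot] = this.entries.length;
    this.entries.push([h, key, value]);
    if (this.entries.length * 3 > this.index.length * 2) this.grow();
  }

  // Rebuild only the sparse index at double size; the dense entries are untouched.
  grow() {
    this.index = new Array(this.index.length * 2).fill(-1);
    const mask = this.index.length - 1;
    this.entries.forEach(([h], pos) => {
      let slot = h & mask;
      while (this.index[slot] !== -1) slot = (slot + 1) & mask;
      this.index[slot] = pos;
    });
  }

  keys() {
    return this.entries.map(([, key]) => key);
  }
}
```

Insertion order falls out of the design because iteration walks the dense array. Supporting deletion would require tombstones or compaction, which is where the gc and compaction trade-offs mentioned above come in.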

Personally I haven't compared the various implementations (clojure vs. racket vs. python) such that I can give you any real insight. I know clojure's (or rather java's) array-maps are costly when performing iterative lookups within large data sets because I've benchmarked it.

The best bet is to see how an implementation works for your app with your data.

-----

1 point by kinnard 2160 days ago | link

> "The microbenchmarks that we did show large improvements on large and very large dictionaries (particularly, building dictionaries of at least a couple 100s of items is now twice faster) and break-even on small ones (between 20% slower and 20% faster depending very much on the usage patterns and sizes of dictionaries)."

-----

2 points by i4cu 2159 days ago | link

relative to... its previous incarnation.

Microbenchmarks against a previous version only means they've made a relative improvement.

The benchmarks that would matter to us (or at least me) are:

1. how does it compare to an equivalent implementation without having to maintain insertion order.

2. how does it hold up under stress (larger data sets, with heavy load where gc/compaction have to occur)

Obviously none of this should matter to you, as you've said your data load is low with no growth. So Bob's your uncle.

-----

1 point by kinnard 2161 days ago | link

I do think the data load for this would always be low.

I feel like implementing tables with indices would basically be like implementing ordered dictionaries (new nomenclature I've arrived at per: http://arclanguage.org/item?id=21037 ), no?

-----