Arc Forum

2 points by hjek 2778 days ago | link | parent

> So where will your data be? In a local data structure?

Yes, I think. The database is just stored in memory, but it can be serialized and saved to disk using `write-theory`[0] and loaded with `read-theory`. That is what I'm doing for now. It's very naive and inefficient to do a full database dump rather than just appending new data, and I presume it's particularly in this area that Datomic is far more optimised and well thought out.
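To make the difference between a full dump and appending concrete, here is a minimal Python sketch. It is not how `write-theory` stores data; the JSON-lines file format, the function names and the file name are all assumptions made for illustration.

```python
import json
import os
import tempfile


def dump_all(path, facts):
    """Full dump: rewrite every fact on each save, so cost grows with the database."""
    with open(path, "w", encoding="utf-8") as f:
        for fact in facts:
            f.write(json.dumps(fact) + "\n")


def append_fact(path, fact):
    """Append-only: write just the new fact, so cost does not depend on database size."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(fact) + "\n")


def load_facts(path):
    """Replay the file into memory at startup; a missing file means an empty database."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [tuple(json.loads(line)) for line in f if line.strip()]


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "facts.log")  # hypothetical file name
    append_fact(log_path, ("parent", "story1", "reply1"))
    append_fact(log_path, ("parent", "reply1", "reply2"))
    print(load_facts(log_path))
```

With appending, a crash can at worst lose the last line being written, whereas a full dump that is interrupted can leave the whole file truncated unless it is written to a temporary file and renamed.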

> Also, I'm curious what made you choose a graph db. It seems like you're inheriting a lot of complexity and I'm wondering what the benefit is over a more traditional sql or nosql db.

Well, I did the initial work on the web app: creating user accounts, adding posts and replies, and then I got to data storage. Initially I did a News-style flat-file database, just saving data as lists in files, that are then loaded into memory when the program starts. It mostly worked but also felt a bit complicated, and I thought that perhaps I should just use a proper database?

What I like about news.arc is that you can just launch it without any configuration, so MySQL and PostgreSQL were out of the question, and I started reading a bit about SQLite. But I've also had this fascination with logic programming, from what people are posting here[1][2], and from reading a bit of The Reasoned Schemer, and I watched some of those Rich Hickey talks again, where he talks about Datalog, which happens to be available for Racket.

There are just some things that are incredibly simple in declarative/logic programming. For example, if you have facts about stories being `parent` of their replies, then it's simple to just define the `ancestor` relation, and when you have the `ancestor` relation, you automagically get `descendants` without having to write any code, because it's just the inverse of `ancestor`:

    (! (:- (ancestor A B)
           (parent A B)))
    (! (:- (ancestor A B)
           (parent A C)
           (ancestor C B)))
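For readers who don't use Datalog, the two rules above can be approximated in Python as a naive fixpoint computation. This is only a sketch of what the rules mean, not how a Datalog engine evaluates them; the function name and the sample facts are made up for illustration.

```python
def ancestors(parent_facts):
    """Return all (ancestor, descendant) pairs implied by (parent, child) facts."""
    derived = set(parent_facts)  # rule 1: every parent is an ancestor
    changed = True
    while changed:  # repeat until no new pairs can be derived
        changed = False
        for (a, c) in parent_facts:
            for (c2, b) in list(derived):
                # rule 2: parent(A, C) and ancestor(C, B) imply ancestor(A, B)
                if c == c2 and (a, b) not in derived:
                    derived.add((a, b))
                    changed = True
    return derived


# Hypothetical thread: a story with two direct replies and one nested reply.
parent = {("story", "reply1"), ("reply1", "reply2"), ("story", "reply3")}
anc = ancestors(parent)

# The same relation answers both questions, depending on which side is fixed.
print(sorted(b for (a, b) in anc if a == "story"))   # ['reply1', 'reply2', 'reply3']
print(sorted(a for (a, b) in anc if b == "reply2"))  # ['reply1', 'story']
```

The last two lines are the point being made: descendants come for free by querying the same relation with the other argument bound.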

But I've also bumped into some questions, more practical than theoretical, and that is why I'm interested in hearing about your experience with Datomic, and why I'm asking here.

So, SQLite is still on the table. I'm not too familiar with NoSQL, but my impression is that they are all about speed and scalability of data storage. I haven't used MongoDB but isn't it essentially just like storing JSON in a file, except faster? It would be interesting if any of those could be used in conjunction with Datalog though, if they don't add too much complexity for the sake of increased speed.

[0]: https://docs.racket-lang.org/datalog/interop.html#%28def._%2...

[1]: http://arclanguage.org/item?id=20650

[2]: http://arclanguage.org/item?id=20519

2 points by i4cu 2778 days ago | link

So a few things I wanted to point out:

Datomic vs. Datalog

Datomic uses Datalog as part of its query language, but that's pretty much where the comparison should end. Things like "treating the database as a value", and features such as data accretion that Rich talks about, have nothing to do with Datalog. They're features of Datomic. So, for example, when you mention never retracting data, your data size is going to grow continuously unless you write your own data management layer on top. Datomic, on the other hand, does this for you. When you want to query the database over time, you're going to need to store time intervals for all of your data and incorporate that into each query. In Datomic, which has a time log, you can instead pass in the DB itself as a value (with an associated time interval) and Datomic will make sure your queries are working against the dataset that accounts for the time interval.

I'm pointing this out because it seems to me that you're doing (or are going to be doing) a lot of work that may not be worth it for what you're trying to accomplish.

NoSQL

> I'm not too familiar with NoSQL, but my impression is that they are all about speed and scalability of data storage.

Yes and no. Speed is often a feature NoSQL DBs advertise, but really, for me anyway, it's about flexibility and ease of use. Traditional RDBMSs, for example, require creating schemas. Many NoSQL databases don't require a schema at all, which makes them easier to use and more flexible to change. NoSQL databases are often key-value stores, so it can be really easy to take a hash-map or table of data from your code, dump it into a NoSQL datastore, and query it.

My personal favourite is Redis and it might be worth considering for your app.

You can:

- store a value under a key [1]

- store table data [2]

- store values in a set [3] (which allows intersection/difference queries)

- store values in a sorted-set [4] (which allows you to query by some numerical value like a timestamp)

- use it to manage relationships [5]

The reason I mention Redis is that the HN app is very well suited to it. HN only keeps 'x' amount of data in memory, and in Redis the data lives in memory. Redis also allows you to set expiry times on data for auto eviction [6], and it supports ordered lists [7], which can make it useful for Lisp-based languages.

However, it's not embedded. And if that's a requirement, I'd almost suggest you move away from Racket and adopt a language that has more options for embeddable databases. I guess if you're willing to roll your own (and it looks like you may be) then that's awesome too.

But in case you decide otherwise... The library I use is Redis Carmine [8], but there are Racket clients [9].

1. https://redis.io/commands/set

2. https://redis.io/commands/mset

2a. https://redis.io/commands/mget

3. https://redis.io/commands/sadd

4. https://redis.io/commands/zadd

5. search: "Representing and querying graphs using an hexastore" https://redis.io/topics/indexes

6. https://redis.io/commands/expire

7. https://redis.io/commands/lset

8. https://github.com/ptaoussanis/carmine

9. https://redis.io/clients#racket
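The expiry idea mentioned above (data that evicts itself after a set time) can be modelled in a few lines of plain Python. To be clear, this is an in-memory sketch and not Redis or a Redis client; the class name, method names and the `fnid:` key are hypothetical, and eviction here only happens lazily when a key is read.

```python
import time


class ExpiringStore:
    """In-memory key-value store with an optional time-to-live per key (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}      # key -> (value, expiry time or None)

    def set(self, key, value, ttl=None):
        """Store value under key; ttl is a lifetime in seconds, None means keep forever."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key):
        """Return the value, or None if the key is missing or has expired."""
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazy eviction on read
            return None
        return value


# Usage with a fake clock, so the example runs instantly.
now = [0.0]
store = ExpiringStore(clock=lambda: now[0])
store.set("fnid:abc", "session data", ttl=60)
print(store.get("fnid:abc"))  # session data
now[0] = 61.0
print(store.get("fnid:abc"))  # None
```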

-----

2 points by hjek 2778 days ago | link

> Datomic uses Datalog as part of its query language, but that's pretty much where the comparison should end. Things like "treating the database as a value", and features such as data accretion that Rich talks about have nothing to do with Datalog.

I'm not sure I totally agree with this. I think that apart from talking about the design of Datomic, he also has a more general point against what he calls PLOP (PLace Oriented Programming), which Datalog does address.

For example in plain Racket a value is lost if something else is put in its place:

    > (define foo 'bar)
    > (define foo 'baz)
    > foo
    'baz

In Datalog you just accrete facts:

    > (! (is foo bar))
    > (! (is foo baz))
    > (? (is foo X))
    is(foo, bar).
    is(foo, baz).

Hickey also mentions how Git doesn't do PLOP, in that it doesn't throw out your commit history (without you asking it to do so).

> The reason I mention Redis is that the HN app is very well suited to it. HN only keeps 'x' amount of data in memory. And in Redis the data lives in memory. Also Redis allows you to set expiry times on data for auto eviction [6].

Interesting. Just checked news.arc, and yes `initload*` is set to 15000. Expiry times are an interesting idea from Redis; I'll check it out. I hadn't considered the scenario of storing enough text to max out memory, because it would probably be premature optimisation, but it's good to keep in mind. I'd like to give Redis/Rackdis a try; thanks for the suggestion. I've been hosting an Etherpad Lite instance, and Redis was painless to set up.

> I'm pointing this out because it seems to me that you're doing (or are going to be doing) a lot of work that may not be worth it for what you're trying to accomplish.

Yes, my priorities here are definitely to make the code as brief and simple as possible, and to not have to do too much work. With plain Datalog it's very little work to timestamp a fact, and it's also kind of necessary, e.g. to figure out which fact is most recent, when previous facts are not removed. I'm just trying to get the gist of Hickey's ideas here.
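As a rough illustration of timestamping accreted facts, here is a Python sketch in which nothing is ever overwritten and the "current" value is simply the most recent fact. The tuple layout and function names are assumptions for the example; neither Racket's Datalog nor Datomic stores facts this way.

```python
def assert_fact(log, entity, attribute, value, t):
    """Accrete a fact: append it with its timestamp, never overwrite older ones."""
    log.append((t, entity, attribute, value))


def value_as_of(log, entity, attribute, t=None):
    """Most recent value recorded at or before time t (or overall when t is None)."""
    candidates = [
        (fact_t, value)
        for (fact_t, e, a, value) in log
        if e == entity and a == attribute and (t is None or fact_t <= t)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


log = []
assert_fact(log, "foo", "is", "bar", t=1)
assert_fact(log, "foo", "is", "baz", t=2)

print(value_as_of(log, "foo", "is"))        # baz
print(value_as_of(log, "foo", "is", t=1))   # bar
print(len(log))                             # 2
```

The older fact is still in the log, so a query "as of" an earlier time can recover it, at the cost of carrying the timestamp through every query.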

-----

2 points by i4cu 2777 days ago | link

> PLOP (PLace Oriented Programming), which Datalog does address.

Yeah, I was thinking more along the lines that Datomic has built-in functionality to address the caching, cache eviction, and indexing that goes along with all that data accumulation. But you're correct, Datalog does accumulate facts.

> Interesting. Just checked news.arc, and yes `initload*` is set to 15000.

I did the same thing, about 6 or 7 years ago, that you're doing now. I ported HN to Clojure (which is actually how I learned Clojure). If memory serves me correctly when I was doing the work I realized I needed a real DB if I wanted to support load balancing. i.e. I needed to centralize the data for the authentication and fnid session info. I think Arc calls them fnids... You probably know better than I do now, but Arc has all this code to expire these session fnids and so, for me, Redis was just a good fit for that task.

Anyways, I'll be sure to take a look at the final result of your work.

Cheers.

-----

2 points by hjek 2777 days ago | link

> I did the same thing, about 6 or 7 years ago, that you're doing now. I ported HN to Clojure (which is actually how I learned Clojure).

Cool!

> I needed to centralize the data for the authentication and fnid session info. I think Arc calls them fnids... You probably know better than I do now, but Arc has all this code to expire these session fnids and so, for me, Redis was just a good fit for that task.

The Racket web server is quite "batteries included" and comes with these different managers[0] for dealing with expiration of sessions/continuations, such as the LRU manager:

> The memory limit is set to `memory-threshold` bytes. Continuations start with 24 life points. Life points are deducted at the rate of one every 10 minutes, or one every 5 seconds when the memory limit is exceeded. Hence the maximum life time for a continuation is 4 hours, and the minimum is 2 minutes.

> If the load on the server spikes—as indicated by memory usage—the server will quickly expire continuations, until the memory is back under control. If the load stays low, it will still efficiently expire old continuations.

[0]: https://docs.racket-lang.org/web-server/servlet.html?q=respo...
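The lifetimes in the quoted LRU manager description follow directly from the life-point numbers. A quick check of the arithmetic, using only the figures from the quote (the variable names are mine):

```python
LIFE_POINTS = 24                    # each continuation starts with 24 life points
NORMAL_SECONDS_PER_POINT = 10 * 60  # one point deducted every 10 minutes
PRESSURE_SECONDS_PER_POINT = 5      # one point every 5 seconds over the memory limit

max_lifetime_hours = LIFE_POINTS * NORMAL_SECONDS_PER_POINT / 3600
min_lifetime_minutes = LIFE_POINTS * PRESSURE_SECONDS_PER_POINT / 60

print(max_lifetime_hours)    # 4.0
print(min_lifetime_minutes)  # 2.0
```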

-----

2 points by i4cu 2777 days ago | link

> If the load on the server spikes...

When I was referring to load balancing and centralizing the data I was referring to many web servers sharing a centralized/external source for auth/session data.

I'm unfamiliar with Racket's web server 'servlets'. The docs are a little unclear (at least to me). Can these servlets live on a separate server so that the data can be shared between web servers? I'm guessing that was/is not a requirement for you, but I'm just interested in knowing if that's how it can work.

Uh oh, you're getting me interested in Racket now. I can't have that... I have too many projects :)

edit: I guess at the end of the day these servlets are web servers, right? So you can, even if you have to do it over HTTP and build an API.

-----

2 points by hjek 2777 days ago | link

> Can these servlets live on a separate server so that the data can be shared between web servers?

Probably. I assume that serializable continuations[0] from stateless servlets can just be stored wherever, like in Redis or something, instead of in the memory of one server.

> I ported HN to Clojure

If that is something you have published, it'd be fun to see, whether it's finished or not.

> Uh oh, you're getting me interested in Racket now.

My impression is that Clojure is faster, less verbose (partly due to clever syntax), and provides more immutable data structures than Racket. But when it comes to documentation and error messages, I find Racket more coherent and comprehensible.

Say, if I wanted to connect to a SQL database, with Racket I'd use the DB module[1], end of discussion. But with Clojure there's Korma, ClojureQL, Persist, HoneySQL, Yesql, a JDBC wrapper from Clojure contrib, SQLingvo, oj, Suricatta, aggregate, Hyperion, HugSQL, and probably a few more[2][3]. That multitude of libraries with similar purpose may be useful in some cases, sure, but also potentially a bit overwhelming for beginners, so I guess that's why I found it easier to get started with Racket.

[0]: https://docs.racket-lang.org/web-server/stateless.html#%28pa...

[1]: https://docs.racket-lang.org/db/

[2]: https://stackoverflow.com/questions/294802/use-a-database-wi...

[3]: https://adambard.com/blog/clojure-sql-libs-compared/

-----

2 points by i4cu 2776 days ago | link

> If that is something you have published, it'd be fun to see, whether it's finished or not.

I actually tried to dig it out the other day during this conversation, but it's buried somewhere unavailable right now. If I find it or get to it, I'll post.

> That multitude of libraries with similar purpose may be useful in some cases, sure, but also potentially a bit overwhelming for beginners, so I guess that's why I found it easier to get started with Racket.

Agreed. Navigating the volume of libraries and the options available is a real pain in the beginning, but once you get past that, then it's not bad at all. At the same time, take a look at the quality of Clojure's Redis Carmine Library vs. Racket's Redis Libraries. Miles apart.

To each their own, right :)

-----

3 points by hjek 2768 days ago | link

> take a look at the quality of Clojure's Redis Carmine Library vs. Racket's Redis Libraries.

Good point.

Anyways, I'm going with SQLite. It's really fast[0] and I just found out about recursive selects[1].

[0]: https://www.sqlite.org/fasterthanfs.html

[1]: https://sqlite.org/lang_with.html
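For anyone curious what a recursive select looks like for a comment tree, here is a small sketch using Python's built-in `sqlite3` module. The `item` table, its columns and the sample rows are assumptions for illustration, not the schema of this project; `WITH RECURSIVE` itself is standard SQLite (3.8.3 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        id     INTEGER PRIMARY KEY,
        parent INTEGER REFERENCES item(id),  -- NULL for top-level stories
        text   TEXT
    );
    INSERT INTO item VALUES (1, NULL, 'story');
    INSERT INTO item VALUES (2, 1,    'reply to story');
    INSERT INTO item VALUES (3, 2,    'reply to reply');
    INSERT INTO item VALUES (4, NULL, 'another story');
""")


def descendants(conn, root_id):
    """Return the ids of all replies below root_id, at any depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE thread(id) AS (
            SELECT id FROM item WHERE parent = ?      -- direct replies
            UNION ALL
            SELECT item.id FROM item
            JOIN thread ON item.parent = thread.id    -- replies to those replies
        )
        SELECT id FROM thread ORDER BY id
        """,
        (root_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(descendants(conn, 1))  # [2, 3]
print(descendants(conn, 4))  # []
```

The base case and the recursive step mirror the two `ancestor` rules from the Datalog example earlier in the thread.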

-----