2025-08-02.md

August 2, 2025 · View on GitHub

Comparing Interpreted Datalogs

The Minnowbrook Logic Programming Seminar now has a web page, and on it are other blog posts about the event! Michael Greenberg's post in particular calls out one recurring theme: "Everyone beats Soufflé". Folks working in the Datalog space compare to Soufflé, and they don't usually lose (of course, they keep programming until they don't). I should stress, as Michael does, that the point isn't that Soufflé is bad, a soft and fluffy target so to say, as much as that Soufflé is pervasive and useful. This take is part of why I spun up the datatoad work: to make something useful and useable foremost, rather than primarily fast and lean.

So ignoring that line of thought entirely, how fast is datatoad compared to Soufflé?

Soufflé provides both an interpreted and compiled Datalog, and which has become something of a standard. In my dated experience, it was pretty good but had some quirks. You often need to specify plans as hints outside of Datalog, for example, otherwise it could go off the rails.

We've spent a bit of time beating up on Graspan, and it feels like it is time to graduate to Soufflé.

Many years ago, I wrote a paper called PINQ: Private integrated queries, which showed how to build differentially private algorithms using a little language. I was pleased, but soon thereafter papers started showing up announcing how PINQ was limited, not good enough, had problems. I was heartbroken.

I remained cross for a while, but eventually realized that this was how you know you have made it in academia. Once your work is the target that others are aiming at, of course they are going to announce its limitations; they don't get a publication otherwise. They want a publication, and your work is what is in their way. Moreover, everyone now agrees that your work is how you measure forward progress. I felt a bit better.

Then I realized most of my prior work hadn't had much impact at all.

Soufflé has mattered, is why folks compare against it. The repo's list of features calls out nine publications. No one compares against datafrog. With that in mind, our goal isn't to murderize Soufflé, as much as see how datatoad compares against a reasonable reference point.

The GALEN project

I have a collection of data from the GALEN project, whose only URL that I had is now dead. The query is non-trivial, with some cyclic rules, and relations with more than two variables (bye, Graspan). The rules look like so:

p(?x,?z) :- p(?x,?y), p(?y,?z).
p(?x,?z) :- p(?y,?w), u(?w,?r,?z), q(?x,?r,?y).
p(?x,?z) :- c(?y,?w,?z), p(?x,?w), p(?x,?y).
q(?x,?r,?z) :- p(?x,?y), q(?y,?r,?z).
q(?x,?q,?z) :- q(?x,?r,?z), s(?r,?q).
q(?x,?e,?o) :- q(?x,?y,?z), r(?y,?u,?e), q(?z,?u,?o).

I couldn't tell you what the rules mean, but there's a paper that goes in to detail about them.

The underlying facts are not especially large. I wrote a more general file loader, and with them loaded in the count of their numbers of facts is.

> .list
        c:      230161
        p:      353216
        q:      295448
        r:      385
        s:      1006
        u:      96336
64.375µs

>

At this point, we can copy/paste those rules in and see how long it takes!

> p(?x,?z) :- p(?x,?y), p(?y,?z).p(?x,?z) :- p(?y,?w), u(?w,?r,?z), q(?x,?r,?y).p(?x,?z) :- c(?y,?w,?z), p(?x,?w), p(?x,?y).q(?x,?r,?z) :- p(?x,?y), q(?y,?r,?z).q(?x,?q,?z) :- q(?x,?r,?z), s(?r,?q).q(?x,?e,?o) :- q(?x,?y,?z), r(?y,?u,?e), q(?z,?u,?o).
191.918791375s

>

This is a fair bit longer than the times we've previously seen with the Graspan workloads. Perhaps this will be a new reference problem that we can plumb for new optimizations! Hold that thought.

Soufflé, Interpreted

We can compare this with Soufflé's interpreted mode.

mcsherry@gallustrate input % time souffle ../query.dl
souffle ../query.dl  130.64s user 0.98s system 99% cpu 2:12.10 total

We are within striking distance already, which is great news!

Soufflé also has a compiled mode, which can go substantially faster. I've evaluated it before, but attempting the same workflow on my current laptop produces a wall of C++ errors. Soufflé itself is compiled from source, so the tools can work, but something about how they are held goes sideways. I'm not in a position to diagnose and fix those, but if you are and want to, amazing! The Soufflé interpreted numbers above are already faster than the compiled code run five years ago (on an older, but more infamous laptop). But let's imagine that the compiled numbers could be a fair bit faster still. We'll want to markedly out-perform interpreted Soufflé, if we can!

So, we were within striking distance. Unfortunately (for me) Bernhard Scholz, a core person behind Soufflé, provided an optimized form of the query for which Soufflé performs better:

p(?x,?z) :- p(?x,?y), p(?y,?z).  .plan 1:(2, 1)
p(?x,?z) :- u(?w,?r,?z), p(?y,?w), q(?x,?r,?y).
p(?x,?z) :- c(?y,?w,?z), p(?x,?w), p(?x,?y).
q(?x,?r,?z) :- q(?y,?r,?z),  p(?x,?y).  .plan 1:(2, 1)
q(?x,?q,?z) :- q(?x,?r,?z),s(?r,?q).
q(?x,?e,?o) :- q(?x,?y,?z),r(?y,?u,?e),q(?z,?u,?o). .plan 1:(2,3,1)

You can see some of the .plan directives in there, and .. I don't know what they mean but I'm happy to run it.

mcsherry@gallustrate input % time souffle ../query2.dl
souffle ../query2.dl  81.02s user 0.65s system 99% cpu 1:21.94 total

That's a pretty good improvement, from 130s to 80s. It certainly makes our life a bit harder.

Looking Closer (Datatoad)

A quick examination of the datatoad execution reveals .. our join planner went totally off the rails. At least, when provided with rules written with the intent of left-to-right evaluation, it .. does something else entirely. This is on me, and my mediocre join planner. We can recover the intended behavior with some quick query rewrites, though.

Three of the rules are binary joins, and there's no planning flexbility there.

p(?x,?z) :- p(?x,?y), p(?y,?z).
q(?x,?r,?z) :- p(?x,?y), q(?y,?r,?z).
q(?x,?q,?z) :- q(?x,?r,?z), s(?r,?q).

The other three rules have three terms each, and if we just want to force their left-to-right evaluation we can write them as specific binary joins. The left-to-write evaluation order should be:

ir4(?y,?r,?z) :- p(?y,?w), u(?w,?r,?z).
p(?x,?z) :- ir4(?y,?r,?z), q(?x,?r,?y).

ir5(?x,?y,?z) :- c(?y,?w,?z), p(?x,?w).
p(?x,?z) :- p(?x,?y), ir5(?x,?y,?z).

ir6(?z,?u,?x,?e) :- q(?x,?y,?z), r(?y,?u,?e).
q(?x,?e,?o) :- ir6(?z,?u,?x,?e), q(?z,?u,?o).

Let's pop these into datatoad.

> p(?x,?z) :- p(?x,?y), p(?y,?z).q(?x,?r,?z) :- p(?x,?y), q(?y,?r,?z).q(?x,?q,?z) :- q(?x,?r,?z),s(?r,?q).ir4(?y,?r,?z) :- p(?y,?w),u(?w,?r,?z).p(?x,?z) :- ir4(?y,?r,?z),q(?x,?r,?y).ir5(?x,?y,?z) :- c(?y,?w,?z),p(?x,?w).p(?x,?z) :- p(?x,?y),ir5(?x,?y,?z).ir6(?z,?u,?x,?e) :- q(?x,?y,?z),r(?y,?u,?e).q(?x,?e,?o) :- ir6(?z,?u,?x,?e),q(?z,?u,?o).
36.763722042s

>

Ok. 37s compares pretty well even to 80s, for the hand-optimized Soufflé. I did hand-de-optimize the query above, so there's no reason to feel especially clever. I feel a bit the opposite, actually.

Closer Still (Datatoad)

If we look at what happened with our query, specifically the intermediate results, we see

> .list
        .temp-0-2:      7560179
        .temp-1-3:      7560179
        .temp-2-3:      16595494
        .temp-3-3:      7560179
        .temp-4-2:      16595494
        .temp-4-3:      5898127
        .temp-5-2:      7560179
        .temp-5-3:      230161
        .temp-7-3:      16595494
        .temp-8-2:      16595494
        .temp-8-3:      19831125
        c:      230161
        ir3:    5898127
        ir4:    28388317
        ir6:    19831125
        p:      7560179
        q:      16595494
        r:      385
        s:      1006
        u:      96336
155.292µs

>

That is a lot of intermediate results! You might also notice that many of the intermediate results have the same size. They are mostly several different layouts of p and q, which get used a lot but not always in the order they are listed by default. We can improve this!

The main thing we want to do is write queries whose atoms start with equated terms, in the same order. This is what allows us to use the collections as written, without laying them out anew. For example, when we look at

p(?x,?z) :- p(?x,?y), p(?y,?z).

we can see that the p(?x,?y) atom will end up equating ?y, its second term. That's not great, and it will result in a new intermediate result just to spin around the terms. We can do this manually, creating a p_10 that has the terms in the order of the subscript:

p_10(?y, ?x) :- p(?x, ?y).

Careful inspection also reveals that we'd like q in a different order as well:

q_120(?b,?c,?a) :- q(?a,?b,?c).

With these two alternate layouts, we can rewrite their uses throughout the query:

p(?x,?z) :- p_10(?y,?x), p(?y,?z).
q(?x,?r,?z) :- p_10(?y,?x), q(?y,?r,?z).
q(?x,?q,?z) :- q_120(?r,?z,?x), s(?r,?q).

ir4(?r,?y,?z) :- p_10(?w,?y), u(?w,?r,?z).
p(?x,?z) :- ir4(?r,?y,?z), q_120(?r,?y,?x).

ir5(?x,?y,?z) :- c(?y,?w,?z), p_10(?w,?x).
p(?x,?z) :- p(?x,?y), ir5(?x,?y,?z).

ir6(?z,?u,?x,?e) :- q_120(?y,?z,?x), r(?y,?u,?e).
q(?x,?e,?o) :- ir6(?z,?u,?x,?e), q(?z,?u,?o).

And we can re-run the query

> p_10(?y, ?x) :- p(?x, ?y).q_120(?b, ?c, ?a) :- q(?a,?b,?c).p(?x,?z) :- p_10(?y,?x), p(?y,?z).q(?x,?r,?z) :- p_10(?y,?x), q(?y,?r,?z).q(?x,?q,?z) :- q_120(?r,?z,?x),s(?r,?q).ir4(?r,?y,?z) :- p_10(?w,?y),u(?w,?r,?z).p(?x,?z) :- ir4(?r,?y,?z),q_120(?r,?y,?x).ir5(?x,?y,?z) :- c(?y,?w,?z),p_10(?w,?x).p(?x,?z) :- p(?x,?y),ir5(?x,?y,?z).ir6(?z,?u,?x,?e) :- q_120(?y,?z,?x),r(?y,?u,?e).q(?x,?e,?o) :- ir6(?z,?u,?x,?e),q(?z,?u,?o).
30.007797916s

>

> .list
        .temp-10-2:     16595494
        .temp-10-3:     19831125
        .temp-7-3:      230161
        c:      230161
        ir4:    5898127
        ir5:    28388317
        ir6:    19831125
        p:      7560179
        p_10:   7560179
        q:      16595494
        q_120:  16595494
        r:      385
        s:      1006
        u:      96336
126.917µs

>

It seems like we still have some intermediate results going on here, and a temporary copy of q for some reason. It turns out the reason is from this rule:

q(?x,?e,?o) :- ir6(?z,?u,?x,?e),q(?z,?u,?o).

Looks great, right? Starts with ?z,?u in both terms, so .. no reason to rearrange the terms is there?

Unfortunately, somewhere along the way the cleverness of the join planner decides that we should really have variables in alphabetical order. The order it wants is ?u,?z. That's .. not very helpful, and rather than fix the code, I'm just going to change the rule to

q(?x,?e,?o) :- ir6(?az,?bu,?x,?e),q(?az,?bu,?o).

How clever was that?

The new rule runs a fair bit faster:

> p_10(?y, ?x) :- p(?x, ?y).q_120(?b, ?c, ?a) :- q(?a,?b,?c).p(?x,?z) :- p_10(?y,?x), p(?y,?z).q(?x,?r,?z) :- p_10(?y,?x), q(?y,?r,?z).q(?x,?q,?z) :- q_120(?r,?z,?x),s(?r,?q).ir4(?r,?y,?z) :- p_10(?w,?y),u(?w,?r,?z).p(?x,?z) :- ir4(?r,?y,?z),q_120(?r,?y,?x).ir5(?x,?y,?z) :- c(?y,?w,?z),p_10(?w,?x).p(?x,?z) :- p(?x,?y),ir5(?x,?y,?z).ir6(?z,?u,?x,?e) :- q_120(?y,?z,?x),r(?y,?u,?e).q(?x,?e,?o) :- ir6(?az,?bu,?x,?e),q(?az,?bu,?o).
25.173910125s

>

and we can confirm that there is just one temporary result, which is c in a different order.

> .list
        .temp-7-3:      230161
        c:      230161
        ir4:    5898127
        ir5:    28388317
        ir6:    19831125
        p:      7560179
        p_10:   7560179
        q:      16595494
        q_120:  16595494
        r:      385
        s:      1006
        u:      96336
80.083µs

>

The c relation isn't very big, and it's only used once anyhow, so no great justification to reorder the terms with a new name.

Conclusions

We got down to 25s with interpreted Datalog. We successfully went some amount faster than interpreted Soufflé, and like a good academic I couldn't figure out how to use the performance optimized version of Soufflé and just excluded it from comparison. To be honest, I wish I could have compared, as compiled Soufflé is a reasonable target for interpreted datatoad, which aims to boil away interpretetation overhead.

I did try a few StackOverflow suggestions, but they didn't work. =/

I think there are a few conclusions here, not least that writing a query optimizer is serious business, and getting it wrong can lead to substantial performance regressions. Had we simply implemented the rules left-to-right as written, we would have started with the 37s number. I'm hoping to use this to motivate some worst-case-optimal join work in the near future.

Another conclusion is that interpreted logic doesn't have to be slow. There's a longer post about this, but datatoad is written to be as SPMD as possible, performing blocks of work at a time that can take a beat to notice that e.g. the data are four-byte aligned. The computational kernel does not need to do continual interpretation, and can mostly just go fast.

A final conclusion is that Soufflé is still relevant! I don't have another interpreted Datalog to compare against, nor a compiled Datalog that is as easy to use (or, was as easy to use, back when it worked for me). Ideally we might fix that, and had I done all of this ahead of the Minnowbrook meet-up, I would be sharing this work and stirring up trouble there.