Planet Scheme

Thursday, December 4, 2025

Peter Bex

Trustworthy software through non-profits?

I feel a change is happening in how people produce and (want to) consume software, and I want to give my two cents on the matter.

It has become more mainstream to see people critical of "Big Tech". Enshittification has become a familiar term even outside the geek community. Obnoxious "AI" features that nobody asked for get crammed into products. Software that spies on its users is awfully common. Software updates have started crippling existing features, or have deliberately stopped being available, so more new devices can be sold. Finally, it is increasingly common to get obnoxious ads shoved in your face, even in software you have already paid for.

In short, it has become hard to really trust software. It often does not act in the user's best interest. At the same time, we are entrusting software with more and more of our lives.

Thankfully, new projects are springing up which are using a different governance model. Instead of a for-profit commercial business, there is a non-profit backing them. Some examples of more or less popular projects:

Some of these are older projects, but there seems to be something in the air that is causing more projects to move to non-profit governance, and more people to choose them.

As I was preparing this article, I saw an announcement that ghostty now has a non-profit organisation behind it. At the same time, I see more reports from developers leaving GitHub for Codeberg, and in the mainstream more and more people are switching to Signal.

Why free and open source software is not enough

From a user perspective, free software and open source software (FOSS) has advantages over proprietary software. For instance, you can study the code to see what it does. This alone can deter manufacturers from putting in user-hostile features. You can also remove or change what you dislike or add features you would like to see. If you are unable to code, you can usually find someone else to do it for you.

Unfortunately, this is not enough. Simply having the ability to see and change the code does not help when the program is a web service. Network effects will ensure that the "main instance" is the only viable place to use this; you have all your data there, and all your friends are there. And hosting the software yourself is hard for non-technical people. Even highly technical people often find it too much of a hassle.

Also, code can be very complex! Often, only the team behind it can realistically further develop it. This means you can run it yourself, but still are dependent on the manufacturer for the direction of the product. This is how you get, for example, AI features in GitLab and ads in Ubuntu Linux. One can technically remove or disable those features, but it is hard to keep such a modified version (a fork) up with the manufacturer's more desirable changes.

The reason is that the companies creating these products are still motivated by profit and increasing shareholder value. As long as the product still provides (enough) value, users will put up with misfeatures. The (perceived) cost of switching is too high.

Non-profit is not a panacea

Let us say a non-profit is behind the software. It is available under a 100% FOSS license. Then there are still ways things can go south. I think this happens most commonly if the funding is not in order.

For example, Mozilla is often criticised for receiving funding from Google. In return, it uses Google as the default search. To make it less dependent on Google, Mozilla acquired Pocket and integrated it into the browser. It also added ads on the home screen. Both of these actions have also been criticized. I do not want to pick on Mozilla (I use Firefox every day). It has clearly been struggling to make ends meet in a way that is consistent with its goals and values.

I think the biggest risk factor is (ironically) if the non-profit does not have a sustainable business model and has to rely on funding from other groups. This can compromise the vision, like in Mozilla's case. For web software, the obvious business model is a SaaS platform that offers the software. This allows the non-profit to make money from the convenience of not having to administer it yourself.

There is another, probably even better, way to ensure the non-profit will make good decisions. If the organization is democratically led and open for anyone to become a member, like Codeberg e.V. is, it can be steered by the very users it serves. This means there is no top-down leadership that may make questionable decisions. Many thanks to Technomancy for pointing this out.

What about volunteer driven efforts?

Ah, good old volunteer driven FOSS. Personally, I prefer using such software in general. There is no profit motive in sight and the developers are just scratching their own itch. Nobody is focused on growth and attracting more customers. Instead, the software does only what it has to do with a minimum of fuss.

I love that aspect, but it is also a problem. Developers often do not care about ease of use for beginners. Software like this is often a power tool for power users, with lots of sharp edges. Perfect for developers, not so much for the general public.

More importantly, volunteer driven FOSS has other limits. Developer burn-out happens more than we would like to admit, and for-profit companies tend to strip-mine the commons.

There are some solutions available for volunteer-driven projects. For example, Clojurists Together, thanks.dev, the Apache Foundation, the Software Freedom Conservancy and NLnet all financially support volunteer-driven projects. But it is not easy to apply to these, and volunteer-driven projects are often simply not organized in a way that lets them receive money.

Conclusion

With a non-profit organisation employing the maintainers of a project, there is a better guarantee of continuity. It can also ensure that the "boring" but important work gets done: good interface design, documentation, customer support. All that good stuff. If there are paying users, I expect that you get some of the benefits of corporate-driven software and fewer of the drawbacks.

That is why I believe these types of projects will be the go-to source for sustainable, trustworthy software for end-users. I think it is important to increase awareness about such projects. They offer alternatives to Big Tech software that are palatable to non-technical users.

by Peter Bex at Thursday, December 4, 2025

Tuesday, December 2, 2025

Idiomdrottning

Scheme Do

I thought the do in Scheme was a li’l hard to learn since the examples I could find were a li’l too fancy and clever. Just like a lot of my own documentation often is; sorry about that.

(do ((a 0 (add1 a))
     (e 13)
     (i 7 (sub1 i)))
    ((zero? i) (print "ended ") (print a))
  (print "game cube")
  (print e)
  (print  a " wii"))

The first argument is a list of bindings, like a let. But if you put in three things, the variable will be rebound every round. Like in the above example, a will be rebound to (add1 a), and i will be rebound to (sub1 i). And you can also leave out the step: the example here just binds e to 13 and then it just stays that way.

The second argument is a list that starts with the ending condition, and the rest of the list are expressions that will be evaled after the loop ends.

And then all the remaining arguments are evaled every round.

Yes, it is backwards that the ending stuff comes before the main progn body.
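
For comparison, that first do example corresponds roughly to this named let (my own hand-expansion, so treat it as approximate):

(let loop ((a 0) (e 13) (i 7))
  (if (zero? i)
      (begin (print "ended ") (print a))
      (begin
        (print "game cube")
        (print e)
        (print a " wii")
        (loop (add1 a) e (sub1 i)))))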

Here’s a more basic example to just print happy birthday four times:

(do ((i 4 (sub1 i)))
    ((zero? i))
  (print "happy birthday"))

Although for simple things like that, Chicken has dotimes:

(dotimes (i 4) (print "happy birthday"))

Both do and dotimes are pretty side-effects oriented but in both cases you can put in a return value:

(do ((i 4 (sub1 i)))
    ((zero? i) 'my-return-value)
  (print "happy birthday"))

(dotimes (i 4 'my-return-value) (print "happy birthday"))

And that return value can access your do scoped binds.
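
For example, this counts up while consing, and the result expression reads the acc binding (a small sketch; add1 is Chicken’s):

(do ((i 0 (add1 i))
     (acc '() (cons i acc)))
    ((= i 4) (reverse acc))) ; => (0 1 2 3)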

I refer to my own fold write-up all the time (the folds as named lets version hasn’t been as useful) and maybe with this, I can use do and dotimes more instead of making let loops and consing up unnecessary iotas.

by Idiomdrottning (sandra.snan@idiomdrottning.org) at Tuesday, December 2, 2025

Monday, December 1, 2025

Idiomdrottning

Advent of Code, 2025

Here is where I’ll post my solutions to Advent of Code using zshbrev. Spoilers ahead, and no promises that I’ll make it through the entire 12 days.

Dec 1st

(define (turn)
  (define-parameters zero 0)
  (fold
   (fn
    (with-result
     (when
         (zero?
          (save
           (modulo
            (+ (string->number
                (strse x "L" "-" "R" "")) y) 100)))
       (zero (add1 (zero))))))
   50
   (read-lines))
  (zero))

Last time I attempted Advent of Code, I got tangled up modifying the step one solutions to handle step two and then I ended up wanting to revisit step one but they were gone. So this year I’m going to try to paste a second copy before modifying and I hope that works out better.

(define (dial acc r d)
  (with (+ d r)
    (when
        (or
         (zero? it)
         (= 100 it)
         (< it 0 d)
         (< d 100 it))
      (acc))
    (modulo it 100)))

(define (dial acc (? (fn (< x -100)) r) d)
  (acc)
  (dial acc (+ r 100) d))

(define (dial acc (? (fn (< 100 x)) r) d)
  (acc)
  (dial acc (- r 100) d))

(define (dial acc (? string? x) y)
 (dial acc (string->number (strse x "L" "-" "R" "")) y))

(define (turn)
  (define-parameters zero 0)
  (fold (c dial (fn (zero (add1 (zero))))) 50 (read-lines))
  (zero))

This was one of the hardest bugs I’ve ever had to debug. I’ve had to write log parsers, differs between different logs from different versions, multiple implementations to check against each other, Emacs highlight-regexp and count-matches and so on. It took me ten tries on the Advent of Code website. I got paranoid that I had mistyped my answer in there.

The line that now says (< it 0 d) originally read (<= it 0 d), but that gave false positives on rotating from zero.

For a while I had it as (and (< it 0) (<= 0 d)) which… doesn’t fix that problem at all. Even after coming up with the fix (< it 0 d), that gives false negatives on rotating exactly one rotation left from zero. But there’s no L100 in the data set? No, but my code before I cleaned it up had:

(dial acc (+ r 100)
      (dial acc -100 d))

where it now says:

(acc)
(dial acc (+ r 100) d)

…leading to lots and lots of zero-to-zero turns, which came with false positives in some versions and false negatives in others.

A complete PEBCAK on my part but the hunt for the bug became a real adventure of trying to sift through clues in logs that were thousands of lines long.

Dec 2nd

(define (valid? x) #t)

(define (valid? (= ->list x))
  (->* x (split-at (/ (require even? (length x)) 2)) equal? not))

(define (sum-not-valids-in-range
         (=
          (fn
           (map string->number (string-split x "-")))
          (current end)))
  (descend ((steps (- (add1 end) current)) current)
    (+ (if (valid? current) 0 current)
       (desc (sub1 steps) (add1 current)))))

(define (sum-not-valids)
  (->>
   (-> (read-string)
       (string-split ",\n"))
   (map sum-not-valids-in-range)
   (reduce + 0)))

After adapting that same idea to part two with a few minor tweaks, it’s too slow! Works fine with the example data but not the full input. I hate it when I have a working solution that I really like because it does something clever but have to write a whole new one that’s faster. This is a “Project Euler” type problem where I need to come up with a math solution instead of just list processing. But then I don’t really hate it, because I did come up with a good solution.

I inverted the puzzle by making an is-in-any-range? predicate: then we can generate all invalid numbers up to the ceiling and see if they’re in any range.

(define (is-in-any-range? x)
  (or
   (<= 1 x 19)
   (<= 51 x 69)
   (<= 72 x 85)
   (<= 86 x 113)
   (<= 411 x 466)
   (<= 525 x 652)
   (<= 660 x 782)
   (<= 859 x 1056)
   (<= 1626 x 1972)
   (<= 2768 x 3285)
   (<= 4002 x 4783)
   (<= 4919 x 5802)
   (<= 7025 x 8936)
   (<= 9096 x 10574)
   (<= 13004 x 15184)
   (<= 32138 x 36484)
   (<= 48548 x 61680)
   (<= 69302 x 80371)
   (<= 82984 x 100358)
   (<= 126397 x 148071)
   (<= 193276 x 237687)
   (<= 266408 x 302255)
   (<= 333117 x 414840)
   (<= 431250 x 455032)
   (<= 528410 x 680303)
   (<= 726807 x 764287)
   (<= 779543 x 880789)
   (<= 907442 x 983179)
   (<= 2558912 x 2663749)
   (<= 5117615 x 5149981)
   (<= 7702278 x 7841488)
   (<= 9231222 x 9271517)
   (<= 13413537 x 13521859)
   (<= 32295166 x 32343823)
   (<= 49829276 x 50002273)
   (<= 67606500 x 67729214)
   (<= 99990245 x 100008960)
   (<= 146086945 x 146212652)
   (<= 4747426142 x 4747537765)
   (<= 5552410836 x 5552545325)
   (<= 5858546565 x 5858614010)
   (<= 7454079517 x 7454227234)
   (<= 8764571787 x 8764598967)
   (<= 9999972289 x 10000034826)))

Okay, great! I checked that there are no overlapping ranges in this particular data set. That means we can make an idempotent summer so we don’t add the same number twice.

(define summer (memoize (call-key* proc: + initial: 0)))
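
My reading of that line (an assumption about the brev combinators, not their implementation): call-key* makes a one-argument accumulator over + starting at 0, and memoize turns repeat calls with the same number into cache hits instead of additions. In plain Scheme it behaves roughly like:

;; Hypothetical plain-Scheme stand-in for the idempotent summer.
;; Needs SRFI 69 hash tables, e.g. (import (srfi 69)).
(define summer
  (let ((seen (make-hash-table))
        (total 0))
    (case-lambda
      (() total) ; (summer) yields the running total
      ((n)
       (unless (hash-table-exists? seen n)
         (hash-table-set! seen n #t)
         (set! total (+ total n)))
       total))))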

Now for a generator. The spine is just incrementing the numbers and the ribs are repeating them.

(define roof (biggest 2558912 2663749 1 19 72 85 82984 100358 86 113
                      193276 237687 51 69 779543 880789 13004 15184 2768 3285 4002 4783
                      7702278 7841488 7025 8936 5858546565 5858614010 5117615 5149981 4919
                      5802 411 466 126397 148071 726807 764287 7454079517 7454227234 48548
                      61680 67606500 67729214 9096 10574 9999972289 10000034826 431250
                      455032 907442 983179 528410 680303 99990245 100008960 266408 302255
                      146086945 146212652 9231222 9271517 32295166 32343823 32138 36484
                      4747426142 4747537765 525 652 333117 414840 13413537 13521859 1626
                      1972 49829276 50002273 69302 80371 8764571787 8764598967 5552410836
                      5552545325 660 782 859 1056))

(define (add-all-repeats seed big-number) (void))

(define (add-all-repeats seed number)
  (with (require (c > roof) (string->number (conc number seed)))
   (when (is-in-any-range? it) (summer it))
   (add-all-repeats
    seed it)))

Let’s hard code it to five-digit numbers which is okay for this particular input.

(define (generate-the-answer)
  (do ((num 1 (add1 num)))
      ((< 100000 num) (summer))
    (add-all-repeats num num)))

Okay, that’s a relief! Today was a lot easier to debug. I originally had the summer see the numbers even before they were repeating. But that bug was easy enough to find and fix.

Dec 3rd

Okay, here we have a similar dilemma of “extracting” vs building up possible joltages and filtering for them like (strse? "9.*9"). Maybe if I start with extracting, that will still be useful as a fallback for any stragglers after a filtering solution.

(define (extract bank)
  (with (find-tail
         (is?
          (biggest (butlast bank))) bank)
    (list (car it) (biggest (cdr it)))))

(define (sum-joltages) (fold (fn (+ ((as-list extract) x) y)) 0 (read-list)))

Okay, that worked fine. I’m always remarkably bad at predicting what step two is gonna be. I feel like I’m gonna try extracting for step two also.

(define ((extract amount) bank)
  (with (find-tail
         (is?
          (biggest (drop-right bank amount))) bank)
    (cons (car it) ((extract (sub1 amount)) (cdr it)))))

(define ((extract 0) bank) (list (biggest bank)))

(define (sum-joltages) (fold (fn (+ ((as-list (extract 11)) x) y)) 0 (read-list)))

That worked! Weird feeling: Monday took all day because I was chasing a bug, and even Tuesday took more than an hour, maybe closer to three hours, but this time my idea worked right away and the solution for part 1 was also the right direction for part 2. I lucked out! And/or am actually good at programming, especially when it’s straightforward list-processing like this.

Dec 4th

This time around (it’s my second time attempting Advent of Code; I tried it in 2023 but quit before the end) I’m paying more attention to the story and I’m really getting into the Matt Groening–like shenanigans.

As for the puzzle, this type of 2D, maps-and-neighbors stuff is something I don’t have as much of a standard library for. SRFI-1 doesn’t really cover it so I’m starting more from scratch here and I’m buckling in, accepting that it might take a li’l more time and that what I write here I’ll get use out of later too. I actually thought to work a li’l bit ahead and look up some array stuff in the later SRFIs but then I didn’t have time to do that in November.

(define (count-accessible)

  (define nodes (call-list (map call-string (read-lines))))

  (define (get-node x y) (void))

  (define (get-node x y)
    (handle-exceptions exn (void)
                       ((require procedure?
                                 (nodes (require (c < -1) y)))
                        (require (c < -1) x))))

  (define (get-neighbors x y)
    (parse (c apply get-node)
           (list-ec (: dx -1 2)
                    (: dy -1 2)
                    (if (not (= 0 dy dx)))
                    (list (+ x dx) (+ y dy)))))
  (let ((width (length (nodes)))
        (height (string-length ((nodes 0)))))

    (sum-ec (: x 0 width)
            (: y 0 height)
            (if (eq? #\@ (get-node x y)))
            (if (> 4 (count (is? #\@) (get-neighbors x y))))
            1)))

Okay, I like it when it works on the first try because I hate to put in more than one guess, but this was right. Good. Also didn’t have any bugs.

Now onto part 2. I really have to give Advent of Code a stern scolding when it comes to accessibility: the dark grey text on dark grey background is really really really hard to read so I use einkbro’s light mode but that mode didn’t show the highlighted @ signs in the part 2 example. I had to toggle off the mode but then I almost can’t see anything on the screen. Bad bad elves!

But okay, I figured out from what the text says what to do.

(define (count-accessible)

  (define nodes (call-list (map call-string (read-lines))))

  (define (get-node x y) (void))

  (define (get-node x y)
    (handle-exceptions exn (void)
                       ((require procedure?
                                 (nodes (require (c < -1) y)))
                        (require (c < -1) x))))

  (define (get-neighbors x y)
    (parse (c apply get-node)
           (list-ec (: dx -1 2)
                    (: dy -1 2)
                    (if (not (= 0 dy dx)))
                    (list (+ x dx) (+ y dy)))))

  (let* ((width (length (nodes)))
         (height (string-length ((nodes 0))))
         (get-accessible
          (lambda ()
            (sum-ec (: x 0 width) (: y 0 height)
                    (if (memq (get-node x y) '(#\x #\@)))
                    (if (> 4 (count (fn
                                     (memq x '(#\x #\@)))
                                    (get-neighbors x y))))
                    (begin
                      ((nodes y) x #\x)
                      1)))))
    (descend ((accessible (get-accessible)))
      (do-ec (: x 0 width) (: y 0 height)
             (if (eq? #\x (get-node x y)))
             ((nodes y) x #\.))
      (+ accessible (desc (get-accessible))))))

Okay. That worked. No wrong entries today either which always feels great. I could spot my bugs on the example output. The bug today was that while I realized right away that I need to count X as neighbors, I forgot that I needed to count X as self too. So I was done in a li’l less than an hour (three quarters rather) which is fine. More than yesterday but that’s OK. I had to implement all this 2D neighbors stuff. I liked the idea of using parse since it just elides voids.

Dec 5th

(define ((in-ranges? ranges) ingredient)
  (any (fn (<= (first x) ingredient (second x))) ranges))

(define (count-fresh)
  (receive (ranges ingredients)
      (break number?
             (with (read-list)
               (strse* it
                       (: (=> start integer) "-" (=> end integer))
                       (conc "(" start " " end ")"))))
    (count (in-ranges? ranges) ingredients)))

Today was a real head-scratcher because it seemed to me that part 1 was a subset of December 2nd and part 2 was even easier than part one. Then I realized that the difference is that unlike December 2nd, this time our input data has overlapping ranges (something I checked for on Dec 2nd but almost forgot to do here). I’m grateful that the test input also did, or I would’ve wasted a guess on the real thing. Joining the ranges is just a SMOP once you know that it’s there.

(define (join-ranges single) single)

(define (join-ranges (and ((had hadd) (nak nadk) . tail) (hd . tl)))
  (if (<= nak hadd)
      (join-ranges (cons (list had (biggest hadd nadk)) tail))
      (cons hd (join-ranges tl))))

(define (count-fresh)
  (fold
   (fn
    (+ y 1 (second x) (- (first x)))) 0
   (join-ranges
    (sort
     (take-while
      list?
      (with (read-list)
        (strse* it
                (: (=> start integer) "-" (=> end integer))
                (conc "(" start " " end ")"))))))))

Dec 6th

(define (pivot table) (cons (map car table) (pivot (map cdr table))))

(define (pivot (? (c every null?) table)) '())

(define (cephaluate)
  (reduce + 0
          (map (o eval (c map string->read) reverse)
               (pivot
                (map string-split (read-lines))))))

Oh, wow, here’s what I’ve been dreading: an easy part 1 followed by a seemingly completely different part 2!

(define ((space-pad gl) str)
  (conc str (make-string (- gl (string-length str)) #\space)))

(define (cephaluate)
  (let* ((lines (read-lines))
         (gl (biggest (map string-length lines))))
    (reduce + 0
            ((over (eval
                    (map string->read
                         (cons* ((as-list list last) (car x))
                                ((as-list butlast) (car x))
                                (cdr x)))))
             (parse (?-> string? (fn (if (strse? x "^ +$") (values close: open:) x)))
                    (append
                     (map list->string
                          (pivot
                           (map (o string->list (space-pad gl)) lines)))
                     (list close:)))))))

I live for this convoluted maps of maps of maps of maps stuff! Very fun problem.

Uh but if I were to try to explain how my program works… Hmm. From the inside out:

Reads all lines as lines.

Adds extra spaces to the end so all lines are the same length.

Pivots the lists so columns become rows and rows become columns.

Then with Acetone’s parse I split the problems into their own lists.

I split out the operator (that’s the list last, butlast stuff) and put it first then read and eval each problem.

Then finally I sum all those answers up.

Didn’t have any bugs today. I did put in two redundant reverses that still gave me the right answer; I found them and removed them after getting the star while making this write-up.

Dec 7

(define-parameters splits 0)

(define (tachyon-count (prev current next . beams))
  ((over
    (when (and (eq? x #\S) (eq? y #\.)) (current i #\S))
    (when (and (eq? x #\S) (eq? y #\^))
      (splits (add1 (splits)))
      (next (sub1 i) #\S)
      (next (add1 i) #\S)))
   (prev) (current))
  (tachyon-count
   (cons* current next beams)))

(define (tachyon-count (last exit))
  (splits))

(define (tachyon-count)
  (tachyon-count (map call-string (read-lines))))

Now this is what I’m talking about! This is the longest I’ve spent on a part 1 so far. Even on Dec 1st, which was my longest day, part 1 wasn’t where I got stuck. Here I knew what to do, it was just tricky to keep track of everything. Now onto part two of this wonderful puzzle!

After reading part two… what a let down! It’s just the non-idempotent version. Although smopping that together on a tired-brain day like today is easier said than done.

I apologize to the makers of Advent of Code for calling their hard work a let down, it’s just that the non-idempotent “naively recursive” version is what I almost wrote by accident for part 1. I checked myself in time before making that version so actually implementing it did take some time.

(define ((list->indices pred) lis)
  (filter-map (fn (and (pred x) y)) lis (iota (length lis))))

(define (tachyon-count prev (next . beams))
  (if (memq prev next)
      (+
       (tachyon-count (sub1 prev) beams)
       (tachyon-count (add1 prev) beams))
      (tachyon-count prev beams)))

(define (tachyon-count last '()) 1)

(define (tachyon-count)
  (with
   (remove
    empty?
    (map (as-list (list->indices (complement (is? #\.)))) (read-lines)))
   (tachyon-count (caar it) (cdr it))))

(memoize! tachyon-count)

Before I thought to clean up the input it was hard to keep track of everything (I had prev, current, next, blank lines, passing through etc). And I had something that worked on the example input but was too slow for the real input. So I started over and that’s the version you see above. It introduced a bug (I forgot to pass through beams at first) which required some creative logging to find with ever-increasing indentation prefixes etc etc until the new version finally worked on the example input. But it was still too slow for the real input. And memoization fixed that and here we are. All in all an extra hour or two.

The hardest problem yet after the “breathers” of 5th and 6th, but I remember last time (2023) I was completely stumped on some problems even after spending a day with a paper notebook just thinking and thinking and so far we haven’t seen that. I remember back then having to postpone some of the stars like “Okay I’ll get back to this one later” and doing it in the evening or the next day or something and this year I’ve just done both of them in the morning except for the first day that did take all day. (And what a privilege to be able to work all day on a recreational puzzle!) Maybe it says more about how incredibly burnt out I was after the apartment move back then than about the difficulties of the puzzles.

Also this one felt more like a “knowledge test” than the previous entries. I knew about the basics of recursion vs iteration, idempotence vs shadowing, and the life-changing magic of memoization. I know about those things from books like SICP and PAIP. It’s less about me figuring out something clever and more about me having book learning. That doesn’t feel super fair.

Maybe I should take this opportunity to share some of that book learning: My part one solution went through every row once. That’s why it’s fast. It’s an iterative solution. The part two solution needs to go through every row for every beam splitter above it. It branches over three trillion times. That is too slow for even my super duper computer to figure out. But memoization, which in this case means having a hash-table that stores results it has already computed, means it doesn’t have to re-calculate subtrees it has seen before. It becomes fast again.
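
The idea in miniature (a sketch of a one-argument memoizer, not brev’s actual memoize!):

;; Wrap f so results are cached in a hash-table keyed on the argument.
;; Needs SRFI 69 hash tables, e.g. (import (srfi 69)).
(define (memoize-1 f)
  (let ((cache (make-hash-table)))
    (lambda (x)
      (if (hash-table-exists? cache x)
          (hash-table-ref cache x)
          (let ((result (f x)))
            (hash-table-set! cache x result)
            result)))))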

memoize!, the brev-separate version, does work even through match-generics (but it needs to be called after all the definitions) and with zshbrev (since the entire file is compiled).

by Idiomdrottning (sandra.snan@idiomdrottning.org) at Monday, December 1, 2025

Friday, November 28, 2025

crumbles.blog

Tour of a pattern matcher: decision trees

I’ve teased it long enough:1 today we’re going to look at the real meat of turning declarative, something-that-looks-like-this patterns into fast, running, imperative, if-this-then-that code. The absolute final generation of working Scheme code will have to wait until our thrilling conclusion in part 4, but by the end of this episode you’ll probably at least have an idea of what that will look like.

Today we’re in the (extensible-match decision-tree) library, which takes patacts (the AST representation of a pattern match clause introduced last time) and produces an optimized decision tree. It starts with a foreboding comment:

;; This is some of the hardest code I have ever had to write.

And, indeed, not since I was a tweenaged baby hacker struggling through an online tutorial on how to implement A* pathfinding for my game2 have I had so many false starts and wrong directions in implementing a known algorithm. But these were problems with working out how to translate the idea into code – the idea itself is pretty intuitive!

Motivating example

To see how we benefit from an optimizing decision tree generator, let’s look at an example not all too dissimilar from the last example we saw in the previous part.3

(define (uniq ls same?) ; remove adjacent equivalent elements from list
  (match ls
    ('()
     '())
    ((cons _ '())
     ls)
    ((cons* y x ls*)
     (if (same? y x)
         (uniq (cons x ls*) same?)
         (cons y (uniq (cons x ls*) same?))))))

For this input, a typical Lisp hacker’s first attempt at a pattern matcher implementation usually generates code that looks basically like this:

(define (uniq ls same?)
  (let ((subject ls))
    (define (clause-1)
      (if (equal? subject '())
          '()
          (clause-2)))
    (define (clause-2)
      (if (pair? subject)
          (let ((g0 (car subject)) (g1 (cdr subject)))
            (if (equal? g1 '())
                ls
                (clause-3)))
          (clause-3)))
    (define (clause-3)
      (if (pair? subject)
          (let ((g2 (car subject))
                (g3 (cdr subject)))
            (if (pair? g3)
                (let ((g4 (car g3)) (g5 (cdr g3)))
                  (let ((y g2) (x g4) (ls* g5))
                    (if (same? y x)
                        (uniq (cons x ls*) same?)
                        (cons y (uniq (cons x ls*) same?)))))
                (no-matching-clause)))
          (no-matching-clause)))
    (define (no-matching-clause)
      (assertion-violation 'match "no clause matches" subject))
    (clause-1)))

The idea is simple: compile each clause to a procedure, and tail-call the next procedure’s clause as a ‘failure continuation’ at any point where you notice that the pattern doesn’t match.

This is a completely reasonable implementation for very many, nay, probably even a majority of use cases. It’s how everyone learning to write a pattern matcher starts out, and indeed probably how every individual pattern match compiler starts out (including extensible-match!). The syntax-case pattern matcher implemented by psyntax basically works like this, because the absolute fastest matching time isn’t a priority for something that runs at expand time. Alex Shinn’s pattern matcher implementation works like this – anything more complicated would probably be hell to write in pure syntax-rules, and the sheer portability Shinn managed to achieve by writing only in syntax-rules has let his pattern matcher take over the world, inefficiencies and the occasional fudging on semantics be damned. (By contrast, SRFI 262’s dependence on identifier properties will likely hold it back from any equivalent level of world dominance for some time yet, until more Scheme implementers start to catch on to how nifty they are.)

Some of the inefficiencies in this particular code will be optimized away by the Scheme compiler. I mentioned in the first part of this series that redundant let bindings are easy to get rid of. Unfortunately, more consequential inefficiencies are harder to deal with. Most significantly, even after optimization, this code will call pair? and cdr twice every time it goes through the loop: once for the second clause, then again in the third clause. While Scheme compilers are usually clever enough to eliminate repeated type checks when they occur within a single nested tree of code, putting each check in its own little procedure messes this compiler optimization up. The exception is Guile, because Andy Wingo developed a very clever compiler pass which can recognize common prefixes across these internal procedure boundaries. Also, very simple uses – like the last example from last episode – can have the clauses inlined into one another and thus end up in a nice nested tree for the compiler to work with; but as a data point, clause-3 above is already complex enough (or complex enough and referenced from enough places) that Chez’s cp0 doesn’t bother.

This repeated calling and branching is theoretically unsatisfactory, but the real philosophical issue in these inefficiencies – the thing that really motivated me to want to do better – is specifically that this generated code is nothing like what a programmer would write in the absence of pattern matching. Nobody would ever write code which had those repeated pair? calls. The real point I want to make by spending many lines of code on trying to do something better is that pattern matching is a good idiom, and programmers should use it. To that end, pattern matching should be a usable tool even in the hottest of hot loops, even on the slowest of microcontrollers: nobody should ever feel like they have to switch back to writing out if expressions by hand for performance’s sake. The fact that pattern matching is a declarative idiom, and declarative programming has something of a reputation for being inefficient, doesn’t exactly help its reputation here.

There are mountains of research into efficient compilation of pattern matching to avoid generating silly code like this. In the usual framing, there are two approaches: the backtracking automaton and the decision tree. The backtracking automaton, as its name implies, will still sometimes run the same tests on the same subject more than once, but tries to do so less than the naïve approach shown above which re-examines everything from scratch in every clause. The decision tree, on the other hand, will never re-examine any of its inputs, but may have much larger (exponentially larger!) code size than the automaton.

In fact these two approaches aren’t so different in practice – once you run the right time-saving optimizations for generating an automaton, or the right space-saving optimizations for generating a decision tree, the results are pretty close to one another. They’ll be nearly identical on very, very many use cases; it’s in the comparatively rare cases where they fall down that they fall down in different ways.

In the Scheme and Scheme-adjacent world, Racket’s pattern matcher by Sam Tobin-Hochstadt uses an optimized backtracking automaton after the technique by Le Fessant and Maranget (2001);4 extensible-match uses a decision tree after the technique by Maranget (2008). Yep, same guy! Luc Maranget is actually far from the only compiler researcher to have written papers on optimizing both techniques; but he is the only one whose papers now seem to be regarded as the somewhat definitive introduction to the state of the art for each of the two techniques.

Generating a decision tree

Maranget’s 2008 paper is heavy on notation, but mostly explains it as it goes along. It’s pretty accessible as European academia goes: if you’re good at reading and understanding other people’s code, you should be able to understand Maranget’s paper. It’s worth a read (or take a look at a shorter explanation by Colin James or David Nolen) – I’ll explain it here as it’s implemented in extensible-match, noting some (but not all!) differences to Maranget’s original presentation.

So, our input is a list of patterns and their associated actions, one for each clause. In fact, our input is always a list of rows of patterns – remember, we’re in %core-match-lambda which matches multiple patterns against multiple values, using the row-pattern AST type to group them together. Each AST pattern (and subpattern, but we’ll get to that) represents something we can do to get closer to picking a clause which matches the subject – meaning that with the row-pattern, we have a choice of multiple potential patterns to choose from. The ?-pattern, for example, represents that we could test its subject against a predicate; the var-pattern represents renaming an internal subject variable to a pattern variable visibly bound in the clause.

Let’s say we pick one pattern from the first row, since that’s the one we have to test first. (How we pick one is a topic we’ll get to later.) We’re going to act on this pattern – let’s say a ?-pattern testing pair? against our input x. What do we need to do?

What we ultimately want to do is generate (if (pair? x) <code-if-a-pair> <code-if-not-a-pair>); this gives us a hint that we also need to work out how to fill in those two slots. Well, what goes in those two slots?

The <code-if-not-a-pair> clause is easier to understand first, because if (pair? x) is false, the whole row can’t match any more: a row is an and combination of patterns, and one of the anded subpatterns just threw up false. So we can throw out that entire row and fill in <code-if-not-a-pair> with the code generated by repeating this process on that smaller list of rows with their actions.

But wait! What if the next row also contains a ?-pattern testing pair? on x – there’s no point testing that again, because it will be false a second time as well! So apart from throwing away the first row, we also walk through the remaining rows and throw away any of them which also depend on the same test before we repeat this process to generate the <code-if-not-a-pair>.

The <code-if-a-pair> is generated by a similar walk over the rows and recursion on the result. We don’t throw away the first row because in this case, that part of it did match. Instead, we remove it from the row and from any subsequent rows which make exactly the same test, preventing generating the same test again in the case where we know it succeeded as well as the case where we know that it failed.

That’s basically the core of Maranget’s algorithm. Maranget calls generating the new rows for <code-if-a-pair> specializing the patterns, and generating the new rows for <code-if-not-a-pair> defaulting the patterns – both with respect to a particular ‘constructor’, but because Scheme is different (more below), extensible-match calls these specializers. There are two base cases of the recursion: one when there are no rows left (no patterns matched; generate the code to signal a matching failure) and one when the top row only consists of irrefutable patterns (ones like _ wildcard patterns and var-patterns, which don’t have failure cases: in this case that row matched; generate the code to activate the corresponding action).
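
In pseudo-Scheme, the recursion looks something like this (all the helper names are invented for illustration; this is not extensible-match’s actual code):

;; Sketch of the core recursion over patacts (rows of patterns paired with
;; actions). pick-specializer, specialize, default and the leaf and node
;; constructors are hypothetical stand-ins.
(define (compile-tree patacts)
  (cond ((null? patacts)
         ;; Base case 1: no rows left, so no clause can match.
         (make-failure-leaf))
        ((for-all irrefutable? (patact-patterns (car patacts)))
         ;; Base case 2: the top row is all wildcard/var-patterns, so it matches.
         (make-action-leaf (patact-action (car patacts))))
        (else
         (let ((spec (pick-specializer patacts)))
           (make-node spec
                      (compile-tree (specialize spec patacts))   ; test succeeded
                      (compile-tree (default spec patacts))))))) ; test failed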

However, there’s one type of pattern in our AST which doesn’t have a failure case, but hides more subpatterns which do: the apply-pattern which calls some procedure on its subject variable and creates more subject variable(s) for the value(s) returned. Those subject variables are then available to be the subject of a subpattern of the apply-pattern.

For his equivalent of an apply-pattern Maranget makes a whole separate set of patterns when specializing, just containing the subpatterns. This probably works better for his compiler of ML patterns than it did when I tried in my compiler of Scheme patterns: for one thing, testing type (our ?-pattern) and deconstructing into values for subpatterns (our apply-pattern) are the same thing in his implementation; since ML is statically typed and pattern matching is the primitive means of deconstruction, the compiler has a lot more knowledge about the potential range of input values and what is and isn’t possible after each deconstruction step than extensible-match can get as a Scheme macro. In particular, Maranget’s patterns after each deconstruction step always form a matrix with columns for each value and rows for each clause. That’s not guaranteed by the structure of this AST (but explains why the term ‘row’ is used in extensible-match).

So instead, when an apply-pattern has been chosen as the specializer, each specialized row simply has the subpattern of that apply-pattern spliced into it. This means that each row-pattern can have a different length, depending on which apply-patterns have been specialized and had their subpatterns spliced in!

As one last point, in order to simplify the job of the procedures which pick a specializer and specialize and default the rows on it, after this splicing takes place (or more accurately before each recursive step), any rows nested within the rows are flattened out. If there were any or patterns, they’re also expanded into multiple copies of the entire patact, each with one of the or clauses, so the decision tree generator proper doesn’t need to look for specializers specially inside of or patterns either.

Representing a decision tree

For reasons that will become more apparent below and in the next episode, a decision tree is actually another kind of intermediate representation: we don’t represent it directly as Scheme code. Here’s its definition, slightly simplified:

(define-record-type dt-node
  (fields success-branch failure-branch))
(define-record-type dt-test
  (fields proc var)
  (parent dt-node))
(define-record-type dt-apply
  (fields proc var vars)
  (parent dt-node))
(define-record-type dt-equal
  (fields val var)
  (parent dt-node))
(define-record-type dt-rename
  (fields internal external)
  (parent dt-node))

Immediately it’s clear how much simpler this is than the AST we got as input: only four concrete node types vs nine pattern types in the AST.5 The action type from the AST also appears in decision trees, as the leaf nodes, where a matching clause has been chosen or the match failed.

?-patterns, when chosen, become dt-test nodes; apply-patterns become dt-apply nodes; quote-patterns become dt-equal nodes; var-patterns become dt-rename nodes. and-patterns, or-patterns, row-patterns, and not-patterns are reflected in the structure of the tree rather than as nodes themselves. wildcard-patterns are no-ops, as far as the pattern match compiler is concerned, and don’t show up in the tree at all.

Step by step

Here’s a worked example. Let’s take the last example from the previous episode:

(define (last ls) ; returns the last element in a list
  (match ls
    ((cons x '())  x)
    ((cons _ ls*)  (last ls*))))

Our AST for this pattern match expression looks like this (in an entirely hypothetical notation):

((patact (row (and (? #:subject ls #:predicate pair?)
                   (row (apply #:subject    ls
                               #:procedure  car
                               #:vars       (ls.car)
                               #:subpattern (row (var #:subject ls.car
                                                      #:name x)))
                        (apply #:subject    ls
                               #:procedure  cdr
                               #:vars       (ls.cdr)
                               #:subpattern (row (quote #:subject ls.cdr
                                                        #:datum ()))))))
         (action return x))
 (patact (row (and (? #:subject ls #:predicate pair?)
                   (row (apply #:subject    ls
                               #:procedure  car
                               #:vars       (ls.car)
                               #:subpattern (row (wildcard #:subject ls.car)))
                        (apply #:subject    ls
                               #:procedure  cdr
                               #:vars       (ls.cdr)
                               #:subpattern (row (var #:subject ls.cdr
                                                      #:name ls*))))))
         (action do-last ls*)))

We look at the first row, which only has one subpattern and thus only one potential specializer to pick: (? #:subject ls #:predicate pair?). So we generate a dt-test node which checks if ls is a pair.

In the success branch of this dt-test node, we put the result of re-running this algorithm on all of the patacts with this specializer applied. We don’t need the pair? any more in either row, so it will disappear and be replaced by the right-hand side of the and pattern which contains it. This in turn is a row in both cases; we splice it into the main row of each patact. Our specialized patacts now look like this:

((patact (row (apply #:subject    ls
                     #:procedure  car
                     #:vars       (ls.car)
                     #:subpattern (row (var #:subject ls.car
                                            #:name x)))
              (apply #:subject    ls
                     #:procedure  cdr
                     #:vars       (ls.cdr)
                     #:subpattern (row (quote #:subject ls.cdr
                                              #:datum ()))))
         (action return x))
 (patact (row (apply #:subject    ls
                     #:procedure  car
                     #:vars       (ls.car)
                     #:subpattern (row (wildcard #:subject ls.car)))
              (apply #:subject    ls
                     #:procedure  cdr
                     #:vars       (ls.cdr)
                     #:subpattern (row (var #:subject ls.cdr
                                            #:name ls*))))
         (action do-last ls*)))

As for the failure branch of dt-test: we know that the first patact can’t match any more just because this single test failed. We have to default on the second patact only – and that also required (pair? ls) to be true. So our defaulted patacts in this case are just the empty list, and the recursion on that will generate the leaf node to signal an exception because no clause matched that input.

Back on the success branch, we still have these two patacts, but this time the first row gives us a choice of what to do. We could pick the leftmost one – the car – but actually the car isn’t very interesting because it’s not refutable. The cdr, on the other hand, can fail, so let’s prioritize it. We generate a dt-apply binding the cdr of ls to ls.cdr; it only needs a success branch since we assume this can’t fail. So we specialize again, and the specialized rows, after splicing, look like this:

((patact (row (apply #:subject    ls
                     #:procedure  car
                     #:vars       (ls.car)
                     #:subpattern (row (var #:subject ls.car
                                            #:name x)))
              (quote #:subject ls.cdr
                     #:datum ()))
         (action return x))
 (patact (row (apply #:subject    ls
                     #:procedure  car
                     #:vars       (ls.car)
                     #:subpattern (row (wildcard #:subject ls.car)))
              (var #:subject ls.cdr
                   #:name ls*))
         (action do-last ls*)))

Still two patterns in the top row, and again, the quote pattern looking at ls.cdr is still considerably more interesting than the apply pattern, so we pick that as the specializer again. We generate a dt-equal node checking if ls.cdr is the empty list. This one does need success and failure branches.

On the success branch, the specialized patterns look like this:

((patact (row (apply #:subject    ls
                     #:procedure  car
                     #:vars       (ls.car)
                     #:subpattern (row (var #:subject ls.car
                                            #:name x))))
         (action return x))
 (patact (row (apply #:subject    ls
                     #:procedure  car
                     #:vars       (ls.car)
                     #:subpattern (row (wildcard #:subject ls.car)))
              (var #:subject ls.cdr
                   #:name ls*))
         (action do-last ls*)))

We have completely eliminated the cdr patterns from the top row! Only the irrefutable car pattern is left, and when the first row is irrefutable, the pattern has matched – so we generate the dt-apply node and its dt-rename child and have it point to the action which returns x. Done!

On the failure branch of the check for a null ls.cdr, we’ve thrown away the top row again and we end up with this single patact:

((patact (row (apply #:subject    ls
                     #:procedure  car
                     #:vars       (ls.car)
                     #:subpattern (row (wildcard #:subject ls.car)))
              (var #:subject ls.cdr
                   #:name ls*))
         (action do-last ls*)))

Once again, this is irrefutable, so we again generate the dt-apply node for ls.car, the dt-rename node for the ls* variable, and then finish at the action.

Picking a pattern

Okay so, at each step of the pattern matching process, we pick a specializer from the first row’s patterns, generate a node from it, and specialize/default the remainder of the patterns on it. But how do we pick which pattern to choose from the row?

We have to work out which pattern is going to help us find answers the fastest, probably by comparing against the patterns in different rows somehow. We need a heuristic – a scoring function to guess which pattern is most likely to help find the matching clause in the fewest nodes possible.

Maranget considers the question of which heuristics to use, assigning letters to each of a number of potential scoring functions and measuring the impact of many potential combinations of them. In Maranget’s terms, extensible-match uses heuristic fnr: the specializer picked has to be in the first row;6 the specializer is the one which will specialize the most rows (most needed); if there’s a tie for the most needed specializer, we pick the pattern with the smallest combined size of the specialized and defaulted rows. (If even that goes to a tie, just pick the leftmost tied specializer.) I didn’t benchmark this particularly: n seemed like an intuitively sensible heuristic; based on the suggestion in Maranget’s paper, I later added r as an additional tie-breaker.
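
A sketch of that scoring in code (helper names invented; this is not the library’s implementation):

;; f is implicit: candidates come only from the first row. We keep a
;; candidate only when it is strictly better, so ties fall to the leftmost.
(define (pick-specializer rows)
  (define (needed pat) ; n: how many rows this specializer would specialize
    (length (filter (lambda (row) (specializes? pat row)) rows)))
  (define (combined-size pat) ; r: total size of specialized plus defaulted rows
    (+ (length (specialize pat rows))
       (length (default pat rows))))
  (let loop ((pats (row-patterns (car rows))) (best #f))
    (cond ((null? pats) best)
          ((or (not best)
               (> (needed (car pats)) (needed best))
               (and (= (needed (car pats)) (needed best))
                    (< (combined-size (car pats)) (combined-size best))))
           (loop (cdr pats) (car pats)))
          (else (loop (cdr pats) best)))))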

Some of the heuristics Maranget considers aren’t really applicable to Scheme. His ML version can generate nodes with more outward branches than just success and failure, because the ML compiler can generate a single switch over many types at once; Scheme detects different types by calling individual predicate procedures for each individual type, so that’s not possible, and neither is the heuristic b which picks the smallest switch to build, for example – so extensible-match also can’t use the combination of heuristics Maranget personally recommends, pba.

So why was it so hard/why didn’t you do it slightly differently

As always, the hard thing was not the actual writing of the code, but working out what the right code to write was.

For example, you might ask why I didn’t skip the step of representing decision trees as their own, reified type and simply recurse by generating new match expressions nested within the branches of an if or within the body of a let for the specializer we’ve chosen.

This is, indeed, very tempting. If you’re implementing a compiler based on decision trees, you might even find some literature suggesting that you could do exactly this. Surely, you’ll think, just like those Swap nodes in Maranget’s paper aren’t actually part of how it’s really compiled – surely you can avoid generating the other nodes per se too, and just go straight to generating the code which those nodes will eventually be turned into?

So you try it, and it doesn’t work, and you regret it, and you commit it to the repository anyway as a warning to your future self and to others. Actually there were other things that were wrong with that first attempt – trying to insist that patterns form a matrix-like grid turned out to be the wrong thing for this kind of pattern input AST. My idea was that I really wanted to be able to property test the decision tree generator, ensuring that for any generated patterns and terms it always had the same result as the naïve style of generation shown above. I was thinking of decision tree generation as an optional optimization pass, rather than a fundamental transformation pass. This is the wrong mental model to have.

Anyway, the real reason that would have been the wrong approach even if it had worked is that it wouldn’t have allowed any optimization for code size or compilation speed. Wait whaaaaaat??

Optimizing space and time

Yeah, remember how I mentioned the potentially exponential code size if you choose decision trees – but only in cases where the tricks to avoid it start to fail? It’s time to talk about those tricks.

The most important trick is sharing of nodes. When two different branches in the tree end up doing the same things, they should actually re-converge and become the same branch again. This makes the decision tree actually a decision dag (directed acyclic graph).

To achieve this sharing of nodes between different code paths, extensible-match uses two tricks. The most important one is hash consing, a trick more generally applicable in functional programming. When you have immutable structures which might be created many times over with the same contents, you can maybe save memory by storing all instances of structures of that type in a hash table. When someone asks to create a new instance of the type, first check if there’s already one existing which has the contents they’re interested in. If so, just return that existing one instead of creating a new one. (But if not, be sure to put the newly-created object into the table before returning it, in case someone asks for another one of the same in future.)
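
In terms of the simplified record types above, the trick looks roughly like this (a sketch with an invented interning wrapper, not the library’s real code):

;; Intern dt-test nodes in a table keyed on their contents, so that two
;; structurally identical nodes become one shared (eq?) object. Children
;; are interned first, bottom-up, so equal? on the key compares them by
;; identity. Uses R6RS hashtables from (rnrs hashtables (6)).
(define dt-test-table (make-hashtable equal-hash equal?))

(define (intern-dt-test success failure proc var)
  (let ((key (list proc var success failure)))
    (or (hashtable-ref dt-test-table key #f)
        ;; make-dt-test takes the parent's fields first under the
        ;; default record protocol.
        (let ((node (make-dt-test success failure proc var)))
          (hashtable-set! dt-test-table key node)
          node))))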

Neither R6RS’s record system nor its hash tables were really meant to support doing this, and the (extensible-match decision-tree) library has to do some pretty undignified nonsense in order to make it work. But it does work, and saves both time and space: space obviously, but also time because it stresses the garbage collector less to have fewer objects sitting around.

There’s a second layer of caching which is similar, one I mentioned I was considering using in an earlier post: the recursive procedure which generates tree nodes from a (specialized or defaulted) set of rows and actions is memoized. This has a similar effect to hash consing but, since hash consing already guarantees sharing, it optimizes compilation time and not decision tree size. Memoization avoids repeating an exponential recursion entirely in many cases. I measured how many cache hits the memo table got for some complex patterns, and how much it sped things up, and it was totally worth it. The 14 cases over three list variables which used to take 9.3 seconds to compile on Guile now come back almost immediately.

Visualization

What’s the most common target language for compilers? x86? The JVM? C? JavaScript? If I were a betting lass, I might be tempted to put money on the answer being DOT, the GraphViz file format. If you want to understand what your compiler is doing, the easiest way to see it is often to actually have it draw your control-flow graph, automaton, or decision dag in two dimensions. Outsourcing the actual drawing task to GraphViz is a nice sweet spot in the trade-off between getting it working and making it nice.7

Initially I tried to see what was going on with a very silly (non-hygienic) code generator outputting Scheme code, but this quickly got tiring to read, especially because it couldn’t show sharing in the dag. (How the real code generator represents shared nodes is a topic for next time.) With a 2D graph, it’s relatively easy to get a rough impression on first glance of how well the decision tree generator handles certain cases; by following the arrows, you can fairly easily check for anything it’s doing wrong.

I’ve kept an archive of older DOT files showing how the pattern match compiler has got more capable over time, going back to some of the first decision trees this code ever generated, before it was even hooked up to a code generator. (This DOT file is dated 2 December 2024, and I didn’t get the code generator working until Candlemas.) This lets me see how certain changes improved or disimproved the generated trees over time.

Originally, to review these DOT files, I had to manually copy the generated DOT source into a file and invoke the dot command line tool. But recently I thought, hey, I have a terminal emulator which can display inline graphics, so why not just show them directly in the REPL? So I wrote a little library to do just that.

Source code of the library which plops a drawing of a graph straight into my terminal
(library (extensible-match dot)
  (export show-dot)
  (import (rnrs (6))
          (only (guile) mkstemp port-filename waitpid)
          (srfi :39)
          (ice-9 popen))

  (define-syntax show-dot
    (syntax-rules ()
      ((_ expr)
       (let* ((dot-port (mkstemp "/tmp/dot-XXXXXX"))
              (dot-file (port-filename dot-port)))
         ;; It was probably a mistake that decision-tree->dot writes
         ;; to current-output-port, but I make the best of it:
         (parameterize ((current-output-port dot-port))
           expr)
         (close-port dot-port)
         (let-values (((from to pids)
                       (pipeline
                        `(("dot" "-Tpng" ,dot-file)
                          ;; I tried out both sixel with img2sixel
                          ;; (more widely supported) and the inline
                          ;; bitmap functionality of iTerm2 (my
                          ;; terminal emulator of choice) with this
                          ;; imgcat program. Sixel was very fragile
                          ;; when the terminal window got resized (the
                          ;; bitmap area would just unrecoverably
                          ;; blank out sometimes), so I went with
                          ;; imgcat. It’s for my own use only, anyway,
                          ;; and not too hard to change if needed:
                          ("imgcat" "--iterm2")))))
           (close-port to)
           (let ((stdout (standard-output-port)))
             (put-bytevector stdout (get-bytevector-all from))
             (flush-output-port stdout))
           (close-port from)
           (for-each waitpid pids)))))))

Enough of the talking, what do these decision trees actually look like, then? Here’s the drawing for the last example we worked through above.

[Decision-tree drawing: (pair? ls) goes to (fail) on F; on T it does (receive (ls.cdr) (cdr ls)) and tests (equal? ls.cdr '()); both branches then take the car with (receive (ls.car) (car ls)), binding either x and ending in (return x), or ls* and ending in (do-last ls*).]

The shapes of the nodes are the classic flowchart style: diamonds for decisions, rectangles for steps, and circles for termination.

Note that the do-last path also generates a call to (car ls) even though it doesn’t use the value of the car. In theory, it should be smart enough not to do this, but I decided for now to leave that optimization to the Scheme compiler on the theory that it can probably do a better job of it anyway.

For another simple example, this is a type-safe list merge.

(define list-merge
  (match-lambda
    (('() '())  '())
    (((? pair? xs) '())  xs)
    (('() (? pair? ys))  ys)
    (((cons x xs*) (cons y ys*))
     (if (< y x)
         (cons y (list-merge (cons x xs*) ys*))
         (cons x (list-merge xs* (cons y ys*)))))))
[Decision-dag drawing for list-merge: (equal? arg1 '()) and (equal? arg2 '()) tests route the empty-list cases to (return-null), (return-ys ys), or (return-xs xs); (pair? arg1) and (pair? arg2) guard the general case, which runs a chain of receive/let steps extracting both cars and cdrs before ending in (do-merge x xs* y ys*); impossible inputs end in (fail).]

You can see the distinction between subject variables (arg1, arg2, arg1_car, arg1_cdr, arg2_car, arg2_cdr) and pattern variables. You can follow through and see how it avoids repeating the pair? check on both arguments for the final clause – there are two different pair? checks on arg2, but they’re in mutually exclusive code paths. There’s a long chain of dt-apply (receive) and dt-rename (let) nodes at the end, because the decision tree generator puts off those steps, which don’t affect whether the patterns match or not, until the end.

For a more complex example, let’s look at Chris Okasaki’s red-black tree balancing pattern (paper, book). This is probably pattern match compiler authors’ favourite example to use, because

  1. it’s nontrivial

  2. it’s real and not contrived

  3. it’s a ‘hot’ function which gets called log₂(n) times for every insertion into a tree of size n, making it important to optimize well

  4. it’s a pretty amazing example of using pattern matching to translate reasoning about data structure invariants directly into running code, in a way that would be difficult without pattern matching

What does it do? Basically a red-black tree stays relatively well balanced by colouring the levels of nodes red or black, and maintaining two properties about where coloured nodes are allowed to appear relative to one another. One of these rules is that no red node is allowed to touch another red node: black touching black is okay; red touching red isn’t. So Okasaki works out that after finding a place to insert a new red node, there are exactly four ways in which a tree can end up ‘unbalanced’ in having two adjacent red nodes; in that case the tree needs to be ‘rotated’ a little to restore balance.

He writes that like this (but in ML):

(define (balance n)
  (match n
    ((or (node 'black (node 'red (node 'red a x b) y c) z d)
         (node 'black (node 'red a x (node 'red b y c)) z d)
         (node 'black a x (node 'red (node 'red b y c) z d))
         (node 'black a x (node 'red b y (node 'red c z d))))
     (node 'red (node 'black a x b) y (node 'black c z d)))
    (_ n)))

a, b, c, and d are values in the tree; x, y, and z are child nodes. The sort order of the values and the values of the child nodes is a < x < b < y < c < z < d. Visually explained, the job of this procedure is to turn one of these wrong trees:

[Diagram: the four unbalanced tree shapes, each containing two adjacent red nodes.]

Into this right tree:

[Diagram: the single balanced result tree.]

Here’s what the decision dag looks like:

[Decision-dag drawing for balance: a cascade of node?, node-colour, node-left, node-right, and node-value tests covering the four unbalanced shapes, with four shared tails (1, 2, 3, 4) each ending in (do-balance a x b y c z d); non-matching inputs end in (already-balanced).]

You can see that nodes are shared (1, 2, 3, 4) to reduce the overall size of the graph. This may look big and complicated, but at run time it’s efficient because no test or extraction will ever be repeated on any given code path. There are 13 branch points, the same number as in Colin James’s independent OCaml implementation of Maranget’s algorithm, a nice confirmation that my implementation is doing the right thing :-)

Can you imagine trying to write out the code for this pattern match by hand? You don’t have to imagine, because Rich Hickey implemented this data structure in bad old Java as part of Clojure. His version is spread all over the place because old-school Java just lacked the features to make writing this function nice. (I’m not even sure new-school Java could do much better.)

For an example where things start to fall down in Scheme, consider Maranget’s bytecode interpreter for a miniature dialect of ML. In figures 6 and 7 of his 2008 paper – which really ought to be shown as motivating examples on the first page – Maranget showed how a bad and a good decision tree for this can look in OCaml, a statically-typed language.

First of all, here’s roughly what the run function would look like in Scheme:

(define (run a s e c)
  (match (values a s c)
    ((_ _ (cons (ldi i) c))  (case-1))
    ((_ _ (cons (push) c))   (case-2))
    (((? integer? n2) (val (? integer? n1)) (cons (iop op) c))  (case-3))
    ((0            _ (cons (test c2 _) c))  (case-4))
    (((? integer?) _ (cons (test _ c3) c))  (case-5))
    ((_ _ (cons (extend) c))    (case-6))
    ((_ _ (cons (search k) c))  (case-7))
    ((_ _ (cons (pushenv) c))   (case-8))
    ((_ (cons (env e) s) (cons (popenv) c)) (case-9))
    ((_ _ (cons (mkclos cc) c))     (case-10))
    ((_ _ (cons (mkclosrec cc) c))  (case-11))
    (((clo cc ce) (cons (val v) s) (cons (apply) c))  (case-12))
    ((a (cons (code c) (cons (env e) s)) '())         (case-13))
    ((a '() '())  (case-14))))

You can already see that we’re running up against the limits of what it’s sensible to do in Lisp notation here.

What does it look like when compiled? I shared what it looked like in the very first working versions of the decision tree generator above … and today it looks like this. To be sure, there’s improvement there (196 nodes in the December 2024 version, 135 nodes today). But it’s also very far from the nice, elegant dag of just a dozen or so branches which Maranget shows in his paper, even accounting for the fact that his version can switch over many types at once and mine can’t. What gives?

Well, here dynamic typing messes up opportunities to do better. If you trace down the graph from the top, it looks good until you get to the third case. Once the pattern matcher knows that the current instruction (the first element in the list c for code) is an iop but the top of the stack s isn’t a val, it should jump straight to fail because there is no situation in which any of the other patterns can match. But it doesn’t know enough to know that; what if something might match both iop and test, or pushenv? A lot of these nodes are therefore dead, and will never execute except to go a long, slow way to a case that should never happen when this function is actually running anyway. I’ve been meaning to study in detail exactly which branches are impossible, and to see if it would be possible to do anything at all better here, such as allowing users to declare which predicates in ? are mutually exclusive. (Initial results on that particular feature weren’t exactly encouraging either. Update, 30 November: Scratch that! I just tried it again and it works great. 86 nodes, down from 135. I must have forgotten to clear the cache of pre-compiled code before the last time I tested it, or something. The problem now is that a complete implementation of this would require implementing something like Datalog …) The generator does do some very basic things to avoid generating redundant nodes like this, but not nearly as much as an ML compiler can.

As the folklore predicts, Racket’s backtracking automaton does better here for code size but worse for speed, generating many checks that c is a pair and many separate extractions of its car and cdr. It only does these things once for cases 1 and 2, but again for case 3, again for case 4, again for case 5, and only gets back on a roll again with cases 6, 7, and 8 …

Is all of this worth it?

Last of all, here’s the hand-wringing section where I ask whether I should have bothered with any of this!

In 2000 Kevin Scott and Norman Ramsey wrote a draft paper asking ‘When Do Match-Compilation Heuristics Matter?’; their answer is ‘almost never’. Out of 18 program benchmarks, only five were affected in decision tree size by the use of different heuristics, and three of those were made worse by choosing heuristics other than simple left-to-right selection of patterns. Somewhat more encouraging is that 13 of the 18 benchmarks were artificial benchmark programs, and one of the five where heuristics did make some difference was a real-world program (the SML/NJ compiler).

Maranget’s results measuring heuristics may be more representative because he created a corpus of match expressions to test on specifically designed to represent a wide range of sizes and matching techniques; and because he actually used his own decision tree generation algorithm, the one extensible-match’s is based on. He finds a larger difference than Scott and Ramsey – about 20% improvement in code size and 10% improvement in average path length for any individual heuristic, compared to just naïvely picking the leftmost pattern in each row every time. As a reflection on how Maranget’s corpus might not be exactly representative, though, the heuristics only score about 15% better for code size and about 3% better for average path length than naïvely picking the rightmost pattern in each row.

Still, these are fairly slim pickings. As mentioned above, Guile’s optimizer is already capable of taking a naïve pattern match compilation and doing optimizations equivalent to generating a backtracking automaton that always picks the leftmost pattern for each operation; basically on the bytecode machine example it would probably do about as well as Racket’s match compiler, even without any help from the macro. The benchmarks by Scott and Ramsey suggest this should be good enough for almost every real use case. If Guile ever ships SRFI 262 in the standard distribution, it might well not be worth the extra weight in code to include a heuristic-driven decision tree generator; maybe that version should go back to just generating the naïve kind of compilation shown in the very first example above, and should let Guile’s own optimizer do the work. Changing strategy to the automaton style in that case also seems like the right thing, given that any version shipped with Guile would probably also end up in Hoot, which has its own reasons for optimizing code size. For other Scheme implementations which don’t (yet) do the online common subexpression elimination pass Andy Wingo developed, the winnings will definitely be bigger – although still diminishing if online CSE becomes more common.

Next time

Whew, this was a long one. Next time will be a comparatively short conclusion, on the code generator.


  1. Especially considering personal matters got in the way of completing this article promptly. Sorry about that!↩︎

  2. A transport/railway simulation game like Transport Tycoon that ran in a text terminal. That makes it sound a lot cooler (and more fun, and more functional) than it actually was.

    If you want to learn how to do pathfinding in a game these days, that tutorial still isn’t bad, but you might have more fun with Amit Patel’s interactive walkthrough including other graph search algorithms.↩︎

  3. This example is from Philip Wadler’s chapter ‘Efficient Compilation of Pattern Matching’ in Simon Peyton Jones’s book The Implementation of Functional Programming Languages, renamed after a Unix command line utility which hackers are probably at least passingly familiar with, and with an explicit comparison predicate argument because Scheme is monomorphic. The rest of this article has absolutely nothing to do with the contents of that chapter.↩︎

  4. In 2016 Sam Tobin-Hochstadt held a public seminar on YouTube explaining Racket’s implementation.↩︎

  5. Actually it’s five vs ten because I’m still omitting the seq-pattern and its decision tree equivalent dt-seq here.↩︎

  6. This is actually a requirement of the semantics of SRFI 262: all implementations have to use something like f as their first ‘heuristic’. This isn’t necessary in ML because the type testing functions the compiler notionally generates calls to are internal to the ML implementation and known to have universal domain; in SRFI 262 it’s possible to use a previous row as a type check that a procedure in a subsequent row will be safe to call.↩︎

  7. Nicer, home-spun solutions are possible at the cost of more work.↩︎

Friday, November 28, 2025

Monday, November 24, 2025

spritely.institute

Composing capability security and conflict-free replicated data types

Various personified brassicas chatting

In August, I attended the DWeb Seminar where a small group of builders gathered to discuss the state-of-the-art and open problems in the distributed web space. Some in the group are primarily concerned with distributed data and focus on sync algorithms and local-first use cases. I am mainly concerned with distributed behavior and focus on the object capability security model. Both areas of study are steeped in their own lore and research papers, which makes it difficult for the two camps to communicate effectively with each other.

It is in the interest of unity that I write this blog post. I will show how distributed behavior and data techniques can be composed to build local-first applications that combine the strengths of each paradigm.

Fortunately, the distinction between behavior and data is a false dichotomy; they are two sides of the same coin. This circular relationship is well understood in the Lisp world where we say that “code is data and data is code” in reference to Lisp’s homoiconic syntax.

Messages are both behavior and data. They invoke behavior but are also encoded as a string of bytes and sent across the wire. Our context within the tower of abstraction determines how we look at a message. What is treated as data at one abstraction level may be treated as behavior in another. We need to be equipped to handle both cases.

I’ve crystalized all that I’ve learned recently into a small prototype that combines the following techniques:

Local-first chat again

There are no new vegetables, just new Brassica oleracea varieties

I started down the well-trodden path of making a local-first group chat application. Seemingly every other DWeb-adjacent project has one, after all, so why shouldn’t Spritely? I then branched off and went down my own trail, trying to compose tools in ways I hadn’t quite seen before in this context. The result is Brassica Chat!

Brassica Chat is written in Scheme, is built upon Goblins, our distributed programming environment, and uses Hoot to compile it all to WebAssembly so it can be used on the web. Besides just posting messages, it also supports some of the usual features like editing/removing messages and emoji reacts. 🚀

Demo time!

Below is an embedded and simplified demo of Brassica Chat that simulates a conversation between Alice, Bob, and Carol. Messages are sent over a fake network and each user’s network access can be toggled with a button to simulate network partitions and offline usage. Alice is the chat room creator and has the privilege to edit/remove any post. Bob and Carol can only edit/remove their own posts. Okay, hopefully that’s enough context. Try it out!

If you’d like more screen real estate, try this demo on its own dedicated web page. Check out the source code if you’d like.

High-level design

Brassica Chat unum diagram

Let’s examine the scenario modeled in the demo more closely. First, Alice creates a new chat room on her computer. She then shares a capability with her friend Bob and another (distinct) capability with Carol that grants them the privilege to send messages to her chat room. Bob and Carol reciprocate by giving Alice capabilities to their respective chat room copies. The resulting network is shown in the diagram above.

Note that Bob and Carol are not directly connected to each other but rather indirectly connected through Alice. This is because Bob and Carol did not exchange capabilities with each other. This is okay! They can all still chat with each other in real time as long as Alice is online. When Alice goes offline, Bob and Carol can still send messages locally. Everything done while in offline mode will be synchronized once Bob and Carol can connect to Alice again. Perhaps Bob and Carol will exchange capabilities with each other later so they can still chat in real time when Alice is offline. The important detail is that Brassica Chat does not try to wire everyone together directly without the active consent of its users.

Each user in the system has a cryptographic identity in the form of a public/private key pair. This key is used for signing messages. In addition to the key, an identity also contains a human-readable, self-proposed name for displaying in the user interface.

Each chat room is an eventually-consistent replica of the distributed chat room state managed using a collection of CRDTs. Chat rooms can propagate locally created or remotely received messages to other replicas of the chat room for which it holds a capability. The replication process works to eventually achieve convergence across all reachable replicas.

At a meta level, these replicas can be thought of as forming a single, conceptual chat room actor. To use some ocap jargon, the chat room is an unum where each presence (replica) communicates by broadcasting messages to the other presences it knows about. In the diagram above, there’s a dotted line drawn around the three replicas to indicate that the chat room is an abstract entity whose canonical form does not live on any single machine. The presences are all co-equal; no single presence has more privilege than any other.

The stack

Brassica Chat layers diagram

There are four levels of abstraction in the Brassica Chat architecture. From bottom to top, they are:

  1. Object capabilities: online access control through reference passing.
  2. Actors: online, asynchronous messaging through object references.
  3. CRDTs: eventually consistent, offline messaging.
  4. Authorization capabilities: offline access control through certificate chains.

All objects in the application are represented as actors, including CRDTs. Implementing CRDTs as actors has been done elsewhere, Akka being a notable example.

A reference to an actor is an object capability. In other words, holding a reference to an actor gives you the authority to send messages to it. An actor needs to be online in order to receive messages, however. For offline usage, an object capability variant known as an authorization or certificate capability is used, as well.

Messages are sent between machines using the Object Capability Network (OCapN) protocol, which handles the burden of secure message transport. Messages can be transported over any medium with an associated OCapN netlayer. For this prototype, I used a WebSocket netlayer with a relay in the middle. The CRDT implementation has its own messaging protocol which is defined using actors so that it automatically works over OCapN.

On capabilities

Brassica Chat’s use of capabilities stands in contrast to most existing local-first applications that use the access-control list (ACL) model. In the ACL model, users are associated with groups or roles that grant privileges. When compared to capabilities, the ACL model has many deficiencies:

  • ACLs are too coarse-grained. It’s difficult to follow the principle of least authority with a limited set of role-based privilege levels so the norm is for users to have more privilege than is necessary. By contrast, capabilities can be arbitrarily fine-grained. Want to make it so that Bob can only moderate Carol’s posts and not Alice’s? It’s easy and natural to make a capability for this but awkward to define a one-off ACL role.

  • ACLs can’t be safely delegated. Only an administrator may grant or revoke privileges. As a non-admin, your only option is to share your credentials, which is unsafe and hard to audit. Credential sharing happens often in the real world due to the friction involved in doing things “the right way”. With capabilities, it is easy to delegate a subset of your authority to someone else in an auditable, revokable manner without sharing your own credentials or communicating with a central authority.

  • Most importantly, ACLs have inherent vulnerabilities, such as the confused deputy problem. The “if you don’t have it, you can’t use it” approach of capabilities avoids an entire class of security bugs.

In short, capabilities are safer, more expressive, and more decentralized than ACLs. Now, let’s move on to some implementation details.

The chat room actor

Screenshot of a chat application where Alice, Bob, and Carol are talking

The chat room actor is implemented as a composition of several CRDTs. Rather than using one giant CRDT for the entirety of a chat room’s history, it is partitioned by time into a set of chat log CRDT actors. Each partition covers some uniform number of seconds of real time known as the “period”. This means that all presences must use the same period value in order to converge properly (30 minutes was chosen as a reasonable default). The benefit of this partitioning strategy is that it allows each replica to perform garbage collection (GC) on entire chunks of history without coordinating with the other replicas (GC within a CRDT requires coordination). This ought to keep the append-only log for any individual chunk of history quite small and manageable. Rebuilding the state of a previously deleted chunk from scratch shouldn’t take much time, assuming there is another replica online with the data. For this prototype I didn’t bother to GC old message history as the chat rooms are ephemeral and not persisted to disk (but we could use Goblins’ persistence API to do so in the future).
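
As a rough sketch of the bucketing arithmetic (partition-key and the seconds-based timestamp are illustrative assumptions, not the prototype’s actual API):

;; Hypothetical sketch, assuming timestamps in seconds and the
;; prototype's default period of 30 minutes:
(define period (* 30 60))

(define (partition-key timestamp)
  ;; Every timestamp within the same 30-minute window maps to the
  ;; same key, so all replicas agree on which chat log CRDT an
  ;; event belongs to (provided they use the same period).
  (* period (floor (/ timestamp period))))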

In addition to the message log partitions, there are two additional CRDT actors that make up the chat room: profiles and certificates. The profiles CRDT contains a mapping from a user’s public key to their self-proposed display name (and could later be extended to include other metadata that a user would like to share with the room). The certificates CRDT contains the set of all zcaps that have been issued for the chat room.

The CRDT actors

Simple chat log CRDT diagram

CRDTs can be roughly divided into two categories: state-based or operation-based. Brassica Chat uses operation-based CRDTs, which can be thought of like a Git repository with automatic conflict resolution. Each replica of an operation-based CRDT maintains an event log containing all of the operations that have occurred. Due to concurrency in distributed systems, an event may have one or more direct causal predecessors (a fancy term for “parents”). Thus, the log entries form an append-only, directed acyclic graph (DAG), as shown in the diagram above.

An event has the following immutable fields:

  • ID: Unique ID of the event (SHA-256 hash).
  • Parent IDs: IDs of all causal predecessors (forming a DAG).
  • Timestamp: Timestamp from a hybrid logical clock indicating when the event occurred.
  • Author: Creator of the event (ed25519 public key).
  • Signature: Cryptographic signature of the event.
  • Blob: Syrup encoded event data (Syrup is the binary serialization format used by OCapN).

Events are delivered in causal order, meaning that an event is not applied to the CRDT’s internal state until all of its predecessor events have been applied. Concurrent events may be applied in any order, so it’s important that operations on the CRDT state are commutative. Despite causal order being encoded in the event graph, a logical timestamp is included in each event. This is important for handling concurrent events and is used to implement common CRDT patterns like the “last write wins” register.
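
For example, a “last write wins” register can be sketched as a merge that compares logical timestamps and breaks ties deterministically; event-timestamp, event-author, and author<? are illustrative names, not the prototype’s API:

;; Hypothetical sketch of last-write-wins merging for two concurrent
;; events; commutative because the comparison imposes a total order.
(define (lww-merge a b)
  (cond ((> (event-timestamp a) (event-timestamp b)) a)
        ((> (event-timestamp b) (event-timestamp a)) b)
        ;; Equal timestamps: break the tie by author key so every
        ;; replica picks the same winner regardless of arrival order.
        ((author<? (event-author a) (event-author b)) b)
        (else a)))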

Brassica Chat contains a generic operation-based CRDT actor with prepare, effect and query hooks (straight out of the CRDT literature) for special-purpose CRDTs to implement. The CRDT actor is used as the basis for the chat log, certificates, and profiles actors.

This CRDT implementation, though on the simple side, is Byzantine fault tolerant. A Byzantine fault is best explained by the following scenario: Mallet, a user who is up to no good, sends Alice and Bob an event with the same ID but different contents. When Alice and Bob sync data with each other, they ignore events with IDs that they already have and don’t realize that Mallet has tricked them. The result is that Alice and Bob will never converge to the correct state because their message logs contain different operations.

Divergence due to Byzantine behavior is prevented through content-addressing and cryptographic signing of events, much like Git, as described in Martin Kleppmann’s “Making CRDTs Byzantine Fault Tolerant” paper. Mallet cannot send Alice and Bob events with the same ID but different contents because the ID is the hash of the contents and if the hash doesn’t match then the event is rejected. Events are signed to associate them with the author for use with the authorization capability system and the parent IDs are incorporated into the signature to prevent replay attacks. For this prototype, SHA-256 was chosen for the hash function and ed25519 for signatures.
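
A hedged sketch of the acceptance check (sha256-hash and signature-valid? are stand-ins, not the prototype’s actual procedures):

;; Hypothetical sketch: an event is accepted only if its ID really is
;; the hash of its contents and its signature (which covers the
;; parent IDs) verifies against the author's public key.
(define (acceptable-event? event)
  (and (equal? (event-id event)
               (sha256-hash (event-contents event)))
       (signature-valid? (event-author event)   ; ed25519 public key
                         (event-signature event)
                         (event-contents event))))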

Any number of Byzantine replicas may be in the network, but as long as Alice and Bob can directly connect to each other, or indirectly connect through a non-Byzantine node such as Carol, the well-behaved nodes will eventually converge to the correct state. While not implemented in this prototype, detection of Byzantine behavior from a replica could be used as the basis for revoking the object capability being used to send such messages, adding a layer of accountability to the system.

Authorization capabilities

Certificate chain diagram

With CRDTs in the mix as an offline messaging layer, object capabilities alone are insufficient for access control. The ocap layer controls access to synchronize chat messages between two replicas but it does not (and cannot) control what those messages contain. Why is that? Because the chat messages are at a higher level of abstraction than the actor messages for which the ocaps apply. When Bob writes the message (react alice-message-1 "👋") to his local replica, he is sending a message to the abstract chat room that doesn’t exist in any single location. What if Alice wanted to prevent Bob from reacting to messages? Who even has the authority to impose that restriction when there’s no central server? We’ve traded away strong consistency to support local-first usage, so there is no way for an administrator to install an ocap on all replicas such that they are all guaranteed to reject this message from Bob and converge to the same state. Ocaps are online capabilities, but CRDTs use offline messaging. We need an offline capability that can be used to process the offline messages.

This is where authorization capabilities (zcaps) come in. A zcap is a signed certificate that describes what actions a controller of that certificate may perform. Like ocaps, zcaps support delegation which is represented as a chain of signed certificates. A crucial property of a delegated zcap is that it cannot expand privilege, only reduce it. Certificate chains need to bottom out somewhere, so we need to decide upon a root signer. In Brassica Chat, the initiator of the chat room (Alice in our example scenario) is considered to be the root signer for all zcaps used in the chat room. This is just a convention, though, and a user could decide to place their trust in a different root signer.

Certificates in Brassica Chat are inspired by ZCAP-LD and are composed of the following immutable fields:

  • ID: Unique ID of the certificate (SHA-256 hash).
  • Parent ID: ID of the previous certificate in the delegation chain.
  • Signer: The public key used to sign the certificate. The signer must be a controller of the parent certificate to be considered valid.
  • Controllers: A list of public keys for the users who are allowed to invoke the capabilities of this certificate.
  • Predicate: An expression that constrains (or attenuates, to use the ocap term) the capabilities granted by the parent certificate. For example, the expression (when-op (edit delete) (allow-self)) says that edit and delete operations can only be used on posts authored by the user invoking the capability (one of the controllers).

Certificates also carry one piece of mutable state: a flag that the signer can flip from false to true to revoke the certificate. Revocation cannot be reversed, making this a trivially monotonic operation within the certificates CRDT.

At first glance, zcaps might appear to have the same problem as ocaps: a zcap cannot prevent Bob from sending a message that is not permitted because there’s no strong consistency. Instead, zcaps specify the rules by which well-behaved clients should interpret the events that have occurred. For example, Bob can send a message that edits the contents of Carol’s post, but if the zcap Bob used for that operation does not grant the capability to edit posts authored by Carol then that edit will simply be ignored when updating the chat room state on a given replica. Since zcaps are encoded as certificate documents, they can be synced amongst all replicas so that the user interface can eventually render the correct view of the chat room. This is a good example of something treated as data at one level of abstraction but behavior at another.
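
To make the example concrete, here is a hedged sketch of how a client might evaluate the (when-op (edit delete) (allow-self)) predicate when applying an event; all of the names here are illustrative, not the prototype’s actual predicate interpreter:

;; Hypothetical sketch: edit/delete operations pass only when the
;; invoker authored the targeted post; other operations fall through
;; to whatever the parent certificate allows.
(define (predicate-allows? op target-author invoker)
  (if (memq op '(edit delete))
      (equal? target-author invoker)   ; allow-self
      #t))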

Security considerations

The security implications of sharing a capability to a chat room are rather large. If Alice, Bob, and Carol have replicas of the same chat room then sending Alice a message means indirectly sending Bob and Carol messages, too. Each presence of the chat room is co-equal with all other presences, after all. As a consequence, we cannot perform administration in a centralized manner like we could if there was a single canonical chat room actor living on a single machine. Revocation, for example, is now a communal effort. If Mallet can propagate messages through Bob and Carol (because Mallet holds a capability to both) then Bob and Carol must each revoke their respective capabilities in order to prevent Mallet from sending messages to the chat room in the future. While it’s possible to create a zcap that would cause Mallet’s messages to be ignored by clients, it doesn’t change the fundamental truth that Mallet has the capability to send messages to the chat room until such a time that all previously issued ocaps have been revoked.

The formation of complete networks, where each replica holds a capability to sync with every other replica, is thus discouraged in this design. The connectedness of a replica is a function of how trusted the user of that replica is in the real-world social group. The more strongly connected a user is, the harder it becomes to remove them later if the social dynamic changes. There is a tension between the risk imposed by a strongly connected network and the desire to maximize availability of the chat room for online users.

The overall security goal for this prototype was to prevent Mallet from irreparably destroying the shared state of the chat room, which was achieved through Byzantine fault tolerance. Additionally, message signing and zcaps provide a means of holding Mallet accountable for anti-social/malicious actions that the system is technically incapable of preventing, giving users some agency over what they see in their client interface. Is this good enough?

Things left undone

This prototype was focused on exploring the core of a minimally viable p2p chat built on capability security principles. It is not production software. I did not concern myself with optimal bandwidth nor memory usage. As mentioned earlier, chat history is not even saved to disk.

Some areas for improvement are:

  • Decentralized identity and naming. This was deliberately left out to keep the scope of this experiment manageable. Spritely has another project, codenamed Brux, to explore this topic. See also our paper on petnames.

  • Ergonomic UI/UX for the complexity introduced by decentralization and eventual consistency. What’s a user-friendly way to add and revoke ocaps and zcaps? The UI doesn’t even attempt to allow viewing or editing zcaps right now. How can we clearly communicate what the security properties are/aren’t so that users don’t get false impressions?

  • History rewriting. If Mallet writes some truly terrible content to the append-only chat log, it’s stuck in there even if it’s hidden in the user interface. Introducing some amount of synchronization to deal with this scenario seems okay. We could take inspiration from Git where the commit graph is append-only but branch names are mutable pointers.

  • Preventing new members from reading past messages like in Signal groups. This should be an option like it is in other secure chat programs, but it’s a complex topic and exploring it was out of scope.

Conclusion

Another Brassica Chat screenshot

I hope this was an interesting walkthrough of how ocaps, actors, CRDTs, and zcaps can be composed with each other! Big thanks to the DWeb Seminar organizers for providing the spark of inspiration I needed to dive into the CRDT literature and build this prototype.

by Dave Thompson (contact@spritely.institute) at Monday, November 24, 2025

Sunday, November 23, 2025

The Racket Blog

Parallel Threads in Racket v9.0

posted by Matthew Flatt, Ryan Culpepper, Robby Findler, Gustavo Massaccesi, and Sam Tobin-Hochstadt

With the version 9.0 release, Racket includes support for shared-memory threads that can take advantage of multicore hardware and operating-system threads to run in parallel, not merely concurrently with other Racket threads, as was the case in versions before 9.0.

Creating a thread that runs in parallel is as simple as adding a flag to the call to thread. To see the effect, try first putting the following code into a file named "thread.rkt" and running racket thread.rkt on the command line:

#lang racket/base
 
(define N 22)
 
(define (go)
  (thread
   (λ ()
     (for ([i (in-range (expt 2 N))])
       (black-box (sqrt i))))))
 
(define t1 (go))
(define t2 (go))
 
(thread-wait t1)
(thread-wait t2)

Racket will find many square roots (tweak N to match your machine), but will keep only one core of your CPU busy. Using time in the shell reports “CPU” (possibly broken into “user” and “system”) and “real” times that are similar. To use two cores, add #:pool 'own to the thread call:

#lang racket/base
 
(define N 22)
 
(define (go)
  (thread
   #:pool 'own ; <-- only change is here
   (λ ()
     (for ([i (in-range (expt 2 N))])
       (black-box (sqrt i))))))
 
(define t1 (go))
(define t2 (go))
 
(thread-wait t1)
(thread-wait t2)

In this case, real time should be about half of CPU time, while CPU should remain similar to before. In other words, the parallel version runs twice as fast. On the machine used below:

              speedup   real msec   CPU msec   GC msec
  concurrent  ×1        1011        979        10
  parallel    ×2        517         1021       13

Passing the new #:pool argument creates a parallel thread; create pools via make-parallel-thread-pool to have a group of threads share processor resources or just pass 'own to have the new thread exist in its own parallel thread pool.
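
For example, a sketch of two threads sharing one pool (work stands in for any thunk, and the pool-size argument of 2 is an assumption for illustration):

; Hypothetical sketch: a pool sized for 2 parallel threads, shared
; by both threads instead of giving each thread its own pool.
(define pool (make-parallel-thread-pool 2))
(define t1 (thread #:pool pool work))
(define t2 (thread #:pool pool work))
(thread-wait t1)
(thread-wait t2)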

As a further addition to thread, a #:keep 'result argument keeps the result of thunk when it returns, instead of discarding the result. Retrieve a thread’s result with thread-wait. So, for example,

(thread-wait (thread thunk #:pool 'own #:keep 'result))

runs thunk in parallel to other Racket threads, blocks the current Racket thread (while allowing other Racket threads to continue, even non-parallel ones), and then returns the result value(s) when thunk completes.

To maintain backwards compatibility, the thread function still creates a coroutine thread by default, which is a lightweight thread that is preemptively scheduled and whose execution is interleaved with other coroutine threads. For many tasks that need the organizational benefits of concurrency without the performance benefits of parallelism, such as when managing GUI interactions or orchestrating remote processes, coroutine threads are still the best abstraction. Coroutine threads can use #:keep 'result, too.

Racket’s full thread API works with parallel threads. Follow the links from the thread documentation to see more details on thread pools and for more interesting uses. Of course, just because you put tasks in parallel threads doesn’t mean that they always speed up, as sharing and communication can limit parallelism. Racket’s future visualizer works for parallel threads, though, and it can help you understand where synchronization in a task limits parallelism. Also, adding parallelism to Racket potentially creates trouble for existing libraries that were not designed to accommodate parallelism. We expect problems to be rare, however.

We’ll explore the performance details and explain why we expect most programs will continue to work well later in this post, but first:

Racket’s Road to Parallelism

Running threads in parallel counts as news in 2025?! Well, it has been a long road.

Racket’s implementation started in the mid-1990s, just as a wave of enthusiasm for parallel programming was winding down. Although operating systems by that point consistently supported within-process threads, computers with multiprocessors were not commonly available. Many language runtime systems from the same era— including Python, Ruby, and OCaml— took advantage of the internal simplicity of a single-threaded runtime system while offering constructs for concurrency at the language level. Racket has always included threads for concurrency, and it was an early adopter of Concurrent ML’s abstractions for managing concurrency well. But an absence of parallelism was deeply baked into the original implementation.

Over time, to provide support for parallelism, we added places and futures to Racket. Places support coarse-grained parallelism through a message-passing API, effectively running parallel instances of the virtual machine within a single operating-system process; limited sharing makes the implementation easier and safer than arbitrary sharing between parallel threads. Futures provide fine-grained parallelism for restricted computations; a future blocks when it tries to perform any operation that would be difficult for the runtime system to complete safely in parallel. Places and futures are both useful, and they avoid some pitfalls of shared-memory threads. Still, fitting a parallel task into futures or places usually requires special effort.

Meanwhile, single-threaded execution was only one of the problems with the original Racket (a.k.a. PLT Scheme) implementation. To address larger problems with the implementation and to improve performance, we started in 2017 rebuilding Racket on top of Chez Scheme. Rebuilding took some time, and we only gradually deprecated the old “BC” implementation in favor of the new “CS” implementation, but the transition is now complete. Racket BC is still maintained, but as of August 2025, we distribute only Racket CS builds at https://download.racket-lang.org.

Chez Scheme is a much better foundation for improving parallelism in Racket. Part of the Racket-rebuilding effort included improving Chez Scheme’s support for parallelism: we added memory fences as needed for platforms with a weak memory-consistency model, and we parallelized the Chez Scheme garbage collector so that garbage collection itself runs in parallel. There’s still plenty of room for improvement— the garbage collector is only parallel with itself, not the main program, for example— but further improvements are more within reach than before. Equally important, the rebuild included new implementations of the Racket thread scheduler and I/O layer in Racket itself (instead of C). Because of these improvements, Racket’s futures worked better for parallelism from the start in Racket CS than in Racket BC.

With version 9.0, we finally take advantage of new opportunities for parallelism created by the move to Racket CS. Internally, a parallel thread is backed by a combination of a future and a coroutine thread. The main extra work was making Racket’s coroutine thread scheduler cooperate more with the future scheduler and making the I/O layer safe for Chez Scheme threads, all while making locks fine-grained enough to enable parallelism and keeping the cost of needed synchronization as low as possible, including for non-parallel Racket programs.

Performance

Here are some simple benchmarks on an M2 Mac to give a sense of the state of the current implementation. This machine has 8 cores, but 4 big and 4 little, so ×4 speedup is possible with 4-way parallelism but less than ×8 with 8-way parallelism.

As an easy first example, we should expect that a Fibonacci [code] run of 1 iteration in each of 4 coroutine threads takes the same time as running 4 iterations in 1 thread, while 1 iteration in each of 4 parallel threads should take about 1/4th of the time. Also, for such a simple function, using plain old futures should work just as well as parallel threads. That’s what we see in the numbers below.
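
A minimal sketch of such a run (not the linked benchmark code; fib is the usual doubly recursive definition) might be:

#lang racket/base
(define (fib n) (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))
; Hypothetical sketch: 1 iteration of (fib 40) in each of 4 parallel
; threads; removing #:pool runs the same iterations concurrently on
; one core instead.
(time
 (for-each thread-wait
           (for/list ([_ (in-range 4)])
             (thread #:pool 'own (λ () (fib 40))))))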

Times are shown as a speedup over single-threaded, then real elapsed milliseconds, with CPU milliseconds and the CPU milliseconds spent specifically on GC shown in parentheses as (CPU/GC). The times are from a single run of the benchmark.

  

(fib 40)

 N   sequential           coroutine            parallel             futures
 1   ×1   511  (493/0)    ×1   506  (490/0)    ×1   494  (494/0)    ×1   495  (495/0)
 4   ×1   2045 (1978/0)   ×1   2034 (1967/0)   ×3.7 554  (2210/0)   ×3.8 545  (2168/3)
 8   ×1   4154 (4021/0)   ×1   4154 (4021/2)   ×5.4 776  (5928/1)   ×5.2 796  (6006/2)

Of course, most programs are not just simple arithmetic. If we change our example to repeatedly convert numbers back and forth to strings as we compute Fibonacci [code], then we can see the effects of the more complex conversions. This version also triggers frequent allocation, which lets us see how thread-local allocation and parallel garbage collection scale.

  

(strfib* 32)

 N   sequential           coroutine            parallel             futures
 1   ×1   204  (197/3)    ×1   205  (198/0)    ×1.1 192  (192/0)    ×1   211  (203/0)
 4   ×1   826  (796/10)   ×1   808  (780/2)    ×3.7 222  (861/10)   ×3.7 221  (857/10)
 8   ×1   1619 (1563/11)  ×1   1602 (1545/4)   ×3.9 419  (2544/59)  ×4   406  (2551/59)

From this table, we still see reasonable scaling up to four cores, but the additional work and the use of the garbage collector limit scaling beyond that point.

That first string variant of Fibonacci includes a slight cheat, however: it goes out of its way to use a string->number* wrapper that carefully calls string->number in a way that avoids evaluating expressions that compute the default values of some arguments. The defaults consult the parameters read-decimal-as-inexact and read-single-flonum, which is a perfectly fine thing to do in general, but turns out to block a future, because parameter values can depend on the current continuation. In contrast, parallel threads continue to provide a benefit when those kinds of Racket constructs are used. We can see the difference by using plain string->number in place of string->number*, which will fetch parameter values 14 million times in each individual run of (strfib 32):

  

(strfib 32)

 N   sequential           coroutine            parallel              futures
 1   ×1   772  (751/4)    ×1.3 578  (562/1)    ×1.1 721  (721/1)     ×0.9 873  (851/0)
 4   ×1   3169 (3085/12)  ×1.3 2364 (2303/6)   ×4   797  (3103/33)   ×0.8 4164 (4058/2)
 8   ×1   6409 (6225/14)  ×1.4 4730 (4608/13)  ×4.3 1493 (9166/197)  ×0.8 8353 (8135/4)

The coroutine column here also shows an improvement, surprisingly. That’s because a coroutine thread has a smaller continuation than the one in the sequential column, and the cost of fetching a parameter value can depend (to a limited degree) on continuation size. The effect of parallel threads on this kind of program is more consistent than fine details of a continuation’s shape.
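
For reference, here is a hedged sketch of what a wrapper in the spirit of string->number* might look like (a guess at the idea, not the linked benchmark’s actual code):

; Sketch: supplying the radix, conversion mode, decimal mode, and
; single-flonum mode explicitly means string->number never evaluates
; the defaults that consult the read-decimal-as-inexact and
; read-single-flonum parameters.
(define (string->number* s)
  (string->number s 10 'number-or-false 'decimal-as-inexact 'double))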

Operations on mutable equal?-based hash tables [code] are another case where futures block, but parallel threads can provide performance improvement.

  

(hash-nums 6)

 N   sequential           coroutine            parallel             futures
 1   ×1   193  (184/1)    ×1   190  (182/0)    ×1   186  (186/0)    ×1   191  (182/0)
 4   ×1   767  (733/5)    ×1   763  (729/0)    ×3.7 208  (824/3)    ×1   763  (729/0)
 8   ×1   1541 (1473/11)  ×1   1532 (1463/1)   ×4.5 346  (2373/24)  ×1   1539 (1470/1)

As an illustration of the current limitations of parallel threads in Racket, let’s try a program that writes data to a byte-string port then hashes it [code].

  

(hash-digs 7)

 N   sequential           coroutine            parallel             futures
 1   ×1   127  (123/3)    ×1   127  (123/4)    ×0.9 135  (135/3)    ×1   126  (122/3)
 4   ×1   503  (487/13)   ×0.9 536  (519/35)   ×2.5 201  (599/38)   ×1   520  (504/28)
 8   ×1   1022 (986/33)   ×0.9 1097 (1057/81)  ×2.5 403  (1630/85)  ×1   1049 (1017/60)

Here we see that parallel threads do get some speedup, but they do not scale especially well. The fact that separate ports are not contended enables performance improvement from parallelism, but speedup is limited by some general locks in the I/O layer.

Even further in that direction, let’s try a program that hashes all files in the current directory [code] and computes a combined hash. When run on the "src" directory of the Racket Git repository, most of the time is reading bytes from files, and locks related to file I/O are currently too coarse-grained to permit much speed-up.

  

(hash-dir)

 N   sequential           coroutine            parallel             futures
 1   ×1   170  (169/1)    ×1   169  (169/0)    ×0.7 256  (248/0)    ×1   170  (170/0)
 4   ×1   692  (690/3)    ×1   662  (658/3)    ×1.3 515  (1068/5)   ×1   681  (679/3)
 8   ×1   1393 (1377/10)  ×1.1 1293 (1290/6)   ×1.6 868  (2158/18)  ×1   1368 (1366/7)

Having locks in place for parallel threads can impose a cost on sequential programs, since locks generally have to be taken whether or not any parallel threads are active. Different data structures in Racket use specialized locks to minimize the cost, and most benchmarks reported here report the same numbers in the sequential column for Racket v8.18 (the previous release) and Racket v9.0. The exceptions are the (hash-nums 6) and (hash-digs 7) benchmarks, because those measure very fine-grained actions on mutable hash tables and I/O ports, and the cost is largest for those. Comparing sequential times for those two versions shows that support for parallel threads can cost up to 6-8% for programs that do not use them, although the cost tends to be much less for most programs.

  

(hash-nums 6)

 N   v8.18 sequential      v9.0 sequential
 1   ×1   188  (180/0)     ×0.96 195  (187/0)
 4   ×1   757  (728/2)     ×0.98 773  (739/3)
 8   ×1   1520 (1461/9)    ×0.98 1546 (1477/12)

(hash-digs 7)

 N   v8.18 sequential      v9.0 sequential
 1   ×1   118  (114/3)     ×0.94 126  (122/3)
 4   ×1   474  (458/12)    ×0.94 506  (489/13)
 8   ×1   947  (915/25)    ×0.92 1025 (989/26)

Overall, parallelizable numerical programs or ones that manipulate unshared data structures can achieve speedup through parallel threads relatively easily, but I/O remains a direction for improvement.

Backward Compatibility

If a library uses mutable variables or objects, either publicly or internally, then it must use locks or some other form of concurrency control to work properly in a multithreaded context. Racket already has concurrency, and the expectation for libraries to work with threads does not change with the introduction of parallel threads. Racket’s semaphores, channels, and other synchronization constructs work the same with parallel threads as with concurrent threads. Even programs that use lock-free approaches based on compare-and-swap operations (such as box-cas!) continue to work, since Racket’s compare-and-swap operations use processor-level primitives.
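
For example, a lock-free counter based on box-cas! (a sketch, not from any particular library) keeps working under parallel threads:

; Sketch: retry the compare-and-swap until no other thread changed
; the box between our read and our write.
(define counter (box 0))
(define (increment!)
  (let loop ()
    (define old (unbox counter))
    (unless (box-cas! counter old (add1 old))
      (loop))))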

Still, there are a few concerns:

  • Racket’s coroutine threads offer the guarantee of sequential consistency, which means that effects in one thread cannot be seen out-of-order in another thread. Parallel threads in Racket expose the underlying machine’s memory-consistency model, which may allow reordering of memory effects as observed by other threads. In general, a weak memory model can be an issue for code not intended for use with threads, but Racket— more precisely, Chez Scheme— always guarantees the memory safety of such code using memory fences. That is, Racket code might observe out-of-order writes, but it never observes ill-formed Racket objects. The fences are not new, and they are part of the same write barrier that already supports generational garbage collection and the memory safety of futures. Although sequential consistency supports lock implementations that don’t work with weaker memory models, so they would work with coroutine threads and not parallel threads, we have not found any such implementations in Racket libraries.

  • Some Racket libraries use atomic mode for concurrency control. Atomic mode in Racket prevents coroutine thread swaps, and entering atomic mode is a relatively cheap operation within Racket’s coroutine scheduler. When a parallel thread enters atomic mode, it prevents other coroutine threads from running, but it does not prevent other parallel threads from running. As long as atomic mode is used consistently to guard a shared resource, it continues to serve that role with parallel threads.

    Entering atomic mode is a much more expensive operation in a parallel thread than in a coroutine thread; in many cases, Racket core libraries that need finer-grained locking more specifically need to move away from using atomic mode. Still, making atomic mode synchronize a parallel thread with coroutine threads provides a graceful fallback and evolution path.

  • Foreign functions that are called by Racket in a coroutine thread are effectively atomic operations when there are no parallel threads, since a coroutine swap cannot take place during the foreign call. It’s rare that this atomicity implies any kind of lock at the Racket level, however, and the foreign function itself is either adapted to operating-system threads or not. Racket can already create operating-system threads through dynamic-place, and foreign-function bindings have generally been adapted already to that possibility.

The greater degree of concurrency enabled by parallelism exposed some bugs in our existing core libraries that could have been triggered with coroutine threads, but hadn’t been triggered reliably enough to detect and repair the bugs before. Beyond those general improvements, our experience with pre-release Racket is that parallel threads have not created backward-compatibility problems.

by The Unknown Author at Sunday, November 23, 2025

Saturday, November 22, 2025

The Racket Blog

Racket v9.0

posted by Stephen De Gabrielle and John Clements


We are pleased to announce Racket v9.0 is now available from https://download.racket-lang.org/.

Racket 9.0 is here!

A major release is always exciting, and Racket 9.0 is no exception: it introduces parallel threads. While Racket has had green threads for some time, and supports parallelism via futures and places, we feel parallel threads are a major addition.

As of this release:

  • Racket supports parallel threads. For more information see the new blog post on the topic.
    • Parallel threads can be created using the #:pool argument to thread creation (see the sketch after this list).
    • Threads created with #:keep set to 'results will record their results for later retrieval with thread-wait.
  • The black-box wrapper prevents the optimizing compiler from optimizing away certain computations entirely. This can be helpful in ensuring that benchmarks are accurate.
  • The decompile-linklet function can map linklets back to s-expressions.
  • When using BC Racket, the processor-count function is changed to always return the parallel count.
  • We now distribute “natipkg” packages for AArch64, useful for package-build and package-testing infrastructure.
  • Check Syntax tracks identifiers more deeply nested in the “origin” field of syntax objects.
  • The math library includes Weibull distributions.
  • There are many other repairs and documentation improvements!
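
As a sketch of the parallel-thread bullets above (the 'own pool value and the exact keyword behaviour here are assumptions about the new API, so treat the details as illustrative):

(define t
  (thread (lambda () (expt 2 64))
          #:pool 'own        ; run as a parallel thread in its own pool
          #:keep 'results))  ; record the result for later retrieval

(thread-wait t) ; with #:keep set to 'results, returns the thread's result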

Thank you

The following people contributed to this release:

Alexander Shopov, Anthony Carrico, Bert De Ketelaere, Bogdan Popa, Cadence Ember, David Van Horn, Gustavo Massaccesi, Jade Sailor, Jakub Zalewski, Jens Axel Søgaard, jestarray, John Clements, Jordan Johnson, Matthew Flatt, Matthias Felleisen, Mike Sperber, Philip McGrath, RMOlive, Robby Findler, Ruifeng Xie, Ryan Culpepper, Sam Phillips, Sam Tobin-Hochstadt, Sebastian Rakel, shenleban tongying, Shu-Hung You, Stephen De Gabrielle, Steve Byan, and Wing Hei Chan.

Racket is a community-developed open source project and we welcome new contributors. See racket/README.md to learn how you can be a part of this amazing project.

Feedback Welcome

Questions and discussion welcome at the Racket community on Discourse or Discord.

Please share

If you can, please help get the word out to users and platform-specific repo packagers.

Racket - the Language-Oriented Programming Language - version 9.0 is now available from https://download.racket-lang.org

See https://blog.racket-lang.org/2025/11/racket-v9-0.html for the release announcement and highlights.

by John Clements, Stephen De Gabrielle at Saturday, November 22, 2025

Tuesday, November 18, 2025

spritely.institute

GoblinShare: Secure, Peer-to-Peer File-Sharing with Goblins

This year, instead of participating in the Autumn Lisp Game Jam, Spritely decided to make some non-game demos to show off our tech. To that end, I spent the week of November 3rd writing GoblinShare, a secure, peer-to-peer file-sharing utility using the Guile port of Magenc for storage and distribution, relying on Goblins for the peer-to-peer connection abstraction, and delivered over Tor. Thanks to Goblins, this turned out to be super easy to implement. Let me show you how!

Demos; glorious demos!

We at Spritely absolutely love our technology demos. We know it can be tricky to understand an unfamiliar paradigm, but we also think hands-on demos help. To that end, we build lots of usable demos to try. We particularly like building games during the Lisp Game Jam so we can connect with the broader Lisp community, encourage more developers to use our technology, and end up with something fun that people want to use. After all, we can't build the future of the social web all by ourselves! This year, though, we decided to focus not on showing off work we've already done, but pushing our work in a new direction; and it felt best to do that outside the context of the jam.

My assignment (which I did choose to accept) was tripartite: port Magenc to Guile, implement a simple file-sharing tool (which I decided to somewhat model on Magic Wormhole), and port Crystal to Guile. The Crystal work remains ahead, but I've ported Magenc and built GoblinShare as the file-sharing tool. I treated the Magenc port as prep work and GoblinShare proper as the equivalent of a jam entry, so that will be the focus of this post. First, though, I'll provide an overview of these two projects, what they do, and how they do it.

Magenc

Magenc is an encrypted, distributed, content-addressed data store relying on magnet URLs as capabilities, inspired by Tahoe-LAFS. It does not use Goblins, but it is built with capability security concepts in mind. It consists of three components, each functioning as a subcommand under a single program: magenc serve, which starts a server; magenc put, which encrypts and POSTs files to a server; and magenc get, which GETs files from a server. (Although the architecture is designed to support arbitrary remote stores, only a web store using HTTP has been implemented so far.)

A quick summary

The core of Magenc's functionality is the magnet URL, which looks like magnet://?xt=<url-encoded-urn>&ek=<base64-url-encoded-string>&es=<string> and uniquely identifies a specific file. Decomposing the query parameters (everything after ?), xt refers to the exact topic identifying a specific file, which is simply the sha256 hash of the encrypted manifest (which may be a raw object as discussed below). ek is the encryption key used to encrypt the file. es is the encryption suite used. The Guile port of Magenc uses AES in Galois/Counter Mode (AES-GCM) for encryption; the Racket version uses AES in Counter Mode (AES-CTR). Of this information, only the exact topic is ever known by the server. The information in a magnet URL is both necessary and sufficient to identify, access, and decrypt a file on a given server. This makes a magnet URL a capability.
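
To make that decomposition concrete, here is a hedged sketch (not Magenc's own parser) of pulling the query parameters out of a magnet URL with plain string operations, using uri-decode from Guile's (web uri) module:

(use-modules (web uri))

(define (magnet-query-params url-string)
  ;; everything after "?" becomes an alist of (name . value)
  (let ((q (substring url-string
                      (1+ (string-index url-string #\?)))))
    (map (lambda (kv)
           (let ((i (string-index kv #\=)))
             (cons (substring kv 0 i)
                   (uri-decode (substring kv (1+ i))))))
         (string-split q #\&))))

(assoc-ref (magnet-query-params
            "magnet:?xt=urn%3Asha256d%3Aabc&ek=def&es=AES-256-GCM")
           "es")
;; => "AES-256-GCM"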

The server only has access to encrypted binary data a client sends. When the server receives a POST request with attached binary data, it hashes the provided data using sha256, encodes the unhashed data using base64, and stores the encoded data in a key-value store where the hash ‒ the exact topic ‒ is the key. It then converts the exact topic into a URI object from Guile's (web uri) module before sending the URI back in the content-location field of the response object. As a safety redundancy, the client checks to make sure that the exact topic it gets back is what it expects. When the server receives a GET request with an exact topic in the content-location field, it looks up the associated data, decodes the base64 into binary data, and sends that back as the body of a response. That's all the server knows and does; everything else is handled by the client.
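
That server-side logic is small enough to sketch. The following is a minimal illustration (not the actual Magenc code) which assumes the sha256 and base64 helpers from guile-gcrypt's (gcrypt hash) and (gcrypt base64) modules, and which keys the table by the base64 of the hash for simplicity:

(use-modules (gcrypt hash)      ; sha256 : bytevector -> bytevector
             (gcrypt base64))   ; base64-encode / base64-decode

(define store (make-hash-table))

(define (store-put! data)       ; data: encrypted bytevector from a POST
  (let ((exact-topic (base64-encode (sha256 data))))
    (hash-set! store exact-topic (base64-encode data))
    exact-topic))               ; sent back in content-location

(define (store-get exact-topic) ; exact topic from a GET
  (base64-decode (hash-ref store exact-topic)))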

magenc put handles the encryption and generation of cryptographic inputs. When passed a file (and, optionally, server URL), Magenc first chunks the file as necessary, encrypting each chunk to transmit separately so that the server does not necessarily know the chunks are related (though correlation of connections and network traffic by a malicious server or observer could be used to reliably guess interrelation). Then, the client prints a magnet URL which identifies the entrypoint to retrieve the file ‒ a "raw" object if the file fits in one chunk or a "manifest" if it doesn't. As discussed above, the magnet URL also embeds cryptographic information.

A file's magnet URL can be used with magenc get to retrieve that file. When passed a magnet URL (and, if necessary, server URL), magenc get first retrieves and decrypts the object identified by the exact topic. If the object is a "raw" object, Magenc extracts the binary data, decrypts it, and writes it to either standard output or a given file. If the object is a "manifest" object, Magenc extracts the exact topics for each chunk from the manifest, retrieves and decrypts each chunk in order, then writes out the complete file in the same way.

The original version of Magenc includes a more thorough write-up. Aside from the differing encryption suites and commandline interfaces, that write-up also holds for the Guile version.

A simple example

Some things are easier to understand in action, so let's walk through a trivially simple example of storing and retrieving a file using Magenc. We will skip build instructions, which are provided in the repository. For ease of demonstration, we will use the default server configuration which launches a process listening at http://127.0.0.1:8118 (http://localhost:8118) and stores data in memory rather than writing it to disk ‒ Magenc does include a backend relying on Goblins' bloblin persistence store to efficiently store files on disk. We will use what is called convergent encryption, which uses part of the unencrypted file itself as a cryptographic input, to ensure that our example file produces the same magnet URL every time it is stored. (If you are reading this in the future and Magenc's cryptography has changed, the magnet URLs may no longer match.)

First, let's start a server:

magenc serve

This should print the address where the server is listening, like:

Server running at: http://127.0.0.1:8118

We can leave that running in one terminal session and use a different one for everything else. Now let's create our example file:

echo "Hello! I'm an example file!" > example.txt

Next, we'll store the file with magenc put example.txt --convergent. This prints the magnet URL magnet:?xt=urn%3Asha256d%3A4a2TJXrPx83v1DGnOJyPa5b678AkVsPaplsx_LcT06I&ek=_TrAfpNRRLQb7gutF8KKMtj-tPWk8_AapsJQgu6sDeo&es=AES-256-GCM. (Note that argument ordering is relevant; to simplify the implementation of the CLI, Magenc expects option arguments ‒ anything starting with - or -- ‒ after positional arguments.) Finally, we can retrieve the file:

magenc get "magnet:?xt=urn%3Asha256d%3A4a2TJXrPx83v1DGnOJyPa5b678AkVsPaplsx_LcT06I&ek=_TrAfpNRRLQb7gutF8KKMtj-tPWk8_AapsJQgu6sDeo&es=AES-256-GCM"

Make sure to quote the magnet URL; otherwise, the shell will treat each & as a control operator and split the command. This command produces the output:

Hello! I'm an example file!

And that's it! Simple! There are a few more options available for the various subcommands, each explained with magenc --help.

GoblinShare

In addition to being an application, Magenc is also a library. GoblinShare is implemented using Magenc in this capacity. All it adds is a purpose-built UI and a wrapper around the in-memory store backend which makes it easier to use through a Goblins actor. The sending peer launches a Tor netlayer, generates a sturdyref ‒ a persistent object reference which can be shared out-of-band ‒ for the store actor, and adds that to the magnet URL with the additional acceptable source (as) field. The receiving peer sets up its own netlayer, "enlivens" the sturdyref, and downloads the file. Once the file is retrieved, the sending peer terminates, removing the associated data from memory.

An example

We will perform essentially the same tasks for this example as we did for the Magenc example. There are a few differences to note. First, GoblinShare does not provide an option for convergent encryption. This simply wouldn't make sense for an ephemeral file-sharing tool. For our present purposes, this means that you will almost certainly get a different magnet URL than is produced here. Second, it is necessary to manually run the tor daemon. Build and usage instructions in the GoblinShare repository cover that so the following example assumes a running daemon.

We will reuse the same example file as above, so feel free to reuse the previous command to create it. Then, all we have to do is send it:

goblinshare send example.txt

to get the magnet URL. The sending process will wait for the file to be retrieved, so switch to another terminal session to retrieve the file:

goblinshare receive "magnet://?xt=...&ek=...&es=...&as=..."

Because we're using Tor as our network layer, it can take a few seconds for the connection to be established and the data to be sent, even though we are connecting to a local server. Since our example file is small enough to fit into a single chunk, the delay isn't very long, but a larger file can take quite a while to transfer. Eventually, you will receive the expected output:

Hello! I'm an example file!

At this point, both the sending and receiving processes will terminate, and that's that! You've successfully shared a file with Goblins! Just like Magenc, GoblinShare has a (very) few options you can see with goblinshare --help. Unlike Magenc, option arguments can be supplied before or after positional arguments as long as they follow the associated subcommand.

A note on Magic Wormhole

I mentioned that I took initial inspiration for this project from Magic Wormhole. In practice, though, the only similarity wound up being the names of the subcommands send and receive. Magic Wormhole relies on a central relay server because it uses "wormhole codes" to facilitate oral communication of the relevant capabilities. These codes seem to be keys mapping to a fuller capability, and they therefore require a coordination point. GoblinShare, by contrast, chooses instead to provide less-human-friendly magnet URLs, which allow fully peer-to-peer file transfer because they encode all necessary information.

How easy was it, really?

As I mentioned near the beginning of this post, the most surprising part of implementing GoblinShare was how easy it was. The core functionality was implemented in about half a day, though a failed attempt to get GoblinShare to manage the tor daemon stretched the initial implementation out to about a day and a half. In the end, the UI and business logic of GoblinShare together require 250 lines of code including module headers, whitespace, docstrings, and inline comments (but not license headers). That's very little code! Magenc took a bit more work, but a lot of that was getting the cryptography and URL abstractions playing nicely ‒ and, admittedly, there's still room to make Magenc more approachable as a library. Still, the port took a little under a week all told, including time to implement tests, and came out to somewhere in the neighborhood of 1400 lines of code with similar caveats.

Numbers are one thing, but the simplicity of GoblinShare really comes through in the code itself. The core logic comes down to three procedures, two in the send logic (with an additional helper) and one in the receive logic. Let's break these down and walk through them to build up the relevant logic.

As a general note, the UI layers of GoblinShare and Magenc use SRFI-37 to convert commandline arguments into an association list of options and arguments. The main procedures in (goblinshare) and (magenc) simply parse the commandline into the appropriate arguments which they pass to the procedure associated with a given subcommand.

Now, on to the code walkthrough!

send

First, we'll briefly discuss the relevant helper procedure, add-sref-to-magnet-url:

(define (add-sref-to-magnet-url magnet-url sref)
  (match magnet-url
    (($ <magnet-url> xt ek es #f)
     (make-magnet-url xt ek es
                      (uri-encode
                       (ocapn-id->string sref))))))

All this procedure does is decompose the magnet URL we get from Magenc then build a new magnet URL with an additional query parameter holding the sturdyref to the Goblins actor.

Most of the important logic happens in connect-to-goblinshare-server-store, named to match the convention of Magenc's store abstraction:

(define (connect-to-goblinshare-server-store done)
  (define backend (connect-to-backend #:backend-type 'memory))

  (define (put data)
    (store-backend-data backend data))

  (define (get exact-topic)
    ;; CapTP doesn't support records so we turn exact topics into strings
    (retrieve-backend-data backend (string->exact-topic exact-topic)))

  (define (close)
    (close-backend backend)
    (signal-condition! done))

  (connect-to-store* 'goblinshare-send
                     (lambda () (values get put close))))

This procedure does a few interesting things. First, it accepts a done parameter. This is a Fibers condition so that we can wait on a remote peer to collect the shared file before exiting, which is handled elsewhere.

Next, this procedure creates a Magenc memory backend. Backends are simply data stores wrapped in a Scheme record so they can be used with an abstract interface. They provide three procedures associated with three fields of the record type: backend-get, backend-put, and backend-close, accessed using retrieve-backend-data, store-backend-data, and close-backend, respectively. Here we wrap each of the underlying backend's procedures in a new procedure so we can massage our inputs to work over OCapN.

put calls the memory backend's put procedure unmodified.

get does a bit more. Because we are using OCapN to communicate between peers, and because OCapN doesn't have a dedicated type for generic records, we convert exact topics into strings when sending messages between Goblins actors. get thus converts exact topic strings back into exact topic records. We could instead have written a marshaller to convert our record into an OCapN tagged value, but there was little reason to do so.

close calls the underlying backend's close then signals the done condition.

Finally, the last call in this procedure is to Magenc's connect-to-store* helper which constructs a store interface (similar in shape and function to a backend interface) to be passed to chunk-and-store-data. As you can see, it takes two arguments: a symbol identifying its type, and a thunk which returns three values: get, put, and close. (The higher-level connect-to-store supports keyword arguments and is used to construct Magenc's built-in backends.)

The last and likely most interesting piece of GoblinShare's send logic is the part that actually deals with Goblins. All it does is wrap a store in an actor, spawn a Tor netlayer, and return a sturdyref to the wrapper actor:

(define (spawn-client-sref store)
  (define (^client bcom store)
    (methods
     ((get id) (retrieve-data store id))
     ((put . _) (error "cannot put with client capability"))
     ((close) (close-store store))))

  (define mycapn (spawn-mycapn (spawn ^onion-netlayer)))
  (:: mycapn 'register (spawn ^client store) 'onion))

The actor is ^client. As you can see, we override the backing store's put procedure to prevent remote peers from writing unexpectedly. Otherwise, we use Magenc's store interface to manipulate the store normally.

The last two lines pack a lot of logic. First, we spawn an ^onion-netlayer representing a Tor netlayer. We immediately pass this into spawn-mycapn, creating a new ^mycapn object populated with that netlayer. Then, we spawn a new ^client, immediately passing it to our ^mycapn object's register method to get a promise to a sturdyref referencing the object. Whew! All that in only two lines!

We bring all of the send logic together in gs-send:

(define* (gs-send filename #:optional
                  (out-port (current-output-port)))
  (let* ((done (make-condition))
         (store (connect-to-goblinshare-server-store done))
         (magnet-url
          (call-with-input-file filename
            (lambda (in-port)
              (chunk-and-store-data in-port store))
            #:binary #t)))
    (with-vat (spawn-vat #:name 'goblinshare-server)
      (on (spawn-client-sref store)
          (lambda (sref)
            (format out-port "~a~%"
                    (magnet-url->string
                     (add-sref-to-magnet-url magnet-url sref))))))
    (wait done)))

As you can see, gs-send requires a filename and optionally accepts an output port where it will write the resulting magnet URL. This is to facilitate tests and is not exposed in the commandline client.

The procedure starts by creating a Fibers condition which it passes to connect-to-goblinshare-server-store to create a store.

Next, it passes the resulting store to Magenc's chunk-and-store-data with default parameters ‒ so, using AES-GCM encryption and generating a new key appropriate for that cipher ‒ and a port for reading the input file. It assigns the returned magnet URL to a new variable.

Then, gs-send spawns a vat and enters a vat context. There, it spawns a client actor to wrap the store and resolves the sturdyref to the client actor. It adds the resolved sturdyref to the magnet URL and writes out the result.

Finally, gs-send waits for the done condition to be signaled.

receive

The receive command relies on only one procedure for most of its logic:

(define (connect-to-goblinshare-client-store client-sref)
  (define vat (spawn-vat #:name 'goblinshare-client))
  (define mycapn (with-vat vat (spawn-mycapn (spawn ^onion-netlayer))))
  (define client (with-vat vat (:: mycapn 'enliven client-sref)))

  (define (put . _)
    (error "cannot put with client capability"))

  (define get
    (let ((ch (make-channel)))
      (lambda (exact-topic)
        (with-vat vat
          ;; CapTP doesn't support records so we turn exact topics into strings
          (on (<- client 'get (exact-topic->string exact-topic))
              (lambda (val)
                (syscaller-free-fiber
                 (lambda () (put-message ch `(ok ,val))))
                #t)
              #:catch
              (lambda (exn)
                (syscaller-free-fiber
                 (lambda () (put-message ch `(error ,exn))))
                #f)))
        (match (get-message ch)
          (('ok val) val)
          (('error exn) (raise-exception exn))))))

  (define (close)
    (with-vat vat
      (<-np client 'close)))

  (connect-to-store* 'goblinshare-receive
                     (lambda () (values get put close))))

There are more lines here than in connect-to-goblinshare-server-store, but the gist is simple. connect-to-goblinshare-client-store takes a sturdyref, which may be a promise thanks to promise pipelining.

It starts by spawning a vat, spawning a ^mycapn with a new ^onion-netlayer, and enlivening the client actor. These steps mirror the steps used to spawn the client sturdyref above. Rather than registering an actor to get a sturdyref, here we enliven a sturdyref to get an actor.

Next, connect-to-goblinshare-client-store wraps each method of the client actor in a regular Scheme procedure for Magenc's store interface.

put and close are quite simple. The former errors out to avoid unneeded network access should it be called. The latter sends the close message to the remote actor.

get looks complicated, but most of the code is there to resolve a promise into a concrete Scheme value. The core functionality is (<- client 'get (exact-topic->string exact-topic)). This line encodes the exact topic as a string, messages the client actor's get method with that string, and returns the resulting promise. When that promise fulfills or breaks, the surrounding logic propagates the result as normal.

Finally, connect-to-goblinshare-client-store creates a new store with these wrapper procedures using connect-to-store*.

The resulting store is used by gs-receive to get the desired file:

(define* (gs-receive magnet-url #:optional
                     (out-port (current-output-port)))
  (let* ((store
          (connect-to-goblinshare-client-store
           (string->ocapn-id
            (uri-decode
             (magnet-url-acceptable-source magnet-url)))))
         (result
          (call-with-output-bytevector
           (lambda (out-port)
             (retrieve-and-unchunk-data
              out-port store
              #:exact-topic (magnet-url-exact-topic magnet-url)
              #:key (magnet-url-encryption-key magnet-url)
              #:cipher (magnet-url-encryption-suite magnet-url))))))
    (write-bytevector result out-port)
    (close-store store)))

This procedure requires the magnet URL to retrieve, and optionally accepts an output port where it will write the resulting data. This is exposed through goblinshare receive's --output option.

First, gs-receive creates a store as discussed above, extracting the sturdyref string from the magnet URL and converting it into the appropriate Scheme type. Then, it decomposes the magnet URL into its components and passes them, along with the store and a port receiving the result, to Magenc's retrieve-and-unchunk-data. Finally, it writes the result and informs the remote store that it's done.

And that's it! All the other code in GoblinShare is for the UI.

I told you it was simple!

Final thoughts

It's worth mentioning that neither Magenc nor GoblinShare is intended as finished, production software. Notably, the underlying cryptography has not been audited. Additionally, there are some improvements I'd like to make to Magenc's API which would make GoblinShare simpler.

That's okay, though. These projects are demonstrations. Magenc is intended to demonstrate the basic concepts of capability-secure distributed, encrypted, content-addressed data storage. GoblinShare, for its part, is supposed to show how easy it is to implement otherwise-complex functionality with Goblins, which I think it does. As a rough comparison, Magic Wormhole is about 11,500 lines of Python code, counted with similar caveats as those for Magenc and GoblinShare, and relies on a much longer list of dependencies.

All told, I am incredibly happy with how things turned out. GoblinShare is beautifully simple and useful. I hope it shows others how easy Goblins makes networked applications and inspires more neat software.

Happy hacking!

by Juliana Sims (contact@spritely.institute) at Tuesday, November 18, 2025

Saturday, November 15, 2025

Idiomdrottning

Emacs Font Fun

I actually love fonts! (Which might surprise those who know that I leave my webpage set to the browser default font, but it’s because I love good fonts so much that I think fonts should be a reader decision, not a server decision.) When my desktop went headless after having to move to a way smaller apartment and my Emacs had to live in SSH only, I was sad, because I had set up all kinds of weird fonts in Emacs and functions to switch the entire Emacs over, along with “only switch this particular buffer over” variants. E.g.

(defun to-june ()
  (interactive)
  (set-frame-font "Junicode-32"))

(defun to-june-b ()
   (interactive)
   (setq buffer-face-mode-face '(:family "Junicode" :height 320))
   (buffer-face-mode))

Along with this to make a changed buffer go back to the frame-wide font:

(defun unbufface ()
  (interactive)
  (buffer-face-mode -1))

And now I have all those fun fonts working again on the Android version of Emacs! ♥︎♥︎♥︎

Hence the huge sizes. I was on 12 for most non-Junicode fonts with a -16 “big” option, while Junicode, with its lower x-height, I had at 14 pts. But on “Moria”, I use 28 as the “small” size, 34 as the big size, and Junicode gets to be 32.

My default font for coding and general emacsing around was (CW non-free) Futura while I wrote most prose texts with Junicode. Coding in proportional?! A geometric sans with a super ambiguous character set for I1O0? Well, it works great for Lisp for the most part and I had it set up so I could super easily toggle out from it back into Deja Vu Mono, or Fira Code these days.

(add-to-list 'default-frame-alist '(font . "Futura LT-28"))

I set all of this up pretty soon before having to switch away from it, so I only got to enjoy it for a few months; I’m grateful that I have it back, albeit not on my big 21″ 4:3 screen. It’s on a more cozy and puny lantern-lit 7″ screen. Last piece of the puzzle is that I’m gonna go find a retro-ish typewritery font. Old Timey Code maybe. I was on “Bohemian Typewriter” but I couldn’t get that to work since the Android version of Emacs doesn’t support OTF. And good riddance, since the latin-1 coverage was so bad, but also not good riddance, since the alternatives I’ve found so far are more consistently x-spaced instead of nostalgically jittering around like an X-File on uppers.

I really like switching fonts with the same text open. It helps me get fresh eyes on the same text and see different problems or things I can write in a more beautiful or clear way.

by Idiomdrottning (sandra.snan@idiomdrottning.org) at Saturday, November 15, 2025

crumbles.blog

Tour of a pattern matcher: core patterns and the AST

Alright! Last time we looked at the mouth of extensible-match and how it chews up patterns in order to make them digestible. Before we take a look into the stomach,1 we should look at what that chewed-up form of patterns looks like. This is an episode with a bit less how and a lot more what than the last one. I’ll explain and justify the core pattern structure we just got out of the expansion process, how and why it’s different from the AST that we eventually like to work on, but also one further transformation on that AST that makes it easier for the rest of the system to digest.

We’re still in the (extensible-match match) library, which we’ll keep dipping back up into because it contains this long chain of procedure calls which actually co-ordinates the whole of the rest of the process of compilation.

Subject variables vs pattern variables

The idea of a pattern variable is simple, and is a concept exposed to the user. In a pattern like (cons x '()), x is a pattern variable. This pattern deconstructs a pair which could previously have been constructed by the same code used as an expression: the expression form takes a variable and makes it the only element of a 1-list; the pattern form takes a 1-list and puts its element in a variable. This is the essence of the equational reasoning property which makes pattern matching such a nice style to work with and think in.

Subject variables are pattern variables’ twin from the dark side. In the above example, there’ll be three subject variables: one for the pair itself, one for its car, and one for its cdr. The upshot is that the pattern matcher doesn’t generate code like (if (equal? (cdr p) '()) ...); it will always extract the thing it’s interested in into a variable first: (let ((g1234 (cdr p))) (if (equal? g1234 '()) ...)). This simplifies things because it means that the kind of thing we’re testing against is always a simple variable expression no matter what the nesting level of patterns is.

(Obiter dictum: this doesn’t affect the efficiency of the final generated code; it anticipates the Scheme compiler’s own pass which will un-nest expressions, namely CPS or ANF conversion.)

Subject variables are always generated identifiers; their precise names are only interesting to the internals of the pattern matcher. I mention them here because they’re important for understanding the structure of expanded patterns and the first few stages of compilation we’ll look at today. The first thing that %core-match-lambda does to patterns, in fact, is to generate subject variables for each of the values it will be examining. (Recall that %core-match-lambda has multiple sets of patterns, but each such set has to match the same number of values. The values for all clauses get the same subject variables.) Although these subject variables are the entry-point for the pattern match, this is in fact the last set of new subject variables generated, because all of the others (in the example above, the ones for the car and cdr of the pair) are generated during expansion.

Core patterns

Core patterns are defined in (extensible-match core-pattern).

Behold, a grammar:

⟨core pattern⟩ ::= (core:wildcard)
| (core:var ⟨identifier⟩)
| (core:quote ⟨datum⟩)
| (core:and ⟨core pattern1⟩ ⟨core pattern2⟩)
| (core:or ⟨core pattern1⟩ ⟨core pattern2⟩)
| (core:not ⟨core pattern⟩)
| (core:row ⟨core pattern⟩ …)
| (core:? ⟨expression⟩)
| (core:apply ⟨expression⟩ (⟨identifier⟩ …) ⟨core pattern⟩)
| (core:subject ⟨identifier⟩ ⟨core pattern⟩)
| (core:seq ⟨don’t worry about it⟩)

I won’t dwell on the first six rows (core:wildcard, core:var, core:quote, core:and, core:or, core:not) because they’re not very interesting. Core:wildcard is spelled _ in the high-level pattern language, core:var is just written as a variable name; core:and and core:or are now restricted to two subpatterns each. Otherwise, you can pretty much find everything you need to know about these from the documentation of the high-level forms in SRFI 262.

Core:row is where things start to get slightly more interesting. This pattern isn’t directly exposed to high-level forms, but it’s used for the patterns which can come after the procedure expression in a high-level ? pattern and the value patterns of a => pattern. They’re also used for each of the individual values at the top-level of a multiple-pattern %core-match-lambda form.

In a row pattern, all the subpatterns have to match, but the compiler is allowed to test them in any order. While a full rationale will have to wait until the next episode, in general the order in which it makes most sense to test subpatterns is not always the left-to-right order in which they appear in the source code. In the (cons x '()) example, we’re far more interested in the cdr of this pair than we are in the car, because the cdr can tell us whether the pattern matches or not; only once we know that the cdr is the empty list should we bother trying to load the car.

Core:apply corresponds directly to the high-level => pattern (which was called apply in earlier drafts of the SRFI) but has a different structure, to do with the need to give each value its own subject variable. The ⟨expression⟩ gives the procedure which is to be applied to the pattern’s subject; the ⟨identifier⟩s are subject variables to which each respective returned value will be bound; finally, the ⟨core pattern⟩ will be matched in an environment which includes those subject variables.

Core:subject is the last piece of the puzzle; it simply says which subject variable names the subject of its own subpattern.

How this all fits together is probably best illustrated by example. Our (cons x '()) example expands, within the high-level pattern matching language, to this:

(? pair?
   (=> car x)
   (=> cdr '()))

which in core patterns expands to this:

(core:and (core:? pair?)
          (core:row
           (core:apply car (g1) (core:row (core:subject g1 (core:var x))))
           (core:apply cdr (g2) (core:row (core:subject g2 (core:quote ()))))))

A single-subpattern core:row isn’t very useful, but an equivalent expansion using SRFI 1’s car+cdr2 would be:

(? pair?
   (=> car+cdr x '()))

which in core patterns is:

(core:and (core:? pair?)
          (core:row
           (core:apply car+cdr (g1 g2)
            (core:row
             (core:subject g1 (core:var x))
             (core:subject g2 (core:quote ()))))))

Something to note here is what core:var actually, formally does: it takes its subject variable and turns it into a pattern variable. g1 is a hidden, generated variable; x is bound within its clause to the exact same value.

That’s more or less it for the core pattern language; I hope it’s more-or-less clear how it works. I’ve omitted core:seq from this description. If this series ever covers core:seq and the implementation of the pattern matcher’s functionality for running simple regular expressions over sequences of Scheme values, it will be in a bonus episode, possibly quite a bit later. I’m leaving it out here because the implementation is still an unhappy mess, much more so than any other part; also, Wolfgang Corcoran-Mathe has pointed out that the current API mixes idioms in a confusing way, and fixing that might involve some other deep changes to how it’s implemented.

The Abstract Syntax Tree (AST)

As covered in a previous missive, core patterns are pseudo-record syntax objects; accessing fields from such syntax objects is slow; real records are much faster; so to make decision tree generation go faster, there’s an additional layer, called the abstract syntax tree, implemented as real Scheme records. This is in (extensible-match ast), along with the code to convert core patterns to the AST. (That’s invoked by the long chain of procedure calls that drives the whole compilation process in %core-match-lambda, but defined here.)

Actually, though, there’s (at least) one more reason besides speed that the AST is better to work with going forward – see if you can spot it in the definition:

(define-record-type primitive-pattern)
(define-record-type pattern
  (fields subject)
  (parent primitive-pattern))
(define-record-type wildcard-pattern
  (parent pattern))
(define-record-type var-pattern
  (fields name)
  (parent pattern))
(define-record-type quote-pattern
  (fields datum)
  (parent pattern))
(define-record-type and-pattern
  (fields subpat_1 subpat_2)
  (parent primitive-pattern))
(define-record-type or-pattern
  (fields subpat_1 subpat_2)
  (parent primitive-pattern))
(define-record-type row-pattern
  (fields subpats)
  (parent primitive-pattern))
(define-record-type not-pattern
  (fields subpat)
  (parent primitive-pattern))
(define-record-type ?-pattern
  (fields predicate-id predicate-expr)
  (parent pattern))
(define-record-type apply-pattern
  (fields procedure-id procedure-expr vars subpat)
  (parent pattern))

I’ve omitted the AST equivalent of core:seq for simplicity.

There are a couple of improvements here. I was so determined to kill free-identifier=? from the decision tree generator entirely that the expressions naming the procedures for ? and =>/apply patterns are now interned, giving them each a unique integer for the decision tree generator to use as a proxy for whether they are the same identifier. (This was probably a premature optimization, in hindsight.)

But the big improvement is in the (not very well-named) distinction between a pattern and a primitive-pattern. Notice that core:subject is gone from the AST! In its place, every pattern that needs to know the name of its subject variable has it stored directly within it.

In the core pattern language, you don’t actually know what (core:quote ()) means without looking for a lexically enclosing core:subject: in our (cons x '()) example, (core:quote ()) alone could be looking at the pair (no match), the car (not intended, but might incorrectly match), or the cdr (as intended).

Making this change made the decision tree generator much simpler, apart from the speed boost from using records instead of syntax objects.3 It no longer has to track a lexically implicit subject for each pattern.
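
To make the new shape concrete, here is the (cons x '()) example again as AST records, in a hedged sketch that assumes the R6RS default constructor names, a list-valued subpats field for row patterns, s0 as the whole pattern's subject variable, and small integers standing in for the interned expression IDs:

(make-and-pattern
 (make-?-pattern #'s0 0 #'pair?)   ; the subject is stored in the pattern
 (make-row-pattern
  (list
   (make-apply-pattern #'s0 1 #'car (list #'g1)
     (make-row-pattern (list (make-var-pattern #'g1 #'x))))
   (make-apply-pattern #'s0 2 #'cdr (list #'g2)
     (make-row-pattern (list (make-quote-pattern #'g2 '())))))))

Every pattern that cares about its subject now carries it, so (make-quote-pattern #'g2 '()) is unambiguous without any enclosing context.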

You might wonder why the code doesn’t do this already during pattern expansion, and why core patterns don’t include their subjects in the same way the AST does. Maybe ideally it would, but keeping the two separate is probably the right thing for simplicity: doing it this way lets the expansion of patterns happen completely independently of any context, and we add that context to each individual subpattern later.

Tracking just the state of each expansion in continuation-passing style is tricky enough; adding the current lexical subject variable as extra state to pass around would make it more complicated. Moreover, continuation-passing expansions imply that the expander’s final result has to be a syntax object anyway – it’s not safe to go straight from unexpanded patterns to AST records – so let’s take the opportunity afforded by the need to have a slightly redundant layer in order to at least make the process of creating that redundant layer easier. That said, if Scheme does one day grow a way to call a transformer and apply marks independently of the expander’s main loop, it will probably then be time to get rid of core patterns and track the current subject variable implicitly while expanding directly into AST records.4

The other half of the clause

We’re not quite done with the AST. We’ve talked a lot about the left-hand side of each pattern matching clause – the pattern – and we’ll continue to do so. But there is another side, which is what happens when the pattern of a clause is known to have matched.

(define-record-type action
  (fields procedure args))
(define failure-action (make-action #'fail '()))

(define-record-type patact
  (fields pattern action))

There’s not too much to say here. When a top-level pattern matches, its corresponding action is called. It’s a procedure, named by a generated identifier for each clause, plus a list of the variables bound by the corresponding pattern (which become arguments of the procedure). If no pattern matches, we call a special action called fail which just raises an exception.

A pattern and an action together make a clause, but here we call it a patact.

Unifying subject variables

That’s all the what; let’s finish up with a quick look at another bit of how. We’ll talk a lot more about what it takes to generate optimal code for a pattern match next time, but there’s one step that takes place to get the patterns in a form that’s easier to optimize.

Recall that each (sub)pattern expansion happens independently, without knowledge of the context from other expansions; recall also that => patterns expand into core:apply patterns. What this means is that equivalent applications generate distinct subject variables, when we really want equivalent applications to create equivalent subject variables!

To see how, let’s put the example we’ve used so far into context:

(define (last ls) ; returns the last element in a list
  (match ls
    ((cons x '())  x)
    ((cons _ ls*)  (last ls*))))

These two patterns will expand, respectively, into the same core pattern we saw above and another very similar one:

(core:subject ls
 (core:and (core:? pair?)
           (core:row
            (core:apply car (g1) (core:row (core:subject g1 (core:var x))))
            (core:apply cdr (g2) (core:row (core:subject g2 (core:quote ())))))))
;; action: (g3 x)

(core:subject ls
 (core:and (core:? pair?)
           (core:row
            (core:apply car (g4) (core:row (core:subject g4 (core:wildcard))))
            (core:apply cdr (g5) (core:row (core:subject g5 (core:var ls*)))))))
;; action: (g6 ls*)

We applied car and cdr to ls in each pattern, but the result was called g1 and g2 in the first pattern and g4 and g5 in the second one. We only want to call each procedure once – if at all – in the generated code, so it would be nice to only generate one variable. This is what the pattern subject unification pass does: for every application of the same procedure to the same subject, the created subject variables should have the same name. The result is that the first pattern will still look the same, but the second will be (the AST-level equivalent of) this:

(core:subject ls
 (core:and (core:? pair?)
           (core:row
            (core:apply car (g1) (core:row (core:subject g1 (core:wildcard))))
            (core:apply cdr (g2) (core:row (core:subject g2 (core:var ls*)))))))
;; action: (g6 ls*)

It does this in a recursive pass which looks for combinations of subject, procedure, and number of output values5 it’s not seen before; when it finds one, it puts the variable names that were generated for that pattern’s version into a hash table. When it sees the same combination of subject, procedure, and co-arity again, it replaces that AST apply pattern with a version whose subject variables are renamed, and recursively traverses the apply pattern’s subpattern to change instances of those subject variables appearing within quote-patterns, var-patterns, other apply-patterns etc. to the unified name.

This is also something that could be done during expansion, but it’s just simpler to do it as an extra pass after expansion because it makes expansion independent of external context.

Potential future AST transformations

Subject variable unification is currently the only optimization pass done on the AST of the patterns before handing over to the decision tree generator, which is responsible for most of the work of finding an efficient way to match a set of patterns. But there are a couple of other passes the pattern match compiler could make at this point to improve the quality of generated code, or to make the decision tree generator simpler.

Rewrite complex not patterns

At the moment, using a not pattern will completely frustrate the optimizer. It’s not too bad if you only put not around a datum or a predicate – but if you put it around an (implicit) and, the optimizer will only know how to compile it and its entire subpattern at once, and will be unable to avoid potentially repeating equivalent computations for multiple similar patterns. Being able to mix in patterns which match exactly when their subpatterns don’t, at arbitrary points, would just add too much complexity to the decision tree generator.

Fortunately, in 1846, compiler engineer Augustus De Morgan formalized an intuition about the relationship between negations, conjunctions, and disjunctions, which could eventually be applied here to ensure that the not pattern never appears outside of an and or or pattern. Ensuring it also never appears outside of row would be ideal, but the obvious solution – turning e.g. (not (row pat1 pat2)) into (or (not pat1) (not pat2)) – may be an accidental pessimization because or has an order-of-evaluation restriction which row doesn’t.
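
A pass along those lines might look like the following hedged sketch over the AST records defined earlier (assuming their R6RS default constructor and accessor names; a real pass would also recurse into the subpatterns of every other node kind):

(define (push-not-inward pat)
  (if (not-pattern? pat)
      (let ((sub (not-pattern-subpat pat)))
        (cond
         ;; not(and a b) => or(not a, not b)
         ((and-pattern? sub)
          (make-or-pattern
           (push-not-inward (make-not-pattern (and-pattern-subpat_1 sub)))
           (push-not-inward (make-not-pattern (and-pattern-subpat_2 sub)))))
         ;; not(or a b) => and(not a, not b)
         ((or-pattern? sub)
          (make-and-pattern
           (push-not-inward (make-not-pattern (or-pattern-subpat_1 sub)))
           (push-not-inward (make-not-pattern (or-pattern-subpat_2 sub)))))
         ;; not(not p) => p
         ((not-pattern? sub)
          (push-not-inward (not-pattern-subpat sub)))
         (else pat)))
      pat))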

This AST transformation alone would be an improvement for the decision tree generator, but the real improvement would come with recognizing common not operations and being able to fully optimize them. Another benefit of this transformation would be the ability, in theory, to (sometimes) issue compile-time warnings about patterns like (not _) which will never match.

Remove quote patterns from the AST

A (core:quote val) pattern can be rewritten as (core:? (lambda (x) (equal? x 'val))), but I don’t do this during expansion because it makes it harder to recognize that two quote patterns are equivalent. Since conversion from core patterns to the AST includes the recognition of equivalent expressions in order to give them unique integer IDs, it could generate an appropriate lambda expression for each mentioned datum at that point.

This would be good to do together with the previous one, because adding support for explicitly recognizing not patterns in the decision tree generator would add complexity there; removing quote patterns from the AST would remove some related complexity to balance that out. The cp0-equivalent pass of the Scheme compiler will then get rid of the lambda expression again.

Re-chain and patterns to only be nested on the right-hand side

Another way to frustrate the optimizer at the moment is to write something like (and (and a b) c) instead of (and a (and b c)). The optimizer will look for things it can do in the left-hand side of an and, but if it sees another and it isn’t smart enough to be able to look again at that and’s left-hand side, mostly because reconstructing the leftwardly-nested and after deciding to take action on its contents would be too much of a pain. The ordering relationship for these two nesting structures is the same, so an AST transformation should rewrite them all into a form that’s friendlier for the decision tree generator.

Or patterns, on the other hand, don’t have this problem, because the decision tree generator will flatten those out into entirely separate patterns while it’s working, and that process can handle any nesting structure it finds.

Next time

The decision tree generator! This will probably be the third of a four-part series, before the final part – a short coda on turning decision trees into executable code.


Updated 26 November 2025 to reflect some improvements in the structure of the AST, and also to actually add links to the other libraries mentioned here.


The next part of this series is online here.


  1. I would try to promise that, like Knuth, I will have the decorum to not continue the digestive metaphor beyond the stomach, but I fear I would be promising too much.↩︎

  2. Which is the real procedural codata counterpart to the data cons.↩︎

  3. The old, slow version of the decision tree generator had a pretty awful procedure whose job was to push core:subject patterns down the core pattern tree lazily.↩︎

  4. Sheesh, that one missing expander feature alone would probably let me get rid of about a quarter of the lines of implementation in extensible-match.↩︎

  5. For very sad reasons explained in the SRFI, you can’t use a => pattern to match on the number of values a procedure returns. It’s nonetheless possible to construct valid, non-erroring patterns where the same procedure applied to the same subject would return different numbers of values and match different numbers of patterns. Writing one is an exercise for the reader.↩︎

Saturday, November 15, 2025

Sunday, November 9, 2025

crumbles.blog

Tour of a pattern matcher: expression and pattern expansion

I’ve posted a lot on Mastodon (and a little bit here) about the adventures I’ve had implementing an extensible pattern matcher in Scheme. Well, the SRFI isn’t too far off finalization now, and the implementation is unlikely to change dramatically, so I reckoned it might be worthwhile to take readers on a tour of how the implementation works: for one thing, for those who want to be able to dive in and understand the code in future and maybe hack on it; for another, as a further propaganda piece in favour of a style of programming still under-used in Scheme; but also I think there’s a lot the budding Schemer can learn about Scheme, and a lot the budding functional programmer can learn about this central construct of functional languages, from studying the implementation. There’s still a lot of mess in the code that I’m moderately embarrassed about (we’ll deal with one such embarrassment today, albeit a less immediately avoidable one). But I hope that will get cleaned up over time; like visiting a newly-furnished building where they haven’t yet discovered that putting that cabinet there will make the door handle hit the wall and damage the wallpaper, you can see the idea of the place, and those little things will get sorted eventually.

As mentioned last time I talked about the implementation, extensible-match is a Scheme macro complex enough that it can be seen as a compiler unto itself. There’s a sense in which this is a truism – every Scheme macro is a compiler from a dialect of Scheme containing that macro’s syntactic extension into a dialect that doesn’t use it, at least locally – and attempting to define what makes a macro a compiler beyond that is somewhat impressionistic. But extensible-match has intermediate representations, multiple passes, an optimizer, and a code generator to turn all of that back into Scheme code: it’s definitely on the side of a compiler, taken as a whole. But today, following the path that your patterns actually take when the compiler digests them and turns them into running Scheme code, we’ll just take a look at the front end, which looks more like a traditional Scheme macro definition. Even here, there are a few things worth pointing out along the way!

The match forms

We start our journey in the uncreatively named (extensible-match match) library, which provides all the Scheme expression forms with which programmers actually interact with the pattern matcher. There are a dozen distinct macros for getting access to the pattern matcher – from the very basic match itself (which will probably constitute about 90% of real-world uses) to the somewhat obscure match-letrec*.

Under the hood, though, all of them ultimately expand into a use of match-lambda. Partly this is for party reasons – one thing to name them all, one thing to define them and so on – but also match-lambda is, in fact, the fundamental pattern matching form, in that it offers almost everything that all of the subordinate pattern matching forms need. Specifically, unlike match it handles multiple values without consing them up into a list first; unlike match-let and friends it has multiple clauses which it tries in succession.

Match-lambda in turn is implemented in terms of case-lambda, the Scheme way of making a procedure which takes a varying number of arguments by simply dispatching to different code based on the number of them. Case-lambda is built into Scheme, so the first bit of pattern matching – discerning what to do based on the number of values given – is done for us. All match-lambda does is group the pattern matching clauses (which of course can discern more finely) by the number of values they expect, create a case-lambda clause for each of those groups, and put a %match-lambda inside each of those clauses. This %match-lambda handles the next stage: it still handles multiple clauses, but each of them expects the same number of values.

This is a good moment to take a short diversion into macro efficiency, or rather into deliberate inattention to efficiency. An ideal macro expansion will look different depending on the exact forms it receives. In this case, you might think it would somehow optimize the macro if it treated the (common!) case where every match-lambda clause takes the same number of values specially, and expanded directly into a single lambda instead of a case-lambda with only one clause, which itself is going to turn into a call to a lambda expression once this %match-lambda is expanded. Not so!

When writing Scheme macros, it’s always better to just handle the general case and let the compiler itself deal with the redundant code you might generate in special cases. This is the essence of the Guaranteed-Optimization Clause of the Macro-Writer’s Bill of Rights. A case-lambda with only one clause is usually treated by the expander as identical to a lambda even before any optimization begins; resulting structures like (lambda (x y z) ((lambda (x* y* z*) ...) x y z)) are trivial for the compiler itself to remove and turn into a more sane form. Adding it into the macro implementation directly would add more lines of code, but would bring no benefit at all the moment the expansion got out of the expander: in a Scheme compiler, getting rid of pointless make-work code like this is usually the first pass done after macro expansion, a combined inlining/partial evaluation/constant folding pass which Chez Scheme calls ‘cp0’ and Guile calls ‘peval’.1 Compiler optimizations which, in a language without macros, seem fairly pointless – ‘who’d ever write code like that?’ – become valuable with macros, and become more valuable the more powerful the macro system is – ‘oh, code generated by other code looks like that!’

This is the point of the ‘guaranteed-optimization clause’: it lets us be laudably lazy as macro authors by guaranteeing us that our code will still run fast even if we don’t take the time to deal with redundant expressions; and it lets the users of macros know that they don’t have to worry about the performance impact of using a higher-level syntactic construct. Macros generate all sorts of silly code structures, not only because their authors don’t bother to deal with special cases, but also because there’s a lot that macros don’t know about the context they’re being used in. But that information is trivially accessible to the Scheme optimizer once it has a fully-expanded abstract syntax tree. An example of not needing to worry about special cases is that, even in performance-sensitive code, if your macro takes in an expression but only wants to evaluate it once, you might be tempted to try to recognize if the expression is a single identifier to avoid redundantly generating a let expression giving it a name local to your macro. But you don’t need to: the Scheme compiler can just take care of it! An example which isn’t just laudable laziness is when the expansion of a macro includes code which treats its input specially if it’s a certain type – but the Scheme compiler with the fully-expanded program at hand can run a type propagation pass and change the code to reflect its knowledge of the possible types a variable might have at the specific point of each individual use of the macro.

We’ll see a few more examples of this philosophy of ‘just let the Scheme compiler deal with it later’ throughout our tour. In general: ask not what you can do for your optimizer, ask what your optimizer can do for you! The expand/optimize procedure in Chez and Loko, or the ,optimize REPL command in Guile, is your friend whenever you’re trying to work out what special cases it’s worth writing extra code to deal with vs. which would be wasted effort on something the compiler itself could do better anyway.
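
For example, at a Chez Scheme REPL you can watch cp0 dissolve exactly the kind of nested-lambda structure a match-lambda expansion produces (the printed result may vary by version):

(expand/optimize
 '((lambda (x y) ((lambda (a b) (+ a b)) x y)) 1 2))
;; => 3   (both lambdas inlined, the addition constant-folded)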

Anyway, %match-lambda turns out not to be the end of the chain of delegation to ever more primitive forms, although this final one doesn’t generate a redundant expression in the final code. This is an extensible pattern matching library, so patterns can be forms that the matcher itself doesn’t natively know how to interpret. In order to deal with those forms, we have to expand them into forms which it can natively deal with. That’s right, it’s a macro expander within a macro! (Kind of!)

Pattern expansion

So we move on to the (extensible-match expand) library, which contains the expand-pattern form which %match-lambda delegates to. The job of this library is to turn unexpanded patterns, containing all the weird and wonderful pattern syntax that users defined themselves, into ‘core patterns’ that the rest of the pattern match compiler can deal with directly. We’ll deal with the structure and rationale of core patterns in the next installment; for now, if you know the structure of SRFI 262 patterns, core patterns should look pretty familiar, just with an extra core: at the beginning of their names. For disambiguation, variables are now spelled (core:var name-of-the-var) and _ is spelled (core:wildcard), but they’re generally pretty similar in structure.

The interpretation of non-core patterns is actually done by yet another library, (extensible-match pattern-syntax), but this was a fairly late change which I made in order to be able to write an implementation-specific version of only that functionality for Guile, which doesn’t yet support identifier properties. Potentially in future, implementation-specific versions could appear for other Schemes which provide enough access to their internals to allow correct (or nearly correct) pattern syntax semantics to be implemented in some way other than actually using SRFI 213. Anyway, the two libraries still work together pretty much as one.

The forms in these two libraries are, surprisingly, themselves macros and not procedures, and they’re used in a bit of an unusual way. The problem is applying macro hygiene to the pattern syntax expansions. Put (very) roughly, macro hygiene works by the expander creating a new, unique token every time one expansion of one macro use takes place; when the user-written code which writes the expansion has returned, the expander goes in and changes the names of all the identifiers in the returned code by adding that magic token.
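The classic swap! macro (textbook code, nothing to do with extensible-match) shows the effect: the tmp the macro inserts carries its own magic token, so it cannot collide with a tmp the user wrote.

(define-syntax swap!
  (syntax-rules ()
    ((_ a b)
     (let ((tmp a))  ; this tmp gets marked by the expander
       (set! a b)
       (set! b tmp)))))

(let ((tmp 1) (x 2))
  (swap! tmp x)      ; the user's tmp and the macro's tmp stay distinct
  (list tmp x))      ; => (2 1)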

Well, extensible-match itself is a macro, and doesn’t have a way to create one of those magic tokens to apply to identifiers. We have to let the macro expander of the Scheme implementation itself do it for us – but there’s no standard Scheme API for that. What do?

The answer is macros in continuation-passing style. Hilsdale and Friedman’s paper, linked, is a little obscure and doesn’t really make a compelling case for using this in practice apart from hinting at the possibility of some really dirty tricks (Petrofsky extraction, Kiselyov defilement, etc.). It’s a well-known trick, though, for implementing Scheme macros which can expand to a language other than Scheme (embedded within Scheme). Patterns are an example of such a language; Alex Shinn’s loop macro clauses are another (Taylor R. Campbell’s version of this looping macro is more popular, but I’ve linked Shinn’s original implementation as it’s shorter and shows the continuation-passing trick more clearly).

The idea is that a continuation-passing macro matches code of the form

(the-keyword input-of-the-macro ... next-keyword . next-arguments)

and always transforms it into the form

(next-keyword output-of-the-macro . next-arguments)

So the macro receives the name of another macro which will further process its own output. Because the whole output after this transformation goes back into the expander, which thinks a complete expansion is finished, it will apply a new magic token to everything in the output-of-the-macro. If output-of-the-macro is itself a macro use in the sublanguage being expanded into, next-keyword can again recursively transform it into a continuation-passing style macro use at the Scheme level, referring back to itself as the continuation.
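As a toy example (the names here are mine), here is a complete continuation-passing macro in plain syntax-rules which reverses a list of forms at expansion time, using quote as the final continuation so we can see the output:

(define-syntax cps-reverse
  (syntax-rules ()
    ;; Nothing left to process: hand the output to the continuation.
    ((_ () (acc ...) next-keyword . next-arguments)
     (next-keyword (acc ...) . next-arguments))
    ;; Otherwise: move one form onto the accumulator and recur.
    ((_ (x rest ...) (acc ...) next-keyword . next-arguments)
     (cps-reverse (rest ...) (x acc ...) next-keyword . next-arguments))))

(cps-reverse (1 2 3) () quote)  ; expands to (quote (3 2 1)), i.e. (3 2 1)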

You can do this entirely in syntax-rules! Shinn’s loop does exactly this – macro-extensible macros don’t have to involve the most advanced expander features available. But offering direct access to an extension API based on continuation-passing macros has some disadvantages:

  1. It exposes implementation details of the sublanguage to writers of extensions to the sublanguage, failing to maintain an appropriately opaque abstraction barrier between usage and implementation. One can imagine that someone might come along and decide to actually destructure the next-arguments, thinking that they can understand what the sublanguage implementation is packing in there and re-use that information for a nifty feature. In so doing they create a Hyrumesque nightmare for the developer of the sublanguage, who will break their nifty feature the next time they decide that this internal continuation structure needs changing.

    The problem is only exacerbated by continuation-passing macro extension APIs which provide more than one continuation, or more information, to their writers. Olin Shivers’s loop macro and Taylor R. Campbell’s implementation of foof-loop both do this: the result may be powerful, but from the point of view of the designer of the sublanguage, it is inflexible, hemming in future extensions of the power of the sublanguage itself.

  2. It fails to establish a distinct namespace for extension keywords to safely live in, outside of which uses of those keywords are either errors (as in (chibi loop)) or have distinct but related semantics (as in the pattern matcher). It also fails to prevent unrelated keywords from being accidentally used as keywords within the sublanguage, which can make forms expand into unrelated code with unexpected behaviour.

  3. It makes error reporting for incorrect uses of extensions to the sublanguage confusing, since upon failure to match any syntax-rules clause, the expander has no idea which parts of the form are implementation details of the sublanguage and which are parts written in the sublanguage and directly entered by the programmer.

  4. Macro writers have to be careful to always return to the continuation they were given. (On the other hand, providing quote or quote-syntax as the continuation keyword makes it easy to debug more complex expansions by single-stepping; whereas most Scheme implementations unfortunately do not provide macroexpand-1 for forms in the Scheme language itself.)

The extensible pattern matcher side-steps all of this by capturing the transformer procedures of its extension macros directly (in identifier properties) and calling them only on the user input, then embedding the output of the transformer procedure into a continuation-passing macro. The only information the transformer for pattern syntax has access to is the syntax object representing the pattern use itself; if there’s an error, it’ll be expressed in terms of this only, and not in terms of some low-level form the pattern syntax user isn’t interested in. All the pattern syntax transformer has to do is return its expansion, and what the pattern expander does with it after that is none of its business: a clean abstraction barrier, just like real Scheme macros.

You might ask why, having thus eliminated the usual reasons for writing continuation-passing macros, the pattern syntax expander isn’t just a simple procedural recursion like this:

(define (expand-pattern-syntax stx)
  (syntax-case stx (core-pattern)
    ;; Base case, a fully expanded pattern identified by some special head:
    ((core-pattern expansion) #'expansion)
    ;; Recursive case of a pattern with a pattern syntax keyword:
    ((keyword . _)
     (expand-pattern-syntax ((look-up-pattern-transformer #'keyword) stx)))))

Indeed, Sam Tobin-Hochstadt’s paper on extensible pattern matching in Racket presents it as if it did work this way.

As mentioned above, the reason is hygiene: we want to go back into the Scheme expander after each step and apply a unique magic token to any inserted identifiers for each individual use of pattern syntax, so that the names don’t collide with one another. This is, admittedly, somewhat pedantic: there’s no real point to introducing identifiers with the current structure of patterns, but with future extensions there might be, and it’s possible that some unusual pattern macro implementation might have a reason to do it anyway – so it’s worth taking the effort to do the right thing.

All of (extensible-match expand) is thus concerned with implementing a recursive expansion of pattern syntax while maintaining this hygienic property, which it has to do in continuation-passing style. For example, expanding a core:and or core:or means each of its two arms has to be expanded recursively, then those expanded forms placed back into a new core:and or core:or which is then deemed fully expanded. If the core:and or core:or is actually a subpattern of some larger pattern – even another core:and or core:or (the core versions of these forms can only be two-armed, as a simplification) – it then has to pass this fully-expanded form back to the ongoing expansion of that larger pattern, so that it in turn can continue and reconstruct its input.
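To make that concrete, here is a runnable toy in the same style – every name in it is invented for this sketch, and it is far simpler than the real (extensible-match expand). It rewrites the surface pattern (and l r) into (core:and l* r*), treating anything else as already core:

(define-syntax expand-pattern
  (syntax-rules (and)
    ;; Expand the left arm first; expand-and-left picks up from there.
    ((_ (and left right) k . k-args)
     (expand-pattern left expand-and-left (right k . k-args)))
    ;; Base case: anything else passes through unchanged.
    ((_ pat k . k-args)
     (k pat . k-args))))

(define-syntax expand-and-left
  (syntax-rules ()
    ;; The left arm is done; now expand the right arm.
    ((_ left-core (right k . k-args))
     (expand-pattern right expand-and-right (left-core k . k-args)))))

(define-syntax expand-and-right
  (syntax-rules ()
    ;; Both arms done: rebuild the node, resume the outer continuation.
    ((_ right-core (left-core k . k-args))
     (k (core:and left-core right-core) . k-args))))

(expand-pattern (and x (and y z)) quote)
;; => (core:and x (core:and y z))

Note how the Scheme expander gets control back after every single step – which is exactly what gives each step its own hygiene token.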

Tobin-Hochstadt’s presentation in his paper is actually something of a pedagogical lie, although it’s not much of one, since Racket does have a way to apply its expander’s magic tokens without going all the way back into the expander, and Racket’s pattern matcher uses that to avoid this continuation-passing macro malarkey. This is therefore all a bit embarrassing for a Schemer who can look across the aisle and see that Racket has always had better ways of doing this; hopefully, one day, Scheme will too, and all this continuation-passing nonsense will be gone.

As one last point, we don’t expand only one pattern at a time – in match-lambda we have a number of clauses, each of which has a number of patterns (one for each of the values handled by that clause). So we have not only an expand-pattern in continuation-passing style, but an expand-patterns too, which expands all the patterns in a list of patterns and passes them all on as a list of core patterns; then an expand-patternses, which expands all the patterns in a list of lists of patterns, passing them on as a list of lists of core patterns, which is what %core-match-lambda actually uses directly. These, of course, work by invoking expand-pattern with a continuation which keeps track of the already-expanded patterns and the ones yet to be expanded. Phew!
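Continuing the toy sketch from above (again, every name invented for illustration), a list-of-patterns expander in this style comes out something like:

(define-syntax expand-patterns
  (syntax-rules ()
    ((_ (pat ...) k . k-args)
     (expand-patterns-loop (pat ...) () k . k-args))))

(define-syntax expand-patterns-loop
  (syntax-rules ()
    ;; Nothing left to expand: pass the accumulated core patterns on.
    ((_ () (done ...) k . k-args)
     (k (done ...) . k-args))
    ;; Expand the next pattern; expand-patterns-step accumulates it.
    ((_ (pat rest ...) (done ...) k . k-args)
     (expand-pattern pat expand-patterns-step
                     ((rest ...) (done ...) k . k-args)))))

(define-syntax expand-patterns-step
  (syntax-rules ()
    ((_ pat-core ((rest ...) (done ...) k . k-args))
     (expand-patterns-loop (rest ...) (done ... pat-core) k . k-args))))

(expand-patterns (x (and y z)) quote)
;; => (x (core:and y z))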

All of that gets bundled up and, as %match-lambda requested, the final continuation of the expansion process is %core-match-lambda, the form which co-ordinates the whole pipeline of the actual compilation of patterns into Scheme code.2 We’ll start talking about the stages %core-match-lambda puts the patterns through next time – from here on in, it’s all procedures, and no more expanding match macros into yet more macros!

One last thing: that ‘almost’

There’s one wrinkle in using match-lambda for everything: the forms which do recursive binding – match-letrec, match-letrec*, and match-define – practically speaking need access to something that match-lambda doesn’t give them: they need to know the names of all the variables bound by the pattern match. In order to know that, they have to expand the patterns and extract the list of variables before match-lambda does.
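To see why, here is a hypothetical sketch (illustrative names only, assuming a match form like SRFI 262’s): if pat binds exactly x and y, then match-define has to expand into definitions of x and y at its own definition level, something of the shape

(define-values (x y)
  (match expr
    (pat (values x y))))

and that variable list (x y) simply cannot be written down without first expanding pat to find out which variables it binds.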

Racket actually expands the patterns twice for this functionality, but I think this is unnecessarily nondeterministic. There is no guarantee that any syntax transformation has to be a pure function; while every expansion of a macro should have the same effect, once you throw in generated identifiers, the possibility of seeded hash functions used in a macro implementation, and similar such things, there’s no guarantee that every expansion will be detectably identical. So my view is that it’s better to expand the patterns only once. Match-letrec and match-define therefore invoke expand-patterns or expand-pattern themselves, skip around the publicly-exposed match-lambda, and go directly to %core-match-lambda. Match-letrec* is implemented in terms of match-define, letting the Scheme implementation take over the implementation of letrec* semantics.

Next time

I’m not sure exactly how much it will make sense to cover in each individual entry in this series, and thus also not sure how many entries there’ll be. But next time we’ll certainly look at the structure of core patterns and the abstract syntax tree, and maybe start to look at the real meat in the decision tree generator.


Greetings, reader from the future! You can read the next part here.


  1. Most compiler courses: ‘Here is 90% on parsing and a tiny bit on register allocation; good luck and have fun!’

    Kent Dybvig: calls the pass of his compiler which comes right after parsing and macro expansion ‘Compiler Pass Zero’, because parsing and expansion aren’t really compiling.↩︎

  2. That’s right – just like Chez Scheme, the architecture of extensible-match is such that it doesn’t really consider expansion per se to be compilation.↩︎

Friday, November 7, 2025

Scheme Requests for Implementation

SRFI 261: Portable SRFI Library Reference

SRFI 261 is now in final status.

This SRFI proposal addresses systemic compatibility issues exposed by the SRFI 97-defined library reference format (srfi :<SRFI number> <identifier> ...) and advocates two more modern, portable, and readable alternatives: (srfi srfi-<SRFI number>) and (srfi <identifier>-<SRFI number>).

by WANG Zheng at Friday, November 7, 2025

Wednesday, November 5, 2025

Retropikzel's blog