Ever since I worked on the Android project nearly twenty years ago, I've wanted to write mobile apps in
Scheme. But the early Android API made it hard to run a good Scheme
implementation, and the Android team strongly discouraged the use of
languages other than Java. Those limitations are long gone. Still,
whenever I got the energy to dive into Android Studio, the NDK, and the latest APIs, build problems and other random complexity
would frustrate me, and I would give up and switch to another project.
But now we have LLMs. They are exactly what I needed to get past the
slog and into the interesting parts of the project. I'm so glad I tried again.
My new project, Schmeep (Github), is an Android app built around Chibi Scheme, Alex Shinn's R7RS-Small-compatible Scheme implementation. It uses RAX, a simple framework
inspired by HTMX, to make HTML-based user interfaces in Scheme. No web server is
required. Instead, events are delivered to your Scheme code, which
returns new HTML fragments to update the UI.
Schmeep includes a simple Linux program that opens a REPL on the phone over Bluetooth. The whole project builds quickly using a
simple Makefile. This means that you rarely wait more than a few seconds to see the
effects of a change.
Schmeep has just come to life, but I'm eager to build a real app with
it. My plan is to experiment with writing local-first software in Scheme. I'll wrote more here when I do.
Am I the only one who often gets obsessed with stuff they discovered in really weird ways? I watch a film and some character listens to some music on their car stereo; I look it up and have a new favourite band for the next few months. That kind of thing.
Along those lines, quite a while ago I made an unfunny remark on Mastodon about the tendency of some widely-respected computer scientists to use a certain font in their presentations; just recently, prompted by referring to that post for the nth time – so much for jokes getting funnier by repetition – I decided to look into why they do this, even though that particular font is widely disliked. The answer turned out to be pretty much that they don’t care, and they wish people would focus on the content instead of the font. Fair enough.
Except someone also mentioned that that particular font might be good to use because it’s allegedly easier to read for dyslexic people.
Ah. Hmm. Hrrmph. Well, maybe I need to do something about this in future as well, then!
Cue a whole morning spent overthinking this!
The first thing to do was to find out whether that claim was really true of that font in particular. Answer: unsurprisingly, initial research pointed to ‘no’; some people say it helps, but just as for non-dyslexic people, opinions vary and you can’t generalize over ‘dyslexic people’ as a class here. For one thing, there are very different types of dyslexia. The chap who wrote that Reddit post says it only reminded him of the patronizing treatment he got as a dyslexic kid at school; again, that’s fair enough.1
There are a couple of fonts – Dyslexie and OpenDyslexic – which claim to be designed to be helpful for dyslexic people. The idea of both seems to be the same: somehow making the stroke weight heavier at the bottom of the font maybe gives the letters more ‘gravity’, as if that would help them stay in place better?
Type designer David Jones happened upon my posts fretting about this on Mastodon and was able to point me to a handy introduction to the topic of legibility for dyslexic folk by big-shot typographer Chuck Bigelow. Bigelow looks at the actual scientific research: on this kind of bottom-heavy font in particular, it’s a wash.
Sidebar! Personal hypothesis about why these fonts might feel like they work for some dyslexic people
Readers recognize words by their whole shape, not a letter at a time. The more common the shape, the easier it is to recognize the word. This applies for dyslexic people just as much as non-dyslexic people. For (some) dyslexic people, it’s just harder to pick apart the differences between words that have similar shapes.
This is why place names on road signs are in mixed case, not upper case. In the UK, for example, they used to be in upper case, with the idea that because upper case is bigger it would be easier to read from further away. But people are more used to seeing place names written in running text in mixed case, so the shape is easier to recognize in mixed case. So in 1963 it was fixed to require ordinary mixed case on road signs.
My guess as to why the dyslexic fonts might work for some people is just that they have such unusual letterforms compared to normal fonts that they make the shapes of words unfamiliar enough as to force word recognition to fall back to a lower level. In the same way that non-dyslexic readers have to fall back on letter-at-a-time (or morpheme-at-a-time) recognition to deal with unfamiliar words like quixolotl (a quixotic axolotl) or all-caps words like ASTROLABE, these fonts make everyone fall back to engaging more with the shapes of individual letters, reducing confusions between similar-looking words.
The real problem with this is that it also implies that as dyslexic readers get more used to OpenDyslexia/Dyslexie, the more they’ll get used to identifying words by shape in them, and the more they’ll go back to struggling to tell similarly-shaped words apart.
This is just an educated guess. It would be interesting to come up with an experiment to see if it could be tested.
To me, promoting these fonts now feels like a kind of inclusionist techno-solutionism all too common on the hackerly left, like the able-bodied Mastodon HOA insisting on alt texts that actually make life worse for those who depend on assistive technologies. With this one weird trick, you too can help people overcome their disability! Unfortunately, inclusion and accessibility are hard (though that certainly doesn’t mean they’re not worthwhile!) and there isn’t one weird trick to include everyone without accidentally disadvantaging some. If there is something you can do to try to include more people, it needs careful thought and consideration about the best way to make that affordance available, and about how to avoid accidentally hindering more people than you help – or worse, hindering the people you’re actually trying to help. In this case, for individual dyslexic people who feel like these fonts actually do help them read, and know where to find the setting to make websites display in those fonts, sure;2 but for a general audience, maybe not – we need a different solution here.
Bigelow does point to some clear results saying there’s a change you can make that makes readability measurably better for dyslexic people: make the font bigger!
That study has some limitations: the sample size was small and included only children, but it seems it was prompted by a previous study which included adults and gave initial positive results for increased font size across age ranges, prompting further investigation by the authors, so the indications are good that this isn’t a fluke or limited to children. The other major limitation is that both studies concern passively lit running text at close distance, not bullet points projected on a wall on the opposite side of the room; intuitively that doesn’t feel like a relevant factor. Also, increasing the font size only works to a certain point, at which point the benefits stop increasing (but, crucially, things don’t actually get worse), which also makes intuitive sense: as a non-dyslexic person, very fine print is harder to read than normal-size print, but after a certain point it makes no difference – 6 pt type is pretty hard going, but I can read 10 pt type just as well as 72 pt type. The result of the study is specifically that this threshold seems to be about a third higher for dyslexic children.
Of course, bigger text on a slide means you can fit less on, but I’m of the view that less text per slide is better anyway. I’m still not very good at practicing what I preach on that topic, so encouragement from a larger font size probably ends up being good for me as presenter as well! And, of course, a bigger font size is probably going to be better for audience members with vision impairments too. Wins all around.
So I boosted the font size of the slides I was working on by about 50% where I could; slightly less in some cases. The change also prompted me to redesign or split up some otherwise overcrowded slides.
Bigelow points to two other positive results. One is for a shorter line length: with a fixed width projector screen, a bigger font means shorter line length anyway, so that’s an easy win as well. But also that seems more relevant for running text than for short points on a slide: if you have text that spans more than one line on a slide, that’s already suspect; three or more lines is barely even a slide any more – it’s stopped being a visual aid for speaking and turned into a copy and paste of the paper for reading.
The other vaguely positive result is for slightly increased letter spacing (which David Jones suggested might be the reason the font discussed at the start of this post works for some dyslexic people). The results here are less clear than for font size or line length, but choosing a familiar font with slightly looser spacing would also be a change that would be unlikely to make anything worse for some people. I can kind of see the problem here: at the moment I use Helvetica, and the letters ‘tt’ in the word ‘pattern’ sometimes join together at the crossbars, making them look like a single letter.
I looked at the fonts Andika and Atkinson Hyperlegible as potential alternatives that are specifically designed to avoid situations that make words difficult to read or tell apart, but I haven’t yet made up my mind about switching. Atkinson in particular seems to be aimed at helping people with vision impairment more than dyslexia – not that that would be a bad thing, but it also can’t be denied that some of its letterforms deviate from the principle of familiarity of form which seems significant in helping dyslexic people. From my anecdotal experience with my own short-sightedness (only −1.25 dioptres in both eyes, admittedly), it also feels like I probably did much, much more to help by increasing the font size than I could do by switching to a different font; from my experience, probably like an order of magnitude more. For my personal stuff in general, I’ve mostly standardized on Adrian Frutiger’s Univers (in the case of this website, through the noncommercial-licensed GhostScript digitization); perhaps I should look at using that on my slides as well. It feels like it has a little more character than Helvetica, perhaps simply through being less overexposed, and it does also have slightly looser tracking.
Anyway, make the font bigger – for real! I think there are only wins to be had in that direction. You’ll make better, simpler slides; your vision-impaired and dyslexic audience members may thank you.
I sympathize with unhappy school memories associated with that font – to me, it only reminds me of the handouts my school German teacher gave us. Funnily (considering where I now live), German was my worst subject in school, and the teacher was an authoritarian ass who liked to pick on weaker students, so those aren’t good memories for me either.↩︎
Incidentally, designers, have you tried out your website or app with a custom font setting recently? Have you checked that no button text is unnecessarily truncated; that no important text overflows onto a second, partially invisible line? Another benefit: if your thing is only (as yet) available in one language, making sure that everything works with a wider or narrower font is an important step to be ready for the day when a translator comes along and wants to change the whole text, which will have just as big an effect on the relative widths of all the links and buttons.↩︎
Alternative external formats for arrays and complex numbers
Gauche now recognizes a few different external format for arrays and complex numbers. For writing, it can be selected by <write-controls> object, or print-mode in REPL.
Array literals
Gauche has been supporting srfi:25 arrays, but it does not define external representations.
Gauche uses srfi:10#, mechanism to allow to write
arrays that can be read back, but it is not very user-frendly.
The #2a(...) notation is defined in srfi:163 Enhanced Array Literals, and in its
simplest form, it is also compatible to Common Lisp's array literals. From 0.9.16, it is
the default output format of the array.
You can also make Gauche reports the lengthes of each dimension:
There was an asymmetry in input/output of complex literals. For reading, both the rectangular notation
1.4142135623730951+1.4142135623730951i and the polar notation 2@0.7853981633974483 are recognized, but
for printing, it is alyways in the rectangular notation.
Now you can choose the output format.
Gauche also extended the polar notation by adding suffix pi, e.g. 2@0.25pi to specify
the phase by the multiple of pi.
The following session shows how a complex number is printed with different print-mode:
Furthermore, Gauche also supports Common-Lisp style
complex notation, #c(...). This is particulary
useful to exchange data between Gauche and CL programs.
You can change all instances of that particular list. Super dangerous.♥︎
Here, I’ll show you:
(let* ((abc '(a b c))
(abc2 abc)
(fish '(f i s h))
(fish2 fish))
(push! 'nope abc)
(mutate-cons! 'yeah fish)
(list abc abc2 fish fish2))
Returns this:
((nope a b c) (a b c) (yeah f i s h) (yeah f i s h))
That’s not a slag on push! which is often what you want. It’s a
macro modifying what one particular name points to. Whereas
mutate-cons! is a procedure that changes the actual underlying
object.
Also mutate-cons! doesn’t work on the empty list. You can only push
onto lists with at least one element. They need to be actual cons pairs.
I feel a change is happening in how people produce and (want to) consume software, and I want to give my two cents on the matter.
It has become more mainstream to see people critical of "Big Tech". Enshittification has become a familiar term even outside the geek community. Obnoxious "AI" features that nobody asked for get crammed into products. Software that spies on its users is awfully common. Software updates have started crippling existing features, or have deliberately stopped being available, so more new devices can be sold. Finally, it is increasingly common to get obnoxious ads shoved in your face, even in software you have already paid for.
In short, it has become hard to really trust software. It often does not act in the user's best interest. At the same time, we are entrusting software with more and more of our lives.
Thankfully, new projects are springing up which are using a different governance model. Instead of a for-profit commercial business, there is a non-profit backing them. Some examples of more or less popular projects:
Some of these are older projects, but there seems to be something in the air that is causing more projects to move to non-profit governance, and for people to choose these.
As I was preparing this article, I saw an announcement that ghostty now has a non-profit organisation behind it. At the same time, I see more reports from developers leaving GitHub for Codeberg, and in the mainstream more and more people are switching to Signal.
From a user perspective, free software and open source software (FOSS) has advantages over proprietary software. For instance, you can study the code to see what it does. This alone can deter manufacturers from putting in user-hostile features. You can also remove or change what you dislike or add features you would like to see. If you are unable to code, you can usually find someone else to do it for you.
Unfortunately, this is not enough. Simply having the ability to see and change the code does not help when the program is a web service. Network effects will ensure that the "main instance" is the only viable place to use this; you have all your data there, and all your friends are there. And hosting the software yourself is hard for non-technical people. Even highly technical people often find it too much of a hassle.
Also, code can be very complex! Often, only the team behind it can realistically further develop it. This means you can run it yourself, but still are dependent on the manufacturer for the direction of the product. This is how you get, for example, AI features in GitLab and ads in Ubuntu Linux. One can technically remove or disable those features, but it is hard to keep such a modified version (a fork) up with the manufacturer's more desirable changes.
The reason is that the companies creating these products are still motivated by profit and increasing shareholder value. As long as the product still provides (enough) value, users will put up with misfeatures. The (perceived) cost of switching is too high.
Let us say a non-profit is behind the software. It is available under a 100% FOSS license. Then there are still ways things can go south. I think this happens most commonly if the funding is not in order.
For example, Mozilla is often criticised for receiving funding from Google. In return, it uses Google as the default search. To make it less dependent on Google, Mozilla acquired Pocket and integrated it into the browser. It also added ads on the home screen. Both of these actions have also been criticized. I do not want to pick on Mozilla (I use Firefox every day). It has clearly been struggling to make ends meet in a way that is consistent with its goals and values.
I think the biggest risk factor is (ironically) if the non-profit does not have a sustainable business model and has to rely on funding from other groups. This can compromise the vision, like in Mozilla's case. For web software, the obvious business model is a SaaS platform that offers the software. This allows the non-profit to make money from the convenience of not having to administer it yourself.
There is another, probably even better, way to ensure the non-profit will make good decisions. If the organization is democratically led and open for anyone to become a member like Codeberg e.v. is, it can be steered by the very users it serves. This means there is no top-down leadership that may make questionable decisions. Many thanks to Technomancy for pointing this out.
Ah, good old volunteer driven FOSS. Personally, I prefer using such software in general. There is no profit motive in sight and the developers are just scratching their own itch. Nobody is focused on growth and attracting more customers. Instead, the software does only what it has to do with a minimum of fuss.
I love that aspect, but it is also a problem. Developers often do not care about ease of use for beginners. Software like this is often a power tool for power users, with lots of sharp edges. Perfect for developers, not so much for the general public.
More importantly, volunteer driven FOSS has other limits. Developer burn-out happens more than we would like to admit, and for-profit companies tend to strip-mine the commons.
There are some solutions available for volunteer-driven projects. For example Clojurists together, thanks.dev, the Apache Foundation, the Software Freedom Conservancy and NLNet all financially support volunteer-driven projects. But it is not easy to apply to these, and volunteer-driven projects are often simply not organized in a way to receive money.
With a non-profit organisation employing the maintainers of a project, there is more guarantee of continuity. It also can ensure that the "boring" but important work gets done. Good interface design, documentation, customer support. All that good stuff. If there are paying users, I expect that you get some of the benefits of corporate-driven software and less of the drawbacks.
That is why I believe these types of projects will be the go-to source for sustainable, trustworthy software for end-users. I think it is important to increase awareness about such projects. They offer alternatives to Big Tech software that are palatable to non-technical users.
I thought the do in in Scheme was a li’l hard to learn since the
examples I could find was a li’l too fancy and clever. Just like a lot
of my own documentation often is; sorry about that.
(do ((a 0 (add1 a))
(e 13)
(i 7 (sub1 i)))
((zero? i) (print "ended ") (print a))
(print "game cube")
(print e)
(print a " wii"))
The first argument is a list of bindings like a let. But if you put in
three things they will be rebound every round. Like in the above
example, a will be rebound to (add1 a), and i will be rebound to
(sub1 i). And you can also put in just one thing like the example
here just binds e to 13 and then it just stays that.
The second argument is a list that starts with the ending condition
and the rest of the list are statements that will be evaled after.
And then all the remaining arguments are evaled every round.
Yes, this is backwards that the ending stuff comes before the main progn body.
Here’s a more basic example to just print happy birthday four times:
(do ((i 4 (sub1 i)))
((zero? i))
(print "happy birthday"))
Although for simple things like that, Chicken has dotimes:
(dotimes (i 4) (print "happy birthday"))
Both do and dotimes are pretty side-effects oriented but in both cases you can put in a return value:
(do ((i 4 (sub1 i)))
((zero? i) 'my-return-value)
(print "happy birthday"))
(dotimes (i 4 'my-return-value) (print "happy birthday"))
And that return value can access your do scoped binds.
The same miscmacros that gives us dotimes also has the even simpler (repeat 4 (print "happy birtday")).
I refer to my own fold write-up all the time (the
folds as named lets version hasn’t been as useful) and maybe with
this, I can use do and dotimes more instead of making let loops
and consing up unnecessary iotas.
Last time I attempted Advent of Code, I got tangled up modifying the
step one solutions to handle step two and then I ended up wanting
to revisit step one but they were gone. So this year I’m going to
try to paste a second copy before modifying and I hope that works out
better.
(define (dial acc r d)
(with (+ d r)
(when
(or
(zero? it)
(= 100 it)
(< it 0 d)
(< d 100 it))
(acc))
(modulo it 100)))
(define (dial acc (? (fn (< x -100)) r) d)
(acc)
(dial acc (+ r 100) d))
(define (dial acc (? (fn (< 100 x)) r) d)
(acc)
(dial acc (- r 100) d))
(define (dial acc (? string? x) y)
(dial acc (string->number (strse x "L" "-" "R" "")) y))
(define (turn)
(define-parameters zero 0)
(fold (c dial (fn (zero (add1 (zero))))) 50 (read-lines))
(zero))
This was one of the hardest bugs I’ve ever had to debug.
I’ve had to write log parsers, differs between different logs from
different versions, multiple implementations to check against each
other, Emacs highlight-regexp and count-matches and so on. It took
me ten tries on the Advent of Code website. I get paranoid that I
had mistyped my answer in there.
The line that now says (< it 0 d), originally I had it
as (<= it 0 d) but it gave false positives on rotating from zero.
For a while I had it as (and (< it 0) (<= 0 d)) which… doesn’t
fix that problem at all. Even after coming up with the fix (< it 0
d), that gives false negatives on rotating exactly one rotation
left from zero. But there’s no L100 in the data set? No, but my
code before I cleaned it up had:
(dial acc (+ r 100)
(dial acc -100 d))
where it now says:
(acc)
(dial acc (+ r 100) d)
…leading to lots and lots of zero to zero turns which came with a
false positive in some versions and false positives in others.
A complete PEBCAK on my part but the hunt for the bug became a real
adventure of trying to sift through clues in logs that were
thousands of lines long.
After adapting that same idea to part two with a few minor tweaks,
it’s too slow! Works fine with the example data but not the full
input. I hate it when I have a working solution that
I really like beacuse it does something clever but have to write a
whole new one that’s faster. This is a “Project Euler” type
problem where I need to come up with a math solution instead of
just list procressing. But then I don’t really hate it because I
did come up with a good solution.
Inverting the puzzle by making an is-in-any-range? predicate and
then we can generate all invalid numbers up to the ceiling and see
if they’re in any range.
(define (is-in-any-range? x)
(or
(<= 1 x 19)
(<= 51 x 69)
(<= 72 x 85)
(<= 86 x 113)
(<= 411 x 466)
(<= 525 x 652)
(<= 660 x 782)
(<= 859 x 1056)
(<= 1626 x 1972)
(<= 2768 x 3285)
(<= 4002 x 4783)
(<= 4919 x 5802)
(<= 7025 x 8936)
(<= 9096 x 10574)
(<= 13004 x 15184)
(<= 32138 x 36484)
(<= 48548 x 61680)
(<= 69302 x 80371)
(<= 82984 x 100358)
(<= 126397 x 148071)
(<= 193276 x 237687)
(<= 266408 x 302255)
(<= 333117 x 414840)
(<= 431250 x 455032)
(<= 528410 x 680303)
(<= 726807 x 764287)
(<= 779543 x 880789)
(<= 907442 x 983179)
(<= 2558912 x 2663749)
(<= 5117615 x 5149981)
(<= 7702278 x 7841488)
(<= 9231222 x 9271517)
(<= 13413537 x 13521859)
(<= 32295166 x 32343823)
(<= 49829276 x 50002273)
(<= 67606500 x 67729214)
(<= 99990245 x 100008960)
(<= 146086945 x 146212652)
(<= 4747426142 x 4747537765)
(<= 5552410836 x 5552545325)
(<= 5858546565 x 5858614010)
(<= 7454079517 x 7454227234)
(<= 8764571787 x 8764598967)
(<= 9999972289 x 10000034826)))
Okay, great! I checked that there’s no overlapping ranges in this
particular data set. That means we can make an idempotent summer so
we don’t add the same number twice.
Let’s hard code it to five-digit numbers which is okay for this
particular input.
(define (generate-the-answer)
(do ((num 1 (add1 num)))
((< 100000 num) (summer))
(add-all-repeats num num)))
Okay, that’s a relief! Today was a lot easier to debug. I
originally had the summer see the numbers even before they were
repeating. But that bug was easy enough to find and fix.
Dec 3rd
Okay, here we have a similar dilemma of “extracting” vs building up
possible joltages and filtering for them like (strse? "9.*9").
Maybe if I start with extracting, that will still be useful as a
fallback for any stragglers after a filtering solution.
That worked! Weird feeling how Monday took all day because I was
chasing a bug and even Tuesday took more than an hour, maybe closer
to three hours, but this one my idea worked right away and the
solution for part 1 was also the right direction for part 2. I
lucked out! And/or am actually good at programming especially when
it’s straight-forward list-processing like this.
Dec 4th
This time around (it’s my second time attempting Advent of
Code; I tried it in 2023 but quit before the end) I’m
paying more attention to the story and I’m really getting into the
Matt Groening–like shenanigans.
As for the puzzle, this type of 2d, maps-and-neighbors stuff is
something I don’t have as much of a standard library for. SRFI-1
doesn’t really cover it so I’m starting more from scratch here and
I’m buckling in, accepting that it might take a li’l more time and
what write here I’ll get use out of later too. I actually thought
to work a li’l bit ahead and look up some array stuff in the latter
SRFI’s but then I didn’t have time to do that in November.
(define (count-accessible)
(define nodes (call-list (map call-string (read-lines))))
(define (get-node x y) (void))
(define (get-node x y)
(handle-exceptions exn (void)
((require procedure?
(nodes (require (c < -1) y)))
(require (c < -1) x))))
(define (get-neighbors x y)
(parse (c apply get-node)
(list-ec (: dx -1 2)
(: dy -1 2)
(if (not (= 0 dy dx)))
(list (+ x dx) (+ y dy)))))
(let ((width (length (nodes)))
(height (string-length ((nodes 0)))))
(sum-ec (: x 0 width)
(: y 0 height)
(if (eq? #\@ (get-node x y)))
(if (> 4 (count (is? #\@) (get-neighbors x y))))
1)))
Okay I like it when it works first try because I hate to put in
more than one guess but this was right. Good. Also didn’t have any bugs.
Now onto part 2. I really have to give Advent of Code a stern
scolding when it comes to accessibility: the dark grey text on dark
grey background is really really really hard to read so I use
einkbro’s light mode but that mode didn’t show the highlighted @
signs in the part 2 example. I had to toggle off the mode but then
I almost can’t see anything on the screen. Bad bad elves!
But okay, I figured out from what the text says what to do.
(define (count-accessible)
(define nodes (call-list (map call-string (read-lines))))
(define (get-node x y) (void))
(define (get-node x y)
(handle-exceptions exn (void)
((require procedure?
(nodes (require (c < -1) y)))
(require (c < -1) x))))
(define (get-neighbors x y)
(parse (c apply get-node)
(list-ec (: dx -1 2)
(: dy -1 2)
(if (not (= 0 dy dx)))
(list (+ x dx) (+ y dy)))))
(let* ((width (length (nodes)))
(height (string-length ((nodes 0))))
(get-accessible
(lambda ()
(sum-ec (: x 0 width) (: y 0 height)
(if (memq (get-node x y) '(#\x #\@)))
(if (> 4 (count (fn
(memq x '(#\x #\@)))
(get-neighbors x y))))
(begin
((nodes y) x #\x)
1)))))
(descend ((accessible (get-accessible)))
(do-ec (: x 0 width) (: y 0 height)
(if (eq? #\x (get-node x y)))
((nodes y) x #\.))
(+ accessible (desc (get-accessible))))))
Okay. That worked. No wrong entries today either which always feels
great. I could spot my bugs on the example output. The bug today
was that while I realized right away that I need to count X as
neighbors, I forgot that I needed to count X as self too. So I was
done in a li’l less than an hour (three quarters rather) which is
fine. More than yesterday but that’s OK. I had to implement all
this 2D neighbors stuff. I liked the idea of using parse since it
just elides voids.
Today was a real head-scratcher because it seemed to me part 1 is a
subset of December 2nd and part 2 is even easier than part one.
Then I realized that the difference is that unlike December 2nd,
this time our input data have overlapping ranges (something I
checked for on Dec 2nd but almost forgot to do here). I’m grateful
that the test input also did, or I would’ve wasted a guess on the
real thing. Joining the ranges is just a smop once you know that
it’s there.
I live for this convoluted maps of maps of maps of maps stuff! Very
fun problem.
Uh but if I were to try to explain how my program works… Hmm. From the inside out:
Reads all lines as lines.
Adds extra spaces to the end so all lines are the same length.
Pivots the lists so columns become rows and rows become columns.
Then with Acetone’s parse I split the problems into their own
lists.
I split out the operator (that’s the list last, butlast stuff)
and put it first then read and eval each problem.
Then finally I sum all those answers up.
Didn’t have any bugs today. I did put in two redundant
reverses that still gave me the right answer; I found them and
removed them after getting the star while making this write
up.
Dec 7
(define-parameters splits 0)
(define (tachyon-count (prev current next . beams))
((over
(when (and (eq? x #\S) (eq? y #\.)) (current i #\S))
(when (and (eq? x #\S) (eq? y #\^))
(splits (add1 (splits)))
(next (sub1 i) #\S)
(next (add1 i) #\S)))
(prev) (current))
(tachyon-count
(cons* current next beams)))
(define (tachyon-count (last exit))
(splits))
(define (tachyon-count)
(tachyon-count (map call-string (read-lines))))
Now this is what I’m talking about! This is the longest I’ve spent
on a part 1 so far. Even Dec 1st, which was my longest day, part 1
wasn’t where I got stuck. Here I knew what to do, it was just tricky
to keep track of everything. Now onto part two of this wonderful
puzzle!
After reading part two… what a let down! It’s just the
non-idempotent version. Although smopping that together on a
tired-brain day like today is easier said than done.
I apologize to the makers of Advent of Code for calling their hard
work a let down, it’s just that the non-idempotent “naively
recursive” version is what I almost wrote by accident for part 1. I
checked myself in time before making that version so actually
implementing it did take some time.
Before I thought to clean up the input it was hard to keep track of
everything (I had prev, current, next, blank lines, passing through
etc). And I had something that worked on the example input but was
too slow for the real input. So I started over and that’s the
version you see above. It introduced a bug (I forgot to pass
through beams at first) which required some creative logging to
find with ever-increasing indentation prefixes etc etc until the
new version finally worked on the example input. But it was still
too slow for the real input. And memoization fixed that and here we
are. All in all an extra hour or two.
The hardest problem yet after the “breathers” of 5th and 6th, but I
remember last time (2023) I was completely stumped on some problems
even after spending a day with a paper notebook just thinking and
thinking and so far we haven’t seen that. I remember back then
having to postpone some of the stars like “Okay I’ll get back to
this one later” and doing it in the evening or the next day or
something and this year I’ve just done both of them in the morning
except for the first day that did take all day. (And what a
privilege to be able to work all day on a recreational
puzzle!) Maybe it says more about how incredibly burnt out
I was after the apartment move back then than about the
difficulties of the puzzles.
Also this one felt more like a “knowledge test” than the preious
entries. I knew about the basics of recursion vs iteration,
idempotence vs shadowing, and the life-changing magic of
memoization. I know about those things from books like SICP and
PAIP. It’s less
about me figuring out something clever and more about me having
book learning. That doesn’t feel super fair.
Maybe I should take this opportunity to share some of that book
learning: My part one solution went through every row once. That’s
why it’s fast. It’s an iterative solution. The part two solution
needs to go through every row for every beam splitter above it. It
branches over three trillion times. That is too slow for even my
super duper computer to figure out. But memoization, which in this
case means having a hash-table that stores the results it has seen
before, means that it doesn’t have to re-calculate subtrees it has
seen before. It becomes fast again.
memoize! the brev-separate version does work even through
match-generics (but it needs to be called after all the
definitions) and zshbrev (since the entire file is compiled).
Dec 8th
I usually hope for a hard one but today I overslept or rather I had
a hard time falling asleep and it’s already like two hours past my
normal wake-up time and I have laundry day so I hope it’s an easy
one today.
I haven’t gotten to read the problem yet but I’m opening the wrong
windows, pasting the wrong files etc. This is gonna be a hard day
no matter how easy the problem is just from my own tiredness.
(define (pyth a b) (sqrt (+ (* a a) (* b b))))
(define (distance (x1 y1 z1) (x2 y2 z2))
(pyth (abs (- z1 z2))
(pyth (abs (- x1 x2)) (abs (- y1 y2)))))
Note to self: It was way slower when or both of these was memoized
because they’re usually not called that often on the same inputs so
the lookup costs more than it saves. Also note to self: I could
save a little by removing the outermost sqrt. Orders of squares are
the same as the order of roots.
Aaand it’s a super hard problem. We’re in the back half now folks! The deep end!
Back after breakfast break. This is three problems (distance in 3D
space, keeping track of groups of circuits, and recursive pairwise
comparison) that each on their own would’ve been enough the past
week. I for one did not know how to check distances in 3D space so
I had to figure it out. (I did know how to check them in 2D space.)
(define circuits (call-table))
(define (connect a b)
(unless (memq a (circuits b))
(with (append! (circuits a) (circuits b))
(for-each (fn (circuits x it)) it))))
This one, take-up-to is a goody that I should get around to
putting in brev-separate. I use it all the time for RSS stuff. I’m
sure it’ll come in use for more than one day this challenge:
This divides up the half a million distances into spans.
After getting my two stars I wanted to keep optimizing and I timed
it out that five or six was the fastest quotient. I started out
with a hundred but that’s way slower. Even seven is slower and so
is four. With five, we have 25909 spans (hash-table entries) with
an average list length of just under twenty each. That’s an
indictment of my fancy mutating cdr-setting insert-sorted!. But a
testament to the glory of hash-tables.♥︎
Getting a solution that even works on the example of part one took
several hours (the actually figuring out the three parts of
the problem took a long time and then I also had a bad bug in my
connect routine where it worked until I was connecting existing
networks). That solution was too slow for the real
input—and that was still on the first star! The first problem was
that I was running sort on all the distances and then took the
limit on that sorted list. Running sort post-hoc on half a million
entries was something I thought it would’ve been able to handle but
apparently not.
Then I made an insert-sorted that, functionally (pure shadowing)
inserted the new distances as I went instead of sorting them
post-hoc. That finally gave me an answer for part one’s full data
but it took several minutes to find the answer. So after reading
part two, I made the mutating insert-sorted! and also added the
spans. Initially I had spans by hundreds which gave me the answer
in about twelve seconds or so. Really strange to me that sorting
was the bottleneck but that’s what it was.
Now for part two:
(define (connect a b)
(unless (memq a (circuits b))
(with (append! (circuits a) (circuits b))
(for-each (fn (circuits x it)) it)))
(length (circuits a)))
(define (connect-and-count)
(define boxes (read-csv))
(define boxl (length boxes))
((over (circuits x (list x))) boxes)
(pair-for-each
(fn (with (car x)
((over
(stash! (list (distance it x) it x)))
(cdr x)))) boxes)
(let/cc break
(for-each
(fn (for-each (fn (when (= (connect (second x) (third x)) boxl)
(break (* (caadr x) (caaddr x))))) (spans x)))
(sort (hash-table-keys (spans))))))
I had it down to “just” twelve seconds to sort all the half a
million distances, but then the connecting them all into one big
network was still too slow. I tried several other algoritms for
connect until I landed on the one I used above and I updated part
one to match also. Since this was largely an optimization puzzle, I
kept working on part one to make it faster making sure I’d still
get the right answer and then applying it to part two. So today I
didn’t submit any wrong answers either. It just took all day, is all.
I had a bunch of versions of connect which added all the boxes to
all the boxes one by one. One of those versions was bugged (before
I even had part one done). I felt like galaxy brain when I came up
with the append! solution since it was so different than anything I
had and such a Gordian shortcut. If some of y’all had that approach
figured out right away I salute you.♥︎ For me it took some time
getting there.
Wow!! I loved this puzzle! I had it ticking along slowly until I
figured out a new way to connect and now the connection part is
instant. I’m really proud of my solution. I ended up with getting
the distance sorting down to under two seconds and the connection
part to be instant. That’s a good optimization down from an
instance sorting that timed out, and then I got it down to twelve
seconds, so that then the connection part was what timed out. And
now the whole thing is done in two secs. I loved this puzzle. It
took all day but it was a day well spent and I learned a lot just
by experimenting, without looking things up (beyond just
reading the docs for SRFI-1, SRFI-69 and the other libraries I was
using. Especially my own. I don’t consider it cheating to read my
docs I’ve written and posted to this website♥.).
Maybe this is backwards and rotty but I’m way more proud of spending a whole
day on the problem like today than when I solve it quickly.
Although as per ushe with me there’s a zone of suck where
spending a couple of hours is what I’m least proud of compared to a
fast solution or sticking to it all day.
Dec 9th
I’m hoping for easy ones from here on out and at first glance it
seems like today delivered on that since I can use a similar
pairwise comparison as yesterday. Unlike yesterday where I couldn’t
figure out any easier way to count up the distances than to
actually pyth them out, here an idea immediately comes to mind
where I can discard candidates based on one or both axes and weed
things out considerably, but I’ll try the more brute force wasteful
approach first, maybe it’ll be enough.
(define (carpet (ax ay) (bx by)) (abs (* (- ax -1 bx) (- ay -1 by))))
(define (all-squares)
(with (read-csv)
(biggest (list-ec (:list a it) (:list b it) (carpet a b)))))
Sloppy! For the first time in a while I entered a wrong input into
the website; I didn’t notice that my code gave the wrong answer on
the example input, I was just happy that it was fast enough on both
the example input and the actual input. It was, so I’m not gonna
have to do anything fancy at least for part one, but, uh, being
right is more important than being fast.♥︎ The bug was that I had
forgotten the -1 above, counting the tile distances exclusive
rather than inclusive.
Okay so unlike yesterday where I thought part one was pretty
difficult on its own, here there’ a huge jump in difficulty by part
two, or rather, what I did for part two is not super relevant for
part two. But that’s okay. That’s why I like to get to part two
quickly so I can know what I’m really supposed to do. (And
the fact that it’s often hard to predict is part of the fun of
Advent of Code.)
It’s just after midnight and I haven’t figured out the second part yet.
Dec 9th, continued
Okay it’s the next morning! Back to yesterday’s problem before even
looking at the new problem. Most of the following was written
yesterday. I write and delete and write and delete, that’s my
workflow.
The red tiles are all inside the area 1837,1574 to 98308,98188 so
initializing a vector of vectors to fit that dies with OOM so
we’re gonna have to get fancy and procedural.
I painted the red and green tiles both green in this image and
superimposed my program’s best solution as a black rectangle on
top.
But it’s not accepted as the right answer. I can’t see a better
answer with my own eyes either. So I’m leaving this as a one star
and I’m just that much closer to giving up on the entire Advent of
Code. I obsess over it, I spend all day on it in this horrible
hyperfocused state. Other activities like playing games with
friends or eating food become stress isntead of joy. And to boot
I’m still not smart enough to actually solve the problems!
I’ve teased it long enough:1 today we’re going to look at the real meat of turning declarative, something-that-looks-like-this patterns into fast, running, imperative, if-this-then-that code. The absolute final generation of working Scheme code will have to wait until our thrilling conclusion in part 4, but by the end of this episode you’ll probably at least have an idea of what that will look like.
Today we’re in the (extensible-match decision-tree) library, which takes patacts (the AST representation of a pattern match clause introduced last time) and produces an optimized decision tree. It starts with a foreboding comment:
;; This is some of the hardest code I have ever had to write.
And, indeed, not since I was a tweenaged baby hacker struggling through an online tutorial on how to implement A* pathfinding for my game2 have I had so many false starts and wrong directions in implementing a known algorithm. But these were problems with working out how to translate the idea into code – the idea itself is pretty intuitive!
Motivating example
To see how we benefit from an optimizing decision tree generator, let’s look at an example not all too dissimilar from the last example we saw in the previous part.3
(define(uniq ls same?) ; remove adjacent equivalent elements from list (match ls ('() '()) ((cons_ '()) ls) ((cons* y x ls*) (if (same? y x) (uniq (cons x ls*) same?) (cons y (uniq (cons x ls*) same?))))))
For this input, a typical Lisp hacker’s first attempt at a pattern matcher implementation usually generates code that looks basically like this:
The idea is simple: compile each clause to a procedure, and tail-call the next procedure’s clause as a ‘failure continuation’ at any point where you notice that the pattern doesn’t match.
This is a completely reasonable implementation for very many, nay, probably even a majority of use cases. It’s how everyone learning to write a pattern matcher starts out, and indeed probably how every individual pattern match compiler starts out (including extensible-match!) The syntax-case pattern matcher implemented by psyntax basically works like this, because the absolute fastest matching time isn’t a priority for something that runs at expand time. Alex Shinn’s pattern matcher implementation works like this – anything more complicated would probably be hell to write in pure syntax-rules, and the sheer portability Shinn managed to achieve by writing only in syntax-rules has let his pattern matcher take over the world, inefficiencies and the occasional fudging on semantics be damned. (By contrast, SRFI 262’s dependence on identifier properties will likely hold it back from any equivalent level of world dominance for some time yet, until more Scheme implementers start to catch on to how nifty they are.)
Some of the inefficiencies in this particular code will be optimized away by the Scheme compiler. I mentioned in the first part of this series that redundant let bindings are easy to get rid of. Unfortunately, more consequential inefficiencies are harder to deal with. Most significantly, even after optimization, this code will call pair? and cdrtwice every time it goes through the loop: once for the second clause, then again in the third clause. While Scheme compilers are usually clever enough to eliminate repeated type checks when they occur within a single nested tree of code, putting each check in its own little procedure messes this compiler optimization up. The exception is Guile, because Andy Wingo developed a very clever compiler pass which can recognize common prefixes across these internal procedure boundaries. Also, very simple uses – like the last example from last episode – can have the clauses inlined into one another and thus end up in a nice nested tree for the compiler to work with; but as a data point, clause-3 above is already complex enough (or complex enough and referenced from enough places) that Chez’s cp0 doesn’t bother.
This repeated calling and branching is theoretically unsatisfactory, but the real philosophical issue in these inefficiencies – the thing that really motivated me to want to do better – is specifically that this generated code is nothing like what a programmer would write in the absence of pattern matching. Nobody would ever write code which had those repeated pair? calls. The real point I want to make by spending many lines of code on trying to do something better is that pattern matching is a good idiom, and programmers should use it. To that end, pattern matching should be a usable tool even in the hottest of hot loops, even on the slowest of microcontrollers: nobody should ever feel like they have to switch back to writing out if expressions by hand for performance’s sake. The fact that pattern matching is a declarative idiom, and declarative programming has something of a reputation for being inefficient, doesn’t exactly help its reputation here.
There are mountains of research into efficient compilation of pattern matching to avoid generating silly code like this. In the usual framing, there are two approaches: the backtracking automaton and the decision tree. The backtracking automaton, as its name implies, will still sometimes run the same tests on the same subject more than once, but tries to do so less than the naïve approach shown above which re-examines everything from scratch in every clause. The decision tree, on the other hand, will never re-examine any of its inputs, but may have much larger (exponentially larger!) code size than the automaton.
In fact these two approaches aren’t so different in practice – once you run the right time-saving optimizations for generating an automaton, or the right space-saving optimizations for generating a decision tree, the results are pretty close to one another. They’ll be nearly identical on very, very many use cases; it’s in the comparatively rare cases where they fall down that they fall down in different ways.
In the Scheme and Scheme-adjacent world, Racket’s pattern matcher by Sam Tobin-Hochstadt uses an optimized backtracking automaton after the technique by Le Fessant and Maranget (2001);4 extensible-match uses a decision tree after the technique by Maranget (2008). Yep, same guy! Luc Maranget is actually far from the only compiler researcher to have written papers on optimizing both techniques; but he is the only one whose papers now seem to be regarded as the somewhat definitive introduction to the state of the art for each of the two techniques.
Generating a decision tree
Maranget’s 2008 paper is heavy on notation, but mostly explains it as it goes along. It’s pretty accessible as European academia goes: if you’re good at reading and understanding other people’s code, you should be able to understand Maranget’s paper. It’s worth a read (or take a look at a shorter explanation by Colin James or David Nolen) – I’ll explain it here as it’s implemented in extensible-match, noting some (but not all!) differences to Maranget’s original presentation.
So, our input is a list of patterns and their associated actions, one for each clause. In fact, our input is always a list of rows of patterns – remember, we’re in %core-match-lambda which matches multiple patterns against multiple values, using the row-pattern AST type to group them together. Each AST pattern (and subpattern, but we’ll get to that) represents something we can do to get closer to picking a clause which matches the subject – meaning that with the row-pattern, we have a choice of multiple potential patterns to choose from. The ?-pattern, for example, represents that we could test its subject against a predicate; the var-pattern represents renaming an internal subject variable to a pattern variable visibly bound in the clause.
Let’s say we pick one pattern from the first row, since that’s the one we have to test first. (How we pick one is a topic we’ll get to later.) We’re going to act on this pattern – let’s say a ?-pattern testing pair? against our input x. What do we need to do?
What we ultimately want to do is generate (if (pair? x) <code-if-a-pair> <code-if-not-a-pair>); this gives us a hint that we also need to work out how to fill in those two slots. Well, what goes in those two slots?
The <code-if-not-a-pair> clause is easier to understand first, because if (pair? x) is false, the whole row can’t match any more: a row is an and combination of patterns, and one of the anded subpatterns just threw up false. So we can throw out that entire row and fill in <code-if-not-a-pair> with the code generated by repeating this process on that smaller list of rows with their actions.
But wait! What if the next row also contains a ?-pattern testing pair? on x – there’s no point testing that again, because it will be false a second time as well! So apart from throwing away the first row, we also walk through the remaining rows and throw away any of them which also depend on the same test before we repeat this process to generate the <code-if-not-a-pair>.
The <code-if-a-pair> is generated by a similar walk over the rows and recursion on the result. We don’t throw away the first row because in this case, that part of it did match. Instead, we remove it from the row and from any subsequent rows which make exactly the same test, preventing generating the same test again in the case where we know it succeeded as well as the case where we know that it failed.
That’s basically the core of Maranget’s algorithm. Maranget calls generating the new rows for <code-if-a-pair> specializing the patterns, and generating the new rows for <code-if-not-a-pair> defaulting the patterns – both with respect to a particular ‘constructor’, but because Scheme is different (more below), extensible-match calls these specializers. There are two base cases of the recursion: one when there are no rows left (no patterns matched; generate the code to signal a matching failure) and one when the top row only consists of irrefutable patterns (ones like _ wildcard patterns and var-patterns, which don’t have failure cases: in this case that row matched; generate the code to activate the corresponding action).
However, there’s one type of pattern in our AST which doesn’t have a failure case, but hides more subpatterns which do: the apply-pattern which calls some procedure on its subject variable and creates more subject variable(s) for the value(s) returned. Those subject variables are then available to be the subject of a subpattern of the apply-pattern.
For his equivalent of an apply-pattern Maranget makes a whole separate set of patterns when specializing, just containing the subpatterns. This probably works better for his compiler of ML patterns than it did when I tried in my compiler of Scheme patterns: for one thing, testing type (our ?-pattern) and deconstructing into values for subpatterns (our apply-pattern) are the same thing in his implementation; since ML is statically typed and pattern matching is the primitive means of deconstruction, the compiler has a lot more knowledge about the potential range of input values and what is and isn’t possible after each deconstruction step than extensible-match can get as a Scheme macro. In particular, Maranget’s patterns after each deconstruction step always form a matrix with columns for each value and rows for each clause. That’s not guaranteed by the structure of this AST (but explains why the term ‘row’ is used in extensible-match).
So instead, when an apply-pattern has been chosen as the specializer, each specialized row simply has the subpattern of that apply-pattern spliced into it. This means that each row-pattern can have a different length, depending on which apply-patterns have been specialized and had their subpatterns spliced in!
As one last point, in order to simplify the job of the procedures which pick a specializer and specialize and default the rows on it, after this splicing takes place (or more accurately before each recursive step), any rows nested within the rows are flattened out. If there were any or patterns, they’re also expanded into multiple copies of the entire patact, each with one of the or clauses, so the decision tree generator proper doesn’t need to look for specializers specially inside of or patterns either.
Representing a decision tree
For reasons that will become more apparent below and in the next episode, a decision tree is actually another kind of intermediate representation: we don’t represent it directly as Scheme code. Here’s its definition, slightly simplified:
Immediately it’s clear how much simpler this is than the AST we got as input: only four concrete node types vs nine pattern types in the AST.5 The action type from the AST also appears in decision trees, as the leaf nodes, where a matching clause has been chosen or the match failed.
?-patterns, when chosen, become dt-test nodes; apply-patterns become dt-apply nodes; quote-patterns become dt-equal nodes; var-patterns become dt-rename nodes. and-patterns, or-patterns, row-patterns, and not-patterns are reflected in the structure of the tree rather than as nodes themselves. wildcard-patterns are no-ops, as far as the pattern match compiler is concerned, and don’t show up in the tree at all.
Step by step
Here’s a worked example. Let’s take the last example from the previous episode:
(define(last ls) ; returns the last element in a list (match ls ((cons x '()) x) ((cons_ ls*) (last ls*))))
Our AST for this pattern match expression looks like this (in an entirely hypothetical notation):
((patact (row (and (? #:subject ls #:predicate pair?) (row (apply #:subject ls #:procedure car #:vars (ls.car) #:subpattern (row (var #:subject ls.car #:name x))) (apply #:subject ls #:procedure cdr #:vars (ls.cdr) #:subpattern (row (quote #:subject ls.cdr #:datum ())))))) (action return x)) (patact (row (and (? #:subject ls #:predicate pair?) (row (apply #:subject ls #:procedure car #:vars (ls.car) #:subpattern (row (wildcard #:subject ls.car))) (apply #:subject ls #:procedure cdr #:vars (ls.cdr) #:subpattern (row (var #:subject ls.cdr #:name ls*)))))) (action do-last ls*)))
We look at the first row, which only has one subpattern and thus only one potential specializer to pick: (? #:subject ls #:predicate pair?). So we generate a dt-test node which checks if ls is a pair.
In the success branch of this dt-test node, we put the result of re-running this algorithm on all of the patacts with this specializer applied. We don’t need the pair? any more in either row, so it will disappear and be replaced by the right-hand side of the and pattern which contains it. This in turn is a row in both cases; we splice it into the main row of each patact. Our specialized patacts now look like this:
As for the failure branch of dt-test: we know that the first patact can’t match any more just because this single test failed. We have to default on the second patact only – and that also required (pair? ls) to be true. So our defaulted patacts in this case are just the empty list, and the recursion on that will generate the leaf node to signal an exception because no clause matched that input.
Back on the success branch, we still have these two patacts, but this time the first row gives us a choice of what to do. We could pick the leftmost one – the car – but actually the car isn’t very interesting because it’s not refutable. The cdr, on the other hand, can fail, so let’s prioritize it. We generate a dt-apply binding the cdr of ls to ls.cdr; it only needs a success branch since we assume this can’t fail. So we specialize again, and the specialized rows, after splicing, look like this:
Still two patterns in the top row, and again, the quote pattern looking at ls.cdr is still considerably more interesting than the apply pattern, so we pick that as the specializer again. We generate a dt-equal node checking if ls.cdr is the empty list. This one does need success and failure branches.
On the success branch, the specialized patterns look like this:
We have completely eliminated the cdr patterns from the top row! Only the irrefutable car pattern is left, and when the first row is irrefutable, the pattern has matched – so we generate the dt-apply node and its dt-rename child and have it point to the action which returns x. Done!
On the failure branch of the check for a null ls.cdr, we’ve thrown away the top row again and we end up with this single patact:
Once again, this is irrefutable, so we again generate the dt-apply node for ls.car, the dt-rename node for the ls* variable, and then finish at the action.
Picking a pattern
Okay so, at each step of the pattern matching process, we pick a specializer from the first row’s patterns, generate a node from it, and specialize/default the remainder of the patterns on it. But how do we pick which pattern to choose from the row?
We have to work out which pattern is going to help us find answers the fastest, probably by comparing against the patterns in different rows somehow. We need a heuristic – a scoring function to guess which pattern is most likely to help find the matching clause in the fewest nodes possible.
Maranget considers the question of which heuristics to use, assigning letters to each of a number of potential scoring functions and measuring the impact of many potential combinations of them. In Maranget’s terms, extensible-match uses heuristic fnr: the specializer picked has to be in the first row;6 the specializer is the one which will specialize the most rows (most needed); if there’s a tie for the most needed specializer, we pick the pattern with the smallest combined size of the specialized and defaulted rows. (If even that goes to a tie, just pick the leftmost tied specializer.) I didn’t benchmark this particularly: n seemed like an intuitively sensible heuristic; based the suggestion in Maranget’s paper, I later added r as an additional tie-breaker.
Some of the heuristics Maranget considers aren’t really applicable to Scheme. His ML version can generate nodes with more outward branches than just success and failure, because the ML compiler can generate a single switch over many types at once; Scheme detects different types by calling individual predicate procedures for each individual type, so that’s not possible, and neither is the heuristic b which picks the smallest switch to build, for example – so extensible-match also can’t use the combination of heuristics Maranget personally recommends, pba.
So why was it so hard/why didn’t you do it slightly differently
As always, the hard thing was not the actual writing of the code, but working out what the right code to write was.
For example, you might ask why I wasn’t tempted to skip the step of representing decision trees as their own, reified type and simply recurse by generating new match expressions nested within the branches of an if or within the body of a let for the specializer we’ve chosen, for example.
This is, indeed, very tempting. If you’re implementing a compiler based on decision trees, you might even find some literature suggesting that you could do exactly this. Surely, you’ll think, just like those Swap nodes in Maranget’s paper aren’t actually part of how it’s really compiled – surely you can avoid generating the other nodes per se too, and just go straight to generating the code which those nodes will eventually be turned into?
So you try it, and it doesn’t work, and you regret it, and you commit it to the repository anyway as a warning to your future self and to others. Actually there were other things that were wrong with that first attempt – trying to insist that patterns form a matrix-like grid turned out to be the wrong thing for this kind of pattern input AST. My idea was that I really wanted to be able to property test the decision tree generator, ensuring that for any generated patterns and terms it always had the same result as the naïve style of generation shown above. I was thinking of decision tree generation as an optional optimization pass, rather than a fundamental transformation pass. This is the wrong mental model to have.
Anyway, the real reason that would have been the wrong approach even if it had worked is that it wouldn’t have allowed any optimization for code size or compilation speed. Wait whaaaaaat??
Optimizing space and time
Yeah, remember how I mentioned about the potentially exponential code size if you choose decision trees, but only in cases where the tricks to avoid it start to fail? It’s time to talk about those tricks.
The most important trick is sharing of nodes. When two different branches in the tree end up doing the same things, they should actually re-converge and become the same branch again. This makes the decision tree actually a decision dag (directed acyclic graph).
To achieve this sharing of nodes between different code paths, extensible-match uses two tricks. The most important one is hash consing, a trick more generally applicable in functional programming. When you have immutable structures which might be created many times over with the same contents, you can maybe save memory by storing all instances of structures of that type in a hash table. When someone asks to create a new instance of the type, first check if there’s already one existing which has the contents they’re interested in. If so, just return that existing one instead of creating a new one. (But if not, be sure to put the newly-created object into the table before returning it, in case someone asks for another one of the same in future.)
Neither R6RS’s record system nor its hash tables were really meant to support doing this, and the (extensible-match decision-tree) library has to do some pretty undignified nonsense in order to make it work. But it does work, and saves both time and space: space obviously, but also time because it stresses the garbage collector less to have fewer objects sitting around.
There’s a second layer of caching which is similar, one I mentioned I was considering using in an earlier post: the recursive procedure which generates tree nodes from a (specialized or defaulted) set of rows and actions is memoized. This has a similar effect to hash consing but, since hash consing already guarantees sharing, it optimizes compilation time and not decision tree size. Memoization avoids repeating an exponential recursion entirely in many cases. I measured how many cache hits the memo table got for some complex patterns, and how much it sped things up, and it was totally worth it. The 14 cases over three list variables which used to take 9.3 seconds to compile on Guile now comes back almost immediately.
Visualization
What’s the most common target language for compilers? x86? The JVM? C? JavaScript? If I were a betting lass, I might be tempted to put money on the answer being DOT, the GraphViz file format. If you want to understand what your compiler is doing, the easiest way to see it is often to actually have it draw your control-flow graph, automaton, or decision dag in two dimensions. Outsourcing the actual drawing task to GraphViz is a nice sweet spot in the trade-off between getting it working and making it nice.7
Initially I tried to see what was going on with a very silly (non-hygienic) code generator outputting Scheme code, but this quickly got tiring to read, especially because it couldn’t show sharing in the dag. (How the real code generator represents shared nodes is a topic for next time.) With a 2D graph, it’s relatively easy to get a rough impression on first glance of how well the decision tree generator handles certain cases; by following the arrows, you can fairly easily check for anything it’s doing wrong.
I’ve kept an archive of older DOT files showing how the pattern match compiler has got more capable over time, going back to some of the first decision trees this code ever generated, before it was even hooked up to a code generator. (This DOT file is dated 2 December 2024, and I didn’t get the code generator working until Candlemas.) This lets me see how certain changes improved or disimproved the generated trees over time.
Originally, to review these DOT files, I had to manually copy the generated DOT source into a file and invoke the dot command line tool. But recently I thought, hey, I have a terminal emulator which can display inline graphics, so why not just show them directly in the REPL? So I wrote a little library to do just that.
Source code of the library which plops a drawing of a graph straight into my terminal
(library (extensible-match dot) (export show-dot) (import (rnrs (6)) (only (guile) mkstemp port-filename waitpid) (srfi :39) (ice-9 popen)) (define-syntax show-dot (syntax-rules () ((_ expr) (let* ((dot-port (mkstemp "/tmp/dot-XXXXXX")) (dot-file (port-filename dot-port)));; It was probably a mistake that decision-tree->dot writes;; to current-output-port, but I make the best of it: (parameterize ((current-output-port dot-port)) expr) (close-port dot-port) (let-values (((from to pids) (pipeline `(("dot""-Tpng" ,dot-file);; I tried out both sixel with img2sixel;; (more widely supported) and the inline;; bitmap functionality of iTerm2 (my;; terminal emulator of choice) with this;; imgcat program. Sixel was very fragile;; when the terminal window got resized (the;; bitmap area would just unrecoverably;; blank out sometimes), so I went with;; imgcat. It’s for my own use only, anyway,;; and not too hard to change if needed: ("imgcat""--iterm2"))))) (close-port to) (let ((stdout (standard-output-port))) (put-bytevector stdout (get-bytevector-all from)) (flush-output-port stdout)) (close-port from) (for-each waitpid pids)))))))
Enough of the talking, what do these decision trees actually look like, then? Here’s the drawing for the last example we worked through above.
The shapes of the nodes are the classic flowchart style: diamonds for decisions, rectangles for steps, and circles for termination.
Note that the do-last path also generates a call to (car ls) even though it doesn’t use the value of the car. In theory, it should be smart enough not to do this, but I decided for now to leave that optimization to the Scheme compiler on the theory that it can probably do a better job of it anyway.
For another simple example, this is a type-safe list merge.
(define list-merge (match-lambda (('() '()) '()) (((?pair? xs) '()) xs) (('() (?pair? ys)) ys) (((cons x xs*) (cons y ys*)) (if (< y x) (cons y (list-merge (cons x xs*) ys*)) (cons x (list-merge xs* (cons y ys*)))))))
You can see the distinction between subject variables (arg1, arg2, arg1_car, arg1_cdr, arg2_car, arg2_cdr) and pattern variables. You can follow through and see how it avoids repeating the pair? check on both arguments for the final clause – there are two different pair? checks on arg2, but they’re in mutually exclusive code paths. There’s a long chain of dt-apply (receive) and dt-rename (let) nodes at the end, because the decision tree generator puts off those steps, which don’t affect whether the patterns match or not, until the end.
For a more complex example, let’s look at Chris Okasaki’s red-black tree balancing pattern (paper, book). This is probably pattern match compiler authors’ favourite example to use, because
it’s nontrivial
it’s real and not contrived
it’s a ‘hot’ function which gets called log2n times for every insertion into a tree of size n, making it important to optimize well
it’s a pretty amazing example of using pattern matching to translate reasoning about data structure invariants directly into running code, in a way that would be difficult without pattern matching
What does it do? Basically a red-black tree stays relatively well balanced by colouring the levels of nodes red or black, and maintaining two properties about where coloured nodes are allowed to appear relative to one another. One of these rules is that no red node is allowed to touch another red node: black touching black is okay; red touching red isn’t. So Okasaki works out that after finding a place to insert a new red node, there are exactly four ways in which a tree can end up ‘unbalanced’ in having two adjacent red nodes; in that case the tree needs to be ‘rotated’ a little to restore balance.
He writes that like this (but in ML):
(define(balance n) (match n ((or (node 'black (node 'red (node 'red a x b) y c) z d) (node 'black (node 'red a x (node 'red b y c)) z d) (node 'black a x (node 'red (node 'red b y c) z d)) (node 'black a x (node 'red b y (node 'red c z d)))) (node 'red (node 'black a x b) y (node 'black c z d))) (_ n)))
a, b, c, and d are values in the tree; x, y, and z are child nodes. The sort order of the values and the values of the child nodes is a < x < b < y < c < z < d. Visually explained, the job of this procedure is to turn one of these wrong trees:
Into this right tree:
Here’s what the decision dag looks like:
You can see that nodes are shared (1, 2, 3, 4) to reduce the overall size of the graph. This may look big and complicated, but at run time it’s efficient because no test or extraction will ever be repeated on any given code path. There are 13 branch points, the same number as in Colin James’s independent OCaml implementation of Maranget’s algorithm, a nice confirmation that my implementation is doing the right thing :-)
Can you imagine trying to write out the code for this pattern match by hand? You don’t have to imagine, because Rich Hickey implemented this data structure in bad old Java as part of Clojure. His version is spread all over the place because old-school Java just lacked the features to make writing this function nice. (I’m not even sure new-school Java could do much better.)
For an example where things start to fall down in Scheme, consider Maranget’s bytecode interpreter for a miniature dialect of ML. In figures 6 and 7 of his 2008 paper – which really ought to be shown as motivating examples on the first page – Maranget showed how a bad and a good decision tree for this can look in OCaml, a statically-typed language.
First of all, here’s roughly what the run function would look like in Scheme:
You can already see that we’re running up against the limits of what it’s sensible to do in Lisp notation here.
What does it look like when compiled? I shared what it looked like in the very first working versions of the decision tree generator above … and today it looks like this. To be sure, there’s improvement there (196 nodes in the December 2024 version, 135 nodes today). But it’s also very far from the nice, elegant dag of just a dozen or so branches which Maranget shows in his paper, even accounting for the fact that his version can switch over many types at once and mine can’t. What gives?
Well, here dynamic typing messes up opportunities to do better. If you trace down the graph from the top, it looks good until you get to the third case. Once the pattern matcher knows that the current instruction (the first element in the list c for code) is an iop but the top of the stack s isn’t a val, it should jump straight to fail because there is no situation in which any of the other patterns can match. But it doesn’t know enough to know that; what if something might match both iopandtest, or pushenv? A lot of these nodes are therefore dead, and will never execute except to go a long, slow way to a case that should never happen when this function is actually running anyway. I’ve been meaning to study in details exactly which branches are impossible, and see if it would be possible to do anything at all better here, like allow users to declare which predicates in ? are mutually exclusive. (Initial results on that particular feature weren’t exactly encouraging either. Update, 30 November: Scratch that! I just tried it again and it works great. 86 nodes, down from 135. I must have forgotten to clear the cache of pre-compiled code before the last time I tested it, or something. The problem now is that a complete implementation of this would require implementing something like Datalog …) It does do some very basic stuff to avoid generating redundant nodes like this, but not nearly as much as an ML compiler can.
As the folklore predicts, Racket’s backtracking automaton does better here for code size but worse for speed, generating many checks that c is a pair and many separate extractions of its car and cdr. It only does these things once for cases 1 and 2, but again for case 3, again for case 4, again for case 5, and only gets back on a roll again with cases 6, 7, and 8 …
Is all of this worth it?
Last of all, here’s the hand-wringing section where I ask whether I should have bothered with any of this!
In 2000 Kevin Scott and Norman Ramsey wrote a draft paper asking ‘When Do Match-Compilation Heuristics Matter?’; their answer is ‘almost never’. Out of 18 program benchmarks, only five were affected in decision tree size by the use of different heuristics, and three of those were made worse by choosing heuristics other than simple left to right selection of patterns. More encouraging is that 13 of the 18 benchmarks were artificial benchmark programs, but one of the five where heuristics did make some difference was a real-world program (the SML/NJ compiler).
Maranget’s results measuring heuristics may be more representative because he created a corpus of match expressions to test on specifically designed to represent a wide range of sizes and matching techniques; and because he actually used his own decision tree generation algorithm, the one extensible-match’s is based on. He finds a larger difference than Scott and Ramsey – about 20% improvement in code size and 10% improvement in average path length for any individual heuristic, compared to just naïvely picking the leftmost pattern in each row every time. As a reflection on how Maranget’s corpus might not be exactly representative, though, the heuristics only score about 15% better for code size and about 3% better for average path length than naïvely picking the rightmost pattern in each row.
Still, these are fairly slim pickings. As mentioned above, Guile’s optimizer is already capable of taking a naïve pattern match compilation and doing optimizations equivalent to generating a backtracking automaton that always picks the leftmost pattern for each operation; basically on the bytecode machine example it would probably do about as well as Racket’s match compiler, even without any help from the macro. The benchmarks by Scott and Ramsey suggest this should be good enough for almost every real use case. If Guile ever ships SRFI 262 in the standard distribution, it might well not be worth the extra weight in code to include a heuristic-driven decision tree generator; maybe that version should go back to just generating the naïve kind of compilation shown in the very first example above, and should let Guile’s own optimizer do the work. Changing strategy to the automaton style in that case also seem like the right thing, given that any version shipped with Guile would probably also end up in Hoot, which has its own reasons for optimizing code size. For other Scheme implementations which don’t (yet) do the online common subexpression elimination pass Andy Wingo developed, the winnings will definitely be bigger – although still diminishing if online CSE becomes more common.
Next time
Whew, this was a long one. Next time will be a comparatively short conclusion, on the code generator.
Especially considering personal matters got in the way of completing this article promptly. Sorry about that!↩︎
A transport/railway simulation game like Transport Tycoon that ran in a text terminal. That makes it sound a lot cooler (and more fun, and more functional) than it actually was.
If you want to learn how to do pathfinding in a game these days, that tutorial still isn’t bad, but you might have more fun with Amit Patel’s interactive walkthrough including other graph search algorithms.↩︎
This example is from Philip Wadler’s chapter ‘Efficient Compilation of Pattern Matching’ in Simon Peyton Jones’s book The Implementation of Functional Programming Languages, renamed after a Unix command line utility which hackers are probably at least passingly familiar with, and with an explicit comparison predicate argument because Scheme is monomorphic. The rest of this article has absolutely nothing to do with the contents of that chapter.↩︎
Actually it’s five vs ten because I’m still omitting the seq-pattern and its decision tree equivalent dt-seq here.↩︎
This is actually a requirement of the semantics of SRFI 262: all implementations have to use something like f as their first ‘heuristic’. This isn’t necessary in ML because the type testing functions the compiler notionally generates calls to are internal to the ML implementation and known to have universal domain; in SRFI 262 it’s possible to use a previous row as a type check that a procedure in a subsequent row will be safe to call.↩︎
In August, I attended the DWeb Seminar
where a small group of builders gathered to discuss the
state-of-the-art and open problems in the distributed web space. Some
in the group are primarily concerned with distributed data and focus
on sync algorithms and local-first use cases. I am mainly concerned
with distributed behavior and focus on the object capability
security model. Both areas of study are steeped in their own lore and
research papers, which makes it difficult for the two camps to
communicate effectively with each other.
It is in the interest of unity that I write this blog post. I will
show how distributed behavior and data techniques can be composed to
build local-first applications that combine the strengths of each
paradigm.
Fortunately, the distinction between behavior and data is a false
dichotomy; they are two sides of the same coin. This circular
relationship is well understood in the Lisp world where we say that
“code is data and data is code� in reference to Lisp’s
homoiconic syntax.
Messages are both behavior and data. They invoke behavior but are
also encoded as a string of bytes and sent across the wire. Our
context within the tower of abstraction determines how we look at a
message. What is treated as data at one abstraction level may be
treated as behavior in another. We need to be equipped to handle both
cases.
I’ve crystalized all that I’ve learned recently into a small prototype
that combines the following techniques:
I started down the well-trodden path of making a local-first group
chat application. Seemingly every other DWeb-adjacent project has
one, after all, so why shouldn’t Spritely? I then branched off and
went down my own trail, trying to compose tools in ways I hadn’t quite
seen before in this context. The result is Brassica
Chat!
Brassica Chat is written in Scheme, is built upon
Goblins, our distributed
programming environment, and uses
Hoot to compile it all to
WebAssembly so it can be used on the web. Besides just posting
messages, it also supports some of the usual features like
editing/removing messages and emoji reacts. 🚀
Demo time!
Below is an embedded and simplified demo of Brassica Chat that
simulates a conversation between Alice, Bob, and Carol. Messages are
sent over a fake network and each user’s network access can be toggled
with a button to simulate network partitions and offline usage. Alice
is the chat room creator and has the privilege to edit/remove any
post. Bob and Carol can only edit/remove their own posts. Okay,
hopefully that’s enough context. Try it out!
Let’s examine the scenario modeled in the demo more closely. First,
Alice creates a new chat room on her computer. She then shares a
capability with her friend Bob and another (distinct) capability with
Carol that grants them the privilege to send messages to her chat
room. Bob and Carol reciprocate by giving Alice capabilities to their
respective chat room copies. The resulting network is shown in the
diagram above.
Note that Bob and Carol are not directly connected to each other but
rather indirectly connected through Alice. This is because Bob and
Carol did not exchange capabilities with each other. This is okay!
They can all still chat with each other in real time as long as Alice
is online. When Alice goes offline, Bob and Carol can still send
messages locally. Everything done while in offline mode will be
synchronized once Bob and Carol can connect to Alice again. Perhaps
Bob and Carol will exchange capabilities with each other later so they
can still chat in real time when Alice is offline. The important
detail is that Brassica Chat does not try to wire everyone together
directly without the active consent of its users.
Each user in the system has a cryptographic identity in the form of a
public/private key pair. This key is used for signing messages. In
addition to the key, an identity also contains a human-readable,
self-proposed name for displaying in the user interface.
Each chat room is an eventually-consistent replica of the
distributed chat room state managed using a collection of CRDTs. Chat
rooms can propagate locally created or remotely received messages to
other replicas of the chat room for which it holds a capability. The
replication process works to eventually achieve convergence across all
reachable replicas.
At a meta level, these replicas can be thought of as forming a single,
conceptual chat room actor. To use some ocap jargon, the chat room is
an unum
where each presence (replica) communicates by broadcasting messages
to the other presences it knows about. In the diagram above, there’s
a dotted line drawn around the three replicas to indicate that the
chat room is an abstract entity whose canonical form does not live
on any single machine. The presences are all co-equal; no single
presence has more privilege than any other.
The stack
There are four levels of abstraction in the Brassica Chat
architecture. From bottom to top, they are:
Object capabilities: online access control through reference
passing.
Actors: online, asynchronous messaging through object references.
CRDTs: eventually consistent, offline messaging.
Authorization capabilities: offline access control through
certificate chains.
All objects in the application are represented as actors, including
CRDTs. Implementing CRDTs as actors has been done elsewhere,
Akka
being a notable example.
A reference to an actor is an object capability. In other words,
holding a reference to an actor gives you the authority to send
messages to it. An actor needs to be online in order to receive
messages, however. For offline usage, an object capability variant
known as an authorization or certificate capability is used, as
well.
Messages are sent between machines using the Object Capability
Network (OCapN) protocol, which handles the burden
of secure message transport. Messages can be transported over any
medium with an associated OCapN netlayer. For this prototype, I used
a
WebSocket
netlayer with a relay in the middle. The CRDT implementation has its
own messaging protocol which is defined using actors so that it
automatically works over OCapN.
On capabilities
Brassica Chat’s use of capabilities stands in contrast to most
existing local-first applications that use the access-control list
(ACL) model. In the ACL model, users are associated with groups or
roles that grant privileges. When compared to capabilities, the ACL
model has many deficiencies:
ACLs are too coarse-grained. It’s difficult to follow the
principle of least
authority
with a limited set of role-based privilege levels so the norm is for
users to have more privilege than is necessary. By contrast,
capabilities can be arbitrarily fine-grained. Want to make it so
that Bob can only moderate Carol’s posts and not Alice’s? It’s easy
and natural to make a capability for this but awkward to define a
one-off ACL role.
ACLs can’t be safely delegated. Only an administrator may grant or
revoke privileges. As a non-admin, your only option is to share
your credentials, which is unsafe and hard to audit. Credential
sharing happens often in the real world due to the friction involved
in doing things “the right way�. With capabilities, it is easy to
delegate a subset of your authority to someone else in an
auditable, revokable manner without sharing your own credentials or
communicating with a central authority.
Most importantly, ACLs have inherent
vulnerabilities,
such as the confused deputy
problem.
The “if you don’t have it, you can’t use it� approach of
capabilities avoids an entire class of security bugs.
In short, capabilities are safer, more expressive, and more
decentralized than ACLs. Now, let’s move on to some implementation
details.
The chat room actor
The chat room actor is implemented as a composition of several CRDTs.
Rather than using one giant CRDT for the entirety of a chat room’s
history, it is partitioned by time into a set of chat log CRDT actors.
Each partition covers some uniform number of seconds of real time
known as the “period�. This means that all presences must use the
same period value in order to converge properly (30 minutes was chosen
as a reasonable default). The benefit of this partitioning strategy
is that it allows each replica to perform garbage collection (GC) on
entire chunks of history without coordinating with the other replicas
(GC within a CRDT requires coordination). This ought to keep the
append-only log for any individual chunk of history quite small and
manageable. Rebuilding the state of a previously deleted chunk from
scratch shouldn’t take much time, assuming there is another replica
online with the data. For this prototype I didn’t bother to GC old
message history as the chat rooms are ephemeral and not persisted to
disk (but we could use Goblins’ persistence
API
to do so in the future).
In addition to the message log partitions, there are two additional
CRDT actors that make up the chat room: profiles and certificates.
The profiles CRDT contains a mapping from a user’s public key to their
self-proposed display name (and could later be extended to include
other metadata that a user would like to share with the room). The
certificates CRDT contains the set of all zcaps that have been issued
for the chat room.
The CRDT actors
CRDTs can be roughly divided into two categories: state-based or
operation-based. Brassica Chat uses operation-based CRDTs, which
can be thought of like a Git repository with automatic conflict
resolution. Each replica of an operation-based CRDT maintains an
event log containing all of the operations that have occurred. Due to
concurrency in distributed systems, an event may have one or more
direct causal predecessors (a fancy term for “parents�). Thus, the
log entries form an append-only, directed acyclic
graph (DAG),
as shown in the diagram above.
An event has the following immutable fields:
ID: Unique ID of the event (SHA-256 hash).
Parent IDs: IDs of all causal predecessors (forming a DAG).
Timestamp: Timestamp from a hybrid logical
clock
indicating when the event occurred.
Author: Creator of the event (ed25519 public key).
Signature: Crytographic signature of the event.
Blob: Syrup encoded event data
(Syrup is the binary serialization format used by OCapN).
Events are delivered in causal order, meaning that an event is not
applied to the CRDT’s internal state until all of its predecessor
events have been applied. Concurrent events may be applied in any
order, so it’s important that operations on the CRDT state are
commutative. Despite causal order being encoded in the event graph,
a logical timestamp is included in each event. This is important for
handling concurrent events and is used to implement common CRDT
patterns like the “last write wins�
register.
Brassica Chat contains a generic operation-based CRDT actor with
prepare, effect and query hooks (straight out of the CRDT
literature) for special-purpose CRDTs to implement. The CRDT actor is
used as the basis for the chat log, certificates, and profiles actors.
This CRDT implementation, though on the simple side, is Byzantine
fault tolerant. A
Byzantine fault is best explained by the following scenario: Mallet, a
user who is up to no good, sends Alice and Bob an event with the same
ID but different contents. When Alice and Bob sync data with each
other, they ignore events with IDs that they already have and don’t
realize that Mallet has tricked them. The result is that Alice and
Bob will never converge to the correct state because their message
logs contain different operations.
Divergence due to Byzantine behavior is prevented through
content-addressing and cryptographic signing of events, much
like Git, as described in Martin Kleppmann’s “Making CRDTs Byzantine
Fault Tolerant�
paper. Mallet cannot send Alice and Bob events with the same ID but
different contents because the ID is the hash of the contents and if
the hash doesn’t match then the event is rejected. Events are signed
to associate them with the author for use with the authorization
capability system and the parent IDs are incorporated into the
signature to prevent replay attacks. For this prototype, SHA-256 was
chosen for the hash function and ed25519 for signatures.
Any number of Byzantine replicas may be in the network, but as long as
Alice and Bob can directly connect to each other, or indirectly
connect through a non-Byzantine node such as Carol, the well-behaved
nodes will eventually converge to the correct state. While not
implemented in this prototype, detection of Byzantine behavior from a
replica could be used as the basis for revoking the object capability
being used to send such messages, adding a layer of accountability to
the system.
Authorization capabilities
With CRDTs in the mix as an offline messaging layer, object
capabilities alone are insufficient for access control. The ocap
layer controls access to synchronize chat messages between two
replicas but it does not (and cannot) control what those messages
contain. Why is that? Because the chat messages are at a higher
level of abstraction than the actor messages for which the ocaps
apply. When Bob writes the message (react alice-message-1 "👋") to
his local replica, he is sending a message to the abstract chat room
that doesn’t exist in any single location. What if Alice wanted to
prevent Bob from reacting to messages? Who even has the authority to
impose that restriction when there’s no central server? We’ve traded
away strong consistency to support local-first usage, so there is no
way for an adminstrator to install an ocap on all replicas such that
they are all guaranteed to reject this message from Bob and converge
to the same state. Ocaps are online capabilities, but CRDTs use
offline messaging. We need an offline capability that can be used
to process the offline messages.
This is where authorization capabilities (zcaps) come in. A zcap
is a signed certificate that describes what actions a controller of
that certificate may perform. Like ocaps, zcaps support delegation
which is represented as a chain of signed certificates. A crucial
property of a delegated zcap is that it cannot expand privilege, only
reduce it. Certificate chains need to bottom out somewhere, so we
need to decide upon a root signer. In Brassica Chat, the initiator of
the chat room (Alice in our example scenario) is considered to be the
root signer for all zcaps used in the chat room. This is just a
convention, though, and a user could decide to place their trust in a
different root signer.
Certificates in Brassica Chat are inspired by
ZCAP-LD and are composed of
the following immutable fields:
ID: Unique ID of the certificate (SHA-256 hash).
Parent ID: ID of the previous certificate in the delegation
chain.
Signer: The public key used to sign the certificate. The signer
must be a controller of the parent certificate to be considered
valid.
Controllers: A list of public keys for the users who are allowed
to invoke the capabilities of this certificate.
Predicate: An expression that constrains (or attenuates, to
use the ocap term) the capabilities granted by the parent
certificate. For example, the expression (when-op (edit delete) (allow-self)) says that edit and delete operations can only be
used on posts authored by the user invoking the capability (one of
the controllers).
Certificates also carry one piece of mutable state: a flag that the
signer can flip from false to true to revoke the certificate.
Revocation cannot be reversed, making this a trivially monotonic
operation within the certificates CRDT.
At first glance, zcaps might appear to have the same problem as ocaps:
a zcap cannot prevent Bob from sending a message that is not permitted
because there’s no strong consistency. Instead, zcaps specify the
rules by which well-behaved clients should interpret the events that
have occurred. For example, Bob can send a message that edits the
contents of Carol’s post, but if the zcap Bob used for that operation
does not grant the capability to edit posts authored by Carol then
that edit will simply be ignored when updating the chat room state on
a given replica. Since zcaps are encoded as certificate documents,
they can be synced amongst all replicas so that the user interface can
eventually render the correct view of the chat room. This is a good
example of something treated as data at one level of abstraction but
behavior at another.
Security considerations
The security implications of sharing a capability to a chat room are
rather large. If Alice, Bob, and Carol have replicas of the same chat
room then sending Alice a message means indirectly sending Bob and
Carol messages, too. Each presence of the chat room is co-equal with
all other presences, after all. As a consequence, we cannot perform
administration in a centralized manner like we could if there was a
single canonical chat room actor living on a single machine.
Revocation, for example, is now a communal effort. If Mallet can
propagate messages through Bob and Carol (because Mallet holds a
capability to both) then Bob and Carol must each revoke their
respective capabilities in order to prevent Mallet from sending
messages to the chat room in the future. While it’s possible to
create a zcap that would cause Mallet’s messages to be ignored by
clients, it doesn’t change the fundamental truth that Mallet has the
capability to send messages to the chat room until such a time that
all previously issued ocaps have been revoked. The formation of
complete networks, where each replica holds a capability to sync with
every other replica, is thus discouraged in this design. The
connectedness of a replica is a function of how trusted the user of
that replica is in the real world social group. The more strongly
connected a user is, the harder it becomes to remove them later if the
social dynamic changes. There is a tension between the risk imposed
by a strongly connected network and the desire to maximize
availability of the chat room for online users.
The overall security goal for this prototype was to prevent Mallet
from irreparably destroying the shared state of the chat room, which
was achieved through Byzantine fault tolerance. Additionally, message
signing and zcaps provide a means of holding Mallet accountable for
anti-social/malicious actions that the system is technically incapable
of preventing, giving users some agency over what they see in their
client interface. Is this good enough?
Things left undone
This prototype was focused on exploring the core of a minimally viable
p2p chat built on capability security principles. It is not
production software. I did not concern myself with optimal bandwidth
nor memory usage. As mentioned earlier, chat history is not even
saved to disk.
Some areas for improvement are:
Decentralized identity and naming. This was deliberately left out
to keep the scope of this experiment manageable. Spritely has
another project, codenamed Brux, to explore this topic. See also
our paper on
petnames.
Ergonomic UI/UX for the complexity introduced by decentralization
and eventual consistency. What’s a user-friendly way to add and
revoke ocaps and zcaps? The UI doesn’t even attempt to allow
viewing or editing zcaps right now. How can we clearly communicate
what the security properties are/aren’t so that users don’t get
false impressions?
History rewriting. If Mallet writes some truly terrible content to
the append-only chat log, it’s stuck in there even if it’s hidden in
the user interface. Introducing some amount of synchronization to
deal with this scenario seems okay. We could take inspiration from
Git where the commit graph is append-only but branch names are
mutable pointers.
Preventing new members from reading past messages like in Signal
groups. This should be an option like it is in other secure chat
programs, but it’s a complex topic and exploring it was out of
scope.
Conclusion
I hope this was an interesting walkthrough of how ocaps, actors,
CRDTs, and zcaps can be composed with each other! Big thanks to the
DWeb Seminar organizers for providing the spark of inspiration I
needed to dive into the CRDT literature and build this prototype.
posted by Matthew Flatt, Ryan Culpepper, Robby
Findler, Gustavo Massaccesi, and Sam Tobin-Hochstadt
With the version 9.0 release, Racket includes support for
shared-memory threads that can take advantage of multicore hardware
and operating-systems threads to run in parallel—
not merely
concurrently with other Racket threads, as was the case in versions before 9.0.
Creating a thread that runs in parallel is as simple as adding a
flag to the call to thread.
To see the effect, try first putting the following code into a file named
"thread.rkt" and running racket thread.rkt on the
command line:
Racket will find many square roots (tweak N to match your
machine), but will keep only one core of your CPU busy. Using
time in the shell reports “CPU” (possibly broken into
“user” and “system”) and “real” times that are similar. To
use two cores, add #:pool'own to the thread call:
In this case, real time should be about half of CPU time, while CPU
should remain similar to before. In other words, the parallel version
runs twice as fast. On the machine used below:
concurrent
parallel
×1
1011 msec real
979
msec CPU
10
msec CPU for GC
×2
517 msec real
1021
msec CPU
13
msec CPU for GC
Passing the new #:pool argument
creates a parallel thread; create pools via
make-parallel-thread-pool to have a group of threads share
processor resources or just pass 'own to have the
new thread exist in its own
parallel thread pool.
As a further addition to thread, a #:keep'result
argument keeps the result of thunk when it returns, instead
of discarding the result. Retrieve a thread’s result with
thread-wait. So, for example,
runs thunk in parallel
to other Racket threads, blocks the current Racket thread (while
allowing other Racket threads to continue, even non-parallel ones),
and then returns the result value(s) when thunk completes.
To maintain backwards compatibility, the thread function still creates a coroutine
thread by default, which is a lightweight thread that is preemptively
scheduled and whose execution is interleaved with other coroutine
threads. For many tasks that need the organizational benefits of
concurrency without the performance benefits of parallelism, such as
when managing GUI interactions or orchestrating remote processes,
coroutine threads are still the best abstraction. Coroutine threads
can use #:keep'result, too.
Racket’s full thread API works with parallel threads. Follow the links
from the thread documentation to see more details on thread
pools and for more interesting uses. Of course, just because you put
tasks in parallel threads doesn’t mean that they always speed up,
as sharing and communication can limit parallelism. Racket’s
future
visualizer works for parallel threads, tho, and it can help you
understand where synchronization in a task limits parallelism.
Also, adding parallelism to Racket potentially creates trouble for existing
libraries that were not designed to accommodate parallelism. We expect
problems to be rare, however.
We’ll explore the performance details and explain why we expect most
programs will continue to work well later in this post, but first:
Racket’s Road to Parallelism
Running threads in parallel counts as news in 2025?! Well, it has been a long
road.
Racket’s implementation started in the mid-1990s, just as a wave of
enthusiasm for parallel programming was winding down. Although
operating systems by that point consistently supported within-process
threads, computers with multiprocessors were not commonly available.
Many language runtime systems from the same era—
including Python,
Ruby, and OCaml—
took advantage of the internal simplicity of a
single-threaded runtime system while offering constructs for
concurrency at the language level. Racket has always
included threads for concurrency, and it was an early adopter of
Concurrent ML’s
abstractions for managing concurrency well. But an absence of
parallelism was deeply baked into the original implementation.
Over time, to provide support for parallelism, we added
places and futures to Racket. Places support
coarse-grained parallelism through a message-passing API, effectively
running parallel instances of the virtual machine within a single
operating-system process; limited sharing makes the implementation
easier and safer than arbitrary sharing between parallel threads.
Futures provide fine-grained parallelism for restricted computations;
a future blocks when it tries to perform any operation that would be
difficult for the runtime system to complete safely in parallel.
Places and futures are both useful, and they avoid some pitfalls of
shared-memory threads. Still, fitting a parallel task into futures or
places usually requires special effort.
Meanwhile, single-threaded execution was only one of the problems with
the original Racket (a.k.a. PLT Scheme) implementation. To address
larger problems with the implementation and to improve performance, we
started in 2017
rebuilding
Racket on top of Chez Scheme. Rebuilding took some time, and we only
gradually deprecated the old “BC” implementation in favor of the new
“CS” implementation, but the transition is now complete. Racket BC
is still maintained, but as of August 2025, we distribute only Racket
CS builds at https://download.racket-lang.org.
Chez Scheme is a much better foundation for improving parallelism in
Racket. Part of the Racket-rebuilding effort included improving Chez
Scheme’s support for parallelism: we added memory fences as needed for
platforms with a weak memory-consistency model, and we parallelized
the Chez Scheme garbage collector so that garbage collection itself
runs in parallel. There’s still plenty of room for improvement—
the
garbage collector is only parallel with itself, not the main program,
for example—
but further improvements are more within reach than
before. Equally important, the rebuild included new implementations of
the Racket thread scheduler and I/O layer in Racket itself (instead of
C). Because of these improvements, Racket’s futures worked better for
parallelism from the start in Racket CS than in Racket BC.
With version 9.0, we finally take advantage of new opportunities for
parallelism created by the move to Racket CS. Internally, a parallel
thread is backed by combination of a future and a coroutine thread.
The main extra work was making Racket’s coroutine thread scheduler
cooperate more with the future scheduler and making the I/O layer safe
for Chez Scheme threads—
all while making locks fine-grained enough
to enable parallelism, and keeping the cost of needed synchronization
as low as possible, including for non-parallel Racket programs.
Performance
Here are some simple benchmarks on an M2 Mac to give a sense
of the state of the current implementation. This machine has 8
cores, but 4 big and 4 little, so ×4 speedup is possible with
4-way parallelism but less than ×8 with 8-way parallelism.
As an easy first example, we should expect that a Fibonacci [code] run of
1 iteration in each of 4 coroutine threads takes the same time as
running it 4 iterations in 1 thread, while 1 iteration in each of 4
parallel threads should take about 1/4th of the time. Also, for such a
simple function, using plain old futures should work just as well as
parallel threads. That’s what we see in the numbers below.
Times are shown as a speedup over single-threaded, then in
real elapsed milliseconds, with CPU milliseconds as the upper smaller
number to the right, and CPU milliseconds that are specifically for GC
as the lower smaller number to the right. The times are from a single
run of the benchmark.
(fib40)
real msec
CPU
GC
N
sequential
coroutine
parallel
futures
1
×1
511
493
0
×1
506
490
0
×1
494
494
0
×1
495
495
0
4
×1
2045
1978
0
×1
2034
1967
0
×3.7
554
2210
0
×3.8
545
2168
3
8
×1
4154
4021
0
×1
4154
4021
2
×5.4
776
5928
1
×5.2
796
6006
2
Of course, most programs are not just simple arithmetic. If we
change our example to repeatedly convert numbers back and forth
to strings as we compute Fibonacci [code], then
we can see the effects of the more complex conversions. This version
also triggers frequent allocation, which lets us see
how thread-local allocation and parallel garbage collection scale.
(strfib*32)
real msec
CPU
GC
N
sequential
coroutine
parallel
futures
1
×1
204
197
3
×1
205
198
0
×1.1
192
192
0
×1
211
203
0
4
×1
826
796
10
×1
808
780
2
×3.7
222
861
10
×3.7
221
857
10
8
×1
1619
1563
11
×1
1602
1545
4
×3.9
419
2544
59
×4
406
2551
59
From this table, we still see reasonable scaling up to four cores,
but the additional work and the use of the garbage collector limit
scaling beyond that point.
That first string variant of Fibonacci includes a slight cheat,
however: it goes out of its way to use a string->number*
wrapper that carefully calls string->number in a way that
avoids evaluating expressions that compute the default values of some
arguments. The defaults consult the
parametersread-decimal-as-inexact and
read-single-flonum—
which a perfectly fine thing to do in
general, but turns out to block a future, because parameter values
can depend on the current continuation. In contrast, parallel threads
continue to provide a benefit when those kinds of Racket constructs
are used. We can see the difference by using plain
string->number in place of string->number*, which
will fetch parameter values 14 million times in each individual run of
(strfib32):
(strfib32)
real msec
CPU
GC
N
sequential
coroutine
parallel
futures
1
×1
772
751
4
×1.3
578
562
1
×1.1
721
721
1
×0.9
873
851
0
4
×1
3169
3085
12
×1.3
2364
2303
6
×4
797
3103
33
×0.8
4164
4058
2
8
×1
6409
6225
14
×1.4
4730
4608
13
×4.3
1493
9166
197
×0.8
8353
8135
4
The coroutine column here also shows an improvement, surprisingly.
That’s because a coroutine thread has a smaller continuation than the one in
the sequential column, and the cost of fetching a parameter
value can depend (to a limited degree) on continuation size.
The effect of parallel threads on this kind of program is
more consistent than fine details of a continuation’s shape.
Operations on mutable equal?-based
hash tables
[code] are another case where
futures block, but parallel threads can provide performance
improvement.
(hash-nums6)
real msec
CPU
GC
N
sequential
coroutine
parallel
futures
1
×1
193
184
1
×1
190
182
0
×1
186
186
0
×1
191
182
0
4
×1
767
733
5
×1
763
729
0
×3.7
208
824
3
×1
763
729
0
8
×1
1541
1473
11
×1
1532
1463
1
×4.5
346
2373
24
×1
1539
1470
1
As an illustration of the current limitations of parallel threads in
Racket, let’s try a program that writes data to a byte-string
port then hashes it
[code].
(hash-digs7)
real msec
CPU
GC
N
sequential
coroutine
parallel
futures
1
×1
127
123
3
×1
127
123
4
×0.9
135
135
3
×1
126
122
3
4
×1
503
487
13
×0.9
536
519
35
×2.5
201
599
38
×1
520
504
28
8
×1
1022
986
33
×0.9
1097
1057
81
×2.5
403
1630
85
×1
1049
1017
60
Here we see that parallel threads do get some speedup, but
they do not scale especially well. The fact that separate
ports are not contended enables performance
improvement from parallelism, but speedup is limited by some
general locks in the I/O layer.
Even further in that direction, let’s try a program that
hashes all files in the current directory [code] and
computes a combined hash. When run on the "src" directory of
the Racket Git repository, most of the time is reading bytes from
files, and locks related to file I/O are currently too coarse-grained
to permit much speed-up.
(hash-dir)
real msec
CPU
GC
N
sequential
coroutine
parallel
futures
1
×1
170
169
1
×1
169
169
0
×0.7
256
248
0
×1
170
170
0
4
×1
692
690
3
×1
662
658
3
×1.3
515
1068
5
×1
681
679
3
8
×1
1393
1377
10
×1.1
1293
1290
6
×1.6
868
2158
18
×1
1368
1366
7
Having locks in place for parallel threads can impose a cost on
sequential programs, since locks generally have to be taken whether or
not any parallel threads are active. Different data structures in
Racket use specialized locks to minimize the cost, and most benchmarks
reported here run report the same numbers in sequential column in Racket v8.18 (the
previous release) and Racket v9.0. The exceptions are the
(hash-nums6) and (hash-digs7) benchmarks, because
those measure very-fine grained actions on mutable hash tables and I/O
ports, and the cost is largest for those. Comparing sequential times
for those two versions shows that support for parallel thread can cost
up to 6-8% for programs that do not use them, although the cost tends
to be much less for most programs.
(hash-nums6)
real msec
CPU
GC
N
v8.18 sequential
v9.0 sequential
1
×1
188
180
0
×0.96
195
187
0
4
×1
757
728
2
×0.98
773
739
3
8
×1
1520
1461
9
×0.98
1546
1477
12
(hash-digs7)
real msec
CPU
GC
N
v8.18 sequential
v9.0 sequential
1
×1
118
114
3
×0.94
126
122
3
4
×1
474
458
12
×0.94
506
489
13
8
×1
947
915
25
×0.92
1025
989
26
Overall, parallelizable numerical programs or ones that manipulate
unshared data structures can achieve speedup through parallel threads
relatively easily, but I/O remains a direction for improvement.
Backward Compatibility
If a library uses mutable variables or objects, either publicly or
internally, then it must use locks or some other form of concurrency
control to work properly in a multithreaded context. Racket already
has concurrency, and the expectation for libraries to work with
threads does not change with the introduction of parallel threads.
Racket’s semaphores, channels, and other synchronization constructs
work the same with parallel threads as concurrent threads. Even
programs that use lock-free approaches based on compare-and-swap
operation (such as box-cas!) continue to work, since Racket’s
compare-and-swap operations use processor-level primitives.
Still, there are a few concerns:
Racket’s coroutine threads offer the guarantee of
sequential consistency, which means that effects in one
thread cannot be seen out-of-order in another thread. Parallel threads
in Racket
expose
the underlying machine’s memory-consistency model, which may allow
reordering of memory effects as observed by other threads. In general,
a weak memory model can be an issue for code not intended for use with
threads, but Racket—
more precisely, Chez Scheme—
always guarantees
the memory safety of such code using memory fences. That is, Racket
code might observe out-of-order writes, but it never observes
ill-formed Racket objects. The fences are not new, and they are part of
the same write barrier that already supports generational garbage
collection and the memory safety of futures. Although
sequential consistency supports lock implementations that don’t work
with weaker memory models, so they would work with coroutine threads
and not parallel threads, we have not found any such
implementations in Racket libraries.
Some Racket libraries use atomic mode for concurrency
control. Atomic mode in Racket prevent coroutine thread swaps, and
entering atomic mode is a relatively cheap operation within Racket’s
coroutine scheduler. When a parallel thread enters atomic mode, then
it prevents other coroutine threads from running, but it does
not prevent other parallel threads from running. As long as
atomic mode is used consistently to guard a shared resource, then it
continues to serve that role with parallel threads.
Entering atomic mode is a much more expensive operation in a parallel
thread than in a coroutine thread; in many cases, Racket core
libraries that need finer-grained locking more specifically need to
move away from using atomic mode. Still, making atomic mode
synchronize a parallel thread with coroutine thread provides a
graceful fallback and evolution path.
Foreign functions that are called by Racket in a coroutine
threads are effectively atomic operations when there are no parallel
threads, since a coroutine swap cannot take place during the foreign
call. It’s rare that this atomicity implies any kind of lock at the
Racket level, however, and the foreign function itself is either
adapted to operating-system threads or not. Racket can already create
operating systems threads through dynamic-place, and
foreign-function bindings have generally been adapted already to that
possibility.
The greater degree of concurrency enabled by parallelism exposed some
bugs in our existing core libraries that could have been triggered
with coroutine threads, but hadn’t been triggered reliably enough to
detect and repair the bugs before. Beyond those general improvements,
our experience with pre-release Racket is that parallel threads have
not created backward-compatibility problems.
A major release is always exciting and Racket 9.0 is no exception in that it introduces Parallel Threads. While Racket has had green threads for some time, and supports parallelism via futures and places, we feel parallel threads is a major addition.
Parallel threads can be created using the #:pool argument to thread creation.
Threads created with #:keep set to 'results will record their results for later retrieval with thread-wait.
The black-box wrapper prevents the optimizing compiler from optimizing away certain computations entirely. This can be helpful in ensuring that benchmarks are accurate.
The decompile-linklet function can map linklets back to s-expressions.
When using BC Racket, the processor-count function is changed to always return the parallel count.
We now distribute “natipkg” packages for AArch64, useful for package-build and package-testing infrastructure.
Check Syntax tracks identifiers more deeply nested in the “origin” field of syntax objects.
There are many other repairs and documentation improvements!
Thank you
The following people contributed to this release:
Alexander Shopov, Anthony Carrico, Bert De Ketelaere, Bogdan Popa, Cadence Ember, David Van Horn, Gustavo Massaccesi, Jade Sailor, Jakub Zalewski, Jens Axel Søgaard, jestarray, John Clements, Jordan Johnson, Matthew Flatt, Matthias Felleisen, Mike Sperber, Philip McGrath, RMOlive, Robby Findler, Ruifeng Xie, Ryan Culpepper, Sam Phillips, Sam Tobin-Hochstadt, Sebastian Rakel, shenleban tongying, Shu-Hung You, Stephen De Gabrielle, Steve Byan, and Wing Hei Chan.
Racket is a community developed open source project and we welcome new contributors. See racket/README.md to learn how you can be a part of this amazing project.
Feedback Welcome
Questions and discussion welcome at the Racket community on Discourse or Discord.
Please share
If you can - please help get the word out to users and platform specific repo packagers
Racket - the Language-Oriented Programming Language - version 9.0 is now available from https://download.racket-lang.org
See https://blog.racket-lang.org/2025/11/racket-v9-0.html for the release announcement and highlights.
This year, instead of participating in the Autumn Lisp Game
Jam, Spritely decided to make
some non-game demos to show off our tech. To that end, I spent the week of
November 3rd writing GoblinShare, a
secure, peer-to-peer file-sharing utility using the Guile port of
Magenc for storage and distribution,
relying on Goblins for the peer-to-peer
connection abstraction, and delivered over Tor.
Thanks to Goblins, this turned out to be super easy to implement. Let me show
you how!
�� Demos; glorious demos! ��
We at Spritely absolutely love our technology demos. We know it can be
tricky to understand an unfamiliar paradigm, but we also think hands-on demos
help. To that
end, we build lotsofusabledemostotry. We particularly like building
games during the Lisp Game Jam so we can connect with the broader Lisp
community, encourage more developers to use our technology, and end up with
something fun that people want to use. After all, we can't build the future
of the social web all by ourselves! This year, though, we decided to focus not
on showing off work we've already done, but pushing our work in a new direction;
and it felt best to do that outside the context of the jam.
My assignment (which I did choose to accept) was tripartite: port Magenc to
Guile, implement a simple file-sharing tool (which I decided to somewhat model
on Magic Wormhole), and port
Crystal to Guile. The Crystal
work remains ahead, but I've ported
Magenc and built
GoblinShare as the file-sharing
tool. I treated the Magenc port as prep work and GoblinShare proper as the
equivalent of a jam entry, so that will be the focus of this post. First,
though, I'll provide an overview of these two projects, what they do, and how
they do it.
Magenc
Magenc is an encrypted, distributed, content-addressed data store relying on
magnet URLs as capabilities, inspired by
Tahoe-LAFS. It does not use Goblins,
but it is built with capability security concepts in mind. It consists of three
components, each functioning as a subcommand under a single program: magenc serve, which starts a server; magenc put, which encrypts and POSTs files to
a server; and magenc get, which GETs files from a server. (Although the
architecture is designed to support arbitrary remote stores, only a web store
using HTTP has been implemented so far.)
A quick summary
The core of Magenc's functionality is the magnet URL, which looks like
magnet://?xt=<url-encoded-urn>&ek=<base64-url-encoded-string>&es=<string> and
uniquely identifies a specific file. Decomposing the query parameters
(everything after ?), xt refers to the exact topic identifying a specific
file, which is simply the sha256 hash of the encrypted manifest (which may be a
raw object as discussed below). ek is the encryption key used to encrypt the
file. es is the encryption suite used for encryption. The Guile port of
Magenc uses AES in Galois/Counter Mode (AES-GCM) for encryption; the Racket
version uses AES in Counter Mode (AES-CTR). Of this information, only the exact
topic is ever known by the server. Together, the information in a magnet URL is
both all that is needed and everything that is required to identify, access, and
decrypt a file on a given server. This makes a magnet URL a capability.
The server only has access to encrypted binary data a client sends. When the
server receives a POST request with attached binary data, it hashes the
provided data using sha256, encodes the unhashed data using base64, and stores
the encoded data in a key-value store where the hash ‒ the exact topic ‒ is
the key. It then converts the exact topic into a URI object from Guile's (web uri) module before sending the URI back in the content-location field of the
response object. As a safety redundancy, the client checks to make sure that
the exact topic it gets back is what it expects. When the server receives a
GET request with an exact topic in the content-location field, it looks up
the associated data, decodes the base64 into binary data, and sends that back as
the body of a response. That's all the server knows and does; everything else
is handled by the client.
magenc put handles the encryption and generation of cryptographic inputs.
When passed a file (and, optionally, server URL), Magenc first chunks the file
as necessary, encrypting each chunk to transmit separately so that the server
does not necessarily know the chunks are related (though correlation of
connections and network traffic by a malicious server or observer could be used
to reliably guess interrelation). Then, the client prints a magnet URL which
identifies the entrypoint to retrieve the file ‒ a "raw" object if the file fits
in one chunk or a "manifest" if it doesn't. As discussed above, the magnet URL
also embeds cryptographic information.
A file's magnet URL can be used with magenc get to retrieve that file. When
passed a magnet URL (and, if necessary, server URL), magenc get first
retrieves and decrypts the object identified by the exact topic. If the object
is a "raw" object, Magenc extracts the binary data, decrypts it, and writes it
to either standard output or a given file. If the object is a "manifest"
object, Magenc extracts the exact topics for each chunk from the manifest,
retrieves and decrypts each chunk in order, then writes out the complete file in
the same way.
The original version of Magenc includes a more thorough
write-up.
Aside from the differing encryption suites and commandline interfaces, that
write-up also holds for the Guile version.
A simple example
Some things are easier to understand in action, so let's walk through a
trivially simple example of storing and retrieving a file using Magenc. We will
skip build instructions, which are provided in the
repository. For ease of demonstration,
we will use the default server configuration which launches a process listening
at http://127.0.0.1:8118 (http://localhost:8118) and stores data in memory
rather than writing it to disk ‒ Magenc does include a backend relying on
Goblins' bloblin persistence
store
to efficiently store files on disk. We will use what is called convergent
encryption, which uses part of the unencrypted file itself as a cryptographic
input, to ensure that our example file produces the same magnet URL every time
it is stored. (If you are reading this in the future and Magenc's cryptography
has changed, the magnet URLs may no longer match.)
First, let's start a server:
magenc serve
This should print the address where the server is listening, like:
Server running at: http://127.0.0.1:8118
We can leave that running in one terminal session and use a different one for
everything else. Now let's create our example file:
echo "Hello! I'm an example file!" > example.txt
Next, we'll store the file with magenc put example.txt --convergent. This
prints the magnet URL
magnet:?xt=urn%3Asha256d%3A4a2TJXrPx83v1DGnOJyPa5b678AkVsPaplsx_LcT06I&ek=_TrAfpNRRLQb7gutF8KKMtj-tPWk8_AapsJQgu6sDeo&es=AES-256-GCM.
(Note that argument ordering is relevant; to simplify the implementation of the
CLI, Magenc expects option arguments ‒ anything starting with - or -- ‒
after positional arguments.) Finally, we can retrieve the file:
magenc get "magnet:?xt=urn%3Asha256d%3A4a2TJXrPx83v1DGnOJyPa5b678AkVsPaplsx_LcT06I&ek=_TrAfpNRRLQb7gutF8KKMtj-tPWk8_AapsJQgu6sDeo&es=AES-256-GCM"
Make sure to quote the magnet URL otherwise the shell will interpret & as a
command. This command produces the output:
Hello! I'm an example file!
And that's it! Simple! There are a few more options available for the various
subcommands, each explained with magenc --help.
GoblinShare
In addition to being an application, Magenc is also a library. GoblinShare is
implemented using Magenc in this capacity. All it adds is a purpose-built UI
and a wrapper around the in-memory store backend which makes it easier to use
through a Goblins actor. The sending peer launches a Tor
netlayer,
generates a sturdyref ‒ a persistent object reference which can be shared
out-of-band ‒ for the store actor, and adds that to the magnet URL with the
additional acceptable source (as) field. The receiving peer sets up its own
netlayer, "enlivens" the sturdyref, and downloads the file. Once the file is
retrieved, the sending peer terminates, removing the associated data from
memory.
An example
We will perform essentially the same tasks for this example as we did for the
Magenc example. There are a few differences to note. First, GoblinShare does
not provide an option for convergent encryption. This simply wouldn't make
sense for an ephemeral file-sharing tool. For our present purposes, this means
that you will almost certainly get a different magnet URL than is produced here.
Second, it is necessary to manually run the tor daemon. Build and usage
instructions in the GoblinShare
repository cover that so the
following example assumes a running daemon.
We will reuse the same example file as above, so feel free to reuse the previous
command to create it. Then, all we have to do is send it:
goblinshare send example.txt
to get the magnet URL. The sending process will wait for the file to be
retrieved, so switch to another terminal session to retrieve the file:
Because we're using Tor as our network layer, it can take a few seconds for the
connection to be established and the data to be sent, even though we are
connecting to a local server. Since our example file is small enough to fit
into a single chunk, the delay isn't very long, but a larger file can take quite
a while to transfer. Eventually, you will receive the expected output:
Hello! I'm an example file!
At this point, both the sending and receiving processes will terminate, and
that's that! You've successfully shared a file with Goblins! Just like Magenc,
GoblinShare has a (very) few options you can see with goblinshare --help.
Unlike Magenc, option arguments can be supplied before or after positional
arguments as long as they follow the associated subcommand.
A note on Magic Wormhole
I mentioned that I took initial inspiration for this project from Magic
Wormhole. In practice,
though, the only similarity wound up being the names of the subcommands send
and receive. Magic Wormhole relies on a central relay server because it uses
"wormhole codes" to facilitate oral communication of the relevant capabilities.
These codes seem to be keys mapping to a fuller capability and therefore
requiring a coordination point. GoblinShare, by contrast, chooses to instead
provide less-human-friendly magnet URLs which allow fully peer-to-peer file
transfer because they encode all necessary information.
How easy was it, really?
As I mentioned near the beginning of this post, the most surprising part of
implementing GoblinShare was how easy it was. The core functionality was
implemented in about half a day, though a failed attempt to get GoblinShare to
manage the tor daemon stretched the initial implementation out to about a day
and a half. In the end, the UI and business logic of GoblinShare together
require 250 lines of code including module headers, whitespace, docstrings, and
inline comments (but not license headers). That's very little code! Magenc
took a bit more work, but a lot of that was getting the cryptography and URL
abstractions playing nicely ‒ and, admittedly, there's still room to make Magenc
more approachable as a library. Still, the port took a little under a week all
told, including time to implement tests, and came out to somewhere in the
neighborhood of 1400 lines of code with similar caveats.
Numbers are one thing, but the simplicity of GoblinShare really comes through in
the code itself. The core logic comes down to three procedures, two in the
send logic (with an additional helper) and one in the receive logic. Let's
break these down and walk through them to build up the relevant logic.
As a general note, the UI layers of GoblinShare and Magenc use
SRFI-37 to convert commandline
arguments into an association list of options and arguments. The main
procedures in (goblinshare) and (magenc) simply parse the commandline into
the appropriate arguments which they pass to the procedure associated with a
given subcommand.
Now, on to the code walkthrough!
send
First, we'll briefly discuss the relevant helper procedure, add-sref-to-magnet-url:
All this procedure does is decompose the magnet URL we get from Magenc then
build a new magnet URL with an additional query parameter holding the sturdyref
to the Goblins actor.
Most of the important logic happens in connect-to-goblinshare-server-store,
named to match the convention of Magenc's store abstraction:
(define(connect-to-goblinshare-server-storedone)(definebackend(connect-to-backend#:backend-type'memory))(define(putdata)(store-backend-databackenddata))(define(getexact-topic);; CapTP doesn't support records so we turn exact topics into strings
(retrieve-backend-databackend(string->exact-topicexact-topic)))(define(close)(close-backendbackend)(signal-condition!done))(connect-to-store*'goblinshare-send(lambda()(valuesgetputclose))))
This procedure does a few interesting things. First, it accepts a done
parameter. This is a Fibers condition so
that we can wait on a remote peer to collect the shared file before exiting,
which is handled elsewhere.
Next, this procedure creates a Magenc memory backend. Backends are simply data
stores wrapped in a Scheme record so they can be used with an abstract
interface. They provide three procedures associated with three fields of the
record type: backend-get, backend-put, and backend-close, accessed using
retrieve-backend-data, store-backend-data, and close-backend,
respectively. Here we wrap each of the underlying backend's procedures in a new
procedure so we can massage our inputs to work over OCapN.
put calls the memory backend's put procedure unmodified.
get does a bit more. Because we are using OCapN to communicate between peers,
and because OCapN doesn't have a dedicated type for generic records, we convert
exact topics into strings when sending messages between Goblins actors. get
thus converts exact topic strings back into exact topic records. We could
instead have written a marshaller to convert our record into an OCapN tagged
value, but there was little reason to do so.
close calls the underlying backend's close then signals the done
condition.
Finally, the last call in this procedure is to Magenc's connect-to-store*
helper which constructs a store interface (similar in shape and function to a
backend interface) to be passed to chunk-and-store-data. As you can see, it
takes two arguments: a symbol identifying its type, and a thunk which returns
three values: get, put, and close. (The higher-level connect-to-store
supports keyword arguments and is used to construct Magenc's built-in backends.)
The last and likely most interesting piece of GoblinShare's send logic is the
part that actually deals with Goblins. All it does is wrap a store in an actor,
spawn a Tor netlayer, and return a sturdyref to the wrapper actor:
(define(spawn-client-srefstore)(define(^clientbcomstore)(methods((getid)(retrieve-datastoreid))((put._)(error"cannot put with client capability"))((close)(close-storestore))))(definemycapn(spawn-mycapn(spawn^onion-netlayer)))(::mycapn'register(spawn^clientstore)'onion))
The actor is ^client. As you can see, we override the backing store's put
procedure to prevent remote peers from writing unexpectedly. Otherwise, we use
Magenc's store interface to manipulate the store normally.
The last two lines pack a lot of logic. First, we spawn an ^onion-netlayer
representing a Tor netlayer. We immediately pass this into spawn-mycapn,
creating a new ^mycapn object populated with that netlayer. Then, we spawn a
new ^client, immediately passing it to our ^mycapn object's register
method to get a promise to a sturdyref referencing the object. Whew! All that
in only two lines!
We bring all of the send logic together in gs-send:
As you can see, gs-send requires a filename and optionally accepts an output
port where it will write the resulting magnet URL. This is to facilitate tests
and is not exposed in the commandline client.
The procedure starts by creating a Fibers condition which it passes to
connect-to-goblinshare-server-store to create a store.
Next, it passes the resulting store to Magenc's chunk-and-store-data with
default parameters ‒ so, using AES-GCM encryption and generating a new key
appropriate for that cipher ‒ and a port for reading the input file. It
assigns the returned magnet URL to a new variable.
Then, gs-send spawns a vat and enters a vat context. There, it spawns a
client actor to wrap the store and resolves the sturdyref to the client actor.
It adds the resolved sturdyref to the magnet URL and writes out the result.
Finally, gs-send waits for the done condition to be signaled.
receive
The receive command relies on only one procedure for most of its logic:
(define(connect-to-goblinshare-client-storeclient-sref)(definevat(spawn-vat#:name'goblinshare-client))(definemycapn(with-vatvat(spawn-mycapn(spawn^onion-netlayer))))(defineclient(with-vatvat(::mycapn'enlivenclient-sref)))(define(put._)(error"cannot put with client capability"))(defineget(let((ch(make-channel)))(lambda(exact-topic)(with-vatvat;; CapTP doesn't support records so we turn exact topics into strings
(on(<-client'get(exact-topic->stringexact-topic))(lambda(val)(syscaller-free-fiber(lambda()(put-messagech`(ok,val))))#t)#:catch(lambda(exn)(syscaller-free-fiber(lambda()(put-messagech`(error,exn))))#f)))(match(get-messagech)(('okval)val)(('errorexn)(raise-exceptionexn))))))(define(close)(with-vatvat(<-npclient'close)))(connect-to-store*'goblinshare-receive(lambda()(valuesgetputclose))))
There are more lines here than in connect-to-goblinshare-server-store, but the
gist is simple. connect-to-goblinshare-client-store takes a sturdyref, which
may be a promise thanks to promise pipelining.
It starts by spawning a vat, spawning a ^mycapn with a new ^onion-netlayer,
and enlivening the client actor. These steps mirror the steps used to spawn the
client sturdyref above. Rather than registering an actor to get a sturdyref,
here we enliven a sturdyref to get an actor.
Next, connect-to-goblinshare-client-store wraps each method of the client
actor in a regular Scheme procedure for Magenc's store interface.
put and close are quite simple. The former errors out to avoid unneeded
network access should it be called. The latter sends the close message to the
remote actor.
get looks complicated, but most of the code is there to resolve a promise into
a concrete Scheme value. The core functionality is (<- client 'get (exact-topic->string exact-topic)). This line encodes the exact topic as a
string, messages the client actor's get method with that string, and returns
the resulting promise. When that promise fulfills or breaks, the surrounding
logic propagates the result as normal.
Finally, connect-to-goblinshare-client-store creates a new store with these
wrapper procedures using connect-to-store*.
The resulting store is used by gs-receive to get the desired file:
This procedure requires the magnet URL to retrieve, and optionally accepts an
output port where it will write the resulting data. This is exposed through
goblinshare receive's --output option.
First, gs-receive creates a store as discussed above, extracting the sturdyref
string from the magnet URL and converting it into the appropriate Scheme type.
Then, it decomposes the magnet URL into its components and passes them, along
with the store and a port receiving the result, to Magenc's
retrieve-and-unchunk-data. Finally, it writes the result and informs the
remote store that it's done.
And that's it! All the other code in GoblinShare is for the UI.
I told you it was simple!
Final thoughts
It's worth mentioning that neither Magenc nor GoblinShare are intended as
finished, production software. Notably, the underlying cryptography has not
been audited. Additionally, there are some improvements I'd like to make to
Magenc's API which would make GoblinShare simpler.
That's okay, though. These projects are demonstrations. Magenc is intended
to demonstrate the basic concepts of capability-secure distributed, encrypted,
content-addressed data storage. GoblinShare, for its part, is supposed to show
how easy it is to implement otherwise-complex functionality with Goblins, which
I think it does. As a rough comparison, Magic Wormhole is about 11,500 lines of
Python code, counted with similar caveats as those for Magenc and GoblinShare,
and relies on a much longer list of dependencies.
All told, I am incredibly happy with how things turned out. GoblinShare is
beautifully simple and useful. I hope it shows others how easy Goblins makes
networked applications and inspires more neat software.
I actually love fonts! (Which might surprise those who know that I leave my webpage set to the browser default font, but it’s * I love good fonts so much that I think fonts should be a reader decision, not a server decision.) When my desktop went headless after having to move to a way smaller apartment and my Emacs had to live in SSH only, I was sad because I had set up all kinds of weird fonts in Emacs and functions to switch the entire Emacs over along with “only switch this particular buffer over” variants. E.g.
And now I have all those fun fonts working again on the Android version of Emacs! ♥︎♥︎♥︎
Hence the huge sizes. I was on 12 for most non-Junicode fonts with a -16 “big” option, while Junicode with its lower x-heights I had at 14 pts. But on “Moria”, I use 28 as the “small” size, 34 as the big size and Junicode gets to be 32.
My default font for coding and general emacsing around was (CW non-free) Futura while I wrote most prose texts with Junicode. Coding in proportional?! A geometric sans with a super ambiguous character set for I1O0? Well, it works great for Lisp for the most part and I had it set up so I could super easily toggle out from it back into Deja Vu Mono, or Fira Code these days.
I set this all of this up pretty soon before having to switch away from it so I only got to enjoy it for a few months so I’m grateful that I have it back albeit not on my big 21″ 4:3 screen. It’s on a more cozy and puny lantern-lit 7″ screen. Last piece of the puzzle is that I’m gonna go find a retro-ish typewritery font. Old Timey Code maybe. I was on “Bohemian Typewriter” but I couldn’t get that to work on since the Android version of Emacs don’t support OTF. And good riddance since the latin-1 coverage was so bad, but also not good riddance since the alternatives I’ve found so far are more consistenly x-spaced instead of nostalgically jittering around like an X-File on uppers.
I really like switching fonts with the same text open. It helps me get fresh eyes on the same text and see different problems or things I can write in a more beautiful or clear way.