ActivityPub is the protocol that powers
the Fediverse. Not only does it allow different instances of the same app to
federate with one another, it also allows different apps to federate. For
example, a post on a video hosting app could federate to a microblogging app.
ActivityPub does a good job of this and is a leap forward from what
came before it.
For those unfamiliar, ActivityPub is a decentralized social networking protocol
standardized under the W3C. Both Spritely’s Executive Director, Christine
Lemmer-Webber, and I (Jessica Tallon) worked on standardizing
ActivityPub. The ActivityPub specification left holes for identity,
distributed storage, and more. Since then Spritely has been a continuation of
this work, researching and developing the next generation of social web
infrastructure.
But where does this leave ActivityPub? Has it been abandoned by Spritely as a
stepping stone? No! We’ve long had a project (codenamed Mandy) on our roadmap to
implement ActivityPub on top of Goblins. If you open up the ActivityPub
specification you’ll actually see
mention of actors. The protocol
itself is designed with the actor model in mind. Since Goblins is an
implementation of the actor model, they should be a natural fit.
Goblins actors over HTTP
ActivityPub is a protocol on top of HTTP, but Goblins doesn’t use HTTP. So, the
first step was to make Goblins actors available over HTTP. Fortunately, hooking
this up was quite easy. There are many different ways we could do this, but for
this prototype I took a fairly simple approach.
Guile has a
built-in web server.
Not only that, but Fibers (the concurrency
system Goblins uses) provides a backend for it, which means we can pretty
quickly start handling requests.
The webserver can be started using the run-server procedure. It takes in a
symbol which specifies an implementation ('http would be the built in HTTP
server, 'fibers is the one provided by Fibers):
(run-server handler 'fibers)
The handler is a procedure which takes a request and a body and expects the
HTTP response as the return value. When writing a typical HTTP server in Fibers
we’d suspend the fiber until the response is ready. However, Goblins code is
built around sending messages and waiting for promises to resolve. To bridge the
API between these two different philosophies, we use a channel.
Channels allow two fibers to send a message to one another. Reading from or
writing to a channel causes the fiber to suspend until the message can be
exchanged. We can then send a message to one of our actors and use on to listen
for the response; once we have it, we can write it to our channel.
Goblins vats are event loops which run on a fiber. These event loops manage a
queue of messages sent to actors spawned within that vat, which are processed
sequentially. If we were to just write to the channel, we would suspend the
underlying fiber for the vat. When the vat’s fiber suspends, it stops it from
processing other messages within the queue. To ensure that we’re not blocking
the vat by suspending it, we’ll use the helper procedure syscaller-free-fiber
which gives us a new fiber outside the vat which can be safely suspended.
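Putting that together, a simplified sketch of the handler looks something like
this (handle-request and web-vat are illustrative names, router is the dispatch
actor shown a little further down, and the channel procedures come from
(fibers channels)):
(define (handle-request request body)
  (define response-channel (make-channel))
  (with-vat web-vat
    (on (<- router request body)
        (lambda (response)
          ;; Writing to the channel would suspend the vat's fiber, so we
          ;; do it from a fresh fiber outside the vat instead.
          (syscaller-free-fiber
           (lambda ()
             (put-message response-channel response))))))
  ;; Only this request's fiber suspends here, until the actor replies.
  (get-message response-channel))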
This is a slightly simpler version than the one used in the prototype, but it
shows how we’re making asynchronous actors which can return promises accessible
to HTTP requests. From the code above, we’ve already bridged into our
Goblins actors. This is a pretty flexible bridge as this ^router actor just
takes in a request and provides a response, we could dispatch this request in
any number of ways. For our prototype, this is the approach we took:
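(The sketch below captures the shape of it rather than the prototype’s exact
code; ^nonce-registry, serve-static-file, base32-decode, not-found and the
registry’s method name are stand-ins, match comes from (ice-9 match), and the
URI accessors from (web request) and (web uri).)
(define registry (spawn ^nonce-registry))

(define (^router bcom)
  (lambda (request body)
    (match (split-and-decode-uri-path (uri-path (request-uri request)))
      ;; Static files, e.g. /css/main.css
      (("css" filename) (serve-static-file filename))
      ;; ActivityPub objects live at /object/<base32-encoded-id>
      (("object" id) (<- registry 'get-object (base32-decode id)))
      (_ (not-found)))))

(define router (spawn ^router))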
Most web frameworks have a special-purpose routing language that uses strings,
which is an inexpressive anti-pattern.
We’re fortunate in Scheme to have a powerful
pattern matcher
that we can use instead. In this case we’re matching a filename for our static
files and we’re also making some objects available at
/object/<base32-encoded-id>.
We can see that, above the router, we’re spawning something called the
nonce registry.
This is an actor which provides a mechanism to register any number of actors
against some identifier and look them up. The actor handles salting and hashing
the IDs and even persistence. This works great for registering ActivityPub
objects. Each one gets a unique ID which can be used for lookup later.
These IDs are bytevectors, so we base32 encode them to convert them into a
textual form that can be included in a URI. We then just need to add a route to
our match clause to look them up. You may notice we’re using <- to send
messages which means we get a promise in return. This isn’t a problem for our
web server though as the on handler will wait until it’s resolved.
The prototype has more routes and handles slightly more situations than the
snippet of code shown above, but the principles introduced are the same.
How does ActivityPub work?
Let’s take a step back and look at ActivityPub itself, both because how it
works will be useful to keep in mind for the rest of the post, and to see if we
can spot the actor model within the specification.
ActivityPub is actually not too tricky. It has concepts like inboxes and
outboxes that you’re probably familiar with from email. It also has activities
which describe something the user is “doing”. Activities are the building block
of the protocol. Finally, it has objects which are things like Notes, Images,
Videos, etc.
ActivityPub is actually two protocols in one. There’s the client-to-server
protocol and then the federated server-to-server protocol. These protocols are
actually very similar and for the most part mirror one another, but
unfortunately the client-to-server protocol gets little love from ActivityPub
implementations. Even so, let’s take a look at how I might go about posting a
Note (think toot/tweet/microblog text of choice) to my good friend Christine:
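(The JSON below is an illustrative reconstruction of such a post; the addresses
are made up.)
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Create",
  "to": ["https://social.example/object/<christine-id>"],
  "object": {
    "type": "Note",
    "content": "Ohai Christine!"
  }
}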
The JSON object above represents a
Create activity
which basically is posting something. Other activities might be Like, Share,
Delete, etc. Most activities are transitive (the activity has an object) and
our Create activity is no exception. The object inside is a Note with some
content.
The activity itself is posted to my outbox as an HTTP POST. The server will
assign IDs to both objects and assign me (Jessica) as the author of the note. If
you were to do a GET on the outbox, you’d see something like this:
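(Again an illustrative reconstruction; the <…-id> parts stand in for the
base32-encoded IDs the server generates.)
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "OrderedCollection",
  "totalItems": 1,
  "orderedItems": [
    {
      "id": "https://social.example/object/<create-id>",
      "type": "Create",
      "actor": "https://social.example/object/<jessica-id>",
      "to": ["https://social.example/object/<christine-id>"],
      "object": {
        "id": "https://social.example/object/<note-id>",
        "type": "Note",
        "attributedTo": "https://social.example/object/<jessica-id>",
        "content": "Ohai Christine!"
      }
    }
  ]
}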
Reading from a user’s outbox gives you all the things they’ve posted (often it’s
paginated, but that was left unimplemented in the prototype). You can see that
the note we posted is the only item and the server has assigned IDs to the
activity, note, and author.
The server also should deliver this activity and note to Christine by looking
up her inbox and doing an HTTP POST to it with the activity. I’m not going to
go any further into federation but it’s very similar to the client-to-server
API.
ActivityPub meets Goblins
If you’ve worked with Goblins or other actor frameworks you might be thinking
“these activities look an awful lot like messages between different objects” and
you’d be right.
That Create activity we posted above could be written something like this:
(define note (spawn ^note #:content "Ohai Christine!"))

(<- tsyesika-outbox        ; The outbox
    'create                ; A method called create
    #:object note          ; Create an object
    #:to (list christine)) ; Send it to Christine.
The outbox can then implement whatever it needs to for its Create activity
(assigning an ID, federating the post out, etc.), just like any other
Goblins actor would implement a method.
The prototype I’ve built does a fairly simple transformation from the
unmarshalled JSON data. The JSON data is accepted and parsed to an association
list. Then there is an unmarshalling step where any nested objects get
converted into their corresponding Goblins actors. The result is an activity
actor which looks like this:
ActivityPub is based on Activity Streams 2.0 which has a hierarchical structure.
The Activity type extends from a base type of Object. This is represented in
the Mandy prototype as parent.
We can then use this very simple procedure which takes an activity and converts
it to a message:
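(The sketch below captures the idea rather than the prototype’s exact code; the
accessors activity-type and activity-properties are hypothetical, and
append-map comes from (srfi srfi-1).)
(define (activity->message activity)
  ;; (activity-type activity)       => e.g. 'create
  ;; (activity-properties activity) => e.g. ((object . <note>) (to . (<christine>)))
  (cons (activity-type activity)
        (append-map (lambda (prop)
                      (list (symbol->keyword (car prop)) (cdr prop)))
                    (activity-properties activity))))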
This produces our message as a list with its method name and a bunch of keyword
arguments. The methods can then be defined as normal Goblins methods. If a
method doesn’t need all the keywords, it can include #:allow-other-keys so
that everything else is ignored.
As an example of this, let’s look at the Collection type, which is basically a
list of objects. In the prototype it implements an add method corresponding to
the Add activity, which the Activity Streams vocabulary describes like this:
Indicates that the actor has added the object to the target. If the
target property is not explicitly specified, the target would need to be
determined implicitly by context. The origin can be used to identify the
context from which the object originated.
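The prototype’s implementation isn’t reproduced verbatim here, but a minimal
sketch of such a collection actor could look like this:
(define (^collection bcom items)
  (methods
   ;; The add method only cares about #:object; any other keywords
   ;; (such as #:target) are simply ignored.
   ((add #:key object #:allow-other-keys)
    (bcom (^collection bcom (cons object items))))
   ((get-items) items)))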
In this case, there might be two things which would be important to this method,
the first being the #:object keyword and the second being the #:target.
Since the collection is being sent the add message, it’s being assumed in the
above code that the sender has figured the collection out. The add method
then just needs to care about the object, in which case it specifies object as
the only key and ignores anything else. Finally, the behavior is
straightforward: it adds the object to the collection.
Hopefully the above shows how we can take these ActivityPub activities and
transform them into Goblins messages. This gives us the desired Goblins
ergonomics while implementing ActivityPub objects.
Going further
The prototype I implemented was a demo exploring both the
Goblins actor HTTP interface and how ActivityPub might be implemented in an
actor framework like Goblins. Having helped co-author ActivityPub and then
develop Goblins, I’ve long mused about how such an implementation might look,
but it’s been very exciting to see it work in practice.
This demo has explored both an HTTP interface with Goblins actors and
ActivityPub. I think each one has a lot of potential for future work and I’d
love to see Goblins applied in building websites. Goblins could be used both to
build the backend of websites by handling the HTTP requests themselves, and in
the browser by using Hoot.
There are many aspects of an ActivityPub implementation left to explore; for
instance, Goblins’ persistence system would be well suited to be our database.
We could explore adding federation (which Goblins, being a distributed
framework, is well suited to implement). Hopefully in the future we’ll be able to build on
this experiment more.
If you found this blog post interesting, both Christine Lemmer-Webber and I
will be giving a FOSDEM 2026
talk
on this. We’d love to see you there if you’re attending, but if not
the video will be posted shortly after the event.
It's been on our todo list for a very long time to create
some system to track Gauche extension packages.
It is trivial to create a site where users can put
the info. What's not trivial is how to keep the info updated.
It's a burden to the user if we ask them to keep updating
such info whenever they update their extension package.
If a user puts their website/email for the package,
but then moves away from Gauche development, and eventually
the site/email become inactive and go away, we don't know
what to do with the entry; it'd also be difficult if
somebody wants to take over the project.
Should anybody be able to update the package's info? Limiting it to
the original authors becomes an issue if they go inactive and
out of touch. Allowing it may cause a security issue if
someone replaces the distribution URL with a malicious one.
To vet the users entering info, we need some mechanism of
user registration and authentication, which adds another
database to maintain.
These implications kept us from implementing the official
mechanism to provide the extension package registry.
Things have changed in the last decade or so.
First, distributed VCS and their hosting services have become
a norm. Instead of having personal websites to serve
extension package tarballs and documents, developers
can put their repository on one of those services
and make it public.
Recent Gauche provides a standard framework for building extensions.
One important aspect of it is package.scm in the
source tree to keep meta information about the package,
including version number, authors, "official" repository URL, dependencies, etc.
So, once we know the package's repository URL, we can get its
meta information!
The author updates package.scm as the development proceeds,
because it is a part of the source. No need to update information
on the registry separately.
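For illustration, a package.scm carries metadata roughly like this (the fields
follow the template gauche-package generates; the exact set varies per package):
(define-gauche-package "Gauche-example"
  :version "1.0"
  :description "An example extension"
  :require (("Gauche" (>= "0.9.12")))
  :authors ("A. U. Thor <author@example.com>")
  :licenses ("MIT")
  :homepage "https://example.com/Gauche-example"
  :repository "https://github.com/author/Gauche-example.git")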
Anybody can create an account on those services, but the service
gives a certain identity to each one and a place to interact with
each other. Sure, people move away eventually, but it's rare that they
bother to remove the repositories; and it's easy to inherit
the abandoned project.
We already have an official way to state such transfer of control
in package.scm (the superseded-by slot). If the successor
can contact the original author/maintainer/committer, the package.scm
in the original repository can have a superseded-by field
pointing to the new repository. It is not mandatory, but it can
make it clear where the "official" successor is.
In other words, we can use the existing public repositories as the metadata
database, and merely maintain pointers to them by ourselves.
So, how to manage those pointers? We don't have thousands of
extension packages updated daily, so we don't need a sophisticated
database server for it.
We decided to piggyback on the public DVCS service again.
The Gauche package repository index GitHub repo maintains
the list of package urls under its packages directory.
If you want your packages to be listed, just fork it,
add your package, and send a pull request.
(If you don't want to use GitHub, just send a patch
via email.)
Which repository is added when, by whose request, is recorded
in the commits of that repo.
Currently, pulling metadata and reflecting it on the webpage
is done in an occasional batch process. We'll adjust the frequency
as it goes. If we ever get very popular and start receiving
tons of new package registration requests, we might need to upgrade
the system, but until then, this will be the least-maintenance-cost
solution.
To be in the registry, your extension package needs package.scm.
I scanned through the existing list on the wiki
(WiLiKi:Gauche:Packages) and added the packages for which I could find
a public repository with package.scm.
If your extension is old enough not to have package.scm,
a convenient way is to run gauche-package populate in
the top directory of the source tree. It gives you a template
package.scm with some fields filled by whatever information
it can find.
Pattern matching extends Scheme's repertoire of conditional constructs, allowing
decomposition of compound data structures and binding their parts to variables.
This SRFI proposes one such construct, match, which offers all
the core functionality specified in SRFI 200,
while extending it with support for non-linear patterns and backtracking.
The proposed construct is modular and supports easy extension through the define-match-pattern
mechanism. It can be implemented portably in R⁷RS-Small.
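For instance (an illustrative use of the proposed construct), a non-linear
pattern repeats a variable, and the match succeeds only when both occurrences
hold equal values:
(match '(1 2 1)
  ((x _ x) (list 'ends-match x))
  (_ 'no-match))
; => (ends-match 1)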
I had a dream the other evening, in which I was at a large event full of
hackers—funny, that this is the extent of my dreams at the moment; as a
parent of three young kids, I don’t get out much—and, there, I was to
receive an award and give a speech. (I know, I am a ridiculous man,
even when sleeping.) The award was something about free software; it
had the trappings of victory, but the vibe among attendees was numbness
and bitter loss. Palantir had a booth; they use free software, and
isn’t that just great?
My talk was to be about Guile, I think: something technical, something
interesting, but, I suspected, something inadequate: in its place and
time it would be a delight to go deep on mechanism but the moment seemed
to call for something else.
These days are funny. We won, objectively, in the sense of the goals we
set in the beginning; most software is available to its users under a
free license: Firefox, Chromium, Android, Linux, all the programming
languages, you know the list. So why aren’t we happy?
When I reflect back on what inspired me about free software 25 years
ago, it was much more political than technical. The idea that we should
be able to modify our own means of production and share those
modifications was a part of a political project of mutual care: we
should be empowered to affect the systems that surround us, to the
extent that they affect us.
To give you an idea of the milieu, picture me in 1999. I left my home
to study abroad on another continent. When I would go to internet cafés
I would do my email and read slashdot and freshmeat as one did back
then, but also I would often read Z magazine,
Noam Chomsky and Michael Albert and Michael Parenti and Arundhati Roy
and Zapatistas and all. I remember reading El País the day after “we”
shut down the World Trade Organization meeting in Seattle, seeing
front-page pictures of pink-haired kids being beat up by the cops and
wishing I were there with them. For me, free software fit with all of
this: the notion that a better world was possible, and we could build it
together.
I won’t lie and say that the ideals were everything. I think much of my
motivation to program is selfish: I like to learn, to find out, to do.
But back then I felt the social component more strongly. Among my
cohort, though, I think we now do free software because we did free
software; the motive sedimented into mechanism. These are the spoils of
victory: free is the default. But defaults lack a sense of urgency, of
the political.
Nowadays the commons that we built is the feedlot of large language
models, and increasingly also its waste pond. The software we make is
free, but the system in which it is made is not; Linux Magazine 1, Z
magazine 0.
All of this makes me think that free software as a cause has run its
course. We were the vanguard, and we won. Our dreams of 25 years ago
are today’s table stakes. Specifically for my copyleft comrades, it
seems that the role of copyright as a societal lever has much less purchase; taken to its conclusion, we might find ourselves siding with
Disney and OpenAI against Google.
If I had to choose an idea from the 90s to keep, I would take “another
world is possible” over the four freedoms. For me, software freedom is
a strategy within a broader humanist project of liberation. It was
clever, in that it could motivate people from a variety of backgrounds
in a way that was on the whole positive for the humanist project. It
inspired me as a meaningful way in which I could work towards a world of
people caring for each other. In that spirit, I would like to invite my
comrades to reflect on their own hierarchy of principles; too often I
see people arguing the fine points of “is this software free” according
to a specific definition without appreciating the ends to which the
software freedom definition is a means.
Anyway, it turns out that I did win something, the Award for the
Advancement of Free
Software,
for my work on Guile over the years. My work on Guile has waxed and waned,
and in these last few years of parenthood it has been rather the latter,
but I am proud of some of the technical hacks; and it has been with a
heart-warming, wondrous delight that I have been a spectator to the rise
of Guix, a complete operating system built on
Guile. Apart from its quite compelling technical contributions, I just
love that Guix is a community of people working together to build a
shared project. I am going to the Guix
days in a month or
so and in past years it has been such a pleasure to see so many people
there, working to make possible another world.
In my dream, instead of talking about Guile, I gave a rousing and
compelling impromptu invective against Palantir and their ilk. I
thought it quite articulate; I was asleep. In these waking hours, some
days later, I don’t know what I did say, but I think I know what I would
like to have said: that if we take the means of free software to be the
ends, then we will find ourselves arguing our enemies are our friends.
Saying that it’s OK if some software we build on is made by people who
facilitate ICE raids. People who build spy software for controlling
domestic populations. People who work for empire.
What I would like to say is that free software is a strategy. As a
community of people that share some kind of liberatory principles of
which free software has been a part, let us use free software as best we
can, among many other strategies. If it fits, great. If you find
yourself on the same side of an argument as Palantir, it’s time to back
up and try something else.
Ever since I worked on the Android project nearly twenty years ago, I've wanted to write mobile apps in
Scheme. But the early Android API made it hard to run a good Scheme
implementation, and the Android team strongly discouraged the use of
languages other than Java. Those limitations are long gone. Still,
whenever I got the energy to dive into Android Studio, the NDK, and the latest APIs, build problems and other random complexity
would frustrate me, and I would give up and switch to another project.
But now we have LLMs. They are exactly what I needed to get past the
slog and into the interesting parts of the project. I'm so glad I tried again.
My new project, Schmeep (Github), is an Android app built around Chibi Scheme, Alex Shinn's R7RS-Small-compatible Scheme implementation. It uses RAX, a simple framework
inspired by HTMX, to make HTML-based user interfaces in Scheme. No web server is
required. Instead, events are delivered to your Scheme code, which
returns new HTML fragments to update the UI.
Schmeep includes a simple Linux program that opens a REPL on the phone over Bluetooth. The whole project builds quickly using a
simple Makefile. This means that you rarely wait more than a few seconds to see the
effects of a change.
Schmeep has just come to life, but I'm eager to build a real app with
it. My plan is to experiment with writing local-first software in Scheme. I'll write more here when I do.
Am I the only one who often gets obsessed with stuff they discovered in really weird ways? I watch a film and some character listens to some music on their car stereo; I look it up and have a new favourite band for the next few months. That kind of thing.
Along those lines, quite a while ago I made an unfunny remark on Mastodon about the tendency of some widely-respected computer scientists to use a certain font in their presentations; just recently, prompted by referring to that post for the nth time – so much for jokes getting funnier by repetition – I decided to look into why they do this, even though that particular font is widely disliked. The answer turned out to be pretty much that they don’t care, and they wish people would focus on the content instead of the font. Fair enough.
Except someone also mentioned that that particular font might be good to use because it’s allegedly easier to read for dyslexic people.
Ah. Hmm. Hrrmph. Well, maybe I need to do something about this in future as well, then!
Cue a whole morning spent overthinking this!
The first thing to do was to find out whether that claim was really true of that font in particular. Answer: unsurprisingly, initial research pointed to ‘no’; some people say it helps, but just as for non-dyslexic people, opinions vary and you can’t generalize over ‘dyslexic people’ as a class here. For one thing, there are very different types of dyslexia. The chap who wrote that Reddit post says it only reminded him of the patronizing treatment he got as a dyslexic kid at school; again, that’s fair enough.1
There are a couple of fonts – Dyslexie and OpenDyslexic – which claim to be designed to be helpful for dyslexic people. The idea of both seems to be the same: somehow making the stroke weight heavier at the bottom of the font maybe gives the letters more ‘gravity’, as if that would help them stay in place better?
Type designer David Jones happened upon my posts fretting about this on Mastodon and was able to point me to a handy introduction to the topic of legibility for dyslexic folk by big-shot typographer Chuck Bigelow. Bigelow looks at the actual scientific research: on this kind of bottom-heavy font in particular, it’s a wash.
Sidebar! Personal hypothesis about why these fonts might feel like they work for some dyslexic people
Readers recognize words by their whole shape, not a letter at a time. The more common the shape, the easier it is to recognize the word. This applies for dyslexic people just as much as non-dyslexic people. For (some) dyslexic people, it’s just harder to pick apart the differences between words that have similar shapes.
This is why place names on road signs are in mixed case, not upper case. In the UK, for example, they used to be in upper case, with the idea that because upper case is bigger it would be easier to read from further away. But people are more used to seeing place names written in running text in mixed case, so the shape is easier to recognize in mixed case. So in 1963 it was fixed to require ordinary mixed case on road signs.
My guess as to why the dyslexic fonts might work for some people is just that they have such unusual letterforms compared to normal fonts that they make the shapes of words unfamiliar enough as to force word recognition to fall back to a lower level. In the same way that non-dyslexic readers have to fall back on letter-at-a-time (or morpheme-at-a-time) recognition to deal with unfamiliar words like quixolotl (a quixotic axolotl) or all-caps words like ASTROLABE, these fonts make everyone fall back to engaging more with the shapes of individual letters, reducing confusions between similar-looking words.
The real problem with this is that it also implies that the more dyslexic readers get used to OpenDyslexic/Dyslexie, the more they’ll get used to identifying words by shape in them, and the more they’ll go back to struggling to tell similarly-shaped words apart.
This is just an educated guess. It would be interesting to come up with an experiment to see if it could be tested.
To me, promoting these fonts now feels like a kind of inclusionist techno-solutionism all too common on the hackerly left, like the able-bodied Mastodon HOA insisting on alt texts that actually make life worse for those who depend on assistive technologies. With this one weird trick, you too can help people overcome their disability! Unfortunately, inclusion and accessibility are hard (though that certainly doesn’t mean they’re not worthwhile!) and there isn’t one weird trick to include everyone without accidentally disadvantaging some. If there is something you can do to try to include more people, it needs careful thought and consideration about the best way to make that affordance available, and about how to avoid accidentally hindering more people than you help – or worse, hindering the people you’re actually trying to help. In this case, for individual dyslexic people who feel like these fonts actually do help them read, and know where to find the setting to make websites display in those fonts, sure;2 but for a general audience, maybe not – we need a different solution here.
Bigelow does point to some clear results saying there’s a change you can make that makes readability measurably better for dyslexic people: make the font bigger!
That study has some limitations: the sample size was small and included only children, but it seems it was prompted by a previous study which included adults and gave initial positive results for increased font size across age ranges, prompting further investigation by the authors, so the indications are good that this isn’t a fluke or limited to children. The other major limitation is that both studies concern passively lit running text at close distance, not bullet points projected on a wall on the opposite side of the room; intuitively that doesn’t feel like a relevant factor. Also, increasing the font size only works to a certain point, at which point the benefits stop increasing (but, crucially, things don’t actually get worse), which also makes intuitive sense: as a non-dyslexic person, very fine print is harder to read than normal-size print, but after a certain point it makes no difference – 6 pt type is pretty hard going, but I can read 10 pt type just as well as 72 pt type. The result of the study is specifically that this threshold seems to be about a third higher for dyslexic children.
Of course, bigger text on a slide means you can fit less on, but I’m of the view that less text per slide is better anyway. I’m still not very good at practicing what I preach on that topic, so encouragement from a larger font size probably ends up being good for me as presenter as well! And, of course, a bigger font size is probably going to be better for audience members with vision impairments too. Wins all around.
So I boosted the font size of the slides I was working on by about 50% where I could; slightly less in some cases. The change also prompted me to redesign or split up some otherwise overcrowded slides.
Bigelow points to two other positive results. One is for a shorter line length: with a fixed width projector screen, a bigger font means shorter line length anyway, so that’s an easy win as well. But also that seems more relevant for running text than for short points on a slide: if you have text that spans more than one line on a slide, that’s already suspect; three or more lines is barely even a slide any more – it’s stopped being a visual aid for speaking and turned into a copy and paste of the paper for reading.
The other vaguely positive result is for slightly increased letter spacing (which David Jones suggested might be the reason the font discussed at the start of this post works for some dyslexic people). The results here are less clear than for font size or line length, but choosing a familiar font with slightly looser spacing would also be a change that would be unlikely to make anything worse for some people. I can kind of see the problem here: at the moment I use Helvetica, and the letters ‘tt’ in the word ‘pattern’ sometimes join together at the crossbars, making them look like a single letter.
I looked at the fonts Andika and Atkinson Hyperlegible as potential alternatives that are specifically designed to avoid situations that make words difficult to read or tell apart, but I haven’t yet made up my mind about switching. Atkinson in particular seems to be aimed at helping people with vision impairment more than dyslexia – not that that would be a bad thing, but it also can’t be denied that some of its letterforms deviate from the principle of familiarity of form which seems significant in helping dyslexic people. From my anecdotal experience with my own short-sightedness (only −1.25 dioptres in both eyes, admittedly), it also feels like I probably did much, much more to help by increasing the font size than I could do by switching to a different font; from my experience, probably like an order of magnitude more. For my personal stuff in general, I’ve mostly standardized on Adrian Frutiger’s Univers (in the case of this website, through the noncommercial-licensed GhostScript digitization); perhaps I should look at using that on my slides as well. It feels like it has a little more character than Helvetica, perhaps simply through being less overexposed, and it does also have slightly looser tracking.
Anyway, make the font bigger – for real! I think there are only wins to be had in that direction. You’ll make better, simpler slides; your vision-impaired and dyslexic audience members may thank you.
I sympathize with unhappy school memories associated with that font – to me, it only reminds me of the handouts my school German teacher gave us. Funnily (considering where I now live), German was my worst subject in school, and the teacher was an authoritarian ass who liked to pick on weaker students, so those aren’t good memories for me either.↩︎
Incidentally, designers, have you tried out your website or app with a custom font setting recently? Have you checked that no button text is unnecessarily truncated; that no important text overflows onto a second, partially invisible line? Another benefit: if your thing is only (as yet) available in one language, making sure that everything works with a wider or narrower font is an important step to be ready for the day when a translator comes along and wants to change the whole text, which will have just as big an effect on the relative widths of all the links and buttons.↩︎
Alternative external formats for arrays and complex numbers
Gauche now recognizes a few different external formats for arrays and complex numbers. For writing, the format can be selected by a <write-controls> object, or by print-mode in the REPL.
Array literals
Gauche has been supporting srfi:25 arrays, but the SRFI does not define an external representation.
Gauche uses the srfi:10 #, mechanism to allow writing
arrays that can be read back, but it is not very user-friendly.
The #2a(...) notation is defined in srfi:163 Enhanced Array Literals, and in its
simplest form, it is also compatible with Common Lisp's array literals. From 0.9.16, it is
the default output format for arrays.
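For illustration (using gauche.array's srfi:25 constructors), a small array now prints in the #2a form by default:
(use gauche.array)
(array (shape 0 2 0 2) 1 2 3 4)
 ; prints as #2a((1 2) (3 4))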
You can also make Gauche report the lengths of each dimension:
There was an asymmetry in the input/output of complex literals. For reading, both the rectangular notation
1.4142135623730951+1.4142135623730951i and the polar notation 2@0.7853981633974483 are recognized, but
for printing, it was always the rectangular notation.
Now you can choose the output format.
Gauche also extended the polar notation by adding the suffix pi, e.g. 2@0.25pi, to specify
the phase as a multiple of pi.
The following session shows how a complex number is printed with different print-mode:
Furthermore, Gauche also supports the Common-Lisp style
complex notation, #c(...). This is particularly
useful for exchanging data between Gauche and CL programs.
You can change all instances of that particular list. Super dangerous.♥︎
Here, I’ll show you:
(let* ((abc '(a b c))
(abc2 abc)
(fish '(f i s h))
(fish2 fish))
(push! 'nope abc)
(mutate-cons! 'yeah fish)
(list abc abc2 fish fish2))
Returns this:
((nope a b c) (a b c) (yeah f i s h) (yeah f i s h))
That’s not a slag on push! which is often what you want. It’s a
macro modifying what one particular name points to. Whereas
mutate-cons! is a procedure that changes the actual underlying
object.
Also mutate-cons! doesn’t work on the empty list. You can only push
onto lists with at least one element. They need to be actual cons pairs.
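In case you’re curious how something like that can work at all, here’s a sketch of the underlying trick (a guess at the idea, not necessarily brev’s actual definition):
(define (mutate-cons! x pair)
  ;; Shift the pair's current contents into a fresh cdr, then overwrite
  ;; its car, so every reference to the pair now sees (x . old-stuff).
  (set-cdr! pair (cons (car pair) (cdr pair)))
  (set-car! pair x))
Since the pair keeps its identity, every binding that shares it sees the new head.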
I feel a change is happening in how people produce and (want to) consume software, and I want to give my two cents on the matter.
It has become more mainstream to see people critical of "Big Tech". Enshittification has become a familiar term even outside the geek community. Obnoxious "AI" features that nobody asked for get crammed into products. Software that spies on its users is awfully common. Software updates have started crippling existing features, or have deliberately stopped being available, so more new devices can be sold. Finally, it is increasingly common to get obnoxious ads shoved in your face, even in software you have already paid for.
In short, it has become hard to really trust software. It often does not act in the user's best interest. At the same time, we are entrusting software with more and more of our lives.
Thankfully, new projects are springing up which are using a different governance model. Instead of a for-profit commercial business, there is a non-profit backing them. Some examples of more or less popular projects:
Some of these are older projects, but there seems to be something in the air that is causing more projects to move to non-profit governance, and for people to choose these.
As I was preparing this article, I saw an announcement that ghostty now has a non-profit organisation behind it. At the same time, I see more reports from developers leaving GitHub for Codeberg, and in the mainstream more and more people are switching to Signal.
From a user perspective, free software and open source software (FOSS) has advantages over proprietary software. For instance, you can study the code to see what it does. This alone can deter manufacturers from putting in user-hostile features. You can also remove or change what you dislike or add features you would like to see. If you are unable to code, you can usually find someone else to do it for you.
Unfortunately, this is not enough. Simply having the ability to see and change the code does not help when the program is a web service. Network effects will ensure that the "main instance" is the only viable place to use this; you have all your data there, and all your friends are there. And hosting the software yourself is hard for non-technical people. Even highly technical people often find it too much of a hassle.
Also, code can be very complex! Often, only the team behind it can realistically further develop it. This means you can run it yourself, but you are still dependent on the manufacturer for the direction of the product. This is how you get, for example, AI features in GitLab and ads in Ubuntu Linux. One can technically remove or disable those features, but it is hard to keep such a modified version (a fork) up to date with the manufacturer's more desirable changes.
The reason is that the companies creating these products are still motivated by profit and increasing shareholder value. As long as the product still provides (enough) value, users will put up with misfeatures. The (perceived) cost of switching is too high.
Let us say a non-profit is behind the software. It is available under a 100% FOSS license. Then there are still ways things can go south. I think this happens most commonly if the funding is not in order.
For example, Mozilla is often criticised for receiving funding from Google. In return, it uses Google as the default search. To make it less dependent on Google, Mozilla acquired Pocket and integrated it into the browser. It also added ads on the home screen. Both of these actions have also been criticized. I do not want to pick on Mozilla (I use Firefox every day). It has clearly been struggling to make ends meet in a way that is consistent with its goals and values.
I think the biggest risk factor is (ironically) if the non-profit does not have a sustainable business model and has to rely on funding from other groups. This can compromise the vision, like in Mozilla's case. For web software, the obvious business model is a SaaS platform that offers the software. This allows the non-profit to make money from the convenience of not having to administer it yourself.
There is another, probably even better, way to ensure the non-profit will make good decisions. If the organization is democratically led and open for anyone to become a member, like Codeberg e.V. is, it can be steered by the very users it serves. This means there is no top-down leadership that may make questionable decisions. Many thanks to Technomancy for pointing this out.
Ah, good old volunteer driven FOSS. Personally, I prefer using such software in general. There is no profit motive in sight and the developers are just scratching their own itch. Nobody is focused on growth and attracting more customers. Instead, the software does only what it has to do with a minimum of fuss.
I love that aspect, but it is also a problem. Developers often do not care about ease of use for beginners. Software like this is often a power tool for power users, with lots of sharp edges. Perfect for developers, not so much for the general public.
More importantly, volunteer driven FOSS has other limits. Developer burn-out happens more than we would like to admit, and for-profit companies tend to strip-mine the commons.
There are some solutions available for volunteer-driven projects. For example, Clojurists Together, thanks.dev, the Apache Foundation, the Software Freedom Conservancy and NLNet all financially support volunteer-driven projects. But it is not easy to apply to these, and volunteer-driven projects are often simply not organized in a way to receive money.
With a non-profit organisation employing the maintainers of a project, there is more guarantee of continuity. It also can ensure that the "boring" but important work gets done. Good interface design, documentation, customer support. All that good stuff. If there are paying users, I expect that you get some of the benefits of corporate-driven software and less of the drawbacks.
That is why I believe these types of projects will be the go-to source for sustainable, trustworthy software for end-users. I think it is important to increase awareness about such projects. They offer alternatives to Big Tech software that are palatable to non-technical users.
I thought the do in Scheme was a li’l hard to learn since the
examples I could find were a li’l too fancy and clever. Just like a lot
of my own documentation often is; sorry about that.
(do ((a 0 (add1 a))
(e 13)
(i 7 (sub1 i)))
((zero? i) (print "ended ") (print a))
(print "game cube")
(print e)
(print a " wii"))
The first argument is a list of bindings like a let. But if you put in
three things they will be rebound every round. Like in the above
example, a will be rebound to (add1 a), and i will be rebound to
(sub1 i). And you can also put in just one thing like the example
here just binds e to 13 and then it just stays that.
The second argument is a list that starts with the ending condition
and the rest of the list are statements that will be evaled after.
And then all the remaining arguments are evaled every round.
Yes, this is backwards that the ending stuff comes before the main progn body.
Here’s a more basic example to just print happy birthday four times:
(do ((i 4 (sub1 i)))
((zero? i))
(print "happy birthday"))
Although for simple things like that, Chicken has dotimes:
(dotimes (i 4) (print "happy birthday"))
Both do and dotimes are pretty side-effects oriented but in both cases you can put in a return value:
(do ((i 4 (sub1 i)))
((zero? i) 'my-return-value)
(print "happy birthday"))
(dotimes (i 4 'my-return-value) (print "happy birthday"))
And that return value can access your do scoped binds.
The same miscmacros that gives us dotimes also has the even simpler (repeat 4 (print "happy birthday")).
I refer to my own fold write-up all the time (the
folds as named lets version hasn’t been as useful) and maybe with
this, I can use do and dotimes more instead of making let loops
and consing up unnecessary iotas.
Last time I attempted Advent of Code, I got tangled up modifying the
step one solutions to handle step two and then I ended up wanting
to revisit step one but they were gone. So this year I’m going to
try to paste a second copy before modifying and I hope that works out
better.
(define (dial acc r d)
(with (+ d r)
(when
(or
(zero? it)
(= 100 it)
(< it 0 d)
(< d 100 it))
(acc))
(modulo it 100)))
(define (dial acc (? (fn (< x -100)) r) d)
(acc)
(dial acc (+ r 100) d))
(define (dial acc (? (fn (< 100 x)) r) d)
(acc)
(dial acc (- r 100) d))
(define (dial acc (? string? x) y)
(dial acc (string->number (strse x "L" "-" "R" "")) y))
(define (turn)
(define-parameters zero 0)
(fold (c dial (fn (zero (add1 (zero))))) 50 (read-lines))
(zero))
This was one of the hardest bugs I’ve ever had to debug.
I’ve had to write log parsers, differs between different logs from
different versions, multiple implementations to check against each
other, Emacs highlight-regexp and count-matches and so on. It took
me ten tries on the Advent of Code website. I get paranoid that I
had mistyped my answer in there.
The line that now says (< it 0 d), originally I had it
as (<= it 0 d) but it gave false positives on rotating from zero.
For a while I had it as (and (< it 0) (<= 0 d)) which… doesn’t
fix that problem at all. Even after coming up with the fix (< it 0
d), that gives false negatives on rotating exactly one rotation
left from zero. But there’s no L100 in the data set? No, but my
code before I cleaned it up had:
(dial acc (+ r 100)
(dial acc -100 d))
where it now says:
(acc)
(dial acc (+ r 100) d)
…leading to lots and lots of zero to zero turns which came with a
false positives in some versions and false negatives in others.
A complete PEBCAK on my part but the hunt for the bug became a real
adventure of trying to sift through clues in logs that were
thousands of lines long.
After adapting that same idea to part two with a few minor tweaks,
it’s too slow! Works fine with the example data but not the full
input. I hate it when I have a working solution that
I really like because it does something clever but have to write a
whole new one that’s faster. This is a “Project Euler” type
problem where I need to come up with a math solution instead of
just list processing. But then I don't really hate it because I
did come up with a good solution.
Inverting the puzzle by making an is-in-any-range? predicate,
we can generate all invalid numbers up to the ceiling and see
if they're in any range.
(define (is-in-any-range? x)
(or
(<= 1 x 19)
(<= 51 x 69)
(<= 72 x 85)
(<= 86 x 113)
(<= 411 x 466)
(<= 525 x 652)
(<= 660 x 782)
(<= 859 x 1056)
(<= 1626 x 1972)
(<= 2768 x 3285)
(<= 4002 x 4783)
(<= 4919 x 5802)
(<= 7025 x 8936)
(<= 9096 x 10574)
(<= 13004 x 15184)
(<= 32138 x 36484)
(<= 48548 x 61680)
(<= 69302 x 80371)
(<= 82984 x 100358)
(<= 126397 x 148071)
(<= 193276 x 237687)
(<= 266408 x 302255)
(<= 333117 x 414840)
(<= 431250 x 455032)
(<= 528410 x 680303)
(<= 726807 x 764287)
(<= 779543 x 880789)
(<= 907442 x 983179)
(<= 2558912 x 2663749)
(<= 5117615 x 5149981)
(<= 7702278 x 7841488)
(<= 9231222 x 9271517)
(<= 13413537 x 13521859)
(<= 32295166 x 32343823)
(<= 49829276 x 50002273)
(<= 67606500 x 67729214)
(<= 99990245 x 100008960)
(<= 146086945 x 146212652)
(<= 4747426142 x 4747537765)
(<= 5552410836 x 5552545325)
(<= 5858546565 x 5858614010)
(<= 7454079517 x 7454227234)
(<= 8764571787 x 8764598967)
(<= 9999972289 x 10000034826)))
Okay, great! I checked that there’s no overlapping ranges in this
particular data set. That means we can make an idempotent summer so
we don’t add the same number twice.
Let’s hard code it to five-digit numbers which is okay for this
particular input.
(define (generate-the-answer)
(do ((num 1 (add1 num)))
((< 100000 num) (summer))
(add-all-repeats num num)))
Okay, that’s a relief! Today was a lot easier to debug. I
originally had the summer see the numbers even before they were
repeating. But that bug was easy enough to find and fix.
Dec 3rd
Okay, here we have a similar dilemma of “extracting” vs building up
possible joltages and filtering for them like (strse? "9.*9").
Maybe if I start with extracting, that will still be useful as a
fallback for any stragglers after a filtering solution.
That worked! Weird feeling how Monday took all day because I was
chasing a bug and even Tuesday took more than an hour, maybe closer
to three hours, but this one my idea worked right away and the
solution for part 1 was also the right direction for part 2. I
lucked out! And/or am actually good at programming especially when
it’s straight-forward list-processing like this.
Dec 4th
This time around (it’s my second time attempting Advent of
Code; I tried it in 2023 but quit before the end) I’m
paying more attention to the story and I’m really getting into the
Matt Groening–like shenanigans.
As for the puzzle, this type of 2d, maps-and-neighbors stuff is
something I don’t have as much of a standard library for. SRFI-1
doesn’t really cover it so I’m starting more from scratch here and
I’m buckling in, accepting that it might take a li’l more time and
what I write here I'll get use out of later too. I actually thought
to work a li'l bit ahead and look up some array stuff in the later
SRFIs but then I didn't have time to do that in November.
(define (count-accessible)
(define nodes (call-list (map call-string (read-lines))))
(define (get-node x y) (void))
(define (get-node x y)
(handle-exceptions exn (void)
((require procedure?
(nodes (require (c < -1) y)))
(require (c < -1) x))))
(define (get-neighbors x y)
(parse (c apply get-node)
(list-ec (: dx -1 2)
(: dy -1 2)
(if (not (= 0 dy dx)))
(list (+ x dx) (+ y dy)))))
(let ((width (length (nodes)))
(height (string-length ((nodes 0)))))
(sum-ec (: x 0 width)
(: y 0 height)
(if (eq? #\@ (get-node x y)))
(if (> 4 (count (is? #\@) (get-neighbors x y))))
1)))
Okay I like it when it works first try because I hate to put in
more than one guess but this was right. Good. Also didn’t have any bugs.
Now onto part 2. I really have to give Advent of Code a stern
scolding when it comes to accessibility: the dark grey text on dark
grey background is really really really hard to read so I use
einkbro’s light mode but that mode didn’t show the highlighted @
signs in the part 2 example. I had to toggle off the mode but then
I almost can’t see anything on the screen. Bad bad elves!
But okay, I figured out from what the text says what to do.
(define (count-accessible)
(define nodes (call-list (map call-string (read-lines))))
(define (get-node x y) (void))
(define (get-node x y)
(handle-exceptions exn (void)
((require procedure?
(nodes (require (c < -1) y)))
(require (c < -1) x))))
(define (get-neighbors x y)
(parse (c apply get-node)
(list-ec (: dx -1 2)
(: dy -1 2)
(if (not (= 0 dy dx)))
(list (+ x dx) (+ y dy)))))
(let* ((width (length (nodes)))
(height (string-length ((nodes 0))))
(get-accessible
(lambda ()
(sum-ec (: x 0 width) (: y 0 height)
(if (memq (get-node x y) '(#\x #\@)))
(if (> 4 (count (fn
(memq x '(#\x #\@)))
(get-neighbors x y))))
(begin
((nodes y) x #\x)
1)))))
(descend ((accessible (get-accessible)))
(do-ec (: x 0 width) (: y 0 height)
(if (eq? #\x (get-node x y)))
((nodes y) x #\.))
(+ accessible (desc (get-accessible))))))
Okay. That worked. No wrong entries today either which always feels
great. I could spot my bugs on the example output. The bug today
was that while I realized right away that I need to count X as
neighbors, I forgot that I needed to count X as self too. So I was
done in a li’l less than an hour (three quarters rather) which is
fine. More than yesterday but that’s OK. I had to implement all
this 2D neighbors stuff. I liked the idea of using parse since it
just elides voids.
Today was a real head-scratcher because it seemed to me part 1 is a
subset of December 2nd and part 2 is even easier than part one.
Then I realized that the difference is that unlike December 2nd,
this time our input data have overlapping ranges (something I
checked for on Dec 2nd but almost forgot to do here). I’m grateful
that the test input also did, or I would’ve wasted a guess on the
real thing. Joining the ranges is just a smop once you know that
it’s there.
I live for this convoluted maps of maps of maps of maps stuff! Very
fun problem.
Uh but if I were to try to explain how my program works… Hmm. From the inside out:
Reads all lines as lines.
Adds extra spaces to the end so all lines are the same length.
Pivots the lists so columns become rows and rows become columns.
Then with Acetone’s parse I split the problems into their own
lists.
I split out the operator (that’s the list last, butlast stuff)
and put it first then read and eval each problem.
Then finally I sum all those answers up.
Didn’t have any bugs today. I did put in two redundant
reverses that still gave me the right answer; I found them and
removed them after getting the star while making this write
up.
Dec 7
(define-parameters splits 0)
(define (tachyon-count (prev current next . beams))
((over
(when (and (eq? x #\S) (eq? y #\.)) (current i #\S))
(when (and (eq? x #\S) (eq? y #\^))
(splits (add1 (splits)))
(next (sub1 i) #\S)
(next (add1 i) #\S)))
(prev) (current))
(tachyon-count
(cons* current next beams)))
(define (tachyon-count (last exit))
(splits))
(define (tachyon-count)
(tachyon-count (map call-string (read-lines))))
Now this is what I’m talking about! This is the longest I’ve spent
on a part 1 so far. Even Dec 1st, which was my longest day, part 1
wasn’t where I got stuck. Here I knew what to do, it was just tricky
to keep track of everything. Now onto part two of this wonderful
puzzle!
After reading part two… what a let down! It’s just the
non-idempotent version. Although smopping that together on a
tired-brain day like today is easier said than done.
I apologize to the makers of Advent of Code for calling their hard
work a let down, it’s just that the non-idempotent “naively
recursive” version is what I almost wrote by accident for part 1. I
checked myself in time before making that version so actually
implementing it did take some time.
Before I thought to clean up the input it was hard to keep track of
everything (I had prev, current, next, blank lines, passing through
etc). And I had something that worked on the example input but was
too slow for the real input. So I started over and that’s the
version you see above. It introduced a bug (I forgot to pass
through beams at first) which required some creative logging to
find with ever-increasing indentation prefixes etc etc until the
new version finally worked on the example input. But it was still
too slow for the real input. And memoization fixed that and here we
are. All in all an extra hour or two.
The hardest problem yet after the “breathers” of 5th and 6th, but I
remember last time (2023) I was completely stumped on some problems
even after spending a day with a paper notebook just thinking and
thinking and so far we haven’t seen that. I remember back then
having to postpone some of the stars like “Okay I’ll get back to
this one later” and doing it in the evening or the next day or
something and this year I’ve just done both of them in the morning
except for the first day that did take all day. (And what a
privilege to be able to work all day on a recreational
puzzle!) Maybe it says more about how incredibly burnt out
I was after the apartment move back then than about the
difficulties of the puzzles.
Also this one felt more like a “knowledge test” than the previous
entries. I knew about the basics of recursion vs iteration,
idempotence vs shadowing, and the life-changing magic of
memoization. I know about those things from books like SICP and
PAIP. It’s less
about me figuring out something clever and more about me having
book learning. That doesn’t feel super fair.
Maybe I should take this opportunity to share some of that book
learning: My part one solution went through every row once. That’s
why it’s fast. It’s an iterative solution. The part two solution
needs to go through every row for every beam splitter above it. It
branches over three trillion times. That is too slow for even my
super duper computer to figure out. But memoization, which in this
case means having a hash-table that stores the results it has seen
before, means that it doesn’t have to re-calculate subtrees it has
seen before. It becomes fast again.
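To make the idea concrete, here’s a generic sketch of memoization in Chicken (brev’s memoize! is its own thing; this is just the underlying pattern, using srfi-69 hash tables):
(import srfi-69)

(define (memoize f)
  (define cache (make-hash-table)) ; equal? is the default test
  (lambda args
    ;; Cache results keyed on the whole argument list.
    (if (hash-table-exists? cache args)
        (hash-table-ref cache args)
        (let ((result (apply f args)))
          (hash-table-set! cache args result)
          result))))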
memoize!, the brev-separate version, does work even through
match-generics (but it needs to be called after all the
definitions) and zshbrev (since the entire file is compiled).
Dec 8th
I usually hope for a hard one but today I overslept or rather I had
a hard time falling asleep and it’s already like two hours past my
normal wake-up time and I have laundry day so I hope it’s an easy
one today.
I haven’t gotten to read the problem yet but I’m opening the wrong
windows, pasting the wrong files etc. This is gonna be a hard day
no matter how easy the problem is just from my own tiredness.
(define (pyth a b) (sqrt (+ (* a a) (* b b))))
(define (distance (x1 y1 z1) (x2 y2 z2))
(pyth (abs (- z1 z2))
(pyth (abs (- x1 x2)) (abs (- y1 y2)))))
Note to self: It was way slower when one or both of these were memoized
because they're usually not called that often on the same inputs, so
the lookup costs more than it saves. Also note to self: I could
save a little by removing the outermost sqrt. The order of the squares is
the same as the order of the roots.
Aaand it’s a super hard problem. We’re in the back half now folks! The deep end!
Back after breakfast break. This is three problems (distance in 3D
space, keeping track of groups of circuits, and recursive pairwise
comparison) that each on their own would’ve been enough the past
week. I for one did not know how to check distances in 3D space so
I had to figure it out. (I did know how to check them in 2D space.)
(define circuits (call-table))
(define (connect a b)
(unless (memq a (circuits b))
(with (append! (circuits a) (circuits b))
(for-each (fn (circuits x it)) it))))
This one, take-up-to, is a goody that I should get around to
putting in brev-separate. I use it all the time for RSS stuff. I'm
sure it'll come in use for more than one day this challenge.
This divides up the half a million distances into spans.
After getting my two stars I wanted to keep optimizing and I timed
it out that five or six was the fastest quotient. I started out
with a hundred but that’s way slower. Even seven is slower and so
is four. With five, we have 25909 spans (hash-table entries) with
an average list length of just under twenty each. That’s an
indictment of my fancy mutating cdr-setting insert-sorted!. But a
testament to the glory of hash-tables.♥︎
Getting a solution that even works on the example of part one took
several hours (actually figuring out the three parts of
the problem took a long time and then I also had a bad bug in my
connect routine where it worked until I was connecting existing
networks). That solution was too slow for the real
input—and that was still on the first star! The first problem was
that I was running sort on all the distances and then took the
limit on that sorted list. Running sort post-hoc on half a million
entries was something I thought it would’ve been able to handle but
apparently not.
Then I made an insert-sorted that, functionally (pure shadowing)
inserted the new distances as I went instead of sorting them
post-hoc. That finally gave me an answer for part one’s full data
but it took several minutes to find the answer. So after reading
part two, I made the mutating insert-sorted! and also added the
spans. Initially I had spans by hundreds which gave me the answer
in about twelve seconds or so. Really strange to me that sorting
was the bottleneck but that’s what it was.
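The general shape of such a functional insert-sorted, as a sketch
rather than the actual brev code:

(define (insert-sorted x lst less?)
  ;; insert one element into an already-sorted list, keeping it
  ;; sorted, instead of sorting everything after the fact
  (cond ((null? lst) (list x))
        ((less? x (car lst)) (cons x lst))
        (else (cons (car lst) (insert-sorted x (cdr lst) less?)))))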
Now for part two:
(define (connect a b)
  (unless (memq a (circuits b))
    (with (append! (circuits a) (circuits b))
      (for-each (fn (circuits x it)) it)))
  (length (circuits a)))

(define (connect-and-count)
  (define boxes (read-csv))
  (define boxl (length boxes))
  ((over (circuits x (list x))) boxes)
  (pair-for-each
   (fn (with (car x)
         ((over
           (stash! (list (distance it x) it x)))
          (cdr x))))
   boxes)
  (let/cc break
    (for-each
     (fn (for-each
          (fn (when (= (connect (second x) (third x)) boxl)
                (break (* (caadr x) (caaddr x)))))
          (spans x)))
     (sort (hash-table-keys (spans))))))
I had it down to “just” twelve seconds to sort all the half a
million distances, but then connecting them all into one big
network was still too slow. I tried several other algorithms for
connect until I landed on the one I used above, and I updated part
one to match also. Since this was largely an optimization puzzle, I
kept working on part one to make it faster, making sure I'd still
get the right answer and then applying it to part two. So today I
didn’t submit any wrong answers either. It just took all day, is all.
I had a bunch of versions of connect which added all the boxes to
all the boxes one by one. One of those versions was bugged (before
I even had part one done). I felt like galaxy brain when I came up
with the append! solution since it was so different than anything I
had and such a Gordian shortcut. If some of y’all had that approach
figured out right away I salute you.♥︎ For me it took some time
getting there.
Wow!! I loved this puzzle! I had it ticking along slowly until I
figured out a new way to connect and now the connection part is
instant. I'm really proud of my solution. I ended up getting the
distance sorting down to under two seconds and the connection part
to be instant. That's a good optimization: at first the sorting
timed out, then I got it down to twelve seconds and the connection
part was what timed out, and now the whole thing is done in two
secs. I loved this puzzle. It
took all day but it was a day well spent and I learned a lot just
by experimenting, without looking things up (beyond just
reading the docs for SRFI-1, SRFI-69 and the other libraries I was
using. Especially my own. I don’t consider it cheating to read my
docs I’ve written and posted to this website♥.).
Maybe this is backwards and rotty but I’m way more proud of spending a whole
day on the problem like today than when I solve it quickly.
Although as per ushe with me there’s a zone of suck where
spending a couple of hours is what I’m least proud of compared to a
fast solution or sticking to it all day.
Dec 9th
I’m hoping for easy ones from here on out and at first glance it
seems like today delivered on that since I can use a similar
pairwise comparison as yesterday. Unlike yesterday where I couldn’t
figure out any easier way to count up the distances than to
actually pyth them out, here an idea immediately comes to mind
where I can discard candidates based on one or both axes and weed
things out considerably, but I’ll try the more brute force wasteful
approach first, maybe it’ll be enough.
(define (carpet (ax ay) (bx by))
  (abs (* (- ax -1 bx) (- ay -1 by))))

(define (all-squares)
  (with (read-csv)
    (biggest (list-ec (:list a it) (:list b it) (carpet a b)))))
Sloppy! For the first time in a while I entered a wrong input into
the website; I didn’t notice that my code gave the wrong answer on
the example input, I was just happy that it was fast enough on both
the example input and the actual input. It was, so I’m not gonna
have to do anything fancy at least for part one, but, uh, being
right is more important than being fast.♥︎ The bug was that I had
forgotten the -1 above, counting the tile distances exclusive
rather than inclusive (tiles at columns 3 and 7 span 7 - 3 + 1 = 5
columns, not 4).
Okay so unlike yesterday where I thought part one was pretty
difficult on its own, here there's a huge jump in difficulty by
part two, or rather, what I did for part one is not super relevant for
part two. But that’s okay. That’s why I like to get to part two
quickly so I can know what I’m really supposed to do. (And
the fact that it’s often hard to predict is part of the fun of
Advent of Code.)
It’s just after midnight and I haven’t figured out the second part yet.
Dec 9th, continued
Okay it’s the next morning! Back to yesterday’s problem before even
looking at the new problem. Most of the following was written
yesterday. I write and delete and write and delete, that’s my
workflow.
The red tiles are all inside the area 1837,1574 to 98308,98188, so
initializing a vector of vectors to fit that (over nine billion
cells) dies with OOM, so we're gonna have to get fancy and
procedural.
I painted the red and green tiles both green in this image and
superimposed my program’s best solution as a black rectangle on
top.
But it’s not accepted as the right answer. I can’t see a better
answer with my own eyes either. So I’m leaving this as a one star
and I’m just that much closer to giving up on the entire Advent of
Code. I obsess over it, I spend all day on it in this horrible
hyperfocused state. Other activities like playing games with
friends or eating food become stress instead of joy. And to boot
I’m still not smart enough to actually solve the problems!
I’ve teased it long enough:1 today we’re going to look at the real meat of turning declarative, something-that-looks-like-this patterns into fast, running, imperative, if-this-then-that code. The absolute final generation of working Scheme code will have to wait until our thrilling conclusion in part 4, but by the end of this episode you’ll probably at least have an idea of what that will look like.
Today we’re in the (extensible-match decision-tree) library, which takes patacts (the AST representation of a pattern match clause introduced last time) and produces an optimized decision tree. It starts with a foreboding comment:
;; This is some of the hardest code I have ever had to write.
And, indeed, not since I was a tweenaged baby hacker struggling through an online tutorial on how to implement A* pathfinding for my game2 have I had so many false starts and wrong directions in implementing a known algorithm. But these were problems with working out how to translate the idea into code – the idea itself is pretty intuitive!
Motivating example
To see how we benefit from an optimizing decision tree generator, let’s look at an example not all too dissimilar from the last example we saw in the previous part.3
(define (uniq ls same?) ; remove adjacent equivalent elements from list
  (match ls
    ('() '())
    ((cons _ '()) ls)
    ((cons* y x ls*)
     (if (same? y x)
         (uniq (cons x ls*) same?)
         (cons y (uniq (cons x ls*) same?))))))
For this input, a typical Lisp hacker’s first attempt at a pattern matcher implementation usually generates code that looks basically like this:
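(The original listing isn’t reproduced here; what follows is a rough reconstruction of the kind of code meant, with clause-1, clause-2 and clause-3 as illustrative names.)

(define (uniq ls same?)
  ;; each clause becomes a little procedure which tail-calls the next
  ;; clause as its failure continuation
  (define (clause-1 ls)
    (if (null? ls)
        '()
        (clause-2 ls)))
  (define (clause-2 ls)
    (if (and (pair? ls) (null? (cdr ls)))
        ls
        (clause-3 ls)))
  (define (clause-3 ls)
    (if (and (pair? ls) (pair? (cdr ls)))
        (let ((y (car ls)) (x (car (cdr ls))) (ls* (cdr (cdr ls))))
          (if (same? y x)
              (uniq (cons x ls*) same?)
              (cons y (uniq (cons x ls*) same?))))
        (error 'uniq "no clause matched" ls)))
  (clause-1 ls))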
The idea is simple: compile each clause to a procedure, and tail-call the next procedure’s clause as a ‘failure continuation’ at any point where you notice that the pattern doesn’t match.
This is a completely reasonable implementation for very many, nay, probably even a majority of use cases. It’s how everyone learning to write a pattern matcher starts out, and indeed probably how every individual pattern match compiler starts out (including extensible-match!). The syntax-case pattern matcher implemented by psyntax basically works like this, because the absolute fastest matching time isn’t a priority for something that runs at expand time. Alex Shinn’s pattern matcher implementation works like this – anything more complicated would probably be hell to write in pure syntax-rules, and the sheer portability Shinn managed to achieve by writing only in syntax-rules has let his pattern matcher take over the world, inefficiencies and the occasional fudging on semantics be damned. (By contrast, SRFI 262’s dependence on identifier properties will likely hold it back from any equivalent level of world dominance for some time yet, until more Scheme implementers start to catch on to how nifty they are.)
Some of the inefficiencies in this particular code will be optimized away by the Scheme compiler. I mentioned in the first part of this series that redundant let bindings are easy to get rid of. Unfortunately, more consequential inefficiencies are harder to deal with. Most significantly, even after optimization, this code will call pair? and cdr twice every time it goes through the loop: once for the second clause, then again in the third clause. While Scheme compilers are usually clever enough to eliminate repeated type checks when they occur within a single nested tree of code, putting each check in its own little procedure messes this compiler optimization up. The exception is Guile, because Andy Wingo developed a very clever compiler pass which can recognize common prefixes across these internal procedure boundaries. Also, very simple uses – like the last example from last episode – can have the clauses inlined into one another and thus end up in a nice nested tree for the compiler to work with; but as a data point, clause-3 above is already complex enough (or complex enough and referenced from enough places) that Chez’s cp0 doesn’t bother.
This repeated calling and branching is theoretically unsatisfactory, but the real philosophical issue in these inefficiencies – the thing that really motivated me to want to do better – is specifically that this generated code is nothing like what a programmer would write in the absence of pattern matching. Nobody would ever write code which had those repeated pair? calls. The real point I want to make by spending many lines of code on trying to do something better is that pattern matching is a good idiom, and programmers should use it. To that end, pattern matching should be a usable tool even in the hottest of hot loops, even on the slowest of microcontrollers: nobody should ever feel like they have to switch back to writing out if expressions by hand for performance’s sake. The fact that pattern matching is a declarative idiom, and declarative programming has something of a reputation for being inefficient, doesn’t exactly help its reputation here.
There are mountains of research into efficient compilation of pattern matching to avoid generating silly code like this. In the usual framing, there are two approaches: the backtracking automaton and the decision tree. The backtracking automaton, as its name implies, will still sometimes run the same tests on the same subject more than once, but tries to do so less than the naïve approach shown above which re-examines everything from scratch in every clause. The decision tree, on the other hand, will never re-examine any of its inputs, but may have much larger (exponentially larger!) code size than the automaton.
In fact these two approaches aren’t so different in practice – once you run the right time-saving optimizations for generating an automaton, or the right space-saving optimizations for generating a decision tree, the results are pretty close to one another. They’ll be nearly identical on very, very many use cases; it’s in the comparatively rare cases where they fall down that they fall down in different ways.
In the Scheme and Scheme-adjacent world, Racket’s pattern matcher by Sam Tobin-Hochstadt uses an optimized backtracking automaton after the technique by Le Fessant and Maranget (2001);4 extensible-match uses a decision tree after the technique by Maranget (2008). Yep, same guy! Luc Maranget is actually far from the only compiler researcher to have written papers on optimizing both techniques; but he is the only one whose papers now seem to be regarded as the somewhat definitive introduction to the state of the art for each of the two techniques.
Generating a decision tree
Maranget’s 2008 paper is heavy on notation, but mostly explains it as it goes along. It’s pretty accessible as European academia goes: if you’re good at reading and understanding other people’s code, you should be able to understand Maranget’s paper. It’s worth a read (or take a look at a shorter explanation by Colin James or David Nolen) – I’ll explain it here as it’s implemented in extensible-match, noting some (but not all!) differences to Maranget’s original presentation.
So, our input is a list of patterns and their associated actions, one for each clause. In fact, our input is always a list of rows of patterns – remember, we’re in %core-match-lambda which matches multiple patterns against multiple values, using the row-pattern AST type to group them together. Each AST pattern (and subpattern, but we’ll get to that) represents something we can do to get closer to picking a clause which matches the subject – meaning that with the row-pattern, we have a choice of multiple potential patterns to choose from. The ?-pattern, for example, represents that we could test its subject against a predicate; the var-pattern represents renaming an internal subject variable to a pattern variable visibly bound in the clause.
Let’s say we pick one pattern from the first row, since that’s the one we have to test first. (How we pick one is a topic we’ll get to later.) We’re going to act on this pattern – let’s say a ?-pattern testing pair? against our input x. What do we need to do?
What we ultimately want to do is generate (if (pair? x) <code-if-a-pair> <code-if-not-a-pair>); this gives us a hint that we also need to work out how to fill in those two slots. Well, what goes in those two slots?
The <code-if-not-a-pair> clause is easier to understand first, because if (pair? x) is false, the whole row can’t match any more: a row is an and combination of patterns, and one of the anded subpatterns just threw up false. So we can throw out that entire row and fill in <code-if-not-a-pair> with the code generated by repeating this process on that smaller list of rows with their actions.
But wait! What if the next row also contains a ?-pattern testing pair? on x – there’s no point testing that again, because it will be false a second time as well! So apart from throwing away the first row, we also walk through the remaining rows and throw away any of them which also depend on the same test before we repeat this process to generate the <code-if-not-a-pair>.
The <code-if-a-pair> is generated by a similar walk over the rows and recursion on the result. We don’t throw away the first row because in this case, that part of it did match. Instead, we remove it from the row and from any subsequent rows which make exactly the same test, preventing generating the same test again in the case where we know it succeeded as well as the case where we know that it failed.
That’s basically the core of Maranget’s algorithm. Maranget calls generating the new rows for <code-if-a-pair> specializing the patterns, and generating the new rows for <code-if-not-a-pair> defaulting the patterns – both with respect to a particular ‘constructor’, but because Scheme is different (more below), extensible-match calls these specializers. There are two base cases of the recursion: one when there are no rows left (no patterns matched; generate the code to signal a matching failure) and one when the top row only consists of irrefutable patterns (ones like _ wildcard patterns and var-patterns, which don’t have failure cases: in this case that row matched; generate the code to activate the corresponding action).
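In pseudo-Scheme, that recursion is roughly the following sketch; pick-specializer, specialize, default and the node/leaf constructors are hypothetical stand-ins rather than the real library’s names.

(define (compile-rows rows)
  (cond
    ;; no rows left: no clause can match, emit the failure leaf
    ((null? rows) (make-failure-leaf))
    ;; top row is all irrefutable patterns: that clause has matched
    ((irrefutable-row? (car rows)) (make-action-leaf (car rows)))
    (else
     ;; pick a specializer from the first row, build the node for it,
     ;; and recurse on the specialized and defaulted rows (a dt-apply,
     ;; for instance, would only get the success branch)
     (let ((s (pick-specializer (car rows))))
       (make-node-for s
                      (compile-rows (specialize s rows))
                      (compile-rows (default s rows)))))))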
However, there’s one type of pattern in our AST which doesn’t have a failure case, but hides more subpatterns which do: the apply-pattern which calls some procedure on its subject variable and creates more subject variable(s) for the value(s) returned. Those subject variables are then available to be the subject of a subpattern of the apply-pattern.
For his equivalent of an apply-pattern Maranget makes a whole separate set of patterns when specializing, just containing the subpatterns. This probably works better for his compiler of ML patterns than it did when I tried in my compiler of Scheme patterns: for one thing, testing type (our ?-pattern) and deconstructing into values for subpatterns (our apply-pattern) are the same thing in his implementation; since ML is statically typed and pattern matching is the primitive means of deconstruction, the compiler has a lot more knowledge about the potential range of input values and what is and isn’t possible after each deconstruction step than extensible-match can get as a Scheme macro. In particular, Maranget’s patterns after each deconstruction step always form a matrix with columns for each value and rows for each clause. That’s not guaranteed by the structure of this AST (but explains why the term ‘row’ is used in extensible-match).
So instead, when an apply-pattern has been chosen as the specializer, each specialized row simply has the subpattern of that apply-pattern spliced into it. This means that each row-pattern can have a different length, depending on which apply-patterns have been specialized and had their subpatterns spliced in!
As one last point, in order to simplify the job of the procedures which pick a specializer and specialize and default the rows on it, after this splicing takes place (or more accurately before each recursive step), any rows nested within the rows are flattened out. If there were any or patterns, they’re also expanded into multiple copies of the entire patact, each with one of the or clauses, so the decision tree generator proper doesn’t need to look for specializers specially inside of or patterns either.
Representing a decision tree
For reasons that will become more apparent below and in the next episode, a decision tree is actually another kind of intermediate representation: we don’t represent it directly as Scheme code. Here’s its definition, slightly simplified:
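(The actual definition isn’t reproduced here; going by the node types described below, its shape is roughly the following, with guessed field names.)

(define-record-type dt-test    ; from a ?-pattern: test a predicate
  (fields subject predicate if-true if-false))
(define-record-type dt-apply   ; from an apply-pattern: bind procedure results
  (fields subject procedure vars continuation))
(define-record-type dt-equal   ; from a quote-pattern: compare against a datum
  (fields subject datum if-equal if-not-equal))
(define-record-type dt-rename  ; from a var-pattern: bind a pattern variable
  (fields subject name continuation))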
Immediately it’s clear how much simpler this is than the AST we got as input: only four concrete node types vs nine pattern types in the AST.5 The action type from the AST also appears in decision trees, as the leaf nodes, where a matching clause has been chosen or the match failed.
?-patterns, when chosen, become dt-test nodes; apply-patterns become dt-apply nodes; quote-patterns become dt-equal nodes; var-patterns become dt-rename nodes. and-patterns, or-patterns, row-patterns, and not-patterns are reflected in the structure of the tree rather than as nodes themselves. wildcard-patterns are no-ops, as far as the pattern match compiler is concerned, and don’t show up in the tree at all.
Step by step
Here’s a worked example. Let’s take the last example from the previous episode:
(define (last ls) ; returns the last element in a list
  (match ls
    ((cons x '()) x)
    ((cons _ ls*) (last ls*))))
Our AST for this pattern match expression looks like this (in an entirely hypothetical notation):
((patact (row (and (? #:subject ls #:predicate pair?)
                   (row (apply #:subject ls #:procedure car #:vars (ls.car)
                               #:subpattern (row (var #:subject ls.car #:name x)))
                        (apply #:subject ls #:procedure cdr #:vars (ls.cdr)
                               #:subpattern (row (quote #:subject ls.cdr #:datum ()))))))
         (action return x))
 (patact (row (and (? #:subject ls #:predicate pair?)
                   (row (apply #:subject ls #:procedure car #:vars (ls.car)
                               #:subpattern (row (wildcard #:subject ls.car)))
                        (apply #:subject ls #:procedure cdr #:vars (ls.cdr)
                               #:subpattern (row (var #:subject ls.cdr #:name ls*))))))
         (action do-last ls*)))
We look at the first row, which only has one subpattern and thus only one potential specializer to pick: (? #:subject ls #:predicate pair?). So we generate a dt-test node which checks if ls is a pair.
In the success branch of this dt-test node, we put the result of re-running this algorithm on all of the patacts with this specializer applied. We don’t need the pair? any more in either row, so it will disappear and be replaced by the right-hand side of the and pattern which contains it. This in turn is a row in both cases; we splice it into the main row of each patact. Our specialized patacts now look like this:
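(Reconstructed in the same hypothetical notation; the original listing isn’t shown here.)

((patact (row (apply #:subject ls #:procedure car #:vars (ls.car)
                     #:subpattern (row (var #:subject ls.car #:name x)))
              (apply #:subject ls #:procedure cdr #:vars (ls.cdr)
                     #:subpattern (row (quote #:subject ls.cdr #:datum ()))))
         (action return x))
 (patact (row (apply #:subject ls #:procedure car #:vars (ls.car)
                     #:subpattern (row (wildcard #:subject ls.car)))
              (apply #:subject ls #:procedure cdr #:vars (ls.cdr)
                     #:subpattern (row (var #:subject ls.cdr #:name ls*))))
         (action do-last ls*)))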
As for the failure branch of dt-test: we know that the first patact can’t match any more just because this single test failed. We have to default on the second patact only – and that also required (pair? ls) to be true. So our defaulted patacts in this case are just the empty list, and the recursion on that will generate the leaf node to signal an exception because no clause matched that input.
Back on the success branch, we still have these two patacts, but this time the first row gives us a choice of what to do. We could pick the leftmost one – the car – but actually the car isn’t very interesting because it’s not refutable. The cdr, on the other hand, can fail, so let’s prioritize it. We generate a dt-apply binding the cdr of ls to ls.cdr; it only needs a success branch since we assume this can’t fail. So we specialize again, and the specialized rows, after splicing, look like this:
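(Again a reconstruction, roughly:)

((patact (row (apply #:subject ls #:procedure car #:vars (ls.car)
                     #:subpattern (row (var #:subject ls.car #:name x)))
              (quote #:subject ls.cdr #:datum ()))
         (action return x))
 (patact (row (apply #:subject ls #:procedure car #:vars (ls.car)
                     #:subpattern (row (wildcard #:subject ls.car)))
              (var #:subject ls.cdr #:name ls*))
         (action do-last ls*)))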
Still two patterns in the top row, and again, the quote pattern looking at ls.cdr is still considerably more interesting than the apply pattern, so we pick that as the specializer again. We generate a dt-equal node checking if ls.cdr is the empty list. This one does need success and failure branches.
On the success branch, the specialized patterns look like this:
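(Reconstructed, roughly:)

((patact (row (apply #:subject ls #:procedure car #:vars (ls.car)
                     #:subpattern (row (var #:subject ls.car #:name x))))
         (action return x))
 (patact (row (apply #:subject ls #:procedure car #:vars (ls.car)
                     #:subpattern (row (wildcard #:subject ls.car)))
              (var #:subject ls.cdr #:name ls*))
         (action do-last ls*)))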
We have completely eliminated the cdr patterns from the top row! Only the irrefutable car pattern is left, and when the first row is irrefutable, the pattern has matched – so we generate the dt-apply node and its dt-rename child and have it point to the action which returns x. Done!
On the failure branch of the check for a null ls.cdr, we’ve thrown away the top row again and we end up with this single patact:
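(And the reconstruction of that:)

((patact (row (apply #:subject ls #:procedure car #:vars (ls.car)
                     #:subpattern (row (wildcard #:subject ls.car)))
              (var #:subject ls.cdr #:name ls*))
         (action do-last ls*)))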
Once again, this is irrefutable, so we again generate the dt-apply node for ls.car, the dt-rename node for the ls* variable, and then finish at the action.
Picking a pattern
Okay so, at each step of the pattern matching process, we pick a specializer from the first row’s patterns, generate a node from it, and specialize/default the remainder of the patterns on it. But how do we pick which pattern to choose from the row?
We have to work out which pattern is going to help us find answers the fastest, probably by comparing against the patterns in different rows somehow. We need a heuristic – a scoring function to guess which pattern is most likely to help find the matching clause in the fewest nodes possible.
Maranget considers the question of which heuristics to use, assigning letters to each of a number of potential scoring functions and measuring the impact of many potential combinations of them. In Maranget’s terms, extensible-match uses heuristic fnr: the specializer picked has to be in the first row;6 the specializer is the one which will specialize the most rows (most needed); if there’s a tie for the most needed specializer, we pick the pattern with the smallest combined size of the specialized and defaulted rows. (If even that goes to a tie, just pick the leftmost tied specializer.) I didn’t benchmark this particularly: n seemed like an intuitively sensible heuristic; based on the suggestion in Maranget’s paper, I later added r as an additional tie-breaker.
Some of the heuristics Maranget considers aren’t really applicable to Scheme. His ML version can generate nodes with more outward branches than just success and failure, because the ML compiler can generate a single switch over many types at once; Scheme detects different types by calling individual predicate procedures for each individual type, so that’s not possible, and neither is the heuristic b which picks the smallest switch to build, for example – so extensible-match also can’t use the combination of heuristics Maranget personally recommends, pba.
So why was it so hard/why didn’t you do it slightly differently
As always, the hard thing was not the actual writing of the code, but working out what the right code to write was.
For example, you might ask why I wasn’t tempted to skip the step of representing decision trees as their own, reified type and simply recurse by generating new match expressions nested within the branches of an if or within the body of a let for the specializer we’ve chosen, for example.
This is, indeed, very tempting. If you’re implementing a compiler based on decision trees, you might even find some literature suggesting that you could do exactly this. Surely, you’ll think, just like those Swap nodes in Maranget’s paper aren’t actually part of how it’s really compiled – surely you can avoid generating the other nodes per se too, and just go straight to generating the code which those nodes will eventually be turned into?
So you try it, and it doesn’t work, and you regret it, and you commit it to the repository anyway as a warning to your future self and to others. Actually there were other things that were wrong with that first attempt – trying to insist that patterns form a matrix-like grid turned out to be the wrong thing for this kind of pattern input AST. My idea was that I really wanted to be able to property test the decision tree generator, ensuring that for any generated patterns and terms it always had the same result as the naïve style of generation shown above. I was thinking of decision tree generation as an optional optimization pass, rather than a fundamental transformation pass. This is the wrong mental model to have.
Anyway, the real reason that would have been the wrong approach even if it had worked is that it wouldn’t have allowed any optimization for code size or compilation speed. Wait whaaaaaat??
Optimizing space and time
Yeah, remember how I mentioned about the potentially exponential code size if you choose decision trees, but only in cases where the tricks to avoid it start to fail? It’s time to talk about those tricks.
The most important trick is sharing of nodes. When two different branches in the tree end up doing the same things, they should actually re-converge and become the same branch again. This makes the decision tree actually a decision dag (directed acyclic graph).
To achieve this sharing of nodes between different code paths, extensible-match uses two tricks. The most important one is hash consing, a trick more generally applicable in functional programming. When you have immutable structures which might be created many times over with the same contents, you can maybe save memory by storing all instances of structures of that type in a hash table. When someone asks to create a new instance of the type, first check if there’s already one existing which has the contents they’re interested in. If so, just return that existing one instead of creating a new one. (But if not, be sure to put the newly-created object into the table before returning it, in case someone asks for another one of the same in future.)
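In its most naive form, the trick looks something like this sketch for dt-test nodes; really-make-dt-test here stands in for the raw record constructor, and the real library, as noted next, has to be messier than this.

;; one table per node type, keyed on the node’s contents
(define dt-test-table (make-hashtable equal-hash equal?))

(define (make-dt-test subject predicate if-true if-false)
  (let ((key (list subject predicate if-true if-false)))
    (or (hashtable-ref dt-test-table key #f)
        (let ((node (really-make-dt-test subject predicate if-true if-false)))
          (hashtable-set! dt-test-table key node)
          node))))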
Neither R6RS’s record system nor its hash tables were really meant to support doing this, and the (extensible-match decision-tree) library has to do some pretty undignified nonsense in order to make it work. But it does work, and saves both time and space: space obviously, but also time because it stresses the garbage collector less to have fewer objects sitting around.
There’s a second layer of caching which is similar, one I mentioned I was considering using in an earlier post: the recursive procedure which generates tree nodes from a (specialized or defaulted) set of rows and actions is memoized. This has a similar effect to hash consing but, since hash consing already guarantees sharing, it optimizes compilation time and not decision tree size. Memoization avoids repeating an exponential recursion entirely in many cases. I measured how many cache hits the memo table got for some complex patterns, and how much it sped things up, and it was totally worth it. The 14 cases over three list variables which used to take 9.3 seconds to compile on Guile now come back almost immediately.
Visualization
What’s the most common target language for compilers? x86? The JVM? C? JavaScript? If I were a betting lass, I might be tempted to put money on the answer being DOT, the GraphViz file format. If you want to understand what your compiler is doing, the easiest way to see it is often to actually have it draw your control-flow graph, automaton, or decision dag in two dimensions. Outsourcing the actual drawing task to GraphViz is a nice sweet spot in the trade-off between getting it working and making it nice.7
Initially I tried to see what was going on with a very silly (non-hygienic) code generator outputting Scheme code, but this quickly got tiring to read, especially because it couldn’t show sharing in the dag. (How the real code generator represents shared nodes is a topic for next time.) With a 2D graph, it’s relatively easy to get a rough impression on first glance of how well the decision tree generator handles certain cases; by following the arrows, you can fairly easily check for anything it’s doing wrong.
I’ve kept an archive of older DOT files showing how the pattern match compiler has got more capable over time, going back to some of the first decision trees this code ever generated, before it was even hooked up to a code generator. (This DOT file is dated 2 December 2024, and I didn’t get the code generator working until Candlemas.) This lets me see how certain changes improved or disimproved the generated trees over time.
Originally, to review these DOT files, I had to manually copy the generated DOT source into a file and invoke the dot command line tool. But recently I thought, hey, I have a terminal emulator which can display inline graphics, so why not just show them directly in the REPL? So I wrote a little library to do just that.
Source code of the library which plops a drawing of a graph straight into my terminal
(library (extensible-match dot)
  (export show-dot)
  (import (rnrs (6))
          (only (guile) mkstemp port-filename waitpid)
          (srfi :39)
          (ice-9 popen))
  (define-syntax show-dot
    (syntax-rules ()
      ((_ expr)
       (let* ((dot-port (mkstemp "/tmp/dot-XXXXXX"))
              (dot-file (port-filename dot-port)))
         ;; It was probably a mistake that decision-tree->dot writes
         ;; to current-output-port, but I make the best of it:
         (parameterize ((current-output-port dot-port))
           expr)
         (close-port dot-port)
         (let-values (((from to pids)
                       (pipeline `(("dot" "-Tpng" ,dot-file)
                                   ;; I tried out both sixel with img2sixel
                                   ;; (more widely supported) and the inline
                                   ;; bitmap functionality of iTerm2 (my
                                   ;; terminal emulator of choice) with this
                                   ;; imgcat program. Sixel was very fragile
                                   ;; when the terminal window got resized (the
                                   ;; bitmap area would just unrecoverably
                                   ;; blank out sometimes), so I went with
                                   ;; imgcat. It’s for my own use only, anyway,
                                   ;; and not too hard to change if needed:
                                   ("imgcat" "--iterm2")))))
           (close-port to)
           (let ((stdout (standard-output-port)))
             (put-bytevector stdout (get-bytevector-all from))
             (flush-output-port stdout))
           (close-port from)
           (for-each waitpid pids)))))))
Enough of the talking, what do these decision trees actually look like, then? Here’s the drawing for the last example we worked through above.
The shapes of the nodes are the classic flowchart style: diamonds for decisions, rectangles for steps, and circles for termination.
Note that the do-last path also generates a call to (car ls) even though it doesn’t use the value of the car. In theory, it should be smart enough not to do this, but I decided for now to leave that optimization to the Scheme compiler on the theory that it can probably do a better job of it anyway.
For another simple example, this is a type-safe list merge.
(define list-merge
  (match-lambda
    (('() '()) '())
    (((? pair? xs) '()) xs)
    (('() (? pair? ys)) ys)
    (((cons x xs*) (cons y ys*))
     (if (< y x)
         (cons y (list-merge (cons x xs*) ys*))
         (cons x (list-merge xs* (cons y ys*)))))))
You can see the distinction between subject variables (arg1, arg2, arg1_car, arg1_cdr, arg2_car, arg2_cdr) and pattern variables. You can follow through and see how it avoids repeating the pair? check on both arguments for the final clause – there are two different pair? checks on arg2, but they’re in mutually exclusive code paths. There’s a long chain of dt-apply (receive) and dt-rename (let) nodes at the end, because the decision tree generator puts off those steps, which don’t affect whether the patterns match or not, until the end.
For a more complex example, let’s look at Chris Okasaki’s red-black tree balancing pattern (paper, book). This is probably pattern match compiler authors’ favourite example to use, because
it’s nontrivial
it’s real and not contrived
it’s a ‘hot’ function which gets called log₂ n times for every insertion into a tree of size n, making it important to optimize well
it’s a pretty amazing example of using pattern matching to translate reasoning about data structure invariants directly into running code, in a way that would be difficult without pattern matching
What does it do? Basically a red-black tree stays relatively well balanced by colouring the levels of nodes red or black, and maintaining two properties about where coloured nodes are allowed to appear relative to one another. One of these rules is that no red node is allowed to touch another red node: black touching black is okay; red touching red isn’t. So Okasaki works out that after finding a place to insert a new red node, there are exactly four ways in which a tree can end up ‘unbalanced’ in having two adjacent red nodes; in that case the tree needs to be ‘rotated’ a little to restore balance.
He writes that like this (but in ML):
(define (balance n)
  (match n
    ((or (node 'black (node 'red (node 'red a x b) y c) z d)
         (node 'black (node 'red a x (node 'red b y c)) z d)
         (node 'black a x (node 'red (node 'red b y c) z d))
         (node 'black a x (node 'red b y (node 'red c z d))))
     (node 'red (node 'black a x b) y (node 'black c z d)))
    (_ n)))
x, y, and z are values in the tree; a, b, c, and d are child nodes. The sort order of the values and the values in the child nodes is a < x < b < y < c < z < d. Visually explained, the job of this procedure is to turn one of these wrong trees:
Into this right tree:
Here’s what the decision dag looks like:
You can see that nodes are shared (1, 2, 3, 4) to reduce the overall size of the graph. This may look big and complicated, but at run time it’s efficient because no test or extraction will ever be repeated on any given code path. There are 13 branch points, the same number as in Colin James’s independent OCaml implementation of Maranget’s algorithm, a nice confirmation that my implementation is doing the right thing :-)
Can you imagine trying to write out the code for this pattern match by hand? You don’t have to imagine, because Rich Hickey implemented this data structure in bad old Java as part of Clojure. His version is spread all over the place because old-school Java just lacked the features to make writing this function nice. (I’m not even sure new-school Java could do much better.)
For an example where things start to fall down in Scheme, consider Maranget’s bytecode interpreter for a miniature dialect of ML. In figures 6 and 7 of his 2008 paper – which really ought to be shown as motivating examples on the first page – Maranget showed how a bad and a good decision tree for this can look in OCaml, a statically-typed language.
First of all, here’s roughly what the run function would look like in Scheme:
You can already see that we’re running up against the limits of what it’s sensible to do in Lisp notation here.
What does it look like when compiled? I shared what it looked like in the very first working versions of the decision tree generator above … and today it looks like this. To be sure, there’s improvement there (196 nodes in the December 2024 version, 135 nodes today). But it’s also very far from the nice, elegant dag of just a dozen or so branches which Maranget shows in his paper, even accounting for the fact that his version can switch over many types at once and mine can’t. What gives?
Well, here dynamic typing messes up opportunities to do better. If you trace down the graph from the top, it looks good until you get to the third case. Once the pattern matcher knows that the current instruction (the first element in the list c for code) is an iop but the top of the stack s isn’t a val, it should jump straight to fail because there is no situation in which any of the other patterns can match. But it doesn’t know enough to know that; what if something might match both iop and test, or pushenv? A lot of these nodes are therefore dead, and will never execute except to go a long, slow way to a case that should never happen when this function is actually running anyway. I’ve been meaning to study in detail exactly which branches are impossible, and see if it would be possible to do anything at all better here, like allow users to declare which predicates in ? are mutually exclusive. (Initial results on that particular feature weren’t exactly encouraging either. Update, 30 November: Scratch that! I just tried it again and it works great. 86 nodes, down from 135. I must have forgotten to clear the cache of pre-compiled code before the last time I tested it, or something. The problem now is that a complete implementation of this would require implementing something like Datalog …) It does do some very basic stuff to avoid generating redundant nodes like this, but not nearly as much as an ML compiler can.
As the folklore predicts, Racket’s backtracking automaton does better here for code size but worse for speed, generating many checks that c is a pair and many separate extractions of its car and cdr. It only does these things once for cases 1 and 2, but again for case 3, again for case 4, again for case 5, and only gets back on a roll again with cases 6, 7, and 8 …
Is all of this worth it?
Last of all, here’s the hand-wringing section where I ask whether I should have bothered with any of this!
In 2000 Kevin Scott and Norman Ramsey wrote a draft paper asking ‘When Do Match-Compilation Heuristics Matter?’; their answer is ‘almost never’. Out of 18 program benchmarks, only five were affected in decision tree size by the use of different heuristics, and three of those were made worse by choosing heuristics other than simple left to right selection of patterns. More encouraging is that 13 of the 18 benchmarks were artificial benchmark programs, but one of the five where heuristics did make some difference was a real-world program (the SML/NJ compiler).
Maranget’s results measuring heuristics may be more representative because he created a corpus of match expressions to test on specifically designed to represent a wide range of sizes and matching techniques; and because he actually used his own decision tree generation algorithm, the one extensible-match’s is based on. He finds a larger difference than Scott and Ramsey – about 20% improvement in code size and 10% improvement in average path length for any individual heuristic, compared to just naïvely picking the leftmost pattern in each row every time. As a reflection on how Maranget’s corpus might not be exactly representative, though, the heuristics only score about 15% better for code size and about 3% better for average path length than naïvely picking the rightmost pattern in each row.
Still, these are fairly slim pickings. As mentioned above, Guile’s optimizer is already capable of taking a naïve pattern match compilation and doing optimizations equivalent to generating a backtracking automaton that always picks the leftmost pattern for each operation; basically on the bytecode machine example it would probably do about as well as Racket’s match compiler, even without any help from the macro. The benchmarks by Scott and Ramsey suggest this should be good enough for almost every real use case. If Guile ever ships SRFI 262 in the standard distribution, it might well not be worth the extra weight in code to include a heuristic-driven decision tree generator; maybe that version should go back to just generating the naïve kind of compilation shown in the very first example above, and should let Guile’s own optimizer do the work. Changing strategy to the automaton style in that case also seems like the right thing, given that any version shipped with Guile would probably also end up in Hoot, which has its own reasons for optimizing code size. For other Scheme implementations which don’t (yet) do the online common subexpression elimination pass Andy Wingo developed, the winnings will definitely be bigger – although still diminishing if online CSE becomes more common.
Next time
Whew, this was a long one. Next time will be a comparatively short conclusion, on the code generator.
Especially considering personal matters got in the way of completing this article promptly. Sorry about that!↩︎
A transport/railway simulation game like Transport Tycoon that ran in a text terminal. That makes it sound a lot cooler (and more fun, and more functional) than it actually was.
If you want to learn how to do pathfinding in a game these days, that tutorial still isn’t bad, but you might have more fun with Amit Patel’s interactive walkthrough including other graph search algorithms.↩︎
This example is from Philip Wadler’s chapter ‘Efficient Compilation of Pattern Matching’ in Simon Peyton Jones’s book The Implementation of Functional Programming Languages, renamed after a Unix command line utility which hackers are probably at least passingly familiar with, and with an explicit comparison predicate argument because Scheme is monomorphic. The rest of this article has absolutely nothing to do with the contents of that chapter.↩︎
Actually it’s five vs ten because I’m still omitting the seq-pattern and its decision tree equivalent dt-seq here.↩︎
This is actually a requirement of the semantics of SRFI 262: all implementations have to use something like f as their first ‘heuristic’. This isn’t necessary in ML because the type testing functions the compiler notionally generates calls to are internal to the ML implementation and known to have universal domain; in SRFI 262 it’s possible to use a previous row as a type check that a procedure in a subsequent row will be safe to call.↩︎