Planet Scheme

Tuesday, July 8, 2025

Andy Wingo

guile lab notebook: on the move!

Hey, a quick update, then a little story. The big news is that I got Guile wired to a moving garbage collector!

Specifically, this is the mostly-moving collector with conservative stack scanning. Most collections will be marked in place. When the collector wants to compact, it will scan ambiguous roots in the beginning of the collection cycle, marking objects referenced by such roots in place. Then the collector will select some blocks for evacuation, and when visiting an object in those blocks, it will try to copy the object to one of the evacuation target blocks that are held in reserve. If the collector runs out of space in the evacuation reserve, it falls back to marking in place.

Given that the collector has to cope with failed evacuations, it is easy to give the it the ability to pin any object in place. This proved useful when making the needed modifications to Guile: for example, when we copy a stack slice containing ambiguous references to a heap-allocated continuation, we eagerly traverse that stack to pin the referents of those ambiguous edges. Also, whenever the address of an object is taken and exposed to Scheme, we pin that object. This happens frequently for identity hashes (hashq).

Anyway, the bulk of the work here was a pile of refactors to Guile to allow a centralized scm_trace_object function to be written, exposing some object representation details to the internal object-tracing function definition while not exposing them to the user in the form of API or ABI.

bugs

I found quite a few bugs. Not many of them were in Whippet, but some were, and a few are still there; Guile exercises a GC more than my test workbench is able to. Today I’d like to write about a funny one that I haven’t fixed yet.

So, small objects in this garbage collector are managed by a Nofl space. During a collection, each pointer-containing reachable object is traced by a global user-supplied tracing procedure. That tracing procedure should call a collector-supplied inline function on each of the object’s fields. Obviously the procedure needs a way to distinguish between different kinds of objects, to trace them appropriately; in Guile, we use an the low bits of the initial word of heap objects for this purpose.

Object marks are stored in a side table in associated 4-MB aligned slabs, with one mark byte per granule (16 bytes). 4 MB is 0x400000, so for an object at address A, its slab base is at A & ~0x3fffff, and the mark byte is offset by (A & 0x3fffff) >> 4. When the tracer sees an edge into a block scheduled for evacuation, it first checks the mark byte to see if it’s already marked in place; in that case there’s nothing to do. Otherwise it will try to evacuate the object, which proceeds as follows...

But before you read, consider that there are a number of threads which all try to make progress on the worklist of outstanding objects needing tracing (the grey objects). The mutator threads are paused; though we will probably add concurrent tracing at some point, we are unlikely to implement concurrent evacuation. But it could be that two GC threads try to process two different edges to the same evacuatable object at the same time, and we need to do so correctly!

With that caveat out of the way, the implementation is here. The user has to supply an annoyingly-large state machine to manage the storage for the forwarding word; Guile’s is here. Basically, a thread will try to claim the object by swapping in a busy value (-1) for the initial word. If that worked, it will allocate space for the object. If that failed, it first marks the object in place, then restores the first word. Otherwise it installs a forwarding pointer in the first word of the object’s old location, which has a specific tag in its low 3 bits allowing forwarded objects to be distinguished from other kinds of object.

I don’t know how to prove this kind of operation correct, and probably I should learn how to do so. I think it’s right, though, in the sense that either the object gets marked in place or evacuated, all edges get updated to the tospace locations, and the thread that shades the object grey (and no other thread) will enqueue the object for further tracing (via its new location if it was evacuated).

But there is an invisible bug, and one that is the reason for me writing these words :) Whichever thread manages to shade the object from white to grey will enqueue it on its grey worklist. Let’s say the object is on an block to be evacuated, but evacuation fails, and the object gets marked in place. But concurrently, another thread goes to do the same; it turns out there is a timeline in which the thread A has marked the object, published it to a worklist for tracing, but thread B has briefly swapped out the object’s the first word with the busy value before realizing the object was marked. The object might then be traced with its initial word stompled, which is totally invalid.

What’s the fix? I do not know. Probably I need to manage the state machine within the side array of mark bytes, and not split between the two places (mark byte and in-object). Anyway, I thought that readers of this web log might enjoy a look in the window of this clown car.

next?

The obvious question is, how does it perform? Basically I don’t know yet; I haven’t done enough testing, and some of the heuristics need tweaking. As it is, it appears to be a net improvement over the non-moving configuration and a marginal improvement over BDW, but which currently has more variance. I am deliberately imprecise here because I have been more focused on correctness than performance; measuring properly takes time, and as you can see from the story above, there are still a couple correctness issues. I will be sure to let folks know when I have something. Until then, happy hacking!

by Andy Wingo at Tuesday, July 8, 2025

Wednesday, June 11, 2025

Andy Wingo

whippet in guile hacklog: evacuation

Good evening, hackfolk. A quick note this evening to record a waypoint in my efforts to improve Guile’s memory manager.

So, I got Guile running on top of the Whippet API. This API can be implemented by a number of concrete garbage collector implementations. The implementation backed by the Boehm collector is fine, as expected. The implementation that uses the bump-pointer-allocation-into-holes strategy is less good. The minor reason is heap sizing heuristics; I still get it wrong about when to grow the heap and when not to do so. But the major reason is that non-moving Immix collectors appear to have pathological fragmentation characteristics.

Fragmentation, for our purposes, is memory under the control of the GC which was free after the previous collection, but which the current cycle failed to use for allocation. I have the feeling that for the non-moving Immix-family collector implementations, fragmentation is much higher than for size-segregated freelist-based mark-sweep collectors. For an allocation of, say, 1024 bytes, the collector might have to scan over many smaller holes until you find a hole that is big enough. This wastes free memory. Fragmentation memory is not gone—it is still available for allocation!—but it won’t be allocatable until after the current cycle when we visit all holes again. In Immix, fragmentation wastes allocatable memory during a cycle, hastening collection and causing more frequent whole-heap traversals.

The value proposition of Immix is that if there is too much fragmentation, you can just go into evacuating mode, and probably improve things. I still buy it. However I don’t think that non-moving Immix is a winner. I still need to do more science to know for sure. I need to fix Guile to support the stack-conservative, heap-precise version of the Immix-family collector which will allow for evacuation.

So that’s where I’m at: a load of gnarly Guile refactors to allow for precise tracing of the heap. I probably have another couple weeks left until I can run some tests. Fingers crossed; we’ll see!

by Andy Wingo at Wednesday, June 11, 2025

Monday, June 9, 2025

Scheme Requests for Implementation

SRFI 263: Prototype Object System

SRFI 263 is now in draft status.

This SRFI proposes a "Self"-inspired prototype object system. Such an object system works by having prototype objects that are cloned repeatedly to modify, extend, and use them, and is interacted with by passing messages.

by Daniel Ziltener at Monday, June 9, 2025

Wednesday, June 4, 2025

spritely.institute

Goblinville: A Spring Lisp Game Jam 2025 retrospective

Spritely participates in the Lisp Game Jam to make interactive artifacts demonstrating our progress building out our tech stack. The 2025 edition of the Spring Lisp Game Jam recently wrapped up and this time around we were finally able to show off using both Hoot and Goblins together to create a multiplayer virtual world demo! Now that we’ve had a moment to breathe, it’s time to share what we built and reflect on the experience.

But first, some stats about the jam overall.

Jam stats

Out of 26 total entries, 7 were made with Guile Scheme, including ours. Of those 7, all but one used Hoot, our Scheme to WebAssembly compiler. Guile tied for first place with Fennel as the most used Lisp implementation for the jam. We’re thrilled to see that Guile and Hoot have become popular choices for this jam!

Though many entries used Hoot, our entry was the only one that used Goblins, our distributed programming framework. However, David Wilson of System Crafters gets an honorable mention because he streamed several times throughout the jam while working on a MUD built with Goblins that was ultimately unsubmitted.

Our entry was Goblinville and it was rated the 7th best game in the jam overall. Not bad!

About Goblinville

Goblinville is a 2D, multiplayer, virtual world demo. During last year’s Spring Lisp Game Jam we made Cirkoban with a restricted subset of Goblins that had no network functionality. Since then, we’ve made a lot of progress porting Goblins to Hoot, culminating with the Goblins 0.15.0 release in January that featured OCapN working in the web browser using WebSockets.

Given all of this progress, we really wanted to show off a networked game this time. Making a multiplayer game for a jam is generally considered a bad idea, but Spritely is all about building networked communities so that’s what we set out to do. Our goal was to make something of a spiritual successor to the community garden demo I made when I first joined Spritely.

Screenshot of Jessica Tallon inGoblinville

What went well

First, let’s reflect on the good stuff. Here’s what went well:

  • Having participated in this jam a number of times, we have gotten pretty good at scoping projects down into something achievable.

  • Goblins made it easy to describe the game world as a collection of actors that communicate asynchronously. Initially, the entire world was hosted inside a single web browser tab. Once enough essential actors were implemented it was a simple task to push most of those actors into a separate server process. Since sending a message to a Goblins actor is the same whether it is local or remote, this change required little more than setting up an OCapN connection.

  • Communicating with actors over OCapN really helped with creating an architecture that separated server state from client-side input and rendering concerns. This was harder to think about with Cirkoban because there was no network separation.

  • The Hoot game jam template made it easy to get started quickly. It had been a year since we made our last game, so having a small template project was useful while we were refreshing our memory about the various Web APIs we needed to use.

  • The vast amount of freely licensed Liberated Pixel Cup (something our Executive Director Christine Lemmer-Webber organized back in her days at Creative Commons) assets allowed us to focus on the code while still having pleasing graphics that felt unified.

As a bonus, David Wilson gave Goblinville a shout out on a System Crafters stream and a bunch of people joined the server while I was online! It was a really cool moment.

Screenshot of six Goblinville players on screen at once

What didn’t go so well

Game jams are fast paced (even though the Lisp Game Jam is more relaxed than the average jam) and not everything goes according to plan. A big part of the game jam experience is to practice adjusting project scope as difficulties arise. Issues with the project included:

  • Time pressure. Unfortunately, we didn’t have as much time to dedicate to this project that we would have liked. We weren’t able to start work until the Monday after the jam started, so we only had 7 days instead of 10. Also, I came down with a cold at the end of the week which didn’t help my productivity. Making something that felt as polished as Cirkoban simply wasn’t possible.

  • Lack of persistence for the game world. There’s still some amount of pre-planning that goes into writing actors that can persist that we didn’t have time for. Furthermore, while our persistence system is written to support incremental updates, we don’t have a storage backend that supports it yet. Each tick of the game world would trigger a full re-serialization and we felt that was too much of a performance penalty. We hope that by the next jam this will no longer be an issue.

  • As predicted, multiplayer increased overall complexity. What felt like a stable enough world during local testing was quickly shown to have several performance issues and bugs once it was released to the public and other people started using it. We had to restart the server once every day or so during the jam rating period (though we have resolved these issues in a post-jam update). Since we weren’t persisting the game world, each restart wiped out all registered players and the state of the map.

  • No client-side prediction to mask lag. For example, when you press an arrow key to move, you won’t see the player sprite move in the client until it receives a notification from the server that the move was valid. In other words, how responsive the controls feel is directly tied to server lag. A production game client would move the player immediately and fix things up later if it receives contradictory information from the server.

Screenshot of 3 Goblinville players online, user “djm” is saying“hi!”

Post-jam updates

We did a bit of additional work after the jam was over to sand some of the roughest edges:

  • Re-architected the server update loop to greatly reduce message volume. Because it was simple to implement, actors in the game world were being sent a tick message at 60Hz to update their internal state. Most of the time, the actors would simply do nothing. A plant that is done growing has nothing left to do, so that’s 60 wasteful messages per second per plant. Instead, a timer system was added to schedule things to happen after so many ticks of the game world and the tick method was removed from all game objects. This greatly improved server stability, especially for worlds with lots of live objects. As of writing, we’ve had a server running for six days without any noticeable increase in lag.

  • Added a server event log. It was hard to see what was going on in the world during the jam rating period without being connected to the graphical client. Now the server process emits a timestamped log of every event to standard output.

  • Added character sprite selection. This feature just barely missed the jam submission deadline, but it’s in now! Instead of all players being the same sprite, there are now six to choose from.

  • Took down the public server. For the jam submission version, we had baked a URI into the itch.io client to a public server we were hosting so the game would “just work”. This was particularly important for the other participants who were rating the submitted games and giving feedback. Since the jam rating period is now over, we took down the public server. If you’re interested in trying out Goblinville, you can follow the instructions in the README to host your own server.

Also, Spritely co-founder Randy Farmer stopped by our updated Goblinville world!

Screenshot of current Spritely staff plus co-founder Randy Farmer inGoblinville

Wrapping up

Goblinville turned out to be more of a tech demo than a true game, but we’re quite happy with the result. We think it’s a good demonstration of what can be built with Goblins and Hoot in a short amount of time. We hope to build on this success to create even more engaging, featureful demos in the future!

by Dave Thompson (contact@spritely.institute) at Wednesday, June 4, 2025

Tuesday, May 27, 2025

Idiomdrottning

Endless scroll

The “endless scroll” debate was after it replaced pages where you’d scroll scroll scroll click, scroll scroll scroll click, scroll scroll scroll click. That was annoying while still not actually stemming addiction (at least for me). I’d still read through those megathreads on RPG.net, UI annoyances or no. The endless scroll it just took the clicks out of that process which was an improvement. But what I want is instead taking scrolls out of the process! So it’s tap, tap, tap, tap—like an ebook!

Probably going to be just as addictive but I won’t get anxiety from all the scrolling.

Scrolling and panning is fiddly and I never get exactly the right amount of page scrolled it’s like threding a needle repeatedly and most psge down algos are no good either since they’re paging in a text format that’s not designed for pages so you have to read the same couple of lines twice, last on this page and first on the next. So in the future maybe we’ll render HTML as actual pages (after all, epub readers can [sorta] do it). Even less and more on Unix can do it; they show all of one page, then all of the next page separately and so on. The weaksauce nature of page down in GUI apps like Netscape was one of the biggest letdowns when I first started using them in the nineties.

However, the addiction dark pattern has another component; the endless and often junky content which really makes the scroll endless. That part can not stay.

That’s a secondary reason for why I don’t like discover algorithms on Mastodon, the primary reason being how it’s artificial virality.

by Idiomdrottning (sandra.snan@idiomdrottning.org) at Tuesday, May 27, 2025

Thursday, May 22, 2025

Andy Wingo

whippet lab notebook: guile, heuristics, and heap growth

Greets all! Another brief note today. I have gotten Guile working with one of the Nofl-based collectors, specifically the one that scans all edges conservatively (heap-conservative-mmc / heap-conservative-parallel-mmc). Hurrah!

It was a pleasant surprise how easy it was to switch—from the user’s point of view, you just pass --with-gc=heap-conservative-parallel-mmc to Guile’s build (on the wip-whippet branch); when developing I also pass --with-gc-debug, and I had a couple bugs to fix—but, but, there are still some issues. Today’s note thinks through the ones related to heap sizing heuristics.

growable heaps

Whippet has three heap sizing strategies: fixed, growable, and adaptive (MemBalancer). The adaptive policy is the one I would like in the long term; it will grow the heap for processes with a high allocation rate, and shrink when they go idle. However I won’t really be able to test heap shrinking until I get precise tracing of heap edges, which will allow me to evacuate sparse blocks.

So for now, Guile uses the growable policy, which attempts to size the heap so it is at least as large as the live data size, times some multiplier. The multiplier currently defaults to 1.75×, but can be set on the command line via the GUILE_GC_OPTIONS environment variable. For example to set an initial heap size of 10 megabytes and a 4× multiplier, you would set GUILE_GC_OPTIONS=heap-size-multiplier=4,heap-size=10M.

Anyway, I have run into problems! The fundamental issue is fragmentation. Consider a 10MB growable heap with a 2× multiplier, consisting of a sequence of 16-byte objects followed by 16-byte holes. You go to allocate a 32-byte object. This is a small object (8192 bytes or less), and so it goes in the Nofl space. A Nofl mutator holds on to a block from the list of sweepable blocks, and will sequentially scan that block to find holes. However, each hole is only 16 bytes, so we can’t fit our 32-byte object: we finish with the current block, grab another one, repeat until no blocks are left and we cause GC. GC runs, and after collection we have an opportunity to grow the heap: but the heap size is already twice the live object size, so the heuristics say we’re all good, no resize needed, leading to the same sweep again, leading to a livelock.

I actually ran into this case during Guile’s bootstrap, while allocating a 7072-byte vector. So it’s a thing that needs fixing!

observations

The root of the problem is fragmentation. One way to solve the problem is to remove fragmentation; using a semi-space collector comprehensively resolves the issue, modulo any block-level fragmentation.

However, let’s say you have to live with fragmentation, for example because your heap has ambiguous edges that need to be traced conservatively. What can we do? Raising the heap multiplier is an effective mitigation, as it increases the average hole size, but for it to be a comprehensive solution in e.g. the case of 16-byte live objects equally interspersed with holes, you would need a multiplier of 512× to ensure that the largest 8192-byte “small” objects will find a hole. I could live with 2× or something, but 512× is too much.

We could consider changing the heap organization entirely. For example, most mark-sweep collectors (BDW-GC included) partition the heap into blocks whose allocations are of the same size, so you might have some blocks that only hold 16-byte allocations. It is theoretically possible to run into the same issue, though, if each block only has one live object, and the necessary multiplier that would “allow” for more empty blocks to be allocated is of the same order (256× for 4096-byte blocks each with a single 16-byte allocation, or even 4096× if your blocks are page-sized and you have 64kB pages).

My conclusion is that practically speaking, if you can’t deal with fragmentation, then it is impossible to just rely on a heap multiplier to size your heap. It is certainly an error to live-lock the process, hoping that some other thread mutates the graph in such a way to free up a suitable hole. At the same time, if you have configured your heap to be growable at run-time, it would be bad policy to fail an allocation, just because you calculated that the heap is big enough already.

It’s a shame, because we lose a mooring on reality: “how big will my heap get” becomes an unanswerable question because the heap might grow in response to fragmentation, which is not deterministic if there are threads around, and so we can’t reliably compare performance between different configurations. Ah well. If reliability is a goal, I think one needs to allow for evacuation, one way or another.

for nofl?

In this concrete case, I am still working on a solution. It’s going to be heuristic, which is a bit of a disappointment, but here we are.

My initial thought has two parts. Firstly, if the heap is growable but cannot defragment, then we need to reserve some empty blocks after each collection, even if reserving them would grow the heap beyond the configured heap size multiplier. In that way we will always be able to allocate into the Nofl space after a collection, because there will always be some empty blocks. How many empties? Who knows. Currently Nofl blocks are 64 kB, and the largest “small object” is 8kB. I’ll probably try some constant multiplier of the heap size.

The second thought is that searching through the entire heap for a hole is a silly way for the mutator to spend its time. Immix will reserve a block for overflow allocation: if a medium-sized allocation (more than 256B and less than 8192B) fails because no hole in the current block is big enough—note that Immix’s holes have 128B granularity—then the allocation goes to a dedicated overflow block, which is taken from the empty block set. This reduces fragmentation (holes which were not used for allocation because they were too small).

Nofl should probably do the same, but given its finer granularity, it might be better to sweep over a variable number of blocks, for example based on the logarithm of the allocation size; one could instead sweep over clz(min-size)–clz(size) blocks before taking from the empty block list, which would at least bound the sweeping work of any given allocation.

fin

Welp, just wanted to get this out of my head. So far, my experience with this Nofl-based heap configuration is mostly colored by live-locks, and otherwise its implementation of a growable heap sizing policy seems to be more tight-fisted regarding memory allocation than BDW-GC’s implementation. I am optimistic though that I will be able to get precise tracing sometime soon, as measured in development time; the problem as always is fragmentation, in that I don’t have a hole in my calendar at the moment. Until then, sweep on Wayne, cons on Garth, onwards and upwards!

by Andy Wingo at Thursday, May 22, 2025

Wednesday, May 21, 2025

spritely.institute

Functional hash tables explained

Prologue: The quest for a functional hash table for Goblins

For those of us that use the Lisp family of programming languages, we have much appreciation for the humble pair. Using pairs, we can construct singly-linked lists and key/value mappings called association lists. Lists built from pairs have the pleasant property of being immutable (if you abstain from using setters!) and persistent: extending a list with a new element creates a new list that shares all the data from the original list. Adding to a list is a constant time operation, but lookup is linear time. Thus, lists are not appropriate when we need constant time lookup. For that, we need hash tables.

The classic hash table is a mutable data structure and one that our Lisp of choice (Guile, a Scheme implementation) includes in its standard library, like most languages. Hash tables are neither immutable nor persistent; adding or removing a new key/value pair to/from a hash table performs an in-place modification of the underlying memory. Mutable data structures introduce an entire class of bug possibilities that immutable data structures avoid, and they’re particularly tricky to use successfully in a multi-threaded program.

Fortunately, there exists an immutable, persistent data structure that is suitable: the Hash Array Mapped Trie or HAMT, for short. This data structure was introduced by Phil Bagwell in the paper “Ideal Hash Trees� (2001). HAMTs were popularized in the Lisp world by Clojure over a decade ago. Unfortunately, Guile does not currently provide a HAMT-based functional hash table in its standard library.

Instead, Guile comes with the VList, another one of Phil Bagwell’s creations which can be used to build a VHash. Goblins currently uses VLists for its functional hash table needs, but HAMTs are better suited for the task.

There are various implementations of HAMTs in Scheme floating about, but none of them seem to have notable adoption in a major Guile project. So, we thought we’d write our own that we could ensure meets the needs of Goblins, compiles on Hoot, and that might just be useful enough to send upstream for inclusion into Guile after it’s been battle tested. To that end, we recently added the (goblins utils hashmap) module. This will become the base for a future re-implementation of the ^ghash actor.

Okay, enough context. Let’s talk about HAMTs!

What the hoot is a HAMT anyway?

From the outside, HAMTs can seem rather mysterious and intimidating. That’s how I felt about them, at least. However, once I dug into the topic and started writing some code, I was pleasantly surprised that the essential details were not too complicated.

As mentioned previously, the HAMT is an immutable, persistent data structure that associates keys with their respective values, just like a regular hash table. By utilizing a special kind of tree known as a “trie� plus some nifty bit shifting tricks, HAMTs achieve effectively constant time insertion, deletion, and lookup despite all operations being logarithmic time on paper.

A trie differs from a tree in the following way: tries only store keys in their leaf nodes. If you’re familiar with binary search trees, tries aren’t like that. Instead, the key itself (or the hash thereof, in our case) encodes the path through the trie to the appropriate leaf node containing the associated value. Tries are also called “prefix trees� for this reason.

A binary tree node can have at most two children, but HAMTs have a much larger branching factor (typically 32). These tries are wide and shallow, so few nodes need to be traversed to find the value for any given key. This is why the logarithmic time complexity of HAMT operations can be treated as if it were constant time in practice.

Trie representation

To explain, let’s start by defining a trie node using a branching factor of 32. At its simplest, we can think of a trie node as a 32-element array. Each element of the trie can contain either a leaf node (a key/value pair) or a pointer to a subtrie (another 32 element array).

For example, a HAMT with 3 entries might look like this:

Example trie visualization

This is nice and simple, but it’s a little too simple. With such a large branching factor, it’s likely that many boxes in the trie will have nothing in them, like in the above example. This is a waste of space. The issue is further compounded by immutability: adding or removing a single element requires allocating a new 32 element array. For the sake of efficiency, we need to do better.

Instead, we’ll use a sparse array that only stores the occupied elements of the trie node. To keep track of which elements of the theoretical 32 element array are occupied, we’ll use a bitmap. Below is an example trie:

Example trie visualization

In the above example, bits 4 and 10 (starting from the right) are set. This means that of the 32 possible elements, only 2 are currently occupied. Thus, the size of the underlying array is 2.

To get the number of occupied elements in the trie node, we simply count the number of 1s in the bitmap. This is known as the “population count�. (We could also just check the size of the array, but this bit counting idea is about to have another important use.)

To retrieve the value at index 10, we need to perform a translation to get an index into the underlying array. To do this, we compute the population count of the bits set to the right of bit 10. There is 1 bit set: bit 4. Thus, the value we’re looking for is stored at index 1 in the underlying array. If we take a peek, we find bar → 2 there.

Thanks to this sparse storage technique, insertion/deletion operations will allocate less memory overall.

Insertion algorithm

With the trie representation out of the way, let’s walk through the algorithm to insert a new key/value pair. The insertion algorithm covers all the essential aspects of working with a HAMT.

Let’s say we want insert the mapping foo→42 into an empty HAMT. The empty HAMT consists of a single node with an empty bitmap and an empty array:

Empty trie

To figure out where to store the key foo, we first need to compute the hash code for it. For ease of demonstration, we’ll use 10-bit hash codes. (In practice, 32 bits or more is ideal.)

Let’s say our fictitious hash function produces the hash bits 1000010001 for foo.

Each trie node has 32 possible branches. Thus, the range of indices for a node can be represented using a 5-bit unsigned integer. What if we took the 5 most significant bits of the hash code and used that as our index into the trie? Wow, that sounds like a clever idea! Let’s do that!

Hash bits for foo

So, foo gets inserted at index 16. The original trie is empty, so we’ll make a new trie with a single leaf node. To do this, we need to set bit 16 in our bitmap and create an array with just one key/value pair in it. Our output trie looks like this:

Single level trie with one leaf node

Note that we only examined 5 bits of the hash code. We only need to examine as many bits in the hash as it takes to find an empty element.

Let’s insert another key/value pair, this time for key bar and value 17. Our made-up hash code for bar is 0100100001. Repeating the process above, the most significant 5 bits are 01001, so our index is 9, another unoccupied index. Our new trie looks like this:

Single level trie with two leaf nodes

Because 9 < 16, the entry for bar is stored in array index 0, followed by foo.

As a last example, let’s insert a key/value pair where the most significant bits of the hash code collide with an existing entry. This time, the key is baz and the value is 66. Our made-up hash code for baz is 1000000001.

The most significant 5 bits are 10000, so our index is 16. Now things get interesting because 16 is already occupied; the first 5 bits are not enough to distinguish between the keys foo and baz! To resolve this, we’ll replace the leaf node with a subtrie that uses the next 5 bits when calculating indices. The resulting root trie node will look sort of like this:

Subtrie placeholder

In the figure above, subtrie is a placeholder for the new trie we need to construct to hold the mappings for foo and baz.

To create the subtrie, we just recursively apply the same algorithm we’ve already been using but with different bits. Now we’re looking at these the least significant 5 bits of the hash codes:

Hash bits for foo and bar

The index for foo is 17, and the index for baz is 1. So, our new subtrie will have bits 1 and 17 set, and contain 2 leaf nodes:

Subtrie with two leaf nodes

Putting it all together, the complete trie looks like this:

Complete trie with a subtrie and three total leaf nodes

Each insertion creates a new trie. The new trie shares nearly all of the data with the original trie. Only the root node and the visited subtries need to be allocated afresh. This makes insertion even into large HAMTs quite efficient.

The partial hash collision described above could happen recursively, in which case the trie will grow yet another level deeper upon each iteration. The worst case scenario is that two or more keys have the same exact hash code. A good hash function and lots of hash bits will make this a very rare occurrence, but a robust insertion algorithm needs to account for this case. We’ll gloss over this edge case here, but hash collisions can be handled by “bottoming out� to a special leaf node: a linked list of the colliding key/value pairs. These lists need to be traversed linearly when necessary upon lookup or future insertions that cause more collisions. In practice, these collision lists tend to be quite short, often only of length 2.

Wrapping up

The insertion algorithm explains all of the essential elements of a HAMT and how to work with its sparse storage structure, but we’ll touch upon lookup and deletion very briefly.

Looking up the value for a key follows the same basic process as insertion, but in a read-only manner. If the hash bits point to an unoccupied element of a trie node then the search fails right then and there. If the bits point to a leaf node with a matching key, the search succeeds. If the key doesn’t match, the search fails. Finally, if the bits point to a subtrie, we recursively search that subtrie with the next set of bits.

Deletion is the most complicated operation as it involves path compression when subtries become empty. If you’ve been successfully nerd sniped by this blog post then consider this a homework assignment. Give Phil Bagwell’s paper a read! 🙂

Thanks for following along! Hope this was fun!

by Dave Thompson (contact@spritely.institute) at Wednesday, May 21, 2025

Saturday, May 17, 2025

The Racket Blog

Racket v8.17

posted by Stephen De Gabrielle


We are pleased to announce Racket v8.17 is now available from https://download.racket-lang.org/.

As of this release:

  • The new drracket-core package provides a version of drracket with a smaller set of dependencies.
  • Typed Racket has support for treelists.
  • The package manager computes checksums for packages when required, allowing the use and automatic upgrade of packages without them.
  • The bitwise-first-bit-set function returns the smallest bit that is set in the twos-complement representation of the given number.
  • The updated dynamic-require function makes it easier to use syntax bindings by allowing a syntax-thunk (or ’eval) to be used for them.
  • The error-module-path->string-handler parameter allows the customization of the display of module-paths in error messages.
  • Precision of certain numeric functions (sin, cos, and others) is improved on Windows platforms by using the MSVCRT/UCRT libraries.
  • The string-append function has improved performance and reduced memory use for long lists of strings in the Racket CS implementation. Differences are clearly noticeable for lists of length 1 million.
  • TCP ports use SO_KEEPALIVE, instructing the kernel to send periodic messages while waiting for data to check whether the connection is still responsive.
  • Racket code using a terminal in Windows can receive mouse events as virtual terminal characters after using SetConsoleMode. (This is also already possible on macOS and Linux.) See the tui-term package for related example code.
  • The #:replace-malformed-surrogate? keyword can be used to specify a replacement for malformed unicode surrogates in JSON input
  • The http-client module no longer sends “Content-Length: 0” for requests without a body.
  • The demodularizer (compiler/demod) can prune more unused assignments.
  • Several judgment rendering forms in Redex are replaced by functions, allowing more convenient abstraction.
  • When a distribution includes no teaching languages, DrRacket’s language-dialog configuration moves into the preferences dialog and the “Language” menu disappears.
  • The math library has better support for block-diagonal matrices, including both Racket and Typed Racket.
  • The math library contains improved implementations of acos and matrix-(cos-)angle.
  • The stepper again works for big-bang programs.
  • There are many other repairs and documentation imprevements!

Thank you

The following people contributed to this release:

Alexander Shopov, Andrei Dorian Duma, Bert De Ketelaere, Bob Burger, Bogdan Popa, Bogdana Vereha, Cameron Moy, Chung-chieh Shan, Cutie Deng, D. Ben Knoble, Dario Hamidi, Dominik Pantůček, Gustavo Massaccesi, halfminami, Jacqueline Firth, Jason Hemann, Jens Axel Søgaard, Joel Dueck, John Clements, Jordan Harman, Marc Nieper-Wißkirchen, Matthew Flatt, Matthias Felleisen, Mike Sperber, Noah Ma, owaddell-ib, Philippe Meunier, Robby Findler, Ryan Culpepper, Ryan Ficklin, Sam Phillips, Sam Tobin-Hochstadt, Shu-Hung You, sogaiu, Sorawee Porncharoenwase, Stephen De Gabrielle, Vincent Lee, and Wing Hei Chan.

Racket is a community developed open source project and we welcome new contributors. See racket/README.md to learn how you can be a part of this amazing project.

Feedback Welcome

Questions and discussion welcome at the Racket community on Discourse or Discord.

Please share

If you can - please help get the word out to users and platform specific repo packagers

Racket - the Language-Oriented Programming Language - version 8.17 is now available from https://download.racket-lang.org

See https://blog.racket-lang.org/2025/05/racket-v8-17.html for the release announcement and highlights.

by John Clements, Stephen De Gabrielle at Saturday, May 17, 2025

Thursday, May 15, 2025

Andy Wingo

guile on whippet waypoint: goodbye, bdw-gc?

Hey all, just a lab notebook entry today. I’ve been working on the Whippet GC library for about three years now, learning a lot on the way. The goal has always been to replace Guile’s use of the Boehm-Demers-Weiser collector with something more modern and maintainable. Last year I finally got to the point that I felt Whippet was feature-complete, and taking into account the old adage about long arses and brief videos, I think that wasn’t too far off. I carved out some time this spring and for the last month have been integrating Whippet into Guile in anger, on the wip-whippet branch.

the haps

Well, today I removed the last direct usage of the BDW collector’s API by Guile! Instead, Guile uses Whippet’s API any time it needs to allocate an object, add or remove a thread from the active set, identify the set of roots for a collection, and so on. Most tracing is still conservative, but this will move to be more precise over time. I haven’t had the temerity to actually try one of the Nofl-based collectors yet, but that will come soon.

Code-wise, the initial import of Whippet added some 18K lines to Guile’s repository, as counted by git diff --stat, which includes documentation and other files. There was an unspeakable amount of autotomfoolery to get Whippet in Guile’s ancient build system. Changes to Whippet during the course of integration added another 500 lines or so. Integration of Whippet removed around 3K lines of C from Guile. It’s not a pure experiment, as my branch is also a major version bump and so has the freedom to refactor and simplify some things.

Things are better but not perfect. Notably, I switched to build weak hash tables in terms of buckets and chains where the links are ephemerons, which give me concurrent lock-free reads and writes but not resizable tables. I would like to somehow resize these tables in response to GC, but haven’t wired it up yet.

Anyway, next waypoint will be trying out the version of Whippet’s Nofl-based mostly-marking collector that traces all heap edges conservatively. If that works... well if that works... I don’t dare to hope! We will see what we get when that happens. Until then, happy hacking!

by Andy Wingo at Thursday, May 15, 2025

Friday, May 9, 2025

Andy Wingo

a whippet waypoint

Hey peoples! Tonight, some meta-words. As you know I am fascinated by compilers and language implementations, and I just want to know all the things and implement all the fun stuff: intermediate representations, flow-sensitive source-to-source optimization passes, register allocation, instruction selection, garbage collection, all of that.

It started long ago with a combination of curiosity and a hubris to satisfy that curiosity. The usual way to slake such a thirst is structured higher education followed by industry apprenticeship, but for whatever reason my path sent me through a nuclear engineering bachelor’s program instead of computer science, and continuing that path was so distasteful that I noped out all the way to rural Namibia for a couple years.

Fast-forward, after 20 years in the programming industry, and having picked up some language implementation experience, a few years ago I returned to garbage collection. I have a good level of language implementation chops but never wrote a memory manager, and Guile’s performance was limited by its use of the Boehm collector. I had been on the lookout for something that could help, and when I learned of Immix it seemed to me that the only thing missing was an appropriate implementation for Guile, and hey I could do that!

whippet

I started with the idea of an MMTk-style interface to a memory manager that was abstract enough to be implemented by a variety of different collection algorithms. This kind of abstraction is important, because in this domain it’s easy to convince oneself that a given algorithm is amazing, just based on vibes; to stay grounded, I find I always need to compare what I am doing to some fixed point of reference. This GC implementation effort grew into Whippet, but as it did so a funny thing happened: the mark-sweep collector that I prototyped as a direct replacement for the Boehm collector maintained mark bits in a side table, which I realized was a suitable substrate for Immix-inspired bump-pointer allocation into holes. I ended up building on that to develop an Immix collector, but without lines: instead each granule of allocation (16 bytes for a 64-bit system) is its own line.

regions?

The Immix paper is funny, because it defines itself as a new class of mark-region collector, fundamentally different from the three other fundamental algorithms (mark-sweep, mark-compact, and evacuation). Immix’s regions are blocks (64kB coarse-grained heap divisions) and lines (128B “fine-grained” divisions); the innovation (for me) is the optimistic evacuation discipline by which one can potentially defragment a block without a second pass over the heap, while also allowing for bump-pointer allocation. See the papers for the deets!

However what, really, are the regions referred to by mark-region? If they are blocks, then the concept is trivial: everyone has a block-structured heap these days. If they are spans of lines, well, how does one choose a line size? As I understand it, Immix’s choice of 128 bytes was to be fine-grained enough to not lose too much space to fragmentation, while also being coarse enough to be eagerly swept during the GC pause.

This constraint was odd, to me; all of the mark-sweep systems I have ever dealt with have had lazy or concurrent sweeping, so the lower bound on the line size to me had little meaning. Indeed, as one reads papers in this domain, it is hard to know the real from the rhetorical; the review process prizes novelty over nuance. Anyway. What if we cranked the precision dial to 16 instead, and had a line per granule?

That was the process that led me to Nofl. It is a space in a collector that came from mark-sweep with a side table, but instead uses the side table for bump-pointer allocation. Or you could see it as an Immix whose line size is 16 bytes; it’s certainly easier to explain it that way, and that’s the tack I took in a recent paper submission to ISMM’25.

paper??!?

Wait what! I have a fine job in industry and a blog, why write a paper? Gosh I have meditated on this for a long time and the answers are very silly. Firstly, one of my language communities is Scheme, which was a research hotbed some 20-25 years ago, which means many practitioners—people I would be pleased to call peers—came up through the PhD factories and published many interesting results in academic venues. These are the folks I like to hang out with! This is also what academic conferences are, chances to shoot the shit with far-flung fellows. In Scheme this is fine, my work on Guile is enough to pay the intellectual cover charge, but I need more, and in the field of GC I am not a proven player. So I did an atypical thing, which is to cosplay at being an independent researcher without having first been a dependent researcher, and just solo-submit a paper. Kids: if you see yourself here, just go get a doctorate. It is not easy but I can only think it is a much more direct path to goal.

And the result? Well, friends, it is this blog post :) I got the usual assortment of review feedback, from the very sympathetic to the less so, but ultimately people were confused by leading with a comparison to Immix but ending without an evaluation against Immix. This is fair and the paper does not mention that, you know, I don’t have an Immix lying around. To my eyes it was a good paper, an 80% paper, but, you know, just a try. I’ll try again sometime.

In the meantime, I am driving towards getting Whippet into Guile. I am hoping that sometime next week I will have excised all the uses of the BDW (Boehm GC) API in Guile, which will finally allow for testing Nofl in more than a laboratory environment. Onwards and upwards!

by Andy Wingo at Friday, May 9, 2025

Wednesday, May 7, 2025

Scheme Requests for Implementation

SRFI 262: Extensible pattern matcher

SRFI 262 is now in draft status.

A pattern matching form which can operate on arbitrary Scheme values is defined. It conforms to the following design principles.

The syntax of patterns is declarative. The syntax is extensible in the same way that Scheme’s procedural syntax is extensible by macros.

For most use cases, the use of the pattern matcher should produce code which is essentially as efficient at run time as equivalent procedural Scheme code would be, assuming a moderately optimizing Scheme implementation. This applies only when the equivalent code is also equivalently correct in terms of handling of error cases (i.e. including all type checks done automatically by the pattern matcher). However, using extension syntax should not cause this principle to be violated (provided the extension syntax is appropriately implemented).

by Daphne Preston-Kendal at Wednesday, May 7, 2025

SRFI 261: Portable SRFI Library Reference

SRFI 261 is now in draft status.

This SRFI proposal addresses systemic compatibility issues exposed by the library reference format defined in SRFI 97 (srfi :<SRFI number> <identifier> ...) and advocates for a modernized, portable alternative: (srfi srfi-<SRFI number> <identifier> ...).

by WANG Zheng at Wednesday, May 7, 2025

SRFI 260: Generated Symbols

SRFI 260 is now in final status.

This SRFI defines the procedure generate-symbol. Each time it is invoked, the procedure returns a new symbol whose name cannot be guessed. The returned symbol is a standard symbol for all purposes; it obeys write/read invariance and it is equal to another symbol if and only if their names are spelled the same.

by Marc Nieper-Wißkirchen at Wednesday, May 7, 2025

SRFI 259: Tagged procedures with type safety

SRFI 259 is now in final status.

Tagged procedures are procedures with boxes attached, which can be used to create applicable records and other abstractions. This SRFI proposes a variant with the notion of a tagging protocol, analogous to a record type definition, for ensuring encapsulation and security for tagged procedures.

by Daphne Preston-Kendal at Wednesday, May 7, 2025

Tuesday, May 6, 2025

spritely.institute

Hoot 0.6.1 released!

We are excited to announce the release of Hoot 0.6.1! Hoot is a Scheme to WebAssembly compiler backend for Guile, as well as a general purpose WebAssembly toolchain. In other words, Scheme in the browser!

This small patch release contains bug fixes and a few new features added since the 0.6.0 release back in January. Just in time for the upcoming Lisp Game Jam!

Make a game with Hoot!

The 2025 edition of the Spring Lisp Game Jam starts on Friday, 5/9! The Lisp Game Jam is a casual, 10-day long event where participants build small games using the Lisp implementation of their choice. We’d like to encourage you to join the jam and try making a game with Hoot! Hoot is a great choice for game jams because web-based games are both easy to distribute and easy for other people to play.

To make it easy to jump right in and make a game with Hoot, we have a ready-to-go template repository on Codeberg that you can clone. It takes care of all the boilerplate and provides some useful bindings to essential web APIs like HTML5 Canvas.

Okay, game jam promotion over! Read on for the full release notes.

New features

  • New --bundle flag for guild compile-wasm that will automatically install runtime libraries alongside the generated binary (reflect.js, reflect.wasm, wtf8.wasm).

  • Integrated psyntax-based macro expander (hoot expander) into eval.

  • Add -g flag for controlling compiler debug level.

  • Changed current-module to require compilation with -g.

  • Changed (hoot environments) to use run-time module registry.

  • Added logcount and vector-move-left! procedures, part of Guile's default environment.

  • Added experimental (hoot time) module for date/time calculations (currently undocumented).

  • Internal make-port procedure now uses keyword arguments.

  • Keyword error constructors are now exported from Hoot's Wasm ABI.

  • Added external-function? and call-external to (hoot ffi).

  • Added quote-syntax to (hoot core-syntax) and (hoot syntax).

  • Documented (fibers streams) module.

Bug fixes

  • Fixed inexact->exact conversion for a subset of flonums.

  • Fixed number->string for radix 16.

  • Fixed current-time which was mistakenly using jiffies.

  • Fixed string-split when delimiter is the first character in the string.

  • Fixed string-utf8-length opcode emission.

  • Fixed define-values macro that was causing compilation errors on newer Guile Git builds.

  • Fixed cond-expand error condition.

  • Fixed Wasm linker handling of call_ref and return_call_ref.

  • Fixed Wasm linker handling of catch bodies.

  • Fixed ,wasm-eval metacommand.

  • Fixed NodeJS example in the manual. Thanks to Jelle Licht for the patch!

  • External references now have a printed representation.

  • Exported sleep from (fibers) to match upstream Fibers API.

Performance improvements

  • Optimized Wasm validation pass (validate-wasm).

  • Optimized Wasm linker (add-stdlib).

  • Optimized Wasm resolver (resolve-wasm).

  • Optimized Wasm emission in compiler backend (high-level-cps->wasm).

Browser compatibility

  • Compatible with Firefox 121 or later.

  • Compatible with Google Chrome 119 or later.

  • Compatibility with Safari/WebKit is still a bit complicated, unfortunately. We thought everything was in place when Hoot 0.6.0 was released, but then we found a bug in WebKit’s Wasm tail call implementation. This bug was fixed nearly two months ago but the fix has not yet propagated out to stable Safari and iOS releases, as far we’re aware. We verified that some of our larger Hoot programs are now working on a Git build of WebKit, so we are optimistic that Hoot will finally be supported on all major browsers soon.

Get Hoot 0.6.1!

Hoot is already available in GNU Guix:

$ guix pull
$ guix install guile-next guile-hoot

(Hoot currently requires a bleeding-edge version of Guile, hence guile-next above.)

Otherwise, Hoot can be built from source via our release tarball. See the Hoot homepage for a download link and GPG signature.

Documentation for Hoot 0.6.1, including build instructions, can be found here.

Get in touch!

For bug reports, pull requests, or just to follow along with development, check out the Hoot project on Codeberg.

If you build something cool with Hoot (for the upcoming game jam or otherwise), let us know on our community forum!

Special thanks to the MetaMask folks for funding this work!

Until next time, happy hooting! 🦉

by Dave Thompson (contact@spritely.institute) at Tuesday, May 6, 2025