Thursday, August 7, 2025

Andy Wingo

whippet hacklog: adding freelists to the no-freelist space

August greetings, comrades! Today I want to bookend some recent work on my Immix-inspired garbage collector: firstly, an idea with muddled results, then a slog through heuristics.

the big idea

My mostly-marking collector’s main space is called the “nofl space”. Its name comes from its historical evolution from mark-sweep to mark-region: instead of sweeping unused memory to freelists and allocating from those freelists, sweeping is interleaved with allocation; “nofl” means “no free-list”. As it finds holes, the collector bump-pointer allocates into those holes. If an allocation doesn’t fit into the current hole, the collector sweeps some more to find the next hole, possibly fetching another block. Space for holes that are too small is effectively wasted as fragmentation; mutators will try again after the next GC. Blocks with lots of holes will be chosen for opportunistic evacuation, which is the heap defragmentation mechanism.
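To make the allocation path concrete, here is a minimal sketch of bump-pointer allocation into a single hole. The `struct hole_allocator` type and the `bump_allocate` name are hypothetical, made up for illustration; the real nofl space carries more state (sweep position, block metadata and so on).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocator state: the current hole is [alloc, limit). */
struct hole_allocator {
  uintptr_t alloc;  /* next free byte in the current hole */
  uintptr_t limit;  /* one past the last byte of the current hole */
};

/* Try to carve `size` bytes out of the current hole.  Returns the
   address of the new object, or 0 if the hole is too small; in that
   case the caller would sweep onward to find the next hole. */
static uintptr_t bump_allocate(struct hole_allocator *a, size_t size) {
  if (a->limit - a->alloc < size)
    return 0;
  uintptr_t ret = a->alloc;
  a->alloc += size;
  return ret;
}
```

When `bump_allocate` returns 0, whatever remains between `alloc` and `limit` is the hole-too-small fragmentation: it stays unused until the next collection.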

Hole-too-small fragmentation has bothered me, because it presents a potential pathology. You don’t know how a GC will be used or what the user’s allocation pattern will be; if it is a mix of medium (say, a kilobyte) and small (say, 16 bytes) allocations, one could imagine a medium allocation having to sweep over lots of holes, discarding them in the process, which hastens the next collection. Seems wasteful, especially for non-moving configurations.

So I had a thought: why not collect those holes into a size-segregated freelist? We just cleared the hole, the memory is core-local, and we might as well. Then before fetching a new block, the allocator slow-path can see if it can service an allocation from the second-chance freelist of holes. This decreases locality a bit, but maybe it’s worth it.
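As a rough illustration of the idea, and not the code I actually wrote, a size-segregated second-chance freelist could look like the sketch below. All names here (`struct second_chance`, `remember_hole`, `second_chance_allocate`) and the choice of 64 size classes are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define GRANULE_SIZE 16
#define SIZE_CLASSES 64   /* hypothetical: track holes of 1..64 granules */

/* A free hole stores its freelist link in its own first word. */
struct hole { struct hole *next; };

struct second_chance {
  struct hole *heads[SIZE_CLASSES + 1];  /* indexed by granule count */
};

/* Remember a hole that the bump allocator had to skip over.  Holes
   outside the tracked size range are simply dropped in this sketch. */
static void remember_hole(struct second_chance *sc, void *addr, size_t bytes) {
  size_t granules = bytes / GRANULE_SIZE;
  if (granules == 0 || granules > SIZE_CLASSES)
    return;
  struct hole *h = addr;
  h->next = sc->heads[granules];
  sc->heads[granules] = h;
}

/* Take a hole from the smallest size class that fits the request.
   Any leftover tail of the hole is not reused in this sketch. */
static void *second_chance_allocate(struct second_chance *sc, size_t bytes) {
  size_t granules = (bytes + GRANULE_SIZE - 1) / GRANULE_SIZE;
  for (size_t i = granules; i <= SIZE_CLASSES; i++) {
    struct hole *h = sc->heads[i];
    if (h) {
      sc->heads[i] = h->next;
      return h;
    }
  }
  return NULL;
}
```

The allocator slow path would call `second_chance_allocate` before fetching a new block, and fall through to the usual path when it returns `NULL`.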

Thing is, I implemented it, and I don’t know if it’s worth it! It seems to interfere with evacuation, in that the blocks that would otherwise be most profitable to evacuate, because they contain many holes, are instead filled up with junk due to second-chance allocation from the freelist. I need to do more measurements, but I think my big-brained idea is a bit of a wash, at least if evacuation is enabled.

heap growth

When running the new collector in Guile, we have a performance oracle in the form of BDW: it had better be faster for Guile to compile a Scheme file with the new nofl-based collector than with BDW. In this use case we have an additional degree of freedom, in that unlike the lab tests of nofl vs BDW, we don’t impose a fixed heap size, and instead allow heuristics to determine the growth.

BDW’s built-in heap growth heuristics are very opaque. You give it a heap multiplier, but as a divisor truncated to an integer. It’s very imprecise. Additionally, there are nonlinearities: BDW is relatively more generous for smaller heaps, because it attempts to model and amortize tracing cost, and there are some fixed costs (thread sizes, static data sizes) that don’t depend on live data size.

Thing is, BDW’s heuristics work pretty well. For example, I had a process that ended with a heap of about 60M, for a peak live data size of 25M or so. If I ran my collector with a fixed heap multiplier, it wouldn’t do as well as BDW, because it collected much more frequently when the heap was smaller.

I ended up switching from the primitive “size the heap as a multiple of live data” strategy to live data plus a square root factor; this is like what Racket ended up doing in its simple implementation of MemBalancer. (I do have a proper implementation of MemBalancer, with time measurement and shrinking and all, but I haven’t put it through its paces yet.) With this fix I can meet BDW’s performance for my Guile-compiling-Guile-with-growable-heap workload. It would be nice to exceed BDW of course!
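For a feel of how a square-root policy behaves, here is a small sketch. The exact formula and constants are not spelled out above, so the shape used here (live size plus `scale` times the square root of the live size) and the `scale` knob are assumptions for illustration only. An integer square root is used so that the example does not depend on the math library.

```c
#include <stdint.h>

/* Integer square root (floor), by Newton's iteration. */
static uint64_t isqrt(uint64_t n) {
  if (n < 2)
    return n;
  uint64_t x = n;
  uint64_t y = x / 2 + (x & 1);   /* ceil(n / 2), without overflow */
  while (y < x) {
    x = y;
    y = (x + n / x) / 2;
  }
  return x;
}

/* Square-root policy: headroom grows with sqrt(live), so small heaps
   get proportionally more slack than large ones.  `scale` is a
   hypothetical tuning knob. */
static uint64_t heap_size_sqrt(uint64_t live_bytes, uint64_t scale) {
  return live_bytes + scale * isqrt(live_bytes);
}
```

With `scale` set to 1024, 1 MiB of live data yields a 2 MiB heap (an effective multiplier of 2), whereas 64 MiB of live data yields a 72 MiB heap (an effective multiplier of 1.125). That is the kind of nonlinearity that favors smaller heaps.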

parallel worklist tweaks

Previously, in parallel configurations, trace workers would each have a Chase-Lev deque to which they could publish objects needing tracing. Any worker could steal an object from the top of a worker’s public deque. Also, each worker had a local, unsynchronized FIFO worklist, some 1000 entries in length; when this worklist filled up, the worker would publish its contents.

There is a pathology for this kind of setup, in which one worker can end up with a lot of work that it never publishes. For example, if there are 100 long singly-linked lists on the heap, and the worker happens to have them all on its local FIFO, then perhaps they never get published, because the FIFO never overflows; you end up not parallelising. This seems to be the case in one microbenchmark. I switched to not have local worklists at all; perhaps this was not the right thing, but who knows. Will poke in future.

a hilarious bug

Sometimes you need to know whether a given address is in an object managed by the garbage collector. For the nofl space it’s pretty easy, as we have big slabs of memory; bisecting over the array of slabs is fast. But for large objects whose memory comes from the kernel, we don’t have that. (Yes, you can reserve a big ol’ region with PROT_NONE and such, and then allocate into that region; I don’t do that currently.)

Previously I had a splay tree for lookup. Splay trees are great but not so amenable to concurrent access, and parallel marking is one place where we need to do this lookup. So I prepare a sorted array before marking, and then bisect over that array.
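A minimal sketch of that kind of lookup, assuming a hypothetical `struct extent` record per large object and an array sorted by start address with no overlaps:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one large object's extent: [start, end). */
struct extent { uintptr_t start, end; };

/* Binary search over extents sorted by start address.  Returns the
   start address of the extent containing `addr`, or 0 if none does.
   The array is read-only here, so many threads can search at once. */
static uintptr_t lookup_extent(const struct extent *v, size_t n,
                               uintptr_t addr) {
  size_t lo = 0, hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (addr < v[mid].start)
      hi = mid;
    else if (addr >= v[mid].end)
      lo = mid + 1;
    else
      return v[mid].start;
  }
  return 0;
}
```

Because the array is prepared once before marking and never mutated during the trace, no synchronization is needed on the lookup path.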

Except a funny thing happened: I switched the bisect routine to return the start address if an address is in a region. Suddenly, weird failures started happening randomly. Turns out, in some places I was testing if bisection succeeded with an int; if the region’s start address happened to be aligned to a 2^32-byte boundary, then the nonzero 64-bit uintptr_t got truncated to its low 32 bits, which were zero. Yes, crusty reader, Rust would have caught this!
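The failure mode is easy to reproduce in isolation. The two functions below are hypothetical stand-ins for the call sites, not the actual code; `uint64_t` is used in place of `uintptr_t` so that the example behaves the same regardless of pointer width. Narrowing an out-of-range value to `int` is implementation-defined in C, but mainstream compilers keep the low 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy: the 64-bit result is narrowed to int before being tested. */
static bool found_buggy(uint64_t region_start) {
  int found = (int)region_start;   /* keeps only the low 32 bits */
  return found != 0;
}

/* Fixed: compare the full-width value against zero. */
static bool found_fixed(uint64_t region_start) {
  return region_start != 0;
}
```

For a start address such as 0x7f1200000000, whose low 32 bits are all zero, `found_buggy` reports failure even though the lookup succeeded; most other addresses work fine, which is why the failures looked random.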

fin

I want this new collector to work. Getting the growth heuristic good enough is a step forward. I am annoyed that second-chance allocation didn’t work out as well as I had hoped; perhaps I will find some time this fall to give a proper evaluation. In any case, thanks for reading, and hack at you later!

by Andy Wingo at Thursday, August 7, 2025

spritely.institute

Spritely Goblins v0.16.0 released!

We are excited to announce Spritely Goblins v0.16.0! This release of Goblins is faster than ever, with two major core speedups benefiting all Goblins-using programs! Furthermore, we have a brand new Unix Domain Socket netlayer, which means our OCapN protocol is now usable for efficient machine-local inter-process communication!

A new Unix Domain Sockets netlayer

Another new netlayer has come to Goblins, this time one based on Unix Domain Sockets! Unix domain sockets are ideal for communication between multiple processes running on the same machine. We think being able to use Goblins and OCapN to wire together a kind of efficient local inter-process communication is pretty neat!

Many users might be familiar with using Unix domain sockets via file paths on the system. However, since the file system on Unix(-like) systems uses ACLs, this can lead to security vulnerabilities via confused deputy attacks.

Our implementation uses a feature of Unix domain sockets which allows sockets to be sent and received over other sockets. We built an introduction server which you run on your system. You can think of it as a little OCaps kernel in amongst the modern ACL sea that our systems are built on today! The Unix domain socket netlayer can connect to one or multiple of these introduction servers and so long as two netlayers share the same introduction server, they can securely communicate with one another.

We look forward to seeing what you all use this new netlayer for... we have several exciting uses planned ourselves we hope to show off soon!

Speeding, speeding, speeding ahead!

When we're talking speedups in this release, we're not talking in mere single-digit percentage speed boosts. No, not even double-digit... keep going! Each of our two core speed boosts improves the speed of common Goblins operations 10-20x, benefiting all Goblins-using programs!

The short version is, spawn has gotten faster, and bcom and promises have also gotten faster, all of which are core to all Goblins programs. For the interested reader, we explain in further detail below. (This may go into more detail than many readers care for; feel free to skip past!)

Speeding up spawn by bypassing the elfs

Once upon a time, when Goblins was being created, a decision was made: debugging programs is important, and so all objects shall carry a debug name, and that debug name shall be, by default, the name of the procedure that constructed the actor! This was a sensible decision, and we believe, generally the correct one: it has served us well.

This decision was made long ago, in early days when Goblins was a Racket library, and we began to focus on speed much more after the port to Guile. But unfortunately, in Guile, asking a procedure "what is your name?" resulted in a journey to the land of elfs.

Or rather, that is all to say, calling procedure-name in Guile on every spawn, which we did for debuggability purposes, turns out to be painfully slow. And the reason it is slow is that normally procedure-name is only called when experimenting at the REPL or when printing a backtrace. While optimizing Goblins programs, we found that tracing an ordinary spawn (using Guile's lovely ,trace tool) was printing reams and reams of pages of lines of code. Guile's internal object file format is (perhaps surprising to some readers!) the very same ELF as, yes, the Executable and Linkable Format used by Linux executables! However, Guile uses this for different purposes; it turns out this format is just very well thought through, and Guile's lead dev Andy Wingo has a nice blogpost explaining why ELF was chosen. What this effectively meant is that ELF-parsing code would be executed all the time when simply trying to grab the name of a procedure while spawning an object.

What to do? Many paths were considered: We could try to optimize this code intended to be rarely-used in Guile itself, or cache the result and attach to the constructor somehow, or evaluate lazily. But each of these had problems: it was slow or otherwise complicated.

We could change spawn to be a macro, and grab the name referred to by the constructor at compile time. Alas, this had its own pitfall: this would break any case where spawn was already being used with apply.

The solution is to support both cases! Here is the new code for spawn:

;; When an actor is spawned and a name is not specified, we default to
;; the name of its constructor.  However, 'procedure-name' is very
;; slow and can involve parsing ELF for compiled code.  To speed
;; things up, we take advantage of the fact that actor constructors
;; are typically specified as identifiers in the source, so we can
;; simply use that identifier as the name.  To preserve the illusion
;; that 'spawn' is just a regular ol' procedure, there is identifier
;; syntax.
(define-syntax spawn
  (lambda (stx)
    (syntax-case stx ()
      ((_ constructor arg ...)          ; fast path
       (identifier? #'constructor)
       #'(spawn-named 'constructor constructor arg ...))
      ((_ constructor arg ...)          ; slow path
       #'(%spawn constructor arg ...))
      (id                               ; identifier syntax; also slow
       (identifier? #'id)
       #'%spawn))))

What this means is that when a Goblins program is compiled, most invocations of spawn will cleverly use the name of the constructor being passed in at compile time. But if this cannot be determined simply at compile time, or if spawn is to be invoked via apply or passed around as if it were a function, we fall back to using spawn as an ordinary procedure (i.e., fall back to the internal %spawn procedure, which calls procedure-name as normal).

We still love our friends the elfs, and upon occasion, some Goblins programs might journey into elf land, should they need their help to provide a simple debugging name. But most of the time, we can be much faster now, by looking around where we are at compile time!

Become your new you, faster than ever

Previously we discussed how Goblins actors got much faster with spawn, but this is only part of an actor's journey. First, we are born, and then, we grow and change based upon experience. So it is too with Goblins actors!

When an actor is spawned, its constructor returns what will be its first behavior. But actors may change their behavior based upon experience: in response to a message, a Goblins actor may choose to bcom (pronounced "become") a new version of its behavior.

An actor having many experiences (receiving many messages) may experience a large amount of change, and thus may invoke bcom a lot. The way bcom was implemented used pretty much the same sealers/unsealers technique from the appendix of The Heart of Spritely, itself a technique borrowed from W7, the very security kernel from A Security Kernel Based on the Lambda Calculus!

This is a cool technique, and takes advantage of being able to construct new types at runtime. However, constructing new types at runtime turns out to have some overhead. The details are unimportant, but we moved to a new implementation of sealers which is functionally equivalent but uses an encapsulated "cookie" comparison, which turns out to be dramatically faster... about as fast as two accessor calls and an identity-comparison invocation of eq?!

In other words, actors can now change their behavior with bcom quite quickly! And several other aspects of Goblins have gotten faster too with this new sealers technique, in particular several aspects of promises! Zoom zoom!

Getting the release

This release includes all the features detailed above as well as many bug fixes. See the NEWS for more information.

As usual, if you're using Guix you can upgrade to 0.16 by using the following:

guix pull
guix install guile-goblins

Otherwise, you can find the tarball on our release page.

The features and speedups described in this blogpost refer to the Guile version of Goblins, which is nowadays the primary version of Spritely Goblins. However, we do maintain our older Racket version, which has now also gotten updated to maintain OCapN compatibility with Guile Goblins. Racket users can run the following:

raco pkg install goblins

If you're making something with Goblins or want to contribute to Goblins itself, be sure to join our community at community.spritely.institute! We also host regular office hours where you can come and ask questions or discuss our projects; you can find information about those on our community forum. Thanks for following along and hope to see you there!

by Christine Lemmer-Webber (contact@spritely.institute) at Thursday, August 7, 2025

Thursday, July 31, 2025

spritely.institute

Spritely presented spirited speeches spanning the planet

Over the past 6 months, Spritely has been busy bringing our message to new audiences. I thought it might be nice to compile a list for everyone to watch our talks. Christine Lemmer-Webber, the Executive Director of Spritely, has been busy giving most of these presentations, but the entire team has helped as well. The talks cover our technology, our values, our past, and our vision.

Org mode Witchcraft at Spritely

In January, five Spritely members went to the annual FOSDEM conference to talk about the organization and how we all contribute to it. The first talk was actually by me, talking about how we organize our organization, and how we manage the many whitepapers we have put together using Org Mode. I also gave a sneak peek of the trans-bean program which can use a Magit-style menu to edit plaintext accounting ledgers.

An update from that video is that trans-bean is now available on Codeberg if you want to try it out!

Today's fediverse: a good start, but there's more to do

Unfortunately, this talk is one of those things you had to experience in person. The camera didn't capture the performance. It is still worth a listen though! Our founder, Christine Lemmer-Webber, and the Chief Technologist at Spritely, Jessica Tallon, talk about their view of the fediverse, from their perspective as two of the primary authors of the ActivityPub spec.

Object-Capability Security with Spritely Goblins for Secure Collaboration

Juliana gave a beautiful presentation of how our shared values regarding individual rights and consent led naturally to the technical choices Spritely has made. She also gives a great overview of OCaps which is worth watching even if you are familiar with object capability security already.

Minimalist web application deployment with Scheme

Dave is fighting an uphill battle against dependency hell, and he needs your help. Part of the solution is, of course, Guile Scheme! Spritely's Scheme-to-WebAssembly compiler, Hoot, is now mature enough to utilize in your next web application, and Dave thinks you should try it. What you will get in return is true reproducibility and a good bootstrapping story, along with all the web APIs you're used to. He even goes on to show how reactive programming can work through WebAssembly and Scheme, and plenty of other goodies.

Goblins: The framework for your next project!

Jessica is the lead technologist at Spritely and brings a lot of experience to the table when it comes to defining networking standards. She is a co-author of the ActivityPub specification. Now, working on Goblins at Spritely, she believes the Goblins library can be used for much more than just social media.

Shepherd with Spritely Goblins for Secure System Layer Collaboration

Juliana's second talk at FOSDEM this year was about her work on bringing the distributed networking power of Goblins to the Shepherd, which is responsible for coordinating services on a Guix system. With this project well underway, system administration, across the internet, can soon be done in a capability-secure way. This talk covers the current status of the project as well as how the Plan 9 system inspired her to start.

Spritely and a secure, collaborative, distributed future

Christine gave the last talk from Spritely at FOSDEM this year, and walked through the larger concepts of Spritely and how they come together, and why we decided to make mascots for all the different components. The Spritely project plan from 6 years ago is still the current plan, despite all the work that was done in between. As more and more characters have been coming to life, we have been getting closer to fulfilling our promise of peer-to-peer application development made easy and secure.

c-base Fireside chat

Christine had an intimate conversation at Berlin's famous c-base space station with Volker Grassmuck, ranging over topics from her personal life and her experience working on ActivityPub to the work she is doing now at Spritely. She ends it with a powerful and hopeful message about the future of decentralized networking.

Fediforum keynote

Christine again gave an amazing talk about the values that led to Spritely, most importantly including fun and enjoyment. She talks about the differences between the fediverse and Bluesky, and how each can learn from the other, as well as our current battle against surveillance capitalism. Throughout all of this, she gives an optimistic view of what can be accomplished through community activism.

What the future holds

We are all putting our heads down to work on delivering the promises we talked about this year so far. In the current environment, the tools we are building are more important than ever. We hope that these talks inspire you to try out our technology and read our papers, maybe even donate! And each month, you can come listen to more of us talk at our monthly Office Hours.

Have a great rest of the Summer!

by Amy Pillow (contact@spritely.institute) at Thursday, July 31, 2025

Friday, July 25, 2025

Scheme Requests for Implementation

SRFI 264: String Syntax for Scheme Regular Expressions

SRFI 264 is now in draft status.

This SRFI proposes SSRE, an alternative string-based syntax for Scheme Regular Expressions as defined by SRFI 115. String syntax is both compact and familiar to many regexp users; it is translated directly into SRE S-expressions, providing equivalent constructs. While the proposed syntax mostly follows PCRE, it takes into account specifics of Scheme string syntax and limitations of SRE, leaving out constructs that either duplicate functionality provided by Scheme strings or have no SRE equivalents. The repertoire of named sets and boundary conditions can be extended via a parameter mechanism. Extensions to PCRE syntax allow concise expression of operations on named character sets.

by Sergei Egorov at Friday, July 25, 2025

Tuesday, July 8, 2025

Andy Wingo

guile lab notebook: on the move!

Hey, a quick update, then a little story. The big news is that I got Guile wired to a moving garbage collector!

Specifically, this is the mostly-moving collector with conservative stack scanning. Most collections will be marked in place. When the collector wants to compact, it will scan ambiguous roots in the beginning of the collection cycle, marking objects referenced by such roots in place. Then the collector will select some blocks for evacuation, and when visiting an object in those blocks, it will try to copy the object to one of the evacuation target blocks that are held in reserve. If the collector runs out of space in the evacuation reserve, it falls back to marking in place.

Given that the collector has to cope with failed evacuations, it is easy to give it the ability to pin any object in place. This proved useful when making the needed modifications to Guile: for example, when we copy a stack slice containing ambiguous references to a heap-allocated continuation, we eagerly traverse that stack to pin the referents of those ambiguous edges. Also, whenever the address of an object is taken and exposed to Scheme, we pin that object. This happens frequently for identity hashes (hashq).

Anyway, the bulk of the work here was a pile of refactors to Guile to allow a centralized scm_trace_object function to be written, exposing some object representation details to the internal object-tracing function definition while not exposing them to the user in the form of API or ABI.

bugs

I found quite a few bugs. Not many of them were in Whippet, but some were, and a few are still there; Guile exercises a GC more than my test workbench is able to. Today I’d like to write about a funny one that I haven’t fixed yet.

So, small objects in this garbage collector are managed by a Nofl space. During a collection, each pointer-containing reachable object is traced by a global user-supplied tracing procedure. That tracing procedure should call a collector-supplied inline function on each of the object’s fields. Obviously the procedure needs a way to distinguish between different kinds of objects, to trace them appropriately; in Guile, we use the low bits of the initial word of heap objects for this purpose.

Object marks are stored in a side table in associated 4-MB aligned slabs, with one mark byte per granule (16 bytes). 4 MB is 0x400000, so for an object at address A, its slab base is at A & ~0x3fffff, and the mark byte is offset by (A & 0x3fffff) >> 4. When the tracer sees an edge into a block scheduled for evacuation, it first checks the mark byte to see if it’s already marked in place; in that case there’s nothing to do. Otherwise it will try to evacuate the object, which proceeds as follows...
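The address arithmetic above can be written out as follows. This is only a sketch of the two formulas from the text; the function names are made up, and where the mark table sits within the slab is not stated above, so only the index is computed.

```c
#include <stdint.h>

#define SLAB_SIZE    ((uintptr_t)0x400000)  /* 4 MB, as in the text */
#define GRANULE_LOG2 4                       /* 16-byte granules */

/* Base address of the 4 MB-aligned slab containing `addr`. */
static uintptr_t slab_base(uintptr_t addr) {
  return addr & ~(SLAB_SIZE - 1);
}

/* Index of the mark byte for `addr` within its slab's side table. */
static uintptr_t mark_byte_index(uintptr_t addr) {
  return (addr & (SLAB_SIZE - 1)) >> GRANULE_LOG2;
}
```

For example, an object at 0x12c45678 lives in the slab based at 0x12c00000 and uses mark byte 0x4567. With one byte per 16-byte granule, each slab needs 0x40000 (262144) mark bytes.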

But before you read, consider that there are a number of threads which all try to make progress on the worklist of outstanding objects needing tracing (the grey objects). The mutator threads are paused; though we will probably add concurrent tracing at some point, we are unlikely to implement concurrent evacuation. But it could be that two GC threads try to process two different edges to the same evacuatable object at the same time, and we need to do so correctly!

With that caveat out of the way, the implementation is here. The user has to supply an annoyingly-large state machine to manage the storage for the forwarding word; Guile’s is here. Basically, a thread will try to claim the object by swapping in a busy value (-1) for the initial word. If that worked, it will try to allocate space for the object in the evacuation reserve. If the allocation failed, it first marks the object in place, then restores the first word. Otherwise it copies the object and installs a forwarding pointer in the first word of the object’s old location, which has a specific tag in its low 3 bits allowing forwarded objects to be distinguished from other kinds of object.
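To give the shape of the claim step, here is a simplified sketch using C11 atomics. It is not the actual Whippet or Guile code: the function names, the result enum and the particular tag value are all hypothetical, and the mark byte is left out entirely, so it says nothing about how marking and claiming interact.

```c
#include <stdatomic.h>
#include <stdint.h>

#define BUSY        ((uintptr_t)-1)
#define TAG_MASK    ((uintptr_t)0x7)
#define FORWARD_TAG ((uintptr_t)0x5)   /* hypothetical low-3-bit tag */

enum claim_result { CLAIMED, FORWARDED, CONTENDED };

/* Try to claim an object by swapping BUSY into its first word.  On
   success, *saved receives the original first word. */
static enum claim_result try_claim(_Atomic uintptr_t *first_word,
                                   uintptr_t *saved) {
  uintptr_t header = atomic_load(first_word);
  if (header == BUSY)
    return CONTENDED;                 /* another thread holds the claim */
  if ((header & TAG_MASK) == FORWARD_TAG)
    return FORWARDED;                 /* already evacuated */
  if (!atomic_compare_exchange_strong(first_word, &header, BUSY))
    return CONTENDED;                 /* lost the race; caller re-checks */
  *saved = header;
  return CLAIMED;
}

/* Evacuation succeeded: publish a tagged forwarding pointer.
   `new_addr` must be at least 8-byte aligned. */
static void commit_forwarding(_Atomic uintptr_t *first_word,
                              uintptr_t new_addr) {
  atomic_store(first_word, new_addr | FORWARD_TAG);
}

/* Evacuation failed: put the original first word back. */
static void abort_claim(_Atomic uintptr_t *first_word, uintptr_t saved) {
  atomic_store(first_word, saved);
}

/* Decode a forwarding word back into the new address. */
static uintptr_t forwarded_address(uintptr_t word) {
  return word & ~TAG_MASK;
}
```

A thread that gets `CONTENDED` would spin or re-read the word until it sees either a forwarding pointer or a restored header. Between a successful `try_claim` and the matching `commit_forwarding` or `abort_claim`, the first word holds the busy value and does not describe the object.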

I don’t know how to prove this kind of operation correct, and probably I should learn how to do so. I think it’s right, though, in the sense that either the object gets marked in place or evacuated, all edges get updated to the tospace locations, and the thread that shades the object grey (and no other thread) will enqueue the object for further tracing (via its new location if it was evacuated).

But there is an invisible bug, and one that is the reason for me writing these words :) Whichever thread manages to shade the object from white to grey will enqueue it on its grey worklist. Let’s say the object is on a block to be evacuated, but evacuation fails, and the object gets marked in place. But concurrently, another thread goes to do the same; it turns out there is a timeline in which thread A has marked the object and published it to a worklist for tracing, but thread B has briefly swapped out the object’s first word with the busy value before realizing the object was marked. The object might then be traced with its initial word stompled, which is totally invalid.

What’s the fix? I do not know. Probably I need to manage the state machine within the side array of mark bytes, and not split between the two places (mark byte and in-object). Anyway, I thought that readers of this web log might enjoy a look in the window of this clown car.

next?

The obvious question is, how does it perform? Basically I don’t know yet; I haven’t done enough testing, and some of the heuristics need tweaking. As it is, it appears to be a net improvement over the non-moving configuration and a marginal improvement over BDW, though it currently has more variance. I am deliberately imprecise here because I have been more focused on correctness than performance; measuring properly takes time, and as you can see from the story above, there are still a couple of correctness issues. I will be sure to let folks know when I have something. Until then, happy hacking!

by Andy Wingo at Tuesday, July 8, 2025

Wednesday, June 11, 2025

Andy Wingo

whippet in guile hacklog: evacuation

Good evening, hackfolk. A quick note this evening to record a waypoint in my efforts to improve Guile’s memory manager.

So, I got Guile running on top of the Whippet API. This API can be implemented by a number of concrete garbage collector implementations. The implementation backed by the Boehm collector is fine, as expected. The implementation that uses the bump-pointer-allocation-into-holes strategy is less good. The minor reason is heap sizing heuristics; I still get it wrong about when to grow the heap and when not to do so. But the major reason is that non-moving Immix collectors appear to have pathological fragmentation characteristics.

Fragmentation, for our purposes, is memory under the control of the GC which was free after the previous collection, but which the current cycle failed to use for allocation. I have the feeling that for the non-moving Immix-family collector implementations, fragmentation is much higher than for size-segregated freelist-based mark-sweep collectors. For an allocation of, say, 1024 bytes, the collector might have to scan over many smaller holes until it finds a hole that is big enough. This wastes free memory. Fragmented memory is not gone—it is still available for allocation!—but it won’t be allocatable until after the current cycle when we visit all holes again. In Immix, fragmentation wastes allocatable memory during a cycle, hastening collection and causing more frequent whole-heap traversals.

The value proposition of Immix is that if there is too much fragmentation, you can just go into evacuating mode, and probably improve things. I still buy it. However I don’t think that non-moving Immix is a winner. I still need to do more science to know for sure. I need to fix Guile to support the stack-conservative, heap-precise version of the Immix-family collector which will allow for evacuation.

So that’s where I’m at: a load of gnarly Guile refactors to allow for precise tracing of the heap. I probably have another couple weeks left until I can run some tests. Fingers crossed; we’ll see!

by Andy Wingo at Wednesday, June 11, 2025

Monday, June 9, 2025

Scheme Requests for Implementation

SRFI 263: Prototype Object System

SRFI 263 is now in draft status.

This SRFI proposes a "Self"-inspired prototype object system. Such an object system works by having prototype objects that are cloned repeatedly to modify, extend, and use them, and is interacted with by passing messages.

by Daniel Ziltener at Monday, June 9, 2025

Wednesday, June 4, 2025

spritely.institute

Goblinville: A Spring Lisp Game Jam 2025 retrospective

Spritely participates in the Lisp Game Jam to make interactive artifacts demonstrating our progress building out our tech stack. The 2025 edition of the Spring Lisp Game Jam recently wrapped up and this time around we were finally able to show off using both Hoot and Goblins together to create a multiplayer virtual world demo! Now that we’ve had a moment to breathe, it’s time to share what we built and reflect on the experience.

But first, some stats about the jam overall.

Jam stats

Out of 26 total entries, 7 were made with Guile Scheme, including ours. Of those 7, all but one used Hoot, our Scheme to WebAssembly compiler. Guile tied for first place with Fennel as the most used Lisp implementation for the jam. We’re thrilled to see that Guile and Hoot have become popular choices for this jam!

Though many entries used Hoot, our entry was the only one that used Goblins, our distributed programming framework. However, David Wilson of System Crafters gets an honorable mention because he streamed several times throughout the jam while working on a MUD built with Goblins that was ultimately unsubmitted.

Our entry was Goblinville and it was rated the 7th best game in the jam overall. Not bad!

About Goblinville

Goblinville is a 2D, multiplayer, virtual world demo. During last year’s Spring Lisp Game Jam we made Cirkoban with a restricted subset of Goblins that had no network functionality. Since then, we’ve made a lot of progress porting Goblins to Hoot, culminating with the Goblins 0.15.0 release in January that featured OCapN working in the web browser using WebSockets.

Given all of this progress, we really wanted to show off a networked game this time. Making a multiplayer game for a jam is generally considered a bad idea, but Spritely is all about building networked communities so that’s what we set out to do. Our goal was to make something of a spiritual successor to the community garden demo I made when I first joined Spritely.

Screenshot of Jessica Tallon in Goblinville

What went well

First, let’s reflect on the good stuff. Here’s what went well:

Having participated in this jam a number of times, we have gotten pretty good at scoping projects down into something achievable.
Goblins made it easy to describe the game world as a collection of actors that communicate asynchronously. Initially, the entire world was hosted inside a single web browser tab. Once enough essential actors were implemented it was a simple task to push most of those actors into a separate server process. Since sending a message to a Goblins actor is the same whether it is local or remote, this change required little more than setting up an OCapN connection.
Communicating with actors over OCapN really helped with creating an architecture that separated server state from client-side input and rendering concerns. This was harder to think about with Cirkoban because there was no network separation.
The Hoot game jam template made it easy to get started quickly. It had been a year since we made our last game, so having a small template project was useful while we were refreshing our memory about the various Web APIs we needed to use.
The vast amount of freely licensed Liberated Pixel Cup (something our Executive Director Christine Lemmer-Webber organized back in her days at Creative Commons) assets allowed us to focus on the code while still having pleasing graphics that felt unified.

As a bonus, David Wilson gave Goblinville a shout out on a System Crafters stream and a bunch of people joined the server while I was online! It was a really cool moment.

Screenshot of six Goblinville players on screen at once

What didn’t go so well

Game jams are fast paced (even though the Lisp Game Jam is more relaxed than the average jam) and not everything goes according to plan. A big part of the game jam experience is to practice adjusting project scope as difficulties arise. Issues with the project included:

Time pressure. Unfortunately, we didn’t have as much time to dedicate to this project as we would have liked. We weren’t able to start work until the Monday after the jam started, so we only had 7 days instead of 10. Also, I came down with a cold at the end of the week, which didn’t help my productivity. Making something that felt as polished as Cirkoban simply wasn’t possible.
Lack of persistence for the game world. There’s still some amount of pre-planning that goes into writing actors that can persist that we didn’t have time for. Furthermore, while our persistence system is written to support incremental updates, we don’t have a storage backend that supports it yet. Each tick of the game world would trigger a full re-serialization and we felt that was too much of a performance penalty. We hope that by the next jam this will no longer be an issue.
As predicted, multiplayer increased overall complexity. What felt like a stable enough world during local testing was quickly shown to have several performance issues and bugs once it was released to the public and other people started using it. We had to restart the server once every day or so during the jam rating period (though we have resolved these issues in a post-jam update). Since we weren’t persisting the game world, each restart wiped out all registered players and the state of the map.
No client-side prediction to mask lag. For example, when you press an arrow key to move, you won’t see the player sprite move in the client until it receives a notification from the server that the move was valid. In other words, how responsive the controls feel is directly tied to server lag. A production game client would move the player immediately and fix things up later if it receives contradictory information from the server.

Screenshot of 3 Goblinville players online, user “djm” is saying “hi!”

Post-jam updates

We did a bit of additional work after the jam was over to sand some of the roughest edges:

Re-architected the server update loop to greatly reduce message volume. Because it was simple to implement, actors in the game world were being sent a tick message at 60Hz to update their internal state. Most of the time, the actors would simply do nothing. A plant that is done growing has nothing left to do, so that’s 60 wasteful messages per second per plant. Instead, a timer system was added to schedule things to happen after so many ticks of the game world and the tick method was removed from all game objects. This greatly improved server stability, especially for worlds with lots of live objects. As of writing, we’ve had a server running for six days without any noticeable increase in lag.
Added a server event log. It was hard to see what was going on in the world during the jam rating period without being connected to the graphical client. Now the server process emits a timestamped log of every event to standard output.
Added character sprite selection. This feature just barely missed the jam submission deadline, but it’s in now! Instead of all players being the same sprite, there are now six to choose from.
Took down the public server. For the jam submission version, we had baked a URI into the itch.io client to a public server we were hosting so the game would “just work”. This was particularly important for the other participants who were rating the submitted games and giving feedback. Since the jam rating period is now over, we took down the public server. If you’re interested in trying out Goblinville, you can follow the instructions in the README to host your own server.
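The timer idea from the first item in this list can be sketched roughly as follows. This is a hypothetical Python illustration, not Goblinville's actual code (which is Scheme built on Goblins actors); the names `TimerQueue`, `schedule`, and `tick` are invented here. The point is that an object registers one callback for a future tick instead of receiving a message on every tick.

```python
import heapq
import itertools


class TimerQueue:
    """Run callbacks after a given number of world ticks (hypothetical sketch)."""

    def __init__(self):
        self.now = 0                    # current tick count
        self._heap = []                 # entries are (due_tick, seq, callback)
        self._seq = itertools.count()   # tie-breaker so callbacks are never compared

    def schedule(self, delay_ticks, callback):
        due = self.now + delay_ticks
        heapq.heappush(self._heap, (due, next(self._seq), callback))

    def tick(self):
        """Advance one tick and run only the callbacks that are due."""
        self.now += 1
        fired = 0
        while self._heap and self._heap[0][0] <= self.now:
            _, _, callback = heapq.heappop(self._heap)
            callback()
            fired += 1
        return fired


# A plant that needs 180 ticks to grow costs one timer, not 180 tick messages.
timers = TimerQueue()
grown = []
timers.schedule(180, lambda: grown.append("plant"))
calls = sum(timers.tick() for _ in range(180))
```

With this shape, a fully grown plant has nothing scheduled and so costs nothing per tick.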

Also, Spritely co-founder Randy Farmer stopped by our updated Goblinville world!

Screenshot of current Spritely staff plus co-founder Randy Farmer in Goblinville

Wrapping up

Goblinville turned out to be more of a tech demo than a true game, but we’re quite happy with the result. We think it’s a good demonstration of what can be built with Goblins and Hoot in a short amount of time. We hope to build on this success to create even more engaging, featureful demos in the future!

by Dave Thompson (contact@spritely.institute) at Wednesday, June 4, 2025

Tuesday, May 27, 2025

Idiomdrottning

Endless scroll

The “endless scroll” debate was after it replaced pages where you’d scroll scroll scroll click, scroll scroll scroll click, scroll scroll scroll click. That was annoying while still not actually stemming addiction (at least for me). I’d still read through those megathreads on RPG.net, UI annoyances or no. The endless scroll just took the clicks out of that process, which was an improvement. But what I want is instead taking scrolls out of the process! So it’s tap, tap, tap, tap, like an ebook!

Probably going to be just as addictive but I won’t get anxiety from all the scrolling.

Scrolling and panning is fiddly and I never get exactly the right amount of page scrolled; it’s like threading a needle repeatedly. Most page down algos are no good either, since they’re paging in a text format that’s not designed for pages, so you have to read the same couple of lines twice, last on this page and first on the next. So in the future maybe we’ll render HTML as actual pages (after all, epub readers can [sorta] do it). Even less and more on Unix can do it; they show all of one page, then all of the next page separately and so on. The weaksauce nature of page down in GUI apps like Netscape was one of the biggest letdowns when I first started using them in the nineties.

However, the addiction dark pattern has another component; the endless and often junky content which really makes the scroll endless. That part can not stay.

That’s a secondary reason for why I don’t like discover algorithms on Mastodon, the primary reason being how it’s artificial virality.

by Idiomdrottning (sandra.snan@idiomdrottning.org) at Tuesday, May 27, 2025

Thursday, May 22, 2025

Andy Wingo

whippet lab notebook: guile, heuristics, and heap growth

Greets all! Another brief note today. I have gotten Guile working with one of the Nofl-based collectors, specifically the one that scans all edges conservatively (heap-conservative-mmc / heap-conservative-parallel-mmc). Hurrah!

It was a pleasant surprise how easy it was to switch—from the user’s point of view, you just pass --with-gc=heap-conservative-parallel-mmc to Guile’s build (on the wip-whippet branch); when developing I also pass --with-gc-debug, and I had a couple bugs to fix—but, but, there are still some issues. Today’s note thinks through the ones related to heap sizing heuristics.

growable heaps

Whippet has three heap sizing strategies: fixed, growable, and adaptive (MemBalancer). The adaptive policy is the one I would like in the long term; it will grow the heap for processes with a high allocation rate, and shrink when they go idle. However I won’t really be able to test heap shrinking until I get precise tracing of heap edges, which will allow me to evacuate sparse blocks.

So for now, Guile uses the growable policy, which attempts to size the heap so it is at least as large as the live data size, times some multiplier. The multiplier currently defaults to 1.75×, but can be set on the command line via the GUILE_GC_OPTIONS environment variable. For example to set an initial heap size of 10 megabytes and a 4× multiplier, you would set GUILE_GC_OPTIONS=heap-size-multiplier=4,heap-size=10M.

Anyway, I have run into problems! The fundamental issue is fragmentation. Consider a 10MB growable heap with a 2× multiplier, consisting of a sequence of 16-byte objects followed by 16-byte holes. You go to allocate a 32-byte object. This is a small object (8192 bytes or less), and so it goes in the Nofl space. A Nofl mutator holds on to a block from the list of sweepable blocks, and will sequentially scan that block to find holes. However, each hole is only 16 bytes, so we can’t fit our 32-byte object: we finish with the current block, grab another one, repeat until no blocks are left and we cause GC. GC runs, and after collection we have an opportunity to grow the heap: but the heap size is already twice the live object size, so the heuristics say we’re all good, no resize needed, leading to the same sweep again, leading to a livelock.
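To make the failure mode concrete, here is a small Python model of the scenario just described. It is a sketch under simplifying assumptions, not Whippet code: the heap is reduced to a list of hole sizes, and `growable_target` and `can_allocate` are hypothetical helpers standing in for the sizing heuristic and the sweep.

```python
GRANULE = 16  # bytes; the size of both the live objects and the holes in this scenario


def growable_target(live_bytes, multiplier):
    """Heap size the growable policy aims for: live data times a multiplier."""
    return int(live_bytes * multiplier)


def can_allocate(hole_sizes, request):
    """Bump-pointer allocation succeeds only if a single hole is big enough."""
    return any(hole >= request for hole in hole_sizes)


heap_size = 10 * 1024 * 1024
# 16-byte live objects alternating with 16-byte holes across the whole heap.
pairs = heap_size // (2 * GRANULE)
live_bytes = pairs * GRANULE
hole_sizes = [GRANULE] * pairs

request = 32
fits = can_allocate(hole_sizes, request)
needs_growth = heap_size < growable_target(live_bytes, 2.0)

# Half the heap is free, yet the request cannot be served and the
# heuristic sees no reason to grow: every GC ends in the same state.
livelocked = (not fits) and (not needs_growth)
```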

I actually ran into this case during Guile’s bootstrap, while allocating a 7072-byte vector. So it’s a thing that needs fixing!

observations

The root of the problem is fragmentation. One way to solve the problem is to remove fragmentation; using a semi-space collector comprehensively resolves the issue, modulo any block-level fragmentation.

However, let’s say you have to live with fragmentation, for example because your heap has ambiguous edges that need to be traced conservatively. What can we do? Raising the heap multiplier is an effective mitigation, as it increases the average hole size, but for it to be a comprehensive solution in e.g. the case of 16-byte live objects equally interspersed with holes, you would need a multiplier of 512× to ensure that the largest 8192-byte “small” objects will find a hole. I could live with 2× or something, but 512× is too much.

We could consider changing the heap organization entirely. For example, most mark-sweep collectors (BDW-GC included) partition the heap into blocks whose allocations are of the same size, so you might have some blocks that only hold 16-byte allocations. It is theoretically possible to run into the same issue, though, if each block only has one live object, and the necessary multiplier that would “allow” for more empty blocks to be allocated is of the same order (256× for 4096-byte blocks each with a single 16-byte allocation, or even 4096× if your blocks are page-sized and you have 64kB pages).
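The multipliers quoted in the last two paragraphs are simple ratios, and a few lines of Python reproduce them. This is only arithmetic on the sizes given in the text; `worst_case_multiplier` is a name made up for this note, and it ignores the small extra term for the live object itself.

```python
def worst_case_multiplier(needed_bytes, live_bytes):
    """Heap bytes per live byte needed so that `needed_bytes` can stay free."""
    return needed_bytes // live_bytes


# Largest "small" object (8192 bytes) versus 16-byte live objects in Nofl.
nofl_small_object = worst_case_multiplier(8192, 16)

# Size-segregated blocks holding a single 16-byte object each.
block_4k = worst_case_multiplier(4096, 16)
page_64k = worst_case_multiplier(64 * 1024, 16)
```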

My conclusion is that practically speaking, if you can’t deal with fragmentation, then it is impossible to just rely on a heap multiplier to size your heap. It is certainly an error to live-lock the process, hoping that some other thread mutates the graph in such a way to free up a suitable hole. At the same time, if you have configured your heap to be growable at run-time, it would be bad policy to fail an allocation, just because you calculated that the heap is big enough already.

It’s a shame, because we lose a mooring on reality: “how big will my heap get” becomes an unanswerable question because the heap might grow in response to fragmentation, which is not deterministic if there are threads around, and so we can’t reliably compare performance between different configurations. Ah well. If reliability is a goal, I think one needs to allow for evacuation, one way or another.

for nofl?

In this concrete case, I am still working on a solution. It’s going to be heuristic, which is a bit of a disappointment, but here we are.

My initial thought has two parts. Firstly, if the heap is growable but cannot defragment, then we need to reserve some empty blocks after each collection, even if reserving them would grow the heap beyond the configured heap size multiplier. In that way we will always be able to allocate into the Nofl space after a collection, because there will always be some empty blocks. How many empties? Who knows. Currently Nofl blocks are 64 kB, and the largest “small object” is 8kB. I’ll probably try some constant multiplier of the heap size.

The second thought is that searching through the entire heap for a hole is a silly way for the mutator to spend its time. Immix will reserve a block for overflow allocation: if a medium-sized allocation (more than 256B and less than 8192B) fails because no hole in the current block is big enough—note that Immix’s holes have 128B granularity—then the allocation goes to a dedicated overflow block, which is taken from the empty block set. This reduces fragmentation (holes which were not used for allocation because they were too small).

Nofl should probably do the same, but given its finer granularity, it might be better to sweep over a variable number of blocks, for example based on the logarithm of the allocation size; one could instead sweep over clz(min-size)–clz(size) blocks before taking from the empty block list, which would at least bound the sweeping work of any given allocation.
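As a rough illustration of that bound, the following Python sketch computes clz(min-size) minus clz(size) for a few allocation sizes. It assumes 64-bit words and takes min-size to be the 16-byte granule; both are assumptions of this example, and `sweep_budget` is a hypothetical name rather than anything in Whippet.

```python
WORD_BITS = 64  # assumption: sizes are treated as 64-bit unsigned integers


def clz(x):
    """Count leading zero bits of x in a 64-bit word."""
    assert 0 < x < (1 << WORD_BITS)
    return WORD_BITS - x.bit_length()


def sweep_budget(size, min_size=16):
    """Blocks to sweep before falling back to the empty block list."""
    return clz(min_size) - clz(size)


budget_16 = sweep_budget(16)      # smallest allocation: no extra sweeping
budget_32 = sweep_budget(32)
budget_7072 = sweep_budget(7072)  # the vector size mentioned earlier
budget_8192 = sweep_budget(8192)  # largest "small" object
```

The budget grows by one each time the allocation size doubles, so even the largest small object would sweep over only a handful of blocks before giving up.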

fin

Welp, just wanted to get this out of my head. So far, my experience with this Nofl-based heap configuration is mostly colored by live-locks, and otherwise its implementation of a growable heap sizing policy seems to be more tight-fisted regarding memory allocation than BDW-GC’s implementation. I am optimistic though that I will be able to get precise tracing sometime soon, as measured in development time; the problem as always is fragmentation, in that I don’t have a hole in my calendar at the moment. Until then, sweep on Wayne, cons on Garth, onwards and upwards!

by Andy Wingo at Thursday, May 22, 2025

Wednesday, May 21, 2025

spritely.institute

Functional hash tables explained

Prologue: The quest for a functional hash table for Goblins

For those of us that use the Lisp family of programming languages, we have much appreciation for the humble pair. Using pairs, we can construct singly-linked lists and key/value mappings called association lists. Lists built from pairs have the pleasant property of being immutable (if you abstain from using setters!) and persistent: extending a list with a new element creates a new list that shares all the data from the original list. Adding to a list is a constant time operation, but lookup is linear time. Thus, lists are not appropriate when we need constant time lookup. For that, we need hash tables.

The classic hash table is a mutable data structure and one that our Lisp of choice (Guile, a Scheme implementation) includes in its standard library, like most languages. Hash tables are neither immutable nor persistent; adding or removing a new key/value pair to/from a hash table performs an in-place modification of the underlying memory. Mutable data structures introduce an entire class of bug possibilities that immutable data structures avoid, and they’re particularly tricky to use successfully in a multi-threaded program.

Fortunately, there exists an immutable, persistent data structure that is suitable: the Hash Array Mapped Trie or HAMT, for short. This data structure was introduced by Phil Bagwell in the paper “Ideal Hash Trees” (2001). HAMTs were popularized in the Lisp world by Clojure over a decade ago. Unfortunately, Guile does not currently provide a HAMT-based functional hash table in its standard library.

Instead, Guile comes with the VList, another one of Phil Bagwell’s creations which can be used to build a VHash. Goblins currently uses VLists for its functional hash table needs, but HAMTs are better suited for the task.

There are various implementations of HAMTs in Scheme floating about, but none of them seem to have notable adoption in a major Guile project. So, we thought we’d write our own that we could ensure meets the needs of Goblins, compiles on Hoot, and that might just be useful enough to send upstream for inclusion into Guile after it’s been battle tested. To that end, we recently added the (goblins utils hashmap) module. This will become the base for a future re-implementation of the ^ghash actor.

Okay, enough context. Let’s talk about HAMTs!

What the hoot is a HAMT anyway?

From the outside, HAMTs can seem rather mysterious and intimidating. That’s how I felt about them, at least. However, once I dug into the topic and started writing some code, I was pleasantly surprised that the essential details were not too complicated.

As mentioned previously, the HAMT is an immutable, persistent data structure that associates keys with their respective values, just like a regular hash table. By utilizing a special kind of tree known as a “trie” plus some nifty bit shifting tricks, HAMTs achieve effectively constant time insertion, deletion, and lookup despite all operations being logarithmic time on paper.

A trie differs from a tree in the following way: tries only store keys in their leaf nodes. If you’re familiar with binary search trees, tries aren’t like that. Instead, the key itself (or the hash thereof, in our case) encodes the path through the trie to the appropriate leaf node containing the associated value. Tries are also called “prefix trees” for this reason.

A binary tree node can have at most two children, but HAMTs have a much larger branching factor (typically 32). These tries are wide and shallow, so few nodes need to be traversed to find the value for any given key. This is why the logarithmic time complexity of HAMT operations can be treated as if it were constant time in practice.

Trie representation

To explain, let’s start by defining a trie node using a branching factor of 32. At its simplest, we can think of a trie node as a 32-element array. Each element of the trie can contain either a leaf node (a key/value pair) or a pointer to a subtrie (another 32 element array).

For example, a HAMT with 3 entries might look like this:

Example trie visualization

This is nice and simple, but it’s a little too simple. With such a large branching factor, it’s likely that many boxes in the trie will have nothing in them, like in the above example. This is a waste of space. The issue is further compounded by immutability: adding or removing a single element requires allocating a new 32 element array. For the sake of efficiency, we need to do better.

Instead, we’ll use a sparse array that only stores the occupied elements of the trie node. To keep track of which elements of the theoretical 32 element array are occupied, we’ll use a bitmap. Below is an example trie:

Example trie visualization

In the above example, bits 4 and 10 (starting from the right) are set. This means that of the 32 possible elements, only 2 are currently occupied. Thus, the size of the underlying array is 2.

To get the number of occupied elements in the trie node, we simply count the number of 1s in the bitmap. This is known as the “population count”. (We could also just check the size of the array, but this bit counting idea is about to have another important use.)

To retrieve the value at index 10, we need to perform a translation to get an index into the underlying array. To do this, we compute the population count of the bits set to the right of bit 10. There is 1 bit set: bit 4. Thus, the value we’re looking for is stored at index 1 in the underlying array. If we take a peek, we find bar → 2 there.
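The index translation can be written in a few lines. This is a Python sketch for illustration only, not the Scheme code in (goblins utils hashmap); the function names are made up for the example.

```python
def popcount(bits):
    """Number of 1 bits in a non-negative integer."""
    return bin(bits).count("1")


def sparse_index(bitmap, bit):
    """Translate logical slot `bit` into an index in the compact array."""
    below = bitmap & ((1 << bit) - 1)  # keep only the bits to the right of `bit`
    return popcount(below)


# The example node: bits 4 and 10 are set, so the array holds 2 entries.
bitmap = (1 << 4) | (1 << 10)
occupied = popcount(bitmap)
index_for_10 = sparse_index(bitmap, 10)
index_for_4 = sparse_index(bitmap, 4)
```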

Thanks to this sparse storage technique, insertion/deletion operations will allocate less memory overall.

Insertion algorithm

With the trie representation out of the way, let’s walk through the algorithm to insert a new key/value pair. The insertion algorithm covers all the essential aspects of working with a HAMT.

Let’s say we want to insert the mapping foo → 42 into an empty HAMT. The empty HAMT consists of a single node with an empty bitmap and an empty array:

Empty trie

To figure out where to store the key foo, we first need to compute the hash code for it. For ease of demonstration, we’ll use 10-bit hash codes. (In practice, 32 bits or more is ideal.)

Let’s say our fictitious hash function produces the hash bits 1000010001 for foo.

Each trie node has 32 possible branches. Thus, the range of indices for a node can be represented using a 5-bit unsigned integer. What if we took the 5 most significant bits of the hash code and used that as our index into the trie? Wow, that sounds like a clever idea! Let’s do that!

Hash bits for foo

So, foo gets inserted at index 16. The original trie is empty, so we’ll make a new trie with a single leaf node. To do this, we need to set bit 16 in our bitmap and create an array with just one key/value pair in it. Our output trie looks like this:

Single level trie with one leaf node

Note that we only examined 5 bits of the hash code. We only need to examine as many bits in the hash as it takes to find an empty element.

Let’s insert another key/value pair, this time for key bar and value 17. Our made-up hash code for bar is 0100100001. Repeating the process above, the most significant 5 bits are 01001, so our index is 9, another unoccupied index. Our new trie looks like this:

Single level trie with two leaf nodes

Because 9 < 16, the entry for bar is stored in array index 0, followed by foo.

As a last example, let’s insert a key/value pair where the most significant bits of the hash code collide with an existing entry. This time, the key is baz and the value is 66. Our made-up hash code for baz is 1000000001.

The most significant 5 bits are 10000, so our index is 16. Now things get interesting because 16 is already occupied; the first 5 bits are not enough to distinguish between the keys foo and baz! To resolve this, we’ll replace the leaf node with a subtrie that uses the next 5 bits when calculating indices. The resulting root trie node will look sort of like this:

Subtrie placeholder

In the figure above, subtrie is a placeholder for the new trie we need to construct to hold the mappings for foo and baz.

To create the subtrie, we just recursively apply the same algorithm we’ve already been using but with different bits. Now we’re looking at the least significant 5 bits of the hash codes:

Hash bits for foo and baz

The index for foo is 17, and the index for baz is 1. So, our new subtrie will have bits 1 and 17 set, and contain 2 leaf nodes:

Subtrie with two leaf nodes
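The per-level indices used in this walkthrough can be checked with a short Python sketch. The 10-bit width and the hash codes are the made-up values from the text; `chunk` is a hypothetical helper name.

```python
HASH_BITS = 10       # toy hash width used in the walkthrough
BITS_PER_LEVEL = 5   # 32-way branching needs 5 bits per level


def chunk(hash_code, level):
    """Return the 5-bit index for `level`, most significant bits first."""
    shift = HASH_BITS - BITS_PER_LEVEL * (level + 1)
    return (hash_code >> shift) & ((1 << BITS_PER_LEVEL) - 1)


foo = 0b1000010001
bar = 0b0100100001
baz = 0b1000000001

root_indices = (chunk(foo, 0), chunk(bar, 0), chunk(baz, 0))
subtrie_indices = (chunk(foo, 1), chunk(baz, 1))
```

At the root, foo and baz both land on 16, which is the partial collision; one level down they separate into 17 and 1.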

Putting it all together, the complete trie looks like this:

Complete trie with a subtrie and three total leaf nodes

Each insertion creates a new trie. The new trie shares nearly all of the data with the original trie. Only the root node and the visited subtries need to be allocated afresh. This makes insertion even into large HAMTs quite efficient.
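Putting the pieces together, a minimal persistent insert might look like the following. This is a Python sketch for illustration, not the Goblins implementation: it uses the toy 10-bit hash codes from the walkthrough via a lookup table (`TOY_HASHES`, an assumption of this example) and deliberately does not handle full hash collisions.

```python
from collections import namedtuple

Leaf = namedtuple("Leaf", "key value")
Node = namedtuple("Node", "bitmap items")  # items is a tuple, never mutated

HASH_BITS = 10
BITS_PER_LEVEL = 5
EMPTY = Node(0, ())

# Made-up hash codes from the walkthrough; a real table would call a hash function.
TOY_HASHES = {"foo": 0b1000010001, "bar": 0b0100100001, "baz": 0b1000000001}


def chunk(hash_code, level):
    shift = HASH_BITS - BITS_PER_LEVEL * (level + 1)
    if shift < 0:
        raise ValueError("full hash collision: not handled in this sketch")
    return (hash_code >> shift) & ((1 << BITS_PER_LEVEL) - 1)


def sparse_index(bitmap, bit):
    return bin(bitmap & ((1 << bit) - 1)).count("1")


def insert(node, key, value, level=0):
    """Return a new trie containing key -> value; `node` is left untouched."""
    bit = chunk(TOY_HASHES[key], level)
    i = sparse_index(node.bitmap, bit)
    if not node.bitmap & (1 << bit):
        # Unoccupied slot: splice a new leaf into a copy of the array.
        items = node.items[:i] + (Leaf(key, value),) + node.items[i:]
        return Node(node.bitmap | (1 << bit), items)
    entry = node.items[i]
    if isinstance(entry, Node):
        new = insert(entry, key, value, level + 1)
    elif entry.key == key:
        new = Leaf(key, value)  # same key: replace the value
    else:
        # Partial collision: push both leaves one level down.
        sub = insert(EMPTY, entry.key, entry.value, level + 1)
        new = insert(sub, key, value, level + 1)
    return Node(node.bitmap, node.items[:i] + (new,) + node.items[i + 1:])


t1 = insert(EMPTY, "foo", 42)
t2 = insert(t1, "bar", 17)
t3 = insert(t2, "baz", 66)
```

After these three inserts, `t1` and `t2` are still valid tries, and the leaf for bar in `t3` is the very same object as in `t2`, which is the structural sharing described above.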

The partial hash collision described above could happen recursively, in which case the trie will grow yet another level deeper upon each iteration. The worst case scenario is that two or more keys have the same exact hash code. A good hash function and lots of hash bits will make this a very rare occurrence, but a robust insertion algorithm needs to account for this case. We’ll gloss over this edge case here, but hash collisions can be handled by “bottoming out” to a special leaf node: a linked list of the colliding key/value pairs. These lists need to be traversed linearly when necessary upon lookup or future insertions that cause more collisions. In practice, these collision lists tend to be quite short, often only of length 2.

Wrapping up

The insertion algorithm explains all of the essential elements of a HAMT and how to work with its sparse storage structure, but we’ll touch upon lookup and deletion very briefly.

Looking up the value for a key follows the same basic process as insertion, but in a read-only manner. If the hash bits point to an unoccupied element of a trie node then the search fails right then and there. If the bits point to a leaf node with a matching key, the search succeeds. If the key doesn’t match, the search fails. Finally, if the bits point to a subtrie, we recursively search that subtrie with the next set of bits.
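Those four cases map directly onto code. The following Python sketch builds the complete example trie by hand and searches it; as before, the node types, the `TOY_HASHES` table, and the extra keys qux and quux are assumptions made for this illustration.

```python
from collections import namedtuple

Leaf = namedtuple("Leaf", "key value")
Node = namedtuple("Node", "bitmap items")

# Toy 10-bit hash codes; qux and quux are extra keys invented for the misses.
TOY_HASHES = {
    "foo": 0b1000010001,
    "bar": 0b0100100001,
    "baz": 0b1000000001,
    "qux": 0b0000100000,   # root index 1: unoccupied
    "quux": 0b0100111111,  # root index 9: occupied by bar
}


def lookup(node, key, level=0, default=None):
    shift = 10 - 5 * (level + 1)
    bit = (TOY_HASHES[key] >> shift) & 31
    if not node.bitmap & (1 << bit):
        return default  # unoccupied element: the search fails immediately
    entry = node.items[bin(node.bitmap & ((1 << bit) - 1)).count("1")]
    if isinstance(entry, Node):
        return lookup(entry, key, level + 1, default)  # descend with the next bits
    return entry.value if entry.key == key else default


# The complete trie from the walkthrough.
subtrie = Node((1 << 1) | (1 << 17), (Leaf("baz", 66), Leaf("foo", 42)))
root = Node((1 << 9) | (1 << 16), (Leaf("bar", 17), subtrie))

found = [lookup(root, k) for k in ("foo", "bar", "baz")]
missing = [lookup(root, k) for k in ("qux", "quux")]
```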

Deletion is the most complicated operation as it involves path compression when subtries become empty. If you’ve been successfully nerd sniped by this blog post then consider this a homework assignment. Give Phil Bagwell’s paper a read! 🙂

Thanks for following along! Hope this was fun!

by Dave Thompson (contact@spritely.institute) at Wednesday, May 21, 2025

Saturday, May 17, 2025

The Racket Blog

Racket v8.17

posted by Stephen De Gabrielle

We are pleased to announce Racket v8.17 is now available from https://download.racket-lang.org/.

As of this release:

The new drracket-core package provides a version of drracket with a smaller set of dependencies.
Typed Racket has support for treelists.
The package manager computes checksums for packages when required, allowing the use and automatic upgrade of packages without them.
The bitwise-first-bit-set function returns the smallest bit that is set in the twos-complement representation of the given number.
The updated dynamic-require function makes it easier to use syntax bindings by allowing a syntax-thunk (or 'eval) to be used for them.
The error-module-path->string-handler parameter allows the customization of the display of module-paths in error messages.
Precision of certain numeric functions (sin, cos, and others) is improved on Windows platforms by using the MSVCRT/UCRT libraries.
The string-append function has improved performance and reduced memory use for long lists of strings in the Racket CS implementation. Differences are clearly noticeable for lists of length 1 million.
TCP ports use SO_KEEPALIVE, instructing the kernel to send periodic messages while waiting for data to check whether the connection is still responsive.
Racket code using a terminal in Windows can receive mouse events as virtual terminal characters after using SetConsoleMode. (This is also already possible on macOS and Linux.) See the tui-term package for related example code.
The #:replace-malformed-surrogate? keyword can be used to specify a replacement for malformed Unicode surrogates in JSON input.
The http-client module no longer sends “Content-Length: 0” for requests without a body.
The demodularizer (compiler/demod) can prune more unused assignments.
Several judgment rendering forms in Redex are replaced by functions, allowing more convenient abstraction.
When a distribution includes no teaching languages, DrRacket’s language-dialog configuration moves into the preferences dialog and the “Language” menu disappears.
The math library has better support for block-diagonal matrices, including both Racket and Typed Racket.
The math library contains improved implementations of acos and matrix-(cos-)angle.
The stepper again works for big-bang programs.
There are many other repairs and documentation improvements!

Thank you

The following people contributed to this release:

Alexander Shopov, Andrei Dorian Duma, Bert De Ketelaere, Bob Burger, Bogdan Popa, Bogdana Vereha, Cameron Moy, Chung-chieh Shan, Cutie Deng, D. Ben Knoble, Dario Hamidi, Dominik Pantůček, Gustavo Massaccesi, halfminami, Jacqueline Firth, Jason Hemann, Jens Axel Søgaard, Joel Dueck, John Clements, Jordan Harman, Marc Nieper-Wißkirchen, Matthew Flatt, Matthias Felleisen, Mike Sperber, Noah Ma, owaddell-ib, Philippe Meunier, Robby Findler, Ryan Culpepper, Ryan Ficklin, Sam Phillips, Sam Tobin-Hochstadt, Shu-Hung You, sogaiu, Sorawee Porncharoenwase, Stephen De Gabrielle, Vincent Lee, and Wing Hei Chan.

Racket is a community developed open source project and we welcome new contributors. See racket/README.md to learn how you can be a part of this amazing project.

Feedback Welcome

Questions and discussion welcome at the Racket community on Discourse or Discord.

If you can, please help get the word out to users and platform-specific repo packagers.

Racket - the Language-Oriented Programming Language - version 8.17 is now available from https://download.racket-lang.org

See https://blog.racket-lang.org/2025/05/racket-v8-17.html for the release announcement and highlights.

by John Clements, Stephen De Gabrielle at Saturday, May 17, 2025

Thursday, May 15, 2025

Andy Wingo

guile on whippet waypoint: goodbye, bdw-gc?

Hey all, just a lab notebook entry today. I’ve been working on the Whippet GC library for about three years now, learning a lot on the way. The goal has always been to replace Guile’s use of the Boehm-Demers-Weiser collector with something more modern and maintainable. Last year I finally got to the point that I felt Whippet was feature-complete, and taking into account the old adage about long arses and brief videos, I think that wasn’t too far off. I carved out some time this spring and for the last month have been integrating Whippet into Guile in anger, on the wip-whippet branch.

the haps

Well, today I removed the last direct usage of the BDW collector’s API by Guile! Instead, Guile uses Whippet’s API any time it needs to allocate an object, add or remove a thread from the active set, identify the set of roots for a collection, and so on. Most tracing is still conservative, but this will move to be more precise over time. I haven’t had the temerity to actually try one of the Nofl-based collectors yet, but that will come soon.

Code-wise, the initial import of Whippet added some 18K lines to Guile’s repository, as counted by git diff --stat, which includes documentation and other files. There was an unspeakable amount of autotomfoolery to get Whippet in Guile’s ancient build system. Changes to Whippet during the course of integration added another 500 lines or so. Integration of Whippet removed around 3K lines of C from Guile. It’s not a pure experiment, as my branch is also a major version bump and so has the freedom to refactor and simplify some things.

Things are better but not perfect. Notably, I switched to build weak hash tables in terms of buckets and chains where the links are ephemerons, which give me concurrent lock-free reads and writes but not resizable tables. I would like to somehow resize these tables in response to GC, but haven’t wired it up yet.

Anyway, next waypoint will be trying out the version of Whippet’s Nofl-based mostly-marking collector that traces all heap edges conservatively. If that works... well if that works... I don’t dare to hope! We will see what we get when that happens. Until then, happy hacking!

by Andy Wingo at Thursday, May 15, 2025

Friday, May 9, 2025

Andy Wingo

a whippet waypoint

Hey peoples! Tonight, some meta-words. As you know I am fascinated by compilers and language implementations, and I just want to know all the things and implement all the fun stuff: intermediate representations, flow-sensitive source-to-source optimization passes, register allocation, instruction selection, garbage collection, all of that.

It started long ago with a combination of curiosity and a hubris to satisfy that curiosity. The usual way to slake such a thirst is structured higher education followed by industry apprenticeship, but for whatever reason my path sent me through a nuclear engineering bachelor’s program instead of computer science, and continuing that path was so distasteful that I noped out all the way to rural Namibia for a couple years.

Fast-forward, after 20 years in the programming industry, and having picked up some language implementation experience, a few years ago I returned to garbage collection. I have a good level of language implementation chops but never wrote a memory manager, and Guile’s performance was limited by its use of the Boehm collector. I had been on the lookout for something that could help, and when I learned of Immix it seemed to me that the only thing missing was an appropriate implementation for Guile, and hey I could do that!

whippet

I started with the idea of an MMTk-style interface to a memory manager that was abstract enough to be implemented by a variety of different collection algorithms. This kind of abstraction is important, because in this domain it’s easy to convince oneself that a given algorithm is amazing, just based on vibes; to stay grounded, I find I always need to compare what I am doing to some fixed point of reference. This GC implementation effort grew into Whippet, but as it did so a funny thing happened: the mark-sweep collector that I prototyped as a direct replacement for the Boehm collector maintained mark bits in a side table, which I realized was a suitable substrate for Immix-inspired bump-pointer allocation into holes. I ended up building on that to develop an Immix collector, but without lines: instead each granule of allocation (16 bytes for a 64-bit system) is its own line.
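As a rough illustration of bump-pointer allocation into holes found via a side table, here is a minimal single-block sketch. It is not Whippet's actual code: the names (`struct block`, `struct allocator`, `next_hole`, `allocate`) and the 4096-granule block are hypothetical, and it assumes for simplicity that every granule covered by a live object has a nonzero mark byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE_SIZE 16      /* bytes per granule on a 64-bit system */
#define BLOCK_GRANULES 4096  /* hypothetical block size: 64 kB */

struct block {
  uint8_t marks[BLOCK_GRANULES];  /* side table: nonzero means live */
  _Alignas(GRANULE_SIZE) uint8_t payload[BLOCK_GRANULES * GRANULE_SIZE];
};

struct allocator {
  struct block *block;
  size_t sweep;  /* next granule index the sweep will examine */
  size_t alloc;  /* bump pointer, as a granule index */
  size_t limit;  /* end of the current hole, as a granule index */
};

/* Sweep forward to the next run of unmarked granules.  Returns the hole
   size in granules, or 0 if the block has no more holes. */
static size_t next_hole(struct allocator *a) {
  size_t i = a->sweep;
  while (i < BLOCK_GRANULES && a->block->marks[i]) i++;   /* skip live data */
  size_t start = i;
  while (i < BLOCK_GRANULES && !a->block->marks[i]) i++;  /* measure hole */
  a->alloc = start;
  a->limit = i;
  a->sweep = i;
  return i - start;
}

/* Bump-allocate from the current hole, sweeping onward when it is too
   small.  Returns NULL when the block is exhausted; a real allocator
   would fetch another block at that point. */
static void *allocate(struct allocator *a, size_t bytes) {
  assert(bytes > 0);
  size_t granules = (bytes + GRANULE_SIZE - 1) / GRANULE_SIZE;
  while (a->limit - a->alloc < granules) {
    /* What is left of the current hole is abandoned until the next GC. */
    if (next_hole(a) == 0) return NULL;
  }
  void *ret = a->block->payload + a->alloc * GRANULE_SIZE;
  a->alloc += granules;
  return ret;
}
```

Starting from an allocator with `sweep`, `alloc`, and `limit` all zero, sweeping happens lazily, interleaved with allocation: nothing is examined until an allocation needs more room.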

regions?

The Immix paper is funny, because it defines itself as a new class of mark-region collector, fundamentally different from the three other fundamental algorithms (mark-sweep, mark-compact, and evacuation). Immix’s regions are blocks (64kB coarse-grained heap divisions) and lines (128B “fine-grained” divisions); the innovation (for me) is the optimistic evacuation discipline by which one can potentially defragment a block without a second pass over the heap, while also allowing for bump-pointer allocation. See the papers for the deets!

However what, really, are the regions referred to by mark-region? If they are blocks, then the concept is trivial: everyone has a block-structured heap these days. If they are spans of lines, well, how does one choose a line size? As I understand it, Immix’s choice of 128 bytes was to be fine-grained enough to not lose too much space to fragmentation, while also being coarse enough to be eagerly swept during the GC pause.

This constraint was odd, to me; all of the mark-sweep systems I have ever dealt with have had lazy or concurrent sweeping, so the lower bound on the line size to me had little meaning. Indeed, as one reads papers in this domain, it is hard to know the real from the rhetorical; the review process prizes novelty over nuance. Anyway. What if we cranked the precision dial to 16 instead, and had a line per granule?
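To put numbers on the precision dial, here is a small back-of-the-envelope sketch. The helper names `pinned_waste` and `lines_per_block` are hypothetical. It assumes that a line touched by any live object cannot be reused at all, and that a sweep examines one mark per line.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes lost to fragmentation when a live object occupying
   [offset, offset + size) pins every line it touches. */
static size_t pinned_waste(size_t offset, size_t size, size_t line_size) {
  assert(size > 0 && line_size > 0);
  size_t first = offset / line_size;
  size_t last = (offset + size - 1) / line_size;
  return (last - first + 1) * line_size - size;
}

/* Number of line marks a sweep has to examine per block. */
static size_t lines_per_block(size_t block_size, size_t line_size) {
  assert(line_size > 0);
  return block_size / line_size;
}
```

With 128-byte lines, a lone 16-byte object pins 112 otherwise-free bytes, and a 32-byte object straddling a line boundary pins 224. With 16-byte lines both figures drop to zero for granule-aligned objects. The cost is that a 64 kB block goes from 512 line marks to 4096, which is why per-granule lines only make sense if sweeping happens lazily rather than inside the pause.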

That was the process that led me to Nofl. It is a space in a collector that came from mark-sweep with a side table, but instead uses the side table for bump-pointer allocation. Or you could see it as an Immix whose line size is 16 bytes; it’s certainly easier to explain it that way, and that’s the tack I took in a recent paper submission to ISMM’25.

paper??!?

Wait what! I have a fine job in industry and a blog, why write a paper? Gosh I have meditated on this for a long time and the answers are very silly. Firstly, one of my language communities is Scheme, which was a research hotbed some 20-25 years ago, which means many practitioners—people I would be pleased to call peers—came up through the PhD factories and published many interesting results in academic venues. These are the folks I like to hang out with! This is also what academic conferences are, chances to shoot the shit with far-flung fellows. In Scheme this is fine, my work on Guile is enough to pay the intellectual cover charge, but I need more, and in the field of GC I am not a proven player. So I did an atypical thing, which is to cosplay at being an independent researcher without having first been a dependent researcher, and just solo-submit a paper. Kids: if you see yourself here, just go get a doctorate. It is not easy but I can only think it is a much more direct path to goal.

And the result? Well, friends, it is this blog post :) I got the usual assortment of review feedback, from the very sympathetic to the less so, but ultimately people were confused by leading with a comparison to Immix but ending without an evaluation against Immix. This is fair and the paper does not mention that, you know, I don’t have an Immix lying around. To my eyes it was a good paper, an 80% paper, but, you know, just a try. I’ll try again sometime.

In the meantime, I am driving towards getting Whippet into Guile. I am hoping that sometime next week I will have excised all the uses of the BDW (Boehm GC) API in Guile, which will finally allow for testing Nofl in more than a laboratory environment. Onwards and upwards!

by Andy Wingo at Friday, May 9, 2025
