Planet Scheme

Wednesday, June 4, 2025

spritely.institute

Goblinville: A Spring Lisp Game Jam 2025 retrospective

Spritely participates in the Lisp Game Jam to make interactive artifacts demonstrating our progress building out our tech stack. The 2025 edition of the Spring Lisp Game Jam recently wrapped up and this time around we were finally able to show off using both Hoot and Goblins together to create a multiplayer virtual world demo! Now that we’ve had a moment to breathe, it’s time to share what we built and reflect on the experience.

But first, some stats about the jam overall.

Jam stats

Out of 26 total entries, 7 were made with Guile Scheme, including ours. Of those 7, all but one used Hoot, our Scheme to WebAssembly compiler. Guile tied for first place with Fennel as the most used Lisp implementation for the jam. We’re thrilled to see that Guile and Hoot have become popular choices for this jam!

Though many entries used Hoot, our entry was the only one that used Goblins, our distributed programming framework. However, David Wilson of System Crafters gets an honorable mention because he streamed several times throughout the jam while working on a MUD built with Goblins that was ultimately unsubmitted.

Our entry was Goblinville and it was rated the 7th best game in the jam overall. Not bad!

About Goblinville

Goblinville is a 2D, multiplayer, virtual world demo. During last year’s Spring Lisp Game Jam we made Cirkoban with a restricted subset of Goblins that had no network functionality. Since then, we’ve made a lot of progress porting Goblins to Hoot, culminating with the Goblins 0.15.0 release in January that featured OCapN working in the web browser using WebSockets.

Given all of this progress, we really wanted to show off a networked game this time. Making a multiplayer game for a jam is generally considered a bad idea, but Spritely is all about building networked communities so that’s what we set out to do. Our goal was to make something of a spiritual successor to the community garden demo I made when I first joined Spritely.

Screenshot of Jessica Tallon in Goblinville

What went well

First, let’s reflect on the good stuff. Here’s what went well:

  • Having participated in this jam a number of times, we have gotten pretty good at scoping projects down into something achievable.

  • Goblins made it easy to describe the game world as a collection of actors that communicate asynchronously. Initially, the entire world was hosted inside a single web browser tab. Once enough essential actors were implemented, it was a simple task to push most of those actors into a separate server process. Since sending a message to a Goblins actor is the same whether it is local or remote, this change required little more than setting up an OCapN connection (see the message-passing sketch after this list).

  • Communicating with actors over OCapN really helped with creating an architecture that separated server state from client-side input and rendering concerns. This was harder to think about with Cirkoban because there was no network separation.

  • The Hoot game jam template made it easy to get started quickly. It had been a year since we made our last game, so having a small template project was useful while we were refreshing our memory about the various Web APIs we needed to use.

  • The vast amount of freely licensed Liberated Pixel Cup (something our Executive Director Christine Lemmer-Webber organized back in her days at Creative Commons) assets allowed us to focus on the code while still having pleasing graphics that felt unified.
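
To illustrate the local/remote point above, here is a minimal message-passing sketch based on the public Goblins API. The ^greeter actor is the stock example from the Goblins documentation, not an actual Goblinville actor; whether greeter lives in the same vat or is a remote reference enlivened over OCapN, the <- send looks the same:

(use-modules (goblins))

;; A toy actor: the constructor takes bcom plus any spawn arguments and
;; returns the actor's behavior.
(define (^greeter bcom our-name)
  (lambda (whom)
    (string-append "Hello " whom ", my name is " our-name "!")))

(define a-vat (spawn-vat))
(define greeter (with-vat a-vat (spawn ^greeter "Goblinville")))

;; Asynchronous message send; the same code works for a local actor or a
;; remote one.
(with-vat a-vat
  (on (<- greeter "player")
      (lambda (reply)
        (display reply)
        (newline))))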

As a bonus, David Wilson gave Goblinville a shout out on a System Crafters stream and a bunch of people joined the server while I was online! It was a really cool moment.

Screenshot of six Goblinville players on screen at once

What didn’t go so well

Game jams are fast-paced (even though the Lisp Game Jam is more relaxed than the average jam) and not everything goes according to plan. A big part of the game jam experience is practicing how to adjust project scope as difficulties arise. Issues with the project included:

  • Time pressure. Unfortunately, we didn’t have as much time to dedicate to this project as we would have liked. We weren’t able to start work until the Monday after the jam started, so we only had 7 days instead of 10. Also, I came down with a cold at the end of the week, which didn’t help my productivity. Making something that felt as polished as Cirkoban simply wasn’t possible.

  • Lack of persistence for the game world. There’s still some amount of pre-planning that goes into writing actors that can persist that we didn’t have time for. Furthermore, while our persistence system is written to support incremental updates, we don’t have a storage backend that supports it yet. Each tick of the game world would trigger a full re-serialization and we felt that was too much of a performance penalty. We hope that by the next jam this will no longer be an issue.

  • As predicted, multiplayer increased overall complexity. What felt like a stable enough world during local testing was quickly shown to have several performance issues and bugs once it was released to the public and other people started using it. We had to restart the server once every day or so during the jam rating period (though we have resolved these issues in a post-jam update). Since we weren’t persisting the game world, each restart wiped out all registered players and the state of the map.

  • No client-side prediction to mask lag. For example, when you press an arrow key to move, you won’t see the player sprite move in the client until it receives a notification from the server that the move was valid. In other words, how responsive the controls feel is directly tied to server lag. A production game client would move the player immediately and fix things up later if it received contradictory information from the server, as in the sketch below.
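
For the curious, here is roughly what that fix-up could look like. This is a hypothetical sketch, not Goblinville's actual code: the player accessors and the server-world actor are made-up names, with <- and on coming from Goblins.

;; Optimistically move the sprite right away, then reconcile when the
;; authoritative answer arrives. All names except <- and on are invented.
(define (request-move! player dx dy)
  (let ((predicted (cons (+ (player-x player) dx)
                         (+ (player-y player) dy))))
    (set-player-position! player (car predicted) (cdr predicted))
    (on (<- server-world 'move (player-id player) dx dy)
        (lambda (actual)
          ;; If the server disagrees (a wall, lag, a conflict), snap back.
          (unless (equal? actual predicted)
            (set-player-position! player (car actual) (cdr actual)))))))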

Screenshot of 3 Goblinville players online, user “djm” is saying “hi!”

Post-jam updates

We did a bit of additional work after the jam was over to sand some of the roughest edges:

  • Re-architected the server update loop to greatly reduce message volume. Because it was simple to implement, actors in the game world were being sent a tick message at 60Hz to update their internal state. Most of the time, the actors would simply do nothing. A plant that is done growing has nothing left to do, so that’s 60 wasteful messages per second per plant. Instead, a timer system was added to schedule things to happen after a given number of ticks of the game world, and the tick method was removed from all game objects (see the sketch after this list). This greatly improved server stability, especially for worlds with lots of live objects. As of this writing, we’ve had a server running for six days without any noticeable increase in lag.

  • Added a server event log. It was hard to see what was going on in the world during the jam rating period without being connected to the graphical client. Now the server process emits a timestamped log of every event to standard output.

  • Added character sprite selection. This feature just barely missed the jam submission deadline, but it’s in now! Instead of all players being the same sprite, there are now six to choose from.

  • Took down the public server. For the jam submission version, we had baked the URI of a public server we were hosting into the itch.io client so the game would “just work”. This was particularly important for the other participants who were rating the submitted games and giving feedback. Since the jam rating period is now over, we took down the public server. If you’re interested in trying out Goblinville, you can follow the instructions in the README to host your own server.
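
Here is the timer sketch promised above: a plain-Scheme illustration of the idea rather than Goblinville's actual actor-based timer system (all names are made up).

(use-modules (srfi srfi-1)) ; remove

(define (make-timer-system)
  (let ((pending '()) ; list of (due-tick . thunk)
        (tick 0))
    (define (schedule-after! ticks thunk)
      (set! pending (cons (cons (+ tick ticks) thunk) pending)))
    (define (advance!)
      ;; One tick of the game world: run only the work that has come due,
      ;; instead of messaging every object at 60Hz.
      (set! tick (+ tick 1))
      (let* ((due? (lambda (p) (<= (car p) tick)))
             (due (filter due? pending)))
        (set! pending (remove due? pending))
        (for-each (lambda (p) ((cdr p))) due)))
    (values schedule-after! advance!)))

;; A plant that finishes growing in 600 ticks schedules one callback
;; instead of receiving 60 no-op messages per second.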

Also, Spritely co-founder Randy Farmer stopped by our updated Goblinville world!

Screenshot of current Spritely staff plus co-founder Randy Farmer in Goblinville

Wrapping up

Goblinville turned out to be more of a tech demo than a true game, but we’re quite happy with the result. We think it’s a good demonstration of what can be built with Goblins and Hoot in a short amount of time. We hope to build on this success to create even more engaging, featureful demos in the future!

by Dave Thompson (contact@spritely.institute) at Wednesday, June 4, 2025

Tuesday, May 27, 2025

Idiomdrottning

Endless scroll

The “endless scroll” debate came after it replaced pages where you’d scroll scroll scroll click, scroll scroll scroll click, scroll scroll scroll click. That was annoying while still not actually stemming addiction (at least for me). I’d still read through those megathreads on RPG.net, UI annoyances or no. The endless scroll just took the clicks out of that process, which was an improvement. But what I want is instead taking the scrolls out of the process! So it’s tap, tap, tap, tap—like an ebook!

Probably going to be just as addictive but I won’t get anxiety from all the scrolling.

Scrolling and panning is fiddly and I never get exactly the right amount of page scrolled; it’s like threading a needle repeatedly. Most page-down algos are no good either, since they’re paging in a text format that’s not designed for pages, so you have to read the same couple of lines twice, last on this page and first on the next. So in the future maybe we’ll render HTML as actual pages (after all, epub readers can [sorta] do it). Even less and more on Unix can do it; they show all of one page, then all of the next page separately and so on. The weaksauce nature of page down in GUI apps like Netscape was one of the biggest letdowns when I first started using them in the nineties.

However, the addiction dark pattern has another component: the endless and often junky content which really makes the scroll endless. That part cannot stay.

That’s a secondary reason why I don’t like discover algorithms on Mastodon, the primary reason being that they’re artificial virality.

by Idiomdrottning (sandra.snan@idiomdrottning.org) at Tuesday, May 27, 2025

Thursday, May 22, 2025

Andy Wingo

whippet lab notebook: guile, heuristics, and heap growth

Greets all! Another brief note today. I have gotten Guile working with one of the Nofl-based collectors, specifically the one that scans all edges conservatively (heap-conservative-mmc / heap-conservative-parallel-mmc). Hurrah!

It was a pleasant surprise how easy it was to switch—from the user’s point of view, you just pass --with-gc=heap-conservative-parallel-mmc to Guile’s build (on the wip-whippet branch); when developing I also pass --with-gc-debug, and I had a couple bugs to fix—but, but, there are still some issues. Today’s note thinks through the ones related to heap sizing heuristics.

growable heaps

Whippet has three heap sizing strategies: fixed, growable, and adaptive (MemBalancer). The adaptive policy is the one I would like in the long term; it will grow the heap for processes with a high allocation rate, and shrink when they go idle. However I won’t really be able to test heap shrinking until I get precise tracing of heap edges, which will allow me to evacuate sparse blocks.

So for now, Guile uses the growable policy, which attempts to size the heap so it is at least as large as the live data size, times some multiplier. The multiplier currently defaults to 1.75×, but can be set on the command line via the GUILE_GC_OPTIONS environment variable. For example, to set an initial heap size of 10 megabytes and a 4× multiplier, you would set GUILE_GC_OPTIONS=heap-size-multiplier=4,heap-size=10M.

Anyway, I have run into problems! The fundamental issue is fragmentation. Consider a 10MB growable heap with a 2× multiplier, consisting of a sequence of 16-byte objects followed by 16-byte holes. You go to allocate a 32-byte object. This is a small object (8192 bytes or less), and so it goes in the Nofl space. A Nofl mutator holds on to a block from the list of sweepable blocks, and will sequentially scan that block to find holes. However, each hole is only 16 bytes, so we can’t fit our 32-byte object: we finish with the current block, grab another one, repeat until no blocks are left and we cause GC. GC runs, and after collection we have an opportunity to grow the heap: but the heap size is already twice the live object size, so the heuristics say we’re all good, no resize needed, leading to the same sweep again, leading to a livelock.

I actually ran into this case during Guile’s bootstrap, while allocating a 7072-byte vector. So it’s a thing that needs fixing!

observations

The root of the problem is fragmentation. One way to solve the problem is to remove fragmentation; using a semi-space collector comprehensively resolves the issue, modulo any block-level fragmentation.

However, let’s say you have to live with fragmentation, for example because your heap has ambiguous edges that need to be traced conservatively. What can we do? Raising the heap multiplier is an effective mitigation, as it increases the average hole size, but for it to be a comprehensive solution in e.g. the case of 16-byte live objects equally interspersed with holes, you would need a multiplier of 512× to ensure that the largest 8192-byte “small” objects will find a hole (8192 ÷ 16 = 512). I could live with 2× or something, but 512× is too much.

We could consider changing the heap organization entirely. For example, most mark-sweep collectors (BDW-GC included) partition the heap into blocks whose allocations are of the same size, so you might have some blocks that only hold 16-byte allocations. It is theoretically possible to run into the same issue, though, if each block only has one live object, and the necessary multiplier that would “allow” for more empty blocks to be allocated is of the same order (256× for 4096-byte blocks each with a single 16-byte allocation, or even 4096× if your blocks are page-sized and you have 64kB pages).

My conclusion is that practically speaking, if you can’t deal with fragmentation, then it is impossible to just rely on a heap multiplier to size your heap. It is certainly an error to live-lock the process, hoping that some other thread mutates the graph in such a way to free up a suitable hole. At the same time, if you have configured your heap to be growable at run-time, it would be bad policy to fail an allocation, just because you calculated that the heap is big enough already.

It’s a shame, because we lose a mooring on reality: “how big will my heap get” becomes an unanswerable question because the heap might grow in response to fragmentation, which is not deterministic if there are threads around, and so we can’t reliably compare performance between different configurations. Ah well. If reliability is a goal, I think one needs to allow for evacuation, one way or another.

for nofl?

In this concrete case, I am still working on a solution. It’s going to be heuristic, which is a bit of a disappointment, but here we are.

My initial thought has two parts. Firstly, if the heap is growable but cannot defragment, then we need to reserve some empty blocks after each collection, even if reserving them would grow the heap beyond the configured heap size multiplier. In that way we will always be able to allocate into the Nofl space after a collection, because there will always be some empty blocks. How many empties? Who knows. Currently Nofl blocks are 64 kB, and the largest “small object” is 8kB. I’ll probably try some constant multiplier of the heap size.

The second thought is that searching through the entire heap for a hole is a silly way for the mutator to spend its time. Immix will reserve a block for overflow allocation: if a medium-sized allocation (more than 256B and less than 8192B) fails because no hole in the current block is big enough—note that Immix’s holes have 128B granularity—then the allocation goes to a dedicated overflow block, which is taken from the empty block set. This reduces fragmentation (holes which were not used for allocation because they were too small).

Nofl should probably do the same, but given its finer granularity, it might be better to sweep over a variable number of blocks, for example based on the logarithm of the allocation size; one could instead sweep over clz(min-size)–clz(size) blocks before taking from the empty block list, which would at least bound the sweeping work of any given allocation.
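
To make that formula concrete, here is one literal reading of it in Scheme (a sketch only, assuming 64-bit words; Whippet itself is written in C):

;; Sweep budget of clz(min-size) - clz(size) blocks: the budget grows with
;; the logarithm of the allocation size.
(define min-size 16) ; Nofl's granule size
(define (clz n) (- 64 (integer-length n)))
(define (sweep-budget size)
  (- (clz min-size) (clz size)))

(sweep-budget 16)   ; => 0
(sweep-budget 8192) ; => 9, for the largest "small" object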

fin

Welp, just wanted to get this out of my head. So far, my experience with this Nofl-based heap configuration is mostly colored by live-locks, and otherwise its implementation of a growable heap sizing policy seems to be more tight-fisted regarding memory allocation than BDW-GC’s implementation. I am optimistic though that I will be able to get precise tracing sometime soon, as measured in development time; the problem as always is fragmentation, in that I don’t have a hole in my calendar at the moment. Until then, sweep on Wayne, cons on Garth, onwards and upwards!

by Andy Wingo at Thursday, May 22, 2025

Wednesday, May 21, 2025

spritely.institute

Functional hash tables explained

Prologue: The quest for a functional hash table for Goblins

Those of us who use the Lisp family of programming languages have much appreciation for the humble pair. Using pairs, we can construct singly-linked lists and key/value mappings called association lists. Lists built from pairs have the pleasant property of being immutable (if you abstain from using setters!) and persistent: extending a list with a new element creates a new list that shares all the data from the original list. Adding to a list is a constant time operation, but lookup is linear time. Thus, lists are not appropriate when we need constant time lookup. For that, we need hash tables.
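
For example, here is the persistence property in action with a plain association list (a minimal illustration):

;; Extending an association list shares all of the original's storage.
(define prices '((apple . 1) (pear . 2)))
(define prices* (cons '(plum . 3) prices)) ; O(1): one new pair, rest shared

(assq 'plum prices*) ; => (plum . 3)
(assq 'pear prices*) ; => (pear . 2), found by an O(n) scan
(assq 'plum prices)  ; => #f, the original list is unchanged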

The classic hash table is a mutable data structure and one that our Lisp of choice (Guile, a Scheme implementation) includes in its standard library, like most languages. Hash tables are neither immutable nor persistent; adding or removing a new key/value pair to/from a hash table performs an in-place modification of the underlying memory. Mutable data structures introduce an entire class of bug possibilities that immutable data structures avoid, and they’re particularly tricky to use successfully in a multi-threaded program.

Fortunately, there exists an immutable, persistent data structure that is suitable: the Hash Array Mapped Trie or HAMT, for short. This data structure was introduced by Phil Bagwell in the paper “Ideal Hash Trees” (2001). HAMTs were popularized in the Lisp world by Clojure over a decade ago. Unfortunately, Guile does not currently provide a HAMT-based functional hash table in its standard library.

Instead, Guile comes with the VList, another one of Phil Bagwell’s creations which can be used to build a VHash. Goblins currently uses VLists for its functional hash table needs, but HAMTs are better suited for the task.

There are various implementations of HAMTs in Scheme floating about, but none of them seem to have notable adoption in a major Guile project. So, we thought we’d write our own that we could ensure meets the needs of Goblins, compiles on Hoot, and that might just be useful enough to send upstream for inclusion into Guile after it’s been battle tested. To that end, we recently added the (goblins utils hashmap) module. This will become the base for a future re-implementation of the ^ghash actor.

Okay, enough context. Let’s talk about HAMTs!

What the hoot is a HAMT anyway?

From the outside, HAMTs can seem rather mysterious and intimidating. That’s how I felt about them, at least. However, once I dug into the topic and started writing some code, I was pleasantly surprised that the essential details were not too complicated.

As mentioned previously, the HAMT is an immutable, persistent data structure that associates keys with their respective values, just like a regular hash table. By utilizing a special kind of tree known as a “trie” plus some nifty bit shifting tricks, HAMTs achieve effectively constant time insertion, deletion, and lookup despite all operations being logarithmic time on paper.

A trie differs from a tree in the following way: tries only store keys in their leaf nodes. If you’re familiar with binary search trees, tries aren’t like that. Instead, the key itself (or the hash thereof, in our case) encodes the path through the trie to the appropriate leaf node containing the associated value. Tries are also called “prefix trees” for this reason.

A binary tree node can have at most two children, but HAMTs have a much larger branching factor (typically 32). These tries are wide and shallow, so few nodes need to be traversed to find the value for any given key: with a branching factor of 32, even a trie with a million entries is only about four levels deep (32⁴ ≈ 1,000,000). This is why the logarithmic time complexity of HAMT operations can be treated as if it were constant time in practice.

Trie representation

To explain, let’s start by defining a trie node using a branching factor of 32. At its simplest, we can think of a trie node as a 32-element array. Each element of the trie can contain either a leaf node (a key/value pair) or a pointer to a subtrie (another 32 element array).

For example, a HAMT with 3 entries might look like this:

Example trie visualization

This is nice and simple, but it’s a little too simple. With such a large branching factor, it’s likely that many boxes in the trie will have nothing in them, like in the above example. This is a waste of space. The issue is further compounded by immutability: adding or removing a single element requires allocating a new 32 element array. For the sake of efficiency, we need to do better.

Instead, we’ll use a sparse array that only stores the occupied elements of the trie node. To keep track of which elements of the theoretical 32 element array are occupied, we’ll use a bitmap. Below is an example trie:

Example trie visualization

In the above example, bits 4 and 10 (starting from the right) are set. This means that of the 32 possible elements, only 2 are currently occupied. Thus, the size of the underlying array is 2.

To get the number of occupied elements in the trie node, we simply count the number of 1s in the bitmap. This is known as the “population count”. (We could also just check the size of the array, but this bit counting idea is about to have another important use.)

To retrieve the value at index 10, we need to perform a translation to get an index into the underlying array. To do this, we compute the population count of the bits set to the right of bit 10. There is 1 bit set: bit 4. Thus, the value we’re looking for is stored at index 1 in the underlying array. If we take a peek, we find bar → 2 there.
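
In Guile, that translation is a two-liner (a sketch; logcount computes the population count):

;; Map a trie index (0-31) to an index into the packed array by counting
;; the occupied elements below it.
(define (sparse-index bitmap i)
  (logcount (logand bitmap (- (ash 1 i) 1))))

(sparse-index #b10000010000 10) ; => 1, so bar → 2 lives at array index 1
(sparse-index #b10000010000 4)  ; => 0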

Thanks to this sparse storage technique, insertion/deletion operations will allocate less memory overall.

Insertion algorithm

With the trie representation out of the way, let’s walk through the algorithm to insert a new key/value pair. The insertion algorithm covers all the essential aspects of working with a HAMT.

Let’s say we want to insert the mapping foo→42 into an empty HAMT. The empty HAMT consists of a single node with an empty bitmap and an empty array:

Empty trie

To figure out where to store the key foo, we first need to compute the hash code for it. For ease of demonstration, we’ll use 10-bit hash codes. (In practice, 32 bits or more is ideal.)

Let’s say our fictitious hash function produces the hash bits 1000010001 for foo.

Each trie node has 32 possible branches. Thus, the range of indices for a node can be represented using a 5-bit unsigned integer. What if we took the 5 most significant bits of the hash code and used that as our index into the trie? Wow, that sounds like a clever idea! Let’s do that!

Hash bits for foo

So, foo gets inserted at index 16. The original trie is empty, so we’ll make a new trie with a single leaf node. To do this, we need to set bit 16 in our bitmap and create an array with just one key/value pair in it. Our output trie looks like this:

Single level trie with one leaf node

Note that we only examined 5 bits of the hash code. We only need to examine as many bits in the hash as it takes to find an empty element.
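
In code, peeling off successive 5-bit chunks might look like this (a sketch sized for this post's 10-bit toy hashes; real code would use 32 bits or more):

;; Extract the depth-th 5-bit chunk of a 10-bit hash, starting from the
;; most significant bits.
(define (trie-index hash depth)
  (logand (ash hash (- (* 5 (+ depth 1)) 10)) #b11111))

(trie-index #b1000010001 0) ; => 16, the root-level index for foo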

Let’s insert another key/value pair, this time for key bar and value 17. Our made-up hash code for bar is 0100100001. Repeating the process above, the most significant 5 bits are 01001, so our index is 9, another unoccupied index. Our new trie looks like this:

Single level trie with two leaf nodes

Because 9 < 16, the entry for bar is stored in array index 0, followed by foo.

As a last example, let’s insert a key/value pair where the most significant bits of the hash code collide with an existing entry. This time, the key is baz and the value is 66. Our made-up hash code for baz is 1000000001.

The most significant 5 bits are 10000, so our index is 16. Now things get interesting because 16 is already occupied; the first 5 bits are not enough to distinguish between the keys foo and baz! To resolve this, we’ll replace the leaf node with a subtrie that uses the next 5 bits when calculating indices. The resulting root trie node will look sort of like this:

Subtrie placeholder

In the figure above, subtrie is a placeholder for the new trie we need to construct to hold the mappings for foo and baz.

To create the subtrie, we just recursively apply the same algorithm we’ve already been using, but with different bits. Now we’re looking at the least significant 5 bits of the hash codes:

Hash bits for foo and baz

The index for foo is 17, and the index for baz is 1. So, our new subtrie will have bits 1 and 17 set, and contain 2 leaf nodes:

Subtrie with two leaf nodes

Putting it all together, the complete trie looks like this:

Complete trie with a subtrie and three total leaf nodes

Each insertion creates a new trie. The new trie shares nearly all of the data with the original trie. Only the root node and the visited subtries need to be allocated afresh. This makes insertion even into large HAMTs quite efficient.

The partial hash collision described above could happen recursively, in which case the trie will grow yet another level deeper upon each iteration. The worst case scenario is that two or more keys have the same exact hash code. A good hash function and lots of hash bits will make this a very rare occurrence, but a robust insertion algorithm needs to account for this case. We’ll gloss over this edge case here, but hash collisions can be handled by “bottoming out” to a special leaf node: a linked list of the colliding key/value pairs. These lists need to be traversed linearly when necessary upon lookup or future insertions that cause more collisions. In practice, these collision lists tend to be quite short, often only of length 2.
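
Putting the pieces together, here is a compact path-copying insertion sketch. It reuses sparse-index and trie-index from the earlier snippets, represents a trie node as #(bitmap packed-vector) and a leaf as a (key . value) pair, and omits the collision lists just discussed. hash-of is a toy stand-in for a real hash function; none of this is the actual (goblins utils hashmap) code.

(use-modules (srfi srfi-1)) ; take, drop

(define (hash-of key) ; toy hashes from the walkthrough above
  (cdr (assq key '((foo . #b1000010001)
                   (bar . #b0100100001)
                   (baz . #b1000000001)))))

(define (vector-insert vec i x) ; copy of vec with x spliced in at index i
  (let ((lst (vector->list vec)))
    (list->vector (append (take lst i) (list x) (drop lst i)))))

(define (vector-replace vec i x) ; copy of vec with index i replaced by x
  (let ((out (vector-copy vec)))
    (vector-set! out i x)
    out))

(define empty-hamt (vector 0 (vector)))

(define (hamt-set node key val depth)
  (let* ((bitmap (vector-ref node 0))
         (array (vector-ref node 1))
         (i (trie-index (hash-of key) depth))
         (slot (sparse-index bitmap i)))
    (if (not (logbit? i bitmap))
        ;; Unoccupied element: set bit i and splice in a new leaf. The
        ;; original node is untouched; everything off this path is shared.
        (vector (logior bitmap (ash 1 i))
                (vector-insert array slot (cons key val)))
        (let ((entry (vector-ref array slot)))
          (vector bitmap
                  (vector-replace array slot
                                  (cond
                                   ((vector? entry) ; subtrie: recurse
                                    (hamt-set entry key val (+ depth 1)))
                                   ((equal? key (car entry)) ; overwrite
                                    (cons key val))
                                   (else ; collision: push both leaves down
                                    (hamt-set
                                     (hamt-set empty-hamt (car entry)
                                               (cdr entry) (+ depth 1))
                                     key val (+ depth 1))))))))))

;; Rebuilds the trie pictured above: bar at the root, foo and baz sharing
;; a subtrie under index 16.
(define trie
  (hamt-set (hamt-set (hamt-set empty-hamt 'foo 42 0) 'bar 17 0) 'baz 66 0))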

Wrapping up

The insertion algorithm explains all of the essential elements of a HAMT and how to work with its sparse storage structure, but we’ll touch upon lookup and deletion very briefly.

Looking up the value for a key follows the same basic process as insertion, but in a read-only manner. If the hash bits point to an unoccupied element of a trie node then the search fails right then and there. If the bits point to a leaf node with a matching key, the search succeeds. If the key doesn’t match, the search fails. Finally, if the bits point to a subtrie, we recursively search that subtrie with the next set of bits.
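
Sticking with the toy representation from the insertion sketch, lookup is the read-only walk just described:

(define (hamt-ref node key depth default)
  (let* ((bitmap (vector-ref node 0))
         (i (trie-index (hash-of key) depth)))
    (if (not (logbit? i bitmap))
        default ; unoccupied element: the key is absent
        (let ((entry (vector-ref (vector-ref node 1)
                                 (sparse-index bitmap i))))
          (cond ((vector? entry) ; subtrie: recurse with the next 5 bits
                 (hamt-ref entry key (+ depth 1) default))
                ((equal? key (car entry)) (cdr entry))
                (else default))))))

(hamt-ref trie 'baz 0 #f) ; => 66
(hamt-ref trie 'bar 0 #f) ; => 17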

Deletion is the most complicated operation as it involves path compression when subtries become empty. If you’ve been successfully nerd sniped by this blog post then consider this a homework assignment. Give Phil Bagwell’s paper a read! 🙂

Thanks for following along! Hope this was fun!

by Dave Thompson (contact@spritely.institute) at Wednesday, May 21, 2025

Saturday, May 17, 2025

The Racket Blog

Racket v8.17

posted by Stephen De Gabrielle


We are pleased to announce Racket v8.17 is now available from https://download.racket-lang.org/.

As of this release:

  • The new drracket-core package provides a version of drracket with a smaller set of dependencies.
  • Typed Racket has support for treelists.
  • The package manager computes checksums for packages when required, allowing the use and automatic upgrade of packages without them.
  • The bitwise-first-bit-set function returns the smallest bit that is set in the twos-complement representation of the given number.
  • The updated dynamic-require function makes it easier to use syntax bindings by allowing a syntax-thunk (or 'eval) to be used for them.
  • The error-module-path->string-handler parameter allows the customization of the display of module-paths in error messages.
  • Precision of certain numeric functions (sin, cos, and others) is improved on Windows platforms by using the MSVCRT/UCRT libraries.
  • The string-append function has improved performance and reduced memory use for long lists of strings in the Racket CS implementation. Differences are clearly noticeable for lists of length 1 million.
  • TCP ports use SO_KEEPALIVE, instructing the kernel to send periodic messages while waiting for data to check whether the connection is still responsive.
  • Racket code using a terminal in Windows can receive mouse events as virtual terminal characters after using SetConsoleMode. (This is also already possible on macOS and Linux.) See the tui-term package for related example code.
  • The #:replace-malformed-surrogate? keyword can be used to specify a replacement for malformed Unicode surrogates in JSON input.
  • The http-client module no longer sends “Content-Length: 0” for requests without a body.
  • The demodularizer (compiler/demod) can prune more unused assignments.
  • Several judgment rendering forms in Redex are replaced by functions, allowing more convenient abstraction.
  • When a distribution includes no teaching languages, DrRacket’s language-dialog configuration moves into the preferences dialog and the “Language” menu disappears.
  • The math library has better support for block-diagonal matrices, including both Racket and Typed Racket.
  • The math library contains improved implementations of acos and matrix-(cos-)angle.
  • The stepper again works for big-bang programs.
  • There are many other repairs and documentation improvements!

Thank you

The following people contributed to this release:

Alexander Shopov, Andrei Dorian Duma, Bert De Ketelaere, Bob Burger, Bogdan Popa, Bogdana Vereha, Cameron Moy, Chung-chieh Shan, Cutie Deng, D. Ben Knoble, Dario Hamidi, Dominik Pantůček, Gustavo Massaccesi, halfminami, Jacqueline Firth, Jason Hemann, Jens Axel Søgaard, Joel Dueck, John Clements, Jordan Harman, Marc Nieper-Wißkirchen, Matthew Flatt, Matthias Felleisen, Mike Sperber, Noah Ma, owaddell-ib, Philippe Meunier, Robby Findler, Ryan Culpepper, Ryan Ficklin, Sam Phillips, Sam Tobin-Hochstadt, Shu-Hung You, sogaiu, Sorawee Porncharoenwase, Stephen De Gabrielle, Vincent Lee, and Wing Hei Chan.

Racket is a community developed open source project and we welcome new contributors. See racket/README.md to learn how you can be a part of this amazing project.

Feedback Welcome

Questions and discussion welcome at the Racket community on Discourse or Discord.

Please share

If you can, please help get the word out to users and platform-specific repo packagers.

Racket - the Language-Oriented Programming Language - version 8.17 is now available from https://download.racket-lang.org

See https://blog.racket-lang.org/2025/05/racket-v8-17.html for the release announcement and highlights.

by John Clements, Stephen De Gabrielle at Saturday, May 17, 2025

Friday, May 16, 2025

Arthur A. Gleckler

Emacs Web Editor

WYSIWYG, or What You See Is What You Get, is the way to go. When I started this blog, I used Scheme code to represent the text of each blog post, like this:

(blurb
  ((p)
    "Use these interactive maps to visualize where you've been "
    ((a href "/maps/usa") "in the USA")
    " and "
    ((a href "/maps/world") "in the world")
    ".")
  ...)

As you can imagine, that's a tedious way to edit text. So I switched to Emacs Org mode, combined with a home-grown wrapper around Pandoc to convert the Org file to HTML. But Org mode, like Markdown, is impoverished in many ways. [Add detail here.] So I switched to editing HTML manually, and wrote a clunky MHTML-mode extension to make the markup disappear during normal editing, leaving just the text. But that was still no fun to use, especially because, like the other approaches, it meant editing the posts in a way that didn't look at all like the final product.

I finally decided to bite the bullet, and invest in something better. I've written EWE, the Emacs Web Editor, an editor that runs a full web browser, but implements full Emacs text-editing commands. I'm writing this post using EWE.

Notes

- Make a hero image.

- Explain the reason for using HTML over Markdown or Org mode.

- Explain the reason for reading and writing via HTTP. Before publishing, add support for reading and writing files.

- Include a video demo. Display keys as they are entered.

by Arthur A. Gleckler at Friday, May 16, 2025

Thursday, May 15, 2025

Andy Wingo

guile on whippet waypoint: goodbye, bdw-gc?

Hey all, just a lab notebook entry today. I’ve been working on the Whippet GC library for about three years now, learning a lot on the way. The goal has always been to replace Guile’s use of the Boehm-Demers-Weiser collector with something more modern and maintainable. Last year I finally got to the point that I felt Whippet was feature-complete, and taking into account the old adage about long arses and brief videos, I think that wasn’t too far off. I carved out some time this spring and for the last month have been integrating Whippet into Guile in anger, on the wip-whippet branch.

the haps

Well, today I removed the last direct usage of the BDW collector’s API by Guile! Instead, Guile uses Whippet’s API any time it needs to allocate an object, add or remove a thread from the active set, identify the set of roots for a collection, and so on. Most tracing is still conservative, but this will move to be more precise over time. I haven’t had the temerity to actually try one of the Nofl-based collectors yet, but that will come soon.

Code-wise, the initial import of Whippet added some 18K lines to Guile’s repository, as counted by git diff --stat, which includes documentation and other files. There was an unspeakable amount of autotomfoolery to get Whippet into Guile’s ancient build system. Changes to Whippet during the course of integration added another 500 lines or so. Integration of Whippet removed around 3K lines of C from Guile. It’s not a pure experiment, as my branch is also a major version bump and so has the freedom to refactor and simplify some things.

Things are better but not perfect. Notably, I switched to building weak hash tables in terms of buckets and chains where the links are ephemerons, which gives me concurrent lock-free reads and writes but not resizable tables. I would like to somehow resize these tables in response to GC, but haven’t wired it up yet.

Anyway, next waypoint will be trying out the version of Whippet’s Nofl-based mostly-marking collector that traces all heap edges conservatively. If that works... well if that works... I don’t dare to hope! We will see what we get when that happens. Until then, happy hacking!

by Andy Wingo at Thursday, May 15, 2025

Friday, May 9, 2025

Andy Wingo

a whippet waypoint

Hey peoples! Tonight, some meta-words. As you know I am fascinated by compilers and language implementations, and I just want to know all the things and implement all the fun stuff: intermediate representations, flow-sensitive source-to-source optimization passes, register allocation, instruction selection, garbage collection, all of that.

It started long ago with a combination of curiosity and a hubris to satisfy that curiosity. The usual way to slake such a thirst is structured higher education followed by industry apprenticeship, but for whatever reason my path sent me through a nuclear engineering bachelor’s program instead of computer science, and continuing that path was so distasteful that I noped out all the way to rural Namibia for a couple years.

Fast-forward, after 20 years in the programming industry, and having picked up some language implementation experience, a few years ago I returned to garbage collection. I have a good level of language implementation chops but never wrote a memory manager, and Guile’s performance was limited by its use of the Boehm collector. I had been on the lookout for something that could help, and when I learned of Immix it seemed to me that the only thing missing was an appropriate implementation for Guile, and hey I could do that!

whippet

I started with the idea of an MMTk-style interface to a memory manager that was abstract enough to be implemented by a variety of different collection algorithms. This kind of abstraction is important, because in this domain it’s easy to convince oneself that a given algorithm is amazing, just based on vibes; to stay grounded, I find I always need to compare what I am doing to some fixed point of reference. This GC implementation effort grew into Whippet, but as it did so a funny thing happened: the mark-sweep collector that I prototyped as a direct replacement for the Boehm collector maintained mark bits in a side table, which I realized was a suitable substrate for Immix-inspired bump-pointer allocation into holes. I ended up building on that to develop an Immix collector, but without lines: instead each granule of allocation (16 bytes for a 64-bit system) is its own line.

regions?

The Immix paper is funny, because it defines itself as a new class of mark-region collector, fundamentally different from the three other fundamental algorithms (mark-sweep, mark-compact, and evacuation). Immix’s regions are blocks (64kB coarse-grained heap divisions) and lines (128B “fine-grained” divisions); the innovation (for me) is the optimistic evacuation discipline by which one can potentially defragment a block without a second pass over the heap, while also allowing for bump-pointer allocation. See the papers for the deets!

However what, really, are the regions referred to by mark-region? If they are blocks, then the concept is trivial: everyone has a block-structured heap these days. If they are spans of lines, well, how does one choose a line size? As I understand it, Immix’s choice of 128 bytes was to be fine-grained enough to not lose too much space to fragmentation, while also being coarse enough to be eagerly swept during the GC pause.

This constraint was odd, to me; all of the mark-sweep systems I have ever dealt with have had lazy or concurrent sweeping, so the lower bound on the line size to me had little meaning. Indeed, as one reads papers in this domain, it is hard to know the real from the rhetorical; the review process prizes novelty over nuance. Anyway. What if we cranked the precision dial to 16 instead, and had a line per granule?

That was the process that led me to Nofl. It is a space in a collector that came from mark-sweep with a side table, but instead uses the side table for bump-pointer allocation. Or you could see it as an Immix whose line size is 16 bytes; it’s certainly easier to explain it that way, and that’s the tack I took in a recent paper submission to ISMM’25.

paper??!?

Wait what! I have a fine job in industry and a blog, why write a paper? Gosh I have meditated on this for a long time and the answers are very silly. Firstly, one of my language communities is Scheme, which was a research hotbed some 20-25 years ago, which means many practitioners—people I would be pleased to call peers—came up through the PhD factories and published many interesting results in academic venues. These are the folks I like to hang out with! This is also what academic conferences are, chances to shoot the shit with far-flung fellows. In Scheme this is fine, my work on Guile is enough to pay the intellectual cover charge, but I need more, and in the field of GC I am not a proven player. So I did an atypical thing, which is to cosplay at being an independent researcher without having first been a dependent researcher, and just solo-submit a paper. Kids: if you see yourself here, just go get a doctorate. It is not easy but I can only think it is a much more direct path to goal.

And the result? Well, friends, it is this blog post :) I got the usual assortment of review feedback, from the very sympathetic to the less so, but ultimately people were confused by leading with a comparison to Immix but ending without an evaluation against Immix. This is fair and the paper does not mention that, you know, I don’t have an Immix lying around. To my eyes it was a good paper, an 80% paper, but, you know, just a try. I’ll try again sometime.

In the meantime, I am driving towards getting Whippet into Guile. I am hoping that sometime next week I will have excised all the uses of the BDW (Boehm GC) API in Guile, which will finally allow for testing Nofl in more than a laboratory environment. Onwards and upwards!

by Andy Wingo at Friday, May 9, 2025

Wednesday, May 7, 2025

Scheme Requests for Implementation

SRFI 262: Extensible pattern matcher

SRFI 262 is now in draft status.

A pattern matching form which can operate on arbitrary Scheme values is defined. It conforms to the following design principles.

The syntax of patterns is declarative. The syntax is extensible in the same way that Scheme’s procedural syntax is extensible by macros.

For most use cases, the use of the pattern matcher should produce code which is essentially as efficient at run time as equivalent procedural Scheme code would be, assuming a moderately optimizing Scheme implementation. This applies only when the equivalent code is also equivalently correct in terms of handling of error cases (i.e. including all type checks done automatically by the pattern matcher). However, using extension syntax should not cause this principle to be violated (provided the extension syntax is appropriately implemented).

by Daphne Preston-Kendal at Wednesday, May 7, 2025

SRFI 261: Portable SRFI Library Reference

SRFI 261 is now in draft status.

This SRFI proposal addresses systemic compatibility issues exposed by the library reference format defined in SRFI 97 (srfi :<SRFI number> <identifier> ...) and advocates for a modernized, portable alternative: (srfi srfi-<SRFI number> <identifier> ...).

by WANG Zheng at Wednesday, May 7, 2025

SRFI 260: Generated Symbols

SRFI 260 is now in final status.

This SRFI defines the procedure generate-symbol. Each time it is invoked, the procedure returns a new symbol whose name cannot be guessed. The returned symbol is a standard symbol for all purposes; it obeys write/read invariance and it is equal to another symbol if and only if their names are spelled the same.
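
A quick sketch of the described behavior (a hedged reading of the abstract, not the SRFI's reference implementation):

(define g (generate-symbol))
(symbol? g)                                 ; => #t, an ordinary symbol
(eq? g (string->symbol (symbol->string g))) ; => #t, write/read invariance
(eq? (generate-symbol) (generate-symbol))   ; => #f, each call is fresh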

by Marc Nieper-Wißkirchen at Wednesday, May 7, 2025

SRFI 259: Tagged procedures with type safety

SRFI 259 is now in final status.

Tagged procedures are procedures with boxes attached, which can be used to create applicable records and other abstractions. This SRFI proposes a variant with the notion of a tagging protocol, analogous to a record type definition, for ensuring encapsulation and security for tagged procedures.

by Daphne Preston-Kendal at Wednesday, May 7, 2025

Tuesday, May 6, 2025

spritely.institute

Hoot 0.6.1 released!

We are excited to announce the release of Hoot 0.6.1! Hoot is a Scheme to WebAssembly compiler backend for Guile, as well as a general purpose WebAssembly toolchain. In other words, Scheme in the browser!

This small patch release contains bug fixes and a few new features added since the 0.6.0 release back in January. Just in time for the upcoming Lisp Game Jam!

Make a game with Hoot!

The 2025 edition of the Spring Lisp Game Jam starts on Friday, 5/9! The Lisp Game Jam is a casual, 10-day long event where participants build small games using the Lisp implementation of their choice. We’d like to encourage you to join the jam and try making a game with Hoot! Hoot is a great choice for game jams because web-based games are both easy to distribute and easy for other people to play.

To make it easy to jump right in and make a game with Hoot, we have a ready-to-go template repository on Codeberg that you can clone. It takes care of all the boilerplate and provides some useful bindings to essential web APIs like HTML5 Canvas.

Okay, game jam promotion over! Read on for the full release notes.

New features

  • New --bundle flag for guild compile-wasm that will automatically install runtime libraries alongside the generated binary (reflect.js, reflect.wasm, wtf8.wasm).

  • Integrated psyntax-based macro expander (hoot expander) into eval.

  • Added -g flag for controlling compiler debug level.

  • Changed current-module to require compilation with -g.

  • Changed (hoot environments) to use run-time module registry.

  • Added logcount and vector-move-left! procedures, part of Guile's default environment.

  • Added experimental (hoot time) module for date/time calculations (currently undocumented).

  • Internal make-port procedure now uses keyword arguments.

  • Keyword error constructors are now exported from Hoot's Wasm ABI.

  • Added external-function? and call-external to (hoot ffi).

  • Added quote-syntax to (hoot core-syntax) and (hoot syntax).

  • Documented (fibers streams) module.

Bug fixes

  • Fixed inexact->exact conversion for a subset of flonums.

  • Fixed number->string for radix 16.

  • Fixed current-time which was mistakenly using jiffies.

  • Fixed string-split when delimiter is the first character in the string.

  • Fixed string-utf8-length opcode emission.

  • Fixed define-values macro that was causing compilation errors on newer Guile Git builds.

  • Fixed cond-expand error condition.

  • Fixed Wasm linker handling of call_ref and return_call_ref.

  • Fixed Wasm linker handling of catch bodies.

  • Fixed ,wasm-eval metacommand.

  • Fixed NodeJS example in the manual. Thanks to Jelle Licht for the patch!

  • External references now have a printed representation.

  • Exported sleep from (fibers) to match upstream Fibers API.

Performance improvements

  • Optimized Wasm validation pass (validate-wasm).

  • Optimized Wasm linker (add-stdlib).

  • Optimized Wasm resolver (resolve-wasm).

  • Optimized Wasm emission in compiler backend (high-level-cps->wasm).

Browser compatibility

  • Compatible with Firefox 121 or later.

  • Compatible with Google Chrome 119 or later.

  • Compatibility with Safari/WebKit is still a bit complicated, unfortunately. We thought everything was in place when Hoot 0.6.0 was released, but then we found a bug in WebKit’s Wasm tail call implementation. This bug was fixed nearly two months ago but the fix has not yet propagated out to stable Safari and iOS releases, as far as we’re aware. We verified that some of our larger Hoot programs are now working on a Git build of WebKit, so we are optimistic that Hoot will finally be supported on all major browsers soon.

Get Hoot 0.6.1!

Hoot is already available in GNU Guix:

$ guix pull
$ guix install guile-next guile-hoot

(Hoot currently requires a bleeding-edge version of Guile, hence guile-next above.)

Otherwise, Hoot can be built from source via our release tarball. See the Hoot homepage for a download link and GPG signature.

Documentation for Hoot 0.6.1, including build instructions, can be found here.

Get in touch!

For bug reports, pull requests, or just to follow along with development, check out the Hoot project on Codeberg.

If you build something cool with Hoot (for the upcoming game jam or otherwise), let us know on our community forum!

Special thanks to the MetaMask folks for funding this work!

Until next time, happy hooting! 🦉

by Dave Thompson (contact@spritely.institute) at Tuesday, May 6, 2025

Monday, May 5, 2025

spritely.institute

Announcing Spritely Oaken

Spritely's Oaken mascot

Hi! I’m Daphne Preston-Kendal! You may remember me from such programming languages as the Revised⁷ Report on the Algorithmic Language Scheme. In this guest post, I’m going to explain the work I’m doing for Spritely, funded by NLnet: developing a new secure Scheme sublanguage for running untrusted code, like modifications to an existing app, in a safe way!

Usually, loading some random third-party code into your program is considered a security risk. Games are probably the most popular category of app these days where it’s normal for them to be moddable by adding more code, but we can see what a security disaster this is. Hardly a week goes by without yet another story of some rascal sharing a Minecraft mod pack on a forum or uploading a mod to the Steam Workshop which, in reality, exists to steal browser cookies or passwords.

If fun things like games can’t stop malicious mods, what hope do we have to build more serious moddable apps which get deployed on major server systems? What if a malicious mod got onto servers with sensitive data about thousands or millions of people?

Obviously, though, not all code is evil: probably most code isn’t, at least not deliberately. Do we really have to throw out the entire idea of running third-party modifications to our software just because some small amount of it might do bad stuff?

What code can you trust?

Here’s a pop quiz! What potential security problems could the following Scheme code cause? (Assume that we are working on a fully-conformant implementation of the R6RS, and this is evaluated with just the (rnrs base (6)) library imported.)

(lambda (x) (+ x 1))

Go ahead – it’s not a trick question!

The answer, obviously, is none! Evaluating this expression can’t access any system resources; once evaluated, you get a procedure, and the worst thing that can happen then is that someone gives it something that’s not a number (which will just safely raise an exception). This code will never steal your bank account information, eat all the cheese in your fridge, cause demons to fly out of your nose, etc.

So obviously, we can trust some code. For a simple procedure like this, we can audit it and see what it does! But for more complex code, it’s not reasonable to carefully read thousands of lines to try and find anything that looks nefarious. Attackers also keep finding new ways to hide malicious code from human auditors, from weird tricks with Unicode control characters to just putting loads of spaces at the start of lines so you have to scroll horizontally to see the evil code.

What if we could automatically audit code to be sure that it doesn’t do anything dodgy? Could that avoid human errors (like failing to notice a horizontal scrollbar)?

To do that, we have to go a level further and consider how we came to the conclusion that procedures like the one above are safe. How do we know? Well, + is a pure function: it takes some input value and returns a new, completely different output value. It doesn’t change anything about the input value: if you call ((lambda (x) (+ x 1)) sensitive-number), the value of sensitive-number itself won’t have changed when the add-one procedure is done with it. It doesn’t open any files or connect to any other remote computers, so your sensitive-number won’t be leaked to any bad guys. Also, it doesn’t allocate any memory or loop. Even a pure-functional program can do that – but at scale, that causes resource exhaustion, which could make an important service inaccessible until the tied-up resources are released. Well, we can treat any pure functional program as safe, so pretty much everything in the (rnrs base (6)) library is okay – and maybe when we run it we’ll be sure to watch it to make sure it doesn’t use too many CPU cycles.

So we’ve got our first idea – but, sorry, I’ve gotta interrupt this blog post now because Christine just sent me some code she thinks I should put in my next program. It looks pretty nifty – it’s a square root procedure written in just Scheme! Hey, you know what, let’s take a look together:

(library (dustyweb square-root)
  (export square-root)

Looks good so far!

  (import (rnrs base (6))
          (rnrs io ports (6)))

Uh-oh. Why does a square root procedure need port I/O? Square-root should be a pure function, but that library gives Christine’s code complete access to my filesystem. What’s she up to? Is she trying to read my private data? Or has thinking about all those evil Minecraft mods and how to detect them made me paranoid??

  (define (close-enough? guess x)
    (< (abs (- (* guess guess) x)) 0.01))

  (define (improve-guess guess x)
    (/ (+ guess (/ x guess))
       2))

  (define (square-root x)
    (call-with-port
        (open-file-output-port "sqrt-log.txt"
                               (file-options no-fail)
                               (buffer-mode line)
                               (native-transcoder))
      (lambda (log-port)
        (put-string log-port "finding square root of: ")
        (put-datum log-port x)
        (put-char log-port #\newline)
        (let loop ((guess 1.0))
          (put-string log-port "guess: ")
          (put-datum log-port guess)
          (put-char log-port #\newline)
          (cond ((close-enough? guess x)
                 (put-string log-port "good enough, ship it!\n")
                 guess)
                (else
                 (loop (improve-guess guess x)))))))))

Well, actually, that looks perfectly innocent as far as uses of the filesystem go. Christine probably wanted to log the guesses so she could debug her code. Still, if I already have any file called sqrt-log.txt, this will inadvertently delete and write over it. Also, I wanted to use this procedure in a program to work out how much perspective distortion I should expect with various lenses on some of the weird and wacky film formats I photograph on. (The amount of perspective distortion is proportional to a function of the difference between the focal length and the corner-to-corner diagonal of the image plane.) But what weird and wacky film formats I use is my own private business, and I don’t necessarily want it sent to Christine’s server: imagine if instead of (rnrs io ports (6)), she’d imported (srfi :106) (the socket library) and sent all that data off to her server to spy on the square roots I’m calculating! I can’t just trust any code Christine sends me.

Hmm. Maybe we’ve learned something here, though – something we can use to improve our intuition about what code we can trust in general. I’m fine with recording what square roots I take, and what steps were taken to calculate them – on my own computer. Can we reformulate Christine’s library so that she can still have her debugging information, but it still only imports the (rnrs base (6)) library? Then we know we can trust her code just by looking at the import list!

How about something like this?

(library (dustyweb square-root)
  (export square-root)
  (import (rnrs base (6)))

  (define (close-enough? guess x)
    (< (abs (- (* guess guess) x)) 0.01))

  (define (improve-guess guess x)
    (/ (+ guess (/ x guess))
       2))

  (define (square-root x debug-log)
    (debug-log "finding square root of" x)
    (let loop ((guess 1.0))
      (debug-log "guess" guess)
      (cond ((close-enough? guess x)
             (debug-log "good enough, ship it!")
             guess)
            (else
             (loop (improve-guess guess x)))))))

Now when I write my perspective-distortion program, I call square-root like this:

(square-root (+ (square image-width)
                (square image-height))
             ignore)

where ignore is a procedure I wrote and trust that accepts any number of arguments, does nothing with them, and returns nothing of interest. In other words, the logging calls do nothing for me because they’re not very interesting to me. Meanwhile, when Christine is debugging her program, she can write a debug-log procedure that looks like this:

(define (make-debug-log-procedure log-port)
  (case-lambda
    ((message)
     (put-string log-port message)
     (put-char log-port #\newline))
    ((message value)
     (put-string log-port message)
     (put-string log-port ": ")
     (put-datum log-port value)
     (put-char log-port #\newline))))

(define debug-log (make-debug-log-procedure
                   (open-file-output-port "sqrt-log.txt"
                                          (file-options no-fail)
                                          (buffer-mode line)
                                          (native-transcoder))))

What happened here? Square-root can still do I/O, but suddenly we’re able to trust it because we know that the library it’s in only has access to procedures we trust. In other words, we only have to look at that one line in the program to know that we can completely trust the whole program! But, somehow, it can still do the I/O for the debug log Christine wants. The reason is that when we call it, we explicitly give it the ability to do I/O in a limited way: we write code ourselves – code we trust to access libraries like (rnrs io ports (6)) or (srfi :106) – and pass in a procedure from outside that exposes the capabilities of those libraries in a specific, limited way. Even if I did want to log what square-root is doing, using Christine’s debug-log procedure would only let it write to one specific file, and in a very specific way. It couldn’t use that ability to start reading my browser cookies like it could if I gave it the full (rnrs io ports (6)) library. Even better, if you do want to get the log, you can change where it goes just by passing a different port to make-debug-log-procedure!

Make-debug-log-procedure uses a technique called a closure so that its own argument, the log-port, is available to the debug-log procedure even though it isn’t an argument to it. (We say that the debug-log procedure is closed over the log-port variable – or, more informally, that it’s closed over the port itself.) Once make-debug-log-procedure returns, the debug-log procedure it creates has access to that port and exposes it in a limited way, which means the (dustyweb square-root) library doesn’t need to import (rnrs io ports (6)) any more. But Christine can still use open-file-output-port when she writes a debugging procedure like make-debug-log-procedure, because she obviously trusts herself to use it responsibly. In fact, anyone who actually uses her library can trust themselves to access their own files.

It would still be nice to be able to use square-root as if it were a pure function, not having to pass in this ugly debug-log argument. That gives me an idea: could we close the whole library over a procedure like debug-log? That would mean Christine could also extend her library to include all sorts of useful functions, like cube roots or inverse square roots – all of them safely debug-logging away – and I would only need to pass in the debug-log argument once when I import her library.
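Here’s what that idea might look like if we approximate it with plain procedures – the whole ‘library’ becomes a constructor, closed over debug-log, which returns its exports. (A sketch only: instantiate-square-root-library is my name for it, and this is not Oaken’s actual mechanism.)

(define (instantiate-square-root-library debug-log)
  ;; Everything below is closed over debug-log, so I supply it once,
  ;; at instantiation time, rather than on every call.
  (define (close-enough? guess x)
    (< (abs (- (* guess guess) x)) 0.01))
  (define (improve-guess guess x)
    (/ (+ guess (/ x guess))
       2))
  (define (square-root x)
    (debug-log "finding square root of" x)
    (let loop ((guess 1.0))
      (debug-log "guess" guess)
      (cond ((close-enough? guess x)
             (debug-log "good enough, ship it!")
             guess)
            (else
             (loop (improve-guess guess x))))))
  square-root)

;; I instantiate it once with my do-nothing logger:
(define square-root (instantiate-square-root-library ignore))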

Now we’ve actually arrived at a much better idea than trying to audit all the third-party code we might possibly want to use. Instead of doing that, we just make sure it only uses safe procedures or ones we’ve given it!

Unfortunately, neither R6RS nor R7RS Scheme provides the ability to make an entire library into a closure. But maybe we can design a system that does have this ability – that’s Oaken!

Taming libraries to stop them eating resources

In programming security research, this kind of trick of using language features to turn an unsafe library into a safer one is called ‘taming’. Like humans tamed wolves into dogs to stop them eating their sheep, programmers can tame libraries to stop them providing access to the procedures that could create a security vulnerability.

I can go down to the pet shop, buy a dog, and train it to recognize me as its owner. As its owner, I’m in charge and I can do all the things the dog isn’t allowed to do in my household – sleep on the human bed, eat food from the dinner table, dig holes in the garden, etc. In security terminology, this kind of general permission is called ‘ambient authority’. As a treat, I could let the dog do some of those things sometimes, but only when I allow it. The dog doesn’t get the ambient authority to do these things all of the time: it’s only allowed to do them in the way I let it and when I say so.

We are in charge of our own programs and our own computers. We get to do everything, but the code we pull in from some library should only be able to do things we allow it to do. We’ve seen how we could do this by closing those libraries over procedures giving more limited access to a resource, but we should really have a standard library for systematically controlling access to the resources which could cause security problems if evil code uses them in evil ways.

We’re not going to support every use case in the initial version of Oaken: full taming is a really hard problem. (Like any red-blooded programmer, I initially thought it was going to be easy, and then all the individual little difficulties slowly started to dawn on me. In fact, I still keep realizing new problems almost every day.) But we’ll provide some very basic tamings that should be good enough to start off with: the idea of Oaken is that the library system supports taming as a general concept, so more advanced controls can follow later.

Filesystem and network abuse

It should be obvious that giving untrusted code unrestricted read/write access to the whole computer filesystem is a bad idea. In combination with unrestricted ability to connect to other computers over the internet, it’s an even worse idea.

We already saw a simple idea for taming filesystem access in the square root example, but real-world libraries might legitimately want to create and read their own files to provide some useful functionality to the trusted code which uses those libraries.

We already have a real-world example of how this could work. When you upload a file to a website in your browser, the website can’t directly read the files on your computer. Instead, your browser, which does have that unrestricted filesystem access, pops up a file selection dialogue box and asks you to choose a file. The browser then makes that file available to the website by giving it a File object which it can use to read that file – and only that file. This pattern of passing limited access to some restricted resource to untrusted code is called the ‘powerbox pattern’.
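In Scheme terms, the powerbox pattern might look something like this sketch: the trusted side holds the real filesystem authority, and the untrusted consumer only ever sees a single port. (choose-file-interactively is a hypothetical trusted dialogue procedure, not a real API.)

(define (with-user-chosen-file untrusted-consumer)
  ;; The trusted side uses its full filesystem authority to open
  ;; one file chosen by the user...
  (let* ((path (choose-file-interactively))
         (port (open-file-input-port path)))
    ;; ...and the untrusted side receives only that one port.
    (untrusted-consumer port)))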

Just as you give a website access to one file, with Oaken you can close a library over a version of the file-system access library which, for example, only gives access to one directory. That means the code could never go snooping outside that directory and steal or destroy information that doesn’t belong to it. Simple!
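Here’s a sketch of what such a tamed opener could look like using only (rnrs base (6)) and (rnrs io ports (6)). The name make-directory-opener is mine, not Oaken’s, and a real version would have to worry about details like ‘..’ and symbolic links:

(define (string-contains-char? s ch)
  ;; #t if the character ch occurs anywhere in the string s.
  (let loop ((i 0))
    (cond ((= i (string-length s)) #f)
          ((char=? (string-ref s i) ch) #t)
          (else (loop (+ i 1))))))

(define (make-directory-opener dir)
  ;; Returns an opener confined to dir: it refuses path separators,
  ;; so callers can only name files directly inside dir.
  (lambda (filename)
    (if (string-contains-char? filename #\/)
        (error 'make-directory-opener
               "path separators not allowed" filename)
        (open-file-output-port (string-append dir "/" filename)
                               (file-options no-fail)
                               (buffer-mode line)
                               (native-transcoder)))))

Closing a library over (make-directory-opener "sandbox") instead of the whole of (rnrs io ports (6)) gives it exactly one directory to play in.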

Unfortunately, file access is really tricky to fully tame. We can easily make it so that a tamed library can only access one directory on the disk, but it’s not so easy to stop that library from writing files in that directory which would eventually eat up all the space on the drive. Ideally, you’d be able to limit how much disk space untrusted code can use, not just which files and directories it can access. But that would require looking at what special support the operating system can give us and how to make use of the tools it has to restrict storage abuse, so full taming of filesystem access – beyond restricting code’s access to a single file or directory – is probably out of scope for the initial version of Oaken.

But what about network access? Should we instantiate a library separately, passing in a fresh connection capability each time, for each server and port it wants to access? That sounds a bit annoying. Maybe it would be better to use a different mechanism for powerboxing than the one the library system provides. I’ll be thinking a lot about this and trying to come up with a workable design!

Timing attacks

Another surprising source of security issues in apps comes just from giving programs access to a simple system clock. Timing attacks can let untrusted code extract information from trusted procedures whose inner workings should be hidden from it; or they can let two untrusted bits of code communicate with one another when they shouldn’t. Mark S. Miller, an influential researcher in the area of secure programming systems, talks about this as ‘prisoners tapping on the pipes’ – sending messages to one another from their cells in Morse code or whatever. You really do have to be careful to ensure that all potential back-channels for unwanted communication are closed off.

Preventing this particular kind of attack seems quite simple – there’s no real taming to be done. It’s just a matter of making sure a library only has access to the system clock if you explicitly give it access when you instantiate it, which is an all-or-nothing proposition – unlike carefully controlling which individual files or directories a procedure could access and how. But something I’ve already learned in planning this project is that if something looks simple, I probably haven’t thought about it enough yet, so maybe there are more problems lurking beneath the surface here.

Exhaustion attacks

It’s not only external resources like files, the network, or the system clock which can pose a security risk. The computer’s CPU and memory are all you need for another kind of attack: a denial of service. If you tie up all the computer’s memory and processor time with nonsense, you can make a service inaccessible to people who actually need to use it. Typically, attackers find some request they can make to a service that takes massively, disproportionately more resources for the service to respond to than it does for them to make the request.

What tools do you need for this kind of attack? Could we restrict those through the library system too? Unfortunately, that approach won’t work here. In fact, all you need to cause a denial-of-service attack is lambda, as in this example that’s often shown while explaining how the fixed-point combinator is derived:

((lambda (x) (x x)) (lambda (x) (x x)))

By conjuring up a procedure which does nothing but apply itself to itself forever, this cute little Scheme expression will peg one of your CPUs until you find a way to stop it. This isn’t good! And this kind of vulnerability to resource-exhaustion attacks can happen entirely by accident: just the other day I was hacking on a procedure and accidentally made it loop infinitely on some input.

It’s mathematically impossible to always detect this kind of bug in advance. Moreover, even if a program eventually stops looping (and there is a useful subset of programs whose halting behaviour is provable), it might still take too much CPU to return its result. (Even just a badly behaved regular expression implementation can make resource exhaustion attacks possible!) So the best we can do is put a limit on the amount of computation an untrusted procedure is allowed to do.

Guile already includes a little library called (ice-9 sandbox) which lets you call code with limits on how much time it can take and how much memory it can allocate. Unfortunately, (ice-9 sandbox) limits wall-clock time, not actual processor resources, which is not ideal. For example, wall-clock time includes time spent waiting for user input! You don’t want your procedure getting killed for DoSing the system when in fact it’s sitting there doing nothing while a hunt-and-peck typist is writing up their life story into a text box. Worse, there’s no way to powerbox time usage with (ice-9 sandbox).
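For reference, using the limiters in (ice-9 sandbox) looks roughly like this – I’m recalling the signature from the Guile manual, so treat the details as a sketch:

(use-modules (ice-9 sandbox))

;; Run the thunk, cancelling it if it takes more than a tenth of a
;; second of wall-clock time or allocates more than 10 MiB.
(call-with-time-and-allocation-limits
 0.1                ; seconds of wall-clock time – the problem above
 (* 10 1024 1024)   ; bytes of allocation
 (lambda ()
   (let loop ((i 0))
     (loop (+ i 1)))))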

But there might be a solution: engines! Engines are a Scheme idea introduced by Kent Dybvig and Bob Hieb. The basic concept is that the system runtime keeps track of how many units of computation (‘ticks’) have been executed, and when that number exceeds a certain limit (the amount of ‘fuel’ in the engine), it pauses execution and returns the unfinished task back to whoever started it. They can then decide whether to stop running the computation, or give it more fuel and carry on.

The ticks can be measured in any unit – with the right support from the OS, you could even make it actual CPU time – but Dybvig and Hieb suggest counting procedure calls. That works well in Scheme because, once you expand all your macros, everything you could do that might cause CPU exhaustion – in particular, looping – involves a procedure call.
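Chez Scheme ships engines with this interface; invoking one looks roughly like the following sketch, based on Dybvig and Hieb’s description:

(define eng
  (make-engine
   (lambda ()
     ;; the untrusted computation: this one never finishes
     (let loop ((i 0))
       (loop (+ i 1))))))

(eng 10000                          ; fuel: the number of ticks allowed
     (lambda (fuel-left . results)  ; completion: it finished in time
       results)
     (lambda (new-eng)              ; expiration: it ran out of fuel
       ;; new-eng is the paused computation; we could give it more
       ;; fuel by invoking it again, or simply drop it
       'out-of-fuel))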

The Guile manual also describes a few problems with the memory limiter in (ice-9 sandbox). For now, we’ll probably have to live with those and a few more, like the fact that it’s also not possible to powerbox memory usage. Guile is getting a new garbage collector soon, thanks to Andy Wingo’s sterling work on Whippet; as the migration from BDW to Whippet progresses, maybe it will become possible to fix some of these issues in future. But here also, many operating systems (cough Linux) make things difficult: we can’t really do anything about the kernel deciding that the Guile process has had quite enough memory for tonight and that it’s time to go home.

Capability attenuation

So far we’ve only considered one level of delegation of trust: you have your dog/program, and the dog/program is only allowed to sleep in the dog bed you gave it/use only the files you gave it. But in reality, software dependencies form a much more complex graph, and it should be possible for the things you trust in some limited way to place their trust in other things in even more limited ways.

Let’s say I have my program, and within it I gave your library access to a key-value store where it gets to store up to 100 MB of data in its own namespace. Your library might want to import some other library, giving it some smaller slice of that data in turn, like 10 MB, and its own subnamespace. So your library needs to be able to take the version of the library you gave it, which is closed over the limit of 100 MB and one namespace, and close it over a different limit and a different, subordinate namespace.
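With plain closures, the concept is at least easy to sketch – the store is just a procedure, and the attenuated store is another procedure with the same interface. (All the names here, and the 'get / 'put! message protocol, are hypothetical illustrations, not a real Spritely API.)

(define (attenuate-store store prefix quota)
  ;; Wrap an existing store capability in a stricter one: keys are
  ;; confined to a sub-namespace and writes are limited to quota
  ;; bytes in total.
  (let ((used 0))
    (lambda (op key . args)
      (let ((sub-key (string-append prefix "/" key)))
        (case op
          ((get) (store 'get sub-key))
          ((put!)
           (let ((size (string-length (car args))))
             (if (> (+ used size) quota)
                 (error 'attenuate-store "quota exceeded" key)
                 (begin
                   (set! used (+ used size))
                   (store 'put! sub-key (car args))))))
          (else (error 'attenuate-store "unknown operation" op)))))))

;; Because the result has the same interface as the original, your
;; library can attenuate its own 100 MB capability again for one of
;; its dependencies:
;; (define sub-store (attenuate-store my-store "dep" (* 10 1024 1024)))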

At the moment, I’m still in the early stages of designing the library system, and supporting this use case seems kind of hard! It’s easy to think about a one-level system where the main, trusted program gets to instantiate every (potentially untrusted) library with some fixed limitations. It’s a lot harder to design a system where it’s easy to start with an existing instantiated, untrusted library and impose even stricter limitations on it to create another library with the same interface but even tighter controls.

Onward

I’m looking forward to hacking on Oaken over the next couple of months! Oaken is one of the more intriguing parts of the Spritely Institute’s vision for decentralized internet infrastructure. Imagine spreadsheet macros that won’t make off with your sensitive financial data; imagine not just safe mods to existing games, but even something like the ability to send a whole new multiplayer game to your friends in a chat program, who could then just run it right in place on their computer with no security fears; imagine the Shepherd where each service ran in its own capability sandbox! Oaken will enable all of this by working alongside other Spritely projects like Hoot and Goblins.

Let a thousand Oaken forests bloom!

Further reading

The canonical paper on using Scheme variable scope for security is Jonathan Rees’s Ph.D. dissertation, ‘A Security Based on the Lambda Calculus’. That dissertation is a direct inspiration for Oaken, and Jonathan is advising me on the project.

As far as I know, the idea of closing modules over things, in the way procedures can be closed over things, comes from the Standard ML programming language. David MacQueen’s paper ‘Modules for Standard ML’ explains how it was designed. We’re not the first people to try adapting this idea for Scheme: the Scheme 48 module system, also designed by Jonathan Rees, worked this way. But Oaken will be the first such library system in Scheme specifically designed for security purposes, released for general use, and with a standard library for tamed access to important resources.

The E programming language pioneered many aspects of capability-based security within a programming language, and the Emaker library system is also an inspiration for Oaken. The same idea was proposed for JavaScript as FrozenRealms which evolved into Endo.

by Daphne Preston-Kendal (contact@spritely.institute) at Monday, May 5, 2025

Friday, April 25, 2025

Andy Wingo

partitioning ambiguous edges in guile

Today, some more words on memory management, on the practicalities of a system with conservatively-traced references.

The context is that I have finally started banging Whippet into Guile, initially in a configuration that continues to use the conservative Boehm-Demers-Weiser (BDW) collector behind the scenes. That way I can incrementally migrate all of the uses of the BDW API in Guile over to the Whippet API instead, and then, if all goes well, I should be able to switch Whippet to use another GC algorithm, probably the mostly-marking collector (MMC). MMC scales better than BDW for multithreaded mutators, and it can eliminate fragmentation via Immix-inspired optimistic evacuation.

problem statement: how to manage ambiguous edges

A garbage-collected heap consists of memory, which is a set of addressable locations. An object is a disjoint part of a heap, and is the unit of allocation. A field is memory within an object that may refer to another object by address. Objects are nodes in a directed graph in which each edge is a field containing an object reference. A root is an edge into the heap from outside. Garbage collection reclaims memory from objects that are not reachable from the graph that starts from a set of roots. Reclaimed memory is available for new allocations.

In the course of its work, a collector may want to relocate an object, moving it to a different part of the heap. The collector can do so if it can update all edges that refer to the object to instead refer to its new location. Usually a collector arranges things so all edges have the same representation, for example an aligned word in memory; updating an edge means replacing the word’s value with the new address. Relocating objects can improve locality and reduce fragmentation, so it is a good technique to have available. (Sometimes we say evacuate, move, or compact instead of relocate; it’s all the same.)

Some collectors allow ambiguous edges: words in memory whose value may be the address of an object, or might just be scalar data. Ambiguous edges usually come about if a compiler doesn’t precisely record which stack locations or registers contain GC-managed objects. Such ambiguous edges must be traced conservatively: the collector adds the object to its idea of the set of live objects, as if the edge were a real reference. This tracing mode isn’t supported by all collectors.

Any object that might be the target of an ambiguous edge cannot be relocated by the collector; a collector that allows conservative edges cannot rely on relocation as part of its reclamation strategy. Still, if the collector can know that a given object will not be the referent of an ambiguous edge, relocating it is possible.

How can one know that an object is not the target of an ambiguous edge? We have to partition the heap somehow into possibly-conservatively-referenced and definitely-not-conservatively-referenced. The two ways that I know to do this are spatially and temporally.

Spatial partitioning means that regardless of the set of root and intra-heap edges, there are some objects that will never be conservatively referenced. This might be the case for a type of object that is “internal” to a language implementation; third-party users that may lack the discipline to precisely track roots might not be exposed to objects of a given kind. Still, link-time optimization tends to weather these boundaries, so I don’t see it as being too reliable over time.

Temporal partitioning is more robust: if all ambiguous references come from roots, then if one traces roots before intra-heap edges, then any object not referenced after the roots-tracing phase is available for relocation.

kinds of ambiguous edges in guile

So let’s talk about Guile! Guile uses BDW currently, which considers edges to be ambiguous by default. However, given that objects carry type tags, Guile can, with relatively little effort, switch to precisely tracing most edges. “Most”, however, is not sufficient; to allow for relocation, we need to eliminate intra-heap ambiguous edges, to confine conservative tracing to the roots-tracing phase.

Conservatively tracing references from C stacks or even from static data sections is not a problem: these are roots, so, fine.

Guile currently traces Scheme stacks almost-precisely: its compiler emits a stack map for every call site, using liveness analysis to mark only those slots that hold Scheme values which will be used in the continuation. However, it’s possible for any given frame to be marked conservatively. The most common case is when using the BDW collector and a thread is pre-empted by a signal; then its most recent stack frame is likely not at a safepoint and indeed is likely undefined in terms of Guile’s VM. It can also happen if there is a call site within a VM operation – for example to a builtin procedure – if it throws an exception and recurses, or causes GC itself. Also, when per-instruction traps are enabled, we can run Scheme between any two Guile VM operations.

So, Guile could change to trace Scheme stacks fully precisely, but this is a lot of work; in the short term we will probably just trace Scheme stacks as roots instead of during the main trace.

However, there is one more significant source of ambiguous roots, and that is reified continuation objects. Unlike active stacks, these have to be discovered during a trace and cannot be partitioned out to the root phase. For delimited continuations, these consist of a slice of the Scheme stack. Traversing a stack slice precisely is less problematic than for active stacks, because it isn’t in motion, and it is captured at a known point; but we will have to deal with stack frames that are pre-empted in unexpected locations due to exceptions within builtins. If a stack map is missing, probably the solution there is to reconstruct one using local flow analysis over the bytecode of the stack frame’s function; time-consuming, but it should be robust as we do it elsewhere.

Undelimited continuations (those captured by call/cc) contain a slice of the C stack also, for historical reasons, and there we can’t trace it precisely at all. Therefore either we disable relocation if there are any live undelimited continuation objects, or we eagerly pin any object referred to by a freshly captured stack slice.

fin

If you want to follow along with the Whippet-in-Guile work, see the wip-whippet branch in Git. I’ve bumped its version to 4.0 because, well, why the hell not; if it works, it will certainly be worth it. Until next time, happy hacking!

by Andy Wingo at Friday, April 25, 2025