Planet Scheme

Thursday, March 23, 2023

Scheme Requests for Implementation

SRFI 241: Match — Simple Pattern-Matching Syntax to Express Catamorphisms on Scheme Data

SRFI 241 is now in final status.

This SRFI describes a simple pattern matcher based on one originally devised by Kent Dybvig, Dan Friedman, and Eric Hilsdale, which has a catamorphism feature to perform recursion automatically.

by Marc Nieper-Wißkirchen at Thursday, March 23, 2023

Monday, March 20, 2023

Andy Wingo

a world to win: webassembly for the rest of us

Good day, comrades!

Today I'd like to share the good news that WebAssembly is finally coming for the rest of us weirdos.

A world to win

WebAssembly for the rest of us

17 Mar 2023 – BOB 2023

Andy Wingo

Igalia, S.L.

This is a transcript-alike of a talk that I gave last week at BOB 2023, a gathering in Berlin of people that are using "technologies beyond the mainstream" to get things done: Haskell, Clojure, Elixir, and so on. PDF slides here, and I'll link the video too when it becomes available.

WebAssembly, the story

WebAssembly is an exciting new universal compute platform

WebAssembly: what even is it? Not a programming language that you would write software in, but rather a compilation target: a sort of assembly language, if you will.

WebAssembly, the pitch

Predictable portable performance

  • Low-level
  • Within 10% of native

Reliable composition via isolation

  • Modules share nothing by default
  • No nasal demons
  • Memory sandboxing

Compile your code to WebAssembly for easier distribution and composition

If you look at what the characteristics of WebAssembly are as an abstract machine, to me there are two main areas in which it is an advance over the alternatives.

Firstly it's "close to the metal" -- if you compile for example an image-processing library to WebAssembly and run it, you'll get similar performance when compared to compiling it to x86-64 or ARMv8 or what have you. (For image processing in particular, native still generally wins because the SIMD primitives in WebAssembly are narrower and because getting the image into and out of WebAssembly may imply a copy, but the general point remains.) WebAssembly's instruction set covers a broad range of low-level operations that allows compilers to produce efficient code.

The novelty here is that WebAssembly is portable while also being successful. We language weirdos know that it's not enough to do something technically better: you have to also succeed in getting traction for your alternative.

The second interesting characteristic is that WebAssembly is (generally speaking) a principle-of-least-authority architecture: a WebAssembly module starts with access to nothing but itself. Any capabilities that an instance of a module has must be explicitly shared with it by the host at instantiation-time. This is unlike DLLs which have access to all of main memory, or JavaScript libraries which can mutate global objects. This characteristic allows WebAssembly modules to be reliably composed into larger systems.

WebAssembly, the hype

It’s in all browsers! Serve your code to anyone in the world!

It’s on the edge! Run code from your web site close to your users!

Compose a library (eg: Expat) into your program (eg: Firefox), without risk!

It’s the new lightweight virtualization: Wasm is what containers were to VMs! Give me that Kubernetes cash!!!

Again, the remarkable thing about WebAssembly is that it is succeeding! It's on all of your phones, all your desktop web browsers, all of the content distribution networks, and in some cases it seems set to replace containers in the cloud. Launch the rocket emojis!

WebAssembly, the reality

WebAssembly is a weird backend for a C compiler

Only some source languages are having success on WebAssembly

What about Haskell, OCaml, Scheme, F#, and so on – what about us?

Are we just lazy? (Well...)

So why aren't we there? Where is Clojure-on-WebAssembly? Where are the F#, the Elixir, the Haskell compilers? Some early efforts exist, but they aren't really succeeding. Why is that? Are we just not putting in the effort? Why is it that Rust gets to ride on the rocket ship but Scheme does not?

WebAssembly, the reality (2)

WebAssembly (1.0, 2.0) is not well-suited to garbage-collected languages

Let’s look into why

As it turns out, there is a reason that there is no good Scheme implementation on WebAssembly: the initial version of WebAssembly is a terrible target if your language relies on the presence of a garbage collector. There have been some advances but this observation still applies to the current standardized and deployed versions of WebAssembly. To better understand this issue, let's dig into the guts of the system to see what the limitations are.

GC and WebAssembly 1.0

Where do garbage-collected values live?

For WebAssembly 1.0, only possible answer: linear memory

(module
  (global $hp (mut i32) (i32.const 0))
  (memory $mem 10)) ;; 640 kB

The primitive that WebAssembly 1.0 gives you to represent your data is what is called linear memory: just a buffer of bytes that you can read from and write to. It's pretty much like what you get when compiling natively, except that the memory layout is simpler. You can obtain this memory in units of 64-kilobyte pages. In the example above we're going to request 10 pages, for 640 kB. Should be enough, right? We'll just use it all for the garbage collector, with a bump-pointer allocator. The heap pointer / allocation pointer is kept in the mutable global variable $hp.

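(func $alloc (param $size i32) (result i32)
  (local $ret i32)
  (loop $retry
    ;; Bump the allocation pointer, remembering the old value.
    (local.set $ret (global.get $hp))
    (global.set $hp
      (i32.add (local.get $size) (local.get $ret)))

    ;; If the new $hp is still inside memory, return $ret.
    ;; (br_if takes the value first and the condition last.)
    (br_if 1
      (local.get $ret)
      (i32.lt_u (i32.shr_u (global.get $hp) 16)
                (memory.size)))

    ;; Otherwise collect garbage and try again.
    (call $gc)
    (br $retry))
  ;; Not reached: the loop only exits via br_if.
  (unreachable))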

Here's what an allocation function might look like. The allocation function $alloc is like malloc: it takes a number of bytes and returns a pointer. In WebAssembly, a pointer to memory is just an offset, which is a 32-bit integer (i32). (Having the option of a 64-bit address space is planned but not yet standard.)
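For readers who would rather poke at the idea than at WebAssembly text, here is a minimal Python model of the same bump-pointer scheme. It is only a sketch under stated assumptions: the LinearHeap class, its gc callback, and the reset_everything stand-in collector are hypothetical names that are not in the talk, and unlike $alloc it checks the limit before bumping and retries at most once.

```python
PAGE_SIZE = 64 * 1024  # WebAssembly page size in bytes


class LinearHeap:
    """Toy model of a bump-pointer allocator over a fixed linear memory."""

    def __init__(self, pages=10, gc=None):
        self.memory = bytearray(pages * PAGE_SIZE)
        self.hp = 0      # plays the role of the $hp global
        self.gc = gc     # hypothetical collector callback, may be None

    def alloc(self, size):
        """Return the offset of a fresh block of `size` bytes."""
        for attempt in range(2):
            ret = self.hp
            if ret + size <= len(self.memory):
                self.hp = ret + size
                return ret  # a "pointer" is just an offset into memory
            if attempt == 0 and self.gc is not None:
                self.gc(self)  # out of room: collect, then retry once
        raise MemoryError("linear memory exhausted")


def reset_everything(heap):
    """Stand-in collector that pretends no object is live."""
    heap.hp = 0
```

With a one-page heap, two 16-byte allocations land at offsets 0 and 16; a further request for a whole page only succeeds because the stand-in collector resets the pointer first.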

If this is your first time seeing the text representation of a WebAssembly function, you're in for a treat, but that's not the point of the presentation :) What I'd like to focus on is the (call $gc) -- what happens when the allocation pointer reaches the end of the region?

GC and WebAssembly 1.0 (2)

What hides behind (call $gc) ?

Ship a GC over linear memory

Stop-the-world, not parallel, not concurrent

But... roots.

The first thing to note is that you have to provide the $gc yourself. Of course, this is doable -- this is what we do when compiling to a native target.

Unfortunately though the multithreading support in WebAssembly is somewhat underpowered; it lets you share memory and use atomic operations but you have to create the threads outside WebAssembly. In practice probably the GC that you ship will not take advantage of threads and so it will be rather primitive, deferring all collection work to a stop-the-world phase.

GC and WebAssembly 1.0 (3)

Live objects are

  • the roots
  • any object referenced by a live object

Roots are globals and locals in active stack frames

No way to visit active stack frames

What's worse though is that you have no access to roots on the stack. A GC has to keep live objects, defined circularly as any object referenced by a root, or any object referenced by a live object. It starts with the roots: global variables and any GC-managed object referenced by an active stack frame.

But there we run into problems, because in WebAssembly (any version, not just 1.0) you can't iterate over the stack, so you can't find active stack frames, so you can't find the stack roots. (Sometimes people want to support this as a low-level capability but generally speaking the consensus would appear to be that overall performance will be better if the engine is the one that is responsible for implementing the GC; but that is foreshadowing!)
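The circular definition of liveness is just graph reachability, which a few lines of Python can illustrate. This is a sketch, assuming a hypothetical heap described as a dict from object names to the names they reference; the hard part on WebAssembly is not this loop but producing the roots argument in the first place.

```python
def mark(roots, references):
    """Return the set of live objects.

    roots: objects referenced by globals and active stack frames.
    references: dict mapping each object to the objects it points to.
    """
    live = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in live:
            continue  # already visited; this also handles cycles
        live.add(obj)
        worklist.extend(references.get(obj, ()))
    return live
```

For example, with references {"a": ["b"], "b": ["a"], "c": ["a"]} and the single root "a", the live set is {"a", "b"}: "c" points at a live object but nothing live points at "c".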

GC and WebAssembly 1.0 (3)


  • handle stack for precise roots
  • spill all possibly-pointer values to linear memory and collect conservatively

Handle book-keeping a drag for compiled code

Given the noniterability of the stack, there are basically two workarounds. One is to have the compiler and run-time maintain an explicit stack of object roots, which the garbage collector can know for sure are pointers. This is nice because it lets you move objects. But, maintaining the stack is overhead; the state-of-the-art solution is rather to create a side table (a "stack map") associating each potential point at which GC can be called with instructions on how to find the roots.
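As a rough picture of the first workaround, here is a Python sketch of an explicit handle stack. The HandleStack class and its method names are hypothetical, and integers stand in for offsets into linear memory; the point is only that every slot is known to be a pointer, so a moving collector can rewrite the roots in place.

```python
class HandleStack:
    """Explicit stack of GC roots, maintained by compiled code."""

    def __init__(self):
        self.slots = []

    def push_frame(self, *pointers):
        """Record a frame's live pointers; return a token for pop_frame."""
        base = len(self.slots)
        self.slots.extend(pointers)
        return base

    def pop_frame(self, base):
        """Drop everything pushed since the matching push_frame."""
        del self.slots[base:]

    def roots(self):
        """Every slot is known to be a pointer, so these roots are precise."""
        return list(self.slots)

    def relocate(self, forwarding):
        """Let a moving collector rewrite roots via an old -> new map."""
        self.slots = [forwarding.get(p, p) for p in self.slots]
```

The book-keeping cost mentioned above shows up as the push_frame and pop_frame calls that compiled code would have to make around any call that might allocate.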

The other workaround is to spill the whole stack to memory. Or, possibly just pointer-like values; anyway, you conservatively scan all words for things that might be roots. But instead of having access to the memory to which the WebAssembly implementation would spill your stack, you have to do it yourself. This can be OK but it's sub-optimal; see my recent post on the Whippet garbage collector for a deeper discussion of the implications of conservative root-finding.

GC and WebAssembly 1.0 (4)

Cycles with external objects (e.g. JavaScript) uncollectable

A pointer to a GC-managed object is an offset to linear memory, need capability over linear memory to read/write object from outside world

No way to give back memory to the OS

Gut check: gut says no

If that were all, it would already be not so great, but it gets worse! Another problem with linear-memory GC is that it limits the potential for composing a number of modules and the host together, because the garbage collector that manages JavaScript objects in a web browser knows nothing about your garbage collector over your linear memory. You can easily create memory leaks in a system like that.

Also, it's pretty gross that a reference to an object in linear memory requires arbitrary read-write access over all of linear memory in order to read or write object fields. How do you build a reliable system without invariants?

Finally, once you collect garbage, and maybe you manage to compact memory, you can't give anything back to the OS. There are proposals in the works but they are not there yet.

If the BOB audience had to choose between Worse is Better and The Right Thing, I think the BOB audience is much closer to the Right Thing. People like that feel instinctual revulsion to ugly systems and I think GC over linear memory describes an ugly system.

GC and WebAssembly 1.0 (5)

There is already a high-performance concurrent parallel compacting GC in the browser

Halftime: C++ N – Altlangs 0

The kicker is that WebAssembly 1.0 requires you to write and deliver a terrible GC when there is already probably a great GC just sitting there in the host, one that has hundreds of person-years of effort invested in it, one that will surely do a better job than you could ever do. WebAssembly as hosted in a web browser should have access to the browser's garbage collector!

I have the feeling that while those of us with a soft spot for languages with garbage collection have been standing on the sidelines, Rust and C++ people have been busy on the playing field scoring goals. Tripping over the ball, yes, but eventually they do manage to make it within striking distance.

Change is coming!

Support for built-in GC set to ship in Q4 2023

With GC, the material conditions are now in place

Let’s compile our languages to WebAssembly

But to continue the sportsball metaphor, I think in the second half our players will finally be able to get out on the pitch and give it the proverbial 110%. Support for garbage collection is coming to WebAssembly users, and I think even by the end of the year it will be shipping in major browsers. This is going to be big! We have a chance and we need to seize it.

Scheme to Wasm

Spritely + Igalia working on Scheme to WebAssembly

Avoid truncating language to platform; bring whole self

  • Value representation
  • Varargs
  • Tail calls
  • Delimited continuations
  • Numeric tower

Even with GC, though, WebAssembly is still a weird machine. It would help to see the concrete approaches that some languages of interest manage to take when compiling to WebAssembly.

In that spirit, the rest of this article/presentation is a walkthrough of the approach that I am taking as I work on a WebAssembly compiler for Scheme. (Thanks to Spritely for supporting this work!)

Before diving in, a meta-note: when you go to compile a language to, say, JavaScript, you are mightily tempted to cut corners. For example you might implement numbers as JavaScript numbers, or you might omit implementing continuations. In this work I am trying to not cut corners, and instead to implement the language faithfully. Sometimes this means I have to work around weirdness in WebAssembly, and that's OK.

When thinking about Scheme, I'd like to highlight a few specific areas that have interesting translations. We'll start with value representation, which stays in the GC theme from the introduction.

Scheme to Wasm: Values

;;       any  extern  func
;;        |
;;        eq
;;     /  |   \
;; i31 struct  array

The unitype: (ref eq)

Immediate values in (ref i31)

  • fixnums with 30-bit range
  • chars, bools, etc

Explicit nullability: (ref null eq) vs (ref eq)

The GC extensions for WebAssembly are phrased in terms of a type system. Oddly, there are three top types; as far as I understand it, this is the result of a compromise about how WebAssembly engines might want to represent these different kinds of values. For example, an opaque JavaScript value flowing into a WebAssembly program would have type (ref extern). On a system with NaN boxing, you would need 64 bits to represent a JS value. On the other hand a native WebAssembly object would be a subtype of (ref any), and might be representable in 32 bits, either because it's a 32-bit system or because of pointer compression.

Anyway, three top types. The user can define subtypes of struct and array, instantiate values of those types, and access their fields. The life cycle of reference-typed objects is automatically managed by the run-time, which is just another way of saying they are garbage-collected.

For Scheme, we need a common supertype for all values: the unitype, in Bob Harper's memorable formulation. We can use (ref any), but actually we'll use (ref eq) -- this is the supertype of values that can be compared by (pointer) identity. So now we can code up eq?:

(func $eq? (param (ref eq) (ref eq))
           (result i32)
  ;; The parameters are unnamed, so refer to them by index.
  (ref.eq (local.get 0) (local.get 1)))

Generally speaking in a Scheme implementation there are immediates and heap objects. Immediates can be encoded in the bits of a value, whereas for a heap object the bits of a value encode a reference (pointer) to an object on the garbage-collected heap. We usually represent small integers as immediates, as well as booleans and other oddball values.

Happily, WebAssembly gives us an immediate value type, i31. We'll encode our immediates there, and otherwise represent heap objects as instances of struct subtypes.
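To see how fixnums with a 30-bit range and other immediates can share a 31-bit payload, here is a small Python sketch. The tag layout is an assumption made up for illustration (low bit 0 for fixnums, low bits 0b11 for characters); the talk does not spell out the actual encoding.

```python
FIXNUM_BITS = 30
FIXNUM_MIN = -(1 << (FIXNUM_BITS - 1))
FIXNUM_MAX = (1 << (FIXNUM_BITS - 1)) - 1
I31_MASK = (1 << 31) - 1


def encode_fixnum(n):
    """Pack a 30-bit signed integer into 31 bits, with low tag bit 0."""
    if not FIXNUM_MIN <= n <= FIXNUM_MAX:
        raise OverflowError("not a fixnum; needs a heap number")
    return (n << 1) & I31_MASK


def is_fixnum(bits):
    return bits & 1 == 0


def decode_fixnum(bits):
    """Sign-extend the 31-bit payload, then shift out the tag bit."""
    if bits & (1 << 30):
        bits -= 1 << 31
    return bits >> 1


def encode_char(c):
    """Characters get low bits 0b11 in this made-up layout."""
    return (ord(c) << 2) | 0b11


def decode_char(bits):
    return chr(bits >> 2)
```

One tag bit out of 31 is what leaves 30 bits for the fixnum itself; anything outside that range has to become a heap number.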

Scheme to Wasm: Values (2)

Heap objects subtypes of struct; concretely:

(type $heap-object
  (struct (field $tag-and-hash i32)))
(type $pair
  (sub $heap-object
    (struct i32 (ref eq) (ref eq))))

GC proposal allows subtyping on structs, functions, arrays

Structural type equivalence: explicit tag useful

We actually need to have a common struct supertype as well, for two reasons. One is that we need to be able to hash Scheme values by identity, but for this we need an embedded lazily-initialized hash code. It's a bit annoying to take the per-object memory hit but it's a reality, and the JVM does it this way, so it must not be so terrible.

The other reason is more subtle: WebAssembly's type system is built in such a way that types that are "structurally" equivalent are indistinguishable. So a pair has two fields, besides the hash, but there might be a number of other fundamental object types that have the same shape; you can't fully rely on WebAssembly's dynamic type checks (ref.test et al) to be able to query the type of a value. Instead we re-use the low bits of the hash word to include a type tag, which might be 1 for pairs, 2 for vectors, 3 for closures, and so on.
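A sketch of how one 32-bit word can carry both the type tag and a lazily-initialized hash code, in Python. The 8-bit tag width matches the 0xff mask used later in $car, but the HeapObject class, the helper names, and the counter used as a hash source are assumptions for illustration.

```python
import itertools

TAG_BITS = 8
TAG_MASK = (1 << TAG_BITS) - 1          # 0xff
HASH_MASK = (1 << (32 - TAG_BITS)) - 1  # remaining 24 bits
PAIR_TAG, VECTOR_TAG, CLOSURE_TAG = 1, 2, 3

_hash_codes = itertools.count(1)        # stand-in for a real hash source


class HeapObject:
    def __init__(self, tag):
        self.tag_and_hash = tag         # hash bits start out as zero


def type_tag(obj):
    return obj.tag_and_hash & TAG_MASK


def identity_hash(obj):
    """Assign a hash code on first use, then keep returning it."""
    if obj.tag_and_hash >> TAG_BITS == 0:
        code = next(_hash_codes) & HASH_MASK
        obj.tag_and_hash |= code << TAG_BITS
    return obj.tag_and_hash >> TAG_BITS
```

Asking for the hash never disturbs the tag, since the two live in disjoint bit ranges of the same word.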

Scheme to Wasm: Values (3)

(func $cons (param (ref eq)
                   (ref eq))
            (result (ref $pair))
  (struct.new_canon $pair
    ;; Assume heap tag for pairs is 1.
    (i32.const 1)
    ;; Car and cdr.
    (local.get 0)
    (local.get 1)))

(func $%car (param (ref $pair))
            (result (ref eq))
  (struct.get $pair 1 (local.get 0)))

With this knowledge we can define cons, as a simple call to struct.new_canon $pair.

I didn't have time for this in the talk, but there is a ghost haunting this code: the ghost of nominal typing. See, in a web browser at least, every heap object will have its first word point to its "hidden class" / "structure" / "map" word. If the engine ever needs to check that a value is of a specific shape, it can do a quick check on the map word's value; if it needs to do deeper introspection, it can dereference that word to get more details.

Under the hood, testing whether a (ref eq) is a pair or not should be a simple check that it's a (ref struct) (and not a fixnum), and then a comparison of its map word to the run-time type corresponding to $pair. If subtyping of $pair is allowed, we start to want inline caches to handle polymorphism, but checking the map word is still the basic mechanism.

However, as I mentioned, we only have structural equality of types; two (struct (ref eq)) type definitions will define the same type and have the same map word (run-time type / RTT). Hence the _canon in the name of struct.new_canon $pair: we create an instance of $pair, with the canonical run-time-type for objects having $pair-shape.

In earlier drafts of the WebAssembly GC extensions, users could define their own RTTs, which effectively amounts to nominal typing: a check asks not only whether this object has the right structure, but also whether it was created with respect to this particular RTT. But, this facility was cut from the first release, and it left ghosts in the form of these _canon suffixes on type constructor instructions.

For the Scheme-to-WebAssembly effort, we effectively add back in a degree of nominal typing via type tags. For better or for worse this results in a so-called "open-world" system: you can instantiate a separately-compiled WebAssembly module that happens to define the same types and use the same type tags and it will be able to happily access the contents of Scheme values from another module. If you were to use nominal types, you wouldn't be able to do so, unless there were some common base module that defined and exported the types of interest, and which any extension module would need to import.

(func $car (param (ref eq)) (result (ref eq))
  (local (ref $pair))
  (block $not-pair
    ;; Check 1: the value is a struct with the $pair shape.
    (br_if $not-pair
      (i32.eqz (ref.test $pair (local.get 0))))
    (local.set 1 (ref.cast $pair (local.get 0)))
    ;; Check 2: the low bits of the header hold the pair tag (1).
    (br_if $not-pair
      (i32.ne
        (i32.const 1)
        (i32.and
          (i32.const 0xff)
          (struct.get $heap-object 0 (local.get 1)))))
    (return_call $%car (local.get 1)))

  (call $type-error)
  (unreachable))

In the previous example we had $%car, with a funny % in the name, taking a (ref $pair) as an argument. But in the general case (barring compiler heroics) car will take an instance of the unitype (ref eq). To know that it's actually a pair we have to make two checks: one, that it is a struct and has the $pair shape, and two, that it has the right tag. Oh well!
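The two checks can be mimicked in Python to show why the shape test alone is not enough. This is only a model: ThreeWordStruct is a hypothetical stand-in for "any struct shaped like (i32, ref eq, ref eq)", and TypeError plays the part of $type-error.

```python
from dataclasses import dataclass

PAIR_TAG = 1
TAG_MASK = 0xFF


@dataclass
class ThreeWordStruct:
    """Any heap object shaped like (i32, ref eq, ref eq)."""
    tag_and_hash: int
    field1: object
    field2: object


def cons(a, d):
    return ThreeWordStruct(PAIR_TAG, a, d)


def car(value):
    # Check 1: right shape? Structural typing cannot tell a pair apart
    # from another kind of object with the same three fields.
    if not isinstance(value, ThreeWordStruct):
        raise TypeError("car: not a pair")
    # Check 2: right tag? The low bits of the header say what it is.
    if value.tag_and_hash & TAG_MASK != PAIR_TAG:
        raise TypeError("car: not a pair")
    return value.field1
```

A fixnum fails the first check, while an object with the pair shape but some other tag (say 7) passes the first check and fails the second.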

Scheme to Wasm

  • Value representation
  • Varargs
  • Tail calls
  • Delimited continuations
  • Numeric tower

But with all of that I think we have a solid story on how to represent values. I went through all of the basic value types in Guile and checked that they could all be represented using GC types, and it seems that all is good. Now on to the next point: varargs.

Scheme to Wasm: Varargs

(list 'hey)      ;; => (hey)
(list 'hey 'bob) ;; => (hey bob)

Problem: Wasm functions strongly typed

(func $list (param ???) (result (ref eq))

Solution: Virtualize calling convention

In WebAssembly, you define functions with a type, and it is impossible to call them in an unsound way. You must call $car with exactly one argument or it will not compile, and that argument has to be of the right type, and so on. But Scheme doesn't enforce these restrictions at the language level, bless its little miscreant heart. You can call car with 5 arguments, and you'll get a run-time error. There are some functions that can take a variable number of arguments, doing different things depending on incoming argument count.

How do we square these two approaches to function types?

;; "Registers" for args 0 to 3
(global $arg0 (mut (ref eq)) (i31.new (i32.const 0)))
(global $arg1 (mut (ref eq)) (i31.new (i32.const 0)))
(global $arg2 (mut (ref eq)) (i31.new (i32.const 0)))
(global $arg3 (mut (ref eq)) (i31.new (i32.const 0)))

;; "Memory" for the rest
(type $argv (array (ref eq)))
(global $argN (ref $argv)
        (array.new_canon
          $argv (i31.new (i32.const 0)) (i32.const 42)))

Uniform function type: argument count as sole parameter

Callee moves args to locals, possibly clearing roots

The approach we are taking is to virtualize the calling convention. Just as, when calling an x86-64 function, you pass the first argument in $rdi, then $rsi, and eventually put arguments in memory once you run out of registers, we'll pass the first argument in the $arg0 global, then $arg1, and eventually in memory if needed. The function will receive the number of incoming arguments as its sole parameter; in fact, all functions will be of type (func (param i32)).

The expectation is that after checking argument count, the callee will load its arguments from globals / memory to locals, which the compiler can do a better job on than globals. We might not even emit code to null out the argument globals; might leak a little memory but probably would be a win.

You can imagine a world in which $arg0 actually gets globally allocated to $rdi, because it is only live during the call sequence; but I don't think that world is this one :)
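Here is a Python model of that virtualized calling convention, to show the caller and callee halves side by side. It is a sketch with made-up names (call, load_args, scheme_list, scheme_car); Python tuples stand in for pairs, and the 42-slot overflow array mirrors the $argN global above.

```python
NUM_ARG_REGISTERS = 4
arg_registers = [None] * NUM_ARG_REGISTERS  # models $arg0 .. $arg3
arg_overflow = [None] * 42                  # models $argN


def call(callee, *args):
    """Caller side: stash the arguments, then pass only the count."""
    for i, arg in enumerate(args):
        if i < NUM_ARG_REGISTERS:
            arg_registers[i] = arg
        else:
            arg_overflow[i - NUM_ARG_REGISTERS] = arg
    return callee(len(args))


def load_args(argc):
    """Callee side: move the incoming arguments into locals."""
    return [arg_registers[i] if i < NUM_ARG_REGISTERS
            else arg_overflow[i - NUM_ARG_REGISTERS]
            for i in range(argc)]


def scheme_list(argc):
    """Accepts any number of arguments, like Scheme's list."""
    return load_args(argc)


def scheme_car(argc):
    """Checks its argument count at run time, as Scheme requires."""
    if argc != 1:
        raise TypeError("car: wrong number of arguments")
    (pair,) = load_args(argc)
    return pair[0]
```

Every callee has the same signature, a single count, so a call with the wrong number of arguments becomes a run-time error raised by the callee rather than something the type system rejects.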

Scheme to Wasm

  • Value representation
  • Varargs
  • Tail calls
  • Delimited continuations
  • Numeric tower

Great, two points out of the way! Next up, tail calls.

Scheme to Wasm: Tail calls

;; Call known function
(return_call $f arg ...)

;; Call function by value
(return_call_ref $type callee arg ...)

Friends -- I almost cried making this slide. We Schemers are used to working around the lack of tail calls, and I could have done so here, but it's just such a relief that these functions are just going to be there and I don't have to think much more about them. Technically speaking the proposal isn't merged yet; checking the phases document it's at the last station before heading to the great depot in the sky. But, soon soon it will be present and enabled in all WebAssembly implementations, and we should build systems now that rely on it.
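For a sense of what working around the lack of tail calls usually involves, here is the classic trampoline in Python, a language that also lacks them. This is a generic illustration of the workaround that return_call makes unnecessary, not something taken from the compiler described here.

```python
def trampoline(result):
    """Keep calling returned thunks until a non-callable value appears."""
    while callable(result):
        result = result()
    return result


def is_even(n):
    # Instead of tail-calling is_odd, return a thunk that will do so.
    return True if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))
```

trampoline(is_even(100000)) runs in constant stack depth, at the cost of allocating a closure per call and bouncing through the driver loop every time.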

Scheme to Wasm

  • Value representation
  • Varargs
  • Tail calls
  • Delimited continuations
  • Numeric tower

Next up, my favorite favorite topic: delimited continuations.

Scheme to Wasm: Prompts (1)

Problem: Lightweight threads/fibers, exceptions

Possible solutions

  • Eventually, built-in coroutines
  • binaryen’s asyncify (not yet ready for GC); see Julia
  • Delimited continuations

“Bring your whole self”

Before diving in though, one might wonder why bother. Delimited continuations are a building-block that one can use to build other, more useful things, notably exceptions and light-weight threading / fibers. Could there be another way of achieving these end goals without having to implement this relatively uncommon primitive?

For fibers, it is possible to implement them in terms of a built-in coroutine facility. The standards body seems willing to include a coroutine primitive, but it seems far off to me; not within the next 3-4 years I would say. So let's put that to one side.

There is a more near-term solution, to use asyncify to implement coroutines somehow; but my understanding is that asyncify is not ready for GC yet.

For the Guile flavor of Scheme at least, delimited continuations are table stakes in their own right, so given that we will have them on WebAssembly, we might as well use them to implement fibers and exceptions in the same way as we do on native targets. Why compromise if you don't have to?

Scheme to Wasm: Prompts (2)

Prompts delimit continuations

(define k
  (call-with-prompt 'foo
    ; body
    (lambda ()
      (+ 34 (abort-to-prompt 'foo)))
    ; handler
    (lambda (continuation)
      continuation)))

(k 10)       ;; ⇒ 44
(- (k 10) 2) ;; ⇒ 42

k is the _ in (lambda () (+ 34 _))

There are a few ways to implement delimited continuations, but my usual way of thinking about them is that a delimited continuation is a slice of the stack. One end of the slice is the prompt established by call-with-prompt, and the other by the continuation of the call to abort-to-prompt. Capturing a slice pops it off the stack, copying it out to the heap as a callable function. Calling that function splats the captured slice back on the stack and resumes it where it left off.

Scheme to Wasm: Prompts (3)

Delimited continuations are stack slices

Make stack explicit via minimal continuation-passing-style conversion

  • Turn all calls into tail calls
  • Allocate return continuations on explicit stack
  • Breaks functions into pieces at non-tail calls

This low-level intuition of what a delimited continuation is leads naturally to an implementation; the only problem is that we can't slice the WebAssembly call stack. The workaround here is similar to the varargs case: we virtualize the stack.

The mechanism to do so is a continuation-passing-style (CPS) transformation of each function. Functions that make no calls, such as leaf functions, don't need to change at all. The same goes for functions that make only tail calls. For functions that make non-tail calls, we split them into pieces that preserve the only-tail-calls property.

Scheme to Wasm: Prompts (4)

Before a non-tail-call:

  • Push live-out vars on stacks (one stack per top type)
  • Push continuation as funcref
  • Tail-call callee

Return from call via pop and tail call:

(return_call_ref (call $pop-return)
                 (i32.const 0))

After return, continuation pops state from stacks

Consider a simple function:

(define (f x y)
  (+ x (g y)))

Before making a non-tail call, a "tailified" function will instead push all live data onto an explicitly-managed stack and tail-call the callee. It also pushes on the return continuation. Returning from the callee pops the return continuation and tail-calls it. The return continuation pops the previously-saved live data and continues.

In this concrete case, tailification would split f into two pieces:

(define (f x y)
  (push! x)
  (push-return! f-return-continuation-0)
  (g y))

(define (f-return-continuation-0 g-of-y)
  (define k (pop-return!))
  (define x (pop!))
  (k (+ x g-of-y)))

Now there are no non-tail calls, besides calls to run-time routines like push! and + and so on. This transformation is implemented by tailify.scm.
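The split version of f can be exercised in Python to watch the explicit stacks at work. This is a model under loud assumptions: g is a hypothetical leaf function that doubles its argument, and because Python has no tail calls, each function returns a thunk and a small driver loop (run) stands in for return_call. It has nothing to do with how tailify.scm itself is written.

```python
value_stack = []   # saved live variables
return_stack = []  # return continuations


def g(y):
    # Hypothetical callee: "returns" by popping the return
    # continuation and tail-calling it with the result.
    k = return_stack.pop()
    return lambda: k(y * 2)


def f(x, y):
    value_stack.append(x)                        # (push! x)
    return_stack.append(f_return_continuation_0)
    return lambda: g(y)                          # tail call


def f_return_continuation_0(g_of_y):
    k = return_stack.pop()                       # f's own caller
    x = value_stack.pop()
    return lambda: k(x + g_of_y)


def run(func, *args):
    """Driver loop standing in for real tail calls."""
    results = []
    return_stack.append(results.append)          # outermost continuation
    step = lambda: func(*args)
    while step is not None:
        step = step()   # results.append returns None, ending the loop
    return results[0]
```

run(f, 1, 20) evaluates to 41 and leaves both stacks empty, with no Python frame for f alive while g runs.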

Scheme to Wasm: Prompts (5)


Abort to prompt:

  • Pop stack slice to reified continuation object
  • Tail-call new top of stack: prompt handler

Calling a reified continuation:

  • Push stack slice
  • Tail-call new top of stack

No need to wait for effect handlers proposal; you can have it all now!

The salient point is that the stack on which push! operates is managed by us, so we can slice it. (In reality there are probably four or five stacks: one in linear memory or an array for types like i32 or f64, one for each of the three managed top types any, extern, and func, and one for the stack of return continuations.)
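Slicing an explicitly managed stack is easy to model. The Python sketch below reproduces the earlier call-with-prompt example with a single stack of return continuations; the Prompt class and abort_to_prompt function are hypothetical names for illustration, and a real implementation would slice several stacks at once.

```python
class Prompt:
    """Marker frame delimiting a continuation."""

    def __init__(self, tag, handler):
        self.tag = tag
        self.handler = handler

    def __call__(self, value):
        return value  # returning normally through a prompt is a no-op


def abort_to_prompt(stack, tag):
    """Pop the slice above the nearest matching prompt and reify it."""
    for i in reversed(range(len(stack))):
        frame = stack[i]
        if isinstance(frame, Prompt) and frame.tag == tag:
            captured = stack[i + 1:]
            del stack[i:]  # remove the slice and the prompt itself

            def continuation(value):
                base = len(stack)
                stack.extend(captured)           # push the slice back
                while len(stack) > base:
                    value = stack.pop()(value)   # return through it
                return value

            return frame.handler(continuation)
    raise LookupError("no prompt with tag %r" % (tag,))


# (call-with-prompt 'foo
#   (lambda () (+ 34 (abort-to-prompt 'foo)))
#   (lambda (continuation) continuation))
stack = [Prompt("foo", lambda continuation: continuation)]
stack.append(lambda v: 34 + v)  # the pending (+ 34 _)
k = abort_to_prompt(stack, "foo")
```

Here k(10) gives 44 and k(10) - 2 gives 42, and because the captured slice is copied rather than consumed, k can be called as many times as you like.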

Someone asked in the talk about whether the explicit memory traffic and avoiding the return-address-buffer branch prediction is a source of inefficiency in the transformation and I have to say, yes, but I don't know by how much. I guess we'll find out soon.

Scheme to Wasm

  • Value representation
  • Varargs
  • Tail calls
  • Delimited continuations
  • Numeric tower

Okeydokes, last point!

Scheme to Wasm: Numbers

Numbers can be immediate: fixnums

Or on the heap: bignums, fractions, flonums, complex

Supertype is still ref eq

Consider imports to implement bignums

  • On web: BigInt
  • On edge: Wasm support module (mini-gmp?)

Dynamic dispatch for polymorphic ops, as usual

First, I would note that sometimes the compiler can unbox numeric operations. For example if it infers that a result will be an inexact real, it can use unboxed f64 instead of library routines working on heap flonums ((struct i32 f64); the initial i32 is for the hash and tag). But we still need a story for the general case that involves dynamic type checks.

The basic idea is that we get to have fixnums and heap numbers. Fixnums will handle most of the integer arithmetic that we need, and will avoid allocation. We'll inline most fixnum operations as a fast path and call out to library routines otherwise. Of course fixnum inputs may produce a bignum output as well, so the fast path sometimes includes another slow-path callout.
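The fast-path and slow-path split might look roughly like this in Python. It is a sketch only: plain ints within the 30-bit range play the part of fixnums, the Bignum and Flonum wrapper classes are hypothetical heap numbers, and slow_add stands in for a library routine.

```python
from dataclasses import dataclass

FIXNUM_MIN = -(1 << 29)
FIXNUM_MAX = (1 << 29) - 1


@dataclass(frozen=True)
class Bignum:
    value: int


@dataclass(frozen=True)
class Flonum:
    value: float


def is_fixnum(x):
    return type(x) is int and FIXNUM_MIN <= x <= FIXNUM_MAX


def unwrap(x):
    return x if type(x) is int else x.value


def slow_add(a, b):
    """Library routine: the general case, with dynamic dispatch."""
    if isinstance(a, Flonum) or isinstance(b, Flonum):
        return Flonum(float(unwrap(a)) + float(unwrap(b)))
    total = unwrap(a) + unwrap(b)
    return total if FIXNUM_MIN <= total <= FIXNUM_MAX else Bignum(total)


def add(a, b):
    """What the compiler would inline at each + site."""
    if is_fixnum(a) and is_fixnum(b):
        result = a + b
        if FIXNUM_MIN <= result <= FIXNUM_MAX:
            return result       # fast path: no allocation
    return slow_add(a, b)       # overflow or heap numbers
```

Note how add(FIXNUM_MAX, 1) starts on the fast path and still ends up in slow_add, which is the extra callout on overflow described above.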

We want to minimize binary module size. In an ideal compile-to-WebAssembly situation, a small program will have a small module size, down to a minimum of a kilobyte or so; larger programs can be megabytes, if the user experience allows for the download delay. Binary module size will be dominated by code, so that means we need to plan for aggressive dead-code elimination, minimize the size of fast paths, and also minimize the size of the standard library.

For numbers, we try to keep module size down by leaning on the platform. In the case of bignums, we can punt some of this work to the host; on a JavaScript host, we would use BigInt, and on a WASI host we'd compile an external bignum library. So that's the general story: inlined fixnum fast paths with dynamic checks, and otherwise library routine callouts, combined with aggressive whole-program dead-code elimination.

Scheme to Wasm

  • Value representation
  • Varargs
  • Tail calls
  • Delimited continuations
  • Numeric tower

Hey I think we did it! Always before when I thought about compiling Scheme or Guile to the web, I got stuck on some point or another, was tempted down the corner-cutting alleys, and eventually gave up before starting. But finally it would seem that the stars are aligned: we get to have our Scheme and run it too.


Debugging: The wild west of DWARF; prompts

Strings: stringref host strings spark joy

JS interop: Export accessors; Wasm objects opaque to JS. externref.

JIT: A whole ’nother talk!

AOT: wasm2c

Of course, like I said, WebAssembly is still a weird machine: as a compilation target but also at run-time. Debugging is a right proper mess; perhaps some other article on that some time.

How to represent strings is a surprisingly gnarly question; there is tension within the WebAssembly standards community between those that think that it's possible for JavaScript and WebAssembly to share an underlying string representation, and those that think that it's a fool's errand and that copying is the only way to go. I don't know which side will prevail; perhaps more on that as well later on.

Similarly the whole interoperation with JavaScript question is very much in its early stages, with the current situation choosing to err on the side of nothing rather than the wrong thing. You can pass a WebAssembly (ref eq) to JavaScript, but JavaScript can't do anything with it: it has no prototype. The state of the art is to also ship a JS run-time that wraps each wasm object, proxying exported functions from the wasm module as object methods.

Finally, some language implementations really need JIT support, like PyPy. There, that's a whole 'nother talk!

WebAssembly for the rest of us

With GC, WebAssembly is now ready for us

Getting our languages on WebAssembly now a S.M.O.P.

Let’s score some goals in the second half!


WebAssembly has proven to have some great wins for C, C++, Rust, and so on -- but now it's our turn to get in the game. GC is coming and we as a community need to be getting our compilers and language run-times ready. Let's put on the coffee and bang some bytes together; it's still early days and there's a world to win out there for the language community with the best WebAssembly experience. The game is afoot: happy consing!

by Andy Wingo at Monday, March 20, 2023

Sunday, March 19, 2023

Jérémy Korwin-Zmijowski

KDE Neon : always asking for wifi password

KDE Neon Logo

For a few months, my system systematically asked me for the password of the wifi network.

Thanks to this topic on the KDE forum, I was able to fix the problem with the command sudo pkcon install libkf5wallet-bin.

Such a relief! Thank you very much for reading this article! Hope you learned something!

Don't hesitate to give me your opinion, suggest an idea for improvement, report an error, or ask a question! I would be so glad to discuss the topic covered here with you! You can reach me here.

Don't miss out on the next ones! Either via RSS or via e-mail!

And more importantly, share this blog and tell your friends why they should read this post!

#gnu #guile #tdd #book #english

GPG: 036B 4D54 B7B4 D6C8 DA62 2746 700F 5E0C CBB2 E2D1

Sunday, March 19, 2023

Wednesday, March 15, 2023

GNU Guix

Building Toolchains with Guix

In order to deploy embedded software using Guix we first need to teach Guix how to cross-compile it. Since Guix builds everything from source, this means we must teach Guix how to build our cross-compilation toolchain.

The Zephyr Project uses its own fork of GCC with custom configs for the architectures supported by the project. In this article, we describe the cross-compilation toolchain we defined for Zephyr; it is implemented as a Guix channel.

About Zephyr

Zephyr is a real-time operating system from the Linux Foundation. It aims to provide a common environment which can target even the most resource-constrained devices.

Zephyr introduces a module system which allows third parties to share code in a uniform way. Zephyr uses CMake to perform physical component composition of these modules. It searches the filesystem and generates scripts which the toolchain will use to successfully combine those components into a firmware image.

The fact that Zephyr provides this mechanism is one reason I chose to target it in the first place.

This separation of modules in an embedded context is a really great thing. It brings many of the same advantages that modularity brings to the Linux world, such as code re-use, smaller binaries, more efficient cache/RAM usage, etc. It also allows us to work as independent groups and compose contributions from many teams.

It also brings all of the complexity. Suddenly most of the problems that plague traditional deployment now apply to our embedded system. The fact that the libraries are statically linked at compile time instead of dynamically at runtime is simply an implementation detail. I say most because everything is statically linked so there is no runtime component discovery that needs to be accounted for.

Anatomy of a Toolchain

Toolchains are responsible for taking high-level descriptions of programs and lowering them down to a series of equivalent machine instructions. This process involves more than just a compiler. The compiler uses the GNU Binutils to manipulate its internal representation down to a given architecture. It also needs the C standard library as well as a few other libraries required for some compiler optimizations.

The C library provides the interface to the underlying kernel. System calls like write and read are provided by the GNU C Library (glibc) on most distributions.

In embedded systems, smaller implementations like Red Hat's newlib and newlib-nano are used.

Bootstrapping a Toolchain

In order to compile GCC we need a C library that's been compiled for our target architecture. How can we cross-compile our C library if we need our C library to build a cross-compiler? The solution is to build a simpler compiler that doesn't require the C library to function. It will not be capable of as many optimizations and it will be very slow; however, it will be able to build the C libraries as well as the complete version of GCC.

In order to build the simpler compiler we need to compile the Binutils to work with our target architecture. Binutils can be bootstrapped with our host GCC and have no target dependencies. More information is available in this article.

Doesn't sound so bad, right? It isn't... in theory. However, internet forums since time immemorial have been littered with the laments of those who came before: incorrect versions of ISL, the wrong C library being linked, the host linker being used, and so on. The one commonality between all of these issues is the environment. Building GCC is difficult because isolating build environments is hard.

In fact as of v0.14.2, the Zephyr “software development kit” (SDK) repository took down the build instructions and posted a sign that read "Building this is too complicated, don't worry about it." (I'm paraphrasing, but not by much.)

We will neatly sidestep all of these problems and not risk destroying or polluting our host system with garbage by using Guix to manage our environments for us.

Our toolchain only requires the first pass compiler because newlib(-nano) is statically linked and introduced to the toolchain by normal package composition.

Defining the Packages

All of the base packages are defined in zephyr/packages/zephyr.scm. Zephyr modules (coming soon!) are defined in zephyr/packages/zephyr-xyz.scm, following the pattern of other module systems implemented by Guix.


The first thing we need to build is the arm-zephyr-eabi binutils. This is very easy in Guix.

(define-public zephyr-binutils
  (let ((xbinutils (cross-binutils "arm-zephyr-eabi")))
    (package
      (inherit xbinutils)
      (name "arm-zephyr-eabi-binutils")
      (version "2.38")
      (source (origin
                (method git-fetch)
                (uri (git-reference
                      (url "")
                      (commit "6a1be1a6a571957fea8b130e4ca2dcc65e753469")))
                (file-name (git-file-name name version))
                (sha256 (base32 "0ylnl48jj5jk3jrmvfx5zf8byvwg7g7my7jwwyqw3a95qcyh0isr"))))
      (arguments
       `(#:tests? #f
         ,@(substitute-keyword-arguments (package-arguments xbinutils)
             ((#:configure-flags flags)
              ;; Prefix the installed tools the way the Zephyr build system expects.
              `(cons "--program-prefix=arm-zephyr-eabi-" ,flags)))))
      (native-inputs
       (modify-inputs (package-native-inputs xbinutils)
         (prepend texinfo bison flex gmp dejagnu)))
      (home-page "")
      (synopsis "Binutils for the Zephyr RTOS"))))

The function cross-binutils returns a package which has been configured for the given GNU triplet. We simply inherit that package and replace the source. The Zephyr build system expects the binutils to be prefixed with arm-zephyr-eabi- which is accomplished by adding another flag to the #:configure-flags argument.

We can test our package definition using the -L flag with guix build to add our packages.

$ guix build -L guix-zephyr zephyr-binutils


The store directory printed by guix build contains the results of make install.

GCC sans libc

This one is a bit more involved. Don't be afraid! This version of GCC wants ISL version 0.15. It's easy enough to make that happen. Inherit the current version of ISL and swap out the source and update the version. For most packages the build process doesn't change that much between versions.

(define-public isl-0.15
    (inherit isl)
    (version "0.15")
    (source (origin
              (method url-fetch)
              (uri (list (string-append "mirror://sourceforge/libisl/isl-"
                            version ".tar.gz")))

As with the binutils, there is a cross-gcc function for creating cross-GCC packages. This one accepts keywords specifying which binutils and libc to use. If libc isn't given (as is the case here), gcc is configured with many options disabled so that it can be built without libc. Therefore we need to add the extra options we want. I got them, along with the commits to use for each of the tools, from the SDK configuration scripts in the sdk-ng Git repository.

(define-public gcc-arm-zephyr-eabi-12
  (let ((xgcc (cross-gcc "arm-zephyr-eabi"
                         #:xbinutils zephyr-binutils)))
      (inherit xgcc)
      (version "12.1.0")
      (source (origin
                (method git-fetch)
                (uri (git-reference
                      (url "")
                      (commit "0218469df050c33479a1d5be3e5239ac0eb351bf")))
                (file-name (git-file-name (package-name xgcc) version))
                (patches (search-patches
      (native-inputs (modify-inputs (package-native-inputs xgcc)
                       ;; Get rid of stock ISL
                       (delete "isl")
                       ;; Add additional dependencies that xgcc doesn't have
                       ;; including our special ISL
                       (prepend flex
       (substitute-keyword-arguments (package-arguments xgcc)
         ((#:phases phases)
          `(modify-phases ,phases
             (add-after 'unpack 'fix-genmultilib
               (lambda _
                 (patch-shebang "gcc/genmultilib")))

             (add-after 'set-paths 'augment-CPLUS_INCLUDE_PATH
               (lambda* (#:key inputs #:allow-other-keys)
                 (let ((gcc (assoc-ref inputs "gcc")))
                   ;; Remove the default compiler from CPLUS_INCLUDE_PATH to
                   ;; prevent header conflict with the GCC from native-inputs.
                   (setenv "CPLUS_INCLUDE_PATH"
                           (string-join (delete (string-append gcc
                                                (string-split (getenv
                                                              #\:)) ":"))
                   (format #t
                    "environment variable `CPLUS_INCLUDE_PATH' changed to `~a`~%"
                    (getenv "CPLUS_INCLUDE_PATH")))))))

         ((#:configure-flags flags)
          ;; The configure flags are largely identical to the flags used by the
          ;; "GCC ARM embedded" project.
          `(append (list
                    "--with-host-libstdcxx=-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm"
                   (delete "--disable-multilib"
       (list (search-path-specification
              (variable "CROSS_C_INCLUDE_PATH")
              (files '("arm-zephyr-eabi/include")))
              (variable "CROSS_CPLUS_INCLUDE_PATH")
              (files '("arm-zephyr-eabi/include" "arm-zephyr-eabi/c++"
              (variable "CROSS_LIBRARY_PATH")
              (files '("arm-zephyr-eabi/lib")))))
      (home-page "")
      (synopsis "GCC for the Zephyr RTOS"))))

This GCC can be built like so.

$ guix build -L guix-zephyr gcc-cross-sans-libc-arm-zephyr-eabi


Great! We now have our stage-1 compiler.


The newlib package is quite straightforward (relatively). It mostly consists of adding the relevant configuration flags and patching the files that the patch-shebangs phase missed.

(define-public zephyr-newlib
    (name "zephyr-newlib")
    (version "3.3")
    (source (origin
          (method git-fetch)
          (uri (git-reference
            (url "")
            (commit "4e150303bcc1e44f4d90f3489a4417433980d5ff")))
           (base32 "08qwjpj5jhpc3p7a5mbl7n6z7rav5yqlydqanm6nny42qpa8kxij"))))
    (build-system gnu-build-system)
     `(#:out-of-source? #t
       #:configure-flags '("--target=arm-zephyr-eabi"
       (modify-phases %standard-phases
     (add-after 'unpack 'fix-references-to-/bin/sh
       (lambda _
         (substitute* '("libgloss/arm/cpu-init/"
           (("/bin/sh") (which "sh")))
     `(("xbinutils" ,zephyr-binutils)
       ("xgcc" ,gcc-arm-zephyr-eabi-12)
       ("texinfo" ,texinfo)))
    (home-page "")
    (synopsis "C library for use on embedded systems")
    (description "Newlib is a C library intended for use on embedded
systems.  It is a conglomeration of several library parts that are easily
usable on embedded products.")
    (license (license:non-copyleft

And the build.

$ guix build -L guix-zephyr zephyr-newlib


Complete Toolchain

Mostly complete. libstdc++ does not build because arm-zephyr-eabi is not arm-none-eabi so a dynamic link check is performed/failed. I cannot figure out how crosstool-ng handles this.

Now that we've got the individual tools it's time to create our complete toolchain. For this we need to do some package transformations. Because these transformations are going to have to be done for every combination of binutils/gcc/newlib it is best to create a function which we can reuse for every version of the SDK.

(define (arm-zephyr-eabi-toolchain xgcc newlib version)
  "Produce a cross-compiler zephyr toolchain package with the compiler XGCC and the C
  library variant NEWLIB."
  (let ((newlib-with-xgcc
           (inherit newlib)
            (modify-inputs (package-native-inputs newlib)
              (replace "xgcc" xgcc))))))
      (name (string-append "arm-zephyr-eabi"
                           (if (string=? (package-name newlib-with-xgcc)
      (version version)
      (source #f)
      (build-system trivial-build-system)
       '(#:modules ((guix build union)
                    (guix build utils))
         #:builder (begin
                     (use-modules (ice-9 match)
                                  (guix build union)
                                  (guix build utils))
                     (let ((out (assoc-ref %outputs "out")))
                       (mkdir-p out)
                       (match %build-inputs
                         (((names . directories) ...)
                          (union-build (string-append out "/arm-zephyr-eabi")
      (inputs `(("binutils" ,zephyr-binutils)
                ("gcc" ,xgcc)
                ("newlib" ,newlib-with-xgcc)))
      (synopsis "Complete GCC tool chain for ARM zephyrRTOS development")
       "This package provides a complete GCC tool chain for ARM
  bare metal development with zephyr rtos.  This includes the GCC arm-zephyr-eabi cross compiler
  and newlib (or newlib-nano) as the C library.  The supported programming
  language is C.")
      (home-page (package-home-page xgcc))
      (license (package-license xgcc)))))

This function creates a special package which consists of the toolchain in a special directory hierarchy, i.e. arm-zephyr-eabi/. Our complete toolchain definition looks like this.

(define-public arm-zephyr-eabi-toolchain-0.15.0
  (arm-zephyr-eabi-toolchain gcc-arm-zephyr-eabi-12 zephyr-newlib
                             "0.15.0"))

To build:

$ guix build -L guix-zephyr arm-zephyr-eabi-toolchain

Note: Guix now includes a mechanism to describe platforms at a high level, upon which the --system and --target build options build. It is not used here but could be a way to better integrate Zephyr support in the future.

Integrating with Zephyr Build System

Zephyr uses CMake as its build system. It contains numerous CMake files in both the so-called ZEPHYR_BASE, the zephyr source code repository, as well as a handful in the SDK which help select the correct toolchain for a given board.

There are standard locations the build system will look for the SDK. We are not using any of them. Our SDK lives in the store, immutable forever. According to the Zephyr documentation, the variable ZEPHYR_SDK_INSTALL_DIR needs to point to our custom spot.

We also need to grab the CMake files from the repository and create a file, sdk_version, which contains the version string ZEPHYR_BASE uses to find a compatible SDK.

Along with the SDK proper we need to include a number of Python packages required by the build system.

(define-public zephyr-sdk
    (name "zephyr-sdk")
    (version "0.15.0")
    (home-page "")
    (source (origin
              (method git-fetch)
              (uri (git-reference
                    (url "")
                    (commit "v0.15.0")))
              (file-name (git-file-name name version))
    (build-system trivial-build-system)
     `(#:modules ((guix build union)
                  (guix build utils))
       #:builder (begin
                   (use-modules (guix build union)
                                (ice-9 match)
                                (guix build utils))
                   (let ((out (assoc-ref %outputs "out"))
                         (cmake-scripts (string-append (assoc-ref
                         (sdk-out (string-append out "/zephyr-sdk-0.15.0")))
                     (mkdir-p out)

                     (match (assoc-remove! %build-inputs "source")
                       (((names . directories) ...)
                        (union-build sdk-out directories)))

                     (copy-recursively cmake-scripts
                                       (string-append sdk-out "/cmake"))

                     (with-directory-excursion sdk-out
                       (call-with-output-file "sdk_version"
                         (lambda (p)
                           (format p "0.15.0"))))))))
    (propagated-inputs (list arm-zephyr-eabi-toolchain-0.15.0
     (list (search-path-specification
            (variable "ZEPHYR_SDK_INSTALL_DIR")
            (separator #f)
            (files '("")))))
    (synopsis "Zephyr SDK")
     "zephyr-sdk bundles a complete gcc toolchain as well
as host tools like dtc, openocd, qemu, and required python packages.")
    (license license:apsl2)))
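
As a quick sanity check on the layout the builder above produces, a shell sketch along these lines could confirm that a directory holds the cmake/ scripts and an sdk_version file with the expected version string. The function name sdk_looks_valid is made up for this illustration; it is not part of Guix or Zephyr.

```shell
# Hypothetical helper (not part of Guix or Zephyr): succeed only if DIR has
# the two things the builder above creates, a cmake/ directory and an
# sdk_version file whose content equals VERSION.
sdk_looks_valid() {
    dir=$1
    version=$2
    [ -d "$dir/cmake" ] || return 1
    [ -f "$dir/sdk_version" ] || return 1
    [ "$(cat "$dir/sdk_version")" = "$version" ]
}

# Example: build a throwaway directory with the same shape and check it.
demo=$(mktemp -d)
mkdir -p "$demo/cmake"
printf '0.15.0' > "$demo/sdk_version"
if sdk_looks_valid "$demo" 0.15.0; then
    echo "layout ok"
fi
rm -rf "$demo"
```

Which directory to point it at depends on where the profile exposes the SDK, so treat the argument as something to adapt.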


In order to test we will need an environment with the SDK installed. We can take advantage of guix shell to avoid installing test packages into our home environment. This way if it causes problems we can just exit the shell and try again.

guix shell -L guix-zephyr zephyr-sdk cmake ninja git

ZEPHYR_BASE can be cloned into a temporary workspace to test our toolchain functionality. (For now. Eventually we will need to create a package for zephyr-base that our Guix zephyr-build-system can use.)

mkdir /tmp/zephyr-project
cd /tmp/zephyr-project
git clone
export ZEPHYR_BASE=/tmp/zephyr-project/zephyr

In order to build for the test board (k64f in this case) we need to get a hold of the vendor Hardware Abstraction Layers and CMSIS. (These will also need to become Guix packages to allow the build system to compose modules).

git clone && \
git clone

To inform the build system about these modules we pass them in with -DZEPHYR_MODULES=, which takes a semicolon-separated list of paths, each containing a module.yml file.
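
Writing that semicolon-separated value by hand gets error-prone as the module list grows. A small shell sketch, with a made-up helper name join_modules, could assemble it from individual paths:

```shell
# Hypothetical helper: join its arguments with semicolons, the separator
# that -DZEPHYR_MODULES= expects according to the text above.
join_modules() {
    (
        # "$*" joins the arguments with the first character of IFS;
        # the subshell keeps the IFS change from leaking out.
        IFS=';'
        printf '%s\n' "$*"
    )
}

ZEPHYR_MODULES=$(join_modules /tmp/zephyr-project/hal_nxp /tmp/zephyr-project/cmsis)
echo "$ZEPHYR_MODULES"
```

The result can then be passed as -DZEPHYR_MODULES="$ZEPHYR_MODULES". The quotes matter, because an unquoted semicolon would end the shell command.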

To build the hello world sample we use the following incantation.

cmake -Bbuild $ZEPHYR_BASE/samples/hello_world \
    -GNinja \
    -DBOARD=frdm_k64f \
    -DBUILD_VERSION=3.1.0 \
    -DZEPHYR_MODULES="/tmp/zephyr-project/hal_nxp;/tmp/zephyr-project/cmsis" \
      && ninja -Cbuild

If everything is set up correctly we will end up with a ./build directory with all our build artifacts. The SDK is correctly installed!


A customized cross toolchain is one of the most difficult pieces of software to build. Using Guix, we do not need to be afraid of the complexity! We can fiddle with settings, swap out components, and do the most brain dead things to our environments without a care in the world. Just exit the environment and it's like it never happened at all.

It highlights one of my favorite aspects of Guix: every package is a working reference design for you to modify and learn from.

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64 and POWER9 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

by Mitchell Schmeisser at Wednesday, March 15, 2023

Tuesday, March 14, 2023


Escape hatches

By now, y’all should know about the Alternate Hard and Soft Layers pattern. It’s the idea of designing a system with some rules carved in granite (like Emacs’ C primitives) and some loosy-goosy (like Emacs’ Lisp extensions).

“In a cloud, bones of steel” as Charles Reznikoff put it. But what supercharges this design pattern for hackers is if you don’t make the boundaries between the layers too strict, if you provide ways to fall back through the patterns.

This “make the abstractions intentionally leaky” is a design decision that every time I implement it, I get rewarded many times over (like how call-tables gives you easy, convenient access to the underlying hash-tables; I wasn’t sure if I was ever gonna use that but I’ve ended up using that again and again in many unforeseen ways), and each time I forget to do it, I end up with a library that’s languishing from disuse and “What was I thinking?” and I don’t even use it myself.

It’s also why Markdown is so great. I was coming from Textile, which has all kinds of specific syntax to do specific things. Nice. I started out with Wikisyntax and was running up against the limitations of it, and Textile felt like a powerful injection of “wow, they managed to make room for all these things!”

p>{font-size:0.8em}. This is an example from the documentation for Textile markup language.

And then I saw Markdown. Fall through to HTML at any time. What a hecking Gordian knot. Make common things easy but keep unusual things possible.

And then as I’ve been working on this li’l site for years, I’ve been adding layers above and below (pre-processing steps and post-processing steps), always keeping this “permeability” in mind. If I wanna bang out a quick li’l post using the normal defaults, I can. If I want to make anything a little different, weird, more special, I can. The other day I was writing a post that had super messy syntax under the hood; I have a way to make quick asides but here I needed a long aside with multiple nested paragraphs, blockquotes, links, something my normal aside system can’t handle. Escape hatches to the rescue; I just wrote that particular section in normal HTML. Easy peasy.


Like all patterns, this pattern isn’t always appropriate.


“Complex” is Latin for braided together, entwined.

Those who make super strict boundaries between the layers hope that will make each layer simple and portable. SCUMM and Z-machine are two classic success stories from the world of game dev where they kept the abstractions tight and unleaky, and were rewarded with very portable games that are still installable and playable today.

But that requires you to know ahead of time exactly what primitives you’ll need and not need. One approach that might work here is to make the leaks traceable. You provide escape hatches so you can leak through the abstractions whenever you want to, but you put in some way to search for them. Then, once your app is settling, you can refactor all the “leaks” into proper calls to new API methods that you add to the layer below (recur this process when the leaks are from a layer more than one step below).
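
One low-tech way to make leaks searchable is a greppable marker comment at every spot that reaches through a layer. The sketch below assumes a made-up marker, ESCAPE-HATCH, and a made-up helper name, list_leaks; any unique string would do.

```shell
# Hypothetical convention: every place that bypasses an abstraction carries
# a comment containing the marker ESCAPE-HATCH.
# list_leaks DIR prints file:line:text for each marked spot under DIR.
list_leaks() {
    grep -rn 'ESCAPE-HATCH' "$1"
}

# Example with a throwaway source tree.
src=$(mktemp -d)
printf '%s\n' 'render(page)' 'raw_html(x)  # ESCAPE-HATCH: aside needs nesting' > "$src/post.src"
printf '%s\n' 'save(page)' > "$src/store.src"
list_leaks "$src"
rm -rf "$src"
```

Once the app settles, each line of that output is a candidate for a proper API call in the layer below.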

Or you just accept that the app is never gonna be ported, that it’s gonna stay as a big ball of mud, but thanks to the loosy-goosy layer boundaries it’s easy to quickly and flexibly add whatever feature you want.


Obviously you don’t want to do this for user input. HTML with JS is not appropriate for user input for example, since they can XSS you. (Why the world then has decided that HTML with JS is appropriate input for users and their browsers, from servers, is beyond me… but that’s a topic for another day.) So then you don’t want to allow formats that allow fallback HTML/JS either. Markdown, as in real markdown with JS and HTML attributes and stuff, is not a great format for a world-writable wiki, for example. Obviously a markdown-inspired format, like using ATX headers or Gemtext, is fine.

by Idiomdrottning at Tuesday, March 14, 2023

Emacs undo and me

Emacs has a couple of different undo packages I could install if I wanted to. Trees, histories, I don’t know what. I haven’t tried ‘em. Here is a good starting point.

Maybe I’ll switch over to one of them one of these days (and knowing how I usually work, probably right after writing an essay like this where I’ve just been like “oh I for sure don’t use any of those packages” and then three seconds later I get roped in (by myself if nothing else) to switching to one of them) but right now I use the same default way it works and has worked for twenty-five years.

In some weirdo chain my brain don’t fully understand but my fingers seem to know how to work. I can undo in one “direction” but then if I do anything else (just move the cursor or set the mark) it switches direction because the undos themselves are getting undone. It’s a mess but it somehow works, even for undos really far back.

But I would be dishonest if I didn’t also mention the other thing I do which sort of saves that messy system from being unusable: “save states”. I just save the file, usually with the default command, C-x C-s, but I also have mapped C-c A which saves a copy (to a standard location, always using the same name, it doesn’t prompt) without saving the local buffer at all, and C-c r which reverts the file, and if I revert by mistake I can still undo the revert. Usually.

So I’m often saving and reverting as a complement to the normal Emacs undo mechanisms. That kind of goes to show that I’m not 100% comfy with undo. On the other hand, these “save states” are a sort of protection that works even through crashes.

(global-set-key (kbd "C-c r") #'(lambda () (interactive) (revert-buffer t t)))

(defun save-a-copy ()
  (interactive)
  (write-region (point-min) (point-max) "/tmp/saved-copy.txt"))

(global-set-key (kbd "C-c A") 'save-a-copy)

There’s no keyboard command for reverting from the copy, if I need to do that I’ll have to do it manually.

The life-changing Magit of… I’ll see myself out

If a file is under git, which admittedly most of these normal essay files aren’t, but programs and other heavier stuff are, then there’s another thing that’s possible that’s even better than undo or revert: “discard hunk” from magit. Perfect when a particular change was misguided or a particular thought ended up going nowhere. It feels like a super powerful time machine. I can use the files themselves as scrap paper, writing all kinds of junk in there. That’s why I’m terrified of using the “everything gets autocommitted, there is no staging area” philosophy of jujutsu, gitless, game of trees etc. I’m more scared of overcommitting than undercommitting. Slacker’s manifesto in one sentence right there!

Or if I want to restore something, I can browse to a version that has that file and paste from there. A li’l fiddly but at least it’s not lost. That has saved me a couple of times.

by Idiomdrottning at Tuesday, March 14, 2023


There is some semantic drift about whether or not ASCII only means the original 7-bit-wide subset of what later became UTF-8. I grew up with having to be constantly aware of what encoding system was used since ISO-8859-1 and UTF-8 were fundamentally incompatible while also being hard for machines to tell apart. We remember when it looked like this.


Since “ASCII”

  • sometimes means only 7-bit chars
  • sometimes means all chars in a standard font
  • sometimes means all glyphs representable in “font-like” metrics (like in Caves of Qud)

it’s a word that’s a little hard to use. But that’s fine. That happens in language. It sucks, but it’s because language is unfixably flawed (while also being the best we’ve got so we have to make do).

I guess I should clarify “7-bit” or “7-bit ASCII” when that’s what I mean.
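
When the distinction matters, it can be checked mechanically: 7-bit ASCII is exactly the bytes 0 through 127. A minimal shell sketch (the function name is_7bit is made up here):

```shell
# Hypothetical helper: succeed if the string contains only bytes 0-127.
# tr deletes every 7-bit byte; anything left over is outside 7-bit ASCII.
is_7bit() {
    leftover=$(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177')
    [ -z "$leftover" ]
}

if is_7bit '->'; then
    echo "'->' is 7-bit"
fi

# The arrow is written as its three UTF-8 bytes so the example does not
# depend on the encoding of this file.
arrow=$(printf '\342\206\222')
if ! is_7bit "$arrow"; then
    echo "the right arrow is not 7-bit"
fi
```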

Language design

If you’ve seen me complaining about overuse of ASCII in programming languages, I did mean 7-bit ASCII. One of my long-running peeves is abusing what I sometimes call “the shift line”. The line of non-letter, non-number characters that’s stuck above 0123456789 on Sholes-type keyboards: !@#$%^&*()[}-=_+?; and so on. I forget them all (I have a different kind of keyboard).

For historical reasons, language designers got it into their heads that “these ia ia ctharacters means whatever the heck I want”.

Especially sequences of them. Which means that if you’re doing Clojure dev through a screenreader and you use a thrush you get to listen to “hyphen greater than greater than colon foo open curly brace” to your heart’s content.

I did mean 7-bit ASCII. Why is Unicode fine where “the shift line” is not? Why is → fine while -> is not fine? Because of the life-changing magic of semiotics. → sounds like a right arrow. ↓ sounds like a down arrow. Glyphs have pronounceable names and aren’t just sequences of signs that Sholes happened to like, signs that have had their original meanings scrubbed out and rewritten again and again in a palimpsest of blood and ink. $ means dollars. Or hexadecimal. Or a scalar value. Or the grey, damp filthiness of ages.

Swearing in comics

I sometimes refer to the shift line as “grawlix”, which is pretty cruel to the inventor of that word, cartoonist Mort Walker.

Grawlix meant swearing in comics, which in the olden days meant drawing little skulls, lightning bolts, spirals (for some reason), dots and blots. Emoji before emoji. Fun fun fun.♥ You could also put the actual word (if it’s good enough for the characters it’s good enough for the readers), or a bowdlerized version like “oh jeez”, or a black censor bar, the visual equivalent of a “beep”.

In the dark ages when the typewriter and the early 7- and 8-bit computers held illimitable dominion over all, comics writers started using the shift line to represent swearing. I hate that. I hate it even more when they “sneakily” tried to match up the glyphs with similar-looking letters in some sort of vulgar leetspeak. $#!&.

Hence “grawlix” for these cursed characters semantically diluted to the point of line noise.

by Idiomdrottning at Tuesday, March 14, 2023

Sunday, March 12, 2023



XMPP peeps: Email is a lost cause, a dead horse, it’s unpossible
Also XMPP peeps: Everyone needs to get on OMEMO, all old jabber clients are obsolete and need to be thrown away

Not sure why one of them can be improved and the other can’t…

I was hanging out with some XMPP fans and they were very gracious & kind & patient, and have not consented to this blog post so I’m gonna be super vague about them!

I generally like XMPP. Y’all know I snipe at the Matrix protocol sometimes, which is maybe a li’l throwing stones in glass houses since I’m on ActivityPub which is almost as messy as Matrix, but XMPP I’m generally respectful of. Unlike Matrix, it has an architecture that makes sense. Me and my friends play D&D over XMPP and have done so for over two hundred sessions (after seven years of playing at the table).

Now to the point: the XMPP folks I talked to hated email! I’m sure they don’t speak for everyone in XMPP but they were like “email is a lost cause, hopeless, abandon, I don’t even use it”. But email is to XMPP as XMPP is to Matrix: it’s the original & best!

Email RFCs > XMPP XEPs > Matrix SCPs.

Email has a trail of lukewarm, picked-over, abandoned implementations that could not keep up with how the protocol evolved. Servers that don’t support DKIM, clients that don’t understand mimetypes etc. But. So does XMPP.

I’m still hoping we get double ratchet for email. Some sort of OMEMO-like setup. Hopefully more stable than XMPP’s OMEMO which has been changing namespaces incompatibly.

by Idiomdrottning at Sunday, March 12, 2023

Saturday, March 11, 2023

Jérémy Korwin-Zmijowski

Guile Hacker Handbook - New chapters

Guile Logo

Almost one year since I last added a chapter to the book I am writing to help anyone getting started in their Guile journey.

  • Thanks to contributors, I fixed some links and code snippets.
  • I started a new section called “Fix it!” (your feedback about it is welcome!)
  • I added chapters to the app tutorial. Still early, be patient haha

I also changed the way I publish the book. Since something broke in the compilation of the mdBook version I am using, I now push the HTML directly to a GitLab instance and render it using GitLab Pages. One more reason to motivate a migration from mdBook to Skribilo. Brewing…

Thank you very much for reading this article!

Don't hesitate to give me your opinion, suggest an idea for improvement, report an error, or ask a question! I would be so glad to discuss the topic covered here with you! You can reach me here.

Don't miss out on the next ones! Either via RSS or via e-mail!

And more importantly, share this blog and tell your friends why they should read this post!

#gnu #guile #tdd #book #english

GPG: 036B 4D54 B7B4 D6C8 DA62 2746 700F 5E0C CBB2 E2D1

Saturday, March 11, 2023


go install a fork

Golang has become pretty rough on installing forks. I ran into the same frustrations as these peeps.

It only affects forks of packages that use go’s module system, so there’s no problem for mdna.

But if you have a fork of a go package that does use modules, you’ll run into this error when you go install it:

module declares its path as: path/to/their/repo
        but was required as: path/to/your/repo

This is a serious obstacle for open source collaboration so it’s pretty imperative that golang fixes the issue. It’s a bug in go install as far as I’m concerned.

Meanwhile, you have three bad compromised options:

Keep only their version canonical

Do not rename the package paths and jump through serious hoops in order to compile your local version from the source tree.

The downside is that no-one else can install your version, cutting down on software ecosystem diversity and decentralized collaboration and testing, and you can’t easily install your version on other machines either. This option is OK when upstream is awesome and immortal and rapidly responsive and they love all your patches and you have a good working relationship with them, but even so, you have to struggle to build & test your binaries.

Also, I don’t know how to run install from these local source trees; the pkgs got put into $GOPATH but the binary didn’t. When I ran go build it placed the binary in the source dir, not in $GOPATH. I could manually copy it into fakeroot’s /usr/bin in order to build a .deb, but that was because it was a single binary. I don’t know how to handle projects with a more complex artifact story.

Keep both versions canonical

Maintain two repos, one where you’ve renamed the paths and one where you haven’t. The rename should be in a single commit.

If you are sending patches rather than pull requests, you might make do with a single repo (just as long as your patches don’t contain the commit that has the renames).

Branches alone can’t hack it since go install, to the best of my current knowledge, isn’t branch-aware, it relies on fetching whatever branch the repo’s HEAD is set to, so you need two separate repos. In other words, I’m not aware of a go install equivalent for git clone -b.

This is bad because it’s a ton of work for you, the contributor.

It’s (by far) what’s best for everyone else, for the rest of society, but talk about hoop city. If I end up doing a lot of golang stuff I might consider cooking up something to automate this approach while we wait for golang to sober the heck up.
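Here is a rough sketch of what the core of such automation could look like, in Python. It assumes the rename really is just a whole-path text substitution in go.mod and in import lines. The function name and the example.com paths are made up for illustration, it isn't an existing tool, and it doesn't touch git at all.

```python
import re


def rename_module_path(source: str, old: str, new: str) -> str:
    """Swap one Go module path for another in the text of a go.mod or .go file.

    Hypothetical helper: only rewrites `old` where it stands as a whole path
    or as the prefix of a subpackage path, so look-alike paths are left alone.
    """
    pattern = re.compile(
        r"(?<![\w./-])"      # not in the middle of a longer path
        + re.escape(old)
        + r"(?![\w.-])"      # not followed by more of the same path segment
    )
    # A function replacement keeps `new` from being read as a regex template.
    return pattern.sub(lambda _match: new, source)


# Placeholder paths, not real repos.
THEIRS = "example.com/them/tool"
YOURS = "example.com/you/tool"

go_mod = "module example.com/them/tool\n\ngo 1.20\n"
main_go = 'import "example.com/them/tool/internal/cfg"\n'

print(rename_module_path(go_mod, THEIRS, YOURS))
print(rename_module_path(main_go, THEIRS, YOURS))
```

Run something like this over every file in the renamed repo and commit the result on its own, so the rename stays in that single commit and can be left out when sending patches upstream.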

Sayonara upstream

Obviously if you’re taking over maintainership completely, there’s no problem. Just rename the paths and your version is the official one from now on. The easiest solution, but it’s not collaborative in spirit. It’s going to be fiddly for upstream to use your changes. If you at least kept your path renames in one separate commit, upstream can do some Magit juggling to keep up but it’s gonna be a chore for them.

Hybrid approaches

A gentler Sayonara

One way is to create a temporary repo with your changes (but with paths not renamed). Send your patches or pull requests from there. Then fork your own temporary repo into a separate location, and change the names of the paths there, and from then on it’s the “Sayonara upstream” approach. You give them one chance and then you move on.

This combines most of the drawbacks of the Sayonara approach but gives upstream the benefit of one drive-by commit. It’s not super collaborative but it’s ever-so-slightly more sociable than the pure Sayonara approach while being minimally more difficult and space-consuming.

A future Sayonara

Another way is to start with the “keep only their version canonical” approach, and stick with it as long as they’re alive (and by “alive” I only mean “an active maintainer”, no need to get morbid) and once they’ve moved on, you switch to the Sayonara approach.

This is not good because it’s an approach built only for optimists. You are presuming you’re gonna be alive when it’s time to make the switch when they’re gone. That’s not necessarily gonna work out. Unmaintained commits can end up in an unusable limbo.


The “keep both versions canonical” approach is the “best” in some sense of the word. When that’s too much of an effort, use one of the two hybrid approaches. All five of the approaches outlined here are severely compromised so I’m staring daggers at the go install design team.

by Idiomdrottning at Saturday, March 11, 2023

Butlerian Jihad

mnl wrote:

Most importantly, having a wide array of tools at your disposal is what allows you to be pragmatic. And pragmatically, large language models represent the biggest paradigm shift in programming that I have personally experienced. It feels an order of magnitude more life-changing than discovering Common Lisp, and I’m only 3 months into using these things intensely.

(And a few weeks back we had Tom Scott’s embarrassing and self-shaming video.)

My experience with the current generation (ChatGPT), for programming specifically, is that:

  • it suggests impossible things that can not ever be made to work even with tweaks
  • sends you down a rabbit hole of wrongness when what you would’ve needed instead was a blank slate and a clear perspective
  • it lies and says that it has tested things (even giving the specific version of the compiler it’s supposed to “work” on) without having done so

Now I don’t wanna base my anti-LLM sentiment on “the current generation doesn’t work very well” so I’m being very careful and deliberate in saying how those issues are specific to that one version.

(More generally, code is law and I don’t wanna be ruled by an ouroboros of law-generated law.)

But that said, my three issues with the present version are pragmatically & currently speaking enough of a showstopper for me to nope out for now.

Clogs in the cogs! is my rallying cry.

by Idiomdrottning at Saturday, March 11, 2023

Friday, March 10, 2023


Stray Bridge Musings

Five months ago, the ejabberd XMPP server dev team blogged that they were going to support Matrix.

While that hasn’t been committed to their “Community Edition” yet (and maybe won’t ever? Reading the comment section it seems like OMEMO/OLM conversion is kind of a tricky creature. But who knows what happens in the hearts of cathedrals), the post is still an enjoyable read for the snipes and jabs at Matrix:

Of course, by design, the Matrix protocol cannot scale as well as XMPP or MQTT protocols. At the heart of Matrix protocol, you have a kind of merging algorithm that reminds a bit of Google Wave. It means that a conversation is conceptually represented as a sort of document you constantly merge on the server. This is a consuming process that is happening on the server for each message received in all conversations. That’s why Matrix has the reputation to be so difficult to scale.

Wikipedia corroborates this:

The Matrix standard specifies RESTful HTTP APIs for securely transmitting and replicating JSON data between Matrix-capable clients, servers and services. Clients send data by PUTing it to a ‘room’ on their server, which then replicates the data over all the Matrix servers participating in this ‘room’. This data is signed using a git-style signature to mitigate tampering, and the federated traffic is encrypted with HTTPS and signed with each server’s private key to avoid spoofing.

Holy heck. IRC, XMPP and email only have to worry about one message at a time on the wire. Matrix needs to sign & send the entire universe.

Anyway, I was thinking about that ejabberd announcement today, since I’ve been on a bridges kick, prompted by my newfound enthusiasm for XMPP, after finding out that Bitlbee can send multiple-line messages.

On Matrix’ old bridges blog post they list a bunch of categories of bridges, and mention in passing s2s bridges:

Server-to-server bridging

Some remote protocols (IRC, XMPP, SIP, SMTP, NNTP, GnuSocial etc) support federation - either open or closed. The most elegant way of bridging to these protocols would be to have the bridge participate in the federation as a server, directly bridging the entire namespace into Matrix.
We’re not aware of anyone who’s done this yet.

I agree that these are the most interesting bridges and it seems to me that that’s sort of what the ejabberd team set out to do.

Similarly, there is a project underway called Libervia that seems like it’s both an XMPP client and an ActivityPub / Fediverse server. Curious stuff.♥


by Idiomdrottning at Friday, March 10, 2023

Thursday, March 9, 2023


Why it’s bad that the web is so feature-rich

The feature-rich web has a lot of advantages; the best web-apps are easy to learn, which is great since the learnability threshold is a huge problem with the wonderful world of Unix and worse-is-better that a lot of us are so enamored with. I remember when gratis webmail was first made widely available (with the launch of Rocketmail and Hotmail) and how it made email accessible to a lot of people who didn’t have access to it before: not only to library users, students and other people without ISPs, but also to people who couldn’t figure out how to use their ISP email (or to use it when they were away from home).

But there are also a couple of problems with the feature-rich web:

Lack of mashupability

The web has gone through a couple of stages.

First, it was basic text and hyperlinks, no frills, no formatting.

Then, we were in the font-family, table, img map, Java applet, Shockwave, QuickTime, Flash, broken-puzzle-piece-of-the-week hell for a few years.

Third, CSS was invented and the heavens parted and we had clarity and it was degrading gracefully and could be tweaked and fixed on the user side. (Halfway through the third stage, “Web 2.0”, which means web pages talking to the server without having to reload the entire page, was introduced. I don’t have an issue with that, inherently, and it worked throughout the third stage and we didn’t run into problems until the fourth stage.)

Fourth, client-side DOM generation and completely JavaScript-reliant web pages were invented and we were back in the bad place.

Why were stages one and three so great and stages two and four so bad? Because one and three worked well with scraping, mashing, alternate usages, command line access, other interfaces, automation, labor-saving and comfort-giving devices. Awk & sed. They were also free-flowing and intrinsically device-independent.

The web, as presented by stage two and four, is arrogantly made with only one view in mind. It’s like interacting with the site through a thick sheet of glass. Yes, it’s “responsive”, which is great if it really is (i.e. by being a stage one or three type web page), but often it’s just a cruel joke of “we are using JavaScript to check what we think your device is and then we are generating a view that we think suits it”. The “device independence” is limited and is only extrinsic.

Solution to mashupability

API endpoints and Atom feeds. The ActivityPub “fediverse” is built on feed tech so it’s got to be doing at least something right, even though it has a lot of problems.

And HTMX or Hotwire is a way to get the best of both worlds. Traditionally accessible webpages with modern polish.

Complexity causes security issues

Obviously when your tech is a pile of junk upon junk that you can’t even see the bottom of, let alone understand, you risk running into security issues.

When there’s stuff buried deep in your Rube Goldberg machines that you don’t fully understand, things are gonna break, which is bad if hackers can exploit it to wreck you but it’s also bad on its own, if it just breaks and you can’t repair it. The tragedy of Muine comes to mind; it was the best music player GUI of all time but it ended when the Mono platform it was built on was pulled out from under its feet.

“We have even developed a machine to take care of the machine. What if the machine that repairs the machine breaks?” — from Mad’s adaptation of E.M. Forster’s story “The Machine Stops”

Solution to complexity

Use, maintain, and teach full-stack (all the way down to the wires and metal), but feel free to stick to the subset of the stack that you’re actually using. Sometimes things fall by the wayside and that’s OK. I remember having to learn about “token-ring networks” in school and in hindsight that was wasted time; in the 25 years that have passed since then I have never used, seen, or even heard of them in real life now that we have Ethernet and wireless. (On the flipside, UDP was pretty rusty compared to TCP but has seen a resurgence with techs like Mosh and Wireguard.)

Profit-seeking exploiters

And then we do have the whole class of problems that is stemming from how for-profit entities use it. Trackers, exploiters, siloers, wasm miners, externality-abusers, path-dependency–seekers, monopolists, popup spammers, rootkits, pundits, modals, captchas, machine-generated text, SEO, bad typographers.

It’s a morass of badness out there.

Solution to profit-seekers

Storm the palace.


The core of my problem with the web is this:

If I wanna access a web resource and I can’t just wget it and parse the tags, if I instead need to boot up a huge eight megabyte Indigo Ramses Colossus (like Safari (a.k.a. Epiphany), Firefox, or Chromium) to do that, because otherwise there is no DOM because it needs to be put together in JavaScript, that’s a problem. “But why not access the web pages normally?”, these site devs ask. Because they are so bad and so inaccessible and so gunked up with user-exploiting bad design decisions.
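To make the “wget it and parse the tags” point concrete, here’s a small sketch using only Python’s standard library. The two page snippets are invented stand-ins for what a plain fetch would hand back from a stage-three page and from a JavaScript-only page; the class and function names are just for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found in the markup it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> list[str]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up examples of what a plain HTTP fetch might return.
STATIC_PAGE = '<p>See <a href="/posts/1">one</a> and <a href="/posts/2">two</a>.</p>'
JS_ONLY_PAGE = '<div id="root"></div><script src="/bundle.js"></script>'

print(extract_links(STATIC_PAGE))
print(extract_links(JS_ONLY_PAGE))
```

The first page gives up its links to a dozen lines of parsing. The second has nothing in it to parse until a full browser engine has run the script.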

The flipside is that the Unix aficionado’s solution (“use separate apps, like an IRC client, email app, XMPP app, newsreader app”) is inaccessible since not everyone knows their way around ./configure && make. So maybe the ultimate solution is both. Put your room on IRC but provide a web interface to it. Put your contacts on email but provide a web interface to that too.

by Idiomdrottning at Thursday, March 9, 2023

Tuesday, March 7, 2023


RFC 2646’s Format=Flowed and inline quoting

According to RFC 2646, email apps should:

Space-stuff lines which start with a space, “From ”, or “>”.

So I’m gonna argue against an RFC should here.

The intent behind this particular should is to prevent lines that normally start with a greater-than character from being misinterpreted as quoted text. I’m hard pressed to come up with even one example but I’m sure there is some math situation or something where it could conceivably come up. The intent was that there should be some sort of UI widget to explicitly mark text as “quoted”.

My recommendation is that MUAs (including webmail interfaces) should not space-stuff lines that start with a “>” character, and if there is a line that starts with “ >” (a single space in front of the >), leave those alone too.

That way, if math nerds want to send non-quoted lines that do start with a >, they still can, they just add a space in front of them manually.

Still space-stuff lines that start with “From ”, and lines (other than “ >”) that start with a space.
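The recommendation above, sketched as a per-line function in Python. The function name is made up, and this only covers the space-stuffing decision on the sending side, not the rest of Format=Flowed generation.

```python
def space_stuff(line: str) -> str:
    """Space-stuff one outgoing line following the recommendation above.

    Hypothetical helper, not taken from any existing MUA.
    """
    if line.startswith(" >"):
        # The writer put the space there on purpose to mean "not a quote",
        # so it already acts as the stuffing space. Leave it alone.
        return line
    if line.startswith(" ") or line.startswith("From "):
        return " " + line
    # Everything else goes out as typed, including lines starting with ">",
    # which the receiving end will show as quoted.
    return line


for example in [">quoted reply", " > 3 is the condition", "From now on", " indented", "plain"]:
    print(repr(space_stuff(example)))
```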

Format=Flowed is otherwise a fantastic format. It solves a lot of email’s problems. The next section (4.2) in the RFC is still great. Non–space-stuffed lines starting with a > are marked as quoted. Perfect, that’s exactly what we want. In other words, I am not suggesting changing the format or the on-the-wire protocol. I’m only talking about a way to make generating this format more palatable for text area based interfaces.

Email is a text format and having to reach for a formatting menu or toolbar just to mark some text as quoted falls apart pretty quickly, as does trying to interleave your own responses in between the quoted lines. Those interfaces work for top-posting and bottom-posting, and that’s great, but they make inline-posting impossible.

Markdown to Format=Flowed

There are some variants of markdown where every linebreak is hard, but in original markdown you mark hard linebreaks with two spaces at the end of the line (before the break), and blank lines are also hard.

Format=Flowed is the other way around. Soft linebreaks are marked with a single space at the end of the line, before the break.

Therefore, to convert, one way is to add a space to the end of every line, then remove three spaces from lines that now end with three spaces, then remove one space from lines that only contain that space and nothing else, then remove one space from lines before a blank line.

Another way to do it is to go from top to bottom; if the current line ends with two spaces, just remove them, otherwise, if there is text (the line is non-blank) on both the current and the next line, add one space to the current line.
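The top-to-bottom variant could be sketched like this in Python. It’s a minimal version under some assumptions: input lines are separated by plain "\n", a hard break is exactly two trailing spaces, and quoting, space-stuffing, and the signature separator are not handled. The function name is made up.

```python
def markdown_to_flowed(text: str) -> str:
    """Convert original-markdown line breaks to Format=Flowed line breaks."""
    lines = text.split("\n")
    out = []
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if line.endswith("  "):
            # Markdown hard break: drop the marker so the break stays hard.
            out.append(line[:-2])
        elif line.strip() and next_line.strip():
            # Text on this line and the next: mark the break as soft.
            out.append(line + " ")
        else:
            # Blank line, last line, or line before a blank line: unchanged.
            out.append(line)
    return "\n".join(out)


sample = "first line\nsecond line  \nthird line\n\nnew paragraph"
print(repr(markdown_to_flowed(sample)))
```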

by Idiomdrottning at Tuesday, March 7, 2023

Monday, March 6, 2023


The IRC client optimized for creepers

I’ve been seeing the Matrix bridge to IRC networks like Libera as an overall good thing. IRC has been dying and this bridge is, in some sense of the word, an IRC client, and it’s an IRC client that people can get behind. Don’t ask me why they’re so into it because I don’t like using it, but that’s the point of clients and protocols: you use a client you like, I use a client I like, it’s all kumbaya and good.

However, IRC is set up so that you can join channels / rooms about topics and talk about topics there. It’s not set up so you can just go in and take your pick of users all across the network and start a convo with a rando. I know that /names with channel and server omitted is supposed to list all names, but that’s disabled on Libera.

Of course, there is /query and /msg, and that’s been great for talking to friends.

But lately I’ve been getting a ton of creepy queries from complete rando strangers who seem to believe they’re on Tinder or something and I’ve been wondering what channel they find me in. They don’t seem to know who I am or to be familiar with this web page.

They never say who they are, just a bunch of hello. And if I ask questions, like how the heck they found me or who they are, it’s “Can’t we get to know each other first dear”. I don’t think so.

Today I found out where they are all coming from and why there’s no connection to a particular room, channel, or topic.

Turns out Matrix has a user directory and people search for Sandra (not sure why they do that if they don’t know me) and they find my Libera account.

Screenshot of searching for Sandra on Element. Several users come up, not just me.

Now, it’s not like it’d be a rock solid situation if Matrix didn’t exist. There’s no stopping someone from just going on Libera and /whois some random names, including mine. And I get plenty of creepers on email already.

But it’s a bit weird that I try to stay off the main networks, I try to stick to smolnet, but because of bridges the larger networks are right at my own door anyway.


I don’t want to discourage readers of this homepage from writing in (I get so many good corrections, clarifications, suggestions, questions from y’all), and this Matrix address is one legitimate way to do that (although I prefer email even though I’m also on Libera, Tilde, OFTC, XMPP, and Fedi). Using that Matrix address isn’t gonna break anything on my end. It shows up like any other query in my Soju. I might not see it right away, because I’m not hanging out on IRC all day.

If any of the Matrix email bridges gets dusted off and ends up working well, maybe even with encryption (I have TLS for the wire, and WKD and Autocrypt and an exported key for e2ee), that’d be even better since email is better than IRC.

In the end, Matrix is just a client, and I don’t wanna prevent people from using whatever client they’re comfy with.

Matrix users are often nervous on IRC, “is my client being annoying?” No, I’ve never had a problem. I wish your clients could read/parse our escape codes (like bold or italic) but it’s such a minor issue that I don’t even notice. I want IRC and e-mail to live, and if Matrix is one way for that to happen then that’s a good thing.

Like, I use Emacs to connect to IRC, e-mail, and XMPP. Is someone preferring to use Matrix to connect to IRC, e-mail, and XMPP really that much weirder?

Of course, the fear is that Matrix will embrace & extend those protocols and eventually drop support for them once everyone is on Matrix.

So to summarize: I’m not complaining about readers writing in.

What I’m complaining about is scrubs who are just like “🤤🤤🤤 I wanna search for random girl names in this text box and chat them up”.

by Idiomdrottning at Monday, March 6, 2023