# Planet Scheme

## Tuesday, April 13, 2021

### Programming Praxis

#### Geothmetic Meandian

In mathematics, the arithmetic geometric mean of two positive real numbers is computed by repeatedly taking half their sum and the square root of their product until the two numbers converge. For instance, the arithmetic geometric mean of 24 and 6 is 13.456171…, with the iterative steps computed as follows:

0  24                            6
1  15                           12
2  13.5                         13.416407864998738175455042
3  13.458203932499369089227521  13.458139030990984877207090
4  13.458171481745176983217305  13.458171481706053858316334
5  13.458171481725615420766820  13.458171481725615420766806

The arithmetic geometric mean was invented by Lagrange and studied by Gauss, and is used today to compute various transcendental functions because it converges so quickly.

In the world of Randall Munroe, the geothmetic meandian of any set of positive numbers is computed by iterating three sequences — the arithmetic mean, the geometric mean, and the median — until they converge. For instance, the geothmetic meandian of the set (1,1,2,3,5) is 2.089, computed as follows:

1  2.4                1.9743504858348200 2
2  2.1247834952782734 2.1161924605448084 2
3  2.0803253186076938 2.0795368194795802 2.1161924605448084
4  2.0920181995440275 2.0919486049152223 2.0803253186076938
5  2.0880973743556477 2.0880901331209600 2.0919486049152223
6  2.0893787041306098 2.0893779142184865 2.0880973743556477
7  2.0889513309015815 2.0889512436159920 2.0893779142184865
8  2.0890934962453533 2.0890934865653277 2.0889513309015815
9  2.0890461045707540 2.0890461034958396 2.0890934865653277

[ I hate the damned new editor at WordPress; I struggle with it every time I post an exercise. I could not figure out how to embed the image from XKCD in this blog post. You can see it here. ]

Your task is to write programs that compute the arithmetic geometric mean and geothmetic meandian. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Once I realised than a file system tree is like a MindMap…

Contrary to the name, it is actually in Chibi, not in scsh. I initially thought that scsh would be better due to more exensive posix support, but it turned out to be that Chibi was good enough.

It is a small-ish (500 lines of code) script to generate a graph from your filesystem tree. It accepts a few options (editable directly at the file top) and duplicates quite a lot of the GNU Find functionality, but I didn’t find a way to avoid doing that, as it has to use heuristics in order to prune the tree to a reasonable size.

The resulting image is like this:

Small, 1Mb

Large, 44Mb

I plotted the Slarm64 (Unofficial Slackware for Raspberry Pi) repository tree, just for the demonstration purposes.

The size of the images above is 1×2.5 metres. It’s large, but my original goal was to plot my whole file system. The ’size=’ parameter is tunable. I think it is reasonable to assume that you need to have at least 4 square centimetres per node, so a graph that large would accommodate about 4000 nodes. In my opinion, 8000 is still possible, but too tight.

With the default settings the script ignores regular files, but traverses symlinks. In theory it also supports hardlinks, but you would need to turn on drawing regular files manually.

I made this script, because I started to feel that I am starting to forget what I have on my hard drive, that has amassed quite a lot of life history for the past 20 years. (Since hard drives became reasonably priced.)

Use-cases and pull requests welcome. One more reason to create this script was to prove that Scheme can be a practical programming language.

Technologically, this code is not terribly advanced, the only trick that may be interesting to programming nerds is having the r7rs module and the main function in the same file (like scsh/scheme48 suggest doing), which requires procedural module analysis.

I had to glue on a couple of C bindings for sys/xattr.h, those are now available at the Snow-Fort repo. Those are Chibi-specific.

Hope you will enjoy it.

## Monday, April 12, 2021

### GNU Guix

#### New Supported Platform: powerpc64le-linux

It is a pleasure to announce that support for powerpc64le-linux (PowerISA v.2.07 and later) has now been merged to the master branch of GNU Guix!

This means that GNU Guix can be used immediately on this platform from a Git checkout. Starting with the next release (Guix v1.2.1), you will also be able to download a copy of Guix pre-built for powerpc64le-linux. Regardless of how you get it, you can run the new powerpc64le-linux port of GNU Guix on top of any existing powerpc64le GNU/Linux distribution.

This new platform is available as a "technology preview". This means that although it is supported, substitutes are not yet available from the build farm, and some packages may fail to build. Although powerpc64le-linux support is nascent, the Guix community is actively working on improving it, and this is a great time to get involved!

### Why Is This Important?

This is important because it means that GNU Guix now works on the Talos II, Talos II Lite, and Blackbird mainboards sold by Raptor Computing Systems. This modern, performant hardware uses IBM POWER9 processors, and it is designed to respect your freedom. The Talos II and Talos II Lite have recently received Respects Your Freedom (RYF) certification from the FSF, and Raptor Computing Systems is currently pursuing RYF certification for the more affordable Blackbird, too. All of this hardware can run without any non-free code, even the bootloader and firmware. In other words, this is a freedom-friendly hardware platform that aligns well with GNU Guix's commitment to software freedom.

How is this any different from existing RYF hardware, you might ask? One reason is performance. The existing RYF laptops, mainboards, and workstations can only really be used with Intel Core Duo or AMD Opteron processors. Those processors were released over 15 years ago. Since then, processor performance has increased drastically. People should not have to choose between performance and freedom, but for many years that is exactly what we were forced to do. However, the POWER9 machines sold by Raptor Computing Systems have changed this: the free software community now has an RYF-certified option that can compete with the performance of modern Intel and AMD systems.

Although the performance of POWER9 processors is competitive with modern Intel and AMD processors, the real advantage of the Talos II, Talos II Lite, and Blackbird is that they were designed from the start to respect your freedom. Modern processors from both Intel and AMD include back doors over which you are given no control. Even though the back doors can be removed with significant effort on older hardware in some cases, this is an obstacle that nobody should have to overcome just to control their own computer. Many of the existing RYF-certified options (e.g., the venerable Lenovo x200) use hardware that can only be considered RYF-certified after someone has gone through the extra effort of removing those back doors. No such obstacles exist when using the Talos II, Talos II Lite, or Blackbird. In fact, although Intel and AMD both go out of their way to keep you from understanding what is going on in your own computer, Raptor Computing Systems releases all of the software and firmware used in their boards as free software. They even include circuit diagrams when they ship you the machine!

Compared to the existing options, the Talos II, Talos II Lite, and Blackbird are a breath of fresh air that the free software community really deserves. Raptor Computing Systems' commitment to software freedom and owner control is an inspiring reminder that it is possible to ship a great product while still respecting the freedom of your customers. And going forward, the future looks bright for the open, royalty-free Power ISA stewarded by the OpenPOWER Foundation, which is now a Linux Foundation project (see also: the same announcement from the OpenPOWER Foundation.

In the rest of this blog post, we will discuss the steps we took to port Guix to powerpc64le-linux, the issues we encountered, and the steps we can take going forward to further solidify support for this exciting new platform.

### Bootstrapping powerpc64le-linux: A Journey

To build software, you need software. How can one port Guix to a platform before support for that platform exists? This is a bootstrapping problem.

In Guix, all software for a given platform (e.g., powerpc64le-linux) is built starting from a small set of "bootstrap binaries". These are binaries of Guile, GCC, Binutils, libc, and a few other packages, pre-built for the relevant platform. It is intended that the bootstrap binaries are the only pieces of software in the entire package collection that Guix cannot build from source. In practice, additional bootstrap roots are possible, but introducing them in Guix is highly discouraged, and our community actively works to reduce our overall bootstrap footprint. There is one set of bootstrap binaries for each platform that Guix supports.

This means that to port Guix to a new platform, you must first build the bootstrap binaries for that platform. In theory, you can do this in many ways. For example, you might try to manually compile them on an existing system. However, Guix has package definitions that you can use to build them - using Guix, of course!

Commonly, the first step in porting Guix to a new platform is to use Guix to cross-compile the bootstrap binaries for that new platform from a platform on which Guix is already supported. This can be done by running a command like the following on a system where Guix is already installed:

guix build --target=powerpc64le-linux-gnu bootstrap-tarballs

This is the route that we took when building the powerpc64le-linux bootstrap binaries, as described in commit 8a1118a. You might wonder why the target above is "powerpc64le-linux-gnu" even though the new Guix platform is called "powerpc64le-linux". This is because "powerpc64le-linux-gnu" is a GNU triplet identifying the new platform, but "powerpc64le-linux" is the name of a "system" (i.e., a platform) in Guix. Guix contains code that converts between the two as needed (see nix-system->gnu-triplet and gnu-triplet->nix-system in guix/utils.scm. When cross-compiling, you only need to specify the GNU triplet.

Note that before you can even do this, you must first update the glibc-dynamic-linker and system->linux-architecture procedures in Guix's code, as described in Porting. In addition, the versions of packages in Guix that make up the GNU toolchain (gcc, glibc, etc.) must already support the target platform. This pre-existing toolchain support needs to be good enough so that Guix can (1) build, on some already-supported platform, a cross-compilation toolchain for the target platform, (2) use, on the already-supported platform, the cross-compilation toolchain to cross-compile the bootstrap binaries for the target platform, and (3) use, on the target platform, the bootstrap binaries to natively build the rest of the Guix package collection. The above guix build command takes care of steps (1) and (2) automatically.

Step (3) is a little more involved. Once the bootstrap binaries for the target platform have been built, they must be published online for anyone to download. After that, Guix's code must be updated so that (a) it recognizes the "system" name (e.g., "powerpc64le-linux") that will be used to identify the new platform and (b) it fetches the new platform's bootstrap binaries from the right location. After all that is done, you just have to try building things and see what breaks. For example, you can run ./pre-inst-env guix build hello from your Git checkout to try building GNU Hello.

The actual bootstrap binaries for powerpc64le-linux are stored on the alpha.gnu.org FTP server. Chris Marusich built these bootstrap binaries in an x86_64-linux Guix System VM which was running on hardware owned by Léo Le Bouter. Chris then signed the binaries and provided them to Ludovic Courtès, who in turn verified their authenticity, signed them, and uploaded them to alpha.gnu.org. After that, we updated the code to use the newly published bootstrap binaries in commit 8a1118a. Once all that was done, we could begin bootstrapping the rest of the system - or trying to, at least.

There were many stumbling blocks. For example, to resolve some test failures, we had to update the code in Guix that enables it to make certain syscalls from scheme. In another example, we had to patch GCC so that it looks for the 64-bit libraries in /lib, rather than /lib64, since that is where Guix puts its 64-bit libraries by convention. In addition, some packages required in order to build Guix failed to build, so we had to debug those build failures, too.

For a list of all the changes, see the patch series or the actual commits, which are:

\$ git log --oneline --no-decorate 8a1118a96c9ae128302c3d435ae77cb3dd693aea^..65c46e79e0495fe4d32f6f2725d7233fff10fd70
65c46e79e0 gnu: sed: Make it build on SELinux-enabled kernels.
93f21e1a35 utils: Fix target-64bit? on powerpc64le-linux.
c29bfbfc78 syscalls: Fix RNDADDTOENTCNT on powerpc64le-linux.
b57de27d03 syscalls: Fix clone on powerpc64le-linux.
a16eb6c5f9 Add powerpc64le-linux as a supported Guix architecture.
b50f426803 gnu: libelf: Fix compilation for powerpc64le-linux.
1a0f4013d3 gnu: texlive-latex-base: Fix compilation on powerpc64le*.
e9938dc8f0 gnu: texlive-bin: Fix compilation on powerpc64le*.
69b3907adf gnu: guile-avahi: Fix compilation on powerpc64le-linux.
4cc2d2aa59 gnu: bdb-4.8: Fix configure on powerpc64le-linux.
be4b1cf53b gnu: binutils-final: Support more Power architectures.
060478c32c gnu: binutils-final: Provide bash for binary on powerpc-linux.
b2135b5d57 gnu: gcc-boot0: Enable 128-bit long double for POWER9.
6e98e9ca92 gnu: glibc: Fix ldd path on powerpc*.
cac88b28b8 gnu: gcc-4.7: On powerpc64le, fix /lib64 references.
8a1118a96c gnu: bootstrap: Add support for powerpc64le-linux.

In the end, through the combined efforts of multiple people, we slowly worked through the issues until we reached a point where we could do all of the following things successfully:

• Build Guix manually on a Debian GNU/Linux ppc64el machine (this is Debian's name for a system using the powerpc64le-linux-gnu triplet), and verify that its make check tests passed.
• Build GNU Hello using Guix and run it.
• Run guix pull to build and install the most recent version of Guix, with powerpc64le-linux support.
• Build a release binary tarball for powerpc64le-linux via: make guix-binary.powerpc64le-linux.tar.xz
• Use that binary to install a version of Guix that could build/run GNU Hello and run guix pull successfully.

This was an exciting moment! But there was still more work to be done.

Originally, we did this work on the wip-ppc64le branch, with the intent of merging it into core-updates. By convention, the "core-updates" branch in Guix is where changes are made if they cause too many rebuilds. Since we were updating package definitions so deep in the dependency graph of the package collection, we assumed it wouldn't be possible to avoid rebuilding the world. For this reason, we had based the wip-ppc64le branch on core-updates.

However, Efraim Flashner proved us wrong! He created a separate branch, wip-ppc64le-for-master, where he adjusted some of the wip-ppc64le commits to avoid rebuilding the world on other platforms. Thanks to his work, we were able to merge the changes directly to master! This meant that we would be able to include it in the next release (Guix v.1.2.1).

In short, the initial porting work is done, and it is now possible for anyone to easily try out Guix on this new platform. Because guix pull works, too, it is also easy to iterate on what we have and work towards improving support for the platform. It took a lot of cooperation and effort to get this far, but there are multiple people actively contributing to this port in the Guix community who want to see it succeed. We hope you will join us in exploring the limits of this exciting new freedom-friendly platform!

### Other Porting Challenges

Very early in the porting process, there were some other problems that stymied our work.

First, we actually thought we would try to port to powerpc64-linux (big-endian). However, this did not prove to be any easier than the little-endian port. In addition, other distributions (e.g., Debian and Fedora) have recently dropped their big-endian powerpc64 ports, so the little-endian variant is more likely to be tested and supported in the community. For these reasons, we decided to focus our efforts on the little-endian variant, and so far we haven't looked back.

In both the big-endian and little-endian case, we were saddened to discover that the bootstrap binaries are not entirely reproducible. This fact is documented in bug 41669, along with our extensive investigations.

In short, if you build the bootstrap binaries on two separate machines without using any substitutes, you will find that the derivation which cross-compiles %gcc-static (the bootstrap GCC, version 5.5.0) produces different output on the two systems. However, if you build %gcc-static twice on the same system, it builds reproducibly. This suggests that something in the transitive closure of inputs of %gcc-static is perhaps contributing to its non-reproducibility. There is an interesting graph toward the end of the bug report, shown below:

This graph shows the derivations that produce differing outputs across two Guix System machines, when everything is built without substitutes. It starts from the derivation that cross-compiles %gcc-static for powerpc64-linux-gnu (from x86_64-linux) using Guix at commit 1ced8379c7641788fa607b19b7a66d18f045362b. Then, it walks the graph of derivation inputs, recording only those derivations which produce differing output on the two different machines. If the non-reproducibility (across systems) of %gcc-static is caused by a non-reproducible input, then it is probably caused by one or more of the derivations shown in this graph.

At some point, you have to cut your losses and move on. After months of investigation without resolving the reproducibility issue, we finally decided to move forward with the bootstrap binaries produced earlier. If necessary, we can always go back and try to fix this issue. However, it seemed more important to get started with the bootstrapping work.

Anyone who is interested in solving this problem is welcome to comment on the bug report and help us to figure out the mystery. We are very interested in solving it, but at the moment we are more focused on building the rest of the Guix package collection on the powerpc64le-linux platform using the existing bootstrap binaries.

### Next Steps

It is now possible to install Guix on a powerpc64le-linux system and use it to build some useful software - in particular, Guix itself. So Guix is now "self-hosted" on this platform, which gives us a comfortable place to begin further work.

The following tasks still need to be done. Anyone can help, so please get in touch if you want to contribute!

• Solve the GCC bootstrap binary reproducibility issue described above.
• Get Guix System to work on powerpc64le-linux.
• Get CI infrastructure to work (Cuirass (see also: Cuirass in the Guix manual, guix-build-coordinator (see also: Guix Build Coordinator in the Guix manual, substitutes, etc.)
• Try to build your favorite packages using Guix, report problems, try to fix them, and ask for help if you're feeling stuck or not sure how to start.
• Try building rust, and if it works, judiciously re-introduce the librsvg dependency for powerpc64le-linux in gtk+ and gtk+-2, since it is currently missing.
• Upgrade the default GCC to 8 on core-updates, try to build guix (e.g., ./pre-inst-env guix build guix), and report/fix whatever issues occur. We want to upgrade GCC to 8 because, on the core-updates branch, glibc has been upgraded from 2.31 to 2.32. Unfortunately, on powerpc64le-linux, upgrading glibc from 2.31 to 2.32 without also upgrading the default GCC (it's currently 7.5.0) causes a lot of problems. Right now, we believe the best path forward is probably just to upgrade to GCC 8 on core-updates.
• Merge core-updates to master after that.

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

## Sunday, April 11, 2021

### Andy Wingo

Good evening! A brief(ish?) note today about some Guile nargery.

the arc of history

Like many language implementations that started life when you could turn on the radio and expect to hear Def Leppard, Guile has a bottom half and a top half. The bottom half is written in C and exposes a shared library and an executable, and the top half is written in the language itself (Scheme, in the case of Guile) and somehow loaded by the C code when the language implementation starts.

Since 2010 or so we have been working at replacing bits written in C with bits written in Scheme. Last week's missive was about replacing the implementation of dynamic-link from using the libltdl library to using Scheme on top of a low-level dlopen wrapper. I've written about rewriting eval in Scheme, and more recently about how the road to getting the performance of C implementations in Scheme has been sometimes long.

These rewrites have a quixotic aspect to them. I feel something in my gut about rightness and wrongness and I know at a base level that moving from C to Scheme is the right thing. Much of it is completely irrational and can be out of place in a lot of contexts -- like if you have a task to get done for a customer, you need to sit and think about minimal steps from here to the goal and the gut doesn't have much of a role to play in how you get there. But it's nice to have a project where you can do a thing in the way you'd like, and if it takes 10 years, that's fine.

But besides the ineffable motivations, there are concrete advantages to rewriting something in Scheme. I find Scheme code to be more maintainable, yes, and more secure relative to the common pitfalls of C, obviously. It decreases the amount of work I will have when one day I rewrite Guile's garbage collector. But also, Scheme code gets things that C can't have: tail calls, resumable delimited continuations, run-time instrumentation, and so on.

Taking delimited continuations as an example, five years ago or so I wrote a lightweight concurrency facility for Guile, modelled on Parallel Concurrent ML. It lets millions of fibers to exist on a system. When a fiber would need to block on an I/O operation (read or write), instead it suspends its continuation, and arranges to restart it when the operation becomes possible.

A lot had to change in Guile for this to become a reality. Firstly, delimited continuations themselves. Later, a complete rewrite of the top half of the ports facility in Scheme, to allow port operations to suspend and resume. Many of the barriers to resumable fibers were removed, but the Fibers manual still names quite a few.

Which brings us to today's note: I just rewrote Guile's reader in Scheme too! The reader is the bit that takes a stream of characters and parses it into S-expressions. It was in C, and now is in Scheme.

One of the primary motivators for this was to allow read to be suspendable. With this change, read-eval-print loops are now implementable on fibers.

Another motivation was to finally fix a bug in which Guile couldn't record source locations for some kinds of datums. It used to be that Guile would use a weak-key hash table to associate datums returned from read with source locations. But this only works for fresh values, not for immediate values like small integers or characters, nor does it work for globally unique non-immediates like keywords and symbols. So for these, we just wouldn't have any source locations.

A robust solution to that problem is to return annotated objects rather than using a side table. Since Scheme's macro expander is already set to work with annotated objects (syntax objects), a new read-syntax interface would do us a treat.

With read in C, this was hard to do. But with read in Scheme, it was no problem to implement. Adapting the expander to expect source locations inside syntax objects was a bit fiddly, though, and the resulting increase in source location information makes the output files bigger by a few percent -- due somewhat to the increased size of the .debug_lines DWARF data, but also due to serialized source locations for syntax objects in macros.

Speed-wise, switching to read in Scheme is a regression, currently. The old reader could parse around 15 or 16 megabytes per second when recording source locations on this laptop, or around 22 or 23 MB/s with source locations off. The new one parses more like 10.5 MB/s, or 13.5 MB/s with positions off, when in the old mode where it uses a weak-key side table to record source locations. The new read-syntax runs at around 12 MB/s. We'll be noodling at these in the coming months, but unlike when the original reader was written, at least now the reader is mainly used only at compile time. (It still has a role when reading s-expressions as data, so there is still a reason to make it fast.)

As is the case with eval, we still have a C version of the reader available for bootstrapping purposes, before the Scheme version is loaded. Happily, with this rewrite I was able to remove all of the cruft from the C reader related to non-default lexical syntax, which simplifies maintenance going forward.

An interesting aspect of attempting to make a bug-for-bug rewrite is that you find bugs and unexpected behavior. For example, it turns out that since the dawn of time, Guile always read #t and #f without requiring a terminating delimiter, so reading "(#t1)" would result in the list (#t 1). Weird, right? Weirder still, when the #true and #false aliases were added to the language, Guile decided to support them by default, but in an oddly backwards-compatible way... so "(#false1)" reads as (#f 1) but "(#falsa1)" reads as (#f alsa1). Quite a few more things like that.

All in all it would seem to be a successful rewrite, introducing no new behavior, even producing the same errors. However, this is not the case for backtraces, which can expose the guts of read in cases where that previously wouldn't happen because the C stack was opaque to Scheme. Probably we will simply need to add more sensible error handling around callers to read, as a backtrace isn't a good user-facing error anyway.

OK enough rambling for this evening. Happy hacking to all and to all a good night!

### Joe Marshall

#### Can continuation passing style code perform well?

Continuation passing style is a powerful technique that allows you to abstract over control flow in your program. Here is a simple example: We want to look things up in a table, but sometimes the key we use is not associated with any value. In that case, we have to do something different, but the lookup code doesn't know what the caller wants to do, and the caller doesn't know how the lookup code works. Typically, we would arrange for the lookup code to return a special “key not found” value:

(let ((answer (lookup key table)))
... handle missing key ...
... compute something with answer...)

There are two minor problems with this approach. First, the “key not found” value has to be within the type returned by lookup. Consider a table that can only contain integers. Unfortunately, we cannot declare answer to be an integer because it might be the “key not found” value. Alternatively, we might decide to reserve a special integer to indicate “key not found”. The answer can then be declared an integer, but there is now a magic integer that cannot be stored in the table. Either way, answer is a supertype of what can be stored in the table, and we have to project it back down by testing it against “key not found”.

The second problem is one of redundancy. Presumably, somewhere in the code for lookup there is a conditional for the case that the key hasn't been found. We take a branch and return the “key not found” value. But now the caller tests the return value against “key not found” and it, too, takes a branch. We only take the true branch in the caller if the true branch was taken in the callee and we only take the false branch in the caller if the false branch was taken in the callee. In essence, we are branching on the exact same condition twice. We've reified the control flow, injected the reified value into the space of possible return values, passed it through the function call boundary, then projected and reflected the value back into control flow at the call site.

If we write this in continuation passing style, the call looks like this

(lookup key table
…compute something with answer…)
(lambda ()
…handle missing key…))
lookup will invoke the first lambda expression on the answer if it is found, but it will invoke the second lambda expression if the answer is not found. We no longer have a special “key not found” value, so answer can be exactly the type of what is stored in the table and we don't have to reserve a magic value. There is also no redundant conditional test in the caller.

This is pretty cool, but there is a cost. The first is that it takes practice to read continuation passing style code. I suppose it takes practice to read any code, but some languages make it extra cumbersome to pass around the lambda expressions. (Some seem actively hostile to the idea.) It's just more obscure to be passing around continuations when direct style will do.

The second cost is one of performance and efficiency. The lambda expressions that you pass in to a continuation passing style program will have to be closed in the caller's environment, and this likely means storage allocation. When the callee invokes one of the continuations, it has to perform a function call. Finally, the lexically scoped variables in the continuation will have to be fetched from the closure's environment. Direct style performs better because it avoids all the lexical closure machinery and can keep variables in the local stack frame. For these reasons, you might have reservations about writing code in continuation passing style if it needs to perform.

Continuation passing style looks complicated, but you don't need a Sufficiently Smart compiler to generate efficient code from it. Here is lookup coded up to illustrate:

(defun lookup (key table if-found if-not-found)
(labels ((scan-entries (entries)
(cond ((null entries) (funcall if-not-found))
((eq (caar entries) key) (funcall if-found (cdar entries)))
(t (scan-entries (cdr entries))))))
(scan-entries table)))
and a sample use might be
(defun probe (thing)
(lookup thing *special-table*
(lambda (value) (format t "~s maps to ~s." thing value))
(lambda () (format t "~s is not special." thing))))

Normally, probe would have to allocate two closures to pass in to lookup, and the code in each closure would have to fetch the lexical value of key from the closure. But without changing either lookup or probe we can (declaim (inline lookup)). Obviously, inlining the call will eliminate the overhead of a function call, but watch what happens to the closures:

(defun probe (thing)
((lambda (key table if-found if-not-found)
(labels ((scan-entries (table)
(cond ((null entries) (funcall if-not-found))
((eq (caar entries) key) (funcall if-found (cdar entries)))
(t (scan-entries (cdr entries))))))
(scan-entries table)))
thing *special-table*
(lambda (value) (format t "~s maps to ~s." thing value))
(lambda () (format t "~s has no mapping." thing))))
A Decent Compiler will easily notice that key is just an alias for thing and that table is just an alias for *special-table*, so we get:
(defun probe (thing)
((lambda (if-found if-not-found)
(labels ((scan-entries (entries)
(cond ((null entries) (funcall if-not-found))
((eq (caar entries) thing) (funcall if-found (cdar entries)))
(t (scan-entries (cdr entries))))))
(scan-entries *special-table*)))
(lambda (value) (format t "~s maps to ~s." thing value))
(lambda () (format t "~s has no mapping." thing))))
and the expressions for if-found and if-not-found are side-effect free, so they can be inlined (and we expect the compiler to correctly avoid unexpected variable capture):
(defun probe (thing)
((lambda ()
(labels ((scan-entries (entries)
(cond ((null entries)
(funcall (lambda () (format t "~s has no mapping." thing))))
((eq (caar entries) thing)
(funcall (lambda (value) (format t "~s maps to ~s." thing value))
(cdar entries)))
(t (scan-entries (cdr entries))))))
(scan-entries *special-table*)))))
and the immediate calls to literal lambdas can be removed:
(defun probe (thing)
(labels ((scan-entries (entries)
(cond ((null entries) (format t "~s has no mapping." thing))
((eq (caar entries) thing)
(format t "~s maps to ~s." thing (cdar value))))
(t (scan-entries (cdr entries))))))
(scan-entries *special-table*)))

Our Decent Compiler has removed all the lexical closure machinery and turned the calls to the continuations into direct code. This code has all the features we desire: there is no special “key not found” value to screw up our types, there is no redundant branch: the (null entries) test directly branches into the appropriate handling code, we do not allocate closures, and the variables that would have been closed over are now directly apparent in the frame.

It's a bit vacuous to observe that an inlined function performs better. Of course it does. At the very least you avoid a procedure call. But if you inline a continuation passing style function, any Decent Compiler will go to town and optimize away the continuation overhead. It's an unexpected bonus.

On occasion I find that continuation passing style is just the abstraction for certain code that is also performance critical. I don't give it a second thought. Continuation passing style can result in high-performance code if you simply inline the critical calls.

## Thursday, April 8, 2021

### Andy Wingo

#### sign of the times

Hello all! There is a mounting backlog of things that landed in Guile recently and to avoid having to eat the whole plate in one bite, I'm going to try to send some shorter missives over the next weeks.

Today's is about a silly thing, dynamic-link. This interface is dlopen, but "portable". See, back in the day -- like, 1998 -- there were lots of kinds of systems and how to make and load a shared library portably was hard. You'd have people with AIX and Solaris and all kinds of weird compilers and linkers filing bugs on your project if you hard-coded a GNU toolchain invocation when creating loadable extensions, or hard-coded dlopen or similar to use them.

Libtool provided a solution to create portable loadable libraries, which involved installing .la files alongside the .so files. You could use libtool to link them to a library or an executable, or you could load them at run-time via the libtool-provided libltdl library.

But, the .la files were a second source of truth, and thus a source of bugs. If a .la file is present, so is an .so file, and you could always just use the .so file directly. For linking against an installed shared library on modern toolchains, the .la files are strictly redundant. Therefore, all GNU/Linux distributions just delete installed .la files -- Fedora, Debian, and even Guix do so.

Fast-forward to today: there has been a winnowing of platforms, and a widening of the GNU toolchain (in which I include LLVM as well as it has a mostly-compatible interface). The only remaining toolchain flavors are GNU and Windows, from the point of view of creating loadable shared libraries. Whether you use libtool or not to create shared libraries, the result can be loaded either way. And from the user side, dlopen is the universally supported interface, outside of Windows; even Mac OS fixed their implementation a few years back.

So in Guile we have been in an unstable equilibrium: creating shared libraries by including a probably-useless libtool into the toolchain, and loading them by using a probably-useless libtool-provided libltdl.

But the use of libltdl has not been without its costs. Because libltdl intends to abstract over different platforms, it encourages you to leave off the extension when loading a library, instead promising to try a platform-specific set such as .so, .dll, .dylib etc as appropriate. In practice the abstraction layer was under-maintained and we always had some problems on Mac OS, for example.

Worse, as ltdl would search through the path for candidates, it would only report the last error it saw from the underlying dlopen interface. It was almost always the case that if A and B were in the search path, and A/foo.so failed to load because of a missing dependency, the error you would get as a user would instead be "file not found", because ltdl swallowed the first error and kept trucking to try to load B/foo.so which didn't exist.

In summary, this is a case where the benefits of an abstraction layer decline over time. For a few years now, libltdl hasn't been paying for itself. Libtool is dead, for all intents and purposes (last release in 2015); best to make plans to migrate away, somehow.

In the case of the dlopen replacement, in Guile we ended up rewriting the functionality in Scheme. The underlying facility is now just plain dlopen, for which we shim a version of dlopen on Windows, inspired by the implementation in cygwin. There are still platform-specific library extensions, but that is handled easily on the Scheme layer.

Looking forward, I think it's probably time to replace Guile's use of libtool to create its libraries and executables. I loathe the fact that libtool puts shell scripts in the place of executables in build directories and stashes the actual executables elsewhere -- like, visceral revulsion. There is no need for that nowadays. Not sure what to replace it with, nor on what timeline.

And what about autotools? That, my friends, would be a whole nother blog post. Until then, & probably sooner, happy hacking!

### GNU Guix

#### Outreachy 'guix git log' internship wrap-up

Magali Lemes joined Guix in December for a three-month internship with Outreachy. Magali implemented a guix git log command to browse the history of packaging changes, with mentoring from Simon Tournier and Gábor Boskovits. In this blog post, Magali and Simon wrap up on what's been accomplished.

#### Magali

The first tasks I had to do were pretty simple and were mainly meant to both get me acquainted with the source code and set the building blocks of the project. They were very important so that I could gradually build the subcommand. For starters, I created a Guix repository on Gitlab, so that I could push all the work I had done there, tweaked a little bit of the source code, and then proceeded to have the guix git log subcommand show the default channel checkout path.

From there on, I started adding options to the subcommand. The oneline option was the first and simplest option, and it pretty much emulates what git log --oneline does: it displays the commit short hash id and title. Afterwards, other options such as format, pretty, and grep came along. The possibility of retrieving commits from other channels---and not only from the default one---was also implemented. An example of invoking the subcommand would be:

guix git log --oneline --grep=guile-git

The road to doing all of this wasn't always smooth. Right at the beginning of the internship, for instance, I struggled with getting a segmentation fault error. It was a known bug, and I was able to overcome it.

I also got to participate in the one-day Guix workshop---a FOSDEM 2021 fringe event---and presented the work I had done so far. It was quite nice demonstrating the subcommand, receiving feedback and questions, and I could also get to know other things that were being worked on in Guix.

In the post I wrote three months ago, I mention that I wish I could gain meaningful experience and improve my communication skills. I'm glad to say that I feel like I was able to achieve that. Sending emails, explaining what I had done, and asking questions about what I had to do during the weekly meetings were a few of the situations I had to face, and that made me improve these skills. I also had a taste of what it's like to take part in a free software project, got to know a few people, and learned quite a lot about Guile Scheme.

One thing, though, that I wasn't able to implement in due time was the commit limiting options, such as guix git log --after=YYYY-MM-DD and guix git log --before=YYYY-MM-DD.

Hopefully, soon users will be able to invoke guix git log, and have the commit history from all Guix channels they have.

#### Simon

It was my first experience mentoring for the Outreachy program and now I am glad I did it. I have learnt a lot on various topics. I had already mentored interns occasionally, though it was the first time fully remote, not on the same timezone, and with code on which I am not expert. Thanks Gábor, Ludovic and Ricardo for pushing me to jump in this journey.

Reading back the initial proposal coming from a 2019 question, I am happy by the insofar Magali's internship. It paves the way for future proposals or the implementation of other guix git subcommands.

# Next Outreachy round & acknowledgment

Guix is participating in the upcoming Outreachy round. If you are eligible, please get in touch with us and consider applying by April 30th!

In light of recent changes at the Free Software Foundation (FSF) and Outreachy’s subsequent decision to refuse funds coming from the FSF, we are grateful to Software Freedom Conservancy (SFC) for their decision to sponsor our upcoming internship. We are working on a longer-term solution so we can keep participating in Outreachy. In the meantime, thanks a lot, Conservancy!

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the kernel Linux, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

## Sunday, April 4, 2021

### Scheme Requests for Implementation

#### SRFI 219: Define higher-order lambda

SRFI 219 is now in final status.

This SRFI codifies the following shorthand syntax, which some Scheme implementations have had for a long time.

(define ((outer-name outer-args ...) inner-args ...)
inner-body ...)

## Saturday, April 3, 2021

### Joe Marshall

#### Early LISP Part II (Apply redux)

By April of 1959, issues with using subst to implement β-reduction became apparent. In the April 1959 Quarterly Progress Report of the Research Laboratory of Electronics, McCarthy gives an updated definition of the universal S-function apply:

    apply[f;args]=eval[cons[f;appq[args]];NIL]
where
appq[m]=[null[m]→NIL;T→cons[list[QUOTE;car[m]];appq[cdr[m]]]]
and
eval[e;a]=[
atom[e]→eval[assoc[e;a];a];
atom[car[e]]→[
car[e]=COND→evcon[cdr[e];a];
T→eval[cons[assoc[car[e];a];evlis[cdr[e];a]];a]];

and
and
evlis[m;a]= [null[m]→NIL;T→cons[list[QUOTE;eval[car[m];a]];
evlis[cdr[m];a]]

I find this a lot easier to understand if we transcribe it into modern Common LISP:

;;; Hey Emacs, this is -*- Lisp -*-

(in-package "CL-USER")

;; Avoid smashing the standard definitions.

(defun apply (f args)
(eval (cons f (appq args)) nil))

(defun appq (m)
(cond ((null m) nil)
(t (cons (list 'QUOTE (car m)) (appq (cdr m))))))

(defun eval (e a)
(cond ((atom e) (eval (assoc e a) a))
((atom (car e))
(cond ((eq (car e) 'QUOTE) (cadr e))
((eq (car e) 'ATOM)  (atom (eval (cadr e) a)))
((eq (car e) 'EQ)    (eq (eval (cadr e) a) (eval (caddr e) a)))
((eq (car e) 'COND)  (evcon (cdr e) a))
((eq (car e) 'CAR)   (car (eval (cadr e) a)))
((eq (car e) 'CDR)   (cdr (eval (cadr e) a)))
((eq (car e) 'CONS)  (cons (eval (cadr e) a) (eval (caddr e) a)))
(t (eval (cons (assoc (car e) a) (evlis (cdr e) a)) a))))
((eq (caar e) 'LABEL) (eval (cons (caddar e) (cdr e))
(cons (list (cadar e) (car e)) a)))
((eq (caar e) 'LAMBDA) (eval (caddar e)
(append (pair (cadar e) (cdr e)) a)))))

(defun evcon (c a)
(cond ((eval (caar c) a) (eval (cadar c) a))
(t (evcon (cdr c) a))))

(defun evlis (m a)
(cond ((null m) nil)
(t (cons (list 'QUOTE (eval (car m) a)) (evlis (cdr m) a)))))

;;; Modern helpers
(defun assoc (k l)

(defun pair (ls rs)
(map 'list #'list ls rs))

(defun testit ()
(apply '(label ff (lambda (x) (cond ((atom x) x) ((quote t) (ff (car x))))))
(list '((a . b) . c))))

There are a few things to notice about this. First, there is no code that inspects the value cell or function cell of a symbol. All symbols are evaluated by looking up the value in the association list a, so this evaluator uses one namespace. Second, the recursive calls to eval when evaluating combinations (the last clause of the inner cond and the LABEL and LAMBDA clauses) are in tail position, so this evaluator could be coded up tail-recursively. (It is impossible to say without inspecting the IBM 704 assembly code.)

What is most curious about this evaluator is the first clause in the outer cond in eval. This is where variable lookup happens. As you can see, we look up the variable by calling assoc, but once we obtain the value, we call eval on it. This LISP isn't storing values in the environment, but rather expressions that evaluate to values. If we look at the LAMBDA clause of the cond, the one that handles combinations that begin with lambda expressions, we can see that it does not evaluate the arguments to the lambda but instead associates the bound variables with the arguments' expressions. This therefore has call-by-name semantics rather than the modern call-by-value semantics.

By April 1960 we see these changes:

(defun eval (e a)
(cond ((atom e) (assoc e a))
((atom (car e))
(cond ((eq (car e) 'QUOTE) (cadr e))
((eq (car e) 'ATOM)  (atom (eval (cadr e) a)))
((eq (car e) 'EQ)    (eq (eval (cadr e) a) (eval (caddr e) a)))
((eq (car e) 'COND)  (evcon (cdr e) a))
((eq (car e) 'CAR)   (car (eval (cadr e) a)))
((eq (car e) 'CDR)   (cdr (eval (cadr e) a)))
((eq (car e) 'CONS)  (cons (eval (cadr e) a) (eval (caddr e) a)))
(t (eval (cons (assoc (car e) a) (evlis (cdr e) a)) a))))
((eq (caar e) 'LABEL) (eval (cons (caddar e) (cdr e))
(cons (list (cadar e) (car e)) a)))
((eq (caar e) 'LAMBDA) (eval (caddar e)
(append (pair (cadar e) (evlis (cdr e) a)) a)))))
Note how evaluating an atom now simply looks up the value of the atom in the association list and evaluation of a combination of a lambda involves evaluating the arguments eagerly. This is a call-by-value interpreter.

## Wednesday, March 31, 2021

### GNU Guix

#### Cuirass 1.0 released

We are pleased to announce the release of Cuirass version 1.0, after almost five years of development and around 700 commits from 14 contributors.

Cuirass is the GNU Guix continuous integration software. It's a general purpose build automation server written in GNU Guile that checks out sources from VCS repositories, execute build jobs and store build results in a database. Cuirass also provides a web interface to monitor the build results.

Cuirass is running on the GNU Guix build farm.

Since January, the project is funded through the NGI0 PET Fund, a fund established by NLnet with financial support from the European Commission's Next Generation, as explained here.

Thanks to this support, we were able to speed up the developments and finally propose a first release of this software. Many things have changed in Cuirass over the years and now is the perfect time to give it a try.

Here are the highlights of this new release.

### Database

Let's start with the database that is the core of Cuirass. Until recently Cuirass was using an SQLite3 database. This technological choice proved to be quite challenging, and we had some troubles to make it scale as discussed here.

Cuirass now uses a PostgreSQL database, bringing the performance issues to an end while providing much more stability. Almost all the SQL queries are covered by test cases.

### Specifications

In order to build some derivations, Cuirass first needs to be told what to build. Originally, an obscure association list describing the requested build jobs had to be passed to Cuirass.

Cuirass now operates on specification records that are described here. This input format is much more easy to understand for the user. It relies on Guix channels, which are well-adopted.

Here are a few different specifications examples.

This will build all the packages of the my-channel channel.

(list (specification
(name "my-channel")
(build '(channels my-channel))
(channels
(cons (channel
(name 'my-channel)
(url "https://my-channel.git"))
%default-channels))))

This will build the linux-libre package on the default Guix channel master branch.

(list (specification
(name "my-linux")
(build '(packages "linux-libre"))))

Finally, this will build the packages declared in the my-manifest file of the my-channel channel, against the core-updates branch of the default Guix channel.

(list (specification
(name "my-core-manifest")
(build '(manifests "my-manifest"))
(channels
(list (channel
(name 'my-channel)
(url "https://my-channel.git"))
(channel
(name 'guix)
(url %default-channel-url)
(branch "core-updates"))))))

For people willing to spare some parens, a specification edition form has been implemented in the Web interface.

The Cuirass home page has also been updated to reflect this new input format.

This feature was rightfully requested many times as this is a basic of any respectable CI system. Cuirass can now report failing and fixed builds in three different ways:

### New build mode

The traditional way of building things in Cuirass is to send batches of derivations that need to be built to the local Guix daemon. The daemon can possibly offload those builds to other machines. While it's probably the most sensible way to proceed, this solution doesn't scale well and suffers from some limitations.

• There's no way to influence the scheduling decisions of the Guix daemon. It's quite delicate to prioritize builds or build machines from Cuirass.

• The Guix daemon doesn't offer much feedback. Cuirass needs to parse the debug output of the daemon to detect build events such as start and stop events.

• Using a unique daemon means using unique build parameters such as build timeout and max-silent-time properties. Some packages have different build properties and Cuirass cannot honor them.

• When relying heavily on offloading, the Guix daemon scales badly. Builds that often take a longer time to complete, such as emulated builds can saturate the build queue.

For all those reasons, using a new build mode seemed like a necessary evil. The rationale behind this new build mode is to have Cuirass communicate directly with the Guix daemons of all the offloading machines. Instead of dealing with a single, local, Guix daemon, Cuirass can now interact with several Guix daemons on remote machines.

The build jobs are not submitted to the local Guix daemon. Instead, a remote server dispatches build requests to the connect remote workers, according to the build priorities.

The remote server and the connected workers communicate using ZeroMQ over TCP. The workers are able to discover the remote server using Avahi.

The built items are exchanged as substitutes by spawning Guix publish servers both on the remote server and on each connected remote worker.

It seems more complex, and it is indeed more complex. However, the performance gains are real.

This chart shows the CPU idle time percentage of the GNU Guix build farm machines. The introduction of the remote building mechanism around January 2021 results in a much higher activity of the connected machines.

This remote build mode also unlocked new features such as:

• The live streaming of build logs from remote workers to Cuirass so that they can be browsed in real time through the web interface.

• The support for timeout and max-silent-time package properties.

• The support for specification and package priorities.

• The new "Workers status" and "Machine status" pages allowing to closely monitor remote machine activities.

The workers status page is accessible here.

The machine status page is accessible here.

### Web interface

Besides the features related to the specification record introduction, several improvements were brought to the Web interface.

Some administration actions that previously required manual SQL intervention can now be performed directly through the Web interface.

• Edit a specification
• Delete a specification
• Cancel an evaluation pending builds
• Retry all builds of an evaluation
• Retry an evaluation
• Restart a build

The build page was also improved to display the build weather and a build history.

Several issues were also fixed such as the broken pagination and the negative build duration.

### Metrics

Cuirass computes periodically various metrics such as:

• Average evaluation duration per specification (seconds).
• Difference between newly added derivations and built derivations per day.
• Average time required for an evaluation to start its builds.
• Evaluation completion speed.
• Sum of currently pending builds.
• Builds count per machine during the last day.
• Percentage of failed evaluations.

Those metrics can be browsed here.

### Documentation

The Cuirass documentation is now updated to reflect those changes and can be browsed online.

The Guix's Cuirass package as well as the Cuirass service were also updated.

### Going further

The NLNet grant will allow me to keep working on Cuirass for a couple more months. This will hopefully help us to:

• Connect more armhf/aarch64 machines to the build farm.

• Fix the build dependencies issue

• Add a substitutes availability API and its counterpart in GNU Guix to improve the the channel-with-substitute-available procedure to take a manifest argument. This way, the guix pull command can be instructed to only update to Guix revisions where the manifest packages are all substitutable.

This release is an important milestone as, combined with the recent substitute improvements, the whole substitute availability & download speed situation is now largely mitigated, at least on Intel architectures.

Don't hesitate to run your own Cuirass server to build stuff ahead of the GNU Guix build farm, or to build your custom channels. Also feel free to share the features you would like to see in the next Cuirass release.

## Tuesday, March 30, 2021

### Programming Praxis

#### Greedy Text Justification

Today’s exercise would make a good interview question, though I don’t know of anyone doing that; you should answer as if you are standing at a whiteboard explaining your code as you go:

Given a string and a line width, split the string into words (a maximal run of characters excluding spaces) and write the words onto successive lines with spaces added between the words so that each line is the requested width. Words should be added to lines greedily (as many words as will fit) and extra spaces should be assigned to the left of the output string. The last line should not have spaces added, so it may be shorter than the other lines.

For example, the string “This is an example of text justification” is written with a line width of 16 like this:

    ----+----+----+-
This    is    an
example  of text
justification.
----+----+----+-

Your task is to write a program that greedily justifies text. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Monday, March 29, 2021

### Jeremy Kun

#### Regression and Linear Combinations

Recently I’ve been helping out with a linear algebra course organized by Tai-Danae Bradley and Jack Hidary, and one of the questions that came up a few times was, “why should programmers care about the concept of a linear combination?”

For those who don’t know, given vectors $v_1, \dots, v_n$, a linear combination of the vectors is a choice of some coefficients $a_i$ with which to weight the vectors in a sum $v = \sum_{i=1}^n a_i v_i$.

I must admit, math books do a poor job of painting the concept as more than theoretical—perhaps linear combinations are only needed for proofs, while the real meat is in matrix multiplication and cross products. But no, linear combinations truly lie at the heart of many practical applications.

In some cases, the entire goal of an algorithm is to find a “useful” linear combination of a set of vectors. The vectors are the building blocks (often a vector space or subspace basis), and the set of linear combinations are the legal ways to combine the blocks. Simpler blocks admit easier and more efficient algorithms, but their linear combinations are less expressive. Hence, a tradeoff.

A concrete example is regression. Most people think of regression in terms of linear regression. You’re looking for a linear function like $y = mx+b$ that approximates some data well. For multiple variables, you have, e.g., $\mathbf{x} = (x_1, x_2, x_3)$ as a vector of input variables, and $\mathbf{w} = (w_1, w_2, w_3)$ as a vector of weights, and the function is $y = \mathbf{w}^T \mathbf{x} + b$.

To avoid the shift by $b$ (which makes the function affine instead of purely linear; formulas of purely linear functions are easier to work with because the shift is like a pesky special case you have to constantly account for), authors often add a fake input variable $x_0$ which is always fixed to 1, and relabel $b$ as $w_0$ to get $y = \mathbf{w}^T \mathbf{x} = \sum_i w_i x_i$ as the final form. The optimization problem to solve becomes the following, where your data set to approximate is $\{ \mathbf{x}_1, \dots, \mathbf{x}_k \}$.

$\displaystyle \min_w \sum_{i=1}^k (y_i - \mathbf{w}^T \mathbf{x}_i)^2$

In this case, the function being learned—the output of the regression—doesn’t look like a linear combination. Technically it is, just not in an interesting way.

It becomes more obviously related to linear combinations when you try to model non-linearity. The idea is to define a class of functions called basis functions $B = \{ f_1, \dots, f_m \mid f_i: \mathbb{R}^n \to \mathbb{R} \}$, and allow your approximation to be any linear combination of functions in $B$, i.e., any function in the span of B.

$\displaystyle \hat{f}(\mathbf{x}) = \sum_{i=1}^m w_i f_i(\mathbf{x})$

Again, instead of weighting each coordinate of the input vector with a $w_i$, we’re weighting each basis function’s contribution (when given the whole input vector) to the output. If the basis functions were to output a single coordinate ($f_i(\mathbf{x}) = x_i$), we would be back to linear regression.

Then the optimization problem is to choose the weights to minimize the error of the approximation.

$\displaystyle \min_w \sum_{j=1}^k (y_j - \hat{f}(\mathbf{x}_j))^2$

As an example, let’s say that we wanted to do regression with a basis of quadratic polynomials. Our basis for three input variables might look like

$\displaystyle \{ 1, x_1, x_2, x_3, x_1x_2, x_1x_3, x_2x_3, x_1^2, x_2^2, x_3^2 \}$

Any quadratic polynomial in three variables can be written as a linear combination of these basis functions. Also note that if we treat this as the basis of a vector space, then a vector is a tuple of 10 numbers—the ten coefficients in the polynomial. It’s the same as $\mathbb{R}^{10}$, just with a different interpretation of what the vector’s entries mean. With that, we can see how we would compute dot products, projections, and other nice things, though they may not have quite the same geometric sensibility.

These are not the usual basis functions used for polynomial regression in practice (see the note at the end of this article), but we can already do some damage in writing regression algorithms.

## A simple stochastic gradient descent

Although there is a closed form solution to many regression problems (including the quadratic regression problem, though with a slight twist), gradient descent is a simple enough solution to showcase how an optimization solver can find a useful linear combination. This code will be written in Python 3.9. It’s on Github.

from typing import Callable, Tuple, List

Input = Tuple[float, float, float]
Coefficients = List[float]
Hypothesis = Callable[[Input], float]
Dataset = List[Tuple[Input, float]]


Then define a simple wrapper class for our basis functions

class QuadraticBasisPolynomials:
def __init__(self):
self.basis_functions = [
lambda x: 1,
lambda x: x[0],
lambda x: x[1],
lambda x: x[2],
lambda x: x[0] * x[1],
lambda x: x[0] * x[2],
lambda x: x[1] * x[2],
lambda x: x[0] * x[0],
lambda x: x[1] * x[1],
lambda x: x[2] * x[2],
]

def __getitem__(self, index):
return self.basis_functions[index]

def __len__(self):
return len(self.basis_functions)

def linear_combination(self, weights: Coefficients) -> Hypothesis:
def combined_function(x: Input) -> float:
return sum(
w * f(x)
for (w, f) in zip(weights, self.basis_functions)
)

return combined_function



The linear_combination function returns a function that computes the weighted sum of the basis functions. Now we can define the error on a dataset, as well as for a single point

def total_error(weights: Coefficients, data: Dataset) -> float:
hypothesis = basis.linear_combination(weights)
return sum(
(actual_output - hypothesis(example)) ** 2
for (example, actual_output) in data
)

def single_point_error(
weights: Coefficients, point: Tuple[Input, float]) -> float:
return point[1] - basis.linear_combination(weights)(point[0])


We can then define the gradient of the error function with respect to the weights and a single data point. Recall, the error function is defined as

$\displaystyle E(\mathbf{w}) = \sum_{j=1}^k (y_j - \hat{f}(\mathbf{x}_j))^2$

where $\hat{f}$ is a linear combination of basis functions

$\hat{f}(\mathbf{x}_j) = \sum_{s=1}^n w_s f_s(\mathbf{x}_j)$

Since we’ll do stochastic gradient descent, the error formula is a bit simpler. We compute it not for the whole data set but only a single random point at a time. So the error is

$\displaystyle E(\mathbf{w}) = (y_j - \hat{f}(\mathbf{x}_j))^2$

Then we compute the gradient with respect to the individual entries of $\mathbf{w}$, using the chain rule and noting that the only term of the linear combination that has a nonzero contribution to the gradient for $\frac{\partial E}{\partial w_i}$ is the term containing $w_i$. This is one of the major benefits of using linear combinations: the gradient computation is easy.

$\displaystyle \frac{\partial E}{\partial w_i} = -2 (y_j - \hat{f}(\mathbf{x}_j)) \frac{\partial \hat{f}}{\partial w_i}(\mathbf{x}_j) = -2 (y_j - \hat{f}(\mathbf{x}_j)) f_i(\mathbf{x}_j)$

Another advantage to being linear is that this formula is agnostic to the content of the underlying basis functions. This will hold so long as the weights don’t show up in the formula for the basis functions. As an exercise: try changing the implementation to use radial basis functions around each data point. (see the note at the end for why this would be problematic in real life)

def gradient(weights: Coefficients, data_point: Tuple[Input, float]) -> Gradient:
error = single_point_error(weights, data_point)
dE_dw = [0] * len(weights)

for i, w in enumerate(weights):
dE_dw[i] = -2 * error * basis[i](data_point[0])

return dE_dw


Finally, the gradient descent core with a debugging helper.

import random

data: Dataset,
learning_rate: float,
tolerance: float,
training_callback = None,
) -> Hypothesis:
weights = [random.random() * 2 - 1 for i in range(len(basis))]
last_error = total_error(weights, data)
step = 0
progress = tolerance * 2

if training_callback:
training_callback(step, 0.0, last_error, 0.0)

while abs(progress) > tolerance or grad_norm > tolerance:

for i in range(len(weights)):

error = total_error(weights, data)
progress = error - last_error
last_error = error
step += 1

if training_callback:

return basis.linear_combination(weights)


Next create some sample data and run the optimization

def example_quadratic_data(num_points: int):
def fn(x, y, z):
return 2 - 4*x*y + z + z**2

data = []
for i in range(num_points):
x, y, z = random.random(), random.random(), random.random()
data.append(((x, y, z), fn(x, y, z)))

return data

if __name__ == "__main__":
data,
learning_rate=0.01,
tolerance=1e-06,
training_callback=print_debug_info
)


Depending on the randomness, it may take a few thousand steps, but it typically converges to an error of < 1. Here’s the plot of error against gradient descent steps.

## Kernels and Regularization

I’ll finish with explanations of the parentheticals above.

The real polynomial kernel. We chose a simple set of polynomial functions. This is closely related to the concept of a “kernel”, but the “real” polynomial kernel uses slightly different basis functions. It scales some of the basis functions by $\sqrt{2}$. This is OK because a linear combination can compensate by using coefficients that are appropriately divided by $\sqrt{2}$. But why would one want to do this? The answer boils down to a computational efficiency technique called the “Kernel trick.” In short, it allows you to compute the dot product between two linear combinations of vectors in this vector space without explicitly representing the vectors in the space to begin with. If your regression algorithm uses only dot products in its code (as is true of the closed form solution for regression), you get the benefits of nonlinear feature modeling without the cost of computing the features directly. There’s a lot more mathematical theory to discuss here (cf. Reproducing Kernel Hilbert Space) but I’ll have to leave it there for now.

What’s wrong with the radial basis function exercise? This exercise asked you to create a family of basis functions, one for each data point. The problem here is that having so many basis functions makes the linear combination space too expressive. The optimization will overfit the data. It’s like a lookup table: there’s one entry dedicated to each data point. New data points not in the training would be rarely handled well, since they aren’t in the “lookup table” the optimization algorithm found. To get around this, in practice one would add an extra term to the error corresponding to the L1 or L2 norm of the weight vector. This allows one to ensure that the total size of the weights is small, and in the L1 case that usually corresponds to most weights being zero, and only a few weights (the most important) being nonzero. The process of penalizing the “magnitude” of the linear combination is called regularization.

### Joe Marshall

#### Early LISP

In AI Memo 8 of the MIT Research Laboratory of Electronics (March 4, 1959), John McCarthy gives a definition of the universal S-function apply:

     apply is defined by
apply[f;args]=eval[combine[f;args]]
eval is defined by
eval[e]=[
first[e]=NULL→[null[eval[first[rest[e]]]]→T;1→F]
first[e]=ATOM→[atom[eval[first[rest[e]]]]→T;1→F]
first[e]=EQ→[eval[first[rest[e]]]=eval[first[rest[rest[e]]]]→T;
1→F]
first[e]=QUOTE→first[rest[e]];
first[e]=FIRST→first[eval[first[rest[e]]]];
first[e]=REST→rest[eval[first[rest[e]]];
first[e]=COMBINE→combine[eval[first[rest[e]]];eval[first[rest[rest
[e]]]]];
first[e]=COND→evcon[rest[e]]
first[first[e]]=LAMBDA→evlam[first[rest[first[e]]];first[rest[rest
[first[e]]]];rest[e]];
first[first[e]]=LABELS→eval[combine[subst[first[e];first[rest
[first[e]]];first[rest[rest[first[e]]]]];rest[e]]]]
where: evcon[c]=[eval[first[first[c]]]=1→eval[first[rest[first[c]]]];
1→evcon[rest[c]]]
and
evlam[vars;exp;args]=[null[vars]→eval[exp];1→evlam[
rest[vars];subst[first[vars];first[args];exp];rest[args]]]
McCarthy asserts that “if f is an S-expression for an S-function φ and args is a list of the form (arg1, …, argn) where arg1, ---, argn are arbitrary S-expressions then apply[f,args] and φ(arg1, …, argn) are defined for the same values of arg1, … argn and are equal when defined.”

I find it hard to puzzle through these equations, so I've transcribed them into S-expressions to get the following:

;;; Hey Emacs, this is -*- Lisp -*-

(in-package "CL-USER")

;; Don't clobber the system definitions.

(defun apply (f args)
(eval (combine f args)))

(defun eval (e)
(cond ((eq (first e) 'NULL)    (cond ((null (eval (first (rest e)))) t)
(1 nil)))
((eq (first e) 'ATOM)    (cond ((atom (eval (first (rest e)))) t)
(1 nil)))
((eq (first e) 'EQ)      (cond ((eq (eval (first (rest e)))
(eval (first (rest (rest e))))) t)
(1 nil)))
((eq (first e) 'QUOTE)   (first (rest e)))
((eq (first e) 'FIRST)   (first (eval (first (rest e)))))
((eq (first e) 'REST)    (rest  (eval (first (rest e)))))
((eq (first e) 'COMBINE) (combine (eval (first (rest e)))
(eval (first (rest (rest e))))))
((eq (first e) 'COND)    (evcon (rest e)))
((eq (first (first e)) 'LAMBDA) (evlam (first (rest (first e)))
(first (rest (rest (first e))))
(rest e)))
((eq (first (first e)) 'LABELS) (eval (combine (subst (first e)
(first (rest (first e)))
(first (rest (rest (first e)))))
(rest e))))))

(defun evcon (c)
(cond ((eval (first (first c))) (eval (first (rest (first c)))))
(1 (evcon (rest c)))))

(defun evlam (vars exp args)
(cond ((null vars) (eval exp))
(1 (evlam (rest vars)
(subst (first args)
(first vars)
exp)
(rest args)))))
We just have to add a definition for combine as a synonym for cons and this should run:
* (eval '(eq (first (combine 'a 'b) (combine 'a 'c))))
T

As Steve “Slug” Russell observed, eval is an interpreter for Lisp. This version of eval uses an interesting evaluation strategy. If you look carefully, you'll see that there is no conditional clause for handling variables. Instead, when a lambda expression appears as the operator in a combination, the body of the lambda expression is walked and the bound variables are substituted with the expressions (not the values!) that represent the arguments. This is directly inspired by β-reduction from lambda calculus.

This is buggy, as McCarthy soon discovered. In the errata published one week later, McCarthy points out that the substitution process doesn't respect quoting, as we can see here:

* (eval '((lambda (name) (combine 'your (combine 'name (combine 'is (combine name nil))))) 'john))
(YOUR 'JOHN IS JOHN)
With a little thought, we can easily generate other name collisions. Notice, for example, that the substitution will happily substitute within the bound variable list of nested lambdas.

Substitution like this is inefficient. The body of the lambda is walked once for each bound variable to be substituted, then finally walked again to evaluate it. Later versions of Lisp will save the bound variables in an environment structure and substitute them incrementally during a single evaluation pass of the lambda body.

## Friday, March 26, 2021

### GNU Guix

#### Getting bytes to disk more quickly

Let’s face it: functional package managers like Guix provide unequaled support for reproducibility and transactional upgrades, but the price to pay is that users often spend a fair amount of time downloading (or building) packages. Download times are okay on day-to-day use but they’re a pain point when performing large upgrades or when installing Guix System for the first time.

With that in mind, it should come as no surprise that Michael Stapelberg’s excellent 2020 Arch Linux Conference talk and the installation speed achieved by distri were a great motivation boost. Michael proposes radical ideas to speed up package installation, such as downloading and mounting ready-to-use SquashFS images. Not everything can be transposed to Guix as-is, but it certainly got us thinking.

This article dives into improvements made over the last few months that will be in the upcoming 1.2.1 release, and which are already one guix pull away; they all contribute to making substitute download and installation “faster”. This is an evolution of the existing mechanisms rather than a revolution, but one that users will surely welcome.

# Reducing latency

One of the first things we notice is latency between subsequent substitute downloads. It may sound ridiculous, but guix-daemon would spawn one helper process (running the internal guix substitute command) for each substitute download. That means that not only would it create a new process each time, it would also be unable to reuse connections to the substitute servers. This is particularly noticeable and wasteful when downloading many substitutes in a row, such as when installing Guix System for the first time.

The latency in between subsequent substitute downloads was primarily this: fork (creating a new process), TCP connection establishment and TLS connection handshake. When running perf timechart record guix build … (what a wonderful tool perf timechart is!), we get this Gantt diagram, which gives an idea of all the time wasted creating these processes, waiting for them to initialize and connect to the substitute server, and so on:

Why was it done this way? Because the daemon, written in C++ and inherited from Nix, would historically delegate substitution to helper programs, with a flexible but naive protocol between the daemon and those helper programs.

Back in December, we tackled this issue (followup): the daemon would now launch a single guix substitute process and reuse it for subsequent substitute downloads. In turn, guix substitute would cache connections so we save time on connection establishment.

Another observation we made is that guix-daemon implemented post-processing steps after each download that would contribute to latency. Once guix substitute had completed a download and extracted the archive to the store, the daemon would traverse this store item a couple of times to reset file timestamps/permissions (“metadata canonicalization”), to deduplicate files, and to verify the integrity of the whole store item. This is I/O-intensive and particularly wasteful for store items with many files. Our next step was to delegate all this work to guix substitute, which allows it to pipeline archive extraction, integrity checks, deduplication, and metadata canonicalization. All this happens as the bytes flow in and no extra traversal is needed.

The speedup offered by these optimizations depends on several factors, including the latency of your network, the speed of your CPU and that of your hard disk, and it grows with the number of substitutes fetched in a row. What we can say is that even in a “favorable” situation—low-latency network, fast CPU, fast storage device—it definitely feels snappier.

# Increasing bandwidth

With the latency issue pretty much solved, the next step was to look at bandwidth. When we introduced lzip-compressed substitutes back in 2019, the assumption was that a much higher compression ratio (compared to gzip), would inevitably translate to faster downloads. That is true… but only for some bandwidth/CPU power configurations.

Specifically, it turns out that, for someone with a fast connection, such as fiber-to-the-home (FTTH), downloads of lzip-compressed substitutes are actually CPU-bound. In other words, the limiting factor is the processing time to decompress those lzip’d archives—not the available bandwidth. Lzip currently achieves the best compression ratios so far, and that’s great, but decompression is compute-intensive.

Ever-increasing bandwidth is what drove the design and implementation of newer compression methods, such as “Z standard”, colloquially known as “zstd”. The decompression speed of zstd is on the order of 3.5 higher than that of gzip and an order of magnitude higher than that of lzip; but unlike the venerable lzo, zstd achieves high compression ratios: at level 19, the compression ratio of zstd is lower than that of lzip but in the same ballpark. Guillaume Le Vaillant provided an insightful comparison of gzip, lzip, and zstd on a plot showing their decompression speed as a function of the compression ratio:

There are several takeaways. First, zstd decompression is always faster than the alternatives. Second, zstd compresses better than gzip starting from level 2, so gzip “loses” on both criteria. Last, zstd at level 19 achieves a compression ratio comparable to lzip level 5 or 6—lzip level 9 remains better in that regard.

With brand new Guile bindings to zstd, we were able to add zstd support to guix publish and guix substitute. But what policy should be adopted for the official substitute server at ci.guix.gnu.org?

Our goal is to maximize substitute download speed. In some configurations, for example on relatively slow connections, fetching lzip substitutes is not CPU-bound; in those cases, fetching lzip substitutes remains the fastest option. And of course, there are these other configurations where lzip decompression is the limiting factor, and where we’d rather fetch a slightly less compressed zstd substitute (or even a much less compressed gzip substitutes, sometimes).

The solution we came up with is a naive but efficient solution: client-side adaptive compression selection. When it fetches and decompresses a substitute, guix substitute monitors its CPU usage as reported by times. If the user time to wall-clock time ratio is close to one, that probably means the substitute fetch and decompression process was CPU-bound; in that case, choose a compression method that yields faster decompression next time—assuming the substitute server offers several options. Conversely, if CPU usage is close to zero, the process is probably network-bound, so next time we’ll choose a better-compressed option.

The official server at ci.guix.gnu.org now provides zstd-compressed substitutes in addition to lzip and gzip. An extra complication is that we cannot drop gzip compression just yet because pre-1.1.0 Guix installations understand nothing but gzip. To ease transition, the server will probably offer substitutes in all three compression formats for one more year or so.

There’s a good reason to upgrade your daemon though: you’ll be able to benefit from those zstd substitutes and if you have high bandwidth, you’ll quickly see the difference!

# Grabbing substitutes from your neighbors

When it comes to increasing download speeds, another option is to download from your neighbors rather than from a server far away. This is particularly useful at the workplace, where machines around you are likely to already have the store items you’re looking for, or, say, at Guix gatherings—at least when we can meet physically again…

The idea had been floating around for some time as a direct benefit of reproducible builds and co-maintainer Mathieu Othacehe recently implemented it. As Mathieu explains in his blog post, guix publish can now advertise itself on the local network using the mDNS/DNS-SD protocol via Avahi; when guix-daemon is passed the --discover option, guix substitute automatically discovers local substitute servers and adds them as the preferred download location. The Guix System installation image even allows you to enable it to speed up the installation process:

Again, that only speeds things up if substitute servers use a compression method with fast decompression, and with either a cache or fast compression if they compress things on the fly. Zstd comes in handy for that: Guillaume’s measurements show that zstd compression is inexpensive at low compression levels, while achieving higher compression ratios than gzip.

# Going further

Put together, these changes have the potential to noticeably improve user experience. But as I wrote in the introduction, it’s not revolutionary either—users still have to download all these things.

The next big step might come from fetching substitutes over peer-to-peer, content-addressed networks such as IPFS or GNUnet. Their content-addressed nature could allow users to download less. The performance characteristics of these networks is less clear though.

There is still value in a plain HTTP-based substitute protocol like the one currently used that is easy to set up, though. In that spirit, an option would be to “upgrade” the existing substitute protocol to take advantage of content-addressability. After all, the daemon already performs file-level deduplication and there’s a fair amount of identical files between subsequent builds of the same package. So… what if we only downloaded those files not already available locally? This idea is appealing but we need to go beyond the prototyping phase to get a better idea of its viability.

For completeness, another option currently investigated by the Nix developers is that of “content-addressed derivations”. While currently store file names contain a hash of the inputs used to produce them, the idea of content-addressed derivations is to make it a content hash; that way, if an input change has no effect on the build result, the output is the same and nothing needs to be re-downloaded (this is what Eelco Dolstra described as the intensional model in Chapter 6 of his seminal PhD thesis). This option is appealing, also for other reasons, but it’s a fundamental change with what looks like a high implementation complexity and transition cost. We have yet to gauge the pros and cons of following this approach.

Until then, we hope Guix 1.2.1 will bring your faster substitutes and happiness!

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

## Wednesday, March 24, 2021

### Scheme Requests for Implementation

#### SRFI 224: Integer Mappings

SRFI 224 is now in draft status.

Integer maps, or imappings, are finite sets, where each element is an association between an exact-integer key and an arbitrary Scheme object. They are similar to the general mappings of SRFI 146, but the restricted key-type allows implementations of imappings to benefit from optimized structures and algorithms. This library provides a rich set of operations on imappings, including analogues of most of the forms provided by SRFI 146. Imappings have no intrinsic order, but may be treated as ordered sets, using the natural ordering on keys; a substantial sublibrary for working with imappings in this fashion is included.