Planet Scheme

Thursday, January 16, 2025

GNU Guix

Guix User and Contributor Survey 2024: The Results (part 1)

The results from the Guix User and Contributor Survey (2024) are in! This is the first time the Guix community has run this type of survey, and we're excited to share the results. The goal of the survey was to collect the views of both users and contributors: how people adopt Guix, what they love about it, and their experiences contributing to the project.

There were 943 full responses to the survey; of these, 53% were users and 32% were contributors. The table of survey participants is as follows:

Table 1: Participant breakdown
Category | Count | Percentage
User | 496 | 52.60
Contributor | 297 | 31.50
Previous user | 92 | 9.76
Previous contributor | 58 | 6.15

First, thank you to everyone who made the effort to fill out the survey. For a volunteer community project it's fantastic to see that over 900 people took part. It's notable that 150 people who were previous users or contributors took the survey — it's really great that people are willing to make this effort to share their experiences — thanks so much!

With this many participants we can see the range of viewpoints and experience across our whole community, and many of the comments were enlightening and worth reading. There are links in many of the questions so anyone who's interested can go through them.

As the results are extensive I've split them into three separate posts. This first post covers the first 10 questions of the survey, which asked how users learnt about Guix and about their experiences adopting it.

User backgrounds and experience

The survey started by asking participants, How knowledgeable a Linux user are you? (Q1).

Table 2: Participant's Linux knowledge
Category | Count | Percentage
Beginner (e.g. just getting started) | 18 | 2%
Explorer (e.g. comfortable installing it and using graphical apps) | 18 | 2%
Intermediate (e.g. comfortable with the command-line and configuring many aspects) | 445 | 47%
Advanced (e.g. you correct the Arch Linux Wiki!) | 248 | 26%
Expert (e.g. able to contribute to large Free Software projects!) | 212 | 22%
No answer | 2 | 0.21%

Note that all the percentages in this table, and throughout these posts, are rounded to make them easier to refer to.

Figure 1 shows this graphically:

Figure 1: Survey participants' GNU/Linux knowledge

The next question (Q2) was, How long have you been using Guix?

Table 3: Guix experience
Category | Count | Percentage
Less than 1 year | 245 | 26%
Between 1 and 2 years | 218 | 23%
Between 2 and 4 years | 234 | 25%
More than 4 years | 160 | 17%
I've stopped using Guix | 83 | 9%
No answer | 3 | 0.3%

Figure 2 shows these results as a bar chart:

Figure 2: Survey participants' GNU Guix experience

These two questions already tell us some interesting things about Guix users:

  • Guix users generally have a lot of Linux experience: 50% said they were Intermediates who were "comfortable with the command-line and configuring many aspects". A further 26% said they were Advanced, and 22% said they were experts.
  • Conversely, very few users (~4%) are beginners or explorers.
  • Many Guix users are new to Guix itself.
  • Guix's user-base is growing! Almost 75% of the user-base are recent converts to Guix, having used it for less than 4 years.
  • It's a similar distribution of users to Nix's. Their 2024 survey showed dramatic growth, with ~65% of users in the 0-2 year range; Guix's equivalent figure is 49%.
  • It's fantastic to see new users are exploring and trying out Guix.
  • Unfortunately, 9% of users are no longer using Guix, but care enough to fill out the survey - so what can be done to help them come back?!

Adopting Guix

The next few questions explored how participants adopted Guix. It's important that new users have a great adoption experience so they'll keep using Guix. Conversely, if the initial experience is too difficult, they may simply move on to something else without seeing its benefits!

The first of these questions (Q4) asked, Why were you initially interested in Guix?

This question tells us what users had heard about Guix, and what they discovered during their initial investigation. The answers could impact how the project talks about Guix's strengths and capabilities.

For this question users could select more than one answer, and many did so. The most selected choice was "Declarative configuration", which interested 82% of participants. "Scheme, Guile, and Lisp are cool" was second, intriguing 72% of participants, and "Reproducibility" came third with 70% interested in this capability. The detailed results were:

Table 4: Reason for adopting Guix
Category | Count | Percentage
Reliability and transactions | 537 | 57%
Declarative configuration | 772 | 82%
Reproducibility | 658 | 70%
Reproducible scientific workflows | 199 | 21%
Fresh packages with latest versions | 207 | 22%
Scheme, Guile and Lisp are cool | 677 | 72%
Friendly community | 256 | 27%
FSF certified project (100% Free Software) | 404 | 43%
Alternative architectures (e.g. ARM) | 90 | 10%
GNU Hurd | 122 | 13%
Package management on another Linux distribution | 319 | 34%
As a tool for packaging my own software | 267 | 28%

There were 110 choices of 'Other' where participants could add their own comments; they're all available to read. Looking through them, some themes came through:

  • Development environments:
    • "General solution to rvm,pyenv etc"
    • "As a Docker replacement for software development"
  • Documentation:
    • "Initial interest in Nix, but hearing about Guix having more pleasant documentation also swayed me towards using Guix instead"
    • "Documentation (not exhaustive but well-structured), simplicity of the CLI"
  • Free Software & GNU:
    • "The possibility of releasing the GNU operating system version 1.0
    • "100% free software yes, FSF no (FSFE are fine)"
    • "Being a GNU project helped me decide between Guix and Nix."
  • Use for Continuous Integration:
    • "used for CI, replacing docker with free software and user control"
  • Sandboxes and security:
    • "Sandbox environment"
    • "Security: containerized environments integrated in the OS."
  • Package definitions:
    • "Writing packages for GNU Guix seemed more intuitive than for Gentoo Linux (Guix's hashes > Gentoo's slots)"
    • "Ease of packaging"
  • An alternative to Nix:
    • "Wanted to check out alternatives to Nix. Particularly interested in 1) grafting, 2) measures against ld.so stat storm, 3) performant guix packs without proot"
    • "Use Nix a lot, want to explore that design space more"
  • Guile Scheme and Lisp:
    • "One language for everything"
    • "Not nixlang"
    • "homogeneity of the configuration (one language for everything)"
  • Full source:
    • "Full Source Bootstrap & Strict Policy to compile all software from source"
    • "Full source auditability"

The next question the survey asked was, Which aspect of Guix did you initially adopt? (Q5). This is a user's initial entry point into using Guix.

The detailed results were:

Table 5: Initial aspect of Guix adopted
Category | Count | Percentage
Package manager on top of another Linux distro (guix package) | 336 | 36%
Dotfiles and home environment management on another Linux distro (guix home) | 41 | 4%
Isolated development and runtime environments on another Linux distro (guix shell) | 58 | 6%
GNU/Linux distro as a graphical desktop (guix system) | 434 | 46%
GNU/Linux distro as a server (guix system) | 47 | 5%
As a software build and deployment tool (guix image, guix package or guix deploy) | 16 | 2%
Other | 9 | 1%

Figure 3 shows this as a bar chart:

Figure 3: Guix initial adoption aspect

The summary is that almost 50% of users initially experienced Guix as a GNU/Linux distro: 44% in a graphical desktop configuration and a further 5% in a server configuration. Just over a third of users (36%) initially experienced Guix as a package manager on top of another Linux distro. I found this surprising as I'd expected most users to use Guix as a hosted package manager first. What an interesting result! We can also see there's lots of room to develop Guix Home as an adoption path.

Adoption challenges

Adopting any new technology is difficult and time-consuming, so discovering what elements users find difficult is important. Q7 delved into this by asking, What were the biggest challenges getting started with Guix?

The results were:

Table 6: Adoption challenges
Category | Count | Percentage
Installing Guix as a package manager on a GNU/Linux distribution | 80 | 8%
Installing Guix System as a full Linux distribution | 236 | 25%
Level of Linux knowledge needed to use Guix | 102 | 11%
Difficulties with the reference material (i.e. the manual) | 236 | 25%
Shortage of how-to tutorials and videos | 297 | 32%
Shortage of examples (e.g. examples of usage) | 431 | 46%
Inexperience with Lisp syntax and/or Guile Scheme | 374 | 40%
Differences between Guix's approach and other Linux distros | 321 | 34%
It was so long ago I can't possibly remember! | 44 | 5%
Other | 218 | 23%

Figure 4 shows this as a bar chart:

Figure 4: Guix adoption challenges

As we can see, the biggest challenge is a "Shortage of examples (e.g. examples of usage)" at 46%. If we consider the shortage of how-to tutorials and videos (32%) to be similar, then overall there's a clear need for focused, goal-orientated documentation with examples. Inexperience with Lisp syntax and/or Guile Scheme and differences between Guix's approach and other Linux distros both speak to the unique nature of Guix and the approach it takes: perhaps there are implications for how Guix's tooling can make initial adoption as easy as possible.

There were 218 comments, which are worth reading through. I've summarised them into broad themes:

  • Conceptual complexity: comments about the overall knowledge required being too much. Examples are:
    • "Understanding the concepts on which guix runs"
    • "managing storage space, generations, GC roots, profiles; generally grasping the concepts"
    • "Some interesting free software is only available for other distros, it's hard to adapt to a system without file system hierarchy"
  • Lack of drivers: issues caused by drivers not being available. Examples are:
    • "can't really use linux-libre on the machine I installed it on (lack drivers)"
    • "Getting an initial installation with working non-free wifi"
    • "hiding nonguix"
  • Efficiency: comments regarding overall resource usage making Guix slow or unusable. Example comments are:
    • "The evaluation of Guix is slow and resource-intensive. My laptop was no match for it, I had to change it."
    • "Guix experimentation is still too slow. Make experimenting faster for new users by identifying rate limiting steps and speeding them up"
    • "Slow network when download guix substitute"
  • Missing packages and services: issues where Guix doesn't contain a required package or service.
    • "missing packages I needed and getting them upstreamed after I packaged them"
    • "Unpackaged free software, and nonfree software"
    • "Coming from Nix: smaller, less up-to-date package set, substantially fewer home services"
  • Quality and reliability: issues of quality and reliability that made Guix difficult to use. Some comments:
    • "hard time fixing config errors with reports"
    • "Broken integration between some components (packages and services)"
    • "Basic setup is pretty easy on paper, but in practice sometimes it breaks my system and I need to fiddle with shell profiles and environment variables and installing extra packages to get Guix programs play nice with native programs. And I feel like this kind of breakage isn't acknowledged or addressed enough."
  • Practical guides, how-to's and examples: situations where a lack of direct instructions or examples made Guix difficult to use.
    • "Guix-unique bugs and issues that I can't find an answer to online"
    • "Lack of docs mostly, common patterns, the fact that's it's a pain the butt to make things works for some ecosystems on the Guix distro (e.g any app written in Golang, Rust, JS,TS..)"
  • Error messages: poor experience caused by error messages that are difficult to understand. Example comments:
    • "Horrible error messages"
    • "Difficult guile scheme error messages!!"
    • "Hard-to-understand error messages"
  • Configuring on a hosted distribution: issues caused when using Guix on top of another distribution. Some comments:
    • "I found the setting of numerous variables and the comments recommending I do so contradictory and so confusing"
    • "SELinux blocked installation of packages: remount"
    • "Problems using it on a foreign distro. Guix Home particularly assumes that you are using guix system, I had to tweak the .profile a lot to get it working."
  • Encrypted boot / LUKS: encryption in various forms unavailable or missing certain features:
    • "Very poor support for full disk encryption."
    • "Also using a LUKS encrypted root file-system was a challenge at the time i started Guix"
  • Language ecosystems (e.g. Rust, PHP): issues due to missing packages, or attempts to package, from certain language ecosystems.
    • "Missing packages, and the difficulty of packaging rust or npm packages on guix dissuaded me from contributing them"
  • Mac availability: situations where being unavailable on Mac meant Guix could not be adopted.
    • "Linux only. nix has macos support too which would help adoption in a team environment."
    • "No MacOS official distribution"

Adoption satisfaction score

The survey asked (Q6), How satisfied were you with your experience adopting Guix?

This question explores users' overall satisfaction with the initial steps of researching, installing and initially using Guix. The question asked the participant to score their satisfaction on one of 5 levels.

Table 7: Guix adoption satisfaction
Category | Count | Percentage
Very Dissatisfied | 22 | 2%
Dissatisfied | 113 | 12%
Neutral | 154 | 16%
Satisfied | 408 | 43%
Very Satisfied | 226 | 24%
Can't remember | 20 | 2%

See Figure 5 for a visual representation:

Figure 5: Guix initial adoption satisfaction bar chart

This is probably the most important question in the entire survey when it comes to growing the number of Guix users. Overall, it's positive, with Very Satisfied (24%) and Satisfied (43%) meaning that the majority of users are happy with their initial experience. The comments above show there's lots of room to find small ways to move users' initial experience from Satisfied to overjoyed! Unfortunately, at the other end of the scale, the 14% of users who were dissatisfied and the 16% who were neutral point to some of the bigger challenges!

Which GNU/Linux distribution do you use Guix on?

As we saw earlier, just over a third of users (36%) initially adopt Guix as a package manager on top of another GNU/Linux distribution. Question 8 asked, Which GNU/Linux distribution did you use Guix on top of?

The results:

Table 8: Hosting Linux distributions
Category | Count | Percentage
Alpine Linux | 9 | 0.95%
Arch Linux | 81 | 8.59%
Fedora Linux | 33 | 3.50%
Gentoo Linux | 19 | 2.01%
NixOS | 22 | 2.33%
Ubuntu | 111 | 11.77%
Other | 170 | 18.03%

I erred when creating this question and somehow missed out Debian! Over 117 answers in the 'Other' category said Debian, so it's the most popular distribution to use Guix on top of; Ubuntu is second (111) and Arch Linux third (81). There were also plenty of mentions of OpenSUSE, RHEL/CentOS and Void Linux.

Why did you stop using Guix?

Question 9 was targeted at those that had previously used Guix but had stopped. It asked, You previously used Guix but stopped, why?

This was a comment question and we got some fantastic answers. There were 147 comments from participants, which lines up well with the 150 people who took the survey and classed themselves as a 'Previous user' or 'Previous contributor'.

This was a free-form text answer; the full comments are well worth reading through. As before, I've clustered the comments into themes:

  • Complexity of maintenance too high: many commented that the overall experience of using Guix was too time-consuming and complex. A slow configuration feedback loop, inefficiency, and the overall maintenance burden were all concerns. Example comments:
    • "I needed to switch to a distribution that required less of my attention when I started my new job. I switched to NixOS with the intention of going back to Guix at a later date, but I am now reliant on so many parts of the nix ecosystem that I don't think I'll ever actually switch back."
    • "I was doing more work trying to make my setup perfect or fix issues with it rather than working on my other projects. A lot of things with my setup either broke with time or were just not compatible (My setup couldn't handle printing, screen sharing, audio, suspending/hibernation and I just didn't know how to fix all that) and I couldn't deal with it any longer, I simply went back to whatever worked for me earlier."
  • Learning curve too difficult: many aspects of Guix are completely different from how other distributions achieve the same result. In some instances this learning curve was too difficult and/or there was not enough assistance. Example comments:
    • "Mainly the learning curve is huge for a long-time *nix systems user. I knew it would be difficult to adapt, but for each and every little thing I would need to go dig how to fix something. Doing proper power management on my laptop, setting up mail (I've been using Gnus for years, but still...!), compile and test mainline kernels on my laptop, etc. It's awesome to learn all those things, but they all require time. And that's where I had to give up: I wanted a (reliable) system I could use for my day-to-day work, Guix would be great... if I could spend a few weeks only learning it (and Lisp!)."
    • "But the problem ends up to be that the whole ecosystem around guix basically assumes super knowledge about what scheme is, how to use it and worse of all deep comfort and will to use emacs as the main interface to it all. It's too high of a hurdle to dedicate when just wanting to write some files, evaluate them, declare some packages, shells, etc. I have zero interest and will to use or learn emacs and putting it so much upfront does a huge disservice to the whole project."
  • Lack of drivers within the distribution: the lack of drivers to enable hardware was the most commented on specific issue. Some examples of those comments:
    • "As a long time Arch user I found it difficult to configure Guix for daily use. I need proprietary video drivers (and possibly other bits to get everything working?) and I don't remember if I ever got those up and running."
    • "I have a lot of respect for the technical side of the project, but the politics of free software absolutism (to the point where we are supposed to tell people to replace perfectly functional hardware in order to use Guix, instead of telling them about Nonguix) and the user hostile email based contribution workflow made me realize Guix would likely never reach critical mass, so my time is best applied elsewhere."
  • Unavailable proprietary software: proprietary software not being available was also mentioned (not quite as much as drivers), often in comments that focused on Guix not being practical as a distribution for professional use. Some specific comments:
    • "Lack of proprietary software, primarily CUDA, MKL, etc."
    • "Although I like FSF license purity, NixOS was much more amenable to get working on various hardware & did not preclude using Nvidia CUDA."
  • Efficiency and resource usage: there were comments about guix pull taking too long; whether the fault lay with guix pull locally or with remote servers, the overall experience was mentioned multiple times. Some example comments were:
    • "The core tooling was far too slow (e.g. pulling updates, etc.); Nix is slow, but nowhere near as slow as Guix (was back then, but I'm not aware of the kind of order of magnitude improvements that would have been required). Core functionality was not reliable enough for a server operating system (shepherd, logging, system rollback). Arcane contribution requirements (no provisions for non-Emacs users, e.g. regarding code formatting; baroque and counterproductive changelog and commit factoring requirements); I didn't mind the email/patch based workflow btw"
    • "Guix pull is too slow. The guix ci servers are inaccessible from my location, requiring a proxy. Guix System does not have a large enough community to be reliable and universal enough for daily use (in my opinion)"
  • Missing packages and services: there were lots of comments about both missing packages or services and this making it difficult to use Guix. Example comments:
    • "Much of the software I needed wasn't packaged, and it eventually became frustrating. I tried to package what I could, but some things felt extremely difficult, E.g., `jujutsu` `ghc`. However unfortunate it may be, I also rely on various pieces of nonfree software, and Guix was working against me in that regard. I do not like that I have to use nonfree software, but I often have no choice."
    • "Still use to some extent as package manager on foreign distro. For desktop use, waited for usable KDE Plasma packaging, and for laptop, coverage of working builds for ARM. Hoping to return; there is progress on both of these fronts. Size of store and speed of guix pull where also issues (on limited hardware)."
  • Out of date packages: meaning that although there was a package within Guix it was lagging, with particular concern about security implications. Example comments:
    • "Outdated or absent FOSS software (ex: Gnome, KDE, etc)"
    • "Too many packages updates were lagging behind, this was raising concerns for me from a security point of view"
  • Quality and reliability: general issues with quality and reliability that undermined users' belief that the project was ready for real use. Examples:
    • "An upgrade broke the system and crippled it from booting. Moved on to other distribution"
    • "I like the whole idea of guix. But it feels like it is not really ready."
  • Guix not fully supporting disk encryption: full disk encryption in a variety of forms came up multiple times as a Guix weakness. Examples:
    • "Guix does not support an unencrypted /boot partition. But also does not fully support LUKS2 due to grub."
    • "I love Guix System, but it still misses a few quality-of-life improvements, such as better support for full disk encryption on install (entering two passwords!) and faster servers for South America. I kid you not, it takes me several hours to install a base system with MATE!"
  • Missing guides/how-to's and examples: we've already seen that lack of specific how-to documentation was an issue; there were various comments to that effect:
    • "Examples were insufficient, documentation expected much more in-depth linux knowledge. I would like to try again using it, as I love the concepts of it and I find that I resonate with the people representing Guix, and while I am on NixOS currently I find some social aspects of the Nix project concerning."
    • "I switched back to NixOS due to more Community support"
  • Free Software as a constraint: Free Software and GNU as an organisation were commented on as a constraint to having a practical, usable system that met user's needs. Note that the next bullet is the reverse of this. Some example comments:
    • "No ease of access to the tools I depend on without jumping through hoops. VSCode, Chrome, Discord, all required flatpaks. Gnome was extremely out of date and didn't work well with flatpaks making it even harder to use them. NVIDIA drivers unavailable. I would have to work entirely around Guix to make it usable for the real world. I can't just convince my friends to stop using Discord. I can't just convince my job to not depend on VSCode extensions. I have spent my time using VSCode Calva for my personal Clojure projects as well. I would have to spend a lot of time creating my own repository and writing guix packages for everything just to make it usable for myself. The GNU should be trying to meet users where they are to help liberate them, instead of creating an alternate reality where user needs are not addressed. This is a non-starter in the year 2024."
    • "Exclusion of all references to non-free software (and no suggested step-by-step easy setup) made a full-featured initial installation untenable."
  • Not enough GNU: there were also some comments that the Guix project was not sufficiently supportive of GNU and/or Richard Stallman:
    • "I am disappointed that you veered off the course of freedom and added nonguix. Also that you hate on RMS."
    • "I stopped using Guix after it ran a campaign against Richard Stallman. I don't plan to return back."
  • Language ecosystem issues: as tools like Docker, and languages like Go and Rust become more important, friction with them is more of an issue for users:
    • "my use case is to package tooling for other distros and use it to build docker images reproducibly for use in CI environments. it does not work for this use case very well. can't run guix daemon inside a container"
    • "Lack of packages, stance on 100% reproducibility which makes packaging software with transitive dependencies hard, slow evaluator, obscure communication and collaboration mediums, patches take months to even get a review, cryptic error messages."
  • Nix is more modern or practical: many users seem to have explored Guix as an alternative to Nix. Example comments:
    • "I looked at Guix as an alternative to NixOS, and like its design a lot, but struggle with the 100% free software approach as I need some non-free software (for various reasons, hardware support, required by work, etc.). I'm aware of the non-guix channel which mostly solves this, but having to compile most things myself got too cumbersome for me — I wish there was a more complete substitute server for that channel, or perhaps even a derivation based on guix with a less strict free-software policy more akin to those of NixOS or debian."
    • "There were too many packages missing or so out of date as to be de-facto missing. Using Guix was therefore much harder to use than Nix, where I had more packages (both Free and non-Free) and they were more up to date."
  • Old-fashioned communications: there were some comments about communications within the project being old-fashioned, both from general users and those that had tried to contribute:
    • "There seems to be shortage of packages and slow development. Email or only free software is definitely an hindrance to many people to daily drive guix. It has become hit and miss for me, so staying with nixos as its rich and I can followup on its development easily on git repo, discourse, matrix and all."
    • "The main two reasons are that I find the irc/email/emacs flow very hard to work with and I do not feel safe in the mailing lists."
  • Unavailable on Mac OSX: there were a few comments that in a professional context the fact that Guix isn't available for MacOS made it difficult to use:
    • "Being unavailable on macOS. I have my nix home manager setup on both linux and macOS. Also the lack of a number of packages was a challenge. Like typst, bottom, hugo, tree, ruff, and sd for example. I am interested in becoming a maintainer but I want my setup to also work in macOS."
  • Incompatibility with hosting Linux distro: running Guix on top of another distribution was confusing, particularly for graphical programs:
    • "Guix home breaking Fedora. Troubles with binary applications due to the non-fsh nature."
    • "Setting up the package manager & daemon was confusing. The command "guix pull" felt excessively slow. A lot of packages were not up to date. Breaking the FHS"
  • Poor contributor experience: the patch process itself, slow reviews and inconsistency in response were all mentioned as issues. Examples:
    • "I still use Guix, but am a previous contributor. Important patches (for me) which I submitted were/are ignored, so I’ve stopped contributing."
    • "Perceived Inconsistent patch reviews. I did create couple of patches for guix, I do believe to contribute to project that I use. Sometimes I see patches getting stuck without feedback on them (not necessarily mine), the process to review patches is unclear to me and most likely to most people. Also guix lack automation to help everyone understand what is going on, if patches break rules, if this trivial change could be merged easily, etc. maybe it’s there for you, but I dont see that."
    • "I was passed over for commit access (even though I surpassed the 50 commit requirement) because I could only find 2 people to vouch for me, not 3. Then my patches stopped being merged, and some 2-year-pending patches I sent were closed without good reason. With the way Guix is run and how they treat contributors, it is an insulting/degrading process that I am no longer willing to put myself through."

As we can see there are a wide variety of reasons why users stopped using Guix; many of them are similar to the challenges that current users report, but they're even more powerfully felt by these users. It's really useful to have these themes and comments captured, as contributors may be able to pick up some of these issues and work to resolve them!

How important is Guix?

Focusing back on all users, the next question was, (Q10) How important is Guix in your computing environment?

There was a good range of answers:

Table 9: Guix's importance
Category | Count | Percentage
Not using | 97 | 10%
Tinkering | 156 | 17%
Slightly important | 147 | 16%
Moderately important | 194 | 21%
Important | 133 | 14%
Essential | 216 | 23%

A visual representation:

Figure 6: Guix's importance in users' computing environments

This is an interesting mixture which is probably reflective of many new users, and how Guix is used as a package manager on top of another distribution. Over a third of users consider it to be essential/important where it would be difficult to replace, while the bottom third are tinkering or exploring it.

Some thoughts

We've looked at the first 10 questions of the survey, which covered the composition of the Guix community, initial adoption and satisfaction, and challenges that led to users moving away from Guix. The first thing to say is how fantastic the response to the survey has been; it's amazing to have over 900 participants!

Some big take-aways:

  • Interest in declarative configuration and reproducibility, along with Scheme, Guile and Lisp, is bringing in lots of new users - around 50% have been using Guix for less than 2 years
  • Guix users are knowledgeable Linux users who are comfortable being hands-on with their system
  • Around 50% of users adopt Guix as a GNU/Linux distribution, 36% as a hosted package manager on top of another Linux distro
  • The survey produced great feedback from current and previous users on areas where the project can improve
  • Around 67% of users were satisfied (or very satisfied) with their initial adoption experience
  • Guix is essential or important for over a third of users, part of their environment for the next third, and being explored by the last 27% of users

The next post will cover more of the survey — which parts of Guix are most used, what sorts of deployments are being used, architectures and drivers details, and how users view contributing to the project financially.

by Steve George at Thursday, January 16, 2025

spritely.institute

Spritely is going to Guix Days and FOSDEM

Later this month, the entire development team at Spritely will be heading to Brussels to attend Guix Days and FOSDEM! We are all excited to take part in these conferences. They are a great opportunity for networking and sharing ideas, and I fully expect it will be a lovely time as well.

Guix Days is happening the two days prior to FOSDEM, Thursday January 30th and Friday the 31st. Spritely team members will be there to talk about our Guile libraries and learn from the community. One of our ongoing projects is to port the Guix Shepherd using Goblins, and we would love to discuss and receive feedback on our progress. We want to use this un-conference to strengthen our relationships with Guix and Guile developers and talk about how this community can grow into the future. If you are interested in using Hoot or Goblins in your project, get in contact with us or bring it up at the conference!

The FOSDEM conference will occur over the weekend, February 1st and 2nd. For those nearby or able to travel, I'm especially excited about some of the tables being set up, such as multiple groups working to bring Linux to cell phones, and representatives from CERN talking about the open source software they use.

Spritely submitted a number of proposals for topics we want to discuss at FOSDEM that we think are important and relevant to our work and mission. The organizers of FOSDEM seem to agree, as we were given two full hours of time in total to share our vision!! So please tune in, however you are able, and let us propose that a better social web is not only possible, but is already here.

Org mode Witchcraft at Spritely

Scheduled: 10:30-11:00 CET on Saturday
Presenter: Amy Grinn, Technical Administrator
Devroom: Tool the Docs
Room: K.4.201

Start off the conference right by attending my very own talk about how our organization uses Org mode in various contexts! This talk will cover a lot of ground, but the most important thing I want to get across is the freedom that Org mode gives us to express ourselves and convey our ideas.

Today's fediverse: a good start, but there's more to do

Scheduled: 17:30-17:40 CET on Saturday
Presenter: Christine Lemmer-Webber, Executive Director
Devroom: Social Web
Room: UA2.118 (Henriot)

Later on Saturday, Christine will give a lightning talk on how the fediverse has evolved in the seven years since ActivityPub's introduction. Christine will discuss the advances she thinks are most noteworthy, how she thinks fediverse applications can make use of content-addressing, and ways in which the spec falls short of providing a method for secure collaboration.

Object-Capability Security with Spritely Goblins for Secure Collaboration

Scheduled: 18:45-18:55 CET on Saturday
Presenter: Juliana Sims, Engineer
Devroom: Collaboration and Content Management
Room: H.1308 (Rolin)

"Secure collaboration" is a phrase you'll probably hear a lot from the Spritely team. In this lightning talk, Juliana will discuss the ways object-capability security -- exemplified by Goblins -- empowers people to more easily build healthier, more secure collaborative systems based on consent.

Minimalist web application deployment with Scheme

Scheduled: 10:40-11:10 CET on Sunday
Presenter: David Thompson, CTO
Devroom: Declarative and Minimalistic Computing
Room: H.1308 (Rolin)

Dave's talk will focus on an alternate web development toolchain being developed in part by Spritely. This approach is a minimalistic and developer-friendly take on what web development could be. In particular, this approach emphasizes reproducibility and simple bootstrapping which will be beneficial for self hosting and packaging for Linux distros. Our CTO will give a progress update on Hoot, our scheme-to-webassembly compiler, and also make the case that WebAssembly is the path toward more simplicity.

Shepherd with Spritely Goblins for Secure System Layer Collaboration

Scheduled: 13:50-14:10 CET on Sunday
Presenter: Juliana Sims, Engineer
Devroom: Declarative and Minimalistic Computing
Room: H.1308 (Rolin)

Juliana is busy working on porting the GNU Shepherd using the ideas of actors and object-capability security promoted by Spritely. This Goblins-based port will allow system administration over the network as well as more granular authorization control for services and daemons. Juliana will give a progress update and talk about the benefits of capability security.

Goblins: The framework for your next project!

Scheduled: 14:10-14:30 CET on Sunday
Presenter: Jessica Tallon, Engineer
Devroom: Declarative and Minimalistic Computing
Room: H.1308 (Rolin)

That's right, it's back-to-back Goblins talks. I promise you won't want to miss this one though because Jessica is very passionate about Goblins. She will make the case that it should be used for just about any new project you can think of. The benefits of writing a Guile-Goblins application range from easy networking to object-capability security and time-travel debugging. Jessica will go over these features in detail and describe what is on the horizon for Goblins.

Spritely and a secure, collaborative, distributed future

Scheduled: 14:30-14:50 CET on Sunday
Presenter: Christine Lemmer-Webber, Executive Director
Devroom: Declarative and Minimalistic Computing
Room: H.1308 (Rolin)

Immediately after the Goblins talks by Juliana and Jessica will be our Executive Director, Christine. She will talk more broadly about what Spritely's mission is and why we are so excited about it. She will also discuss how the close relationship formed between Spritely and Guile Scheme. As we work on building the next platform for peer-to-peer technology in Guile Scheme, we will continue to give back to the community through patches and public outreach.

Talks of Interest

While we love talking about Spritely here at Spritely, some other great speakers are also lined up that we're excited to see. In fact, there are too many to list here, but here are a few that you'll likely see us attending.

The Whippet Embeddable Garbage Collection Library

Scheduled: 12:50-13:20 CET on Sunday
Presenter: Andy Wingo, Igalia
Devroom: Declarative and Minimalistic Computing
Room: H.1308 (Rolin)

Andy Wingo is a friend of Spritely and is also the lead engineer for the Hoot project! This talk will cover another of Andy's compiler-related projects: a zero-dependency garbage collector named Whippet. Spritely is very interested in how the transition to Whippet as Guile's default GC will affect the performance of our software. I'm excited to see how his GC compares to earlier models of garbage collection and also how Whippet can be integrated into various runtimes besides just Guile.

The Shepherd: Minimalism in PID 1

Scheduled: 13:20-13:50 CET on Sunday
Presenter: Ludovic Courtès, Guix
Devroom: Declarative and Minimalistic Computing
Room: H.1308 (Rolin)

Ludovic Courtès is an inspiration to all of us at Spritely. He founded Guix more than a decade ago, and a large part of the Guix system is driven by the Shepherd. This talk will go into the complexity of daemons and services, and talk about why Guix chose the Shepherd as its init system. Even if you're not a Guix user, I think this will be interesting for anybody that has to deal with daemonized software.

A special note here that Juliana's talk about porting the Shepherd using Spritely Goblins will be immediately after this talk in the same room. So kick your feet up and stay a while!

Fediverse talks

Scheduled: 15:00-19:00 CET on Saturday
Presenter: Various Fediverse Citizens
Devroom: Social Web
Room: UA2.118 (Henriot)

Lastly, the Social Web track at FOSDEM 2025 is full of ActivityPub spec discussions that we think will be interesting. Spritely was born out of the work on the ActivityPub spec, and it's extremely rewarding and exciting to see how far people can take the concept of federation.


FOSDEM 2025 is going to be a lot of fun; we hope to see you there! Even if you can't make it in person, though, there are a lot of ways to take part.

by Amy Grinn (contact@spritely.institute) at Thursday, January 16, 2025

Wednesday, January 15, 2025

spritely.institute

Supporter drive goal complete! Time for a stretch goal!

Over the weekend Spritely marked a major milestone: we reached our goal for our very first supporter drive! With your help we managed to surpass our $80,000 USD goal with three weeks to go!

This is incredible! Before we go any further we should note clearly that this campaign is already a success whether we raise any more or not! Thank you to everyone who pitched in to help us with our goals to build the next generation of decentralized network tech!

With three weeks left in our campaign, we are introducing a stretch goal! Can we raise another $40,000 USD?

If we meet our stretch goal, we will unlock the following:

  • Spritely Goblins playground in the browser: Wanted to get into Goblins but don't want to install a bunch of complicated local tooling on your computer? Want to experience the magic of programming in Goblins (or have others experience it!) on a web page? Help us bring Spritely Goblins to everyone!
  • Cirkoban Deluxe!: Cirkoban is a fun game we made to show off an early version of Goblins + Hoot in the browser (see our writeup!) but also a lot of people have told us they just plain loved it as a game! If we meet our stretch goal we'll add more levels, new interactive components, and release a mini-guide on how to make your own levels!

We've chosen these two deliverables because one, the Spritely Goblins playground, advances our technology and demonstrates its utility, and the second one, Cirkoban Deluxe, is fun and exciting (and still demonstrates the utility of our technology, just with an even more fun version)!

Plus! Don't forget that the silver, gold, and diamond levels of our campaign allow you to get your name in the credits of our games. This means you have an even more fun version of Cirkoban to get your name into!

We've already succeeded in our goal, so no matter what happens, we are extremely grateful for all of your support! Thank you, thank you, thank you!

Let's see if we can make this stretch goal happen! Together we can accomplish anything! Thank you for helping us make the future of the internet happen!

by Christine Lemmer-Webber, Amy Grinn (contact@spritely.institute) at Wednesday, January 15, 2025

Monday, January 13, 2025

Andy Wingo

an annoying failure mode of copying nurseries

I just found a funny failure mode in the Whippet garbage collector and thought readers might be amused.

Say you have a semi-space nursery and a semi-space old generation. Both are block-structured. You are allocating live data, say, a long linked list. Allocation fills the nursery, which triggers a minor GC, which decides to keep everything in the nursery another round, because that’s policy: Whippet gives new objects another cycle in which to potentially become unreachable.

This causes a funny situation!

Consider that the first minor GC doesn’t actually free anything. But, like, nothing: it’s impossible to allocate anything in the nursery after collection, so you run another minor GC, which promotes everything, and you’re back to the initial situation, wash rinse repeat. Copying generational GC is strictly a pessimization in this case, with the additional insult that it doesn’t preserve object allocation order.

Consider also that because copying collectors with block-structured heaps are unreliable, any one of your minor GCs might require more blocks after GC than before. Unlike in the case of a major GC in which this essentially indicates out-of-memory, either because of a mutator bug or because the user didn’t give the program enough heap, for minor GC this is just what we expect when allocating a long linked list.

Therefore we either need to allow a minor GC to allocate fresh blocks – very annoying, and we have to give them back at some point to prevent the nursery from growing over time – or we need to maintain some kind of margin, corresponding to the maximum amount of fragmentation. Or, or, we allow evacuation to fail in a minor GC, in which case we fall back to promotion.

Anyway, I am annoyed and amused and I thought others might share in one or the other of these feelings. Good day and happy hacking!

by Andy Wingo at Monday, January 13, 2025

Thursday, January 9, 2025

Andy Wingo

ephemerons vs generations in whippet

Happy new year, hackfolk! Today, a note about ephemerons. I thought I was done with them, but it seems they are not done with me. The question at hand is, how do we efficiently and correctly implement ephemerons in a generational collector? Whippet’s answer turns out to be simple but subtle.

on oracles

The deal is, I want to be able to evaluate different collector constructions and configurations, and for that I need a performance oracle: a known point in performance space-time against which to compare the unknowns. For example, I want to know how a sticky mark-bit approach to generational collection does relative to the conventional state of the art. To do that, I need to build a conventional system to compare against! If I manage to do a good job building the conventional evacuating nursery, it will have similar performance characteristics as other nurseries in otherwise-unlike systems, and thus I can use it as a point of comparison, even to systems I haven’t personally run myself.

So I am adapting the parallel copying collector I described last July to have generational support: a copying (evacuating) young space and a copying old space. Ideally then I’ll be able to build a collector with a copying young space (nursery) but a mostly-marking nofl old space.

notes on a copying nursery

A copying nursery has different operational characteristics than a sticky-mark-bit nursery, in a few ways. One is that a sticky mark-bit nursery will promote all survivors at each minor collection, leaving the nursery empty when mutators restart. This has the pathology that objects allocated just before a minor GC aren’t given a chance to “die young”: a sticky-mark-bit GC over-promotes.

Contrast that to a copying nursery, which can decide to promote a survivor or leave it in the young generation. In Whippet the current strategy for the parallel-copying nursery I am working on is to keep freshly allocated objects around for another collection, and only promote them if they are live at the next collection. We can do this with a cheap per-block flag, set if the block has any survivors, which is the case if it was allocated into as part of evacuation during minor GC. This gives objects enough time to die young while not imposing much cost in the way of recording per-object ages.

Recall that during a GC, all inbound edges from outside the graph being traced must be part of the root set. For a minor collection where we just trace the nursery, that root set must include all old-to-new edges, which are maintained in a data structure called the remembered set. Whereas for a sticky-mark-bit collector the remembered set will be empty after each minor GC, for a copying collector this may not be the case. An existing old-to-new remembered edge may be unnecessary, because the target object was promoted; we will clear these old-to-old links at some point. (In practice this is done either in bulk during a major GC, or the next time the remembered set is visited during the root-tracing phase of a minor GC.) Or we could have a new-to-new edge which was not in the remembered set before, but now because the source of the edge was promoted, we must adjoin this old-to-new edge to the remembered set.

To preserve the invariant that all edges into the nursery are part of the roots, we have to pay special attention to this latter kind of edge: we could (should?) remove old-to-promoted edges from the remembered set, but we must add promoted-to-survivor edges. The field tracer has to have specific logic that applies to promoted objects during a minor GC to make the necessary remembered set mutations.

other object kinds

In Whippet, “small” objects (less than 8 kilobytes or so) are allocated into block-structured spaces, and large objects have their own space which is managed differently. Notably, large objects are never moved. There is generational support, but it is currently like the sticky-mark-bit approach: any survivor is promoted. Probably we should change this at some point, at least for collectors that don’t eagerly promote all objects during minor collections.

finalizers?

Finalizers keep their target objects alive until the finalizer is run, which effectively makes each finalizer part of the root set. Ideally we would have a separate finalizer table for young and old objects, but currently Whippet just has one table, which we always fully traverse at the end of a collection. This effectively adds the finalizer table to the remembered set. This is too much work—there is no need to visit finalizers for old objects in a minor GC—but it’s not incorrect.

ephemerons

So what about ephemerons? Recall that an ephemeron is an object E×K⇒V in which there is an edge from E to V if and only if both E and K are live. Implementing this conjunction is surprisingly gnarly; you really want to discover live ephemerons while tracing rather than maintaining a global registry as we do with finalizers. Whippet’s algorithm is derived from what SpiderMonkey does, but extended to be parallel.

The question is, how do we implement ephemeron-reachability while also preserving the invariant that all old-to-new edges are part of the remembered set?

For Whippet, the answer turns out to be simple: an ephemeron E is never older than its K or V, by construction, and we never promote E without also promoting (if necessary) K and V. (Ensuring this second property is somewhat delicate.) In this way you never have an old E and a young K or V, so no edge from an ephemeron need ever go into the remembered set. We still need to run the ephemeron tracing algorithm for any ephemerons discovered as part of a minor collection, but we don’t need to fiddle with the remembered set. Phew!
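
A minimal sketch of that promotion rule is below. Whippet itself is written in C, and the names here (GcObject, Ephemeron, MinorGc, EnsurePromoted) are hypothetical: this only illustrates the ordering just described, not Whippet’s actual implementation.

  // Hypothetical types, for illustration only.
  class GcObject { public bool IsOld; }

  class Ephemeron : GcObject
  {
      public GcObject Key;
      public GcObject Value;
  }

  class MinorGc
  {
      // Stand-in for copying an object into the old generation if it is still young.
      public void EnsurePromoted (GcObject obj)
      {
          if (!obj.IsOld) obj.IsOld = true;
      }

      // Key and value are promoted first, so the ephemeron is never older than
      // either of them and none of its edges becomes an unrecorded old-to-new edge.
      public void PromoteEphemeron (Ephemeron e)
      {
          EnsurePromoted (e.Key);
          EnsurePromoted (e.Value);
          EnsurePromoted (e);
      }
  }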

conclusion

As long as all promoted objects are older than all survivors, and all ephemerons are younger than the objects referred to by their key and value edges, Whippet’s parallel ephemeron tracing algorithm will efficiently and correctly trace ephemeron edges in a generational collector. This applies trivially for a sticky-mark-bit collector, which always promotes and has no survivors, but it also holds for a copying nursery that allows for survivors after a minor GC, as long as all survivors are younger than all promoted objects.

Until next time, happy hacking in 2025!

by Andy Wingo at Thursday, January 9, 2025

Thursday, January 2, 2025

Joe Marshall

Scheme Interpreter: Conclusions

This experiment with writing an MIT-Scheme S-code interpreter in C# was successful in these ways:

  • It showed that the S-code interpreter is an independent component of the Scheme system. The interpreter substrate can be replaced with a new implementation, written in a different language, using a different evaluation strategy, without replacing the Scheme runtime system written in Scheme.
  • It showed that the S-code interpreter can, on small segments of code, perform as fast as compiled code. However, growing the size of these small segments causes an exponential increase in the number of interpreter specializations. The obvious solution of automatically generating interpreter specializations on demand is the equivalent of JIT compilation.
  • It validated the idea that the lexical environment can be represented as a flattened vector of values. Mutable variables can be implemented by cell conversion. Variable values are copied from outer scopes to inner scopes when closures are created. The semantics of such an implementation is equivalent to the semantics of a nested series of frames as used in MIT-CScheme. (A minimal sketch of this idea follows this list.)
  • It showed that you can implement tail recursion via trampolines at each call site, and that you can implement first-class continuations by testing for a magic return value after the return of each trampoline. We don’t use the C# exception handling mechanism to unwind the stack when implementing first-class continuations, just a conditional branch and a normal return. This is far less complicated and expensive.
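
To make the flattened-environment idea concrete, here is a minimal sketch. The class and member names are hypothetical, not the ones used in the C# interpreter; the point is only that a closure captures a flat vector of values copied at creation time, with assigned variables boxed in cells so that the copies share state.

  class Cell
  {
      public object Value;                               // mutable variables live in heap-allocated cells
      public Cell (object initial) { Value = initial; }
  }

  class FlatEnvironment
  {
      readonly object[] slots;                           // one slot per variable visible in this scope

      public FlatEnvironment (object[] slots) { this.slots = slots; }

      public object Lookup (int index)
      {
          object slot = slots[index];
          return slot is Cell cell ? cell.Value : slot;  // read through the cell, if any
      }

      public void Assign (int index, object value)
      {
          ((Cell) slots[index]).Value = value;           // only cell-converted variables are assigned
      }

      // When a closure is created, the values of its free variables are copied
      // from the creating scope into a fresh flat vector; cells are copied by
      // reference, so assignments remain visible in every copy.
      public FlatEnvironment Close (int[] freeVariableIndexes)
      {
          object[] captured = new object[freeVariableIndexes.Length];
          for (int i = 0; i < captured.Length; i++)
              captured[i] = slots[freeVariableIndexes[i]];
          return new FlatEnvironment (captured);
      }
  }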

It was a failure in these ways:

  • Although it showed one way in which we could take incremental steps to increase the speed of the interpreter until it approached the speed of compiled code, each step resulted in an exponential increase in the number of specializations in the interpreter and had diminishing returns.
  • The ultimate outcome of this process would be an interpreter with thousands of specializations. Small Scheme programs could be completely represented by a single specialization, and they would be interpreted as fast as compiled code. But this is because the specialization is essentially a compiled version of the Scheme program. In other words, we ultimately will have an interpreter that “optimizes” by looking up a program in a huge table that maps small programs to their precomputed compiled bodies. This is just an unusual and inefficient way to implement a compiler.
  • Because C# offers no way to dump the heap in binary format, we must cold load the system each time we start it.
  • One of the tasks in the cold load is to initialize the Unicode tables. These are big tables that take a long time to initialize.
  • It took an annoyingly long time to get to Scheme’s initial top-level prompt.
  • Debugging crashes in the Scheme system was annoying and slow because we have to cold load the Scheme system to reproduce bugs.
  • I have understated a large component of the work: providing a new C# implementation for each of the hundreds of primitives in the Scheme runtime. I only bothered to implement those primitives called as part of the cold load boot sequence, but still there were a lot of them. For many of these primitives, the C# implementation just achieved the same effect “in spirit” as the MIT-CScheme implementation. These were easy to implement. But some were more persnickety where it was vital that the C# implementation produced exactly the same bits as the MIT-CScheme implementation. For instance, the code used to hash the types for generic method dispatch had to produce the exact same hash values in both implementations. This is because there is code that depends on the hashed multimethod ending up at a precomputed location in a method cache.
  • The C# interpreter was complete enough to boot a Scheme cold load and run it to the top-level prompt. It could run the top-level REPL. But much was missing. It could not host the SF program, which generates the S-code for the Scheme runtime. You’d have to run an original instance of MIT-CScheme to generate the S-code that you would then run in the C# interpreter.

I think the next Lisp system I will try should be based around a simple, portable JIT compiler.

by Joe Marshall (noreply@blogger.com) at Thursday, January 2, 2025

Calling Conventions in the Interpreter

C# is not tail recursive. It could be. The IL that it compiles to supports tail recursion, but the C# compiler doesn’t generate the tail call instruction. It would be a simple thing to add: when the compiler emits a call instruction, it could check if the next instruction is a return, and if so, emit a tail call instruction. This could be controlled by a compiler flag so only us weirdos who want this feature would enable it.

But until the C# compiler adds this feature, we have to resort to other methods. I chose to use a trampoline at each call site. This is a small segment of code that awaits the result of the function call. If the callee wants to tail call, it returns the tail call target to the caller, which performs the call on the callee’s behalf. This requires a modification to the calling conventions.

EvalStep is the virtual method that all S-code objects implement to perform an evaluation. Its signature is this:

abstract class Control : SchemeObject
{
     public abstract TailRecursionFlag EvalStep (out object answer, 
                                                 ref Control expression, 
                                                 ref Environment environment);
}

The result of the evaluation is returned in the answer parameter. This is an out parameter, so the answer is allocated in the caller and a pointer to it is passed to the callee. The callee returns the answer by modifying it in the caller’s stack frame.

The expression and environment parameters are the expected parameters for a call to Eval. They, too, are allocated in the caller’s frame and references to them are passed to the callee. The callee is allowed to modify the caller’s values of these variables.

The returned value is a TailRecursionFlag. This is either 1, indicating that a value has been returned in the answer, or 0, indicating that the caller should perform another EvalStep. To return a value, the callee modifies the answer. To perform a tail call, the callee modifies the expression and environment references and returns 0.
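
As a concrete example of the value-returning half of this protocol, here is a minimal sketch of a leaf S-code node: a hypothetical Quotation whose evaluation is simply its saved datum (the class name and field are illustrative, not necessarily the interpreter’s actual ones).

  class Quotation : Control
  {
      readonly object datum;

      public Quotation (object datum) { this.datum = datum; }

      public override TailRecursionFlag EvalStep (out object answer,
                                                  ref Control expression,
                                                  ref Environment environment)
      {
          answer = datum;                        // deposit the value in the caller's frame
          return TailRecursionFlag.ReturnValue;  // the caller's trampoline loop stops
      }
  }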

Any caller must call EvalStep as follows: The caller allocates an answer variable to receive the answer of the call. It also allocates expression and environment variables to pass to the callee. It then calls EvalStep in a loop until the callee returns a TailRecursionFlag of ReturnValue, indicating that the answer has been set to the return value.

In the EvalStep for an S-code Conditional we see an example of the calling convention:

  object ev;
  Control unev = predicate;
  Environment env = environment;

  while (unev.EvalStep (out ev, ref unev, ref env) == TailRecursionFlag.TailCall) { };

We are making a recursive call to evaluate the predicate. We set up ev to receive the result of the evaluation. We set up unev and env to hold the expression and environment to pass to EvalStep. unev.EvalStep does the eval dispatch via virtual function dispatch.

If the predicate returns a TailRecursionFlag of ReturnValue, the loop will exit. The predicate is assumed to have put the return value in the ev variable.

If the predicate wants to tail call, it will modify the values of unev and env to the new expression and new environment, and return a TailRecursionFlag of TailCall. The loop will iterate, using the new value of unev and env to again dispatch a call to EvalStep.

When the while loop exits, the ev variable will contain the return value of the predicate. Control may be returned to the while loop several times before the loop exits. This is the trampoline in action.

Conditional expressions don’t return a value. They either tail call the consequent or the alternative. The EvalStep for a conditional ends like this:

  answer = null;
  expression = (ev is bool evb && evb == false) ? alternative : consequent;
  return TailRecursionFlag.TailCall;
}

The answer variable in the caller is set to null. out parameters must always be assigned to before the function exits, so this just keeps the compiler happy. If the value returned by calling EvalStep on the predicate is the boolean false, we set the expression in the caller to the alternative, otherwise to the consequent. This is the target of our tail call to EvalStep. For the S-code of a conditional, we leave the environment alone — the tail call uses the same environment unchanged. We finally return TailRecursionFlag.TailCall so that the caller’s trampoline makes another iteration around its while loop. It will call EvalStep on the alternative or consequent that we stuffed into the caller’s expression.

This little song and dance is performed at every recursive call to EvalStep making EvalStep behave as a tail-recursive function. This calling convention is about half the speed of a normal C# method call. It is the cost of using a trampoline for tail recursion.

First Class Continuations

There is one more obscure reason that the control might return to us when evaluating the predicate. If some function further down the call chain invokes call-with-current-continuation, we need to copy the stack. The callee indicates this by returning the magic value Special.UnwindStack in the answer. The callee also sets the caller’s environment to an UnwinderState that will accumulate the stack frames as we unwind the stack. So our calling convention says we need to check the answer returned by EvalStep, and if it is Special.UnwindStack, we allocate a ConditionalFrame on the heap that will contain the state of the current stack frame and add it to the UnwinderState with AddFrame. We propagate the unwind up the stack by putting the UnwinderState in the caller’s environment, setting the caller’s value of answer to Special.UnwindStack, and returning TailRecursionFlag.ReturnValue to stop the caller’s trampoline loop.

The full code of EvalStep for an S-code if expression is this:

 public override TailRecursionFlag EvalStep (out object answer, 
                                             ref Control expression,
                                             ref Environment environment)
{
    object ev;
    Control unev = predicate;
    Environment env = environment;

    // Tail recursion trampoline.
    while (unev.EvalStep (out ev, ref unev, ref env) == TailRecursionFlag.TailCall) { };
    // Support for first class continuations.
    if (ev == Special.UnwindStack)
    {
        ((UnwinderState) env).AddFrame (new ConditionalFrame (this, environment));
        environment = env;
        answer = Special.UnwindStack;

        return TailRecursionFlag.ReturnValue;
    }

    // Tail call EvalStep on the consequent or alternative.
    answer = null;
    expression = (ev is bool evb && evb == false) ? alternative : consequent;
    return TailRecursionFlag.TailCall;
}

First class continuations allow you to unload and reload the pending call chain. We see that at each call site, we must check the returned answer and, if it is Special.UnwindStack, create a new Frame on the heap and add it to the unwinder state before we propagate the Special.UnwindStack up the call chain.

At the very top of the call chain, we have the outermost call to EvalStep. If the Special.UnwindStack value is returned to this call, the stack has been unwound and the UnwinderState is sitting in the environment variable. We need to rewind the stack and put the stack frames back on the stack. We create a RewindState from the UnwinderState. Each time we PopFrame from the RewindState, we get a deeper frame. We reload the stack by getting the outermost frame from the RewindState and calling EvalStep on it. The EvalStep for a Frame sets up the trampoline loop, calls PopFrame to get the next frame, and calls EvalStep on it. When we run out of stack frames to reload, the stack is reloaded and we return control to the innermost frame so it can continue where it left off. This is the rewind loop.
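
The outermost driver is not shown in the post, but based on the description above a sketch of it might look like the following. The RewindState constructor taking an UnwinderState is an assumption; only the overall shape is intended.

// Hypothetical top-level driver loop (a sketch, not the actual code).
object Run (Control program, Environment globalEnvironment)
{
    object answer;
    Control expression = program;
    Environment environment = globalEnvironment;

    while (true)
    {
        // The usual trampoline.
        while (expression.EvalStep (out answer, ref expression, ref environment)
               == TailRecursionFlag.TailCall) { };

        if (answer != Special.UnwindStack)
            return answer;

        // The stack has been unwound; the UnwinderState is in environment.
        // Build a RewindState, install it as the environment (the frames'
        // EvalStep methods cast it back), and evaluate the outermost frame,
        // which reloads the stack one frame at a time: the rewind loop.
        RewindState rewind = new RewindState ((UnwinderState) environment);  // constructor assumed
        environment = rewind;
        expression = rewind.PopFrame ();
    }
}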

The EvalStep for a Frame, after making the recursive call to EvalStep on the next frame, continues with code that is a duplicate of the code in the original frame before the continuation was captured. A specific example will make this clear. If an if expression is on the stack when it is unwound, a ConditionalFrame is created. A ConditionalFrame is a subclass of SubproblemFrame which has this EvalStep method:

public override TailRecursionFlag EvalStep (out object answer,
                                            ref Control expression,
                                            ref Environment environment)
{
    object temp;
    Control expr = ((RewindState) environment).PopFrame ();
    Environment env = environment;
    while (expr.EvalStep (out temp, ref expr, ref env) == TailRecursionFlag.TailCall) { };
    if (temp == Special.UnwindStack)
    {
        ((UnwinderState) env).AppendContinuationFrames (continuation);
        environment = env;
        answer = Special.UnwindStack;

        return TailRecursionFlag.ReturnValue;
    }
    expression = this.expression;
    environment = this.environment;
    return Continue (out answer, ref expression, ref environment, temp);
}

public abstract TailRecursionFlag Continue (out object answer,
                                            ref Control expression,
                                            ref Environment environment,
                                            object value);

That is, the EvalStep of the SubproblemFrame establishes a trampoline, pops the next frame from the RewindState, and invokes its EvalStep method. When an answer is returned, the SubproblemFrame calls its Continue method.

The Continue method is a virtual method that is implemented by each subclass of SubproblemFrame. It finishes the work of the frame. In the case of a ConditionalFrame, the Continue method is this:

public override TailRecursionFlag Continue (out object answer,
                                            ref Control expression,
                                            ref Environment environment,
                                            object value)
{
    answer = null;
    expression = value is bool bvalue && bvalue == false
      ? SCode.EnsureSCode (this.expression.Alternative)
      : SCode.EnsureSCode (this.expression.Consequent);
    return TailRecursionFlag.TailCall;
}
Compare this to the code in the original Conditional:
    // Tail call EvalStep on the consequent or alternative.
    answer = null;
    expression = (ev is bool evb && evb == false) ? alternative : consequent;
    return TailRecursionFlag.TailCall;

There are only superficial differences: the Continue method gets the value returned by the predicate in an argument rather than in a local variable. It type checks the alternative and consequent components of the if expression by calling SCode.EnsureSCode. Otherwise, the code does the same thing.

It is not possible to actually rewind the stack with the original set of pending methods. What we do instead is rewind the stack with methods that do the same thing as the original pending methods. It is close enough. The same values will be computed.

There is one place you can see the difference. If you look at the stack trace in the debugger before you capture a continuation, you will see the pending recursive calls to the S-code EvalStep methods. If you look at the stack trace in the debugger after you capture a continuation, you will instead see pending calls to the EvalStep methods of a set of frames. The pending frames are in the same order and have names similar to the original pending methods. They compute the same values, too. But the debugger can notice that these are not the originals.

by Joe Marshall (noreply@blogger.com) at Thursday, January 2, 2025

Wednesday, January 1, 2025

Joe Marshall

More Inlining

Calls to (null? x) usually appear as the predicate to a conditional. We can specialize the conditional. Instead of

[if
  [primitive-null?-argument0]
  [quote 69]
  [quote 420]]

We create a new S-code construct, if-null?-argument0, and construct the conditional as

[if-null?-argument0 
  [quote 69]
  [quote 420]]

We avoid a recursive call, and we avoid generating a ’T or ’NIL value only to test it: we just test for null and jump to the appropriate branch, just as the compiled code would have done.

Multiple arguments

We can further specialize the conditional based on the types of the consequent and alternative. In this case, they are both quoted values, so we can specialize the conditional to [if-null?-argument0-q-q 69 420]. (Where the name of the S-code type is derived from the type of the consequent and alternative.)

if-null?-argument0-q-q is an esoteric S-code type that codes a test of the first argument for null, and if it is null, returns the first quoted value, otherwise the second quoted value. This S-code type runs just as fast as compiled code. Indeed the machine instructions for evaluating this S-code are the same as what the compiler would have generated for the original Lisp form.
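
As an illustration, the evaluator for such a node might look like the sketch below. The class name is spelled without the “?”, Nil.Instance stands in for however the interpreter actually represents the empty list, and GetArg0 is the argument accessor mentioned in the Specialized Variables post later on this page; all of these are assumptions, and the continuation-capture bookkeeping described in the other posts is omitted.

// Sketch of [if-null?-argument0-q-q consequent alternative].
class IfNullArgument0QQ : Control
{
    readonly object consequent;   // quoted value returned when argument 0 is the empty list
    readonly object alternative;  // quoted value returned otherwise

    public IfNullArgument0QQ (object consequent, object alternative)
    {
        this.consequent = consequent;
        this.alternative = alternative;
    }

    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        // No recursive evaluation and no dispatch: test argument 0 and
        // return the appropriate quoted value directly.
        answer = (environment.GetArg0 () == Nil.Instance) ? consequent : alternative;
        return TailRecursionFlag.ReturnValue;
    }
}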

But there is a problem

Why not continue in this vein specializing up the S-code tree? Wouldn’t our interpreter be as fast as compiled code? Well it would, but there is a problem. Every time we add a new S-code type, we add new opportunities for specialization to the containing nodes. The number of ways to specialize a node is the product of the number of ways to specialize its children, so the number of ways to specialize the S-code tree grows exponentially with the number of S-code types. The few specializations I’ve just mentioned end up producing hundreds of specialized S-code types. Many of these specialized S-code types are quite esoteric and apply at most to only one or two nodes in the S-code tree for the entire program and runtime system. Performing another round of inlining and specialization would produce thousands of specialized S-code types — too many to author by hand, and most of which would be too esoteric to ever be actually used.

The solution, of course, is to automate the specialization process. We only generate a specialized S-code type when it is actually used by a program. The number of specialized S-code types will be limited by the number of ways programs are written, which is linear in the size of the program.

But specializing the code when we first encounter it is just JIT compiling the code. We’ve just reinvented the compiler. We might as well skip the multi-level specialization of the S-code tree and write a simple JIT compiler.

by Joe Marshall (noreply@blogger.com) at Wednesday, January 1, 2025

Inlining Primitive Function Calls and Argument Evaluation

Inlining some primitives

Reconsider our model of a Lisp program as a “black box” that issues a series of primitive function calls. We can eliminate some of the primitive function calls by implementing them directly within our “black box”. We inline some primitives.

Take null? as an example. Instead of constructing (null? arg) as

[primitive-funcall1
  [quote [primitive null?]]
  [argument 0]]

we add a new S-code construct, primitive-null?, and construct (null? arg) as

[primitive-null?
  [argument 0]]

We don't have to evaluate the function. We don't even have to jump to the entry point of the null? primitive. After we evaluate argument 0, we just test for null in the interpreter and return T or NIL as appropriate.

There are maybe 20 or so primitives that are used frequently enough to be worth inlining in this way. Each primitive you inline like this adds bloat to the interpreter.
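
A sketch of what such a node could look like, reusing the calling convention from the earlier posts. Nil.Instance is an assumption standing in for the interpreter's representation of the empty list, and the Special.UnwindStack check for continuation capture is omitted to keep the sketch short.

// Sketch of [primitive-null? <argument>]: the argument subform is still
// evaluated through the normal trampoline, but the function lookup, the
// apply dispatch, and the jump to the primitive are all gone.
class PrimitiveIsNull : Control
{
    readonly Control argument;

    public PrimitiveIsNull (Control argument)
    {
        this.argument = argument;
    }

    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        object ev;
        Control unev = argument;
        Environment env = environment;

        // Evaluate the argument with the usual trampoline.
        while (unev.EvalStep (out ev, ref unev, ref env) == TailRecursionFlag.TailCall) { };

        // Test for null right here in the interpreter and return a boolean
        // (the conditional's EvalStep tests the answer with "ev is bool").
        answer = (ev == Nil.Instance);
        return TailRecursionFlag.ReturnValue;
    }
}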

Inlining simple arguments

The leaves of a tree of S-code are the atomic expressions, whereas the internal nodes of the tree are compound expressions. We can eliminate the leaves by inlining them into their parent nodes. For example if a leaf node is a lexical variable reference, we inline this into the parent node. We unroll the evaluation of the leaf node thus saving a recursive call to the interpreter and an evaluation dispatch.

Consider our previous example, which we constructed as

[primitive-null?
  [argument 0]]

We further specialize primitive-null? based on its argument type into primitive-null?-argument or primitive-null?-lexical. Now our S-code becomes:

[primitive-null?-argument 0]

The leaf node [argument 0] is absorbed into the parent node [primitive-null? ...] making a new leaf node [primitive-null?-argument]. The evaluator for this S-code node simply tests if argument 0 is null and returns T or NIL as appropriate.

Compare this to the original S-code:

[funcall
  [global 'null?]
  [argument 0]]

This required two recursive calls to the interpreter, a global lookup, and a primitive function call on top of the null test. We’ve eliminated all of those. There’s not much left to do. Testing null? in the interpreter is almost as fast as testing null? in compiled code.

The number of S-code types needed to perform this inlining is the number of kinds of leaf nodes raised to the power of the number of leaves in the parent node. A call to a one-argument primitive would need specializations for the cases where the argument is a quoted value, an argument, a lexical variable or a global variable — four specializations. Calls to a two-argument primitive turn into one of sixteen specializations — the product of four for each argument. A call to a three-argument primitive would turn into one of sixty-four specializations.

We can inline all the non-compound argument evaluations, both to primitive functions and to user-defined functions. In our S-code tree, we have removed all the leaf nodes and absorbed them into their parent nodes (which have now become new leaves). The interpreter is now quite a bit faster, although still not as fast as compiled code.

by Joe Marshall (noreply@blogger.com) at Wednesday, January 1, 2025

Usual Integrations, Function Calls and Primitive Calls

Near the beginning of most MIT-Scheme files you'll see (declare (usual-integrations)). This instructs the SF program, which translates Scheme to S-code, to replace the “usual” global variables with their (compile time) values. You lose the ability to change the values of these variables, but who redefines CAR or CDR? (And if you do, just remove the declare form or add CAR and CDR to the exception list.)

Now forms such as (null? x), which used to be syntaxed into this S-code:

[funcall
   [global ’null?]
   [argument 0]]

are now syntaxed into this S-code:

[funcall
   [quote [primitive null?]]
   [argument 0]]

Recall that S-code is a tree representation of Lisp code that is walked by the interpreter. The leaves of the tree are all either quote, global, argument, or lexical, and each of these only take one or two machine instructions to eval. The “overhead” of interpretation involves getting to the leaves.

The internal nodes of an S-code tree are the compound forms: function calls and special forms. IF and PROGN (in Scheme BEGIN) are the two most frequently evaluated special forms. When we construct the S-code for a compound form, we pattern match against the subforms and specialize the S-code. Specialized S-code for a compound form is called a “superoperator”. Superoperators eliminate the eval dispatch for the subforms. The machine code for interpreting a superoperator is close to the equivalent compiled code. Let us examine some of the most important superoperators.

Funcall Argument List Elimination

In general, a function call accumulates a list of arguments and jumps to APPLY to dispatch the function. We can eliminate the overhead of doing this. First, we can eliminate the call to APPLY by having applicable objects implement the applicable interface. We use virtual function dispatch to invoke the code that implements the applicable object. Applicable objects can be applied to any number of arguments passed as a list. We can eliminate the overhead of manipulating short argument lists by spreading the arguments and adding call0, call1, call2, and call3 methods to the applicable interface. The number of arguments at any particular call site is fixed, and it is usually a small number. We eliminate the overhead of accumulating the argument list for a function call by specializing the funcall S-code object on the number of arguments.

An S-code funcall is replaced by a funcall0, funcall1, funcall2, funcall3, or funcalln depending on the number of arguments. The argument accumulation loop is unrolled in the numbered cases placing the arguments in local C# variables. The funcallx superoperators directly call the appropriate calln method in the function’s applicable interface.
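
The applicable interface is not spelled out in the post; the sketch below shows the general shape being described, with the spread-argument entry points alongside the list-based one (written here with C# capitalization). The exact signatures are assumptions; presumably each entry point follows the same trampoline convention as EvalStep so that a closure body can be tail called.

// Sketch of an applicable interface with spread-argument entry points.
interface IApplicable
{
    // General case: arguments passed as a list.
    TailRecursionFlag Apply (out object answer, ref Control expression,
                             ref Environment environment, object arguments);

    // Specialized entry points for small, fixed argument counts.
    TailRecursionFlag Call0 (out object answer, ref Control expression,
                             ref Environment environment);
    TailRecursionFlag Call1 (out object answer, ref Control expression,
                             ref Environment environment, object arg0);
    TailRecursionFlag Call2 (out object answer, ref Control expression,
                             ref Environment environment, object arg0, object arg1);
    TailRecursionFlag Call3 (out object answer, ref Control expression,
                             ref Environment environment, object arg0, object arg1, object arg2);
}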

Primitive Function Call

Function calls to primitive procedures are the foundation of Lisp and Scheme programs. We can look at a Lisp program as if it were a “black box” that makes a series of primitive function calls. If unoptimized, a Lisp program ought to make the same series of primitive function calls with the same values whether it is interpreted or compiled. The primitive function will run at the same speed no matter how it is called, so if compiled code is faster than interpreted, it is issuing the primitive function calls faster, taking less time between them. The time between issuing primitive function calls is the interpretation overhead, and we want to reduce that.

When we construct an S-code funcall (and funcall0, funcall1, etc.), we check if the thing being called is a literal quoted primitive function, and if it is, we replace the S-code with a primitive-funcall (or primitive-funcall0, primitive-funcall1, etc.). A primitive-funcall evaluates its arguments, but it can skip evaluating the function. It can skip the apply dispatch, too, and just jump right to the primitive entry point.

We construct (null? arg) as

[primitive-funcall1
  [quote [primitive null?]]
  [argument 0]]
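
A sketch of the one-argument case follows. The Primitive type and its Call1 entry point are assumptions about how the interpreter represents primitive procedures and their entry points; the continuation-capture check is again omitted.

// Sketch of [primitive-funcall1 [quote <primitive>] <argument>]: the
// operator is a known primitive at syntax time, so there is nothing to
// evaluate and no apply dispatch; we go straight to the entry point.
class PrimitiveFuncall1 : Control
{
    readonly Primitive procedure;   // assumed representation of a primitive
    readonly Control argument0;

    public PrimitiveFuncall1 (Primitive procedure, Control argument0)
    {
        this.procedure = procedure;
        this.argument0 = argument0;
    }

    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        object ev;
        Control unev = argument0;
        Environment env = environment;

        // Evaluate the single argument with the usual trampoline.
        while (unev.EvalStep (out ev, ref unev, ref env) == TailRecursionFlag.TailCall) { };

        // Jump right to the primitive's entry point.
        answer = procedure.Call1 (ev);   // Call1 is an assumed method
        return TailRecursionFlag.ReturnValue;
    }
}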

The MIT-Scheme interpreter and the MIT-Scheme hardware both special-case function calls with small numbers of arguments and calls to primitive procedures.

In the next post, we develop this idea beyond what MIT-Scheme already does.

by Joe Marshall (noreply@blogger.com) at Wednesday, January 1, 2025

Tuesday, December 31, 2024

Joe Marshall

Eliminating Deep Search

In general, when you look up a lexical variable, you have to do a deep search. You start with the innermost scope and work your way out until you find the variable you are looking for. This takes an absurd amount of time for each variable lookup, so we want to avoid actually doing it.

As mentioned in the previous post, we special case the two most common cases: when the variable is in the innermost scope and when the variable is in the global scope. In the other cases, when the variable is in an intermediate scope, we can flatten the scopes by copying the variable values from inner scopes to outer scopes whenever the variable is closed over. This only works because we have put the mutable variables in cells and we copy pointers to the cells.
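
The cell class itself is not shown in these posts; conceptually it is just a one-field box, something like the sketch below. Every closure or environment that captures a mutable variable holds a reference to the same cell, so copying the cell pointer preserves assignment semantics.

// A minimal sketch of a value cell for a mutable (SET!-able) variable.
class ValueCell
{
    public object Value;   // current value of the variable

    public ValueCell (object initialValue) { Value = initialValue; }
}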

When an S-code lambda object is created, we walk the code and find the variables that are closed over. We create a closure with a copy of the variable’s value (or a copy of its cell). In essence, we replace:

(lambda (x) (+ x y))

(assuming y is a lexical variable) with

(%make-closure ’(lambda (x) (+ x y)) y)

Once we have done this, we no longer need to deep search for lexical variables. There is only one lexical frame, it contains a copy of the variable, and we know the offset of the variable in the frame. Lexical variable lookup is now just a vector-ref.

The drawback is that we have cell-converted the mutable variables, so calling a function that contains mutable variables will allocate memory. Use of SET! makes your code expensive. Furthermore, %make-closure is linear in the number of things being closed over. But typically, closures only close over one or two items.

So between these specializations (this post and the two previous ones), we have removed the overhead of variable lookup. An expression like (lambda (x) (+ y x)) (where y is a lexical variable) is converted into S-code that looks like:

[%make-closure
  [quote (lambda ’(x)
            [funcall
              [global ’+]
              [lexical 0]
              [argument 0]])]
  [argument n]  ;; wherever ’y is in the outer scope
  ]

Evaluating variables is the bulk of the work in a Lisp program. Now both the interpreter and compiler can evaluate a variable with one or two machine instructions. Interpreted code is still slower than compiled code, but this isn’t a bottleneck anymore.

Now we look for other bottlenecks. Watch this space.

by Joe Marshall (noreply@blogger.com) at Tuesday, December 31, 2024

Specialized Variables

In general, when you look up a variable, you have to do a deep search through the lexical contours. This is because someone could have captured a first-class environment and evaled a define in it that shadows a variable.

However, this is a very rare case. In the vast majority of cases, the variable is either a global variable or lexical variable bound in the immediately enclosing environment. We can optimize for these cases.

Symbols refer to three different kinds of variable: if the variable is free, it refers to a global or “top-level” variable; if it is bound, it may be bound in the immediately enclosing environment (i.e., it is an argument), or it may be bound in an environment further up the lexical chain. We can determine which of these cases applies statically when we construct the S-code and select the appropriate subtype of S-code variable.

A global or top-level variable can contain a pointer to the value cell of the variable in the global environment, so variable lookup simply involves a dereference. If the variable is lexically bound, and the environment is a StaticEnvironment (or subtype), then the lexical address of the variable is known statically and cannot change (the variable cannot be shadowed), so we can store the lexical address in the S-code variable. Lookup is faster because it involves constant offsets.

The variable corresponding to argument 0 of the current lambda is by far the most commonly accessed variable, followed by argument 1. We can create special S-code types for each of these two variables in addition to the three types for global, lexical, and the other arguments. By classifying the variables into one of these five types at syntax time, we can greatly reduce the amount of work needed at runtime to access the variable.

So consider the form (lambda (n) (lambda (x) (+ x n))). n is a lexical variable bound one level deep. x is an argument and + is a global. The S-code representation of this form will have three different variable types. The S-code for n will be a LexicalVariable with a depth of 1 and offset of 0. The eval method for LexicalVariable can walk back one level in the lexical chain and then access the zeroth element of the environment.

On the other hand, the S-code for x will be an ArgumentVariable with an argument number of 0. The eval method for ArgumentVariable can call the GetArg0 method of the environment.

Finally, the S-code for + will be a GlobalVariable with a pointer to the global value cell for +. The eval method for this type of S-code variable can simply dereference the pointer.
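
Putting those three cases together, the specialized variable nodes could look roughly like the sketch below. Parent and ArgumentValue are assumed accessors on Environment (the posts only name GetArg0 explicitly), ValueCell is the cell type sketched earlier on this page, and error handling for unassigned variables is omitted.

// Sketches of the three specialized variable nodes described above.
class LexicalVariable : Control
{
    readonly int depth;    // how many frames to walk back
    readonly int offset;   // position within that frame

    public LexicalVariable (int depth, int offset)
    {
        this.depth = depth;
        this.offset = offset;
    }

    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        Environment env = environment;
        for (int i = 0; i < depth; i++) env = env.Parent;   // assumed accessor
        answer = env.ArgumentValue (offset);                // assumed accessor
        return TailRecursionFlag.ReturnValue;
    }
}

class Argument0Variable : Control
{
    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        answer = environment.GetArg0 ();
        return TailRecursionFlag.ReturnValue;
    }
}

class GlobalVariable : Control
{
    readonly ValueCell cell;   // pointer to the global value cell

    public GlobalVariable (ValueCell cell) { this.cell = cell; }

    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        answer = cell.Value;   // a single dereference
        return TailRecursionFlag.ReturnValue;
    }
}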

Between specializing environments and S-code variables, we can greatly reduce the amount of time needed to access variables.

by Joe Marshall (noreply@blogger.com) at Tuesday, December 31, 2024

Specialized Environments

A Lisp program conceptually does a lookup every time it evaluates a symbol. Since evaluating symbols is the single most common operation, you want it to be as fast as possible. In compiled code, symbol values might be stored in registers, on the stack, or in a vector, but in interpreted code they are usually stored in a structure called an environment. In compiled code, the symbol value can often be retrieved by a move instruction, but in interpreted code there may be a function call and a vector-ref needed to get the value. This is a significant performance hit.

We can customize the layout of an environment to speed up symbol lookup. In the general case, an environment is a fairly heavyweight data structure. Because you can eval a define form at any time, you need an environment that supports incremental definition and you need to deep search each time you do a lookup because someone might have defined a shadowing variable.

But 99% of the time, you don’t need the general case. If no one captures a first-class environment, then you don’t need to support incremental definitions because no one can call eval to define a new variable. If no one mutates a variable, then you can store a copy of the variable’s value directly in the environment instead of indirectly through a ValueCell object. Most environments only close over a couple of variables, so you can make the variables fields of the environment object instead of storing them in a vector of values. You statically know the number of variables, so you can store them in a fixed-size class instance.

An Environment is an abstract base class.

A GlobalEnvironment is a singleton class that represents bindings in a hash table.

A StandardEnvironment is a concrete subclass of Environment that supports assignment and incremental definitions. The bindings are kept in a vector of ValueCell objects and the incremental bindings are kept in a hash table. This is a general purpose environment.

class StandardEnvironment : public Environment {
  Vector<ValueCell*> bindings;
  HashTable<ValueCell*> incrementalBindings;
};

A variable lookup in a StandardEnvironment involves fetching the vector of value bindings from the environment, looking up the variable in the vector of bindings, and if it is not found, looking it up in the hash table of incremental bindings, and finally fetching the value from the cell. Several levels of indirection are involved.

A StaticEnvironment supports assignment, but not incremental definitions. Bindings are kept in a vector of ValueCell objects.

class StaticEnvironment : public Environment {
  Vector<ValueCell*> bindings;

  Object* GetArg0() { return bindings[0]->Value; }

};

Looking up a value in a StaticEnvironment is slightly quicker because there is no table of incremental bindings to search.

An ImmutableEnvironment holds bindings in a vector and does not support assignment or incremental definitions. Binding values are kept directly instead of indirectly through ValueCell objects.

class ImmutableEnvironment : public Environment {
  Vector<Object*> bindings;

  Object* GetArg0() { return bindings[0]; }

};

Looking up a variable in an ImmutableEnvironment is quicker because there are no ValueCell objects to dereference.

A SimpleEnvironment is an abstract subclass of ImmutableEnvironment that does not support assignment or incremental definition. Bindings are kept in class fields.

A SimpleEnvironment0 is a concrete subclass of SimpleEnvironment that holds no bindings — these are created when you invoke a thunk. A SimpleEnvironment1 is a concrete subclass of SimpleEnvironment that holds one binding, and so on up to SimpleEnvironment3.

class SimpleEnvironment2 : public SimpleEnvironment {
  Object* var0;
  Object* var1;

  Object* GetArg0() { return var0; }

};

Looking up a variable in a SimpleEnvironment is quicker because you don’t have to indirect through a vector of bindings. A method as simple as GetArg0(), which simply returns a field in the instance, can often be inlined.

Environments are created by applying closures. There are subtypes of closures that correspond to the different kinds of environments. For example, a SimpleClosure2 is a closure that creates a SimpleEnvironment2.

Closures are created by evaluating lambda expressions. There are subtypes of lambda expressions that correspond to the different kinds of closures. For example, a SimpleLambda3, when evaluated, will create a SimpleClosure3.
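
A sketch (in the C# style of the earlier posts rather than the pseudocode above) of how applying a SimpleClosure2 could create its environment. The Call2 method follows the trampoline convention: instead of calling the body directly, it installs the body and the fresh environment in the caller's ref parameters and asks the caller's loop to continue. The field names and the SimpleEnvironment2 constructor are assumptions.

// Sketch: a two-argument closure that builds a SimpleEnvironment2 when applied.
class SimpleClosure2
{
    readonly Control body;
    // Copies of the closed-over values (or their cells) would also live here;
    // they are omitted in this sketch.

    public SimpleClosure2 (Control body) { this.body = body; }

    public TailRecursionFlag Call2 (out object answer,
                                    ref Control expression,
                                    ref Environment environment,
                                    object arg0, object arg1)
    {
        answer = null;                                            // keep the compiler happy
        expression = body;                                        // tail call into the body
        environment = new SimpleEnvironment2 (this, arg0, arg1);  // assumed constructor
        return TailRecursionFlag.TailCall;
    }
}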

When S-code lambda objects are created, they are analyzed to see if a first-class environment is captured, and if not, whether any of the variables are mutated. The appropriate subclass of lambda object is created based on this analysis.

Almost all environments end up being SimpleEnvironments and are quite a bit faster than the general case.

by Joe Marshall (noreply@blogger.com) at Tuesday, December 31, 2024

Sunday, December 29, 2024

GNU Guix

Adding a fully-bootstrapped Mono

We used to have a Mono package. It was introduced on August 8 2016 by commit 763b3d50b6249b43fceda51445bbeb1f5f5fd7d0, at Mono version 4.4.1.0, but it was later discovered in April of 2022 that the release tarball that it was built from included prebuilt binaries. Further research revealed that these binaries were not optional. Due to this, a decision was made to remove the Mono package, carried out on September 1, 2022.

We now once again have a Mono package, due to a patch series I submitted on November 29, which after some revisions was committed on December 22. This patch series introduced a full, 17-mono-package sequence that takes us from a mono-1.2.6 built fully from source to mono-6.12.0 built fully from source, using only packages that already have full bootstrap paths. I make no promise that this is the shortest or most optimal path, but it exists and I have verified it works.

As I've spent what is probably an unreasonable amount of time working toward this, I thought I'd share some of my thoughts, experiences, and commentary. Sorry in advance if it gets a bit rambly or lecture-ish.

Prologue

I started down this road because someone I'm working on a project with decided to depend on a C# package that requires C# 12.0 features, and my personal Mono package based on the tarball releases (which include bootstrap binaries) only went up to C# 7.0. This meant that the C# package in question de facto required strictly Microsoft's (er, I mean, "the .NET foundation"'s) .NET implementation — hereafter referred to as "dotnet" — and a very recent version no less. The bootstrapping story with dotnet is very bad; even beginning to untangle it would probably require a relatively modern C# compiler, and something that at least sort of understands MSBuild. And there's not much point to "bootstrapping" dotnet from something that isn't bootstrapped itself. So I figured I may as well start with Mono.

History

While Mono is today probably the most well-known alternative to Microsoft's .NET offerings, it is not the only one. Indeed, in the early 2000s there were at least 2 competing free software implementations: Mono, and DotGNU's Portable.NET (abbreviated pnet). They differed in goals, licenses, and methods: Portable.NET was a GNU project concerned with, among other things, limiting the ability of Microsoft to impose vendor lock-in via its proprietary .NET implementation and software patents. As a GNU project, it used the GPL for its runtime and compiler, and the GPL with a linking exception for its standard library, pnetlib. Mono, on the other hand, used a mix of many copyleft and permissive licenses: X11 for the standard library, GPL for the compiler (later dual-licensed to add an X11 option), and LGPL for the runtime, with GPL and LGPL code also offered "under commercial terms for when the GPL and the LGPL are not suitable". In 2016 after its acquisition by Microsoft, the runtime was relicensed to use the Expat (MIT) license.

But perhaps most importantly to us, while Mono opted to write its C# compiler, mcs, in... C#, Portable.NET's runtime and C# compiler were both written in C. Portable.NET along with the entire DotGNU project (except for LibJIT) was decommissioned in 2012, but the source is still available, and it still works fine (with a few modifications for compatibility with newer versions of its dependencies). In September of 2022, Adam Faiz submitted patches to package pnet and pnetlib, along with one of their dependencies named treecc. These packages were based on the last release of Portable.NET, version 0.8.0, released in 2007. I initially used these packages as the basis for my bootstrap efforts, and even managed to get mono-1.2.6 built using them, but later discovered that using a more recent version from git made it much easier. For example, while pnet-0.8.0 can do pointer arithmetic inside unsafe code blocks, it doesn't support the += or -= operators specifically, which requires lots of patching (after all, who would use x = x + y when you could do x += y?). There are many other similar improvements in the git version, so for this patch series I've decided to go with pnet-git.

The start

After building mono-1.2.6, I tried a few later versions, and the third or fourth one would always fail with errors about missing methods. It turns out that the reason for this is that, contrary to what their marketing suggests, C# and Java are not "write once, run everywhere". This is because their compilers rely on the details of the libraries that the program will be run with at compile-time. This is used, for example, to do overload resolution. Suppose, for example, that a certain implementation of the == operator is present in version 1.0 of a library, and then in version 2.0 of a library a more specific implementation is introduced. Now code that is compiled against version 2.0 may instead automatically reference the more-specific implementation, as is in accordance with the rules of C#. But when it is run with version 1.0, it will fail because that implementation doesn't exist. In my case, for some reason the initial mcs and core libraries being built to compile the rest of Mono were being compiled against a 2.0 library and then run with a 1.0 library. It turns out that this was because mcs uses mono's code for producing assemblies (.NET dlls and exes), and mono decides which version to put in an assembly it writes based on "which runtime version" is being used, and that version is decided at startup based on... the version that was put in the assembly it is running. So for example, mono-1.9.1 would produce 2.0 assemblies because mono-1.2.6 produced 2.0 assemblies because pnet produced 2.0 assemblies. So I modified Mono's runtime in mono-1.9.1 to allow for this version to be overridden via environment variable, and set it to v1.1.4322, and things went a lot more smoothly after that.

From there on it was mostly the usual trial-and-error process of identifying where things had bitrotted. I made sure to unvendor libgc wherever possible, though eventually by mono-4.9.0 they explicitly dropped support in their configure script for using any libgc other than what was bundled, so at that point I switched to using their homebrewed sgen garbage collector.

A concerning development

Once I got to mono-2.11.4, though, things took a turn for the interesting: Mono started using git submodules, and the (recursive? #t) clones were all failing. It turns out that this is because their submodules reference github.com using the git:// protocol.

This is notable for a few reasons.

First, GitHub dropped support for the git:// protocol in 2021, so recursive clones won't work now. This means I have to explicitly list out every submodule, its commit, and its sha256 hash, for every Mono version until they switched to using http or https. mono-2.11.4 has only 4 submodules, but that doesn't last for long: by mono-4.9.0 it has 14 submodules. A significant portion of these patches is just listing these submodules and their hashes. It's a bit annoying.

The more concerning reason, though, is why GitHub dropped support for the git:// protocol: it is unencrypted and unauthenticated. This is mitigated somewhat by the use of sha-1 hashes to identify commits in the referenced submodules, putting a significant computational burden on anyone who would try to alter what was fetched corresponding to a given submodule. Significantly more risky, though, is the process of updating submodules that use git:// URLs. It is quite unlikely that a developer is going to independently clone one of the submodules over https, navigate to a desirable commit, copy the sha-1 hash, and manually update the submodule reference's commit. They're far more likely to run cd submodule; git pull; cd ..; git add submodule; git commit ... or an equivalent.

Of course, any changes a network man-in-the-middle might try to make here would still be reflected in the commit history, so even if a developer did that, they or any of their fellow committers could spot anything strange or malicious and point it out. Also, the changes couldn't be propagated to others trying to pull them who weren't on a network path containing the MITM because the potentially-malicious commit wouldn't be present in the real submodule's repository. So the transparency of git clearly showing changes to text files, combined with the fact that surely no git hosting platform would just allow arbitrary entities to make whatever commits they want accessible under any arbitrary repository URL, rather mitigate this security issue.

This usage of git:// URLs lasted all the way until September 28, 2021, when GitHub's removal of support for it forced the developers to change them to https.

Meanwhile, in reality

On November 28, 2016, Mono added a submodule named roslyn-binaries. Unsurprisingly, it included binary blobs for Microsoft's Roslyn compiler (which I believe had been open-sourced shortly prior). From here on, Mono's build system would default to using these binaries for building on little-endian systems (though another compiler could be specified with the --with-csc configure flag). I happen to know that it is extremely unlikely that many Mono developers used this configure flag. I know this because the 5.0 series is an absolute pain in the neck to build from source, because they consistently depend on new C# features before they implement them.

To go on a brief tangent: does anyone remember back when youtube-dl was temporarily taken down from GitHub due to the RIAA's DMCA request? Many were unhappy about that. One such unhappy person made news when they made the full contents of youtube-dl's repository available to access through the DMCA request repository. It turns out that there are many actions that one can take on GitHub that will make arbitrary commits available under arbitrary repository URLs.

So, in reality, for the span of time from November 28, 2016 to September 28, 2021, anybody sitting on the network path between GitHub and any Mono developer updating the roslyn-binaries submodule could decide on any arbitrary new commit to be used. Of course, merely inspecting the diff for the commit will reveal nothing of use, because the contents are binary blobs. And not only are these blobs those of a compiler, they are the blobs of a compiler that is sure to be used to compile another compiler, which will then be redistributed as an opaque, non-bootstrappable binary blob to be used for compiling other compilers.

You would be hard-pressed to find a more fertile breeding ground for Ken Thompson / Trusting Trust attacks. If every agent of the NSA (and whatever other agencies, including those of other countries, had access to the appropriate network traffic) somehow failed to capitalize on 6 years of opportunity to compromise an entire software ecosystem using only a basic MITM of unencrypted traffic, they deserve to be sacked. Whether such an attack actually occurred or not, this is a case study in carelessness and why bootstrappability is so important; discovering all this made me quite worried about having used a Mono version built from blobs previously, and has convinced me that, as time-wasting and tedious as this project has been, it is nevertheless probably an important one.

Another note on roslyn-binaries

If you're going to write a self-hosting compiler, the least you can do is keep it self-hosting. Deciding to write a self-hosting compiler is a valid choice, of course, with its own merits and demerits, but there is something bitterly poetic about Mono starting out requiring specifically Microsoft's C# compiler in order to build (Mono did its initial bootstrapping using Microsoft's proprietary csc), achieving independence through self-hosting, being acquired by Microsoft, and thereafter coming crawling back to Microsoft's C# compiler once more before eventually dying.

The funny thing is that it's not even necessary. The dependencies on new C# features are all in Mono's standard library (which increasingly borrowed code from Microsoft's corefx library), not in Mono's compiler.

More binary submodules?

Even before roslyn-binaries, there was binary-reference-assemblies, which contained prebuilt "reference" blobs for the various versions of the standard libraries. These exist, I assume, precisely because of the library incompatibility problems regarding overloading that I mentioned earlier. While later versions of Mono included sources and a build system for producing these reference binaries, mono-4.9.0 and earlier did not. Mono's build system still demanded something to install, though, so I told it to use the real standard library of the input Mono version. When I did get to a Mono version that at least claimed to support regenerating the reference binaries, I found that it didn't work with mcs due to differences in which libraries had to be referenced, so I had to patch it to add a bunch of references determined through trial and error.

The xunit-binaries submodule was also added sometime before mono-5.1.0. This dependency makes it impossible to run the full test suite without binary blobs. Presumably for this reason, Debian elects to only run tests within the mono/mini/ and mono/tests/ subdirectories. For my part, I've disabled all tests except for those of mono-6.12.0, the final version, limited to the two aforementioned subdirectories. This is because it would take extra time for the builds, because several of the tests depend on binary blobs bundled into the Mono repository itself (which my thorough cleaning of all dlls and exes from the sources removes), because a large chunk of the tests depend on binary blobs in xunit-binaries in later versions, and because "expect some test failures" is part of the Mono documentation and I don't have the time to figure out for the Mono developers every reason why each of 17 versions of their test suite is broken.

The long march through the 5.0s

The 5.0 series was when Microsoft acquired Mono, and it shows. You'll notice I needed to introduce pre- packages for various versions because in several cases a tagged release could not build the following tagged release. For that matter, they couldn't build the pre- package either, but it at least took fewer patches to get them working. The reason for this is that Mono added a dependency on Microsoft's corefx library source code, and it usually started using C# features well before mcs was able to compile them. Because of this, despite taking 8 versions to get from 1.2.6 to 4.9.0, it took another 8 versions to get through the 5.0 series, and 5 of them required nontrivial patching to massage the source into a form compilable by mcs.

The final stretch

Eventually I realized that the dependencies on new features were all coming from corefx, not from Mono's compiler. Consequently, the only reason for this particular bootstrap-hostile ordering of builds is that it happened to be the order the Mono devs committed things. So I just cherry-picked every commit I could find touching mcs/mcs (magit was quite useful for this) and applied it to 5.10.0 to produce what is essentially the 6.12.0 compiler, then used it to jump straight to building 6.12.0.

Use of this technique earlier on in the bootstrap process may be of interest to anyone looking to shorten the chain of packages.

The finishing touches

My initial goal was to package dotnet, and I had tried to progress toward that from mono-4.9.0 for a period, but with no success. During that time, though, I did encounter a bug in Mono's xbuild condition parser, which I wrote a patch for, and included in mono-6.12.0.

I also discovered that xbuild would wrongly complain about missing references even when the proper assemblies were in MONO_PATH or MONO_GAC_PREFIX, because xbuild would erroneously only consider the path /gnu/store/...mono-6.12.0/lib/mono/gac when looking for global assembly caches, completely ignoring MONO_GAC_PREFIX. So I wrote a patch to fix that, and included it in mono-6.12.0.

Having witnessed how much nicer it is to package things that use rpath / runpath than things that use environment variables (like python) and therefore require constant wrapping of executables and use of propagated-inputs, I devised a patch that would extend Mono's per-assembly config files to support a <runpath> element. For example, if you have a file /tmp/dir2/test2.exe, and there is also a file /tmp/dir2/test2.exe.config, and its contents are

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runpath path="/tmp/dir1"/>
</configuration>

and it references test1.dll, it will first look for it at /tmp/dir1/test1.dll. Note that, of course, test1.dll still needs to be accessible to the compiler at compile-time through MONO_PATH or an explicitly-specified path passed on the mcs command line.

It is my hope that this feature will be of use to anybody interested in developing a build system.

Future work

Mono had several difficult points in bootstrapping and packaging, but at the end of the day it still met the basic description of a software package: well-defined environment-supplied inputs and sources, a user-supplied install prefix, and files installed under that prefix.

The dotnet world is an entirely different beast. The first step of most build systems I have encountered from that realm is downloading an entire toolchain, among other dependencies, as a binary blob. They heavily depend on the exact packages they specify being available exactly where they say to install them. There is no "install", there are no "install directories" to my knowledge. A build that doesn't contact nuget.org is an aberration. I am at a loss how to build these things, much less package them. I badly need help.

There are also some portability issues with the current bootstrap path. While Portable.NET can fall back to an interpreter written in C where LibJIT isn't supported, old versions of Mono have no such capability. Strictly speaking, there is some bitrotted code for an interpreter that used to work, but has stopped working by mono-1.2.6. It was left unmaintained until it was eventually removed in 2014, only to be revived in 2017. This poses a dilemma for anybody wanting to bootstrap Mono on a platform that wasn't supported by mono-1.2.6's JIT compiler. There are a number of possible ways to try resolving this, ranging from backporting the new interpreter, to fixing up the old one for every version prior to the new interpreter, to forward-porting the old compiler and class libraries to the new interpreter, etc.

The most interesting option, though, in my opinion, would be to port mcs to Portable.NET. This would achieve the intended portability, while also allowing individual builds to be much faster since we're only building mcs, not the runtime and class library each time. It would also allow us to make much bigger version jumps, since, as we discovered earlier, many of the new C# feature dependencies in Mono come from the class library rather than the compiler. Such a shortened bootstrap could also make the bootstrap path more appealing for other distributions to use instead of binaries.

Closing thoughts

"You wish now that our places had been exchanged. That I had died, and DotGNU had lived?"

"... Yes. I wish that."

Maintenance of Mono was recently transferred over to WineHQ. With that announcement this statement was placed at https://www.mono-project.com:

"We want to recognize that the Mono Project was the first .NET implementation on Android, iOS, Linux, and other operating systems. The Mono Project was a trailblazer for the .NET platform across many operating systems. It helped make cross-platform .NET a reality and enabled .NET in many new places and we appreciate the work of those who came before us."

I would like to clarify that, according to Miguel de Icaza himself, DotGNU "started working on the system about the same time". According to this DotGNU newsletter, Portable.NET began "in January 2001". While it's unclear exactly when Portable.NET reached various milestones, and the significance of the various milestones varies somewhat (for example, Mono probably does not care that Portable.NET also includes a Java and C compiler), I think that there is cause to dispute the claim that Mono was "the first" .NET implementation on Linux.

On a related note, if we haven't looked at the possibility of using Portable.NET in the Java bootstrap process, it may be worth visiting at some point.

Thank you to the DotGNU project, for the .NET implementation that made this bootstrap possible, Adam Faiz, for the initial packaging of it that let me jump straight in, the Mono project, for... Mono, and you, for your time.

by unmush at Sunday, December 29, 2024

Joe Marshall

Interpreting S-code

Lisp programs are not text but rather nested lists of symbols (atoms) and other lists. By default, a symbol represents a variable and a list represents a function call, but some lists are special forms that don’t follow the normal function call semantics.

S-code is an alternative representation for Lisp programs. Instead of nested lists, S-code is a set of nested data structures. Each S-code type corresponds to a Lisp form. There is an almost trivial isomorphism between S-code and the list representation of Lisp. S-code is a bit easier to interpret and compile than the list representation because the data structures representing the code can contain auxiliary information to aid in interpretation of the code. For example, a variable reference in S-code can contain the lexical depth and offset of the variable.
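
As a tiny illustration of the idea (the class and field names here are purely illustrative, not the interpreter's actual types): a variable reference node can carry the lexical coordinates that the syntaxer computed, so the evaluator never has to search for the variable by name.

// Illustrative S-code nodes: typed data structures with room for
// auxiliary information collected while syntaxing.
abstract class SCodeNode { }

class VariableReference : SCodeNode
{
    public readonly string Name;
    public readonly int LexicalDepth;    // auxiliary info filled in by the syntaxer
    public readonly int LexicalOffset;

    public VariableReference (string name, int depth, int offset)
    {
        Name = name;
        LexicalDepth = depth;
        LexicalOffset = offset;
    }
}

class ConditionalNode : SCodeNode
{
    public readonly SCodeNode Predicate, Consequent, Alternative;

    public ConditionalNode (SCodeNode predicate, SCodeNode consequent, SCodeNode alternative)
    {
        Predicate = predicate;
        Consequent = consequent;
        Alternative = alternative;
    }
}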

Converting from nested lists to S-code is a process called “syntaxing”. The nested lists are walked, the macros are expanded, auxiliary information is collected, and the S-code is generated. The S-code can be directly interpreted or compiled to machine code. “Unsyntaxing” is the process of converting S-code back to nested lists and it basically involves a walk of the data structure, discarding the auxiliary information, and collecting a tree of lists, thus recovering the original Lisp program (modulo macro expansion).

The MIT-Scheme hardware (Scheme 79, Scheme 81, and Scheme 86) directly executed S-code. MIT-CScheme is a C and assembly program that interprets S-code and runs compiled S-code. MzScheme and Racket also use a similar nested data structure representation of Scheme programs. I don’t think they call it “S-code”, but it is essentially the same thing. No doubt other Lisp and Scheme interpreters use this technique.

(The Lisp machine, by contrast, compiled Lisp programs to a byte-code that was interpreted by the microcoded Lisp machine hardware. I believe that MzScheme and Racket now include a JIT compiler.)

A few years back, for kicks, I wrote an S-code interpreter for Scheme. I decided to write it in C# so that I could leverage the .NET Common Language Runtime. I wouldn’t have to write a garbage collector, for example. I wrote it to interpret the S-code generated by MIT-Scheme. This way I could leverage the MIT-Scheme runtime as well. But this interpreter is not simply a port of the C implementation of MIT-Scheme. Many parts of the interpreter work completely differently from the MIT C implementation. Among them:

  • The C# call stack is used to hold the continuations. Subproblem continuations are C# continuations. Nested Scheme function calls appear as nested C# function calls, so the C# debugger can be used to debug Scheme code.
  • Tail recursion is handled by a trampoline at each function call boundary. To tail call a function, you return the function, the arguments, and a special marker to the caller’s trampoline, which will make the call on your behalf.
  • First class continuations are handled by lightweight stack inspection. In addition to a special marker for tail recursion, there is also a special marker for continuation capture. If this is returned, the caller’s trampoline will evacuate its current stack frame from the stack onto the heap and propagate the special marker up the stack.
  • Eval dispatch is handled through virtual method dispatch. Each type of S-code object has an eval method that is specialized for the type. I was curious if the virtual method dispatch was fast enough to be in the main interpreter loop.
  • The S-code interpreter is instrumented to discover the “hot” paths through the code. Certain patterns of S-code that are determined to be dynamically frequent can be replaced by S-code “superoperators” that provide specialized higher performance evaluation paths that avoid recursive evaluation steps.
  • Variable assignments are eliminated through cell conversion. All variables are ultimately immutable. Variables that were assigned to are converted to mutable cells that are allocated on the heap. This means we can use a flattened environment structure, but calls to procedures with mutable variables will cons.
  • The nested environment structure was replaced with flattened environment vectors. Shared immutable lexical variables are copied, and since mutable variables are stored in cells, their references are copied. Lexical variable lookup is constant independent of the lexical depth, but closure construction becomes linear in the number of variables being closed over.

Limitations

C# does not provide a way to dump a heap snapshot, so it isn’t possible to save a Lisp image. Instead, the interpreter is “cold loaded” each time it is started. Indeed, the main problem with the interpreter is the startup time. It spends an inordinate amount of time initializing the unicode tables.

The major version of MIT-Scheme has been incremented. The C# interpreter fails to reach the prompt when cold loading the latest version. I haven’t tracked down the problem.

The C# interpreter assumes that the S-code abstraction is completely opaque, i.e. that the Scheme code knows nothing about the binary layout of S-code objects and only manipulates them through the provided primitives. This isn’t always the case, however, so the C# interpreter sometimes has to recognize that certain idiomatic patterns of S-code are actually attempting to emulate a primitive operation. For example, system-pair-car is a primitive that extracts the first element of any system object that has two elements, i.e. objects that MIT-Scheme implements as a cons cell even though not tagged as one. But the point of using C# was to hide the representation of Scheme objects from the Scheme code, so many objects are no longer implemented as cons cells. The interpreter notices calls to system-pair-car and instead extracts whichever field of the object MIT-Scheme would have put in the car.

The interpreter is not complete enough to host the SF program, which creates Scheme .fasl files from Scheme source. You need MIT-Scheme to create the .fasl files, which can then be loaded into the C# interpreter.

Now what?

I got increasingly frustrated with the limitations. It was taking too long to do a full debugging cycle. Playing whack-a-mole with the bugs was getting tedious.

I wanted to understand fundamentally why interpreters are so slow. In theory, shouldn’t the interpreter be able to perform the same operations as the compiled code? What exactly is “interpreter overhead”? Shouldn’t it be able to run in the same ballpark as the compiled code? What would it take to make a high-performance interpreter?

I discovered some of the answers. A large part of the overhead comes from instruction dispatch. Compiled code has its instructions dispatched in hardware, which often has dedicated circuitry to read and process the instruction stream. Interpreters use software to do the dispatch. You can make the interpreter faster by combining instructions and dispatching instruction combinations. There is a combinatorial explosion of instruction combinations, however, so you only want to combine the most common instruction sequences.

I wrote a number of hand-coded instruction combinations, but it got tedious and I realized that I wanted an automatic way to generate instruction combinations for common instruction sequences. That is, I wanted a JIT compiler. I put the project on the shelf at this point because a JIT compiled S-code interpreter is another project in itself.

I decided to blog a bit about some of the more unusual features of this interpreter, so watch this space if you are interested.

by Joe Marshall (noreply@blogger.com) at Sunday, December 29, 2024