Planet Scheme

Friday, June 18, 2021

GNU Guix

Substitutes now also available from bordeaux.guix.gnu.org

There have been a number of different project operated sources of substitutes, for the last couple of years the default source of substitutes has been ci.guix.gnu.org (with a few different URLs).

Now, in addition to ci.guix.gnu.org, bordeaux.guix.gnu.org is a default substitute server.

Put that way, this development maybe doesn't sound particularly interesting. Why is a second substitute server useful? There's some thoughts on that exact question in the next section. If you're just interested in how to use (or how not to use) substitutes from bordeaux.guix.gnu.org, then you can just skip ahead to the last section.

Why a second source of substitutes?

This change is an important milestone, following on from the work that started on the Guix Build Coordinator towards the start of 2020.

Back in 2020, the substitute availability from ci.guix.gnu.org was often an issue. There seemed to be a number of contributing factors, including some parts of the architecture. Without going too much in to the details of the issues, aspects of the design of the Guix Build Coordinator were specifically meant to avoid some of these issues.

While there were some very positive results from testing back in 2020, it's taken so long to bring the substitute availability benefits to general users of Guix that ci.guix.gnu.org has changed and improved significantly in the meantime. This means that any benefits in terms of substitute availability are less significant now.

One clearer benefit of just having two independent sources of substitutes is redundancy. While the availability of ci.guix.gnu.org has been very high (in my opinion), having a second independent substitute server should mean that if there's a future issue with users accessing either source of substitutes, the disruption should be reduced.

I'm also excited about the new possibilities offered by having a second substitute server, particularly one using the Guix Build Coordinator to manage the builds.

Substitutes for the Hurd is already something that's been prototyped, so I'm hopeful that bordeaux.guix.gnu.org can start using childhurd VMs to build things soon.

Looking a bit further forward, I think there's some benefits to be had in doing further work on how the nar and narinfo files used for substitutes are managed. There are some rough plans already on how to address the retention of nars, and how to look at high performance mirrors.

Having two substitute servers is one step towards stronger trust policies for substitutes (as discussed on guix-devel, where you would only use a substitute if both ci.guix.gnu.org and bordeaux.guix.gnu.org have built it exactly the same. This would help protect against the compromise of a single substitute server.

Using substitutes from bordeaux.guix.gnu.org

If you're using Guix System, and haven't altered the default substitute configuration, updating guix (via guix pull), reconfiguring using the updated guix, and then restarting the guix-daemon should enable substitutes from bordeaux.guix.gnu.org.

If the ACL is being managed manually, you might need to add the public key for bordeaux.guix.gnu.org manually as well.

When using Guix on a foreign distribution with the default substitute configuration, you'll need to run guix pull as root, then restart the guix-daemon. You'll then need to add the public key for bordeaux.guix.gnu.org to the ACL.

guix archive --authorize < /root/.config/guix/current/share/guix/bordeaux.guix.gnu.org.pub

If you want to just use ci.guix.gnu.org, or bordeaux.guix.gnu.org for that matter, you'll need to adjust the substitute urls configuration for the guix-daemon to just refer to the substitute servers you want to use.

by Christopher Baines at Friday, June 18, 2021

Tuesday, June 15, 2021

Programming Praxis

Cardinal And Ordinal Numbers

Cardinal numbers are those used for counting; spelled in letters, they are one, two, three, four, and so on.

Ordinal numbers are those used for ranking: spelled in letters, they are first, second, third, fourth, and so on.

Your task is to write a program that takes a number and returns the spelled-out cardinal and ordinal forms of that number. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at Tuesday, June 15, 2021

Monday, June 14, 2021

Jeremy Kun

Searching for RH Counterexamples — Exploring Data

We’re ironically searching for counterexamples to the Riemann Hypothesis.

  1. Setting up Pytest
  2. Adding a Database
  3. Search Strategies
  4. Unbounded integers
  5. Deploying with Docker
  6. Performance Profiling
  7. Scaling up
  8. Productionizing

In the last article we added a menagerie of “production readiness” features like continuous integration tooling (automating test running and static analysis), alerting, and a simple deployment automation. Then I let it loose on AWS, got extremely busy with buying a house, forgot about this program for a few weeks (no alerts means it worked flawlessly!), and then saw my AWS bill.

So I copied the database off AWS using pg_dump (piped to gzip), terminated the instances, and inspected the results. A copy of the database is here. You may need git-lfs to clone it. If I wanted to start it back up again, I could spin them back up, and use gunzip | psql to restore the database, and it would start back up from where it left off. A nice benefit of all the software engineering work done thus far.

This article will summarize some of the data, show plots, and try out some exploratory data analysis techniques.

Summary

We stopped the search mid-way through the set of numbers with 136 prime divisors.

The largest number processed was

1255923956750926940807079376257388805204
00410625719434151527143279285143764977392
49474111379646103102793414829651500824447
17178682617437033476033026987651806835743
3694669721205424205654368862231754214894
07691711699791787732382878164959602478352
11435434547040000

Which in factored form is the product of these terms

  2^8   3^7   5^4   7^4  11^3  13^3  17^2  19^2  23^2  29^2
 31^2  37^2  41^2  43^1  47^1  53^1  59^1  61^1  67^1  71^1
 73^1  79^1  83^1  89^1  97^1 101^1 103^1 107^1 109^1 113^1
127^1 131^1 137^1 139^1 149^1 151^1 157^1 163^1 167^1 173^1
179^1 181^1 191^1 193^1 197^1 199^1 211^1 223^1 227^1 229^1
233^1 239^1 241^1 251^1 257^1 263^1 269^1 271^1 277^1 281^1
283^1 293^1 307^1 311^1 313^1 317^1 331^1 337^1 347^1 349^1
353^1 359^1 367^1 373^1 379^1 383^1 389^1 397^1 401^1 409^1
419^1 421^1 431^1 433^1 439^1 443^1 449^1 457^1 461^1 463^1
467^1 479^1 487^1 491^1 499^1 503^1 509^1 521^1 523^1 541^1
547^1 557^1 563^1 569^1 571^1 577^1

The best witness—the number with the largest witness value—was

38824169178385494306912668979787078930475
9208283469279319659854547822438432284497
11994812030251439907246255647505123032869
03750131483244222351596015366602420554736
87070007801035106854341150889235475446938
52188272225341139870856016797627204990720000

which has witness value 1.7707954880001586, which is still significantly smaller than the needed 1.782 to disprove RH.

The factored form of the best witness is

 2^11   3^7   5^4   7^3  11^3  13^2  17^2  19^2  23^2  29^2 
 31^2  37^1  41^1  43^1  47^1  53^1  59^1  61^1  67^1  71^1 
 73^1  79^1  83^1  89^1  97^1 101^1 103^1 107^1 109^1 113^1 
127^1 131^1 137^1 139^1 149^1 151^1 157^1 163^1 167^1 173^1 
179^1 181^1 191^1 193^1 197^1 199^1 211^1 223^1 227^1 229^1 
233^1 239^1 241^1 251^1 257^1 263^1 269^1 271^1 277^1 281^1 
283^1 293^1 307^1 311^1 313^1 317^1 331^1 337^1 347^1 349^1 
353^1 359^1 367^1 373^1 379^1 383^1 389^1 397^1 401^1 409^1 
419^1 421^1 431^1 433^1 439^1 443^1 449^1 457^1 461^1 463^1 
467^1 479^1 487^1 491^1 499^1 503^1 509^1 521^1 523^1 541^1 
547^1 557^1 563^1 

The average search block took 4m15s to compute, while the max took 7m7s and the min took 36s.

The search ran for about 55 days (hiccups included), starting at 2021-03-05 05:47:53 and stopping at 2021-04-28 15:06:25. The total AWS bill—including development, and periods where the application was broken but the instances still running, and including instances I wasn’t using but forgot to turn off—was $380.25. When the application was running at its peak, the bill worked out to about $100/month, though I think I could get it much lower by deploying fewer instances, after we made the performance optimizations that reduced the need for resource-heavy instances. There is also the possibility of using something that integrates more tightly with AWS, such as serverless jobs for the cleanup, generate, and process worker jobs.

Plots

When in doubt, plot it out. I started by writing an export function to get the data into a simpler CSV, which for each n only stored \log(n) and the witness value.

I did this once for the final computation results. I’ll call this the “small” database because it only contains the largest witness value in each block. I did it again for an earlier version of the database before we introduced optimizations (I’ll call this the “large” database), which had all witness values for all superabundant numbers processed up to 80 prime factors.. The small database was only a few dozen megabytes in size, and the large database was ~40 GiB, so I had to use postgres cursors to avoid loading the large database into memory. Moreover, then generated CSV was about 8 GiB in size, and so it required a few extra steps to sort it, and get it into a format that could be plotted in a reasonable amount of time.

First, using GNU sort to sort the file by the first column, \log(n)

sort -t , -n -k 1 divisor_sums.csv -o divisor_sums_sorted.csv

Then, I needed to do some simple operations on massive CSV files, including computing a cumulative max, and filtering down to a subset of rows that are sufficient for plotting. After trying to use pandas and vaex, I realized that the old awk command line tool would be great at this job. So I wrote a simple awk script to process the data, and compute data used for the cumulative max witness value plots below.

Then finally we can use vaex to create two plots. The first is a heatmap of witness value counts. The second is a plot of the cumulative max witness value. For the large database:

Witness value heatmap for the large database
The cumulative maximum witness value for the large database.

And for the small database

A heatmap for the witness values for the small database
The cumulative maximum witness value for the small database.

Note, the two ridges disagree slightly (the large database shows a longer flat line than the small database for the same range), because of the way that the superabundant enumeration doesn’t go in increasing order of n. So larger witness values in the range 400-500 are found later.

Estimating the max witness value growth rate

The next obvious question is whether we can fit the curves above to provide an estimate of how far we might have to look to find the first witness value that exceeds the desired 1.782 threshold. Of course, this will obviously depend on the appropriateness of the underlying model.

A simple first guess would be split between two options: the real data is asymptotic like a + b / x approaching some number less than 1.782 (and hence this approach cannot disprove RH), or the real data grows slowly (perhaps doubly-logarithmic) like a + b \log \log x, but eventually surpasses 1.782 (and RH is false). For both cases, we can use scipy’s curve fitting routine as in this pull request.

For the large database (roughly using log n < 400 since that’s when the curve flatlines due to the enumeration order), we get a reciprocal fit of

\displaystyle f(x) \approx 1.77579122 - 2.72527824 / x

and a logarithmic fit of

\displaystyle f(x) \approx 1.65074314 + 0.06642373 \log(\log(x))

The fit of the large database to a + b/x. Note the asymptote of 1.7757 suggests this will not disprove RH.
The fit of the large database to a + b log log x. If this is accurate, we would find the counterexample around log(n) = 1359.

The estimated asymptote is around 1.7757 in the first case, and the second case estimates we’d find an RH counterexample at around log(n) = 1359.

For the small database of only sufficiently large witness values, this time going up to about log(n) \approx 575, the asymptotic approximation is

\displaystyle 1.77481154 -2.31226382 / x

And the logarithmic approximation is

\displaystyle 1.70825262 + 0.03390312 \log(\log(x))

The reciprocal approximation of the small database with asymptote 1.77481154
The logarithmic approximation of the small database with RH counterexample estimate at log(n) = 6663

Now the asymptote is slightly lower, at 1.7748, and the logarithmic model approximates the counterexample can be found at approximately \log(n) = 6663.

Both of the logarithmic approximations suggest that if we want to find an RH counterexample, we would need to look at numbers with thousands of prime factors. The first estimate puts a counterexample at about 2^{1960}, the second at 2^{9612}, so let’s say between 1k and 10k prime factors.

Luckily, we can actually jump forward in the superabundant enumeration to exactly the set of candidates with m prime factors. So it might make sense to jump ahead to, say, 5k prime factors and search in that region. However, the size of a level set of the superabundant enumeration still grows exponentially in m. Perhaps we should (heuristically) narrow down the search space by looking for patterns in the distribution of prime factors for the best witness values we’ve found thus far. Perhaps the values of n with the best witness values tend to have a certain concentration of prime factors.

Exploring prime factorizations

At first, my thought was to take the largest witness values, look at their prime factorizations, and try to see a pattern when compared to smaller witness values. However, other than the obvious fact that the larger witness values correspond to larger numbers (more and larger prime factors), I didn’t see an obvious pattern from squinting at plots.

To go in a completely different direction, I wanted to try out the UMAP software package, a very nice and mathematically sophisticated for high dimensional data visualization. It’s properly termed a dimensionality reduction technique, meaning it takes as input a high-dimensional set of data, and produces as output a low-dimensional embedding of that data that tries to maintain the same shape as the input, where “shape” is in the sense of a certain Riemannian metric inferred from the high dimensional data. If there is structure among the prime factorizations, then UMAP should plot a pretty picture, and perhaps that will suggest some clearer approach.

To apply this to the RH witness value dataset, we can take each pair (n, \sigma(n)/(n \log \log n)), and associate that with a new (high dimensional) data point corresponding to the witness value paired with the number’s prime factorization

\displaystyle (\sigma(n)/(n \log \log n), k_1, k_2, \dots, k_d),

where n = 2^{k_1} 3^{k_2} 5^{k_3} \dots p_d^{k_d}, with zero-exponents included so that all points have the same dimension. This pull request adds the ability to factorize and export the witness values to a CSV file as specified, and this pull request adds the CSV data (using git-lfs), along with the script to run UMAP, the resulting plots shown below, and the saved embeddings as .npy files (numpy arrays).

When we do nothing special to the data and run it through UMAP we see this plot.

UMAP plotted on the raw prime factorization and witness value dataset.

It looks cool, but if you stare at it for long enough (and if you zoom in when you generate the plot yourself in matplotlib) you can convince yourself that it’s not finding much useful structure. The red dots dominate (lower witness values) and the blue dots are kind of spread haphazardly throughout the red regions. The “ridges” along the chart are probably due to how the superabundant enumeration skips lots of numbers, and that’s why it thins out on one end: the thinning out corresponds to fewer numbers processed that are that large since the enumeration is not uniform.

It also seemed like there is too much data. The plot above has some 80k points on it. After filtering down to just those points whose witness values are bigger than 1.769, we get a more manageable plot.

Witness values and prime factors processed with UMAP, where the witness value is at least 1.769.

This is a bit more reasonable. You can see a stripe of blue dots going through the middle of the plot.

Before figuring out how that blue ridge might relate to the prime factor patterns, let’s take this a few steps further. Typically in machine learning contexts, it helps to normalize your data, i.e., to transform each input dimension into a standard Z-score with respect to the set of values seen in that dimension, subtracting the mean and dividing by the standard deviation. Since the witness values are so close to each other, they’re a good candidate for such normalization. Here’s what UMAP plots when we normalize the witness value column only.

UMAP applied to the (normalized) witness values and prime factorizations. Applied to all witness values.

Now this is a bit more interesting! Here the colormap on the right is in units of standard deviation of witness values. You can see a definite bluest region, and it appears that the data is organized into long brushstrokes, where the witness values increase as you move from one end of the stroke to the other. At worst, this suggests that the dataset has structure that a learning algorithm could discover.

Going even one step further, what if we normalize all the columns? Well, it’s not as interesting.

UMAP when normalizing all columns, not just the witness value.

If you zoom in, you can see that the same sort of “brushstroke” idea is occurring here too, with blue on one end and red on the other. It’s just harder to see.

The previous image, zoomed in around a cluster of data

We would like to study the prettiest picture and see if we can determine what pattern of prime numbers the blue region has, if any. The embedding files are stored on github, and I put up (one version of) the UMAP visualization as an interactive plot via this pull request.

I’ve been sitting on this draft for a while, and while this article didn’t make a ton of headway, the pictures will have to do while I’m still dealing with my new home purchase.

Some ideas for next steps:

  • Squint harder at the distributions of primes for the largest witness values in comparison to the rest.
  • See if a machine learning algorithm can regress witness values based on their prime factorizations (and any other useful features I can derive). Study the resulting hypothesis to determine which features are the most important. Use that to refine the search strategy.
  • Try searching randomly in the superabundant enumeration around 1k and 10k prime factors, and see if the best witness values found there match the log-log regression.
  • Since witness values above a given threshold seem to be quite common, and because the UMAP visualization shows some possible “locality” structure for larger witness values, it suggests if there is a counterexample to RH then there are probably many. So a local search method (e.g., local neighborhood search/discrete gradient ascent with random restarts) might allow us to get a better sense for whether we are on the right track.

Until next time!

by j2kun at Monday, June 14, 2021

Friday, June 11, 2021

GNU Guix

Reproducible data processing pipelines

Last week, we at Guix-HPC published videos of a workshop on reproducible software environments we organized on-line. The videos are well worth watching—especially if you’re into reproducible research, and especially if you speak French or want to practice. This post, though, is more of a meta-post: it’s about how we processed these videos. “A workshop on reproducibility ought to have a reproducible video pipeline”, we thought. So this is what we did!

From BigBlueButton to WebM

Over the last year and half, perhaps you had the “opportunity” to participate in an on-line conference, or even to organize one. If so, chances are that you already know BigBlueButton (BBB), the free software video conferencing suite initially designed for on-line teaching. In a nutshell, it allows participants to chat (audio, video, and keyboard), and speakers can share their screen or a PDF slide deck. Organizers can also record the session.

BBB then creates a link to recorded sessions with a custom JavaScript player that replays everything: typed chat, audio and video (webcams), shared screens, and slide decks. This BBB replay a bit too rough though and often not the thing you’d like to publish after the conference. Instead, you’d rather do a bit of editing: adjusting the start and end time of each talk, removing live chat from what’s displayed (which allows you to remove info that personally identifies participants, too!), and so forth. Turns out this kind of post-processing is a bit of work, primarily because BBB does “the right thing” of recording each stream separately, in the most appropriate form: webcam and screen shares are recorded as separate videos, chat is recorded as text with timings, slide decks is recorded as a bunch of PNGs plus timings, and then there’s a bunch of XML files with metadata putting it all together.

Anyway, with a bit of searching, we quickly found the handy bbb-render tool, which can first download all these files and then assemble them using the Python interface to the GStreamer Editing Services (GES). Good thing: we don’t have to figure out all these things; we “just” have to run these two scripts in an environment with the right dependencies. And guess what: we know of a great tool to control execution environments!

A “deployment-aware Makefile”

So we have a process that takes input files—those PNGs, videos, and XML files—and produces output files—WebM video files. As developers we immediately recognize a pattern and the timeless tool to deal with it: make. The web already seems to contain countless BBB post-processing makefiles (and shell scripts, too). We were going to contribute to this while we suddenly realized that we know of another great tool to express such processes: Guix! Bonus: while a makefile would address just the tip of the iceberg—running bbb-render—Guix can also take care of the tedious task of deploying the right environment to run bbb-render in.

What we did was to write some sort of a deployment-aware makefile. It’s still a relatively unconventional way to use Guix, but one that’s very convenient. We’re talking about videos, but really, you could use the same approach for any kind of processing graph where you’d be tempted to just use make.

The end result here is a Guix file that returns a manifest—a list of videos to “build”. You can build the videos with:

guix build -m render-videos.scm

Overall, the file defines a bunch of functions (procedures in traditional Scheme parlance), each of which takes input files and produces output files. More accurately, these functions returns objects that describe how to build their output from the input files—similar to how a makefile rule describes how to build its target(s) from its prerequisite(s). (The reader familiar with functional programming may recognize a monad here, and indeed, those build descriptions can be thought of as monadic values in a hypothetical “Guix build” monad; technically though, they’re regular Scheme values.)

Let’s take a guided tour of this 300-line file.

Rendering

The first step in this file describes where bbb-render can be found and how to run it to produce a GES “project” file, which we’ll use later to render the video:

(define bbb-render
  (origin
    (method git-fetch)
    (uri (git-reference (url "https://github.com/plugorgau/bbb-render")
                        (commit "a3c10518aedc1bd9e2b71a4af54903adf1d972e5")))
    (file-name "bbb-render-checkout")
    (sha256
     (base32 "1sf99xp334aa0qgp99byvh8k39kc88al8l2wy77zx7fyvknxjy98"))))

(define rendering-profile
  (profile
   (content (specifications->manifest
             '("gstreamer" "gst-editing-services" "gobject-introspection"
               "gst-plugins-base" "gst-plugins-good"
               "python-wrapper" "python-pygobject" "python-intervaltree")))))

(define* (video-ges-project bbb-data start end
                            #:key (webcam-size 25))
  "Return a GStreamer Editing Services (GES) project for the video,
starting at START seconds and ending at END seconds.  BBB-DATA is the raw
BigBlueButton directory as fetched by bbb-render's 'download.py' script.
WEBCAM-SIZE is the percentage of the screen occupied by the webcam."
  (computed-file "video.ges"
                 (with-extensions (list (specification->package "guile-gcrypt"))
                  (with-imported-modules (source-module-closure
                                          '((guix build utils)
                                            (guix profiles)))
                    #~(begin
                        (use-modules (guix build utils) (guix profiles)
                                     (guix search-paths) (ice-9 match))

                        (define search-paths
                          (profile-search-paths #+rendering-profile))

                        (for-each (match-lambda
                                    ((spec . value)
                                     (setenv
                                      (search-path-specification-variable
                                       spec)
                                      value)))
                                  search-paths)

                        (invoke "python"
                                #+(file-append bbb-render "/make-xges.py")
                                #+bbb-data #$output
                                "--start" #$(number->string start)
                                "--end" #$(number->string end)
                                "--webcam-size"
                                #$(number->string webcam-size)))))))

First it defines the source code location of bbb-render as an “origin”. Second, it defines rendering-profile as a “profile” containing all the packages needed to run bbb-render’s make-xges.py script. The specification->manifest procedure creates a manifest from a set of packages specs, and likewise specification->package returns the package that matches a given spec. You can try these things at the guix repl prompt:

$ guix repl
GNU Guile 3.0.7
Copyright (C) 1995-2021 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guix-user)> ,use(guix profiles)
scheme@(guix-user)> ,use(gnu)
scheme@(guix-user)> (specification->package "guile@2.0")
$1 = #<package guile@2.0.14 gnu/packages/guile.scm:139 7f416be776e0>
scheme@(guix-user)> (specifications->manifest '("guile" "gstreamer" "python"))
$2 = #<<manifest> entries: (#<<manifest-entry> name: "guile" version: "3.0.7" …> #<<manifest-entry> name: "gstreamer" version: "1.18.2" …> …)

Last, it defines video-ges-project as a function that takes the BBB raw data, a start and end time, and produces a video.ges file. There are three key elements here:

  1. computed-file is a function to produce a file, video.ges in this case, by running the code you give it as its second argument—the recipe, in makefile terms.
  2. The recipe passed to computed-file is a G-expression (or “gexp”), introduced by this fancy #~ (hash tilde) notation. G-expressions are a way to stage code, to mark it for eventual execution. Indeed, that code will only be executed if and when we run guix build (without --dry-run), and only if the result is not already in the store.
  3. The gexp refers to rendering-profile, to bbb-render, to bbb-data and so on by escaping with the #+ or #$ syntax (they’re equivalent, unless doing cross-compilation). During build, these reference items in the store, such as /gnu/store/…-bbb-render, which is itself the result of “building” the origin we’ve seen above. The #$output reference corresponds to the build result of this computed-file, the complete file name of video.ges under /gnu/store.

That’s quite a lot already! Of course, this real-world example is more intimidating than the toy examples you’d find in the manual, but really, pretty much everything’s there. Let’s see in more detail at what’s inside this gexp.

The gexp first imports a bunch of helper modules with build utilities and tools to manipulate profiles and search path environment variables. The for-each call iterates over search path environment variables—PATH, PYTHONPATH, and so on—, setting them so that the python command is found and so that the needed Python modules are found.

The with-imported-modules form above indicates that the (guix build utils) and (guix profiles) modules, which are part of Guix, along with their dependencies (their closure), need to be imported in the build environment. What about with-extensions? Those (guix …) module indirectly depend on additional modules, provided by the guile-gcrypt package, hence this spec.

Next comes the ges->webm function which, as the name implies, takes a .ges file and produces a WebM video file by invoking ges-launch-1.0. The end result is a video containing the recording’s audio, the webcam and screen share (or slide deck), but not the chat.

Opening and closing

We have a WebM video, so we’re pretty much done, right? But… we’d also like to have an opening, showing the talk title and the speaker’s name, as well as a closing. How do we get that done?

Perhaps a bit of a sledgehammer, but it turns out that we chose to produce those still images with LaTeX/Beamer, from these templates.

We need again several processing steps:

  1. We first define the latex->pdf function that takes a template .tex file, a speaker name and title. It copies the template, replaces placeholders with the speaker name and title, and runs pdflatex to produce the PDF.
  2. The pdf->bitmap function takes a PDF and returns a suitably-sized JPEG.
  3. image->webm takes that JPEG and invokes ffmpeg to render it as WebM, with the right resolution, frame rate, and audio track.

With that in place, we define a sweet and small function that produces the opening WebM file for a given talk:

(define (opening title speaker)
  (image->webm
   (pdf->bitmap (latex->pdf (local-file "opening.tex") "opening.pdf"
                            #:title title #:speaker speaker)
                "opening.jpg")
   "opening.webm" #:duration 5))

We need one last function, video-with-opening/closing, that given a talk, an opening, and a closing, concatenates them by invoking ffmpeg.

Putting it all together

Now we have all the building blocks!

We use local-file to refer to the raw BBB data, taken from disk:

(define raw-bbb-data/monday
  ;; The raw BigBlueButton data as returned by './download.py URL', where
  ;; 'download.py' is part of bbb-render.
  (local-file "bbb-video-data.monday" "bbb-video-data"
              #:recursive? #t))

(define raw-bbb-data/tuesday
  (local-file "bbb-video-data.tuesday" "bbb-video-data"
              #:recursive? #t))

No, the raw data is not in the Git repository (it’s too big and contains personally-identifying information about participants), so this assumes that there’s a bbb-video-data.monday and a bbb-video-data.tuesday in the same directory as render-videos.scm.

For good measure, we define a <talk> data type:

(define-record-type <talk>
  (talk title speaker start end cam-size data)
  talk?
  (title     talk-title)
  (speaker   talk-speaker)
  (start     talk-start)           ;start time in seconds
  (end       talk-end)             ;end time
  (cam-size  talk-webcam-size)     ;percentage used for the webcam
  (data      talk-bbb-data))       ;BigBlueButton data

… such that we can easily define talks, along with talk->video, which takes a talk and return a complete, final video:

(define (talk->video talk)
  "Given a talk, return a complete video, with opening and closing."
  (define file-name
    (string-append (canonicalize-string (talk-speaker talk))
                   ".webm"))

  (let ((raw (ges->webm (video-ges-project (talk-bbb-data talk)
                                           (talk-start talk)
                                           (talk-end talk)
                                           #:webcam-size
                                           (talk-webcam-size talk))
                        file-name))
        (opening (opening (talk-title talk) (talk-speaker talk))))
    (video-with-opening/closing file-name raw
                                opening closing.webm)))

The very last bit iterates over the talks and returns a manifest containing all the final videos. Now we can build the ready-to-be-published videos, all at once:

$ guix build -m render-videos.scm
[… time passes…]
/gnu/store/…-emmanuel-agullo.webm
/gnu/store/…-francois-rue.webm
…

Voilà!

Image of an old TV screen showing a video opening.

Why all the fuss?

OK, maybe you’re thinking “this is just another hackish script to fiddle with videos”, and that’s right! It’s also worth mentioning another approach: Racket’s video language, which is designed to manipulate video abstractions, similar to GES but with a sweet high-level functional interface.

But look, this one’s different: it’s self-contained, it’s reproducible, and it has the right abstraction level. Self-contained is a big thing; it means you can run it and it knows what software to deploy, what environment variables to set, and so on, for each step of the pipeline. Granted, it could be simplified with appropriate high-level interfaces in Guix. But remember: the alternative is a makefile (“deployment-unaware”) completed by a README file giving a vague idea of the dependencies needed. The reproducible bit is pretty nice too (especially for a workshop on reproducibility). It also means there’s caching: videos or intermediate byproducts already in the store don’t need to be recomputed. Last, we have access to a general-purpose programming language where we can build abstractions, such as the <talk> data type, that makes the whole thing more pleasant to work with and more maintainable.

Hopefully that’ll inspire you to have a reproducible video pipeline for your next on-line event, or maybe that’ll inspire you to replace your old makefile and shelly habits for data processing!

High-performance computing (HPC) people might be wondering how to go from here and build “computing-resource-aware” or “storage-resource-aware” pipelines where each computing step could be submitted to the job scheduler of an HPC cluster and use distributed file systems for intermediate results rather than /gnu/store. If you’re one of these folks, do take a look at how the Guix Workflow Language addresses these issues.

Acknowledgments

Thanks to Konrad Hinsen for valuable feedback on an earlier draft.

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64 and POWER9 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

by Ludovic Courtès at Friday, June 11, 2021

Tuesday, June 8, 2021

Programming Praxis

Approximate Squaring

Lagarias and Sloane study the “approximate squaring” map f(x) = xx⌉ and its behavior when iterated in this paper.

Consider the fraction x = n / d when n > d > 1; let’s take 8/7 as an example. In the first step, the smallest integer greater than 8/7 (the “ceiling”) is 2, and 8/7 × 2 = 16/7. In the second step, the ceiling of 16/7 is 3, so we have 16/7 × 3 = 48/7. And in the third step, the ceiling of 48/7 is 7, so we have 48/7 × 7 = 48. Now the denominator is 1 and the result is an integer, so iteration stops, and we say that 8/7 goes to 48 in 3 steps.

Study shows the iteration is chaotic; sometimes the iteration stops in just a few steps, as in 8/7, sometimes it takes longer (6/5 goes to a number with 57735 digits in 18 steps), and sometimes it’s just ridiculous (200/199 goes to a number with 10435 digits). It is conjectured but not proven that iterated approximate squaring always terminates in an integer.

Your task is to write a program that iterates approximate squaring. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at Tuesday, June 8, 2021

Friday, May 28, 2021

Scheme Requests for Implementation

SRFI 221: Generator/accumulator sub-library

SRFI 221 is now in final status.

This is a set of convenience routines for generators and accumulators intended to blend in with SRFI 158. The authors recommend that they be added to the (srfi 158) library provided by users or implementations. If they are approved by the R7RS-large process, they can also be added to (r7rs generator).

by John Cowan (text) and Arvydas Silanskas (implementation) at Friday, May 28, 2021

Friday, May 21, 2021

Joe Marshall

Stupid Y operator tricks

Here is the delta function: δ = (lambda (f) (f f)). Delta takes a function and tail calls that function on itself. What happens if we apply the delta function to itself? Since the delta function is the argument, it is tail called and applied to itself. Which leads again to itself being tail called and applied to itself. We have a situation of infinite regression: the output of (δ δ) ends up being a restatement of the output of (δ δ). Now in this case, regression is infinite and there is no base case, but imagine that somehow there were a base case, or that somehow we identified a value that an infinite regression equated to. Then each stage of the infinite regression just replicates the previous stage exactly. It is like having a perfectly silvered mirror: it just replicates the image presented to it exactly. By calling delta on delta, we've arranged our perfectly silvered mirror to reflect an image of itself. This leads to the “infinite hall of mirrors” effect.

So let's tweak the delta function so that instead of perfectly replicating the infinite regression, it applies a function g around the replication: (lambda (f) (g (f f))). If we apply this modified delta function to itself, each expansion of the infinite regression ends up wrapping an application of the g around it: (g (f f)) = (g (g (f f))) = (g (g (g (f f)))) = (g (g (g (g … )))). So our modified delta function gives us a nested infinite regression of applications of g. This is like our perfectly silvered mirror, but now the reflected image isn't mirrored exactly: we've put a frame on the mirror. When we arrange for the mirror to reflect itself, each nested reflection also has an image of the frame around the reflection, so we get a set of infinitely nested frames.

An infinite regression of (g (g (g (g … )))) is confusing. What does it mean? We can untangle this by unwrapping an application. (g (g (g (g … )))) is just a call to g. The argument to that call is weird, but we're just calling (g <something>). The result of the infinite regression (g (g (g (g … )))) is simply the result of the outermost call to g. We can use this to build a recursive function.

;; If factorial = (g (g (g (g … )))), then
;; factorial = (g factorial), where

(defun g (factorial)
  (lambda (x)
    (if (zerop x)
        1
        (* x (funcall factorial (- x 1))))))
The value returned by an inner invocation of g is the value that will be funcalled in the altenative branch of the conditional.

Y is defined thus:

Y = λg.(λf.g(f f))(λf.g(f f))

A straightforward implementation attempt would be
;; Non working y operator
(defun y (g)
  (let ((d (lambda (f) (funcall g (funcall f f)))))
    (funcall d d)))
but since lisp is a call-by-value language, it will attempt to (funcall f f) before funcalling g, and this will cause runaway recursion. We can avoid the runaway recursion by delaying the (funcall f f) with a strategically placed thunk
;; Call-by-value y operator
;; returns (g (lambda () (g (lambda () (g (lambda () … ))))))
(defun y (g)
  (let ((d (lambda (f) (funcall g (lambda () (funcall f f))))))
    (funcall d d)))
Since the recursion is now wrapped in a thunk, we have to funcall the thunk to force the recursive call. Here is an example where we see that:
* (funcall (Y (lambda (thunk)
                (lambda (x)
                  (if (zerop x)
                      1
                      (* x (funcall (funcall thunk) (- x 1)))))))
           6)
720
the (funcall thunk) invokes the thunk in order to get the actual recursive function, which we when then funcall on (- x 1).

By wrapping the self-application with a thunk, we've made the call site where we use the thunk more complicated. We can clean that up by wrapping the call to the thunk in something nicer:

* (funcall
    (y (lambda (thunk)
         (flet ((factorial (&rest args)
                  (apply (funcall thunk) args)))

           (lambda (x)
             (if (zerop x)
                 1
                 (* x (factorial (- x 1))))))))
      6)
720
And we can even go so far as to hoist that wrapper back up into the definiton of y
(defun y1 (g)
  (let ((d (lambda (f) (funcall g (lambda (&rest args) (apply (funcall f f) args))))))
    (funcall d d)))

* (funcall
    (y1 (lambda (factorial)
          (lambda (x)           
            (if (zerop x)
                1
                (* x (funcall factorial x))))))
    6)
720
y1 is an alternative formulation of the Y operator where we've η-expanded the recursive call to avoid the runaway recursion.

The η-expanded version of the applicative order Y operator has the advantage that it is convenient for defining recursive functions. The thunkified version is less convenient because you have to force the thunk before using it, but it allows you to use the Y operator to define recursive data structures as well as functions:

(Y
  (lambda (delayed-ones)
    (cons-stream 1 (delayed-ones))))
{1 …}

The argument to the thunkified Y operator is itself a procedure of one argument, the thunk. Y returns the result of calling its argument. Y should return a procedure, so the argument to Y should return a procedure. But it doesn't have to immediately return a procedure, it just has to eventually return a procedure, so we could, for example, print something before returning the procedure:

* (funcall (Y (lambda (thunk)
                (format t "~%Returning a procedure")
                (lambda (x)
                  (if (zerop x)
                      1
                      (* x (funcall (funcall thunk) (- x 1)))))))
         6)
Returning a procedure
Returning a procedure
Returning a procedure
Returning a procedure
Returning a procedure
Returning a procedure
720
There is one caveat. You must be able to return the procedure without attempting to make the recursive call.

Let's transform the returned function before returning it by applying an arbitrary function h to it:

(Y (lambda (thunk)
     (h (lambda (x)
          (if (zerop x)
              1
              … )))))
Ok, so now when we (funcall thunk) we don't get what we want, we've got an invocation of h around it. If we have an inverse to h, h-1, available, we can undo it:
(y (lambda (thunk)
      (h (lambda (x)
           (if (zerop x)
               1
               (* (funcall (h-1 (funcall thunk)) (- x 1))))))))
As a concrete example, we return a list and at the call site we extract the first element of that list before calling it:
* (funcall (car (y (lambda (thunk)
                     (list (lambda (x)
                             (if (zerop x)
                                 1
                                 (* x (funcall (car (funcall thunk)) (- x 1))))))))
           6)
720
So we can return a list of mutually recursive functions:
(y (lambda (thunk)
    (list
      ;; even?
      (lambda (n)
        (or (zerop n)
            (funcall (cadr (funcall thunk)) (- n 1))))

      ;; odd?
      (lambda (n)
        (and (not (zerop n))
             (funcall (car (funcall thunk)) (- n 1))))
      )))
If we use the η-expanded version of the Y operator, then we can adapt it to expect a list of mutually recursive functions on the recursive call:
(defun y* (&rest g-list)
  (let ((d (lambda (f)
             (map 'list (lambda (g)
                          (lambda (&rest args)
                            (apply (apply g (funcall f f)) args)))
                  g-list))))
     (funcall d d)))
which we could use like this:
* (let ((eo (y* (lambda (even? odd?)
                  (declare (ignore even?))
                  (lambda (n)
                    (or (zerop n)
                        (funcall odd? (- n 1)))))

                (lambda (even? odd?)
                  (declare (ignore odd?))
                  (lambda (n)
                    (and (not (zerop n))
                         (funcall even? (- n 1))))))))
     (let ((even? (car eo))
           (odd?  (cadr eo)))
       (do ((i 0 (+ i 1)))
           ((>= i 5))
         (format t "~%~d, ~s ~s"
                 i
                 (funcall even? i)
                 (funcall odd? i)))))))
                 
0, T NIL
1, NIL T
2, T NIL
3, NIL T
4, T NIL
Instead of returning a list of mutually recursive functions, we could return them as multiple values. We just have to be expecting multiple values at the call site:
(defun y* (&rest gs)
  (let ((d (lambda (f)
             (apply #'values
                    (map 'list
                         (lambda (g)
                           (lambda (&rest args)
                             (apply (multiple-value-call g (funcall f f)) args)))
                         gs)))))
    (funcall d d)))

MIT Scheme used to have a construct called a named lambda. A named lambda has an extra first argument that is automatically filled in with the function itself. So during evaluation of the body of a named lambda, the name is bound to the named lambda, enabling the function to call itself recursively:

(defmacro named-lambda ((name &rest args) &body body)
  `(y1 (lambda (,name)
         (lambda ,args
           ,@body))))

* (funcall (named-lambda (factorial x)
            (if (zerop x)
                1
                (* x (funcall factorial (- x 1)))))
         6)
720
This leads us to named let expressions. In a named let, the implicit lambda that performs the let bindings is a named lambda. Using that name to invoke the lambda on a different set of arguments is like recursively re-doing the let.
* (named-let fact ((x 6)) (if (zerop x) 1 (* x (funcall fact (- x 1)))))

720

In Scheme, you use letrec to define recursive or mutually recursive procedures. Internal definitions expand into an appropriate letrec. letrec achieves the necessary circularity not through the Y operator, but through side effects. It is hard to tell the difference, but there is a difference. Using the Y operator would allow you to have recursion, but avoid the implicit side effects in a letrec.

Oleg Kiselyov has more to say about the Y operator at http://okmij.org/ftp/Computation/fixed-point-combinators.html

by Joe Marshall (noreply@blogger.com) at Friday, May 21, 2021

Tuesday, May 18, 2021

Ruse Scheme

Ruse Scheme shall be

2021/05/18 - Ruse Scheme shall be

Ruse Scheme, formely known as Arew Scheme, is at this stage, a collection of Scheme libraries for Chez Scheme. There is a grand scheme plan plot machination for it. Read on.

What is a civilization kit?

A civilization kit is a software or set of software that ease the organization of life. So far, there is really one civkit that is mostly privateer that includes and is not limited to:

  • Wikimedia project such as Wikipedia, Wikidata, Wiktionary...
  • Google, Facebook, Github, Instagram, Twitter, Reddit, StackOverflow, Quora...
  • Android, iOS, Firefox, Chrome..
  • MacOS, FreeBSD, NetBSD, Windows, Debian, Fedora, Ubuntu...
  • Mastodon, and other projects that rely on activitypub...

And many more... that is only the visible part of Earth software system. They are software that aim to ease the the production of software or hardware. They are also software that helps with governance, provide tools to ease law making process, sustain production chain of food, energy, medecine, culture, education...

They are a lot of software, and that collection form a civkit.

Is Ruse Scheme a new Scheme?

Yes, and no. It depends what is meant by a new Scheme.

Sometime a Scheme is a software that gathers many Scheme libraries, and rely on existing Scheme to execute their code. That is the case of Ruse.

Most of the time, a Scheme is a software that interpret and/or compile a lot of parentheses that is more or less compatible with RnRS. In this regard, Ruse is a Scheme, but it is not completly new. It rely on Chez Scheme to produce executables that can be run on a server or a desktop. Ruse will support Web Assembly and JavaScript to run Scheme in a Web browser.

Some Scheme implementation do a little of both, and also deliver features that go beyond past or current RnRS. Ruse does that, and shall reach beyond...

The main difference with existing Scheme implementations is not found at the programming language level. Ruse is and will stay a Scheme.

The main area Ruse try to innovate is the rest: whether it is the the production or sharing of code, Ruse aim to make it easier than sharing a meme. Another area Ruse try to innovate is to state upfront the scope of the project.

What are the short term goal of Ruse Scheme?

The short term goal of Ruse Scheme is to build a scalable search engine: Babelia. Babelia will both scale-up and scale-down in terms of required hardware. In other words, it may run in the cloud or on a Raspberry Pi.

That first milestone will demonstrate how to build a distributed Von Neumann architecture that aim to be easier to work with than current approaches.

This is the first milestone because it is easier than going fully dencentralized first. It will deliver the necessary tools to work with the current Internet.

The plan is to deliver Babelia in 2022.

What is the next Internet?

The next Internet is an Internet that is more open, more decentralized, more accessible, and resting upon the fundamental principle.

What is the distributed Von Neumann architecture?

The distributed Von Neumann architecture is like a regular computer that rely on multiple commodity computers.

It is different from a compute grid, because it is not meant only for batch processing.

In Babelia, that distributed computer has volatile memory, non-volatile memory, possibly vectors or graphics processing units, and generic computation units.

The goal is to make it easier to build software inside a trusted network of computers.

What are the mid term goals of Ruse Scheme?

Mid term goals of Ruse Scheme are three folds:

  • Offer enough tooling to make it easier to create, sell and make a living by producing Scheme code. This includes making it painless to interop with existing software.

  • Implement a package manager inspired from Nix, and backed up by content-addressable code that can be translated into multiple natural languages with the help of a decentralized peer-to-peer network.

  • Explore a new operating system desktop paradigm resting upon the fundamental principle.

What is the goal of Ruse Scheme?

The goal of Ruse Scheme is to build a coherant bootstrapable whole-stack civkit for a sustainable civilization, resting upon the fundamental principle.

What is whole-stack?

Whole-stack build upon the full-stack concept to include programming databases, and kernels.

What is Ruse Scheme license?

Ruse Scheme is licensed under the Cooperative Non-violent Public License without exceptions.

What is the fundamental principle?

If a system must serve the creative spirit, it must be entirely comprehensible by a single individual.

Tuesday, May 18, 2021

Programming Praxis

100011322432435545

I had an email from a reader who found my factoring program on Stack Overflow:

Python 2.7.13 (default, Mar 13 2017, 20:56:15)
[GCC 5.4.0] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def isPrime(n, k=5): # miller-rabin
...     from random import randint
...     if n < 2: return False
...     for p in [2,3,5,7,11,13,17,19,23,29]:
...         if n % p == 0: return n == p
...     s, d = 0, n-1
...     while d % 2 == 0:
...         s, d = s+1, d/2
...     for i in range(k):
...         x = pow(randint(2, n-1), d, n)
...         if x == 1 or x == n-1: continue
...         for r in range(1, s):
...             x = (x * x) % n
...             if x == 1: return False
...             if x == n-1: break
...         else: return False
...     return True
...
>>> def factors(n, b2=-1, b1=10000): # 2,3,5-wheel, then rho
...     def gcd(a,b): # euclid's algorithm
...         if b == 0: return a
...         return gcd(b, a%b)
...     def insertSorted(x, xs): # linear search
...         i, ln = 0, len(xs)
...         while i < ln and xs[i] < x: i += 1
...         xs.insert(i,x)
...         return xs
...     if -1 <= n <= 1: return [n]
...     if n < -1: return [-1] + factors(-n)
...     wheel = [1,2,2,4,2,4,2,4,6,2,6]
...     w, f, fs = 0, 2, []
...     while f*f <= n and f < b1:
...         while n % f == 0:
...             fs.append(f)
...             n /= f
...         f, w = f + wheel[w], w+1
...         if w == 11: w = 3
...     if n == 1: return fs
...     h, t, g, c = 1, 1, 1, 1
...     while not isPrime(n):
...         while b2 <> 0 and g == 1:
...             h = (h*h+c)%n # the hare runs
...             h = (h*h+c)%n # twice as fast
...             t = (t*t+c)%n # as the tortoise
...             g = gcd(t-h, n); b2 -= 1
...         if b2 == 0: return fs
...         if isPrime(g):
...             while n % g == 0:
...                 fs = insertSorted(g, fs)
...                 n /= g
...         h, t, g, c = 1, 1, 1, c+1
...     return insertSorted(n, fs)

He complained that the program failed to factor the number 100011322432435545, entering an infinite loop instead of returning a list of factors.

Your task is to figure out what is wrong, and fix it. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at Tuesday, May 18, 2021

Saturday, May 15, 2021

Joe Marshall

β-conversion

If you have an expression that is an application, and the operator of the application is a lambda expression, then you can β-reduce the application by substituting the arguments of the application for the bound variables of the lambda within the body of the lambda.

(defun beta (expression if-reduced if-not-reduced)
  (if (application? expression)
      (let ((operator (application-operator expression))
            (operands (application-operands expression)))
        (if (lambda? operator)
            (let ((bound-variables (lambda-bound-variables operator))
                  (body (lambda-body operator)))
              (if (same-length? bound-variables operands)
                  (funcall if-reduced
                           (xsubst body
                                   (table/extend* (table/empty)
                                                  bound-variables
                                                  operands)))
                  (funcall if-not-reduced)))
            (funcall if-not-reduced)))
      (funcall if-not-reduced)))

* (beta '((lambda (x y) (lambda (z) (* x y z))) a (+ z 3))
        #'identity (constantly nil))
(LAMBDA (#:Z460) (* A (+ Z 3) #:Z460))

A large, complex expression may or may not have subexpressions that can be β-reduced. If neither an expression or any of its subexpressions can be β-reduced, then we say the expression is in “β-normal form”. We may be able to reduce an expression to β-normal form by β-reducing where possible. A β-reduction can introduce further reducible expressions if we substitute a lambda expression for a symbol in operator position, so reducing to β-normal form is an iterative process where we continue to reduce any reducible expressions that arise from substitution.

(defun beta-normalize-step (expression)
  (expression-dispatch expression

    ;; case application
    (lambda (subexpressions)
      ;; Normal order reduction
      ;; First, try to beta reduce the outermost application,
      ;; otherwise, recursively descend the subexpressions, working
      ;; from left to right.
      (beta expression
            #'identity
            (lambda ()
              (labels ((l (subexpressions)
                         (if (null subexpressions)
                             '()
                             (let ((new-sub (beta-normalize-step (car subexpressions))))
                               (if (eq new-sub (car subexpressions))
                                   (let ((new-tail (l (cdr subexpressions))))
                                     (if (eq new-tail (cdr subexpressions))
                                         subexpressions
                                         (cons (car subexpressions) new-tail)))
                                   (cons new-sub (cdr subexpressions)))))))
                (let ((new-subexpressions (l subexpressions)))
                  (if (eq new-subexpressions subexpressions)
                      expression
                      (make-application new-subexpressions)))))))

    ;; case lambda
    (lambda (bound-variables body)
      (let ((new-body (beta-normalize-step body)))
        (if (eql new-body body)
            expression
            (make-lambda bound-variables new-body))))

    ;; case symbol
    (constantly expression)))

;;; A normalized expression is a fixed point of the
;;; beta-normalize-step function.
(defun beta-normalize (expression)
  (do ((expression expression (beta-normalize-step expression))
       (expression1 '() expression)
       (count 0 (+ count 1)))
      ((eq expression expression1)
       (format t "~%~d beta reductions" (- count 1))
       expression)))

You can compute just by using β-reduction. Here we construct an expression that reduces to the factorial of 3. We only have β-reduction, so we have to encode numbers with Church encoding.

(defun test-form ()
  (let ((table
         (table/extend*
          (table/empty)
          '(one
            three
            *
            pred
            zero?
            y)
          '(
            ;; Church numeral one
            (lambda (f) (lambda (x) (f x)))

            ;; Church numeral three 
            (lambda (f) (lambda (x) (f (f (f x)))))

            ;; * (multiply Church numerals)
            (lambda (m n)
              (lambda (f)
                (m (n f))))

            ;; pred (subtract 1 from Church numeral)
            (lambda (n)
              (lambda (f)
                (lambda (x) (((n (lambda (g)
                                   (lambda (h)
                                     (h (g f)))))
                              (lambda (u) x))
                             (lambda (u) u)))))

            ;; zero? (test if Church numeral is zero)
            (lambda (n t f) ((n (lambda (x) f)) t))

            ;; Y operator for recursion
            (lambda (f)
              ((lambda (x) (f (x x)))
               (lambda (x) (f (x x)))))

            )))
        (expr
         '((lambda (factorial)
             (factorial three))
           (y (lambda (fact)
                (lambda (x)
                  (zero? x
                   one
                   (* (fact (pred x)) x))))))))
    (xsubst expr table)))

* (test-form)

((LAMBDA (FACTORIAL) (FACTORIAL (LAMBDA (F) (LAMBDA (X) (F (F (F X)))))))
 ((LAMBDA (F) ((LAMBDA (X) (F (X X))) (LAMBDA (X) (F (X X)))))
  (LAMBDA (FACT)
    (LAMBDA (X)
      ((LAMBDA (N T F) ((N (LAMBDA (X) F)) T)) X
       (LAMBDA (F) (LAMBDA (X) (F X)))
       ((LAMBDA (M N) (LAMBDA (F) (M (N F))))
        (FACT
         ((LAMBDA (N)
            (LAMBDA (F)
              (LAMBDA (X)
                (((N (LAMBDA (G) (LAMBDA (H) (H (G F))))) (LAMBDA (U) X))
                 (LAMBDA (U) U)))))
          X))
        X))))))

* (beta-normalize (test-form))

127 beta reductions
(LAMBDA (F) (LAMBDA (X) (F (F (F (F (F (F X))))))))

This is the Church numeral for 6.

I find it pretty amazing that we can bootstrap ourselves up to arithmetic just by repeatedly β-reducing where we can. It doesn't seem like we're actually doing any work. We're just replacing names with what they stand for.

The β-substitution above replaces all the bound variables with their arguments if there is the correct number of arguments. One could easily implement a partial β-substitution that replaced only some of the bound variables. You'd still have an application, but some of the bound variables in the lambda would be eliminated and the corresponding argument would be removed.

by Joe Marshall (noreply@blogger.com) at Saturday, May 15, 2021

Thursday, May 13, 2021

Andy Wingo

cross-module inlining in guile

Greetings, hackers of spaceship Earth! Today's missive is about cross-module inlining in Guile.

a bit of history

Back in the day... what am I saying? I always start these posts with loads of context. Probably you know it all already. 10 years ago, Guile's partial evaluation pass extended the macro-writer's bill of rights to Schemers of the Guile persuasion. This pass makes local function definitions free in many cases: if they should be inlined and constant-folded, you are confident that they will be. peval lets you write clear programs with well-factored code and still have good optimization.

The peval pass did have a limitation, though, which wasn't its fault. In Guile, modules have historically been a first-order concept: modules are a kind of object with a hash table inside, which you build by mutating. I speak crassly but that's how it is. In such a world, it's hard to reason about top-level bindings: what module do they belong to? Could they ever be overridden? When you have a free reference to a, and there's a top-level definition of a in the current compilation unit, is that the a that's being referenced, or could it be something else? Could the binding be mutated in the future?

During the Guile 2.0 and 2.2 cycles, we punted on this question. But for 3.0, we added the notion of declarative modules. For these modules, bindings which are defined once in a module and which are not mutated in the compilation unit are declarative bindings, which can be reasoned about lexically. We actually translate them to a form of letrec*, which then enables inlining via peval, contification, and closure optimization -- in descending order of preference.

The upshot is that with Guile 3.0, top-level bindings are no longer optimization barriers, in the case of declarative modules, which are compatible enough with historic semantics and usage that they are on by default.

However, module boundaries have still been an optimization barrier. Take (srfi srfi-1), a little utility library on lists. One definition in the library is xcons, which is cons with arguments reversed. It's literally (lambda (cdr car) (cons car cdr)). But does the compiler know that? Would it know that (car (xcons x y)) is the same as y? Until now, no, because no part of the optimizer will look into bindings from outside the compilation unit.

mr compiler, tear down this wall

But no longer! Guile can now inline across module boundaries -- in some circumstances. This feature will be part of a future Guile 3.0.8.

There are actually two parts of this. One is the compiler can identify a set of "inlinable" values from a declarative module. An inlinable value is a small copyable expression. A copyable expression has no identity (it isn't a fresh allocation), and doesn't reference any module-private binding. Notably, lambda expressions can be copyable, depending on what they reference. The compiler then extends the module definition that's residualized in the compiled file to include a little procedure that, when passed a name, will return the Tree-IL representation of that binding. The design of that was a little tricky; we want to avoid overhead when using the module outside of the compiler, even relocations. See compute-encoding in that module for details.

With all of that, we can call ((module-inlinable-exports (resolve-interface '(srfi srfi-1))) 'xcons) and get back the Tree-IL equivalent of (lambda (cdr car) (cons car cdr)). Neat!

The other half of the facility is the actual inlining. Here we lean on peval again, causing <module-ref> forms to trigger an attempt to copy the term from the imported module to the residual expression, limited by the same effort counter as the rest of peval.

The end result is that we can be absolutely sure that constants in imported declarative modules will inline into their uses, and fairly sure that "small" procedures will inline too.

caveat: compiled imported modules

There are a few caveats about this facility, and they are sufficiently sharp that I should probably fix them some day. The first one is that for an imported module to expose inlinable definitions, the imported module needs to have been compiled already, not loaded from source. When you load a module from source using the interpreter instead of compiling it first, the pipeline is optimized for minimizing the latency between when you ask for the module and when it is available. There's no time to analyze the module to determine which exports are inlinable and so the module exposes no inlinable exports.

This caveat is mitigated by automatic compilation, enabled by default, which will compile imported modules as needed.

It could also be fixed for modules by topologically ordering the module compilation sequence; this would allow some parallelism in the build but less than before, though for module graphs with cycles (these exist!) you'd still have some weirdness.

caveat: abi fragility

Before Guile supported cross-module inlining, there was only explicit inlining across modules in Guile, facilitated by macros. If you write a module that has a define-inlinable export and you think about its ABI, then you know to consider any definition referenced by the inlinable export, and you know by experience that its code may be copied into other compilation units. Guile doesn't automatically recompile a dependent module when a macro that it uses changes, currently anyway. Admittedly this situation leans more on culture than on tools, which could be improved.

However, with automatically inlinable exports, this changes. Any definition in a module could be inlined into its uses in other modules. This may alter the ABI of a module in unexpected ways: you think that module C depends on module B, but after inlining it may depend on module A as well. Updating module B might not update the inlined copies of values from B into C -- as in the case of define-inlinable, but less lexically apparent.

At higher optimization levels, even private definitions in a module can be inlinable; these may be referenced if an exported macro from the module expands to a term that references a module-private variable, or if an inlinable exported binding references the private binding. But these optimization levels are off by default precisely because I fear the bugs.

Probably what this cries out for is some more sensible dependency tracking in build systems, but that is a topic for another day.

caveat: reproducibility

When you make a fresh checkout of Guile from git and build it, the build proceeds in the following way.

Firstly, we build libguile, the run-time library implemented in C.

Then we compile a "core" subset of Scheme files at optimization level -O1. This subset should include the evaluator, reader, macro expander, basic run-time, and compilers. (There is a bootstrap evaluator, reader, and macro expander in C, to start this process.) Say we have source files S0, S1, S2 and so on; generally speaking, these files correspond to Guile modules M0, M1, M2 etc. This first build produces compiled files C0, C1, C2, and so on. When compiling a file S2 implementing module M2, which happens to import M1 and M0, it may be M1 and M0 are provided by compiled files C1 and C0, or possibly they are loaded from the source files S1 and S0, or C1 and S0, or S1 and C0.

The bootstrap build uses make for parallelism, with each compile process starts afresh, importing all the modules that comprise the compiler and then using them to compile the target file. As the build proceeds, more and more module imports can be "serviced" by compiled files instead of source files, making the build go faster and faster. However this introduces system-specific nondeterminism as to the set of compiled files available when compiling any other file. This strategy works because it doesn't really matter whether module M1 is provided by compiled file C1 or source file S1; the compiler and the interpreter implement the same language.

Once the compiler is compiled at optimization level -O1, Guile then uses that freshly built compiler to build everything at -O2. We do it in this way because building some files at -O1 then all files at -O2 takes less time than going straight to -O2. If this sounds weird, that's because it is.

The resulting build is reproducible... mostly. There is a bug in which some unique identifiers generated as part of the implementation of macros can be non-reproducible in some cases, and that disabling parallel builds seems to solve the problem. The issue being that gensym (or equivalent) might be called a different number of times depending on whether you are loading a compiled module, or whether you need to read and macro-expand it. The resulting compiled files are equivalent under alpha-renaming but not bit-identical. This is a bug to fix.

Anyway, at optimization level -O1, Guile will record inlinable definitions. At -O2, Guile will actually try to do cross-module inlining. We run into two issues when compiling Guile; one is if we are in the -O2 phase, and we compile a module M which uses module N, and N is not in the set of "core" modules. In that case depending on parallelism and compile order, N may be loaded from source, in which case it has no inlinable exports, or from a compiled file, in which case it does. This is not a great situation for the reliability of this optimization. I think probably in Guile we will switch so that all modules are compiled at -O1 before compiling at -O2.

The second issue is more subtle: inlinable bindings are recorded after optimization of the Tree-IL. This is more optimal than recording inlinable bindings before optimization, as a term that is not inlinable due to its size in its initial form may become small enough to inline after optimization. However, at -O2, optimization includes cross-module inlining! A term that is inlinable at -O1 may become not inlinable at -O2 because it gets slightly larger, or vice-versa: terms that are too large at -O1 could shrink at -O2. We don't even have a guarantee that we will reach a fixed point even if we repeatedly recompile all inputs at -O2, because we allow non-shrinking optimizations.

I think this probably calls for a topological ordering of module compilation inside Guile and perhaps in other modules. That would at least give us reproducibility, provided we avoid the feedback loop of keeping around -O2 files compiled from a previous round, even if they are "up to date" (their corresponding source file didn't change).

and for what?

People who have worked on inliners will know what I mean that a good inliner is like a combine harvester: ruthlessly efficient, a qualitative improvement compared to not having one, but there is a pointy end with lots of whirling blades and it's important to stop at the end of the row. You do develop a sense of what will and won't inline, and I think Dybvig's "Macro writer's bill of rights" encompasses this sense. Luckily people don't lose fingers or limbs to inliners, but inliners can maim expectations, and cross-module inlining more so.

Still, what it buys us is the freedom to be abstract. I can define a module like:

(define-module (elf)
  #:export (ET_NONE ET_REL ET_EXEC ET_DYN ET_CORE))

(define ET_NONE		0)		; No file type
(define ET_REL		1)		; Relocatable file
(define ET_EXEC		2)		; Executable file
(define ET_DYN		3)		; Shared object file
(define ET_CORE		4)		; Core file

And if a module uses my (elf) module and references ET_DYN, I know that the module boundary doesn't prevent the value from being inlined as a constant (and possibly unboxed, etc).

I took a look and on our usual microbenchmark suite, cross-module inlining doesn't make a difference. But that's both a historical oddity and a bug: firstly that the benchmark suite comes from an old Scheme world that didn't have modules, and so won't benefit from cross-module inlining. Secondly, Scheme definitions from the "default" environment that aren't explicitly recognized as primitives aren't inlined, as the (guile) module isn't declarative. (Probably we should fix the latter at some point.)

But still, I'm really excited about this change! Guile developers use modules heavily and have been stepping around this optimization boundary for years. I count 100 direct uses of define-inlinable in Guile, a number of them inside macros, and many of these are to explicitly hack around the optimization barrier. I really look forward to seeing if we can remove some of these over time, to go back to plain old define and just trust the compiler to do what's needed.

by the numbers

I ran a quick analysis of the modules include in Guile to see what the impact was. Of the 321 files that define modules, 318 of them are declarative, and 88 contain inlinable exports (27% of total). Of the 6519 total bindings exported by declarative modules, 566 of those are inlinable (8.7%). Of the inlinable exports, 388 (69%) are functions (lambda expressions), 156 (28%) are constants, and 22 (4%) are "primitives" referenced by value and not by name, meaning definitions like (define head car) (instead of re-exporting car as head).

On the use side, 90 declarative modules import inlinable bindings (29%), resulting in about 1178 total attempts to copy inlinable bindings. 902 of those attempts are to copy a lambda expressions in operator position, which means that peval will attempt to inline their code. 46 of these attempts fail, perhaps due to size or effort constraints. 191 other attempts end up inlining constant values. 20 inlining attempts fail, perhaps because a lambda is used for a value. Somehow, 19 copied inlinable values get elided because they are evaluated only for their side effects, probably to clean up let-bound values that become unused due to copy propagation.

All in all, an interesting endeavor, and one to improve on going forward. Thanks for reading, and catch you next time!

by Andy Wingo at Thursday, May 13, 2021

Tuesday, May 11, 2021

GNU Guix

GNU Guix 1.3.0 released

Image of a Guix test pilot.

We are pleased to announce the release of GNU Guix version 1.3.0!

The release comes with ISO-9660 installation images, a virtual machine image, and with tarballs to install the package manager on top of your GNU/Linux distro, either from source or from binaries. Guix users can update by running guix pull.

It’s been almost 6 months since the last release, during which 212 people contributed code and packages, and a number of people contributed to other important tasks—code review, system administration, translation, web site updates, Outreachy mentoring, and more.

There’s been more than 8,300 commits in that time frame, which we’ll humbly try to summarize in these release notes.

User experience

A distinguishing Guix feature is its support for declarative deployment: instead of running a bunch of guix install and guix remove commands, you run guix package --manifest=manifest.scm, where manifest.scm lists the software you want to install in a snippet that looks like this:

;; This is 'manifest.scm'.
(specifications->manifest
  (list "emacs" "guile" "gcc-toolchain"))

Doing that installs exactly the packages listed. You can have that file under version control and share it with others, which is convenient. Until now, one would have to write the manifest by hand—not insurmountable, but still a barrier to someone willing to migrate to the declarative model.

The new guix package --export-manifest command (and its companion --export-channels option) produces a manifest based on the contents of an existing profile. That makes it easy to transition from the classic “imperative” model, where you run guix install as needed, to the more formal declarative model. This was long awaited!

Users who like to always run the latest and greatest pieces of the free software commons will love the new --with-latest package transformation option. Using the same code as guix refresh, this option looks for the latest upstream release of a package, fetches it, authenticates it, and builds it. This is useful in cases where the new version is not yet packaged in Guix. For example, the command below, if run today, will (attempt to) install QEMU 6.0.0:

$ guix install qemu --with-latest=qemu 
The following package will be upgraded:
   qemu 5.2.0 → 6.0.0

Starting download of /tmp/guix-file.eHO6MU
From https://download.qemu.org//qemu-6.0.0.tar.bz2...
 …0.tar.bz2  123.3MiB                                                                                                                      28.2MiB/s 00:04 [##################] 100.0%

Starting download of /tmp/guix-file.9NRlvT
From https://download.qemu.org//qemu-6.0.0.tar.bz2.sig...
 …tar.bz2.sig  310B                                                                                                                         1.2MiB/s 00:00 [##################] 100.0%
gpgv: Signature made Thu 29 Apr 2021 09:28:25 PM CEST
gpgv:                using RSA key CEACC9E15534EBABB82D3FA03353C9CEF108B584
gpgv: Good signature from "Michael Roth <michael.roth@amd.com>"
gpgv:                 aka "Michael Roth <mdroth@utexas.edu>"
gpgv:                 aka "Michael Roth <flukshun@gmail.com>"
The following derivation will be built:
   /gnu/store/ypz433vzsbg3vjp5374fr9lhsm7jjxa4-qemu-6.0.0.drv

…

There’s one obvious caveat: this is not guaranteed to work. If the new version has a different build system, or if it requires extra dependencies compared to the version currently packaged, the build process will fail. Yet, it provides users with additional flexibility which can be convenient at times. For developers, it’s also a quick way to check whether a given package successfully builds against the latest version of one of its dependencies.

Several changes were made here and there to improve user experience. As an example, a new --verbosity level was added. By default (--verbosity=1), fewer details about downloads get printed, which matches the expectation of most users.

Another handy improvement is suggestions when making typos:

$ guix package --export-manifests
guix package: error: export-manifests: unrecognized option
hint: Did you mean `export-manifest'?

$ guix remve vim
guix: remve: command not found
hint: Did you mean `remove'?

Try `guix --help' for more information.

People setting up build offloading over SSH will enjoy the simplified process, where the guile executable no longer needs to be in PATH, with appropriate GUILE_LOAD_PATH settings, on target machines. Instead, offloading now channels all its operations through guix repl.

The Guix reference manual is fully translated into French, German, and Spanish, with preliminary translations in Russian, Chinese, and other languages. Guix itself is fully translated in French, German, and Slovak, and partially translated in almost twenty other languages. Translations are now handled on Weblate, and you can help!

Developer tools

We have good news for packagers! First, guix import comes with a new Go recursive importer, that can create package definitions or templates thereof for whole sets of Go packages. The guix import crate command, for Rust packages, now honors “semantic versioning” when used in recursive mode.

The guix refresh command now includes new “updaters”: sourceforge, for code hosted on SourceForge, and generic-html which, as the name implies, is a generic update that works by scanning package home pages. This greatly improves guix refresh coverage.

Packagers and developers may also like the new --with-patch package transformation option, which provides a way to build a bunch of packages with a patch applied to one or several of them.

Building on the Guix System image API introduced in v1.2.0, the guix system vm-image and guix system disk-image are superseded by a unified guix system image command. For example,

guix system vm-image --save-provenance config.scm

becomes

guix system image -t qcow2 --save-provenance config.scm

while

guix system disk-image -t iso9660 gnu/system/install.scm

becomes

guix system image -t iso9660 gnu/system/install.scm

This brings performance benefits; while a virtual machine used to be involved in the production of the image artifacts, the low-level bits are now handled by the dedicated genimage tool. Another benefit is that the qcow2 format is now compressed, which removes the need to manually compress the images by post-processing them with xz or another compressor. To learn more about the guix system image command, you can refer to its documentation.

Last but not least, the introduction of the GUIX_EXTENSIONS_PATH Guix search path should make it possible for Guix extensions, such as the Guix Workflow Language, to have their Guile modules automatically discovered, simplifying their deployments.

Performance

One thing you will hopefully notice is that substitute installation (downloading pre-built binaries) became faster, as we explained before. This is in part due to the opportunistic use of zstd compression, which has a high decompression throughput. The daemon and guix publish support zstd as an additional compression method, next to gzip and lzip.

Another change that can help fetch substitutes more quickly is local substitute server discovery. The new --discover option of guix-daemon instructs it to discover and use substitute servers on the local-area network (LAN) advertised with the mDNS/DNS-SD protocols, using Avahi. Similarly, guix publish has a new --advertise option to advertise itself on the LAN.

On Guix System, you can run herd discover guix-daemon on to turn discovery on temporarily, or you can enable it in your system configuration. Opportunistic use of neighboring substitute servers is entirely safe, thanks to reproducible builds.

In other news, guix system init has been optimized, which contributes to making Guix System installation faster.

On some machines with limited resources, building the Guix modules is an expensive operation. A new procedure, channel-with-substitutes-available from the (guix ci) module, can now be used to pull Guix to the latest commit which has already been built by the build farm. Refer to the documentation for an example.

POWER9 support, packages, services, and more!

POWER9 support is now available as a technology preview, thanks to the tireless work of those who helped porting Guix to that platform. There aren't many POWER9 binary substitutes available yet, due to the limited POWER9 capacity of our build farm, but if you are not afraid of building many packages from source, we'd be thrilled to hear back from your experience!

2,000 packages were added, for a total of more than 17K packages; 3,100 were updated. The distribution comes with GNU libc 2.31, GCC 10.3, Xfce 4.16.0, Linux-libre 5.11.15, LibreOffice 6.4.7.2, and Emacs 27.2, to name a few. Among the many packaging changes, one that stands out is the new OCaml bootstrap: the OCaml package is now built entirely from source via camlboot. Package updates also include Cuirass 1.0, the service that powers our build farm.

The services catalog has also seen new additions such as wireguard, syncthing, ipfs, a simplified and more convenient service for Cuirass, and more! You can search for services via the guix system search facility.

The NEWS file lists additional noteworthy changes and bug fixes you may be interested in.

Try it!

The installation script has been improved to allow for more automation. For example, if you are in a hurry, you can run it with:

# yes | ./install.sh

to proceed to install the Guix binary on your system without any prompt!

You may also be interested in trying the Guix System demonstration VM image which now supports clipboard integration with the host and dynamic resizing thanks to the SPICE protocol, which we hope will improve the user experience.

To review all the installation options at your disposal, consult the download page and don't hesitate to get in touch with us.

Enjoy!

Credits

Luis Felipe (illustration)

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64 and POWER9 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

by Ludovic Courtès, Maxim Cournoyer at Tuesday, May 11, 2021

Programming Praxis

Potholes

Today’s exercise comes from one of those online code exercises, via Stack Overflow:

There is a machine that can fix all potholes along a road 3 units in length. A unit of road will be represented by a period in a string. For example, “…” is one section of road 3 units in length. Potholes are marked with an “X” in the road, and also count as 1 unit of length. The task is to take a road of length N and fix all potholes with the fewest possible sections repaired by the machine.

The city where I live needs one of those machines.

Here are some examples:

.X.          1
.X...X       2
XXX.XXXX     3
.X.XX.XX.X   3

Your task is to write a program that takes a string representing a road and returns the smallest number of fixes that will remove all the potholes. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at Tuesday, May 11, 2021

Monday, May 10, 2021

Joe Marshall

Substitution

In McCarthy's early papers on Lisp, he notes that he needs a modified version of subst which needs to be aware of quoted expressions (and avoid substituting within them). He would also need a subst that was aware of lambda expressions. It would have to avoid substituting within the lambda if the name substituted matches one of the bound variables. To be useful for evaluation, it will have to deal with accidental variable capture when substituting within a lambda.

The root problem is that expressions are actually structured objects, but we are working with the list representation of those objects. Instead of substituting by operating on objects, we substitute on the list representation. We have to arrange for the syntactic substitution on the list representation to preserve the semantics of substitution on the objects they represent.

In the substitution model, we take a symbolic expression and replace some of the atoms in the expression with other expressions. We first need a way to discriminate between the different kinds of expressions. An expression is either an atomic symbol, or a list of expressions called an application. There are no other kinds of expressions.

(defun expression-dispatch (expression if-symbol if-application)
  (cond ((symbolp expression) (funcall if-symbol expression))
        ((consp expression)   (funcall if-application expression))
        (t (error "~s is not an expression." expression))))
Substitution is straightforward:
(defun xsubst (table expression)
  (expression-dispatch expression
    (lambda (symbol)
      (funcall table symbol #'identity (constantly symbol)))

    (lambda (subexpressions)
      (map 'list (lambda (subexpression) (xsubst table subexpression)) subexpressions))))

* (let ((table (table/extend (table/empty) 'x '(* a 42))))
    (xsubst table '(+ x y)))  
(+ (* A 42) Y)
We need a table of multiple substitutions so that we can substitute in parallel:
* (let ((table (table/extend
                 (table/extend (table/empty) 'x 'y)
                 'y 'x)))
    (xsubst table '(+ x y)))
(+ Y X)

So far, so good. Let's add lambda expressions. First, we need to add a new expression kind:

(defun expression-dispatch (expression if-symbol if-lambda if-application) 
  (cond ((symbolp expression) (funcall if-symbol expression))
        ((consp expression)
         (cond ((eq (car expression) 'lambda)
                (funcall if-lambda (cadr expression) (caddr expression)))
               (t (funcall if-application expression))))
        (t (error "~s is not an expression." expression))))

Substitution within a lambda expression is a bit tricky. First, you don't want to substitute a symbol if it is one of the bound variables of the lambda expression. Second, substituting a symbol may introduce more symbols. We don't want the new symbols to be accidentally captured by the bound variables in the lambda. If any new symbol has the same name as a bound variable, we have to rename the bound variable (and all its occurrances) to a fresh name so that it doesn't capture the new symbol being introduced. We'll need a helper function

(defun free-variables (expression)
   (expression-dispatch expression
     (lambda (symbol) (list symbol))
     (lambda (bound-variables body)
       (set-difference (free-variables body) bound-variables))
     (lambda (subexpressions)
       (fold-left #'union '() (map 'list #'free-variables subexpressions)))))
Now when we substitute within a lambda, we first find each free variable in the lambda, look it up in the substitution table, and collect the free variables of the substituted value:
(map 'list (lambda (var)
             (funcall table var #'free-variables (constantly '())))
      (free-variables expression))
This gives us the new free variables for each substitution. The union of all of these is the set of all the new free variables
(fold-left #'union '()
           (map 'list (lambda (var)
                        (funcall table var #'free-variables (constantly '())))
                 (free-variables expression)))
We have to rename the bound variables that are in this set:
 
(intersection 
  bound-variables
  (fold-left #'union '()
             (map 'list (lambda (var)
                          (funcall table var #'free-variables (constantly '())))
                  (free-variables expression))))
So we make a little table for renaming:
(defun make-alpha-table (variables)
  (fold-left (lambda (table variable)
               (table/extend table variable (gensym (symbol-name variable))))
             (table/empty)
             variables))

(let ((alpha-table
       (make-alpha-table
        (intersection 
          bound-variables
          (fold-left #'union '()
                     (map 'list (lambda (var)
                                  (funcall table var #'free-variables (constantly '())))
                          (free-variables expression)))))))
  …)
We rename the bound variables as necessary:
  (make-lambda
    (map 'list (lambda (symbol)
                 (funcall alpha-table symbol #'identity (constantly symbol)))
         bound-variables)
    …)
Finally, we redact the bound variables from the substitution table and append the alpha-table to make the substitutions we need for the lambda body
  (make-lambda
   (map 'list (lambda (symbol)
                (funcall alpha-table symbol #'identity (constantly symbol)))
        bound-variables)
   (xsubst (table/append alpha-table (table/redact* table bound-variables))
           body))))
The entire definition of xsubst is now this:
(defun xsubst (table expression)
  (expression-dispatch expression
    (lambda (symbol)
      (funcall table symbol #'identity (constantly symbol)))

    (lambda (bound-variables body)
      (let ((alpha-table
             (make-alpha-table
              (intersection
               bound-variables
               (fold-left #'union '()
                          (map 'list (lambda (var)
                                       (funcall table var
                                                #'free-variables
                                                (constantly '())))
                               (set-difference (free-variables body) bound-variables)))))))
        (make-lambda
         (map 'list (lambda (symbol)
                      (funcall alpha-table symbol #'identity (constantly symbol)))
              bound-variables)
         (xsubst (table/append alpha-table (table/redact* table bound-variables))
                 body))))

    (lambda (subexpressions)
      (make-application
       (map 'list (lambda (subexpression)
                    (xsubst table subexpression))
            subexpressions)))))
This is certainly more complicated than simple substitution, but we can see it does the right thing here:
* (xsubst (table/extend (table/empty) 'x '(* a y)) '(lambda (y) (+ x y)))
(LAMBDA (#:Y234) (+ (* A Y) #:Y234))

It should be obvious how to add quoted forms. This would require adding a new kind of expression to expression-dispatch and a new handling clause in xsubst that avoids substitution.

I'm not completely happy with how we've added lambda expressions to the expression syntax. Using the symbol lambda as a syntactic marker for lambda expressions causes problems if we also want to use that symbol as an argument or variable. Initially, it seems reasonable to be able to name an argument “lambda”. Within the body of the function, references to the variable lambda would refer to that argument. But what about references in the operator position? By defining lambda expressions as three element lists beginning with the symbol lambda we've made it ambiguous with two-argument applications whose operator is the variable lambda. We have to resolve this ambiguity. The current behavior is that we always interpret the symbol lambda as a syntactic marker so you simply cannot use a variable named lambda as a function.

by Joe Marshall (noreply@blogger.com) at Monday, May 10, 2021

Tuesday, May 4, 2021

The Racket Blog

Racket v8.1

posted by John Clements

Racket version 8.1 is now available from https://racket-lang.org/.

  • DrRacket tabs can be dragged, and have new close buttons.

  • Racket CS supports cross-compilation using raco exe.

  • Racket CS supports Android on 32-bit and 64-bit ARM processors.

  • The database library supports running queries in OS threads.

  • Check-Syntax arrows correctly identify the definition site of identifiers with contracts.

  • Racket CS performance has improved for structure predicates and accessors

  • Racket CS is faster at multiplying extremely large numbers and dividing large integers.

  • Racket CS allows callbacks to raise exceptions if they are annotated with #:callback-exns?.

  • New ephemeron hash tables simplify the implementation of tables where keys can refer to values.

  • Typed Racket supports for/foldr.

  • The stepper works for #lang htdp/*sl.

  • Struct signatures work for the ASL teaching language.

The following people contributed to this release:

Alex Harsányi, Alex Knauth, Alexander Shopov, Alexis King, Andrew Mauer-Oats, Anish Athalye, Ben Greenman, Bert De Ketelaere, Bob Burger, Bogdan Popa, Brian Adkins, Cameron Moy, David Van Horn, Dexter Lagan, Dominik Pantůček, Fred Fu, Greg Hendershott, Gustavo Massaccesi, Hazel Levine, Ismael Luceno, Jack Firth, Jarhmander, John Clements, Jörgen Brandt, Laurent Orseau, Lazerbeak12345, Matthew Flatt, Matthias Felleisen, Micah Cantor, Mike Sperber, Noah Ma, Patrick McCarty, Paulo Matos, Pavel Panchekha, Philip McGrath, Philippe Meunier, R. Kent Dybvig, Robby Findler, Ryan Culpepper, Ryan Kramer, Sam Tobin-Hochstadt, Sergiu Ivanov, Shu-Hung You, Sorawee Porncharoenwase, Stephen De Gabrielle, William J. Bowman, bmitc, xxyzz, yjqww6, and ymdarake.

Feedback Welcome!

by The Unknown Author at Tuesday, May 4, 2021