Planet Scheme

Tuesday, July 20, 2021

Scheme Requests for Implementation

SRFI 222: Compound Objects

SRFI 222 is now in final status.

Compound objects are analogous to R6RS compound conditions, and are suitable for use in creating and handling conditions on non-R6RS systems, among other purposes. They encapsulate an immutable sequence of subobjects, which can be any object except another compound object. It is possible to implement R6RS compound conditions on top of compound objects, but not vice versa. Note that this SRFI does not provide any analogue to R6RS simple conditions, which are just records.

by John Cowan (text) and Arvydas Silanskas (implementation) at Tuesday, July 20, 2021

Sunday, July 18, 2021

The Racket Blog

Racket v8.2

posted by John Clements

Racket version 8.2 is now available from https://download.racket-lang.org.

  • Racket CS improved the performance of large-integer arithmetic.

  • Racket has improved support for layered and tethered installation.

  • Racket CS supports nonatomic allocation via ffi/unsafe.

  • Cross-compilation works fully with the raco cross tool, which is distributed separately as the “raco-cross” package.

  • DrRacket has performance improvements when editing files with picts containing large bitmaps.

  • Typed Racket more consistently refines field types of non-polymorphic structs.

  • Printing of values is unified across the teaching language implementations and the stepper.

The following people contributed to this release:

Alex Harsányi, Alex Knauth, Amirouche, Andrew Mauer-Oats, Bob Burger, Bogdan Popa, Cameron Moy, Crystal Jacobs, Dale Vaillancourt, Diego A. Mundo, Fred Fu, Greg Hendershott, Gustavo Massaccesi, Jack Firth, Jamie Taylor, Jarhmander, Jason Hemann, Jay McCarthy, Jeffrey D. Swan, Jens Axel Søgaard, Jesse Alama, John Clements, Laurent Orseau, Lazerbeak12345, Matthew Flatt, Matthias Felleisen, Mike Sperber, Nada Amin, Noah Ma, Oscar Waddell, Paulo Matos, Pavel Panchekha, Philip McGrath, Ray Racine, Robby Findler, Ryan Culpepper, Sam Tobin-Hochstadt, Shu-Hung You, Sorawee Porncharoenwase, Stephen Chang, Thorsten Blum, Tony Garnock-Jones, WarGrey Gyoudmon Ju, William J. Bowman, Yu Fang, and minor-change.

Feedback Welcome

by The Unknown Author at Sunday, July 18, 2021

Monday, July 5, 2021

Mark Damon Hughes

Script the Scheme REPL with Expect

Routinely I want to open the Chez Scheme REPL in my code dir and load my standard libraries; I've been copy-pasting from a Stickies to get my setup each time, because you can't easily set a common prelude from command line. Finally solved that.

In ~/bin/scheme-repl, I put:

#!/usr/bin/env expect -f
log_user 0
cd "$env(HOME)/Code/CodeChez"
spawn scheme
expect "> "
send -- "(import (chezscheme) (marklib) (marklib-os)"
send -- "  (only (srfi s1 lists) delete drop-right last)"
send -- "  (only (srfi s13 strings) string-delete string-index string-index-right string-join string-tokenize) )\n"
log_user 1
interact

(make sure to change the path to wherever you keep your Scheme scripts, and whatever imports you like)

Added alias s=scheme-repl to my .zshrc

Now I can just hit one letter for shortcut:

% s
(import (chezscheme) (marklib) (marklib-os)  (only (srfi s1 lists) delete drop
-right last)  (only (srfi s13 strings) string-delete string-index string-index-r
ight string-join string-tokenize) )
> (os-path 'pwd)
"/Users/mdh/Code/CodeChez"
>

You can sort of do the same thing with a Chez boot file, but I wasn't able to get it to load libraries from thunderchez, even with --libdirs flag, so screw it.

I'd forgotten everything I ever knew about expect, and the only resources online are exact copies of the same "log into ssh with expect!" (which you should never do! Set up SSH keys for Cthulhu's sake!) tutorial over and over again, so had to read the man page to make even this trivial thing work.

by mdhughes at Monday, July 5, 2021

Wednesday, June 30, 2021

Scheme Requests for Implementation

SRFI 224: Integer Mappings

SRFI 224 is now in final status.

Integer maps, or fxmappings, are finite sets, where each element is an association between a fixnum (exact integer) key and an arbitrary Scheme object. They are similar to the general mappings of SRFI 146, but the restricted key-type allows implementations of fxmappings to benefit from optimized structures and algorithms. This library provides a rich set of operations on fxmappings, including analogues of most of the forms provided by SRFI 146. Fxmappings have no intrinsic order, but may be treated as ordered sets, using the natural ordering on keys; a substantial sublibrary for working with fxmappings in this fashion is included.

by Wolfgang Corcoran-Mathe at Wednesday, June 30, 2021

Saturday, June 26, 2021

Scheme Requests for Implementation

SRFI 225: Dictionaries

SRFI 225 is now in draft status.

The procedures of this SRFI allow callers to manipulate an object that maps keys to values without the caller needing to know exactly what the type of the object is. Such an object is called a dictionary in this SRFI.

by John Cowan (spec) and Arvydas Silanskas (implementation) at Saturday, June 26, 2021

Tuesday, June 22, 2021

Programming Praxis

Aronson’s Sequence

Aronson’s sequence 1, 4, 11, 16, 24, 29, 33, 35, 39, … (A005224) is an infinite self-referential sequence defined as:

T is the first, fourth, eleventh, … letter in this sentence.

Your task is to write a program that generates Aronson’s sequence and use it to compute the first hundred members of the sequence. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at Tuesday, June 22, 2021

Friday, June 18, 2021

GNU Guix

Substitutes now also available from bordeaux.guix.gnu.org

There have been a number of different project operated sources of substitutes, for the last couple of years the default source of substitutes has been ci.guix.gnu.org (with a few different URLs).

Now, in addition to ci.guix.gnu.org, bordeaux.guix.gnu.org is a default substitute server.

Put that way, this development maybe doesn't sound particularly interesting. Why is a second substitute server useful? There's some thoughts on that exact question in the next section. If you're just interested in how to use (or how not to use) substitutes from bordeaux.guix.gnu.org, then you can just skip ahead to the last section.

Why a second source of substitutes?

This change is an important milestone, following on from the work that started on the Guix Build Coordinator towards the start of 2020.

Back in 2020, the substitute availability from ci.guix.gnu.org was often an issue. There seemed to be a number of contributing factors, including some parts of the architecture. Without going too much in to the details of the issues, aspects of the design of the Guix Build Coordinator were specifically meant to avoid some of these issues.

While there were some very positive results from testing back in 2020, it's taken so long to bring the substitute availability benefits to general users of Guix that ci.guix.gnu.org has changed and improved significantly in the meantime. This means that any benefits in terms of substitute availability are less significant now.

One clearer benefit of just having two independent sources of substitutes is redundancy. While the availability of ci.guix.gnu.org has been very high (in my opinion), having a second independent substitute server should mean that if there's a future issue with users accessing either source of substitutes, the disruption should be reduced.

I'm also excited about the new possibilities offered by having a second substitute server, particularly one using the Guix Build Coordinator to manage the builds.

Substitutes for the Hurd is already something that's been prototyped, so I'm hopeful that bordeaux.guix.gnu.org can start using childhurd VMs to build things soon.

Looking a bit further forward, I think there's some benefits to be had in doing further work on how the nar and narinfo files used for substitutes are managed. There are some rough plans already on how to address the retention of nars, and how to look at high performance mirrors.

Having two substitute servers is one step towards stronger trust policies for substitutes (as discussed on guix-devel, where you would only use a substitute if both ci.guix.gnu.org and bordeaux.guix.gnu.org have built it exactly the same. This would help protect against the compromise of a single substitute server.

Using substitutes from bordeaux.guix.gnu.org

If you're using Guix System, and haven't altered the default substitute configuration, updating guix (via guix pull), reconfiguring using the updated guix, and then restarting the guix-daemon should enable substitutes from bordeaux.guix.gnu.org.

If the ACL is being managed manually, you might need to add the public key for bordeaux.guix.gnu.org manually as well.

When using Guix on a foreign distribution with the default substitute configuration, you'll need to run guix pull as root, then restart the guix-daemon. You'll then need to add the public key for bordeaux.guix.gnu.org to the ACL.

guix archive --authorize < /root/.config/guix/current/share/guix/bordeaux.guix.gnu.org.pub

If you want to just use ci.guix.gnu.org, or bordeaux.guix.gnu.org for that matter, you'll need to adjust the substitute urls configuration for the guix-daemon to just refer to the substitute servers you want to use.

by Christopher Baines at Friday, June 18, 2021

Tuesday, June 15, 2021

Programming Praxis

Cardinal And Ordinal Numbers

Cardinal numbers are those used for counting; spelled in letters, they are one, two, three, four, and so on.

Ordinal numbers are those used for ranking: spelled in letters, they are first, second, third, fourth, and so on.

Your task is to write a program that takes a number and returns the spelled-out cardinal and ordinal forms of that number. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at Tuesday, June 15, 2021

Monday, June 14, 2021

Jeremy Kun

Searching for RH Counterexamples — Exploring Data

We’re ironically searching for counterexamples to the Riemann Hypothesis.

  1. Setting up Pytest
  2. Adding a Database
  3. Search Strategies
  4. Unbounded integers
  5. Deploying with Docker
  6. Performance Profiling
  7. Scaling up
  8. Productionizing

In the last article we added a menagerie of “production readiness” features like continuous integration tooling (automating test running and static analysis), alerting, and a simple deployment automation. Then I let it loose on AWS, got extremely busy with buying a house, forgot about this program for a few weeks (no alerts means it worked flawlessly!), and then saw my AWS bill.

So I copied the database off AWS using pg_dump (piped to gzip), terminated the instances, and inspected the results. A copy of the database is here. You may need git-lfs to clone it. If I wanted to start it back up again, I could spin them back up, and use gunzip | psql to restore the database, and it would start back up from where it left off. A nice benefit of all the software engineering work done thus far.

This article will summarize some of the data, show plots, and try out some exploratory data analysis techniques.

Summary

We stopped the search mid-way through the set of numbers with 136 prime divisors.

The largest number processed was

1255923956750926940807079376257388805204
00410625719434151527143279285143764977392
49474111379646103102793414829651500824447
17178682617437033476033026987651806835743
3694669721205424205654368862231754214894
07691711699791787732382878164959602478352
11435434547040000

Which in factored form is the product of these terms

  2^8   3^7   5^4   7^4  11^3  13^3  17^2  19^2  23^2  29^2
 31^2  37^2  41^2  43^1  47^1  53^1  59^1  61^1  67^1  71^1
 73^1  79^1  83^1  89^1  97^1 101^1 103^1 107^1 109^1 113^1
127^1 131^1 137^1 139^1 149^1 151^1 157^1 163^1 167^1 173^1
179^1 181^1 191^1 193^1 197^1 199^1 211^1 223^1 227^1 229^1
233^1 239^1 241^1 251^1 257^1 263^1 269^1 271^1 277^1 281^1
283^1 293^1 307^1 311^1 313^1 317^1 331^1 337^1 347^1 349^1
353^1 359^1 367^1 373^1 379^1 383^1 389^1 397^1 401^1 409^1
419^1 421^1 431^1 433^1 439^1 443^1 449^1 457^1 461^1 463^1
467^1 479^1 487^1 491^1 499^1 503^1 509^1 521^1 523^1 541^1
547^1 557^1 563^1 569^1 571^1 577^1

The best witness—the number with the largest witness value—was

38824169178385494306912668979787078930475
9208283469279319659854547822438432284497
11994812030251439907246255647505123032869
03750131483244222351596015366602420554736
87070007801035106854341150889235475446938
52188272225341139870856016797627204990720000

which has witness value 1.7707954880001586, which is still significantly smaller than the needed 1.782 to disprove RH.

The factored form of the best witness is

 2^11   3^7   5^4   7^3  11^3  13^2  17^2  19^2  23^2  29^2 
 31^2  37^1  41^1  43^1  47^1  53^1  59^1  61^1  67^1  71^1 
 73^1  79^1  83^1  89^1  97^1 101^1 103^1 107^1 109^1 113^1 
127^1 131^1 137^1 139^1 149^1 151^1 157^1 163^1 167^1 173^1 
179^1 181^1 191^1 193^1 197^1 199^1 211^1 223^1 227^1 229^1 
233^1 239^1 241^1 251^1 257^1 263^1 269^1 271^1 277^1 281^1 
283^1 293^1 307^1 311^1 313^1 317^1 331^1 337^1 347^1 349^1 
353^1 359^1 367^1 373^1 379^1 383^1 389^1 397^1 401^1 409^1 
419^1 421^1 431^1 433^1 439^1 443^1 449^1 457^1 461^1 463^1 
467^1 479^1 487^1 491^1 499^1 503^1 509^1 521^1 523^1 541^1 
547^1 557^1 563^1 

The average search block took 4m15s to compute, while the max took 7m7s and the min took 36s.

The search ran for about 55 days (hiccups included), starting at 2021-03-05 05:47:53 and stopping at 2021-04-28 15:06:25. The total AWS bill—including development, and periods where the application was broken but the instances still running, and including instances I wasn’t using but forgot to turn off—was $380.25. When the application was running at its peak, the bill worked out to about $100/month, though I think I could get it much lower by deploying fewer instances, after we made the performance optimizations that reduced the need for resource-heavy instances. There is also the possibility of using something that integrates more tightly with AWS, such as serverless jobs for the cleanup, generate, and process worker jobs.

Plots

When in doubt, plot it out. I started by writing an export function to get the data into a simpler CSV, which for each n only stored \log(n) and the witness value.

I did this once for the final computation results. I’ll call this the “small” database because it only contains the largest witness value in each block. I did it again for an earlier version of the database before we introduced optimizations (I’ll call this the “large” database), which had all witness values for all superabundant numbers processed up to 80 prime factors.. The small database was only a few dozen megabytes in size, and the large database was ~40 GiB, so I had to use postgres cursors to avoid loading the large database into memory. Moreover, then generated CSV was about 8 GiB in size, and so it required a few extra steps to sort it, and get it into a format that could be plotted in a reasonable amount of time.

First, using GNU sort to sort the file by the first column, \log(n)

sort -t , -n -k 1 divisor_sums.csv -o divisor_sums_sorted.csv

Then, I needed to do some simple operations on massive CSV files, including computing a cumulative max, and filtering down to a subset of rows that are sufficient for plotting. After trying to use pandas and vaex, I realized that the old awk command line tool would be great at this job. So I wrote a simple awk script to process the data, and compute data used for the cumulative max witness value plots below.

Then finally we can use vaex to create two plots. The first is a heatmap of witness value counts. The second is a plot of the cumulative max witness value. For the large database:

Witness value heatmap for the large database
The cumulative maximum witness value for the large database.

And for the small database

A heatmap for the witness values for the small database
The cumulative maximum witness value for the small database.

Note, the two ridges disagree slightly (the large database shows a longer flat line than the small database for the same range), because of the way that the superabundant enumeration doesn’t go in increasing order of n. So larger witness values in the range 400-500 are found later.

Estimating the max witness value growth rate

The next obvious question is whether we can fit the curves above to provide an estimate of how far we might have to look to find the first witness value that exceeds the desired 1.782 threshold. Of course, this will obviously depend on the appropriateness of the underlying model.

A simple first guess would be split between two options: the real data is asymptotic like a + b / x approaching some number less than 1.782 (and hence this approach cannot disprove RH), or the real data grows slowly (perhaps doubly-logarithmic) like a + b \log \log x, but eventually surpasses 1.782 (and RH is false). For both cases, we can use scipy’s curve fitting routine as in this pull request.

For the large database (roughly using log n < 400 since that’s when the curve flatlines due to the enumeration order), we get a reciprocal fit of

\displaystyle f(x) \approx 1.77579122 - 2.72527824 / x

and a logarithmic fit of

\displaystyle f(x) \approx 1.65074314 + 0.06642373 \log(\log(x))

The fit of the large database to a + b/x. Note the asymptote of 1.7757 suggests this will not disprove RH.
The fit of the large database to a + b log log x. If this is accurate, we would find the counterexample around log(n) = 1359.

The estimated asymptote is around 1.7757 in the first case, and the second case estimates we’d find an RH counterexample at around log(n) = 1359.

For the small database of only sufficiently large witness values, this time going up to about log(n) \approx 575, the asymptotic approximation is

\displaystyle 1.77481154 -2.31226382 / x

And the logarithmic approximation is

\displaystyle 1.70825262 + 0.03390312 \log(\log(x))

The reciprocal approximation of the small database with asymptote 1.77481154
The logarithmic approximation of the small database with RH counterexample estimate at log(n) = 6663

Now the asymptote is slightly lower, at 1.7748, and the logarithmic model approximates the counterexample can be found at approximately \log(n) = 6663.

Both of the logarithmic approximations suggest that if we want to find an RH counterexample, we would need to look at numbers with thousands of prime factors. The first estimate puts a counterexample at about 2^{1960}, the second at 2^{9612}, so let’s say between 1k and 10k prime factors.

Luckily, we can actually jump forward in the superabundant enumeration to exactly the set of candidates with m prime factors. So it might make sense to jump ahead to, say, 5k prime factors and search in that region. However, the size of a level set of the superabundant enumeration still grows exponentially in m. Perhaps we should (heuristically) narrow down the search space by looking for patterns in the distribution of prime factors for the best witness values we’ve found thus far. Perhaps the values of n with the best witness values tend to have a certain concentration of prime factors.

Exploring prime factorizations

At first, my thought was to take the largest witness values, look at their prime factorizations, and try to see a pattern when compared to smaller witness values. However, other than the obvious fact that the larger witness values correspond to larger numbers (more and larger prime factors), I didn’t see an obvious pattern from squinting at plots.

To go in a completely different direction, I wanted to try out the UMAP software package, a very nice and mathematically sophisticated for high dimensional data visualization. It’s properly termed a dimensionality reduction technique, meaning it takes as input a high-dimensional set of data, and produces as output a low-dimensional embedding of that data that tries to maintain the same shape as the input, where “shape” is in the sense of a certain Riemannian metric inferred from the high dimensional data. If there is structure among the prime factorizations, then UMAP should plot a pretty picture, and perhaps that will suggest some clearer approach.

To apply this to the RH witness value dataset, we can take each pair (n, \sigma(n)/(n \log \log n)), and associate that with a new (high dimensional) data point corresponding to the witness value paired with the number’s prime factorization

\displaystyle (\sigma(n)/(n \log \log n), k_1, k_2, \dots, k_d),

where n = 2^{k_1} 3^{k_2} 5^{k_3} \dots p_d^{k_d}, with zero-exponents included so that all points have the same dimension. This pull request adds the ability to factorize and export the witness values to a CSV file as specified, and this pull request adds the CSV data (using git-lfs), along with the script to run UMAP, the resulting plots shown below, and the saved embeddings as .npy files (numpy arrays).

When we do nothing special to the data and run it through UMAP we see this plot.

UMAP plotted on the raw prime factorization and witness value dataset.

It looks cool, but if you stare at it for long enough (and if you zoom in when you generate the plot yourself in matplotlib) you can convince yourself that it’s not finding much useful structure. The red dots dominate (lower witness values) and the blue dots are kind of spread haphazardly throughout the red regions. The “ridges” along the chart are probably due to how the superabundant enumeration skips lots of numbers, and that’s why it thins out on one end: the thinning out corresponds to fewer numbers processed that are that large since the enumeration is not uniform.

It also seemed like there is too much data. The plot above has some 80k points on it. After filtering down to just those points whose witness values are bigger than 1.769, we get a more manageable plot.

Witness values and prime factors processed with UMAP, where the witness value is at least 1.769.

This is a bit more reasonable. You can see a stripe of blue dots going through the middle of the plot.

Before figuring out how that blue ridge might relate to the prime factor patterns, let’s take this a few steps further. Typically in machine learning contexts, it helps to normalize your data, i.e., to transform each input dimension into a standard Z-score with respect to the set of values seen in that dimension, subtracting the mean and dividing by the standard deviation. Since the witness values are so close to each other, they’re a good candidate for such normalization. Here’s what UMAP plots when we normalize the witness value column only.

UMAP applied to the (normalized) witness values and prime factorizations. Applied to all witness values.

Now this is a bit more interesting! Here the colormap on the right is in units of standard deviation of witness values. You can see a definite bluest region, and it appears that the data is organized into long brushstrokes, where the witness values increase as you move from one end of the stroke to the other. At worst, this suggests that the dataset has structure that a learning algorithm could discover.

Going even one step further, what if we normalize all the columns? Well, it’s not as interesting.

UMAP when normalizing all columns, not just the witness value.

If you zoom in, you can see that the same sort of “brushstroke” idea is occurring here too, with blue on one end and red on the other. It’s just harder to see.

The previous image, zoomed in around a cluster of data

We would like to study the prettiest picture and see if we can determine what pattern of prime numbers the blue region has, if any. The embedding files are stored on github, and I put up (one version of) the UMAP visualization as an interactive plot via this pull request.

I’ve been sitting on this draft for a while, and while this article didn’t make a ton of headway, the pictures will have to do while I’m still dealing with my new home purchase.

Some ideas for next steps:

  • Squint harder at the distributions of primes for the largest witness values in comparison to the rest.
  • See if a machine learning algorithm can regress witness values based on their prime factorizations (and any other useful features I can derive). Study the resulting hypothesis to determine which features are the most important. Use that to refine the search strategy.
  • Try searching randomly in the superabundant enumeration around 1k and 10k prime factors, and see if the best witness values found there match the log-log regression.
  • Since witness values above a given threshold seem to be quite common, and because the UMAP visualization shows some possible “locality” structure for larger witness values, it suggests if there is a counterexample to RH then there are probably many. So a local search method (e.g., local neighborhood search/discrete gradient ascent with random restarts) might allow us to get a better sense for whether we are on the right track.

Until next time!

by j2kun at Monday, June 14, 2021

Friday, June 11, 2021

GNU Guix

Reproducible data processing pipelines

Last week, we at Guix-HPC published videos of a workshop on reproducible software environments we organized on-line. The videos are well worth watching—especially if you’re into reproducible research, and especially if you speak French or want to practice. This post, though, is more of a meta-post: it’s about how we processed these videos. “A workshop on reproducibility ought to have a reproducible video pipeline”, we thought. So this is what we did!

From BigBlueButton to WebM

Over the last year and half, perhaps you had the “opportunity” to participate in an on-line conference, or even to organize one. If so, chances are that you already know BigBlueButton (BBB), the free software video conferencing suite initially designed for on-line teaching. In a nutshell, it allows participants to chat (audio, video, and keyboard), and speakers can share their screen or a PDF slide deck. Organizers can also record the session.

BBB then creates a link to recorded sessions with a custom JavaScript player that replays everything: typed chat, audio and video (webcams), shared screens, and slide decks. This BBB replay a bit too rough though and often not the thing you’d like to publish after the conference. Instead, you’d rather do a bit of editing: adjusting the start and end time of each talk, removing live chat from what’s displayed (which allows you to remove info that personally identifies participants, too!), and so forth. Turns out this kind of post-processing is a bit of work, primarily because BBB does “the right thing” of recording each stream separately, in the most appropriate form: webcam and screen shares are recorded as separate videos, chat is recorded as text with timings, slide decks is recorded as a bunch of PNGs plus timings, and then there’s a bunch of XML files with metadata putting it all together.

Anyway, with a bit of searching, we quickly found the handy bbb-render tool, which can first download all these files and then assemble them using the Python interface to the GStreamer Editing Services (GES). Good thing: we don’t have to figure out all these things; we “just” have to run these two scripts in an environment with the right dependencies. And guess what: we know of a great tool to control execution environments!

A “deployment-aware Makefile”

So we have a process that takes input files—those PNGs, videos, and XML files—and produces output files—WebM video files. As developers we immediately recognize a pattern and the timeless tool to deal with it: make. The web already seems to contain countless BBB post-processing makefiles (and shell scripts, too). We were going to contribute to this while we suddenly realized that we know of another great tool to express such processes: Guix! Bonus: while a makefile would address just the tip of the iceberg—running bbb-render—Guix can also take care of the tedious task of deploying the right environment to run bbb-render in.

What we did was to write some sort of a deployment-aware makefile. It’s still a relatively unconventional way to use Guix, but one that’s very convenient. We’re talking about videos, but really, you could use the same approach for any kind of processing graph where you’d be tempted to just use make.

The end result here is a Guix file that returns a manifest—a list of videos to “build”. You can build the videos with:

guix build -m render-videos.scm

Overall, the file defines a bunch of functions (procedures in traditional Scheme parlance), each of which takes input files and produces output files. More accurately, these functions returns objects that describe how to build their output from the input files—similar to how a makefile rule describes how to build its target(s) from its prerequisite(s). (The reader familiar with functional programming may recognize a monad here, and indeed, those build descriptions can be thought of as monadic values in a hypothetical “Guix build” monad; technically though, they’re regular Scheme values.)

Let’s take a guided tour of this 300-line file.

Rendering

The first step in this file describes where bbb-render can be found and how to run it to produce a GES “project” file, which we’ll use later to render the video:

(define bbb-render
  (origin
    (method git-fetch)
    (uri (git-reference (url "https://github.com/plugorgau/bbb-render")
                        (commit "a3c10518aedc1bd9e2b71a4af54903adf1d972e5")))
    (file-name "bbb-render-checkout")
    (sha256
     (base32 "1sf99xp334aa0qgp99byvh8k39kc88al8l2wy77zx7fyvknxjy98"))))

(define rendering-profile
  (profile
   (content (specifications->manifest
             '("gstreamer" "gst-editing-services" "gobject-introspection"
               "gst-plugins-base" "gst-plugins-good"
               "python-wrapper" "python-pygobject" "python-intervaltree")))))

(define* (video-ges-project bbb-data start end
                            #:key (webcam-size 25))
  "Return a GStreamer Editing Services (GES) project for the video,
starting at START seconds and ending at END seconds.  BBB-DATA is the raw
BigBlueButton directory as fetched by bbb-render's 'download.py' script.
WEBCAM-SIZE is the percentage of the screen occupied by the webcam."
  (computed-file "video.ges"
                 (with-extensions (list (specification->package "guile-gcrypt"))
                  (with-imported-modules (source-module-closure
                                          '((guix build utils)
                                            (guix profiles)))
                    #~(begin
                        (use-modules (guix build utils) (guix profiles)
                                     (guix search-paths) (ice-9 match))

                        (define search-paths
                          (profile-search-paths #+rendering-profile))

                        (for-each (match-lambda
                                    ((spec . value)
                                     (setenv
                                      (search-path-specification-variable
                                       spec)
                                      value)))
                                  search-paths)

                        (invoke "python"
                                #+(file-append bbb-render "/make-xges.py")
                                #+bbb-data #$output
                                "--start" #$(number->string start)
                                "--end" #$(number->string end)
                                "--webcam-size"
                                #$(number->string webcam-size)))))))

First it defines the source code location of bbb-render as an “origin”. Second, it defines rendering-profile as a “profile” containing all the packages needed to run bbb-render’s make-xges.py script. The specification->manifest procedure creates a manifest from a set of packages specs, and likewise specification->package returns the package that matches a given spec. You can try these things at the guix repl prompt:

$ guix repl
GNU Guile 3.0.7
Copyright (C) 1995-2021 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guix-user)> ,use(guix profiles)
scheme@(guix-user)> ,use(gnu)
scheme@(guix-user)> (specification->package "guile@2.0")
$1 = #<package guile@2.0.14 gnu/packages/guile.scm:139 7f416be776e0>
scheme@(guix-user)> (specifications->manifest '("guile" "gstreamer" "python"))
$2 = #<<manifest> entries: (#<<manifest-entry> name: "guile" version: "3.0.7" …> #<<manifest-entry> name: "gstreamer" version: "1.18.2" …> …)

Last, it defines video-ges-project as a function that takes the BBB raw data, a start and end time, and produces a video.ges file. There are three key elements here:

  1. computed-file is a function to produce a file, video.ges in this case, by running the code you give it as its second argument—the recipe, in makefile terms.
  2. The recipe passed to computed-file is a G-expression (or “gexp”), introduced by this fancy #~ (hash tilde) notation. G-expressions are a way to stage code, to mark it for eventual execution. Indeed, that code will only be executed if and when we run guix build (without --dry-run), and only if the result is not already in the store.
  3. The gexp refers to rendering-profile, to bbb-render, to bbb-data and so on by escaping with the #+ or #$ syntax (they’re equivalent, unless doing cross-compilation). During build, these reference items in the store, such as /gnu/store/…-bbb-render, which is itself the result of “building” the origin we’ve seen above. The #$output reference corresponds to the build result of this computed-file, the complete file name of video.ges under /gnu/store.

That’s quite a lot already! Of course, this real-world example is more intimidating than the toy examples you’d find in the manual, but really, pretty much everything’s there. Let’s see in more detail at what’s inside this gexp.

The gexp first imports a bunch of helper modules with build utilities and tools to manipulate profiles and search path environment variables. The for-each call iterates over search path environment variables—PATH, PYTHONPATH, and so on—, setting them so that the python command is found and so that the needed Python modules are found.

The with-imported-modules form above indicates that the (guix build utils) and (guix profiles) modules, which are part of Guix, along with their dependencies (their closure), need to be imported in the build environment. What about with-extensions? Those (guix …) module indirectly depend on additional modules, provided by the guile-gcrypt package, hence this spec.

Next comes the ges->webm function which, as the name implies, takes a .ges file and produces a WebM video file by invoking ges-launch-1.0. The end result is a video containing the recording’s audio, the webcam and screen share (or slide deck), but not the chat.

Opening and closing

We have a WebM video, so we’re pretty much done, right? But… we’d also like to have an opening, showing the talk title and the speaker’s name, as well as a closing. How do we get that done?

Perhaps a bit of a sledgehammer, but it turns out that we chose to produce those still images with LaTeX/Beamer, from these templates.

We need again several processing steps:

  1. We first define the latex->pdf function that takes a template .tex file, a speaker name and title. It copies the template, replaces placeholders with the speaker name and title, and runs pdflatex to produce the PDF.
  2. The pdf->bitmap function takes a PDF and returns a suitably-sized JPEG.
  3. image->webm takes that JPEG and invokes ffmpeg to render it as WebM, with the right resolution, frame rate, and audio track.

With that in place, we define a sweet and small function that produces the opening WebM file for a given talk:

(define (opening title speaker)
  (image->webm
   (pdf->bitmap (latex->pdf (local-file "opening.tex") "opening.pdf"
                            #:title title #:speaker speaker)
                "opening.jpg")
   "opening.webm" #:duration 5))

We need one last function, video-with-opening/closing, that given a talk, an opening, and a closing, concatenates them by invoking ffmpeg.

Putting it all together

Now we have all the building blocks!

We use local-file to refer to the raw BBB data, taken from disk:

(define raw-bbb-data/monday
  ;; The raw BigBlueButton data as returned by './download.py URL', where
  ;; 'download.py' is part of bbb-render.
  (local-file "bbb-video-data.monday" "bbb-video-data"
              #:recursive? #t))

(define raw-bbb-data/tuesday
  (local-file "bbb-video-data.tuesday" "bbb-video-data"
              #:recursive? #t))

No, the raw data is not in the Git repository (it’s too big and contains personally-identifying information about participants), so this assumes that there’s a bbb-video-data.monday and a bbb-video-data.tuesday in the same directory as render-videos.scm.

For good measure, we define a <talk> data type:

(define-record-type <talk>
  (talk title speaker start end cam-size data)
  talk?
  (title     talk-title)
  (speaker   talk-speaker)
  (start     talk-start)           ;start time in seconds
  (end       talk-end)             ;end time
  (cam-size  talk-webcam-size)     ;percentage used for the webcam
  (data      talk-bbb-data))       ;BigBlueButton data

… such that we can easily define talks, along with talk->video, which takes a talk and return a complete, final video:

(define (talk->video talk)
  "Given a talk, return a complete video, with opening and closing."
  (define file-name
    (string-append (canonicalize-string (talk-speaker talk))
                   ".webm"))

  (let ((raw (ges->webm (video-ges-project (talk-bbb-data talk)
                                           (talk-start talk)
                                           (talk-end talk)
                                           #:webcam-size
                                           (talk-webcam-size talk))
                        file-name))
        (opening (opening (talk-title talk) (talk-speaker talk))))
    (video-with-opening/closing file-name raw
                                opening closing.webm)))

The very last bit iterates over the talks and returns a manifest containing all the final videos. Now we can build the ready-to-be-published videos, all at once:

$ guix build -m render-videos.scm
[… time passes…]
/gnu/store/…-emmanuel-agullo.webm
/gnu/store/…-francois-rue.webm
…

Voilà!

Image of an old TV screen showing a video opening.

Why all the fuss?

OK, maybe you’re thinking “this is just another hackish script to fiddle with videos”, and that’s right! It’s also worth mentioning another approach: Racket’s video language, which is designed to manipulate video abstractions, similar to GES but with a sweet high-level functional interface.

But look, this one’s different: it’s self-contained, it’s reproducible, and it has the right abstraction level. Self-contained is a big thing; it means you can run it and it knows what software to deploy, what environment variables to set, and so on, for each step of the pipeline. Granted, it could be simplified with appropriate high-level interfaces in Guix. But remember: the alternative is a makefile (“deployment-unaware”) completed by a README file giving a vague idea of the dependencies needed. The reproducible bit is pretty nice too (especially for a workshop on reproducibility). It also means there’s caching: videos or intermediate byproducts already in the store don’t need to be recomputed. Last, we have access to a general-purpose programming language where we can build abstractions, such as the <talk> data type, that makes the whole thing more pleasant to work with and more maintainable.

Hopefully that’ll inspire you to have a reproducible video pipeline for your next on-line event, or maybe that’ll inspire you to replace your old makefile and shelly habits for data processing!

High-performance computing (HPC) people might be wondering how to go from here and build “computing-resource-aware” or “storage-resource-aware” pipelines where each computing step could be submitted to the job scheduler of an HPC cluster and use distributed file systems for intermediate results rather than /gnu/store. If you’re one of these folks, do take a look at how the Guix Workflow Language addresses these issues.

Acknowledgments

Thanks to Konrad Hinsen for valuable feedback on an earlier draft.

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64 and POWER9 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

by Ludovic Courtès at Friday, June 11, 2021

Wednesday, June 9, 2021

Gauche Devlog

Cookbook: Running commands remotely

Cookbook: Running commands remotely

(I frequently write throw-away scripts in Gauche, and it occurred to me that they can be a nice source of cookbook recipe. I'll write them up as I come to useful snippets.)

With run-process or do-process, you can invoke external commands (ref:gauche.process). One of their interesting feature is that you can run the commands on a remote host, if you have ssh access with public-key authentication to it.

Just add :host keyword argument.

(do-process '(ls) :host "remote.example.com")
 ;=> you see listing of your home directory at remote.example.com

Stdio is forwarded to the local process, so as process exit status. The :directory keyword argument works, though it is relative to your home directory of the remote machine. So, you can mix local execution and remote execution pretty much seamlessly.

The following snippet pushes local commits to the repo, pulls them on the remote machine, rebuild and restart the service. The return value of do-process is a boolean indicating command success or failure, so combining with and is like && in shell scripts.

(and (do-process '(git push) :directory *local-dir*)
     (do-process '(git pull) :host *host* :directory *remote-dir*)
     (do-process '(make)     :host *host* :directory *remote-dir*)
     (do-process '(make restart) :host *host* :directory *remote-dir*))

Tags: Cookbook, gauche.process

Wednesday, June 9, 2021

Tuesday, June 8, 2021

Programming Praxis

Approximate Squaring

Lagarias and Sloane study the “approximate squaring” map f(x) = xx⌉ and its behavior when iterated in this paper.

Consider the fraction x = n / d when n > d > 1; let’s take 8/7 as an example. In the first step, the smallest integer greater than 8/7 (the “ceiling”) is 2, and 8/7 × 2 = 16/7. In the second step, the ceiling of 16/7 is 3, so we have 16/7 × 3 = 48/7. And in the third step, the ceiling of 48/7 is 7, so we have 48/7 × 7 = 48. Now the denominator is 1 and the result is an integer, so iteration stops, and we say that 8/7 goes to 48 in 3 steps.

Study shows the iteration is chaotic; sometimes the iteration stops in just a few steps, as in 8/7, sometimes it takes longer (6/5 goes to a number with 57735 digits in 18 steps), and sometimes it’s just ridiculous (200/199 goes to a number with 10435 digits). It is conjectured but not proven that iterated approximate squaring always terminates in an integer.

Your task is to write a program that iterates approximate squaring. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at Tuesday, June 8, 2021

Monday, May 31, 2021

Gauche Devlog

Functional and linear-updating interfaces

Functional and linear-updating interfaces

Recent trend of data structure SRFIs is to provide two flavors of updating procedures:

  • Functional updaters never mutate the input structure, and always return a newly allocated structure.
  • Linear updaters are allowed to reuse the storage of input structure to produce the output, given that the caller guarantees the input structure will never be used.

Functional interface has a good nature--it won't create hidden dependencies thus the code is easy to reason about. It also plays nicely with concurrent execution, for you don't need to worry that your operations step on other threads' toes.

Linear updating interface gives the user to express opportunities of optimization. The implementation can take advantage of it to reduce allocations.

So, it appears to be a nice combination---except that, I think, the way they are currently specified is actually pulling each one's leg and reducing their merits.


Performance-sensitive users often frown on functional data structures, for they seem to copy everything every time. "It won't be that bad," functionally-minded users replies, "for it is often the case that the input structure and the updated structure can share most of their internals; the updated structure just allocates enough to store the updated parts. In the extreme case, the updater can just return the input as is, when it finds out the structure isn't altered at all (e.g. adjoining an item to a set that already has the item). The beauty of functional programming is that nobody cares whether it is shared or not---only the content matters."

It is true if everything is written functionally. However, to use the linear-updating interface, the caller needs to know that the structure to pass in isn't shared. If the functional interface may return a (partially) shared structure, it's hard to guarantee the "no-share" condition. Thus, SRFI states the functional interface always copies the input, even if there's no change at all. It can't take advantage of partial sharing as well, if the linear-updating version mutates internal structure.

This takes out the opportunity of optimization in the functional interface. The implementation needs to choose either (1) makes a slow functional version, in order to provide an efficient linear-updating version, or (2) makes a linear-updating version not mutate the input at all, and put functional optimizations.


I think we can do better. One idea is this:

  • The data structure has a mutability flag internally.
  • The functional interface always returns immutable data. It may return the input as is, or return a structure that partially shares the input, if the input structure is flagged immutable. If the input is flagged mutable, it always returns a fleshly copied immutable structure.
  • The linear-updating interface may mutate the input structure if it is flagged mutable, and copies if the input structure is flagged immutable.

If the SRFI does not provide an explicitly-mutating interface, it is actually almost indistinguishable from the existing SRFI spec, except when you compare input and output structures with eq?.

Given that explicitly-mutating interfaces (such as vector-set!) aren't popular in the SRFIs, I think it's good to allow the implementation to take the latter choice.


Discussion on srfi-discuss

Tags: srfi, Immutability, DataStructures

Monday, May 31, 2021

Friday, May 28, 2021

Scheme Requests for Implementation

SRFI 221: Generator/accumulator sub-library

SRFI 221 is now in final status.

This is a set of convenience routines for generators and accumulators intended to blend in with SRFI 158. The authors recommend that they be added to the (srfi 158) library provided by users or implementations. If they are approved by the R7RS-large process, they can also be added to (r7rs generator).

by John Cowan (text) and Arvydas Silanskas (implementation) at Friday, May 28, 2021

Friday, May 21, 2021

Joe Marshall

Stupid Y operator tricks

Here is the delta function: δ = (lambda (f) (f f)). Delta takes a function and tail calls that function on itself. What happens if we apply the delta function to itself? Since the delta function is the argument, it is tail called and applied to itself. Which leads again to itself being tail called and applied to itself. We have a situation of infinite regression: the output of (δ δ) ends up being a restatement of the output of (δ δ). Now in this case, regression is infinite and there is no base case, but imagine that somehow there were a base case, or that somehow we identified a value that an infinite regression equated to. Then each stage of the infinite regression just replicates the previous stage exactly. It is like having a perfectly silvered mirror: it just replicates the image presented to it exactly. By calling delta on delta, we've arranged our perfectly silvered mirror to reflect an image of itself. This leads to the “infinite hall of mirrors” effect.

So let's tweak the delta function so that instead of perfectly replicating the infinite regression, it applies a function g around the replication: (lambda (f) (g (f f))). If we apply this modified delta function to itself, each expansion of the infinite regression ends up wrapping an application of the g around it: (g (f f)) = (g (g (f f))) = (g (g (g (f f)))) = (g (g (g (g … )))). So our modified delta function gives us a nested infinite regression of applications of g. This is like our perfectly silvered mirror, but now the reflected image isn't mirrored exactly: we've put a frame on the mirror. When we arrange for the mirror to reflect itself, each nested reflection also has an image of the frame around the reflection, so we get a set of infinitely nested frames.

An infinite regression of (g (g (g (g … )))) is confusing. What does it mean? We can untangle this by unwrapping an application. (g (g (g (g … )))) is just a call to g. The argument to that call is weird, but we're just calling (g <something>). The result of the infinite regression (g (g (g (g … )))) is simply the result of the outermost call to g. We can use this to build a recursive function.

;; If factorial = (g (g (g (g … )))), then
;; factorial = (g factorial), where

(defun g (factorial)
  (lambda (x)
    (if (zerop x)
        1
        (* x (funcall factorial (- x 1))))))
The value returned by an inner invocation of g is the value that will be funcalled in the altenative branch of the conditional.

Y is defined thus:

Y = λg.(λf.g(f f))(λf.g(f f))

A straightforward implementation attempt would be
;; Non working y operator
(defun y (g)
  (let ((d (lambda (f) (funcall g (funcall f f)))))
    (funcall d d)))
but since lisp is a call-by-value language, it will attempt to (funcall f f) before funcalling g, and this will cause runaway recursion. We can avoid the runaway recursion by delaying the (funcall f f) with a strategically placed thunk
;; Call-by-value y operator
;; returns (g (lambda () (g (lambda () (g (lambda () … ))))))
(defun y (g)
  (let ((d (lambda (f) (funcall g (lambda () (funcall f f))))))
    (funcall d d)))
Since the recursion is now wrapped in a thunk, we have to funcall the thunk to force the recursive call. Here is an example where we see that:
* (funcall (Y (lambda (thunk)
                (lambda (x)
                  (if (zerop x)
                      1
                      (* x (funcall (funcall thunk) (- x 1)))))))
           6)
720
the (funcall thunk) invokes the thunk in order to get the actual recursive function, which we when then funcall on (- x 1).

By wrapping the self-application with a thunk, we've made the call site where we use the thunk more complicated. We can clean that up by wrapping the call to the thunk in something nicer:

* (funcall
    (y (lambda (thunk)
         (flet ((factorial (&rest args)
                  (apply (funcall thunk) args)))

           (lambda (x)
             (if (zerop x)
                 1
                 (* x (factorial (- x 1))))))))
      6)
720
And we can even go so far as to hoist that wrapper back up into the definiton of y
(defun y1 (g)
  (let ((d (lambda (f) (funcall g (lambda (&rest args) (apply (funcall f f) args))))))
    (funcall d d)))

* (funcall
    (y1 (lambda (factorial)
          (lambda (x)           
            (if (zerop x)
                1
                (* x (funcall factorial x))))))
    6)
720
y1 is an alternative formulation of the Y operator where we've η-expanded the recursive call to avoid the runaway recursion.

The η-expanded version of the applicative order Y operator has the advantage that it is convenient for defining recursive functions. The thunkified version is less convenient because you have to force the thunk before using it, but it allows you to use the Y operator to define recursive data structures as well as functions:

(Y
  (lambda (delayed-ones)
    (cons-stream 1 (delayed-ones))))
{1 …}

The argument to the thunkified Y operator is itself a procedure of one argument, the thunk. Y returns the result of calling its argument. Y should return a procedure, so the argument to Y should return a procedure. But it doesn't have to immediately return a procedure, it just has to eventually return a procedure, so we could, for example, print something before returning the procedure:

* (funcall (Y (lambda (thunk)
                (format t "~%Returning a procedure")
                (lambda (x)
                  (if (zerop x)
                      1
                      (* x (funcall (funcall thunk) (- x 1)))))))
         6)
Returning a procedure
Returning a procedure
Returning a procedure
Returning a procedure
Returning a procedure
Returning a procedure
720
There is one caveat. You must be able to return the procedure without attempting to make the recursive call.

Let's transform the returned function before returning it by applying an arbitrary function h to it:

(Y (lambda (thunk)
     (h (lambda (x)
          (if (zerop x)
              1
              … )))))
Ok, so now when we (funcall thunk) we don't get what we want, we've got an invocation of h around it. If we have an inverse to h, h-1, available, we can undo it:
(y (lambda (thunk)
      (h (lambda (x)
           (if (zerop x)
               1
               (* (funcall (h-1 (funcall thunk)) (- x 1))))))))
As a concrete example, we return a list and at the call site we extract the first element of that list before calling it:
* (funcall (car (y (lambda (thunk)
                     (list (lambda (x)
                             (if (zerop x)
                                 1
                                 (* x (funcall (car (funcall thunk)) (- x 1))))))))
           6)
720
So we can return a list of mutually recursive functions:
(y (lambda (thunk)
    (list
      ;; even?
      (lambda (n)
        (or (zerop n)
            (funcall (cadr (funcall thunk)) (- n 1))))

      ;; odd?
      (lambda (n)
        (and (not (zerop n))
             (funcall (car (funcall thunk)) (- n 1))))
      )))
If we use the η-expanded version of the Y operator, then we can adapt it to expect a list of mutually recursive functions on the recursive call:
(defun y* (&rest g-list)
  (let ((d (lambda (f)
             (map 'list (lambda (g)
                          (lambda (&rest args)
                            (apply (apply g (funcall f f)) args)))
                  g-list))))
     (funcall d d)))
which we could use like this:
* (let ((eo (y* (lambda (even? odd?)
                  (declare (ignore even?))
                  (lambda (n)
                    (or (zerop n)
                        (funcall odd? (- n 1)))))

                (lambda (even? odd?)
                  (declare (ignore odd?))
                  (lambda (n)
                    (and (not (zerop n))
                         (funcall even? (- n 1))))))))
     (let ((even? (car eo))
           (odd?  (cadr eo)))
       (do ((i 0 (+ i 1)))
           ((>= i 5))
         (format t "~%~d, ~s ~s"
                 i
                 (funcall even? i)
                 (funcall odd? i)))))))
                 
0, T NIL
1, NIL T
2, T NIL
3, NIL T
4, T NIL
Instead of returning a list of mutually recursive functions, we could return them as multiple values. We just have to be expecting multiple values at the call site:
(defun y* (&rest gs)
  (let ((d (lambda (f)
             (apply #'values
                    (map 'list
                         (lambda (g)
                           (lambda (&rest args)
                             (apply (multiple-value-call g (funcall f f)) args)))
                         gs)))))
    (funcall d d)))

MIT Scheme used to have a construct called a named lambda. A named lambda has an extra first argument that is automatically filled in with the function itself. So during evaluation of the body of a named lambda, the name is bound to the named lambda, enabling the function to call itself recursively:

(defmacro named-lambda ((name &rest args) &body body)
  `(y1 (lambda (,name)
         (lambda ,args
           ,@body))))

* (funcall (named-lambda (factorial x)
            (if (zerop x)
                1
                (* x (funcall factorial (- x 1)))))
         6)
720
This leads us to named let expressions. In a named let, the implicit lambda that performs the let bindings is a named lambda. Using that name to invoke the lambda on a different set of arguments is like recursively re-doing the let.
* (named-let fact ((x 6)) (if (zerop x) 1 (* x (funcall fact (- x 1)))))

720

In Scheme, you use letrec to define recursive or mutually recursive procedures. Internal definitions expand into an appropriate letrec. letrec achieves the necessary circularity not through the Y operator, but through side effects. It is hard to tell the difference, but there is a difference. Using the Y operator would allow you to have recursion, but avoid the implicit side effects in a letrec.

Oleg Kiselyov has more to say about the Y operator at http://okmij.org/ftp/Computation/fixed-point-combinators.html

by Joe Marshall (noreply@blogger.com) at Friday, May 21, 2021