Planet Scheme

Saturday, May 15, 2021

Joe Marshall

β-conversion

If you have an expression that is an application, and the operator of the application is a lambda expression, then you can β-reduce the application by substituting the arguments of the application for the bound variables of the lambda within the body of the lambda.

(defun beta (expression if-reduced if-not-reduced)
  (if (application? expression)
      (let ((operator (application-operator expression))
            (operands (application-operands expression)))
        (if (lambda? operator)
            (let ((bound-variables (lambda-bound-variables operator))
                  (body (lambda-body operator)))
              (if (same-length? bound-variables operands)
                  (funcall if-reduced
                           (xsubst body
                                   (table/extend* (table/empty)
                                                  bound-variables
                                                  operands)))
                  (funcall if-not-reduced)))
            (funcall if-not-reduced)))
      (funcall if-not-reduced)))

* (beta '((lambda (x y) (lambda (z) (* x y z))) a (+ z 3))
        #'identity (constantly nil))
(LAMBDA (#:Z460) (* A (+ Z 3) #:Z460))
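
The beta function leans on xsubst and the closure-based tables from the “Substitution” and “Lightweight table” posts below. One helper, table/extend*, isn't defined in those posts; here is a minimal sketch of it, assuming the table/extend from the “Lightweight table” post:

(defun table/extend* (table keys values)
  ;; Extend TABLE so that each key in KEYS maps to the
  ;; corresponding element of VALUES.
  (if (null keys)
      table
      (table/extend* (table/extend table (car keys) (car values))
                     (cdr keys)
                     (cdr values))))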

A large, complex expression may or may not have subexpressions that can be β-reduced. If neither an expression nor any of its subexpressions can be β-reduced, then we say the expression is in “β-normal form”. We may be able to reduce an expression to β-normal form by β-reducing where possible. A β-reduction can introduce further reducible expressions if we substitute a lambda expression for a symbol in operator position, so reducing to β-normal form is an iterative process where we continue to reduce any reducible expressions that arise from substitution.

(defun beta-normalize-step (expression)
  (expression-dispatch expression

    ;; case application
    (lambda (subexpressions)
      ;; Normal order reduction
      ;; First, try to beta reduce the outermost application,
      ;; otherwise, recursively descend the subexpressions, working
      ;; from left to right.
      (beta expression
            #'identity
            (lambda ()
              (labels ((l (subexpressions)
                         (if (null subexpressions)
                             '()
                             (let ((new-sub (beta-normalize-step (car subexpressions))))
                               (if (eq new-sub (car subexpressions))
                                   (let ((new-tail (l (cdr subexpressions))))
                                     (if (eq new-tail (cdr subexpressions))
                                         subexpressions
                                         (cons (car subexpressions) new-tail)))
                                   (cons new-sub (cdr subexpressions)))))))
                (let ((new-subexpressions (l subexpressions)))
                  (if (eq new-subexpressions subexpressions)
                      expression
                      (make-application new-subexpressions)))))))

    ;; case lambda
    (lambda (bound-variables body)
      (let ((new-body (beta-normalize-step body)))
        (if (eql new-body body)
            expression
            (make-lambda bound-variables new-body))))

    ;; case symbol
    (constantly expression)))

;;; A normalized expression is a fixed point of the
;;; beta-normalize-step function.
(defun beta-normalize (expression)
  (do ((expression expression (beta-normalize-step expression))
       (expression1 '() expression)
       (count 0 (+ count 1)))
      ((eq expression expression1)
       (format t "~%~d beta reductions" (- count 1))
       expression)))

You can compute just by using β-reduction. Here we construct an expression that reduces to the factorial of 3. We only have β-reduction, so we have to encode numbers with Church encoding.

(defun test-form ()
  (let ((table
         (table/extend*
          (table/empty)
          '(one
            three
            *
            pred
            zero?
            y)
          '(
            ;; Church numeral one
            (lambda (f) (lambda (x) (f x)))

            ;; Church numeral three 
            (lambda (f) (lambda (x) (f (f (f x)))))

            ;; * (multiply Church numerals)
            (lambda (m n)
              (lambda (f)
                (m (n f))))

            ;; pred (subtract 1 from Church numeral)
            (lambda (n)
              (lambda (f)
                (lambda (x) (((n (lambda (g)
                                   (lambda (h)
                                     (h (g f)))))
                              (lambda (u) x))
                             (lambda (u) u)))))

            ;; zero? (test if Church numeral is zero)
            (lambda (n t f) ((n (lambda (x) f)) t))

            ;; Y operator for recursion
            (lambda (f)
              ((lambda (x) (f (x x)))
               (lambda (x) (f (x x)))))

            )))
        (expr
         '((lambda (factorial)
             (factorial three))
           (y (lambda (fact)
                (lambda (x)
                  (zero? x
                   one
                   (* (fact (pred x)) x))))))))
    (xsubst expr table)))

* (test-form)

((LAMBDA (FACTORIAL) (FACTORIAL (LAMBDA (F) (LAMBDA (X) (F (F (F X)))))))
 ((LAMBDA (F) ((LAMBDA (X) (F (X X))) (LAMBDA (X) (F (X X)))))
  (LAMBDA (FACT)
    (LAMBDA (X)
      ((LAMBDA (N T F) ((N (LAMBDA (X) F)) T)) X
       (LAMBDA (F) (LAMBDA (X) (F X)))
       ((LAMBDA (M N) (LAMBDA (F) (M (N F))))
        (FACT
         ((LAMBDA (N)
            (LAMBDA (F)
              (LAMBDA (X)
                (((N (LAMBDA (G) (LAMBDA (H) (H (G F))))) (LAMBDA (U) X))
                 (LAMBDA (U) U)))))
          X))
        X))))))

* (beta-normalize (test-form))

127 beta reductions
(LAMBDA (F) (LAMBDA (X) (F (F (F (F (F (F X))))))))

This is the Church numeral for 6.

I find it pretty amazing that we can bootstrap ourselves up to arithmetic just by repeatedly β-reducing where we can. It doesn't seem like we're actually doing any work. We're just replacing names with what they stand for.

The β-substitution above replaces all the bound variables with their arguments if there is the correct number of arguments. One could easily implement a partial β-substitution that replaced only some of the bound variables. You'd still have an application, but some of the bound variables in the lambda would be eliminated and the corresponding arguments would be removed.
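
Here is a minimal sketch of that idea (mine, not the post's), reusing the accessors and constructors above and leaning on xsubst (defined in the “Substitution” post below) so that any surviving bound variable that would capture a substituted variable still gets renamed:

(defun beta-partial (expression vars-to-reduce)
  ;; Substitute only the bound variables listed in VARS-TO-REDUCE,
  ;; leaving behind an application over the remaining variables.
  (let ((operator (application-operator expression))
        (operands (application-operands expression)))
    (let ((bound-variables (lambda-bound-variables operator))
          (body (lambda-body operator))
          (table (table/empty))
          (kept-variables '())
          (kept-operands '()))
      (mapc (lambda (variable operand)
              (if (member variable vars-to-reduce)
                  (setq table (table/extend table variable operand))
                  (progn (push variable kept-variables)
                         (push operand kept-operands))))
            bound-variables operands)
      ;; xsubst's lambda case renames any kept bound variable that
      ;; would capture a free variable of a substituted operand.
      (make-application
       (cons (xsubst (make-lambda (nreverse kept-variables) body) table)
             (nreverse kept-operands))))))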

by Joe Marshall (noreply@blogger.com) at Saturday, May 15, 2021

Thursday, May 13, 2021

Andy Wingo

cross-module inlining in guile

Greetings, hackers of spaceship Earth! Today's missive is about cross-module inlining in Guile.

a bit of history

Back in the day... what am I saying? I always start these posts with loads of context. Probably you know it all already. 10 years ago, Guile's partial evaluation pass extended the macro-writer's bill of rights to Schemers of the Guile persuasion. This pass makes local function definitions free in many cases: if they should be inlined and constant-folded, you are confident that they will be. peval lets you write clear programs with well-factored code and still have good optimization.

The peval pass did have a limitation, though, which wasn't its fault. In Guile, modules have historically been a first-order concept: modules are a kind of object with a hash table inside, which you build by mutating. I speak crassly but that's how it is. In such a world, it's hard to reason about top-level bindings: what module do they belong to? Could they ever be overridden? When you have a free reference to a, and there's a top-level definition of a in the current compilation unit, is that the a that's being referenced, or could it be something else? Could the binding be mutated in the future?

During the Guile 2.0 and 2.2 cycles, we punted on this question. But for 3.0, we added the notion of declarative modules. For these modules, bindings which are defined once in a module and which are not mutated in the compilation unit are declarative bindings, which can be reasoned about lexically. We actually translate them to a form of letrec*, which then enables inlining via peval, contification, and closure optimization -- in descending order of preference.

The upshot is that with Guile 3.0, top-level bindings are no longer optimization barriers, in the case of declarative modules, which are compatible enough with historic semantics and usage that they are on by default.

However, module boundaries have still been an optimization barrier. Take (srfi srfi-1), a little utility library on lists. One definition in the library is xcons, which is cons with arguments reversed. It's literally (lambda (cdr car) (cons car cdr)). But does the compiler know that? Would it know that (car (xcons x y)) is the same as y? Until now, no, because no part of the optimizer will look into bindings from outside the compilation unit.

mr compiler, tear down this wall

But no longer! Guile can now inline across module boundaries -- in some circumstances. This feature will be part of a future Guile 3.0.8.

There are actually two parts to this. One is that the compiler can identify a set of "inlinable" values from a declarative module. An inlinable value is a small copyable expression. A copyable expression has no identity (it isn't a fresh allocation), and doesn't reference any module-private binding. Notably, lambda expressions can be copyable, depending on what they reference. The compiler then extends the module definition that's residualized in the compiled file to include a little procedure that, when passed a name, will return the Tree-IL representation of that binding. The design of that was a little tricky; we want to avoid overhead when using the module outside of the compiler, even relocations. See compute-encoding in that module for details.

With all of that, we can call ((module-inlinable-exports (resolve-interface '(srfi srfi-1))) 'xcons) and get back the Tree-IL equivalent of (lambda (cdr car) (cons car cdr)). Neat!

The other half of the facility is the actual inlining. Here we lean on peval again, causing <module-ref> forms to trigger an attempt to copy the term from the imported module to the residual expression, limited by the same effort counter as the rest of peval.

The end result is that we can be absolutely sure that constants in imported declarative modules will inline into their uses, and fairly sure that "small" procedures will inline too.

caveat: compiled imported modules

There are a few caveats about this facility, and they are sufficiently sharp that I should probably fix them some day. The first one is that for an imported module to expose inlinable definitions, the imported module needs to have been compiled already, not loaded from source. When you load a module from source using the interpreter instead of compiling it first, the pipeline is optimized for minimizing the latency between when you ask for the module and when it is available. There's no time to analyze the module to determine which exports are inlinable and so the module exposes no inlinable exports.

This caveat is mitigated by automatic compilation, enabled by default, which will compile imported modules as needed.

It could also be fixed by topologically ordering the module compilation sequence; this would allow some parallelism in the build, but less than before, and for module graphs with cycles (these exist!) you'd still have some weirdness.

caveat: abi fragility

Before Guile supported cross-module inlining, the only inlining across modules was explicit, facilitated by macros such as define-inlinable. If you write a module that has a define-inlinable export and you think about its ABI, then you know to consider any definition referenced by the inlinable export, and you know by experience that its code may be copied into other compilation units. Guile doesn't automatically recompile a dependent module when a macro that it uses changes, currently anyway. Admittedly this situation leans more on culture than on tools, which could be improved.

However, with automatically inlinable exports, this changes. Any definition in a module could be inlined into its uses in other modules. This may alter the ABI of a module in unexpected ways: you think that module C depends on module B, but after inlining it may depend on module A as well. Updating module B might not update the inlined copies of values from B into C -- as in the case of define-inlinable, but less lexically apparent.

At higher optimization levels, even private definitions in a module can be inlinable; these may be referenced if an exported macro from the module expands to a term that references a module-private variable, or if an inlinable exported binding references the private binding. But these optimization levels are off by default precisely because I fear the bugs.

Probably what this cries out for is some more sensible dependency tracking in build systems, but that is a topic for another day.

caveat: reproducibility

When you make a fresh checkout of Guile from git and build it, the build proceeds in the following way.

Firstly, we build libguile, the run-time library implemented in C.

Then we compile a "core" subset of Scheme files at optimization level -O1. This subset should include the evaluator, reader, macro expander, basic run-time, and compilers. (There is a bootstrap evaluator, reader, and macro expander in C, to start this process.) Say we have source files S0, S1, S2 and so on; generally speaking, these files correspond to Guile modules M0, M1, M2 etc. This first build produces compiled files C0, C1, C2, and so on. When compiling a file S2 implementing module M2, which happens to import M1 and M0, it may be that M1 and M0 are provided by compiled files C1 and C0, or they may be loaded from the source files S1 and S0, or from C1 and S0, or S1 and C0.

The bootstrap build uses make for parallelism, with each compile process starting afresh, importing all the modules that comprise the compiler and then using them to compile the target file. As the build proceeds, more and more module imports can be "serviced" by compiled files instead of source files, making the build go faster and faster. However, this introduces system-specific nondeterminism as to the set of compiled files available when compiling any other file. This strategy works because it doesn't really matter whether module M1 is provided by compiled file C1 or source file S1; the compiler and the interpreter implement the same language.

Once the compiler is compiled at optimization level -O1, Guile then uses that freshly built compiler to build everything at -O2. We do it in this way because building some files at -O1 then all files at -O2 takes less time than going straight to -O2. If this sounds weird, that's because it is.

The resulting build is reproducible... mostly. There is a bug in which some unique identifiers generated as part of the implementation of macros can be non-reproducible in some cases; disabling parallel builds seems to solve the problem. The issue is that gensym (or equivalent) might be called a different number of times depending on whether you are loading a compiled module or reading and macro-expanding it from source. The resulting compiled files are equivalent under alpha-renaming but not bit-identical. This is a bug to fix.

Anyway, at optimization level -O1, Guile will record inlinable definitions. At -O2, Guile will actually try to do cross-module inlining. We run into two issues when compiling Guile; one is if we are in the -O2 phase, and we compile a module M which uses module N, and N is not in the set of "core" modules. In that case depending on parallelism and compile order, N may be loaded from source, in which case it has no inlinable exports, or from a compiled file, in which case it does. This is not a great situation for the reliability of this optimization. I think probably in Guile we will switch so that all modules are compiled at -O1 before compiling at -O2.

The second issue is more subtle: inlinable bindings are recorded after optimization of the Tree-IL. This is more optimal than recording inlinable bindings before optimization, as a term that is not inlinable due to its size in its initial form may become small enough to inline after optimization. However, at -O2, optimization includes cross-module inlining! A term that is inlinable at -O1 may become not inlinable at -O2 because it gets slightly larger, or vice-versa: terms that are too large at -O1 could shrink at -O2. We don't even have a guarantee that we will reach a fixed point even if we repeatedly recompile all inputs at -O2, because we allow non-shrinking optimizations.

I think this probably calls for a topological ordering of module compilation inside Guile and perhaps in other modules. That would at least give us reproducibility, provided we avoid the feedback loop of keeping around -O2 files compiled from a previous round, even if they are "up to date" (their corresponding source file didn't change).

and for what?

People who have worked on inliners will know what I mean that a good inliner is like a combine harvester: ruthlessly efficient, a qualitative improvement compared to not having one, but there is a pointy end with lots of whirling blades and it's important to stop at the end of the row. You do develop a sense of what will and won't inline, and I think Dybvig's "Macro writer's bill of rights" encompasses this sense. Luckily people don't lose fingers or limbs to inliners, but inliners can maim expectations, and cross-module inlining more so.

Still, what it buys us is the freedom to be abstract. I can define a module like:

(define-module (elf)
  #:export (ET_NONE ET_REL ET_EXEC ET_DYN ET_CORE))

(define ET_NONE		0)		; No file type
(define ET_REL		1)		; Relocatable file
(define ET_EXEC		2)		; Executable file
(define ET_DYN		3)		; Shared object file
(define ET_CORE		4)		; Core file

And if a module uses my (elf) module and references ET_DYN, I know that the module boundary doesn't prevent the value from being inlined as a constant (and possibly unboxed, etc).

I took a look, and on our usual microbenchmark suite, cross-module inlining doesn't make a difference. But that's both a historical oddity and a bug: firstly, the benchmark suite comes from an old Scheme world that didn't have modules, and so won't benefit from cross-module inlining. Secondly, Scheme definitions from the "default" environment that aren't explicitly recognized as primitives aren't inlined, as the (guile) module isn't declarative. (Probably we should fix the latter at some point.)

But still, I'm really excited about this change! Guile developers use modules heavily and have been stepping around this optimization boundary for years. I count 100 direct uses of define-inlinable in Guile, a number of them inside macros, and many of these are to explicitly hack around the optimization barrier. I really look forward to seeing if we can remove some of these over time, to go back to plain old define and just trust the compiler to do what's needed.

by the numbers

I ran a quick analysis of the modules included in Guile to see what the impact was. Of the 321 files that define modules, 318 of them are declarative, and 88 contain inlinable exports (27% of total). Of the 6519 total bindings exported by declarative modules, 566 of those are inlinable (8.7%). Of the inlinable exports, 388 (69%) are functions (lambda expressions), 156 (28%) are constants, and 22 (4%) are "primitives" referenced by value and not by name, meaning definitions like (define head car) (instead of re-exporting car as head).

On the use side, 90 declarative modules import inlinable bindings (29%), resulting in about 1178 total attempts to copy inlinable bindings. 902 of those attempts are to copy a lambda expression in operator position, which means that peval will attempt to inline their code. 46 of these attempts fail, perhaps due to size or effort constraints. 191 other attempts end up inlining constant values. 20 inlining attempts fail, perhaps because a lambda is used for a value. Somehow, 19 copied inlinable values get elided because they are evaluated only for their side effects, probably to clean up let-bound values that become unused due to copy propagation.

All in all, an interesting endeavor, and one to improve on going forward. Thanks for reading, and catch you next time!

by Andy Wingo at Thursday, May 13, 2021

Tuesday, May 11, 2021

GNU Guix

GNU Guix 1.3.0 released

Image of a Guix test pilot.

We are pleased to announce the release of GNU Guix version 1.3.0!

The release comes with ISO-9660 installation images, a virtual machine image, and with tarballs to install the package manager on top of your GNU/Linux distro, either from source or from binaries. Guix users can update by running guix pull.

It’s been almost 6 months since the last release, during which 212 people contributed code and packages, and a number of people contributed to other important tasks—code review, system administration, translation, web site updates, Outreachy mentoring, and more.

There’s been more than 8,300 commits in that time frame, which we’ll humbly try to summarize in these release notes.

User experience

A distinguishing Guix feature is its support for declarative deployment: instead of running a bunch of guix install and guix remove commands, you run guix package --manifest=manifest.scm, where manifest.scm lists the software you want to install in a snippet that looks like this:

;; This is 'manifest.scm'.
(specifications->manifest
  (list "emacs" "guile" "gcc-toolchain"))

Doing that installs exactly the packages listed. You can have that file under version control and share it with others, which is convenient. Until now, one would have to write the manifest by hand—not insurmountable, but still a barrier to someone willing to migrate to the declarative model.

The new guix package --export-manifest command (and its companion --export-channels option) produces a manifest based on the contents of an existing profile. That makes it easy to transition from the classic “imperative” model, where you run guix install as needed, to the more formal declarative model. This was long awaited!

Users who like to always run the latest and greatest pieces of the free software commons will love the new --with-latest package transformation option. Using the same code as guix refresh, this option looks for the latest upstream release of a package, fetches it, authenticates it, and builds it. This is useful in cases where the new version is not yet packaged in Guix. For example, the command below, if run today, will (attempt to) install QEMU 6.0.0:

$ guix install qemu --with-latest=qemu 
The following package will be upgraded:
   qemu 5.2.0 → 6.0.0

Starting download of /tmp/guix-file.eHO6MU
From https://download.qemu.org//qemu-6.0.0.tar.bz2...
 …0.tar.bz2  123.3MiB  28.2MiB/s 00:04 [##################] 100.0%

Starting download of /tmp/guix-file.9NRlvT
From https://download.qemu.org//qemu-6.0.0.tar.bz2.sig...
 …tar.bz2.sig  310B  1.2MiB/s 00:00 [##################] 100.0%
gpgv: Signature made Thu 29 Apr 2021 09:28:25 PM CEST
gpgv:                using RSA key CEACC9E15534EBABB82D3FA03353C9CEF108B584
gpgv: Good signature from "Michael Roth <michael.roth@amd.com>"
gpgv:                 aka "Michael Roth <mdroth@utexas.edu>"
gpgv:                 aka "Michael Roth <flukshun@gmail.com>"
The following derivation will be built:
   /gnu/store/ypz433vzsbg3vjp5374fr9lhsm7jjxa4-qemu-6.0.0.drv

…

There’s one obvious caveat: this is not guaranteed to work. If the new version has a different build system, or if it requires extra dependencies compared to the version currently packaged, the build process will fail. Yet, it provides users with additional flexibility which can be convenient at times. For developers, it’s also a quick way to check whether a given package successfully builds against the latest version of one of its dependencies.

Several changes were made here and there to improve user experience. As an example, a new --verbosity level was added. By default (--verbosity=1), fewer details about downloads get printed, which matches the expectation of most users.

Another handy improvement is suggestions when making typos:

$ guix package --export-manifests
guix package: error: export-manifests: unrecognized option
hint: Did you mean `export-manifest'?

$ guix remve vim
guix: remve: command not found
hint: Did you mean `remove'?

Try `guix --help' for more information.

People setting up build offloading over SSH will enjoy the simplified process, where the guile executable no longer needs to be in PATH, with appropriate GUILE_LOAD_PATH settings, on target machines. Instead, offloading now channels all its operations through guix repl.

The Guix reference manual is fully translated into French, German, and Spanish, with preliminary translations in Russian, Chinese, and other languages. Guix itself is fully translated in French, German, and Slovak, and partially translated in almost twenty other languages. Translations are now handled on Weblate, and you can help!

Developer tools

We have good news for packagers! First, guix import comes with a new Go recursive importer, that can create package definitions or templates thereof for whole sets of Go packages. The guix import crate command, for Rust packages, now honors “semantic versioning” when used in recursive mode.

The guix refresh command now includes new “updaters”: sourceforge, for code hosted on SourceForge, and generic-html which, as the name implies, is a generic updater that works by scanning package home pages. This greatly improves guix refresh coverage.

Packagers and developers may also like the new --with-patch package transformation option, which provides a way to build a bunch of packages with a patch applied to one or several of them.

Building on the Guix System image API introduced in v1.2.0, the guix system vm-image and guix system disk-image commands are superseded by a unified guix system image command. For example,

guix system vm-image --save-provenance config.scm

becomes

guix system image -t qcow2 --save-provenance config.scm

while

guix system disk-image -t iso9660 gnu/system/install.scm

becomes

guix system image -t iso9660 gnu/system/install.scm

This brings performance benefits; while a virtual machine used to be involved in the production of the image artifacts, the low-level bits are now handled by the dedicated genimage tool. Another benefit is that the qcow2 format is now compressed, which removes the need to manually compress the images by post-processing them with xz or another compressor. To learn more about the guix system image command, you can refer to its documentation.

Last but not least, the introduction of the GUIX_EXTENSIONS_PATH Guix search path should make it possible for Guix extensions, such as the Guix Workflow Language, to have their Guile modules automatically discovered, simplifying their deployments.

Performance

One thing you will hopefully notice is that substitute installation (downloading pre-built binaries) became faster, as we explained before. This is in part due to the opportunistic use of zstd compression, which has a high decompression throughput. The daemon and guix publish support zstd as an additional compression method, next to gzip and lzip.

Another change that can help fetch substitutes more quickly is local substitute server discovery. The new --discover option of guix-daemon instructs it to discover and use substitute servers on the local-area network (LAN) advertised with the mDNS/DNS-SD protocols, using Avahi. Similarly, guix publish has a new --advertise option to advertise itself on the LAN.

On Guix System, you can run herd discover guix-daemon on to turn discovery on temporarily, or you can enable it in your system configuration. Opportunistic use of neighboring substitute servers is entirely safe, thanks to reproducible builds.

In other news, guix system init has been optimized, which contributes to making Guix System installation faster.

On some machines with limited resources, building the Guix modules is an expensive operation. A new procedure, channel-with-substitutes-available from the (guix ci) module, can now be used to pull Guix to the latest commit which has already been built by the build farm. Refer to the documentation for an example.

POWER9 support, packages, services, and more!

POWER9 support is now available as a technology preview, thanks to the tireless work of those who helped port Guix to that platform. There aren't many POWER9 binary substitutes available yet, due to the limited POWER9 capacity of our build farm, but if you are not afraid of building many packages from source, we'd be thrilled to hear back from your experience!

2,000 packages were added, for a total of more than 17K packages; 3,100 were updated. The distribution comes with GNU libc 2.31, GCC 10.3, Xfce 4.16.0, Linux-libre 5.11.15, LibreOffice 6.4.7.2, and Emacs 27.2, to name a few. Among the many packaging changes, one that stands out is the new OCaml bootstrap: the OCaml package is now built entirely from source via camlboot. Package updates also include Cuirass 1.0, the service that powers our build farm.

The services catalog has also seen new additions such as wireguard, syncthing, ipfs, a simplified and more convenient service for Cuirass, and more! You can search for services via the guix system search facility.

The NEWS file lists additional noteworthy changes and bug fixes you may be interested in.

Try it!

The installation script has been improved to allow for more automation. For example, if you are in a hurry, you can run it with:

# yes | ./install.sh

to proceed to install the Guix binary on your system without any prompt!

You may also be interested in trying the Guix System demonstration VM image which now supports clipboard integration with the host and dynamic resizing thanks to the SPICE protocol, which we hope will improve the user experience.

To review all the installation options at your disposal, consult the download page and don't hesitate to get in touch with us.

Enjoy!

Credits

Luis Felipe (illustration)

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64 and POWER9 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

by Ludovic Courtès, Maxim Cournoyer at Tuesday, May 11, 2021

Programming Praxis

Potholes

Today’s exercise comes from one of those online code exercises, via Stack Overflow:

There is a machine that can fix all potholes along a road 3 units in length. A unit of road will be represented by a period in a string. For example, “...” is one section of road 3 units in length. Potholes are marked with an “X” in the road, and also count as 1 unit of length. The task is to take a road of length N and fix all potholes with the fewest possible sections repaired by the machine.

The city where I live needs one of those machines.

Here are some examples:

.X.          1
.X...X       2
XXX.XXXX     3
.X.XX.XX.X   3

Your task is to write a program that takes a string representing a road and returns the smallest number of fixes that will remove all the potholes. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
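
One greedy approach (a sketch, not the suggested solution linked above): scan left to right and, at each pothole, dispatch the machine on the 3 units starting there. A repair may as well begin at the leftmost unfixed pothole, so this is optimal:

(defun potholes (road)
  ;; Count the fewest 3-unit repairs needed to fix every #\X in ROAD.
  (let ((fixes 0)
        (i 0)
        (n (length road)))
    (loop while (< i n)
          do (if (char= (char road i) #\X)
                 (progn (incf fixes) (incf i 3)) ; a repair covers i, i+1, i+2
                 (incf i)))
    fixes))

* (mapcar #'potholes '(".X." ".X...X" "XXX.XXXX" ".X.XX.XX.X"))
(1 2 3 3)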

by programmingpraxis at Tuesday, May 11, 2021

Monday, May 10, 2021

Joe Marshall

Substitution

In McCarthy's early papers on Lisp, he notes that he needs a modified version of subst that is aware of quoted expressions (and avoids substituting within them). He would also need a subst that is aware of lambda expressions: it would have to avoid substituting within the lambda if the name being substituted matches one of the bound variables. To be useful for evaluation, it would also have to deal with accidental variable capture when substituting within a lambda.

The root problem is that expressions are actually structured objects, but we are working with the list representation of those objects. Instead of substituting by operating on objects, we substitute on the list representation. We have to arrange for the syntactic substitution on the list representation to preserve the semantics of substitution on the objects they represent.

In the substitution model, we take a symbolic expression and replace some of the atoms in the expression with other expressions. We first need a way to discriminate between the different kinds of expressions. An expression is either an atomic symbol, or a list of expressions called an application. There are no other kinds of expressions.

(defun expression-dispatch (expression if-symbol if-application)
  (cond ((symbolp expression) (funcall if-symbol expression))
        ((consp expression)   (funcall if-application expression))
        (t (error "~s is not an expression." expression))))

Substitution is straightforward:

(defun xsubst (table expression)
  (expression-dispatch expression
    (lambda (symbol)
      (funcall table symbol #'identity (constantly symbol)))

    (lambda (subexpressions)
      (map 'list (lambda (subexpression) (xsubst table subexpression)) subexpressions))))

* (let ((table (table/extend (table/empty) 'x '(* a 42))))
    (xsubst table '(+ x y)))  
(+ (* A 42) Y)

We need a table of multiple substitutions so that we can substitute in parallel:

* (let ((table (table/extend
                 (table/extend (table/empty) 'x 'y)
                 'y 'x)))
    (xsubst table '(+ x y)))
(+ Y X)

So far, so good. Let's add lambda expressions. First, we need to add a new expression kind:

(defun expression-dispatch (expression if-symbol if-lambda if-application) 
  (cond ((symbolp expression) (funcall if-symbol expression))
        ((consp expression)
         (cond ((eq (car expression) 'lambda)
                (funcall if-lambda (cadr expression) (caddr expression)))
               (t (funcall if-application expression))))
        (t (error "~s is not an expression." expression))))

Substitution within a lambda expression is a bit tricky. First, you don't want to substitute a symbol if it is one of the bound variables of the lambda expression. Second, substituting a symbol may introduce more symbols. We don't want the new symbols to be accidentally captured by the bound variables in the lambda. If any new symbol has the same name as a bound variable, we have to rename the bound variable (and all its occurrences) to a fresh name so that it doesn't capture the new symbol being introduced. We'll need a helper function:

(defun free-variables (expression)
   (expression-dispatch expression
     (lambda (symbol) (list symbol))
     (lambda (bound-variables body)
       (set-difference (free-variables body) bound-variables))
     (lambda (subexpressions)
       (fold-left #'union '() (map 'list #'free-variables subexpressions)))))

Now when we substitute within a lambda, we first find each free variable in the lambda, look it up in the substitution table, and collect the free variables of the substituted value:

(map 'list (lambda (var)
             (funcall table var #'free-variables (constantly '())))
      (free-variables expression))

This gives us the new free variables for each substitution. The union of all of these is the set of all the new free variables:

(fold-left #'union '()
           (map 'list (lambda (var)
                        (funcall table var #'free-variables (constantly '())))
                 (free-variables expression)))

We have to rename the bound variables that are in this set:

(intersection 
  bound-variables
  (fold-left #'union '()
             (map 'list (lambda (var)
                          (funcall table var #'free-variables (constantly '())))
                  (free-variables expression))))

So we make a little table for renaming:

(defun make-alpha-table (variables)
  (fold-left (lambda (table variable)
               (table/extend table variable (gensym (symbol-name variable))))
             (table/empty)
             variables))

(let ((alpha-table
       (make-alpha-table
        (intersection 
          bound-variables
          (fold-left #'union '()
                     (map 'list (lambda (var)
                                  (funcall table var #'free-variables (constantly '())))
                          (free-variables expression)))))))
  …)

We rename the bound variables as necessary:

  (make-lambda
    (map 'list (lambda (symbol)
                 (funcall alpha-table symbol #'identity (constantly symbol)))
         bound-variables)
    …)

Finally, we redact the bound variables from the substitution table and append the alpha-table to make the substitutions we need for the lambda body:

  (make-lambda
   (map 'list (lambda (symbol)
                (funcall alpha-table symbol #'identity (constantly symbol)))
        bound-variables)
   (xsubst (table/append alpha-table (table/redact* table bound-variables))
           body))))
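
The table/append and table/redact* helpers aren't defined in the “Lightweight table” post below; here are plausible definitions in the same closure-wrapping style:

(defun table/redact* (table redacted-keys)
  ;; Redact every key in REDACTED-KEYS from TABLE.
  (lambda (key if-found if-not-found)
    (if (member key redacted-keys)
        (funcall if-not-found)
        (funcall table key if-found if-not-found))))

(defun table/append (table1 table2)
  ;; Consult TABLE1 first, falling back to TABLE2 on a miss.
  (lambda (key if-found if-not-found)
    (funcall table1 key
             if-found
             (lambda () (funcall table2 key if-found if-not-found)))))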

The entire definition of xsubst is now this:

(defun xsubst (table expression)
  (expression-dispatch expression
    (lambda (symbol)
      (funcall table symbol #'identity (constantly symbol)))

    (lambda (bound-variables body)
      (let ((alpha-table
             (make-alpha-table
              (intersection
               bound-variables
               (fold-left #'union '()
                          (map 'list (lambda (var)
                                       (funcall table var
                                                #'free-variables
                                                (constantly '())))
                               (set-difference (free-variables body) bound-variables)))))))
        (make-lambda
         (map 'list (lambda (symbol)
                      (funcall alpha-table symbol #'identity (constantly symbol)))
              bound-variables)
         (xsubst (table/append alpha-table (table/redact* table bound-variables))
                 body))))

    (lambda (subexpressions)
      (make-application
       (map 'list (lambda (subexpression)
                    (xsubst table subexpression))
            subexpressions)))))

This is certainly more complicated than simple substitution, but we can see it does the right thing here:

* (xsubst (table/extend (table/empty) 'x '(* a y)) '(lambda (y) (+ x y)))
(LAMBDA (#:Y234) (+ (* A Y) #:Y234))

It should be obvious how to add quoted forms. This would require adding a new kind of expression to expression-dispatch and a new handling clause in xsubst that avoids substitution.
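
For instance (a hypothetical sketch, not from the post), the dispatcher could grow an if-quote case that hands over the quoted datum:

(defun expression-dispatch (expression if-symbol if-quote if-lambda if-application)
  (cond ((symbolp expression) (funcall if-symbol expression))
        ((consp expression)
         (cond ((eq (car expression) 'quote)
                (funcall if-quote (cadr expression)))
               ((eq (car expression) 'lambda)
                (funcall if-lambda (cadr expression) (caddr expression)))
               (t (funcall if-application expression))))
        (t (error "~s is not an expression." expression))))

The matching clause in xsubst would simply rebuild the quotation, (lambda (datum) (list 'quote datum)), performing no substitution within the datum.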

I'm not completely happy with how we've added lambda expressions to the expression syntax. Using the symbol lambda as a syntactic marker for lambda expressions causes problems if we also want to use that symbol as an argument or variable. Initially, it seems reasonable to be able to name an argument “lambda”. Within the body of the function, references to the variable lambda would refer to that argument. But what about references in the operator position? By defining lambda expressions as three-element lists beginning with the symbol lambda, we've made them ambiguous with two-argument applications whose operator is the variable lambda. We have to resolve this ambiguity. The current behavior is that we always interpret the symbol lambda as a syntactic marker, so you simply cannot use a variable named lambda as a function.

by Joe Marshall (noreply@blogger.com) at Monday, May 10, 2021

Wednesday, May 5, 2021

The Racket Blog

Racket v8.1

posted by John Clements

Racket version 8.1 is now available from https://racket-lang.org/.

  • DrRacket tabs can be dragged, and have new close buttons.

  • Racket CS supports cross-compilation using raco exe.

  • Racket CS supports Android on 32-bit and 64-bit ARM processors.

  • The database library supports running queries in OS threads.

  • Check-Syntax arrows correctly identify the definition site of identifiers with contracts.

  • Racket CS performance has improved for structure predicates and accessors.

  • Racket CS is faster at multiplying extremely large numbers and dividing large integers.

  • Racket CS allows callbacks to raise exceptions if they are annotated with #:callback-exns?.

  • New ephemeron hash tables simplify the implementation of tables where keys can refer to values.

  • Typed Racket supports for/foldr.

  • The stepper works for #lang htdp/*sl.

  • Struct signatures work for the ASL teaching language.

The following people contributed to this release:

Alex Harsányi, Alex Knauth, Alexander Shopov, Alexis King, Andrew Mauer-Oats, Anish Athalye, Ben Greenman, Bert De Ketelaere, Bob Burger, Bogdan Popa, Brian Adkins, Cameron Moy, David Van Horn, Dexter Lagan, Dominik Pantůček, Fred Fu, Greg Hendershott, Gustavo Massaccesi, Hazel Levine, Ismael Luceno, Jack Firth, Jarhmander, John Clements, Jörgen Brandt, Laurent Orseau, Lazerbeak12345, Matthew Flatt, Matthias Felleisen, Micah Cantor, Mike Sperber, Noah Ma, Patrick McCarty, Paulo Matos, Pavel Panchekha, Philip McGrath, Philippe Meunier, R. Kent Dybvig, Robby Findler, Ryan Culpepper, Ryan Kramer, Sam Tobin-Hochstadt, Sergiu Ivanov, Shu-Hung You, Sorawee Porncharoenwase, Stephen De Gabrielle, William J. Bowman, bmitc, xxyzz, yjqww6, and ymdarake.

Feedback Welcome!

by The Unknown Author at Wednesday, May 5, 2021

Monday, May 3, 2021

Joe Marshall

Lightweight table

You don't need a data structure to make a lookup table. You can make a table just out of the lookup function. In this example, we start with a continuation passing style lookup function:

lookup (key if-found if-not-found)
  
    Invokes (funcall if-found value) if key is in the table,
    invokes (funcall if-not-found) otherwise.

An empty table just invokes the if-not-found continuation:

(defun table/empty ()
  (lambda (key if-found if-not-found)
    (declare (ignore key if-found))
    (funcall if-not-found)))

A table can be extended by wrapping it:

(defun table/extend (table key* value)
  (lambda (key if-found if-not-found)
    (if (eql key key*)
        (funcall if-found value)
        (funcall table key if-found if-not-found))))

So let's try it out:

(defvar *table-1* (table/extend 
                    (table/extend
                      (table/empty)
                      'foo 42)
                    'bar 69))

* (funcall *table-1* 'foo #'identity (constantly 'not-found))
42

* (funcall *table-1* 'quux #'identity (constantly 'not-found))
NOT-FOUND

You can also redact an entry from a table by wrapping the table:

(defun table/redact (table redacted)
  (lambda (key if-found if-not-found)
    (if (eql key redacted)
        (funcall if-not-found)
        (funcall table key if-found if-not-found))))

(defvar *table-2* (table/redact *table-1* 'foo))

* (funcall *table-2* 'foo #'identity (constantly 'not-found))
NOT-FOUND

Are there any advantages to implementing a table in this curious manner? Building a table by nesting a series of lookup steps leads to a linear lookup in linear space, so this kind of table should be more or less comparable to an alist for individual entries. Unlike a traditional table made with a data structure, you cannot enumerate the keys and values in the table. On the other hand, you gain the ability to map keys to values without having to enumerate the keys:

(defun table/bind-predicate (table predicate value)
  (lambda (key if-found if-not-found)
    (if (funcall predicate key)
        (funcall if-found value)
        (funcall table key if-found if-not-found))))

;;; bind all even numbers to the symbol 'EVEN
(defvar *table-3* 
  (table/bind-predicate *table-2* (lambda (n) (and (numberp n) (evenp n))) 'even))

* (funcall *table-3* 6 #'identity (constantly 'not-found))
EVEN

Or you can add a default value to an existing table:

(defun table/add-default (table default-value)
  (lambda (key if-found if-not-found)
    (declare (ignore if-not-found))
    (funcall table key
      if-found
      (lambda () (funcall if-found default-value)))))

(defvar *table-4* (table/add-default *table-3* 'default))

* (funcall *table-4* 'bar #'identity (constantly 'not-found))
69
    
* (funcall *table-4* 'xyzzy #'identity (constantly 'not-found))
DEFAULT

Perhaps the biggest disadvantage of this implementation is the difficulty in inspecting a table.

* *table-4*
#<CLOSURE (LAMBDA (KEY IF-FOUND IF-NOT-FOUND) :IN TABLE/ADD-DEFAULT) {1003CD641B}>

We can use the object inspector to peek inside the closure and maybe sleuth out what this table is made out of, but it isn't just an alist where we can print out the entries.

So far, we've defined a table as being a procedure with the (key if-found if-not-found) signature, but we can flip this around and say that any procedure with a (key if-found if-not-found) signature can be thought of as a table. For example, a regular expression matcher could be considered to be a table of strings (if that were a more useful model).
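
To make that concrete, here is a hypothetical sketch (assuming the cl-ppcre library) that wraps a regular expression in the table interface, binding every matching string to a single value:

(defun regexp-table (regexp value)
  ;; Treat REGEXP as a table mapping every matching string to VALUE.
  (lambda (key if-found if-not-found)
    (if (and (stringp key) (cl-ppcre:scan regexp key))
        (funcall if-found value)
        (funcall if-not-found))))

* (funcall (regexp-table "^fo+$" 'foo-ish) "fooo" #'identity (constantly 'not-found))
FOO-ISH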

by Joe Marshall (noreply@blogger.com) at Monday, May 3, 2021

Tuesday, April 27, 2021

Programming Praxis

Remove Multiples

Today’s exercise comes from Stack Overflow:

Given a number N and a set of numbers s = {s1, s2, …, sn} where s1 < s2 < … < sn < N, remove all multiples of {s1, s2, …, sn} from the range 1..N.

You should look at the original post on Stack Overflow. The poster gets the answer wrong (he excludes 3), he gets the explanation of his wrong answer wrong (he excludes 10 as a multiple of 2), and his algorithm is odd, to say the least. I’m not making fun of him, but I see this kind of imprecision of thought all the time on the beginner-programmer discussion boards, and I can only think that he will struggle to have a successful programming career.

Your task is to write a program that removes multiples from a range, as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
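
A sketch of one reading of the task (mine, not the suggested solution): drop k whenever some s divides it. Whether each s should survive as a “multiple of itself” is exactly the kind of imprecision the exercise warns about, so adjust to taste:

(defun remove-multiples (n set)
  ;; Keep the numbers in 1..N that no element of SET divides.
  (loop for k from 1 to n
        unless (some (lambda (s) (zerop (mod k s))) set)
          collect k))

* (remove-multiples 20 '(2 5))
(1 3 7 9 11 13 17 19)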

by programmingpraxis at Tuesday, April 27, 2021

Friday, April 23, 2021

GNU Guix

Building derivations, how complicated can it be?

Derivations are key to Guix: they're the low-level build instructions used for things like packages, disk images, and most things that end up in the store.

Around a year ago, the established approach to build derivations across multiple machines was daemon offloading. This offloading approach is mostly static in terms of the machines involved and uses SSH to communicate and move things between machines.

The Guix Build Coordinator project set out to provide an alternative approach, both to explore what's possible, but also to provide a usable tool to address two specific use cases.

The first use case was building things (mostly packages) for the purpose of providing substitutes. At the time, the daemon offloading approach was used on ci.guix.gnu.org, which is the default source of substitutes. This approach was not scaling particularly well, so there was room for improvement.

The second use case was more aspirational: supporting various quality assurance tasks, like building packages changed by patches, regularly testing fixed-output derivations, or building the same derivations across different machines to test for hardware-specific differences.

While both these tasks have quite a lot in common, there are still quite a lot of differences; this in part led to a lot of flexibility in the design of the Guix Build Coordinator.

Architecture

Like offloading, the Guix Build Coordinator works in a centralised manner. There's one coordinator process which manages state, and agent processes run on the machines that perform builds. The coordinator plans which builds to allocate to which agents, and agents ask the coordinator for builds, which they then attempt.

Once agents complete a build, they send the log file and any outputs back to the coordinator. This is shown in the diagram below. Note that the Guix Build Coordinator doesn't actually take care of building the individual derivations; that's still left to the guix-daemons on the machines involved.

Guix Build Coordinator sequence diagram

The builds to perform are worked through methodically; a build won't start unless all the inputs are available. This behaviour replicates what guix-daemon does, but across all the machines involved.

If agents can't set up to perform a build, they report this back to the coordinator, which may then perform other builds to produce those required inputs.

Currently HTTP is used when agents communicate with the coordinator, although additional approaches could be implemented in the future. Similarly, SQLite is used as the database; from the start there has been a plan to support PostgreSQL, but that's yet to be implemented.

Comparison to offloading

There's lots more to the internal workings of the Guix Build Coordinator, but how does this compare to daemon offloading?

Starting from the outside and working in, the API for the Guix Build Coordinator is all asynchronous. When you submit a build, you get an ID back, which you can use to track the progress of that build. This is in contrast to the way the daemon is used, where you keep a connection established while builds are progressing.

Offloading is tightly integrated with the daemon, which can be both useful as offloading can transparently apply to anything that would otherwise be built locally. However, this can also be a limitation since the daemon is one of the harder components to modify.

With offloading, guix-daemon reaches out to another machine, copies over all the inputs and the derivation, and then starts the build. Rather than doing this, the Guix Build Coordinator agent pulls in the inputs and derivation using substitutes.

This pull approach has a few advantages. Firstly, it removes the need to keep a large store on the machine running the coordinator; this large store requirement had become a scalability problem for the offloading approach. Another advantage is that it makes deploying agents easier, as they don't need to be reachable from the coordinator over the network, which can be an issue with NATs or virtual machines.

When offloading builds, the outputs would be copied back to the store on build success. Instead, the Guix Build Coordinator agent sends the outputs back as nar files. The coordinator would then process these nar files to make substitutes available. This helps distribute the work in generating the nar files, which can be quite expensive.

These differences may be subtle, but the architecture makes a big difference: it's much easier to store and serve nars at scale if this doesn't require a large store managed by a single guix-daemon.

There are also quite a few things in common with the daemon offloading approach. Builds are still methodically performed across multiple machines, and load is taken into account when starting new builds.

A basic example

Getting the Guix Build Coordinator up and running does require some work; the following commands should get the coordinator and an agent running on a single machine. First, you start the coordinator.

  guix-build-coordinator

Then in another terminal, you interact with the running coordinator to create an agent in its database.

  guix-build-coordinator agent new

Note the UUID of the generated agent.

  guix-build-coordinator agent <AGENT ID> password new

Note the generated password for the agent. With the UUID and password, the agent can then be started.

  guix-build-coordinator-agent --uuid=<AGENT ID> --password=<AGENT PASSWORD>

At this point, both processes should be running and the guix-build-coordinator should be logging requests from the agent.

In a third terminal, also at the root of the repository, generate a derivation, and then instruct the coordinator to have it built.

  guix build --no-grafts -d hello

Note the derivation that is output.

  guix-build-coordinator build <DERIVATION FILE>

This will return a randomly generated UUID that represents the build.

While running from the command line is useful for development and experimentation, there are services in Guix for running the coordinator and agents.

Applications of the Guix Build Coordinator

While I think the Guix Build Coordinator is a better choice than daemon offloading in some circumstances, it doesn't currently replace it.

At a high level, the Guix Build Coordinator is useful where there's a need to build derivations and do something with the outputs or build results, more than just having the outputs in the local store. This could be serving substitutes, or testing derivations for example.

At small scales, the additional complexity of the coordinator is probably unnecessary, but when it's useful to use multiple machines, either for the resources they provide or for a more diverse range of hardware, it makes much more sense to use the Guix Build Coordinator to coordinate what's going on.

Looking forward

The Guix Build Coordinator isn't just an alternative to daemon offloading; it's more a general toolkit for coordinating the building of derivations.

Much of the functionality in the Guix Build Coordinator happens in hooks: bits of code that run when certain events happen, like a build getting submitted or a build finishing successfully. It's these hooks that are responsible for doing things like processing nars to be served as substitutes, or submitting retries for a failed build.

The hook that automatically retries building particular derivations is particularly useful when providing substitutes, where you want to lessen the impact of random failures, and for quality assurance purposes, where you want more data to better identify problems.

There are also more features such as build and agent tags and build priorities that can be really useful in some scenarios.

My hope is that the Guix Build Coordinator will enable a better substitute experience for Guix users, as well as enabling a whole new range of quality assurance tasks. It's already possible to see some impact from the Guix Build Coordinator, but there's still much work to do!

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

by Christopher Baines at Friday, April 23, 2021

Tuesday, April 20, 2021

Scheme Requests for Implementation

SRFI 220: Line directives

SRFI 220 is now in withdrawn status.

Many language-agnostic programming tools rely on specially formatted source code comments to annotate the code with metadata. Such "magic comments" are hard for both humans and computers to parse reliably, as the purpose of a comment is to be free-form text that is not interpreted by machine.

This SRFI extends the standard Scheme directive syntax (#!) to support line directives. They look like magic comments to language-agnostic tools but read as S-expressions in Scheme, combining the portability of magic comments with the well-defined syntax and easy parsing of ordinary Scheme code.

by Lassi Kortela at Tuesday, April 20, 2021

Monday, April 19, 2021

Joe Marshall

η-conversion and tail recursion

Consider this lambda expression: (lambda (x) (sqrt x)). This function simply calls sqrt on its argument and returns whatever sqrt returns. There is no argument you could provide to this function that would cause it to return a different result than you would get from calling sqrt directly. We say that this function and the sqrt function are extensionally equal. We can replace this lambda expression with a literal reference to the sqrt function without changing the value produced by our code.

You can go the other way, too. If you find a literal reference to a function, you can replace it with a lambda expression that calls the function. This is η-conversion: η-reduction is removing an unnecessary lambda wrapper, and η-expansion is introducing one.

η-conversion comes with caveats. First, it only works on functions. If I have a string "foo", and I attempt to η-expand this into (lambda (x) ("foo" x)), I get nonsense. Second, a reduction strategy that incorporates η-reduction can be weaker than one that does not. Consider this expression: (lambda (x) ((compute-f) x)). We can η-reduce this to (compute-f), but this makes a subtle difference. When wrapped with the lambda, (compute-f) is evaluated just before it is applied to x. In fact, we won't call (compute-f) unless we invoke the result of the lambda expression somewhere. However, once η-reduced, (compute-f) is evaluated at the point the original lambda was evaluated, which can be quite a bit earlier.
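A small sketch makes the timing difference visible; compute-f here is a stand-in that announces when it runs:

;; A stand-in compute-f that announces when it runs.
(define (compute-f)
  (display "computing f") (newline)
  (lambda (x) (* x x)))

(define wrapped (lambda (x) ((compute-f) x)))  ; prints nothing yet
(define reduced (compute-f))                   ; prints "computing f" now

(wrapped 3)  ; prints "computing f", then returns 9
(reduced 3)  ; returns 9; compute-f already ran at definition time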


When a function foo calls another function bar as a subproblem, an implicit continuation is passed to bar. bar invokes this continuation on the return value that it computes. We can characterize this continuation like this:

kbar = (lambda (return-value)
         (kfoo (finish-foo return-value)))
This just says that when bar returns, we'll finish running the code in foo and then continue by invoking the continuation supplied to foo.

If foo makes a tail call to bar, then foo is just returning what bar computes. There is no computation for foo to finish, so the continuation is just

kbar = (lambda (return-value)
         (kfoo return-value))
But this η-reduces to just kfoo, so we don't have to allocate a new trivial continuation when foo tail calls bar; we can just pass along the continuation that was passed to foo.
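Here is a minimal sketch of both cases in explicit continuation-passing style; the "remaining work" in foo (adding 1) is invented for illustration:

;; bar in continuation-passing style
(define (bar-cps x k)
  (k (* x x)))

;; foo calls bar as a subproblem: bar's continuation wraps k
;; with the work foo still has to do (here, adding 1).
(define (foo-subproblem x k)
  (bar-cps x (lambda (return-value)
               (k (+ return-value 1)))))

;; foo tail-calls bar: the wrapper (lambda (rv) (k rv)) η-reduces
;; away, so we pass k along unchanged.
(define (foo-tail x k)
  (bar-cps x k))

(foo-subproblem 3 display)  ; prints 10
(foo-tail 3 display)        ; prints 9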

Tail recursion is equivalent to η-reducing the implicit continuations to functions where possible. A Scheme aficionado might prefer to say that it avoids η-expanding where unnecessary.

This is a mathematical curiosity, but does it have practical significance? If you're programming in continuation passing style, you should be careful to η-reduce (or avoid η-expanding) your code.

Years ago I was writing an interpreter for the REBOL language. I was getting frustrated trying to make it tail recursive. I kept finding places in the interpreter where the REBOL source code was making a tail call, but the interpreter itself wasn't, so the stack would grow without bound. I decided to investigate the problem by rewriting the interpreter in continuation passing style and seeing why I couldn't η-convert the tail calls. Once in CPS, I could see that eval took two continuations and I could achieve tail recursion by η-reducing one of them.

by Joe Marshall (noreply@blogger.com) at Monday, April 19, 2021

Tuesday, April 13, 2021

Programming Praxis

Geothmetic Meandian

In mathematics, the arithmetic geometric mean of two positive real numbers is computed by repeatedly taking half their sum and the square root of their product until the two numbers converge. For instance, the arithmetic geometric mean of 24 and 6 is 13.458171…, with the iterative steps computed as follows:

n  arithmetic mean              geometric mean
0  24                            6
1  15                           12
2  13.5                         13.416407864998738175455042
3  13.458203932499369089227521  13.458139030990984877207090
4  13.458171481745176983217305  13.458171481706053858316334
5  13.458171481725615420766820  13.458171481725615420766806

The arithmetic geometric mean was invented by Lagrange and studied by Gauss, and is used today to compute various transcendental functions because it converges so quickly.
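A direct transcription of the iteration into Scheme might look like this; the 1e-12 tolerance is an arbitrary choice for the sketch:

;; Arithmetic geometric mean: iterate until the two values agree.
(define (agm a b)
  (if (< (abs (- a b)) 1e-12)
      a
      (agm (/ (+ a b) 2) (sqrt (* a b)))))

(agm 24.0 6.0)  ; => 13.458171481725615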

In the world of Randall Munroe, the geothmetic meandian of any set of positive numbers is computed by iterating three sequences — the arithmetic mean, the geometric mean, and the median — until they converge. For instance, the geothmetic meandian of the set (1,1,2,3,5) is 2.089, computed as follows:

n  arithmetic mean    geometric mean     median
1  2.4                1.9743504858348200 2
2  2.1247834952782734 2.1161924605448084 2
3  2.0803253186076938 2.0795368194795802 2.1161924605448084
4  2.0920181995440275 2.0919486049152223 2.0803253186076938
5  2.0880973743556477 2.0880901331209600 2.0919486049152223
6  2.0893787041306098 2.0893779142184865 2.0880973743556477
7  2.0889513309015815 2.0889512436159920 2.0893779142184865
8  2.0890934962453533 2.0890934865653277 2.0889513309015815
9  2.0890461045707540 2.0890461034958396 2.0890934865653277
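A sketch of the geothmetic meandian along the same lines, assuming a sort procedure as provided by most Schemes; the 1e-9 tolerance is again arbitrary:

(define (mean xs) (/ (apply + xs) (length xs)))

(define (geometric-mean xs)
  (expt (apply * xs) (/ 1 (length xs))))

(define (median xs)
  (let ((s (sort xs <))
        (n (length xs)))
    (if (odd? n)
        (list-ref s (quotient n 2))
        (/ (+ (list-ref s (- (quotient n 2) 1))
              (list-ref s (quotient n 2)))
           2))))

;; Iterate the triple (mean, geometric mean, median) until the
;; three values converge.
(define (geothmetic-meandian xs)
  (let ((trio (list (mean xs) (geometric-mean xs) (median xs))))
    (if (< (- (apply max trio) (apply min trio)) 1e-9)
        (car trio)
        (geothmetic-meandian trio))))

(geothmetic-meandian '(1 1 2 3 5))  ; => approximately 2.089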

[ I hate the damned new editor at WordPress; I struggle with it every time I post an exercise. I could not figure out how to embed the image from XKCD in this blog post. You can see it here. ]

Your task is to write programs that compute the arithmetic geometric mean and geothmetic meandian. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at Tuesday, April 13, 2021

Vladimir Nikishkin

I’ve hacked a simple script for drawing a file system tree as a graph with graphviz. Sounds fun?

Once I realised that a file system tree is like a MindMap…

Here’s the gitlab link: https://gitlab.com/Lockywolf/scsh-xattr-mindmap

Contrary to the name, it is actually written in Chibi, not in scsh. I initially thought that scsh would be better due to its more extensive POSIX support, but it turned out that Chibi was good enough.

It is a small-ish (500 lines of code) script to generate a graph from your filesystem tree. It accepts a few options (editable directly at the top of the file) and duplicates quite a lot of the functionality of GNU Find, but I didn’t find a way to avoid doing that, as it has to use heuristics in order to prune the tree to a reasonable size.

The resulting image is like this:

[Image: 2021-04-13_Slarm64-repo-tree.smaller.png; available in Small (1 MB) and Large (44 MB) versions]

I plotted the Slarm64 (Unofficial Slackware for Raspberry Pi) repository tree, just for demonstration purposes.

The size of the images above is 1×2.5 metres. It’s large, but my original goal was to plot my whole file system. The ’size=’ parameter is tunable. I think it is reasonable to assume that you need to have at least 4 square centimetres per node, so a graph that large would accommodate about 4000 nodes. In my opinion, 8000 is still possible, but too tight.

With the default settings the script ignores regular files, but traverses symlinks. In theory it also supports hardlinks, but you would need to turn on drawing regular files manually.

I made this script because I started to feel that I am forgetting what I have on my hard drive, which has amassed quite a lot of life history over the past 20 years (since hard drives became reasonably priced).

Use-cases and pull requests welcome. One more reason to create this script was to prove that Scheme can be a practical programming language.

Technologically, this code is not terribly advanced, the only trick that may be interesting to programming nerds is having the r7rs module and the main function in the same file (like scsh/scheme48 suggest doing), which requires procedural module analysis.

I had to glue on a couple of C bindings for sys/xattr.h; those are now available in the Snow-Fort repo, and they are Chibi-specific.

Hope you will enjoy it.

by lockywolf at Tuesday, April 13, 2021

Monday, April 12, 2021

GNU Guix

New Supported Platform: powerpc64le-linux

It is a pleasure to announce that support for powerpc64le-linux (PowerISA v.2.07 and later) has now been merged to the master branch of GNU Guix!

This means that GNU Guix can be used immediately on this platform from a Git checkout. Starting with the next release (Guix v1.2.1), you will also be able to download a copy of Guix pre-built for powerpc64le-linux. Regardless of how you get it, you can run the new powerpc64le-linux port of GNU Guix on top of any existing powerpc64le GNU/Linux distribution.

This new platform is available as a "technology preview". This means that although it is supported, substitutes are not yet available from the build farm, and some packages may fail to build. Although powerpc64le-linux support is nascent, the Guix community is actively working on improving it, and this is a great time to get involved!

Why Is This Important?

This is important because it means that GNU Guix now works on the Talos II, Talos II Lite, and Blackbird mainboards sold by Raptor Computing Systems. This modern, performant hardware uses IBM POWER9 processors, and it is designed to respect your freedom. The Talos II and Talos II Lite have recently received Respects Your Freedom (RYF) certification from the FSF, and Raptor Computing Systems is currently pursuing RYF certification for the more affordable Blackbird, too. All of this hardware can run without any non-free code, even the bootloader and firmware. In other words, this is a freedom-friendly hardware platform that aligns well with GNU Guix's commitment to software freedom.

How is this any different from existing RYF hardware, you might ask? One reason is performance. The existing RYF laptops, mainboards, and workstations can only really be used with Intel Core Duo or AMD Opteron processors. Those processors were released over 15 years ago. Since then, processor performance has increased drastically. People should not have to choose between performance and freedom, but for many years that is exactly what we were forced to do. However, the POWER9 machines sold by Raptor Computing Systems have changed this: the free software community now has an RYF-certified option that can compete with the performance of modern Intel and AMD systems.

Although the performance of POWER9 processors is competitive with modern Intel and AMD processors, the real advantage of the Talos II, Talos II Lite, and Blackbird is that they were designed from the start to respect your freedom. Modern processors from both Intel and AMD include back doors over which you are given no control. Even though the back doors can be removed with significant effort on older hardware in some cases, this is an obstacle that nobody should have to overcome just to control their own computer. Many of the existing RYF-certified options (e.g., the venerable Lenovo x200) use hardware that can only be considered RYF-certified after someone has gone through the extra effort of removing those back doors. No such obstacles exist when using the Talos II, Talos II Lite, or Blackbird. In fact, although Intel and AMD both go out of their way to keep you from understanding what is going on in your own computer, Raptor Computing Systems releases all of the software and firmware used in their boards as free software. They even include circuit diagrams when they ship you the machine!

Compared to the existing options, the Talos II, Talos II Lite, and Blackbird are a breath of fresh air that the free software community really deserves. Raptor Computing Systems' commitment to software freedom and owner control is an inspiring reminder that it is possible to ship a great product while still respecting the freedom of your customers. And going forward, the future looks bright for the open, royalty-free Power ISA stewarded by the OpenPOWER Foundation, which is now a Linux Foundation project (see also: the same announcement from the OpenPOWER Foundation).

In the rest of this blog post, we will discuss the steps we took to port Guix to powerpc64le-linux, the issues we encountered, and the steps we can take going forward to further solidify support for this exciting new platform.

Bootstrapping powerpc64le-linux: A Journey

To build software, you need software. How can one port Guix to a platform before support for that platform exists? This is a bootstrapping problem.

In Guix, all software for a given platform (e.g., powerpc64le-linux) is built starting from a small set of "bootstrap binaries". These are binaries of Guile, GCC, Binutils, libc, and a few other packages, pre-built for the relevant platform. It is intended that the bootstrap binaries are the only pieces of software in the entire package collection that Guix cannot build from source. In practice, additional bootstrap roots are possible, but introducing them in Guix is highly discouraged, and our community actively works to reduce our overall bootstrap footprint. There is one set of bootstrap binaries for each platform that Guix supports.

This means that to port Guix to a new platform, you must first build the bootstrap binaries for that platform. In theory, you can do this in many ways. For example, you might try to manually compile them on an existing system. However, Guix has package definitions that you can use to build them - using Guix, of course!

Commonly, the first step in porting Guix to a new platform is to use Guix to cross-compile the bootstrap binaries for that new platform from a platform on which Guix is already supported. This can be done by running a command like the following on a system where Guix is already installed:

guix build --target=powerpc64le-linux-gnu bootstrap-tarballs

This is the route that we took when building the powerpc64le-linux bootstrap binaries, as described in commit 8a1118a. You might wonder why the target above is "powerpc64le-linux-gnu" even though the new Guix platform is called "powerpc64le-linux". This is because "powerpc64le-linux-gnu" is a GNU triplet identifying the new platform, but "powerpc64le-linux" is the name of a "system" (i.e., a platform) in Guix. Guix contains code that converts between the two as needed (see nix-system->gnu-triplet and gnu-triplet->nix-system in guix/utils.scm). When cross-compiling, you only need to specify the GNU triplet.
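As a rough illustration of the naming relationship only (the real procedures in guix/utils.scm handle vendor fields and architecture-specific cases, so the simplified versions below are hypothetical):

;; Hypothetical, simplified sketch of the naming relationship; the
;; real procedures in guix/utils.scm handle many more cases.
(define (system->triplet system)
  (string-append system "-gnu"))

(define (triplet->system triplet)
  (substring triplet 0 (string-rindex triplet #\-)))

(system->triplet "powerpc64le-linux")       ; => "powerpc64le-linux-gnu"
(triplet->system "powerpc64le-linux-gnu")   ; => "powerpc64le-linux"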

Note that before you can even do this, you must first update the glibc-dynamic-linker and system->linux-architecture procedures in Guix's code, as described in Porting. In addition, the versions of packages in Guix that make up the GNU toolchain (gcc, glibc, etc.) must already support the target platform. This pre-existing toolchain support needs to be good enough so that Guix can (1) build, on some already-supported platform, a cross-compilation toolchain for the target platform, (2) use, on the already-supported platform, the cross-compilation toolchain to cross-compile the bootstrap binaries for the target platform, and (3) use, on the target platform, the bootstrap binaries to natively build the rest of the Guix package collection. The above guix build command takes care of steps (1) and (2) automatically.

Step (3) is a little more involved. Once the bootstrap binaries for the target platform have been built, they must be published online for anyone to download. After that, Guix's code must be updated so that (a) it recognizes the "system" name (e.g., "powerpc64le-linux") that will be used to identify the new platform and (b) it fetches the new platform's bootstrap binaries from the right location. After all that is done, you just have to try building things and see what breaks. For example, you can run ./pre-inst-env guix build hello from your Git checkout to try building GNU Hello.

The actual bootstrap binaries for powerpc64le-linux are stored on the alpha.gnu.org FTP server. Chris Marusich built these bootstrap binaries in an x86_64-linux Guix System VM which was running on hardware owned by Léo Le Bouter. Chris then signed the binaries and provided them to Ludovic Courtès, who in turn verified their authenticity, signed them, and uploaded them to alpha.gnu.org. After that, we updated the code to use the newly published bootstrap binaries in commit 8a1118a. Once all that was done, we could begin bootstrapping the rest of the system - or trying to, at least.

There were many stumbling blocks. For example, to resolve some test failures, we had to update the code in Guix that enables it to make certain syscalls from Scheme. In another example, we had to patch GCC so that it looks for the 64-bit libraries in /lib, rather than /lib64, since that is where Guix puts its 64-bit libraries by convention. In addition, some packages required in order to build Guix failed to build, so we had to debug those build failures, too.

For a list of all the changes, see the patch series or the actual commits, which are:

$ git log --oneline --no-decorate 8a1118a96c9ae128302c3d435ae77cb3dd693aea^..65c46e79e0495fe4d32f6f2725d7233fff10fd70
65c46e79e0 gnu: sed: Make it build on SELinux-enabled kernels.
93f21e1a35 utils: Fix target-64bit? on powerpc64le-linux.
8d9aece8c4 ci: %cross-targets: Add powerpc64le-linux-gnu.
c29bfbfc78 syscalls: Fix RNDADDTOENTCNT on powerpc64le-linux.
b57de27d03 syscalls: Fix clone on powerpc64le-linux.
a16eb6c5f9 Add powerpc64le-linux as a supported Guix architecture.
b50f426803 gnu: libelf: Fix compilation for powerpc64le-linux.
1a0f4013d3 gnu: texlive-latex-base: Fix compilation on powerpc64le*.
e9938dc8f0 gnu: texlive-bin: Fix compilation on powerpc64le*.
69b3907adf gnu: guile-avahi: Fix compilation on powerpc64le-linux.
4cc2d2aa59 gnu: bdb-4.8: Fix configure on powerpc64le-linux.
be4b1cf53b gnu: binutils-final: Support more Power architectures.
060478c32c gnu: binutils-final: Provide bash for binary on powerpc-linux.
b2135b5d57 gnu: gcc-boot0: Enable 128-bit long double for POWER9.
6e98e9ca92 gnu: glibc: Fix ldd path on powerpc*.
cac88b28b8 gnu: gcc-4.7: On powerpc64le, fix /lib64 references.
fc7cf0c1ec utils: Add target-powerpc? procedure.
8a1118a96c gnu: bootstrap: Add support for powerpc64le-linux.

In the end, through the combined efforts of multiple people, we slowly worked through the issues until we reached a point where we could do all of the following things successfully:

  • Build Guix manually on a Debian GNU/Linux ppc64el machine (this is Debian's name for a system using the powerpc64le-linux-gnu triplet), and verify that its make check tests passed.
  • Build GNU Hello using Guix and run it.
  • Run guix pull to build and install the most recent version of Guix, with powerpc64le-linux support.
  • Build a release binary tarball for powerpc64le-linux via: make guix-binary.powerpc64le-linux.tar.xz
  • Use that binary to install a version of Guix that could build/run GNU Hello and run guix pull successfully.

This was an exciting moment! But there was still more work to be done.

Originally, we did this work on the wip-ppc64le branch, with the intent of merging it into core-updates. By convention, the "core-updates" branch in Guix is where changes are made if they cause too many rebuilds. Since we were updating package definitions so deep in the dependency graph of the package collection, we assumed it wouldn't be possible to avoid rebuilding the world. For this reason, we had based the wip-ppc64le branch on core-updates.

However, Efraim Flashner proved us wrong! He created a separate branch, wip-ppc64le-for-master, where he adjusted some of the wip-ppc64le commits to avoid rebuilding the world on other platforms. Thanks to his work, we were able to merge the changes directly to master! This meant that we would be able to include it in the next release (Guix v1.2.1).

In short, the initial porting work is done, and it is now possible for anyone to easily try out Guix on this new platform. Because guix pull works, too, it is also easy to iterate on what we have and work towards improving support for the platform. It took a lot of cooperation and effort to get this far, but there are multiple people actively contributing to this port in the Guix community who want to see it succeed. We hope you will join us in exploring the limits of this exciting new freedom-friendly platform!

Other Porting Challenges

Very early in the porting process, there were some other problems that stymied our work.

First, we actually thought we would try to port to powerpc64-linux (big-endian). However, this did not prove to be any easier than the little-endian port. In addition, other distributions (e.g., Debian and Fedora) have recently dropped their big-endian powerpc64 ports, so the little-endian variant is more likely to be tested and supported in the community. For these reasons, we decided to focus our efforts on the little-endian variant, and so far we haven't looked back.

In both the big-endian and little-endian case, we were saddened to discover that the bootstrap binaries are not entirely reproducible. This fact is documented in bug 41669, along with our extensive investigations.

In short, if you build the bootstrap binaries on two separate machines without using any substitutes, you will find that the derivation which cross-compiles %gcc-static (the bootstrap GCC, version 5.5.0) produces different output on the two systems. However, if you build %gcc-static twice on the same system, it builds reproducibly. This suggests that something in the transitive closure of inputs of %gcc-static is perhaps contributing to its non-reproducibility. There is an interesting graph toward the end of the bug report, shown below:

[Graph: derivations producing differing outputs across two Guix System machines]

This graph shows the derivations that produce differing outputs across two Guix System machines, when everything is built without substitutes. It starts from the derivation that cross-compiles %gcc-static for powerpc64-linux-gnu (from x86_64-linux) using Guix at commit 1ced8379c7641788fa607b19b7a66d18f045362b. Then, it walks the graph of derivation inputs, recording only those derivations which produce differing output on the two different machines. If the non-reproducibility (across systems) of %gcc-static is caused by a non-reproducible input, then it is probably caused by one or more of the derivations shown in this graph.

At some point, you have to cut your losses and move on. After months of investigation without resolving the reproducibility issue, we finally decided to move forward with the bootstrap binaries produced earlier. If necessary, we can always go back and try to fix this issue. However, it seemed more important to get started with the bootstrapping work.

Anyone who is interested in solving this problem is welcome to comment on the bug report and help us to figure out the mystery. We are very interested in solving it, but at the moment we are more focused on building the rest of the Guix package collection on the powerpc64le-linux platform using the existing bootstrap binaries.

Next Steps

It is now possible to install Guix on a powerpc64le-linux system and use it to build some useful software - in particular, Guix itself. So Guix is now "self-hosted" on this platform, which gives us a comfortable place to begin further work.

The following tasks still need to be done. Anyone can help, so please get in touch if you want to contribute!

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

by Chris Marusich and Léo Le Bouter at Monday, April 12, 2021

Sunday, April 11, 2021

Andy Wingo

guile's reader, in guile

Good evening! A brief(ish?) note today about some Guile nargery.

the arc of history

Like many language implementations that started life when you could turn on the radio and expect to hear Def Leppard, Guile has a bottom half and a top half. The bottom half is written in C and exposes a shared library and an executable, and the top half is written in the language itself (Scheme, in the case of Guile) and somehow loaded by the C code when the language implementation starts.

Since 2010 or so we have been working at replacing bits written in C with bits written in Scheme. Last week's missive was about replacing the implementation of dynamic-link from using the libltdl library to using Scheme on top of a low-level dlopen wrapper. I've written about rewriting eval in Scheme, and more recently about how the road to getting the performance of C implementations in Scheme has been sometimes long.

These rewrites have a quixotic aspect to them. I feel something in my gut about rightness and wrongness and I know at a base level that moving from C to Scheme is the right thing. Much of it is completely irrational and can be out of place in a lot of contexts -- like if you have a task to get done for a customer, you need to sit and think about minimal steps from here to the goal and the gut doesn't have much of a role to play in how you get there. But it's nice to have a project where you can do a thing in the way you'd like, and if it takes 10 years, that's fine.

But besides the ineffable motivations, there are concrete advantages to rewriting something in Scheme. I find Scheme code to be more maintainable, yes, and more secure relative to the common pitfalls of C, obviously. It decreases the amount of work I will have when one day I rewrite Guile's garbage collector. But also, Scheme code gets things that C can't have: tail calls, resumable delimited continuations, run-time instrumentation, and so on.

Taking delimited continuations as an example, five years ago or so I wrote a lightweight concurrency facility for Guile, modelled on Parallel Concurrent ML. It lets millions of fibers exist on a system. When a fiber needs to block on an I/O operation (read or write), it instead suspends its continuation and arranges to restart it when the operation becomes possible.

A lot had to change in Guile for this to become a reality. Firstly, delimited continuations themselves. Later, a complete rewrite of the top half of the ports facility in Scheme, to allow port operations to suspend and resume. Many of the barriers to resumable fibers were removed, but the Fibers manual still names quite a few.

Scheme read, in Scheme

Which brings us to today's note: I just rewrote Guile's reader in Scheme too! The reader is the bit that takes a stream of characters and parses it into S-expressions. It was in C, and now is in Scheme.

One of the primary motivators for this was to allow read to be suspendable. With this change, read-eval-print loops are now implementable on fibers.

Another motivation was to finally fix a bug in which Guile couldn't record source locations for some kinds of datums. It used to be that Guile would use a weak-key hash table to associate datums returned from read with source locations. But this only works for fresh values, not for immediate values like small integers or characters, nor does it work for globally unique non-immediates like keywords and symbols. So for these, we just wouldn't have any source locations.
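To see why, consider that every occurrence of a small integer is the same object, so a side table cannot tell one occurrence from another; a sketch using Guile's weak-key tables:

;; Every 42 in a program is the same object, so a side table can
;; only hold one source location for all of them.
(define locations (make-weak-key-hash-table))

(hashq-set! locations 42 "foo.scm:1:0")
(hashq-set! locations 42 "foo.scm:9:4")  ; clobbers the first entry

(hashq-ref locations 42)  ; => "foo.scm:9:4", for every 42 anywhere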

A robust solution to that problem is to return annotated objects rather than using a side table. Since Scheme's macro expander is already set to work with annotated objects (syntax objects), a new read-syntax interface would do us a treat.
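In use, the interface looks something like the sketch below (the printed shape of syntax objects varies between Guile versions):

;; read-syntax is like read, but returns a syntax object carrying
;; source-location information.
(define stx
  (call-with-input-string "(+ 1 2)" read-syntax))

(syntax->datum stx)  ; => (+ 1 2), with source info attached to stx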

With read in C, this was hard to do. But with read in Scheme, it was no problem to implement. Adapting the expander to expect source locations inside syntax objects was a bit fiddly, though, and the resulting increase in source location information makes the output files bigger by a few percent -- due somewhat to the increased size of the .debug_lines DWARF data, but also due to serialized source locations for syntax objects in macros.

Speed-wise, switching to read in Scheme is a regression, currently. The old reader could parse around 15 or 16 megabytes per second when recording source locations on this laptop, or around 22 or 23 MB/s with source locations off. The new one parses more like 10.5 MB/s, or 13.5 MB/s with positions off, when in the old mode where it uses a weak-key side table to record source locations. The new read-syntax runs at around 12 MB/s. We'll be noodling at these in the coming months, but unlike when the original reader was written, at least now the reader is mainly used only at compile time. (It still has a role when reading s-expressions as data, so there is still a reason to make it fast.)

As is the case with eval, we still have a C version of the reader available for bootstrapping purposes, before the Scheme version is loaded. Happily, with this rewrite I was able to remove all of the cruft from the C reader related to non-default lexical syntax, which simplifies maintenance going forward.

An interesting aspect of attempting to make a bug-for-bug rewrite is that you find bugs and unexpected behavior. For example, it turns out that since the dawn of time, Guile always read #t and #f without requiring a terminating delimiter, so reading "(#t1)" would result in the list (#t 1). Weird, right? Weirder still, when the #true and #false aliases were added to the language, Guile decided to support them by default, but in an oddly backwards-compatible way... so "(#false1)" reads as (#f 1) but "(#falsa1)" reads as (#f alsa1). Quite a few more things like that.

All in all it would seem to be a successful rewrite, introducing no new behavior, even producing the same errors. However, this is not the case for backtraces, which can expose the guts of read in cases where that previously wouldn't happen because the C stack was opaque to Scheme. Probably we will simply need to add more sensible error handling around callers to read, as a backtrace isn't a good user-facing error anyway.

OK enough rambling for this evening. Happy hacking to all and to all a good night!

by Andy Wingo at Sunday, April 11, 2021