Planet Scheme

Tuesday, January 12, 2021

Jérémy Korwin-Zmijowski

Your Next Meal: a Guile Web App

YNM! logo

I started developing a web application in Guile. The goal of this application is to help me choose the content of my meals according to my tastes, my lifestyle and my nutritional balance! If I consume better, my health and the planet can only benefit.

In this article, I share with you my current workflow!

Best wishes to all of you!

Development

I open my terminal. Using an alias, the command cdynm places me in the directory of my application.

Since I created the package definition for my application (in a file I named guix.scm), I can generate a shell prepared for its development with the command:

$ guix environment -l guix.scm
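
For readers who have never used this pattern: a guix.scm for guix environment -l is just a file whose last expression evaluates to a package, and Guix spawns a shell containing that package's inputs. Below is a minimal sketch of what such a file can look like; it is not the actual YNM! package definition, and the module paths, inputs and field values (for example artanis coming from (gnu packages guile-xyz)) are assumptions.

;; Hypothetical guix.scm sketch -- not the real YNM! definition.
;; Module paths and inputs are assumptions and may differ from the
;; project's actual file.
(use-modules (guix packages)
             (guix gexp)
             (guix build-system gnu)
             ((guix licenses) #:prefix license:)
             (gnu packages guile)
             (gnu packages guile-xyz))

(package
  (name "ynm")
  (version "0.0.1")
  ;; Use the current checkout as the source.
  (source (local-file "." "ynm-checkout" #:recursive? #t))
  (build-system gnu-build-system)
  (inputs
   `(("guile" ,guile-2.2)
     ("artanis" ,artanis)))
  (home-page "https://framagit.org/Jeko/ynm")
  (synopsis "Your Next Meal web application")
  (description "Web application to help plan meals.")
  (license license:gpl3+))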

Now I can start the app with the command:

[dev]$ art work
Loading conf/artanis.conf...done.
Session with SIMPLE backend init done!
Loading models...
Loading controllers...
Loading restful API...
Regenerating route cache ...
Server core: ragnarok
http://127.0.0.1:3000
Anytime you want to quit just try Ctrl+C, thanks!

Then it's time for the hack!

Not having yet found a way to do REPL-driven development, I have gotten into the habit of restarting the server manually after each modification. The restart is fast for now, but it could become annoying.

When I am satisfied with my changes, I push them to the git repository.

Deployment

I am the admin of a Digital Ocean droplet running Guix System. I connect to it over ssh thanks to a small alias, sshynm. Then I attach to the active Screen session in which my application runs:

screen -r

From there, I stop the application, pull the latest changes to the application's source code, and restart it:

# killall .art-real && git pull && guix environment -l guix.scm -- art work &

Finally, I detach from the Screen session with C-a d or C-a C-d, and from the droplet with C-d.

The end!

My wish list

Guile-Hall support

Until the next release of Guile-Hall, it is not possible to manage an application created with Artanis because some file types are not supported by Guile-Hall.

Full Guix support

Today, Artanis depends on Guile 2.2 as well as on libraries whose versions conflict with those in the Guix distribution (guile-json, for example). For my use, the development-environment workaround is enough, but that might not last!

Link to the project repository: https://framagit.org/Jeko/ynm
Link to the project website: https://yournextmeal.tech

Thank you very much for reading this article!

Don't hesitate to give me your opinion, suggest an idea for improvement, or ask a question! To do so, leave a comment below or contact me.

Don't miss out on the next ones:
  • articles via Mastodon @jeko@write.as and RSS
  • screencasts via Peertube jeko@video.tedomum.net and RSS

And more importantly, share this blog and tell your friends it's the best blog in the history of Free Software! No kidding!

#guix #gnu #linux #guile #english

Tuesday, January 12, 2021

Arthur A. Gleckler

REPL as a Service

Sometimes, I know that there's something that I should be doing, and I just don't do it. I keep putting it off because it's too much work, or too complicated, or uninteresting. It's easy to find an excuse not to do it.

For well over a decade, I've run this web site using a web server I wrote in Scheme. (My server runs behind nginx.) Until yesterday, I hadn't configured my Linux instance to start the server on boot. My hosting provider, Linode, is reliable, so I rarely had an unplanned outage. Whenever my instance restarted, I manually logged in and started my server. It worked.

But I knew that that was wrong. If there was a crash, or a power outage, or unplanned maintenance, I wanted my server to restart on its own, and I wanted it to do so right away. I didn't want to have to race to a terminal to get it going again. Let me show you how I finally made that happen. With dtach and systemd, we can make a web service that starts on its own and whose REPL is always available, even across logouts. And with call-with-current-continuation, we can take debugging to a whole new level.

dtach your REPL

If we're going to take advantage of the power of Scheme, we need access to a Read-Eval-Print Loop. That way, we can debug problems with more than just HTTP logs. We can even experiment with changes to the running server. But we need some way to connect to the REPL. I use dtach, which lets me start my server and log out, knowing that I'll be able to connect to the REPL whenever I log back in.

To start the server using my start-web-server script (not shown), we pass dtach a filename to tell it where to create a Unix domain socket:

dtach -n /tmp/speechcode.dtach /home/speechcode/bin/start-web-server

To connect to the server's REPL later, we specify the same socket with the -a option:

dtach -a /tmp/speechcode.dtach

If we type '(), this is what we see:

'()

;Value: ()

1 ]=> █

Now we have access to the full power of the REPL. We can check the status of the server, inspect its data structures, debug problems (more on that later), and even make changes to code while the server is running.

systemd

Today, systemd is how one creates a service that automatically starts when Linux boots. The controversy around systemd was one reason I avoided this project. But Unix has taught me the secret to happiness: low expectations. (See The UNIX-HATERS Handbook.) When I finally started digging into systemd, none of the criticisms surprised me. The Tragedy of systemd, a thoughtful talk by Benno Rice (YouTube), finally convinced me to get moving.

The hardest problem I faced was convincing systemd that a process created using dtach, which immediately forks and exits, was still running. The key was to wrap dtach in a script that itself doesn't exit until the server does. It's called start-speechcode-service:

#!/bin/bash

source /home/speechcode/.environment
dtach -n /tmp/speechcode.dtach /home/speechcode/bin/start-web-server

PID=`lsof -t /tmp/speechcode.dtach`

tail --follow /dev/null --pid=$PID

This script uses source to set up the environment variables that configure the server. Next, it runs dtach, which starts the server, creates a socket for communicating with it, and exits. Finally, it finds the process ID of the forked dtach process, which is the parent process of the web server. Since it's not possible to use bash's wait command on a process that isn't a subprocess of the shell, I use tail to keep the script running until the server exits.

The speechcode service is defined in /etc/systemd/system/speechcode.service:

[Unit]
Description=Speechcode server

[Service]
ExecStart=/home/speechcode/bin/start-speechcode-service
Restart=always
Type=simple
User=speechcode
WorkingDirectory=/home/speechcode/scheme/web/

[Install]
WantedBy=multi-user.target

This defines what user will run the server, in what directory the server will start, and what script will start it. It also arranges for the server to start when the system reaches run level 2 (multi-user.target), and to restart if it ever exits — unless we use systemctl to stop it deliberately.

To start the server manually, we run:

sudo systemctl start speechcode

To check the server's status (and see recent log lines), we run:

sudo systemctl status speechcode -l

To stop the server, we run:

sudo systemctl stop speechcode

That's it.

But wait. There's more.

call-with-current-continuation

When debugging a problem with a web server, we use logging. But logging is just an advanced way to use print statements. For it to be useful, we have to know, in advance, what information to print. If there's an unexpected problem, all we can do is stare at the code and the logs and try to imagine what could have caused the problem. If we come up with a hypothesis, we can add more logging and wait for the problem to recur. Eventually, perhaps, we'll find and fix our bug.

This is Scheme, though, and we have call-with-current-continuation. We can use it to capture the stack, all the variables the stack contains, and all the values they reference. This is of more than academic value. It's exactly the kind of information we'd like to have while debugging.

In my web server, I wrap the code that dispatches HTTP requests inside report-errors, defined below:

(define most-recent-condition #f)

(define record-most-recent-condition!
  (let ((record-most-recent-condition-mutex (make-thread-mutex)))
    (lambda (condition)
      (with-thread-mutex-lock
       record-most-recent-condition-mutex
       (lambda ()
         (set! most-recent-condition condition))))))

(define (report-errors thunk)
  (call-with-current-continuation
   (lambda (continuation)
     (bind-condition-handler
      (list condition-type:error)
      (lambda (condition)
        (record-most-recent-condition! condition)
        (continuation #f))
      thunk))))

This code uses MIT/GNU Scheme's exception-handling system and threads, but it could just as easily use R6RS or R7RS Small exceptions and SRFI 18 threads. The idea is that, if an error is ever detected, we capture the current condition in the variable most-recent-condition, then continue along our merry way. The server's outer exception handlers will run, and they will send the client the right HTTP error, and perhaps even an error page. But when it comes time to investigate the problem, we can connect to our REPL and run our debugger on most-recent-condition, which contains the continuation in effect at the time the error occurred.
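
As a rough illustration of that portability claim, here is a sketch of the same shape using only R7RS-small exception handling and a SRFI 18 mutex. It mirrors the names above, but it is an approximation for the sake of the argument, not the code running on this server.

(import (scheme base)
        (only (srfi 18) make-mutex mutex-lock! mutex-unlock!))

(define most-recent-condition #f)

(define record-most-recent-condition!
  (let ((mutex (make-mutex)))
    (lambda (condition)
      (mutex-lock! mutex)
      (set! most-recent-condition condition)
      (mutex-unlock! mutex))))

(define (report-errors thunk)
  (call-with-current-continuation
   (lambda (return)
     (with-exception-handler
      (lambda (condition)
        ;; Save the raised object so it can be inspected later at the
        ;; REPL, then escape to the caller instead of re-raising.
        ;; Unlike the MIT version, this handler sees every raised
        ;; object, not just errors.
        (record-most-recent-condition! condition)
        (return #f))
      thunk))))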

Let's try an example. We'll define a web request handler that always fails. It will handle any GET request of the form /example/X by calling error. First, we connect to our REPL:

dtach -a /tmp/speechcode.dtach

Now we define a new GET handler directly on the server. No restart is required.

(web-server/add-dispatchers! running-web-server
  (make-web-dispatcher ((request get) ("example" (? x)) ())
    (error "This request failed." x)
    (values ok-response-code '() (lambda () (write-string x)))))

;Unspecified return value

1 ]=> █

Since we're trying to show what an ordinary web handler would do, our handler includes code after the call to error that does what any handler in my server should do — it returns the response code, an alist of additional headers, and a thunk that would have written the HTTP response if the error hadn't occurred.

After we visit http://localhost:8443/example/foo, let's see whether we've captured a condition:

1 ]=> most-recent-condition

;Value: #[condition 1263 "simple-error"]

We have. Now let's attach MIT/GNU Scheme's debugger to the continuation. We'll be in the innermost frame on the stack. We can look around and see what was happening at the time of the error, even if it occurred hours earlier:

1 ]=> (debug most-recent-condition)

There are 50 subproblems on the stack.

Subproblem level: 0 (this is the lowest subproblem level)
Expression (from stack):
    (begin <!> (values ok-response-code '() (lambda () (write-string x))))
 subproblem being executed (marked by <!>):
    (error "This request failed." x)
Environment created by a LAMBDA special form

 applied to: (#[http-request 1258] "foo")
There is no execution history for this subproblem.
You are now in the debugger.  Type q to quit, ? for commands.

2 debug> █

Immediately, we see the expression that caused the error. We have access to local variables and objects accessible from this frame. Let's examine the variable x:

2 debug> v
v

Evaluate expression: x

Value: foo
2 debug> v

Evaluate expression: (string? x)

Value: #t

2 debug> █

We see "foo", so this must be the HTTP request we made. Let's pretty-print the request:

2 debug> v
v

Evaluate expression: (pp request)
#[http-request 1258]
(connection #[textual-i/o-port 1259 for channel: #[channel 1260]])
(headers
 ((host . "localhost:8443")
  (connection . "keep-alive")
  (pragma . "no-cache")
  (cache-control . "no-cache")
  …
  (accept-encoding . "gzip, deflate, br")
  (accept-language . "en-US,en;q=0.9")))
(http-version http/1.1)
(method get)
(peer-address #u8(127 0 0 1))
(request-arrival-ticks 185379108)
(request-arrival-time 3819504338)
(request-number 941)
(uri #[web-uri 1261])
(web-server #[web-server 1262])
No value

2 debug> █

For brevity, I've elided some output. Still, you can see how valuable this information is, and we've only examined one frame. If we needed to, we could visit other frames and evaluate expressions in them, too.

I use this technique all the time to find problems in web apps running on my server. I can't imagine going back to debugging based on logs alone. And because I have the REPL, not only can I track bugs down, but I can fix them without restarting the server.

On a high-traffic server, we might only capture the continuations of certain classes of errors, or errors at certain URLs, and we might capture a few at a time rather than just one, but the basic idea would be the same, and would still be valuable.
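
For example, here is one way that might look (a sketch only, not code from this server): keep the last few conditions in a list so that several failures can be inspected later from the REPL.

;; Sketch: remember the last few conditions rather than only the latest.
;; Thread safety is omitted for brevity; a real version would guard the
;; list with a mutex, as record-most-recent-condition! does above.
(define recent-conditions '())
(define recent-conditions-limit 16)

(define (record-condition! condition)
  (set! recent-conditions
        (take-at-most (cons condition recent-conditions)
                      recent-conditions-limit)))

(define (take-at-most lst n)
  (if (or (null? lst) (zero? n))
      '()
      (cons (car lst) (take-at-most (cdr lst) (- n 1)))))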

In software, there are few things as empowering as being "inside" a running program.

by Arthur A. Gleckler at Tuesday, January 12, 2021

Programming Praxis

Animal.txt

Today’s task is from a beginning programmer, who starts with an input file called animal.txt:

There once was a Dog

Wednesday he ate Apples 
Thursday he ate Apples
Friday he ate Apples
Saturday he ate carrots

There once was a Bear

Tuesday he ate carrots
Wednesday he ate carrots
Thursday he ate chicken

He wants to create this output:

Food: Apples Animals who ate it: 1
=======
Dog

Food: Carrots Animals who ate it: 2
========
Bear
Dog

Food: Chicken Animals who ate it: 1
========
Bear

He gave a complicated awk solution that didn’t work; it produced duplicate lines of output in those cases where the same animal ate the same food on multiple days.

Your task is to write a program that produces the desired transformation from input to output. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
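
Here is one possible sketch in Scheme (R7RS-small plus SRFI 1, SRFI 13, and SRFI 132), shown only to make the transformation concrete; it is not the suggested solution mentioned above, and it assumes the input lines look exactly like the sample ("There once was a <Animal>" and "<Day> he ate <food>").

(import (scheme base) (scheme char) (scheme file) (scheme write)
        (only (srfi 1) filter delete-duplicates)
        (only (srfi 13) string-trim-both string-prefix? string-index-right)
        (only (srfi 132) list-sort))

;; Last whitespace-delimited word of a line.
(define (last-word s)
  (let ((i (string-index-right s #\space)))
    (if i (substring s (+ i 1) (string-length s)) s)))

;; "carrots" -> "Carrots", "Apples" -> "Apples".
(define (capitalize s)
  (string-append (string (char-upcase (string-ref s 0)))
                 (string-downcase (substring s 1 (string-length s)))))

;; Collect (food . animal) pairs, tracking the current animal.
(define (read-pairs port)
  (let loop ((animal #f) (pairs '()))
    (let ((line (read-line port)))
      (if (eof-object? line)
          (reverse pairs)
          (let ((line (string-trim-both line)))
            (cond ((string=? line "") (loop animal pairs))
                  ((string-prefix? "There once was" line)
                   (loop (last-word line) pairs))
                  (else (loop animal
                              (cons (cons (capitalize (last-word line)) animal)
                                    pairs)))))))))

;; Group by food, dropping duplicate (food . animal) pairs.
(define (report pairs)
  (let* ((pairs (delete-duplicates pairs))
         (foods (list-sort string<? (delete-duplicates (map car pairs)))))
    (for-each
     (lambda (food)
       (let ((animals (list-sort string<?
                                 (map cdr (filter (lambda (p)
                                                    (string=? (car p) food))
                                                  pairs)))))
         (display "Food: ") (display food)
         (display " Animals who ate it: ") (display (length animals)) (newline)
         (display "========") (newline)
         (for-each (lambda (a) (display a) (newline)) animals)
         (newline)))
     foods)))

(call-with-input-file "animal.txt"
  (lambda (port) (report (read-pairs port))))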

by programmingpraxis at Tuesday, January 12, 2021

Monday, January 11, 2021

Scheme Requests for Implementation

SRFI 215: Central Log Exchange

SRFI 215 is now in final status.

This SRFI specifies a central log exchange for Scheme that connects log producers with log consumers. It allows multiple logging systems to interoperate and co-exist in the same program. Library code can produce log messages without knowledge of which log system is actually used. Simple applications can easily get logs on standard output, while more advanced applications can send them to a full logging system.

by Göran Weinholt at Monday, January 11, 2021

Tuesday, January 5, 2021

Programming Praxis

Two Simple Tasks

Happy New Year! May 2021 be a better year than 2020.

We have two simple tasks today:

First: Write a program that prints the sequence 1, 5, 10, 50, 100, … up to a given limit.

Second: Write a program that finds all three-digit numbers n such that n/11 equals the sum of the squares of the three digits.

Your task is to write programs that solve the two tasks given above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
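
Here is a quick sketch of both in R7RS-small Scheme, just one possible approach and not the suggested solution mentioned above.

(import (scheme base) (scheme write))

;; Task 1: the terms alternate between multiplying by 5 and by 2,
;; giving 1, 5, 10, 50, 100, 500, 1000, ...
(define (one-five-ten limit)
  (let loop ((n 1) (by-five? #t) (acc '()))
    (if (> n limit)
        (reverse acc)
        (loop (* n (if by-five? 5 2)) (not by-five?) (cons n acc)))))

;; Task 2: three-digit n such that n/11 equals the sum of the squares
;; of n's three digits.
(define (digit-square-matches)
  (let loop ((n 100) (acc '()))
    (if (> n 999)
        (reverse acc)
        (let* ((a (quotient n 100))
               (b (quotient (remainder n 100) 10))
               (c (remainder n 10))
               (sum (+ (* a a) (* b b) (* c c))))
          (loop (+ n 1)
                (if (and (zero? (remainder n 11)) (= (quotient n 11) sum))
                    (cons n acc)
                    acc))))))

(display (one-five-ten 1000)) (newline)    ; (1 5 10 50 100 500 1000)
(display (digit-square-matches)) (newline) ; (550 803)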

by programmingpraxis at Tuesday, January 5, 2021

Monday, January 4, 2021

Jeremy Kun

Searching for RH Counterexamples — Deploying with Docker

We’re ironically searching for counterexamples to the Riemann Hypothesis.

  1. Setting up Pytest
  2. Adding a Database
  3. Search Strategies
  4. Unbounded Integers

In this article we’ll deploy the application on a server, so that it can search for RH counterexamples even when I close my laptop.

Servers and containers

When deploying applications to servers, reproducibility is crucial. You don’t want your application to depend on the details of the computer it’s running on. This is a higher-level version of the same principle behind Python virtual environments, but it applies to collections of programs, possibly written in different languages and running on different computers. In our case, we have a postgres database, the pgmp extension, the populate_database program, and plans for a web server.

The principle of an application not depending on the system it’s running on is called hermeticity (the noun form of hermetic, meaning air-tight). Hermeticity is good for the following reasons:

  • When a server crashes, you don’t have to remember what you did to set it up. Instead you run a build/install that works on any machine. Newcomers also don’t have to guess what unknown aspects of a running server are sensitive.
  • You can test it on your local machine identically to how it will run in production.
  • You can easily migrate from one cloud provider to another, which allows you to defer expensive commitments until you have more information about your application’s needs.
  • If you have multiple applications running on the same server, you don’t want to have their needs conflict with each other, which can happen easily if two applications have dependencies that transitively depend on different versions of the same software. This is called “dependency hell.”

In all of these, you protect yourself from becoming dependent on arbitrary choices you made before you knew better.

One industry-strength approach to hermeticity is to use containers. A container is a virtual machine devoted to running a single program, with explicitly-defined exposure to the outside world. We will set up three containers: one to run the database, one for the search application, and (later) one for a web server. We’ll start by deploying them all on the same machine, but could also deploy them to different machines. Docker is a popular containerization system. Before going on, I should stress that Docker, while I like it, is not sacred by any means. In a decade Docker may disappear, but the principle of hermeticity and the need for reproducible deployments will persist.

Docker allows you to describe your container by first starting from an existing (trusted) container, such as one that has an operating system and postgres already installed, and extend it for your application. This includes installing dependencies, fetching the application code from git, copying files into the container from the host system, exposing the container’s network ports, and launching the application. You save the commands that accomplish that in a Dockerfile with some special syntax. To deploy it, you copy the Dockerfile to the server (say, via git) and run docker commands to launch the container. You only have to get the Dockerfile right once, you can test it locally, and then it will work on any server just the same. The only caveat I’ve seen here is that if you migrate to a server with a different processor architecture, the install script (in our case, pip install numba) may fail to find a pre-compiled binary for the target architecture, and it may fall back to compiling from source, which can add additional requirements or force you to change which OS your container is derived from.

This reduces our “set up a new server” script to just a few operations: (1) install docker (2) fetch the repository (3) launch the docker containers from their respective Dockerfiles. In my experience, writing a Dockerfile is no small task, but figuring out how to install stuff is awful in all cases, and doing it for Docker gives you an artifact tracing the steps, and a reasonable expectation of not having to do it again.

Thankfully, you dear readers can skip my head-banging and see the Dockerfiles after I figured it out.

The Postgres Dockerfile

This commit adds a Dockerfile for the database, and makes some small changes to the project to allow it to run. It has only 15 lines, but it took me a few hours to figure out. The process was similar to installing confusing software on your own machine: try to install, see some error like "missing postgres.h", go hunt around on the internet to figure out what you have to install to get past the error, and repeat.

Let’s go through each line of the Dockerfile.

FROM postgres:12

The first line defines the container image that this container starts from, which is officially maintained by the Postgres team. Looking at their Dockerfile, it starts from debian:buster-slim, which is a Debian Linux instance that is “slimmed” down to be suitable for docker containers, meaning it has few packages pre-installed. Most importantly, “Debian” tells us what package manager to use (apt-get) in our Dockerfile.

It’s also worth noting at this point that, when docker builds the container, each command in a docker file results in a new image. An image is a serialized copy of all the data in a docker container, so that it can be started or extended easily. And if you change a line halfway through your Dockerfile, docker only has to rebuild images from that step onward. You can publish images on the web, and other docker users can use them as a base. This is like forking a project on Github, and is exactly what happens when Docker executes FROM postgres:12.

ENV POSTGRES_USER docker
ENV POSTGRES_PASSWORD docker
ENV POSTGRES_DB divisor

These lines declare configuration for the database that the base postgres image will create when the container is started. The variable names are described in the “Environment Variables” section of the Postgres image’s documentation. The ENV command tells docker to instantiate environment variables (like the PATH variable in a terminal shell), that running programs can access. I’m insecurely showing the password and username here because the server the docker containers will run on won’t yet expose anything to the outside world. Later in this post you will see how to pass an environment variable from the docker command line when the container is run, and you would use something close to that to set configuration secrets securely.

RUN apt-get update \
        && apt-get install -y pgxnclient build-essential libgmp3-dev postgresql-server-dev-12 libmpc-dev

The RUN command allows you to run any shell command you’d like, in this case a command to update apt and install the dependencies needed to build the pgmp extension. This includes gcc and make via build-essential, and the gmp-specific libraries.

RUN apt-get install -y python3.7 python3-setuptools python3-pip python-pip python3.7-dev \
        && pip3 install wheel \
        && pip install six

Next we do something a bit strange. We install python3.7 and pip (because we will need to pip3 install our project’s requirements.txt), but also python2’s pip. Here’s what’s going on. The pgmp postgres extension needs to be built from source, and it has a dependency on python2.7 and the python2-six library. So the first RUN line here installs all the python-related tools we need.

RUN pgxn install pgmp

Then we install the pgmp extension.

COPY . /divisor
WORKDIR "/divisor"

These next two lines copy the current directory on the host machine to the container’s file system, and sets the working directory for all future commands to that directory. Note that whenever the contents of our project change, docker needs to rebuild the image from this step because any subsequent steps like pip install -r requirements.txt might have a different outcome.

RUN python3 -m pip install --upgrade pip
RUN pip3 install -r requirements.txt

Next we upgrade pip (which is oddly required for the numba dependency, though I can’t re-find the Github issue where I discovered this) and install the python dependencies for the project. The only reason this is required is because we included the database schema setup in the python script riemann/postgres_batabase.py. So this makes the container a bit more complicated than absolutely necessary. It can be improved later if need be.

ENV PGUSER=docker
ENV PGPASSWORD=docker
ENV PGDATABASE=divisor

These next lines are environment variables used by the psycopg2 python library to infer how to connect to postgres if no database spec is passed in. It would be nice if this was shared with the postgres environment variables, but duplicating it is no problem.

COPY setup_schema.sh /docker-entrypoint-initdb.d/

The last line copies a script to a special directory specified by the base postgres Dockerfile. The base dockerfile specifies that any scripts in this directory will be run when the container is started up. In our case, we just call the (idempotent) command to create the database. In a normal container we might specify a command to run when the container is started (our search container, defined next, will do this), but the postgres base image handles this for us by starting the postgres database and exposing the right ports.

Finally we can build and run the container

docker build -t divisordb -f divisordb.Dockerfile .
#  ... lots of output ...

docker run -d -p 5432:5432 --name divisordb divisordb:latest

After the docker build command—which will take a while—you will be able to see the built images by running docker images, and the final image will have a special tag divisordb. The run command additionally tells docker to run the container as a daemon (a.k.a. in the background) with -d, and with -p to publish port 5432 on the host machine and map it to 5432 on the container. This allows external programs and programs on other computers to talk to the container by hitting 0.0.0.0:5432. It also allows other containers to talk to this container, but as we’ll see shortly that requires a bit more work, because inside a container 0.0.0.0 means the container, not the host machine.

Finally, one can run the following code on the host machine to check that the database is accepting connections.

pg_isready --host 0.0.0.0 --username docker --port 5432 --dbname divisor

If you want to get into the database to run queries, you can run psql with the same flags as pg_isready, or manually enter the container with docker exec -it divisordb bash and run psql from there.

psql --host 0.0.0.0 --username docker --port 5432 --dbname divisor
Password for user docker: docker
divisor=# \d
              List of relations
 Schema |        Name        | Type  | Owner  
--------+--------------------+-------+--------
 public | riemanndivisorsums | table | docker
 public | searchmetadata     | table | docker
(2 rows)

Look at that. You wanted to disprove the Riemann Hypothesis, and here you are running docker containers.

The Search Container

Next we’ll add a container for the main search application. Before we do this, it will help to make the main entry point to the program a little bit simpler. This commit modifies populate_database.py‘s main routine to use argparse and some sensible defaults. Now we can run the application with just python -m riemann.populate_database.

Then the Dockerfile for the search part is defined in this commit. I’ve copied it below. It’s much simpler than the database, but somehow took just as long for me to build as the database Dockerfile, because I originally chose a base image called “alpine” that is (unknown to me at the time) really bad for Python if your dependencies have compiled C code, like numba does.

FROM python:3.7-slim-buster

RUN apt-get update \
        && apt-get install -y build-essential libgmp3-dev libmpc-dev

COPY . /divisor
WORKDIR "/divisor"

RUN pip3 install -r requirements.txt

ENV PGUSER=docker
ENV PGPASSWORD=docker
ENV PGDATABASE=divisor

ENTRYPOINT ["python3", "-m", "riemann.populate_database"]

The base image is again Debian, with Python3.7 pre-installed.

Then we can build it and (almost) run it

docker build -t divisorsearch -f divisorsearch.Dockerfile .
docker run -d --name divisorsearch --env PGHOST="$PGHOST" divisorsearch:latest 

What’s missing here is the PGHOST environment variable, which psycopg2 uses to find the database. The problem is, inside the container “localhost” and 0.0.0.0 are interpreted by the operating system to mean the container itself, not the host machine. To get around this problem, docker maintains IP addresses for each docker container, and uses those to route network requests between containers. The docker inspect command exposes information about this. Here’s a sample of the output

$ docker inspect divisordb
[
    {
        "Id": "f731a78bde50be3de1d77ae1cff6d23c7fe21d4dbe6a82b31332c3ef3f6bbbb4",
        "Path": "docker-entrypoint.sh",
        "Args": [
            "postgres"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            ...
        },
        ...
        "NetworkSettings": {
            ...
            "Ports": {
                "5432/tcp": [
                    {
                        "HostIp": "0.0.0.0",
                        "HostPort": "5432"
                    }
                ]
            },
            ...
            "IPAddress": "172.17.0.2",
            ...
        }
    }
]

The part that matters for us is the ip address, and the following extracts it to the environment variable PGHOST.

export PGHOST=$(docker inspect -f "{{ .NetworkSettings.IPAddress }}" divisordb)

Once the two containers are running—see docker ps for the running containers, docker ps -a to see any containers that were killed due to an error, and docker logs to see the container’s logged output—you can check the database to see it’s being populated.

divisor=# select * from SearchMetadata order by start_time desc limit 10;
         start_time         |          end_time          |       search_state_type       | starting_search_state | ending_search_state 
----------------------------+----------------------------+-------------------------------+-----------------------+---------------------
 2020-12-27 03:10:01.256996 | 2020-12-27 03:10:03.594773 | SuperabundantEnumerationIndex | 29,1541               | 31,1372
 2020-12-27 03:09:59.160157 | 2020-12-27 03:10:01.253247 | SuperabundantEnumerationIndex | 26,705                | 29,1541
 2020-12-27 03:09:52.035991 | 2020-12-27 03:09:59.156464 | SuperabundantEnumerationIndex | 1,0                   | 26,705

Ship it!

I have an AWS account, so let’s use Amazon for this. Rather than try the newfangled beanstalks or lightsails or whatever AWS-specific frameworks they’re trying to sell, for now I’ll provision a single Ubuntu EC2 server and run everything on it. I picked a t2.micro for testing (which is free). There’s a bit of setup to configure and launch the server—such as picking the server image, downloading an ssh key, and finding the IP address. I’ll skip those details since they are not (yet) relevant to the engineering process.

Once I have my server, I can ssh in, install docker, git clone the project, and run the deploy script.

# install docker, see get.docker.com
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu

# log out and log back in

git clone https://github.com/j2kun/riemann-divisor-sum && cd riemann-divisor-sum
bash deploy.sh

And it works!

Sadly, within an hour the divisorsearch container crashes because the instance runs out of RAM and CPU. Upgrading to a t2.medium (4 GiB RAM), it goes for about 2 hours before exhausting RAM. We could profile it and find the memory hotspots, but instead let’s apply a theorem due to billionaire mathematician Jim Simons: throw money at the problem. Upgrading to an r5.large (16 GiB RAM), it runs comfortably all day.

Four days later, I log back into the VM and notice things are sluggish, even though the docker instance isn’t exhausting the total available RAM or CPU. docker stats also shows low CPU usage on divisorsearch. The database shows that it has only gotten up to 75 divisors, which is just as far as it got when I ran it (not in Docker) on my laptop for a few hours in the last article.

Something is amiss, and we’ll explore what happened next time.

Notes

A few notes on improvements that didn’t make it into this article.

In our deployment, we rebuild the docker containers each time, even when nothing changes. What one could do instead is store the built images in what’s called a container registry, and pull them instead of re-building them on every deploy. This would only save us a few minutes of waiting, but is generally good practice.

We could also use docker compose and a corresponding configuration file to coordinate launching a collection of containers that have dependencies on each other. For our case, the divisorsearch container depended on the divisordb container, and our startup script added a sleep 5 to ensure the latter was running before starting the former. docker compose would automatically handle that, as well as the configuration for naming, resource limits, etc. With only two containers it’s not that much more convenient, given that docker compose is an extra layer of indirection to learn that hides the lower-level commands.

In this article we deployed a single database container and a single “search” container. Most of the time the database container is sitting idle while the search container does its magic. If we wanted to scale up, an obvious way would be to have multiple workers. But it would require some decent feature work. A sketch: reorganize the SearchMetadata table so that it contains a state attribute, like “not started”, “started”, or “finished,” then add functionality so that a worker (atomically) asks for the oldest “not started” block and updates the row’s state to “started.” When a worker finishes a block, it updates the database and marks the block as finished. If no “not started” blocks are found, the worker proceeds to create some number of new “not started” blocks. There are details to be ironed out around race conditions between multiple workers, but Postgres is designed to make such things straightforward.

Finally, we could reduce the database size by keeping track of a summary of a search block instead of storing all the data in the block. For example, we could record the n and witness_value corresponding to the largest witness_value in a block, instead of saving every n and every witness_value. In order for this to be usable—i.e., for us to be able to say “we checked all possible n < M and found no counterexamples”—we’d want to provide a means to verify the approach, say, by randomly verifying the claimed maxima of a random subset of blocks. However, doing this also precludes the ability to analyze patterns that look at all the data. So it’s a tradeoff.

by j2kun at Monday, January 4, 2021

Monday, December 21, 2020

Scheme Requests for Implementation

SRFI 206: Auxiliary Syntax Keywords

SRFI 206 is now in final status.

This SRFI defines a mechanism for defining auxiliary syntax keywords independently in different modules in such a way that they still have the same binding, so that they can be used interchangeably as literal identifiers in syntax-rules and syntax-case expressions and can both be imported under the same name without conflicts.

by Marc Nieper-Wißkirchen at Monday, December 21, 2020

Thursday, December 17, 2020

Scheme Requests for Implementation

SRFI 209: Enums and Enum Sets

SRFI 209 is now in final status.

Enums are objects that serve to form sets of distinct classes that specify different modes of operation for a procedure. Their use fosters portable and readable code.

by John Cowan (text) and Wolfgang Corcoran-Mathe (implementation) at Thursday, December 17, 2020

Tuesday, December 8, 2020

Programming Praxis

Rhyming Sort

I recently found on the internet a delightful paper by Kenneth Ward Church called Unix™ for Poets that provides an introduction to the Unix command line utilities for text processing. The paper must be ancient, because it gives the author’s email address at research.att.com, but I’ve never seen it before. One of the examples in the paper is rhyming sort, in which words are sorted from the right rather than the left; the left column below is in normal alphabetical order, the right column is in rhyming order:

    falsely         freely
    fly             sorely
    freely          surely
    sorely          falsely
    surely          fly

Church says:

“freely” comes before “sorely” because “yleerf” (“freely” spelled backwards) comes before “yleros” (“sorely” spelled backwards) in lexicographic order. Rhyming dictionaries are often used to help poets (and linguists who are interested in morphology).

Church solved the problem with a rev | sort | rev pipeline.

Your task is to sort a file containing one word per line into rhyming order. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
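
In Scheme, here is one possible sketch of the same rev | sort | rev idea (R7RS-small plus SRFI 132 for list-sort), reading one word per line from a hypothetical words.txt; it is not the suggested solution mentioned above.

(import (scheme base) (scheme file) (scheme write) (only (srfi 132) list-sort))

(define (string-reverse s)
  (list->string (reverse (string->list s))))

;; Compare words by their reversals, i.e. from the right.
(define (rhyming-sort words)
  (list-sort (lambda (a b)
               (string<? (string-reverse a) (string-reverse b)))
             words))

(define (read-lines port)
  (let loop ((acc '()))
    (let ((line (read-line port)))
      (if (eof-object? line)
          (reverse acc)
          (loop (cons line acc))))))

;; words.txt is a placeholder filename: one word per line.
(for-each (lambda (w) (display w) (newline))
          (rhyming-sort (call-with-input-file "words.txt" read-lines)))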

by programmingpraxis at Tuesday, December 8, 2020

Friday, December 4, 2020

Mark Damon Hughes

Basic Games in Scheme

The first project I write in any new language is usually a guess-the-number game, a die roller, or an RPN calculator; then I start collecting those and other toys and utilities into a single "main menu" program, and use that to drive me to develop my libraries and play with different algorithms. Occasionally it's useful; mostly it's just a pile of stuff.

The Scheme version has a couple useful things. I was mostly thinking about old BASIC games, so it's "BasicSS" (SS being the Chez Scheme file extension, not anything more nautical or sinister).

I wrote a fairly malevolent wordsearch generator in the process of testing some file parsing, so here's one for 20 programming languages. I can tell you that B, C, C#, and D are not in my list. I'm doubtful that anyone can find all of them, or even half.

Hangman depends on /usr/share/dict/words, 235,886 lines on my system, which is very unfair:

 
 #     |
 #    ---
 # \ (o o) /
 #  \ --- /
 #   \ X /
 #    \X/
 #     X
 #     X
 #    / \
 #   /   \
 #
Word: TE---EN--
Guesses: E, T, A, O, I, N, B, R, S
YOU LOSE! You have been hung.
The word was TEMULENCY.

Seabattle ("you sunk my…") sucks, it just picks targets at random; teaching it some AI would help.

Hurkle, like all the early-'70s "find a monster on a grid" games, is awful, but the map display makes it a little easier to track your shots. "The Hurkle is a Happy Beast" by Theodore Sturgeon is one of his 10% good stories, but it provides only a little context.

Some of this I can release source for, some I probably shouldn't, so it's just a binary for now.

by mdhughes at Friday, December 4, 2020

GNU Guix

Add a subcommand showing GNU Guix history of all packages

Hello, everyone! I'm Magali, and for the next three months I'll be an Outreachy intern in the GNU Guix community. As part of my Outreachy application process, I made my first ever contribution to Free Software by adding a package to Guix, and since then I've been eager to contribute even more.

My task for this three-month period is to add a subcommand showing the history of all packages. Although Guix makes it possible to install and keep older versions of a package, it isn't as easy to find, for example, the commits related to those versions.

The subcommand I'll implement will be something like guix git log. The idea is that, for instance, when the user invokes guix git log --oneline | grep msmtp, a list with all the commits, one per line, related to msmtp, will be shown.

In order to accomplish my task, I have to sharpen up my Scheme skills and learn more about guile-git, a Guile library that provides bindings to libgit2. So, to begin with, I'll dive into the Guix code and see how commands are built.

By the end of this internship, I hope to learn much more than just programming. I also expect to gain meaningful experience and improve my communication skills.

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the kernel Linux, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

by Magali Lemes at Friday, December 4, 2020

Monday, November 30, 2020

GNU Guix

Welcome our intern for the Outreachy 2020-2021 round

We are thrilled to announce that Magali L. Sacramento (IRC: lemes) will join Guix as an Outreachy intern over the next few months.

Outreachy logo

Magali will work on adding a subcommand to Guix showing the history of all packages. This will facilitate the use of guix time-machine and inferiors, as it will add support to easily search for a given package version on all the defined channels.

Simon Tournier will be the primary mentor, with Gábor Boskovits co-mentoring, and the whole community will undoubtedly help and provide guidance, as it has always done.

Welcome, and looking forward to working together!

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

by Gábor Boskovits, Simon Tournier at Monday, November 30, 2020

Sunday, November 29, 2020

Scheme Requests for Implementation

SRFI 218: Unicode Numerals

SRFI 218 is now in draft status.

These procedures allow the creation and interpretation of numerals using any set of Unicode digits that support positional notation.

by John Cowan (text) and Arvydas Silanskas (implementation) at Sunday, November 29, 2020

SRFI 217: Integer Sets

SRFI 217 is now in draft status.

Integer sets, or isets, are unordered collections of fixnums. (Fixnums are exact integers within certain implementation-specified bounds.)

by John Cowan (text) and Wolfgang Corcoran-Mathe (implementation) at Sunday, November 29, 2020