Planet Scheme

Saturday, October 16, 2021

Idiomdrottning

Gemini Ain’t Broke

Gemini is good the way it is and a lot of the proposed changes lately stress me out.

I didn’t want to care about Gemini.

I’ve said many times that I saw it as “more specs”, as part of the web, part of the problem with the web’s complexity and not the solution.

I reluctantly jammed a capsule together, thinking “what’s the harm”.

Every time the community was proposing changes that would make it worse I would try to remind myself that I didn’t need to get worked up, that I didn’t need to care, that I didn’t agree with the idea in the first place, that it’s just an offshoot to my normal homepage, that I don’t get this invested in Atom or CSS spec stuff.

I was wrong.

I do need to care

I do need to care.

One reason is the community, which, as with everything else, is Sturgeon’s Law, but I’ve found some capsule gems.

The other is the whole… Often it feels like everyone on the web and the Google and Apple app stores is out to get you. A razor in every apple. Here in the EU, GDPR has made that clear: I go to what I’ve always believed was someone’s cozy, personal recipe web site and up pops ten thousand “legitimate interest” checkboxes. (It’s not that GDPR is bad, I appreciate it for exposing the maggots in the woodwork that had been there for a while.)

The idea that sites also can bitcoin mine on your machine with JavaScript is also pretty messed up.

On Gemini, there’s none of that paranoia. It’s just text. Sure, there’s the occasional dork that tries to submit half a gigabyte archives to Antenna but for the most part it’s just text. If I’m reading a gem file I know that I’m just reading a gem file.

End of an era

Slopes can be pretty slippery and I just remember so clearly how the original web got gradually messed up.

  • Tables for layout as opposed to data.
  • Images for texts as opposed to pictures.
  • Bad taste in font design and selection. OMFG Verdana with its messed up X-height that made everyone use font-size puny so that every non-Verdana machine was messed up.
  • Frames. So awful.
  • Broken, unfinished tags leading to the entire rest of the page being in italics.
  • That time the entire web was just huge blocks of broken puzzle pieces because everything was Java and Flash.

It’s gonna sound weird but CSS and JS made things better, not worse, because people could then do the messed up and ugly things they wanted to do in a way that was easy to disable. For a brief while, until that, too, was exploited to pieces. Popups and XSS and tracking and rest in peace web.

It could easily happen again to you folks

That is what Gemini is in the future if we’re not careful. Gemini has become this precious, cozy thing. It works. It’s good. There’s good stuff there.

If Gemini becomes the web, what, then, is even the point of Gemini?

by Idiomdrottning (sandra.snan@idiomdrottning.org) at Saturday, October 16, 2021

Thursday, October 14, 2021

Jeremy Kun

Group Actions and Hashing Unordered Multisets

I learned of a neat result due to Kevin Ventullo that uses group actions to study the structure of hash functions for unordered sets and multisets. This piqued my interest because a while back a colleague asked me if I could think of any applications of “pure” group theory to practical computer programming that were not cryptographic in nature. He meant, not including rings, fields, or vector spaces whose definitions happen to be groups when you forget the extra structure. I couldn’t think of any at the time, and years later Kevin has provided me with a tidy example. I took the liberty of rephrasing his argument to be a bit simpler (I hope), but I encourage you to read Kevin’s post to see the reasoning in its original form.

Hashes are useful in programming, and occasionally one wants to hash an unordered set or multiset in such a way that the hash does not depend on the order the elements were added to the set. Since collection types are usually generic, one often has to craft a hash function for a set<T> or multiset<T> that relies on a hash function hash(t) of an unknown type T. To make things more efficient, rather than requiring one to iterate over the entire set each time a hash is computed, one might seek out a hash function that can be incrementally updated as new elements are added, and provably does not depend on the order of insertion.

For example, having a starting hash of zero, and adding hashes of elements as they are added (modulo 2^{64}) has this incremental order-ignorant property, because addition is commutative and sums can be grouped. XOR-ing the bits of the hashes is similar. However, both of these strategies have downsides.

For example, if you adopt the XOR strategy for a multiset hash, then any element that has an even quantity in the multiset will be the same as if it were not in the set at all (or if it were in the set with some other even quantity). This is because x XOR x == 0. On the other hand, if you use the addition approach, if an element hashes to zero, its inclusion in any set has no effect on the hash. In Java the integer hash is the identity, so zero would be undetectable as a member of a multiset of ints in any quantity. Less drastically, a multiset with all even counts of elements will always hash to a multiple of 2, and this makes it easier to find hash collisions.

A natural question arises: given the constraint that the hash function is accumulative and commutative, can we avoid such degenerate situations? In principle the answer should obviously be no, just by counting. I.e., the set of all unordered sets of 64-bit integers has size 2^{2^{64}}, while the set of 64-bit hashes has size merely 2^{64}. You will have many many hash collisions, and would need a much longer hash to avoid them in principle.

More than just “no,” we can characterize the structure of such hash functions. They impose an abelian group structure on the set of hashes. And due to the classification theorem of finite abelian groups, up to isomorphism (and for 64-bit hashes) that structure consists of addition on blocks of bits with various power-of-2 moduli, and the blocks are XOR’ed together at the end.

To give more detail, we need to review some group theory, write down the formal properties of an accumulative hash function, and prove the structure theorem.

Group actions, a reminder

This post will assume basic familiarity with group theory as covered previously on this blog. Basically, this introductory post defining groups and actions, and this followup post describing the classification theorem for commutative (abelian) groups. But I will quickly review group actions since they take center stage today.

A group G defines some abstract notion of symmetries in a way that’s invertible. But a group is really meaningless by itself. They’re only useful when they “act” upon a set. For example, a group of symmetries of the square acts upon the square to actually rotate its points. When you have multiple group structures to consider, it makes sense to more formally separate the group structure from the set.

So a group action is formally a triple of a group G, a set X, and a homomorphism f:G \to S_X, where S_X is the permutation group (or symmetry group) of X, i.e., the set of all bijections X \to X. The permutation group of a set encompasses every possible group that can act on X. In other words, every group is a subgroup of a permutation group. In our case, G and f define a subgroup of symmetries on X via the image of f. If f is not injective, some of the structure in G is lost. The homomorphism determines which parts of G are kept and how they show up in the codomain. The first isomorphism theorem says how: G / \textup{ker} f \cong \textup{im} f.

This relates to our hashing topic because an accumulative hash—and a nicely behaved hash, as we’ll make formal shortly—creates a group action on the set of possible hash values. The image of that group action is the “group structure” imposed by the hash function, and the accumulation function defines the group operation in that setting.

Multisets as a group, and nice hash functions

An appropriate generalization of multisets whose elements come from a base set X forms a group. This generalization views a multiset as a “counting function” T: X \to \mathbb{Z}. The empty set is the function that assigns zero to each element. A positive value of k implies the entry shows up in the multiset k times. And a negative value is like membership “debt,” which is how we represent inverses, or equivalently set difference operations. The inverse of a multiset T, denoted -T, is the multiset with all counts negated elementwise. Since integer-valued functions generally form a group under point-wise addition, these multisets also do. Call this group \textup{MSet}(X). We will freely use the suggestive notation T \cup \{ x \} to denote the addition of T and the function that is 1 on x and 0 elsewhere. Similarly for T - \{ x \}.

\textup{MSet}(X) is isomorphic to the free abelian group on X (because an instance of a multiset only has finitely many members). Now we can define a hash function with three pieces:

  • An arbitrary base hash function \textup{hash}: X \to \mathbb{Z} / 2^n \mathbb{Z}.
  • An arbitrary hash accumulator \textup{h}: \mathbb{Z} / 2^n \mathbb{Z} \times \mathbb{Z} / 2^n \mathbb{Z} \to \mathbb{Z} / 2^n \mathbb{Z}
  • A seed, i.e., a choice for the hash of the empty multiset s \in \mathbb{Z} / 2^n \mathbb{Z}

With these three data we want to define a multiset hash function h^*: \textup{MSet}(X) \to \mathbb{Z} / 2^n \mathbb{Z} recursively as

  • h^*(\left \{ \right \}) = s
  • h^*(T \cup \left \{ x \right \}) = h(h^*(T), \textup{hash}(x))
  • h^*(T - \left \{ x \right \}) = \dots

In order for the second bullet to lead to a well-defined hash, we need the property that the accumulation order of individual elements does not matter. Call a hash accumulator commutative if, for all a, b, c \in \mathbb{Z} / 2^n \mathbb{Z},

\displaystyle h(h(a,b), c) = h(h(a,c), b)

This extends naturally to being able to reorder any sequence of hashes being accumulated.

The third is a bit more complicated. We need to be able to use the accumulator to “de-accumulate” the hash of an element x, even when the set that gave rise to that hash didn’t have x in it to start.

Call a hash accumulator invertible if for a fixed hash z = \textup{hash}(x), the map a \mapsto h(a, z) is a bijection. That is, accumulating the hash z to two sets with different hashes under h^* will not cause a hash collision. This defines the existence of an inverse map (even if it’s not easily computable). This allows us to finish the third bullet point.

  • Fix z = \textup{hash}(x) and let g be the inverse of the map a \mapsto h(a, z). Then h^*(T - \left \{ x \right \}) = g(h^*(T))

Though we don’t have a specific definition for the inverse above, we don’t need it because we’re just aiming to characterize the structure of this soon-to-emerge group action. Though, in all likelihood, if you implement a hash for a multiset, it should support incremental hash updates when removing elements, and that formula would apply here.

This gives us the well-definition of h^*. Commutativity allows us to define h^*(T \cup S) by decomposing S arbitrarily into its constituent elements (with multiplicity), and applying h^*(T \cup \{ x \}) or h^*(T - \{ x \}) in any order.

A group action emerges

Now we have a well-defined hash function on a free abelian group.

\displaystyle h^*: \textup{MSet}(X) \to \mathbb{Z} / 2^n \mathbb{Z}

However, h^* is not a homomorphism. There’s no reason hash accumulation needs to mesh well with addition of hashes. Instead, the family of operations “combine a hash with the hash of some fixed set” defines a group action on the set of hashes. Let’s suppose for simplicity that h^* is surjective, i.e., that every hash value can be achieved as the hash of some set. Kevin’s post gives the more nuanced case when this fails, and in that case you work within S_{\textup{im}(h^*)} instead of all of S_{\mathbb{Z} / 2^n \mathbb{Z}}.

The group action is defined formally as a homomorphism

\displaystyle \varphi : \textup{MSet}(X) \to S_{\mathbb{Z} / 2^n \mathbb{Z}}

where \varphi(T) is the permutation a \mapsto h(a, h^*(T)). Equivalently, we start from a, pick some set S with h^*(S) = a, and output h^*(T \cup S).

The map \varphi is a homomorphism. The composition of two accumulations is order-independent because h is commutative. This is how we view h as “the binary operation” in \textup{im} \varphi, because combining two permutations a \mapsto h(a, h^*(T)) and a \mapsto h(a, h^*(S)) is the permutation latex a \mapsto h(a, h^*(S \cup T)).

And now we can apply the first isomorphism theorem, that

\displaystyle \textup{MSet}(X) / \textup{ker} \varphi \cong \textup{im} \varphi \subset S_{\mathbb{Z} / 2^n \mathbb{Z}}

This is significant because any quotient of an abelian group is abelian, and this quotient is finite because S_{\mathbb{Z} / 2^n \mathbb{Z}} is finite. This means that the group \textup{im} \varphi is isomorphic to

\displaystyle \textup{im} \varphi \cong \bigoplus_{i=1}^k \mathbb{Z}/2^{n_i} \mathbb{Z}

where n = \sum_i n_i, and where the operation in each component is the usual addition modulo n_i. The i-th summand corresponds to a block of n_i bits of the hash, and within that block the operation is addition modulo 2^{n_i}. Here the “block” structure is where XOR comes in. Each block can be viewed as a bitmask with zeros outside the block, and two members are XOR’ed together, which allows the operations to apply to each block independently.

For example, the group might be \mathbb{Z} / 2^{4} \mathbb{Z} \times \mathbb{Z} / 2^{26} \mathbb{Z}\times \mathbb{Z} / 2^{2} \mathbb{Z} for a 32-bit hash. The first block corresponds to 32-bit unsigned integers whose top 4 bits may be set but all other bits are zero. Addition is done within those four bits modulo 16, leaving the other bits unchanged. Likewise, the second component has the top four bits zero and the bottom two bits zero, but the remaining 26 bits are summed mod 2^{24}. XOR combines the bits from different blocks.

In one extreme case, you only have one block, and your group is just \mathbb{Z} / 2^n \mathbb{Z} and the usual addition combines hashes. In the other extreme, each bit is its own block, your group is (\mathbb{Z} / 2 \mathbb{Z})^n, the operation is a bitwise XOR.

Note, if instead of 2^n we used a hash of some other length m, then in the direct sum decomposition above, m would be the product of the sizes of the components. The choice m = 2^n maximizes the number of different structures you can have.

Implications for hash function designers

Here’s the takeaway.

First, if you’re trying to design a hash function that avoids the degeneracies mentioned at the beginning of this article, then it will have to break one of the properties listed. This could happen, say, by maintaining additional state.

Second, if you’re resigned to use a commutative, invertible, accumulative hash, then you might as well make this forced structure explicit, and just pick the block structure you want to use in advance. Since no clever bit shifting will allow you to outrun this theorem, you might as well make it simple.

Until next time!

by j2kun at Thursday, October 14, 2021

Joe Marshall

Update October 2021

Here's a few things I've been playing with lately.

jrm-code-project/utilities has a few utilities that I commonly use. Included are utilities/lisp/promise and utilities/lisp/stream which provide S&ICP-style streams (lazy lists). utilities/lisp/utilities is a miscellaneous grab bag of functions and macros.

jrm-code-project/homographic is a toolkit for linear fractional transforms (homographic functions). In addition to basic LFT functionality, it provides examples of exact real arithmetic using streams of LFTs.

jrm-code-project/LambdaCalculus has some code for exploring lambda calculus.

jrm-code-project/CLRLisp is an experimental Lisp based on the .NET Common Language Runtime. The idea is that instead of trying to adapt a standard Lisp implementation to run on the .NET CLR, we just add a bare-bones eval and apply that use the CLR reflection layer and see what sort of Lisp naturally emerges. At this point, it only just shows signs of life: there are lambda expressions and function calls, but no definitions, conditionals, etc. You can eval lists: (System.Console.WriteLine "Hello World."), but I haven't written a reader and printer, so it is impractical for coding.

by Joe Marshall (noreply@blogger.com) at Thursday, October 14, 2021

Wednesday, October 6, 2021

Idiomdrottning

Bad Emacs defaults

Littering backup files all over the place

Emacs by default leaves files~ and #files# all over.

This is annoying because those files may get autoloaded.

A solution is

(defun make-backup-file-name (file)
  (concat "~/.emacs_backups/" (file-name-nondirectory file) "~"))

I was working on a new installation where I didn’t have this in place yet and I kept trying to update a shell function, re-sourcing it but it never changed. Turns out it was loading from a backup~ of the same file.

Backs up by moving the actual file

(setq backup-by-copying t)

The default is nil and that means that every time it makes one of those backups~, it actually moves your file there, copies the backup to the original name, and gives you the copy to work on. This isn’t just a philosophical dilemma (“Am I a butterfly that dreams I am a backup~?”) but actually breaks hardlinks. I can’t believe this is the default.

Sentences have two spaces between them

(setq sentence-end-double-space nil)

The default is t which might’ve made sense in the typewriter era but not only messes up filling paragraphs, it also borks the wonderful family of forward-sentence, kill-sentence, transpose-sentences etc.

Indentation, tabs, and spaces

Emacs famously has its idiosyncratic brace style and indentation style using a mix of tabs and spaces that no-one else uses.

Which would be fine in a vacuum but you end up fighting it when making changes in other people’s projects.

We’re on a super AI Lisp genius pile, can’t it figure it out from the rest of the file or the other files in the directory?

For this, I still use the old guess-style packages but I hope there is a better way. Write in!

Ctrl-H doesn’t delete backwards

C-h being a convenient, home row way to backspace has been a thing since the original ASCII table was laid out, and is a staple feature whenever you see “Emacs shortcuts supported” like bash or zsh. Except in Emacs itself, where it launches a huge, prompting, window-splitting help affair.

This was the first Emacs setting I ever changed.

Not sure what’s the best way to do it; I use:

(global-set-key [(control h)] 'delete-backward-char)
(keyboard-translate ?\C-h ?\C-?)

by Idiomdrottning (sandra.snan@idiomdrottning.org) at Wednesday, October 6, 2021

Thursday, September 30, 2021

Idiomdrottning

Separating schema from selection

In this 2018 talk at Clojure Conj, Rich Hickey proposes separating “schema� from “selection�.

The example he gives (in pseudocode) is this:

(s/def ::street string?)
(s/def ::city string?)
(s/def ::state state-code?)
(s/def ::zip int?)
(s/def ::addr (s/schema [[::street ::city ::state ::zip]]))

(s/def ::id int?)
(s/def ::first string?)
(s/def ::last string?)
(s/def ::user (s/schema [[::id ::first ::last ::addr]]))

for the schema, and then using those schemata in selections like

(get-movie-times
 user =>
 (s/select ::user [::id ::addr {::addr [::zip]}]))

or

(place-order
 user =>
 (s/select ::user
           [::first ::last ::addr
                    {::addr [::street ::city ::state ::zip]}]))

(Or, he wrote :: city with an extra space in there but we get the gist.)

And I’m, at first, like: “That’s great!�

(He had already won me over by slagging Haskell’s “Maybe� type, something I have also been ragging on. The more I get into Clojure the more I find that they’ve been a couple of steps ahead of me on the same sorta stuff I’ve been writing about for the past year or so♥.)

But then I’m like wait a minute… If you’ve got the “selection�, why the heck do you even need the schema?

Don’t we wanna single-point-of-truth this stuff? And the “selection� is clearly holding “the truth�. A user in the context of someone requesting movie times is a user id and a zip code. That’s the “truth� in the context of that selection. The schema isn’t doing much—on the leaf level, sure, it’s specifying that it’s ints we’re requesting, but there’s nada on the tree level. The schema is just sitting there like a moldy peach.

If we can define and reference vars, we can handle both the leaf issue (the int checking) and we can also reuse parts of schema.

That’s what Rich wanted in the first place; to reuse schemata between requirement and production.

Additionally, and I know that this was just placeholder syntax, but it’s super cumbersome to

{::addr [::street ::city ::state ::zip]}

Even in Zork we could get everything from the mailbox.

The human compiler at work.💔

by Idiomdrottning (sandra.snan@idiomdrottning.org) at Thursday, September 30, 2021

Scheme, if you already know how to program other stuff

I wanna make a separate “Scheme if you don’t know how to program” tutorial some other time!

We’re gonna use this example, a program that asks for two numbers, adds them, and displays the result.

(define (ask question)
  (display question)
  (newline)
  (read))

(display
 (+ (ask "What's the first number?")
    (ask "What's the second number?")))

(newline)

Functions and function calls

(foo bar baz)

is the equivalent of what’s in C, Java, Javascript etc would’ve been written as:

foo(bar, baz);

How to write without Emacs

A shorthand to count up parenthesis more easily!

Let’s look at this part as an example:

(display
 (+ (ask "What's the first number?")
    (ask "What's the second number?")))

To know how many parentheses are gonna be on that last line (in this case three), you can count all the ( that is further to the left.

So in this example:

(define (ask question)
  (display question)
  (newline)
  (read))

You are on the last line, “read”, and there are two parenthesis to the left of it.

You don’t have to look at the lines (display question) and (newline) because they’re not to the left.

Also decrement the count for each closing parentheses in the same line:

(+ (ask "hi") (ask "there"))

has two closing parentheses.

Extending the language

Now, the difference between a macro and a function is…

Uh, let’s start with the similarity.

(foo bar baz quux)

(foo bar baz quux)

They look kinda similar.

A function gets the values of those variables. So if foo is a function, if bar is 1, baz is 2, and quux is 3, the function will see those numbers and will never know their names.

If foo is macro, it will see the whole (foo bar baz quux) list and can manipulate it as any other list. But it can never see the numbers.

A macro will return another list. For example, the macro might turn that example into

(let ((hi (+ baz baz)))
  (+ hi (* bar bar)))

and that’s what’ll get called. So that is where those numbers will then be seen and used by the language.

That is kind of a pointless example but just to show what it can and can’t do.

Also, functions work from the inside out and macros from the outside in.

(foo bar (+ 13 14))

If foo is a function, it sees whatever value bar has, and 27. If it’s a macro, it instead sees that expression as a whole. It can change the plus to a multiplication if you want. This is the wonder of macros: you can deal with stuff in an unusual order, or conditionally skip things or repeat them or whatever. With macros you can create new syntax, new language features, but in the end it’s got to be functions that do the actual work.

All there is to it

In Scheme, stuff like “if” and “let” and “define” are macros.

(if (< 1 3) 'yes 'no)

That becomes 'yes.

(define (ask question)
  (display question)
  (newline)
  (read))

That doesn’t become anything at all. But it binds a function to the variable “ask” in the top level, and that function displays the question, displays a newline, and reads (and becomes, or, “evaluates to”) the answer.

“Evaluate”, what’s with the lingo? Just means it turns something into another “value”. E-value-ate.

Conventions and lingo

  • A procedure is the same as a function (pretty much)
  • A macro is the same as a special form
  • A zero-argument procedure is nicknamed a “thunk”
  • A true/false question procedure is nicknamed a “predicate” (and sometimes have their name end with a question mark)
  • Functions that only return values are called “pure”
  • Functions that aren’t pure are said to have “side-effects” (and sometimes have their name end with an exclamation mark)

A function that doesn’t have a name is called a “lambda”.

(define (ask question)
  (display question)
  (newline)
  (read))

is shorthand for

(define ask
  (lambda (question)
    (display question)
    (newline)
    (read)))

Functions are values, too, so (define another-ask ask) works, for example, or:

(map ask (list "First question" "Second question" "Third question"))

That might show you the questions in a weird order in some Schemes, but you’ll end up with a list of the answers.

That’s right! All this was sneakily just a buildup to explaining lambda calculus! Which you now know, since you know programming, and all programming is based on it. Thanks, Alonzo Church!

(Note that macro names can’t be passed around the same way.)

What functions and macros are available?

At this point you’re not learning new grammar, you’re learning vocabulary, and a tutorial isn’t necessary anymore; instead, go look at a long list of existing functions and macros. They’re also what you are gonna use for making your own functions and macros. Many implementations have specific stuff here that’s kinda golden compared to the default standard old stuff.

I suggest you learn a specific implementation instead of just trying to learn all Scheme versions at once. Read your implementation’s docs and also look in the R5RS index.

by Idiomdrottning (sandra.snan@idiomdrottning.org) at Thursday, September 30, 2021

Saturday, September 25, 2021

Idiomdrottning

When worse got better

Why in the burning ouchy heckfire did the Lisp community, for all its talk of “worse is better”, never actually roll up our sleeves and think of this!

I use and love the sexp output for the rare few apps that support it, like notmuch, and of course xmlstarlet is amazing, but we never did a large scale, ambitious, sexpify everything project like this. The JSON dorks beat us to the punch! We should just shame scoop on the spot.

I hate JSON and I’ve never even used my own xj filter.

When you just needs some nested alists JSON overkill and when you need the full schema and xpath rig it’s underkill compared to SXML but you know what they’ve got that we don’t?

It’s there. They actually delivered.

This “jc” is a killer app among killer apps. Holy smoke.

Why don’t you put the whole world in a serialized object notation, Superman? Well, they just up and did that.

by Idiomdrottning (sandra.snan@idiomdrottning.org) at Saturday, September 25, 2021

Tuesday, September 21, 2021

Idiomdrottning

Whitespace philosophy (for Lisp stuff)

Can people who put in a ton of whitespace in the middle of lines so that (on monospace) bindings or table values match up vertically please lay that off? Thank you.

  • It looks messed up on proportional fonts and on screenreader.
  • It’s difficult to extend; say you need to add a new line and it makes some of the “columns” now too narrow. Your diff now has to touch every line.
  • It makes it fiddly to add code in that area and to use Paredit properly.
  • You can’t string search for key/value pairs or name/binding pairs without a whitespace regex.

I don’t want my Lisp to be ASCII art. This isn’t Befunge.

(This isn’t refering to whitespace at the beginning of lines, a.k.a. indentation. That’s fine.)

by Idiomdrottning (sandra.snan@idiomdrottning.org) at Tuesday, September 21, 2021

Tuesday, September 14, 2021

Programming Praxis

Motzkin Numbers

[ I’ve had a couple of readers ask where I have been. I put in my retirement papers at work a few months ago, and will retire at the end of 2021. That sounds good, but setting up my post-retirement finances, learning about Medicare (it’s a huge mess) and deciding what to do with the rest of my life has been exhausting, and I still have a job to do. Here’s a simple exercise; I’ll try to be more regular about posting in future weeks. — Phil ]

Your task is to write a program to generate Motzkin Numbers (A001006). If you don’t like Motzkin numbers, pick any other sequence from the OEIS. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at Tuesday, September 14, 2021

Friday, September 3, 2021

Ruse Scheme

Ruse Scheme shall be

2021-05-18 - Ruse Scheme shall be

Ruse Scheme, formely known as Arew Scheme, is at this stage, a collection of Scheme libraries for Chez Scheme. There is a grand scheme plan plot machination for it. Read on.

What is a civilization kit?

A civilization kit is a software or set of software that ease the organization of life. So far, there is really one civkit that is mostly privateer that includes and is not limited to:

  • Wikimedia project such as Wikipedia, Wikidata, Wiktionary...
  • Google, Facebook, Github, Instagram, Twitter, Reddit, StackOverflow, Quora...
  • Android, iOS, Firefox, Chrome..
  • MacOS, FreeBSD, NetBSD, Windows, Debian, Fedora, Ubuntu...
  • Mastodon, and other projects that rely on activitypub...

And many more... that is only the visible part of Earth software system. They are software that aim to ease the the production of software or hardware. They are also software that helps with governance, provide tools to ease law making process, sustain production chain of food, energy, medecine, culture, education...

They are a lot of software, and that collection form a civkit.

Is Ruse Scheme a new Scheme?

Yes, and no. It depends what is meant by a new Scheme.

Sometime a Scheme is a software that gathers many Scheme libraries, and rely on existing Scheme to execute their code. That is the case of Ruse.

Most of the time, a Scheme is a software that interpret and/or compile a lot of parentheses that is more or less compatible with RnRS. In this regard, Ruse is a Scheme, but it is not completly new. It rely on Chez Scheme to produce executables that can be run on a server or a desktop. Ruse will support Web Assembly and JavaScript to run Scheme in a Web browser.

Some Scheme implementation do a little of both, and also deliver features that go beyond past or current RnRS. Ruse does that, and shall reach beyond...

The main difference with existing Scheme implementations is not found at the programming language level. Ruse is and will stay a Scheme.

The main area Ruse try to innovate is the rest: whether it is the the production or sharing of code, Ruse aim to make it easier than sharing a meme. Another area Ruse try to innovate is to state upfront the scope of the project.

What are the short term goal of Ruse Scheme?

The short term goal of Ruse Scheme is to build a scalable search engine: Babelia. Babelia will both scale-up and scale-down in terms of required hardware. In other words, it may run in the cloud or on a Raspberry Pi.

That first milestone will demonstrate how to build a distributed Von Neumann architecture that aim to be easier to work with than current approaches.

This is the first milestone because it is easier than going fully dencentralized first. It will deliver the necessary tools to work with the current Internet.

The plan is to deliver Babelia in 2022.

What is the next Internet?

The next Internet is an Internet that is more open, more decentralized, more accessible, and resting upon the fundamental principle.

What is the distributed Von Neumann architecture?

The distributed Von Neumann architecture is like a regular computer that rely on multiple commodity computers.

It is different from a compute grid, because it is not meant only for batch processing.

In Babelia, that distributed computer has volatile memory, non-volatile memory, possibly vectors or graphics processing units, and generic computation units.

The goal is to make it easier to build software inside a trusted network of computers.

What are the mid term goals of Ruse Scheme?

Mid term goals of Ruse Scheme are three folds:

  • Offer enough tooling to make it easier to create, sell and make a living by producing Scheme code. This includes making it painless to interop with existing software.

  • Implement a package manager inspired from Nix, and backed up by content-addressable code that can be translated into multiple natural languages with the help of a decentralized peer-to-peer network.

  • Explore a new operating system desktop paradigm resting upon the fundamental principle.

What is the goal of Ruse Scheme?

The goal of Ruse Scheme is to build a coherant bootstrapable whole-stack civkit for a sustainable civilization, resting upon the fundamental principle.

What is whole-stack?

Whole-stack build upon the full-stack concept to include programming databases, and kernels.

What is Ruse Scheme license?

Ruse Scheme is licensed under the Cooperative Non-violent Public License without exceptions.

What is the fundamental principle?

If a system must serve the creative spirit, it must be entirely comprehensible by a single individual.

Friday, September 3, 2021

Let there be binary executables

2021-09-02 - Let there be binary executables

I released ruse-exe. A Chez program that allows to create binary executables from Scheme programs.

Here is in full the usage documentation:

Usage:

  ruse-exe [--dev] [--petite] [--optimize-level=0-3] [EXTENSIONS-OR-DIRECTORIES ...] program.scm [ -- ARGS ...]

Given a Scheme program inside the file `program.scm`, produce a
standalone executable inside the current directory called `a.out`.
ruse-exe will look for the necessary files inside
`EXTENSIONS-OR-DIRECTORIES`.  The arguments following two dashed
`ARGS` will passed to the underlying C compiler.

The flag --dev will enable generate allocation and instruction count.

The flag --petite will only build with petite scheme.

The arguments `EXTENSIONS-OR-DIRECTORIES` will replace the default
extensions, source and library directories. If you want to compile a
project that has both `.ss` and `.scm` with libraries in the current
directory, you need to use something along the line of:

  ruse-exe .ss .scm . program.scm

Homepage: http://letloop.xyz

You can grab a ready to use on ubuntu 21.04 binary release at github.

Friday, September 3, 2021

Oops!

2021-09-03 - Oops!

Sorry planet Scheme there was a bug in my blog engine. I hope it is fixed now...

Friday, September 3, 2021

Jack: One Thread Per Core

2021-08-25 - Jack: One Thread Per Core

I have being making progress with Babelia on the web user interface, IRC interface, and also the backend.

Regarding the backend, even if Chez Scheme is fast, it is clear even on the small dataset I have put together, that is around eleven gigabytes without compression. I need something like map-reduce [1], in LISP world known under the name of for-each-parallel-map.

In full-text search, and in a search product like google, they are tips, and tricks to avoid to hit the worst case. The worst being a query where the least frequent word is also one of the most frequent in the index. Possible workarounds include 0) using AND as the default operator 1) eliminating most common word (also known as stop-words); 2) caching results; 3) approximating results with user profiling...

All those workarounds give rise to other problems, or they need a lot of work such as profiling users, which is in my opinion not a problem when that is limited to profiling users' data that are published in the open (unlike tracking search users via their queries, or mail, etc...).

Anyway, threads are difficult, so I wanted to give it try. The above is trying to explain from where my motivation stems from.

It is still unclear whether make-jack works reliably all the time. You tell me.

(make-jack count) → procedure?

make-jack initialize a pool of COUNT parallel threads, and return a possibly endless generator that produces jacks. A jack is made of two procedure:

  1. The first procedure is an accumulator that will consume one or more thunks. That is how the user request the parallel execution of something.

  2. The second procedure will generate the results of the thunks submitted with the associated accumulator in an unspecified order.

Example:

(call-with-values jack
   (lambda (consumer producer)
     ;; submit some work to the pool, the consumer will block
     ;; if there is too much work already scheduled.
     (consumer thunks)
     ;; pull results
     (let loop ((count (length input)))
       (unless (fxzero? count)
         ;; producer will block the current thread until there
     ;; is something to produce
         (display (producer))
     (newline)
         (loop (fx- count 1))))))

Here are some numbers that backup the claim that it may work as expected, the tests were done with pool of size five. The work, called THUNKS in the above snippet, is the computation of one thousand times fibonacci of 40:

(time (call-with-values jack ...))
    no collections
    0.029955076s elapsed cpu time
    152.646622367s elapsed real time
    592688 bytes allocated
        Command being timed: "scheme --libdirs src/ --program main.scm"
        User time (seconds): 761.84
        System time (seconds): 0.04
        Percent of CPU this job got: 498%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:32.71
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 49624
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 14194
        Voluntary context switches: 1011
        Involuntary context switches: 3646
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

The code:

(define-record-type* <queue>
    ;; This is a fifo queue, that can signal when items are available
    ;; or space is available. Space is available when there is less
    ;; than MARK items inside the REST. It is used in jacks in both
    ;; accumulators and generators.
    (make-queue% name head rest mark mutex item-available space-available)
    queue?
    (name queue-name)
    (head queue-head queue-head!)
    (rest queue-rest queue-rest!)
    ;; mark is an integer that allows to keep the number of produced,
    ;; and accumulated values low; Tho, there is room for starvation.
    (mark queue-mark)
    (mutex queue-mutex)
    (item-available queue-item-available)
    (space-available queue-space-available))

  (define (make-queue name mark)
    (make-queue% name '() '() mark (make-mutex) (make-condition) (make-condition)))

  (define (queue-pop! queue)
    (mutex-acquire (queue-mutex queue))
    (if (null? (queue-head queue))
        (if (null? (queue-rest queue))
            (begin
              ;; Wait for work...
              (condition-wait (queue-item-available queue) (queue-mutex queue))
              (mutex-release (queue-mutex queue))
              ;; recurse
              (queue-pop! queue))
            (let* ((item+new-head (reverse (queue-rest queue)))
                   (item (car item+new-head))
                   (new-head (cdr item+new-head)))
              ;; There is nothing in the head, but the rest has stuff
              ;; reverse the rest to keep it FIFO, and replace the
              ;; HEAD with it. Return the first item immediatly to
              ;; avoid to pressure the mutex, and best performance.
              (queue-rest! queue '())
              (condition-signal (queue-space-available queue))
              (queue-head! queue new-head)
              (mutex-release (queue-mutex queue))
              item))
        ;; There is work, all is well.
        (let ((item (car (queue-head queue)))
              (new-head (cdr (queue-head queue))))
          (queue-head! queue new-head)
          (mutex-release (queue-mutex queue))
          item)))

  (define (queue-push! queue . items)
    (mutex-acquire (queue-mutex queue))
    ;; we only check that the rest is less than the mark. BUT the user
    ;; may append more than mark ITEMS.
    (if (fx<? (length (queue-rest queue)) (queue-mark queue))
        (begin
          (queue-rest! queue (append items (queue-rest queue)))
          ;; TODO: It may not be necessary to wake up all waiting
          ;; threads, but only (length (queue-rest queue))?
          (condition-broadcast (queue-item-available queue))
          (mutex-release (queue-mutex queue)))
        (begin
          ;; Block until some work is done.
          (condition-wait (queue-space-available queue) (queue-mutex queue))
          (mutex-release (queue-mutex queue))
          ;; TODO: here it is possible to append the items without
          ;; recursing.
          (apply queue-push! queue items))))

  (define (make-jack count)

    ;; That is the queue for all work for the created thread pool.
    ;; The mark is an arbitrary number, it could be an argument.
    (define input (make-queue 'input (fx* count 2)))

    (define (worker-loop index input)
      ;; TODO: replace thunk+output with output+thunk, and avoid the
      ;; cdr before the car.
      (let ((thunk+output (queue-pop! input)))
        (queue-push! (cdr thunk+output) ((car thunk+output)))
        (worker-loop index input)))

    (define (worker-init count)
      (let loop ((count count))
        (unless (fxzero? count)
          ;; count is passed as the thread index for debugging
          ;; purpose
          (fork-thread (lambda () (worker-loop count input)))
          (loop (fx- count 1)))))

    (define (input-consumer input output)
      (lambda (thunks)
        ;; TODO: avoid the call to apply. The reverse is necessary, to
        ;; keep around the priority information FIFO.
        (apply queue-push! input (reverse (map (lambda (thunk) (cons thunk output)) thunks)))))

    (define (output-producer output)
      (lambda ()
        (queue-pop! output)))

    ;; Initialize thread pool at call site, that is somewhat unusual
    ;; for a generator to have side-effects outside producing values.
    (worker-init count)

    (lambda ()
      ;; again the mark is a clueless guess.
      (define output (make-queue 'output (fx* count 2)))
      (values (input-consumer input output) (output-producer output))))

Reference

  1. MapReduce: Simplified Data Processing on Large Clusters

Friday, September 3, 2021

Wednesday, September 1, 2021

Scheme Requests for Implementation

SRFI 230: Atomic Operations

SRFI 230 is now in draft status.

This SRFI defines atomic operations for the Scheme programming language. An atomic operation is an operation that, even in the presence of multiple threads, is either executed completely or not at all. Atomic operations can be used to implement mutexes and other synchronization primitives, and they can be used to make concurrent algorithms lock-free. For this, this SRFI defines two data types, atomic flags and atomic (fixnum) boxes, whose contents can be queried and mutated atomically. Moreover, each atomic operation comes with a memory order that defines the level of synchronization with other threads.

by Marc Nieper-Wißkirchen at Wednesday, September 1, 2021

Jeremy Kun

Carnival of Mathematics #197

Welcome to the 197th Carnival of Mathematics!

197 is an unseemly number, as you can tell by the Wikipedia page which currently says that it has “indiscriminate, excessive, or irrelevant examples.” How deviant. It’s also a Repfigit, which means if you start a fibonacci-type sequence with the digits 1, 9, 7, and then continue with a_n = a_{i-3} + a_{i-2} + a_{i-1}, then 197 shows up in the sequence. Indeed: 1, 9, 7, 17, 33, 57, 107, 197, …

Untangling the unknot

Kennan Crane et al showcased a new paper that can untangle tangled curves quickly, and can do things like generate Hilbert-type space-filling curves on surfaces. It’s a long thread with tons of links to videos and reading materials, covering energy functions, functional analysis, Sobolev methods, and a custom inner product.

Folding equilateral triangles without measuring

Dave Richeson shows off a neat technique for folding equilateral triangles using just paper and no measurements. Replies in the thread show the geometric series that converges to the right 60 degree angle.

Shots fired at UMAP and t-SNE

Lior Pachter et al. study what sorts of structure are preserved by dimensionality reduction techniques like UMAP (which I have also used in a previous article) by comparing it against a genomics dataset with understood structure. They make some big claims about how UMAP and t-SNE destroy important structure, and they show how to contrive the dimensionality reduction plot to look like an elephant even when there’s no elephantine structure in the data.

I’m not expert, but perhaps one best case scenario for UMAP enthusiasts would be that their analysis only applies when you go from very high dimensions down to 2 just so you can plot a picture. But if you stop at, say, \sqrt{n} dimensions, you might still preserve a lot of the meaningful structure. Either way, they make a convincing pitch for Johnson-Lindenstrauss’s random linear reductions, which I’ve also covered here. Their paper is on biorXiv.

Studying the Sieve

Ben Peters Jones took up Grant Sanderson’s math video challenge and released a series of videos studying the Sieve of Eratosthenes.

Additional submissions

Be sure to submit fun math you find in September to the next carvinal host!

by j2kun at Wednesday, September 1, 2021