Rollinglisp 001 - Of closures, mutability and switchable references

Tags: lisp langdev languages programming code programminglanguages interpreter referencecounting rust clojure

posted on Mon, 09 December 2024 by mai

Im currently in the first stages to implement my first ever lisp. And like every good lispy language, closures are a realy important part of the language and it's versatility. And as I choose to implement my lisp for the time being without any fancy tracing gc, only with good old ARC (Atomic reference counted), there arise some serious problems when trying to implement colsures.

But what's a closure anyway you might ask. For that we first need to understand what lambdas are.

Closures, lambdas and the world between them

Lambdas are just first-class functions, and are essentially the "inside" of functions in a more classical imperative language. A lambda holds just the code it should be executing and some "metadata" like it's count and name(s) of parameters. Lambdas already give us an powerfull concept: everything is a value. With this, we are able to write functions that themself return lambdas which the caller can invoke to do more things! Something like this:

function getHelloWorldPrinter() {
	// Here we're returning a lambda!
	return function(text) {
		console.log(`Hello world, ${text}!`);
	}
}

let printer = getHelloWorldPrinter();
// Invoke the lambda just like any other function:
printer("Mai");
// This prints: "Hello world, Mai!"

This already is handy, espc because we can give it parameters like normal, but it lacks one thing: context. Lambdas are just like functions: they can only access their parameters (and global variables, if the language supports this concept), everything else is off-limits. This has the problem that you need to pass in everything your lambda needs everytime you'll want to call it.

But fear not! There's already resure on it's way, and it's named closures! Closures are lambdas that close over their environment. It sounds very fancy, but is fairly simple: closures store additionaly a reference to the environment they're defined / created in! With this, a lambda can use things a normal lambda / function couln't ever reach.

For example, we can re-write our getHelloWorldPrinter function from above into this:

function getHelloWorldPrinter(name) {
	// This time this is a closure!
	// Note: this *seems* the same, only bc javascript decided
	//       that lambda's and closures are written *the same way*
	//       and the language decides which one to "deploy".
	return function() {
		console.log(`Hello world, ${name}!`);
	}
}

let printer = getHelloWorldPrinter("Mai");
// Now we can invoke our closure *without* the need to specify the name everytime:
printer();
// This still prints: "Hello world, Mai!"

As you see, this gives us a whole new dimension of posibilities: filter functions that can refer to outside state, functions that return clojures managing an internal state and thus not needing any sort of "object" for methods to attach to, and so much more!

Rolling your own lisp and the problems of closures

As every lisp language is at it's core purely based on values, if you implement an own dialect of it, you sooner or later come to the point where you need to implement not just lambdas, but closures. While lambdas are relatively straigt forward (they are really just a tuple of params + it's "body"), closures need to somehow store it's environment. While this is not particulary hard on it's own, it's all the effects you need to be aware off.

For example, you cannot just "copy" the environment; not only can it lead to big inefficencies due to coping and keeping huge environments, but you'll also need to make sure you'll recieve *updates* of the values inside an environment. Both of already defined symbols and undefined ones. If you get one of them wrong, with essential slightly better lambdas that cannot modify their state correctly or react to it properly.

Two such example that I personally used to test against these problems where a counter and an recursive algorith like the fibonacci sequence; however these examples might be different based on your languages inner working.

The "counter" test displays mutability of state; Like I said, it's might differ how you treat symbols. In clojure for example, they're themself are immutable and need a mutable container like atom

;; A counter; returns a tuple of two lambdas sharing an internal state `i`.
;; The first lambda returns the current value of the state while
;; the second lambda increments the current value of the state.
(def counter
	(fn [start]
		(let [i (atom start)]
			(list
				(fn [] (deref i))
				(fn [] (swap! i inc))
			)
		)
	)
)

;; Bind the tuple to the symbol "my",
;; As well as create convinience bindings
;; to the get / increment lambdas.
(def my (counter 5))
(def my_get (nth my 0))
(def my_inc (nth my 1))

;; Call the getter; it should respond with the starting value of 5
(print (my_get))

;; Call the incrementor
(my_inc)

;; Call the getter again;
;; Since we called the incrementor inbetween this and the last get call,
;; the value should be one higher, i.e. "6"
(print (my_get))

In my own dialect, variables are mutable unless stated otherwise, so we can define the counter differently:

(def counter
	(lambda (start)
		(progn
			(deflocal i start)
			(list
				(lambda () (i))
				(lambda () (set i (+ 1 i)))
			)
		)
	)
)

The other example, fib is even simpler: we define an recursive function fib that calls itself. Since the lambda is evaluated first in some implementations before the symbol is injected into the environment, the lambda itself could "oversee" the definition of fib and don't can resolve it later on.
Plain Text
```
(def fib
	(lambda (n)
		(if (eq n 0)
			0
			(if (eq n 1)
				1
				(+
					(fib (- n 1))
					(fib (- n 2))
				)
			)
		)
	)
)
```

Approches to implement closures

First approaches I've seen (but not tested enough yet):

Everything is immutable. Thats how clojure handles it; an env isn't mutable at all. Which you can actually see in the counter example for clojure above. We need to use atom, which is an container which allows mutation, but the symbol i itself isn't mutated ever.
Just define in your spec that the environment is fixed at declaration time. Emacs lisp does this; they simply state that closures in their dialect only capture a list of symbols from the lexical environment at the time they're declared. That effectviely means that espc our fib example is just not gonna work in emacs lisp bc the spec said so.

But I personally thing it's more entertaining to figuring out how this can work without constraining the language at all!

Here I will list approaches I tried (and why they ultimatively failed):

Just use strong references like Rc<RefCell<Env>> in rust. While it works on the surface, every check with valgrind or compareable memoryleak detectors will cry out big time since it can (and will!) lead to cycles in even simple closure usage like (define foo (fn [] ())). This happens since the env holds a strong reference on the value of foo, while the closure (the very value of foo!) holds a strong reference onto it's environment and the cycle is born!
Normaly, we just would use a weak reference anywhere but sadly we can't: The environment needs to ensure all values of all it's symbols stay atleast as long as it stays alive, and the closure needs to ensure that it keeps it's environment alive.
Instead of using a simple Map<Symbol, Value>, use an Map<Symbol, RefCount<Value>. Either copy the entire map whenever we need to "capture" the environment, or do some analysis to only "capture" what's needed. This approach gets rid of the problem of mutability, but both variants of this fail in the fib test since we define the fib after the lambda has copied everything, and thus the lambda actually fails to retrieve the definition of fib.
Switchable references. This is a somewhat "cursed" technique I tought of. In variant 1, we discover that we can't just place an weak reference into a closure since both the env and the closure need a strong reference to each other. Well thats not entirely true: we can get away with a conditional reference or "switchable" reference:
While the environment always needs strong references, the closure actually dosn't; that is as long the environment it closes over itself is alive. One it's get inactive or the closure "escapes", we can switch the closure's weak reference to a strong one.
- A simple escape-analysis at the end of closures / let's dosn't work for the nearly the same reason as our first approach: cycles. While calling our counter function will perfectly work, binding it onto another env will again make a cycle. How? Like this: we create an closure counter that itself has only a weak reference onto the envrionment; it dosn't prevent the env from going out of scope. It's not even the instantation itself (i.e. (counter 5): while it returns an strong reference to the inner environment holding the state; since its directly consumed the strong reference is gone with it. But now we bind it: (define my (counter 5)). Now the env contains an strong reference on an closure that contains a strong reference to the state-holding env creating inside counter, which itself needs to know it's parents! So creates a whole chain of strong references up until the topmost env; including the one we bind the closure to! So we again have a cycle. To combat this, we'll add another techinque I call "strongchain elimination". It's simple really: everytime we want to add a value to an env, we'll check if it's a lambda, and then go trough the chain of strong references until we find the env we're currently wanting it to be a part in; this reference we'll switch to an "weak" one. This ofc can potentially be dangerous since we dont know with whom we share a environment with. But in my (to be fair simple) tests, it dosnt produced any error nor memleak; and the tests where done in rust, so I'm somewhat fairly convinced that this works as intended. Espc bc currently I can't see a way how a lambda could share a environment-chain if not with a lambda that also escapes the same environment. (Which actually happens in our `counter` test, but worst thing is that we do the whole strongchain elemination a second time.)
- Another avenue you could pursue (which I havent tested yet), is to store the lambdas declared inside an environment inside the env itself as a list, and once it's about to be dropped, we switch all the lambda's that point towards it from "weak" to "strong". Might need the same "strongchain elimination" as the other approach in this category.

Finishing sparkles

At the end, I've choosen the switchable-refs with strongchain-elimination technique I tought of. It still needs more testing (and testcases!) but the the rust lisp implementation I've used to test this didn't threw any compiler error, nor hat it's test-suite of about ~40 test for evaluation found anything out of the ordinary. So yeah... "It just works"(TM) !!

(No seriously; if you have any idea of how to test this further, you're more than welcome to send me some lisp code that might validate the solidity of this technique further. Just send a open issue, or multiple, at my fork of rust_lisp or send me them via my socials, like mastodon .)

Otherwise you can always check out my own lisp implementation here