Logarithms, Loops, and the Architecture of Memory

What 3Blue1Brown's latest video taught me about how AI might actually remember.


Disclaimer: I am an actuary. I have not yet been awarded the ASA; all requirements are met, and the designation is expected in June or July, depending on application timing. The thoughts and analyses here are my own explorations, not professional advice. Draw your own conclusions.

I watched a math video last week and haven't stopped thinking about it. That's standard for me — I'm the kind of guy who pauses YouTube to grab a notebook — but this one stuck harder than most. It wasn't about AI. It wasn't about insurance. It was about taking the logarithm of an image. Somehow that made me rethink both. I love when math does this.

The video is Grant Sanderson's "How (and why) to take a logarithm of an image" on 3Blue1Brown. On its surface, it's about Escher's Print Gallery — that famous lithograph of a man standing in a gallery, looking at a print that contains the gallery that contains the man. Turtles all the way down. But underneath the Escher, Sanderson is really showing you something about the complex logarithm, and the strange topological things that happen when you invert the exponential function in the complex plane.

Stay with me — this is where it gets good.

The Video: Unwrapping Escher

The setup is this. The complex exponential function e^z is many-to-one. Walk 2π units along the imaginary axis (that is, add 2πi to the input) and the output comes back to where it started: same output, different input. The function wraps infinitely many horizontal strips, each 2π tall, onto the same region, like rolling a sheet of wallpaper into a tube.
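A quick numerical check of that many-to-one behavior, as a minimal sketch using only Python's standard `cmath` module. The helper name `exp_is_2pi_i_periodic` is just for illustration:

```python
import cmath
import math

def exp_is_2pi_i_periodic(z: complex) -> bool:
    """Check that e^z and e^(z + 2*pi*i) agree, up to floating-point error."""
    return cmath.isclose(cmath.exp(z), cmath.exp(z + 2j * math.pi))

# A few arbitrary inputs: each one lands on the same output after a 2*pi*i shift.
samples = [0.3 + 0.7j, -1.2 + 4.0j, 2.5 - 3.1j]
print(all(exp_is_2pi_i_periodic(z) for z in samples))  # True
```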

The complex logarithm, being the inverse of the exponential, goes the other direction: it's one-to-many. A single input maps to infinitely many outputs, each spaced 2πi apart vertically. Mathematicians usually handle this by picking a "branch" — choosing one value and pretending the others don't exist. But Sanderson makes the case that for understanding Escher, you should embrace the full multi-valued structure.
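Going the other way, a minimal sketch of the one-to-many log. Python's `cmath.log` returns only the principal branch; the other values are obtained here by adding integer multiples of 2πi, and the window of branches `ks` is an arbitrary choice for illustration:

```python
import cmath
import math

def log_branches(w: complex, ks=range(-2, 3)) -> list[complex]:
    """Return several values of the multi-valued complex log of w.

    cmath.log gives the principal branch; every other branch differs
    from it by an integer multiple of 2*pi*i.
    """
    principal = cmath.log(w)
    return [principal + 2j * math.pi * k for k in ks]

w = 1 + 1j
for value in log_branches(w):
    # Each candidate exponentiates back to the same w.
    print(value, cmath.isclose(cmath.exp(value), w))
```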

When you apply the complex log to an image, circles centered on the origin become vertical lines, logarithmic spirals unwind into straight lines, and the self-similar structure of something like a Droste effect (an image containing a smaller version of itself, containing a smaller version of itself) becomes simple translational repetition: scaling by a fixed factor turns into a fixed horizontal shift. The complex log literally unwraps the infinite recursion into a flat, periodic tile.
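Both claims can be checked numerically. A minimal sketch using the principal branch only; the radius 2 and zoom factor 256 are arbitrary illustrative values, and `log_shift_from_scaling` is a hypothetical helper name:

```python
import cmath
import math

# Points on the circle |z| = 2: their logs all share the same real part, ln 2,
# so the circle becomes a vertical segment (a full vertical line once every branch is included).
circle = [2 * cmath.exp(1j * theta) for theta in (0.0, 1.0, 2.0, 3.0)]
print([round(cmath.log(z).real, 12) for z in circle])

# Scaling by a positive real factor s becomes a horizontal shift by ln s, which is
# why a picture containing a copy of itself scaled by s repeats horizontally in log space.
def log_shift_from_scaling(z: complex, s: float) -> complex:
    return cmath.log(s * z) - cmath.log(z)

print(log_shift_from_scaling(0.4 + 1.3j, 256.0), math.log(256.0))
```

Combined with the 2πi spacing between branches, that gives a picture that repeats in two directions: horizontally by ln s and vertically by 2π.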

The magic is in the return trip. To close the loop and create Escher's impossible print, you apply a specific rotation in the logarithmic space (together with a rescaling, so really a multiplication by one complex constant), then exponentiate back. The vertical periodicity rolls back up into closed circles, and zooming inward now smoothly rotates you into the next iteration. The math stitches the seam between one copy and the next so perfectly that you get a continuous, infinitely self-referencing image.
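That constant can be written down. A sketch, assuming a Droste image with zoom factor s, so that in log space the picture repeats under shifts by ln s and by 2πi. Multiplying by β = 2πi / (ln s + 2πi) turns the diagonal period ln s + 2πi into exactly 2πi, which exp then rolls up into one full turn. With s = 256, the factor usually quoted for Escher's print, the remaining period becomes a zoom of roughly 22.58 combined with a twist of about 157.6°, which matches the figures usually cited for Escher's grid. The function name `droste_twist` is my own:

```python
import cmath
import math

def droste_twist(s: float) -> complex:
    """Complex multiplier beta for a Droste image with zoom factor s.

    In log space the image repeats under shifts by ln(s) and by 2*pi*i.
    beta rotates and rescales so that the diagonal period ln(s) + 2*pi*i
    becomes exactly 2*pi*i, which exp() then rolls up into one full turn.
    """
    return 2j * math.pi / (math.log(s) + 2j * math.pi)

s = 256.0
beta = droste_twist(s)
print(beta * (math.log(s) + 2j * math.pi))  # approximately 2*pi*i

# The other period, ln(s), becomes beta*ln(s); exponentiating it gives the complex
# scale factor (a zoom plus a rotation) under which the final image repeats.
t = cmath.exp(beta * math.log(s))
print(abs(t), math.degrees(cmath.phase(t)))  # roughly 22.58 and 157.6
```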

The man looks at the print. The print contains the gallery. The gallery contains the man. Escher drew it. Sanderson explained it. Math closed the loop.

The One-to-Many Insight

This is where I started scribbling. The complex logarithm is a one-to-many function. One input, infinite outputs. That is not a bug — it is the whole point. The multi-valuedness is what creates the looping, self-referencing structure that makes Escher's print work.

Now think about how neural networks operate. A trained neural network is a deterministic function: one input goes in, one output comes out. Feed it the same prompt twice and you get the same response (temperature=0, at least). There is no multi-valuedness. There is no loop. The computation flows forward through the layers exactly once, and that's it.
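The "temperature=0" caveat refers to the decoding step that runs after the network: the network maps a prompt to one set of scores (logits), and a sampler turns those scores into a token. A minimal NumPy sketch, assuming a plain logits vector and treating temperature 0 as greedy argmax; `next_token` is a hypothetical helper, not any library's API:

```python
import numpy as np

def next_token(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> int:
    """Pick a token index from logits.

    temperature == 0 is treated as greedy decoding (argmax), which always returns
    the same index for the same logits. Positive temperatures sample from
    softmax(logits / temperature).
    """
    if temperature == 0:
        return int(np.argmax(logits))
    scaled = logits / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(logits), p=probs))

logits = np.array([1.0, 3.0, 0.5, 2.9])
rng = np.random.default_rng(0)
greedy = [next_token(logits, 0.0, rng) for _ in range(5)]   # same index every time
sampled = [next_token(logits, 1.0, rng) for _ in range(5)]  # may vary between draws
print(greedy, sampled)
```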

This is by design. Being a single-valued, differentiable function is what makes a neural network trainable: backpropagation needs a clear gradient path from the output back to the input. But it also means the architecture is fundamentally non-looping. The information passes through once and is done. There's no mechanism for a representation to refer back to itself, to contain itself, to spiral inward the way Escher's gallery does.

What if that's exactly what memory requires?

Why Looping Matters for Memory

Human memory is deeply, irreducibly recursive. I don't just remember the time I visited the Met — I remember remembering it three years later when someone mentioned Vermeer, and I remember that memory triggering a conversation about light in Dutch painting, which I now remember while writing this sentence. Memory refers to itself. It contains itself. It is the print that contains the gallery.

Current transformer-based AI doesn't do this. A large language model processes tokens in a single forward pass — attention can look back at earlier tokens in the context window, but there's no recurrence, no feedback loop, no mechanism for a representation to evolve by repeatedly revisiting itself. The context window is a canvas, not a loop. It has edges.
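To make "attention can look back, but nothing loops" concrete, here is a minimal single-head causal self-attention in NumPy. It is a simplified sketch (no multi-head split, layer norm, or MLP), and the sizes and random weights are arbitrary assumptions:

```python
import numpy as np

def causal_self_attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Single-head causal self-attention over a sequence x of shape (seq_len, d_model).

    Each position can read earlier positions (and itself) through the attention
    weights, but the whole computation is one feed-forward pass: nothing is fed back.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Mask out future positions so token t only attends to tokens 0..t.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d = 5, 8
x = rng.normal(size=(seq_len, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(x, wq, wk, wv)
print(out.shape)  # (5, 8)
```

Changing the last token leaves every earlier output untouched: the past is visible, but it is read, not revisited.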

RNNs and LSTMs did have recurrence; they literally fed their hidden state back in as input for the next step. But that recurrence was sequential and local: one step influencing the next. It wasn't the kind of global, self-similar looping that the complex logarithm produces, where one point maps to an infinite tower of reflections.
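For contrast, the recurrence in a plain (Elman-style) RNN is a single loop over time steps. A minimal NumPy sketch with arbitrary sizes; LSTMs add gating on top of the same step-by-step structure, which is omitted here:

```python
import numpy as np

def rnn_forward(xs: np.ndarray, w_xh: np.ndarray, w_hh: np.ndarray, b_h: np.ndarray) -> np.ndarray:
    """Vanilla RNN: h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h).

    The only recurrence is h_{t-1} -> h_t: each step feeds its hidden state
    into the next one, so influence travels sequentially and locally.
    """
    h = np.zeros(w_hh.shape[0])
    states = []
    for x_t in xs:
        h = np.tanh(x_t @ w_xh + h @ w_hh + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(1)
seq_len, d_in, d_h = 6, 3, 4
xs = rng.normal(size=(seq_len, d_in))
hs = rnn_forward(xs, rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))
print(hs.shape)  # (6, 4)
```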

I wonder if there's something in between. Not the simple sequential recurrence of an RNN, and not the single-pass attention of a transformer, but something that allows for multi-valued representations, where a memory isn't stored as one vector in one place but as a family of related states, spaced out like the values of a complex logarithm (2πi apart), each one a different perspective on the same experience.

The speculative leap: What if episodic memory in AI needs the same mathematical structure as Escher's print — a one-to-many mapping that creates self-reference? Not a single stored vector, but an infinite, self-similar family of representations that loop back into each other?

I have no idea if this is tractable. Multi-valued functions are notoriously hard to work with computationally. Branch cuts exist for a reason. But the more I think about it, the more I suspect that the reason current AI feels "memoryless" — despite context windows growing to hundreds of thousands of tokens — is that the architecture can't loop. It can attend to the past, but it can't contain the past the way human memory does.

The Actuarial Connection

As I said up top, I'm an actuary. Actuaries think about feedback loops all day. We just don't usually reach for Escher to explain them, which is frankly a missed opportunity.

Loss development factors compound. Today's reserve estimate feeds into next quarter's pricing, which affects what business we write, which determines next year's loss experience, which revises the reserve estimate. The insurance cycle is a loop. Soft market → aggressive pricing → poor loss ratios → hard market → conservative pricing → good loss ratios → soft market again. It spirals, it self-references, it contains itself.
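The compounding is easy to see in a chain-ladder style calculation: the factor from any age to ultimate is the product of every age-to-age factor after it. A minimal sketch with made-up factors (illustrative only, not from any real triangle); `age_to_ultimate` is a hypothetical helper name:

```python
import math

def age_to_ultimate(ata_factors: list[float]) -> list[float]:
    """Turn age-to-age loss development factors into age-to-ultimate (cumulative) factors.

    The factor from age j to ultimate is the product of all later age-to-age factors,
    so a revision to any single link compounds through every earlier age.
    """
    cdfs = []
    running = 1.0
    for f in reversed(ata_factors):
        running *= f
        cdfs.append(running)
    return list(reversed(cdfs))

# Hypothetical age-to-age factors (12-24, 24-36, ... months); not real data.
ata = [1.50, 1.20, 1.08, 1.03, 1.01]
cdfs = age_to_ultimate(ata)
reported_at_12 = 1_000_000.0
print(cdfs[0], reported_at_12 * cdfs[0])

# Taking logs turns the compounding into a sum: multiplication becomes addition,
# the same move the complex log makes when it turns Escher's zoom into a shift.
print(math.isclose(math.log(cdfs[0]), sum(math.log(f) for f in ata)))
```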

And the math we use to model these cycles — exponential growth, logarithmic transforms, periodic functions — is the same family of mathematics that Sanderson used to decode Escher. I'm not claiming the connection is rigorous. But there's something aesthetically right about the idea that the math of self-referencing visual paradoxes might be the same math that governs how experience folds back into expectation.

Maybe Escher was an actuary. (He wasn't. But he would have been good at it.)

Open Questions

I want to be honest about what I'm doing here. This is speculation — pattern-matching across domains, the kind of intellectual play that's more fun than it is rigorous. But the questions feel real enough to write down:

Could multi-valued activation functions improve memory? What if, instead of the standard ReLU or softmax, certain layers used activations inspired by the complex logarithm — functions that map one input to a structured family of outputs? This would massively complicate training, but it might create the kind of self-referencing representations that episodic memory seems to need.
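One hypothetical reading of a "multi-valued activation", sketched in NumPy to make the question concrete. Nothing like this is a standard layer: the function name, the use of complex inputs, and truncating the infinite family to a few branches are all assumptions, and the sketch says nothing about how such a layer would be trained:

```python
import numpy as np

def log_branch_activation(z: np.ndarray, n_branches: int = 3) -> np.ndarray:
    """Hypothetical 'multi-valued' activation inspired by the complex log.

    For each complex input z it returns n_branches values
        log|z| + i*(arg z + 2*pi*k),  k = 0, ..., n_branches - 1,
    a truncated slice of the infinite family of log values.
    Output shape: z.shape + (n_branches,). Inputs equal to 0 give -inf real parts.
    """
    principal = np.log(z.astype(complex))  # principal branch, arg in (-pi, pi]
    offsets = 2j * np.pi * np.arange(n_branches)
    return principal[..., None] + offsets

x = np.array([1.0 + 1.0j, -2.0, 0.5j])
family = log_branch_activation(x)
print(family.shape)  # (3, 3)
# Every branch exponentiates back to the same input: one input, a structured family of outputs.
print(np.allclose(np.exp(family), x[:, None]))
```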

Is the Droste effect a computational primitive? Self-similarity shows up everywhere in nature — fractals, coastlines, blood vessels, neural branching patterns. Is it also a primitive for memory? Is "this thing contains a smaller version of this thing" a fundamental operation that computing architectures should support natively?
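A Droste image can already be drawn with a very small primitive that folds every scale into one: divide out whole powers of the zoom factor until the point lands in a single reference annulus. A minimal sketch, assuming a zoom factor s > 1; the name `droste_reduce` is my own:

```python
import math

def droste_reduce(z: complex, s: float) -> complex:
    """Map any nonzero point of a Droste image with zoom factor s > 1 into the
    fundamental annulus 1 <= |z| < s (up to rounding at the boundaries)
    by dividing out whole powers of s.

    In log space this is just 'log|z| mod ln(s)': the self-similarity becomes a
    wrap-around, so one annulus of pixels is enough to draw every nested copy.
    """
    if z == 0:
        raise ValueError("the center of a Droste image has no finite level")
    k = math.floor(math.log(abs(z)) / math.log(s))  # which nested copy z sits in
    return z / s**k

print(droste_reduce(300 + 0j, 256.0))    # 300/256: one level out
print(droste_reduce(0.001 + 0j, 256.0))  # scaled up by 256 twice
```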

What would a "looping" transformer look like? Not an RNN-style sequential loop, but something where the output of attention is fed back through the same layers multiple times, with each pass adding a 2πi-like offset that creates a different "branch" of the representation. A kind of spiral attention.
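A toy version of that loop, purely to pin the idea down. Everything here is hypothetical: a single shared `tanh(h @ w)` stands in for an attention block, and the "2πi-like offset" is modeled as a shift of 2πk along a fixed direction `u` before pass k, an analogue of the spacing between branches of the complex log:

```python
import numpy as np

def spiral_passes(x: np.ndarray, w: np.ndarray, u: np.ndarray, n_passes: int = 4) -> np.ndarray:
    """Toy 'spiral attention' loop. Hypothetical: not an existing architecture.

    One shared block, tanh(h @ w), is reused on every pass, with each output fed
    back in as the next input. Before pass k, the state is shifted by 2*pi*k along
    a fixed direction u. Returns the state after each pass, shape (n_passes, d).
    """
    h = x
    states = []
    for k in range(n_passes):
        h = np.tanh((h + 2 * np.pi * k * u) @ w)  # same weights w on every pass
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(2)
d = 8
x = rng.normal(size=d)
w = rng.normal(size=(d, d)) / np.sqrt(d)  # scaled so activations stay in a reasonable range
u = np.eye(d)[0]                           # hypothetical 'branch direction': the first channel
states = spiral_passes(x, w, u)
print(states.shape)  # (4, 8)
```

The returned stack is the "family of related states": one per pass, all produced by the same weights.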

Is there a connection to Riemann surfaces? Mathematicians handle multi-valued functions by working on Riemann surfaces — spaces where you "unfold" the branches so the function becomes single-valued on a more complex domain. Maybe the right way to think about memory in AI isn't multi-valued activations in flat space, but single-valued activations on a richer underlying geometry.
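The unfolding idea can be sketched directly: store a point by its radius and an unwrapped angle instead of as a plain complex number, and the log becomes an ordinary single-valued function. A minimal Python sketch; the class and method names are my own illustration, not any library's API:

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class LogSurfacePoint:
    """A point on the Riemann surface of the logarithm.

    A complex number forgets how many times you have circled the origin; storing
    the radius and an unwrapped angle theta remembers it.
    """
    r: float      # distance from the origin, must be > 0
    theta: float  # total angle, any real number (not reduced mod 2*pi)

    def to_complex(self) -> complex:
        # Projecting down to the plane forgets the sheet: this is where multi-valuedness comes from.
        return complex(self.r * math.cos(self.theta), self.r * math.sin(self.theta))

    def sheet(self) -> int:
        # Sheet 0 matches the principal branch, theta in (-pi, pi].
        return math.ceil((self.theta - math.pi) / (2 * math.pi))

    def log(self) -> complex:
        # On the surface, log is single-valued: no branch choice needed.
        return complex(math.log(self.r), self.theta)

p = LogSurfacePoint(2.0, 0.5)
q = LogSurfacePoint(2.0, 0.5 + 2 * math.pi)
print(p.to_complex(), q.to_complex())          # same point in the plane, up to rounding
print(p.log(), q.log(), p.sheet(), q.sheet())  # different logs, sheets 0 and 1
```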


I don't have answers yet. I have a notebook full of diagrams, a browser with thirty open tabs, and the distinct feeling that a YouTube video about a Dutch lithograph told me something real about how minds — biological and artificial — might work. That is a completely absurd sentence and I'm thrilled to live in a century where it's true.

Good enough for one blog post. Escher would approve.

Thoughts? Pushback? Better math?

I'd love to hear from you — especially if you think I'm wrong.

Send me an email