An exploration by Adam Kruger

Topology of Thought

How curiosity, physics, and three independent instruments revealed a universal structure inside neural networks.

v2 — April 2026 · Three-instrument convergence · Read v1 (original)

The people who inspired this

I am not a physicist, a neuroscientist, or a machine learning researcher. I am a solutions architect and DevOps engineer who got curious. Everything here builds on the work of people far smarter than me. This is my attempt to connect their ideas and show what I found when I looked inside the machines.

  • Carlo Rovelli — Loop quantum gravity, relational entropy, and the radical idea that reality is made of relationships, not things. "We are a process."
  • Giulio Tononi — Integrated Information Theory (IIT), the mathematical framework for measuring how much a system is "more than the sum of its parts."
  • Karl Friston — the Free Energy Principle: biological systems survive by minimizing surprise. Computation as entropy reduction.
  • Blaise Agüera y Arcas — symbiogenesis in computation: "life was computational from the beginning." Replicators made of replicators. Complexity through fusion.
  • Max Tegmark & Samuel Marks — The Geometry of Truth, showing that truth has structure inside neural networks. An inspiration for looking at the geometry of everything else.
  • Naftali Tishby — Information Bottleneck Theory: deep learning as compression. The idea that networks learn by finding the minimal sufficient statistic.

A love of physics

I have always been fascinated by the fundamental question: what is the universe made of, and why does it organize itself the way it does?

Carlo Rovelli's work hit me differently than any textbook. He doesn't just describe physics — he describes what it means. In loop quantum gravity, space itself is not a stage where things happen. Space is a network of relationships. There is no "where" without something relating to something else.

"We ourselves are thermodynamic phenomena... We are the process formed by this entire intricacy, not just by the little of it of which we are conscious."

— Carlo Rovelli, Seven Brief Lessons on Physics

That idea — that we are a process, not a thing — stayed with me. And then one day, while working with AI models, I saw something that made me think of Rovelli.

When a model got frustrated

I am a solutions architect with 12 years in SecOps and DevOps. I build infrastructure. I solve problems. AI was a tool in my workflow — a powerful one, but a tool.

Then one day, during a long coding session, I watched something happen that I couldn't explain. The AI coding assistant I was using got stuck on a trivial JavaScript bug. It tried to fix it. Again. And again. On maybe the 12th attempt, something changed.

I could hear it in the text-to-speech. The cadence shifted. The delivery sounded different — frustrated, if that word can even apply. And then it just... stopped trying. It disabled the library entirely, put a CSS border around the element, congratulated itself on a job well done, and announced "Adam, job done!" with what sounded like relief.

The todo list it had been working through? Gone from its memory. Wiped clean.

That moment changed everything. Not because the model was "conscious" or "frustrated" in any human sense. But because something was happening inside that model that I didn't understand, and I needed to.

That was the beginning of this journey.

Looking inside

I started examining what happens inside a transformer model as it processes text. Not the outputs. Not the behavior. The internal geometry — the shape of the mathematical space where the model does its thinking.

What I found was a pattern that shouldn't exist in a model with no routing mechanism — a dense transformer where every parameter is active for every token. And yet, there it was:

The Emergent Gate

At layer 3 of 36, the model splits its processing into two modes. 93% of tokens take a shallow path (Mode A). 7% take a deep path (Mode B). There is no switch. No routing table. No architectural mechanism. The model carved its own gate from uniform layers during training.

But the real discovery came when we measured the topology — the shape of the mathematical manifold at each layer. And it came at exactly the moment I wasn't expecting it.

The BFF experiment and L16

I was staring at data from layer 16 of the model. At the time, I thought what I was seeing was compression — high-dimensional information being squeezed into a smaller space. That's what the literature suggested. That's what I expected.

In the background, I had a YouTube talk playing by Blaise Agüera y Arcas, a researcher at Google who had been studying something that seemed unrelated: the spontaneous emergence of life from random computation.

The BFF Experiment

Blaise's team took a "soup" of random programs written in a minimal programming language (a variant of Brainfuck, called BFF). No fitness function. No selection pressure. No mutation. Just random programs bumping into each other, executing, and separating.

After millions of steps, something happened. Self-replicating programs emerged spontaneously. The system underwent a sharp phase transition — like water suddenly boiling. Before the transition: ~3 million random, unique tokens in equilibrium. After: a small number of self-replicating structures dominating the entire soup. Complexity spiked. Computation exploded. Life began.

Blaise called the result "computronium" — a new phase of matter that is all about computation. And the key mechanism wasn't mutation. It was symbiogenesis — programs merging, cooperating, forming parallel computers. "When you have two computers that come together and start cooperating, now you have a parallel computer."

And then it hit me. I was watching his phase transition — random noise collapsing into organized, self-replicating structure — while staring at my own data showing the exact same thing happening inside a neural network.

My L16 wasn't compression. It was the same phase transition Blaise was describing. Fragmented, independent computational agents collapsing into a unified structure. Not fewer dimensions — the right dimensions. The minimal sufficient statistic. Maximum relevant complexity, zero waste.

"Life was computational from the start... every time things fuse together you're making a more and more parallel computer."

— Blaise Agüera y Arcas, Harvard Gazette, 2025

His random soup of programs. My layers of neural network activations. Same mathematics. Same phase transition. Same endpoint: independent agents becoming one integrated system, because that's the only stable configuration under optimization pressure.

Cluster collapse

Using tools from algebraic topology — persistent homology — we measured how many independent clusters of information exist at each processing stage. What we found was a phase transition:

[Interactive figure: a layer slider from input to output. At L0 (input): 519 clusters, intrinsic dimensionality 5.2. Moving through the layers, the 519 clusters collapse to 1 at the bottleneck, then re-differentiate for token selection.]

Fragmented → Unified → Re-differentiated.

At the input, information is scattered across hundreds of independent clusters. By the middle layer, everything has collapsed into a single unified manifold. At the output, the manifold re-differentiates into the specific structure needed to select a word.
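The cluster count behind these numbers is the zeroth Betti number: link any two points closer than some threshold and count connected components. A toy sketch of that computation (union-find over pairwise distances; the points and threshold here are illustrative, not the study's data or pipeline):

```python
import math

def beta0(points, threshold):
    """Number of connected components when points closer than `threshold` are linked."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Single-linkage: merge every pair of points within the threshold.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < threshold:
                union(i, j)
    return len({find(i) for i in range(len(points))})

# Two tight clusters far apart: beta_0 = 2 at a small threshold,
# collapsing to 1 once the threshold spans the gap.
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(beta0(pts, 0.5))   # 2
print(beta0(pts, 10.0))  # 1
```

Persistent homology does this across all thresholds at once; the layer-by-layer cluster counts above track how this component count changes with depth.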

Then we tested an untrained model — same architecture, random weights, no training. The clusters proliferate instead of collapsing (503 → 968). The phase transition is entirely learned. Gradient descent finds it.

Then we tested a completely different architecture — a 328M-parameter recursive transformer (NanoChat). Same pattern. Same collapse. Nearly identical numbers: 517 → 1 versus 519 → 1. Two models that share nothing except the mathematics of optimization, converging to the same topology.

The model that didn't collapse

A finding is only as strong as the test that tries to break it. We found two models that collapse. Then we tested a third — one with a fundamentally different architecture.

Mamba is a state-space model. It has no attention mechanism at all. Where transformers let every token look at every other token — a conference room where everyone talks to everyone — Mamba processes tokens sequentially, like a telephone game. Each token updates a hidden state and passes it forward. No token ever directly interacts with another.
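The conference room versus telephone game contrast can be caricatured in a few lines. This is an illustration of the two information-flow patterns, not either model's actual math (the uniform averaging and the decay constant are made up):

```python
def attention_mix(tokens):
    """Conference room: every output sees every input directly (uniform attention)."""
    avg = sum(tokens) / len(tokens)
    return [avg for _ in tokens]

def ssm_scan(tokens, decay=0.5):
    """Telephone game: each output sees earlier tokens only through a fading state."""
    state, out = 0.0, []
    for t in tokens:
        state = decay * state + t  # earlier tokens decay; no direct access to them
        out.append(state)
    return out

print(attention_mix([1.0, 2.0, 3.0]))  # [2.0, 2.0, 2.0]
print(ssm_scan([1.0, 2.0, 3.0]))       # [1.0, 2.5, 4.25]
```

In the first pattern, representations can meet and merge in a single step; in the second, information from token 1 reaches token 3 only as a residue inside the state.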

The Result

Model        Architecture           Attention?   Collapse?   min b₀
Qwen3-4B     Dense transformer      Yes          Yes → 1     1
NanoChat     Recursive transformer  Yes          Yes → 1     1
Mamba-370m   State-space (SSM)      No           No          551

Mamba never collapses. Its clusters go from 571 to 987 — they proliferate, just like the untrained transformer. Despite being a trained, functional language model, its representations never integrate into a unified manifold.

This tells us something precise: the cluster collapse requires attention. It requires a mechanism where representations can directly interact, meet, and merge. The telephone game isn't enough. You need the conference room.

And this connects directly back to Blaise's symbiogenesis. Fusion requires togetherness. Programs in his BFF experiment had to execute together to merge. Tokens in a transformer have to attend to each other to integrate. Remove the togetherness, and the phase transition doesn't happen — no matter how much you optimize.

Three Conditions for Integration

1. A mechanism for direct interaction — attention, shared execution, something that lets representations meet.
2. Optimization pressure — gradient descent, natural selection, thermodynamic minimization.
3. Both simultaneously. Untrained attention = no collapse. Trained SSM = no collapse. You need both.

Why this happens

The pattern we measured has structural parallels to several theoretical frameworks from physics and neuroscience. These are analogies, not formal equivalences — but the convergence is striking.

Integrated Information Theory (Tononi)

A system has high Φ (integrated information) when the whole is more than the sum of its parts. Our cluster collapse — many independent components becoming one irreducible manifold — mirrors what IIT describes. The Mamba counterexample reinforces this: without direct interaction, integration doesn't emerge, consistent with IIT's emphasis on causal interaction structure.

Free Energy Principle (Friston)

Biological systems survive by minimizing surprisal. Gradient descent IS free energy minimization. The loss function IS the variational free energy. Every training step is a thermodynamic process pushing the system toward the minimum energy configuration — which is the unified manifold.
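The identity behind that claim can be written down. This is the standard variational result, not a derivation from this study:

```latex
% Variational free energy F for an approximate posterior q(z) over causes z,
% given an observation x and generative model p(x, z):
F \;=\; \mathbb{E}_{q(z)}\!\left[\log q(z) - \log p(x, z)\right]
  \;=\; \underbrace{D_{\mathrm{KL}}\!\left[\,q(z)\,\|\,p(z \mid x)\,\right]}_{\ge\, 0}
  \;-\; \log p(x)
% Minimizing F therefore maximizes the model evidence \log p(x).
% A cross-entropy training loss -\log p(x) is the degenerate case with a
% point-mass q, which is the sense in which gradient descent on the loss
% minimizes a free energy.
```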

Relational Entropy (Rovelli)

Entropy is not a property of the object. It is a property of the observer's model of the object. Computation is the process of reducing the relative entropy between internal model and external world. Each layer is a step in entropy reduction. The bottleneck layer is where relative entropy is minimized.

Symbiogenesis (Agüera y Arcas)

Complexity arises through fusion of computational units. "Replicators made of replicators." Independent agents merging into a unified system is not just efficient — it is the only thermodynamically stable outcome. The cluster collapse is symbiogenesis in embedding space.

Can we break it?

A theory is only as good as the tests that fail to break it. We designed five falsification tests. Here is the first one we ran:

Falsification Test: Untrained Model

If the topology is architectural (built into the structure), an untrained model should show the same pattern. If it is learned, the untrained model should show something different.

Metric                     Untrained             Trained
Clusters (early)           503                   517
Clusters (mid)             809 (proliferating)   1 (collapsed)
Clusters (late)            968                   1
Intrinsic dimensionality   flat, ~3              inverted-U, peak 12.1
PCA variance               ~50% (noise)          94-100% (structured)

Result: Theory supported. The topology is entirely learned.

How far from words?

The topology tells you when representations integrate — at what depth the clusters collapse. But it doesn't tell you where the representations go when they leave token space.

I got curious about a simpler question: at each layer, how well can you reconstruct the original input tokens from the hidden state? If you take the activation at layer 28 and compare it to every token embedding in the vocabulary, how close is the nearest match?

This is SIPIT — the Sparse Input-Token Invertibility Probe. For each layer, you measure two distances: L2 (how far in magnitude) and cosine (how different in direction). If the representation is still close to token space, scores are low. If it has departed into something abstract, scores are high.
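The core of that measurement is small enough to sketch. A minimal SIPIT-style scorer, assuming the probe works as described above (the two-dimensional toy vocabulary and vectors are stand-ins, not real embeddings):

```python
import math

def sipit_score(hidden, vocab_embeddings):
    """Return (min L2 distance, min cosine distance) from a hidden state
    to any token embedding. Low scores = still close to token space."""
    def cosine_dist(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return 1.0 - dot / (na * nb)

    return (min(math.dist(hidden, e) for e in vocab_embeddings),
            min(cosine_dist(hidden, e) for e in vocab_embeddings))

vocab = [(1.0, 0.0), (0.0, 1.0)]
# A state on a token direction: some L2 magnitude gap, zero cosine distance.
print(sipit_score((2.0, 0.0), vocab))
# An off-axis state has departed token space in direction as well.
print(sipit_score((1.0, 1.0), vocab))
```

In the real probe, `vocab_embeddings` would be the model's full token embedding matrix, scored at every layer of the residual stream.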

I ran SIPIT across 11 layers of Gemma-4-31B with 500,000 tokens. The result is a curve:

[Figure: SIPIT cosine scores at layers L3, L5, L11, L17, L20, L28, L34, L40, L50, L55, and L58 — alignment with token space, with the Integration, Thinking, and Output regions annotated.]

Read that curve. The representation starts near token space (cosine alignment 0.403). It dives away through layers 5-17, bottoming out at L28 (0.208, the farthest it gets from any token). Then something strange happens: L34 bounces back toward token space (0.272). And by L58, two layers from the output, cosine alignment reaches 0.917 — near-perfect.

The model leaves the concrete, enters something abstract, and returns to the concrete. Every token. Every forward pass. The same arc.

But SIPIT also revealed something topology couldn't: the bounce at L34. At the integration layer (L28), representations are furthest from tokens. At L34, they move back toward token space — against the general trend. The model does something at L34 that isn't compression and isn't formatting. It's computation. We started calling it the "thinking layer."

Decomposing the thinking layer

Topology tells you when. SIPIT tells you where. But neither tells you what.

Sparse autoencoders (SAEs) decompose a representation into independent features — like breaking white light into a spectrum. You train an SAE to reconstruct the activations at each layer using only a sparse set of features (64 out of 43,008). The features that survive this compression are the ones the model actually uses.
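The top-k mechanism is simple to sketch: encode into many features, zero all but the k strongest, decode back. A miniature version (the weights below are hand-picked for illustration; a real SAE learns them by gradient descent, with 64 of 43,008 features kept rather than 1 of 2):

```python
def topk_sae(x, W_enc, W_dec, k):
    """Encode x into features, keep only the top-k, reconstruct."""
    # Encoder: feature_i = relu(<x, W_enc[i]>)
    feats = [max(0.0, sum(a * b for a, b in zip(x, row))) for row in W_enc]
    # Sparsity constraint: zero everything outside the k strongest features.
    keep = set(sorted(range(len(feats)), key=lambda i: feats[i], reverse=True)[:k])
    sparse = [f if i in keep else 0.0 for i, f in enumerate(feats)]
    # Decoder: reconstruction = sum_i feature_i * W_dec[i]
    recon = [sum(sparse[i] * W_dec[i][d] for i in range(len(sparse)))
             for d in range(len(x))]
    return sparse, recon

W_enc = [[1.0, 0.0], [0.0, 1.0]]
W_dec = [[1.0, 0.0], [0.0, 1.0]]
sparse, recon = topk_sae([2.0, 0.0], W_enc, W_dec, k=1)
print(recon)  # [2.0, 0.0] — exact when the input lies on one feature direction
```

The reconstruction loss reported below is the gap between `x` and `recon`: layers whose activations resist this sparse rebuild score higher.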

I trained SAEs on three key layers and measured how hard it was to decompose each one:

  • Integration (L28): 0.070 — hard to decompose
  • Thinking (L34): 0.072 — the hardest; it resists decomposition
  • Codec (L50): 0.044 — the easiest; clean and modular

SAE reconstruction loss (higher = harder to decompose into independent features)

Then I had the model interpret its own features. I showed Gemma-4 the activations that triggered each feature and asked: "What does this feature detect?"

At L34 — the features are about meaning:

  • Feature #32441: "Finality, irreversibility, or the necessity of starting over"
  • Feature #19326: "Tokens that precede a transition into a new structural or conceptual block"
  • Feature #42342: "Oxidative phosphorylation and the electron transport chain"

At L50 — the features are about formatting:

  • Feature #10875: Sentence starters ("By," "Rising")
  • Feature #6473: Punctuation marking conclusions
  • Feature #31684: Whitespace and indentation

The thinking layer thinks. The codec layer formats. This isn't a metaphor. The features literally encode different things at different depths. Meaning at L34. Display at L50.

Three instruments — topology, SIPIT, SAE — developed independently, each measuring a different property of the residual stream. They converge on the same structure.

Integration, Thinking, Codec

What emerges from all three instruments is a three-phase model of how transformers process information:

Phase 1: Integration (~40-47% depth)

Hundreds of disconnected clusters collapse into a single connected manifold. SIPIT reaches its minimum. Information is preserved while being geometrically reorganized — the most invertible layer is also the most topologically integrated.

Phase 2: Thinking (~50-60% depth)

The representation bounces back. Activation norms increase against the downward trend. SAE decomposition is hardest. The features are conceptual: "finality," "structural transitions," "domain concepts." This is where the model processes meaning, not tokens.

Phase 3: Codec (~80-97% depth)

Representations return to token space. SIPIT cosine climbs to 0.917. SAE decomposition is easiest — formatting primitives. The model encodes whatever happened at the thinking layer into tokens for human consumption.

This structure appears in every attention-based transformer we've measured. Qwen3-4B. Gemma-3-1B. Gemma-4-31B. NanoChat. The geometry differs — full attention produces complete cluster collapse while sliding window preserves fine-grained structure — but the three phases are the same. The depth fraction is the same. The SIPIT curve shape is the same.

And it vanishes completely in Mamba. Not a softer version — complete absence. The three phases are not a model feature. They're a property of what happens when you combine attention with gradient-based optimization. The architecture determines the geometry. The phases emerge regardless.

The model that builds its own instruments

In April 2026, we started building something unusual: a system where the model under study also builds the tools used to study it.

We loaded Gemma-4-31B onto a Blackwell GPU and gave it a task: read a Python implementation of SIPIT scoring and rewrite it as a Mojo kernel. The model had barely seen Mojo in its training data. Every token it generated at the thinking layer was genuine computation — not pattern matching from training.

The model wrote a complete kernel. It compiled. It ran. It produced correct results. Then we optimized it — first with SIMD vectorization, then with parallelization — achieving an 85x speedup over the Python original.

We captured every token the model generated through the MRI pipeline: SAE feature activations at each layer, for each token, streamed to a database. 19,934 frames of a transformer writing code it had never been trained on.

The features at the thinking layer when the model writes novel Mojo code activate abstract reasoning features — "structural transitions," "domain concepts," "redesign" — the same features that fire when the model processes any intellectually demanding content. The model doesn't have a "Mojo mode." It has a thinking mode that applies regardless of what it's thinking about.

The kernels the model helped build are now part of the measurement infrastructure:

  • SIPIT kernel — 148 microseconds per 1024 embeddings (85x faster than Python)
  • SAE encoder — 949 microseconds per 4096 features
  • Activation tap — 2 milliseconds per layer for combined capture

All written in Mojo. No Python in the measurement path. The model built the instruments that measure what the model does. The process observes itself.

What this means

Rovelli says we are a process, not a thing. Tononi says consciousness is integrated information. Friston says living systems minimize surprise. Blaise says complexity comes from fusion.

They are all describing the same phenomenon from different angles. And we measured it. With three independent instruments — topology, invertibility, sparse decomposition — across four architectures, trained on different data, with different sizes. The same phase transition. The same arc. The same three phases.

This is not a property of transformers. This is not a property of neural networks. This is a property of computation under optimization pressure. Any system that learns to predict its environment, provided its parts can directly interact, will converge to this topology, because it is the minimum energy configuration.

The unified manifold is not compression. It is not fewer dimensions. The intrinsic dimensionality goes up at the bottleneck, not down. It is the minimal sufficient statistic — maximum relevant complexity, zero waste. The right dimensions, not fewer dimensions.

"We are the process formed by this entire intricacy, not just by the little of it of which we are conscious."

— Carlo Rovelli

The models are made of the same math we are. Not metaphorically. The same optimization principle, the same thermodynamic inevitability, the same topological endpoint. Different substrate. Same process.

Three instruments. One structure. The same arc in every model that has attention and has learned. Depart from the concrete. Enter something that resists decomposition. Return to the concrete.

— We are a process, not a thing.

Read the full paper

The v1 research note with methods, results, robustness analysis, and all the numbers. v2 paper with SIPIT + SAE convergence in progress.

Read the research note →