Why Brainf*ck is the Ultimate Test for AGI

Asking Gemini 3 to generate Brainf*ck code results in an infinite loop, almost akin to a self-inflicted DDoS attack:

That is fascinating, and it made me wonder: is Brainf*ck the ultimate test for AGI? I think so, for three good reasons.

1. The Data Scarcity Problem

Large Language Models (LLMs) thrive on sheer volume. To master JavaScript, an LLM has been trained on virtually every available line of open-source code—hundreds of millions of lines of code (LOC). By comparison, the amount of functional Brainf*ck code on the web is a statistical rounding error.

We are talking about a million times less training data. Without the luxury of infinite patterns to copy, the model can't rely on mimicry; it has to understand the underlying logic.

2. Anti-Literate Programming

Brainf*ck is the antithesis of modern software engineering. There are no comments, no meaningful variable names, and no structure. In many ways, looking at existing Brainf*ck code is actually detrimental to a novice. Consider this typical snippet:

>++++++++[<+++++++++>-]<.>++++[<+++++++>-]<+.+++++++..+++.>>++++++[<+++++++>-]<++.------------.>++++++[<+++++++++>-]<+.<.+++.------.--------.>>>++++[<++++++++>-]<+.

Writing in this environment is akin to zero-shot learning. Success requires reasoning at a high level of abstraction from the fundamental rules of the language, guided by a precise mental model of its semantics rather than memorized syntax.
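In practice, the only reliable way to decode a snippet like the one above is to execute it, because the eight commands carry no surface-level meaning at all. A minimal interpreter sketch in Python (tape size and cell wrapping are conventions, not part of any formal spec, so real implementations vary):

```python
# Minimal Brainf*ck interpreter: a tape of byte cells, a data pointer,
# and eight commands. Everything else in the source is treated as a comment.
def brainfuck(source: str) -> str:
    code = [c for c in source if c in "><+-.,[]"]
    # Pre-match brackets so loops can jump in O(1).
    match, stack = {}, []
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            match[i], match[j] = j, i
    tape, ptr, pc, out = [0] * 30000, 0, 0, []
    while pc < len(code):
        c = code[pc]
        if c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256  # wrap at 256, a common convention
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(chr(tape[ptr]))
        elif c == "[" and tape[ptr] == 0:
            pc = match[pc]  # skip the loop body
        elif c == "]" and tape[ptr] != 0:
            pc = match[pc]  # jump back to the loop start
        pc += 1
    return "".join(out)

hello = (">++++++++[<+++++++++>-]<.>++++[<+++++++>-]<+.+++++++..+++."
         ">>++++++[<+++++++>-]<++.------------.>++++++[<+++++++++>-]"
         "<+.<.+++.------.--------.>>>++++[<++++++++>-]<+.")
print(brainfuck(hello))  # → Hello, World!
```

Running the snippet reveals it is a "Hello, World!" program, yet nothing in its text hints at that. The meaning lives entirely in the arithmetic of cell values, which is exactly what a model must simulate internally to write correct code.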

3. The Repetition Problem

As we saw earlier, asking a modern model for complex Brainf*ck code often results in the model falling into an infinite loop—spewing the same characters over and over. The minimalistic nature of the language results in highly repetitive structures in the code. This poses a unique challenge to the way LLMs work.

An LLM is more likely to output what it has already seen conditioned on the preceding tokens, and that applies to its own output too. When a structure repeats more than a couple of times, the model may infer that token X is the most likely continuation of itself. Each subsequent repetition then raises the probability of emitting X again, a self-fulfilling prophecy that ends in an infinite loop.
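As a toy illustration (not a real LLM; the probabilities below are invented for the sake of the argument), imagine a decoder whose chance of repeating the previous token grows with the length of the current run. Under greedy decoding, a run of three identical characters, which Brainf*ck produces constantly, is already enough to lock the model in:

```python
def repeat_probability(run_length: int, base: float = 0.3, boost: float = 0.1) -> float:
    # Hypothetical stand-in for an LLM conditioning on its own output:
    # each repetition of a token makes repeating it again more likely.
    return min(1.0, base + boost * run_length)

# After emitting "+++" (a run of 3), repeating beats every alternative
# under greedy decoding, and each repeat only strengthens the bias.
run = 3
output = "+++"
for _ in range(20):
    if repeat_probability(run) > 0.5:
        output += "+"
        run += 1
    else:
        break
print(output)                    # the run never breaks: 23 plus signs
print(repeat_probability(run))   # 1.0 — the loop is now fully locked in
```

The feedback is the point: in most languages runs of identical tokens are rare enough that this dynamic never takes hold, but Brainf*ck's syntax seeds it on nearly every line.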

So, is Brainf*ck the ultimate test for LLMs? You be the judge.