What We Know About How LLMs Actually Work
And What We Don't Know Yet, And Might Never Find Out
I've been in IT long enough to remember when we were proud if a server survived a week without crashing. I thought I had seen it all: databases, networks, an ERP system I built myself back in the early 2000s, migrations straight out of nightmares.
Then Large Language Models arrived. And suddenly, all that "complex" IT looked almost cute, like Duplo blocks next to LEGO Technic.
The paradox is this: these AI systems can write poetry, summarise contracts, explain quantum physics to your teenager - and yet we don't really know how they do it. At least, not fully. And still, just last week, they crossed one billion active users. The whole world is using something that kind of works, but which even its creators don't entirely understand.
The uncomfortable truth about our AI revolution
A close friend of mine lost her job this spring - replaced by AI. OK, she worked as a translator, in an industry she always knew was endangered, but still. It was the first time the disruption wasn't just in a news article but sitting at my kitchen table.
The thing is, we've built astonishing machines that replace human work, yet our understanding of them is as primitive as medieval medicine. Back then, doctors knew blood circulated, but thought the body was powered by "humours." They saw the effects, but not the mechanism. That's where we are with AI at the moment.
Classical software? Predictable. You write rules, the machine follows. Break something, you attach a debugger and step through line by line.
LLMs? Forget it. They aren't programmed in the usual sense. They're grown. We feed them oceans of text, adjust billions of invisible dials (parameters), and intelligence emerges.
Imagine trying to teach someone cooking, not by giving them recipes, but by feeding them thousands of meals. "Here's lasagne, here's sushi, here's ćevapi. Good luck!" After enough exposure, they might recreate the dishes. That's how LLMs "learn".
Mechanistic interpretability: anatomy of a digital brain
There's now a whole field called mechanistic interpretability - basically digital anatomy.
Think of it this way: you want to understand how Windows works, but you don't have the source code. Only the binary. Now imagine that binary is hundreds of billions of seemingly random numbers. That's what researchers stare at when they open up a model like GPT-4.
They poke, prod, and map. Like Renaissance anatomists dissecting cadavers and discovering organs they didn't know existed. Sometimes, they stumble on strange little circuits inside the model, the way an old anatomist might discover a nerve that goes nowhere obvious.
Transformers: from steam engines to jet turbines
The architecture behind modern AI is called the transformer. Google researchers unveiled it in 2017, in a paper with the modest title "Attention Is All You Need".
If earlier neural networks were steam engines - clunky, smoky, limited - transformers are jet turbines. Same goal (movement), utterly different design, exponentially more power.
The journey looks like this:
- Text becomes tokens (numbers for words or parts of words).
- Those numbers flow through dozens or hundreds of transformer layers.
- Each layer performs mind-bending matrix maths.
- Out comes text, one token at a time.
It's autoregressive. Think of it as autocomplete - but if autocomplete had read all of Wikipedia, your email archives, and the entire history of German literature.
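To make that loop concrete, here's a minimal sketch of the token-in, token-out cycle. It leans on the Hugging Face transformers library and uses GPT-2 purely as a convenient, small stand-in for any autoregressive model; the prompt and the five-token limit are arbitrary choices for illustration.

```python
# A minimal sketch of autoregressive generation: text -> tokens -> layers -> one token at a time.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The capital of Hesse is"
input_ids = tokenizer(text, return_tensors="pt").input_ids   # text becomes tokens (numbers)

with torch.no_grad():
    for _ in range(5):                                        # generate five tokens, one at a time
        logits = model(input_ids).logits                      # run all transformer layers
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # pick the most likely next token
        input_ids = torch.cat([input_ids, next_id], dim=-1)      # feed it back in and repeat

print(tokenizer.decode(input_ids[0]))                         # tokens become text again
```

Greedy argmax is the simplest possible decoding; real chatbots usually sample with a temperature, but the loop itself stays the same.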
Attention: the secret sauce
So, what makes transformers so powerful? Attention.
When you read a novel and stumble on the name "Elizabeth", your mind jumps back: "Ah yes, Elizabeth from chapter three." You weigh old information against the new. That's what attention does: each token looks back at all others and decides which matter most.
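Stripped of multiple heads, masking, and learned projections, the core calculation is only a few lines. Here's a bare-bones sketch of scaled dot-product attention with made-up dimensions; a real decoder also applies a causal mask so tokens can only look backwards.

```python
# Scaled dot-product attention in miniature: every token scores every other token,
# then mixes their value vectors according to those scores.
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (sequence_length, head_dim)
    scores = q @ k.T / (q.shape[-1] ** 0.5)   # how strongly each token "looks at" the others
    weights = F.softmax(scores, dim=-1)       # normalise the scores into attention weights
    return weights @ v                        # weighted mix of the value vectors

seq_len, head_dim = 6, 16
q, k, v = (torch.randn(seq_len, head_dim) for _ in range(3))
print(attention(q, k, v).shape)               # torch.Size([6, 16]): one updated vector per token
```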
Researchers discovered that certain "attention heads" specialise:
- Induction heads: pattern matchers. If the model saw "A → B" earlier in the text and now sees A again, it predicts B. That's why LLMs can pick up a pattern from just a few examples.
- IOI (indirect object identification) circuits: more complex teamwork. Example: "John and Mary went to the café. John bought a drink for…" The model answers "Mary." Behind that is a detective-like collaboration of heads: one tracks names, another watches context, another copies sequences.
It's like watching a football team. One player marks the striker, another covers the wings, another tracks back. No single player has the full picture, but together they defend the goal.
Remember: nobody coded these behaviours. They emerged.
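To make the induction pattern concrete, here's a hand-written toy version of the rule those heads learn: a plain lookup table. The real circuit implements this with attention weights rather than a Python dict, and nobody writes code like this into the model - that's precisely the point.

```python
# A toy stand-in for induction-head behaviour:
# "if the pattern A -> B appeared earlier, and A appears again, predict B".
def induction_predict(tokens):
    seen_after = {}                             # what followed each token last time
    for prev, nxt in zip(tokens, tokens[1:]):
        seen_after[prev] = nxt
    return seen_after.get(tokens[-1])           # prediction for the next token, if any

print(induction_predict(["Mr", "Dursley", "said", "Mr"]))   # -> "Dursley"
```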
The quiet giants: MLPs
Between these attention layers sit MLPs: Multi-Layer Perceptrons. Boring name, but they hold roughly two-thirds of the model's parameters - and with them, much of its stored knowledge.
Think of them as giant associative memory banks:
- Keys: "Are we talking about cooking? Formal grammar? Medieval history?"
- Values: "Boost food-related words. Use polite phrasing. Mention knights."
Thousands of them fire in parallel. The bigger the model, the more of these "memories" it can juggle. That's why GPT-4 feels more knowledgeable than GPT-3.
Think of MLPs like an enormous library with librarians who specialise in odd topics. One librarian knows "French wine references," another "C++ error messages." When a query comes in, dozens of librarians throw suggestions onto the pile at once. The final answer is the combination of their votes.
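In code, one of these blocks is surprisingly small. The sketch below uses GPT-2-sized dimensions (768 wide, expanded to 3072) purely for illustration, and the "keys/values" naming follows the memory analogy above rather than any official API.

```python
# A transformer MLP block in miniature: the first matrix acts as "keys"
# (which patterns does this token match?), the second as "values"
# (what gets written back into the token's representation when a key fires?).
import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    def __init__(self, d_model=768, d_hidden=3072):
        super().__init__()
        self.keys = nn.Linear(d_model, d_hidden)     # thousands of pattern detectors
        self.act = nn.GELU()
        self.values = nn.Linear(d_hidden, d_model)   # what each detector contributes back

    def forward(self, x):
        return self.values(self.act(self.keys(x)))

block = MLPBlock()
token_vector = torch.randn(1, 768)
print(block(token_vector).shape)                     # torch.Size([1, 768])
```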
From nonsense to competence: the training journey
When training starts, output is gibberish. Random text. But gradually, patterns emerge.
- Stage 1: Basic word statistics.
- Stage 2: Pattern-matching heads appear.
- Stage 3: Understanding concepts and relationships.
- Stage 4: Complex reasoning.
Sometimes learning happens in leaps - "phase transitions". Like a child who struggles with the bike, falls fifty times, then suddenly pedals smoothly.
And then there's grokking. A model memorises answers without understanding the rule. Much later, it suddenly generalises. Like a student who passes an exam by rote memorisation, then months later blurts out, "Ohhh, that's how it works."
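What's striking is that all of these stages - and grokking - fall out of the same plain loop: predict the next token, measure the error, nudge the parameters, repeat. Here's a deliberately tiny, hypothetical version of that loop; the model, vocabulary, and data are toys, but the structure is what real training runs repeat billions of times.

```python
# A toy next-token training loop: the same recipe behind every stage above.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, d_model),   # tokens -> vectors
                      nn.Linear(d_model, vocab_size))      # vectors -> next-token scores
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (8, 33))             # a batch of random toy sequences
inputs, targets = tokens[:, :-1], tokens[:, 1:]            # predict each next token

for step in range(100):                                    # real runs: billions of steps
    logits = model(inputs)                                 # (batch, seq, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                       # nudge the invisible dials
```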
Superposition: too many ideas, too few neurons
Here's another puzzle: LLMs store more concepts than they have neurons. The trick is superposition.
One neuron might represent "cheese", "Renaissance art", and "C# debugging." Ridiculous? Not really. Because those rarely appear together, it's safe to reuse the same slot.
The downside: this makes neurons messy and "polysemantic."
Imagine your office has only 10 meeting rooms, but 100 teams. You tell the teams: "You can share rooms as long as you never meet at the same time." Suddenly, Room 7 is used for "HR strategy," "marketing brainstorms," and "karaoke practice." That's superposition.
Researchers now use sparse autoencoders to clean this up. These tools untangle neurons into clear, human-readable "features." Anthropic did this with Claude and found millions of interpretable concepts: "Golden Gate Bridge," "deception," "Python errors." Flip the right feature, and the model starts talking about bridges - or lying.
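A sparse autoencoder is, at heart, a small model trained on another model's activations: expand them into many more candidate features than there are neurons, penalise features for firing too often, then reconstruct the original. The sketch below shows that shape with illustrative sizes (not Anthropic's) and without the surrounding training loop.

```python
# A sparse autoencoder in miniature: untangle superposed neurons into sparse features.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_features=16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)   # activations -> candidate features
        self.decoder = nn.Linear(d_features, d_model)   # features -> reconstructed activation

    def forward(self, activation):
        features = torch.relu(self.encoder(activation)) # only a few should fire per input
        return self.decoder(features), features

sae = SparseAutoencoder()
activations = torch.randn(32, 768)                      # a batch of residual-stream vectors
reconstruction, features = sae(activations)
loss = ((reconstruction - activations) ** 2).mean() + 1e-3 * features.abs().mean()
# The L1 penalty is what forces each feature to fire rarely - and therefore to mean something.
```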
Debugging thoughts: attribution graphs
The latest advance is attribution graphs. These map how information flows inside the model.
Ask: "What's the capital of the state containing Frankfurt?"
- "Frankfurt" lights up.
- "Hesse" lights up.
- "Capital" lights up.
- Out comes: "Wiesbaden."
Block the "Frankfurt" feature, and suddenly it might say "Munich."
For the first time, we can attach something like a debugger to an AI's thought process.
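Mechanically, the intervention behind this kind of debugging is an ablation: run the model, silence one internal feature, run it again, and compare the answers. The sketch below uses a standard PyTorch forward hook; model, layer, and feature_index are placeholders for illustration, not a real attribution-graph API.

```python
# A hedged sketch of feature ablation: zero out one internal feature and rerun the model.
import torch

def ablate_feature(model, layer, feature_index, input_ids):
    def zero_feature(module, inputs, output):
        output[..., feature_index] = 0.0        # silence the "Frankfurt" feature, say
        return output

    handle = layer.register_forward_hook(zero_feature)
    try:
        with torch.no_grad():
            logits = model(input_ids).logits    # assumes a Hugging Face-style causal LM
    finally:
        handle.remove()                         # always restore the untouched model

    return logits

# baseline = model(input_ids).logits
# ablated  = ablate_feature(model, model.transformer.h[10].mlp, 1234, input_ids)
# Comparing the two tells you how much that feature contributed to "Wiesbaden".
```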
What we know - and what we don't
So far, the rough picture is:
- Text becomes numbers.
- Attention finds context.
- Specialised circuits solve sub-tasks.
- MLPs act as associative memories.
- Layer after layer builds reasoning.
But the open questions are huge:
- Do models understand, or just predict?
- Why does scaling up make them so much better?
- What really happens during those sudden learning leaps?
Why this matters
This isn't academic daydreaming. These systems already shape our daily lives and boardroom strategies.
- Safety: you can't secure what you don't understand.
- Efficiency: these models burn electricity like data centres in July.
- Trust: regulators and customers won't accept "just trust us."
- Strategy: CTOs can't build roadmaps on black boxes.
The part we can't explain: why does this even work?
Because the unsettling part is, we don't really know why this works. Yes, we can describe what's happening - billions of parameters tuned by statistical gradients, layers of attention and memory, circuits that emerge out of nowhere. But the central question - why does simply scaling these models up suddenly produce such rich behaviour? - that remains unanswered.
We don't have a first-principles theory of why language models "click" into "intelligence". Why does feeding them more text, more parameters, and more GPU time make them not just incrementally better, but qualitatively different? Why does a model go from gibberish to grammar, from parroting phrases to demonstrating reasoning, without anyone explicitly designing those steps? Nobody knows. It's like planting a tree, watering it, and then being surprised that it not only grows leaves, but also starts playing chess with you.
Researchers call this pattern the scaling laws. Make the model bigger, give it more data, and it gets more capable. Not linearly, but in sudden leaps, almost as if hidden doors are opening inside. This is deeply frustrating to scientists - because it means we can predict that bigger models will be smarter, but we can't explain why intelligence emerges this way. We can trace some circuits after the fact, like archaeologists brushing dust off ancient ruins, but we can't design them from scratch.
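The law itself is almost embarrassingly simple: predicted loss falls as a smooth power law in model size. A minimal sketch, using constants of the order reported by Kaplan et al. (2020) - treat them as illustrative rather than exact:

```python
# The empirical shape of a scaling law: loss as a power law in parameter count.
# Constants are roughly those reported by Kaplan et al. (2020); illustrative only.
def loss_from_parameters(n_params, n_c=8.8e13, alpha=0.076):
    return (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} parameters -> predicted loss {loss_from_parameters(n):.2f}")
```

The irony is that this curve is perfectly smooth; the sudden leaps show up in downstream capabilities, not in the loss itself, which is exactly why they keep catching everyone by surprise.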
And that's where the unease comes from. Classical engineering works because we understand cause and effect. We know why a bridge holds or why a database query runs. With LLMs, we know how to build them, but not why the result behaves the way it does. It's as if we've discovered fire: we can make it, we can use it, but we don't really understand its chemistry yet. We stand in front of a system that produces reasoning, creativity, even flashes of humour - and we're still unable to say why a series of floating-point numbers arranged in just the right way suddenly starts acting like an "intelligent" mind.
And what about AGI, you ask?
There's a canyon we don't talk about enough - the gap between today's Large Language Models and what we imagine as Artificial General Intelligence (AGI). Current LLMs are dazzlingly good at patterns, language, and problem-solving, but they are still, at their core, extremely sophisticated prediction machines. They don't "know" in the human sense. They don't form intentions, build mental models of the world, or understand consequences. They stitch words together in statistically likely sequences - and remarkably, that already looks like reasoning, sometimes even creativity.
AGI, on the other hand, is the holy grail: a machine that can think across domains, reason flexibly, learn new concepts without mountains of training data, and operate with the kind of adaptability that comes naturally to humans. In other words, not just a master of pattern-matching, but a general-purpose mind.
Here's where the debate rages. Some of the field's most influential leaders - Sam Altman at OpenAI, Demis Hassabis at DeepMind, Dario Amodei at Anthropic - believe we may be closer than we think. They see scale as the hidden key. Make the models bigger, feed them more data, throw more compute at them, and new capabilities emerge. Not gradually, but in leaps. Altman has even gone so far as to say his team now "knows how to build AGI as we have traditionally understood it." Amodei suggests human-level AI could appear as soon as 2026 or 2027. To them, AGI is not a distant dream but the natural continuation of what we're already seeing with scaling laws: intelligence as an emergent property of size and training.
But many others remain unconvinced. Apple's AI group, for instance, has been surprisingly blunt: today's LLMs, no matter how big, fail at consistent reasoning, stumble on algorithmic tasks, and expose their lack of true understanding when pressed. These aren't just bugs; Apple researchers call them fundamental barriers. In other words: you don't reach AGI just by pumping more steroids into the same architecture.
Academia echoes this skepticism. A recent survey by AAAI showed that three out of four AI experts believe simply scaling transformers will not yield AGI. Yes, the models impress us, but they are still bounded by their statistical nature. Without a breakthrough in how we model reasoning, memory, and understanding, we may just be building ever-larger parrots: cleverer ones, but parrots nonetheless.
So, where are we? The optimists see AGI shimmering on the horizon, reachable within a decade if we just keep scaling. The skeptics see a mirage, warning that we'll march toward it for years, only to discover that more size doesn't equal more mind. And somewhere in between lies today's uneasy reality: we've built something that feels intelligent enough to shake industries and politics, but we still don't know if it's a stepping stone to AGI, or a dead end.
In the end…
We've built machines that work (astonishingly well!), but which we don't fully understand. They don't run algorithms; they discover them. They're not databases; they're compression engines for human knowledge that sometimes act like reasoning minds.
Each new technique - induction heads, sparse autoencoders, attribution graphs - is a telescope into this universe. Each discovery is another constellation in a sky we're still charting.
And yet, for all their brilliance, these systems are not AGI. They look intelligent, they act intelligent, but they are still bounded by what they've been trained on. Whether they will ever cross the threshold into true general intelligence is something nobody can answer today - optimists and skeptics alike are still debating if that line is even reachable with our current approaches.
So, when you talk with ChatGPT or Mistral, remember: you're not speaking to a traditional program. You're conversing with something that learned language like a child: through immersion, trial, and sudden leaps of intuition, but not something that "understands" the world the way you and I do.