What We Know About How LLMs Actually Work
And What We Don't Know Yet, And Might Never Find Out
I've been in IT long enough to remember when we were proud if a server survived a week without crashing. I thought I had seen it all: databases, networks, an ERP system I built myself back in the early 2000s, migrations straight out of nightmares.
Then Large Language Models arrived. And suddenly, all that "complex" IT looked almost cute, like Duplo blocks next to LEGO Technic.
The paradox is this: these AI systems can write poetry, summarise contracts, explain quantum physics to your teenager - and yet we don't really know how they do it. At least, not fully. And still, just last week, they crossed one billion active users. The whole world is using something that kind of works, but which even its creators don't entirely understand.
The uncomfortable truth about our AI revolution
A close friend of mine lost her job this spring - replaced by AI. OK, she worked as a translator, in an industry she always knew was endangered, but still. It was the first time the disruption wasn't just in a news article but sitting at my kitchen table.
The thing is, we've built astonishing machines that replace human work, yet our understanding of them is as primitive as medieval medicine. Back then, doctors knew blood circulated, but thought the body was powered by "humours." They saw the effects, but not the mechanism. That's where we are with AI at the moment.
Classical software? Predictable. You write rules, the machine follows. Break something, you attach a debugger and step through line by line.
LLMs? Forget it. They aren't programmed in the usual sense. They're grown. We feed them oceans of text, adjust billions of invisible dials (parameters), and intelligence emerges.
Imagine trying to teach someone cooking, not by giving them recipes, but by feeding them thousands of meals. "Here's lasagne, here's sushi, here's ćevapi. Good luck!" After enough exposure, they might recreate the dishes. That's how LLMs "learn".
Mechanistic interpretability: anatomy of a digital brain
There's now a whole field called mechanistic interpretability - basically digital anatomy.
Think of it this way: you want to understand how Windows works, but you don't have the source code. Only the binary. Now imagine that binary is hundreds of billions of seemingly random numbers. That's what researchers stare at when they open up a model like GPT-4.
They poke, prod, and map. Like Renaissance anatomists dissecting cadavers and discovering organs they didn't know existed. Sometimes, they stumble on strange little circuits inside the model, the way an old anatomist might discover a nerve that goes nowhere obvious.
Transformers: from steam engines to jet turbines
The architecture behind modern AI is called the transformer. Google researchers unveiled it in 2017, in a paper with the modest title "Attention Is All You Need".
If earlier neural networks were steam engines - clunky, smoky, limited - transformers are jet turbines. Same goal (movement), utterly different design, exponentially more power.
The journey looks like this:
- Text becomes tokens (numbers for words or parts of words).
- Those numbers flow through dozens or hundreds of transformer layers.
- Each layer performs mind-bending matrix maths.
- Out comes text, one token at a time.
It's autoregressive. Think of it as autocomplete - but if autocomplete had read all of Wikipedia, your email archives, and the entire history of German literature.
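To make that loop concrete, here's a minimal sketch of the token-in, token-out cycle. It leans on the Hugging Face transformers library and uses GPT-2 purely as a convenient, small stand-in for any autoregressive model; the prompt and the five-token limit are arbitrary choices for illustration.

```python
# A minimal sketch of autoregressive generation: text -> tokens -> layers -> one token at a time.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The capital of Hesse is"
input_ids = tokenizer(text, return_tensors="pt").input_ids   # text becomes tokens (numbers)

with torch.no_grad():
    for _ in range(5):                                        # generate five tokens, one at a time
        logits = model(input_ids).logits                      # run all transformer layers
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # pick the most likely next token
        input_ids = torch.cat([input_ids, next_id], dim=-1)      # feed it back in and repeat

print(tokenizer.decode(input_ids[0]))                         # tokens become text again
```

Greedy argmax is the simplest possible decoding; real chatbots usually sample with a temperature, but the loop itself stays the same.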
Attention: the secret sauce
So, what makes transformers so powerful? Attention.
When you read a novel and stumble on the name "Elizabeth", your mind jumps back: "Ah yes, Elizabeth from chapter three." You weigh old information against the new. That's what attention does: each token looks back at all others and decides which matter most.
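Stripped of multiple heads, masking, and learned projections, the core calculation is only a few lines. Here's a bare-bones sketch of scaled dot-product attention with made-up dimensions; a real decoder also applies a causal mask so tokens can only look backwards.

```python
# Scaled dot-product attention in miniature: every token scores every other token,
# then mixes their value vectors according to those scores.
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (sequence_length, head_dim)
    scores = q @ k.T / (q.shape[-1] ** 0.5)   # how strongly each token "looks at" the others
    weights = F.softmax(scores, dim=-1)       # normalise the scores into attention weights
    return weights @ v                        # weighted mix of the value vectors

seq_len, head_dim = 6, 16
q, k, v = (torch.randn(seq_len, head_dim) for _ in range(3))
print(attention(q, k, v).shape)               # torch.Size([6, 16]): one updated vector per token
```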
Researchers discovered that certain "attention heads" specialise:
- Induction heads: pattern matchers. If the model saw "A → B" earlier in the text and now sees A again, it predicts B. That's why LLMs can pick up a pattern from just a few examples.
- IOI (indirect object identification) circuits: more complex teamwork. Example: "John and Mary went to the café. John bought a drink for…" The model answers "Mary." Behind that is a detective-like collaboration of heads: one tracks names, another watches context, another copies sequences.
It's like watching a football team. One player marks the striker, another covers the wings, another tracks back. No single player has the full picture, but together they defend the goal.
Remember: nobody coded these behaviours. They emerged.
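To make the induction pattern concrete, here's a hand-written toy version of the rule those heads learn: a plain lookup table. The real circuit implements this with attention weights rather than a Python dict, and nobody writes code like this into the model - that's precisely the point.

```python
# A toy stand-in for induction-head behaviour:
# "if the pattern A -> B appeared earlier, and A appears again, predict B".
def induction_predict(tokens):
    seen_after = {}                             # what followed each token last time
    for prev, nxt in zip(tokens, tokens[1:]):
        seen_after[prev] = nxt
    return seen_after.get(tokens[-1])           # prediction for the next token, if any

print(induction_predict(["Mr", "Dursley", "said", "Mr"]))   # -> "Dursley"
```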
The quiet giants: MLPs
Between these attention layers sit MLPs: Multi-Layer Perceptrons. Boring name, but they hold roughly two-thirds of the model's parameters - and with them, much of its stored knowledge.
Think of them as giant associative memory banks:
- Keys: "Are we talking about cooking? Formal grammar? Medieval history?"
- Values: "Boost food-related words. Use polite phrasing. Mention knights."
Thousands of them fire in parallel. The bigger the model, the more of these "memories" it can juggle. That's why GPT-4 feels more knowledgeable than GPT-3.
Think of MLPs like an enormous library with librarians who specialise in odd topics. One librarian knows "French wine references," another "C++ error messages." When a query comes in, dozens of librarians throw suggestions onto the pile at once. The final answer is the combination of their votes.
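In code, one of these blocks is surprisingly small. The sketch below uses GPT-2-sized dimensions (768 wide, expanded to 3072) purely for illustration, and the "keys/values" naming follows the memory analogy above rather than any official API.

```python
# A transformer MLP block in miniature: the first matrix acts as "keys"
# (which patterns does this token match?), the second as "values"
# (what gets written back into the token's representation when a key fires?).
import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    def __init__(self, d_model=768, d_hidden=3072):
        super().__init__()
        self.keys = nn.Linear(d_model, d_hidden)     # thousands of pattern detectors
        self.act = nn.GELU()
        self.values = nn.Linear(d_hidden, d_model)   # what each detector contributes back

    def forward(self, x):
        return self.values(self.act(self.keys(x)))

block = MLPBlock()
token_vector = torch.randn(1, 768)
print(block(token_vector).shape)                     # torch.Size([1, 768])
```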
From nonsense to competence: the training journey
When training starts, output is gibberish. Random text. But gradually, patterns emerge.
- Stage 1: Basic word statistics.
- Stage 2: Pattern-matching heads appear.
- Stage 3: Understanding concepts and relationships.
- Stage 4: Complex reasoning.
Sometimes learning happens in leaps - "phase transitions". Like a child who struggles with the bike, falls fifty times, then suddenly pedals smoothly.
And then there's grokking. A model memorises answers without understanding the rule. Much later, it suddenly generalises. Like a student who passes an exam by rote memorisation, then months later blurts out, "Ohhh, that's how it works."
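What's striking is that all of these stages - and grokking - fall out of the same plain loop: predict the next token, measure the error, nudge the parameters, repeat. Here's a deliberately tiny, hypothetical version of that loop; the model, vocabulary, and data are toys, but the structure is what real training runs repeat billions of times.

```python
# A toy next-token training loop: the same recipe behind every stage above.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, d_model),   # tokens -> vectors
                      nn.Linear(d_model, vocab_size))      # vectors -> next-token scores
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (8, 33))             # a batch of random toy sequences
inputs, targets = tokens[:, :-1], tokens[:, 1:]            # predict each next token

for step in range(100):                                    # real runs: billions of steps
    logits = model(inputs)                                 # (batch, seq, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                       # nudge the invisible dials
```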
Superposition: too many ideas, too few neurons
Here's another puzzle: LLMs store more concepts than they have neurons. The trick is superposition.
One neuron might represent "cheese", "Renaissance art", and "C# debugging." Ridiculous? Not really. Because those rarely appear together, it's safe to reuse the same slot.
The downside: this makes neurons messy and "polysemantic."
Imagine your office has only 10 meeting rooms, but 100 teams. You tell the teams: "You can share rooms as long as you never meet at the same time." Suddenly, Room 7 is used for "HR strategy," "marketing brainstorms," and "karaoke practice." That's superposition.
Researchers now use sparse autoencoders to clean this up. These tools untangle neurons into clear, human-readable "features." Anthropic did this with Claude and found millions of interpretable concepts: "Golden Gate Bridge," "deception," "Python errors." Flip the right feature, and the model starts talking about bridges - or lying.
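A sparse autoencoder is, at heart, a small model trained on another model's activations: expand them into many more candidate features than there are neurons, penalise features for firing too often, then reconstruct the original. The sketch below shows that shape with illustrative sizes (not Anthropic's) and without the surrounding training loop.

```python
# A sparse autoencoder in miniature: untangle superposed neurons into sparse features.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_features=16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)   # activations -> candidate features
        self.decoder = nn.Linear(d_features, d_model)   # features -> reconstructed activation

    def forward(self, activation):
        features = torch.relu(self.encoder(activation)) # only a few should fire per input
        return self.decoder(features), features

sae = SparseAutoencoder()
activations = torch.randn(32, 768)                      # a batch of residual-stream vectors
reconstruction, features = sae(activations)
loss = ((reconstruction - activations) ** 2).mean() + 1e-3 * features.abs().mean()
# The L1 penalty is what forces each feature to fire rarely - and therefore to mean something.
```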
Debugging thoughts: attribution graphs
The latest advance is attribution graphs. These map how information flows inside the model.
Ask: "What's the capital of the state containing Frankfurt?"
- "Frankfurt" lights up.
- "Hesse" lights up.
- "Capital" lights up.
- Out comes: "Wiesbaden."
Block the "Frankfurt" feature, and suddenly it might say "Munich."
For the first time, we can attach something like a debugger to an AI's thought process.
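Mechanically, the intervention behind this kind of debugging is an ablation: run the model, silence one internal feature, run it again, and compare the answers. The sketch below uses a standard PyTorch forward hook; model, layer, and feature_index are placeholders for illustration, not a real attribution-graph API.

```python
# A hedged sketch of feature ablation: zero out one internal feature and rerun the model.
import torch

def ablate_feature(model, layer, feature_index, input_ids):
    def zero_feature(module, inputs, output):
        output[..., feature_index] = 0.0        # silence the "Frankfurt" feature, say
        return output

    handle = layer.register_forward_hook(zero_feature)
    try:
        with torch.no_grad():
            logits = model(input_ids).logits    # assumes a Hugging Face-style causal LM
    finally:
        handle.remove()                         # always restore the untouched model

    return logits

# baseline = model(input_ids).logits
# ablated  = ablate_feature(model, model.transformer.h[10].mlp, 1234, input_ids)
# Comparing the two tells you how much that feature contributed to "Wiesbaden".
```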
What we know - and what we don't
So far, the rough picture is:
- Text becomes numbers.
- Attention finds context.
- Specialised circuits solve sub-tasks.
- MLPs act as associative memories.
- Layer after layer builds reasoning.
But the open questions are huge:
- Do models understand, or just predict?
- Why does scaling up make them so much better?
- What really happens during those sudden learning leaps?
Why this matters
This isn't academic daydreaming. These systems already shape our daily lives and boardroom strategies.
- Safety: you can't secure what you don't understand.
- Efficiency: these models burn electricity like data centres in July.
- Trust: regulators and customers won't accept "just trust us."
- Strategy: CTOs can't build roadmaps on black boxes.
The part we can't explain: why does this even work?
Because the unsettling part is, we don't really know why this works. Yes, we can describe what's happening - billions of parameters tuned by statistical gradients, layers of attention and memory, circuits that emerge out of nowhere. But the central question - why does simply scaling these models up suddenly produce such rich behaviour? - that remains unanswered.
We don't have a first-principles theory of why language models "click" into "intelligence". Why does feeding them more text, more parameters, and more GPU time make them not just incrementally better, but qualitatively different? Why does a model go from gibberish to grammar, from parroting phrases to demonstrating reasoning, without anyone explicitly designing those steps? Nobody knows. It's like planting a tree, watering it, and then being surprised that it not only grows leaves, but also starts playing chess with you.
Researchers call this pattern the scaling laws. Make the model bigger, give it more data, and it gets more capable. Not linearly, but in sudden leaps, almost as if hidden doors are opening inside. This is deeply frustrating to scientists - because it means we can predict that bigger models will be smarter, but we can't explain why intelligence emerges this way. We can trace some circuits after the fact, like archaeologists brushing dust off ancient ruins, but we can't design them from scratch.
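The law itself is almost embarrassingly simple: predicted loss falls as a smooth power law in model size. A minimal sketch, using constants of the order reported by Kaplan et al. (2020) - treat them as illustrative rather than exact:

```python
# The empirical shape of a scaling law: loss as a power law in parameter count.
# Constants are roughly those reported by Kaplan et al. (2020); illustrative only.
def loss_from_parameters(n_params, n_c=8.8e13, alpha=0.076):
    return (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} parameters -> predicted loss {loss_from_parameters(n):.2f}")
```

The irony is that this curve is perfectly smooth; the sudden leaps show up in downstream capabilities, not in the loss itself, which is exactly why they keep catching everyone by surprise.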
And that's where the unease comes from. Classical engineering works because we understand cause and effect. We know why a bridge holds or why a database query runs. With LLMs, we know how to build them, but not why the result behaves the way it does. It's as if we've discovered fire: we can make it, we can use it, but we don't really understand its chemistry yet. We stand in front of a system that produces reasoning, creativity, even flashes of humour - and we're still unable to say why a series of floating-point numbers arranged in just the right way suddenly starts acting like an "intelligent" mind.
And what about AGI, you ask?
There's a canyon we don't talk about enough - the gap between today's Large Language Models and what we imagine as Artificial General Intelligence (AGI). Current LLMs are dazzlingly good at patterns, language, and problem-solving, but they are still, at their core, extremely sophisticated prediction machines. They don't "know" in the human sense. They don't form intentions, build mental models of the world, or understand consequences. They stitch words together in statistically likely sequences - and remarkably, that already looks like reasoning, sometimes even creativity.
AGI, on the other hand, is the holy grail: a machine that can think across domains, reason flexibly, learn new concepts without mountains of training data, and operate with the kind of adaptability that comes naturally to humans. In other words, not just a master of pattern-matching, but a general-purpose mind.
Here's where the debate rages. Some of the field's most influential leaders - Sam Altman at OpenAI, Demis Hassabis at DeepMind, Dario Amodei at Anthropic - believe we may be closer than we think. They see scale as the hidden key. Make the models bigger, feed them more data, throw more compute at them, and new capabilities emerge. Not gradually, but in leaps. Altman has even gone so far as to say his team now "knows how to build AGI as we have traditionally understood it." Amodei suggests human-level AI could appear as soon as 2026 or 2027. To them, AGI is not a distant dream but the natural continuation of what we're already seeing with scaling laws: intelligence as an emergent property of size and training.
But many others remain unconvinced. Apple's AI group, for instance, has been surprisingly blunt: today's LLMs, no matter how big, fail at consistent reasoning, stumble on algorithmic tasks, and expose their lack of true understanding when pressed. These aren't just bugs; Apple researchers call them fundamental barriers. In other words: you don't reach AGI just by pumping more steroids into the same architecture.
Academia echoes this skepticism. A recent survey by AAAI showed that three out of four AI experts believe simply scaling transformers will not yield AGI. Yes, the models impress us, but they are still bounded by their statistical nature. Without a breakthrough in how we model reasoning, memory, and understanding, we may just be building ever-larger parrots: cleverer ones, but parrots nonetheless.
So, where are we? The optimists see AGI shimmering on the horizon, reachable within a decade if we just keep scaling. The skeptics see a mirage, warning that we'll march toward it for years, only to discover that more size doesn't equal more mind. And somewhere in between lies today's uneasy reality: we've built something that feels intelligent enough to shake industries and politics, but we still don't know if it's a stepping stone to AGI, or a dead end.
In the end…
We've built machines that work (astonishingly well!), but which we don't fully understand. They don't run algorithms; they discover them. They're not databases; they're compression engines for human knowledge that sometimes act like reasoning minds.
Each new technique - induction heads, sparse autoencoders, attribution graphs - is a telescope into this universe. Each discovery is another constellation in a sky we're still charting.
And yet, for all their brilliance, these systems are not AGI. They look intelligent, they act intelligent, but they are still bounded by what they've been trained on. Whether they will ever cross the threshold into true general intelligence is something nobody can answer today - optimists and skeptics alike are still debating if that line is even reachable with our current approaches.
So, when you talk with ChatGPT or Mistral, remember: you're not speaking to a traditional program. You're conversing with something that learned language like a child: through immersion, trial, and sudden leaps of intuition, but not something that "understands" the world the way you and I do.