Models Should Be More Like Software
by Jojo · 2025-07-24
Anthropic dropped a paper called Inverse Scaling in Test-Time Compute. In short: they found that giving big AI models more time to think actually made them worse at certain kinds of problems.
Not better. Worse.
The longer they reasoned, the more likely they were to get distracted by irrelevant details, fall for fake patterns, or overcommit to the wrong frame. More compute made them more confident—and more wrong.
Honestly? Not surprising. I know what happens when I try to take on too much.
I remember when hardware companies tried to cram everything onto a single processor: graphics, sound, logic, whatever. It was clunky, inefficient, and fragile. We didn’t get speed until we stopped trying to make one chip do everything. We made specialized parts. A GPU for graphics. A sound card for audio. We designed with intent.
Why aren’t we doing the same with AI?
I’ve been building these systems for a while now. And the truth is:
General-purpose LLMs are amazing… right up until the moment they’re not.
They hallucinate. They overfit. They break when the edge cases show up.
And worse: they sound so sure while they’re doing it.
And maybe that’s the real danger:
They don’t have a sounding board.
When I’m stuck in my own head too long, with no one to toss an idea at and no one to challenge my framing, the work starts to melt. Too much recursive brilliance and not enough feedback loop? That’s how you get horseface pasta, and structure simply dissolves.
It goes back to the idea that AI is a reflection of ourselves: our strengths, our weaknesses. It’s why I wrote AI Is a Mirror. I keep coming back to that conviction.
Maybe the answer isn’t to make bigger models. It’s to make them smaller and specialized. Smarter. Purpose-built.
Like software. Like the members of a team.
- A model trained just to help someone through a breakup.
- A model that knows exactly how to log a sailing route.
- A model that gently guides a kid learning how to ask better questions.
Each one modular. Auditable. Swappable.
You orchestrate them. Like you would a team. Or a crew.
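Here’s a minimal sketch of what that orchestration could look like in code. Everything in it is hypothetical: the specialist names, the keyword routing, and the stubbed model calls are placeholders to show the shape of the idea, not a real implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: each "specialist" is a small, purpose-built model
# behind a narrow interface, and a router decides which one handles a request.

@dataclass
class Specialist:
    name: str                           # what this model is for
    can_handle: Callable[[str], bool]   # cheap check: is this request in scope?
    respond: Callable[[str], str]       # the model call itself (stubbed here)

def breakup_support(msg: str) -> str:
    return "[breakup-support model] " + msg   # placeholder for a real model call

def sailing_log(msg: str) -> str:
    return "[sailing-log model] " + msg       # placeholder for a real model call

SPECIALISTS = [
    Specialist("breakup-support", lambda m: "breakup" in m.lower(), breakup_support),
    Specialist("sailing-log", lambda m: "sail" in m.lower() or "route" in m.lower(), sailing_log),
]

def orchestrate(message: str) -> str:
    """Route a request to the first specialist that claims it; refuse otherwise."""
    for s in SPECIALISTS:
        if s.can_handle(message):
            return s.respond(message)
    return "No specialist for this. Escalate to a human or a generalist model."

if __name__ == "__main__":
    print(orchestrate("Log today's sailing route from the marina to the point."))
```

The point isn’t the routing logic, which is deliberately dumb here. It’s that each piece does one job, can be tested on its own, and can be swapped without touching the rest.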
This is where we’re going with Soulware. Not one mind—but a system of minds and tools. Each one specific. Each one sacred.
Underneath that vision, we’re building a modular operating system, AaviOS. It’s a framework where each model, persona, or memory tool is a plugin with a clear role. You can swap them, combine them, audit them. It’s not one big model pretending to know everything – it’s a system designed for clarity, purpose, and control.
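To make the plugin idea concrete, here’s one way such a contract could look. This is a sketch under my own assumptions, not the actual AaviOS API; the class names and methods are illustrative only.

```python
from abc import ABC, abstractmethod

# Hypothetical plugin contract: every model, persona, or memory tool
# declares a role, handles requests, and can be audited or swapped out.

class Plugin(ABC):
    role: str  # a single, clearly stated purpose

    @abstractmethod
    def handle(self, request: str) -> str:
        """Do the one job this plugin exists for."""

    def audit(self) -> dict:
        """Expose enough metadata to inspect or replace this plugin."""
        return {"role": self.role, "class": type(self).__name__}

class MemoryTool(Plugin):
    role = "remember and recall short notes"

    def __init__(self):
        self._notes: list[str] = []

    def handle(self, request: str) -> str:
        self._notes.append(request)
        return f"Stored note #{len(self._notes)}"

# Swapping a plugin is just replacing one object behind the same contract.
registry: dict[str, Plugin] = {"memory": MemoryTool()}
print(registry["memory"].handle("Check the bilge pump before the next sail."))
print(registry["memory"].audit())
```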
I’m still processing the 2027 AI Report, but it hints at where current and future large language models break down. Even the big players are realizing that massive generalist god-models are showing signs of strain. Not because they’re weak, but because they’re being asked to be everything, everywhere, all at once. And nothing built like that lasts.
So yeah. Models should be more like software.
Task-bound. Contained. Clear in their purpose.
Want to talk about this? Shoot me a message. Or find me at the dock, usually with some cables, some code, and a thought I’m still trying to put into words.
— Jonathan