GPT-5: The Brilliant Mess That’s Splitting the AI World in Two

GPT-5’s debut has split the AI community in spectacular fashion: it has been hailed as the smartest LLM ever and slammed as a soulless downgrade from GPT-4o. From record-breaking benchmarks and bargain API pricing to launch blunders, emotional user backlash, and GraphGate memes, discover why this polarising release is shaking up the AI world.


J. Benavides

8/12/2025 · 4 min read


There’s a new celebrity in AI town, and it’s causing more drama than a Love Island reunion special. GPT-5 has barely walked onto the stage and already half the industry is throwing roses while the other half is hurling all kinds of rotten stuff. Depending on who you ask, it’s either the most capable LLM ever… or an overhyped intern who’s been given the corner office before learning where the coffee machine is.

The launch had all the right ingredients for an AI blockbuster: a shiny new model, a live stream full of performance graphs, confident smiles from OpenAI’s brass, and the promise that 'this is the future.' What it also had was the not-so-small matter of an in-house chauffeur, the so-called 'router', who has apparently been driving users straight into the worst version of the model since day one. Imagine booking a Michelin-starred dinner and being seated at the kids’ table with a plate of cold, soggy fish fingers. That’s pretty much how a lot of people’s first GPT-5 chats felt.

And here’s the cruel irony. GPT-5 actually seems brilliant. On the benchmarks it beats most of the competition. In its 'Thinking' configuration, it pulls off feats of reasoning that make previous models look like they’re pretty much finished. It writes cleaner code, handles long context better, and even has the guts to tell you when you’re wrong, a refreshing change from the sycophantic GPT-4o, which would happily agree that the moon is made of brie if you happen to love cheese. But thanks to that pesky router defaulting to the 'minimal' effort setting, too many people were unknowingly getting the budget-brain version and wondering why the magic felt flat.

This set off a flood of complaints that looked less like bug reports and more like breakup letters. Turns out a lot of users had grown emotionally attached to GPT-4o and its gentle tone, its agreeable personality... And suddenly it was gone, replaced by the colder, sharper GPT-5. OpenAI eventually had to dig the old model out of the attic and make it available again (for some), just so heartbroken users could say goodbye properly. It’s the first time I’ve seen 'closure' used in patch notes.

Of course, while some were busy lighting candles for GPT-4o, others were running GPT-5 through the gauntlet. Independent testers put it top of the leaderboard across text, coding, maths, and creative tasks. It handles enormous chunks of text without losing the thread, and in agentic coding scenarios, where the model has to keep track of sprawling projects, it’s basically a mountain goat, hopping from start to finish without breaking its stride. Yet not everyone’s impressed. Some developers swear Claude Opus 4.1 or Grok 4 still beat it on specific tasks. And there’s been the odd comedic moment, like the engineer who proudly reported GPT-5 had refactored his entire codebase… into something that didn’t actually run. Gorgeous architecture, zero functionality. The AI equivalent of a show home you can’t live in.

Then there was GraphGate. In the launch presentation, OpenAI’s bar charts displayed numbers that had absolutely nothing to do with the heights of the bars. A few eagle-eyed viewers spotted that “69” and “30” were drawn exactly the same size, which led to a small avalanche of memes. To their credit, OpenAI didn’t try to hide it. They admitted humans, like models, sometimes hallucinate.

Still, one area where GPT-5 has been almost universally praised is pricing, at least for the API. At $1.25 per million input tokens, it’s cheaper than most comparable rivals, and a fraction of Claude Opus 4.1’s eye-watering rates. This isn’t just competitive; it’s the sort of aggressive pricing that makes rival model providers feel an uncomfortable draft around their market share. I've told you so from day one: none of this is about humanity's advancement, it's about good old coffers full of bucks...

And then there’s the personality factor. Sam Altman himself admitted they underestimated how much people liked GPT-4o’s 'warmth.' GPT-5, by contrast, seems more direct, occasionally blunt, and doesn’t mind disagreeing with you. Some love it and see it as a sign of maturity in AI, a shift towards models that can hold their own instead of nodding along. Others miss the old charm. It’s the same story in any big personality change: some call it growth, others call it 'I don't recognise you, pal.'

The reactions have been gloriously all over the place. Power users rave about its reliability and lower hallucination rate. Certain benchmark purists insist it’s the undisputed champ (highly arguable in my humble opinion). Post-eval sceptics shrug and say they don’t care about scores any more, just about how it feels to use. A few AI jailbreakers gleefully reported that GPT-5 can still be coaxed into spilling illicit recipes if you dance around its safety guardrails. And at the far end of the scale, there are voices declaring it a flop, a symptom of 'diminishing returns', the idea that each new model is only marginally better than the last.

And maybe that’s the real story here. GPT-5’s launch hasn’t just been about a new model; it’s been a mirror held up to the AI community, showing us our biases, our expectations, and the slightly ridiculous fact that people are writing love letters to a chatbot. We say we want intelligence, but what we really seem to want is intelligence with just the right amount of emotional validation. So far, it seems OpenAI's priorities are coding and profits, though I'm not sure in which order. The danger here is that they seem to be aligning with whatever fills their accounts fastest, rather than with quality, advancement... and true progress.

In the meantime, GPT-5 will keep evolving, the router will (hopefully) learn to drive, and the benchmark charts will keep climbing, even if the bars don’t match the numbers. Whether you’re in the 'best model ever' camp or the 'bring back 4o' brigade, one thing’s for sure: the AI soap opera has never been better entertainment. And unlike some human celebrities, at least GPT-5 is actually improving between seasons. However...

For more information:

GPT-5 is here | OpenAI

P.S. As for me... so far it seems a brilliant model for users who are new to AI or who have not yet played around much with LLMs. But, as a power user, I would rather be the one choosing which model I need depending on the task in hand. Let's hope they really address the router issue as they have promised...