Judge the AI that exists, not the AI you imagine

There’s a maxim I see a lot among tech reviewers, which goes something like this: Buy a product for what it does now, not for what you hope it will do later. The product may never get a major new update at all, and it certainly might not get the update that you want. The iPad, for instance, can do many things, and depending on what you need, it could plausibly be your only computer. But there are things I do on my computer that the iPad can’t do at all, and other things it can do, but not nearly as well.

And yet, many of the folks I frequently hear invoking that maxim for hardware and software purchases just chuck it out the window when it comes to generative AI. Of course we want generative AI-powered search engines and text summaries and conversational user interfaces.

But we know that right now, GenAI sometimes confidently returns bullshit answers to queries, and it’s more likely to do that the more obscure the information is. We know that right now, that tendency towards “hallucination” means it’s often not as good at summarizing long, complex documents as an underpaid intern is. (Would you trust GPT-4 to summarize legal filings? Medical histories? A presentation your job depends on? Not without reviewing it yourself!) We know that right now, as good as LLMs are at parsing natural language, when they screw up, it’s very difficult to figure out how to correct them. And, of course, we know that right now, processing LLM queries requires both tremendous amounts of energy and tremendous amounts of training data whose legal status is, to be charitable, in a grey area.

Yet, somehow, the assumption is that all of this is on the verge of being fixed. Of course we’re going to get all that worked out. The Star Trek computer future is here, baby! This is the biggest advance in computing history since crypto, the Internet, microprocessors, the Difference Engine, THE WHEEL! Put AI in everything, everywhere, as fast as possible!

Don’t judge AI by what it does now, judge it by what we hope it will do later!

Maybe. But what if nondeterministic output is intrinsic to LLMs?

What if GPT-5 doesn’t hallucinate less, it just hallucinates faster?

What if it turns out that while LLMs are useful for some things—and don’t over-index on my critical tone here; I think they’re already useful for some things—they really aren’t useful for all the things?

What if the Rabbit R1 and the Humane AI Pin aren’t just isolated flops, but warning signs that LLMs alone are not the way to make better voice assistants?

What if it’s going to take new avenues of research to get this all to pan out, fusing LLMs with other branches of AI (rule-based expert systems, other techniques for natural language parsing, and so on), and getting there takes long years, not mere months?

I’d like the Star Trek computer future, too. But judging by the AI that exists, I don’t think we should be telling everyone it’s just around the corner—or making purchase and business decisions as if it is.
