21 Comments
Mandy Honeyman:

"The term hallucination has become nearly worthless in the LLM discourse"... Will be taking this back as a starter for teaching both staff and students. Thank you. (Because of the difference between (mis)using llms for searching and analysis.)

Matt Crosslin:

The first concern I have is that the example given of finding the truth about the Slade photo takes most people seconds, not minutes, to figure out. I have done this many, many times myself. So the question remains: why take all the time to examine and refine AI output when you can quickly do the search yourself? The whole "oh, you anti-AI people don't know prompt engineering" response is getting very, very thin after all this time.

Second, this whole example seems to justify AI being this way because it is like human thinking - which it is not. I keep going back to Stephen Wolfram's description of what is happening inside AI, and it still holds. It's not working through responses like a human - it is ranking possible pattern completions based on how well each response completes the pattern. The fact that you don't get a correct answer the first time is actually the AI system working correctly. Because of the misinformation out there, it is telling you the most likely response. This is because, again, AI is not human and does not view truth the way a human does. It is looking at the most likely pattern completion in a database trained on all the misinformation out there. When you then ask for "evidence," you are asking it to override its core programming. Asking for evidence tells AI to stop with pattern completion and use a different algorithm. One that it should have used in the first place - but the people who created it don't want it to use (red flag here!) because that makes it harder to control.
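A minimal sketch of the mechanism being described here, with invented candidate answers and made-up probabilities (this is a toy, not any vendor's code, and real models score token by token rather than ranking whole sentences): the system returns whichever continuation best completes the pattern, not whichever is true.

```python
# Toy sketch: pattern completion by likelihood, with no truth check.
# The candidates and probabilities below are invented for illustration.

candidates = {
    "the man in the photo is the famous guitarist":      0.62,  # widely repeated online
    "the man in the photo is an unidentified stand-in":  0.25,
    "there is not enough evidence to identify the man":  0.13,
}

def complete(prompt, scores):
    """Return the highest-scoring continuation -- the one that best
    'completes the pattern' -- regardless of whether it is accurate."""
    # `prompt` is unused in this toy; a real model conditions its scores on it.
    return max(scores, key=scores.get)

print(complete("Who is in this photo?", candidates))
# The statistically dominant (and possibly false) claim wins by design.
```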

The third problem I have is the gross misrepresentation of AI critics. Yes, we do know how to do prompt engineering, or whatever you want to call it now. Most AI usage is not a simple web-search photo-hoax investigation. There are lawyers using AI to write court filings with fake citations... MIT found that 95% of all business implementations of AI are currently failing... People are being told they are gods, or that they should kill themselves, or all kinds of horrific things... Saying these things are often "merely" a first pass, when these are what you get through deep engagement? When it is a simple true/false fact about a person in a picture, sure, you can get to the truth through AI (even though I find I am faster on my own than with the process of getting there through AI). But the more complex things people actually do with AI? Things get weirder and weirder the longer you go.

Max:

"Asking for evidence tells Ai to stop with pattern completion and use a different algorithm. One that it should have used in the first place - but the people that created it don't want it to use (red flag here!) because that makes it harder to control."

What algorithm are you referring to here?

Dustin:

I guarantee you this is not a gross misrepresentation of some substantial portion of AI criticisms. Not all posts have to directly address all possible arguments of all people holding an opposing opinion.

Matt Crosslin:

Dustin, I never said anything about addressing all possible arguments. Research is generally starting to find that the more people learn about AI, the more critical they are of it. If you go out there and read various criticisms, you will see that they come from people who know AI well. Please read the research as well as the majority of the criticism.

Dustin:

Your criticism was that this post was a gross misrepresentation of AI critics. This is incorrect. This post accurately represents the position of many.

I may or may not agree with your position. Whatever the research shows people think about AI is irrelevant to my point.

Matt Crosslin:

Again, you really should read the research and the criticism. Research is beginning to show that the more people understand AI, the more critical they are of it. Mike made the point that they "often" do not understand how to prompt AI correctly. Research is beginning to show that criticism most often comes from people who do know how to prompt correctly. You are moving the goalposts by choosing a very vague, unquantifiable term, "many." I doubt what you are counting as "many" is what I would count as "many." "Often" implies a norm, while "many" is more subjective. Not to mention I said "gross misrepresentation of AI critics." Name one AI critic who actually doesn't know how prompt engineering can improve results. I would be amazed if you could find a few, much less many. The general population probably doesn't know this much about AI - but AI critics who have been speaking out about how much they dislike AI? Nope.

Dustin:

Ok, I'll bite. Show me this research.

Matt Crosslin:

If you know AI critics so well that you can claim to know what many of us currently do and don't know, it seems you would have seen the link to the Wall Street Journal article on this topic that many of us shared. But since that is paywalled, I will point to some research from earlier this year:

Tully, S. M., Longoni, C., & Appel, G. (2025). Lower Artificial Intelligence Literacy Predicts Greater AI Receptivity. Journal of Marketing, 89(5), 1-20.

I'm still waiting for the name of an AI critic who doesn't know about prompt engineering. I think if you go on Bluesky, you will see many AI critics getting tired of the constant "you don't like AI because you don't know how to use it" refrain in its different flavors. Ed Zitron wrote a 16,000-word rant about how tired he is of getting gaslit as an AI critic. This one part of that rant is applicable here:

https://bsky.app/profile/edzitron.com/post/3lxafjpkp4s2s

Maha Bali:

Part of me loves this post - this way that you so generously take us through your human and AI-prompted processes - this transparency is so useful and eye-opening. I also just love reframing the prompt to "what is the evidence for and against..."

What stopped me early on, Mike, is that I really think the depth of the problem differs by what you're asking about. If you ask something that is within the reach of an LLM based on its training set, you'll eventually find a way to iterate to the answers. However, I believe there is a cultural bias in AI hallucinations: you are more likely to come across hallucinations if you're asking about stuff that is far from US/Western/Anglo/Internet culture - and that won't be fixed by additional prompting, but perhaps by a complete retraining of the AI (RAG alone usually doesn't do enough, though it helps).

Mike Caulfield:

Oooh... This is a great point. Particularly since the way I find to make these systems struggle (as with the Drabble photo) is to find places where there is not much authoritative material in either the training set or the search results - and you're right that while that is not too difficult with English-language topics, it is probably ridiculously easy in many other languages. And when the data voids get too big, hallucinations are going to be the result.

So this isn't meant primarily as a statement on hallucination prevalence, but more about how a class of responses is really salvageable through iteration. I think that general point applies in all languages, but how many things fall into that class (vs. just being bad answers) is absolutely going to be language-dependent. Thank you!

Maha Bali:

Yes, language- and culture-dependent. Whatever isn't in the training data, or is marginal in it, is probably not going to be found or un-hallucinated with better prompting. But I so love the approach for things where it could work.

DocDre:

I feel like this calls for a DLC chapter of Verified, since I read your post through that lens. What would Verified say about AI-as-search-engine?

Mike Caulfield:

I'd love to talk about that. I'm actually working on a method that will help. I think the key for me is that, for all the talk about LLMs and autocomplete and parrots, the core of most of these things is a summary of real-time search, and we can therefore port a lot of lessons over.

I think the other thing is that people can debate for ten years whether people should be using them or not, but over 800 million people are using them in some capacity right now, and it's just criminal to say "we educators are not going to show you how to do this more effectively ON PRINCIPLE." What!? Even if you think AI is the devil (which I don't), such an attitude is dangerous just on harm-reduction principles.
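A minimal sketch of the search-then-summarize loop described above, using placeholder functions and canned results (web_search, call_model, answer_with_search, and the example.com sources are all invented for illustration): the model's answer is only as good as the snippets the live search step hands it, which is why source-evaluation lessons port over.

```python
# Sketch of "summary of real-time search" with stand-in functions and canned
# data. Nothing here is a real search or model API.

def web_search(query):
    # Placeholder: a real implementation would call a live search API.
    return [
        {"url": "https://example.com/fan-forum", "snippet": "Everyone says the bassist in this photo is X."},
        {"url": "https://example.com/tour-archive", "snippet": "Tour records list a different line-up for that year."},
    ]

def call_model(prompt):
    # Placeholder: a real implementation would call a language model.
    return "[summary written from the sources in the prompt]"

def answer_with_search(question):
    results = web_search(question)
    sources = "\n".join(f"- {r['url']}: {r['snippet']}" for r in results)
    prompt = (
        "Using only the sources below, state the evidence for and against "
        "each claim, then answer the question.\n"
        f"Question: {question}\nSources:\n{sources}"
    )
    # The answer is bounded by the quality of `results`: if the top hits
    # repeat a hoax, the summary will repeat it too.
    return call_model(prompt)

print(answer_with_search("Who is in the photo?"))
```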

Alexander Ortweiler:

Hi, it's not only about retraining. You mentioned RAG. As part of this approach, many use web search or social media newsfeeds as sources, and this creates new noise that may distract.
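A small illustration of that noise problem, with invented retrieval results and a hypothetical build_context helper (no real RAG framework): once forum or newsfeed text lands in the retrieved chunks, it sits in the same context window as reference material, and the model has no built-in way to discount it; the optional source-type filter is just one crude mitigation.

```python
# Invented retrieval results showing how social/web noise ends up in the
# model's context alongside reference sources.

retrieved = [
    {"source_type": "encyclopedia", "text": "The 1973 line-up is documented in the official discography."},
    {"source_type": "social_post",  "text": "pretty sure that's X in the photo, saw it on a meme page"},
    {"source_type": "social_post",  "text": "100% confirmed it's X, trust me"},
]

def build_context(chunks, allowed_types=None):
    """Concatenate retrieved chunks into prompt context.
    If allowed_types is given, drop everything else -- one crude way to keep
    newsfeed noise out of what the model treats as evidence."""
    if allowed_types is not None:
        chunks = [c for c in chunks if c["source_type"] in allowed_types]
    return "\n".join(f"[{c['source_type']}] {c['text']}" for c in chunks)

print(build_context(retrieved))                                  # noisy: two of three chunks are rumour
print(build_context(retrieved, allowed_types={"encyclopedia"}))  # filtered context
```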

Marcos H:

This mindset assumes you need to use LLMs at all for this - whereas the better solution here is a Google reverse image search without an LLM, and/or reading the Reddit/Twitter replies.

One should start with "Is an LLM the right tool for this?" - and the answer for anything search-related, for me and many others, is "No." That is why disabling the AI feature on Google is something many people want, and they actively hack their web browsers, etc., to achieve it.

Alexander Ortweiler:

Thanks for the inspiring article and the iteration process it walks through.

There are even more obvious things that I don't know whether a "critical" LLM could ever figure out. Your example shows a perfectly square portrait (Insta-ready) without any additional artifacts (or elements) around it. It is the same with noisy pictures or videos: when (additional) details are missing, your intuition should already warn you not to trust blindly.

Gerben Wierda:

"The term hallucination has become nearly worthless in the LLM discourse." Yes. And it was wrong to begin with. "Failed approximation" might have been better (as token statistics is used to approximate what might have come from understanding. These are indeed not 'failed approximations'. These are correct approximations based on faulty (search-provided) inputs.

David H Feldman:

Not too long ago I tested Chat with an obscure question.

Did David Feldman give boric acid to Ezra Pound in northern Italy when Pound was imprisoned by the US military?

How's that for obscure!

I was curious how deep a web search Chat would do. Well, it did a very shallow search and bombastically told me that there is absolutely no evidence that someone named David Feldman gave boric acid, or anything else, to Ezra Pound.

I smirked. My father was named David Feldman, and he was a US soldier assigned to a DTC (a detention center for American soldiers who had committed crimes) outside of Pisa, Italy. Pound was arrested for his anti-American propaganda broadcasts and brought to the DTC. My father told me the story of Pound approaching him through a barrier fence to ask if he could slip into the dispensary to get boric acid that Pound used on his eyes. My father did so, and had many chats with Pound at the fence over the next few weeks.

My father also wrote about the experience in the journal Paideuma, which is a journal about Ezra Pound and his work.

This journal is conveniently found on JSTOR, so a search of JSTOR might have uncovered the information. Chat's first pass did not see it, but that did not stop the LLM from disclaiming authoritatively!!

In the followup I pointed to the article in JSTOR, and Chat replied, essentially, "oops, I did not see that. You are correct that David Feldman gave boric acid to Ezra Pound and I will note that for future reference." Chat learned from me.

Alexander Ortweiler:

Be careful. It heavily depends on the design of the LLM application / "chatbot." LLMs without web access may take a URL string itself as a source (e.g., www.domain.com/feldman-gave-boric-acid-to-pound) without really processing its content. That is one source of hallucinations. By the way, LLMs tend to confirm "mistakes" and apologize as part of their persona.

David H Feldman:

It was ChatGPT, and so I went back and asked Chat essentially the same question. It "thought" and said it was looking up sources, and then spat out ...

Thought for 34 seconds

Short answer: Yes.

Ezra Pound was arrested by U.S. forces in Italy in May 1945 and held at the U.S. Detention and Training Centre (the camp near Pisa). (Wikipedia)

A U.S. soldier/guard named David Feldman later wrote a first-hand recollection titled “Ezra Pound: A Poet in a Cage” (Paideuma 10.2, 1981) describing Pound’s time at the camp. In that piece Feldman says he watched Pound in the cage, later found him moved to the medical area, and — when Pound was allowed to use the dispensary typewriter — approached him “just to say hello.” (paideuma.wordpress.com) (Vajra Chandrasekera)

Scholarly treatments of Pound’s detention (e.g., Michael Moody’s biography notes) also cite Feldman’s Paideuma memoir as one of several guard/officer recollections about Pound at the DTC near Pisa. (publicism.info)

If you’d like, I can try to locate a PDF or library copy of Feldman’s Paideuma article (1981) and pull out the exact passage.
