Another superb post. I just don't think many folks are testing these models in the way that you are, but I would think that someone, somewhere should be doing this. I've used your prompt in Claude multiple times and the results are fantastic. What do you make of the new Google AI Search mode? Ridiculously primitive compared to what you're doing, but do you think it will continue to get better and better? As you say, it would seem like many of these problems should be fixable. Nice work!
Great post. I am very interested in thinking through how we can do more systematic testing of hallucinations and sourcing.
Thanks so much for this comprehensive discussion. This is such important work.
Love this so much!
Great post, Mike! Thanks for your work and for sharing! Interesting that the link hallucinations in Gemini break down around 10 — I saw a similar issue when asking it to create a YouTube Music playlist for me from a list of songs. It would get the first 10 perfect and then start going off the rails... must be something about the number 10...
That's very interesting, actually. I wonder if there's a background limit on page fetches, and after that it just YOLOs it?
Have you tried this in NotebookLM? Would the result differ from Gemini?
NotebookLM wouldn't really be able to do this -- the challenge of fact-checking is that you don't know in advance which documents you need as sources or how to weight them, which is one reason why approaches that don't use the SIFT Toolbox fail.