14 Comments
Peter Northup:

Extra heart for footnote 4.

Ben:

Hadn't thought about a modal logic analysis of LLM truthtropic dialogue. Very appealing conceptual space.

KB:

Here's an obnoxious correction for you: the concept of the common ground wasn't introduced by Stalnaker, it was introduced by Grice (arguably it was introduced by Peirce, but that's a whole other debate).

Mike Caulfield:

I oversimplified it, and I see now I presented it wrongly, with Stalnaker introducing the concept rather than the specific model. Thank you for being annoying, and I will fix it.

Zane:

If the initial answer was wrong, why would you trust the follow-up information? Once I see a wrong answer, I'm skeptical of anything else from that source on the subject - I wouldn't ask for more details.

Mike Caulfield:

AI is not a source. If you stop thinking of it as a source, it will be clear to you that, just as with search, your second query is likely to return better results than the first. If you continue to think of it as a source and not a technology, your intuitions will be incorrect.

Zane:

Ahhhh. Of course. You've said that before. So part of my problem is I think, "let me ask Claude". I should think "let me Claude it", or something.

Mike Caulfield:

I might steal that! Because that's exactly it.

It is weird. I think conceptualizing these systems as human isn't *always* bad, but on the source question it is, because while the system can be thought of as "processing" stuff, everything it returns is just ingredients from elsewhere baked into something custom. If you see it's not doing quite what you want process-wise (like not surfacing sources, or choosing weird experts), you can get a better-tasting cake by critiquing its process or ingredient selection, even if you can't taste the cake (i.e., know the truth of the result). Not sure that analogy works. Sorry if my initial answer was curt.

Zane:

It's yours. You weren't curt. Thanks!

Simpson, Erik:

This is a fascinating post, and I've been reading your arguments about these technologies with keen attention. At the risk of being the "Cool defense of wrong answers, bro" person, I will share that your posts have helped me appreciate the extraordinary powers of LLMs, but they also show me a technology that does not fail well--that, in fact, fails shockingly badly. I accept that your processes often make those failures interesting and valuable, but I also know that virtually nobody is using the tools as you do. Most of your posts have a stage where you say something like "OK, if you stopped here, this would be very bad." (In the post above, you express it as "This is the sort of answer we often mock with LLMs, because people think the point of dialogic systems is to give you perfectly correct answers.")

But the problem is that people do indeed think that! And they're not being silly or unreasonable. They're being told to think it, over and over again, by the marketing and then by the tools themselves. Looking at the exact screenshot you're describing at that moment, why would a user have any impression other than that the goal of the system is correctness?

So yes, if you stopped at that point, the result would be bad. And almost everyone is stopping at that point! If this were any other product designed for mass use, those results--a tendency to produce routinely bad results with no explicit indication of their badness, requiring exceptional skill and knowledge to mitigate the effect of the badness--would indicate a failed product. When I see your methods, I think, wow, these systems when used by Mike Cualfield are amazing. And when used by almost all the humans who are using them--who are using them as the design most directly encourages them to be used--they are even more destructive than I had imagined.

Mike Caulfield:

So I agree with your first point, but not the second. I do think people are often stopping at step one. That's maybe not as bad as it seems, because I tend to cherry-pick instances of failure, since one-round successes are not particularly educational. I also drill down on smaller details that many would hardly notice, and so on. But I agree it's just wrong to take answers at face value without grounding them somewhere else.

Where I differ is that I think all information technologies actually fail if this is the standard. Were you to look something up and pull a book from the library on it, you would still have to ask where this answer came from and how you can be more sure of it. Search is of course famously a mixed bag -- on a given page there will be many wrong answers, many right answers, and for a lot of questions lots of gray area. And it does take some skill.

If you look at a lot of my examples, it's telling that very few other sources get it right. Most answers that get mucked up in an LLM response are mucked up because nearly everyone on the internet gets it wrong and the LLM reads that signal as consensus. Consequently, a person turning to the internet, blogs, or in some cases newspapers and journals has no chance of getting it right.

The big difference to me is not the capability -- I am actually quite sure an average person looking for an answer will do better with an LLM than with search, having studied people using search for a very long time. The big difference to me -- as your first point states -- is framing. And if the framing can change, I don't think the skills are any harder than anything else people have to learn online. I often show virtuoso stuff, because I think it is instructive, but I did a test a while back and found that any evidence-based follow-up -- as simple as "why did you say x?" -- will on average improve a bad result. I think that's actually easier to remember than some of the SIFT stuff, like "search for the URL in Wikipedia" etc.
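
To make that concrete, here is a minimal sketch of that follow-up move as a two-turn exchange. It is only an illustration of the idea, not an exact workflow: `ask_model` is a hypothetical placeholder for whatever chat API you happen to use, and the one thing the sketch shows is that the second turn asks the model to justify a specific claim from its first answer.

```python
def ask_model(messages):
    """Hypothetical placeholder for a chat-model API call; swap in your provider's client."""
    raise NotImplementedError


def answer_with_followup(question, claim):
    # Turn one: the initial, possibly wrong, answer.
    messages = [{"role": "user", "content": question}]
    first = ask_model(messages)

    # Turn two: an evidence-based follow-up about a specific claim in that answer,
    # e.g. "Why did you say x?" -- the simple move described above.
    messages += [
        {"role": "assistant", "content": first},
        {"role": "user", "content": f"Why did you say {claim}? What is that based on?"},
    ]
    revised = ask_model(messages)
    return first, revised
```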

I'll be getting to that stuff soon - I've been out of the classroom lately and haven't had time to refine really simple moves, but it's coming. But your question is a good push to do that sooner rather than later.

Simpson, Erik:

I appreciate very much your taking the time to answer my question in such detail. I continue to wrestle with these issues for myself and for my students, and I'm grateful for your thoughtful writing.

Simpson, Erik:

Ugh, I apologize for the typo in your name at the end. No edit function.

Prof. Attilio:

I hope you will not hate this comment, but I know you probably will.

I appreciate your articles; I see you have a deep understanding of LLM tools and that you see things other people don't. However, I struggle a bit with your articles. 1) The first visual impression is a wall of text, and for someone like me, who receives 30 or 40 messages like this, I really have to dig into the text to find what you want to say and learn from it. 2) I'm not a native English speaker, so my quick reading (skimming) in other languages is more efficient than in English. I think it would DRAMATICALLY help if you clearly put the key points of your articles at the beginning (the so-called executive summary) or at the end, as "conclusions". A few times I had to ask AI to summarize the text for me, and I also asked it to tell me in one sentence what the practical tips were.

Sorry for being direct, but this is my reality. If you think it doesn't apply to other readers, just ignore this comment.
