A Note on Deep Research Products and the Return of the Traditional Credibility Chain
With a more traditional value chain, enlarging the pool of licensed source material is going to be the competitive advantage.
With Grok and Perplexity introducing “deep research” tools over the past week, these applications have become a standard piece of the LLM toolkit across all the major models. And it’s worth slowing down for a minute and thinking about how these tools operate under a different model than many other LLM responses.
Let’s start with basic LLM output outside of Deep Research. In the oldest model, the LLM swallows a bazillion sources, turns them into math, gets asked a question, and puts out a response. People then compare that response to the world it represents. If it says Finland has higher per capita gun ownership than New York City, the question is whether that is a broadly supported claim or not. We also value qualification: if there is debate about the question, then the claim needs to be qualified, with “Many experts believe” or something like that.
The relationship tested there is something like this:
STATE OF KNOWLEDGE (SOK) ==> STATEMENT ABOUT STATE OF KNOWLEDGE
To evaluate it, we evaluate whether the statement about the state of knowledge accurately represents the state of knowledge.
Over time, companies began retrofitting source links to their statements about the state of knowledge. A statement is made about the state of knowledge, and then the system runs out to find evidence that the statement is reasonable and supportable. This retroactive sourcing looks like this:
SOK ==> STATEMENT ABOUT SOK (SSOK) ==> CITATION SUPPORTING SSOK
The question in this case is really whether the citation supports the statement. So the credibility chain is whether the citation supports the SSOK, and whether the SSOK represents the SOK.
With Deep Research, as it is presented, the questions about credibility are structured differently, and more traditionally. You have a state of knowledge, from which you draw sources. Then you make statements about what those sources say, and draw conclusions from them.
And of course it’s not exactly like that; nothing in research is that linear. We often come to conclusions first, then seek sources, and make the case that those sources are representative. But from the point of view of evaluation, the chain of credibility is different.
The first question is whether the sources consulted adequately represent the state of knowledge and are relevant to the topic.
SOK ==> SOURCES
The second question is whether the statements about those sources accurately reflect what those sources are and what they say.
SOK ==> SOURCES ==> STATEMENTS
And finally, there is a synthesis that pulls such evidence into either an explanation or an argument. The question there is whether the statements create a coherent and compelling explanation or argument greater than the sum of its parts, one that connects the presented evidence in a “warranted” way.
SOK ==> SOURCES ==> STATEMENTS ==> SYNTHESIS
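The chain above can be sketched as a simple data structure, one field per link, with an evaluation question attached to each. This is a purely illustrative sketch; the class and field names here are my own assumptions, not any product’s actual design.

```python
from dataclasses import dataclass

@dataclass
class CredibilityChain:
    # Hypothetical structure for a Deep Research report, one field per link.
    sources: list[str]     # sources drawn from the state of knowledge (SOK)
    statements: list[str]  # statements about what those sources say
    synthesis: str         # the explanation or argument built from the statements

    def evaluation_questions(self) -> list[str]:
        """Each link of the chain gets its own credibility question."""
        return [
            "SOK ==> SOURCES: do the sources adequately represent the state of knowledge?",
            "SOURCES ==> STATEMENTS: do the statements accurately reflect the sources?",
            "STATEMENTS ==> SYNTHESIS: does the synthesis connect the evidence in a warranted way?",
        ]

report = CredibilityChain(
    sources=["Source A", "Source B"],
    statements=["Source A claims X", "Source B qualifies X"],
    synthesis="Taken together, the evidence supports a qualified X.",
)
print(len(report.evaluation_questions()))  # 3, one question per link
```

The point of the sketch is that each link is evaluated independently: a report can fail at the first link (unrepresentative sources) even when every later link holds.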
You can use the tools of argumentation theory to look at each piece of this chain, and at some point I probably will. But that first piece (SOK ==> SOURCES) operates very differently from most AI credibility work to date. The strength of your product is not the strength of your answer with some scattered nods at citation after the fact. The credibility of the project is based, from that first link of the chain, on whether you can provide sources that can be seen as capturing the current state of knowledge.
This requirement to provide named sources, from which the validity of the rest of the chain flows, means that the group that wins the Deep Research wars is going to be the group that gets legitimate intellectual property rights to the largest and broadest body of material not in the public domain. And because that first choice of sources is fundamental to the chain of validity, it’s not going to be a matter of surreptitiously ingesting such things and then finding a random citation after the fact. The status of your product is determined as much by your demonstrated library as by your results.
Anyway, I hope this becomes an opportunity to come up with a fairer relationship between sources and these platforms. I know there are more than enough grounds for cynicism here, given how things have gone. But I think there is a power in expecting better that is stronger than the power of cynicism, and I hope recognizing how the value chain has shifted can help us all to do that.