AI in Search Is Really the Race for the Overlay
AI is a tool; the real goal is to put information and context into new spaces. Also: the rise of "soft context".
There was an article in the MIT Technology Review that got panned recently by many on Bluesky. The upshot of it was that the best thing about new smart glasses was ubiquitous AI. And predictably, people pointed out — given how poorly AI has functioned as a source of information compared to traditional search — how this seemed less of a feature and more of a bug.
I think this is the wrong take. I’ve been talking for a while about how the real way to see what’s going on with AI is not through the lens of AI, but through the lens of what I call the overlay. The overlay is a vision of search that is:
Just-in-time, and
Right-in-place
and it is an old vision of what the web could be. It’s in many ways the oldest vision of the web, going back to visionaries like Vannevar Bush.
It exists now, in ways most people don’t realize. A while back I went through a phase (which I may return to) of analyzing search trends. One thing I was looking at was search trends around TV shows. This fascinated me because I noticed the now-common habit of pausing a TV show and trying to figure out where one had seen an actor before. Someone says “where do we know this person from?” and the phones come out. There’s a reason why one of the biggest searches around a show is [show name] cast. And you can even see this pattern for smaller questions. For instance, here is a small but not insignificant set of people seeing a Simone Biles ad for Mounjaro during the Olympics and trying to find out whether Biles has a personal connection to diabetes:
I can’t say that every peak here is that commercial airing, but having gotten in the habit of checking trends after seeing commercials on live TV, I am almost certain the peaks line up directly with the airings.
One thing I’ve commented on before is that the dream of smart glasses in the early 2010s has been realized in many ways — it’s just taken place on cell phones. Instead of looking through glasses at a flower we want to identify, we use Google Lens. Instead of smart glasses informing us, Star Trek-like, of what song we are hearing, we use Shazam, or more recently Android’s search anywhere tool. Unnoticed by many, the camera and microphone in our phones became the sort of AR interface imagined in the Star Trek dream of smart glasses. We just weren’t wearing it on our face.
Search providers know this. There’s a reason why when you search a TV show on Google you get a panel of exactly the sort of “overlay” questions you might be asking while either watching a TV show or deciding whether to watch one.

Who is this actor, where do we know them from? Is this show any good? Where can I find it? And for shows that are more episodic, “What episode was that again?” A long time ago I remember an executive at Microsoft saying that with the Xbox they aimed to “own the living room.” But that’s not how it worked out. People watching TV on an Xbox don’t, by and large, turn to their TV to answer questions about what they see or hear; they turn to their phones, right there in the moment. Your phone owns the living room, just like it tends to own other spaces.
TV is for me just an interesting example of a phenomenon around searching for information or context. Just like in any other industry, you can see the search market in two ways. First, you can see the market as consisting of all searches currently executed and ask yourself how to get a bigger slice of that pie. Second, you can see the market as the total of all possible searches, including ones that are not being made, and ask how to get people to use search in more places. This particular insight is not revolutionary; it’s mundane. And it’s clear that the more just-in-time/right-in-place functionality is added, the more people use search. And this has been a long, long trend. For instance, the rollout of location-based searches in the early 2010s, along with GPS, created the “near me” searches that are common now:

Intrusiveness, the Overlay, and the rise of “soft context”
Those seeking the overlay often want quick context, and do not want to be dragged into a wider search. This is not always the case, but it is often the situation in which someone decides not to search for context at all. A person wonders what style of architecture a building is and what period it is from, but doesn’t want to make a research project out of it. A person mentions a book but can’t remember the author’s name. The same is true with personal search — think of how many times the question comes up of whether we are available to do something on a particular date, and, despite having our phone right there, the seemingly small chore of opening our calendar to check is too much for us, and we say something along the lines of “I think so, let me get back to you.”
I am the world’s biggest advocate of textual, GUI-based search. But there is a switching cost to it. There’s a lot of information that we are interested in, but not interested in enough that we would let it pull us out of whatever task flow we are in.
And that’s where AI comes in, and why, I think, it remains a focus of search even though it is currently quite poor compared to the relative maturity of document-based search.
If you want to increase the usefulness of the overlay, you have to increase the share of synthesized answers that can be comprehended without an additional click, scanning, scrolling, etc. Traditional search is not great at this, which is why many of the features added by search companies over the past several years have been things like featured snippets, direct answers, and the knowledge graph. There’s been a tendency to see these as attempts to keep someone on the search engine’s page, and I’m not here to tell you capitalism doesn’t exist. But they are also there to make sure that the large set of users who will search for something if and only if they can get a quick answer keep searching, rather than, well, not searching.
This is a big piece of the existing market, whether it’s asking “Who is alive from Gilligan’s Island cast?” (Tina Louise) or standing in line at the coffee place and wondering whether dark roast has more caffeine than light (it doesn’t; in fact it runs slightly the other way round). But it’s also an emergent market as more complex searches can be served.
And that’s where the MIT Technology Review article comes in. As Honan notes:
That’s why when I tried Snap’s new Spectacles a couple of weeks ago, I was less taken by the ability to simulate a golf green in the living room than I was with the way I could look out on the horizon, ask Snap’s AI agent about the tall ship I saw in the distance, and have it not only identify it but give me a brief description of it. Similarly, in The Verge, Heath notes that the most impressive part of Meta’s Orion demo was when he looked at a set of ingredients and the glasses told him what they were and how to make a smoothie out of them.
Again, this is the overlay, and it’s not new. It’s been a dream of the web since before there was a web, and it’s existed in a thousand ways before this. I’m not arguing wearing a camera on your face is the best way to get there. But the underlying driver here is that a variety of software capabilities are reaching maturity (OCR, voice recognition) while a number of developing technologies are expanding the search space (LLMs, object recognition). I would argue that the glasses aren’t really valuable because they give access to AI features. I think it’s more the other way around: smart glasses have always been about the overlay, and these technologies finally offer a possibility of that benefit. That said, it’s not clear to me the future here is in glasses. I just think it’s much bigger than that.
The dream of smart glasses, I think — with the above excerpt being an example — is a dream of what I call “soft context”. I think this term might already be used in a very narrow sense in machine learning, but I mean it in a broader sense. An example for me is setting up your phone so that the lock screen shows whatever song is currently playing around you. Because it knows your current environment, it can retrieve relevant context without you specifically asking for it — and without you breaking the flow of what you’re currently doing to go into computer operator mode. You look down, see the song name, and go, oh, that’s right, The Shins. It’s a spectrum for me, but the tall ship example in Honan’s article is similar. You’re looking at a ship, and you say “what’s that” while still looking at the ship. Maybe from there you go into a search mode and learn more, at which point your activity becomes “researching a ship” (hard context). But there are just a lot of points in our lives where we’ll seek context — as long as it does not pull us away from what we are doing.
For me, this is not then a story about glasses or AI. It’s about how answers that require less mental effort to ask, comprehend, or synthesize continue to be the growing market. Just as ubiquitous availability of GPS positioning on phones opened up a whole new overlay in the early 2010s, the hope is that advanced image recognition and context sensitivity, combined with LLM responses, can open up whole other realms of search. There are a whole bunch of issues with this, from accuracy to privacy. I’m not going to parse through those here. I’ll add that I am not sure that LLM-based response is the right way to go for a lot of this, at least as the technology presently operates. But looking at it all through this perspective — where can information needs be served where people are currently not searching? — the way the pace of support for this has outstripped its capabilities makes a bit more sense to me, even where the execution and choice of tools remains a bit of a disappointment.