Your reality is already augmented: Answering some questions about my overlay post.
The overlay is about a particular practice and contextual relation that is both increasingly frequent and capturable.
What qualifies as an “overlay”?
The overlay is a system that retrieves a set of relevant information and possible paths for exploration regarding something currently in front of a user. The thing in front of the user might be digital, or it might be something non-digital that is capable of being processed by the digital world.
Is the overlay new?
Not at all. It was the core idea of the Memex proposed by Bush in 1945. It was the dream of Ted Nelson, the obsession of Alan Kay. It was the basis of early hypertext and has long been a concern of search. It’s the basis of most visions of Augmented Reality (AR), from Robocop to — well, just about any AR scenario. When you reverse image search an image in your browser, that’s the overlay. When you use your phone camera to identify a flower, that’s the overlay.
Isn’t this just search context by a different name?
No. From my perspective, the overlay is a particular use of search that has grown more prominent as two things have happened. First, we live much more of our life online, often through digital artifacts. Second, we have 24/7 access to the internet via our phones.
That’s resulted in a distinct shift. For instance, say I am looking at buying headphones in 2003. I get on the internet and research headphones, come to some conclusions. I go to Best Buy and purchase them. That’s search, followed by a purchase.
Now in 2024, I have headphones in my Amazon cart and I’m about to hit purchase and I have sudden doubts. I take the name of the headphones and search it. Make a final round of checks. That’s using Google as an overlay. It’s just-in-time and it’s right-in-place.
I’ll give another example from the article. In 1998 I wondered for weeks what happened to Emma Peel’s husband in The Avengers series. When I happened to be in front of a search engine I searched on that. Turns out his plane was lost and he was presumed dead. To me this is just general search.
More recently, last week I was re-watching Battlestar Galactica, and we didn’t have access to the pilot episode, just the episodes that followed. About 18 episodes in, a plot point was raised about something a character on the show had done presumably in the pilot, some horrible act. I did what we all do now. I paused the show, typed in “what'd gaius do on caprica”, got my answer, un-paused the show and continued on. That’s an overlay.
Isn’t this just “AR without the goggles?”
Yes, exactly. AR is already here, as long as you don’t get hung up on it not looking like it does in science fiction.
People don’t realize this because they’ve convinced themselves that the digital artifacts we interact with on a daily basis don’t share the same reality as the physical ones. But consider this interface for a moment, showing information about an online photo:
A search on this photo shows a person’s ID (Babe Ruth), some similar photos, and related photos. It provides an option to find the original, an option for a general search on Babe Ruth, and a variety of other elements.
This may seem like an odd point, but imagine this was a physical photo you were looking at, or a picture in a magazine, and your AR glasses provided the exact same information you see above. Or even if your phone, when pointed at a photo in a magazine, provided the same information. This would feel like a very Robocop, sci-fi moment, because the photograph is “real”.
But this distinction doesn’t make a lot of sense. “Real life” for you nowadays isn’t poring over physical photos or leafing through magazines. It’s looking at online artifacts most of the time. And I think because of that — because of this artificial distinction between the online and offline — the AR-ness of so much of what we do online is being missed. It may be a computer contextualizing other things on a computer, but there’s really not much functional difference.
In a similar vein, I think the successes in what other people call AR — mapping context onto the physical world — are going to come from slowly expanding the searchable environment. If you look at some of those Google I/O examples, I think it’s fair to say that after a false start with Google Glass a decade ago, we’re seeing another attempt to broaden the searchable environment, in a way that makes more sense. You don’t need glasses; you just need a broader and more fluid array of entry points into a search process that is already acting very, well, “overlayish”.
Why does this distinction of the “overlay” from traditional search matter to you?
Primarily because the overlay is a powerful way to think about sense-making, especially once you strip it of its sci-fi mystique. I talk about this part in the previous post and don’t need to belabor it.
The second reason is that overlays often involve a very specific and very discoverable search context. If a search knows the photo you’re looking at or the article you are reading, that’s likely to be a determinative piece of context. Likewise, if I am searching pesticides fresh from putting a can of bug spray in my Amazon cart I am likely looking for very different information from a person searching pesticide after encountering an article about the historical impact of DDT.1
If the ability to provide this context can be made more natural, people will be more likely to seek relevant context. And if the context can be made more discoverable to the search algorithm, people might get better results. I talk sometimes about better sense-making arcs, and this is what I mean: as you pass in and out of Google, you carry a built-up context of the task with you.
That brings me to my third reason, which is that on a number of fronts Google has been working on various inputs into a search process that provide potential overlays with context. In Chrome, that might be how the browser looks at the URL and finds information on the site. In Android, that might be Circle to Search. At the I/O event, the idea of searching with live camera video is also an overlay, this time over a physical object. AR without the glasses, etc.
So it’s related to a specific and valuable type of context that is increasingly being captured, and that’s what interests me.
Why is the overlay different from the “answer box”?
It’s not always. If I am driving on a road and I ask “What’s the speed limit here?” and I get an answer, that’s both an overlay and an answer box.
Likewise, some things can be answer boxes but not overlays. If I’m curious about sports scores (which a lot of Google traffic is) and ask who won the Euro matches today, that’s an answer box, but not an overlay.
The aspects of an overlay as I see it are:
A search context that is immediate and visible to search, related to the thing in front of you, the physical situation or location you are in, etc.
Where appropriate, a “chorus of voices” as a response, and a way to navigate those voices.
Where appropriate, a series of forking paths that can be followed to drill down or up on a question.
The first element is necessary under any definition of an overlay. The latter two elements might be seen as more personal to my interpretation, and more drawn from the Bush-Licklider-Kay-Nelson tradition. But I think they are necessary to a good overlay, because they are mechanisms by which users identify the context they believe they need or benefit from, and they are necessary in cases where context is complex.
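The three aspects listed above can be sketched as a data structure. This is purely illustrative — the names are my own invention, not any real system's API — using the Babe Ruth photo example from earlier:

```python
# Illustrative sketch of the three aspects of an overlay as a data
# structure. All names are invented for this example.

from dataclasses import dataclass

@dataclass
class Voice:
    source: str    # who is speaking: a site, an author, a community
    summary: str   # what that voice says

@dataclass
class Path:
    label: str     # a forking path the user can choose to follow
    query: str     # the follow-on search that drills down or up

@dataclass
class Overlay:
    context: dict        # 1. immediate search context: the thing in front of you
    voices: list[Voice]  # 2. a chorus of voices, navigable rather than pureed
    paths: list[Path]    # 3. forking paths for further exploration

ruth = Overlay(
    context={"artifact": "photo", "identified_as": "Babe Ruth"},
    voices=[Voice("encyclopedia", "Identification and biographical summary"),
            Voice("image-search", "Visually similar photos")],
    paths=[Path("Find the original", "babe ruth photo original source"),
           Path("General search", "babe ruth")],
)
```

An answer box, by contrast, would collapse `voices` down to a single string and drop `paths` entirely, which is the flattening the later sections worry about.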
That’s probably too big a claim to defend here, but I think the examples in my earlier post show why when this works it feels like magic.
Regarding the answer box vs overlay distinction, they can overlap; the question is what is the controlling idea of the interface. More on that in the last section.
Is Microsoft Copilot an overlay?
Yep, a lot of the time. And this gets to a point I’d like to clarify — we’re already engaged in the overlay wars, or, if you prefer, “AR without the glasses.” Different companies are working from different strengths.
I was using Edge the other day and looking at a spec sheet for a particular video switcher in the browser. I selected the model and fed it to Copilot. Copilot opened a side window and wrote up a description of the item and its features, and gave me some links to follow. It served up some ads. It suggested some questions to ask.
If you read my original post, these are a set of features that Google has added to their search page over the years, but powered with a different model that didn’t puree sources into a single answer. So my argument isn’t “things that are overlays are good and things that aren’t are bad”. Rather, I want us to think about what makes a good overlay.2
What makes a good overlay?
My admitted bias here is that the Bush-Licklider-Engelbart-Kay (and whoever you want to add in there) computing revolution is not something to throw away lightly. It is the chain of revolutions in computing behind some of the biggest productivity gains in the history of humankind. As a revolution it was so successful that people don’t realize how retrograde many conceptions of AI are. Every person typing out their thoughts is typing out their thoughts in the GUI environment created by those and related pioneers.
There’s not enough space to go into that here, but a big piece of what that revolution was about was replacing serial input to output command line interfaces with interfaces that made use of a wide variety of human capacities and interactions. Our ability to scan and select, our ability to manipulate objects in intuitive ways, to forage, to hold in the mind and evaluate competing ideas. To model what we know, then test that model with a question or a spreadsheet. To tweak that model.
My worry is that the “answer box” mentality flattens all of that. The world of personal computing opened up a world where we could make sense of things using a fuller array of our cognitive capabilities. At their best these interfaces and approaches allowed us to see and visualize a chorus of voices, to form mental maps of the topography of a subject. Sometimes that resulted in information overload, and there are ways that tapping into immediate context more efficiently could address that. But a lot of this stuff seems to be going backwards into a pre-Licklider conception of computing. Again, that’s a much larger piece. But for me the idea of the “overlay” embraces the traditions of the models I think have proven most historically powerful, and provides a better way of thinking about where the true advances from any technology — including AI — might be found.
Thanks to Henry Farrell for inspiring this example.
This is actually an interesting example, because it’s a bit deceptive. I’m on a spec/features sheet, I search for things about this item, and most of the stuff I get is from the sheet I am already on. A good overlay would know I have a spec sheet already and seek to augment what I know, either explicitly summarizing this document or returning information that was not spec-sheet information.
Interesting, both this and the prior post. I'm not sure how different these examples are from older versions of overlays that we'd call a personal library, or the encyclopedia my parents had that I used on occasion as a child, even when I wasn't working on a paper I was late on. To some extent, all of these are ways of supplementing thin mental references when we have some trigger to expand on them.
Perhaps the simplest version of an overlay using technology is the idea of ordering things by where they are in either time or space. Timelines! Maps! We didn't need Bush to see the value of consolidated information as quick reference. Encyclopedias! Dictionaries!
One more layer: I suspect psychologists would point out that research on memory consolidation suggests that ALL permanent memories themselves are overlays.