The Video Apocalypse Is Not At Hand. We've Just Been Teaching Students the Wrong Skills.
Sora 2 shows that "looking for clues" is useless information literacy. Can we now focus on durable understandings?
Sora 2 has been out for a while now, and if you read this blog I doubt you need me to walk through examples of it with you. It’s been hard to miss. Suffice it to say that while it has some quirks, it is getting very close to indistinguishable from real video, at least for short snippets that don’t push on its flaws. And of course it will get better.
So what’s the likely impact on online information literacy?
A few years ago, so many workshops on AI misinformation told people to “look for the clues” to spot AI. Six fingers. Lack of believable shadows. Funky background distortions.
People would often call me, the information literacy guy, for three bullet points to “spot AI” they could put at the end of a story on some AI photo that went viral. To their frustration I’d usually refuse, saying that such lessons weren’t durable. I’d ask that they focus instead on where the photo or video came from (SIFT’s Investigate the Source move), and whether other reporting or information backed up the details depicted (SIFT’s Find Better Coverage). Sometimes I’d get them to put this lateral reading advice in instead, and sometimes they’d just go to another expert and get a quote about six fingers or looking for mangled text.
But the reason Sam Wineburg and I were so insistent that such spot-the-six-fingers pedagogy was garbage was that we had been here before. We knew that early online information literacy around text told people to look for spelling errors (until people got spell-check), clean layouts (until people got WordPress), and “.com” extensions (until people — actually this one was dumb from the start). The idea behind these early and completely ineffective lessons was that if you knew the “clues” to look for, you could recognize whether something was real or not, true or not, and so on.
It went about as well as you’d expect. Telling people to look for spelling errors to identify misinformation seems bizarre now, though anyone can go to the Internet Archive and browse a wide range of mid-2000s library sites telling people that knowing “i before e except after c” would save them. Now, of course, we know such clues quickly disappeared. Technology changes. And the shame of that — again, seen in our own research and practice — was that students taught to look for those clues in the early days of the web did not stop looking for them when the clues disappeared. The bad clues were so resistant to updating that future teachers going through college in the early 2000s were passing them on to students in 2020, two decades later and at least a decade and a half after any of the clues had real value.
The good news is that in the case of synthetic video, I do not think the bad lessons will stick. Anyone looking for six fingers in a Sora 2 video will be roundly mocked, and anyone insisting a Nano Banana photo with perfect text must be real because the text is not garbled will be rightly set straight.
The better alternative to “spotting” AI was always the lateral reading focus on provenance and context. Who shot this video? When? What was the purpose? Can we trace it to an authoritative, trustworthy source, a useful caption, or something else? And if we can’t do that and veracity is important to us, could we find something else to put our attention towards? Those are durable insights about information, ones that don’t decay over time.
Which brings us to the bad news. And it’s really bad. While students may not be looking for six fingers anymore, they have been raised — once again, just as in the early 2000s — on the belief that what will save them is the clues. And now that the clues are gone, what you see is nihilism. Having never been given the understanding that provenance and context are what actually matter, people have settled into the belief, now prevalent everywhere I look online, that we will never be able to tell the real from the fake again, that the information environment is a hall of mirrors, and what’s the point anyway.
Of course that’s true if your idea of informing yourself is doomscrolling TikTok videos of the latest outrage, in an environment where provenance is often obscured or hidden, without ever getting out of your own feed.
But it is not true more generally.
A few weeks ago, I watched a video of a federal agent tear-gassing a Logan Square neighborhood in Chicago for what seemed like no reason other than a scooter blocking his way. People scatter, coughing, then — interestingly — regroup, not cowed by the attack. It was shared by someone I’ve followed online for a long time who used to live in Logan Square and recognized, if not the exact corner, then the neighborhood more generally. Clicking on the link, I saw that it was posted not just to any subreddit, but to the Logan Square neighborhood subreddit, where people would know a fake. Others who were there took to multiple social media channels to describe the event in text. Somewhat later the local press began to cover it, noting that a local elementary school had moved recess inside to spare children the effects of the tear gas, citing a Chicago Public Schools response to a query.
That video is knowable, not because people are puzzling over the number of fingers on the people getting gassed or mapping out the motion physics of the guy on the bicycle. It is knowable because of what Sam and I call “expensive signals” in our book Verified — rich networks of intersecting reference, historical reputation, and converging media. It’s easy now to get Sora to produce a guy on a bicycle. It’s harder for AI to create a time machine that introduces me to the work of Chicago native Dan Sinker (the person who brought the video to my attention) in the mid-2010s, to conjure a legion of people on a local Reddit forum, many with histories of popular posts about Chicago food and events, all confirming the event, or to produce a statement by Chicago Public Schools about bringing kids inside to avoid chemical exposure.
Even if someone tries to slip a fake video in among the real ones (which sometimes happens), the question becomes whether that individual video sits within this tapestry of rich reputational signals or outside it. And the question for the person who cares about this stuff is not whether all videos are true, but which of the videos they see is the most suitable to share. The answer should be easy — the one for which the supporting provenance is good and the details are reinforced elsewhere.
All this is possible for ordinary people to master, or at least to master well enough to ethically advocate for the causes and people they care about. But students taught clue-seeking know none of this. They are either looking for clues in the video, sharing recklessly, or, worse, throwing up their hands with “who can really say” abandon.
The result so far has been not smarter consumers of information, but a nihilistic citizenry. This time around we can do better.

