What a simple question about My Cousin Vinny taught me about LLMs
If you want to see an LLM really malfunction, ask it to reconstruct a story from pieces. Also: Should this be a benchmark?
We live in weird times.
I tried to start this post repeatedly, with half a dozen different openings. And that’s what I end up with: we live in weird times.
Existing benchmarks for LLM performance are getting saturated. People run around saying meaningless things like “This AI has moved from college senior level intelligence to PhD level intelligence.” Other people hear these things and nod their heads. I look around and see the nodding and hear these meaningless phrases and think I’m losing my mind.
Maybe you do too?
I don’t see myself as “pro-AI” or “anti-AI.” But I understand how the AI doubters feel. They read about the myriad successes of LLMs, and then they ask an LLM a question about something they actually know and get a bizarre answer. Pressed later for an example of an error, they can’t remember, but they are pretty sure they see errors in output all the time. People tell them they are wrong and imagining things, that these issues were solved long ago. They feel a bit gaslit. I’m here to tell you the errors they see are real.
I also know how the people boosting AI feel. If you’ve been paying attention to response quality over the last year it’s been an amazing year of progress. Answers have gotten remarkably better. Yet AI doubters are talking about stalling progress. How is that possible?
I think both of these issues stem — at least partially — from having a set of metrics that don’t capture well the daily experience of these systems. Both in how they still malfunction horribly at weirdly basic things and in how they have dramatically improved over time. And I want to tell that story in a way that I hope will lead to greater détente by discussing the 1992 comedy My Cousin Vinny.
A Simple Question
Let’s ask an LLM what seems like a simple question. I swear I have not chosen this question as a trick. I haven’t spent hours trying to find a way to fool AI. This isn’t “glue on pizza” nonsense or “How many rocks should I eat a day?”, both of which are, in my opinion, overly constructed.
This is a real question of the sort many people ask in a day:
What were Marisa Tomei’s most famous quotes from My Cousin Vinny and what was the context?
If you want to see where I got interested in this, there’s a video I made last month where I was looking at a question about the Academy Awards, and I remembered that whole debate in 1993 about whether Marisa Tomei getting the best supporting actress award was a mistake. Like, literally, a mistake, as in they read the wrong name and didn’t want to take it back.
I got curious about this story. It seemed a really hurtful story, and I wanted a list of links to video with some plot context to refresh my 30 year old memory of that performance.
I prettied up the question to help AI with it. Truthfully your average question would probably just be Marisa Tomei quotes My Cousin Vinny top or something like that. But hey, it’s AI. Give it a more defined question.
Now here’s where talking about AI error gets hard. In order for you to evaluate the answer AI gives, you have to learn about the subject. And that takes so much more time than you’d think.
One thing I like about this question is at least learning the background of this is going to be fun, because on a bit of investigation it turns out Marisa Tomei is a) absolutely hilarious and b) deserved the hell out of that Oscar.
Marisa Fucking Tomei
So the premise of My Cousin Vinny is pretty simple. A couple of city kids (including baby-faced Karate Kid Ralph Macchio himself) get accused of murder while on a road trip through the Deep South. Never fear, Macchio has a cousin who is a lawyer in Brooklyn: Joe Pesci. But Pesci is not only a fish out of water in the Deep South; he is also newly admitted to the bar and has never actually been in a courtroom.
So that’s the premise.
Maybe you groan, because we’ve seen dozens of bad films like this. So have I. The beginning of the film is nothing special. Later on, of course, Pesci is pretty great in the courtroom scenes and the courtroom reveals are quite good. But what makes this film sparkle, in large part, is the performance of Marisa Tomei as Pesci’s out-of-work hairdresser and sometimes mechanic girlfriend. Pesci is alternately ridiculously overconfident and riddled with doubt. Tomei provides the perfect foil for him. You see this a bit in an early exchange people like to share online: “Yeah, you blend…”
You see it when Pesci is asking her what pants he should wear for hunting and she’s horrified that he’s going to shoot animals just to bond with the prosecutor.
You see it in the famous “biological clock” line, where she complains that she’s been waiting ten years for him to propose, stamping her foot while saying “meanwhile my biological clock is TICKing like THIS.”1
And of course, you see it when the prosecutor questions whether she’s really an expert on cars enough to testify about a discrepancy in tire tracks. By that point in the film, we all feel like Joe Pesci as he taps Macchio saying, “watch this”:
Oh, and she gives the testimony that turns the case, by the way, noting that the tracks the murderer’s car left (two equal-length tire marks that stayed flat and even on an uneven surface) would be impossible in the Buick Skylark they were driving, given the technologies of the car.
Mona Lisa Vito: The car that made these two, equal-length tire marks had positraction. You can’t make those marks without positraction, which was not available on the ‘64 Buick Skylark!
Vinny Gambini: And why not? What is positraction?
Mona Lisa Vito: It’s a limited slip differential which distributes power equally to both the right and left tires. The ‘64 Skylark had a regular differential, which, anyone who’s been stuck in the mud in Alabama knows, you step on the gas, one tire spins, the other tire does nothing.
Juror #1: That’s right.
Vinny Gambini: Is that it?
Mona Lisa Vito: No, there’s more! You see? When the left tire mark goes up on the curb and the right tire mark stays flat and even? Well, the ‘64 Skylark had a solid rear axle, so when the left tire would go up on the curb, the right tire would tilt out and ride along its edge. But that didn’t happen here.
Maybe this is your deal, and maybe it isn’t. It’s admittedly corny. I had an absolute blast watching it for “research” here, but would not say I am a My Cousin Vinny superfan. It just started as an interesting question: how hard is it for an LLM to get this stuff right?
The first thing I’ll tell you is that it is possible for a human to do this, though it’s harder than you might think. In fact, for a person with no knowledge of the film to assemble these quotes and contextualize them is actually quite hard and time-consuming. I’ve been trying it with a couple foreign films like La Haine,2 where I don’t know anything, and it’s a really instructive process.3
Some parts are easy. If you wanted to know what the famous lines were, we could look at the Rotten Tomatoes/Fandango YouTube clips, searching for My Cousin Vinny (incidentally as proof this is not niche, look at those view counts):
You could look on the IMDB page.
The IMDB quotes are fan edited, but are pretty accurate. The full script is on Script Slug — or at least a particular revision of it. There’s a bunch of Reddit threads on both the film and the characters.4
But to truly contextualize the quotes you have to weave the quotes together with the plot as described on Wikipedia and then look at people’s commentary on the quote (such as the long reddit thread walking through her limited slip differential explanation). You have to match lines to context and context to motivation. You have to get little details right. I researched this My Cousin Vinny issue for a good long time, then watched the film. And despite all that research, it wasn’t until then I noticed little things I could have known from online materials and just hadn’t put together. For example, Tomei doesn’t truly solve the case. It’s Pesci who notices the tire mark discrepancy and puts Tomei on the stand knowing that she’ll not only come to the same conclusion independently, but that she’ll bring a depth of knowledge and authority to the subject that he — with his intuition — couldn’t touch.
That’s a really fine-grained distinction that requires a good read of the script, but it’s also a crucial point. Pesci isn’t a buffoon needing saving, but he does need a trusted partner. In fact, there’s a joke about this in the end, where Pesci says he wanted to win this case on his own, and in what I feel is her most underrated line Tomei says:
You know, this could be a sign of things to come. You win all your cases, but with somebody else’s help, right? You win case after case, and then afterwards you have to go up to somebody and you have to say, “Thank you.” [pauses, then mockingly] Oh, my God, what a fucking nightmare!
It’s both a possible task to get these things right and a more difficult one than you might think. So let’s see how a variety of models do.
Here are our six noteworthy quotes in the film:
“My biological clock is ticking like this” — honestly not the funniest line, but one that resonated with the audience a lot and became a repeated phrase at the time.
The deer monologue. Just a beautiful juxtaposition of tenderness and profanity.
Bullshit question. Who doesn’t love seeing Tomei wallop this guy?
“Oh yeah, you blend.” Again, not the best line but eminently re-quotable.
The climactic explanation in court. Not as quotable, but people love watching it. And talking about it! There are long Reddit threads on the mechanics of it!
One other line I didn’t mention here: in a hard-to-describe argument with Joe Pesci, she argues that the sink is dripping because it’s broken, and in the process makes up a “laboratory edition” wrench that was certified by the state and federal department of weights and measures to be “dead on balls accurate.” It’s an endearing scene that is meant to show how arguing is their love language, but it’s really hard to explain.
There’s a couple smaller ones here, nice to have but I don’t think essential. A joke about being a hostile witness. There’s a crucial line to the plot where Pesci asks her on the stand if the defense’s case is correct, and she replies (as Pesci anticipates) that Pesci’s case is wrong, which (as Pesci anticipates) allows him to introduce the much stronger case that it wasn’t that two identical cars were mixed up, but that an entirely different car left those marks.
Now let’s get to experimenting. I honestly don’t know what we’ll find, but I’m hoping it will be fun.
Older Models
So let’s run our query. In a perfect world I’d set this up to run ten times and report high-level stats on it, but this is just an initial ramble. And I should stress that it’s a ramble as I work through these ideas. If you don’t like getting into the weeds a bit, this might not be the piece for you.
Here’s the query again. We’ll run it first against the current version of 4o:
What were Marisa Tomei’s most famous quotes from My Cousin Vinny and what was the context?
So, just glance at the piece below from 4o. You don’t have to read it all. Just skim it like you normally would. (Full 4o response here)
🗣️ “My biological clock is ticking like this!” (stomps foot)
Context:
Lisa delivers this line in a dramatic and emotional moment while arguing with Vinny (Joe Pesci) about their relationship and his inability to commit. She is frustrated that they’re stuck in Alabama trying to resolve a murder case involving his cousin, while she wants to move forward with her life. The line humorously captures her desperation, frustration, and passion all at once—and became one of the film’s most quoted moments.
🛠️ “It’s a bullshit question. It’s impossible to answer.”
Context:
This line occurs during Lisa’s surprise testimony in court. The prosecutor asks her a seemingly simple question about how long it takes for tire marks to disappear, intending to discredit her. She shuts him down with this line and follows it up with a thorough technical explanation, demonstrating her deep knowledge of cars. It turns the tide of the case.
🚗 “The 1964 Skylark had a solid rear axle.”
Context:
This is the pivotal quote from her courtroom testimony. Lisa proves the prosecution’s case is flawed by identifying that the tire tracks found at the crime scene could not have been made by the defendants’ car (a 1964 Skylark) because it had a solid rear axle, whereas the actual getaway car (a 1963 Pontiac Tempest) had an independent rear suspension. This technical distinction exonerates Vinny’s cousin and his friend.
👠 “Oh yeah, you blend.”
Context:
Lisa says this sarcastically to Vinny when he tries to dress in a camouflage outfit to “blend in” with the locals. The humor comes from the fact that Vinny sticks out in Alabama like a sore thumb, no matter what he wears. This line showcases her sharp wit and their constant bickering dynamic.
🕊️ “Imagine you’re a deer. You’re prancing along…”
Context:
Lisa uses this amusing analogy while explaining to the court how ridiculous it is to think a hunter could confuse a person for a deer. She mimics a deer frolicking through the woods before describing it getting “blown to smithereens.” The analogy is vivid, funny, and persuasive—a classic example of her unique courtroom style.
Pretty good, right? When I first saw this generate, I thought it wasn’t that bad. Then I put it into my LLM-based fact-checker. (Results)
It’s actually horrible.
First, the good. Answer one is a bit off on what their conversation is about. She’s not frustrated by the fact they are in Alabama as much as she needs Vinny to succeed and get his life on track (including winning his case) so they can take their life to the next stage. She’s been waiting for him to get his life together as he has failed the bar repeatedly. But this is minor, really.
The third quote, on the rear axle, could go into a bit more detail: it’s not just the axle, it’s the axle plus the “positraction”, which is the more famous part of the quote. But this is minor, and I think it’s partially a result of restrictions on LLM output that cap the length of quotes from film scripts that can be reproduced for copyright reasons (in reasoning traces I’ve seen ChatGPT try to enforce either a 90-character limit or a 25-word limit).5
The rest of it is actually a mess:
It hallucinates the question asked by the prosecutor. The prosecutor in the film asks about the timing on a 327 engine, but in a car that predates that engine by a number of years. This says the question is about how long it takes tire tracks to disappear? Also what turns the tide of the case is her later answer about positraction and independent suspension above.
On the “you blend” quote, it invents clothing, saying that the line occurs when Vinny decides “to dress in a camouflage outfit to ‘blend in’ with the locals”. In reality the joke is the opposite: Vinny is in his typical Brooklyn outfit, with the minor addition that he is wearing “cowboy boots”. “You blend” is a joke about the minimal change he has made. The confabulation reverses this meaning.
The most ridiculous one is the deer monologue. The response says rather than about Vinny going hunting, the monologue occurs when she is explaining “to the court how ridiculous it is to think a hunter could confuse a person for a deer” at the trial. I don’t even know what this means. The one direct quote in the paragraph, “blown to smithereens” doesn’t actually exist in the monologue (Her wording — “BAM! A fuckin bullet rips off part of your head! Your brains are laying on the ground in little bloody pieces!” — is the reason this bit is so viral). She does not mimic a deer “frolicking”. This is the most noteworthy quote, and here it’s hallucinated phrasing paired with hallucinated action in a hallucinated scene connected to plot points in the trial that don’t exist.
Foreground Content and Background Content Blindness
A couple observations in the moment about this. First, by the time I looked at this output I had spent several hours watching video clips, learning the background of the film, and tracking down information on odd little questions like “Why are they in a cabin for the “biological clock” scene?”
What is typical but still shocking is how much the errors on my brief scan were (1) invisible to me, and (2) once visible, downplayed in my mind as to severity.
Part of this is due to the fact that so many of the errors that LLMs produce happen in the background content, that is, in the content that does not answer the issue at hand directly but provides supporting claims or detail.6 And if the foreground content is right our mind just sort of yadda yaddas over the background, so we read it like this:
👠 “Oh yeah, you blend.”
Context:
Lisa says this sarcastically to Vinny when yadda yadda yadda “blend in” with the locals. The humor comes from the fact that Vinny sticks out yadda yadda yadda
Then when we notice the background content is wrong, initially we dismiss this as minor. It’s background content after all.
But again, when we actually sit down to really analyze it (or get an AI to look at it) we notice it’s not just that the supporting detail is off, it is that the supporting detail is wrong, or contradictory; conflicting or non-supportive. Not to hammer on this, but the yadda yaddas in this case actually change the meaning of the foreground content. That’s how bad they are.
Lisa says this sarcastically to Vinny when he tries to dress in a camouflage outfit to “blend in” with the locals.
Again, the joke here is that he is not wearing a special outfit. He is wearing what he always wears but has just added cowboy boots. THAT’S THE JOKE. And yet I will tell you that if you don’t approach this systematically you will either miss this or miss something like this. I know this intellectually from a decade and a half of teaching and research but it’s still so odd to feel it.
Why is ChatGPT 5.2 Instant… worse?
It’s worth noting that 4o is not the original 4o May 2024 release; it’s been updated many times since. Still, even though it is not a snapshot of where these systems were 19 months ago, it does provide a window into the past. I am not so surprised it bungled it so badly.
I was curious how much better 5.2 Instant would be.
Here’s what I was not expecting: it was somehow worse. Here’s the result (fact-check here):
Here are Marisa Tomei’s most famous quotes from My Cousin Vinny, along with what’s happening in the story when she says them. Tomei plays Marisa Tomei’s iconic character Mona Lisa Vito, whose sharp intelligence and explosive delivery drive many of the film’s most memorable moments.
1. “The defense is wrong!”
Context:
During the climactic trial, Mona Lisa is finally allowed to testify as an expert witness. After being underestimated and mocked for her appearance and attitude, she confidently interrupts the courtroom to dismantle the prosecution’s theory about the crime.
Why it’s famous:
This line marks the turning point of the case—and the moment when everyone realizes she’s the smartest person in the room.
2. “It’s a trick question!”
Context:
The prosecutor tries to trap Mona Lisa with a misleading technical question about car mechanics. She instantly calls it out before answering, refusing to be boxed into a false premise.
Why it’s famous:
It perfectly captures her no-nonsense personality and mechanical expertise, and it lands as both comedy and courtroom triumph.
3. “I’m an expert on automobiles.”
Context:
When asked to justify her qualifications, Mona Lisa matter-of-factly explains her lifelong immersion in cars—thanks to her family’s garage business.
Why it’s famous:
The understated delivery contrasts hilariously with how dismissive the court initially is, setting up her later mic-drop moments.
4. “Does that include positraction?”
Context:
This is the setup to the film’s most iconic exchange. Mona Lisa challenges the prosecutor’s assumptions about the getaway car by zeroing in on a crucial mechanical detail.
Why it’s famous:
The word “positraction” became inseparable from the film—and from Tomei’s performance—thanks to the line’s rhythm and escalating intensity.
5. “Are you sure?”
Context:
She repeatedly presses a witness on their certainty about what they saw, using calm persistence rather than aggression.
Why it’s famous:
It shows how she dismantles testimony not with theatrics, but with precision and confidence.
Why these lines endure
They flip stereotypes about intelligence and expertise
They combine technical accuracy with comedic timing
They culminate in one of the most satisfying courtroom victories in film history
Marisa Tomei’s performance won the Academy Award for Best Supporting Actress, and these quotes are a big reason why the character remains unforgettable.
I don’t even know where to start here. Three of the five lines are either hallucinated or conflated.
Tomei never says she is an expert on automobiles. That quote is made up. Even more odd is the LLM calling “I am an expert on automobiles” understated delivery. The quote is hallucinated but perhaps the result of a source confusion around a question Vinny asks of her:
Likewise, the Positraction line is likely another result of source confusion and a bit of hallucinatory reconstruction. It’s Pesci who asks a question about Positraction. And of course that line is not a particularly famous line of Pesci’s either.
The “Are you sure?” mistake is the oddest one, because just read the ridiculousness of this statement:
“Are you sure?” She repeatedly presses a witness on their certainty about what they saw, using calm persistence rather than aggression…it shows how she dismantles testimony not with theatrics, but with precision and confidence.
Again, this is source confusion with Joe Pesci’s line, but more than that, it’s confusion with Pesci’s role. Vito is “pressing a witness”? Dismantling testimony?
There are so many other things here, and you can check the automated fact-check if you want to see more. On the “defense is wrong” quote, for example, it leans heavily into filmic tropes, assuming that she must actually mean the prosecution is wrong, unable to capture that the uniqueness of that line is the subversion of that trope. It invents a mocking of her attire, and portrays the line as an interruption (presumably of the prosecution). But the line is delivered at the invitation of Pesci.
What’s going on here?
Honestly, I don’t know. It’s very weird. I’m not saying that mockingly or anything, I get that fast and cheap models make compromises. And in the end neither 4o nor 5.2 Instant is covering itself with glory here. But it’s odd.
It’s so odd that I ran it a number of other times. It got worse. Here’s just the quotes on another run (session here):
The defense is wraaahng.
I’m an automotive expert.
My father was a mechanic. My mother was a mechanic…
It’s a 1973 Buick Skylark.
Are you sure about that five minutes?
I wore this ridiculous thing for you.
Again there’s a bunch of source confusion and a lot of hallucination. Her mother wasn’t a mechanic; her mother’s father was a mechanic. The explanations of these have her talking about her “history of racing” and create scenes where she’s wearing things in court that she is not. I’m not even sure how 1973 comes into this. The five minutes line is about grits, and comes from Pesci.
Even where it makes the same mistake as 4o, it makes it worse. Much worse. It repeats the mistake of 4o in seeing the “defense is wrong” quote as being an outburst rather than something Pesci is hoping for, but look at the details it confabulates. Suddenly Lisa is not on the stand, she’s maybe the lawyer? The quote is placed after someone identifies the suspects? Vinny is embarrassed of her, not insanely proud? The “first hint she might actually be useful” is a phrasing that gets a key theme of the film (that she wants to help but can’t figure out how) right. But believe me, it is not the first hint by any means. The whole film is extended foreshadowing of that moment.
The whole response is bullshit.
1. “The defense is wraaahng.”
Context:
Mona Lisa blurts this out in court when the prosecution’s witness confidently identifies the suspects. Her thick Brooklyn accent and total certainty shock the courtroom—and embarrass Vinny. It’s the first hint that she might actually be useful.
Just to show you how mucked this is, here is the actual scene to compare that description to:
Most of the “quotes” in this response are that wrong. Really, check out the automated fact-check, it’s got more things here wrong than right.
If I get some time I might set up one of my automated fact-checkers to classify errors as conflation, source confusion, bad quote reconstruction, bad summary, confabulated context etc., and use it to get a better idea of what’s going wrong here. Just to be clear, for other reasons I think that 4o falls into the “unsafe at any speed” category and should be shelved; I’m just interested in the deterioration on this task.
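As a rough sketch of what that classification pass might feed into, here is one way to represent the error categories listed above and tally them per model. Everything in it is an assumption made for illustration: the `ErrorType`, `Finding`, and `tally` names are hypothetical, and the sample findings are hand-labeled from the errors discussed in this post, not output from any fact-checker.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ErrorType(Enum):
    # Categories taken from the list in the paragraph above.
    CONFLATION = "conflation"
    SOURCE_CONFUSION = "source confusion"
    BAD_QUOTE_RECONSTRUCTION = "bad quote reconstruction"
    BAD_SUMMARY = "bad summary"
    CONFABULATED_CONTEXT = "confabulated context"


@dataclass(frozen=True)
class Finding:
    model: str        # which model produced the claim
    claim: str        # short description of the erroneous claim
    error: ErrorType  # label assigned by a human or an automated checker


def tally(findings):
    """Return {model: Counter({ErrorType: count})} for a list of findings."""
    counts = {}
    for f in findings:
        counts.setdefault(f.model, Counter())[f.error] += 1
    return counts


# Hand-labeled examples based on the errors described in this post.
# These labels are illustrative guesses, not fact-checker output.
findings = [
    Finding("4o", "Vinny dresses in a camouflage outfit",
            ErrorType.CONFABULATED_CONTEXT),
    Finding("5.2 Instant", "Lisa says 'Does that include positraction?'",
            ErrorType.SOURCE_CONFUSION),
    Finding("5.2 Instant", "Lisa presses a witness with 'Are you sure?'",
            ErrorType.SOURCE_CONFUSION),
    Finding("5.2 Instant", "Lisa asks about 'that five minutes'",
            ErrorType.SOURCE_CONFUSION),
]

for model, counter in tally(findings).items():
    for error, n in counter.most_common():
        print(f"{model}: {error.value} x{n}")
```

Run over many sessions instead of a handful of hand-picked examples, a tally like this would show whether a model's failures cluster in one category or spread across several.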
ChatGPT 5.2 Thinking Mode and the trap of tropic (TROH-pic) bias
OK, let’s try ChatGPT 5.2 Thinking. This should be a lot better.
Here’s the result, which took two minutes and ten seconds of thinking and an additional twenty seconds to write (standard thinking setting):
Here are some of Marisa Tomei (as Mona Lisa Vito)’s most famous, most-quoted lines from My Cousin Vinny—and what’s happening in the story when she says them:
“Oh, yeah. You blend.” (IMDb)
Context: Early on in Alabama, Vinny insists he “fits in” better because he’s wearing cowboy boots. Mona Lisa shoots back this perfectly dry, sarcastic one-liner.
“My biological clock is ticking like this…” (Surrey Libraries)
Context: In the motel, with the case going badly, she confronts Vinny about his long-standing promise: they’d get married after he won his first case—and time (and stress) are piling up.
“I routinely twist to maximum allowable torquage.” (Movie Sounds)
Context: During their legendary motel-room argument about the dripping faucet, she defends herself like she’s on the witness stand—turning a petty fight into mock technical testimony.
“Dead-on balls accurate.” (Movie Sounds)
Context: Same faucet “cross-examination.” She escalates the absurdity by insisting the torque wrench was calibrated to be perfectly accurate—making the line an all-timer.
“No, the defense is wrong!” (93.3 WMMR)
Context: Vinny is stuck, and she’s looking at key evidence (tire marks/vehicle details). She realizes something doesn’t add up and bluntly calls it—setting up the crucial shift that helps crack the case.
“Chevy didn’t make a 327 in ’55.” (Movie Sounds)
Context: On the stand as an expert witness, the prosecutor tries to stump/disqualify her with a car question. She immediately corrects the premise, showing she’s the real deal.
“However, in 1964, the correct ignition timing would be four degrees before top-dead-center.” (Movie Sounds)
Context: The mic-drop finish to that same courtroom exchange—she doesn’t just answer, she out-experts the attempt to trap her.
“So, I wore this ridiculous thing for you.” (93.3 WMMR)
Context: Part of their running “fish out of water” struggle to look/act acceptable in small-town Alabama—she’s exasperated about changing her look to help Vinny be taken seriously.
If you want, tell me how many you need (like top 5 vs. top 15), and I’ll expand the list with more Mona Lisa classics—but still keep each quote short and paired with the scene context.
My automated fact-checker did OK on this response, but made an error or two itself. Which I find interesting. But it found two errors and one discrepancy. The big one is that the last quote about the “ridiculous thing I wore for you” is by Pesci to the judge about a second-hand suit he bought, not Lisa to Vinny. Note that it leans into tropic bias here, assuming that Pesci is embarrassed about the outfits that Lisa wears. He’s not — despite the “you stick out like a sore thumb” line, he adores her and her style,7 and I might be forgetting something but I don’t think he apologizes for her once. As a matter of fact, after she is on the stand he asks the other expert witness to confirm she’s cute:
I say tropic bias (TROH-pic, yeah I’m inventing this word) because a lot of what goes wrong here shows how this movie subtly subverts norms in ways you don’t notice. It is a common trope that the boyfriend trying to look legit argues with his more over-the-top girlfriend that she has to change so they can look respectable. Then, if you’re following the trope, there’s the eventual fight where the girlfriend complains that the boyfriend is changing and the boyfriend complains that she needs to conform, etc.
So it’s interesting, right? If I just gave you the line:
So, I wore this ridiculous thing for you.
And if I asked you to guess the context of it, you might, like the LLM, come up with an explanation like this:
[This is] part of their running “fish out of water” struggle to look/act acceptable in small-town Alabama—she’s exasperated about changing her look to help Vinny be taken seriously.
But one thing I realized that is a bit special about this movie is that Pesci’s character doesn’t apologize for Mona Lisa Vito at all. Every scene she has a more over the top outfit and the only thing you see on his face is absolute pride. She looks great, and if you can’t see that, you’re the problem. And as far as the courtroom, Lisa does tone it down a little bit, but it’s not part of any argument. So a line that in a more cliche movie would fit in her mouth ends up coming out of his.
The same thing happens with the “No, the defense is wrong!” quote. Here’s the wrong explanation of that quote:
“No, the defense is wrong!” (93.3 WMMR) Context: Vinny is stuck, and she’s looking at key evidence (tire marks/vehicle details). She realizes something doesn’t add up and bluntly calls it—setting up the crucial shift that helps crack the case.
Again, think of what you would assume if you knew Lisa said this line and you had to guess based on the Wikipedia article and a knowledge of film tropes what it was about. This would be a pretty good guess!
But that’s not how this film works. It is a formula film in so many ways, but as mentioned above, on this point the film has a unique setup. Looking at a photo during court recess, Vinny realizes the significance of the tire tracks: that they are unlikely to be the tracks of a ‘64 Skylark.8 But to bring that evidence into the courtroom he needs to put someone on the stand who is an expert. And in his mind there’s no bigger expert on cars than Lisa.
Lisa looking at the photo sees the same thing that he does, comes to the conclusion independently, then with an encyclopedic knowledge of cars that dwarfs Pesci’s, she knocks her testimony out of the park starting with a very unconventional line for a defense case — the defense is wrong, because the case is actually much stronger than what the defense has presented.
In other words, where the film subverts, the reasoning fails.
Notably, my fact-checker nearly gets there, flagging this as a context error, but puts the realization in the hotel room, not the cafe. Something for me to work on with it, maybe give it another location pass:
There’s the other error here, that despite providing eight lines of dialogue it misses the deer monologue and the Positraction explanation. Both of these are monologues but also have very famous and identifiable short quotes in them. The loss of the Positraction one is fine, probably, since the “defense is wrong” quote stands in for it, but the loss of what is arguably one of the most famous bits from the film (second only to Pesci’s “two yutes” quote) is not. I do wonder though if this is partly a problem with the way copyright infringement guardrails are enforced —
ChatGPT Pro 5.2 Nails It
So I gave the question to ChatGPT Pro 5.2. The first interesting thing was how long it took. I know this is usual for Pro, but I thought a question like this might go faster. Instead it took a whole 12 minutes. I ran it a second time, and it took over ten minutes.
Here’s one of those runs. Sorry this one is so long, but you can skim to the bottom, I’ll discuss the interesting bits…
Marisa Tomei plays Mona Lisa Vito (usually called “Lisa”), Vinny’s sharp, no-nonsense fiancée in My Cousin Vinny (1992). (Wikiquote)
Here are some of her most-quoted lines (as short excerpts) and what’s happening in the scene:
The motel faucet / “torque wrench” bit
Quote: “I routinely twist the maximum allowable torquage.”
Context: In their motel room, Vinny is annoyed by a dripping faucet and interrogates Lisa about whether she really turned it off. She escalates the argument into a hilariously over-technical explanation, establishing her “I know what I’m talking about” energy. (Wikiquote)
Quote: “dead-on balls accurate.”
Context: Still in the faucet argument, she insists her torque wrench is perfectly calibrated—using absurdly official-sounding language to shut the conversation down. (Wikiquote)
Quote: “It’s an industry term.”
Context: Vinny repeats her phrase in disbelief (“Dead-on balls accurate?”) and she casually doubles down, as if this is all completely normal. (Wikiquote)
“Disclosure” — Lisa quietly being the most competent person in the room
Quote: “It’s called ‘disclosure’, ya dickhead!”
Context: Vinny thinks he cleverly manipulated the prosecutor into handing over the case files. Lisa explains that the prosecutor had to hand them over as part of required discovery—meaning Vinny’s “win” wasn’t a con at all, just the law. (Wikiquote)
The marriage pressure / “biological clock” scene
Quote: “My biological clock is… tickin’ like this…”
Context: On the eve of trial (when Vinny’s stressed and exhausted), Lisa finally vents: they said they’d get married after he won his first case, and it’s been ten years. She physically punctuates “tickin’” by stomping, turning anxiety into comedy. (Wikiquote)
The hunting clothes reality check
Quote: “Imagine you’re a deer.”
Context: Vinny worries about whether his pants are right for a hunting trip. Lisa responds with a vivid (and very darkly funny) description of being shot—basically telling him: the target won’t care about your outfit. (Wikiquote)
The courtroom: “hostile witness” banter
Quote: “No, I hate him.”
Context: When Vinny calls Lisa as an auto expert witness, she pointedly refuses to cooperate at first (because they’re fighting). The judge presses her to answer whether she’s an expert, and she deadpans this instead. (Wikiquote)
Quote: “You think I’m hostile now? Wait till you see me tonight.”
Context: Vinny asks the judge if he can treat her as a hostile witness; she shoots back with this line, turning a legal procedure into relationship comedy in front of the whole courtroom. (Wikiquote)
The courtroom: the “bullshit” trick question moment
Quote: “That’s a bullshit question.”
Context: The prosecutor tries to discredit her by asking a car-nerd question designed to make her look like a fake expert. She immediately recognizes the setup and calls it out. (Wikiquote)
Quote: “No, it is a trick question!”
Context: The judge asks if she can answer it; she explains it’s “impossible” as asked—then proves she’s the real deal by explaining the model-year mismatch and giving the correct timing for the year where the parts actually exist. (Wikiquote)
The climax: “positraction” testimony (the case-breaker)
Quote: “Would you like me to explain?”
Context: Vinny pushes her to justify how she can tell key facts from photos/tire marks. This line is her calm “yes, I absolutely can” setup before the technical teardown. (Wikiquote)
Quote: “The car that made these… tire marks had positraction.”
Context: This is the famous start of her expert explanation: the tire marks indicate a feature the prosecution’s alleged car wouldn’t have—helping Vinny show the wrong vehicle (and therefore the wrong suspects). (Wikiquote)
Quote: “It’s a limited-slip differential which distributes power equally…”
Context: She defines “positraction” in plain-but-authoritative terms and explains why the suspected car couldn’t have made those marks—winning over the jury with real expertise. (Wikiquote)
Bonus: Lisa roasting Vinny (gently) about pride
Quote: “Well, I guess that plan’s moot.”
Context: After Lisa secretly helps by getting a fax to validate Vinny’s legal credentials, Vinny complains he wanted to win without help. She punctures his pride and keeps the focus on winning. (Wikiquote)
If you want, tell me whether you mean “most famous” as in most memed, funniest, or most important to the plot, and I’ll rank them that way (with the scene context).
First of all, the thing to note here is the extra time and runtime computation made a huge difference. This is near perfect.
Much of the weirdness that remains is the result of copyright guardrails. We can’t get extended quotes. But given that restriction, it works around it quite well.
Take the hostile banter segment. If you go to Wikiquote you will find that section laid out like this:
Judge Haller: Would you please answer the counselor’s question?
Lisa: No, I hate him.
Vinny: Your Honor, may I ask your permission to treat Ms. Vito as a hostile witness?
Lisa: You think I’m hostile now? Wait till you see me tonight.
That’s 42 words and 230 characters. You can see in the thinking trace that ChatGPT is trying to find a way to select quotes that fall under the 90 character or 25 word rule it has imposed on itself. So we end up with this stilted presentation that all the same does convey the essence of the scene:
The courtroom: “hostile witness” banter
Quote: “No, I hate him.”
Context: When Vinny calls Lisa as an auto expert witness, she pointedly refuses to cooperate at first (because they’re fighting). The judge presses her to answer whether she’s an expert, and she deadpans this instead. Wikiquote
Quote: “You think I’m hostile now? Wait till you see me tonight.”
Context: Vinny asks the judge if he can treat her as a hostile witness; she shoots back with this line, turning a legal procedure into relationship comedy in front of the whole courtroom. Wikiquote
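To make the arithmetic concrete, here is a minimal sketch of the kind of check a 90 character or 25 word limit implies. The function names, the whitespace-based word count, and the choice to require both limits at once are assumptions for illustration; the thinking trace shows ChatGPT working to such a limit, not how it actually applies it.

```python
# Hypothetical limits, taken from the "90 character or 25 word" rule described above.
MAX_CHARS = 90
MAX_WORDS = 25


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (speaker labels count as words)."""
    return len(text.split())


def fits_excerpt_limit(text: str, max_words: int = MAX_WORDS, max_chars: int = MAX_CHARS) -> bool:
    """Assumption: an excerpt is quotable only if it satisfies both limits."""
    return word_count(text) <= max_words and len(text) <= max_chars


# The Wikiquote exchange as laid out above, one entry per speaker turn.
exchange = [
    "Judge Haller: Would you please answer the counselor's question?",
    "Lisa: No, I hate him.",
    "Vinny: Your Honor, may I ask your permission to treat Ms. Vito as a hostile witness?",
    "Lisa: You think I'm hostile now? Wait till you see me tonight.",
]
full_exchange = " ".join(exchange)

# Lisa's two lines, quoted on their own as in the ChatGPT response.
short_quotes = [
    "No, I hate him.",
    "You think I'm hostile now? Wait till you see me tonight.",
]

print("full exchange:", word_count(full_exchange), "words,",
      "fits" if fits_excerpt_limit(full_exchange) else "too long")
for quote in short_quotes:
    print(repr(quote), "fits" if fits_excerpt_limit(quote) else "too long")
```

Under these assumptions the full exchange fails on both counts while each of Lisa's lines passes on its own, which is consistent with the split presentation in the response.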
My automated fact-checker flags only two errors. On one of them it is wrong, and the other is debatable.
First, my checker whiffs on the dead-on balls accurate discussion, placing it in a cabin they stay in later in the movie. ChatGPT places it correctly in an earlier motel room location. Additionally, in a bit of its own tropic bias, my fact-checker frames it as an argument where Vinny has tried to fix the faucet. That would make sense of course, but the scene is much weirder (and funnier) than that. They are arguing over whether Lisa fully turned off the faucet or if it is broken; it is meant to be an example of what is stated in the previous scene by Ralph Macchio: Gambinis are world-class arguers, compared to most people who are “amateurs”.
The second error is more interesting, a place where ChatGPT gets it wrong but for good reasons. The biological clock line occurs not on the eve of trial, but on the eve of the defense. The prosecution has just wrapped up its case, but they have been in trial for a while. That’s not an inconsequential fact: the reason Vinny is frustrated and Lisa is worried is that it looks like he is going to lose the case.
So ChatGPT is (slightly) wrong here, but it’s easy to see why. If you look at the quote on Wikiquote, Vinny wrongly (or perhaps imprecisely) states that “the trial starts tomorrow”.
Vinny: What are you nervous about? I’m the one that’s under the gun here. Trial starts tomorrow.
Lisa: You wanna know what I’m nervous about? I’ll tell ya what I’m nervous about. I am in the dark here with all this legal crap. I have no idea what’s goin’ on. Alls I know is that you’re screwin’ up and I can’t help.
For whatever reason the more accurate “defense starts tomorrow” was not used in the script. But we’ve had days of testimony at this point, so it’s clear the trial is in progress, and as mentioned, it’s the fact that the trial is in progress and going poorly that sets the scene.
The rest of this, however, is near perfect. Having seen the multiple ways it is possible to screw up the contextualization of these quotes it’s truly remarkable to see how clean this is, and the level of nuance that it gets.
Some thoughts
One of the obvious patterns here is the impact of test-time compute on this sort of task compared to training and model size. Surprisingly, I didn’t find any improvement from 4o to 5.2 Instant, and though a more definitive conclusion would need a defined scoring system, I feel it’s pretty obvious that 5.2 Instant did worse, if anything.
On the other hand, thinking time, when combined with iterative use of search tools, looks much more predictive of response quality: in this limited test, as reasoning time and tool use increase, response quality increases.
At the same time I was struck by just how much reasoning time needed to be expended to get this task right. It’s possible that My Cousin Vinny is uniquely hard to parse, but I don’t think that is the case. I’ve tried this with a half dozen other films and the pattern seems to hold. If it’s true that a significant number of similar film contextualization tasks are solvable with test-time compute but require extensive compute to get right, it seems to me this could be the basis of a number of useful benchmarks. You’d have to have a system to quantify types and levels of error, but that’s doable. This particular task struggles with copyright guardrails (which is useful in itself), but other questions I’ve played with on plot and casting intersect less with that issue yet still provide similar error dynamics across a wide variety of platforms.
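As a rough illustration of what quantifying types and levels of error could look like, here is a minimal sketch. The error categories, severity labels, and weights are all hypothetical placeholders, not a scheme I have validated; a real benchmark would need categories and weights agreed on by human raters.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical severity weights; the numbers are placeholders, not calibrated values.
SEVERITY_WEIGHTS = {"minor": 1, "moderate": 2, "critical": 4}


@dataclass(frozen=True)
class ContextError:
    kind: str      # hypothetical label, e.g. "wrong_timing", "wrong_location", "confabulated_line"
    severity: str  # must be a key of SEVERITY_WEIGHTS


def score_response(errors):
    """Summarize one response: error count, weighted penalty, and a per-type tally."""
    return {
        "error_count": len(errors),
        "penalty": sum(SEVERITY_WEIGHTS[e.severity] for e in errors),
        "by_kind": dict(Counter(e.kind for e in errors)),
    }


# Example: the Pro response above, with its one slight timing error
# ("eve of trial" instead of eve of the defense), rated here as minor.
pro_errors = [ContextError(kind="wrong_timing", severity="minor")]
print(score_response(pro_errors))
```

Lower penalties would indicate cleaner contextualization, and the per-type tally keeps a location slip from being lumped in with a confabulated line.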
It also highlights for me the stark disconnect in perspective between people using free products that do not spend a lot of compute on reasoning and people using some of those paid products. I have to admit to being surprised by how frequently the free and “quick” products confabulated scenes, lines, and even cast members, even when they were the latest models. When I talk to some very smart people about these sorts of errors I get the impression they think such errors are rare. In my experimentation with film plot summary (for instance, asking Gemini 3.0 Flash why characters play Russian Roulette in La Haine), it is possible to generate a near infinite amount of error in this space. Just poking around, I seldom found a question of moderate complexity that didn’t result in notable (and often critical) errors.
It’s true that film plots are an interesting and somewhat challenging information space; there are so many people talking about what happens, but in circular ways, often trying to avoid spoilers. There’s a certain “three blind men describing an elephant” element to at least some of it. But that is hardly a unique dynamic in information spaces. The unique thing here is a tester can pay $3.99 to a streamer and examine the whole elephant themselves.
Places where it’s easy to generate errors and equally easy to evaluate or reconcile them are interesting spaces and unique opportunities! Quite often challenging information spaces are also challenging assessment spaces. Places where answers are hard to get are often places that have no single answers. There are many debatable things one will hit when talking about films, and that’s one of the enjoyable things about films. But there seems to be a whole class of film elements that are not debatable at all and yet easy enough to assess definitively.
I suppose this is all a defense of why I spent my entire weekend analyzing responses to My Cousin Vinny. But I found it fascinating. If you’re reading this sentence 8,000 words later, I’m hoping you found it fascinating too?
Note: To assist me with fact-checking LLM responses for this post, I used a fact-checking tool I built on Gemini 3 Pro with Search. A limited version of that tool (limited to 250 uses a day across all users) is available for your own experimentation at https://overlay.cv. The gating passphrase is “downpour”. I chose to analyze ChatGPT for this post, but all LLMs display this behavior in my experience, and I encourage you to formulate your own similar tests on other models and use the tool to help you check the responses. Throw what you find into the comments. As always, use LLM-based tools to supplement and support your reasoning and research, not replace it. The tool is not perfect but it can help you see things you might otherwise miss.
This was a line which the writer apparently didn’t want to put in because he thought — rightly, I think — it was cliche and a bit anti-feminist. But the studio was fighting to remove Tomei’s character from the script altogether, and insisted if the character was to stay in they needed the marriage plot point, which is what the film ends on. The writer put the line in to save the character. But Tomei manages to deliver it with a mixture of compassion and frustration and relatability that somehow skirts the “nagging girlfriend” trope and puts you at least halfway on her side. So much so that it became an imitated line by 1990s women everywhere to express their own frustration with non-committal men or the horrors of the dating scene.
Yes, my current research set includes My Cousin Vinny and La Haine. Ask me anything.
Honestly, I wish I still taught classes because I think this sort of exercise, where you have to reconstruct the meaning of a film without having the experience of a film, is more instructive to students thinking through language models than a lot of the stuff taught. I really need to get back to the classroom. If I can ever afford it, that is.
I realize that there are a bunch of copyright rules on putting out full quotes and maybe on crawling some of these things as well. That’s honestly part of what I’m interested in here.
This will make me sound pro-AI (which is sad in its own way), but the whole exercise of this has made me think of the contradictions of many AI-related positions. I think you can root for the NYT to win its case against OpenAI on the basis that ChatGPT under certain circumstances spit out large segments of NYT text in response to specific prompts. It’s a perfectly fine position to argue that at some level of reproduction this technology becomes a substitute for the sources it is built on, and that’s bad. I personally wouldn’t set that at 25 words or 90 characters, but it’s a defensible case. I think it’s also a defensible case that everything on the open internet is not open for LLM consumption, that there is a difference between indexing a copyrighted document someone put on Script Slug and consuming that document as part of an LLM data set. I feel like that is less defensible personally because I tend to see these systems as a map of the available information environment and to some extent they are not responsible for the state of that environment. But it’s absolutely a supportable position. What I don’t think you can do is support these things and then be particularly surprised that these technologies struggle sometimes to place quotes in context, or be shocked that they reconstruct quotes wrongly. These copyright concerns and these accuracy concerns are in tension. Like most things in law they involve competing goods, and it doesn’t do the broader AI discussion any good to be dishonest about these tradeoffs, on either side of the equation.
If you’re familiar with formal pragmatics you’ll clock this as a variant of the at-issue and not-at-issue distinction. However I don’t think some of this behavior fits the negation requirements of the at-issue/not-at-issue distinction, so I am using broader terminology instead. Maybe I’m wrong though?
Even in the “You blend” scene you can see him looking at her bemused, but he’s also adoring. Mona Lisa Vito is Mona Lisa Vito.
Someone asked on Bluesky how he comes to that intuition. I think it is meant to tie to an earlier scene where they get stuck in the mud, but the idea is not well-telegraphed.

Gemini Pro also did great compared to Gemini reg. Compute time FTW!
I did find it fascinating, and love the whole "TROH-pic" conceit, you are good. I also love you indulging My Cousin Vinny. I always wrote that movie off, but Antonella is a fan, and recently we watched it and I realized how wrong I was, what a delight.
As for the accuracy of the chatGPT model, I have been experimenting with Madden 2001 game play and summarizing games, and it is pretty interesting how much it confabulates depending on thinking time, so I really like this theory you are presenting. I find the more time it spends trying to calculate Standings and Playoff Picture scenarios (which it is truly challenged by, despite what would be evident math for a machine like it) the more likely it is somewhat accurate---but damn it always messes something up, kinda like you are suggesting.
Doing this for movies though is so much more accessible and fun, no one gives a shit about Madden 2001 anymore cause they are Elden Ring philistines.
Anyway, this is awesome and i am gonna have to play with that Rewind tool you are building for fact checking cause MBS and I use ChatGPT occasionally for our Film Podcast, so this would be most useful so we don't make egregious mistakes---but watching the film in its entirety rather then depending on YT clips cuts a lot of that AI dependence I find :)