Moral of the story: keep prompts that fail and retry them every 3-6 months on newer models. You will be surprised by the progress. Some of it could be "data leakage" or training that targets your use case, but if your use case is obscure enough, that's unlikely.
Fascinating work as always! But Step Brothers is a modern classic. Give it a chance!
Ok, ok, people may have convinced me
So glad others are making the point about the rapid changes in the models. Many (if not most) people's impression of AI was fixed in the first 6 months of ChatGPT's release, and few have kept up (understandably - the explosion of models and LLMs has been significant in just 30 months). This is a great example of the progression of improvement.
The contrast really is striking. I wonder if we'll be able to say the same a year from now.
It seems like the field keeps finding ways to scrape together new gains. The dream that data alone would solve everything turned out to be just a dream, but CoT reasoning and grounding techniques produced different kinds of gains (and, to my way of thinking, more exciting ones). I imagine some of those benefits are slowing too, so I guess the question will be "What next?"