6 Comments
Aaron Tay:

Moral of the story: keep prompts that fail and retry them every 3-6 months on newer models. You will be surprised by the progress. Some of it could be "data leakage" or training targeting your use case, but if the prompt is obscure enough, that's unlikely.

Paul Cook:

Fascinating work as always! But Step Brothers is a modern classic. Give it a chance!

Mike Caulfield:

OK, OK, people may have convinced me.

Stephen Fitzpatrick:

So glad others are making the point about how rapidly the models are changing. Many (if not most) people's impression of AI was fixed in the first six months of ChatGPT's release, and few have kept up (understandably, given how significant the explosion of models and LLMs has been in just 30 months). This is a great example of the pace of improvement.

Anna Mills:

The contrast really is striking. I wonder if we'll be able to say the same a year from now.

Mike Caulfield:

It seems like the field keeps finding ways to scrape together new gains. The dream that data alone would solve everything turned out to be just a dream, but chain-of-thought reasoning and grounding techniques produced different kinds of gains (and, to my way of thinking, more exciting ones). I imagine some of those benefits are slowing, so I guess the question will be "What next?"
