6 Comments
Anna Mills:

This is so helpful; your prompting example definitely taught me that I can make a prompt quite long and complex and still hope the directions will be followed...

Mike Caulfield:

I think, having come to prompting a bit late, I never realized that early versions of these systems stalled out on complex prompts, so I just started talking to them the way I would to undergraduate researchers who needed to classify things systematically. But it was definitely true in the past that by the time you pushed past 500 words, the instructions at the beginning were being forgotten as the session progressed.

Anna Mills:

Yes, we were seeing this with feedback prompts, but we upgraded models. My colleague thinks it's still a problem; I don't, but we probably need to test a bit more. I don't know how you manage the volume of testing you do! It's so time-consuming to do a good job evaluating the results.

Mike Caulfield:

The secret is a well-developed test prompt library from years as a misinformation researcher, plus a habit of binging TV shows on the weekend while lazily running tests against a prompt and comparing the results to the reference spec. Also, never leak your test prompt library, because prompts get spoiled so fast.

Mike Caulfield:

But everyone can benefit from even a small test prompt library; maybe some day I'll talk about how to make one.
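The "test prompt library" idea described here can be sketched as a tiny harness: a private set of inputs with expected labels, run against a prompt and scored against the reference spec. This is just one plausible shape for it, not the author's actual setup; `run_model`, the test cases, and the labels below are all illustrative stand-ins.

```python
# Minimal sketch of a test prompt library: fixed inputs with expected
# labels, scored against whatever model call you plug in.

def run_model(prompt: str, item: str) -> str:
    # Hypothetical stand-in: a real harness would call an LLM API here.
    # This stub just keys off a phrase so the sketch is runnable.
    return "misleading" if "miracle cure" in item else "accurate"

TEST_CASES = [
    # (input text, expected label). Keeping these private is the point:
    # once test items leak, they can be tuned against and stop measuring.
    ("Doctors shocked by this miracle cure", "misleading"),
    ("Health agency updates flu vaccine guidance", "accurate"),
]

def score(prompt: str) -> float:
    """Fraction of test cases where the model's label matches the spec."""
    hits = sum(run_model(prompt, text) == expected
               for text, expected in TEST_CASES)
    return hits / len(TEST_CASES)

print(score("Classify each headline as 'accurate' or 'misleading'."))
```

Even a spreadsheet works the same way; the harness just makes the weekend re-runs lazier.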

Anna Mills:

I'd love to know more! (I'm mystified by the idea of prompts being spoiled fast.)

I'm toying with the idea of trying a platform set up for testing rather than just me and my spreadsheet. Maybe PromptLayer, Anthropic's workbench, or OpenAI's Playground.
