I once asked Claude to extract data from customer reviews, and my first attempt was just telling it what I wanted. The results were inconsistent - sometimes I’d get JSON, sometimes plain text, and sometimes it would just make up fields that didn’t exist.
Then I tried showing it one example of what I wanted, and the results got better. But it was still inconsistent on edge cases.
Finally, I showed it three examples covering different scenarios, and suddenly I got perfect results every time.
That’s the difference between zero-shot learning (no examples), one-shot learning (one example), and few-shot learning (multiple examples). Same task, different number of examples, completely different quality.
Understanding the Basics
First, let’s clarify the terminology: a “shot” is just an example you give to the AI.
Zero-shot means you give no examples, just instructions. One-shot means you provide one example. Few-shot means you give multiple examples, usually between 2 and 10.
The basic principle is simple: more examples lead to better pattern recognition, which leads to more consistent outputs.
Zero-Shot Learning
Zero-shot learning means you give no examples, just instructions. Something like “Classify the sentiment of this text as positive, negative, or neutral.”
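Here’s what that looks like in practice - a minimal sketch using the Anthropic Python SDK (the model id and token limit are placeholders; use whatever you normally run):

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in your environment

# Zero-shot: instructions only, no examples.
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model id
    max_tokens=10,
    messages=[{
        "role": "user",
        "content": "Classify the sentiment of this text as positive, negative, "
                   "or neutral: 'The checkout process was painless.'",
    }],
)

print(response.content[0].text)  # e.g. "positive" - exact wording can drift run to run
```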
When to use it: Zero-shot works best for straightforward tasks, when you need to minimize tokens, or when you’re doing quick prototyping.
Pros: It’s efficient, fast, and keeps your prompt clean. Cons: The output format can be inconsistent, results can be ambiguous, and you have limited control over the exact format.
Zero-shot is great for quick tests and experiments, but it’s usually too inconsistent for production use where you need reliable, predictable outputs.
One-Shot Learning
One-shot learning means you provide one example that demonstrates the input-output pattern you want.
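For example (the review text and JSON fields here are invented for illustration), a one-shot prompt might look like this - you’d send it through the same messages.create call as above:

```python
# One-shot: a single worked example pins down the exact output format.
one_shot_prompt = """Extract the product and sentiment from the review as JSON.

Review: "The battery life on this laptop is incredible."
Output: {"product": "laptop", "sentiment": "positive"}

Review: "My blender stopped working after two weeks."
Output:"""
```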
When to use it: Use one-shot when you need a specific format, you’re working on a somewhat specialized task, or you want to balance consistency with token constraints.
Pros: It sets clear expectations, gives you better consistency than zero-shot, and maintains moderate efficiency. Cons: The AI has limited pattern recognition from just one example, and it might not handle edge cases well.
This is my go-to approach for most tasks because it strikes a good balance between token usage and consistency.
Few-Shot Learning
Few-shot learning means you provide multiple examples (usually 2-10) that show patterns, variations, and edge cases.
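Extending the one-shot prompt above, a few-shot version adds examples that cover the variations you care about - here, positive, negative, and a no-product edge case (all reviews invented for illustration):

```python
# Few-shot: multiple examples teach the pattern plus its edge cases.
few_shot_prompt = """Extract the product and sentiment from the review as JSON.
Use null when the review names no product.

Review: "The battery life on this laptop is incredible."
Output: {"product": "laptop", "sentiment": "positive"}

Review: "My blender stopped working after two weeks."
Output: {"product": "blender", "sentiment": "negative"}

Review: "Delivery was fine, nothing special to report."
Output: {"product": null, "sentiment": "neutral"}

Review: "The headphones sound great, though the case feels cheap."
Output:"""
```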
When to use it: Use few-shot when you need high consistency, when there are nuances or variations to consider, when the format is critical, when the task is domain-specific, or when edge cases matter.
Pros: It gives you the highest consistency, handles complexity well, provides tight format control, and produces quality outputs. Cons: It’s token-intensive, costlier to run, slower to create, and you hit diminishing returns beyond 5-10 examples.
Use few-shot when quality matters more than cost - for customer-facing features, data extraction tasks, or anything where inconsistency creates real problems.
Comparing Approaches
Here’s how these three approaches stack up against each other:
| | Zero-shot | One-shot | Few-shot |
| --- | --- | --- | --- |
| Output quality | Lowest | Moderate | Highest |
| Token efficiency | Highest | Moderate | Lowest (most tokens) |
| Time to create | Fastest | Moderate | Longest |
Choosing the Right Approach
Here’s a simple decision framework:
If you need high consistency, a specialized task, or a complex pattern, use few-shot. If you’re limited on tokens or budget, or it’s a simple, common task, zero-shot will probably work fine. One-shot is the middle ground for everything in between.
My personal process: I start with zero-shot, which takes about 30 seconds to set up, and then I test it. If the format or style is off, I add one example to make it one-shot. If edge cases are still failing, I add 2-3 more examples to turn it into few-shot.
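In code, that escalation is just one list growing - here’s a hypothetical helper (not a library API) that builds the same prompt for all three approaches:

```python
# Hypothetical helper: zero-, one-, and few-shot differ only in how many
# examples you pass in.
def build_prompt(instruction, examples, query):
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f'Input: "{text}"\nOutput: {label}\n')
    parts.append(f'Input: "{query}"\nOutput:')
    return "\n".join(parts)

instruction = "Classify the sentiment as positive, negative, or neutral."
examples = []  # zero-shot: start here and test

# Format or style off? Add one example (one-shot).
examples.append(("Loved it!", "positive"))

# Edge cases still failing? Add 2-3 more (few-shot).
examples.append(("Total waste of money.", "negative"))
examples.append(("It arrived on a Tuesday.", "neutral"))

print(build_prompt(instruction, examples, "Great value, would buy again"))
```

Each append is cheap to test in isolation, so you only pay (in tokens) for examples that actually fix a failure.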
Advanced Tips
Example quality matters more than quantity: Make sure your examples have consistent formatting, clear patterns, and representative variations. One great example beats three mediocre ones.
Example diversity helps with edge cases: Include a simple case, an edge case, a complex case, and an unusual variation. This teaches the AI to handle different scenarios.
You can optimize token usage: Don’t be afraid to abbreviate your examples as long as the pattern stays clear. The AI can still learn from shorter, punchier examples.
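As a rough sketch of what that can look like (labels and reviews are invented), single-letter labels and an arrow still make the pattern unmistakable:

```python
# Abbreviated few-shot: terse examples keep the pattern obvious while
# costing a fraction of the tokens of full sentence pairs.
compact_prompt = """Label each review P (positive), N (negative), or U (neutral).

"Works great" -> P
"Broke in a week" -> N
"It's a phone" -> U
"Slow shipping, but the product itself is solid" ->"""
```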
Common Mistakes
Using inconsistent examples: Keep the format uniform across all your examples, or the AI won’t know which pattern to follow.
Providing too many examples: Beyond 5-10 examples, you hit diminishing returns and you’re just wasting tokens without improving quality.
Choosing non-representative examples: Make sure to mix simple cases, complex cases, and edge cases so the AI learns the full range of scenarios.
Having unclear patterns: Make the desired pattern obvious. If you can’t spot the pattern in your examples, neither can the AI.
Practical Applications
Data extraction: Use few-shot because you need consistency and the ability to handle variations in the input data (see the sketch below).
Simple classification: Use zero-shot since it’s a common task with clear categories that the AI already understands.
Format conversion: Use one-shot because you need to demonstrate the exact format you want, but one example is usually enough.
Creative writing: Stick with zero-shot because you don’t want to constrain the AI’s creativity with examples.
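To make the data-extraction case concrete, here’s a hedged end-to-end sketch using the Anthropic Python SDK - the model id is a placeholder, the reviews are invented, and production code would need error handling for malformed JSON:

```python
import json
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in your environment

# Few-shot examples as a reusable prefix (reviews invented for illustration).
EXAMPLES = """Extract the product and sentiment from each review as JSON.
Use null when the review names no product.

Review: "The battery life on this laptop is incredible."
Output: {"product": "laptop", "sentiment": "positive"}

Review: "Delivery was fine, nothing special to report."
Output: {"product": null, "sentiment": "neutral"}"""

def extract_review(review):
    """Send one review through the few-shot prompt and parse the JSON reply."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # placeholder model id
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": f'{EXAMPLES}\n\nReview: "{review}"\nOutput:',
        }],
    )
    # With a few varied examples the reply tends to stay parseable JSON,
    # but real code should still catch json.JSONDecodeError.
    return json.loads(response.content[0].text)

print(extract_review("Keyboard feels mushy, but it was cheap."))
```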
Conclusion
The best strategy is to start with zero-shot and add examples only when you need them.
Here’s my typical workflow: I try zero-shot first. If the results are consistent, I’m done.
If the output is inconsistent, I add one example to make it one-shot. This fixes about 80% of format issues right away.
If I’m still seeing problems, I add 2-3 more examples to turn it into few-shot, which handles the remaining 20% of edge cases.
Looking at my own usage, I use zero-shot about 30% of the time, one-shot about 50% of the time, and few-shot about 20% of the time.
Few-shot costs more in tokens and time, but for critical applications where quality matters most, it’s absolutely worth it.
The best approach is always the simplest one that works. Match the technique to your task, test it, and ship it.