The Prompt Engineering Patterns I Actually Use in Production (And the Ones I Don't)

There's a funny thing about prompt engineering tutorials. They present 15 techniques as if they're all equally important, then show you a toy example of each. You leave feeling like you've learned a lot, but when you sit down to write a prompt for your actual system, you still stare at a blank screen.
I've written prompts for production systems processing millions of queries. Here's what I've actually learned: three or four patterns do 90% of the work. The rest are situational — useful when you need them, but you usually don't.
So instead of a checklist of 12 techniques with identical formatting, let me tell you which patterns I reach for first, which ones I pull out when things get hard, and which ones I almost never use despite them being in every tutorial.
The Big Three: Patterns I Use Every Single Day
1. Constrained Generation (The One That Fixes Everything)
If I could only teach one prompt engineering technique, it would be this: tell the model exactly what format you want.
This sounds obvious. It isn't. I review client prompts regularly, and the single most common mistake is vague output expectations. "Analyze this data" vs. "Return a JSON object with these exact fields" is the difference between a prototype and a production system.
Here's a real example. We had a product categorization system that was "working but unreliable." The prompt was:
prompt = "Extract product details from this description."
# Output: Sometimes JSON, sometimes markdown, sometimes prose.
# Parsing broke randomly. Team was pulling their hair out.
The fix took five minutes:
prompt = """
Extract product details in this exact JSON format:
{
"name": "product name",
"price": numeric value,
"category": "category name",
"features": ["feature1", "feature2"],
"inStock": boolean
}
Rules:
- Price must be a number (no currency symbols)
- Features must be an array
- inStock must be true or false
- Use null for missing information
Product description: {description}
JSON output:
"""
Parsing errors dropped from ~20% to under 1%. Not because the model got smarter — because we stopped being vague about what we wanted.
For production systems, pair this with Pydantic validation:
from pydantic import BaseModel, Field
import json

class ProductInfo(BaseModel):
    name: str
    price: float
    category: str
    features: list[str]
    in_stock: bool = Field(alias="inStock")  # the prompt asks for "inStock"

def extract_product(description: str) -> ProductInfo:
    # .replace instead of .format: the template's literal JSON braces would trip up .format
    response = llm.generate(prompt.replace("{description}", description))
    data = json.loads(response)
    return ProductInfo(**data)  # Validates or throws
If the model returns garbage, Pydantic catches it. You retry or escalate. No silent failures.
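When validation fails, I retry before escalating. Here's a minimal sketch of that loop, reusing ProductInfo and the same hypothetical llm client as above (the retry count and error handling choices are mine):
from pydantic import ValidationError

def extract_product_with_retry(description: str, max_attempts: int = 3) -> ProductInfo:
    last_error = None
    for _ in range(max_attempts):
        response = llm.generate(prompt.replace("{description}", description))
        try:
            return ProductInfo(**json.loads(response))
        except (json.JSONDecodeError, ValidationError) as e:
            last_error = e  # remember why this attempt failed
    # No silent failures: surface the problem instead of passing garbage downstream
    raise RuntimeError(f"Extraction failed after {max_attempts} attempts: {last_error}")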
2. Few-Shot Learning (Show, Don't Tell)
The second pattern I can't live without: give the model examples instead of instructions.
I used to write elaborate prompts explaining exactly how I wanted entity extraction done. Five paragraphs of rules, edge cases, formatting requirements. The model would follow some rules, ignore others, and invent its own formatting.
Then I switched to examples:
prompt = """
Extract entities in the format [TYPE: VALUE]
Example 1:
Input: Google launched Gemini in Mountain View.
Output: [COMPANY: Google], [PRODUCT: Gemini], [LOCATION: Mountain View]
Example 2:
Input: Tesla unveiled Cybertruck at the LA Auto Show.
Output: [COMPANY: Tesla], [PRODUCT: Cybertruck], [EVENT: LA Auto Show]
Example 3:
Input: Microsoft released GPT-4 integration in Azure.
Output: [COMPANY: Microsoft], [PRODUCT: GPT-4], [PLATFORM: Azure]
Now extract entities from:
Input: Apple announced the iPhone 15 in Cupertino.
Output:
Three examples do more than three paragraphs of explanation. The model pattern-matches from examples far better than it follows complex written rules.
How many examples? Three is my default. Sometimes two works. I've never needed more than five. If five examples don't solve it, the problem isn't the number of examples — it's that the task is genuinely ambiguous.
Which examples to pick? Cover your edge cases. If you have a common case, a tricky case, and a boundary case, those three examples will outperform ten "normal" examples.
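One habit that helps: keep the examples as data so you can swap them per domain, and assemble the prompt in code. A minimal sketch (the Example structure is my own convention, not a library API):
from dataclasses import dataclass

@dataclass
class Example:
    input_text: str
    output_text: str

def few_shot_prompt(task: str, examples: list[Example], query: str) -> str:
    # Render each stored example in the same Input/Output shape the model will complete
    blocks = "\n\n".join(
        f"Example {i + 1}:\nInput: {ex.input_text}\nOutput: {ex.output_text}"
        for i, ex in enumerate(examples)
    )
    return f"{task}\n\n{blocks}\n\nNow extract entities from:\nInput: {query}\nOutput:"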

3. Chain-of-Thought (Making the Model Show Its Work)
Chain-of-thought is the most famous prompt engineering technique, and it deserves the hype — for the right tasks.
The idea: instead of asking for the answer directly, ask the model to reason step by step. This dramatically improves accuracy on math, logic, and multi-step problems.
# Without CoT — model often gets this wrong
prompt = """
Calculate the total cost:
- 3 notebooks at $4.50 each
- 2 pens at $1.25 each
- 15% discount on total
"""
# Output: "$12.75" (wrong)
# With CoT — model almost always gets it right
prompt = """
Calculate the total cost step by step:
- 3 notebooks at $4.50 each
- 2 pens at $1.25 each
- 15% discount on total
Let's solve this step by step:
1. First, calculate the cost of notebooks
2. Then, calculate the cost of pens
3. Add them together for subtotal
4. Apply the 15% discount
5. Calculate final total
"""
# Output:
# "1. Notebooks: 3 × $4.50 = $13.50
# 2. Pens: 2 × $1.25 = $2.50
# 3. Subtotal: $13.50 + $2.50 = $16.00
# 4. Discount: $16.00 × 0.15 = $2.40
# 5. Final total: $16.00 - $2.40 = $13.60"
When I use it: Any time the task involves calculation, comparison, multi-step logic, or analysis. It's essentially free (a few extra tokens) and reliably improves accuracy on reasoning tasks, with reported gains of 40-60% on benchmarks.
When I don't: Simple extraction, classification, or translation. For "what category does this product belong to?" — chain-of-thought just adds unnecessary tokens. The model doesn't need to "think" about it.
A practical pattern for building CoT into your system:
def chain_of_thought_prompt(problem: str, steps: list[str]) -> str:
    numbered = "\n".join(f"{i + 1}. {step}" for i, step in enumerate(steps))
    return f"""
{problem}

Let's solve this step by step:
{numbered}

Now, work through each step carefully.
"""
The Power Moves: Patterns I Reach For When Things Get Hard
These patterns aren't daily drivers, but when you need them, nothing else works.
4. Role-Based Prompting
Assigning a role sounds almost too simple to be a "technique." But the difference between "explain neural networks" and "you are a senior ML engineer explaining neural networks to a developer who's never done ML" is enormous.
prompt = """
You are a senior machine learning engineer with 10 years of experience
teaching complex concepts to software developers who are new to AI.
Explain neural networks to a developer who understands programming
but hasn't worked with ML before. Use code analogies and practical examples.
"""
The role doesn't just change the style — it changes what the model includes and excludes. A "teacher" role adds analogies. A "code reviewer" role focuses on bugs. A "consultant" role gives actionable recommendations instead of textbook explanations.
I combine this with almost every other pattern. It's the seasoning, not the main dish.
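Because a role is just a prefix, it composes with everything else here; a trivial helper (my own convention):
def with_role(role: str, prompt: str) -> str:
    # Prepend a role line so the pattern stacks with any other prompt
    return f"You are {role}.\n\n{prompt}"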
5. Negative Prompting (The DO NOT List)
Here's a counterintuitive discovery: sometimes telling the model what not to do is more effective than telling it what to do.
prompt = """
Summarize this article in 3-4 sentences.
DO:
- Focus on main facts and key points
- Use neutral, objective language
DO NOT:
- Add your own opinions or interpretations
- Include minor details or examples
- Use more than 4 sentences
- Start with "This article discusses..."
"""
That last "DO NOT" is my favorite. Without it, about 40% of summaries start with "This article discusses..." which is useless padding. One negative constraint, 40% better outputs.
I keep a running list of "DO NOTs" for each production prompt. Every time the model does something annoying, I add it to the list. After a few iterations, the prompt is dialed in.
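That running list works best as literal data in code, so each fix is one appended string. A sketch of how I'd structure it (the naming is mine):
# Every time the model does something annoying, append one line here
SUMMARY_DO_NOTS = [
    "Add your own opinions or interpretations",
    "Include minor details or examples",
    "Use more than 4 sentences",
    'Start with "This article discusses..."',
]

def summary_prompt(article: str) -> str:
    rules = "\n".join(f"- {rule}" for rule in SUMMARY_DO_NOTS)
    return f"Summarize this article in 3-4 sentences.\n\nDO NOT:\n{rules}\n\nArticle:\n{article}"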
6. Prompt Chaining (Breaking It Down)
When a task is too complex for one prompt — and you can feel the model struggling — split it into steps.
# Instead of one monolithic prompt...
prompt = "Analyze this code, identify bugs, suggest fixes, and rewrite it with improvements."
# ...which produces unfocused results
# Break it into a chain:
# Step 1: Analysis
analysis = llm.generate(f"Analyze this code and list potential issues:\n{code}")
# Step 2: Prioritization
priorities = llm.generate(f"Prioritize these issues by severity:\n{analysis}")
# Step 3: Fix suggestions
fixes = llm.generate(f"Suggest fixes for the Critical and High items:\n{priorities}")
# Step 4: Rewrite
final = llm.generate(f"Rewrite the code with these fixes applied:\n{code}\n\nFixes:\n{fixes}")
Each step gets the model's full attention on one sub-task. The output is better, and if something goes wrong, you can see exactly which step failed.
The cost trade-off: chaining uses 3-4x more tokens than a single prompt. For most production systems, the quality improvement is worth it. For high-volume, low-stakes tasks (sentiment classification, simple extraction), a single well-crafted prompt is fine.
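To actually get the "see which step failed" benefit, keep every intermediate output. Here's the chain above restructured as a small harness (the dict shape and naming are mine, and llm is the same hypothetical client):
def run_review_chain(code: str) -> dict[str, str]:
    results: dict[str, str] = {}
    steps = [
        ("analysis", lambda r: f"Analyze this code and list potential issues:\n{code}"),
        ("priorities", lambda r: f"Prioritize these issues by severity:\n{r['analysis']}"),
        ("fixes", lambda r: f"Suggest fixes for the Critical and High items:\n{r['priorities']}"),
        ("rewrite", lambda r: f"Rewrite the code with these fixes applied:\n{code}\n\nFixes:\n{r['fixes']}"),
    ]
    for name, build_prompt in steps:
        results[name] = llm.generate(build_prompt(results))  # every intermediate output is kept
    return results  # inspect results["analysis"], results["priorities"], ... when a run goes wrong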
7. Contextual Priming
When the model gives generic advice, it's usually because you gave it a generic question. Adding context transforms "should we use microservices?" from a textbook answer into a relevant recommendation:
prompt = """
Context:
- Team size: 8 developers
- Current system: Django monolith (50k lines)
- Traffic: 100k requests/day
- Pain points: Slow deployments, testing bottlenecks
- Budget: Limited DevOps resources
- Timeline: 6 months
Given this context, should we migrate to microservices?
Provide a recommendation specific to our situation.
"""
# Output: Tailored advice that accounts for team size, budget, timeline
Without context, you get "microservices have pros and cons." With context, you get "with 8 developers and limited DevOps, microservices will slow you down. Consider modularizing your monolith first."
Night and day.
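In production the context usually lives in a config or a database rather than in your head, so I render it into the prompt instead of hand-writing it each time. A minimal sketch (the dict convention is mine):
def primed_prompt(question: str, context: dict[str, str]) -> str:
    lines = "\n".join(f"- {key}: {value}" for key, value in context.items())
    return (
        f"Context:\n{lines}\n\n"
        f"{question}\n"
        "Provide a recommendation specific to our situation."
    )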

The Specialist Tools: Powerful But Situational
8. Self-Consistency (Voting on the Answer)
This is one of those techniques that sounds unnecessary until you're working on a task where accuracy really matters.
The idea: generate multiple independent answers to the same question, then pick the most common one. It's like asking five doctors instead of one.
from collections import Counter

class SelfConsistency:
    def __init__(self, llm_client, num_samples: int = 5):
        self.llm = llm_client
        self.num_samples = num_samples

    def solve(self, problem: str) -> dict:
        answers = []
        for _ in range(self.num_samples):
            solution = self.llm.generate(f"{problem}\n\nSolve step by step.")
            answers.append(self._extract_answer(solution))
        answer_counts = Counter(answers)
        best, count = answer_counts.most_common(1)[0]
        return {
            "answer": best,
            "confidence": count / self.num_samples,
        }

    def _extract_answer(self, solution: str) -> str:
        # Simple heuristic: treat the last non-empty line as the final answer
        return solution.strip().splitlines()[-1]
When I use it: Math problems, classification tasks where the cost of being wrong is high, medical/legal/financial analysis. The 5x cost increase is justified when one wrong answer matters.
When I don't: 95% of the time. For a chatbot or a content generation system, one good prompt with CoT is enough.
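The confidence score is what makes this production-friendly: you can gate on it and escalate disagreement instead of silently accepting a coin flip. A usage sketch (the 0.6 threshold is an arbitrary example, and the escalation hook is a hypothetical placeholder):
result = SelfConsistency(llm, num_samples=5).solve(
    "A loan of $12,000 at 6% simple annual interest for 18 months. Total interest owed?"
)
if result["confidence"] < 0.6:  # low agreement across samples; tune per task
    flag_for_human_review(result)  # hypothetical escalation hook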
9. ReAct (Reasoning + Acting)
ReAct is the foundation of AI agents. The model alternates between thinking ("I need to look up X") and acting (calling a tool to look up X).
prompt = """
Use the ReAct framework to solve this:
Question: What is the capital of the country where the Eiffel Tower is located?
Thought: I need to find which country the Eiffel Tower is in.
Action: Search["Eiffel Tower location"]
Observation: The Eiffel Tower is in Paris, France.
Thought: France's capital is Paris. I have the answer.
Action: Finish["Paris"]
"""
This is less a "prompt engineering pattern" and more an "application architecture." If you're building agents, you'll use ReAct. If you're not, you probably won't. See my agent building guide for the full picture.
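If you want to see the shape of it anyway, here's a toy control loop that parses the Action lines. The Thought/Action/Observation format and the Search/Finish actions follow the example above; everything else is a sketch, not a production agent runtime:
import re

INSTRUCTIONS = 'Answer using Thought / Action / Observation steps. Actions: Search["query"], Finish["answer"].\n\n'

def react_loop(question: str, tools: dict, max_turns: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = llm.generate(INSTRUCTIONS + transcript)  # model emits Thought + Action
        transcript += step
        match = re.search(r'Action: (\w+)\["(.*?)"\]', step)
        if match is None:
            continue  # no parseable action; let the model try again
        tool, arg = match.groups()
        if tool == "Finish":
            return arg  # model decided it has the answer
        transcript += f"\nObservation: {tools[tool](arg)}\n"  # e.g. tools = {"Search": search_fn}
    return "No answer within the turn limit"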
10. Tree of Thoughts
Tree of Thoughts explores multiple reasoning paths and picks the best one. It's like brainstorming three approaches, evaluating each, then committing to the winner.
I'll be honest: I've used this in production exactly twice. Both times for complex planning tasks where the first solution path was often suboptimal. For 99% of use cases, regular chain-of-thought is sufficient.
If you're curious, the pattern is straightforward: generate three solutions, then ask the model to evaluate and pick the best one. But the 3x cost and latency means I only reach for it when the decision quality genuinely matters.
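If you want to try it anyway, here's the generate-then-evaluate version as a sketch (a simplification, not the original paper's tree search):
def tree_of_thoughts(problem: str, branches: int = 3) -> str:
    # Branch: sample several independent solution paths
    candidates = [
        llm.generate(f"{problem}\n\nPropose one distinct approach and work it through.")
        for _ in range(branches)
    ]
    numbered = "\n\n".join(f"Approach {i + 1}:\n{c}" for i, c in enumerate(candidates))
    # Evaluate: one more call to judge the branches and commit to a winner
    return llm.generate(
        f"{problem}\n\nHere are {branches} candidate solutions:\n\n{numbered}\n\n"
        "Evaluate each for correctness and feasibility, then output the best one in full."
    )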
11. Meta-Prompting
Using the model to generate or improve prompts. This sounds recursive and weird, but it's genuinely useful when you're stuck.
meta_prompt = """
I need a prompt that extracts named entities from medical records.
The output should be structured as [TYPE: VALUE].
Generate an effective prompt that includes:
1. An appropriate role/expertise
2. 2-3 examples
3. Clear formatting rules
4. Edge case handling
Generate the prompt:
"""
I use this as a starting point, never as the final product. The model generates a reasonable first draft, and then I iterate manually based on real outputs. It's particularly useful for domains where I'm not an expert — the model often suggests edge cases I wouldn't have thought of.
12. Iterative Refinement
Multiple passes to progressively improve output. Draft → refine for clarity → refine for engagement → final polish.
Useful for content generation where quality matters. Overkill for most production tasks. I mention it for completeness, but in practice, a single well-prompted call with good constraints usually gets you 90% of the way there.
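For the record, the loop is a few lines; a sketch with the passes as data:
def refine(draft_prompt: str, passes: list[str]) -> str:
    text = llm.generate(draft_prompt)
    for goal in passes:  # e.g. ["clarity", "engagement", "final polish"]
        text = llm.generate(
            f"Rewrite the following to improve {goal}. Keep the meaning and facts intact.\n\n{text}"
        )
    return text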
What I've Learned About Combining Patterns
The real skill isn't knowing individual patterns — it's knowing which ones to stack for a given problem.
My most common stacks:
For structured extraction: Few-shot + Constrained Generation + Negative Prompting
- Show examples of the format, specify the schema, list common mistakes to avoid (assembled in the sketch after this list)
For analysis tasks: Role + Context + Chain-of-Thought
- Set the expertise, provide background, ask for step-by-step reasoning
For high-stakes decisions: Context + CoT + Self-Consistency
- Full context, step-by-step reasoning, multiple samples with voting
For agent systems: Role + ReAct + Constrained Generation (for tool calls)
- See my agent guide for details
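Here's the extraction stack assembled, as referenced above. It's deliberately minimal, and the product example and field set are made up for illustration:
# Few-shot + constrained generation + negative prompting in one prompt
extraction_prompt = (
    'Extract product details in this exact JSON format: {"name": string, "price": number}\n\n'  # constrained generation
    "Example:\n"
    "Input: The AcmePad tablet retails for $299.\n"
    'Output: {"name": "AcmePad", "price": 299.0}\n\n'  # few-shot
    "DO NOT:\n"
    "- Include currency symbols in the price\n"
    "- Wrap the JSON in markdown fences\n\n"  # negative prompting
    "Input: {description}\n"
    "Output:"
)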
Don't stack more than 3-4 patterns at once. Each one adds tokens and complexity. If your prompt is longer than your expected output, you've probably over-engineered it.
The Honest Performance Table
Every tutorial gives you percentages like "40-60% improvement." Here's my honest take: these numbers are real but context-dependent. A 50% improvement on a toy benchmark might be a 10% improvement on your specific production data. Still worth it, but manage expectations.
| Pattern | My Real-World Impact | When I Use It | Cost |
|---|---|---|---|
| Constrained Generation | Huge — fixes most "unreliable output" problems | Every structured task | Free |
| Few-Shot | High — 3 examples beat 3 paragraphs of instructions | Domain-specific extraction | Minimal |
| Chain-of-Thought | High for reasoning, useless for simple tasks | Math, logic, analysis | Minimal |
| Role-Based | Medium — seasoning, not main dish | Almost always, as a prefix | Free |
| Negative Prompting | Medium — fixes specific annoying behaviors | When model keeps doing something wrong | Free |
| Prompt Chaining | High — but 3-4x cost | Complex multi-step tasks | 3-4x |
| Contextual Priming | High for recommendations, low for extraction | Advisory/recommendation tasks | Minimal |
| Self-Consistency | High for accuracy, but 5x cost | Only when accuracy is critical | 5x |
| ReAct | Essential for agents, irrelevant otherwise | Agent systems | Variable |
| Tree of Thoughts | Rarely justified outside planning | Complex planning tasks | 3x |
| Meta-Prompting | Useful as starting point, not final product | When stuck or entering new domain | 1 extra call |
| Iterative Refinement | Moderate — usually overkill | High-quality content generation | 3-4x |
The Advice I Wish I'd Gotten Earlier
Your prompt is a living document. Version control it. Track which version produces which results. When something breaks in production, you want to git diff your prompts.
The best prompt is the shortest one that works. Longer is not better. Every extra sentence is a potential source of confusion for the model. If you can get the same results with fewer words, do it.
Test on real data, not examples you made up. I've written prompts that worked perfectly on my test cases and failed on the first real input. Real data is messier, more ambiguous, and more diverse than anything you'll construct.
When the prompt isn't working, the problem might not be the prompt. Sometimes the model genuinely can't do the task. Sometimes your data is bad. Sometimes you need RAG or fine-tuning, not a better prompt. Prompt engineering has limits.
Start with Constrained Generation and Few-Shot. Add Chain-of-Thought for reasoning tasks. Layer in Role and Negative Prompting to fine-tune behavior. That's the playbook. Everything else is for specific situations.
Need help optimizing your AI system's prompts? I've tuned prompts for systems handling millions of queries. Sometimes the fix is a single line. Let's talk.
