
Build an AI Ad-Copy A/B Testing Pipeline in 30 Minutes

Claude generates variants. Promptfoo scores them. You ship the winner. End-to-end, no spreadsheets.

Most ad copy testing is theater. Three barely-different variants from one writer's brain, judged by clicks for a week, no idea which structural variable actually moved the needle. This pipeline fixes both problems.

The end state: you write a brief, the pipeline produces one structurally distinct variant per combination of your axes (60 with the example axes below), scores each automatically against your criteria, and you ship the top two for the live test.

What you need

Claude API key (Anthropic console, $5 covers months of this)
promptfoo installed (npm i -g promptfoo)
A JSON file with your brand voice rules

Step 1: Define your variant axes (5 min)

Edit axes.json:

{
  "hook_type": ["question", "stat", "pattern_interrupt", "story", "POV"],
  "length": ["short", "medium"],
  "cta_style": ["soft", "direct"],
  "emotion": ["fear_of_missing_out", "curiosity", "aspiration"]
}

Pick axes that you actually believe affect performance. Don't include axes you're not willing to act on.
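Every value you add multiplies the number of variants, so it helps to check the matrix size before running anything. A minimal sketch, assuming the axes are crossed in full; the axes are inlined here so the snippet runs on its own, and matrix_size is a hypothetical helper name, not part of promptfoo:

```python
import math

# Same axes as axes.json above, inlined so the sketch is self-contained.
AXES = {
    "hook_type": ["question", "stat", "pattern_interrupt", "story", "POV"],
    "length": ["short", "medium"],
    "cta_style": ["soft", "direct"],
    "emotion": ["fear_of_missing_out", "curiosity", "aspiration"],
}


def matrix_size(axes: dict[str, list[str]]) -> int:
    """Return how many variants a full cross of the axes produces."""
    for name, values in axes.items():
        if not values:
            raise ValueError(f"axis {name!r} has no values")
        if len(set(values)) != len(values):
            raise ValueError(f"axis {name!r} has duplicate values")
    return math.prod(len(values) for values in axes.values())


print(matrix_size(AXES))  # 5 * 2 * 2 * 3 = 60
```

Adding a third length would take the example from 60 to 90 variants, which is the practical reason to drop axes you would not act on.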

Step 2: Write the brief prompt (10 min)

brief.txt:

Generate ONE ad copy variant for {{product}} given these constraints:

Hook type: {{hook_type}}
Length: {{length}}
CTA style: {{cta_style}}
Emotion to target: {{emotion}}

Brand voice rules:
[paste from voice-rules.json]

Banned words: [list]

Output ONLY the ad copy. No commentary.
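To see what the model actually receives for one combination, you can fill the placeholders by hand. This is a rough stand-in for the templating promptfoo does for you, not its implementation: render is a hypothetical helper, the brief is shortened, and the product is written as a {{product}} placeholder on the assumption that it is passed in as a variable like the axes.

```python
import re

# Shortened stand-in for brief.txt (voice rules and banned words omitted).
BRIEF = """Generate ONE ad copy variant for {{product}} given these constraints:

Hook type: {{hook_type}}
Length: {{length}}
CTA style: {{cta_style}}
Emotion to target: {{emotion}}

Output ONLY the ad copy. No commentary."""

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Fill {{name}} placeholders; fail loudly if any name has no value."""
    needed = {match.group(1) for match in PLACEHOLDER.finditer(template)}
    missing = needed - set(variables)
    if missing:
        raise KeyError(f"no value for: {sorted(missing)}")
    return PLACEHOLDER.sub(lambda match: variables[match.group(1)], template)


prompt = render(BRIEF, {
    "product": "[your product description]",
    "hook_type": "question",
    "length": "short",
    "cta_style": "direct",
    "emotion": "curiosity",
})
print(prompt)
```

Failing on a missing variable is deliberate: a brief that silently ships with a literal {{emotion}} in it produces copy you cannot attribute to any axis.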

Step 3: Configure promptfoo eval (10 min)

promptfooconfig.yaml:

prompts:
  - file://brief.txt
providers:
  - anthropic:claude-sonnet-4-5
tests:
  - vars:
      product: "[your product description]"
    assert:
      - type: llm-rubric
        value: "Does NOT use any of these banned words: [list]"
      - type: llm-rubric
        value: "Reads like a person wrote it, not a template"
      - type: llm-rubric
        value: "Makes a specific claim, not vague benefits"
      - type: llm-rubric
        value: "CTA is under 5 words and active"
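The banned-word rule is the one criterion here that does not need a model's judgment, so a deterministic check can back up the rubric. A minimal sketch: the word list is hypothetical (the two words are the examples used later in this article), and the get_assert(output, context) entry point and its returned keys follow what promptfoo's Python assertion type is assumed to expect, so confirm both against the docs for your version before wiring it in.

```python
import re

# Hypothetical list: replace with the banned words from your own voice rules.
BANNED_WORDS = ["leverage", "unlock"]


def find_banned(text: str, banned: list[str]) -> list[str]:
    """Return the banned words that appear in text as whole words."""
    hits = []
    for word in banned:
        # \b keeps "unlock" from matching inside a longer word like "unlocked".
        if re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE):
            hits.append(word)
    return hits


def get_assert(output: str, context: dict) -> dict:
    """Assumed promptfoo entry point: pass only if no banned word is found."""
    hits = find_banned(output, BANNED_WORDS)
    return {
        "pass": not hits,
        "score": 0.0 if hits else 1.0,
        "reason": f"banned words found: {hits}" if hits else "no banned words",
    }


print(get_assert("Unlock growth today", {}))
```

Whole-word matching is a design choice: it avoids false positives but also misses inflections, so list "unlocked" or "leveraging" separately if your voice rules ban them too.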

Step 4: Run the matrix (1 min)

promptfoo eval -t axes.json

This generates one variant for every combination of axes (5 × 2 × 2 × 3 = 60 in our example), grades each against your rubric, and gives you a sortable report (run promptfoo view to open it in the browser).
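If you want to inspect the matrix before spending API calls, or your promptfoo version does not cross the axes for you (worth checking against its docs), the combinations can be expanded explicitly. A sketch under those assumptions: expand is a hypothetical helper, and the {"vars": ...} shape simply mirrors the test entries in promptfooconfig.yaml.

```python
import itertools
import json

# Same axes as axes.json, inlined so the sketch is self-contained.
AXES = {
    "hook_type": ["question", "stat", "pattern_interrupt", "story", "POV"],
    "length": ["short", "medium"],
    "cta_style": ["soft", "direct"],
    "emotion": ["fear_of_missing_out", "curiosity", "aspiration"],
}


def expand(axes: dict[str, list[str]], fixed: dict[str, str]) -> list[dict]:
    """Build one test case per combination of axis values."""
    names = list(axes)
    cases = []
    for combo in itertools.product(*(axes[name] for name in names)):
        variables = dict(zip(names, combo))
        variables.update(fixed)  # values shared by every variant
        cases.append({"vars": variables})
    return cases


cases = expand(AXES, {"product": "[your product description]"})
print(len(cases))  # 60
print(json.dumps(cases[0], indent=2))
```

Dumping cases to a file gives you a reviewable list of exactly which combinations will be generated, which also makes it easy to prune pairings that make no sense for your product.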

Step 5: Pick + ship (5 min)

Top 2 by score go into your ad platform as the live A/B test. The rest stay in a "graveyard" file for next quarter — sometimes losers become winners when the audience or offer shifts.
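The pick step is a sort and a slice. A minimal sketch, assuming you have exported the results into a list of records with a copy and a score field; that record shape is hypothetical, not promptfoo's output schema, and the scores below are placeholders to show the mechanics.

```python
def split_winners(results: list[dict], top_n: int = 2) -> tuple[list[dict], list[dict]]:
    """Return (winners, graveyard), ranked by score from highest to lowest."""
    # sorted() is stable, so tied variants keep their original order.
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return ranked[:top_n], ranked[top_n:]


# Placeholder records for illustration only.
results = [
    {"copy": "variant A", "score": 0.75},
    {"copy": "variant B", "score": 1.0},
    {"copy": "variant C", "score": 0.5},
    {"copy": "variant D", "score": 1.0},
]

winners, graveyard = split_winners(results)
print([r["copy"] for r in winners])    # ['variant B', 'variant D']
print([r["copy"] for r in graveyard])  # ['variant A', 'variant C']
```

With only four pass/fail rubric checks, ties at the top are likely, so it is worth reading the tied variants yourself and picking two that differ on the axis you most want to learn about.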

What this catches that human review doesn't

Subtle voice violations. You'd let "leverage" or "unlock" slip past at the end of a 4-hour copy session. The rubric won't. Over a year, that catches dozens of brand drift moments you'd never notice individually.
