Multi-channel • AI Marketing Automation • 6 min read

AI Lead Scoring: Prompts, Pipelines, and the Pitfalls Nobody Warns You About

LLMs are weirdly good at lead scoring. Until they're not. Here's how to set it up, and the three failure modes that catch every team off guard.

Rule-based lead scoring is brittle. You get one weight wrong (say, "title contains Director = +10") and you ship junk to sales for a month before catching it. LLMs handle the fuzzy reasoning humans use intuitively: "this lead's job title is weird, but the company profile and email signature pattern scream enterprise budget."

The catch: LLMs also confidently score garbage when you set them up wrong. Here's the path that works.

The minimal pipeline

New lead → enrich → score with Claude → assign tier → push to CRM

The score prompt is the part that matters. Everything else is plumbing.
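To make the "plumbing" concrete, here is a minimal sketch of the five steps as one function. Everything named here (`Lead`, `run_pipeline`, and the three injected callables) is hypothetical: the enrichment provider, the Claude call, and the CRM client are passed in as plain functions so you can swap in whatever you actually use.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Lead:
    email: str
    profile: dict = field(default_factory=dict)  # filled in by the enrich step
    score: Optional[int] = None
    tier: Optional[str] = None


def run_pipeline(
    lead: Lead,
    enrich: Callable[[str], dict],
    score: Callable[[dict], dict],
    push_to_crm: Callable[[Lead], None],
) -> Lead:
    """New lead -> enrich -> score -> assign tier -> push to CRM."""
    lead.profile = enrich(lead.email)   # hypothetical enrichment provider
    result = score(lead.profile)        # the LLM call lives behind this callable
    lead.score = int(result["score"])
    lead.tier = result["tier"]          # tier comes straight from the model's JSON
    push_to_crm(lead)                   # hypothetical CRM client
    return lead
```

Keeping the model call behind a callable also makes the pipeline testable with a stub scorer before you spend anything on API calls.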

The score prompt that doesn't suck

You are a B2B sales-development analyst scoring inbound leads for [company]
that sells [product] to [ICP].

Given the lead data below, return JSON with:
- score: 1-100
- tier: "hot" | "warm" | "cool" | "junk"
- reasoning: 2-3 sentences
- top_risk: the one thing that would make this lead not convert

Lead data:
[enriched profile]

Scoring rubric:
- Title fit (0-25): does this person have authority + budget for our price point?
- Company fit (0-25): is the company the right size, stage, industry?
- Intent signal (0-25): what evidence do we have they're actively shopping?
- Engagement quality (0-25): how did they reach us, and how substantive was that contact?

Be honest in `top_risk`. A great-looking lead with a tell that they're a tire-kicker should score below an okay lead with strong intent.
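Since the prompt asks for JSON, it is worth validating the reply before anything reaches the CRM. The sketch below checks only the four fields the prompt requests; the function name `parse_score_response` is an assumption, and the reply is assumed to be a bare JSON object with no surrounding text.

```python
import json

VALID_TIERS = {"hot", "warm", "cool", "junk"}


def parse_score_response(raw: str) -> dict:
    """Parse the model's JSON reply and reject anything off-contract."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    score = data.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 100:
        raise ValueError(f"score must be an integer from 1 to 100, got {score!r}")

    if data.get("tier") not in VALID_TIERS:
        raise ValueError(f"unknown tier: {data.get('tier')!r}")

    for key in ("reasoning", "top_risk"):
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{key} must be a non-empty string")

    return data
```

A rejected reply is a good candidate for one retry and then a human queue, rather than a silent default score.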

Pitfall #1: The model loves your leads too much

Left to its defaults, an LLM looks for reasons to rate everything favorably. After two weeks, you may notice that 70% of your leads are scoring "warm" or higher. That's a calibration problem: the scores are inflated, not the leads.
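You can spot this early instead of waiting two weeks. A minimal sketch, assuming you keep a list of the tiers the model has assigned so far (the function name is hypothetical):

```python
from collections import Counter


def warm_or_higher_share(tiers: list[str]) -> float:
    """Fraction of scored leads that landed in the 'hot' or 'warm' tier."""
    if not tiers:
        return 0.0
    counts = Counter(tiers)
    return (counts["hot"] + counts["warm"]) / len(tiers)
```

What counts as "too high" depends on your own historical conversion mix; the 70% figure above is a symptom to watch for, not a universal threshold.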

Fix: ground the prompt with examples. Append:

Reference cases (scored examples from your past data):
- [paste 3 past hot leads that closed]
- [paste 3 past warm leads that needed nurture]
- [paste 3 past junk leads that wasted SDR time]

The few-shot examples pull the model's calibration toward your actual reality.
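If your past leads live in a spreadsheet or CRM export, the reference block can be generated rather than pasted by hand. This is a sketch under assumptions: each case is a dict with hypothetical keys `outcome`, `score`, `tier`, and `profile`, and the output is plain text to append to the score prompt.

```python
import json


def build_reference_block(cases: list[dict]) -> str:
    """Render past leads as a 'Reference cases' block for the score prompt."""
    lines = ["Reference cases:"]
    for case in cases:
        lines.append(
            f"- outcome: {case['outcome']}; score: {case['score']}; tier: {case['tier']}"
        )
        # sort_keys keeps the rendered profile stable between runs
        lines.append(f"  lead data: {json.dumps(case['profile'], sort_keys=True)}")
    return "\n".join(lines)
```

Generating the block from data also makes it easy to refresh the examples later without hand-editing the prompt.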

Pitfall #2: Hallucinated signals

The model will sometimes invent things it "saw" in the data. "Their funding round suggests they're scaling" — but no funding data was in the input. This is hallucination dressed as inference.

Fix: require the model to quote the source data for each scoring dimension. If it can't quote, it can't score that dimension.
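The quote requirement is only useful if you check it. The sketch below assumes you extend the JSON output with a hypothetical `evidence` object that maps each rubric dimension to `{"points": ..., "quote": ...}`; that field and the dimension keys are not part of the prompt above. Any dimension whose quote does not appear verbatim in the lead data (ignoring case and whitespace) is zeroed.

```python
DIMENSIONS = ("title_fit", "company_fit", "intent_signal", "engagement_quality")


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences match."""
    return " ".join(text.lower().split())


def enforce_quotes(evidence: dict, lead_text: str) -> dict:
    """Zero out any dimension whose quote is missing or absent from the input."""
    haystack = _normalize(lead_text)
    checked = {}
    for dim in DIMENSIONS:
        entry = evidence.get(dim, {})
        quote = _normalize(str(entry.get("quote", "")))
        grounded = bool(quote) and quote in haystack
        # clamp to the rubric's 0-25 range; ungrounded dimensions get 0
        points = max(0, min(25, int(entry.get("points", 0)))) if grounded else 0
        checked[dim] = {"points": points, "grounded": grounded}
    return checked
```

Substring matching is deliberately strict: a paraphrased "quote" fails, which is the point when the goal is to catch invented funding rounds.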

Pitfall #3: The model gets worse over time (silently)

Three months in, score quality drifts. Could be the model version updated. Could be your audience changed. Could be that your ideal-customer definition changed and nobody updated the prompt.

Fix: run a weekly drift check. Sample 20 random leads from the past week, have a human re-score them, and compare the results to the LLM's scores. If the average gap exceeds 10 points, recalibrate, which here means updating the few-shot examples in the prompt.
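A sketch of that weekly check, assuming "average gap" means the mean absolute difference between human and LLM scores. The function name, the `(lead_id, llm_score)` tuple shape, and the `human_score` callable are all hypothetical stand-ins for however you store scores and collect reviews.

```python
import random
from statistics import mean


def drift_check(leads, human_score, sample_size=20, threshold=10.0, seed=None):
    """Compare LLM scores to human re-scores on a random sample.

    leads: list of (lead_id, llm_score) from the past week.
    human_score: callable mapping lead_id -> the human's score.
    """
    rng = random.Random(seed)
    sample = rng.sample(leads, min(sample_size, len(leads)))
    gaps = [abs(llm - human_score(lead_id)) for lead_id, llm in sample]
    avg_gap = mean(gaps) if gaps else 0.0
    return {
        "sampled": len(sample),
        "avg_gap": avg_gap,
        "needs_update": avg_gap > threshold,  # time to refresh the few-shot examples
    }
```

Using the absolute gap means over-scoring and under-scoring do not cancel each other out; if you also want to know the direction of the drift, track the signed mean alongside it.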

What ships better than this

Honestly, nothing if your lead volume is under 500/month. The pipeline costs about 8 cents per lead in API calls and 30 minutes to set up. If it cuts your SDR's time on bad leads by 30%, the math works for any team doing more than a hundred leads a week.
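To sanity-check that math for your own team, here is a small sketch. The 8-cent figure comes from the paragraph above; the SDR hourly cost is a placeholder you need to supply, and both function names are hypothetical.

```python
COST_PER_LEAD_USD = 0.08  # per-lead API cost quoted above


def monthly_api_cost(leads_per_week: float, cost_per_lead: float = COST_PER_LEAD_USD) -> float:
    """Approximate monthly API spend, using 52 weeks / 12 months."""
    leads_per_month = leads_per_week * 52 / 12
    return leads_per_month * cost_per_lead


def breakeven_hours_saved(api_cost_usd: float, sdr_hourly_cost_usd: float) -> float:
    """SDR hours per month the pipeline must save to pay for its API calls."""
    return api_cost_usd / sdr_hourly_cost_usd
```

At a hundred leads a week, `monthly_api_cost(100)` comes to roughly $34.67 a month, so the break-even is whatever fraction of an SDR hour that amount buys at your loaded rate.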
