How to Build an Intelligent Account Scoring Model That Actually Works

Most scoring models break under real-world complexity. Build a dynamic model combining firmographic, behavioral, and intent signals to rank accounts.

Quick Answer: What Is an Account Scoring Model?

An account scoring model assigns a numerical value to each company based on how likely it is to become a customer. You build one by scoring firmographic data (revenue brackets, employee count, industry), technographic fit, engagement signals, and intent data, then combining them with a weighted formula. This article gives you the actual code.

What Is an Account Scoring Model?

An account scoring model is a system that ranks companies by their likelihood to buy your product. It takes raw data about a company -- revenue, headcount, industry, tech stack, how they interact with your brand -- and outputs a number that tells your sales team where to spend their time.

If you have worked in B2B, you have seen the alternative: reps cherry-picking accounts based on gut feel, or marketing dumping thousands of "qualified" leads into a CRM with no prioritization. Account scoring replaces that with a repeatable algorithm.

The difference between account scoring and lead scoring matters. Lead scoring evaluates individual people. Account scoring evaluates entire companies. In B2B, where buying committees of 6-10 stakeholders make purchase decisions, the company-level view is more predictive. Five engaged contacts at a well-fit company are worth more than fifty scattered leads at random organizations.
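To make the distinction concrete, here is a minimal sketch that rolls contact-level activity up to the account level by counting distinct engaged contacts per company. The field names (account_id, contact_id, engaged) are hypothetical; map them to whatever your CRM exports.

```python
from collections import defaultdict


def roll_up_contacts(contacts: list[dict]) -> dict[str, int]:
    """Count distinct engaged contacts per account.

    Assumes each contact dict has hypothetical keys 'account_id',
    'contact_id', and 'engaged' (bool).
    """
    engaged_by_account: dict[str, set] = defaultdict(set)
    for c in contacts:
        if c.get("engaged"):
            # A set deduplicates repeat activity from the same person
            engaged_by_account[c["account_id"]].add(c["contact_id"])
    return {acct: len(ids) for acct, ids in engaged_by_account.items()}


contacts = [
    {"account_id": "acme", "contact_id": "c1", "engaged": True},
    {"account_id": "acme", "contact_id": "c2", "engaged": True},
    {"account_id": "acme", "contact_id": "c2", "engaged": True},  # repeat activity
    {"account_id": "globex", "contact_id": "c3", "engaged": False},
]
print(roll_up_contacts(contacts))  # {'acme': 2}
```

Counting distinct people rather than raw events is what separates "several stakeholders are looking" from "one person clicked a lot".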

The rest of this article gives you the concrete algorithm. We will start with revenue bracket scoring (the most common firmographic signal), build up to a full weighted model in Python, show you how to calibrate weights against real data, and then cover what to do when the static model stops working.

The Revenue Bracket Scoring Algorithm

Revenue is the single most common firmographic signal in account scoring. The logic is simple: companies in certain revenue ranges are more likely to buy your product. A $5M ARR startup has different needs (and budgets) than a $500M enterprise.

The first step is defining your revenue brackets. These should come from your actual closed-won data, not guesswork. Pull your last 200 closed deals and look at the revenue distribution.

Revenue Bracket Scoring in SQL

If your account data lives in a database or warehouse, here is the SQL to assign bracket scores. This is a pattern you can plug directly into a dbt model or a CRM workflow.

-- Revenue bracket scoring algorithm
-- Adjust brackets and scores to match your ICP

SELECT
    account_id,
    account_name,
    annual_revenue,
    CASE
        WHEN annual_revenue IS NULL        THEN 20   -- Unknown: low default
        WHEN annual_revenue < 1000000      THEN 15   -- Under $1M: too small
        WHEN annual_revenue < 10000000     THEN 55   -- $1M-$10M: emerging fit
        WHEN annual_revenue < 50000000     THEN 90   -- $10M-$50M: sweet spot
        WHEN annual_revenue < 200000000    THEN 100  -- $50M-$200M: ideal
        WHEN annual_revenue < 1000000000   THEN 75   -- $200M-$1B: good, longer cycle
        ELSE                                    40   -- $1B+: enterprise, diff motion
    END AS revenue_score,
    CASE
        WHEN annual_revenue IS NULL        THEN 'unknown'
        WHEN annual_revenue < 1000000      THEN 'smb'
        WHEN annual_revenue < 10000000     THEN 'lower_mid_market'
        WHEN annual_revenue < 50000000     THEN 'upper_mid_market'
        WHEN annual_revenue < 200000000    THEN 'mid_enterprise'
        WHEN annual_revenue < 1000000000   THEN 'enterprise'
        ELSE                                    'large_enterprise'
    END AS revenue_tier
FROM accounts
ORDER BY revenue_score DESC;

Two things to notice. First, null handling matters. Revenue data is frequently missing, especially for private companies. Assigning a low default score (rather than zero) keeps these accounts in the funnel without over-prioritizing them. Second, the score curve is not linear. Your "sweet spot" bracket gets the highest score, and the score drops off on both ends. A company that is too small cannot afford you; a company that is too large may need an entirely different sales motion.

Revenue Bracket Scoring in Python

If you are building scoring in a Python pipeline (common with Clay workflows or custom enrichment scripts), here is the equivalent function.

def score_revenue_bracket(annual_revenue: float | None) -> dict:
    """Score an account based on annual revenue bracket.

    Returns dict with score (0-100), tier label, and reasoning.
    Adjust BRACKETS to match your ICP's revenue sweet spot.
    """

    # Define brackets: (max_revenue, score, tier_name)
    # Order matters -- first match wins
    BRACKETS = [
        (1_000_000,       15,  "smb"),
        (10_000_000,      55,  "lower_mid_market"),
        (50_000_000,      90,  "upper_mid_market"),
        (200_000_000,     100, "mid_enterprise"),
        (1_000_000_000,   75,  "enterprise"),
        (float("inf"),     40,  "large_enterprise"),
    ]

    if annual_revenue is None:
        return {
            "score": 20,
            "tier": "unknown",
            "reasoning": "Revenue data unavailable. Default low score."
        }

    for max_rev, score, tier in BRACKETS:
        if annual_revenue < max_rev:
            return {
                "score": score,
                "tier": tier,
                "reasoning": f"Revenue ${annual_revenue:,.0f} falls in {tier} bracket."
            }

    # Only reachable for NaN, which fails every < comparison
    return {"score": 20, "tier": "unknown", "reasoning": "Revenue could not be bracketed. Default low score."}


# Usage
result = score_revenue_bracket(35_000_000)
# {'score': 90, 'tier': 'upper_mid_market', 'reasoning': 'Revenue $35,000,000 falls in upper_mid_market bracket.'}

Deriving Your Brackets from Data

Do not copy these brackets blindly. Pull your closed deals (won and lost) from the last 12 months, bucket them by revenue, and calculate win rate per bucket. Your brackets should reflect where you actually win, not where you wish you did. If the $10M-$50M range has your highest win rate, that is your sweet spot, even if another range produces more total wins simply because you run more opportunities there.

Building the Full Account Scoring Model

Revenue brackets are one signal. A complete account scoring model combines firmographic fit, technographic compatibility, engagement behavior, and intent data into a single weighted score. Here is a reference implementation you can adapt.

Firmographic Scoring Function

This function scores the static attributes of a company: revenue, employee count, and industry. These three factors together form the firmographic foundation of your model.

def score_firmographics(account: dict) -> dict:
    """Score firmographic fit (0-100).

    Combines revenue bracket, employee count, and industry.
    """
    # Revenue score (reuse the function from above)
    rev = score_revenue_bracket(account.get("annual_revenue"))

    # Employee count score
    emp = account.get("employee_count")
    if emp is None:
        emp_score = 20
    elif emp < 50:
        emp_score = 15
    elif emp < 200:
        emp_score = 60
    elif emp < 1000:
        emp_score = 95
    elif emp < 5000:
        emp_score = 80
    else:
        emp_score = 50

    # Industry score -- your ICP verticals get the highest scores
    ICP_INDUSTRIES = {
        "saas": 100, "fintech": 95, "cybersecurity": 90,
        "healthtech": 80, "ecommerce": 70, "martech": 85,
    }
    industry = (account.get("industry") or "").strip().lower()  # tolerate missing or None
    ind_score = ICP_INDUSTRIES.get(industry, 30)  # Default for non-ICP

    # Weighted combination within firmographics
    firmo_score = (rev["score"] * 0.45) + (emp_score * 0.30) + (ind_score * 0.25)

    return {
        "score": round(firmo_score, 1),
        "revenue": rev,
        "employee_score": emp_score,
        "industry_score": ind_score,
    }

Engagement Scoring with Time Decay

Engagement signals are inherently time-sensitive. A pricing page visit last week is a fundamentally different signal than one six months ago. The algorithm needs a decay function to prevent stale activity from inflating scores.

from datetime import datetime, timezone
import math

def score_engagement(events: list[dict], now: datetime | None = None) -> dict:
    """Score engagement signals with exponential time decay.

    Each event has a type and a timezone-aware timestamp; its base weight
    comes from EVENT_WEIGHTS. Decay halves the value every 14 days.
    """
    now = now or datetime.now(timezone.utc)  # event timestamps must be tz-aware UTC
    HALF_LIFE_DAYS = 14

    # Base points by event type
    EVENT_WEIGHTS = {
        "pricing_page_view":  25,
        "demo_request":        50,
        "webinar_attended":    20,
        "email_replied":       30,
        "content_download":    15,
        "email_opened":        5,
        "page_view":           3,
    }

    total_decayed = 0
    for event in events:
        base = EVENT_WEIGHTS.get(event["type"], 1)
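        # Fractional days; clamp so future-dated events cannot exceed full value
        days_ago = max(0.0, (now - event["timestamp"]).total_seconds() / 86400)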
        # Exponential decay: value = base * (0.5 ^ (days / half_life))
        decay_factor = math.pow(0.5, days_ago / HALF_LIFE_DAYS)
        total_decayed += base * decay_factor

    # Normalize to 0-100 scale (cap at 200 raw points = 100 score)
    score = min(100, (total_decayed / 200) * 100)

    return {
        "score": round(score, 1),
        "raw_decayed_points": round(total_decayed, 1),
        "event_count": len(events),
    }
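To see what the 14-day half-life does to a single event, here is the decay term from score_engagement pulled into a standalone helper (the name decayed_points is hypothetical). A 25-point pricing page view is worth 12.5 points after 14 days and 6.25 after 28.

```python
import math


def decayed_points(base: float, days_ago: float, half_life_days: float = 14) -> float:
    """Exponential decay: base * 0.5 ** (days_ago / half_life_days)."""
    return base * math.pow(0.5, max(0.0, days_ago) / half_life_days)


# A 25-point pricing page view at different ages
for days in (0, 7, 14, 28, 56):
    print(days, round(decayed_points(25, days), 2))
```

The half-life is the single knob here: a shorter one makes the score react faster to new activity but also forget it faster.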

The Composite Scoring Function

Now combine everything into a single account score. The weighted formula is straightforward: normalize each component to 0-100, apply weights, sum them.

def score_account(account: dict, events: list[dict]) -> dict:
    """Calculate composite account score.

    Weights should be calibrated against your closed-won data.
    These defaults are a reasonable starting point.
    """
    WEIGHTS = {
        "firmographic":  0.45,  # Largest signal for most B2B
        "engagement":    0.25,  # Active interest indicator
        "technographic": 0.20,  # Stack compatibility
        "intent":        0.10,  # Third-party signals
    }

    # Score each component
    firmo = score_firmographics(account)
    engagement = score_engagement(events)
    techno_score = account.get("technographic_score", 50)  # neutral default when no stack data
    intent_score = account.get("intent_score", 30)          # low default when no intent data

    # Weighted composite
    composite = (
        firmo["score"]   * WEIGHTS["firmographic"] +
        engagement["score"] * WEIGHTS["engagement"] +
        techno_score       * WEIGHTS["technographic"] +
        intent_score       * WEIGHTS["intent"]
    )

    # Assign priority tier
    if composite >= 80:
        tier = "hot"
    elif composite >= 60:
        tier = "warm"
    elif composite >= 40:
        tier = "nurture"
    else:
        tier = "cold"

    return {
        "composite_score": round(composite, 1),
        "tier": tier,
        "components": {
            "firmographic": firmo["score"],
            "engagement": engagement["score"],
            "technographic": techno_score,
            "intent": intent_score,
        },
        "weights": WEIGHTS,
    }
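When you recalibrate, it is easy to change one weight and forget to rebalance the others, which quietly pushes the composite off the 0-100 scale the tier thresholds assume. A minimal sketch of a guarded weighted sum with the same thresholds as score_account (the helper names weighted_composite and tier_for are hypothetical):

```python
import math

TIERS = [(80, "hot"), (60, "warm"), (40, "nurture")]  # checked top-down


def weighted_composite(components: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of 0-100 component scores; weights must sum to 1."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"Weights sum to {total}, expected 1.0")
    missing = weights.keys() - components.keys()
    if missing:
        raise KeyError(f"Missing component scores: {sorted(missing)}")
    return sum(components[name] * w for name, w in weights.items())


def tier_for(score: float) -> str:
    """Return the first tier whose threshold the score meets."""
    for threshold, tier in TIERS:
        if score >= threshold:
            return tier
    return "cold"
```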

Worked Example

Here is what the output looks like for a sample account running through this model.

Example: TechCorp Inc -- Score: 82.2/100

Firmographic (94.0 x 0.45 = 42.3) -- $35M revenue (upper_mid_market, 90), 420 employees (95), SaaS industry (100). Strong ICP fit across all three firmographic dimensions.
Engagement (68.0 x 0.25 = 17.0) -- Recent activity includes 3 pricing page views in the last 10 days, one email reply, and a case study download. Because the activity is recent, time decay has removed little of its value.
Technographic (80 x 0.20 = 16.0) -- Uses Salesforce (direct integration), no competing product installed.
Intent (69 x 0.10 = 6.9) -- Moderate research activity on G2 in the relevant category.

Tier: Hot. But the score alone does not tell you who at TechCorp is driving the research, why they are looking now, or which product of yours is the best fit. We will come back to that gap.

How to Calibrate Your Scoring Weights

The algorithm above works out of the box, but the default weights are generic. The difference between a mediocre scoring model and one that actually predicts conversion is calibration against your own data.

The process is straightforward: calculate win rate by factor, then adjust weights so that the factors most correlated with winning get the most weight.

SQL: Win Rate by Revenue Bracket

-- Win rate by revenue bracket (closed opportunities, last 12 months)
-- Stage names are examples; match them to your CRM
SELECT
    CASE
        WHEN a.annual_revenue IS NULL      THEN 'Unknown'
        WHEN a.annual_revenue < 1000000    THEN 'Under $1M'
        WHEN a.annual_revenue < 10000000   THEN '$1M-$10M'
        WHEN a.annual_revenue < 50000000   THEN '$10M-$50M'
        WHEN a.annual_revenue < 200000000  THEN '$50M-$200M'
        ELSE '$200M+'
    END AS revenue_bracket,
    COUNT(*) AS closed_opps,
    SUM(CASE WHEN o.stage = 'closed_won' THEN 1 ELSE 0 END) AS wins,
    ROUND(
        SUM(CASE WHEN o.stage = 'closed_won' THEN 1 ELSE 0 END) * 100.0
        / NULLIF(COUNT(*), 0)
    , 1) AS win_rate_pct
FROM opportunities o
JOIN accounts a ON o.account_id = a.account_id
WHERE o.created_at >= CURRENT_DATE - INTERVAL '12 months'
  -- Open deals have no outcome yet; counting them would deflate the win rate
  AND o.stage IN ('closed_won', 'closed_lost')
GROUP BY 1
ORDER BY win_rate_pct DESC;

Run the same analysis for employee count, industry, and each engagement signal. The results will show where your current weights are off. If your $10M-$50M bracket wins at 32% but your $50M-$200M bracket wins at only 18%, your revenue curve should score $10M-$50M higher, even though the default brackets above give $50M-$200M the top score. The data sets the curve, not the other way around.

Sample Size Matters

As a rule of thumb, aim for at least 30 closed opportunities per bracket before treating its win rate as meaningful; with fewer, a couple of deals either way swing the percentage sharply. If you are pre-scale with fewer than 200 total opportunities, start with the default weights and recalibrate after two quarters of data. Premature optimization of scoring weights is a common trap.
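One way to see why small brackets mislead is to put a confidence interval around each win rate. This sketch uses the Wilson score interval, a standard interval for proportions; it is an illustration of sample-size uncertainty, not part of the scoring model itself. At the same 30% win rate, 6 wins out of 20 gives roughly 15% to 52%, while 60 out of 200 narrows to roughly 24% to 37%.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a win rate, as fractions."""
    if n == 0:
        return (0.0, 1.0)  # no data: the win rate could be anything
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)


print(wilson_interval(6, 20))
print(wilson_interval(60, 200))
```

If two brackets' intervals overlap heavily, the data does not yet justify scoring them differently.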

Adjusting Weights Based on Correlation

Once you have win rates by factor, the calibration logic is simple. Factors whose win rate varies more between their "good" and "bad" values deserve more weight. If win rates across industries span 5% to 35% while win rates across employee-count bands span only 15% to 25%, industry separates winners from losers three times as sharply and should carry more weight in the formula.
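A naive way to turn that idea into numbers is to measure each factor's spread (highest minus lowest win rate across its values) and scale the spreads so they sum to 1. This is a hypothetical heuristic for a first pass, not a fitted model: it ignores correlation between factors and sample size, so treat the output as a starting point for the overlap check that follows.

```python
def weights_from_spread(win_rates: dict[str, dict[str, float]]) -> dict[str, float]:
    """Turn per-factor win-rate spreads into weights that sum to 1.

    win_rates maps factor -> {value_label: win_rate}. A factor's spread is
    max minus min win rate across its values.
    """
    spreads = {f: max(r.values()) - min(r.values()) for f, r in win_rates.items()}
    total = sum(spreads.values())
    if total == 0:
        # No factor separates wins from losses; fall back to equal weights
        return {f: 1 / len(spreads) for f in spreads}
    return {f: s / total for f, s in spreads.items()}


print(weights_from_spread({
    "industry": {"saas": 0.35, "other": 0.05},
    "employees": {"50-199": 0.15, "200-999": 0.25},
}))
```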

A practical approach: score your closed-won deals and closed-lost deals through the model. If the model cannot distinguish between the two groups, the weights are wrong. Iterate until the score distributions for wins and losses have minimal overlap.
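One way to put a number on "minimal overlap" is the area under the ROC curve, which equals the probability that a randomly chosen win scores higher than a randomly chosen loss. This dependency-free pairwise version is a minimal sketch; scikit-learn's roc_auc_score computes the same quantity more efficiently on large datasets.

```python
def separation_auc(win_scores: list[float], loss_scores: list[float]) -> float:
    """Probability that a random won deal outscores a random lost deal.

    1.0 = perfect separation, 0.5 = no better than chance. Ties count as half.
    O(n * m), which is fine for a few thousand deals.
    """
    if not win_scores or not loss_scores:
        raise ValueError("Need at least one win and one loss")
    favorable = 0.0
    for won in win_scores:
        for lost in loss_scores:
            if won > lost:
                favorable += 1
            elif won == lost:
                favorable += 0.5
    return favorable / (len(win_scores) * len(loss_scores))
```

Track this number each time you change weights: if it drifts toward 0.5, the model is losing its ability to tell wins from losses.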

Revisit this analysis quarterly. Your ICP shifts as you grow, and weights that were accurate six months ago may not reflect your current market position.

Where Static Scoring Breaks Down

The model above is solid for prioritization. It will outperform gut-feel account selection and it will give your SDRs a ranked list they can work from. But after running it for 12-18 months, most teams hit the same ceiling.

Three failure modes show up repeatedly.

Context collapse

Company A and Company B both score 78 and get the same treatment. But Company A is in your core vertical with a champion you met at a conference, while Company B is in a stretch industry with no existing relationship. The score collapses into a single number the rich context that determines how you should actually engage.

Your best reps work around the score. They look at the account, do their own research, and make a judgment call. Your less experienced reps follow the score blindly and get worse results. A scoring model is only as useful as the context it keeps, and a single number keeps very little.

Tribal knowledge stays outside the model

Your best SDRs know which verticals are hot right now, which competitor situations are winnable, and which buyer personas convert fastest. This knowledge lives in their heads. It never makes it into the CASE statements or weight configurations.

As your persona-based targeting becomes more sophisticated, the gap between what the model knows and what the team knows widens. The scoring algorithm cannot encode "we just hired an AE who came from their industry and has relationships there."

Static weights in a dynamic market

You calibrate weights based on last quarter's data. A new competitor launches. A product release changes your ICP. A macroeconomic shift makes one vertical suddenly more budget-conscious. Your weights are now wrong, but the model still returns confident-looking numbers.

Even teams that commit to quarterly recalibration find the weights drift faster than they can update them. The market is continuous; your calibration cycle is discrete. That gap is structural, not operational.

The Real Limitation

The fundamental problem with static scoring is not the math. The math is fine. The problem is that a single number cannot carry enough information for a rep to take the right action. A score tells you which accounts to prioritize. It cannot tell you why they are a fit or how to approach them.

From Static Scores to AI-Powered Qualification

This is where the scoring conversation shifts. If your static model is working well enough for basic prioritization, keep it. But when you need the why behind the score -- the reasoning that tells a rep how to engage an account, not just whether to -- you need a different kind of system.

AI-powered qualification agents evaluate accounts against your actual ICP definition, product criteria, and competitive positioning. Instead of returning just a number, they return a score plus the reasoning behind it.

How Qualification Agents Work

Octave provides two qualification agents that operate this way: the Qualify Company Agent and the Qualify Person Agent.

The Qualify Company Agent evaluates a company against one or more of your products to determine fit. You define "good fit" and "bad fit" qualifying questions in your Octave Library (for example: "Does the company sell B2B software?" or "Is the company in a highly regulated industry where our compliance features matter?"). The agent researches the company and answers each question with a yes/no determination, a rationale, and a confidence level (LOW, MEDIUM, or HIGH).

The output includes an overall score, an overall rationale explaining the score, and individual answers to each qualifying question. This is a meaningfully different output than a static score. When a rep sees "Score: 82 -- Strong fit. They are a mid-market SaaS company using Salesforce (we integrate natively). Answered YES with HIGH confidence to 4 of 5 good-fit questions. One concern: they appear to have a small sales team, which may limit expansion potential" -- they know exactly what to do with that account.

The Qualify Person Agent does the same thing at the individual contact level, evaluating a person against both a product and a persona. The output includes product qualification, persona fit, and segment alignment, each with their own scores and reasoning.

Calling the API

Both agents are accessible via API, which means you can integrate them into Clay workflows, custom pipelines, or any orchestration tool. Here is what a call to the Qualify Company Agent looks like.

import requests

def qualify_company(company_domain: str, agent_id: str, api_key: str,
                    runtime_context: dict | None = None) -> dict:
    """Qualify a company using Octave's Qualify Company Agent.

    Args:
        company_domain: e.g. "techcorp.com"
        agent_id: Your agent OId from the Octave dashboard
        api_key: Your Octave API key
        runtime_context: Optional dict of known data to pass
            (e.g., {"employee_count": 420, "tech_stack": ["salesforce"]})
    """
    response = requests.post(
        "https://app.octavehq.com/api/v2/agents/qualify-company/run",
        headers={"api_key": api_key, "Content-Type": "application/json"},
        json={
            "agentOId": agent_id,
            "companyDomain": company_domain,
            "runtimeContext": runtime_context,
            "includeFullAnnotation": False,
        },
        timeout=120,  # seconds; adjust to how long your agent runs typically take
    )
    response.raise_for_status()
    return response.json()


# Example usage
result = qualify_company(
    company_domain="techcorp.com",
    agent_id="your_agent_oid",
    api_key="your_api_key",
    runtime_context={"employee_count": 420}
)

# Response includes:
# - Overall score + rationale
# - Product qualification (answers to good/bad fit questions)
# - Segment qualification
# - Confidence levels (LOW, MEDIUM, HIGH) per answer
# - Disqualifier summary (if any instant disqualifiers triggered)

Runtime Context Tip

If you have specific quantitative data about a company (employee count, revenue, tech stack) from an enrichment step, pass it as runtimeContext. This prevents the agent from having to infer those values from public sources. For questions like "Does the company have more than 100 employees?", the agent will use your runtime context value for a precise answer rather than an estimate.

When to Use Each Approach

Static scoring and AI qualification are not mutually exclusive. They solve different problems at different points in your funnel.

Use Case	Static Scoring	AI Qualification
Bulk prioritization of 10K+ accounts	Fast, cheap, good enough	Slower, higher cost per account
SDR outreach prep	Tells them which account to call	Tells them what to say and why
ICP drift detection	Requires manual recalibration	Adapts as you update Library criteria
Multi-product qualification	Needs separate models per product	One agent, multiple product evaluations
Rep trust and adoption	"The score says 75" (opaque)	"Here is why they fit" (transparent)

A practical setup: use your static scoring model as the first filter to identify which accounts are worth qualifying, then run the Qualify Company Agent on the top tier. This keeps API costs proportional to the accounts that matter while giving reps the context they need on priority accounts.
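That two-stage setup can be sketched as a small function that shortlists accounts by static score and calls an expensive qualifier only on the shortlist. The qualifier is passed in as any callable that takes a domain; the keys domain and composite_score and the function name qualify_top_accounts are hypothetical.

```python
from typing import Callable


def qualify_top_accounts(
    scored_accounts: list[dict],
    qualify: Callable[[str], dict],
    min_score: float = 80.0,
    max_calls: int = 100,
) -> dict[str, dict]:
    """Run an expensive qualifier only on the highest-scoring accounts."""
    shortlist = sorted(
        (a for a in scored_accounts if a["composite_score"] >= min_score),
        key=lambda a: a["composite_score"],
        reverse=True,
    )[:max_calls]  # cap API spend per run
    return {a["domain"]: qualify(a["domain"]) for a in shortlist}
```

With the qualify_company function above, functools.partial(qualify_company, agent_id="your_agent_oid", api_key="your_api_key") produces a callable of the right shape. Keeping the qualifier injectable also lets you test the filtering logic with a stub before spending any API calls.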

Frequently Asked Questions

How do you build a scoring algorithm based on company revenue brackets?

Define revenue tiers that match your ICP (for example: under $1M, $1M-$10M, $10M-$50M, $50M-$200M, $200M+), assign each tier a score from 0-100 based on your historical win rates per bracket, then use a SQL CASE statement or Python conditional to map each account to its bracket score. Combine the revenue score with employee count, industry, and other signals using a weighted formula. The code examples in this article are starting points you can adapt to your own brackets and data.

What is the difference between lead scoring and account scoring?

Lead scoring evaluates individual contacts. Account scoring evaluates entire companies. In B2B, where buying committees of 6-10 stakeholders make purchase decisions, account scoring is more predictive of deal potential. Most mature GTM teams use both: account scoring for prioritization, lead scoring for routing contacts within prioritized accounts.

How often should I recalibrate my scoring weights?

Quarterly at minimum. Run the win-rate-by-factor SQL queries shown in this article and compare the output to your current weight configuration. Major recalibration should happen after product launches that change your ICP, when entering new markets, or when you see score distributions for wins and losses converging (meaning the model is losing discriminative power).

What is a good score threshold for SDR follow-up?

It depends on your funnel capacity. If SDRs are capacity-constrained, set the threshold higher (70+). If you need top-of-funnel volume, lower it (50+). The right threshold balances coverage with conversion rate. Monitor both metrics and adjust weekly until you find the equilibrium where SDRs are neither starved nor overwhelmed.
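If capacity is the binding constraint, one simple way to set the threshold is to work backward from how many accounts your SDRs can work in a period. A sketch, assuming you have current composite scores in a list; the function name and the capacity unit are hypothetical.

```python
def threshold_for_capacity(scores: list[float], capacity: int) -> float:
    """Lowest score that still fits within SDR capacity.

    Returns the score of the Nth-highest account, so about `capacity`
    accounts land at or above it (ties can push the count slightly over).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    ranked = sorted(scores, reverse=True)
    if len(ranked) <= capacity:
        # Everyone fits; the threshold is the lowest score present
        return min(ranked) if ranked else 0.0
    return ranked[capacity - 1]
```

Recompute it each week alongside conversion rate; if conversion holds as the threshold drops, the team can absorb more volume.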

Should I use machine learning for account scoring?

ML can improve accuracy if you have enough closed-won data (500+ deals minimum) and clean input features. The tradeoff is explainability: a gradient-boosted model may score more accurately, but your reps cannot see why an account scored high or low, which hurts adoption. A practical middle ground is to use a rules-based model for transparency and layer AI qualification agents on top for contextual reasoning about why an account fits.

What data sources do I need for account scoring?

At minimum: CRM data (company name, industry, employee count, revenue) and your own engagement data (website visits, email interactions). To improve accuracy, add technographic data from providers like BuiltWith or SimilarTech, and third-party intent signals from Bombora or G2 Buyer Intent. The more signals you layer in, the more discriminative the model becomes -- but start with what you have and add sources incrementally.
