Measuring the True Incrementality of Personalization: Why Feature-Level Testing Isn’t Enough

By Nick Bade

Executives don’t just want to know what’s working—they want to know how much it’s worth. This framework helps you measure personalization’s total business impact with confidence.

Personalization has come a long way.

Modern brands have moved far beyond [first-name] in subject lines into dynamic, real-time product recommendations and adaptive experiences powered by machine-learning models. Today’s leaders deliver deeply embedded personalization across every channel—web, app, email, SMS, and push—based on browsing behavior, purchase patterns, and predicted intent.

But while personalization strategy and activation have advanced dramatically, measurement has often lagged behind.

Feature-level A/B tests remain the dominant tool for measurement and attribution. These tests are useful and necessary—but they do not answer the most important question for executives, shareholders, and P&L owners:

What is the real, total incremental revenue attributable to personalization?

The ability to quantify that number—accurately and with analytical rigor—transforms personalization from a tactical enhancement into a proven enterprise growth strategy.

Why A/B Testing Isn’t Enough

A/B testing continues to be a foundational part of any experimentation program. It’s how we test hypotheses, validate new experiences, and incrementally improve performance. But A/B tests are limited in what they can tell us:

  • They typically isolate individual features without accounting for interaction effects. When features stack, their combined impact is often less (or more) than the sum of their parts.
  • Most tests are run over short time windows, making them vulnerable to seasonality, limited sample sizes, and noise.
  • They measure lift at a moment in time—not over a lifecycle, cohort, or customer journey.
  • Wins are often overstated due to targeting bias or sampling artifacts.
  • And most importantly, they don't measure the true volume of revenue the entire personalization program is generating.

Executives want to understand the business value of personalization. Tactical tests alone can’t answer that. To answer that question, brands must implement a program-level measurement strategy.

What Program-Level Measurement Looks Like

A program-level measurement strategy helps you quantify the full value of personalization investments—across users, channels, and tactics.

It’s built on two key components:

  1. Persistent holdout groups to observe what performance would look like without personalization, across both known and anonymous users.
  2. Tactical A/B testing to optimize individual features and tactics, layered on top of the holdout framework.

This combination reveals both the big picture and the fine-grained detail. And most importantly, it measures the real, full-scale impact of investments on the organization’s bottom line.

How to Structure Holdout Groups

To build this strategy, brands design a multi-layered holdout system:

1. Combined Universal Holdout (Known Users)

Exclude a small percentage (e.g., 2–5%) of CRM-identified users from all personalized experiences—CRM, web, and app. This group provides the clearest read on net lift from personalization across the entire known user base.

2. Anonymous Web/App Holdout (Unknown Visitors)

Create a randomized group of anonymous web or app users who receive no personalization. This lets you isolate the impact of behavior- or session-level personalization tactics.

3. CRM-Only Holdout

Hold out another group from personalized email and SMS messaging but allow web and app personalization. This allows you to isolate the impact of outbound channel personalization.

4. Web/App-Only Holdout

Do the inverse—exclude known users from on-site or in-app personalization but continue sending them CRM messages. This gives you channel-level attribution on the on-site experience.

5. Feature-Level A/B Tests

Continue running individual experiments on specific personalization tactics. These tests remain important for learning and optimization—but their value is enhanced when interpreted within the context of full-program lift.

Calculating Truly Incremental Revenue

At the core of this strategy is a straightforward (but powerful) formula:

True Incremental Revenue =
Lift from Known Users (via Universal Holdout)
+
Lift from Unknown Users (via Anonymous Holdout)
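
Each lift term comes from comparing revenue per user in the personalized group against the matching holdout. The sketch below shows one possible way to estimate that lift with an uncertainty range. The bootstrap interval is a technique chosen here for illustration, not one named above, and the function names and inputs (lists of revenue per user) are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def relative_lift(treated, holdout):
    """Relative lift in revenue per user: treated mean vs. holdout mean."""
    baseline = _mean(holdout)
    if baseline <= 0:
        raise ValueError("holdout revenue per user must be positive")
    return _mean(treated) / baseline - 1.0


def bootstrap_lift_interval(treated, holdout, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the relative lift."""
    rng = random.Random(seed)
    lifts = []
    for _ in range(n_resamples):
        # Resample users with replacement within each group.
        t = rng.choices(treated, k=len(treated))
        h = rng.choices(holdout, k=len(holdout))
        if _mean(h) > 0:  # skip degenerate resamples with no holdout revenue
            lifts.append(_mean(t) / _mean(h) - 1.0)
    if not lifts:
        raise ValueError("no usable resamples; holdout revenue is too sparse")
    lifts.sort()
    lower = lifts[int((alpha / 2) * len(lifts))]
    upper = lifts[max(int((1 - alpha / 2) * len(lifts)) - 1, 0)]
    return lower, upper


if __name__ == "__main__":
    # Synthetic revenue-per-user data, for illustration only.
    treated = [11.0] * 500 + [0.0] * 500
    holdout = [10.0] * 50 + [0.0] * 50
    print(relative_lift(treated, holdout))
    print(bootstrap_lift_interval(treated, holdout))
```

A small holdout keeps the opportunity cost low but widens this interval, which is one reason to run holdouts persistently rather than over a short window.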

Example:

Let’s say your digital business drives $100M/year in revenue.

  • Known users (CRM-identified) contribute 60% = $60M
  • Anonymous users contribute 40% = $40M
  • Your holdout tests show:
    • 10% lift from personalization for known users = $6M
    • 4% lift for anonymous users = $1.6M

Total incremental revenue: $6M + $1.6M = $7.6M annually

That’s net-new revenue directly tied to personalization. Real dollars. Real business impact.
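
The arithmetic above can be sketched in a few lines. One caveat worth making explicit: the example multiplies each lift by the segment's total revenue. If the lift is measured relative to the holdout baseline and the $60M and $40M figures already include personalization, the incremental amount is revenue minus revenue / (1 + lift), which is somewhat smaller. The function and flag names below are hypothetical.

```python
def incremental_revenue(segment_revenue, lift, revenue_includes_personalization=False):
    """Incremental revenue for one segment.

    If segment_revenue already reflects personalization, back out the
    baseline first: baseline = revenue / (1 + lift).
    """
    if revenue_includes_personalization:
        baseline = segment_revenue / (1.0 + lift)
        return segment_revenue - baseline
    return segment_revenue * lift


# Figures from the example above.
known_revenue, known_lift = 60_000_000, 0.10
anon_revenue, anon_lift = 40_000_000, 0.04

# Article's simplification: lift applied directly to segment revenue.
simple_total = (incremental_revenue(known_revenue, known_lift)
                + incremental_revenue(anon_revenue, anon_lift))

# Assumption: the revenue figures already include personalization's effect.
adjusted_total = (incremental_revenue(known_revenue, known_lift, True)
                  + incremental_revenue(anon_revenue, anon_lift, True))

if __name__ == "__main__":
    print(f"${simple_total / 1e6:.1f}M")     # prints $7.6M
    print(f"${adjusted_total / 1e6:.2f}M")   # prints $6.99M
```

Either convention can work, as long as finance and analytics agree on which revenue base the lift is applied to.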

What This Unlocks

This approach gives organizations more than just stronger analytics. It unlocks:

  • A shared source of truth across marketing, product, and analytics teams
  • Clarity on where value is coming from: CRM vs. on-site/app personalization
  • Visibility into overlap and cannibalization, allowing smarter prioritization
  • Accountability at the program level—not just for individual wins, but for overall performance
  • A trusted ROI model that can be shared with finance, leadership, and the board

In other words: confidence. Credibility. And a clearer case for investment.

The Role of A/B Testing—Reimagined

This isn’t about replacing A/B tests. In fact, a strong measurement strategy makes individual experiments more valuable.

By comparing top-down lift (from holdouts) with bottom-up lift (summed across individual tests), you can calculate an incrementality ratio: the share of test-predicted value that actually shows up at the program level. That helps you:

  • See how much value is being captured vs. predicted
  • Identify inflated or redundant wins
  • Focus on the personalization tactics that are moving the needle
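
As a rough sketch, the ratio can be computed by dividing the holdout-measured incremental revenue by the sum of what individual test wins claimed. The exact definition is an assumption here, and the per-test figures below are hypothetical placeholders, not results from any real program.

```python
def incrementality_ratio(top_down_incremental, bottom_up_test_wins):
    """Share of test-claimed revenue that the program-level holdout confirms."""
    bottom_up_total = sum(bottom_up_test_wins.values())
    if bottom_up_total <= 0:
        raise ValueError("bottom-up test wins must sum to a positive amount")
    return top_down_incremental / bottom_up_total


# Top-down figure from the earlier example; test claims are hypothetical.
top_down = 7_600_000
test_wins = {
    "product_recommendations": 4_000_000,
    "triggered_email": 3_500_000,
    "homepage_banners": 2_000_000,
}

ratio = incrementality_ratio(top_down, test_wins)

if __name__ == "__main__":
    print(f"{ratio:.2f}")  # prints 0.80
```

A ratio well below 1 suggests overlapping or inflated test wins. A ratio above 1 suggests the tests are missing value that only appears when tactics work together.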

It turns experimentation into a feedback loop that supports learning and strategy—not just optimization.

Final Thoughts: Measurement That Moves the Needle

If your team is personalizing at scale—but still measuring in fragments—this is your opportunity.

Program-level measurement helps you move from testing in silos to telling a compelling story about the total value of your work. It’s how brands shift from isolated optimizations to enterprise-level growth impact.

And in a world where budgets are tight and leadership is looking for ROI, it’s exactly the kind of clarity that earns confidence—and future investment.

Want to learn more?

We’ve helped brands across industries design and implement these strategies. If you're ready to elevate how you measure personalization, let’s talk.

In today’s environment, every investment needs to prove its value. We can help you measure and communicate the true incremental impact of your team’s work.

©2025 Concord. All Rights Reserved