A/B Testing & Experimentation

Embracing Non-Winning Test Results in Experimentation

By Tracey Burns-Yocum

Reframing A/B testing failures for valuable insights.

In the late 1960s, a 3M scientist named Spencer Silver endeavored to create a super-strong adhesive for the aerospace industry. He failed. Instead, he created a reusable, low-tack adhesive. If Silver and his colleagues had let failed experiments lie, this wouldn’t be a very interesting opening story. Luckily for me—and for anyone who has ever worked in an office—Silver’s “failed” adhesive was recognized for its potential, and the Post-it note was born.

We can take a note from 3M’s playbook here, because most companies’ experimentation cultures prize “positive” results and dismiss experiments that don’t confirm hypotheses. In this discussion, we’ll explore how null results can provide crucial information for decision-making. By the end, I hope you’ll agree that null results can still be noteworthy.

Publication Bias Impacts Everyone

It’s an open secret in academia that there is a publication bias: researchers are more likely to submit—and editors are more likely to publish—papers with positive results, leaving null or inconclusive findings in the proverbial “file drawer,” never contributing to our collective knowledge about a subject. This problem is so prevalent that it has been dubbed the “file-drawer problem.”

Businesses face the same challenge—sometimes with even fewer guardrails and stronger incentives to promote winning experiments. While academia has tried to address this trend, albeit only partially, by establishing journals focused solely on replication and requiring researchers to conduct meta-analyses, most companies have no formal mechanism for preserving "unsuccessful" experimental knowledge.

There is high-stakes pressure to find winners. Data scientists and analysts often work in environments where team leaders must justify resource spending with positive ROI from experimentation. Company goals are frequently focused on wins, not learning outcomes.

This, unfortunately, creates a perfect storm for experimental bias. Four common temptations include:

  1. Data Dredging – rerunning experiments or analyses until one produces a favorable result
  2. Selective Reporting – cherry-picking only the metrics that show improvement
  3. Data Snooping – segmenting data post hoc to find seemingly successful subgroups
  4. P-hacking – massaging the analysis (e.g., stopping early or relaxing alpha levels) until a result crosses the significance threshold

However, this “winner-take-all” culture creates organizational blind spots that can pose significant costs to the business. A few examples include:

  1. Hidden Victories – Experiments that prevent the rollout of unsuccessful features avoid real losses, but a winner-only culture never credits these avoided costs as wins
  2. Duplicated Effort – Teams may repeatedly explore paths that others have already discovered to be dead ends
  3. Resource Waste – Time and money are spent relearning known limitations
  4. Reduced Insights – Some of the most valuable findings come from understanding why an experiment didn’t work

Perhaps most damaging, however, is that this type of culture stifles innovation. In environments where only positive results are valued, there’s a tendency to gravitate toward safe, incremental experiments with a high probability of success, rather than bold explorations that might yield transformative insights—even if they initially show null results.

A Note on Terminology

In this article, we will use the terms “failed” or “non-winning” results as a catch-all for experiments that do not produce the anticipated effect in a statistically significant way. Some organizations may refer to these as “flat,” “null,” “in the noise,” or “non-significant.”

Part 1: How to Learn from Every Experiment

What Makes a “Failed” Experiment Valuable

Not all failed experimental results hold the same weight. The difference between a wasteful experiment and a valuable learning opportunity often lies not in the outcome, but in the experimental design and execution. Several factors distinguish informative “non-winners” from experiments that fail to generate useful insights.

Clear Hypothesis Formulation and Experimental Design

A clear, testable hypothesis should be the bedrock of every experiment—full stop. Even outside a strict scientific framework, a hypothesis like “Changing the button color will increase conversions” is far more valuable than one that simply aims to “improve the website.” This specificity allows for better learning opportunities, even when the answer is “no.”

Experiments that provide meaningful insights typically:

  • Start with specific, testable hypotheses
  • Connect hypotheses directly to business questions or objectives
  • Identify clear criteria for supporting or rejecting the hypothesis
  • Specify thresholds for practical significance, not just statistical significance (alpha levels)
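
To make these bullets concrete, here is a minimal sketch of recording a hypothesis and its decision criteria in code before launch. Everything here (the class, field names, and thresholds) is a hypothetical illustration, not a standard:

```python
from dataclasses import dataclass

@dataclass
class ExperimentPlan:
    """Pre-registered design, written down before any data is collected."""
    hypothesis: str            # specific and testable, tied to a business question
    primary_metric: str        # the single metric the hypothesis is judged on
    alpha: float               # statistical significance threshold
    min_practical_lift: float  # smallest lift worth acting on (practical significance)

# Hypothetical example plan
plan = ExperimentPlan(
    hypothesis="Changing the CTA button color will increase checkout conversion",
    primary_metric="checkout_conversion_rate",
    alpha=0.05,
    min_practical_lift=0.005,  # 0.5 percentage points
)
```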

Rigorous Methodology and Execution

Methodology plays a critical role in the credibility of any experiment—especially one with a non-winning result. Consider two A/B tests with statistically non-significant outcomes: Experiment 1 ran for two days with a small sample size; Experiment 2 ran for two weeks with sufficient power. Experiment 2 is the “failure” worth learning from because its setup is sound.

Experiments that yield helpful insights typically include:

  • An a priori power analysis, followed by a sample large enough to achieve the planned power (see the sketch after this list)
  • Random assignment to treatment and control groups (even if random selection isn't possible)
  • Adequate duration to account for seasonal patterns or novelty effects
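
As a sketch of the power-analysis bullet above, the snippet below sizes a two-proportion test with statsmodels; the baseline rate and minimum detectable effect are assumptions for illustration:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothetical inputs: 5% baseline conversion, +0.5pp is the smallest
# effect worth detecting, with conventional alpha and power.
effect_size = proportion_effectsize(0.055, 0.050)  # Cohen's h
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.80,
    alternative="two-sided",
)
print(f"Required sample size per group: {n_per_group:,.0f}")
```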

Reliable Data Collection and Unbiased Analysis

Data quality fundamentally determines the value of any experimental result. It’s a “garbage in, garbage out” situation. A properly conducted experiment showing no effect is much more valuable than a flawed experiment showing a dramatic effect. The former provides potential guidance; the latter can lead businesses astray.

You should have increased confidence in an experiment that has the following:

  • Complete data capture without systematic gaps
  • Analysis methods suited to the data’s characteristics
  • Proper application of statistical tests without engaging in p-hacking
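
A minimal sketch of that last point: run the single, pre-registered test once the planned sample is collected, rather than re-testing until something turns significant. The counts below are hypothetical:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical final counts, analyzed once the planned sample size is reached
conversions = [520, 536]   # control, treatment
visitors = [10_000, 10_000]

z_stat, p_value = proportions_ztest(conversions, visitors, alternative="two-sided")
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
# One pre-registered test against the pre-registered alpha: no re-running,
# no metric swapping, no post hoc threshold adjustments.
```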

Documentation of Process and Context

Context transforms raw results into organizational knowledge. In a few quarters, when someone proposes a similar idea, a well-documented null result in a central repository prevents the organization from revisiting a dead end.

Helpful documentation should include:

  • Thorough records of experimental conditions and experiences deployed
    • Images are often extremely helpful for this step
  • Documentation of technical implementation
  • Logs of any unexpected events or deviations from the predefined experimental protocol
  • Any business context surrounding the experiment that is likely to be lost over time or due to employee turnover

Transparent Reporting of Results

The communication of results determines whether learning spreads. A transparent report that clearly states, “We found no evidence to support the hypothesis that feature X improves metric Y (credible interval: -0.5 to +0.5),” can provide helpful guidance to teams.
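
An interval like the one quoted could come from, for example, a simple Bayesian comparison of the two arms. Here is a hedged sketch using Beta posteriors; the conversion counts are hypothetical, and your team’s actual methodology may differ:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical results: conversions out of visitors in each arm
post_control = rng.beta(1 + 520, 1 + 10_000 - 520, size=100_000)
post_treatment = rng.beta(1 + 536, 1 + 10_000 - 536, size=100_000)

lift_pp = (post_treatment - post_control) * 100  # lift in percentage points
lo, hi = np.percentile(lift_pp, [2.5, 97.5])
print(f"95% credible interval for the lift: {lo:+.2f} to {hi:+.2f} pp")
```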

To increase the likelihood that these learnings are accepted, include the following:

  • Clear presentation of the hypothesis and actual findings
  • Honest reporting of limitations and caveats
  • Descriptions that highlight practical significance, not just statistical outcomes
  • Discussion of alternative interpretations and next steps

Don’t forget: experimentation is an iterative process, and a “failed” experiment is just one phase in that cycle. The cycle may even lead to an informed rerun of a previously “failed” experiment.

The Business Value of Documenting “Failures”

The initial letdown of a "failed" experiment often blinds organizations to the important long-term value these outcomes can provide. By systematically documenting and sharing non-winning experiments, companies can unlock additional business benefits.

  1. Prevents Repetition of Unproductive Paths
    Without systematic documentation of what doesn’t work, organizations are likely to revisit the same fruitless endeavors as team members change, leadership shifts, or organizational memory fades.
  2. Accelerates Team Learning Curves
    New team members and recently formed teams can leverage documented failures to quickly reach the same level of domain understanding that took others years to develop, creating a hefty ROI on knowledge transfer.
  3. Saves Resources in the Long Run
    While documenting experiments requires an upfront investment, the long-term resource savings are substantial. Additionally, if possible, track the “opportunity cost savings” of avoiding major investments in approaches that experimentation has shown to be ineffective.
  4. Provides Better Experimental ROI
    Every experiment is an investment of time, technology resources, and often opportunity cost related to customer experience. Extracting maximum value from each experiment improves the overall ROI of your data science and experimentation operations.
  5. Increases Experimentation Program Metadata
    Every experiment your team runs and documents adds a data point to your accumulated knowledge about your customers, your products, and, ultimately, your experimentation program itself. Visualizing your experiment history as a histogram—with bins representing different segments, features, or parts of your webpage, and bar height representing how often experiments in each bin produce a win—lets you identify zones of frequent wins (high bars) and frequent failures (low bars). This informs resource allocation toward proven successful areas and strategic investments in understanding underperforming domains.
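
As a sketch of that visualization, the snippet below computes per-area win rates from a hypothetical experiment log; the column names and figures are illustrative assumptions, not real data:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical experiment log: one row per completed experiment
log = pd.DataFrame({
    "area": ["checkout", "checkout", "search", "search", "homepage", "homepage"],
    "won":  [True, False, False, False, True, True],
})

win_rate = log.groupby("area")["won"].mean()
ax = win_rate.plot(kind="bar")
ax.set_xlabel("Product area")
ax.set_ylabel("Share of winning experiments")
plt.tight_layout()
plt.show()
```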

Best Practices for Learning from All Experiments

Highlighted below are some approaches that require intentional effort but will allow your business to maximize ROI on experimental investment by documenting learnings from all experiments.

  1. Create Experiment Documentation
    Individual data scientists and experimentation teams benefit greatly from maintaining structured records of their experimental journey—not just the highlights. Winners are easy to remember, so special attention should be dedicated to documenting non-winners.
    This documentation should include experimental basics (e.g., hypothesis, methodology, results) as well as learnings (e.g., surprises, follow-up questions generated, reflections on how the experiment could have been implemented differently).
  2. Build an Experimentation Knowledge Repository
    Individual experiment documentation becomes useful when organized into a searchable, easily cross-referenced knowledge system.
    This repository should include standard templates for documentation and a tagging system for categorization (e.g., by product area, customer segment, methodology, etc.); a minimal sketch of such a tagging system appears after this list.
  3. Host Knowledge Sharing Sessions
    Regular roundtables focused on sharing experiments with null results create space for cross-team learning. It is important that these sessions do not devolve into finger-pointing.
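
As noted above, here is a minimal in-memory sketch of that tagging system; a real repository would live in a wiki or database, and every field shown is an illustrative assumption:

```python
# Minimal in-memory illustration of a tagged experiment repository
experiments = [
    {"id": "exp-101", "result": "null",
     "tags": {"checkout", "pricing", "a/b-test"}},
    {"id": "exp-102", "result": "win",
     "tags": {"search", "ranking", "a/b-test"}},
]

def find_by_tags(repo, required_tags):
    """Return every experiment carrying all of the requested tags."""
    return [e for e in repo if required_tags <= e["tags"]]

print(find_by_tags(experiments, {"checkout"}))  # surfaces the null result too
```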

Part 2: How to Tell the Story of Null Result Experiments

Types of “Non-Winner” Experiments Worth Sharing

Not all experiments that fail to confirm your hypothesis yield useful learnings. Understanding the different categories of "non-winner" experiments can help teams better classify, document, and garner insights from their work. Below are a handful of null experimental outcomes that deserve attention.

Results That Challenge Assumptions

These experiments directly contradict conventional wisdom or challenge deeply held organizational beliefs. Their value lies in their ability to overturn "common knowledge" that may be leading the organization astray.

They can prevent organizations from making decisions based on external industry myths or outdated internal heuristics, creating space for more promising approaches as the team moves away from false assumptions.

Results That Reveal Unexpected Interactions

Some experiments yield non-winning results overall but show interesting patterns when segmented or reveal interactions between variables that weren’t the primary focus of the analysis. These experiments often provide a more nuanced understanding of factors at play in your customer ecosystem than a straightforward “win” would. They highlight complexities that simple A/B testing might miss and point toward more sophisticated approaches for future experimentation.

Technical Failures That Expose Infrastructure Gaps

Sometimes experiments “fail” not because the hypothesis was wrong, but due to infrastructure limitations, data quality issues, or technical constraints that weren’t obvious beforehand. These failures provide crucial information for technical roadmaps and infrastructure planning to avoid major problems down the road.

How to Present Failed Results Effectively

Even the most valuable experimental results provide no benefit if they’re ignored or misunderstood—especially failed test results. Communicating experimental outcomes that didn’t “win” requires skill to ensure the insights are properly understood and applied. Below are a few strategies for presenting non-winning test results to maximize learning and decision-making impact.

  1. Focus on the Question, Not the Answer
    When presenting failed test results, reframe the conversation around the importance of the original question your hypothesis was designed to answer, rather than the disappointment of not getting a positive result.
  2. Frame Insights in Terms of Future Directions
    Connect the non-winning result directly to the next steps or alternative approaches that the experiment revealed. This is the crux of the iterative experimental process: each result, regardless of outcome, informs the next decision and guides the next experiment.
  3. Be Clear About What the Experiment Did and Didn’t Demonstrate
    Being precise about what can and cannot be concluded from a failed test result prevents incorrect generalizations. Note: this applies to all experimental results, even winners.
  4. Quantify the Value of the Null Result
    If possible, calculate the value of having learned that something doesn’t work.
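
One simple way to do that, using entirely hypothetical figures, is to sum what a full rollout of the losing variant would have cost:

```python
# Hypothetical figures: substitute your own estimates
rollout_cost = 80_000                 # engineering time to ship and maintain
revenue_at_risk = 0.005 * 12_000_000  # a 0.5% conversion drop on $12M/yr

value_of_null_result = rollout_cost + revenue_at_risk
print(f"Estimated value of learning this doesn't work: ${value_of_null_result:,.0f}")
```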

Conclusion

Embracing “non-winning but noteworthy” results can transform how you approach experimentation. By refusing to devalue or ignore results that don’t confirm their hypotheses, organizations build more robust knowledge, avoid repeating mistakes, and create a culture where genuine innovation can flourish.

At Concord, we believe that every experiment generates value when approached with rigor and curiosity. Our team of experienced data scientists and experimentation SMEs has helped organizations across various industries transform their experimental approaches, building comprehensive learning systems that capture insights from every experiment—whether it confirms hypotheses or challenges assumptions.

Are you ready to extract more value from your experimentation program? Contact Concord today.
