
We’ve all been there. Something is off. Your conversion rates are impossibly high, your test winner seems too good to be true, or this week's numbers contradict last week's. Your finger hovers over "send," ready to escalate to your manager, stakeholder, or maybe the engineering team.
But here's what I've learned: most "critical data issues" aren't problems with the data itself, but things we can catch with better attention to fundamental details. Are we in the right environment? Did our settings load correctly? Did that join work as expected? These are small, fixable oversights that a systematic check catches before escalation. Every false alarm costs credibility with stakeholders, time from engineering teams, and trust that when we do escalate, it matters.
The solution? A systematic pre-flight checklist that catches false positives before they become fire drills. Here are five things I check every time before hitting send.
In my experience, environment mismatches are responsible for more wasted debugging hours than any other single issue. You're running code in your development environment when you are meant to be in production. You're connected to last month's data snapshot instead of the live database. Your configuration file didn't load, so you're using default settings that don't match your project's requirements.
The insidious thing about environment issues is that your code runs perfectly; it just runs perfectly wrong.
Start with the fundamentals: which environment are you in, which database or data snapshot are you connected to, and did your configuration actually load?
Create a standardized environment verification checklist for your team. Better yet, if possible, build a quick diagnostic script that runs automatically when you start a new analysis session. This script should confirm the environment you're running in, the data source you're connected to, and whether your configuration loaded as expected.
A simple verification cell at the top of an analytical notebook that prints out these key settings runs in five seconds and can save hours of confusion.
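Here's a minimal sketch of what that cell might look like. Everything in it is a placeholder: the APP_ENV and DB_HOST environment variables, the config path, and the events file all stand in for whatever your own project actually uses.

```python
import os
from datetime import date

import pandas as pd

# Which environment and database am I actually pointed at?
print("Environment:   ", os.getenv("APP_ENV", "NOT SET"))
print("Database host: ", os.getenv("DB_HOST", "NOT SET"))

# Did the project configuration load, or am I running on defaults?
config_path = "config/analysis.yaml"  # placeholder path
print("Config present:", os.path.exists(config_path))

# Am I looking at live data or a stale snapshot?
df = pd.read_parquet("data/events.parquet")  # placeholder source
print("Rows loaded:   ", f"{len(df):,}")
print("Latest record: ", df["event_date"].max(), "| today:", date.today())
```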
Aggregated results can hide structural problems with your data. A summary statistic that looks perfectly reasonable might be masking the fact that you have far too few observations, that your comparison groups are allocated incorrectly, or that 90% of your data is coming from a single unexpected source.
Before you can trust your analysis, you need to understand whether you have the right amount of data distributed in the way you expect.
Build a standard diagnostic query that gives you visibility into your data's volume and distribution:
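As an illustration, here's a minimal pandas sketch. The file path and the event_date, variant, and source columns are placeholders for your own schema; the equivalent SQL is just as easy to keep on hand.

```python
import pandas as pd

df = pd.read_parquet("data/experiment_events.parquet")  # placeholder source
df["event_date"] = pd.to_datetime(df["event_date"])

# How much data is there, and over what period?
print("Total rows:", len(df))
print("Date range:", df["event_date"].min(), "to", df["event_date"].max())

# Is volume steady, or did it fall off a cliff this week?
print(df.groupby(df["event_date"].dt.date).size().tail(7))

# Are comparison groups split the way the design says they should be?
print(df["variant"].value_counts(normalize=True))

# Is one source quietly contributing most of the data?
print(df["source"].value_counts(normalize=True).head())
```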
Save your diagnostic queries and adapt them to your specific use case. Whether that's A/B tests, dashboard reports, or predictive models, having a notebook you can rinse and repeat with is invaluable here. The exact metrics will vary, but the principle is the same: confirm you have the right volume of data, distributed correctly, before you trust what it's telling you.
My go-to volume and distribution check includes total row counts, row counts per day, the split across comparison groups, and the breakdown by data source.
Even when you have the right volume of data distributed correctly, the actual values within that data can be wrong. Summary statistics are wonderful liars: an average can look perfectly reasonable while hiding impossible values, data entry errors, or collection failures. A total can be technically correct while including outliers that completely distort the business reality.
You need to look at actual data values to ensure they make sense, not just confirm that they aggregate to something reasonable.
The specifics depend on your data type, but the principle is universal: look at the actual values!
For binary or categorical data: check that the set of distinct values matches what you expect and that no single category dominates in a way it shouldn't.
For continuous metrics: look at the minimum, maximum, and a few percentiles, and ask whether they fall inside a plausible range.
For all data types: check for nulls, duplicates, and suspicious default or placeholder values; a quick sketch of these checks follows.
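Here's a hedged sketch of those checks, assuming a pandas DataFrame where the file path and the status and order_value columns are placeholders for your own data:

```python
import pandas as pd

df = pd.read_parquet("data/orders.parquet")  # placeholder source

# Categorical: do the distinct values match expectations?
print(df["status"].value_counts(dropna=False))

# Continuous: are the extremes and percentiles plausible?
print(df["order_value"].describe(percentiles=[0.01, 0.5, 0.99]))

# All types: null rates and exact duplicates.
print(df.isna().mean().sort_values(ascending=False))
print("Duplicate rows:", df.duplicated().sum())

# Eyeball a random sample and the extremes of the key metric.
print(df.sample(10, random_state=0))
print(df.nlargest(5, "order_value"))
print(df.nsmallest(5, "order_value"))
```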
Make it a habit to preview actual data values, not just aggregates. Take a random sample and eyeball it. Sort by your key metrics and look at the extremes. Ask yourself: "Would I believe these values if I encountered them in the real world?"
For continuous metrics with potential outliers, consider whether capping them at a reasonable maximum or minimum would be appropriate. Sometimes extreme values are legitimate; sometimes they're data quality issues. The only way to know is to look, and to pull from your domain knowledge to make a decision.
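If capping is the right call, it's nearly a one-liner. This sketch winsorizes a hypothetical order_value column at its 99th percentile, but the threshold should come from your domain knowledge, not a default:

```python
import pandas as pd

df = pd.read_parquet("data/orders.parquet")  # placeholder source

# Cap an outlier-prone metric at its 99th percentile; the threshold is a judgment call.
cap = df["order_value"].quantile(0.99)
df["order_value_capped"] = df["order_value"].clip(upper=cap)
print(df[["order_value", "order_value_capped"]].describe())
```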
One of my favorite quick checks is to count impossible values. If you're measuring time duration, look for negatives. If you're measuring conversion rates, look for values above 1. If you're measuring age, look for values over 120. A single impossible value tells you there's likely a broader data collection or transformation issue upstream.
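Again as a sketch, with hypothetical duration_seconds, conversion_rate, and age columns; every count printed here should be zero.

```python
import pandas as pd

df = pd.read_parquet("data/metrics.parquet")  # placeholder source

# Impossible values by definition; any nonzero count points to an upstream issue.
print("Negative durations: ", (df["duration_seconds"] < 0).sum())
print("Conversion rate > 1:", (df["conversion_rate"] > 1).sum())
print("Age over 120:       ", (df["age"] > 120).sum())
```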
Data rarely travels directly from source to analysis without transformation. You join tables together. You filter relevant records. You aggregate, pivot, and reshape. Each transformation is an opportunity for something to go wrong and for that error to compound downstream. When I was doing data engineering work, my first mentor in analytics always told me, “When values don’t look right, investigate the joins.” Her advice has rarely steered me wrong. Thanks, Kate.
The most dangerous data transformations are the ones that seem to work but subtly change your data in ways you don't intend.
For every significant transformation in your data pipeline, verify that inputs and outputs match your expectations:
Row counts are your friend: know how many rows go into each step and how many come out, and be able to explain any difference.
Value checks for transformations: confirm that totals and key metrics computed before a step still reconcile after it.
Join validations: watch for row explosion from unexpected many-to-many matches and for keys that silently fail to match; a sketch follows this list.
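Here's one way to bake those checks into a join, sketched with pandas. The orders and users tables, the user_id key, and the order_value total are placeholders; the useful parts are the validate and indicator arguments and the row-count assertion.

```python
import pandas as pd

orders = pd.read_parquet("data/orders.parquet")  # placeholder inputs
users = pd.read_parquet("data/users.parquet")

rows_before = len(orders)

# validate= raises immediately if the join isn't the shape we expect;
# indicator= adds a _merge column showing which rows found a match.
merged = orders.merge(
    users, on="user_id", how="left", validate="many_to_one", indicator=True
)

# A left join on a clean key should never change the row count.
assert len(merged) == rows_before, "Join changed the row count unexpectedly"
print(merged["_merge"].value_counts())

# A key total should survive the transformation untouched.
assert abs(merged["order_value"].sum() - orders["order_value"].sum()) < 1e-6
```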
Build verification into your workflow as a standard practice, not an afterthought. For any complex transformation, record the row counts going in and coming out, spot-check a handful of records end to end, and reconcile at least one key metric against the source.
Document your expected versus actual outcomes. If a transformation is supposed to filter your data down to 80% of original rows, and you're seeing 60%, that's a signal to investigate.
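Expectations like that are easy to encode as assertions. A tiny sketch, assuming a hypothetical status filter that should keep roughly 80% of rows:

```python
import pandas as pd

df = pd.read_parquet("data/orders.parquet")  # placeholder source

rows_before = len(df)
filtered = df[df["status"] == "completed"]   # placeholder filter
kept = len(filtered) / rows_before

# We expect roughly 80% of rows to survive this filter; anything far off is a flag.
assert 0.75 <= kept <= 0.85, f"Filter kept {kept:.0%} of rows, expected ~80%"
```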
Here's a scenario that plays out in data teams everywhere: An analyst encounters a cryptic error message. They spend two hours searching Stack Overflow and trying different solutions before finally finding the fix. Three weeks later, a teammate encounters the exact same error. They spend two hours searching Stack Overflow. A month after that, the original analyst encounters it again in a different project and has forgotten the solution. Two more hours lost.
Sometimes, we're solving the same problems over and over, wasting time on issues we've already conquered. As someone who loves optimization, this upsets me. We do not need to reinvent the proverbial wheel.
Before you dive deep into debugging, ask yourself: have I, or has anyone on my team, solved this exact problem before?
If the answer is yes, you can save yourself significant time. If the answer is no, and you end up solving a tricky problem, you have a responsibility to document it for next time.
Create and maintain a living troubleshooting guide for your team. This doesn't need to be fancy; my personal favorite is a shared document because it allows for pictures of the error, not just text. But a wiki or Confluence page, or even a Slack canvas dedicated to "problems solved" can work for your team.
For each entry, capture the error message (a screenshot is even better), the context in which it appeared, the fix that worked, and who solved it and when.
Make it searchable. Make it accessible to the whole team. Most importantly, make contributing to it part of your team culture. When someone solves a tricky problem, adding it to the guide should be as automatic as committing code.
Let's say each of your five analysts encounters three repeated technical issues per month that take an hour each to resolve. That's 15 hours per month, or 180 hours per year, of duplicated effort. If documenting solutions lets you resolve repeat issues in 10 minutes instead of an hour, you reclaim roughly 150 of those hours annually, which is nearly a month of productivity.
In reality, the savings compound further, because documented solutions prevent rabbit holes, reduce frustration, and let people stay focused on actual analysis rather than task-switching into technical troubleshooting.
Here's the paradox of data issue escalation: the more urgent something feels, the more important it is to slow down and check your work. When results look surprisingly good or surprisingly bad, when dashboards show unexpected changes, when stakeholders are asking for immediate answers, those are precisely the moments to take a breath and run through this checklist.
These checks take 10 to 15 minutes in most cases. They're not glamorous. They won't impress anyone in a meeting. But they prevent hours or days of wild goose chases, protect your credibility with stakeholders, and ensure that when you do escalate an issue, people know it's genuine.
The best data analysts aren't the ones who never encounter issues. They are the ones who systematically eliminate false positives, who escalate confidently because they've done their homework, and who save the five-alarm urgency for situations that truly warrant it.
Build this checklist into your workflow. Force it to become muscle memory, not an afterthought. Your future self, your teammates, and your stakeholders will thank you.
Not sure about your next step? We'd love to hear about your business challenges. No pitch. No strings attached.