How to Clean Your Market Research Survey Data

You’ve fielded a market research survey.

For weeks, you wrote and rewrote your survey questions. You paid for a SurveyMonkey license and spent hours learning how to program your survey. You leveraged dozens of industry connections to get survey answers — a hard-earned set of 300 respondents.

Getting here wasn’t easy.

But unfortunately, you’re not done. Before drawing findings from your survey, you need to clean your data. This is absolutely essential for maintaining the quality of your research. Here are the three most important things to look for when cleaning your survey data.*


Speeders

These are respondents who took your survey too fast. You identify them by comparing each respondent’s completion time to the median time spent taking your survey.

The rule of thumb here is to disqualify responses from anyone who completed your survey in less than half the median time. There are some exceptions, such as when your survey includes a logic branch that routed certain respondents through just a few questions. But in general, anyone going more than twice as fast as the median respondent likely sped through the survey without giving the questions much thought.

You can identify speeders by downloading your survey data into Excel, then subtracting the “time started” from the “time completed” to get each respondent’s completion time. Most survey platforms I’ve used record this information. If yours doesn’t, you may have to skip this flag, but be sure to check for the following two.

In general, no more than 10% of your survey sample should be discarded for speeding.
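If you’d rather script the speeder check than do it in Excel, the logic above fits in a few lines of Python. This is a minimal sketch, not a PeopleFish tool: it assumes each response has been exported as a record with hypothetical `id`, `time_started`, and `time_completed` fields, and that timestamps follow the format in `TIME_FORMAT`. Adjust both to match your platform’s export.

```python
from datetime import datetime
from statistics import median

# Assumption: adjust to the timestamp format in your survey export.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
MAX_SPEEDER_SHARE = 0.10  # rule of thumb from the text


def completion_minutes(response: dict) -> float:
    """Completion time = time completed minus time started, in minutes."""
    started = datetime.strptime(response["time_started"], TIME_FORMAT)
    completed = datetime.strptime(response["time_completed"], TIME_FORMAT)
    return (completed - started).total_seconds() / 60


def flag_speeders(responses: list[dict], cutoff_ratio: float = 0.5) -> set[str]:
    """IDs of respondents who finished in less than half the median time."""
    durations = {r["id"]: completion_minutes(r) for r in responses}
    threshold = cutoff_ratio * median(durations.values())
    return {rid for rid, minutes in durations.items() if minutes < threshold}


# Made-up example: five respondents who took 10, 12, 11, 3, and 9 minutes.
responses = [
    {"id": "r1", "time_started": "2024-05-01 09:00:00", "time_completed": "2024-05-01 09:10:00"},
    {"id": "r2", "time_started": "2024-05-01 09:00:00", "time_completed": "2024-05-01 09:12:00"},
    {"id": "r3", "time_started": "2024-05-01 09:00:00", "time_completed": "2024-05-01 09:11:00"},
    {"id": "r4", "time_started": "2024-05-01 09:00:00", "time_completed": "2024-05-01 09:03:00"},
    {"id": "r5", "time_started": "2024-05-01 09:00:00", "time_completed": "2024-05-01 09:09:00"},
]

speeders = flag_speeders(responses)  # median is 10 minutes, so the cutoff is 5
speeder_share = len(speeders) / len(responses)
print(sorted(speeders), speeder_share)  # ['r4'] 0.2
if speeder_share > MAX_SPEEDER_SHARE:
    print("More than 10% flagged as speeders; review them before discarding.")
```

In this tiny made-up sample, one speeder is already 20% of responses, so the 10% check fires. If your survey has logic branches, you may want to compute the median separately within each branch, since respondents routed through a short branch can trip the flag legitimately.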


Flatliners

These are people who picked the same answer to every (or nearly every) question in a set of related questions, whether multiple-choice or numeric. For example, say that you asked four open-ended questions about price (like a Van Westendorp question set). A flatliner gave the same answer to all four questions (say, $10).
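A flatliner check can be sketched the same way. This sketch assumes hypothetical question IDs for the four Van Westendorp price questions. It treats a respondent as a flatliner when a single answer accounts for at least `min_share` of their answers in the set. The default of 1.0 means “every question,” and lowering it (to, say, 0.75) is one assumed way to capture “most.”

```python
from collections import Counter

# Hypothetical IDs for a four-question Van Westendorp price set.
PRICE_QUESTIONS = ["vw_too_cheap", "vw_cheap", "vw_expensive", "vw_too_expensive"]


def is_flatliner(response: dict, question_set: list[str], min_share: float = 1.0) -> bool:
    """True if one answer makes up at least `min_share` of the answers in the set."""
    answers = [response[q] for q in question_set]
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers) >= min_share


def flag_flatliners(responses: list[dict], question_set: list[str], min_share: float = 1.0) -> set[str]:
    """IDs of respondents flagged as flatliners on the given question set."""
    return {r["id"] for r in responses if is_flatliner(r, question_set, min_share)}


# Made-up example answers, in dollars.
responses = [
    {"id": "r1", "vw_too_cheap": 10, "vw_cheap": 10, "vw_expensive": 10, "vw_too_expensive": 10},
    {"id": "r2", "vw_too_cheap": 5, "vw_cheap": 8, "vw_expensive": 15, "vw_too_expensive": 25},
    {"id": "r3", "vw_too_cheap": 10, "vw_cheap": 10, "vw_expensive": 10, "vw_too_expensive": 20},
]

print(sorted(flag_flatliners(responses, PRICE_QUESTIONS)))                  # ['r1']
print(sorted(flag_flatliners(responses, PRICE_QUESTIONS, min_share=0.75)))  # ['r1', 'r3']
```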

If you notice that more than 10% of your survey respondents are flagged as flatliners, you may want to look more closely at the questions you’re including in your scan for flatliners. Some respondents may only appear to be flatlining when they are, in fact, giving honest answers. This can come down to the way you’ve asked your questions: for example, if “student,” “18–22 years,” “unmarried,” and “no kids” are all the first answer options to your questions about employment, age, marital status, and children, a college student answering honestly will pick the first option every time.

You can only determine this by looking at respondents’ individual answers, but don’t bother with this if less than 10% of your survey sample is flatlining.
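The 10% check can be run separately for each question set in your scan, which points you to the set that may be producing false flatliners. In this sketch, multiple-choice answers are stored as the position of the option chosen (0 = first option), because flatlining across different questions shows up as the same position rather than the same text. The question IDs and answers are hypothetical.

```python
# Hypothetical question sets; answers are option positions (0 = first option).
QUESTION_SETS = {
    "demographics": ["employment", "age", "marital_status", "children"],
    "ratings": ["rate_taste", "rate_price", "rate_service"],
}
REVIEW_THRESHOLD = 0.10  # rule of thumb from the text


def flatline_share(responses: list[dict], question_set: list[str]) -> float:
    """Share of respondents who chose the same option position for every question in the set."""
    flagged = sum(1 for r in responses if len({r[q] for q in question_set}) == 1)
    return flagged / len(responses)


def sets_to_review(responses: list[dict], question_sets: dict[str, list[str]]) -> list[str]:
    """Question sets where more than 10% of respondents look like flatliners."""
    return [
        name
        for name, questions in question_sets.items()
        if flatline_share(responses, questions) > REVIEW_THRESHOLD
    ]


# Made-up data: r1-r3 are students who honestly pick the first demographic option each time.
responses = [
    {"employment": 0, "age": 0, "marital_status": 0, "children": 0,
     "rate_taste": 4, "rate_price": 2, "rate_service": 3},
    {"employment": 0, "age": 0, "marital_status": 0, "children": 0,
     "rate_taste": 3, "rate_price": 3, "rate_service": 4},
    {"employment": 0, "age": 0, "marital_status": 0, "children": 0,
     "rate_taste": 5, "rate_price": 1, "rate_service": 4},
    {"employment": 1, "age": 2, "marital_status": 1, "children": 1,
     "rate_taste": 2, "rate_price": 2, "rate_service": 3},
    {"employment": 2, "age": 3, "marital_status": 1, "children": 0,
     "rate_taste": 4, "rate_price": 3, "rate_service": 5},
]

print(sets_to_review(responses, QUESTION_SETS))  # ['demographics']
```

Here the demographics set gets flagged for review, and reading the individual rows would reveal honest students rather than careless respondents.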

Gibberish and Contradictory Answers

These types of responses can be harder to spot. They require you to look, line by line, at answers to open-ended questions in order to identify ones that 1) are gibberish (e.g., dk3i8sw) and/or 2) contradict other answers in the same row. For example, if someone says they are single at the beginning of the survey, then mentions their “wife” or “husband” in a later open-ended question, delete that respondent. They are not being honest, and you want data that you can stand on.
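Line-by-line reading can’t be fully automated, but a script can queue up candidates for it. The sketch below uses two rough gibberish heuristics that are assumptions, not standard rules: a run of six or more consonants, or a token of five or more characters that mixes letters and digits (like dk3i8sw). It also applies the single-but-mentions-a-spouse check from the text. Field names are hypothetical, and anything flagged should still be read by a person before you delete it.

```python
import re

# Hypothetical field names for this example.
OPEN_ENDED = ["why_buy", "improvements"]
# Heuristic thresholds (assumptions): 6+ consonants in a row, or 5+ char tokens mixing letters and digits.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{6,}")
SPOUSE_WORDS = re.compile(r"\b(wife|husband)\b")


def looks_like_gibberish(text: str) -> bool:
    lowered = text.lower()
    if CONSONANT_RUN.search(lowered):
        return True
    for token in lowered.split():
        has_letter = any(c.isalpha() for c in token)
        has_digit = any(c.isdigit() for c in token)
        if len(token) >= 5 and has_letter and has_digit:
            return True
    return False


def contradicts_marital_status(response: dict) -> bool:
    """A self-described single respondent who mentions a wife or husband."""
    if response["marital_status"] != "Single":
        return False
    return any(SPOUSE_WORDS.search(response[q].lower()) for q in OPEN_ENDED)


def flag_for_review(responses: list[dict]) -> dict[str, list[str]]:
    """Map respondent ID to the reasons it needs a manual read."""
    flags = {}
    for r in responses:
        reasons = []
        if any(looks_like_gibberish(r[q]) for q in OPEN_ENDED):
            reasons.append("gibberish")
        if contradicts_marital_status(r):
            reasons.append("contradiction")
        if reasons:
            flags[r["id"]] = reasons
    return flags


# Made-up responses.
responses = [
    {"id": "r1", "marital_status": "Married",
     "why_buy": "My wife loves the seasonal latte", "improvements": "More seating"},
    {"id": "r2", "marital_status": "Single",
     "why_buy": "dk3i8sw", "improvements": "asdfghjkl"},
    {"id": "r3", "marital_status": "Single",
     "why_buy": "My husband and I stop by after work", "improvements": "Faster service"},
]

print(flag_for_review(responses))  # {'r2': ['gibberish'], 'r3': ['contradiction']}
```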

If you designed your survey well, your data cleaning shouldn’t result in discarding more than 15% of your responses. If you’re worried you’re throwing out too many, take a closer look at the ones you’re throwing out. Consider keeping a few that show other signs of being good, honest answers, or those that were flagged on only one of the three criteria listed above.
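Once each check has produced its own list of flagged respondents, the 15% guideline and the “flagged on only one criterion” idea can be checked together. This sketch assumes you already have a set of flagged IDs per criterion; the criterion names, IDs, and sample size are illustrative.

```python
from collections import Counter

MAX_DISCARD_SHARE = 0.15  # guideline from the text


def summarize_cleaning(total_responses: int, flags: dict[str, set[str]]) -> dict:
    """Combine per-criterion flags into a discard share and a set of single-flag respondents."""
    # Number of criteria each respondent was flagged on.
    flag_counts = Counter(rid for ids in flags.values() for rid in ids)
    discard_share = len(flag_counts) / total_responses
    return {
        "discard_share": discard_share,
        "over_limit": discard_share > MAX_DISCARD_SHARE,
        # Candidates to re-read, and possibly keep, if too many are being thrown out.
        "single_flag": {rid for rid, n in flag_counts.items() if n == 1},
    }


# Made-up flags for a sample of 20 responses.
flags = {
    "speeder": {"r3", "r7"},
    "flatliner": {"r7", "r9"},
    "gibberish_or_contradiction": {"r12"},
}
summary = summarize_cleaning(20, flags)
print(summary["discard_share"], summary["over_limit"])  # 0.2 True
print(sorted(summary["single_flag"]))                    # ['r12', 'r3', 'r9']
```

Here r7 tripped two criteria, so it is the clearest candidate for removal; the three single-flag respondents are the ones worth a second look.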

*PeopleFish analysts clean every one of our clients’ survey datasets according to the criteria set forth in this article. Data cleaning is a standard part of our survey project offering. Nevertheless, it’s helpful for researchers to understand how survey data is cleaned, both for their own knowledge and, of course, in case they want to conduct their own market research survey independently of a market research firm like PeopleFish.

How to Design Your First Market Research Survey

At the end of the day, entrepreneurs need consumer data. Investors simply won’t trust your gut. Nor should you.

That’s why the first big step toward turning your product or service idea into a sound business concept, and ultimately toward wowing investors, is market research. And the bottom line is that your first market research survey should answer three very specific questions.

In this Startup Grind article, our Founder Nick Freiling identifies these three questions, and explains how to design a basic, first-pass market research survey to test & validate your product idea.


How to Overcome Sampling Bias in Your Market Research Survey

In the market research world, sampling bias is a consistent error that arises due to the way a survey’s sample was selected. It occurs when a sample is not random, meaning certain types of respondents are more or less likely to be chosen for the sample.

The result: Survey results that don’t reflect the population you purport to represent. Instead, they reflect a skewed sample.

For example, a survey of potential voters in the upcoming presidential election may suffer from sampling bias if the list of people invited to take the survey comes from, say, a conservative think tank’s donor list. Such respondents are more likely to favor the Republican candidate than voters in general, and it’s precisely those voters in general that a political pollster probably cares about.

Overcoming sampling bias

Generally speaking, sampling bias cannot be identified or overcome by examining a survey’s response data alone. Sampling bias is identified only by comparing a survey’s sample to the population of interest.

In other words, you can’t just look at a survey’s results and decide the sample is biased one way or another. You can (and should) compare a survey’s results to other similar surveys to see how respondents’ sentiments might differ, but that’s inexact.


The only way to accurately measure sampling bias is to compare your survey’s sample, on every relevant characteristic imaginable, to the general population your survey aims to understand. This, of course, is impossible, but that doesn’t mean we can’t get close.

For example, pollsters from the hypothetical voter survey mentioned above might include questions about their respondents’ ages, political affiliations, and past voting behavior, then compare those results to other surveys of the voting population to see how well their sample compares.
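That comparison can be sketched as a side-by-side of sample shares and benchmark shares for one characteristic, plus a chi-square goodness-of-fit statistic that summarizes how far apart they are. The age brackets and benchmark shares below are made up for illustration; in practice the benchmark would come from a trusted source on the population you care about. This only tests the characteristics you actually measured.

```python
def compare_to_benchmark(sample_counts: dict[str, int], population_shares: dict[str, float]):
    """Return per-category gaps (sample share minus population share) and a chi-square statistic."""
    n = sum(sample_counts.values())
    gaps = {}
    chi_square = 0.0
    for category, pop_share in population_shares.items():
        observed = sample_counts.get(category, 0)
        expected = pop_share * n  # count we'd expect if the sample matched the population
        gaps[category] = observed / n - pop_share
        chi_square += (observed - expected) ** 2 / expected
    return gaps, chi_square


# Made-up numbers: 300 respondents by age bracket vs. hypothetical population shares.
sample_counts = {"18-34": 60, "35-54": 120, "55+": 120}
population_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

gaps, chi_square = compare_to_benchmark(sample_counts, population_shares)
# Prints "18-34: -10%", then "35-54: +5%", then "55+: +5%".
for category, gap in gaps.items():
    print(f"{category}: {gap:+.0%}")
print(round(chi_square, 2))  # 14.29
```

With three brackets there are two degrees of freedom, and a statistic above about 5.99 is significant at the 5% level, so this made-up sample skews older than the benchmark. If SciPy is available, `scipy.stats.chisquare(f_obs, f_exp)` computes the same statistic along with a p-value.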

You can see here, though, that judging sampling bias relies heavily on intuition. What characteristics are relevant to your particular survey? What should you look at when judging whether your survey sample is biased, based on the issues you’re trying to understand?

What does this mean for you?

First, don’t trust just any survey about your customers. Regardless of the topic or sample, analysts must consider how the sample selection may be biased, and what differences may exist between those who did and did not complete the survey. Further, these differences must be considered in light of the client’s final research question — it could be, for whatever reason, that differences between those who did and did not complete the survey aren’t meaningful to you and won’t affect your key takeaways.

Second, be vigilant in collecting your customers’ contact information. When my team conducts a survey, we try to include as many customers as possible in the survey sample. If we are surveying a coffee shop’s customers, for example, about their willingness to pay more for a particular menu item, our results come with big caveats if we are only able to survey customers that have, say, a rewards account at the coffee shop. Those customers are going to differ from customers who do not belong to the rewards program. They may love the coffee more than non-members and be more willing to pay extra for the menu item in question. Or they may feel betrayed, as rewards members, being asked if they’d pay extra.

There’s really no way of knowing for sure that the sample isn’t biased unless we survey every single customer, and we get closer to that if we have contact info for as many customers as possible.

Finally, include demographic variables in your customer surveys. This way, you can compare the makeup of your sample (the “average” respondent) to what your intuition tells you is your “average” customer. Useful variables might include gender, age, household income, job, family size, and/or other behaviors and characteristics relevant to your product or service. Asking for these demographic variables also lets you cut and segment your results along them, perhaps exposing opportunities to up-sell or improve your targeted marketing campaigns.
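Once demographic variables are in the data, both uses mentioned above, checking the sample’s makeup and segmenting results, come down to grouping responses by a variable. Here is a minimal sketch with hypothetical field names, using a coffee-shop willingness-to-pay question as the outcome.

```python
from collections import defaultdict
from statistics import mean


def sample_makeup(responses: list[dict], field: str) -> dict[str, float]:
    """Share of respondents in each category of a demographic field."""
    counts = defaultdict(int)
    for r in responses:
        counts[r[field]] += 1
    return {category: count / len(responses) for category, count in counts.items()}


def segment_means(responses: list[dict], field: str, outcome: str) -> dict[str, float]:
    """Average of an outcome question within each category of a demographic field."""
    groups = defaultdict(list)
    for r in responses:
        groups[r[field]].append(r[outcome])
    return {category: mean(values) for category, values in groups.items()}


# Made-up responses: age bracket and the most each respondent would pay for a latte.
responses = [
    {"age_group": "18-34", "max_latte_price": 4.50},
    {"age_group": "18-34", "max_latte_price": 5.00},
    {"age_group": "35-54", "max_latte_price": 5.50},
    {"age_group": "35-54", "max_latte_price": 6.00},
    {"age_group": "55+", "max_latte_price": 5.00},
]

print(sample_makeup(responses, "age_group"))
# {'18-34': 0.4, '35-54': 0.4, '55+': 0.2}
print(segment_means(responses, "age_group", "max_latte_price"))
# {'18-34': 4.75, '35-54': 5.75, '55+': 5.0}
```

The first result is what you’d hold up against your sense of your typical customer. The second is the kind of cut that can surface a segment worth targeting.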

One more thing…

All this might sound hopeless. Unless we survey every single customer, we can’t be 100% sure our sample isn’t biased one way or another.

But as mentioned above, intuition really is key to knowing whether your sample is biased and how that might affect your key findings and inferences. Our best work happens with clients who understand their customers in a way that supplements whatever survey they’re trying to run. If a business owner knows from experience, for example, that his rewards program customers are more loyal and generally willing to pay more for his products, we can account for that when drawing inferences from the survey results. If he knows from experience that previous price increases have not affected his business with non-rewards members, we can account for that when drawing inferences from the survey results.

Surveys are typically quantitative market research, but quantitative data must be interpreted through the lens of experience and subject matter expertise. When it comes to your customers, you probably know more than any single survey can tell you.