You’ve fielded a market research survey.

For weeks, you wrote and rewrote your survey questions. You paid for a SurveyMonkey license and spent hours learning how to program your survey. You leveraged dozens of industry connections to get survey answers — a hard-earned set of 300 respondents.

Getting here wasn’t easy.

But unfortunately, you’re not done. Before drawing findings from your survey, you need to clean your data. This is absolutely essential for maintaining the quality of your research. Here are the three most important things to look for when cleaning your survey data.*

Speeders

These are respondents who took your survey too fast. Identifying responses like this is based on the median time spent taking your survey.

The rule of thumb here is to disqualify responses from anyone who completed your survey in less than half the median time. There are some exceptions, like if your survey includes a logic branch that had certain respondents answering just a few questions. But in general, anyone going more than twice as fast as the average respondent is likely someone who sped through the survey without giving the questions much thought.

You can identify speeders by downloading your survey data into Excel, then subtracting the “time completed” from the “time started.” Most survey platforms I’ve used record this information. If yours doesn’t, you may have to skip this flag, but be sure to check for the following two.

In general, not more than 10% of your survey sample should be discarded for speeding.

Flatliners

These are people who picked the same answer to every (or most) multiple-choice question in your survey. For example, say that you asked four open-ended questions about price (like a Van Westendorp question set). A flatliner answered the same thing for each of these four questions (say, $10).

If you notice that more than 10% of your survey respondents are flagging on flatlining, you may want to look more closely at the questions you’re including in your scan for flatliners. It may be that some respondents only appear to be flatlining, when they are, in fact, giving honest answers. This may have to do with the way you’ve asked your questions (for example, if you placed student, 18–22 years, unmarried, and no kids all as the first answer options to questions asking about employment, age, marital status,and children).

You can only determine this by looking at respondents’ individual answers — but don’t bother with this if less than 10% of your survey sample is flatlining.

Gibberish and Contradictory Answers

These types of responses can be harder to spot. They require you to look, line by line, at answers to open-ended questions in order to identify ones that 1) are gibberish (i.e., dk3i8sw) and/or 2) don’t correspond to other answers in that row. For example, if someone says they are single at the beginning of the survey, then mention their “wife” or “husband” in a later open-ended question, delete that respondent. They are not being honest, and you want data that you can stand on.


If you designed your survey well, your data cleaning shouldn’t result in discarding more than 15% of your responses. If you’re worried you’re throwing out too many, take a closer look at the ones you’re throwing out. Consider keeping a few that give other indications of being good, honest answers or the ones that flagged on only one of the three criteria listed above.

*PeopleFish analysts clean every one of our client’s survey datasets according to the criteria set forth in this article. Data cleaning is a standard piece of our survey project offering. Nevertheless, it’s helpful for researchers to understand how survey data is cleaned, for their own knowledge and, of course, should they want to conduct their own market research survey independently of a market research firm like PeopleFish.