The odds are good that every hot take on political polls you’ve ever read is wrong.
That’s not to say that the various writers of opinion columns everywhere are lying to you. They just don’t hold advanced degrees in statistics, and they usually haven’t spent much time, if any, working at a polling outfit or anywhere equivalent, so they all end up making the same mistakes. I understand where they’re coming from.
But their misconceptions get carried over into the wider world. That’s what I want to correct.
1. Polling Numbers Aren’t Numbers
This is perhaps the most important one, and the most frequently forgotten: polls don’t give you numbers, they give you ranges.
In any survey there is inherent statistical uncertainty, because you haven’t put the questionnaire in front of every voter in the U.S. We handle that with a margin of error: when Gallup reports that Trump has a 41% approval rating, what they really mean is “Trump’s approval rating is about 41%, give or take a bit”. You can calculate that bit from the size of the sample. Most polls aim for at least 1,000 respondents, which works out to about a 3% margin of error.
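If you want to see where that 3% comes from, here’s a minimal sketch of the standard 95%-confidence arithmetic. It assumes a simple random sample, which real polls only approximate, so treat the output as illustrative rather than as any particular pollster’s method:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p from a simple random sample of n people."""
    return z * math.sqrt(p * (1 - p) / n)

# A 1,000-person poll reporting 41% approval:
print(f"41%, give or take {margin_of_error(1000, p=0.41):.1%}")  # roughly 3 points either way
```

Weighting and other real-world complications widen that band a bit, but the square-root-of-the-sample-size logic is the heart of it.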
This is important, because we tend to only care about who wins or loses a race. Say, for instance, I gave you two polls before the 2016 election: one predicted that Hillary would win by about a point, and the other predicted that Trump would win in a landslide. Most people, including most reporters, would instinctively say that the latter was more accurate: it successfully predicted Trump’s win. But any pollster will tell you that instinct is wrong: the former isn’t just more accurate, it’s a lot more accurate. After all, it got the actual result within its margin of error, while the Trump-landslide poll was way off the deep end.
2. Polling Numbers Aren’t the Same as What the Respondents Said
This one is a bit less well known. Let’s say I have a poll of 1,000 people, and it reports that Trump has a 41% approval rating. That means 41% of the people in the sample supported him, so 410 people said “yes, I approve of Donald Trump’s performance”, right?
Wrong! You see, for a poll to work, you need an accurate sample of the population, which means a representative slice of that population needs to pick up the phone and answer the questions. But most people don’t answer their phones when a random number calls. As of 2017, the average response rate for a phone survey was under 10%.
It gets worse. There are two ways to run a phone poll: you can have a computer dial random numbers and record responses through keypad presses, or you can hire an entire phone bank of people to call random numbers all day. One of those is significantly cheaper than the other, so most polls are now automated. But automated call systems are prohibited by law from calling cell phones. If you don’t have a landline, you will never hear from an automated polling outfit, and it’s impossible for you to end up in the sample.
The population of people with landlines does not look much like the general U.S. population. They skew older, whiter, wealthier, more conservative, and more rural. If we just left the sample as is, every poll would look like that Trump-landslide poll from the first section.
Pollsters correct for that by weighting the sample: some people’s responses count for more than others, depending on how large a share of the electorate their particular demographic blend is likely to be and how many people like them actually ended up in the sample. This also introduces the potential for human error: you don’t know how many millennials are going to be part of the electorate next year, and neither do Pew and Gallup. They take an educated guess, and if their guess is wrong, their poll will be too.
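Here’s roughly what that weighting looks like in miniature. The age categories, target shares, and counts below are invented purely for illustration; real pollsters weight on far finer-grained cells and full turnout models:

```python
# Hypothetical example: weight respondents so the sample matches a guessed electorate.
target_share = {"18-29": 0.16, "30-44": 0.24, "45-64": 0.36, "65+": 0.24}   # pollster's guess at the electorate
sample_count = {"18-29": 80, "30-44": 200, "45-64": 400, "65+": 320}        # who actually answered, out of 1,000

n = sum(sample_count.values())
weights = {g: target_share[g] / (sample_count[g] / n) for g in target_share}
print(weights)
# The 80 respondents aged 18-29 each get a weight of 2.0: every one of them
# counts as two people, because the poll reached only half as many young
# voters as the pollster thinks will actually show up.
```

Notice that the weights depend entirely on the pollster’s guess about the electorate: change the targets and the same raw answers produce a different headline number.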
The good news is, they’ve gotten very good at guessing. The bad news is, the combination of low response rates and spotty coverage, particularly of black, Latino, and young respondents, means that the few people you do reach in those categories can count for so much that random noise throws off the whole sample. In 2016, one 19-year-old black man from Illinois who was all-in on Trump was weighted so heavily that he threw off the LA Times tracking poll to the point that it predicted Trump would win by 5 points.
And if you’re thinking that they called it right and the other polls were wrong because Trump DID win, remember that Hillary won the popular vote by 2 points. The tracking poll was trying to measure the national popular vote, not the electoral college margin, and it was off by more than all the polls that “got it wrong”.
3. Never, EVER trust the crosstabs
If you’re reading an op-ed, and they quote a poll as saying that “Hillary’s support among blacks is comparatively lower than Obama’s”, or something similar, feel free to stop reading.
When polling outfits release their results, they don’t just give you the top line numbers. They also break down the results by race, gender, education, political party, etc. These are called “crosstabs”, and they generate them exactly how you’d expect: they slice up the sample to include only the subset that shares that trait, and look at the results for just that group.
The problem is that these crosstabs are a lot smaller than the full sample. Even the largest split, men vs. women, will only be about half the size, and that means a much bigger margin of error. It doesn’t grow in proportion to how much smaller the group is, either. That shit ain’t linear: the error scales with the inverse square root of the sample size, so it creeps up for the big splits and explodes for the small ones. And when you’re talking about even smaller groups like Republicans, men without a college degree, or black women ages 18-29, it gets even worse (see the sketch below).
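To put rough numbers on that, here’s the same back-of-the-envelope formula as before, run on some invented subgroup sizes (the group counts are assumptions for illustration, not any real poll’s composition):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    # Same 95% formula as before; it ignores weighting, which only makes things worse.
    return z * math.sqrt(p * (1 - p) / n)

for group, n in [("full sample", 1000), ("one gender", 500),
                 ("Republicans", 300), ("black women 18-29", 40)]:
    print(f"{group:>20}: give or take {margin_of_error(n):.1%}")
# full sample ~3%, half the sample ~4.4%, the smallest slice ~15%
```

A 15-point margin of error on a crosstab means a swing from one poll to the next tells you essentially nothing.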
Combine that with the hefty overweighting of certain populations because they’re less likely to be reached by the pollsters, and as much as 15% of that crosstab analysis could come from one person’s opinion.
The only exception to this, and I mean the only one, is if the same crosstab finding holds across large numbers of polls over a long stretch of time. We know, for instance, that Trump’s approval rating is still very high among Republicans, because even with as much as a 15-20% margin of error per poll, enough polls have said so, for long enough, that we can be sure of it. But small fluctuations in the crosstabs, between different polls, over a short amount of time? Meaningless. They just give the pundits something to agonize over between Trump tweets.
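The reason aggregation rescues the crosstabs is the same square-root math working in your favor: pooling many polls of the same subgroup behaves roughly like one much bigger poll. A quick sketch, again with invented subgroup sizes and the simplifying assumption that the polls are independent (real polls only approximate that):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

republicans_per_poll = 300   # assumed subgroup size in a single 1,000-person poll
polls = 25                   # a few months' worth of surveys

print(f"one poll:  give or take {margin_of_error(republicans_per_poll):.1%}")
print(f"{polls} polls: give or take {margin_of_error(republicans_per_poll * polls):.1%}")
```

One poll’s Republican crosstab is mush; a season’s worth of them pointing the same way is a finding.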
4. National Polls are National
Despite what the entire news media would have you believe, the polls did moderately well in 2016. According to FiveThirtyEight, on average they predicted Hillary would win by about 3.5 points; in fact, she won by about 2.1. An error of 1.4 points is high, and when it shows up across a whole polling average it’s indicative of a systemic problem, but it’s well within acceptable margins. You wouldn’t, for instance, be angry if she won by 6 points and the polls had predicted she’d win by 7.4.
The problem is that what the polls were measuring (the national popular vote) and what we all really cared about (the vote margins in the handful of swing states that would put either candidate over the top) weren’t the same thing. They were correlated, in the sense that in most presidential elections the candidate who wins the popular vote also wins the electoral college, but they aren’t identical. They can go in different directions.
That, of course, is what happened in 2016. Clinton won a close but decisive victory in the popular vote by racking up huge margins in blue states and high turnout in deep-red states like Texas, while Trump won the exact right mix in the right places to eke out a win in the electoral college. The polls were largely right. But we were wrong in our interpretations of them.
In other words, be very careful to understand what question each poll is actually answering. It may not be the same as the one you’re asking.
Fin
As a general rule, you probably put too much faith in the exact numbers a poll reports to you. Polls don’t lie (usually), but the people interpreting them for you can, and you always have to be aware of the inherent uncertainty in any survey. Pollsters stake their livelihoods on the quality of their predictions, and they are very, very good at what they do. But we can make their job easier by learning how to read what they produce.