1 in 5: a VERY deep dive into campus sexual assault statistics

1 in 5 women will be sexually assaulted at some point during their time in college. It’s a shocking number, one that’s led to a lot of agonizing and discourse across the political spectrum and a variety of reforms put in place on campus. As it should. There is no society in which a statistic like that should be acceptable.

It’s also led to a lot of scrutiny from people who do not want to believe that sexual assault is such a problem in our universities. These people, mostly conservatives, point to a wide variety of perceived flaws in the original study to discredit its findings. They point to other studies with different methodologies that contradict the number. They accuse the authors of fudging the data to promote a political agenda. Debunking this study is a minor pastime in the right-wing media bubble, like shuffleboard or badminton. But do their critiques hold water? What’s the truth buried in the data?

Before we begin, two warnings: I’m not going to be double-checking their regression analyses here, but there’s no way to talk about this without covering at least a little math. So if you’re one of these people who can’t handle numbers, now would be a good time to leave. More importantly though, I’m gonna be touching on some heavy shit here. There won’t be any graphic descriptions or stories. This is all numbers. But if that isn’t your thing, don’t feel bad noping out of this one.

1. The Survey

Generally speaking, when people cite “1 in 5”, they’re referring to this study by the Department of Justice. There are a lot of others, but all the ones I’ve seen use essentially the same methodology and weighting and reach similar results, so I’m gonna focus on this one.

Basically, they took two unnamed large universities, one in the South and one in the Midwest, and emailed every student there the same survey asking about their history with sexual assault. They broke it down between forced (i.e. violent) and incapacitated (i.e. drunk) sexual assault, while excluding suspected but not confirmed accounts in the latter category. So already, there’s one way the numbers could be HIGHER than currently reported: not every victim is gonna be sure about what happened. They also looked at trends in attempted vs. completed, and a number of other things.

After some weighting, they found that 19% of women reported experiencing an attempted or completed sexual assault during their time in college: 12.8% for attempted, 13.7% for completed. If you read YouTube comments (and you shouldn’t), you’ll see people use those numbers to argue that the study is somehow fraudulent: 12.8+13.7=26.5, not 19.0. Because apparently you can’t experience both. This is another way that it understates the total rate of sexual assault at universities, though it wouldn’t change the top line number: they only ask if someone has experienced these things, not how often. This is common across most of these surveys.
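For anyone who wants to check that arithmetic, here is a quick sketch. It takes the three percentages quoted above at face value and uses inclusion-exclusion to back out the share of women who reported both an attempted and a completed assault. The variable names are made up for illustration; the only inputs are the three figures from the study.

```python
# Figures quoted above, in percent of women surveyed.
attempted = 12.8
completed = 13.7
either = 19.0  # attempted OR completed (the "1 in 5" headline)

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
both = round(attempted + completed - either, 1)

print(f"Reported both attempted and completed: {both}%")  # 7.5%
```

So the "missing" 7.5 points aren’t missing at all: they’re the overlap between the two groups.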

There are other interesting findings in the data, some more surprising than others. It’s not uniformly distributed through time: there’s a distinct “rape season”, roughly corresponding with fall semester. It peaks in September-October. More than half of all sexual assaults are committed on Friday or Saturday, which makes sense since the most common location for it is at a party. All of those are more pronounced for incapacitated sexual assault than forced, by the way.

The highest reported percentage is among seniors. There’s a credible argument that you should only be looking at them, because counting freshmen in prevalence rates across the entirety of the college experience seems dumb, but there’s a real risk of people forgetting about incidents earlier in their studies, or becoming less willing to count it as the victimization fades. Freshmen and sophomores are the most likely to experience this, so it’s important to include them. And before you say “who the fuck forgets being raped”, only a QUARTER of incapacitated sexual assault victims classified their experience as such in the survey.

That’s roughly what it covers. I’m going to move on to the flaws and tradeoffs in the study in a moment, but first I want to point out something that really bothers me. You might have heard some variation of “1 in 5 women and 1 in 27 men” in one of these articles or consent workshops. That’s not what the study finds. They found that 6.1% or roughly 1 in 16 men had been a victim of sexual assault. I’m not sure where the 1 in 27 number comes from, but it’s exactly what would happen if you used this study as a source, then only counted completed sexual assaults for men and both attempted and completed assaults for women. If anybody knows better, please send me sources because I want to still have faith in humanity.
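Converting between a percentage and a “1 in N” figure is just a reciprocal, but it’s easy to lose track of when the two forms get mixed. A minimal sketch, with hypothetical helper names, using only the figures quoted above:

```python
def one_in_n(rate: float) -> float:
    """Turn a prevalence rate (as a fraction, e.g. 0.061) into the N of '1 in N'."""
    return 1 / rate


def rate_from_one_in(n: float) -> float:
    """Turn the N of '1 in N' back into a prevalence rate (as a fraction)."""
    return 1 / n


print(round(one_in_n(0.061), 1))              # 16.4, i.e. roughly 1 in 16 men
print(round(one_in_n(0.19), 1))               # 5.3, i.e. roughly 1 in 5 women
print(round(rate_from_one_in(27) * 100, 1))   # 3.7 (percent implied by "1 in 27")
```

In other words, “1 in 27” implies a rate of about 3.7%, well below the 6.1% the study reports for men.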

2. Shortcomings in the Dataset

While this study is good, it’s not perfect. There are several real issues with how it handles the numbers, and where it draws them from, that should be concerning to anyone relying on them. That’s not to say it’s bullshit: these flaws are natural byproducts of well-intentioned decisions on the part of its authors. If they had done things differently, they would have just had other problems.

There is no way to get a perfect survey on a subject like sexual assault. Anyone who claims they have one isn’t arguing in good faith.

First off, let’s talk about the dataset. I’ve already snuck in one issue with it: the choice of universities. The authors only looked at two institutions in the country, and while they were geographically distinct, they were demographically similar. They were both large, with 30,000 and 35,000 students respectively. The results may therefore not be representative of the experience of significantly smaller universities. While there are counterparts which HAVE looked at these colleges and found similar numbers, with a smaller college comes a smaller sample to draw on, resulting in noisier data. You can mitigate this somewhat by including even more universities, but because of the significant overhead involved, most papers either use a smaller sample or make do with a lower response rate. More on that later.

The other issue is that they excluded all students under the age of 18. They kinda had to: otherwise they’d need to get parental consent for those people to respond. I’ve heard credible arguments that this exclusion could bias the results towards overestimating AND underestimating the prevalence. It’s hard to say. Either way, their absence is significant: between them and other groups excluded from the study, only half the students enrolled at either university were ever gonna be included in the data. With no information on the other 50% at all, it’s hard to say what effect, if any, this might have.

3. Shortcomings in the Procedure

The authors of this study didn’t fly out to these colleges and personally interview over 6,000 students. They sent each participant a survey via email and had them fill it out online. Data collection of that form tends to get a low response rate. After all, how likely are you to respond to a random email asking you to fill out a questionnaire? And indeed, that’s what we see: response rates of about 40% at both universities, higher for women and lower for men.

That would be fine, if who responds to a survey and who doesn’t were random. But we know that isn’t true. Racial and ethnic minorities consistently under-respond to polls of all forms, and online polls in particular tend to include more people for whom the subject matter is relevant. That factor can lead to significant and at times catastrophic overestimates of relatively rare phenomena.

Put another way: if you have been sexually assaulted, you are more likely to be interested in a survey about sexual assault than if you have not been. You’re more likely to read the email, and you’re more likely to fill it out.

There are a lot of conflicting factors here. Victims of sexual assault may be less willing to answer, out of fear that they might be shamed or to avoid having to answer uncomfortable questions. There are any number of ways for the topic to be important to you without being an actual victim. You might know a friend, for instance, or simply be engaged with the topic.
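To see how much this can matter in either direction, here is a toy calculation. Every number in it is hypothetical: the true rate and the response rates are invented purely to show the mechanism, and `observed_rate` is an illustrative helper, not anything from the study.

```python
def observed_rate(true_rate: float, victim_response: float, nonvictim_response: float) -> float:
    """Share of survey respondents who are victims, given who bothers to respond."""
    victims_responding = true_rate * victim_response
    nonvictims_responding = (1 - true_rate) * nonvictim_response
    return victims_responding / (victims_responding + nonvictims_responding)


true_rate = 0.10  # hypothetical true prevalence, NOT a figure from the study

# Victims more likely to respond than nonvictims: the survey overestimates.
print(round(observed_rate(true_rate, 0.60, 0.40), 3))  # 0.143

# Victims less likely to respond than nonvictims: the survey underestimates.
print(round(observed_rate(true_rate, 0.30, 0.40), 3))  # 0.077

# Equal response rates: the survey recovers the true rate.
print(round(observed_rate(true_rate, 0.40, 0.40), 3))  # 0.1
```

The direction of the error depends entirely on which group responds more, which is exactly the thing nobody can observe directly.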

But there are some aspects of the study that suggest there was an effect here. The response rates for men and women were markedly different: 42% for women, and only 33% for men. We also know that men are less likely to be victims of sexual assault. In fact, this is a consistent pattern across the board for studies that found a result somewhere in the 1-in-5 range. They’re mostly online surveys sent to students, and they almost always have a higher response rate among women than men.

Here’s where it gets complicated. There are ways to account for non-response bias, at least partially. The scientists who put this study together used three of those ways.

First, they compared the demographic information in their survey respondents to that of all people who did not respond, and to that of the university as a whole. Wherever there was a demographic discrepancy, they gave more weight in the results to people underrepresented in the survey. For instance, nonwhite students were less likely to respond, so they counted the answers from nonwhite students who DID respond more.

They weighted by four factors: which university they were in, their gender, their year of study, and their race/ethnicity. That list is pretty sparse. Most surveys would get a lot more demographic info on each person, and then figure out what to weight from there. The problem is that it’s hard to balance that extra information with guarantees of anonymity. Especially with a topic as fraught as sexual assault, it’s crucially important that participants don’t feel their answers might get connected back to them. Even without the ethical concerns, it can lead to lower response rates among people who HAVE been assaulted. Surveys without the same dedication to anonymity report significantly lower numbers, sometimes below 1%. So this is kind of a damned-if-you-do situation.
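Here’s roughly what that kind of weighting looks like, stripped down to a single factor. This is a sketch of generic post-stratification, not the authors’ actual procedure: they weighted on four factors, and every count and share below is invented for illustration.

```python
# Hypothetical share of each group in the full student body.
population_share = {"white": 0.70, "nonwhite": 0.30}

# Hypothetical survey results: group -> (number of respondents, number reporting assault).
respondents = {"white": (800, 160), "nonwhite": (200, 30)}

total_n = sum(n for n, _ in respondents.values())

# Weight = (share of population) / (share of respondents).
# Groups that are underrepresented among respondents get a weight above 1.
weights = {
    group: population_share[group] / (n / total_n)
    for group, (n, _) in respondents.items()
}

unweighted = sum(yes for _, yes in respondents.values()) / total_n
weighted = (
    sum(weights[g] * yes for g, (_, yes) in respondents.items())
    / sum(weights[g] * n for g, (n, _) in respondents.items())
)

print({g: round(w, 3) for g, w in weights.items()})  # {'white': 0.875, 'nonwhite': 1.5}
print(round(unweighted, 3))  # 0.19
print(round(weighted, 3))    # 0.185
```

In this made-up example the underrepresented group reports a lower rate, so counting its answers more pulls the estimate down slightly. With different inputs it could just as easily go up.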

Second, they used something called the “continuum of resistance” model. Basically, it says that whether or not someone is willing to answer a survey isn’t a binary thing: the less likely you are to respond to it, the more likely you are to put off doing it. In other words, the demographics of the people who took the longest to fill out the survey probably match those of the people who didn’t fill it out at all, and their responses are probably similar.

This effect doesn’t always show up, but it looks like it did here. Nonwhite students were more likely to not answer the questions, and also (somewhat) more likely to be a late responder. They found no significant difference in answers between late and early responders, which suggests that whatever nonresponse bias existed was fairly small.
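One common way to check for a difference like that is a two-proportion z-test. I’m using it here as a stand-in: the source doesn’t say which test the authors ran, and the counts below are hypothetical, chosen only to show what “no significant difference” looks like in practice.

```python
import math


def two_proportion_z_test(yes_a: int, n_a: int, yes_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = yes_a / n_a
    p_b = yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical counts: early responders vs. late responders reporting an assault.
z, p_value = two_proportion_z_test(yes_a=600, n_a=3000, yes_b=190, n_b=1000)

print(f"z = {z:.2f}, p = {p_value:.2f}")
print("significant at 5%" if p_value < 0.05 else "no significant difference at 5%")
```

With these invented counts (20% vs. 19%), the gap is well within what you’d expect from chance, which is the kind of result that suggests late responders aren’t answering differently.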

The third method they used is less reliable. Essentially, they did a follow-up survey of all people who didn’t respond to the first one (note: they still knew who did and didn’t respond because respondents got a small cash award and they could see who collected it, though not which responses corresponded to which person), and asked them why they didn’t respond. Most nonrespondents said they’d either never received the emails or weren’t sure if they had, and only a very small number said they didn’t respond because they hadn’t experienced sexual assault.

Personally, I wouldn’t have even included this section in the study. The response rate for this follow-up was abysmal: barely 10%, compared to nearly 40% for the main survey. It also will exhibit the same kinds of biases the first one did. For instance, people who would be interested in the first study but just didn’t see it in their inbox will be more likely to respond to the second one than people who weren’t interested at all. I mean, do you want to fill out a questionnaire about why you don’t want to answer another questionnaire?

All in all, the authors of this study were meticulous and honest with their findings. They crafted their study to prioritize the privacy and comfort of their respondents, they were forthcoming about potential sources of error, and they made good-faith efforts to adjust for those sources wherever they could. I’ve read crappy studies, and I’ve read fraudulent studies. This one looks nothing like those.

However, there is only so much the authors can do to adjust for these factors. Their selection of methodology inherently comes with certain errors that are nearly impossible to correct. And while there is an argument that sexual assault victims would also be less likely to respond due to discomfort, the fact that there are many more nonvictims than victims means that even if that were true, the numbers would still probably be an overestimate. While the findings here are valuable, they are not gospel, and it’s likely they are inadvertently highballing it.

3. The Other Options Suck Too Though

Online surveys of university students are not the only way to answer this question. Conservatives often cite two other studies, both done by the government. The first is the FBI Uniform Crime Report, which isn’t a survey at all. It’s a thorough accounting of every crime reported to the police in a given year. They generally find somewhere around 100,000 reported rapes to have occurred each year, total, implying an almost minuscule percentage on campuses.

If you’ve made it this far into the post, you’ve probably already seen the problem with that sentence. The reporting rate for rape is really, really low. Only about a third of rape victims inform the police. And it gets worse. Until 2013, the UCR used the word “forced” in their definition of rape. If it wasn’t forced, it wasn’t counted. That would exclude many cases of coerced sex and even some cases of violent, forced sex (for instance, the people reporting it to the FBI won’t necessarily count marital rape, because people are awful).

One of my first jobs ever was data prep for the sexual assault division of my local District Attorney’s office. Even within the prosecutorial community, the FBI numbers are seen as comically low. We didn’t use them.

Instead, we relied on the National Crime Victimization Survey, the other source conservatives like to draw on. It accounts for the low reporting rate because it’s an actual survey of a randomized sample. It’s done through in-person or phone interviews, both of which significantly reduce the interest-bias you find in their online counterparts (you’re more likely to answer the questions when there’s a person on the other end). And it finds that roughly half a million rapes occur each year. More than the UCR, but it would still be less than 1% for women on campus.

It has its own problems, though. The NCVS generally just asks “have you been raped?” or some variant, which we know from countless other studies doesn’t cover all or even most sexual assault victims. It’s likely that the NCVS is significantly lowballing the numbers as a result. They’ve tried to adjust for that in recent years, but most researchers outside the Bureau of Justice Statistics don’t think they’ve done enough, and I’m inclined to agree. Additionally, because the NCVS is explicitly done by a government agency, survivors will be less likely to respond to them for the same reasons they don’t report their assaults to the police. Think of it as the other side of the 1-in-5 studies. They are equally methodical, but where one errs on the side of overestimating when there’s a tradeoff they have to make, the other errs on the side of underestimating.

There are other studies, using some combination of in-person and phone interviews, online results, and other metrics, and different ways of determining whether or not a subject has been assaulted. Their results are all over the map, but tend to fall somewhere in between the NCVS and the 1-in-5 study. They also tend to fall on the high end of that range, so the real number is probably closer to 1-in-5 than to the <1% the NCVS reports. It could be 10%. It could be 15%. We can’t be sure.

4. Why We Don’t Have a Perfect Study

By now, you might be thinking “okay, so why don’t we pull together some academics, do in-person interviews at a few dozen representative universities, and get some unimpeachable numbers?” After all, it’s not like any of the issues of these studies are inherent. There’s no law that says only the government can use direct sampling or you have to do everything online if you’re talking to college students.

The real obstacle here is money. Online surveys are prevalent because online surveys are cheap. Email is free, so the main expenses are a few grad students to crunch the numbers, the salary of whoever makes sure the study is ethical, and whatever incentive you give people for participating. That 1-in-5 study probably cost about $75,000.

For in-person or phone interviews, you have to pay people to ask the questions. The more folks in your sample, the more people you have to pay for longer. Then you have to vet those people to make sure they know what they’re doing and won’t influence people’s responses. And you have to pay for travel to make sure those people get to the various campuses. And you have to figure out how to turn their responses into data for the computer, which means either expensive Scantron machines or paying more people for data entry, and then there are the privacy concerns, because HTTPS doesn’t exist in the outside world, so somebody has to oversee the data entry….

You get the idea. All told, a study like that one could easily set you back $15 million. That’s more than the total budget your average Sociology department gets in a year.

There are also ethical concerns. Direct interviews may have a higher response rate, but they can also take an emotional toll on sexual assault victims who will have to discuss their trauma with a complete stranger. Science is not done in a vacuum (except for Astronomy), and you have to be careful not to hurt the very people you are studying in the process of learning from them. Additionally, $15 million is not a small amount of money to throw at a problem. It’s hard to justify spending that much on a fact-finding mission instead of, for instance, paying for every untested rape kit in the state of California. There are better ways to allocate our resources here.

5. Why Is This What You’re Fixated On

These numbers get complicated, but at this point it’s fairly clear that the 1-in-5 statistic is not as reliable as we assume it is. It’s probably too high (note: while it’s less likely, it could also be too low), and when accounting for systemic errors it’s probably somewhere in the 1-in-10 to 1-in-6 range. Where you think it lands depends a lot on what specific choices your preferred researchers made when handling the technical details of their study. Even the 1-in-5 authors believe in a much more nuanced take on the data.

That’s a good thing. Your average discourse in the media and in our political forums will always be more simplistic than the careful quantitative analyses of peer-reviewed journals. Scientists and scientific studies will disagree with each other based on their particular decisions over their particular methodologies. And while we don’t know for sure what the percentage is, we’ve narrowed it down quite a bit.

Specifically, we’ve narrowed it down to “too damn high”. 1 in 5 is too damn high. 1 in 10 is too damn high. 1 in 20 is too damn high. Even the more conservative studies outside the NCVS give staggeringly high totals of sexual assaults in our universities. We may not know exactly, quantifiably how bad the problem is, but we know that it’s bad, and warrants immediate action.

But the critics of this study seem to think otherwise. They seem to think that if there are flaws in this paper, then there’s no problem at all. They believe that because the studies we cite can’t guarantee us total certainty, there is no value in what they say. It is the worst sort of scientific illiteracy. Even if you allow for significant errors, and if anything I’ve been too harsh on the original paper here, the numbers would STILL be staggeringly high. You could assume that there was not a single sexual assault victim in either of the two universities who didn’t fill out that survey, and you’d STILL find that about 3% of women were assaulted during their time there.
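The inputs behind that floor aren’t spelled out, so treat the following as one plausible reconstruction rather than the actual calculation. It uses figures quoted earlier in this post (the 42% response rate for women, the roughly 50% of students eligible for the survey, and the 19% and 13.7% rates), and it assumes, crudely, that the weighted rates can be multiplied straight through. `floor_rate` is a hypothetical helper.

```python
def floor_rate(rate_among_respondents: float, response_rate: float, eligible_share: float = 1.0) -> float:
    """Worst-case prevalence if every nonrespondent and every excluded student is a nonvictim."""
    return rate_among_respondents * response_rate * eligible_share


response_rate_women = 0.42  # response rate for women, quoted above
eligible_share = 0.50       # roughly half of enrolled students were eligible, quoted above

# Attempted or completed (19%), counting only students eligible for the survey.
print(round(floor_rate(0.19, response_rate_women) * 100, 1))                   # 8.0

# Same, but also assuming zero victims among the excluded half of students.
print(round(floor_rate(0.19, response_rate_women, eligible_share) * 100, 1))   # 4.0

# Completed assaults only (13.7%), with both assumptions.
print(round(floor_rate(0.137, response_rate_women, eligible_share) * 100, 1))  # 2.9
```

Depending on which of those assumptions you stack, the floor lands somewhere between about 3% and 8%. Even the most absurdly generous version doesn’t get you to zero.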

The science of accounting for sexual assault on campus is tricky and imprecise. There is a lot of room for careful critique of the numbers we have, and many questions for which we don’t yet have answers. But don’t let those uncertainties become a smokescreen for what we do know.