The new SAT will occasionally ask you questions about experimental design: whether the results of a study conducted by some students, for example, can be generalized to an entire population, or whether some experimental intervention has a causal impact. These questions will not be rocket science, and will not require any math at all, even though they’re in the math section. They will require some critical thinking and careful reading, though.


They will also require you to be on the lookout for randomization. Generally speaking, the more randomized an experiment is, the stronger the conclusions that can be drawn.

If you’re doing research, and you want to be able to generalize your findings over an entire population, then you have to randomly select the subjects for your experiment from that entire population. Say you want to know, on average, how likely a new driver in the United States is to have an accident within one year of getting his or her driver license. If you want to be able to generalize your findings to the entire population of new drivers in the United States, then you need to do a random selection from that whole population. That’s not going to be easy to do, of course, but good research is hard! If you want to generalize about all new drivers in the US, you’re going to have to find someone who has the data you need. Maybe start calling the Department of Motor Vehicles in every state, or car insurance companies.

If you tried to take the easy way out and just went to your local shopping mall and asked the first hundred 17-year-olds you saw whether they’d been in an accident in the first year of having their licenses, then you could really only generalize your findings to the population of new drivers who visit that mall. Maybe going to that mall requires driving on a highway, which not every new driver in the country does. Maybe that mall is in a city, where new drivers are more likely to get into a fender bender than new drivers who live in rural areas. Maybe people who shop in malls are more likely to get into car accidents than people who do most of their shopping online. You see?
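If it helps to see the mall problem with numbers, here is a minimal Python sketch. Everything in it is made up for illustration: the population size, the 10% rural and 30% city accident rates, and the idea that the mall only draws city drivers are all assumptions, not real data.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 new drivers (True = accident in year one).
# Assumed rates: 10% of 80,000 rural drivers, 30% of 20,000 city drivers.
rural = [True] * 8_000 + [False] * 72_000
city = [True] * 6_000 + [False] * 14_000
population = rural + city


def accident_rate(drivers):
    """Fraction of the given drivers who had an accident."""
    return sum(drivers) / len(drivers)


true_rate = accident_rate(population)

# Random selection from the whole population of new drivers.
random_estimate = accident_rate(rng.sample(population, 1_000))

# "Mall" selection: assumed here to reach only city drivers.
mall_estimate = accident_rate(rng.sample(city, 1_000))

print(f"true rate:       {true_rate:.3f}")
print(f"random sample:   {random_estimate:.3f}")
print(f"mall-only sample: {mall_estimate:.3f}")
```

The random sample should land close to the true rate, while the mall-only sample should land near the city rate instead. Note that the mall sample is not small or sloppy; it is simply drawn from the wrong group.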

Things get a bit more complicated when you’re dealing with experiments in which an intervention of some kind occurs. If you want to be able to generalize your results to an entire population, you need to select randomly from that population. If, further, you want to be able to argue that some intervention is causing some difference between experimental groups, you need to make sure that the intervention is assigned to subjects randomly. Again, in practice, this is not easy to do. That’s why good research is valuable: it’s hard to do!

Say a cognitive scientist is trying to determine whether a certain intervention can be used to cause infants to exhibit object permanence earlier than they usually do. Say, further, that this researcher managed to obtain a truly random sample of infants (this is very difficult, since parents would need to sign off on such a thing, and maybe there’s some difference between children of parents who would sign off and children of parents who wouldn’t…). If the data shows that babies who received the intervention did, on average, exhibit object permanence earlier than those who did not receive the intervention, the cognitive scientist could only claim that the intervention caused the accelerated object permanence if the babies given the intervention were selected at random.
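Here is a rough Python sketch of why random assignment matters. The numbers are invented: it assumes object permanence shows up at around 8 months, give or take, and it assumes the intervention does nothing at all. The non-random assignment is deliberately extreme (the earliest developers all get the intervention) to make the effect easy to see.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical ages (in months) at which 2,000 infants show object permanence.
# The intervention is assumed to have NO real effect on these ages.
infants = [rng.gauss(8.0, 1.0) for _ in range(2_000)]


def months_earlier(treated, control):
    """How much earlier, on average, the treated group hits the milestone."""
    return mean(control) - mean(treated)


# Non-random assignment: the earliest developers all get the intervention.
ordered = sorted(infants)
biased_gap = months_earlier(ordered[:1_000], ordered[1_000:])

# Random assignment: shuffle, then split down the middle.
shuffled = infants[:]
rng.shuffle(shuffled)
random_gap = months_earlier(shuffled[:1_000], shuffled[1_000:])

print(f"gap with non-random assignment: {biased_gap:.2f} months")
print(f"gap with random assignment:     {random_gap:.2f} months")
```

With non-random assignment, a useless intervention appears to speed things up by more than a month. With random assignment, the gap should hover near zero, which is the truth in this made-up setup.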

Holy cow—that’s a lot of text about this stuff! My longwindedness notwithstanding, though, experimental design questions really aren’t hard—I promise. Just remember: if the question you get asks you to find a weakness in the design of an experiment, chances are very good that the answer will be something that’s obviously not random.

Other elements of experimental design

Remember that we use statistics to make generalizations about large populations based on observations we make of small groups from those populations. The small groups we use (randomly, if we’re doing it right) are called samples. The number of members of the sample is called the sample size. Generally speaking, the bigger the sample size, the more likely it is that it’s representative of the whole population.
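To see the effect of sample size, here is a small Python simulation. The population is hypothetical (50,000 values centered near 70 with a spread of about 3), and the sample sizes and number of repeats are arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical population of 50,000 measurements.
population = [rng.gauss(70, 3) for _ in range(50_000)]


def spread_of_sample_means(sample_size, repeats=300):
    """Draw many random samples of one size; return how much their means vary."""
    means = [mean(rng.sample(population, sample_size)) for _ in range(repeats)]
    return pstdev(means)


spreads = {n: spread_of_sample_means(n) for n in (10, 100, 1_000)}

for n, spread in spreads.items():
    print(f"sample size {n:>5}: sample means vary by about {spread:.2f}")
```

The bigger the sample, the less the sample averages bounce around from one sample to the next, so any single sample is more likely to sit close to the population average.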

Experimental design and interpretation questions on the SAT will usually provide a sample size. In the released tests, there’s no question where a too-small sample size is an experimental design problem, but I suppose it’s possible that something like that could appear in the future. Rule of thumb: if your sample size is bigger than 100, it’s probably fine. Only consider sample size a problem if it’s comically small.

This kind of question will also sometimes make mention of a study’s margin of error or confidence interval. If you’ve taken a statistics course, then you’ve probably had to calculate these things. You’ll never have to do that on the SAT. You’ll just have to know, in the most basic sense, what they are and why they’re important.

When we generalize what we’ve observed in a sample to draw a conclusion about an entire population, we have to remember that there’s a small possibility that we’re wrong. For example, after measuring the heights of 300 randomly selected 33-year-old American males, we might say that we are 95% sure that the true average height of 33-year-old American males is within 3 inches of the 70-inch average we found. Here, 3 inches is our margin of error, and 67 to 73 inches is our confidence interval. As our sample size increases and we become more confident that the sample is representative of the population, our margin of error and confidence interval will shrink: we’ll be able to say, with the same level of certainty, that the average we’re finding is closer and closer to the true population average.
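The arithmetic in the height example can be sketched in a few lines of Python. The first function just restates the example. The second one rests on an assumption that is not stated above: under the standard formula for the margin of error of an average, the margin is proportional to 1 divided by the square root of the sample size. The function names are made up for this sketch, and you will not need to compute any of this on the SAT.

```python
import math


def confidence_interval(sample_mean, margin_of_error):
    """The interval is just the sample average plus or minus the margin."""
    return (sample_mean - margin_of_error, sample_mean + margin_of_error)


def rescaled_margin(margin, old_n, new_n):
    """Assumes margin of error is proportional to 1 / sqrt(sample size)."""
    return margin * math.sqrt(old_n / new_n)


interval = confidence_interval(70, 3)
print(interval)  # (67, 73)

# Same study, same confidence level, but four times as many people measured.
bigger_study_margin = rescaled_margin(3, old_n=300, new_n=1_200)
print(bigger_study_margin)  # 1.5
```

Under that assumption, quadrupling the sample size cuts the margin of error in half, so the confidence interval would shrink from 67 to 73 inches down to 68.5 to 71.5 inches.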

One more thing: the 95% in this example is called the confidence level—you’ll probably only ever see confidence levels of 95% (maybe 99% once in a while). We include a confidence level to acknowledge that, even if we’ve designed our study carefully—randomized properly, made sure our sample size was sufficient, etc.—it’s mathematically possible that our sample just won’t represent the overall population well.
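One way to get a feel for what a 95% confidence level means is to rerun the same study many times and count how often the interval actually contains the truth. The Python sketch below does that with invented numbers: a true average of 70, a population spread of 3 that is assumed to be known, samples of 300, and the standard multiplier of 1.96 for a 95% level.

```python
import math
import random
from statistics import mean

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical setup: all four values are assumptions for illustration.
TRUE_MEAN, SIGMA, SAMPLE_SIZE, Z_95 = 70.0, 3.0, 300, 1.96
margin = Z_95 * SIGMA / math.sqrt(SAMPLE_SIZE)


def interval_captures_truth():
    """Run one simulated study; report whether mean +/- margin holds TRUE_MEAN."""
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(SAMPLE_SIZE)]
    return abs(mean(sample) - TRUE_MEAN) <= margin


studies = 2_000
coverage = sum(interval_captures_truth() for _ in range(studies)) / studies
print(f"intervals that contained the true average: {coverage:.1%}")
```

The result should come out close to 95%. The remaining studies were designed just as carefully; they simply drew a sample that happened not to represent the population well.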


Comments (2)

Not so much population size, but sample size. If a sample is very small, then it’s unlikely to provide reliable data. The SAT won’t test you on edge cases here—if they’re trying to get you to say a sample is too small, the sample will be very small. See Official Test #2, section 4, #13 for the only useful example of this in an official question so far. In that case, the sample size is greater than 100, so it’s not the issue.
