Stat Chat: The Pros and Cons of ‘Controlling’ for Things

If you are a policy maker or an education executive (or an advisor to one), you’ll likely have this conversation with one of your data people:

You: Can we look at the impact of X on Y?

Data person: Do you want to control for prior performance, ethnicity, IEP, ELL, and poverty? What about nesting of students within classrooms and schools?

You: How would we do that?

Data person: I can include them in my regression model/ANOVA/ANCOVA/HLM.

I will leave it to your data person to explain how their models work and how to interpret the results. For today, I want to focus on what you gain and lose by “controlling” for those variables. Just for now, let’s set aside our reservations about how well the assessments measure what we care about. A legitimate conversation, yes, but for another day. The present discussion is relevant to any measure: graduation rates, attendance rates, suspension counts, you name it.

Let’s say you want to find out which of your schools is doing a great job academically. The obvious thing to do is look at who has the highest average scores on some standardized test you administer to students across the whole district (or state, diocese, etc.). So you look at test scores, and what a surprise. With a few exceptions, schools in the expensive neighborhoods boast good scores. Selective-admission magnet schools top everyone.

Are those schools the best? Most people think so. Are those kids learning the most? That’s more difficult to say. Are those teachers and principals doing a better job than the ones at the low-performing schools? You’d be crazy to even suggest it, right?

So, we start controlling for things. If you have baseline data (from a prior year, for example), you might start by controlling for that. This gives you a model that answers the question: Given kids with the same baseline performance, how did each school do on the measure of interest? Strictly speaking, growth modeling is a somewhat different thing, but conceptually it is similar.

Or you could throw in controls for all the ways we categorize kids. (For most districts, these categories are ELL, IEP, Ethnicity, and Free/Reduced Lunch eligibility.) Now we are answering the question: Given kids with the same baseline performance and demographics, how did each school do on the measure of interest?
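To make the baseline idea concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the two school names, the six student records, and the function name `raw_and_adjusted_means`. It uses a classic ANCOVA-style adjustment (one common within-school slope on baseline), which is only one of several ways your data person might do this, and it ignores demographics and nesting entirely.

```python
from statistics import mean

# Hypothetical records, invented for illustration: (school, baseline, score).
RECORDS = [
    ("Magnet", 80, 84), ("Magnet", 85, 89), ("Magnet", 90, 94),
    ("South", 50, 58), ("South", 55, 63), ("South", 60, 68),
]


def raw_and_adjusted_means(rows):
    """Return {school: (raw mean score, baseline-adjusted mean score)}."""
    by_school = {}
    for school, baseline, score in rows:
        by_school.setdefault(school, []).append((baseline, score))

    grand_baseline = mean(baseline for _, baseline, _ in rows)

    # Estimate one common slope of score on baseline, using only the
    # variation among students inside the same school.
    sxy = sxx = 0.0
    school_means = {}
    for school, pairs in by_school.items():
        mean_base = mean(b for b, _ in pairs)
        mean_score = mean(s for _, s in pairs)
        school_means[school] = (mean_base, mean_score)
        for b, s in pairs:
            sxy += (b - mean_base) * (s - mean_score)
            sxx += (b - mean_base) ** 2
    slope = sxy / sxx

    # Adjusted mean: the school's predicted average if its students had
    # started at the district-wide average baseline.
    return {
        school: (mean_score, mean_score - slope * (mean_base - grand_baseline))
        for school, (mean_base, mean_score) in school_means.items()
    }


results = raw_and_adjusted_means(RECORDS)
for school, (raw, adjusted) in results.items():
    print(f"{school}: raw={raw:.1f}, adjusted={adjusted:.1f}")
```

With these made-up numbers, the magnet school has the higher raw average (89 versus 63), but the ordering flips after adjusting for baseline (74 versus 78). Both numbers are "right"; they answer different questions.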

So when is it appropriate to include those controls? Here are my suggestions:

  • If you are primarily interested in the outcome itself, it is best not to include controls.
  • On the other hand, if you are primarily interested in the effectiveness of a program, an intervention, or the adult professionals, then it is generally best to include controls.

In a previous post, I mentioned that as a scrawny middle-aged nerd, I have started studying Gracie Jiu-Jitsu. While the art teaches techniques that nullify natural advantages in size, strength, and agility, those advantages are very real. So there are two ways to rate how “good” I am. First, one could simply look at the percentage of sparring matchups that I clearly win (approximately 2%). Or, one could gather a lot of data and come up with a model that controls for age, weight, and years of experience to predict how many matches I would win in the hypothetical world where everybody had the same weight, age, and amount of experience. In the controlled model, perhaps I would look even better than the 24-year-old, 215-pound kettlebell instructor who has a higher belt rank than me and routinely beats me without breaking a sweat. If you are picking an instructor, you have some evidence that my instructor is pretty good. But if you are picking a buddy to have your back in a bar fight, I guarantee that you want the bigger, stronger, younger guy with more experience. All else being equal, perhaps I am as good. But all else is not equal.

Let’s now apply this to an educational situation. If a superintendent needs to identify which of her schools has the biggest liabilities in terms of serious incidents and therefore needs additional support for school climate and safety issues, it is not particularly helpful to control for prior-year serious incidents and a bunch of demographic variables that are likely correlated with neighborhood challenges and serious incidents. If you do include all those controls, your model could trick you into thinking that South Comprehensive High is no more dangerous than Mayor’s Magnet School for Math and Philosophy.

However, if she is implementing a new anti-bullying program and wants to see how well it is working, she should control for as many other bullying-related factors as possible, because otherwise the “all-else” blurs the effect of the actual intervention.
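Here is a small sketch of how the “all-else” can blur a program effect. The six schools, their incident counts, and the "high"/"low" prior-year bands are all hypothetical, and the method shown is simple stratification (compare program and non-program schools only within the same band, then average the gaps with equal weight). A real analysis would likely use a regression model with more controls.

```python
from statistics import mean

# Hypothetical schools, invented for illustration:
# (prior-year incident band, has the program, incidents this year).
SCHOOLS = [
    ("high", True, 40), ("high", True, 44), ("high", False, 50),
    ("low", True, 10), ("low", False, 14), ("low", False, 16),
]


def naive_difference(rows):
    """Mean incidents in program schools minus mean in non-program schools."""
    with_program = [incidents for _, program, incidents in rows if program]
    without_program = [incidents for _, program, incidents in rows if not program]
    return mean(with_program) - mean(without_program)


def stratified_difference(rows):
    """Average the program/non-program gap computed within each band."""
    bands = sorted({band for band, _, _ in rows})
    gaps = [
        naive_difference([row for row in rows if row[0] == band])
        for band in bands
    ]
    return mean(gaps)  # equal weight per band, a simplifying assumption


print(f"uncontrolled gap: {naive_difference(SCHOOLS):+.1f}")
print(f"controlled gap:   {stratified_difference(SCHOOLS):+.1f}")
```

In this invented example the program was placed mostly in schools that already had many incidents, so the uncontrolled comparison makes it look harmful (about +4.7 incidents), while the within-band comparison suggests it helps (-6.5 incidents).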

If I am a parent, I typically want to see uncontrolled data. Give me the school with the highest scores, the highest graduation rates, the best AP performance, the most college scholarships, and the lowest number of serious incidents. If my kids are poor and black, I don’t want to send them to a school that theoretically does a pretty good job, after controlling for the fact that all of the students are poor and black. If there is a school that has mostly rich or middle-class kids in it, and on average that school’s students have much better outcomes, I might like my child to go there instead—especially if there are enough students and teachers who look like her that she is not isolated, marginalized, or threatened.

If I am a teacher or a principal, I want to see the controlled model: Don’t rate my performance without taking into account all the things outside my control.

The gray areas in between pose the greatest challenge. When districts create school performance indices, they are implicitly trying to do both simultaneously: Indicate which schools are the best (which would motivate not using controls) and indicate which schools are doing the best job given the particular challenges they face (which would motivate using the controls).

A performance index that includes controls for demographics and prior performance de facto sets a lower bar for more-challenged neighborhood schools. It is one thing to tell your education professionals: Taking into account the social challenges and prior performance, you did as good a job as the staff at the school in the exclusive neighborhood. It is quite another to tell the local parents and press: If we set aside the social challenges and prior performance, this school is just as good as that other school.

1 Comment

  1. Carolyn Deyo, March 10, 2015

    Thanks, Sean,
    this is a helpful perspective on a potentially confusing issue.
