Thursday, December 1, 2011

Moving on from the "Search for Universals"

As I've delved deeper into the field of education research, I've grown increasingly frustrated by two large problems I see in that research.  While listening to Malcolm Gladwell's TED talk on consumer choice, something he said helped crystallize the second issue (I'll explore the first in another post).

In the talk, Gladwell discusses the shift of food science (starting with Prego pasta sauce) from trying to create the "one best" product for all to creating multiple products that fit specific groups.  He compares that trend to cancer research, arguing that "the great revolution in science of the last 10-15 years" is "the movement from the search for universals to the understanding of variability".

If he's right, then it seems ed policy researchers missed a memo somewhere (to be fair, it's not just ed policy -- I see the same problem in lots of policy literature).

The vast majority of the policy research I've read, seen presented, or heard discussed focuses exclusively, or at least mainly, on the average effect of a particular variable, intervention, or policy.  Over the years, we've developed increasingly sophisticated methods to more accurately estimate these average effects.  But I rarely hear people discuss the differential effects of the variable, intervention, or policy of interest.

In other words, we keep trying to figure out if different policies "work," but we define "work" as making a statistically significant difference for the average student, teacher, principal, school, district, or state.  If we instead asked for whom a given policy works, we'd likely find that it works very well for some while harming others.
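To make that concrete, here's a toy sketch in Python.  The numbers are invented purely for illustration (they come from no study), and the group names are hypothetical.  It also pretends we can see each student's individual effect, which real data never lets us do.  The point is only that a respectable average can sit on top of one group that benefits and another that is harmed.

```python
from statistics import mean

# Invented per-student effects of a hypothetical policy (in test-score points).
# These are assumptions for illustration, not real data.
effects_by_group = {
    "group_a": [6.0, 5.0, 7.0, 6.0],      # the policy helps these students
    "group_b": [-2.0, -1.0, -3.0, -2.0],  # the policy hurts these students
}

# The "does it work on average?" question pools everyone together.
all_effects = [e for effects in effects_by_group.values() for e in effects]
average_effect = mean(all_effects)

# The "for whom does it work?" question looks at each group separately.
group_effects = {name: mean(effects) for name, effects in effects_by_group.items()}

print(f"average effect: {average_effect:+.1f}")
for name, effect in group_effects.items():
    print(f"{name}: {effect:+.1f}")
```

Judged by the average alone, this made-up policy looks like a modest success, even though half the students end up worse off.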

In the talk, Gladwell mentions a food scientist hired to create the "one best Pepsi" who conducts taste tests of the product with varying amounts of sweetener.  To his surprise, no single, clear winner emerges.  Some people prefer only a little sweetener, some like a medium amount, and some like a lot.  The scientist then has an epiphany and realizes that there's no such thing as the "one best Pepsi" (which eventually leads to the creation of a wide variety of Prego sauces that target different audiences).

I'd argue that education policy is almost exactly the same.  For example, imagine a new math curriculum.  How do we decide if it works?  The gold standard of research would dictate that we randomly assign, say, 100 classrooms to use the new curriculum and 100 to use the old one.  We'd then compare the treatment and control groups' average scores at the beginning and end of the year.  If kids in the treatment group score significantly higher, on average, than kids in the control group, then that curriculum earns a gold star.
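For readers who like to see the mechanics, the comparison described above can be sketched in a few lines of Python.  Everything here is simulated: the score distributions, the size of the curriculum's effect (`TRUE_EFFECT`), and the helper `simulate_gain` are all assumptions I made up for the sketch, and a real evaluation would use more careful models than this simple difference in average gains.

```python
import random
from math import sqrt
from statistics import mean, variance

rng = random.Random(0)  # fixed seed so the simulation is repeatable

TRUE_EFFECT = 5.0  # assumed average boost from the new curriculum (invented)

# Randomly assign 200 hypothetical classrooms: 100 treatment, 100 control.
classrooms = list(range(200))
rng.shuffle(classrooms)
treatment_ids = set(classrooms[:100])


def simulate_gain(treated):
    """Return one classroom's simulated gain from start to end of year."""
    start_score = rng.gauss(50, 10)
    end_score = start_score + rng.gauss(10, 4) + (TRUE_EFFECT if treated else 0.0)
    return end_score - start_score


treatment_gains, control_gains = [], []
for classroom in range(200):
    treated = classroom in treatment_ids
    (treatment_gains if treated else control_gains).append(simulate_gain(treated))

# The estimated average effect is the difference in mean gains.
difference = mean(treatment_gains) - mean(control_gains)

# Welch-style standard error and t statistic for that difference.
standard_error = sqrt(
    variance(treatment_gains) / len(treatment_gains)
    + variance(control_gains) / len(control_gains)
)
t_statistic = difference / standard_error

print(f"difference in mean gains: {difference:.2f}")
print(f"t statistic: {t_statistic:.2f}")
```

Notice what the output is: one number for the difference and one for its significance.  Nothing in this procedure tells us which classrooms gained more and which gained less.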

We spend increasing amounts of time trying to figure out which math curricula will yield the largest achievement gains across the students who use them.  But it seems exceedingly unlikely that "one best math curriculum" truly exists.  Some states, districts, schools, teachers, and/or students will do best with curriculum A, some with curriculum B, and some with curriculum C.

Wouldn't our time be better spent figuring out for which students curriculum A would be best and for which students curriculum B would be best (and why)?  And we could say the same about all of the largest issues in education policy -- teacher training, teacher pay, charter schools, and so on.  In later posts, I'll explore how our research and policy would differ if we aimed to understand and account for variability rather than simply finding the one best policy.

I've long thought it odd that we spend most of our lives being told not to stereotype and make generalizations while the most educated people in the country strive to make the largest generalization possible in their research.  Most (though not all) researchers focus incessantly on "generalizability" (one of those words we use in the field but the spellchecker won't recognize): if we can generalize your findings to 10 million students across the country, you're likely to publish the paper in a better journal than if we can only generalize your results to students in one school.

As it turns out, I was right to think that odd.  Greater generalizability matters in many ways.  But, at some point, we start missing the point -- and for exactly the same reasons teachers and parents across the country are telling our kids not to generalize.  Everybody is different.

Every student is different.  Every teacher is different.  Every principal is different.  Every school is different.  Every district is different.  Every state is different.  The best policy, on average, will help one and hurt another.  Until we understand this variability, we're doomed to fail.