First, let’s just admit to ourselves that doing surveys with kids is exponentially more fun than with adults. Unfortunately, they don’t tend to know as much information as their parents (at least, not the info that would be relevant to the study), so the post-fun analysis requires a bit more magic. Cue: instrumental variables.
The instrumental variable (IV) method is often used when there is some sort of self-selection bias in the way a program is implemented. One oft-used example is the impact of additional years of education (treatment) on earnings (outcome). Of course the two are correlated, but it is hard to untangle them in a way that shows cause. Highly educated people might have more income, but it could just be that more motivated people are both a) driven to finish school, and b) likely to perform better in their careers.
How can we attribute the additional earnings directly to education? We would have to control for selection-bias (i.e. the reasons that people choose to continue their education), which encompasses a whole host of factors — some of which, like “motivation”, are hard to measure. As all impact evaluation methods attempt to do, an IV can help to isolate the relationship between an intervention and its outcome, without the extra bias messing up the estimates of the treatment’s effect.
You can read up on the method here, but as a quick review: IVs follow two main rules.
- Relevance: it is a variable that influences assignment to treatment (in this case, a person’s likelihood of getting more education).
- Exclusion: it does not directly influence the outcome or the other independent variables (i.e. it doesn’t directly help them with income, and doesn’t influence their motivation/IQ or any other influential factors).
Solution? Find an IV, and essentially isolate the variation in ‘years of education’ that has nothing to do with what we’ll call the ‘extras’ (motivation, IQ, etc.). Then we replace the treatment variable with its values as predicted by the IV, in order to narrow the impact to just treatment without extras. In this case, David Card’s research used the variable distance to university. The logic is that kids who live closer to universities are more likely to continue their education. However, the location of someone’s house doesn’t affect their motivation or IQ, and it isn’t correlated with income except through its effect on education. So by capturing the influence of distance to university, and then replacing the original treatment (years of education) with the part of it that distance predicts, we can narrow the impact down to that which is not due to motivation or IQ.
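To make the two-stage logic concrete, here is a minimal simulation sketch. All the numbers (coefficients, sample size, noise levels) are made up for illustration, not Card’s actual estimates: an unobserved “motivation” confounder drives both schooling and earnings, while distance to university shifts schooling only. A naive regression overstates the effect of education; two-stage least squares (2SLS) recovers the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Unobserved confounder: motivation drives both education and earnings.
motivation = rng.normal(size=n)
# Instrument: distance to the nearest university. It affects schooling,
# but (by assumption) not earnings or motivation directly.
distance = rng.uniform(0, 10, size=n)

# Years of education depend on motivation AND on the instrument.
education = 12 + 0.8 * motivation - 0.3 * distance + rng.normal(size=n)
# True causal effect of one extra year of education on log earnings: 0.10.
earnings = 1.0 + 0.10 * education + 0.5 * motivation + rng.normal(scale=0.5, size=n)

def ols(y, x):
    """Least-squares fit of y on x, returning [intercept, slope]."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive OLS: biased upward, because motivation drives both variables.
naive = ols(earnings, education)[1]

# Two-stage least squares:
# Stage 1 - predict education from the instrument alone.
intercept, slope = ols(education, distance)
education_hat = intercept + slope * distance
# Stage 2 - regress earnings on the *predicted* education.
iv_est = ols(earnings, education_hat)[1]

print(f"naive OLS: {naive:.3f}, 2SLS: {iv_est:.3f}, truth: 0.100")
```

In real work you would use a packaged 2SLS routine (which also gives correct standard errors), but the two hand-rolled stages above are the whole idea: only the instrument-driven part of education is allowed to explain earnings.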
IVs are tricky to come by in datasets, and you have to find a really good one that satisfies both rules, or else the IV will actually add more bias to your estimates. But you can also plan an evaluation and identify IVs in advance of data collection.
What does this have to do with kids? As I said, sometimes kids don’t have all the answers we’re looking for. But if we’re evaluating the impact of an educational intervention, then we’ll have to test them anyway. To get a more robust dataset, we could also track down their parents and spend another hour with each family conducting a survey. But that costs time and money, and slows down the data collection process. If you’re like most evaluators, you can’t always have the perfect dataset. Sometimes you have to settle for second-best, given your time and resource constraints. We’re already at the school with all of the kids: what if we could get all the data we need from them?
Another example: let’s say we want to evaluate the impact of attending pre-school on the kids’ literacy development (maybe measured through letter name identification). Well, not all kids attend pre-school. In fact, the factors that determine whether or not a kid is in pre-school are complicated. It probably has a lot to do with the parents’ income, or their own level of education. But, kids can’t answer that for us. And we don’t have the time or resources to do household surveys with every parent.
So (and here’s the magic): ask the kid to give you data on an IV. Cut me some slack here, because I’m thinking of a potential IV more or less off the top of my head, but let’s use presence of a young woman in the home as an IV. In many low-income communities, it’s common to see young women in the family doing care work. And if an extra carer is available in the home, and the children aren’t yet at the age for government-mandated schooling, it’s more likely the kids will stay home out of convenience. However, whether or not there happens to be a woman of a specific age range in the home is plausibly unrelated to the parents’ income or level of education.
The best part is, kids do know who lives with them! They know if their sister is out of school, they know if their aunt helps out in the home. We could ask this quick question of every kid interviewed, and then use it to vacuum up all of the extra bias that is inherent in the decision to send a kid to pre-school. Add in the kid’s age, gender, ethnicity, or any of the other readily available variables that influence educational outcomes in that context, and you have yourself a kid-friendly impact evaluation. True, it may not be as rigorous an evaluation as you could get if you had infinite time and resources. But really – who has that?
Need a TL;DR? Kids don’t know everything, but we can use what they know to fulfill our data needs with the magic of IV and some careful evaluation planning.