Using the HRS

Insights and advice for new and seasoned users of the Health and Retirement Study

Why Are There People Under Age 50 in the Data?

(The short answer: HRS samples households, not individuals, and in some households people over age 50 are married to people under age 50)

The very first conundrum I encountered as a new user of HRS data was the presence of people as young as age 25 in the data set. First I puzzled over how a survey of adults nearing or beyond retirement could contain hundreds of respondents in their 20s, 30s, and 40s. Then I puzzled over how to exclude these younger folks from my analysis. Fortunately, it’s a relatively straightforward matter.

An incredibly important, but often overlooked, aspect of the HRS survey design is that the HRS is a sample of households, not a sample of individuals. More specifically, the HRS samples household financial units – basically, couples – and surveys are conducted with all members of the “unit” or couple. At least one member of the “unit” has to be age-eligible, but the other part of that couple could be any age. This means that HRS conducts surveys of younger members of this household unit, who are the spouses or cohabiting partners of age-eligible respondents. (And yes, this means there are 50-year-old HRS respondents who are partnered with people in their 20s. Though you might be surprised to know that some of these HRS respondents are women.)

So what do we do with the younger adults in the data? Assuming you use sample weights in your descriptive and multivariate analyses, you don’t have to do anything. Non-eligible cases will have a zero value on the sample weight and, if recoded to a missing value, will be dropped automatically from your analysis. Even if you don’t use sample weights in your analysis, you can use the Tracker variable *WHY0RWT to determine which respondents were not cohort eligible in that survey wave. You can also drop anyone who was not eligible based on their birth year* (BIRTHYR in the Tracker file and rabyear in the RAND file). The RAND file includes a constructed variable – racohbyr – that assigns all cases to their respective birth cohort and adds one more category, “Not in any cohort”. This variable makes it very easy to exclude cases that are not age-eligible and is handy when you can’t rely on the sample weight to do the job for you.

*I advise against dropping based on age because there is some overlap in age between those who are and are not cohort eligible.
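
For anyone who prefers to see the two approaches side by side, here is a minimal sketch in plain Python. It is only an illustration, not official HRS or RAND code. The rows and weight values are made up, "weight" is a hypothetical stand-in for whichever wave-specific respondent weight you use, and the numeric code I use for “Not in any cohort” is an assumption you should confirm against the RAND codebook before relying on it.

```python
# Toy rows standing in for a RAND HRS extract. All values are invented.
# "weight" is a hypothetical name for one wave's respondent-level sample weight.

# Assumed numeric code for the "Not in any cohort" category of racohbyr.
# Check the RAND codebook for the actual value before using this.
NOT_IN_ANY_COHORT = 0

rows = [
    {"id": 1, "rabyear": 1938, "racohbyr": 3, "weight": 4210.0},
    {"id": 2, "rabyear": 1975, "racohbyr": 0, "weight": 0.0},  # younger spouse
    {"id": 3, "rabyear": 1950, "racohbyr": 5, "weight": 3875.0},
    {"id": 4, "rabyear": 1982, "racohbyr": 0, "weight": 0.0},  # younger partner
]


def keep_by_weight(rows):
    """Weighted analysis: keep cases with a positive sample weight.

    Zero or missing (None) weights are dropped.
    """
    return [r for r in rows if r["weight"] and r["weight"] > 0]


def keep_by_cohort(rows, not_in_cohort=NOT_IN_ANY_COHORT):
    """Unweighted analysis: keep cases that belong to some birth cohort."""
    return [r for r in rows if r["racohbyr"] != not_in_cohort]


weighted_ids = [r["id"] for r in keep_by_weight(rows)]
cohort_ids = [r["id"] for r in keep_by_cohort(rows)]
print(weighted_ids, cohort_ids)
```

In this toy data both filters keep the same two respondents. On real data they need not agree, because a zero weight can have more than one cause, which is exactly what the WHY0RWT variable records.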

