Using the HRS

Insights and advice for new and seasoned users of the Health and Retirement Study

What are Proxies?

Much of the value of the HRS as a study of aging is in its use of a number of innovative survey research techniques, including the use of proxy interviews. Researchers who include respondents with interviews by proxy in their analytic sample should be well informed about what a proxy interview is and how these respondents differ from the rest of the sample.

Why are There People Under Age 50 in the Data?

(The short answer: HRS samples households, not individuals, and in some households people over age 50 are married to people under age 50)

The very first conundrum I encountered as a new user of HRS data was the presence of people as young as age 25 in the data set. First I puzzled over how a survey of adults nearing or beyond retirement could contain hundreds of respondents in their 20s, 30s, and 40s. Then I puzzled over how to exclude these younger folks from my analysis. Fortunately, it's a relatively straightforward matter.
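As a rough illustration of the kind of filter involved, here is a minimal sketch that drops respondents under a minimum age while keeping everyone else. It assumes a hypothetical record layout (`Respondent`, `age_at_interview`) rather than actual HRS or RAND variable names, and a simple age cutoff may not match how HRS defines cohort eligibility (by birth year), so check the codebook for the rule that fits your analysis.

```python
from dataclasses import dataclass


# Hypothetical record layout; real HRS/RAND files use their own variable names.
@dataclass
class Respondent:
    household_id: str
    person_id: str
    age_at_interview: int


def age_eligible(respondents, min_age=50):
    """Keep only respondents whose age at interview is at least min_age."""
    return [r for r in respondents if r.age_at_interview >= min_age]


sample = [
    Respondent("H1", "P1", 62),
    Respondent("H1", "P2", 34),  # younger spouse in an age-eligible household
    Respondent("H2", "P3", 55),
]

kept = age_eligible(sample)
print([r.person_id for r in kept])  # ['P1', 'P3']
```

Note that the filter works on individuals, not households: household H1 stays in the data through P1 even though P2 is excluded.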

Beware of version updates!

Data management of large, complex surveys like the HRS is a fairly difficult and time-consuming task. The HRS staff endeavors to produce a public-release data set as soon as possible (and they produce them much faster than many other publicly funded data collections!), so they release an early version of the data: Core Early Release (V1.0)*. The updated RAND files tend to follow soon after.

HRS does an early release to get the data to the public as fast as possible, but HRS staff continues to process the data until they have a final data release. Sometimes the early release is ultimately designated as the final release – you’ll know this is the case if you see “Final V1.0” followed by the date of release. But sometimes there are issues in the data that need to be resolved. These tend to be the result of programming errors, but there are sometimes problems with the data that the HRS staff catch during data inspections after the early release (e.g., a case is designated as a non-sample member after closer inspection). So, in some years the final release is a V1.0, but in other years it may be a V2.0 or V3.0. And in rare cases the final release is a V4.0 or V5.0.
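One practical habit is to record the exact release label of every file you analyze, so you can tell later whether a newer version has superseded it. The sketch below parses labels modeled on the wording in this post ("Core Early Release (V1.0)", "Final V2.0"); the label format and the `parse_release` helper are hypothetical, not an official HRS convention.

```python
import re


def parse_release(label):
    """Return ((major, minor), is_final) for a release label.

    Hypothetical helper: assumes labels contain a version like 'V2.0' and
    that final releases begin with the word 'Final'.
    """
    match = re.search(r"V(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"no version number found in {label!r}")
    version = (int(match.group(1)), int(match.group(2)))
    is_final = label.strip().lower().startswith("final")
    return version, is_final


used = parse_release("Core Early Release (V1.0)")
latest = parse_release("Final V2.0")
if latest[0] > used[0]:
    print("A newer release exists; re-run the analysis on it.")
```

Comparing version tuples rather than strings avoids surprises such as "V10.0" sorting before "V2.0".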

What is the significance of all of this for the user?

First, a disclaimer. I chose the title of this post to reflect how I think others view these data products. These are not, in fact, competing data products, nor are they alternatives to one another. I view these data products as complementary, each with individual strengths and weaknesses, but ultimately a powerful data analysis resource when combined. Below I briefly describe the difference between the two data products and outline the pros and cons of using each (though my ultimate recommendation is to combine the two data products for a "pros-only" data set).