Young people are less likely to have health conditions that make them particularly vulnerable to Covid-19. However, it may be common to live in a household with a vulnerable person.

      In this post, I try to get an estimate for the percentage of people who live in a household where at least one person has a condition that makes them at higher risk of serious complications from Covid-19.1 Halfway though working on it, I realized my approach was flawed. However, the flaw is kind of interesting, so I thought it was worth writing up.

Data

      Most surveys ask questions to only one person in the household.2 However, we need to know information about each person living in a household to estimate the number of people who live with someone vulnerable to covid.

      To get this household level information, I turned to the Panel Study of Income Dynamics (PSID). This survey asks a number of questions about health conditions to the head of household and (if there is one) spouse, and contains basic information about everyone else living with the household.3

Methods

      To estimate the health conditions of other household members, I used the data for people I did have health condition information for–the head of household and spouse. Specifically, I modeled the probability of having each health condition based on age, gender, and race using the health outcomes for the head of household and spouse. I then used this model to predict health conditions for everyone else in the household.4

      This imputation approach suffers from a few problems. One issue is that it’s possible that people who aren’t a head of household or spouse have systematically better or worse health even after adjusting for gender, race and age.

      The more serious problem is that health conditions are correlated within households. For instance, if a family member has heart disease, you’re more likely to have heart disease then what would be predicted based on just your age, gender, and race. My imputation approach predicts health conditions based on age gender and race, so this within household correlation is a big problem that I cannot account for.5

      The intra-household correlation of health conditions causes me to overestimate how many people have a health condition in the household. (At least that’s my confident guess, I don’t actually have data to prove it.)6 Age, gender, and race are not very predictive of most health conditions. This means my predictions, while close to correct in the aggregate (you can check that individual level predictions here), will give most people a medium percentage chance of having any given condition. When aggregating across multiple individuals in a family, this means family level risk for at least one person having a given condition will be relatively high. In reality, the distribution of health conditions is more tail heavy, with many households that very healthy and a smaller fraction that have a lot of health conditions. Predicting health based on age, gender, and race does not account for the fact that health vulnerabilities are concentrated within families, for environmental, social, and genetic reasons.

      tl;dr Health problems are correlated within families, so estimating health problems based on demographics make them look more diffuse across families than reality, which (maybe counter intuitively) makes my estimates of household level health issues too pessimistic. Fewer families have any given health condition, but families that do have at least one health condition are likely to have multiple conditions.

Results

      With those huge caveats in mind, the graph below plots that chance any given individual is in a household where at least one family member has a given condition. As you can see, the probability of having someone in the household with at least one health condition is quite high by this estimate, but it is almost certainly an overestimate.

      The only number that is not going to suffer from within-household correlation in health is the probability of having someone above age 65 in your household. I might break that down further in a future post, as the race differences were quite striking to me, and the opposite of what I expected.


  1. Based on this list from the CDC)↩︎

  2. If you’ve ever been surveyed, they often ask you to give the phone to the person with the nearest birthday as a way of making sure people who answer the phone in a given household aren’t the only people who are surveyed. Lately, that’s probably uncommon as more sophisticated surveys call cell phones and sample off the voter file (so they ask for a specific person if they call the landline).↩︎

  3. It actually doesn’t include non-family members who don’t share most household expenses, so I think a roommate who you only split the expense of rent and utilities with wouldn’t qualify. There is a question about how many people are living in the household so I can check to see how big of an exclusion that is.↩︎

  4. My code is here if you’re curious about the details. One thing of note is that I assume children don’t have any conditions. They would be tough to estimate because children aren’t head of household, so I’d have no data for people under the age of 18. However, I think just imputing zeros is fine here since children are low risk. In general, since there are so few covariates available for other household members in the PSID, I don’t think there can be much value added in any particular modeling approach.↩︎

  5. While I technically could try to add some kind of within household correlation factor to my imputation model, it would be a totally made up number. To estimate the household level correlation in health conditions I would need data that has health conditions of all members of the household/family, which defeats the purpose of the entire imputation exercise. I think the best approach would be to provide some bounds and show how the estimates change with different estimated within household correlations.↩︎

  6. The one bias I didn’t mention that would pull in the opposite direction is under-reporting of health conditions, either due to not wanting to admit something to an interviewer or just not being aware you have e.g. high blood pressure. The PSID is a really high quality survey that asks a ton of details about finances, so I don’t think social desirability bias would be a big worry here, but undiagnosed conditions could be important for things like high blood pressure.↩︎