The Data
A researcher wants to measure how diabetes affects healthcare costs. Claims data show a relationship, but chart review reveals that the claims often misclassify patients. What happens to the estimated effect? (Data are simulated for illustration.)
Healthcare Costs by Diabetes Status
Claims data miss some diabetics and mislabel some non-diabetics.
This "noise" in measurement limits what we can learn, but how much it matters depends on the type of error.
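A minimal sketch of how such data could be simulated. The prevalence, sensitivity, specificity, and cost figures are hypothetical values chosen for illustration, not the numbers behind this lab. It compares the cost gap measured with true diabetes status against the gap measured with the error-prone claims flag.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical parameters, chosen only for illustration.
prevalence = 0.15      # true share of patients with diabetes
sensitivity = 0.70     # P(claims flag = 1 | truly diabetic)
specificity = 0.95     # P(claims flag = 0 | truly non-diabetic)
true_effect = 5_000.0  # extra annual cost caused by diabetes

truth = rng.random(n) < prevalence
# Misclassification here depends only on true status, not on cost.
flag = np.where(truth, rng.random(n) < sensitivity, rng.random(n) > specificity)
cost = 8_000.0 + true_effect * truth + rng.normal(0.0, 3_000.0, n)

def mean_gap(y, group):
    """Difference in mean outcome between group == True and group == False."""
    return y[group].mean() - y[~group].mean()

gap_truth = mean_gap(cost, truth)   # what chart review would show
gap_claims = mean_gap(cost, flag)   # what claims data show
print(f"gap using true status: {gap_truth:,.0f}")
print(f"gap using claims flag: {gap_claims:,.0f}")
```

Under these assumptions the claims-based gap comes out well below the gap based on true status, because each claims group is a mix of diabetics and non-diabetics.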
Types of Measurement Error
Measurement error comes in two forms. Classical error is random, like a ruler that sometimes reads slightly high or low. Non-classical error is systematic, depending on the true value or other variables.
Classical Random Error
Errors are random and unrelated to the true value. Some measurements are too high, others too low, but there is no pattern.
Examples in Claims Data
- Coding typos that occur randomly
- Sporadic billing system glitches
- Random data entry errors
Non-Classical Systematic Error
Errors depend on the true value or other variables. The pattern of mistakes is predictable, not random.
Examples in Claims Data
- Sicker patients more likely to have conditions coded
- Certain hospitals overcode for reimbursement
- Diagnosis depends on whether treatment was sought
Why Does Error Type Matter?
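Classical error in an independent variable biases the estimated effect toward zero in a simple regression. This is called attenuation bias. For a binary variable like diabetes status, the counterpart is misclassification that is unrelated to the outcome once true status is known, which also pulls the estimate toward zero. You can sometimes correct for it if you know the error rate.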
Non-classical error has unpredictable effects. If hospitals overcode diabetes for sicker patients, your estimate conflates the effect of diabetes with the effect of being sicker in general.
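The following sketch illustrates why the direction of bias is hard to predict when coding depends on severity. All probabilities and cost effects are hypothetical. Sicker patients are more likely to be flagged as diabetic, whether or not they truly are, and the naive cost gap is computed for several assumed effects of general sickness on cost.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
true_effect = 5_000.0

truth = rng.random(n) < 0.15
sick = rng.random(n) < 0.50     # general severity, independent of diabetes here

# Differential error: sicker patients are coded more readily, rightly or wrongly.
p_flag = np.where(truth,
                  np.where(sick, 0.90, 0.50),   # sensitivity by severity
                  np.where(sick, 0.12, 0.02))   # false-positive rate by severity
flag = rng.random(n) < p_flag
noise = rng.normal(0.0, 3_000.0, n)

def naive_gap(severity_effect):
    """Claims-based cost gap when sickness adds severity_effect to cost."""
    cost = 8_000.0 + true_effect * truth + severity_effect * sick + noise
    return cost[flag].mean() - cost[~flag].mean()

gaps = {s: naive_gap(s) for s in (0.0, 4_000.0, 12_000.0)}
for s, g in gaps.items():
    print(f"severity effect {s:>8,.0f} -> naive gap {g:,.0f} (truth {true_effect:,.0f})")
```

With the same coding behavior, the naive gap can land below or above the true effect depending on how much sickness itself costs, which is why a single shrinkage factor does not describe this kind of error.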
Classical error predictably shrinks effect estimates.
Understanding this pattern lets us quantify how much bias might exist and, sometimes, correct for it.
Attenuation Bias
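Classical measurement error in an independent variable creates predictable bias. In a simple regression, the estimated coefficient converges to the true coefficient multiplied by the reliability ratio, λ = Var(true X) / (Var(true X) + Var(error)). This formula shows exactly how much the estimate shrinks based on the reliability of the measure.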
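As a check on the attenuation result, here is a small simulation with a continuous variable. It assumes the standard reliability ratio, true-value variance divided by the sum of true-value variance and error variance, and uses hypothetical variances and a hypothetical true slope of 2.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta = 2.0
var_x, var_u = 1.0, 0.5              # hypothetical signal and error variances

x = rng.normal(0.0, np.sqrt(var_x), n)        # true value
w = x + rng.normal(0.0, np.sqrt(var_u), n)    # observed = truth + classical error
y = 1.0 + beta * x + rng.normal(0.0, 1.0, n)

def slope(a, b):
    """OLS slope from regressing b on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

reliability = var_x / (var_x + var_u)   # share of observed variance that is signal
naive = slope(w, y)                     # close to beta * reliability
corrected = naive / reliability         # divide by the reliability ratio to undo shrinkage

print(f"reliability {reliability:.3f}, naive {naive:.3f}, corrected {corrected:.3f}")
```

In practice the reliability ratio is not known and has to be estimated, so the corrected estimate carries that extra uncertainty.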
Knowing the reliability ratio lets you "un-shrink" the estimate.
Several methods exist to correct for attenuation, each with different data requirements.
Correction Methods
When you know or can estimate the extent of measurement error, several approaches can recover the true effect. Each requires different assumptions and data.
Validation Study
Compare claims data to a "gold standard" (chart review, lab results) in a subsample to estimate sensitivity and specificity.
Requires
- Access to gold standard measure
- Representative validation sample
- Error rates that generalize
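A validation-based correction might look like the sketch below. It assumes a random chart-review subsample, misclassification that depends only on true status, and hypothetical error rates. Under those assumptions the naive gap in mean cost equals the true gap multiplied by (PPV + NPV - 1), so dividing by that factor recovers the true gap.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_val = 200_000, 5_000

truth = rng.random(n) < 0.15
flag = np.where(truth, rng.random(n) < 0.70, rng.random(n) > 0.95)
cost = 8_000.0 + 5_000.0 * truth + rng.normal(0.0, 3_000.0, n)

# Chart review is only done for a random subsample.
val = rng.choice(n, size=n_val, replace=False)
t_val, f_val = truth[val], flag[val]

sens_hat = f_val[t_val].mean()       # P(flag | truly diabetic)
spec_hat = (~f_val[~t_val]).mean()   # P(no flag | truly non-diabetic)
ppv_hat = t_val[f_val].mean()        # P(truly diabetic | flag)
npv_hat = (~t_val[~f_val]).mean()    # P(truly non-diabetic | no flag)

naive = cost[flag].mean() - cost[~flag].mean()
# Nondifferential error: naive gap = true gap * (PPV + NPV - 1).
corrected = naive / (ppv_hat + npv_hat - 1.0)

print(f"sensitivity {sens_hat:.2f}, specificity {spec_hat:.2f}")
print(f"naive gap {naive:,.0f}, corrected gap {corrected:,.0f}")
```

The predictive values are estimated directly here because the validation sample is drawn at random from the claims population. If it were sampled differently, they would need to be rebuilt from sensitivity, specificity, and prevalence.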
Regression Calibration
Replace the mismeasured variable with the expected true value conditional on the observed data, with that relationship estimated from a calibration subsample.
Requires
- Subsample with true values
- Approximately linear relationships
- Classical or known error structure
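A sketch of regression calibration for a continuous variable, assuming linear relationships, classical error, and a random calibration subsample in which the true value is observed. Variable names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_cal = 100_000, 2_000
beta = 2.0

x = rng.normal(0.0, 1.0, n)                  # true value, seen only in subsample
w = x + rng.normal(0.0, 0.8, n)              # error-prone measure, seen for everyone
y = 1.0 + beta * x + rng.normal(0.0, 1.0, n)

cal = rng.choice(n, size=n_cal, replace=False)

# Step 1: in the calibration subsample, fit E[x | w] = a + b * w.
b, a = np.polyfit(w[cal], x[cal], deg=1)

# Step 2: impute the expected true value for everyone.
x_hat = a + b * w

# Step 3: regress the outcome on the imputed value.
beta_rc, _ = np.polyfit(x_hat, y, deg=1)
beta_naive, _ = np.polyfit(w, y, deg=1)

print(f"naive slope {beta_naive:.3f}, calibrated slope {beta_rc:.3f}")
```

Standard errors from step 3 alone ignore the uncertainty in step 1. A bootstrap over both steps is one common way to account for it.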
Instrumental Variables
Find a variable (instrument) that predicts the true value but is unrelated to the measurement error.
Requires
- Valid instrument
- Strong first stage
- Exclusion restriction
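One familiar case is a second, independently mismeasured version of the same variable used as the instrument. The sketch below assumes such a measure exists (a hypothetical `w2`), that both errors are classical, and that the two errors are independent of each other and of the outcome.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
beta = 2.0

x = rng.normal(0.0, 1.0, n)
w1 = x + rng.normal(0.0, 0.8, n)   # mismeasured regressor (e.g. claims-based)
w2 = x + rng.normal(0.0, 0.8, n)   # hypothetical second measure, independent error
y = 1.0 + beta * x + rng.normal(0.0, 1.0, n)

def cov(a, b):
    """Sample covariance between two 1-D arrays."""
    return np.cov(a, b)[0, 1]

beta_ols = cov(w1, y) / cov(w1, w1)       # attenuated by error in w1
beta_iv = cov(w2, y) / cov(w2, w1)        # simple IV estimator with w2 as instrument
first_stage = cov(w2, w1) / cov(w2, w2)   # slope of w1 on w2; should be far from zero

print(f"OLS {beta_ols:.3f}, IV {beta_iv:.3f}, first stage {first_stage:.3f}")
```

The instrument works here because its error is unrelated to the error in `w1`. If the two measures shared a source of error, such as the same billing system, the IV estimate would be biased as well.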
Sensitivity Analysis
Report how estimates would change under different assumptions about error rates, even without validation data.
Requires
- Reasonable bounds on error rates
- Transparent assumptions
- Clear presentation of uncertainty
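A sensitivity analysis can be as simple as recomputing the corrected estimate over a grid of assumed error rates. The helper below, `corrected_gap`, is a hypothetical function written for this illustration. It assumes a binary variable with misclassification that depends only on true status, and the naive gap and flagged share are made-up inputs.

```python
def corrected_gap(naive_gap, flagged_share, sensitivity, specificity):
    """Rescale a naive difference in means under assumed error rates.

    Assumes nondifferential misclassification of a binary variable.
    """
    youden = sensitivity + specificity - 1.0
    if youden <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    # Back out true prevalence from the observed share of flagged patients.
    true_share = (flagged_share + specificity - 1.0) / youden
    if not 0.0 < true_share < 1.0:
        raise ValueError("error rates are incompatible with the flagged share")
    ppv = sensitivity * true_share / flagged_share
    npv = specificity * (1.0 - true_share) / (1.0 - flagged_share)
    return naive_gap / (ppv + npv - 1.0)

# Hypothetical inputs: a naive gap of 3,300 with 14.75% of patients flagged.
naive_gap, flagged_share = 3_300.0, 0.1475
for sens in (0.60, 0.70, 0.80):
    for spec in (0.95, 0.99):
        est = corrected_gap(naive_gap, flagged_share, sens, spec)
        print(f"sens={sens:.2f} spec={spec:.2f} -> corrected gap {est:,.0f}")
```

Reporting the whole grid, rather than a single corrected number, makes it clear how much the conclusion leans on the assumed error rates.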
When Correction Gets Complicated
These methods assume classical error or known error structures. With non-classical error, corrections can make bias worse. If sicker patients are more likely to be coded as diabetic, adjusting for random misclassification will not fix the problem. You need to understand the source and pattern of error before choosing a correction approach.
Correction works when error is understood.
Economists emphasize quantifying measurement error direction and magnitude before interpreting any claims-based estimate.
Key Insight
These questions help identify measurement error threats and their likely direction. They structure thinking about when claims-based estimates can be trusted and when correction is needed.
How is the variable measured?
Diagnoses from claims depend on billing codes, which depend on provider behavior, patient presentation, and reimbursement incentives. Each step introduces potential error.
Is the error random or systematic?
Random errors in an independent variable shrink estimates toward zero. Systematic errors (like overcoding for sicker patients) create bias in unpredictable directions.
What is the reliability?
Validation studies comparing claims to chart review typically find sensitivities of 60-80% for chronic conditions. This implies substantial attenuation.
Is the error in X or Y?
Error in the independent variable (X) causes attenuation. Error in the dependent variable (Y) increases variance but does not bias coefficients if the error is classical.
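The contrast between error in X and error in Y can be seen in a repeated simulation. This is a sketch with hypothetical parameters and a continuous outcome in a simple linear regression. It compares the average slope and its spread across many samples.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, beta = 500, 2_000, 2.0

def slope(a, b):
    """OLS slope from regressing b on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

clean, noisy_y, noisy_x = [], [], []
for _ in range(reps):
    x = rng.normal(0.0, 1.0, n)
    y = 1.0 + beta * x + rng.normal(0.0, 1.0, n)
    clean.append(slope(x, y))
    noisy_y.append(slope(x, y + rng.normal(0.0, 2.0, n)))   # classical error in Y
    noisy_x.append(slope(x + rng.normal(0.0, 1.0, n), y))   # classical error in X

for name, draws in [("no error", clean), ("error in Y", noisy_y), ("error in X", noisy_x)]:
    print(f"{name:<10} mean slope {np.mean(draws):.3f}, spread {np.std(draws):.3f}")
```

With error in Y the slopes stay centered on the true value but vary more from sample to sample. With error in X they are centered on a value closer to zero.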
Key Takeaway
Noting "measurement issues" is not enough. Economists quantify the direction and magnitude of measurement error bias. Classical error in an independent variable shrinks effect estimates predictably. With validation data, you can correct for this attenuation. Without validation, sensitivity analysis shows how conclusions depend on error assumptions. Understanding measurement error transforms a vague caveat into a quantifiable threat that can be addressed.