Aaron Zhu
1 min read · Jul 18, 2022

Thanks for reading! Yes, as I mentioned in the post, a raw correlation between two variables alone is NOT enough to establish a causal relationship. A confounding variable (Z) is a feature that has meaningful causal relationships with both the target variable (Y) and another predictor (X) in the linear regression model. Therefore, to correctly estimate the effect of X on Y (the goal of the model), we should include Z in the model. If we omit Z, its explanatory effect on Y leaks into the error term, which then becomes correlated with X; as a result, the estimated effect of X is biased. For more details, see https://towardsdatascience.com/understand-bias-and-variance-in-causal-inference-with-linear-regression-a02e0a9622bc and https://towardsdatascience.com/causal-inference-with-linear-regression-endogeneity-9d9492663bac
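To make the omitted-variable point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration, not something from the post: the coefficients (0.8, 2.0, 1.5), the sample size, the seed, and the helper name `ols_coefs`. Z drives both X and Y, and the true effect of X on Y is set to 2.0.

```python
import numpy as np

def ols_coefs(features, y):
    """Least-squares coefficients, with an intercept as the first entry."""
    design = np.column_stack([np.ones(len(y))] + list(features))
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 100_000

# Assumed data-generating process (illustrative only):
z = rng.normal(size=n)                      # confounder Z
x = 0.8 * z + rng.normal(size=n)            # Z affects X
y = 2.0 * x + 1.5 * z + rng.normal(size=n)  # true effect of X on Y is 2.0

beta_x_omitted = ols_coefs([x], y)[1]    # Z left out of the model
beta_x_full = ols_coefs([x, z], y)[1]    # Z included in the model

print(f"X coefficient without Z: {beta_x_omitted:.3f}")
print(f"X coefficient with Z:    {beta_x_full:.3f}")
```

Under these assumed numbers, the estimate without Z should land near 2.0 + 1.5 × 0.8 / 1.64 ≈ 2.73, while the estimate with Z should land near the true 2.0. The gap is the part of Z's effect on Y that gets attributed to X when Z is left in the error term.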

In your example, I agree that unemployment applications should be excluded from the model. However, unemployment applications are NOT a valid predictor to begin with, let alone a confounding variable, because they don't meaningfully affect the unemployment rate (the target variable), even if they might affect the other predictor (interest rate).
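The contrast case can be sketched the same way. In this hypothetical setup, a variable W affects X but has no effect of its own on Y, so leaving it out should not bias the X coefficient. The coefficients, seed, and names (`w`, `ols_coefs`) are illustrative assumptions and are not fitted to any real unemployment or interest-rate data.

```python
import numpy as np

def ols_coefs(features, y):
    """Least-squares coefficients, with an intercept as the first entry."""
    design = np.column_stack([np.ones(len(y))] + list(features))
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
n = 100_000

# Assumed data-generating process (illustrative only):
w = rng.normal(size=n)              # affects X only, not a confounder
x = 0.8 * w + rng.normal(size=n)    # W affects X
y = 2.0 * x + rng.normal(size=n)    # W has no direct effect on Y

beta_x_without_w = ols_coefs([x], y)[1]
coefs_with_w = ols_coefs([x, w], y)
beta_x_with_w, beta_w = coefs_with_w[1], coefs_with_w[2]

print(f"X coefficient without W: {beta_x_without_w:.3f}")
print(f"X coefficient with W:    {beta_x_with_w:.3f}")
print(f"W coefficient:           {beta_w:.3f}")
```

Here both X estimates should sit close to the true 2.0 and the W coefficient close to zero, which is why dropping such a variable costs nothing in terms of bias.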


