Aaron Zhu
Jul 18, 2022

Thanks for reading! Yes, as I mentioned in the post, a raw correlation between two variables is NOT enough on its own to establish a causal relationship. A confounding variable (Z) is a feature that has meaningful causal relationships with both the target variable (Y) and another predictor (X) in the linear regression model. Therefore, to correctly estimate the effect of X on Y (the goal of the model), we should include Z in the model. If we omit Z from the model, the part of Y that Z explains leaks into the error term. Because Z is also related to X, that error term becomes correlated with X, and the estimated effect of X becomes biased. See more details here: https://towardsdatascience.com/understand-bias-and-variance-in-causal-inference-with-linear-regression-a02e0a9622bc; https://towardsdatascience.com/causal-inference-with-linear-regression-endogeneity-9d9492663bac
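
To make the omitted-variable argument concrete, here is a minimal simulation sketch with NumPy. The data-generating process is entirely made up for illustration: Z drives X with slope 0.8, and Y depends on X with a true effect of 2.0 and on Z with slope 3.0. Fitting OLS with and without Z shows the bias. In theory, the X coefficient from the short model drifts to about 2.0 + 3.0 × 0.8 / 1.64 ≈ 3.46.

```python
import numpy as np

def simulate_omitted_variable_bias(n=100_000, seed=0):
    """Hypothetical example: compare the X coefficient with and without confounder Z."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                      # confounder Z
    x = 0.8 * z + rng.normal(size=n)            # Z -> X
    y = 2.0 * x + 3.0 * z + rng.normal(size=n)  # X -> Y (true effect 2.0) and Z -> Y
    ones = np.ones(n)

    # Full model: Y ~ 1 + X + Z (confounder included)
    beta_full, *_ = np.linalg.lstsq(np.column_stack([ones, x, z]), y, rcond=None)
    # Short model: Y ~ 1 + X (Z omitted, so its effect leaks into the error term)
    beta_short, *_ = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)
    return beta_full[1], beta_short[1]

effect_with_z, effect_without_z = simulate_omitted_variable_bias()
print(f"X effect with Z:    {effect_with_z:.3f}")
print(f"X effect without Z: {effect_without_z:.3f}")
```

The first estimate should land near the true 2.0, while the second absorbs part of Z's effect on Y and overstates the effect of X.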

In your example, I agree that unemployment applications should be excluded from the model. However, unemployment applications are NOT a valid predictor to begin with, let alone a confounding variable, because they don't meaningfully affect the unemployment rate (the target variable), even if they might affect another predictor (the interest rate).
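
The same kind of sketch can illustrate this case. Here a hypothetical variable W stands in for unemployment applications, X for the interest rate, and Y for the unemployment rate. All coefficients are invented. W affects X but has no direct effect on Y, so dropping W should leave the X estimate essentially unbiased.

```python
import numpy as np

def simulate_non_confounder(n=100_000, seed=1):
    """Hypothetical example: W affects X only, so omitting W should not bias the X effect."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)                # W -> X only (not a confounder)
    x = 0.8 * w + rng.normal(size=n)
    y = 2.0 * x + rng.normal(size=n)      # Y depends on X alone (true effect 2.0)
    ones = np.ones(n)

    # Model including W: Y ~ 1 + X + W
    beta_with_w, *_ = np.linalg.lstsq(np.column_stack([ones, x, w]), y, rcond=None)
    # Model excluding W: Y ~ 1 + X
    beta_without_w, *_ = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)
    return beta_with_w[1], beta_without_w[1]

with_w, without_w = simulate_non_confounder()
print(f"X effect with W:    {with_w:.3f}")
print(f"X effect without W: {without_w:.3f}")
```

Both estimates should sit near 2.0. Including such a variable does not remove bias; it only absorbs some of the variation in X, which tends to make the X estimate less precise.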

Aaron Zhu

Senior Data Analyst | Always looking for new and exciting ways to turn complex data into actionable insights | https://www.linkedin.com/in/aaron-zhu-53105765/