Thanks for reading! Yes, as I mentioned in the post, a raw correlation between two variables alone is NOT enough to establish a causal relationship. A confounding variable (Z) is a feature that has a meaningful causal relationship with both the target variable (Y) and another predictor (X) in the linear regression model. Therefore, to correctly estimate the effect of X on Y (the goal of the model), we should include Z in the model. If we omit Z, its explanatory power over Y leaks into the error term, which then becomes correlated with X; as a result, the estimated effect of X is biased. See more details here: https://towardsdatascience.com/understand-bias-and-variance-in-causal-inference-with-linear-regression-a02e0a9622bc; https://towardsdatascience.com/causal-inference-with-linear-regression-endogeneity-9d9492663bac
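To make the omitted-variable point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration (the coefficients 0.8, 2.0, and 1.5, the sample size, and the function name `simulate_ovb` are hypothetical, not taken from the post): Z drives both X and Y, and we compare the estimated coefficient on X with and without Z in the regression.

```python
import numpy as np

def simulate_ovb(n=50_000, seed=0):
    """Compare the X coefficient with and without the confounder Z.

    All coefficients below are made-up values for illustration only.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                       # confounder Z
    x = 0.8 * z + rng.normal(size=n)             # Z affects the predictor X
    y = 2.0 * x + 1.5 * z + rng.normal(size=n)   # true effect of X on Y is 2.0

    ones = np.ones(n)
    # Regression that omits Z: Y ~ 1 + X
    beta_omit, *_ = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)
    # Regression that includes Z: Y ~ 1 + X + Z
    beta_incl, *_ = np.linalg.lstsq(np.column_stack([ones, x, z]), y, rcond=None)
    return beta_omit[1], beta_incl[1]

b_omit, b_incl = simulate_ovb()
print(f"X coefficient, Z omitted:  {b_omit:.3f}")
print(f"X coefficient, Z included: {b_incl:.3f}")
```

Under these assumed values, the regression without Z should overstate the effect of X by roughly 1.5 * Cov(X, Z) / Var(X) = 1.5 * 0.8 / 1.64, or about 0.73, while the regression with Z should recover something close to 2.0.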

In your example, I agree that unemployment applications should be excluded from the model. But unemployment applications are NOT a valid predictor to begin with, let alone a confounding variable, because they don't meaningfully affect the unemployment rate (the target variable), even if they might affect the other predictor (interest rate).
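A small sketch of this case, again with hypothetical numbers and names (`simulate_non_confounder`, the coefficients 0.8 and 2.0): a variable W that affects X but has no direct effect on Y. Because W does not belong in the equation for Y, leaving it out should not bias the coefficient on X.

```python
import numpy as np

def simulate_non_confounder(n=50_000, seed=1):
    """W affects X but not Y, so omitting W should not bias the X coefficient.

    All coefficients below are made-up values for illustration only.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)                 # affects X only
    x = 0.8 * w + rng.normal(size=n)       # W affects the predictor X
    y = 2.0 * x + rng.normal(size=n)       # Y depends on X, not directly on W

    ones = np.ones(n)
    # Regression without W: Y ~ 1 + X
    beta_without, *_ = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)
    # Regression with W: Y ~ 1 + X + W
    beta_with, *_ = np.linalg.lstsq(np.column_stack([ones, x, w]), y, rcond=None)
    return beta_without[1], beta_with[1], beta_with[2]

b_without, b_with, w_coef = simulate_non_confounder()
print(f"X coefficient, W omitted:  {b_without:.3f}")
print(f"X coefficient, W included: {b_with:.3f}")
print(f"W coefficient:             {w_coef:.3f}")
```

In this setup both X coefficients should land near 2.0 and the W coefficient near zero, which is the sense in which W is neither a valid predictor of Y nor a confounder.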

Data Science | Machine Learning | Economics Consulting https://www.linkedin.com/in/aaron-zhu-53105765/
