Prediction Or What If: What's the Nature of Your Problem?
With the rise of machine learning techniques, I believe we have become more susceptible to the 'law of the instrument.' This principle suggests that if your only tool is a hammer, every problem looks like a nail. While techniques like regression, classification, and tree-based models offer tremendous value to businesses and societies, they are not well-suited for a crucial type of problem: intervention.
In predictive modelling, given a set of features \(X\), we estimate the probability that \(Y\) is true: \(P(Y|X)\). The true conditional distribution is not directly known, so machine learning algorithms approximate it from historical data. The trained model is then used to make predictions. This approach works effectively in many tasks, such as facial recognition and image classification. However, the model's predictive power diminishes if the distribution that generated the training data changes.
This limitation is why predictive models are unsuitable for intervention problems. When a policy interacts with its subjects, such as through vaccine administration or advertising exposure, it alters the likelihood of a response, \(Y\), given \(X\). Relying on the pre-intervention conditional probability distribution can lead to inaccurate predictions. The probability we need to estimate is not \(P(Y|X)\) but \(P(Y|do(X))\), where 'do' signifies that \(X\) is set by intervention rather than merely observed. One might consider using a predictive model with a comparable group that has not undergone the intervention. However, once an action is taken on a subject, we can no longer observe that subject's counterfactual, or 'what-if,' outcome. Predictive models would suffice only in an idealized world where data for both scenarios were available.
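To make the gap between observing an exposure and imposing it concrete, here is a minimal Python simulation. It is a sketch, not drawn from any real dataset: the probabilities and the names `simulate`, `mean_y`, and the hidden trait `z` are all assumptions chosen for illustration. A hidden trait makes subjects both more likely to be exposed and more likely to respond, so the response gap in historical data overstates what exposing everyone would achieve.

```python
import random


def simulate(n, intervene=None, seed=0):
    """Draw n (z, x, y) samples from a hypothetical data-generating process.

    z is a hidden trait (e.g. prior engagement), x is exposure, y is response.
    With intervene=None, exposure depends on z (historical data).
    Otherwise exposure is forced to the given value for every subject.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        if intervene is None:
            # Engaged subjects are exposed far more often (assumed 0.8 vs 0.2).
            x = 1 if rng.random() < (0.8 if z else 0.2) else 0
        else:
            x = intervene
        # Assumed response model: exposure adds 0.1, the hidden trait adds 0.5.
        p_y = 0.1 + 0.1 * x + 0.5 * z
        y = 1 if rng.random() < p_y else 0
        rows.append((z, x, y))
    return rows


def mean_y(rows, x_value):
    """Average response among subjects whose exposure equals x_value."""
    ys = [y for _, x, y in rows if x == x_value]
    return sum(ys) / len(ys)


# What a predictive model sees: exposed vs unexposed in historical data.
obs = simulate(200_000)
observational_gap = mean_y(obs, 1) - mean_y(obs, 0)

# What an intervention does: exposure is set for everyone, regardless of z.
do1 = simulate(200_000, intervene=1, seed=1)
do0 = simulate(200_000, intervene=0, seed=2)
interventional_gap = mean_y(do1, 1) - mean_y(do0, 0)

print(f"observational gap:  {observational_gap:.3f}")
print(f"interventional gap: {interventional_gap:.3f}")
```

Under these assumed parameters the observational gap comes out near 0.4, while the interventional gap is near 0.1, the effect built into the simulation. A model trained on the historical data would be accurate as a predictor and still mislead anyone who read the gap as the payoff of acting.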
Many business problems are, in fact, intervention problems. This includes areas like marketing, UI/UX design, policy design, and customer service. To assess the effectiveness of a particular campaign, layout, rule set, or service script, we must compare it with its counterfactual scenario: what if we had not run the campaign or altered the layout, rules, or scripts? The standard approach is a randomized experiment. However, experiments are not always feasible; they can be costly and time-consuming. Therefore, I believe businesses should arm themselves with another tool, causal inference, and learn to distinguish between prediction problems and intervention problems.
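As one illustration of what such a tool can look like when an experiment is not available, the sketch below applies stratification (often called backdoor adjustment) to simulated observational data. It assumes that the only confounder, `z`, is measured, which is a strong assumption that real data rarely guarantees. The data-generating numbers and the helper `rate` are hypothetical.

```python
import random

rng = random.Random(42)

# Simulated observational data (all parameters are assumptions).
data = []
for _ in range(200_000):
    z = int(rng.random() < 0.5)                      # measured confounder
    x = int(rng.random() < (0.8 if z else 0.2))      # campaign exposure depends on z
    y = int(rng.random() < 0.1 + 0.1 * x + 0.5 * z)  # built-in effect of x is +0.1
    data.append((z, x, y))


def rate(rows):
    """Share of rows with a positive response."""
    return sum(y for _, _, y in rows) / len(rows)


# Naive comparison: exposed vs unexposed, ignoring the confounder.
naive = rate([r for r in data if r[1] == 1]) - rate([r for r in data if r[1] == 0])

# Stratified comparison: compare exposed vs unexposed within each level of z,
# then average the per-stratum differences weighted by the stratum's share.
adjusted = 0.0
for z_value in (0, 1):
    stratum = [r for r in data if r[0] == z_value]
    weight = len(stratum) / len(data)
    treated = [r for r in stratum if r[1] == 1]
    control = [r for r in stratum if r[1] == 0]
    adjusted += weight * (rate(treated) - rate(control))

print(f"naive estimate:    {naive:.3f}")
print(f"adjusted estimate: {adjusted:.3f}")
```

In this simulated setting the adjusted estimate lands close to the built-in effect of 0.1, while the naive one does not. The adjustment only works here because the confounder is known and recorded; with unmeasured confounders, a randomized experiment or a different identification strategy would be needed.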