What are we actually learning?
- andreiluchici

Executive Summary
In the domain of supervised machine learning, the objective is to train a model that can map a given set of inputs to a desired set of outputs. While this process is often described as "learning," it is critical for strategic decision-making to understand the precise nature of this learned artefact. This post establishes that a standard supervised learning model does not learn a true, mechanistic model of the world. Instead, it learns a highly complex associative relation, a sophisticated statistical mapping that links input patterns to output labels based on correlations present in the training data. This distinction is not merely academic; it has profound implications for model reliability, robustness, and strategic utility. Models built on association are exceptionally powerful for prediction within stable environments but are inherently brittle when faced with novel conditions or shifts in the underlying data-generating process. The key takeaway for executive leadership is to view these models as powerful pattern-matching instruments that augment, rather than replace, human domain knowledge and causal reasoning.
1. Introduction
The proliferation of machine learning (ML), particularly supervised learning, has enabled significant advancements in automation and prediction across industries. The fundamental paradigm involves providing a model with a dataset of input-output pairs (e.g., customer demographics and their purchase history) and tasking it with learning a generalizable rule. This rule, or mapping, can take many forms: a complex multivariate function in a neural network, a series of hierarchical decisions in a decision tree, or a set of logical clauses in an inductive logic program.
Common business parlance might suggest the model "understands" the relationship between inputs and outputs. However, this report posits a more precise and cautious interpretation, aligned with the scientific consensus. The learned mapping is best characterised as an associative relation, a mathematical construct optimised to correlate specific input features with specific outputs. This analysis will deconstruct this concept, contrasting it with a true causal model of a system, and explore the strategic business implications that arise from this critical distinction.
2. Analysis: The Nature of the Learned Mapping
The core of supervised learning is function approximation. Given a dataset of 𝑛 examples:
D = {(x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)}
where xᵢ is an input vector and yᵢ is the corresponding output, the goal is to learn a function f such that f(x) ≈ y.
The learning process is one of optimisation, where the parameters of the function f are adjusted to minimise a loss function, a measure of the discrepancy between the model's predictions f(xᵢ) and the actual values yᵢ.
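To make this concrete, the sketch below fits a deliberately simple linear hypothesis to synthetic data by gradient descent. The data, learning rate, and iteration count are illustrative assumptions, not a prescription; richer models differ in scale, not in kind.

```python
# A minimal sketch of supervised learning as loss minimisation, assuming a
# linear hypothesis f(x) = w*x + b and a mean-squared-error loss.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset D = {(x_i, y_i)}: the underlying relation is y = 3x + 1 plus noise.
x = rng.uniform(0, 10, size=200)
y = 3 * x + 1 + rng.normal(0, 1, size=200)

w, b = 0.0, 0.0   # parameters of f, adjusted during optimisation
lr = 0.01         # learning rate

for _ in range(2000):
    err = (w * x + b) - y            # discrepancy f(x_i) - y_i
    # Gradient of the mean squared loss with respect to w and b.
    w -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)

print(f"learned f(x) = {w:.2f}x + {b:.2f}")   # approaches y = 3x + 1
```

Everything the model "knows" at the end is the pair of numbers w and b; it has matched the data, not explained it.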
2.1. Learning as Statistical Association
The optimisation process is agnostic to any underlying physical, economic, or social mechanisms that connect x to y. The algorithm's sole directive is to find a functional form that minimises prediction error on the provided data. Consequently, the model becomes an expert at identifying and exploiting statistical correlations, regardless of their origin.
Correlation vs. Causation: The most fundamental principle is that the model learns correlation, not causation. A classic example is the correlation between ice cream sales and drowning incidents. An ML model trained to predict one from the other would learn a strong positive relationship. It would not, however, learn the hidden causal factor (i.e., warm weather) that drives both. The model's "knowledge" is purely associative: more of input A is statistically associated with more of output B.
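A minimal simulation of this example, with invented numbers, shows how readily an associative fit emerges from a hidden common cause:

```python
# Toy illustration: temperature (the hidden cause) drives both ice cream
# sales and drownings, yet one predicts the other. All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
temperature = rng.uniform(10, 35, size=500)            # hidden common cause

ice_cream_sales = 20 * temperature + rng.normal(0, 40, 500)
drownings = 0.3 * temperature + rng.normal(0, 1.5, 500)

# The two effects are strongly correlated...
print(np.corrcoef(ice_cream_sales, drownings)[0, 1])   # roughly 0.8

# ...so a least-squares fit of drownings on sales "works" predictively,
# even though banning ice cream would plainly not prevent drownings.
slope, intercept = np.polyfit(ice_cream_sales, drownings, 1)
print(f"drownings = {slope:.4f} * sales + {intercept:.2f}")
```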
Spurious Correlations: High-dimensional data is rich with spurious correlations, patterns that appear statistically significant by chance but have no meaningful connection. A sufficiently complex model can easily learn these statistical artefacts, treating them as valid predictive signals. This can lead to a model that performs well on historical data but is based on nonsensical or unstable relationships.
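The sketch below, again with synthetic data, shows how chance alone produces strong correlations once the feature count is large relative to the sample size:

```python
# Spurious correlation in high dimensions: among enough random features,
# some will correlate strongly with a random target purely by chance.
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_features = 50, 10_000

X = rng.normal(size=(n_samples, n_features))   # pure-noise features
y = rng.normal(size=n_samples)                 # pure-noise target

# Correlation of each feature with the target; none has any real connection.
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
print(f"strongest spurious correlation: {np.abs(corrs).max():.2f}")  # often above 0.5
```

A flexible model handed these features would happily treat the strongest of them as predictive signal.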
2.2. A Mapping, Not a World Model
A genuine model of the world, in a scientific sense, attempts to describe the underlying data-generating process. It posits mechanisms, principles, and causal links that explain why an output is produced from an input. Newton's law of universal gravitation, F = G (m₁m₂ / r²), is a world model; it describes a fundamental mechanism of the universe.
In contrast, a supervised ML model is an instrumental artefact. It does not propose a mechanism. It is a "black box" that executes a complex mathematical transformation learned from data. Its success is measured by its predictive accuracy (instrumental performance), not its explanatory power (realistic fidelity). This leads to two critical limitations:
The Problem of Extrapolation: An associative model is effective at interpolation, making predictions for new inputs that are similar to those in the training data. It is notoriously poor at extrapolation, making predictions for inputs outside the distribution of the training data. Because it has not learned the underlying rules of the system, it has no basis upon which to reason about novel scenarios; the sketch following this list makes the failure concrete.
Brittleness to Distribution Shift: The learned associations are only valid as long as the statistical properties of the real world remain consistent with the training data. A distribution shift—a change in the underlying data-generating process due to new market conditions, a global event, or evolving customer behaviour—can invalidate the learned correlations, causing a previously accurate model to fail abruptly and silently.
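As a concrete illustration of the extrapolation problem, the sketch below trains a decision tree (one illustrative model choice among many) on a simple linear trend and then queries it outside the training range:

```python
# Extrapolation failure: inside the training range the fit is good; outside
# it, the tree can only repeat the boundary values it has memorised.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
x_train = rng.uniform(0, 10, size=(500, 1))            # inputs in [0, 10]
y_train = 3 * x_train.ravel() + rng.normal(0, 1, 500)  # true rule: y = 3x

model = DecisionTreeRegressor(max_depth=6).fit(x_train, y_train)

for x_new in [5.0, 9.0, 20.0, 50.0]:
    pred = model.predict([[x_new]])[0]
    print(f"x = {x_new:5.1f}   true = {3 * x_new:6.1f}   predicted = {pred:6.1f}")
# At x = 20 and x = 50 the true values are 60 and 150, yet the prediction
# stays near 30: the model learned an association over [0, 10], not y = 3x.
```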
3. Discussion: Strategic Implications for Business
Understanding that ML models are associative mappers, not causal reasoners, is essential for mitigating risk and maximising their strategic value.
3.1. Redefining the Role of AI in Decision-Making
Executives should view ML models as powerful tools for automating and scaling pattern recognition, not as replacements for strategic thinking.
Use Case Suitability: Models are best suited for stable, well-defined prediction tasks where the environment is not expected to change dramatically (e.g., image classification, audio transcription). They are less suited for strategic "what-if" analysis (e.g., "How will a change in our pricing strategy affect customer loyalty?"), which requires causal understanding.
Augmenting Human Expertise: The most effective use of ML is to augment human experts. A model can identify subtle patterns in vast datasets that a human might miss, but a human expert, possessing deep domain knowledge and causal reasoning, must interpret those patterns, validate their relevance, and make the final strategic decision.
3.2. Managing Model Risk
The "associative, not causal" nature of ML is a primary source of model risk.
Continuous Monitoring: Models must be continuously monitored for performance degradation. A drop in accuracy is often the first sign of a distribution shift, indicating that the learned associations are no longer valid.
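In practice, monitoring can begin with simple distributional checks, as in the sketch below, which compares a training-time feature sample against a production sample using a two-sample Kolmogorov-Smirnov test; the feature, sample sizes, and alert threshold are illustrative assumptions:

```python
# A minimal drift-monitoring sketch: compare a reference sample of a feature
# (from training time) against a fresh production sample.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)
train_feature = rng.normal(loc=50, scale=10, size=5_000)  # training-time sample
live_feature = rng.normal(loc=55, scale=12, size=1_000)   # shifted production sample

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"possible distribution shift (KS = {stat:.3f}, p = {p_value:.1e}); "
          "investigate before trusting the model's predictions")
```

Production systems typically track many such signals across features and outputs, but the principle is the same: detect when the world stops resembling the training data.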
Avoiding Overconfidence: High accuracy scores on a test dataset can breed a false sense of security. Leaders must question why the model is working and understand the key features it relies on. If the model is leveraging spurious or unstable correlations, it represents a significant hidden risk. For instance, a loan approval model that associates zip codes with creditworthiness is not learning about an individual's financial responsibility but is instead encoding potentially biased and unstable demographic correlations.
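One practical way to ask why the model is working is to measure how much each feature actually drives its predictions. The sketch below builds a synthetic loan dataset in which a zip-code proxy leaks into the label, then inspects the fitted model with scikit-learn's permutation importance; all names and numbers are invented for illustration:

```python
# A toy loan-approval dataset where a demographic proxy (zip_risk) leaks
# into the label, inspected with permutation importance. Entirely synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(5)
n = 2_000

income = rng.normal(50, 15, n)        # a legitimate financial signal
zip_risk = rng.integers(0, 2, n)      # a demographic proxy variable

# The label is driven partly by income and partly by the proxy.
approved = (0.03 * income + 1.5 * zip_risk + rng.normal(0, 1, n) > 2.5).astype(int)

X = np.column_stack([income, zip_risk])
model = LogisticRegression().fit(X, approved)

imp = permutation_importance(model, X, approved, n_repeats=20, random_state=0)
for name, score in zip(["income", "zip_risk"], imp.importances_mean):
    print(f"{name:10s} importance = {score:.3f}")
# A dominant zip_risk importance flags that the model leans on an unstable
# demographic correlation rather than individual financial behaviour.
```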
4. Conclusion
The output of a supervised machine learning process is not a comprehensive model of reality. It is a highly optimised, often complex, associative mapping between inputs and outputs. This mapping is constructed from the statistical correlations present in historical data and is devoid of any inherent causal or mechanistic understanding. While this makes it an incredibly powerful tool for prediction and automation in stable environments, its associative nature also renders it brittle and incapable of true extrapolation.
For business leaders, the strategic imperative is to harness the predictive power of these models while remaining vigilant about their fundamental limitations. Success lies not in treating these systems as autonomous decision-makers but in integrating them as sophisticated instruments that serve and augment human expertise, insight, and strategic judgment. The ultimate understanding of "why" remains a human endeavour.
