Most operators believe they are negotiating the economics of a data deal. They are not. They are discovering economics that were set, in their final form, years before anyone opened a term sheet. The rake – the slice the asset owner keeps versus the slice that flows to whoever turns the asset into something a buyer pays for – is set by one thing: where the enrichment work sits. The contract records what the architecture already decided.
This is the move most boards miss, and the cost is the entire commercial position. Operators sign a partnership, hand over raw or lightly processed data, and discover eighteen months later that the partner keeps seventy percent of the economics on an asset the partner does not own. They blame the contract. The contract was a lagging indicator of an architectural decision made years earlier; the rake was already set. Renegotiation at renewal does not move it.
Enrichment is the work that takes a corpus from “I have it” to “a buyer will pay for it.” Cleaning, structuring, modelling, attaching identifiers, building the inferential layer that makes a raw signal commercially legible. There is a narrow class of assets – real-time exchange feeds, exclusive regulated data – where the substrate is itself the product and enrichment is trivial. Almost nothing else qualifies. In the database era enrichment was mechanical and the rake question rarely surfaced; the value sat in the join, and the join sat with whoever held the identity spine. In the AI era enrichment is where model-relevant signal is constructed, and the construction is most of the value. Whoever does the construction keeps the value. This is not strategy. It is arithmetic.
Three places the work can sit. Each fixes a different rake.
The asset owner can do the enrichment itself. It invests in its own enrichability and ships a product, not a substrate. The rake stays high because the buyer is paying for actual work the buyer cannot reconstruct. Dunnhumby reached this position with Tesco’s basket data, turning it into a media and insights product Tesco alone could not have sold. The economics that followed could not be eroded by a partner who had done the hard part.
Or a partner does it. The owner ships raw data into a joint venture, a clean room, a specialist that faces the buyer. The rake collapses. The owner does not understand why until renewal and then frames the result as a partnership problem. It is not. The partner is doing the work that makes the asset commercial. Every dollar the buyer pays is paid for what the partner built. The owner could not have monetised the asset alone. The split tracks the work. Most retailers, most banks, most telcos sit here, and most of them spend a decade pretending the next round of commercial terms will fix what the architecture chose.
Or the buyer does it. This is the configuration foundation model training contracts have made suddenly visible. The buyer ingests the asset and processes it internally against its own models. The rake collapses to commodity pricing on the raw feed. The asset, in the form it ships, is interchangeable with any other source of similar signal. Reddit licenses its corpus to Google for a reported sixty million dollars a year, and Google enriches that corpus into something worth multiples of the fee inside its own models. The buyer captures the upside. The owner is paid for the substrate, not the product, and the price tracks what the buyer would pay to substitute the substrate elsewhere.
The principle reaches every mode. An operator who decides to Sell without first deciding where enrichment will sit is not deciding to sell; they are deciding to be a supplier to whoever decides to enrich. An operator who decides to Wrap without enrichment internalised is wrapping somebody else’s product, and the rake is whatever that somebody else permits. Improve is the only mode where the rake question stays internal, and even there it returns the moment the improvement is exposed to a partner. No mode lets you postpone the enrichment-location decision.
Take any data revenue line and ask one question. If the partner or buyer stopped doing the enrichment tomorrow, would the buyer pay the same price for the asset by itself? If yes, the rake is yours and the architecture is sound. If no, the rake belongs to whoever does the work, and the contract is a lagging indicator of a decision made years ago. Most operators, running this test honestly, discover they are not in the data business they thought they were in. They are supplying to whoever does the enriching. The data business sits one layer downstream, owned by someone else.
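The test can be written down as a few lines of arithmetic. What follows is a minimal sketch, not a valuation method: the `RevenueLine` fields, the `rake_holder` function, and the five percent `tolerance` are all hypothetical names and values introduced here for illustration, and the two prices have to come from the operator's own honest estimate.

```python
from dataclasses import dataclass


@dataclass
class RevenueLine:
    """One data revenue line (hypothetical structure, for illustration only)."""
    name: str
    price_with_enrichment: float  # what the buyer pays today
    price_asset_alone: float      # estimated price if outside enrichment stopped tomorrow


def rake_holder(line: RevenueLine, tolerance: float = 0.05) -> str:
    """Apply the one-question test.

    Returns "owner" if the buyer would pay roughly the same price for the
    asset by itself, otherwise "enricher". The tolerance is an assumption:
    the test in the text asks for the *same* price, and 5% slack is only a
    convenience for noisy estimates.
    """
    if line.price_with_enrichment <= 0:
        raise ValueError("price_with_enrichment must be positive")
    retained = line.price_asset_alone / line.price_with_enrichment
    return "owner" if retained >= 1.0 - tolerance else "enricher"


# Illustrative numbers only; they are not drawn from any real contract.
feed = RevenueLine("raw transaction feed", price_with_enrichment=100.0, price_asset_alone=30.0)
print(feed.name, "->", rake_holder(feed))
```

Run across every revenue line, the output is a blunt list of which lines the operator actually owns and which it merely supplies.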
Enrichment moves toward whoever invests in it. An asset enriched today by a partner is enriched tomorrow by the buyer, the moment the buyer’s internal capability can do without the partner. The default drift is toward the buyer, who is closer to the use case and whose incentive to invest is permanent. The owner who does not invest watches the rake migrate.
Three structural defences slow this, and they are the only three. Each is a form of prior investment that no single buyer can independently replicate. The first is cross-customer scale: Bloomberg has done enrichment on financial data for forty years and no trading desk has internalised it; the cross-issuer, cross-counterparty view depends on being outside any single buyer. The credit bureaus sit here. Nielsen sits here for television measurement. The second is regulatory authorisation: where enrichment requires a licence buyers cannot independently hold – PHI tokenisation, certain banking compliance flows, certain rights-cleared content – the licence itself is the moat. The third is real-time operational integration: where the cost of failure exceeds the cost of dependence – weather feeds inside airline operations, exchange feeds inside trading systems, authorisation feeds inside payment networks – the risk arithmetic forbids internalisation.
If your enrichment edge is one of those three, the rake is durable. If it is not, the trajectory applies. The default for a data licensing business in 2026 is the trajectory. The exceptions are a small, identifiable set, and most operators who think they sit inside one of them do not.
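The same classification can be stated as a small lookup. This is a sketch under stated assumptions: the `Defence` enum and the `rake_outlook` function are hypothetical names, and the hard part, deciding honestly whether an enrichment edge really belongs to one of the three classes, stays with the operator and is not something the code can do.

```python
from enum import Enum
from typing import Optional


class Defence(Enum):
    """The three structural defences named in the text (enum is illustrative)."""
    CROSS_CUSTOMER_SCALE = "cross-customer scale"
    REGULATORY_AUTHORISATION = "regulatory authorisation"
    REALTIME_OPERATIONAL_INTEGRATION = "real-time operational integration"


def rake_outlook(defence: Optional[Defence]) -> str:
    """Return the outlook for the rake given the claimed defence, if any.

    None means the enrichment edge is not one of the three defences,
    so the default trajectory toward the buyer applies.
    """
    if defence is None:
        return "trajectory applies: rake migrates toward the buyer"
    return f"durable: protected by {defence.value}"


print(rake_outlook(Defence.REGULATORY_AUTHORISATION))
print(rake_outlook(None))
```

The function deliberately has no fourth branch, because the argument above admits only three defences.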
Without investment, today’s rake is a depreciating lease.
The power to invert this is always with the asset owner. The owner decides the boundary of the asset, the level of enrichment, and the form in which the asset ships. The combination determines the rake. The decision is architectural, not contractual. Walmart Connect is the visible case: Walmart built an anonymised, enriched, productised advertising layer on its own first-party data and sold directly to brands. It did not negotiate a better rake from the agencies; it built a position from which the agencies were never going to extract one. Most asset owners do not make these choices deliberately. They drift into letting the partner do the enrichment and frame the resulting low rake as a partnership problem. The architecture they did not design is the architecture they got.
The correction is not commercial. It is architectural. The operator internalises the enrichment, builds the enrichability into the asset itself, or accepts the rake the current configuration produces and stops being surprised by it. There is no fourth option. There is no negotiation that substitutes for the design decision, and the operator who keeps signing partnerships hoping the next term sheet will rescue what the architecture gave away will be wrong again at the next renewal.
The rake is decided before the negotiation. The negotiation only discovers it.