Elite Data Engineering Manifesto
Learn about the principles of Elite Data Engineering that enhance modern data management practices.
While I am not an economist, I have been wondering how to make sense of terms like domain ownership and data marketplace, connecting the dots among difficult concepts and clearing up common misconceptions.
We want our company’s domains to take full ownership of their data because a technology-oriented ownership model breaks things.
At the same time, each valuable data asset is produced by a single domain, which establishes a monopoly regime in the internal data market.
For example, billing information cannot be generated by two different departments of the company, and its meaning doesn’t depend on the technology storing it. Thus, we want to move ownership to the domain that understands this data and can manage it, in whatever form the company needs, along its whole lifecycle.
A market is an environment (virtual or physical) where sellers and buyers meet to exchange goods and services through money.
This doesn’t apply to data within a company. No one sells data; everyone serves or copies it. Cost management disciplines can help identify the cost of producing data in order to introduce a chargeback for consumers, but this is not equivalent to associating a commercial value with data and selling it at the best price.
So, where is the market? What do we exchange? What’s the meaning of value?
Data valuation is a discipline devoted to associating a commercial value with data. Several techniques exist, from net present value (NPV) to cost-based estimation (which defaults to cost management practices). Nonetheless, they are not used as a day-to-day tool to monetise data exchange internally. They are oriented more towards estimating the value of large data initiatives or corporate capital.
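To make NPV concrete, here is a minimal sketch, assuming invented cash flows and an invented discount rate, of how the expected future returns attributable to a data initiative could be discounted to a present value:

# Minimal NPV sketch for valuing a data initiative.
# Cash flows and discount rate are hypothetical figures, not real estimates.
def net_present_value(cash_flows, discount_rate):
    # Discount each year's expected incremental profit back to today.
    return sum(cf / (1 + discount_rate) ** year for year, cf in enumerate(cash_flows, start=1))

expected_cash_flows = [120_000, 150_000, 180_000]  # assumed yearly returns of the initiative
print(round(net_present_value(expected_cash_flows, discount_rate=0.08)))  # ≈ 382,600 in this invented example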
The internal data market provides neither competition (domain ownership is a monopoly) nor money exchange. Keep these elements in mind when we get to the marketplace and data value.
Data only converts to value if it is used to make business decisions. Otherwise, it is just a liability.
We will see why below.
Making business decisions requires the whole data value chain to function, not just a single data set. A decision can be something like investing, saving, entering a market, hiring, improving operations, or reducing waste.
Every decision should correspond to a certain expected economic value. Part of it can be attributed to the data that helped make the decision. This portion could be redistributed back along the data value chain to estimate the economic value of each data set contributing to the information needed to make the decision. Let’s start from the simplest case: a single monolithic program producing the data necessary to make the decision.
Consider the case of KPIs uniquely built from operational sources through KPI-specific applications.
The following combination is the data value chain to produce a single KPI:
DATA_VALUE_CHAIN = DATA_SOURCES + DATA_PROGRAM + FINAL_DATA
The value can be attributed to the decision makers and the set of KPIs used to make the decision. If the value is distributed equally across the data that was used, we can recognize the weight with which each data set contributes to the data value chain.
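As a sketch of this retro-attribution, here is a toy example with an invented decision value and hypothetical weights for the elements of one KPI's data value chain (none of the names or figures come from a real case):

# Toy retro-attribution of a decision's expected value back along the data value chain.
decision_value = 1_000_000  # expected economic value of the decision (assumed figure)

kpis = ["churn_rate", "revenue_per_customer"]  # KPIs the decision makers actually used
kpi_value = decision_value / len(kpis)         # equal attribution across the used KPIs

# Hypothetical weights for the chain behind one KPI: sources and the KPI-specific program.
churn_value_chain = {"crm_source": 0.4, "billing_source": 0.4, "churn_kpi_program": 0.2}

attribution = {element: kpi_value * weight for element, weight in churn_value_chain.items()}
print(attribution)  # {'crm_source': 200000.0, 'billing_source': 200000.0, 'churn_kpi_program': 100000.0}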
From this perspective, the value of the generated data is easy to attribute to monolithic applications, because each of them is solely responsible for generating the final information necessary to make a decision.
Let’s look at the cost of producing data.
It's easy to recognize a certain redundancy in ingestion costs, while it is not clear how much redundancy resides in the business logic that produces the final information. The only guarantee is that the KPIs necessary to make decisions are identified, along with the decision makers. This means that, even if establishing the value of a decision is difficult, this value is fixed once the decision makers and the available information are given.
The return of investment is defined by the following relationship:
ROI = NET_PROFIT / COST_OF_INVESTMENT
It is clear that to increase the ROI we must reduce operating expenses (OPEX), which in turn increases the net profit; this means having a cost-efficient data value chain.
On the other hand, to reduce the cost of investment we must be able to build data sets with low effort (low CAPEX) while avoiding a negative impact on OPEX. That is, companies can try to save as much as possible when building, but what they build must not be expensive to run.
In summary, even if data valuation is complex, we can improve the ROI by reducing both the cost of building the data value chain (CAPEX) and the cost of running it (OPEX).
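A worked sketch with invented figures shows how both levers move the ROI defined above: lowering OPEX raises the net profit, and lowering CAPEX shrinks the cost of investment:

# Hypothetical figures only, to illustrate the ROI relationship defined above.
def roi(net_profit, cost_of_investment):
    return net_profit / cost_of_investment

value_from_decisions = 500_000   # economic value enabled by the data (assumed)
opex, capex = 200_000, 250_000   # running and building costs of the data value chain (assumed)

print(roi(value_from_decisions - opex, capex))              # baseline ROI: 1.2
print(roi(value_from_decisions - 0.8 * opex, 0.8 * capex))  # 20% cost cuts push the ROI to 1.7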
What if we reduce the costs of data construction and data operations indefinitely? Surely, we are going to affect KPI quality, availability, delivery, and so on. That is, poor data quality affects the ability of decision makers to make informed decisions.
Every natural monopoly needs regulations to work. This happens in the utility industry (for instance) whether the service is energy, water, or gas.
Thus, we can update our partial conclusion with the following statement:
A data management system can improve the ROI attributable to data by reducing the cost of building and running the data value chain (CAPEX + OPEX), with quality constraints ensuring that what we build is usable.
These constraints are the equivalent of the regulations needed for natural monopolies to avoid negative impacts on consumers.
The data value chain is given by a data management paradigm (DWH, data lake, data mesh, lakehouse, mix of them) and a set of data practices (data quality, data privacy, data lineage, etc.).
Whatever data management paradigm is chosen by a company, the purpose of that choice should be to reduce the TCO (CAPEX + OPEX) under quality constraints.
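As a sketch of that selection criterion, the comparison below uses made-up paradigms, costs, and a made-up minimum quality threshold purely to show the shape of a TCO-under-constraints decision:

# Hypothetical comparison of paradigms by TCO, subject to a minimum quality constraint.
candidates = [
    {"paradigm": "data_warehouse", "capex": 400_000, "opex": 300_000, "quality_score": 0.92},
    {"paradigm": "data_lake",      "capex": 250_000, "opex": 350_000, "quality_score": 0.78},
    {"paradigm": "data_mesh",      "capex": 350_000, "opex": 220_000, "quality_score": 0.90},
]

MIN_QUALITY = 0.85  # the "regulation" protecting consumers of the internal monopoly

usable = [c for c in candidates if c["quality_score"] >= MIN_QUALITY]
best = min(usable, key=lambda c: c["capex"] + c["opex"])   # lowest TCO among usable options
print(best["paradigm"])  # 'data_mesh' here: a TCO of 570k beats 700k, and the lake fails the quality bar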
The marketplace is not a market in the common sense. Since producers and consumers are not sellers and buyers, they don’t exchange money. Still, the value they exchange is retro-attributed from the ROI of any initiative coming from a decision maker. The only way to increase this value is through the minimization of the total cost of ownership of the data value chain while guaranteeing the right quality.
Every data management paradigm represents a means to reduce the TCO by applying principles such as reusability, self-service capabilities, and domain ownership.
Since we don’t want to reduce quality, there should be a mechanism to regulate the minimum quality required to deliver usable data. This can be guaranteed through principles such as computational governance and governance shift-left.
Designing, implementing, and monitoring the reusability, self-service, domain ownership, computational governance, and governance shift-left principles corresponds to managing the data value chain through strict governance. KPIs showing how effectively we are managing our data management paradigm and data practices are a proxy for the improvement of the data ROI.
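A minimal sketch of what a computational governance check could look like, with hypothetical quality metrics and thresholds, run before a data set is allowed into the marketplace:

# Hypothetical governance policy: block publication of data sets below the regulated minimum.
POLICY = {"min_completeness": 0.95, "max_freshness_hours": 24, "schema_documented": True}

def passes_governance(metrics: dict) -> bool:
    # The data set must satisfy every minimum quality requirement to be published.
    return (
        metrics["completeness"] >= POLICY["min_completeness"]
        and metrics["freshness_hours"] <= POLICY["max_freshness_hours"]
        and metrics["schema_documented"] == POLICY["schema_documented"]
    )

billing_kpis_metrics = {"completeness": 0.97, "freshness_hours": 6, "schema_documented": True}
print(passes_governance(billing_kpis_metrics))  # True: the data set can enter the marketplace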
Now, we have established principles to build valuable data through a governed data value chain. How can we exchange value?
Using data.
The data marketplace provides all the capabilities a data consumer needs.
It is the collector of all data generated under controlled conditions: data consumers enter the marketplace to get orientation, find information, and access data for direct consumption.
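A sketch of those consumer-facing capabilities as a minimal interface; the class, method, and data set names are illustrative, not the API of any real product:

# Illustrative marketplace interface exposing the consumer capabilities described above:
# orientation, discovery, and direct access (all names are hypothetical).
from dataclasses import dataclass, field

@dataclass
class DataMarketplace:
    catalog: dict = field(default_factory=dict)  # data set name -> metadata

    def publish(self, name: str, owner_domain: str, description: str, uri: str):
        # Only governed data sets should reach the catalog (governance checks omitted here).
        self.catalog[name] = {"owner": owner_domain, "description": description, "uri": uri}

    def browse(self) -> list:
        # Orientation: what data exists and which domain owns it.
        return [(name, meta["owner"]) for name, meta in self.catalog.items()]

    def search(self, keyword: str) -> list:
        # Discovery: find data sets whose description mentions the keyword.
        return [n for n, m in self.catalog.items() if keyword.lower() in m["description"].lower()]

    def access(self, name: str) -> str:
        # Direct consumption: return the location where the data can be read.
        return self.catalog[name]["uri"]

marketplace = DataMarketplace()
marketplace.publish("billing_kpis", "billing_domain", "Monthly billing KPIs", "s3://example/billing")
print(marketplace.search("billing"), marketplace.access("billing_kpis"))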
This article explores the relationship among interwoven concepts such as data value, the marketplace, and domain ownership, going through the natural monopoly regime that exists within knowledge domains and how it affects an effective data value chain.
If you enjoyed this article, consider subscribing to our Newsletter. It's packed with insights, news about Witboost, and all the knowledge we publish!