Elite Data Engineering Manifesto
Learn about the principles of Elite Data Engineering that enhance modern data management practices.
While I am not an economist, I have been wondering how to make sense of terms like domain ownership and data marketplace, connecting the dots among difficult concepts and clearing up common misconceptions.
We want our company’s domains to take full ownership of their data because a technology-oriented ownership model breaks things.
At the same time, each valuable data asset is produced by a single domain, which establishes a monopoly regime in the internal data market.
For example, billing information cannot be generated by two different departments of the company, and its meaning doesn’t depend on the technology storing it. Thus, we want to move ownership to the domain that understands this data and can manage it, in whatever form the company needs, along its whole lifecycle.
A market is an environment (virtual or physical) where sellers and buyers meet to exchange goods and services through money.
This doesn’t apply to data within a company. No one sells data; everyone serves or copies it. Cost management disciplines can help identify the cost of producing data in order to introduce a chargeback for consumers, but this is not equivalent to associating a commercial value with data and selling it at the best price.
So, where is the market? What do we exchange? What’s the meaning of value?
Data valuation is a discipline devoted to associating a commercial value with data. Several techniques exist, from net present value (NPV) to cost-based estimation (which defaults to cost management practices). Nonetheless, they are not used as a day-to-day tool to monetise data exchange internally. They are oriented more towards estimating the value of large data initiatives or corporate capital.
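To make NPV concrete, here is a minimal sketch, assuming invented cash flows and an invented discount rate, of how the expected future returns attributable to a data initiative could be discounted to a present value:

# Minimal NPV sketch for valuing a data initiative.
# Cash flows and discount rate are hypothetical figures, not real estimates.
def net_present_value(cash_flows, discount_rate):
    # Discount each year's expected incremental profit back to today.
    return sum(cf / (1 + discount_rate) ** year for year, cf in enumerate(cash_flows, start=1))

expected_cash_flows = [120_000, 150_000, 180_000]  # assumed yearly returns of the initiative
print(round(net_present_value(expected_cash_flows, discount_rate=0.08)))  # ≈ 382,600 in this invented example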
The internal data market provides neither competition (domain ownership is a monopoly) nor money exchange. Keep these elements in mind when we get to the marketplace and data value.
Data only converts to value if it is used to make business decisions. Otherwise, it is just a liability.
We will see why below.
Making business decisions requires the whole data value chain to function, not just a single data set. A decision can be something like investing, saving, entering a market, hiring, improving operations, or reducing waste.
Every decision should correspond to a certain expected economic value. Part of it can be attributed to the data that helped make the decision. This portion could be redistributed back along the data value chain to estimate the economic value of each data set contributing to the information needed to make the decision. Let’s start from the simplest case: a single monolithic program producing the data necessary to make the decision.
Consider the case of KPIs uniquely built from operational sources through KPI-specific applications.
The following combination is the data value chain to produce a single KPI:
DATA_VALUE_CHAIN = DATA_SOURCES + DATA_PROGRAM + FINAL_DATA
The value can be attributed to the decision makers and the set of KPIs used to make the decision. If the value is distributed equally across the data that was used, we can recognize the weight with which each data set contributes to the data value chain.
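As a sketch of this retro-attribution, here is a toy example with an invented decision value and hypothetical weights for the elements of one KPI's data value chain (none of the names or figures come from a real case):

# Toy retro-attribution of a decision's expected value back along the data value chain.
decision_value = 1_000_000  # expected economic value of the decision (assumed figure)

kpis = ["churn_rate", "revenue_per_customer"]  # KPIs the decision makers actually used
kpi_value = decision_value / len(kpis)         # equal attribution across the used KPIs

# Hypothetical weights for the chain behind one KPI: sources and the KPI-specific program.
churn_value_chain = {"crm_source": 0.4, "billing_source": 0.4, "churn_kpi_program": 0.2}

attribution = {element: kpi_value * weight for element, weight in churn_value_chain.items()}
print(attribution)  # {'crm_source': 200000.0, 'billing_source': 200000.0, 'churn_kpi_program': 100000.0}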
From this perspective, the value of the generated data is easy to attribute to monolithic applications, because each of them is solely responsible for generating the final information necessary to make a decision.
Let’s look at the cost of producing data.
It's easy to recognize a certain redundancy in ingestion costs, while it is not clear how much redundancy resides in the business logic that produces the final information. The only guarantee is that the KPIs necessary to make decisions are identified, along with the decision makers. This means that, even if establishing the value of a decision is difficult, this value is fixed once the decision makers and the available information are given.
The return of investment is defined by the following relationship:
ROI = NET_PROFIT / COST_OF_INVESTMENT
It is clear that to increase the ROI we must reduce operating expenses (OPEX), which in turn increases the net profit; this means having a cost-efficient data value chain.
On the other hand, to reduce the cost of investment we must be able to build data sets with low effort (low CAPEX) while avoiding a negative impact on OPEX. That is, companies can try to save as much as possible when building, but what they build must not be expensive to run.
In summary, even if data valuation is complex, we can improve the ROI by reducing both the cost of building the data value chain (CAPEX) and the cost of running it (OPEX).
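A worked sketch with invented figures shows how both levers move the ROI defined above: lowering OPEX raises the net profit, and lowering CAPEX shrinks the cost of investment:

# Hypothetical figures only, to illustrate the ROI relationship defined above.
def roi(net_profit, cost_of_investment):
    return net_profit / cost_of_investment

value_from_decisions = 500_000   # economic value enabled by the data (assumed)
opex, capex = 200_000, 250_000   # running and building costs of the data value chain (assumed)

print(roi(value_from_decisions - opex, capex))              # baseline ROI: 1.2
print(roi(value_from_decisions - 0.8 * opex, 0.8 * capex))  # 20% cost cuts push the ROI to 1.7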
What if we reduce the costs of data construction and data operations indefinitely? Surely, we are going to affect KPI quality, availability, delivery, and so on. That is, poor data quality affects the ability of decision makers to make informed decisions.
Every natural monopoly needs regulations to work. This happens in the utility industry (for instance) whether the service is energy, water, or gas.
Thus, we can update our partial conclusion with the following statement:
A data management system can improve the ROI attributable to data by reducing the cost of building and running the data value chain (CAPEX + OPEX), with quality constraints ensuring that what we build is usable.
These constraints are the equivalent of the regulations needed for natural monopolies to avoid negative impacts on consumers.
The data value chain is given by a data management paradigm (DWH, data lake, data mesh, lakehouse, mix of them) and a set of data practices (data quality, data privacy, data lineage, etc.).
Whatever data management paradigm is chosen by a company, the purpose of that choice should be to reduce the TCO (CAPEX + OPEX) under quality constraints.
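As a sketch of that selection criterion, the comparison below uses made-up paradigms, costs, and a made-up minimum quality threshold purely to show the shape of a TCO-under-constraints decision:

# Hypothetical comparison of paradigms by TCO, subject to a minimum quality constraint.
candidates = [
    {"paradigm": "data_warehouse", "capex": 400_000, "opex": 300_000, "quality_score": 0.92},
    {"paradigm": "data_lake",      "capex": 250_000, "opex": 350_000, "quality_score": 0.78},
    {"paradigm": "data_mesh",      "capex": 350_000, "opex": 220_000, "quality_score": 0.90},
]

MIN_QUALITY = 0.85  # the "regulation" protecting consumers of the internal monopoly

usable = [c for c in candidates if c["quality_score"] >= MIN_QUALITY]
best = min(usable, key=lambda c: c["capex"] + c["opex"])   # lowest TCO among usable options
print(best["paradigm"])  # 'data_mesh' here: a TCO of 570k beats 700k, and the lake fails the quality bar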
The marketplace is not a market in the common sense. Since producers and consumers are not sellers and buyers, they don’t exchange money. Still, the value they exchange is retro-attributed from the ROI of any initiative coming from a decision maker. The only way to increase this value is through the minimization of the total cost of ownership of the data value chain while guaranteeing the right quality.
Every data management paradigm represents a means to reduce the TCO by applying principles such as reusability, self-service capabilities, and domain ownership.
Since we don’t want to reduce quality, there should be a mechanism to regulate the minimum quality required to deliver usable data. This can be guaranteed through principles such as computational governance and governance shift-left.
Designing, implementing, and monitoring the reusability, self-service, domain ownership, computational governance, and governance shift-left principles corresponds to managing the data value chain through strict governance. KPIs showing how effectively we are managing our data management paradigm and data practices are a proxy for the improvement of the data ROI.
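A minimal sketch of what a computational governance check could look like, with hypothetical quality metrics and thresholds, run before a data set is allowed into the marketplace:

# Hypothetical governance policy: block publication of data sets below the regulated minimum.
POLICY = {"min_completeness": 0.95, "max_freshness_hours": 24, "schema_documented": True}

def passes_governance(metrics: dict) -> bool:
    # The data set must satisfy every minimum quality requirement to be published.
    return (
        metrics["completeness"] >= POLICY["min_completeness"]
        and metrics["freshness_hours"] <= POLICY["max_freshness_hours"]
        and metrics["schema_documented"] == POLICY["schema_documented"]
    )

billing_kpis_metrics = {"completeness": 0.97, "freshness_hours": 6, "schema_documented": True}
print(passes_governance(billing_kpis_metrics))  # True: the data set can enter the marketplace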
Now, we have established principles to build valuable data through a governed data value chain. How can we exchange value?
Using data.
The data marketplace provides all the capabilities a data consumer needs.
It is the collector of all data generated under controlled conditions: data consumers enter the marketplace to get orientation, find information, and access data for direct consumption.
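A sketch of those consumer-facing capabilities as a minimal interface; the class, method, and data set names are illustrative, not the API of any real product:

# Illustrative marketplace interface exposing the consumer capabilities described above:
# orientation, discovery, and direct access (all names are hypothetical).
from dataclasses import dataclass, field

@dataclass
class DataMarketplace:
    catalog: dict = field(default_factory=dict)  # data set name -> metadata

    def publish(self, name: str, owner_domain: str, description: str, uri: str):
        # Only governed data sets should reach the catalog (governance checks omitted here).
        self.catalog[name] = {"owner": owner_domain, "description": description, "uri": uri}

    def browse(self) -> list:
        # Orientation: what data exists and which domain owns it.
        return [(name, meta["owner"]) for name, meta in self.catalog.items()]

    def search(self, keyword: str) -> list:
        # Discovery: find data sets whose description mentions the keyword.
        return [n for n, m in self.catalog.items() if keyword.lower() in m["description"].lower()]

    def access(self, name: str) -> str:
        # Direct consumption: return the location where the data can be read.
        return self.catalog[name]["uri"]

marketplace = DataMarketplace()
marketplace.publish("billing_kpis", "billing_domain", "Monthly billing KPIs", "s3://example/billing")
print(marketplace.search("billing"), marketplace.access("billing_kpis"))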
This article explores the relationship among interwoven concepts such as data value, the marketplace, and domain ownership, going through the natural monopoly regime that exists within knowledge domains and how it affects an effective data value chain.
If you enjoyed this article, consider subscribing to our Newsletter. It's packed with insights, news about Witboost, and all the knowledge we publish!