Glossary

Explore our glossary of technical terms from Data Engineering, Data Management, Data Mesh, Data Governance, Data Lake, Big Data, AI, and microservices, each with a short explanation.

Data Governance

Data governance (DG) is the process of managing the availability, usability, integrity, and security of data throughout its lifecycle to ensure that it is accurate, reliable, and compliant with all applicable regulations. Data governance includes establishing policies and procedures for data collection, storage, sharing, and destruction. It also ensures that data is used in a responsible and ethical manner.

Some key examples of data governance:
• Creating and implementing data quality standards and procedures
• Implementing data security and privacy measures
• Conducting regular data audits to identify and address any compliance risks
• Establishing a data catalog to document and track all data assets
• Using data access management to prevent unauthorized access to sensitive data
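A data quality standard from the first example above can be written as explicit, testable rules. The sketch below is a minimal illustration, assuming rows arrive as Python dicts; the `RULES` list, the column names, and the pass-rate thresholds are hypothetical.

```python
import re

# Hypothetical data quality standard: each rule is (column, check, minimum pass rate).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
RULES = [
    ("customer_id", lambda v: v is not None, 1.0),                              # completeness
    ("email", lambda v: isinstance(v, str) and bool(EMAIL_RE.match(v)), 0.95),  # validity
]


def check_quality(rows: list) -> dict:
    """Return column -> (pass rate, meets standard) for every rule in RULES."""
    results = {}
    for column, check, threshold in RULES:
        passed = sum(1 for r in rows if check(r.get(column)))
        rate = passed / len(rows) if rows else 1.0
        results[column] = (rate, rate >= threshold)
    return results
```

Keeping the thresholds next to the checks makes the standard itself reviewable, so a failed check points directly at the rule that was broken.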

Governance Shift Left

Governance Shift Left is a progressive approach to data governance that embeds data governance practices earlier in the data lifecycle (at design and development time, rather than after data has already been produced and published), akin to the "shift left" concept in software development. It addresses the challenges posed by ever-growing data volumes and aims to close the gap between data governance and data management that often results in data quality issues, security breaches, and compliance violations.
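One way to picture shifting governance left is a policy check that runs in the delivery pipeline (for example in CI) before a dataset is deployed, instead of an audit after it is live. This is a minimal sketch under stated assumptions: the `DatasetDescriptor` fields, the allowed classifications, and the 365-day PII retention limit are hypothetical policy choices, not a standard.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical descriptor that a team commits alongside its pipeline code.
@dataclass
class DatasetDescriptor:
    name: str
    owner: Optional[str]
    classification: Optional[str]          # e.g. "public", "internal", "confidential"
    pii_columns: list = field(default_factory=list)
    retention_days: Optional[int] = None


ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential"}
MAX_PII_RETENTION_DAYS = 365  # hypothetical policy value


def check_descriptor(d: DatasetDescriptor) -> list:
    """Return policy violations; an empty list means the change may proceed."""
    violations = []
    if not d.owner:
        violations.append(f"{d.name}: no owner declared")
    if d.classification not in ALLOWED_CLASSIFICATIONS:
        violations.append(f"{d.name}: unknown classification {d.classification!r}")
    if d.pii_columns:
        if d.classification == "public":
            violations.append(f"{d.name}: PII columns in a public dataset")
        if d.retention_days is None or d.retention_days > MAX_PII_RETENTION_DAYS:
            violations.append(
                f"{d.name}: PII retention must be set and <= {MAX_PII_RETENTION_DAYS} days"
            )
    return violations
```

A pipeline step would fail the build whenever `check_descriptor` returns a non-empty list, so the team fixes the problem before any data is produced.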

Governance Decision Record

A Governance Decision Record (GDR) is a structured framework and documentation method designed to enhance the efficiency and effectiveness of data governance within organizations. It serves as a comprehensive record of key decisions related to data governance, policies, and their implementation. The GDR framework is particularly valuable in the context of modern data governance practices, such as those aligned with the Data Mesh paradigm.
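To make the idea of a structured record concrete, the sketch below models a GDR as a small Python data class with a simple lifecycle. The field names and the proposed, accepted, superseded status flow are assumptions borrowed from the general shape of architecture decision records; they are not a prescribed GDR schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class GovernanceDecisionRecord:
    gdr_id: str
    title: str
    context: str                 # why a decision was needed
    decision: str                # the policy or rule that was agreed
    consequences: str            # expected impact on domains and platform
    status: str = "proposed"     # hypothetical lifecycle: proposed -> accepted -> superseded
    decided_on: Optional[date] = None
    superseded_by: Optional[str] = None

    def accept(self, on: date) -> None:
        if self.status != "proposed":
            raise ValueError(f"{self.gdr_id} is {self.status}, cannot accept")
        self.status, self.decided_on = "accepted", on

    def supersede(self, newer: "GovernanceDecisionRecord") -> None:
        # Old decisions are never deleted, only linked to the record that replaces them.
        if self.status != "accepted":
            raise ValueError("only accepted records can be superseded")
        self.status, self.superseded_by = "superseded", newer.gdr_id
```

Keeping superseded records and linking them forward preserves the history of why a policy changed, which is the main value of a decision log.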

Computational Governance


Computational governance is the use of software and automation to enforce data governance policies and procedures. It is an emerging approach that is being adopted by organizations of all sizes to improve the efficiency, effectiveness, and scalability of their data governance programs.
Computational governance can be used to automate a wide range of data governance tasks, including:
    •    Data access control and role management
    •    Data quality monitoring and remediation
    •    Data audit and compliance reporting
    •    Data masking and encryption
    •    Data retention and deletion
    •    Data classification and tagging
    •    Data lineage tracking and reporting
By automating these tasks, computational governance can free up data governance professionals to focus on more strategic initiatives, such as developing and implementing new data governance policies and procedures.
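As an illustration of two of the tasks listed above, data classification and data masking, the sketch below expresses a masking policy as code and applies it automatically based on column tags. It is a minimal sketch: the `UNMASKED_ACCESS` policy, the role names, and the hard-coded key are hypothetical, and a real deployment would load the key from a secret store.

```python
import hashlib
import hmac

# Hypothetical policy-as-code: which roles may see each classification tag unmasked.
UNMASKED_ACCESS = {
    "pii": {"privacy_officer"},
    "financial": {"privacy_officer", "finance_analyst"},
}
MASKING_KEY = b"replace-with-a-managed-secret"  # hypothetical; never hard-code in practice


def mask(value: object) -> str:
    # Keyed, deterministic pseudonymisation: equal inputs give equal tokens,
    # so masked columns can still be joined on.
    return hmac.new(MASKING_KEY, str(value).encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def apply_masking(row: dict, column_tags: dict, role: str) -> dict:
    """Return a copy of row in which every tagged column the role may not see is masked."""
    out = {}
    for col, value in row.items():
        tags = column_tags.get(col, set())
        # Deny by default: a tag missing from the policy grants no role access.
        allowed = all(role in UNMASKED_ACCESS.get(tag, set()) for tag in tags)
        out[col] = value if allowed else mask(value)
    return out
```

Because the policy is data rather than manual procedure, changing who may see PII is a reviewed edit to `UNMASKED_ACCESS` that takes effect everywhere the function runs.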
Computational governance is also becoming increasingly important as organizations adopt new data technologies, such as cloud computing and artificial intelligence. These technologies can generate and store vast amounts of data, which can be difficult to manage and govern using traditional manual methods. Computational governance can help organizations to manage and govern their data more effectively in these new environments.
These are some of the benefits of using computational governance:


    •    Increased efficiency: Computational governance can help organizations to automate time-consuming and repetitive data governance tasks. This can free up data governance professionals to focus on more strategic initiatives.


    •    Improved effectiveness: Computational governance can help organizations to enforce data governance policies and procedures more consistently and effectively. This can reduce the risk of data breaches and compliance violations.


    •    Increased scalability: Computational governance can help organizations to scale their data governance programs as their data volumes grow. This is because computational governance can automate many of the tasks that would otherwise need to be performed manually.


Computational governance is a powerful tool that can help organizations improve the efficiency, effectiveness, and scalability of their data governance programs and manage their data more effectively as it grows.

Data Governance Architecture

Data governance architecture is the framework of policies, procedures, and technologies that are used to manage the availability, usability, integrity, and security of data. It is a critical component of any organization that relies on data to make decisions.
Data governance architecture typically includes the following components:


    •    Data governance policies and procedures: These define the rules and guidelines for how data is managed, including who is responsible for data, how data is accessed and used, and how data is protected.


    •    Data governance technologies: These tools and systems help to automate and enforce data governance policies and procedures. Common data governance technologies include data catalogs, data quality tools, and data access control systems.


    •    Data governance roles and responsibilities: These define who is responsible for different aspects of data governance, such as data stewards, data security officers, and data privacy officers.
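The three components above fit together roughly as follows: a policy states the rule, a technology component enforces it and records the outcome, and a role (here a data steward) is accountable for each dataset. This is a minimal sketch; `READ_POLICY`, the classification levels, and the role names are hypothetical.

```python
from dataclasses import dataclass

# Policy: hypothetical rule mapping a data classification to the roles allowed to read it.
READ_POLICY = {
    "public": {"employee", "data_steward", "security_officer"},
    "internal": {"employee", "data_steward", "security_officer"},
    "confidential": {"data_steward", "security_officer"},
}


@dataclass(frozen=True)
class Dataset:
    name: str
    classification: str
    steward: str  # role and responsibility: who is accountable for this dataset


class AccessController:
    """Technology: enforces READ_POLICY and records every decision for audits."""

    def __init__(self) -> None:
        self.audit_log = []

    def can_read(self, user: str, roles: set, dataset: Dataset) -> bool:
        # Deny by default: an unknown classification grants no access.
        allowed = bool(roles & READ_POLICY.get(dataset.classification, set()))
        self.audit_log.append((user, dataset.name, "granted" if allowed else "denied"))
        return allowed
```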
Data governance architecture is important because it helps organizations to:


    •    Ensure that data is accurate, reliable, and trustworthy.
    •    Protect data from unauthorized access, use, or disclosure.
    •    Comply with data privacy and security regulations.
    •    Improve data sharing and collaboration across the organization.
    •    Make better decisions based on data.
These are some examples of how data governance architecture is used in practice:


    •    A financial services company might use data governance architecture to ensure that customer data is protected from unauthorized access and use. The company might also use data governance architecture to comply with financial data privacy regulations.
    •    A healthcare organization might use data governance architecture to ensure that patient data is accurate and reliable. The organization might also use data governance architecture to comply with healthcare data privacy regulations.


    •    A retail company might use data governance architecture to improve data sharing and collaboration across different departments, such as marketing, sales, and customer service. The company might also use data governance architecture to make better decisions about product development and marketing campaigns.


Data governance architecture is an essential component of any organization that relies on data to make decisions. By implementing a well-designed data governance architecture, organizations can ensure that their data is accurate, reliable, secure, and accessible to those who need it.

Agile Data Governance

Agile data governance is a data governance approach that is based on the principles of agile software development. It is a flexible and iterative approach that focuses on collaboration, continuous improvement, and rapid delivery of value.


Agile data governance is designed to help organizations manage their data more effectively in a rapidly changing environment. It emphasizes the importance of empowering data stewards and other stakeholders to make decisions about data governance, and it provides a framework for continuously improving data governance processes and practices.


Some of the key principles of agile data governance include:


    •    Collaboration: Agile data governance emphasizes the importance of collaboration between data stewards, business users, and IT professionals. This collaboration helps to ensure that data governance policies and procedures are aligned with the needs of the business and that they are effective in supporting the organization's data-driven goals.


    •    Continuous improvement: Agile data governance is an iterative approach that focuses on continuous improvement. This means that data governance policies and procedures are regularly reviewed and updated to reflect changes in the business environment, new data technologies, and evolving data governance best practices.


    •    Rapid delivery of value: Agile data governance is designed to help organizations deliver value quickly. This is done by focusing on high-priority data governance initiatives and by using an iterative approach to implement those initiatives.


Agile data governance can be used to manage all aspects of data governance, including data access control, data quality management, data security, and data privacy. It is a valuable tool for organizations of all sizes and industries that are looking to improve their data governance practices.


These are some examples of how agile data governance can be used in practice:


    •    A company could use agile data governance to implement a new data access control system. The company could start by developing a prototype of the system and then iteratively improve the system based on feedback from users.


    •    A company could use agile data governance to improve its data quality management practices. The company could start by identifying the data quality issues that are most important to the business and then develop and implement solutions to address those issues.


    •    A company could use agile data governance to implement a new data security solution. The company could start by conducting a risk assessment to identify the security risks that pose the greatest threat to the organization's data. The company could then develop and implement security solutions to mitigate those risks.


Agile data governance is a flexible and adaptable approach that can be used to manage data governance in a variety of different situations. It is a valuable tool for organizations that are looking to improve their data governance practices and to get the most out of their data.

Decentralized Data Governance

Decentralized data governance is a data governance approach that distributes decision-making and control over data across the organization to the people who are closest to it. This is in contrast to centralized data governance, where all decisions about data are made by a central team.


It is a relatively new approach, and there is no one-size-fits-all definition. However, decentralized data governance typically involves the following elements:


    •    Empowering data stewards and other stakeholders: Decentralized data governance empowers data stewards and other stakeholders to make decisions about data governance. This can be done by establishing clear roles and responsibilities, and by providing data stewards with the resources and training they need to be successful.


    •    Using technology to automate and support data governance: Decentralized data governance can be supported by technology, such as data catalogs, data quality tools, and data lineage tools. These tools can help to automate data governance tasks, such as data access control, data quality management, and data lineage tracking.


    •    Fostering a culture of data collaboration: Decentralized data governance requires a culture of data collaboration. Data stewards and other stakeholders need to be willing to share data and collaborate with each other to ensure that data is managed effectively across the organization.
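The sketch below shows one way the elements above can combine: access requests are routed to the steward of the domain that owns the data, so decisions stay close to the data. As an added assumption, it also applies one organization-wide guardrail first, which is a common way to keep governance consistent across domains; the domains, stewards, and the "PII requires a stated purpose" rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    name: str
    steward: str                    # decides on requests for this domain's data
    datasets: set = field(default_factory=set)


def global_guardrail(request: dict, pii_datasets: set) -> bool:
    # Hypothetical organization-wide rule: PII requests must state a purpose.
    return request["dataset"] not in pii_datasets or bool(request.get("purpose"))


def route_request(request: dict, domains: list, pii_datasets: set) -> str:
    """Return who decides on the request, or why it was rejected up front."""
    if not global_guardrail(request, pii_datasets):
        return "rejected: PII request without a stated purpose"
    for d in domains:
        if request["dataset"] in d.datasets:
            return f"route to {d.steward} ({d.name} domain)"
    return "rejected: no owning domain"
```

Only the guardrail is shared; everything past it is decided inside the owning domain, which mirrors the split of responsibilities described above.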


Decentralized data governance can offer a number of benefits, including:


    •    Increased agility: Decentralized data governance can help businesses to be more agile in their use of data. This is because data stewards and other stakeholders are empowered to make decisions about data governance without having to go through a central authority.


    •    Improved data quality: Decentralized data governance can help to improve data quality by ensuring that data is managed by the people who are most familiar with it.


    •    Reduced costs: Decentralized data governance can help to reduce data costs by reducing reliance on a large central data governance team.


    •    Increased data engagement: Decentralized data governance can help to increase data engagement by giving more people a role in managing data.
However, decentralized data governance also has some challenges, including:


    •    Complexity: Decentralized data governance can be more complex to implement and manage than centralized data governance. This is because it requires a high level of coordination and collaboration between data stewards and other stakeholders.


    •    Risk: Decentralized data governance can increase the risk of data breaches, data quality issues, and compliance violations. This is because data is managed by a distributed group of people, and it can be difficult to maintain a consistent level of data governance across the organization.


Overall, decentralized data governance is a promising approach to data governance. It can offer a number of benefits, such as increased agility, improved data quality, reduced costs, and increased data engagement. However, it is important to be aware of the challenges involved before implementing decentralized data governance.


These are some examples of how decentralized data governance can be used in practice:


    •    A company could use decentralized data governance to manage its customer data. The company could empower its sales team to manage customer data related to their sales accounts, and empower its customer support team to manage customer data related to customer support tickets.


    •    A company could use decentralized data governance to manage its product data. The company could empower its product development team to manage product data related to the products they are developing, and empower its marketing team to manage product data related to the products they are marketing.


    •    A company could use decentralized data governance to manage its financial data. The company could empower its accounting team to manage financial data related to financial transactions, and empower its risk management team to manage financial data related to financial risks.


Decentralized data governance is a powerful tool that can help businesses to get the most out of their data. By empowering data stewards and other stakeholders to make decisions about data governance, and by fostering a culture of data collaboration, businesses can improve the agility, quality, and cost-effectiveness of their data governance programs.

Data Governance Maturity

Data Governance Maturity is a measure of how well an organization is managing its data. A mature data governance program will help organizations to comply with regulations, mitigate risk, and make better business decisions.

A data governance maturity assessment can be used to measure an organization's data governance maturity and to identify areas for improvement. The assessment typically covers the following areas:

• Organization and processes: This area assesses the organization's data governance structure, processes, and responsibilities.


• Data policies: This area assesses the organization's data policies, standards, and guidelines.


• Data compliance and risk management: This area assesses the organization's data privacy and risk management practices.


• Data quality and de-duplication: This area assesses the organization's data quality and de-duplication practices.


• Data standards and metadata management: This area assesses the organization's data standards and metadata management practices.
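The scoring step of such an assessment is simple arithmetic. The sketch below averages answers per area, computes an overall score, and lists areas below a target as roadmap priorities, weakest first. The 1 to 5 answer scale and the target of 3.0 are assumptions; real maturity models define their own levels.

```python
from statistics import mean

AREAS = [
    "Organization and processes",
    "Data policies",
    "Data compliance and risk management",
    "Data quality and de-duplication",
    "Data standards and metadata management",
]


def assess(scores: dict, target: float = 3.0) -> dict:
    """scores: area -> list of answers on a hypothetical 1..5 scale."""
    missing = [a for a in AREAS if a not in scores]
    if missing:
        raise ValueError(f"no answers for: {missing}")
    per_area = {a: round(mean(scores[a]), 2) for a in AREAS}
    overall = round(mean(list(per_area.values())), 2)
    # Areas below target become roadmap priorities, weakest first.
    priorities = sorted((a for a in AREAS if per_area[a] < target), key=per_area.get)
    return {"per_area": per_area, "overall": overall, "priorities": priorities}
```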

Organizations can use the results of their data governance maturity assessment to develop a roadmap for improving their data governance program.

Data Governance Framework

A data governance framework is a blueprint for managing an organization's data in a secure and compliant manner. It includes policies, procedures, and standards for data collection, storage, processing, use, and sharing. The right data governance framework can mitigate risk while maximizing the quality and effective use of data.

Achieving this requires defined policies and procedures, streamlined processes, and active management of the organization's data ecosystem. A framework has to balance two equally important goals: ensuring that data can be trusted, and ensuring that people can access it and get results quickly and efficiently.

Benefits of a data governance framework:

• Improved data quality
• Reduced risk
• Increased compliance
• Better decision-making
• Increased data literacy and understanding

Examples of data governance framework elements:

• Data roles and responsibilities
• Data policies and standards
• Data access and security
• Data monitoring and reporting

Data Mesh

A data mesh architecture is a decentralized approach to data management that represents a fundamental departure from traditional centralized approaches. It embraces decentralization, autonomy, and self-service while promoting collaboration and agility. A data mesh architecture is distributed and domain-centric, enabling horizontal scaling and sustainable expansion in response to increasing data demands.

Decentralization, the core concept of a data mesh architecture, promotes agility and interoperability: domains can iterate on their data products independently and efficiently, reducing bottlenecks and dependencies. At the same time, the risk of creating data silos is mitigated, because a data mesh architecture encourages sharing and interoperability of data assets across domains.

In practice this could look like:


• A retail company might have a data mesh architecture with the following domains: customer, product, order, inventory, and marketing. Each domain would own and manage its own data, and would be responsible for making that data available to other domains through data products. For example, the customer domain might create a data product that contains customer demographics and purchase history. This data product could then be used by the marketing domain to create targeted marketing campaigns.

• A financial services company might have a data mesh architecture with the following domains: customer, account, transaction, and risk. Each domain would own and manage its own data, and would be responsible for making that data available to other domains through data products. For example, the risk domain might create a data product that contains customer risk profiles. This data product could then be used by the customer domain to identify and proactively address high-risk customers.
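The retail example can be sketched in a few lines: each domain publishes its data products to a shared discovery layer, and another domain finds and consumes them without going through a central team. This is a minimal sketch; `MeshRegistry`, the product names, and modelling an output port as a plain function are simplifying assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataProduct:
    name: str
    owner_domain: str
    read: Callable[[], list]   # the product's output port, simplified to a function


@dataclass
class MeshRegistry:
    """Hypothetical self-service discovery layer shared by all domains."""
    products: dict = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        if product.name in self.products:
            owner = self.products[product.name].owner_domain
            raise ValueError(f"{product.name} already published by {owner}")
        self.products[product.name] = product

    def find(self, owner_domain: str) -> list:
        return sorted(p.name for p in self.products.values() if p.owner_domain == owner_domain)

    def consume(self, name: str) -> list:
        return self.products[name].read()
```

For instance, the customer domain could publish a `customer-360` product, and the marketing domain could call `consume("customer-360")` and filter it down to the customers it wants to target.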

Data Product


A data product inside a data mesh is a self-contained, domain-specific dataset that is curated and managed by a domain team. Data products are the fundamental building blocks of a data mesh architecture, and they are designed to be easily discoverable, consumable, and interoperable.


Data products in a data mesh can take many forms, such as:
    •    Raw data, such as customer transactions or sensor readings
    •    Processed data, such as aggregated metrics or enriched customer profiles
    •    Derived data, such as machine learning models or predictive analytics results
Data products are typically made available to other teams through a self-service data platform. This allows teams to access the data they need without having to go through a central data team.
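A data product differs from a plain dataset mainly in what surrounds the data: an owner, a description that makes it discoverable, and a contract that every published row must satisfy before consumers can read it through the output port. The sketch below is a minimal illustration, assuming a schema expressed as column-to-Python-type mappings; all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    domain: str
    owner: str
    description: str     # makes the product discoverable in a catalog
    schema: dict         # column -> Python type: the product's contract
    rows: list = field(default_factory=list)

    def publish(self, new_rows: list) -> int:
        """Validate every row against the contract, then expose them; return rows accepted."""
        for i, row in enumerate(new_rows):
            if set(row) != set(self.schema):
                raise ValueError(f"row {i}: columns {sorted(row)} do not match contract")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise TypeError(f"row {i}: {col} should be {typ.__name__}")
        # Only reached if the whole batch is valid, so consumers never see partial loads.
        self.rows.extend(new_rows)
        return len(new_rows)

    def read(self, **filters) -> list:
        # Self-service output port: consumers filter without involving a central team.
        return [r for r in self.rows if all(r.get(k) == v for k, v in filters.items())]
```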


The benefits of using data products in a data mesh include:


    •    Increased agility: Data products make it easier for teams to get the data they need quickly and easily. This can lead to faster decision-making and more agile product development.


    •    Improved data quality: Data products are typically curated and managed by domain experts, which helps to ensure that the data is of high quality.


    •    Increased data accessibility: Data products are made available to other teams through a self-service data platform, which makes it easier for everyone to get the data they need.


    •    Reduced risk: Data products are self-contained and isolated from other data products, which reduces the risk of data corruption or propagation of errors.
Here are some examples of data products inside a data mesh:


    •    A customer data product that contains all of the customer data for a particular domain, such as e-commerce or customer support.


    •    A product data product that contains all of the product data for a particular domain, such as inventory or sales data.


    •    A financial data product that contains all of the financial data for a particular domain, such as accounting or risk management.


Data products in a data mesh are essential for enabling data-driven decision-making and innovation throughout the organization.

Data Product Flow

Data Product Flow is a process for identifying data products in a Data Mesh architecture. It starts with identifying business decisions and then works backwards to identify the data that is needed to support those decisions. The process also considers the ownership of the data and the need to keep the operational and analytical planes separate.

Here is a summary of the steps involved in the Data Product Flow:

  • Identify the business decisions that need to be supported by data products.

  • Identify the data that is needed to support those decisions.

  • Determine the ownership of the data, considering the need to keep the operational and analytical planes separate.

  • Define the data products that will be created.

The Data Product Flow is a continuous process, as business needs and data availability change over time. It is important to regularly review the Data Product Flow to ensure that it is still meeting the needs of the organization.
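The four steps can be sketched as a small function that works backwards from decisions to candidate data products grouped by owning domain. This is a minimal illustration under stated assumptions: the inputs are plain dicts, and the `-analytical` suffix is a hypothetical naming convention marking that consumers get an analytical-plane product rather than direct access to the operational system.

```python
from collections import defaultdict


def derive_data_products(decisions: dict, data_owners: dict) -> dict:
    """
    decisions:   business decision -> data it needs (steps 1 and 2)
    data_owners: data item -> owning domain (step 3)
    Returns owning domain -> sorted candidate data products (step 4).
    """
    products = defaultdict(set)
    for decision, needed in decisions.items():
        for item in needed:
            owner = data_owners.get(item)
            if owner is None:
                raise ValueError(f"{item!r} needed for {decision!r} has no owner yet")
            # Hypothetical convention: serve an analytical product, keeping the
            # operational system out of consumers' reach.
            products[owner].add(f"{item}-analytical")
    return {domain: sorted(names) for domain, names in products.items()}
```

Data needed by several decisions collapses into a single product, and a missing owner stops the flow, which reflects that ownership must be settled before a product is defined.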

Consumer-aligned Data Product

A consumer-aligned data product is a data product designed to meet the specific needs of a particular user or group of users. It is typically created by domain experts who have a deep understanding of the needs of the target users. Consumer-aligned data products are often combined with other data products to create even more valuable and actionable insights.


Examples of consumer-aligned data products include:


    •    A customer segmentation dashboard that helps marketers to identify and target different customer segments.


    •    A sales forecasting model that helps sales teams to predict future sales and revenue.


    •    A product recommendation system that helps customers to discover new products that they might be interested in.


    •    A risk assessment model that helps banks to assess the risk of lending money to different borrowers.
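The customer segmentation example can be sketched as a consumer-aligned product that combines two upstream inputs (orders and customers) and reshapes them into the segments marketers act on. This is a minimal sketch; the segment names and the 500.0 spend threshold are hypothetical and would in practice be agreed with the consuming team.

```python
from collections import defaultdict


def customer_segments(orders: list, customers: list, high_value: float = 500.0) -> dict:
    """Combine orders and customers into marketing segments (hypothetical thresholds)."""
    spend = defaultdict(float)
    for o in orders:
        spend[o["customer_id"]] += o["amount"]
    segments = {"high_value": [], "active": [], "dormant": []}
    for c in customers:
        total = spend.get(c["customer_id"], 0.0)
        if total >= high_value:
            segments["high_value"].append(c["customer_id"])
        elif total > 0:
            segments["active"].append(c["customer_id"])
        else:
            segments["dormant"].append(c["customer_id"])  # customers with no orders
    return segments
```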


Consumer-aligned data products are essential for businesses that want to make data-driven decisions and improve the customer experience. By providing users with the data and insights they need, consumer-aligned data products can help businesses to increase sales, improve efficiency, and reduce costs.


Here are some of the key benefits of using consumer-aligned data products:


    •    Improved decision-making: Consumer-aligned data products provide users with the data and insights they need to make better decisions.


    •    Increased efficiency: Consumer-aligned data products can help businesses to automate tasks and streamline processes.


    •    Reduced costs: Consumer-aligned data products can help businesses to identify and eliminate waste and inefficiencies.


    •    Improved customer experience: Consumer-aligned data products can help businesses to better understand their customers and provide them with the products and services they need.


If you are looking for ways to improve your business with data, then developing and using consumer-aligned data products is a great place to start.

Source Aligned Data Product

A source-aligned data product is a data product that is created by directly ingesting data from an operational system and is designed to represent that data as it exists in the source, with minimal transformation. This means that the data product is typically a copy of the operational data, with some basic cleaning and formatting applied.
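The "copy with basic cleaning and formatting" idea can be sketched as follows: source field names and values are kept, strings are trimmed, timestamps are normalised to UTC, exact duplicates are dropped, and a lineage field records the origin. This is a minimal sketch assuming flat records with hashable values; the `_source` field name is a hypothetical convention.

```python
from datetime import datetime, timezone


def to_source_aligned(records: list, source: str) -> list:
    """Copy operational records with only basic cleaning; no joins or business logic."""
    seen, out = set(), []
    for rec in records:
        clean = {}
        for key, value in rec.items():
            if isinstance(value, str):
                value = value.strip()
            elif isinstance(value, datetime):
                if value.tzinfo is None:
                    raise ValueError(f"{key}: naive timestamp, source time zone unknown")
                value = value.astimezone(timezone.utc).isoformat()
            clean[key] = value
        # Drop exact duplicates (assumes flat records with hashable values).
        fingerprint = tuple(sorted(clean.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        clean["_source"] = source  # lineage: where this row came from
        out.append(clean)
    return out
```

Anything beyond this, such as aggregation or joining with other sources, belongs in downstream products built on top of the source-aligned one.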


It is a type of data product designed to provide users with access to the most up-to-date and accurate data from the source system. Source-aligned data products are typically used to provide real-time or near-real-time access to data, as well as to provide access to very large and complex datasets.

Source-aligned data products are often used to support real-time decision-making and analytics. For example, a source-aligned data product could be used to power a real-time dashboard that provides sales representatives with insights into their performance. Or, a source-aligned data product could be used to power a machine learning model that predicts customer churn.

Source-aligned data products are also often used as a starting point for creating other data products, such as aggregated data products and consumer-aligned data products. They can also be used to support a variety of data use cases, such as data warehousing, data analytics, and machine learning.
Some examples of source-aligned data products include:


    •    A data stream from a sensor network
    •    A log file from a web server
    •    A customer transaction database
    •    A product inventory system

Source-aligned data products can be used for a variety of purposes, such as:


    •    Real-time monitoring and analytics
    •    Fraud detection
    •    Customer segmentation
    •    Product recommendation systems

These are some of the benefits of using source-aligned data products:


    •    Real-time access to data: Source-aligned data products provide users with real-time access to the most up-to-date data from the source system, because they ingest data directly from that system.


    •    Improved data quality: Source-aligned data products can help to keep data consistent and accurate, because they typically apply the same data validation and cleansing rules as the source system.


    •    Reduced risk: Source-aligned data products can help to reduce the risk of data corruption and errors, because applying only minimal transformation leaves fewer processing steps in which data can be corrupted or errors introduced.

And these are some examples of how source-aligned data products are used in practice:


    •    A financial services company might use a source-aligned data product to track customer transactions in real time. This would allow the company to detect fraudulent transactions quickly and prevent them from being completed.


    •    A retail company might use a source-aligned data product to track product inventory in real time. This would allow the company to ensure that products are always in stock and to avoid lost sales.


    •    A healthcare organization might use a source-aligned data product to track patient data in real time. This would allow the organization to monitor patients' health status and provide them with the best possible care.

Source-aligned data products are an important part of any data-driven organization. By providing users with real-time access to accurate and reliable data, source-aligned data products can help organizations to make better decisions and improve their performance.

Data as a Product

Data as a Product is a pillar of the Data Mesh paradigm. It means that data is treated as a first-class citizen, with its own owners, product teams, and lifecycle. This approach has a number of benefits, including:

• Improved data quality: Data product teams are responsible for the quality of their data products, which leads to improved data quality overall.


• Increased data accessibility: Data products are designed to be easily accessible and consumable by data consumers, which makes it easier to get the data you need when you need it.


• Reduced data silos: Data products break down data silos by providing a single source of truth for data.


• Improved data governance: Data products can help to improve data governance by providing a central location to manage data access and security.


• Increased business value: Data products help organizations to get more value from their data by making it easier to use data for data-driven decision-making.

In a Data Mesh architecture, data teams are responsible for the end-to-end lifecycle of their data products. This includes:

• Identifying data opportunities: The first step is to identify opportunities to create data products that will meet the needs of data consumers. This can be done through a process of domain mapping and business analysis.


• Designing data products: Once data opportunities have been identified, the next step is to design data products that will meet those needs. This includes defining the scope of the data product, the data that will be included in the data product, and the format of the data product.


• Developing data products: Once data products have been designed, the next step is to develop them. This includes collecting, processing, and cleaning the data, and then loading it into the data product.


• Delivering data products: Once data products have been developed, the next step is to deliver them to data consumers. This can be done through a variety of channels, such as APIs, data catalogs, and data lakes.


• Maintaining data products: Once data products have been delivered, data product teams are responsible for maintaining them. This includes keeping the data up-to-date and fixing any bugs.
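Part of maintaining a data product is checking that it is kept up to date. The sketch below compares a product's last refresh against a freshness target and decides how to react. The three-level outcome and the "twice the target" escalation point are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def freshness_status(last_updated: datetime, max_age: timedelta,
                     now: Optional[datetime] = None) -> str:
    """Compare a data product's last refresh against a (hypothetical) freshness target."""
    now = now or datetime.now(timezone.utc)
    age = now - last_updated
    if age <= max_age:
        return "fresh"
    if age <= 2 * max_age:
        return "stale: notify owning team"
    return "breached: flag product in catalog"
```

Passing `now` explicitly keeps the check deterministic and easy to test; in a scheduled job it would default to the current UTC time.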

Data as a Product is a powerful way to improve data management and governance. By treating data as a product, organizations can get more value from their data and improve their business outcomes.


Agile Data Governance

Agile data governance is designed to help organizations manage their data more effectively in a rapidly changing environment. It emphasizes the importance of empowering data stewards and other stakeholders to make decisions about data governance, and it provides a framework for continuously improving data governance processes and practices.


Some of the key principles of agile data governance include:

• Collaboration: Agile data governance emphasizes the importance of collaboration between data stewards, business users, and IT professionals. This collaboration helps to ensure that data governance policies and procedures are aligned with the needs of the business and that they are effective in supporting the organization's data-driven goals.
• Continuous improvement: Agile data governance is an iterative approach that focuses on continuous improvement. This means that data governance policies and procedures are regularly reviewed and updated to reflect changes in the business environment, new data technologies, and evolving data governance best practices.
• Rapid delivery of value: Agile data governance is designed to help organizations deliver value quickly. This is done by focusing on high-priority data governance initiatives and by using an iterative approach to implement those initiatives.


Agile data governance can be used to manage all aspects of data governance, including data access control, data quality management, data security, and data privacy. It is a valuable tool for organizations of all sizes and industries that are looking to improve their data governance practices.


These are some examples of how agile data governance can be used in practice:

• A company could use agile data governance to implement a new data access control system. The company could start by developing a prototype of the system and then iteratively improve the system based on feedback from users.
• A company could use agile data governance to improve its data quality management practices. The company could start by identifying the data quality issues that are most important to the business and then develop and implement solutions to address those issues.
• A company could use agile data governance to implement a new data security solution. The company could start by conducting a risk assessment to identify the security risks that pose the greatest threat to the organization's data. The company could then develop and implement security solutions to mitigate those risks.


Agile data governance is a flexible and adaptable approach that can be used to manage data governance in a variety of different situations. It is a valuable tool for organizations that are looking to improve their data governance practices and to get the most out of their data.

Data Mesh Readiness

Data Mesh Readiness is our measure of how well-prepared an organization is to adopt the Data Mesh paradigm. It is important to assess Data Mesh readiness before embarking on a Data Mesh journey, as it can help organizations identify areas where they need to improve in order to be successful.

The Data Mesh Readiness Assessment is our holistic evaluation of five key areas:

• Organizational structure: The organizational structure should be aligned with the Data Mesh paradigm, with domain teams owning and managing their own data.
• Data culture: The organization should have a data-driven culture, where data is valued and used to make decisions.
• Governance: A governance framework should be in place to ensure that data is used responsibly and ethically.
• Engineering: The organization should have the engineering capabilities to implement and manage a Data Mesh architecture.
• Technological capabilities: The organization should have the technological capabilities needed to support a Data Mesh architecture, such as a data catalog, data lake, and data pipelines, and should remain technology agnostic in how it provides them.

The Data Mesh Readiness Assessment provides an overall readiness benchmark that organizations can use to measure their progress and identify areas where they need to improve. By taking targeted actions to address shortcomings, organizations can increase their chances of success in adopting Data Mesh.
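
As an illustration of how an overall benchmark and a list of improvement areas could be derived, here is a minimal Python sketch. The 1 to 5 scale, the equal weighting of the five areas, and the threshold of 3 are all assumptions made for this example, not the scoring model of our actual assessment.

```python
AREAS = (
    "organizational structure",
    "data culture",
    "governance",
    "engineering",
    "technological capabilities",
)


def readiness(scores: dict[str, int], threshold: float = 3.0) -> tuple[float, list[str]]:
    """Return an overall benchmark and the areas that need targeted action.

    Assumes each area is scored from 1 (not ready) to 5 (fully ready) and
    that all areas carry equal weight; both choices are illustrative.
    """
    missing = [a for a in AREAS if a not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    overall = sum(scores[a] for a in AREAS) / len(AREAS)
    gaps = [a for a in AREAS if scores[a] < threshold]
    return overall, gaps


overall, gaps = readiness({
    "organizational structure": 2,
    "data culture": 4,
    "governance": 3,
    "engineering": 4,
    "technological capabilities": 2,
})
print(overall, gaps)
# 3.0 ['organizational structure', 'technological capabilities']
```

An average alone can hide weak spots, which is why the sketch also returns the areas below the threshold as candidates for targeted action.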


Decentralized Data Governance

Decentralized data governance is a relatively new approach, and there is no one-size-fits-all definition. However, it typically involves the following elements:


• Empowering data stewards and other stakeholders: Decentralized data governance empowers data stewards and other stakeholders to make decisions about data governance. This can be done by establishing clear roles and responsibilities, and by providing data stewards with the resources and training they need to be successful.
• Using technology to automate and support data governance: Decentralized data governance can be supported by technology, such as data catalogs, data quality tools, and data lineage tools. These tools can help to automate data governance tasks, such as data access control, data quality management, and data lineage tracking.
• Fostering a culture of data collaboration: Decentralized data governance requires a culture of data collaboration. Data stewards and other stakeholders need to be willing to share data and collaborate with each other to ensure that data is managed effectively across the organization.


Decentralized data governance can offer a number of benefits, including:

• Increased agility: Decentralized data governance can help businesses be more agile in their use of data, because data stewards and other stakeholders are empowered to make decisions about data governance without having to go through a central authority.
• Improved data quality: Decentralized data governance can help to improve data quality by ensuring that data is managed by the people who are most familiar with it.
• Reduced costs: Decentralized data governance can help to reduce data costs by reducing reliance on a large central data governance team.
• Increased data engagement: Decentralized data governance can help to increase data engagement by giving more people a role in managing data.

However, decentralized data governance also has some challenges, including:

• Complexity: Decentralized data governance can be more complex to implement and manage than centralized data governance, because it requires a high level of coordination and collaboration between data stewards and other stakeholders.
• Risk: Decentralized data governance can increase the risk of data breaches, data quality issues, and compliance violations. This is because data is managed by a distributed group of people, and it can be difficult to maintain a consistent level of data governance across the organization.


Overall, decentralized data governance is a promising approach to data governance. It can offer a number of benefits, such as increased agility, improved data quality, reduced costs, and increased data engagement. However, it is important to be aware of the challenges involved before implementing decentralized data governance.


These are some examples of how decentralized data governance can be used in practice:

• A company could use decentralized data governance to manage its customer data. The company could empower its sales team to manage customer data related to their sales accounts, and empower its customer support team to manage customer data related to customer support tickets.
• A company could use decentralized data governance to manage its product data. The company could empower its product development team to manage product data related to the products they are developing, and empower its marketing team to manage product data related to the products they are marketing.
• A company could use decentralized data governance to manage its financial data. The company could empower its accounting team to manage financial data related to financial transactions, and empower its risk management team to manage financial data related to financial risks.
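
The examples above boil down to a mapping from data sets to the teams that govern them. This minimal Python sketch assumes hypothetical data set names and a single organization-wide rule (every access request must state a purpose); every other decision is left to the owning domain.

```python
from dataclasses import dataclass

# Hypothetical mapping of data sets to the domain team that governs them,
# following the customer and financial data examples above.
STEWARDS = {
    "customer.sales_accounts": "sales",
    "customer.support_tickets": "customer-support",
    "finance.transactions": "accounting",
    "finance.risk_exposures": "risk-management",
}


@dataclass
class AccessRequest:
    dataset: str
    requester: str
    purpose: str


def route(request: AccessRequest) -> str:
    """Send a request to the steward team that owns the data set.

    One organization-wide rule (a purpose must be stated) is still enforced
    centrally; the access decision itself belongs to the owning domain.
    """
    if not request.purpose.strip():
        raise ValueError("global policy: a purpose is required")
    try:
        return STEWARDS[request.dataset]
    except KeyError:
        raise LookupError(f"no steward registered for {request.dataset}") from None


print(route(AccessRequest("customer.support_tickets", "alice", "churn analysis")))
# customer-support
```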


Decentralized data governance is a powerful tool that can help businesses to get the most out of their data. By empowering data stewards and other stakeholders to make decisions about data governance, and by fostering a culture of data collaboration, businesses can improve the agility, quality, and cost-effectiveness of their data governance programs.

Data Marketplace

A data marketplace within a Data Mesh is a self-service platform that allows data producers to publish their data products and data consumers to discover and consume those data products. It is an essential component of a Data Mesh architecture, as it enables data sharing and collaboration across the organization.


The data marketplace provides a number of benefits, including:

• Increased data accessibility: The data marketplace makes it easy for data consumers to find and access the data they need, regardless of where it is stored.
• Improved data quality: The data marketplace provides data producers with tools and resources to ensure that their data products are of high quality.
• Reduced data costs: The data marketplace can help businesses to reduce the cost of accessing data by providing a variety of pricing options.
• Increased data monetization: The data marketplace can help data producers to monetize their data products by making them available to a wider audience.


These are some examples of how a data marketplace can be used within a Data Mesh:

• A customer data product can be published to the data marketplace so that other teams, such as marketing and sales, can access and use it.
• A product data product can be published to the data marketplace so that other teams, such as supply chain and product development, can access and use it.
• A financial data product can be published to the data marketplace so that other teams, such as finance and accounting, can access and use it.


The data marketplace is a powerful tool that can help businesses to get the most out of their data. By using a data marketplace within a Data Mesh, businesses can improve data accessibility, quality, and cost-effectiveness.


Data Governance Maturity Assessment

A data governance maturity assessment evaluates how developed an organization's data governance program is across several areas, such as:

• Data policies: This area assesses the organization's data policies, standards, and guidelines.
• Data compliance and risk management: This area assesses the organization's regulatory compliance, data privacy, and risk management practices.
• Data quality and de-duplication: This area assesses the organization's data quality and de-duplication practices.
• Data standards and metadata management: This area assesses the organization's data standards and metadata management practices.

Organizations can use the results of their data governance maturity assessment to develop a roadmap for improving their data governance program.

Data Mesh Interoperability

Data Mesh interoperability is the ability of data products in a Data Mesh architecture to communicate and exchange data with each other. This is achieved through the use of open standards and protocols, as well as through the development of common data models and vocabularies.


Data Mesh interoperability is important because it allows data products to be used in combination to create new and more valuable insights. For example, a customer data product could be combined with a product data product to create a more complete view of the customer journey. Or, a sales data product could be combined with a financial data product to create a more accurate forecast of future revenue.


Data Mesh interoperability is also important for enabling data sharing and collaboration across the organization. For example, a marketing team could use data from a sales team to create more targeted marketing campaigns. Or, a product development team could use data from a customer support team to identify and fix product defects.

These are some of the benefits of Data Mesh interoperability:

• Increased data accessibility: Data Mesh interoperability makes it easier for users to access the data they need, regardless of where it is stored.
• Improved data quality: Data Mesh interoperability can help to improve data quality by ensuring that data is consistent and accurate across different data products.
• Reduced data costs: Data Mesh interoperability can help to reduce data costs by making it easier to reuse data across different data products.
• Increased data agility: Data Mesh interoperability can help to increase data agility by making it easier to develop and deploy new data products.


There are a number of different ways to achieve Data Mesh interoperability. One common approach is to rely on open, widely supported formats and protocols, such as Apache Parquet for columnar files, Apache Thrift for serialization and RPC, and the Apache Kafka protocol for event streaming. Another approach is to develop common data models and vocabularies that all data products in the Data Mesh can use.
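
To show why common vocabularies matter, the sketch below maps a customer data product and a sales (orders) data product onto shared field names and then combines them. It is a minimal Python illustration; the field names and the mapping tables are hypothetical, and in practice the shared model would usually be published as a schema rather than hard-coded.

```python
# Records as two domains might publish them, each with its own field names.
crm_customers = [  # owned by the customer domain
    {"CustID": "C1", "FullName": "Ada Lovelace"},
    {"CustID": "C2", "FullName": "Alan Turing"},
]
orders = [  # owned by the sales domain
    {"customer_ref": "C1", "sku": "P-100"},
    {"customer_ref": "C1", "sku": "P-200"},
    {"customer_ref": "C2", "sku": "P-100"},
]

# Hypothetical shared vocabulary: every product exposes "customer_id".
CRM_TO_COMMON = {"CustID": "customer_id", "FullName": "customer_name"}
SALES_TO_COMMON = {"customer_ref": "customer_id", "sku": "product_id"}


def to_common(record: dict, mapping: dict) -> dict:
    """Rename source fields to the shared vocabulary; unmapped fields are kept."""
    return {mapping.get(k, k): v for k, v in record.items()}


customers = {
    c["customer_id"]: c for c in (to_common(r, CRM_TO_COMMON) for r in crm_customers)
}

# Once both sides speak the same vocabulary, combining them is a plain join.
journey = [
    {**customers[o["customer_id"]], **o}
    for o in (to_common(r, SALES_TO_COMMON) for r in orders)
]
print(journey[0])
# {'customer_id': 'C1', 'customer_name': 'Ada Lovelace', 'product_id': 'P-100'}
```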


Data Mesh interoperability is an essential component of a successful Data Mesh architecture. By enabling data sharing and collaboration across the organization, Data Mesh interoperability can help businesses make better decisions, improve efficiency, and reduce costs.

Metadata Activation

Metadata activation is the process of making metadata actionable. This involves using metadata to automate tasks, improve decision-making, and enable new business capabilities. There are a number of different ways to activate metadata, but the most common approach is to use a metadata management platform, a core feature of Witboost.

Metadata activation can be used to:

• Automatically generate data catalogs and glossaries, or, even better, data marketplaces
• Identify and remediate data quality issues
• Enforce data access and security policies
• Automate data governance workflows
• Support data-driven decision-making
• Enable new data-driven applications and products

In practice, a bank might use metadata activation to automate the process of reviewing and approving loan applications. The bank could use metadata to identify the data that is needed for each loan application, to enforce data quality standards, and to route applications to the appropriate decision-makers.
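
A minimal Python sketch of the bank example follows, assuming hypothetical loan types, required fields, and approver names. It illustrates that the validation and routing logic reads metadata, so changing a requirement or an approver means editing metadata rather than code.

```python
# Hypothetical metadata: which fields each loan type needs, and who decides.
LOAN_METADATA = {
    "mortgage": {
        "required": ["income", "assets", "liabilities", "credit_history"],
        "approver": "mortgage-committee",
    },
    "personal": {
        "required": ["income", "credit_history"],
        "approver": "branch-officer",
    },
}


def route_application(app: dict) -> str:
    """Validate and route an application using metadata, not hard-coded rules."""
    meta = LOAN_METADATA[app["type"]]
    missing = [f for f in meta["required"] if app.get(f) is None]
    if missing:
        # Data quality standard: incomplete applications go back for collection
        return "returned: missing " + ", ".join(missing)
    return meta["approver"]


print(route_application({"type": "personal", "income": 52000, "credit_history": "good"}))
# branch-officer
print(route_application({"type": "mortgage", "income": 80000, "credit_history": "good"}))
# returned: missing assets, liabilities
```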

Metadata-as-Code

Metadata-as-Code is the practice of managing metadata using the same principles and tools as software development. Treating metadata as code makes it easier to version, test, and deploy. Companies use Metadata-as-Code to improve the quality, accuracy, and consistency of their metadata, which can lead to benefits such as improved efficiency, reduced risk, and better decision-making.

Metadata-as-code can be used to manage the metadata for data catalogs, data warehouses, and other data systems. For example, a company might use metadata-as-code to manage the schema of their data warehouse. This would involve defining the schema in a code file, which would then be used to generate the data warehouse tables and columns.
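
The warehouse example can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical `Column` and `Table` classes rather than any particular Metadata-as-Code tool: the schema is declared once in code and rendered into a `CREATE TABLE` statement, and the same definition (including column descriptions) could also feed a data catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str
    nullable: bool = True
    description: str = ""  # travels with the schema, e.g. for a catalog


@dataclass(frozen=True)
class Table:
    name: str
    columns: tuple[Column, ...]

    def to_ddl(self) -> str:
        """Render the metadata as a CREATE TABLE statement."""
        cols = ",\n".join(
            f"  {c.name} {c.sql_type}{'' if c.nullable else ' NOT NULL'}"
            for c in self.columns
        )
        return f"CREATE TABLE {self.name} (\n{cols}\n);"


# The single source of truth, kept under version control and reviewed like code.
CUSTOMERS = Table(
    name="customers",
    columns=(
        Column("customer_id", "VARCHAR(36)", nullable=False, description="Primary key"),
        Column("email", "VARCHAR(255)", description="Contact email (PII)"),
        Column("created_at", "TIMESTAMP", nullable=False),
    ),
)

print(CUSTOMERS.to_ddl())
# CREATE TABLE customers (
#   customer_id VARCHAR(36) NOT NULL,
#   email VARCHAR(255),
#   created_at TIMESTAMP NOT NULL
# );
```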

Metadata-as-code has a number of benefits, including:

• Improved accuracy and consistency: Metadata-as-code helps to ensure that metadata is accurate and consistent across all systems. This is because metadata is defined in a central location, and is then used to generate the metadata for each system.
• Increased agility: Metadata-as-code makes it easier to change and update metadata. This is because metadata changes can be made in the code file, and then deployed to all systems automatically.
• Reduced risk: Metadata-as-code helps to reduce the risk of errors. This is because metadata changes can be tested and reviewed before they are deployed to production.
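
The reduced-risk benefit comes from running automated checks on metadata changes before they are merged, just as with application code. The sketch below assumes a hypothetical metadata format and three example rules (snake_case names, mandatory descriptions, no duplicate columns); real projects would choose their own rules and typically run them in CI.

```python
import re

# Hypothetical proposed change, e.g. parsed from a metadata file in the repository.
proposed = {
    "table": "customers",
    "columns": [
        {"name": "customer_id", "type": "VARCHAR(36)", "description": "Primary key"},
        {"name": "Email", "type": "VARCHAR(255)", "description": ""},
        {"name": "customer_id", "type": "VARCHAR(36)", "description": "Duplicate"},
    ],
}

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")


def validate(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the change can ship."""
    problems, seen = [], set()
    for col in meta["columns"]:
        name = col["name"]
        if not SNAKE_CASE.match(name):
            problems.append(f"{name}: not snake_case")
        if not col.get("description"):
            problems.append(f"{name}: missing description")
        if name in seen:
            problems.append(f"{name}: duplicate column")
        seen.add(name)
    return problems


for problem in validate(proposed):
    print(problem)
# Email: not snake_case
# Email: missing description
# customer_id: duplicate column
```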

A company might use metadata-as-code to manage the metadata for their customer data platform. The company could use metadata-as-code to define the schema of the customer data platform, as well as the rules for how customer data is collected, stored, and used. This would allow the company to easily change and update the customer data platform metadata, and to ensure that the metadata is accurate and consistent across all systems.

Technology Agnosticism

Technology agnosticism is the principle of designing systems and applications to be independent of any particular technology or technology vendor. This means that the architecture, system, or application can be implemented using any technology that meets the requirements, without being locked into a particular vendor or platform.

For example, a company might design a data warehouse to be technology agnostic, so that it can be implemented using any database engine, such as MySQL, PostgreSQL, or Oracle. This would give the company the flexibility to switch vendors or platforms in the future without having to rewrite the data warehouse application.

One common practice for implementing technology agnosticism is to use open standards. Open standards are vendor-neutral and can be implemented by any vendor. This makes it easier to switch vendors or platforms in the future without having to make significant changes to the system or application.

Another common practice for implementing technology agnosticism is to use abstraction layers. Abstraction layers provide a layer of separation between the system or application and the underlying technology. This makes it easier to change the underlying technology without having to make changes to the system or application.
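
Here is a minimal Python sketch of an abstraction layer. The `CustomerStore` interface and both implementations are hypothetical; Python's built-in `sqlite3` module stands in for a production database. The business logic in `onboard` runs unchanged against either backend, which is what allows the underlying technology to be swapped.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class CustomerStore(ABC):
    """The abstraction layer: application code depends only on this interface."""

    @abstractmethod
    def save(self, customer_id: str, name: str) -> None: ...

    @abstractmethod
    def get(self, customer_id: str) -> Optional[str]: ...


class InMemoryStore(CustomerStore):
    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def save(self, customer_id: str, name: str) -> None:
        self._rows[customer_id] = name

    def get(self, customer_id: str) -> Optional[str]:
        return self._rows.get(customer_id)


class SQLiteStore(CustomerStore):
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customers (id TEXT PRIMARY KEY, name TEXT)"
        )

    def save(self, customer_id: str, name: str) -> None:
        # INSERT OR REPLACE gives the same overwrite semantics as the dict store
        self._conn.execute(
            "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)",
            (customer_id, name),
        )
        self._conn.commit()

    def get(self, customer_id: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


def onboard(store: CustomerStore) -> Optional[str]:
    # Business logic written once, unaware of the backend behind the interface
    store.save("C1", "Ada Lovelace")
    return store.get("C1")


for backend in (InMemoryStore(), SQLiteStore()):
    print(type(backend).__name__, onboard(backend))
# InMemoryStore Ada Lovelace
# SQLiteStore Ada Lovelace
```

The cost of this flexibility is that the interface can only expose what every backend supports, which is one source of the complexity listed among the challenges below.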

Our methodology is to gather business requirements first and then create a technology-agnostic logical data platform model. This model consists of a set of rules, constraints, and formal requirements that the physical implementation must fulfill. The next step is to implement processes that automatically transform this logical model into the physical one.
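
The last step, transforming a logical model into physical ones, can be sketched as follows. This is a minimal Python illustration, not our actual transformation engine: the logical types and the per-platform type mappings are assumptions chosen for the example, and a missing mapping is treated as a violated constraint.

```python
# Technology-agnostic logical model: entity -> {attribute: logical type}.
LOGICAL_MODEL = {
    "customer": {
        "customer_id": "string",
        "signup_date": "date",
        "lifetime_value": "decimal",
    },
}

# Per-target type mappings (illustrative; real projects cover many more types).
PHYSICAL_TYPES = {
    "postgresql": {"string": "TEXT", "date": "DATE", "decimal": "NUMERIC(18,2)"},
    "snowflake": {"string": "VARCHAR", "date": "DATE", "decimal": "NUMBER(18,2)"},
}


def to_physical(model: dict, target: str) -> list[str]:
    """Transform the logical model into DDL for one target platform."""
    types = PHYSICAL_TYPES[target]
    statements = []
    for entity, attributes in model.items():
        # Constraint: every logical type must have a physical counterpart.
        unsupported = [t for t in attributes.values() if t not in types]
        if unsupported:
            raise ValueError(f"{target} has no mapping for {unsupported}")
        cols = ", ".join(f"{name} {types[t]}" for name, t in attributes.items())
        statements.append(f"CREATE TABLE {entity} ({cols});")
    return statements


print(to_physical(LOGICAL_MODEL, "postgresql")[0])
# CREATE TABLE customer (customer_id TEXT, signup_date DATE, lifetime_value NUMERIC(18,2));
print(to_physical(LOGICAL_MODEL, "snowflake")[0])
# CREATE TABLE customer (customer_id VARCHAR, signup_date DATE, lifetime_value NUMBER(18,2));
```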

Benefits of technology agnosticism:

• Flexibility: Technology agnosticism gives organizations the flexibility to choose the best technology for their needs, without being locked into a particular vendor or platform.
• Cost savings: Technology agnosticism can help organizations to save money by avoiding vendor lock-in.
• Innovation: Technology agnosticism encourages organizations to adopt new technologies more quickly and easily.

Challenges of technology agnosticism:

• Complexity: Technology agnosticism can add complexity to systems and applications, as they need to be designed to be compatible with a wider range of technologies.
• Expertise: Technology agnosticism requires organizations to have the expertise to manage and support a wider range of technologies.

Data Catalog

A data catalog is a searchable inventory of all the data assets within an organization. It provides information about the data, such as its location, format, type, and purpose. Data catalogs can also include technical metadata, such as the schema and data quality metrics. A data catalog might include information about the following data assets:

• Customer data
• Product data
• Sales data
• Financial data
• Operational data
• Analytical data

Data catalogs are used by a variety of stakeholders across an organization, including data analysts, data scientists, business users, and IT professionals. Data catalogs can help users to:

• Find the data they need to support their work
• Understand the data and its purpose
• Identify and mitigate data quality issues
• Ensure compliance with data regulations

Best practices:

• Make sure that the data catalog is complete and up to date.
• Use consistent naming conventions for data assets.
• Tag data assets with relevant keywords and metadata.
• Make the data catalog accessible to all authorized users.
• Provide training on how to use the data catalog effectively.

A Data Catalog is different from a Data Marketplace. A Data Marketplace checks each data product against computational policies as it is deployed, so only products that comply with those policies are published. Data Catalogs, on the other hand, act ex post: they crawl existing data and show it to users, but do not themselves enforce quality or compliance checks.
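
The difference between checking policies at deployment time and crawling data after the fact can be sketched in Python. In this minimal example, assuming two hypothetical computational policies (an owner must be declared, and every column must carry a PII classification), the marketplace refuses to publish a non-compliant product while a catalog-style crawler simply indexes whatever exists. It does not reflect how Witboost implements computational policies.

```python
from typing import Callable, Optional

# A policy inspects a product descriptor and returns an error message or None.
Policy = Callable[[dict], Optional[str]]


def has_owner(product: dict) -> Optional[str]:
    return None if product.get("owner") else "no owner declared"


def pii_is_classified(product: dict) -> Optional[str]:
    unclassified = [c["name"] for c in product["columns"] if "pii" not in c]
    return f"unclassified columns: {unclassified}" if unclassified else None


POLICIES: list[Policy] = [has_owner, pii_is_classified]


class Marketplace:
    """Checks every policy before a product is published."""

    def __init__(self) -> None:
        self.published: dict[str, dict] = {}

    def deploy(self, product: dict) -> list[str]:
        errors = [msg for policy in POLICIES if (msg := policy(product))]
        if not errors:
            self.published[product["name"]] = product
        return errors


def crawl_into_catalog(products: list[dict]) -> list[str]:
    """A catalog-style crawler indexes whatever exists, with no checks."""
    return [p["name"] for p in products]


good = {"name": "orders", "owner": "sales", "columns": [{"name": "order_id", "pii": False}]}
bad = {"name": "leads", "columns": [{"name": "email"}]}

market = Marketplace()
print(market.deploy(good))              # []
print(market.deploy(bad))               # ['no owner declared', "unclassified columns: ['email']"]
print(sorted(market.published))         # ['orders']
print(crawl_into_catalog([good, bad]))  # ['orders', 'leads']
```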

Marketplace for Data

A data marketplace is a platform or ecosystem where data providers make their data available, either free of charge or for purchase, and where data consumers can find and acquire the data they need. Data marketplaces can be used to exchange a wide variety of data, including:

• Business data, such as financial data, market research data, and customer data
• Public data, such as government data, weather data, and census data


Data marketplaces can be used by businesses of all sizes, in all industries. They can be a valuable resource for businesses or departments inside a business that need to access data that they do not collect themselves. 


Data marketplaces can also be a valuable resource for data providers. They can provide a way for data providers to monetize their data and/or reach a wider audience. For example, a government agency might use a data marketplace to sell public data to businesses and researchers.


Here are some of the benefits of using data marketplaces:

• Increased data access: Data marketplaces make it easier for businesses, or departments within a business, to access the data they need, even if they do not collect it themselves.
• Reduced data costs: Data marketplaces can help businesses reduce the cost of accessing data by offering a choice of data sources and pricing options.
• Improved data quality: Data marketplaces typically have quality control mechanisms in place to ensure that the data they provide or sell is of high quality.
• Increased data monetization: Data marketplaces can help data providers to monetize their data and reach a wider audience.


Data marketplaces are a growing part of the data economy. As more and more businesses collect and generate data, data marketplaces are becoming an increasingly important way to access and share data.


Here are some examples of popular data marketplaces:

• Google Cloud Marketplace
• Amazon Web Services (AWS) Data Exchange
• Snowflake Marketplace
• Microsoft Azure Marketplace
• Quandl (now Nasdaq Data Link)
• Kaggle
• Data.gov

Decentralized Data Ecosystem

A decentralized data ecosystem is a network of interconnected data systems that are not controlled by any single entity. Instead, they are governed by a set of rules and protocols that are agreed upon by all participants in the ecosystem.


Decentralized data ecosystems offer a number of advantages over traditional centralized data systems. First, they are more resilient to attack: if one node in the network is compromised, the other nodes can continue to operate. Second, they can be more transparent. In ledger-based implementations, participants share access to the same records and transactions are recorded on a shared, often public, ledger, which makes it difficult for any one participant to defraud the system. Third, they are more democratic: all participants have a say in how the ecosystem is governed, which helps ensure that it serves its participants rather than a single owner.
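
The tamper resistance of a shared ledger typically comes from hash chaining, where each entry includes the hash of the previous one. The following minimal Python sketch shows only that mechanism, using hypothetical transaction fields; it deliberately omits the consensus, digital signatures, and networking that a real decentralized ledger needs.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def block_hash(body: dict) -> str:
    # Deterministic serialization so every participant computes the same hash
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append(ledger: list[dict], transaction: dict) -> None:
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"tx": transaction, "prev": prev}
    ledger.append({**body, "hash": block_hash(body)})


def verify(ledger: list[dict]) -> bool:
    """Any participant can recompute the chain and detect tampering."""
    prev = GENESIS
    for entry in ledger:
        body = {"tx": entry["tx"], "prev": entry["prev"]}
        if entry["prev"] != prev or entry["hash"] != block_hash(body):
            return False
        prev = entry["hash"]
    return True


ledger: list[dict] = []
append(ledger, {"from": "A", "to": "B", "dataset": "weather-2024", "amount": 10})
append(ledger, {"from": "B", "to": "C", "dataset": "weather-2024", "amount": 4})
print(verify(ledger))  # True

ledger[0]["tx"]["amount"] = 1_000  # one participant rewrites history
print(verify(ledger))  # False
```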


Decentralized data ecosystems have the potential to benefit a wide range of industries, including finance, healthcare, and supply chain management. For example, decentralized data ecosystems could be used to create a more secure and transparent financial system, to improve the delivery of healthcare services, and to optimize supply chains.

Data Quality

Data quality is the degree to which data is accurate, complete, consistent, and timely. High-quality data is essential for making informed decisions and driving business success.

A company's customer data might be considered high-quality if it is accurate, complete, and up to date. This means that the data should contain the correct information about each customer, such as their name, address, and contact information. Similarly, in banking, a loan applicant database should be complete: for each applicant, it should contain assets, liabilities, and credit history.

There are many different practices that organizations can use to improve data quality. One common practice is to implement data quality standards. Data quality standards are a set of rules and guidelines that define what constitutes high-quality data for the organization. These standards can be used to assess the quality of existing data and to ensure that new data is collected and stored in a way that meets the organization's data quality requirements.

Another common practice for improving data quality is to implement data quality controls. Data quality controls are processes or systems that are used to identify and correct errors in data. These controls can be implemented at different stages of the data lifecycle, such as data collection, storage, and processing.
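
A data quality control can be as simple as a set of rules applied to each record. This minimal Python sketch checks completeness, a deliberately crude validity rule for email addresses, and timeliness; the required fields and the one-year freshness threshold are assumptions, not a standard.

```python
from datetime import date, timedelta

REQUIRED = ("name", "address", "email")
MAX_AGE = timedelta(days=365)  # assumed freshness standard


def check(record: dict, today: date) -> list[str]:
    """Apply completeness, validity, and timeliness rules to one record."""
    issues = [f"missing {f}" for f in REQUIRED if not record.get(f)]
    email = record.get("email") or ""
    if email and "@" not in email:
        issues.append("invalid email")
    if today - record["updated_on"] > MAX_AGE:
        issues.append("stale")
    return issues


customers = [
    {"id": 1, "name": "Ada", "address": "12 Main St", "email": "ada@example.com",
     "updated_on": date(2024, 3, 1)},
    {"id": 2, "name": "Alan", "address": "", "email": "alan.example.com",
     "updated_on": date(2021, 6, 1)},
]

for c in customers:
    print(c["id"], check(c, today=date(2024, 6, 30)))
# 1 []
# 2 ['missing address', 'invalid email', 'stale']
```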

Benefits of Data Quality:

• Improved decision-making
• Increased efficiency
• Reduced costs
• Improved customer satisfaction

CCPA

The California Consumer Privacy Act (CCPA) is a state data privacy law that regulates how businesses that meet its applicability thresholds, wherever they are located, handle the personal information (PI) of California residents. The CCPA grants California residents several rights, including:

• The right to know what personal information a business collects, uses, and shares about them.
• The right to opt out of the sale of their personal information.
• The right to request that a business delete their personal information.
• The right to non-discrimination: equal service and prices, even if they exercise their CCPA rights.

If you are a business that collects or uses the personal information of California residents, it is important to understand your obligations under the CCPA. One way to ensure compliance is to use a data privacy management solution like Witboost Privacy. Witboost Privacy provides businesses with the tools they need to manage their customers' privacy preferences, respond to data subject requests, and detect and respond to data breaches.

Data Security and Compliance

Data security and compliance is the practice of protecting data from unauthorized access, use, disclosure, disruption, modification, or destruction. It also involves ensuring that data is handled in accordance with all applicable laws and regulations. By taking steps to protect data and ensure compliance, organizations can reduce the risk of data breaches, regulatory fines, and damage to their reputation.

A healthcare organization might implement data security and compliance measures to protect patient data. This might include encrypting patient data, restricting access to patient data to authorized personnel, and conducting regular security audits. The organization might also implement data security policies and procedures that are tailored to the requirements of the Health Insurance Portability and Accountability Act (HIPAA).

Data security and compliance measures might include:

• Encrypting data at rest and in transit
• Implementing access controls to restrict who can access data
• Conducting regular security audits and penetration testing
• Training employees on data security best practices
• Developing and implementing data security policies and procedures
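
Access controls are often implemented as role-based permissions with an audit trail. The following minimal Python sketch assumes hypothetical roles and permissions for patient records; unknown roles are denied by default and every decision is logged. It illustrates the idea only and is not a HIPAA-compliant implementation.

```python
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping for patient records.
ROLE_PERMISSIONS = {
    "physician": {"read_clinical", "write_clinical"},
    "billing": {"read_billing"},
    "auditor": {"read_audit_log"},
}

# (UTC timestamp, user, permission requested, decision)
audit_log: list[tuple[str, str, str, bool]] = []


def authorize(user: str, role: str, permission: str) -> bool:
    """Allow an action only if the role grants it, and record every decision."""
    allowed = permission in ROLE_PERMISSIONS.get(role, set())  # deny by default
    audit_log.append((datetime.now(timezone.utc).isoformat(), user, permission, allowed))
    return allowed


print(authorize("dr_lee", "physician", "read_clinical"))  # True
print(authorize("sam", "billing", "read_clinical"))       # False
print(len(audit_log))                                      # 2
```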

Data Interoperability

Data interoperability is essential for organizations that need to share data between different systems or platforms. For example, a company might need to share data between its CRM system and its ERP system. Or, a government agency might need to share data with other government agencies.

Data interoperability can be achieved by using open standards and interfaces. Open standards are standards that are developed and maintained by independent organizations. Open interfaces are interfaces that are publicly documented and can be used by anyone.

There are a number of different data interoperability standards available, such as XML, JSON, and CSV. These standards define how data should be formatted and exchanged.
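
As an illustration of moving the same records between these formats, the sketch below uses only Python's standard library (`csv`, `json`, and `xml.etree.ElementTree`). The customer records and the XML element names are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# A CRM export in CSV (inline here so the example is self-contained).
crm_csv = "customer_id,name,country\nC1,Ada Lovelace,UK\nC2,Grace Hopper,US\n"
rows = list(csv.DictReader(io.StringIO(crm_csv)))

# JSON for a consumer that expects it, e.g. a REST API
as_json = json.dumps(rows, indent=2)

# XML for a system that expects XML
root = ET.Element("customers")
for row in rows:
    customer = ET.SubElement(root, "customer", id=row["customer_id"])
    for field in ("name", "country"):
        ET.SubElement(customer, field).text = row[field]
as_xml = ET.tostring(root, encoding="unicode")
print(as_xml)
# <customers><customer id="C1"><name>Ada Lovelace</name><country>UK</country></customer><customer id="C2"><name>Grace Hopper</name><country>US</country></customer></customers>

# Round trip: the XML consumer recovers exactly the same values
parsed = [
    {"customer_id": c.get("id"), "name": c.findtext("name"), "country": c.findtext("country")}
    for c in ET.fromstring(as_xml)
]
print(parsed == rows)  # True
```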

Data interoperability can also be achieved by using data integration tools. Data integration tools can be used to transform data from one format to another, and to load data into different systems.

Data interoperability is an essential part of any modern data management strategy. By enabling organizations to share data between different systems and platforms, data interoperability can help organizations to improve their efficiency, productivity, and decision-making.

Benefits of data interoperability:

•Improved data sharing and collaboration
•Increased efficiency and productivity
•Reduced costs
•Improved decision-making
•Reduced risk

Data Stewardship

Data stewardship is the process of managing and maintaining data assets within an organization. It involves identifying, creating, curating, storing and maintaining data, and ensuring that it is used in a responsible and ethical manner. Data stewards are responsible for ensuring that data is accurate, complete, timely, and accessible to those who need it.

Data stewardship is a critical part of any data management program. By effectively managing their data, organizations can improve decision-making, reduce risk, and comply with regulations.

Data stewards play a variety of roles, including:

• Identifying and defining data assets: Data stewards identify and define the organization's data assets, including data that is collected, stored, and used.
• Creating and maintaining data policies and standards: Data stewards create and maintain data policies and standards to ensure that data is managed consistently and effectively.
• Ensuring data quality: Data stewards ensure that data is accurate, complete, timely, and accessible to those who need it.
• Protecting data security and privacy: Data stewards protect data from unauthorized access, use, disclosure, modification, or destruction.
• Promoting data literacy: Data stewards promote data literacy and understanding throughout the organization.

Data stewards can be found in a variety of industries, including healthcare, finance, retail, and government. The specific role and responsibilities of a data steward will vary depending on the organization and the industry in which it operates.

Here are some examples of data stewardship activities:

• Developing and implementing data governance policies and procedures
• Creating and maintaining data catalogs and glossaries
• Monitoring data quality and identifying and correcting errors
• Managing data access and security
• Providing training and support to users on data management best practices

Benefits of effective data stewardship:

• Improved data quality
• Reduced risk
• Increased compliance
• Improved decision-making
• Increased data literacy and understanding

Data stewardship is essential for any organization that wants to get the most value from its data. By effectively managing their data, organizations can improve their operations, make better decisions, and reduce risk.

Data Lineage

Data lineage is the process of tracking the origin and transformation of data throughout its lifecycle. It is a critical component of data governance, as it enables organizations to understand how their data is collected, processed, stored, and used.

Data lineage can be used to improve data quality, reduce risk, and comply with regulations. For example, by understanding the data lineage of a customer record, an organization can identify the source of any errors in the data and take corrective action. Additionally, data lineage can be used to identify and mitigate data security risks. For example, if an organization knows that a particular piece of data is being used in a sensitive application, it can take steps to protect that data from unauthorized access.

Data lineage can be implemented using a variety of tools and technologies. Some common approaches include:

•Data catalogs: Data catalogs provide a central repository of information about data assets, including their data lineage.


•Data lineage tools: Data lineage tools track the movement of data through systems and applications.


•Metadata management tools: Metadata management tools track the properties of data, including its data lineage.

Data lineage can be challenging to implement and maintain, but it is a valuable investment for any organization that wants to get the most value from its data.

Here are some examples of data lineage information:

•The source of the data, such as a customer relationship management (CRM) system or a financial management system


•The transformations that the data has undergone, such as cleaning, aggregation, and enrichment


•The systems and applications that the data has been used in


•The people who have accessed the data
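The four kinds of lineage information listed above can be kept together in one record per data asset. The following is a minimal in-memory sketch. The `LineageRecord` class and its field names are hypothetical and do not represent the schema of any particular data catalog or lineage tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageRecord:
    """Hypothetical lineage metadata for a single data asset."""

    asset: str                                                   # table, file or topic name
    source: str                                                  # originating system, e.g. a CRM
    transformations: list[str] = field(default_factory=list)     # ordered processing steps
    used_in: set[str] = field(default_factory=set)               # consuming systems/applications
    access_log: list[tuple[str, datetime]] = field(default_factory=list)

    def add_transformation(self, step: str) -> None:
        self.transformations.append(step)

    def register_consumer(self, system: str) -> None:
        self.used_in.add(system)

    def record_access(self, user: str) -> None:
        # Store timestamps in UTC so entries from different systems compare cleanly.
        self.access_log.append((user, datetime.now(timezone.utc)))

    def accessed_by(self) -> set[str]:
        return {user for user, _ in self.access_log}


if __name__ == "__main__":
    record = LineageRecord(asset="customer_360", source="CRM system")
    record.add_transformation("cleaning: remove duplicate customers")
    record.add_transformation("enrichment: join purchase history")
    record.register_consumer("marketing campaign tool")
    record.record_access("analyst_jane")
    print(record.source, record.transformations, record.used_in, record.accessed_by())
```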

Data lineage can be used to answer a variety of questions, such as:

•Where did this data come from?
•How has this data been changed?
•Who has accessed this data?
•What systems and applications use this data?
•What impact will changes to this data have on other systems and applications?
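Most of these questions come down to walking a directed graph in which an edge from A to B means "B is derived from A". Following edges backwards answers "Where did this data come from?", and following them forwards answers "What impact will changes to this data have?". Below is a minimal sketch using plain Python dictionaries and breadth-first search. The `LineageGraph` class and the dataset names are made up for illustration.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed lineage graph: an edge A -> B means B is derived from A."""

    def __init__(self) -> None:
        self.downstream: dict[str, set[str]] = defaultdict(set)
        self.upstream: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    @staticmethod
    def _reachable(start: str, edges: dict[str, set[str]]) -> set[str]:
        # Breadth-first search; the `seen` set also protects against cycles.
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def origins(self, dataset: str) -> set[str]:
        """All datasets this one was derived from (where did it come from?)."""
        return self._reachable(dataset, self.upstream)

    def impact(self, dataset: str) -> set[str]:
        """All datasets derived from this one (what is affected if it changes?)."""
        return self._reachable(dataset, self.downstream)


if __name__ == "__main__":
    g = LineageGraph()
    g.add_edge("crm.customers", "staging.customers_clean")
    g.add_edge("erp.invoices", "staging.invoices_clean")
    g.add_edge("staging.customers_clean", "mart.customer_revenue")
    g.add_edge("staging.invoices_clean", "mart.customer_revenue")
    g.add_edge("mart.customer_revenue", "dashboard.sales_kpis")

    print(sorted(g.origins("mart.customer_revenue")))
    # ['crm.customers', 'erp.invoices', 'staging.customers_clean', 'staging.invoices_clean']
    print(sorted(g.impact("staging.customers_clean")))
    # ['dashboard.sales_kpis', 'mart.customer_revenue']
```

Keeping both edge directions costs a little extra memory but makes upstream and downstream queries equally cheap, which matters when impact analysis runs before every schema change.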

Data Ownership

Data ownership is the legal and ethical right to control the use of data. It is the foundation of data governance because it defines who is responsible for deciding how data is collected, stored, processed, used, and shared. Data ownership can be complex because it applies to many different types of data, such as personal data, customer data, and financial data.

Additionally, data ownership can change over time, as data is shared and used by different people and organizations.

Examples of data ownership:

•A company owns the data it collects from its customers. This data may include customer names, addresses, email addresses, purchase history, and product preferences. The company can use this data to improve its products and services, target its marketing campaigns, and make better business decisions.


•A government agency owns the data it collects from its citizens. This data may include census data, tax data, and criminal records. The government can use this data to provide services to its citizens, develop policies, and enforce laws.


•An individual owns their own personal data, such as their name, address, date of birth, and medical history. Individuals can choose to share their personal data with others, but they have the right to control how their data is used.

In practice, data ownership is a complex topic with many legal and ethical implications. Organizations should develop and implement data ownership policies and procedures to ensure that their data is managed in a responsible and ethical manner. These policies and procedures should address the following key areas:

•Who owns the data?
•What are the rights and responsibilities of data owners?
•How is data shared and used?
•How is data protected from unauthorized access, use, disclosure, modification, or destruction?
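As an illustration of how such a policy can be made enforceable, here is a minimal sketch of an ownership registry that answers "Who owns the data?", lets only the owner decide how it is shared, and denies access by default. The `OwnershipRegistry` class, its methods and the action names ("read", "write") are hypothetical. Real organizations typically enforce these rules through their data catalog or access management system.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetOwnership:
    """Hypothetical ownership entry: one accountable owner plus explicit grants."""

    dataset: str
    owner: str
    # Maps a user or team to the actions it has been granted, e.g. {"read"}.
    grants: dict[str, set[str]] = field(default_factory=dict)


class OwnershipRegistry:
    def __init__(self) -> None:
        self._entries: dict[str, DatasetOwnership] = {}

    def register(self, dataset: str, owner: str) -> None:
        if dataset in self._entries:
            raise ValueError(f"{dataset} already has an owner")
        self._entries[dataset] = DatasetOwnership(dataset, owner)

    def owner_of(self, dataset: str) -> str:
        return self._entries[dataset].owner

    def grant(self, dataset: str, requested_by: str, grantee: str, action: str) -> None:
        entry = self._entries[dataset]
        # Only the owner may decide how the data is shared and used.
        if requested_by != entry.owner:
            raise PermissionError(f"{requested_by} is not the owner of {dataset}")
        entry.grants.setdefault(grantee, set()).add(action)

    def is_allowed(self, dataset: str, user: str, action: str) -> bool:
        entry = self._entries.get(dataset)
        if entry is None:
            return False  # Deny by default: unregistered data has no approved uses.
        if user == entry.owner:
            return True
        return action in entry.grants.get(user, set())


if __name__ == "__main__":
    registry = OwnershipRegistry()
    registry.register("customer_profiles", owner="marketing-team")
    registry.grant("customer_profiles", "marketing-team", "analytics-team", "read")
    print(registry.owner_of("customer_profiles"))
    print(registry.is_allowed("customer_profiles", "analytics-team", "read"))
    print(registry.is_allowed("customer_profiles", "analytics-team", "write"))
```

Denying access to anything not explicitly registered and granted is a deliberate choice in this sketch: it means a gap in the ownership policy results in blocked access rather than unauthorized use.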