Data Governance Framework: Pioneering Governance Shift Left
This article examines why data governance so often falls short and explores strategies to overcome common challenges. Readers will discover how to streamline approvals, bridge data gaps, and adopt a proactive shift-left approach to achieve effective data management.
The data governance process refers to the set of policies, processes, and frameworks that an organization puts in place to manage and use data effectively. It involves defining data-related roles and responsibilities, creating policies and guidelines for data access and usage, ensuring data quality and accuracy, and aligning data initiatives with business goals. However, data governance processes are often broken or ineffective because they were defined after the explosion of data management and engineering.
Disconnect between Data Governance and Data Management
The exponential growth of data in recent years has led to new challenges in data governance, such as data quality, security, privacy, metadata completeness and compliance, which were not previously considered. Moreover, the data governance process typically operates in remediation mode, which means that it is reactive instead of proactive. Organizations often realize the importance of data governance only after they have faced a problem, such as a data breach or a compliance violation, or when people cannot discover and understand the data produced across the company. As a result, they have to act quickly to remediate the issue, because a non-compliant governance process will lead to compromising situations sooner or later. This reactive approach leads to several issues, such as:
- Inconsistent data quality
- Security breaches
- Compliance violations
- Lack of Metadata
Inconsistent data quality: Without a proper process, data can become inconsistent, inaccurate, and unreliable. This can lead to incorrect business decisions, lost opportunities, and reputational damage.
Security breaches: Data breaches can occur when there are no proper controls in place to secure data. Without strong data governance, data may be exposed to unauthorized access, leading again to reputational damage, financial losses and legal liabilities.
Compliance violations: Compliance regulations such as GDPR, CCPA, or HIPAA require organizations to implement the appropriate data governance framework. Failure to comply with these regulations can result in severe penalties and fines.
Lack of Metadata: Metadata is what lets an approved data consumer discover, understand and connect data coming from different data producers and business domains. Without it, data that exists is effectively invisible or unusable.
Organizations simply can't keep up as their governance policies have fallen behind. Let's take a look at several data governance challenges they are facing.
Why Data Governance is Broken
1. Misalignment Between Data and Metadata
Data and metadata are crucial components of any organization’s data management process. However, they operate differently and are often managed by different teams, which leads to inconsistencies as well as gaps in the data quality. Data is generated, collected, and maintained by data engineers, while metadata is created and managed by data governance teams.
This leads us to a key question:
Why do we allow the modification of metadata directly in production?
The primary challenge is that data is often pushed to production without proper metadata, including business descriptions, tags, classifications, SLA/SLO, etc. This creates an information gap that undermines the trust of data consumers. When data is consumed without proper metadata, users are unable to interpret the data accurately or understand its context. It ultimately results in data issues, leading to incorrect business decisions and financial losses. Moreover, the data is flexible and changes frequently, while metadata is stable and should reflect the most recent changes in data definitions. However, changes in the data definitions are not always promptly reflected in the metadata. As a result, consumers of data may be using outdated metadata, which can lead to significant errors in analysis and decision-making.
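To make the gap concrete, here is a minimal sketch of how drift between a dataset and its documented metadata could be detected. The function name and the column names are hypothetical and chosen only for illustration; a real check would read the actual schema and the catalog entry from wherever they are stored.

```python
def find_metadata_drift(actual_columns, documented_columns):
    """Compare the columns a dataset really has with the ones its metadata documents."""
    actual = set(actual_columns)
    documented = set(documented_columns)
    return {
        # Present in the data but never described in the metadata
        "undocumented": sorted(actual - documented),
        # Still described in the metadata but no longer present in the data
        "stale": sorted(documented - actual),
    }


# Hypothetical scenario: a pipeline renamed `cust_id` to `customer_id`
# and added `loyalty_tier`, but the catalog entry was not updated.
drift = find_metadata_drift(
    actual_columns=["customer_id", "email", "loyalty_tier"],
    documented_columns=["cust_id", "email"],
)
print(drift)
# {'undocumented': ['customer_id', 'loyalty_tier'], 'stale': ['cust_id']}
```

A consumer reading the catalog in this scenario would look for a column that no longer exists and would have no description for two that do.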
2. Loss of Information During Knowledge Hand-off
The knowledge hand-off between the domain expert, data engineer, and data steward is a critical step in the data management process.
However, it is often a problematic stage that can hamper productivity and effectiveness. One of the major issues with knowledge hand-off is that it exposes you to knowledge loss. When knowledge is not effectively communicated and transferred from one person to another, the recipient may miss important details or misunderstand certain aspects.
This can result in costly errors and delays in the project timeline. Additionally, knowledge hand-off can lead to team alignment issues.
Finally, the lack of clear ownership can also be a problem. When there is no clear owner of a particular piece of knowledge, it can be difficult to determine who is responsible for maintaining and updating it. Is the knowledge reflected in the code of your data pipelines or in the metadata? Is it both? Such an unclear data governance process can lead to lost or forgotten critical information and ultimately impact the quality of the data management process.
3. Lack of Data Catalog Completeness
The completeness of a Data Catalog refers to the degree to which all relevant information is included, for all the data assets of an organization. To achieve this level of completeness, it is the responsibility of Data Stewards to map out the entire landscape of data that exists within the company. Therefore, the onus is on the capacity or capabilities of the Data Stewards, which can significantly impact the completeness of a Data Catalog. If the Data Stewards are understaffed, lack the necessary skills, or are unable to engage business stakeholders, then they may not be able to map out all of the relevant data assets effectively.
Additionally, the task of mapping out the entire data landscape can be overwhelming and complex, which can leave data governance teams effectively understaffed for the workload. Due to these challenges, Data Governance teams often have to prioritize which data assets to include in the Catalog. This prioritization can lead to data assets being left out of the Catalog or documented only partially. As a result, employees or stakeholders may have incomplete information available when making decisions.
Ultimately, a complete Data Catalog is crucial for making informed decisions based on accurate data, but more importantly it is about building data consumers' trust that the Catalog is accurate. Once users encounter missing or inconsistent metadata for a specific data asset, they realize that the Catalog is out of sync with the data and lose trust in it. If users are not 100% sure that they can rely on the information in the Data Catalog, they have to double-check with a domain expert, which costs time and productivity and wastes the investment made in the Data Catalog.
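One simple way to put a number on catalog completeness is the share of entries whose required metadata is filled in. The sketch below assumes a hypothetical set of required fields (description, owner, classification) and a made-up catalog; the right fields and thresholds depend on each organization's policies.

```python
# Assumed set of mandatory metadata fields (illustrative, not from any standard)
REQUIRED_FIELDS = ("description", "owner", "classification")


def catalog_completeness(entries, required=REQUIRED_FIELDS):
    """Return the share of catalog entries where every required field is non-empty."""
    if not entries:
        return 0.0
    complete = sum(
        1 for entry in entries
        if all(entry.get(field) for field in required)
    )
    return complete / len(entries)


# Hypothetical catalog: two entries are fully documented, two are not.
catalog = [
    {"name": "orders", "description": "Customer orders", "owner": "sales", "classification": "internal"},
    {"name": "clicks", "description": "", "owner": "web", "classification": "internal"},
    {"name": "payments", "description": "Card payments", "owner": "finance"},
    {"name": "customers", "description": "Customer master data", "owner": "crm", "classification": "pii"},
]
score = catalog_completeness(catalog)
print(f"Catalog completeness: {score:.0%}")  # Catalog completeness: 50%
```

A metric like this only counts assets that are already in the Catalog; assets that were never registered stay invisible to it, which is exactly the coverage problem described above.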
4. Data Quality Effectiveness
In a scenario where the data governance team is solely responsible for defining data quality controls across all domains and use cases, there are potential issues that could arise due to the disconnection between those who have the knowledge to create data quality controls and those who implement them.
Essentially, the governance team will have to rely on other departments to provide them with the necessary information about the data in order to create effective quality controls. This handoff of information could result in delays and gaps in data quality coverage. Furthermore, the data governance team may not have a full understanding of the intricacies and nuances of each individual department’s data needs and processes. This could lead to the implementation of ineffective quality controls or controls that are too strict or not business oriented. In addition, there may be a long period of time before the proper quality controls are implemented due to the extended time needed to gather information and create effective controls.
Data Governance Redefined
One of the standard definitions used for data governance is: “Data governance (DG) is the process of managing the availability, usability, integrity and security of the data used in a company.”
So, what is the goal for the data engineering process? Just to create data without taking care of all the aforementioned characteristics?
We think this is a better approach:
“Data Governance is not just a process. Data Engineering is a process with the clear goal of producing data that is available, usable, secure, etc. Data Governance is a set of policies/standards and accountabilities that must be enforced within the Data Engineering process. The Governance team is responsible for defining and enforcing data governance goals and policies across the entire company, independent of technologies and people.”
Better Data Consumption Experiences
Organizations must integrate metadata management with data management to ensure data consistency, accuracy, and credibility. They must also adopt a robust metadata management framework that aligns the data, software and metadata lifecycles and ensures that metadata accurately reflects any data changes.
By doing so, organizations can bridge the gap between data and metadata, improve data quality, boost trust among data consumers and activate their metadata to build intelligent automation and better data consumption experiences.
A New and Better Data Governance Framework: Governance Shift Left
The Governance Shift Left refers to a proactive approach to data governance that emphasizes integrating data governance practices earlier in the data lifecycle. In traditional software development, the term “shift left” refers to the practice of moving activities and responsibilities earlier in the development process to catch and address issues sooner. The earliest step in the creation of data is the software implementation phase when data pipelines and other components are built.
The data Governance Shift Left is based on four pillars:
1. Metadata as code
Metadata and data should follow the same lifecycle as the code that produces them, as they are all part of the business value we generate.
2. You build it, you govern it
The Data Engineering team becomes accountable for complying with governance by adopting its policies within its own development process.
3. Not just guidelines
Governance policies are no longer mere guidelines. They are automatically enforced through code and cannot be bypassed. Policy as Code is a key element in adopting this pattern correctly.
Governance policies should be documented, accessible, and self-explanatory. A good policy should explain why it exists, the consequences, and the trade-offs involved.
Adopting this practice can bring several benefits:
· Quality Gate
By aligning data documentation with the software lifecycle, we can apply quality gates as we do with software before it goes into production. Higher quality data means fewer manual checks, less time spent on maintenance, and ultimately lower maintenance costs.
· No Hand-off
We no longer need another team to create data that is accessible, usable, secure, etc. We can achieve these goals within the Data Engineering team. This means less time being wasted on putting the puzzle pieces back together, retracing steps, and requesting more information. This speed significantly improves any data project's time to market.
· No Data Entry in the Catalog
The Data Catalog automatically aligns with the governance policies, saving the Governance team’s time by eliminating manual data entry. Automation compounds its effectiveness as the data activities scale, therefore reducing errors and slashing costs.
· Data and Metadata always in sync
No information and time gaps between data and metadata mean greatly improved trust in the Data Catalog. Trust in the data enables consumers to have a better discovery experience.
The Data Contract, which includes technical schemas, semantics, business metadata, SLAs, and quality expectations, is crucial in this context. Data Contracts should be software-defined and part of the artefacts produced by every data producer team. This enables the enforcement of governance at deployment time.
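As an illustration of what "software-defined" could mean in practice, here is a minimal sketch of a Data Contract expressed as Python dataclasses and versioned next to the pipeline code. The class names, field names and example values are assumptions made for this sketch, not a standard contract format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str                    # technical schema
    description: str                  # business meaning (semantics)
    classification: str = "internal"  # e.g. "internal", "pii"


@dataclass(frozen=True)
class DataContract:
    name: str
    owner: str                        # accountable data producer team
    description: str                  # business metadata
    schema: Tuple[Column, ...]
    freshness_sla_hours: int          # SLA: maximum accepted data age
    quality_expectations: Tuple[str, ...] = ()
    tags: Tuple[str, ...] = ()


# Hypothetical contract, committed in the same repository as the pipeline.
orders_contract = DataContract(
    name="orders",
    owner="sales-domain-team",
    description="One row per confirmed customer order.",
    schema=(
        Column("order_id", "string", "Unique identifier of the order"),
        Column("customer_email", "string", "Contact email of the buyer", classification="pii"),
        Column("amount_eur", "decimal", "Order total in euros"),
    ),
    freshness_sla_hours=24,
    quality_expectations=("order_id is unique", "amount_eur >= 0"),
    tags=("sales", "orders"),
)

# Serialized form that a deployment step could publish to the Data Catalog.
print(json.dumps(asdict(orders_contract), indent=2))
```

Because the contract is an ordinary artefact in the repository, it goes through the same review, versioning and release steps as the code that produces the data.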
Implementing Policies as code provides the ability to build quality gates for metadata and enforce them during the CI/CD process. These policies can be complex, such as checking if the semantics align with the use case and comply with the business glossary. This approach elevates the quality of metadata from the inception of a new project.
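A quality gate of this kind can be sketched in a few lines. The example below is illustrative only: the glossary, the contract layout and the two rules (every column needs a business description, every business term must exist in the glossary) are assumptions, and a real setup would likely load them from the organization's own policy and glossary sources.

```python
# Hypothetical business glossary maintained by the Governance team
BUSINESS_GLOSSARY = {"order", "customer", "revenue"}


def check_contract(contract, glossary=BUSINESS_GLOSSARY):
    """Return a list of policy violations; an empty list means the gate passes."""
    violations = []
    if not contract.get("owner"):
        violations.append("contract has no owner")
    for column in contract.get("schema", []):
        name = column.get("name", "<unnamed>")
        if not column.get("description"):
            violations.append(f"{name}: missing business description")
        term = column.get("business_term")
        if term not in glossary:
            violations.append(f"{name}: business term {term!r} is not in the glossary")
    return violations


def run_gate(contract):
    """Print violations and return a process exit code (0 = pass, 1 = fail)."""
    violations = check_contract(contract)
    for violation in violations:
        print(f"POLICY VIOLATION: {violation}")
    return 1 if violations else 0


# Hypothetical contract with one well-documented and one poorly documented column.
contract = {
    "name": "orders",
    "owner": "sales-domain-team",
    "schema": [
        {"name": "order_id", "description": "Unique identifier of the order", "business_term": "order"},
        {"name": "cust_seg", "description": "", "business_term": "segment"},
    ],
}
violations = check_contract(contract)
exit_code = run_gate(contract)
# In a CI/CD step, the script would end with sys.exit(exit_code),
# so a non-zero code blocks the deployment.
```

Here the gate fails with two violations on `cust_seg`, so the data product would not reach production until its metadata is fixed.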
In conclusion, the concept of Governance Shift Left emerges as a powerful solution to address data governance challenges. As organizations grapple with the complexities of managing exponential data growth, bridging data gaps, and ensuring data quality, the Governance Shift Left approach offers a transformative path forward.
The journey of data governance begins by acknowledging that the existing models are fragmented and reactive. The consequences of broken data governance, ranging from inconsistent data quality to security breaches and compliance violations, highlight the urgent need for a proactive paradigm shift. By integrating data governance into the data engineering process itself, organizations can unlock a multitude of benefits that drive efficiency, accuracy, and trust in data.
This shift ensures that data and metadata are no longer separate entities managed by distinct teams, but rather integral components harmonized within the data lifecycle.
The benefits are resounding. Quality gates embedded early in the development process, elimination of hand-offs, automatic synchronization of data and metadata, and a trustworthy Data Catalog all contribute to more agile, reliable, and cost-effective enterprise data governance.
As data continues to evolve as a strategic asset, organizations must recognize the imperative of shifting their governance practices to the left. The Governance Shift Left approach is not just about rectifying data governance inefficiencies, but about pioneering a culture of excellence where data is not just managed, but truly governed from the moment of inception.
The Governance Shift Left is a data governance framework that guides organizations towards a future where data security, compliance, and quality are not afterthoughts but intrinsic elements of data engineering. By adopting this approach, organizations can fortify their data foundations, build trust among data consumers, and accelerate their journey towards data-driven success. It is a visionary approach to redefining data governance for a brighter and more secure digital future.
Posted by Paolo Platter
CTO & Co-Founder. Paolo explores emerging technologies, evaluates new concepts, and technological solutions, leading Operations and Architectures. He has been involved in very challenging Big Data projects with top enterprise companies. He's also a software mentor at the European Innovation Academy.