
Boost Data Governance with Governance Decision Record

Enhance Data Governance efficiency with the Governance Decision Record framework. Improve computational governance for structured data practices.

In this article I will present the open-source Governance Decision Record (GDR) framework and how it could boost the efficiency and efficacy of Data Governance frameworks.

We care a lot about the word “practice” at Agile Lab: it is a foundational piece of our Elite Data Engineering manifesto.

The Governance Decision Record clearly refers to, and fits perfectly into, the Data Mesh scenario as a Federated Computational practice for Data Governance goals. Still, the idea of structuring an operating model for the decisions that belong to enterprise Data Governance accountabilities can easily be adopted in many other data management contexts and architectures, like the Data Lakehouse.

Let’s take a look!

Table of contents

  1. Introduction
  2. 2023 State of Federated Computational Governance
  3. What is a Data Governance Decision Record?
    1. Policy Lifecycle State
    2. Policy History State
    3. Consequences and Accepted Tradeoffs
    4. Implementation Steward
    5. Where the Policy Becomes Computational
    6. Automation Needs Metadata
    7. Governance Decision Record Example
    8. Federated Decisions
  4. Wrapping Up and Next Steps

2023 State of Federated Computational Governance

First, let’s understand very briefly what the modern view of federated computational data governance is. To do so, let’s go over the status quo first.

[Image: a digital alarm clock displaying "LATE" in place of the hour.]

Yeah, late.

In one word, that’s what Data Governance frameworks have been most of the time in the majority of the enterprises I have had the chance to observe (I’m a Staff Data Architect at a Data Engineering solutions firm). Centralized Data Governance teams usually:

  • Performed “offline” (late!) periodic analysis of data to verify compliance policies.

  • Tagged schemas with business terms in data catalogs when data was already there (usable by or consumed by clients).

  • Went through ad-hoc (per use-case) design decisions for data infrastructure and applications.

  • Had to think about data quality metrics for every domain of data while also being responsible for its guarantee.

  • Sometimes had to be part of the access management process.

  • Had to define the privacy constraints for every domain of data, and much more.

 

A long list, which always made the centralized practice inefficient and not scalable.

Another word that describes the typical Data Governance initiative is manual.

Humans don’t scale out. We sleep, and we make mistakes. Data don’t sleep, so mistakes mean risks and losses.

High-quality analytical data must be provided with clear ownership and timeliness. They must be interoperable across domains, from both a semantic and a technical point of view. They need to be accessed safely and securely, and to be discovered, understood, and especially used with trust and reliability.

OK, I’ve brutally summarized several concepts of the Data Mesh paradigm (excuse me, Zhamak, but I wanted to cut straight to the chase).

To get all the features above, we need to answer the following questions:

  • What are the standards for interoperability, secure access and privacy, self-serve provisioning, immutability, change management, breaking changes, schema evolution, polysemes, cloud costs management, data cataloging, compliance, etc.? The list goes on...

  • Why are these standards required? We could trade off agility for safety, reliability for compliance, efficacy for efficiency, etc.

  • Who should make the decisions about these standards, or with whom? Who will take ownership of their maintenance along the way?

  • Where and How can we apply these standards to our data architecture, design, and engineering practice? How do we monitor and verify they are applied? 

Furthermore, one last question:

How are effective data governance decisions cataloged, maintained, and shared with the whole organization?

The answer to this last question is the whole purpose of the Governance Decision Record (GDR). It also provides a framework/tool to answer the others above.

 

What is a Data Governance Decision Record?

Every decision leads to a series of policies. Policies need clear ownership, but they can be discussed under different organizational models. Over time, decisions can change, evolve, and improve. Such decisions should also lead to automation. In this sense, they are architectural decisions.

Automation is what brings us from “late” to “on time”. From “to be verified” to “compliant by design”. From “arbitrary” to “structural”.

The Governance Decision Record is essentially an evolution of the Architectural Decision Record (ADR), a broadly adopted framework for consolidating architectural decisions. The GDR adds 2 major improvements dedicated to computational governance:

  1. It has a specific section to report “where and how the decision becomes a computational policy”. It provides clarity on how we automate it, in which piece of the platform, the architecture, or the infrastructure the decision is implemented as automation, and who is going to steward that implementation.

  2. It suggests a policy-as-code approach. It adopts a version-controlled repository as a container for both decisions (documents) and the associated policies-as-code (pieces of testable software that implement the decision logic through a platform of some sort; which one is up to the user).

A Governance Decision Record contains:

  • a policy lifecycle state
  • a policy history state
  • the policy title
  • the context
  • the decision
  • the consequences and accepted trade-offs
  • an implementation steward
  • where the policy becomes computational

Also, policies can have different scopes:

  • LOCAL: the policy is applied at runtime, with local scope
  • GLOBAL: the policy is applied at deploy time, with global scope
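
To make the structure concrete, here is a minimal Python sketch of a GDR as a typed record. It is only an illustration: the class name, field names, and example values are hypothetical and simply mirror the sections listed above, while the actual template in the repo is a Markdown document.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Scope(Enum):
    LOCAL = "local"    # applied/verified at runtime
    GLOBAL = "global"  # applied/verified at deploy time


@dataclass
class GovernanceDecisionRecord:
    # Hypothetical field names, mirroring the sections listed above.
    title: str
    lifecycle_state: str   # e.g. "Draft", "Approved", "Rejected"
    history_state: str     # e.g. "New", "Amends", "Supersedes", "Deprecated"
    context: str
    decision: str
    consequences: str
    implementation_steward: str
    # Where the policy becomes computational, described per scope.
    computational_at: Dict[Scope, str] = field(default_factory=dict)


# Illustrative values only; they are not taken from the repo.
record = GovernanceDecisionRecord(
    title="Output Port of type FILES",
    lifecycle_state="Draft",
    history_state="New",
    context="Data Products exposing files need a common definition.",
    decision="Describe how such an output port is defined and validated.",
    consequences="Existing output ports may need rework.",
    implementation_steward="Platform Team",
    computational_at={Scope.GLOBAL: "deploy-time metadata validation"},
)
```

Keeping the scope as an explicit key makes it easy for a platform to ask "which checks run at deploy time?" without parsing free text.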

A Markdown Governance Decision Record template is provided as a version-controlled document. I believe a Git repo makes decisions far more searchable and maintainable (in terms of releases, evolution, and concurrent contribution), and it is an enabler for automation.

Let’s dive into each section.

 

Policy Lifecycle State

This can be as simple as a label tracking the lifecycle state of a policy. Common values are:

  • Draft - when a policy is being developed and still needs to be formally approved or has been submitted for approval.
  • Approved - when a policy has been formally approved. This makes it actionable and a reference for the overall governance.
  • Rejected - when a policy has been formally rejected after the approval process.

In the Governance Decision Record template file, some pre-compiled web-rendered labels are provided.
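
The lifecycle labels above can be read as a small state machine. The sketch below is one possible interpretation, not something the framework prescribes: it assumes that only a Draft can move to Approved or Rejected, and the `transition` helper is a hypothetical name.

```python
from enum import Enum


class LifecycleState(Enum):
    DRAFT = "Draft"
    APPROVED = "Approved"
    REJECTED = "Rejected"


# Assumption: a policy leaves Draft exactly once, through the approval process.
ALLOWED_TRANSITIONS = {
    LifecycleState.DRAFT: {LifecycleState.APPROVED, LifecycleState.REJECTED},
    LifecycleState.APPROVED: set(),
    LifecycleState.REJECTED: set(),
}


def transition(current: LifecycleState, target: LifecycleState) -> LifecycleState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Under this assumption, changing an approved policy means writing a new record that amends or supersedes it, which is what the history state tracks.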


 

Policy History State

This can be as simple as a label tracking the history state of a policy. Common statuses are:

  • New - when a policy is created for the first time and doesn't amend or supersede an existing one.
  • Amends or Amended - when an approved policy amends or is amended by another existing policy.
  • Supersedes or Superseded - when a policy supersedes or is superseded by another existing policy.
  • Deprecated - when a policy ceases to be valid/applied and no other one amends or supersedes it.

NOTE: in the case of amend* and supersede* the related policy should be linked.

In the GDR template file, some pre-compiled web-rendered labels are provided.
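
The linking rule in the note above is easy to check automatically. This is a minimal sketch under the assumption that the history state and the related policy are available as plain strings; the function name and the error messages are hypothetical.

```python
from typing import List, Optional

# States that must point at another policy, per the note above.
LINKED_STATES = {"Amends", "Amended", "Supersedes", "Superseded"}
UNLINKED_STATES = {"New", "Deprecated"}


def check_history(history_state: str, related_policy: Optional[str]) -> List[str]:
    """Return a list of problems; an empty list means the record is consistent."""
    errors = []
    if history_state not in LINKED_STATES | UNLINKED_STATES:
        errors.append(f"unknown history state: {history_state}")
    elif history_state in LINKED_STATES and not related_policy:
        errors.append(f"{history_state} requires a link to the related policy")
    return errors
```

A check like this could run on every pull request against the decisions repository, so that a superseding policy never loses track of the one it replaces.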

 

Context

This section describes the context to which the policy applies (and why).

 

Decision

The decision the policy aims to apply. Below, an example will clarify the scope of a possible governance decision.

 

Lifecycle

Declare which changes to the metadata (or anything else) would be considered breaking and which would not. This is important to implement automation at the platform level and create a robust change management process based on trust.
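
As an illustration of how such a declaration could drive automation, the sketch below classifies schema changes. The rules are assumptions made for the example (removing a field or changing its type is breaking, adding a field is not); a real policy would declare its own rules, and the function name is hypothetical.

```python
from typing import Dict, List


def breaking_changes(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Compare two schemas given as {field_name: type_name}."""
    problems = []
    for name, old_type in old.items():
        if name not in new:
            # Assumed rule: consumers may rely on every published field.
            problems.append(f"removed field: {name}")
        elif new[name] != old_type:
            # Assumed rule: any type change can break consumers.
            problems.append(f"type change on {name}: {old_type} -> {new[name]}")
    # Fields that exist only in `new` are treated as non-breaking here.
    return problems


old_schema = {"id": "string", "amount": "double"}
new_schema = {"id": "string", "amount": "decimal", "currency": "string"}
print(breaking_changes(old_schema, new_schema))
```

A platform could block a deployment when the returned list is not empty, or require a new major version of the data asset.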

 

Consequences and accepted trade-offs

What we accept will happen while the policy is applied, including pros (improvements) and cons (impacts, rework, new accountabilities, or requirements).

Since there's no "universally optimal decision", the policy should also report the trade-offs the organization is going to accept with this policy, which could mean in some scenarios making the accumulated tech debt explicit.

A note on tech debt: it’s usually hidden and hard to track. When making it explicit, it is easier to measure/keep track of the overall tech debt, system quality in terms of architecture and behavior, etc.

 

Implementation Steward

The person responsible for taking care of the implementation. We talk about implementation since the policy is supposed to become as "computational" as possible, thus leading to automating the data management practice, probably with the help of a backing platform.

It can also be the role accountable for following the application of the policy.

 

Where the policy becomes computational

These are the specific points in the architecture, the platform, the system, the context, etc., where this policy and its checks, if any, are implemented to become automation (thus becoming "computational").

 

This is split into LOCAL and GLOBAL policy: while the former assesses the context of a policy locally implemented/applied/verified, the latter is for policies globally applied.

 

The LOCAL application is supposed to be applied/verified at runtime (e.g. in the execution environment of a data asset, be that a specific analytical workspace, a distributed processing or storage system, or a running job and its output data quality metrics), while the GLOBAL one addresses checks at deployment time (for example, targeting deployments of Data Products modeled with the Data Product Specification).

 

If using a descriptive modeling language, a metadata validation policy-as-code file can be provided (it will probably be integrated into the platform, e.g., using CUE lang for YAML).
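
To give a feel for what a metadata validation policy does, here is a plain Python analogue of such a check. It is not how the repo implements it (the repo's example uses CUE), and every field name and allowed value below is a hypothetical placeholder rather than part of the Data Product Specification.

```python
from typing import Any, Dict, List

# Hypothetical rules: required keys and the Python type each must have.
REQUIRED_FIELDS = {"name": str, "outputPortType": str, "owner": str, "format": str}
# Hypothetical list of accepted file formats.
ALLOWED_FORMATS = {"parquet", "csv", "json"}


def validate_descriptor(descriptor: Dict[str, Any]) -> List[str]:
    """Return the list of policy violations found in a metadata descriptor."""
    errors = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in descriptor:
            errors.append(f"missing field: {key}")
        elif not isinstance(descriptor[key], expected):
            errors.append(f"{key} must be {expected.__name__}")
    fmt = descriptor.get("format")
    if isinstance(fmt, str) and fmt not in ALLOWED_FORMATS:
        errors.append(f"unsupported format: {fmt}")
    return errors
```

A declarative language such as CUE expresses the same constraints as data rather than as procedural code, which is why it suits a policy-as-code repository.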

 

Automation needs metadata

In the example provided in the repo, we assume that a policy becomes computational thanks to an enabling platform, driven by machine-readable metadata to model the policy content.

In the repo, examples leveraging the Data Product Specification are presented but, again, the framework is agnostic to whether we are adopting the Data Mesh paradigm or not, whether we are decentralizing ownership towards domains or a central Data Team is managing the whole data platform, and whether we are creating Data Products or “just” data assets.

 

Governance Decision Record Example (decision related to a Data Product’s File output port)

A pretty exhaustive example policy and related metadata + policy-as-code validation files are provided in the example folder of the repo. In this example, the specific GDR is provided to describe how an Output Port of type “FILES” should be defined, provisioned, configured, described, and validated. The folder contains 3 files:

The Governance Decision Record versioning assumes this is the first policy created to address this governance topic.

The policy metadata can be validated with the policy-as-code file using the CUE CLI (if installed):

cue vet example/data-mesh/data-product/output-port/files/0001-data-product-output-port-files-example.yaml example/data-mesh/data-product/output-port/files/0001-data-product-output-port-files.cue
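
If the check has to run inside a pipeline, the command above could be wrapped as in the following sketch. The two function names are hypothetical, and the wrapper assumes that `cue vet` exits with a non-zero status when validation fails.

```python
import shutil
import subprocess
from typing import List


def cue_vet_command(data_file: str, policy_file: str) -> List[str]:
    """Build the same `cue vet` invocation shown above."""
    return ["cue", "vet", data_file, policy_file]


def run_cue_vet(data_file: str, policy_file: str) -> bool:
    """Return True when the metadata file satisfies the policy file."""
    if shutil.which("cue") is None:
        raise RuntimeError("cue CLI not found on PATH")
    result = subprocess.run(
        cue_vet_command(data_file, policy_file),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface the validation errors reported by the CLI.
        print(result.stderr)
    return result.returncode == 0
```

Calling `run_cue_vet` with the two paths from the command above would gate a deployment on the policy, which is the GLOBAL scope described earlier.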

Many more details can be found in the official GitHub repo.

 

Federated decisions

This article won’t dive into operating models for federated decision-making; however, the Governance Decision Record, with its status labels, is ready to be adopted in any Agile-like workflow. Please let me know if you are interested in a follow-up article on this topic.

 

Wrapping Up and Next Steps

How do you persist data governance decisions (both as policies and as code)? What specification does such a policy follow?

The Governance Decision Record is a handy open-source option for that, but alone it’s not enough:

Next Step 1 - An organizational and operating model needs to be structured to take federated decisions.

Next Step 2 - A platform is where policies become computational. Here, we also might have a solution.


Posted by Roberto Coluccio


Staff Data Architect. Roberto takes vague requirements and molds them into scalable data solutions while being accountable for the delivery and the development team.
