When “Centralized Data Engineering” generates too much frustration for business users, we typically see the so-called “Shadow-IT” emerging, thus leading to:
- Adoption of new technology platforms for data processing and manipulation
- Creation of new teams (often composed of data scientists) delivering data initiatives directly into production to achieve better time-to-market
- Creation of “bubbles” operating outside the expected standards
This practice, at first, is seen purely as an added value because, suddenly, business initiatives go live in unbelievable times, and it also creates the illusion that “in the end, it is not that difficult to deal with data.”
This generates a flywheel effect: budgets increase, the company starts to leverage the pattern more and more, and new technologies are introduced.
Since all these technologies don’t usually natively integrate with each other, it immediately becomes a data integration problem because all these platforms and teams need to insource data to operate.
As soon as data is stored, it requires dealing with all the issues related to Data Management ( security, privacy, compliance, etc. ). Business Intelligence is typically the first area where this pattern will happen because the business operates autonomously with dedicated teams. BI tools, instead of just allowing to explore and present data, often turn into full-fledged data management tools because they require ingesting a copy of the data ( data integration ) to achieve better performance.
The same phenomenon occurs with Machine Learning platforms, allowing data scientists to analyze, generate and distribute new data within an organization, operating with high flexibility and freedom. Machine learning platforms rapidly become a new way to deliver data to the business.
At the beginning all these patterns seem to generate value for the company. Still, whenever we institute practices and technologies disconnected from data governance and compliance needs, we generate Data Debt. This debt manifests itself late and on a large scale with the following effects:
- Inconsistent and unreliable data
- High maintenance costs
- Data issue troubleshooting becomes hard
- Hard to keep people accountable for the generated data.
When Shadow Data Engineering appears, the core data team loses credibility because the organization will only see lightning results and related business impacts. It is tough to demonstrate the negative effects in the long term when these are not visible yet.
After a few years, it will be highly complicated to make a comeback when collateral damages begin to appear.
Establishing good data engineering practices in this scenario is impossible, even with a clear mandate. When a Head of Data asks to implement some data management best practices, stakeholders operating in shadow mode formally embrace them. Anyway, being fully autonomous on their platform, their personal goals will always bypass the guideline implementation. Stakeholder’s MBOs always target short-term results, and teams working on data receive pressure to deliver business value.