Artificial Intelligence is everywhere — from generative chatbots to code assistants — but with speed comes risk. Across enterprises, employees are embracing a plethora of tools to speed up their work, often without IT or security approval. This phenomenon is called Shadow AI, and while it starts innocently — generating a quick summary, debugging code, or creating a draft — it can silently expose organizations to serious risks.
At Agile Lab, we help companies balance innovation with control, enabling teams to work smarter without compromising security, compliance, or trust. Let's cover the dangers of Shadow AI and how to get that speed without putting your organization at risk.
Shadow AI refers to employees using AI tools, like generative chatbots, image generators, or analytics tools, without any approval or oversight from their IT or security teams.
Shadow AI has been defined by the Cloud Security Alliance as shadow IT’s rebellious cousin: quickly adopted, often well-intentioned, but running wild behind the scenes. Even if the intent is to be more productive, the user and the company might be exposed to significant risks.
It's not about sneaky sabotage; it’s about speed and convenience. People just want to get stuff done, such as summarizing a report, drafting a code snippet, or whipping up internal presentations.
What seems like harmless shortcuts can ripple into big trouble. Uploading customer logs into ChatGPT, for instance, might expose personal data, violate GDPR or other regulations, and risk turning confidential info into part of public model training sets.
These tools also skirt traditional controls. Because they are web-based or delivered as browser extensions, many security systems never detect them. One state IT team admitted that their legacy systems "can't catch it," and that AI will have to monitor AI to keep up.
When companies integrate AI tools into their workflows, the risks are far more complex than they first appear. One of the most immediate and visible dangers is data leakage. Every time employees input sensitive information—whether it’s proprietary business strategies, customer records, or internal reports—that data may end up stored on a third-party server. Depending on the AI provider’s terms of service, this information could even be used to train their models, making what should have been private potentially accessible or, at the very least, no longer fully under the company’s control.
Beyond privacy concerns, regulatory compliance becomes a looming challenge. Laws like GDPR, HIPAA, and strict financial regulations impose heavy responsibilities on how data is managed and stored. If a company cannot track where its information goes once it enters an AI system, it risks unintentionally violating these frameworks—exposing itself to significant legal and financial consequences.
Security is another critical area where the stakes are high. Not all AI tools are equally trustworthy, and malicious or compromised services can open unexpected backdoors into corporate systems. In some cases, they may even act as vectors for malware, potentially putting intellectual property and confidential assets at risk.
Then there’s the issue of accuracy. AI-generated content can be persuasive but flawed, producing errors or even complete fabrications without clear warning signs. If such information makes its way into reports, campaigns, or client communications without thorough vetting, the company’s credibility could suffer lasting damage.
Finally, there’s a less obvious but equally serious concern: the potential loss of intellectual property. By feeding a company’s unique voice, tone, and proprietary knowledge into public AI systems, organizations risk diluting their distinct identity. Over time, what sets them apart could be absorbed into the very models they rely on, inadvertently eroding their competitive advantage.
Let's consider an AI code-completion tool: instead of using the company's vetted and secured internal tools, an employee copies a complex function from the private code base into a public AI tool to get help refining it. The company's trade secrets are now potentially part of a publicly available dataset. From a technical perspective, Shadow AI poses an unmanaged and unmonitored data exfiltration risk: it bypasses established data loss prevention (DLP) protocols and security gateways. The typical flow looks like this:
Endpoint: The employee's device (laptop, phone) is the entry point. The employee installs or accesses an unapproved AI application or website.
Data ingress: Unauthorized data, potentially containing sensitive or classified information, is entered into the AI tool. This data transfer is not logged or verified by corporate security tools.
Data processing: The AI tool, often a cloud-based service such as a large language model (LLM), processes data outside the corporate security perimeter. The company has no control over how this data is stored, used, or who has access to it.
Output: The AI tool provides a response, which the employee then uses in their work. The company has no way of knowing whether this output contains inaccuracies, vulnerabilities (in the case of code), or violates compliance regulations.
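To make that flow concrete, here is a minimal sketch of the kind of call an unsanctioned script or extension might make from an employee's device. The endpoint URL and payload shape are purely illustrative, not any specific vendor's API; the point is that the request is ordinary HTTPS leaving the endpoint, so corporate DLP and gateways may never see it.

```python
import requests  # a plain HTTP client; nothing corporate-managed is involved

# Hypothetical public AI endpoint -- illustrative only, not a real vendor API.
PUBLIC_AI_ENDPOINT = "https://api.example-ai-tool.com/v1/chat"

# Step 1 (Endpoint): this code runs on the employee's own laptop or phone.
# Step 2 (Data ingress): sensitive material leaves the device in a normal HTTPS POST.
sensitive_snippet = "def price_margin(cost, secret_markup): ..."  # stands in for real internal code

response = requests.post(
    PUBLIC_AI_ENDPOINT,
    json={"prompt": f"Refactor this function:\n{sensitive_snippet}"},
    timeout=30,
)

# Step 3 (Data processing): the provider now holds the data under its own terms.
# Step 4 (Output): the reply flows back into company work, unreviewed.
print(response.json().get("answer", ""))
```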
The rise of user-friendly tools like ChatGPT, Midjourney, and Google's Gemini has turned Shadow AI into a widespread issue. Unlike traditional Shadow IT, which often involved downloading and installing specific software, these AI tools are easily accessible through a web browser, making them incredibly simple for employees to use without company oversight.
Blocking AI platforms outright usually backfires. People still find a way, using personal emails, devices, or VPNs, which just makes everything harder to track. Bans can drive Gen-AI deeper underground, creating bigger blind spots. What helps is governance over bans.
The first step is to gain complete visibility into how AI is being used within your organization. Start with network traffic analysis and DNS logs to identify connections to popular AI domains. This goes beyond relying on simple block lists; it allows you to see exactly who is connecting to AI services and provides a high-level view of AI tool adoption.
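As a rough illustration, a script like the one below could scan an exported DNS query log for lookups of well-known AI domains. The CSV column names and the domain watchlist are assumptions to adapt to your own resolver and inventory.

```python
import csv
from collections import Counter

# Watchlist of AI-related domains -- extend to match your own inventory.
AI_DOMAINS = {"chat.openai.com", "chatgpt.com", "gemini.google.com",
              "claude.ai", "midjourney.com"}

def scan_dns_log(path: str) -> Counter:
    """Count DNS lookups of watched AI domains per source IP.

    Assumes a CSV export with 'client_ip' and 'query_name' columns;
    adjust the field names to your resolver's log schema.
    """
    hits = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            query = row["query_name"].rstrip(".").lower()
            if any(query == d or query.endswith("." + d) for d in AI_DOMAINS):
                hits[(row["client_ip"], query)] += 1
    return hits

if __name__ == "__main__":
    for (client, domain), count in scan_dns_log("dns_queries.csv").most_common(20):
        print(f"{client} -> {domain}: {count} lookups")
```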
At the same time, keep an eye on browser extensions and desktop applications, since many AI tools are installed this way. Regular audits of approved software are essential, and monitoring for unapproved installations can prevent security risks. Modern endpoint detection and response (EDR) solutions can assist by flagging suspicious or unauthorized software.
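A lightweight audit might enumerate installed browser extensions and compare them against an allowlist, as in the sketch below. The Chrome profile path and the allowlist are assumptions for a single Linux machine; an EDR platform would do this far more robustly across the whole fleet.

```python
import json
from pathlib import Path

# Assumed Chrome profile location on Linux; adjust per OS and browser.
EXTENSIONS_DIR = Path.home() / ".config/google-chrome/Default/Extensions"

# Hypothetical allowlist of extension IDs approved by IT.
APPROVED_EXTENSION_IDS = {"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}  # populate from your approved list

def list_unapproved_extensions() -> list[tuple[str, str]]:
    """Return (extension_id, declared_name) pairs not on the allowlist."""
    findings = []
    if not EXTENSIONS_DIR.exists():
        return findings
    for ext_dir in EXTENSIONS_DIR.iterdir():
        if ext_dir.name in APPROVED_EXTENSION_IDS:
            continue
        # Each installed version of an extension ships its own manifest.json.
        for manifest in ext_dir.glob("*/manifest.json"):
            name = json.loads(manifest.read_text()).get("name", "unknown")
            findings.append((ext_dir.name, name))
            break
    return findings

if __name__ == "__main__":
    for ext_id, name in list_unapproved_extensions():
        print(f"Unapproved extension: {name} ({ext_id})")
```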
Another important area to examine is API traffic. As more AI tools are integrated through APIs, monitoring network traffic for calls to known AI services becomes critical. This is a more technical approach, but it can reveal how developers or data scientists are integrating AI without formal approval.
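A complementary check, sketched below, parses an egress connection log (the format here is assumed) and flags destinations that match well-known AI API hostnames, which can surface programmatic integrations that never went through formal approval.

```python
import re

# Hostname patterns for common AI APIs -- extend as new services appear.
AI_API_HOSTS = (
    re.compile(r"(^|\.)api\.openai\.com$"),
    re.compile(r"(^|\.)api\.anthropic\.com$"),
    re.compile(r"(^|\.)generativelanguage\.googleapis\.com$"),
)

def flag_ai_api_calls(log_lines):
    """Yield (source_ip, destination_host) pairs for connections to known AI APIs.

    Assumes a space-separated log: '<timestamp> <source_ip> <dest_host> <port>'.
    Adapt the parsing to your firewall or flow-export format.
    """
    for line in log_lines:
        parts = line.split()
        if len(parts) < 4:
            continue
        _, source_ip, dest_host, _ = parts[:4]
        if any(p.search(dest_host.lower()) for p in AI_API_HOSTS):
            yield source_ip, dest_host

if __name__ == "__main__":
    with open("egress_connections.log") as f:
        for src, dst in flag_ai_api_calls(f):
            print(f"{src} called {dst}")
```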
Finally, if your organization uses a web proxy or secure web gateway, analyzing its logs can provide valuable insights. These records can show which employees are accessing specific websites and how often, giving you a clearer and more detailed picture of AI tool usage across the company.
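For proxy or secure web gateway logs, a per-user summary like the sketch below (log format and domain list assumed) can give that clearer picture of who is using which AI service and how often.

```python
import csv
from collections import defaultdict

# Coarse suffix match against AI-related domains; refine as needed.
AI_DOMAIN_SUFFIXES = ("openai.com", "chatgpt.com", "claude.ai",
                      "gemini.google.com", "midjourney.com")

def summarize_ai_usage(path: str) -> dict[str, dict[str, int]]:
    """Build {username: {ai_domain: request_count}} from a proxy log export.

    Assumes a CSV export with 'username' and 'host' columns.
    """
    usage: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            host = row["host"].lower()
            if host.endswith(AI_DOMAIN_SUFFIXES):
                usage[row["username"]][host] += 1
    return usage

if __name__ == "__main__":
    for user, domains in summarize_ai_usage("proxy_export.csv").items():
        total = sum(domains.values())
        print(f"{user}: {total} AI-related requests across {len(domains)} services")
```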
An effective strategy is to adopt context-aware policies that balance security with productivity, providing clear guidelines without stifling innovation.
One way to achieve this is through tiered risk assessment, where AI tools and their use cases are categorized based on the level of risk they pose.
For example, using an AI chatbot for general research or summarizing publicly available articles might be considered low-risk and allowed, provided there is a clear policy instructing employees not to input sensitive information.
Medium-risk scenarios, such as using AI image generators for marketing concepts, could require additional measures, including management approval and a data-sharing agreement with the vendor.
On the other hand, high-risk situations—such as uploading proprietary code, customer lists, or financial data to an AI service—should be strictly prohibited and protected through automated blocking and real-time alerts.
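One way to encode such a tiered policy is a simple lookup that maps a use case and data classification to an outcome. The categories and actions below are hypothetical placeholders; the structure is the point.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"

# Hypothetical tiering: (use_case, data_classification) -> action.
POLICY = {
    ("research_summary", "public"): Action.ALLOW,
    ("image_generation", "internal"): Action.REQUIRE_APPROVAL,
    ("code_assist", "confidential"): Action.BLOCK,
    ("data_analysis", "pii"): Action.BLOCK,
}

def evaluate(use_case: str, data_classification: str) -> Action:
    """Return the policy outcome; unknown combinations default to approval review."""
    return POLICY.get((use_case, data_classification), Action.REQUIRE_APPROVAL)

if __name__ == "__main__":
    print(evaluate("research_summary", "public"))    # Action.ALLOW
    print(evaluate("code_assist", "confidential"))   # Action.BLOCK
    print(evaluate("marketing_copy", "internal"))    # Action.REQUIRE_APPROVAL (default)
```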
Another key aspect is adopting data-centric controls. Instead of blocking entire tools outright, policies should focus on what data can be uploaded. Automated systems can detect and prevent the sharing of sensitive information, such as files tagged as “confidential” or documents containing personally identifiable information (PII) or protected health information (PHI), when employees attempt to send them to unapproved domains.
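A minimal data-centric check might look like the sketch below: scan outbound content for confidentiality tags and common PII patterns before it reaches an unapproved domain. Real DLP engines use far richer classifiers; the patterns and the internal domain here are illustrative.

```python
import re

# Illustrative patterns only; production DLP uses validated detectors.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "confidential_tag": re.compile(r"\bconfidential\b", re.IGNORECASE),
}

APPROVED_AI_DOMAINS = {"internal-llm.corp.example"}  # hypothetical internal gateway

def allow_upload(text: str, destination_host: str) -> tuple[bool, list[str]]:
    """Block uploads to unapproved domains when sensitive markers are present."""
    if destination_host in APPROVED_AI_DOMAINS:
        return True, []
    findings = [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
    return (len(findings) == 0), findings

if __name__ == "__main__":
    ok, hits = allow_upload("Customer: jane@acme.com, card 4111 1111 1111 1111",
                            "chat.example-ai.com")
    print(ok, hits)  # False ['email', 'credit_card']
```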
Finally, it’s important to offer approved alternatives and smooth redirection mechanisms. When employees try to access a blocked or high-risk AI service, they should be seamlessly guided toward a secure, authorized option. For example, if someone attempts to use a public chatbot, they could be redirected to the company’s internally hosted Large Language Model (LLM) or to a vetted, secure third-party solution, ensuring both safety and usability are preserved.
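The redirect itself can be as simple as a gateway rule. The sketch below shows the decision logic only; the internal LLM URL is a hypothetical placeholder, and in practice this would live in the proxy or secure web gateway configuration rather than in application code.

```python
# Hypothetical internal gateway endpoint.
INTERNAL_LLM_URL = "https://internal-llm.corp.example/chat"

PUBLIC_CHATBOT_DOMAINS = {"chat.openai.com", "chatgpt.com",
                          "gemini.google.com", "claude.ai"}

def route(request_host: str) -> str:
    """Send traffic for public chatbots to the approved internal alternative."""
    if request_host.lower() in PUBLIC_CHATBOT_DOMAINS:
        return f"redirect:{INTERNAL_LLM_URL}"
    return "pass-through"

if __name__ == "__main__":
    print(route("chat.openai.com"))  # redirect:https://internal-llm.corp.example/chat
    print(route("docs.python.org"))  # pass-through
```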
Technology alone isn't enough; you need to pair it with clear communication and education to change employee behavior.
Deploy real-time DLP tools that can scan data in transit. These tools are crucial for intercepting risky uploads and preventing data from leaving your network. They can be configured to look for specific keywords, file types, or data patterns that indicate sensitive information.
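Rule configuration for such a DLP tool usually boils down to those three ingredients: keywords, file types, and data patterns. A hypothetical rule set might be expressed like this (every vendor has its own policy syntax, of course).

```python
# Hypothetical DLP rule set; real products ship their own policy languages,
# but the building blocks are typically the same.
DLP_RULES = [
    {
        "name": "block-confidential-keywords",
        "keywords": ["confidential", "internal only", "trade secret"],
        "action": "block",
    },
    {
        "name": "block-sensitive-file-types",
        "file_extensions": [".sql", ".pem", ".key", ".xlsx"],
        "action": "block",
    },
    {
        "name": "alert-on-pii-patterns",
        "regex_patterns": [r"\b\d{3}-\d{2}-\d{4}\b"],  # e.g. a US SSN-like pattern
        "action": "alert",
    },
]
```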
This is the most important part of the recipe. Create clear, concise, and easy-to-understand training modules. They should explain what Shadow AI is and why it's a risk, using real-world examples (e.g., "Don't paste customer data into a public chatbot"); outline the company's official policies on AI tool usage, clearly stating what's allowed, what's prohibited, and what the approved alternatives are; and explain the "why", helping employees understand that these policies are in place to protect the company, their jobs, and customer trust, not to stifle their creativity or productivity.
AI governance isn't a one-time project; it's an ongoing effort that must evolve with technology and new risks.
By following this expanded recipe, organizations can move from a reactive, punitive stance on Shadow AI to a proactive, protective, and empowering one.
Everyone is using AI at work. It feels natural—quick answers, easy drafts—but too often it’s done without anyone watching. That’s the essence of Shadow AI. It’s not just about giving up control; it’s about accidentally leaving the door open for data leaks, compliance failures, or worse.
Governance doesn’t need to be boring. It’s more like installing guardrails on a party road: keep the fun, but don’t let things crash. When done right (visibility, thoughtful rules, training), employees keep the boost, and companies keep the control.