Conversational access to data is becoming normalized as just another way to explore, analyze, and extract insights from data. Modern large language models (LLMs), with a few caveats, enable building conversational agents that can perform routine exploratory analysis including descriptive statistics, aggregations, and KPI calculations. This capability proves valuable for both experienced analysts exploring new datasets and business-oriented users seeking quick insights without modifying existing dashboards.
But what are the key factors needed to build conversational data access in enterprise settings?
Figure 1. Logical architecture of an AI Agent for conversational analytics on enterprise data. The Agent backend implements a retriever-router-analyzer pattern. Data Layer: each hexagon represents a single “data perimeter” that can be selected by the agent. A data perimeter can be a Data Product or data project containing a small set of closely related tables. Semantic Layer: single source of truth for accessing the metadata, automation should enforce that the data and semantic layers are kept in sync. LLM Providers: a selection of vetted LLMs accessed through an LLM Gateway.
We start by introducing the high-level architecture of an AI Agent for data analytics, illustrating how it interacts with other enterprise systems.
The foundation of effective conversational data access lies in understanding how AI agents navigate and query enterprise data. I propose an implementation that follows the retriever-router-analyzer pattern: the agent first identifies relevant data sources, routes the query to appropriate systems, and then analyzes and presents results. This simple three-step process addresses the core challenge of enterprise data access—connecting user intent with the right data across potentially hundreds or thousands of available sources.
In this section, we illustrate the main architectural principles to follow when integrating an AI Agent into an enterprise environment.
Decouple agent logic from both LLM and data sources. The agent's decision-making process should be independent of which language model powers it or which databases it queries. This separation enables you to upgrade models, switch data sources, or modify business logic without cascading changes across your system.
Establish a single source of truth for metadata. Business metadata—table descriptions, column definitions, business rules—must come from one authoritative system that stays synchronized with actual data. This metadata becomes as critical as your source code; changes to it directly affect agent behavior and should be managed with the same rigor as any other system dependency. I will come back to this point in section 3.
Favor graph-based over multi-agent frameworks. Agent development frameworks typically fall into two categories, graph-based and multi-agent, depending on the core abstraction offered. Mathematically, the two are equivalent, but a graph-based abstraction provides a simpler mental model of the program flow.
In practice, graph-based frameworks like LangGraph, PydanticAI, or Haystack tend to provide more control, offering less baked-in (i.e., hidden) behavior, making them more observable and customizable compared to multi-agent alternatives like CrewAI or AutoGen, at least at this point in time.
For example, a typical implementation of a retriever-router-analyzer agent requires just three main decision nodes, plus a node for data querying and visualization tools. This simple structure, together with an LLM-driven flow, allows for handling complex enterprise data scenarios while remaining debuggable and maintainable.
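To make this concrete, here is a minimal sketch of such a graph using LangGraph; the state schema and the node bodies are placeholder assumptions rather than a reference implementation.

```python
# Minimal retriever-router-analyzer graph with LangGraph.
# Node bodies are placeholders; state fields and values are assumptions.
from typing import TypedDict
from langgraph.graph import StateGraph, END


class AgentState(TypedDict):
    question: str
    data_product: str | None
    answer: str | None


def retriever(state: AgentState) -> AgentState:
    # Semantic search over the semantic layer to select a data perimeter.
    return {**state, "data_product": "sales_kpis"}  # placeholder


def router(state: AgentState) -> AgentState:
    # Load metadata for the selected data product and decide which tools to call.
    return state


def analyzer(state: AgentState) -> AgentState:
    # Run queries through the data tools, then summarize results for the user.
    return {**state, "answer": "..."}  # placeholder


graph = StateGraph(AgentState)
graph.add_node("retriever", retriever)
graph.add_node("router", router)
graph.add_node("analyzer", analyzer)
graph.set_entry_point("retriever")
graph.add_edge("retriever", "router")
graph.add_edge("router", "analyzer")
graph.add_edge("analyzer", END)
agent = graph.compile()
```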
The architectural foundation determines everything that follows—get this right, and your agent can scale across your enterprise; get it wrong, and you'll face inconsistent performance and costly rewrites as requirements evolve.
In the previous section, we introduced the AI Agent architecture. We now go deeper into some technical implementation details related to the deployment process, LLM selection, and compliance constraints.
From an infrastructure perspective, agent logic can be implemented in a simple containerized backend that doesn't require GPU resources. This allows for flexible deployment on serverless infrastructure like Azure Function App or traditional Kubernetes clusters, depending on your scaling requirements and infrastructure preferences.
Optional session history management requires an additional dedicated database to maintain conversation context across user interactions. Consider whether your use case demands persistent sessions or if stateless interactions suffice for your enterprise users.
Most of today’s mid-tier or top-tier LLMs are viable options to consider for an AI Agent doing data analysis. Due to the quickly evolving landscape, rather than recommending specific models, I suggest focusing on the fundamental trade-offs that remain constant. You'll need to balance cost, latency, and throughput against planning and tool-calling performance.
Higher-capability models excel at complex query generation and reliable tool execution but come with increased costs and potentially higher latency. Smaller, faster models may struggle with nuanced data interpretation or multi-step reasoning, but offer better economics for high-volume usage.
Most importantly, benchmark your chosen model against your specific data and query patterns. Generic benchmarks rarely reflect the performance achieved with the specific data landscape and user requests you'll encounter within your enterprise.
EU companies must ensure their chosen models are hosted within EU boundaries to comply with privacy and AI regulations. For organizations requiring complete privacy control, on-premises self-hosting becomes cost-effective quickly as usage scales beyond critical thresholds.
Source your models from trusted providers regardless of deployment approach—the reliability of your conversational data access depends entirely on the underlying model's consistency and availability. Open weight models (models where weights are available, with varying licensing terms) deployed on your enterprise infrastructure are a suitable option to overcome privacy concerns. However, make sure to comply with the model licensing terms for your enterprise use.
In this section, we describe how an AI Agent can scale to the very large number of tables typically found in every enterprise. We highlight that accurate semantic data descriptions are crucial to achieving reliable AI Agent answers. Finally, we describe the ideal data access point for the AI Agent.
An LLM can only manage a small number of tables at once. As we stuff more and more table schemas and descriptions into the context window, performance degrades (latency grows and accuracy drops), and costs skyrocket. Remember that for each question, the agent's LLM needs to process the entire schema and descriptions of all considered tables, which quickly becomes the dominant source of token usage.
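As a back-of-envelope illustration (the figures are assumptions, not measurements): with 300 candidate tables at roughly 300 tokens each of schema plus description, every question would drag about 90,000 tokens of context into each LLM call, while a well-scoped perimeter of 5 to 10 tables keeps that overhead in the low thousands.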
Selecting a small number of tables is not enough; you should also ensure that the selected tables are all related to a coherent business concept. If you've been developing data products in recent years, you may recognize that a similar logic applies to defining a data product "perimeter" (see How to identify data products? and The Data Product Trap by Paolo Platter).
A data product's output ports represent a set of tables related to a common business concept, designed to be self-contained, semantically documented, and interoperable with other data products. So, if your company has already implemented data products, you get the definition of table sets for free.
Finally, to handle hundreds or thousands of data products, the AI Agent first performs a retrieval (semantic search) from the semantic layer to identify the most useful data product for the current request. Next, the AI Agent loads the semantic description and performs several iterations of tool use until it can answer the user's query. This is a common compositional pattern for architecting agents called retriever-router-analyzer (see figure 2 below).
The quality of semantic descriptions for your data is crucial to agent performance. The garbage in, garbage out mantra has no exceptions for AI Agents. Now, you should treat table and column descriptions as part of your "source code." Changes to metadata will affect agent behavior, so you should bind metadata changes to the data/code deployment lifecycle.
Think about it: an incremental data product release can be a breaking change for the AI Agent even if it changes only semantic descriptions! If your enterprise already requires that metadata be deployed together with your data project, you already have the proper change management in place to enable AI Agent data consumers.
Finally, the semantic business metadata should be retrieved from a single source of truth, the “semantic layer”, a system that can be trusted to stay up-to-date and synchronized with actual data. I will provide more information on the semantic layer in the section Features of the Semantic Layer.
How do you connect the agent backend to multiple data sources in your enterprise? Should the AI Agent access operational systems directly, or the analytical plane (enterprise data warehouse, data lake, data products)?
Consider that an AI Agent, just like us humans, requires an up-to-date semantic layer (see the previous section on Metadata). Accessing the analytical data and its semantic layer is the easier option, especially if you already have policies in place to ensure metadata is complete and up-to-date. Conversely, if you opt for accessing an operational system, you need to create and maintain the semantic layer as a separate resource, which is time-consuming, error-prone, and non-scalable.
This section illustrates how to structure access to the multiple data systems typically present in any real-world enterprise analytical layer.
LLMs are text processors that take text as input and provide text as output. To "act" on the external world, we enable the tool-use pattern, one of the fundamental agentic design patterns (Agentic Design Pattern by Andrew Ng). We add natural-language descriptions of available tools to the prompt, plus instructions and examples on how to call them. With this information, the LLM can "request" tool execution by outputting text in a predefined format, typically JSON or Markdown.
Modern foundational LLMs are fine-tuned for tool use, meaning they're trained to output a preferred format when requesting tool execution. Ensure your framework employs the optimal tool prompt format for your specific LLM—this avoids degrading tool-calling success rates and improves agent reliability.
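As an illustration, here is one common shape for a tool description and the structured tool call a model might emit in response. The exact schema depends on your LLM provider and framework; this sketch assumes an OpenAI-style function-calling payload, and the tool name and query are made up.

```python
# One possible tool description passed to the LLM (OpenAI-style format assumed).
tool_description = {
    "type": "function",
    "function": {
        "name": "run_sql",
        "description": "Execute a read-only SQL query against the selected data product.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "SQL SELECT statement"}
            },
            "required": ["query"],
        },
    },
}

# The model "requests" execution by emitting structured text such as:
llm_tool_call = {
    "name": "run_sql",
    "arguments": '{"query": "SELECT region, SUM(revenue) FROM sales GROUP BY region"}',
}
```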
Figure 2. Scenario: The AI Agent accesses a data product through a general-purpose tabular Output Port. The AI Agent needs a tech adapter for the table format of the Output Port and a query engine to query the data. The AI Agent loads the relevant semantic descriptions from a Metadata Store.
What kind of tools can we provide to the LLM to query data? We have two primary approaches, each with distinct trade-offs.
The most straightforward method is providing a generic SQL executor function. This general-purpose approach allows the LLM to write custom queries on the fly to answer arbitrary user requests. It's powerful and flexible but relies entirely on the LLM's capability to understand user requests and synthesize correct query language (i.e. SQL dialect). This approach works reasonably well when data complexity is limited and/or you're using sufficiently capable (typically higher-cost) LLMs.
In any case, always add a fail-safe mechanism to limit the maximum number of rows returned by the query.
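A minimal sketch of such a guarded SQL executor tool follows; DuckDB stands in for whatever query engine you actually use, and the MAX_ROWS value and database path are arbitrary assumptions.

```python
# Sketch of a generic SQL executor tool with a row-limit fail-safe.
import duckdb

MAX_ROWS = 1_000
con = duckdb.connect("analytics.db")  # placeholder; swap for your query engine


def run_sql(query: str) -> str:
    """Execute a query and return at most MAX_ROWS rows as CSV text for the LLM."""
    df = con.execute(query).fetch_df().head(MAX_ROWS)
    return df.to_csv(index=False)
```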
When LLMs struggle with complex query generation or cost considerations demand smaller models, you can use less general-purpose functions: for example, one function to "GET records" (applying only row filters and column projections) and a second "Split-Apply-Combine" function that applies a common query pattern (useful for computing KPIs). These specialized functions are easier for LLMs to call correctly compared to generating equivalent SQL from scratch, but they can still be applied to any data. With this approach, you can increase tool-calling success rates while keeping the same LLM, or keep the same success rate while employing smaller, cheaper models.
You can always provide both the specialized functions and the generic SQL executor as tools, instructing the LLM to fall back to the SQL executor when the specialized functions are not suitable. This unified approach maintains the benefit of general-purpose queries while increasing the effectiveness of the more common ones.
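Below is a sketch of what the two specialized tools could look like, implemented here over pandas DataFrames purely for illustration; the table registry, signatures, and semantics are assumptions to adapt to your own output ports.

```python
# Sketch of two specialized, LLM-friendly query tools.
# TABLES and the function signatures are illustrative assumptions.
import pandas as pd

TABLES: dict[str, pd.DataFrame] = {}  # populated from the data product's output port


def get_records(table: str, filters: dict | None = None,
                columns: list[str] | None = None, limit: int = 100) -> str:
    """Return records after simple equality filters and a column projection."""
    df = TABLES[table]
    for col, value in (filters or {}).items():
        df = df[df[col] == value]
    if columns:
        df = df[columns]
    return df.head(limit).to_csv(index=False)


def split_apply_combine(table: str, group_by: list[str],
                        metric: str, aggregation: str = "sum") -> str:
    """Group rows and aggregate a single metric (a common KPI query pattern)."""
    df = TABLES[table]
    result = df.groupby(group_by)[metric].agg(aggregation).reset_index()
    return result.to_csv(index=False)
```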
Figure 3. Scenario: the AI Agent accesses a data product through a dedicated MCP Output Port. The AI Agent needs an MCP client to retrieve the list of tools exposed by the output port. As opposed to the scenario in Figure 2, the query engine is provided by the Data Product. As before, the AI Agent loads the relevant semantic descriptions from a Metadata Store.
Each data project can expose its data to AI agents through standardized interfaces. There are two approaches: use the same interfaces (i.e., output port technology) used by all other data consumers for the AI agent as well (Figure 2) or create a specialized output port tailored for AI agent consumption (Figure 3).
The first approach applies to enterprises that have widely adopted one main output port standard for batch tabular data (for example, Delta Lake or Iceberg). In this case, the agent can access the generic output port just like any other consumer. The advantage is that you can implement an analytics AI agent in an enterprise without changing any data products or data sources. The drawback is that it couples the AI agent to the (potentially numerous) output port technologies. This can be mitigated using a backend-agnostic DataFrame API such as Ibis or Narwhals. Still, the AI agent needs access to a query engine to query the data. For more advanced cases, for example when you need to join or correlate data across different backends, you need a full-fledged data virtualization solution such as Dremio or Denodo.
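For illustration, here is a small Ibis sketch of backend-agnostic access; the file path, table, and column names are assumptions, and the same expression could be pointed at a different backend by swapping the connect call.

```python
# Sketch: the same Ibis expression can run against different output-port backends.
import ibis

con = ibis.duckdb.connect()  # swap for another backend, e.g. ibis.postgres.connect(...)
orders = con.read_parquet("data/orders/*.parquet", table_name="orders")

revenue_by_region = (
    orders.group_by("region")
          .aggregate(revenue=orders.amount.sum())
)
print(revenue_by_region.to_pandas())
```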
The second approach requires data products to expose output ports through a specialized API for AI agents, for example, using Anthropic's Model Context Protocol (MCP). This approach decouples the agent from the different output port technologies used in the enterprise, but it requires that all data products adopt and expose the dedicated output port for agents. It makes the AI agent simpler, since it shifts the tech adapter from the AI Agent to the data producer, and the AI Agent no longer needs a dedicated query engine. Additionally, this approach gives data product owners the opportunity to expose additional custom tools that offer specialized views or KPIs encoding domain-specific concepts, enhancing the AI agent's ability to extract domain-specific insights.
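In this scenario, the agent side only needs an MCP client that connects to the data product's endpoint and discovers the tools it exposes. The sketch below assumes the MCP Python SDK with an SSE transport and a made-up URL; adapt the transport and endpoint to what the data product actually publishes.

```python
# Sketch: discover the tools exposed by a data product's MCP output port.
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client


async def list_output_port_tools(url: str) -> None:
    async with sse_client(url) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)


asyncio.run(list_output_port_tools("https://sales-product.example.com/mcp/sse"))
```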
Regardless of the approach, the choice should be standardized by your platform team, which should provide templates to both data project developers and AI Agent developers. Consistency in data access patterns is paramount to reduce complexity and improve maintainability across your enterprise AI Agent ecosystem.
The semantic layer should expose both search functionality and metadata retrieval, accessible through API endpoints. The search function enables the AI agent to find “data perimeters” (or data products) suitable for answering the user request, based on full-text and semantic search. The retrieval function allows the agent to obtain the full metadata (schema, semantic descriptions, join keys, business terms) for a specific data perimeter.
Like humans, AI Agents need this semantic description to formulate the specific queries required to answer the user request. Since the semantic descriptions are part of the instructions passed to the AI Agent, their accuracy is crucial. For this reason, they should be generated by the person with the deepest knowledge of the data (typically the data product owner). Furthermore, the metadata should be treated as source code: any metadata change should trigger a data version change, non-trivial description changes should be flagged as breaking changes, and the metadata should be released within the same process as the data and code. This ensures that metadata remains accurate and in sync with the data, a necessary condition for the AI Agent to correctly answer user questions.
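A minimal client-side sketch of the two semantic-layer calls used by the retriever step is shown below; the base URL, endpoint paths, and response shapes are hypothetical and should be adapted to your catalog or semantic-layer product.

```python
# Sketch of the agent's two semantic-layer calls: search and metadata retrieval.
# All endpoints and payload shapes here are hypothetical assumptions.
import requests

SEMANTIC_LAYER = "https://semantic-layer.internal/api"  # hypothetical base URL


def search_data_perimeters(question: str, top_k: int = 3) -> list[dict]:
    """Full-text / semantic search for data perimeters relevant to the question."""
    resp = requests.get(f"{SEMANTIC_LAYER}/search", params={"q": question, "k": top_k})
    resp.raise_for_status()
    return resp.json()["results"]


def get_perimeter_metadata(perimeter_id: str) -> dict:
    """Retrieve schema, descriptions, join keys, and business terms for one perimeter."""
    resp = requests.get(f"{SEMANTIC_LAYER}/perimeters/{perimeter_id}/metadata")
    resp.raise_for_status()
    return resp.json()
```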
In this section, we give an overview of agent development practices, evaluation, and deployment considerations.
Developing AI Agents requires a scientific and engineering-driven approach that combines both Data Science and Software Engineering expertise. This hybrid approach is crucial—data scientists bring the analytical mindset needed for understanding model behavior and performance analysis, while software engineers provide the architectural discipline required for production-grade systems.
Avoid the common pitfall of treating agent development as purely a software engineering or purely a data science problem. The intersection of these domains creates unique challenges that require both perspectives to solve effectively.
Implement comprehensive performance benchmarking covering both individual steps and the end-to-end system.
As individual steps, you can monitor the retrieval accuracy (is the right data perimeter selected?), the tool-calling success rate, the correctness of the generated queries, and the latency and token usage of each node.
To benchmark the end-to-end system, you can validate answers to a reference set of questions using the LLM-as-a-Judge approach (refer to this post for more details: Creating a LLM-as-a-Judge That Drives Business Results by Hamel Husain).
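A minimal LLM-as-a-Judge sketch is shown below; the prompt wording, the PASS/FAIL convention, and the injected call_llm helper are assumptions, and Hamel Husain's post covers the methodology in much more depth.

```python
# Sketch: judge the agent's answer against a reference answer with an LLM.
# The prompt and the call_llm helper (any text-in/text-out LLM call) are assumptions.
JUDGE_PROMPT = """You are evaluating a data-analytics agent.
Question: {question}
Reference answer: {reference}
Agent answer: {answer}
Reply with exactly PASS or FAIL, then one sentence of justification."""


def judge(question: str, reference: str, answer: str, call_llm) -> bool:
    """Return True if the judge model considers the agent's answer correct."""
    verdict = call_llm(JUDGE_PROMPT.format(question=question,
                                           reference=reference,
                                           answer=answer))
    return verdict.strip().upper().startswith("PASS")
```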
Auto-optimizers such as AdalFlow or DSPy can significantly accelerate development by using performance scores to optimize end-to-end agent performance automatically. These tools replace lengthy, brittle trial-and-error manual prompt optimizations that are often based on subjective evaluations, providing more systematic and reproducible improvements.
Success in enterprise deployment extends beyond technical implementation. Here are a few tips:
Deploying a robust and effective “conversational data interface” to your enterprise is an achievable goal, provided you holistically address the various aspects involved in this new paradigm of interaction.
The key to success lies in treating this as a comprehensive system design challenge rather than simply deploying a single AI model. From architectural principles and data strategy to implementation patterns and operational excellence, each component must work in harmony to deliver reliable, valuable insights to your users.
Conversational access to data is becoming the new normal. Organizations that approach this transformation systematically—with proper architecture, data governance, and development practices—will gain significant competitive advantages in how their teams interact with and derive insights from enterprise data.
Acknowledgements: I would like to thank Paolo Platter for the many insightful discussions and Antonio Murgia, Irene Donato, and many other Agilers for providing feedback, insights, and inspirations that helped solidify this article.