Data Migration Goes Cloud Native in Automotive Manufacturing
A global enterprise with an extensive data landscape partnered with Agile Lab to modernize its infrastructure by migrating to the cloud.
Leveraging a fully cloud‑native approach built on AWS and Kubernetes, the solution addressed cost inefficiencies, network bottlenecks, and vendor lock‑in risks.
Customer Context
The customer operates an extensive data infrastructure consisting of approximately 500 databases encompassing over 40,000 tables. The objective was a comprehensive migration to a cloud environment, specifically leveraging AWS services. The sheer volume and complexity of this migration project demanded an innovative, streamlined approach that ensures minimal disruption to ongoing business operations while also maintaining cost efficiency and scalability.
The Challenge
Two major challenges surfaced early in the migration:
Cost Management Issues
The initial design, which used AWS Glue for batch processing and Amazon EMR for streaming computations, proved expensive. Both are convenient managed services, but their cost under sustained usage became a notable concern.
Network Limitations
The second issue was a network constraint. The customer’s Virtual Private Cloud (VPC) was limited to 128 IP addresses. Components such as AWS Database Migration Service (DMS), Lambda functions, and EMR nodes all draw addresses from this pool, so the limit became a considerable bottleneck that hindered scalability and efficiency.
The Solution: A Cloud-Native Approach
To resolve these challenges, the solution combined the following components:
1. Infrastructure-as-Code and Self-Service Configuration
A Terraform-based solution was developed to manage configurations via a YAML file, allowing metadata-driven ingestion without direct IT involvement. This YAML file defines essential parameters such as database connection details, streaming or batch processing methods, and potential future metadata extensions.
AWS Database Migration Service (DMS) then uses this configuration to run Change Data Capture (CDC) operations, depositing data into a landing zone.
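As a rough illustration of the metadata-driven idea, the sketch below validates a configuration after it has been loaded from YAML into a Python dictionary. The field names (`sources`, `name`, `host`, `port`, `mode`) and the example hosts are assumptions made for this sketch; the case study does not describe the actual schema.

```python
from dataclasses import dataclass

# Processing modes named in the text; the exact keywords are assumed.
VALID_MODES = {"batch", "streaming"}


@dataclass(frozen=True)
class SourceConfig:
    """One database to ingest, as it might be described in the YAML file."""
    name: str
    host: str
    port: int
    mode: str


def parse_sources(config: dict) -> list[SourceConfig]:
    """Validate the parsed configuration and return typed source entries."""
    sources = []
    for entry in config.get("sources", []):
        mode = entry["mode"]
        if mode not in VALID_MODES:
            raise ValueError(f"{entry['name']}: unsupported mode {mode!r}")
        sources.append(
            SourceConfig(entry["name"], entry["host"], int(entry["port"]), mode)
        )
    return sources


# Hypothetical content, shaped like the result of loading a YAML file.
example_config = {
    "sources": [
        {"name": "erp_orders", "host": "erp-db.internal.example",
         "port": 1521, "mode": "streaming"},
        {"name": "plant_inventory", "host": "inv-db.internal.example",
         "port": 5432, "mode": "batch"},
    ]
}

sources = parse_sources(example_config)
```

Rejecting unknown modes at parse time is one way to let people outside IT edit the file safely: a typo fails early instead of producing a misconfigured ingestion task.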
2. Spark Pipelines and Data Layering
Data from the landing zone undergoes further processing through Spark jobs. These jobs generate two critical data layers:
- Historical Layer: Capturing each data mutation.
- Cloud-Native Table Layer: Mirroring the on-premises table structure, enabling consistency and seamless integration within the cloud environment.
This layered approach simplifies data management, providing both historical context and current data states, enhancing analytical capabilities.
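The relationship between the two layers can be sketched in plain Python. This is not the Spark code used in the project; it only shows the logic of keeping every change event while also folding the events into the latest state per key. The event shape (`ts`, `op`, `row`) and the operation codes `I`, `U`, and `D` are assumptions chosen for illustration.

```python
def build_layers(cdc_events, key="id"):
    """Return (history, current) from a list of change events.

    history: every event in timestamp order (the historical layer).
    current: latest row per key, with deleted rows removed
             (a stand-in for the table that mirrors the source).
    """
    history = []
    current = {}
    for event in sorted(cdc_events, key=lambda e: e["ts"]):
        history.append(dict(event))
        row_key = event["row"][key]
        if event["op"] == "D":
            # A delete removes the row from the current state only.
            current.pop(row_key, None)
        else:
            # Inserts and updates both overwrite the current state.
            current[row_key] = dict(event["row"])
    return history, current


# Hypothetical change events for a small orders table.
events = [
    {"ts": 1, "op": "I", "row": {"id": 1, "status": "new"}},
    {"ts": 2, "op": "I", "row": {"id": 2, "status": "new"}},
    {"ts": 3, "op": "U", "row": {"id": 1, "status": "shipped"}},
    {"ts": 4, "op": "D", "row": {"id": 2, "status": "new"}},
]

history, current = build_layers(events)
```

Here `history` keeps all four events, while `current` holds only order 1 in its latest state, because order 2 was deleted.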
3. Adoption of Kubernetes via Amazon EKS
Moving computational workloads from Glue and EMR to Kubernetes, managed through Amazon Elastic Kubernetes Service (EKS), significantly improved resource allocation. Kubernetes offers a powerful, flexible environment for running Spark workloads, reducing dependency on AWS-specific managed services.
4. Avoiding Lock-in with PaaS Services
Adopting Kubernetes also addressed concerns about the lock-in associated with vendor-specific cloud services.
Because Kubernetes is an open-source platform supported by all major cloud providers, the workloads can be moved to a different provider or to an on-premises environment with far less re-architecture, leaving room to adapt to future business requirements or technological advancements.
5. Cilium for Network Optimization
To overcome the IP address limitation of the existing VPC, Cilium was deployed to establish an overlay network within the Kubernetes cluster. Pods receive virtual addresses from a cluster-internal range instead of real VPC IP addresses, so pod count is no longer bounded by the VPC's address space, which effectively bypasses the constraint.
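To give a sense of scale, the following sketch compares the address space of a 128-address VPC range with that of an overlay range, using Python's standard `ipaddress` module. Both CIDR blocks are hypothetical; the case study states only the 128-address limit, not the actual ranges.

```python
import ipaddress

# A /25 block holds 128 addresses, matching the limit described above.
# The concrete prefix is an assumption for illustration.
vpc_range = ipaddress.ip_network("10.20.0.0/25")

# Hypothetical overlay range from which pods would get virtual addresses.
pod_overlay = ipaddress.ip_network("10.128.0.0/12")

vpc_addresses = vpc_range.num_addresses        # 2 ** (32 - 25)
overlay_addresses = pod_overlay.num_addresses  # 2 ** (32 - 12)

# The overlay range must not collide with the real VPC range.
ranges_overlap = pod_overlay.overlaps(vpc_range)

print(f"VPC addresses:     {vpc_addresses}")
print(f"Overlay addresses: {overlay_addresses}")
print(f"Ranges overlap:    {ranges_overlap}")
```

With an overlay, only the nodes themselves need VPC addresses, so the 128-address pool constrains node count rather than pod count.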
6. Resource Efficiency with Karpenter
Karpenter was introduced for efficient node scaling, dynamically adjusting cluster resources based on workload demands.
Scaling vertically, that is, running fewer nodes with more computational capacity each, reduced the number of nodes required and with it the per-node overhead and cost.
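The effect of using fewer, larger nodes can be shown with a small calculation. The figures below (a 96 vCPU workload, 1 vCPU of fixed per-node overhead, and 4 vCPU versus 16 vCPU nodes) are invented for illustration and are not measurements from the project; the sketch also ignores memory and bin-packing effects.

```python
import math


def nodes_needed(workload_cpu: float, node_cpu: float, overhead_cpu: float) -> int:
    """Nodes required when each node loses a fixed share to system overhead."""
    usable = node_cpu - overhead_cpu
    if usable <= 0:
        raise ValueError("node is too small to run any workload")
    return math.ceil(workload_cpu / usable)


WORKLOAD_CPU = 96   # hypothetical total Spark demand, in vCPU
OVERHEAD_CPU = 1    # hypothetical per-node cost of system components

small_nodes = nodes_needed(WORKLOAD_CPU, node_cpu=4, overhead_cpu=OVERHEAD_CPU)
large_nodes = nodes_needed(WORKLOAD_CPU, node_cpu=16, overhead_cpu=OVERHEAD_CPU)

# Total vCPU that must be provisioned in each layout.
small_provisioned = small_nodes * 4
large_provisioned = large_nodes * 16

print(f"4 vCPU nodes:  {small_nodes} nodes, {small_provisioned} vCPU provisioned")
print(f"16 vCPU nodes: {large_nodes} nodes, {large_provisioned} vCPU provisioned")
```

Under these assumptions the same workload fits on 7 large nodes instead of 32 small ones, and the fixed overhead is paid 7 times rather than 32 times.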
7. CloudWatch, Prometheus, and Grafana: Enhanced Observability
An extensive observability layer was established using Amazon CloudWatch, Prometheus, and Grafana. This integration provides detailed visibility into pipeline health, execution success rates, and resource consumption metrics, significantly improving operational transparency.
8. Incident Management and Data Quality Assurance
The observability infrastructure also supports proactive incident management, triggering alerts for pipeline failures or anomalies. This capability ensures timely interventions, minimizing downtime and operational disruptions.
In addition, metrics were implemented to track data freshness and data quality against the agreed Service Level Agreements (SLAs). This ensures continuous compliance with data contracts and maintains high standards of data integrity and reliability.
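A freshness check of this kind can be as simple as comparing each table's last successful load time with an allowed delay. The sketch below is one possible shape for such a check; the table names, timestamps, and the two-hour SLA are hypothetical, and the case study does not specify how the real checks are implemented.

```python
from datetime import datetime, timedelta, timezone


def freshness_breaches(last_loaded: dict, sla: timedelta, now: datetime) -> list:
    """Return the tables whose latest load is older than the SLA allows."""
    return sorted(
        table for table, loaded_at in last_loaded.items()
        if now - loaded_at > sla
    )


# Hypothetical evaluation time and load timestamps.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
last_loaded = {
    "orders": datetime(2024, 1, 15, 11, 30, tzinfo=timezone.utc),
    "inventory": datetime(2024, 1, 15, 8, 0, tzinfo=timezone.utc),
    "suppliers": datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc),
}

breaches = freshness_breaches(last_loaded, sla=timedelta(hours=2), now=now)
print(breaches)
```

In this example only `inventory` is reported, since its last load is four hours old; `suppliers` is exactly at the two-hour limit and therefore still within the SLA. A result like this could feed an alerting rule in the monitoring stack.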
9. Future-Ready Data Mesh Principles
Adopting data mesh philosophies, the migration strategy emphasized self-service data infrastructure and decentralized data governance. This model allows non-technical stakeholders to initiate new database migrations autonomously through metadata-driven YAML configurations.
The infrastructure supports potential enhancements, including comprehensive metadata tagging and advanced marketplace concepts where data consumers can transparently view and manage resource consumption and associated costs.
Powering Digital Transformation through Data Platform Enablement



Data-driven organizations are three times more likely to report significant improvements in decision-making speed, helping them respond faster to market changes.
(Source: Harvard Business School)
Data platforms can allow companies to realize cost savings of up to 15% through minimized redundancies, optimized resource utilization, and streamlined processes.
(Source: McKinsey & Company)
Companies focusing on structured data management can improve data accuracy and consistency by 10–20% through centralized data platforms.
(Source: McKinsey & Company)
Real-World Impact and Benefits
The project delivered the following key benefits:
Operational Area | Before Implementation | After Implementation |
---|---|---|
Cost Efficiency | Batch (Glue) and streaming (EMR) managed services created high ongoing costs. | Transition to Kubernetes, Spark, and Karpenter reduced costs by ~30–40% while optimizing resources. |
Network Scalability | VPC constrained by 128 IP addresses, limiting nodes and endpoints. | Cilium overlay network decoupled IP allocation, providing virtually unlimited scalability. |
Vendor Lock-In | Heavy dependency on AWS Glue and EMR increased the risk of cloud-vendor lock-in. | Kubernetes-based workloads enabled portability across clouds and hybrid environments. |
Resource Management | Static infrastructure with underutilized nodes increased overhead. | Karpenter’s intelligent scaling dynamically adjusted node capacity, reducing waste. |
Observability & Operations | Limited visibility into pipeline health and resource use, with reactive incident management. | Integrated CloudWatch, Prometheus, and Grafana provided proactive monitoring, alerts, and transparency. |
Data Quality & Compliance | Data migrations lacked SLA validation on freshness and quality. | Automated checks enforced SLAs, ensuring reliable, trusted, and compliant data flows. |
Innovation & Governance | Centralized IT involvement slowed new migrations and governance adoption. | Metadata-driven YAML configs and Data Mesh principles enabled self-service migrations and decentralized governance. |
Agility & Business Impact | Slow, costly migrations hindered the adoption of modern data-driven services. | Future-ready, flexible cloud-native ecosystem accelerated time-to-market and innovation. |
Conclusion
This migration represents a significant advancement in cloud strategy, successfully combining technical innovation with operational efficiency and paving the way for scalable, flexible data management in a cloud-native ecosystem.