witboost Data Capture automatically keeps a data lake in sync with its source systems.
Every data lake has an ingestion pipeline, and increasingly that pipeline is a real-time stream of change data capture (CDC) events. This raises a problem: how to apply those mutations to data lake storage, which is typically immutable, without resorting to a multitude of batch jobs that degrade data freshness and create a scheduling nightmare on the cluster.
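The core of that problem can be illustrated with a minimal, generic sketch of merging CDC events into a keyed table. It is not witboost's implementation. The ChangeEvent type, the op names, the seq field (an assumed position in the source log) and the apply_micro_batch function are hypothetical. A Python dict stands in for an ACID table, and every insert or update event is assumed to carry a full row image. Compacting each micro-batch to the latest event per key before merging roughly mirrors what a single MERGE INTO statement does in table formats such as Delta, Iceberg or Hudi, instead of rewriting data with a separate batch job per change.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical, format-independent change event."""
    op: str                         # "insert", "update" or "delete"
    key: Any                        # primary key of the affected row
    row: Optional[Dict[str, Any]]   # full row image after the change; None for deletes
    seq: int                        # assumed position in the source log, used for ordering


def apply_micro_batch(table: Dict[Any, Dict[str, Any]],
                      batch: Iterable[ChangeEvent]) -> None:
    """Merge one micro-batch of change events into `table` in place."""
    # Compaction: only the most recent event per key affects the final state,
    # so intermediate versions within the batch are skipped.
    latest: Dict[Any, ChangeEvent] = {}
    for ev in batch:
        current = latest.get(ev.key)
        if current is None or ev.seq > current.seq:
            latest[ev.key] = ev

    # Merge step: upsert on insert/update, remove the row on delete.
    for key, ev in latest.items():
        if ev.op == "delete":
            table.pop(key, None)
        elif ev.op in ("insert", "update"):
            table[key] = dict(ev.row)  # assumes a full row image is present
        else:
            raise ValueError(f"unknown op: {ev.op!r}")
```

Because events are ordered by seq rather than by arrival, an older event that arrives late within the same micro-batch does not overwrite a newer one.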
witboost Data Capture:
• Decouples the pipeline from any specific CDC format
• Applies all mutations as a stream, refreshing the lake in near real time
• Guarantees ACID compliance, supporting the major table formats (Delta, Iceberg, Hudi)
• Is codeless: it requires only configuration
• Enables business event generation directly on the CDC stream
• Performs data deduplication
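Decoupling from the CDC format and deduplication can be sketched together, again as an illustration rather than witboost's actual mechanism. Each source format gets a small normalizer into one common event type, and redelivered copies of the same change (common with at-least-once delivery) are dropped by their (key, seq) identity. The Debezium-style envelope (an op code of c, u, d or r plus before and after row images) follows Debezium's documented change-event structure. The flat format, its field names, the key_field and seq parameters, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical common event model that downstream logic depends on."""
    op: str                         # "insert", "update" or "delete"
    key: Any                        # primary key of the affected row
    row: Optional[Dict[str, Any]]   # row image after the change; None for deletes
    seq: int                        # assumed position in the source log


# Debezium op codes: c = create, u = update, d = delete, r = snapshot read.
DEBEZIUM_OPS = {"c": "insert", "r": "insert", "u": "update", "d": "delete"}

# Hypothetical flat format: {"action", "pk", "data", "offset"}.
FLAT_OPS = {"INSERT": "insert", "UPDATE": "update", "DELETE": "delete"}


def from_debezium(payload: Dict[str, Any], key_field: str, seq: int) -> ChangeEvent:
    """Normalize a Debezium-style envelope with before/after images."""
    op = DEBEZIUM_OPS[payload["op"]]
    # On delete the key must be read from the "before" image, since "after" is null.
    image = payload["before"] if op == "delete" else payload["after"]
    return ChangeEvent(op, image[key_field], payload["after"], seq)


def from_flat(record: Dict[str, Any]) -> ChangeEvent:
    """Normalize the hypothetical flat format."""
    op = FLAT_OPS[record["action"]]
    row = None if op == "delete" else record["data"]
    return ChangeEvent(op, record["pk"], row, record["offset"])


def deduplicate(events: Iterable[ChangeEvent]) -> Iterator[ChangeEvent]:
    """Drop redelivered copies of the same change, identified by (key, seq)."""
    seen = set()
    for ev in events:
        ident = (ev.key, ev.seq)
        if ident not in seen:
            seen.add(ident)
            yield ev
```

The seen set grows without bound in this sketch; a long-running streaming job would typically need to expire old entries, for example once a watermark has passed them.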