
Data Mesh explanation

How and why successful data-driven companies are adopting Data Mesh. Paradigm shift. Every once in a while, a new way of doing things comes along and changes
Read More

Spark Remote Debugging

Hi everybody! I’m a Big Data Engineer @ Agile Lab, a remote-first Big Data engineering and R&D firm located in Italy. Our main focus is to
Read More

Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 3)

In the previous articles (1)(2), we started analyzing the individual features of Adaptive Query Execution introduced in Spark 3.0.
Read More

Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 2)

In the previous article, we started analyzing the individual features of Adaptive Query Execution introduced in Spark 3.0. In particular, the first feature analyzed was “dynamically coalescing shuffle partitions”. Let’s get
Read More

The secret to reduce Spark applications costs

Who among you is reasonably certain, right now, that all your Spark jobs are performing at their best without wasting more computational resources than necessary? If so, what information
Read More

Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 1)

Apache Spark is a distributed data processing framework that is suitable for any Big Data context thanks to its features. Despite being a relatively recent product (the first open-source BSD license was
Read More

How to create an Apache Spark 3.0 development cluster on a single machine using Docker

Apache Spark is the most widely used in-memory parallel distributed processing framework in the field of Big Data advanced analytics. The main reasons for its success are the simplicity of use
Read More

Darwin, Avro schema evolution made easy!

Hi everybody! I’m a Big Data Engineer @ Agile Lab, a remote-first Big Data engineering and R&D firm located in Italy. Our main focus is to build Big Data and
Read More

Scala ‘fun’ error handling

Hi everybody! I’m Antonio Murgia, a Big Data Architect @ Agile Lab, a remote-first Big Data engineering and R&D firm located in Italy. Our main focus is to build Big
Read More

Master Data Management: challenges and basics

A Master Data Management system is the single point of truth for all data company-wide. The problem we want to manage is related to unifying and harmonizing ambiguous and discordant
Read More