My name is Data Mesh. I solve problems.

In spite of the innumerable advantages it provides, we often feel that technology has made our work harder and more cumbersome instead…
Read More

The rise of Big Data testing

What is big data testing? Big data testing is the process of verifying data and processing integrity, validating the quality of big data, and ensuring data…
Read More

Extending Flink functions

For the most part, frameworks provide all kinds of built-in functions, but it would be great to be able to extend their functionality transparently. In particular, in this article I want…
Read More

Scala “fun” error handling — Part 2

As the title suggests, this is the second post of a series about error handling in a functional way in Scala. Last time, we saw how we can encode our results…
Read More

Data Mesh explanation

How and why successful data-driven companies are adopting Data Mesh. Every once in a while, a new way of doing things comes along and changes…
Read More

Spark Remote Debugging

Hi everybody! I’m a Big Data Engineer @ Agile Lab, a remote-first Big Data engineering and R&D firm located in Italy. Our main focus is to…
Read More

Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 3)

In the previous articles (1)(2), we started analyzing the individual features of Adaptive Query Execution introduced in Spark 3.0.
Read More

Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 2)

In the previous article, we started analyzing the individual features of Adaptive Query Execution introduced in Spark 3.0. In particular, the first feature analyzed was “dynamically coalescing shuffle partitions”. Let’s get…
Read More

The secret to reducing Spark application costs

How many of you can say, right now, with reasonable certainty that all your Spark jobs are performing at their best without wasting more computational resources than necessary? If so, what information…
Read More

Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 1)

Apache Spark is a distributed data processing framework that is suitable for any Big Data context thanks to its features. Despite being a relatively recent product (the first open-source BSD license was…
Read More