Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 1)

Apache Spark is a distributed data processing framework that is suitable for any Big Data context thanks to its features. Despite being a relatively recent product (the first open-source BSD license was

How to create an Apache Spark 3.0 development cluster on a single machine using Docker

Apache Spark is the most widely used in-memory parallel distributed processing framework in the field of Big Data advanced analytics. The main reasons for its success are the simplicity of use

Darwin, Avro schema evolution made easy!

Hi everybody! I’m a Big Data Engineer @ Agile Lab, a remote-first Big Data engineering and R&D firm located in Italy. Our main focus is to build Big Data and

Scala ‘fun’ error handling

Hi everybody! I’m Antonio Murgia, a Big Data Architect @ Agile Lab, a remote-first Big Data engineering and R&D firm located in Italy. Our main focus is to build Big

Master Data Management: challenges and basics

A Master Data Management system is the single point of truth of all data company-wide. The problem we want to manage is related to unifying and harmonizing ambiguous and discordant

A Data Lake new era

Data Lake and Data Warehouse in real time and at low cost. “A data lake is a centralized repository that allows you to store all your structured and unstructured data at

3D Pose Estimation and Tracking from RGB-D

Hi everyone, this is my first article, so I am going to introduce myself. My name is Lorenzo Graziano and I work as a Data Engineer at Agile Lab, an Italian

Open Data and Big Data

Meteorological phenomena require data collected at a global level to capture the physical laws that govern nature. In this case, the scale of the weather phenomenon is global,

NewSQL… the new era of Relational Databases?

The term NewSQL represents a new generation of Relational Database Management Systems

Management of small files on HDFS: problem analysis and best practices

Hadoop is now the de facto standard Big Data platform in the enterprise world. In particular, HDFS (the Hadoop Distributed File System, the Hadoop module implementing the distributed storage part) is