
Spark Remote Debugging

Hi everybody! I’m a Big Data Engineer @ Agile Lab, a remote-first Big Data engineering and R&D firm located in Italy. Our main focus is to
Read More

Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 3)

In the previous articles (1)(2), we started analyzing the individual features of Adaptive Query Execution introduced in Spark 3.0.
Read More

Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 2)

In the previous article, we started analyzing the individual features of Adaptive Query Execution introduced in Spark 3.0. In particular, the first feature analyzed was “dynamically coalescing shuffle partitions”. Let’s get
Read More

The secret to reduce Spark applications costs

How many of you are reasonably certain, right now, that all your Spark jobs are performing at their best without wasting more computational resources than necessary? If so, what information
Read More

Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 1)

Apache Spark is a distributed data processing framework that is suitable for any Big Data context thanks to its features. Despite being a relatively recent product (the first open-source BSD license was
Read More

How to create an Apache Spark 3.0 development cluster on a single machine using Docker

Apache Spark is the most widely used in-memory parallel distributed processing framework in the field of Big Data advanced analytics. The main reasons for its success are the simplicity of use
Read More

Spark in Industry – Webinar, 15 May 2020, 9:30 am

What are the typical frameworks of the Big Data world? Why is Spark the most widely used, and how is it adopted in industrial contexts? What are the most significant use cases? These
Read More

Data Quality for Big Data

In today’s data-intensive society, Big Data applications are becoming more and more common. Their success stems from the ability to analyze huge collections of data, opening up new business prospects.
Read More

WASP is now open source on GitHub!

WASP is a framework that enables the development of complex, full-stack, real-time applications, for example IoT, complex big data streaming analytics, massive data ingestion or data offload from
Read More

Big Data Trainings

Agile Lab unveils its onsite big data training programs! We want to share knowledge about big data topics to bring innovation. Right now the focus is on Spark and Cassandra. But we
Read More