
Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 1)

Apache Spark is a distributed data processing framework that is suitable for any Big Data context thanks to its features. Despite being a relatively recent product (the first open-source BSD license was

The world is real-time, not batch – White Paper

WHITE PAPER. THE WORLD IS REAL TIME, NOT BATCH. An overview of the Data Streaming scenario, its stages of evolution and its benefits. Are you getting your data fast enough? Why is streaming data

How to create an Apache Spark 3.0 development cluster on a single machine using Docker

Apache Spark is the most widely used in-memory parallel distributed processing framework in the field of Big Data advanced analytics. The main reasons for its success are the simplicity of use

Massive Streaming IoT platform on Hadoop and more! – Online Meetup, 22 September 2020 | 7pm

We are ready for the next Meetup! Let’s meet online, Tuesday, September the 22nd at 19:00 CEST (details about the meetup link will be given to RSVPs). As usual, the

A unified data management platform

From days to minutes: one of the world’s top-five insurance companies has improved its end-to-end delivery of data thanks to cloud services. Overview. Scenario: Many sub-companies based on different

Webinar on-demand | Managed Services for Mission-Critical Big Data Environments. The experience with Banca Popolare di Sondrio

The introduction of Big Data systems and/or distributed platforms is closely tied to the challenges of integrating them with the existing architecture and to the need to evolve the related management models. Outsourcing of services

Darwin, Avro schema evolution made easy!

Hi everybody! I’m a Big Data Engineer @ Agile Lab, a remote-first Big Data engineering and R&D firm located in Italy. Our main focus is to build Big Data and

Scala ‘fun’ error handling

Hi everybody! I’m Antonio Murgia, a Big Data Architect @ Agile Lab, a remote-first Big Data engineering and R&D firm located in Italy. Our main focus is to build Big

Master Data Management: challenges and basics

A Master Data Management system is the single point of truth for all company-wide data. The problem we want to address is unifying and harmonizing ambiguous and discordant

AWS DataLake & Apache Flink – 7 July 2020, 7pm

A new event organized by the Big Data Torino community (it will most likely be held online again, but more precise details will follow in the coming weeks). As always, we are hosting two talks: 1) AWS Datalake –