We help companies identify best practices for developing a big data strategy, choose the technologies to use, and build effective analytics. To kick off that process, we often run “hacking sessions” for non-technical people with our customers: custom workshops that help identify the right use cases and the type of insight that could be obtained, analyze and map the company’s data landscape, support the redesign of business models or concepts, and calculate the ROI of a possible project.


Machine learning is the art and science of giving computers the ability to learn from data and solve problems without being explicitly programmed to do so. In the last 10 years it has enabled many enterprises, from giant corporations to small and innovative startups, to overcome challenges once thought to be impossible.
In this course you will learn how to leverage its power too, from theoretical aspects to current practical methodologies, including distributed environments, with a focus on hands-on exercises and real use cases.

Machine Learning (3 days)

Machine Learning introduction

  • Scope and motivations
  • Terminology and workflow
  • Typical pipelines
  • Approaches and algorithms
  • Algorithms in-depth
  • Use cases and demo

Advanced Machine Learning

  • Feature Engineering
  • Advanced pipelines
  • Specialized Algorithms
  • Model selection and evaluation
  • Recommender systems
  • Use cases and demo

Large Scale Machine Learning

  • Spark MLlib and Big Data
  • Deep Learning

Apache Spark plays a key role in modern big data platforms thanks to its performance, flexibility, modularity and integrations with other technologies.
Its ease of use and its abstraction over distributed computing make it accessible to a wide audience of developers. However, this abstraction easily leads to poor performance and very inefficient use of cluster resources when Spark is used without understanding some core concepts.
This course explains how Spark works and how to use it correctly, with an awareness of what happens under the hood, so you can take advantage of its features and achieve high performance and scalability.

Spark Core + Spark SQL (2 days)
  • Big Data overview
  • Spark Story & Community
  • Spark vs Hadoop
  • Spark Integrations
  • Spark Build
  • Spark Deployment
  • How it works
  • API overview
  • First Job (LAB)
  • RDD API (LAB)
  • RDD vs DataFrame vs Dataset
  • DataFrame API (LAB)
  • Final project (LAB)
  • Tips & Tricks
  • Spark SQL
  • SparkSQL vs Hive vs Impala
  • SparkSQL API
  • SparkSQL Job (API)
  • Spark Thrift Server + BI connection
Spark Streaming + Spark ML (2 days)
  • Spark Streaming
  • Spark Streaming vs Storm vs Flink
  • Spark Streaming integrations
  • First stream Job (LAB)
  • Lambda Architecture
  • Advanced Streaming (LAB)
  • Spark for Machine learning
  • ML vs MLlib
  • Algorithms
  • Clustering: K-Means (LAB)
  • Recommendation: ALS (LAB)
  • Model Server with Lambda Architecture
  • Tips & Tricks
  • Datascience & Production
  • Spark Notebook

Apache Hadoop is an open-source framework for reliable, scalable, distributed computing.
It has a few main modules, such as HDFS and YARN, and many other Hadoop-related projects exist, including computing engines, data storage systems, coordination services and much more!
In this complex and still growing scenario, finding the right tools for the various use cases is hard.
This course presents the main actors in this ecosystem: what they do, how they work and how they can be used together to build complex platforms serving different business needs.

Introduction to Hadoop (2 days)

Big Data Platforms

  • Overview
  • NoSQL benchmarking

Hadoop + Cloudera components

  • Hadoop vs RDBMS
  • Hadoop in Enterprise

Data Stores

  • HDFS Advanced
  • HBase Design & DataModel
  • Solr

Data Ingestion

  • Kafka
  • Sqoop
  • Flume

Data Analysis

  • Impala
  • Hive
  • MapReduce Concepts & Development
  • MapReduce Input & Output


  • Security Authentication
  • Security Authorisation
  • Hadoop Processes

In-Depth Administration (1 day)

Design & Setup

  • Hardware considerations
  • Software installation
  • Launch


  • Core config
  • Sanity tests
  • Mechanics
  • Resources management


  • Charts & Dashboards
  • Custom triggers, custom alerts
  • Integrations and REST API

Test & Benchmarks

  • Functionality Tests
  • Performance


  • Cloudera Manager
  • HDFS operations
  • Host maintenance
  • Disaster recovery
  • Troubleshooting

Cassandra is one of the most popular NoSQL databases for IoT, unstructured data and large OLTP workloads.
NoSQL is all about the most classic tradeoff in computer science: performance versus flexibility. Handling it correctly from both the development and operations standpoints is crucial to a project’s success.
In this course you will learn the main features of the available NoSQL solutions, then go in depth into Cassandra’s architecture, discovering its strengths, pitfalls and best practices so that you can make the right choices in an informed and autonomous manner.

Cassandra Core (2 days)
  • Big Data and NoSQL overview
  • Installation and configuration (LAB)
  • Tools: nodetool, cqlsh, stress (LAB)
  • Replication and Consistency
  • Gossip
  • Data Model
  • CQL (LAB)
  • Write and Read Path (LAB)
  • Compaction and Tombstoning
  • Hardware best practices
Operations (1 day)
  • Environment
  • Adding nodes (LAB)
  • Remove, Decommission and Replace nodes (LAB)
  • Bootstrap and Cleanup
  • Hinted Handoff (LAB)
  • Repair (LAB)
  • Backup and Recovery
  • Security
  • DR and MultiDatacenter
  • JVM tuning
  • Disk tuning
Data Model (1 day)
  • Logical model
  • Conceptual model
  • Physical model
  • Data Types
  • How to validate model
  • Transactions
  • Client Side Joins
  • Best practices
  • Workshop (LAB)
DataStax platform and integrations (1 day)
  • DataStax overview
  • Solr Overview
  • Search fundamentals
  • Solr Queries (LAB)
  • Inverted Index and Document Scoring
  • DataStax integration
  • CQL Extensions (LAB)
  • Cassandra Spark Connector
  • Read from Cassandra
  • Write into Cassandra
  • Group by, Join and Partitioning with DataFrames
  • Lambda architecture

We are also Lightbend certified trainers, so we can deliver certified courses:

  • Lightbend Reactive Architecture – Professional
  • Lightbend Akka Streams for Scala – Professional
  • Lightbend Scala Language – Professional (formerly Fast Track to Scala)
  • Lightbend Scala Language – Expert (formerly Advanced Scala)
  • Lightbend Akka for Scala – Professional (formerly Fast Track to Akka for Scala)
  • Lightbend Akka for Java – Professional (formerly Fast Track to Akka for Java)
  • Lightbend Akka for Scala – Expert (formerly Advanced Akka for Scala)
  • Lightbend Akka for Java – Expert (formerly Advanced Akka for Java)
  • Lightbend Apache Spark for Scala – Professional (formerly Spark Workshop)
  • Fast Track to Play with Scala