Real-Time Analytics in Mission-Critical Applications for Usage-Based Insurance (UBI) Services


“The IoT generates large amounts of data and, above all, the expectation among those who use it that this data be processed and transformed into useful information in real time. We therefore need solutions that are scalable in terms of economics and efficiency and, at the same time, guarantee the expected results. Choosing partners such as Agile Lab, always attentive to innovation and with solid expertise in new technologies, allows us to achieve ambitious results and to adopt innovative solutions for the challenges of tomorrow.”


Paolo Giuseppetti

Head of Innovation and Connected Mobility Platform, Vodafone Automotive


In order to provide insurance companies with specific risk profiles for every driver, extracted from on-board black boxes, Vodafone Automotive needed a whole new system for the acquisition and processing of telemetry data, capable of collecting, managing and analyzing Big Data in real time for over 227 million weekly mileage and driving-related messages.

The new architecture, based on Cloudera and adding Apache Kafka, Apache Spark and the combination of HDFS and HBase, has been specifically designed to be available on-premises. The Cloud Computing branch of Vodafone handles the management and maintenance of the environments, while the company relies on its ongoing collaboration with Agile Lab for application services development and management.

This new architecture – introducing a more flexible and innovative platform – has enabled the company to meet the level of service expected by Clients (i.e. a maximum of 15 minutes of latency between data generation and its reception on the Client’s Cloud, including cross-time on both mobile networks and the Internet) and it has become the solid foundation for developing services that the company wouldn’t otherwise have been able to offer. 

Client context

Vodafone Automotive, part of the Vodafone Group, operates in the specialized segment of Vodafone IoT, focusing on providing IT platforms to the mobility world. The company supplies telecommunication services and products targeted at the automotive sector. In particular, it offers stolen vehicle assistance and recovery services, theft prevention, and crash and vehicle accident management services. Specifically for insurance-related services, it provides analytical functions for driving habits and styles and risk-management assessments, as well as a wide scope of vehicle management services (i.e. maintenance and management life cycle) for both fleets and OEM manufacturers (of automotive on-board electronics). The main focus of our case study is Usage-Based Insurance (UBI).


UBI aims to address the traditional trade-off between the driver’s privacy and the cost of the insurance plan, a key aspect of the car insurance sector. Vodafone Automotive is able to provide insurance companies with different driving-style profiles by collecting multiple types of information, such as the location and acceleration of a car, through an electronic device (the black box) installed on board the vehicle.

Through this information, Vodafone Automotive helps insurance companies create a “score” that represents with the utmost precision the risk associated with the type of driver, and therefore with the individual insurance plan, also providing data on the type of roads traveled, driving times, and much more.

The project was born out of the need to extract maximum value from the data generated by on-board devices (often called black boxes, like the flight recorders on airplanes), to better cater to the needs of insurance companies. They can use this data for policy pricing (computationally, this is organized in time intervals, with pre-established elaboration cycles and post-elaboration delivery of data sets, as agreed with the company, used, for example, at the time of policy renewal, quarterly or even annually), but also to offer new services to their subscribers, strengthening and enhancing the customer experience: for example, by sending alerts related to the level of danger of a certain area (i.e. where the client may have parked), or localized weather-related alerts (such as hail alerts).

In line with Vodafone Automotive’s goal of increasing safety on the street, the company launched this project to revise and revolutionize its systems for the acquisition and processing of telemetry data (generated by devices previously installed by Vodafone Automotive on insured vehicles) by introducing the features and capabilities offered by the Cloudera platform to collect, manage and analyze in real time the Big Data sent by the installed black boxes.


The Vodafone Automotive project, started in 2017, was aimed at deploying, managing and consolidating a platform able to collect and process large quantities of data, to support insurance companies’ risk evaluation both when issuing insurance plans and when offering real-time services to their customers. The project led to the replacement of the previous architecture with a newer, innovative one based on Cloudera, adding Apache Kafka, Apache Spark and the combination of HDFS and HBase (the “Lambda” architectural model), and later on also Apache NiFi, which can process data with a latency of a few seconds, regardless of their quantity or frequency. The primary feature of this platform is its ability to flexibly manage high volumes of data, and to expand and grow according to the company’s evolving needs.

Data processing occurs mainly through Apache Spark, which captures the data and processes them after extracting them from Kafka. Afterwards, the platform stores the raw data on a distributed HDFS file system, whereas processed data are saved in the NoSQL database, achieving impressive performance results. The collected data are then sorted through the Lambda architecture, enabling both real-time data processing and effective storage for future re-processing needs. To accomplish the latter function, the architecture relies on the NoSQL HBase. It should be noted that the primary data processing reconstructs the driver’s path from localization and speed data, and from geographical information acquired through the GPS system and the accelerometer in the vehicle’s black box.
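The Lambda-style split described above, with raw data kept for re-processing while a processed view is served for low-latency reads, can be sketched in miniature. The message format and names below are purely illustrative, not Vodafone Automotive's actual schema; the list stands in for HDFS and the dictionary for HBase.

```python
# Minimal sketch of a Lambda-style ingest path (illustrative schema only).
batch_store = []   # append-only raw data, playing the role of files on HDFS
speed_view = {}    # latest processed state per vehicle, playing the role of HBase rows

def ingest(message):
    """Route one telemetry message through both layers."""
    batch_store.append(message)          # keep everything for future re-processing
    vehicle = message["vehicle_id"]
    speed_view[vehicle] = {              # overwrite with the freshest reading
        "last_speed_kmh": message["speed_kmh"],
        "last_seen": message["timestamp"],
    }

ingest({"vehicle_id": "V1", "speed_kmh": 50, "timestamp": 1})
ingest({"vehicle_id": "V1", "speed_kmh": 80, "timestamp": 2})

print(len(batch_store))                    # 2: both raw messages retained
print(speed_view["V1"]["last_speed_kmh"])  # 80: only the latest state is served
```

The point of the pattern is that the two layers never conflict: the batch store is immutable and complete, while the serving view can always be rebuilt from it.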

Additional operations are required to guarantee the reliability of the collected data: it is fundamental, for instance, to perform data cleansing and preparation in order to spot any device malfunctions, or to differentiate between a pothole and a car impact (and consequently understand whether or not to provide assistance to the driver).

The new architecture has been specifically designed to be available on-premises, and its servers are located in Vodafone Group’s Technology Hub in Milan (for Vodafone Italy), which hosts the VFIT services. A back-up server cluster has also been created in Vodafone’s twin data center, as part of the disaster recovery plan. The Cloud Computing branch of Vodafone handles the management and maintenance of the environments (GDC – Group Data Center, where the data processing resources are implemented; Vodafone caters to the Southern European market through this structure), while the company relies on its collaboration with Agile Lab for application services development and management.

The architectural evolution that Vodafone Automotive implemented not only allowed the company to effectively handle high volumes of data, but also represented a qualifying element in guaranteeing the availability of real-time analyzed and processed data to insurance companies. Thanks to the new platform, insurance companies are today able to receive real-time information on the driving habits of their clients, with a latency of mere minutes from the event registration in the vehicle’s on-board black box.
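The pothole-versus-crash distinction mentioned above can be sketched as a toy classifier over the accelerometer spike. A real system would use calibrated, multi-axis signal analysis; the thresholds and function below are invented purely for illustration.

```python
# Hypothetical sketch of event classification from a black-box accelerometer.
# Thresholds are illustrative, not Vodafone Automotive's actual calibration.
def classify_event(peak_g, duration_ms):
    """Classify a single acceleration spike."""
    if peak_g < 2.0:
        return "noise"            # normal driving vibration
    if peak_g < 6.0 and duration_ms < 100:
        return "pothole"          # short, moderate spike
    return "possible_crash"       # strong or sustained impact: trigger assistance

print(classify_event(1.2, 40))    # noise
print(classify_event(4.0, 60))    # pothole
print(classify_event(12.0, 250))  # possible_crash
```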

The following are some figures from our project:  

  • over 33 million messages per day
  • 227 million weekly mileage and driving-related messages (for insured clients) 
  • 130 terabytes of data collected in 3 years.  

From a management point of view, the project required the involvement of a dedicated team that focused exclusively on the design and development of the new architectural scenarios. This organizational choice and the collaboration with Agile Lab – which took charge of every detail regarding the planning, engineering, and optimization of the application platform – played a key role in the success of the project. After the project launched, the team created by Vodafone Automotive to manage the development phases joined the company’s IT department, working in the areas of Project Management, Development, and Operations.

The greatest challenge faced by the company has been the need to integrate numerous recent technologies into its existing information system kit. The IT department was required to manage a series of new tools and platforms, and take all the necessary steps (also from a training perspective) to both maintain and employ those technologies to their fullest potential.  

Achieved Results and Benefits

First of all, the new architecture – introducing a more flexible and innovative platform – has enabled the company to meet the level of service expected by Clients (i.e. a maximum of 15 minutes of latency between data generation and its reception on the Client’s Cloud, including cross-time on both mobile networks and the Internet). In addition, the new architecture has become the solid foundation for developing services that the company wouldn’t otherwise have been able to offer. It allowed Vodafone Automotive to acquire a definitive competitive advantage, positioning itself as one of the most innovative players on the market.

Future Developments

Among the potential evolutions of the platform is the possibility of adding Machine Learning to the reliability and data quality check processes, even in streaming mode (as the data arrive). The introduction of machine learning techniques would allow the company to identify any possible device malfunction much more quickly, becoming proactive in the maintenance and, when needed, replacement of the black box. This would also bring the added benefit of avoiding corrective measures on corrupted data ingested because of device errors or malfunctions.

This case study was published in Italian by the Management Engineering department of Milan’s Polytechnic University as part of the 2020 business cases of the Big Data and Business Analytics Digital Innovation Observatories of the School of Management of Politecnico di Milano (Copyright © Politecnico di Milano / Dipartimento di Ingegneria Gestionale).

The Vodafone Automotive case: Real-Time Analytics in Mission-Critical Applications with Agile Lab WASP and Cloudera

The 2020 edition of the Big Data & Business Analytics Observatory of the Politecnico di Milano – for years a point of reference for the research world and beyond – has recently come to a close. As every year, it outlined the evolution of the market and brought to the fore interesting use cases realized in the Big Data field.

Among these was the project realized by Vodafone Automotive, the company of the well-known group that provides services, products and telematics architectures for the mobility world. Thanks to the use of Agile Lab’s WASP platform (Wide Analytics Streaming Platform) and Cloudera technology, Vodafone Automotive was able to handle the large volume of data coming from the black boxes installed on vehicles, collecting, analyzing and processing it to deliver services to its clients in near real time.

AgileRAI: scalable, real-time, semantic video search

Television content has a very long life cycle. After being produced and broadcast, it is archived in order to be reused later (days, months, even years afterwards) both for rebroadcasts and for inclusion in other content. This content archive is a central asset in today’s entertainment industry, and many resources are devoted to its creation, maintenance, management and use.
A mere archive of content is not useful by itself: we need an efficient way to search the archive in order to find what is relevant to our need. This means we need information on the subject matter of the content, which is usually obtained by the manual addition of metadata; this approach, however, requires human intervention and is thus costly, slow and prone to biases and errors. Furthermore, given the ever-growing amount of content produced every day, the manual approach is not sustainable in the long term.
AgileRAI is a system for multimedia content analysis that takes on this challenge by providing a scalable platform for innovative multimedia archiving and production services, leveraging advanced pattern recognition techniques and combining them with a modern distributed architecture.
The project has been developed in collaboration with CELI, which implemented the semantic annotation and the user interface. This is a good example of how two small and dynamic companies can drive innovation with a public giant like RAI.



The platform supports real-time ingestion of multiple video streams of various types (e.g. RTP streams, video files from storage systems, etc.), on which different techniques are applied in a parallel and scalable way in order to recognize specific visual patterns like logos, paintings, buildings, monuments, etc. The video streams are analyzed by extracting visual features from the frames, which are then matched against a reference database of visual patterns to produce a set of meta-tags describing the ingested contents in real time. Furthermore, these tags can then be further enriched with semantics thanks to open semantic data repositories. This allows searching and retrieval operations based on high-level concepts. The architecture is designed to be parallel and scalable in order to ensure near real-time, frame-by-frame pattern detection and recognition in video data.
In the experimental setup of the system, we leveraged the Compact Descriptors for Visual Search (CDVS) standard proposed by the Moving Picture Experts Group (MPEG). The CDVS standard describes how to extract, compress and decompress relevant visual information in a robust and interoperable format that leverages the Scale-Invariant Feature Transform (SIFT) algorithm for feature detection.
The core building blocks of CDVS consist of global and local descriptor extractors and compressors based on selected SIFT features. The first operation to extract these descriptors is removing color information. Then, candidate key-points are extracted using SIFT. These candidate points are then evaluated and filtered according to various metrics, in order to select those that are best able to provide a “feature description” of the objects contained. The selected points are then encoded in the descriptors, which can then be used to determine whether two images have objects in common.
In order to efficiently match the patterns encoded in the CDVS descriptors with the reference patterns, a reference database is used. This database is built from a collection of images containing visual objects of interest (e.g. paintings, buildings, monuments, etc.) under different scales, views and lighting conditions. Each visual object is represented by a unique label, with each image marked with the corresponding label. When the database is created the visual features are extracted from the reference images and the corresponding labels are associated with them. When the database is queried using a CDVS descriptor the label matching the contained patterns is returned irrespective of which particular reference image matched.
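The label-oriented lookup described above can be sketched with a toy model. Real CDVS descriptors are compressed SIFT-based features; here each "descriptor" is just a set of integer feature IDs, matching is a simple overlap test, and the images and labels are invented for illustration.

```python
# Toy reference database: several images of the same object, under different
# views, share one label. Descriptors are sets of feature IDs (illustrative).
reference_db = [
    ({1, 2, 3, 4}, "Sforza_Castle"),
    ({2, 3, 4, 5}, "Sforza_Castle"),
    ({10, 11, 12}, "Mona_Lisa"),
]

def query(descriptor, min_overlap=3):
    """Return the label of the best-matching reference image, if any."""
    best_label, best_score = None, 0
    for ref_descriptor, label in reference_db:
        score = len(descriptor & ref_descriptor)   # shared features
        if score >= min_overlap and score > best_score:
            best_label, best_score = label, score
    return best_label

# Whichever of the two castle images matches, the same label comes back.
print(query({2, 3, 4, 9}))   # Sforza_Castle
print(query({7, 8, 9}))      # None: no reference object recognized
```

This mirrors the key property of the database: the caller learns *what* was recognized, not *which* reference image produced the match.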
Both the feature extraction and the matching step are computationally intensive; moreover, such a system must be able to handle multiple streams at once, both real-time and batch. This raises the need for a scalable architecture.
Let’s see how the AgileRAI system is able to parallelize all the operations needed to analyze a video stream.



The AgileRAI system architecture merges classical information analysis and retrieval components with the most advanced fast data architectures. The high-level architecture of the system may be considered as a cluster made of multiple back-end computation nodes and a single front-end node.

Video processing

The video processing pipeline is based on witboost Data Streams (formerly Wasp), an open source framework written in Scala. You can find a detailed description of the witboost system at
The input RTP (or file-based) video is decoded using the FFMPEG library in order to extract the raw frames out of the incoming stream. Then, CDVS descriptors are generated for each frame and pushed to a Kafka queue. Kafka acts as a kind of persistent, fault tolerant and distributed publish/subscribe layer, which allows decoupling video feature extraction from feature matching tasks.
The descriptors queue is consumed by Spark Streaming in order to match the incoming descriptors versus the reference images database. The database is broadcasted to all the nodes participating in the computation, and the matching operation is performed concurrently on the descriptors. This step produces a list of labels for each frame, representing the visual objects contained therein.
The output of the Spark processing step is a new Kafka queue made of tuples <FrameID, Labels>, where the FrameID is a unique identifier of the processed frame (i.e. input stream reference and timestamp data) and Labels is the list of labels of the matching visual patterns within the frame.
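The shape of one record in that output queue can be sketched as follows; the stream reference, timestamp and helper names are illustrative, not the system's actual identifiers.

```python
# Sketch of one <FrameID, Labels> record (all concrete values illustrative).
def make_frame_id(stream_ref, timestamp_ms):
    """FrameID = input stream reference plus timestamp data."""
    return f"{stream_ref}@{timestamp_ms}"

def to_output_tuple(stream_ref, timestamp_ms, matched_labels):
    """One record of the output Kafka queue: (FrameID, Labels)."""
    return (make_frame_id(stream_ref, timestamp_ms), sorted(matched_labels))

record = to_output_tuple("rtp://camera-1", 1612345678000, {"Sforza_Castle"})
print(record)   # ('rtp://camera-1@1612345678000', ['Sforza_Castle'])
```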
The Kafka queue is finally sent to the semantic annotation node for enrichment, storing and publication, as well as delivered to an ElasticSearch cluster for monitoring purposes.

Semantic annotation

The queue of <FrameID, Labels> tuples generated by the video processing pipeline is consumed by the enrichment pipeline. Since the incoming frames are labelled with the URIs of linked data resources, the system is able to access the rich set of properties and relations provided by the referenced web sources. As an example, if within the input video stream a monument is detected at a certain date and time with an unambiguous URI (e.g. dbpedia:Sforza_Castle), the system may extract (and use as metadata for subsequent searches) all its properties, e.g. geographical location, year of construction, related external contents, etc. With this approach, the intellectually expensive and time-consuming work of annotating video data only needs to be performed on a (small) portion of the data, e.g. when the target image dataset is created or updated, and not on the entire archived or broadcast content. Furthermore, the creation and annotation of the target image dataset may itself be automated, e.g. by using a Web image search engine to source visual training data for target queries.
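The enrichment step can be sketched as expanding each linked-data URI into searchable metadata. The property table below is a hand-written stand-in for what a real lookup against a source like DBpedia would return; all values are illustrative.

```python
# Illustrative stand-in for a linked-data lookup (not real DBpedia output).
linked_data = {
    "dbpedia:Sforza_Castle": {
        "location": "Milan, Italy",
        "type": "Castle",
    },
}

def enrich(frame_id, labels):
    """Attach the known properties of each detected entity to the frame."""
    return {
        "frame": frame_id,
        "entities": {uri: linked_data.get(uri, {}) for uri in labels},
    }

result = enrich("rtp://camera-1@1000", ["dbpedia:Sforza_Castle"])
print(result["entities"]["dbpedia:Sforza_Castle"]["location"])   # Milan, Italy
```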

The collected information (i.e. source stream identifier, detection timestamps, linked data URIs) is stored as RDF triples in a triple store following a purpose-built ontology, in order to be semantically searchable and accessible by means of SPARQL queries.
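The triple-store idea can be illustrated in miniature: detections stored as (subject, predicate, object) triples and retrieved with a pattern match standing in for a SPARQL query. The ontology terms and identifiers are invented for illustration.

```python
# Toy triple store (illustrative ontology terms, not the project's real one).
triples = [
    ("frame:1", "detects", "dbpedia:Sforza_Castle"),
    ("frame:1", "timestamp", "2020-01-01T10:00:00"),
    ("frame:2", "detects", "dbpedia:Mona_Lisa"),
]

def match(pattern):
    """Return triples matching a pattern; None acts as a wildcard,
    much like a ?variable in SPARQL."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

# "Which frames contain the Sforza Castle?"
print(match((None, "detects", "dbpedia:Sforza_Castle")))
# [('frame:1', 'detects', 'dbpedia:Sforza_Castle')]
```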

User interface

In addition to being stored in the triple store, the semantically enriched metadata is also leveraged in the system’s GUI. This information, along with a transcoded version of the input video, is used by an HTML5 web page that allows the user to choose between the available channels and shows the live input stream being processed along with a stream of contextual information. The image below shows an example: the input video depicts a portrait of William Shakespeare. Some pictures of the famous English writer were previously collected and tagged in the AgileRAI image reference dataset. Thus, the system is able to recognise his appearance within the input stream and collect further information, listing e.g. his biography and other personal data.



The combination of Visual Analysis techniques for the detection of visual patterns with Semantic Web technologies enables the extraction and, most importantly, the use of content information from video media, an important capability in today’s entertainment industry. Parallel computation is crucial to achieving scale-out capabilities in this kind of use case. Thanks to the witboost Data Streams (formerly Wasp) framework, AgileRAI makes it possible to ingest several live video streams in parallel and analyze them in a single cluster with a single application deployment, providing powerful video analysis capabilities with low effort.