From days to minutes: one of the world’s top-five insurance companies has improved its end-to-end delivery of data thanks to cloud services
OVERVIEW
SCENARIO
Many sub-companies running on different data management systems and heterogeneous technologies, with several days of delay in delivering data to the business.
PROJECT GOALS:
Centralization of data, while reducing process cycle time.
THE SOLUTION:
Amazon Cloud Services
RESULTS:
Significant reduction of end-to-end data delivery, from several days (with the original batch-based pipelines) to minutes, through AWS services.
* * *
When one of the world’s top-five insurance groups asked Agile Lab to contribute to designing and developing its new data management platform, the complexity of the task was clear from the starting point: many sub-companies running on different data management systems and heterogeneous technologies, with several days of delay in delivering data to the business.
Agile Lab provided its technical skills in Cloud and Big Data technologies, collaborating with the customer’s internal team to form a single, top-notch team. Given the complexity of this challenge, the key decision was picking the right combination of services to address all the requirements.
Data from any source
AWS provides a plethora of tools for data ingestion and integration. Since the platform had to collect data for near-real-time scenarios, Amazon MSK served as the baseline for all data ingestion, enabling the data platform to gather data from legacy operational systems through CDC while remaining open to other interfaces, such as data APIs and batches, easily reaching terabytes of data volume.
Data exploration and business models
Building a landing area on top of AWS S3 opened the door to data exploration by means of AWS Athena. On the one hand, data exploration enables data analysts to understand and analyze data and build business models. On the other hand, AWS Glue fits the need for industrializing those business models, since big data engineers can build Spark applications using the analysts’ results as specifications. Furthermore, AWS Glue provides transparent horizontal scaling, virtually eliminating operational overhead.
Master Data and KPI
This initiative dealt with many challenges, among them: how to centralize master information while keeping the near-real-time requirement in mind. With AWS MSK as the entry point, it is possible to expose a mechanism that unifies data streams into a company-wide standard format that can be stored in AWS Aurora. This addresses data access performance and offers interesting integrations with AWS Lambda to extend the scope of a database transaction. For instance, this case considered integrating AWS Elasticsearch to power search capabilities. With data centralized and standardized, it becomes easier to summarize into analytics and KPIs what such an insurance company needs for strategic purposes.
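The core of the unification step is mapping each source system’s field names onto one company-wide standard schema before the record is stored. A minimal sketch of that idea in plain Python follows; the source names (`crm`, `claims`) and field mappings are hypothetical, invented here for illustration, not taken from the actual platform:

```python
from datetime import datetime, timezone

# Hypothetical field mappings from two source systems to a standard schema.
FIELD_MAPS = {
    "crm":    {"mail": "email", "tel": "phone", "cf": "fiscal_code"},
    "claims": {"email_addr": "email", "phone_no": "phone", "tax_id": "fiscal_code"},
}

def to_standard(source: str, record: dict) -> dict:
    """Rename source-specific fields and stamp provenance metadata."""
    mapping = FIELD_MAPS[source]
    unified = {std: record[src] for src, std in mapping.items() if src in record}
    unified["_source"] = source
    unified["_ingested_at"] = datetime.now(timezone.utc).isoformat()
    return unified

# Two differently shaped records converge on one schema.
a = to_standard("crm", {"mail": "jo@example.com", "cf": "RSSMRA80A01H501U"})
b = to_standard("claims", {"email_addr": "jo@example.com", "tax_id": "RSSMRA80A01H501U"})
assert a["email"] == b["email"] and a["fiscal_code"] == b["fiscal_code"]
```

In the real pipeline this logic would run on the MSK stream before writing to Aurora; keeping the `_source` provenance field is what later allows reliability-based merging.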
User Experience
The big achievement is a significant reduction of end-to-end data delivery, from several days (with the original batch-based pipelines) to minutes, through AWS services. This means that a customer can see the results of their operations as soon as they have been requested, rather than waiting several days for evidence of those operations.
A Master Data Management system is the single point of truth for all data company-wide. The problem we want to solve is unifying and harmonizing ambiguous and discordant information coming from multiple sources within the same organization. Every single piece of information can be replicated across several systems and referenced by different identifiers.
Data can be labeled in many ways and take separate routes from the acquisition to the utilization step.
We will proceed by walking through a sample scenario drawn from real cases.
A reference scenario: banking and insurance companies
A banking institution is usually surrounded by myriads of data coming from many different directions.
Typically, such organizations include far more divisions than the following: Private Banking, Insurance Services, Legal & Compliance, Asset Management, Id Services, Real Estate.
Those areas need to communicate across subsidiaries, external services, and government agencies for regulation purposes.
Each of those items corresponds to a separate data management system that gathers information about a set of business entities and turns it into contracts, services, transactions, and whatever else is necessary to run the business.
In reality, each division within the organization has a life of its own. Divisions are developed by different suppliers, shift over time, rely on different technologies, and are managed by people with divergent backgrounds; usually they don’t even share the same vocabulary. Yet they all serve the same business.
A banking/insurance system: different technology stacks, separate data flows, different languages.
How information diverges
From a system perspective, the issue may look merely technical, with no consequences for the business. Having distinct technologies is something that can likely be resolved through system integration, with custom developments or specialized tools.
Nevertheless, the mismatch is deeper than it appears, running from the technical to the cultural level across all divisions of an organization.
The business relies on business objects that have a specific representation within each subsystem they live in.
The representation of the same business object is heterogeneous across the subsystems of the organization.
For instance, a single person may be identified by an email within a marketing campaign, by a unique registration ID within the Real Estate division, or by the fiscal code for the Legal & Compliance department. Each profile related to that person may have a registered email, phone, or even document ID different from the others. That is, the same attribute “email” has different values for the same person, creating ambiguities among data within the several subsystems of the organization.
Business models with many ambiguities.
The same ambiguity arises for all the other business objects, leading to models that lack clarity, invite misunderstandings, and contain data that is difficult to reuse without making lots of mistakes.
Many events occurring on the different subsystems can change those data: for instance, a change of email, of the address of residence, or of an ID due to renewal.
Events that change business object attributes
How should we consider the attributes coming from different sources when they are related to the same business objects? This is one of the important questions to answer in the context of Master Data Management.
How data quality affects ambiguity
Data quality matters if our organization wants to keep ambiguity away. Degradation and missing quality gates increase the chance of introducing mistakes into the data models of business objects. Consider an unvalidated fiscal code containing a single wrong character: it can easily be matched to another person. A mobile number can be reassigned by the telecom company, so multifactor authentication can reduce such degradation; a fake email or phone number can likewise be validated before being accepted by the system. Different categorizations of products by different departments of the company, without cross-system validation, are another issue. There are tons of such examples.
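A quality gate like the ones described can start as simple syntactic validation at the point of entry. The sketch below checks the shape of an Italian fiscal code and an email address using plain regular expressions; note that this is a format check only, assumed here for illustration, not the full fiscal-code checksum verification:

```python
import re

# Italian fiscal code shape: 6 letters, 2 digits, 1 letter, 2 digits,
# 1 letter, 3 digits, 1 letter (format only, no checksum validation).
FISCAL_CODE_RE = re.compile(r"^[A-Z]{6}\d{2}[A-Z]\d{2}[A-Z]\d{3}[A-Z]$")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def quality_gate(record: dict) -> list:
    """Return a list of quality issues; an empty list means the record passes."""
    issues = []
    if not FISCAL_CODE_RE.match(record.get("fiscal_code", "")):
        issues.append("fiscal_code: bad format")
    email = record.get("email", "")
    if email and not EMAIL_RE.match(email):
        issues.append("email: bad syntax")
    return issues

assert quality_gate({"fiscal_code": "RSSMRA80A01H501U", "email": "a@b.it"}) == []
assert quality_gate({"fiscal_code": "RSSMRA80A01H50", "email": "nope"}) != []
```

Rejecting or quarantining records that fail such a gate keeps single-character mistakes from ever reaching the matching phase, where they could silently merge two different people.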
Golden record
The main objective of a Master Data Management system is to keep an up-to-date version of a Golden Record. This is a clear data model of a business object that integrates attributes coming from all subsystems.
Unification and harmonization of business object data models
Matching phase
Updates from any of the subsystems have to be matched against the Golden Record; that is, the MDM system must detect whether this business object instance has already entered the organization in some shape.
Examples of matching records
The matching phase applies a set of rules of several types to match a Golden Record. Those rules can perform exact or fuzzy matching, meaning that sometimes it is necessary to relax the algorithm to account for poor data quality, typos, and misspellings (like Johnny and Jonny). The same rule can aggregate checks on different attributes at the same time to reinforce a test against the Golden Record.
Given a new business object instance entering the organization, the matching phase checks whether there is at least one Golden Record that satisfies any of the rules, applied in a given order.
This may seem controversial, because many Golden Records could correspond to a business object instance update. In any case, we have to implement a strategy that reduces mistakes, and this is why the order of execution of the rules and the type of match we apply to a new record matter.
Where no match is found, the business object instance is elected as a new Golden Record.
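The matching strategy above — ordered rules, strongest first, with fuzzy fallbacks — can be sketched in a few lines of Python. The rules and attribute names below are hypothetical examples, and `difflib.SequenceMatcher` stands in for whatever fuzzy-matching algorithm a real MDM would use:

```python
from difflib import SequenceMatcher

def fuzzy(a: str, b: str, threshold: float = 0.85) -> bool:
    """Relaxed string comparison, tolerant of typos like Johnny vs Jonny."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

# Ordered rules: exact matches on strong identifiers first, fuzzy fallbacks later.
RULES = [
    lambda new, gr: new.get("fiscal_code") and new["fiscal_code"] == gr.get("fiscal_code"),
    lambda new, gr: new.get("email") and new["email"] == gr.get("email"),
    # A fuzzy rule can aggregate checks on several attributes to reinforce the test.
    lambda new, gr: (fuzzy(new.get("name", ""), gr.get("name", ""))
                     and new.get("birth_date") == gr.get("birth_date")),
]

def match(new: dict, golden_records: list):
    """Return the first Golden Record matched by the highest-priority rule."""
    for rule in RULES:
        for gr in golden_records:
            if rule(new, gr):
                return gr
    return None  # no match: the instance is elected as a new Golden Record

golden = [{"name": "Johnny Doe", "birth_date": "1980-01-01", "email": "jd@x.it"}]
hit = match({"name": "Jonny Doe", "birth_date": "1980-01-01"}, golden)
assert hit is golden[0]  # fuzzy name + exact birth date reinforce each other
```

The rule order encodes the mistake-reduction strategy: a record that carries a strong identifier is matched on that identifier alone, and the fuzzy rule is only reached when nothing stronger applies.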
Merging phase
The second step in constructing a Golden Record is the merging phase. After a match is found for the incoming business object instance, the new record must be merged with the attributes of the Golden Record. Here are some of the criteria to be applied.
Source Reliability
As mentioned, data quality is relevant to the disambiguation process. In this respect, we can associate a score with each attribute of each business object, so that attributes can be ranked by reliability.
For instance, we can give an unvalidated email from the marketing subsystem a score of 1 out of 5, while an email coming from the real estate subsystem gets a score of 5, since it is backed by multifactor authentication.
Last attributes come first
The most recent data are usually considered more reliable because they are up-to-date.
Group attributes by category
Consider not replacing the Last Name from the new business object instance while keeping the Name from the Golden Record: they are substantially a combined attribute, and it does not make sense to treat them separately.
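The three merging criteria above — source reliability, recency, and grouped attributes — can be combined in a single merge function. The sketch below is a minimal illustration under assumed reliability scores; the source names, scores, and attribute groups are hypothetical:

```python
# Hypothetical per-source reliability scores (1 = low, 5 = high).
RELIABILITY = {"marketing": 1, "real_estate": 5}

# Attributes that must be replaced together, never independently.
GROUPS = [("first_name", "last_name")]

def merge(golden: dict, scores: dict, update: dict, source: str):
    """Merge an update into a Golden Record, keeping per-attribute scores."""
    merged, new_scores = dict(golden), dict(scores)
    grouped = {k for g in GROUPS for k in g}
    # 1. Grouped attributes are replaced atomically or not at all.
    for group in GROUPS:
        if all(k in update for k in group):
            for k in group:
                merged[k], new_scores[k] = update[k], RELIABILITY[source]
    # 2. Remaining attributes: newer values win only when the source is
    #    at least as reliable as the one that wrote the current value.
    for key, value in update.items():
        if key in grouped:
            continue
        if RELIABILITY[source] >= scores.get(key, 0):
            merged[key], new_scores[key] = value, RELIABILITY[source]
    return merged, new_scores

golden = {"first_name": "John", "last_name": "Doe", "email": "jd@re.it"}
scores = {"email": 5}  # email written by the MFA-backed real-estate system
merged, scores = merge(golden, scores, {"email": "spam@x.it"}, "marketing")
assert merged["email"] == "jd@re.it"  # a low-reliability source cannot overwrite
```

Because ties go to the newer value (`>=` rather than `>`), an update from an equally reliable source refreshes the attribute, which implements the "last attributes come first" criterion without a separate timestamp comparison.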
Challenges and opportunities
There are many challenges that a Master Data Management system is supposed to solve.
Data Privacy
One of the most relevant is data privacy. Managing consent policies across multiple channels (divisions of the same organization) is very hard. Consider a customer involved in a marketing campaign who allows the organization to use their data only for the submitted survey, while the same customer provides contradictory consents through another channel (branches or agencies). What to do in this case? Worse, think about a group of companies exchanging or sharing customers’ information while having to respect regulations like GDPR and HIPAA. How to deal with different or contradictory consents? How to propagate privacy consents bottom-up and top-down across all subsystems? How to segregate duties among divisions with respect to regulations? None of this is simple, and MDM can really help sort it out. Nevertheless, designing such a system is a complex task.
Data access control
Any organization working with large amounts of data has to deal with controlled data access levels. Data stewards, chief data officers, data/ML engineers, CRM people, IT operations, data governance: they all access data, but with distinct privileges. Thus, data masking and segregation based on roles and duties must be provided at every level, and MDM can drive this complex logic.
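Role-based masking can be expressed as a policy mapping each role to the attributes it may see in the clear, with everything else redacted. A minimal sketch follows; the role names and visible-field sets are hypothetical placeholders, not an actual policy:

```python
# Hypothetical role -> visible-field policy; any field not listed is masked.
POLICIES = {
    "data_steward": {"email", "phone", "fiscal_code"},
    "marketing":    {"email"},
    "it_ops":       set(),
}

def mask(record: dict, role: str) -> dict:
    """Return a copy of the record with non-visible attributes redacted."""
    visible = POLICIES.get(role, set())
    return {k: (v if k in visible else "***") for k, v in record.items()}

record = {"email": "jo@x.it", "phone": "333111", "fiscal_code": "RSSMRA80A01H501U"}
assert mask(record, "marketing") == {
    "email": "jo@x.it", "phone": "***", "fiscal_code": "***",
}
```

Placing this logic in the MDM layer, rather than in each subsystem, is what keeps the masking rules consistent across every division that reads the Golden Records.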
Batch vs real-time
Customers are greedy for technology: they want to buy an insurance policy or get a loan approved and, at the same time, monitor the status of any action from a smartphone in real time. These simple requirements translate into complex subsystem integrations that, in a solution based on batch processes, may take several days to become available to the final user. For instance, purchasing a single loan through internet banking can potentially involve every division in the organization, such as Real Estate, Legal & Compliance, Insurance Services, etc. Such a requirement may demand a shift from batch to real-time architectures, which surely needs a Master Data Management system as a central asset of the organization, and this is far from trivial (see A Data Lake new era).
360 Customer View
A 360 Customer View focuses on analytics of customer data, providing the organization with a customer-centric view. It is strictly related to the customer business model provided by the MDM, and it makes it possible to inspect customers’ needs from their perspective, giving the business one extra chance to do better. Naturally, the MDM supplies a comprehensive collection of information about individuals, whether customers or prospects, thus representing the principal source of trusted information about an individual.
What’s next
This is just an introductory article I’ve written to give you an intuition of what a Master Data Management system has to deal with and of the basic principles behind it. Next time, we will go into more technical and functional details of Master Data Management, digging into some of the mentioned challenges and their solutions. Please share your experience and send us your feedback! Follow us on AgileLab.
Written by Ugo Ciracì – Agile Lab Project Lead
If you found this article useful, take a look at our blog and follow us on our Medium Publication, Agile Lab Engineering!