To understand the meaning of the next two points, “Self-serve data infrastructure as a platform” and “Federated computational governance”, let’s draw a parallel with a platform type that “changed the game” in the past. It is an over-simplification, but bear with us. Imagine you work in Logistics and you need to book a hotel for your company’s sales meeting. You have a few requirements (enough rooms available, price within budget, conference room on premise, easy parking) and a list of “nice-to-have” (half board or restaurant so you don’t have to book a catering for lunch, shuttle to and from the airport for those who fly in, not too far from the airport of from the city center). Your company Data Lake contains all the data you need: it contains all the hotels in the city where the meeting will take place. But so did the “Yellow Pages” of yesteryear. How long does it take to find a suitable hotel using such a directory? It’s basically unknowable: if the Abbey Hotel satisfies all requirements, not too long, but good luck getting to the Zephir Hotel near the end of the list. Not only they are sorted in a pre-determined way (alphabetically), but you have to check each one in sequence by calling on the phone if you want to know availability, price and so on. Moreover, the quantity of information available for each hotel directly in the Yellow Pages is wildly inconsistent. It would be nice to be able to rule out a number of them, so you don’t have to call, but some hotel bought big spaces where they write if they have a conference room or a restaurant, other just list a phone number. If you also need to check the quality of an establishment, to avoid sending the Director of Sales to a squalid dump, the complexity grows exponentially, like the uncertainty of how long it will take to figure it out. Maybe you are lucky and find a colleague who’s already been there and can vouch for it, or you’d have to drive to the place yourself. When you budget the company sale’s meeting how do you figure out how much it will cost, in terms of time and money, just finding the right hotel? And if not having an answer wasn’t bad enough, the Sales Director, who’s paying for the convention, is fuming mad because this problem repeats itself every single year and you can never give an estimate, because the time it took you to find a hotel last time is not at all indicative of how long it will take next time.
If you now change the word “hotel” with “data” in the above example, you’ll see that it might be a tad extreme, but it is not so far-fetched. A Data Mesh is, in this sense, like a booking platform (think hotels.com or booking.com) where every data producer becomes a data owner and, just like a hotel owner, wants to be found by those who use the platform. In order to be listed, the federated governance, imposes some rules. The hotel owner must list a price (cost) and availability (uptime), as well as a structured description of the hotel (dataset) that includes the address, category and so forth, and all the required metadata (is there parking? a restaurant? a pool? breakfast is included? and so on). Each of these features becomes easily visible (it is always present and shown in the same place), searchable and can act as a filter. People who stay at the hotel (use the dataset) also leave reviews, which help both other customers in choosing and the hotel owner in improving the quality of its offer or at least the accuracy of the description.
The “self-serve” aspect is two-fold. From a user perspective it means that with such a platform the Sales department can choose and book the hotel directly, without needing (and paying for) the help of Logistics (Data Lake engineers). From an owner perspective (hotel or data owner) it means that they can independently choose and advertise what services to offer (rooms with air conditioning, Jacuzzis, butler service and so on) in order to meet and even exceed the customers’ wishes and demands. In the data world this second aspect relates to the freedom of data producers to autonomously choose their technology path, in accordance with the federated governance approved standards.
Last, but definitely not least, the Data Mesh architecture brings to the table the ease of scalability (once you have all the Hotels/datasets in one city, the system can grow to accommodate those of other cities, as well) and reuse. Reuse means that the effort you spent in creating a solution can, at least in part, be employed (reused) to create another. Let’s stick to the hotel analogy. If you created it last year and now want to do something similar for B&Bs, there is a lot you don’t have to redo from scratch. Of course, the “metadata” will be different (Bed and Breakfast don’t have conference rooms), but you can still use the same system of user feedback, the same technology to gather information on prices and availability, that once again will be up to the B&B’s owner to keep up to date, and so on.