Data Mesh

Prologue: Imagine Data Mesh

Chapter 1. Data Mesh in a Nutshell

Data mesh is a new approach to sourcing, managing, and accessing data for analytical use cases at scale. Architecturally, it shifts data collection away from centralized warehouses and lakes into a distributed "mesh" of connected data products. These data products freely share data through contracts agreed upon by people considered experts in their domain. There is no central orchestrator.

Fundamental Principles of a Data Mesh:
  1. Decentralized domain ownership - the people closest to the business domain own and manage the data
  2. Data as a product - each domain owner has the organizational responsibility of sharing its data with any of its users, such as analysts or data scientists
  3. Self-serve data platform - services are exposed for consumption by cross-functional teams or technology generalists
  4. Federated data governance - decisions on how to standardize storage and access are made by representatives of the different domains to ensure interoperability, compliance, security, etc.

Data Mesh Diagram

Chapter 2. Principle of Domain Ownership

A data mesh does not enforce a single source of truth for data, and you should not expect it to have one. Domain data can be either analytical or operational.

Chapter 3. Principle of Data as a Product

For data to be a product, it has to have these 4 characteristics for its users:

  1. Feasible
  2. Valuable
  3. Usable (Easily accessed and consumed)
  4. FAIR (Findable, Accessible, Interoperable, Reusable)

Chapter 4. Principle of the Self-Serve Data Platform

Chapter 5. Principle of Federated Computational Governance

Federated computational governance is a decision-making model led by a federation of domain data product owners and data platform product owners. Its members keep their autonomy and domain-local decision-making power while creating and adhering to a set of global rules that apply to all data products and their interfaces. It addresses what is considered one of the most common mistakes of today’s data governance: being an IT initiative with an organizational model that runs parallel to the business instead of being embedded within it.

To manage its operation, the federated group defines the following operating elements:

  1. Federated Team - The governance group is a cross-functional team of representatives from the domains, as well as platform experts and subject matter experts from security, compliance, legal, etc.
  2. Guiding Values - The value system that guides how decisions are made and what their scope of influence is: global across domains or local within a single domain.
  3. Policies - Standards that specify what "good" looks like and how to ensure it is maintained.
  4. Incentives - These help domain representatives balance their local priorities against global ones.
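
To make the "computational" part of governance concrete, here is a minimal sketch of policies expressed as code and checked automatically against every data product. Everything in it is hypothetical and for illustration only: the `GovernedProduct` fields, the two example policies, and the `check` function are assumptions, not part of any data mesh standard or tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class GovernedProduct:
    """Hypothetical metadata a data product might publish about itself."""
    name: str
    domain: str
    owner: str = ""
    pii_columns: List[str] = field(default_factory=list)
    encrypted_columns: List[str] = field(default_factory=list)


# A policy takes a product and returns a list of violation messages.
Policy = Callable[[GovernedProduct], List[str]]


def has_owner(product: GovernedProduct) -> List[str]:
    # Example global rule (assumed): every data product names an owner.
    return [] if product.owner else [f"{product.name}: no owner declared"]


def pii_is_encrypted(product: GovernedProduct) -> List[str]:
    # Example global rule (assumed): every PII column must be encrypted.
    missing = sorted(set(product.pii_columns) - set(product.encrypted_columns))
    return [f"{product.name}: PII column '{c}' is not encrypted" for c in missing]


# Global policies apply to all data products across all domains.
GLOBAL_POLICIES: List[Policy] = [has_owner, pii_is_encrypted]


def check(product: GovernedProduct, local_policies: Sequence[Policy] = ()) -> List[str]:
    """Run global policies first, then any domain-local ones."""
    violations: List[str] = []
    for policy in [*GLOBAL_POLICIES, *local_policies]:
        violations.extend(policy(product))
    return violations
```

The split between `GLOBAL_POLICIES` and `local_policies` mirrors the global versus domain-local scope described in the guiding values: the federated team agrees on the former, while each domain remains free to add the latter.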

Chapter 6. The Inflection Point

Chapter 7. After the Inflection Point

Chapter 8. Before the Inflection Point

Dark Data - information organizations collect but fail to use

Chapter 9. The Logical Architecture

Given the complexity of the data mesh platform and its wide range of capabilities, its logical architecture is composed of multiple (platform) planes. Each plane is a logical collection of capabilities with complementary objectives. It abstracts infrastructure complexity and offers a set of interfaces (APIs) for the capabilities it implements.

Chapter 10. The Multiplane Data Platform Architecture

Chapter 11. Design a Data Product by Affordances

Common characteristics of a Data Product Architecture

  1. Designed for change - it responds gracefully to changes (to data model, access, or type) and can evolve over time
  2. Designed for scale - the architecture can scale out as more data products are added
  3. Designed for value - it delivers value to its data consumers

Chapter 12. Design Consuming, Transforming, and Serving Data

There are 3 basic functions that all data products implement:

1. Serve Data

Bitemporal History is a pattern for storing data in which 2 separate timestamps are kept for each record: 1 for when the record was stored (processed date) and another for when the corresponding value takes effect (effective date). This allows past records to be corrected without losing historical data. For example, the table below records an employee's salary, with an increase recorded on March 25 for the salary effective from February 25.

processed_date  effective_date  salary
Jan 25          Jan 25          6000
Feb 25          Jan 25          6000
Mar 25          Jan 25          6000
Feb 25          Feb 25          6000
Mar 25          Feb 25          6500
Mar 25          Mar 25          6500
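
The salary table can be queried "as of" a given processed date to see what was known at that time. The following is a minimal sketch, assuming the dates are encoded as month numbers (Jan = 1, Feb = 2, Mar = 3) since only their ordering matters here. The `SalaryRecord` type and the `salary_as_of` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SalaryRecord:
    processed: int  # when the row was stored (month number, assumed encoding)
    effective: int  # period the salary value applies to (month number)
    salary: int


# The rows of the salary table, with Jan = 1, Feb = 2, Mar = 3.
HISTORY: List[SalaryRecord] = [
    SalaryRecord(1, 1, 6000),
    SalaryRecord(2, 1, 6000),
    SalaryRecord(3, 1, 6000),
    SalaryRecord(2, 2, 6000),
    SalaryRecord(3, 2, 6500),  # correction recorded in March for February
    SalaryRecord(3, 3, 6500),
]


def salary_as_of(history: List[SalaryRecord], effective: int, known_at: int) -> Optional[int]:
    """Return the salary for `effective` as it was known at `known_at`.

    Only rows processed on or before `known_at` are considered, and the
    most recently processed one wins. Returns None if nothing was known yet.
    """
    candidates = [
        r for r in history
        if r.effective == effective and r.processed <= known_at
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.processed).salary
```

With this data, `salary_as_of(HISTORY, effective=2, known_at=2)` returns 6000, while `salary_as_of(HISTORY, effective=2, known_at=3)` returns 6500. The February value was corrected in March, yet the earlier view is still reproducible because no row was overwritten.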

2. Consume Data

3. Transform Data
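
As a rough illustration of how the three functions fit together, here is a minimal in-memory sketch. The `SimpleDataProduct` class and its method names are hypothetical; a real data product would consume from and serve to actual storage or streaming interfaces rather than Python lists.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


class SimpleDataProduct:
    """Toy data product that consumes, transforms, and serves rows."""

    def __init__(self, name: str, transform: Callable[[Row], Row]) -> None:
        self.name = name
        self._transform = transform
        self._inputs: List[Row] = []
        self._outputs: List[Row] = []

    def consume(self, rows: Iterable[Row]) -> None:
        # Consume Data: accept rows from an upstream source.
        self._inputs.extend(rows)

    def run(self) -> None:
        # Transform Data: apply the product's own logic to each consumed row.
        self._outputs = [self._transform(row) for row in self._inputs]

    def serve(self) -> List[Row]:
        # Serve Data: hand out a copy so consumers cannot mutate internal state.
        return list(self._outputs)


# One product's served output becomes another product's consumed input.
orders = SimpleDataProduct("orders", lambda r: {**r, "total": r["qty"] * r["price"]})
orders.consume([{"qty": 2, "price": 5}])
orders.run()

revenue = SimpleDataProduct("revenue", lambda r: {"total": r["total"]})
revenue.consume(orders.serve())
revenue.run()
```

Chaining `orders.serve()` into `revenue.consume(...)` is meant to show how data products connect to each other directly, without a central orchestrator.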

Chapter 15. Strategy and Execution

The primary criteria for adopting data mesh are the intrinsic business complexity and proliferation of data sources and data use cases.

Chapter 16. Organization and Culture