
Here is a comprehensive summary of Chapter 9: "Solution Architecture Design" from Data Mesh in Action.

Introduction

Chapter 9 shifts focus from high-level principles and organizational structures to the technical implementation of the data mesh. It places the reader in the role of a software architect at Messflix, tasked with designing the technical architecture for data products. The chapter guides the reader through capturing the current state of systems, understanding architectural drivers, and conducting design sessions to create data products from various sources, such as files, monoliths, and data streams.

9.1 Capturing and Understanding the Current State

Before planning any architectural changes, it is essential to understand the existing landscape. Just as an interior designer requires a floor plan before moving walls, a software architect must understand the current system structure to make meaningful decisions.

Defining Software Architecture

The chapter defines software architecture in two ways:

  1. Architecture as Structure: This includes the system's building blocks, their interfaces and relations, applied patterns, technology stacks, and cross-cutting concerns like security. These are the elements that are typically difficult and time-consuming to change.
  2. Architecture as Process: This is the act of translating architectural drivers (requirements) into a design through conscious decision-making.

The C4 Model for Documentation

To document architecture effectively and create a common language across the organization, the authors recommend the C4 model created by Simon Brown. This notation uses four hierarchical levels of abstraction:

  • Level 1: Context. Shows the software system as a black box, illustrating how users and other systems interact with it.
  • Level 2: Container. Zooms in to reveal the "containers" that make up the system, such as applications (web apps, microservices) and data stores. It specifies technologies (e.g., Java, Oracle) and communication protocols.
  • Level 3: Component. Zooms into a specific container to show its internal building blocks or modules.
  • Level 4: Code. Zooms into a component to show class diagrams. The authors note that they often skip this level, preferring self-explanatory code and tests over detailed UML class diagrams.
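
Tool support aside, the first two C4 levels can be captured as simple structured data. The Python sketch below is purely illustrative: the element names borrow from examples elsewhere in the chapter (Hitchcock Movie Maker, the ERP system), but the exact arrangement and technologies are assumptions, not a diagram from the book.

```python
# Purely illustrative: C4 context and container levels captured as plain data.
# Element names and technologies are assumptions, not diagrams from the book.
c4_sketch = {
    "level_1_context": {
        "system": "Hitchcock Movie Maker",
        "users": ["Production team", "Data analysts"],
        "external_systems": ["ERP system"],
    },
    "level_2_containers": [
        {"name": "Movie Maker monolith", "technology": "Java", "protocol": "HTTPS"},
        {"name": "Monolith database", "technology": "Oracle", "protocol": "JDBC"},
        {"name": "Script microservice", "technology": "Java", "protocol": "REST/HTTPS"},
    ],
    # Level 3 would zoom into one container's components; Level 4 (code) is usually skipped.
}
```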

9.2 Understanding Architectural Drivers of a Data Product Design

Architectural drivers are the contextual information required to make design decisions, much like a military commander needs intelligence before a battle. These drivers fall into four categories:

1. Functional Requirements

In the context of data products, functional requirements describe the information needs of data consumers, the value they derive from the data, and their preferred methods of consumption.

2. Quality Attributes (Nonfunctional Requirements)

These are the technical qualities the system must possess. Examples include:

  • Auditability: Persisting data for legal audits (e.g., retaining data for five years).
  • Availability: Uptime requirements for decision-making (e.g., 99.99%).
  • Compliance: Adherence to regulations like GDPR (e.g., ability to remove user data).
  • Interoperability: Using standardized vocabularies or ontologies.
  • Scalability: Handling peak loads (e.g., scaling up for weekend traffic).
  • Security: Restricting access to authorized roles and masking PII.

3. Constraints

These are limitations imposed by the governance body or management, usually independent of the architect's preferences. They include:

  • Time and Budget: Deadlines for delivery.
  • Technology: Mandates to use specific cloud providers (e.g., AWS, GCP) or hosting models.
  • People/Organizational: Limitations based on team skills (e.g., restricting the stack to Java because that is what the team knows).

4. Principles

These are high-level rules set by technical leadership. Examples might include "Cloud-native," "Mature technologies only" (valuing stability over novelty), or "Automate everything" to reduce manual error.

Technique: Capturing Drivers

The authors illustrate the process of capturing drivers using a "Cost Statement" data product.

  • Step 1: Analyze Requirements. The architect reviews the data product canvas. For the Cost Statement product, the source is spreadsheets, and consumers are the ERP system (requiring REST API) and a Financial Analysis data product (requiring CSV files).
  • Step 2: Identify Priorities. Based on the analysis, attributes like scalability and performance are deemed low priority because access is infrequent (daily). However, Privacy (payroll data) and Auditability (financial decisions) are identified as critical.
  • Step 3: The Straw Man Proposal. To define measures of success, the architect proposes an obviously wrong solution (a "straw man") to trigger stakeholder correction. For example, proposing to keep financial data for only "one week" forces stakeholders to clarify that "one year is the minimum, five years is optimal".
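
One lightweight way to record the outcome of such a session is as structured data that travels with the data product. The sketch below captures the Cost Statement drivers from the steps above; the field names and priority wording are assumptions for illustration, not a format prescribed by the book.

```python
from dataclasses import dataclass

# Illustrative record of the Cost Statement drivers; field names are assumptions.
@dataclass
class ArchitecturalDrivers:
    functional: list[str]               # consumer information needs and consumption methods
    quality_attributes: dict[str, str]  # attribute -> required measure / priority
    constraints: list[str]
    principles: list[str]

cost_statement_drivers = ArchitecturalDrivers(
    functional=[
        "Expose cost data sourced from the Production team's spreadsheets",
        "Serve the ERP system via a REST API",
        "Serve the Financial Analysis data product via CSV files",
    ],
    quality_attributes={
        "auditability": "retain data for 5 years (1 year minimum, per the straw man exchange)",
        "privacy": "critical: payroll data must be cleansed/masked",
        "scalability": "low priority: data is accessed roughly daily",
        "performance": "low priority: infrequent access",
    },
    constraints=["Python or Java", "reuse the existing platform"],
    principles=["Platform over custom solutions"],
)
```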

9.3 Designing the Future Architecture

Once the current state is mapped and drivers are understood, the design process begins. The authors emphasize that design should be a collaborative team effort to avoid the "ivory tower architect" syndrome. A typical design session involves brainstorming options, cross-reviewing them using a "Pro-Con-Fix" exercise (listing pros, cons, and mitigations), and making a collective decision.

The chapter presents three distinct design scenarios:

Scenario 1: File-Based Data Product (Spreadsheet)

Context: A Cost Statement data product derived from spreadsheets manually stored by the Production team.
Drivers: Auditability, Privacy, low frequency of access, and constraints to use Python/Java and the existing platform.

  • Option 1 (Selected): Use an Airflow workflow (Python) to read the spreadsheets periodically. The workflow stores raw files for auditability, cleanses sensitive data for privacy, and saves the data into MongoDB for API access and into a shared repository for file access. This leverages the platform (Airflow) and open-source tools; a sketch of this workflow follows the list.
  • Option 2 (Rejected): A custom Java service using PostgreSQL. This was rejected because it failed to leverage the platform (Airflow), breaking the "Platform over custom solutions" principle.
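
A minimal sketch of what the selected workflow could look like, assuming Airflow 2.4+ and hypothetical stub functions for each step; it shows the shape of Option 1 rather than the book's actual implementation.

```python
# Sketch of Option 1 as an Airflow DAG (assumes Airflow 2.4+; task bodies are hypothetical stubs).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_spreadsheets():
    """Read the Production team's spreadsheets and keep raw copies for auditability."""
    ...

def cleanse_sensitive_data():
    """Remove or mask payroll and other sensitive fields to satisfy the privacy driver."""
    ...

def publish_outputs():
    """Write cleansed data to MongoDB (backing the REST API) and CSVs to the shared repository."""
    ...

with DAG(
    dag_id="cost_statement_data_product",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # access is infrequent, so a daily batch run is enough
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_spreadsheets", python_callable=extract_spreadsheets)
    cleanse = PythonOperator(task_id="cleanse_sensitive_data", python_callable=cleanse_sensitive_data)
    publish = PythonOperator(task_id="publish_outputs", python_callable=publish_outputs)

    extract >> cleanse >> publish
```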

Scenario 2: From Monolith and Microservice to Data Product

This scenario focuses on extracting data from "Hitchcock Movie Maker," a system comprising a legacy monolith and microservices.

Case A: Cast Data Product (Complex Extraction)

Context: The "Cast" dataset is locked inside a monolithic database. Consumers include a script recommender (REST API) and analysts (SQL).
Drivers: Strict Privacy/Security/Compliance (PII data), daily consumption (low performance pressure).

Design Pattern: Turning the Database Inside Out

The authors introduce Martin Kleppmann's pattern, "turning the database inside out." This involves exposing a stream of events (Change Data Capture or transaction logs) from a system of record to create a read-optimized view asynchronously.

  • Option 1 (Event-Driven): Expose events directly from the monolith's database tables using Kafka Connect. These events are published to a Kafka topic and consumed by a Java microservice, which builds a read-optimized PostgreSQL database exposed via API. This option decouples the consumer from the monolith but requires transforming the internal schema into a public contract. It creates a "magically self-updating cache".
  • Option 2 (Batch/ETL): Use an Airflow workflow to extract, transform, and load data from the monolith into the data product database.

Tradeoff Analysis: The team compares Option 1 (Loose coupling via event contract) vs. Option 2 (Rapid development via platform reuse). As an architect, one must recognize that every design choice is a tradeoff—exchanging one value for another.
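
To make Option 1 more concrete, the sketch below shows the consumer side of the pattern: a process that reads change events from a Kafka topic and upserts them into a read-optimized PostgreSQL table. The book describes this consumer as a Java microservice; Python (confluent-kafka, psycopg2) is used here only for brevity, and the topic, table, and field names are assumptions.

```python
# Illustrative consumer for "turning the database inside out".
# Topic, table, and column names are hypothetical; the book's consumer is a Java microservice.
import json

import psycopg2
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "cast-data-product",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["monolith.cast-changes"])      # CDC events produced by Kafka Connect

conn = psycopg2.connect("dbname=cast_data_product")

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # Translate the monolith's internal schema into the data product's public contract;
    # PII would be masked or dropped here to satisfy the privacy/compliance drivers.
    event = json.loads(msg.value())
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO cast_members (id, name, role)
            VALUES (%s, %s, %s)
            ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, role = EXCLUDED.role
            """,
            (event["id"], event["name"], event["role"]),
        )
    consumer.commit(msg)
```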

Case B: Scripts Data Product (Simple Extraction)

Context: Exposing script data to a recommender engine.
Drivers: Security is paramount; usage is low frequency. The small number of drivers suggests the KISS (Keep It Simple, Stupid) principle should apply.

  • Option 1 (Module Extension): Extend the existing script microservice with a new module acting as a facade. This creates a "mini-monolith" and tight coupling but is very simple.
  • Option 2 (Independent Service): Create a separate microservice that shares the same database. This resembles the CQRS pattern (Command Query Responsibility Segregation).
  • Option 3 (Database per Service): A separate microservice with its own database, synchronized via events. This offers the loosest coupling but the highest complexity.

Decision: The team selects Option 1 because it adheres to the KISS principle for this specific low-complexity scenario.
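
As an illustration of Option 1, the facade can be as thin as a read-only endpoint added to the existing script service, querying the service's own database. The sketch below uses FastAPI and SQLite purely as stand-ins; the framework, path, and schema are assumptions, and authorization (the dominant driver here) is only hinted at.

```python
# Illustrative facade module inside the existing script service (Option 1).
# Framework, path, and schema are stand-ins; the real service and its database differ.
import sqlite3

from fastapi import FastAPI

app = FastAPI()

@app.get("/data-product/scripts")
def list_scripts(limit: int = 100):
    """Read-only access for the recommender engine; auth/role checks omitted for brevity."""
    with sqlite3.connect("scripts.db") as conn:
        rows = conn.execute(
            "SELECT id, title, genre FROM scripts ORDER BY id LIMIT ?", (limit,)
        ).fetchall()
    return [{"id": r[0], "title": r[1], "genre": r[2]} for r in rows]
```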

Scenario 3: Stream and Batch Processing

Context: A "Fraud Analysis" data product needs to detect fraud using data from the Streaming Player, Subscription system, and Customer Support. Drivers: Real-time response (to catch fraud red-handed) and Scalability (to handle uneven traffic peaks).

Design Choices: The architecture depends heavily on the response-time requirement. If near real-time is required, stream processing is necessary. If a delay is acceptable, batch processing is simpler.

Techniques for Exposing Streams as Data Products:

  1. Extract from Files: Use a workflow engine (like Airflow) to process logs from object storage and publish them as events.
  2. The Outbox Pattern: Also known as the Transactional Outbox. When a system saves its state to the database, it also saves an event to a specific "outbox" table within the same transaction. A separate process (message relay) reads this table and publishes events to a message broker. This ensures consistency between the database state and the event stream.
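
A minimal sketch of the outbox pattern's two halves, assuming PostgreSQL (psycopg2) and Kafka (confluent-kafka); the table, topic, and column names are illustrative, not taken from the book.

```python
# Illustrative outbox pattern: business state and event are written in one transaction,
# and a separate relay publishes pending outbox rows to the message broker.
import json

import psycopg2
from confluent_kafka import Producer

def record_playback(conn, user_id, movie_id):
    """Save the state change and its event atomically (same database transaction)."""
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO playbacks (user_id, movie_id) VALUES (%s, %s)",
            (user_id, movie_id),
        )
        cur.execute(
            "INSERT INTO outbox (topic, payload) VALUES (%s, %s)",
            ("player.playback-events", json.dumps({"user_id": user_id, "movie_id": movie_id})),
        )
    # Both inserts commit together, keeping database state and event stream consistent.

def relay_outbox(conn, producer):
    """Message relay: publish unpublished outbox rows, then mark them as sent."""
    with conn, conn.cursor() as cur:
        cur.execute("SELECT id, topic, payload FROM outbox WHERE published = FALSE ORDER BY id")
        for row_id, topic, payload in cur.fetchall():
            producer.produce(topic, payload.encode("utf-8"))
            cur.execute("UPDATE outbox SET published = TRUE WHERE id = %s", (row_id,))
    producer.flush()

conn = psycopg2.connect("dbname=streaming_player")
producer = Producer({"bootstrap.servers": "localhost:9092"})
record_playback(conn, user_id=42, movie_id=7)
relay_outbox(conn, producer)
```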

Data Lakes and Data Mesh

The chapter concludes by clarifying that a data mesh does not exclude data lakes. They are complementary. In a data mesh using data lake technology:

  • Data Product Teams own their specific buckets/folders/namespaces within the lake and the pipelines that transform the data.
  • Platform Teams own the underlying data lake technology/infrastructure itself.
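
Purely as an illustration of where those boundaries might sit, the ownership split could be written down as configuration consumed by the platform team's provisioning tooling; the bucket layout and team names below are assumptions.

```python
# Illustrative ownership mapping for a data mesh built on data lake technology.
# Bucket/prefix layout and team names are hypothetical.
lake_ownership = {
    "platform-team": {
        "owns": ["data lake infrastructure", "object storage", "access management"],
    },
    "cost-statement-team": {
        "prefix": "s3://messflix-lake/cost-statement/",
        "owns": ["raw spreadsheet copies", "cleansed cost data", "transformation pipelines"],
    },
    "fraud-analysis-team": {
        "prefix": "s3://messflix-lake/fraud-analysis/",
        "owns": ["fraud events", "stream and batch processing jobs"],
    },
}
```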

Summary of Key Takeaways

  • Architecture Definition: It is both the structure of the system and the process of designing it based on drivers.
  • Documentation: The C4 model provides a standardized way to visualize architecture at context, container, and component levels.
  • Drivers: Design is guided by functional requirements, quality attributes (non-functionals), constraints, and principles.
  • Collaboration: Design should be inclusive, utilizing tools like "Straw man proposals" and "Pro-Con-Fix" lists to reach consensus.
  • Patterns:
    • Turning the database inside out: Using event streams to create read-optimized views.
    • Outbox Pattern: Ensuring consistent event publishing from systems of record.
    • CQRS: Separating read and write models for optimization.
  • Integration: Data mesh principles can coexist with data lakes, provided ownership boundaries are respected.