Why do we need Data Products?
As data teams got caught up in defining, building, and maintaining the process (data infrastructures, pipelines, and architectures), that work progressively ate into the time they could spend on the core data and data applications. What this meant was:
- Limited data applications to power customer-facing endpoints or business decisions.
- Faulty and untrustworthy data as a result of increasing debt in the data infra layers and never enough data engineers to manage it.
- Long and loopy path from data to insights that led to the loss of several business opportunities.
- High time to ROI and low ROI of data teams due to complex builds and fixer-uppers that hogged significant resources and ate into whatever ROI was generated in the first place.
TLDR: Data Products establish a direct connection between data investments and tangible business outcomes, a bridge that was starkly missing from the data landscape before.
What is a Data Product?
A data product is a reliable unit of data or a container of data that enables a direct and seamless impact on business decisions and outcomes at the time of opportunity.
Simplicity or Seamless Impact
One of the challenges businesses face is wading through the complexities of data to mine insights and patterns that they can only half-heartedly rely on. A data product aims to remove this barrier, since it is born out of decoupled physical and logical layers where control of the data narrative lies with the business. From the perspective of product thinking, a data product is an enhanced user experience of data.
Data Unit or a Container
A data unit is the most fundamental physical or logical unit of data that can independently add value to the end user. While physical units are consumed directly, logical units act as on-the-fly channels for materializing physical units, powering intelligent data movement and saving movement costs in the process.
Direct Business Impact
The ideal objective of any data stack, data team, or data initiative should be to create valuable data that actually uplifts business objectives. While this end vision is often lost under the task of maintaining heaps of complex subsystems, the data product brings the target back to the forefront and solves for it right-to-left, that is, starting from the business outcome and working back to the data.
Reliable Unit
Nobody would use data they cannot trust, and even when they do because options are limited, they inevitably bear the burden of engaging high-cost dedicated teams and resources to trace the lineage behind the unstable business use case.
Operationalized at the time of opportunity
The most critical factor in business is time, especially given the rate at which business moves today. Infrastructure riddled with leaky pipelines that need constant fixes steals time away from actually producing business-relevant insights. A data product, therefore, is data that is available on demand with quality and governance enforced on the fly.
Discoverable
Data discoverability is perhaps the most important feature behind data operationalization. If you cannot find the data, you are blocked at the very first stage. Discoverability implies reusability, which is why the data product approach encourages uniqueness and discourages duplication. Duplicating data or data product pipelines is a step toward yet another data swamp. Discoverability is powered by a global metadata manager that sits in a central control plane and has visibility over the entire data ecosystem across distributed user planes.
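To make the idea of a central metadata manager concrete, here is a minimal sketch of a catalog that registers each data product once and lets consumers search by tag. The names `ProductEntry`, `Catalog`, and the dotted address strings are assumptions made up for this illustration, not the API of any particular platform.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProductEntry:
    # Hypothetical metadata record; the field names are illustrative.
    address: str
    owner: str
    tags: frozenset = field(default_factory=frozenset)


class Catalog:
    """Toy stand-in for a central metadata manager."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Reject duplicates so every product is registered exactly once.
        if entry.address in self._entries:
            raise ValueError(f"already registered: {entry.address}")
        self._entries[entry.address] = entry

    def search(self, tag):
        # Addresses of all products carrying the tag, sorted for stable output.
        return sorted(a for a, e in self._entries.items() if tag in e.tags)


catalog = Catalog()
catalog.register(ProductEntry("sales.orders.v1", "sales", frozenset({"orders", "revenue"})))
catalog.register(ProductEntry("finance.invoices.v1", "finance", frozenset({"revenue"})))
found = catalog.search("revenue")
print(found)
```

Refusing a second registration under the same address is one simple way to express the anti-duplication stance in code; a real metadata manager would track far more than tags.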
Secure
Data products, being on-demand value providers, are inherently embedded with security protocols that activate on the fly based on the demand center. They must be governed centrally (pseudo-federated) through fundamental access and data policies that are enforced at the asset, row, and column levels. The uniqueness of data products makes the security component even more important, since the data is addressable from multiple endpoints and must adhere to the set standards and compliance requirements on every materialization channel.
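Enforcement at the asset, row, and column levels can be sketched in a few lines. The policy shape below (a `row_filter` predicate and a `masked` column set per role) and the role name `eu_analyst` are assumptions for illustration only; real platforms express such policies in their own formats.

```python
def apply_policy(rows, policy, role):
    """Filter rows and mask columns according to the rules for a role."""
    rules = policy.get(role)
    if rules is None:
        # Asset level: a role without any rule cannot read the asset at all.
        raise PermissionError(f"role {role!r} has no access")
    # Row level: keep only the rows the role is allowed to see.
    visible = [r for r in rows if rules["row_filter"](r)]
    # Column level: mask sensitive columns in the rows that remain.
    return [
        {k: ("***" if k in rules["masked"] else v) for k, v in r.items()}
        for r in visible
    ]


rows = [
    {"region": "EU", "customer": "Ada", "email": "ada@example.com"},
    {"region": "US", "customer": "Bob", "email": "bob@example.com"},
]
policy = {
    "eu_analyst": {"row_filter": lambda r: r["region"] == "EU", "masked": {"email"}},
}
result = apply_policy(rows, policy, "eu_analyst")
print(result)
```

Because the policy is applied at read time rather than baked into a copy of the data, the same product can serve different roles without being duplicated.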
Addressable
Addressable data is available as a standard asset across cross-functional, regional, and multi-cloud environments. Being addressable means having a common and unique address that adheres to the organization's or domain's convention and is accessible through code as well as low-code/no-code channels.
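As a rough illustration of an address convention, the sketch below validates and parses addresses of the form `dp://<domain>/<product>/v<version>`. This scheme is invented for the example; the article does not prescribe any specific format.

```python
import re

# Hypothetical convention: dp://<domain>/<product>/v<major version>
ADDRESS_RE = re.compile(
    r"^dp://(?P<domain>[a-z][a-z0-9_]*)/(?P<product>[a-z][a-z0-9_]*)/v(?P<version>\d+)$"
)


def parse_address(address):
    """Split an address into its parts, rejecting anything off-convention."""
    match = ADDRESS_RE.match(address)
    if match is None:
        raise ValueError(f"address does not follow the convention: {address}")
    parts = match.groupdict()
    parts["version"] = int(parts["version"])
    return parts


parsed = parse_address("dp://sales/orders/v2")
print(parsed)
```

Validating addresses at registration time keeps them uniform, so a script, a BI tool, or a no-code interface can all resolve the same string to the same asset.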
Trustworthy
Trustworthy data meets quality expectations, complies with required standards, and comes with end-to-end lineage and provenance details. Businesses should be able to rely on the data they demand meeting their expectations. This is possible declaratively: a dedicated system manages the quality and governance expectations while business teams focus completely on operationalizing the served data.
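The declarative approach can be sketched as a list of expectations stated as plain data, with a small generic routine that checks them. The check names (`not_null`, `min`) and the column names are assumptions chosen for this example, not a standard vocabulary.

```python
# Expectations are declared as data; the business states *what* must hold.
expectations = [
    {"column": "order_id", "check": "not_null"},
    {"column": "amount", "check": "min", "value": 0},
]

# The system owns *how* each kind of check is evaluated.
CHECKS = {
    "not_null": lambda v, exp: v is not None,
    "min": lambda v, exp: v is not None and v >= exp["value"],
}


def validate(rows, expectations):
    """Return (row index, column, check) for every violated expectation."""
    failures = []
    for i, row in enumerate(rows):
        for exp in expectations:
            if not CHECKS[exp["check"]](row.get(exp["column"]), exp):
                failures.append((i, exp["column"], exp["check"]))
    return failures


rows = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": None, "amount": -3.0},
]
failures = validate(rows, expectations)
print(failures)
```

Separating the declared expectations from the checking logic is what lets business teams adjust the former without touching the latter.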
Natively Accessible
Natively accessible data is language-, format-, personnel-, and system-agnostic. Enablers of this data product attribute could include domain-specific languages (DSLs) to navigate the low-level subsystems, the capability of the system to understand a whole range of programming languages and SQL dialects, and the ability to adopt new ones through native transpilers.
Interoperable
Data generated or sourced from different systems should be able to work together without conflicts. This is possible through data APIs or logical constructs and loosely coupled yet tightly integrated components that sit on top of unique and addressable data to ensure visibility across the required use cases with end-to-end on-the-fly governance. Interoperability also requires universal semantics that are manageable through a central control plane.
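One small piece of this, shared semantics, can be illustrated with a mapping that translates source-specific field names into common terms. The source names (`crm`, `billing`) and field names here are hypothetical, and a real semantic layer would cover much more than renaming.

```python
# Hypothetical central mapping from source-specific fields to shared terms.
SEMANTIC_MAP = {
    "crm": {"cust_id": "customer_id", "mail": "email"},
    "billing": {"customerId": "customer_id", "emailAddress": "email"},
}


def to_shared_terms(source, record):
    """Rename a record's fields to the shared vocabulary for its source."""
    mapping = SEMANTIC_MAP[source]
    # Fields without a mapping keep their original name.
    return {mapping.get(k, k): v for k, v in record.items()}


crm_row = to_shared_terms("crm", {"cust_id": 7, "mail": "ada@example.com"})
billing_row = to_shared_terms(
    "billing", {"customerId": 7, "emailAddress": "ada@example.com"}
)
print(crm_row == billing_row)
```

Once both records use the same terms, they can be joined or compared without per-source special cases.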
Valuable on its own
Data is valuable on its own if it acts as a complete and independent entity that directly impacts business decisions. This requires the data to adhere to universal semantics, business quality assertions, and use-case-specific governance standards.
Examples of Data Products
- A spreadsheet in an S3 bucket
- A table view knitted from across heterogeneous sources
- A report generated from an analytics dashboard
- A metrics layer on top of a data model
- Features in an ML feature store
- A database of dynamic rules in a self-driving car
- An encrypted or masked dataview displayed through a SQL interface
How is a Data Product materialized?
To deep-dive into each component from the above architecture, feel free to refer to the Data Mesh Implementation of a Data Operating System.