175: The Parts, Pieces, and Future of Composable Data Systems, Featuring Wes McKinney, Pedro Pedreira, Chris Riccomini, and Ryan Blue

Feb 01, 2024

This week on The Data Stack Show, Eric and Kostas chat with a panel of experts: Wes McKinney (Co-Founder, Voltron), Ryan Blue (Co-Founder and CEO, Tabular), Chris Riccomini (Seed Investor, Various Startups), and Pedro Pedreira (Software Engineer, Meta), who share their thoughts on composable data stacks. During the conversation, the group discusses the importance of open standards and APIs for efficient interoperability in data management systems, the evolution of data workloads, the need for specialization, and the challenges in building composable components. The conversation also covers the significance of an intermediate representation (IR) for decoupling the layers of data systems, the complexities of data types, and the desire for more secure data sharing methods. The panelists explore the evolution of open standards and the trade-offs between composable and monolithic systems, and express excitement about new data infrastructure projects and technologies, modular execution engines, new query interfaces, standardizing policy decisions across different data management platforms, and more.

Highlights from this week’s conversation include:

Introduction of the panel (0:05)
Defining composable data stack (5:22)
Components of a composable data stack (7:49)
Challenges and incentives for composable components (10:37)
Specialization and modularity in data workloads (13:05)
Organic evolution of composable systems (17:50)
Efficiency and common layers in data management systems (22:09)
The IR and data computation (23:00)
Components of the storage layer (26:16)
Decoupling language and execution (29:42)
Apache Calcite and modular frontend (36:46)
Data types and coercion (39:27)
Describing data sets and schema (42:00)
Open standards and frontiers (46:22)
Challenges of standardizing APIs (48:15)
Trade-offs in building composable systems (54:04)
Evolution of data system composability (56:32)
Exciting new projects in data systems (1:01:57)
Final thoughts and takeaways (1:17:25)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
