SlideShare ist ein Scribd-Unternehmen logo
1 von 37
Downloaden Sie, um offline zu lesen
Data Tech Talk
ft. external speaker
Chris Riccomini, Author, Engineer & Manager, …
1. Talk: Data Warehousing Trends
2. Open Dialogue: Q & A
Data
Warehousing
Trends 2021/12/09 · Chris Riccomini
Hi, I’m Chris
● Engineer & Manager
WePay, LinkedIn, PayPal
● Open Source
Apache Samza, Apache Airflow, Debezium
● Author
The Missing README
● Investor & Advisor
Prefect, Meroxa, StarTree, Amundsen, Anomalo, TopCoat, ...
@criccomini
The Trends
● Realtime DWHs
● Analytics Engineering
● Data Mesh
● Data Catalogs
● Reverse ETL
● Headless BI
● Data Integrity
● Data Lakehouses
● DataOps
● White-label Data Viz
https://twitter.com/criccomini/status/1451557884769169412
https://preset.io/blog/reshaping-data-engineering/
The Trends
● Realtime DWHs
● Analytics Engineering
● Data Mesh
● Data Catalogs
● Reverse ETL
● Headless BI
● Data Integrity
● Data Lakehouses
● DataOps
● White-label Data Viz
https://twitter.com/criccomini/status/1451557884769169412
https://preset.io/blog/reshaping-data-engineering/
You are here
Realtime DWHs
Batch ETL
Pubsub
Realtime DWH
Realtime DWH
Why Realtime DWHs?
● Debugging
○ Investigate application errors
○ Audit log shows how things changed
● Operational
○ Monitoring
○ Scripts that pull from DWH
● Security/Compliance
○ Audit log
● Customer data products
○ Ad hoc customer reports (e.g. Stripe Sigma, WePay txns)
○ Data clean rooms
Realtime DWHs Technical Advantages
● Handles hard deletes
● No schema requirements (timestamps)
● Replay from Kafka
● Data integration
Realtime DWH Drawbacks
● Operationally complex
● Depends on source DB support (for CDC)
● Inline transformation is harder
● Fixing bad data is harder
Data Mesh
“A data mesh is a type of data platform architecture that embraces the ubiquity of
data in the enterprise by leveraging a domain-oriented, self-serve design.”
● Domain-oriented decentralized data ownership and architecture
● Data as a product
● Self-serve data infrastructure as a platform
● Federated computational governance
Data Mesh
https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0
“A data mesh is a type of data platform architecture that embraces the ubiquity of
data in the enterprise by leveraging a domain-oriented, self-serve design.”
● Domain-oriented decentralized data ownership and architecture
● Data as a product
● Self-serve data infrastructure as a platform
● Federated computational governance
Data Mesh
https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0
wat.
“A product is any item or service you sell to serve a customer's need or want.”
Data is a Product
https://www.aha.io/roadmapping/guide/product-management/what-is-a-product
● Customers
○ Data scientists
○ Business analysts
○ Finance
○ Sales
○ Product managers
○ Engineers
○ External customers
● Products
○ Recommender systems
○ Billing
○ Fraud
○ Reports
○ Dashboards
Data is a Product
https://cnr.sh/essays/what-the-heck-data-mesh
● Versioned
● Compatible
● Documented
● Monitored
● Self-serve
● Secure (AuthN, AuthZ)
Treat Data Models like APIs
https://cnr.sh/essays/what-the-heck-data-mesh
● Microservice
● DevOps
We’ve Done This Before
https://cnr.sh/essays/what-the-heck-data-mesh
Headless BI
Metrics then
● BI tools to create and visualize metrics
○ Looker
○ Mode
○ Tableau
○ Data Studio
● Answer internal business questions
○ How is a product's health?
○ What does revenue look like?
Metrics now
● Metrics matter for external business workflows
○ Predicting when a customer might churn
○ Notifying users when they reach their capacity limit
○ DS wants to create models to optimize certain metrics
○ Computing customer bills
● BI tools aren’t meant for this
○ Walled garden
○ Have to re-implement the same metrics in different systems
“For data consumption, we heard complaints from decision makers that different
teams reported different numbers for very simple business questions, and
there was no easy way to know which number was correct.”
"...the teams that own metrics would be able to define them once, in a way
that’s consistent across dashboards, automation tools, sales reporting, and so
on. Let’s call this ‘Headless BI’."
Headless BI
https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
https://basecase.vc/blog/headless-bi
Headless BI
● Programmatically manage business metrics
○ Automated
○ Centralized
○ Documented
○ Validated
○ Metadata/lineage
○ Backfills
○ Cost
○ Privacy
○ Access
○ Deprecation
○ Retention
https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
Headless BI
https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
Q&A
● Realtime DWHs
● Analytics Engineering
● Data Mesh
● Data Catalogs
● Reverse ETL
● Headless BI
● Data Integrity
● Data Lakehouses
● DataOps
● White-label Data Viz
Appendix
Analytics Engineering
Analytics engineers provide clean data sets to end users, modeling data in a way
that empowers end users to answer their own questions.
While a data analyst spends their time analyzing data, an analytics engineer
spends their time transforming, testing, deploying, and documenting data.
Analytics engineers apply software engineering best practices like version
control and continuous integration to the analytics code base.
https://www.getdbt.com/what-is-analytics-engineering
Analytics Engineering
● Job
○ Building
○ Testing
○ Cataloging
● Tools
○ DBT
○ Airflow
● Customers
○ Data science
○ Data analysts
○ BI
○ Reporting
Data Catalogs
● Flavor of the month
○ Amundsen
○ DataHub
○ Metaphor
○ Marquez
○ Atlan
○ Collibra
○ Alation
● Use cases
○ Discoverability
○ Operations
○ Governance
“Reverse ETL syncs data from a system of records like a warehouse to a system
of actions like CRM, MAP, and other SaaS apps to operationalize data.”
Reverse ETL
https://blog.getcensus.com/what-is-reverse-etl/
Reverse ETL
https://blog.getcensus.com/what-is-reverse-etl/
● https://unsplash.com/photos/Lbvi0GGJWY4
● https://unsplash.com/photos/8vTAAFYhFfQ
● https://www.flickr.com/photos/jurgenappelo/5201275209
●
Photos
Realtime DWHs

Weitere ähnliche Inhalte

Was ist angesagt?

Data Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical ApproachesData Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical ApproachesDATAVERSITY
 
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdfChris Hoyean Song
 
RWDG Slides: What is a Data Steward to do?
RWDG Slides: What is a Data Steward to do?RWDG Slides: What is a Data Steward to do?
RWDG Slides: What is a Data Steward to do?DATAVERSITY
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best PracticesBoris Otto
 
How to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleHow to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleDATAVERSITY
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...DATAVERSITY
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
User Case of Migration from MicroStrategy to Power BI
 User Case of Migration from MicroStrategy to Power BI User Case of Migration from MicroStrategy to Power BI
User Case of Migration from MicroStrategy to Power BIGreenM
 
Power BI Architecture
Power BI ArchitecturePower BI Architecture
Power BI ArchitectureArthur Graus
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureDATAVERSITY
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshConfluentInc1
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data GovernanceTuba Yaman Him
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 

Was ist angesagt? (20)

Data Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical ApproachesData Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical Approaches
 
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
 
RWDG Slides: What is a Data Steward to do?
RWDG Slides: What is a Data Steward to do?RWDG Slides: What is a Data Steward to do?
RWDG Slides: What is a Data Steward to do?
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best Practices
 
How to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleHow to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at Scale
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Big Data Security and Governance
Big Data Security and GovernanceBig Data Security and Governance
Big Data Security and Governance
 
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
User Case of Migration from MicroStrategy to Power BI
 User Case of Migration from MicroStrategy to Power BI User Case of Migration from MicroStrategy to Power BI
User Case of Migration from MicroStrategy to Power BI
 
Power BI Architecture
Power BI ArchitecturePower BI Architecture
Power BI Architecture
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data Mesh
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data Governance
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 

Ähnlich wie Data Warehousing Trends

Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseDatabricks
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like ProductsVMware Tanzu
 
DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.ZaraaTitima1
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017SingleStore
 
How To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with HadoopHow To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with HadoopMammoth Data
 
The Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with VirtualizationThe Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with VirtualizationInside Analysis
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as ProductDATAVERSITY
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsLooker
 
Introduction to Big Data using AWS Services
Introduction to Big Data using AWS ServicesIntroduction to Big Data using AWS Services
Introduction to Big Data using AWS ServicesAnjani Phuyal
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution  Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution Sirinporn Setworaya
 
Digital Operations Service Design
Digital Operations Service DesignDigital Operations Service Design
Digital Operations Service DesignNVISIA
 
No Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDBNo Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDBMongoDB
 
20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science OrientationDuc Lai Trung Minh
 
Intro to Neo4j
Intro to Neo4jIntro to Neo4j
Intro to Neo4jNeo4j
 
SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365Brian Culver
 
Big Data Analytics with Microsoft
Big Data Analytics with MicrosoftBig Data Analytics with Microsoft
Big Data Analytics with MicrosoftCaserta
 

Ähnlich wie Data Warehousing Trends (20)

Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like Products
 
DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
How To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with HadoopHow To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with Hadoop
 
The Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with VirtualizationThe Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with Virtualization
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
 
Introduction to Big Data using AWS Services
Introduction to Big Data using AWS ServicesIntroduction to Big Data using AWS Services
Introduction to Big Data using AWS Services
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution  Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution
 
Digital Operations Service Design
Digital Operations Service DesignDigital Operations Service Design
Digital Operations Service Design
 
No Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDBNo Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDB
 
20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation
 
Intro to Neo4j
Intro to Neo4jIntro to Neo4j
Intro to Neo4j
 
SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365
 
Paving The Way To Data Driven
Paving The Way To Data DrivenPaving The Way To Data Driven
Paving The Way To Data Driven
 
Big Data Analytics with Microsoft
Big Data Analytics with MicrosoftBig Data Analytics with Microsoft
Big Data Analytics with Microsoft
 

Mehr von Chris Riccomini

What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)Chris Riccomini
 
The Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSFThe Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSFChris Riccomini
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInChris Riccomini
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInChris Riccomini
 
Building Applications on YARN
Building Applications on YARNBuilding Applications on YARN
Building Applications on YARNChris Riccomini
 

Mehr von Chris Riccomini (6)

What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)
 
The Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSFThe Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSF
 
Airflow at WePay
Airflow at WePayAirflow at WePay
Airflow at WePay
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
 
Building Applications on YARN
Building Applications on YARNBuilding Applications on YARN
Building Applications on YARN
 

Kürzlich hochgeladen

BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesPrabhanshu Chaturvedi
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 

Kürzlich hochgeladen (20)

BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 

Data Warehousing Trends

  • 1. Data Tech Talk ft. external speaker Chris Riccomini, Author, Engineer & Manager, … 1. Talk: Data Warehousing Trends 2. Open Dialogue: Q & A
  • 3. Hi, I’m Chris ● Engineer & Manager WePay, LinkedIn, PayPal ● Open Source Apache Samza, Apache Airflow, Debezium ● Author The Missing README ● Investor & Advisor Prefect, Meroxa, StarTree, Amundsen, Anomalo, TopCoat, ... @criccomini
  • 4.
  • 5. The Trends ● Realtime DWHs ● Analytics Engineering ● Data Mesh ● Data Catalogs ● Reverse ETL ● Headless BI ● Data Integrity ● Data Lakehouses ● DataOps ● White-label Data Viz https://twitter.com/criccomini/status/1451557884769169412 https://preset.io/blog/reshaping-data-engineering/
  • 6. The Trends ● Realtime DWHs ● Analytics Engineering ● Data Mesh ● Data Catalogs ● Reverse ETL ● Headless BI ● Data Integrity ● Data Lakehouses ● DataOps ● White-label Data Viz https://twitter.com/criccomini/status/1451557884769169412 https://preset.io/blog/reshaping-data-engineering/
  • 13. Why Realtime DWHs? ● Debugging ○ Investigate application errors ○ Audit log shows how things changed ● Operational ○ Monitoring ○ Scripts that pull from DWH ● Security/Compliance ○ Audit log ● Customer data products ○ Ad hoc customer reports (e.g. Stripe Sigma, WePay txns) ○ Data clean rooms
  • 14. Realtime DWHs Technical Advantages ● Handles hard deletes ● No schema requirements (timestamps) ● Replay from Kafka ● Data integration
  • 15. Realtime DWH Drawbacks ● Operationally complex ● Depends on source DB support (for CDC) ● Inline transformation is harder ● Fixing bad data is harder
  • 17. “A data mesh is a type of data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design.” ● Domain-oriented decentralized data ownership and architecture ● Data as a product ● Self-serve data infrastructure as a platform ● Federated computational governance Data Mesh https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0
  • 18. “A data mesh is a type of data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design.” ● Domain-oriented decentralized data ownership and architecture ● Data as a product ● Self-serve data infrastructure as a platform ● Federated computational governance Data Mesh https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0 wat.
  • 19. “A product is any item or service you sell to serve a customer's need or want.” Data is a Product https://www.aha.io/roadmapping/guide/product-management/what-is-a-product
  • 20. ● Customers ○ Data scientists ○ Business analysts ○ Finance ○ Sales ○ Product managers ○ Engineers ○ External customers ● Products ○ Recommender systems ○ Billing ○ Fraud ○ Reports ○ Dashboards Data is a Product https://cnr.sh/essays/what-the-heck-data-mesh
  • 21. ● Versioned ● Compatible ● Documented ● Monitored ● Self-serve ● Secure (AuthN, AuthZ) Treat Data Models like APIs https://cnr.sh/essays/what-the-heck-data-mesh
  • 22. ● Microservice ● DevOps We’ve Done This Before https://cnr.sh/essays/what-the-heck-data-mesh
  • 24. Metrics then ● BI tools to create and visualize metrics ○ Looker ○ Mode ○ Tableau ○ Data Studio ● Answer internal business questions ○ How is a product's health? ○ What does revenue look like?
  • 25. Metrics now ● Metrics matter for external business workflows ○ Predicting when a customer might churn ○ Notifying users when they reach their capacity limit ○ DS wants to create models to optimize certain metrics ○ Computing customer bills ● BI tools aren’t meant for this ○ Walled garden ○ Have to re-implement the same metrics in different systems
  • 26. “For data consumption, we heard complaints from decision makers that different teams reported different numbers for very simple business questions, and there was no easy way to know which number was correct.” "...the teams that own metrics would be able to define them once, in a way that’s consistent across dashboards, automation tools, sales reporting, and so on. Let’s call this ‘Headless BI’." Headless BI https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70 https://basecase.vc/blog/headless-bi
  • 27. Headless BI ● Programmatically manage business metrics ○ Automated ○ Centralized ○ Documented ○ Validated ○ Metadata/lineage ○ Backfills ○ Cost ○ Privacy ○ Access ○ Deprecation ○ Retention https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
  • 29. Q&A ● Realtime DWHs ● Analytics Engineering ● Data Mesh ● Data Catalogs ● Reverse ETL ● Headless BI ● Data Integrity ● Data Lakehouses ● DataOps ● White-label Data Viz
  • 31. Analytics Engineering Analytics engineers provide clean data sets to end users, modeling data in a way that empowers end users to answer their own questions. While a data analyst spends their time analyzing data, an analytics engineer spends their time transforming, testing, deploying, and documenting data. Analytics engineers apply software engineering best practices like version control and continuous integration to the analytics code base. https://www.getdbt.com/what-is-analytics-engineering
  • 32. Analytics Engineering ● Job ○ Building ○ Testing ○ Cataloging ● Tools ○ DBT ○ Airflow ● Customers ○ Data science ○ Data analysts ○ BI ○ Reporting
  • 33. Data Catalogs ● Flavor of the month ○ Amundsen ○ DataHub ○ Metaphor ○ Marquez ○ Atlan ○ Collibra ○ Alation ● Use cases ○ Discoverability ○ Operations ○ Governance
  • 34. “Reverse ETL syncs data from a system of records like a warehouse to a system of actions like CRM, MAP, and other SaaS apps to operationalize data.” Reverse ETL https://blog.getcensus.com/what-is-reverse-etl/
  • 36. ● https://unsplash.com/photos/Lbvi0GGJWY4 ● https://unsplash.com/photos/8vTAAFYhFfQ ● https://www.flickr.com/photos/jurgenappelo/5201275209 ● Photos