Unifying Data Warehousing
with Data Lakes
Ali Ghodsi, Co-Founder & CEO
Oct 25, 2017
Many enterprises are undergoing
a data transformation
Databricks Customers Across Industries
Financial Services, Healthcare & Pharma, Media & Entertainment, Technology,
Public Sector, Retail & CPG, Consumer Services, Energy & Industrial IoT, Marketing & AdTech,
Data & Analytics Services
Health care AI cloud dataset use case
Correlate the EMRs of 50,000 patients
with their DNA
Enterprise AI use case
Provide recommendations to sales
using NLP and deep learning
Real-time AI use case
Curb abusive behavior
across gamers globally
Big Data was the Missing Link for AI
BIG DATA
Customer Data
Emails/Web pages
Click Streams
Sensor data (IoT)
Video/Speech
…
Most companies are Struggling with Big Data
GREAT RESULTS
Hardest part of AI isn’t AI
“Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015
The hardest part of AI is Big Data
ML Code
Building Predictive Applications
is really Hard!
Unified Analytics Platform
UNIFIED EXPERIENCE ACROSS TEAMS
UNIFIED PROCESSING ENGINE
The Evolution
of Big Data
The Era of the
Data Warehouse
Data Warehouse (DW)
THE GOOD
• Pristine Data
• Fast Queries
• Transactional
THE BAD
• Expensive to Scale, not Elastic
• Requires ETL, Stale Data, No Real-Time
• No Predictions, No ML
• Closed formats (lock in)
Not Future Proof – Missing Predictions, Real-time, Scale
ETL important data to central DW and get Business Intelligence (BI)
The Era of the
Data Lake
THE BAD
• Inconsistent Data
• Unreliable for Analytics
• Lack of Schema
• Poor Performance
Hadoop Data Lake
Became a cheap, messy data store with poor performance
ETL all data to central scalable open lake for all use cases
THE GOOD
• Massive scale
• Inexpensive Storage
• Open Formats (Parquet, ORC)
• Promise of ML & Real Time Streaming
The Current State
of Data Platforms
Info Sec at a Fortune 100 Company
DISADVANTAGES OF ARCHITECTURE
• Poor agility in responding to new threats
• Scale Limitations, no historical data
• 6 months and 20 people to build
ENTERPRISE DATA WAREHOUSE
• Only 2 weeks of data
• Very expensive to scale
• Proprietary Formats
• No Predictions (ML)
Messy data not ready for analytics
Billions of records a day
HADOOP DATA LAKE
Complex ETL
EDW
EDW
EDW
Incident Response
Alerting
Reports
The Next Generation
Data Platform
First UNIFIED data management system that delivers:
The SCALE of data lake
The LOW-LATENCY of streaming
The RELIABILITY & PERFORMANCE of data warehouse
Announcing Databricks Delta
The SCALE of data lake
The LOW-LATENCY of streaming
The RELIABILITY & PERFORMANCE of data warehouse
Databricks Delta
Enables Predictions, Real-time and Ad Hoc
Analytics at Massive Scale
THE GOOD
OF DATA LAKES
• Massive scale on Amazon S3
• Open Formats (Parquet, ORC)
• Predictions (ML) & Real Time Streaming
THE GOOD
OF DATA WAREHOUSES
• Pristine Data
• Transactional Reliability
• Fast Queries (10-100x)
Databricks Delta Under the Hood
• Decouple Compute & Storage
• ACID Transactions & Data Validation
• Data Indexing & Caching (10-100x)
• Real-Time Streaming Ingest
MASSIVE SCALE
RELIABILITY
PERFORMANCE
LOW-LATENCY
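The slide lists ACID transactions and data validation without showing a mechanism. As a rough illustration only, and not a description of how Databricks Delta is implemented (the deck does not say), one common way to get atomic commits over immutable data files is an ordered commit log: a batch of files becomes visible only when its log entry is appended, and validation runs before that point. The class `ToyTableLog`, its methods, and the schema format below are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTableLog:
    """Hypothetical in-memory commit log; not a real Delta API."""
    schema: dict                                  # column name -> Python type
    commits: list = field(default_factory=list)   # one dict (file -> rows) per commit

    def _validate(self, rows):
        # Reject the batch if any row breaks the declared schema.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise ValueError(f"{col}={row[col]!r} is not {typ.__name__}")

    def commit(self, files):
        """Atomically add several data files: all become visible, or none."""
        for rows in files.values():
            self._validate(rows)           # runs before anything is visible
        self.commits.append(dict(files))   # the single append is the commit point
        return len(self.commits) - 1       # version number of this commit

    def snapshot(self, version=None):
        """Rows visible at a version (default: latest committed version)."""
        last = len(self.commits) if version is None else version + 1
        return [r for c in self.commits[:last] for rows in c.values() for r in rows]


log = ToyTableLog(schema={"user": str, "bytes": int})
log.commit({"part-0": [{"user": "a", "bytes": 10}]})
try:
    # One bad row fails the whole two-file commit, so part-1 is not visible either.
    log.commit({"part-1": [{"user": "b", "bytes": 5}],
                "part-2": [{"user": "c", "bytes": "oops"}]})
except ValueError as err:
    print("commit rejected:", err)
print(len(log.snapshot()))
```

Because readers only ever see whole commits, a failed write leaves earlier snapshots untouched, which is the property the "RELIABILITY" label on the slide points at.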
Info Sec with Databricks Delta
DATABRICKS DELTA
powered by DATABRICKS RUNTIME
Trillion Records a Day
ETL, Schema Validation
SQL, ML, Stream
ADVANTAGES
• AI-capable data warehouse at the scale of a data lake
• Interactive analysis on 2 years of data
• 2 weeks to build with a 5-person data platform team
Unified Analytics Platform
UNIFIED EXPERIENCE ACROSS TEAMS
Notebooks, Dashboards, Reports
Unified Analytics Platform
+
UNIFIED DATA MANAGEMENT
Reliable Transactions, Performance
UNIFIED EXPERIENCE ACROSS TEAMS
Notebooks, Dashboards, Reports
Demo by Michael Armbrust
Evolution of a Cutting-Edge Data Pipeline
[Diagram labels: Events, ?, Streaming Analytics, Data Lake, Reporting]
Evolution of a Cutting-Edge Data Pipeline
[Diagram labels: Events, Streaming Analytics, Data Lake, Reporting]
Challenge #1: Historical Queries?
[Diagram labels: Events, λ-arch, Streaming Analytics, Data Lake, Reporting]
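The λ-arch label refers to the lambda architecture, in which a query over history has to combine a periodically recomputed batch view with a real-time view of the events that arrived after the last batch run. A minimal sketch of that merge, assuming the query is a per-key event count; the function name `merged_counts` and the sample events are illustrative and do not come from the talk.

```python
from collections import Counter


def merged_counts(events, batch_cutoff):
    """Answer a historical count query the lambda-architecture way.

    events: list of (timestamp, key) pairs
    batch_cutoff: timestamp up to which the batch layer has processed
    """
    # Batch layer: complete but stale (events before the cutoff).
    batch_view = Counter(key for ts, key in events if ts < batch_cutoff)
    # Speed layer: fresh but partial (events at or after the cutoff).
    speed_view = Counter(key for ts, key in events if ts >= batch_cutoff)
    # Serving step: every query has to combine both views.
    return batch_view + speed_view


events = [(1, "login"), (2, "login"), (5, "error"), (9, "login")]
result = merged_counts(events, batch_cutoff=5)
print(dict(result))
```

The cost this sketch hints at is that the same logic lives in two code paths (batch and streaming) that must agree with each other.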
Challenge #2: Messy Data?
[Diagram labels: Events, λ-arch, Validation, Streaming Analytics, Data Lake, Reporting]
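The Validation boxes in this diagram stand for a step that keeps malformed records out of downstream tables. A small sketch of one way to do that, assuming records are dictionaries and the expected schema is a mapping from field name to type; `split_valid` and the sample records are hypothetical and are not taken from the talk.

```python
def split_valid(records, required):
    """Separate records that satisfy `required` (field -> type) from the rest."""
    valid, rejected = [], []
    for rec in records:
        # A missing field yields None, which fails the isinstance check.
        ok = all(isinstance(rec.get(name), typ) for name, typ in required.items())
        (valid if ok else rejected).append(rec)
    return valid, rejected


raw = [
    {"ts": 1, "user": "a"},
    {"ts": "yesterday", "user": "b"},   # wrong type for ts
    {"user": "c"},                      # ts missing
]
valid, rejected = split_valid(raw, {"ts": int, "user": str})
print(len(valid), len(rejected))
```

Keeping the rejected records, rather than dropping them, leaves room to inspect and reprocess them later.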
Challenge #3: Mistakes and Failures?
[Diagram labels: Events, λ-arch, Validation, Reprocessing, Partitioned, Streaming Analytics, Data Lake, Reporting]
Challenge #4: Query Performance?
[Diagram labels: Events, λ-arch, Validation, Reprocessing, Compaction, Partitioned, Compact Small Files, Scheduled to Avoid Compaction, Streaming Analytics, Data Lake, Reporting]
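The "Compact Small Files" label refers to rewriting many small files into fewer large ones so that queries open fewer files. A sketch of how such a job might choose what to merge, assuming a simple greedy grouping by size; `plan_compaction`, the file names, and the byte sizes are made up for illustration and do not describe any particular product.

```python
def plan_compaction(file_sizes, target_bytes):
    """Group small files into batches of at most `target_bytes` each.

    file_sizes: dict of file name -> size in bytes
    Returns a list of batches; each batch is a list of file names to merge.
    Files already at or above the target are left alone.
    """
    batches, current, current_size = [], [], 0
    small = sorted(n for n, s in file_sizes.items() if s < target_bytes)
    for name in small:
        size = file_sizes[name]
        if current and current_size + size > target_bytes:
            if len(current) > 1:        # a lone file gains nothing from rewriting
                batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if len(current) > 1:
        batches.append(current)
    return batches


sizes = {"f1": 40, "f2": 30, "f3": 50, "f4": 200, "f5": 10}
plan = plan_compaction(sizes, target_bytes=100)
print(plan)
```

A real job would also have to coordinate with concurrent readers and writers, which is the scheduling concern the diagram alludes to.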
Let’s try it instead with DELTA
The Canonical Data Pipeline
[Diagram labels: Events, λ-arch, Validation, Reprocessing, Compaction, Partitioned, Compact Small Files, Scheduled to Avoid Compaction, Streaming Analytics, Data Lake, Reporting]
The Delta Architecture
[Diagram labels: DELTA, DATA LAKE, Streaming Analytics, Reporting]
The LOW-LATENCY of streaming
The RELIABILITY & PERFORMANCE of data warehouse
The SCALE of data lake
Sign up for the Private Beta
visit databricks.

Introducing Databricks Delta
