5. Need to process data from
• Multiple sources
• Different data stores and locations
• Different formats
Traditional solutions: ETL data into a data warehouse, …
Traditional Data Warehouses
ETL
Slow to access and combine data
Data Warehouse
7. Process data in place or stream it
• No need to wait for data to be ETLed
JIT Data Warehouse
ETL
Data Warehouse
8. Process data in place or stream it
• No need to wait for data to be ETLed
• Cache data in memory or SSDs
JIT Data Warehouse
Low latency and easy to combine data: value!
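A minimal sketch of the idea on this slide, in plain Python rather than Spark: each source stays in its native format, is parsed only when a query first touches it, and is then kept in memory. The source names, the sample records, and the `load` and `clicks_per_user` helpers are all hypothetical and exist only for illustration.

```python
import csv
import io
import json
from functools import lru_cache

# Hypothetical raw sources, left in their native formats. In practice these
# would be files or services in different stores, not inline strings.
PROFILES_CSV = "user_id,name\n1,Ada\n2,Lin\n"
CLICKS_JSONL = (
    '{"user_id": 1, "page": "/home"}\n'
    '{"user_id": 1, "page": "/cart"}\n'
    '{"user_id": 2, "page": "/home"}\n'
)

# Counts how often each source is actually parsed, to make caching visible.
LOAD_COUNTS = {"profiles": 0, "clicks": 0}


@lru_cache(maxsize=None)
def load(source):
    """Parse a source on first access and keep the parsed rows in memory."""
    LOAD_COUNTS[source] += 1
    if source == "profiles":
        return tuple(csv.DictReader(io.StringIO(PROFILES_CSV)))
    if source == "clicks":
        return tuple(json.loads(line) for line in CLICKS_JSONL.splitlines())
    raise KeyError(source)


def clicks_per_user():
    """Join the CSV profiles with the JSON click log at query time."""
    names = {int(row["user_id"]): row["name"] for row in load("profiles")}
    counts = {}
    for click in load("clicks"):
        name = names[click["user_id"]]
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Running `clicks_per_user()` twice parses each source only once; the second call is served from the in-memory cache, which is the role memory or SSD caching plays on the slide.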
10. Analogy
[Figure: "ETL & Query" (Data Source A, Data Source B, ETL, Data Warehouse) vs. "Stream/Cache + Query" (Data Source A, Data Source B)]
11. Top-3 Media Company
Data sources
• Traditional data warehouse: Customer transaction and profile data
• S3: Clickstream and historical logs
• Elasticsearch: User-submitted reviews and comments
• Kafka: Streaming online event data
Build Spark-based JIT Data Warehouse to perform real-time analytics
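To make the shape of such a query concrete, here is a rough stand-in written with the Python standard library only. It is not Spark code and not the company's implementation: an in-memory SQLite table plays the warehouse, a JSON-lines string plays the S3 logs, and a generator plays the Kafka topic. The Elasticsearch source is left out for brevity. All table names, fields, and records are invented for the example.

```python
import itertools
import json
import sqlite3


def warehouse_customers():
    """Stand-in for the traditional warehouse: an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, plan TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "premium"), (2, "free")],
    )
    rows = conn.execute("SELECT id, plan FROM customers").fetchall()
    conn.close()
    return dict(rows)  # maps customer id to plan


def clickstream_logs():
    """Stand-in for historical log objects in S3, stored as JSON lines."""
    raw = '{"customer": 1, "video": "a"}\n{"customer": 2, "video": "a"}\n'
    return [json.loads(line) for line in raw.splitlines()]


def live_events():
    """Stand-in for a Kafka topic: yields events one at a time."""
    yield {"customer": 1, "video": "b"}
    yield {"customer": 1, "video": "a"}


def views_by_plan():
    """Combine historical and live events with warehouse profile data."""
    plans = warehouse_customers()
    totals = {}
    # Historical logs and live events flow through the same aggregation,
    # so no separate ETL step is needed before the query can run.
    for event in itertools.chain(clickstream_logs(), live_events()):
        plan = plans[event["customer"]]
        totals[plan] = totals.get(plan, 0) + 1
    return totals
```

In a Spark-based setup, each stand-in function would presumably be replaced by a connector for the corresponding store, while the join and aggregation logic would keep the same structure.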
16. Large On-line Service Company
Leverages
• Interactive query processing
• ML
and combines data from S3, Redshift, and HBase to provide
• data analytics for product management team
• advanced predictive analytics to deliver new services (e.g.,
customized inventory displays tailored to each user)