© Cloudera, Inc. All rights reserved.
More Data in Less Time
Deploying an Operational Data Store with Cloudera

Trends in the Market
Trends Driving Change
Internet of Things: 16 billion connected devices generating more data (Source: Forbes)
Data Storage Costs: “It will soon be technically feasible & affordable to record & store everything…” (Source: New York Times)
Resource Intensive ELT: ELT drives up to 80% of database capacity (Source: Syncsort)

Customers are augmenting their traditional architectures for modern business needs.

Operational Data Store (ODS):
Ingesting, storing, and preparing data for both operational and analytical use.
(AKA: Operational Data Warehouse, RDBMS, Storage)

ODS Use Cases
ETL Offload: Offload resource-intensive ETL workloads from systems
EDW Optimization: Migrate old data and ELT workloads off of EDW
Active Archive: Store old data online so analysts can access historic data

Goals of an Operational Data Store
Ingest Data, Store Data, Prepare Data
[Diagram: Traditional Architecture. Labels: Data Sources, Structured, Unstructured, Applications, Ingest, Operational Data Store, ETL, Storage #1, Storage #2, Storage N, Process, Load, Enterprise Data Warehouse, ELT, Archive, Serve, BI System, Modeling, Reporting]

Challenges with a Traditional Architecture
1) Limited Data Ingest
2) Inefficient Data Processing
3) Data Archived
[Diagram: Traditional Architecture, with numbered callouts 1 to 3 added across three build slides]

A New Way Forward
1) Ingest More Data
2) Optimize Data Processing
3) Automated Secure Archive
[Diagram: Modern Architecture, with numbered callouts 1 to 3 added across three build slides. Labels: Data Sources, Structured, Unstructured, Applications, Operational Data Store, EDH, Ingest, Active, Structured Data, Serve, ELT, Archive, Load, ETL, Enterprise Data Warehouse, BI System, Modeling, Reporting]

RelayHealth Customer Story

About RelayHealth (A McKesson Business)
What does RelayHealth do?
RelayHealth is a financial solution of McKesson used to automate 2.4 billion financial transactions per year
200K Physicians, 2K Hospitals, 1.9K Payers/Health Plans
Who is McKesson?
Largest healthcare solution company in the world with $103+ billion in revenue
Headquartered in San Francisco and established in 1833
32K employees

RelayHealth’s Objectives
ETL Offload: Offload resource-intensive ETL workloads from systems
EDW Optimization: Migrate old data and ELT workloads off of EDW
Active Archive: Store old data online so analysts can access historic data

The Pre-Hadoop Environment
Challenges
1 Deleted & archived information
2 Batch wasn’t cutting it
3 Application & report latency
[Diagram: RelayHealth Transaction BATCH Processing System, with numbered callouts 1 to 3 added across three build slides. Labels: Claim Submitters, Various Applications, OLTP, RDBMS, EDW, Reports, Archive]

RelayHealth’s Modern Hadoop Architecture
Improvements
1 Active archive on Hadoop
2 Stream & batch processing
3 Prepared for future use cases
[Diagram: RelayHealth Transaction Processing System, Traditional BATCH Processing versus Hadoop STREAM Processing, with numbered callouts 1 to 3 added across three build slides. Stage labels: Ingest, Process, Store, Access. Component labels: Kafka, Spark Streaming, HBase, Search, Spark. Other labels: Claim Submitters, Payer, Application, Reports, Modeling]
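
The contrast this slide draws between batch and stream processing can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it does not use Kafka, Spark Streaming, or HBase, and the `Claim` record, the function names, and the arrival times are all hypothetical. It compares how long each record waits before it becomes visible when everything is processed in one end-of-day batch versus when each record is processed on arrival.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Claim:
    """Hypothetical claim record (not taken from the slides)."""
    claim_id: str
    arrived_at: int  # arrival time, in seconds since the start of the day


def batch_visible_at(claims: List[Claim], batch_run_at: int) -> Dict[str, int]:
    """Batch mode: every claim becomes visible only when the batch job runs."""
    return {c.claim_id: batch_run_at for c in claims}


def stream_visible_at(claims: List[Claim], per_record_delay: int = 0) -> Dict[str, int]:
    """Stream mode: each claim becomes visible shortly after it arrives."""
    return {c.claim_id: c.arrived_at + per_record_delay for c in claims}


def wait_times(claims: List[Claim], visible_at: Dict[str, int]) -> List[int]:
    """Seconds each claim waits between arrival and visibility."""
    return [visible_at[c.claim_id] - c.arrived_at for c in claims]


# Three illustrative claims arriving at 00:00, 01:00, and 12:00.
claims = [Claim("a", 0), Claim("b", 3_600), Claim("c", 43_200)]
end_of_day = 86_400  # assumed batch window: one run at the end of the day

batch_waits = wait_times(claims, batch_visible_at(claims, end_of_day))
stream_waits = wait_times(claims, stream_visible_at(claims))

print(batch_waits)   # [86400, 82800, 43200]
print(stream_waits)  # [0, 0, 0]
```

In batch mode the wait depends on how early in the window a record arrives; in stream mode it depends only on the per-record processing delay, which is set to zero here for simplicity.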

Business and Technical ROI
Technology ROI
Business ROI
1) Active archive and Navigator for HIPAA compliance
2) Prepared for future use cases
3) Data ingest goes from end of day to near real-time
1) Transactions processed in 20 ms vs. 1 hour previously
2) $250k in licensing and hardware savings per year
3) Greater flexibility with data ingest
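
As a rough back-of-the-envelope check on the figures quoted in this deck, the following sketch works out what the 20 ms versus 1 hour claim means as a ratio, and what 2.4 billion transactions per year (from the About RelayHealth slide) means as an average rate. The assumptions are mine, not the presenters': a 365-day year and a perfectly even load, which real traffic will not follow.

```python
MS_PER_HOUR = 60 * 60 * 1000            # 3,600,000 ms in one hour
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000 s, ignoring leap years

latency_before_ms = 1 * MS_PER_HOUR     # "1 hour" per transaction before
latency_after_ms = 20                   # "20 ms" per transaction after
speedup = latency_before_ms / latency_after_ms

transactions_per_year = 2_400_000_000   # figure from the About RelayHealth slide
avg_tx_per_second = transactions_per_year / SECONDS_PER_YEAR

print(f"Latency reduction: {speedup:,.0f}x")                        # 180,000x
print(f"Average load: {avg_tx_per_second:.1f} transactions/second")  # 76.1
```

The average rate is a lower bound on what the system must handle, since peak load is typically well above the yearly mean.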

Key Learnings
Crawl, walk, run
It takes time, start now
Lean on experts in the community

Thank you

Breakout: Hadoop and the Operational Data Store

  • 1. 1© Cloudera, Inc. All rights reserved. More Data in Less Time Deploying an Operational Data Store with Cloudera
  • 2. 2© Cloudera, Inc. All rights reserved. Trends in the Market 16 billion connected devices generating more data “It will soon be technically feasible & affordable to record & store everything…” ELT drives up to 80% of database capacity Internet of Things Data Storage Costs Resource Intensive ELT Trends Driving Change Source: Forbes Source: New York Times Source: Syncsort
  • 3. 3© Cloudera, Inc. All rights reserved. Customers are augmenting their traditional architectures for modern business needs.
  • 4. 4© Cloudera, Inc. All rights reserved. Operational Data Store (ODS): Ingesting, storing, and preparing data for both operational and analytical use. (AKA: Operational Data Warehouse., RDBMS, Storage)
  • 5. 5© Cloudera, Inc. All rights reserved. ODS Use Cases Offload resource intensive ETL workloads from systems Migrate old data and ELT workloads off of EDW Store old data online so analyst can access historic data ETL Offload EDW Optimization Active Archive
  • 6. 6© Cloudera, Inc. All rights reserved. Goals of an Operational Data Store Ingest Data Store DataPrepare Data Enterprise Data Warehouse ApplicationsData Sources Structured Unstructured Ingest Operational Data Store Traditional Architecture Enterprise Data Warehouse ServeELT Archive BI System Modeling Reporting ETL Storage #1 Storage #2 Storage N Ingest Process Load
  • 7. 7© Cloudera, Inc. All rights reserved. Challenges with a Traditional Architecture 1) Limited Data Ingest Enterprise Data Warehouse ApplicationsData Sources Structured Unstructured Ingest Operational Data Store Traditional Architecture Enterprise Data Warehouse ServeELT Archive BI System Modeling Reporting ETL Storage #1 Storage #2 Storage N Ingest Process Load 1
  • 8. 8© Cloudera, Inc. All rights reserved. Challenges with a Traditional Architecture 1) Limited Data Ingest 2) Inefficient Data Processing Enterprise Data Warehouse ApplicationsData Sources Structured Unstructured Ingest Operational Data Store Traditional Architecture Enterprise Data Warehouse ServeELT Archive BI System Modeling Reporting ETL Storage #1 Storage #2 Storage N Ingest Process Load 1 2 2
  • 9. 9© Cloudera, Inc. All rights reserved. Challenges with a Traditional Architecture 1) Limited Data Ingest 2) Inefficient Data Processing 3) Data Archived Enterprise Data Warehouse ApplicationsData Sources Structured Unstructured Ingest Operational Data Store Traditional Architecture Enterprise Data Warehouse ServeELT Archive BI System Modeling Reporting ETL Storage #1 Storage #2 Storage N Ingest Process Load 1 2 2 3
  • 10. 10© Cloudera, Inc. All rights reserved. A New Way Forward 1) Ingest More Data ApplicationsData Sources Structured Unstructured Operational Data Store Modern Architecture Enterprise Data Warehouse EDHIngest Active Structured Data Serve Serve ELT Archive Load 1 ETL BI System Modeling Reporting
  • 11. 11© Cloudera, Inc. All rights reserved. A New Way Forward 1) Ingest More Data 2) Optimize Data Processing ApplicationsData Sources Structured Unstructured Operational Data Store Modern Architecture Enterprise Data Warehouse EDHIngest Active Structured Data Serve Serve ELT Archive Load 2 1 ETL BI System Modeling Reporting
  • 12. 12© Cloudera, Inc. All rights reserved. A New Way Forward 1) Ingest More Data 2) Optimize Data Processing 3) Automated Secure Archive ApplicationsData Sources Structured Unstructured Operational Data Store Modern Architecture Enterprise Data Warehouse EDHIngest Active Structured Data Serve Serve ELT Archive Load 2 31 ETL BI System Modeling Reporting
  • 13. 13© Cloudera, Inc. All rights reserved. RelayHealth Customer Story
  • 14. 14© Cloudera, Inc. All rights reserved. About RelayHealth (A McKesson Business) What does RelayHealth do- RelayHealth is a financial solution of McKesson used to automate 2.4 billion financial transactions per year 200K Physicians, 2K Hospitals, 1.9K Payers/ Health Plans Who is McKesson- Largest healthcare solution company in the world with $103+ billion in revenue Headquarters in San Francisco and established in 1833 32K employees
  • 15. 15© Cloudera, Inc. All rights reserved. RelayHealth’s Objectives Offload resource intensive ETL workloads from systems Migrate old data and ELT workloads off of EDW Store old data online so analyst can access historic data ETL Offload EDW Optimization Active Archive
  • 16. 16© Cloudera, Inc. All rights reserved. The Pre-Hadoop Environment 1 Deleted & archived information Challenges OLTP Claim Submitters Various Applications RDBMS EDW Reports Archive 1 RelayHealth Transaction BATCH Processing System
  • 17. 17© Cloudera, Inc. All rights reserved. The Pre-Hadoop Environment 1 Deleted & archived information Challenges OLTP Claim Submitters Various Applications RDBMS EDW Reports Archive 2 Batch wasn’t cutting it 1 2 RelayHealth Transaction BATCH Processing System
  • 18. 18© Cloudera, Inc. All rights reserved. The Pre-Hadoop Environment 1 Deleted & archived information Challenges OLTP Claim Submitters Various Applications RDBMS EDW Reports Archive 2 Batch wasn’t cutting it 3 Application & report latency 1 3 3 2 3 RelayHealth Transaction BATCH Processing System
  • 19. 19© Cloudera, Inc. All rights reserved. RelayHealth’s Modern Hadoop Architecture Active archive on Hadoop1 Improvements Traditional BATCH Processing Hadoop STREAM Processing Process Payer Application Reports Spark Streaming Claim Submitters RelayHealth Transaction Processing System Ingest Store Access Kafka Hbase Search Spark Modeling 1
  • 20. 20© Cloudera, Inc. All rights reserved. RelayHealth’s Modern Hadoop Architecture Active archive on Hadoop1 Improvements Traditional BATCH Processing Hadoop STREAM Processing Process Payer Application Reports Spark Streaming Claim Submitters RelayHealth Transaction Processing System Ingest Store Access Kafka Hbase Search Spark Modeling Stream & batch processing2 2 1
  • 21. 21© Cloudera, Inc. All rights reserved. RelayHealth’s Modern Hadoop Architecture Active archive on Hadoop1 Improvements Traditional BATCH Processing Hadoop STREAM Processing Process Payer Application Reports Spark Streaming Claim Submitters RelayHealth Transaction Processing System Ingest Store Access Kafka Hbase Search Spark Modeling Stream & batch processing2 Prepared for future use cases3 2 3 1
  • 22. 22© Cloudera, Inc. All rights reserved. Business and Technical ROI Technology ROI Business ROI 1) Active archive and Navigator for HIPAA compliance 2) Prepared for future use cases 3) Data ingest goes from end of day to near real-time 1) Transaction processed in 20ms VS 1 hour prior 2) $250k in licensing and hardware savings per year 3) Greater flexibility with data ingest
  • 23. 23© Cloudera, Inc. All rights reserved. Key Leanings Crawl, walk, run It takes time, start now Lean on experts in the community
  • 25. Thank you

Editor's notes

  1. Data storage costs: http://thecaucus.blogs.nytimes.com/2012/08/14/advances-in-data-storage-have-implications-for-government-surveillance/ IoT: http://www.forbes.com/sites/gilpress/2014/08/22/internet-of-things-by-the-numbers-market-estimates-and-forecasts/ Resource Intensive ELT: http://www.syncsort.com/getattachment/45696aa9-1e40-43cb-8905-b9fc7e2519f7/Syncsort-Data-Warehouse-Offload-Solution.aspx
  2. An Operational Data Store provides a staging environment to ingest, store, and process data in preparation for operational and analytical use. Depending on whether the data is structured or unstructured, different systems can be used to optimize data pipelines. The challenge is that as your organization asks for larger volumes of diverse data, traditional systems begin to struggle.
  3. These challenges arise around data storage and processing. The first challenge is limited data access. Collecting and ingesting a wide variety of data is not simple and usually means adding systems or capacity to the architecture. As the business keeps asking for more data, the strain on IT grows. To cope, only the data judged most valuable is brought in, which limits the business’s access to data that could turn out to be extremely valuable. The second challenge is processing data volumes. These organizations have already collected and operationalized large volumes of data and need to process it efficiently to meet SLAs. If data doesn’t reach employees in a timely manner, they carry on without the most recent information. The third and final challenge is archiving data. When systems reach capacity as larger volumes of diverse data are put to use, IT professionals archive or delete data that has been deemed low-value. Moving data offline to an archive significantly reduces the return on that data and can hurt the business: analysts looking for patterns in historic data can’t reach it because it’s offline. However, as the external and internal data environment has changed over the years, so has the data management space.
  6. We have been working closely with leading organizations to create a platform that complements their current architecture, so they can avoid these common challenges and prepare for future data growth. Ingest More Data: Cloudera allows you to collect and ingest any data type or volume, in full fidelity, giving your current systems and end users complete access to the data. Organizations have been able to collect and access more diverse data, opening up what data can do for the business, without compromising system performance or straining existing resources. Efficiently Process & Store Data Volumes: By offloading heavy processing workloads to Cloudera, organizations can use parallel processing to significantly reduce processing time on large volumes of data. Because Cloudera scales, the platform continues to perform at its peak no matter how much data is stored. Automated Secure Archive: Using Cloudera as an ODS and as a centralized staging environment for new data automatically creates a secure archive. Because the platform scales, there is never a reason to move your data offline to an archive. Historic data can remain on the platform, giving analysts complete access without degrading system performance, while smaller volumes of already defined active data flow directly into the right systems and outdated data is offloaded to Cloudera. Leading data organizations have already seen these benefits.
  9. Arrow from batch to stream processing