Smarter Management for Your Data Growth
Retain Critical Data Online at a Fraction of the Cost
April 2011
Agenda
- Introductions
- Changing Data Management Landscape & Trends
- From Operational to Analytical
- Cloud and Hadoop: Where Do They Fit?
- RainStor and How It Works
- Analytics Data Retention Use-case
- Economics
- Q&A
Speakers: Matt Aslett, The 451 Group; Deirdre Mahon, VP Marketing, RainStor; Ramon Chen, VP Product Management, RainStor
Total Data: The changing data management landscape
Matthew Aslett, The 451 Group (matthew.aslett@the451group.com)
© 2011 by The 451 Group. All rights reserved.
The 451 Group
- 451 Research is focused on the business of enterprise IT innovation. The company’s analysts provide critical and timely insight into the competitive dynamics of innovation in emerging technology segments.
- Tier1 Research is a single-source research and advisory firm covering the multi-tenant datacenter, hosting, IT and cloud-computing sectors, blending the best of industry and financial research.
- The Uptime Institute is ‘The Global Data Center Authority’ and a pioneer in the creation and facilitation of end-user knowledge communities to improve reliability and uninterruptible availability in datacenter facilities.
- TheInfoPro is a leading IT advisory and research firm that provides real-world perspectives on the customer and market dynamics of the enterprise information technology landscape, harnessing the collective knowledge and insight of leading IT organizations worldwide.
- ChangeWave Research is a research firm that identifies and quantifies ‘change’ in consumer spending behavior, corporate purchasing, and industry, company and technology trends.
Overview
- The changing data management landscape
- One overarching trend: Total Data
- Impacting four technology areas:
  - Operational database
  - Analytic database
  - Data archiving
  - Machine-generated data
- The trends driving data management
Trends driving data management
- The volume, variety and velocity of data have never been greater, and they are still growing
- The value of data has never been better understood
- The capabilities for processing data have never been better:
  - Higher processor performance and density are enabling advanced processing on commodity hardware
  - Software enhancements are designed to make best use of processing performance and scalable architecture
  - Advanced and in-database analytics bring processing to the data, reducing latency and improving efficiency
- The data deluge problem is also a big data opportunity
Introducing Total Data
- A concept defined by The 451 Group to describe new approaches to data management, beyond restrictive silos
- Reflects the changing data management landscape as pragmatic choices are being made about data storage and analysis techniques
- Processing any data that might be applicable to analytics, whether in the operational database, data warehouse, Hadoop or archive:
  - Structured, semi-structured or unstructured
  - Relational or non-relational, on-premise or in the cloud
- Inspired by ‘Total Football’
Total Football meets Total Data
“You make space, you come into space. And if the ball doesn’t come, you leave this space and another player will come into it.” (Bernardus Hulshoff, Ajax 1966-77)
- Abandonment of restrictive (self-imposed) rules about individual roles and responsibility
- Enabled and relied on fluidity and flexibility to respond to changing requirements
- Reliant on, and exploited, improved performance levels
Data management: in theory
- The relational database is sacrosanct
- The enterprise data warehouse is the single source of the truth (or is supposed to be)
- Offline data archiving
- Infrastructure primarily exists to support the data/application layer
[Diagram: Enterprise app; Reporting/BI; Operational database; Data cleansing/sampling/MDM; EDW; Data archive; Infrastructure]
Data management: in practice
- Distributed data layer to meet the scalability and performance demands
- New opportunities for real-time BI
- Polyglot persistence: use the most appropriate data storage for the application
[Diagram: Enterprise app; Reporting/BI (×2); Distributed data; Data cleansing/sampling/MDM; Operational database (×4); EDW; Data archive; Infrastructure]
Data management: in practice
- Data is copied into departmental or regional data marts
- Data warehouse administrators are fighting a losing battle for control
[Diagram: Enterprise app; Reporting/BI (×2); Reporting (×3); Distributed data; Data cleansing/sampling/MDM; Operational database (×4); Analytic database (×3); EDW; Data archive; Infrastructure]
Data management: in practice
- Advanced in-database analytics bring processing to the data, reducing latency and improving efficiency
[Diagram: Enterprise app; Reporting/BI (×2); Reporting (×3); Distributed data; Data cleansing/sampling/MDM; Operational database (×4); Analytic database (×3); EDW; Data archive; Infrastructure]
Data management: in practice
- Taking further advantage of hardware economics
[Diagram: Enterprise app; Reporting/BI (×3); Reporting (×3); Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational database (×4); Analytic database (×3); EDW; Data archive; Infrastructure]
Data management: in practice
- Greater acceptance that the EDW is part of a broader data analytics architecture
[Diagram: Enterprise app; Reporting/BI (×3); Reporting (×3); Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational database (×4); Analytic database (×3); EDW; Data archive; Infrastructure]
Data location, data location, data location
- Not the end of the EDW, but the EDW is one of many sources of BI, rather than the only source of BI
- The issue of data location becomes paramount
- Choose the right storage technology (software and hardware):
  - EDW, Hadoop or archive
  - On-premise or in the cloud
  - Memory, disk or SSD
- Understand the requirements:
  - Value and temperature of the data
  - Ensure data can be queried using existing tools/skills
  - Cost
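As a rough illustration of the placement decision described above, the following Python sketch routes a dataset to the EDW, Hadoop or an archive based on its value and temperature. The thresholds (30 and 365 days), the field names and the fallback to Hadoop are all hypothetical assumptions made for this example; the slides do not prescribe any specific rule.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    days_since_last_query: int  # hypothetical proxy for data "temperature"
    business_value: str         # "high", "medium" or "low" (assumed scale)
    needs_sql_access: bool      # must stay queryable with existing tools/skills


def choose_location(ds: Dataset) -> str:
    """Return 'EDW', 'Hadoop' or 'archive' (the three options the slide lists)."""
    # Hot, high-value data stays in the warehouse (30-day cutoff is an assumption).
    if ds.days_since_last_query <= 30 and ds.business_value == "high":
        return "EDW"
    # Cold data that must remain queryable goes to an online archive
    # (365-day cutoff is an assumption).
    if ds.days_since_last_query > 365 and ds.needs_sql_access:
        return "archive"
    # Everything else falls through to Hadoop in this sketch.
    return "Hadoop"


recent_orders = Dataset("orders_current", 3, "high", True)
old_orders = Dataset("orders_2005", 900, "medium", True)
web_logs = Dataset("web_logs", 90, "low", False)
```

A real policy would also weigh cost per terabyte and regulatory retention periods, which this sketch leaves out.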
EDW requirements/characteristics
- High-performance query/analysis response
- Ability to support multiple users concurrently
- Capacity for multi-terabyte storage and scale
- Fast data load and staging for data transformation
- Ability to operate with BI/analytics tools
- Security and governance
- Cost: $20k-$50k per TB
Alternatives
- Do nothing and suffer the consequences
- Deploy appliances and/or Hadoop for specific use-cases
- Offload to an online repository
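To make the cost argument concrete, here is a small Python sketch of the arithmetic behind offloading. Only the $20k-$50k per TB range comes from the slide; the repository cost per TB and the 60 TB volume are hypothetical inputs chosen for illustration.

```python
# EDW cost range per terabyte, as quoted on the slide.
EDW_COST_PER_TB_LOW = 20_000
EDW_COST_PER_TB_HIGH = 50_000


def edw_cost_range(terabytes: int) -> tuple[int, int]:
    """Low and high estimate of what it costs to hold `terabytes` in the EDW."""
    return terabytes * EDW_COST_PER_TB_LOW, terabytes * EDW_COST_PER_TB_HIGH


def offload_savings(offloaded_tb: int, repo_cost_per_tb: int) -> tuple[int, int]:
    """Low and high estimate of savings from moving data out of the EDW.

    `repo_cost_per_tb` is an assumed figure supplied by the caller;
    the slides do not state one.
    """
    low = offloaded_tb * (EDW_COST_PER_TB_LOW - repo_cost_per_tb)
    high = offloaded_tb * (EDW_COST_PER_TB_HIGH - repo_cost_per_tb)
    return low, high


# Hypothetical scenario: 60 TB offloaded to a repository costing $2,000 per TB.
print(edw_cost_range(60))          # (1200000, 3000000)
print(offload_savings(60, 2_000))  # (1080000, 2880000)
```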
Data management: in practice
- Traditionally, data archived for legal requirements
- Previously little need for querying/analytics
[Diagram: Enterprise app; Reporting/BI (×3); Reporting (×3); Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational database (×4); Analytic database (×3); EDW; Data archive; Infrastructure]
Data management: in practice
- Focus shifts to how to enable querying easily and cost-effectively
- The archive becomes an online repository for historical data
[Diagram: Enterprise app; Reporting/BI (×3); Reporting (×4); Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational database (×4); Analytic database (×3); EDW; Data repository; Infrastructure]
Data management: in practice
- “Machine-generated data” is an untapped source of data
[Diagram: Enterprise app; Reporting/BI (×3); Reporting (×4); Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational database (×4); Analytic database (×3); EDW; Data repository; Infrastructure]
Data management: in practice
- Infrastructure is likely to transform into data-generating and data-processing infrastructure as analytics capabilities are applied directly to the data source
[Diagram: Enterprise app; Reporting/BI (×4); Reporting (×4); Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational database (×4); Analytic database (×3); EDW; Data repository; Datastructure]

Smarter Management for Your Data Growth

  • 1. Smarter Management for Your Data Growth Retain Critical Data Online At A Fraction of The Cost April 2011
  • 2. Introductions Changing Data Management Landscape & Trends From Operational to Analytical Cloud and Hadoop Where do They Fit? RainStor and How it Works Analytics Data Retention Use-case Economics Q&A Matt Aslett, The 451 Group Deirdre Mahon, VP Marketing – RainStor Ramon Chen, VP Product Management - RainStor Agenda
  • 3. Total Data The changing data management landscape Matthew Aslett, The 451 Group matthew.aslett@the451group.com © 2011 by The 451 Group. All rights reserved
  • 4. 451 Research is focused on the business of enterprise IT innovation. The company’s analysts provide critical and timely insight into the competitive dynamics of innovation in emerging technology segments. The 451 Group Tier1 Research is a single-source research and advisory firm covering the multi-tenant datacenter, hosting, IT and cloud-computing sectors, blending the best of industry and financial research. The Uptime Institute is ‘The Global Data Center Authority’ and a pioneer in the creation and facilitation of end-user knowledge communities to improve reliability and uninterruptible availability in datacenter facilities. TheInfoPro is a leading IT advisory and research firm that provides real-world perspectives on the customer and market dynamics of the enterprise information technology landscape, harnessing the collective knowledge and insight of leading IT organizations worldwide. ChangeWave Research is a research firm that identifies and quantifies ‘change’ in consumer spending behavior, corporate purchasing, and industry, company and technology trends.
  • 5. Overview The changing data management landscape One overarching trend: Total Data Impacting four technology areas: Operational database Analytic database Data archiving Machine-generated data The trends driving data management 5
  • 6. Trends driving data management The volume, variety and velocity of data has never been greater and is growing The value of data has never been better understood The capabilities for processing data have never been better Higher processor performance and density are enabling advanced processing on commodity hardware Software enhancements designed to make best use of processing performance and scalable architecture Advanced and in-database analytics bring processing to the data, reducing latency and improving efficiency The data deluge problem is also a big data opportunity 6
  • 7. Introducing Total Data A concept define by The 451 Group to describe new approaches to data management – beyond restrictive silos Reflects the changing data management landscape as pragmatic choices are being made about data storage and analysis techniques Processing any data that might be applicable to analytics in the operational database, data warehouse, or Hadoop, or archive Structured, semi-structured or unstructured Relational or non-relational, on-premise or in the cloud Inspired by ‘Total Football’ 7
  • 8. Total Football meets Total Data “You make space, you come into space. And if the ball doesn’t come, you leave this space and another player will come into it.” BernadusHulshoff, Ajax 1966-77 Abandonment of restrictive (self-imposed) rules about individual roles and responsibility Enabled and relied on fluidity and flexibility to respond to changing requirements Reliant on, and exploited, improved performance levels 8
  • 9.
  • 10. The relational database is sacrosanct
  • 11. The enterprise data warehouse is the single source of the truth (or is supposed to be)
  • 13. Infrastructure primarily exists to support the data/application layerEnterprise app Operationaldatabase Data cleansing/sampling/MDM EDW Data archive Infrastructure
  • 14.
  • 15. Distributed data layer to meet the scalability and performance demands
  • 16. New opportunities for real-time BI
  • 17. Polyglot persistence – use the most appropriate data storage for the applicationEnterprise app Reporting/BI Reporting/BI Distributed data Data cleansing/sampling/MDM Operational database Operational database Operational database Operational database EDW Data archive Infrastructure
  • 18.
  • 19. Data is copied into departmental or regional data marts
  • 20. Data warehouse administrators are fighting a losing battle for controlEnterprise app Reporting/BI Reporting/BI Reporting Reporting Reporting Distributed data Data cleansing/sampling/MDM Operational database Operational database Operational database Operational database Analytic database Analytic database Analyticdatabase EDW Data archive Infrastructure
  • 21.
  • 22. Advanced in-database analytics bring processing to the data, reducing latency and improving efficiencyEnterprise app Reporting/BI Reporting/BI Reporting Reporting Reporting Distributed data Data cleansing/sampling/MDM Operational database Operational database Operational database Operational database Analytic database Analytic database Analyticdatabase EDW Data archive Infrastructure
  • 23.
  • 24. Taking further advantage of hardware economicsEnterprise app Reporting/BI Reporting/BI Reporting/BI Reporting Reporting Reporting Distributed data Data cleansing/sampling/MDM Hadoop Operational database Operational database Operational database Operational database Analytic database Analytic database Analyticdatabase EDW Data archive Infrastructure
  • 25.
  • 26. Greater acceptance that the EDW is part of a broader data analytics architectureEnterprise app Reporting/BI Reporting/BI Reporting/BI Reporting Reporting Reporting Distributed data Data cleansing/sampling/MDM Hadoop Operational database Operational database Operational database Operational database Analytic database Analytic database Analyticdatabase EDW Data archive Infrastructure
  • 27. Data location, data location, data location Not the end of the EDW, but the EDW is one of many sources of BI, rather than the only source of BI The issue of data location becomes paramount Choose the right storage technology – software and hardware EDW, Hadoop or archive On-premise or on the cloud Memory, disk or SSD Understand the requirements: Value and temperature of the data Ensure data can be queried using existing tools/skills Cost 15
  • 28. EDW requirements/characteristics High performance query/analysis response Ability to support multiple users concurrently Capacity for multi-terabyte storage and scale Fast data load and staging for data transformation Ability to operate with BI/analytics tools Security and governance Cost - $20k-$50k per TB Alternatives Do nothing and suffer the consequences Deploy appliances and/or Hadoop for specific use-cases Offload to an online repository 16
  • 29.
  • 30. Traditionally, data archived for legal requirements
  • 31. Previously little need for querying/analyticsEnterprise app Reporting/BI Reporting/BI Reporting/BI Reporting Reporting Reporting Distributed data Data cleansing/sampling/MDM Hadoop Operational database Operational database Operational database Operational database Analytic database Analytic database Analyticdatabase EDW Data archive Infrastructure
  • 33. Focus shifts to how to enable querying easily and cost-effectively
  • 34. Becomes an online repository for historical data [Diagram: enterprise app, reporting/BI, distributed data, data cleansing/sampling/MDM, Hadoop, operational databases, analytic databases, EDW, data repository, infrastructure]
  • 36. “Machine-generated data” is an untapped source of data [Diagram: enterprise app, reporting/BI, distributed data, data cleansing/sampling/MDM, Hadoop, operational databases, analytic databases, EDW, data repository, infrastructure]
  • 38. Likely to transform into data-generating and data-processing infrastructure as analytics capabilities are applied directly to the data source [Diagram: enterprise app, reporting/BI, distributed data, data cleansing/sampling/MDM, Hadoop, operational databases, analytic databases, EDW, data repository, "datastructure"]
  • 41. Greater opportunities for business intelligence [Diagram: enterprise app, Hadoop/DW, data archive, analytic DB, reporting/BI, distributed data, data cleansing/sampling/MDM, Hadoop, operational databases, analytic databases, EDW, cloud, infrastructure, data repository, "datastructure"]
  • 42. Data location, data location, data location. Avoid data movement and duplication; retain governance. Virtual data marts and data clouds. Data virtualization to provide access to multiple data sources.
  • 43. Data virtualization [Diagram: enterprise app, Hadoop/DW, data archive, analytic DB, reporting/BI, distributed data, data cleansing/sampling/MDM, Hadoop, operational databases, analytic databases, EDW, cloud, infrastructure, data repository, "datastructure"]
  • 44. Data virtualization [Diagram: enterprise app, analytic DB, Hadoop/DW, data archive, reporting/BI, distributed data, a data virtualization layer, data cleansing/sampling/MDM, Hadoop, operational databases, virtual data marts, EDW, cloud, infrastructure, data repository, "datastructure"]
  • 45. Who is RainStor? A specialized database for cost-effective reduction, retention and on-demand retrieval of historical structured data, at 10x less cost. OEM partner model. Cloud or on-premise.
  • 47. Solution: Message (SMS/MMS) and traffic log management
  • 48. Retaining thousands of messages per second while keeping them accessible for regulatory purposes
  • 50. Solution: Teradata Data Retention Machine
  • 51. Retain BI and analytical data long term in the RainStor-powered Data Retention Machine at a low cost per TB stored, eliminating tape.
  • 53. Solution: Information Lifecycle Management
  • 55. Where RainStor Fits [Diagram: enterprise app, Hadoop/DW, data archive, analytic DB, application archive/retired, reporting/BI, distributed data, data cleansing/sampling/MDM, Hadoop, operational databases, analytic databases, EDW, cloud, infrastructure, data repository, "datastructure"]
  • 59. ISS Big Data Volumes: needs to be online and queryable. Found the needle, where’s the haystack? Volumes are rising; regulated; infrastructure needs; reaching telco scale. Multi-billions of records. Strict compliance. RDBMSs break. Analytics required. 10s of petabytes retained.
  • 60. How Does RainStor Do It?
      Reduce. SIZE: massive de-dupe, ~97% savings in storage. HARDWARE: on commodity server/disk infrastructure. RESOURCES: without specialist DBA support.
      Retain. PRESERVED: massive record volumes in original form. IMMUTABLE: tamper-proofed with audit trail. CONFIGURABLE: with retention and expiry policies.
      Retrieve. STANDARDS: SQL and BI tools via ODBC/JDBC. PERFORMANT: fast queries for large, complex data sets. FLEXIBLE: with schema evolution and point-in-time access.
  • 62. Data Reduction through value and pattern de-duplication
  • 63. Further Algorithmic-level and byte-level compression
  • 64. Fast queries in stored format without re-inflation. [Diagram: sample records built from the values Peter, Paul, John, Smith, Brown, Pharma, Finance, $40,000 and $35,000]
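
To make the idea of value-level de-duplication and querying "in stored format" more concrete, here is a minimal sketch in Python. It is not RainStor's actual algorithm, which the slides do not describe; it only illustrates the general technique of storing each distinct value once and filtering on the compact references. The function names and the way the sample values are combined into records are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, ...]


def encode(records: List[Record]):
    """Value-level de-duplication: keep each distinct value once and
    represent every record as a tuple of references into that list."""
    values: List[str] = []        # unique values, in first-seen order
    index: Dict[str, int] = {}    # value -> position in `values`
    rows: List[Tuple[int, ...]] = []
    for rec in records:
        ids = []
        for v in rec:
            if v not in index:
                index[v] = len(values)
                values.append(v)
            ids.append(index[v])
        rows.append(tuple(ids))
    return values, index, rows


def query_equals(values, index, rows, column: int, wanted: str) -> List[Record]:
    """Filter on the encoded rows by comparing integer references;
    only the matching rows are decoded back to their original values."""
    wanted_id = index.get(wanted)
    if wanted_id is None:
        return []  # the value never occurs, so no row can match
    return [
        tuple(values[i] for i in row)
        for row in rows
        if row[column] == wanted_id
    ]


# Hypothetical records assembled from the values shown on the slide.
records = [
    ("Peter", "Smith", "Pharma", "$40,000"),
    ("Paul", "Smith", "Finance", "$35,000"),
    ("John", "Brown", "Pharma", "$40,000"),
]

values, index, rows = encode(records)
pharma = query_equals(values, index, rows, column=2, wanted="Pharma")
```

In this toy data set the twelve field values collapse to nine stored values, and the filter on "Pharma" touches only integer references until the two matching records are decoded. Real data with many repeated values would show a far larger reduction than this tiny sample.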
  • 66. Run query on RainStor and import results to data warehouse
  • 70. Add more data sources for broader analysis [Diagram labels: 50 quarters; source DB (e.g. Oracle); analytics/DW; 5 quarters]
  • 71. RainStor Cloud. 1. Compressed, de-duplicated data is sent to the cloud, resulting in quicker and cheaper uploads. 2. Encrypted data is stored in private containers, ensuring security and easy management. 3. Data is accessed on demand using standard SQL tools, leveraging the elasticity of the cloud. [Diagram labels: VM software appliance; Amazon S3 and EC2; send, store, search; ODBC/JDBC]
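
The "query the historical store, then import the results into the warehouse" pattern from the preceding slides can be sketched as follows. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for the historical repository and the data warehouse, and the `sales` and `sales_by_quarter` tables and their contents are hypothetical. A real deployment would connect through ODBC/JDBC as the deck describes.

```python
import sqlite3

# Stand-ins (assumption): one in-memory SQLite database plays the
# historical repository, another plays the data warehouse.
archive = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

# Hypothetical historical detail rows held in the repository.
archive.execute("CREATE TABLE sales (quarter TEXT, region TEXT, amount REAL)")
archive.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2001-Q1", "EMEA", 100.0),
        ("2001-Q1", "APAC", 50.0),
        ("2001-Q2", "EMEA", 120.0),
    ],
)

# Hypothetical summary table in the warehouse.
warehouse.execute(
    "CREATE TABLE sales_by_quarter (quarter TEXT PRIMARY KEY, total REAL)"
)

# Run the aggregate where the history lives, so only the small
# result set moves to the warehouse rather than the detail rows.
summary = archive.execute(
    "SELECT quarter, SUM(amount) FROM sales GROUP BY quarter ORDER BY quarter"
).fetchall()

warehouse.executemany("INSERT INTO sales_by_quarter VALUES (?, ?)", summary)
warehouse.commit()

imported = warehouse.execute(
    "SELECT quarter, total FROM sales_by_quarter ORDER BY quarter"
).fetchall()
```

The design point is that the warehouse keeps only recent detail while older quarters stay queryable in the repository; the aggregate travels, the raw history does not.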
  • 72. How Do the Economics Stack Up?
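
As a rough, illustrative check using only figures quoted elsewhere in this deck (EDW cost of $20k-$50k per TB, ~97% storage savings, and the "10x less cost" claim), a small calculation might look like the sketch below. The 100 TB data set is a hypothetical input, and combining the figures this way is an assumption, not a costing model from the presenters.

```python
def edw_cost_range(raw_tb, low_per_tb=20_000, high_per_tb=50_000):
    """Cost range in dollars for keeping raw_tb terabytes in an EDW,
    using the $20k-$50k per TB figure from the deck."""
    return raw_tb * low_per_tb, raw_tb * high_per_tb


def reduced_footprint_tb(raw_tb, savings_percent=97):
    """Storage footprint after the ~97% reduction quoted in the deck."""
    return raw_tb * (100 - savings_percent) / 100


raw_tb = 100  # hypothetical data set size, not from the deck

edw_low, edw_high = edw_cost_range(raw_tb)
footprint = reduced_footprint_tb(raw_tb)

# Applying the deck's "10x less cost" claim directly to the EDW range.
tenth_low, tenth_high = edw_low / 10, edw_high / 10

print(f"EDW cost for {raw_tb} TB: ${edw_low:,} to ${edw_high:,}")
print(f"Footprint after reduction: {footprint} TB")
print(f"At one tenth of the cost: ${tenth_low:,.0f} to ${tenth_high:,.0f}")
```

Under these assumptions, 100 TB held in an EDW would cost $2M to $5M, the reduced footprint would be about 3 TB, and a tenfold cost reduction would bring the range to $200k to $500k.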
  • 73. Quick summary. The growing volume, variety and velocity of data is a problem, but it is also an opportunity. Requires a broader approach to data management. Deploy appliances and Hadoop for specific use-cases, and an online repository for historical data. "Datastructure" will become increasingly valuable, not only as a source of data but also as a source of intelligence. Data location, and the role of data virtualization, will come into greater focus.
  • 74. Q&A

Editor's notes

  1. De-dupe & reduction; any storage/platform; cloud enabled; limitless data volumes; fast load (ingestion rates); SQL query with high performance; immutable, compliant store.
  2. So if we take a look at Matt’s earlier high-level architecture diagram, I think it's worth pointing out the key areas where RainStor technology can be applied. At the top, we have a RainStor repository, which can be deployed alongside the RDBMS … applications can be archived or retired, saving cost by compressing the data to a much smaller footprint. Our INFA partnership focuses predominantly on this area and retires a large number of applications, such as Oracle E-Business Suite… On the lower part of the screen, RainStor can be deployed as the leading repository to store long-term historical data for EDWs, and additionally the same data sets can be stored on the cloud…
  3. Security industry: The combination of the increase in cybercrime, changing regulations, and public exposures is increasing the attention and resources dedicated to data security. Over the next three years it's expected that data security issues (and the related application security) will account for over 60% of new enterprise security spending; this includes spending on new technologies, and excludes maintenance of existing technologies such as firewalls and antivirus, which account for most current security costs. Data and business application security will drive most of the new growth of the security market over the next 3-5 years.
     Business network traffic for 2010: > 3,800 PB/month (> 2,500 PB internet traffic, > 1,200 PB WAN traffic, > 58 PB mobile traffic). Cisco forecasts 20% CAGR.
     Data breaches are common: 95% of records stolen externally; 90% involved malware; 70% were uncovered by outsiders; 50% went unnoticed for months.
     CSPs: Global mobile data traffic will increase 26-fold between 2010 and 2015. Mobile data traffic will grow at a compound annual growth rate (CAGR) of 92 percent from 2010 to 2015, reaching 6.3 exabytes per month by 2015. Last year’s mobile data traffic was three times the size of the entire global Internet in 2000.
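
The two mobile-traffic figures in the note above (a 26-fold increase over the five years from 2010 to 2015, and a 92 percent CAGR) can be checked against each other with the standard compound-growth formula. The sketch below is only that arithmetic; the function name is an illustrative choice.

```python
def cagr(growth_multiple: float, years: int) -> float:
    """Compound annual growth rate implied by an overall growth multiple."""
    return growth_multiple ** (1.0 / years) - 1.0


# 26-fold growth between 2010 and 2015 spans five annual steps.
implied_rate = cagr(26, 5)

# Going the other way: compounding 92% per year for five years.
implied_multiple = (1 + 0.92) ** 5

print(f"Implied CAGR: {implied_rate:.1%}")
print(f"Multiple from 92% CAGR over 5 years: {implied_multiple:.1f}x")
```

Both directions agree to within rounding, so the 26-fold and 92 percent figures describe the same growth curve.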