Hadoop Summit - 2014
Cost of Ownership for Hadoop Implementation
Santosh Jha, Steve Ackley
Part 1 – Estimating TCO
Iceberg
Estimating TCO is hard. Like an iceberg, many costs are hidden.
Example: integration of Big Data within the existing ecosystem.
Hadoop Implementations
Hadoop deployment methods, from bare metal to cloud:
• On-Premise Hadoop
• Hadoop Appliance
• Hadoop Hosting
• Hadoop as a Service
Sample vendors:
Hortonworks | IBM, EMC | AWS EMR
Cloudera | Oracle, Teradata | Rackspace | Altiscale
MapR | VMware | GoGrid | Qubole
On-Premise Cost Categories
Cost Group | Item
Hardware/Infrastructure Costs | Servers, Peripherals, Network Storage
Communication Costs | Local Area Network, Wide Area Network, Remote Access
Software Costs | License/Subscription Fees
Implementation Costs | Development/customization/integration, Training, Consulting, Non-Functional Testing (Performance, Capacity, Security, etc.)
Management Costs | Hardware & software upgrades, Hardware & software administration, Legal Cost
Support Costs | Support staff, Staff training, Travel, Support contracts, Overhead labor, High Availability Cost, Disaster Recovery Cost, Ticketing & Troubleshooting Cost, Monitoring Cost, Internal Audit Cost
Managing Risk
Cost Group | Item
Vendor | Vendor Viability, Control on Technical Architecture, Data Protection, Loss of Intellectual Property, Loss of Privacy
Internal IT | Vendor Viability, Control on Technical Architecture, Data Protection, Loss of Intellectual Property, Loss of Privacy
Sample calculation
Inputs
Input | Value
Average Monthly HDFS (TB) | 1500
Peak HDFS over Monthly (TB) | 100
Monthly HDFS Growth (TB) | 20
Average Monthly Compute ('000 SH) | 20
Peak Compute (SH) | 1400
Planning Cycle (Months) | 36
Purchased Distribution | No
Hadoop Admin Costs | Included
Data from S3 | Yes
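The slides do not say how these inputs are combined, so the following is only a rough sketch of one way the storage inputs could feed a capacity projection. It assumes linear growth and treats the peak figure as headroom added on top of the average; both the model and the names (`projected_hdfs_tb`, the constants) are illustrative assumptions, not the presenters' method.

```python
# Storage inputs taken from the sample calculation slide.
AVERAGE_HDFS_TB = 1500      # Average Monthly HDFS (TB)
PEAK_OVER_AVERAGE_TB = 100  # Peak HDFS over Monthly (TB)
MONTHLY_GROWTH_TB = 20      # Monthly HDFS Growth (TB)
PLANNING_MONTHS = 36        # Planning Cycle (Months)


def projected_hdfs_tb(month: int) -> int:
    """Assumed model: average + linear growth + peak headroom.

    This formula is an illustration only; the slides do not define it.
    """
    return AVERAGE_HDFS_TB + MONTHLY_GROWTH_TB * month + PEAK_OVER_AVERAGE_TB


# Projected requirement for month 0 through the end of the planning cycle.
projection = [projected_hdfs_tb(m) for m in range(PLANNING_MONTHS + 1)]
print(projection[0], projection[-1])  # 1600 2320
```

Under these assumptions the cluster would need about 1,600 TB at the start and about 2,320 TB at month 36. Replication overhead and compute sizing are left out.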
Results without considering risk
[Chart: Cost over 36 Months for Hadoop as a Service, On Premise, Amazon EMR, and Hadoop Distribution on EC2. Axis range: 0 to 8,000,000.]
Managing Risk (Vendor) – Sample data
Managing Risk | Risk Factor | Weight (%) | Calculated Risk
Vendor Viability | 2 | 40 | 0.8
Control on Technical Architecture | 1 | 20 | 0.2
Data Protection | 2 | 15 | 0.3
Loss of Intellectual Property | 1 | 10 | 0.1
Loss of Privacy | 2 | 15 | 0.3
Total | | | 1.7
Risk factor scales:
Vendor Viability: 1 - No risk, 5 - Very high risk with vendor viability
Control on Technical Architecture: 1 - No need to control, 5 - Compelling need to control technical architecture
Data Protection: 1 - High data protection provided by architecture and process, 5 - No data protection
Loss of Intellectual Property: 1 - No IP, 5 - High business impact with the loss of IP
Loss of Privacy: 1 - No privacy issue for the solution, 5 - High business impact with loss of data
Managing Risk (Internal IT) – Sample data
Managing Risk | Risk Factor | Weight (%) | Calculated Risk
Vendor Viability | 1 | 40 | 0.4
Control on Technical Architecture | 1 | 20 | 0.2
Data Protection | 2 | 15 | 0.3
Loss of Intellectual Property | 1 | 10 | 0.1
Loss of Privacy | 2 | 15 | 0.3
Total | | | 1.3
Risk factor scales:
Vendor Viability: 1 - No risk, 5 - Very high risk with vendor viability
Control on Technical Architecture: 1 - No need to control, 5 - Compelling need to control technical architecture
Data Protection: 1 - High data protection provided by architecture and process, 5 - No data protection
Loss of Intellectual Property: 1 - No IP, 5 - High business impact with the loss of IP
Loss of Privacy: 1 - No privacy issue for the solution, 5 - High business impact with loss of data
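The "Calculated Risk" column in the two sample tables is consistent with multiplying each risk factor score by its weight (as a fraction) and summing the results. A minimal sketch of that arithmetic follows; the function and variable names are illustrative, and how the total is then applied to the cost figures is not stated in the slides, so it is not modeled here.

```python
# Weights in percent, as given in the sample tables.
WEIGHTS = {
    "Vendor Viability": 40,
    "Control on Technical Architecture": 20,
    "Data Protection": 15,
    "Loss of Intellectual Property": 10,
    "Loss of Privacy": 15,
}

# Risk factor scores (scale of 1 to 5) from the two sample tables.
VENDOR_SCORES = {
    "Vendor Viability": 2,
    "Control on Technical Architecture": 1,
    "Data Protection": 2,
    "Loss of Intellectual Property": 1,
    "Loss of Privacy": 2,
}
INTERNAL_IT_SCORES = {
    "Vendor Viability": 1,
    "Control on Technical Architecture": 1,
    "Data Protection": 2,
    "Loss of Intellectual Property": 1,
    "Loss of Privacy": 2,
}


def weighted_risk(scores: dict, weights: dict) -> float:
    """Sum score * weight over all factors, with weights given in percent."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100 percent")
    return sum(scores[name] * pct for name, pct in weights.items()) / 100


vendor_total = weighted_risk(VENDOR_SCORES, WEIGHTS)
internal_total = weighted_risk(INTERNAL_IT_SCORES, WEIGHTS)
print(f"Vendor: {vendor_total:.1f}, Internal IT: {internal_total:.1f}")
# Vendor: 1.7, Internal IT: 1.3
```

The only difference between the two totals comes from the Vendor Viability score (2 versus 1), which carries the largest weight.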
Results after considering risk
[Chart: Cost over 36 Months for Hadoop as a Service, On Premise, Amazon EMR, and Hadoop Distribution on EC2. Axis range: 0 to 14,000,000.]
Part 2 – Deployment Considerations
On-Premise Implementation – When?
• Well-defined use cases with a demonstrated ROI
• Developed and tuned Hadoop applications
• IT team with the experience and bandwidth to manage and maintain Hadoop and the integrated hardware/software stack, as well as troubleshoot job problems
• Sufficient number of nodes to support:
o Growth in data sets
o “Bursty” nature of jobs
On-Premise Implementation – Company Profile
• Large enterprise with a strategic need for Big Data Analytics
• Moved from an exploratory stage to enterprise adoption
• Committed IT resources to support the Hadoop hardware/software stack
Hadoop as a Service – The Continuum
• Vendors manage the hardware
• Vendors install Hadoop
• Vendors manage Hadoop
Vendors Manage The Hardware
For organizations that:
• Want to create a small cluster for a relatively short period of time, for training and software development purposes.
• Have a short-term processing need and no internal capacity to support it.
• Do not have an IT organization that can install, manage, maintain and operate the Hadoop hardware/software stack, and fix “broken” jobs.
Vendors Install Hadoop
For organizations that:
• Have a short-term need or small-scale Hadoop requirement.
• Have Hadoop applications that are “bursty.”
• Have an IT organization that can operate the Hadoop hardware/software stack, manage scaling the cluster, and fix “broken” jobs.
• Do not need to tailor the hardware to their specific requirements.
Vendors Manage Hadoop
For organizations that:
• Do not have an IT organization that can install, manage, maintain and operate the Hadoop hardware/software stack, and fix “broken” jobs.
• Do not have the required IT hardware infrastructure.
• May need an “always on” Hadoop environment.
• Need service providers that:
o Can handle all aspects of the IT support for Hadoop.
o Can provide comprehensive SLAs.
o May offer hardware optimized for Hadoop.
Thank You
Contact:
steve@altiscale.com
Santosh.jha@aziksa.com


Editor's notes

  1. Welcome to Hadoop Summit 2014.
  2. Examples: IT engineer working to create reports
  3. Thank you for your time today. Hope this has been helpful.