Case Study of BigData use with MapR M7 in the Enterprise Datacenter
  Zeljko Dodlek
  Sales Director DACH
  zdodlek@maprtech.com
  +49 (0) 151 120 555 07
©MapR Technologies - Confidential   1
Agenda



        Ancestry Case Study
        MapR Overview
            Q&A




Ancestry Use Case (page 1)

    What does Ancestry do?
Ancestry.com is an online family history service that uses machine
learning and several other statistical techniques to provide services
such as ancestry information and DNA sequencing to its users.


    Business Challenges?
10 billion records in a 4 PB data store
40,000 record collections (date of birth/death, census, military status, ...)
2+ Million subscribers
10+ Million registered users
DNA matching added to their offering

Ancestry Use Case (page 2)

 Why MapR?
HA requirements for the NameNode & TaskTracker
Easy way to ingest data into the cluster
Safe way to run different jobs on the same cluster
Unified File & Table platform


Configuration
3 separate clusters
* DNA Matching
* Machine Learning
* Data Mining


MapRTech Overview
    Enterprise-grade Hadoop distribution
    Innovations in the data platform, MapReduce, and HBase
    Enabling customers to depend on our Hadoop distribution
      –    No single points of failure
      –    Guaranteed SLAs
      –    Easy to install, run, and expand
    Professional services: installation, consulting, and training
    24x7 support




MapR Distribution




MapR’s value addition




Distribution made for the enterprise

Expanding Hadoop Use Cases


    Hadoop APIs for Hadoop applications
    NFS for file-based applications
    ODBC and JDBC for SQL-based applications
    Real-time applications
    Mission-critical and SLA-dependent applications


Legend: blue = MapR innovations

No NameNode Architecture
Diagram: with other distributions (HDFS Federation), namespaces A to F are split across separate NameNodes, each with its own DataNodes, backed by a NAS appliance. With MapR, the metadata for A to F is distributed and replicated across the nodes of the cluster.

    Other Distributions (HDFS Federation)    MapR
    Multiple single points of failure        HA w/ automatic failover and re-replication
    Limited to 50M files per NameNode        Up to 1T files (> 5000x advantage)
    Performance bottleneck                   Higher performance
    Commercial NAS required                  100% commodity hardware
    Metadata must fit in memory              Metadata is persisted to disk
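
The scale figures on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the file counts are the ones quoted in this deck (50M on the slide, 200M in the editor's notes, 1T for MapR), while the figure of roughly 150 bytes of NameNode heap per namespace object is a commonly cited rule of thumb and an assumption here, not a number from the presentation. The function names are hypothetical.

```python
# Figures quoted in this deck.
FILES_PER_NAMENODE_SLIDE = 50_000_000      # slide: "Limited to 50M files per NameNode"
FILES_PER_NAMENODE_NOTES = 200_000_000     # editor's notes: "200M at the maximum"
MAPR_FILES = 1_000_000_000_000             # slide: "Up to 1T files"

# Assumption (not from the deck): a widely quoted rule of thumb is that each
# namespace object (file or block) costs about 150 bytes of NameNode heap.
BYTES_PER_OBJECT = 150


def scale_advantage(mapr_files: int, namenode_files: int) -> int:
    """Ratio of the claimed MapR file limit to a per-NameNode file limit."""
    return mapr_files // namenode_files


def namenode_heap_gib(files: int, blocks_per_file: int = 1,
                      bytes_per_object: int = BYTES_PER_OBJECT) -> float:
    """Rough heap needed when all metadata must fit in memory."""
    objects = files * (1 + blocks_per_file)  # one file object plus its blocks
    return objects * bytes_per_object / 2**30


if __name__ == "__main__":
    print(scale_advantage(MAPR_FILES, FILES_PER_NAMENODE_NOTES))
    print(scale_advantage(MAPR_FILES, FILES_PER_NAMENODE_SLIDE))
    print(round(namenode_heap_gib(FILES_PER_NAMENODE_NOTES), 1))
```

Under these numbers, the "> 5000x" claim corresponds to the 200M-file limit mentioned in the editor's notes; the 50M figure on the slide would give a ratio of 20000.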

Simplifying HBase Architecture


Stacks, listed top to bottom:

    Other Distributions:    HBase / JVM / DFS / JVM / ext3 / Disks
    MapR (M5):              HBase / JVM / MapR / Disks
    MapR M7:                Unified / Disks

Selected MapR Customers

Use cases, grouped per customer as on the slide:

    Global threat analytics; virus analysis
    Intrusion detection & prevention; forensic analysis
    Recommendation engine; family tree connections
    Major credit card company: recommendation engine; fraud detection and prevention
    Log analysis; HBase
    Clickstream analysis; quality profiling/field failure analysis
    Fraud detection; channel analytics
    Advertising exchange analysis and optimization
    Customer sentiment; network analytics
    Customer revenue analytics; ETL offload
    Customer targeting; social media analysis
    Monitors and measures behavior of online shoppers

Thank You






Editor's Notes

  1. MapR’s innovations have also expanded the use cases that are possible with Hadoop. In addition to supporting the full Hadoop API set, MapR provides NFS support, so any file-based application can access the cluster with no changes or rewrites. MapR provides ODBC support, so any database application or SQL-based tool can access and manipulate data in a MapR cluster. MapR also supports real-time streaming access, which greatly expands the applications that are possible with Hadoop beyond batch processing. Finally, the full HA, DR, and data protection capabilities of MapR allow mission-critical applications to be deployed safely and allow administrators to meet stringent SLA targets.
  2. The NameNode in Hadoop today is a single point of failure, a scalability limitation, and a performance bottleneck. With MapR there is no dedicated NameNode; the NameNode function is distributed across the cluster. This provides major advantages in terms of HA, data loss avoidance, scalability, and performance. With other distributions you have a bottleneck regardless of the number of nodes in the cluster. The maximum number of files those distributions can support is 200M, and that only with an extremely high-end server. 50% of the Hadoop processing at Facebook goes to packing and unpacking files to work around this limitation. MapR scales uniformly.
  3. (Ed. note: this slide is a great whiteboard slide to summarize M7.) The stack on the left represents the HBase architecture found in all other distributions. HBase runs on a JVM and stores its data in the HDFS layer, which runs on another JVM and in turn stores its data in the Linux file system (ext3), which writes the data to disk. This stack results in a lot of administrative tasks, performance issues, and reliability issues. Much of the infrastructure within HBase is an attempt to make up for the deficiencies in HDFS: you basically have a database that needs random I/O running on top of a write-once file system. The middle stack shows how MapR simplified the lower part of the stack with the M5 edition, which replaced HDFS and the dependency on the Linux file system with a random read/write storage layer. However, HBase is still a separate infrastructure running on top of the storage layer within M5. The region servers are separate, and users still experience downtime and delays when recovering from node failures and snapshots. With M7, on the far right, MapR has unified tables and files into a single data platform. We’ve eliminated the separate HBase infrastructure, so the environment is much simpler to manage without the various redundant components. We’ve provided a uniform data management layer and a consistent data protection layer across files and tables. Recovery from node failures takes seconds, there is 100% data locality, and HBase can read directly from snapshots. Files and tables share the same namespace, volumes, and directories.
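
Note 1 says that a file-based application can access the cluster over NFS with no changes. A minimal sketch of what that means in practice: the code below uses nothing but ordinary Python file I/O, and the only cluster-specific detail is the directory it is pointed at. The mount point `/mapr/example.cluster` and the helper names are hypothetical and chosen for illustration; they are not taken from the presentation.

```python
from pathlib import Path

# Hypothetical NFS mount point of the cluster (an assumption for this sketch).
CLUSTER_ROOT = Path("/mapr/example.cluster")


def append_record(root: Path, relative: str, line: str) -> None:
    """Append one line to a file below the mounted cluster root.

    Appending in place relies on a read/write storage layer, which is what
    note 3 describes as the replacement for write-once HDFS.
    """
    target = root / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")


def count_records(root: Path, relative: str) -> int:
    """Count the lines of a file using plain file reads, no Hadoop client."""
    with (root / relative).open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)
```

Because the functions take the root directory as a parameter, the same code can be exercised against a local temporary directory and later pointed at the mounted cluster path, for example `append_record(CLUSTER_ROOT, "logs/app.log", "started")`.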