Taming the Elephant - Learn how
    Monsanto manages their Hadoop clusters
    to enable Genome/Sequence processing

          Erich Hochmuth          Bala Venkatrao
         Mark Seidenstricker      Aparna Ramani

Hadoop World 2012, New York, October 25, 2012
Agenda
• Introductions
• Monsanto Hadoop Use Case
     • Operational Challenges
     • How Monsanto leverages Cloudera Manager & Product Demo
     • Key benefits of using Cloudera Manager
•   Cloudera Manager
     • Overview
     • Key Features
     • Roadmap
•   Q&A

Introductions
    • Monsanto
      • Erich Hochmuth – R&D IT Data & Analytics Lead
      • Mark Seidenstricker – Infrastructure R&D Architect


    • Cloudera
       • Bala Venkatrao – Director, Products
       • Aparna Ramani – Director, Engineering



Monsanto Serves Farmers Around the World
    Working With Growers Large and Small, Row Crops and Vegetables




Monsanto’s Approach to Driving Yield
    A System of Agriculture Working Together to Boost Productivity




    • BREEDING: The art and science of combining genetic material to produce a new seed
    • BIOTECHNOLOGY: The science of improving plants by inserting genes into their DNA
    • AGRONOMICS: The farm management practices involved in growing plants

Increasing Yield through Big Data
    At the Cornerstone of Yield Increases is Information & Analytics
                                            Increased Yield




    Variety
         •   Raw sequence data
         •   Unstructured sensor data
         •   Poly-structured genomic data
         •   Spatial data

    Volume
         •   Petabytes (PBs) of next-generation sequencing (NGS) data
         •   Tens of terabytes (TBs) of genomic data
         •   TBs of yield data
         •   Billions of genotyping data points

    Velocity
         •   Tens of millions of yield data points per day
         •   Hundreds of millions of genotyping data points per day
         •   TBs of NGS data per week

What are the Challenges of managing a Hadoop Cluster?
    Software Provisioning & Configuration Management
        •   Automated & simplified installation/patch management
        •   Streamlined cluster configuration

    Enterprise-ready Tools
        •   Enterprise-grade monitoring & management capabilities
        •   Integration with existing enterprise IT stack

    Reporting & Monitoring
        •   Proactive monitoring & alerting
        •   Capacity planning

    Support
        •   Midwest Location
        •   Lack of Hadoop expertise


What are the Solutions?
    With Cloudera Manager, you get…
    Intuitive Management Console
         •   Mission control style dashboard for entire cluster
         •   Centralized management of entire Hadoop ecosystem
         •   Treat the cluster as an appliance
         •   Configuration change audit & validation
    Integration with Enterprise IT Management Tools
         •   Connect to Corporate LDAP
         •   Cloudera Manager API integrates with existing BMC platform
    Comprehensive Monitoring & Alerting
         •   Proactive service level alerts
         •   Summarized cluster level graphs & charts
         •   Real-time series charts (MapReduce & HBase)
    Historical Cluster Metrics/Reports
         •   Capacity planning - Disk usage/ Slot Capacity
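Disk-usage capacity planning comes down to simple arithmetic: divide the remaining usable space by the raw growth rate, remembering that HDFS stores each block several times. The function below is a minimal sketch of that calculation. It assumes the HDFS default replication factor of 3. It also assumes a hypothetical 25% headroom held back for non-HDFS data such as MapReduce intermediate output. All figures in the example are made up for illustration and do not describe Monsanto's cluster.

```python
import math


def weeks_until_full(raw_capacity_tb: float,
                     used_tb: float,
                     weekly_ingest_tb: float,
                     replication: int = 3,
                     headroom: float = 0.25) -> float:
    """Estimate how many weeks remain before usable HDFS space runs out.

    raw_capacity_tb:  total raw disk across all DataNodes, in TB
    used_tb:          raw space already consumed (all replicas), in TB
    weekly_ingest_tb: new data written per week, before replication, in TB
    replication:      HDFS block replication factor (HDFS defaults to 3)
    headroom:         hypothetical fraction of raw disk held back for
                      non-HDFS use such as MapReduce intermediate output
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable_tb = raw_capacity_tb * (1.0 - headroom)
    remaining_tb = usable_tb - used_tb
    if remaining_tb <= 0:
        return 0.0  # already at or past the planning limit
    # Every logical byte is stored `replication` times on disk.
    raw_growth_tb = weekly_ingest_tb * replication
    if raw_growth_tb <= 0:
        return math.inf  # no growth, so the cluster never fills
    return remaining_tb / raw_growth_tb


if __name__ == "__main__":
    # Hypothetical figures: 1,000 TB raw, 300 TB used, 5 TB of new data per week.
    print(weeks_until_full(1000, 300, 5))  # 30.0
```

For example, with 1,000 TB raw and 25% headroom, 750 TB is usable. 300 TB of that is already consumed, which leaves 450 TB. Replicated three times, 5 TB/week of new data consumes 15 TB/week of raw disk, so the cluster fills in about 30 weeks.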


What are the Benefits of Cloudera Manager?
    Lowers the barrier for Hadoop administration
        •   No need to rely solely on experts

        •   Reduces the number of administrators needed

    Provides a “one-stop” holistic view
        •   Easy to understand how the overall cluster is performing

    Includes pre-tuned configuration with best practices
        •   Get straight to solving the business problem

    Integrates with Cloudera support
        •   Leverage the real experts, not just for bugs
Cloudera Enterprise – The Platform for Big Data




Why Do You Need Cloudera Manager?
     Complexity: Hadoop is more than a dozen services running across many machines
        • Hundreds of hardware components
        • Thousands of settings
        • Limitless permutations

     Context: Hadoop is a system, not just a collection of parts
        • Everything is interrelated
        • Raw data about individual pieces is not enough
        • Must extract what’s important

     Efficiency: Managing Hadoop with multiple tools & manual processes takes longer
        • Complicated, error-prone workflows
        • Longer issue resolution
        • Lack of consistent & repeatable processes

Cloudera Manager
     End-to-End Administration for CDH




     1   Deploy
         Install, configure & start your cluster in 3 simple steps



     2 Configure & Optimize
         Ensure optimal settings for all hosts & services




     3 Monitor, Diagnose & Report
         Find & fix problems quickly, view current &
         historical activity & resource usage



Managing Complexity
       One Tool For Everything
     • Deployment & configuration
     • Monitoring
     • Workflows
     • Events & alerts
     • Log search
     • Diagnostics
     • Reporting
     • Activity monitoring

 [Figure: do-it-yourself tooling compared with Cloudera Enterprise across these capabilities]

      “In a recent Cloudera survey, more than 95% of respondents emphasized the importance of having a single end-to-end tool to manage their Hadoop operations.”
Raw Data vs. Hadoop Intelligence
     Providing Context




     1   Smart Configuration
         Auto-sets configurations & guards against user error

     2   Workflows
         Ensures that multi-step tasks are accomplished completely & in the correct sequence

     3   Dependencies
         Aware of how a particular action affects the rest of the cluster & manages the impact

     4   Events & Alerts
         Makes you aware of what’s important at a Hadoop system level

     5   History
         Compares current & past activities for context
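The "Workflows" and "Dependencies" points describe the same core idea. A cluster tool must start services only after the services they rely on, and it must know what else is affected when one service is touched. The following is a minimal sketch of that idea using a topological sort from Python's standard `graphlib` module. The dependency map is a hypothetical, simplified example chosen for illustration. It is not Cloudera Manager's internal model, and real deployments may have additional edges, such as HDFS HA relying on ZooKeeper.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified dependency map: each service lists the services
# that must already be running before it starts.
SERVICE_DEPS: dict[str, set[str]] = {
    "zookeeper": set(),
    "hdfs": set(),
    "mapreduce": {"hdfs"},
    "hbase": {"hdfs", "zookeeper"},
    "hive": {"mapreduce"},
}


def start_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps: dict[str, set[str]]) -> list[str]:
    """Stop in reverse start order so no running service loses a dependency."""
    return list(reversed(start_order(deps)))


def dependents(deps: dict[str, set[str]], service: str) -> set[str]:
    """Return every service that directly or transitively depends on `service`.

    These are the services affected if `service` is stopped or restarted.
    """
    affected: set[str] = set()
    frontier = [service]
    while frontier:
        current = frontier.pop()
        for name, requires in deps.items():
            if current in requires and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


if __name__ == "__main__":
    print(start_order(SERVICE_DEPS))
    print(sorted(dependents(SERVICE_DEPS, "hdfs")))  # ['hbase', 'hive', 'mapreduce']
```

A topological sort guarantees only that dependencies come first. Services with no relationship to each other, such as ZooKeeper and HDFS here, may come out in either order and could be started in parallel.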

Cloudera Manager Key Features
     •   Installs the complete Hadoop stack in minutes via a wizard-based interface
     •   Gives you complete, end-to-end visibility and control over your Hadoop cluster from a single interface
     •   Manages multiple clusters from a single instance of Cloudera Manager
     •   Integrates with Active Directory
     •   Establishes the time context globally for almost all views
     •   Correlates jobs, activities, logs, system changes, configuration changes and service metrics along a single timeline to simplify diagnosis
     •   Sets server roles, configures services and manages security across the cluster
     •   Gracefully starts, stops and restarts services as needed
     •   Supports Administrator and Read-Only users
     •   Maintains a complete record of configuration changes, with the ability to roll back to previous states
     •   Monitors dozens of service performance metrics and alerts you when they approach critical thresholds
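Threshold alerting of the kind described in the last feature is often built from two levels per metric, a warning threshold and a critical threshold. Alerts are raised only when the state changes, so a metric hovering above a line does not flood administrators. The sketch below shows that pattern. The good/concerning/bad labels are modeled loosely on the health states Cloudera Manager displays. The thresholds and sample values are hypothetical, and the code is not Cloudera Manager's implementation.

```python
from enum import Enum


class Health(Enum):
    # Labels modeled loosely on Cloudera Manager's health states (illustrative only).
    GOOD = "good"
    CONCERNING = "concerning"
    BAD = "bad"


def classify(value: float, warning: float, critical: float,
             higher_is_worse: bool = True) -> Health:
    """Map one metric sample to a health state using two thresholds.

    For metrics where lower is worse (e.g. percent free disk), pass
    higher_is_worse=False with warning > critical.
    """
    if not higher_is_worse:
        # Negate everything so the same ">=" comparisons apply.
        value, warning, critical = -value, -warning, -critical
    if warning > critical:
        raise ValueError("the warning threshold must be reached before the critical one")
    if value >= critical:
        return Health.BAD
    if value >= warning:
        return Health.CONCERNING
    return Health.GOOD


def transitions(samples, warning, critical, higher_is_worse=True):
    """Yield (index, old_state, new_state) only when the state changes,
    so a metric that stays above a threshold raises one alert, not many."""
    previous = Health.GOOD
    for i, sample in enumerate(samples):
        current = classify(sample, warning, critical, higher_is_worse)
        if current is not previous:
            yield i, previous, current
            previous = current


if __name__ == "__main__":
    # Hypothetical disk-used percentages: warn at 80%, critical at 95%.
    for event in transitions([50, 85, 86, 97, 60], warning=80, critical=95):
        print(event)
```

In practice each yielded transition would be routed to email or another alerting channel. Only three alerts fire for the five samples in the example.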
Cloudera Manager Key Features (Cont’d)
     •   Gathers Hadoop logs from across the cluster so you can view and search them in one place
     •   Scans Hadoop logs for irregularities and warns you before they impact the cluster
     •   Creates and aggregates relevant Hadoop events pertaining to system health, log messages, user services and activities, and makes them available for alerting and searching
     •   Generates email alerts when certain events occur
     •   Consolidates all cluster activity into a single, real-time view
     •   Displays information about the hosts in your cluster, including status, resident memory, virtual memory and roles
     •   Visualizes health status and metrics across the cluster so you can quickly identify problem nodes and take action
     •   Visualizes current and historical disk usage by user, group and directory
     •   Tracks MapReduce activity on the cluster by job or user
     •   Takes a snapshot of the cluster state and automatically sends it to Cloudera support to assist with resolution
     •   Integrates easily with your existing enterprise-wide management and monitoring tools
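Integration with existing enterprise tools, such as Monsanto's BMC platform, typically goes through the Cloudera Manager REST API. An external system polls the API for service health and forwards anything abnormal. The sketch below shows the polling and filtering half using only the Python standard library. It assumes the default Admin Console port 7180, HTTP basic authentication, and the `/api/v<N>/clusters/<cluster>/services` endpoint. It also assumes a response of the form `{"items": [...]}` in which each service carries `name` and `healthSummary` fields. These details reflect a reading of the API documentation and should be checked against the API version you run. The forwarding step into BMC is left out because its interface is not described here.

```python
import base64
import json
import urllib.parse
import urllib.request

DEFAULT_PORT = 7180  # assumed default Cloudera Manager Admin Console port


def services_url(host: str, cluster: str, port: int = DEFAULT_PORT,
                 api_version: int = 1) -> str:
    """Build the (assumed) REST URL that lists a cluster's services."""
    # Cluster names may contain spaces, so percent-encode the path segment.
    cluster_path = urllib.parse.quote(cluster, safe="")
    return f"http://{host}:{port}/api/v{api_version}/clusters/{cluster_path}/services"


def fetch_json(url: str, user: str, password: str, timeout: float = 10.0) -> dict:
    """GET a JSON document using HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def unhealthy_services(payload: dict) -> list[tuple[str, str]]:
    """Return (name, healthSummary) for every service not reporting GOOD."""
    return [
        (svc.get("name", "?"), svc.get("healthSummary", "UNKNOWN"))
        for svc in payload.get("items", [])
        if svc.get("healthSummary") != "GOOD"
    ]
```

Usage would look like `unhealthy_services(fetch_json(services_url("cm.example.com", "Cluster 1"), user, password))`, where the host, cluster name and credentials are placeholders. The network call and the parsing are kept in separate functions so the filtering logic can be tested against a saved response without a live cluster.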

Cloudera Manager Roadmap
     •   Cloudera Manager 4.1 – Released October 24, 2012
           • Platform Support for CDH4.1
           • Cloudera Impala management & monitoring
           • New monitoring – ZooKeeper, Flume NG
           • Maintenance Mode
           • Host Decommissioning
           • Several Usability Enhancements


     •   Cloudera Manager 4.5 – Early 2013
           •   Rolling Upgrades / Restarts
           •   Enhanced Monitoring, Cluster Heatmaps, etc.
           •   Role Groups Configuration
           •   Cloud Support
           •   Others – SNMP support, Error handling, ISV integration, etc.
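Rolling restarts, listed on the 4.5 roadmap, keep a cluster serving while it is restarted. Hosts are restarted in small batches, and the process halts if a batch does not come back healthy. The sketch below shows that generic technique. It is not Cloudera Manager's implementation. The `restart` and `is_healthy` hooks are hypothetical caller-supplied functions, and `is_healthy` is assumed to wait until a host has settled before answering.

```python
from typing import Callable, List, Sequence


def plan_batches(hosts: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split hosts into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(hosts[i:i + batch_size]) for i in range(0, len(hosts), batch_size)]


def rolling_restart(hosts: Sequence[str], batch_size: int,
                    restart: Callable[[str], None],
                    is_healthy: Callable[[str], bool]) -> List[str]:
    """Restart hosts one batch at a time, halting if a batch fails its health check.

    Returns the hosts that were restarted. Raises RuntimeError as soon as a
    batch reports unhealthy hosts, leaving later batches untouched.
    """
    restarted: List[str] = []
    for batch in plan_batches(hosts, batch_size):
        for host in batch:
            restart(host)  # hypothetical hook, e.g. a call into a management API
        restarted.extend(batch)
        unhealthy = [h for h in batch if not is_healthy(h)]
        if unhealthy:
            raise RuntimeError(f"hosts unhealthy after restart, halting: {unhealthy}")
    return restarted
```

The batch size trades speed against risk. With HDFS replication of 3, for example, restarting worker nodes one or two at a time leaves other replicas of each block available while the batch is down.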


Why Cloudera Manager?
      Simple: End-to-end Hadoop administration in a single tool

      Intelligent: Manages Hadoop at a system level, with Cloudera’s experience realized in software

      Efficient: Simplifies complex workflows & makes administrators more productive

      Best-in-Class: The only enterprise-grade Hadoop management application available

Next Steps
     • Try out the free edition of Cloudera Manager
        •   Download from:
            http://www.cloudera.com/products-services/tools/
        •   Support available via scm-users@cloudera.org


     • For Cloudera Enterprise subscriptions, please contact: sales@cloudera.com

Q&A




Key Features
     Cloudera Manager




Install A Cluster In 3 Simple Steps
     Cloudera Manager Key Features


      1   Find Nodes
          Enter the names of the hosts that will be included in the Hadoop cluster. Click Continue.

      2   Install Components
          Cloudera Manager automatically installs the CDH components on the hosts you specified.

      3   Assign Roles
          Verify the roles of the nodes within your cluster. Make changes as necessary.


View Service Health & Performance
     Cloudera Manager Key Features




Get Host-Level Snapshots
     Cloudera Manager Key Features




Monitor & Diagnose Cluster Workloads
     Cloudera Manager Key Features




Gather, View & Search Hadoop Logs
     Cloudera Manager Key Features




Track Events From Across The Cluster
     Cloudera Manager Key Features




Report On System Performance & Usage
     Cloudera Manager Key Features




Visualize Health Status With Heatmaps
     Cloudera Manager Key Features




Manage Multiple CDH Clusters
     Cloudera Manager Key Features




Easily Configure High Availability
     Cloudera Manager Key Features




Set The Time Context Globally
     Cloudera Manager Key Features





Weitere ähnliche Inhalte

Ähnlich wie Strata + Hadoop World 2012: Taming the Elephant - Learn how Monsanto manages their Hadoop clusters to enable Genome/Sequence processing

How CBS Interactive uses Cloudera Manager to effectively manage their Hadoop ...
How CBS Interactive uses Cloudera Manager to effectively manage their Hadoop ...How CBS Interactive uses Cloudera Manager to effectively manage their Hadoop ...
How CBS Interactive uses Cloudera Manager to effectively manage their Hadoop ...Cloudera, Inc.
 
BlueData Integration with Cloudera Manager
BlueData Integration with Cloudera ManagerBlueData Integration with Cloudera Manager
BlueData Integration with Cloudera ManagerBlueData, Inc.
 
What the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityWhat the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityCloudera, Inc.
 
DevOps - Top Trends In 2019
DevOps - Top Trends In 2019DevOps - Top Trends In 2019
DevOps - Top Trends In 2019Vikash Karuna
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Cloudera, Inc.
 
Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera, Inc.
 
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...VMworld
 
Newvem Community - Cloud Management
Newvem Community - Cloud ManagementNewvem Community - Cloud Management
Newvem Community - Cloud ManagementAndreas Chatzakis
 
Cloudera Search Webinar: Big Data Search, Bigger Insights
Cloudera Search Webinar: Big Data Search, Bigger InsightsCloudera Search Webinar: Big Data Search, Bigger Insights
Cloudera Search Webinar: Big Data Search, Bigger InsightsCloudera, Inc.
 
Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & ExtensibilityCloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & ExtensibilityClouderaUserGroups
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewYafang Chang
 
Cloumon Product Introduction
Cloumon Product IntroductionCloumon Product Introduction
Cloumon Product IntroductionGruter
 
VMware vROps Management Pack for Hadoop
VMware vROps Management Pack for HadoopVMware vROps Management Pack for Hadoop
VMware vROps Management Pack for HadoopBlue Medora
 
Virtualization Management With Quest V Foglight
Virtualization Management With Quest V FoglightVirtualization Management With Quest V Foglight
Virtualization Management With Quest V FoglightChris Roberts
 
10 Key Steps for Moving from Legacy Infrastructure to the Cloud
10 Key Steps for Moving from Legacy Infrastructure to the Cloud10 Key Steps for Moving from Legacy Infrastructure to the Cloud
10 Key Steps for Moving from Legacy Infrastructure to the CloudNGINX, Inc.
 
Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Cloudera, Inc.
 
5 Best Practices to Achieve Operational Excellence with Big Data Apps
5 Best Practices to Achieve Operational Excellence with Big Data Apps5 Best Practices to Achieve Operational Excellence with Big Data Apps
5 Best Practices to Achieve Operational Excellence with Big Data AppsDriven Inc.
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanJim Kaskade
 
The IBM dashboard for operational metrics
The IBM dashboard for operational metricsThe IBM dashboard for operational metrics
The IBM dashboard for operational metricsPlatform CF
 
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in ProductionUpgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in ProductionCloudera, Inc.
 

Ähnlich wie Strata + Hadoop World 2012: Taming the Elephant - Learn how Monsanto manages their Hadoop clusters to enable Genome/Sequence processing (20)

How CBS Interactive uses Cloudera Manager to effectively manage their Hadoop ...
How CBS Interactive uses Cloudera Manager to effectively manage their Hadoop ...How CBS Interactive uses Cloudera Manager to effectively manage their Hadoop ...
How CBS Interactive uses Cloudera Manager to effectively manage their Hadoop ...
 
BlueData Integration with Cloudera Manager
BlueData Integration with Cloudera ManagerBlueData Integration with Cloudera Manager
BlueData Integration with Cloudera Manager
 
What the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityWhat the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and Visibility
 
DevOps - Top Trends In 2019
DevOps - Top Trends In 2019DevOps - Top Trends In 2019
DevOps - Top Trends In 2019
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
 
Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7
 
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
 
Newvem Community - Cloud Management
Newvem Community - Cloud ManagementNewvem Community - Cloud Management
Newvem Community - Cloud Management
 
Cloudera Search Webinar: Big Data Search, Bigger Insights
Cloudera Search Webinar: Big Data Search, Bigger InsightsCloudera Search Webinar: Big Data Search, Bigger Insights
Cloudera Search Webinar: Big Data Search, Bigger Insights
 
Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & ExtensibilityCloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop Overview
 
Cloumon Product Introduction
Cloumon Product IntroductionCloumon Product Introduction
Cloumon Product Introduction
 
VMware vROps Management Pack for Hadoop
VMware vROps Management Pack for HadoopVMware vROps Management Pack for Hadoop
VMware vROps Management Pack for Hadoop
 
Virtualization Management With Quest V Foglight
Virtualization Management With Quest V FoglightVirtualization Management With Quest V Foglight
Virtualization Management With Quest V Foglight
 
10 Key Steps for Moving from Legacy Infrastructure to the Cloud
10 Key Steps for Moving from Legacy Infrastructure to the Cloud10 Key Steps for Moving from Legacy Infrastructure to the Cloud
10 Key Steps for Moving from Legacy Infrastructure to the Cloud
 
Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...
 
5 Best Practices to Achieve Operational Excellence with Big Data Apps
5 Best Practices to Achieve Operational Excellence with Big Data Apps5 Best Practices to Achieve Operational Excellence with Big Data Apps
5 Best Practices to Achieve Operational Excellence with Big Data Apps
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
The IBM dashboard for operational metrics
The IBM dashboard for operational metricsThe IBM dashboard for operational metrics
The IBM dashboard for operational metrics
 
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in ProductionUpgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
 

Mehr von Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mehr von Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Strata + Hadoop World 2012: Taming the Elephant - Learn how Monsanto manages their Hadoop clusters to enable Genome/Sequence processing

  • 1. Taming the Elephant - Learn how Monsanto manages their Hadoop clusters to enable Genome/Sequence processing Erich Hochmuth Bala Venkatrao Mark Seidenstricker Aparna Ramani • Hadoop World 2012, New York, October 25th, 2012
  • 2. Agenda • Introductions • Monsanto Hadoop Use Case • Operational Challenges • How Monsanto leverages Cloudera Manager & Product Demo • Key benefits of using Cloudera Manager • Cloudera Manager • Overview • Key Features • Roadmap • Q&A 2
  • 3. Introductions • Monsanto • Erich Hochmuth – R&D IT Data & Analytics Lead • Mark Seidenstricker – Infrastructure R&D Architect • Cloudera • Bala Venkartrao – Director, Products • Aparna Ramani – Director, Engineering 3
  • 4. Monsanto Serves Farmers Around the World Working With Growers Large and Small, Row Crops and Vegetables 4
  • 5. Monsanto’s Approach to Driving Yield A System of Agriculture Working Together to Boost Productivity BREEDING BIOTECHNOLOGY AGRONOMICS The art and science The science of improving The farm management of combining genetic material plants by inserting genes practices involved in to produce a new seed into their DNA growing plants 5
  • 6. Increasing Yield through Big Data At the Cornerstone of Yield Increases is Information & Analytics Increased Yield Variety Volume Velocity • Raw Sequence data • PBs of NGS data • 10’s millions yield dps/day • Unstructured sensor data • 10’s TBs of genomic data • 100’s million genotyping dps/day • Poly-structured genomic data • TBs of yield data • TBs of NGS data/week • Spatial data • Billions of genotyping dps 6
  • 7. What are the Challenges of Managing a Hadoop Cluster? Software Provisioning & Configuration Management: • Automated & simplified installation/patch management • Streamlined cluster configuration Enterprise-ready Tools: • Enterprise-grade monitoring & management capabilities • Integration with existing enterprise IT stack Reporting & Monitoring: • Proactive monitoring & alerting • Capacity planning Support: • Midwest location • Lack of Hadoop expertise 7
  • 8. What are the Solutions? With Cloudera Manager, you get… Intuitive Management Console • Mission control style dashboard for entire cluster • Centralized management of entire Hadoop ecosystem • Treat the cluster as an appliance • Configuration change audit & validation Integration with Enterprise IT Management Tools • Connect to Corporate LDAP • Cloudera Manager API integrates with existing BMC platform Comprehensive Monitoring & Alerting • Proactive service level alerts • Summarized cluster level graphs & charts • Real-time series charts (MapReduce & HBase) Historical Cluster Metrics/Reports • Capacity planning - Disk usage/ Slot Capacity 8
  • 9. What are the Benefits of Cloudera Manager? Lowers the barrier for Hadoop administration: • No need to rely solely on experts • Reduces the number of administrators needed Provides a “one-stop” holistic view: • Easy to understand how the overall cluster is performing Includes pre-tuned configuration with best practices: • Get straight to solving the business problem Integrates with Cloudera support: • Leverage the real experts…not just for bugs 9
  • 10. Cloudera Enterprise – The Platform for Big Data 10
  • 11. Why You Need Cloudera Manager • Complexity: Hadoop is more than a dozen services running across many machines • Hundreds of hardware components • Thousands of settings • Limitless permutations • Context: Hadoop is a system, not just a collection of parts • Everything is interrelated • Raw data about individual pieces is not enough • Must extract what’s important • Efficiency: Managing Hadoop with multiple tools & manual processes takes longer • Complicated, error-prone workflows • Longer issue resolution • Lack of consistent & repeatable processes 11
  • 12. Cloudera Manager End-to-End Administration for CDH 1 Deploy Install, configure & start your cluster in 3 simple steps 2 Configure & Optimize Ensure optimal settings for all hosts & services 3 Monitor, Diagnose & Report Find & fix problems quickly, view current & historical activity & resource usage 12
  • 13. Managing Complexity One Tool For Everything DEPLOYMENT & ACTIVITY MONITORING WORKFLOWS EVENTS & ALERTS LOG SEARCH DIAGNOSTICS REPORTING CONFIGURATION MONITORING DO-IT-YOURSELF + CLOUDERA ENTERPRISE “In a recent Cloudera survey, >95% of respondents emphasized the importance of having a single end-to-end tool to manage their Hadoop Operations” 13
  • 14. Raw Data vs. Hadoop Intelligence: Providing Context • 1 Smart Configuration: Auto-sets configurations & guards against user error • 2 Workflows: Ensures that multi-step tasks are accomplished completely & in the correct sequence • 3 Dependencies: Aware of how a particular action affects the rest of the cluster & manages the impact • 4 Events & Alerts: Makes you aware of what’s important at a Hadoop system level • 5 History: Compares current & past activities for context 14
  • 15. Cloudera Manager Key Features • Installs the complete Hadoop stack in minutes via a wizard-based interface • Gives you complete, end-to-end visibility and control over your Hadoop cluster from a single interface • Allows you to manage multiple clusters from a single instance of Cloudera Manager • Integrates Cloudera Manager with Active Directory • Establishes the time context globally for almost all views • Correlates jobs, activities, logs, system changes, configuration changes and service metrics along a single timeline to simplify diagnosis • Sets server roles, configures services and manages security across the cluster • Gracefully starts, stops and restarts services as needed • Supports Administrator and Read-Only users • Maintains a complete record of configuration changes with the ability to roll back to previous states • Monitors dozens of service performance metrics and alerts you when you approach critical thresholds 15
  • 16. Cloudera Manager Key Features (Contd.) • Gathers, displays and searches Hadoop logs collected from across the cluster • Scans Hadoop logs for irregularities and warns you before they impact the cluster • Creates and aggregates relevant Hadoop events pertaining to system health, log messages, user services and activities, and makes them available for alerting and searching • Generates email alerts when certain events occur • Consolidates all cluster activity into a single, real-time view • Displays information about the hosts in your cluster, including status, resident memory, virtual memory and roles • Visualizes health status and metrics across the cluster to quickly identify problem nodes and take action • Visualizes current and historical disk usage by user, group and directory • Tracks MapReduce activity on the cluster by job or user • Takes a snapshot of the cluster state and automatically sends it to Cloudera support to assist with resolution • Integrates easily with your existing enterprise-wide management and monitoring tools 16
  • 17. Cloudera Manager Roadmap • Cloudera Manager 4.1 – Released 10/24 • Platform Support for CDH4.1 • Cloudera Impala management & monitoring • New monitoring – ZooKeeper, Flume NG • Maintenance Mode • Host Decommissioning • Several Usability Enhancements • Cloudera Manager 4.5 – Early 2013 • Rolling Upgrades/Restarts • Enhanced Monitoring, Cluster Heatmaps etc. • Role Groups Configuration • Cloud Support • Others – SNMP support, Error handling, ISV integration etc. 17
  • 18. Why Cloudera Manager? • End-to-End: Simple administration in a single tool • Hadoop Intelligent: Manages Hadoop at a system level, with Cloudera’s experience realized in software • Efficient: Simplifies complex workflows & makes administrators more productive • Best-in-Class: The only enterprise-grade Hadoop management application available 18
  • 19. Next Steps • Try out FREE edition of Cloudera Manager • Download from: http://www.cloudera.com/products-services/tools/ • Support available via scm-users@cloudera.org • For Cloudera Enterprise subscriptions, please contact: sales@cloudera.com 19
  • 21.
  • 22. Key Features Cloudera Manager 22
  • 23. Install A Cluster In 3 Simple Steps Cloudera Manager Key Features • 1 Find Nodes: Enter the names of the hosts which will be included in the Hadoop cluster. Click Continue. • 2 Install Components: Cloudera Manager automatically installs the CDH components on the hosts you specified. • 3 Assign Roles: Verify the roles of the nodes within your cluster. Make changes as necessary. 23
  • 24. View Service Health & Performance Cloudera Manager Key Features 24
  • 25. Get Host-Level Snapshots Cloudera Manager Key Features 25
  • 26. Monitor & Diagnose Cluster Workloads Cloudera Manager Key Features 26
  • 27. Gather, View & Search Hadoop Logs Cloudera Manager Key Features 27
  • 28. Track Events From Across The Cluster Cloudera Manager Key Features 28
  • 29. Report On System Performance & Usage Cloudera Manager Key Features 29
  • 30. Visualize Health Status With Heatmaps Cloudera Manager Key Features 30
  • 31. Manage Multiple CDH Clusters Cloudera Manager Key Features 31
  • 32. Easily Configure High Availability Cloudera Manager Key Features 32
  • 33. Set The Time Context Globally Cloudera Manager Key Features 33
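Slides 8 and 16 describe feeding cluster health from Cloudera Manager into existing enterprise tools (at Monsanto, a BMC platform) through the Cloudera Manager API. The Python sketch below shows one way such a bridge might look. It assumes the REST endpoint `/api/v1/clusters/{cluster}/services` on the default port 7180 with HTTP basic authentication, and it treats the `healthSummary` values `CONCERNING` and `BAD` as alert-worthy. The helper names and the `send` callback standing in for the BMC feed are hypothetical and not part of any product.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

# Assumption: health states worth escalating to the external monitoring tool.
ALERT_STATES = {"CONCERNING", "BAD"}


def build_services_url(host: str, cluster: str, port: int = 7180) -> str:
    """Build the (assumed) v1 endpoint that lists a cluster's services."""
    return f"http://{host}:{port}/api/v1/clusters/{urllib.parse.quote(cluster)}/services"


def fetch_services(host: str, cluster: str, user: str, password: str) -> Dict:
    """GET the service list with HTTP basic auth and return the parsed JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        build_services_url(host, cluster),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def unhealthy_services(payload: Dict) -> List[Dict[str, str]]:
    """Keep only services whose healthSummary is in ALERT_STATES."""
    return [
        {"name": s["name"], "type": s.get("type", ""), "health": s["healthSummary"]}
        for s in payload.get("items", [])
        if s.get("healthSummary") in ALERT_STATES
    ]


def forward_alerts(alerts: List[Dict[str, str]], send: Callable[[str], None]) -> int:
    """Pass one message per alert to `send` (a hypothetical hook into the
    enterprise tool) and return how many messages were sent."""
    for alert in alerts:
        send(f"{alert['name']} ({alert['type']}) health={alert['health']}")
    return len(alerts)
```

A scheduler would call `fetch_services`, filter the result with `unhealthy_services`, and hand the list to `forward_alerts`. Keeping the network call separate from the filtering means the alerting rules can be checked against a saved JSON response without a live Cloudera Manager instance.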

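Slide 8 lists capacity planning (disk usage and slot capacity) among the reports Monsanto relies on, and slide 6 cites petabytes of NGS data arriving at terabytes per week. The arithmetic behind a disk-capacity estimate can be sketched in Python as follows. It assumes HDFS's default replication factor of 3 and an illustrative 25% of raw disk kept free for non-HDFS use such as MapReduce intermediate output and growth. The function names and every number in the example are hypothetical, not Monsanto figures.

```python
import math


def raw_capacity_tb(logical_tb: float, replication: int = 3, headroom: float = 0.25) -> float:
    """Raw disk (TB) needed to hold `logical_tb` of data with HDFS replication,
    leaving `headroom` (a fraction of raw capacity) free. Defaults are assumptions."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return logical_tb * replication / (1 - headroom)


def weeks_until_full(free_raw_tb: float, weekly_logical_tb: float, replication: int = 3) -> float:
    """Weeks before free raw space is consumed at a steady weekly ingest rate."""
    return free_raw_tb / (weekly_logical_tb * replication)


def nodes_needed(raw_tb: float, disk_tb_per_node: float) -> int:
    """Data nodes required to supply `raw_tb`, rounded up to whole machines."""
    return math.ceil(raw_tb / disk_tb_per_node)
```

Under these assumptions, 1,000 TB of logical data needs 1000 × 3 / 0.75 = 4,000 TB of raw disk, which at a hypothetical 48 TB per node is 84 nodes, and 600 TB of free raw space absorbs 10 TB/week of new logical data for 20 weeks.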
Editor's Notes

  1. Monsanto is a St. Louis-based agricultural company with one goal in mind: produce more food, fiber and fuel using fewer inputs like water and land, while improving the lives of the people around the world who benefit from our technology. Monsanto utilizes a systems approach to improving upon today’s agricultural offerings: Breeding, Biotechnology, and Advanced Agronomic Practices. These three facets of our approach help farmers improve productivity, reduce the costs of farming, and grow better foods for consumers and better feed for animals. We’re proud to have customers of all kinds, from large-acre, technology-driven row-crop farmers in Central Illinois all the way to farmers in Africa with very small landholdings who are just beginning to realize the benefits of modern agriculture.
  2. Sustainably increasing yield, while more efficiently using inputs and resources, requires every tool at farmers’ disposal. At Monsanto, we’re focused on three pillars for driving yield: breeding, biotechnology and improved agronomic practices. All three are required to meet our goals. Basics of Breeding: Breeding, a technique that has been practiced by farmers for thousands of years, involves bringing together two parent plants to produce a new offspring that contains a mixture of parent characteristics. Monsanto has assembled a pool of elite seed genetics (germplasm) from around the world, and we use cutting-edge technology to help us more quickly, efficiently and accurately find desired traits for breeding. Our primary method is genetic analysis (mapping the DNA of plants) to identify seeds with traits we want, such as improved yield, disease resistance, suitability for a particular climate, and, in the case of vegetables, better taste and nutrition. Basics of Biotechnology: Biotechnology is the process of inserting a gene from one species, like a plant or a bacterium, into another species. We use biotechnology to give plants desirable characteristics (or traits) that often cannot be developed through breeding practices. The traits we develop help farmers produce more of their crop, reduce costs and conserve resources. Examples of these traits include herbicide tolerance, insect resistance and drought tolerance. We are also working to develop traits that will benefit consumers, such as soybeans that produce healthier oils. Basics of Agronomics: Agronomic practices are steps farmers incorporate into their farm management systems to improve soil quality, enhance water use, manage crop residue and improve the environment through better fertilizer management. These steps not only improve a farmer’s bottom line by decreasing input costs, but also improve the environment by decreasing water use and over-fertilization.
Improved agronomics cover a broad range of practices, suitable for any type of farm. For example, a high-tech, high productivity grower may use GPS and computer systems to automate planting for optimal row spacing and varying inputs acre by acre, to produce more and conserve more. A subsistence farmer can see significant benefits by learning about input management and optimal plant spacing to reduce costs and improve yield. Conservation tillage is a broadly applicable technique that preserves topsoil and locks in moisture.