Strata + Hadoop World 2012: Taming the Elephant - Learn how Monsanto manages their Hadoop clusters to enable Genome/Sequence processing
1. Taming the Elephant - Learn how Monsanto manages their Hadoop clusters to enable Genome/Sequence processing
Erich Hochmuth, Mark Seidenstricker (Monsanto)
Bala Venkatrao, Aparna Ramani (Cloudera)
• Hadoop World 2012, New York, October 25th, 2012
2. Agenda
• Introductions
• Monsanto Hadoop Use Case
• Operational Challenges
• How Monsanto leverages Cloudera Manager & Product Demo
• Key benefits of using Cloudera Manager
• Cloudera Manager
• Overview
• Key Features
• Roadmap
• Q&A
3. Introductions
• Monsanto
• Erich Hochmuth – R&D IT Data & Analytics Lead
• Mark Seidenstricker – Infrastructure R&D Architect
• Cloudera
• Bala Venkatrao – Director, Products
• Aparna Ramani – Director, Engineering
4. Monsanto Serves Farmers Around the World
Working With Growers Large and Small, Row Crops and Vegetables
5. Monsanto’s Approach to Driving Yield
A System of Agriculture Working Together to Boost Productivity
BREEDING
• The art and science of combining genetic material to produce a new seed
BIOTECHNOLOGY
• The science of improving plants by inserting genes into their DNA
AGRONOMICS
• The farm management practices involved in growing plants
6. Increasing Yield through Big Data
At the Cornerstone of Yield Increases is Information & Analytics
Increased Yield
Variety
• Raw sequence data
• Unstructured sensor data
• Poly-structured genomic data
• Spatial data
Volume
• PBs of NGS data
• Tens of TBs of genomic data
• TBs of yield data
• Billions of genotyping dps
Velocity
• Tens of millions of yield dps/day
• Hundreds of millions of genotyping dps/day
• TBs of NGS data/week
7. What are the Challenges of managing a Hadoop Cluster?
Software Provisioning & Configuration Management
• Automated & simplified installation/patch management
• Streamlined cluster configuration
Enterprise-ready Tools
• Enterprise-grade monitoring & management capabilities
• Integration with existing enterprise IT stack
Reporting & Monitoring
• Proactive monitoring & alerting
• Capacity planning
Support
• Midwest Location
• Lack of Hadoop expertise
8. What are the Solutions?
With Cloudera Manager, you get…
Intuitive Management Console
• Mission control style dashboard for entire cluster
• Centralized management of entire Hadoop ecosystem
• Treat the cluster as an appliance
• Configuration change audit & validation
Integration with Enterprise IT Management Tools
• Connect to Corporate LDAP
• Cloudera Manager API integrates with existing BMC platform
Comprehensive Monitoring & Alerting
• Proactive service level alerts
• Summarized cluster level graphs & charts
• Real-time series charts (MapReduce & HBase)
Historical Cluster Metrics/Reports
• Capacity planning - Disk usage/ Slot Capacity
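The slides do not show how the API integration with the existing monitoring platform was built. As a rough sketch only, assuming the Cloudera Manager REST API is reachable over HTTP with basic authentication, an external tool could poll service health along these lines. The host name, port, API version, URL layout, credentials, and the `healthSummary` field in the sample payload are illustrative assumptions, not details from the talk.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def services_url(base_url, cluster, api_version="v1"):
    """Build the per-cluster services URL (path layout is an assumption)."""
    root = base_url.rstrip("/")
    return f"{root}/api/{api_version}/clusters/{quote(cluster, safe='')}/services"


def unhealthy_services(payload):
    """Return (name, health) pairs for every service not reporting GOOD."""
    return [
        (svc["name"], svc.get("healthSummary", "UNKNOWN"))
        for svc in payload.get("items", [])
        if svc.get("healthSummary") != "GOOD"
    ]


def fetch_services(base_url, cluster, user, password):
    """Fetch the service list over HTTP using basic authentication."""
    request = urllib.request.Request(services_url(base_url, cluster))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Offline example with a hypothetical payload, so no live server is needed.
sample = {
    "items": [
        {"name": "hdfs1", "healthSummary": "GOOD"},
        {"name": "hbase1", "healthSummary": "BAD"},
    ]
}
print(unhealthy_services(sample))
```

Keeping the URL building and the health filtering separate from the network call means the forwarding logic can be tested without a running cluster; only `fetch_services` touches the network.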
9. What are the Benefits of Cloudera Manager?
Lowers the barrier for Hadoop administration
• No need to rely solely on experts
• Reduces the number of administrators needed
Provides a “one-stop” holistic view
• Easy to understand how the overall cluster is performing
Includes pre-tuned configuration with best practices
• Get straight to solving the business problem
Integrates with Cloudera support
• Leverage the real experts…not just for bugs
11. Why You Need Cloudera Manager?
Complexity: Hadoop is more than a dozen services running across many machines
• Hundreds of hardware components
• Thousands of settings
• Limitless permutations
Context: Hadoop is a system, not just a collection of parts
• Everything is interrelated
• Raw data about individual pieces is not enough
• Must extract what’s important
Efficiency: Managing Hadoop with multiple tools & manual processes takes longer
• Complicated, error-prone workflows
• Longer issue resolution
• Lack of consistent & repeatable processes
12. Cloudera Manager
End-to-End Administration for CDH
1 Deploy
• Install, configure & start your cluster in 3 simple steps
2 Configure & Optimize
• Ensure optimal settings for all hosts & services
3 Monitor, Diagnose & Report
• Find & fix problems quickly, view current & historical activity & resource usage
13. Managing Complexity
One Tool For Everything
• DEPLOYMENT & CONFIGURATION
• ACTIVITY MONITORING
• MONITORING
• WORKFLOWS
• EVENTS & ALERTS
• LOG SEARCH
• DIAGNOSTICS
• REPORTING
DO-IT-YOURSELF
+
CLOUDERA ENTERPRISE
“In a recent Cloudera survey, >95% of respondents emphasized the importance of having a single end-to-end tool to manage their Hadoop Operations”
14. Raw Data vs. Hadoop Intelligence
Providing Context
1 Smart Configuration
• Auto-sets configurations & guards against user error
2 Workflows
• Ensures that multi-step tasks are accomplished completely & in the correct sequence
3 Dependencies
• Aware of how a particular action affects the rest of the cluster & manages the impact
4 Events & Alerts
• Makes you aware of what’s important at a Hadoop system level
5 History
• Compares current & past activities for context
15. Cloudera Manager Key Features
• Installs the complete Hadoop stack in minutes via a wizard-based interface
• Gives you complete, end-to-end visibility and control over your Hadoop cluster from a single interface
• Allows you to manage multiple clusters from a single instance of Cloudera Manager
• Integrates with Active Directory
• Establishes the time context globally for almost all views
• Correlates jobs, activities, logs, system changes, configuration changes and service metrics along a single timeline to simplify diagnosis
• Sets server roles, configures services and manages security across the cluster
• Gracefully starts, stops and restarts services as needed
• Supports Administrator and Read-Only users
• Maintains a complete record of configuration changes with the ability to roll back to previous states
• Monitors dozens of service performance metrics and alerts you when you approach critical thresholds
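To make the idea of alerting before a critical threshold is reached concrete, here is a minimal sketch of a two-level (warning/critical) check. It is not Cloudera Manager's implementation; the metric names, threshold values, and the `Threshold`, `classify` and `evaluate` helpers are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Warning and critical limits for one metric (values are illustrative)."""
    warning: float
    critical: float


def classify(value, threshold):
    """Map a metric value to OK, WARNING or CRITICAL."""
    if value >= threshold.critical:
        return "CRITICAL"
    if value >= threshold.warning:
        return "WARNING"
    return "OK"


def evaluate(metrics, thresholds):
    """Return {metric: level} for every metric that is not OK.

    Metrics without a configured threshold are skipped.
    """
    alerts = {}
    for name, value in metrics.items():
        threshold = thresholds.get(name)
        if threshold is None:
            continue
        level = classify(value, threshold)
        if level != "OK":
            alerts[name] = level
    return alerts


# Hypothetical metrics and limits, expressed as percentages.
limits = {
    "dfs_used_pct": Threshold(warning=80.0, critical=95.0),
    "heap_used_pct": Threshold(warning=75.0, critical=90.0),
}
print(evaluate({"dfs_used_pct": 85.0, "heap_used_pct": 50.0}, limits))
```

The warning level is what gives an administrator time to act: it fires while the metric is still below the critical limit.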
16. Cloudera Manager Key Features (Contd.)
• Gathers Hadoop logs from across the cluster and lets you view and search them
• Scans Hadoop logs for irregularities and warns you before they impact the cluster
• Creates and aggregates relevant Hadoop events pertaining to system health, log messages, user services and activities, and makes them available for alerting and searching
• Generates email alerts when certain events occur
• Consolidates all cluster activity into a single, real-time view
• Shows information pertaining to hosts in your cluster, including status, resident memory, virtual memory and roles
• Visualizes health status and metrics across the cluster so you can quickly identify problem nodes and take action
• Visualizes current and historical disk usage by user, group and directory
• Tracks MapReduce activity on the cluster by job or user
• Takes a snapshot of the cluster state and automatically sends it to Cloudera support to assist with resolution
• Integrates easily with your existing enterprise-wide management and monitoring tools
17. Cloudera Manager Roadmap
• Cloudera Manager 4.1 – Released 10/24
• Platform Support for CDH4.1
• Cloudera Impala management & monitoring
• New monitoring – ZooKeeper, Flume NG
• Maintenance Mode
• Host Decommissioning
• Several Usability Enhancements
• Cloudera Manager 4.5 – Early 2013
• Rolling Upgrades/ Restarts
• Enhanced Monitoring, Cluster Heatmaps etc.
• Role Groups Configuration
• Cloud Support
• Others – SNMP support, Error handling, ISV integration etc.
18. Why Cloudera Manager?
Simple: End-to-end Hadoop administration in a single tool
Intelligent: Manages Hadoop at a system level; Cloudera’s experience realized in software
Efficient: Simplifies complex workflows & makes administrators more productive
Best-in-Class: The only enterprise-grade Hadoop management application available
19. Next Steps
• Try out the FREE edition of Cloudera Manager
• Download from:
http://www.cloudera.com/products-services/tools/
• Support available via scm-users@cloudera.org
• For Cloudera Enterprise subscriptions, please contact:
sales@cloudera.com
23. Install A Cluster In 3 Simple Steps
Cloudera Manager Key Features
1 Find Nodes
• Enter the names of the hosts which will be included in the Hadoop cluster. Click Continue.
2 Install Components
• Cloudera Manager automatically installs the CDH components on the hosts you specified.
3 Assign Roles
• Verify the roles of the nodes within your cluster. Make changes as necessary.
33. Set The Time Context Globally
Cloudera Manager Key Features
Editor's Notes
Monsanto is a St. Louis-based agricultural company with one goal in mind: produce more food, fiber and fuel using fewer inputs like water and land, while improving the lives of the people around the world that benefit from our technology.
Monsanto utilizes a systems approach to improving upon today’s agricultural offerings: Breeding, Biotechnology, and Advanced Agronomic Practices. These three facets of our approach help farmers improve productivity, reduce the costs of farming, and grow better foods for consumers and better feed for animals.
We’re proud to have customers of all kinds, from large-acre, technology-driven row-crop farmers in Central Illinois all the way to farmers with very small landholdings who are just beginning to realize the benefits of modern agriculture in Africa.
Sustainably increasing yield, while more efficiently using inputs and resources, requires every tool at farmers’ disposal. At Monsanto, we’re focused on three pillars for driving yield: breeding, biotechnology and improved agronomic practices. All three are required to meet our goals.
Basics of Breeding
Breeding, a technique that has been practiced by farmers for thousands of years, involves bringing together two parent plants to produce a new offspring that contains a mixture of parent characteristics. Monsanto has assembled a pool of elite seed genetics (germplasm) from around the world, and we use cutting-edge technology to help us more quickly, efficiently and accurately find desired traits for breeding. Our primary method is using genetic analysis (mapping the DNA of plants) to identify seeds with traits we want, such as improved yield, disease resistance, suitability for a particular climate, and in the case of vegetables better taste and nutrition.
Basics of Biotechnology
Biotechnology is the process of inserting a gene from one species, like a plant or a bacterium, into another species. We use biotechnology to give plants desirable characteristics (or traits) that often cannot be developed through breeding practices. The traits we develop help farmers produce more of their crop, reduce costs and conserve resources. Examples of these traits would be herbicide tolerance, insect resistance and drought tolerance. We also are working to develop traits that will benefit consumers, such as soybeans that produce healthier oils.
Basics of Agronomics
Agronomic practices are steps farmers incorporate into their farm management systems to improve soil quality, enhance water use, manage crop residue and improve the environment through better fertilizer management. These steps not only improve a farmer’s bottom line by decreasing input costs, but also improve the environment by decreasing water use and over-fertilization.
Improved agronomics cover a broad range of practices, suitable for any type of farm. For example, a high-tech, high productivity grower may use GPS and computer systems to automate planting for optimal row spacing and varying inputs acre by acre, to produce more and conserve more. A subsistence farmer can see significant benefits by learning about input management and optimal plant spacing to reduce costs and improve yield. Conservation tillage is a broadly applicable technique that preserves topsoil and locks in moisture.