Hadoop Meets Scrum
© Hortonworks Inc. 2011 – 2014. All Rights Reserved
The Elephant Meets Scrum
Rommel Garcia
Agenda
• Introductions
• Why Scrum?
• Scrum Basic Concepts
• Scrum Team
• Scrum Framework
• Hadoop Meets Scrum
• Scrum Exercise
• Open Forum
Introductions
What’s your name?
What’s your role?
Why are you here?
Why Scrum?
Nobody wants to fail too big, or too co$tly, on projects.
Monolithic SDLC
• A small change impacts everything
• The cost of failure is extremely high
• Progress is slow and unpredictable
• Work is hard to prioritize
• Not business friendly
Scrum...
• Produces immediate results
• Makes the development team nimble and adaptable
• Gives full visibility into the development process
• Is a perfect fit for Hadoop
  • Hadoop isolates data and processing (HDFS and YARN respectively)
  • Failure in Hadoop is cheap
  • Complete traceability of which apps were deployed, run, and tested, by whom, when, and where
Scrum Concepts
Agile. Iterative. Adaptive. Fast results.
Scrum is...
• A framework within which people can address complex adaptive
problems, while productively and creatively delivering products of the
highest possible value
• A framework to employ various processes and techniques
• Lightweight
• Simple to understand
• Difficult to master... if the RULES are not followed religiously
Success of Scrum depends on...
• Transparency
  • A common language must be shared by all team members
  • What does “Done” mean?
• Inspection
  • Frequently check Scrum artifacts and progress
  • But be careful not to overdo it, or it gets in the way of work
• Adaptation
  • Adjust promptly when the process deviates outside acceptable limits
  • Adjust immediately to prevent further deviation
Scrum Formal Events
1. Sprint Planning
2. Daily Scrum
3. Sprint Review
4. Sprint Retrospective
Scrum consists of...
• Team
• Roles
• Events
• Artifacts
• Rules
The Team
• Product Owner
• Development Team
• Scrum Master
Product Owner
• Mainly responsible for managing the Product Story (called the Product Backlog in the Scrum Guide)
• Clearly defines Product Story items
• Effectively orders the items in the Product Story
• Ensures the Product Story is visible, transparent, and clear to all, and
shows what the Scrum Team will work on next
• Validates with the Scrum Team that they understand the items in the
Product Story
• In the real world, this could be the Project Manager, Program
Manager, Development Manager, or Product Manager
Development Team
• Self-organizing
  • They decide how to build and release increments of releasable functionality
  • The Scrum Master has no influence on how the team develops functionality
• Cross-functional
  • Pig, Hive, HDFS, YARN, and more
  • Develop and release features faster
• Accountability belongs to the Development Team as a whole
• Team size: at least 3 and at most 9 members
• Normally composed of Hadoop Developer, Hadoop Architect, Data
Scientist, Data Analyst, and QA
Scrum Master
• Ensures Scrum theory, practices, and rules are enacted
• Servant-leader for the Scrum Team
  • Coaches the Development Team in self-organization and cross-functionality
  • Removes impediments to the Development Team’s progress
• Serves the Product Owner
  • Finds techniques for effective Product Story management
  • Helps with clear, concise definition of Product Story items
  • Ensures the Product Owner knows how to arrange the Product Story to maximize value
  • Facilitates Scrum events as requested or needed
• Serves the Organization
  – Leads Scrum adoption
  – Works with other Scrum Masters to increase the effectiveness of Scrum in the organization
Scrum Framework
Fail fast in Hadoop. Move fast with Scrum.
The Sprint
• It is the heart of Scrum
• Time-boxed at 1 month or less. 2 weeks is pretty common.
• New Sprint starts immediately after conclusion of previous Sprint
• Consists of:
  • Sprint Planning
  • Daily Scrums
  • Development Work
  • Sprint Review
  • Sprint Retrospective
During the Sprint
• No changes are made that would compromise the Sprint Goal
• Quality goals do NOT decrease
• Scope may be clarified and renegotiated between the Product Owner and
the Development Team as more is learned
• ONLY the Product Owner has the authority to cancel a Sprint
Sprint Planning
• Time-boxed
  • At most 8 hours for a 1-month Sprint, or about 4 hours for a 2-week Sprint
• Answers the questions:
  • What can be done this Sprint?
    – The Development Team forecasts which Product Story items it will deliver
    – Output is the Sprint Goal
  • How will the chosen work get done?
    – The Development Team determines how to deliver the increments
    – Output is the Sprint Story (called the Sprint Backlog in the Scrum Guide)
Daily Scrum
• Driven by the Scrum Master
• Time-boxed at 15 minutes
• Synchronizes activities and plans the next 24 hours
• Each Development Team member answers the questions:
– What was done yesterday?
– What needs to be done today?
– What issues blocked progress?
• Highlights and promotes quick decision-making
• Improves communication and eliminates other meetings
Sprint Review
• Time-boxed
  • At most 4 hours for a 1-month Sprint, or about 2 hours for a 2-week Sprint
• The Scrum Team and Stakeholders collaborate on what was done in the
Sprint
• Informal meeting, NOT a status meeting. A demo of the product is
presented
• The Scrum Team discusses
  • What went well during the Sprint
  • What issues were faced
  • What could be improved
• Output is a revised set of Product Story items for the next Sprint
Sprint Retrospective
• Time-boxed
  • At most 3 hours for a 1-month Sprint, or under 2 hours for a 2-week Sprint
• Main purpose
  • Review how the previous Sprint went with respect to people, relationships, process, and tools
  • Identify and order the major items that went well and the potential improvements
  • Create a plan for implementing improvements to the way the Scrum Team does its work
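The time-boxes quoted for Sprint Planning, Sprint Review, and Sprint Retrospective can be read as one-month maximums that shrink with the Sprint. A minimal sketch, assuming simple linear scaling and a 4-week month (both are assumptions, as are the function and constant names):

```python
# Maximum time-boxes for a one-month Sprint, as quoted on the slides (hours).
ONE_MONTH_TIMEBOX_HOURS = {
    "sprint_planning": 8.0,
    "sprint_review": 4.0,
    "sprint_retrospective": 3.0,
}

WEEKS_PER_MONTH = 4  # assumption: treat a one-month Sprint as 4 weeks


def timebox_hours(event: str, sprint_weeks: float) -> float:
    """Scale the one-month time-box linearly to a shorter Sprint (assumed rule)."""
    if not 0 < sprint_weeks <= WEEKS_PER_MONTH:
        raise ValueError("Sprint length must be more than 0 and at most 4 weeks")
    return ONE_MONTH_TIMEBOX_HOURS[event] * sprint_weeks / WEEKS_PER_MONTH


if __name__ == "__main__":
    for event in ONE_MONTH_TIMEBOX_HOURS:
        print(f"{event}: {timebox_hours(event, 2):.1f} h for a 2-week Sprint")
```

Treat the results as upper bounds for the meeting, not targets; a team may well finish sooner.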
Scrum Timelines
[Figure: timeline running Product Story, Sprint Planning, Sprint, Sprint Review, Sprint Retrospective. Labels recovered from the figure: business input, driven by Product Owner, Stakeholders, and Scrum Master; 4 hours for a 2-week Sprint; 2 weeks, with Daily Scrum; 2 hours; under 2 hours; "Immediate" between stages.]
Scrum Tools: Go Modern or Archaic
• Agile software is available, e.g. www.rallydev.com
• LCD projector
• Whiteboard and colored markers
• Long, contiguous wall
• Clustered cubicles
• Index cards
Hadoop Meets Scrum
Supporting Scrum in Hadoop Development
What is needed in Hadoop to support Scrum?
• Multi-tenancy is critical
  • Set up security: LDAP/AD, Ranger, Kerberos, Knox
  • Set up an HDFS quota for each Scrum Team
  • Set up a Capacity Scheduler queue for each Scrum Team or member
• High Availability is important but not critical
  • Set up NameNode (NN) HA
  • Set up ResourceManager (RM) HA
  • Set up HiveServer2 HA
  • Set up Hive Metastore HA
  • Set up multiple HBase Masters
  • Set up a Knox cluster with multiple instances
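The per-team HDFS quota and Capacity Scheduler queue can be sketched as follows. This is an illustration, not a provisioning tool: it only builds the `hdfs dfsadmin -setSpaceQuota` command strings and the `yarn.scheduler.capacity.*` property names. The team names, paths, quota sizes, and helper functions are hypothetical.

```python
def hdfs_quota_commands(team_dirs: dict[str, str]) -> list[str]:
    """Build one `hdfs dfsadmin -setSpaceQuota` command per team directory."""
    return [
        f"hdfs dfsadmin -setSpaceQuota {quota} {path}"
        for path, quota in team_dirs.items()
    ]


def capacity_scheduler_properties(team_capacity: dict[str, int]) -> dict[str, str]:
    """Build Capacity Scheduler properties for one leaf queue per team."""
    # Capacities of sibling queues under root are percentages that sum to 100.
    if sum(team_capacity.values()) != 100:
        raise ValueError("Queue capacities under root must add up to 100")
    props = {"yarn.scheduler.capacity.root.queues": ",".join(team_capacity)}
    for queue, percent in team_capacity.items():
        props[f"yarn.scheduler.capacity.root.{queue}.capacity"] = str(percent)
    return props


if __name__ == "__main__":
    # Hypothetical teams, paths, and sizes, for illustration only.
    for cmd in hdfs_quota_commands({"/teams/alpha": "2t", "/teams/beta": "500g"}):
        print(cmd)
    for key, value in capacity_scheduler_properties({"alpha": 60, "beta": 40}).items():
        print(f"{key}={value}")
```

On an Ambari-managed cluster, the generated properties would be entered through the Ambari UI, not written into the config files by hand.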
What is needed in Hadoop to support Scrum?
• Establish a habit of regular, disciplined performance tuning of Hadoop
  • YARN, Hive, Tez, Spark, Kafka, Storm, Flume, HBase, Solr, MapReduce, etc.
• Truncate logs regularly
  • All Hadoop component logs
  • Truncate when disk utilization reaches 80%
• Logs are a gold mine. Learn to interpret them correctly.
  • For troubleshooting
  • For understanding how components operate and interoperate
• Turn off Hadoop services that are not needed
  • Saves CPU, memory, and disk space
  • Do not forget to turn on maintenance mode. Ask your Hadoop Admin why.
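The 80% rule for log truncation can be checked with a few lines of standard-library Python. A minimal sketch: the function names are hypothetical, and which path holds the logs depends on the cluster, so the caller supplies it.

```python
import shutil

TRUNCATE_THRESHOLD = 0.80  # from the slide: truncate at 80% disk utilization


def utilization(used_bytes: int, total_bytes: int) -> float:
    """Return the fraction of the volume that is in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def needs_truncation(used_bytes: int, total_bytes: int,
                     threshold: float = TRUNCATE_THRESHOLD) -> bool:
    """True when utilization has reached the truncation threshold."""
    return utilization(used_bytes, total_bytes) >= threshold


def check_log_volume(path: str) -> bool:
    """Check the volume that holds `path` (the log directory is an assumption)."""
    usage = shutil.disk_usage(path)
    return needs_truncation(usage.used, usage.total)


if __name__ == "__main__":
    print("Truncate logs now:", check_log_volume("."))
```

The check only reports; what to truncate, and how much history to keep for troubleshooting, is a decision for the team and the Hadoop Admin.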
What is needed in Hadoop to support Scrum?
• Know your tools
Component               Best used for
Sqoop                   Ingesting RDBMS tables into HDFS and/or Hive
Flume                   Ingesting flat files from network file systems or file servers. Capped at 400,000 records/sec
NFS                     Ingesting flat files from NFS-based file servers. ONLY ingest files smaller than 1 GB
Kafka, Storm, HBase     Real-time, streaming, and online processing. Perfect for IoT and CEP. They all go together in real-time systems.
Slider                  Deploying custom long-running applications, e.g. Tomcat apps
Spark                   Data science (Spark ML), micro-batch streaming (Spark Streaming)
MapReduce               Only use it when Pig and Hive can’t do the job
Pig                     Perfect for ETL processing. Data mining and statistics (Apache DataFu)
Hive                    Reporting and analytics. Data warehousing. Always use ORC!
Tez                     Never turn it off. Enable it for both Pig and Hive for fast data processing
Falcon                  Process orchestration and data lineage
Knox, Kerberos, Ranger  AuthN, AuthZ, audit. Preventing impersonation.
Ambari                  Do NOT update config files manually. Use the Ambari UI to make config changes in Hadoop.
Let’s Scrum!
Putting Hadoop and Scrum to the test
The Project – HVAC Sensor Analytics
• The business wants to understand how its buildings consume energy
and wants to start with HVAC. They want to determine which HVAC
systems are working harder and prioritize them for maintenance or
replacement.
• Determine which HVAC products have the highest temperature
deviation and order them by age.
• Recommend which buildings possibly have the poorest maintenance
practices.
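Before writing any Hive, it can help to pin down the logic of the second requirement on a few rows. The sketch below is one possible reading of it, not the exercise's solution: average the absolute gap between target and actual temperature per building, keep buildings at or above a cutoff, and order them by age. The field layout, the cutoff, and all sample values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows for illustration only; not data from the exercise.
readings = [
    # (building_id, target_temp, actual_temp)
    (1, 66, 70),
    (1, 66, 62),
    (2, 68, 69),
    (3, 67, 78),
]
buildings = {
    # building_id: (hvac_product, building_age_years)
    1: ("AC1000", 25),
    2: ("FN39TG", 10),
    3: ("AC1000", 5),
}


def rank_hvac(readings, buildings, min_deviation):
    """Return (building_id, product, age, avg_deviation), oldest first."""
    deviations = defaultdict(list)
    for building_id, target, actual in readings:
        deviations[building_id].append(abs(actual - target))
    rows = []
    for building_id, devs in deviations.items():
        product, age = buildings[building_id]
        avg = float(mean(devs))
        if avg >= min_deviation:  # hypothetical cutoff for "high" deviation
            rows.append((building_id, product, age, avg))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for row in rank_hvac(readings, buildings, min_deviation=3):
        print(row)
```

In Hive, the same shape would typically be a join between the readings and building tables, followed by a grouped average and an ORDER BY; the team still has to agree on what counts as "highest deviation" as part of defining the Sprint Goal.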
TODO
• Apply Scrum principles and rules
• Properly size your team
• Break down the requirements into a Product Story
• Determine the Sprint Goal
• Generate one Sprint Story
• Develop the app in Hive
• Any performance tuning of your tables and CREATE statements is a big plus
Rules
• Spend 15 minutes on Sprint Planning
• We will do a 2-hour Sprint
• We will hold one Daily Scrum, in the middle of the 2-hour
Sprint
• Spend 15 minutes on the Sprint Review