Accelerating Hadoop Solution Lifecycle and Improving ROI - Impetus On-demand Webcast
- 1. © 2014 Impetus Technologies
July 25, 2014
Accelerating the Big Data Solution Lifecycle and Improving ROI
- 2. © 2014 Impetus Technologies
Agenda
• Big Data Analytics: Implementation patterns
• Challenges faced
• Jumbune: an open source lifecycle accelerator
• Enterprise solution lifecycle
• Ways to address the challenges
Recorded version available at http://bit.ly/1nMw8nQ
- 3. © 2014 Impetus Technologies
Big Data Analytics
• Primary drive for performing analytics
• Rise of the enterprise data lake
• Utilization of analytical resources
- 4. © 2014 Impetus Technologies
Primary Purposes of an Analytical Solution
• Optimize the business
• Reduce time taken by analytics
• Deliver effective analytics
• Compete and win
- 5. © 2014 Impetus Technologies
Rise of the Enterprise Data Lake
BIG DATA
• Sources of data: ETL from every source (RDBMS, flat files, queues, legacy offloading, logs)
• Arrival of data: intermittent, bulk, incremental
• Theme: “Leave no data unused”
- 6. © 2014 Impetus Technologies
Utilization of Analytics Resources
• Capitalize on all available analytics resources (engines)
• Access data with a variety of processing engines such as Storm, Spark, and YARN
• Model in data science analytical systems such as R, Octave, and SAS
• Write complex logic in custom MapReduce
• Reuse code as User Defined Functions (UDFs)
• Create ad hoc queries using Hive and Pig
• Customize Mahout algorithms and machine learning libraries
- 7. © 2014 Impetus Technologies
Enterprise Big Data Solution Trends
• No more single purpose Hadoop clusters
• Enterprise Data Lake: Data flowing from many sources
• Integrated platforms using a variety of analytical engines
• Serving multiple business applications
• Resource sharing is a must across applications and engines
- 8. © 2014 Impetus Technologies
Enterprise Solution Lifecycle (High level view)
• Business Requirement
• Designing / Modelling
• Development and Testing
• Production and Monitoring
- 9. © 2014 Impetus Technologies
Enterprise Solution Lifecycle (Ground level view)
• Business User
• Data Analyst
• Development
• Quality Test
• DevOps
• Data Lake
• Production and Monitoring
- 10. © 2014 Impetus Technologies
Challenges in Enterprise Analytical Solutions
• No common platform to detect root causes
• Incremental imports may ingest bad data
• Cluster resources are shared and optimal utilization is the key
• Implementing models in custom MR without errors is like hitting the bull’s eye
• Bad logic or bad data
- 11. © 2014 Impetus Technologies
Scenario: Digitization of Newspaper for Analyzing News
Team: 5 Dev, 3 QA, 2 DevOps
Simple Problem: ‘q’ was misread by OCR as ‘9’
TIME
• A single code fault on terabyte-scale data can consume a total of 24 work hours across 2 developers and 1 QA engineer
COST
• Additional engineering hours, plus the cost of unproductive cloud instances, storage and resources
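A fault like the ‘q’ read as ‘9’ above can often be surfaced by a cheap record-level scan before a full cluster run. The following is a minimal sketch and not Jumbune's implementation: the function names `find_suspect_tokens` and `scan` and the heuristic itself (a token made only of letters and the digit 9) are assumptions chosen for illustration.

```python
import re

def find_suspect_tokens(line):
    """Return tokens that mix letters with the digit 9, e.g. '9uick'.

    Assumed heuristic: a token made only of letters and 9s, containing at
    least one of each, is a likely 'q' -> '9' OCR misread. Pure numbers
    such as '1999' are not flagged.
    """
    suspects = []
    for token in re.findall(r"[A-Za-z0-9]+", line):
        has_letter = any(c.isalpha() for c in token)
        only_letters_or_nine = all(c.isalpha() or c == "9" for c in token)
        if has_letter and "9" in token and only_letters_or_nine:
            suspects.append(token)
    return suspects

def scan(lines):
    """Return (line_number, token) pairs for every suspect token."""
    return [
        (number, token)
        for number, line in enumerate(lines, start=1)
        for token in find_suspect_tokens(line)
    ]

# Hypothetical sample of digitized text.
sample = [
    "The 9uick brown fox",
    "Published in 1999",
    "The re9uest was e9ually urgent",
]
report = scan(sample)
# report == [(1, '9uick'), (3, 're9uest'), (3, 'e9ually')]
```

Running a scan like this on a sample of each incremental import points at bad data directly, so the team does not have to rule out bad logic first.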
- 12. © 2014 Impetus Technologies
Scenario: Hive Queries Interpreted as MapReduce Executions on a Hadoop Cluster
Team: 2 Dev, 1 QA, 1 DevOps
Simple Problem: Data imbalance across the cluster, leading to poor Hive query performance.
TIME
• The development team was refactoring Hive queries to improve performance
COST
• Additional engineering hours, plus the cost of unproductive cloud instances, storage and resources
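Data imbalance of this kind can be checked directly before anyone starts rewriting queries. The sketch below is illustrative only: the per-node byte counts are made-up values, and `imbalance_report` and its `threshold` parameter are hypothetical names, not part of Hive, Hadoop, or Jumbune.

```python
from statistics import mean

def imbalance_report(bytes_per_node, threshold=1.5):
    """Summarize how unevenly data is spread across nodes.

    Returns (skew, overloaded), where skew is the largest node's share
    divided by the mean, and overloaded lists the nodes holding more
    than threshold * mean bytes.
    """
    if not bytes_per_node:
        return 0.0, []
    average = mean(bytes_per_node.values())
    if average == 0:
        return 0.0, []
    skew = max(bytes_per_node.values()) / average
    overloaded = sorted(
        node for node, size in bytes_per_node.items()
        if size > threshold * average
    )
    return skew, overloaded

# Hypothetical per-node data volumes (in GB, for readability).
usage = {"node1": 100, "node2": 100, "node3": 100, "node4": 500}
skew, overloaded = imbalance_report(usage)
# skew == 2.5, overloaded == ['node4']
```

A skew well above 1 suggests the slow queries are a data placement problem rather than a query logic problem.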
- 13. © 2014 Impetus Technologies
Impact on ROI
• Delayed Analytics: Defeats one of the prime purposes of analytics
• Increase in Costs: Defeats the purpose of business cost optimization
• Productivity Loss: Iterations reduce the productivity of dependent teams in the cycle
- 14. © 2014 Impetus Technologies
Current Iterative Development Approach
Local Debug / Unit Tests
• Localized subset of data
• Non-parallel execution
HDFS Data Check
• Practically infeasible
• Error prone
Performance
• Difficult to find bad code
• Difficult to collaborate across environments
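To make the local debug and unit test step concrete, here is a minimal sketch of how map and reduce logic is commonly exercised in a single process on a small sample. It assumes a simple word-count job; `mapper`, `reducer`, and `run_local` are hypothetical names and do not use any Hadoop API.

```python
from collections import defaultdict

def mapper(line):
    """Emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1

def reducer(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)

def run_local(lines):
    """Run map, group-by-key, and reduce sequentially in one process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())

# A localized subset of the data, as used in a typical unit test.
sample = ["big data big cluster", "data lake"]
result = run_local(sample)
# result == {'big': 2, 'data': 2, 'cluster': 1, 'lake': 1}
```

A test like this passes quickly, but it sees neither the full data set nor parallel execution, which is why faults that appear only at cluster scale slip through.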
- 15. © 2014 Impetus Technologies
A Complete Enterprise Platform
Data Lake
Enterprise Engines
Solutions
Governance
Security
Validate, Profile, Debug and Monitor
- 16. © 2014 Impetus Technologies
Introducing Jumbune: An Open Source Solution
“A catalyst to accelerate realization of Big Data Analytics solutions”
• Flow Analyzer
• Data Validation
• Cluster Monitor
• Job Profiler
- 21. © 2014 Impetus Technologies
Full Lifecycle Support - Jumbune
• Development
• Quality
• DevOps
• Data Ingestion
- 22. © 2014 Impetus Technologies
Jumbune - Key Features
• In-depth code-level analysis of cluster-wide flow
• Record-level and field-level data violation reports
• No deployment on worker nodes: ultra-light agent installation on the gateway node
• Cluster monitoring can be turned on or off at will, which reduces resource load
• Customizable rack-aware monitoring
• Correlated profiling analysis of phases, throughput and resource consumption
• Works with all Hadoop distributions
• Upcoming support for YARN, Spark, and Mesos
• Available as Open Source
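As an illustration of what a record-level and field-level data violation report might contain, the sketch below validates records against per-field rules and counts violations per field. It is not Jumbune's API or output format: the field names, the `RULES` table, and the `validate` function are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical per-field rules: each maps a field name to a check
# that returns True when the value is acceptable.
RULES = {
    "article_id": lambda v: v.isdigit(),
    "published": lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
    "headline": lambda v: v.strip() != "",
}

def validate(records, rules=RULES):
    """Return (violations, per_field).

    violations is a list of (record_number, field, value) tuples;
    per_field counts violations for each field. A missing field is
    reported with the value None.
    """
    violations = []
    per_field = Counter()
    for number, record in enumerate(records, start=1):
        for field, check in rules.items():
            value = record.get(field)
            if value is None or not check(value):
                violations.append((number, field, value))
                per_field[field] += 1
    return violations, per_field

# Made-up sample records.
records = [
    {"article_id": "1", "published": "2014-07-25", "headline": "Data lakes rise"},
    {"article_id": "x2", "published": "25/07/2014", "headline": ""},
    {"article_id": "3", "headline": "Missing date"},
]
violations, per_field = validate(records)
# per_field == Counter({'published': 2, 'article_id': 1, 'headline': 1})
```

The record-level list shows exactly which rows to inspect, while the per-field counts show whether one field accounts for most of the bad data.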
- 23. © 2014 Impetus Technologies
For general inquiries about other Impetus solutions and services, reach us at bigdata@impetus.com
- 24. © 2014 Impetus Technologies
Thank You!
Website
• http://jumbune.org
Contribute
• http://github.com/impetus-opensource/jumbune
• http://jumbune.org/jira/JUM
Social
• Follow @jumbune Use #jumbune
• Jumbune Group: http://linkd.in/1mUmcYm
Forums
• Users: users-subscribe@collaborate.jumbune.org
• Dev: dev-subscribe@collaborate.jumbune.org
• Issues: issues-subscribe@collaborate.jumbune.org
Downloads
• http://jumbune.org
• https://bintray.com/jumbune/downloads/jumbune