As more applications are created using Apache Hadoop that derive value from the new types of data from sensors/machines, server logs, click-streams, and other sources, the enterprise "Data Lake" forms with Hadoop acting as a shared service. While these Data Lakes are important, a broader life-cycle needs to be considered that spans development, test, production, and archival and that is deployed across a hybrid cloud architecture.
If you have already deployed Hadoop on-premises, this session will also provide an overview of the key scenarios and benefits of joining your on-premises Hadoop implementation with the cloud through backup/archive, dev/test, or bursting. Learn how you can get the benefits of an on-premises Hadoop deployment that can seamlessly scale with the power of the cloud.
Data Lake for the Cloud: Extending your Hadoop Implementation
1. Page 1 © Hortonworks Inc. 2014
June 2014
We do Hadoop.
Data Lake for the Cloud
…Extending your Hadoop Implementation
2. Your speakers…
John O’Brien
Principal Analyst and CEO
Radiant Advisors
Bob Page
VP Partner Product Management
Hortonworks
Matt Winkler
Principal Program Manager
Microsoft
3. Poll: Where are you on your Hadoop Journey?
• Researching our options
• Currently Evaluating
• Deep in a trial
• What’s Hadoop?
4. Trends and drivers…
John O’Brien
Principal Analyst and CEO
Radiant Advisors
5. Leading Business Drivers and Trends
1. Scale down operational infrastructure management costs
• General evaluation of moving all on-premises systems to private/public/hybrid cloud
• Hadoop does not fit the IT efficiency gained through economies of scale and standards
2. Centralize Hadoop data management
• Resolve costly data movement, duplication and latency between data centers
• Cloud Data Lake strategy for shared access across geographic regions
3. Move data stores closer to data sources and users
• Performance and costs (Internet/VPN, LAN Ethernet, InfiniBand)
• Data sources are increasingly external to the company
4. Ecosystem of strategic IT relationships
“Our sister organization just signed a great deal with Microsoft Azure and we want to leverage shared services.”
6. Technical Drivers for Hadoop in the Cloud
1. Elasticity: setting nominal resources and handling load volatility
2. Flexibility: managing base workloads and handling others
3. Scalability: can on-premises infrastructure meet scaling requirements?
4. Security: requirements dictate choices from Hadoop apps down to networking
5. Proximity: the distance data travels impacts cost and performance
6. Functionality: not all distributions are equal (Hive, HBase versions)
7. Usability: existing internal skill sets with OS and scripting
8. Manageability: monitoring cloud and hybrid deployments easily
Reference: Microsoft Big Data Solutions. Wiley 2014. Adam Jorgensen, James Rowland-Jones, John Welch, Dan Clark, Christopher Price, Brian Mitchell.
7. Hadoop Operating Models and Maturity
1. On-Premises Hadoop Clusters
• Predefined balanced configurations with internal connectivity
• May leverage private cloud architecture for elasticity
2. Cloud-based Hadoop Clusters and Storage
• Always-on Infrastructure-as-a-Service (IaaS) pricing model and workloads
• On-demand Platform-as-a-Service (PaaS) pricing model and workloads
3. Hybrid Hadoop Architectures
• Affordable storage and access for second-class data
• Separation of production analytic applications from temporary activities
• Enabling on-premises clusters to efficiently meet the demands of volatility
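The difference between the always-on IaaS model and the on-demand PaaS model comes down to billed hours. The minimal Python sketch below compares one month under each model and computes the break-even usage; the node counts, hourly rates and the managed-service premium are hypothetical placeholders, not published Azure prices.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


@dataclass
class ClusterPricing:
    nodes: int
    rate_per_node_hour: float  # hypothetical rate, not a real price list

    def cost(self, hours: float) -> float:
        """Cost of running every node for the given number of hours."""
        return self.nodes * self.rate_per_node_hour * hours


def always_on_monthly(p: ClusterPricing) -> float:
    # Always-on (IaaS-style): billed for every hour, busy or idle.
    return p.cost(HOURS_PER_MONTH)


def on_demand_monthly(p: ClusterPricing, hours_used: float) -> float:
    # On-demand (PaaS-style): the cluster exists only while jobs run.
    return p.cost(hours_used)


def break_even_hours(always_on: ClusterPricing, on_demand: ClusterPricing) -> float:
    """Monthly hours of on-demand use that cost as much as the always-on cluster."""
    return always_on_monthly(always_on) / on_demand.cost(1)


if __name__ == "__main__":
    iaas = ClusterPricing(nodes=10, rate_per_node_hour=0.50)
    paas = ClusterPricing(nodes=10, rate_per_node_hour=0.60)  # assumed premium for a managed service
    print(f"always-on month:      {always_on_monthly(iaas):.2f}")
    print(f"on-demand, 200 hours: {on_demand_monthly(paas, 200):.2f}")
    print(f"break-even usage:     {break_even_hours(iaas, paas):.1f} hours/month")
```

Under these assumed numbers, a cluster needed for only a few hours a day favors the on-demand model, while one that is busy for most of the month favors always-on.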
8. Hybrid Cloud Architecture Driver #1
Driver: Lower cost through an optimized data platform
• Lower-cost storage for lower-value data needs (lower SLA)
• Regulatory requirements for historical data
Online Transparent Archive:
• Data policy driven by time, status and read-only state
• 10/90 or 10/100 data architecture to simplify data management
Online Backup and Business Continuity:
• Hadoop has good built-in fault tolerance with multiple data copies
• “Clusters” are oriented to a single location and do not provide disaster recovery
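The archive bullet names three inputs to the data policy: time, status, and read-only state. A minimal Python sketch of such a rule might look like the following; the 90-day age threshold, the status values and the HDFS paths are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Tier(Enum):
    ON_PREMISES = "on-premises HDFS"
    CLOUD_ARCHIVE = "cloud archive storage"


@dataclass
class Dataset:
    path: str            # hypothetical HDFS directory
    last_modified: date
    status: str          # hypothetical lifecycle status, e.g. "open" or "closed"
    read_only: bool


def choose_tier(ds: Dataset, today: date, max_age_days: int = 90) -> Tier:
    """Apply a time + status + read-only policy to pick a storage tier."""
    old_enough = (today - ds.last_modified) > timedelta(days=max_age_days)
    if old_enough and ds.status == "closed" and ds.read_only:
        return Tier.CLOUD_ARCHIVE
    return Tier.ON_PREMISES


if __name__ == "__main__":
    today = date(2014, 6, 1)
    datasets = [
        Dataset("/data/clicks/2014-01", date(2014, 1, 31), "closed", True),
        Dataset("/data/clicks/2014-05", date(2014, 5, 31), "open", False),
    ]
    for ds in datasets:
        print(ds.path, "->", choose_tier(ds, today).value)
```

Because all three conditions must hold, data that is still being written stays on the primary cluster. The sketch only decides placement; the copy itself would be left to a replication tool such as DistCp or Falcon.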
9. Hybrid Cloud Architecture Driver #2
Driver: Flexibility for on-demand and temporary needs
• Workload and cluster management (prioritize jobs)
• Separate production from dev/test and discovery (mindset)
Discovery Sandboxes:
• Loading external data into the cloud for evaluation is easier than loading it into the data center (network load, storage, security)
Proofs of Concept:
• Verifying new technologies and analytic apps on a smaller subset
• Goes beyond exploring new data (not an evaluation of a Hadoop distribution)
Separate Environments for Analytic Applications:
• Protecting SLA-driven operational applications from discovery work
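Verifying new technologies on a smaller subset works best when the subset is reproducible, so that reruns of a proof of concept see the same records. One common technique, not one the presenters prescribe, is hash-based sampling on a record key. The minimal Python sketch below assumes a string key per record; the key names and sampling fractions are hypothetical.

```python
import hashlib
from typing import Iterable, List


def in_sample(record_key: str, fraction: float) -> bool:
    """Deterministically keep about `fraction` of keys, based on a hash of the key."""
    digest = hashlib.md5(record_key.encode("utf-8")).digest()  # used for bucketing, not security
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < fraction * 2**64


def sample(keys: Iterable[str], fraction: float) -> List[str]:
    """Return the subset of keys selected for the proof of concept."""
    return [k for k in keys if in_sample(k, fraction)]


if __name__ == "__main__":
    customer_ids = [f"customer-{i}" for i in range(10_000)]  # hypothetical record keys
    poc = sample(customer_ids, 0.01)
    print(len(poc), "records selected; rerunning selects the same ones:",
          poc == sample(customer_ids, 0.01))
```

Raising the fraction only adds records, because a key below a 1% cutoff is also below a 5% cutoff, so a proof of concept can grow without losing its earlier sample.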
10. Hybrid Cloud Architecture Driver #3
Driver: Need for temporary elasticity
• On-premises clusters are typically configured for nominal workloads
• Volatility requires on-demand temporary resources
Bursting:
• Setting and managing ongoing nominal workloads with expected volatility in data volumes (threshold)
Surging:
• Maintaining performance levels during surges in event data volumes or user activity (dynamic)
Electric grids maintain the balance of dynamic energy generation with dynamic demand.
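The threshold-based bursting pattern on this slide can be expressed as a simple capacity split. This is a minimal Python sketch under stated assumptions: capacity is counted in abstract units (for example YARN containers), the 80% threshold is hypothetical, and the surging case, which would react dynamically to live metrics, is not modeled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BurstPolicy:
    on_prem_units: int   # nominal capacity, in abstract units such as YARN containers
    threshold: float     # hypothetical fraction of on-prem capacity to fill before bursting


def plan_burst(demand_units: int, policy: BurstPolicy) -> Tuple[int, int]:
    """Split demand into (on-premises, cloud) units using a fixed threshold."""
    limit = int(policy.on_prem_units * policy.threshold)
    on_prem = min(demand_units, limit)
    cloud = demand_units - on_prem  # only the overflow above the threshold bursts
    return on_prem, cloud


if __name__ == "__main__":
    policy = BurstPolicy(on_prem_units=100, threshold=0.8)
    for demand in (50, 80, 130):
        print(demand, "->", plan_burst(demand, policy))
```

Keeping the threshold below 100% leaves headroom on the on-premises cluster for its nominal workload while temporary cloud capacity absorbs the volatile portion.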
11. Dig Deeper Considerations
1. Network connectivity between corporate data centers and cloud locations is often taken for granted, yet configuration stability and latency have become obstacles.
2. Unified data access can become an issue when federated access involves extracting data out rather than pushing workloads into Hadoop clusters.
3. Hybrid cloud architectures vary between IaaS and PaaS implementations of Hadoop. Understand the drivers for either always-on IaaS or on-demand PaaS first, then adjust the hybrid architecture.
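The first two considerations are easy to underestimate without a number attached. A back-of-the-envelope Python sketch of how long a bulk transfer takes might look like this; the link speeds and the 70% achieved-bandwidth efficiency are assumptions, not measurements.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Rough time to move `data_tb` terabytes over a `link_gbps` link.

    `efficiency` is an assumed fraction of nominal bandwidth actually achieved
    after protocol overhead and contention.
    """
    bits = data_tb * 1e12 * 8                     # decimal terabytes -> bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second / 3600   # seconds -> hours


if __name__ == "__main__":
    for gbps in (0.1, 1.0, 10.0):  # hypothetical link speeds
        print(f"10 TB over {gbps:>4} Gbps: {transfer_hours(10, gbps):6.1f} hours")
```

At these assumed rates, 10 TB takes roughly 32 hours on a 1 Gbps link, which is why pushing work to where the data lives is usually cheaper than extracting the data.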
12. Key Takeaways
1. Hadoop with the Cloud is driven first by a set of business drivers and then by feasibility assessments for an increasing number of use cases, architecture patterns and balance.
2. Understand the different value propositions for Hadoop in the Cloud with both IaaS and PaaS architectures, as Cloud elasticity comes in various forms.
3. Strategic relationships play a significant role in determining Cloud and Hybrid-Cloud Hadoop architectures.
13. Data Lake for the Cloud…
Bob Page
VP Partner Product Management
Hortonworks
14. Hadoop Deployments Start Small
Diagram (axes: Scale, Scope): new analytic apps, new types of data, LOB-driven
15. And Then Grow Into Data Lakes
Diagram (axes: Scale, Scope): A Modern Data Architecture/Data Lake. Elements: new analytic apps, new types of data, LOB-driven; RDBMS, MPP, EDW; Governance & Integration, Security, Operations, Data Access, Data Management
Data Lake
An architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale.
Supporting multiple applications and workloads.
16. Example Applications on the Data Lake
Financial Services
• New Account Risk Screens
• Fraud Prevention
• Trading Risk
• Maximize Deposit Spread
• Insurance Underwriting
• Accelerate Loan Processing
Telecom
• Call Detail Records (CDRs)
• Infrastructure Investment
• Next Product to Buy (NPTB)
• Real-time Bandwidth Allocation
• New Product Development
Retail
• 360° View of the Customer
• Analyze Brand Sentiment
• Localized, Personalized Promotions
• Website Optimization
• Optimal Store Layout
Healthcare
• Genomic data for medical trials
• Monitor patient vitals
• Reduce re-admittance rates
• Store medical research data
• Recruit cohorts for pharmaceutical trials
Utilities, Oil & Gas
• Smart meter stream analysis
• Slow oil well decline curves
• Optimize lease bidding
• Compliance reporting
• Proactive equipment repair
• Seismic image processing
Public Sector
• Analyze public sentiment
• Protect critical networks
• Prevent fraud and waste
• Crowdsource reporting for repairs to infrastructure
• Fulfill open records requests
Manufacturing
• Supplier Consolidation
• Supply Chain and Logistics
• Assembly Line Quality Assurance
• Proactive Maintenance
• Crowdsourced Quality Assurance
17. Efficient Data Lakes can Span to the Cloud
On-Premises
• HDP on Windows or HDP on Linux: full control of HW and software configs
• Analytics Platform System: turnkey Hadoop and relational warehouse appliance
Cloud
• Your deployment of Hadoop hosted as a VM in Azure: HDP on Windows or HDP on Linux
• HDInsight: managed Hadoop service built on Azure storage
Enjoy cross-platform interoperability based on 100% open source HDP
18. …and Provide On-Premises and Cloud Interoperability
• Deployment choice: run the same apps in the environment of your choice
• Consistent management story
• Co-locate Hadoop processing next to your apps, deployed on-premises or in the cloud
• Leverage Azure for cloud hosting, Hadoop as a service, or as a destination for backup
Diagram: On-premises or “private cloud” with Microsoft Analytics Platform System and Operational Tools; Microsoft Azure with Azure Storage and Azure HDInsight; Microsoft Applications
20. Hybrid Hadoop Scenarios: Cloud Backup and Archive
Azure blob storage as low-cost, offsite backup
• Run HDP and HDInsight to power analytics on your data in the cloud
Automated data upload & backup
• Use Falcon to schedule data load rules and push data based on business needs
Global aggregation
• Capture data from data centers around the world
• Run Hadoop local to a DC, or aggregate across DCs to query the entire dataset
Seamless transfer to other storage
• Leverage Azure SQL DB & Azure storage as data sources or destinations
Diagram: On-premises or “private cloud” with Microsoft Analytics Platform System; Microsoft Azure with Azure Storage and Azure HDInsight
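The global aggregation bullet (run Hadoop local to each data center, then combine) is a two-level aggregation. A minimal Python sketch, assuming events are simple type labels and that the data center names are hypothetical, might look like this:

```python
from collections import Counter
from typing import Dict, Iterable


def local_aggregate(events: Iterable[str]) -> Counter:
    """Count events by type inside one data center (runs next to the local data)."""
    return Counter(events)


def global_aggregate(per_dc: Dict[str, Counter]) -> Counter:
    """Merge per-data-center partial counts into totals for the entire dataset."""
    total: Counter = Counter()
    for partial in per_dc.values():
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


if __name__ == "__main__":
    # Hypothetical data centers and event streams.
    per_dc = {
        "dc-us": local_aggregate(["click", "view", "click"]),
        "dc-eu": local_aggregate(["view", "purchase"]),
    }
    print(global_aggregate(per_dc))
```

Only the small per-DC summaries cross the wide-area network. The pattern works for decomposable aggregates such as counts and sums; a median, for example, cannot be computed from per-DC medians alone.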
21. Hybrid Hadoop Scenarios: App Development/POC
Develop new apps on 100% interoperable infrastructure
• Develop & test without pre-committing to on-prem or cloud deployment
Create new development & test environments on demand
• Do development with predictable costs
De-risk application development
• Protect production data & SLA workloads from new dev errors and load spikes
Experiment with new types of data to create new apps
• Defer decisions on data value and integration with the Data Lake
Diagram: On-premises or “private cloud” with Microsoft Analytics Platform System; Microsoft Azure with Azure Storage and Azure HDInsight
22. Hybrid Hadoop Scenarios: Bursting
Handle peak workloads in 100% interoperable environments
• Run HDP and HDInsight to power analytics on your data in the cloud
• Run the same application code
Make additional capacity available by separating jobs, e.g.:
• Ad hoc from scheduled
• Analytics from reporting
• Recent data from archived data
• ETL from aggregation
• SLA from non-SLA
• Departmental
• By priority
Diagram: On-premises or “private cloud” with Microsoft Analytics Platform System; Microsoft Azure with Azure Storage and Azure HDInsight
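One way to read the job-separation list on the bursting slide is as an ordering rule: SLA-bound, scheduled, high-priority jobs claim on-premises capacity first, and whatever does not fit runs in the cloud. The minimal Python sketch below assumes each job needs exactly one slot; the job fields, names and priority scale are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    scheduled: bool  # False for ad hoc work
    has_sla: bool
    priority: int    # hypothetical scale: higher is more important


def route_jobs(jobs: List[Job], on_prem_slots: int) -> Tuple[List[Job], List[Job]]:
    """Fill on-premises slots first with SLA, scheduled, high-priority jobs; burst the rest."""
    # False sorts before True, so SLA and scheduled jobs come first; -priority puts high priority first.
    ordered = sorted(jobs, key=lambda j: (not j.has_sla, not j.scheduled, -j.priority))
    on_prem = ordered[:on_prem_slots]
    cloud = ordered[on_prem_slots:]
    return on_prem, cloud


if __name__ == "__main__":
    jobs = [
        Job("adhoc-exploration", scheduled=False, has_sla=False, priority=1),
        Job("nightly-etl", scheduled=True, has_sla=True, priority=5),
        Job("weekly-report", scheduled=True, has_sla=False, priority=3),
    ]
    local, burst = route_jobs(jobs, on_prem_slots=2)
    print("on-premises:", [j.name for j in local])
    print("cloud:      ", [j.name for j in burst])
```

Because the same application code runs in both environments, routing is purely a scheduling decision; other separations from the list (departmental, recent versus archived data) could be added as further sort keys.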
23. Demo
Matt Winkler
Principal Program Manager
Microsoft
24. Storyline: Leveraging Falcon to enable data movement to the cloud
Diagram: On-Premises Hadoop Cluster (HDP 2.1) running on CentOS with HDFS, YARN, Tez, Hive, MR and Falcon; Microsoft Azure with Azure Storage, HDInsight, and a Hadoop cluster deployed to IaaS
• Leveraging Falcon to seamlessly move data to the cloud
• Leveraging HDInsight to create a cluster on demand to process the same data with the same job
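In the demo, Falcon handles the replication schedule declaratively through its own feed definitions. The Python below is not Falcon's API; it is a hypothetical model of the decision such a schedule makes on each run: which hourly partitions are complete and still need to be pushed to Azure Storage. The path template, hourly granularity and one-hour settling delay are assumptions.

```python
from datetime import datetime, timedelta
from typing import List

# Hypothetical hourly-partitioned HDFS layout; not taken from the demo.
PATH_TEMPLATE = "/data/weblogs/{:%Y-%m-%d-%H}"
HOUR = timedelta(hours=1)


def partitions_to_replicate(last_copied_hour: datetime, now: datetime,
                            settle: timedelta = HOUR) -> List[str]:
    """Hourly partitions that are complete but not yet pushed to cloud storage.

    A partition counts as complete once `settle` has elapsed after its hour ends,
    leaving room for late-arriving data.
    """
    paths = []
    hour = last_copied_hour + HOUR
    while hour + HOUR + settle <= now:
        paths.append(PATH_TEMPLATE.format(hour))
        hour += HOUR
    return paths


if __name__ == "__main__":
    pending = partitions_to_replicate(datetime(2014, 6, 10, 8), datetime(2014, 6, 10, 12, 30))
    print(pending)
```

With the 08:00 partition already copied and the clock at 12:30, the 09:00 and 10:00 partitions are ready (`/data/weblogs/2014-06-10-09` and `/data/weblogs/2014-06-10-10`), while 11:00 waits until its settling delay has passed. An on-demand HDInsight cluster can then run the same job against the copied partitions.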
25. Wait for it… Wait for it…
26. Demo wrap-up…
Why Cloud?
• Elasticity
• Cost Optimization
• Economic flexibility
• Support for bursting workloads
• Global footprint
Why On-Premises?
• Compliance requirements
• Specific control over hardware/networking
• Integration requirements for additional apps to be close to the cluster
Why Both?
• Offsite backup
• Dev/Test
• Burst to Cloud
27. Next steps…
Industry leading Hadoop Sandbox
• Free download
• Personal, portable Hadoop environment
Included Tutorials for Microsoft
• How to Use Excel 2013 to Access Hadoop Data
• How to Use Excel 2013 to Analyze Hadoop Data
• How to Install and Configure the Hortonworks ODBC driver on Windows 7
Try Hadoop in the Cloud
• Up and running in minutes
• Spin up without hardware
Free Trial: www.windowsazure.com/bigdata
hortonworks.com/sandbox
28. Thank you
Time for Q&A