State of the Union with Shaun Connolly

Hortonworks
“State of the Union”
Shaun Connolly, VP Strategy
@shaunconnolly, @hortonworks

January 22, 2013

© Hortonworks Inc. 2013 Page 1

Quick House Keeping Rule

• Q&A panel is available if you have any questions during
the webinar
• There will be time for Q&A at the end
• We will record the webinar for future viewing
• All attendees will receive a cope of the slides an recording

Page 2
© Hortonworks Inc. 2013

Hortonworks
• History of Apache Hadoop & Hortonworks’ Role
– Genesis of Apache Hadoop
– Role of Apache Software Foundation
– Hortonworks Process for “Enterprise Hadoop”

• Key Areas of Focus in 2012

• The Road Ahead for Enterprise Hadoop

Page 3

A Brief History of Apache Hadoop

Apache Project Yahoo! Hortonworks
Established Operate at scale Data Platform

2013
2004 2006 2008 2010 2012 Enterprise
Hadoop
2005: Yahoo! creates
team under E14 to Focus on INNOVATION
work on Hadoop

Page 4



2013
2004 2006 2008 2010 2012 Enterprise
Hadoop
work on Hadoop

2007: Yahoo team extends focus to
operations to support multiple Focus on OPERATIONS
projects & growing clusters

Page 5



2013
2004 2006 2008 2010 2012 Enterprise
Hadoop
work on Hadoop

2007: Yahoo team extends focus to
operations to support multiple Focus on OPERATIONS
projects & growing clusters

2011: Hortonworks created to focus on
“Enterprise Hadoop“. Starts with 24 STABILITY
key Hadoop engineers from Yahoo

Page 6

Hortonworks Snapshot

We develop, distribute and support
the ONLY 100% open source
Headquarters: Palo Alto, CA
Employees: 180+ and growing
Enterprise Hadoop distribution
Investors: Benchmark, Index, Yahoo

Develop Distribute Support
• We employ the core • We distribute the only 100% • We are uniquely positioned
architects, builders and Open Source Enterprise to deliver the highest quality
operators of Apache Hadoop Hadoop Distribution: of Hadoop support
Hortonworks Data
• We drive innovation within Platform • We enable the ecosystem to
Apache Software work better with Hadoop
Foundation projects • We engineer, test & certify
HDP for enterprise usage

Endorsed by Strategic Partners

Page 7

Apache Community Leadership
Apache Software Foundation
Test & Guiding Principles
Patch Release
Apache • Release early & often
Hadoop
• Transparency, respect, meritocracy
Design & Develop

“We have noticed more activity over the last year
from Hortonworks’ engineers on building out
Apache Hadoop’s more innovative features. These
include YARN, Ambari and HCatalog..”
- Jeff Kelly: Wikibon

Page 8

Apache
Pig Test & Guiding Principles
Patch Release
Hadoop
Apache • Transparency, respect, meritocracy
Hive
Design & Develop

Apache
Apache HCatalo
HBase g

Apache
Other Ambari
Apache
Projects

Apache Hadoop’s more innovative features. These
include YARN, Ambari and HCatalog..”
- Jeff Kelly: Wikibon

Page 9

Apache
Pig Test & Guiding Principles
Patch Release
Hadoop
Apache • Transparency, respect, meritocracy
Hive
Design & Develop

Apache
Key Roles held by Hortonworkers
Apache
HBase
HCatalo
g
• PMC Members
– Managing community projects
Apache
Ambari
– Mentoring new incubator projects
Other
Apache – About 20 Hortonworkers managing community
Projects

• Committers
– Authoring, reviewing & editing code
– About 50 Hortonworkers across projects
Apache Hadoop’s more innovative features. These • Release Managers
include YARN, Ambari and HCatalog..” – Testing & releasing projects
– Hortonworkers across key projects like
- Jeff Kelly: Wikibon Hadoop, Hive, Pig, HCatalog, Ambari, HBase

Page 10

Hortonworks Process for Enterprise Hadoop
Upstream Community Projects Downstream Enterprise Product

Apache
Pig Test &
Patch Release
Apache
Hadoop
Apache Hortonworks
Hive
Design & Develop Data Platform

Apache
Apache HCatalo
HBase g

Apache
Other Ambari
Apache
Projects

Page 11


Integrate
& Test

Apache Design &
Pig Test &
Patch Develop
Apache Release Package
Hadoop & Certify
Apache Hortonworks
Hive

Apache
Apache HCatalo
HBase g
Distribute
Apache
Other Ambari
Apache
Projects

Page 12


Virtuous Cycle: development & fixed issues done upstream & stable project releases flow downstream
Integrate
& Test

Fixed Issues

Apache Design &
Pig Test &
Patch Develop
Apache Release Package
Hadoop & Certify
Apache Stable Project Hortonworks
Hive Releases

Apache
Apache HCatalo
HBase g
Distribute
Apache
Other Ambari
Apache
Projects
No Lock-in: Integrated, tested & certified distribution
lowers risk by ensuring close alignment with Apache projects

Page 13

HDP Certifies Latest Stable Components

Apache HDP CDH CDH
Project 1.2 3u5 4.1.2
Hadoop 1.1.2 020.2 +923.418 2.0.0alpha +541
Pig 0.10.1 0.8.1 +51.39 0.10.0 +48
Hive 0.10.0 0.7.1 +42.56 0.9.0 +148
HCatalog 0.5.0 n/a n/a
HBase 0.94.2 0.90.6 +84.73 0.92.1 +154
Sqoop 1.4.2 1.3.0 +5.88 1.4.1 +51
Oozie 3.2.0 3.2.0 3.2.0
Zookeeper 3.4.5 3.3.5 +19.5 3.4.3 +25
Ambari 1.2.0 n/a n/a
Flume 1.3.0 0.9.4 +25.46 1.2.0 +119
Mahout 0.7.0 0.5 +9.7 0.7 +4

Source: http://files.cloudera.com/pdf/datasheet/cdh4.1_spec_sheet.pdf

Page 14

True Enterprise Class Open Source
• 100% Open Source. No Holdbacks.
– Only true implementation of OSS Apache Hadoop
– Preferred by the software vendors that you rely on

• Flexible Deployment
– No License Fee for usage

• Community Open Source Mitigates Lock-In
– Proprietary Open Source = Lock-In
– Open communities always trump “open source”

Page 15

Hortonworks

– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem


Page 16

HDP: Enterprise Hadoop Distribution
Hortonworks
Data Platform (HDP)
Enterprise Hadoop
• The ONLY 100% open source
WEBHDFS
Distributed MAP REDUCE and complete distribution
HADOOP CORE Storage & Processing (in 2.0)
HDFS YARN
Enterprise Readiness:
PLATFORM SERVICES HA, DR, Snapshots, Security • Enterprise grade, proven and
,…
tested at scale

• Ecosystem endorsed to
ensure interoperability

Page 17


DATA Hortonworks
SERVICES
Data Platform (HDP)
FLUME Store, Proces
PIG HIVE
s and Access HBASE Enterprise Hadoop
SQOOP Data
HCATALOG
WEBHDFS
HDFS YARN
,…
tested at scale


Page 18


OPERATIONAL DATA Hortonworks
SERVICES SERVICES
Data Platform (HDP)
Manage &
AMBARI FLUME Store, Proces
PIG HIVE
Operate at s and Access HBASE Enterprise Hadoop
Scale SQOOP Data
OOZIE HCATALOG
WEBHDFS
HDFS YARN

PLATFORM SERVICES Enterprise Readiness: HA,
DR, Snapshots, Security, …
• Enterprise grade, proven and
tested at scale


Page 19


SERVICES SERVICES
Data Platform (HDP)
Manage &
PIG HIVE
Scale SQOOP Data
OOZIE HCATALOG
WEBHDFS
HDFS YARN
,…
tested at scale
HORTONWORKS
DATA PLATFORM (HDP) • Ecosystem endorsed to

Page 20


SERVICES SERVICES
Data Platform (HDP)
Manage &
PIG HIVE
Scale SQOOP Data
OOZIE HCATALOG
WEBHDFS
HDFS YARN
,…
tested at scale
HORTONWORKS
DATA PLATFORM (HDP) • Ecosystem endorsed to
OS Cloud VM Appliance

Page 21

Latest Hortonworks Announcements
Two releases in January 2013

JANUARY Hortonworks Data Platform 1.2
Hortonworks Brings Enterprise Manageability to 100%
Open Source Apache Hadoop Distribution
15

JANUARY Hortonworks Sandbox
Hortonworks accelerates Hadoop skills development
with an easy-to-use, flexible and extensible platform to
22 learn, evaluate and use Apache Hadoop

Page 22

HDP 1.2 Summary
Hortonworks Data Platform 1.2
HDP outpaces the competition to extend leadership through 100%
open source Enterprise Apache Hadoop

Focus areas:
• Ambari: continued innovation with a complete, free and open
cluster management tool
• Provision, Manage and Monitor your Hadoop infrastructure
• Job diagnostics, usage heat maps
• Ecosystem integration
• Enhanced security model for Hive and HCatalog
• Performance and operational enhancements for HBase
• Extended Full Stack HA to Hive & HCatalog Metastore

Page 23

HDP 1.2: Ambari Key Features
• Job Diagnostics
Visualize and troubleshoot Hadoop
job execution and performance

• Cluster History
View historical job execution &
performance

• Instant Insight
View health of Core Hadoop
(HDFS, MapReduce) and related
projects

• Cluster Navigation
“Quick link” buttons jump into
namenode web UI for a server

• REST interface
provides external access to Ambari
for existing tools. Facilitates
Apache Ambari Dashboard integration with Microsoft System
Center and Teradata Viewpoint

Page 24

0 to Big Data in 15 Minutes

Hands on tutorials
integrated into HDP environment for
Sandbox evaluation

Page 25

Hortonworks

– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem


Page 26

Traditional Data Architecture
APPLICATIONS

Business Custom Enterprise
Analytics Applications Applications
DEV & DATA
TOOLS
BUILD &
TEST
DATA SYSTEMS

OPERATIONAL
TOOLS
MANAGE &
MONITOR
RDBMS EDW MPP
TRADITIONAL REPOS
DATA SOURCES

Traditional Sources
OLTP, PO(RDBMS, OLTP, OLAP)
S
SYSTEMS

Page 27

Next-Generation Data Architecture
APPLICATIONS

Business Custom Enterprise
Analytics Applications Applications
DEV & DATA
TOOLS
BUILD &
TEST
DATA SYSTEMS

OPERATIONAL
TOOLS
HORTONWORKS MANAGE &
DATA PLATFORM MONITOR
RDBMS EDW MPP
TRADITIONAL REPOS
DATA SOURCES

Traditional Sources New Sources
OLTP, PO(RDBMS, OLTP, OLAP) (web logs, email, sensor data, social media)
MOBILE
S DATA
SYSTEMS

Page 28

Interoperating With Your Tools
APPLICATIONS

Microsoft Applications
DEV & DATA
TOOLS
DATA SYSTEMS

OPERATIONAL
TOOLS
HORTONWORKS
DATA PLATFORM
TRADITIONAL REPOS Viewpoint
DATA SOURCES

Traditional Sources New Sources
OLTP, PO(RDBMS, OLTP, OLAP) (web logs, email, sensor data, social media)
MOBILE
S DATA
SYSTEMS

Page 29

Hortonworks & Teradata
• Unified Data Architecture
– The right technology on the right analytical problems using best of breed technologies

• Viewpoint Integration
– Common management console
for Aster, Teradata and Apache
Hadoop
• TVI: Teradata Vital
Infrastructure
– Proactive
reliability, availability, and
manageability support service
• Aster Connector for Hadoop
– SQL-H integration
• Teradata Connector for Hadoop
– Sqoop integration
• Pre-tuned HDFS and
HORTONWORKS MapReduce parameters for Big
DISTRIBUTION PLATFORM Data workloads
Big Data Management

Page 30

Hortonworks & Microsoft
Microsoft Brings Big Data to the Masses

HDInsight • Simplifies Hadoop, Enterprise Ready
• Hortonworks Data Platform used for
Hadoop on Windows Server and Azure

+ • An engineered, open source solution
– Hadoop engineered for Windows
• Excel – Hadoop powered Microsoft business tools
• PowerPivot (BI) – Ops integration with MS System Center
• PowerView – Bidirectional connectors for SQL Server
(visualization) – Support for Hyper-V, deploy Hadoop on VMs
– Opens the .NET developer community to Hadoop
– Deploy on Azure in 10 minutes

Page 31

Hortonworks


– Patterns of Use
– Key Areas of Investment

Page 32

Market Transitioning into Early Majority
Enterprise adoption accelerates via:
• Repeatable horizontal patterns of use
relative %
customers

• Ecosystem-driven pull market
• Vertical applications (aka bowling pins)

The CHASM
Innovators, t Early Early Late
Laggards, Ske
echnology adopters, majority, pr majority, conse
ptics
enthusiasts visionaries agmatists rvatives

time
Customers want Customers want
technology & performance solutions & convenience

Source: Geoffrey Moore - Crossing the Chasm

Page 33

Patterns of Use: “Right-time Access”
Business Case

Batch Interactive Online

Refine Explore Enrich

HORTONWORKS
DATA PLATFORM

Big Data
Transactions, Interactions, Observations

Page 34

Being Big Data Driven at Neustar
Create new business opportunities and save money
with information analytics
• Provides real-time
information and
analysis to
• Traditional business heavy in data capture and data
Internet, telecommunic movement.
ations, entertainment
and marketing – Aggregate data for industries as information exchange
industries throughout – For instance they used to store 1% of DNS data for 60 days
the world.
to bill customers and identify DDOS attacks – With Hadoop
• Started off focused on they now store 100% over a year
# porting for carriers
– Not economically feasible to use existing DW for new data

• 2500+ Employees • Eliminated politics with creation of “catch basin”
– Year 1: Use Hadoop to capture everything they used to throw
away while leaving existing systems in tact
– Year 2: Make this data available for new business
opportunities, but require the business to justify

Page 35

Customers Don’t Want More Data Silos

AVOID: Systems separated by GOAL: Platform that natively
workload type due to contention supports mixed workloads

Batch Interactive Online

Big Big Big Big Data
Data Data Data Transactions, Interactions, Observations

Page 36

2013 “Enterprise Hadoop” Initiatives

Invest In:

OPERATIONA DATA
L SERVICES SERVICES

HADOOP CORE

PLATFORM SERVICES

HORTONWORKS
DATA PLATFORM (HDP)

Page 37


Invest In:
–Platform Services

OPERATIONA DATA
L SERVICES SERVICES

HADOOP CORE

PLATFORM SERVICES

HORTONWORKS
DATA PLATFORM (HDP)

“Continuum”
Biz Continuity

Page 38


Invest In:
Hive / “Stinger”
Interactive Query
HBase
Online Data
–Data Services
OPERATIONA DATA
L SERVICES SERVICES

HADOOP CORE

PLATFORM SERVICES

“Herd” –
HORTONWORKS
DATA PLATFORM (HDP) Data Integration

“Continuum”
Biz Continuity

Page 39


Invest In:
Hive / “Stinger”
Interactive Query
Ambari HBase
Manage & Operate Online Data
–Data Services
OPERATIONA DATA
L SERVICES SERVICES

HADOOP CORE

PLATFORM SERVICES

“Herd” –Operational Services
“Knox” HORTONWORKS

Secure Access
DATA PLATFORM (HDP) Data Integration

“Continuum”
Biz Continuity

Page 40

Top BI Vendors Support Hive Today

Page 41

Stinger: Enhance Hive for BI Use Cases
Enterprise Reports Parameterized Reports

Dashboard / Scorecard

Data Mining Visualization

More SQL
&
Better Performance

Batch Interactive

Page 42

Our Focus Remains Unchanged
• Innovate Core Hadoop
– Lead innovation within the Apache Hadoop community

• Enhance Hadoop for Enterprise Class Usage
– Add platform, data, and operational services that enterprises need
– Apply enterprise software rigor to test & release process

• Enable the Data Ecosystem
– Leverage Hadoop to enable Partners to be successful

• All Open Source, All the Time
– Avoid proprietary open source which locks you in

Page 43

Next Steps
Download Hortonworks Sandbox
www.hortonworks.com/sandbox

Download Hortonworks Data Platform
www.hortonworks.com/download

Register for Enterprise Hadoop Series
www.hortonworks.com/webinars

Follow…

Page 44

Power of Community is Key

Amsterdam San Jose, CA
March 20 - 21, 2013 June 26 - 27, 2013

REGISTER NOW CALL FOR PAPERS
http://hadoopsummit.org/amsterdam/register/ http://hadoopsummit.org/san-jose/call-for-papers/


Next Steps
Download Hortonworks Sandbox
www.hortonworks.com/sandbox

Download Hortonworks Data Platform
www.hortonworks.com/download

Register for Enterprise Hadoop Series
www.hortonworks.com/webinars

Follow…

Page 46

Questions?

Page 47

State of the Union with Shaun Connolly

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (6)

Similar to State of the Union with Shaun Connolly

Similar to State of the Union with Shaun Connolly (20)

More from Hortonworks

More from Hortonworks (20)

Recently uploaded

Recently uploaded (20)

State of the Union with Shaun Connolly

Editor's Notes