Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Turbocharging Your Data Science with
HAWQ on the Hortonworks Data Platform
We Do Hadoop
Your Hosts
Michael Cucchi
•  Sr. Director of Outbound Product for Pivotal's Data,
Mobile, and IoT solutions
•  20 years of engineering, management, and
marketing experience in the high-tech industry
@mikecucchi
Matt Morgan
•  Vice President, Global Product Marketing
•  20-year history as a marketing and product
executive in cloud, SaaS, and big data businesses
@forwardtension
Hortonworks Quick Facts
Mission: Establish Hadoop as the foundational technology of the modern enterprise data architecture
Year founded: In 2011, 24 engineers from the original Hadoop team at Yahoo! spun out to form Hortonworks.
Ticker symbol: NASDAQ: HDP
Headquarters: Santa Clara, CA
Business model: Open source software support subscriptions, training and consulting services
Non-GAAP billings: Grew from zero to over $120 million on an annualized basis in 11 quarters
Subscription customers: 437 in 11 quarters, with 105 added in Q1 2015 alone
Support: 24×7 global web and telephone support
Partners: 1,100 joint engineering, strategic reseller, technology, and system integrator partners
Employees: 650+
Global operations: 17 countries
#1: 28 out of 86 Apache Hadoop committers. Hortonworks employs the largest group of Hadoop committers under one roof; more than twice any other company.
#1: 165 Apache committer seats for projects in HDP. Our committers work in 20+ projects on the data access, management, security, operations, and governance needs of the enterprise; more than twice any other company.
The Forrester Wave™: Big Data Hadoop Solutions. We are recognized as a leader in Hadoop by Forrester Research based on the strengths of our offerings and strategy.
Traditional Systems Under Pressure
Challenges
•  Constrains data to app
•  Can’t manage new data
•  Costly to Scale
[Chart: data volume grows from 2.8 zettabytes in 2012 to 40 zettabytes in 2020. Traditional data (ERP, CRM, SCM) is joined by new data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs), plotted against business value, with industry leaders ahead of laggards.]
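The two data points on this chart imply a compound annual growth rate of (V2020 / V2012)^(1/8) − 1. A minimal Python sketch of that arithmetic, taking the slide's figures at face value and assuming smooth compound growth between them:

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value over `years`."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures from the slide: 2.8 zettabytes in 2012, 40 zettabytes in 2020.
rate = implied_cagr(2.8, 40.0, 2020 - 2012)
print(f"Implied growth: {rate:.1%} per year")
```

With 2.8 and 40 zettabytes over eight years, this works out to roughly 39% growth per year.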
Early Hadoop: The Start of a Modern Data Architecture
Apache Hadoop is an open source data platform for managing large volumes of data with high velocity and variety
•  Built by Yahoo! to be the heartbeat of its ad & search business
•  Donated to the Apache Software Foundation in 2005, with rapid adoption by large web properties & early adopter enterprises
•  Incredibly disruptive to current platform economics
Traditional Hadoop Advantages
✓  Manages new data paradigm
✓  Handles data at scale
✓  Cost effective
✓  Open source
Traditional Hadoop Had Limitations
•  Batch-only architecture with limited analytic options
•  Single-purpose clusters, specific data sets
•  Difficult to integrate with existing investments
•  Not enterprise-grade
[Diagram: early Hadoop stack, with HDFS for storage and MapReduce for batch processing beneath the application.]
Today: Modern Data Architecture Unifies Data & Processing
Modern Data Architecture
•  Enables applications to access all your enterprise data through an efficient, centralized platform
•  Supported by a centralized approach to governance, security and operations
•  Versatile enough to handle any application and dataset, regardless of size or type
[Diagram: sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, and existing ERP, CRM and SCM systems) flow into HDFS (Hadoop Distributed File System) under YARN: Data Operating System, which runs batch, interactive, real-time and partner ISV workloads alongside MPP and EDW systems. Analytics (data marts, business analytics, visualization & dashboards) and applications consume the results.]
Partnerships Enrich the Hadoop Ecosystem
Deep Partnerships
Hortonworks engages in deeply engineered relationships with the leaders in the data center, such as EMC, Microsoft, Teradata, Red Hat, HP, SAS & SAP
Broad Partnerships
Over 1,100 partners work with us to certify their applications on Hadoop so they can extend big data to their users
[Diagram: sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, ERP, CRM, SCM) flow into HDFS under YARN: Data Operating System (batch, interactive, real-time and partner ISV workloads) alongside the EDW, feeding analytics and applications; partner operational tools, dev & data tools, and infrastructure surround the platform.]
Hadoop Adoption Follows a Predictable Journey
From cost optimization, to new analytic apps, and ultimately to a data lake
Hadoop Driver: Cost optimization
Archive Data off the EDW
Move rarely used data to Hadoop as an active archive; store more data for longer
Offload Costly ETL Processes
Free your EDW to perform high-value functions like analytics & operations, not ETL
Enrich the Value of Your EDW
Use Hadoop to refine new data sources, such as web and machine data, for new analytical context
HDP helps you reduce costs and optimize the value associated with your EDW
[Diagram: sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, and existing ERP, CRM and SCM systems) land in HDP 2.2, which holds cold data, a deeper archive and new sources and performs ELT; the enterprise data warehouse (hot, MPP, in-memory) serves analytics: data marts, business analytics, visualization & dashboards.]
Hadoop Driver: Advanced analytic applications
Single View:
Improve acquisition & retention
•  HDP enables a single view of each
customer, allowing organizations to
provide targeted, personalized
customer experiences.
•  Single view reduces attrition,
improves cross-sell and improves
customer satisfaction.
Predictive Analytics:
Identify next best action
•  HDP captures, stores and processes
large volumes of data streaming
from connected devices
•  Stream processing and data science
help introduce new analytics for real-
time and batch analysis
Data Discovery:
Uncover new findings
•  HDP allows exploration of new data
types and large data sets that were
previously too big to capture, store &
process.
•  Unlock insights from data such as
clickstream, geo-location, sensor,
server log, social, text and video
data.
360° Customer View Boosts Sales at Home Supply Retailer
Problem: The lack of a unified customer record across all channels clouded targeting for marketing campaigns
•  No “golden record” for analytics on customer buying behavior across all channels
•  Data repositories on web traffic, POS transactions and in-home services existed in isolation from each other
•  Data storage costs were increasing without a corresponding increase in value
Solution: HDP data lake drives a golden customer record, targeted marketing, and reduced data storage expenses
•  Golden record enables targeted, personalized marketing with higher success rates
•  Data warehouse offload saved millions of dollars in recurring expense
•  Price optimization versus competitors → several million dollars in top-line revenue growth
Industry: Retail (major home improvement retailer)
Why Hadoop? New analytic applications: Single View
Data: Clickstream, unstructured and structured data
Responsive Patient Treatment with Real-time Monitoring of Vitals
Problem: Inability to store and access sufficient data for medical decision support in real time
•  9 million patient records on a legacy system were neither searchable nor retrievable
•  Cohort selection for research projects was slow, despite an abundance of data
•  Clinicians had minimal access to historical data gathered across all patients
Solution: Unified data lake improves patient health and speeds research
•  Legacy system retired immediately, saving $500K in annual recurring expense
•  Records stored with patient identification for clinical use; the same data presented anonymously to researchers for cohort selection
•  Wireless patches transmit vital signs; algorithms notify doctors of high-risk patterns
•  Heart patients weigh themselves at home; algorithms notify doctors about unsafe weight changes and recommend a visit to the clinic
Industry: Healthcare (public university teaching hospital)
Why Hadoop? New analytic applications: Predictive Analytics
Data: Sensor and social data, ETL offload
Hadoop Driver: Enabling the Data Lake
Data Lake Definition
•  Centralized Architecture: multiple applications on a shared data set with consistent levels of service
•  Any App, Any Data: multiple applications accessing all data, affording new insights and opportunities
•  Unlocks ‘Systems of Insight’: advanced algorithms and applications used to derive new value and optimize existing value
Drivers:
1.  Cost Optimization
2.  Advanced Analytic Apps
Goal:
•  Centralized Architecture
•  Data-driven Business
[Chart: the journey to the data lake with Hadoop, plotted by scale and scope, culminating in the data lake and systems of insight.]
Case Study: 12-Month Hadoop Evolution at TrueCar
12-month execution plan (data platform capabilities):
•  June 2013: Begin Hadoop execution
•  July 2013: Hortonworks partnership
•  Aug 2013: Training & dev begins
•  Nov 2013: Production cluster, 60 nodes, 2 PB
•  Dec 2013: Three production apps (3 total)
•  Jan 2014: 40% of dev staff proficient
•  Feb 2014: Three more production apps (6 total)
•  May 2014: IPO
12-Month Results at TrueCar
•  Six production Hadoop applications
•  Sixty nodes/2 PB data
•  Storage and compute costs reduced from $19/GB to $0.12/GB
“We addressed our data platform capabilities strategically as a precursor to IPO.”
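The cost figures can be turned into a reduction factor. A minimal Python sketch, assuming the two per-GB figures are directly comparable (the slide does not say whether they are one-time or recurring costs):

```python
def reduction_factor(before_per_gb: float, after_per_gb: float) -> float:
    """How many times cheaper the 'after' cost is than the 'before' cost."""
    if after_per_gb <= 0:
        raise ValueError("after_per_gb must be positive")
    return before_per_gb / after_per_gb


def fraction_saved(before_per_gb: float, after_per_gb: float) -> float:
    """Fraction of the original per-GB cost that was eliminated."""
    return 1.0 - after_per_gb / before_per_gb


# Per-GB storage/compute costs reported on the slide.
before, after = 19.00, 0.12
print(f"{reduction_factor(before, after):.0f}x cheaper, "
      f"{fraction_saved(before, after):.1%} saved per GB")
```

For $19/GB versus $0.12/GB this gives roughly a 158x reduction, or about 99.4% less per gigabyte.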
Hortonworks Data Platform
Hadoop for the Enterprise
HDP Makes Hadoop Enterprise-Ready
Hortonworks Data Platform
Multi-tenant data platform built on a centralized
architecture of shared enterprise services
[Diagram: YARN: data operating system, with shared governance, security, operations and resource management services, serving existing applications, new analytics and partner applications through batch, interactive and real-time data access over shared storage.]
Key benefits
Consolidates all data sets
Delivers real-time insights
Integrates with data center
Scalable and affordable
HDP Makes Hadoop Pervasive
Any application
Batch, interactive, and real-time
Any data
Existing and new datasets
Anywhere
Complete range of deployment options: commodity, appliance, cloud
[Diagram: YARN: data operating system serving existing applications, new analytics and partner applications through batch, interactive and real-time data access.]
An “Any Application” Example: Spark in HDP
Delivering a production-ready
experience for Spark applications
•  Centralized Resource Management: integrated with YARN
•  Consistent Operations: provisioned and managed by Ambari
•  Comprehensive Security: runs within secure clusters
•  Deployable Anywhere: Windows, Linux, on-premises or cloud; consistent Cloudbreak launch experience
[Diagram: Spark running on YARN: data operating system, alongside shared governance, security, operations, resource management and storage.]
An “Anywhere” Example: Cloudbreak and HDP
Cloudbreak
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!
Example Ambari Blueprints:
•  IoT Apps (Storm, HBase, Hive)
•  BI / Analytics (Hive)
•  Data Science (Spark)
•  Dev / Test (all HDP services)
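Cloudbreak's first step, picking a blueprint, refers to an Ambari blueprint: a JSON document naming a stack and the components each host group runs. A minimal Python sketch of a blueprint in the spirit of the Data Science (Spark) example; the blueprint name, host-group layout, cardinalities, stack version and component list are all assumptions, and a blueprint that Ambari would accept for a given HDP version may need additional client components and configurations.

```python
import json


def data_science_blueprint(stack_version: str = "2.2") -> dict:
    """Build a minimal, illustrative Ambari blueprint for a Spark-oriented HDP cluster.

    Host groups and component names are assumptions, not a shipped Cloudbreak blueprint.
    """
    master_components = [
        "NAMENODE", "SECONDARY_NAMENODE", "RESOURCEMANAGER",
        "HISTORYSERVER", "APP_TIMELINE_SERVER", "ZOOKEEPER_SERVER",
        "SPARK_JOBHISTORYSERVER",
    ]
    worker_components = ["DATANODE", "NODEMANAGER", "SPARK_CLIENT"]
    return {
        "Blueprints": {
            "blueprint_name": "data-science-spark",  # hypothetical name
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {"name": "master", "cardinality": "1",
             "components": [{"name": c} for c in master_components]},
            {"name": "worker", "cardinality": "3",
             "components": [{"name": c} for c in worker_components]},
        ],
    }


print(json.dumps(data_science_blueprint(), indent=2))
```

Keeping master services in one host group and data/compute services in a scalable worker group mirrors how blueprints separate cluster roles from cluster size.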
“Hortonworks loves and lives
open source innovation”
World-Class Support and Services
Hortonworks' customer support received the maximum score, significantly higher than both Cloudera and MapR
A Leader in Hadoop
The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014
Pivotal in the Modern Data Architecture
[Diagram: sources (OLTP, ERP and CRM systems; documents & emails; web logs and click streams; social networks; machine-generated data; sensor data; geo-location data) feed data systems made up of existing RDBMS, EDW and MPP repositories plus a YARN-based platform with Pivotal Greenplum, GemFire and HAWQ, deployable on premise, in the cloud, or as an appliance, with governance & integration, security, operations, data access and data management services. Applications on top include statistical analysis, BI / reporting and ad hoc analysis, interactive web & mobile applications, and enterprise applications. Supporting layers: operations tools (provision, manage & monitor), dev & data tools (build & test), and infrastructure.]
22 © Copyright 2014 Pivotal. All rights reserved.
Turbo Charging Data
Science with HAWQ
23© 2015 Pivotal Software, Inc. All rights reserved.
Pivotal By the Numbers
FOUNDED APRIL 2013
1700+ EMPLOYEES
FUNDED BY EMC, VMWARE, AND GE
HUNDREDS OF CUSTOMERS
PIVOTAL DATA
>$100M in data software bookings in 2014
PIVOTAL CLOUD FOUNDRY
Fastest revenue growth in an open source project in history
>$40M in first year for Pivotal Cloud Foundry in 2014 (subscription)
Big Data, Cloud Platform, Agile
Software is Eating the World
Data Is Fueling Software
The Data Divide: The Big Data Chasm
•  70% of data generated by customers
•  80% of data stored
•  3% prepared for analysis
•  0.5% being analyzed
•  <0.5% being operationalized
Pivotal Business Data Lake Architecture
[Diagram: data flows from sources through an ingestion tier (real-time, micro batch and batch ingestion), a processing tier and a distillation tier built on HDFS storage for unstructured and structured data, an in-memory store and an MPP database (real-time, micro batch and mega batch processing via SQL, NoSQL and MapReduce), to an insights tier exposing SQL query interfaces for real-time, interactive and batch insights, and on to an action tier. System monitoring, system management and workflow management span all tiers.]
The Data Driven Enterprise Journey
STORE
•  Structured
•  Unstructured
•  High Volume
•  High Velocity
ANALYZE
•  Predictive Analytics
•  Machine Learning
•  Advanced Data Science
•  Real-time Analytics
DEVELOP
•  Advanced Analytic Pipelines
•  Real-time Analytical Applications
•  Global-Scale Data-Driven Applications
•  Enterprise, Consumer, IoT, and Mobile
INNOVATE
•  Agile Dev Expertise
•  DevOps
•  Hybrid Cloud
•  Continuous Delivery
•  Closed Loop Applications
AGILE DEVELOPMENT
BIG DATA
PREDICTIVE ANALYTICS
ENTERPRISE PAAS
Technical Observations
•  SQL is today and will remain the most valuable workload on Hadoop
•  While Hadoop continues to mature, focused MPP SQL will remain
important
•  Scale-out in-memory processing will have significant enterprise adoption and impact in the future
•  Streaming and Machine Learning will continue to gain value
•  Open Source is becoming critical to enterprise investment decisions
Pivotal BDS + Hortonworks HDP = The Complete Solution
[Diagram: Pivotal Big Data Suite (BDS) on HDP, backed by Pivotal Data Engineering, Pivotal Data Science and Pivotal Labs.]
SQL on Hadoop: Ecosystem Challenges → HAWQ Requirements
•  Complex joins not supported → Complex joins that perform well
•  Limited advanced analytics support → Advanced analytics at scale within SQL
•  Interactive query latency issues → Fast interactive queries on large data
•  Ad-hoc query performance issues → Strong ad-hoc query support in the optimizer
•  SQL analytic query coverage issues → Full analytic SQL compliance
•  Concurrent query throughput issues → High query throughput for mixed workloads
HAWQ: Enterprise Class SQL on Hadoop
•  Leverages market-leading Greenplum technology
•  100% ANSI SQL compliant for analytic workloads
•  Advanced cost-based query optimizer
•  Highest performing SQL on Hadoop
•  Polymorphic storage with advanced compression
•  Industry-differentiating data federation with PXF*
•  Built-in advanced analytics for data science (MADlib)
•  Supports all major HDFS file formats (e.g., Avro and Parquet)
•  Integrated with leading analytical tools out of the box
*PXF = Pivotal eXtension Framework
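Because MADlib runs inside the database, training a model is just a SQL call. A minimal Python sketch that builds such a call for MADlib's `linregr_train` function, assuming MADlib is installed in the `madlib` schema and using a hypothetical `houses` table with `price`, `tax`, `bath` and `size` columns. The resulting string could be sent through any PostgreSQL-compatible client, since HAWQ derives from Greenplum and PostgreSQL. Column names are interpolated unquoted, so they are assumed to be trusted.

```python
def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def linregr_train_sql(source_table: str, out_table: str, dependent: str,
                      independents: list[str], schema: str = "madlib") -> str:
    """Build a MADlib linear-regression training call.

    An intercept term is modelled by prepending the constant 1 to the feature array.
    """
    features = "ARRAY[" + ", ".join(["1"] + independents) + "]"
    args = ", ".join(sql_literal(a) for a in (source_table, out_table, dependent, features))
    return f"SELECT {schema}.linregr_train({args});"


# Hypothetical table and columns.
print(linregr_train_sql("houses", "houses_linregr", "price", ["tax", "bath", "size"]))
```

After the call, the fitted coefficients would be read back with an ordinary `SELECT` from the output table (here `houses_linregr`).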
Business Benefits (Feature → Benefit)
•  Rich and compliant SQL dialect → Powerful and portable SQL apps; leverage large SQL-based ecosystems
•  TPC-DS compliance → Enable a wide range of use cases; avoid surprises in production
•  Flexible/efficient joins at linear scale → Off-load EDW workloads at a much lower cost
•  Deep analytics + machine learning → Predictive/advanced learning use cases at scale
•  Data federation capabilities → Build use cases with diverse/external data assets without data movement
•  High availability and fault tolerance → Off-load business-critical workloads from the EDW
•  Native Hadoop file format support → Reduce ETL and data movement, lowering costs
Pivotal Query Optimizer (PQO)
For HAWQ and Greenplum Database
Turns a SQL query into an execution plan
•  Leading cost-based optimizer for big data
•  Applies all possible optimizations at the same time
–  Considers many more plan alternatives
–  Optimizes a wider range of queries
–  Optimizes memory usage
•  New extensible code base
–  Rapid adoption of emerging technologies
PIVOTAL VALUE-ADDED FUNCTIONALITY
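PQO's internals are far more sophisticated than anything that fits here, but the core idea of a cost-based optimizer (enumerate alternative plans, keep the cheapest under a cost model) can be shown with a toy. The sketch below is not PQO's algorithm: it is a textbook dynamic-programming join enumerator in Python, and its cost model (sum of intermediate row counts, one uniform join selectivity) and the table sizes are assumptions chosen for illustration.

```python
from itertools import combinations


def best_join_plan(cardinality: dict[str, float], selectivity: float):
    """Exhaustive dynamic-programming join enumeration over bushy plans (toy model).

    Assumed cost model: a plan costs the sum of the row counts of every join
    result it produces; base-table scans are free; every join has the same
    selectivity. Returns (cost, plan), where plan is a nested tuple of names.
    """
    tables = sorted(cardinality)

    def rows(subset: frozenset) -> float:
        product = 1.0
        for t in subset:
            product *= cardinality[t]
        return product * selectivity ** (len(subset) - 1)

    best = {frozenset([t]): (0.0, t) for t in tables}
    for size in range(2, len(tables) + 1):
        for combo in combinations(tables, size):
            subset = frozenset(combo)
            candidates = []
            # Every split into two non-empty halves is a plan alternative.
            for k in range(1, size):
                for left in combinations(combo, k):
                    left_set = frozenset(left)
                    left_cost, left_plan = best[left_set]
                    right_cost, right_plan = best[subset - left_set]
                    cost = left_cost + right_cost + rows(subset)
                    candidates.append((cost, (left_plan, right_plan)))
            best[subset] = min(candidates, key=lambda c: c[0])
    return best[frozenset(tables)]


cost, plan = best_join_plan({"A": 10, "B": 1000, "C": 100}, selectivity=0.01)
print(cost, plan)
```

With these made-up sizes, joining the two small tables A and C first keeps the intermediate result at 10 rows, so the search picks B ⋈ (A ⋈ C) (cost ≈ 110) over the naive (A ⋈ B) ⋈ C (cost 200).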
Configuring and Managing HAWQ with Ambari
•  Install the HAWQ/PXF Ambari plugin RPM
•  Restart Ambari
•  Add the HAWQ/PXF services like any other Hadoop component
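Behind the Add Service wizard, Ambari exposes a REST API. A minimal Python sketch of the first call in a scripted alternative, assuming Ambari's `/api/v1/clusters/{cluster}/services` endpoint, a hypothetical server `ambari.example.com:8080`, a hypothetical cluster `mycluster`, default admin credentials, and that the plugin registers its service under the name `HAWQ`. It only builds the request; a full scripted install would also need to add components, host components and configurations before installing and starting the service.

```python
import base64
import json
import urllib.request


def add_service_request(ambari_url: str, cluster: str, service: str,
                        user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build (but do not send) a POST that registers `service` on an Ambari cluster."""
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps({"ServiceInfo": {"service_name": service}}).encode()
    return urllib.request.Request(
        url=f"{ambari_url.rstrip('/')}/api/v1/clusters/{cluster}/services",
        data=body,
        method="POST",
        headers={
            # Ambari refuses state-changing REST calls that lack this header.
            "X-Requested-By": "ambari",
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical Ambari server, cluster name and service name.
req = add_service_request("http://ambari.example.com:8080", "mycluster", "HAWQ")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```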
Pivotal eXtension Framework (PXF)
•  Enables connectivity between HAWQ and
other services (Hive, HBase).
•  Provides an extensible framework to add
support for custom services
•  Operates as a separate service in Hadoop
Industry differentiators
•  Low latency on large data sets
•  Extensible and customizable
•  Considers cost model of federated sources
[Diagram: HAWQ reaches HDFS (Hadoop Distributed File System), Hive, HBase and other services through PXF.]
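In practice, PXF access is expressed as an external table whose LOCATION uses the `pxf://` protocol. A minimal Python sketch that generates such DDL; the host name `namenode`, the port 51200, the profile names `HdfsTextSimple` and `Hive`, and the `pxfwritable_import` formatter follow HAWQ-era PXF conventions and should be treated as assumptions to check against the documentation for your release.

```python
PXF_DEFAULT_PORT = 51200  # assumed PXF port; check your HAWQ/PXF installation


def pxf_external_table_sql(table: str, columns: dict[str, str], host: str,
                           path: str, profile: str,
                           port: int = PXF_DEFAULT_PORT) -> str:
    """Build CREATE EXTERNAL TABLE DDL that reads an external source through PXF."""
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    location = f"pxf://{host}:{port}/{path.lstrip('/')}?PROFILE={profile}"
    if profile == "HdfsTextSimple":
        # Delimited text on HDFS is parsed by HAWQ's built-in TEXT format.
        fmt = "FORMAT 'TEXT' (delimiter=E',')"
    else:
        # Hive, HBase and similar profiles hand rows to HAWQ via the PXF formatter.
        fmt = "FORMAT 'CUSTOM' (formatter='pxfwritable_import')"
    return f"CREATE EXTERNAL TABLE {table} ({column_list})\nLOCATION ('{location}')\n{fmt};"


# Hypothetical table, columns and HDFS path.
print(pxf_external_table_sql(
    "sales_ext", {"location": "text", "month": "text", "num_orders": "int"},
    host="namenode", path="/data/sales.txt", profile="HdfsTextSimple"))
```

Once created, such a table is queried with ordinary SQL, which is how PXF lets HAWQ federate Hive and HBase data without first copying it.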
Data Driven Journey with Pivotal Big Data Suite
STORE
•  Structured
•  Unstructured
•  High Volume
•  High Velocity
ANALYZE
•  Predictive Analytics
•  Machine Learning
•  Advanced Data Science
•  Real-time Analytics
DEVELOP
•  Advanced Analytic Pipelines
•  Real-time Analytical Applications
•  Global-Scale Data-Driven Applications
•  Enterprise, Consumer, IoT, and Mobile
INNOVATE
•  Agile Dev Expertise
•  DevOps
•  Hybrid Cloud
•  Continuous Delivery
•  Closed Loop Applications
AGILE DEVELOPMENT
BIG DATA
PREDICTIVE ANALYTICS
ENTERPRISE PAAS
[Products mapped onto the journey: Pivotal HD & Open Data Platform, Spark, Spring XD, Pivotal Greenplum Database, Pivotal HAWQ, Pivotal GemFire, Redis, RabbitMQ, Spring IO, Groovy, Pivotal BDS on PCF and Pivotal Cloud Foundry, supported by Pivotal Labs, Data Science and Data Engineering.]
Putting it All Together
[Diagram: data feeds, transactional apps and analytic apps connected by a data stream pipeline, real-time data, an HDFS data lake, distributed computing, advanced analytics, and expert systems & machine learning.]
Putting it All Together
[Diagram: data feeds, transactional apps and analytic apps, implemented with Spring XD (ingest, filter, enrich, sink), GemFire, HAWQ and GPDB (Greenplum Database).]
Demo: HAWQ on HDP
bit.ly/HAWQonHDPVideo
Tutorial: HAWQ on Sandbox
bit.ly/HAWQonHDPTutorial
Page 40
© 2015 Open Data Platform initiative. All rights reserved.
THE OPEN DATA PLATFORM INITIATIVE
Introducing the Open Data Platform Initiative
A shared industry effort to help promote and advance
the state of Apache Hadoop® and Big Data
technologies for the Enterprise
The Open Data Platform will accelerate the delivery of
Big Data solutions by providing a well-defined
platform called ‘The ODP Core’
The ODP Core
▪  The ODP Core is the kernel over which the industry can
build enterprise-class Apache Hadoop® solutions
–  Simplifying development of interoperable technologies
▪  Created by the ODP Developer Community
–  A team of cross industry technical experts
–  Individual, or member company developers – anyone can participate
▪  Using an open and transparent planning and release
process that follows the Apache Way
–  Interoperability within and beyond the ODP Core drives a broad set of use cases
and rapid market growth
Delivering Enterprise Requirements & Real-world Experience
ODP Member Companies
•  Diverse representation of the Big Data eco-system
–  End users, ISVs, Systems Integrators, Distribution vendors, etc.
–  Any company can join the Open Data Platform
•  A forum for the Enterprise to define its Big Data
requirements
–  Industry groups (SIGs) to align on common industry practices and
challenges
•  Direct feedback and participation in the ODP Core
–  Real world experience determining what is Enterprise grade
A Simple Beginning For The ODP Core
▪  The ODP Core is starting with a small number of projects
–  Enables a rapid start for the Initiative and an industry driven definition
▪  All members decide how the ODP Core evolves
–  All members are responsible for choosing projects to include in the ODP Core
–  Platinum, Gold and Silver member companies = One Member / One Vote
ODP Core Initial Projects: HDFS, YARN, MapReduce, Ambari
✓  Deployable Hadoop configuration
✓  Improves interoperability
✓  Gives customers more freedom
✓  Follows the Apache Way
Quickly Showing Value To The Industry
Hortonworks, IBM, Pivotal and Infosys Harmonize on Open Data Platform Vision to Accelerate Big Data Solutions
Common core: Apache Hadoop 2.6 and Apache Ambari
Distributions on the common core: HDP 2.2, Open Platform 4.0 with Apache Hadoop, Pivotal HD 3.0, IIP
Key benefits
•  Improves ecosystem interoperability
•  Unlocks customer choice
•  Eliminates wasteful guesswork
•  Respects the Apache way
How You Can Participate
•  Anybody can join the ODP; company memberships start at $1k
•  Have a direct voice in the future of big data
•  Help us define priorities to solve your challenges
•  Join your peers and accelerate industry solutions
•  Contribute people, tests, and code to accelerate execution of the vision
ODP - enabling Big Data
solutions to flourish atop a
common core platform
Questions?

 
IoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJIoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJDaniel Madrigal
 
4. Big data & analytics HP
4. Big data & analytics HP4. Big data & analytics HP
4. Big data & analytics HPMITEF México
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Hortonworks
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Hortonworks
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Hortonworks Hadoop @ Oslo Hadoop User Group
Hortonworks Hadoop @ Oslo Hadoop User GroupHortonworks Hadoop @ Oslo Hadoop User Group
Hortonworks Hadoop @ Oslo Hadoop User GroupMats Johansson
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataWANdisco Plc
 
Hortonworks & Bilot Data Driven Transformations with Hadoop
Hortonworks & Bilot Data Driven Transformations with HadoopHortonworks & Bilot Data Driven Transformations with Hadoop
Hortonworks & Bilot Data Driven Transformations with HadoopMats Johansson
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldCA Technologies
 
Hortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceHortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceThiago Santiago
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Pentaho
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Precisely
 
Hortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks
 

Ähnlich wie Webinar turbo charging_data_science_hawq_on_hdp_final (20)

Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Hortonworks and HP Vertica Webinar
Hortonworks and HP Vertica WebinarHortonworks and HP Vertica Webinar
Hortonworks and HP Vertica Webinar
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks
 
IoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJIoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJ
 
4. Big data & analytics HP
4. Big data & analytics HP4. Big data & analytics HP
4. Big data & analytics HP
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Meetup oslo hortonworks HDP
Meetup oslo hortonworks HDPMeetup oslo hortonworks HDP
Meetup oslo hortonworks HDP
 
Hortonworks Hadoop @ Oslo Hadoop User Group
Hortonworks Hadoop @ Oslo Hadoop User GroupHortonworks Hadoop @ Oslo Hadoop User Group
Hortonworks Hadoop @ Oslo Hadoop User Group
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Hortonworks & Bilot Data Driven Transformations with Hadoop
Hortonworks & Bilot Data Driven Transformations with HadoopHortonworks & Bilot Data Driven Transformations with Hadoop
Hortonworks & Bilot Data Driven Transformations with Hadoop
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven World
 
Hortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceHortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data Science
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
 
Hortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinar
 

Mehr von Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 
4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive DataHortonworks
 

Mehr von Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 
4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data
 

Kürzlich hochgeladen

YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.JasonViviers2
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptaigil2
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)Data & Analytics Magazin
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024Becky Burwell
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 

Kürzlich hochgeladen (17)

YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 

Webinar turbo charging_data_science_hawq_on_hdp_final

  • 1. Turbocharging Your Data Science with HAWQ on the Hortonworks Data Platform
      We Do Hadoop
      © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 2. Your Hosts
      Michael Cucchi (@mikecucchi)
        - Sr. Director of Outbound Product for Pivotal's Data, Mobile, and IoT solutions
        - 20 years of engineering, management, and marketing experience in the high-tech industry
      Matt Morgan (@forwardtension)
        - Vice President, Global Product Marketing
        - 20-year history as a marketing and product executive in cloud, SaaS, and big data businesses
  • 3. Hortonworks Quick Facts
      Mission: establish Hadoop as the foundational technology of the modern enterprise data architecture.
        - Year founded: 2011, when 24 engineers from the original Hadoop team at Yahoo! spun out to form Hortonworks
        - Ticker symbol: NASDAQ: HDP
        - Headquarters: Santa Clara, CA
        - Business model: open source software support subscriptions, training, and consulting services
        - Non-GAAP billings: grew from zero to over $120 million on an annualized basis in 11 quarters
        - Subscription customers: 437 in 11 quarters, with 105 added in Q1 2015 alone
        - Support: 24×7 global web and telephone support
        - Partners: 1,100 joint engineering, strategic reseller, technology, and system integrator partners
        - Employees: 650+
        - Global operations: 17 countries
        - #1 in Hadoop committers: 28 of 86 Apache Hadoop committers. Hortonworks employs the largest group of Hadoop committers under one roof, more than twice any other company.
        - #1 in HDP committer seats: 165 Apache committer seats for projects in HDP. Our committers work in 20+ projects on the data access, management, security, operations, and governance needs of the enterprise, more than twice any other company.
      The Forrester Wave™: Big Data Hadoop Solutions. We are recognized as a leader in Hadoop by Forrester Research based on the strengths of our offerings and strategy.
  • 4. Traditional Systems Under Pressure
      Challenges:
        - Constrains data to the app
        - Can't manage new data
        - Costly to scale
      Data volume: 2.8 zettabytes in 2012, projected to reach 40 zettabytes by 2020
      New data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs) alongside traditional data (ERP, CRM, SCM)
      [Chart: business value of new vs. traditional data, laggards vs. industry leaders]
  • 5. Early Hadoop: The Start of a Modern Data Architecture
      Apache Hadoop is an open source data platform for managing large volumes of high-velocity, high-variety data.
        - Built by Yahoo! to be the heartbeat of its ad and search business
        - Donated to the Apache Software Foundation in 2005, with rapid adoption by large web properties and early-adopter enterprises
        - Incredibly disruptive to current platform economics
      Traditional Hadoop advantages:
        - Manages the new data paradigm
        - Handles data at scale
        - Cost effective
        - Open source
      Traditional Hadoop had limitations:
        - Batch-only architecture with limited analytic options
        - Single-purpose clusters, specific data sets
        - Difficult to integrate with existing investments
        - Not enterprise-grade
      [Diagram: an application layer on top of HDFS (storage) and MapReduce (batch processing)]
  • 6. Today: Modern Data Architecture Unifies Data & Processing
      Modern Data Architecture:
        - Gives applications access to all your enterprise data through an efficient, centralized platform
        - Supported by a centralized approach to governance, security, and operations
        - Versatile enough to handle any application and dataset, no matter the size or type
      [Diagram: sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, and existing ERP, CRM, and SCM systems) feed HDFS (Hadoop Distributed File System). YARN, the data operating system, runs batch, interactive, real-time, and partner ISV workloads alongside an MPP EDW. These workloads serve analytics (data marts, business analytics, visualization & dashboards) and applications.]
  • 7. Partnerships Enrich the Hadoop Ecosystem
      Deep partnerships: Hortonworks maintains deep engineering relationships with data center leaders such as EMC, Microsoft, Teradata, Red Hat, HP, SAS, and SAP.
      Broad partnerships: over 1,100 partners work with us to certify their applications on Hadoop so they can extend big data to their users.
      [Diagram: the modern data architecture of slide 6 (sources, HDFS, YARN data operating system, EDW, analytics), surrounded by operational tools, dev & data tools, and infrastructure partners]
  • 8. Hadoop Adoption Follows a Predictable Journey: from cost optimization, to new analytic apps, and ultimately to a data lake.
  • 9. Hadoop Driver: Cost Optimization. Archive data off the EDW: move rarely used data to Hadoop as an active archive and store more data for longer. Offload costly ETL processes: free your EDW to perform high-value functions such as analytics and operations, not ETL. Enrich the value of your EDW: use Hadoop to refine new data sources, such as web and machine data, for new analytical context. HDP helps you reduce costs and optimize the value associated with your EDW. Diagram: Sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured; existing systems: ERP, CRM, SCM); Data systems: HDP 2.2 (ELT; cold data, deeper archive & new sources) alongside the Enterprise Data Warehouse (hot data, MPP, in-memory); Analytics (data marts, business analytics, visualization & dashboards).
  • 10. Hadoop Driver: Advanced Analytic Applications. Single View: improve acquisition & retention •  HDP enables a single view of each customer, allowing organizations to provide targeted, personalized customer experiences. •  A single view reduces attrition, improves cross-selling, and increases customer satisfaction. Predictive Analytics: identify the next best action •  HDP captures, stores, and processes large volumes of data streaming from connected devices. •  Stream processing and data science introduce new analytics for real-time and batch analysis. Data Discovery: uncover new findings •  HDP allows exploration of new data types and large data sets that were previously too big to capture, store, and process. •  Unlock insights from data such as clickstream, geolocation, sensor, server log, social, text, and video data.
  • 11. 360° Customer View Boosts Sales at Home Supply Retailer. Problem: the lack of a unified customer record across all channels clouded targeting for marketing campaigns •  No “golden record” for analytics on customer buying behavior across all channels •  Data repositories for web traffic, POS transactions, and in-home services existed in isolation from each other •  Data storage costs were increasing without a corresponding increase in value. Solution: an HDP data lake drives a golden customer record, targeted marketing, and lower data storage expenses •  The golden record enables targeted, personalized marketing with higher success rates •  Data warehouse offload saved millions of dollars in recurring expenses •  Price optimization versus competitors → several million dollars in top-line revenue growth. Industry: Retail (major home improvement retailer). Data: clickstream, unstructured, and structured. Why Hadoop? New analytic applications: Single View.
  • 12. Responsive Patient Treatment with Real-time Monitoring of Vitals. Problem: inability to store and access sufficient data for medical decision support in real time •  9 million patient records on a legacy system were neither searchable nor retrievable •  Cohort selection for research projects was slow, despite an abundance of data •  Clinicians had minimal access to historical data gathered across all patients. Solution: a unified data lake improves patient health and speeds research •  Legacy system retired immediately, saving $500K in annual recurring expenses •  Records stored with patient identification for clinical use; the same data presented anonymously to researchers for cohort selection •  Wireless patches transmit vital signs, and algorithms notify doctors of high-risk patterns •  Heart patients weigh themselves at home; algorithms notify doctors about unsafe weight changes and recommend a visit to the clinic. Industry: Healthcare (public university teaching hospital). Data & use: sensor and social data, ETL offload. Why Hadoop? New analytic applications: Predictive Analytics.
  • 13. Hadoop Driver: Enabling the Data Lake. Data lake definition: •  Centralized architecture: multiple applications on a shared data set with consistent levels of service •  Any app, any data: multiple applications accessing all data, affording new insights and opportunities •  Unlocks ‘Systems of Insight’: advanced algorithms and applications used to derive new value and optimize existing value. Drivers: 1. Cost optimization 2. Advanced analytic apps. Goal: •  Centralized architecture •  Data-driven business. Diagram: Journey to the Data Lake with Hadoop (axes: scale and scope), leading to Systems of Insight.
  • 14. Case Study: 12-Month Hadoop Evolution at TrueCar. Data platform capabilities over a 12-month execution plan: June 2013: begin Hadoop execution; July 2013: Hortonworks partnership; Aug 2013: training & development begin; Nov 2013: production cluster (60 nodes, 2 PB); Dec 2013: three production apps (3 total); Jan 2014: 40% of dev staff proficient; Feb 2014: three more production apps (6 total); May 2014: IPO. 12-month results at TrueCar: •  Six production Hadoop applications •  Sixty nodes / 2 PB of data •  Storage/compute costs reduced from $19/GB to $0.12/GB. “We addressed our data platform capabilities strategically as a precursor to IPO.”
  • 15. Hortonworks Data Platform: Hadoop for the Enterprise
  • 16. HDP Makes Hadoop Enterprise-Ready. Hortonworks Data Platform: a multi-tenant data platform built on a centralized architecture of shared enterprise services. Diagram: YARN: data operating system (resource management) with governance, security, and operations; data access (batch, interactive, real-time) for existing applications, new analytics, and partner applications; storage. Key benefits: consolidates all data sets; delivers real-time insights; integrates with the data center; scalable and affordable.
  • 17. HDP Makes Hadoop Pervasive. Any application: batch, interactive, and real-time. Any data: existing and new datasets. Anywhere: a complete range of deployment options (commodity, appliance, cloud). Diagram: YARN: data operating system; data access (batch, interactive, real-time) for existing applications, new analytics, and partner applications.
  • 18. An “Any Application” Example: Spark in HDP. Delivering a production-ready experience for Spark applications: •  Centralized resource management: integrated with YARN •  Consistent operations: provisioned and managed by Ambari •  Comprehensive security: runs within secure clusters •  Deployable anywhere: Windows or Linux, on-premises or cloud, with a consistent Cloudbreak launch experience. Diagram: YARN: data operating system (resource management) with governance, security, operations, and storage.
  • 19. An “Anywhere” Example: Cloudbreak and HDP. Cloudbreak: 1. Pick a blueprint 2. Choose a cloud 3. Launch HDP! Example Ambari blueprints: IoT Apps (Storm, HBase, Hive); BI / Analytics (Hive); Data Science (Spark); Dev / Test (all HDP services).
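To make “Pick a blueprint” concrete: an Ambari blueprint is a JSON document that names the target stack and lists host groups with the components each group runs; a separate cluster template then maps real hosts onto those groups. The Python sketch below builds both with the standard library only. The host-group names, host names, and the small HDFS/YARN/ZooKeeper component set are assumptions for illustration; blueprints like the Data Science or IoT examples above would list more services, clients, and configurations, and Ambari validates component dependencies when a blueprint is registered.

```python
import json

def make_blueprint(stack_version: str = "2.2") -> dict:
    """Build a minimal, illustrative Ambari blueprint (hypothetical layout)."""
    master = {
        "name": "master",          # hypothetical host-group name
        "cardinality": "1",
        "components": [
            {"name": "NAMENODE"},
            {"name": "SECONDARY_NAMENODE"},
            {"name": "RESOURCEMANAGER"},
            {"name": "ZOOKEEPER_SERVER"},
        ],
    }
    worker = {
        "name": "worker",          # hypothetical host-group name
        "cardinality": "1+",
        "components": [
            {"name": "DATANODE"},
            {"name": "NODEMANAGER"},
        ],
    }
    return {
        "configurations": [],      # no config overrides in this sketch
        "host_groups": [master, worker],
        "Blueprints": {"stack_name": "HDP", "stack_version": stack_version},
    }

def make_cluster_template(blueprint_name: str, master_host: str,
                          worker_hosts: list) -> dict:
    """Map concrete (hypothetical) host names onto the blueprint's host groups."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": "master", "hosts": [{"fqdn": master_host}]},
            {"name": "worker", "hosts": [{"fqdn": h} for h in worker_hosts]},
        ],
    }

if __name__ == "__main__":
    print(json.dumps(make_blueprint(), indent=2))
    print(json.dumps(make_cluster_template(
        "dev-test", "m1.example.com", ["w1.example.com", "w2.example.com"]), indent=2))
```

With Ambari's REST API, a blueprint like this is typically registered by POSTing it to `/api/v1/blueprints/<name>`, and a cluster is created by POSTing the host-mapping template to `/api/v1/clusters/<cluster-name>`; check the Blueprints documentation for the Ambari version you run before relying on the exact payload.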
  • 20. “Hortonworks loves and lives open source innovation.” World-class support and services: Hortonworks' customer support received the maximum score, significantly higher than both Cloudera and MapR. A leader in Hadoop: The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014.
  • 21. Pivotal in the Modern Data Architecture. Diagram: Sources (OLTP, ERP, and CRM systems; documents & emails; web logs and click streams; social networks; machine-generated sensor data; geolocation data); Data systems (repositories; ROOMS; EDW; MPP; RDBMS; YARN with Greenplum, GemFire, and HAWQ; on premise, cloud, or appliance) with governance & integration, security, operations, data access, and data management; Applications (statistical analysis; BI / reporting and ad hoc analysis; interactive web & mobile applications; enterprise applications); Operations tools (provision, manage & monitor); Dev & data tools (build & test); Infrastructure.
  • 22. © Copyright 2014 Pivotal. All rights reserved. Turbocharging Data Science with HAWQ
  • 23. © 2015 Pivotal Software, Inc. All rights reserved. Pivotal by the Numbers: founded April 2013 •  1,700+ employees •  Funded by EMC, VMware, and GE •  Hundreds of customers. Pivotal Data: >$100M in data software bookings in 2014. Pivotal Cloud Foundry: fastest revenue growth of an open source project in history; >$40M in its first year (2014, subscription). Big Data | Cloud Platform | Agile
  • 24. Software Is Eating the World. Data Is Fueling Software.
  • 25. The Data Divide (the Big Data chasm): 70% of data is generated by customers; 80% of data is stored; 3% is prepared for analysis; 0.5% is being analyzed; <0.5% is being operationalized.
  • 26. Pivotal Business Data Lake Architecture. Diagram: sources; ingestion tier (real-time, micro batch, and batch ingestion); processing and distillation tiers built on HDFS storage (unstructured and structured data), in-memory processing, and an MPP database, supporting real-time, micro batch, and mega batch workloads; insights tier with query interfaces (SQL, NoSQL, MapReduce) delivering real-time, interactive, and batch insights; action tier; plus system monitoring, system management, and workflow management.
  • 27. The Data-Driven Enterprise Journey. Store: •  Structured •  Unstructured •  High volume •  High velocity. Analyze: •  Predictive analytics •  Machine learning •  Advanced data science •  Real-time analytics. Develop: •  Advanced analytic pipelines •  Real-time analytical applications •  Global-scale data-driven applications •  Enterprise, consumer, IoT, and mobile. Innovate: •  Agile development expertise •  DevOps •  Hybrid cloud •  Continuous delivery •  Closed-loop applications. Themes: Agile Development, Big Data, Predictive Analytics, Enterprise PaaS.
  • 28. Technical Observations •  SQL is, and will remain, the most valuable workload on Hadoop •  While Hadoop continues to mature, focused MPP SQL will remain important •  Scale-out in-memory processing will see significant enterprise adoption and impact •  Streaming and machine learning will continue to gain value •  Open source is becoming critical to enterprise investment decisions
  • 29. © Copyright 2015 Pivotal. All rights reserved. Pivotal BDS + Hortonworks HDP = The Complete Solution: Pivotal Data Engineering •  Pivotal Labs •  Pivotal Data Science •  HDP
  • 30. SQL on Hadoop: Ecosystem Challenges vs. HAWQ Requirements. Complex joins not supported → complex joins at performance •  Limited advanced analytics support → advanced analytics at scale within SQL •  Interactive query latency issues → fast interactive queries on large data •  Ad hoc query performance issues → strong ad hoc query support in the optimizer •  SQL analytic query coverage issues → full analytic SQL compliance •  Concurrent query throughput issues → high query throughput for mixed workloads
  • 31. HAWQ: Enterprise-Class SQL on Hadoop •  Leverages market-leading Greenplum technology •  100% ANSI SQL compliant for analytic workloads •  Advanced cost-based query optimizer •  Highest-performing SQL on Hadoop •  Polymorphic storage with advanced compression •  Industry-differentiating data federation with PXF* •  Built-in advanced analytics for data science (MADlib) •  Supports all major file formats on HDFS (e.g., Avro, Parquet) •  Integrated out of the box with leading analytical tools. *PXF = Pivotal eXtension Framework
  • 32. Business Benefits (feature → benefit). Rich and compliant SQL dialect → powerful and portable SQL apps; leverage large SQL-based ecosystems •  TPC-DS compliance → enable a wide range of use cases; avoid surprises in production •  Flexible, efficient joins at linear scale → off-load EDW workloads at a much lower cost •  Deep analytics + machine learning → predictive/advanced learning use cases at scale •  Data federation capabilities → build use cases with diverse/external data assets without data movement •  High availability and fault tolerance → off-load business-critical workloads from the EDW •  Native Hadoop file format support → reduce ETL and data movement = lower costs
  • 33. Pivotal Query Optimizer (PQO) for HAWQ and Greenplum Database: turns a SQL query into an execution plan. •  Leading cost-based optimizer for big data •  Applies all possible optimizations at the same time: considers many more plan alternatives; optimizes a wider range of queries; optimizes memory usage •  New extensible code base: rapid adoption of emerging technologies. Pivotal value-added functionality.
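PQO's internals are not described in this deck; as a rough illustration of what "considers many more plan alternatives" means for any cost-based optimizer, the toy below enumerates every left-deep join order for three tables and picks the cheapest under a simple cost model (the sum of intermediate row counts). The table names, cardinalities, and selectivities are hypothetical, and real optimizers such as PQO use far richer cost models and search strategies than brute-force permutation.

```python
from itertools import permutations

# Hypothetical table sizes (rows) and join selectivities, for illustration only.
CARD = {"orders": 1_000_000, "customers": 50_000, "regions": 10}
SEL = {
    frozenset({"orders", "customers"}): 1 / 50_000,  # each order matches one customer
    frozenset({"customers", "regions"}): 1 / 10,     # each customer is in one region
    frozenset({"orders", "regions"}): 1.0,           # no predicate: cross product
}

def plan_cost(order):
    """Estimate a left-deep plan's cost as the sum of its intermediate result sizes.

    Returns (cost, estimated_final_rows)."""
    joined = [order[0]]
    rows = float(CARD[order[0]])
    cost = 0.0
    for table in order[1:]:
        # Combine selectivities of every predicate between the new table
        # and the tables already joined.
        sel = 1.0
        for prev in joined:
            sel *= SEL[frozenset({prev, table})]
        rows = rows * CARD[table] * sel
        cost += rows
        joined.append(table)
    return cost, rows

def best_plan(tables):
    """Exhaustively try every join order and keep the cheapest one."""
    return min(permutations(tables), key=lambda order: plan_cost(order)[0])

if __name__ == "__main__":
    tables = ["orders", "customers", "regions"]
    for order in permutations(tables):
        print(order, f"{plan_cost(order)[0]:,.0f}")
    print("chosen:", best_plan(tables))
```

Joining the two small tables first keeps intermediate results small, while any order that starts with `orders` × `regions` materializes a ten-million-row cross product; exhaustive enumeration grows factorially with the number of tables, which is why production optimizers prune the search space.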
  • 34. Configuring and Managing HAWQ with Ambari •  Install the HAWQ/PXF Ambari plugin RPM •  Restart Ambari •  Add the HAWQ/PXF service like any other Hadoop component
  • 35. Pivotal eXtension Framework (PXF) •  Enables connectivity between HAWQ and other services (Hive, HBase) •  Provides an extensible framework for adding support for custom services •  Operates as a separate service in Hadoop. Industry differentiators: •  Low latency on large data sets •  Extensible and customizable •  Considers the cost model of federated sources. Diagram: HAWQ connects through PXF services to HDFS (Hadoop Distributed File System), Hive, and HBase.
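The extensibility point above (one query engine, pluggable connectors for outside services) can be sketched as a small registry of connectors. Everything below is hypothetical: PXF's real plugins are Java classes with their own API, and the `Connector` interface, the `fragments`/`read` split, the profile name, and the in-memory "service" are stand-ins chosen only to show the shape of the idea. Splitting a source into fragments is what lets an MPP engine hand separate pieces to separate segments.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Row = Tuple[object, ...]

class Connector(ABC):
    """Hypothetical connector interface (not PXF's actual Java API)."""

    @abstractmethod
    def fragments(self, path: str) -> List[str]:
        """Split a source into independent units of work."""

    @abstractmethod
    def read(self, fragment: str) -> Iterator[Row]:
        """Yield the rows stored in one fragment."""

class InMemoryConnector(Connector):
    """Stand-in for an external service such as Hive or HBase."""

    def __init__(self, tables: Dict[str, List[List[Row]]]):
        self._tables = tables  # path -> list of partitions, each a list of rows

    def fragments(self, path: str) -> List[str]:
        return [f"{path}#{i}" for i in range(len(self._tables[path]))]

    def read(self, fragment: str) -> Iterator[Row]:
        path, idx = fragment.rsplit("#", 1)
        yield from self._tables[path][int(idx)]

REGISTRY: Dict[str, Connector] = {}

def register(profile: str, connector: Connector) -> None:
    """Adding a new service means registering one more connector."""
    REGISTRY[profile] = connector

def scan(profile: str, path: str) -> Iterator[Row]:
    """Read every fragment sequentially; a real MPP engine would parallelize this."""
    connector = REGISTRY[profile]
    for frag in connector.fragments(path):
        yield from connector.read(frag)
```

For example, after `register("demo", InMemoryConnector({"sales": [[(1, "a"), (2, "b")], [(3, "c")]]}))`, calling `list(scan("demo", "sales"))` returns the rows of both partitions in order; the query side never needs to know which service produced them.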
  • 36. Data-Driven Journey with Pivotal Big Data Suite. Store: •  Structured •  Unstructured •  High volume •  High velocity. Analyze: •  Predictive analytics •  Machine learning •  Advanced data science •  Real-time analytics. Develop: •  Advanced analytic pipelines •  Real-time analytical applications •  Global-scale data-driven applications •  Enterprise, consumer, IoT, and mobile. Innovate: •  Agile development expertise •  DevOps •  Hybrid cloud •  Continuous delivery •  Closed-loop applications. Themes: Agile Development, Big Data, Predictive Analytics, Enterprise PaaS. Products shown: Spring XD, Spark, Pivotal HD & Open Data Platform; Spring XD, Pivotal Greenplum Database, Pivotal HAWQ; Spring XD, Pivotal GemFire, Redis, RabbitMQ, Spring IO, Groovy; Pivotal BDS on PCF, Pivotal Cloud Foundry. Services: Pivotal Labs, Data Science, Data Engineering.
  • 37. Putting It All Together. Diagram: data feeds; data stream pipeline; real-time data; HDFS data lake; distributed computing; advanced analytics; expert systems & machine learning; transactional apps; analytic apps.
  • 38. Putting It All Together (products). Diagram: data feeds enter a Spring XD pipeline (ingest → filter → enrich → sink), which feeds GemFire, HAWQ, and GPDB, serving transactional apps and analytic apps.
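The ingest → filter → enrich → sink stages can be pictured as a chain of lazy steps, each consuming the previous one's output. Spring XD defines such streams in its own DSL (modules joined with `|`); the sketch below is not Spring XD code but a plain-Python analogy using generators, with a hypothetical "sensor_id,reading" line format, a hypothetical location lookup, and a list standing in for a store such as GemFire or HAWQ.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, object]

def ingest(lines: Iterable[str]) -> Iterator[Event]:
    """Parse raw 'sensor_id,reading' lines (hypothetical format) into events."""
    for line in lines:
        sensor_id, reading = line.strip().split(",")
        yield {"sensor_id": sensor_id, "reading": float(reading)}

def filter_valid(events: Iterable[Event], min_reading: float = 0.0) -> Iterator[Event]:
    """Drop events whose reading falls below the threshold."""
    for e in events:
        if e["reading"] >= min_reading:
            yield e

def enrich(events: Iterable[Event], locations: Dict[str, str]) -> Iterator[Event]:
    """Attach reference data (a location) to each event."""
    for e in events:
        yield {**e, "location": locations.get(e["sensor_id"], "unknown")}

def sink(events: Iterable[Event], store: List[Event]) -> int:
    """Write events to the target store; returns how many were written."""
    count = 0
    for e in events:
        store.append(e)
        count += 1
    return count
```

Running `sink(enrich(filter_valid(ingest(["s1,21.5", "s2,-1", "s3,19.0"])), {"s1": "dock", "s3": "yard"}), store)` writes two events: the negative reading from `s2` is filtered out before enrichment. Because each stage is a generator, nothing is processed until the sink pulls events through, which loosely mirrors how a stream only flows once it has a consumer.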
  • 39. Demo: HAWQ on HDP (bit.ly/HAWQonHDPVideo). Tutorial: HAWQ on Sandbox (bit.ly/HAWQonHDPTutorial).
  • 40. © 2015 Open Data Platform initiative. All rights reserved. The Open Data Platform Initiative
  • 41. Introducing the Open Data Platform Initiative
  • 42. A shared industry effort to help promote and advance the state of Apache Hadoop® and Big Data technologies for the enterprise
  • 43. The Open Data Platform will accelerate the delivery of Big Data solutions by providing a well-defined platform called ‘the ODP Core’.
  • 44. The ODP Core •  The ODP Core is the kernel on which the industry can build enterprise-class Apache Hadoop® solutions, simplifying development of interoperable technologies •  Created by the ODP developer community: a team of cross-industry technical experts, whether individual or member-company developers; anyone can participate •  Uses an open and transparent planning and release process that follows the Apache Way •  Interoperability within and beyond the ODP Core drives a broad set of use cases and rapid market growth
  • 45. Delivering Enterprise Requirements & Real-world Experience. ODP member companies: •  Diverse representation of the Big Data ecosystem: end users, ISVs, systems integrators, distribution vendors, etc.; any company can join the Open Data Platform •  A forum for the enterprise to define its Big Data requirements: industry groups (SIGs) align on common industry practices and challenges •  Direct feedback and participation in the ODP Core: real-world experience determines what is enterprise-grade
  • 46. A Simple Beginning for the ODP Core •  The ODP Core is starting with a small number of projects, enabling a rapid start for the initiative and an industry-driven definition •  All members decide how the ODP Core evolves: all members are responsible for choosing the projects to include in the ODP Core; Platinum, Gold, and Silver member companies get one vote per member. ODP Core initial projects: HDFS, YARN, MapReduce, Ambari. ✓ Deployable Hadoop configuration ✓ Improves interoperability ✓ Gives customers more freedom ✓ Follows the Apache Way
  • 47. Quickly Showing Value to the Industry: Hortonworks, IBM, Pivotal, and Infosys Harmonize on Open Data Platform Vision to Accelerate Big Data Solutions. Common core: Apache Hadoop 2.6 and Apache Ambari, shared by HDP 2.2, Open Platform 4.0 with Apache Hadoop, IIP, and Pivotal HD 3.0. Key benefits: improves ecosystem interoperability; unlocks customer choice; eliminates wasteful guesswork; respects the Apache Way.
  • 48. How You Can Participate •  Anybody can join the ODP; company memberships start at $1K •  Have a direct voice in the future of big data •  Help us define priorities to solve your challenges •  Join your peers and accelerate industry solutions •  Contribute people, tests, and code to accelerate execution of the vision. ODP: enabling Big Data solutions to flourish atop a common core platform.
  • 49. Questions?