SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Hadoop and Spark – Perfect Together
Arun C. Murthy (@acmurthy)
Co-Founder, Hortonworks
® ®
© Hortonworks Inc. 2015. All Rights Reserved
Data Operating System
Enable all data and applications
TO BE
accessible and shared
BY
any end-users
© Hortonworks Inc. 2015. All Rights Reserved
Data Operating System: Open Enterprise Hadoop
YARN: data operating system
Governance Security
Operations
Resource management
Data access: batch, interactive, real-time
Storage
Commodity Appliance Cloud
Built on a centralized architecture of
shared enterprise services:
Scalable tiered storage
Resource and workload management
Trusted data governance & metadata management
Consistent operations
Comprehensive security
Developer APIs and tools
Hadoop/YARN-powered data operating system
100% open source, multi-tenant data platform for
any application, any data set, anywhere.
© Hortonworks Inc. 2015. All Rights Reserved
Elegant Developer APIs
DataFrames, Machine Learning, and SQL
Made for Data Science
All apps need to get predictive at scale and fine granularity
Democratize Machine Learning
Spark is doing to ML on Hadoop what Hive did for SQL on
Hadoop
Community
Broad developer, customer and partner interest
Realize Value of Data Operating System
A key tool in the Hadoop toolbox
Why We Love Spark at Hortonworks
Storage
YARN: Data Operating System
Governance Security
Operations
Resource Management
© Hortonworks Inc. 2015. All Rights Reserved
Resource Management
YARN for multi-tenant, diverse workloads with predictable SLAs
Tiered Memory Storage
HDFS in-memory tier – External BlockStore for RDD Cache
SparkSQL & Hive for SQL
Interop with modern Metastore/HS2, optimized ORC support,
advanced analytics e.g. Geospatial
Spark & NoSQL
Deep integration with HBase via DataSources/Catalyst for
Predicate/Aggregate Pushdown
Connect The Dots – Algorithms to Use-Cases
Higher-level ML Abstractions – E.g. OneVsRest
Validation, tuning, pipeline assembly... e.g. GeoSpatial
Spark and Hadoop – How Can We Do Better?
Storage
YARN: Data Operating System
Governance Security
Operations
Resource Management
© Hortonworks Inc. 2015. All Rights Reserved
Ease of Use
Apache Zeppelin for interactive notebooks
Metadata & Governance
Apache Atlas for metadata & Apache Falcon support for
Spark pipelines
Security & Operations
Apache Ranger managed authorization and
deployment/management via Apache Ambari
Deployable Anywhere
Linux, Windows, on-premises or cloud
Self-Service Spark in the Cloud
Easy launch of Data Science clusters via Cloudbreak and
Ambari – for Azure, AWS, GCP, OpenStack, Docker
Spark and Hadoop – How Can We Do Better?
Storage
YARN: Data Operating System
Governance Security
Operations
Resource Management
© Hortonworks Inc. 2015. All Rights Reserved
Let’s Talk Real-World Use-Cases
© Hortonworks Inc. 2015. All Rights Reserved
Real-time events and alerts application
Last Year: Real-time and Interactive Application
Microsoft
Excel
Truck sensors
Real-time and interactive
data processing
running SIMULTANEOUSLY on YARN
Trucking company datasets stored in HDFS
© Hortonworks Inc. 2015. All Rights Reserved
Chief Data Officer’s Needs
Increase safety and reduce liabilities
Anticipate driver violations BEFORE they
happen and take precautionary actions
Application Team’s Response
Enrich app with weather data and driver profiles
Explore data for predictive model features
Train and generate predictive model
Plug in model to original application so it can predict
driver violations in real-time
Truck Fleet Use Case: Real-time, Predictive App
© Hortonworks Inc. 2015. All Rights Reserved
Data Scientist: Explore Data & Build Model in Cloud
Click-thru Demo
Provision data science
environment in the cloud
Use data science
notebook to explore data
Run algorithms to create
predictive model
Cloudbreak
1. Choose a cloud
2. Pick the Spark
blueprint
3. Launch HDP
Microsoft Azure
© Hortonworks Inc. 2015. All Rights Reserved
Login to launch.hortonworks.com
which is a self-service portal for
launching HDP clusters to the cloud
© Hortonworks Inc. 2015. All Rights Reserved
Select a cloud provider, then start the
process of creating your cluster
© Hortonworks Inc. 2015. All Rights Reserved
Name the cluster, choose your region,
and pick your blueprint…in this case,
we want “hdp-spark-cluster” for our
data science work
© Hortonworks Inc. 2015. All Rights Reserved
We clicked “create cluster” and
Cloudbreak is now provisioning our
Spark environment on Azure
© Hortonworks Inc. 2015. All Rights Reserved
We can now access Zeppelin which is
a data science notebook for Spark
that’s similar to iPython notebook
© Hortonworks Inc. 2015. All Rights Reserved
Let’s look at our data. We can see
eventType, if the driver’s certified, how
many hours driven, as well as weather
data such as foggy, rainy, etc.
© Hortonworks Inc. 2015. All Rights Reserved
Let’s start asking questions of our
data; such as, does fatigue cause
violations?
© Hortonworks Inc. 2015. All Rights Reserved
Let’s view the data in a pie chart
graphic to see how violations look by
hours driven.
© Hortonworks Inc. 2015. All Rights Reserved
How are violations impacted by fog?
© Hortonworks Inc. 2015. All Rights Reserved
Does location have an impact on
incidents?
© Hortonworks Inc. 2015. All Rights Reserved
OK, we’ve learned enough about the
data and what features we want to
include in our model. So we’ll run a
logistic regression on training data.
© Hortonworks Inc. 2015. All Rights Reserved
Let’s run our code
© Hortonworks Inc. 2015. All Rights Reserved
Let’s look at our model.
Next step is to hand the model off to
the Enterprise Architect to integrate
into our real-time application.
© Hortonworks Inc. 2015. All Rights Reserved
Apps on YARN
Trucking company datasets stored in HDFS
Real-time and Predictive Application Architecture
Your BI Tool
Predictive application
Truck sensors
App alerts
(ActiveMQ)
Messages
SQL Stream NoSQLML
Use
Model
© Hortonworks Inc. 2015. All Rights Reserved
HDP and Spark…
Perfect Together

Weitere ähnliche Inhalte

Was ist angesagt?

Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARN
Hortonworks
 

Was ist angesagt? (20)

Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
 
Falcon Meetup
Falcon Meetup Falcon Meetup
Falcon Meetup
 
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2
 
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshopDeep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Authoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using SliderAuthoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using Slider
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDB
 
Protecting enterprise Data in Hadoop
Protecting enterprise Data in HadoopProtecting enterprise Data in Hadoop
Protecting enterprise Data in Hadoop
 
#HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course #HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARN
 

Ähnlich wie Hadoop and Spark – Perfect Together

Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
skumpf
 
Cw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderaCw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-cloudera
inevitablecloud
 

Ähnlich wie Hadoop and Spark – Perfect Together (20)

Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
 
Spark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun MurthySpark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun Murthy
 
Spark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's KeynoteSpark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's Keynote
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
 
Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...
Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...
Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...
 
Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...
Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...
Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
OOP 2014
OOP 2014OOP 2014
OOP 2014
 
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
 
Cw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderaCw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-cloudera
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsPredicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
 
Spark + Hadoop Perfect together
Spark + Hadoop Perfect togetherSpark + Hadoop Perfect together
Spark + Hadoop Perfect together
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
 

Mehr von Hortonworks

Mehr von Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Kürzlich hochgeladen

➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 

Kürzlich hochgeladen (20)

Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 

Hadoop and Spark – Perfect Together

  • 1. Hadoop and Spark – Perfect Together Arun C. Murthy (@acmurthy) Co-Founder, Hortonworks ® ®
  • 2. © Hortonworks Inc. 2015. All Rights Reserved Data Operating System Enable all data and applications TO BE accessible and shared BY any end-users
  • 3. © Hortonworks Inc. 2015. All Rights Reserved Data Operating System: Open Enterprise Hadoop YARN: data operating system Governance Security Operations Resource management Data access: batch, interactive, real-time Storage Commodity Appliance Cloud Built on a centralized architecture of shared enterprise services: Scalable tiered storage Resource and workload management Trusted data governance & metadata management Consistent operations Comprehensive security Developer APIs and tools Hadoop/YARN-powered data operating system 100% open source, multi-tenant data platform for any application, any data set, anywhere.
  • 4. © Hortonworks Inc. 2015. All Rights Reserved Elegant Developer APIs DataFrames, Machine Learning, and SQL Made for Data Science All apps need to get predictive at scale and fine granularity Democratize Machine Learning Spark is doing to ML on Hadoop what Hive did for SQL on Hadoop Community Broad developer, customer and partner interest Realize Value of Data Operating System A key tool in the Hadoop toolbox Why We Love Spark at Hortonworks Storage YARN: Data Operating System Governance Security Operations Resource Management
  • 5. © Hortonworks Inc. 2015. All Rights Reserved Resource Management YARN for multi-tenant, diverse workloads with predictable SLAs Tiered Memory Storage HDFS in-memory tier – External BlockStore for RDD Cache SparkSQL & Hive for SQL Interop with modern Metastore/HS2, optimized ORC support, advanced analytics e.g. Geospatial Spark & NoSQL Deep integration with HBase via DataSources/Catalyst for Predicate/Aggregate Pushdown Connect The Dots – Algorithms to Use-Cases Higher-level ML Abstractions – E.g. OneVsRest Validation, tuning, pipeline assembly... e.g. GeoSpatial Spark and Hadoop – How Can We Do Better? Storage YARN: Data Operating System Governance Security Operations Resource Management
  • 6. © Hortonworks Inc. 2015. All Rights Reserved Ease of Use Apache Zeppelin for interactive notebooks Metadata & Governance Apache Atlas for metadata & Apache Falcon support for Spark pipelines Security & Operations Apache Ranger managed authorization and deployment/management via Apache Ambari Deployable Anywhere Linux, Windows, on-premises or cloud Self-Service Spark in the Cloud Easy launch of Data Science clusters via Cloudbreak and Ambari – for Azure, AWS, GCP, OpenStack, Docker Spark and Hadoop – How Can We Do Better? Storage YARN: Data Operating System Governance Security Operations Resource Management
  • 7. © Hortonworks Inc. 2015. All Rights Reserved Let’s Talk Real-World Use-Cases
  • 8. © Hortonworks Inc. 2015. All Rights Reserved Real-time events and alerts application Last Year: Real-time and Interactive Application Microsoft Excel Truck sensors Real-time and interactive data processing running SIMULTANEOUSLY on YARN Trucking company datasets stored in HDFS
  • 9. © Hortonworks Inc. 2015. All Rights Reserved Chief Data Officer’s Needs Increase safety and reduce liabilities Anticipate driver violations BEFORE they happen and take precautionary actions Application Team’s Response Enrich app with weather data and driver profiles Explore data for predictive model features Train and generate predictive model Plug in model to original application so it can predict driver violations in real-time Truck Fleet Use Case: Real-time, Predictive App
  • 10. © Hortonworks Inc. 2015. All Rights Reserved Data Scientist: Explore Data & Build Model in Cloud Click-thru Demo Provision data science environment in the cloud Use data science notebook to explore data Run algorithms to create predictive model Cloudbreak 1. Choose a cloud 2. Pick the Spark blueprint 3. Launch HDP Microsoft Azure
  • 11. © Hortonworks Inc. 2015. All Rights Reserved Login to launch.hortonworks.com which is a self-service portal for launching HDP clusters to the cloud
  • 12. © Hortonworks Inc. 2015. All Rights Reserved Select a cloud provider, then start the process of creating your cluster
  • 13. © Hortonworks Inc. 2015. All Rights Reserved Name the cluster, choose your region, and pick your blueprint…in this case, we want “hdp-spark-cluster” for our data science work
  • 14. © Hortonworks Inc. 2015. All Rights Reserved We clicked “create cluster” and Cloudbreak is now provisioning our Spark environment on Azure
  • 15. © Hortonworks Inc. 2015. All Rights Reserved We can now access Zeppelin which is a data science notebook for Spark that’s similar to iPython notebook
  • 16. © Hortonworks Inc. 2015. All Rights Reserved Let’s look at our data. We can see eventType, if the driver’s certified, how many hours driven, as well as weather data such as foggy, rainy, etc.
  • 17. © Hortonworks Inc. 2015. All Rights Reserved Let’s start asking questions of our data; such as, does fatigue cause violations?
  • 18. © Hortonworks Inc. 2015. All Rights Reserved Let’s view the data in a pie chart graphic to see how violations look by hours driven.
  • 19. © Hortonworks Inc. 2015. All Rights Reserved How are violations impacted by fog?
  • 20. © Hortonworks Inc. 2015. All Rights Reserved Does location have an impact on incidents?
  • 21. © Hortonworks Inc. 2015. All Rights Reserved OK, we’ve learned enough about the data and what features we want to include in our model. So we’ll run a logistic regression on training data.
  • 22. © Hortonworks Inc. 2015. All Rights Reserved Let’s run our code
  • 23. © Hortonworks Inc. 2015. All Rights Reserved Let’s look at our model. Next step is to hand the model off to the Enterprise Architect to integrate into our real-time application.
  • 24. © Hortonworks Inc. 2015. All Rights Reserved Apps on YARN Trucking company datasets stored in HDFS Real-time and Predictive Application Architecture Your BI Tool Predictive application Truck sensors App alerts (ActiveMQ) Messages SQL Stream NoSQLML Use Model
  • 25. © Hortonworks Inc. 2015. All Rights Reserved HDP and Spark… Perfect Together

Hinweis der Redaktion

  1. Title: Hadoop and Spark – perfect together Abstract: Unlocking a fully integrated Spark  experience within your enterprise Hadoop environment that is manageable, secure and deployable anywhere.
  2. Ensuring Spark is well integrated with YARN, Ambari, and Ranger enables enterprise to deploy Spark apps with confidence, and since HDP is available across Windows, Linux, on-premises and cloud deployment environments, it just makes it that much easier for enterprises to adopt it.
  3. Ensuring Spark is well integrated with YARN, Ambari, and Ranger enables enterprise to deploy Spark apps with confidence, and since HDP is available across Windows, Linux, on-premises and cloud deployment environments, it just makes it that much easier for enterprises to adopt it.
  4. Ensuring Spark is well integrated with YARN, Ambari, and Ranger enables enterprise to deploy Spark apps with confidence, and since HDP is available across Windows, Linux, on-premises and cloud deployment environments, it just makes it that much easier for enterprises to adopt it.
  5. The basic architecture of the demo looked something like this. We had events from the truck sensors streaming into HDP in real-time where that data was simultaneously being processed and served up in real-time as well as being made accessible for historical interactive queries. We showed a real-time map that displayed the trucks on their routes along with details about any violations as they were occurring. We also demonstrated interactive querying and analysis of the historical data using Microsoft Excel. [NEXT SLIDE]
  6. 1
  7. 3
  8. 6
  9. 9
  10. 11
  11. 12
  12. 13
  13. 14
  14. 15
  15. 16
  16. 17
  17. 18
  18. 20