Spark Webinar 
October 2nd, 2014 
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 
Vinay Shukla & Ram Venkatesh
Agenda 
• What is Spark? 
• What have we done with Spark so far 
• Tech Previews 
• Brief on Spark 1.1.0 Tech Preview 
• Multi-tenant and multi-workload with YARN 
• Introducing SPARK-3561 
• Get Involved 
Let’s Talk About Apache Spark 
What is Spark? 
• Spark is a general-purpose big data engine that provides simple APIs for data scientists and 
engineers familiar with Scala, Python and Java to build ad-hoc interactive analytics, iterative 
machine-learning, and other use cases well-suited to interactive, in-memory data processing of GB to 
TB sized datasets. 
What is Spark? 
[Figure: Spark architecture and job execution. Layers: Spark API; Spark compiler/optimizer; DAG runtime; execution engine; cluster manager (Spark Cluster, YARN, or Mesos). Client side: DAGScheduler, ActiveJob. Cluster side: SparkAM, Tasks. Example lineage: (Hadoop|FlatMapped|Filter|MapPartitions|Shuffled)RDD, split into stage0: (Hadoop|FlatMapped|Filter|MapPartitions)RDD, run as ShuffleMapTasks (flatMap | map), and stage1: ShuffledRDD, run as a ResultTask (reduceByKey).]
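The stage0/stage1 split named in the figure labels can be illustrated with a toy model. The sketch below is a simplification under stated assumptions, not Spark's DAGScheduler: the names StageSplitSketch, Step, and splitIntoStages are hypothetical, and a lineage is modeled as a flat list in which any step flagged as wide (a shuffle dependency) starts a new stage.

```scala
object StageSplitSketch {
  // One step in an RDD lineage; `wide` marks a shuffle dependency (hypothetical model).
  final case class Step(rdd: String, wide: Boolean)

  // Cut the lineage into stages: a wide step opens a new stage,
  // a narrow step is pipelined into the current stage.
  def splitIntoStages(lineage: List[Step]): List[List[String]] =
    lineage.foldLeft(List.empty[List[String]]) {
      case (Nil, step)                 => List(List(step.rdd))
      case (stages, step) if step.wide => stages :+ List(step.rdd)
      case (stages, step)              => stages.init :+ (stages.last :+ step.rdd)
    }

  def main(args: Array[String]): Unit = {
    // The lineage from the figure: four narrow RDDs followed by a shuffle.
    val lineage = List(
      Step("HadoopRDD", wide = false),
      Step("FlatMappedRDD", wide = false),
      Step("FilterRDD", wide = false),
      Step("MapPartitionsRDD", wide = false),
      Step("ShuffledRDD", wide = true))
    splitIntoStages(lineage).zipWithIndex.foreach { case (rdds, i) =>
      val chain = rdds.mkString(" -> ")
      println(s"stage$i: $chain")
    }
  }
}
```

With the lineage from the figure, the four narrow RDDs fall into stage0 and ShuffledRDD forms stage1 on its own.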
Let’s Talk About Apache Spark (cont’d) 
What’s Our Spark Strategy? 
• Hortonworks is focused on enabling Spark for Enterprise Hadoop so users can deploy Spark-based 
applications along with their other Hadoop workloads in a consistent, predictable, and robust way. 
– Leverage the scale and multi-tenancy provided by YARN so that Spark's memory- and CPU-intensive apps run with 
predictable performance 
– Integrate Spark with HDP's operations, security, governance, scalability, availability, and multi-tenancy capabilities 
Do We Have a Plan to Support Spark? Yes. 
• Spark is available now as a Technology Preview. 
• We are following our standard process from Tech Preview to GA, as we did for Storm, Falcon, etc. 
• Spark will be added to our HDP Enterprise Plus subscription when it is GA ready. 
Spark Timeline 
Break-down 
Spark Roadmap 
• July 2014: 1.0.1 TP Refresh 
• Sept 2014: 1.1.0 TP Refresh 
• Dec 2014: 1.2.0 GA 
Roadmap items: 
– Hive 0.13 support 
– Limited ORC support 
– Spark on YARN: Deployment Best Practices 
– Ambari Support for Spark Install/Stop/Config 
– Spark on Kerberized Cluster 
– Authentication against LDAP in Spark UI
What’s in Spark 1.1.0 Tech Preview 
• Upgrades Spark to Hive 0.13 
• Provides Hive 0.13 features (new Hive UDFs) in Spark 
• Limited ORC support 
• Ability to manipulate ORC as HadoopRDD 
….. 
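// Read the ORC-backed Hive table as a HadoopRDD of (NullWritable, OrcStruct) pairs.
val inputRead = sc.hadoopFile(
  "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/orc_table",
  classOf[org.apache.hadoop.hive.ql.io.orc.OrcInputFormat],
  classOf[org.apache.hadoop.io.NullWritable],
  classOf[org.apache.hadoop.hive.ql.io.orc.OrcStruct])
// Keep only the value of each pair, rendered as a string.
val k = inputRead.map(pair => pair._2.toString)
// Collect the rows on the driver; suitable for small tables only.
val c = k.collect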
….. 
Spark Enterprise Readiness 
Enhancements 
Spark Investment Phases 
• Phase 1 
– Hive 0.13 support 
– Limited ORC support 
– Security: Spark certification on Kerberized Cluster 
– Security: Authentication in Spark UI against LDAP/AD 
– Operations: Ambari Stack Definition: Install/Start/Stop/Config/Quick links to Spark UI 
• Phase 2 
– Improve reliability & Scale of Spark-on-YARN 
– Enhance ORC support 
– Improve Debug Capabilities 
– Security: Wire Encryption and Authorization with XA/Argus 
– Operations: Spark logs published to YARN Application Timeline Service (ATS) 
– Operations: Enhanced workload management 
Spark on Hadoop 
October 2nd, 2014 
Ram Venkatesh
Spark-on-Hadoop – End User Benefits 
• Developer Productivity 
– Simple, easy to use APIs 
– Direct and elegant representation of the data processing flow 
– Focus on application business logic rather than Hadoop internals 
– Integrated develop-deploy-debug experience through the IDE 
• Multi-tenancy 
– Shared infrastructure across workloads – interactive queries by day, batch ETL at night 
– Better utilization of compute capacity 
• Move the execution to the data tier instead of the other way around 
– Reduced load on distributed filesystem (HDFS) 
– Reduce unnecessary replicated reads and writes 
– Reduced network usage 
– Eliminates the need for data transfer in and out of the cluster 
Spark-on-Hadoop – Design considerations 
• Don’t solve problems that have already been solved. 
– Leverage the discrete task-based compute model for elasticity, scalability and fault tolerance 
– Leverage several man-years of work in Hadoop MapReduce data shuffling operations 
– Leverage the proven resource sharing and multi-tenancy model for Hadoop and YARN 
– Leverage built-in security mechanisms in Hadoop for privacy and isolation 
• Don’t create new problems 
– Preserve the simple developer experience 
– No changes to Spark programs; all programs run unmodified 
– Propose a simple, mainstream, in-the-community extension to the Apache Spark project 
Look to the Future with an eye on the Past
Spark on Hadoop – From service model to app model 
Spark jobs compile down to a Directed Acyclic Graph (DAG). 
• Vertices in the graph represent user logic 
• Edges represent data movement from producers to consumers 
• The Spark DAG is executed using Apache Tez at runtime 
[Figure: Distributed Sort as a DAG. Stages: Preprocessor Stage (Sampler), Partition Stage, Aggregate Stage, each running Task-1 and Task-2. Data labels: Samples, Ranges.]
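The distributed sort in the figure can be sketched on plain Scala collections to show what each stage contributes. This is an illustrative model only, not Tez or Spark code: the object RangeSortSketch, its function names, and the every-n-th-record sampling rule are assumptions made for the example. The preprocessor samples the input, split points (ranges) are derived from the sample, the partition stage routes records by range, and the aggregate stage sorts each partition.

```scala
object RangeSortSketch {
  // Preprocessor stage: take every `step`-th record as a sample (assumed rule; step must be > 0).
  def sample(data: Seq[Int], step: Int): Seq[Int] = {
    require(step > 0, "step must be positive")
    data.zipWithIndex.collect { case (v, i) if i % step == 0 => v }
  }

  // Derive up to (partitions - 1) ascending split points from the sorted sample.
  def ranges(samples: Seq[Int], partitions: Int): Seq[Int] = {
    val sorted = samples.sorted
    if (sorted.isEmpty) Seq.empty
    else (1 until partitions).map(i => sorted(i * sorted.size / partitions)).distinct
  }

  // Index of the first range whose split point is >= v; the last bucket takes the rest.
  private def bucketOf(v: Int, splits: Seq[Int]): Int = {
    val i = splits.indexWhere(s => v <= s)
    if (i == -1) splits.size else i
  }

  // Partition stage: route every record to its range bucket.
  def partition(data: Seq[Int], splits: Seq[Int]): Seq[Seq[Int]] = {
    val buckets = data.groupBy(v => bucketOf(v, splits))
    (0 to splits.size).map(i => buckets.getOrElse(i, Seq.empty))
  }

  // Aggregate stage: sort each bucket independently; their concatenation is globally sorted.
  def distributedSort(data: Seq[Int], partitions: Int, step: Int): Seq[Int] = {
    val splits = ranges(sample(data, step), partitions)
    partition(data, splits).flatMap(_.sorted)
  }
}
```

Because the buckets cover disjoint, ordered value ranges, each one can be sorted by a separate task with no further exchange of data between tasks.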
Spark-on-Hadoop – Simplifying Operations 
• No deployments to do. No side effects. Easy and safe to try it out! 
• Completely client side application. 
• Simply upload to any accessible FileSystem and point to the cluster through configuration files. 
• Enables running different versions concurrently. Easy to test new functionality while keeping stable 
versions for production. 
• Leverages YARN local resources. 
[Figure: Two client machines, each running a Spark Client, submit to a cluster whose Node Managers run TezTasks; Spark-v1 and Spark-v2 are both stored in HDFS.]
Benefits of native Hadoop execution of Spark DAGs 
• Elastic resource management - dynamic acquisition and release of containers 
• Works with YARN pre-emption, reservation and headroom calculations 
• Auto-parallelism based on sampling – you no longer need to guess the number of reducers 
• Efficient data movement between stages using the Hadoop shuffle 
• Integrates with resource isolation and governance mechanisms in Hadoop 
• Classpath and jarfile management through local resources 
• Detailed job-level metrics through integration with the YARN ATS 
Enables large-scale, multi-tenant batch ETL Spark programs 
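The auto-parallelism idea can be made concrete with a small arithmetic sketch. Everything here is an assumption for illustration: the object ReducerEstimateSketch, the function estimateReducers, and its parameters are hypothetical, and this is not the formula Tez or Spark uses. It extrapolates the total data size from a sample and divides by a target amount of data per reducer.

```scala
object ReducerEstimateSketch {
  // Estimate a reducer count from a sample (hypothetical rule, for illustration only).
  //   sampledBytes          bytes observed in the sample
  //   sampleFraction        fraction of the input that was sampled, in (0, 1]
  //   targetBytesPerReducer amount of data each reducer should handle
  //   maxReducers           upper bound, e.g. imposed by cluster capacity
  def estimateReducers(
      sampledBytes: Long,
      sampleFraction: Double,
      targetBytesPerReducer: Long,
      maxReducers: Int): Int = {
    require(sampleFraction > 0.0 && sampleFraction <= 1.0, "sampleFraction must be in (0, 1]")
    require(targetBytesPerReducer > 0 && maxReducers >= 1, "invalid limits")
    // Extrapolate the full input size from the sample.
    val estimatedTotal = sampledBytes / sampleFraction
    // Round up so that no reducer exceeds the target, then clamp to [1, maxReducers].
    val wanted = math.ceil(estimatedTotal / targetBytesPerReducer).toInt
    math.min(maxReducers, math.max(1, wanted))
  }
}
```

For example, a 25% sample of 1 GiB extrapolates to 4 GiB of input, which at 256 MiB per reducer gives 16 reducers.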
Introducing SPARK-3561 
DEMO: SPARK-3561 in action 
SPARK-3561 under the hood 
Example program: 
SPARK-3561 Demo – contd. 
Execute program using spark-submit 
spark-submit --class dev.demo.WordCount \
  --master execution-context:org.apache.spark.tez.TezJobExecutionContext \
  spark-on-hadoop-1.0.jar 1 test.txt 
Execute interactive Spark commands through spark-shell 
spark-shell --master execution-context:org.apache.spark.tez.TezJobExecutionContext 
INFO main spark.SparkContext:59 - Will use custom job execution context org.apache.spark.tez.TezJobExecutionContext 
INFO main adapter.SparkToTezAdapter:59 - Adapting PairRDDFunctions.saveAsNewAPIHadoopDataset for Tez 
INFO main repl.SparkILoop:59 - Created spark context.. 
Spark context available as sc. 
scala> 
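The spark-submit command runs a class named dev.demo.WordCount, whose source is not part of this transcript. As a rough idea of the logic such a program usually contains, here is a minimal sketch on plain Scala collections so that it runs without a cluster. The object WordCountSketch and the function countWords are hypothetical names, not the demo's code. On an RDD of lines (for example from sc.textFile), the final groupBy-and-sum step would typically be written as reduceByKey(_ + _).

```scala
object WordCountSketch {
  // Same shape as the classic RDD word count: flatMap -> map -> sum per key.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))     // split each line into words
      .filter(_.nonEmpty)           // drop empty tokens from leading whitespace or blank lines
      .map(word => (word, 1))       // pair each word with a count of one
      .groupBy(_._1)                // stand-in for the shuffle by key
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }

  def main(args: Array[String]): Unit =
    countWords(Seq("to be or", "not to be")).foreach { case (word, n) =>
      println(s"$word\t$n")
    }
}
```

The flatMap and map steps need no data exchange, while the per-key sum is the step that requires a shuffle.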
SPARK-3561 – feedback requested 
Provide feedback on your ETL/batch scenarios 
Participate in the discussion on the JIRA 
Try it out when it becomes available 
Looking for early adopters to run and validate at scale 
Resources 
• Spark Labs Page: http://hortonworks.com/hadoop/spark/ 
• Spark Roadmap Blog: http://hortonworks.com/blog/extending-spark-yarn-enterprise-hadoop/ 
• Spark 1.1.0 Tech Preview: http://hortonworks.com/kb/spark-1-1-0-technical-preview-hdp-2-1-5/ 
• Public Spark Forums: http://hortonworks.com/community/forums/forum/spark/ 
• SPARK-3561: https://issues.apache.org/jira/browse/SPARK-3561 
Q&A… 
Discussion 
Page 23 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Más contenido relacionado

Was ist angesagt?

Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureDataWorks Summit
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionDataWorks Summit/Hadoop Summit
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?DataWorks Summit
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingDataWorks Summit
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with sparkHortonworks
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNDataWorks Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinAlex Zeltov
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseDataWorks Summit/Hadoop Summit
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHortonworks
 

Was ist angesagt? (20)

Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Evolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage SubsystemEvolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage Subsystem
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Row/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache SparkRow/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache Spark
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Apache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real TimeApache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real Time
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral Processing
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with spark
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeN
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouse
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
 

Andere mochten auch

Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNHortonworks
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks
 
Hadoop crashcourse v3
Hadoop crashcourse v3Hadoop crashcourse v3
Hadoop crashcourse v3Hortonworks
 
Hortonworks technical workshop operations with ambari
Hortonworks technical workshop   operations with ambariHortonworks technical workshop   operations with ambari
Hortonworks technical workshop operations with ambariHortonworks
 
Apache Hive 0.13 Performance Benchmarks
Apache Hive 0.13 Performance BenchmarksApache Hive 0.13 Performance Benchmarks
Apache Hive 0.13 Performance BenchmarksHortonworks
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Hortonworks
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalHortonworks
 
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNYARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNHortonworks
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks
 
Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1Hortonworks
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...Hortonworks
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaDatabricks
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesDatabricks
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesDatabricks
 
Reactive Programming Meetup - NodeJs on K8s
Reactive Programming Meetup - NodeJs on K8sReactive Programming Meetup - NodeJs on K8s
Reactive Programming Meetup - NodeJs on K8sRoland Tritsch
 
Scala training workshop 02
Scala training workshop 02Scala training workshop 02
Scala training workshop 02Nguyen Tuan
 
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)Spark Summit
 

Andere mochten auch (20)

Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARN
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical Applications
 
Hadoop crashcourse v3
Hadoop crashcourse v3Hadoop crashcourse v3
Hadoop crashcourse v3
 
Hortonworks technical workshop operations with ambari
Hortonworks technical workshop   operations with ambariHortonworks technical workshop   operations with ambari
Hortonworks technical workshop operations with ambari
 
Apache Hive 0.13 Performance Benchmarks
Apache Hive 0.13 Performance BenchmarksApache Hive 0.13 Performance Benchmarks
Apache Hive 0.13 Performance Benchmarks
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_final
 
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNYARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3
 
Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student Slides
 
Reactive Programming Meetup - NodeJs on K8s
Reactive Programming Meetup - NodeJs on K8sReactive Programming Meetup - NodeJs on K8s
Reactive Programming Meetup - NodeJs on K8s
 
Sparkstreaming
SparkstreamingSparkstreaming
Sparkstreaming
 
Devops Spark Streaming
Devops Spark StreamingDevops Spark Streaming
Devops Spark Streaming
 
Scala training workshop 02
Scala training workshop 02Scala training workshop 02
Scala training workshop 02
 
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
 
Spark Technology Center IBM
Spark Technology Center IBMSpark Technology Center IBM
Spark Technology Center IBM
 

Ähnlich wie YARN Ready: Apache Spark

Apache Spark Workshop at Hadoop Summit
Apache Spark Workshop at Hadoop SummitApache Spark Workshop at Hadoop Summit
Apache Spark Workshop at Hadoop SummitSaptak Sen
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitDataWorks Summit
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkSlim Baltagi
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARNDataWorks Summit
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Alex Zeltov
 
Spark and Hadoop Technology
Spark and Hadoop Technology Spark and Hadoop Technology
Spark and Hadoop Technology Avinash Gautam
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionWangda Tan
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 
Getting Started with Spark Scala
Getting Started with Spark ScalaGetting Started with Spark Scala
Getting Started with Spark ScalaKnoldus Inc.
 
Hive on spark berlin buzzwords
Hive on spark berlin buzzwordsHive on spark berlin buzzwords
Hive on spark berlin buzzwordsSzehon Ho
 
Spark + Hadoop Perfect together
Spark + Hadoop Perfect togetherSpark + Hadoop Perfect together
Spark + Hadoop Perfect togetherIsheeta Sanghi
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing enginebigdatagurus_meetup
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaData Con LA
 

Ähnlich wie YARN Ready: Apache Spark (20)

Apache Spark Workshop at Hadoop Summit
Apache Spark Workshop at Hadoop SummitApache Spark Workshop at Hadoop Summit
Apache Spark Workshop at Hadoop Summit
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop Summit
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
 
Spark and Hadoop Technology
Spark and Hadoop Technology Spark and Hadoop Technology
Spark and Hadoop Technology
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
Apache spark
Apache sparkApache spark
Apache spark
 
Module01
 Module01 Module01
Module01
 
Getting Started with Spark Scala
Getting Started with Spark ScalaGetting Started with Spark Scala
Getting Started with Spark Scala
 
Hive on spark berlin buzzwords
Hive on spark berlin buzzwordsHive on spark berlin buzzwords
Hive on spark berlin buzzwords
 
Spark + Hadoop Perfect together
Spark + Hadoop Perfect togetherSpark + Hadoop Perfect together
Spark + Hadoop Perfect together
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
spark_v1_2
spark_v1_2spark_v1_2
spark_v1_2
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_saha
 
Spark Security
Spark SecuritySpark Security
Spark Security
 





YARN Ready: Apache Spark

  • 1. Spark Webinar October 2nd, 2014 Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Vinay Shukla & Ram Venkatesh
  • 2. Agenda
      • What is Spark?
      • What have we done with Spark so far
      • Tech Previews
      • Brief on Spark 1.1.0 Tech Preview
      • Multi-tenant & multi-workload with YARN
      • Introducing SPARK-3561
      • Get Involved
  • 3. Let’s Talk About Apache Spark: What is Spark?
      • Spark is a general-purpose big data engine that provides simple APIs for data scientists and engineers familiar with Scala, Python and Java to build ad-hoc interactive analytics, iterative machine-learning, and other use cases well-suited to interactive, in-memory data processing of GB to TB sized datasets.
  • 4. What is Spark?
      [Diagram: Spark architecture. Layer labels: Spark API; Spark Compiler / Optimizer; DAG Runtime; Execution Engine; Spark Cluster, YARN, Mesos. Client side: DAGScheduler, ActiveJob. Cluster side: SparkAM, Task. Example lineage: (Hadoop|FlatMapped|Filter|MapPartitions|Shuffled)RDD, split into stage0: (Hadoop|FlatMapped|Filter|MapPartitions)RDD, run as ShuffleMapTask (flatMap | map), and stage1: ShuffledRDD, run as ResultTask (reduceByKey).]
  • 5. Let’s Talk About Apache Spark (cont’d)
      What’s Our Spark Strategy?
      • Hortonworks is focused on enabling Spark for Enterprise Hadoop so users can deploy Spark-based applications along with their other Hadoop workloads in a consistent, predictable, and robust way.
          – Leverage the scale and multi-tenancy provided by YARN so that Spark's memory- and CPU-intensive apps can work with predictable performance
          – Integrate it with HDP’s operations, security, governance, scalability, availability, and multi-tenancy capabilities
      Do We Have a Plan to Support Spark? Yes.
      • Spark is available now as a Technology Preview.
      • We are working our standard process of Tech Preview -> GA. We did this for Storm, Falcon, etc.
      • Spark will be added to our HDP Enterprise Plus subscription when it’s GA ready
  • 6. Spark Timeline Break-down
  • 7. Spark Roadmap
      [Timeline graphic, 2014. Month labels: JULY, SEPT, DEC. Release labels: 1.0.1 TP Refresh, 1.1.0 TP Refresh, 1.2.0 GA.]
      • Hive 0.13 support
      • Limited ORC support
      • Spark on YARN: Deployment Best Practices
      • Ambari Support for Spark Install/Stop/Config
      • Spark on Kerberized Cluster
      • Authentication against LDAP in Spark UI
  • 8. What’s in Spark 1.1.0 Tech Preview
      • Upgrades Spark to Hive 0.13
      • Provides Hive 0.13 features (new Hive UDFs) in Spark
      • Limited ORC support
      • Ability to manipulate ORC as HadoopRDD

        …..
        // Read the ORC table as (NullWritable, OrcStruct) pairs through the Hadoop input format.
        val inputRead = sc.hadoopFile(
          "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/orc_table",
          classOf[org.apache.hadoop.hive.ql.io.orc.OrcInputFormat],
          classOf[org.apache.hadoop.io.NullWritable],
          classOf[org.apache.hadoop.hive.ql.io.orc.OrcStruct])
        // Keep only the row value, rendered as a string, then collect the rows on the driver.
        val k = inputRead.map(pair => pair._2.toString)
        val c = k.collect
        …..
  • 9. Spark Enterprise Readiness Enhancements
  • 10. Spark Investment Phases
      • Phase 1
          • Hive 0.13 support
          • Limited ORC support
          • Security: Spark certification on Kerberized Cluster
          • Security: Authentication in Spark UI against LDAP/AD
          • Operations: Ambari Stack Definition: Install/Start/Stop/Config/Quick links to Spark UI
      • Phase 2
          • Improve reliability & Scale of Spark-on-YARN
          • Enhance ORC support
          • Improve Debug Capabilities
          • Security: Wire Encryption and Authorization with XA/Argus
          • Operations: Spark logs published to YARN Application Timeline Service (ATS)
          • Operations: Enhanced workload management
  • 11. Spark on Hadoop, October 2nd, 2014, Ram Venkatesh
  • 12. Spark-on-Hadoop – End User Benefits
      • Developer Productivity
      • Simple, easy to use APIs
      • Direct and elegant representation of the data processing flow
      • Focus on application business logic rather than Hadoop internals
      • Integrated develop-deploy-debug experience through the IDE
      • Multi-tenancy
      • Shared infrastructure across workloads – interactive queries by day, batch ETL at night
      • Better utilization of compute capacity
      • Move the execution to the data tier instead of the other way around
      • Reduced load on distributed filesystem (HDFS)
      • Reduce unnecessary replicated reads and writes
      • Reduced network usage
      • Eliminates the need for data transfer in and out of the cluster
  • 13. Spark-on-Hadoop – Design considerations
      Look to the Future with an eye on the Past
      • Don’t solve problems that have already been solved.
          – Leverage discrete task-based compute model for elasticity, scalability and fault tolerance
          – Leverage several man-years of work in Hadoop Map-Reduce data shuffling operations
          – Leverage proven resource sharing and multi-tenancy model for Hadoop and YARN
          – Leverage built-in security mechanisms in Hadoop for privacy and isolation
      • Don’t create new problems
          – Preserve the simple developer experience
          – No changes to Spark programs, all programs run unmodified
          – Propose simple, mainstream in-the-community extension to the Apache Spark project
  • 14. Spark on Hadoop – From service model to app model
      Spark jobs compile down to a Directed Acyclic Graph (DAG).
      • Vertices in the graph represent user logic
      • Edges represent data movement from producers to consumers
      • Spark DAG executed using Apache Tez at runtime
      [Diagram: Distributed Sort. Vertex labels: Preprocessor Stage, Sampler, Partition Stage, Aggregate Stage, with Task-1 and Task-2 shown per stage. Edge labels: Samples, Ranges.]
  • 15. Spark-on-Hadoop – Simplifying Operations
      • No deployments to do. No side effects. Easy and safe to try it out!
      • Completely client side application.
      • Simply upload to any accessible FileSystem and point to the cluster through configuration files.
      • Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production.
      • Leverages YARN local resources.
      [Diagram labels: two Client Machines, each with a Spark Client; HDFS holding Spark-v1 and Spark-v2; two Node Managers, each running a TezTask.]
  • 16. Benefits of native Hadoop execution of Spark DAGs
      • Elastic resource management - dynamic acquisition and release of containers
      • Works with YARN pre-emption, reservation and headroom calculations
      • Auto-parallelism based on sampling – you no longer need to guess the number of reducers
      • Efficient data movement between stages using the Hadoop shuffle
      • Integrates with resource isolation and governance mechanisms in Hadoop
      • Classpath and jarfile management through local resources
      • Detailed job-level metrics through integration with the YARN ATS
      Enables large-scale, multi-tenant batch ETL Spark programs
  • 17. Introducing SPARK-3561
  • 18. DEMO: SPARK-3561 in action
  • 19. SPARK-3561 under the hood Example program:
  • 20. SPARK-3561 Demo – contd.
      Execute program using spark-submit:
        spark-submit --class dev.demo.WordCount --master execution-context:org.apache.spark.tez.TezJobExecutionContext spark-on-hadoop-1.0.jar 1 test.txt
      Execute interactive Spark commands through spark-shell:
        spark-shell --master execution-context:org.apache.spark.tez.TezJobExecutionContext
        INFO main spark.SparkContext:59 - Will use custom job execution context org.apache.spark.tez.TezJobExecutionContext
        INFO main adapter.SparkToTezAdapter:59 - Adapting PairRDDFunctions.saveAsNewAPIHadoopDataset for Tez
        INFO main repl.SparkILoop:59 - Created spark context..
        Spark context available as sc.
        scala>
  • 21. SPARK-3561 – feedback requested
      • Provide feedback on your ETL/batch scenarios
      • Participate in the discussion on the JIRA
      • Try it out when it becomes available
      • Looking for early adopters to run and validate at scale
  • 22. Resources
      • Spark Labs Page: http://hortonworks.com/hadoop/spark/
      • Spark Roadmap Blog: http://hortonworks.com/blog/extending-spark-yarn-enterprise-hadoop/
      • Spark 1.1.0 Tech Preview: http://hortonworks.com/kb/spark-1-1-0-technical-preview-hdp-2-1-5/
      • Public Spark Forums: http://hortonworks.com/community/forums/forum/spark/
      • SPARK-3561: https://issues.apache.org/jira/browse/SPARK-3561
  • 23. Q&A… Discussion
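
Slide 4 splits a word-count style job into stage0 (ShuffleMapTask: flatMap | map) and stage1 (ResultTask: reduceByKey). The following is a minimal plain-Python sketch of that split, with no Spark dependency. The function names, the CRC32 partitioner and `NUM_REDUCERS` are illustrative assumptions, not Spark internals.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # hypothetical number of shuffle partitions


def partition_for(key, n=NUM_REDUCERS):
    # Stable hash partitioner (assumption: CRC32 stands in for Spark's partitioner).
    return zlib.crc32(key.encode("utf-8")) % n


def shuffle_map_task(lines, n=NUM_REDUCERS):
    """Stage 0: flatMap lines into words, map to (word, 1), bucket by key."""
    buckets = [[] for _ in range(n)]
    for line in lines:
        for word in line.split():
            buckets[partition_for(word, n)].append((word, 1))
    return buckets


def result_task(pairs):
    """Stage 1: reduceByKey over the pairs of one shuffle partition."""
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


def run_job(input_partitions, n=NUM_REDUCERS):
    # One map task per input partition; the stage boundary is the shuffle.
    map_outputs = [shuffle_map_task(part, n) for part in input_partitions]
    results = {}
    for r in range(n):
        # Each reducer reads bucket r from every map output.
        shuffled = [pair for out in map_outputs for pair in out[r]]
        results.update(result_task(shuffled))
    return results


word_counts = run_job([["a b a"], ["b c"]])
```

Because every occurrence of a key lands in the same bucket, each reduce task can sum its keys independently, which is why the shuffle marks the boundary between the two stages.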
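
Slide 14 describes a job as a DAG whose vertices hold user logic and whose edges carry data from producers to consumers. A small sketch of that idea, using the vertex names from the Distributed Sort diagram, might look like this. The exact edge list is an assumption read off the slide's labels, and Python's standard `graphlib` stands in for the real runtime scheduler.

```python
from graphlib import TopologicalSorter

# Each vertex maps to the set of vertices whose output it consumes.
# Assumed edges: the sampler reads preprocessor output ("Samples"),
# and the partition stage reads the sampler's "Ranges".
dag = {
    "preprocessor": set(),
    "sampler": {"preprocessor"},
    "partition": {"preprocessor", "sampler"},
    "aggregate": {"partition"},
}


def execution_order(graph):
    # A vertex becomes runnable only after all of its producers have finished.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dag)
```

`TopologicalSorter` raises `graphlib.CycleError` if the graph is not acyclic, which matches the "acyclic" requirement in the slide.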
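
Slide 16 lists auto-parallelism based on sampling as a benefit. The deck does not show how it is computed, so the sketch below only illustrates the general idea under stated assumptions: estimate the data volume from a sample to pick a reducer count, then derive range boundaries from sampled keys. All function names, parameters and thresholds are hypothetical and are not taken from Tez or Spark.

```python
import bisect
import math


def choose_num_reducers(sampled_bytes, sample_fraction,
                        target_bytes_per_reducer, max_reducers):
    # Scale the sample up to an estimate of the full data size.
    estimated_total = sampled_bytes / sample_fraction
    wanted = math.ceil(estimated_total / target_bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


def range_boundaries(sampled_keys, num_reducers):
    # num_reducers - 1 split points at evenly spaced quantiles of the sample.
    ordered = sorted(sampled_keys)
    return [ordered[(i * len(ordered)) // num_reducers]
            for i in range(1, num_reducers)]


def reducer_for(key, boundaries):
    # Index of the key range that the key falls into.
    return bisect.bisect_right(boundaries, key)


reducers = choose_num_reducers(64 * 2**20, 0.01, 2**30, 100)
bounds = range_boundaries(list(range(100)), 4)
```

Here a 64 MiB sample taken at 1% suggests roughly 6.25 GiB of data, so a 1 GiB target per reducer yields 7 reducers instead of a hand-picked number.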