Intro to Spark with Zeppelin
Hortonworks
1.
Intro to Spark & Zeppelin
Robert Hryniewicz, Data Evangelist, @RobHryniewicz
2.
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Spark Background
3.
What is Spark?
• Apache open source project, originally developed at AMPLab (University of California, Berkeley)
• Data processing engine focused on in-memory distributed computing use cases
• API: Scala, Python, Java, and R
4.
Spark Ecosystem
• Spark Core
• Spark SQL
• Spark Streaming
• MLlib
• GraphX
5.
Why Spark?
• Elegant developer APIs
 – Single environment for data munging and machine learning (ML)
• In-memory computation model
 – Fast!
 – Effective for iterative computations and ML
• Machine learning
 – Implementation of distributed ML algorithms
 – Pipeline API (Spark ML)
6.
History of Hadoop & Spark
7.
Apache Spark Basics
8.
Spark Context
What is it?
• Main entry point for Spark functionality
• Represents a connection to a Spark cluster
• Represented as sc in your code
9.
RDD – Resilient Distributed Dataset
• Primary abstraction in Spark
 – An immutable collection of objects (or records, or elements) that can be operated on in parallel
• Distributed
 – Collection of elements partitioned across nodes in a cluster
 – Each RDD is composed of one or more partitions
 – User can control the number of partitions
 – More partitions => more parallelism
• Resilient
 – Recover from node failures
 – An RDD keeps its lineage information, so it can be recreated from parent RDDs
• Created by starting with a file in Hadoop Distributed File System (HDFS) or an existing collection in the driver program
• May be persisted in memory for efficient reuse across parallel operations (caching)
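A minimal sketch of the RDD properties listed on this slide, assuming the Spark 1.6-era Scala API and a local master. The object name RddSketch and the sample data are illustrative and do not come from the deck. It builds an RDD from a collection in the driver program with an explicit partition count, caches it, prints its lineage, and runs one action. In Zeppelin or spark-shell, sc already exists, so the SparkContext construction would be dropped there.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name; not part of the original deck.
object RddSketch {

  // Returns (number of partitions, sum of the squares of 1..100).
  def run(): (Int, Int) = {
    // local[2] is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Create an RDD from a driver-side collection, split into 4 partitions.
      val numbers = sc.parallelize(1 to 100, 4)

      // Transformations are lazy and produce a new, immutable RDD.
      val squares = numbers.map(n => n * n)

      // Mark the RDD for in-memory reuse across later actions.
      squares.cache()

      // The lineage shows how this RDD can be recomputed from its parent.
      println(squares.toDebugString)

      // reduce is an action; it triggers the actual computation.
      (squares.partitions.length, squares.reduce(_ + _))
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Raising the second argument of parallelize increases the number of partitions and, with enough cores, the degree of parallelism.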
10.
RDD – Resilient Distributed Dataset
[Diagram: RDD 1 (Partitions 1–4) and RDD 2 (Partitions 1–3) distributed across cluster nodes]
11.
Spark SQL
12.
Spark SQL Overview
• Spark module for structured data processing (e.g. DB tables, JSON files)
• Three ways to manipulate data:
 – DataFrames API
 – SQL queries
 – Datasets API
• Same execution engine for all three
• Spark SQL interfaces provide more information about both the structure of the data and the computation being performed than the basic Spark RDD API
13.
DataFrames
• Conceptually equivalent to a table in a relational DB or a data frame in R/Python
• API available in Scala, Java, Python, and R
• Richer optimizations (significantly faster than RDDs)
• Distributed collection of data organized into named columns
• Underneath is an RDD
14.
DataFrames: Created from Various Sources
• DataFrames from HIVE:
 – Reading and writing HIVE tables, including ORC
• DataFrames from files:
 – Built-in: JSON, JDBC, ORC, Parquet, HDFS
 – External plug-in: CSV, HBASE, Avro
• DataFrames from existing RDDs
 – With the toDF() function
• Data is described as a DataFrame with rows, columns, and a schema
[Diagram: sources (CSV, Avro, HIVE, Spark SQL, Text) feeding a DataFrame (with RDD underneath) made of rows and columns Col1, Col2, … ColN]
15.
SQL Context and Hive Context
SQLContext
• Entry point into all functionality in Spark SQL
• All you need is a SparkContext

    val sqlContext = new SQLContext(sc)

HiveContext
• Superset of the functionality provided by the basic SQLContext
 – Read data from Hive tables
 – Access to Hive functions and UDFs
• Use when your data resides in Hive

    val hc = new HiveContext(sc)
16.
Spark SQL Examples
17.
DataFrame Example: Reading Data From Table

    val df = sqlContext.table("flightsTbl")
    df.select("Origin", "Dest", "DepDelay").show(5)

    +------+----+--------+
    |Origin|Dest|DepDelay|
    +------+----+--------+
    |   IAD| TPA|       8|
    |   IAD| TPA|      19|
    |   IND| BWI|       8|
    |   IND| BWI|      -4|
    |   IND| BWI|      34|
    +------+----+--------+
18.
DataFrame Example: Using DataFrame API to Filter Data (show delays more than 15 min)

    df.select("Origin", "Dest", "DepDelay").filter($"DepDelay" > 15).show(5)

    +------+----+--------+
    |Origin|Dest|DepDelay|
    +------+----+--------+
    |   IAD| TPA|      19|
    |   IND| BWI|      34|
    |   IND| JAX|      25|
    |   IND| LAS|      67|
    |   IND| MCO|      94|
    +------+----+--------+
19.
SQL Example: Using SQL to Query and Filter Data (again, show delays more than 15 min)

    // Register Temporary Table
    df.registerTempTable("flights")

    // Use SQL to Query Dataset
    sqlContext.sql("SELECT Origin, Dest, DepDelay FROM flights WHERE DepDelay > 15 LIMIT 5").show

    +------+----+--------+
    |Origin|Dest|DepDelay|
    +------+----+--------+
    |   IAD| TPA|      19|
    |   IND| BWI|      34|
    |   IND| JAX|      25|
    |   IND| LAS|      67|
    |   IND| MCO|      94|
    +------+----+--------+
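The DataFrame filter and the SQL query in these examples express the same predicate, and a small self-contained sketch can check that they agree. It assumes the Spark 1.6-era Scala API and a local master. The object name FlightsFilterSketch is hypothetical, and the five sample rows are the ones shown in the deck's first output table rather than the full flightsTbl, which is not available here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical object name; not part of the original deck.
object FlightsFilterSketch {

  // Returns the delayed flights found via the DataFrame API and via SQL.
  def run(): (Set[(String, String, Int)], Set[(String, String, Int)]) = {
    // local[2] is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("flights-filter-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._ // enables toDF and the $"col" syntax

      // Sample rows copied from the first output table in the deck.
      val sample = Seq(
        ("IAD", "TPA", 8),
        ("IAD", "TPA", 19),
        ("IND", "BWI", 8),
        ("IND", "BWI", -4),
        ("IND", "BWI", 34)
      )
      val df = sc.parallelize(sample).toDF("Origin", "Dest", "DepDelay")

      // DataFrame API: keep delays of more than 15 minutes.
      val viaApi = df.filter($"DepDelay" > 15).collect()
        .map(r => (r.getString(0), r.getString(1), r.getInt(2)))
        .toSet

      // SQL: the same predicate against a registered temporary table.
      df.registerTempTable("flights")
      val viaSql = sqlContext
        .sql("SELECT Origin, Dest, DepDelay FROM flights WHERE DepDelay > 15")
        .collect()
        .map(r => (r.getString(0), r.getString(1), r.getInt(2)))
        .toSet

      (viaApi, viaSql)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

On Spark 2.x and later, registerTempTable is deprecated in favor of createOrReplaceTempView; the older call is kept here to match the deck.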
20.
RDD vs. DataFrame
21.
RDDs vs. DataFrames
RDD
• Lower-level API (more control)
• Lots of existing code & users
• Compile-time type-safety
DataFrame
• Higher-level API (faster development)
• Faster sorting, hashing, and serialization
• More opportunities for automatic optimization
• Lower memory pressure
22.
Data Frames are Intuitive
Find average age by department?

    dept  name      age
    Bio   H Smith   48
    CS    A Turing  54
    Bio   B Jones   43
    Phys  E Witten  61

[Slide panels: RDD Example, Equivalent Data Frame Example]
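The code for the two panels on this slide did not survive in the transcript. The following is a sketch of what such a comparison might look like, not the author's original code. It assumes the Spark 1.6-era Scala API and a local master; the object name AvgAgeSketch is hypothetical, and the data is the four-row table from the slide.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.avg

// Hypothetical object name; a reconstruction, not the deck's original code.
object AvgAgeSketch {

  // Returns average age per department, computed via RDD and via DataFrame.
  def run(): (Map[String, Double], Map[String, Double]) = {
    // local[2] is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("avg-age-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._ // enables toDF on RDDs of tuples

      // (dept, name, age) rows from the slide.
      val people = sc.parallelize(Seq(
        ("Bio", "H Smith", 48),
        ("CS", "A Turing", 54),
        ("Bio", "B Jones", 43),
        ("Phys", "E Witten", 61)
      ))

      // RDD version: carry (sum, count) per department by hand, then divide.
      val byRdd = people
        .map { case (dept, _, age) => (dept, (age, 1)) }
        .reduceByKey { case ((a1, c1), (a2, c2)) => (a1 + a2, c1 + c2) }
        .mapValues { case (sum, count) => sum.toDouble / count }
        .collect()
        .toMap

      // DataFrame version: state the intent and let Spark SQL plan it.
      val byDf = people.toDF("dept", "name", "age")
        .groupBy("dept")
        .agg(avg("age"))
        .collect()
        .map(row => row.getString(0) -> row.getDouble(1))
        .toMap

      (byRdd, byDf)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

The RDD version spells out how to aggregate, while the DataFrame version only names the grouping column and the aggregate, which is the sense in which the slide calls Data Frames intuitive.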
23.
Spark SQL Optimizations
• Spark SQL uses an underlying optimization engine (Catalyst)
 – Catalyst can perform intelligent optimization since it understands the schema
• Spark SQL does not materialize all the columns (as an RDD would), only those that are needed
24.
Apache Zeppelin & HDP Sandbox
25.
Apache Zeppelin
Web-based notebook for interactive analytics
• Use cases
 – Data exploration and discovery
 – Visualization
 – Interactive snippet-at-a-time experience
 – “Modern Data Science Studio”
• Features
 – Deeply integrated with Spark and Hadoop
 – Supports multiple language backends
 – Pluggable “Interpreters”
26.
What’s not included with Spark?
• Resource management
• Storage
• Applications
[Diagram: Spark Core Engine with Scala, Java, and Python libraries: MLlib (machine learning), Spark SQL*, Spark Streaming*]
27.
HDP Sandbox
What’s included in the Sandbox?
• Zeppelin
• Latest Hortonworks Data Platform (HDP)
 – Spark
 – YARN: resource management
 – HDFS: distributed storage layer
 – And many more components...
[Diagram: Scala, Java, Python, and R APIs over the Spark Core Engine with Spark SQL, Spark Streaming, MLlib, and GraphX, running on YARN over HDFS nodes 1 … N]
28.
Access patterns enabled by YARN
• Batch: needs to happen, but with no timeframe limitations
• Interactive: needs to happen at human time
• Real-Time: needs to happen at machine execution time
[Diagram: Batch, Interactive, and Real-Time applications on YARN (Data Operating System), over HDFS (Hadoop Distributed File System) nodes 1 … N]
29.
Why Spark on YARN?
• Utilize existing HDP cluster infrastructure
• Resource management
 – Share Spark workloads with other workloads like PIG, HIVE, etc.
• Scheduling and queues
[Diagram: Spark Driver, Client, Spark Application Master (YARN container), and three Spark Executors (one YARN container each, two tasks each)]
30.
Why HDFS?
Fault Tolerant Distributed Storage
• Divide files into big blocks and distribute 3 copies randomly across the cluster
• Processing: data locality
• Not just storage but computation
[Diagram: a logical file split into blocks 1–4, with each block stored 3 times across the cluster]
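The block-and-replica idea on this slide can be sketched in plain Scala, with no Hadoop dependency. This is a toy model only: the object name HdfsBlockSketch, the 500 MB file, the 8-node cluster, and the 128 MB block size are assumptions for illustration, and placement is uniformly random as the slide describes, whereas real HDFS placement also takes rack topology into account.

```scala
import scala.util.Random

// Hypothetical toy model; it does not call any Hadoop API.
object HdfsBlockSketch {

  // Number of blocks needed for a file; the last block may be partly filled.
  def blockCount(fileBytes: Long, blockBytes: Long): Long =
    (fileBytes + blockBytes - 1) / blockBytes

  // For every block, pick `replication` distinct nodes uniformly at random.
  def place(blocks: Int, nodes: Int, replication: Int, rng: Random): Map[Int, Set[Int]] =
    (1 to blocks).map { block =>
      block -> rng.shuffle((1 to nodes).toList).take(replication).toSet
    }.toMap

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    // Assumed sizes: a 500 MB file and 128 MB blocks.
    val blocks = blockCount(500 * mb, 128 * mb).toInt
    val placement = place(blocks, nodes = 8, replication = 3, rng = new Random(42))
    placement.toSeq.sortBy(_._1).foreach { case (block, nodeIds) =>
      println(s"block $block -> nodes ${nodeIds.toSeq.sorted.mkString(", ")}")
    }
  }
}
```

Because each block lives on three different nodes, losing one node still leaves two copies of every block it held, and a scheduler can run a task on any of the nodes that already holds the block it needs.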
31.
There’s more to HDP
Hortonworks Data Platform 2.4.x
• Governance & Integration
 – Data lifecycle & governance: Falcon, Atlas
 – Data workflow: Sqoop, Flume, Kafka, NFS, WebHDFS
• Data Access
 – Batch: MapReduce
 – Script: Pig
 – Search: Solr
 – SQL: Hive
 – NoSQL: HBase, Accumulo, Phoenix
 – Stream: Storm
 – In-memory
 – Others: ISV engines
 – Tez, Slider
• Data Management
 – YARN: Data Operating System
 – HDFS: Hadoop Distributed File System
• Security
 – Administration, authentication, authorization, auditing, data protection
 – Ranger, Knox, Atlas, HDFS Encryption
• Operations
 – Provisioning, managing, & monitoring: Ambari, Cloudbreak, Zookeeper
 – Scheduling: Oozie
• Deployment choice: Linux, Windows, On-Premise, Cloud
32.
Hortonworks Community Connection
33.
community.hortonworks.com
34.
community.hortonworks.com
35.
HCC DS, Analytics, and Spark Related Questions Sample
36.
Lab Preview
37.
Link to Tutorials with Lab Instructions
http://tinyurl.com/hwx-intro-to-spark
38.
Thank you!
community.hortonworks.com