SlideShare ist ein Scribd-Unternehmen logo
1 von 5
Downloaden Sie, um offline zu lesen
Spark & Scala Course Contents
Describe Features of Apache Spark
 How Spark fits in Big Data ecosystem
 Why Spark & Hadoop fit together
Define Spark Components
 Driver Program
 Spark Context
 Cluster Manager
 Worker
 Executor
 Task
 Spark RDD
 Spark Context
 Spark Libraries
Load data into Spark
 Different data sources and formats
 HDFS
 Amazon S3
 Local File System
 Text
 JSON
 CSV
 Sequence File
 Create & Use RDD, Data Frames
Apply dataset operations to Resilient Distributed Datasets
 Transformation
 Actions
 Cache Intermediate RDD
 Lineage Graph
 Lazy Evaluation
Use Spark DataFrames for simple queries
 Create Data Frame
 Spark Interactive shell (Scala & Python)
 Spark SQL
Define different ways to run your application
Build and launch a standalone application
 Spark Program Life Cycle
 Function of Spark Context
 Different Way to Launch Spark Application
 Local
 Standalone
 Hadoop YARN
 Apache Mesos
 Launch Spark Application
 Spark-Submit
 Monitor the Spark Job
Describe & Create pair RDD
 Key-Value pair
 Apache Spark vs Apache Hadoop MapReduce
 Create RDD from existing non-pair RDD
 Create pair RDD by loading certain formats
 Create pair RDD from in-memory collection of pairs
Apply Operations on pair RDD
 Group ByKey
 Reduce ByKey
 Other Transformations
 Joins
Control partitioning across nodes
 RDD Partition
 Types of Partition
 Hash Partitioning
 Range Partitioning
 Benefit of Partitioning
 Best Practices
More on Data Frames
 Explore Data in DataFrames
 Create UDFs (user define functions)
 UDF with Scala DSL
 UDF with SQL
 Repartition Data Frames.
 Infer Schema by Reflection
 DataFrame from database table
 DataFrame from JSON
Monitor Apache Spark Applications
 Spark Execution Model
 Debug and Tune Spark Applications
Identify Spark Unified Stack Components
 Spark SQL
 Spark Streaming
 Spark MLib
 Spark GraphX
Benefits of Apache Spark over Hadoop Ecosystem
Describe Spark Data pipeline Use Cases
 Spark Streaming Architecture
 Dstream and a spark streaming application
 Define Use Case (Time Series Data)
 Basic Steps
 Save Data to HBase
 Operations on DStream
 Transformations
 Data Frame and SQL Operations
 Define Windowed Operation
 Sliding Window
 Windowed Computation
 Window based Transformation
 Window Operations
 Fault tolerance of streaming applications
 Fault Tolerance in Spark Streaming
 Fault Tolerance in Spark RDD
 Check pointing
Describe Graph X
Define Regular, Directed, and property graphs
Create a Property Graph
Perform Operations on Graphs
Describe Apache Spark MLib
Describe the Machine Learning Techniques
 Classifications
 Clustering
 Collaborative Filtering
Use Collaborative filtering to predict user choice
Scala
 Introduction
 A first example
 Expressions and Simple Functions
 First Class function
 Classes and Objects
 Case classes and Pattern matching
 Generic types and methods
 Lists
 For- Comprehension
 Mutable State
 Computing with Streams
 Lazy Values
 Implicit Parameters and Conversions
 Handley / Milner type Interface
 Abstraction for concurrency

Weitere ähnliche Inhalte

Was ist angesagt?

Spark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to KnowSpark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to Know
Kristian Alexander
 

Was ist angesagt? (20)

Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 
Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)
 
Databricks @ Strata SJ
Databricks @ Strata SJDatabricks @ Strata SJ
Databricks @ Strata SJ
 
Stanford CS347 Guest Lecture: Apache Spark
Stanford CS347 Guest Lecture: Apache SparkStanford CS347 Guest Lecture: Apache Spark
Stanford CS347 Guest Lecture: Apache Spark
 
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudH2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks Cloud
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 
Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data Factory
 
The BDAS Open Source Community
The BDAS Open Source CommunityThe BDAS Open Source Community
The BDAS Open Source Community
 
Learning spark ch09 - Spark SQL
Learning spark ch09 - Spark SQLLearning spark ch09 - Spark SQL
Learning spark ch09 - Spark SQL
 
Spark streaming state of the union
Spark streaming state of the unionSpark streaming state of the union
Spark streaming state of the union
 
Spark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to KnowSpark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to Know
 
Apache® Spark™ 1.5 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.5 presented by Databricks co-founder Patrick WendellApache® Spark™ 1.5 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.5 presented by Databricks co-founder Patrick Wendell
 
Learning spark ch06 - Advanced Spark Programming
Learning spark ch06 - Advanced Spark ProgrammingLearning spark ch06 - Advanced Spark Programming
Learning spark ch06 - Advanced Spark Programming
 
Enabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and REnabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and R
 
Apache Spark An Overview
Apache Spark An OverviewApache Spark An Overview
Apache Spark An Overview
 
Learning spark ch04 - Working with Key/Value Pairs
Learning spark ch04 - Working with Key/Value PairsLearning spark ch04 - Working with Key/Value Pairs
Learning spark ch04 - Working with Key/Value Pairs
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
 

Ähnlich wie Spark and scala course content | Spark and scala course online training

Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Simplilearn
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
An Introduction to Spark
An Introduction to SparkAn Introduction to Spark
An Introduction to Spark
jlacefie
 

Ähnlich wie Spark and scala course content | Spark and scala course online training (20)

Spark as a Service with Azure Databricks
Spark as a Service with Azure DatabricksSpark as a Service with Azure Databricks
Spark as a Service with Azure Databricks
 
Introduction to Spark - DataFactZ
Introduction to Spark - DataFactZIntroduction to Spark - DataFactZ
Introduction to Spark - DataFactZ
 
Azure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkAzure Databricks is Easier Than You Think
Azure Databricks is Easier Than You Think
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 
Bring the Spark To Your Eyes
Bring the Spark To Your EyesBring the Spark To Your Eyes
Bring the Spark To Your Eyes
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 
Spark core
Spark coreSpark core
Spark core
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
 
Composable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and WeldComposable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and Weld
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdf
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Bds session 13 14
Bds session 13 14Bds session 13 14
Bds session 13 14
 
Spark Workshop
Spark WorkshopSpark Workshop
Spark Workshop
 
Spark
SparkSpark
Spark
 
SPARK ARCHITECTURE
SPARK ARCHITECTURESPARK ARCHITECTURE
SPARK ARCHITECTURE
 
An Introduction to Spark
An Introduction to SparkAn Introduction to Spark
An Introduction to Spark
 
An Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark MeetupAn Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark Meetup
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 

Kürzlich hochgeladen

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 

Kürzlich hochgeladen (20)

microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 

Spark and scala course content | Spark and scala course online training

  • 1. Spark & Scala Course Contents Describe Features of Apache Spark  How Spark fits in Big Data ecosystem  Why Spark & Hadoop fit together Define Spark Components  Driver Program  Spark Context  Cluster Manager  Worker  Executor  Task  Spark RDD  Spark Context  Spark Libraries Load data into Spark  Different data sources and formats  HDFS  Amazon S3  Local File System  Text  JSON  CSV  Sequence File  Create & Use RDD, Data Frames Apply dataset operations to Resilient Distributed Datasets  Transformation  Actions  Cache Intermediate RDD  Lineage Graph  Lazy Evaluation Use Spark DataFrames for simple queries
  • 2.  Create Data Frame  Spark Interactive shell (Scala & Python)  Spark SQL Define different ways to run your application Build and launch a standalone application  Spark Program Life Cycle  Function of Spark Context  Different Way to Launch Spark Application  Local  Standalone  Hadoop YARN  Apache Mesos  Launch Spark Application  Spark-Submit  Monitor the Spark Job Describe & Create pair RDD  Key-Value pair  Apache Spark vs Apache Hadoop MapReduce  Create RDD from existing non-pair RDD  Create pair RDD by loading certain formats  Create pair RDD from in-memory collection of pairs Apply Operations on pair RDD  Group ByKey  Reduce ByKey  Other Transformations  Joins Control partitioning across nodes  RDD Partition  Types of Partition  Hash Partitioning
  • 3.  Range Partitioning  Benefit of Partitioning  Best Practices More on Data Frames  Explore Data in DataFrames  Create UDFs (user define functions)  UDF with Scala DSL  UDF with SQL  Repartition Data Frames.  Infer Schema by Reflection  DataFrame from database table  DataFrame from JSON Monitor Apache Spark Applications  Spark Execution Model  Debug and Tune Spark Applications Identify Spark Unified Stack Components  Spark SQL  Spark Streaming  Spark MLib  Spark GraphX Benefits of Apache Spark over Hadoop Ecosystem Describe Spark Data pipeline Use Cases  Spark Streaming Architecture  Dstream and a spark streaming application  Define Use Case (Time Series Data)  Basic Steps  Save Data to HBase  Operations on DStream  Transformations
  • 4.  Data Frame and SQL Operations  Define Windowed Operation  Sliding Window  Windowed Computation  Window based Transformation  Window Operations  Fault tolerance of streaming applications  Fault Tolerance in Spark Streaming  Fault Tolerance in Spark RDD  Check pointing Describe Graph X Define Regular, Directed, and property graphs Create a Property Graph Perform Operations on Graphs Describe Apache Spark MLib Describe the Machine Learning Techniques  Classifications  Clustering  Collaborative Filtering Use Collaborative filtering to predict user choice Scala  Introduction  A first example  Expressions and Simple Functions  First Class function  Classes and Objects  Case classes and Pattern matching  Generic types and methods  Lists  For- Comprehension  Mutable State
  • 5.  Computing with Streams  Lazy Values  Implicit Parameters and Conversions  Handley / Milner type Interface  Abstraction for concurrency