Big Data 2.0
HOW SPARK TECHNOLOGIES ARE RESHAPING THE
WORLD OF BIG DATA ANALYTICS
Presented By: Lillian Pierson, P.E.
Today’s webinar
Apache Spark: Journey from “Hadoop ecosystem component” to “Big Data platform”
The story of how Spark began
Is Spark a data engineering or data science platform?
Who is using Spark and for what?
Got Spark skills? Here’s why you should
Apache Spark
JOURNEY FROM “HADOOP ECOSYSTEM
COMPONENT” TO “BIG DATA PLATFORM”
What is Spark?
“In-memory computing appliances are … faster than the traditional Hadoop system because in-memory appliances don’t use MapReduce… By storing data in memory, in-memory appliances are able to bypass the time-consuming disk accesses that are required as part of the map and reduce operations that comprise the MapReduce process. In-memory data storage, processing, and analysis is fast enough to generate data analytics in real-time, derived from streaming data sources.”
– Excerpt from my book: Big Data/Hadoop for Dummies
Why in-memory
applications?
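To make the quoted point concrete, here is a minimal sketch in plain Python (not Spark code) of why keeping data in memory helps iterative workloads. It assumes a toy dataset of integers in a temporary file; the class names DiskBackedJob and InMemoryJob and the compare helper are hypothetical and exist only for this illustration. One job re-reads its input on every pass, roughly as a chain of MapReduce jobs would, while the other reads once and reuses a cached copy.

```python
import os
import tempfile


def write_dataset(path, n):
    """Write the integers 0..n-1 to a text file, one per line."""
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"{i}\n")


class DiskBackedJob:
    """Hypothetical job that re-reads its input from disk on every pass."""

    def __init__(self, path):
        self.path = path
        self.disk_reads = 0

    def records(self):
        self.disk_reads += 1
        with open(self.path) as f:
            return [int(line) for line in f]


class InMemoryJob(DiskBackedJob):
    """Hypothetical job that reads once and keeps the records in memory."""

    def __init__(self, path):
        super().__init__(path)
        self._cache = None

    def records(self):
        if self._cache is None:
            self._cache = super().records()  # the only disk access
        return self._cache


def iterate(job, passes):
    """Run the same aggregation several times, as an iterative algorithm would."""
    total = 0
    for _ in range(passes):
        total += sum(job.records())
    return total


def compare(n=1000, passes=5):
    """Return (disk_total, mem_total, disk_reads, mem_reads) for both strategies."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        write_dataset(path, n)
        disk_job = DiskBackedJob(path)
        mem_job = InMemoryJob(path)
        disk_total = iterate(disk_job, passes)
        mem_total = iterate(mem_job, passes)
    return disk_total, mem_total, disk_job.disk_reads, mem_job.disk_reads
```

Calling compare(1000, 5) gives the same total for both jobs, but the disk-backed job touches the file five times and the in-memory job only once. The gap grows with the number of passes, which is why iterative machine learning workloads benefit most.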
From Hadoop ecosystem component…
HDFS
MapReduce 2.0
YARN
From Hadoop ecosystem component…
HDFS
Spark
MapReduce 2.0
YARN
To big data platform
HDFS
MapReduce 2.0
Spark
YARN
To big data platform
Spark-as-a-Service
Spark’s 4 submodules
Spark SQL
MLlib
GraphX
Streaming
Spark SQL module
DataFrames
Spark SQL
◦ SQL
Hive
◦ HiveQL
◦ Spark Processing Engine
MLlib module
Data analysis
Statistics
Machine learning
GraphX module
Graph data storage and processing
GraphX
◦ In-memory graph data processing
HDFS
◦ Graph data storage
Streaming module
Continuously streaming data
Discretized Streams (DStreams)
Micro-batch processing
DStreams and micro-batch architecture
Source: http://www.slideshare.net/skpabba/hadoop-and-spark
RDD @ time 1 RDD @ time 2 RDD @ time 3
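The micro-batch idea in the diagram above can be sketched in plain Python. This is not the Spark Streaming API: the discretize function and the (timestamp, value) event format are assumptions made for illustration. A continuous stream of timestamped events is cut into fixed-length intervals, and each interval's batch plays the role of one "RDD @ time N".

```python
from collections import defaultdict


def discretize(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-length batches.

    Timestamps are assumed to be non-negative numbers in seconds.
    Batch i holds the events with i * batch_interval <= timestamp
    < (i + 1) * batch_interval. Intervals with no events produce
    an empty batch so that batch positions still line up with time.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        buckets[int(timestamp // batch_interval)].append(value)
    if not buckets:
        return []
    last_index = max(buckets)
    return [buckets.get(i, []) for i in range(last_index + 1)]


# Five events arriving over roughly three seconds, cut into 1-second batches.
events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d"), (2.7, "e")]
batches = discretize(events, batch_interval=1.0)
# batches == [["a", "b"], ["c"], ["d", "e"]]
```

Each inner list can then be processed with ordinary batch logic, which is the trade-off micro-batching makes: results lag by up to one batch interval in exchange for reusing the batch engine.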
Basic Spark Architecture
Processing: Spark SQL, MLlib, GraphX, Streaming
Single Abstraction Layer: Spark Core Libraries
Resource Manager (YARN)
Data Storage Layer (HDFS)
Physical Hardware
Changes with Spark 2.0
Spark 1.0
◦ RDD API
Spark 1.3
◦ RDD API
◦ DataFrame API
Spark 1.6
◦ RDD API
◦ DataFrame API
◦ Dataset API
Spark 2.0
◦ Dataset API
◦ DataFrame API
◦ RDD API
Changes with Spark 2.0
Spark 1.0
◦ RDD API
Spark 2.0
◦ Dataset API
◦ DataFrame API
◦ RDD API
Changes with Spark 2.0
Structured Stream Processing
◦ DataFrame API
◦ Dataset API
The story of how
Spark began
Taking things from the
beginning…
2009
Mesos
UC Berkeley
Interactive, iterative parallel processing (in-memory)
◦ Machine learning requirements
Integrates with Hadoop ecosystem
Dr. Ion Stoica
Computer Science Professor
UC Berkeley
Databricks… the cutting edge
of Spark
Delivers Apache Spark-as-a-Service
Most popular solution for deploying Spark on
the cloud
Dr. Ion Stoica
Executive Chairman, Databricks
Databricks… the cutting edge
of Spark
Spark on an as-needed basis
Automates
◦ Cluster building and configuration
◦ Security
◦ Process monitoring
◦ Resource monitoring
Notebooks
◦ For data analysis and machine learning using Python, R, and Scala
Data visualization capabilities
◦ Data visualization and dashboard design options
Is Spark a data
engineering or data
science platform?
DATA ENGINEERING COMPONENTS AND
TECHNOLOGIES
DATA SCIENCE COMPONENTS AND TECHNOLOGIES
Spark’s data engineering
elements
Automate cluster sizing and configuration requirements
Data Storage: HDFS
Resource Management:
◦ Spark Standalone
◦ Apache Mesos
◦ Hadoop YARN
Spark’s data engineering
elements
Spark Streaming Submodule – Reuse same code you use for batch
processing, but get real-time results!
◦ Integrates with big data sources, like:
◦ HDFS
◦ Flume
◦ Kafka
◦ Twitter
◦ ZeroMQ
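The "reuse the same code" claim can be illustrated with a small plain-Python sketch. It is not Spark Streaming code, and word_count, run_batch and run_streaming are hypothetical names chosen for this example. The business logic is written once as a function over a collection of lines, then applied either to the whole dataset or to each micro-batch as it arrives.

```python
from collections import Counter


def word_count(lines):
    """Business logic written once: count the words in an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_batch(lines):
    """Batch mode: apply the logic to the complete dataset in one go."""
    return word_count(lines)


def run_streaming(micro_batches):
    """Streaming mode: apply the same logic to each micro-batch.

    Yields a snapshot of the running totals after every batch, which
    is what a dashboard fed by the stream would display.
    """
    running = Counter()
    for batch in micro_batches:
        running += word_count(batch)
        yield Counter(running)


lines = ["spark streaming", "spark sql", "kafka spark"]
batch_result = run_batch(lines)
snapshots = list(run_streaming([lines[:1], lines[1:]]))
```

After the last micro-batch the streaming totals equal the batch result. The difference is that partial results are already available after the first micro-batch.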
Doing data science with Spark
Useful for machine learning and analysis of big data
Build big data analytics products
Programmable in Python, R, Scala, and SQL
Submodules:
◦ SQL and DataFrames
◦ MLlib for machine learning
◦ GraphX for in-memory big (graph) data computations
Doing data science with Spark
Spark integrates with the following data sources and formats:
◦ Hive, Avro, Parquet, CSV, JSON, JDBC, and HBase
◦ BI tools: Tableau, Qlik, Zoomdata, etc. (through JDBC)
Who is using
Spark and for
what?
AUTOMATIC LABS
LENDUP
SELLPOINTS
FINDIFY
Automatic Labs on Databricks
Making cars smarter with real-time analytics
Connect to, and make smart use of, your car’s data
Automatic Labs on Databricks
Automatic apps do things like:
◦ Decoding engine problems
◦ Locating parked cars
◦ Crash detection and response
◦ Low fuel warnings, etc.
Automatic is using Spark to make cars smarter with real-time analytics
During product development, Automatic needs to query, explore, and
visualize large amounts of data, QUICKLY. By moving this work over to
Spark, Automatic was able to:
◦ Validate products in days, not weeks
◦ Complete complex queries in minutes
◦ Free up 1 full-time data scientist
◦ Save $10K/month on infrastructure costs
LendUp on
Databricks
Improving the lending
process and experience
“Moving up the LendUp
Ladder means earning
access to more money, at
better rates, for longer
periods of time” - LendUp
LendUp on Databricks
LendUp uses Spark for:
◦ Feature engineering at scale
◦ Fast model building and testing
By using Spark to do this work, LendUp is able to:
◦ Build more accurate models, faster
◦ Offer more lines of credit
◦ Develop new products more quickly
◦ Increase in-house productivity of data science team
sellpoints on Databricks
Increasing ROI on ad spend
sellpoints on Databricks
Increasing ROI on ad spend
sellpoints offers services in:
◦ Identifying qualified shoppers
◦ Driving traffic
◦ Increasing sales conversion
By moving to Databricks, sellpoints was able to:
◦ Productize a new predictive analytics offering, improving ad spend ROI
threefold compared with competing offerings.
◦ Reduce the time and effort required to deliver actionable insights to the
business team while lowering costs.
◦ Improve productivity of the engineering and data science team by
eliminating the time spent on DevOps and maintaining open source
software.
Findify on Databricks
Improving shopping experience for ecommerce customers
Uses machine learning to continually improve search accuracy
Findify on Databricks
Improving shopping experience for ecommerce customers
By moving to Databricks, Findify was able to:
◦ Focus on development instead of infrastructure, allowing them to complete
their feature development projects faster and reduce customer frustration
with delayed analytics
◦ Focus on building innovative features - because the managed Spark platform
eliminated time spent on DevOps and infrastructure issues.
Uses machine learning to continually improve search accuracy
Got Spark skills?
Here’s why you
should
IMPACT ON SALARY
TRAINING ISSUES AND OPPORTUNITIES
How much do Spark skills pay?
2015 Data Science Salary Survey, by O’Reilly
Annual salary increase by skill:
◦ Spark skills: $11,000
◦ Scala programming: $4,000
◦ Basic exploratory analysis (>4 hr/wk): $4,600
◦ D3.js skills: $8,000
Getting training and
experience in Spark
$149.50 sale, until March 30 only
Discount code: ‘SPRING50’
Getting training and
experience in Spark
Get hands-on training in the following areas:
◦ Using RDD
◦ Writing applications using Scala
◦ Spark SQL
◦ Spark Streaming
◦ Machine Learning in Spark (MLlib)
◦ Spark GraphX
◦ Spark Project Implementation
Download these slides
Why Data Science From Simplilearn
Key Features
◦ 40 hours of real-life industry project experience
◦ 25 hours of high-quality e-learning
◦ 48 hours of live, instructor-led online sessions
◦ Visualize and optimize data effectively using the built-in tools in R, SAS, and Excel
◦ Get proficient in using R, SAS, and Excel to model data and predict solutions to business problems
◦ Master the concepts of statistical analysis, like linear and logistic regression, cluster analysis, and forecasting
OUR JOURNEY SO FAR
◦ Project Management
◦ Digital Marketing
◦ Big Data & Analytics
◦ Business Productivity Tools
◦ Quality Management
◦ Virtualization and Cloud Computing
◦ IT Security
◦ Financial Management
◦ CompTIA Certification
◦ IT Hardware and N/W
◦ ERP
◦ IT Services and Architecture
◦ Agile and Scrum Certification
◦ OS and Database
◦ Web and App Programming
Simplilearn : World’s Largest Certification Training Destination
One of the largest collections of accredited certification training in the
world.

Big Data 2.0 - How Spark technologies are reshaping the world of big data analytics

  • 1. Big Data 2.0 HOW SPARK TECHNOLOGIES ARE RESHAPING THE WORLD OF BIG DATA ANALYTICS Presented By: Lillian Pierson, P.E.
  • 2. Today’s webinar Apache Spark: Journey from “Hadoop Eco System component” to “Big Data platform” The story of how Spark began Is Spark a data engineering or data science platform? Who is using Spark and for what? Got Spark skills? Here’s why you should
  • 3. Apache Spark JOURNEY FROM “HADOOP ECO SYSTEM COMPONENT” TO “BIG DATA PLATFORM”
  • 5. Why in-memory applications? “In-memory computing appliances are … faster than the traditional Hadoop system because in-memory appliances don’t use MapReduce… By storing data in memory, in-memory appliances are able to bypass the time-consuming disk accesses that are required as part of the map and reduce operations that comprise the MapReduce process. In-memory data storage, processing, and analysis is fast enough to generate data analytics in real-time, derived from streaming data sources.” (Excerpt from my book: Big Data/Hadoop for Dummies)
  • 8. To big data platform HDFS MapReduce 2.0 Spark YARN
  • 9. To big data platform Spark-as-a-Service
  • 10. Spark’s 4 submodules Spark SQL MLlib GraphX Streaming
  • 11. Spark SQL module DataFrames Spark SQL ◦ SQL Hive ◦ HiveQL ◦ Spark Processing Engine
  • 13. GraphX module Graph data storage and processing GraphX ◦ In-memory graph data processing HDFS ◦ Graph data storage
  • 15. DStreams and micro-batch architecture Source: http://www.slideshare.net/skpabba/hadoop-and-spark RDD @ time 1 RDD @ time 2 RDD @ time 3
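The micro-batch idea on this slide can be sketched in plain Python. This is not the Spark Streaming API; it only illustrates how a continuous stream is cut into discrete batches (one per time interval, like "RDD @ time 1, 2, 3") and how the same batch function is then applied to each one. The function names `micro_batches` and `word_count`, the one-second interval, and the sample events are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def micro_batches(events: Iterable[Tuple[float, str]], interval: float) -> Dict[int, List[str]]:
    """Group (timestamp, value) events into consecutive fixed-width time windows."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for timestamp, value in events:
        # Every event falling in the same interval lands in the same batch.
        batches[int(timestamp // interval)].append(value)
    return dict(sorted(batches.items()))


def word_count(batch: Iterable[str]) -> Dict[str, int]:
    """An ordinary batch computation: count words across the lines of one batch."""
    counts: Dict[str, int] = {}
    for line in batch:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


# Hypothetical stream of (seconds since start, text) events.
events = [
    (0.2, "spark streaming"),
    (0.9, "spark sql"),
    (1.4, "graphx"),
    (2.1, "spark mllib"),
]

batches = micro_batches(events, interval=1.0)
# The same batch function runs unchanged on each micro-batch.
results = {index: word_count(batch) for index, batch in batches.items()}
```

Because `word_count` knows nothing about streaming, it could equally be run once over a full historical dataset, which is the sense in which batch code can be reused for near-real-time results.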
  • 16. Basic Spark Architecture Spark SQL MLlib GraphX Streaming Physical Hardware Data Storage Layer (HDFS) Resource Manager (YARN) Spark Core Libraries Single Abstraction Layer Processing Processing Processing Processing
  • 17. Changes with Spark 2.0 ◦ Spark 1.0: RDD API ◦ Spark 1.3: RDD API, DataFrame API ◦ Spark 1.6: RDD API, DataFrame API, Dataset API ◦ Spark 2.0: Dataset API, DataFrame API, RDD API
  • 18. Changes with Spark 2.0 ◦ Spark 1.0: RDD API ◦ Spark 2.0: Dataset API, DataFrame API, RDD API
  • 19. Changes with Spark 2.0 Structured Stream Processing DataFrame API Dataset API
  • 20. The story of how Spark began
  • 21. Taking things from the beginning… 2009 Mesos UC Berkeley Interactive, iterative parallel processing (in-memory) ◦ Machine learning requirements Integrates with Hadoop ecosystem Dr. Ion Stoica Computer Science Professor UC Berkeley
  • 22. Databricks… the cutting edge of Spark Delivers Apache Spark-as-a-Service Most popular solution for deploying Spark on the cloud Dr. Ion Stoica Executive Chairman, Databricks
  • 23. Databricks… the cutting edge of Spark Spark on an as-needed basis Automates ◦ Cluster building and configuration ◦ Security ◦ Process monitoring ◦ Resource monitoring Notebooks ◦ For data analysis and machine learning using Python, R, and Scala Data visualization capabilities ◦ Data visualization and dashboard design options
  • 24. Is Spark a data engineering or data science platform? DATA ENGINEERING COMPONENTS AND TECHNOLOGIES DATA SCIENCE COMPONENTS AND TECHNOLOGIES
  • 25. Spark’s data engineering elements Automate cluster sizing and configuration requirements Data Storage: HDFS Resource Management: ◦ Spark Standalone ◦ Apache Mesos ◦ Hadoop YARN
  • 26. Spark’s data engineering elements Spark Streaming Submodule – Reuse the same code you use for batch processing, but get real-time results! ◦ Integrates with big data sources, like: ◦ HDFS ◦ Flume ◦ Kafka ◦ Twitter and ◦ ZeroMQ
  • 27. Doing data science with Spark Useful for machine learning and analysis of big data Build big data analytics products Programmable in Python, R, Scala, and SQL Submodules: ◦ SQL and DataFrames ◦ MLlib for machine learning ◦ GraphX for in-memory big (graph) data computations
  • 28. Doing data science with Spark Spark integrates with the following data sources and formats: ◦ Hive, Avro, Parquet, CSV, JSON, and JDBC, HBase ◦ BI Tools: Tableau, QLIK, ZoomData, etc. (through JDBC)
  • 29. Who is using Spark and for what? Automatic Labs, LendUp, sellpoints, Findify
  • 30. Automatic Labs on Databricks Making cars smarter with real-time analytics Connect to, and make smart use of, your car’s data
  • 31. Automatic Labs on Databricks Automatic apps do things like: ◦ Decoding engine problems ◦ Locating parked cars ◦ Crash detection and response ◦ Low fuel warnings, etc. Automatic is using Spark to make cars smarter with real-time analytics During product development, Automatic needs to query, explore, and visualize large amounts of data, QUICKLY. By moving this work over to Spark, Automatic was able to: ◦ Validate products in days, not weeks ◦ Complete complex queries in minutes ◦ Free up 1 full-time data scientist ◦ Save $10K/month on infrastructure costs
  • 32. LendUp on Databricks Improving the lending process and experience “Moving up the LendUp Ladder means earning access to more money, at better rates, for longer periods of time” - LendUp
  • 33. LendUp on Databricks LendUp uses Spark for: ◦ Feature engineering at scale ◦ Fast model building and testing By using Spark to do this work, LendUp is able to: ◦ Build more accurate models, faster ◦ Offer more lines of credit ◦ Develop new products more quickly ◦ Increase in-house productivity of data science team
  • 35. sellpoints on Databricks Increasing ROI on ad spend sellpoints offers services in: ◦ Identifying qualified shoppers ◦ Driving traffic ◦ Increasing sales conversion By moving to Databricks, sellpoints was able to: ◦ Productize a new predictive analytics offering, improving the ad spend ROI threefold compared to competitive offerings. ◦ Reduce the time and effort required to deliver actionable insights to the business team while lowering costs. ◦ Improve productivity of the engineering and data science team by eliminating the time spent on DevOps and maintaining open source software.
  • 36. Findify on Databricks Improving shopping experience for ecommerce customers Uses machine learning to continually improve search accuracy
  • 37. Findify on Databricks Improving shopping experience for ecommerce customers By moving to Databricks, Findify was able to: ◦ Focus on development instead of infrastructure – Allowing them to complete their feature development projects faster and reduce customer frustration in delayed analytics ◦ Focus on building innovative features - because the managed Spark platform eliminated time spent on DevOps and infrastructure issues. Uses machine learning to continually improve search accuracy
  • 38. Got Spark skills? Here’s why you should IMPACT ON SALARY TRAINING ISSUES AND OPPORTUNITIES
  • 39. How much do Spark skills pay? 2015 Data Science Salary Survey, by O’Reilly Annual Salary Increase: ◦ Spark Skills: $11,000 ◦ Scala Programming: $4,000 ◦ Basic Exploratory Analysis (>4 hr/wk): $4,600 ◦ D3.js Skills: $8,000
  • 40. Getting training and experience in Spark $149.50 Sale Until March 30 Only Discount Code: ‘SPRING50’
  • 41. Getting training and experience in Spark Get hands-on training in the following areas: ◦ Using RDD ◦ Writing applications using Scala ◦ Spark SQL ◦ Spark Streaming ◦ Machine Learning in Spark (MLlib) ◦ Spark GraphX ◦ Spark Project Implementation
  • 44. Why Data Science From Simplilearn Key Features 40 hours of real life industry project experience 25 hours of High Quality e-learning Visualize and optimize data effectively using the built-in tools in R, SAS and Excel 48 hours of Live Instructor Led Online sessions Get proficient in using R, SAS and Excel to model data and predict solutions to business problems Master the concepts of statistical analysis like linear & logistic regression, cluster analysis & forecasting
  • 45. OUR JOURNEY SO FAR Project Management Digital Marketing Big Data & Analytics Business Productivity Tools Quality Management Virtualization and Cloud Computing IT Security Financial Management CompTIA Certification IT Hardware and N/W ERP IT Services and Architecture Agile and Scrum Certification OS and Database Web and App Programming Simplilearn : World’s Largest Certification Training Destination One of the largest collections of accredited certification training in the world.