In this one-hour webinar, you will be introduced to Spark, the data engineering that supports it, and the data science advances that it has spurred. You’ll discover the interesting story of its academic origins and then get an overview of the organizations that are using the technology. After being briefed on some impressive Spark case studies, you’ll learn about the next-generation Spark 2.0 (to be released in just a few months). We will also tell you about the tremendous impact that learning Spark can have on your current salary, and the best ways to get trained in this ground-breaking new technology.
Big Data 2.0 - How Spark technologies are reshaping the world of big data analytics
1. Big Data 2.0
HOW SPARK TECHNOLOGIES ARE RESHAPING THE
WORLD OF BIG DATA ANALYTICS
Presented By: Lillian Pierson, P.E.
2. Today’s webinar
Apache Spark: the journey from “Hadoop ecosystem component” to “Big Data platform”
The story of how Spark began
Is Spark a data engineering or data science platform?
Who is using Spark and for what?
Got Spark skills? Here’s why you should
5. Why in-memory applications?
“In-memory computing appliances are … faster than the traditional Hadoop system because in-memory appliances don’t use MapReduce… By storing data in memory, in-memory appliances are able to bypass the time-consuming disk accesses that are required as part of the map and reduce operations that comprise the MapReduce process. In-memory data storage, processing, and analysis is fast enough to generate data analytics in real time, derived from streaming data sources.” – Excerpt from my book, Big Data/Hadoop for Dummies
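The passage above can be illustrated with a toy word count. Classic MapReduce writes the intermediate (word, 1) pairs to disk between the map and reduce stages, while an in-memory engine like Spark keeps them in RAM. A minimal plain-Python sketch of the two stages (an illustration of the pattern, not actual Hadoop or Spark code):

```python
from collections import defaultdict

def map_phase(lines):
    # Map stage: emit a (word, 1) pair for every word.
    # In Hadoop MapReduce this intermediate output is spilled to disk;
    # an in-memory engine keeps it in RAM instead, avoiding disk I/O.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Reduce stage: sum the counts for each word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["Spark keeps data in memory", "MapReduce writes data to disk"]
counts = reduce_phase(map_phase(lines))
print(counts["data"])  # 2 -- each input line mentions "data" once
```

The speed difference is not in the map/reduce logic itself, which is identical, but in where the intermediate pairs live between the two stages.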
17. Changes with Spark 2.0
How the core APIs have evolved, release by release:
◦ Spark 1.0 – RDD API
◦ Spark 1.3 – RDD API; DataFrame API
◦ Spark 1.6 – RDD API; DataFrame API; Dataset API
◦ Spark 2.0 – Dataset API (with the DataFrame API merged into it); RDD API
18. Changes with Spark 2.0
◦ Spark 1.0 – the RDD API is the primary programming interface
◦ Spark 2.0 – the Dataset and DataFrame APIs are the primary interfaces, running on top of the RDD API
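The shift from the RDD API to the DataFrame/Dataset APIs is essentially a shift from opaque records plus arbitrary lambdas to named columns with a schema that the engine can optimize. A plain-Python sketch of the conceptual difference (real code would use pyspark; the data here is illustrative only):

```python
# RDD style: a collection of opaque records. The engine sees only
# black-box functions, so it cannot inspect or optimize the query.
rdd = [("alice", 34), ("bob", 19), ("carol", 42)]
adults_rdd = [name for (name, age) in rdd if age >= 21]

# DataFrame style: named columns with a known schema. Because the
# engine knows WHICH column is being filtered, it can push predicates
# down, prune unused columns, and generate efficient code
# (what Spark's Catalyst optimizer and Tungsten engine do).
dataframe = {"name": ["alice", "bob", "carol"], "age": [34, 19, 42]}
adults_df = [n for n, a in zip(dataframe["name"], dataframe["age"]) if a >= 21]

print(adults_rdd)  # ['alice', 'carol']
print(adults_df)   # ['alice', 'carol']
```

Both styles produce the same answer; the difference is how much the engine can see, and therefore optimize, before executing.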
19. Changes with Spark 2.0
◦ Structured stream processing is built on top of the DataFrame and Dataset APIs
21. Taking things from the beginning…
2009 – Mesos project, UC Berkeley
Interactive, iterative parallel processing (in-memory)
◦ Machine learning requirements
Integrates with the Hadoop ecosystem
Dr. Ion Stoica
Computer Science Professor, UC Berkeley
22. Databricks… the cutting edge of Spark
Delivers Apache Spark-as-a-Service
Most popular solution for deploying Spark in the cloud
Dr. Ion Stoica
Executive Chairman, Databricks
23. Databricks… the cutting edge
of Spark
Spark on an as-needed basis
Automates
◦ Cluster building and configuration
◦ Security
◦ Process monitoring
◦ Resource monitoring
Notebooks
◦ For data analysis and machine learning using Python, R, and Scala
Data visualization capabilities
◦ Data visualization and dashboard design options
24. Is Spark a data engineering or data science platform?
DATA ENGINEERING COMPONENTS AND
TECHNOLOGIES
DATA SCIENCE COMPONENTS AND TECHNOLOGIES
25. Spark’s data engineering
elements
Automate cluster sizing and configuration requirements
Data Storage: HDFS
Resource Management:
◦ Spark Standalone
◦ Apache Mesos
◦ Hadoop YARN
26. Spark’s data engineering
elements
Spark Streaming Submodule – Reuse the same code you use for batch
processing, but get real-time results!
◦ Integrates with big data sources, like:
◦ HDFS
◦ Flume
◦ Kafka
◦ Twitter and
◦ ZeroMQ
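The “same code for batch and streaming” idea can be sketched in plain Python: write the transformation once, then apply it to a complete dataset (batch mode) or to each arriving micro-batch (streaming mode). This is only an illustration of the pattern, not the actual Spark Streaming API:

```python
def transform(records):
    # One piece of business logic, written once:
    # keep error events and normalize them.
    return [r.strip().upper() for r in records if "error" in r.lower()]

# Batch mode: apply the logic to the whole dataset at once.
batch = ["ok", "Error: disk full", "error: timeout "]
batch_result = transform(batch)

# Streaming mode: apply the SAME function to each arriving
# micro-batch -- this is how Spark Streaming reuses batch code.
stream_result = []
for micro_batch in (["ok", "ERROR: oom"], ["fine"], ["error: retry"]):
    stream_result.extend(transform(micro_batch))

print(batch_result)   # ['ERROR: DISK FULL', 'ERROR: TIMEOUT']
print(stream_result)  # ['ERROR: OOM', 'ERROR: RETRY']
```

Because the transformation is decoupled from how the data arrives, nothing has to be rewritten when a batch pipeline is pointed at a live source like Kafka or Flume.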
27. Doing data science with Spark
Useful for machine learning and analysis of big data
Build big data analytics products
Programmable in Python, R, Scala, and SQL
Submodules:
◦ SQL and DataFrames
◦ MLlib for machine learning
◦ GraphX for in-memory big (graph) data computations
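The data-parallel pattern that underlies MLlib-style computation — each partition produces a small partial result, and partial results are cheaply merged — can be sketched with a mean computed the way a distributed engine would: a map over partitions followed by a single reduce. This is a hand-rolled illustration of the pattern, not MLlib code:

```python
from functools import reduce

def partition_stats(partition):
    # Map side: each worker summarizes only its own slice of the data.
    return (sum(partition), len(partition))

def combine(a, b):
    # Reduce side: partial (sum, count) results are cheap to merge,
    # so only tiny tuples -- not the raw data -- cross the network.
    return (a[0] + b[0], a[1] + b[1])

# Three "partitions" standing in for data spread across workers.
partitions = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]
total, count = reduce(combine, map(partition_stats, partitions))
mean = total / count
print(mean)  # 3.5
```

The same map-partial-results-then-merge shape is what lets iterative algorithms (gradient descent, clustering) scale: each pass ships a small aggregate per partition, while the data itself stays cached in memory on the workers.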
28. Doing data science with Spark
Spark integrates with the following data sources and formats:
◦ Hive, Avro, Parquet, CSV, JSON, JDBC, and HBase
◦ BI Tools: Tableau, QLIK, ZoomData, etc. (through JDBC)
29. Who is using Spark and for what?
◦ Automatic Labs
◦ LendUp
◦ SellPoints
◦ Findify
30. Automatic Labs on Databricks
Making cars smarter with real-time analytics
Connect to, and make smart use of, your car’s data
31. Automatic Labs on Databricks
Automatic apps do things like:
◦ Decoding engine problems
◦ Locating parked cars
◦ Crash detection and response
◦ Low fuel warnings, etc.
Automatic is using Spark to make cars smarter with real-time analytics
During product development, Automatic needs to query, explore, and
visualize large amounts of data, QUICKLY. By moving this work over to
Spark, Automatic was able to:
◦ Validate products in days, not weeks
◦ Complete complex queries in minutes
◦ Free up 1 full-time data scientist
◦ Save $10K/month on infrastructure costs
32. LendUp on
Databricks
Improving the lending
process and experience
“Moving up the LendUp
Ladder means earning
access to more money, at
better rates, for longer
periods of time” - LendUp
33. LendUp on Databricks
LendUp uses Spark for:
◦ Feature engineering at scale
◦ Fast model building and testing
By using Spark to do this work, LendUp is able to:
◦ Build more accurate models, faster
◦ Offer more lines of credit
◦ Develop new products more quickly
◦ Increase in-house productivity of data science team
35. SellPoints on Databricks
Increasing ROI on ad spend
SellPoints offers services in:
◦ Identifying qualified shoppers
◦ Driving traffic
◦ Increasing sales conversion
By moving to Databricks, SellPoints was able to:
◦ Productize a new predictive analytics offering, improving the ad spend ROI
by threefold compared to competitive offerings.
◦ Reduce the time and effort required to deliver actionable insights to the
business team while lowering costs.
◦ Improve productivity of the engineering and data science team by
eliminating the time spent on DevOps and maintaining open source
software.
36. Findify on Databricks
Improving shopping experience for ecommerce customers
Uses machine learning to continually improve search accuracy
37. Findify on Databricks
Improving shopping experience for ecommerce customers
By moving to Databricks, Findify was able to:
◦ Focus on development instead of infrastructure – Allowing them to complete
their feature development projects faster and reduce customer frustration
in delayed analytics
◦ Focus on building innovative features - because the managed Spark platform
eliminated time spent on DevOps and infrastructure issues.
41. Getting training and
experience in Spark
Get hands-on training in the following areas:
◦ Using RDDs
◦ Writing applications using Scala
◦ Spark SQL
◦ Spark Streaming
◦ Machine Learning in Spark (MLlib)
◦ Spark GraphX
◦ Spark Project Implementation
44. Why Data Science From Simplilearn
Key Features:
◦ 48 hours of live instructor-led online sessions
◦ 25 hours of high-quality e-learning
◦ 40 hours of real-life industry project experience
◦ Visualize and optimize data effectively using the built-in tools in R, SAS, and Excel
◦ Get proficient in using R, SAS, and Excel to model data and predict solutions to business problems
◦ Master the concepts of statistical analysis, like linear & logistic regression, cluster analysis & forecasting
45. OUR JOURNEY SO FAR
Simplilearn: World’s Largest Certification Training Destination – one of the largest collections of accredited certification training in the world.
Course categories:
◦ Project Management
◦ Digital Marketing
◦ Big Data & Analytics
◦ Business Productivity Tools
◦ Quality Management
◦ Virtualization and Cloud Computing
◦ IT Security
◦ Financial Management
◦ CompTIA Certification
◦ IT Hardware and Networking
◦ ERP
◦ IT Services and Architecture
◦ Agile and Scrum Certification
◦ OS and Database
◦ Web and App Programming