Big Data Day LA 2016 Keynote - Reynold Xin / Databricks
1. Scaling Big Data, a Spark perspective
Reynold Xin
@rxin
2016-07-09 Big Data LA
2. Scaling Big Data
Early adopters
Data Scientists
Statisticians
Physicists
R users
PyData
…
Citizen data scientists
Sophisticated engineering teams
3. Spark Philosophy
Unified engine
Support end-to-end applications
High-level APIs
Easy to use, rich optimizations
Integrate broadly
Storage systems, libraries, etc
SQL Streaming ML Graph
…
4. Apache Spark 2.0
Next major release, coming out in the next few weeks
• Unstable preview release at spark.apache.org
• 2.0.0-rc2 available on dev@spark mailing list
Remains highly compatible with Apache Spark 1.X
17k patches (2500 for 2.0) from 1200+ contributors
5. New in 2.0
Structured API improvements
(DataFrame, Dataset, SparkSession)
Structured Streaming
MLlib model export
R bindings
SQL 2003
Performance improvements
Deep learning libraries
(Baidu, Yahoo!, Berkeley, Databricks)
GraphFrames
PyData integration
Reactive streams
C# bindings: Mobius
JS bindings: EclairJS
Broader Community
7. The largest challenge in applying big data is the skills gap.
StackOverflow Developer Survey 2016
8. Massive Open Online Courses
Free 5-course series on big data with Apache Spark
dbricks.co/mooc16
Introduction to Apache Spark™
Distributed Machine Learning with Apache Spark™
Big Data Analysis with Apache Spark™
Advanced Apache Spark™ for Data Science and Data Engineering
Advanced Machine Learning with Apache Spark™
9. Databricks Community Edition
Free version of Databricks with:
• Interactive tutorials
• Apache Spark and popular data science libraries
• Visualization & debug tools
databricks.com/ce