4. AMPLab Overview
Project Launched Jan 2011, 6 Yr Planned Duration
Personnel: ~65 Students, Postdocs, Faculty and Staff
Funding: Government/Industry Partnership
NSF Expedition Award, DARPA XData, DoE, 20+ Companies
Key Outputs:
BDAS Open Source Stack & Apps (including Apache Spark)
Publications: Top Venues in ML, Systems, Databases and Others
“… the University of California, Berkeley’s AMPLab has already left an indelible mark on the world of information technology, and even the web. But we haven’t yet experienced the full impact of the group, … Not even close.”
-- Derrick Harris, GigaOm, August 2014
5. The AMPLab Faculty UC BERKELEY
Michael Franklin (Databases)
Michael Jordan (Machine Learning)
Ion Stoica (Systems)
Dave Patterson (Systems)
Scott Shenker (Networks)
Alex Bayen (Mobile Sensing)
David Culler (Systems/Sensing)
Ken Goldberg (Crowdsourcing)
Anthony Joseph (Security)
Randy Katz (Systems)
Michael Mahoney (ML)
Ben Recht (Machine Learning)
Raluca Popa (Systems/Security), joining in Summer 2015
6. Industrial Engagement
• Industrial-Strength Open Source Software
• Used by Sponsors, Start-ups and many others
• Regular interactions with top industry technologists: twice-yearly 3-day offsite retreats, AMPCamp training, some site visits
7. AMP: Integrating 3 Key Resources
Algorithms
• Machine Learning, Statistical Methods
• Prediction, Business Intelligence
Machines
• Clusters and Clouds
• Warehouse Scale Computing
People
• Crowdsourcing, Human Computation
• Data Scientists, Analysts
8. Our View of the Big Data Challenge
[Figure: Massive, Diverse and Growing Data is turned into an Answer, subject to tradeoffs among Time, Money and Quality]
Step 1: Improve efficiency (e.g., Spark, Tachyon)
Step 2: Enable intelligent tradeoffs (e.g., BlinkDB, SampleClean)
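One way to picture the time/quality tradeoff named on this slide is sampling-based approximate aggregation: answer from a small random sample, and report an error bound alongside the estimate. The sketch below is only an illustration under stated assumptions. It is not BlinkDB's or SampleClean's implementation; the function `approximate_mean`, the synthetic data, and the normal-approximation 95% bound are all hypothetical choices made for this example.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, half_width), where half_width is an approximate
    95% confidence half-width based on the normal approximation.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, 1.96 * std_error


# Synthetic data standing in for a large table column (assumption).
data_rng = random.Random(42)
data = [data_rng.gauss(100.0, 15.0) for _ in range(100_000)]

exact = statistics.fmean(data)                         # full scan: slow, exact
small_est, small_err = approximate_mean(data, 100)     # fast, coarse
large_est, large_err = approximate_mean(data, 10_000)  # slower, tighter

print(f"exact mean        : {exact:.2f}")
print(f"100-row sample    : {small_est:.2f} +/- {small_err:.2f}")
print(f"10,000-row sample : {large_est:.2f} +/- {large_err:.2f}")
```

Reading fewer rows costs less time (and money) but widens the error bound; the caller picks the point on that curve that the application can tolerate.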
9. The Research Challenge
[two terms shown as graphics, not recoverable] + Integration + Extreme Elasticity + Tradeoffs + More Sophisticated Analytics = Extreme Complexity
10. Arc of our Research Program
Early work on Foundations (Yrs 1-2):
Algorithms – Bag of Little Bootstraps
Machines – Mesos and Spark
People – CrowdDB Prototype
Filling out the Analytics Stack (Yrs 3-4): <you are here>
Algorithms – ML Pipelines, Async Algorithms, Concurrency Control
Machines – Tachyon, SQL, Graphs, Streams, R, Performance
People – Hybrid Human/Machine Data Cleaning/Integration
Moving Up the Stack/Expanding the Footprint (Yrs 5-6):
Algorithms – MLlib build out, Declarative ML (MLBase)
Machines – New Storage/Processing Archs, Data/Model
11. Big Data Ecosystem Evolution
General batch processing:
MapReduce
Specialized systems (iterative, interactive and streaming apps):
Pregel, Dremel, GraphLab, Storm, Giraph, Drill, Tez, Impala, S4, …
12. AMPLab Unification Philosophy
Don’t specialize MapReduce – Generalize it!
Two additions to Hadoop MR can enable all the models shown earlier!
1. General Task DAGs
2. Data Sharing
For Users:
Fewer Systems to Use
Less Data Movement
[Figure: Spark at the core, with Streaming, GraphX, SparkSQL, MLbase, … built on top]
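The two additions named on this slide, general task DAGs and data sharing, can be sketched in a few lines. The toy `Dataset` class below is a hypothetical, single-machine illustration written for this example; it is not Spark's API or implementation. Each transformation creates a new DAG node that records its parents, nothing runs until `collect()` is called, and `cache()` keeps a node's result in memory so that several downstream computations share it instead of recomputing it.

```python
from typing import Callable, List, Optional


class Dataset:
    """A node in a lazily evaluated task DAG (hypothetical toy class)."""

    def __init__(self, compute: Callable[[], list],
                 parents: Optional[List["Dataset"]] = None):
        self._compute = compute
        self.parents = list(parents or [])  # edges of the DAG
        self._keep = False                  # set by cache()
        self._cached: Optional[list] = None
        self.evaluations = 0                # times this node was computed

    @staticmethod
    def from_items(items) -> "Dataset":
        data = list(items)
        return Dataset(lambda: list(data))

    def map(self, fn) -> "Dataset":
        return Dataset(lambda: [fn(x) for x in self.collect()], [self])

    def filter(self, pred) -> "Dataset":
        return Dataset(lambda: [x for x in self.collect() if pred(x)], [self])

    def union(self, other: "Dataset") -> "Dataset":
        # A node with two parents: beyond a fixed map-then-reduce shape.
        return Dataset(lambda: self.collect() + other.collect(), [self, other])

    def cache(self) -> "Dataset":
        """Mark this node's result to be kept in memory once computed."""
        self._keep = True
        return self

    def collect(self) -> list:
        if self._cached is not None:
            return self._cached  # shared in-memory result, no recomputation
        self.evaluations += 1
        result = self._compute()
        if self._keep:
            self._cached = result
        return result


base = Dataset.from_items(range(10))
squares = base.map(lambda x: x * x).cache()    # shared intermediate data
evens = squares.filter(lambda x: x % 2 == 0)   # first consumer
large = squares.filter(lambda x: x > 20)       # second consumer
combined = evens.union(large)                  # joins two branches of the DAG

result = combined.collect()
print(result)
print("squares computed", squares.evaluations, "time(s)")
```

Without the `cache()` call, `squares` would be recomputed once per consumer; with it, both branches read the same in-memory result, which is the "less data movement" benefit listed for users.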
13. Berkeley Data Analytics Stack (open source software)
In-house Apps: Cancer Genomics, Energy Debugging, Smart Buildings
Access and Interfaces: BlinkDB, SampleClean, SparkR, MLBase, Spark Streaming, Shark, GraphX, MLlib, Velox Model Serving
Processing Engine: Spark
Storage: Tachyon; HDFS, S3, …
Resource Virtualization: Mesos, Yarn
14. Berkeley Data Analytics Stack (open source software)
In-house Apps: Cancer Genomics, Energy Debugging, Smart Buildings
Access and Interfaces: BlinkDB, SampleClean, SparkR, MLBase, Spark Streaming, Shark, SparkSQL, GraphX, MLlib, Velox Model Serving
Processing Engine: Spark
Storage: Tachyon; HDFS, S3, …
Resource Virtualization: Mesos, Yarn
[Two components in the diagram carry an "Apache" label; which ones is not recoverable from the text]
15. Some Academic Accolades
Ph.D. and postdoc alumni (2013/14) have accepted faculty jobs at: Brown, Harvey Mudd, MIT (3), Stanford, UCLA, UT Austin
Best Paper Awards: BPOE 14, EuroSys 13, ICDE 13, NSDI 12, SIGCOMM 12
Best Demo: SIGMOD 12, VLDB 11
CACM “Research Highlight” Selections 2014 and 2015
16. About AMPCamp
Today
• BDAS and Stack Component Overviews
• Hands On Exercises
• Use Cases
• Reception and Networking
Tomorrow
• Research and ML Overviews
• Advanced Hands On Exercises (including genomics)
History
AMPCamp I @ Berkeley, August 2012
AMPCamp II @ Strata NYC, Feb 2013
AMPCamp III @ Berkeley, August 2013
AMPCamp IV @ Strata Santa Clara, Feb 2014
AMPCamp V @ Berkeley, Nov 2014
Also “Spark Camp”: AMPCamp Spinoff
17. AMPCamp Made Possible By
Rachit Agarwal
Elaine Angelino
Peter Bailis
Dan Crankshaw
Ankur Dave
Joseph Gonzalez
Daniel Haas
Sanjay Krishnan
Haoyuan Li
Frank Austin Nothaft
Xinghao Pan
Pedro Rodriguez
Ginger Smith
Evan Sparks
Shivaram Venkataraman
Jiannan Wang
Zongheng Yang
Ameet Talwalkar
Jey Kottalam
Kattt Atchley
Carlyn Chinen
Boban Zarkovich
Jon Kuroda
18. To find out more or get involved:
UC BERKELEY
amplab.berkeley.edu
franklin@berkeley.edu
Thanks to NSF CISE Expeditions in Computing, DARPA XData,
Founding Sponsors: Amazon Web Services, Google, and SAP,
the Thomas and Stacy Siebel Foundation,
and all our industrial sponsors and partners.