2. It’s All Happening On-line
User Generated (Web, Social & Mobile)
Internet of Things / M2M
Scientific Computing
Every:
• Click
• Ad impression
• Billing event
• …
• Fast forward, pause, …
• Friend request
• Transaction
• Network message
• Fault
• …
3. Volume: Petabytes+
Variety: Unstructured
Velocity: Real-Time
Our view: More data should mean better answers
• Must balance Cost, Time, and Answer Quality
5. UC BERKELEY
Algorithms: Machine Learning and Analytics
Machines: Cloud Computing
People: CrowdSourcing & Human Computation
Massive and Diverse Data
7. Alex Bayen (Mobile Sensing)
Ken Goldberg (Crowdsourcing)
*Michael Franklin (Databases)
Armando Fox (Systems)
*Mike Jordan (Machine Learning)
Anthony Joseph (Security/Privacy)
Randy Katz (Systems)
Dave Patterson (Systems)
*Ion Stoica (Systems)
Scott Shenker (Networking)
Organized for Collaboration:
10. • Sequencing costs (150X) Big Data
• UCSF cancer researchers + UCSC cancer genetic database + AMP Lab + Intel Cluster
@TCGA: 5 PB = 20 cancers x 1000 genomes
(Chart: sequencing cost in $K per genome, log-scale axis from $100,000.0 down to $0.1, 2001 - 2014)
• See Dave Patterson’s Talk: Thursday 3-4, BDT205
David Patterson, “Computer Scientists May Have What It Takes to Help Cure Cancer,” New York Times, 12/5/2011
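The TCGA figure on this slide (5 PB = 20 cancers x 1000 genomes) implies an average amount of data per genome. The sketch below works that out, assuming decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and an even split across genomes. The function name and the unit convention are illustrative assumptions and do not come from the slides.

```python
# Back-of-the-envelope check of the TCGA figure quoted on the slide.
# Assumption: decimal units and data spread evenly over all genomes.
PB = 10**15  # bytes in a petabyte (decimal)
GB = 10**9   # bytes in a gigabyte (decimal)


def bytes_per_genome(total_bytes, cancers, genomes_per_cancer):
    """Average bytes per genome (hypothetical helper for illustration)."""
    return total_bytes / (cancers * genomes_per_cancer)


per_genome = bytes_per_genome(5 * PB, cancers=20, genomes_per_cancer=1000)
print(f"{per_genome / GB:.0f} GB per genome")  # prints "250 GB per genome"
```

Under these assumptions the slide's numbers work out to roughly 250 GB per genome, averaged over 20,000 genomes.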