1
Big Data
Past, Present & Future
Where are We Headed?
Rob Peglar
CTO Americas
Isilon Storage Division
EMC Corporation
rob.peglar@emc.com
@peglarr
2
Prediction Is Very Difficult,
Especially About the Future
– Niels Bohr
• In order to understand what’s coming, we must understand our past
• We must also understand that Big Data is fundamentally different from what we’re used to
• Consider the difference between a still photograph and a movie – and our human perception of them
– More than a collection of still photographs – why?
3
The Past –
and I Mean the Past
• Consider the census…
• From the Latin “censere”
– meaning “to estimate”
• “In those days a decree went out from Emperor Augustus that all
the world should be registered.” Luke 2:1
• The Domesday Book of 1086 – England
– Comprehensive tally of people, their land, and property
• The US Constitution mandates a decennial census
– The 1880 census took eight years (!) to complete
• This led to Hollerith’s punched card tabulator in 1890
– The beginning of automated data processing
– Reduced the census time to one year
4
Sampling – Good or Bad?
• Sampling precision improves most with randomness
– Not with sample size
– Jerzy Neyman (Poland, 1934) proved this
• Neyman, J. (1934) "On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection", Journal of the Royal Statistical Society, 97 (4), 557–625
• Good - Sampling was a solution to information overload
• Bad - Systematic bias in sampling gives wrong conclusions
• A seismic shift is occurring – from
– Sampling, keeping datasets small on purpose, using them once…to
– N=all, keeping datasets large on purpose, using them many times
• Why? The outliers are the most interesting!
– Examples – credit card fraud, language translation, insurability
– Don’t just follow the rules, look for the exceptions
(Image: Williams tube, 1946, 1024 bits)
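The sampling claim above (randomness matters more than size) can be illustrated with a small simulation. This is a minimal sketch, assuming a made-up uniform population and a hypothetical "biased" selection rule that only ever reaches the lower half of it; it is not Neyman's stratified-sampling method, just a contrast between random and systematically biased selection.

```python
import random
import statistics

def sampling_error_demo(seed: int = 42) -> tuple[float, float]:
    """Compare a small random sample with a large but systematically biased one.

    The population and the biased selection rule are hypothetical,
    chosen only to illustrate the point on the slide.
    """
    rng = random.Random(seed)
    # Hypothetical population: 100,000 values spread uniformly over 0..99,999.
    population = list(range(100_000))
    true_mean = statistics.fmean(population)

    # Small random sample: 1,000 values drawn without replacement.
    random_sample = rng.sample(population, k=1_000)

    # Large biased sample: 50,000 values, but only from the lower half,
    # standing in for a selection rule that systematically misses part of the population.
    biased_sample = population[:50_000]

    random_error = abs(statistics.fmean(random_sample) - true_mean)
    biased_error = abs(statistics.fmean(biased_sample) - true_mean)
    return random_error, biased_error

random_error, biased_error = sampling_error_demo()
print(f"random sample (n=1,000):  error = {random_error:,.1f}")
print(f"biased sample (n=50,000): error = {biased_error:,.1f}")
```

The biased sample is 50 times larger, yet its error is exactly 25,000 because it never sees the upper half; the random sample's error is a small fraction of that. More data does not fix systematic bias.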
5
The Journey from
Clean to Messy
• 1998 – Linden et al.: collaborative filtering patent, filed while working at a Seattle startup selling books online
– G. Linden, J. Jacobi, and E. Benson, Collaborative Recommendations Using Item-to-Item Similarity Mappings, US Patent 6,266,649 (to Amazon.com), Patent and Trademark Office, Washington, D.C., 2001
• “If it works perfectly, Amazon should show you just one
book – the next one you will buy.” (Linden)
• The hypothesis-driven approach becomes data-driven
– From “proving” something (causation) to finding correlation
• McGregor et al. – using big data to improve care in the NICU
– 16 data streams, 1,260 data points/sec
– Reduced adverse outcomes for premature infants
– No “proof” – it helps doctors make better diagnostic decisions
– Carolyn McGregor, "Big Data in Neonatal Intensive Care," Computer, vol. 46, no. 6, pp. 54-59, June 2013, doi:10.1109/MC.2013.157
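Linden's patent describes recommending items from item-to-item similarity rather than from similar customers. The sketch below is a simplified illustration of that idea using cosine similarity over binary "who bought what" vectors; the purchase data, function names, and scoring rule are hypothetical, and this is not the patented algorithm or Amazon's production system.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories: customer -> set of item IDs.
purchases = {
    "alice": {"book_a", "book_b", "book_c"},
    "bob":   {"book_a", "book_b"},
    "carol": {"book_b", "book_c"},
    "dave":  {"book_a", "book_d"},
}

def item_similarities(purchases):
    """Cosine similarity between items, treating each item as a binary vector over customers."""
    buyers = defaultdict(set)  # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    sims = defaultdict(dict)
    for i, j in combinations(sorted(buyers), 2):
        overlap = len(buyers[i] & buyers[j])
        if overlap:
            score = overlap / math.sqrt(len(buyers[i]) * len(buyers[j]))
            sims[i][j] = sims[j][i] = score
    return sims

def recommend(customer, purchases, sims, top_n=3):
    """Score unseen items by summing their similarity to items the customer already owns."""
    owned = purchases[customer]
    scores = defaultdict(float)
    for item in owned:
        for other, score in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_n]

sims = item_similarities(purchases)
print(recommend("bob", purchases, sims))
```

Similarities are computed once, offline, from the whole purchase history; a recommendation then only needs a lookup over the items a customer already owns, which is what makes the item-to-item approach attractive at catalog scale.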
6
Manholes and Raw Data - Correlations
• 94,000 miles of underground cable in NYC; 51,000 manholes and service boxes in Manhattan alone
• 1 in 20 cables laid before 1930; some Edison-era
• Records kept since the 1880s – as many as 38 different terms for the same thing
– All hand-written, paper, cards, ledgers, etc.
• 2008 - How to prevent fires, exploding manholes?
• Machine-correlate 106 predictors of imminent disaster
– The top 10% of manholes ranked as riskiest accounted for 44% of all failures
• Chris Anderson – “data deluge makes scientific method obsolete”
– http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
• “Datafication” – everything is data
– Numbers to words to images to locations to relationships to feelings …
– Graph theory & graph analysis change the way we perceive the world
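The manhole result is a "capture rate": rank every item by predicted risk, take the top slice, and ask what share of the actual failures fell inside it. A minimal sketch of that metric, assuming hypothetical risk scores and outcomes; the 106-predictor model itself is not reproduced here.

```python
def capture_rate(risk_scores, failed, top_fraction=0.10):
    """Share of all failures that fall within the top `top_fraction` of items ranked by risk."""
    if len(risk_scores) != len(failed):
        raise ValueError("risk_scores and failed must have the same length")
    # Indices of items, riskiest first.
    ranked = sorted(range(len(risk_scores)), key=lambda i: risk_scores[i], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    total_failures = sum(failed)
    if total_failures == 0:
        return 0.0
    caught = sum(failed[i] for i in ranked[:k])
    return caught / total_failures

# Hypothetical data for 20 manholes: model risk score and whether it later failed (1) or not (0).
scores = [0.95, 0.90, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35,
          0.30, 0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.05, 0.03, 0.01]
failed = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0,
          0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

print(f"top 10% captures {capture_rate(scores, failed):.0%} of failures")
```

With these made-up numbers the top 10% (2 of 20 manholes) captures 2 of the 5 failures, i.e. 40%, against the 10% a random pick would be expected to capture. The model only has to rank well, not explain why a manhole fails.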
7
© Copyright 2014 EMC Corporation. All rights reserved.
The Present - Architecture
(Diagram: big data architecture, from data creation to business value)
• Stages (and roles): Data Creation (Producers) → Data Acquisition (Architects / Engineers) → Info Processing (Analysts / Scientists) → Business Process (End Users)
• Data sources: Email and Messaging; Mobile Apps Data; Transaction and Usage Logs; Machine and Sensors; Geolocation; Relationships and Social Influence
• Infrastructure for Volume, Velocity, Variety (Systems Integration): Shared Nothing; Scale-out Storage + SSD; MPP + In-Memory Compute; Hadoop; Hi-Speed / Hi-Resiliency Networking; Converged Infrastructure; Cloud; Non-relational; DWH
• Objectives: Stream Processing; Event Management; Data Exploration; Contextualized Data; Modeling / Scenarios; Forecasting
• Delivery models:
– On-demand: Access-Anywhere Analytics Services; Context-Aware Business Applications
– Push: Location-Based Services; Alert and Respond
– Embedded: Workflow and Interaction Automation; Smart Devices and Systems
• From Real-time Events to Deep Insights → Value
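The "Shared Nothing" and "MPP" boxes refer to a pattern where each node owns a disjoint slice of the data and works on it independently. A minimal single-process sketch of that pattern, assuming a hypothetical key/value record format and simulating nodes as list shards: records are hash-partitioned by key, aggregated locally, and the per-node results merged. It does not use Hadoop or any EMC product API.

```python
import zlib
from collections import Counter

def partition(records, num_nodes):
    """Hash-partition records by key so each node owns a disjoint slice (no shared state)."""
    shards = [[] for _ in range(num_nodes)]
    for key, value in records:
        # crc32 gives a stable hash across runs, unlike Python's salted str hash.
        shards[zlib.crc32(key.encode()) % num_nodes].append((key, value))
    return shards

def local_aggregate(shard):
    """Work done independently on one node: sum values per key in its own shard."""
    totals = Counter()
    for key, value in shard:
        totals[key] += value
    return totals

def shared_nothing_sum(records, num_nodes=4):
    shards = partition(records, num_nodes)
    # On a real cluster these would run in parallel, one per node.
    partials = [local_aggregate(s) for s in shards]
    # Partitioning is by key, so each key lives on exactly one node
    # and merging is a plain union of the per-node results.
    result = {}
    for p in partials:
        result.update(p)
    return result

# Hypothetical sensor readings: (sensor_id, value).
records = [("sensor-1", 3), ("sensor-2", 5), ("sensor-1", 4), ("sensor-3", 1), ("sensor-2", 2)]
print(shared_nothing_sum(records))
```

Because each key is owned by one node, the merge needs no coordination. With a different partitioning scheme, partial results for the same key would have to be combined (for example, summed) in an extra step.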
8
The Present – Business Value of Data
• Data is valuable – re-use of data even more so
– Not ephemeral value – can be re-consumed ad infinitum
– Economists call this a “non-rivalrous” good
• Cost/benefit of storage ~ 0 – so keep everything
– Ewan Birney, European Bioinformatics Institute, “Hidden Treasures in Junk DNA” http://www.scientificamerican.com/article.cfm?id=hidden-treasures-in-junk-dna
– Over the last 50 years, cost/byte has roughly halved every 2 years
– Density has increased ~50 million times since 1956
• Consider electric cars:
– Battery level indicates when to “fill up” from the power grid
– Power utility monitors grid usage over time
– Correlate both data sets together
• Determine where (on which roads) and when to build recharge stations
• Recombinant data
– “Old” data combined into new forms for new insights
– “Noisy” datasets enable feedback loops – e.g. better/faster search/index
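The "keep everything" argument rests on compounding. A worked check of the halving rate quoted on this slide, assuming a clean halving every 2 years over exactly 50 years: that is 25 halvings, roughly a 33.5-million-fold drop in cost per byte. That is the same order of magnitude as the ~50-million-fold density gain also quoted, though the two figures measure different things and need not match.

```python
def cost_reduction_factor(years: float, halving_period_years: float = 2.0) -> float:
    """How many times cheaper storage gets if cost per byte halves every `halving_period_years`."""
    return 2 ** (years / halving_period_years)

factor = cost_reduction_factor(50)
print(f"50 years of halving every 2 years: ~{factor:,.0f}x cheaper per byte")
```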
9
The Future 1 – Wild, Wild West?
• Can we treat data as a corporate asset?
– A ledger entry, like “brand value” (intangible)
– Or is data a tangible asset to be kept on the books?
– Does data have “cash value”? Asset amortization?
– Can a business be legally “liable” for its data collection?
• Facebook’s book value: $6.3B; its value at IPO: $104B
– Why the difference? Facebook is essentially data
– In other words, every FB user is worth ~$100 (~1B users)
• We will see much more “data value chain” ahead
– Ingest, analyze, sell results, analyze, sell results …downstreaming
– Licensing of data in its infancy – much more to come
– Think about the data from your car alone – ~40 microprocessors (µPs)
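The "~$100 per user" figure is simple division. A worked check using only the numbers on this slide; treating the gap between IPO value and book value as "the data" is one reading of the slide's argument, not an accounting method.

```python
def value_per_user(total_value_usd: float, users: float) -> float:
    """Naive per-user valuation: total value spread evenly over all users."""
    return total_value_usd / users

ipo_value = 104e9   # value at IPO, from the slide
book_value = 6.3e9  # book value, from the slide
users = 1e9         # ~1B users, from the slide

whole = value_per_user(ipo_value, users)                  # entire IPO value per user
off_books = value_per_user(ipo_value - book_value, users)  # value not on the books, per user
print(f"per user: ${whole:.1f} total, ${off_books:.1f} not on the books")
```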
10
The Future 2 – Data as Policy -
Can Data save Us from Us?
• “In God We Trust – all others bring data”
– Commonly attributed to W. Edwards Deming
• New jobs/titles coming out of the woodwork
– CAO (Chief Analytics Officer), CDO (Data)
– Data Scientist, Data Correlationist, Data Ethicist
• Knowing “what” not “why” is good enough. Is it?
• Remember Bayes’ “inductive probability” (250 yrs!)
– We update our beliefs about something as new data arrives
– Bayes, T. (1763) "An Essay towards solving a Problem in the Doctrine of Chances". Phil. Trans., 53, 370–418.
• Data Policy in the immortal words of Yogi Berra:
– “We made too many wrong mistakes”
– “You can observe a lot just by watching.”
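Bayes' rule can be applied repeatedly, with each posterior becoming the prior for the next observation, which is the "update our beliefs as new data arrives" idea above. A minimal sketch using exact fractions; the fraud-alert framing and the likelihoods (0.9 under fraud, 0.1 otherwise) are hypothetical numbers chosen for illustration.

```python
from fractions import Fraction

def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | evidence) from a prior P(H) and the likelihood of the evidence under H and not-H."""
    numerator = likelihood_if_true * prior
    evidence = numerator + likelihood_if_false * (1 - prior)
    return numerator / evidence

# Hypothetical: prior belief that an account is fraudulent is 1%.
belief = Fraction(1, 100)
history = []
for _ in range(3):
    # Each alert is 9x more likely under fraud (0.9) than under normal activity (0.1).
    belief = bayes_update(belief, Fraction(9, 10), Fraction(1, 10))
    history.append(belief)
    print(f"{belief} ~ {float(belief):.3f}")
```

Starting from a 1% prior, three consecutive alerts move the belief to 1/12, then 9/20, then 81/92 (about 88%). No single observation "proves" anything; the evidence accumulates.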
11
The Future 3 – N=all?
Keep Everything? Seriously?
• Data Silos or the Data Lake?
– HDFS presents a crisis: i.e. 危機, weiji
• literally a dangerous ‘critical point’ (the popular ‘danger + opportunity’ reading is a mistranslation)
– Write-once, read-many, modify-never; delete-never?
– Time is not your friend when moving data
• (So, don’t move it between repositories; move it to the CPU)
• One 40GE NIC yields same rate on bus as 28 disks @ 140MB/s
• One million seconds is ~277.8 hours (~11.6 days)
• 1 PB @ 1 GB/sec is … 1 EB @ 1 TB/sec is …
• Non-shared (1 protocol) or shared (N protocols)?
• Time versus Space – the Essential Judgment
• Cost of Having Data vs. Cost of Not Having Data
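The transfer-time bullets reduce to size divided by rate. A worked sketch, assuming decimal units (1 PB = 10^15 bytes), a sustained rate, and no protocol overhead; it also puts the slide's NIC-versus-disks comparison into numbers.

```python
def transfer_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to move `size_bytes` at a constant `rate_bytes_per_s` (no overhead assumed)."""
    return size_bytes / rate_bytes_per_s

MB, GB, TB = 10**6, 10**9, 10**12
PB, EB = 10**15, 10**18

for label, size, rate in [("1 PB @ 1 GB/s", PB, GB), ("1 EB @ 1 TB/s", EB, TB)]:
    secs = transfer_seconds(size, rate)
    print(f"{label}: {secs:,.0f} s = {secs / 3600:,.1f} h = {secs / 86400:,.1f} days")

# Aggregate disk bandwidth from the slide vs. raw 40GbE line rate.
disks_mb_s = 28 * 140              # 28 disks at 140 MB/s each
nic_mb_s = 40 * 10**9 / 8 / MB     # 40 Gb/s expressed in MB/s, before protocol overhead
print(f"28 disks: {disks_mb_s:,} MB/s; 40GbE raw: {nic_mb_s:,.0f} MB/s")
```

Both scenarios take the same one million seconds, so a thousandfold faster pipe only keeps pace with a thousandfold more data. The 28 disks deliver about 78% of the raw 40GbE line rate; one possible reading (an assumption, the slide does not say) is that the comparison allows for protocol overhead.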
12
THANK YOU

Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Kürzlich hochgeladen (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014

1. Big Data: Past, Present & Future – Where Are We Headed?
Rob Peglar, CTO Americas, Isilon Storage Division, EMC Corporation
rob.peglar@emc.com | @peglarr
2. Prediction Is Very Difficult – Especially About the Future (attributed to Niels Bohr)
• To understand what’s coming, we must understand our past
• We must also understand that Big Data is fundamentally different from what we’re used to
• Consider the difference between a still photograph and a movie, and how we perceive each
  – A movie is more than a collection of still photographs – why?
3. The Past – and I Mean the Past
• Consider the census…
• From the Latin “censere”
  – meaning “to estimate”
• “In those days a decree went out from Emperor Augustus that all the world should be registered.” (Luke 2:1)
• The Domesday Book of 1086 – England
  – A comprehensive tally of people, their land, and property
• The US Constitution mandates a decennial census
  – The 1880 census took eight years (!) to complete
• This led to Hollerith’s punched-card tabulator in 1890
  – The beginning of automated data processing
  – Reduced the census time to one year
4. Sampling – Good or Bad?
• Sampling precision improves most with randomness, not with sample size
  – Jerzy Neyman (Poland, 1934) proved this
    • Neyman, J. (1934). “On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection.” Journal of the Royal Statistical Society, 97(4), 557–625.
• Good – sampling was a solution to information overload
• Bad – systematic bias in sampling leads to wrong conclusions
• A seismic shift is occurring, from:
  – Sampling: keeping datasets small on purpose and using them once, to
  – N = all: keeping datasets large on purpose and using them many times
• Why? The outliers are the most interesting!
  – Examples: credit card fraud, language translation, insurability
  – Don’t just follow the rules; look for the exceptions
[Image: Williams tube, 1946, 1024 bits]
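The claim that randomness matters more than sheer sample size can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the population is hypothetical (the integers 0 to 99,999), and the "biased" sample draws only from the lower half of the population, standing in for any systematic bias such as surveying a single neighborhood. The function name is illustrative.

```python
import random
import statistics


def compare_sampling(pop_size=100_000, random_n=1_000, biased_n=50_000, seed=42):
    """Return the mean-estimation error of a small random sample and a large biased one."""
    population = list(range(pop_size))           # hypothetical population
    true_mean = statistics.fmean(population)     # equals (pop_size - 1) / 2

    rng = random.Random(seed)                    # seeded for reproducibility
    random_sample = rng.sample(population, random_n)  # every unit equally likely

    # Systematic bias: only the lower half of the population can ever be selected.
    biased_sample = population[:biased_n]

    return {
        "true_mean": true_mean,
        "random_error": abs(statistics.fmean(random_sample) - true_mean),
        "biased_error": abs(statistics.fmean(biased_sample) - true_mean),
    }


result = compare_sampling()
print(f"random n=1,000:  error {result['random_error']:.0f}")
print(f"biased n=50,000: error {result['biased_error']:.0f}")  # error 25000
```

The random sample of 1,000 typically lands within about a thousand of the true mean (its standard error here is roughly 900), while the biased sample, fifty times larger, misses by 25,000. More data does not cure bias.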
5. The Journey from Clean to Messy
• 1998 – Linden et al., collaborative filtering patent, working at a Seattle startup selling books online
  – G. Linden, J. Jacobi, and E. Benson, Collaborative Recommendations Using Item-to-Item Similarity Mappings, US Patent 6,266,649 (to Amazon.com), Patent and Trademark Office, Washington, D.C., 2001
• “If it works perfectly, Amazon should show you just one book – the next one you will buy.” (Linden)
• The hypothesis-driven approach becomes data-driven
  – From “proving” something (causation) to correlation
• McGregor et al. – using big data to improve the NICU
  – 16 data streams, 1,260 data points/sec
  – Helps reduce adverse outcomes for premature infants
  – No “proof” – it helps doctors make better diagnostic decisions
  – Carolyn McGregor, “Big Data in Neonatal Intensive Care,” Computer, vol. 46, no. 6, pp. 54–59, June 2013, doi:10.1109/MC.2013.157
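Item-to-item similarity, the idea named in the Linden et al. patent title, can be sketched in a few lines. This is a generic illustration, not Amazon's patented method: it assumes binary purchase data (bought / not bought) and cosine similarity between items' sets of buyers, and the users, book titles, and function names are all hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(purchases):
    """Cosine similarity between every pair of items that share at least one buyer.

    purchases: dict user -> set of items (binary "bought it" data).
    For binary data: cos(A, B) = |buyers(A) & buyers(B)| / sqrt(|buyers(A)| * |buyers(B)|).
    """
    buyers = defaultdict(set)  # item -> users who bought it
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)

    sims = {}
    items = sorted(buyers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(buyers[a] & buyers[b])
            if overlap:  # skip pairs nobody bought together
                score = overlap / math.sqrt(len(buyers[a]) * len(buyers[b]))
                sims[(a, b)] = sims[(b, a)] = score
    return sims


def recommend(user, purchases, sims, k=1):
    """Rank items the user does not own by summed similarity to items they do own."""
    owned = purchases[user]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in owned and b not in owned:
            scores[b] += s
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


# Hypothetical purchase history.
purchases = {
    "ann": {"Dune", "Foundation"},
    "bob": {"Dune", "Foundation", "Hyperion"},
    "cat": {"Dune", "Hyperion"},
    "dan": {"Emma"},
}
sims = item_similarities(purchases)
print(recommend("ann", purchases, sims))  # ['Hyperion']
```

Because similarities are computed between items rather than between users, the table can be built offline and reused for every customer, one commonly cited reason item-to-item approaches scale well.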
6. Manholes and Raw Data – Correlations
• 94,000 miles of underground cable in NYC; 51,000 manholes, with service boxes below, in Manhattan alone
• 1 in 20 cables was laid before 1930; some date from Edison’s era
• Records kept since the 1880s
  – 38 different terms
  – All handwritten: paper, cards, ledgers, etc.
• 2008 – how to prevent fires and exploding manholes?
• Machine-correlate 106 predictors of imminent disaster
  – The top 10% of manholes ranked by predicted risk accounted for 44% of all failures
• Chris Anderson – “The data deluge makes the scientific method obsolete”
  – http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
• “Datafication” – everything is data
  – Numbers to words to images to locations to relationships to feelings …
  – Graph theory and graph analysis change the way we perceive the world
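The manhole result (top 10% of the ranking holds 44% of failures) is a capture-rate measurement: rank assets by predicted risk, take the top slice, and count what share of all actual failures falls inside it. A minimal sketch follows; the 20 manholes, their risk scores, and the failure set are invented for illustration, and the 106-predictor model that would produce real scores is not reproduced.

```python
def capture_rate(risk_scores, failed, top_fraction=0.10):
    """Share of all failures that fall in the top `top_fraction` of assets by risk.

    risk_scores: dict asset_id -> predicted risk (higher = riskier)
    failed: set of asset_ids that actually failed
    """
    if not failed:
        return 0.0
    ranked = sorted(risk_scores, key=risk_scores.get, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # size of the inspected slice
    caught = sum(1 for asset in ranked[:k] if asset in failed)
    return caught / len(failed)


# Hypothetical data: 20 manholes, M00 rated riskiest, M19 least risky.
scores = {f"M{i:02d}": 19 - i for i in range(20)}
failures = {"M00", "M01", "M07", "M15"}

rate = capture_rate(scores, failures)
print(f"top 10% caught {rate:.0%} of failures; lift = {rate / 0.10:.1f}")
# top 10% caught 50% of failures; lift = 5.0
```

Dividing the capture rate by the slice size gives the lift over random inspection: the slide's 44% in the top 10% is a lift of 4.4, versus 1.0 for inspecting manholes at random.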
7. The Present – Architecture
[Diagram: layers Business Process / Info Processing / Data Acquisition / Data Creation, serving end users / analysts and scientists / architects and engineers / producers]
• Data sources: email and messaging, mobile apps, data transaction and usage logs, machines and sensors, geolocation, relationships and social influence
• Systems integration (volume, velocity, variety): shared-nothing scale-out storage + SSD, MPP + in-memory compute, Hadoop, hi-speed / resiliency networking, converged infrastructure, cloud, non-relational DWH
• Objectives: stream processing, event management, data exploration, contextualized data, modeling / scenarios, forecasting
• Delivery models:
  – On-demand: access-anywhere analytics services, context-aware business applications
  – Push: location-based services, alert and respond
  – Embedded: workflow and interaction automation, smart devices and systems
• Value: real-time events, deep insights
© Copyright 2014 EMC Corporation. All rights reserved.
8. The Present – Business Value of Data
• Data is valuable – re-use of data even more so
  – Its value is not ephemeral: data can be re-consumed ad infinitum
  – Economists call this a “non-rivalrous” good
• The cost/benefit ratio of storage is approaching 0 – so keep everything
  – Ewan Birney, European Bioinformatics Institute, “Hidden Treasures In Junk DNA”, http://www.scientificamerican.com/article.cfm?id=hidden-treasures-in-junk-dna
  – Over the last 50 years, cost per byte has roughly halved every 2 years
  – Density has increased ~50 million times since 1956
• Consider electric cars:
  – Battery level indicates when to “fill up” from the power grid
  – The power utility monitors grid usage over time
  – Correlate both data sets
    • Determine when and where to build recharge stations, and on which roads
• Recombinant data
  – “Old” data combined into new forms for new insights
  – “Noisy” datasets enable feedback loops, e.g. better/faster search and indexing
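The storage-cost bullet is compounding arithmetic. Assuming cost per byte halves exactly every 2 years (the slide gives this only as an approximation), 50 years means 25 halvings. The function name is illustrative.

```python
def cost_reduction_factor(years, halving_period_years=2.0):
    """How many times cheaper storage gets if cost per byte halves every
    `halving_period_years` (an idealized assumption)."""
    halvings = years / halving_period_years
    return 2 ** halvings


factor = cost_reduction_factor(50)
print(f"{factor:,.0f}x cheaper; relative cost {1 / factor:.2e}")
# 33,554,432x cheaper; relative cost 2.98e-08
```

That is roughly a 33.5-million-fold drop, the same order of magnitude as the ~50-million-fold density increase cited on the slide.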
9. The Future 1 – Wild, Wild West?
• Can we treat data as a corporate asset?
  – A ledger entry, like “brand value” (intangible)?
  – Or is data a tangible asset to be kept on the books?
  – Does data have “cash value”? Asset amortization?
  – Can a business be held legally liable for its data collection?
• Facebook’s book value: $6.3B; IPO value: $104B
  – Why the difference? Facebook is essentially data
  – Or: every Facebook user is worth ~$100 (~1B subscribers)
• We will see much more of the “data value chain” ahead
  – Ingest, analyze, sell results, analyze, sell results … downstreaming
  – Licensing of data is in its infancy – much more to come
  – Think about the data from your car alone – ~40 microprocessors
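The ~$100-per-user figure is back-of-the-envelope division. One way to reproduce it, assuming (as the slide suggests) that the gap between IPO value and book value reflects the value of Facebook's data, and taking ~1 billion users; the function name is illustrative.

```python
def implied_value_per_user(market_value, book_value, users):
    """Market-to-book gap per user (assumes the whole gap is 'data value')."""
    return (market_value - book_value) / users


per_user = implied_value_per_user(104e9, 6.3e9, 1e9)
print(f"${per_user:.2f} per user")  # $97.70 per user
```

Dividing the full IPO value by the user count instead gives $104; either way the order of magnitude is ~$100 per user.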
10. The Future 2 – Data as Policy – Can Data Save Us from Us?
• “In God we trust – all others bring data”
  – Commonly attributed to W. Edwards Deming
• New jobs and titles are coming out of the woodwork
  – CAO (Chief Analytics Officer), CDO (Chief Data Officer)
  – Data Scientist, Data Correlationist, Data Ethicist
• Knowing “what”, not “why”, is good enough. Is it?
• Remember Bayes’ “inductive probability” (250 years old!)
  – We update our beliefs about something as new data arrives
  – Bayes, T. (1763). “An Essay towards solving a Problem in the Doctrine of Chances.” Phil. Trans., 53, 370–418.
• Data policy, in the immortal words of Yogi Berra:
  – “We make too many wrong mistakes.”
  – “You can observe a lot just by watching.”
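Bayes' rule is the mechanism behind "we update our beliefs as new data arrives." A minimal sketch for a binary hypothesis, with hypothetical numbers: a fraud flag that fires on 90% of fraudulent transactions and 5% of legitimate ones, a 1% prior, and successive flags treated as conditionally independent (an assumption real signals often violate).

```python
def bayes_update(prior, p_data_if_true, p_data_if_false):
    """Return P(H | data) for a binary hypothesis H via Bayes' rule."""
    evidence = prior * p_data_if_true + (1 - prior) * p_data_if_false
    return prior * p_data_if_true / evidence


belief = 0.01  # hypothetical prior: 1% of transactions are fraudulent
for flag in range(1, 3):
    # Each new flag is treated as independent evidence;
    # the previous posterior becomes the new prior.
    belief = bayes_update(belief, p_data_if_true=0.90, p_data_if_false=0.05)
    print(f"after flag {flag}: P(fraud) = {belief:.3f}")
# after flag 1: P(fraud) = 0.154
# after flag 2: P(fraud) = 0.766
```

One flag lifts the probability only from 1% to about 15%, because false alarms on the 99% of legitimate transactions dominate; a second independent flag raises it to about 77%.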
11. The Future 3 – N = All? Keep Everything? Seriously?
• Data silos or the data lake?
  – HDFS presents a crisis: 危機, weiji
    • A dangerous “critical point” (not “crisis”; a mistranslation)
  – Write-once, read-many, modify-never; delete-never?
  – Time is not your friend when moving data
    • So don’t move it between repositories; move it to the CPU
    • One 40GE NIC yields the same rate on the bus as 28 disks @ 140 MB/s
    • One million seconds is ≈ 277.8 hours (≈ 11.6 days)
    • 1 PB @ 1 GB/sec is …
    • 1 EB @ 1 TB/sec is …
• Non-shared (1 protocol) or shared (N protocols)?
• Time versus space – the essential judgment
• Cost of having data vs. cost of not having data
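The "time is not your friend" arithmetic can be checked directly. This sketch assumes decimal (SI) units, a sustained transfer rate with no protocol overhead, and the raw 40 Gbit/s line rate for 40GE; the constant and function names are illustrative.

```python
MB, GB, TB = 10**6, 10**9, 10**12
PB, EB = 10**15, 10**18
SECONDS_PER_DAY = 86_400


def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Time to move `size_bytes` at a sustained `rate_bytes_per_s`."""
    return size_bytes / rate_bytes_per_s


for size, rate, label in [(PB, GB, "1 PB @ 1 GB/s"), (EB, TB, "1 EB @ 1 TB/s")]:
    secs = transfer_seconds(size, rate)
    print(f"{label}: {secs:,.0f} s = {secs / 3600:,.1f} h = {secs / SECONDS_PER_DAY:.1f} days")

# Aggregate disk bandwidth vs. a 40GE NIC (40 Gbit/s = 5 GB/s before overhead).
disk_rate = 28 * 140 * MB  # 3.92 GB/s
nic_rate = 40 * GB / 8     # 5.0 GB/s raw line rate
print(f"28 disks: {disk_rate / GB:.2f} GB/s; 40GE raw: {nic_rate / GB:.2f} GB/s "
      f"({disk_rate / nic_rate:.0%} of line rate)")
```

Both transfers come to one million seconds, about 11.6 days. The slide's 28-disk equivalence works out to roughly 78% of the NIC's raw line rate, which presumably allows for encoding and protocol overhead.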