Jul-13© 2013 IDC
IDC’s Perspective
On Big Data
Outside Of HPC
Big Data:
A General Definition
Value
+
 Lots of data
 Time critical
 Multiple types (e.g.,
numbers, text, video)
 Worth something to
someone
Defining Big Data:
For the Broader IT Market
Top Drivers For Implementing Big Data
Organizational Challenges With Big Data:
Government Compared To All Others
Big Data Software
Big Data Software Technology Stack
Big Data Software Shortcomings -- Today
HPDA =
BIG DATA MEETS HPC
AND
ADVANCED SIMULATION
HPDA (High Performance Data Analysis):
Data-Intensive Simulation and Analytics
HPDA = tasks involving sufficient data volumes and
algorithmic complexity to require HPC
resources/approaches
 Established (simulation) or newer (analytics) methods
 Structured data, unstructured data, or both
 Regular (e.g., Hadoop) or irregular (e.g., graph) patterns
 Government, industry, or academia
 Upward extensions of commercial business problems
 Accumulated results of iterative problem-solving methods
(e.g., stochastic modeling, parametric modeling).
HPDA Market Drivers
 More input data (ingestion)
• More powerful scientific instruments/sensor networks
• More transactions/higher scrutiny (fraud, terrorism)
 More output data for integration/analysis
• More powerful computers
• More realism
• More iterations in available time
 Real time, near-real time requirements
• Catch fraud before it hits credit cards
• Catch terrorists before they strike
• Diagnose patients before they leave the office
• Provide insurance quotes before callers leave the phone
 The need to pose more intelligent questions
• Smarter mathematical models and algorithms
Data Movement Is Expensive:
In Energy and Time-to-Solution
Energy Consumption
 1MW ≈ $1 million
 Computing 1 calculation ≈
1 picojoule
 Moving the result of 1 calculation = up
to 100 picojoules
 => It can take 100 times more
energy to move the results of a
calculation than to perform the
calculation in the first place.
Strategies
 Accelerate data movement
(bandwidth, latency)
 Minimize data movement
(e.g., data reduction, in-
memory compute, in-storage
compute, etc.)
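The energy argument on this slide can be checked with a few lines of arithmetic. The sketch below uses the slide's two figures (about 1 picojoule per calculation, up to 100 picojoules to move a result). The workload size and the fraction of results moved are hypothetical assumptions chosen only for illustration.

```python
# Figures from the slide; everything else is an illustrative assumption.
COMPUTE_PJ = 1.0      # energy per calculation, in picojoules
MOVE_PJ_MAX = 100.0   # upper-bound energy to move one result, in picojoules


def energy_joules(n_calcs: int, moved_fraction: float) -> float:
    """Total energy in joules when `moved_fraction` of the results are moved."""
    if not 0.0 <= moved_fraction <= 1.0:
        raise ValueError("moved_fraction must be between 0 and 1")
    picojoules = n_calcs * (COMPUTE_PJ + moved_fraction * MOVE_PJ_MAX)
    return picojoules * 1e-12


N = 10**15  # hypothetical workload: one quadrillion calculations
move_all = energy_joules(N, 1.0)     # every result is moved
move_tenth = energy_joules(N, 0.1)   # data reduction keeps 90% of results local
ratio = move_all / move_tenth

print(f"move everything: {move_all:,.0f} J")
print(f"move one tenth:  {move_tenth:,.0f} J")
print(f"savings factor:  {ratio:.1f}x")
```

Under these assumptions, minimizing data movement matters far more than making the arithmetic itself cheaper, which is the point of the "Strategies" column.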
Different Systems for Different Jobs
Partitionable Big Data Work
 Most jobs are here!
 Goal: search
 Regular access patterns (locality)
 Global memory not important
 Standard clusters + Hadoop,
Cassandra, etc.
versus
Non-Partitionable Work
 Toughest jobs (e.g., graphing)
 Goal: discovery
 Irregular access patterns
 Global memory very important
 Systems turbo-charged for data
movement + graphing
HPC architectures today are compute-centric (FLOPS vs. IOPS)
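The difference between the two workload classes can be illustrated with a minimal Python sketch. The first function counts labels partition by partition and merges the partial results, so no partition needs to see another's data. The second walks a graph, where the next vertex to visit can live anywhere in memory. The toy data, labels, and function names are hypothetical and are not taken from the deck.

```python
from collections import Counter, deque


def partitioned_count(partitions):
    """Count each partition independently (map), then merge (reduce)."""
    partials = [Counter(part) for part in partitions]  # no cross-partition access
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def bfs_reachable(graph, start):
    """Breadth-first search: each step may jump to any part of the graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical toy inputs.
partitions = [["fraud", "ok", "ok"], ["ok", "fraud"], ["ok"]]
counts = partitioned_count(partitions)

graph = {"a": ["d"], "b": ["c"], "c": [], "d": ["b"]}
reached = bfs_reachable(graph, "a")

print(dict(counts))
print(sorted(reached))
```

Splitting the graph across machines would turn most neighbor lookups into remote accesses, which is why the slide lists global memory as very important for this class of work.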
IDC HPDA Server Forecast
 Fast growth from a small starting point
 In 2015, conservatively approaching $1B
END-USE EXAMPLES
OF BIG DATA TODAY
Some Major Use Cases for HPDA
• Fraud/error detection across massive databases
 A horizontal use – applicable in many domains
• National security/crime-fighting
 SIGINT/anomaly detection/anti-hacking
 Anti-terrorism (including evacuation planning)/anti-crime
• Health care/medical informatics
 Drug design, personalized medicine
 Outcomes-based diagnosis & treatment planning
 Systems biology
• Customer acquisition/retention
• Smart electrical grids
• Design of social network architectures
Use Case: PayPal
Fraud Detection / Internet
Commerce
Slides and permission provided by PayPal, an eBay company
The Problem
Finding suspicious patterns that we don’t
even know exist in related data sets.
What Kind of Volume?
PayPal’s Data Volumes And HPDA Requirements
Where PayPal Used HPC
The Results
 $710 million in fraud prevented in the first year that
they could not have detected before
GEICO: Real-Time Insurance Quotes
 Problem: Need accurate automated phone quotes in
100 ms. They couldn’t do these calculations nearly fast
enough on the fly.
 Solution: Each weekend, use a new HPC cluster to pre-
calculate quotes for every American adult and household
(60 hour run time)
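The pattern described here, trading a long batch run for a fast lookup at call time, can be sketched as follows. This is an illustration of precomputation in general, not GEICO's system: the rating formula, the profile keys, and all function names are hypothetical.

```python
def slow_quote(age: int, household_size: int) -> float:
    """Stand-in for an expensive rating model (hypothetical formula)."""
    base = 500.0
    age_factor = 1.5 if age < 25 else 1.0
    return round(base * age_factor + 40.0 * household_size, 2)


def precompute_quotes(profiles):
    """Batch step: run the slow model once per profile, offline."""
    return {profile: slow_quote(*profile) for profile in profiles}


def lookup_quote(table, age, household_size):
    """Online step: a dictionary lookup instead of a model run."""
    return table.get((age, household_size))


# Hypothetical profile space: every (age, household size) combination.
profiles = [(age, size) for age in range(18, 100) for size in range(1, 7)]
quote_table = precompute_quotes(profiles)

quote = lookup_quote(quote_table, 30, 2)
print(f"{len(quote_table)} quotes precomputed; quote for (30, 2): {quote}")
```

The design choice is to pay the full computation cost in the weekend batch window so that the latency seen by the caller is only the cost of a key lookup.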
Global Courier Service: Fraud/Error
Detection
Here’s a real-world example of one of the biggest names in global
package delivery.
Their problem is not so different from PayPal’s.
This courier service is doing real-time fraud detection on huge
volumes of packages that come into their sorting facility from many
locations and leave the facility for many other locations around the
world.
 Check 1 billion-plus packages per hour in central sorting facility
 Benchmark won by an HPC vendor with a turbo-charged
interconnect and memory system
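To get a feel for the scale of "1 billion-plus packages per hour", the short calculation below converts it into checks per second and a per-check time budget. The hourly figure is the slide's lower bound. The worker count is a hypothetical assumption, used only to show how parallelism widens the budget.

```python
PACKAGES_PER_HOUR = 1_000_000_000  # lower bound stated on the slide
SECONDS_PER_HOUR = 3600


def checks_per_second(packages_per_hour: int) -> float:
    """Sustained check rate needed to keep up with the stated volume."""
    return packages_per_hour / SECONDS_PER_HOUR


def per_check_budget_us(packages_per_hour: int, workers: int = 1) -> float:
    """Microseconds available per check when `workers` checks run in parallel."""
    return workers * SECONDS_PER_HOUR * 1e6 / packages_per_hour


rate = checks_per_second(PACKAGES_PER_HOUR)
serial_budget = per_check_budget_us(PACKAGES_PER_HOUR)
# Hypothetical degree of parallelism, for illustration only.
parallel_budget = per_check_budget_us(PACKAGES_PER_HOUR, workers=1000)

print(f"required rate:         {rate:,.0f} checks/s")
print(f"budget, 1 worker:      {serial_budget:.1f} microseconds per check")
print(f"budget, 1000 workers:  {parallel_budget:,.0f} microseconds per check")
```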
Apollo Group/University of Phoenix:
Student Recruitment and Retention
Apollo Group is approaching 300,000 online students. To
maintain and grow, they have to target millions of
prospective students.
 Must target millions of potential students
 Must track student performance for early identification of
potential dropouts – “churn” is very expensive
 Solution: sophisticated, cluster-based Big Data models
They use the cloud for this High Performance Data Analysis problem
-- that’s not so surprising, since molecular dynamics codes are often
highly parallel.
Architecture
Optum + Mayo Initiative to Move Past
Procedures-Based Healthcare
You may have seen the recent news that Optum, which is
part of UnitedHealth Group, is teaming with the Mayo
Clinic to build a large center ($500K) in Cambridge,
Massachusetts to lay the research groundwork for
outcomes-based medicine.
 Data: 100M UnitedHealth Group claims (20 years) + 5M
Mayo Clinic archived patient records. Option for genomic
data
 Findings will be published
 Goal: outcomes-based care
Summary: HPDA Market Opportunity
 HPDA: simulation + newer high-performance analytics
• IDC predicts fast growth from a small starting point
 HPC and high-end commercial analytics are converging
• Algorithmic complexity is the common denominator
• Technologies will evolve greatly
 Economically important use cases are emerging
 No single HPC solution is best for all problems
• Clusters with MR/Hadoop will handle most but not all work
(e.g., graph analysis)
• New technologies will be required in many areas
 IDC believes our growth estimates could be
conservative
HPDA User Talks: HPC User Forums, UK,
Germany, France, China and U.S.
• HPC in Evolutionary Biology, Andrew Meade, University of Reading
• HPC in Pharmaceutical Research: From Virtual Screening to All-Atom Simulations of Biomolecules,
Jan Kriegl, Boehringer-Ingelheim
• European Exascale Software Initiative, Jean-Yves Berthou, Electricite de France
• Real-time Rendering in the Automotive Industry, Cornelia Denk, RTT-Munich
• Data Analysis and Visualization for the DoD HPCMP, Paul Adams, ERDC
• Why HPCs Hate Biologists, and What We're Doing About It, Titus Brown, Michigan State University
• Scalable Data Mining and Archiving in the Era of the Square Kilometre Array, the Square Kilometre
Array Telescope Project, Chris Mattmann, NASA/JPL
• Big Data and Analytics in HPC: Leveraging HPC and Enterprise Architectures for Large Scale Inline
Transactional Analytics in Fraud Detection at PayPal, Arno Kolster, PayPal, an eBay Company
• Big Data and Analytics Vendor Panel: How Vendors See Big Data Impacting the Markets and Their
Products/Services, Panel Moderator: Chirag Dekate, IDC
• Data Analysis and Visualization of Very Large Data, David Pugmire, ORNL
• The Impact of HPC and Data-Centric Computing in Cancer Research, Jack Collins, National Cancer
Institute
• Urban Analytics: Big Cities and Big Data, Paul Muzio, City University of New York
• Stampede: Intel MIC And Data-Intensive Computing, Jay Boisseau, Texas Advanced Computing
Center
• Big Data Approaches at Convey, John Leidel
• Cray Technical Perspective On Data-Intensive Computing, Amar Shan
• Data-intensive Computing Research At PNNL, John Feo, Pacific Northwest National Laboratory
• Trends in High Performance Analytics, David Pope, SAS
• Processing Large Volumes of Experimental Data, Shane Canon, LBNL
• SGI Technical Perspective On Data-Intensive Computing, Eng Lim Goh, SGI
• Big Data and PLFS: A Checkpoint File System For Parallel Applications, John Bent, EMC
• HPC Data-intensive Computing Technologies, Scott Campbell, Platform/IBM

Weitere ähnliche Inhalte

Was ist angesagt?

Measuring HPC: Performance, Cost, & Value
Measuring HPC: Performance, Cost, & ValueMeasuring HPC: Performance, Cost, & Value
Measuring HPC: Performance, Cost, & Valueinside-BigData.com
 
HPC Market Update from Hyperion Research
HPC Market Update from Hyperion ResearchHPC Market Update from Hyperion Research
HPC Market Update from Hyperion Researchinside-BigData.com
 
High Performance Data Analysis (HPDA): HPC - Big Data Convergence
High Performance Data Analysis (HPDA): HPC - Big Data ConvergenceHigh Performance Data Analysis (HPDA): HPC - Big Data Convergence
High Performance Data Analysis (HPDA): HPC - Big Data Convergenceinside-BigData.com
 
Application Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance CenterApplication Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance Centerinside-BigData.com
 
Development Trends of Next-Generation Supercomputers
Development Trends of Next-Generation SupercomputersDevelopment Trends of Next-Generation Supercomputers
Development Trends of Next-Generation Supercomputersinside-BigData.com
 
BIG Data & Hadoop Applications in Logistics
BIG Data & Hadoop Applications in LogisticsBIG Data & Hadoop Applications in Logistics
BIG Data & Hadoop Applications in LogisticsSkillspeed
 
iphix-demo-day.pdf
iphix-demo-day.pdfiphix-demo-day.pdf
iphix-demo-day.pdfEd Dodds
 
Whitepaper - Transforming the Energy & Utilities Industry with Smart Analytics
Whitepaper - Transforming the Energy & Utilities Industry with Smart AnalyticsWhitepaper - Transforming the Energy & Utilities Industry with Smart Analytics
Whitepaper - Transforming the Energy & Utilities Industry with Smart AnalyticseInfochips (An Arrow Company)
 
Big Data Meetup: Data Science & Big Data in Telecom
Big Data Meetup: Data Science & Big Data in TelecomBig Data Meetup: Data Science & Big Data in Telecom
Big Data Meetup: Data Science & Big Data in TelecomProvectus
 
Abivin - Big Data Analytics & Optimization
Abivin - Big Data Analytics & OptimizationAbivin - Big Data Analytics & Optimization
Abivin - Big Data Analytics & OptimizationLong Pham
 
using big-data methods analyse the Cross platform aviation
 using big-data methods analyse the Cross platform aviation using big-data methods analyse the Cross platform aviation
using big-data methods analyse the Cross platform aviationranjit banshpal
 
Needle in the Haystack by Anshul Vikram Pandey at QuantCon 2016
Needle in the Haystack by Anshul Vikram Pandey at QuantCon 2016Needle in the Haystack by Anshul Vikram Pandey at QuantCon 2016
Needle in the Haystack by Anshul Vikram Pandey at QuantCon 2016Quantopian
 
Big Data: Smart Technologies Provide Big Opportunities
Big Data: Smart Technologies Provide Big OpportunitiesBig Data: Smart Technologies Provide Big Opportunities
Big Data: Smart Technologies Provide Big OpportunitiesNAED_Org
 
Understanding Big Data so you can act with confidence
Understanding Big Data so you can act with confidenceUnderstanding Big Data so you can act with confidence
Understanding Big Data so you can act with confidenceIBM Software India
 
That's not a metric! Data for cloud-native success
That's not a metric! Data for cloud-native successThat's not a metric! Data for cloud-native success
That's not a metric! Data for cloud-native successGordon Haff
 

Was ist angesagt? (20)

Measuring HPC: Performance, Cost, & Value
Measuring HPC: Performance, Cost, & ValueMeasuring HPC: Performance, Cost, & Value
Measuring HPC: Performance, Cost, & Value
 
HPC Market Update from Hyperion Research
HPC Market Update from Hyperion ResearchHPC Market Update from Hyperion Research
HPC Market Update from Hyperion Research
 
High Performance Data Analysis (HPDA): HPC - Big Data Convergence
High Performance Data Analysis (HPDA): HPC - Big Data ConvergenceHigh Performance Data Analysis (HPDA): HPC - Big Data Convergence
High Performance Data Analysis (HPDA): HPC - Big Data Convergence
 
Hot Technology Topics in 2017
Hot Technology Topics in 2017Hot Technology Topics in 2017
Hot Technology Topics in 2017
 
Application Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance CenterApplication Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance Center
 
OrionX AI Survey
OrionX AI SurveyOrionX AI Survey
OrionX AI Survey
 
Development Trends of Next-Generation Supercomputers
Development Trends of Next-Generation SupercomputersDevelopment Trends of Next-Generation Supercomputers
Development Trends of Next-Generation Supercomputers
 
Outlook on Hot Technologies
Outlook on Hot TechnologiesOutlook on Hot Technologies
Outlook on Hot Technologies
 
BIG Data & Hadoop Applications in Logistics
BIG Data & Hadoop Applications in LogisticsBIG Data & Hadoop Applications in Logistics
BIG Data & Hadoop Applications in Logistics
 
iphix-demo-day.pdf
iphix-demo-day.pdfiphix-demo-day.pdf
iphix-demo-day.pdf
 
Whitepaper - Transforming the Energy & Utilities Industry with Smart Analytics
Whitepaper - Transforming the Energy & Utilities Industry with Smart AnalyticsWhitepaper - Transforming the Energy & Utilities Industry with Smart Analytics
Whitepaper - Transforming the Energy & Utilities Industry with Smart Analytics
 
Big Data Meetup: Data Science & Big Data in Telecom
Big Data Meetup: Data Science & Big Data in TelecomBig Data Meetup: Data Science & Big Data in Telecom
Big Data Meetup: Data Science & Big Data in Telecom
 
Abivin - Big Data Analytics & Optimization
Abivin - Big Data Analytics & OptimizationAbivin - Big Data Analytics & Optimization
Abivin - Big Data Analytics & Optimization
 
using big-data methods analyse the Cross platform aviation
 using big-data methods analyse the Cross platform aviation using big-data methods analyse the Cross platform aviation
using big-data methods analyse the Cross platform aviation
 
The Data Asset
The Data AssetThe Data Asset
The Data Asset
 
Needle in the Haystack by Anshul Vikram Pandey at QuantCon 2016
Needle in the Haystack by Anshul Vikram Pandey at QuantCon 2016Needle in the Haystack by Anshul Vikram Pandey at QuantCon 2016
Needle in the Haystack by Anshul Vikram Pandey at QuantCon 2016
 
Big Data: Smart Technologies Provide Big Opportunities
Big Data: Smart Technologies Provide Big OpportunitiesBig Data: Smart Technologies Provide Big Opportunities
Big Data: Smart Technologies Provide Big Opportunities
 
Understanding Big Data so you can act with confidence
Understanding Big Data so you can act with confidenceUnderstanding Big Data so you can act with confidence
Understanding Big Data so you can act with confidence
 
TPA
TPATPA
TPA
 
That's not a metric! Data for cloud-native success
That's not a metric! Data for cloud-native successThat's not a metric! Data for cloud-native success
That's not a metric! Data for cloud-native success
 

Andere mochten auch

Containerizing Distributed Pipes
Containerizing Distributed PipesContainerizing Distributed Pipes
Containerizing Distributed Pipesinside-BigData.com
 
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at UberWSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at UberWSO2
 
2016.10 HPDA in Precision Medicine
2016.10 HPDA in Precision Medicine2016.10 HPDA in Precision Medicine
2016.10 HPDA in Precision MedicineMichael Atkins
 
Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Andrea Dal Pozzolo
 
Business Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIA
Business Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIABusiness Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIA
Business Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIAVARINDIA
 
EMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road AheadEMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road Aheadinside-BigData.com
 
Best Practices: Large Scale Multiphysics
Best Practices: Large Scale MultiphysicsBest Practices: Large Scale Multiphysics
Best Practices: Large Scale Multiphysicsinside-BigData.com
 
Maximizing HPC Compute Resources with Minimal Cost
Maximizing HPC Compute Resources with Minimal CostMaximizing HPC Compute Resources with Minimal Cost
Maximizing HPC Compute Resources with Minimal Costinside-BigData.com
 
IDC España Predictions 2014
IDC España Predictions 2014IDC España Predictions 2014
IDC España Predictions 2014Lluis Altes
 
Content marketing in the B2B customer journey
Content marketing in the B2B customer journeyContent marketing in the B2B customer journey
Content marketing in the B2B customer journeyHeadline.nl
 
Modern Computing: Cloud, Distributed, & High Performance
Modern Computing: Cloud, Distributed, & High PerformanceModern Computing: Cloud, Distributed, & High Performance
Modern Computing: Cloud, Distributed, & High Performanceinside-BigData.com
 
SC16 Student Cluster Competition Configurations & Results
SC16 Student Cluster Competition Configurations & ResultsSC16 Student Cluster Competition Configurations & Results
SC16 Student Cluster Competition Configurations & Resultsinside-BigData.com
 
Towards Exascale Computing with Fortran 2015
Towards Exascale Computing with Fortran 2015Towards Exascale Computing with Fortran 2015
Towards Exascale Computing with Fortran 2015inside-BigData.com
 
Conflictmanagement
ConflictmanagementConflictmanagement
Conflictmanagementamit singh
 
Don't Fall Into a Trap: How Business Continuity Management Can Help Data Brea...
Don't Fall Into a Trap: How Business Continuity Management Can Help Data Brea...Don't Fall Into a Trap: How Business Continuity Management Can Help Data Brea...
Don't Fall Into a Trap: How Business Continuity Management Can Help Data Brea...IBM Services
 

Andere mochten auch (19)

2016 IDC HPC Market Update
2016 IDC HPC Market Update2016 IDC HPC Market Update
2016 IDC HPC Market Update
 
Containerizing Distributed Pipes
Containerizing Distributed PipesContainerizing Distributed Pipes
Containerizing Distributed Pipes
 
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at UberWSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
 
HPC at HP Update
HPC at HP UpdateHPC at HP Update
HPC at HP Update
 
2016.10 HPDA in Precision Medicine
2016.10 HPDA in Precision Medicine2016.10 HPDA in Precision Medicine
2016.10 HPDA in Precision Medicine
 
Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?
 
Business Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIA
Business Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIABusiness Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIA
Business Wizard Of The Year : Mr. AMAR BABU,M.D.-LENOVO INDIA
 
EMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road AheadEMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road Ahead
 
Best Practices: Large Scale Multiphysics
Best Practices: Large Scale MultiphysicsBest Practices: Large Scale Multiphysics
Best Practices: Large Scale Multiphysics
 
Maximizing HPC Compute Resources with Minimal Cost
Maximizing HPC Compute Resources with Minimal CostMaximizing HPC Compute Resources with Minimal Cost
Maximizing HPC Compute Resources with Minimal Cost
 
IDC España Predictions 2014
IDC España Predictions 2014IDC España Predictions 2014
IDC España Predictions 2014
 
Content marketing in the B2B customer journey
Content marketing in the B2B customer journeyContent marketing in the B2B customer journey
Content marketing in the B2B customer journey
 
It's Time to ROCm!
It's Time to ROCm!It's Time to ROCm!
It's Time to ROCm!
 
Modern Computing: Cloud, Distributed, & High Performance
Modern Computing: Cloud, Distributed, & High PerformanceModern Computing: Cloud, Distributed, & High Performance
Modern Computing: Cloud, Distributed, & High Performance
 
SC16 Student Cluster Competition Configurations & Results
SC16 Student Cluster Competition Configurations & ResultsSC16 Student Cluster Competition Configurations & Results
SC16 Student Cluster Competition Configurations & Results
 
Idc predictions 2016
Idc predictions 2016Idc predictions 2016
Idc predictions 2016
 
Towards Exascale Computing with Fortran 2015
Towards Exascale Computing with Fortran 2015Towards Exascale Computing with Fortran 2015
Towards Exascale Computing with Fortran 2015
 
Conflictmanagement
ConflictmanagementConflictmanagement
Conflictmanagement
 
Don't Fall Into a Trap: How Business Continuity Management Can Help Data Brea...
Don't Fall Into a Trap: How Business Continuity Management Can Help Data Brea...Don't Fall Into a Trap: How Business Continuity Management Can Help Data Brea...
Don't Fall Into a Trap: How Business Continuity Management Can Help Data Brea...
 

Ähnlich wie IDC Perspectives on Big Data Outside of HPC

Ähnlich wie IDC Perspectives on Big Data Outside of HPC (20)

R180305120123
R180305120123R180305120123
R180305120123
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big Data
 
Fundamentals of Big Data
Fundamentals of Big DataFundamentals of Big Data
Fundamentals of Big Data
 
Big Data
Big DataBig Data
Big Data
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar Semwal
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Big Data, Big Deal: For Future Big Data Scientists
Big Data, Big Deal: For Future Big Data ScientistsBig Data, Big Deal: For Future Big Data Scientists
Big Data, Big Deal: For Future Big Data Scientists
 
5. big data vs it stki - pini cohen
5. big data vs  it    stki - pini cohen5. big data vs  it    stki - pini cohen
5. big data vs it stki - pini cohen
 
The book of elephant tattoo
The book of elephant tattooThe book of elephant tattoo
The book of elephant tattoo
 
Big data business case
Big data   business caseBig data   business case
Big data business case
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big data and Internet
Big data and InternetBig data and Internet
Big data and Internet
 
Big data
Big dataBig data
Big data
 
Big Data 2.0
Big Data 2.0Big Data 2.0
Big Data 2.0
 
Capturing big value in big data
Capturing big value in big data Capturing big value in big data
Capturing big value in big data
 
Big Data et eGovernment
Big Data et eGovernmentBig Data et eGovernment
Big Data et eGovernment
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigData
 
Big dataimplementation hadoop_and_beyond
Big dataimplementation hadoop_and_beyondBig dataimplementation hadoop_and_beyond
Big dataimplementation hadoop_and_beyond
 
Kartikey tripathi
Kartikey tripathiKartikey tripathi
Kartikey tripathi
 
BigDataFinal.pptx
BigDataFinal.pptxBigDataFinal.pptx
BigDataFinal.pptx
 

Mehr von inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

Mehr von inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
IDC Perspectives on Big Data Outside of HPC

  • 1. Jul-13© 2013 IDC IDC’s Perspective On Big Data Outside Of HPC
  • 2. Big Data: A General Definition
      Value +
      - Lots of data
      - Time critical
      - Multiple types (e.g., numbers, text, video)
      - Worth something to someone
  • 3. Defining Big Data: For the Broader IT Market
  • 4. Top Drivers For Implementing Big Data
  • 5. Organizational Challenges With Big Data: Government Compared To All Others
  • 6. Big Data Software
  • 7. Big Data Software Technology Stack
  • 8. Big Data Software Shortcomings -- Today
  • 9. HPDA = BIG DATA MEETS HPC AND ADVANCED SIMULATION
  • 10. HPDA (High Performance Data Analysis): Data-Intensive Simulation and Analytics
      HPDA = tasks involving sufficient data volumes and algorithmic complexity to require HPC resources/approaches
      - Established (simulation) or newer (analytics) methods
      - Structured data, unstructured data, or both
      - Regular (e.g., Hadoop) or irregular (e.g., graph) patterns
      - Government, industry, or academia
      - Upward extensions of commercial business problems
      - Accumulated results of iterative problem-solving methods (e.g., stochastic modeling, parametric modeling)
  • 11. HPDA Market Drivers
      - More input data (ingestion)
          - More powerful scientific instruments/sensor networks
          - More transactions/higher scrutiny (fraud, terrorism)
      - More output data for integration/analysis
          - More powerful computers
          - More realism
          - More iterations in available time
      - Real time, near-real time requirements
          - Catch fraud before it hits credit cards
          - Catch terrorists before they strike
          - Diagnose patients before they leave the office
          - Provide insurance quotes before callers leave the phone
      - The need to pose more intelligent questions
          - Smarter mathematical models and algorithms
  • 12. Data Movement Is Expensive: In Energy and Time-to-Solution
      Energy Consumption
      - 1MW ≈ $1 million
      - Computing 1 calculation ≈ 1 picojoule
      - Moving 1 calculation = up to 100 picojoules
      - => It can take 100 times more energy to move the results of a calculation than to perform the calculation in the first place.
      Strategies
      - Accelerate data movement (bandwidth, latency)
      - Minimize data movement (e.g., data reduction, in-memory compute, in-storage compute, etc.)
  • 13. Different Systems for Different Jobs
      Partitionable Big Data Work
      - Most jobs are here!
      - Goal: search
      - Regular access patterns (locality)
      - Global memory not important
      - Standard clusters + Hadoop, Cassandra, etc.
      Non-Partitionable Work
      - Toughest jobs (e.g., graphing)
      - Goal: discovery
      - Irregular access patterns
      - Global memory very important
      - Systems turbo-charged for data movement + graphing
      Versus: HPC architectures today are compute-centric (FLOPS vs. IOPS)
  • 14. IDC HPDA Server Forecast
      - Fast growth from a small starting point
      - In 2015, conservatively approaching $1B
  • 15. END-USE EXAMPLES OF BIG DATA TODAY
  • 16. Some Major Use Cases for HPDA
      - Fraud/error detection across massive databases
          - A horizontal use, applicable in many domains
      - National security/crime-fighting
          - SIGINT/anomaly detection/anti-hacking
          - Anti-terrorism (including evacuation planning)/anti-crime
      - Health care/medical informatics
          - Drug design, personalized medicine
          - Outcomes-based diagnosis & treatment planning
          - Systems biology
      - Customer acquisition/retention
      - Smart electrical grids
      - Design of social network architectures
  • 17. Use Case: PayPal Fraud Detection / Internet Commerce
      Slides and permission provided by PayPal, an eBay company
  • 18. The Problem
      Finding suspicious patterns that we don’t even know exist in related data sets.
  • 19. What Kind of Volume? PayPal’s Data Volumes And HPDA Requirements
  • 20. Where PayPal Used HPC
  • 21. The Results
      - $710 million saved in fraud that they wouldn’t have been able to detect before (in the first year)
  • 22. GEICO: Real-Time Insurance Quotes
      - Problem: Need accurate automated phone quotes in 100ms. They couldn’t do these calculations nearly fast enough on the fly.
      - Solution: Each weekend, use a new HPC cluster to pre-calculate quotes for every American adult and household (60 hour run time)
  • 23. Global Courier Service: Fraud/Error Detection
      Here’s a real-world example of one of the biggest names in global package delivery. Their problem is not so different from PayPal’s. This courier service is doing real-time fraud detection on huge volumes of packages that come into their sorting facility from many locations and leave the facility for many other locations around the world.
      - Check 1 billion-plus packages per hour in central sorting facility
      - Benchmark won by an HPC vendor with a turbo-charged interconnect and memory system
  • 24. Apollo Group/University of Phoenix: Student Recruitment and Retention
      Apollo Group is approaching 300,000 online students. To maintain and grow, they have to target millions of prospective students.
      - Must target millions of potential students
      - Must track student performance for early identification of potential dropouts (“churn” is very expensive)
      - Solution: sophisticated, cluster-based Big Data models
  • 25. They use the cloud for this High Performance Data Analysis problem -- that’s not so surprising, since molecular dynamics codes are often highly parallel.
  • 27. Optum + Mayo Initiative to Move Past Procedures-Based Healthcare
      You may have seen the recent news that Optum, which is part of United Health Group, is teaming with the Mayo Clinic to build a large center ($500K) in Cambridge, Massachusetts to lay the research groundwork for outcomes-based medicine.
      - Data: 100M United Health Group claims (20 years) + 5M Mayo Clinic archived patient records. Option for genomic data
      - Findings will be published
      - Goal: outcomes-based care
  • 29. Summary: HPDA Market Opportunity
      - HPDA: simulation + newer high-performance analytics
          - IDC predicts fast growth from a small starting point
      - HPC and high-end commercial analytics are converging
          - Algorithmic complexity is the common denominator
          - Technologies will evolve greatly
      - Economically important use cases are emerging
      - No single HPC solution is best for all problems
          - Clusters with MR/Hadoop will handle most but not all work (e.g., graph analysis)
          - New technologies will be required in many areas
      - IDC believes our growth estimates could be conservative
  • 30. HPDA User Talks: HPC User Forums, UK, Germany, France, China and U.S.
      - HPC in Evolutionary Biology, Andrew Meade, University of Reading
      - HPC in Pharmaceutical Research: From Virtual Screening to All-Atom Simulations of Biomolecules, Jan Kriegl, Boehringer-Ingelheim
      - European Exascale Software Initiative, Jean-Yves Berthou, Electricite de France
      - Real-time Rendering in the Automotive Industry, Cornelia Denk, RTT-Munich
      - Data Analysis and Visualization for the DoD HPCMP, Paul Adams, ERDC
      - Why HPCs Hate Biologists, and What We're Doing About It, Titus Brown, Michigan State University
      - Scalable Data Mining and Archiving in the Era of the Square Kilometre Array, the Square Kilometre Array Telescope Project, Chris Mattmann, NASA/JPL
      - Big Data and Analytics in HPC: Leveraging HPC and Enterprise Architectures for Large Scale Inline Transactional Analytics in Fraud Detection at PayPal, Arno Kolster, PayPal, an eBay Company
      - Big Data and Analytics Vendor Panel: How Vendors See Big Data Impacting the Markets and Their Products/Services, Panel Moderator: Chirag Dekate, IDC
      - Data Analysis and Visualization of Very Large Data, David Pugmire, ORNL
      - The Impact of HPC and Data-Centric Computing in Cancer Research, Jack Collins, National Cancer Institute
      - Urban Analytics: Big Cities and Big Data, Paul Muzio, City University of New York
      - Stampede: Intel MIC And Data-Intensive Computing, Jay Boisseau, Texas Advanced Computing Center
      - Big Data Approaches at Convey, John Leidel
      - Cray Technical Perspective On Data-Intensive Computing, Amar Shan
      - Data-intensive Computing Research At PNNL, John Feo, Pacific Northwest National Laboratory
      - Trends in High Performance Analytics, David Pope, SAS
      - Processing Large Volumes of Experimental Data, Shane Canon, LBNL
      - SGI Technical Perspective On Data-Intensive Computing, Eng Lim Goh, SGI
      - Big Data and PLFS: A Checkpoint File System For Parallel Applications, John Bent, EMC
      - HPC Data-intensive Computing Technologies, Scott Campbell, Platform/IBM
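
The energy figures on slide 12 lend themselves to a back-of-the-envelope calculation. The sketch below is only an illustration under stated assumptions: it takes the quoted 1 picojoule per computed result and the quoted upper bound of 100 picojoules per moved result at face value, and the workload size and the `moved_fraction` parameter are hypothetical, not taken from the deck.

```python
# Per-result energy figures quoted on slide 12 (picojoules).
COMPUTE_PJ = 1.0      # computing one result
MOVE_PJ_MAX = 100.0   # moving one result; the slide says "up to"


def energy_joules(n_results: int, moved_fraction: float) -> float:
    """Upper-bound energy to compute n_results and move a fraction of them."""
    if not 0.0 <= moved_fraction <= 1.0:
        raise ValueError("moved_fraction must be between 0 and 1")
    picojoules = n_results * (COMPUTE_PJ + moved_fraction * MOVE_PJ_MAX)
    return picojoules * 1e-12  # 1 pJ = 1e-12 J


# Hypothetical workload of 10**12 results (an assumption for illustration).
move_everything = energy_joules(10**12, 1.0)
move_one_tenth = energy_joules(10**12, 0.1)  # e.g. after data reduction
print(f"move all results:    {move_everything:.1f} J")
print(f"move 10% of results: {move_one_tenth:.1f} J")
```

Under these assumptions, moving every result costs about 101 J, while moving only a tenth of them costs about 11 J, which is why the slide lists minimizing data movement alongside accelerating it.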

Editor's notes

  1. Here’s a general definition of Big Data using the schema of the “four V’s” that’s become familiar. This isn’t specific to high performance data analysis. It applies to Big Data across all markets. To qualify as Big Data in this general context, the data set has to be large in volume, critical to analyze in a timeframe... It has to include multiple types of data and it has to be worthwhile to someone, preferably with a monetary value.
  2. The emerging market for high performance data analysis is narrower than that. As I said a minute ago, it’s the market being formed by the convergence of data-intensive simulation and data-intensive analytical methods, so it’s really a union set. As the slide shows, this evolving market is very inclusive in relation to methods, types of data, and market sectors. The common denominator across these segments is the use of models that incorporate algorithmic complexity. You typically don’t find that kind of algorithmic complexity in online transaction processing or in commercial applications such as supply chain management and customer relationship management. The ultimate criterion for HPDA is that it requires HPC resources.
  3. There are important HPDA market drivers on the data ingestion side and the data output side. Data sources have become much more powerful. CERN’s Large Hadron Collider generates 1PB/second when it’s running. The Square Kilometre Array telescope will produce 1EB/day when it becomes operational in 2016. But those are extreme examples. Much more common are sensor networks for power grids and other things, gene sequencers, MRI machines, and so on. Online sales transactions produce a lot of data and a lot of opportunity for fraud. Standards, regulations and lawsuits are on the rise. Boeing stores all its engineering data for the 30-year lifetime of their commercial airplanes, not just as a reference for designing future planes but in case there’s a crash and a lawsuit. On the output side, more powerful HPC systems are kicking out lots more data in response to the growing user requirements you see listed here.
  4. Moving data costs time and money. Energy has become very expensive. It can take 100 times more energy to move the results of a calculation than to perform the calculation in the first place. It’s no wonder that oil and gas companies, for example, still rely heavily on courier services for overnight shipping of disk drives. It would take too long and cost too much to send the data over a computer network. If you’re a vendor, you have two main strategies available to you: you can speed up data movement, mainly through better interconnects, or you can minimize data movement by pre-filtering data or bringing the compute to the data. You can also do both, accelerating and minimizing at the same time.
  5. The data in most HPDA jobs assigned to HPC resources will continue to have regular access patterns, whether the data is structured or unstructured. This means it can be partitioned and mapped onto a standard cluster or other distributed memory machine for running Hadoop or other software. But there’s a rising tide of data work that exhibits irregular access patterns and can’t take advantage of data locality processing features. Caches are highly inefficient for jobs like this. These jobs benefit from global memory combined with powerful interconnects and other data movement capabilities. Partitionable jobs are very important now and non-partitionable jobs are becoming more important. By the way, SGI systems address both types. One general remark is that as the data analysis side of HPC expands, HPC architectures will need to become less compute-centric and offer more support for data integration and analysis. “Many current approaches to big data have been about ‘search’ – the ability to efficiently find something that you know is there in your data,” said Arvind Parthasarathi, President of YarcData. “uRiKA was purposely built to solve the problem of ‘discovery’ in big data – to discover things, relationships or patterns that you don’t know exist. By giving organizations the ability to do much faster hypothesis validation at scale and in real time, we are enabling the solution of business problems that were previously difficult or impossible – whether it be discovering the ideal patient treatment, investigating fraud, detecting threats, finding new trading algorithms or identifying counter-party risk. Basically, we are systematizing serendipity.”
  6. HPC servers are often used for more than one purpose. IDC classifies HPC servers according to the primary purpose they’re used for. So, an HPDA server is one that’s used more than 50% for HPDA work. As this table shows, IDC forecasts that revenue for HPC servers acquired primarily for HPDA use will grow robustly (10.4% CAGR) to approach $1 billion in 2015. Because HPDA revenue starts as such a relatively small chunk of overall HPC server revenue, the HPDA share of the overall HPC server revenue will still be in the single digits in 2015, despite the fast growth rate.
  7. Let’s look at some real-world use cases.
  8. This slide lists some of the most prominent use cases, meaning ones where repeated sales of HPC products have been happening. Fraud detection and life sciences are emerging fastest. BTW, I didn’t include financial services here because we’ve been tracking back-office FSI analytics as part of the HPC market for more than 20 years. But FSI is an important part of the high performance data analysis market, though not an easy one to penetrate for the first time.
  9. I want to zero in more on the PayPal example because they gave me permission to use these slides and because in many ways they are representative of a larger group of commercial companies whose business requirements are pushing them up into HPC. The slides are from a talk PayPal gave at IDC’s September 2012 HPC User Forum meeting in Dearborn, Michigan. By the way, if you want a copy of this talk or any of the long list of talks on one of our first slides, just email me at sconway [at] idc.com
  10. PayPal is an eBay subsidiary and, among other things, has responsibility for detecting fraud across eBay and Skype. Five years ago, a day's worth of data was processed in batch processing overnight and fraud wasn't detected until as much as two weeks later. They realized they needed to detect fraud in real time, and for that they needed graph analysis. They were most interested in checking out collusion between multiple parties, such as when a credit card shows activity from four or more users. They needed to be able to stop that before the credit card got hit. IBM Watson on the Jeopardy game show was amazing but it was a needle in a haystack problem, meaning that Watson could only find answers that were already in its database. PayPal’s problem was different, because there was no visible needle to be found. Graph analysis let them uncover hidden relationships and behavior patterns.
  11. This gives you an idea of PayPal’s data volumes and HPDA requirements. These are going up all the time.
  12. Here’s what PayPal is using. For the serious fraud detection and analysis, they’re using SGI servers and storage on an InfiniBand network. For the less-challenging work that doesn’t involve pattern discovery and real-time requirements, they’re running Hadoop on a cluster. By the way, PayPal says HPC has already saved them $710 million in fraud they wouldn’t have been able to detect before.
  13. This gives you an idea of PayPal’s data volumes and HPDA requirements. These are going up all the time.
  14. For cost and growth reasons, GEICO moved to automated insurance quotes on the phone. They needed to provide quotes instantaneously, in 100 milliseconds or less. They couldn’t do these calculations nearly fast enough on the fly. GEICO’s solution was to install an HPC system and every weekend run updated quotes for every adult and every household in the United States. That takes 60 wall clock hours today. The phones tap into the stored quotes and return the correct one in 100 milliseconds.
  15. Here’s a real-world example of one of the biggest names in global package delivery. Their problem is not so different from PayPal’s. This courier service is doing real-time fraud detection on huge volumes of packages that come into their sorting facility from many locations and leave the facility for many other locations around the world. They ran a difficult benchmark. The winner hasn’t been publicly announced yet, but IDC’s back channels tell us the vendor has a 3-letter name that starts with S.
  16. Schrödinger is a global life sciences software company with offices in Munich and Mannheim. One of the major things they do is use molecular dynamics to identify promising candidates for new drugs to combat cancer and other diseases – and it seems they’ve been using the cloud for this High Performance Data Analysis problem. That’s not so surprising, since molecular dynamics codes are often highly parallel.
  17. Here’s the architecture they used. Note that they were already using HPC in their on-premises data center, but the resources weren’t big enough for this task. That’s why they burst out to Amazon EC2 using a software management layer from Cycle Computing to access more than 50,000 additional cores. Bringing a new drug to market can cost as much as £10 billion and a decade of time, so security is a major concern with commercial drug discovery. Apparently, Schrödinger felt confident about the cloud security measures.
  18. You may have seen the recent news that Optum, which is part of United Health Group, is teaming with the Mayo Clinic to build a huge center in Cambridge, Massachusetts to lay the research groundwork for outcomes-based medicine. They’ll have more than 100 million patient records at their disposal for this enormous data-intensive work. They’ll be using data-intensive methods to look at other aspects of health care, too. A week ago, United Health issued a press release in which they said they believe that improved efficiencies alone could reduce Medicare costs by about 40%, obviating much of the need for the major reforms the political parties have been fighting about.
  19. In the U.S., the largest urban gangs are the Crips and the Bloods. They’re rival gangs that are at each other’s throats all the time, fighting for money and power. Both gangs are national in scope, but the national organizations aren’t that strong. The branches of these gangs in each city have a lot of autonomy to do what they want. What you see here, again in blurred form, was something that astounded the police department of Atlanta, Georgia, a city with about 4 million inhabitants. Through real-time monitoring of social networks, they were able to witness, as it happened, the planned merger of these rival gangs in their city. This information allowed the police to adapt their own plans accordingly.
  20. In summary, we defined HPDA and told you that IDC is forecasting rapid growth from a small base. HPDA is about the convergence of data-intensive HPC and high-end commercial analytics. One of the most interesting aspects of this, to us, is that the demands of the commercial market are moving this along faster in the commercial sector than in the traditional HPC market. PayPal is a great example of this (story of how PayPal was shy about presenting at User Forum – both sides should be learning from each other). On the analytics side, some attractive use cases are already out there. In the time allotted to us here, we described some of the more prominent ones, but there are many others. Most of the work will be done on clusters, but some economically important use cases need more capable architectures, especially for graph analytics. Many of the large investment firms are IDC clients, so our growth estimates tend to err on the side of conservatism. There is potential for the HPDA market to grow faster than our current forecast. But we talk with a lot of people and we update the forecasts often, so we don’t get too far off the mark.
  21. This is a partial list of the user and vendor talks on this topic that we’ve lined up in the past two years as part of the HPC User Forum. IDC has operated the HPC User Forum since 1999 for a volunteer steering committee made up of senior HPC people from government, industry and academia – organizations like Boeing, GM, Ford, NSF and others. We hold meetings across the world, and the talks listed here include perspectives on High Performance Data Analysis from the Americas, Europe and Asia. I’ll ask Chirag to explain how we define High Performance Data Analysis. I’ll return later to walk you through some real-world use cases. Chirag...
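
Note 10 mentions one concrete collusion signal: a credit card that shows activity from four or more users. As a minimal sketch of that single check, and not a description of PayPal's actual system, the snippet below treats transactions as edges between users and cards and flags cards linked to many distinct users. The function name, the `(user_id, card_id)` record layout, and the sample data are all hypothetical.

```python
from collections import defaultdict


def cards_with_many_users(transactions, min_users=4):
    """Return {card_id: set of user_ids} for cards used by >= min_users users.

    `transactions` is an iterable of (user_id, card_id) pairs, i.e. the
    edges of a user-card graph. The threshold of four follows note 10.
    """
    users_by_card = defaultdict(set)
    for user_id, card_id in transactions:
        users_by_card[card_id].add(user_id)
    return {
        card: users
        for card, users in users_by_card.items()
        if len(users) >= min_users
    }


# Hypothetical sample data, for illustration only.
sample = [
    ("u1", "card-A"), ("u2", "card-A"), ("u3", "card-A"), ("u4", "card-A"),
    ("u5", "card-B"), ("u6", "card-B"),
    ("u7", "card-C"), ("u7", "card-C"),  # repeat use by the same user
]
flagged = cards_with_many_users(sample)
print(sorted(flagged))
```

This only counts direct user-to-card links. The discovery problem the notes describe, finding relationships nobody knew to look for, involves multi-hop patterns across much larger graphs, which is where the irregular access patterns discussed in note 5 come from.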