Big Science, Big Data
         Simon Metson
      simon@cloudant.com
Outline
• Who’s this guy?
• Computing for the LHC experiments
• NoSQL tools
• Landslide modelling
• Issues arising for Universities
Who am I?
• Until March I was a research associate
  working on computing for CMS, one of
  the LHC experiments at CERN
• In March I began transitioning to
  working for Cloudant as an “ecology
  engineer”
• Have dealt with multi-petabyte datasets
  for the last 10 years
The CMS Experiment
Workflow ladder
(Axis label: number of users. Tiers are grouped by where the work is done.)
•   Use Grid compute and storage exclusively:
    •   Large datasets (>100 TB), complex computation
    •   Large datasets (>100 TB), simple computation
    •   Shared datasets (>500 GB), complex computation
•   Work on departmental resources, store resulting datasets to Grid storage:
    •   Shared datasets (10-500 GB), complex computation
    •   Shared datasets (10-100 GB), simple computation
•   Work on laptop/desktop machines, store resulting datasets to local or Grid storage:
    •   Shared datasets (0.1-10 GB), simple computation
    •   Private datasets (0.1-10 GB), simple computation
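The ladder is effectively a decision rule: given a dataset size and the kind of computation, it says where the work should run. The sketch below encodes the slide's tiers as a lookup table. It is illustrative only: the names `LADDER` and `where_to_run` are hypothetical, 1 TB is taken as 1000 GB, and the inclusive-lower/exclusive-upper boundary handling is an assumption because the slide does not specify it.

```python
from typing import Optional

# Hypothetical encoding of the "Workflow ladder" slide.
# Each row: (min_gb, max_gb, computation, private, advice).
# max_gb=None means unbounded; private=None means the slide does not say
# whether the data is shared or private. Assumes 1 TB = 1000 GB.
LADDER = [
    (100_000, None, "complex", None, "Grid compute and storage exclusively"),
    (100_000, None, "simple", None, "Grid compute and storage exclusively"),
    (500, None, "complex", False, "Grid compute and storage exclusively"),
    (10, 500, "complex", False, "departmental resources; results to Grid storage"),
    (10, 100, "simple", False, "departmental resources; results to Grid storage"),
    (0.1, 10, "simple", False, "laptop/desktop; results to local or Grid storage"),
    (0.1, 10, "simple", True, "laptop/desktop; results to local or Grid storage"),
]


def where_to_run(size_gb: float, computation: str, private: bool = False) -> Optional[str]:
    """Return the ladder's advice for a workflow, or None if no tier matches.

    Lower bounds are inclusive and upper bounds exclusive (an assumption).
    """
    for min_gb, max_gb, comp, row_private, advice in LADDER:
        if comp != computation:
            continue
        if row_private is not None and row_private != private:
            continue
        if size_gb >= min_gb and (max_gb is None or size_gb < max_gb):
            return advice
    return None


print(where_to_run(50, "complex"))
print(where_to_run(1_000, "simple"))
```

Returning `None` rather than guessing makes the gaps in the ladder explicit: for example, the slide places no tier for simple computation on datasets between 100 GB and 100 TB, so the second call finds no match.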
Warning: Obligatory
formula on next slide
The formula
[Formula shown as an image and not recoverable from the text; later builds of the slide annotate parts of it as "Fixed" and "Usually fixed".]
People are important
• Be nice to people working on weekends
• The “cost” of a person is one place you
  can make savings, e.g. by giving them
  the ability to do more
• Building a suitable team is hard, takes
  time and is essential for success
Observation
•   What's interesting is that big data isn't
    interesting any more
    •   Unless you are of a similar scale to
        Google, you don't need to write your
        own system
    •   That doesn't mean it's easy, though!
•   A terabyte was quite a lot 10 years ago;
    now it fits on commodity hardware
When all you have is a hammer,
 everything looks like a nail
General NoSQL
     observations
• Good for startups with limited resources
  and exposure to risk
• Good for large companies who build
  data centres with lots of loosely related
  data and large DevOps teams
• How does this fit with University
  researchers?
LHC computing
       evolution
• Our current system works, but at a high
  staff cost
• Expect the system to be simplified, with
  bespoke components retired in favour of
  generic tools
Why are landslides
   an issue?
Use cases
• Needs to be usable by geographically
  dispersed, non-expert field engineers
• Needs an expert approval step
• Needs to be accessible on low-end
  hardware
• Needs to run thousands of simulations per
  slope/storm and analyse the result data
Aside: Complexity
Slopes  Cut slope angles  Stochastic parameters  Variations per parameter  Output files  Runtime
     1                 1                      0                         0             1  0.25 CPU hours
     1                25                      0                         0            25  6.25 CPU hours
     1                25                      5                        10         1,250  312.5 CPU hours
   100                25                      5                        10       125,000  31,250 CPU hours (about 3.5 years)
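The table's arithmetic can be reproduced in a few lines. This is a sketch under assumptions read off the table rather than stated on the slides: every simulation writes one output file and costs 0.25 CPU hours, and each stochastic parameter is varied on its own (5 parameters × 10 values = 50 runs per angle), not over the full 10^5 grid of combinations. The function names are hypothetical.

```python
# Reproduces the arithmetic of the "Aside: Complexity" table.
# Assumptions (inferred from the table): one output file per simulation,
# 0.25 CPU hours per simulation, parameters varied one at a time.

CPU_HOURS_PER_RUN = 0.25
HOURS_PER_YEAR = 24 * 365


def simulation_count(slopes: int, cut_slope_angles: int,
                     stochastic_params: int, variations_per_param: int) -> int:
    """Number of simulations (and output files) for a single storm."""
    # With no stochastic parameters there is still one deterministic run per angle.
    stochastic_runs = stochastic_params * variations_per_param if stochastic_params else 1
    return slopes * cut_slope_angles * stochastic_runs


def runtime_cpu_hours(runs: int) -> float:
    """Total CPU time for a batch of simulations."""
    return runs * CPU_HOURS_PER_RUN


for row in [(1, 1, 0, 0), (1, 25, 0, 0), (1, 25, 5, 10), (100, 25, 5, 10)]:
    runs = simulation_count(*row)
    hours = runtime_cpu_hours(runs)
    print(f"{row}: {runs} files, {hours} CPU hours, {hours / HOURS_PER_YEAR:.2f} CPU-years")
```

The last row works out to about 3.57 CPU-years on 365-day years, which the slide quotes as 3.5. Because the count is a plain product, adding storms, parameters or software versions multiplies it again.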
Aside: Complexity
• The figures above are for a single storm;
  realistic studies simulate many
• Can easily have more stochastic
  parameters, or vary them in a finer-grained
  manner
• May want to compare results across software
  versions, which requires standard datasets
Problem solved?
Use cases (as above): impossible scale with current tools/manpower
Design schematic
[Diagram components: Geographers (validate input data); Job submission daemon; many simulation jobs, which write results; Field engineers; Government. A second build of the slide adds a "Replicate" step and the label "Upload measurements via …" beside the field engineers.]
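The schematic names its components but not how they are implemented. The sketch below shows one hypothetical way the flow could fit together: field engineers submit slope measurements, an expert approval flag gates each submission, and a job submission daemon fans each approved submission out into simulation jobs. `Submission`, `Job`, `expand_jobs` and `daemon_pass` are illustrative names, and in-memory lists stand in for whatever storage and replication a real system would use.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Optional, Tuple


@dataclass
class Submission:
    """Hypothetical record a field engineer uploads for one slope."""
    slope_id: str
    measurements: Dict[str, float] = field(default_factory=dict)
    approved: bool = False  # set by a geographer after validating the input


@dataclass
class Job:
    slope_id: str
    cut_angle: float
    param: Optional[Tuple[str, float]]  # (name, value), or None for a deterministic run


def expand_jobs(sub: Submission, cut_angles: List[float],
                stochastic: Dict[str, List[float]]) -> List[Job]:
    """Fan one approved submission out into simulation jobs.

    Each stochastic parameter is varied on its own, which matches the
    slopes x angles x parameters x variations count in the complexity table.
    """
    if not sub.approved:
        return []  # expert approval gate: unvalidated input never reaches the compute
    variations: List[Optional[Tuple[str, float]]] = [
        (name, value) for name, values in stochastic.items() for value in values
    ] or [None]
    return [Job(sub.slope_id, angle, p) for angle, p in product(cut_angles, variations)]


def daemon_pass(queue: List[Submission], cut_angles: List[float],
                stochastic: Dict[str, List[float]]) -> List[Job]:
    """One polling pass of a hypothetical job submission daemon."""
    jobs: List[Job] = []
    for sub in queue:
        jobs.extend(expand_jobs(sub, cut_angles, stochastic))
    return jobs
```

Placing the approval gate before the fan-out means an expert reviews one submission per slope rather than the 1,250 jobs it expands into.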
Web 3.0
Visualisation
Big Data and
       Universities
• Data-intensive research will become the
  norm (it already is in many fields)
• Universities will need access to Big
  Data resources
• Expect significant use from
  non-traditional fields
• Expect new fields to emerge
Workflow ladder
(Axis label: number of users)
• Large datasets (>100 TB), complex computation
• Large datasets (>100 TB), simple computation
• Shared datasets (>500 GB), complex computation
• Shared datasets (10-500 GB), complex computation
• Shared datasets (10-100 GB), simple computation
• Shared datasets (0.1-10 GB), simple computation
• Private datasets (0.1-10 GB), simple computation
Implications for the
        future
• The quality and scale of Big Data resources
  will directly affect the ability of
  Universities to do research at an
  international level
• Universities will need to provide
  data-intensive compute resources to
  complement traditional HPC
Implications for the
        future
• Big data clusters are architecturally very
  different from HPC clusters
• Data is stateful, so it is harder to manage
  than HPC workloads
• Interesting legal issues arise
Implications for the
        future
• Cost savings from SaaS vendors can
  be hard to realise at an institutional level
• University funding does not easily
  support building DevOps teams
Summary
• Big data is mainstream
• Should be seen as an enabling
  technology for academics
• Not trivial to adopt
• Universities need to build up teams to
  support these activities, or find ways to
  outsource them

Big Science, Big Data: Simon Metson at Eduserv Symposium 2012

Partnership Licensing - allowing access to licensed resources Partnership Licensing - allowing access to licensed resources
Partnership Licensing - allowing access to licensed resources
 
Lightning talk - EBSCO
Lightning talk - EBSCOLightning talk - EBSCO
Lightning talk - EBSCO
 
Lightning talk - Boopsie
Lightning talk - BoopsieLightning talk - Boopsie
Lightning talk - Boopsie
 
Lightning talk - Softlink
Lightning talk - SoftlinkLightning talk - Softlink
Lightning talk - Softlink
 
Lightning talk - Third Iron BrowZine
Lightning talk - Third Iron BrowZineLightning talk - Third Iron BrowZine
Lightning talk - Third Iron BrowZine
 
Lightning talk - Eduserv Chest Agreements
Lightning talk - Eduserv Chest AgreementsLightning talk - Eduserv Chest Agreements
Lightning talk - Eduserv Chest Agreements
 
Phase one of OpenAthens SP evolution
Phase one of OpenAthens SP evolutionPhase one of OpenAthens SP evolution
Phase one of OpenAthens SP evolution
 
Key considerations when mapping your end user experience
Key considerations when mapping your end user experienceKey considerations when mapping your end user experience
Key considerations when mapping your end user experience
 
Our product development methodology
Our product development methodologyOur product development methodology
Our product development methodology
 
How Readers Discover Content
How Readers Discover ContentHow Readers Discover Content
How Readers Discover Content
 
OpenAthens product update
OpenAthens product updateOpenAthens product update
OpenAthens product update
 
OpenAthens Customer Conference - Welcome address
OpenAthens Customer Conference - Welcome addressOpenAthens Customer Conference - Welcome address
OpenAthens Customer Conference - Welcome address
 
Generating leads with content marketing
Generating leads with content marketingGenerating leads with content marketing
Generating leads with content marketing
 
Pre-launch introduction to the new OpenAthens SP dashboard - 13/09/2016
Pre-launch introduction to the new OpenAthens SP dashboard - 13/09/2016Pre-launch introduction to the new OpenAthens SP dashboard - 13/09/2016
Pre-launch introduction to the new OpenAthens SP dashboard - 13/09/2016
 
Mobius from Maplesoft
Mobius from MaplesoftMobius from Maplesoft
Mobius from Maplesoft
 
QSR NVivo
QSR NVivo QSR NVivo
QSR NVivo
 
How Eduserv are helping local government organisations
How Eduserv are helping local government organisationsHow Eduserv are helping local government organisations
How Eduserv are helping local government organisations
 
Is cloud the right fit for your needs?
Is cloud the right fit for your needs?Is cloud the right fit for your needs?
Is cloud the right fit for your needs?
 
Planning your cloud strategy: Adur and Worthing Councils
Planning your cloud strategy: Adur and Worthing CouncilsPlanning your cloud strategy: Adur and Worthing Councils
Planning your cloud strategy: Adur and Worthing Councils
 

Kürzlich hochgeladen

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 

Kürzlich hochgeladen (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

Big Science, Big Data: Simon Metson at Eduserv Symposium 2012

  • 1. Big Science, Big Data Simon Metson simon@cloudant.com
  • 2. Outline • Who’s this guy? • Computing for the LHC experiments • NoSQL tools • Landslide modelling • Issues arising for Universities
  • 3. Who am I? • Until March I was a research associate working on computing for CMS - one of the LHC experiments at CERN • In March I began transitioning to working for Cloudant as an “ecology engineer” • Have dealt with multi-petabyte datasets for the last 10 years
  • 8. Workflow ladder (axis: number of users) • Use Grid compute and storage exclusively: large datasets (>100 TB), complex computation; large datasets (>100 TB), simple computation; shared datasets (>500 GB), complex computation • Work on departmental resources, store resulting datasets to Grid storage: shared datasets (10-500 GB), complex computation; shared datasets (10-100 GB), simple computation • Work on laptop/desktop machines, store resulting datasets to local Grid storage: shared datasets (0.1-10 GB), simple computation; private datasets (0.1-10 GB), simple computation
  • 11. The formula [image not recoverable; two terms are annotated "Fixed" and "Usually fixed"]
  • 13. People are important • Be nice to people working on weekends • The “cost” of a person is one place you can make savings - e.g. by giving them the ability to do more • Building a suitable team is hard, takes time and is essential for success
  • 14. Observation • What’s interesting is that big data isn’t interesting any more • Unless you operate at a scale similar to Google’s, you don’t need to write your own system • That doesn’t mean it’s easy, though! • A terabyte was quite a lot 10 years ago; now it’s commodity hardware
  • 18. When all you have is a hammer, everything looks like a nail
  • 19. General NoSQL observations • Good for startups with limited resources and exposure to risk • Good for large companies that build data centres, hold lots of loosely related data and run large DevOps teams • How does this fit with University researchers?
  • 20. LHC computing evolution • Our current system works, but at a high staff cost • Expect the system to be simplified, with bespoke components retired in favour of generic tools
  • 22. Why are landslides an issue?
  • 23. Use cases • Must be usable by geographically dispersed, non-expert field engineers • Needs an expert approval step • Must be accessible on low-end hardware • Must run thousands of simulations per slope/storm and analyse the resulting data
  • 24. Aside: Complexity • Slopes: 1 • Cut slope angles: 1 • Stochastic parameters: 0 • Variations for each stochastic parameter: 0 • Output files: 1 • Runtime: 0.25 CPU hours
  • 25. Aside: Complexity • Slopes: 1 • Cut slope angles: 25 • Stochastic parameters: 0 • Variations for each stochastic parameter: 0 • Output files: 25 • Runtime: 6.25 CPU hours
  • 26. Aside: Complexity • Slopes: 1 • Cut slope angles: 25 • Stochastic parameters: 5 • Variations for each stochastic parameter: 10 • Output files: 1,250 • Runtime: 312.5 CPU hours
  • 27. Aside: Complexity • Slopes: 100 • Cut slope angles: 25 • Stochastic parameters: 5 • Variations for each stochastic parameter: 10 • Output files: 125,000 • Runtime: 31,250 CPU hours
  • 28. Aside: Complexity • Slopes: 100 • Cut slope angles: 25 • Stochastic parameters: 5 • Variations for each stochastic parameter: 10 • Output files: 125,000 • Runtime: 3.5 years
  • 29. Aside: Complexity • The above is for one storm; many storms are simulated • More stochastic parameters could easily be added, or existing ones varied in a more fine-grained manner • May want to compare results across software versions, which calls for standard datasets
  • 31. Use cases • Must be usable by geographically dispersed, non-expert field engineers • Needs an expert approval step • Must be accessible on low-end hardware • Must run thousands of simulations per slope/storm and analyse the resulting data • Impossible scale with current tools/manpower
  • 32. Design schematic [diagram; labels: Geographers, validate input data, job submission daemon, many Job boxes, write results, Field engineers, Government]
  • 33. Design schematic [diagram; labels: Geographers, validate input data, job submission daemon, many Job boxes, write results, Replicate, Field engineers, Government, upload measurements via …]
  • 36. Big Data and Universities • Data-intensive research will become the norm (it already is in many fields) • Universities will need access to Big Data resources • Expect significant use from non-traditional fields • Expect new fields to emerge
  • 37. Workflow ladder (repeat of slide 8)
  • 38. Workflow ladder (axis: number of users), shown without the resource annotations • Large datasets (>100 TB), complex computation • Large datasets (>100 TB), simple computation • Shared datasets (>500 GB), complex computation • Shared datasets (10-500 GB), complex computation • Shared datasets (10-100 GB), simple computation • Shared datasets (0.1-10 GB), simple computation • Private datasets (0.1-10 GB), simple computation
  • 39. Implications for the future • The quality and scale of Big Data resources will directly affect the ability of Universities to do research at an international level • Universities will need to provide data-intensive compute resources to complement traditional HPC
  • 40. Implications for the future • Big data clusters are architecturally very different from HPC clusters • Data is stateful, which makes it harder to manage than HPC workloads • Interesting legal issues arise
  • 41. Implications for the future • Cost savings from SaaS vendors can be hard to realise at an institute level • Building DevOps teams is not something University funding easily supports
  • 42. Summary • Big data is mainstream • It should be seen as an enabling technology for academics • It is not trivial to adopt • Universities need to build up teams to support these activities, or find ways to outsource them
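The landslide complexity figures on slides 24 to 28 follow from simple multiplication. The Python sketch below reproduces them under two assumptions inferred from the numbers rather than stated in the talk: each output file corresponds to one simulation run costing 0.25 CPU hours, and each stochastic parameter is varied on its own (one at a time), so every cut slope angle contributes parameters × variations runs. The function names are illustrative only.

```python
# Back-of-envelope workload estimate for the landslide simulation example.
# Assumptions (inferred from the slide figures, not stated in the talk):
#   * one run produces one output file and costs 0.25 CPU hours;
#   * stochastic parameters are varied one at a time, not in combination.

CPU_HOURS_PER_RUN = 0.25    # inferred: 1 output file took 0.25 CPU hours
HOURS_PER_YEAR = 24 * 365   # ignores leap years


def one_at_a_time_runs(slopes: int, angles: int, params: int, variations: int) -> int:
    """Runs needed when each stochastic parameter is varied separately."""
    # With no stochastic parameters there is still one deterministic run per angle.
    per_angle = max(1, params * variations)
    return slopes * angles * per_angle


def full_grid_runs(slopes: int, angles: int, params: int, variations: int) -> int:
    """Hypothetical full factorial sweep: every combination of parameter values."""
    return slopes * angles * variations ** params


def cpu_hours(runs: int) -> float:
    """Total CPU hours for a number of runs at the assumed cost per run."""
    return runs * CPU_HOURS_PER_RUN


if __name__ == "__main__":
    # (slopes, cut slope angles, stochastic parameters, variations per parameter)
    for row in [(1, 1, 0, 0), (1, 25, 0, 0), (1, 25, 5, 10), (100, 25, 5, 10)]:
        runs = one_at_a_time_runs(*row)
        hours = cpu_hours(runs)
        print(f"{row}: {runs} runs, {hours} CPU hours, "
              f"{hours / HOURS_PER_YEAR:.2f} CPU years")

    grid = full_grid_runs(100, 25, 5, 10)
    print(f"full grid: {grid} runs, {cpu_hours(grid) / HOURS_PER_YEAR:.0f} CPU years")
```

The last row works out at about 3.57 CPU years, in line with the roughly 3.5 years quoted on slide 28. `full_grid_runs` is not a case from the talk; it is included to show how quickly the workload grows if parameters are varied in combination rather than one at a time, which is the point slide 29 makes about finer-grained variation.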
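The workflow ladder on slide 8 is essentially a lookup table from dataset size, sharing and computational complexity to where the work should run. A minimal Python sketch can encode its rungs as data. The size ranges come from the slide. Several details are assumptions made for illustration: each range is treated as half-open [low, high) in GB with 1 TB = 1000 GB, the large-dataset rungs are taken to apply to shared and private data alike, and the names `Rung`, `LADDER` and `where_to_work` are hypothetical. Cases the slide does not cover, such as a simple computation over a 1 TB dataset, return `None` rather than a guessed tier.

```python
import math
from dataclasses import dataclass
from typing import Optional

GRID = "Use Grid compute and storage exclusively"
DEPARTMENTAL = "Work on departmental resources, store results to Grid storage"
DESKTOP = "Work on laptop/desktop machines, store results to local Grid storage"


@dataclass(frozen=True)
class Rung:
    min_gb: float            # inclusive lower bound (assumed)
    max_gb: float            # exclusive upper bound (assumed)
    shared: Optional[bool]   # None: the slide does not say shared or private
    complex_computation: bool
    tier: str


# Rungs transcribed in the order they appear on the slide.
LADDER = [
    Rung(100_000, math.inf, None, True, GRID),    # large (>100 TB), complex
    Rung(100_000, math.inf, None, False, GRID),   # large (>100 TB), simple
    Rung(500, math.inf, True, True, GRID),        # shared (>500 GB), complex
    Rung(10, 500, True, True, DEPARTMENTAL),      # shared (10-500 GB), complex
    Rung(10, 100, True, False, DEPARTMENTAL),     # shared (10-100 GB), simple
    Rung(0.1, 10, True, False, DESKTOP),          # shared (0.1-10 GB), simple
    Rung(0.1, 10, False, False, DESKTOP),         # private (0.1-10 GB), simple
]


def where_to_work(size_gb: float, shared: bool, complex_computation: bool) -> Optional[str]:
    """Return the tier of the first matching rung, or None if the ladder is silent."""
    for rung in LADDER:
        if (rung.min_gb <= size_gb < rung.max_gb
                and rung.complex_computation == complex_computation
                and rung.shared in (None, shared)):
            return rung.tier
    return None
```

Keeping the ladder as data rather than nested `if` statements makes the gaps in the original diagram visible instead of silently filling them.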
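Slides 32 and 33 survive only as diagram labels, so the exact data flow of the landslide system is not recoverable. As a hedged illustration of the use cases on slides 23 and 31 (uploads by non-expert field engineers, an expert approval step, thousands of runs per slope), the Python sketch below shows one way a job submission daemon could gate work on validation and approval before fanning a submission out into independent simulation jobs. Everything in it, including `Submission`, `validate`, `expand_jobs` and the job fields, is hypothetical and not taken from the talk; the fan-out varies one stochastic parameter at a time, as the complexity figures on slides 24 to 28 imply.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Submission:
    """Hypothetical record uploaded by a field engineer for one slope."""
    slope_id: str
    measurements: Dict[str, float]
    approved_by: Optional[str] = None   # set once an expert has reviewed it


def validate(sub: Submission) -> List[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []
    if not sub.measurements:
        problems.append("no measurements supplied")
    for name, value in sub.measurements.items():
        if not isinstance(value, (int, float)):
            problems.append(f"{name} is not numeric")
    return problems


def expand_jobs(sub: Submission, angles: List[float],
                stochastic: Dict[str, List[float]]) -> List[dict]:
    """Fan one validated, approved submission out into independent jobs."""
    problems = validate(sub)
    if problems:
        raise ValueError("; ".join(problems))
    if sub.approved_by is None:
        raise PermissionError("submission has not passed expert approval")
    jobs = []
    for angle in angles:
        if not stochastic:
            # Deterministic case: a single run per cut slope angle.
            jobs.append({"slope": sub.slope_id, "angle": angle})
        # Vary one stochastic parameter at a time.
        for param, values in stochastic.items():
            for value in values:
                jobs.append({"slope": sub.slope_id, "angle": angle,
                             "param": param, "value": value})
    return jobs
```

Because each job is independent, the list can be handed to any batch or Grid-style scheduler. The slides show a replication step towards field engineers and government users but not its mechanism, so storage and replication are deliberately left out of this sketch.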