Digital Data Handling with
Modern Cyberinfrastructure

             Scott Teige
        steige@indiana.edu


            October 2009
Contents
•   The trend toward “born digital” data
•   The bad old days
•   The new days
•   Examples: the new, the old




                                           Scott Teige
Trends
• The US will produce 113 million medical images in the
  next year (CNN)
• CT and MRI scans are “born digital”
• Physics has a long tradition of digital data acquisition
  which continues with, for example, the latest CERN
  experiments
• Chemistry, Biology, Geology, Communication and
  Culture, Anthropology and Economics are also producing
  increasing amounts of data
• Hard drives are down to $0.07 per gigabyte, and 8 GB thumb
  drives are handed out as SWAG at conferences.




The bad old days (~1992)




The bad old days, part 2
• Data written from the instrument to 8mm video tape (loss
  of ~5%)
• Tapes carried from DAQ computers to analysis
  computers
• Tapes carried (by courier) from the instrument building to the
  “storage” facility at BNL (Patty M.'s office bookshelves)
• 2nd pass analysis on BNL mainframes (loss ~5%)
• Tapes copied to DLT (loss ~10%)
• … years pass …
• DLT copied to HPSS (loss ~5%)
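The loss figures on this slide compound. A short sketch, assuming each stage loses its stated fraction of whatever survived the previous stage (the slide does not say how the losses combine), multiplies the surviving fractions together:

```python
# Stage losses as listed on the slide; only stages that report a loss are included.
STAGE_LOSSES = [
    ("8mm tape at the instrument", 0.05),
    ("2nd pass analysis on BNL mainframes", 0.05),
    ("copy to DLT", 0.10),
    ("copy DLT to HPSS", 0.05),
]


def surviving_fraction(losses):
    """Fraction of the original data left after every lossy stage.

    ASSUMPTION: each loss applies independently to the data that
    survived the previous stage, so the kept fractions multiply.
    """
    fraction = 1.0
    for _stage, loss in losses:
        fraction *= 1.0 - loss
    return fraction


if __name__ == "__main__":
    kept = surviving_fraction(STAGE_LOSSES)
    print(f"kept {kept:.1%}, lost {1 - kept:.1%}")  # kept 77.2%, lost 22.8%
```

Under that assumption only about 77% of the original data reaches HPSS, so close to a quarter is lost along the way.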




Almost there …
• USArray: locations of the transportable seismographs.




Almost there …
• Data written to a hard drive on the seismometer
• Data uplinked via cell phone or satellite to a central location
• Researchers request specific portions of the data via a web
  interface
• Data sent to the researcher by e-mail (small requests) or on a
  hard drive (“large” requests)
• Roughly once a year, someone visits the seismographs
  and retrieves the hard drives…
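Whether a request travels by e-mail or on a hard drive comes down to its size. The sketch below estimates the uncompressed size of a waveform request and picks a delivery route. The 4-byte sample size, the 40 samples/s rate used in the example and the 10 MB e-mail limit are hypothetical values chosen for illustration, not USArray parameters.

```python
# HYPOTHETICAL parameters: none of these numbers come from the slide.
BYTES_PER_SAMPLE = 4               # assume uncompressed 32-bit integer samples
EMAIL_LIMIT_BYTES = 10 * 10**6     # assume a 10 MB attachment limit


def request_bytes(stations, channels, samples_per_second, seconds):
    """Uncompressed size in bytes of a waveform request."""
    return stations * channels * samples_per_second * seconds * BYTES_PER_SAMPLE


def delivery_method(size_bytes, email_limit=EMAIL_LIMIT_BYTES):
    """Small requests go by e-mail, large ones on a shipped hard drive."""
    return "e-mail" if size_bytes <= email_limit else "hard drive"


if __name__ == "__main__":
    # One station, three channels, one hour at a hypothetical 40 samples/s.
    print(delivery_method(request_bytes(1, 3, 40, 3600)))    # e-mail
    # Ten stations, three channels, one full day.
    print(delivery_method(request_bytes(10, 3, 40, 86400)))  # hard drive
```

Even a modest request (ten stations for one day) comes to roughly 415 MB uncompressed under these assumptions, which shows why the larger requests end up on physical drives.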




A modern case
• The electron microscope in Simon Hall




A modern case
• Images are digitized by the instrument
• The digitized images are written directly to the Data
  Capacitor
• The Data Capacitor appears as a local file system on the
  researcher's desktop computer, Big Red, Quarry and
  some other TeraGrid systems
• The researcher does quality checks, tuning, optimization,
  etc. on his local workstation.
• CPU-intensive analysis is done on the large systems
  provided by IU or the TeraGrid
• Data is archived daily to HPSS (via a high-bandwidth
  connection from the DC to HPSS)
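A daily archive run only needs to pick up what changed since the previous run. The minimal sketch below assumes the Data Capacitor is mounted at a local path (the `root` argument is hypothetical) and that a fixed 24-hour modification-time window is good enough. The actual copy into HPSS is left out because the slide does not say which transfer tool is used.

```python
import time
from pathlib import Path

ONE_DAY = 24 * 60 * 60  # seconds in the daily archive window


def select_recent(entries, cutoff):
    """Return the paths whose modification time is at or after `cutoff`, sorted."""
    return sorted(path for path, mtime in entries if mtime >= cutoff)


def files_to_archive(root, now=None, window=ONE_DAY):
    """List regular files under `root` modified within the last `window` seconds."""
    now = time.time() if now is None else now
    entries = (
        (p, p.stat().st_mtime)
        for p in Path(root).rglob("*")
        if p.is_file()
    )
    return select_recent(entries, now - window)


if __name__ == "__main__":
    import sys

    # HYPOTHETICAL usage: pass the mount point of a project directory.
    for path in files_to_archive(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Keeping the selection step separate from the directory scan makes it easy to test. A production job would more likely record the time of the last successful run and use that as the cutoff, so that a missed day is not silently skipped.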


Infrastructure, The Data Capacitor

• More than 300 terabytes




Infrastructure, HPSS

• More than 3 petabytes




Infrastructure, CPU Resources
     Big Red [TeraGrid System]
        30 TFLOPS IBM JS21 SuSE Cluster
        768 blades/3072 cores: 2.5 GHz PPC 970MP
        8GB Memory, 4 cores per blade
        Myrinet 2000
        LoadLeveler & Moab

     Quarry [Future TeraGrid System]
       7 TFLOPS IBM HS21 RHEL Cluster
       140 blades/1120 cores: 2.0 GHz Intel Xeon 5335
       8GB Memory, 8 cores per blade
       1Gb Ethernet (upgrading to 10Gb)
       PBS (Torque) & Moab
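The Big Red TFLOPS figure can be checked against its core count and clock rate. This sketch computes theoretical peak as nodes × cores × clock × floating-point operations per cycle. The figure of 4 double-precision operations per cycle for the PPC 970MP (two floating-point units, each issuing a fused multiply-add) is an assumption added here, not something the slide states. We do not apply the formula to Quarry because the slide does not say whether its 7 TFLOPS is a theoretical peak or a measured figure.

```python
def peak_tflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in TFLOPS: nodes x cores x clock x flops per cycle."""
    return nodes * cores_per_node * clock_ghz * 1e9 * flops_per_cycle / 1e12


if __name__ == "__main__":
    # Big Red: 768 blades, 4 cores each, 2.5 GHz PPC 970MP.
    # ASSUMPTION: 4 double-precision flops per cycle per core.
    print(f"Big Red peak: {peak_tflops(768, 4, 2.5, 4):.2f} TFLOPS")  # 30.72
```

With that assumption the result, about 30.7 TFLOPS, agrees with the 30 TFLOPS quoted for Big Red.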

Infrastructure, Network

•   10 GigE to parts of campus, 1 GigE to the entire system
•   4 × 10 GigE from Big Red to the DC
•   48 × 1 GigE from Quarry to the DC
•   15 × 10 GigE from the DC to HPSS
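The link counts above translate directly into aggregate bandwidth and transfer times. This sketch assumes ideal line rate with no protocol overhead, traffic spread perfectly across parallel links, and decimal units (1 TB = 10^12 bytes). Real throughput would be lower.

```python
# Links per path as (number of links, Gb/s per link), from the slide.
LINKS = {
    "Big Red to DC": (4, 10),
    "Quarry to DC": (48, 1),
    "DC to HPSS": (15, 10),
}


def aggregate_gbps(count, gbps_per_link):
    """Total capacity of `count` parallel links, in Gb/s."""
    return count * gbps_per_link


def transfer_seconds(terabytes, gbps):
    """Ideal time to move `terabytes` (10**12 bytes) at `gbps` gigabits per second."""
    return terabytes * 8e12 / (gbps * 1e9)


if __name__ == "__main__":
    for name, (count, each) in LINKS.items():
        total = aggregate_gbps(count, each)
        print(f"{name}: {total} Gb/s, 1 TB in {transfer_seconds(1, total):.0f} s")
```

At these ideal rates, 1 TB moves from Big Red to the DC in about 200 s and from the DC to HPSS in under a minute. This helps explain how a daily archive of the Data Capacitor is practical.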




What does this give you?




What does this give you? FAQ
• How much data can I have?
   • All of it, right now.
• Where is my data?
   • Everywhere.
• Where can I analyze my data?
   • Anywhere.
• How long can I keep my data?
   • Forever.
• Is there a backup?
   • Yes, two of them.




Acknowledgments
This material is based upon work supported by the National Science Foundation under
   Grant Numbers 0116050 and 0521433. Any opinions, findings and conclusions or
   recommendations expressed in this material are those of the author and do not
   necessarily reflect the views of the National Science Foundation (NSF).

This work was supported in part by the Indiana Metabolomics and Cytomics Initiative
   (METACyt). METACyt is supported in part by Lilly Endowment, Inc.

This work was supported in part by the Indiana Genomics Initiative. The Indiana
   Genomics Initiative of Indiana University is supported in part by Lilly Endowment,
   Inc.

This work was supported in part by Shared University Research grants from IBM, Inc.
   to Indiana University.



