High Energy Physics Data Management using Cloud Computing
Analysis of the famous BaBar experiment data handling
Paper by: Abhishek Dey, CSE 2nd Year | Diya Ghosh, CSE 2nd Year | Mr. Somenath Roy Chowdhury
Contents
• Motivation
• HEP Legacy Project
• CANFAR Astronomical Research Facility
• System Architecture
• Operational Experience
• Summary
5/25/2013
What exactly is BaBar?
• Its design was motivated by the investigation of CP violation.
• The experiment was set up to understand the disparity between the matter and antimatter content of the universe by measuring CP violation.
• BaBar focuses on the study of CP violation in the B meson system.
• The name comes from the nomenclature for the B meson (symbol B) and its antiparticle (symbol B̄, pronounced "B bar").
BaBar: Data Point of View
• 9.5 million lines of C++ and Fortran
• Compiled size is 30 GB
• A significant amount of manpower is required to maintain the software.
• Each installation must be validated before the results it generates will be accepted.
CANFAR is a partnership between:
– University of Victoria
– University of British Columbia
– National Research Council, Canadian Astronomy Data Centre
– Herzberg Institute for Astrophysics
• It provides infrastructure for VMs.
Need for Cloud Computing:
• Jobs are embarrassingly parallel, much like HEP.
• Each astronomical survey requires a different processing environment, which requires:
   – A specific version of a Linux distribution
   – A specific compiler version
   – Specific libraries
• Applications have little documentation.
• These environments are evolving rapidly.
Data is precious, too precious.
We need infrastructure, which comes easily as a service.
A word about Cloud Computing:
IaaS: What next?
• With IaaS, we can easily create many instances of a VM image.
• How do we manage the VMs once booted?
• How do we get jobs to the VMs?
Our Solution: Cloud Scheduler + Condor
• Users create a VM with their experiment software installed.
• A basic VM is created by one group, and users add their analysis or processing software to create their custom VM.
• Users then create batch jobs as they would on a regular cluster, but they specify which VM image should run their jobs.
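The workflow above can be sketched as a toy scheduling step. This is only an illustration under stated assumptions, not the actual code or API of Cloud Scheduler or Condor: the `Job` class, the `plan_boots` function, and the image names `babar-base` and `macho-analysis` are hypothetical. It shows the core idea that each queued job names a VM image, and the scheduler boots enough VMs of each image to serve the idle jobs, within the capacity of the cloud.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Job:
    """A batch job that names the VM image it must run on (hypothetical fields)."""
    job_id: int
    vm_image: str


def plan_boots(idle_jobs, running_vms, max_vms):
    """Return how many new VMs of each image to boot.

    idle_jobs   -- list of Job objects waiting in the batch queue
    running_vms -- dict mapping image name to number of VMs already booted
    max_vms     -- total number of VM slots the cloud offers
    """
    demand = Counter(job.vm_image for job in idle_jobs)
    free_slots = max_vms - sum(running_vms.values())
    boots = {}
    for image, wanted in sorted(demand.items()):
        # One VM per idle job, minus VMs of that image that already exist.
        needed = max(0, wanted - running_vms.get(image, 0))
        granted = min(needed, max(0, free_slots))
        if granted:
            boots[image] = granted
            free_slots -= granted
    return boots


# Hypothetical queue: two jobs need one image, a third needs another.
queue = [Job(1, "babar-base"), Job(2, "babar-base"), Job(3, "macho-analysis")]
plan = plan_boots(queue, running_vms={"babar-base": 1}, max_vms=3)
print(plan)
```

In this sketch one `babar-base` VM is already running, so the plan boots one more of that image and one `macho-analysis` VM, which fills the three available slots.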
Steps for the successful architecture setup:
CANFAR: MAssive Compact Halo Objects (MACHO)
• Detailed re-analysis of data from the MACHO experiment's dark matter search.
• Jobs perform a wget to retrieve the input data (40 MB) and have a 4-6 hour run time. The low I/O is a good fit for clouds.
• Astronomers are happy with the environment.
Data Handling in BaBar:
[Figure labels: Analysis Jobs; Event data (Real Data, Simulated Data); Configuration (BaBar Conditions Database)]
• Data is approximately 2 PB.
• The file system is hosted on a cluster of six nodes: one Management/Metadata server (MGS/MDS) and five Object Storage servers (OSS).
• The cluster uses a single gigabit interface/VLAN to communicate both internally and externally.
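A back-of-the-envelope calculation puts the two figures above side by side. It is a rough sketch with assumptions that are not in the slides: the 2 PB is taken as decimal petabytes, and the gigabit link is assumed to run at its full nominal rate with no protocol overhead.

```python
DATA_BYTES = 2 * 10**15          # about 2 PB, as stated on the slide (decimal units assumed)
LINK_BITS_PER_SECOND = 10**9     # one gigabit interface, assumed fully usable

seconds = DATA_BYTES * 8 / LINK_BITS_PER_SECOND
days = seconds / 86_400          # seconds in a day
print(f"{days:.0f} days to move the full data set over one link")
```

Under these assumptions a full copy would take roughly half a year, so a single link of this kind is better suited to jobs that read only the data they need.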
Xrootd: Need for Distributed Data
• Xrootd is a file server that provides byte-level access and is used by many high energy physics experiments.
• It provides access to the distributed data.
• A read-ahead value of 1 MB and a read-ahead cache size of 10 MB were set on each Xrootd client.
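The effect of the two client settings above can be illustrated with a toy read-ahead cache. This is a minimal sketch, not the Xrootd client implementation: the `ReadAheadReader` class is hypothetical, "1 MB" is interpreted as 2^20 bytes, and the cache uses simple first-in-first-out eviction, which is an assumption. The idea it shows is that many small reads are served from a few large fetches.

```python
import io

READ_AHEAD = 1 * 1024 * 1024      # 1 MB fetched per remote request (value from the slide)
CACHE_SIZE = 10 * 1024 * 1024     # 10 MB of cached data per client (value from the slide)


class ReadAheadReader:
    """Toy client-side read-ahead cache over any seekable binary file object."""

    def __init__(self, source, read_ahead=READ_AHEAD, cache_size=CACHE_SIZE):
        self.source = source
        self.read_ahead = read_ahead
        self.max_chunks = max(1, cache_size // read_ahead)
        self.chunks = {}          # chunk index -> bytes, kept in insertion order
        self.remote_reads = 0     # number of requests sent to the "server"

    def _chunk(self, index):
        if index not in self.chunks:
            if len(self.chunks) >= self.max_chunks:
                # Evict the oldest chunk (dicts preserve insertion order).
                self.chunks.pop(next(iter(self.chunks)))
            self.source.seek(index * self.read_ahead)
            self.chunks[index] = self.source.read(self.read_ahead)
            self.remote_reads += 1
        return self.chunks[index]

    def read(self, offset, length):
        """Return up to `length` bytes starting at `offset`."""
        out = bytearray()
        while length > 0:
            index, start = divmod(offset, self.read_ahead)
            piece = self._chunk(index)[start:start + length]
            if not piece:
                break             # reached end of file
            out += piece
            offset += len(piece)
            length -= len(piece)
        return bytes(out)


data = bytes(range(256)) * 16384                 # 4 MiB of sample data
reader = ReadAheadReader(io.BytesIO(data))
# Read the first 2 MiB as 512 sequential 4 KiB requests.
small_reads = [reader.read(i * 4096, 4096) for i in range(512)]
print(reader.remote_reads)
```

Here the 512 small sequential reads trigger only two fetches from the underlying source, one per 1 MB chunk touched.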
How does a DFS work?
• Blocks are replicated across several datanodes (usually 3).
• A single namenode stores metadata (file names, block locations, etc.).
• Optimized for large files and sequential reads.
• Clients read from the closest available replica (note: locality of reference).
• If the replication for a block drops below the target, it is automatically re-replicated.
[Figure: a namenode and four datanodes holding replicas of blocks 1-4; datanode contents are (1, 2, 4), (2, 1, 3), (1, 4, 3), and (3, 2, 4)]
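The replication and re-replication behaviour described above can be sketched with a toy namenode. This is an illustrative model only, not the code of any real distributed file system: the `NameNode` class, its methods, and the datanode names `dn1` to `dn4` are hypothetical, and replica placement is random rather than locality-aware.

```python
import random

TARGET_REPLICAS = 3               # "usually 3" replicas per block


class NameNode:
    """Toy namenode: tracks which datanodes hold a replica of each block."""

    def __init__(self, datanodes):
        self.alive = set(datanodes)
        self.locations = {}       # block id -> set of datanode names

    def add_block(self, block_id, rng):
        """Place a new block on TARGET_REPLICAS distinct live datanodes."""
        self.locations[block_id] = set(rng.sample(sorted(self.alive), TARGET_REPLICAS))

    def fail(self, node, rng):
        """Mark a datanode dead and re-replicate every block it held."""
        self.alive.discard(node)
        for holders in self.locations.values():
            holders.discard(node)
            candidates = sorted(self.alive - holders)
            # Copy the block to new nodes until the target is met again.
            while len(holders) < TARGET_REPLICAS and candidates:
                holders.add(candidates.pop(rng.randrange(len(candidates))))


rng = random.Random(0)
namenode = NameNode(["dn1", "dn2", "dn3", "dn4"])
for block in range(1, 5):         # four blocks, as in the figure
    namenode.add_block(block, rng)
namenode.fail("dn2", rng)
print(namenode.locations)
```

With four datanodes and a target of three replicas, losing one datanode leaves exactly three live nodes, so after re-replication every block ends up on all of them.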
Results and Analysis:
Fault tolerant model:
Acknowledgements
• A special word of appreciation and thanks to Mr. Somenath Roy Chowdhury.
• My heartfelt thanks to the entire team who worked hard to build the cloud.
Questions Please?