Using Ceph for Large Hadron Collider Data
Lincoln Bryant • University of Chicago
Ceph Day Chicago • August 18th, 2015
about us
LHC
ATLAS site
5.4 miles (17 mi circumference)
⇐ Standard Model of Particle Physics
Higgs boson: final piece, discovered in 2012 ⇒ Nobel Prize
2015 → 2018: Cool new physics searches underway at 13 TeV
← credit: Katherine Leney (UCL, March 2015)
LHC
ATLAS detector
● Run 2 center-of-mass energy = 13 TeV (Run 1: 8 TeV)
● 40 MHz proton bunch crossing rate
○ 20-50 collisions/bunch crossing (“pileup”)
● Trigger (filters) reduces the raw rate to ~1 kHz
● Events are written to disk at ~1.5 GB/s
ATLAS detector
100M active sensors
toroid magnets
inner tracking
person (scale)
Not shown:
Liquid argon calorimeter (electrons, photons)
Tile calorimeters (hadrons)
Muon chambers
Forward detectors
ATLAS data & analysis
Primary data from CERN is globally processed (event reconstruction and analysis)
Role for Ceph: analysis datasets & object store for single events
3x100 Gbps
Ceph technologies used
● Currently:
○ RBD
○ CephFS
● Future:
○ librados
○ RadosGW
Our setup
● Ceph v0.94.2 on Scientific Linux 6.6
● 14 storage servers
● 12 x 6 TB disks, no dedicated journal devices
○ Could buy PCIe SSDs for journals if more performance is needed
● Each connected at 10 Gbps
● Mons and MDS virtualized
● CephFS pools using erasure coding + cache tiering (rough sketch below)
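A rough sketch of the erasure-coded data pool plus cache tier arrangement, using Hammer-era ceph CLI calls wrapped in Python; the pool names, PG counts, k/m values, and cache sizing are placeholders, not our production settings.

```python
import subprocess

def ceph(*args):
    """Run a ceph CLI command and fail loudly if it returns non-zero."""
    subprocess.check_call(["ceph"] + list(args))

# Erasure-code profile for the backing data pool (k/m values are illustrative).
ceph("osd", "erasure-code-profile", "set", "ecprofile42", "k=4", "m=2")

# Erasure-coded pool to hold CephFS file data (PG counts are placeholders).
ceph("osd", "pool", "create", "cephfs_data_ec", "1024", "1024", "erasure", "ecprofile42")

# Replicated pool used as a writeback cache tier in front of the EC pool.
ceph("osd", "pool", "create", "cephfs_cache", "512", "512", "replicated")
ceph("osd", "tier", "add", "cephfs_data_ec", "cephfs_cache")
ceph("osd", "tier", "cache-mode", "cephfs_cache", "writeback")
ceph("osd", "tier", "set-overlay", "cephfs_data_ec", "cephfs_cache")

# Basic cache sizing knobs (values are placeholders, not production numbers).
ceph("osd", "pool", "set", "cephfs_cache", "hit_set_type", "bloom")
ceph("osd", "pool", "set", "cephfs_cache", "target_max_bytes", str(10 * 2**40))
```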
Ceph Storage Element
● ATLAS uses the Open Science Grid
middleware in the US
○ among other things: facilitates data management and
transfer between sites
● Typical sites use Lustre, dCache, etc. as the “storage element” (SE)
● Goal: Build and productionize a storage
element based on Ceph
XRootD
● Primary protocol for accessing files within ATLAS
● Developed at SLAC (Stanford Linear Accelerator Center)
● Built to support standard high-energy physics analysis tools (e.g., ROOT)
○ Supports remote reads, caching, etc.
● Federated over the WAN via a hierarchical system of ‘redirectors’ (small read example below)
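For a sense of how clients talk to the federation, here is a minimal read using the XRootD Python bindings; the redirector hostname and file path are made up.

```python
from XRootD import client
from XRootD.client.flags import OpenFlags

# Placeholder redirector hostname and file path.
url = "root://redirector.example.org//atlas/rucio/somescope/somefile.root"

f = client.File()
status, _ = f.open(url, OpenFlags.READ)
if not status.ok:
    raise RuntimeError("open failed: " + status.message)

# Remote read of the first MiB of the file through the redirector.
status, data = f.read(offset=0, size=1024 * 1024)
print("read %d bytes from %s" % (len(data), url))
f.close()
```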
Ceph and XRootD
● How to pair our favorite access protocol with
our favorite storage platform?
Ceph and XRootD
● How to pair our favorite access protocol with
our favorite storage platform?
● Original approach: RBD + XRootD
○ Performance was acceptable
○ Problem: RBD only mounted on 1 machine
■ Can only run one XRootD server
○ Could create new RBDs and add them to the XRootD cluster to scale out
■ Problem: NFS exports for interactive users become a lot trickier
Ceph and XRootD
● Current approach: CephFS + XRootD
○ All XRootD servers mount CephFS via kernel client
■ Scale out is a breeze
○ Fully POSIX filesystem, integrates simply with existing infrastructure
● Problem: Users want to r/w the filesystem directly via CephFS, but XRootD needs to own the files it serves
○ Permissions issues galore
Squashing with Ganesha NFS
● XRootD does not run in a privileged mode
○ Cannot modify/delete files written by users
○ Users can’t modify/delete files owned by XRootD
● How to allow users to read/write via FS mount?
● Using Ganesha to export CephFS as NFS and squash all users to the XRootD user (config sketch below)
○ Doesn’t prevent users from stomping on each other’s files, but works well enough in practice
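A minimal sketch of such a Ganesha export, rendered from Python; the export ID, pseudo path, and xrootd UID/GID are placeholders, and the generated stanza would be included from the real ganesha.conf rather than used as-is.

```python
# Placeholder UID/GID of the local xrootd account.
XROOTD_UID = 10940
XROOTD_GID = 10940

# Export the root of CephFS read/write and squash every client to xrootd.
export_block = """
EXPORT {{
    Export_Id = 1;
    Path = "/";
    Pseudo = "/cephfs";
    Access_Type = RW;
    Squash = All_Squash;
    Anonymous_Uid = {uid};
    Anonymous_Gid = {gid};
    FSAL {{
        Name = CEPH;
    }}
}}
""".format(uid=XROOTD_UID, gid=XROOTD_GID)

# Write the stanza to a file that the real ganesha.conf would include.
with open("cephfs-export.conf", "w") as conf:
    conf.write(export_block)
```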
Transfers from CERN to Chicago
● Using Ceph as the backend store for data from the LHC
● Analysis input datasets for regional physics analysis
● Easily obtain 200 MB/s from Geneva to our Ceph storage system in Chicago
How does it look in practice?
● Pretty good!
Potential evaluations
● XRootD with librados plugin
○ Skip the filesystem, write directly to the object store (librados example below)
○ XRootD handles POSIX filesystem semantics as a
pseudo-MDS
○ Three ways of accessing:
■ Directly access files via XRootD clients
■ Mount XRootD via FUSE client
■ LD_PRELOAD hook to intercept system calls to
/xrootd
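As an illustration of “skip the filesystem”, a small librados sketch using the Python rados bindings; the pool name, object name, and payload are placeholders, and the actual XRootD plugin would do the equivalent natively inside the server.

```python
import rados

# Connect using the default admin keyring and ceph.conf path (assumptions).
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    # 'atlas-objects' is a placeholder pool name.
    ioctx = cluster.open_ioctx("atlas-objects")
    try:
        # Write an event payload directly as a RADOS object, then read it back.
        ioctx.write_full("event-0001", b"serialized event payload")
        data = ioctx.read("event-0001")
        print("read back %d bytes" % len(data))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```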
Cycle scavenging Ceph servers
Ceph and the batch system
● Goal: Run Ceph and user analysis jobs on the same machines
● Problem: Poorly defined jobs can wreak havoc on the Ceph cluster
○ e.g., machine starts heavily swapping, OOM killer starts killing random processes including OSDs, load spikes to hundreds, etc.
Ceph and the batch system
● Solution: control groups (cgroups)
● Configured the batch system (HTCondor) to use cgroups to limit the amount of CPU/RAM used on a per-job basis (mechanism sketched below)
● We let HTCondor scavenge about 80% of the cycles
○ May need to be tweaked as our Ceph usage increases
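A rough illustration of the per-job cgroup (v1) mechanism that HTCondor drives for us; the cgroup path, the 2 GiB cap, and the job command are placeholders, and in production HTCondor creates and populates these cgroups itself.

```python
import os
import subprocess

# Placeholder per-job cgroup under the v1 memory controller.
cgroup = "/sys/fs/cgroup/memory/htcondor/job_12345"
os.makedirs(cgroup)

# Hard-cap the job at 2 GiB so the kernel OOM-kills only this job,
# rather than random system processes such as Ceph OSDs.
with open(os.path.join(cgroup, "memory.limit_in_bytes"), "w") as limit:
    limit.write(str(2 * 2**30))

# Launch a placeholder analysis job and move it into the cgroup.
job = subprocess.Popen(["/usr/bin/analysis_job", "--input", "events.root"])
with open(os.path.join(cgroup, "tasks"), "w") as tasks:
    tasks.write(str(job.pid))

job.wait()
```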
Ceph and the batch system
● Working well thus far:
Ceph and the batch system
● Further work in this area:
○ Need to configure the batch system to immediately kill jobs when Ceph-related load goes up
■ e.g., disk failure
○ Re-nice OSDs to maximum priority
○ May require investigation into limiting network saturation
ATLAS Event Service and RadosGW
Higgs boson detection
ATLAS Event Service
● Deliver single ATLAS events for processing
○ Rather than a complete dataset - “fine grained”
● Able to efficiently fill opportunistic resources like AWS instances (spot pricing), semi-idle HPC clusters, BOINC
● Can be evicted from resources immediately with negligible loss of work
● Output data is streamed to remote object storage
ATLAS Event Service
● Rather than pay for S3, RadosGW fits this use case perfectly (S3 example below)
● Colleagues at Brookhaven National Lab have deployed a test instance already
○ interested in providing this service as well
○ could potentially federate gateways
● Still in the pre-planning stage at our site
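Because RadosGW speaks plain S3, pointing a client at it is mostly a matter of endpoint and credentials; a hedged boto sketch follows, with placeholder keys, host, and bucket name.

```python
import boto
import boto.s3.connection

# Placeholder RadosGW endpoint and credentials.
conn = boto.connect_s3(
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    host="rgw.example.org",
    port=7480,
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

# Upload one event-service output object into a placeholder bucket.
bucket = conn.create_bucket("atlas-eventservice")
key = bucket.new_key("task-123/event-0001.out")
key.set_contents_from_string("serialized event output")

print([k.name for k in bucket.list()])
```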
Final thoughts
June 2015 event: 17 p-p collisions in one event
Final thoughts
● Overall, quite happy with Ceph
○ Storage endpoint should be in production soon
○ More nodes on the way: plan to expand to 2 PB
● Looking forward to new CephFS features like quotas, offline fsck, etc.
● Will be experimenting with Ceph pools shared between data centers with low RTT in the near future
● Expect Ceph to play an important role in ATLAS data processing ⇒ new discoveries
Questions?
cleaning up inside ATLAS :)