PEARC17: Data Access for LIGO on the OSG
1. Data Access for LIGO on the OSG
Derek Weitzel & Brian Bockelman - University of Nebraska – Lincoln
Duncan A. Brown - Syracuse University
Peter Couvares - California Institute of Technology
Frank Würthwein & Edgar Fajardo Hernandez - University of California San Diego
5. LIGO Needs: PyCBC Workflow
• The PyCBC workflow consists of approximately one hundred thousand jobs for each day's worth of recorded LIGO data.
• The total need is driven by both the science (for example, enough data must be analyzed to measure the statistical significance of detection candidates) and the computational aspects of the search.
• The workflows themselves are managed using the Pegasus Workflow Management System.
• The PyCBC workflow requires several terabytes of non-public input data; throughout the analysis, the data may be read up to 200 times.
• Accordingly, the PyCBC pipeline was historically run at sites with a full copy of the LIGO data on a shared filesystem.
Can we get PyCBC running on OSG?
6. File Size & Velocity
• The PyCBC team makes its software independent of the OS environment.
• The PyCBC science payload reads, on average, 1Mbps per core.
• Modest until you run thousands of cores!
• Total data size:
• Observation 1 (O1): 7TB.
• O2: ~3TB so far.
• Jobs require a few hundred MB of common, public calibration data.
• The data will be re-read approximately 200 times, and the set of workflows needed will consume several million CPU hours.
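A back-of-the-envelope check of the aggregate demand implied by these numbers (the 10,000-core estimate comes from later in the deck; all figures are as quoted on the slides):

```python
# Rough aggregate I/O estimate for the PyCBC workload on OSG,
# using the per-core rate, data volume, and re-read factor quoted above.
per_core_mbps = 1          # average read rate per science-payload core
cores = 10_000             # cores estimated to be available to LIGO
dataset_tb = 7             # O1 data volume
rereads = 200              # times the data set is re-read across workflows

aggregate_gbps = per_core_mbps * cores / 1000
total_traffic_pb = dataset_tb * rereads / 1000

print(f"aggregate read rate: {aggregate_gbps:.0f} Gbps")   # 10 Gbps
print(f"total data moved:    {total_traffic_pb:.1f} PB")   # 1.4 PB
```

So at full scale the workload saturates roughly a tenth of a 100Gbps link continuously, and moves over a petabyte in aggregate despite the small 7TB working set.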
7. [Diagram: the HTCondor submit host at Syracuse runs jobs both on the local LIGO pool (SUGAR) and, via pilots, on worker nodes at generic OSG sites; the LIGO Data Replicator transfers data over GridFTP to the HDFS installation at Nebraska, from which jobs read input over GridFTP and XRootD.]
O1 - Implementation
• Used a central repository at Nebraska with very high bandwidth.
• LIGO data was copied to the central repository.
• The submit host submitted to both local and OSG resources.
• The Pegasus runtime managed file downloads.
• The PyCBC executable managed OS heterogeneity.
• A global shared filesystem (CVMFS) distributed calibration data.
8.
O1 - Implementation
• Purposefully simple - wanted to get something running fast!
• The single repository had a 100Gbps connection using GridFTP.
• The volume of data is small (~7TB) compared to the 2.7PB of CMS data stored at the repository.
9.
O1 - Implementation
• Each job needs 1Mbps of the 100Gbps total.
• This setup was expected to scale across the 10,000 cores we estimated could be available to LIGO.
• But we started to see issues with this architecture.
10. O1 – The issues
• Ramp-Up
• GridFTP requires ~128MB of memory per connection due to a per-process Java VM started by the Hadoop HDFS client.
• Transfer nodes could handle the steady state; however, at ramp-up, the OSG started jobs faster than the GridFTP servers could handle.
• Solution:
• Throttle HTCondor job startup to 1.5Hz. This still caused issues at sites with slow TCP connections to Nebraska.
• Developed and deployed a GridFTP extension to throttle connections per user, preventing LIGO from disrupting other users of the Nebraska site.
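A 1.5Hz start rate can be expressed with HTCondor's schedd throttling knobs. The exact settings used are not given in the slides, so this is a sketch (3 starts every 2 seconds ≈ 1.5 jobs/s):

```
# condor_config on the submit host (hypothetical values)
JOB_START_COUNT = 3   # jobs started per scheduling cycle
JOB_START_DELAY = 2   # seconds between cycles -> 3/2 = 1.5 starts per second
```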
11. O1 – The issues
• Scalability
• GridFTP limited how many jobs could start on the OSG.
• Lots of time wasted because of throttled job starts.
• Solution:
• Switched to the XRootD server and protocol. Pegasus makes this easy, as it understands that the same storage can be available via different mechanisms.
• The implementation uses a single process (a single Java VM) with many threads.
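Pegasus's ability to register multiple physical replicas for one logical file is what made the protocol switch cheap. A hypothetical file-based replica catalog entry (hostnames and file name are illustrative, not the project's actual values) might list both endpoints for the same logical file:

```
# Pegasus replica catalog (text format): one LFN, two PFNs.
# Pegasus may fetch via whichever transfer mechanism is available.
H1-FRAME-815411200-4096.gwf gsiftp://gridftp.example.edu/ligo/frames/H1-FRAME-815411200-4096.gwf pool="Nebraska"
H1-FRAME-815411200-4096.gwf root://xrootd.example.edu//ligo/frames/H1-FRAME-815411200-4096.gwf pool="Nebraska"
```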
12. Adding non-OSG resources
• Added TACC's Stampede resource with an allocation award.
• Challenges at Stampede:
• Lack of a global filesystem (CVMFS). Solution: use the venerable `rsync` to copy LIGO software/calibrations to Lustre on Stampede.
• Input data access: external data access for each job will likely not scale. GridFTP copied the entire O1 data set to Stampede.
• Scalable grid interface: found Stampede's Globus GRAM endpoint limited in the number of jobs it could manage. Developed a wrapper script to launch 1024 invocations in a single GRAM/SLURM submission.
17. Securing LIGO data - Authentication
• By default, CVMFS distributes files through an HTTP-based CDN: all files are considered public!
• Seen as acceptable for the original use case of distributing software.
• We enabled “secure-CVMFS” which uses X.509 certificates to authenticate users.
• Authentication happens twice: once to access the worker node's cache, once to access the StashCache CDN if there is a local cache-miss.
• LIGO already uses X.509 certificates with their jobs, so this does not increase the burden on their users.
• Only data is secured with X.509 certificates: the namespace is public (unauthenticated) and distributed with the normal CVMFS CDNs.
• Done primarily for scalability reasons.
• Sensitive “metadata” about contents of data are not encoded into filename or directory structure.