Robust File Replication
                                                            PPDG Focus Meeting, January 10th 2002
                                                                      PPDG-11 V0.4


1 Introduction
2 Summary of Presentations
   2.1 JLAB
   2.2 SRB
   2.3 Globus (Giggle/Grin)
   2.4 GDMP(CMS)
   2.5 MAGDA(Atlas)
   2.6 SAM(D0)
   2.7 STAR
   2.8 BaBar
   2.9 Related Work - Condor
3 Summary of Discussion Sessions
   3.1 Interfaces to Robust File Replication services:
      3.1.1 Redirection Proposal from BaBar
   3.2 Errors, Status, Error Handling, Reliability
      3.2.1 SAM
   3.3 Interfaces on which Replication depends
4 Results of and Proposals from the Meeting
   4.1 Acceptance of Documents:
   4.2 Statements of Direction:
   4.3 Action Items:
5 Architecture Diagrams
   5.1 SRB
   5.2 SAM
   5.3 JLAB
   5.4 WP2
   5.5 ATLAS
6 Appendix
   6.1 Jan 10th Meeting Agenda
   6.2 Jan 10th Participants


1        Introduction
The Particle Physics Data Grid SciDAC Collaboratory Pilot includes as one of its core work areas “Robust data
movement and replication” (CS-5 and CS-6). The four participating Computer Science groups are developing Grid
middleware to address components or integrated solutions for these services. The six experiments are deploying file
replication services into production – starting from the use of generic FTP, through initial parallel-stream FTPs such as
bbFTP, gsiFTP, using catalogs of varying sophistication to track and manage the distributed file sets, and experiment
specific higher level components to accomplish end-to-end applications which users can invoke and with which end
users, developers and integrators can interact. PPDG sponsors these groups to integrate and deploy their replication
applications and to share functionality and performance requirements, experience and plans. PPDG then acts to promote
common components and interfaces, consistency and interoperability of appropriate middleware and standards.

This PPDG report is the result of a one-day focus meeting on “Robust File Replication”. Appendix 1 gives the agenda of
and attendees at the meeting. The meeting reflected accomplishments from a lot of work on the part of all the
participating groups. There was a clear interest and preparedness to discuss across the groups future work, technical
and practical issues and directions.



This report is in 2 sections. It attempts to capture key points of the application and technology presentations to provide
the background to identify a list of future activities of the project, and necessary and relevant areas of work which would
benefit from future discussion. The report does not include the information presented in the talks at the meeting. The
reader is referred to the slides and documents posted off of the meeting agenda page for more detailed information.
http://www.ppdg.net/mtgs/10jan-02/agenda.htm


2      Summary of Presentations

2.1        JLAB
All software based on Web services.


Replica                             Deep name tree. Translation from GFN (Global File Name) to SURL (Site URL:
Catalog                             the host name of the site plus a URL, which includes the protocol and the site).
                                    This is a logical string which can be “redirected” to the actual physical site.
                                    Intent is that naming semantics is links and collections.
                                    Rejected globus replica catalog because it does not support deep trees. Not a
                                    challenging database design to do this. Using mySQL. Performance not an
                                    issue because can improve the h/w. End to end locking and transaction rate.
HRM Listener         Application    Glue between local site and global replica catalog. Listening to local HRM
                     level Agent    actions and informs other services. A planner? An information server? Each
                                    storage system has an HRM listener. HRM is part of the VO.
                                    Where are there one to many? Switch from Grid to VO: this needs more
                                    thought. Wrapper on Jasmine. SOAP + mySQL.
Replication                         handles requests to make replicas at a higher level than the replica catalog.
Service                             Handles space requests etc. File Client does not do this. File Transfer Service
                                    does not manage space. Who does?
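
As a rough illustration of the GFN-to-SURL translation described in the table above, the following Python sketch keeps a deep name tree in memory. The class name, method names, host names and paths are all hypothetical. The JLAB software is based on Web services with mySQL behind it, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory catalog mapping a Global File Name (GFN) to site URLs (SURLs)."""

    def __init__(self):
        # GFN -> list of SURLs; names and URL forms are illustrative only
        self._replicas = defaultdict(list)

    def register(self, gfn, surl):
        """Record that a copy of `gfn` can be reached through `surl`."""
        if surl not in self._replicas[gfn]:
            self._replicas[gfn].append(surl)

    def lookup(self, gfn):
        """Return all known SURLs for one GFN (empty list if unknown)."""
        return list(self._replicas.get(gfn, []))

    def list_collection(self, prefix):
        """Return every GFN below a directory-like prefix of the deep name tree."""
        prefix = prefix.rstrip("/") + "/"
        return sorted(g for g in self._replicas if g.startswith(prefix))
```

A caller would register each replica as it is created and use `lookup` to obtain candidate SURLs. Treating a path prefix as a collection is only one possible reading of "links and collections".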

      1.    Recommend that a definition of web accessible services should be included in the PPDG Architecture. Is this a
            minority opinion? Globus has stated it is a direction they are moving in. The next generation of replica catalog
            is defined to be a web service.
      2.    How should PPDG be defining the web services interfaces? SRB will work with JLAB. Mapping between the
            representations can be easily done. How does one define the meaning of the schema? Agree on a minimum
            set? Can this be done as a joint effort or is it several parallel efforts? All results should be posted to the PPDG
            web sites for comment while work is in progress, rather than as a “final draft for review”. Draft of JLAB
            implementation is posted to the meeting web page.
      3.    GridFTP interaction with Storage Resource managers? Need a discussion with Ian, Carl and Arie. SRM
            document addresses some of the issues.
      4.    Need to communicate error information back through the services and/or layers?

2.2        SRB
SRB Enhancements for BaBar:


SRB->HPSS Driver               Glue            Connect metadata in SRB DB to HPSS files
SRB Server                     Extend          Extension to use new driver to HPSS and make server support SLAC
                               Common
                               Middleware
Remote Proxy                   Glue            Access to and bundling of file transfers.
                               (DataCutter)
User Client                    BaBar



Replication Services
Logical Name Space                          Replication is a capability “in the logical name space”. Replication integrated
                                            into SRB system. Locking done with timeouts. Inconsistencies can occur.
Registration of Digital                     Files, Blobs, Database command sequences, URLs. Can see information
Objects                                     from different databases.
Aggregation                                 Container replication; synchronization; staging. Can have a Container that
                                            represents a whole site.
Replica Creation                            Synchronous, Asynchronous – out of band. (from PPDG requirements)
Replica Access                              Automated fail over to alternate copy
Latency Management
Data Transport
Meta Data Transport

          1.    Remote Proxy – possibility that will need scheduling service, and mechanism for improving efficiency of
                file transfers.
          2.    Need to access metadata independent of file access. Need to provide bulk metadata import and
                registration. Discovery based on attributes.
          3.    Storage System access and data transport interface are site specific.
          4.    Any thoughts on linking the BaBar metadata catalogs, Oracle and Objectivity? Complex but has been
                done with Objectstore.
          5.    Asynchronous replica creation (k out of n is a success) using a background service was neither requested
                by BaBar nor implemented by SRB. Relation to partial result? Could benefit from more discussion.
          6.    Architecture
                       a. Storage Abstraction – is this, or should this be, a common component? How does it relate to the
                            HRM definition? Includes latency management.
                       b. Catalog Abstraction
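
The "k out of n is a success" idea in item 5 can be sketched as follows. This is a minimal illustration and not SRB code: `replicate_k_of_n` and the caller-supplied `copy_fn` are hypothetical names, and threads stand in for whatever background service would really perform the copies.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def replicate_k_of_n(copy_fn, source, targets, k):
    """Attempt copies to all targets concurrently; report success if at least k complete.

    copy_fn(source, target) is a caller-supplied (hypothetical) function that
    raises an exception on failure.
    Returns (ok, succeeded_targets, failed_targets).
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {pool.submit(copy_fn, source, t): t for t in targets}
        for fut in as_completed(futures):
            target = futures[fut]
            try:
                fut.result()          # re-raises any exception from copy_fn
                succeeded.append(target)
            except Exception:
                failed.append(target)
    return len(succeeded) >= k, succeeded, failed
```

The list of failed targets is the "partial result" that a caller, or a later retry, would have to deal with.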


2.3     Globus (Giggle/Grin)
First version of Replica Catalog and Management Services is in production as part of Globus V2.0 and integrated into
GDMP and EDG TestBed 1. The comments relate to the developments of the new components : Replica Location
Service (RLS) which augments the Replica Catalog, and Reliable File Transfer Service which is a component above the
File Transfer layer. The first prototype implementation of the RLS is scheduled for 4/02 and a production version for
integration with EDG TestBed 2 in 9/02.


Replica Catalog                      File attributes are kept in meta-data catalog which is outside the domain of the
                                     Globus service?
Reliable Replication                 Combine storage system operations with replica catalog updates.
Replica Selection                    Estimate performance
                                     Relies on Information Services

Replica Location                     To an end user the functionality will appear as equivalent to the set of Replica
Service                              Catalog, Replica Selection and Replication Management
Framework:
Reliable Local State
Global State with
Relaxed Consistency

Reliable File                        Reliable transfer of byte streams. Built on top of GridFTP.
Transfer Service                     http://www.mcs.anl.gov/~madduri/RFT.html
Reliable Replication                 Reliable Replication Service. Who is responsible for establishing the reliability,
Service                              verifying and determining the Catalog consistency? Catalogs within RLS include the
                                     Storage System catalog.

         1.   New implementation of Replica Catalog supports logical files in several collections (and containers?).
         2.   Name Space. Could one map to the UNIX file system name space? Is this something that PPDG wants to
              input to? WP2: Does one need to define Name Space semantics? Is the definition of database tables
              sufficient – ie arbitrary set of attributes that defines a name?
         3.   “Collection” use is overloaded:
                    a. Container/Aggregation. Same as a data object. Clusters.
                    b. Selection Set/Collections. Logical organization.
              These are orthogonal.
         4.   Could Globus discuss the interfaces with JLAB and SRB before completing the definition of the interfaces
              for RLS?
         5.   Difference between Replica Management and RLS was not completely clear.
         6.   Impact on End User of different consistency levels. Should be none except for performance? Depends on
              the user API. User gets “probability” that file is in the stated location. This is always true.
                    a. Does End User get information that is “Wrong”? Possibly. But this is also true, given the errors
                         that can occur, of a design which guarantees complete consistency.
                    b. Does End User always get correct information but performance is affected? Yes.
         7.   Semantics of the Hints/Location Service need to be separate from those of the File Delivery Service.
         8.   WP2 has seen no performance issues with current version of Replica Catalog.
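
One way to picture the relaxed-consistency point in item 6: the location service returns hints, and the client verifies each hint by trying to fetch the file. The sketch below is an assumption-laden illustration, not the RLS interface. `open_replica`, `locate` and `fetch` are hypothetical names, and a stale hint is modelled as a `FileNotFoundError`.

```python
def open_replica(lfn, locate, fetch):
    """Try each location hint in turn; a stale hint costs time but never yields wrong data.

    locate(lfn) -> iterable of candidate locations (hints, possibly stale).
    fetch(location, lfn) -> file contents, or raises FileNotFoundError if the hint is stale.
    Both callables are hypothetical stand-ins supplied by the caller.
    Returns (contents, stale_locations).
    """
    stale = []
    for location in locate(lfn):
        try:
            return fetch(location, lfn), stale
        except FileNotFoundError:
            stale.append(location)  # hint was wrong; fall through to the next one
    raise FileNotFoundError(f"no valid replica found for {lfn}; stale hints: {stale}")
```

With this structure the end user sees only a performance cost from stale hints, which matches answer (b) above. Whether that holds in a real system depends on the user API, as the notes say.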


2.4    GDMP(CMS)
Grid Data Mirroring Package. V2.0 is included in EDG TestBed 1 and V2.x will be in VDT 1.0.



Publish/Subscription   GDMP         Local catalogs – text files - keep lists and state.
Manager
Replica Catalog        Globus       Updated when replica “pulled”. Can be used as a push model with the GDMP layer
File Copier            GridFTP      Interfaces to the Storage System
Storage System                      Looking at the HRM. How does the interaction happen?
Interface
Replica Optimizer      WP2          Being designed. Is this potentially a “common component”? Workshop is at CERN,
                                    week of Mar 15th.

         1.   GDMP works on Containers as well as single files. This is an enhancement to the Globus Replica
              catalog/management.
         2.   Error recovery use cases.
                   a. May republish a file that already succeeded. Globus replica catalog refuses duplicate entry of
                        logical file.
                   b. May be knowledge in the catalog you don’t know about. Should protocol include a Transaction
                        Index and 2 phase commit?
                   c. Where is the responsibility to determine validity of catalog?
                   d. Is GDMP functionality replaced by Globus Replica Location Service in the future? Not
                        completely. Will need the Publish/Subscribe layer.
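
The republish case in item 2a, and the transaction index suggested in 2b, can be illustrated with a small sketch. Everything here is hypothetical: `Catalog`, `publish` and `DuplicateEntry` are illustrative names, the catalog is an in-memory dictionary, and the code shows only an idempotent retry keyed on a transaction id. It is not a two-phase commit.

```python
class DuplicateEntry(Exception):
    """Raised when a logical file is already registered."""


class Catalog:
    """Toy catalog that refuses a duplicate entry of a logical file."""

    def __init__(self):
        self.entries = {}        # logical file name -> physical location
        self.transactions = {}   # transaction id -> logical file name

    def register(self, lfn, location):
        if lfn in self.entries:
            raise DuplicateEntry(lfn)
        self.entries[lfn] = location


def publish(catalog, txn_id, lfn, location):
    """Publish a file so that repeating the same transaction after a failure is harmless."""
    if catalog.transactions.get(txn_id) == lfn:
        return "already-done"          # retry of a transaction that had already succeeded
    try:
        catalog.register(lfn, location)
    except DuplicateEntry:
        if catalog.entries[lfn] != location:
            raise                      # genuinely conflicting entry; needs a policy decision
        # same file at the same location: an earlier attempt got this far
    catalog.transactions[txn_id] = lfn
    return "published"
```

A retry after a crash reuses the same transaction id, so the second call returns without touching the catalog. A conflicting entry still raises, which leaves open the question in 2c of who decides which copy is valid.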



2.5    MAGDA(Atlas)
MAGDA is being used and further developed by ATLAS as a vertically integrated framework available for testing,
experiment development and production use. Gsiftp and scp are used for the file copy, mysql as the database. To date
other components are ATLAS developed.




Logical File Name                   Supports collections and containers. Arbitrary string. Name is unique in a VO, includes
Space                               Replica Number.
File Catalog                        Mysql database. Mysql accelerator written by ATLAS for sets of database updates.
                                    Replica catalog loader written but not tested. No transaction locking to date.
Storage System                      Data repository. Site + Location. Host can access a set of sites.
File Discovery Agent                Spider finds files and registers them
Replication Service                 Replication Operation done by tasks. (Data Placement Jobs). Master Instance is a
                                    requirement – addresses consistency issue. Use scp/gsiftp. GDMP integration
                                    underway. Cost of access – only allow access from local cache and site. Automated
                                    optional delete of replica.
User Web Interface                  Web pages for requests and status

          1.   Consistency maintenance – Assured Current.
          2.   Trusted Files. Supports new versions of files which must be published. Can one rephrase this?
          3.   HEMP – Hybrid Event Store Metadata Prototype. Related to Data Signature work.
          4.   Replication Jobs. Data Movement scheduling needs a fuller discussion.

GDMP Issues:
               1. One root disk directory per site
               2. Subscription updates bring in all new data for a site
               3. File collections not used
               4. LFN fixed as ‘dir/filename’ (RC constraint)
               5. Doesn’t catalog or directly manage files in MSS
               6. Write access to tmp, etc. disk areas required for all GDMP users
               7. System state info (in files) only available locally

General discussion topics:
         1. Policies for Storage and Access.
         2. User view of MAGDA? Similarity of services with SAM and BaBar needs?

2.6     SAM(D0)
SAM is in production use by D0 as an integrated data grid system. The file handling, replication, routing services were
developed some time ago. The presentation focused on some of the robustness features in the file copying components
and deployment of the integrated distributed system – it is not a complete view.


Failover                            If error from one replica automatically fail over to another
Cleanup                             Release resources if task or job fails. Detection of abandoned jobs.
Responses to Errors                 Timeout if resources held too long without action.
                                    Node error results in rerouting of the data to healthy nodes
                                    Exit handler in User process which calls DH system
Resilience                          Automatic restart of servers and jobs. Retries of replication. Separate movement of
                                    data itself from that of the metadata to separate dependence on storage system and
                                    data catalogs.
Performance Tuning                  Parallelize database access layer.
Integration Features                Validation agents. Error message translation and interpretation at Component
                                    Interfaces. Tunable timeouts at every interface. (No checksums.)

          1.   Timeouts as an error mechanism. Pluses and minuses.
          2.   Unexpected/incorrect behaviour of the layers depended on (e.g. the file copier) takes a lot of time and work to
               code for/around.
          3.   Complete logs help debugging and diagnosis.
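
The failover, retry and tunable-timeout features listed in the table above can be combined roughly as in the sketch below. This is not SAM code. `fetch_with_failover` and the caller-supplied `fetch` function are hypothetical, and the default values are arbitrary.

```python
import time


def fetch_with_failover(replicas, fetch, retries=2, timeout_s=30.0, pause_s=0.0):
    """Try each replica up to `retries` times, moving to the next replica on error.

    fetch(replica, timeout_s) is a hypothetical caller-supplied function that returns
    the data or raises an exception (including on timeout).
    Returns (data, log); the log records every attempt, including failed ones
    that preceded the success.
    """
    log = []
    for replica in replicas:
        for attempt in range(1, retries + 1):
            try:
                data = fetch(replica, timeout_s)
                log.append((replica, attempt, "ok"))
                return data, log
            except Exception as exc:
                log.append((replica, attempt, f"error: {exc}"))
                time.sleep(pause_s)   # tunable pause before the next attempt
    raise RuntimeError(f"all replicas failed: {log}")
```

Returning the attempt log even on success reflects item 3: complete logs help debugging and diagnosis.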




2.7    STAR
STAR is working with the SRM project on the integration of the HRM implementation of the SRM standard in an
end-to-end application.


Replica Catalog                             mysql
File transfer                  Globus       GridFTP
Storage Management             SRM          SRM-HRM. Retries work when there is a storage system error.


2.8    BaBar
BaBar has a prototype of database replication using the SRB replication services. This prototype is being modified to
separate the catalog information in MCAT - leaving the core replication schema in MCAT and the BaBar extensions in
another DB.


2.9    Related Work - Condor
Condor developments were not reported in the meeting, but they are related to the topics at hand and are candidates for
PPDG work: NeST http://www.nestproject.org/ , ftp-lite http://www.cs.wisc.edu/condor/ftp_lite , the pluggable file
system http://www.cs.wisc.edu/condor/pfs and kangaroo http://www.cs.wisc.edu/condor/kangaroo

3     Summary of Discussion Sessions
       These notes are from the scheduled and impromptu discussion sessions. As such they are incomplete and reflect
       periods of time when the notetakers were otherwise engaged.

         A JOB is a schedulable unit or a schedulable transaction.

3.1    Interfaces to Robust File Replication services:
         MAGDA, Globus, SRB, SAM –

         Web Services for this uniformity? Or Protocol Question – commands and/or attributes that are included.

         Do we want to retrofit and/or wrap existing systems with the same interface definition but different
         implementation?

         Are there separate services for Replica Catalog interface and/or Replica Services?

         Semantics of replica systems.

         Assume we live in a heterogeneous world and one implementation can talk to another implementation. May
         require reimplementation.

         EDG is not trying to solve the problems “of the whole world”. Bottom up approach and identify components.
         Core set of capabilities.

         For JLAB Publish/Subscribe is a Replica Policy.

         Low level API for file transfer should not be dependent on whether being used in Replication or not.

         Where does bulk transfer of data fit – container of containers? Is this a separate concept or not? Does it affect the
         semantics and model of consumption of the data? Is there lazy consumption or not? Where do the policy and
         planning interfaces occur? Can a file be regarded as a container which is then decomposed and partially copied?
         This is a task for the SRB ASCI project.

          How high up the service layers are we going to go? What are the collective and application level components?
          Do we want/need to address the end user layers?

          With reference to DGRA V2.09

                                                                                           User Interface
          Replica Management              9.1                                              register, move, copy

          Replica Catalog Service         7.3                                              “catalog-only” requests and
                                                                                           collection definition
          Local Replica Catalog           5.5.2
          Storage Resource (system,       5.1                                              storage requests and information
          element)
          Reliable Transfer               9.1                                              copy only requests
          Publish/Subscribe
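
To make the layering in the table above concrete, the sketch below gives one possible reading of how the user-level operations could be split across services. It is an illustration only: the class and method names are hypothetical, the storage resource is a plain dictionary, and nothing here is taken from DGRA itself.

```python
class ReliableTransfer:
    """Copy-only requests; knows nothing about catalogs."""

    def __init__(self, storage):
        self.storage = storage            # dict: location -> bytes (stand-in for storage resources)

    def copy(self, src, dst):
        self.storage[dst] = self.storage[src]


class ReplicaCatalogService:
    """Catalog-only requests."""

    def __init__(self):
        self.locations = {}               # logical name -> set of locations

    def register(self, lfn, location):
        self.locations.setdefault(lfn, set()).add(location)

    def unregister(self, lfn, location):
        self.locations.get(lfn, set()).discard(location)


class ReplicaManagement:
    """register, move, copy: combines transfer and catalog operations."""

    def __init__(self, transfer, catalog):
        self.transfer, self.catalog = transfer, catalog

    def register(self, lfn, location):
        self.catalog.register(lfn, location)

    def copy(self, lfn, src, dst):
        self.transfer.copy(src, dst)      # move the bytes first ...
        self.catalog.register(lfn, dst)   # ... then record the new replica

    def move(self, lfn, src, dst):
        self.copy(lfn, src, dst)
        del self.transfer.storage[src]
        self.catalog.unregister(lfn, src)
```

Keeping the transfer class free of catalog knowledge follows the earlier remark that the low level API for file transfer should not depend on whether it is being used in replication.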


          Is there a consistency mechanism as part of the API? Validation and transaction API? What is the semantic for
          this?

          How to address the fact that “place to memory” and “place to disk” can have the same semantics but are
          certainly not replaceable and are not necessarily interoperable.

          Need to discuss the State of the file as well as the Status of the replication and file storage/copy.

          Coupling between Storage Element and Virtual Storage Element or Replica Catalog. Need to be careful about
          wanting a full file system semantics of a unix file system.

          Are people prepared to get together to work out the overlap and commonality between current
          implementations, and then deliver this to PPDG? Should not take more than 2 months. Not clear what benefit this
          would have – we have representatives of all the implementations available to review any common proposals.

          RLS. Local Catalog in next week or 2. Index Node specification – prototype version by the end of March.

          Globus Replica Management API:
          http://www-unix.globus.org/api/c/globus_replica_management/html/index.html

3.1.1     Redirection Proposal from BaBar

          The BaBar redirection requirement and implementation proposal is posted off of the agenda web page. It has
          been previously discussed in PPDG meetings and was revisited here in light of the next round of Globus/WP2
          design and implementation work:

          1.   Redirection is part of the WP2 design for TestBed 2.
          2.   RLS allows a first level of indirection. Need to leave protocol open to allow later addition of this redirection
               capability. This has been agreed to for a while, but needs implementation details.
          3.   For web services interface – redirection is explicit in that there is a 2 step process for accessing the byte
               stream in the SRM document.
          4.   Manual lookups – always doing a redirection.

          Agreed that this issue is being addressed and the next discussion should be to review the implementation after
          the first prototype version of RLS is released.
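
The two-step access mentioned in item 3 can be pictured as a lookup followed by a site-level redirection. The sketch below is a guess at the shape of such a flow and does not follow the SRM document or the BaBar proposal. The function name, dictionary structures and URL forms are hypothetical.

```python
def resolve(logical_name, location_service, site_services):
    """Two-step access: logical name -> site URL -> transfer URL.

    location_service: dict mapping a logical name to a site URL (first level of indirection).
    site_services: dict mapping a site URL to a function that returns the transfer URL
    where the bytes can actually be read (the redirection step).
    All names and URL forms here are hypothetical.
    """
    site_url = location_service[logical_name]             # step 1: where is a replica?
    transfer_url = site_services[site_url](logical_name)  # step 2: the site redirects to the bytes
    return transfer_url
```

Because the second step is delegated to the site, a site can change where it physically holds a file without any update to the location service.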




3.2     Errors, Status, Error Handling, Reliability
Discussion was driven by the slides posted off of the agenda web page.

          Should one provide a layer that takes all error information and interprets it? Even if one can design a “perfect
          error system”, one will always have to translate the information for some other component.

          Strings vs Error Codes – give the Details or the Essence. Maximum length of string to have user read it. So
          “Summary String” and “Detailed String”.
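
A status record carrying an error code together with a "Summary String" and a "Detailed String" might look like the sketch below. It is illustrative only: the field names, the 80-character limit and the convention that code 0 means success are assumptions, not anything agreed at the meeting.

```python
from dataclasses import dataclass
from typing import Optional

MAX_SUMMARY = 80  # assumed limit for a string an end user is expected to read


@dataclass
class Status:
    code: int                           # machine-readable essence (0 = success, by assumption)
    summary: str                        # short string for the end user
    detail: str = ""                    # full details for debugging and diagnosis
    retries: int = 0                    # status is reported for successes too
    cause: Optional["Status"] = None    # lower-layer status that this one translates

    def __post_init__(self):
        # Enforce the maximum readable length on the summary string
        if len(self.summary) > MAX_SUMMARY:
            self.summary = self.summary[: MAX_SUMMARY - 3] + "..."

    def chain(self):
        """Return the summaries from this status down to the original cause."""
        out, s = [], self
        while s is not None:
            out.append(f"[{s.code}] {s.summary}")
            s = s.cause
        return out
```

A layer that translates a lower-level error would wrap it as `Status(5, "file copy failed", cause=low_level_status)`, keeping the original detail available to whoever does the diagnosis.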

          Need to address Status from success as well as failure, e.g. retries.

          What is in the error and status handling that is better in the information/monitoring system?

          Diagnosis and response can/should/is an independent activity? Who uses the information for what –
          debugging, diagnosis, human response.

          PPDG should decide what we want to do about Error Handling. Agreement that this is an important area
          which always takes much work for end-to-end application and distributed system integration and deployment.

          Consider a Server Process and/or Service Machine dying in the middle of a catalog/database update. The
          details differ, although the report to the user is the same.

          Should the system be robust against a system administrator deleting a logical file somewhere? In Giggle one
          can make sure that the local catalog and local storage are consistent. Might this be too costly? What happens
          if one loses a file?
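
          A minimal sketch of a local catalog versus local storage consistency check. It assumes the catalog is a
          plain mapping from logical names to file names inside one storage directory; this is not Giggle's actual
          mechanism, only an illustration of what such a check has to compare.

```python
import os
from typing import Dict, List, NamedTuple


class Report(NamedTuple):
    missing_files: List[str]       # cataloged but absent from storage
    unregistered_files: List[str]  # present in storage but not cataloged


def check_consistency(catalog: Dict[str, str], storage_dir: str) -> Report:
    """Compare a (hypothetical) local catalog against one storage directory.

    `catalog` maps logical file names to file names inside `storage_dir`.
    """
    on_disk = {
        name
        for name in os.listdir(storage_dir)
        if os.path.isfile(os.path.join(storage_dir, name))
    }
    cataloged = set(catalog.values())
    return Report(
        missing_files=sorted(cataloged - on_disk),
        unregistered_files=sorted(on_disk - cataloged),
    )
```

          The check lists every file in the directory, so its cost grows with the size of the store. That is the
          cost concern raised above, and a reason to run such a check periodically rather than on every operation.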

          Status of successful operations, e.g. how many retries were needed and any automated failover information,
          is also important.
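
          A minimal sketch of a replication call that reports retries and failover even when it succeeds. The
          TransferStatus fields, the replicate function and the use of OSError for transfer failures are assumptions
          made for illustration; the caller supplies the actual transfer function.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class TransferStatus:
    ok: bool
    source_used: str = ""   # which replica finally served the file
    attempts: int = 0       # total tries across all sources
    failures: List[str] = field(default_factory=list)  # one entry per failed try


def replicate(
    sources: Sequence[str],
    fetch: Callable[[str], None],
    retries_per_source: int = 2,
) -> TransferStatus:
    """Try each source in order (failover), retrying each a few times.

    `fetch` is a caller-supplied transfer function that raises OSError on failure.
    """
    status = TransferStatus(ok=False)
    for source in sources:
        for _ in range(retries_per_source):
            status.attempts += 1
            try:
                fetch(source)
            except OSError as exc:
                status.failures.append(f"{source}: {exc}")
            else:
                status.ok = True
                status.source_used = source
                return status
    return status
```

          A successful result with attempts greater than one, or with a source other than the first, tells an
          operator that something is degrading before the first hard failure appears.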

          Definition of file STATEs is part of the overall understanding of error, status, consistency and robustness
          issues.

3.2.1     SAM
          SAM status blocks have not been included in the presentation to date. SAM keeps a nested stack of errors and
          structures. All the information is contained in the structure and is ultimately printed as text.
          http://d0db.fnal.gov/sam/doc/design/status.html , http://www.ppdg.net/mtgs/10jan-02/SAMErrorCode.idl.txt ,
          http://www.ppdg.net/mtgs/10jan-02/SAM_Status.idl.txt . Examples:

          >>>>>> Starting project with the Station Master
          Defaulting to "new" dataset version
          CORBA Exception, station is probably dead (Minor: 0 Completed: COMPLETED_NO)

          % CERR 11-Sep-2001 15:57:02 SAMManager:sammgr -
          %ERLOG-w SAM: PROJECT MASTER:
               Project master error caught in SAMManager::locatePM()!
               Error message: Project master unreachable!
               Contact sam-users@fnal.gov!
               sammgr 11-Sep-2001 15:57:02 SAMManager:sammgr -
          SAMManager:sammgr Waiting for the project master (no timeout).
          %ERLOG-e UNKNOWN:
               CorbaUtil::Resolve:
          '/SAMStations/central-analysis/09_11_01_15_56:Project' not found
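
          A minimal sketch of a nested status stack that is ultimately printed as text, in the spirit of the SAM
          description above. The StatusBlock type, its fields and the output layout are hypothetical; they are not
          taken from the SAM IDL files referenced above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StatusBlock:
    component: str   # where the condition was detected
    severity: str    # "w" for warning, "e" for error (assumed convention)
    message: str
    cause: Optional["StatusBlock"] = None  # next block down the stack

    def stack(self) -> List["StatusBlock"]:
        """Return this block followed by its chain of causes, outermost first."""
        blocks, node = [], self
        while node is not None:
            blocks.append(node)
            node = node.cause
        return blocks

    def as_text(self) -> str:
        """Render the whole stack as indented text, one block per line."""
        return "\n".join(
            f"{'  ' * depth}%{block.severity} {block.component}: {block.message}"
            for depth, block in enumerate(self.stack())
        )
```

          Each layer wraps the block it received instead of replacing it, so the structure keeps all the information
          and the text form is produced only at the point where it is shown to a person.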



3.3     Interfaces on which Replication depends
        The Data Grid Capabilities document (PPDG-8) was used as a basis for discussion. This document will be recast
        into categories that map onto DGRA, and MAGDA will be included.

        Latency management: what are the technical details?



Robustness – capabilities not in common.
        Asynchrony support
        Consistency state.

Logical File Names:
           1. Do Unix semantics for Logical File Names follow through into functionality, e.g. does the ACL for a
               directory affect the ability to create new files? How is Authorization affected? Is this part of the
               architecture/design? Multi-part authorization process.
           2. How does one do a Physics Meta-Data Query: through logical name space attributes or through the name?
           3. Are Names of Files “meaningful”, or are they “strings that identify a set of meta-data”?
           4. Does update of a file create a new entry? Should the version be part of the significant name?


4     Results of and Proposals from the Meeting

4.1     Acceptance of Documents:
The following PPDG documents were accepted. Comments, changes, new versions are anticipated. These documents
are PPDG project document deliverables in Common Services CS-7.


PPDG-10 Numeric Requirements for the Replica Catalog Service V0.2
PPDG-9 Common Storage Resource Manager Operations V1.0
PPDG-8 Data Grid Implementations - Comparison of Capabilities, V6

This paper is proposed to be PPDG-11 - Robust File Replication, PPDG Focus Meeting Report

4.2     Statements of Direction:
There has clearly been a lot of progress in the design, implementation and deployment of Replication Services in the PPDG
experiments over the past year. Successes include:
                 a. End to end application tests by all experiments.
                 b. Delivery and prototype use of new Globus Replication services and extension of SRB and HRM
                     common services.
                 c. Accepted common terminology and use of Data Grid Reference Architecture definitions.
                 d. Documenting performance requirements and system capabilities.
                 e. Progress on more detailed interface, architecture and protocol definitions.
                 f. Inter-team discussions on new designs and interfaces.

PPDG will continue to collaborate with EDG on GDMP in its developments for WP2 TestBed 2 and integration with
Giggle. ppdg-exec should discuss this with EDG/WP2, CMS and Globus as part of PPDG Year 2 planning.
                    a. Need to define which pieces to leave as GDMP specific layer. Is GDMP still a “CMS specific”
                         PPDG project activity? For EDG it is not CMS specific.
                    b. Need to address the issue of GDMP V2 support as V3 is developed and deployed.
JLAB/SRB Project Activity service specification will be the nascent protocol definition for Replica Management for PPDG
review/input and adoption. This is a possible discussion topic for the Feb PPDG collaboration meeting if there is time. It is
possible that Globus might be able to consider contributing to and/or reviewing this.

While there is continued concern about multiple implementations of file transfer and replica management in the
experiments, it is clear that during this phase of the project it is most constructive to explore different ideas and
directions as a precursor to moving towards more commonality. We expect continued discussion of this issue.

There is still significant work to be done to have a Robust File Replication system that meets the needs of all the PPDG
application groups.




4.3   Action Items:

            Topic                                             Action                                  Due

            SRB/JLAB interface to Storage Element and         JLAB, SRB document first draft          2/20/02
            Replication Management (web service definition)

            GridFTP interaction with Storage Resource         ppdg-exec phone con with Carl, Ian,
            managers                                          Bill, Arie to initiate the discussion

            Container and Collection consistency in use       ppdg-exec review PPDG documents         2/20/02

            GDMP and RLS issues (ATLAS GDMP issues,           ppdg-exec phone con with GDMP,          2/20/02
            PPDG Year 2 planning, Master Replica)             WP2, CMS, ATLAS, Globus, Andy

            Review next version of Globus Replication         Agenda of PPDG phone con                1/30/02
            development

            Review Local Replica Catalog Interface            Agenda of PPDG phone con – AC, SM       1/30/02

            Data movement scheduling                          Agenda of PPDG phone con                Before 4/02

            Error Reporting, Handling and Response in the     2 page paper from ppdg-exec             2/20/02
            PPDG Environment                                  Agenda of PPDG phone con                3/02

            Revisit outcomes                                  Another focus meeting                   Decide in April




5     Architecture Diagrams
5.1    SRB
          [Figure: SRB mapped to PPDG/DGRA Architecture.
          Layer labels: Experiment Computing; Domain; BaBar Grid; Collective; Resource; Grid; Fabric and Connectivity.
          Boxes, top to bottom: Local Application; Application Framework; Experiment Databases; Job Management;
          Data Management; Metadata Management; Object to File Mapper; Information & Monitoring; Logical name Space;
          Replica Attributes; Replica Optimization; Grid Scheduler; Consistency (metadata / data), (latency
          management / metadata); SQL Database Service; Computing Element Services; Storage Services (storage
          abstraction); Catalog Management (catalog manipulation); Authorisation, Authentication and Auditing;
          Service Index (URL / command registration); Resource Management; Configuration Management; Monitoring
          and Fault Tolerance; Node Installation & Management; Fabric Storage Management.]

          [Figure: SDSC Storage Resource Broker & Meta-data Catalog.
          Application clients: C, C++ Libraries; Linux I/O; Unix Shell; Java, NT Browsers; DLL / Python; Prolog
          Predicate; Web.
          Prime Server: Consistency Management / Authorization-Authentication; Logical Name Space; Latency
          Management; Data Transport; Metadata Transport.
          Servers: Catalog Abstraction over Databases (DB2, Oracle, Sybase); Storage Abstraction over Archives
          (HPSS, ADSM, UniTree, DMF), HRM, File Systems (Unix, NT, Mac OSX) and Databases (DB2, Oracle, Postgres).]




5.2   SAM


          [Figure: SAM architecture diagram.
          Client Applications: Web; Command line; D0 Framework C++ codes; Python codes, Java codes.
          Collective Services: Request Formulator and Planner (“Dataset Editor”); Request Manager (“Project Master”);
          Cache Manager (“Station Master”); Job Manager (“Station Master”); Storage Manager (“File Storage Server”);
          SAM Resource Management (“Optimiser”); Batch Systems - LSF, FBS, PBS, Condor; Job Services; Data Mover
          (“Stager”); Significant Event Logger; Naming Service; Catalog Manager; Database Manager.
          Connectivity and Resource: CORBA; UDP; Catalog protocols; File transfer protocols - ftp, bbftp, rcp;
          GridFTP; Mass Storage systems protocols e.g. encp, hpss.
          Authentication and Security: SAM-specific user, group, node, station registration; GSI; Bbftp ‘cookie’.
          Fabric: Tape Storage Elements; Disk Storage Elements; Compute Elements; LANs and WANs; Code Repository;
          Resource and Services Catalog; Replica Catalog; Meta-data Catalog.
          Legend: markers indicate components that will be replaced, enhanced or added using PPDG and Grid tools.
          A name in “quotes” is the SAM-given software component name.]




5.3   JLAB


          [Figure: Data Grid Web Services Architecture.
          Components: File Client; Web Services (Meta Data Catalog, Replica Catalog, Replication Service, HRM++
          Service); File Server(s); HRM Listener; Storage Resource. The services are shown for a Single Site.]
5.4   WP2



          [Figure: WP2 Replication Services - Overview]




5.5   ATLAS



          [Figure: Magda Architecture.
          Elements: collection of logical files to replicate; Locations grouped into Sites (Mass Store site, Disk
          site, Cache); Spider on Host 1 and on Host 2; source to cache stagein; source to dest transfer (scp,
          gsiftp); synch via DB (MySQL); register replicas.
          Legend: replication task; catalog updates.]
6      Appendix

6.1      Jan 10th Meeting Agenda
      GriPhyN/PPDG Data Grid Architecture, Toolkit, and Roadmap v2.09, v2.07s
      EDG Work Package 2
      Replication Requirements: 6/01; 1/02
      Storage Resource Management Interface V1.0

   Time      Topic                    Speaker or         Documentation / Presentation
                                      Discussion

             Ongoing Work

    9:00am   Welcome                  Chip Watson
    9:10am   Introduction             Ruth Pordes        Slides
    9:15am   JLAB                     Chip Watson        Talk: Web services for replicated file
                                                         management, A data analysis grid
    9:30am   SRB                      Reagan Moore       Talk: SRB and the discussion session
    9:45am   Globus (Giggle/Grin)     Ann Chervenak      Draft Paper. GGF presentation, Reliable
                                                         File Transfer
   11:00am   GDMP(CMS)                Heinz Stockinger   Talk: GDMP documentation
   11:15am   MAGDA(Atlas)             Torre Wenaus       Talk: Magda Documentation
   11:30am   SAM(D0)                  Vicky White        Talk: SAM home page
   11:45am   STAR                     Eric Hjort         Talk
   12:00pm   Babar                    Adil Hasan         SRB in Babar
   12:30pm   Lunch
    1:30pm   Common interfaces to     Andy Hanushevsky   Requirements/Interfaces for: catalog;
             services this layer                         queueing of replication requests; reliable
             provides                                    execution of these requests; replication
                                                         policy specification
                                                         Redirection issue: Paper, Proposal
    2:30pm   Status, Errors,          Doug Thain         Talk
             Asynchrony
    3:30pm   Common interfaces to     Reagan Moore       Talk (second half of slides above)
             provider services                           Requirements/Interfaces to the services
                                                         robust replication consumes -- HRM, file
                                                         transfer
                                                         Comparison of data grid capabilities
    4:30pm   Break
    5:00pm   What has been learned,
             next steps, goals etc
    6:00pm   Dinner/End

6.2      Jan 10th Participants




Walt Aker – JLAB          Reagan W. Moore – SDSC
Bill Allcock – ANL        Richard Mount (VRVS) – SLAC
Jie Chen – JLAB           Shazhad Muzaffar – Fermilab
Ying Chen – JLAB          Ruth Pordes – Fermilab
Ann Chervenak – ISI       Heinz Stockinger – CERN
Peter Couvares – Uwisc    Doug Thain – Uwisc
Ewa Deelman – ISI         Yee-Ting Li (VRVS) – UCL, UK
Andy Hanushevsky – SLAC   Chip Watson – JLAB
Bryan Hess – JLAB         Torre Wenaus – BNL
Eric Hjort – LBL          Vicky White – DOE
Andy Kowalski – JLAB      Mike Wilde – ANL
Miron Livny – Uwisc       Bing Zhu – SDSC




perl_tk_tutorial
 
The.common.java.cookbook.2009
The.common.java.cookbook.2009The.common.java.cookbook.2009
The.common.java.cookbook.2009
 
The.Common.Java.Cookbook.2009
The.Common.Java.Cookbook.2009The.Common.Java.Cookbook.2009
The.Common.Java.Cookbook.2009
 
BizTalk Practical Course Preview
BizTalk Practical Course PreviewBizTalk Practical Course Preview
BizTalk Practical Course Preview
 
Net app v-c_tech_report_3785
Net app v-c_tech_report_3785Net app v-c_tech_report_3785
Net app v-c_tech_report_3785
 
CALM DURING THE STORM:Best Practices in Multicast Security
CALM DURING THE STORM:Best Practices in Multicast SecurityCALM DURING THE STORM:Best Practices in Multicast Security
CALM DURING THE STORM:Best Practices in Multicast Security
 
Offshore wind-development-program-offshore-wind-roadmap-for-vietnam
Offshore wind-development-program-offshore-wind-roadmap-for-vietnamOffshore wind-development-program-offshore-wind-roadmap-for-vietnam
Offshore wind-development-program-offshore-wind-roadmap-for-vietnam
 
C3d content ukie_doc0
C3d content ukie_doc0C3d content ukie_doc0
C3d content ukie_doc0
 
White Paper: Look Before You Leap Into Google Apps
White Paper: Look Before You Leap Into Google AppsWhite Paper: Look Before You Leap Into Google Apps
White Paper: Look Before You Leap Into Google Apps
 

Kürzlich hochgeladen

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sectoritnewsafrica
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 

Kürzlich hochgeladen (20)

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 

Ppdg Robust File Replication

Robust File Replication
PPDG Focus Meeting, January 10th 2002
PPDG-11 V0.4

1 Introduction
2 Summary of Presentations
  2.1 JLAB
  2.2 SRB
  2.3 Globus (Giggle/Grin)
  2.4 GDMP(CMS)
  2.5 MAGDA(Atlas)
  2.6 SAM(D0)
  2.7 STAR
  2.8 Babar
  2.9 Related Work - Condor
3 Summary of Discussion Sessions
  3.1 Interfaces to Robust File Replication services:
    3.1.1 Redirection Proposal from BaBar
  3.2 Errors, Status, Error Handling, Reliability
    3.2.1 SAM
  3.3 Interfaces on which Replication depends
4 Results of and Proposals from the Meeting
  4.1 Acceptance of Documents:
  4.2 Statements of Direction:
  4.3 Action Items:
5 Architecture Diagrams
  5.1 SRB
  5.2 SAM
  5.3 JLAB
  5.4 WP2
  5.5 ATLAS
6 Appendix
  6.1 Jan 10th Meeting Agenda
  6.2 Jan 10th Participants

1 Introduction
The Particle Physics Data Grid SciDAC Collaboratory Pilot includes as one of its core work areas “Robust data movement and replication” (CS-5 and CS-6). The four participating Computer Science groups are developing Grid middleware to address components or integrated solutions for these services. The six experiments are deploying file replication services into production: starting from the use of generic FTP, moving through initial parallel-stream FTPs such as bbFTP and gsiFTP, using catalogs of varying sophistication to track and manage the distributed file sets, and adding experiment-specific higher-level components to build end-to-end applications which users can invoke and with which end users, developers and integrators can interact. PPDG sponsors these groups to integrate and deploy their replication applications and to share functionality and performance requirements, experience and plans. PPDG then acts to promote common components and interfaces, and the consistency and interoperability of appropriate middleware and standards.

This PPDG report is the result of a one-day focus meeting on “Robust File Replication”. Appendix 1 gives the agenda of and attendees at the meeting. The meeting reflected accomplishments from a lot of work on the part of all the participating groups. There was clear interest in, and preparedness for, discussing future work, technical and practical issues, and directions across the groups.
This report is in 2 sections. It attempts to capture key points of the application and technology presentations, to provide the background for identifying a list of future activities of the project and the necessary and relevant areas of work which would benefit from future discussion. The report does not include the information presented in the talks at the meeting. The reader is referred to the slides and documents posted off of the meeting agenda page for more detailed information. http://www.ppdg.net/mtgs/10jan-02/agenda.htm

2 Summary of Presentations

2.1 JLAB
All software is based on Web services.

Replica Catalog: Deep name tree. Translation from GFN to SURL: Global File Name and Site-url, i.e. host name of the site + url (which includes the protocol and the site). This is a logical string which can be “redirected” to the actual physical site. The intent is that the naming semantics are links and collections. Rejected the Globus replica catalog because it does not support deep trees. Not a challenging database design to do this. Using MySQL. Performance is not an issue because the h/w can be improved. End-to-end locking and transaction rate.
HRM Listener Agent: Application glue between the local site and the global replica catalog. Listens to local HRM-level actions and informs other services. A planner? An information server? Each storage system has an HRM listener. HRM is part of the VO. Where are there one to many? The switch from Grid to VO needs more thought. Wrapper on Jasmine. SOAP + MySQL.
Replication Service: Handles requests to make replicas at a higher level than the replica catalog. Handles space requests etc.
File Client does not do this. File Transfer Service does not manage space. Who does?

1. Recommend that a definition of web accessible services be included in the PPDG Architecture. Is this a minority opinion? Globus has stated it is a direction they are moving in. The next generation of replica catalog is defined to be a web service.
2. How should PPDG be defining the web services interfaces? SRB will work with JLAB. Mapping between the representations can be easily done. How does one define the meaning of the schema? Agree on a minimum set? Can this be done as a joint effort, or is it several parallel efforts? All results should be posted to the PPDG web sites and commented on while in progress, rather than as a “final draft for review”. A draft of the JLAB implementation is posted to the meeting web page.
3. GridFTP interaction with Storage Resource Managers? Need a discussion with Ian, Carl and Arie. The SRM document addresses some of the issues.
4. Need to communicate error information back through the services and/or layers?

2.2 SRB
SRB Enhancements for BaBar:
SRB->HPSS Driver (Glue): Connect metadata in SRB DB to HPSS files.
SRB Server (Extend): Extension to use new driver to HPSS and make server support SLAC.
Common Middleware Remote Proxy (Glue): Access to and bundling of file transfers (DataCutter).
User Client: BaBar.

Replication Services:
Logical Name Space: Replication is a capability “in the logical name space”. Replication is integrated into the SRB system. Locking is done with timeouts. Inconsistencies can occur.
Registration of Digital Files, Blobs, Database command sequences, URLs. Can see information Objects from different databases.
Aggregation: Container replication; synchronization; staging. Can have a Container that represents a whole site.
Replica Creation: Synchronous, Asynchronous (out of band). (From PPDG requirements.)
Replica Access: Automated fail over to alternate copy.
Latency Management
Data Transport
Meta Data Transport

1. Remote Proxy: possibility that it will need a scheduling service, and a mechanism for improving efficiency of file transfers.
2. Need to access metadata independent of file access. Need to provide bulk metadata import and registration. Discovery based on attributes.
3. Storage System access and data transport interface are site specific.
4. Any thoughts on linking the BaBar metadata catalogs, Oracle and Objectivity? Complex, but has been done with Objectstore.
5. Asynchronous replica creation (k out of n is a success) using a background service was neither requested by BaBar nor implemented by SRB. Relation to partial result? Could benefit from more discussion.
6. Architecture
   a. Storage Abstraction: is this, or should this be, a common component? How does it relate to the HRM definition? Includes latency management.
   b. Catalog Abstraction

2.3 Globus (Giggle/Grin)
The first version of the Replica Catalog and Management Services is in production as part of Globus V2.0 and integrated into GDMP and EDG TestBed 1. The comments relate to the development of the new components: the Replica Location Service (RLS), which augments the Replica Catalog, and the Reliable File Transfer Service, which is a component above the File Transfer layer. The first prototype implementation of the RLS is scheduled for 4/02 and a production version for integration with EDG TestBed 2 in 9/02.
Replica Catalog: File attributes are kept in a meta-data catalog which is outside the domain of the Globus service?
Reliable Replication: Combine storage system operations with replica catalog updates.
Replica Selection: Estimate performance. Relies on Information Services.
Replica Location Service: To an end user the functionality will appear as equivalent to the set of Replica Catalog, Replica Selection and Replication Management. Framework: Reliable Local State; Global State with Relaxed Consistency.
Reliable File Transfer Service: Reliable transfer of byte streams. Built on top of GridFTP. http://www.mcs.anl.gov/~madduri/RFT.html
Reliable Replication Service: Who is responsible for establishing the reliability, and for verifying and determining the Catalog consistency? Catalogs within RLS include the Storage System catalog.

1. New implementation of Replica Catalog supports logical files in several collections (and containers?).
2. Name Space. Could one map to the UNIX file system name space? Is this something that PPDG wants to input to? WP2: Does one need to define Name Space semantics? Is the definition of database tables sufficient, i.e. an arbitrary set of attributes that defines a name?
3. “Collection” use overload.
   a. Container/Aggregation. Same as a data object. Clusters.
   b. Selection Set/Collections. Logical organization.
   These are orthogonal.
4. Could Globus discuss the interfaces with JLAB and SRB before completing their definition for RLS?
5. The difference between Replica Management and RLS was not completely clear.
6. Impact on End User of different consistency levels. Should be none except for performance? Depends on the user API. User gets a “probability” that the file is in the stated location. This is always true.
   a. Does the End User get information that is “Wrong”? Possibly. But is this not also true, given the errors that can occur, even with a complete design which guarantees consistency?
   b. Does the End User always get correct information but performance is affected? Yes.
7. Semantics of the Hints/Location Service need to be separate from those of the File Delivery Service.
8. WP2 has seen no performance issues with the current version of the Replica Catalog.

2.4 GDMP(CMS)
Grid Data Mirroring Package. V2.0 is included in EDG TestBed 1 and V2.x will be in VDT 1.0.

Publish/Subscription Manager (GDMP): Local catalogs (text files) keep lists and state.
Replica Catalog (Globus): Updated when replica is “pulled”. Can be used as a push model with the GDMP layer.
File Copier (GridFTP): Interfaces to the Storage System.
Storage System Interface: Looking at the HRM. How does the interaction happen?
Replica Optimizer (WP2): Being designed. Is this a potentially “common component”? Workshop is at CERN week of Mar 15th.

1. GDMP works on Containers as well as single files. This is an enhancement to the Globus Replica catalog/management.
2. Error recovery use cases.
   a. May republish a file that already succeeded. Globus replica catalog refuses duplicate entry of logical file.
   b. There may be knowledge in the catalog you don’t know about. Should the protocol include a Transaction Index and 2 phase commit?
   c. Where is the responsibility to determine validity of the catalog?
   d. Is GDMP functionality replaced by the Globus Replica Location Service in the future? Not completely. Will need the Publish/Subscribe layer.

2.5 MAGDA(Atlas)
MAGDA is being used and further developed by ATLAS as a vertically integrated framework available for testing, experiment development and production use. Gsiftp and scp are used for the file copy, MySQL as the database. To date the other components are ATLAS developed.
Logical File Name Space: Supports collections and containers. Arbitrary string. Name is unique in a VO, includes Replica Number.
File Catalog: MySQL database. MySQL accelerator written by ATLAS for sets of database updates. Replica catalog loader written but not tested. No transaction locking to date.
Storage System: Data repository. Site + Location. Host can access a set of sites.
File Discovery Agent: Spider finds files and registers them.
Replication Service: Replication Operation done by tasks (Data Placement Jobs). Master Instance is a requirement; it addresses the consistency issue. Use scp/gsiftp. GDMP integration underway. Cost of access: only allow access from local cache and site. Automated optional delete of replica.
User Web Interface: Web pages for requests and status.

1. Consistency maintenance: Assured Current.
2. Trusted Files. Supports new versions of files which must be published. Can one rephrase this?
3. HEMP: Hybrid Event Store Metadata Prototype. Related to Data Signature work.
4. Replication Jobs. Data Movement scheduling needs a fuller discussion.

GDMP Issues:
1. One root disk directory per site
2. Subscription updates bring in all new data for a site
3. File collections not used
4. LFN fixed as ‘dir/filename’ (RC constraint)
5. Doesn’t catalog or directly manage files in MSS
6. Write access to tmp, etc disk areas required for all GDMP users
7. System state info (in files) only available locally

General discussion topics:
1. Policies for Storage and Access.
2. User view of MAGDA? Similarity of services with SAM and BaBar needs?

2.6 SAM(D0)
SAM is in production use by D0 as an integrated data grid system. The file handling, replication and routing services were developed some time ago. The presentation focused on some of the robustness features in the file copying components and on deployment of the integrated distributed system; it is not a complete view.
Failover: If there is an error from one replica, automatically fail over to another.
Cleanup: Release resources if task or job fails. Detection of abandoned jobs.
Responses to Errors: Timeout if resources are held too long without action. Node error results in rerouting of the data to healthy nodes. Exit handler in User process which calls DH system.
Resilience: Automatic restart of servers and jobs. Retries of replication. Separate movement of the data itself from that of the metadata, to separate dependence on storage system and data catalogs.
Performance Tuning: Parallelize database access layer.
Integration Features: Validation agents. Error message translation and interpretation at Component Interfaces. Tunable timeouts at every interface. (No checksums.)

1. Timeouts as an error mechanism. Pluses and minuses.
2. Unexpected/incorrect behaviour of layers depended on (e.g. file copier) takes a lot of time and work to code for/around.
3. Complete logs help debugging and diagnosis.
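The failover, retry and tunable-timeout features listed for SAM can be pictured with a small sketch. This is not SAM code: the function name `copy_with_failover`, the `TransferError` exception and the `copier` callback signature are all hypothetical, chosen only to illustrate trying each replica a bounded number of times before moving to the next one.

```python
import time
from typing import Callable, List, Sequence


class TransferError(Exception):
    """Hypothetical error raised by a copier on failure or timeout."""


def copy_with_failover(
    replicas: Sequence[str],
    copier: Callable[[str, float], None],
    timeout_s: float = 30.0,
    retries_per_replica: int = 2,
    backoff_s: float = 0.0,
) -> str:
    """Try each replica in order; return the source that succeeded.

    `copier(source, timeout_s)` is assumed to raise TransferError when the
    transfer fails or does not finish within the (tunable) timeout.
    """
    errors: List[str] = []
    for source in replicas:
        for attempt in range(1, retries_per_replica + 1):
            try:
                copier(source, timeout_s)
                return source  # success: stop retrying and failing over
            except TransferError as exc:
                # Keep a complete log of failures to help diagnosis.
                errors.append(f"{source} attempt {attempt}: {exc}")
                time.sleep(backoff_s)
    raise TransferError("all replicas failed: " + "; ".join(errors))
```

Keeping the timeout and retry count as parameters mirrors the "tunable timeouts at every interface" point; whether a timeout should count as an error at all is one of the open questions noted above.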
2.7 STAR
STAR is working with the SRM project on the integration of the HRM implementation of the SRM standard in an end-to-end application.
Replica Catalog: MySQL
File transfer: Globus GridFTP
Storage Management: SRM SRM-HRM. Retries work when there is a storage system error.

2.8 Babar
BaBar has a prototype of database replication using the SRB replication services. This prototype is being modified to separate the catalog information in MCAT – leaving the core replication schema in MCAT and the BaBar extensions in another DB.

2.9 Related Work - Condor
Condor developments were not reported in the meeting, but they are related to the topics at hand and are candidates for PPDG work: Nest http://www.nestproject.org/ , ftp-lite http://www.cs.wisc.edu/condor/ftp_lite , the pluggable file system http://www.cs.wisc.edu/condor/pfs and kangaroo http://www.cs.wisc.edu/condor/kangaroo

3 Summary of Discussion Sessions
These notes are from the scheduled and impromptu discussion sessions. As such they are incomplete and reflect periods of time when the notetakers were otherwise engaged.
A JOB is a schedulable unit or a schedulable transaction.

3.1 Interfaces to Robust File Replication services:
MAGDA, Globus, SRB, SAM – Web Services for this uniformity? Or a Protocol Question – commands and/or attributes that are included. Do we want to retrofit and/or wrap existing systems with the same interface definition but different implementation? Are there separate services for the Replica Catalog interface and/or Replica Services? Semantics of replica systems. Assume we live in a heterogeneous world and one implementation can talk to another implementation. May require reimplementation.
EDG is not trying to solve the problems “of the whole world”. Bottom up approach and identify components. Core set of capabilities. For JLAB Publish/Subscribe is a Replica Policy. The low level API for file transfer should not depend on whether it is being used in Replication or not.
Where does bulk transfer of data fit – container of containers? Is this a separate concept or not? Does it affect the semantics and model of consumption of the data? Is there lazy consumption or not? Where do the policy and planning interfaces occur? Can a file be regarded as a container that is then decomposed and partially copied – this is a task for the SRB ASCI project.
How high up the service layers are we going to go? What are the collective and application level components? Do we want/need to address the end user layers?

With reference to DGRA V2.09:
User Interface
Replica Management | 9.1 | register, move, copy
Replica Catalog Service | 7.3 | “catalog-only” requests and collection definition
Local Replica Catalog | 5.5.2 |
Storage Resource (system, element) | 5.1 | storage requests and information
Reliable Transfer | 9.1 | copy only requests
Publish/Subscribe

Is there a consistency mechanism as part of the API? Validation and transaction API? What are the semantics for this?
How to address the fact that ‘place to memory’ and ‘place to disk’ can have the same semantics but are certainly not replaceable and are not necessarily interoperable.
Need to discuss the State of the file as well as the Status of the replication and file storage/copy. Coupling between Storage Element and Virtual Storage Element or Replica Catalog.
Need to be careful about wanting the full file system semantics of a unix file system.
Are people prepared to get together to work out the overlap and commonality between current implementations, and then deliver this to PPDG? Should not take more than 2 months. Not clear what benefit this would have – we have representatives of all the implementations available to review any common proposals.
RLS. Local Catalog in the next week or 2. Index Node specification – prototype version by the end of March.
Globus Replica Management API: http://www-unix.globus.org/api/c/globus_replica_management/html/index.html

3.1.1 Redirection Proposal from BaBar
The BaBar redirection requirement and implementation proposal is posted off of the agenda web page. It has been previously discussed in PPDG meetings and was revisited here in light of the next round of Globus/WP2 design and implementation work:
1. Redirection is part of the WP2 design for TestBed 2.
2. RLS allows a first level of indirection. Need to leave the protocol open to allow later addition of this redirection capability. This has been agreed to for a while, but the implementation details still need to be worked out.
3. For the web services interface – redirection is explicit in that there is a 2 step process for accessing the byte stream in the SRM document.
4. Manual lookups – always doing a redirection.
Agreed that this issue is being addressed and the next discussion should be to review the implementation after the first prototype version of RLS is released.
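As an illustration of the redirection idea discussed above, the following minimal sketch resolves a logical file name through catalogs that may answer either with a location or with a pointer to another catalog. It is not the BaBar proposal, the RLS protocol or the SRM interface: the Location and Redirect types, the resolve function, the in-memory dictionaries and the hop limit are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Location:
    """A physical location from which the byte stream can be read."""
    url: str


@dataclass(frozen=True)
class Redirect:
    """A pointer to another catalog that may know the file."""
    catalog: str


Entry = Union[Location, Redirect]


def resolve(
    lfn: str,
    start: str,
    catalogs: Dict[str, Dict[str, Entry]],
    max_hops: int = 4,
) -> Location:
    """First step of a two-step access: map a logical name to a location.

    The second step, reading the byte stream from the returned URL,
    is left to the caller.
    """
    current = start
    for _ in range(max_hops + 1):
        entry = catalogs[current].get(lfn)
        if entry is None:
            raise KeyError(f"{lfn} not known to catalog {current}")
        if isinstance(entry, Location):
            return entry
        current = entry.catalog  # follow the redirection
    raise RuntimeError(f"too many redirections for {lfn}")
```

Bounding the number of hops is one simple way to keep a misconfigured pair of catalogs from redirecting to each other indefinitely.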
3.2 Errors, Status, Error Handling, Reliability
Discussion was driven by the slides posted off of the agenda web page.
Should one provide a layer that takes all error information and interprets it? Even with a “perfect error system” one will always have to translate the information for some other component.
Strings vs Error Codes – give the Details or the Essence. There is a maximum length of string that a user will read. So “Summary String” and “Detailed String”.
Need to address Status from success as well as failure, e.g. retries.
What is in the error and status handling that is better placed in the information/monitoring system? Diagnosis and response can/should/is an independent activity? Who uses the information for what – debugging, diagnosis, human response.
PPDG should decide what we want to do about Error Handling. Agreement that this is an important area which always takes much work for end-to-end application and distributed system integration and deployment.
Server Process and/or Service Machine died in the middle of a catalog/database update. Details are different although the report to the user is the same.
Should the system be robust to a system administrator deleting a logical file somewhere? In Giggle one can make sure the local catalog and local storage are consistent. This might be too costly? What happens if one loses a file?
Status of successful operations, e.g. how many retries and automated failover information, is also important.
Definition of file STATEs is part of the overall understanding of error, status, consistency and robustness issues.

3.2.1 SAM
SAM status blocks were not included to date in the presentation. SAM keeps a nested stack of errors and structures. All the information is contained in the structure. Ultimately it is printed as text.
http://d0db.fnal.gov/sam/doc/design/status.html , http://www.ppdg.net/mtgs/10jan-02/SAMErrorCode.idl.txt , http://www.ppdg.net/mtgs/10jan-02/SAM_Status.idl.txt .
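To make the ideas of a summary string, a detailed string and a nested stack of statuses concrete, here is a minimal sketch. It is not the SAM status structure defined in the IDL files referenced above: the Status class, its field names and the convention that code 0 means success are assumptions chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Status:
    code: int                         # 0 means success in this sketch
    summary: str                      # short enough for a user to read
    detail: str = ""                  # full diagnostic text
    retries: int = 0                  # recorded for successes as well
    cause: Optional["Status"] = None  # lower-level status, if any

    def stack(self) -> List["Status"]:
        """Return this status followed by its chain of causes."""
        out: List["Status"] = []
        cur: Optional["Status"] = self
        while cur is not None:
            out.append(cur)
            cur = cur.cause
        return out

    def render(self) -> str:
        """Print the stack as text, one indented summary line per level."""
        lines = []
        for depth, status in enumerate(self.stack()):
            lines.append(f"{'  ' * depth}[{status.code}] {status.summary}")
        return "\n".join(lines)
```

Each layer can wrap the status it received from the layer below and add its own summary, so the essence is available at the top while the details remain in the structure.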
Examples of SAM error output:
>>>>>> Starting project with the Station % CERR 11-Sep-2001 15:57:02 SAMManager:sammgr - Master %ERLOG-w SAM: PROJECT MASTER: Defaulting to "new" dataset version Project master error caught in SAMManager::locatePM()! CORBA Exception, station is probably Error message: Project master unreachable! dead (Minor: 0 Contact sam-users@fnal.gov! Completed: COMPLETED_NO) sammgr 11-Sep-2001 15:57:02 SAMManager:sammgr - SAMManager:sammgr Waiting for the project master (no timeout). %ERLOG-e UNKNOWN: CorbaUtil::Resolve: '/SAMStations/central-analysis/09_11_01_15_56:Project' not found

3.3 Interfaces on which Replication depends
The Data Grid Capabilities document (PPDG-8) was used as a basis for discussion. This document will be recast into categories to map onto DGRA, and MAGDA will be included.
Latency management – what are the technical details?
Robustness – capabilities not in common.
Asynchrony support.
Consistency state.
Logical File Names:
1. Do Unix semantics for Logical File Names follow through into functionality, e.g. does an ACL on a directory affect the ability to create new files? How does Authorization get affected? Is this part of the architecture/design? Multi-part authorization process.
2. How does one do a Physics Meta-Data Query? Logical name space attributes or the name?
3. Are Names of Files “meaningful” or are they “strings that identify a set of meta-data”?
4. Does update of a file create a new entry? Should version be part of the significant name?

4 Results of and Proposals from the Meeting
4.1 Acceptance of Documents:
The following PPDG documents were accepted. Comments, changes and new versions are anticipated. These documents are PPDG project document deliverables in Common Services CS-7.
PPDG-10 Numeric Requirements for the Replica Catalog Service V0.2
PPDG-9 Common Storage Resource Manager Operations V1.0
PPDG-8 Data Grid Implementations - Comparison of Capabilities, V6
This paper is proposed to be PPDG-11 - Robust File Replication, PPDG Focus Meeting Report

4.2 Statements of Direction:
There has clearly been a lot of progress in the design, implementation and deployment of Replication Services in the PPDG experiments over the past year. Successes include:
a. End to end application tests by all experiments.
b. Delivery and prototype use of new Globus Replication services and extension of SRB and HRM common services.
c. Accepted common terminology and use of Data Grid Reference Architecture definitions.
d. Documenting performance requirements and system capabilities.
e. Progress on more detailed interface, architecture and protocol definitions.
f. Inter-team discussions on new designs and interfaces.
PPDG will continue to collaborate with EDG on GDMP in its developments for WP2 TestBed 2 and integration with Giggle.
Ppdg-exec should discuss this with EDG/WP2, CMS and Globus as part of PPDG Year 2 planning.
a. Need to define which pieces to leave as a GDMP specific layer. Is GDMP still a “CMS specific” PPDG project activity? For EDG it is not CMS specific.
b. Need to address the issue of GDMP V2 support as V3 is developed and deployed.
The JLAB/SRB Project Activity service specification will be the nascent protocol definition for Replica Management for PPDG review/input and adoption. This is a possible discussion topic for the Feb PPDG collaboration meeting if there is time. It is possible that Globus might be able to consider contributing to and/or reviewing this.
While there is continued concern about multiple implementations of file transfer and replica management in the experiments, it is clear that during this phase of the project it is most constructive to be exploring different ideas and directions as a precursor to moving towards more commonality. We expect continued discussion of this issue.
There is still significant work to be done to have a Robust File Replication system that meets the needs of all the PPDG application groups.
4.3 Action Items:
SRB/JLAB interface to Storage Element and Replication Management (web service definition) | JLAB, SRB document first draft | 2/20/02
GridFTP interaction with Storage Resource managers | ppdg-exec phone con with Carl, Ian, Bill, Arie to initiate the discussion |
Container and Collection consistency in use | ppdg-exec review PPDG documents | 2/20/02
GDMP and RLS issues (ATLAS GDMP issues, PPDG Year 2 planning, Master Replica) | ppdg-exec phone con with GDMP, WP2, CMS, ATLAS, Globus, Andy | 2/20/02
Review next version of Globus Replication development | Agenda of PPDG phone con | 1/30/02
Review Local Replica Catalog Interface | Agenda of PPDG phone con – AC, SM | 1/30/02
Data movement scheduling | Agenda of PPDG phone con | Before 4/02
Error Reporting, Handling and Response in the PPDG Environment | 2 page paper from ppdg-exec; Agenda of PPDG phone con | 2/20/02; 3.02
Revisit outcomes | Another focus meeting | Decide in April
5 Architecture Diagrams
5.1 SRB
[Figure: SRB mapped to PPDG/DGRA Architecture. Experiment Computing Domain (BaBar): Local Application, Application Framework, Experiment Databases, Job Management, Data Management, Metadata Management, Object to File Mapper. Collective: Grid Scheduler, Information & Monitoring, Logical name Space, Consistency (metadata / data), Replica Attributes (latency management / metadata), Replica Optimization. Resource: Storage Element (storage abstraction), Catalog Services (catalog manipulation), Authorisation, Authentication and Auditing, Service Index (URL / command registration), SQL Database Service, Computing Element Services. Grid Fabric and Connectivity: Resource Management, Configuration Management, Monitoring and Fault Tolerance, Node Installation & Management, Fabric Storage Management.]
[Figure: SDSC Storage Resource Broker & Meta-data Catalog. Application access via C, C++ Libraries, Linux I/O, Unix Shell, Java, NT Browsers, DLL / Python, Prolog Predicate, Web Clients. Consistency Management / Authorization-Authentication. Prime Server, Logical Name Space, Latency Management, Data Transport, Metadata Transport. Catalog Abstraction: Databases (DB2, Oracle, Sybase, Postgres). Storage Abstraction: Archives (HPSS, ADSM, UniTree, DMF), HRM, File Systems (Unix, NT, Mac OSX), Databases (DB2, Oracle), Servers.]
5.2 SAM
[Figure: SAM architecture. Client Applications: Web, Command line, D0 Framework C++ codes, Python codes, Java codes. Collective Services: Request Formulator and Planner (“Dataset Editor”), Request Manager (“Project Master”), Cache Manager (“Station Master”), Job Manager (“Station Master”), Storage Manager (“File Storage Server”), SAM Resource Management (“Optimiser”), Job Services (Batch Systems - LSF, FBS, PBS, Condor), Data Mover (“Stager”), Significant Event Logger, Naming Service, Catalog Manager, Database Manager. Connectivity and Resource: CORBA, UDP, Catalog protocols, File transfer protocols - ftp, bbftp, rcp, GridFTP, Mass Storage systems protocols e.g. encp, hpss. Authentication and Security: SAM-specific user, group, node, station registration, GSI, Bbftp ‘cookie’. Fabric: Tape Storage Elements, Disk Storage Elements, Compute Elements, LANs and WANs, Resource and Services Catalog, Replica Catalog, Meta-data Catalog, Code Repository. Legend: marked components will be replaced, enhanced or added using PPDG and Grid tools; a name in “quotes” is the SAM-given software component name.]

5.3 JLAB
[Figure: Data Grid Web Services Architecture. Web Services: Meta Data Catalog, Replica Catalog, Replication Service, File Client, HRM++ Service, File Server(s), HRM Listener, Storage Resource. Single Site.]
5.4 WP2
[Figure: WP2 Replication Services - Overview]

5.5 ATLAS Magda Architecture
[Figure: Magda architecture. Collection of logical files to replicate. Sites contain Locations (Mass Store, Disk, Cache); Host 1 and Host 2 access the sites. Spiders register replicas and make catalog updates. A replication task performs source to cache stagein and source to dest transfer (scp, gsiftp). Synch via DB (MySQL).]
6 Appendix
6.1 Jan 10th Meeting Agenda
GriPhyN/PPDG Data Grid Architecture, Toolkit, and Roadmap v2.09, v2.07s
EDG Work Package 2 Replication Requirements: 6/01; 1/02
Storage Resource Management Interface V1.0

Time | Topic | Speaker or Discussion | Documentation / Presentation / Ongoing Work
9:00am | Welcome | Chip Watson |
9:10am | Introduction | Ruth Pordes | Slides
9:15am | JLAB | Chip Watson | Talk: Web services for replicated file management, A data analysis grid
9:30am | SRB | Reagan Moore | Talk: SRB and the discussion session
9:45am | Globus (Giggle/Grin) | Ann Chervenak | Draft Paper. GGF presentation, Reliable File Transfer
11:00am | GDMP(CMS) | Heinz Stockinger | Talk: GDMP documentation
11:15am | MAGDA(Atlas) | Torre Wenaus | Talk: Magda Documentation
11:30am | SAM(D0) | Vicky White | Talk: SAM home page
11:45am | STAR | Eric Hjort | Talk
12:00pm | Babar | Adil Hasan | SRB in Babar
12:30pm | Lunch | |
1:30pm | Common interfaces to services this layer provides | Andy Hanushevsky | Requirements/Interfaces for: catalog; queueing of replication requests; reliable execution of these requests; replication policy specification. Redirection issue: Paper, Proposal
2:30pm | Status, Errors, Asynchrony | Doug Thain | Talk. Talk (second half of slides above)
3:30pm | Common interfaces to provider services | Reagan Moore | Requirements/Interfaces to the services robust replication consumes -- HRM, file transfer. Comparison of data grid capabilities
4:30pm | Break | |
5:00pm | What has been learned, next steps, goals etc | |
6:00pm | Dinner/End | |

6.2 Jan 10th Participants
Walt Aker – JLAB
Bill Allcock – ANL
Jie Chen – JLAB
Ying Chen – JLAB
Ann Chervenak – ISI
Peter Couvares – Uwisc
Ewa Deelman – ISI
Andy Hanushevsky – SLAC
Bryan Hess – JLAB
Eric Hjort – LBL
Andy Kowalski – JLAB
Miron Livny – Uwisc
Reagan W. Moore – SDSC
Richard Mount (VRVS) – SLAC
Shazhad Muzaffar – Fermilab
Ruth Pordes – Fermilab
Heinz Stockinger – CERN
Doug Thain – Uwisc
Yee-Ting Li (VRVS, ucl, uk)
Chip Watson – JLAB
Torre Wenaus – BNL
Vicky White – DOE
Mike Wilde – ANL
Bing Zhu – SDSC