SlideShare ist ein Scribd-Unternehmen logo
1 von 40
glideinWMS Training




                      Glidein Internals
                         How they work and why

                          by Igor Sfiligoi (UCSD)




glideinWMS Training             Glidein internals   1
Refresher - What is a glidein?
 ●   A glidein is just a properly configured execution
     node submitted as a Grid job
                      Central manager
                                                    glidein
                                                  Execution node
                         Collector
                                                    glidein
                                                  Execution node
                        Negotiator
     Submit node
     Submit node
                                                    glidein
                                                  Execution node
     Submit node
                                                  Execution node
                                                    glidein
       Schedd                                        Startd

                                                          Job



glideinWMS Training           Glidein internals                    2
Refresher – Glidein startup
 ●   glidein_startup configures and starts Condor
                                                                               Configure
                                                                              Condor G.N.

                                    Submit node
  Frontend node                                                       Worker node
                        Monitor     Submit node
     Frontend           Condor                                           glidein
                                  Central manager
                                                                         Startd
                      Match
                                                           CREAM               Job
             Request
             glideins             Factory node

                                    Condor                            glidein
                                                                    Execution node
                                                           Globus
                                    Factory                            glidein
                                                                    Execution node
                                                   Submit
                                                   glideins
glideinWMS Training                        Glidein internals                                3
What does
                      glidein_startup
                             do



glideinWMS Training        Glidein internals   4
glidein_startup tasks
 ●   Download scripts, parameters and Condor bins
 ●   Validate node (environment)
 ●   Configure Condor
 ●   Start Condor daemon(s)
 ●   Collect post-mortem monitoring info
 ●   Cleanup



glideinWMS Training           Glidein internals     5
Work area
 ●   Before writing any files, a work area must
     be chosen
      ●   CWD often not the right choice
 ●   Where to cd into provided as an argument
 ●   A unique tmp dir created there
      ●   So several glideins running on the same node don't
          step on each other toes




glideinWMS Training          Glidein internals                 6
Downloading files
●     Files downloaded via HTTP
       ●   From both the factory and the frontend Web servers
       ●   Can use local Web proxy (e.g. Squid)
       ●   Mechanism tamper proof and cache coherent
            Factory node                                          glidein_startup
               HTTPd                                     ●   Load files
                                                             from factory Web
                                        Squid

                                                         ●   Load files
                                                             from frontend Web
           Frontend node                                 ●   Run executables
                                                         ●   Start Condor      Startd
               HTTPd                                     ●   Cleanup
                              Nothing will work
                              if HTTPd down!

    glideinWMS Training              Glidein internals                                  7
URLs
 ●   All URLs passed to glidein_startup
     as arguments
      ●   Factory Web server
      ●   Frontend Web server
      ●   Squid, if any
 ●   glidein_startup arguments defined
     by the factory
      ●   Frontend Web URL passed to the Factory via
          request ClassAd

glideinWMS Training         Glidein internals          8
Cache coherence
 ●   Files never change,                 ●    There is a (set of)
     once uploaded to the                     logical to physical
     Web area                                 name map files
                                                  ●   *_file_list.id.lst
 ●   If the source file
     changes, a file with a
                                         ●    The names of the list
     different name is                        files are in
     created on the Web                           ●   description.id.cfg
     area                                         ●   The id of this list is
                                                      passed as an
      ●   Essentially                                 argument
                         Frontend ids
          fname.date     passed to the                to glidein_startup
                         factory via ClassAd

glideinWMS Training           Glidein internals                                9
Making the download tamper proof
 ●   Verifying SHA1                    ●    The hash of the
     hashes                                 signature file a        Frontend hash
      ●   Each file has a SHA1              parameter of                 passed to
                                                                       the factory
          associated with it                glidein_startup           via ClassAd
 ●   The hashes delivered                       ●   This is guaranteed to
     in a single file                               be secure with GRAM
      ●   signature.id.sha1
      ●   List of (fname,hash)
                                       ●    Description file hash
          pairs                             in the signature file
 ●   Signature file id in the                   ●   Still safe even if
     description file                               loaded out of order

glideinWMS Training         Glidein internals                               10
Example Web area
                                    Frontend         Factory

                                                                      Globus

                                                              glidein_startup




                                               ...



                                                          .
                                                     ..




glideinWMS Training   Glidein internals                                   11
Node validation
 ●   Run scripts / plugins provided by both
     factory and frontend
      ●   The map file has a flag to distinguish
          scripts from “regular” files
                      # Outfile           InFile                     Exec/other
                       # Outfile            InFile                    Exec/other
                      #########################################################
                       #########################################################
                      constants.cfg       constants.b8ifnW.cfg       regular
                       constants.cfg        constants.b8ifnW.cfg      regular
                      cat_consts.sh       cat_consts.b8ifnW.sh       exec
                       cat_consts.sh        cat_consts.b8ifnW.sh      exec
                      set_home_cms.source set_home_cms.b8ifnW.source wrapper
                       set_home_cms.source set_home_cms.b8ifnW.source wrapper
 ●   If a script returns with exit_code !=0,
     glidein_startup stops execution
      ●   Condor never started → no user jobs ever pulled
      ●   Will sleep for 20 mins → blackhole protection

glideinWMS Training                    Glidein internals                           12
Condor configuration
 ●   Condor config values coming from files
     downloaded by glidein_startup
      ●   Static from both factory and frontend config files
      ●   More details at         http://tinyurl.com/glideinWMS/doc.prd/factory/custom_vars.html

 ●   Scripts can add, alter or delete any attribute
      ●   Based on dynamic information
      ●   Either discovered on the WN
          or by combining info from various sources
      ●   More details at http://tinyurl.com/glideinWMS/doc.prd/factory/custom_scripts.html

glideinWMS Training                     Glidein internals                                     13
Example (pseudo-)script

                 ## check ifCMSSW installed locally
                    check if CMSSW installed locally
                 ifif [-f-f"$CMSSW" ];];then
                    [       "$CMSSW" then
                     ## get the other env
                        get the other env
                     source "$CMSSW"
                       source "$CMSSW"
                 else
                   else
                     ## fail validation
                        fail validation
                     echo "CMSSW not found!n" 1>&2
                       echo "CMSSW not found!n" 1>&2
                     exit 11
                       exit
                 fifi

                 ## publish to Condor
                   publish to Condor
                 add_config_line "CMSSW_LIST" "$CMS_SW_LIST"
                  add_config_line "CMSSW_LIST" "$CMS_SW_LIST"
                 add_condor_vars_line "CMSSW_LIST" "S" "-" "+" "Y" "Y" "-"
                  add_condor_vars_line "CMSSW_LIST" "S" "-" "+" "Y" "Y" "-"
                 ## if Igot here, passed validation
                   if I got here, passed validation
                 exit 00
                  exit

                      More details at http://tinyurl.com/glideinWMS/doc.prd/factory/custom_scripts.html


glideinWMS Training                                  Glidein internals                                    14
Example attributes
 ●   Factory defining where the glidein is running
      ●   GLIDEIN_Country = “US”
 ●   Frontend disabling the monitoring slot
      ●   GLIDEIN_Monitoring_Enabled = “False”




             More details at http://tinyurl.com/glideinWMS/doc.prd/factory/custom_vars.html



glideinWMS Training                             Glidein internals                             15
Condor startup
 ●   After validation and configuration,
     glidein just start condor_master
      ●   Which in turn start condor_startd
 ●   It is just regular Condor from here on
      ●   Any policy the VO needs must be part of
          Condor configuration
                                        Central manager
                      Submit node
                                                          Execution node
                                             Collector
                       Schedd                             glidein_startup
                                            Negotiator

                                                             Startd


glideinWMS Training                 Glidein internals                       16
Post mortem monitoring
 ●   After a glidein terminates, the logs are sent back to
     the Factory
 ●   The logs are mined for job information
     ●   Number jobs ran
     ●   Job exit codes, exist by signal?
     ●   Wasted time
 ●   Condor logs also compressed and copied into stderr
     ●   Only way to get them always to the factory
     ●   Not all Grids guarantee files returning


glideinWMS Training            Glidein internals             17
Cleanup
 ●   glidein_startup will remove all files
     before terminating
      ●   Unless killed by the OS, of course
 ●   Condor does something similar for the user
     jobs disk area after every job
      ●   Users should not expect anything to survive their
          jobs




glideinWMS Training          Glidein internals                18
Glidein lifetime




glideinWMS Training        Glidein internals   19
Glidein lifetime
 ●   Glideins are temporary resources
      ●   Must go away after some time
 ●   We want them to go away by their own will
      ●   So we can monitor progress and clean up
      ●   As opposed to being killed by Grid batch system
 ●   Condor daemons configured to
     die by themselves
      ●   Just need to tell them when


glideinWMS Training          Glidein internals              20
Limiting glidein lifetime
 ●   Hard End-of-Life deadline
      ●   Condor will terminate if it is ever reached
      ●   Any jobs running at that point in time will be killed
            –   Resulting in waste
 ●   Deadline site specific
      ●   Set by the Factory
      ●   Controlled by GLIDEIN_Max_Walltime
 ●   Condor advertises it in the glidein ClassAd
      ●   Attribute GLIDEIN_ToDie

glideinWMS Training                  Glidein internals            21
Deadline for job startup
 ●   Starting jobs just to kill them wasteful
      ●   Glideins set an earlier deadline for job startup
      ●   After the deadline, Condor will terminate at job end
      ●   Advertised as GLIDEIN_TORETIRE
 ●   Need to know how long is the
                                                      Default exists but
     expected job lifetime                            quite arbitrary

      ●   Should be provided by the Frontend
      ●   Parameter GLIDEIN_Job_Max_Time
      ●   ToRetire=Max_Walltime-Job_Max_Time

glideinWMS Training            Glidein internals                     22
Termination due to non-use
 ●   glideins will self-terminate if not used, too
      ●   To limit waste
 ●   Reasons for hitting this
      ●   No more user jobs (spikes)
      ●   Policy problems (Frontend vs Negotiator)
 ●   Typically uniform across the Condor pool
      ●   Can be set by either Frontend or Factory
      ●   Controlled by GLIDEIN_Max_Idle
                                                 To give Negotiator
      ●   Defaults to 20 mins                    the time to match
                                                 the glidein
glideinWMS Training          Glidein internals                        23
Finer grained policies
●    Job_Max_Time is the typical expected job lifetime
      ●   What if not all jobs have the same lifetime?
●    Use Condor matchmaking to define
     finer grained policies
      ●   To prevent long jobs from starting if they are
          expected to be killed before terminating
      ●   This allows (lower priority) shorter jobs to use the CPU
      ●   Or waste just 20 mins (instead of hours)
●    Job lifetime can be difficult to know
      ●   Even users often don't know
      ●   Can refine after restarts (assuming they are deterministic)
    glideinWMS Training               Glidein internals                 24
Example START expression
 ●   Have two different thresholds
      ●   One for first startup, one for all following
      ●   Useful when wide dynamic range, unpredictable

GLIDECLIENT_Start =
 ifthenelse(
    LastVacateTime=?=UNDEFINED,
    NormMaxWallTime < (GLIDEIN_ToDie-MyCurrentTime),
    MaxWallTime < (GLIDEIN_ToDie-MyCurrentTime)
  )


glideinWMS Training            Glidein internals          25
Other sources of waste
 ●   Glideins must stay within the limits of the
     resource lease
      ●   Typical limit is on memory usage
          OSG Factory defines GLIDEIN_MaxMemMBs
 ●   If user jobs exceed that limit, the glideins may
     be killed by the resource provider
     (e.g. Grid batch system)
      ●   Resulting in the user job being killed, too
      ●   Thus waste


glideinWMS Training           Glidein internals         26
Example START expression
 ●   Prevent startup of jobs that are known to use too
     much memory
      ●
          Unless user defines it, known only at 2nd re-run
 ●   Also kill jobs as soon as they pass that
      ●   So we don't have to start a new glidein for other jobs

GLIDECLIENT_Start =
  ImageSize<=(GLIDEIN_MaxMemMBs*1024)

PREEMPT =
  ImageSize>(GLIDEIN_MaxMemMBs*1024)
          More details at http://research.cs.wisc.edu/condor/manual/v7.6/10_Appendix_A.html#82447

glideinWMS Training                           Glidein internals                                     27
Security considerations




glideinWMS Training   Glidein internals   28
Talking to the rest of Condor
 ●    Glideins will talk over            ●    Collector and glidein
      WAN to the rest of                      must whitelist each
      Condor                                  other
       ●   Strong security a must        ●    Schedd and glidein
Central manager
                                              trusted indirectly
     Collector                                    ●   Security token
  Negotiator                                          through Collector
 Submit node                                                 Execution node
                          AN
                                                              glidein_startup
     Schedd
                         W


                                                                 Startd


glideinWMS Training           Glidein internals                                 29
Multi User Pilot Jobs
 ●   A glidein typically starts with a proxy that is not
     representing a specific user
      ●   Very few exceptions (e.g. CMS Organized Reco)
 ●   Three potential security issues:
      ●   Site does not know who the final user is
          (and thus cannot ban specific users, if needed)
      ●   If 2 glideins land on the same node, 2 users may be
          running as the same UID (no system protection)
      ●   User and pilot code run as the same UID
          (no system protection)

glideinWMS Training           Glidein internals             30
glexec
 ●   We have one tool that can help us with all 3
      ●   Namely glexec
 ●   glexec provides UID switching
      ●   But must be installed by the Site on WNs
      ●   Only a subset of sites have done it so far
 ●   glexec requires a valid proxy from
     the final users on the submit node
      ●   Or jobs will not match
      ●   Not a problem for long time Grid users, like CMS,
          but it may be for other VOs

glideinWMS Training           Glidein internals               31
glexec in a picture
                                                          Legend
                         Central manager                           Pilot UID
                            Collector                              User UID

                           Negotiator
     Submit node
     Submit node
                                                      Execution node
     Submit node
                                                     glidein
       Schedd                                           Startd


                                                     glexec         Job




glideinWMS Training              Glidein internals                             32
glexec, cont
 ●   It is in VO best interest to use glexec
      ●   Only way to have a secure MUPJ system
      ●   Should push sites to deploy it
 ●   Some sites require glexec for their own interest
      ●   To cryptographically know who the final users are
      ●   To easily ban users                       May be a legal
                                                    requirement
 ●   Note on site banning
      ●   Condor currently does not handle well glexec
          failures – VO admin must look for this

glideinWMS Training             Glidein internals                    33
Turning on glexec
 ●   Factory provides information about glexec
     installation at site
      ●   Attribute GLEXEC_BIN = NONE | OSG | glite
 ●   Frontend decides if it should be used
      ●   Parameter GLIDEIN_Glexec_Use
      ●   Possible values:
            –   NEVER    - Do not use, even if available
            –   OPTIONAL - Use whenever available
            –   REQUIRED - Only use sites that provide glexec


glideinWMS Training               Glidein internals             34
Standard attributes




glideinWMS Training          Glidein internals   35
Standard attributes - Factory
 ●   Factory provides
      ●   Max_walltime, optionally max_idle
      ●   Glexec location
      ●   Condor binaries (including os arch and version)
      ●   High level network setup (CCB, TCP, etc.)
 ●   Anything not explicitly defined
     has Factory provided defaults
      ●   glideinWMS/creation/web_base/condor_var*


glideinWMS Training          Glidein internals              36
Standard attributes - Frontend
 ●   Frontend provides
      ●   Collector location
      ●   START expression
 ●   Frontend should also provide
      ●   job_max_time
      ●   Should glexec be used




glideinWMS Training            Glidein internals   37
THE END




glideinWMS Training    Glidein internals   38
Pointers
 ●   The official project Web page is
     http://tinyurl.com/glideinWMS
 ●   glideinWMS development team is reachable at
     glideinwms-support@fnal.gov




glideinWMS Training     Glidein internals          39
Acknowledgments
 ●   The glideinWMS is a CMS-led project
     developed mostly at FNAL, with contributions
     from UCSD and ISI
 ●   The glideinWMS factory operations at UCSD is
     sponsored by OSG
 ●   The funding comes from NSF, DOE and the
     UC system




glideinWMS Training        Glidein internals        40

Weitere ähnliche Inhalte

Was ist angesagt?

JSR-299 (CDI), Weld & the Future of Seam (JavaOne 2010)
JSR-299 (CDI), Weld & the Future of Seam (JavaOne 2010)JSR-299 (CDI), Weld & the Future of Seam (JavaOne 2010)
JSR-299 (CDI), Weld & the Future of Seam (JavaOne 2010)Dan Allen
 
CDI and Weld
CDI and WeldCDI and Weld
CDI and Weldjensaug
 
SlapOS Presentation at VW2011 Seoul
SlapOS Presentation at VW2011 SeoulSlapOS Presentation at VW2011 Seoul
SlapOS Presentation at VW2011 SeoulJean-Paul Smets
 
Measure and Manage Flow v2
Measure and Manage Flow v2Measure and Manage Flow v2
Measure and Manage Flow v2Zsolt Fabok
 
Drupal & Continous Integration - SF State Study Case
Drupal & Continous Integration - SF State Study CaseDrupal & Continous Integration - SF State Study Case
Drupal & Continous Integration - SF State Study CaseEmanuele Quinto
 
Node.js #digpen presentation
Node.js #digpen presentationNode.js #digpen presentation
Node.js #digpen presentationGOSS Interactive
 

Was ist angesagt? (7)

JSR-299 (CDI), Weld & the Future of Seam (JavaOne 2010)
JSR-299 (CDI), Weld & the Future of Seam (JavaOne 2010)JSR-299 (CDI), Weld & the Future of Seam (JavaOne 2010)
JSR-299 (CDI), Weld & the Future of Seam (JavaOne 2010)
 
03 - Continuous Integration
03 - Continuous Integration03 - Continuous Integration
03 - Continuous Integration
 
CDI and Weld
CDI and WeldCDI and Weld
CDI and Weld
 
SlapOS Presentation at VW2011 Seoul
SlapOS Presentation at VW2011 SeoulSlapOS Presentation at VW2011 Seoul
SlapOS Presentation at VW2011 Seoul
 
Measure and Manage Flow v2
Measure and Manage Flow v2Measure and Manage Flow v2
Measure and Manage Flow v2
 
Drupal & Continous Integration - SF State Study Case
Drupal & Continous Integration - SF State Study CaseDrupal & Continous Integration - SF State Study Case
Drupal & Continous Integration - SF State Study Case
 
Node.js #digpen presentation
Node.js #digpen presentationNode.js #digpen presentation
Node.js #digpen presentation
 

Andere mochten auch

Final pp t
Final pp tFinal pp t
Final pp tSandesh
 
An argument for moving the requirements out of user hands - The CMS Experience
An argument for moving the requirements out of user hands - The CMS ExperienceAn argument for moving the requirements out of user hands - The CMS Experience
An argument for moving the requirements out of user hands - The CMS ExperienceIgor Sfiligoi
 
ภูกระดึง จ
ภูกระดึง จภูกระดึง จ
ภูกระดึง จsensetthabut
 
ภูกระดึง จ
ภูกระดึง จภูกระดึง จ
ภูกระดึง จsensetthabut
 
Known HTCondor break points
Known HTCondor break pointsKnown HTCondor break points
Known HTCondor break pointsIgor Sfiligoi
 
glideinWMS, The OSG overlay DHTC system - OSG School 2014
glideinWMS, The OSG overlay DHTC system - OSG School 2014glideinWMS, The OSG overlay DHTC system - OSG School 2014
glideinWMS, The OSG overlay DHTC system - OSG School 2014Igor Sfiligoi
 
Introduction to security in the Open Science Grid - OSG School 2014
Introduction to security in the Open Science Grid - OSG School 2014Introduction to security in the Open Science Grid - OSG School 2014
Introduction to security in the Open Science Grid - OSG School 2014Igor Sfiligoi
 
Theresa regli bw-3
Theresa regli bw-3Theresa regli bw-3
Theresa regli bw-3R Aunpad
 
glideinWMS Training 2014 - HTCondor Internals
glideinWMS Training 2014 - HTCondor InternalsglideinWMS Training 2014 - HTCondor Internals
glideinWMS Training 2014 - HTCondor InternalsIgor Sfiligoi
 
simple past past continous
simple past past continoussimple past past continous
simple past past continousKkarOl Glez
 
쇼에듀 회사 소개서
쇼에듀 회사 소개서쇼에듀 회사 소개서
쇼에듀 회사 소개서Show Edu
 

Andere mochten auch (15)

Rph t3
Rph t3 Rph t3
Rph t3
 
Final pp t
Final pp tFinal pp t
Final pp t
 
An argument for moving the requirements out of user hands - The CMS Experience
An argument for moving the requirements out of user hands - The CMS ExperienceAn argument for moving the requirements out of user hands - The CMS Experience
An argument for moving the requirements out of user hands - The CMS Experience
 
Rph t1
Rph t1Rph t1
Rph t1
 
ภูกระดึง จ
ภูกระดึง จภูกระดึง จ
ภูกระดึง จ
 
ภูกระดึง จ
ภูกระดึง จภูกระดึง จ
ภูกระดึง จ
 
Known HTCondor break points
Known HTCondor break pointsKnown HTCondor break points
Known HTCondor break points
 
glideinWMS, The OSG overlay DHTC system - OSG School 2014
glideinWMS, The OSG overlay DHTC system - OSG School 2014glideinWMS, The OSG overlay DHTC system - OSG School 2014
glideinWMS, The OSG overlay DHTC system - OSG School 2014
 
Introduction to security in the Open Science Grid - OSG School 2014
Introduction to security in the Open Science Grid - OSG School 2014Introduction to security in the Open Science Grid - OSG School 2014
Introduction to security in the Open Science Grid - OSG School 2014
 
Theresa regli bw-3
Theresa regli bw-3Theresa regli bw-3
Theresa regli bw-3
 
glideinWMS Training 2014 - HTCondor Internals
glideinWMS Training 2014 - HTCondor InternalsglideinWMS Training 2014 - HTCondor Internals
glideinWMS Training 2014 - HTCondor Internals
 
Rph t2
Rph t2Rph t2
Rph t2
 
simple past past continous
simple past past continoussimple past past continous
simple past past continous
 
쇼에듀 회사 소개서
쇼에듀 회사 소개서쇼에듀 회사 소개서
쇼에듀 회사 소개서
 
Ciri-Ciri Anak Sholeh
Ciri-Ciri Anak SholehCiri-Ciri Anak Sholeh
Ciri-Ciri Anak Sholeh
 

Ähnlich wie Glidein internals

Introduction to glideinWMS
Introduction to glideinWMSIntroduction to glideinWMS
Introduction to glideinWMSIgor Sfiligoi
 
Monitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor poolMonitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor poolIgor Sfiligoi
 
Matchmaking in glideinWMS in CMS
Matchmaking in glideinWMS in CMSMatchmaking in glideinWMS in CMS
Matchmaking in glideinWMS in CMSIgor Sfiligoi
 
Solving Grid problems through glidein monitoring
Solving Grid problems through glidein monitoringSolving Grid problems through glidein monitoring
Solving Grid problems through glidein monitoringIgor Sfiligoi
 
OpenTuesday: Die Selenium-Toolfamilie und ihr Einsatz im Web- und Mobile-Auto...
OpenTuesday: Die Selenium-Toolfamilie und ihr Einsatz im Web- und Mobile-Auto...OpenTuesday: Die Selenium-Toolfamilie und ihr Einsatz im Web- und Mobile-Auto...
OpenTuesday: Die Selenium-Toolfamilie und ihr Einsatz im Web- und Mobile-Auto...Digicomp Academy AG
 
Condor from the user point of view - glideinWMS Training Jan 2012
Condor from the user point of view - glideinWMS Training Jan 2012Condor from the user point of view - glideinWMS Training Jan 2012
Condor from the user point of view - glideinWMS Training Jan 2012Igor Sfiligoi
 
Pluggable web app using Angular (Odessa JS conf)
Pluggable web app using Angular (Odessa JS conf)Pluggable web app using Angular (Odessa JS conf)
Pluggable web app using Angular (Odessa JS conf)Saif Jerbi
 
Improving Engineering Processes using Hudson - Spark IT 2010
Improving Engineering Processes using Hudson - Spark IT 2010Improving Engineering Processes using Hudson - Spark IT 2010
Improving Engineering Processes using Hudson - Spark IT 2010Arun Gupta
 
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Android app to the challenge
Android   app to the challengeAndroid   app to the challenge
Android app to the challengeUdi Cohen
 
Towards Continuous Deployment with Django
Towards Continuous Deployment with DjangoTowards Continuous Deployment with Django
Towards Continuous Deployment with DjangoRoger Barnes
 
Genymotion with Jenkins
Genymotion with JenkinsGenymotion with Jenkins
Genymotion with JenkinsVishal Nayak
 
Dev + DevOps для PHP розробника
Dev + DevOps для PHP розробникаDev + DevOps для PHP розробника
Dev + DevOps для PHP розробникаphpfriendsclub
 
Hands on Virtualization with Ganeti (part 1) - LinuxCon 2012
Hands on Virtualization with Ganeti (part 1)  - LinuxCon 2012Hands on Virtualization with Ganeti (part 1)  - LinuxCon 2012
Hands on Virtualization with Ganeti (part 1) - LinuxCon 2012Lance Albertson
 
Grunt.js and Yeoman, Continous Integration
Grunt.js and Yeoman, Continous IntegrationGrunt.js and Yeoman, Continous Integration
Grunt.js and Yeoman, Continous IntegrationDavid Amend
 

Ähnlich wie Glidein internals (20)

Pilot Factory
Pilot FactoryPilot Factory
Pilot Factory
 
Introduction to glideinWMS
Introduction to glideinWMSIntroduction to glideinWMS
Introduction to glideinWMS
 
Monitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor poolMonitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor pool
 
Matchmaking in glideinWMS in CMS
Matchmaking in glideinWMS in CMSMatchmaking in glideinWMS in CMS
Matchmaking in glideinWMS in CMS
 
Solving Grid problems through glidein monitoring
Solving Grid problems through glidein monitoringSolving Grid problems through glidein monitoring
Solving Grid problems through glidein monitoring
 
OpenTuesday: Die Selenium-Toolfamilie und ihr Einsatz im Web- und Mobile-Auto...
OpenTuesday: Die Selenium-Toolfamilie und ihr Einsatz im Web- und Mobile-Auto...OpenTuesday: Die Selenium-Toolfamilie und ihr Einsatz im Web- und Mobile-Auto...
OpenTuesday: Die Selenium-Toolfamilie und ihr Einsatz im Web- und Mobile-Auto...
 
GlassFish v3 Prelude Aquarium Paris
GlassFish v3 Prelude Aquarium ParisGlassFish v3 Prelude Aquarium Paris
GlassFish v3 Prelude Aquarium Paris
 
Condor from the user point of view - glideinWMS Training Jan 2012
Condor from the user point of view - glideinWMS Training Jan 2012Condor from the user point of view - glideinWMS Training Jan 2012
Condor from the user point of view - glideinWMS Training Jan 2012
 
Pluggable web app using Angular (Odessa JS conf)
Pluggable web app using Angular (Odessa JS conf)Pluggable web app using Angular (Odessa JS conf)
Pluggable web app using Angular (Odessa JS conf)
 
Griffon demo
Griffon demoGriffon demo
Griffon demo
 
Improving Engineering Processes using Hudson - Spark IT 2010
Improving Engineering Processes using Hudson - Spark IT 2010Improving Engineering Processes using Hudson - Spark IT 2010
Improving Engineering Processes using Hudson - Spark IT 2010
 
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
 
Android app to the challenge
Android   app to the challengeAndroid   app to the challenge
Android app to the challenge
 
Towards Continuous Deployment with Django
Towards Continuous Deployment with DjangoTowards Continuous Deployment with Django
Towards Continuous Deployment with Django
 
SWT Tech Sharing: Node.js + Redis
SWT Tech Sharing: Node.js + RedisSWT Tech Sharing: Node.js + Redis
SWT Tech Sharing: Node.js + Redis
 
Genymotion with Jenkins
Genymotion with JenkinsGenymotion with Jenkins
Genymotion with Jenkins
 
Dev + DevOps для PHP розробника
Dev + DevOps для PHP розробникаDev + DevOps для PHP розробника
Dev + DevOps для PHP розробника
 
Amazon SWF and Gordon
Amazon SWF and GordonAmazon SWF and Gordon
Amazon SWF and Gordon
 
Hands on Virtualization with Ganeti (part 1) - LinuxCon 2012
Hands on Virtualization with Ganeti (part 1)  - LinuxCon 2012Hands on Virtualization with Ganeti (part 1)  - LinuxCon 2012
Hands on Virtualization with Ganeti (part 1) - LinuxCon 2012
 
Grunt.js and Yeoman, Continous Integration
Grunt.js and Yeoman, Continous IntegrationGrunt.js and Yeoman, Continous Integration
Grunt.js and Yeoman, Continous Integration
 

Mehr von Igor Sfiligoi

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROIgor Sfiligoi
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...Igor Sfiligoi
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Igor Sfiligoi
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingIgor Sfiligoi
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesIgor Sfiligoi
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateIgor Sfiligoi
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsIgor Sfiligoi
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeIgor Sfiligoi
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Igor Sfiligoi
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessIgor Sfiligoi
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputIgor Sfiligoi
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsIgor Sfiligoi
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROIgor Sfiligoi
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstIgor Sfiligoi
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyIgor Sfiligoi
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCIgor Sfiligoi
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Igor Sfiligoi
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsIgor Sfiligoi
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsIgor Sfiligoi
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksIgor Sfiligoi
 

Mehr von Igor Sfiligoi (20)

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYRO
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resources
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rate
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance compute
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific Output
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYRO
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with Admiralty
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUs
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public Clouds
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud links
 

Kürzlich hochgeladen

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Kürzlich hochgeladen (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Glidein internals

  • 1. glideinWMS Training Glidein Internals How they work and why by Igor Sfiligoi (UCSD) glideinWMS Training Glidein internals 1
  • 2. Refresher - What is a glidein? ● A glidein is just a properly configured execution node submitted as a Grid job Central manager glidein Execution node Collector glidein Execution node Negotiator Submit node Submit node glidein Execution node Submit node Execution node glidein Schedd Startd Job glideinWMS Training Glidein internals 2
  • 3. Refresher – Glidein startup ● glidein_startup configures and starts Condor Configure Condor G.N. Submit node Frontend node Worker node Monitor Submit node Frontend Condor glidein Central manager Startd Match CREAM Job Request glideins Factory node Condor glidein Execution node Globus Factory glidein Execution node Submit glideins glideinWMS Training Glidein internals 3
  • 4. What does glidein_startup do glideinWMS Training Glidein internals 4
  • 5. glidein_startup tasks ● Download scripts, parameters and Condor bins ● Validate node (environment) ● Configure Condor ● Start Condor daemon(s) ● Collect post-mortem monitoring info ● Cleanup glideinWMS Training Glidein internals 5
  • 6. Work area ● Before writing any files, a work area must be chosen ● CWD often not the right choice ● Where to cd into provided as an argument ● A unique tmp dir created there ● So several glideins running on the same node don't step on each other toes glideinWMS Training Glidein internals 6
  • 7. Downloading files ● Files downloaded via HTTP ● From both the factory and the frontend Web servers ● Can use local Web proxy (e.g. Squid) ● Mechanism tamper proof and cache coherent Factory node glidein_startup HTTPd ● Load files from factory Web Squid ● Load files from frontend Web Frontend node ● Run executables ● Start Condor Startd HTTPd ● Cleanup Nothing will work if HTTPd down! glideinWMS Training Glidein internals 7
  • 8. URLs ● All URLs passed to glidein_startup as arguments ● Factory Web server ● Frontend Web server ● Squid, if any ● glidein_startup arguments defined by the factory ● Frontend Web URL passed to the Factory via request ClassAd glideinWMS Training Glidein internals 8
  • 9. Cache coherence ● Files never change, ● There is a (set of) once uploaded to the logical to physical Web area name map files ● *_file_list.id.lst ● If the source file changes, a file with a ● The names of the list different name is files are in created on the Web ● description.id.cfg area ● The id of this list is passed as an ● Essentially argument Frontend ids fname.date passed to the to glidein_startup factory via ClassAd glideinWMS Training Glidein internals 9
  • 10. Making the download tamper proof ● Verifying SHA1 ● The hash of the hashes signature file a Frontend hash ● Each file has a SHA1 parameter of passed to the factory associated with it glidein_startup via ClassAd ● The hashes delivered ● This is guaranteed to in a single file be secure with GRAM ● signature.id.sha1 ● List of (fname,hash) ● Description file hash pairs in the signature file ● Signature file id in the ● Still safe even if description file loaded out of order glideinWMS Training Glidein internals 10
  • 11. Example Web area Frontend Factory Globus glidein_startup ... . .. glideinWMS Training Glidein internals 11
  • 12. Node validation ● Run scripts / plugins provided by both factory and frontend ● The map file has a flag to distinguish scripts from “regular” files # Outfile InFile Exec/other # Outfile InFile Exec/other ######################################################### ######################################################### constants.cfg constants.b8ifnW.cfg regular constants.cfg constants.b8ifnW.cfg regular cat_consts.sh cat_consts.b8ifnW.sh exec cat_consts.sh cat_consts.b8ifnW.sh exec set_home_cms.source set_home_cms.b8ifnW.source wrapper set_home_cms.source set_home_cms.b8ifnW.source wrapper ● If a script returns with exit_code !=0, glidein_startup stops execution ● Condor never started → no user jobs ever pulled ● Will sleep for 20 mins → blackhole protection glideinWMS Training Glidein internals 12
  • 13. Condor configuration ● Condor config values coming from files downloaded by glidein_startup ● Static from both factory and frontend config files ● More details at http://tinyurl.com/glideinWMS/doc.prd/factory/custom_vars.html ● Scripts can add, alter or delete any attribute ● Based on dynamic information ● Either discovered on the WN or by combining info from various sources ● More details at http://tinyurl.com/glideinWMS/doc.prd/factory/custom_scripts.html glideinWMS Training Glidein internals 13
  • 14. Example (pseudo-)script ## check ifCMSSW installed locally check if CMSSW installed locally ifif [-f-f"$CMSSW" ];];then [ "$CMSSW" then ## get the other env get the other env source "$CMSSW" source "$CMSSW" else else ## fail validation fail validation echo "CMSSW not found!n" 1>&2 echo "CMSSW not found!n" 1>&2 exit 11 exit fifi ## publish to Condor publish to Condor add_config_line "CMSSW_LIST" "$CMS_SW_LIST" add_config_line "CMSSW_LIST" "$CMS_SW_LIST" add_condor_vars_line "CMSSW_LIST" "S" "-" "+" "Y" "Y" "-" add_condor_vars_line "CMSSW_LIST" "S" "-" "+" "Y" "Y" "-" ## if Igot here, passed validation if I got here, passed validation exit 00 exit More details at http://tinyurl.com/glideinWMS/doc.prd/factory/custom_scripts.html glideinWMS Training Glidein internals 14
  • 15. Example attributes ● Factory defining where the glidein is running ● GLIDEIN_Country = “US” ● Frontend disabling the monitoring slot ● GLIDEIN_Monitoring_Enabled = “False” More details at http://tinyurl.com/glideinWMS/doc.prd/factory/custom_vars.html glideinWMS Training Glidein internals 15
  • 16. Condor startup ● After validation and configuration, glidein just start condor_master ● Which in turn start condor_startd ● It is just regular Condor from here on ● Any policy the VO needs must be part of Condor configuration Central manager Submit node Execution node Collector Schedd glidein_startup Negotiator Startd glideinWMS Training Glidein internals 16
  • 17. Post mortem monitoring ● After a glidein terminates, the logs are sent back to the Factory ● The logs are mined for job information ● Number jobs ran ● Job exit codes, exist by signal? ● Wasted time ● Condor logs also compressed and copied into stderr ● Only way to get them always to the factory ● Not all Grids guarantee files returning glideinWMS Training Glidein internals 17
  • 18. Cleanup ● glidein_startup will remove all files before terminating ● Unless killed by the OS, of course ● Condor does something similar for the user jobs disk area after every job ● Users should not expect anything to survive their jobs glideinWMS Training Glidein internals 18
  • 19. Glidein lifetime glideinWMS Training Glidein internals 19
  • 20. Glidein lifetime ● Glideins are temporary resources ● Must go away after some time ● We want them to go away by their own will ● So we can monitor progress and clean up ● As opposed to being killed by Grid batch system ● Condor daemons configured to die by themselves ● Just need to tell them when glideinWMS Training Glidein internals 20
  • 21. Limiting glidein lifetime ● Hard End-of-Life deadline ● Condor will terminate if it is ever reached ● Any jobs running at that point in time will be killed – Resulting in waste ● Deadline site specific ● Set by the Factory ● Controlled by GLIDEIN_Max_Walltime ● Condor advertises it in the glidein ClassAd ● Attribute GLIDEIN_ToDie glideinWMS Training Glidein internals 21
  • 22. Deadline for job startup ● Starting jobs just to kill them wasteful ● Glideins set an earlier deadline for job startup ● After the deadline, Condor will terminate at job end ● Advertised as GLIDEIN_TORETIRE ● Need to know how long is the Default exists but expected job lifetime quite arbitrary ● Should be provided by the Frontend ● Parameter GLIDEIN_Job_Max_Time ● ToRetire=Max_Walltime-Job_Max_Time glideinWMS Training Glidein internals 22
  • 23. Termination due to non-use ● glideins will self-terminate if not used, too ● To limit waste ● Reasons for hitting this ● No more user jobs (spikes) ● Policy problems (Frontend vs Negotiator) ● Typically uniform across the Condor pool ● Can be set by either Frontend or Factory ● Controlled by GLIDEIN_Max_Idle To give Negotiator ● Defaults to 20 mins the time to match the glidein glideinWMS Training Glidein internals 23
  • 24. Finer grained policies ● Job_Max_Time is the typical expected job lifetime ● What if not all jobs have the same lifetime? ● Use Condor matchmaking to define finer grained policies ● To prevent long jobs from starting if they are expected to be killed before terminating ● This allows (lower priority) shorter jobs to use the CPU ● Or waste just 20 mins (instead of hours) ● Job lifetime can be difficult to know ● Even users often don't know ● Can refine after restarts (assuming they are deterministic) glideinWMS Training Glidein internals 24
  • 25. Example START expression ● Have two different thresholds ● One for first startup, one for all following ● Useful when wide dynamic range, unpredictable GLIDECLIENT_Start = ifthenelse( LastVacateTime=?=UNDEFINED, NormMaxWallTime < (GLIDEIN_ToDie-MyCurrentTime), MaxWallTime < (GLIDEIN_ToDie-MyCurrentTime) ) glideinWMS Training Glidein internals 25
  • 26. Other sources of waste ● Glideins must stay within the limits of the resource lease ● Typical limit is on memory usage OSG Factory defines GLIDEIN_MaxMemMBs ● If user jobs exceed that limit, the glideins may be killed by the resource provider (e.g. Grid batch system) ● Resulting in the user job being killed, too ● Thus waste glideinWMS Training Glidein internals 26
  • 27. Example START expression ● Prevent startup of jobs that are known to use too much memory ● Unless user defines it, known only at 2nd re-run ● Also kill jobs as soon as they pass that ● So we don't have to start a new glidein for other jobs GLIDECLIENT_Start = ImageSize<=(GLIDEIN_MaxMemMBs*1024) PREEMPT = ImageSize>(GLIDEIN_MaxMemMBs*1024) More details at http://research.cs.wisc.edu/condor/manual/v7.6/10_Appendix_A.html#82447 glideinWMS Training Glidein internals 27
  • 29. Talking to the rest of Condor ● Glideins will talk over ● Collector and glidein WAN to the rest of must whitelist each Condor other ● Strong security a must ● Schedd and glidein Central manager trusted indirectly Collector ● Security token Negotiator through Collector Submit node Execution node AN glidein_startup Schedd W Startd glideinWMS Training Glidein internals 29
  • 30. Multi User Pilot Jobs ● A glidein typically starts with a proxy that is not representing a specific user ● Very few exceptions (e.g. CMS Organized Reco) ● Three potential security issues: ● Site does not know who the final user is (and thus cannot ban specific users, if needed) ● If 2 glideins land on the same node, 2 users may be running as the same UID (no system protection) ● User and pilot code run as the same UID (no system protection) glideinWMS Training Glidein internals 30
  • 31. glexec ● We have one tool that can help us with all 3 ● Namely glexec ● glexec provides UID switching ● But must be installed by the Site on WNs ● Only a subset of sites have done it so far ● glexec requires a valid proxy from the final users on the submit node ● Or jobs will not match ● Not a problem for long time Grid users, like CMS, but it may be for other VOs glideinWMS Training Glidein internals 31
  • 32. glexec in a picture Legend Central manager Pilot UID Collector User UID Negotiator Submit node Submit node Execution node Submit node glidein Schedd Startd glexec Job glideinWMS Training Glidein internals 32
  • 33. glexec, cont ● It is in VO best interest to use glexec ● Only way to have a secure MUPJ system ● Should push sites to deploy it ● Some sites require glexec for their own interest ● To cryptographically know who the final users are ● To easily ban users May be a legal requirement ● Note on site banning ● Condor currently does not handle well glexec failures – VO admin must look for this glideinWMS Training Glidein internals 33
  • 34. Turning on glexec ● Factory provides information about glexec installation at site ● Attribute GLEXEC_BIN = NONE | OSG | glite ● Frontend decides if it should be used ● Parameter GLIDEIN_Glexec_Use ● Possible values: – NEVER - Do not use, even if available – OPTIONAL - Use whenever available – REQUIRED - Only use sites that provide glexec glideinWMS Training Glidein internals 34
  • 36. Standard attributes - Factory ● Factory provides ● Max_walltime, optionally max_idle ● Glexec location ● Condor binaries (including os arch and version) ● High level network setup (CCB, TCP, etc.) ● Anything not explicitly defined has Factory provided defaults ● glideinWMS/creation/web_base/condor_var* glideinWMS Training Glidein internals 36
  • 37. Standard attributes - Frontend ● Frontend provides ● Collector location ● START expression ● Frontend should also provide ● job_max_time ● Should glexec be used glideinWMS Training Glidein internals 37
  • 38. THE END glideinWMS Training Glidein internals 38
  • 39. Pointers ● The official project Web page is http://tinyurl.com/glideinWMS ● glideinWMS development team is reachable at glideinwms-support@fnal.gov glideinWMS Training Glidein internals 39
  • 40. Acknowledgments ● The glideinWMS is a CMS-led project developed mostly at FNAL, with contributions from UCSD and ISI ● The glideinWMS factory operations at UCSD is sponsored by OSG ● The funding comes from NSF, DOE and the UC system glideinWMS Training Glidein internals 40