Distributed Resource Management Application API
(DRMAA) Version 2
Dr. Peter Tröger

Hasso-Plattner-Institute, University of Potsdam

peter@troeger.eu

DRMAA-WG Co-Chair
http://www.drmaa.org/
DRMAAv2 | FZ Jülich PT 2012
Open Grid Forum (OGF)
Application Area
[Figure: software stack. Each DRM system (Grid Engine, LoadLeveler, ...) and the meta scheduler (SAGA API + backends) offers a DRMAA API, other OGF standards, and a proprietary API. End user applications / portals use these either directly or through SAGA API / OGSA / OCCI. Diagram labels: Features, Portability.]
DRMAA
• DRMAA group established in 2002

• Goal: Standardized API for distributed resource management systems (DRMS) 

• Low-level portability for cluster and grid infrastructure applications

• Simple design, bottom-up philosophy

• Different DRMAA 1.0 documents

• June 2004 - DRMAA 1.0 Proposed Recommendation (GFD.22)

• April 2008 - Shift to IDL based root specification, some clarifications (GFD.130)

• June 2008 - DRMAA 1.0 Grid Recommendation (GFD.133)

• Official language binding documents for C, Java, Python (GFD.143)

• Experience reports, tutorials, unofficial language bindings for Perl, Ruby and C#
Stakeholders
• Distributed resource management (DRM) system

• Distributes computational tasks on execution resources

• Central scheduling entity

• DRMAA implementation / library: Implementation of a language binding, semantics as in the root spec

• Submission host: Resource that runs the DRMAA-based application

• Execution host: Resource that can run a submitted computational task

• Job: One or more operating system processes on an execution host
DRMAA v1 Success Story
• Product-quality implementations of DRMAA C and Java, broad industrial up-take

• Grid Engine since 2006

• Condor since 2006

• PBS/Torque, LSF, LoadLeveler, Kerrighed, SLURM, Globus (via GridWay), ... (third-party implementations)

• Large and silent user community

• Meta schedulers accessing DRM resources (Galaxy, QosCosGrid, MOAB, eXludus, OpenDSP, EGEE, Unicore, SAGA, ...)

• Applications or portals accessing DRM resources (BLAST, workflow processing, finance sector, seismic data, Mathematica, ...)

• Hundreds (?) of unknown Grid Engine users
DRMAA v1 C Example
#include "drmaa.h"

int main(int argc, char **argv) {
    char error[DRMAA_ERROR_STRING_BUFFER];
    char jobid[DRMAA_JOBNAME_BUFFER];
    const char *args[2] = {"5", NULL};
    drmaa_job_template_t *jt = NULL;
    int errnum = 0;

    /* Open the DRMAA session (NULL selects the default contact). */
    errnum = drmaa_init(NULL, error, DRMAA_ERROR_STRING_BUFFER);
    if (errnum != DRMAA_ERRNO_SUCCESS) return 1;

    /* Describe the job: run "sleeper.sh 5". */
    errnum = drmaa_allocate_job_template(&jt, error, DRMAA_ERROR_STRING_BUFFER);
    if (errnum != DRMAA_ERRNO_SUCCESS) return 1;
    drmaa_set_attribute(jt, DRMAA_REMOTE_COMMAND, "sleeper.sh",
                        error, DRMAA_ERROR_STRING_BUFFER);
    drmaa_set_vector_attribute(jt, DRMAA_V_ARGV, args,
                               error, DRMAA_ERROR_STRING_BUFFER);

    /* Submit the job; its identifier is written to jobid. */
    errnum = drmaa_run_job(jobid, DRMAA_JOBNAME_BUFFER, jt,
                           error, DRMAA_ERROR_STRING_BUFFER);
    if (errnum != DRMAA_ERRNO_SUCCESS) return 1;

    /* Release the template and close the session. */
    errnum = drmaa_delete_job_template(jt, error, DRMAA_ERROR_STRING_BUFFER);
    errnum = drmaa_exit(error, DRMAA_ERROR_STRING_BUFFER);
    return 0;
}
DRMAA v1 Java Example (1/2)
// Obtain the session from the factory (declaration not shown on the original slide)
Session session = SessionFactory.getFactory ().getSession ();
try {
    session.init ("");
    System.out.println ("Version: " + session.getDrmaaImplementation ());

    // Describe the job
    JobTemplate jt = session.createJobTemplate ();
    jt.setRemoteCommand ("<SGE_ROOT>/examples/javaone/sleeper.sh");
    jt.setArgs (new String[]{"30"});
    jt.setWorkingDirectory ("<SGE_ROOT>/<SGE_CELL>/javaone");
    jt.setJobCategory ("sleeper");

    // Submit one single job and one bulk job with indices 1 to 4
    String jobId = session.runJob (jt);
    List jobIds = session.runBulkJobs (jt, 1, 4, 1);
    System.out.println ("Job " + jobId + " is running");
    for (Object id : jobIds) {
        System.out.println ("Job " + id + " is running");
    }
    session.deleteJobTemplate (jt);
DRMAA v1 Java Example (2/2)
    JobInfo info = session.wait (jobId, Session.TIMEOUT_WAIT_FOREVER);

    // Print the resource usage reported for the finished job
    Map usage = info.getResourceUsage ();
    for (Object name : usage.keySet ()) {
        System.out.println (name + "=" + usage.get (name));
    }

    // Report how the job ended
    if (info.hasExited ()) {
        System.out.println ("Job exited: " + info.getExitStatus ());
    } else if (info.hasSignaled ()) {
        System.out.println ("Job signaled: " + info.getTerminatingSignal ());
        if (info.hasCoreDump ()) {
            System.out.println ("A core dump is available.");
        }
    } else if (info.wasAborted ()) {
        System.out.println ("Job never ran.");
    } else {
        System.out.println ("Exit status is unknown.");
    }

    // Wait for all bulk jobs and dispose of their job information
    session.synchronize (jobIds, Session.TIMEOUT_WAIT_FOREVER, true);
}
catch (DrmaaException e) {
    e.printStackTrace ();
    System.exit (1);
}
}
DRMAA v1 Got Old
• DRM systems got better

• Concept of resources, session persistency, advance reservation, parallel jobs, many-core systems, queues, state models, WS-* stuff, ...

• Some obsolete / never implemented features

• Date / time handling, host-to-host file staging, specialized job states, ...

• Awkward design decisions

• Job synchronization, job monitoring, data reaping, native specification, ...

• DRMAAv2 work started in 2009 

• Public survey, Sun customer feedback, implementation experiences, ...

• Close collaboration with SAGA, GAT, and OCCI WG; considered JSDL & BES

• Thorough investigation of current DRM systems
DRMAA v2 - Status Today
• DRMAAv2 root specification is an official OGF standard since January 2012

• http://www.ogf.org/documents/GFD.194.pdf

• Final draft of DRMAAv2 C binding announced yesterday

• No more major changes, header file available

• OGF public comment period will start in May 2012

• Official specification in late summer 2012

• New project for fork-based reference implementation (drmaav2-mock)

• Language bindings for C++, Python, Java and OCCI under way
DRMAA v2 Implementations ?
• Implementations announced

• Fork-based reference implementation (DRMAA Working Group)

• Univa Grid Engine (Daniel Gruber)

• Torque / PBS Pro (Mariusz Mamonski)

• GridWay (Eduardo Huedo)

• Condor (Todd Tannenbaum)

• Collection type handling in C should be re-usable

• Interfacing the DRM system is already part of the DRMAAv1 library

• Old libraries can be extended with new features (job arrays, session persistency, reservation, monitoring)
DRMAA v2 Design Approach
• All behavioral aspects in the IDL-based root specification 

• Same model as W3C DOM, OGF SAGA, ...

• What functions are offered ? How are they grouped ?

• What are possible error conditions ?

• For language binding designers and implementors

• Language binding just defines syntactical mapping

• "DRMAA IDL construct A maps to programming language construct B"

• Root spec mandates some design decisions in the language binding (e.g. "UNSET")

• End users should get dedicated man pages
[Figure: the DRMAA root spec at the base; language bindings defined on top of it (DRMAA C binding, DRMAA OCCI binding); DRM-specific implementations of the bindings (Grid Engine, PBS, LSF, GridWay, ...).]
DRMAA v2 Layout
Interfaces: SessionManager, MonitoringSession, JobSession, ReservationSession, Job, JobArray, Reservation, DrmaaCallback, DrmaaReflective
Structs: SlotInfo, MachineInfo, QueueInfo, DrmaaNotification, ReservationTemplate, JobTemplate, ReservationInfo, JobInfo
Plus: Exceptions, Enumerations
• Establish session with the DRM system
• Work with jobs / advance reservation / monitoring features of the DRM system
Session Manager
• Create multiple connections to one (or more) DRM system(s) at a time

• JobSession / ReservationSession
• Persistent storage of session state (at least) on submission machine

• Can be reloaded by name, explicit reaping

• MonitoringSession
• Read-only semantic, global view on all machines

• Maps nicely to SAGA and friends - management vs. monitoring

• Security remains out of scope for DRMAA

• Explicitly supports the portal / command-line tools use case

• Old concept of contact strings from DRMAAv1
Session Manager
interface SessionManager {
  readonly attribute string drmsName;
  readonly attribute Version drmsVersion;
  readonly attribute string drmaaName;
  readonly attribute Version drmaaVersion;
  boolean supports(in DrmaaCapability capability);
  JobSession createJobSession(in string sessionName, in string contact);
  ReservationSession createReservationSession(in string sessionName, in string contact);
  JobSession openJobSession(in string sessionName);
  ReservationSession openReservationSession(in string sessionName);
  MonitoringSession openMonitoringSession(in string contact);
  void closeJobSession(in JobSession s);
  void closeReservationSession(in ReservationSession s);
  void closeMonitoringSession(in MonitoringSession s);
  void destroyJobSession(in string sessionName);
  void destroyReservationSession(in string sessionName);
  StringList getJobSessionNames();
  StringList getReservationSessionNames();
  void registerEventNotification(in DrmaaCallback callback);
};
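The create / open / destroy life cycle of named job sessions can be sketched in Python as follows. This is a toy in-memory model and not a DRMAA binding: the method names mirror the IDL above, while the dictionary store and the exception types are assumptions made purely for illustration.

```python
class JobSession:
    """Toy stand-in for the IDL JobSession (illustrative only)."""

    def __init__(self, session_name, contact):
        self.session_name = session_name
        self.contact = contact


class SessionManager:
    """In-memory model of named job sessions.

    A real implementation would persist session state on the submission
    machine; a dict is used here only to show the life cycle.
    """

    def __init__(self):
        self._job_sessions = {}  # session name -> contact string

    def create_job_session(self, session_name, contact=""):
        if session_name in self._job_sessions:
            raise ValueError(f"session {session_name!r} already exists")
        self._job_sessions[session_name] = contact
        return JobSession(session_name, contact)

    def open_job_session(self, session_name):
        # Re-attach by name to a session that was created earlier.
        if session_name not in self._job_sessions:
            raise KeyError(f"no such session: {session_name!r}")
        return JobSession(session_name, self._job_sessions[session_name])

    def destroy_job_session(self, session_name):
        # Explicit reaping: state is kept until the application removes it.
        del self._job_sessions[session_name]

    def get_job_session_names(self):
        return sorted(self._job_sessions)
```

A portal could call `create_job_session("nightly")` in one process and `open_job_session("nightly")` in a later one, provided the store behind the manager is persistent.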
Event Callback
interface DrmaaCallback {
void notify(in DrmaaNotification notification);
};
struct DrmaaNotification {
DrmaaEvent event;
string jobId;
string sessionName;
JobState jobState;
};
enum DrmaaEvent {
NEW_STATE, MIGRATED, ATTRIBUTE_CHANGE
};
• Optional support for event push notification

• Application implements DrmaaCallback interface

• Demands some additional rules from the language binding

• Push notification at least supported in Grid Engine

• Heavily demanded by SAGA and end users

• Might be realized by a polling DRMAA library

• Nested callbacks are forbidden

• Scalability is out of scope
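How a polling library might emulate push notification can be sketched in Python. The `StatePoller` and `Recorder` classes are hypothetical helpers, not part of any DRMAA binding; only the `notify` method and the notification fields follow the IDL above, and only the NEW_STATE event is modeled.

```python
from dataclasses import dataclass


@dataclass
class DrmaaNotification:
    event: str          # only "NEW_STATE" is produced in this sketch
    job_id: str
    session_name: str
    job_state: str


class StatePoller:
    """Turns periodic state snapshots into callback notifications."""

    def __init__(self, session_name, callback):
        self._session_name = session_name
        self._callback = callback      # object with a notify(notification) method
        self._last_states = {}         # job id -> last state seen

    def poll(self, current_states):
        """Compare a {job_id: state} snapshot with the previous one."""
        for job_id, state in current_states.items():
            if self._last_states.get(job_id) != state:
                self._callback.notify(
                    DrmaaNotification("NEW_STATE", job_id, self._session_name, state)
                )
        self._last_states = dict(current_states)


class Recorder:
    """Example application callback that records what it receives."""

    def __init__(self):
        self.seen = []

    def notify(self, notification):
        self.seen.append((notification.job_id, notification.job_state))
```

The application only sees `notify` calls; whether they originate from a DRM system push or from a loop that calls `poll` stays hidden inside the library.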
Job Categories
• Consider deployment properties

• Path settings, environment variables, software installed, application starters, ...

• Job category references a site-specific configuration

• Non-normative set of recommendations on DRMAA home page

• Idea is to perform updates without spec modification

• Site operators should be allowed to create their own job categories (configuration of the implementation)

• Initial set derived from JSDL SPMD Extension (GFD.115) + experience

• Proposals welcome ...
Job Session
interface JobSession {
  readonly attribute string contact;
  readonly attribute string sessionName;
  readonly attribute StringList jobCategories;
  JobList getJobs(in JobInfo filter);
  JobArray getJobArray(in string jobArrayId);
  Job runJob(in JobTemplate jobTemplate);
  JobArray runBulkJobs(in JobTemplate jobTemplate,
                       in long beginIndex,
                       in long endIndex,
                       in long step,
                       in long maxParallel);
  Job waitAnyStarted(in JobList jobs, in TimeAmount timeout);
  Job waitAnyTerminated(in JobList jobs, in TimeAmount timeout);
};
native ZERO_TIME;
native INFINITE_TIME;
• New: Fetch list of supported job categories (portal case)

• New: Job filtering

• New: JobArray

• New: maxParallel restriction for bulk jobs

• Old synchronize() was replaced by class-based state waiting approach

• More responsive

• Get rid of partial job failure problems
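The index parameters of runBulkJobs and the class-based waiting idea can be illustrated with a small Python sketch. Both helpers are hypothetical. The inclusive end index is an assumption carried over from the DRMAAv1 bulk job semantics, and the state names used for the terminated class are assumed rather than taken from these slides.

```python
def bulk_indices(begin_index, end_index, step):
    """Indices of a bulk job: begin, begin+step, ... up to and including end."""
    if step < 1 or begin_index > end_index:
        raise ValueError("expected step >= 1 and begin_index <= end_index")
    return list(range(begin_index, end_index + 1, step))


def first_terminated(job_states, terminated=("DONE", "FAILED")):
    """Return the id of some job in the terminated state class, or None.

    job_states maps job ids to state names; a real waitAnyTerminated
    would repeat this check until the timeout expires.
    """
    for job_id, state in job_states.items():
        if state in terminated:
            return job_id
    return None
```

Waiting on a state class rather than on one exact state means a caller does not miss a job that passes quickly through an intermediate state.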
Job Template
• Basic rules

• DRM systems manage machines, which have a CPU and physical memory

• Machines can be booked in advance reservation

• DRM systems manage queues and slots as resources, opaque to DRMAA

• Jobs can be submitted ...

• ... to specified candidate machines or a queue

• ... to machines matching an OS type, architecture type, or memory requirement

• ... as 'classified' job with special treatment by the DRMS (jobCategory)

• Examples: MPICH2 job, OpenMPI job, OpenMP job
Job Template
• Most things remain as in DRMAAv1, but:

• Relative start / end times are gone, switched to RFC822

• Only copying between submission and execution machine, no host names

• Standardized job category names (based on GFD.115, see web page)

• Standardized resource limit types (setrlimit)

• Several new resource specification attributes
struct JobTemplate {
string remoteCommand;
OrderedStringList args;
boolean submitAsHold;
boolean rerunnable;
Dictionary jobEnvironment;
string workingDirectory;
string jobCategory;
StringList email;
boolean emailOnStarted;
boolean emailOnTerminated;
string jobName;
string inputPath; string outputPath;
string errorPath; boolean joinFiles;
string reservationId;
string queueName;
long minSlots; long maxSlots;
long priority;
OrderedStringList candidateMachines;
long minPhysMemory;
OperatingSystem machineOS;
CpuArchitecture machineArch;
AbsoluteTime startTime;
AbsoluteTime deadlineTime;
Dictionary stageInFiles;
Dictionary stageOutFiles;
Dictionary resourceLimits;
string accountingId;
};
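A Python sketch of how an application might fill a subset of these attributes and render an absolute start time in RFC 822 style. The dataclass is a hypothetical mirror of the struct, not a binding type. Using `None` for unset attributes and the standard library `email.utils` for the date format are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, List, Optional


@dataclass
class JobTemplate:
    """Subset of the JobTemplate struct; None plays the role of UNSET."""
    remote_command: Optional[str] = None
    args: List[str] = field(default_factory=list)
    job_environment: Dict[str, str] = field(default_factory=dict)
    job_category: Optional[str] = None
    min_slots: Optional[int] = None
    max_slots: Optional[int] = None
    start_time: Optional[datetime] = None   # absolute, timezone-aware


def to_rfc822(moment):
    """Render an absolute, timezone-aware time as an RFC 822 style string."""
    if moment.tzinfo is None:
        raise ValueError("absolute times need a timezone")
    return format_datetime(moment)


def from_rfc822(text):
    """Parse a string produced by to_rfc822 back into a datetime."""
    return parsedate_to_datetime(text)
```

For example, `JobTemplate(remote_command="sleeper.sh", args=["5"], start_time=datetime(2012, 5, 1, 12, 0, tzinfo=timezone.utc))` leaves every other attribute unset, so an implementation is free to apply its defaults.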
Optional vs. Implementation-Specific
• Make non-mandatory things explicit 

• DrmaaCapability enumeration 

• One enum per optional feature + SessionManager::supports method

• DrmaaReflective interface 

• Lists of implementation-specific attributes + generic getter / setter functions
enum DrmaaCapability {
ADVANCE_RESERVATION, RESERVE_SLOTS, CALLBACK, BULK_JOBS_MAXPARALLEL, JT_EMAIL,
JT_STAGING, JT_DEADLINE, JT_MAXSLOTS, JT_ACCOUNTINGID, RT_STARTNOW, RT_DURATION,
RT_MACHINEOS, RT_MACHINEARCH };
interface DrmaaReflective {
readonly attribute StringList jobTemplateImplSpec;
readonly attribute StringList jobInfoImplSpec;
...
string getInstanceValue(in any instance, in string name);
void setInstanceValue(in any instance, in string name, in string value);
string describeAttribute(in any instance, in string name); }
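The intended use of the capability check can be sketched in Python: the application asks before it relies on an optional attribute. `ToySessionManager` and `build_template` are hypothetical names, the enum lists only a few of the values above, and the dictionary template is a simplification for illustration.

```python
from enum import Enum, auto


class DrmaaCapability(Enum):
    ADVANCE_RESERVATION = auto()
    CALLBACK = auto()
    JT_EMAIL = auto()
    JT_DEADLINE = auto()


class ToySessionManager:
    """Illustrative manager that advertises a fixed set of optional features."""

    def __init__(self, capabilities):
        self._capabilities = frozenset(capabilities)

    def supports(self, capability):
        return capability in self._capabilities


def build_template(manager, command, email=None):
    """Set the optional e-mail attribute only if the implementation supports it."""
    template = {"remoteCommand": command}
    if email is not None and manager.supports(DrmaaCapability.JT_EMAIL):
        template["email"] = [email]
    return template
```

Portable code written this way degrades gracefully on an implementation that lacks a feature instead of failing at submission time.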
Job / JobArray
interface Job {
readonly attribute string jobId;
readonly attribute string sessionName;
readonly attribute JobTemplate jobTemplate;
void suspend();
void resume();
void hold();
void release();
void terminate();
JobState getState(out any jobSubState);
JobInfo getInfo();
Job waitStarted(in TimeAmount timeout);
Job waitTerminated(in TimeAmount timeout);
}
interface JobArray {
readonly attribute string jobArrayId;
readonly attribute JobList jobs;
readonly attribute string sessionName;
readonly attribute JobTemplate jobTemplate;
void suspend();
void resume();
void hold();
void release();
void terminate();
}
• Heavy cleanup

• Dedicated control methods

• Explicit job information fetching

• waitStarted() and waitTerminated() as on JobSession level

• Support for the new job sub-state model

• JobArray offers control operations on bulk jobs

• Rejected: Signaling, modifying running jobs, ...
Everybody Loves State Models
• Number of states reduced in comparison to DRMAAv1

• New sub-state concept

• Similar to OGSA-BES

• Allows simple DRMAAv1 mapping

• Wait functions work only on class level (timing issues)

• No more different hold modes
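One way the "simple DRMAAv1 mapping" could look is sketched below in Python. The state names are not listed on this slide; they are recalled from the published v1 and v2 specifications and should be treated as an assumption. Carrying the old v1 name as the sub-state is likewise only one possible choice an implementation might make.

```python
# DRMAAv1 state name -> DRMAAv2 state name (assumed names, see lead-in).
V1_TO_V2_STATE = {
    "UNDETERMINED": "UNDETERMINED",
    "QUEUED_ACTIVE": "QUEUED",
    "SYSTEM_ON_HOLD": "QUEUED_HELD",
    "USER_ON_HOLD": "QUEUED_HELD",
    "USER_SYSTEM_ON_HOLD": "QUEUED_HELD",
    "RUNNING": "RUNNING",
    "SYSTEM_SUSPENDED": "SUSPENDED",
    "USER_SUSPENDED": "SUSPENDED",
    "USER_SYSTEM_SUSPENDED": "SUSPENDED",
    "DONE": "DONE",
    "FAILED": "FAILED",
}


def to_v2(v1_state):
    """Map a v1 state to (v2 state, sub-state).

    The different v1 hold and suspend modes collapse into one v2 state
    each; the original v1 name is kept as an opaque sub-state so that
    the detail is not lost.
    """
    return V1_TO_V2_STATE[v1_state], v1_state
```

Applications that only need the coarse state compare the first element; those that care about the hold mode can still inspect the sub-state, at the cost of portability.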
JobInfo
struct JobInfo {
string jobId;
long exitStatus;
string terminatingSignal;
string annotation;
JobState jobState;
any jobSubState;
OrderedSlotInfoList allocatedMachines;
string submissionMachine;
string jobOwner;
long slots;
string queueName;
TimeAmount wallclockTime;
long cpuTime;
AbsoluteTime submissionTime;
AbsoluteTime dispatchTime;
AbsoluteTime finishTime;
};
• No surprises with JobInfo

• Used for both filtering and job status representation

• Snapshot semantics

• No promises for terminated jobs

• Implementations can give a meaning to the reported list of machines being used
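The double role of JobInfo as status record and as filter can be sketched in Python. Plain dictionaries stand in for the struct, and `None` stands in for an unset filter attribute; both are simplifying assumptions, and `matches` / `get_jobs` are hypothetical helper names.

```python
def matches(job_info, job_filter):
    """True if every set (non-None) filter field equals the job's value."""
    return all(
        job_info.get(name) == wanted
        for name, wanted in job_filter.items()
        if wanted is not None
    )


def get_jobs(all_jobs, job_filter):
    """Return the snapshot entries that pass the filter."""
    return [info for info in all_jobs if matches(info, job_filter)]
```

An empty filter therefore returns every job, and each attribute that is set narrows the result further.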

Advance Reservation
• Completely new

• Fits nicely to the rest of the spec

interface ReservationSession {
readonly attribute string contact;
readonly attribute string sessionName;
Reservation getReservation(in string reservationId);
Reservation requestReservation(in ReservationTemplate reservationTemplate);
ReservationList getReservations ();
};
interface Reservation {
readonly attribute string reservationId;
readonly attribute string sessionName;
readonly attribute ReservationTemplate reservationTemplate;
ReservationInfo getInfo();
void terminate();
};
Advance Reservation
• Reservation template

• Different combinations of startTime, endTime, and duration are allowed

• Sliding window + time frame support

• Support for authorization setup
struct ReservationTemplate {
string reservationName;
AbsoluteTime startTime;
AbsoluteTime endTime;
TimeAmount duration;
long minSlots;
long maxSlots;
string jobCategory;
StringList usersACL;
OrderedStringList candidateMachines;
long minPhysMemory;
OperatingSystem machineOS;
CpuArchitecture machineArch;
};
struct ReservationInfo {
string reservationId;
string reservationName;
AbsoluteTime reservedStartTime;
AbsoluteTime reservedEndTime;
StringList usersACL;
long reservedSlots;
OrderedSlotInfoList reservedMachines;
};
struct SlotInfo {
string machineName;
long slots;
};
...
typedef sequence<SlotInfo> OrderedSlotInfoList;
...
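How the combinations of startTime, endTime, and duration might be resolved is sketched below in Python. This is one plausible reading of the reservation template bullets, not the normative rules of the specification; the function name and the error handling are assumptions.

```python
from datetime import datetime, timedelta


def resolve_window(start=None, end=None, duration=None):
    """Return (earliest_start, latest_end, length) for a reservation request.

    Any two values determine the third. If all three are given, they
    describe a sliding window: a reservation of `duration` that must
    fit somewhere between `start` and `end`.
    """
    given = sum(value is not None for value in (start, end, duration))
    if given < 2:
        raise ValueError("need at least two of start, end, duration")
    if start is None:
        start = end - duration
    elif end is None:
        end = start + duration
    elif duration is None:
        duration = end - start
    if duration <= timedelta(0) or duration > end - start:
        raise ValueError("duration does not fit between start and end")
    return start, end, duration
```

With a start of 08:00, an end of 12:00, and a duration of one hour, the DRM system would be free to place the one-hour reservation anywhere inside the four-hour frame.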
DRMS Resource Monitoring
• Completely new

• Global view on resources

• Stateless session instance

• Implementation has freedom to leave out information

• Resource information model matching to SAGA

• Virtual memory = physical memory + swap space

• Load = 1-min average load

• Snapshot semantics
typedef sequence <Reservation> ReservationList;
typedef sequence <Job> JobList;
typedef sequence <QueueInfo> QueueInfoList;
typedef sequence <MachineInfo> MachineInfoList;
...
interface MonitoringSession {
  ReservationList getAllReservations();
  JobList getAllJobs(in JobInfo filter);
  QueueInfoList getAllQueues(in StringList names);
  MachineInfoList getAllMachines(in StringList names);
};
struct QueueInfo {
string name;
};
struct MachineInfo {
string name;
boolean available;
long sockets;
long coresPerSocket;
long threadsPerCore;
double load;
long physMemory;
long virtMemory;
OperatingSystem machineOS;
Version machineOSVersion;
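A Python sketch of what a monitoring client might derive from a MachineInfo snapshot. The dataclass mirrors only the numeric fields shown above; the helper methods and the `least_loaded` selection rule are illustrative assumptions, not part of the API.

```python
from dataclasses import dataclass


@dataclass
class MachineInfo:
    name: str
    available: bool
    sockets: int
    cores_per_socket: int
    threads_per_core: int
    load: float          # 1-minute average load
    phys_memory: int
    virt_memory: int     # physical memory + swap space

    def hardware_threads(self):
        return self.sockets * self.cores_per_socket * self.threads_per_core

    def swap_space(self):
        # Follows from virtual memory = physical memory + swap space.
        return self.virt_memory - self.phys_memory


def least_loaded(machines):
    """Pick the available machine with the lowest load per hardware thread."""
    candidates = [m for m in machines if m.available]
    return min(candidates, key=lambda m: m.load / m.hardware_threads(), default=None)
```

Because of the snapshot semantics, a result such as `least_loaded(...)` reflects the moment of the query only and may already be outdated when a job is submitted.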
More Stuff
• DRMAA_INDEX_VAR
• Implementations should set the environment variable DRMAA_INDEX_VAR

• Contains the name of the DRMS environment variable providing the job index

• TASK_ID (Grid Engine), PBS_ARRAYID (Torque), LSB_JOBINDEX (LSF)

• Jobs are enabled to get their own parametric index
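From inside a bulk job task, the two-step lookup can be sketched in Python as follows. The function name is hypothetical; the optional `environ` argument exists only so that the sketch can be tried without a DRM system.

```python
import os


def parametric_index(environ=None):
    """Return this task's bulk job index as an int, or None if unavailable.

    DRMAA_INDEX_VAR holds the *name* of the DRMS-specific variable
    (for example PBS_ARRAYID); that second variable holds the index.
    """
    environ = os.environ if environ is None else environ
    variable_name = environ.get("DRMAA_INDEX_VAR")
    if not variable_name:
        return None
    value = environ.get(variable_name)
    return int(value) if value is not None else None
```

The indirection keeps the job script portable: it never has to know which DRM system started it.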
DRMAAv2 C Binding Example
OCCI-DRMAA Example
Note: Job Requirements are Hard to Unify
Note: Job Submission Is Hard to Unify
                             LSF    Torque            PBS Pro
Wildcard support             no     not recommended   yes
Other than submission host   no     yes               yes
Appending file               yes    no                no
Directory staging            no     not by default    yes
• Ensuring true application portability with a unified API is difficult

• Interoperability (OGSA, JSDL) does not provide portability

• Profiles (!) with "SHOULD" and "UnsupportedFeatureFault" are not helpful

• DRMAA tries to define mandatory job template attributes and API functions that are implementable in most DRM systems

• This is why DRMAA will never be exhaustive -> use SAGA!
Note: Relation to Other OGF Standards
• JSDL is too exhaustive, GLUE is too conservative, SAGA / OGSA-BES use DRMAA
Conclusion
• Just the right time for DRMAAv2 adoption in DRM systems

• Window of opportunity for tweaking the specs is still open

• Already broad acceptance on user side (SAGA / DRMAAv1 community)

• Promising new direction with OCCI-DRMAA mapping

• Standardization is collaborative work

• Roger Brobst (Cadence Design Systems), Daniel Gruber (Univa GmbH), Mariusz Mamonski (Poznan Supercomputing and Networking Center), Morris Riedel (FZ Jülich), Daniel Templeton (Cloudera Inc.), Andre Merzky (LSU), Thijs Metsch (Platform Computing / IBM), OGF working groups, ...

• And if you want to visit Potsdam ...
34

More Related Content

What's hot

Dell PowerEdge Zero Touch Provisioning
Dell PowerEdge Zero Touch ProvisioningDell PowerEdge Zero Touch Provisioning
Dell PowerEdge Zero Touch ProvisioningDell World
 
Intel® Select Solutions for the Network
Intel® Select Solutions for the NetworkIntel® Select Solutions for the Network
Intel® Select Solutions for the NetworkLiz Warner
 
Closed Loop Network Automation for Optimal Resource Allocation via Reinforcem...
Closed Loop Network Automation for Optimal Resource Allocation via Reinforcem...Closed Loop Network Automation for Optimal Resource Allocation via Reinforcem...
Closed Loop Network Automation for Optimal Resource Allocation via Reinforcem...Liz Warner
 
Platform Observability and Infrastructure Closed Loops
Platform Observability and Infrastructure Closed LoopsPlatform Observability and Infrastructure Closed Loops
Platform Observability and Infrastructure Closed LoopsLiz Warner
 
Open Source for the 4th Industrial Revolution
Open Source for the 4th Industrial RevolutionOpen Source for the 4th Industrial Revolution
Open Source for the 4th Industrial RevolutionLiz Warner
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
Service Assurance Constructs for Achieving Network Transformation - Sunku Ran...
Service Assurance Constructs for Achieving Network Transformation - Sunku Ran...Service Assurance Constructs for Achieving Network Transformation - Sunku Ran...
Service Assurance Constructs for Achieving Network Transformation - Sunku Ran...Liz Warner
 
Learn the facts about replication in mainframe storage webinar
Learn the facts about replication in mainframe storage webinarLearn the facts about replication in mainframe storage webinar
Learn the facts about replication in mainframe storage webinarHitachi Vantara
 
Design of embedded systems
Design of embedded systemsDesign of embedded systems
Design of embedded systemsPradeep Kumar TS
 
Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)
Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)
Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)Johann Lombardi
 
Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...
Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...
Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...Netgear Italia
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERinside-BigData.com
 
Running Kubernetes on OpenStack
Running Kubernetes on OpenStackRunning Kubernetes on OpenStack
Running Kubernetes on OpenStackLiz Warner
 
Making workload nomadic when accelerated
Making workload nomadic when acceleratedMaking workload nomadic when accelerated
Making workload nomadic when acceleratedZhipeng Huang
 
Data-Intensive Workflows with DAOS
Data-Intensive Workflows with DAOSData-Intensive Workflows with DAOS
Data-Intensive Workflows with DAOSJohann Lombardi
 
Trends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient PerformanceTrends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient Performanceinside-BigData.com
 
IMCSummit 2015 - Day 2 Developer Track - The NVM Revolution
IMCSummit 2015 - Day 2 Developer Track - The NVM RevolutionIMCSummit 2015 - Day 2 Developer Track - The NVM Revolution
IMCSummit 2015 - Day 2 Developer Track - The NVM RevolutionIn-Memory Computing Summit
 

What's hot (20)

Dell PowerEdge Zero Touch Provisioning
Dell PowerEdge Zero Touch ProvisioningDell PowerEdge Zero Touch Provisioning
Dell PowerEdge Zero Touch Provisioning
 
Intel® Select Solutions for the Network
Intel® Select Solutions for the NetworkIntel® Select Solutions for the Network
Intel® Select Solutions for the Network
 
Closed Loop Network Automation for Optimal Resource Allocation via Reinforcem...
Closed Loop Network Automation for Optimal Resource Allocation via Reinforcem...Closed Loop Network Automation for Optimal Resource Allocation via Reinforcem...
Closed Loop Network Automation for Optimal Resource Allocation via Reinforcem...
 
Platform Observability and Infrastructure Closed Loops
Platform Observability and Infrastructure Closed LoopsPlatform Observability and Infrastructure Closed Loops
Platform Observability and Infrastructure Closed Loops
 
Open Source for the 4th Industrial Revolution
Open Source for the 4th Industrial RevolutionOpen Source for the 4th Industrial Revolution
Open Source for the 4th Industrial Revolution
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
Service Assurance Constructs for Achieving Network Transformation - Sunku Ran...
Service Assurance Constructs for Achieving Network Transformation - Sunku Ran...Service Assurance Constructs for Achieving Network Transformation - Sunku Ran...
Service Assurance Constructs for Achieving Network Transformation - Sunku Ran...
 
Learn the facts about replication in mainframe storage webinar
Learn the facts about replication in mainframe storage webinarLearn the facts about replication in mainframe storage webinar
Learn the facts about replication in mainframe storage webinar
 
Design of embedded systems
Design of embedded systemsDesign of embedded systems
Design of embedded systems
 
Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)
Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)
Introduction to the DAOS Scale-out object store (HLRS Workshop, April 2017)
 
Choosing the right processor
Choosing the right processorChoosing the right processor
Choosing the right processor
 
Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...
Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...
Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWER
 
Running Kubernetes on OpenStack
Running Kubernetes on OpenStackRunning Kubernetes on OpenStack
Running Kubernetes on OpenStack
 
DOME 64-bit μDataCenter
DOME 64-bit μDataCenterDOME 64-bit μDataCenter
DOME 64-bit μDataCenter
 
Summit workshop thompto
Summit workshop thomptoSummit workshop thompto
Summit workshop thompto
 
Making workload nomadic when accelerated
Making workload nomadic when acceleratedMaking workload nomadic when accelerated
Making workload nomadic when accelerated
 
Data-Intensive Workflows with DAOS
Data-Intensive Workflows with DAOSData-Intensive Workflows with DAOS
Data-Intensive Workflows with DAOS
 
Trends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient PerformanceTrends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient Performance
 
IMCSummit 2015 - Day 2 Developer Track - The NVM Revolution
IMCSummit 2015 - Day 2 Developer Track - The NVM RevolutionIMCSummit 2015 - Day 2 Developer Track - The NVM Revolution
IMCSummit 2015 - Day 2 Developer Track - The NVM Revolution
 

Viewers also liked

Challenges in multi core programming by Nishigandha Wankhade
Challenges in multi core programming by Nishigandha WankhadeChallenges in multi core programming by Nishigandha Wankhade
Challenges in multi core programming by Nishigandha WankhadeNishigandha Wankhade
 
Multi core programming 1
Multi core programming 1Multi core programming 1
Multi core programming 1Robin Aggarwal
 
High Temperature Cabinet Oven by ACMAS Technologies Pvt Ltd.
High Temperature Cabinet Oven by ACMAS Technologies Pvt Ltd.High Temperature Cabinet Oven by ACMAS Technologies Pvt Ltd.
High Temperature Cabinet Oven by ACMAS Technologies Pvt Ltd.Acmas Technologies Pvt. Ltd.
 
Multi core programming 2
Multi core programming 2Multi core programming 2
Multi core programming 2Robin Aggarwal
 
OpenHPI - Parallel Programming Concepts - Week 1
OpenHPI - Parallel Programming Concepts - Week 1OpenHPI - Parallel Programming Concepts - Week 1
OpenHPI - Parallel Programming Concepts - Week 1Peter Tröger
 
OpenHPI - Parallel Programming Concepts - Week 2
OpenHPI - Parallel Programming Concepts - Week 2OpenHPI - Parallel Programming Concepts - Week 2
OpenHPI - Parallel Programming Concepts - Week 2Peter Tröger
 
OpenHPI - Parallel Programming Concepts - Week 3
OpenHPI - Parallel Programming Concepts - Week 3OpenHPI - Parallel Programming Concepts - Week 3
OpenHPI - Parallel Programming Concepts - Week 3Peter Tröger
 
OpenHPI - Parallel Programming Concepts - Week 4
OpenHPI - Parallel Programming Concepts - Week 4OpenHPI - Parallel Programming Concepts - Week 4
OpenHPI - Parallel Programming Concepts - Week 4Peter Tröger
 
Stream-based Data Synchronization
Stream-based Data SynchronizationStream-based Data Synchronization
Stream-based Data SynchronizationKlemen Verdnik
 
East Coast DevCon 2014: Programming in UE4 - A Quick Orientation for Coders
East Coast DevCon 2014: Programming in UE4 - A Quick Orientation for CodersEast Coast DevCon 2014: Programming in UE4 - A Quick Orientation for Coders
East Coast DevCon 2014: Programming in UE4 - A Quick Orientation for CodersGerke Max Preussner
 
East Coast DevCon 2014: Concurrency & Parallelism in UE4 - Tips for programmi...
East Coast DevCon 2014: Concurrency & Parallelism in UE4 - Tips for programmi...East Coast DevCon 2014: Concurrency & Parallelism in UE4 - Tips for programmi...
East Coast DevCon 2014: Concurrency & Parallelism in UE4 - Tips for programmi...Gerke Max Preussner
 
GDC Europe 2014: Unreal Engine 4 for Programmers - Lessons Learned & Things t...
GDC Europe 2014: Unreal Engine 4 for Programmers - Lessons Learned & Things t...GDC Europe 2014: Unreal Engine 4 for Programmers - Lessons Learned & Things t...
GDC Europe 2014: Unreal Engine 4 for Programmers - Lessons Learned & Things t...Gerke Max Preussner
 

Distributed Resource Management Application API (DRMAA) Version 2

  • 1. Distributed Resource Management Application API (DRMAA) Version 2 Dr. Peter Tröger Hasso-Plattner-Institute, University of Potsdam peter@troeger.eu DRMAA-WG Co-Chair http://www.drmaa.org/
  • 2. DRMAAv2 | FZ Jülich PT 2012 Open Grid Forum (OGF) Application Area [Diagram: DRM systems such as Grid Engine and LoadLeveler each expose the DRMAA API, other OGF standards, and a proprietary API. A meta scheduler and the SAGA API with its backends sit on top of these, and end-user applications / portals use them directly or through SAGA API / OGSA / OCCI. The diagram's axes are labeled Features and Portability.]
  • 3. DRMAA
      • DRMAA group established in 2002
      • Goal: Standardized API for distributed resource management systems (DRMS)
      • Low-level portability for cluster and grid infrastructure applications
      • Simple design, bottom-up philosophy
      • Different DRMAA 1.0 documents
      • June 2004 - DRMAA 1.0 Proposed Recommendation (GFD.22)
      • April 2008 - Shift to IDL-based root specification, some clarifications (GFD.130)
      • June 2008 - DRMAA 1.0 Grid Recommendation (GFD.133)
      • Official language binding documents for C, Java, Python (GFD.143)
      • Experience reports, tutorials, unofficial language bindings for Perl, Ruby and C#
  • 4. Stakeholders
      • Distributed resource management (DRM) system
      • Distributes computational tasks on execution resources
      • Central scheduling entity
      • DRMAA implementation / library: Implementation of a language binding, semantics as in the root spec
      • Submission host: Resource that runs the DRMAA-based application
      • Execution host: Resource that can run a submitted computational task
      • Job: One or more operating system processes on an execution host
  • 5. DRMAA v1 Success Story
      • Product-quality implementations of DRMAA C and Java, broad industrial uptake
      • Grid Engine since 2006
      • Condor since 2006
      • PBS/Torque, LSF, LoadLeveler, Kerrighed, SLURM, Globus (via GridWay), ... (third-party implementations)
      • Large and silent user community
      • Meta schedulers accessing DRM resources (Galaxy, QosCosGrid, MOAB, eXludus, OpenDSP, EGEE, Unicore, SAGA, ...)
      • Applications or portals accessing DRM resources (BLAST, workflow processing, finance sector, seismic data, Mathematica, ...)
      • Hundreds (?) of unknown Grid Engine users
  • 6. DRMAA v1 C Example
        #include "drmaa.h"

        int main(int argc, char **argv) {
            char error[DRMAA_ERROR_STRING_BUFFER];
            int errnum = 0;
            drmaa_job_template_t *jt = NULL;

            errnum = drmaa_init(NULL, error, DRMAA_ERROR_STRING_BUFFER);
            if (errnum != DRMAA_ERRNO_SUCCESS) return 1;

            errnum = drmaa_allocate_job_template(&jt, error, DRMAA_ERROR_STRING_BUFFER);
            if (errnum != DRMAA_ERRNO_SUCCESS) return 1;

            drmaa_set_attribute(jt, DRMAA_REMOTE_COMMAND, "sleeper.sh",
                                error, DRMAA_ERROR_STRING_BUFFER);
            const char *args[2] = {"5", NULL};
            drmaa_set_vector_attribute(jt, DRMAA_V_ARGV, args,
                                       error, DRMAA_ERROR_STRING_BUFFER);

            char jobid[DRMAA_JOBNAME_BUFFER];
            errnum = drmaa_run_job(jobid, DRMAA_JOBNAME_BUFFER, jt,
                                   error, DRMAA_ERROR_STRING_BUFFER);
            if (errnum != DRMAA_ERRNO_SUCCESS) return 1;

            errnum = drmaa_delete_job_template(jt, error, DRMAA_ERROR_STRING_BUFFER);
            errnum = drmaa_exit(error, DRMAA_ERROR_STRING_BUFFER);
            return 0;
        }
  • 7. DRMAA v1 Java Example (1/2)
        try {
            session.init ("");
            System.out.println ("Version: " + session.getDrmaaImplementation ());
            JobTemplate jt = session.createJobTemplate ();
            jt.setRemoteCommand ("<SGE_ROOT>/examples/javaone/sleeper.sh");
            jt.setArgs (new String[]{"30"});
            jt.setWorkingDirectory ("<SGE_ROOT>/<SGE_CELL>/javaone");
            jt.setJobCategory ("sleeper");
            String jobId = session.runJob (jt);
            List jobIds = session.runBulkJobs (jt, 1, 4, 1);
            System.out.println ("Job " + jobId + " is running");
            // Iterate over the bulk job IDs returned by runBulkJobs.
            for (Object id : jobIds) {
                System.out.println ("Job " + id + " is running");
            }
            session.deleteJobTemplate (jt);
  • 8. DRMAA v1 Java Example (2/2)
            JobInfo info = session.wait (jobId, Session.TIMEOUT_WAIT_FOREVER);
            Map usage = info.getResourceUsage ();
            for (Object name : usage.keySet ()) {
                System.out.println (name + "=" + usage.get (name));
            }
            if (info.hasExited ()) {
                System.out.println ("Job exited: " + info.getExitStatus ());
            } else if (info.hasSignaled ()) {
                System.out.println ("Job signaled: " + info.getTerminatingSignal ());
                if (info.hasCoreDump ()) {
                    System.out.println ("A core dump is available.");
                }
            } else if (info.wasAborted ()) {
                System.out.println ("Job never ran.");
            } else {
                System.out.println ("Exit status is unknown.");
            }
            session.synchronize (jobIds, Session.TIMEOUT_WAIT_FOREVER, true);
        } catch (DrmaaException e) {
            e.printStackTrace ();
            System.exit (1);
        }
        }
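The branching in the second half of the Java example above can be summarized as a small decision function. The following is a minimal Python sketch, not the DRMAA API: `JobOutcome` and `describe` are hypothetical names, and the fields only mirror the `JobInfo` accessors used in the example (`hasExited`, `hasSignaled`, `hasCoreDump`, `wasAborted`).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobOutcome:
    """Hypothetical stand-in for the values a DRMAA v1 JobInfo reports."""
    has_exited: bool = False
    exit_status: Optional[int] = None
    has_signaled: bool = False
    terminating_signal: Optional[str] = None
    has_core_dump: bool = False
    was_aborted: bool = False


def describe(outcome: JobOutcome) -> List[str]:
    """Return the messages the Java example would print, in the same order."""
    if outcome.has_exited:
        return [f"Job exited: {outcome.exit_status}"]
    if outcome.has_signaled:
        lines = [f"Job signaled: {outcome.terminating_signal}"]
        # The core dump check only applies to signaled jobs.
        if outcome.has_core_dump:
            lines.append("A core dump is available.")
        return lines
    if outcome.was_aborted:
        return ["Job never ran."]
    return ["Exit status is unknown."]
```

The order of the checks matters: a normal exit is tested first, and the core dump flag is consulted only inside the signaled branch, as in the slide.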
  • 9. DRMAA v1 Got Old
      • DRM systems got better
      • Concept of resources, session persistency, advance reservation, parallel jobs, many-core systems, queues, state models, WS-* stuff, ...
      • Some obsolete / never implemented features
      • Date / time handling, host-to-host file staging, specialized job states, ...
      • Awkward design decisions
      • Job synchronization, job monitoring, data reaping, native specification, ...
      • DRMAAv2 work started in 2009
      • Public survey, Sun customer feedback, implementation experiences, ...
      • Close collaboration with SAGA, GAT, and OCCI WG; considered JSDL & BES
      • Performant investigation of current DRM systems
  • 10. DRMAA v2 - Status Today
      • DRMAAv2 root specification has been an official OGF standard since January 2012
      • http://www.ogf.org/documents/GFD.194.pdf
      • Final draft of DRMAAv2 C binding announced yesterday
      • No more major changes, header file available
      • OGF public comment period will start in May 2012
      • Official specification in late summer 2012
      • New project for fork-based reference implementation (drmaav2-mock)
      • Language bindings for C++, Python, Java and OCCI under way
  • 11. DRMAA v2 Implementations?
      • Implementations announced
      • Fork-based reference implementation (DRMAA Working Group)
      • Univa Grid Engine (Daniel Gruber)
      • Torque / PBS Pro (Mariusz Mamonski)
      • GridWay (Eduardo Huedo)
      • Condor (Todd Tannenbaum)
      • Collection type handling in C should be re-usable
      • Interfacing the DRM system is already part of the DRMAAv1 library
      • Old libraries can be extended with new features (job arrays, session persistency, reservation, monitoring)
  • 12. DRMAA v2 Design Approach
      • All behavioral aspects in the IDL-based root specification
      • Same model as W3C DOM, OGF SAGA, ...
      • What functions are offered? How are they grouped?
      • What are possible error conditions?
      • For language binding designers and implementors
      • Language binding just defines syntactical mapping
      • "DRMAA IDL construct A maps to programming language construct B"
      • Root spec mandates some design decisions in the language binding (e.g. "UNSET")
      • End users should get dedicated man pages
      [Diagram: the DRMAA Root Spec is the basis for the language bindings (DRMAA C Binding, DRMAA OCCI Binding), which in turn are realized by implementations (Grid Engine, PBS, LSF, GridWay, ...).]
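To make the "IDL construct A maps to programming language construct B" idea concrete, here is a minimal Python sketch of one way a binding could represent an IDL struct and an UNSET marker. Everything in it is an assumption for illustration: `JobTemplateSketch`, `set_attributes`, the use of `None` as UNSET, and the three fields (borrowed from the attributes set in the v1 Java example) are not taken from any official binding.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Assumption: this sketch uses None as the binding's UNSET marker.
UNSET = None


@dataclass
class JobTemplateSketch:
    """Hypothetical mapping of an IDL struct to a Python class."""
    remote_command: Optional[str] = UNSET
    args: Optional[List[str]] = UNSET
    working_directory: Optional[str] = UNSET


def set_attributes(template: JobTemplateSketch) -> dict:
    """Return only the attributes the application actually set."""
    return {
        f.name: getattr(template, f.name)
        for f in fields(template)
        if getattr(template, f.name) is not UNSET
    }
```

A library built on such a mapping could pass only the set attributes to the DRM system and leave everything else at the system's defaults.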
  • 13. DRMAA v2 Layout
      • Interfaces: SessionManager, MonitoringSession, Job, JobSession, JobArray, Reservation, ReservationSession, DrmaaCallback, DrmaaReflective
      • Structs: SlotInfo, MachineInfo, QueueInfo, DrmaaNotification, ReservationTemplate, JobTemplate, ReservationInfo, JobInfo
      • Exceptions, Enumerations
      • Diagram annotations: "Establish session with the DRM system"; "Work with jobs / advance reservation / monitoring features of the DRM system"
  • 14. DRMAAv2 | FZ Jülich PT 2012 Session Manager • Create multiple connections to one (or more) DRM system(s) at a time • JobSession / ReservationSession • Persistent storage of session state (at least) on submission machine • Can be reloaded by name, explicit reaping • MonitoringSession • Read-only semantics, global view on all machines • Maps nicely to SAGA and friends - management vs. monitoring • Security remains out of scope for DRMAA • Explicitly supports the portal / command-line tools use case • Old concept of contact strings from DRMAAv1
  • 15. DRMAAv2 | FZ Jülich PT 2012 Session Manager 15 interface SessionManager{ readonly attribute string drmsName; readonly attribute Version drmsVersion; readonly attribute string drmaaName; readonly attribute Version drmaaVersion; boolean supports(in DrmaaCapability capability); JobSession createJobSession(in string sessionName, in string contact); ReservationSession createReservationSession(
 in string sessionName, in string contact); JobSession openJobSession(in string sessionName); ReservationSession openReservationSession(in string sessionName); MonitoringSession openMonitoringSession (in string contact); void closeJobSession(in JobSession s); void closeReservationSession(in ReservationSession s); void closeMonitoringSession(in MonitoringSession s); void destroyJobSession(in string sessionName); void destroyReservationSession(in string sessionName); StringList getJobSessionNames(); StringList getReservationSessionNames(); void registerEventNotification(in DrmaaCallback callback); };
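The create / open / close / destroy split in the interface above can be illustrated with a small in-memory mock. This is only a sketch: the Python class and method names are hypothetical (they are not taken from any DRMAAv2 language binding), and a dictionary stands in for the persistent session state that a real library would keep on the submission machine.

```python
class JobSession:
    """Hypothetical stand-in for the JobSession interface."""

    def __init__(self, session_name, contact):
        self.session_name = session_name
        self.contact = contact
        self.open = True


class SessionManager:
    """In-memory mock of the session lifecycle (not a real binding)."""

    def __init__(self):
        # session name -> contact string; stands in for persistent storage
        self._stored = {}

    def create_job_session(self, session_name, contact):
        if session_name in self._stored:
            raise ValueError(f"session {session_name!r} already exists")
        self._stored[session_name] = contact
        return JobSession(session_name, contact)

    def open_job_session(self, session_name):
        # Reload a previously created session by name.
        if session_name not in self._stored:
            raise KeyError(session_name)
        return JobSession(session_name, self._stored[session_name])

    def close_job_session(self, session):
        # Closing only invalidates the handle; the stored state survives.
        session.open = False

    def destroy_job_session(self, session_name):
        # Explicit reaping of the stored state.
        del self._stored[session_name]

    def get_job_session_names(self):
        return sorted(self._stored)


manager = SessionManager()
first = manager.create_job_session("portal", "")
manager.close_job_session(first)
reopened = manager.open_job_session("portal")
names_before = manager.get_job_session_names()
manager.destroy_job_session("portal")
names_after = manager.get_job_session_names()
```

The point of the sketch is that closing and destroying are different operations: a closed session can still be reopened by name until it is explicitly destroyed.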
  • 16. DRMAAv2 | FZ Jülich PT 2012 Event Callback interface DrmaaCallback { void notify(in DrmaaNotification notification); }; struct DrmaaNotification { DrmaaEvent event; string jobId; string sessionName; JobState jobState; }; enum DrmaaEvent { NEW_STATE, MIGRATED, ATTRIBUTE_CHANGE }; • Optional support for event push notification • Application implements DrmaaCallback interface • Demands some additional rules from the language binding • Push notification at least supported in Grid Engine • Heavily demanded by SAGA and end users • Might be realized by a polling DRMAA library • Nested callbacks are forbidden • Scalability is out of scope
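How a polling library might emulate push notification can be sketched as follows. Everything beyond the IDL names is an assumption: `PollingNotifier` and `read_states` are hypothetical helpers, the state strings are placeholders, and raising on re-entrant polling is just one possible reading of the "nested callbacks are forbidden" rule.

```python
from dataclasses import dataclass
from enum import Enum


class DrmaaEvent(Enum):
    NEW_STATE = 1
    MIGRATED = 2
    ATTRIBUTE_CHANGE = 3


@dataclass
class DrmaaNotification:
    event: DrmaaEvent
    job_id: str
    session_name: str
    job_state: str


class PollingNotifier:
    """Hypothetical helper: turns repeated state polling into callbacks."""

    def __init__(self, session_name, read_states, callback):
        self._session_name = session_name
        self._read_states = read_states  # assumed: returns {job_id: state}
        self._callback = callback
        self._last = {}
        self._in_callback = False

    def poll_once(self):
        if self._in_callback:
            # Re-entering from inside a callback would nest notifications.
            raise RuntimeError("nested callbacks are forbidden")
        current = self._read_states()
        for job_id, state in current.items():
            if self._last.get(job_id) != state:
                self._in_callback = True
                try:
                    self._callback(DrmaaNotification(
                        DrmaaEvent.NEW_STATE, job_id,
                        self._session_name, state))
                finally:
                    self._in_callback = False
        self._last = dict(current)


states = {"42": "QUEUED"}  # placeholder job id and state names
seen = []
notifier = PollingNotifier("portal", lambda: dict(states), seen.append)
notifier.poll_once()
states["42"] = "RUNNING"
notifier.poll_once()
notifier.poll_once()  # nothing changed, so no notification is sent
```

Only state changes are reported here; the MIGRATED and ATTRIBUTE_CHANGE events would need additional information from the DRM system.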
  • 17. DRMAAv2 | FZ Jülich PT 2012 Job Categories • Consider deployment properties • Path settings, environment variables, software installed, application starters, ... • Job category references a site-specific configuration • Non-normative set of recommendations on DRMAA home page • Idea is to perform updates without spec modification • Site operators should be allowed to create their own job categories (configuration of the implementation) • Initial set derived from JSDL SPMD Extension (GFD.115) + experience • Proposals welcome ...
  • 18. DRMAAv2 | FZ Jülich PT 2012 Job Session interface JobSession { readonly attribute string contact; readonly attribute string sessionName; readonly attribute StringList jobCategories; JobList getJobs(in JobInfo filter); JobArray getJobArray(in string jobArrayId); Job runJob(in JobTemplate jobTemplate); JobArray runBulkJobs( in JobTemplate jobTemplate, in long beginIndex, in long endIndex, in long step, in long maxParallel); Job waitAnyStarted(in JobList jobs, in TimeAmount timeout); Job waitAnyTerminated(in JobList jobs, in TimeAmount timeout); }; native ZERO_TIME; native INFINITE_TIME; • New: Fetch list of supported job categories (portal case) • New: Job filtering • New: JobArray • New: maxParallel restriction for bulk jobs • Old synchronize() was replaced by a class-based state waiting approach • More responsive • Gets rid of partial job failure problems
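The index parameters of runBulkJobs can be made concrete with a short sketch. It assumes that endIndex is inclusive and that maxParallel caps the number of concurrently running array tasks; the helper names are hypothetical, and grouping tasks into fixed "waves" is a simplification (a real DRM system would start the next task as soon as any slot frees up).

```python
def bulk_indices(begin_index, end_index, step):
    """Job indices of a bulk submission, assuming an inclusive end index."""
    if step <= 0 or begin_index > end_index:
        raise ValueError("expected step > 0 and begin_index <= end_index")
    return list(range(begin_index, end_index + 1, step))


def waves(indices, max_parallel):
    """Group indices so that at most max_parallel tasks run at a time."""
    if max_parallel <= 0:
        raise ValueError("max_parallel must be positive")
    return [indices[i:i + max_parallel]
            for i in range(0, len(indices), max_parallel)]


indices = bulk_indices(1, 10, 3)
schedule = waves(indices, 2)
print(indices)   # [1, 4, 7, 10]
print(schedule)  # [[1, 4], [7, 10]]
```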
  • 19. DRMAAv2 | FZ Jülich PT 2012 Job Template • Basic rules • DRM systems manage machines, which have a CPU and physical memory • Machines can be booked in advance reservation • DRM systems manage queues and slots as resources, opaque to DRMAA • Jobs can be submitted ... • ... to specified candidate machines or a queue • ... to machines matching an OS type, architecture type, or memory requirement • ... as 'classified' job with special treatment by the DRMS (jobCategory) • Examples: MPICH2 job, OpenMPI job, OpenMP job
  • 20. DRMAAv2 | FZ Jülich PT 2012 Job Template • Most things remain as in DRMAAv1, but: • Relative start / end times are gone, switched to RFC822 • Only copying between submission and execution machine, no host names • Standardized job category names (based on GFD.115, see web page) • Standardized resource limit types (setrlimit) • Several new resource specification attributes struct JobTemplate { string remoteCommand; OrderedStringList args; boolean submitAsHold; boolean rerunnable; Dictionary jobEnvironment; string workingDirectory; string jobCategory; StringList email; boolean emailOnStarted; boolean emailOnTerminated; string jobName; string inputPath; string outputPath; string errorPath; boolean joinFiles; string reservationId; string queueName; long minSlots; long maxSlots; long priority; OrderedStringList candidateMachines; long minPhysMemory; OperatingSystem machineOS; CpuArchitecture machineArch; AbsoluteTime startTime; AbsoluteTime deadlineTime; Dictionary stageInFiles; Dictionary stageOutFiles; Dictionary resourceLimits; string accountingId; };
  • 21. DRMAAv2 | FZ Jülich PT 2012 Optional vs. Implementation-Specific • Make non-mandatory things explicit • DrmaaCapability enumeration • One enum per optional feature + SessionManager::supports method • DrmaaReflective interface • Lists of implementation-specific attributes + generic getter / setter functions 21 enum DrmaaCapability { ADVANCE_RESERVATION, RESERVE_SLOTS, CALLBACK, BULK_JOBS_MAXPARALLEL, JT_EMAIL, JT_STAGING, JT_DEADLINE, JT_MAXSLOTS, JT_ACCOUNTINGID, RT_STARTNOW, RT_DURATION, RT_MACHINEOS, RT_MACHINEARCH }; interface DrmaaReflective { readonly attribute StringList jobTemplateImplSpec; readonly attribute StringList jobInfoImplSpec; ... string getInstanceValue(in any instance, in string name); void setInstanceValue(in any instance, in string name, in string value); string describeAttribute(in any instance, in string name); }
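A portable application would typically query `supports` before relying on an optional job template attribute. The sketch below uses the capability names from the enumeration above; `MockSessionManager` and `build_template` are hypothetical helpers, and the job template is modelled as a plain dictionary rather than a real binding type.

```python
from enum import Enum, auto


class DrmaaCapability(Enum):
    ADVANCE_RESERVATION = auto()
    RESERVE_SLOTS = auto()
    CALLBACK = auto()
    BULK_JOBS_MAXPARALLEL = auto()
    JT_EMAIL = auto()
    JT_STAGING = auto()
    JT_DEADLINE = auto()
    JT_MAXSLOTS = auto()
    JT_ACCOUNTINGID = auto()
    RT_STARTNOW = auto()
    RT_DURATION = auto()
    RT_MACHINEOS = auto()
    RT_MACHINEARCH = auto()


class MockSessionManager:
    """Hypothetical stand-in that only implements supports()."""

    def __init__(self, capabilities):
        self._capabilities = frozenset(capabilities)

    def supports(self, capability):
        return capability in self._capabilities


def build_template(manager, command, deadline=None, emails=None):
    """Set optional attributes only if the implementation supports them."""
    template = {"remoteCommand": command}
    if deadline is not None and manager.supports(DrmaaCapability.JT_DEADLINE):
        template["deadlineTime"] = deadline
    if emails and manager.supports(DrmaaCapability.JT_EMAIL):
        template["email"] = list(emails)
    return template


manager = MockSessionManager({DrmaaCapability.JT_EMAIL})
template = build_template(
    manager, "/bin/sleep",
    deadline="2012-05-01T12:00:00",   # illustrative value only
    emails=["user@example.org"],
)
```

Silently dropping unsupported attributes is only one policy; an application that depends on a deadline could equally refuse to submit when `JT_DEADLINE` is not supported.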
  • 22. DRMAAv2 | FZ Jülich PT 2012 Job / JobArray interface Job { readonly attribute string jobId; readonly attribute string sessionName; readonly attribute JobTemplate jobTemplate; void suspend(); void resume(); void hold(); void release(); void terminate(); JobState getState(out any jobSubState); JobInfo getInfo(); Job waitStarted(in TimeAmount timeout); Job waitTerminated(in TimeAmount timeout); }; interface JobArray { readonly attribute string jobArrayId; readonly attribute JobList jobs; readonly attribute string sessionName; readonly attribute JobTemplate jobTemplate; void suspend(); void resume(); void hold(); void release(); void terminate(); }; • Heavy cleanup • Dedicated control methods • Explicit job information fetching • waitStarted() and waitTerminated() as on JobSession level • Support for the new job sub-state model • JobArray offers control operations on bulk jobs • Rejected: Signaling, modifying running jobs, ...
  • 23. DRMAAv2 | FZ Jülich PT 2012 Everybody Loves State Models • Number of states reduced in comparison to DRMAAv1 • New sub-state concept • Similar to OGSA-BES • Allows simple DRMAAv1 mapping • Wait functions work only on class level (timing issues) • No more different hold modes
  • 24. DRMAAv2 | FZ Jülich PT 2012 JobInfo struct JobInfo { string jobId; long exitStatus; string terminatingSignal; string annotation; JobState jobState; any jobSubState; OrderedSlotInfoList allocatedMachines; string submissionMachine; string jobOwner; long slots; string queueName; TimeAmount wallclockTime; long cpuTime; AbsoluteTime submissionTime; AbsoluteTime dispatchTime; AbsoluteTime finishTime; }; • No surprises with JobInfo • Used for both filtering and job status representation • Snapshot semantics • No promises for terminated jobs • Implementations can give a meaning to the reported list of machines being used
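Using the same structure for filtering and for status reporting can be sketched as below. This is an illustration under assumptions: `None` stands in for an UNSET attribute, only four JobInfo attributes are modelled, matching is plain equality (the root specification defines the actual matching rule per attribute), and the state strings are placeholders.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class JobInfo:
    # Subset of the JobInfo attributes; None plays the role of UNSET.
    job_id: Optional[str] = None
    job_state: Optional[str] = None
    job_owner: Optional[str] = None
    queue_name: Optional[str] = None


def matches(info, job_filter):
    """A job matches if it agrees with every attribute set in the filter."""
    for field in fields(JobInfo):
        wanted = getattr(job_filter, field.name)
        if wanted is not None and getattr(info, field.name) != wanted:
            return False
    return True


def get_jobs(snapshot, job_filter):
    """Hypothetical counterpart of getJobs(filter) over a status snapshot."""
    return [info for info in snapshot if matches(info, job_filter)]


snapshot = [
    JobInfo("1", "RUNNING", "alice", "short"),
    JobInfo("2", "QUEUED", "alice", "long"),
    JobInfo("3", "RUNNING", "bob", "short"),
]
selected = get_jobs(snapshot, JobInfo(job_owner="alice", job_state="RUNNING"))
everything = get_jobs(snapshot, JobInfo())  # an empty filter matches all jobs
```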

  • 25. DRMAAv2 | FZ Jülich PT 2012 Advance Reservation • Completely new • Fits nicely with the rest of the spec interface ReservationSession { readonly attribute string contact; readonly attribute string sessionName; Reservation getReservation(in string reservationId); Reservation requestReservation(in ReservationTemplate reservationTemplate); ReservationList getReservations (); }; interface Reservation { readonly attribute string reservationId; readonly attribute string sessionName; readonly attribute ReservationTemplate reservationTemplate; ReservationInfo getInfo(); void terminate(); };
  • 26. DRMAAv2 | FZ Jülich PT 2012 Advance Reservation • Reservation template • Different combinations of startTime, endTime, and duration are allowed • Sliding window + time frame support • Support for authorization setup 26 struct ReservationTemplate { string reservationName; AbsoluteTime startTime; AbsoluteTime endTime; TimeAmount duration; long minSlots; long maxSlots; string jobCategory; StringList usersACL; OrderedStringList candidateMachines; long minPhysMemory; OperatingSystem machineOS; CpuArchitecture machineArch; }; struct ReservationInfo { string reservationId; string reservationName; AbsoluteTime reservedStartTime; AbsoluteTime reservedEndTime; StringList usersACL; long reservedSlots; OrderedSlotInfoList reservedMachines; }; struct SlotInfo { string machineName; long slots; }; ... typedef sequence<SlotInfo> OrderedSlotInfoList; ...
  • 27. DRMAAv2 | FZ Jülich PT 2012 DRMS Resource Monitoring • Completely new • Global view on resources • Stateless session instance • Implementation has freedom to leave out information • Resource information model matching SAGA • Virtual memory = physical memory + swap space • Load = 1-min average load • Snapshot semantics typedef sequence <Reservation> ReservationList; typedef sequence <Job> JobList; typedef sequence <QueueInfo> QueueInfoList; typedef sequence <MachineInfo> MachineInfoList; ... interface MonitoringSession { ReservationList getAllReservations(); JobList getAllJobs(in JobInfo filter); QueueInfoList getAllQueues(in StringList names); MachineInfoList getAllMachines(in StringList names); }; struct QueueInfo { string name; }; struct MachineInfo { string name; boolean available; long sockets; long coresPerSocket; long threadsPerCore; double load; long physMemory; long virtMemory; OperatingSystem machineOS; Version machineOSVersion;
  • 28. DRMAAv2 | FZ Jülich PT 2012 More Stuff • DRMAA_INDEX_VAR • Implementations should set the environment variable DRMAA_INDEX_VAR • Contains the name of the DRMS environment variable providing the job index • TASK_ID (Grid Engine), PBS_ARRAYID (Torque), LSB_JOBINDEX (LSF) • Jobs are enabled to get their own parametric index 28
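From inside a job, the indirection through DRMAA_INDEX_VAR can be resolved with two environment lookups. The helper name `parametric_index` is hypothetical, and the sketch assumes the DRMS publishes the index as a decimal integer.

```python
import os


def parametric_index(environ=os.environ):
    """Return this job's array index, or None outside a bulk job.

    DRMAA_INDEX_VAR holds the *name* of the DRMS-specific variable,
    which in turn holds the index itself.
    """
    name = environ.get("DRMAA_INDEX_VAR")
    if name is None:
        return None
    value = environ.get(name)
    return int(value) if value is not None else None


# Simulated environment of a Torque array task (values are made up).
fake_env = {"DRMAA_INDEX_VAR": "PBS_ARRAYID", "PBS_ARRAYID": "7"}
index = parametric_index(fake_env)
print(index)  # 7
```

Because the job only reads DRMAA_INDEX_VAR, the same script works unchanged regardless of which DRM system ran it.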
  • 30. DRMAAv2 | FZ Jülich PT 2012 OCCI-DRMAA Example
  • 31. DRMAAv2 | FZ Jülich PT 2012 Note: Job Requirements are Hard to Unify 31
  • 32. DRMAAv2 | FZ Jülich PT 2012 Note: Job Submission Is Hard to Unify
                                LSF    Torque            PBS Pro
    Wildcard Support            no     not recommended   yes
    Other than submission host  no     yes               yes
    Appending file              yes    no                no
    Directory Staging           no     not by default    yes
 • Ensuring true application portability with a unified API is difficult • Interoperability (OGSA, JSDL) does not provide portability • Profiles (!) with "SHOULD" and "UnsupportedFeatureFault" are not helpful • DRMAA tries to define mandatory job template attributes and API functions that are implementable in most DRM systems • This is why DRMAA will never be exhaustive -> use SAGA !
  • 33. DRMAAv2 | FZ Jülich PT 2012 Note: Relation to Other OGF Standards • JSDL is too exhaustive, GLUE is too conservative, SAGA / OGSA-BES use DRMAA 33
  • 34. DRMAAv2 | FZ Jülich PT 2012 Conclusion • Just the right time for DRMAAv2 adoption in DRM systems • Window of opportunity for tweaking the specs is still open • Already broad acceptance on user side (SAGA / DRMAAv1 community) • Promising new direction with OCCI-DRMAA mapping • Standardization is collaborative work • Roger Brobst (Cadence Design Systems), Daniel Gruber (Univa GmbH), 
 Mariusz Mamonski (Poznan Supercomputing and Networking Center),
 Morris Riedel (FZ Jülich), Daniel Templeton (Cloudera Inc.), 
 Andre Merzky (LSU), Thijs Metsch (Platform Computing / IBM), 
 OGF working groups, ... • And if you want to visit Potsdam ... 34