Introduction to OpenSAF

                 David Fick
              Senior Software Architect
                GoAhead Software
Introduction to OpenSAF
• Service availability and high availability systems and
  concepts have been around for decades
• However, HA terminology tends to vary from industry to
  industry and company to company
• Goals of this session:
   – High-level technical overview of the Service Availability™ Forum
     standards
   – Overview of the support of those standards within OpenSAF
   – Allow you to:
       • Familiarize yourself with SA Forum and OpenSAF concepts and
         terminology OR
       • Map the HA concepts and terminology with which you are
         familiar to the SA Forum and OpenSAF versions
   – Resources for getting started with OpenSAF
SA Forum Interfaces: AIS & HPI
• Layered architecture shown on the slide (top to bottom):
   – Applications
   – Application Interface Specifications (AIS): Service Availability Middleware
       • System Management: Software Mgmt Framework (SMF), Information Model Mgmt (IMM),
         Notification (NTF), Log (LOG)
       • Availability Management Framework (AMF), Cluster Membership (CLM),
         Platform Mgmt (PLM)
       • Lock (LCK), Checkpoint (CKPT), Event (EVT), Message (MSG)
   – Operating System
   – Virtualization
   – Hardware Platform Interface (HPI)
   – Hardware Platforms A, B, C, D
• Diagram label: SAF Standards Implemented by OpenSAF
But how to make sense of the
 SA Forum “acronym soup”?
AIS Service Groupings
    • First, understand that the AIS services fall into three
      logical groupings*:
     – System Management Services
         • Information Model Mgmt (IMM)
         • Software Mgmt Framework (SMF)
         • Notification (NTF)
         • Log (LOG)
         • Services that manage central system capabilities commonly used by both
           AIS services and applications
     – Resource Availability Management Services
         • Availability Management Framework (AMF)
         • Cluster Membership (CLM)
         • Platform Mgmt (PLM)
         • Services that manage and monitor the state of key system resources that
           affect availability: hardware / operating system, cluster nodes, applications
     – Application Services
         • Checkpoint (CKPT)
         • Event (EVT)
         • Message (MSG)
         • Lock (LCK)
         • Optional services to support application operations such as inter-process
           communication, state replication, and shared resource access control
* - Not official SA Forum AIS service groupings
Fault Management Cycle
•   Second, AIS services that manage availability are designed around a
    standard fault management cycle
     – Detection
         • E.g. component healthchecks
     – Isolation
         • E.g. blade power off
     – Recovery
         • E.g. failover of workload assignments to associated standby resources
     – Repair
         • E.g. automatic restart of failed resource
     – Notification
         • E.g. state change notifications sent by service managing the resource
•   Diagram: Detection, Isolation, Recovery, and Repair arranged as a cycle
    around Notification
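The cycle above can be sketched as a small loop. This is only an illustration, not an AIS API: the `Phase` enum, `run_fault_cycle`, and the resource name `blade-3` are hypothetical, and emitting one notification per phase is an assumption based on Notification sitting at the center of the diagram.

```python
from enum import Enum


class Phase(Enum):
    DETECTION = "detection"
    ISOLATION = "isolation"
    RECOVERY = "recovery"
    REPAIR = "repair"


# Phase order as listed on the slide; Notification is modeled as a side effect.
CYCLE = [Phase.DETECTION, Phase.ISOLATION, Phase.RECOVERY, Phase.REPAIR]


def run_fault_cycle(resource, notify):
    """Walk one resource through the cycle, notifying after each phase."""
    completed = []
    for phase in CYCLE:
        completed.append(phase)
        notify(f"{resource}: {phase.value} completed")  # assumed notification point
    return completed


notifications = []
phases = run_fault_cycle("blade-3", notifications.append)
```

After the call, `notifications` holds one message per phase, in cycle order.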
Resource Dependencies
•   Third, Availability Management in the AIS world is driven by a detailed
    understanding of the availability management dependencies across all
    resource types
     – Managed Applications
         • Simple to complex dependencies and relationships can be modeled
           between the various software elements
         • Dependency on a particular node also modeled
     – AMF Node
         • Represents a node where AMF services are provided
         • Depends on a CLM node
     – CLM Node
         • Represents a cluster node where AIS services are provided
         • Depends on an Execution Environment (optional)
     – Platform Resource
         • Containment and logical dependencies represented between platform
           resources
         • Execution Environment (EE)
              – Represents an operating system instance (standalone or virtual)
         • Hardware Element (HE)
              – Represents a physical hardware resource in the system
•   Diagram: Managed Applications, AMF Node, CLM Node, and Platform Resource
    stacked top to bottom, with Hardware Element and Execution Environment
    beneath Platform Resource
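A minimal sketch of the dependency chain described above, useful for seeing which resources are affected when a lower layer fails. All resource names are hypothetical, and the EE-on-HE dependency is an assumed example of a dependency between platform resources, not something the slide states.

```python
# Hypothetical resources; each maps to the resource it depends on (None = bottom).
DEPENDS_ON = {
    "app-1": "amf-node-1",       # managed application depends on an AMF node
    "amf-node-1": "clm-node-1",  # AMF node depends on a CLM node
    "clm-node-1": "ee-1",        # optional dependency on an Execution Environment
    "ee-1": "he-1",              # assumption: the EE runs on a Hardware Element
    "he-1": None,
}


def dependency_chain(resource):
    """Return the resource followed by everything it transitively depends on."""
    chain = [resource]
    while DEPENDS_ON[chain[-1]] is not None:
        chain.append(DEPENDS_ON[chain[-1]])
    return chain


def affected_by(failed):
    """Return the resources that transitively depend on the failed one."""
    return sorted(r for r in DEPENDS_ON if failed in dependency_chain(r)[1:])
```

For example, `affected_by("clm-node-1")` lists the AMF node and the application layered on top of it.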
Common Design Patterns
• Fourth, the AIS services follow common design
  patterns:
  – API
     • Common library lifecycle
     • Naming conventions
  – Each resource managed by a service is represented as a managed object
     • Typically with associated state model
     • Managed objects stored in common information model
  – Administrative operations
     • X.731 style administrative operations for resources which
       affect availability
  – Notifications automatically generated by AIS services for
    significant system events (alarms, state changes, etc.)
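The "common library lifecycle" bullet can be pictured with a mock. The AIS C libraries generally expose an initialize / dispatch / finalize sequence per service (for AMF: `saAmfInitialize`, `saAmfDispatch`, `saAmfFinalize`). The Python class below is a hypothetical stand-in that mirrors only that shape; it does not call any real library.

```python
from collections import deque


class MockServiceHandle:
    """Stand-in for the handle an sa<Service>Initialize()-style call returns."""

    def __init__(self, callbacks):
        self.callbacks = callbacks  # callback name -> callable, set at initialize
        self.pending = deque()      # callbacks queued by the (simulated) service
        self.finalized = False

    def post(self, name, payload):
        """Simulate the service queuing a callback for the application."""
        self.pending.append((name, payload))

    def dispatch_all(self):
        """Invoke every pending callback and return how many were run."""
        if self.finalized:
            raise RuntimeError("handle already finalized")
        count = 0
        while self.pending:
            name, payload = self.pending.popleft()
            self.callbacks[name](payload)
            count += 1
        return count

    def finalize(self):
        """Release the handle; no further dispatching is allowed."""
        self.finalized = True
        self.pending.clear()
```

An application would create the handle with its callbacks, dispatch whenever work is pending, and finalize on shutdown.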
Resource Availability Management Services
•   Availability Management Framework (AMF)
     – Manages the lifecycle and monitors the state of the managed
       applications within the system
     – More detail in upcoming slides
•   Cluster Membership (CLM)
     – Provides cluster membership change notifications to AIS services
       and interested applications
     – OpenSAF CLM implements a cluster management protocol dealing
       with:
          • Cluster formation
          • Active controller selection & failover
          • Node failure detection
•   Platform Management (PLM)
     – Manages state of modeled hardware elements and execution
       environments (operating system instances)
     – Hardware element states and events accessed through Hardware
       Platform Interface (HPI)
     – Manages graceful blade extraction / de-activation cases
     – Supports hardware element controls (power on/off and reset)
     – Optional service within OpenSAF
Availability Management Framework (AMF)
AMF Logical Entities
• Structural Entities
   – AMF Application
       • Represents the highest-level service(s) provided by the system
   – Service Group (SG)
       • Represents a group of like logical resources that provide the same
         service(s)
       • Associated redundancy model (e.g. 1+1)
   – Service Unit (SU)
       • Aggregates a set of resources which when combined provide a
         higher-level service
   – Component
       • Represents one or more resources that perform a function within the
         system
• Diagram: containment hierarchy with 1..* multiplicity at each level:
  AMF Application, Service Group, Service Unit, Component
Availability Management Framework (AMF)
AMF Logical Entities
• Workload Entities
   – Service Instance (SI)
       • Represents a workload to be supported by the system
       • Has associated redundancy requirements (1+1, N+M, etc.)
       • Protected by an identified SG
       • Assigned to one or more SUs with an HA state of active, standby,
         quiescing or quiesced
   – Component Service Instance (CSI)
       • Represents a more granular workload that needs to be supported by the
         system
       • Assigned to one or more components
• Diagram: the AMF Application / Service Group / Service Unit / Component
  hierarchy (1..* at each level) alongside the workload entities: Service
  Instances (1..*) are protected by a Service Group and assigned to Service
  Units; each contains 1..* Component Service Instances, which are assigned
  to Components
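As a rough sketch of the workload assignment described above: an SI is assigned to an SU with an HA state, and its CSIs are handed to that SU's components. The class names and the position-based CSI-to-component pairing are hypothetical simplifications, not AMF's actual assignment algorithm.

```python
from dataclasses import dataclass, field
from enum import Enum


class HAState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    QUIESCING = "quiescing"
    QUIESCED = "quiesced"


@dataclass
class ServiceUnit:
    name: str
    components: list
    assignments: dict = field(default_factory=dict)  # SI name -> HAState


@dataclass
class ServiceInstance:
    name: str
    csis: list


def assign(si, su, state):
    """Assign an SI to an SU and pair each CSI with a component by position."""
    if len(si.csis) > len(su.components):
        raise ValueError("not enough components for the SI's CSIs")
    su.assignments[si.name] = state
    return {csi: (comp, state) for csi, comp in zip(si.csis, su.components)}
```

Assigning the same SI to a second SU with `HAState.STANDBY` would model the standby side of a 1+1 pair.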
Availability Management Framework (AMF)
AMF Logical Entities
•   Common Characteristics
     – Well-defined state model for each logical entity type
     – X.731 style administrative operations
•   Common AMF Component Types
     – SA-aware
          • Applications modified to interact with AMF through AMF API
          • Diagram: AMF performs lifecycle mgmt through CLC-CLI scripts and
            HA state assignment through the AMF library in the AMF comp process
     – Non-proxied, non-SA-aware
          • Legacy or 3rd party applications that typically cannot be modified
          • Interact with AMF through command line scripts to manage
            application lifecycle
          • Always assigned active HA state if running
          • Diagram: AMF performs lifecycle mgmt of the non-proxied AMF comp
            process
     – Proxied, non-SA-aware
          • Applications that have knowledge of HA concepts but do not
            directly communicate with AMF
          • Proxy application receives HA “commands” from AMF and forwards
            them to proxied application through a custom interface
          • Diagram: AMF performs proxy lifecycle mgmt through CLC-CLI scripts;
            through the AMF library in the proxy component it sends the proxy HA
            state assignment and the proxied comp lifecycle mgmt & HA state
            assignment requests; the proxy performs lifecycle mgmt & HA state
            assignment for the proxied AMF comp process
Availability Management Framework (AMF)
Service Group Redundancy Models
• 2N
  – Most common redundancy model
  – Preferred assignment model per SI:
       • 1 active resource
       • 1 standby resource
  – SUs can have either all active or all standby SI assignments
  – A.k.a.
       • 1+1, active-standby, active-backup
  – Diagram: SI1 is active (A) on SU1 (Node1) and standby (S) on SU2 (Node2)
• N+M
  – Preferred assignment model per SI:
       • 1 active resource
       • 1 standby resource
  – SUs can have either all active or all standby SI assignments
  – Both N and M are configurable
  – Common variation: N+1
  – Diagram: SI1 is active on SU1 (Node1) and SI2 is active on SU2 (Node2);
    both are standby on SU3 (Node3)
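A small sketch of the 2N model described above, assuming the first SU in the list takes all active assignments and the second all standby ones. The function names and the choice of which SU becomes active are hypothetical; real AMF uses configured rankings.

```python
def assign_2n(sis, sus):
    """Return {si: {"active": su, "standby": su}} for a 2N service group."""
    if len(sus) < 2:
        raise ValueError("2N needs at least two service units")
    active_su, standby_su = sus[0], sus[1]
    # Every SI gets the same pair, so each SU is all-active or all-standby.
    return {si: {"active": active_su, "standby": standby_su} for si in sis}


def failover_2n(assignments):
    """After the active SU fails, the standby SU takes over every SI."""
    return {
        si: {"active": pair["standby"], "standby": None}
        for si, pair in assignments.items()
    }
```

Because all SIs share one active SU, a single SU failure moves the whole workload to the former standby.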
Availability Management Framework (AMF)
Service Group Redundancy Models
•   No redundancy
    – Preferred assignment model per SI:
        • 1 active resource
    – Similar to an N+0 redundancy scheme where N is the number of protected SIs
    – Diagram: SI1 is active on SU1 (Node1); SI2 is active on SU2 (Node2)
•   N-way
    – Preferred assignment model per SI:
        • 1 active resource
        • Y standby resources (where Y is configurable)
    – SUs can concurrently have both active and standby assignments
    – Diagram: SI1 is active on SU1 and standby on SU2 and SU3; SI2 is active
      on SU2 and standby on SU1 and SU3 (SU1 to SU3 on Node1 to Node3)
•   N-way Active
    – Preferred assignment model per SI:
        • X active resources (where X is configurable)
        • No standby resource
    – Diagram: SI1 is active on both SU1 (Node1) and SU2 (Node2)
Availability Management Framework (AMF)
                      Error Recovery Policies
•   Pre-defined AMF component error recovery policies
     – Configurable
     – Can be overridden at runtime
•   Recovery policy scopes
     – Component
     – Service Unit
     – Node
•   Recovery policy types
     – Restart
     – Failover
     – Failfast
•   Up to 3 actions per policy
     – Isolation
     – Recovery
     – Repair
•   Error escalation policies
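One way to picture error escalation is a counter that widens the recovery scope once repeated recoveries at the current scope have not helped. This sketch is illustrative only: the threshold, the `Escalator` class, and the rule "restart at narrower scopes, failover at node scope" are assumptions, not taken from the AMF specification.

```python
SCOPES = ["component", "service unit", "node"]  # scopes from the slide, narrowest first


class Escalator:
    def __init__(self, max_recoveries_per_scope=2):  # hypothetical threshold
        self.max_recoveries = max_recoveries_per_scope
        self.level = 0  # index into SCOPES
        self.count = 0  # recoveries attempted at the current scope

    def on_error(self):
        """Return the (policy, scope) pair to apply for this error report."""
        if self.count >= self.max_recoveries and self.level < len(SCOPES) - 1:
            self.level += 1  # escalate to the next wider scope
            self.count = 0
        self.count += 1
        policy = "restart" if self.level < len(SCOPES) - 1 else "failover"
        return policy, SCOPES[self.level]
```

With the default threshold, the first two errors restart the component, the next two restart the service unit, and later errors trigger a node-level failover.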
System Management Services
             Information Model Management (IMM)
•   Information Model Highlights
     –   Based on pre-defined object classes
         (including AIS classes)
     –   Holds both configuration and runtime
         objects
     –   Used by AIS services to store current
         configuration and runtime state info
     –   Can be used by applications as well
•   Object Management API
     –   Object class management
     –   Access object attribute values
     –   Search information model
     –   Configuration change requests
     –   Administrative operation invocation
•   Object Implementer API
     –   Runtime object management
     –   CCB (configuration change bundle) validation and application
     –   Administrative operation handling
•   OpenSAF Implementation
     –   Persistence of information model
         managed through Persistence BackEnd
         (PBE) feature
     –   Replicated to multiple cluster nodes
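The split between configuration change requests and implementer-side validation can be sketched as an all-or-nothing update. The `InfoModel` class, the validator signature, and the object names are hypothetical; the real IMM API is a C interface with separate validate and apply callbacks.

```python
class InfoModel:
    def __init__(self):
        self.objects = {}     # object name -> attribute dict
        self.validators = []  # implementer-style callbacks: candidate model -> bool

    def apply_ccb(self, changes):
        """Apply all changes in the bundle, or none if any validator rejects."""
        candidate = {name: dict(attrs) for name, attrs in self.objects.items()}
        for name, attrs in changes.items():
            candidate.setdefault(name, {}).update(attrs)
        for validate in self.validators:
            if not validate(candidate):
                return False  # bundle aborted; self.objects is untouched
        self.objects = candidate
        return True
```

A rejected bundle leaves the stored model exactly as it was, even if some of its changes were individually valid.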
System Management Services
Software Management Framework (SMF)
•   SMF controls migration from one deployment configuration to another
•   Upgrade methods
     –   Rolling upgrade
     –   Single step upgrade
•   [De-]Activation Unit Scope
     –   AMF Node
     –   Service Unit
•   During the migration SMF
     –   Maintains the campaign state change model
     –   Takes measures to enable error recovery
     –   Monitors for potential errors caused by the migration
     –   Deploys error recovery procedures
•   Diagram: the Software Management Framework and its interactions
     –   Upgrade Campaign Definition: “upgrade instructions”
     –   Software Repository: install / remove software bundles on target nodes
     –   Information Model: admin operations; read/create/delete/update objects
     –   Adaptation commands (SMF config object)
System Management Services
• Notification (NTF)
   – Publish-and-subscribe semantics for system-level notifications
       • A reader interface is also provided for reading historical alarm
         information
   – Formal syntax and semantics for ITU X.73x notifications:
       • Alarm / security alarm / state change / object create / delete /
         attribute change
   – Used by AIS services to publish service-specific notifications
   – Alarm and security alarm notifications automatically logged
     through LOG service

• Log (LOG)
   –   Flexible, centralized, system-wide logging mechanism
   –   Pre-defined log streams: alarm, notification, system
   –   Supports multiple, custom application log streams
   –   Each log stream is individually configurable
        • Including log file full action: halt, wrap, and rotate
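The three log file full actions can be compared with a toy in-memory stream. This is a simplified model under stated assumptions: records stand in for bytes, `max_rotated` is a hypothetical cap on retained files, and "wrap" is modeled as dropping the oldest record.

```python
class LogStream:
    def __init__(self, max_records, full_action, max_rotated=2):
        if full_action not in ("halt", "wrap", "rotate"):
            raise ValueError(full_action)
        self.max_records = max_records
        self.full_action = full_action
        self.max_rotated = max_rotated  # hypothetical cap on retained files
        self.current = []               # records in the current file
        self.rotated = []               # closed files, oldest first

    def write(self, record):
        """Store the record if the policy allows; return True if stored."""
        if len(self.current) >= self.max_records:
            if self.full_action == "halt":
                return False              # stop logging when full
            if self.full_action == "wrap":
                self.current.pop(0)       # overwrite the oldest record
            else:                         # rotate: close the file, start a new one
                self.rotated.append(self.current)
                self.rotated = self.rotated[-self.max_rotated:]
                self.current = []
        self.current.append(record)
        return True
```

Writing three records to a two-record stream shows the difference: "halt" rejects the third, "wrap" keeps the last two, and "rotate" starts a new file.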
Application Services
• Checkpoint (CKPT)
  – Intended as a state replication mechanism for distributed
    applications
  – Can be used for all standby “temperature levels”
      • Cold
      • Warm
      • Hot
          – Through OpenSAF CKPT service API extension
  – Semantics of a checkpoint
     • Arbitrary set of sections containing opaque data
     • Stored in one or more replicas distributed across cluster
     • Reads and writes occur against the active replica
  – Both synchronous and asynchronous replication options
    available
  – Collocated checkpoint option provided for highest performance
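The checkpoint semantics above (sections of opaque data, an active replica, synchronous versus asynchronous replication) can be sketched as follows. The `Checkpoint` class, node names, and the explicit `flush` step are hypothetical; the real CKPT service is a C API and propagates asynchronous updates on its own.

```python
class Checkpoint:
    def __init__(self, replica_nodes, synchronous=True):
        self.replicas = {node: {} for node in replica_nodes}  # node -> {section: bytes}
        self.active = replica_nodes[0]  # reads and writes go to this replica
        self.synchronous = synchronous
        self.backlog = []               # writes not yet propagated (async mode)

    def write(self, section_id, data):
        self.replicas[self.active][section_id] = data
        if self.synchronous:
            self._propagate(section_id, data)  # update all replicas before returning
        else:
            self.backlog.append((section_id, data))

    def read(self, section_id):
        return self.replicas[self.active][section_id]

    def flush(self):
        """Propagate pending asynchronous writes to the other replicas."""
        for section_id, data in self.backlog:
            self._propagate(section_id, data)
        self.backlog.clear()

    def _propagate(self, section_id, data):
        for node, sections in self.replicas.items():
            if node != self.active:
                sections[section_id] = data
```

In asynchronous mode a write returns as soon as the active replica is updated, so a standby replica may briefly lag behind.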
Application Services
• Event (EVT)
  – Publish-and-subscribe communication paradigm
  – Flexible event channel, pattern, and filtering definition
  – Subscriber event queue maintained within app process
• Message (MSG)
  –   Messages sent to and read from message queues
  –   Single message queue owner at a time
  –   Message queue maintained outside app process
  –   Message queues can be logically grouped
       • Messages can be sent to a message queue group
       • Associated distribution policy (round-robin, broadcast, etc.)
• Lock (LCK)
  – Cluster-wide, distributed lock service
  – Can be used to control access to cluster-level shared resources
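For the message queue groups mentioned under MSG, the distribution policy decides which member queues receive a message. The sketch below models only the two policies named on the slide; the `QueueGroup` class and queue names are hypothetical and do not reflect the MSG C API.

```python
from itertools import cycle


class QueueGroup:
    def __init__(self, queue_names, policy="round-robin"):
        if policy not in ("round-robin", "broadcast"):
            raise ValueError(policy)
        self.queues = {name: [] for name in queue_names}
        self.policy = policy
        self._next = cycle(queue_names)  # rotating cursor for round-robin

    def send(self, message):
        """Deliver per the group's policy; return the names of receiving queues."""
        if self.policy == "broadcast":
            targets = list(self.queues)
        else:
            targets = [next(self._next)]
        for name in targets:
            self.queues[name].append(message)
        return targets
```

Round-robin spreads messages across the member queues in turn, while broadcast copies each message to every member.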
Getting Started with OpenSAF
• OpenSAF Technical Educational Resources
  – Developer Wiki [http://devel.opensaf.org/wiki]
  – OpenSAF Developers blog [http://devel.opensaf.org/blog]
  – OpenSAF mailing lists [Subscribe: http://list.opensaf.org/maillist/listinfo/]
       • Users [Archive: http://list.opensaf.org/pipermail/users/]
       • Announce [Archive: http://list.opensaf.org/pipermail/announce/]
       • Development [Archive: http://list.opensaf.org/pipermail/devel/]
  – Latest documentation
    [http://devel.opensaf.org/hg/opensaf-4.x-documentation/archive/tip.tar.gz]
  – FAQ
    [http://www.opensaf.org/HOA/assn14944/images/FREQUENTLY%20ASKED%20QUESTIONS%20ABOUT%20OPENSAF%20RELEASE%204%20Final%20for%20publication.docx]

  – README files in source code repository
Questions

An Introduction to OpenSAF 5.17.2011

  • 1. Introduction to OpenSAF David Fick Senior Software Architect GoAhead Software
  • 2. Introduction to OpenSAF • Service availability and high availability systems and concepts have been around for decades • However, HA terminology tends to vary from industry to industry and company to company • Goals of this session: – High-level technical overview of the Service Availability™ Forum standards – Overview of the support of those standards within OpenSAF – Allow you to: • Familiarize yourself with SA Forum and OpenSAF concepts and terminology OR • Map the HA concepts and terminology with which you are familiar to the SA Forum and OpenSAF versions – Resources for getting started with OpenSAF
  • 3. SA Forum Interfaces: AIS & HPI Applications Application Interface Specifications (AIS) Service Availability Middleware System Management SAF Software Mgmt Availability Lock (LCK) Framework (SMF) Management Standards Framework (AMF) Implemented Information Checkpoint (CKPT) by OpenSAF Model Mgmt (IMM) Cluster Membership (CLM) Event (EVT) Notification (NTF) Log (LOG) Platform Mgmt (PLM) Message (MSG) Operating System Virtualization Hardware Platform Interface (HPI) Hardware Hardware Hardware Hardware Platform A Platform B Platform C Platform D
  • 4. But how to make sense of the SA Forum “acronym soup”?
  • 5. AIS Service Groupings • First, understand that the AIS services fall into three logical groupings*: System Management Resource Availability Application Services Services Management Services Information Availability Checkpoint (CKPT) Model Mgmt (IMM) Management Framework (AMF) Event (EVT) Software Mgmt Framework (SMF) Cluster Membership (CLM) Message (MSG) Notification (NTF) Platform Mgmt (PLM) Lock (LCK) Log (LOG) Services that manage central Services that manage and Optional services to support system capabilities commonly monitor the state of key system application operations such as: resources that affect availability: • Inter-process used by both: • Hardware / Operating communication • AIS services system • State replication • Applications • Cluster nodes • Shared resource access • Applications control * - Not official SA Forum AIS service groupings
  • 6. Fault Management Cycle • Second, AIS services that manage availability are designed around a standard fault management cycle – Detection Detection • E.g. component healthchecks – Isolation • E.g. blade power off – Recovery Repair Notification Isolation • E.g. failover of workload assignments to associated standby resources – Repair Recovery • E.g. automatic restart of failed resource – Notification • E.g. state change notifications sent by service managing the resource
  • 7. Resource Dependencies • Third, Availability Management in the AIS world is driven by a detailed understanding of the availability management dependencies across all resource types – Managed Applications • Simple to complex dependencies and relationships can be modeled between the various software elements • Dependency on a particular node also modeled – AMF Node • Represents a node where AMF services are provided • Depends on a CLM node – CLM Node • Represents a cluster node where AIS services are provided • Depends on an Execution Environment (optional) – Platform Resource • Containment and logical dependencies represented between platform resources • Execution Environment (EE) – Represents an operating system instance (standalone or virtual) • Hardware Element (HE) – Represents a physical hardware resource in the system
  • 8. Common Design Patterns • Fourth, the AIS services follow common design patterns: – API • Common library lifecycle • Naming conventions – Each resource managed by a service is a managed object • Typically with associated state model • Managed objects stored in common information model – Administrative operations • X.731 style administrative operations for resources which affect availability – Notifications automatically generated by AIS services for significant system events (alarms, state changes, etc.)
  • 9. Resource Availability Management Services • Availability Management Framework (AMF) – Manages the lifecycle and monitors the state of the managed applications within the system – More detail in upcoming slides • Cluster Membership (CLM) – Provides cluster membership change notifications to AIS services and interested applications – OpenSAF CLM implements a cluster management protocol dealing with: • Cluster formation • Active controller selection & failover • Node failure detection • Platform Management (PLM) – Manages state of modeled hardware elements and execution environments (operating system instances) – Hardware element states and events accessed through Hardware Platform Interface (HPI) – Manages graceful blade extraction / de-activation cases – Supports hardware element controls (power on/off and reset) – Optional service within OpenSAF
  • 10. Availability Management Framework (AMF) AMF Logical Entities • Structural Entities – AMF Application • Represents the highest-level service(s) provided by the system – Service Group (SG) • Represents a group of like logical resources that provide the same service(s) • Associated redundancy model (e.g. 1+1) – Service Unit (SU) • Aggregates a set of resources which when combined provide a higher-level service – Component • Represents one or more resources that perform a function within the system • Diagram: containment hierarchy of AMF Application, Service Group, Service Unit and Component, each level containing 1..* of the next
  • 11. Availability Management Framework (AMF) AMF Logical Entities • Workload Entities – Service Instance (SI) • Represents a workload to be supported by the system • Has associated redundancy requirements (1+1, N+M, etc.) • Protected by an identified SG • Assigned to one or more SUs with an HA state of active, standby, quiescing or quiesced – Component Service Instance (CSI) • Represents a more granular workload that needs to be supported by the system • Assigned to one or more components • Diagram: the AMF Application, Service Group, Service Unit and Component hierarchy, with each Service Instance protected by a Service Group and assigned to Service Units, and each Component Service Instance assigned to Components
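The structural and workload entities can be sketched as plain data classes to make the containment and assignment relationships concrete. This is an illustrative model only: the class and field names are hypothetical, and `validate` checks just the 1..* containment rule and the "protected by an identified SG" rule described on the slides, not the full AMF information model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class HAState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    QUIESCING = "quiescing"
    QUIESCED = "quiesced"


@dataclass
class Component:
    name: str


@dataclass
class ServiceUnit:
    name: str
    components: List[Component] = field(default_factory=list)


@dataclass
class ServiceGroup:
    name: str
    redundancy_model: str  # e.g. "2N"
    service_units: List[ServiceUnit] = field(default_factory=list)


@dataclass
class ServiceInstance:
    name: str
    protected_by: str  # name of the protecting service group
    assignments: Dict[str, HAState] = field(default_factory=dict)  # SU name -> HA state


@dataclass
class Application:
    name: str
    service_groups: List[ServiceGroup] = field(default_factory=list)
    service_instances: List[ServiceInstance] = field(default_factory=list)


def validate(app: Application) -> List[str]:
    """Return a list of modeling problems; an empty list means the model is consistent."""
    problems = []
    if not app.service_groups:
        problems.append(f"{app.name}: needs at least one service group")
    sg_by_name = {sg.name: sg for sg in app.service_groups}
    for sg in app.service_groups:
        if not sg.service_units:
            problems.append(f"{sg.name}: needs at least one service unit")
        for su in sg.service_units:
            if not su.components:
                problems.append(f"{su.name}: needs at least one component")
    for si in app.service_instances:
        sg = sg_by_name.get(si.protected_by)
        if sg is None:
            problems.append(f"{si.name}: protecting service group not found")
            continue
        su_names = {su.name for su in sg.service_units}
        for su_name in si.assignments:
            if su_name not in su_names:
                problems.append(f"{si.name}: {su_name} is not in {sg.name}")
    return problems
```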
  • 12. Availability Management Framework (AMF) AMF Logical Entities • Common Characteristics – Well-defined state model for each logical entity type – X.731 style administrative operations • Common AMF Component Types – SA-aware • Applications modified to interact with AMF through AMF API – Non-proxied, non-SA-aware • Legacy or 3rd party applications that typically cannot be modified • Interact with AMF through command line scripts to manage application lifecycle • Always assigned active HA state if running – Proxied, non-SA-aware • Applications that have knowledge of HA concepts but do not directly communicate with AMF • Proxy application receives HA “commands” from AMF and forwards them to proxied application through a custom interface • Diagrams: for each component type, AMF performs lifecycle management through CLC-CLI scripts; SA-aware and proxy component processes also receive HA state assignments through the AMF library; the proxy component receives the proxied component's lifecycle management and HA state assignment requests and carries them out against the proxied component process
  • 13. Availability Management Framework (AMF) Service Group Redundancy Models • 2N – Most common redundancy model – Preferred assignment model per SI: • 1 active resource • 1 standby resource – SUs can have either all active or all standby SI assignments – A.k.a. • 1+1, active-standby, active-backup – Diagram: SI1 assigned active (A) to SU1 on Node1 and standby (S) to SU2 on Node2 • N+M – Preferred assignment model per SI: • 1 active resource • 1 standby resource – SUs can have either all active or all standby SI assignments – Both N and M are configurable – Common variation: N+1 – Diagram: SI1 and SI2, each with one active (A) and one standby (S) assignment, spread across SU1, SU2 and SU3 on Node1, Node2 and Node3
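A minimal sketch of the 2N model may help: one service unit carries every active assignment, the other carries every standby assignment, and a failure of the active unit promotes the standby. The functions `assign_2n` and `fail_over` are hypothetical illustrations of that rule, not OpenSAF code, and they assume exactly two service units.

```python
def assign_2n(service_instances, active_su, standby_su):
    """Assign every SI active on one SU and standby on the other (2N rule)."""
    return {si: {active_su: "active", standby_su: "standby"}
            for si in service_instances}


def fail_over(assignments, failed_su):
    """Drop the failed SU; if it held the active assignment, promote the survivor."""
    result = {}
    for si, by_su in assignments.items():
        remaining = {su: state for su, state in by_su.items() if su != failed_su}
        if by_su.get(failed_su) == "active":
            # The surviving SU was standby for this SI and now becomes active.
            remaining = {su: "active" for su in remaining}
        result[si] = remaining
    return result
```

For example, `fail_over(assign_2n(["SI1"], "SU1", "SU2"), "SU1")` leaves SI1 assigned active to SU2 only, which mirrors the failover step of the fault management cycle.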
  • 14. Availability Management Framework (AMF) Service Group Redundancy Models • No redundancy – Preferred assignment model per SI: • 1 active resource – Similar to an N+0 redundancy scheme where N is the number of protected SIs – Diagram: SI1 active (A) on SU1 (Node1) and SI2 active on SU2 (Node2) • N-way – Preferred assignment model per SI: • 1 active resource • Y standby resources (where Y is configurable) – SUs can concurrently have both active and standby assignments – Diagram: SI1 and SI2, each with one active (A) and two standby (S) assignments, across SU1, SU2 and SU3 on Node1, Node2 and Node3 • N-way Active – Preferred assignment model per SI: • X active resources (where X is configurable) • No standby resource – Diagram: SI1 active on both SU1 (Node1) and SU2 (Node2)
  • 15. Availability Management Framework (AMF) Error Recovery Policies • Pre-defined AMF component error recovery policies – Configurable – Can be overridden at runtime • Recovery policy scopes – Component – Service Unit – Node • Recovery policy types – Restart – Failover – Failfast • Up to 3 actions per policy – Isolation – Recovery – Repair • Error escalation policies
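Error escalation across the three recovery scopes can be sketched as a counter that widens the scope after repeated failures. The threshold and the `Escalator` class are assumptions made for illustration; the slide does not state how OpenSAF decides when to escalate.

```python
# Recovery scopes from the slide, narrowest first.
SCOPES = ["component", "service unit", "node"]


class Escalator:
    """Hypothetical escalation policy: widen the scope after too many failures."""

    def __init__(self, max_failures_per_scope: int = 3):
        self.max_failures = max_failures_per_scope  # assumed threshold
        self.level = 0   # index into SCOPES
        self.count = 0   # failures seen at the current scope

    def report_failure(self) -> str:
        """Record one failure and return the scope at which to recover."""
        self.count += 1
        if self.count > self.max_failures and self.level < len(SCOPES) - 1:
            self.level += 1
            self.count = 1  # this failure is the first at the wider scope
        return SCOPES[self.level]
```

With a threshold of 2, the third consecutive failure is handled at service unit scope and the fifth at node scope; after that the scope stays at node.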
  • 16. System Management Services Information Model Management (IMM) • Information Model Highlights – Based on pre-defined object classes (including AIS classes) – Holds both configuration and runtime objects – Used by AIS services to store current configuration and runtime state info – Can be used by applications as well • Object Management API – Object class management – Access object attribute values – Search information model – Configuration change requests – Administrative operation invocation • Object Implementer API – Runtime object management – CCB validation and application – Administrative operation handling • OpenSAF Implementation – Persistence of information model managed through Persistence BackEnd (PBE) feature – Replicated to multiple cluster nodes
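The split between configuration change requests (Object Management API) and CCB validation and application (Object Implementer API) can be illustrated with a toy change bundle. Everything here is hypothetical: `Ccb`, its methods and the validator callback are stand-ins, and the all-or-nothing behaviour shown is an assumption of this sketch rather than something the slide states.

```python
class Ccb:
    """Toy change bundle: collect changes, validate them together, then apply."""

    def __init__(self, model: dict, validator):
        self.model = model          # object name -> {attribute: value}
        self.validator = validator  # plays the object implementer's role
        self.changes = {}

    def modify(self, name: str, attribute: str, value) -> None:
        """Queue a change; nothing touches the model until apply()."""
        self.changes[(name, attribute)] = value

    def apply(self) -> bool:
        """Validate the whole bundle against a copy; commit only if accepted."""
        candidate = {name: dict(attrs) for name, attrs in self.model.items()}
        for (name, attribute), value in self.changes.items():
            candidate.setdefault(name, {})[attribute] = value
        self.changes.clear()
        if not self.validator(candidate):
            return False  # rejected: the model is left untouched
        self.model.clear()
        self.model.update(candidate)
        return True
```

A validator might, for instance, reject any bundle that sets a negative timeout, in which case none of the queued changes reach the model.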
  • 17. System Management Services Software Management Framework (SMF) • SMF controls migration from one deployment configuration to another • Upgrade methods – Rolling upgrade – Single step upgrade • [De-]Activation Unit Scope – AMF Node – Service Unit • During the migration SMF – Maintains the campaign state change model – Takes measures to enable error recovery – Monitors for potential errors caused by the migration – Deploys error recovery procedures • Diagram: an Upgrade Campaign Definition (“Upgrade Instructions”) drives the Software Management Framework, which installs / removes software bundles on target nodes from the Software Repository, runs adaptation commands (SMF config object), and performs admin operations and Read/Create/Delete/Update of objects in the Information Model
  • 18. System Management Services • Notification (NTF) – Publish-and-subscribe semantics for system-level notifications • Reader interface for reading historical alarm info as well – Formal syntax and semantics for ITU X.73x notifications: • Alarm / security alarm / state change / object create / delete / attribute change – Used by AIS services to publish service-specific notifications – Alarm and security alarm notifications automatically logged through LOG service • Log (LOG) – Flexible, centralized, system-wide logging mechanism – Pre-defined log streams: alarm, notification, system – Supports multiple, custom application log streams – Each log stream is individually configurable • Including log file full action: halt, wrap, or rotate
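The three log file full actions can be compared with a small in-memory model. The semantics below are an assumed reading of the names: halt refuses further records, wrap overwrites the oldest record in the same file, and rotate starts a new file while keeping a bounded number of old ones. `LogStream` and its parameters are hypothetical and unrelated to the real LOG service API.

```python
from collections import deque


class LogStream:
    """In-memory stand-in for a log stream with a configurable full action."""

    def __init__(self, max_records: int, full_action: str, max_files: int = 2):
        if full_action not in ("halt", "wrap", "rotate"):
            raise ValueError(f"unknown full action: {full_action}")
        self.max_records = max_records
        self.full_action = full_action
        self.max_files = max_files      # only used by "rotate"
        self.files = deque([[]])        # oldest file first, current file last

    def write(self, record: str) -> bool:
        """Append a record; return False if the stream refused it."""
        current = self.files[-1]
        if len(current) >= self.max_records:
            if self.full_action == "halt":
                return False            # stop logging once the file is full
            if self.full_action == "wrap":
                current.pop(0)          # overwrite the oldest record
            else:                       # rotate: open a new file
                self.files.append([])
                if len(self.files) > self.max_files:
                    self.files.popleft()  # discard the oldest file
                current = self.files[-1]
        current.append(record)
        return True
```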
  • 19. Application Services • Checkpoint (CKPT) – Intended as a state replication mechanism for distributed applications – Can be used for all standby “temperature levels” • Cold • Warm • Hot (through an OpenSAF CKPT service API extension) – Semantics of a checkpoint • Arbitrary set of sections containing opaque data • Stored in one or more replicas distributed across cluster • Reads and writes occur against the active replica – Both synchronous and asynchronous replication options available – Collocated checkpoint option provided for highest performance
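The checkpoint semantics (opaque sections, several replicas, reads and writes against the active replica) can be sketched as follows. The `Checkpoint` class is a hypothetical in-memory model; in particular, treating asynchronous replication as "queued until `flush`" is an assumption of this sketch, not a description of the CKPT implementation.

```python
class Checkpoint:
    """Toy checkpoint: named sections of opaque bytes held in several replicas."""

    def __init__(self, replica_nodes: list, synchronous: bool = True):
        self.replicas = {node: {} for node in replica_nodes}
        self.active = replica_nodes[0]   # reads and writes go here
        self.synchronous = synchronous
        self._pending = []               # updates not yet copied (async mode)

    def write(self, section_id: str, data: bytes) -> None:
        """Write to the active replica, then replicate now or later."""
        self.replicas[self.active][section_id] = data
        if self.synchronous:
            self._propagate(section_id, data)
        else:
            self._pending.append((section_id, data))

    def read(self, section_id: str) -> bytes:
        """Read from the active replica."""
        return self.replicas[self.active][section_id]

    def flush(self) -> None:
        """Copy queued updates to the other replicas (async mode only)."""
        for section_id, data in self._pending:
            self._propagate(section_id, data)
        self._pending.clear()

    def _propagate(self, section_id: str, data: bytes) -> None:
        for node, sections in self.replicas.items():
            if node != self.active:
                sections[section_id] = data
```

A standby process on another node would read its local replica after a failover; how current that replica is depends on the replication mode chosen.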
  • 20. Application Services • Event (EVT) – Publish-and-subscribe communication paradigm – Flexible event channel, pattern, and filtering definition – Subscriber event queue maintained within app process • Message (MSG) – Messages sent to and read from message queues – Single message queue owner at a time – Message queue maintained outside app process – Message queues can be logically grouped • Messages can be sent to a message queue group • Associated distribution policy (round-robin, broadcast, etc.) • Lock (LCK) – Cluster-wide, distributed lock service – Can be used to control access to cluster-level shared resources
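Two of the message queue group distribution policies named above, round-robin and broadcast, can be contrasted in a few lines. `QueueGroup` is a hypothetical model that keeps queues in memory; it is not the MSG service API.

```python
from collections import deque
from itertools import cycle


class QueueGroup:
    """Toy message queue group with a round-robin or broadcast policy."""

    def __init__(self, queue_names: list, policy: str = "round-robin"):
        if policy not in ("round-robin", "broadcast"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.queues = {name: deque() for name in queue_names}
        self._next = cycle(list(queue_names))  # rotation order for round-robin

    def send(self, message) -> list:
        """Deliver a message and return the names of the queues that received it."""
        if self.policy == "broadcast":
            targets = list(self.queues)
        else:
            targets = [next(self._next)]
        for name in targets:
            self.queues[name].append(message)
        return targets
```

Round-robin spreads load across the queue owners, while broadcast gives every queue in the group its own copy of each message.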
  • 21. Getting Started with OpenSAF • OpenSAF Technical Educational Resources – Developer Wiki [http://devel.opensaf.org/wiki] – OpenSAF Developers blog [http://devel.opensaf.org/blog] – OpenSAF mailing lists [Subscribe: http://list.opensaf.org/maillist/listinfo/] • Users [Archive: http://list.opensaf.org/pipermail/users/] • Announce [Archive: http://list.opensaf.org/pipermail/announce/] • Development [Archive: http://list.opensaf.org/pipermail/devel/] – Latest documentation [http://devel.opensaf.org/hg/opensaf-4.x-documentation/archive/tip.tar.gz] – FAQ [http://www.opensaf.org/HOA/assn14944/images/FREQUENTLY%20ASKED%20QUESTIONS%20ABOUT%20OPENSAF%20RELEASE%204%20Final%20for%20publication.docx] – README files in source code repository