Demystifying Oracle RAC Internals
Barb Lundhild, RAC Product Management
The following is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
Agenda
Answer the most common questions about Oracle Clusterware and Oracle RAC
• Architecture
• Oracle Clusterware – Group Membership
• Oracle Cluster Registry
• The Interconnect
• The Public Network and the Virtual IP (VIP)
• Oracle RAC Startup/Shutdown
• Advanced Features of Oracle RAC
• Appendix
Architecture
RAC Architecture
[Figure: Node1 … Node n each run the Operating System, Oracle Clusterware, ASM, a database instance (instance 1 … instance n), a Listener, and a Service, with a VIP (VIP1 … VIPn) on the public network. The nodes are joined by the cluster interconnect and share storage holding the redo and archive logs of all instances and the database and control files (managed by ASM), and the OCR and voting disks (on RAW devices).]
What does Clusterware provide?
[Figure: Oracle Clusterware, running on the Operating System, provides Group Membership, a Process Monitor, a High Availability Framework, Event Management, and the VIP.]
Oracle Clusterware
Group Membership and Heartbeats
• The cluster needs to know who is a member at all times
• Oracle Clusterware has two heartbeats:
  • Network heartbeat
    If a node does not send a network heartbeat for MissCount seconds, the node is evicted from the cluster
  • Disk heartbeat
    If a node does not update its disk heartbeat within the disk I/O timeout, the node is evicted from the cluster
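The two eviction rules above can be modeled as a small pure function. This is a minimal sketch, not the CSS implementation: it assumes each node's last network and disk heartbeat times are known in seconds, uses the 30 s default MissCount stated in this deck, and takes the disk I/O timeout as a caller-supplied parameter (the value used in the usage lines is hypothetical).

```python
from dataclasses import dataclass

MISSCOUNT_S = 30  # default MissCount (network heartbeat timeout), per this deck


@dataclass
class NodeHeartbeat:
    name: str
    last_network_hb: float  # time (s) the last network heartbeat was received
    last_disk_hb: float     # time (s) the node last updated its disk heartbeat


def eviction_reason(node, now, disk_timeout_s, misscount_s=MISSCOUNT_S):
    """Return why the node would be evicted, or None if it stays a member."""
    if now - node.last_network_hb > misscount_s:
        return "network heartbeat missed for longer than MissCount"
    if now - node.last_disk_hb > disk_timeout_s:
        return "disk heartbeat not updated within the I/O timeout"
    return None


if __name__ == "__main__":
    nodes = [
        NodeHeartbeat("node1", last_network_hb=98.0, last_disk_hb=99.0),
        NodeHeartbeat("node2", last_network_hb=60.0, last_disk_hb=99.0),
    ]
    # disk_timeout_s=200 is a hypothetical value chosen only for this example.
    for n in nodes:
        print(n.name, eviction_reason(n, now=100.0, disk_timeout_s=200))
```

Either heartbeat alone is enough to evict a node, which is why the checks are independent.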
Oracle Clusterware
Split Brain Resolution
• When the interconnect breaks, Clusterware keeps the largest possible sub-cluster up and evicts the other nodes; in a two-node cluster, the node with the lowest node number remains
• I/O fencing, similar to the STONITH ("shoot the other node in the head") algorithm
• The voting disk is used to detect network problems that could lead to a split brain
  • Final arbiter of the status of configured nodes, either up or down, and delivers eviction notices
• At least 3 voting disks are recommended
• Standard NFS is supported for a third voting disk on Linux, AIX, or Solaris
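The split-brain rule and the voting-disk recommendation can be sketched together. This is an illustrative model, not Oracle's algorithm. It assumes that when sub-clusters are the same size, the one containing the lowest node number survives (a generalization of the two-node rule above), and that a node must reach a strict majority of the voting disks to stay up, which is why 3 voting disks tolerate the loss of one.

```python
def surviving_subcluster(subclusters):
    """Pick the sub-cluster that stays up after the interconnect breaks.

    subclusters: list of sets of node numbers that can still reach each other.
    The largest wins; on a tie, the one holding the lowest node number wins
    (assumed generalization of the two-node rule).
    """
    return max(subclusters, key=lambda nodes: (len(nodes), -min(nodes)))


def evicted_nodes(subclusters):
    """Node numbers outside the surviving sub-cluster, in ascending order."""
    winner = surviving_subcluster(subclusters)
    return sorted(set().union(*subclusters) - winner)


def has_voting_majority(accessible_disks, total_disks):
    """Assumed rule: a node must reach more than half of the voting disks."""
    return accessible_disks > total_disks // 2


if __name__ == "__main__":
    print(evicted_nodes([{1}, {2}]))           # two-node split
    print(evicted_nodes([{1, 2}, {3, 4, 5}]))  # larger sub-cluster wins
    print(has_voting_majority(2, 3))           # one of three disks lost
```

With an even number of disks (say 4), losing half of them already breaks the majority, so an odd count gives the same protection with one less disk.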
IT IS NOT SUPPORTED TO REDUCE MISSCOUNT BELOW THE DEFAULT (30s)
Oracle Cluster Registry
Oracle Cluster Registry (OCR)
• A repository containing the definition of the cluster configuration and the status of resources managed by the cluster
• Required file(s) for Oracle Clusterware
• Initialized during installation of Oracle Clusterware
• Location defined in the Registry on Windows, or in ocr.loc on Linux and Unix
• Mirrored by Oracle Clusterware or protected externally (RAID)
• Supports both automatic (every 4 hours) and manual (new in 11.1) backups
  ocrconfig -manualbackup
Oracle Cluster Registry (OCR)
• Tools to manage the OCR:
  • OCRCONFIG – command-line tool to manage backup, restore, import, export, repair, and replace
    • Make sure you have a good backup before changing the cluster configuration!
  • OCRCHECK – checks integrity and displays the version of the OCR's block format, total space available, used space, and the OCR locations that you have configured
  • OCRDUMP – writes the OCR contents to a file or stdout in a readable format
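The advice to have a good backup before changing the cluster configuration can be scripted. A minimal sketch, assuming Oracle Clusterware 11.1 or later (for `ocrconfig -manualbackup`), that `ocrcheck` and `ocrconfig` are on the PATH and run as root, and that a non-zero exit status means failure (an assumption about the tools). The function `backup_then_change` and the `change` callback are hypothetical; the command runner is injectable so the flow can be dry-run without a cluster.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(argv: Sequence[str]) -> None:
    """Run a command; raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(argv), check=True)


def backup_then_change(change: Callable[[], None], run: Runner = run_checked) -> None:
    """Check the OCR, take a manual backup, and only then apply the change.

    If either command fails, the exception propagates and change() never runs.
    """
    run(["ocrcheck"])                    # integrity check of configured OCR locations
    run(["ocrconfig", "-manualbackup"])  # on-demand OCR backup (11.1 and later)
    change()                             # hypothetical configuration change


if __name__ == "__main__":
    log = []
    # Dry run: record the commands instead of executing them.
    backup_then_change(lambda: log.append("<configuration change>"),
                       run=lambda argv: log.append(" ".join(argv)))
    print("\n".join(log))
```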
Interconnect
Failure Protection and Scalability
Private Interconnect
[Figure: Node1, Node 2 … Node n, each running the Operating System, Oracle Clusterware, ASM, an instance, a Listener, and a Service with its VIP on the public network; the private cluster interconnect between the nodes runs through two switches (Switch 1 and Switch 2).]
The Interconnect
• The interconnect is typically a standard Gigabit Ethernet (GigE) network
  • IP over InfiniBand (IPoIB) is supported
• The network should use a private, dedicated, non-routable switch or VLAN
• A crossover cable is not supported as an interconnect
• For high availability and scalability, use an OS-based solution (for example, NIC bonding on Linux or IPMP on Solaris) to combine multiple physical links into a single logical link
  • The same technology can be applied to the public network
  • Only the logical link should be provided to Oracle Clusterware, and therefore to Oracle RAC
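Two addressing rules for the interconnect can be checked before installation. A minimal sketch using Python's standard `ipaddress` module. It assumes the private interconnect uses RFC 1918 address space (a common way to keep it non-routable, not a rule stated on this slide) and that it must not overlap the public subnet. The public address comes from the `ifconfig` output in this deck; the interconnect addresses are hypothetical.

```python
import ipaddress

# RFC 1918 private ranges (assumption: the interconnect uses one of these).
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def interconnect_problems(interconnect_if, public_if):
    """List problems with a proposed IPv4 interconnect address.

    Both arguments are 'address/netmask' strings, e.g. '10.0.0.1/255.255.255.0'.
    """
    priv = ipaddress.ip_interface(interconnect_if).network
    pub = ipaddress.ip_interface(public_if).network
    problems = []
    if not any(priv.subnet_of(block) for block in RFC1918):
        problems.append(f"{priv} is not in RFC 1918 private address space")
    if priv.overlaps(pub):
        problems.append(f"{priv} overlaps the public subnet {pub}")
    return problems


if __name__ == "__main__":
    public = "144.15.214.10/255.255.252.0"  # public eth0 from the ifconfig output
    print(interconnect_problems("192.168.10.1/255.255.255.0", public))  # hypothetical
    print(interconnect_problems("144.15.214.50/255.255.252.0", public))
```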
Public Network and VIP
Failure Protection
Why does Oracle RAC have a VIP?
• Protects database clients from long TCP/IP timeouts (which can exceed 10 minutes)
• During normal operation, it works the same as the host name
• During a failure, it removes the network timeout from the connection request time: the client fails over immediately to the next address in the list
sales.us.acme.com =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = on)
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = tcp)(HOST = sales1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = tcp)(HOST = sales2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = sales.us.acme.com)
    )
  )
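Why the VIP matters can be shown with a toy model of a client walking the ADDRESS_LIST above. This is a simulation, not Oracle Net code: the 600 s TCP timeout and the 0.01 s refusal time are hypothetical numbers, the hosts `sales1` and `sales2` (physical names without a VIP) are hypothetical, and addresses are tried in listed order (LOAD_BALANCE=on would randomize the order).

```python
from dataclasses import dataclass

TCP_TIMEOUT_S = 600.0  # hypothetical: connecting to a dead host's own IP hangs
REFUSE_S = 0.01        # hypothetical: a relocated VIP returns an error at once


@dataclass
class Address:
    host: str
    state: str  # "up", "host_down" (no VIP in front), or "vip_relocated"


def connect_time(addresses):
    """Walk the address list in order; return (host reached, seconds spent)."""
    elapsed = 0.0
    for addr in addresses:
        if addr.state == "up":
            return addr.host, elapsed
        # A relocated VIP fails fast; a dead physical address waits for TCP.
        elapsed += REFUSE_S if addr.state == "vip_relocated" else TCP_TIMEOUT_S
    return None, elapsed


if __name__ == "__main__":
    with_vip = [Address("sales1-vip", "vip_relocated"), Address("sales2-vip", "up")]
    without_vip = [Address("sales1", "host_down"), Address("sales2", "up")]
    print(connect_time(with_vip))
    print(connect_time(without_vip))
```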
Oracle RAC VIP
The Details
• One for each node in the cluster
• Required for Oracle Clusterware installation
• The IP address and network name should not already be in use
• Should be registered in DNS and must be on the same subnet as the public IP address
• Configuration is managed by VIPCA and SRVCTL
• Note that the VIP netmask defaults to 255.255.255.0, rather than to the netmask of the underlying physical interface
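The same-subnet rule and the netmask default can be illustrated with the addresses from the `ifconfig` output in this deck (public IP 144.15.214.10 and VIP 144.15.214.30, interface mask 255.255.252.0). A minimal sketch using the standard `ipaddress` module; the function `vip_check` is hypothetical.

```python
import ipaddress

DEFAULT_VIP_NETMASK = "255.255.255.0"  # VIP netmask default noted on the slide


def vip_check(public_ip, public_mask, vip, vip_mask=DEFAULT_VIP_NETMASK):
    """Report whether a VIP is on the public subnet and uses a matching netmask."""
    pub_net = ipaddress.ip_interface(f"{public_ip}/{public_mask}").network
    vip_net = ipaddress.ip_interface(f"{vip}/{vip_mask}").network
    return {
        "same_subnet": ipaddress.ip_address(vip) in pub_net,
        "mask_matches": vip_net == pub_net,
        "public_network": str(pub_net),
        "vip_network": str(vip_net),
    }


if __name__ == "__main__":
    # With the default /24 mask the VIP lands in 144.15.214.0/24,
    # while the interface is on 144.15.212.0/22.
    print(vip_check("144.15.214.10", "255.255.252.0", "144.15.214.30"))
    print(vip_check("144.15.214.10", "255.255.252.0", "144.15.214.30", "255.255.252.0"))
```

The address passes the same-subnet test either way; only the netmask comparison exposes the default-mask mismatch.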
Oracle RAC VIP is DIFFERENT
• It only accepts connections when on its home node
• On failure of its home node, it relocates to another node in the cluster only to send an error back to the client (the listener there does not serve it, so connections are not accepted!)
• Each node has only one active RAC VIP of its own (there may be others that have relocated to it due to failure!)
• The number of VIPs is independent of the number of databases running in the cluster
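The VIP behavior above can be captured in a small state model. A sketch under stated assumptions: `RacVip`, `relocate`, and `vips_on` are hypothetical names, "accepting" simply means the VIP is on its home node, and the choice of target node on relocation (first by name) is arbitrary.

```python
class RacVip:
    """Toy model of one node VIP (names are hypothetical)."""

    def __init__(self, name, home):
        self.name = name
        self.home = home     # node the VIP belongs to
        self.current = home  # node currently hosting it

    def relocate(self, surviving_nodes):
        """Home node failed: move the VIP to a surviving node (first by name here)."""
        self.current = sorted(surviving_nodes)[0]

    def accepts_connections(self):
        # Only on its home node does a listener serve the VIP; elsewhere it just
        # answers with an error so the client moves to the next address.
        return self.current == self.home


def vips_on(node, vips):
    """Return (node's own active VIPs, VIPs relocated to it from failed nodes)."""
    here = [v for v in vips if v.current == node]
    return ([v.name for v in here if v.home == node],
            [v.name for v in here if v.home != node])


if __name__ == "__main__":
    vip1, vip2 = RacVip("VIP1", "node1"), RacVip("VIP2", "node2")
    vip1.relocate({"node2"})            # node1 fails
    print(vip1.accepts_connections())   # relocated VIP refuses connections
    print(vips_on("node2", [vip1, vip2]))
```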
Oracle RAC VIP
The VIP (144.15.214.30) appears as the alias interface eth0:1 on the public interface eth0:
[root@pmrac1 root]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:12:79:D8:90:93
          inet addr:144.15.214.10  Bcast:144.15.215.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5070815 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3064435 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:509963813 (486.3 Mb)  TX bytes:3621223517 (3453.4 Mb)
          Interrupt:25
eth0:1    Link encap:Ethernet  HWaddr 00:12:79:D8:90:93
          inet addr:144.15.214.30  Bcast:144.15.215.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5762695 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5679252 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3400642002 (3243.1 Mb)  TX bytes:3166774792 (3020.0 Mb)
          Interrupt:25
Application VIPs
• New resource as of Oracle RAC 10g Release 2
• Created as functional VIPs that can be used to connect to an application regardless of the node it is running on
• The VIP is a resource on which the user-registered application depends
• There can be many VIPs, one per user application
Creating an Application VIP
• The usrvip script must run as root
• The default permissions need to be changed after registration
  • As root:
    crs_setperm ApplicationVIP1 -o root
• Allow the oracle user to execute this script
  • As root:
    crs_setperm ApplicationVIP1 -u user:oracle:r-x
• Start the VIP
  • As oracle:
    crs_start ApplicationVIP1
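The post-registration steps above can be scripted so each command runs in order and as the right user. A sketch under assumptions: the `crs_setperm` and `crs_start` commands and arguments are exactly those shown above and are on the PATH, `ApplicationVIP1` has already been registered, and the `STEPS` table and `run_steps` helper are hypothetical. By default the script only prints the plan.

```python
import subprocess

RESOURCE = "ApplicationVIP1"

# (required OS user, command), in the order given above
STEPS = [
    ("root", ["crs_setperm", RESOURCE, "-o", "root"]),
    ("root", ["crs_setperm", RESOURCE, "-u", "user:oracle:r-x"]),
    ("oracle", ["crs_start", RESOURCE]),
]


def run_steps(as_user, execute=False):
    """Return the commands this user must run; execute them only if asked."""
    mine = [cmd for user, cmd in STEPS if user == as_user]
    if execute:
        for cmd in mine:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return mine


if __name__ == "__main__":
    for user in ("root", "oracle"):
        for cmd in run_steps(user):  # dry run: print the plan only
            print(f"as {user}: {' '.join(cmd)}")
```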
Oracle Dependencies
[Figure: resource dependencies in a two-node cluster (Node1 and Node 2): VIP1 and VIP2 on the public network and, on each node, a Service, a Listener, an instance, and ASM running on Oracle Clusterware and the Operating System; the nodes share the interconnect and shared storage holding the redo and archive logs of all instances and the database and control files (managed by ASM), and the OCR and voting disks (on RAW devices).]
Oracle Dependencies
Prior to 10.2.0.3
[Figure: the same two-node cluster, showing the resource dependencies as they were prior to 10.2.0.3.]
Advanced Features of RAC
High Availability and Load Balancing for Applications
Services
• Application workloads can be defined as Services
  • Individually managed and controlled
  • Assigned to instances during normal startup
  • Automatically reassigned on instance failure
  • Service performance individually tracked
  • Finer-grained control with Resource Manager
  • Integrated with other Oracle tools and facilities (e.g., Scheduler, Streams)
• Managed by Oracle Clusterware
• Several services are created and managed by the database server
  • Many features discussed here do not apply to the default database service
Cluster Managed Services
• A service has a set of resources defined to Oracle Clusterware
• Oracle Clusterware starts, stops, and relocates the service based on its definition
• Define Preferred (normal operations) and Available (if a failure occurs) instances
• Depends on the instance and the VIP
• Manage using:
  • Enterprise Manager
  • SRVCTL CLI for cluster configuration
  • DBMS_SERVICE PL/SQL package
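The Preferred/Available behavior can be sketched as a placement function. This is a simplified model, not Clusterware's policy engine: it assumes the service should run on as many instances as there are preferred instances, that it moves off a failed instance to the next preferred-then-available instance that is up, and that it stays put when a failed preferred instance comes back. Instance names are hypothetical. In 10g and 11g, these two lists correspond to the `-r` (preferred) and `-a` (available) options of `srvctl add service`.

```python
def reassign(current, preferred, available, up):
    """Return the instances a service should run on after a membership change.

    current: instances the service runs on now ([] before first start)
    preferred, available: ordered instance lists from the service definition
    up: set of instances that are currently running
    """
    keep = [i for i in current if i in up]         # unaffected placements stay
    needed = max(0, len(preferred) - len(keep))    # target: one per preferred
    candidates = [i for i in preferred + available if i in up and i not in keep]
    return keep + candidates[:needed]


if __name__ == "__main__":
    pref, avail = ["sales1", "sales2"], ["sales3"]
    s = reassign([], pref, avail, {"sales1", "sales2", "sales3"})
    print(s)  # normal startup: runs on the preferred instances
    s = reassign(s, pref, avail, {"sales2", "sales3"})
    print(s)  # sales1 failed: moved to the available instance
    s = reassign(s, pref, avail, {"sales1", "sales2", "sales3"})
    print(s)  # sales1 is back: the service is not moved in this model
```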
What is FAN?
• Fast Application Notification (FAN) is a RAC notification mechanism
• FAN HA Events: notification of Up/Down events for services, instances, and nodes
• Load Balancing Advisory Events: advise clients of the current load for a service and where to send connection requests
• Enable it and forget it.
Oracle Notification Service (ONS)
• Publish/subscribe messaging system
• Allows both local and remote consumption
• Used by Fast Application Notification (FAN) to publish HA events and Load Balancing events
• Used by FAN clients to subscribe to events
• Automatically installed and configured when Oracle Clusterware is installed
• DO NOT TURN OFF – Required by Oracle Clusterware and RAC
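ONS's role as a publish/subscribe bus for FAN events can be sketched in-process. This is a toy model, not the ONS API or wire protocol: the `EventBus` class is hypothetical, and the event text uses a simplified `TYPE key=value ...` layout loosely modeled on FAN HA events (an assumption, not the exact format). The subscriber shows what an FCF-enabled pool does with a DOWN event: it discards connections to the failed instance.

```python
from collections import defaultdict


def parse_event(text):
    """Split 'TYPE key=value ...' into (TYPE, {key: value})."""
    kind, *pairs = text.split()
    return kind, dict(pair.split("=", 1) for pair in pairs)


class EventBus:
    """In-process stand-in for a publish/subscribe service such as ONS."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # event type -> callbacks

    def subscribe(self, kind, callback):
        self._subscribers[kind].append(callback)

    def publish(self, text):
        kind, fields = parse_event(text)
        for callback in self._subscribers[kind]:
            callback(fields)


if __name__ == "__main__":
    pool = {"sales1": 5, "sales2": 5}  # hypothetical: open connections per instance

    def on_instance_event(fields):
        # Like an FCF-enabled pool: drop connections to an instance that went down.
        if fields.get("status") == "down":
            pool.pop(fields["instance"], None)

    bus = EventBus()
    bus.subscribe("INSTANCE", on_instance_event)
    bus.publish("INSTANCE service=sales.us.acme.com instance=sales1 status=down")
    print(pool)
```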
FAN Clients
• HA Events: JDBC Implicit Connection Cache, OCI, ODP.NET Connection Pools, Listener, Server Side Callouts, CMAN
• Load Balancing Advisory Events: JDBC Implicit Connection Cache, ODP.NET Connection Pools, Listener, CMAN
• New with 11.1.0.7: Universal Connection Pool for Java
Fast Connection Failover
• Fast and reliable high availability for connections in an Oracle Real Application Clusters 10g environment
• Enable it and forget it
• The application can make failover transparent to the user by trapping the SQLException and retrying
• Supported by Oracle JDBC, OCI, and ODP.NET
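The "trap the exception and retry" advice can be sketched generically. This is not Oracle driver code: `StaleConnectionError` stands in for the driver's error (SQLException in JDBC), `get_connection` is a hypothetical pool call, and the limit of 3 attempts is an arbitrary choice. Retrying is only safe for work that can be repeated, because uncommitted work on the failed instance is lost.

```python
class StaleConnectionError(Exception):
    """Stand-in for the driver error raised when a connection's instance fails."""


def run_with_retry(get_connection, work, retries=3):
    """Run work(conn); if the connection turns out to be dead, retry on a new one."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for _ in range(retries):
        conn = get_connection()  # with FCF, the pool normally purges dead connections
        try:
            return work(conn)
        except StaleConnectionError as err:
            last_error = err     # trap the error and try again
    raise last_error


if __name__ == "__main__":
    conns = iter(["conn-to-failed-instance", "conn-to-healthy-instance"])

    def work(conn):
        if conn == "conn-to-failed-instance":
            raise StaleConnectionError(conn)
        return f"query ran on {conn}"

    print(run_with_retry(lambda: next(conns), work))
```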
Load Balancing Advisory
• The Load Balancing Advisory is an advisory for balancing work across RAC instances
• Balances load at the transaction level (not connections!)
• Directs work to where services are executing well and resources are available
• Adjusts the distribution for nodes of different power, workloads of different priority and shape, and changing demand
• Stops sending work to slow, hung, or failed nodes early
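Distributing work in proportion to the advisory can be sketched with smooth weighted round-robin, a generic technique (used, for example, by nginx upstreams) swapped in here for illustration; it is not Oracle's pool implementation. Assumptions: the advisory is reduced to one integer percentage per instance, and an instance advised at 0% (slow, hung, or failed) receives no work. Instance names and percentages are hypothetical.

```python
def smooth_weighted_rr(weights, n):
    """Return n instance picks spread in proportion to integer weights.

    weights: {instance: advised percentage}; 0 means send no work there.
    """
    active = {inst: w for inst, w in weights.items() if w > 0}
    if not active:
        raise ValueError("no instance is advised to receive work")
    total = sum(active.values())
    current = {inst: 0 for inst in active}
    picks = []
    for _ in range(n):
        for inst, w in active.items():
            current[inst] += w                  # everyone earns credit by weight
        best = max(current, key=current.get)    # highest credit gets this request
        current[best] -= total                  # and pays for it
        picks.append(best)
    return picks


if __name__ == "__main__":
    advice = {"inst1": 50, "inst2": 30, "inst3": 20, "inst4": 0}  # inst4 is hung
    print(smooth_weighted_rr(advice, 10))
```

Unlike plain weighted round-robin, the smooth variant interleaves picks instead of sending bursts to the heaviest instance.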
Runtime Connection Load Balancing
• When the application calls “getConnection”, the pool returns the connection that will provide the best service
• Supported by Oracle JDBC, OCI, and ODP.NET connection pools
• The policy is defined by setting the GOAL on the service
• Requires Oracle Net Services connection load balancing
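Leverage Temporal Connection Affinity
New with 11.1.0.7
[Figure: a Web Client gets connections from a connection Pool to a RAC Database with Instance1, Instance2, and Instance3; an Affinity Context ("Connect to me") directs the client's subsequent requests back to the same instance.]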
Leverage XA Connection Affinity
New with 11.1.0.7
• Oracle Database 11g fixes the correctness problem; XA affinity adds performance and scalability
• Eliminates the current single DTP service limitation for XA on RAC
• XA affinity is the ability to automatically localize a global transaction to a single RAC instance
• Scope is the life of a global transaction
  • The first connection request for a global transaction uses Runtime Connection Load Balancing (RCLB)
  • Subsequent requests use affinity and are routed to the same RAC instance where the XA transaction first started
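The XA affinity rule can be sketched as a routing table keyed by global transaction ID. A minimal model under assumptions: `XaAffinityRouter` is a hypothetical class, `choose_instance` stands in for Runtime Connection Load Balancing on the first request, the XID strings are hypothetical, and the affinity entry is dropped when the global transaction ends, matching the stated scope.

```python
class XaAffinityRouter:
    """Route every request of a global transaction to one RAC instance."""

    def __init__(self, choose_instance):
        self._choose = choose_instance  # stands in for RCLB on the first request
        self._affinity = {}             # global transaction id -> instance

    def route(self, xid):
        """First request picks via RCLB; later requests stick to that instance."""
        if xid not in self._affinity:
            self._affinity[xid] = self._choose()
        return self._affinity[xid]

    def end(self, xid):
        """Commit or rollback ends the global transaction and its affinity."""
        self._affinity.pop(xid, None)


if __name__ == "__main__":
    rclb_picks = iter(["inst2", "inst3", "inst1"])  # hypothetical RCLB decisions
    router = XaAffinityRouter(lambda: next(rclb_picks))
    print(router.route("xid-A"), router.route("xid-A"), router.route("xid-B"))
    router.end("xid-A")
    print(router.route("xid-A"))  # a new transaction with this id is balanced again
```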
Useful Metalink Notes
• Note 342082.1 “How to Change Subnet Masks for VIPs”
• Note 294430.1 “CSS Timeout Computation in RAC 10g”
• Note 284752.1 “10g RAC: Steps To Increase CSS Misscount, Reboottime and Disktimeout”
• Note 291962.1 “Setting Up Bonding in SLES 9”
• Note 291958.1 “Setting Up Bonding in Suse SLES8”
• Note 298891.1 “Configuring Linux for the Oracle 10g VIP using bonding”
• Note 283107.1 “Configuring Solaris IP Multipathing (IPMP) for the Oracle 10g VIP”
OTN.ORACLE.COM/RAC
• Workload Management with Oracle Real Application
Clusters (FAN, FCF, Load Balancing)
• Using standard NFS to support a third voting disk on a
stretch cluster configuration on Linux
• Using Oracle Clusterware to Protect 3rd Party
Applications
• New: otn.oracle.com/clusterware
• RAC Sample Code Page
http://www.oracle.com/technology/sample_code/products/rac/index.html