Oracle Client Failover - Under The Hood
Robert Bialek
Principal Consultant
Who Am I
Principal Consultant and Trainer at Trivadis GmbH in Munich
– MSc in Computer Engineering
Focus:
– Oracle Database High Availability
– Database Architecture/Internals
– Backup/Recovery
– Troubleshooting/Performance Tuning
– Linux Administration
Trainer for the following Trivadis courses
– Oracle Grid Infrastructure, RAC, Data Guard
10.09.2016
Main Problems To Address
New network session (connect)
Database Clients
1 IP not reachable (server/network/… issue)
2 Connect attempts
3 Wait for timeout
4 Client failover
Problem
Already established network session (re-connect)
Database Clients
1 Connected
2 IP not reachable (server/network/… issue)
3 Re-connect attempts
4 Wait for timeout
5 Client failover
Problem
Oracle Client Failover – And The Solution?
Depends strongly on many factors
– Oracle client and database version
– Oracle database configuration, edition and available licenses
– Oracle client libraries/version (OCI, JDBC Thin,…)
– Application design
– Network topology, latencies
– Operating system type, version and configuration
– With or without Virtual IP Addresses (VIP)
Unfortunately, there is no one-size-fits-all solution…
Agenda
1. Operating System
Introduction
Connect/Re-Connect Timeouts
Virtual IP Addresses
TCP Keepalive
2. Oracle Client Failover
Database Services
Connect/Re-Connect Timeouts
Transparent Application Failover
Fast Application Notification/Fast Connection Failover
Application Continuity
3. Conclusions
Operating System
Operating System – Introduction
TCP three-way handshake
Server: socket(), bind() -> fd -> socket:[inode], LISTEN
Client: connect() -> fd -> socket:[inode]
1 Client to server: [SYN] Seq=0 (client: SYN-SENT)
2 Server to client: [SYN, ACK] Seq=0 Ack=1 (server: SYN-RCVD)
3 Client to server: [ACK] Seq=1 Ack=1 (both sides: ESTABLISHED)
Data Transfer: read(), write() on one side, write(), read() on the other; Seq=Seq+1
Problem?
New Network Session – Connect Timeout
Kernel parameter: tcp_syn_retries
– Max. number of times initial SYNs for an active TCP connection attempt will be retransmitted
– Default value in OEL 5 is 5, as of OEL 6 it is 6
– Initial Retransmission Timeout (RTO) is 1s (changed in RHEL6.3 from 3 to 1)
– To change the value (not persistent)
#Connect timeout after 15 sec.
sysctl -w net.ipv4.tcp_syn_retries=3
Example with tcp_syn_retries=5 (IP not reachable):
[SYN] Seq=0, RTO=1 sec.
1 [TCP Retransmission] [SYN] Seq=0, RTO=2 sec.
…
4 [TCP Retransmission] [SYN] Seq=0, RTO=16 sec.
5 [TCP Retransmission] [SYN] Seq=0, RTO=32 sec.
Final Timeout = 2^(tcp_syn_retries+1)-1
Timeout/Error after 63 sec.(*)
ORA-12170: TNS:Connect timeout occurred
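The final timeout formula can be checked with a few lines of Java. This is only a sketch of the arithmetic, assuming an initial RTO of 1 second that doubles with every retransmission, as on the slide; the class and method names are made up for illustration.

```java
public class SynRetryTimeout {
    // Seconds until connect() gives up: the initial SYN plus tcpSynRetries
    // retransmissions, each waiting twice as long as the previous one.
    static long finalTimeoutSeconds(int tcpSynRetries) {
        long total = 0;
        long rto = 1; // assumed initial RTO in seconds
        for (int attempt = 0; attempt <= tcpSynRetries; attempt++) {
            total += rto;
            rto *= 2;
        }
        return total; // equals 2^(tcpSynRetries+1) - 1
    }

    public static void main(String[] args) {
        for (int retries : new int[] {3, 5, 6}) {
            System.out.println("tcp_syn_retries=" + retries + " -> "
                    + finalTimeoutSeconds(retries) + " sec.");
        }
    }
}
```

For tcp_syn_retries=5 this gives 63 seconds and for tcp_syn_retries=3 it gives 15 seconds, matching the values quoted on the slide.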
New Network Session – Connect Timeout/ARP
Connect timeouts controlled by tcp_syn_retries come into play when the client's ARP (Address Resolution Protocol) cache is not up to date!
ARP cache (192.168.122.29)
IP:192.168.122.30 MAC:...:60:d4:0d REACHABLE
ARP cache (192.168.122.30)
IP:192.168.122.29 MAC:...:1d:54:ec REACHABLE
Broadcast ARP: Who has 192.168.122.29? Tell 192.168.122.30
Ethernet Frame
IP Packet
Source: IP, MAC
Destination: IP, MAC
ARP cache not refreshed yet: client connect timeout (tcp_syn_retries)
ARP cache refreshed (<ARP entry removed>): client connect timeout ~3 sec. (the same network segment), ORA-12543: TNS: destination host unreachable
Established Network Session – Re-Connect Timeout
Kernel parameter: tcp_retries2
– Max. number of TCP packet retransmissions for established sessions minus 1
– Default value: 15, Timeout range: ~15-30 min.
– Initial RTO is 0.2 sec, max 120 sec.
– At runtime, the RTO can be changed by the kernel
– To change the value (not persistent)
#Connect timeout after ~12 sec.
sysctl -w net.ipv4.tcp_retries2=4
Example with tcp_retries2=3:
Data: [PSH, ACK], RTO=0.2 sec.
1 [TCP Retransmission] [PSH, ACK], RTO=0.4 sec.
2 [TCP Retransmission] [PSH, ACK], RTO=0.8 sec.
3 [TCP Retransmission] [PSH, ACK], RTO=1.6 sec.
4 [TCP Retransmission] [PSH, ACK], RTO=3.2 sec.
Timeout/Error after 6.3 sec.
ORA-03113: end-of-file on communication channel
ss -ipo dst 192.168.122.29
socket timer:(on,1min44sec,11)
socket timer:(on,49sec,11) #1 sec. later
Virtual IP Addresses (VIP)
IP addresses which do not correspond persistently to physical NICs
Client connects to network socket: <VIP>:<Port>
Before relocation: eth0:1 VIP on Server A; ARP cache: VIP:192.168.122.30 MAC:eth0<ServerA>
VIP Relocate: eth0:1 VIP moves to Server B; flushing neighbours' ARP cache; TCP [RST]
After relocation: ARP cache: VIP:192.168.122.30 MAC:eth0<ServerB>
Network – TCP Keepalive (DCD)
TCP mechanism which helps to detect broken network connections
Kernel parameters
net.ipv4.tcp_keepalive_time = 7200 #first keepalive probe after 2 hrs. of idle time
net.ipv4.tcp_keepalive_intvl = 75 #if not reachable, probe every 75 sec.
net.ipv4.tcp_keepalive_probes = 9 #close the connection after 9 failed probes
For Oracle server (shadow) processes, automatically enabled on the network socket
– Implementation changed in 12c (TCP socket timer instead of Oracle Net probes)
For Oracle client processes, not activated by default
– Unless ENABLE=BROKEN is specified in the connect descriptor
Local Address Foreign Address State PID/Program name Timer
192.168.122.2:38814 192.168.122.3:15300 ESTABLISHED 5963/sqlplus off(0.00/0/0)
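How long a dead peer can go unnoticed with the kernel keepalive defaults is simple arithmetic. A minimal sketch, assuming the connection is idle, keepalive is enabled on the socket, and no probe is ever answered; the class and method names are illustrative only.

```java
public class KeepaliveDetection {
    // Seconds of silence before the kernel drops an idle, keepalive-enabled
    // connection: idle time before the first probe plus all unanswered probes.
    static long detectionSeconds(long keepaliveTime, long keepaliveIntvl, int keepaliveProbes) {
        return keepaliveTime + keepaliveIntvl * keepaliveProbes;
    }

    public static void main(String[] args) {
        // tcp_keepalive_time, tcp_keepalive_intvl, tcp_keepalive_probes
        long seconds = detectionSeconds(7200, 75, 9);
        System.out.println(seconds + " sec. (about " + seconds / 60 + " min.)");
    }
}
```

With the default values this comes to 7875 seconds, roughly 2 hours and 11 minutes, which is why the defaults alone are of little help for fast client failover.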
Oracle Client Failover
The Foundation – Dynamic Database Services
A named representation of one or more running Oracle database instances
– Introduced with the Oracle 8i version
– Part of the Oracle client connect descriptor
– Basis of Oracle database high availability and workload management
Configurations: RAC Active/Active; RAC Active/Passive; Data Guard, RAC One Node, Failover DB
Clients connect to the database service; services are registered with the listener
The Foundation – Dynamic Database Services
Database services can be created with
– srvctl (Grid Infrastructure), gdsctl (Global Data Services)
– dbms_service.create_service() PL/SQL procedure
Different high availability and workload management attributes can be defined
srvctl add service
-db <db_unique_name>
-service <service>
-preferred "<preferred_list>"
-available "<available_list>"
-serverpool <pool_name>
-cardinality [UNIFORM | SINGLETON]
-tafpolicy [NONE | BASIC | PRECONNECT]
-role [PRIMARY, PHYSICAL_STANDBY, LOGICAL_STANDBY, SNAPSHOT_STANDBY]
-clbgoal [SHORT | LONG]
-rlbgoal [SERVICE_TIME | THROUGHPUT | NONE]
...
Some attributes are applicable only for specific configurations
Most of them are available only with srvctl/gdsctl
New Oracle Net Session – Connect Timeout
sqlnet.ora parameters
Address description parameters (>=11gR2)
– Override sqlnet.ora parameters
– Parameters can be used for OCI, ODP.net
OLTP.trivadis.com =
(DESCRIPTION =
(FAILOVER=ON) (LOAD_BALANCE=OFF)
(CONNECT_TIMEOUT=5)(RETRY_COUNT=3)(RETRY_DELAY=1)(TRANSPORT_CONNECT_TIMEOUT=3)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP )(HOST = italy )(PORT = 1521)) #italy : SCAN
(ADDRESS = (PROTOCOL = TCP )(HOST = sweden )(PORT = 1521))) #sweden : SCAN
(CONNECT_DATA = (SERVICE_NAME = OLTP.trivadis.com)))
sqlnet.ora:
TCP.CONNECT_TIMEOUT=3 #default 60 sec.
SQLNET.OUTBOUND_CONNECT_TIMEOUT=5
Connect phases (client to LSNR): 1 Three-way handshake, 2 Oracle Net
#Max. connect time failover re-tries: 3+3+(3*6)=24
New in 12c
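The worst-case figure of 24 seconds in the comment above can be reproduced as follows. This is a sketch of the slide's arithmetic only: it assumes every attempt against each of the two addresses runs into the full TRANSPORT_CONNECT_TIMEOUT, and, like the slide's formula, it leaves RETRY_DELAY out. The class and method names are hypothetical.

```java
public class ConnectTimeBudget {
    // Worst-case seconds spent walking the address list once, then repeating
    // the whole list retryCount more times.
    static int maxConnectSeconds(int addresses, int transportConnectTimeout, int retryCount) {
        int onePass = addresses * transportConnectTimeout;
        return onePass + retryCount * onePass;
    }

    public static void main(String[] args) {
        // 2 addresses, TRANSPORT_CONNECT_TIMEOUT=3, RETRY_COUNT=3
        System.out.println(maxConnectSeconds(2, 3, 3) + " sec."); // 3+3+(3*6)
    }
}
```

Changing any of the three inputs shows quickly how the address list length and RETRY_COUNT multiply the time a client may spend before it reports an error.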
New Oracle Net Session – Connect Timeout
JDBC Thin URL
– TRANSPORT_CONNECT_TIMEOUT will be available in the 12.2 version
JDBC Thin clients can alternatively use the following driver property
– Overrides CONNECT_TIMEOUT from address description parameters
pds.setURL("jdbc:oracle:thin:@(DESCRIPTION =(FAILOVER=ON)(LOAD_BALANCE=OFF)" +
"(CONNECT_TIMEOUT=3)(RETRY_COUNT=10)(RETRY_DELAY=1)" +
"(ADDRESS_LIST = " +
"(ADDRESS = (PROTOCOL = TCP )(HOST = blue.trivadis.com )(PORT = 1521)) " +
"(ADDRESS = (PROTOCOL = TCP )(HOST = brown.trivadis.com )(PORT = 1521))) " +
"(CONNECT_DATA = (SERVICE_NAME = sales_rw.trivadis.com)))");
Properties prop = new Properties();
prop.put(oracle.net.ns.SQLnetDef.TCP_CONNTIMEOUT_STR, ""+3000); // 3000ms
ods.setConnectionProperties(prop);
New in 12.1.0.2, Patch 19154304
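The idea behind FAILOVER=ON combined with a connect timeout can be illustrated with plain java.net sockets. This is not how the Oracle drivers are implemented; it is a minimal sketch, with made-up class and method names, that walks an address list in order and moves on as soon as one address fails or the timeout expires. The demo uses two loopback ports instead of real database hosts.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

public class AddressListFailover {
    // Try each address in order and return the first socket that connects in time.
    static Socket connectWithFailover(List<InetSocketAddress> addresses, int connectTimeoutMs)
            throws IOException {
        IOException lastError = null;
        for (InetSocketAddress address : addresses) {
            Socket socket = new Socket();
            try {
                socket.connect(address, connectTimeoutMs); // gives up after connectTimeoutMs at the latest
                return socket;
            } catch (IOException e) {
                lastError = e;
                try { socket.close(); } catch (IOException ignored) { }
            }
        }
        throw new IOException("no address in the list is reachable", lastError);
    }

    // Local demo: the first address refuses the connection, the second has a listener.
    static boolean demoFailsOverToSecondAddress() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket closedLater = new ServerSocket(0, 1, loopback);
             ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            int deadPort = closedLater.getLocalPort();
            closedLater.close(); // nothing listens on deadPort any more
            List<InetSocketAddress> addresses = Arrays.asList(
                    new InetSocketAddress(loopback, deadPort),
                    new InetSocketAddress(loopback, listener.getLocalPort()));
            try (Socket socket = connectWithFailover(addresses, 3000)) {
                return socket.getPort() == listener.getLocalPort();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("failed over to second address: " + demoFailsOverToSecondAddress());
    }
}
```

Without the explicit timeout in connect(), an unreachable first address would hold the client for as long as the operating system keeps retransmitting SYNs.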
Established Oracle Net Session – Re-Connect Timeout
Break established network connection without waiting for long TCP timeouts
sqlnet.ora parameters
– Parameters can be used for OCI and ODP.net clients
– The actual wait time is 2 x timeout value (wait for timeout -> switch into break and reset mode -> wait for timeout)!
For JDBC Thin clients set the following connection property
Figure: Oracle Net session from process P1 to LSNR, followed by client failover (steps 1-3)
SQLNET.RECV_TIMEOUT=30 #no default value
SQLNET.SEND_TIMEOUT=30 #no default value
Properties prop = new Properties();
prop.put ("oracle.jdbc.ReadTimeout", "5000"); //5000ms
ods.setConnectionProperties(prop);
Transparent Application Failover (TAF)
TAF is a feature of the client OCI driver that masks many failures from the end users
– Automatic re-connection, resumable queries, session migration
Example restrictions
– Uncommitted transactions are rolled back
– PL/SQL and session state is lost (callback functions might be a solution)
– Parallel query, database links, SYS user are not supported
– Stored procedure read is not supported
– Does not work after server process failure (ORA-03113)
Error codes reported by TAF: ORA-25400 – ORA-25425
Figure labels: Oracle Net (1, 2); Fetched; Lost; Fetched; Discarded
Transparent Application Failover (TAF)
TAF properties can be set on the client or server side (recommended, higher priority)
RAC graceful session migration with TAF
srvctl add service
-tafpolicy [NONE | BASIC | PRECONNECT]
-failovertype [NONE | SESSION | SELECT | TRANSACTION]
-failovermethod [NONE | BASIC] #not strictly necessary
-failoverdelay <failover_delay>
-failoverretry <failover_retries>
srvctl stop service -db <db_unique_name> -instance <instance>
-service <service> [-force]
EXEC DBMS_SERVICE.DISCONNECT_SESSION('<service>', DBMS_SERVICE.POST_TRANSACTION)
or
srvctl stop service -db <db_unique_name> -instance <instance>
-service <service> [-force]
srvctl stop instance -db <db_unique_name> -service <service>
-stopoption "TRANSACTIONAL LOCAL" #Warning: 600 sec. timeout
Fast Application Notification (FAN)
Provides rapid notification about status changes (up/down events) for database
services, instances and nodes
Delivers workload information about services (runtime load balancing)
Starting with Oracle 12c, ONS is used as the FAN transport for all client types
FAN event consists of header and payload information
Figure: ONS to ONS, FAN Subscribers
** Event Header **
Notification Type: database/event/service
Event payload:
VERSION=1.0 event_type=SERVICEMEMBER
service=sales.TRIVADIS.COM instance=RAC1
database=rac_site1 db_domain=TRIVADIS.COM
host=cldb01 status=down reason=FAILURE
timestamp=2016-09-01 18:46:52 timezone=+02:00
Fast Application Notification (FAN)
Oracle Grid Infrastructure is necessary to register with ONS
– ONS default ports – local: 6100, remote: 6200
– Configured and started automatically for GI cluster installations
– For GI standalone systems, ONS needs to be activated and configured (e.g. Data Guard)
Database needs to be registered in OCR/OLR with the ora.database.type type
– Does not work for user defined resources (failover databases)
Can be used with different client types: JDBC, OCI, ODP.net
– Integrated with UCP, starting with 11gR2 FAN API can be used (SimpleFan.jar)
srvctl enable ons
srvctl modify ons -remoteservers <remote_node> -verbose
srvctl start ons
Fast Application Notification (FAN)
Correct database service configuration is necessary
Beginning with the 12c version (client and server), FAN-enabled clients can use FAN auto-configuration
– For older versions you need to specify the ONS endpoints manually
ONS connects to a maximum of 3 database nodes in each node group (group=cluster)
– JDBC system property can be set manually
srvctl add service #The same for GDS (gdsctl)
-clbgoal [SHORT|LONG] #LONG is the default
-rlbgoal [SERVICE_TIME | THROUGHPUT | NONE]
-notification [TRUE | FALSE] #To enable FAN for OCI/ODP.net connections
pds.setONSConfiguration("nodes=blue.trivadis.com:6200,brown.trivadis.com:6200");
java -Doracle.ons.maxconnections=8 <your_program>
Fast Connection Failover (FCF)
Pre-configured client side FAN integration for JDBC clients
Reacts to up/down FAN events (e.g. removing dead connections from connection pool)
Do not configure TAF with FCF for JDBC thick (OCI) clients
Example how to use FCF with Universal Connection Pool (UCP)
– Configure ONS and database service
– Include UCP and ONS libraries in your CLASSPATH
CLASSPATH=.:/usr/lib/oracle/12.1/client64/lib/ojdbc7.jar:/usr/lib/oracle/12.1/client64/lib/ons.jar:/usr/lib/oracle/12.1/client64/lib/ucp.jar
Figure: ONS, Connection Pool (60 connections)
Fast Connection Failover (FCF)
To subscribe to FAN events and use HA UCP features you need to activate FCF first
import oracle.ucp.jdbc.PoolDataSourceFactory;
import oracle.ucp.jdbc.PoolDataSource;
import oracle.ucp.jdbc.oracle.OracleJDBCConnectionPoolStatistics;
...
try {
    PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
    pds.setConnectionFactoryClassName("oracle.jdbc.pool.OracleDataSource");
    pds.setURL(dbURL);
    pds.setUser(username);
    pds.setPassword(password);
    // Setting connection pool properties
    pds.setInitialPoolSize(5);
    pds.setMinPoolSize(5);
    pds.setMaxPoolSize(200);
    pds.setConnectionPoolName("JDBC_UCP");
    pds.setFastConnectionFailoverEnabled(true); //not activated by default!
Fast Connection Failover (FCF)
FCF restrictions
– In-flight transactions are lost as well as calls in the middle of execution
– Does not work after server process failure (No more data to read from socket)
– Application exception handling is absolutely necessary
isValid() method is used to check the borrowed connection after SQL exception
    // do some work
} catch (SQLException ex) {
    if (conn == null || !((ValidConnection) conn).isValid()) {
        // Process FCF info...
    }
    ...
    if (conn != null) conn.close(); //close the connection and later borrow a new one
}
Fast Connection Failover (FCF) – Connection Draining
UCP 12.1.0.2 introduced a new system property – connection draining time
– Period of time to migrate unborrowed connections
– Default (pre 12.1.0.2): all unborrowed connections are migrated immediately
// Migration Rate = CurrentPoolSize/PlannedDrainingPeriod sec.
System.setProperty("oracle.ucp.PlannedDrainingPeriod", Integer.toString(120));
Figure: ONS, UCP; 1 Service Relocate, 2 Service Down
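The migration rate formula from the code comment above is easy to try out. A small sketch, assuming a pool of 60 connections (an illustrative figure) and the 120-second draining period set above; the class and method names are hypothetical and not part of UCP.

```java
public class DrainingRate {
    // Migration Rate = CurrentPoolSize / PlannedDrainingPeriod (connections per second)
    static double connectionsPerSecond(int currentPoolSize, int plannedDrainingPeriodSeconds) {
        return (double) currentPoolSize / plannedDrainingPeriodSeconds;
    }

    public static void main(String[] args) {
        double rate = connectionsPerSecond(60, 120);
        System.out.println(rate + " connections per second");
    }
}
```

With these numbers the pool would migrate 0.5 connections per second, that is one connection every two seconds, instead of moving all unborrowed connections at once.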
Application Continuity (AC)
Addresses temporary recoverable outages of instances, databases and network communications
Transaction Guard – server side component
– Transaction state is recorded and retrievable within database in order to ensure idempotent execution on replay (DBMS_APP_CONT.GET_LTXID_OUTCOME)
– Can be used standalone using Oracle Client 12c for JDBC thin, OCI and ODP.net
– Available with Oracle 12c Enterprise Edition
Oracle 12c JDBC Replay Driver – client side component
– Replays the failed request so that the client may simply continue
– As of 12c Release 1 implemented only for JDBC thin client
Application Continuity requires RAC or RAC One Node or ADG (GG) option
Application Continuity (AC)
Example AC/TG interaction with UCP
1 Check-out connection (Request begin)
2 Associate LTXID, send LTXID to the driver
3 Work: INS/DEL/UPD/COM (Replay Buffer: INS, DEL, UPD, COM)
4 Communication Break
5 Recoverable Error (SQL Exception)
6 Request new connection
7 Check the last LTXID outcome
8 If safe, Replay
9 Check-in connection (Request end)
Processing Phases: Runtime, Re-Connect, Replay
Application Continuity (AC)
Application Continuity with UCP
PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
pds.setConnectionFactoryClassName("oracle.jdbc.replay.OracleDataSourceImpl");
...
conn = pds.getConnection(); // Implicit database request begin
// JDBC calls protected by Application Continuity
conn.close(); // Implicit database request end
Application Continuity without connection pool
import oracle.jdbc.replay.OracleDataSourceImpl;
OracleDataSourceImpl ods = new OracleDataSourceImpl();
conn = ods.getConnection();
...
((ReplayableConnection)conn).beginRequest(); // Explicit database request begin
// JDBC calls protected by Application Continuity
((ReplayableConnection)conn).endRequest(); // Explicit database request end
conn.close();
Application Continuity (AC)
Database service attributes for AC and TG
srvctl add service
-failovertype TRANSACTION # to enable Application Continuity
-commit_outcome TRUE # to enable Transaction Guard
-retention 86400 # the number of seconds the commit outcome is retained
-replay_init_time 900 # seconds after which replay will not be initiated
-failoverretry 20
-failoverdelay 2
-notification TRUE # with Oracle Restart, to avoid ORA-44781 during service start
Some restrictions:
– Autonomous transactions, XA, ADG with read/write DB links, GoldenGate or Logical Standby databases not supported
Error handling still necessary (non-recoverable errors, replay not possible, etc.)
Conclusions
To achieve high availability, correct client-side configuration for failover is crucial
Tuning OS kernel parameters is not the preferred way to go
VIP addresses are very useful in cluster environments
Dynamic database services are key to client high availability
At least Oracle client connect timeouts should be set (be careful with re-connect timeouts)
TAF/FAN/FCF are very powerful
– But with some limitations – and exception handling is still necessary!
AC helps to transparently replay in-flight transactions
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 

Trivadis TechEvent 2016 Oracle Client Failover - Under the Hood by Robert Bialek

  • 1. Oracle Client Failover - Under The Hood
      Robert Bialek, Principal Consultant
      BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
  • 2. Who Am I
      Principal Consultant and Trainer at Trivadis GmbH in Munich
      – MSc in Computer Engineering
      Focus:
      – Oracle Database High Availability
      – Database Architecture/Internals
      – Backup/Recovery
      – Troubleshooting/Performance Tuning
      – Linux Administration
      Trainer for the following Trivadis courses
      – Oracle Grid Infrastructure, RAC, Data Guard
      10.09.2016
  • 3. Main Problems To Address
      New network session (connect), database clients:
      1. IP not reachable (server/network/… issue)
      2. Connect attempts
      3. Wait for timeout
      4. Client failover
      Already established network session (re-connect), database clients:
      1. Connected
      2. IP not reachable (server/network/… issue)
      3. Re-connect attempts
      4. Wait for timeout
      5. Client failover
      [Diagram: both flows carry "Problem" markers]
  • 4. Oracle Client Failover – And The Solution?
      Depends strongly on many factors
      – Oracle client and database version
      – Oracle database configuration, edition and available licenses
      – Oracle client libraries/version (OCI, JDBC Thin,…)
      – Application design
      – Network topology, latencies
      – Operating system type, version and configuration
      – With or without Virtual IP Addresses (VIP)
      Unfortunately, there is no one-size-fits-all solution…
  • 5. Agenda
      1. Operating System
         – Introduction
         – Connect/Re-Connect Timeouts
         – Virtual IP Addresses
         – TCP Keepalive
      2. Oracle Client Failover
         – Database Services
         – Connect/Re-Connect Timeouts
         – Transparent Application Failover
         – Fast Application Notification/Fast Connection Failover
         – Application Continuity
      3. Conclusions
  • 6. Operating System
  • 7. Operating System – Introduction
      [Diagram: TCP connection setup and data transfer between client and server]
      Server side: socket(), bind(), state LISTEN; fd -> socket:[inode]
      Client side: connect(); fd -> socket:[inode]
      TCP three-way handshake:
      1. Client sends [SYN] Seq=0 (client state: SYN-SENT)
      2. Server answers [SYN, ACK] Seq=0 Ack=1 (server state: SYN-RCVD)
      3. Client sends [ACK] Seq=1 Ack=1 (both sides: ESTABLISHED)
      Data transfer: read(), write() on one side and write(), read() on the other; Seq=Seq+1
      The diagram carries two "Problem?" markers.
  • 8. New Network Session – Connect Timeout
      Kernel parameter: tcp_syn_retries
      – Max. number of times initial SYNs for an active TCP connection attempt will be retransmitted
      – Default value in OEL 5 is 5, as of OEL 6 it is 6
      – Initial Retransmission Timeout (RTO) is 1 sec. (changed in RHEL 6.3 from 3 to 1)
      – To change the value (not persistent):
          #Connect timeout after 15 sec.
          sysctl -w net.ipv4.tcp_syn_retries=3
      Example with tcp_syn_retries=5, IP not reachable:
          [SYN] Seq=0, RTO=1 sec.
          [SYN] Seq=0, RTO=2 sec.  [TCP Retransmission] 1
          …
          [SYN] Seq=0, RTO=16 sec. [TCP Retransmission] 4
          [SYN] Seq=0, RTO=32 sec. [TCP Retransmission] 5
      Final Timeout = 2^(tcp_syn_retries+1)-1
      Timeout/Error after 63 sec.(*)
          ORA-12170: TNS:Connect timeout occurred
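The final-timeout formula on slide 8 can be checked with a few lines of Java. This is a minimal sketch, assuming the 1 sec. initial RTO and the doubling on every retransmission shown on the slide; the class and method names are made up for illustration and are not part of the deck.

```java
// Illustrative only: class and method names are not from the slides.
final class SynRetryTimeout {

    // Sums the waits 1 + 2 + 4 + ... for the initial SYN plus
    // tcp_syn_retries retransmissions (initial RTO assumed to be 1 sec.).
    static long finalTimeoutSeconds(int tcpSynRetries) {
        long total = 0;
        long rto = 1;
        for (int attempt = 0; attempt <= tcpSynRetries; attempt++) {
            total += rto;
            rto *= 2;
        }
        return total; // same value as 2^(tcp_syn_retries+1) - 1
    }

    public static void main(String[] args) {
        for (int retries : new int[] {3, 5, 6}) {
            System.out.println("tcp_syn_retries=" + retries
                    + " -> " + finalTimeoutSeconds(retries) + " sec.");
        }
    }
}
```

With tcp_syn_retries=5 this gives the 63 sec. from the slide, and with tcp_syn_retries=3 the 15 sec. quoted next to the sysctl command.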
  • 9. New Network Session – Connect Timeout/ARP
      Connect timeouts controlled by tcp_syn_retries come into play when the client's ARP (Address Resolution Protocol) cache is not up to date!
      [Diagram: IP packet (source IP/MAC, destination IP/MAC) inside an Ethernet frame, exchanged between 192.168.122.29 and 192.168.122.30]
      ARP cache (192.168.122.29):
          IP:192.168.122.30 MAC:...:60:d4:0d REACHABLE
      ARP cache (192.168.122.30):
          IP:192.168.122.29 MAC:...:1d:54:ec REACHABLE
      Broadcast ARP: Who has 192.168.122.29? Tell 192.168.122.30
      – ARP cache not refreshed yet: client connect timeout (tcp_syn_retries)
      – ARP cache refreshed (<ARP entry removed>): client connect timeout ~3 sec. (the same network segment)
          ORA-12543: TNS: destination host unreachable
  • 10. Established Network Session – Re-Connect Timeout
      Kernel parameter: tcp_retries2
      – Max. number of TCP packet retransmissions for established sessions minus 1
      – Default value: 15, timeout range: ~15-30 min.
      – Initial RTO is 0.2 sec., max. 120 sec.
      – At runtime, the RTO can be changed by the kernel
      – To change the value (not persistent):
          #Connect timeout after ~12 sec.
          sysctl -w net.ipv4.tcp_retries2=4
      Example with tcp_retries2=3 (data sent on an established session):
          [PSH, ACK], RTO=0.2 sec.
          [PSH, ACK], RTO=0.4 sec. [TCP Retransmission] 1
          [PSH, ACK], RTO=0.8 sec. [TCP Retransmission] 2
          [PSH, ACK], RTO=1.6 sec. [TCP Retransmission] 3
          [PSH, ACK], RTO=3.2 sec. [TCP Retransmission] 4
      Timeout/Error after 6.3 sec.
          ORA-03113: end-of-file on communication channel
      Socket timer as shown by ss:
          ss -ipo dst 192.168.122.29
          socket timer:(on,1min44sec,11)
          socket timer:(on,49sec,11)   #1 sec. later
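The backoff drawn on slide 10 can be approximated in code. The sketch below is an idealised model, not the kernel's algorithm: it assumes a fixed initial RTO of 0.2 sec., doubling up to the 120 sec. maximum, and, following the diagram, tcp_retries2 + 2 waits in total (the original packet plus tcp_retries2 + 1 retransmissions). Since the kernel adapts the RTO at runtime, measured values differ slightly, for example the 6.3 sec. reported on the slide versus 6.2 sec. here. All names are hypothetical.

```java
// Illustrative only: an idealised model of the slide's diagram.
final class RetransmissionTimeout {

    static final long INITIAL_RTO_MS = 200;    // 0.2 sec., as on the slide
    static final long MAX_RTO_MS = 120_000;    // 120 sec., as on the slide

    // Assumption from the diagram: tcp_retries2 + 2 waits, each one RTO long,
    // with the RTO doubling after every wait until it reaches the maximum.
    static long idealTimeoutMillis(int tcpRetries2) {
        long total = 0;
        long rto = INITIAL_RTO_MS;
        for (int i = 0; i < tcpRetries2 + 2; i++) {
            total += rto;
            rto = Math.min(rto * 2, MAX_RTO_MS);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int retries : new int[] {3, 4, 15}) {
            System.out.println("tcp_retries2=" + retries + " -> "
                    + idealTimeoutMillis(retries) / 1000.0 + " sec.");
        }
    }
}
```

Under these assumptions the default of 15 lands at roughly 17 minutes, inside the ~15-30 min. range given on the slide.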
  • 11. Virtual IP Addresses (VIP)
      IP addresses which do not correspond persistently to physical NICs
      Client connects to network socket: <VIP>:<Port>
      [Diagram: VIP relocation from Server A to Server B]
      – Before: VIP on Server A (eth0:1); ARP cache: VIP:192.168.122.30 MAC:eth0<ServerA>
      – After: VIP on Server B (eth0:1); ARP cache: VIP:192.168.122.30 MAC:eth0<ServerB>
      – Diagram labels: VIP relocate, flushing neighbours' ARP cache, TCP [RST]
  • 12. Network – TCP Keepalive (DCD)
      TCP mechanism which helps to detect broken network connections
      Kernel parameters
          net.ipv4.tcp_keepalive_time = 7200    #keepalive probe every 2 hrs.
          net.ipv4.tcp_keepalive_intvl = 75     #if not reachable probe every 75 sec.
          net.ipv4.tcp_keepalive_probes = 9     #close the connection after 9 failed probes
      For Oracle server (shadow) processes, automatically enabled on the network socket
      – Implementation changed in 12c (TCP socket timer instead of Oracle Net probes)
      Not activated by default for Oracle client processes
      – Unless ENABLE=BROKEN is specified in the connect descriptor
      Example socket of a client process (timer off):
          Local Address        Foreign Address      State        PID/Program name  Timer
          192.168.122.2:38814  192.168.122.3:15300  ESTABLISHED  5963/sqlplus      off(0.00/0/0)
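To put the three keepalive parameters from slide 12 into perspective, the following sketch adds them up. It assumes a completely idle connection whose peer has vanished, so that the first probe goes out after tcp_keepalive_time and every probe stays unanswered; the class and method names are hypothetical.

```java
// Illustrative only: class and method names are not from the slides.
final class KeepaliveDetection {

    // Idle time before the first probe, plus one interval per unanswered probe.
    static long detectionSeconds(long keepaliveTime, long keepaliveIntvl, int keepaliveProbes) {
        return keepaliveTime + keepaliveIntvl * keepaliveProbes;
    }

    public static void main(String[] args) {
        long seconds = detectionSeconds(7200, 75, 9); // values from the slide
        System.out.println("Dead peer detected after about " + seconds + " sec. ("
                + seconds / 60 + " min.)");
    }
}
```

With the values shown on the slide, a dead peer is noticed only after more than two hours, which is why these kernel settings alone are not a practical failover mechanism.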
  • 13. Oracle Client Failover
  • 14. The Foundation – Dynamic Database Services
      A named representation of one or more running Oracle database instances
      – Introduced with the Oracle 8i version
      – Part of the Oracle client connect descriptor
      – Basis of Oracle database high availability and workload management
      [Diagram: clients connect to a database service; services are registered with the listener. Configurations shown: RAC Active/Active; RAC Active/Passive; Data Guard, RAC One Node, Failover DB]
  • 15. The Foundation – Dynamic Database Services
      Database services can be created with
      – srvctl (Grid Infrastructure), gdsctl (Global Data Services)
      – dbms_service.create_service() PL/SQL procedure
      Different high availability and workload management attributes can be defined
          srvctl add service -db <db_unique_name> -service <service>
            -preferred "<preferred_list>" -available "<available_list>"
            -serverpool <pool_name>
            -cardinality [UNIFORM | SINGLETON]
            -tafpolicy [NONE | BASIC | PRECONNECT]
            -role [PRIMARY, PHYSICAL_STANDBY, LOGICAL_STANDBY, SNAPSHOT_STANDBY]
            -clbgoal [SHORT | LONG]
            -rlbgoal [SERVICE_TIME | THROUGHPUT | NONE]
            ...
      – Some attributes are applicable only to specific configurations
      – Most of them are available only with srvctl/gdsctl
  • 16. New Oracle Net Session – Connect Timeout
      sqlnet.ora parameters
          TCP.CONNECT_TIMEOUT=3               #default 60 sec.
          SQLNET.OUTBOUND_CONNECT_TIMEOUT=5
      Address description parameters (>=11gR2)
      – Override sqlnet.ora parameters
      – Parameters can be used for OCI, ODP.net
          OLTP.trivadis.com =
            (DESCRIPTION =
              (FAILOVER=ON) (LOAD_BALANCE=OFF)
              (CONNECT_TIMEOUT=5)(RETRY_COUNT=3)(RETRY_DELAY=1)(TRANSPORT_CONNECT_TIMEOUT=3)
              (ADDRESS_LIST =
                (ADDRESS = (PROTOCOL = TCP )(HOST = italy )(PORT = 1521))     #italy : SCAN
                (ADDRESS = (PROTOCOL = TCP )(HOST = sweden )(PORT = 1521)))   #sweden : SCAN
              (CONNECT_DATA = (SERVICE_NAME = OLTP.trivadis.com)))
          #Max. connect time failover re-tries: 3+3+(3*6)=24
      [Diagram: client, two listeners (LSNR); 1 = three-way handshake, 2 = Oracle Net; one annotation reads "New in 12c"]
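The "3+3+(3*6)=24" comment on slide 16 can be reproduced as a small calculation. This is a sketch under the same simplifications the slide appears to make: every address attempt fails at the TCP level after TRANSPORT_CONNECT_TIMEOUT, each HOST entry counts as one address, and RETRY_DELAY is not included. The class and method names are hypothetical.

```java
// Illustrative only: class and method names are not from the slides.
final class ConnectTimeBudget {

    // One pass tries every address once; RETRY_COUNT adds that many further passes.
    static int maxConnectSeconds(int addresses, int transportConnectTimeout, int retryCount) {
        int onePass = addresses * transportConnectTimeout;
        return onePass * (1 + retryCount);
    }

    public static void main(String[] args) {
        // Two addresses (italy, sweden), TRANSPORT_CONNECT_TIMEOUT=3, RETRY_COUNT=3
        System.out.println("Max. connect time: " + maxConnectSeconds(2, 3, 3) + " sec.");
    }
}
```

Changing any of the three inputs shows quickly how the worst-case wait grows, which helps when choosing RETRY_COUNT for an application with a fixed response-time budget.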
  • 17. New Oracle Net Session – Connect Timeout
      JDBC Thin URL
      – TRANSPORT_CONNECT_TIMEOUT will be available in the 12.2 version
          pds.setURL("jdbc:oracle:thin:@(DESCRIPTION =(FAILOVER=ON)(LOAD_BALANCE=OFF)" +
              "(CONNECT_TIMEOUT=3)(RETRY_COUNT=10)(RETRY_DELAY=1)" +
              "(ADDRESS_LIST = " +
              "(ADDRESS = (PROTOCOL = TCP )(HOST = blue.trivadis.com )(PORT = 1521)) " +
              "(ADDRESS = (PROTOCOL = TCP )(HOST = brown.trivadis.com )(PORT = 1521))) " +
              "(CONNECT_DATA = (SERVICE_NAME = sales_rw.trivadis.com)))");
      JDBC Thin clients can alternatively use the following driver property
      – Overrides CONNECT_TIMEOUT from address description parameters
          Properties prop = new Properties();
          prop.put(oracle.net.ns.SQLnetDef.TCP_CONNTIMEOUT_STR, ""+3000); // 3000ms
          ods.setConnectionProperties(prop);
      Slide annotation: New in 12.1.0.2, Patch 19154304
  • 18. Established Oracle Net Session – Re-Connect Timeout
      Break an established network connection without waiting for long TCP timeouts
      sqlnet.ora parameters
      – Parameters can be used for OCI and ODP.net clients
      – The actual wait time is 2 x timeout value (wait for timeout -> switch into break and reset mode -> wait for timeout)!
          SQLNET.RECV_TIMEOUT=30   #no default value
          SQLNET.SEND_TIMEOUT=30   #no default value
      For JDBC Thin clients set the following connection property
          Properties prop = new Properties();
          prop.put ("oracle.jdbc.ReadTimeout", "5000"); //5000ms
          ods.setConnectionProperties(prop);
      [Diagram: client, two listeners (LSNR), Oracle Net session to process P1, followed by client failover]
  • 19. Transparent Application Failover (TAF)
      TAF is a feature of the client OCI driver that masks many failures from the end users
      – Automatic re-connection, resumable queries, session migration
      Example restrictions
      – Uncommitted transactions are rolled back
      – PL/SQL and session state is lost (callback functions might be a solution)
      – Parallel query, database links, SYS user are not supported
      – Stored procedure read is not supported
      – Does not work after server process failure (ORA-03113)
      Error codes reported by TAF: ORA-25400 – ORA-25425
      [Diagram: query rows over Oracle Net before (1) and after (2) failover; labels: Fetched, Lost, Fetched, Discarded]
  • 20. Transparent Application Failover (TAF)
      TAF properties can be set on the client or server side (recommended, higher priority)
          srvctl add service
            -tafpolicy [NONE | BASIC | PRECONNECT]
            -failovertype [NONE | SESSION | SELECT | TRANSACTION]
            -failovermethod [NONE | BASIC]   #not strictly necessary
            -failoverdelay <failover_delay>
            -failoverretry <failover_retries>
      RAC graceful session migration with TAF
          srvctl stop service -db <db_unique_name> -instance <instance> -service <service> [-force]
          EXEC DBMS_SERVICE.DISCONNECT_SESSION('<service>', DBMS_SERVICE.POST_TRANSACTION)
        or
          srvctl stop service -db <db_unique_name> -instance <instance> -service <service> [-force]
          srvctl stop instance -db <db_unique_name> -service <service> -stopoption "TRANSACTIONAL LOCAL"   #Warning: 600 sec. timeout
  • 21. Fast Application Notification (FAN)
      Provides rapid notification about status changes (up/down events) for database services, instances and nodes
      Delivers workload information about services (runtime load balancing)
      Starting with Oracle 12c, ONS is used as the FAN transport for all client types
      A FAN event consists of header and payload information
          ** Event Header **
          Notification Type: database/event/service
          Event payload:
          VERSION=1.0 event_type=SERVICEMEMBER service=sales.TRIVADIS.COM
          instance=RAC1 database=rac_site1 db_domain=TRIVADIS.COM host=cldb01
          status=down reason=FAILURE timestamp=2016-09-01 18:46:52 timezone=+02:00
      [Diagram: ONS daemons delivering events to FAN subscribers]
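To illustrate what a subscriber has to do with such an event, here is a hedged sketch that splits the payload from slide 21 into key/value pairs. It assumes the payload arrives as whitespace-separated key=value text exactly as printed on the slide (note the space inside the timestamp value); it does not use the Oracle ONS or FAN API, and the class and method names are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: plain-text parsing, not the Oracle FAN/ONS API.
final class FanPayloadParser {

    // A key is a word followed by '='; its value runs up to the next " key=" or the end,
    // so a value such as "2016-09-01 18:46:52" keeps its embedded space.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(.*?)(?=\\s+\\w+=|$)");

    static Map<String, String> parse(String payload) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(payload.trim());
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    // True for the kind of event shown on the slide: a service member going down.
    static boolean isServiceDown(Map<String, String> fields) {
        return "SERVICEMEMBER".equals(fields.get("event_type"))
                && "down".equals(fields.get("status"));
    }

    public static void main(String[] args) {
        String payload = "VERSION=1.0 event_type=SERVICEMEMBER service=sales.TRIVADIS.COM "
                + "instance=RAC1 database=rac_site1 db_domain=TRIVADIS.COM host=cldb01 "
                + "status=down reason=FAILURE timestamp=2016-09-01 18:46:52 timezone=+02:00";
        Map<String, String> fields = parse(payload);
        fields.forEach((k, v) -> System.out.println(k + " = " + v));
        System.out.println("service down: " + isServiceDown(fields));
    }
}
```

In practice FAN-aware components such as UCP consume these events for you; the sketch only shows which fields identify the affected service, instance and reason.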
  • 22. Fast Application Notification (FAN)
      Oracle Grid Infrastructure is necessary to register with ONS
      – ONS default ports – local: 6100, remote: 6200
      – Configured and started automatically for GI cluster installations
      – For GI standalone systems it needs to be activated and configured (e.g. Data Guard)
          srvctl enable ons
          srvctl modify ons -remoteservers <remote_node> -verbose
          srvctl start ons
      Database needs to be registered in OCR/OLR with the ora.database.type type
      – Does not work for user defined resources (failover databases)
      Can be used with different client types: JDBC, OCI, ODP.net
      – Integrated with UCP; starting with 11gR2 the FAN API can be used (SimpleFan.jar)
  • 23. Fast Application Notification (FAN)
      Correct database service configuration is necessary
          srvctl add service                               #The same for GDS (gdsctl)
            -clbgoal [SHORT|LONG]                          #LONG is the default
            -rlbgoal [SERVICE_TIME | THROUGHPUT | NONE]
            -notification [TRUE | FALSE]                   #To enable FAN for OCI/ODP.net connections
      Beginning with the 12c version (client and server), FAN-enabled clients can use FAN auto-configuration
      – For older versions you need to specify the ONS endpoints manually
          pds.setONSConfiguration("nodes=blue.trivadis.com:6200,brown.trivadis.com:6200");
      ONS connects to a maximum of 3 database nodes in each node group (group=cluster)
      – JDBC system property can be set manually
          java -Doracle.ons.maxconnections=8 <your_program>
  • 24. Fast Connection Failover (FCF)
      Pre-configured client-side FAN integration for JDBC clients
      Reacts to up/down FAN events (e.g. removing dead connections from the connection pool)
      Do not configure TAF with FCF for JDBC thick (OCI) clients
      Example of how to use FCF with Universal Connection Pool (UCP)
      – Configure ONS and database service
      – Include UCP and ONS libraries in your CLASSPATH
          CLASSPATH=.:/usr/lib/oracle/12.1/client64/lib/ojdbc7.jar:/usr/lib/oracle/12.1/client64/lib/ons.jar:/usr/lib/oracle/12.1/client64/lib/ucp.jar
      [Diagram: connection pool (60 connections) receiving events from ONS]
  • 25. Fast Connection Failover (FCF)
      To subscribe to FAN events and use HA UCP features you need to activate FCF first
      Setting connection pool properties:
          import oracle.ucp.jdbc.PoolDataSourceFactory;
          import oracle.ucp.jdbc.PoolDataSource;
          import oracle.ucp.jdbc.oracle.OracleJDBCConnectionPoolStatistics;
          ...
          try {
              PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
              pds.setConnectionFactoryClassName("oracle.jdbc.pool.OracleDataSource");
              pds.setURL(dbURL);
              pds.setUser(username);
              pds.setPassword(password);
              pds.setInitialPoolSize(5);
              pds.setMinPoolSize(5);
              pds.setMaxPoolSize(200);
              pds.setConnectionPoolName("JDBC_UCP");
              pds.setFastConnectionFailoverEnabled(true); //not activated per default!
  • 26. Fast Connection Failover (FCF)
      FCF restrictions
      – In-flight transactions are lost as well as calls in the middle of execution
      – Does not work after server process failure (No more data to read from socket)
      – Application exception handling is absolutely necessary
      The isValid() method is used to check the borrowed connection after an SQL exception
              // do some work
          } catch (SQLException ex) {
              if (conn == null || !((ValidConnection) conn).isValid()) {
                  // Process FCF info...
              }
              ...
              conn.close(); //close the connection and later borrow a new one
          }
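The "close the connection and later borrow a new one" advice on slide 26 usually ends up as a small retry loop in the application. The following JDK-only sketch shows one possible shape. It is not UCP code: it assumes the driver reports a broken connection as java.sql.SQLRecoverableException, and the SqlWork interface and runWithRetry helper are hypothetical names. Because in-flight transactions are lost, only work that is safe to repeat should be retried this way.

```java
import java.sql.SQLException;
import java.sql.SQLRecoverableException;

// Illustrative only: a JDK-only stand-in, not part of UCP or the slides.
final class RecoverableRetry {

    // One unit of work: borrow a connection, use it, close it again.
    interface SqlWork<T> {
        T run() throws SQLException;
    }

    // Re-runs the work after a recoverable error; any other SQLException propagates at once.
    static <T> T runWithRetry(SqlWork<T> work, int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLRecoverableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.run();
            } catch (SQLRecoverableException ex) {
                last = ex; // connection is gone; the next attempt borrows a new one
            }
        }
        throw last;
    }

    public static void main(String[] args) throws SQLException {
        int[] calls = {0};
        String result = runWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new SQLRecoverableException("simulated broken connection");
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a real application the lambda would call pds.getConnection(), run the statements and close the connection, so that every attempt starts with a freshly borrowed connection.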
  • 27. Fast Connection Failover (FCF) – Connection Draining
      UCP 12.1.0.2 introduced a new system property – connection draining time
      – Period of time to migrate unborrowed connections
      – Default (pre 12.1.0.2): all unborrowed connections are migrated immediately
          // Migration Rate = CurrentPoolSize/PlannedDrainingPeriod sec.
          System.setProperty("oracle.ucp.PlannedDrainingPeriod", Integer.toString(120));
      [Diagram: ONS notifies UCP; 1 = service relocate, 2 = service down]
  • 28. Application Continuity (AC)
      Addresses temporary recoverable outages of instances, databases and network communications
      Transaction Guard – server side component
      – Transaction state is recorded and retrievable within the database in order to ensure idempotent execution on replay (DBMS_APP_CONT.GET_LTXID_OUTCOME)
      – Can be used standalone using Oracle Client 12c for JDBC thin, OCI and ODP.net
      – Available with Oracle 12c Enterprise Edition
      Oracle 12c JDBC Replay Driver – client side component
      – Replays the failed request so that the client may simply continue
      – As of 12c Release 1 implemented only for JDBC thin client
      Application Continuity requires RAC or RAC One Node or ADG (GG) option
  • 29. Application Continuity (AC)
      Example AC/TG interaction with UCP
      [Diagram: nine numbered steps across the processing phases Runtime, Re-Connect and Replay]
      – Check-out connection (Request begin)
      – Associate LTXID
      – Send LTXID to the driver
      – Work: INS/DEL/UPD/COM (calls are kept in the replay buffer)
      – Communication break
      – Recoverable error (SQL exception)
      – Request new connection
      – Check the last LTXID outcome
      – If safe, replay
      – Check-in connection (Request end)
  • 30. Application Continuity (AC)
      Application Continuity with UCP
          PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
          pds.setConnectionFactoryClassName("oracle.jdbc.replay.OracleDataSourceImpl");
          ...
          conn = pds.getConnection(); // Implicit database request begin
          // JDBC calls protected by Application Continuity
          conn.close(); // Implicit database request end
      Application Continuity without connection pool
          import oracle.jdbc.replay.OracleDataSourceImpl;
          OracleDataSourceImpl ods = new OracleDataSourceImpl();
          conn = ods.getConnection();
          ...
          ((ReplayableConnection)conn).beginRequest(); // Explicit database request begin
          // JDBC calls protected by Application Continuity
          ((ReplayableConnection)conn).endRequest(); // Explicit database request end
          conn.close();
  • 31. Application Continuity (AC)
      Database service attributes for AC and TG
          srvctl add service
            -failovertype TRANSACTION   # to enable Application Continuity
            -commit_outcome TRUE        # to enable Transaction Guard
            -retention 86400            # the number of seconds the commit outcome is retained
            -replay_init_time 900       # seconds after which replay will not be initiated
            -failoverretry 20
            -failoverdelay 2
            -notification TRUE          # with Oracle Restart, to avoid ORA-44781 during service start
      Some restrictions:
      – Autonomous transactions, XA, ADG with read/write DB links, GoldenGate or Logical Standby databases are not supported
      Error handling is still necessary (non-recoverable errors, replay not possible, etc.)
  • 32. Conclusions
  • 33. Conclusions
      – To achieve high availability, correct client-side configuration for failover is crucial
      – Tuning OS kernel parameters is not the preferred way to go
      – VIP addresses are very useful in cluster environments
      – Dynamic database services are key to client high availability
      – At least Oracle client connect timeouts should be set (be careful with re-connect timeouts)
      – TAF/FAN/FCF are very powerful, but they have some limitations and exception handling is still necessary!
      – AC helps to transparently replay in-flight transactions