Normally I don't spend a lot of time reviewing connection
details in my client's listener logs, but I came across a situation recently
where failover and connection distribution were being questioned, so I figured
it'd be best to review listener log data just to be safe. What I ended up finding was that connection
counts as recorded in the listener logs was nearly twice that of what
"logons cumulative" statistic was showing, which in my mind wasn't
right.
To perform any type of listener log analysis, I first
needed some sort of tool, as the system involved is a 4-node OLTP RAC with
100,000's of connections happening per day.
Fortunately Arup Nanda did most of this work for me, based on his DBAzine
article on 9/16/08 entitled "Mining Information from the Listener
Log". I used his notes to create an
External Table for pointing to a given listener log and an Oracle function for
properly parsing out details within lines of the listener log. (See the end of this article for SQL queries
involved to help analyze the data.)
As for the hardware and database involved, the following
uses simplified names of targets to help illustrate the problem:
- 4-node RAC running RHEL 4.8 and Oracle 10.2.0.2.
- Nodes are named RAC1, RAC2, RAC3, and RAC4 with one 1 Instance of the database on each, named inst1, inst2, inst3, and inst4
- 2 Oracle Services are defined to use Inst1 and Inst2 as a strictly OLTP set of balanced Instances (service "inst1and2") and use Inst3 and Inst4 as a strictly Batch set of balanced Instances (service "batch1and2").
Back to the problem, I found that Service
"inst1and2" was getting 1,000's of connections per hour against
Instances Inst3 and Inst4, which from my understanding couldn't happen. I used "srvctl" as one way to
validate this Service's definition, just to be sure:
% srvctl config service -d
$db_unique_name -a
inst1and2 PREF: inst1 inst2
AVAIL: TAF: NONE
batch1and2 PREF: inst3 inst4
AVAIL: TAF: NONE
As a last check, I asked the Application team to send me
the exact connection string they were using.
Sure enough, this pointed out the problem:
(DESCRIPTION = (ADDRESS =
(PROTOCOL = TCP)(HOST = RACVIP3)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST
= RACVIP4)(PORT = 1521)) (LOAD_BALANCE = yes)(FAILOVER = ON) (CONNECT_DATA =
(SERVER = DEDICATED) (SERVICE_NAME = inst1and2)(FAILOVER_MODE = (TYPE = SELECT)
(METHOD = BASIC))))
Testing this from my PC using "sqlplus
user/pass@'' ", I reproduced the
problem. I found that RAC3 / 4's
listener log recorded a connection:
18-APR-2011 09:52:03 *
(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=inst1and2)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC))(CID=(PROGRAM=C:\oracle\product\10.2.0\db_1\bin\sqlplus.exe)(HOST=myserver123)(USER=ME)))
* (ADDRESS=(PROTOCOL=tcp)(HOST=...)(PORT=1234)) * establish * inst1and2 * 0
… and also RAC1 / 2's listener log recorded a connection:
18-APR-2011 09:52:03 *
(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=inst1and2)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC))(CID=(PROGRAM=C:\oracle\product\10.2.0\db_1\bin\sqlplus.exe)(HOST=myserver123)(USER=ME))(INSTANCE_NAME=inst1))
* (ADDRESS=(PROTOCOL=tcp)(HOST=...)(PORT=2345)) * establish * inst1and2 * 0
Notice the string "…INSTANCE_NAME…" in the
listener log entry for RAC1 / 2.
According to MOS document 263599.1, when this is present it represents a
redirected connection, which is true in the test case above. Simple enough, except that this string
DOESN'T occur in any of listener log entries in question. Further down in the same MOS document,
there's a short blurb about thin jdbc connections and that if these connections
are redirected, there won't be any mention of this in the listener log, as in
the string INSTANCE_NAME won't be present.
This matches my case, as all application servers involved
are connecting via thin jdbc. MOS
document 263599.1 also points out that "lsnrctl services" can also
point out connect redirects. Checking
RAC4 I see the following:
Service "inst1and2" has 2 instance(s).
Instance "inst1", status READY, has 1 handler(s) for this
service...
Handler(s):
"DEDICATED" established:658909
refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=RAC1)(PORT=1521))
Instance "inst2", status READY, has 1 handler(s) for this
service...
Handler(s):
"DEDICATED" established:737610
refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=RAC2)(PORT=1521))
For Service "inst1and2" on RAC4, 659,909
connections have been redirected to "inst1" and 737,610 connections
have been redirected to "inst2", all since the listener was last
restarted.
Although all of this helps account for any connection
count discrepancies between the listener log and database statistic
"logons cumulative", I'm left with having to check "lsnrctl
services" and connect strings from application servers to help tie it all
together. It'd sure be a lot easier, at
least to me, for Oracle to have included a message like "redirect"
within the listener log entry to point out what happened.
SQL Code
The following is an example of a query I used to analyze
listener log data via an external table.
I'm including this as I found a few cases where the data didn't line up
as expected and I had to use a bit that knowledge to handle most of those
cases.
SELECT min_log_date,
max_log_date,
CASE WHEN
parse_listener_log_line_fnc(connect_string, 'HOST') IS NOT NULL AND
parse_listener_log_line_fnc(connect_string, 'HOST') NOT IN ('__jdbc__')
THEN
parse_listener_log_line_fnc(connect_string, 'HOST')
ELSE
REGEXP_REPLACE(REGEXP_REPLACE(protocol_info,
'(^.*HOST=)([0-9]+.[0-9]+.[0-9]+.[0-9]+).*\$',
'\2'),
'(^.*HOST=)(lnx|racvip){1}([0-9]+.us.dnb.com).*\$',
'\2\3')
END source_machine
,
NVL(parse_listener_log_line_fnc(connect_string, 'PROGRAM'),
parse_listener_log_line_fnc(connect_string,
'HOST'))
, CASE WHEN connect_string IN ('ping',
'service_update', 'status') THEN connect_string
WHEN protocol_info IN ('services',
'status') THEN protocol_info
ELSE action
END operation
, NVL(service_name, protocol_info)
service, return_code, COUNT(*)
FROM listener_log_tb
, (SELECT MIN(log_date) min_log_date,
MAX(log_date) max_log_date
FROM listener_log_tb)
GROUP BY min_log_date, max_log_date
,
CASE WHEN parse_listener_log_line_fnc(connect_string, 'HOST') IS NOT NULL AND
parse_listener_log_line_fnc(connect_string,
'HOST') NOT IN ('__jdbc__')
THEN
parse_listener_log_line_fnc(connect_string, 'HOST')
ELSE
REGEXP_REPLACE(REGEXP_REPLACE(protocol_info,
'(^.*HOST=)([0-9]+.[0-9]+.[0-9]+.[0-9]+).*\$',
'\2'),
'(^.*HOST=)(lnx|racvip){1}([0-9]+.us.dnb.com).*\$',
'\2\3')
END
,
NVL(parse_listener_log_line_fnc(connect_string, 'PROGRAM'),
parse_listener_log_line_fnc(connect_string,
'HOST'))
,
CASE WHEN connect_string IN ('ping', 'service_update', 'status') THEN
connect_string
WHEN
protocol_info IN ('services', 'status') THEN protocol_info
ELSE action
END
,
NVL(service_name, protocol_info), return_code;