Unfortunately there are no metrics available for specific "library cache*" wait events in EM12c. There is a metric for a higher level, on the wait class, so in this case that'd be "Concurrency". But thinking this through I realized alerting on "Concurrency" wouldn't work for this case. The idea is to perform regular checks for "library cache*" waits and if repeated checks (at least 2) find x number of sessions waiting on this event then an alert should be raised. If we check for "Concurrency", we could run into a situation where at timestamp#1 there were 10 sessions waiting on "library cache lock" and at timestamp#2 there were 10 sessions waiting on "enq: HV - contention". 2 consecutive checks found 10 or more sessions waiting on the metric so an alert would be raised, even though in my contrived example it's not what we want.
Another option would be either to use a temporary table to store snapshots of current session waits or to query AWR data. A temporary table wouldn't work because this is a read-only standby. AWR data wouldn't work because 1, monitored would be in the past, the earliest being 1 snapshot ago. Plus, again this is a standby database so any AWR present would be a copy of the primary.
Our solution was to go with a Metric Extension. The ME was based on the cursor:
SELECT COUNT(*) total_sessions
FROM v$session
WHERE state = 'WAITING'
AND event LIKE
'library cache%';
... set at the instance-level. Thresholds were set for 2 (or more) consecutive checks at 10 (warning) or 20 (critical). After updating the appropriate Incident Rule everything worked as expected.
No comments:
Post a Comment