As part of my company's policy, we regularly perform DR-related activities, which includes switchover tests of databases configured to use Data Guard (physical standby). During one of the tests in a lower environment the switchover command (run in "dgmgrl") hung. Normally this process takes a few minutes but after waiting longer than usual, we checked the database for wait events and saw that the switchover was waiting on "EMON Termination". We also checked the primary's alert log and saw the following:
Started @ 11:11:40 2018-10-22 11:11:56.745000 -04:00 minact-scn: Master considers inst:4 dead kponpst: ORA-1033, during ksrpublish0 kponpst: ORA-1033, during ksrpublish0 kponpst: ORA-1033, during ksrpublish0 2018-10-22 11:11:58.862000 -04:00 Archived Log entry 438383 added for thread 1 sequence 31328 ID 0xe8c2ea16 dest 1: ARC4: Archive log rejected (thread 2 sequence 31330) at host '****' FAL[server, ARC4]: FAL archive failed, see trace file. ARCH: FAL archive failed. Archiver continuing ORACLE Instance **** - Archival Error. Archiver continuing. 2018-10-22 11:12:48.377000 -04:00 kponpst: ORA-1033, during ksrpublish0 2018-10-22 11:12:55.734000 -04:00 kponpst: ORA-1033, during ksrpublish0 In ARC[4] tracefile: *** 2018-10-22 11:11:58.429 4645 krsu.c Logged on to standby successfully Client logon and security negotiation successful! Error 16401 creating standby archive log file at host '****' kcrrwkx: unknown error:16401
MOS document "Database Shutdown Hangs on "EMON termination" / " Waiting for Job queue slaves to complete" (Doc ID 1915888.1)" describes this event as caused by "a few orphan subscribers for the ALERT QUEUE which caused emon to spin during shutdown.". This implies that one or more EMON background process could be hung. Checking the view V$EMON I found:
SID LAST_UPDATE_TIME
------- -----------------------------
336 04-OCT-18 23.42.02.881 -04:00
396 22-OCT-18 11.11.51.986 -04:00
427 22-OCT-18 11.11.51.986 -04:00
456 22-OCT-18 11.11.51.986 -04:00
7 22-OCT-18 11.11.51.986 -04:00
From the above it looks as though EMON SID 336 (highlighted) doesn't belong or at least is having issues. I killed this database session, but unfortunately it couldn't be killed (marked for killed). Fortunately, experience has told me to always grab the SPID associated with an Oracle session, just in case it can't be killed. After killing the OS process the switchover picked up immediately and completed without issue.