Wednesday, March 21, 2018

Wrapping "repvfy" to Make Your OEM Life Easier

For those of you who manage Oracle Enterprise Manager environments, you know how important the metadata is.  Fortunately, Oracle provides a utility called "repvfy" that, as the abbreviation implies, helps you verify the repository.  In this blog I'd like to share how you can wrap a bit of code around the execution of "repvfy" to make your life a bit easier.
Before proceeding, note that:
  • I'll be using the variable $EMDIAG_HOME to refer to where the "repvfy" utility has been extracted.
  • Our OEM environments that "repvfy" and its associated wrapper code run against are 11.1.0.1, 12.1.0.4 and 12.1.0.5.
  • The wrapper itself can be written in a variety of languages but since it doesn't require complex coding nor has strict elapsed time requirements and all members of our DBA team are proficient in bash, it was decided to use bash.
We created crontab entries to run "repvfy" on a daily basis against each of our OEM environments, including the "-fix" argument so that in case the found issue is simple to resolve, "repvfy" can take care of it.  But, periodically issues are found that require more effort so we decided it would helpful to receive an email with the output from each "repvfy" execution.  Reviewing these emails isn't too difficult as output is similar to the following:

verifyAGENTS
1008. NMO not setuid-root (Unix-only): 2
6002. Blocked Agents: 2
8005. Broken Agents: 1
verifyASLM
verifyAVAILABILITY
8001. Composite availability errors: 2
verifyBLACKOUTS
7001. Expired blackout windows: 9
      Fix: 9
verifyCA
verifyCAT
 ...
verifyTARGETS
1017. Platform ID mismatch between host and ORACLE_HOME: 25
2006. Targets with missing ORACLE_HOME target: 4
2013. CRS clusters with nodes not discovered: 8
3002. DB Systems linked to multiple databases: 2
6004. Targets not uploading: 33
7008. Orphan source target associations: 57
7010. Systems without members: 12
7013. Composite targets without metric dependency details: 4
8002. Broken targets: 3
verifyTEMPLATES
verifyUPGRADE
verifyUSERS
1003. Custom super user admins.: 11

But some of these we know can be skipped, the BLACKOUTS test 7001 alert can be ignored since all issues were fixed, etc.  So reviewing output like this involves some repetitive work.

Here's where the "make your life a bit easier" part comes in.  We can wrap a bit of code around "repvfy" to avoid some of the repetitive work.  This starts by parsing output from "repvfy".

By default, "repvfy" creates a logfile of each execution under $EMDIAG_HOME/log.  If you run a "verify" then the logfile will have a format similar to "verify_<yyyy>_<mm>_<dd>_<hh24miss>.log".  To obtain the most recent logfile, code like the following can be used:

% ls -lt $EMDIAG_HOME/log/verify_* | head -1 | awk '{print $NF}'

Parsing the logfile is relatively straight-forward as each module begins with "verify" followed by 0 to many tests, each of which are prefixed by a test 4-digit number.  An example of a loop for parsing out the MODULE sections is:

for REPVFY_LINE in `grep -E '(^verify|^[0-9]*\. |^ *Fix)' $LOGFILE | awk '{if ($1 ~ /verify/)  PREFIX=$1; else print PREFIX" "$0}' | grep -vE '(^verifyEMDIAG 6001\.|^verifyEXADATA 6006\.|^verifyREPOSITORY 6039\.|^verifyREPOSITORY 7001\.|verifyJOBS 707\.|verifyTARGETS 605\.)' | sed "s/ /~/g"`

The above line does the following:
1.       Start a loop containing all lines that begin with "verify" (the module header), a set of digits following by a period then a space (test output lines) or any line whose first characters are "Fix" -> grep -E '(^verify|^[0-9]*\. |^ *Fix)
2.       Use "awk" to prefix each module's test or "Fix" lines with the module header -> awk '{if ($1 ~ /verify/)  PREFIX=$1; else print PREFIX" "$0}'
3.       Skip several test results that we aren't concerned with -> grep -vE '(^verifyEMDIAG 6001 ... verifyTARGETS 605\.)'.  We've chosen to skip these alerts because either we know of the situation and don't see it as an issue.  For example, test 707 in module JOBS concerns active executions reaching job purge time.  This is typical archive activity and a non-issue.
4.       Last, each line's spaces are changed to "~" for easier work with delimiting fields.

The previously listed "repvfy" output now looks like:

verifyAGENTS~1008.~NMO~not~setuid-root~(Unix-only):~2
verifyAGENTS~6002.~Blocked~Agents:~2
verifyAGENTS~6006.~Deployed~Agent~plugins~lower~than~OMS~plugin:~8
verifyAGENTS~8005.~Broken~Agents:~1
verifyAVAILABILITY~8001.~Composite~availability~errors:~2
verifyBLACKOUTS~7001.~Expired~blackout~windows:~9
verifyBLACKOUTS~~~~~~~Fix:~9
verifyTARGETS~1017.~Platform~ID~mismatch~between~host~and~ORACLE_HOME:~25
verifyTARGETS~2006.~Targets~with~missing~ORACLE_HOME~target:~4
verifyTARGETS~2013.~CRS~clusters~with~nodes~not~discovered:~8
verifyTARGETS~3002.~DB~Systems~linked~to~multiple~databases:~2
verifyTARGETS~6004.~Targets~not~uploading:~33
verifyTARGETS~7008.~Orphan~source~target~associations:~57
verifyTARGETS~7010.~Systems~without~members:~12
verifyTARGETS~7013.~Composite~targets~without~metric~dependency~details:~4
verifyTARGETS~8002.~Broken~targets:~3
verifyUSERS~1003.~Custom~super~user~admins.:~11

At this point further logic is applied based on the module and test to try and fix some issues.  One example is test 6004 in the TARGETS module.  This test concerns targets that aren't uploading data.  What we found, though, is that clusters/scans only upload data every 24hrs so we rerun this test *manually* by querying the metadata directly, filtering out object types of 'cluster'.

How did we determine what the best cursor to use for this test?  Another great feature of "repvfy" is that the cursors on which tests are based are saved in a .sql file when the "-details" argument is passed.  In the example of test 6004 in the TARGETS module we ran:

% $EMDIAG_HOME/bin/repvfy verify -module TARGETS -test 6004 -details

… and then reviewed the generated .sql file under $EMDIAG_HOME/log to see the cursor that the tool used.

Before emailing the finalized output we still need to deal with test alerts that are automatically fixed.  Those are the lines that had begun with "Fix" as its first characters.  For these we compare the total alerts reported with the total reported as fixed.  If the totals match, we skip this alert, otherwise we report it.  Here's a code snippet showing this logic:

if [ -n "$REPVFY_PREVLINE" ]; then
   if [ `echo $REPVFY_LINE | grep -c "Fix:"` -eq 1 -a $ERROR_TOTAL -eq $ERROR_PREVTOTAL ]; then
      ERROR_PREVTOTAL=
      ERROR_TOTAL=
      REPVFY_PREVLINE=
      REPVFY_LINE=
   elif [ `echo $REPVFY_LINE | grep -c "Fix:"` -eq 0 -o $ERROR_TOTAL -ne $ERROR_PREVTOTAL ]; then
      echo "$REPVFY_PREVLINE" >>$PARSE_LOGFILE
      ERROR_PREVTOTAL=$ERROR_TOTAL
      REPVFY_PREVLINE=$REPVFY_LINE
   fi
else
   ERROR_PREVTOTAL=$ERROR_TOTAL
   REPVFY_PREVLINE=$REPVFY_LINE
fi


The last step is to email lines of output that, through the logic in this wrapper, only pertain to what our team should be concerned with.  In the example above the final list is:

verifyAGENTS 1008. NMO not setuid-root (Unix-only): 2
verifyAGENTS 6002. Blocked Agents: 2
verifyAGENTS 6006. Deployed Agent plugins lower than OMS plugin: 8
verifyAGENTS 8005. Broken Agents: 1
verifyTARGETS 1017. Platform ID mismatch between host and ORACLE_HOME: 25
verifyTARGETS 2006. Targets with missing ORACLE_HOME target: 4
verifyTARGETS 2013. CRS clusters with nodes not discovered: 8
verifyTARGETS 3002. DB Systems linked to multiple databases: 2
verifyTARGETS 6004. Targets not uploading: 32
verifyTARGETS 7008. Orphan source target associations: 57
verifyTARGETS 7010. Systems without members: 12
verifyTARGETS 7013. Composite targets without metric dependency details: 4
verifyTARGETS 8002. Broken targets: 3
verifyUSERS 1003. Custom super user admins.: 11


With this output we know that any issues that could have been automatically fixed have been addressed and that ones we don't need to be concerned with aren't listed, making the alert more meaningful for us.

Hopefully this showed how you can easily wrap a bit of additional functionality around the "repvfy" utility to remove alerts you don't care about and add a bit of code to try to automatically address other issues.

Additional Notes

Although a full explanation of how to use the "repvfy" utility is beyond this post, a good article in MOS for further research is " EMDIAG Repvfy 12c /13c Kit - How to Use the Repvfy 12c/13c kit (Doc ID 1427365.1".

It can be helpful to know what all the possible tests are for a given module.  This can be achieved using the "list" parameter.  For example, to see all tests for the AGENTS module use:

% $EMDIAG_HOME/bin/repvfy -module AGENTS list