Tuesday, February 19, 2013

Mining EM Metadata - Job Health Check


This is the first of (hopefully) a number of posts about how to take advantage of information stored in Enterprise Manager's (EM) metadata.  EM has to capture a large quantity and variety of details related to its functionality, which is stored in a highly normalized format.  A lot of this information includes key details about environments that you, as a DBA, support.  Knowing what's stored and how to access it correctly can actually make your job a bit easier.

As a DBA you more than likely use a variety of monitoring tools and require assurance that those tools are working properly.  Ideally EM works without issue but obviously bugs will always be there.  For the situation discussed in this post, the bug deals with an older EM release (10.2.0.3) and problem where targets for an EM Job will disappear, for whatever reason, while other targets stay defined.

For example, let's say you have EM Job "MONITOR - ORACLE SERVICES" and it's defined to run against all cluster databases.  At some point 1 or more of those intended targets drop from the full list of the scheduled Job, so at a later point in time you may find that the EM Job is now running only against 5 cluster databases when you originally submitted it to run on 8.  That's a pretty scary situation for a DBA to find himself it, especially since this particular bug doesn't appear to give any warning or messages.

The good news is EM metadata includes everything you need to catch this situation when it happens.  The key objects are views MGMT$JOBS, MGMT$JOB_TARGETS, and MGMT$JOB_EXECUTION_HISTORY, with a little help from MGMT$TARGET_FLAT_MEMBERS.

MGMT$JOBS
View MGMT$JOBS holds basic information about every EM Job, for both what resides (if at all) in the library and for what has been submitted.  Differentiation is done using IS_LIBRARY, a flag where a value of "1" means the EM Job for that row of data is saved in the library whereas a value of "0" refers to a submitted EM Job.

COL job_name FORMAT A30
COL target_type FORMAT A15

SELECT job_name, job_type, target_type, is_library
  FROM mgmt$jobs
 WHERE job_name LIKE 'EXAMPLE_JOB%';

JOB_NAME                       JOB_TYPE                         TARGET_TYPE             IS_LIBRARY
------------------------------ -------------------------------- --------------- ------------------
EXAMPLE JOB                    OSCommand                        host                             1

Once you submit the EM Job an additional row is returned from MGMT$JOBS where IS_LIBRARY = 1:

JOB_NAME                       JOB_TYPE                         TARGET_TYPE             IS_LIBRARY
------------------------------ -------------------------------- --------------- ------------------
EXAMPLE JOB                    OSCommand                        host                             1
EXAMPLE JOB.1                  OSCommand                        host                             0

Also note that until you define a schedule for the EM Job (either "One Time" or "Repeating"), the column START_TIME will be NULL.

MGMT$JOB_TARGETS
To see which targets have been assigned to a given EM Job, use the view MGMT$JOB_TARGETS.  Be careful, though.  Like I mentioned before, EM's metadata is highly normalized, along with how you can define your own groupings.  If you create an EM Job to run against 5 specific hosts you'll give 5 rows returned for this EM Job from MGMT$JOB_TARGETS.

SELECT j.job_name, j.job_type, jt.target_name, j.target_type
  FROM mgmt$jobs j, mgmt$job_targets jt
 WHERE j.job_name LIKE 'EXAMPLE_JOB%'
   AND j.is_library = 1
   AND (j.job_id = jt.job_id);

JOB_NAME                       JOB_TYPE                         TARGET_NAME TARGET_TYPE
------------------------------ -------------------------------- ----------- ---------------
EXAMPLE JOB                    OSCommand                        linux1      host
EXAMPLE JOB                    OSCommand                        linux2      host
   ...
EXAMPLE JOB                    OSCommand                        linux5      host

But if you defined the EM Job as type "host" and assign it to an EM Group you'll get 1 row with a target type of "composite".

SELECT j.job_name, j.job_type, jt.target_name, jt.target_type
  FROM mgmt$jobs j, mgmt$job_targets jt
 WHERE j.job_name LIKE 'EXAMPLE_JOB%'
   AND j.is_library = 1
   AND (j.job_id = jt.job_id);

JOB_NAME                       JOB_TYPE                         TARGET_NAME TARGET_TYPE
------------------------------ -------------------------------- ----------- ---------------
EXAMPLE JOB                    OSCommand                        PROD_GROUP  composite

In the example above the "Target Type" defined for the EM Job is still "host" but when I selected individual Targets for the EM Job I picked a Group, as I wanted to execute the EM Job against EVERY "host" within the selected Group.  To break out (or "flatten") type "composite" you'll need to join to view MGMT$TARGET_FLAT_MEMBERS.  More on that in just a bit.

MGMT$JOB_EXECUTION_HISTORY
Now that we've found out how to list all targets defined for an EM Job, we need to find out how to compare that to what's actually scheduled.  In comes the view MGMT$JOB_EXECUTION_HISTORY.  The query I used for this is rather complex so let me first break it into pieces.  First, I union EM Jobs of non-composite target types with those that are of composite target types.  The composite target types are broken out into individual targets via view MGMT$TARGET_FLAT_MEMBERS.  Second, I take this list of defined EM Jobs and join that to MGMT$JOB_EXECUTION_HISTORY, which represents those EM Jobs that are either "Scheduled" or current "Running".  (Note: when an EM Job is "Running" it doesn't have an entry for "Scheduled".  That only happens after the execution completes.)

WITH c_jobs_defined AS (SELECT j.job_name, j.job_id, j.job_owner, j.job_type, jt.target_name
                             , jt.target_type
                          FROM mgmt$jobs j, mgmt$job_targets jt
                         WHERE (j.job_id = jt.job_id)
                           AND j.is_library = 0
                           AND j.start_time IS NOT NULL
                           AND jt.target_type NOT IN ('composite')
                               UNION ALL
                        -- Join in Jobs with a composite target type separately as these need to be
                        -- flattened out to get a full list of all targets for that composite type.
                        SELECT j.job_name, j.job_id, j.job_owner, j.job_type, tfm.member_target_name
                             , tfm.member_target_type
                          FROM mgmt$jobs j, mgmt$job_targets jt, mgmt$target_flat_members tfm
                         WHERE (j.job_id = jt.job_id)
                           AND (    jt.target_name = tfm.aggregate_target_name
                                AND jt.target_type = tfm.aggregate_target_type
                                AND j.target_type = tfm.member_target_type)
                           AND j.is_library = 0
                           AND j.start_time IS NOT NULL
                           AND jt.target_type IN ('composite'))
SELECT c.job_name, c.job_owner, c.job_type, c.target_name missing_on_target, c.target_type
  FROM c_jobs_defined c LEFT OUTER JOIN mgmt$job_execution_history jeh
       ON (c.job_id = jeh.job_id AND c.target_name = jeh.target_name)
                        INNER JOIN (SELECT member_target_name, member_target_type
                                      FROM mgmt$target_flat_members
                                     WHERE aggregate_target_name = 'PROD_GROUP') m
       ON (c.target_name = m.member_target_name AND c.target_type = m.member_target_type)
 WHERE (jeh.status IN ('Scheduled', 'Running') OR jeh.status IS NULL)
   AND jeh.job_name IS NULL
 ORDER BY c.job_name, c.target_name;

The INNER JOIN to MGMT$TARGET_FLAT_MEMBERS with the UNION-ed results is to limit the search to those targets within a particular EM Group, in this case our production Group named "PROD_GROUP".

Although this was rather lengthy to explain, hopefully it's clear now how to differentiate EM Jobs as defined in the Job Library verses those listed in Job Activity within the EM Console.