Ever work on a problem where initial observation and past experience say "1 + 1 = 2", yet it keeps coming out "3"? I recently ran into a situation like this, where all past experience told me the query should run in parallel, and even the xplan showed it would, yet while the query was executing I could tell it was running serially.
A simplified version of the original query is listed below; it involves three tables, all with a parallel degree of 8:
SELECT std_function_pkg.domain_group_fnc(dn.domain, dn.domain_suffix)
     , MAX(CASE WHEN event_type = 'C' THEN e.event_date ELSE NULL END) latest_type
  FROM dn_tb dn
       LEFT OUTER JOIN vlm_sum_tb vlm
         ON (dn.member_id = vlm.member_id)
       LEFT OUTER JOIN event_tb PARTITION (v8) e
         ON (    dn.member_id = e.member_id
             AND e.vendor_id = 13569
             AND e.event_type IN ('O','C'))
 GROUP BY std_function_pkg.domain_group_fnc(dn.domain, dn.domain_suffix);
This query yields an xplan as follows:
---------------------------------------------------------------------------------------------------------------------------------------
| Id| Operation                      | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |  TQ   |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------
|  0| SELECT STATEMENT               |            | 14960 |  613K | 83941   (5)| 00:16:48 |       |       |       |      |            |
|  1|  PX COORDINATOR FORCED SERIAL  |            |       |       |            |          |       |       |       |      |            |
|  2|   PX SEND QC (RANDOM)          | :TQ10003   | 14960 |  613K | 83941   (5)| 00:16:48 |       |       | Q1,03 | P->S | QC (RAND)  |
|  3|    SORT GROUP BY               |            | 14960 |  613K | 83941   (5)| 00:16:48 |       |       | Q1,03 | PCWP |            |
|  4|     PX RECEIVE                 |            | 14960 |  613K | 83941   (5)| 00:16:48 |       |       | Q1,03 | PCWP |            |
|  5|      PX SEND HASH              | :TQ10002   | 14960 |  613K | 83941   (5)| 00:16:48 |       |       | Q1,02 | P->P | HASH       |
|  6|       SORT GROUP BY            |            | 14960 |  613K | 83941   (5)| 00:16:48 |       |       | Q1,02 | PCWP |            |
|  7|        NESTED LOOPS OUTER      |            | 4568K |  182M | 83713   (4)| 00:16:45 |       |       | Q1,02 | PCWP |            |
|* 8|         HASH JOIN RIGHT OUTER  |            |  253K | 6444K |    54   (6)| 00:00:01 |       |       | Q1,02 | PCWP |            |
|  9|          PX RECEIVE            |            |  286K | 1676K |     7   (0)| 00:00:01 |       |       | Q1,02 | PCWP |            |
| 10|           PX SEND HASH         | :TQ10000   |  286K | 1676K |     7   (0)| 00:00:01 |       |       | Q1,00 | P->P | HASH       |
| 11|            PX BLOCK ITERATOR   |            |  286K | 1676K |     7   (0)| 00:00:01 |     1 |    16 | Q1,00 | PCWC |            |
| 12|             TABLE ACCESS FULL  | VLM_SUM_TB |  286K | 1676K |     7   (0)| 00:00:01 |     1 |    16 | Q1,00 | PCWP |            |
| 13|          PX RECEIVE            |            |  253K | 4957K |    46   (5)| 00:00:01 |       |       | Q1,02 | PCWP |            |
| 14|           PX SEND HASH         | :TQ10001   |  253K | 4957K |    46   (5)| 00:00:01 |       |       | Q1,01 | P->P | HASH       |
| 15|            PX BLOCK ITERATOR   |            |  253K | 4957K |    46   (5)| 00:00:01 |     1 |    16 | Q1,01 | PCWC |            |
| 16|             TABLE ACCESS FULL  | DN_TB      |  253K | 4957K |    46   (5)| 00:00:01 |     1 |    16 | Q1,01 | PCWP |            |
| 17|         PARTITION RANGE SINGLE |            |    18 |   288 |            |          |     8 |     8 | Q1,02 | PCWP |            |
| 18|          PARTITION LIST INLIST |            |    18 |   288 |            |          |KEY(I) |KEY(I) | Q1,02 | PCWP |            |
| 19|           VIEW                 |            |    18 |   288 |            |          |       |       | Q1,02 | PCWP |            |
|*20|            TABLE ACCESS FULL   | EVENT_TB   |    18 |   324 | 51903   (7)| 00:10:23 |KEY(I) |KEY(I) | Q1,02 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------------
As you can see from the xplan, the query should use parallelism. Yet while monitoring the system during the run, I saw that V$PX_SESSION returned 0 rows (meaning no PX slaves were active) and that the only session involved in the query was generating "db file scattered read" waits, which is a serial multiblock read wait, not a parallel one.
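For anyone who wants to repeat that check, here is a minimal sketch of the two queries I have in mind, assuming you already know the SID of the session running the statement (the bind :qc_sid below is just a placeholder, not something from the original system):

-- Any PX slaves working on behalf of this coordinator session?
-- During the problem run this returned no rows.
SELECT qcsid, server_group, server_set, degree, req_degree
  FROM v$px_session
 WHERE qcsid = :qc_sid;

-- What is the session actually waiting on?
-- A steady stream of "db file scattered read" means serial multiblock
-- reads, not the waits you would expect from an active PX slave set.
SELECT event, wait_time, seconds_in_wait
  FROM v$session_wait
 WHERE sid = :qc_sid;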
I ended up taking a trial-and-error approach: I found a basic version of the query that did run in parallel, then added items back from the column list and the GROUP BY clause until parallelism disappeared.
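To give a feel for the method (this is an illustration, not the exact intermediate query I tested), the stripped-down starting point simply grouped on the raw columns instead of the packaged function, and that version happily spawned PX slaves:

SELECT dn.domain, dn.domain_suffix
     , MAX(CASE WHEN event_type = 'C' THEN e.event_date ELSE NULL END) latest_type
  FROM dn_tb dn
       LEFT OUTER JOIN vlm_sum_tb vlm
         ON (dn.member_id = vlm.member_id)
       LEFT OUTER JOIN event_tb PARTITION (v8) e
         ON (    dn.member_id = e.member_id
             AND e.vendor_id = 13569
             AND e.event_type IN ('O','C'))
 GROUP BY dn.domain, dn.domain_suffix;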
It turned out that the call to the user-defined function DOMAIN_GROUP_FNC was the culprit. By isolating the function from the package, I found that the function itself wasn't the problem; the way it was declared WITHIN the package was.
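The isolation test itself was simple: recreate the function as a standalone object and swap it into the query in place of the packaged call. The sketch below is hypothetical (the real implementation lives in STD_FUNCTION_PKG and isn't shown in this post, so the parameter names and the RETURN line are placeholders), but it captures the idea; with the packaged declaration out of the picture, the parallelism came back.

-- Hypothetical standalone copy, used only for the isolation test.
CREATE OR REPLACE FUNCTION domain_group_fnc_test (
    p_domain        IN VARCHAR2
  , p_domain_suffix IN VARCHAR2
) RETURN VARCHAR2
IS
BEGIN
  -- placeholder body; the real logic is inside std_function_pkg
  RETURN p_domain || '.' || p_domain_suffix;
END domain_group_fnc_test;
/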
Within the package STD_FUNCTION_PKG, the function declaration was followed by this directive:

pragma restrict_references(domain_group_fnc, WNPS);

I don't know the original intent of the programmers who added it, but for a parallel query it won't do: the documentation clearly states that all four purity constraints (RNDS, RNPS, WNDS, WNPS) must be asserted for the function to be used in a parallel query. In addition, the PARALLEL_ENABLE clause needs to be added to the function's declaration when it is contained within a package, so that a query calling the function is allowed to run in parallel.
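For reference, here is a sketch of the relevant part of the package specification, before and after. The package and function names come from the query; the parameter names and everything else are my own guesses, since only the pragma line was quoted above.

-- Before: only WNPS asserted, so the packaged function blocks parallelism.
CREATE OR REPLACE PACKAGE std_function_pkg IS
  FUNCTION domain_group_fnc (p_domain        IN VARCHAR2
                           , p_domain_suffix IN VARCHAR2)
    RETURN VARCHAR2;
  PRAGMA RESTRICT_REFERENCES (domain_group_fnc, WNPS);
END std_function_pkg;
/

-- After: all four purity constraints asserted and PARALLEL_ENABLE added,
-- so a query calling the function can run with PX slaves.
CREATE OR REPLACE PACKAGE std_function_pkg IS
  FUNCTION domain_group_fnc (p_domain        IN VARCHAR2
                           , p_domain_suffix IN VARCHAR2)
    RETURN VARCHAR2 PARALLEL_ENABLE;
  PRAGMA RESTRICT_REFERENCES (domain_group_fnc, RNDS, RNPS, WNDS, WNPS);
END std_function_pkg;
/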
In the end, this shows that wait events are not only key to resolving performance issues, they are also an excellent way of determining what a query is really doing.