This is the last post of this mini-series on CPU allocation in Resource Manager. The idea behind it is simple: trace the same test case we used before and analyze the trace files. This will let us understand how Oracle instrumentation behaves when DBRM is active and managing CPU.
Note that we will trace only one service, which is perfectly enough for our testing.
We change our cpu_alloc_burn.sql to enable tracing with the 10046 event, using 'DBRM_TRACE' as the prefix for our trace files:
SET TERMOUT OFF
alter session set tracefile_identifier='DBRM_TRACE';
alter session set events '10046 trace name context forever, level 12';

select distinct t1.N2
from t1, t2
where t1.N1 < t2.N2
  and t1.N3 < t2.N1
  and t1.N2 <> t2.N1
  and t2.N2 is not null;
[oracle@phoenix resource_manager]$ ./run_adhoc.sh
Starting 20 new executions for S_ADHOC service with tracing...
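Because we set tracefile_identifier, the resulting trace files are easy to pick out, since DBRM_TRACE is embedded in their names. On 11g and later, the trace directory itself can be obtained from v$diag_info; this is a quick sanity check, assuming you have the privileges to query that view:

```sql
-- Trace files land in this directory, with DBRM_TRACE in their names
select value
from   v$diag_info
where  name = 'Diag Trace';
```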
Now we have 20 new sessions connected to the service name S_ADHOC and to the consumer group ADHOC_QUERYS. The first thing we notice, even before digging into the trace files, is the wait event resmgr:cpu quantum:
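The listing below can be produced with a query along these lines against v$session (the column formats are mine; SERVICE_NA is just SERVICE_NAME truncated by a SQL*Plus COLUMN format):

```sql
select sid, status, resource_consumer_group,
       service_name, event
from   v$session
where  service_name = 'S_ADHOC'
order  by sid;
```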
       SID STATUS   RESOURCE_CONSUMER_GROUP          SERVICE_NA EVENT
---------- -------- -------------------------------- ---------- ------------------------------
        22 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        24 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        26 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        28 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        29 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        32 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        34 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        35 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        38 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       134 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       136 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum

       SID STATUS   RESOURCE_CONSUMER_GROUP          SERVICE_NA EVENT
---------- -------- -------------------------------- ---------- ------------------------------
       143 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       148 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       150 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       151 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       152 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       156 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       157 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       159 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       162 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
This wait event states that a session is waiting for the allocation of a quantum of CPU. It is DBRM doing its job, throttling CPU consumption until it matches the plan directives we have defined. It follows that if you want to reduce the time spent on this wait event (AWR will help you check that), you have to increase the CPU allocation in your plan directives so that sessions wait less on it.
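Before reaching for trace files, a rough per-session picture of this wait is already available from v$session_event; a query along these lines shows how much each session has accumulated on it (TIME_WAITED and MAX_WAIT are in centiseconds):

```sql
select sid, total_waits, time_waited, max_wait
from   v$session_event
where  event = 'resmgr:cpu quantum'
order  by time_waited desc;
```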
Another way, and the best one since it gives you a lot more information, is to check the trace files we generated earlier:
*** 2014-06-13 17:06:39.844
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 807849 location=2 consumer group id=88620 =0 obj#=88623 tim=1402675599844408
WAIT #140096016814088: nam='Disk file operations I/O' ela= 5589 FileOperation=2 fileno=0 filetype=15 obj#=88623 tim=1402675599854817
*** 2014-06-13 17:06:40.778
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 821271 location=3 consumer group id=88620 =0 obj#=88623 tim=1402675600778500
*** 2014-06-13 17:06:41.736
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 917063 location=3 consumer group id=88620 =0 obj#=88623 tim=1402675601736754
*** 2014-06-13 17:06:42.605
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 859088 location=3 consumer group id=88620 =0 obj#=88623 tim=1402675602605611
*** 2014-06-13 17:06:43.612
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 905964 location=3 consumer group id=88620 =0 obj#=88623 tim=1402675603612339
WAIT #140096016814088: nam='direct path read' ela= 1332 file number=4 first dba=16130 block cnt=62 obj#=88623 tim=1402675603682243
Some interesting info here:
ela – Amount of time, in microseconds, that the session spent waiting for a CPU quantum allocation. If we sum all of these values, we get the total time the session spent "out of CPU";
consumer group id – The consumer group id, which maps to the DBA_RSRC_CONSUMER_GROUPS view;
obj# – The object involved in the wait itself. In our case, it is a table. Maps directly to the DBA_OBJECTS view.
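To turn those two identifiers from the trace into names, something along these lines should do (88620 and 88623 are the values from our trace file):

```sql
-- Resolve the consumer group id seen in the trace
select consumer_group
from   dba_rsrc_consumer_groups
where  consumer_group_id = 88620;

-- Resolve the obj# seen in the trace
select owner, object_name, object_type
from   dba_objects
where  object_id = 88623;
```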
Of course, if we use tkprof to help us, we get a broader picture: one of our 20 sessions waited a total of 391.34 seconds on this event during its lifetime, with a maximum single wait of 1.10 seconds for a CPU quantum allocation.
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00
  cursor: pin S wait on X                         1        0.14          0.14
  resmgr:cpu quantum                            511        1.10        391.34
  Disk file operations I/O                        4        0.00          0.01
  direct path read                              105        0.30          0.96
Conclusions:
– Do the math to define your CPU allocation in DBRM plans correctly, and be careful with over- and under-allocation, as both impact your database performance. For example, a 25% directive on an 8-CPU server gives the group roughly 2 CPUs' worth of time; spread across 20 active sessions, that is about a tenth of a CPU each.
– Always test your DBRM implementation before going live. Complex plans can be tricky to test, and if you can't measure their impact you can be in trouble. Trial and error is not a problem when you are not live.
– Understand how DBRM works! DBRM is a complex beast, and I hope this mini-series helps with that.
