Monday, January 31, 2011

Oracle won't do partition pruning on MAX/MIN query of partition key.


  Oracle doesn’t do a partition pruning on MAX/MIN query on partition key. Even it makes perfect sense for Oracle to scan only the partition that has MAX/MIN value. And this is not something new, the user community certainly noticed this.

http://www.oramoss.com/blog/2009/06/no-pruning-for-minmax-of-partition-key.html

  Right now, all we can do is some work around. For example one of our database use this query to figure out MAX AGG_DATE as part of daily ETL process. AGG_DATE is partition key of the table and not indexed.
The old execution plan looks like this,
Ouch and yes, the Pstart is 1 and Pstop is 1149. Oracle scanned all 1149 partitions of the table and took a very long time as expected.


SQL>  explain plan for SELECT max(AGG_DATE) from (SELECT "A1"."AGG_DATE" FROM "WEB_APPS"."COUNTER_DAY_AGG" "A1" order by AGG_DATE desc );
Explained.
SQL>  select * from table(dbms_xplan.display());


PLAN_TABLE_OUTPUT
------------------------------------
Plan hash value: 4125776214
----------------------------------------------------------------------------------| Id  | Operation            | Name                       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                            |     1 |     8 |    10M  (2)| 34:45:45 |       |       |
|   1 |  SORT AGGREGATE      |                            |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|                            |  5196M|    38G|    10M  (2)| 34:45:45 |     1 |  1149 |
|   3 |    TABLE ACCESS FULL |            COUNTER_DAY_AGG |  5196M|    38G|    10M  (2)| 34:45:45 |     1 |  1149 |
----------------------------------------------------------------------------------


Since this our daily job, the work around I put in is where clause. 
The plan looks better after that, Pstart is now KEY instead 1. In our case it will scan 7 daily partitions.
The stats give bogus running time estimate. The actual run time reduced from 20 minutes to 1 minute. 

SQL>  explain plan for SELECT MAX("A1"."AGG_DATE") FROM "ODS_WEB_APPS"."COUNTER_DAY_AGG" "A1" where AGG_DATE > sysdate-7;
Explained.
SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------Plan hash value: 1669369268

----------------------------------------------------------------------------------| Id  | Operation                 | Name                       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                            |     1 |     8 |    10M  (4)| 36:09:51 |       |       |
|   1 |  SORT AGGREGATE           |                            |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|                            |  6919K|    52M|    10M  (4)| 36:09:51 |   KEY |  1149 |
|*  3 |    TABLE ACCESS FULL      |            COUNTER_DAY_AGG |  6919K|    52M|    10M  (4)| 36:09:51 |   KEY |  1149 |
----------------------------------------------------------------------------------

Of course there's one trade off of this work around. It will limit the script's ability to catch up failed or missed loading. The script use this query to find out max loading date and catch up load from that date. So if our loading didn't run for more than 7 days, the script won't be able to catchup. I guess that's something we can live with, it's not possible that we didn't notice our daily ETL job was not running for past 7 days  :) Even in worst case scenario that really happens, we can still deal with it individually.