Sunday, November 20, 2011

GATHER_TABLE_STATS and ORA-01652

We recently hit an ORA-01652 error on one of our production databases. The culprit was the automatic GATHER_STATS_JOB, which calls GATHER_TABLE_STATS. By the way, this DB is 10.2.0.4.



ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
*** 2011-11-20 20:28:51.105
GATHER_STATS_JOB: GATHER_TABLE_STATS('"L53"','"L_CARD"','""', ...)
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

Originally it errored out and paged at 4 AM, which is really not the preferred timing for the on-call DBA.
Since this DB tends to have a higher load during the early morning anyway, I changed the maintenance window to the late afternoon.
To change the schedule of this automatic statistics collection job, use this command:


BEGIN
  DBMS_SCHEDULER.SET_ATTRIBUTE (
    name      => 'GATHER_STATS_JOB',
    attribute => 'repeat_interval',
    value     => 'freq=daily;byday=SUN,MON,TUE,WED,THU,FRI,SAT;byhour=17;byminute=0;bysecond=0');
END;
/
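You can verify the new schedule afterwards with a quick sanity check against the scheduler dictionary view (assuming the default job name):

SQL> SELECT repeat_interval FROM dba_scheduler_jobs WHERE job_name = 'GATHER_STATS_JOB';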




However, this apparently didn't address the root cause of the issue. A couple of days later the job failed again with the same error.

Increasing the TEMP tablespace is not an option. The TEMP tablespace on this DB is 95G; the job ran for 3 hours and used it all, so adding more TEMP would only delay the inevitable.
I decided to change the estimate percent from auto sampling to 1%, and this fixed the issue. I did some research on Google about this, but there's not much useful past discussion.
I only found this AskTom thread, which was pretty helpful in pointing to the right statement for changing the defaults used by GATHER_TABLE_STATS:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:652425700346984666
ops$tkyte%ORA10GR2> select dbms_stats.get_param( 'estimate_percent' ) from dual;

DBMS_STATS.GET_PARAM('ESTIMATE_PERCENT')
-------------------------------------------------------------------------------
DBMS_STATS.AUTO_SAMPLE_SIZE

You can reset or set the defaults with the procedures documented here:

http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_stats.htm#i1047505

http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_stats.htm#i1048566
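For example, here is a minimal sketch of the change I made: switching the database-wide default from auto sampling to a fixed 1% with SET_PARAM (10g syntax; in 11g this is superseded by SET_GLOBAL_PREFS):

BEGIN
  -- change the default from DBMS_STATS.AUTO_SAMPLE_SIZE to a fixed 1% sample
  DBMS_STATS.SET_PARAM('ESTIMATE_PERCENT', '1');
END;
/

-- verify the new default
SELECT DBMS_STATS.GET_PARAM('estimate_percent') FROM dual;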


Wednesday, September 07, 2011

How to connect wireless router to another wireless router

This post is not related to Oracle. It's just a hint for a problem I found while setting up my home network by connecting a new wireless router to an existing one.

At the beginning, this task seemed super easy, a no-brainer: just connect the 'Internet' port of the new router to any local Ethernet port on the existing router, set up the new router, and voila!

Oh well, it didn't work. The new router kept complaining that it was not connected to the internet. The Internet setup page showed it got 127.0.0.1 (localhost) as its DHCP address from the old router, and that obviously doesn't work. Some network gurus have probably already figured out the problem just from seeing this.

So why did it get a 127.0.0.1 address instead of a valid DHCP lease? Well, the trick is that most routers use the 192.168.1.1 address and subnet by default. So if two routers share the same address, of course the new one ends up with localhost as its address, thinking it is 192.168.1.1 itself.

The solution is easy: change the new router's default subnet (to 192.168.2.1, etc.), or change the new router's IP to something like 192.168.1.10.

Monday, August 29, 2011

ORA-01555 with Query Duration=0 sec

Most DBAs know that ORA-01555 is caused by long-running queries, and the alert.log file will tell you which SQL caused the ORA-01555 and how long it ran.
However, from time to time you will see errors like the one below, which basically tells you that the query failed right away. So why is that?


Mon Aug 29 06:39:09 2011
ORA-01555 caused by SQL statement below (SQL ID: 0jc2g6km899ps, Query Duration=0 sec, SCN: 0x00ae.75483a06):
Mon Aug 29 06:39:09 2011
SELECT.xxxx

I ran a query to find the timestamp of this query's SCN and found that the SCN corresponds to 6 AM, yet the query failed 40 minutes later. That can only mean one thing: the query was part of a transaction that started at 6 AM, and Oracle had already overwritten the data it needed in UNDO.

SYS@VAULTPROD>select scn_to_timestamp(749291977222) from dual;
SCN_TO_TIMESTAMP(749291977222)
---------------------------------------------------------------------------
29-AUG-11 06.00.01.000000000 AM
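As an aside, the decimal SCN fed into scn_to_timestamp is just the two hex halves from the alert.log line (0x00ae.75483a06) joined together and converted to decimal; a quick sketch of that conversion:

SYS@VAULTPROD>SELECT TO_NUMBER('00AE75483A06', 'XXXXXXXXXXXX') AS scn FROM dual;

         SCN
------------
749291977222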


There are a couple of ways to improve the situation:
  • Do all the statements in this job need to be in a single transaction? If not, don't put them in one.
  • Increase the undo retention of the DB; Oracle will try to honor this retention, subject to UNDO space (see the sketch after this list).
  • Increase the UNDO tablespace to mitigate the potential space squeeze, but remember that the reason we got this error is not an UNDO space limitation.
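A minimal sketch of the second option, assuming an spfile and a 3-hour retention target (pick a value longer than your longest transaction):

SQL> ALTER SYSTEM SET undo_retention = 10800 SCOPE=BOTH;

-- check the retention Oracle has actually been achieving
SQL> SELECT MAX(tuned_undoretention) FROM v$undostat;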

Wednesday, March 09, 2011

Data pump expdp failed with DMSYS related errors ORA-39126 ORA-06512 etc

Today one of our data pump export/import jobs failed with the errors attached at the bottom. The process had been working fine before our 11g upgrade.
I did a little research and found that Metalink doc 304449.1 has the perfect solution.


The problem is that we removed some unused database options before upgrading from 10g to 11g.
The reason is that with all these unnecessary options installed, the upgrade scripts would run for almost two hours; with them removed, the upgrade finishes in 15 minutes.


It turns out the DMSYS data mining option was among them, but somehow Oracle didn't cleanly remove the option, leaving some leftover records in a data pump export table.


The solution in this case is to delete these records:


SQL> DELETE FROM exppkgact$ WHERE SCHEMA='DMSYS';
SQL> commit;
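Before deleting, it doesn't hurt to look at what's actually left behind (connect as SYS; this check is my own sanity step, not from the note):

SQL> SELECT * FROM exppkgact$ WHERE SCHEMA = 'DMSYS';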



There are other potential causes for the same error. You can check the metalink doc for more info.


Database Data Pump Export fails with PLS-00201 identifier DMSYS.DBMS_MODEL_EXP must be declared [ID 304449.1]


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Starting "SYSTEM"."SYS_IMPORT_SCHEMA_11":  userid=system/********@TEST parfile=/home/oracle/dba/sql/DWS.par 
Estimate in progress using BLOCKS method...
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
ORA-39126: Worker unexpected fatal error in KUPW$WORKER.GET_TABLE_DATA_OBJECTS [] 
ORA-31642: the following SQL statement fails: 
BEGIN "DMSYS"."DBMS_DM_MODEL_EXP".SCHEMA_CALLOUT(:1,0,1,'11.02.00.00.00'); END;
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86
ORA-06512: at "SYS.DBMS_METADATA", line 1245
ORA-04063: package body "DMSYS.DBMS_DM_MODEL_EXP" has errors
ORA-06508: PL/SQL: could not find program unit being called: "DMSYS.DBMS_DM_MODEL_EXP"
ORA-06512: at "SYS.DBMS_METADATA", line 5300
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86
ORA-06512: at "SYS.KUPW$WORKER", line 8159
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
70000007ddbc258     19028  package body SYS.KUPW$WORKER
70000007ddbc258      8191  package body SYS.KUPW$WORKER
70000007ddbc258     12728  package body SYS.KUPW$WORKER
70000007ddbc258      4618  package body SYS.KUPW$WORKER
70000007ddbc258      8902  package body SYS.KUPW$WORKER
70000007ddbc258      1651  package body SYS.KUPW$WORKER
70000007eaf9060         2  anonymous block

Monday, January 31, 2011

Oracle won't do partition pruning for a MAX/MIN query on the partition key.


  Oracle doesn't do partition pruning for a MAX/MIN query on the partition key, even though it would make perfect sense for Oracle to scan only the partition that holds the MAX/MIN value. This is nothing new, and the user community has certainly noticed it:

http://www.oramoss.com/blog/2009/06/no-pruning-for-minmax-of-partition-key.html

  Right now, all we can do is work around it. For example, one of our databases uses the query below to figure out the MAX AGG_DATE as part of a daily ETL process. AGG_DATE is the partition key of the table and is not indexed.
The old execution plan is below.
Ouch. Yes, Pstart is 1 and Pstop is 1149: Oracle scanned all 1149 partitions of the table and, as expected, took a very long time.


SQL>  explain plan for SELECT max(AGG_DATE) from (SELECT "A1"."AGG_DATE" FROM "WEB_APPS"."COUNTER_DAY_AGG" "A1" order by AGG_DATE desc );
Explained.
SQL>  select * from table(dbms_xplan.display());


PLAN_TABLE_OUTPUT
------------------------------------
Plan hash value: 4125776214

-------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                 |     1 |     8 |    10M  (2)| 34:45:45 |       |       |
|   1 |  SORT AGGREGATE      |                 |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|                 |  5196M|    38G|    10M  (2)| 34:45:45 |     1 |  1149 |
|   3 |    TABLE ACCESS FULL | COUNTER_DAY_AGG |  5196M|    38G|    10M  (2)| 34:45:45 |     1 |  1149 |
-------------------------------------------------------------------------------------------------------


Since this is our daily job, the workaround I put in is a WHERE clause.
The plan looks better after that: Pstart is now KEY instead of 1; in our case it will scan 7 daily partitions.
The stats still give a bogus running time estimate, but the actual run time dropped from 20 minutes to 1 minute.

SQL>  explain plan for SELECT MAX("A1"."AGG_DATE") FROM "ODS_WEB_APPS"."COUNTER_DAY_AGG" "A1" where AGG_DATE > sysdate-7;
Explained.
SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------
Plan hash value: 1669369268

------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |     1 |     8 |    10M  (4)| 36:09:51 |       |       |
|   1 |  SORT AGGREGATE           |                 |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|                 |  6919K|    52M|    10M  (4)| 36:09:51 |   KEY |  1149 |
|*  3 |    TABLE ACCESS FULL      | COUNTER_DAY_AGG |  6919K|    52M|    10M  (4)| 36:09:51 |   KEY |  1149 |
------------------------------------------------------------------------------------------------------------

Of course, there's one trade-off with this workaround: it limits the script's ability to catch up after failed or missed loads. The script uses this query to find the max loaded date and catches up from that date, so if our load hasn't run for more than 7 days, the script won't be able to catch up. I guess that's something we can live with; it's unlikely we'd fail to notice our daily ETL job not running for 7 straight days :) Even in the worst case, if that really happens, we can still deal with it individually.
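If that blind spot ever becomes a real concern, one possible hedge (my own sketch, using the table from the plans above) is to fall back to the slow full scan only when the 7-day window comes up empty. COALESCE uses short-circuit evaluation, so the expensive second branch runs only when the first subquery returns NULL:

SELECT COALESCE(
         (SELECT MAX(agg_date)
            FROM web_apps.counter_day_agg
           WHERE agg_date > SYSDATE - 7),   -- fast path: prunes to ~7 partitions
         (SELECT MAX(agg_date)
            FROM web_apps.counter_day_agg)  -- fallback: scans all partitions
       ) AS last_loaded_date
  FROM dual;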