This post is not related to Oracle. It's just a hint for a problem I ran into while setting up my home network by connecting a new wireless router to an existing one.
At first this task seemed super easy, a no-brainer: just connect the 'Internet' port of the new router to any local Ethernet port on the existing router, set up the new router, and voila!
Oh well, it doesn't work. The new router keeps complaining it's not connected to the internet. The Internet setup page shows it got 127.0.0.1 (localhost) as its DHCP address from the old router, which obviously doesn't work. Some network guru has probably already figured out the problem just from reading that.
So why does it get a 127.0.0.1 address instead of a valid DHCP lease? The trick is that most routers use the 192.168.1.1 address and subnet by default. If two routers share the same address, of course the new one ends up with localhost as its address, thinking it is 192.168.1.1 itself.
The solution is easy: change the new router's default subnet to 192.168.2.x (e.g. 192.168.2.1), or change the new router's IP to something like 192.168.1.10.
Monday, August 29, 2011
ORA-01555 with Query Duration=0 sec
Most DBAs know that ORA-01555 is caused by a long-running query, and the alert.log will tell you which SQL hit the ORA-01555 and how long it ran.
However, from time to time you will see an error like the one below, which basically tells you the query failed right away. So why is that?
Mon Aug 29 06:39:09 2011
ORA-01555 caused by SQL statement below (SQL ID: 0jc2g6km899ps, Query Duration=0 sec, SCN: 0x00ae.75483a06):
Mon Aug 29 06:39:09 2011
SELECT.xxxx

I ran a query to find the timestamp of this query's SCN and found that it corresponds to 6 AM, yet the statement failed about 40 minutes later. That can only mean one thing: it was part of a transaction that started at 6 AM, and by the time it needed that data Oracle had already overwritten it in UNDO.
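(A side note: the decimal SCN used below corresponds to the hex pair in the alert.log entry, wrap 0x00ae and base 0x75483a06, combined as wrap * 2^32 + base. A quick sketch of the conversion, not part of the original diagnosis output:)

SQL> -- 0x00ae * 2^32 + 0x75483a06 = 749291977222
SQL> SELECT TO_NUMBER('00AE','XXXX') * POWER(2,32) + TO_NUMBER('75483A06','XXXXXXXX') AS scn FROM dual;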
SYS@VAULTPROD>select scn_to_timestamp(749291977222) from dual;
SCN_TO_TIMESTAMP(749291977222)
---------------------------------------------------------------------------
29-AUG-11 06.00.01.000000000 AM
There are a couple of ways to improve the situation.
- Do all the statements in this job need to be in a single transaction? If not, don't put them in one.
- Increase the database's undo retention; Oracle will try to honor it, subject to UNDO space (see the sketch below).
- Increase the UNDO tablespace to mitigate a potential space squeeze, though remember the reason we got this error is not an UNDO space limitation.
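A minimal sketch of the relevant knobs (the undo datafile name and the sizes here are hypothetical; adjust to your environment):

SQL> -- current retention target (seconds) and the retention Oracle has actually been able to keep
SQL> show parameter undo_retention
SQL> SELECT MAX(tuned_undoretention) FROM v$undostat;
SQL> -- raise the retention target; it is only honored while UNDO space allows
SQL> ALTER SYSTEM SET undo_retention = 10800 SCOPE=BOTH;
SQL> -- give the UNDO tablespace more room (hypothetical file name and size)
SQL> ALTER DATABASE DATAFILE '/DB/VPROD/data01/undotbs01.dbf' RESIZE 8G;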
Wednesday, March 09, 2011
Data pump expdp failed with DMSYS related errors ORA-39126 ORA-06512 etc
Today one of our Data Pump export/import jobs failed with the errors attached at the bottom. The process was working fine before our 11g upgrade.
I did a little research and found that Metalink doc 304449.1 has the perfect solution.
The problem is that we removed some unused database options before we upgraded from 10g to 11g.
The reason: with all these unnecessary options installed, the upgrade scripts run for almost two hours; with them removed, the upgrade finishes in about 15 minutes.
It turns out the DMSYS Data Mining option was among them, but somehow Oracle didn't remove it cleanly and left some records behind in the Data Pump export callout table.
The solution in this case is to delete those records:
SQL> DELETE FROM exppkgact$ WHERE SCHEMA='DMSYS';
SQL> commit;
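To double-check before and after the delete (a sketch, assuming a privileged connection):

SQL> -- the Data Mining option should no longer be registered
SQL> SELECT comp_id, comp_name, status FROM dba_registry WHERE comp_id = 'ODM';
SQL> -- and no DMSYS callouts should remain in the Data Pump export table
SQL> SELECT * FROM exppkgact$ WHERE SCHEMA='DMSYS';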
There are other potential causes for the same error. You can check the metalink doc for more info.
Database Data Pump Export fails with PLS-00201 identifier DMSYS.DBMS_MODEL_EXP must be declared [ID 304449.1]
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Starting "SYSTEM"."SYS_IMPORT_SCHEMA_11": userid=system/********@TEST parfile=/home/oracle/dba/sql/DWS.par
Estimate in progress using BLOCKS method...
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
ORA-39126: Worker unexpected fatal error in KUPW$WORKER.GET_TABLE_DATA_OBJECTS []
ORA-31642: the following SQL statement fails:
BEGIN "DMSYS"."DBMS_DM_MODEL_EXP".SCHEMA_CALLOUT(:1,0,1,'11.02.00.00.00'); END;
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86
ORA-06512: at "SYS.DBMS_METADATA", line 1245
ORA-04063: package body "DMSYS.DBMS_DM_MODEL_EXP" has errors
ORA-06508: PL/SQL: could not find program unit being called: "DMSYS.DBMS_DM_MODEL_EXP"
ORA-06512: at "SYS.DBMS_METADATA", line 5300
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86
ORA-06512: at "SYS.KUPW$WORKER", line 8159
----- PL/SQL Call Stack -----
object line object
handle number name
70000007ddbc258 19028 package body SYS.KUPW$WORKER
70000007ddbc258 8191 package body SYS.KUPW$WORKER
70000007ddbc258 12728 package body SYS.KUPW$WORKER
70000007ddbc258 4618 package body SYS.KUPW$WORKER
70000007ddbc258 8902 package body SYS.KUPW$WORKER
70000007ddbc258 1651 package body SYS.KUPW$WORKER
70000007eaf9060 2 anonymous block
Monday, January 31, 2011
Oracle won't do partition pruning on MAX/MIN query of partition key.
Oracle doesn't do partition pruning for a MAX/MIN query on the partition key, even though it would make perfect sense to scan only the partition that can contain the MAX/MIN value. This is not something new; the user community has certainly noticed it.
http://www.oramoss.com/blog/2009/06/no-pruning-for-minmax-of-partition-key.html
Right now, all we can do is work around it. For example, one of our databases uses the query below to figure out the MAX AGG_DATE as part of a daily ETL process. AGG_DATE is the partition key of the table and is not indexed.
The old execution plan looks like this. Ouch: Pstart is 1 and Pstop is 1149, meaning Oracle scanned all 1,149 partitions of the table and, as expected, took a very long time.
SQL> explain plan for SELECT max(AGG_DATE) from (SELECT "A1"."AGG_DATE" FROM "WEB_APPS"."COUNTER_DAY_AGG" "A1" order by AGG_DATE desc );
Explained.
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
------------------------------------
Plan hash value: 4125776214
---------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                 |     1 |     8 |    10M (2) | 34:45:45 |       |       |
|   1 | SORT AGGREGATE       |                 |     1 |     8 |            |          |       |       |
|   2 | PARTITION RANGE ALL  |                 | 5196M |   38G |    10M (2) | 34:45:45 |     1 |  1149 |
|   3 | TABLE ACCESS FULL    | COUNTER_DAY_AGG | 5196M |   38G |    10M (2) | 34:45:45 |     1 |  1149 |
---------------------------------------------------------------------------------------------------------
Since this is a daily job, the workaround I put in is a WHERE clause on the partition key.
The plan looks better after that: Pstart is now KEY instead of 1, and in our case it will scan only 7 daily partitions.
The plan statistics still give a bogus run-time estimate, but the actual run time dropped from 20 minutes to 1 minute.
SQL> explain plan for SELECT MAX("A1"."AGG_DATE") FROM "ODS_WEB_APPS"."COUNTER_DAY_AGG" "A1" where AGG_DATE > sysdate-7;
Explained.
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
------------------------------------
Plan hash value: 1669369268
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |     1 |     8 |    10M (4) | 36:09:51 |       |       |
|   1 | SORT AGGREGATE            |                 |     1 |     8 |            |          |       |       |
|   2 | PARTITION RANGE ITERATOR  |                 | 6919K |   52M |    10M (4) | 36:09:51 |   KEY |  1149 |
|*  3 | TABLE ACCESS FULL         | COUNTER_DAY_AGG | 6919K |   52M |    10M (4) | 36:09:51 |   KEY |  1149 |
--------------------------------------------------------------------------------------------------------------
Of course, there is one trade-off with this workaround: it limits the script's ability to catch up on failed or missed loads. The script uses this query to find the max loading date and catches up from that date, so if the load hasn't run for more than 7 days, the script won't be able to catch up on its own. I guess that's something we can live with; it's unlikely we wouldn't notice that our daily ETL job hadn't been running for the past 7 days :) Even in the worst case, if that really happens, we can still deal with it manually.
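If that corner case is a concern, one option (just a sketch, not what we deployed; schema and table names as in the workaround query above) is to fall back to the full scan only when the pruned window comes back empty:

SQL> -- COALESCE short-circuits, so the expensive all-partition scan only runs when the 7-day window has no rows
SQL> SELECT COALESCE( (SELECT MAX(AGG_DATE) FROM "ODS_WEB_APPS"."COUNTER_DAY_AGG" WHERE AGG_DATE > sysdate-7), (SELECT MAX(AGG_DATE) FROM "ODS_WEB_APPS"."COUNTER_DAY_AGG") ) FROM dual;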
Friday, October 29, 2010
ORA-01591 and quick solution
One of our users reported getting this error from the application.
ORA-01591: lock held by in-doubt distributed transaction 4.7.533420
We don't see this error often, so I did a little research.
Oracle's error message documentation has a pretty good explanation but doesn't spell out how to resolve it.
ORA-01591: lock held by in-doubt distributed transaction string
Cause: Trying to access resource that is locked by a dead two-phase commit transaction that is in prepared state.
Action: DBA should query the pending_trans$ and related tables, and attempt to repair network connection(s) to coordinator and commit point. If timely repair is not possible, DBA should contact DBA at commit point if known or end user for correct outcome, or use heuristic default if given to issue a heuristic commit or abort command to finalize the local portion of the distributed transaction.
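Before forcing anything, it is worth checking the state of the in-doubt transaction so you know whether a forced commit or a forced rollback is appropriate; a quick sketch:

SQL> -- state, failure time and origin of the pending distributed transaction
SQL> SELECT local_tran_id, global_tran_id, state, fail_time, host, db_user FROM dba_2pc_pending;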
What I ended up doing is pretty easy. ROLLBACK FORCE alone didn't do the trick (the entry stayed in dba_2pc_pending); DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY did.
SQL> select local_tran_id from dba_2pc_pending;
LOCAL_TRAN_ID
----------------------
4.7.533420
SQL> rollback force '4.7.533420';
Rollback complete.
SQL> select local_tran_id from dba_2pc_pending;
LOCAL_TRAN_ID
----------------------
4.7.533420
SQL> exec dbms_transaction.purge_lost_db_entry('4.7.533420');
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
SQL> select local_tran_id from dba_2pc_pending;
no rows selected
Tuesday, September 28, 2010
ORA-12547 and procmap error while running sqlplus
If you get the following error message while trying to run sqlplus on IBM AIX 5L, it's basically because /proc is not mounted on your server; the fix is simply to mount it.
/DB/../10204-64/network/admin PROD 341 >sqlplus / as sysdba
/usr/bin/procmap : no such process : 373080
/usr/bin/procmap : no such process : 373080
/usr/bin/procmap : no such process : 373080
SQL*Plus: Release 10.2.0.4.0 - Production on Tue Sep 28 15:39:02 2010
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
ERROR:
ORA-12547: TNS:lost contact
This is an exact match of
ORA-12547 connecting to sqlplus / as sysdba on IBM AIX 5L [ID 372143.1]
Wednesday, August 18, 2010
Hot Backup datafile copy problem with CIO mount option on AIX
We encountered a hot backup error after following IBM's suggestion to remove the CIO mount option from our JFS2 data volumes.
The error looks like this:
scp /DB/VPROD/data01/sysaux01.dbf .
> cp: /DB/VPROD/data01/sysaux01.dbf: A system call received a parameter that is not valid.
This happens after we put the database in hot backup mode and try to copy the datafiles.
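For context, a minimal sketch of the user-managed hot backup flow where the copy takes place (using the database-wide BEGIN BACKUP; per-tablespace BEGIN BACKUP behaves the same way):

SQL> ALTER DATABASE BEGIN BACKUP;
SQL> -- copy the datafiles at the OS level here; this is the step where the cp above failed
SQL> ALTER DATABASE END BACKUP;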
So this confirms what IBM told us: Oracle opens its datafiles with CIO regardless of whether the file system is mounted with the CIO option. But that creates another problem: AIX will not allow a non-CIO operation to access a file that is already open with CIO.
So there you go, we were back to square one and remounted the data volumes with the CIO option to make our hot backup work. Another reason to use RMAN backups, I guess.