Wednesday, August 18, 2010

Hot Backup datafile copy problem with CIO mount option on AIX

We encountered some Hot Backup error after we followed IBM's suggestion to remove CIO mount option from our JFS2 data volumes. 
The error is like follows:


scp /DB/VPROD/data01/sysaux01.dbf .
> cp: /DB/VPROD/data01/sysaux01.dbf: A system call received a parameter that is not valid.

This happens after we put database in hot backup mode and trying to copy datafiles.

So this approves what IBM told us, Oracle will use CIO no matter if the file system is mounted using CIO option. But this created another problem that AIX will not allow non-CIO system operation to access file opened with CIO.
There you go we back to square one, mounted the data volume back with CIO option in order to facilitate our Hot Backup. Another reason to use RMAN backup I guess.

Friday, April 09, 2010

ORA-00064: object is too large to allocate on this O/S (1,16777216)

Not sure how many of you run into this problem. It happens to one of our production database after we trying to increase the SGA size to 320G.

Oracle version is 10.2.0.4, AIX 5.3 L6 32CPUs 750G RAM

The error is
  ORA-00064: object is too large to allocate on this O/S (1,16777216)

It seems pretty conclusive OS limit problem.  However our OS ulimit for Oracle user is set to unlimited.
Actually the problem is  _ksmg_granule_size  SGA units of granules

Granule size is determined by total SGA size. On most platforms, the size of a granule is 4 MB if the total SGA size is less than 1 GB, and granule size is 16MB for larger SGAs.

In our case since we increased our SGA so big, even 16MB is not big enough to fix our needs.
After increased _ksmg_granule_size to 32MB, we are able to start the instance with 350MB SGA.


  alter system set "_ksmg_granule_size"=33554432 scope=spfile;





Thursday, March 18, 2010

Oracle instance slow startup

Recently we noticed one of our production database take longer than usual to startup.
In some cases, it took 3 to 4 hours for alter database open to complete.
The case is particularly bad for our TEST and DEV database after they got refreshed with production. Our production is very powerful 32 CPUs box, when production took like 20 to 30 minutes to open. TEST and DEV will take hours.


Sat Feb 27 06:48:55 2010
alter database open
-snip-
Sat Feb 27 09:54:22 2010
Completed: alter database open


We engaged Oracle support and they suggested to do a trace.

SQL> conn / as sysdba
SQL> startup mount
SQL> alter session set events '10046 trace name context forever, level 12';
SQL> alter database open;
Once the instance is opened, immediately turn off the 10046 tracing through that session.
SQL> alter session set events '10046 trace name context off';


The trace revealed that, database is querying two advanced queue tables used by STREAMS. The two tables are highly fragmented, for example table aq$_qt_cap_st_D had just 150 records and had 8000 + extents.


strmadmin.aq$_qt_cap_st_p
strmadmin.aq$_qt_cap_st_d


For TEST and DEV we can easily go around the issue by truncating the two tables because we are not using STREAMS on them. For production, table re-org are in order, for that we choose to use Online Redefinition.

Wednesday, January 27, 2010

tnsnames.ora is picky

Recently we added some TNS entries into our Remote Desktop servers that shared by all our users. The user report newly added entries are not working.
I did a tnsping on them and got these errors.

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION =
TNS-12533: TNS:illegal ADDRESS parameters


I then check the entry in tnsnames.ora file, it reads

VTEST.NET.COM =

(DESCRIPTION =
(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = vqadb01.netxx.com)(PORT = 1521))
)
(CONNECT_DATA =
(SID = VTEST)
)
)


Do you find anything wrong with them?

Not sure about you, but it took me a while to figure out that tnsnames.ora is pretty sensitive to the format. The problem is the new entry description all flatten out to the left instead of having indentation like this one just below.

FPROD.NET.COM=
--(DESCRIPTION =
----(ADDRESS_LIST =
-----(ADDRESS = (PROTOCOL = TCP)(HOST = FPROD)(PORT = 1528))
----)
----(CONNECT_DATA =
-----(SID = FPROD)
----)
--)

-- Oh well, I think we had same problem here on blogspot, it left align all my postings. Let me use dash instead.

So Oracle think "(DESCRIPTION" is a new entry in the tnsnames.ora instead of the description of previous entry. Because we DBA don't have control over the remote desktop server, I need to create tickets to add new entry. When I pasted the text into ticket, the ticket system left aligned them hence lost indentation.

Well, it's a small trivial problem but sometime it's hard to figure if you don't already know. Because the actual configure text are correct and worked on other hosts.

Wednesday, September 02, 2009

Stop Datapump Job

If you have a data pump job running and you want to stop it, there's a couple of ways to stop them.

First run a statement to find out the running jobs,

SELECT JOB_NAME, OWNER_NAME , STATE FROM DBA_DATAPUMP_JOBS

JOB_NAME OWNER_NAME STATE
------------------------------ ------------------------------ ------------------------------
SYS_IMPORT_FULL_02 SYSTEM EXECUTING
SYS_IMPORT_FULL_03 SYSTEM STOP PENDING
SYS_EXPORT_SCHEMA_01 SYSTEM NOT RUNNING
SYS_IMPORT_FULL_01 SYSTEM NOT RUNNING

then you can either use DBMS_DATAPUMP.ATTACH and DBMS_DATAPUMP.STOP_JOB to stop them from SQL

or

impdp userid=system/password attach=SYS_IMPORT_FULL_03
Import> STOP_JOB=IMMEDIATE
Are you sure you wish to stop this job ([yes]/no): yes

Monday, August 31, 2009

Hint, Check patch history

One of the method to check DB patch history is query registry$history

SYS>select ACTION_TIME,ACTION,NAMESPACE,VERSION,BUNDLE_SERIES from registry$history

ACTION_TIME ACTION NAMESPACE VERSION BUNDLE_SERIES
------------------------------ ---------- ---------- ------------------------------ --------------------
12-SEP-08 10.28.24.895804 PM CPU SERVER 10.2.0.3.0
12-SEP-08 10.30.59.512031 PM CPU
30-MAY-09 11.34.14.809290 AM UPGRADE SERVER 10.2.0.4.0
30-MAY-09 01.13.24.879514 PM APPLY SERVER 10.2.0.4 CPU
30-MAY-09 01.21.04.024997 PM CPU

Wednesday, August 26, 2009

OEM Grid Control Agent issue

This morning we keep getting OEM agent message from one of our production DB server. 10.2.0.4 HP-UX

It sent this Agent unreachable alert and clear alert repeatedly.

Severity=Unreachable StartMessage=Agent is Unreachable (REASON = javax.net.ssl.SSLException: SSL handshake failed: SSLSessionNotFoundErr) but the host is UP.

Severity=Unreachable ClearMessage=Agent Unreachability is cleared. The current status of the target is UP.

When check on the hosts, we observed a number of emdprocstats.pl processes taking high CPU usage and memory and running for a couple of hours.

11466 /oracle/xxx/agent10g/perl/bin/perl /oracle/xxx/agent10g/sysman/admin/scripts/emdprocstats.pl 29011 14180 /oracle/xxx/agent10g/perl/bin/perl /oracle/xxx/agent10g/sysman/admin/scripts/emdprocstats.pl 32100

It's Symptoms of BUG 5908032 described in metalink doc,
Doc ID:
437305.1

The immediate solution is to stop/start agent, or kill these processes if agent can't be stop gracefully.
The long term Solution is to apply Patch 5908032

an update on this, the other DBA told me the agent on this server was not patched after DB was upgraded from 10.2.0.3 to 10.2.0.4, the agent is still 10.2.0.3 in this case.
Also, stop agent will not remove the hung process. manually killed them.