Wednesday, October 22, 2008

Yet another RMAN bug

One of our RMAN backups failed with:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of delete command at 10/22/2008 00:30:31
RMAN-03014: implicit resync of recovery catalog failed
RMAN-03009: failure of full resync command on default channel at 10/22/2008 00:30:31
ORA-00001: unique constraint (RMANCAT.TF_U2) violated

This is a perfect match for the bug described in:

Rman Resync Fails After Adding Temp Data File ORA-00001 on TF_U2
Doc ID: NOTE:402352.1

According to the note, this bug was supposed to be fixed in 10.2.0.2, but we hit it on 10.2.0.3.
The 10.2.0.4 patchset notes specifically mention this bug in the fixed list, so your safe bet is to upgrade to 10.2.0.4.

The dangerous part is the cause of this problem. If you drop a tempfile and recreate it with a bigger size, expect your RMAN backup to fail that night with this error if you are on 10.2.0.3 or earlier.

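The RMAN catalog table TF has a unique key on ("DBINC_KEY", "TS#", "TS_CREATE_SCN", "FILE#"). It turns out Oracle reused the same FILE# for the new tempfile but somehow failed to use a new TS_CREATE_SCN, so the resync tries to insert a row that collides with the existing one. The constraint is defined as: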

ALTER TABLE "RCAT"."TF" ADD CONSTRAINT "TF_U2" UNIQUE ("DBINC_KEY", "TS#", "TS_CREATE_SCN", "FILE#")
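
If you want to confirm the key columns in your own catalog, a query along these lines should work. It is only a sketch: it assumes you are connected as the schema that owns the TF table and relies on the standard USER_CONS_COLUMNS dictionary view.

```sql
-- Assumption: connected as the recovery catalog owner (the schema that owns TF).
-- Lists the columns that make up the TF_U2 unique constraint, in key order.
select constraint_name, table_name, column_name, position
from user_cons_columns
where constraint_name = 'TF_U2'
order by position;
```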

Anyway, this is just something to remember after you change your TEMP tablespace. Or yet another reason to stay fully patched to the terminal release.

Addendum:

It looks like a few other people on the OTN forum have had the same problem, so let me include some steps to tackle it. Since you need to remove the duplicate record that is causing the error, you first need to identify the problem record.

1. Find out the DBINC_KEY. If your RMAN catalog serves only one database, this is easy. But in most cases the catalog serves multiple databases, so you need to find the DBINC_KEY of your database by its DBID. The DBID is shown when you connect to the target in RMAN:

connected to target database: ENGDB (DBID=620206583)
Or, select dbid from v$database;

select DBID, NAME, RESETLOGS_TIME, DBINC_KEY
from rc_database_incarnation where dbid=620206583;

      DBID NAME     RESETLOGS  DBINC_KEY
---------- -------- --------- ----------
 620206583 EDB      06-DEC-06      21822
 620206583 EDB      22-OCT-05      21828

2. Find the problem FILE#:

select "DBINC_KEY", "TS#", "TS_CREATE_SCN", "FILE#"
from tf where DBINC_KEY=21822;
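
To narrow down which TF row is the problem one, it may help to list the current tempfiles on the target database and compare their FILE# and TS# with the rows returned above. This is a sketch, assuming the recreated tempfile is the one whose catalog row is stale:

```sql
-- Run on the target database (not in the catalog schema).
-- Shows each tempfile's file number, tablespace number and creation SCN.
select file#, ts#, creation_change#, name
from v$tempfile
order by file#;
```

The TF row that has the same TS# and FILE# as the recreated tempfile is the likely candidate.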

3. Note the problem record and delete it from the TF table (the table that the TF_U2 constraint is defined on).

After the delete, run a resync catalog in RMAN. With the duplicate record removed, the resync should finish.
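
Step 3 could look roughly like the following in SQL*Plus. Treat it as a sketch rather than a tested procedure: 21822 is the DBINC_KEY from the example above, ts_no and file_no are made-up substitution variable names for the values you noted in step 2, and you should back up the catalog schema before touching it by hand.

```sql
-- Sketch only: run as the catalog owner, after backing up the catalog schema.
-- &&ts_no and &&file_no are hypothetical SQL*Plus substitution variables;
-- SQL*Plus prompts for each one once and reuses the value.

-- First check exactly which row will be removed.
select "DBINC_KEY", "TS#", "TS_CREATE_SCN", "FILE#"
from tf
where DBINC_KEY = 21822
  and "TS#" = &&ts_no
  and "FILE#" = &&file_no;

-- Then delete that row and commit.
delete from tf
where DBINC_KEY = 21822
  and "TS#" = &&ts_no
  and "FILE#" = &&file_no;

commit;
```

Running the select first makes it easy to check that only the one expected row matches before anything is deleted.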

2 comments:

Anonymous said...

Hi Yingkuan,

That's a nice demo.. !!

- Pavan Kumar N

Steve Rowe said...

Hi Yingkuan

Thanks for the information, had the same issue with a datafile as opposed to a tempfile, and yes at 10.2.0.4. Your fix worked a treat.

Steve