Monday, August 31, 2009

Hint: Check patch history

One way to check a database's patch history is to query registry$history:

SYS> select ACTION_TIME, ACTION, NAMESPACE, VERSION, BUNDLE_SERIES from registry$history;

ACTION_TIME                    ACTION     NAMESPACE  VERSION                        BUNDLE_SERIES
------------------------------ ---------- ---------- ------------------------------ --------------------
12-SEP-08 10.28.24.895804 PM   CPU        SERVER     10.2.0.3.0
12-SEP-08 10.30.59.512031 PM   CPU
30-MAY-09 11.34.14.809290 AM   UPGRADE    SERVER     10.2.0.4.0
30-MAY-09 01.13.24.879514 PM   APPLY      SERVER     10.2.0.4                       CPU
30-MAY-09 01.21.04.024997 PM   CPU

Wednesday, August 26, 2009

OEM Grid Control Agent issue

This morning we kept getting OEM agent alerts from one of our production DB servers (10.2.0.4 on HP-UX).

It repeatedly sent an Agent Unreachable alert followed by a clear alert:

Severity=Unreachable StartMessage=Agent is Unreachable (REASON = javax.net.ssl.SSLException: SSL handshake failed: SSLSessionNotFoundErr) but the host is UP.

Severity=Unreachable ClearMessage=Agent Unreachability is cleared. The current status of the target is UP.

When we checked the host, we found a number of emdprocstats.pl processes that had been running for a couple of hours and were consuming a lot of CPU and memory.

11466 /oracle/xxx/agent10g/perl/bin/perl /oracle/xxx/agent10g/sysman/admin/scripts/emdprocstats.pl 29011
14180 /oracle/xxx/agent10g/perl/bin/perl /oracle/xxx/agent10g/sysman/admin/scripts/emdprocstats.pl 32100

These are symptoms of Bug 5908032, described in Metalink Doc ID 437305.1.

The immediate fix is to stop and restart the agent, or to kill these processes if the agent can't be stopped gracefully.
The long-term solution is to apply Patch 5908032.

An update on this: another DBA told me that the agent on this server was not patched when the DB was upgraded from 10.2.0.3 to 10.2.0.4, so the agent is still at 10.2.0.3.
Also, stopping the agent does not remove the hung processes; they had to be killed manually.
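
For what it's worth, the manual cleanup can be sketched as a small shell helper. This is only a rough sketch, not what was actually typed on the server: the function name find_emdprocstats_pids is made up for illustration, and it assumes `ps -ef` style output where the PID is the second column. The `$AGENT_HOME` variable in the comments is likewise an assumption about how the environment is set up.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of emdprocstats.pl processes.
# Reads `ps -ef` style lines on stdin and assumes column 2 is the PID.
find_emdprocstats_pids() {
    # Match the script name; skip the awk process itself as a safeguard.
    awk '/emdprocstats\.pl/ && !/awk/ { print $2 }'
}

# Possible usage (left as comments so nothing is killed by accident):
#   $AGENT_HOME/bin/emctl stop agent          # AGENT_HOME is an assumption
#   ps -ef | find_emdprocstats_pids           # review the PIDs first
#   for pid in $(ps -ef | find_emdprocstats_pids); do kill "$pid"; done
#   $AGENT_HOME/bin/emctl start agent
```

Listing the PIDs before killing anything is deliberate: on a production host it is safer to confirm that only the hung emdprocstats.pl processes are in the list.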