Wednesday, August 26, 2009

OEM Grid Control Agent issue

This morning we kept getting OEM agent alerts from one of our production DB servers (10.2.0.4 on HP-UX).

It sent the following Agent Unreachable alert and clear alert repeatedly:

Severity=Unreachable StartMessage=Agent is Unreachable (REASON = javax.net.ssl.SSLException: SSL handshake failed: SSLSessionNotFoundErr) but the host is UP.

Severity=Unreachable ClearMessage=Agent Unreachability is cleared. The current status of the target is UP.

When we checked the host, we found a number of emdprocstats.pl processes that had been running for a couple of hours and were using a lot of CPU and memory.

11466 /oracle/xxx/agent10g/perl/bin/perl /oracle/xxx/agent10g/sysman/admin/scripts/emdprocstats.pl 29011
14180 /oracle/xxx/agent10g/perl/bin/perl /oracle/xxx/agent10g/sysman/admin/scripts/emdprocstats.pl 32100

These are symptoms of Bug 5908032, described in Metalink Doc ID 437305.1.

The immediate fix is to stop and start the agent, or to kill these processes if the agent can't be stopped gracefully.
The long-term solution is to apply Patch 5908032.

An update on this: another DBA told me the agent on this server was not patched when the DB was upgraded from 10.2.0.3 to 10.2.0.4, so the agent is still at 10.2.0.3.
Also, stopping the agent does not remove the hung processes. We had to kill them manually.
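
For anyone who wants to script the cleanup, here is a rough Python sketch of how the hung emdprocstats.pl processes could be picked out of a process listing. It is not what we ran; we killed them by hand. It assumes the listing has three columns (PID, elapsed time, command line), such as `ps -e -o pid,etime,args` produces on systems where ps supports -o. The function names, the one-hour threshold, and the elapsed times in the sample are made up for illustration; only the PIDs and command lines come from the listing above.

```python
def etime_to_seconds(etime):
    """Convert a ps elapsed-time string ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Pad missing leading fields, e.g. "05:12" becomes [0, 5, 12].
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_hung(ps_output, script="emdprocstats.pl", min_seconds=3600):
    """Return PIDs whose command line mentions `script` and whose
    elapsed time is at least `min_seconds` (hypothetical threshold)."""
    hung = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # pid, etime, rest of command line
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header and any malformed lines
        pid, etime, args = fields
        if script in args and etime_to_seconds(etime) >= min_seconds:
            hung.append(int(pid))
    return hung


if __name__ == "__main__":
    # Illustrative input: the elapsed times are invented for this example.
    sample = """  PID     ELAPSED COMMAND
11466    02:13:07 /oracle/xxx/agent10g/perl/bin/perl /oracle/xxx/agent10g/sysman/admin/scripts/emdprocstats.pl 29011
14180       05:12 /oracle/xxx/agent10g/perl/bin/perl /oracle/xxx/agent10g/sysman/admin/scripts/emdprocstats.pl 32100
"""
    for pid in find_hung(sample):
        print("candidate for kill:", pid)
```

The sketch only reports candidate PIDs and leaves the kill as a manual step, so each process can be checked before it is removed.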
