Monday, November 16, 2015

Can't start listener after server reboot. TNS-12531: TNS:cannot allocate memory


Had an interesting problem after rebooting a test Oracle DB server. The listener won't come back properly post a reboot. It's simply hung and we found following errors in logfile eventually.

TNS-12531: TNS:cannot allocate memory

However this error is very misleading and confusing because there's plenty of memory on the host. As matter of fact the Oracle instance itself is not started yet. 
Found some hint from Oracle support doc. 

Oracle 11g Listener Services & Databases fails to start after server reboot

Apparently, the /etc/resolv.conf miss a domain in search string. We undertook some domain changes internally. Since all our DB servers are not managed by puppet, this host was overlooked.

Problem solved by adding proper domain names into /etc/resolv.conf
However still need to dig deeper as to why Oracle behave so strangely and spit out misleading errors.

Tuesday, April 22, 2014

Install PHP OCI8 extension for Codeigniter on Mac OSX




  Working on a DBA dashboard using Codeigniter and Bootstrap , the most popular platform to build professional looking web apps. To make easy development, I installed MAMP on my MAC, so I can have an environment with me on the go. Being primarily focused on LAMP development, MAMP installed ready for MYSQL connection, however if you need connect to Oracle database, you need a little bit of work. oci8 is not fully configured and you need to do it manually.
  To save you some time I compiled a list that works for me so you don't have to crawl all over internet to find them.
  Among all the blogs and posts I checked, this one enabling-oracle-oci8-php-extension-on-os-x came closest to a full list, it only missing the part how to get PECL and autoconf works on your MAC.


Install PEAR on your MAC 

curl -O http://pear.php.net/go-pear.phar sudo php -d detect_unicode=0 go-pear.phar

Download these and unzip into same Dir
Create a bunch of symbolic link for compilesudo cp instantclient_11_2/sdk/include/*.h /usr/include
sudo cp instantclient_11_2/sqlplus /usr/bin

sudo cp instantclient_11_2/*.dylib /usr/lib
sudo cp instantclient_11_2/*.dylib.* /usr/lib


Now move to the /usr/lib directory and create the following link:sudo ln -s libclntsh.dylib.11.1 libclntsh.dylib

At this point, when you runsudo pecl install oci8You likely to get these errors:
Cannot find autoconf. Please check your autoconf installation and the $PHP_AUTOCONF environment variable.

autoconf need to manually installed on MAC
curl http://ftp.gnu.org/gnu/autoconf/autoconf-latest.tar.gz > autoconf.tar.gz

Untar it
tar -xvzf autoconf.tar.gz
cd autoconf-2.69 ./configure 
make sudo make install 
export PHP_AUTOCONF=/usr/local/bin/autoconf

The oci8 installation should be successful after this. These are the messages for successful installation.
sudo pecl install oci8
Installing '/usr/lib/php/extensions/no-debug-non-zts-20100525/oci8.so'install ok: channel://pecl.php.net/oci8-2.0.8configuration option "php_ini" is not set to php.ini locationYou should add "extension=oci8.so" to php.ini

Monday, March 31, 2014

Something about SQL Server Isolation Level


  As we all know Oracle's default transaction isolation level is Read Committed. That means you can read data from changes made by other session as long as it's committed. Readers are not blocking writers and vice versa.

   The default isolation level of MS SQL Server is also Read Committed, however, from time to time our users saw their select blocked by other sessions and after they added NOWAIT hint, you get these errors
  Lock request time out period exceeded
  I did a quick testing and confirmed the observation. A select statement will be blocked by uncommitted transaction on underlying table and if put NOWAIT hint, the query will exit will the error. 
  
  There's another hint NOLOCK in SQL Server, however, that's definitely not preferable option, because the behavior of NOLOCK is same as setting transaction isolation level to READ UNCOMMITTED. Which mean you could read dirty data that changed by other session but not committed yet.
  
  After a little digging, I found SQL Server documented this behavior. Basically you need to set database option READ_COMMITTED_SNAPSHOT to ON, by default it was set to OFF.

  select * from sys.databases


  The SNAPSHOT serving the similar function as Oracle Rollback segment, holding pre-image of changes made by transaction. I am surprised SQL Server by default setting it to 0 (OFF) which essentially disabled default isolation level READ Committed. Means all read will require shared lock on table and if there's open transaction, the read is blocked.

 http://technet.microsoft.com/en-us/library/ms173763.aspx

  Of course the quick fix is set to READ_COMMITTED_SNAPSHOT to ON which can be done at database level of SQL Server.
  ALTER DATABASE test SET READ_COMMITTED_SNAPSHOT ON

Quick follow up on this, as expected there's performance penalty by turning on Read Committed Snapshot. A blog talked about this a while ago. Apparently, SQL Server does not improve much in the last few years. 
Clearly SQL Server is not suitable for high volume read/write load.  Small Biz and office use please.

Performance Impact: The Potential Cost of Read_Committed_Snapshot

Friday, January 04, 2013

HTML code for "Please Wait" message




Not usual DBA stuff,
I was designing a web page for my team to quickly checking the status of all TEST, DEV and PROD databases. Since we have a hundreds instances to check, the page will take a while to load. Found this code from google, perfect fit for my need.

Code:
<script type="text/javascript">
var ray={
ajax:function(st)
 {
  this.show('load');
 },
show:function(el)
 {
  this.getID(el).style.display='';
 },
getID:function(el)
 {
  return document.getElementById(el);
 }
}
</script>
<style type="text/css">
#load{
position:absolute;
z-index:1;
border:3px double #999;
background:#f7f7f7;
width:400px;
height:200px;
margin-top:-150px;
margin-left:-150px;
top:50%;
left:50%;
text-align:center;
line-height:300px;
font-family:"Trebuchet MS", verdana, arial,tahoma;
font-size:18pt;
}
</style>
<div id="load" style="display:none;">Checking DB ... Please wait</div>
<form action="http://www.yahoo.com" method="post" onsubmit="return ray.ajax()">
<input type="text" value="Test" name="q">
<input type="submit" value="Search">
</form>

Thursday, November 29, 2012

A Typical Case of SQL Tuning using 11g SQL Baseline



Running Swingbench on TEST Server found performance tanked.
Found this frequent execute SQL had a plan change and taking massive Temp space and IO

SQL> @check_sql_plan_change
Enter value for sql_id: 7hk2m2702ua0g
old   6: where sql_id = '&sql_id'
new   6: where sql_id = '7hk2m2702ua0g'

   SNAP_ID   NODE BEGIN_INTERVAL_TIME            SQL_ID        PLAN_HASH_VALUE        EXECS    AVG_ETIME        AVG_LIO
---------- ------ ------------------------------ ------------- --------------- ------------ ------------ --------------
         7      1 28-NOV-12 04.10.03.425 PM      7hk2m2702ua0g      3283867725       76,049         .005           20.4
         8      1 28-NOV-12 04.20.09.222 PM      7hk2m2702ua0g                      118,691         .005           20.4
         9      1 28-NOV-12 04.37.36.000 PM      7hk2m2702ua0g                       13,106         .014           20.4
        10      1 28-NOV-12 04.48.33.912 PM      7hk2m2702ua0g                       43,318         .007           20.4
        11      1 28-NOV-12 05.00.35.170 PM      7hk2m2702ua0g                      121,633         .004           20.3
        12      1 28-NOV-12 05.10.37.015 PM      7hk2m2702ua0g                       56,663         .004           20.3
       114      1 29-NOV-12 10.10.14.781 AM      7hk2m2702ua0g      2125236239        1,400       69.475        5,146.7
       115      1 29-NOV-12 10.20.15.130 AM      7hk2m2702ua0g                        6,284       65.791        7,295.8
       116      1 29-NOV-12 10.30.16.251 AM      7hk2m2702ua0g                          700       71.816       13,954.5

LOAD Plans to SQLSET from AWR

declare
baseline_ref_cursor DBMS_SQLTUNE.SQLSET_CURSOR;
begin open
baseline_ref_cursor for select VALUE(p) from table(DBMS_SQLTUNE.SELECT_WORKLOAD_REPOSITORY(1,132,NULL,NULL,NULL,NULL,NULL,NULL,NULL,'ALL')) p;
DBMS_SQLTUNE.LOAD_SQLSET('SWING', baseline_ref_cursor);
end;

LOAD Plans to Baseline from SQLSET

DECLARE 
  my_10gplans PLS_INTEGER;
BEGIN
  my_10gplans := DBMS_SPM.LOAD_PLANS_FROM_SQLSET( sqlset_name => 'SWING' );
END;

Checking loaded plans

SQL> SELECT sql_handle, plan_name, enabled, accepted , ELAPSED_TIME FROM   dba_sql_plan_baselines where PARSING_SCHEMA_NAME='SOE';

SQL_HANDLE                     PLAN_NAME                      ENA ACC ELAPSED_TIME
------------------------------ ------------------------------ --- --- ------------
SQL_7aa3962c9ae2db34           SQL_PLAN_7p8wq5kdf5qtn641e1b0c YES YES   2249855466
SQL_7aa3962c9ae2db34           SQL_PLAN_7p8wq5kdf5qtnc4b22a15 YES YES   6.5511E+11

Disable unwanted plan, enable and fix desired plan


DECLARE
     i NATURAL;
    BEGIN
      i := dbms_spm.alter_sql_plan_baseline('SQL_7aa3962c9ae2db34', attribute_name => 'enabled', attribute_value=>'NO');
      dbms_output.put_line(i);
    END;


SQL> set serveroutput on
SQL> DECLARE
  2   i NATURAL;
  3  BEGIN
  4    i := dbms_spm.alter_sql_plan_baseline('SQL_7aa3962c9ae2db34', 'SQL_PLAN_7p8wq5kdf5qtn641e1b0c', attribute_name => 'enabled', attribute_value=>'YES');
  5    dbms_output.put_line(i);
  6  END;
  7 
  8 
  9  /
PL/SQL procedure successfully completed.

SQL> SELECT sql_handle, plan_name, enabled, accepted , ELAPSED_TIME, fixed FROM   dba_sql_plan_baselines where PARSING_SCHEMA_NAME='SOE';

SQL_HANDLE                     PLAN_NAME                      ENA ACC ELAPSED_TIME FIX
------------------------------ ------------------------------ --- --- ------------ ---
SQL_7aa3962c9ae2db34           SQL_PLAN_7p8wq5kdf5qtn641e1b0c YES YES   2249855466 NO
SQL_7aa3962c9ae2db34           SQL_PLAN_7p8wq5kdf5qtnc4b22a15 NO  YES   6.5511E+11 NO

Now Swingbench using correct plan and perform as expected.

Thursday, October 25, 2012

AIX ASM Diskgroup reporting wrong Lun size


  We discovered this strange problem while trying to convert one of our big databases from file system to ASM. After we assigned a 1TB lun to AIX and created a diskgroup on top of it. ASM reporting wrong disk size on it. ASM only be able to detect around 100G out of 1TB.
  After a few poke around to make sure we didn't make any mistake provision the storage to the OS. I did some research on Metalink. And fair enough, I found this is a known bug on AIX ASM

Bug 9495887 AIX: ASM does not recognize correct diskgroup size for large disks

-------------------------------------------------------------------------------
Device         Size (GB)  Paths  Vol Name       Vol Id   XIV Id   XIV Host     
-------------------------------------------------------------------------------
/dev/hdisk4    1135.7     5/5    hedata16     113      7825812  heproddb102


NAME            PATH              GROUP_NUMBER   TOTAL_MB    FREE_MB READS WRITES 
--------------- ----------------- ------------ ---------- ---------- ----- ------ 
TEST_0000       /dev/rhdisk4                 5     101920     101800    60     10 

The workaround suggested was to create the disk group with specific size like following

SQL> create diskgroup DATA external redundancy disk '/dev/rhdisk4' size 1135G;

Frankly it's quite surprising we still hit such basis bugs on AIX ASM even after it was released 3 years. 1TB disk is hardly a large disk nowadays, I guess AIX is just such an unpopular OS for Oracle installation and have very little customer base. And I can totally understand why.






Sunday, November 20, 2011

GATHER_TABLE_STATS and ORA-01652

We had ORA-01652 error from one production database recently. The culprit is GATHER_TABLE_STATS  job. By the way, this DB is 10.2.0.4



ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
*** 2011-11-20 20:28:51.105
GATHER_STATS_JOB: GATHER_TABLE_STATS('"L53"','"L_CARD"','""', ...)
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

Originally it was error out and paging at 4AM which is really not preferred timing for On Call DBA.
Since this DB tend to have higher load during early morning any way. I changed maintenance windows to the late afternoon.
To change this auto statistics collection job time use this command.


BEGIN
DBMS_SCHEDULER.SET_ATTRIBUTE (
name => 'GATHER_STATS_JOB',
attribute => 'repeat_interval',
value => 'freq=daily;byday=SUN,MON,TUE,WED,THU,FRI,SAT;byhour=17;byminute=0; bysecond=0');
END;




However this didn't address the root cause of the issue apparently. A couple of days later the job failed again with same error. 

Increase TEMP tablespace is not an option. The TEMP TBS on this DB is 95G. This job run for 3 hours and used them all. Adding more TEMP will only delay the inevitable.
I decide to changed estimate percent from auto sampling to 1% , this fixed the issue. I did some research on google about this but there's not much useful past discussion.
Only found this asktom thread pretty helpful by pointing the right statement to change the default GATHER_TABLE_STATS job
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:652425700346984666
ops$tkyte%ORA10GR2> select dbms_stats.get_param( 'estimate_percent' ) from dual;

DBMS_STATS.GET_PARAM('ESTIMATE_PERCENT')
-------------------------------------------------------------------------------
DBMS_STATS.AUTO_SAMPLE_SIZE

reset or set them with these:

http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_stats.htm#i1047505

http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_stats.htm#i1048566