Monday, March 31, 2014

Something about SQL Server Isolation Level


  As we all know Oracle's default transaction isolation level is Read Committed. That means you can read data from changes made by other session as long as it's committed. Readers are not blocking writers and vice versa.

   The default isolation level of MS SQL Server is also Read Committed, however, from time to time our users saw their select blocked by other sessions and after they added NOWAIT hint, you get these errors
  Lock request time out period exceeded
  I did a quick testing and confirmed the observation. A select statement will be blocked by uncommitted transaction on underlying table and if put NOWAIT hint, the query will exit will the error. 
  
  There's another hint NOLOCK in SQL Server, however, that's definitely not preferable option, because the behavior of NOLOCK is same as setting transaction isolation level to READ UNCOMMITTED. Which mean you could read dirty data that changed by other session but not committed yet.
  
  After a little digging, I found SQL Server documented this behavior. Basically you need to set database option READ_COMMITTED_SNAPSHOT to ON, by default it was set to OFF.

  select * from sys.databases


  The SNAPSHOT serving the similar function as Oracle Rollback segment, holding pre-image of changes made by transaction. I am surprised SQL Server by default setting it to 0 (OFF) which essentially disabled default isolation level READ Committed. Means all read will require shared lock on table and if there's open transaction, the read is blocked.

 http://technet.microsoft.com/en-us/library/ms173763.aspx

  Of course the quick fix is set to READ_COMMITTED_SNAPSHOT to ON which can be done at database level of SQL Server.
  ALTER DATABASE test SET READ_COMMITTED_SNAPSHOT ON

Quick follow up on this, as expected there's performance penalty by turning on Read Committed Snapshot. A blog talked about this a while ago. Apparently, SQL Server does not improve much in the last few years. 
Clearly SQL Server is not suitable for high volume read/write load.  Small Biz and office use please.

Performance Impact: The Potential Cost of Read_Committed_Snapshot

Friday, January 04, 2013

HTML code for "Please Wait" message




Not usual DBA stuff,
I was designing a web page for my team to quickly checking the status of all TEST, DEV and PROD databases. Since we have a hundreds instances to check, the page will take a while to load. Found this code from google, perfect fit for my need.

Code:
<script type="text/javascript">
var ray={
ajax:function(st)
 {
  this.show('load');
 },
show:function(el)
 {
  this.getID(el).style.display='';
 },
getID:function(el)
 {
  return document.getElementById(el);
 }
}
</script>
<style type="text/css">
#load{
position:absolute;
z-index:1;
border:3px double #999;
background:#f7f7f7;
width:400px;
height:200px;
margin-top:-150px;
margin-left:-150px;
top:50%;
left:50%;
text-align:center;
line-height:300px;
font-family:"Trebuchet MS", verdana, arial,tahoma;
font-size:18pt;
}
</style>
<div id="load" style="display:none;">Checking DB ... Please wait</div>
<form action="http://www.yahoo.com" method="post" onsubmit="return ray.ajax()">
<input type="text" value="Test" name="q">
<input type="submit" value="Search">
</form>

Thursday, November 29, 2012

A Typical Case of SQL Tuning using 11g SQL Baseline



Running Swingbench on TEST Server found performance tanked.
Found this frequent execute SQL had a plan change and taking massive Temp space and IO

SQL> @check_sql_plan_change
Enter value for sql_id: 7hk2m2702ua0g
old   6: where sql_id = '&sql_id'
new   6: where sql_id = '7hk2m2702ua0g'

   SNAP_ID   NODE BEGIN_INTERVAL_TIME            SQL_ID        PLAN_HASH_VALUE        EXECS    AVG_ETIME        AVG_LIO
---------- ------ ------------------------------ ------------- --------------- ------------ ------------ --------------
         7      1 28-NOV-12 04.10.03.425 PM      7hk2m2702ua0g      3283867725       76,049         .005           20.4
         8      1 28-NOV-12 04.20.09.222 PM      7hk2m2702ua0g                      118,691         .005           20.4
         9      1 28-NOV-12 04.37.36.000 PM      7hk2m2702ua0g                       13,106         .014           20.4
        10      1 28-NOV-12 04.48.33.912 PM      7hk2m2702ua0g                       43,318         .007           20.4
        11      1 28-NOV-12 05.00.35.170 PM      7hk2m2702ua0g                      121,633         .004           20.3
        12      1 28-NOV-12 05.10.37.015 PM      7hk2m2702ua0g                       56,663         .004           20.3
       114      1 29-NOV-12 10.10.14.781 AM      7hk2m2702ua0g      2125236239        1,400       69.475        5,146.7
       115      1 29-NOV-12 10.20.15.130 AM      7hk2m2702ua0g                        6,284       65.791        7,295.8
       116      1 29-NOV-12 10.30.16.251 AM      7hk2m2702ua0g                          700       71.816       13,954.5

LOAD Plans to SQLSET from AWR

declare
baseline_ref_cursor DBMS_SQLTUNE.SQLSET_CURSOR;
begin open
baseline_ref_cursor for select VALUE(p) from table(DBMS_SQLTUNE.SELECT_WORKLOAD_REPOSITORY(1,132,NULL,NULL,NULL,NULL,NULL,NULL,NULL,'ALL')) p;
DBMS_SQLTUNE.LOAD_SQLSET('SWING', baseline_ref_cursor);
end;

LOAD Plans to Baseline from SQLSET

DECLARE 
  my_10gplans PLS_INTEGER;
BEGIN
  my_10gplans := DBMS_SPM.LOAD_PLANS_FROM_SQLSET( sqlset_name => 'SWING' );
END;

Checking loaded plans

SQL> SELECT sql_handle, plan_name, enabled, accepted , ELAPSED_TIME FROM   dba_sql_plan_baselines where PARSING_SCHEMA_NAME='SOE';

SQL_HANDLE                     PLAN_NAME                      ENA ACC ELAPSED_TIME
------------------------------ ------------------------------ --- --- ------------
SQL_7aa3962c9ae2db34           SQL_PLAN_7p8wq5kdf5qtn641e1b0c YES YES   2249855466
SQL_7aa3962c9ae2db34           SQL_PLAN_7p8wq5kdf5qtnc4b22a15 YES YES   6.5511E+11

Disable unwanted plan, enable and fix desired plan


DECLARE
     i NATURAL;
    BEGIN
      i := dbms_spm.alter_sql_plan_baseline('SQL_7aa3962c9ae2db34', attribute_name => 'enabled', attribute_value=>'NO');
      dbms_output.put_line(i);
    END;


SQL> set serveroutput on
SQL> DECLARE
  2   i NATURAL;
  3  BEGIN
  4    i := dbms_spm.alter_sql_plan_baseline('SQL_7aa3962c9ae2db34', 'SQL_PLAN_7p8wq5kdf5qtn641e1b0c', attribute_name => 'enabled', attribute_value=>'YES');
  5    dbms_output.put_line(i);
  6  END;
  7 
  8 
  9  /
PL/SQL procedure successfully completed.

SQL> SELECT sql_handle, plan_name, enabled, accepted , ELAPSED_TIME, fixed FROM   dba_sql_plan_baselines where PARSING_SCHEMA_NAME='SOE';

SQL_HANDLE                     PLAN_NAME                      ENA ACC ELAPSED_TIME FIX
------------------------------ ------------------------------ --- --- ------------ ---
SQL_7aa3962c9ae2db34           SQL_PLAN_7p8wq5kdf5qtn641e1b0c YES YES   2249855466 NO
SQL_7aa3962c9ae2db34           SQL_PLAN_7p8wq5kdf5qtnc4b22a15 NO  YES   6.5511E+11 NO

Now Swingbench using correct plan and perform as expected.

Thursday, October 25, 2012

AIX ASM Diskgroup reporting wrong Lun size


  We discovered this strange problem while trying to convert one of our big databases from file system to ASM. After we assigned a 1TB lun to AIX and created a diskgroup on top of it. ASM reporting wrong disk size on it. ASM only be able to detect around 100G out of 1TB.
  After a few poke around to make sure we didn't make any mistake provision the storage to the OS. I did some research on Metalink. And fair enough, I found this is a known bug on AIX ASM

Bug 9495887 AIX: ASM does not recognize correct diskgroup size for large disks

-------------------------------------------------------------------------------
Device         Size (GB)  Paths  Vol Name       Vol Id   XIV Id   XIV Host     
-------------------------------------------------------------------------------
/dev/hdisk4    1135.7     5/5    hedata16     113      7825812  heproddb102


NAME            PATH              GROUP_NUMBER   TOTAL_MB    FREE_MB READS WRITES 
--------------- ----------------- ------------ ---------- ---------- ----- ------ 
TEST_0000       /dev/rhdisk4                 5     101920     101800    60     10 

The workaround suggested was to create the disk group with specific size like following

SQL> create diskgroup DATA external redundancy disk '/dev/rhdisk4' size 1135G;

Frankly it's quite surprising we still hit such basis bugs on AIX ASM even after it was released 3 years. 1TB disk is hardly a large disk nowadays, I guess AIX is just such an unpopular OS for Oracle installation and have very little customer base. And I can totally understand why.






Sunday, November 20, 2011

GATHER_TABLE_STATS and ORA-01652

We had ORA-01652 error from one production database recently. The culprit is GATHER_TABLE_STATS  job. By the way, this DB is 10.2.0.4



ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
*** 2011-11-20 20:28:51.105
GATHER_STATS_JOB: GATHER_TABLE_STATS('"L53"','"L_CARD"','""', ...)
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

Originally it was error out and paging at 4AM which is really not preferred timing for On Call DBA.
Since this DB tend to have higher load during early morning any way. I changed maintenance windows to the late afternoon.
To change this auto statistics collection job time use this command.


BEGIN
DBMS_SCHEDULER.SET_ATTRIBUTE (
name => 'GATHER_STATS_JOB',
attribute => 'repeat_interval',
value => 'freq=daily;byday=SUN,MON,TUE,WED,THU,FRI,SAT;byhour=17;byminute=0; bysecond=0');
END;




However this didn't address the root cause of the issue apparently. A couple of days later the job failed again with same error. 

Increase TEMP tablespace is not an option. The TEMP TBS on this DB is 95G. This job run for 3 hours and used them all. Adding more TEMP will only delay the inevitable.
I decide to changed estimate percent from auto sampling to 1% , this fixed the issue. I did some research on google about this but there's not much useful past discussion.
Only found this asktom thread pretty helpful by pointing the right statement to change the default GATHER_TABLE_STATS job
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:652425700346984666
ops$tkyte%ORA10GR2> select dbms_stats.get_param( 'estimate_percent' ) from dual;

DBMS_STATS.GET_PARAM('ESTIMATE_PERCENT')
-------------------------------------------------------------------------------
DBMS_STATS.AUTO_SAMPLE_SIZE

reset or set them with these:

http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_stats.htm#i1047505

http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_stats.htm#i1048566


Wednesday, September 07, 2011

How to connect wireless router to another wireless router

This post is not related to Oracle. It just a hint for a problem I found while I was setting up my home network by connecting a new wireless router to an existing one.

In beginning, this task seems super easy and no brainer to me. Just connect 'Internet' port of new router to any Local ethernet port on existing router and setup new router and Wala!

Oh well, it doesn't work. The new router keeps complaining it's not connected to internet. The Internet setup page show it get 127.0.0.1 (localhost) as DHCP address from old router and that doesn't work obviously. Actually some network guru probably already figured out of the problem when they saw this.

So why it's get a 127.0.0.1 address instead of a valid DHCP release? Well, the trick is most router by default using 192.168.1.1 address and subnet. So if two routers sharing the same address, of course the new one will get localhost as address thinking he is 192.168.1.1

The solution is easy, change the new router's default subnet to 192.168.2.1 etc. or change new router's IP to something like 192.168.1.10

Monday, August 29, 2011

ORA-01555 with Query Duration=0 sec

Most DBAs know that ORA-1555 is caused by long running query. And in alert.log file it will tell you which SQL caused ORA-1555 and run for how long.
However from time to time you will see errors like this. It's basically tell you that the query failed right away. So why's the case?


Mon Aug 29 06:39:09 2011
ORA-01555 caused by SQL statement below (SQL ID: 0jc2g6km899ps, Query Duration=0 sec, SCN: 0x00ae.75483a06):
Mon Aug 29 06:39:09 2011
SELECT.xxxx

I ran  a query to find out the time stamp of this query's SCN and found out that the query has a time stamp of 6AM. But i was failed 40 min later. That could only mean one thing that it was in a transaction that started 6AM and Oracle already over written the data in UNDO.

SYS@VAULTPROD>select scn_to_timestamp(749291977222) from dual;
SCN_TO_TIMESTAMP(749291977222)
---------------------------------------------------------------------------
29-AUG-11 06.00.01.000000000 AM


There’s a couple of ways to help improve the situation.
  • Does all the statements in this job need to be in single transaction? If not, don’t put them into single transaction.
  • Increase undo retention of DB, Oracle will try to honor this retention subject to UNDO space.
  • Increase the UNDO tablespace to mitigate the potential space squeeze but remember the reason we got this error is not from UNDO space limitation.