Ok, it is not what you think it to be, its about R.A.I.D. Technology as in redundant array of independent (or inexpensive) disks.
I am providing a generalized view of how i worked with the SQL Server and related Storage Environment(s), i am not a Storage expert or do not have hands on Enterprise Storage system at my leisure (But limited access to EMC ClarionCX4-480),  to try out various performance configurations and enhancements. I Just have one shot of getting the configuration right and better get them done right that time. I am discussing the generalized benefits of SQL Server Performance on RAID 5 vs RAID 10 systems. People have done a lot more research to prove some technologies that work and some don’t. So Following the guidelines, i present some meaningful overview of SQL Server and Storage concepts.

RAID is the technique used where data can be written to Multiple Hard Drives to provide redundancy, where if one hard disk failure in the array of disks, still preserves the data across several other and thus providing fault tolerance.

I have seen most DBA’s have little or no Knowledge of RAID Technologies OR are not interested in getting themselves engaged in the Database file management/architecture scenarios and let the system admins or Managers handle it as it is. Also many don’t even know, how their RAID environment is configured, which is a SAD scenario.

DBA’s need to understand, how important it is to know these aspects of technology and the benefits and cons associated with it.  I would promote the DBA’s to learn about Tiered Storage and DISK Speeds / RAID controllers / SAN Vendors / Strip sizes. There are many Whitepapers on the VENDOR Sites (EMC/DELL/HP/IBM to name a few), which are really helpful in understanding the SERVERS as well as performance benefits of various technologies.

NOTE: Microsoft recommends not to put the page file on the RAID-5 Volume, because it degrades the performance and to use a mirrored (RAID 1) volume instead. Because re-generating a RAID 5 Orphaned disk takes some work, where the network connection to the volume are lost and the volume has to be locked.

RAID is a part and parcel of every MID to HIGH END server and (now gaming and other desktops too) and has been a very hot Topic. A DBA, should be aware of the concepts as this encapsulates the performance enhancement strategies. I will be referring to SQL Server RAID configurations practices and the ones i have worked across.

RAID technologies have various forms as we will take a look. MID  and SMALL companies are heavily relying on the default RAID 5 configuration, because of the cost associated with it and the maximum space availability associated with it. Many Database Shops are still relying on the Out of Box RAID 5 configuration for all of their requirements.(which is not a bad bet, but when it comes to database servers with high Random IO, we have to re-think our strategies).

Consider for example a single hard disk. when you start windows/outlook/SQL Server/3d Studio Max, There are certain factors that come into play. The hard disk works with the CPU and Memory to Produce (OPEN) the programs as quickly as possible. This heads have to do all the work of reads and writes before data is fetched from HDD into memory. It will first bring up windows, then outlook, then SQL Server and 3DStudio. HDD will keep on reading the blocks as and when available. So you see the half the program load before the other program loads half way and then the previous one finishes loading. There are many disadvantages of using the single disks or JBOD for IO intensive applications. Disk Subsystems come in all flavors of SPEED (10K RPM to 15K RPM for servers) and (5400 RPM to 7200 RPM and now sometime at 10000 RPM for special gaming desktops) Another HDD are Solid State, which absolutely require 1/4th the time compared to regular disks to fetch data.

With one of my clients, I was given SQL Server 2000 and 2005 systems with 1 LUN based on RAID-5 and another volume a RAID-1 for binaries. This kind of configuration has some performance constraints and issues. and on top of it if your data files(.mdf and .ldf) and system databases all reside on the same LUN from same RAID system, they do not provide any performance benefits. There is definitely going to be a write as well as read overhead associated with this design, where your disk avg. queue length will top it all. All your Files are still on the SAME SET and are getting Sequentially Read.(Specially Log Files IO are Sequential)

Many forms of RAID-5 can have data loss after a system failure. Data loss occurs because a system failure causes the data and parity in the RAID-5 volume to become unsynchronized. If an attempt is made to read data contained on a stale subdisk, the data is reconstructed from data on all other stripe units in the stripe. This operation is called a reconstructing-read. This is a more expensive operation than simply reading the data and can result in degraded read performance. When a RAID-5 volume has stale subdisks, it is considered to be in degraded mode. If the HOT-SPARE is available and enabled, it is automatically triggered by the failure and Data can be re-generated using the parity.

RAID-5 is less expensive than RAID-10, also write performance has a better advantage on RAID-10 over Raid-5, because of the overhead of writing parity information in RAID 5. (Which is negligible if not HIGH Random I/O’s are generated).

The most common RAID types used in production IT environments are RAID-1, RAID-5, and RAID-10

When i talk about high end and low end disk subsystems, i mean a 7.2K vs 10K vs 15K disk-subsystems under the RAID-5 systems. so these variations makes a lot of difference in performance as well. STRIP SIZE is another factor which should be taken into consideration:

Strip Size defines the block size defined while the disk is prepared as LUN.

The default block size used by most vendors is 64K. Generally it depends upon your environment requirements.

Do not take my word for it. Here are some more references that are a great READ.

Start by reading this paper on http://technet.microsoft.com/en-us/library/cc966414.aspx
http://blogs.msdn.com/jimmymay/ http://blogs.msdn.com/jimmymay/archive/tags/Disk+Partition+Alignment/default.aspx
http://kendalvandyke.blogspot.com/2009/02/disk-performance-hands-on-series.html
http://blogs.msdn.com/sqlcat/archive/2005/11/21/495440.aspx

Remember RAID 0+1(Disks are Striped together then Mirrored) is different than RAID 1+0 (Disks Groups  are Mirrored then Stripped)

Spreading your SQL Server Databases across separate LUNS from separate disks (RAID Groups) can benefit you in getting rid of many(not all) Performance issues resolved with slowness and locking and several other disk related IO will be minimized.

In case of a RANDOM IO Type environment or even close or if you are have somehow figured the issues is with the IO, then try the following for performance benefits. This depends on the type of luxury you have with storage and $$ to spend. But consider the Performance (benefit) enhancements that come along and how some of the generalized issues of slowness will go away.

As an example: A SQL 2000 Server and a Database size of 180 GB, used to take 70 min’s on a RAID 5 system(Data Files / Log Files / TempDB / Indexes / Backups are all stored on SAME LUN with strip size of 8K), Which was a havoc. The system Manager was in disagreement regarding any performance issues on this box.
I had to challenge him to provide extra space in form of additional dedicated LUN with STRIP size to 64KB from separate physical disks for Log Files and TEMPDB. Which he did and immediately saw the dramatic change, in performance of backups and Other database tasks. (15K RPM disks with RAID 5 (4+1 disks) and block size of 64K). The backup were down to under 50 min’s and rest of the Disk Queue Length (Just one the primary measures i use) on both of these Logical Drives under acceptable limits. It was no longer that we moved to IBM DS8000 Ent. SAN with configurations to RAID 10 and our problems started to fade away for slowness and Backup interfering with Daily Jobs and on top of it blocking scenarios due to Slow IO. The backups went down to 10 min’s.. I have everything separated out as a SQL Server best practice. Same benefits i got from another SQL Server 2005 with Db size of 90 GB and now the files are on separate RAID 10 volumes, with the exception of binaries.

ok how do you find you current strip size / block size.

In windows cmd type fsutil fsinfo ntfsinfo <drive letter:>
then check the Bytes Per Cluster.

Transaction Log Files = RAID 10 (There is always another option as described below…)
Data and index Files = RAID 10 Or  RAID 5
Tempdb (Depend on Load) = RAID 10 (RAID 5 will do if not Heavy Write intensive LOAD)
Binaries and OS = RAID 1

Or Assign a Database log File to a Separate LUN with dedicated Physical disks with nothing else on it. Also Tempdb to a Separate LUN with dedicated Physical Disks with nothing else on it.
If you are a active DBA with Enterprise SAN at your Leisure, then work with it dedicatedly, defining your Database File storage structures and location.

Many will disagree, but the lessons i have learnt with hands burnt down, provide me solid proof for perusing the best practices.

Performance Tuning is a integral part of the DBA Activities and to say it is the Top Job Requirement too, which encapsulates the DBA’s major Skills. Performance Tuning is a very enormous topic and cannot be covered in one paper or topic, but can provide you a stepping stone in the right direction. I discuss here what i do and how i approach performance issues also slicing information for wannabe DBA’s. Every Database Professional has his own perspective of direction he wants to take and cannot be debated and can become a learning source.

There are several things you need knowledge of as a Database Server Performance Tuning Expert and i have tried to simplify the understanding as much as i can in this article.

Having Database Internals Knowledge will help you in long run, but having a good knowledge in SQL Server overall environment will server the purpose, as to perform your duties.

  • Understanding client / business / server needs. (be a very good listener) : What i mean by a very good listener is to really pay attention and write the points down if you can, and not start forming solution as you dig into the issue/problem or as the user explains.
  • Have good knowledge about (Blocking / Deadlock / Query Plans / Profiler / Event Viewer / SQL Error Logs / Good Understanding of Indexes (Fragmentation) / Protocols (Named Pipes/TCPIP/Shared Memory) / File Groups / Shrinking (Database and Transaction Log) / system databases.
  • Profiler Traces : Know what to capture in profiler trace and how to analyze them.
  • Creating and reading Performance Monitor counters.(Perfmon)
  • Hands on using basic T-SQL for your DMV’s / system SP’s.
  • SQL Server CPU / Memory Management Techniques (know your sp_configure) or SSMS.
  • DISK IO Monitoring / Management in respect to Database Data Files.
  • Understanding of TEMPDB Database Management. http://blogs.msdn.com/sqlserverstorageengine/archive/tags/TempDB/default.aspx
  • Good understanding about RAID (Redundant Array of Independent Disks) Technologies and Advantages of Placing Data / T-Log /Tempdb files on various RAID formations.
  • ~ Background about Windows OS (Processes and CPU / Memory Management)
  • Difference between Several Edition of Windows and SQL Server and how a combination of those can help in improvement of resource usage. (e.g.: upgrading to SQL Server 2008 over 2005 will be advantageous as 2008 has built-in compression that allows you to compress the database files and the transaction log files / backup compression / Built-in SQL Server 2008 Policy Management  etc)
  • Hands on experience with performance tuning can be advantageous.
  • Going Further knowledge about how to improve SSIS / SSAS / SSRS Performance can be advantageous.
  • Last but not the least good command on BING or Google Search: (you know for what?)

There might be more additions and subtractions to the above pointers, but this is a generalized specs of sql server performance monitoring requirements.

Always remember, For a Database Server, I/O is the primary reflector. Since Database servers do more I/O Intensive work than bandwidth related. we need to focus at index fragmentation / disk fragmentation / Disk IO / How the CPU / Memory is used? This should be the primary focus of a performance tuning expert.

Suggestions to users are definitely good, but should be limited and depend on existing resources/hardware in place, but wrong recommendations can crash up in the middle of nowhere as in “CAST AWAY” and you are on your own forever. So read a lot of whitepapers, before you actually start looking at suggestions/solutions.

When starting a performance tuning task, always remember these small pointers.

  • Listen to what the performance issue is. A good listener and lots of question are good for you and make your life easier, or even can give you the solution right away, but wait and re-think twice.
  • Check to see if the users/server has a existing baseline file to compare solutions/results with.
  • Ask about how the issue appeared in the first place. what time it appeared. how is the system configured (CPU / Memory / Disk configuration).
  • Analyze to check for, human caused error or hardware failure or system generated issues(Software/Hardware).
  • Check Editions of SQL Server and Windows Used along with configuration parameters.
  • Check the SQL Error Logs for that particular time (and compare with the similar times before days) / Get Event Viewer Logs and analyze them as much as you can.)
  • How critical is this issue (Production will be affected or production work can still continue)
  • Check the Process Monitor/Task Manager for a Quick View of the system resources. Or use SYSINTERNAL Suite from Microsoft to further analyze further (http://technet.microsoft.com/en-us/sysinternals/default.aspx)
  • Check to see if any 3rd party Drivers are installed on the servers. or any other applications are installed which might be the cause of the conflict.
  • Is this a clustered / replication / log shipping / mirroring  environment. Dig into the details, because these environments might need a diff approach and strategies.
  • Check sp_configure and sys.sysprocesses OR sys.dm_exec_requests / sys.dm_exec_sessions / sys.dm_exec_connections for processes running and any blocking /deadlock in the system.
  • Check for Index fragmentation (sys.dm_db_index_physical_stats) / overlapping indexes. – Too many HEAP tables in the queries can be Trouble too. Try to reduce the # of Heap tables. Also check to see if additional Logically correct indexes can be created on table, but remember additional index are even overheads for the system and can damped your table.
  • Too Many varchar and big varchar columns might slow the Queries.(restrict the length of the varchar columns, if there is nothing you can do about the data type itself)
  • Run Perfmon counters in windows, for the activity you want to Log (DISK / CPU / MEMORY).
  • Run Profiler Trace for Databases level performance issues.
  • Let PAL analyze your Perfmon Counter Log files some times. (http://www.codeplex.com/PAL)
  • If these are several queries in question, check Query Plans. Check for Complex JOINS and related objects.
  • Do not kill blocked / required Queries (because it is lucrative), without knowing or sharing the effects with the system owner.
  • Document each and every step in the process. (document each change to implement on the system)
  • Build a Roll back plan for each step.
  • Use Bing and Google wisely (you know what?)

Quick Example of a Performance Tuning Scenario:

Working with Parallelism :

We had a situation where one of our SQL Servers 2005 STD SP3 Edition on Windows 2003 SP2 Standard Edition, used for Solar Winds Network performance monitoring Database had a great deal of performance issues.

After the users reported to me about the slowness and blocking, i started with some common sys.dm_exec_requests / exec_sessions / exec_connections queries and the good old sysprocesses. i saw bunch of blocking on the server. I used SQL Profiler to further filter down the results and captured some more queries (SP’s) that were posing an issue. I checked for index fragmentation with sys.dm_db_index_physical_stats DMV and found fragmentation for some critical table to be above 80%, I got down to several Stored procedures and created some NC indexes on the related tables(based on the where clause) to manage those blocking issues and it worked to a extent and also Tried to defragment the indexes using ALTER INDEX. This system had lot of long varchar fields (I am not furnishing the details.) But i could still see some blocking even after the initial index Maintenance. This server had 2 Socket 2 Cores each = 4 CPU (Hyper threaded) with 4 GB of RAM, and data and log file were all on the same volume E which was a Local RAID5, presented as C and E volumes. so nothing much can be done there.

Next thing i went for was from sys.dm_exec_requests i saw CXPACKET waittype to be occupying 40% of the total waittype’s. Following that lead, I checked the signal_waits for CPU Bottleneck with sys.dm_os_wait_stats and wanted to see the waiting queries in Queue on CPU. I was able to see that there was sufficient signal_waits to raise my doubts even more, then i checked sys.dm_os_wait_stats for wait_statistics and found CXPACKET number as the main culprit. CXPACKET relates to Parallel Queries or Parallelism and shows the Parallelism Option is Putting LOAD on CPU for Execution of Parallel Queries to speed up the process. Now i knew what i wanted to do with this info in hand.

I opened up the sp_configure with:

sp_configure ‘show advanced option’,’1′
reconfigure with override

Max degree of parallelism :  Signifies the number of CPU’s used to Run parallel Queries. Default is 0, which tell SQL Server to use all available processors. Setting this value to 1 will turn off Parallelism also the cost threshold for parallelism will be ignored.

Cost threshold for parallelism : This setting is in seconds and tells the Query Processor to start using parallel query execution after a certain time (elapsed time in seconds). This is ignored if the Max Degree of Parallelism is set to 1. Default value is 5.

I had to Turn off the Max degree of parallelism using the value as “1”. and checked the Server after a couple of minutes and everything was back to normal. the server was at good performance levels.

sp_configure ‘Max degree of parallelism’,’1’
reconfigure with override

This was one of the Performance issues i faced on work.

So what knowledge was used: Blocking / Deadlock / Simple T-SQL / DMV’s + system SP’s / Sp_configure / CPU / MEMORY / Query Parallelism / Profiler /Index Fragmentation knowledge.

The total workaround took me less than 2 hours. I Guess when you become familiar with your environment and performance tuning strategies, things become simple and can be acted upon much faster.

Parallel INDEX Operation are available only in SQL Server Enterprise Edition.

MAXDOP can also be used a QUERY HINT Option

In a multiprocessor SQL Server System, this setting controls the number of processors you want to use in query execution or use in parallel Query execution.

When we execute a SQL Query, the Query processor in the SQL Server Engine, goes to work and determines among other factors, the processors to use for the Query. If the setting is at default, SQL Server will determine the number of Processors to use for a particular Query, depending upon the complexity of the query.

Default setting is = 0  (use at will or all) – which tells SQL Server to use all the available processors as desired for the particular query.

This setting can be changed in many ways and some are:

using SSMS: Right click on Server –> Properties, then click on Advanced.

image

or using sp_configure in Query Analyzer.

sp_configure ‘show advanced options’,’1′
reconfigure with override

again specify sp_configure and t will produce these results. you can change the value if you are an expert, otherwise leave it at default as it best perform when at default.

image

Now using the MAXDOP in the SELECT Query as Query Hint is also possible as with this example from BOL:

USE AdventureWorks ;
GO
SELECT ProductID, OrderQty, SUM(LineTotal) AS Total
FROM Sales.SalesOrderDetail
WHERE UnitPrice < $5.00
GROUP BY ProductID, OrderQty
ORDER BY ProductID, OrderQty
OPTION (MAXDOP 2); – tells the query to use 2 processors.

SQL Server will self detect the best degree of parallelism based on :

  • If sufficient result is the be produced, i.e. sufficient number of rows are present as a result.
  • Type of Query or INDEX used. (Large Joins and Aggregates)
  • If sufficient Threads are available for the operation. (More threads are required to execute a Query in parallel than when it is executed serially, if the thread requirement is not met, by sql server it decreases the parallel execution or completely abandons the parallel execution).

A bit complex internally to SQL Server, so lets us concentrate on our Queries for now and if we want to control the num of processors used for a Query.

I would suggest to give this a try if you have more then 4 processors. If you have a total of 4 Processor(Physical+Hyper threaded), leave this to default.

But i had benefitted on one of my Servers (running solarwinds network performance monitoring services) Running Extensive transactions. I Turned off the Max Degree of Parallelism and the server was faster with no locking scenarios than before the setting was at default of 0.

sp_configure ‘Max degree of parallelism’,’1’ – use this to turn off the parallelism.

Pl read more before taking any action on your production servers. The information mentioned above benefitted me, and does not mean it will be same for you. So try the things on your TEST Servers before ever Thinking of the production boxes.

I saw these errors in one of our Network Monitoring Server and this being a highly transactional and critical 2005 database server there was no room for “ OOPS!! “.

Use flag –g512 as the SQL Server Start up Parameter, then restart SQL Server. The error will no longer appear in the SQL server Error Log. The CPU and Memory be with in acceptable limits after this change.

I also disabled the Query parallelism because of the Queries Slowness and the Deadlock occurring in the error Log. I did this and it worked for this server, and in this environment.

sp_configure ‘show advanced options’,’1′
reconfigure with override

sp_configure ‘max degree of parallelism’,’1′
reconfigure with override

image

Ok, i meant was if your servers are located at remote locations(Across WAN) or Over slow networks, like our are located at Colorado and we are at Pittsburgh, Definitely we were having the issue of SLOW QUERY Responses.

What do you look for in this kind of a incident, when everything is working good as per the SYSADMINS and NETWORK ADMINS, the DBA’s are left to wander in the dark, with out any proper responses.

I started playing with protocols Named Pipes and TCPIP but hey what i found was even if you select the TCPIP in the SSMS connection box, it resolved to Named Pipes during the connection.

I tried again and with the same results and same slowness of the Query.

Ok how slow were the results: Well PRE-MOVE to Colorado, we were at a baseline of 28000 rows in 10 sec with a select * from <table name>

After the move we were only able to retrieve 400 rows in 10secs.. Well isn’t that a huge difference, and as a DBA, my Judgment was at risk and so was my performance on the EDGE.

Now I checked the CLICONFG at START-> RUN. Changed the connection to TCPIP and LO! Gods Spell worked and it worked like a charm. Well these are some hard earned but interesting and helpful lessons learned on the JOB.

http://msdn.microsoft.com/en-us/library/aa178138(SQL.80).aspx

image

image

image

Disable the Check for publishers certificate revocation option which is under the Internet Options –> Advanced
OR
Under Start -> Program Files –> Microsoft SQL server 2005 –> Configuration Tools ->SQL Server Error and Usage Reporting.

image

© 2010 D B A N A T I O N Suffusion WordPress theme by Sayontan Sinha
Better Tag Cloud