SQL Server Performance Tuning Tips

Tech Stuff

1/24/2005


2. Tips on Optimizing SQL Server Indexes


1. If your databases are set for "Auto Create Statistics," the SQL Server Query Optimizer, when running queries, will consider whether there is any benefit to adding statistics to any column that doesn't already have them. This applies to columns without indexes. This is a good thing, as it provides better information to the Query Optimizer, which can then create better-optimized execution plans for your queries.
An automatically created statistic on a column is also useful for something else: it is a clue to the potential need for an index on that column. In other words, if the Query Optimizer thinks that column statistics are useful, there is also a good chance that adding an appropriate index to the same column would be useful. This is not always the case, so you will need to perform some testing--before and after an index is added--to see if adding an index actually helps or not. But this is simple to do. How do you know if the Query Optimizer has automatically created column statistics on a column in a table?
Actually, this is quite easy to find out. Run the following query from Query Analyzer while connected to a user database.

SELECT name
FROM sysindexes
WHERE (name LIKE '%_WA_Sys%')

This query will return all of the columns in your database's tables that have column statistics added automatically by the Query Optimizer. The value in the "name" column of the sysindexes table is the name SQL Server assigned to the statistics it keeps for that column. This information provides you with a starting point from which to explore whether or not adding indexes to these columns will be useful.
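Once you have a statistic name from this query, you can dig deeper with DBCC SHOW_STATISTICS. A quick sketch--the table name and the statistic name below are purely illustrative; substitute values returned by the query above:

```sql
-- Displays the density, key distribution, and last-updated date of an
-- auto-created statistic (both names here are hypothetical examples).
DBCC SHOW_STATISTICS ('Orders', '_WA_Sys_CustomerID_0425A276')
```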



2. An index on a column can often be created in different ways, some of which are more optimal than others. What this means is that just because you create a useful index on a column doesn't mean it is automatically the fastest version of that index. It is quite possible that a different version of the same index is faster.
The most obvious example of this is that an index can be clustered or non-clustered. Another example of how an index is created that can affect its performance is the FILLFACTOR and PAD_INDEX settings used to create it. Also, whether the index is a composite index or not (and what columns it contains) can affect its performance. Unfortunately, there is no easy answer as to which variation of the same index is the fastest in your situation, as your data and the queries run against that data differ. While I can't offer specific rules that fit all cases, the index tips you find on this website should help you decide which variation of an index is best in your particular circumstance. You may also need to test variations of the same index to see which works best for you.
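For instance, each statement below builds a different variation of an index involving the same column (the Customers table and the index names are hypothetical), and each variation can perform quite differently for a given mix of queries:

```sql
-- A clustered variation: the table's rows are physically ordered by LastName.
CREATE CLUSTERED INDEX CIX_Customers_LastName
    ON Customers (LastName)

-- A non-clustered variation with explicit PAD_INDEX and FILLFACTOR settings.
CREATE NONCLUSTERED INDEX IX_Customers_LastName
    ON Customers (LastName)
    WITH PAD_INDEX, FILLFACTOR = 90

-- A composite variation that adds FirstName to the index key.
CREATE NONCLUSTERED INDEX IX_Customers_LastName_FirstName
    ON Customers (LastName, FirstName)
```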


3. Indexes cannot be created in a vacuum. In other words, before you can identify and create optimal indexes for your tables, you must thoroughly understand the kinds of queries that will be run against them. This is not an easy task, especially if you are attempting to add indexes to a new database.

Whether you are optimizing the indexes of a new database for the first time, or those of a production database, you need to identify what queries are run, and how often they are run. Obviously, you will want to spend more time creating and tuning indexes for queries that are run very often than for queries that are seldom run. In addition, you will want to identify those queries that are the most resource intensive, even if they aren't run the most often.
Once you know which queries run the most often, and which are the most resource intensive, you can begin to better allocate your time in order to get the biggest bang for your time invested.
But there is still one little question. How do you identify which queries are run the most often, and which are the most resource intensive? The easiest solution is to capture a Profiler trace, which can be configured to identify which queries run the most often, and to identify which queries use the most resources. How you configure Profiler won't be discussed now, as it would take a large article to explain all the options. The point here is to make you aware that the Profiler is the tool of choice to identify the queries that are being run against your database.
If you are adding indexes to a production database, capturing the data you need is simple. But if your database is new, what you will need to do is to simulate actual activity as best as possible, perhaps during beta testing of the application, and capture this activity. While it may not be perfect data, it will at least give you a head start. And once production begins, you can continue your index tuning efforts on an on-going basis until you are relatively satisfied that you have identified and tuned the indexes to the best of your ability.
Once you have identified the key queries, your next job is to identify the best indexes for them. This process is also too big to describe in this single tip, although there are many tips on this website that relate directly to this issue. Essentially, what you need to do is run each query you need to analyze in Query Analyzer, examining how it works and examining its execution plan. Then, based on your knowledge of the query and your knowledge of indexing and how it works in SQL Server, you begin the art of adding and optimizing indexes for your application.


4. Indexes should be considered on all columns that are frequently accessed by the WHERE, ORDER BY, GROUP BY, TOP, and DISTINCT clauses. Without an index, each of these operations will require a table scan of your table, potentially hurting performance.
Keep in mind the word "considered". An index created to support the speed of a particular query may not be the best index for another query on the same table. Sometimes you have to balance indexes to attain acceptable performance on all the various queries that are run against a table.


5. Don't automatically add indexes on a table because it seems like the right thing to do. Only add indexes if you know that they will be used by the queries run against the table.


6. As a rule of thumb, every table should have at least a clustered index. Generally, but not always, the clustered index should be on a column that monotonically increases--such as an identity column, or some other column where the value is always increasing--and is unique. In many cases, the primary key is the ideal column for a clustered index.
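As a sketch of this rule of thumb, the hypothetical table below puts the clustered index on an identity-based primary key, so new rows are always appended at the logical end of the table:

```sql
-- OrderID increases monotonically and is unique, making it a good
-- candidate for the clustered index (table and names are illustrative).
CREATE TABLE Orders (
    OrderID    int IDENTITY(1,1) NOT NULL,
    OrderDate  datetime NOT NULL,
    CustomerID int NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
)
```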


7. Static tables (those tables that change very little, or not at all) can be more heavily indexed than dynamic tables (those subject to many INSERTs, UPDATEs, or DELETEs) without negative effect. This doesn't mean you should index every column; only those columns that need an index should have them. But at least you don't have to worry about the overhead indexes add to data modifications, as you must when adding indexes to dynamic tables. In addition, create the indexes on these static tables with a FILLFACTOR of 100 and the PAD_INDEX option (which applies the fill factor to the intermediate index pages) to ensure there is no wasted space. This can help to reduce disk I/O, helping to boost overall performance.
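A minimal sketch of such an index, assuming a hypothetical States lookup table that almost never changes:

```sql
-- FILLFACTOR = 100 packs the leaf pages completely full, and PAD_INDEX
-- applies the same fill factor to the intermediate pages, so no space
-- is wasted on a table that rarely changes.
CREATE NONCLUSTERED INDEX IX_States_StateName
    ON States (StateName)
    WITH PAD_INDEX, FILLFACTOR = 100
```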


8. Point queries, queries that return a single row, are just as fast using a clustered index as a non-clustered index. If you will be creating an index to speed the retrieval of a single record, you may want to consider making it a non-clustered index, and saving the clustering index (you can only have one) for a more complex query.


9. To help identify which tables in your database may need additional or improved indexes, use the SQL Server Profiler Create Trace Wizard to run the "Identify Scans of Large Tables" trace. This trace will tell you which tables are being scanned by queries instead of using an index to seek the data, providing data you can use to identify which tables may need additional or better indexes.


10. Don't over index your OLTP tables, as every index you add increases the time it takes to perform INSERTs, UPDATEs, and DELETEs. You must draw a fine line between having the ideal number of indexes for SELECTs and the ideal number for data modifications.


11. Don't accidentally add the same index twice on a table. This happens more easily than you might think. For example, you add a unique or primary key constraint to a column, which of course creates an index to enforce it. But later, when evaluating the need for indexes on the table, you add a new index without thinking about it, and this new index happens to be on the same column as the unique or primary key. As long as you give the indexes different names, SQL Server will let you create the same index over and over.
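A simple way to spot such duplicates is to list every index on a table along with its key columns, then look for two entries that cover the same column list. For a hypothetical Customers table:

```sql
-- Lists each index on the table with its keys; two indexes with the
-- same column list are duplicates, and one of them can usually be dropped.
EXEC sp_helpindex 'Customers'
```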


12. Drop indexes that are never used by the Query Optimizer. Unused indexes slow data modifications, cause unnecessary I/O reads when reading pages, and waste space in your database, increasing the amount of time it takes to back up and restore databases. Use the Index Tuning Wizard (7.0 and 2000) to help identify indexes that are not being used.


13. Generally, you probably won't want to add an index under these conditions:

• If the index is not used by the query optimizer. Use Query Analyzer's "Show Execution Plan" option to see if your queries against a particular table use an index or not. If the table is small, most likely indexes will not be used.
• If the column values exhibit low selectivity, often less than 90%-95% for non-clustered indexes.
• If the column(s) to be indexed are very wide.
• If the column(s) are defined as TEXT, NTEXT, or IMAGE data types.
• If the table is rarely queried.

While high index selectivity is generally an important factor the Query Optimizer uses to determine whether or not to use an index, there is one special case where indexes with low selectivity can be useful for speeding up SQL Server: indexes on foreign keys. Whether an index on a foreign key has high or low selectivity, it can be used by the Query Optimizer to perform a merge join on the tables in question. A merge join works through two sorted inputs, taking a row from each table and comparing them to see if they match the specified join criteria. If the tables being joined have the appropriate indexes (no matter the selectivity), a merge join can be performed, and this is generally much faster than a join to a table whose foreign key column has no index.
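As a sketch of this special case, assume a hypothetical OrderDetails table whose OrderID column is a foreign key to an Orders table. Even if OrderID is not very selective, the index below can still support joins between the two tables:

```sql
-- An index on the foreign key column; regardless of its selectivity,
-- the Query Optimizer can use it when joining OrderDetails to Orders.
CREATE NONCLUSTERED INDEX IX_OrderDetails_OrderID
    ON OrderDetails (OrderID)
```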


*****



On data warehousing databases, which are essentially read-only, having as many indexes as necessary to cover virtually any query is not normally a problem.

*****



To provide the up-to-date statistics the query optimizer needs to make smart query optimization decisions, you will generally want to leave the "Auto Update Statistics" database option on. This helps to ensure that the optimizer statistics are valid, helping to ensure that queries are properly optimized when they are run.
But this option is not a panacea. When a SQL Server database is under very heavy load, sometimes the auto update statistics feature can update the statistics at inappropriate times, such as the busiest time of the day.
If you find that the auto update statistics feature is running at inappropriate times, you may want to turn it off, and then manually update the statistics (using UPDATE STATISTICS or sp_updatestats) when the database is under a less heavy load.
But again, consider what will happen if you do turn off the auto update statistics feature? While turning this feature off may reduce some stress on your server by not running at inappropriate times of the day, it could also cause some of your queries not to be properly optimized, which could also put extra stress on your server during busy times.
Like many optimization issues, you will probably need to experiment to see whether turning this option on or off is more effective for your environment. But as a rule of thumb, if your server is not maxed out, leaving this option on is probably the best decision.
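If you do take over statistics maintenance manually, the two commands mentioned above can be scheduled during a quiet window. A sketch, assuming a hypothetical Orders table:

```sql
-- Refresh the statistics for a single table...
UPDATE STATISTICS Orders

-- ...or refresh out-of-date statistics for every table in the database.
EXEC sp_updatestats
```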


*****

Keep the "width" of your indexes as narrow as possible. This reduces the size of the index and reduces the number of disk I/O reads required to read the index, boosting performance.

*****

If possible, try to create indexes on columns that have integer values instead of characters. Integer values have less overhead than character values.

*****

If you have two or more tables that are frequently joined together, then the columns used for the joins should have an appropriate index. If the columns used for the joins are not naturally compact, then consider adding compact surrogate keys to the tables in order to reduce the size of the join keys, thus decreasing I/O during the join process and increasing overall performance.

*****

When creating indexes, try to make them unique indexes if at all possible. SQL Server can often search through a unique index faster than a non-unique index because in a unique index, each row is unique, and once it is found, SQL Server doesn't have to look any further.

*****

If a particular query against a table is run infrequently, and the addition of an index greatly speeds the performance of the query, but the performance of INSERTS, UPDATES, and DELETES is negatively affected by the addition of the index, consider creating the index for the table for the duration of when the query is run, then dropping the index. An example of this is when monthly reports are run at the end of the month on an OLTP application.
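A sketch of such a month-end job, with illustrative table and column names: build the index, run the reports, then drop the index so it doesn't slow daily OLTP activity.

```sql
-- Create the index just before the monthly reports run.
CREATE NONCLUSTERED INDEX IX_Sales_SaleDate
    ON Sales (SaleDate)

-- ... run the month-end report queries here ...

-- Drop the index once the reports are complete.
DROP INDEX Sales.IX_Sales_SaleDate
```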

*****

If you would like to get under the covers of SQL Server to learn more about indexing, take a look at the sysindexes system table found in every database. Here, you can find a wealth of information on the indexes and tables in your database. To view the data in this table, run this query from the database you are interested in:

SELECT *
FROM sysindexes

Here are some of the more interesting fields found in this table:

• dpages: If the indid value is 0 or 1, then dpages is the count of the data pages used for the index. If the indid is 255, then dpages equals zero. In all other cases, dpages is the count of the non-clustered index pages used in the index.
• id: Refers to the id of the table this index belongs to.
• indid: This column indicates the type of index. For example, 0 is for a heap (a table with no clustered index), 1 is for a clustered index, a value greater than 1 is for a non-clustered index, and 255 indicates that the table has text or image data.
• OrigFillFactor: This is the original fillfactor used when the index was first created, but it is not maintained over time.
• statversion: Tracks the number of times that statistics have been updated.
• status: 2 = unique index, 16 = clustered index, 64 = index allows duplicate rows, 2048 = the index is used to enforce the Primary Key constraint, 4096 = the index is used to enforce the Unique constraint. These values are additive, and the value you see in this column may be a sum of two or more of these options.
• used: If the indid value is 0 or 1, then used is the number of total pages used for all index and table data. If indid is 255, used is the number of pages for text or image data. In all other cases, used is the number of pages in the index.
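As an illustration, the fields above can be combined into one query that lists each real index alongside its table and page counts, skipping the auto-created statistics:

```sql
-- indid 0 is the heap row and 255 is text/image data, so indid 1-254
-- covers the actual indexes; the LIKE filter excludes auto-created
-- column statistics.
SELECT OBJECT_NAME(id) AS TableName,
       name AS IndexName,
       indid,
       dpages,
       used,
       OrigFillFactor
FROM sysindexes
WHERE indid BETWEEN 1 AND 254
  AND name NOT LIKE '[_]WA[_]Sys%'
ORDER BY TableName, indid
```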

*****

Don't use FLOAT or REAL data types for primary keys, as they add unnecessary overhead and can hurt performance.

*****

If you want to boost the performance of a query that includes an AND operator in the WHERE clause, consider the following:

• Of the search criterions in the WHERE clause, at least one of them should be based on a highly selective column that has an index.
• If at least one of the search criterions in the WHERE clause is not highly selective, consider adding indexes to all of the columns referenced in the WHERE clause.
• If none of the columns in the WHERE clause are selective enough to use an index on their own, consider creating a covering index for this query.
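A sketch of the covering-index option from the last bullet, assuming a hypothetical query that filters Customers on City and returns only LastName and Phone:

```sql
-- Because every column the query touches is in the index key, SQL Server
-- can answer the query from the index alone, without visiting the table.
CREATE NONCLUSTERED INDEX IX_Customers_City_LastName_Phone
    ON Customers (City, LastName, Phone)

SELECT LastName, Phone
FROM Customers
WHERE City = 'Orlando'
```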

*****

The Query Optimizer will always perform a table scan or a clustered index scan on a table if the WHERE clause in the query contains an OR operator and if any of the referenced columns in the OR clause are not indexed (or do not have a useful index). Because of this, if you use many queries with OR clauses, you will want to ensure that each referenced column in the WHERE clause has an index.

*****

A query with one or more OR clauses can sometimes be rewritten as a series of queries that are combined with a UNION statement, in order to boost the performance of the query. For example, let's take a look at the following query:

SELECT employeeID, firstname, lastname
FROM names
WHERE dept = 'prod' or city = 'Orlando' or division = 'food'

This query has three separate conditions in the WHERE clause. In order for this query to use an index, there must be an index on all three columns found in the WHERE clause.
This same query can be written using UNION instead of OR, like this example:
SELECT employeeID, firstname, lastname FROM names WHERE dept = 'prod'
UNION
SELECT employeeID, firstname, lastname FROM names WHERE city = 'Orlando'
UNION
SELECT employeeID, firstname, lastname FROM names WHERE division = 'food'

Each of these queries will produce the same results. If there is only an index on dept, but not on the other columns in the WHERE clause, then the first version will not use any index, and a table scan must be performed. The second version of the query, however, will use the index for part of the query, though not for all of it.
Admittedly, this is a very simple example, but even so, it demonstrates how rewriting a query can affect whether or not an index is used. If this query were much more complex, the UNION approach might be much more efficient, as it allows you to tune each part of the query individually, something that cannot be done if you use only ORs in your query.
If you have a query that uses ORs and is not making the best use of indexes, consider rewriting it as a UNION, and then test performance. Only through testing can you be sure that one version of your query will be faster than another.

*****

The Query Optimizer converts the Transact-SQL IN clause to the OR operator when parsing your code. Because of this, keep in mind that if the referenced column in your query doesn't include an index, then the Query Optimizer will perform a table scan or clustered index scan on the table.

*****

If you use the SOUNDEX function against a table column in a WHERE clause, the Query Optimizer will ignore any available indexes and perform a table scan. If your table is large, this can present a major performance problem. If you need to perform SOUNDEX type searches, one way around this problem is to pre-calculate the SOUNDEX code for the column you are searching and then place this value in a column of its own, and then place an index on this column in order to speed searches.
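A sketch of this workaround, with hypothetical table and column names; note that the pre-calculated column must be kept current (for example, by a trigger or by the application) whenever lastname changes:

```sql
-- SOUNDEX always returns a four-character code, so char(4) is sufficient.
ALTER TABLE Customers ADD soundex_lastname char(4) NULL
GO

-- Pre-calculate the codes for the existing rows.
UPDATE Customers SET soundex_lastname = SOUNDEX(lastname)

-- Index the narrow code column so searches can seek instead of scan.
CREATE NONCLUSTERED INDEX IX_Customers_SoundexLastName
    ON Customers (soundex_lastname)

-- A SOUNDEX-style search that can now use the index.
SELECT lastname
FROM Customers
WHERE soundex_lastname = SOUNDEX('Smith')
```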

*****

If you need to create indexes on large tables, you may be able to speed up their creation by using the new SORT_IN_TEMPDB option available with the CREATE INDEX command. This option tells SQL Server to use the tempdb database, instead of the current database, to sort data while creating indexes. Assuming your tempdb database is isolated on its own separate disk or disk array, then the process of creating the index can be sped up. The only slight downside to using this option is that it takes up slightly more disk space than if you didn't use it, but this shouldn't be much of an issue in most cases. If your tempdb database is not on its own disk or disk array, then don't use this option, as it can actually slow performance.
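A sketch, with hypothetical names, assuming tempdb lives on its own disk array:

```sql
-- SORT_IN_TEMPDB redirects the sort phase of the index build to tempdb,
-- which can speed up index creation when tempdb is on separate disks.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON Orders (CustomerID)
    WITH SORT_IN_TEMPDB
```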

*****

SQL Server 2000 Enterprise Edition (not the Standard Edition) offers the ability to create indexes in parallel, greatly speeding index creation. Assuming your server has multiple CPUs, SQL Server 2000 uses near-linear scaling to boost index creation speed. For example, using two CPUs instead of one almost halves the time it takes to create indexes.

*****

As you probably already know, indexes on narrow columns are preferable to indexes on wide columns. The narrower the index, the more entries SQL Server can fit on a data page, which in turn reduces the amount of I/O required to access the data. But sometimes the column you want to search on using an index is much wider than desirable.
For example, let's say you have a music database that lists the titles of over 5,000,000 songs, and that you want to search by song title. Also assume that the column used to store the music titles is a VARCHAR(45). Forty-five characters is a very wide index, and creating an index on such a wide column is not wise from a performance perspective. So how do we deal with such a scenario?

*****

SQL Server 2000 offers a new function called CHECKSUM.
The main purpose of this function is to create what are called hash indices. A hash index is an index built on a column that stores the checksum of the data found in another column in the table. The CHECKSUM function takes data from another column and creates a checksum value. In other words, the CHECKSUM function is used to create a mostly unique value that represents other data in your table. In most cases, the CHECKSUM value will be much smaller than the actual value. For the most part, checksum values are unique, but this is not guaranteed. It is possible for two slightly different values to produce the same CHECKSUM value.
Here's how this works using our music database example.
Say we have a song with the title "My Best Friend is a Mule from Missouri". As you can see, this is a rather long value, and adding an index to the song title column would make for a very wide index. But in this same table, we can add a CHECKSUM column that takes the title of the song and creates a checksum based on it. In this case, the checksum would be 1866876339. The CHECKSUM function always works the same, so if you perform the CHECKSUM function on the same value many different times, you would always get the same result.
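A sketch of how such a table could be set up: the checksum is placed in a computed column (CHECKSUM is deterministic, so the column can be indexed in SQL Server 2000), and the index is built on that narrow column instead of the wide title column. The index name is illustrative:

```sql
-- Add a computed column holding the checksum of the song title.
ALTER TABLE songs ADD checksum_title AS CHECKSUM(title)
GO

-- Index the narrow checksum column in place of the 45-character title.
CREATE NONCLUSTERED INDEX IX_songs_checksum_title
    ON songs (checksum_title)
```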
So how does the CHECKSUM help us?
The advantage of the CHECKSUM function is that instead of creating a wide index by using the song title column, we create an index on the CHECKSUM column instead. "That's fine and dandy, but I thought you wanted to search by the song's title? How can anybody ever hope to remember a checksum value in order to perform a search?"
Here's how. Take a moment to review this code:

SELECT title, artist, composer
FROM songs
WHERE title = 'My Best Friend is a Mule from Missouri'
AND checksum_title = CHECKSUM('My Best Friend is a Mule from Missouri')

In this example, it appears that we are asking the same question twice, and in a sense, we are. The reason we have to do this is because there may be checksum values that are identical, even though the names of the songs are different. Remember, unique checksum values are not guaranteed.
Here's how the query works. When the Query Optimizer examines the WHERE clause, it determines that there is an index on the checksum_title column. And because the checksum_title column is highly selective (minimal duplicate values), the Query Optimizer decides to use the index. In addition, the Query Optimizer is able to perform the CHECKSUM function itself, converting the song's title into a checksum value and using it to locate the matching records in the index. Because an index is used, SQL Server can very quickly locate the rows that match the second part of the WHERE clause. Once the rows have been narrowed down by the index, all that remains is to compare these matching rows to the first part of the WHERE clause, which takes very little time.
This may seem like a lot of work just to shorten the width of an index, but in many cases the extra work pays off in better performance in the long run. Because of the nature of this tip, I suggest you experiment with this method and with the more conventional method of creating an index on the title column itself. Since there are so many variables to consider, it is tough to know which method is better in your particular situation unless you give them both a try.

*****

Some queries can be very complex, involving many tables, joins, and other conditions. I have seen some queries run over 1000 lines of code (I didn't write it). This can make them difficult to analyze in order to identify what indexes might be used to help the query perform better.
For example, perhaps you want to create a covering index for the query and you need to identify the columns to include in the covering index. Or, perhaps you want to identify those columns that are used in joins in order to check to see that you have indexes on those columns used in the joins in order to maximize performance.
To make complex queries easier to analyze, consider breaking them down into their smaller constituent parts. One way to do this is to simply create lists of the key components of the query, such as:

• List all of the columns that are to be returned
• List all of the columns that are used in the WHERE clause
• List all of the columns used in the JOINs (if applicable)
• List all the tables used in JOINs (if applicable)

Once you have the above information organized in this easy-to-comprehend form, it is much easier to identify those columns that could potentially make use of indexes when the query is executed.

*****

Queries that include either the DISTINCT or the GROUP BY clauses can be optimized by including appropriate indexes.
Any of the following indexing strategies can be used:

• Include a covering, non-clustered index on the columns referenced in the DISTINCT or GROUP BY clauses.
• Include a clustered index on the columns in the GROUP BY clause.
• Include a clustered index on the columns found in the SELECT clause.

Adding appropriate indexes to queries that include DISTINCT or GROUP BY is most important for those queries that run often. If a query is run rarely, then adding an index for its benefit may cause more performance problems than it prevents.

*****

Computed columns in SQL Server 2000 can be indexed if they meet all of the following criteria:

• The computed column's expression is deterministic. This means that the computed value must always be the same given the same inputs.
• The ANSI_NULLS connection-level option was ON when the table was created.
• TEXT, NTEXT, or IMAGE data types are not used in the computed column.
• The physical connection used to create the index, and all connections used to INSERT, UPDATE, or DELETE rows in the table, must have these seven SET options properly configured: ANSI_NULLS = ON, ANSI_PADDING = ON, ANSI_WARNINGS = ON, ARITHABORT = ON, CONCAT_NULL_YIELDS_NULL = ON, QUOTED_IDENTIFIER = ON, NUMERIC_ROUNDABORT = OFF.
If you create a clustered index on a computed column, the computed values are stored in the table, just like with any clustered index. If you create a non-clustered index, the computed value is stored in the index, not in the actual table.
While adding an index to a computed column is possible, it is rarely advisable. The biggest problem with doing so is that if the computed column changes, then the index (clustered or non-clustered) has to also be updated, which contributes to overhead. If there are many computed values changing, this overhead can significantly hurt performance.
The most common reason you might consider adding an index to a computed column is if you are using the CHECKSUM() function on a large character column in order to reduce the size of an index. By using the CHECKSUM() of a large character column, and indexing it instead of the large character column itself, the size of the index can be reduced, helping to save space and boost overall performance.

*****

Many databases experience both OLTP and DSS queries. As you probably already know, it is nearly impossible to optimize the indexing of a database that serves both types of queries.
This is because in order for OLTP queries to be fast, there should not be too many indexes as to hinder INSERT, UPDATE, or DELETE operations. And for DSS queries to be fast, there should be as many indexes as needed to speed SELECT queries.
While there are many options for dealing with this dilemma, one option that may work for some people is a strategy where DSS queries are mostly (if not entirely) run during off hours (assuming the database has any off hours), taking advantage of indexes that are added each night before the DSS queries begin and then dropped once the DSS queries are complete.
This way, those indexes needed for fast performing DSS queries will minimally interfere with OLTP transactions (especially during busy times).
As you can imagine, this strategy can take a lot of planning and work, but in some cases, it can offer the best performance for databases that experience both OLTP and DSS queries. Because it is hard to guess if this strategy will work for you, you will want to test it before putting it into production.

*****

Be aware that the MIN() or MAX() functions can take advantage of appropriate indexes. If you find that you are using these functions often, and your current query is not taking advantage of current indexes to speed up these functions, consider adding appropriate indexes.
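A minimal sketch, with hypothetical names: once Price is indexed, MIN and MAX can be satisfied by reading one end of the index rather than scanning the whole table.

```sql
CREATE NONCLUSTERED INDEX IX_Products_Price
    ON Products (Price)

-- Both aggregates can now be resolved from the ends of the index.
SELECT MIN(Price) AS LowestPrice,
       MAX(Price) AS HighestPrice
FROM Products
```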

*****

If you know that a particular column will be subject to many sorts, consider adding a unique index to that column. This is because unique columns generally sort faster in SQL Server than columns that contain duplicate data.
