Performance Enhancement for Index Seek + Key Lookup in SQL Server 2

In last post  you saw way to imrove the performance of the key lookup when the data is clustered perfectly. In this post I will show the scenarios where the perforamnce might be worse than actual methods.

1. The data is perfectly clustered but all of your queries get the most of the data from the table. The threshold at which this method will perform bad depends on the table size and indexes and queries. Thus test you queries and see at what point the tweaked sql is performing bad.

exec clustproc @dt= ‘20020701’,@days = 30500 with recompile
go

–864 IO’s for full table..whereas normal table scan is just 803 IO’s. so in the case where you need full table’s data then this might be slightly bad performaing..

2. This method is best when the data in index key is perfectly clustered with the clustered index key. DO NOT USE this method when the clustering is not good. To show this I will create clustered index on rowguid columns so that the clustering is very bad for orderdate column.

if OBJECT_ID(‘actsoh’) is not null
begin
drop table actsoh
end
select * into actsoh from sales.SalesOrderHeader
go
alter table actsoh alter column rowguid varbinary(32)
go
create unique clustered index idx_pk_rowguid on actsoh(rowguid)
go
create nonclustered index idx_ord_dt on actsoh(orderdate)
go

update Statistics actsoh with fullscan
go
–now query the table say based on the order date range of say around 1 day
–1 month
–or 1 year
exec nproc @dt= ‘20010701’,@days = 0 with recompile
–142 IO’s. 0 ms CPU,170ms elapsed time.. –using index seek + key lookup

–1 month
exec nproc @dt= ‘20010701’,@days = 30 with recompile

–574 IO’s. 0 ms ,382 ms –using index seek + key lookup
–try for 1 year
exec nproc @dt= ‘20010701’,@days = 365 with recompile
go
–804 for full table scan..
–Now i will create the clustproc again with the rowguid column..
–Now I will apply my technique..In This case the clustering is very bad so let us see how it will perform..
drop proc clustproc
go
create proc clustproc
(@dt datetime,
@days smallint = 0
)
as
begin
declare @minrowguid varbinary(32),@maxrowguid varbinary(32)

select
@minrowguid= MIN(rowguid),@maxrowguid=max(rowguid)
from
actsoh idx with (index = 2)
where
1=1
and idx.OrderDate between @dt and DATEADD(dd,@days,@dt)

select tbl.* from
actsoh tbl with (index = 1)
where
1=1
and tbl.rowguid between @minrowguid and @maxrowguid
and tbl.OrderDate between @dt and DATEADD(dd,@days,@dt)
end
go

exec clustproc @dt= ‘20010701’,@days = 0 with recompile
–759 IO’s. 0 ms CPU,130ms elapsed time.. This is way too high than 142 by normal proc
–1 month
exec clustproc @dt= ‘20010701’,@days = 30 with recompile
–804 this is equivalent to table scan.. it is worse than 574 IO’s by using normal index seek + lookup
–try for 1 year
exec clustproc @dt= ‘20010701’,@days = 365 with recompile
go
–828 IO’s. 31 ms CPU,314 elapsed time.
–for full table
exec clustproc @dt= ‘20020701’,@days = 30500 with recompile
go
–938 IO’s/
/*

Conclsuion: Before using this method please make sure following.
1. Apply Normal performance technique like covering index etc to improve the proc performance.
2. Make sure that the data is clustered almost perfectly and most of your queries are used to get a part of the data and not whole table.
3. Benchmark the performance of this method with the best method you have already.If it provides enough perf gain then you could implement it.
4. Just test and try in dev and before appying to prod.Test whether it returns the data properly.
*/

Conclsuion:

This is quite a good way to reduce the IO and overall resource usages when the data is perfectly clustered.But before using this make sure that you try normal performance techniques like coversing indexes,tryidng to reduce the number of columns so that some of the existing indexes covers the actual query. Also, make sure that you benchmark it against your best method for all kind of posisble scenarios and then based on that decide whether to use this or not.

Performance Enhancement for Index Seek + Key Lookup in SQL Server 1

Index Seek + lookup: This is considered very bad in SQL server.But I think it is normal because you can not have all the columns in the index key always and thus to get the columns which are not part of index key,you need to get it from data page.

Anyway, there are number of ways to enhance the performance where index key + lookup is causing the issue.All these are specific e.g. one is to create a covering index for a given query.Thus making sure that we have just the index seek and no key lookup. But can we do it for all our queries?No.However, you can use this for the queries which are quite frequently used or which are causing real perf issues and creating a covering index avoid this.

Anyway, I am going to introduce a method which is quite specific for given type of queries where you can use existing index to improve the overall perrformance of query.

This is related to the data correlation between the non clustered index key and clustered index key.In DB terms it is called as clustering of the data as well.E.g. Orderdate in Sales.salesorderheader table and the Salesorderid has a correlation such that when one value is increasing other one is increasing as well.Like for a given orderdate the salesorderid will be greater than the salesorderid for any orderdate before the given orderdate.

I am giving the query below which has all the explaination.

PS: This method is quite specific as the data is perfectly correlated. Thus before applying this technique make sure following
1. You have used other Performance Techniques like creating covering index and any other technique.
2. Also,make sure that the clustering is almost perfect between non clustered index key + clustered index key

When you have applied the normal performance technique and still the performance is not improved and the data clustering is perfect. Then try this method and benchmark the performance of this method against your existing or best method. Then decide whether this technique is good enough as compared to your best method.

Benchmark your perf results before applying any new change.

In below code the elpased time and CPU times varies a lot as i have actual exec plan enabled. This impcats these metrics. Thus you should run the query without using that and see the avg cpu time and avg elapsed time over a number of executions.

This method is not recomended if the data clustering is not good or your query retrunrs most of the table’s data.

use AdventureWorks
go
if OBJECT_ID(‘actsoh’) is not null
begin
drop table actsoh
end
go
select * into actsoh from sales.SalesOrderHeader
go
create unique clustered index idx_pk_salesorderid on actsoh(salesorderid)
go
create nonclustered index idx_ord_dt on actsoh(orderdate)
go
update Statistics actsoh with fullscan
go

–now query the table say based on the order date range of say around 1 day
–1 month
–or 1 year
drop proc nproc
go
create proc nproc
(@dt datetime,
@days smallint = 0
)
as
begin
select * from actsoh
where orderdate between @dt and DATEADD(dd,@days,@dt)

end
go
exec nproc @dt= ‘20010701’,@days = 0 with recompile
–131 IO’s. 0 ms CPU,170ms elapsed time.. –using index seek + key lookup

–1 month
exec nproc @dt= ‘20010701’,@days = 30 with recompile

–554 IO’s. 0 ms ,382 ms –using index seek + key lookup
–try for 1 year
exec nproc @dt= ‘20010701’,@days = 365 with recompile
go
–804 IO’s. 31 ms CPU,791ms elapsed time..Using table scan or cluestered index scan

–Now I will apply my technique..In THis case the clustering is perfect..I will use the min and max salesorderid
drop proc clustproc
go
create proc clustproc
(@dt datetime,
@days smallint = 0
)
as
begin
declare @minsalesorderid int,@maxsalesorderid int

select
@minsalesorderid= MIN(SalesOrderID),@maxsalesorderid=max(SalesOrderID)
from
actsoh idx with (index = 2)
where
1=1
and idx.OrderDate between @dt and DATEADD(dd,@days,@dt)

select tbl.* from
actsoh tbl with (index = 1)
where
1=1
and tbl.SalesOrderID between @minsalesorderid and @maxsalesorderid
and tbl.OrderDate between @dt and DATEADD(dd,@days,@dt)
end
go

exec clustproc @dt= ‘20010701’,@days = 0 with recompile
–7 IO’s. 0 ms CPU,166ms elapsed time..
–1 month
exec clustproc @dt= ‘20010701’,@days = 30 with recompile

–10 IO’s. 0 ms ,223 ms
–try for 1 year
exec clustproc @dt= ‘20010701’,@days = 365 with recompile
go
–90 IO’s. 16 ms CPU,266ms elapsed time.
–for full table
exec clustproc @dt= ‘20020701’,@days = 30500 with recompile
go

–864 IO’s for full table..whereas normal table scan is just 803 IO’s. so in the case where you need full table’s data then this might be slightly
–bad performaing..

I will explain the disadvantages and the scenarios where it could cause perfromance get worse in the next post.