Performance issue with nvarchar(max) in SQL Server 2008

Many people have started using nvarchar(max) instead of defining fixed-length nvarchar columns. This is mostly to avoid code changes related to the length of the columns, or because they are not sure what the right length for a column would be. Ideally this should be resolved at the database design level itself: the database designer should define a proper length for each column based on the business requirement. However, that is not always possible, or people simply do not bother and create the column as nvarchar(max). I will show you the disadvantages of using nvarchar(max) instead of fixed-length columns, especially when the size of the column is less than 8000 bytes. Columns that need more than 8000 bytes cannot use a fixed length anyway, as that would truncate the data in some of your rows.

Anyway, below is a script you can run to see why nvarchar(max) should not be used where nvarchar(fixedlength) is possible.

Use AdventureWorks
go
--create a table with some records..
if OBJECT_ID('nvarcharfixed') is not null
drop table nvarcharfixed
go

create table nvarcharfixed ( id int primary key not null,col1 nvarchar(200) not null,col2 char(400) not null,col3 int not null)
go
insert into nvarcharfixed
select SalesOrderID,SUBSTRING(replicate(reverse(SalesOrderNumber),100),1,100),SalesOrderNumber,soh.CustomerID
from Sales.SalesOrderHeader soh
go

create nonclustered index idx_col1_fixed on nvarcharfixed(col3) include(col1)
go
update Statistics nvarcharfixed with fullscan
go

if OBJECT_ID('nvarcharmax') is not null
drop table nvarcharmax
go
create table nvarcharmax ( id int primary key not null,col1 nvarchar(max) not null,col2 char(400) not null,col3 int not null)
go

insert into nvarcharmax
select *
from nvarcharfixed
go
create nonclustered index idx_col1_max on nvarcharmax(col3) include (col1)
go
update Statistics nvarcharmax with fullscan
go

--Now let us check the index size, table size, etc. to see what impact nvarchar(max) has on storage.
with IndexPhysStats
as
(
select OBJECT_NAME(object_id) as TblName,* from sys.dm_db_index_physical_stats(db_id(),object_id('nvarcharfixed'),null,null,'DETAILED')
union all
select OBJECT_NAME(object_id) as TblName,* from sys.dm_db_index_physical_stats(db_id(),object_id('nvarcharmax'),null,null,'DETAILED')
)
select * from IndexPhysStats
go

--There is no difference in record size, index_depth, page_count, or space used, and not even in fragmentation. However, the max table has a LOB_DATA row, and of course it is empty..

--Now let us fetch a single row from both tables. Turn on the actual execution plan.
set statistics io,time on
go
select * from nvarcharfixed where id = 43659
select * from nvarcharmax where id = 43659
go
set statistics io,time off
go

--Estimated cost is the same and everything else is the same.. But the estimated row size is 4443 B for max and 619 B for fixed.
--This is a big difference. It could impact sorting etc., because the optimizer will use 4443 B instead of 619 B
--to size the memory grant in the max case.. So let us see what difference it makes.
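The gap between 619 B and 4443 B follows from how the optimizer sizes variable-length columns. The arithmetic below is a rough sketch based on the commonly observed rules of thumb (50% of declared maximum for variable-length columns, a flat ~4000 bytes for the max types), not documented guarantees:

```sql
--Assumed estimation rules of thumb, applied to the two demo tables:
--
--nvarcharfixed: 4 (id, int) + 200 (nvarchar(200) -> est. 100 chars * 2 bytes)
--               + 400 (char(400)) + 4 (col3, int) + row overhead  ~= 619 B
--nvarcharmax:   4 + ~4000 (flat estimate for the nvarchar(max) column)
--               + 400 + 4 + row overhead                          ~= 4443 B
```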

set statistics io,time on
go
select * from nvarcharfixed where id between 43659 and 43670 order by col1
select * from nvarcharmax where id between 43659 and 43670 order by col1
go
set statistics io,time off
go

--No diff except row size..Though 1 extra IO for the max table..

--now query based on col3..

set statistics io,time on
go
select * from nvarcharfixed where col3 = 442 -- order by col1
select * from nvarcharmax where col3 = 442 -- order by col1
go
set statistics io,time off
go
--No diff except row size..Though 1 extra IO for the max table..

set statistics io,time on
go
select * from nvarcharfixed where col3 = 442 order by col1
select * from nvarcharmax where col3 = 442 order by col1
go
set statistics io,time off
go
--No diff except row size..Though 1 extra IO for the max table..

--Now I would like to sort on a column, but for the full table..

set statistics io,time on
go
select * from nvarcharfixed order by col1
select * from nvarcharmax order by col1
option(maxdop 1)
go
set statistics io,time off
go
--Look at the plan costs... 3.99 for fixed and 356.5 for max. That is a huge difference. This caused the optimizer
--to opt for a parallel plan for max.. Also the memory grant is almost 5 times larger: 25K for fixed and 121K for max.
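The grants mentioned above can also be inspected outside the graphical plan. A sketch (run from another session while the sorts above are executing; the column list is a subset of what the DMV exposes):

```sql
--sys.dm_exec_query_memory_grants shows workspace memory requested vs granted
--per currently executing query.
select session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       max_used_memory_kb,
       query_cost
from sys.dm_exec_query_memory_grants
```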

--Now I will sort based on, say, column col2

set statistics io,time on
go
select * from nvarcharfixed order by col2
select * from nvarcharmax order by col2
option(maxdop 1)
go
set statistics io,time off
go

--same as above..

--Now exclude the col1 column and get the remaining columns..

set statistics io,time on
go
select id,col3,col2 from nvarcharfixed order by col2
select id,col3,col2 from nvarcharmax order by col2
option(maxdop 1)
go
set statistics io,time off
go

--Now both have the same row size, the same estimated cost, and the same memory grant..

--finally I would like to query based on col1.. There are no indexes on col1, so let us see what the diff will be..

set statistics io,time on
go
select * from nvarcharfixed where col1 like '10%' order by col2
select * from nvarcharmax where col1 like '10%' order by col2
option(maxdop 1)
go
set statistics io,time off
go

--Now there is no diff at all except the row size, which is now 619 B and 643 B, much less than 4443 bytes..
--Is it because of the nonclustered index scan that the row size estimate is smaller? So we will avoid the
--use of the nonclustered index scan and see what happens.

set statistics io,time on
go
select * from nvarcharfixed where col1 like '%' order by col2
select * from nvarcharmax where col1 like '%' order by col2
option(maxdop 1)
go
set statistics io,time off
go
--As you can see, the sizes are 619 B and 643 B. Thus the estimated row size is reduced even though the nonclustered index
--is not used.. The memory grant for max is only a bit more than for fixed.

--Was it because the stats were updated and the optimizer is using them for the row size..??
--Before checking that, let us rerun the table scan with order by

set statistics io,time on
go
select * from nvarcharfixed order by col2
select * from nvarcharmax order by col2
option(maxdop 1)
go
set statistics io,time off
go

--Still the memory grant is 106K and the estimated row size is 4443 bytes again.. So the stats did not do anything; it is just that providing
--col1 in the where clause is making the difference..

--try again

set statistics io,time on
go
select * from nvarcharfixed where col1 like '%' order by col2
select * from nvarcharmax where col1 like '%' order by col2
option(maxdop 1)
go
set statistics io,time off
go

--The row size is reduced to 643 bytes again, and so are the memory grant and estimated cost.. Thus using the nvarchar(max) column in the where clause
--reduces the estimated row size, and in turn the estimated cost and the memory grant..

--Let me drop the statistics on the col1 column, then switch auto creation off as well, and see how it performs

select * from sys.stats where object_id = object_id('nvarcharmax','U')
go
drop statistics nvarcharmax._WA_Sys_00000002_413A6905 --the auto-created stats name will differ on your system
go
select * from sys.stats where object_id = object_id('nvarcharmax','U')
go
alter database adventureworks set auto_create_statistics off
go
set statistics io,time on
go
select * from nvarcharfixed where col1 like '%' order by col2
select * from nvarcharmax where col1 like '%' order by col2
option(maxdop 1)
go
set statistics io,time off
go

--The estimated row size is again 4443 bytes and thus a large memory grant is needed. In this case it is 74K.
--You will also get a warning that there are no statistics for col1.

alter database adventureworks set auto_create_statistics on
go

Conclusion:
So if you are sure that the length of your nvarchar column will be less than 8000 bytes, do not define the column as nvarchar(max);
rather, define it as nvarchar(fixedlength) wherever possible. The main advantages I see for using a fixed length are the following.

1. You can create an index on a fixed-length column but not on an nvarchar(max) column. If you are using the column in the where clause of your queries, then you should always use a fixed-length column instead of nvarchar(max). This alone should be reason enough to use a fixed-length nvarchar column, but I will list the others as well. Note, however, that the maximum allowed index key length is 900 bytes: an nvarchar column whose defined size exceeds this will produce a warning at index creation, and any insert or update that pushes the key past 900 bytes will fail.
2. The memory grant could be a big issue when the server is already memory starved. Because the estimated row size is larger, the optimizer
estimates a bigger memory grant, and this value will be much higher than actually required; that is a waste of a resource as precious as memory.
If you have a couple of queries that use an nvarchar(max) column and need sorting, the server might run into memory related issues.
This could be a big perf issue.
3. You can not rebuild indexes online when the table has an nvarchar(max) column. The following, for example, fails on the max table:
alter index all on nvarcharmax
rebuild with (online=on)
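To see point 1 in action on the demo tables from the script above (a sketch; idx_key_fixed and idx_key_max are hypothetical index names):

```sql
--An index key on the fixed-length column works: 200 chars * 2 bytes = 400 B,
--well under the 900-byte index key limit.
create index idx_key_fixed on nvarcharfixed(col1)
go
--The same index on the max column fails with error 1919:
--"Column 'col1' in table 'nvarcharmax' is of a type that is invalid for use
--as a key column in an index." (It can only be an included column, as the
--script's idx_col1_max shows.)
create index idx_key_max on nvarcharmax(col1)
go
```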

However, if you have already defined your columns as nvarchar(max) and can not change them now, then you can reduce the memory grant issue
by creating statistics on the column and specifying the column in the where clause (if it is not there already) with a condition like where colname like '%'. This will make sure that
the memory grant is smaller (at least for columns whose average length is under 8000 bytes), as the estimated row size will be reduced. This will make sure that
memory is not wasted, and thus will reduce the memory pressure if it was caused by memory grants related to nvarchar(max) columns combined with an order by or sort.
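Applied to the demo table, the workaround would look something like this (st_col1 is a hypothetical statistics name; the like '%' predicate is always true here because col1 is declared NOT NULL):

```sql
--Create statistics on the max column explicitly...
create statistics st_col1 on nvarcharmax(col1)
go
--...then reference the column with a harmless predicate, so the optimizer
--derives the row size from the average length in the stats instead of the
--flat ~4000-byte estimate for nvarchar(max).
select * from nvarcharmax
where col1 like '%'
order by col2
option(maxdop 1)
go
```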