Friday, March 30, 2012

Impact of SELECTed colums on the execution plan

Hi guys,

I have something weird that I want to understand.

I have a big table, containing around 17 millions of rows. This table has been progressively indexed over time, mainly by following some of the recommendations of the DB tuning advisor. As a result, we have around ten indexes on this table, some of them are using the new "INCLUDE" feature of SQL2005 indexes with non-key data.

The problem I have is the following : I have a very simple query (SELECT <some columns> FROM that_table WHERE <simple clause conditions>) which should benefit from the indexes built on the table. But depending on the columns I select, the execution plan varies totally !!

With one column selected, the good index is used, and the query is fast.

With two columns selected, the execution plan becomes complicated (several different indexes seeked in parallel). I guess the optimization system is trying to get benefit from some non-key data included in some of the indexes ? But I'm not sure...

With a "SELECT *" no appropriate index is used. The excution plan shows it will do a "clustered index scan" over the primary key (which is not part of the where clause), resulting in a full scan of the table... OK, I know that "SELECT *" is not a good practice, but anyway, this result is very surprising.

(I tried to make it short, I can detail the queries and the construction of the indexes if needed, but the main info is there I think)

How can it be possible to have so many differences, just by changing the list of SELECTed columns ? It's beyond my understanding of what is an index...

Thanks for your help

Mathieu

Hi Mathieu,

There's a few things that could be going on here, and it's hard to make a definitive call without investigating the query plan, but...

If you include columns in the select list that are not served by the index the used in order to serve the where clause, the engine will need to perform what's known as a bookmark lookup in order to retrieve the column values that are not included in the index. It of course gets a bit more complicated if the optimiser chooses parallel indexes.

So, if your query was something like:

SELECT id, name, dob

FROM tblPerson

WHERE id = 993

You can also try execting the query with OPTION (MAXDOP 1) in order to determine if the generation of a parallel query plan is introducing unnecessary overhead.

You would create an index on the id column and list name and dob columns in the new INCLUDE clause. This would be a good starting point, but you'd need to take into account all other queries against this table in order to decide if indeed this. Indexing is a huge subject, so get googling! :)

Cheers,

Rob

|||

To be clearer :

1st case :

SELECT a FROM table WHERE b=constant AND c>constant AND d LIKE 'string%'.

c AND d are covered by an index, and the column a is included (INCLUDE keyword) in that index. b, c, and d are regulars data columns (not involved in PK). The exec plan is fine, the query is fast.

2nd case : I just add a column in the SELECT, the where clause is left untouched

SELECT a, e FROM table WHERE b=constant AND c>constant AND d LIKE 'string%'.

Here, the column e is not included in the index mentionned hereunder. The exec plan becomes complicated, involving others indexes in parallel. The query becomes slow.

3rd case : retrieving all columns, the where clause is still left untouched

SELECT * FROM table WHERE b=constant AND c>constant AND d LIKE 'string%'.

This time, the exec plan is "clustered index scan" over the PK. But the PK does not appear in the where clause ! Result is a full scan of the 17millions of rows... catastrophic !

While writing this post, I'm getting convinced that SQL Server is perturbed by the INCLUDEd columns in the indexes... What's your opinion ?

|||

Hi Mathieu,

As mentioned, if a column in the select list is not included in the index, a bookmark lookup is used. So, in the 2nd case above, you would either include column e in the index used for the operation either as a key value or in the INCLUDE list.

In the 3rd case, it would appear the optimiser has decided that rather then performing lookups for all the columns not servicable by the index (ie the SELECT *), it has chosen to perform a full scan. I'd need to see the full query plan and the schema to provide more info, but it sounds like you need to revisit your indexing strategy from the ground up.

Although INCLUDE is new to 2005, I've never had nor heard of an issue directly related to its use (yet).

Cheers,

Rob

No comments:

Post a Comment