Wednesday, January 1, 2014

Non-Clustered Index in SQL Server

As I said in my earlier post, Index acts like a pointer. In case of Non-Clustered Index, it stores a key value and address of the data row. Index is mainly used to sort the data rows. Apart from Clustered Index, SQL Server allows you to have a Non-Clustered Index on a table.

A table can have only one clustered index in any version of SQL Server but until SQL Server 2008 R2, there can be 249 Non-Clustered Indexes per table. This limit has increased in SQL Server 2012 to 999. Non-Clustered Index doesn't sort the data rows in a physical order they are stored on disk. It does a logical sorting and sorting mechanism is as it is for Clustered Index. It is ASC (Ascending) by default and user can manually specify it as DESC (Descending). Non-Clustered Index can be Unique or Non-Unique.

Unlike Clustered Index, Non-Clustered Index sorts the data rows in a logical order but not in a physical order. Physically rows are present as they are stored on disk but on creating a Non-Clustered Index, it appears to the user as they are sorted by his/her choice.

Architecture


Levels of a nonclustered index

This is the architecture of Non-Clustered Index specified by Microsoft in its official documentation. Like Clustered Index, it is also a 3 level architecture. At the bottom level, the data pages are present. If there is no clustered index created before then these data pages store the addresses of data rows directly. A table without any index is called HEAP. If there is already a clustered index existing on table then data pages of Non-Clustered Index store the key values of Clustered Index. If the clustered index is unique then directly those unique key values are stored in data pages. If the clustered index is non-unique index then SQL Server makes all key values unique by adding a 4 byte number to every key value. User cannot know what is that 4 byte value, it's done internally by SQL Server. This 4 byte number is called UNIQUEIFIER.

Like Clustered Index architecture, Non-Clustered Index architecture also has a doubly linked list. Leaf Nodes contain either addresses of Data Rows or Clustered Index key values. Intermediate Level Nodes point to the leaf nodes while Root Node points to the Intermediate Level Nodes.

In the above specified Microsoft's architecture picture, there is a minor mistake. It specified that index_id > 0. It should be index_id > 1 because the id 1 is always reserved for Clustered Index. Let's prove it.


SELECT * FROM sys.partitions


See the Result Pane, if the column value of index_id is 1 then it specifies Clustered Index. Any index_id value greater than 1 specifies Non-Clustered Index. As I said before, there can be 249 Non-Clustered Indexes per table up to SQL Server 2008 R2 and 999 in SQL Server 2012. If you are using 2008 R2 you can have index_id column value up to 250. If you are using 2012 then you can have it up to 1000. If for a table you create one Non-Clustered Index, index_id has 2 for an object_id. If you create one more Non-Clustered Index on the same table then index_id has 3 for the same object_id. If the index_id column has 0 then that specifies a Heap ( A table without any index).

As usual, to know about the indexes, use the following system view.

SELECT * FROM sys.indexes


From the Result Pane, you can notice what are the Clustered and Non-Clustered indexes present on what tables in your database. Remember that Heap is of type 0, Clustered Index is of type 1 and Non-Clustered Index is of type 2. If it is unique then the column is_unique has a value 1. 

As I said in my post on Clustered Index, if you create a Primary Key it automatically creates a unique clustered index. Here a small question may arise in mind.

Can I create a Primary Key which creates a Non-Clustered Index?

The answer is YES. You can create a Non-Clustered Index with a primary key. But you should specify it manually while creating it. If not SQL Server creates Clustered Index by default. Whether it is a Clustered or a Non-Clustered, an index created by Primary Key is always unique. Specify the server to create a Non-Clustered Index with a Primary Key as below:

ALTER TABLE <Table-Name> ADD CONSTRAINT PK_COLUMN1 PRIMARY KEY NONCLUSTERED (<Column-Name>)

Now you can check in Object Explorer that a Unique Non-Clustered index is created with Primary Key. 

Usage of Non-Clustered Index


Non-Clustered Index is not used by Optimizer always like it uses Clustered Index. Let's have a small demonstration of it.

Create a sample table and insert some rows into it as follows:


CREATE TABLE SampleTable (ID INT,Name VARCHAR(100))

INSERT SampleTable VALUES(1,'A')
INSERT SampleTable VALUES(2,'B')
INSERT SampleTable VALUES(3,'C')
INSERT SampleTable VALUES(4,'D')
INSERT SampleTable VALUES(5,'E')

Now create a Non-Clustered Index either unique or non-unique. For now I'm creating a non-unique index.

CREATE NONCLUSTERED INDEX NON_IX_ID ON SampleTable(ID)

Note a point here. I specified in the above statement that I'm creating a Non-Clustered Index. If you don't specify that keyword, Optimizer creates Non-Clustered Index by default.

CREATE INDEX NON_IX_ID ON SampleTable(ID)

This statement is equivalent to previous statement but specifying NONCLUSTERED is a good SQL practice.

Now query data from the table by turning ON the Actual Execution Plan.

SELECT * FROM SampleTable

Now see the Actual Execution Plan, Optimizer performs Table scan instead of performing Index Scan. 


Let's try for another alternative by including a query predicate.

SELECT ID,Name FROM SampleTable WHERE ID=1

See the Execution Plan, even in this case optimizer performs Table Scan instead of Index Scan.


If you query only indexed column and if you include a predicate for that then Optimizer uses the index. Try running both the below queries.

SELECT ID FROM SampleTable

SELECT ID FROM SampleTable WHERE ID=1

See the Execution plan for both the above queries. Optimizer performs Index Scan and Index Seek operations respectively.



We may get surprised why our index is not being used when full table is queried. For that we can force the Optimizer to use our index using a hint. Let's do that now.

SELECT * FROM SampleTable WITH (INDEX(NON_IX_ID))

See the Execution Plan now:


As we forced, Optimizer used index but as per its wish it also performed RID Lookup operation in the heap. This means it scanned the whole table along with scanning index. It looked every row in table and every row in Index and then performed Inner Join of both. The matching values are returned. This becomes a burden for optimizer every time we use a forcing hint. 

Now let's analyze why Optimizer ignored the index and went for Table Scan. We'll analyze the costs incurred for Optimizer for doing both the operations.



In the above figures, the Cost of Nested Loop belongs to the query where we used hint to use our index. Carefully observe the values we got for I/O cost, Operator Cost and CPU Cost. Sum the costs of Heap, Index Scan and Nested loops and then compare the result value with Table Scan. We can notice that for Table Scan, the costs incurred are lesser. However, the job of Query Optimizer is to find the cheapest execution plan among all the available plans. So it chose to perform Table Scan rather than Index Scan. Here our table is a heap, so it performed RID Lookup. If there is a Clustered Index then Optimizer would perform Key Lookup which scans Clustered Index. The major difference in usage of Clustered Index and Non-Clustered Index is that Clustered Index stores the entire data row in its leaf nodes while Non-Clustered Index stores only the addresses of Data Rows in its leaf nodes.

To have our index used for any kind of querying, we should have INCLUDED COLUMNS for it. I'll write about this concept in my future posts.

Disadvantages of Non-Clustered Index


Non-Clustered Index has disadvantages and some limitations.

Disk Space


As I demonstrated above, for querying whole table, Optimizer performs Table Scan instead of Index Scan. In such cases Index doesn't come to work. This also consumes Disk Space. So before creating index data in your table should be analyzed whether index is needed for it.

Fragmentation


Generally in our computer disks, if we do more file deletions and creation there occurs a fragmentation of disk space. The same is the case with index. If more number of INSERT, UPDATE and DELETE operations are performed on the table with index then it results in Fragmentation of index. This fragmentation causes slow execution of query.

No comments:

Post a Comment