In SQL Server database there are mainly two types of indexes,
Clustered index and Non Clustered index, and difference between Clustered and
Non Clustered index is very important from SQL performance perspective. It is
also one of the most common SQL Interview question, similar to difference between truncate and delete, primary key or unique key or correlated
vs non correlated subquery. For those, who are not aware of benefits of
Index or why we use index in database, they help in making your SELECT query
faster. A query with index is sometime 100 times faster than a query without
index, of course depending upon how big your table is, but, you must index on
columns which are frequently used in WHERE clause of SELECT
query, or which forms major criterion for searching in database. For example in
Employee database, EmployeeId or EmployeeName are common
conditions to find an Employee in database. As I said, there can be either clustered index or non clustered index in
database, former is used to decide how data is physically stored in disk
and that's why there can be only one clustered index in any table. In this
article, we will explore more about both of this indexes and learn some key difference
between clustered and non clustered index from interview and performance
perspective.
2 words on Indexes in SQL
Continuing from first paragraph, Index are used to make search faster in
SQL. They are mostly maintained as balanced tree (BST), where tree traversal
gives you performance in order of log(N). In case
of clustered index, data is present in leaf node, so when we run a particular
query, which uses clustered index, we can directly find the data by tree
traversal. Query optimizer is a component of database, which decides whether to
use an index or not to execute a SELECT
query, or if use index then which one. You can even see, which index is
used for executing your query by looking at query plan, a FULL
TABLE SCAN means no index is used and every row of table is scanned by database to
find data, on the other hand INDEX UNIQUE SCAN or INDEX
RANGE SCAN suggest use of Index for finding data. By the Index also has there
own disadvantage as they make INSERT and UPDATE query
slower and they also need space. A careful use of index is the best way to
go.
Clustered vs Non Clustered Index in SQL
Now we have some idea about what is Index in database and how they work,
it's time to look some key differences between clustered and non clustered
index in SQL Server, which is mostly true for other database as well e.g.
Oracle or MySQL.
1) One of the main difference between clustered and non clustered index
in SQL Server is that, one table can only have one clustered Index but
It can have many non clustered index, approximately 250. This limitation comes
from the fact clustered index is used to determines how data is stored
physically in table. You should be very careful while choosing clustered index
and should use columns which can be queried in range e.g. select
* from Employee where EMP_ID > 20 and EMP_ID < 50. Since
clustered index stores data in cluster, related data are stored together and
it's easy for database to retrieve all data in one shot. This further reduces
lots of disk IO which is very expensive operation. Clustered Index is also very
good on finding unique values in a table e.g. queries like select
* from Employee where EMP_ID=40; can be very fast if EMP_ID has
clustered index on it.
2) Another key difference between Clustered Index and Non Clustered Index
in database is that many relational database including SQL Server by
default creates clustered index on PRIMARY
KEY constraint, if there
is no clustered index exists in database and a nonclustered index is not
specified while declaring PRIMARY KEY constraint.
3) One more difference between them is that, clustered index contains
data i..e rows in there leaf node, as Index is represented as BST, while
nonclustered index contains pointer to data (address or rows) in there leaf node,
which means one more extra step to get the data.
4) By the way there is a misconception that we can only define clustered
index with one column, which is not true. You can create clustered index with
multiple columns, known as composite index. For example in Employee table, a
composite index on firstname and lastname can be a good clustered index,
because most of the query uses this as criterion. Though you should try to
minimize number of columns in clustered index for better performance in SQL
Server. On related not, while declaring composite index, pay some attention to
the order of columns in index, that can decide which statement will use index
and which will not. In fact this is one of the mostly asked question as, does
order of columns in composite index matters.
Last but not the least, pay some attention while creating clustered
and non clustered index in database. Create clustered index for columns which
contains unique values, are accessed sequentially, used in range queries and
return large result set. Avoid creating clustered index on columns, which are
update frequently because that would lead rearrangement of rows on disk level,
a potentially slow operation.
That's all on difference between clustered and nonclustered index in
SQL Server database. Remember that, it's possible to create clustered index
on non PRIMARY KEY column and PRIMARY
KEY constraint only creates a clustered index, if there is not already
in database and a nonclustered index is not provided. Key difference is that,
clustered index decides physical sorting or order of data in disk.
Tidak ada komentar:
Posting Komentar