Groups > comp.databases.ms-sqlserver > #847 > unrolled thread

question on clustered indexes in sql-server

Started by	Lennart Jonsson <erik.lennart.jonsson@gmail.com>
First post	2011-11-29 16:39 +0100
Last post	2011-12-02 08:32 +0100
Articles	9 — 3 participants

Back to article view | Back to comp.databases.ms-sqlserver

  question on clustered indexes in sql-server Lennart Jonsson <erik.lennart.jonsson@gmail.com> - 2011-11-29 16:39 +0100
    Re: question on clustered indexes in sql-server "Bob Barrows" <reb01501@NOSPAMyahoo.com> - 2011-11-29 12:03 -0500
      Re: question on clustered indexes in sql-server Lennart Jonsson <erik.lennart.jonsson@gmail.com> - 2011-11-29 18:39 +0100
    Re: question on clustered indexes in sql-server Erland Sommarskog <esquel@sommarskog.se> - 2011-11-29 23:45 +0100
      Re: question on clustered indexes in sql-server Lennart Jonsson <erik.lennart.jonsson@gmail.com> - 2011-11-30 15:30 +0100
        Re: question on clustered indexes in sql-server Erland Sommarskog <esquel@sommarskog.se> - 2011-11-30 23:33 +0100
          Re: question on clustered indexes in sql-server Lennart Jonsson <erik.lennart.jonsson@gmail.com> - 2011-12-01 10:25 +0100
            Re: question on clustered indexes in sql-server Erland Sommarskog <esquel@sommarskog.se> - 2011-12-01 23:55 +0100
              Re: question on clustered indexes in sql-server Lennart Jonsson <erik.lennart.jonsson@gmail.com> - 2011-12-02 08:32 +0100

#847 — question on clustered indexes in sql-server

From	Lennart Jonsson <erik.lennart.jonsson@gmail.com>
Date	2011-11-29 16:39 +0100
Subject	question on clustered indexes in sql-server
Message-ID	<jb2ucg$tq7$1@dont-email.me>

What is the purpose of a clustered index in sql-server (as you probably
have guessed I have zero to none experience with sql-server)?

The reason I ask is because I look at a databas where more or less all
tables are designed as:

create table T (
	x int IDENTITY(1,1) NOT NULL,
[...]
 CONSTRAINT ... PRIMARY KEY CLUSTERED ( x ) ...
	
In db2 I would look at range predicates and order by clauses on queries
to determine what index that should be clustered (inorder to avoid sorts).

What is the rationale to use a clustering index like the one above?

/Lennart

[toc] | [next] | [standalone]

#848

From	"Bob Barrows" <reb01501@NOSPAMyahoo.com>
Date	2011-11-29 12:03 -0500
Message-ID	<jb3388$vjm$1@dont-email.me>
In reply to	#847

Lennart Jonsson wrote:
> What is the purpose of a clustered index in sql-server (as you
> probably have guessed I have zero to none experience with sql-server)?
>
> The reason I ask is because I look at a databas where more or less all
> tables are designed as:
>
> create table T (
> x int IDENTITY(1,1) NOT NULL,
> [...]
> CONSTRAINT ... PRIMARY KEY CLUSTERED ( x ) ...
>
> In db2 I would look at range predicates and order by clauses on
> queries to determine what index that should be clustered (inorder to
> avoid sorts).
>
> What is the rationale to use a clustering index like the one above?
>
Kimberly Tripp sums up the debate (and there is, indeed, some controversy 
about this) quite nicely here:
http://sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-again!.aspx

Make sure you follow the links near the beginning of the article.

[toc] | [prev] | [next] | [standalone]

#849

From	Lennart Jonsson <erik.lennart.jonsson@gmail.com>
Date	2011-11-29 18:39 +0100
Message-ID	<jb35c8$eb7$1@dont-email.me>
In reply to	#848

On 2011-11-29 18:03, Bob Barrows wrote:
[...]
> Kimberly Tripp sums up the debate (and there is, indeed, some controversy 
> about this) quite nicely here:
> http://sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-again!.aspx
> 
> Make sure you follow the links near the beginning of the article.
> 

Thanks Bob. I've skimmed through the article and the links and I think I
got most of it. Clustering index is a different creature in sql-server
compared to db2. The clustering indexes used makes more sense given this
information. Thanks

/Lennart

[toc] | [prev] | [next] | [standalone]

#850

From	Erland Sommarskog <esquel@sommarskog.se>
Date	2011-11-29 23:45 +0100
Message-ID	<Xns9FACF1B805663Yazorman@127.0.0.1>
In reply to	#847

Lennart Jonsson (erik.lennart.jonsson@gmail.com) writes:
> What is the purpose of a clustered index in sql-server (as you probably
> have guessed I have zero to none experience with sql-server)?
> 
> The reason I ask is because I look at a databas where more or less all
> tables are designed as:
> 
> create table T (
>      x int IDENTITY(1,1) NOT NULL,
> [...]
>  CONSTRAINT ... PRIMARY KEY CLUSTERED ( x ) ...
>      
> In db2 I would look at range predicates and order by clauses on queries
> to determine what index that should be clustered (inorder to avoid sorts).
> 
> What is the rationale to use a clustering index like the one above?

Probably database design on autopilot.

I don't know about DB2, but I know that in Oracle, heaps are the norm,
and index-organised tables is something you use rarely. In SQL Server,
it is the other way around. The clustered index is the normal thing,
and heaps is something you only use sometimes. Except in SQL Azure, 
where heaps are not even supported. All mindsets in SQL Server is
geared on clustered indexes, and you better know what you are doing if
use a heap.

As a consequence of this, by default when you define a primary key,
it will be clustered, unless there already is a clustered index. Which
there rarely is, since the PK is typically the first index.

Furthermore, many inexperienced developers slaps IDENTITY column in
each table (often with all other columns nullable), so that's why
you get it.

That said, there are also sound reasons to have an clustered index
on an IDENTITY column to avoid fragmentation. But that does not 
apply to all tables you see. And, incidently, nor if you want really
good INSERT performance, since you get a single hot-spot. Thomas
Kejser had a good presentation on this on SQL Rally where he talked
about getting really good INSERT performance on flash drives.

-- 
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx

[toc] | [prev] | [next] | [standalone]

#851

From	Lennart Jonsson <erik.lennart.jonsson@gmail.com>
Date	2011-11-30 15:30 +0100
Message-ID	<jb5eld$cj7$1@dont-email.me>
In reply to	#850

On 2011-11-29 23:45, Erland Sommarskog wrote:
[...]
> 
> I don't know about DB2, but I know that in Oracle, heaps are the norm,
> and index-organised tables is something you use rarely. In SQL Server,
> it is the other way around. The clustered index is the normal thing,
> and heaps is something you only use sometimes. Except in SQL Azure, 
> where heaps are not even supported. All mindsets in SQL Server is
> geared on clustered indexes, and you better know what you are doing if
> use a heap.
> 

Hi Erland, reading the links provided by Bob made me realize that that
are some essential differences between db2 and sql-server (when it comes
to clustering indexes anyhow). If I got it right, the leaf pages in a
clustered index in sql-server is the data pages. In db2 leaf pages
contains a pointer to the data page just like any other index.

In db2 the main focus is how queries may benefit from the clustering
index (reduce i/o and sort), and clustering indexes is therefor not
added until one is knows what the typical queries are. This may of
course be known at design time, but is often not discovered until later.

Where the clustering strategy used in my example would have been
absolutely braindead in db2, it makes a whole lot more sense in
sql-server (even if there seems to be some controversy of what strategy
to choose).

Looking at other constructions in the datamodel, I'll bet a dollar or
two on your auto pilot hyphothesis.

Cheers
/Lennart

[...]

[toc] | [prev] | [next] | [standalone]

#852

From	Erland Sommarskog <esquel@sommarskog.se>
Date	2011-11-30 23:33 +0100
Message-ID	<Xns9FADEFAF7E378Yazorman@127.0.0.1>
In reply to	#851

Lennart Jonsson (erik.lennart.jonsson@gmail.com) writes:
> If I got it right, the leaf pages in a clustered index in sql-server is
> the data pages. 

Yes, that is exactly essense.

> In db2 leaf pages contains a pointer to the data page just like any
> other index. 

So then in DB2, what is the difference between a clustered index and a
non-clustered index?



-- 
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx

[toc] | [prev] | [next] | [standalone]

#853

From	Lennart Jonsson <erik.lennart.jonsson@gmail.com>
Date	2011-12-01 10:25 +0100
Message-ID	<jb7h5k$k4j$1@dont-email.me>
In reply to	#852

On 2011-11-30 23:33, Erland Sommarskog wrote:
[...]
>> In db2 leaf pages contains a pointer to the data page just like any
>> other index. 
> 
> So then in DB2, what is the difference between a clustered index and a
> non-clustered index?
> 

If the index is clustered the data pages are ordered according to the
clustering index. When a row is inserted, db2 finds the page where the
row should reside. If the row does'nt fit there, db2 looks in a
neighbourhould of the page. If that does not succedd etheir, the row is
put at the end. In these cases a pointer to the chosen page is stored in
the page where the row should have been. If there are many "overflow"
pages in a table, additional I/O is required when reading pages, and a
reorg of the table should be performed.

For a table without a clustering index db2 store the rows in no
particular order.

/Lennart

[toc] | [prev] | [next] | [standalone]

#854

From	Erland Sommarskog <esquel@sommarskog.se>
Date	2011-12-01 23:55 +0100
Message-ID	<Xns9FAEF36A48289Yazorman@127.0.0.1>
In reply to	#853

Lennart Jonsson (erik.lennart.jonsson@gmail.com) writes:
> If the index is clustered the data pages are ordered according to the
> clustering index. When a row is inserted, db2 finds the page where the
> row should reside. If the row does'nt fit there, db2 looks in a
> neighbourhould of the page. If that does not succedd etheir, the row is
> put at the end. In these cases a pointer to the chosen page is stored in
> the page where the row should have been. If there are many "overflow"
> pages in a table, additional I/O is required when reading pages, and a
> reorg of the table should be performed.
 
So the good news is that there are no page splits.

The bad news is that scans along the clustered index can be jumping forth 
and back.

Am I right?

-- 
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx

[toc] | [prev] | [next] | [standalone]

#855

From	Lennart Jonsson <erik.lennart.jonsson@gmail.com>
Date	2011-12-02 08:32 +0100
Message-ID	<jb9uue$48d$1@dont-email.me>
In reply to	#854

On 2011-12-01 23:55, Erland Sommarskog wrote:
> Lennart Jonsson (erik.lennart.jonsson@gmail.com) writes:
>> If the index is clustered the data pages are ordered according to the
>> clustering index. When a row is inserted, db2 finds the page where the
>> row should reside. If the row does'nt fit there, db2 looks in a
>> neighbourhould of the page. If that does not succedd etheir, the row is
>> put at the end. In these cases a pointer to the chosen page is stored in
>> the page where the row should have been. If there are many "overflow"
>> pages in a table, additional I/O is required when reading pages, and a
>> reorg of the table should be performed.
>  
> So the good news is that there are no page splits.
> 
> The bad news is that scans along the clustered index can be jumping forth 
> and back.
> 
> Am I right?
>

Indeed, a typical rule of thumb is that when overflows (jumps) > 3% of
read rows, it's time to reorganize the table (and eventually indexes on
the table as well). There are other indicators of when reorg is needed,
but this is the one I use most.


/Lennart

[toc] | [prev] | [standalone]

csiph-web

question on clustered indexes in sql-server

Contents

#847 — question on clustered indexes in sql-server

#848

#849

#850

#851

#852

#853

#854

#855