Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.databases.ms-sqlserver > #847 > unrolled thread
| Started by | Lennart Jonsson <erik.lennart.jonsson@gmail.com> |
|---|---|
| First post | 2011-11-29 16:39 +0100 |
| Last post | 2011-12-02 08:32 +0100 |
| Articles | 9 — 3 participants |
Back to article view | Back to comp.databases.ms-sqlserver
question on clustered indexes in sql-server Lennart Jonsson <erik.lennart.jonsson@gmail.com> - 2011-11-29 16:39 +0100
Re: question on clustered indexes in sql-server "Bob Barrows" <reb01501@NOSPAMyahoo.com> - 2011-11-29 12:03 -0500
Re: question on clustered indexes in sql-server Lennart Jonsson <erik.lennart.jonsson@gmail.com> - 2011-11-29 18:39 +0100
Re: question on clustered indexes in sql-server Erland Sommarskog <esquel@sommarskog.se> - 2011-11-29 23:45 +0100
Re: question on clustered indexes in sql-server Lennart Jonsson <erik.lennart.jonsson@gmail.com> - 2011-11-30 15:30 +0100
Re: question on clustered indexes in sql-server Erland Sommarskog <esquel@sommarskog.se> - 2011-11-30 23:33 +0100
Re: question on clustered indexes in sql-server Lennart Jonsson <erik.lennart.jonsson@gmail.com> - 2011-12-01 10:25 +0100
Re: question on clustered indexes in sql-server Erland Sommarskog <esquel@sommarskog.se> - 2011-12-01 23:55 +0100
Re: question on clustered indexes in sql-server Lennart Jonsson <erik.lennart.jonsson@gmail.com> - 2011-12-02 08:32 +0100
| From | Lennart Jonsson <erik.lennart.jonsson@gmail.com> |
|---|---|
| Date | 2011-11-29 16:39 +0100 |
| Subject | question on clustered indexes in sql-server |
| Message-ID | <jb2ucg$tq7$1@dont-email.me> |
What is the purpose of a clustered index in sql-server (as you probably have guessed I have zero to none experience with sql-server)? The reason I ask is because I look at a databas where more or less all tables are designed as: create table T ( x int IDENTITY(1,1) NOT NULL, [...] CONSTRAINT ... PRIMARY KEY CLUSTERED ( x ) ... In db2 I would look at range predicates and order by clauses on queries to determine what index that should be clustered (inorder to avoid sorts). What is the rationale to use a clustering index like the one above? /Lennart
[toc] | [next] | [standalone]
| From | "Bob Barrows" <reb01501@NOSPAMyahoo.com> |
|---|---|
| Date | 2011-11-29 12:03 -0500 |
| Message-ID | <jb3388$vjm$1@dont-email.me> |
| In reply to | #847 |
Lennart Jonsson wrote: > What is the purpose of a clustered index in sql-server (as you > probably have guessed I have zero to none experience with sql-server)? > > The reason I ask is because I look at a databas where more or less all > tables are designed as: > > create table T ( > x int IDENTITY(1,1) NOT NULL, > [...] > CONSTRAINT ... PRIMARY KEY CLUSTERED ( x ) ... > > In db2 I would look at range predicates and order by clauses on > queries to determine what index that should be clustered (inorder to > avoid sorts). > > What is the rationale to use a clustering index like the one above? > Kimberly Tripp sums up the debate (and there is, indeed, some controversy about this) quite nicely here: http://sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-again!.aspx Make sure you follow the links near the beginning of the article.
[toc] | [prev] | [next] | [standalone]
| From | Lennart Jonsson <erik.lennart.jonsson@gmail.com> |
|---|---|
| Date | 2011-11-29 18:39 +0100 |
| Message-ID | <jb35c8$eb7$1@dont-email.me> |
| In reply to | #848 |
On 2011-11-29 18:03, Bob Barrows wrote: [...] > Kimberly Tripp sums up the debate (and there is, indeed, some controversy > about this) quite nicely here: > http://sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-again!.aspx > > Make sure you follow the links near the beginning of the article. > Thanks Bob. I've skimmed through the article and the links and I think I got most of it. Clustering index is a different creature in sql-server compared to db2. The clustering indexes used makes more sense given this information. Thanks /Lennart
[toc] | [prev] | [next] | [standalone]
| From | Erland Sommarskog <esquel@sommarskog.se> |
|---|---|
| Date | 2011-11-29 23:45 +0100 |
| Message-ID | <Xns9FACF1B805663Yazorman@127.0.0.1> |
| In reply to | #847 |
Lennart Jonsson (erik.lennart.jonsson@gmail.com) writes: > What is the purpose of a clustered index in sql-server (as you probably > have guessed I have zero to none experience with sql-server)? > > The reason I ask is because I look at a databas where more or less all > tables are designed as: > > create table T ( > x int IDENTITY(1,1) NOT NULL, > [...] > CONSTRAINT ... PRIMARY KEY CLUSTERED ( x ) ... > > In db2 I would look at range predicates and order by clauses on queries > to determine what index that should be clustered (inorder to avoid sorts). > > What is the rationale to use a clustering index like the one above? Probably database design on autopilot. I don't know about DB2, but I know that in Oracle, heaps are the norm, and index-organised tables is something you use rarely. In SQL Server, it is the other way around. The clustered index is the normal thing, and heaps is something you only use sometimes. Except in SQL Azure, where heaps are not even supported. All mindsets in SQL Server is geared on clustered indexes, and you better know what you are doing if use a heap. As a consequence of this, by default when you define a primary key, it will be clustered, unless there already is a clustered index. Which there rarely is, since the PK is typically the first index. Furthermore, many inexperienced developers slaps IDENTITY column in each table (often with all other columns nullable), so that's why you get it. That said, there are also sound reasons to have an clustered index on an IDENTITY column to avoid fragmentation. But that does not apply to all tables you see. And, incidently, nor if you want really good INSERT performance, since you get a single hot-spot. Thomas Kejser had a good presentation on this on SQL Rally where he talked about getting really good INSERT performance on flash drives. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Links for SQL Server Books Online: SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
[toc] | [prev] | [next] | [standalone]
| From | Lennart Jonsson <erik.lennart.jonsson@gmail.com> |
|---|---|
| Date | 2011-11-30 15:30 +0100 |
| Message-ID | <jb5eld$cj7$1@dont-email.me> |
| In reply to | #850 |
On 2011-11-29 23:45, Erland Sommarskog wrote: [...] > > I don't know about DB2, but I know that in Oracle, heaps are the norm, > and index-organised tables is something you use rarely. In SQL Server, > it is the other way around. The clustered index is the normal thing, > and heaps is something you only use sometimes. Except in SQL Azure, > where heaps are not even supported. All mindsets in SQL Server is > geared on clustered indexes, and you better know what you are doing if > use a heap. > Hi Erland, reading the links provided by Bob made me realize that that are some essential differences between db2 and sql-server (when it comes to clustering indexes anyhow). If I got it right, the leaf pages in a clustered index in sql-server is the data pages. In db2 leaf pages contains a pointer to the data page just like any other index. In db2 the main focus is how queries may benefit from the clustering index (reduce i/o and sort), and clustering indexes is therefor not added until one is knows what the typical queries are. This may of course be known at design time, but is often not discovered until later. Where the clustering strategy used in my example would have been absolutely braindead in db2, it makes a whole lot more sense in sql-server (even if there seems to be some controversy of what strategy to choose). Looking at other constructions in the datamodel, I'll bet a dollar or two on your auto pilot hyphothesis. Cheers /Lennart [...]
[toc] | [prev] | [next] | [standalone]
| From | Erland Sommarskog <esquel@sommarskog.se> |
|---|---|
| Date | 2011-11-30 23:33 +0100 |
| Message-ID | <Xns9FADEFAF7E378Yazorman@127.0.0.1> |
| In reply to | #851 |
Lennart Jonsson (erik.lennart.jonsson@gmail.com) writes: > If I got it right, the leaf pages in a clustered index in sql-server is > the data pages. Yes, that is exactly essense. > In db2 leaf pages contains a pointer to the data page just like any > other index. So then in DB2, what is the difference between a clustered index and a non-clustered index? -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Links for SQL Server Books Online: SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
[toc] | [prev] | [next] | [standalone]
| From | Lennart Jonsson <erik.lennart.jonsson@gmail.com> |
|---|---|
| Date | 2011-12-01 10:25 +0100 |
| Message-ID | <jb7h5k$k4j$1@dont-email.me> |
| In reply to | #852 |
On 2011-11-30 23:33, Erland Sommarskog wrote: [...] >> In db2 leaf pages contains a pointer to the data page just like any >> other index. > > So then in DB2, what is the difference between a clustered index and a > non-clustered index? > If the index is clustered the data pages are ordered according to the clustering index. When a row is inserted, db2 finds the page where the row should reside. If the row does'nt fit there, db2 looks in a neighbourhould of the page. If that does not succedd etheir, the row is put at the end. In these cases a pointer to the chosen page is stored in the page where the row should have been. If there are many "overflow" pages in a table, additional I/O is required when reading pages, and a reorg of the table should be performed. For a table without a clustering index db2 store the rows in no particular order. /Lennart
[toc] | [prev] | [next] | [standalone]
| From | Erland Sommarskog <esquel@sommarskog.se> |
|---|---|
| Date | 2011-12-01 23:55 +0100 |
| Message-ID | <Xns9FAEF36A48289Yazorman@127.0.0.1> |
| In reply to | #853 |
Lennart Jonsson (erik.lennart.jonsson@gmail.com) writes: > If the index is clustered the data pages are ordered according to the > clustering index. When a row is inserted, db2 finds the page where the > row should reside. If the row does'nt fit there, db2 looks in a > neighbourhould of the page. If that does not succedd etheir, the row is > put at the end. In these cases a pointer to the chosen page is stored in > the page where the row should have been. If there are many "overflow" > pages in a table, additional I/O is required when reading pages, and a > reorg of the table should be performed. So the good news is that there are no page splits. The bad news is that scans along the clustered index can be jumping forth and back. Am I right? -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Links for SQL Server Books Online: SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
[toc] | [prev] | [next] | [standalone]
| From | Lennart Jonsson <erik.lennart.jonsson@gmail.com> |
|---|---|
| Date | 2011-12-02 08:32 +0100 |
| Message-ID | <jb9uue$48d$1@dont-email.me> |
| In reply to | #854 |
On 2011-12-01 23:55, Erland Sommarskog wrote: > Lennart Jonsson (erik.lennart.jonsson@gmail.com) writes: >> If the index is clustered the data pages are ordered according to the >> clustering index. When a row is inserted, db2 finds the page where the >> row should reside. If the row does'nt fit there, db2 looks in a >> neighbourhould of the page. If that does not succedd etheir, the row is >> put at the end. In these cases a pointer to the chosen page is stored in >> the page where the row should have been. If there are many "overflow" >> pages in a table, additional I/O is required when reading pages, and a >> reorg of the table should be performed. > > So the good news is that there are no page splits. > > The bad news is that scans along the clustered index can be jumping forth > and back. > > Am I right? > Indeed, a typical rule of thumb is that when overflows (jumps) > 3% of read rows, it's time to reorganize the table (and eventually indexes on the table as well). There are other indicators of when reorg is needed, but this is the one I use most. /Lennart
[toc] | [prev] | [standalone]
Back to top | Article view | comp.databases.ms-sqlserver
csiph-web