Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.databases.ms-sqlserver > #1286
| Newsgroups | comp.databases.ms-sqlserver |
|---|---|
| Date | 2012-09-14 05:05 -0700 |
| References | <37389863-6d11-4092-9f04-2e27dbccbd2b@googlegroups.com> |
| Message-ID | <aa20c002-6e6e-4f36-b678-d4c2c736dada@googlegroups.com> (permalink) |
| Subject | Re: table design for massive GPS data |
| From | rja.carnegie@gmail.com |
Off the top of a semi-qualified head: If the table is indexed well, then query performance will depend on the volume of data in a query, but not so much on the size of a table and the rows that you aren't looking at. If the typical query is all the records for one vehicle on one calendar day (one working day), then a primary key of date, vehicle id, time, looks possible. If you want to count work after midnight for the previous day, that could be messier - you could think about using datetimeoffset. I expect such a design also to be efficient when you are deleting records that are 5 years old every day. If you use datetime order, then vehicle, then the server has to scan records of 1000 vehicles to get the data that you actually want to see. On the other hand, if you want the query for 1000 vehicles, then it's probably okay if they're mixed together. If data arrives in real time, then a clustered index of date, vehicle, time, may slow down updates, while favouring the query. Usually you want it that way around. The clustered index makes SQL Server physically store neighbouring records together. So you'll have 1000 distinct points in the data file where each vehicle's records are being added. You'd also see data filling pages that are then split into two, a lot. But I expect the server still to perform well in that case. There also may be an implication of the growth of the transaction log, but if data is inserted one row at a time then it won't make much of a difference - the log ray be pretty big whatever you do. Your primary key is of course unique and doesn't allow NULL and by default is a clustered index, but none of this is necessary, including even having a primary key as such. Indexes that aren't the primary key will work just as well. The clustered index does matter; it's also desirable for its key to be short, since it's incorporated into any nonclustered indexes, so if there isn't a need to use the clustered index to reorganise the data then a clustered index on an IDENTITY(int, 1, 1) column can be pretty good. /Then/ you set your nonclustered primary key (which does have to be unique and not NULL). Including a non-meaningful IDENTITY in a primary key wouldn't be right, but it's right if it is, in fact, the key of your table in relationships with other data. The point is, if your data is not unique in the date, time, and vehicle ID fields, then don't get sidetracked into worrying about a primary key that you can't have. Alternatively, add 1 second to one of the duplicate key values, and it'll be unique. Primary key also is a constraint that must have a unique object name (PK_nameoftable probably), whereas you can and I do name other indexes IX_1, IX_2, IX_3, on every table in your database. This doesn't help other developers understand the database, but other developers don't often help me, so if they want to know what my indexes are, they can look 'em up. I probably have 500 nearly identical tables created at the same time. (Arguably I should have one table with 500 times as many rows and a couple of extra columns, but that isn't how we do things here.) So I don't want to invent 500 meaningful index names. By the way, this project sounds important enough that you'll be thinking carefully about data backup and recovering from disaster or downtime, if you don't already have that covered thoroughly for your platform. So, your thinking about backup and recovery also may bear on your database design. For instance, do you want that big, fat transaction log that holds everything that happened in your database between the last backup and the failure - or do you have files of the last 7 days' GPS data that you can re-load as part of your disaster recovery process, so that it doesn't matter if the database is blown away except for the twice-weekly taken-off-site backup?
Back to comp.databases.ms-sqlserver | Previous | Next — Previous in thread | Next in thread | Find similar
table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-13 14:47 -0700
Re: table design for massive GPS data bradbury9 <ray.bradbury9@gmail.com> - 2012-09-14 00:06 -0700
Re: table design for massive GPS data "Bob Barrows" <reb01501@NOSPAMyahoo.com> - 2012-09-14 07:33 -0400
Re: table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-14 10:13 -0700
Re: table design for massive GPS data rja.carnegie@gmail.com - 2012-09-14 05:05 -0700
Re: table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-14 10:25 -0700
Re: table design for massive GPS data Erland Sommarskog <esquel@sommarskog.se> - 2012-09-15 19:21 +0200
Re: table design for massive GPS data rja.carnegie@gmail.com - 2012-09-17 08:46 -0700
csiph-web