Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.databases.ms-sqlserver > #1288
| Newsgroups | comp.databases.ms-sqlserver |
|---|---|
| Date | 2012-09-14 10:25 -0700 |
| References | <37389863-6d11-4092-9f04-2e27dbccbd2b@googlegroups.com> <aa20c002-6e6e-4f36-b678-d4c2c736dada@googlegroups.com> |
| Message-ID | <86717762-58b1-4a3a-a68d-2177c6047df7@googlegroups.com> (permalink) |
| Subject | Re: table design for massive GPS data |
| From | migurus <migurus@yahoo.com> |
On Friday, September 14, 2012 5:05:05 AM UTC-7, rja.ca...@gmail.com wrote: > Off the top of a semi-qualified head: > > > > If the table is indexed well, then query performance will depend on the > > volume of data in a query, but not so much on the size of a table and the > > rows that you aren't looking at. > > Thanks, I had an impression that hundreds of millions of small records is not a sound design. > > If the typical query is all the records for one vehicle on one calendar day > > (one working day), then a primary key of date, vehicle id, time, looks > > possible. If you want to count work after midnight for the previous day, > > that could be messier - you could think about using datetimeoffset. > > > > I expect such a design also to be efficient when you are deleting > > records that are 5 years old every day. > > > > If you use datetime order, then vehicle, then the server has to scan > > records of 1000 vehicles to get the data that you actually want to see. > > On the other hand, if you want the query for 1000 vehicles, then > > it's probably okay if they're mixed together. > One vehicle at a time queries required. > > > If data arrives in real time, then a clustered index of date, vehicle, time, > > may slow down updates, while favouring the query. Usually you want it > > that way around. The clustered index makes SQL Server physically store > > neighbouring records together. So you'll have 1000 distinct points in > > the data file where each vehicle's records are being added. You'd also > > see data filling pages that are then split into two, a lot. But I expect > > the server still to perform well in that case. There also may be an > > implication of the growth of the transaction log, but if data is inserted > > one row at a time then it won't make much of a difference - the log ray be > > pretty big whatever you do. > There is no updates, as I mentioned. Inserts only. I love your idea of building three part index: date, vehicle_id, time. This is right to the point and not a big overhead space wise. > > > Your primary key is of course unique and doesn't allow NULL and by > > default is a clustered index, but none of this is necessary, > > including even having a primary key as such. Indexes that aren't > > the primary key will work just as well. The clustered index does > > matter; it's also desirable for its key to be short, since it's > > incorporated into any nonclustered indexes, so if there isn't a need > > to use the clustered index to reorganise the data then a clustered > > index on an IDENTITY(int, 1, 1) column can be pretty good. > > /Then/ you set your nonclustered primary key (which does have > > to be unique and not NULL). Including a non-meaningful IDENTITY > > in a primary key wouldn't be right, but it's right if it is, > > in fact, the key of your table in relationships with other data. > > > > The point is, if your data is not unique in the date, time, and > > vehicle ID fields, then don't get sidetracked into worrying about > > a primary key that you can't have. Alternatively, add 1 second > > to one of the duplicate key values, and it'll be unique. > > > > Primary key also is a constraint that must have a unique object name > > (PK_nameoftable probably), whereas you can and I do name other indexes > > IX_1, IX_2, IX_3, on every table in your database. This doesn't > > help other developers understand the database, but other developers > > don't often help me, so if they want to know what my indexes are, > > they can look 'em up. I probably have 500 nearly identical tables > > created at the same time. (Arguably I should have one table with > > 500 times as many rows and a couple of extra columns, but that isn't > > how we do things here.) So I don't want to invent 500 meaningful > > index names. > > > > By the way, this project sounds important enough that you'll be > > thinking carefully about data backup and recovering from disaster > > or downtime, if you don't already have that covered thoroughly > > for your platform. So, your thinking about backup and recovery > > also may bear on your database design. For instance, do you > > want that big, fat transaction log that holds everything that > > happened in your database between the last backup and the failure - > > or do you have files of the last 7 days' GPS data that you can > > re-load as part of your disaster recovery process, so that it > > doesn't matter if the database is blown away except for the > > twice-weekly taken-off-site backup? I was thinking about an hourly log shipping and daily or weekly full backups. Would really appreciate a comment on this idea. Thanks for your thoughts.
Back to comp.databases.ms-sqlserver | Previous | Next — Previous in thread | Next in thread | Find similar
table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-13 14:47 -0700
Re: table design for massive GPS data bradbury9 <ray.bradbury9@gmail.com> - 2012-09-14 00:06 -0700
Re: table design for massive GPS data "Bob Barrows" <reb01501@NOSPAMyahoo.com> - 2012-09-14 07:33 -0400
Re: table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-14 10:13 -0700
Re: table design for massive GPS data rja.carnegie@gmail.com - 2012-09-14 05:05 -0700
Re: table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-14 10:25 -0700
Re: table design for massive GPS data Erland Sommarskog <esquel@sommarskog.se> - 2012-09-15 19:21 +0200
Re: table design for massive GPS data rja.carnegie@gmail.com - 2012-09-17 08:46 -0700
csiph-web