Groups > comp.databases.ms-sqlserver > #1288

Re: table design for massive GPS data

Newsgroups	comp.databases.ms-sqlserver
Date	2012-09-14 10:25 -0700
References	<37389863-6d11-4092-9f04-2e27dbccbd2b@googlegroups.com> <aa20c002-6e6e-4f36-b678-d4c2c736dada@googlegroups.com>
Message-ID	<86717762-58b1-4a3a-a68d-2177c6047df7@googlegroups.com> (permalink)
Subject	Re: table design for massive GPS data
From	migurus <migurus@yahoo.com>

Show all headers | View raw

On Friday, September 14, 2012 5:05:05 AM UTC-7, rja.ca...@gmail.com wrote:
> Off the top of a semi-qualified head:
> 
> 
> 
> If the table is indexed well, then query performance will depend on the
> 
> volume of data in a query, but not so much on the size of a table and the
> 
> rows that you aren't looking at.
> 
> 

Thanks, I had an impression that hundreds of millions of small records is not a sound design.

> 
> If the typical query is all the records for one vehicle on one calendar day
> 
> (one working day), then a primary key of date, vehicle id, time, looks
> 
> possible.  If you want to count work after midnight for the previous day, 
> 
> that could be messier - you could think about using datetimeoffset.
> 
> 
> 
> I expect such a design also to be efficient when you are deleting 
> 
> records that are 5 years old every day.
> 
> 
> 
> If you use datetime order, then vehicle, then the server has to scan 
> 
> records of 1000 vehicles to get the data that you actually want to see.
> 
> On the other hand, if you want the query for 1000 vehicles, then
> 
> it's probably okay if they're mixed together.
> 

One vehicle at a time queries required.

> 
> 
> If data arrives in real time, then a clustered index of date, vehicle, time, 
> 
> may slow down updates, while favouring the query.  Usually you want it 
> 
> that way around.  The clustered index makes SQL Server physically store
> 
> neighbouring records together.  So you'll have 1000 distinct points in 
> 
> the data file where each vehicle's records are being added.  You'd also 
> 
> see data filling pages that are then split into two, a lot.  But I expect
> 
> the server still to perform well in that case.  There also may be an
> 
> implication of the growth of the transaction log, but if data is inserted
> 
> one row at a time then it won't make much of a difference - the log ray be 
> 
> pretty big whatever you do.
> 

There is no updates, as I mentioned. Inserts only.
I love your idea of building three part index: date, vehicle_id, time.
This is right to the point and not a big overhead space wise.

> 
> 
> Your primary key is of course unique and doesn't allow NULL and by 
> 
> default is a clustered index, but none of this is necessary, 
> 
> including even having a primary key as such.  Indexes that aren't 
> 
> the primary key will work just as well.  The clustered index does 
> 
> matter; it's also desirable for its key to be short, since it's
> 
> incorporated into any nonclustered indexes, so if there isn't a need 
> 
> to use the clustered index to reorganise the data then a clustered 
> 
> index on an IDENTITY(int, 1, 1) column can be pretty good.  
> 
> /Then/ you set your nonclustered primary key (which does have 
> 
> to be unique and not NULL).  Including a non-meaningful IDENTITY 
> 
> in a primary key wouldn't be right, but it's right if it is, 
> 
> in fact, the key of your table in relationships with other data.
> 
> 
> 
> The point is, if your data is not unique in the date, time, and 
> 
> vehicle ID fields, then don't get sidetracked into worrying about
> 
> a primary key that you can't have.  Alternatively, add 1 second 
> 
> to one of the duplicate key values, and it'll be unique.
> 
> 
> 
> Primary key also is a constraint that must have a unique object name
> 
> (PK_nameoftable probably), whereas you can and I do name other indexes 
> 
> IX_1, IX_2, IX_3, on every table in your database.  This doesn't 
> 
> help other developers understand the database, but other developers 
> 
> don't often help me, so if they want to know what my indexes are,
> 
> they can look 'em up.  I probably have 500 nearly identical tables
> 
> created at the same time.  (Arguably I should have one table with 
> 
> 500 times as many rows and a couple of extra columns, but that isn't
> 
> how we do things here.)  So I don't want to invent 500 meaningful 
> 
> index names.
> 
> 
> 
> By the way, this project sounds important enough that you'll be 
> 
> thinking carefully about data backup and recovering from disaster
> 
> or downtime, if you don't already have that covered thoroughly 
> 
> for your platform.  So, your thinking about backup and recovery
> 
> also may bear on your database design.  For instance, do you 
> 
> want that big, fat transaction log that holds everything that 
> 
> happened in your database between the last backup and the failure -
> 
> or do you have files of the last 7 days' GPS data that you can 
> 
> re-load as part of your disaster recovery process, so that it 
> 
> doesn't matter if the database is blown away except for the
> 
> twice-weekly taken-off-site backup?

I was thinking about an hourly log shipping and daily or weekly full backups. 
Would really appreciate a comment on this idea.

Thanks for your thoughts.

Back to comp.databases.ms-sqlserver | Previous | Next — Previous in thread | Next in thread | Find similar

Thread

table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-13 14:47 -0700
  Re: table design for massive GPS data bradbury9 <ray.bradbury9@gmail.com> - 2012-09-14 00:06 -0700
  Re: table design for massive GPS data "Bob Barrows" <reb01501@NOSPAMyahoo.com> - 2012-09-14 07:33 -0400
    Re: table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-14 10:13 -0700
  Re: table design for massive GPS data rja.carnegie@gmail.com - 2012-09-14 05:05 -0700
    Re: table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-14 10:25 -0700
  Re: table design for massive GPS data Erland Sommarskog <esquel@sommarskog.se> - 2012-09-15 19:21 +0200
    Re: table design for massive GPS data rja.carnegie@gmail.com - 2012-09-17 08:46 -0700

csiph-web