Groups > comp.databases.ms-sqlserver > #1286

Re: table design for massive GPS data

Newsgroups	comp.databases.ms-sqlserver
Date	2012-09-14 05:05 -0700
References	<37389863-6d11-4092-9f04-2e27dbccbd2b@googlegroups.com>
Message-ID	<aa20c002-6e6e-4f36-b678-d4c2c736dada@googlegroups.com> (permalink)
Subject	Re: table design for massive GPS data
From	rja.carnegie@gmail.com

Show all headers | View raw

Off the top of a semi-qualified head:

If the table is indexed well, then query performance will depend on the
volume of data in a query, but not so much on the size of a table and the
rows that you aren't looking at.

If the typical query is all the records for one vehicle on one calendar day
(one working day), then a primary key of date, vehicle id, time, looks
possible.  If you want to count work after midnight for the previous day, 
that could be messier - you could think about using datetimeoffset.

I expect such a design also to be efficient when you are deleting 
records that are 5 years old every day.

If you use datetime order, then vehicle, then the server has to scan 
records of 1000 vehicles to get the data that you actually want to see.
On the other hand, if you want the query for 1000 vehicles, then
it's probably okay if they're mixed together.

If data arrives in real time, then a clustered index of date, vehicle, time, 
may slow down updates, while favouring the query.  Usually you want it 
that way around.  The clustered index makes SQL Server physically store
neighbouring records together.  So you'll have 1000 distinct points in 
the data file where each vehicle's records are being added.  You'd also 
see data filling pages that are then split into two, a lot.  But I expect
the server still to perform well in that case.  There also may be an
implication of the growth of the transaction log, but if data is inserted
one row at a time then it won't make much of a difference - the log ray be 
pretty big whatever you do.

Your primary key is of course unique and doesn't allow NULL and by 
default is a clustered index, but none of this is necessary, 
including even having a primary key as such.  Indexes that aren't 
the primary key will work just as well.  The clustered index does 
matter; it's also desirable for its key to be short, since it's
incorporated into any nonclustered indexes, so if there isn't a need 
to use the clustered index to reorganise the data then a clustered 
index on an IDENTITY(int, 1, 1) column can be pretty good.  
/Then/ you set your nonclustered primary key (which does have 
to be unique and not NULL).  Including a non-meaningful IDENTITY 
in a primary key wouldn't be right, but it's right if it is, 
in fact, the key of your table in relationships with other data.

The point is, if your data is not unique in the date, time, and 
vehicle ID fields, then don't get sidetracked into worrying about
a primary key that you can't have.  Alternatively, add 1 second 
to one of the duplicate key values, and it'll be unique.

Primary key also is a constraint that must have a unique object name
(PK_nameoftable probably), whereas you can and I do name other indexes 
IX_1, IX_2, IX_3, on every table in your database.  This doesn't 
help other developers understand the database, but other developers 
don't often help me, so if they want to know what my indexes are,
they can look 'em up.  I probably have 500 nearly identical tables
created at the same time.  (Arguably I should have one table with 
500 times as many rows and a couple of extra columns, but that isn't
how we do things here.)  So I don't want to invent 500 meaningful 
index names.

By the way, this project sounds important enough that you'll be 
thinking carefully about data backup and recovering from disaster
or downtime, if you don't already have that covered thoroughly 
for your platform.  So, your thinking about backup and recovery
also may bear on your database design.  For instance, do you 
want that big, fat transaction log that holds everything that 
happened in your database between the last backup and the failure -
or do you have files of the last 7 days' GPS data that you can 
re-load as part of your disaster recovery process, so that it 
doesn't matter if the database is blown away except for the
twice-weekly taken-off-site backup?

Back to comp.databases.ms-sqlserver | Previous | Next — Previous in thread | Next in thread | Find similar

Thread

table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-13 14:47 -0700
  Re: table design for massive GPS data bradbury9 <ray.bradbury9@gmail.com> - 2012-09-14 00:06 -0700
  Re: table design for massive GPS data "Bob Barrows" <reb01501@NOSPAMyahoo.com> - 2012-09-14 07:33 -0400
    Re: table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-14 10:13 -0700
  Re: table design for massive GPS data rja.carnegie@gmail.com - 2012-09-14 05:05 -0700
    Re: table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-14 10:25 -0700
  Re: table design for massive GPS data Erland Sommarskog <esquel@sommarskog.se> - 2012-09-15 19:21 +0200
    Re: table design for massive GPS data rja.carnegie@gmail.com - 2012-09-17 08:46 -0700

csiph-web