Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.databases.ms-sqlserver > #1283 > unrolled thread

table design for massive GPS data

Started bymigurus <migurus@yahoo.com>
First post2012-09-13 14:47 -0700
Last post2012-09-17 08:46 -0700
Articles 8 — 5 participants

Back to article view | Back to comp.databases.ms-sqlserver


Contents

  table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-13 14:47 -0700
    Re: table design for massive GPS data bradbury9 <ray.bradbury9@gmail.com> - 2012-09-14 00:06 -0700
    Re: table design for massive GPS data "Bob Barrows" <reb01501@NOSPAMyahoo.com> - 2012-09-14 07:33 -0400
      Re: table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-14 10:13 -0700
    Re: table design for massive GPS data rja.carnegie@gmail.com - 2012-09-14 05:05 -0700
      Re: table design for massive GPS data migurus <migurus@yahoo.com> - 2012-09-14 10:25 -0700
    Re: table design for massive GPS data Erland Sommarskog <esquel@sommarskog.se> - 2012-09-15 19:21 +0200
      Re: table design for massive GPS data rja.carnegie@gmail.com - 2012-09-17 08:46 -0700

#1283 — table design for massive GPS data

Frommigurus <migurus@yahoo.com>
Date2012-09-13 14:47 -0700
Subjecttable design for massive GPS data
Message-ID<37389863-6d11-4092-9f04-2e27dbccbd2b@googlegroups.com>
Windows Server/SQL Server 2008, we need to store GPS readings from roughly 1000 vehicles, ~ 300 readings per vehicle per day. The data comes in in real-time. Records are never updated after insert. That gives us roughly 120 million records per year. We need to keep last 5 years of records. 

The usage of stored data is to query vehicle positions and event attributes within entire day by given vehicle and date. Geospatial features and functions are not really needed for the reporting.

I'd like to discuss table design for these requirements.

Here is a preliminary record content:
- vehicle id
- gps reading X and Y
- gps reading time
- one-letter event attribute
- event id (nullable)

First idea is to keep data normalized in one table, one record per GPS reading. It looks very simple to implement, but I doubt the server will be happy with this high number of small records. I we were to do this, what would be a good candidate for primary key here?
Just for illustration I show table def for idea# 1
create table idea1 (
	VEHICLE_ID	int	not null
,	GPS_X		real	not null
,	GPS_Y		real	not null
,	GPS_TIME	datetime not null
,	EVENT_LETTER	char	not null
,	EVENT_ID	int	null
);
I don't show indexes as I am not clear what to use.
The typical query would look like this:
select
	GPS_X	
,	GPS_Y	
,	GPS_TIME
,	EVENT_LETTER
,	EVENT_ID
from
	idea1
where
	VEHICLE_ID=@id
and	GPS_TIME between @from and @to

	
Second idea is to keep a small table where records are inserted as they come in the structure shown above and then on the daily basis aggregate it and store as one record per vehicle in the another table, which will be used for queries. By doing this we shrink number of records from 120 million to be ~ 350 thousands per year. This looks much more manageable. The cons here are additional effort to come up with the way of storing and retrieving gps readings as a blob.
So, the table would look:
create table idea2 (
	VEHICLE_ID	int	not null
,	RUN_DATE	date	not null
,	GPS_READINGS	varbinary or whatever appropriate to keep daily portion of readings
-- we maight need reading_count as int here to help extract them from blob
);
And typical query would look like:
select
	GPS_READINGS
from
	idea2
where
	VEHICLE_ID=@id
and	RUN_DATE=@date
This result blob will need to be 'expanded', we might use sp to do it.

Any comments are welcome, critique is appreciated.
Thanks in advance.	
migurus.

[toc] | [next] | [standalone]


#1284

Frombradbury9 <ray.bradbury9@gmail.com>
Date2012-09-14 00:06 -0700
Message-ID<f14f137d-2075-48ae-bd86-04bb6c189d8f@googlegroups.com>
In reply to#1283
El jueves, 13 de septiembre de 2012 23:47:43 UTC+2, migurus  escribió:
> Windows Server/SQL Server 2008, we need to store GPS readings from roughly 1000 vehicles, ~ 300 readings per vehicle per day. The data comes in in real-time. Records are never updated after insert. That gives us roughly 120 million records per year. We need to keep last 5 years of records. 
> 
> 
> 
> The usage of stored data is to query vehicle positions and event attributes within entire day by given vehicle and date. Geospatial features and functions are not really needed for the reporting.
> 
> 
> 
> I'd like to discuss table design for these requirements.
> 
> 
> 
> Here is a preliminary record content:
> 
> - vehicle id
> 
> - gps reading X and Y
> 
> - gps reading time
> 
> - one-letter event attribute
> 
> - event id (nullable)
> 
> 
> 
> First idea is to keep data normalized in one table, one record per GPS reading. It looks very simple to implement, but I doubt the server will be happy with this high number of small records. I we were to do this, what would be a good candidate for primary key here?
> 
> Just for illustration I show table def for idea# 1
> 
> create table idea1 (
> 
> 	VEHICLE_ID	int	not null
> 
> ,	GPS_X		real	not null
> 
> ,	GPS_Y		real	not null
> 
> ,	GPS_TIME	datetime not null
> 
> ,	EVENT_LETTER	char	not null
> 
> ,	EVENT_ID	int	null
> 
> );
> 
> I don't show indexes as I am not clear what to use.
> 
> The typical query would look like this:
> 
> select
> 
> 	GPS_X	
> 
> ,	GPS_Y	
> 
> ,	GPS_TIME
> 
> ,	EVENT_LETTER
> 
> ,	EVENT_ID
> 
> from
> 
> 	idea1
> 
> where
> 
> 	VEHICLE_ID=@id
> 
> and	GPS_TIME between @from and @to
> 
> 
> 
> 	
> 
> Second idea is to keep a small table where records are inserted as they come in the structure shown above and then on the daily basis aggregate it and store as one record per vehicle in the another table, which will be used for queries. By doing this we shrink number of records from 120 million to be ~ 350 thousands per year. This looks much more manageable. The cons here are additional effort to come up with the way of storing and retrieving gps readings as a blob.
> 
> So, the table would look:
> 
> create table idea2 (
> 
> 	VEHICLE_ID	int	not null
> 
> ,	RUN_DATE	date	not null
> 
> ,	GPS_READINGS	varbinary or whatever appropriate to keep daily portion of readings
> 
> -- we maight need reading_count as int here to help extract them from blob
> 
> );
> 
> And typical query would look like:
> 
> select
> 
> 	GPS_READINGS
> 
> from
> 
> 	idea2
> 
> where
> 
> 	VEHICLE_ID=@id
> 
> and	RUN_DATE=@date
> 
> This result blob will need to be 'expanded', we might use sp to do it.
> 
> 
> 
> Any comments are welcome, critique is appreciated.
> 
> Thanks in advance.	
> 
> migurus.

120.000.000 record per year up to 5 years = 600.000.000. That is a pretty big database as you already said.

I wonder if it is not better creating an indexed view. That way you avoid creating 547.000 extra records that consume disk space.

BTW, make sure you run the SQL Server quey optimizer so the select used in the view has good indexes.

[toc] | [prev] | [next] | [standalone]


#1285

From"Bob Barrows" <reb01501@NOSPAMyahoo.com>
Date2012-09-14 07:33 -0400
Message-ID<k2v4pe$n3t$1@dont-email.me>
In reply to#1283
migurus wrote:
> Windows Server/SQL Server 2008, we need to store GPS readings from
> roughly 1000 vehicles, ~ 300 readings per vehicle per day. The data
> comes in in real-time. Records are never updated after insert. That
> gives us roughly 120 million records per year. We need to keep last 5
> years of records.
>
> The usage of stored data is to query vehicle positions and event
> attributes within entire day by given vehicle and date. Geospatial
> features and functions are not really needed for the reporting.
>
> I'd like to discuss table design for these requirements.
>
> Here is a preliminary record content:
> - vehicle id
> - gps reading X and Y
> - gps reading time
> - one-letter event attribute
> - event id (nullable)
>
> First idea is to keep data normalized in one table, one record per
> GPS reading. It looks very simple to implement, but I doubt the
> server will be happy with this high number of small records.

Why not? Narrow and long tables are perfect for relational databases

> I we
> were to do this, what would be a good candidate for primary key here?

What combination of columns identifies a row? Vehicle_ID and GPS_TIME? If 
so, there's your clustered primary key (wtp?)
What is that event id and why is it nullable?


> Just for illustration I show table def for idea# 1
> create table idea1 (
> VEHICLE_ID int not null
> , GPS_X real not null
> , GPS_Y real not null
> , GPS_TIME datetime not null
> , EVENT_LETTER char not null
> , EVENT_ID int null
> );
> I don't show indexes as I am not clear what to use.
> The typical query would look like this:
> select
> GPS_X
> , GPS_Y
> , GPS_TIME
> , EVENT_LETTER
> , EVENT_ID
> from
> idea1
> where
> VEHICLE_ID=@id
> and GPS_TIME between @from and @to

The proposed clustered primary key above would suit this query perfectly

>
>
> Second idea is to keep a small table where records are inserted as
> they come in the structure shown above and then on the daily basis
> aggregate it and store as one record per vehicle in the another
> table, which will be used for queries. By doing this we shrink number
> of records from 120 million to be ~ 350 thousands per year. This
> looks much more manageable. The cons here are additional effort to
> come up with the way of storing and retrieving gps readings as a
> blob.
>
Nah, this data isn't really suitable for aggregation, IMO. 

[toc] | [prev] | [next] | [standalone]


#1287

Frommigurus <migurus@yahoo.com>
Date2012-09-14 10:13 -0700
Message-ID<316dd77d-8a3b-4a09-9477-56fd928d6008@googlegroups.com>
In reply to#1285
On Friday, September 14, 2012 4:35:11 AM UTC-7, Bob Barrows wrote:
> migurus wrote:
> 
> > Windows Server/SQL Server 2008, we need to store GPS readings from
> 
> > roughly 1000 vehicles, ~ 300 readings per vehicle per day. The data
> 
> > comes in in real-time. Records are never updated after insert. That
> 
> > gives us roughly 120 million records per year. We need to keep last 5
> 
> > years of records.
> 
> >
> 
> > The usage of stored data is to query vehicle positions and event
> 
> > attributes within entire day by given vehicle and date. Geospatial
> 
> > features and functions are not really needed for the reporting.
> 
> >
> 
> > I'd like to discuss table design for these requirements.
> 
> >
> 
> > Here is a preliminary record content:
> 
> > - vehicle id
> 
> > - gps reading X and Y
> 
> > - gps reading time
> 
> > - one-letter event attribute
> 
> > - event id (nullable)
> 
> >
> 
> > First idea is to keep data normalized in one table, one record per
> 
> > GPS reading. It looks very simple to implement, but I doubt the
> 
> > server will be happy with this high number of small records.
> 
> 
> 
> Why not? Narrow and long tables are perfect for relational databases
> 
> 

Thanks, I did not know that, as I have limited exposure in db field.

> 
> > I we
> 
> > were to do this, what would be a good candidate for primary key here?
> 
> 
> 
> What combination of columns identifies a row? Vehicle_ID and GPS_TIME? If 
> 
> so, there's your clustered primary key (wtp?)
> 
> What is that event id and why is it nullable?

Some events do not require id, this will not be any foreign key or anything

> 
> 
> 
> 
> 
> > Just for illustration I show table def for idea# 1
> 
> > create table idea1 (
> 
> > VEHICLE_ID int not null
> 
> > , GPS_X real not null
> 
> > , GPS_Y real not null
> 
> > , GPS_TIME datetime not null
> 
> > , EVENT_LETTER char not null
> 
> > , EVENT_ID int null
> 
> > );
> 
> > I don't show indexes as I am not clear what to use.
> 
> > The typical query would look like this:
> 
> > select
> 
> > GPS_X
> 
> > , GPS_Y
> 
> > , GPS_TIME
> 
> > , EVENT_LETTER
> 
> > , EVENT_ID
> 
> > from
> 
> > idea1
> 
> > where
> 
> > VEHICLE_ID=@id
> 
> > and GPS_TIME between @from and @to
> 
> 
> 
> The proposed clustered primary key above would suit this query perfectly
> 
> 
> 
> >
> 
> >
> 
> > Second idea is to keep a small table where records are inserted as
> 
> > they come in the structure shown above and then on the daily basis
> 
> > aggregate it and store as one record per vehicle in the another
> 
> > table, which will be used for queries. By doing this we shrink number
> 
> > of records from 120 million to be ~ 350 thousands per year. This
> 
> > looks much more manageable. The cons here are additional effort to
> 
> > come up with the way of storing and retrieving gps readings as a
> 
> > blob.
> 
> >
> 
> Nah, this data isn't really suitable for aggregation, IMO.

Thanks for your input.

[toc] | [prev] | [next] | [standalone]


#1286

Fromrja.carnegie@gmail.com
Date2012-09-14 05:05 -0700
Message-ID<aa20c002-6e6e-4f36-b678-d4c2c736dada@googlegroups.com>
In reply to#1283
Off the top of a semi-qualified head:

If the table is indexed well, then query performance will depend on the
volume of data in a query, but not so much on the size of a table and the
rows that you aren't looking at.

If the typical query is all the records for one vehicle on one calendar day
(one working day), then a primary key of date, vehicle id, time, looks
possible.  If you want to count work after midnight for the previous day, 
that could be messier - you could think about using datetimeoffset.

I expect such a design also to be efficient when you are deleting 
records that are 5 years old every day.

If you use datetime order, then vehicle, then the server has to scan 
records of 1000 vehicles to get the data that you actually want to see.
On the other hand, if you want the query for 1000 vehicles, then
it's probably okay if they're mixed together.

If data arrives in real time, then a clustered index of date, vehicle, time, 
may slow down updates, while favouring the query.  Usually you want it 
that way around.  The clustered index makes SQL Server physically store
neighbouring records together.  So you'll have 1000 distinct points in 
the data file where each vehicle's records are being added.  You'd also 
see data filling pages that are then split into two, a lot.  But I expect
the server still to perform well in that case.  There also may be an
implication of the growth of the transaction log, but if data is inserted
one row at a time then it won't make much of a difference - the log ray be 
pretty big whatever you do.

Your primary key is of course unique and doesn't allow NULL and by 
default is a clustered index, but none of this is necessary, 
including even having a primary key as such.  Indexes that aren't 
the primary key will work just as well.  The clustered index does 
matter; it's also desirable for its key to be short, since it's
incorporated into any nonclustered indexes, so if there isn't a need 
to use the clustered index to reorganise the data then a clustered 
index on an IDENTITY(int, 1, 1) column can be pretty good.  
/Then/ you set your nonclustered primary key (which does have 
to be unique and not NULL).  Including a non-meaningful IDENTITY 
in a primary key wouldn't be right, but it's right if it is, 
in fact, the key of your table in relationships with other data.

The point is, if your data is not unique in the date, time, and 
vehicle ID fields, then don't get sidetracked into worrying about
a primary key that you can't have.  Alternatively, add 1 second 
to one of the duplicate key values, and it'll be unique.

Primary key also is a constraint that must have a unique object name
(PK_nameoftable probably), whereas you can and I do name other indexes 
IX_1, IX_2, IX_3, on every table in your database.  This doesn't 
help other developers understand the database, but other developers 
don't often help me, so if they want to know what my indexes are,
they can look 'em up.  I probably have 500 nearly identical tables
created at the same time.  (Arguably I should have one table with 
500 times as many rows and a couple of extra columns, but that isn't
how we do things here.)  So I don't want to invent 500 meaningful 
index names.

By the way, this project sounds important enough that you'll be 
thinking carefully about data backup and recovering from disaster
or downtime, if you don't already have that covered thoroughly 
for your platform.  So, your thinking about backup and recovery
also may bear on your database design.  For instance, do you 
want that big, fat transaction log that holds everything that 
happened in your database between the last backup and the failure -
or do you have files of the last 7 days' GPS data that you can 
re-load as part of your disaster recovery process, so that it 
doesn't matter if the database is blown away except for the
twice-weekly taken-off-site backup?

[toc] | [prev] | [next] | [standalone]


#1288

Frommigurus <migurus@yahoo.com>
Date2012-09-14 10:25 -0700
Message-ID<86717762-58b1-4a3a-a68d-2177c6047df7@googlegroups.com>
In reply to#1286
On Friday, September 14, 2012 5:05:05 AM UTC-7, rja.ca...@gmail.com wrote:
> Off the top of a semi-qualified head:
> 
> 
> 
> If the table is indexed well, then query performance will depend on the
> 
> volume of data in a query, but not so much on the size of a table and the
> 
> rows that you aren't looking at.
> 
> 

Thanks, I had an impression that hundreds of millions of small records is not a sound design.

> 
> If the typical query is all the records for one vehicle on one calendar day
> 
> (one working day), then a primary key of date, vehicle id, time, looks
> 
> possible.  If you want to count work after midnight for the previous day, 
> 
> that could be messier - you could think about using datetimeoffset.
> 
> 
> 
> I expect such a design also to be efficient when you are deleting 
> 
> records that are 5 years old every day.
> 
> 
> 
> If you use datetime order, then vehicle, then the server has to scan 
> 
> records of 1000 vehicles to get the data that you actually want to see.
> 
> On the other hand, if you want the query for 1000 vehicles, then
> 
> it's probably okay if they're mixed together.
> 

One vehicle at a time queries required.

> 
> 
> If data arrives in real time, then a clustered index of date, vehicle, time, 
> 
> may slow down updates, while favouring the query.  Usually you want it 
> 
> that way around.  The clustered index makes SQL Server physically store
> 
> neighbouring records together.  So you'll have 1000 distinct points in 
> 
> the data file where each vehicle's records are being added.  You'd also 
> 
> see data filling pages that are then split into two, a lot.  But I expect
> 
> the server still to perform well in that case.  There also may be an
> 
> implication of the growth of the transaction log, but if data is inserted
> 
> one row at a time then it won't make much of a difference - the log ray be 
> 
> pretty big whatever you do.
> 

There is no updates, as I mentioned. Inserts only.
I love your idea of building three part index: date, vehicle_id, time.
This is right to the point and not a big overhead space wise.

> 
> 
> Your primary key is of course unique and doesn't allow NULL and by 
> 
> default is a clustered index, but none of this is necessary, 
> 
> including even having a primary key as such.  Indexes that aren't 
> 
> the primary key will work just as well.  The clustered index does 
> 
> matter; it's also desirable for its key to be short, since it's
> 
> incorporated into any nonclustered indexes, so if there isn't a need 
> 
> to use the clustered index to reorganise the data then a clustered 
> 
> index on an IDENTITY(int, 1, 1) column can be pretty good.  
> 
> /Then/ you set your nonclustered primary key (which does have 
> 
> to be unique and not NULL).  Including a non-meaningful IDENTITY 
> 
> in a primary key wouldn't be right, but it's right if it is, 
> 
> in fact, the key of your table in relationships with other data.
> 
> 
> 
> The point is, if your data is not unique in the date, time, and 
> 
> vehicle ID fields, then don't get sidetracked into worrying about
> 
> a primary key that you can't have.  Alternatively, add 1 second 
> 
> to one of the duplicate key values, and it'll be unique.
> 
> 
> 
> Primary key also is a constraint that must have a unique object name
> 
> (PK_nameoftable probably), whereas you can and I do name other indexes 
> 
> IX_1, IX_2, IX_3, on every table in your database.  This doesn't 
> 
> help other developers understand the database, but other developers 
> 
> don't often help me, so if they want to know what my indexes are,
> 
> they can look 'em up.  I probably have 500 nearly identical tables
> 
> created at the same time.  (Arguably I should have one table with 
> 
> 500 times as many rows and a couple of extra columns, but that isn't
> 
> how we do things here.)  So I don't want to invent 500 meaningful 
> 
> index names.
> 
> 
> 
> By the way, this project sounds important enough that you'll be 
> 
> thinking carefully about data backup and recovering from disaster
> 
> or downtime, if you don't already have that covered thoroughly 
> 
> for your platform.  So, your thinking about backup and recovery
> 
> also may bear on your database design.  For instance, do you 
> 
> want that big, fat transaction log that holds everything that 
> 
> happened in your database between the last backup and the failure -
> 
> or do you have files of the last 7 days' GPS data that you can 
> 
> re-load as part of your disaster recovery process, so that it 
> 
> doesn't matter if the database is blown away except for the
> 
> twice-weekly taken-off-site backup?

I was thinking about an hourly log shipping and daily or weekly full backups. 
Would really appreciate a comment on this idea.

Thanks for your thoughts.

[toc] | [prev] | [next] | [standalone]


#1289

FromErland Sommarskog <esquel@sommarskog.se>
Date2012-09-15 19:21 +0200
Message-ID<XnsA0CFC4E06222AYazorman@127.0.0.1>
In reply to#1283
migurus (migurus@yahoo.com) writes:
> First idea is to keep data normalized in one table, one record per GPS
> reading. It looks very simple to implement, but I doubt the server will
> be happy with this high number of small records. 

Why not? As long it is properly indexed, I don't see any problem.

What you should consider though, is to partition the table (which 
requires Enterprise Edition). When five years has passed, and you 
want to purge old data, this is a swift affair with partitioning, 
provided that your partition aligns with what you want to purge.

> I we were to do this, what would be a good candidate for primary key here?

(VEHICLE_ID, GPS_TIME). Although, you should probably have GPS_TIME as the 
first column in the clustered index to avoid fragmentation. GPS_TIME would 
also be your partitioning column.


> So, the table would look:
> create table idea2 (
>      VEHICLE_ID     int     not null
> ,     RUN_DATE     date     not null
> ,     GPS_READINGS     varbinary or whatever appropriate to keep daily 
> portion of readings

You may save some space with this, because you save row overhead. But you 
get a much more complex solution which is more difficult to understand and
maintain. And if the neeeds changes in the future and people want to run
queries like "which vehicles have been in that area at some time?", that 
is hopeless with the above. I strongly recommend against it.


-- 
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx

[toc] | [prev] | [next] | [standalone]


#1290

Fromrja.carnegie@gmail.com
Date2012-09-17 08:46 -0700
Message-ID<c7f8f434-150c-4b5b-b223-9c4fe5431912@googlegroups.com>
In reply to#1289
On Saturday, September 15, 2012 6:21:13 PM UTC+1, Erland Sommarskog wrote:
> migurus (migurus@yahoo.com) writes: 
> > First idea is to keep data normalized in one table, one record per
> > GPS reading. It looks very simple to implement, but I doubt the 
> > server will be happy with this high number of small records.
> 
> Why not? As long it is properly indexed, I don't see any problem.
> What you should consider though, is to partition the table (which 
> requires Enterprise Edition). When five years has passed, and you
> want to purge old data, this is a swift affair with partitioning, 
> provided that your partition aligns with what you want to purge.

Partitioning might be not a great boost to performance, depending on 
how it's done.  With methods possibly out of date or wrong, I estimate 
the original design row size is 23 bytes, counting one byte to map 
NULL columns (for up to 8 table columns whether nullable or not),
and an overhead of 11 bytes per table row, although I don't remember
how I worked that out - maybe by trying it out.  Then I think that 
one day's new data is a little less than 10 megabytes, and five years
is around 20 GB.  That's before indexing - which would probably be, 
well, somewhat smaller than the data.  If it's stupid to make tables
as large as that, I've been stupider.

300,000 transactions per day, however, I put at about 2 GB or more 
in the transaction log if it's fully logged, so keeping that under 
control might be an important part of the backup/maintenance plan.
On the other hand, since I'm assuming storing one 8 KB page in the 
log for storing each 34 bytes row, it's a lousy way to keep a backup 
version of the data as far as efficiency goes.  But efficient isn't
the same thing as effective.  If you can bear the inefficiency, it'll
work.  Whoops, I forgot that these additions will probably update 
the index as well, so, double that log requirement.

For deleting data /without/ taking table partitioning into account, 
you might remove one day's records each day after five years, but
that won't be logged so badly if the rows are grouped together by 
clustered index - you're probably okay even if they're ordered by
vehicle ID and /then/ date/time.

And then the physical storage of data rows will be merrily jumbled 
within the data file - fragmented somewhat as time goes on - 
but if you only want to query one day's and one vehicle's data
at a time, I think that should be fine.

If partitioning the table, instead, what size of partition would you 
make - one day, one month, one year - and how would you maintain that?
Can it be done automatically?

Backup should be whatever you need to have in order to restore 
as much as you need to restore, as promptly as you need to have it,
with depth of design in case you discover just at the wrong time 
that your backup tape or disc isn't any good.  There are some 
horror stories that it's good to read, but not at bedtime.
So, what you should set up depends on the circumstances, and
the budget.  You may need to explain some of the horror stories
to whoever pays for stuff.

I might include separately storing at least a day's worth of GPS 
messages to re-load into your database from the external format, 
in case of (a) equipment failure and data loss and (b) finding out 
that you were mishandling and corrupting the messages for a while.
There are horror stories about that, too.  And if you can restore
yesterday's data in that way, then you may only need a daily backup 
of the entire database, just after midnight - which will save you 
from storing copies of the transaction log.

On the other hand, your recovery scenario may be, you have database
backups, but someone stole all of your computers...

Also, would it be bad if somebody stole your data... such as, 
to rob the vehicles.  Like, if the truck full of money drives 
the same route every Friday... sure, the bad guys could go out
and follow it anyway, but at least then they have to pay for gas.
I don't think you want to tell /us/ if you have trucks full 
of money.

So, anyway, you should stop that from happening.

[toc] | [prev] | [standalone]


Back to top | Article view | comp.databases.ms-sqlserver


csiph-web