Groups > comp.lang.forth > #14790 > unrolled thread

Implementing virtual memory on cassette tape

Started by	chitselb <chitselb@gmail.com>
First post	2012-08-07 06:21 -0700
Last post	2012-08-11 21:51 -1000
Articles	20 on this page of 62 — 17 participants


  Implementing virtual memory on cassette tape chitselb <chitselb@gmail.com> - 2012-08-07 06:21 -0700
    Re: Implementing virtual memory on cassette tape Andrew Haley <andrew29@littlepinkcloud.invalid> - 2012-08-07 08:44 -0500
    Re: Implementing virtual memory on cassette tape anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-08-07 14:01 +0000
    Re: Implementing virtual memory on cassette tape Mark Wills <markrobertwills@yahoo.co.uk> - 2012-08-07 07:24 -0700
    Re: Implementing virtual memory on cassette tape Stan Barr <plan.b@dsl.pipex.com> - 2012-08-07 15:30 +0000
      Re: Implementing virtual memory on cassette tape Stan Barr <plan.b@dsl.pipex.com> - 2012-08-07 17:36 +0000
    Re: Implementing virtual memory on cassette tape Jason Damisch <jasondamisch@yahoo.com> - 2012-08-07 11:52 -0700
      Re: Implementing virtual memory on cassette tape Paul Rubin <no.email@nospam.invalid> - 2012-08-07 12:39 -0700
        Re: Implementing virtual memory on cassette tape Paul Rubin <no.email@nospam.invalid> - 2012-08-07 12:55 -0700
          Re: Implementing virtual memory on cassette tape Bernd Paysan <bernd.paysan@gmx.de> - 2012-08-07 22:00 +0200
            Re: Implementing virtual memory on cassette tape Paul Rubin <no.email@nospam.invalid> - 2012-08-08 00:27 -0700
              Re: Implementing virtual memory on cassette tape Mark Wills <markrobertwills@yahoo.co.uk> - 2012-08-08 01:26 -0700
                Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-08 02:31 -0700
                Re: Implementing virtual memory on cassette tape Paul Rubin <no.email@nospam.invalid> - 2012-08-08 02:46 -0700
              Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-08 02:23 -0700
                Re: Implementing virtual memory on cassette tape anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-08-08 10:57 +0000
                  Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-08 04:59 -0700
                    Re: Implementing virtual memory on cassette tape anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-08-08 12:24 +0000
                      Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-08 11:10 -0700
                        Re: Implementing virtual memory on cassette tape Bernd Paysan <bernd.paysan@gmx.de> - 2012-08-09 00:13 +0200
                          Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-08 16:05 -0700
                            Re: Implementing virtual memory on cassette tape Paul Rubin <no.email@nospam.invalid> - 2012-08-08 17:30 -0700
                            Re: Implementing virtual memory on cassette tape Bernd Paysan <bernd.paysan@gmx.de> - 2012-08-09 03:26 +0200
                              Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-09 05:30 -0700
                                Re: Implementing virtual memory on cassette tape Bernd Paysan <bernd.paysan@gmx.de> - 2012-08-09 19:21 +0200
                                  Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-09 13:30 -0700
                                    Re: Implementing virtual memory on cassette tape Bernd Paysan <bernd.paysan@gmx.de> - 2012-08-10 01:27 +0200
                          Re: Implementing virtual memory on cassette tape vandys@vsta.org - 2012-08-09 00:32 +0000
                            Re: Implementing virtual memory on cassette tape Bernd Paysan <bernd.paysan@gmx.de> - 2012-08-09 03:33 +0200
                        Re: Implementing virtual memory on cassette tape anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-08-09 06:00 +0000
                          Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-09 05:26 -0700
                            Re: Implementing virtual memory on cassette tape anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-08-09 13:44 +0000
                              Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-09 10:21 -0700
                                Re: Implementing virtual memory on cassette tape Bernd Paysan <bernd.paysan@gmx.de> - 2012-08-09 19:50 +0200
                                  Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-09 12:32 -0700
                                    Re: Implementing virtual memory on cassette tape Bernd Paysan <bernd.paysan@gmx.de> - 2012-08-09 22:07 +0200
                                      Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-09 13:58 -0700
                                    Re: Implementing virtual memory on cassette tape Paul Rubin <no.email@nospam.invalid> - 2012-08-09 17:36 -0700
                                      Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-10 04:13 -0700
                                        Re: Implementing virtual memory on cassette tape Paul Rubin <no.email@nospam.invalid> - 2012-08-11 20:27 -0700
                                Re: Implementing virtual memory on cassette tape anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-08-10 15:57 +0000
                                  Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-13 05:23 -0700
                                    Re: Implementing virtual memory on cassette tape anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-08-15 15:13 +0000
                                      Re: Implementing virtual memory on cassette tape Alex McDonald <blog@rivadpm.com> - 2012-08-15 11:57 -0700
        Re: Implementing virtual memory on cassette tape anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-08-08 07:08 +0000
          Re: Implementing virtual memory on cassette tape chitselb <chitselb@gmail.com> - 2012-08-08 06:25 -0700
        Re: Implementing virtual memory on cassette tape Mark Wills <markrobertwills@yahoo.co.uk> - 2012-08-08 01:23 -0700
        Re: Implementing virtual memory on cassette tape kenney@cix.compulink.co.uk - 2012-08-08 05:06 -0500
    Re: Implementing virtual memory on cassette tape Percy <percival.andrews@gmail.com> - 2012-08-08 21:11 -0700
      Re: Implementing virtual memory on cassette tape chitselb <chitselb@gmail.com> - 2012-08-08 21:30 -0700
        Re: Implementing virtual memory on cassette tape percival.andrews@gmail.com - 2012-08-08 23:50 -0700
          Re: Implementing virtual memory on cassette tape chitselb <chitselb@gmail.com> - 2012-08-09 03:54 -0700
            Re: Implementing virtual memory on cassette tape Paul Rubin <no.email@nospam.invalid> - 2012-08-09 09:07 -0700
              Re: Implementing virtual memory on cassette tape chitselb <chitselb@gmail.com> - 2012-08-09 12:20 -0700
    Re: Implementing virtual memory on cassette tape Mat <dambere@web.de> - 2012-08-10 13:41 -0700
      Re: Implementing virtual memory on cassette tape Coos Haak <chforth@hccnet.nl> - 2012-08-10 23:54 +0200
        Re: Implementing virtual memory on cassette tape dambere@web.de - 2012-08-10 15:41 -0700
          Re: Implementing virtual memory on cassette tape Coos Haak <chforth@hccnet.nl> - 2012-08-11 01:47 +0200
      Re: Implementing virtual memory on cassette tape Andrew Haley <andrew29@littlepinkcloud.invalid> - 2012-08-11 03:50 -0500
        Re: Implementing virtual memory on cassette tape anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-08-11 09:03 +0000
          Re: Implementing virtual memory on cassette tape Andrew Haley <andrew29@littlepinkcloud.invalid> - 2012-08-11 16:08 -0500
      Re: Implementing virtual memory on cassette tape "Elizabeth D. Rather" <erather@forth.com> - 2012-08-11 21:51 -1000


#14866

From	Alex McDonald <blog@rivadpm.com>
Date	2012-08-08 16:05 -0700
Message-ID	<882dd387-f2ab-4d08-8b12-fde5a656157f@a9g2000vbn.googlegroups.com>
In reply to	#14865

On Aug 8, 11:13 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Alex McDonald wrote:
> > 3.5TB raw, 8TB with 2.5:1 compression on LTO6. You can't buy them at
> > your corner shop just quite yet, but they are available.
>
> I don't think it makes sense to compare compressed size with
> uncompressed hard disk size.  We have better compression algorithms, and
> for hard disk backups, we use those - and usually, the tape backup is
> already pre-compressed.  Tapes are filled with already compressed data,
> because you only copy hard disk backups to tape, and the harddisk backup
> is already compressed.

That's true, but it doesn't change the fundamentals; a 4TB disk still
contains 4TB of data, and a 3.2TB tape 3.2TB of data, compressed or
otherwise.

> AFAIK, LTO-6 is 2.5TB raw, and the usual status
> you get is "planned".  As a data center, I would not consider that as
> "available", even if there are low-quantity prototypes available for
> testing.  Available is the 1.5TB LTO-5, the price is a bit above $45.

HP's LTO6 is 3.2TB uncompressed, which is what the Ultrium consortium
indicated. I mistyped 3.2TB as 3.5TB above.

> About half the price of a similar-sized hard disk, as hard disk prices
> are still suffering a bit from the Thailand flood.

True. But you're quoting bog standard desktop drives at that price,
which I sincerely hope you aren't using in your backup servers.
Enterprise class drives are a good bit more expensive than that.

>
> > That's true. But for a handful of tapes or more, tape is hard to beat.
> > Moving your data center out of Vienna to a city better servicing
> > Auckland seems like a good idea too. Unless you value a good cup of
> > coffee more highly, in which case I would stay where you are.
>
> Wherever you move, there's a place on the other side of the world,
> which is more than 24h flight away - and for Vienna, the place on the
> other side of the world is Auckland.  The best I can get from Frankfurt
> is 28:25h via Dubai & Melbourne.  And that sort of flight has two slots
> per day or so, so add an average waiting time of 12h (customer calls you
> "I need the data *now*" in the middle of your night).
>
> Both A380 and 787 have a maximum range of ~15000km.  So you can't go
> nonstop to the other side of the world.

Perhaps I should have used a few smilies rather than references to the
crap coffee in NZ on this one; the point that tape has incredible
bandwidth per km seems to be getting lost in plane timetables...
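Here is a small Python sketch that puts a rough number on the "bandwidth per km" point. It assumes one 1.5TB LTO-5 tape, Bernd's 28h25m Frankfurt routing, and his 12h average wait. It also uses approximate coordinates for Vienna and Auckland, which are assumptions and do not come from the thread. The sketch checks that great-circle distance against the ~15000 km range quoted for the A380 and 787.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def courier_bandwidth(bytes_shipped, transit_s):
    """Effective bandwidth (bytes/s) of physically shipping media."""
    return bytes_shipped / transit_s


# Approximate city coordinates (assumed, not from the thread).
vienna = (48.21, 16.37)
auckland = (-36.85, 174.76)
distance = great_circle_km(*vienna, *auckland)
print(f"Vienna-Auckland: ~{distance:.0f} km (vs ~15000 km aircraft range)")

# One 1.5TB LTO-5 tape: 28h25m flight plus a 12h average wait for a slot.
transit = (28 * 3600 + 25 * 60) + 12 * 3600
bw = courier_bandwidth(1.5e12, transit)
print(f"~{bw / 1e6:.1f} MB/s per tape in the box")
```

Each extra tape in the box adds the same bandwidth again, which is Alex's point. The latency stays at more than a day regardless.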


>
> --
> Bernd Paysan
> "If you want it done right, you have to do it yourself"
> http://bernd-paysan.de/


#14869

From	Paul Rubin <no.email@nospam.invalid>
Date	2012-08-08 17:30 -0700
Message-ID	<7xa9y4hpoh.fsf@ruckus.brouhaha.com>
In reply to	#14866

Alex McDonald <blog@rivadpm.com> writes:
> HP's LTO6 is 3.2TB uncompressed, which is what the Ultrium consortium
> indicated. I mistyped 3.2TB as 3.5TB above.

It was decreased to 2.5TB:

   http://www.storagenewsletter.com/news/tapes/licensing-specs-august-2012

   "The new main specs of LTO-6 are below what was formerly announced by
   the LTO consortium. For uncompressed capacity and transfer rates, it
   was supposed to be 3.2TB and 210MB/s for LTO-6, it's now 2.5TB and
   160MB/s, or an increase of only 67% and 14% respectively, in
   comparison to 1.5TB and 140MB/s for LTO-5."
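This quick check of the percentages in the quoted article uses only the LTO-5 and revised LTO-6 figures it gives:

```python
def pct_increase(old, new):
    """Percentage increase from old to new."""
    return (new - old) / old * 100


# LTO-5 baseline and the revised LTO-6 figures from the article.
lto5 = {"capacity_tb": 1.5, "rate_mb_s": 140}
lto6 = {"capacity_tb": 2.5, "rate_mb_s": 160}

for key in lto5:
    print(key, f"+{pct_increase(lto5[key], lto6[key]):.0f}%")
```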


#14874

From	Bernd Paysan <bernd.paysan@gmx.de>
Date	2012-08-09 03:26 +0200
Message-ID	<5095812.K4UvBJtjZl@sunwukong.fritz.box>
In reply to	#14866

Alex McDonald wrote:
> True. But you're quoting bog standard desktop drives at that price,
> which I sincerely hope you aren't using in your backup servers.

Of course I do.  Backup is redundancy, not expensive disks; the
likelihood that a bog standard desktop drive fails is not that much
different from a snake-oil expensive SAS drive - the fundamental
construction of both drives is the same.  Backup is not something you
need extremely high bandwidth for; bog standard desktop drives are fine.
LTO-5 is 140MB/s, that's in the range of cheap 5400rpm desktop drives -
7200rpm drives are already in the 200MB/s range of LTO-6.
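Bernd's argument is that a backup target only has to keep up with sequential streaming. The following minimal sketch shows how long it takes to fill a medium at a sustained rate, using the tape figures quoted in this thread. The 3TB disk at 140MB/s is an assumed example, and real drives do not hold a constant rate across the whole medium.

```python
def hours_to_fill(capacity_tb, rate_mb_s):
    """Hours to stream capacity_tb (decimal TB) at a sustained rate_mb_s (decimal MB/s)."""
    return capacity_tb * 1e12 / (rate_mb_s * 1e6) / 3600


print(f"LTO-5 (1.5TB @ 140MB/s): {hours_to_fill(1.5, 140):.1f} h")
print(f"LTO-6 (2.5TB @ 160MB/s): {hours_to_fill(2.5, 160):.1f} h")
print(f"3TB disk (@ 140MB/s, assumed): {hours_to_fill(3, 140):.1f} h")
```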

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/


#14892

From	Alex McDonald <blog@rivadpm.com>
Date	2012-08-09 05:30 -0700
Message-ID	<25899bec-a795-4f1f-b84c-32f3da20b0ad@b10g2000vbj.googlegroups.com>
In reply to	#14874

On Aug 9, 2:26 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Alex McDonald wrote:
> > True. But you're quoting bog standard desktop drives at that price,
> > which I sincerely hope you aren't using in your backup servers.
>
> Of course I do.  Backup is redundancy, not expensive disks; the
> likelihood that a bog standard desktop drive fails is not that much
> different from a snake-oil expensive SAS drive - the fundamental
> construction of both drives is the same.

Yet they have different specs. Why?

> Backup is not something you
> need extremely high bandwidth for; bog standard desktop drives are fine.
> LTO-5 is 140MB/s, that's in the range of cheap 5400rpm desktop drives -
> 7200rpm drives are already in the 200MB/s range of LTO-6.

Backup needs reliability, which is where I take issue with your
assertion that desktop drives are "good enough". Your backup is not
redundancy once it is the only copy you have left to continue with.

>
> --
> Bernd Paysan
> "If you want it done right, you have to do it yourself"
> http://bernd-paysan.de/


#14895

From	Bernd Paysan <bernd.paysan@gmx.de>
Date	2012-08-09 19:21 +0200
Message-ID	<2845685.VmIIC3HCAb@sunwukong.fritz.box>
In reply to	#14892

Alex McDonald wrote:

> On Aug 9, 2:26 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
>> Alex McDonald wrote:
>> > True. But you're quoting bog standard desktop drives at that price,
>> > which I sincerely hope you aren't using in your backup servers.
>>
>> Of course I do.  Backup is redundancy, not expensive disks; the
>> likelihood that a bog standard desktop drive fails is not that much
>> different from a snake-oil expensive SAS drive - the fundamental
>> construction of both drives is the same.
> 
> Yet they have different specs. Why?

To get the money of idiots who believe that expensive is better.  And
because the SAS controller is low volume, while the SATA controller is 
high volume.  SAS drives have faster spindle speeds and shorter access 
times, which is completely useless in this case: All we want is 
reasonable speed to fill the backup disk with files.

And BTW: You can buy disks with similar specs for both SATA and SAS,
and they can even have similar pricing.

>> Backup is not something you
>> need extremely high bandwidth for; bog standard desktop drives are
>> fine. LTO-5 is 140MB/s, that's in the range of cheap 5400rpm desktop
>> drives - 7200rpm drives are already in the 200MB/s range of LTO-6.
> 
> Backup needs reliability, which is where I take issue with your
> assertion that desktop drives are "good enough". Your backup is not
> redundancy once it is the only copy you have left to continue with.

You have only *one* backup?  That's why I suggest using cheap drives: 
Buy another, make two backups.

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/


#14902

From	Alex McDonald <blog@rivadpm.com>
Date	2012-08-09 13:30 -0700
Message-ID	<269f51bb-8d8b-4433-9bda-35aba932e31f@n13g2000vby.googlegroups.com>
In reply to	#14895

On Aug 9, 6:21 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Alex McDonald wrote:
> > On Aug 9, 2:26 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> >> Alex McDonald wrote:
> >> > True. But you're quoting bog standard desktop drives at that price,
> >> > which I sincerely hope you aren't using in your backup servers.
>
> >> Of course I do.  Backup is redundancy, not expensive disks; the
> >> likelihood that a bog standard desktop drive fails is not that much
> >> different from a snake-oil expensive SAS drive - the fundamental
> >> construction of both drives is the same.
>
> > Yet they have different specs. Why?
>
> To get the money of idiots who believe that expensive is better.  And
> because the SAS controller is low volume, while the SATA controller is
> high volume.  SAS drives have faster spindle speeds and shorter access
> times, which is completely useless in this case: All we want is
> reasonable speed to fill the backup disk with files.
>
> And BTW: You can buy disks with similar specs for both SATA and SAS,
> and they can even have similar pricing.

There are enterprise class SATA drives too, btw, something that may
not be apparent from a casual inspection of a disk manufacturer's website. Both
SATA and SAS enterprise drives have a much lower bit error rate (a
factor of 10), a lower AFR and higher MTBF than the corresponding
desktop variety. The firmware is not the same either, since the
assumption is made that the drives are part of a RAID group, and the
hardware or software upstream can handle the errors. Desktop drives go
to extraordinary lengths to read data to the point of "spasm" for what
might be several minutes, since they assume that this is your only
copy. They re-allocate bad blocks out of line, requiring a hidden
seek. That's not desirable in a system that needs to perform as though
these issues don't exist. Enterprise drives need to tolerate much
higher levels of vibration, since they are mounted cheek by jowl in
dense arrays where vibration can be a significant factor. They have
multiple servo wedges (track markers monitored by the read heads) to
provide accurate tracking & feedback through the servo system; desktop
drives may have one or even none, and track entirely based on data
written.

And so on. They are not the same. The price is not hugely different
from a desktop drive; around twice the price.


#14905

From	Bernd Paysan <bernd.paysan@gmx.de>
Date	2012-08-10 01:27 +0200
Message-ID	<1642819.e4UGJbCkPf@sunwukong.fritz.box>
In reply to	#14902

Alex McDonald wrote:
> There are enterprise class SATA drives too, btw, something that may
> not be apparent from a casual inspection of a disk manufacturer's website. Both
> SATA and SAS enterprise drives have a much lower bit error rate (a
> factor of 10), a lower AFR and higher MTBF than the corresponding
> desktop variety.

Actually, most of the desktop varieties have bit error rate, AFR and
MTBF unspecified, and the enterprise class disks have them specified.
IMHO this is because they are actually identical, except maybe for firmware
issues, as below.  What is certainly possible is that the desktop
varieties haven't been tested and contain severe bugs.

> The firmware is not the same either, since the
> assumption is made that the drives are part of a RAID group, and the
> hardware or software upstream can handle the errors. Desktop drives go
> to extraordinary lengths to read data to the point of "spasm" for what
> might be several minutes, since they assume that this is your only
> copy.

Yes, and the RAID controller assumes that when a certain timeout is
exceeded, the disk is due for replacement.  Which is true.  I'm completely
unconvinced by what e.g. Western Digital says about this topic: If your
drive in a RAID array has problems reading data, replace it *now*.
Crash early, as we Forthers say.  This long spasm is the right reaction
in both environments: The desktop user gets his precious data back, and
the RAID controller throws the bad disk out.  Which it should.

IMHO, they got the complete protocol wrong, this shouldn't be hidden.  
The correct way to deal with these problems should be:

Say "Oops, read error" when you encounter a read error.  The host then
can respond with "retry", "retry harder", or "attempt to repair", if it
feels like it.  For a RAID system, a read error is no problem; there is
enough redundancy to deliver the data anyway.  It's much more
important that you say "Oops" quickly.  Even on a mirrored system where
you only access one disk per request to improve throughput (it can
serve twice as many read requests in that mode), retrying on the other
drive is faster than the thorough "retry harder".
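Bernd's protocol (fail fast, and let the host choose how hard to retry) can be sketched as a host-side policy for a two-way mirror. Everything here is hypothetical. `Disk`, `Effort` and `mirrored_read` are illustrative names, not a real drive command set, and the toy drive only models the "retry harder" level.

```python
from enum import Enum, auto


class ReadError(Exception):
    """A drive reporting "Oops, read error" at once instead of retrying internally."""


class Effort(Enum):
    RETRY = auto()         # plain re-read (not modelled in this toy)
    RETRY_HARDER = auto()  # slow, thorough recovery (the "spasm")
    REPAIR = auto()        # attempt to repair (not modelled in this toy)


class Disk:
    """Toy drive: a quick read of a bad sector fails immediately; only an
    explicit RETRY_HARDER from the host may still recover it."""

    def __init__(self, name, data, bad=(), recoverable=()):
        self.name = name
        self.data = data                     # sector -> bytes
        self.bad = set(bad)                  # sectors that fail a quick read
        self.recoverable = set(recoverable)  # bad sectors a hard retry can read

    def read(self, sector, effort=None):
        if sector in self.bad:
            if effort is Effort.RETRY_HARDER and sector in self.recoverable:
                return self.data[sector]
            raise ReadError(f"{self.name}: sector {sector}")
        return self.data[sector]


def mirrored_read(disks, sector):
    """Host policy for a mirror: on a fast failure try the other copy first;
    ask a drive to retry harder only when no copy reads cleanly."""
    for disk in disks:
        try:
            return disk.read(sector), disk.name
        except ReadError:
            continue
    for disk in disks:
        try:
            return disk.read(sector, Effort.RETRY_HARDER), disk.name + " (hard retry)"
        except ReadError:
            continue
    raise ReadError(f"sector {sector} unreadable on all mirrors")


data = {0: b"boot", 1: b"payload"}
a = Disk("a", data, bad={1}, recoverable={1})
b = Disk("b", data)
print(mirrored_read([a, b], 1))  # (b'payload', 'b'): served by the mirror, no spasm
```

The expensive "retry harder" path is only taken when every copy has already said "Oops".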

We had that 20 years ago with floppy disk drivers, and the 
"Abort/Ignore/Retry" message from DOS.  The wrong thing was to present 
this message to the user; internally, the protocol was perfectly ok.  
Say something when you don't feel ok, say it quickly.

I have something similar in my net2o protocol.  TCP tries to retransmit
packets which have been dropped.  net2o tries to re-request packets
which didn't arrive.  This turns the situation around: the client is
responsible for correct transmission, not the server.  This allows the
client to use more intelligent strategies - e.g. when copying identical
files from several peers, you can ask any of them to transmit that lost 
block.  Or when you stream real-time low-latency audio data, just 
interpolate the lost block.  Assuming that you have to deliver 100% 
quality of service all the time can be wrong.  Deliver what you can, and 
say when you can't.  If that's not acceptable, the other side will 
complain, and then you can try harder.
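The following toy model illustrates the receiver-driven recovery described above. The receiver keeps its own list of missing blocks. It can re-request them from any peer holding an identical copy or, for real-time audio, conceal the gap instead of waiting. `Receiver` and `conceal` are hypothetical names for illustration; this is not net2o's actual code or wire protocol.

```python
class Receiver:
    """Receiver-driven recovery: the client tracks which blocks arrived and
    asks for the gaps, instead of relying on the sender to retransmit."""

    def __init__(self, n_blocks):
        self.blocks = [None] * n_blocks

    def deliver(self, index, payload):
        self.blocks[index] = payload

    def missing(self):
        return [i for i, b in enumerate(self.blocks) if b is None]

    def rerequest(self, peers):
        """Fetch each missing block from any peer that holds an identical copy.
        A peer is modelled as a dict mapping block index -> payload."""
        for i in self.missing():
            for peer in peers:
                if i in peer:
                    self.deliver(i, peer[i])
                    break
        return self.missing()  # whatever is still lost after this round


def conceal(blocks, block_len):
    """Real-time audio alternative: don't re-request; fill each missing block
    by linear interpolation between the neighbouring samples."""
    out = list(blocks)
    for i, b in enumerate(out):
        if b is None:
            left = out[i - 1][-1] if i > 0 and out[i - 1] is not None else 0.0
            nxt = blocks[i + 1] if i + 1 < len(blocks) else None
            right = nxt[0] if nxt is not None else left
            step = (right - left) / (block_len + 1)
            out[i] = [left + step * (k + 1) for k in range(block_len)]
    return out


# Usage: blocks 2 and 4 were lost; two peers each hold one of them.
source = {i: f"block{i}".encode() for i in range(5)}
rx = Receiver(5)
for i in (0, 1, 3):
    rx.deliver(i, source[i])
print(rx.missing())                                    # [2, 4]
print(rx.rerequest([{2: source[2]}, {4: source[4]}]))  # []
print(conceal([[0.0, 1.0], None, [4.0, 5.0]], 2))      # [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
```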

> And so on. They are not the same. The price is not hugely different
> from a desktop drive; around twice the price.

Yes, I know, and I'm quite convinced that this is not worth it, and that
this comes from ill-perceived risk assessment.  Or ill-perceived ways to
save costs or something - it *is* cheaper to remove the vibrations of 
hard-disks than to make vibration-resistant ones, which probably are 
more vibration-resistant on paper than in reality.

Always remember: For twice the price, you can get twice the cheap disks.  
Usually, you don't need that many to reduce the risk to the same level 
you paid twice the price for.  Or put differently: Flying business class 
is no more secure than flying economy class.  But when paying a higher 
price makes you feel better, you should fly business class.

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/


#14870

From	vandys@vsta.org
Date	2012-08-09 00:32 +0000
Message-ID	<a8geo1Fr2rU1@mid.individual.net>
In reply to	#14865

Bernd Paysan <bernd.paysan@gmx.de> wrote:
> We have better compression algorithms, and 
> for hard disk backups, we use those

I've had some troubles with compressed archives, where there were some hits
on the media.  The fact that the archive was compressed made it much harder
to recover the remaining bits on the media.  I'd recommend avoiding
compression in your backups if you can afford the storage.
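This small Python illustration of the effect Andy describes uses the standard `zlib` module. Flipping one byte in an uncompressed copy damages one byte. Flipping one byte in the middle of a deflate stream typically makes the whole stream fail to decompress or fail its checksum. The data and the damage position are arbitrary choices for the demo.

```python
import zlib


def flip_byte(buf, pos):
    """Return a copy of buf with the byte at pos inverted (a simulated media hit)."""
    damaged = bytearray(buf)
    damaged[pos] ^= 0xFF
    return bytes(damaged)


original = b"".join(f"record {i:05d}: some backup payload\n".encode() for i in range(2000))

# Uncompressed copy: one bad byte damages exactly one byte of the data.
plain_hit = flip_byte(original, len(original) // 2)
plain_damage = sum(x != y for x, y in zip(original, plain_hit))

# Compressed copy: the same kind of hit in the middle of the deflate stream.
packed = zlib.compress(original)
packed_hit = flip_byte(packed, len(packed) // 2)
try:
    packed_ok = zlib.decompress(packed_hit) == original
except zlib.error:
    packed_ok = False  # stream no longer decodes, or fails its checksum

print(f"uncompressed: {plain_damage} byte(s) damaged of {len(original)}")
print(f"compressed:   whole archive intact? {packed_ok}")
```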

-- 
Andy Valencia
Home page: http://www.vsta.org/andy/
To contact me: http://www.vsta.org/contact/andy.html


#14877

From	Bernd Paysan <bernd.paysan@gmx.de>
Date	2012-08-09 03:33 +0200
Message-ID	<2374314.97TPzbGcXa@sunwukong.fritz.box>
In reply to	#14870

vandys@vsta.org wrote:

> Bernd Paysan <bernd.paysan@gmx.de> wrote:
>> We have better compression algorithms, and
>> for hard disk backups, we use those
> 
> I've had some troubles with compressed archives, where there were some
> hits on the media.  The fact that the archive was compressed made it much
> harder to recover the remaining bits on the media.  I'd recommend avoiding
> compression in your backups if you can afford the storage.

I recommend saving twice if you can afford the storage.  It's way more
robust to have redundancy to recover from these problems than to have
uncompressed data.  Anyway, most of the current data that really takes
space is already compressed - videos, images, music.  Some data, like 
textures, are even compressed in RAM, because decompression on the fly 
is worth the effort (RAM is slow... the GPU is much faster).  We keep 
text files uncompressed, and to be honest, I don't know why.  Usually, 
we read and write them in one go today, compressing/decompressing on the 
fly is not really a problem.

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/


#14883

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2012-08-09 06:00 +0000
Message-ID	<2012Aug9.080022@mips.complang.tuwien.ac.at>
In reply to	#14860

Alex McDonald <blog@rivadpm.com> writes:
>On Aug 8, 1:24 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
>wrote:
>> Alex McDonald <b...@rivadpm.com> writes:
[...]
>Then the best of luck getting DHL to deliver your 4TB drive in one
>piece.

The drives are delivered to us in one piece, so why wouldn't they be
delivered elsewhere in one piece?

>> >In general,
>> >drives don't like being spun up; they fail much more quickly than
>> >disks that are spun throughout their entire lives.
>>
>> Not in my experience.  We have a backup server that spins down idle
>> disks, and have not noticed any reliability problems.  And we have not
>> noticed reliability problems with our off-line storage disks, either.
>> So disk spin-down is practical, and, in our experience, reliable.
>
>Yes, you're one of the lucky devils that I meet; but much less
>frequently than I used to, since the advent of super large SATA
>drives. The stats speak for themselves; failure rates of SATA drives
>are in the ones and twos of % per annum, and if you have an array with
>a couple of hundred plus, failure is to be expected and needs to be
>managed.

Sure, hard disks fail now and then.  And we certainly have organized
our backups such that the failure of one or two drives does not lead
to catastrophic loss.
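Alex's point about large arrays follows from simple arithmetic. The sketch below assumes independent failures at the 1-2% AFR he quotes and an array of 200 drives:

```python
def array_failures(n_drives, afr):
    """Expected drive failures per year, and the chance of at least one,
    assuming independent failures at annual failure rate afr."""
    expected = n_drives * afr
    p_any = 1 - (1 - afr) ** n_drives
    return expected, p_any


for afr in (0.01, 0.02):
    exp, p = array_failures(200, afr)
    print(f"AFR {afr:.0%}: {exp:.1f} failures/year expected, P(at least one) = {p:.0%}")
```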

>It's not just that the drive fails to spin up, or dies with a
>catastrophic failure in operation either. Bit error rates per byte
>haven't changed in 10 years, but the size of drives has grown
>exponentially. Every 4TB drive will have, on average, several
>correctly sent but badly written blocks, and a number of blocks where
>the drive declared that it had -- honest! -- written your data but
>hadn't. You just haven't found those corrupt or silent blocks yet.
>Spin down & up exacerbates these problems.

Sounds like you swallowed some horror stories some people like to
spin.  Why should spin down exacerbate these problems?

BTW, in my experience (based on several occasions) the most frequent
cause of corrupted disk blocks is misdesigned drives that do not
react correctly to power fluctuations.
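The following gives a rough feel for the bit-error argument. The expected number of unrecoverable read errors when reading a whole drive once is its size in bits times the rated error rate. The 1-in-10^14 and 1-in-10^15 bit figures are assumed typical desktop and enterprise spec-sheet values. They match the "factor of 10" Alex mentions elsewhere in the thread but are not taken from it. This model covers only rated read errors. It says nothing about silently mis-written blocks or about whether spin-down changes anything, which is the point Anton disputes.

```python
import math


def expected_errors(capacity_tb, bits_per_error):
    """Expected unrecoverable read errors when reading the whole drive once,
    given a rated error rate of one per bits_per_error bits read."""
    bits = capacity_tb * 1e12 * 8
    return bits / bits_per_error


def p_clean_read(capacity_tb, bits_per_error):
    """Poisson approximation of the chance to read the whole drive with no error."""
    return math.exp(-expected_errors(capacity_tb, bits_per_error))


for label, rate in (("desktop, 1 in 1e14 bits (assumed)", 1e14),
                    ("enterprise, 1 in 1e15 bits (assumed)", 1e15)):
    print(f"4TB {label}: {expected_errors(4, rate):.3f} errors per full read, "
          f"P(clean) = {p_clean_read(4, rate):.1%}")
```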

>> I never heard about MAID and COPAN before, but it seems that this was
>> not sold as a backup solution, but as main storage.  There, I agree,
>> it is not very practical for most uses, and spin-slow is better.
>
>It was sold as backup; the systems couldn't support all the disks
>spinning at the same time.

The latter is true.  But according to
<http://wikibon.org/blog/copan-may-be-dead-but-maid-isnt/>, this was
sold as main storage.

>To get adequate bandwidth, data needs to be striped across several 10s
>of disks, and everyone wants to do their backups at the same time. But
>the COPAN power supplies were inadequate to support all the disks.
>It's the economics of competing with tape; big power supplies to
>support 480 disks packed in a single rack cost lots of money.

If you need that much bandwidth from your backup system, the tape
solution needs a similar number of tape drives (because tape drives
have a similar bandwidth), and the cost of that would dwarf the costs
of everything in the disk system, including a power supply for
spinning all the disks.  And these tape drives would need an even more
powerful power supply: Looking at
<https://iq.quantum.com/exLink.asp?8078910OS53M15I46299120>, idle
power consumption is 6.5W, typical 21.4W, peak 30.2W, i.e., about 2-3
times that of a hard disk drive.

And how are you getting all this bandwidth to and from the backup
system?  480 disks with, say, 150MB/s each means 72000MB/s.
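This works through Anton's bandwidth question: 480 drives at 150MB/s each, and the number of 10GbE links that total would need. The link speed is an assumption, and protocol overhead is ignored. The power ratio compares the typical 21.4W tape drive figure with the 8W average disk figure from the Seagate datasheet cited in the next paragraph.

```python
import math


def aggregate_mb_s(n_devices, per_device_mb_s):
    """Total streaming bandwidth if every device runs flat out in parallel."""
    return n_devices * per_device_mb_s


def links_needed(total_mb_s, link_gbit_s):
    """Links of the given speed needed to carry total_mb_s, ignoring protocol overhead."""
    total_gbit_s = total_mb_s * 8 / 1000
    return math.ceil(total_gbit_s / link_gbit_s)


total = aggregate_mb_s(480, 150)
print(total, "MB/s")                             # 72000 MB/s
print(links_needed(total, 10), "x 10GbE links")  # link speed is an assumption
print(f"tape/disk power: {21.4 / 8:.1f}x")       # typical LTO-5 vs. 8W average disk
```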

> Then,
>when data is required for restore, lots of disks have to be spun down,
>and others spun up, an activity that draws a lot of juice; it takes as
>much power to spin up a disk as running it for several minutes.

<http://www.seagate.com/files/staticfiles/docs/pdf/de-DE/datasheet/disc/barracuda-ds1737-1-1111de.pdf>
lists a power-up power consumption of at most 2A @12V, i.e., 24W (my
own measurements are in the same ballpark), and an average power
consumption of 8W for the 3TB model.  It takes about 10s to spin up a
drive, so spinning up takes as much energy as running a disk for half a
minute.
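Here is the same spin-up arithmetic as a sketch. It treats the datasheet's 2A at 12V as the power drawn for the entire ~10s spin-up. Since 2A is the stated maximum rather than an average, the result is an upper bound.

```python
def spinup_equivalent_seconds(spinup_w, spinup_s, running_w):
    """Seconds of normal running that use the same energy as one spin-up,
    assuming the peak spin-up power is drawn for the whole spin-up time."""
    return spinup_w * spinup_s / running_w


spinup_w = 12 * 2  # 2A at 12V, the datasheet maximum
print(spinup_equivalent_seconds(spinup_w, 10, 8))  # 30.0
```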

> And it
>takes 10s of minutes to do so as they can't all be powered up at
>once.

With the power supply you would need for the 480 tape
drives, yes, you could spin them all up at the same time.  But this is
typically not needed, certainly not with saner backup management (but
neither are the 480 tape drives).

>For users that want to do restores quickly, it's useless. The power
>economics vs the high latency, low bandwidth & inconvenience just
>don't stack up.

Tape loses in power, latency, bandwidth, and convenience.  The only
thing where it wins is cost for low-bandwidth high-capacity storage.
Taking the numbers from my price watch site, I get:

EUR 46/TB for external 3TB disks (similar price for internal disks)
EUR 1311 for an internal LTO-5 tape drive (>1700 for external)
EUR 41 for a 1.5TB LTO-5 tape (EUR27/TB)

The crossover is at about 69 TB per tape drive (higher if you use
external tape drives).  If you want reliable and timely access to your
tapes, you need at least two tape drives, so tape is only cheaper if
you want to store more than 138TB on it (and even then it still has
all the other disadvantages).
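Here is Anton's crossover figure as a sketch: tape becomes cheaper once the per-TB saving on media pays for the drive(s). The prices are the EUR figures above. 27 EUR/TB is the rounded media price he uses, and the last line uses the unrounded 41/1.5 instead.

```python
def crossover_tb(drive_eur, disk_eur_per_tb, tape_eur_per_tb, drives=1):
    """Stored TB above which tape (drives + media) is cheaper than disk."""
    return drives * drive_eur / (disk_eur_per_tb - tape_eur_per_tb)


print(crossover_tb(1311, 46, 27))            # 69.0 TB with one drive
print(crossover_tb(1311, 46, 27, drives=2))  # 138.0 TB with two drives
print(crossover_tb(1311, 46, 41 / 1.5))      # about 70.2 TB, unrounded media price
```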

>For the occasional server with a handful of disk
>drives, it's not so much of a problem, but at scale, even a moderate
>scale in the 10s of TB range, it's unworkable.

That's nonsense.

>Again, the power economics
>don't make sense for main storage where a complete stripe of 10s or
>more of them need to be spun up to get at a single 4K file.

Yes, striping (RAID-0) a 4KB file across tens of disks does not make
sense.

>Believe me, you're gambling with your
>current backup strategy...)

The only thing you know about our current backup strategy is that we
use disks and spin-down.  If by "gambling" you mean that we are
relying on luck, no, not much.  Of course there is the possibility
that everything fails at the same time, but that possibility is not
exclusive to disk drives.  Actually the probability that two tape
drives fail before we get a replacement is much higher than the
probability that all the disks on which we have our backups fail
between two backups.

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: http://www.forth200x.org/forth200x.html
   EuroForth 2012: http://www.euroforth.org/ef12/


#14891

From	Alex McDonald <blog@rivadpm.com>
Date	2012-08-09 05:26 -0700
Message-ID	<c5f1248a-11dd-4cca-b710-b95ea69d6c5f@y1g2000vbx.googlegroups.com>
In reply to	#14883

On Aug 9, 7:00 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:
> Alex McDonald <b...@rivadpm.com> writes:
> >On Aug 8, 1:24 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
> >wrote:
> >> Alex McDonald <b...@rivadpm.com> writes:
> [...]
> >Then the best of luck getting DHL to deliver your 4TB drive in one
> >piece.
>
> The drives are delivered to us in one piece, so why wouldn't they be
> delivered elsewhere in one piece?

They don't contain your data. The failure rate of new drives is as
high as it is partly because of shipping.

>
> >> >In general,
> >> >drives don't like being spun up; they fail much more quickly than
> >> >disks that are spun throughout their entire lives.
>
> >> Not in my experience.  We have a backup server that spins down idle
> >> disks, and have not noticed any reliability problems.  And we have not
> >> noticed reliability problems with our off-line storage disks, either.
> >> So disk spin-down is practical, and, in our experience, reliable.
>
> >Yes, you're one of the lucky devils that I meet; but much less
> >frequently than I used to, since the advent of super large SATA
> >drives. The stats speak for themselves; failure rates of SATA drives
> >are in the ones and twos of % per annum, and if you have an array with
> >a couple of hundred plus, failure is to be expected and needs to be
> >managed.
>
> Sure, hard disks fail now and then.  And we certainly have organized
> our backups such that the failure of one or two drives does not lead
> to catastrophic loss.
>
> >It's not just that the drive fails to spin up, or dies with a
> >catastrophic failure in operation either. Bit error rates per byte
> >haven't changed in 10 years, but the size of drives has grown
> >exponentially. Every 4TB drive will have, on average, several
> >correctly sent but badly written blocks, and a number of blocks where
> >the drive declared that it had -- honest! -- written your data but
> >hadn't. You just haven't found those corrupt or silent blocks yet.
> >Spin down & up exacerbates these problems.
>
> Sounds like you swallowed some horror stories some people like to
> spin.  Why should spin down exacerbate these problems?

Several reasons.

Rated start/stop cycles; 250 average on/off cycles per year at the
expected population AFR of 0.55% (Seagate Cheetah 15.7, enterprise
class drive). Cumulative head damage; carbonisation during spin up
drag.
Low temperature operation; the AFR increases significantly (drives
running below 20C showed 5 times the AFR of those running above 40C in
a Google study of 100,000 desktop class drives), and spun-down drives
will be cooler during their early hours of operation.
Slow spin; heads are designed for flight at a given RPM. Slow spin
reduces the air cushion/head height and makes the drives more
susceptible to shock. Even at full speed they can be shouted into
submission; http://www.youtube.com/watch?v=tDacjrSCeq4&feature=player_embedded

Overview of the Google experience, including a pointer to the paper
http://storagemojo.com/2007/02/19/googles-disk-failure-experience/

Due to commercial NDAs and other reasons, I can't do any better than
point you at what is publicly available. Our AFRs are much lower for a
variety of reasons, among them dual parity RAID, enterprise class
drives, temperature & vibration control, scrubbing, and not waiting
for SMART warnings or for a drive to die outright before replacing it.
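The claim that failures are "to be expected" in an array of a couple of hundred drives follows directly from the AFR. This is a minimal sketch that assumes independent failures and uses the 1-2% annual rates mentioned in the thread. `array_failures` is a hypothetical helper.

```python
def array_failures(n_drives, afr):
    """Expected drive failures per year, and the probability of at least one,
    for n independent drives that share the same AFR."""
    expected = n_drives * afr
    p_any = 1.0 - (1.0 - afr) ** n_drives
    return expected, p_any

for afr in (0.01, 0.02):
    expected, p_any = array_failures(200, afr)
    print(f"AFR {afr:.0%}: {expected:.1f} failures/year expected, "
          f"P(at least one) = {p_any:.1%}")
```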

>
> BTW, in my experience (based on several occasions) the most frequent
> cause of corrupted disk blocks is due to misdesigned drives that do not
> react correctly to power fluctuations.

That is rarely a problem on a well designed storage array, where the
power management is more sophisticated than that of a server. Pulling
the plug on such a system should have no deleterious effects.

>
> >> I never heard about MAID and COPAN before, but it seems that this was
> >> not sold as a backup solution, but as main storage. =A0There, I agree,
> >> it is not very practical for most uses, and spin-slow is better.
>
> >It was sold as backup; the systems couldn't support all the disks
> >spinning at the same time.
>
> The latter is true.  But according to
> <http://wikibon.org/blog/copan-may-be-dead-but-maid-isnt/>, this was
> sold as main storage.

I beg to differ. David Vellante is a sharp analyst, but to describe
a system that could only support 25% of its disks running at any
one time as "main storage" is a stretch; nor is it what he says in
that 3 year old article. He describes it as "disk arrays for storing
less active enterprise data"; the rest of the industry and COPAN's
hundred-odd customers were less charitable, and it only ever found a
place as a backup device.

>
> >To get adequate bandwidth, data needs to be striped across several 10s
> >of disks, and everyone wants to do their backups at the same time. But
> >the COPAN power supplies were inadequate to support all the disks.
> >It's the economics of competing with tape; big power supplies to
> >support 480 disks packed in a single rack cost lots of money.
>
> If you need that much bandwidth from your backup system, the tape
> solution needs a similar number of tape drives (because tape drives
> have a similar bandwidth), and the cost of that would dwarf the costs
> of everything in the disk system, including a power supply for
> spinning all the disks.  And these tape drives would need an even more
> powerful power supply: Looking at
> <https://iq.quantum.com/exLink.asp?8078910OS53M15I46299120>, idle
> power consumption is 6.5W, typical 21.4W, peak 30.2W, i.e., about 2-3
> times that of a hard disk drive.
>
> And how are you getting all this bandwidth to and from the backup
> system?  480 disks with, say, 150MB/s each means 72000MB/s.

A lot of connectivity. It's not unusual to see 100s of 8Gb/s FC
interconnects or 6Gb/s SAS, or 10GbE on high end systems; they are
designed to support multiple parallel streams from 100s of systems.
Note that out of 480 disk drives, COPAN could only support 120
spinning.
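The bandwidth figures in this exchange translate directly into link counts. This is a minimal sketch that uses the 150MB/s per-disk rate from above. It also assumes about 800 MB/s of usable throughput per 8Gb/s Fibre Channel link. The helper name is hypothetical.

```python
import math

def links_needed(disks, mb_per_s_per_disk, mb_per_s_per_link):
    """Aggregate streaming bandwidth of `disks` drives and the number of
    host links needed to carry it at full speed."""
    aggregate = disks * mb_per_s_per_disk
    return aggregate, math.ceil(aggregate / mb_per_s_per_link)

# Assumption: roughly 800 MB/s usable per 8Gb/s Fibre Channel link.
for spinning in (480, 120):  # all COPAN disks vs the 120 it could spin at once
    total, links = links_needed(spinning, 150, 800)
    print(f"{spinning} disks: {total} MB/s, about {links} x 8Gb/s FC links")
```

Even the 120 disks COPAN could spin at once would need a couple of dozen such links to stream at full speed.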

>
> > Then,
> >when data is required for restore, lots of disks have to be spun down,
> >and others spun up, an activity that draws a lot of juice; it takes as
> >much power to spin up a disk as running it for several minutes.
>
> <http://www.seagate.com/files/staticfiles/docs/pdf/de-DE/datasheet/dis...>
> lists a power-up power consumption of at most 2A @12V, i.e., 24W (my
> own measurements are in the same ballpark), and an average power
> consumption of 8W for the 3TB model.  It takes about 10s to spin up a
> drive, so spinning up takes as much energy as running a disk for
> half a minute.

Drives vary; SATA drives at 5k RPM spin up faster than high RPM SAS
drives at 15K, which may take minutes to stabilize at operating speed.
During that time, the disk isn't usable, and I stand by my assertion
that spin up wastes as much power as several minutes of full
operation.
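The disagreement over spin-up cost reduces to one ratio: the energy drawn during spin-up divided by normal running power. This is a minimal sketch using the Seagate figures quoted above: 24W peak, 8W average and about 10s to spin up. The helper name is hypothetical. The 120-second case is an assumed slow-spinning drive, not a datasheet value. Treating the peak draw as constant over the whole spin-up makes the result an upper bound.

```python
def spinup_equivalent_seconds(spinup_watts, spinup_seconds, running_watts):
    """Seconds of normal operation that use the same energy as one spin-up.

    Assumes the spin-up draw is constant for the whole spin-up period,
    so passing the peak figure gives an upper bound."""
    return spinup_watts * spinup_seconds / running_watts

peak_watts = 2 * 12  # 2A at 12V, from the datasheet quoted above
print(spinup_equivalent_seconds(peak_watts, 10, 8))   # 30.0 (half a minute)
print(spinup_equivalent_seconds(peak_watts, 120, 8))  # 360.0 (six minutes)
```

On this arithmetic both posters can be right. A 10-second desktop drive costs about half a minute of running energy, while a drive that takes minutes to stabilize costs several minutes.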

>
> > And it
> >takes 10s of minutes to do so as they can't all be powered up at
> >once.
>
> With the power supply you would need for the 480 tape
> drives, yes, you could spin them all up at the same time.  But this is
> typically not needed, certainly not for a saner backup management (but
> neither are the 480 tape drives).

I don't know where you got the idea that 480 tape drives was the
equivalent to 480 disk drives, but it's not an assertion I made and
certainly qualifies as insane.

>
> >For users that want to do restores quickly, it's useless. The power
> >economics vs the high latency, low bandwidth & inconvenience just
> >don't stack up.
>
> Tape loses in power, latency, bandwidth, and convenience.  The only
> thing where it wins is cost for low-bandwidth high-capacity storage.
> Taking the numbers from my price watch site, I get:
>
> EUR 46/TB for external 3TB disks (similar price for internal disks)
> EUR 1311 for an internal LTO-5 tape drive (>1700 for external)
> EUR 41 for a 1.5TB LTO-5 tape (EUR27/TB)
>
> the crossover is at about 69 TB per tape drive (higher if you use
> external tape drives).  If you want reliable and timely access to your
> tapes, you need at least two tape drives, so tape is only cheaper if
> you want to store more than 138TB on it (and even then it still has
> all the other disadvantages).
>
> >For the occasional server with a handful of disk
> >drives, it's not so much of a problem, but at scale, even a moderate
> >scale in the 10s of TB range, it's unworkable.
>
> That's nonsense.

Why? The limiting factor isn't the disk or tape that you're backing up
to, but how fast you can shovel it off the server.

>
> >Again, the power economics
> >don't make sense for main storage where a complete stripe of 10s or
> >more of them need spun up to get at a single 4K file.
>
> Yes, striping (RAID-0) a 4KB file across tens of disks does not make
> sense.

Ignoring RAID-0, since RAID-any systems also stripe, the problem is
that such files do get spread across an unknown number of disks. They
all need fired up to find even the smallest file, since it's not just
the file, but the meta data that needs accessed too.

>
> >Believe me, you're gambling with your
> >current backup strategy...)
>
> The only thing you know about our current backup strategy is that we
> use disks and spin-down.  If by "gambling" you mean that we are
> relying on luck, no, not much.  Of course there is the possibility
> that everything fails at the same time, but that possibility is not
> exclusive to disk drives.  Actually the probability that two tape
> drives fail before we get a replacement is much higher than the
> probability that all the disks on which we have our backups fail
> between two backups.

That's true. I didn't mean to imply that your backup strategy wasn't
thoughtful or adequate, but it's my experience that such things are
rarely on anyone's mind until they fail to provide an adequate
restore, particularly when disaster strikes. A fire a few years ago at
Edinburgh Uni destroyed much of the AI department; they had just
implemented a DR system that saved the main electronic archives
(although much personal research data & the non-digitized archive was
lost). Good on them for recognizing at least part of the problem; many
don't until it's too late.


#14893

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2012-08-09 13:44 +0000
Message-ID	<2012Aug9.154425@mips.complang.tuwien.ac.at>
In reply to	#14891

Alex McDonald <blog@rivadpm.com> writes:
>On Aug 9, 7:00 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
>wrote:
>> Alex McDonald <b...@rivadpm.com> writes:
>> >On Aug 8, 1:24 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
>> >wrote:
>> >> Alex McDonald <b...@rivadpm.com> writes:
>> [...]
>> >Then the best of luck getting DHL to deliver your 4TB drive in one
>> >piece.
>>
>> The drives are delivered to us in one piece; why wouldn't they be
>> delivered elsewhere in one piece?
>
>They don't contain your data.

So what?  If it's broken, I send another one; it's a backup.  It's
redundant, and it's definitely not the only backup.

>> Sounds like you swallowed some horror stories some people like to
>> spin.  Why should spin down exacerbate these problems?
>
>Several reasons.
>
>Rated start/stop cycles; 250 average on/off cycles per year at the
>expected population AFR of 0.55% (Seagate Cheetah 15.7, enterprise
>class drive).

What does AFR have to do with the horror stories about corrupted data?
And anyone who uses "enterprise class" drives for backup has too much
money.

>Low temperature operation; the AFR increases significantly (5 times
>the AFR at <20C to those running >40C, Google study on 100000 desktop
>class drives), and spun down drives will be cooler during early hours
>of operation.

Fortunately even our spun-down drives have a higher temperature.  And
again, what does AFR have to do with your horror stories about
corrupted data?

>Slow spin; heads are designed for flight at a given RPM. Slow spin
>reduces the air cushion/head height and makes the drives more
>susceptible to shock. Even at full speed they can be shouted into
>submission; http://www.youtube.com/watch?v=tDacjrSCeq4&feature=player_embedded

What do head crashes resulting from shock have to do with the horror
stories about corrupted data?

>Due to commercial NDAs and other reasons, I can't do any better than
>point you at what is publicly available. Our AFRs are much lower for a
>variety of reasons; dual parity RAID

How does RAID make individual drives more reliable?

>> BTW, in my experience (based on several occasions) the most frequent
>> cause of corrupted disk blocks is due to misdesigned drives that do not
>> react correctly to power fluctuations.
>
>That is rarely a problem on a well designed storage array, where the
>power management is more sophisticated than that of a server. Pulling
>the plug on such a system should have no deleterious effects.

It's also not a problem for well-designed disk drives, but yes, to
some extent the power supply can alleviate the problems coming from
misdesigned drives; but if the problem is between the power supply and
the drive (i.e., a suboptimal power connection), the misdesigned drive
will still produce corrupt blocks.

>> > Then,
>> >when data is required for restore, lots of disks have to be spun down,
>> >and others spun up, an activity that draws a lot of juice; it takes as
>> >much power to spin up a disk as running it for several minutes.
>>
>> <http://www.seagate.com/files/staticfiles/docs/pdf/de-DE/datasheet/dis...>
>> lists a power-up power consumption of at most 2A @12V, i.e., 24W (my
>> own measurements are in the same ballpark), and an average power
>> consumption of 8W for the 3TB model.  It takes about 10s to spin up a
>> drive, so spinning up takes as much as running a disk for half a
>> minute.
>
>Drives vary; SATA drives at 5k RPM spin up faster than high RPM SAS
>drives at 15K, which may take minutes to stabilize at operating speed.
>During that time, the disk isn't usable, and I stand by my assertion
>that spin up wastes as much power as several minutes of full
>operation.

Sure, if a drive takes several minutes to spin up, it will consume as
much power as several minutes of full operation.

But who in his right mind uses an expensive and power-hungry high-RPM
drive that takes forever to spin up for a storage solution that
requires low power and fast spin-up?  Ok, a sales guy selling to a
clueless and rich customer will do it, but not because of technical
merit.

>> With the power supply you would need for the 480 tape
>> drives, yes, you could spin them all up at the same time.  But this is
>> typically not needed, certainly not for a saner backup management (but
>> neither are the 480 tape drives).
>
>I don't know where you got the idea that 480 tape drives was the
>equivalent to 480 disk drives, but it's not an assertion I made and
>certainly qualifies as insane.

You claimed that lots of disks had to be spun up for bandwidth
reasons, and you wrote:

|It's the economics of competing with tape; big power supplies to
|support 480 disks packed in a single rack cost lots of money.

which suggests that you think that a backup solution needs 480 disks
spun up for bandwidth reasons.

>> >For the occasional server with a handful of disk
>> >drives, it's not so much of a problem, but at scale, even a moderate
>> >scale in the 10s of TB range, it's unworkable.
>>
>> That's nonsense.
>
>Why? The limiting factor isn't the disk or tape that you're backing up
>to, but how fast you can shovel it off the server.

It's nonsense, because we are backing up to disks with a total of 10s
of TB, and it's workable, and if we wanted to back up to more disks,
we would just use more disks.  And the main bandwidth limit is, as you
write, getting the data off the main storage.

>> >Again, the power economics
>> >don't make sense for main storage where a complete stripe of 10s or
>> >more of them need spun up to get at a single 4K file.
>>
>> Yes, striping (RAID-0) a 4KB file across tens of disks does not make
>> sense.
>
>Ignoring RAID-0, since RAID-any systems also stripe,

RAID-1 doesn't.

>the problem is
>that such files do get spread across an unknown number of disks.

With typical block sizes, a 4KB block is not distributed across
multiple disks, even with RAID-0.
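Anton's point about 4KB blocks can be checked against the address mapping of a plain RAID-0 layout. This is a minimal sketch that assumes chunk k is stored on disk k mod n; real controllers may rotate or offset chunks differently. The 64 KiB chunk size and the 12-disk array are hypothetical.

```python
def raid0_disks_for_range(offset, length, chunk_size, n_disks):
    """Return the set of member disks that hold bytes [offset, offset+length)
    in a simple RAID-0 layout where chunk k lives on disk k % n_disks."""
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    return {chunk % n_disks for chunk in range(first_chunk, last_chunk + 1)}

KIB = 1024
# Assumed 64 KiB chunk (stripe unit) across 12 disks, for illustration.
print(raid0_disks_for_range(8 * KIB, 4 * KIB, 64 * KIB, 12))   # {0}
print(raid0_disks_for_range(62 * KIB, 4 * KIB, 64 * KIB, 12))  # {0, 1}
```

If the chunk size is a multiple of 4 KiB, a 4 KiB-aligned block always lands on exactly one disk. Only unaligned I/O can straddle two disks.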

>They
>all need fired up to find even the smallest file, since it's not just
>the file, but the meta data that needs accessed too.

Meta data is often in OS caches, at least on decent OSs.

But yes, I agree that spin-down is not practical for main storage; but
from what I read, the idea of COPAN was to make it practical by
rearranging data such that frequently-accessed data resides on a few
drives.  Anyway, for backups spin-down is totally practical, certainly
the way we do our backups.  When the backup is written (or read), the
disk spins up, and some time after the access, it spins down.
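The spin-down policy described here is to spin up on access and spin down after an idle period. It can be modelled to see how long a backup disk actually spends spinning. This is a minimal sketch that assumes a fixed idle timeout and ignores spin-up latency. The function name and the access pattern are hypothetical.

```python
def spun_up_seconds(access_times, idle_timeout):
    """Total time a disk stays spinning when it spins up on each access and
    spins down once `idle_timeout` seconds pass with no further access.
    Spin-up latency is ignored. `access_times` are in seconds."""
    total = 0.0
    spin_start = None
    last = None
    for t in sorted(access_times):
        if spin_start is None:
            spin_start = t
        elif t - last > idle_timeout:
            total += (last + idle_timeout) - spin_start  # spun down in the gap
            spin_start = t
        last = t
    if spin_start is not None:
        total += (last + idle_timeout) - spin_start
    return total

# Hypothetical nightly backup: one burst of writes, then a read a day later.
accesses = [0, 60, 120, 86400]
print(spun_up_seconds(accesses, idle_timeout=600))  # 1320.0
```

The choice of timeout is a trade-off. A longer one keeps the disk spinning through short gaps between accesses at the cost of idle power. A shorter one saves power but adds spin-up cycles.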

- anton

#14896

From	Alex McDonald <blog@rivadpm.com>
Date	2012-08-09 10:21 -0700
Message-ID	<46bbc85d-2dab-40cd-a4a0-6e4550e969b5@i7g2000vbc.googlegroups.com>
In reply to	#14893

On Aug 9, 2:44 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:
> Alex McDonald <b...@rivadpm.com> writes:
> >On Aug 9, 7:00 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
> >wrote:
> >> Alex McDonald <b...@rivadpm.com> writes:
> >> >On Aug 8, 1:24 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
> >> >wrote:
> >> >> Alex McDonald <b...@rivadpm.com> writes:
> >> [...]
> >> >Then the best of luck getting DHL to deliver your 4TB drive in one
> >> >piece.
>
> >> The drives are delivered to us in one piece; why wouldn't they be
> >> delivered elsewhere in one piece?
>
> >They don't contain your data.
>
> So what?  If it's broken, I send another one; it's a backup.  It's
> redundant, and it's definitely not the only backup.
>
> >> Sounds like you swallowed some horror stories some people like to
> >> spin.  Why should spin down exacerbate these problems?
>
> >Several reasons.
>
> >Rated start/stop cycles; 250 average on/off cycles per year at the
> >expected population AFR of 0.55% (Seagate Cheetah 15.7, enterprise
> >class drive).
>
> What does AFR have to do with the horror stories about corrupted data?

AFR includes corrupted data.

> And anyone who uses "enterprise class" drives for backup has too much
> money.

Why? Many operations value data integrity more than its cost, so
this is an economic argument, not one of wealth causing stupidity.

>
> >Low temperature operation; the AFR increases significantly (5 times
> >the AFR at <20C to those running >40C, Google study on 100000 desktop
> >class drives), and spun down drives will be cooler during early hours
> >of operation.
>
> Fortunately even our spun-down drives have a higher temperature.  And
> again, what does AFR have to do with your horror stories about
> corrupted data?

AFR includes corrupted data.

>
> >Slow spin; heads are designed for flight at a given RPM. Slow spin
> >reduces the air cushion/head height and makes the drives more
> >susceptible to shock. Even at full speed they can be shouted into
> >submission; http://www.youtube.com/watch?v=tDacjrSCeq4&feature=player_embedded
>
> What do head crashes resulting from shock have to do with the horror
> stories about corrupted data?

Shock can cause high flying writes; the exact opposite of a head
crash. The data isn't written. What is being demonstrated in the video
is the effect of drive recovery (which will be successful if the
software is up to the task, something that most OSes find hard to deal
with) on response time as disks fail to write data.

And AFR includes corrupted data. I'm mystified; where did I say that
corrupted data was the only issue?

>
> >Due to commercial NDAs and other reasons, I can't do any better than
> >point you at what is publicly available. Our AFRs are much lower for a
> >variety of reasons; dual parity RAID
>
> How does RAID make individual drives more reliable?

It doesn't. It makes them collectively more reliable.
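How RAID makes drives "collectively" more reliable can be shown with a binomial tail. A dual-parity group loses data only if three or more members fail within the same rebuild window. This is a minimal sketch that assumes independent failures and a hypothetical per-drive failure probability for that window. It ignores correlated failures and unrecoverable read errors during rebuild, both of which make the real figure worse.

```python
from math import comb

def p_group_loss(n, p, tolerated):
    """Probability that more than `tolerated` of n drives fail in the same
    window, each independently with probability p. RAID-6 (dual parity)
    tolerates 2 failures; a single unprotected drive tolerates none."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(tolerated + 1, n + 1))

p = 0.001  # hypothetical per-drive failure probability within one rebuild window
print(p_group_loss(1, p, 0))   # single drive: 0.001
print(p_group_loss(14, p, 2))  # 14-drive RAID-6 group: roughly 3.6e-07
```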

>
> >> BTW, in my experience (based on several occasions) the most frequent
> >> cause of corrupted disk blocks is due to misdesigned drives that do not
> >> react correctly to power fluctuations.
>
> >That is rarely a problem on a well designed storage array, where the
> >power management is more sophisticated than that of a server. Pulling
> >the plug on such a system should have no deleterious effects.
>
> It's also not a problem for well-designed disk drives, but yes, to
> some extent the power supply can alleviate the problems coming from
> misdesigned drives; but if the problem is between the power supply and
> the drive (i.e., a suboptimal power connection), the misdesigned drive
> will still produce corrupt blocks.

Caveat emptor.

>
> >> > Then,
> >> >when data is required for restore, lots of disks have to be spun down,
> >> >and others spun up, an activity that draws a lot of juice; it takes as
> >> >much power to spin up a disk as running it for several minutes.
>
> >> <http://www.seagate.com/files/staticfiles/docs/pdf/de-DE/datasheet/dis...
>
> >> lists a power-up power consumption of at most 2A @12V, i.e., 24W (my
> >> own measurements are in the same ballpark), and an average power
> >> consumption of 8W for the 3TB model.  It takes about 10s to spin up a
> >> drive, so spinning up takes as much as running a disk for half a
> >> minute.
>
> >Drives vary; SATA drives at 5k RPM spin up faster than high RPM SAS
> >drives at 15K, which may take minutes to stabilize at operating speed.
> >During that time, the disk isn't usable, and I stand by my assertion
> >that spin up wastes as much power as several minutes of full
> >operation.
>
> Sure, if a drive takes several minutes to spin up, it will consume as
> much power as several minutes of full operation.
>
> But who in his right mind uses an expensive and power-hungry high-RPM
> drive that takes forever to spin up for a storage solution that
> requires low power and fast spin-up?  Ok, a sales guy selling to a
> clueless and rich customer will do it, but not because of technical
> merit.

I was giving an example of slow spin up to counterpoint the "10
seconds and you're good to go" example you gave.

To spin up a RAID group of say 14 drives on a shelf of disks will
require that the drives are turned on serially in small groups. By the
time they're all turned on and ready to go, regardless of whether
they're SATA or SAS, enterprise or desktop, slow or fast, a certain
amount of time will have elapsed. In the case of systems I know and
understand -- the majority of commercially available systems --
minutes will have passed during which there has been (a) no productive
work and (b) higher than average power consumption. Then there's the
decision on when to power down; that's made after a period of
inactivity, during which there has been no productive work and
continued power consumption.

All spin-down/up schemes for infrequently accessed data have to
account for these issues, and none do so in any effective way since
crystal balls aren't part of the armoury of most storage management
systems. That's where the cluelessness plays its part.
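The serial spin-up described above can be estimated with a simple batch model. This is a minimal sketch that assumes the enclosure powers up a fixed number of drives at a time and that each batch takes the same time to become ready. The 14-drive group comes from the post; the batch size and timings are hypothetical.

```python
import math

def staggered_spinup_seconds(n_drives, drives_per_batch, seconds_per_batch):
    """Time until every drive in a RAID group is ready when the enclosure
    can only power up `drives_per_batch` drives at once."""
    batches = math.ceil(n_drives / drives_per_batch)
    return batches * seconds_per_batch

# Hypothetical: 14-drive group, 2 drives powered up at a time.
print(staggered_spinup_seconds(14, 2, 10))  # 70 s for ~10 s desktop drives
print(staggered_spinup_seconds(14, 2, 60))  # 420 s if each batch needs a minute
```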

>
> >> With the power supply you would need for the 480 tape
> >> drives, yes, you could spin them all up at the same time.  But this is
> >> typically not needed, certainly not for a saner backup management (but
> >> neither are the 480 tape drives).
>
> >I don't know where you got the idea that 480 tape drives was the
> >equivalent to 480 disk drives, but it's not an assertion I made and
> >certainly qualifies as insane.
>
> You claimed that lots of disks had to be spun up for bandwidth
> reasons, and you wrote:
>
> |It's the economics of competing with tape; big power supplies to
> |support 480 disks packed in a single rack cost lots of money.
>
> which suggests that you think that a backup solution needs 480 disks
> spun up for bandwidth reasons.

No, that was the COPAN solution. (IIRC it was the smallest COPAN
system you could buy.) Streaming backups is not a difficult task; if
all you have is a single stream, then a couple of active disks will
do. For 100s of streams to a single backup system, then you need a lot
more, and the task is correspondingly more complicated to achieve at
decent speeds.

>
> >> >For the occasional server with a handful of disk
> >> >drives, it's not so much of a problem, but at scale, even a moderate
> >> >scale in the 10s of TB range, it's unworkable.
>
> >> That's nonsense.
>
> >Why? The limiting factor isn't the disk or tape that you're backing up
> >to, but how fast you can shovel it off the server.
>
> It's nonsense, because we are backing up to disks with a total of 10s
> of TB, and it's workable, and if we wanted to back up to more disks,
> we would just use more disks.  And the main bandwidth limit is, as you
> write, getting the data off the main storage.

That was my point. If you want off-server backup, then the bandwidth
off the server is the issue. That's what kills very large disk server
systems from doing adequate & timely backups; not everyone has a
backup window. Adding more disks inside the same box isn't a backup.
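Whether a backup fits in a window depends mostly on how fast data can leave the server. This is a minimal sketch that assumes decimal terabytes and a single link whose usable throughput is a fixed fraction of line rate. The 50 TB, 10GbE and 70% figures are hypothetical.

```python
def hours_to_copy(terabytes, gigabits_per_s, efficiency=1.0):
    """Hours needed to move `terabytes` (decimal TB) over a link of
    `gigabits_per_s`, with `efficiency` as the usable fraction of line rate."""
    bits = terabytes * 1e12 * 8
    return bits / (gigabits_per_s * 1e9 * efficiency) / 3600

# Hypothetical: 50 TB off one server over a single 10GbE link at 70% efficiency.
print(f"{hours_to_copy(50, 10, 0.7):.1f} hours")  # 15.9 hours
```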

>
> >> >Again, the power economics
> >> >don't make sense for main storage where a complete stripe of 10s or
> >> >more of them need spun up to get at a single 4K file.
>
> >> Yes, striping (RAID-0) a 4KB file across tens of disks does not make
> >> sense.
>
> >Ignoring RAID-0, since RAID-any systems also stripe,
>
> RAID-1 doesn't.

True, if a nitpick, since the stripe is a mirror.

>
> >the problem is
> >that such files do get spread across an unknown number of disks.
>
> With typical block sizes, a 4KB block is not distributed across
> multiple disks, even with RAID-0.

It would appear on at least 3 disks in most modern systems using large
multi TB disks with adequate protection like RAID-6. Once as a data
block, and twice for its contribution to parity. It's at least 2 on
RAID-5 or RAID-1/10; it may be many more on systems that employ
erasure coding schemes. Without meta data (see below), it's not
possible to tell which disks to fire up to cover the blocks in
question; and the meta data is on the disks, normally well distributed
over them to increase opportunities for parallelism.
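The "at least 3 disks" figure for RAID-6 comes from its two parity blocks. P is a plain XOR across the stripe, and Q is a Reed-Solomon syndrome computed over GF(2^8). This is a minimal sketch that computes both for a tiny stripe and counts which disks change when one data block is rewritten. The field polynomial 0x11D with generator 2 is a common choice and is assumed here. The stripe contents are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8+x^4+x^3+x^2+1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D  # reduce modulo the field polynomial
        b >>= 1
    return result

def pq_parity(data_blocks):
    """RAID-6 style parity for one stripe: P is the XOR of the data blocks,
    Q is sum(g**i * D_i) over GF(2^8) with generator g = 2."""
    size = len(data_blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    coef = 1  # g**i, starting at g**0
    for block in data_blocks:
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
        coef = gf_mul(coef, 2)
    return bytes(p), bytes(q)

# Hypothetical 4-data-disk stripe with tiny 4-byte blocks.
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"AAAA", b"zzzz"]
p_old, q_old = pq_parity(stripe)

# Rewrite the third data block only.
updated = list(stripe)
updated[2] = b"ABCD"
p_new, q_new = pq_parity(updated)

disks_touched = 1 + (p_new != p_old) + (q_new != q_old)
print(disks_touched)  # 3: the data disk, the P disk and the Q disk
```

A real array would update P and Q by read-modify-write rather than recomputing the whole stripe. The set of disks whose contents change is the same either way.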

>
> >They
> >all need fired up to find even the smallest file, since it's not just
> >the file, but the meta data that needs accessed too.
>
> Meta data is often in OS caches, at least on decent OSs.

In shared system environments, caches can and do contain stale
information; coherency is a big issue, and high end clusters (both
storage and server types) spend a lot of expensive compute and wire
time (and presumably power) making sure that they are consistent.
Plus, infrequently used data should be flushed, along with its meta
data; if you don't need the former, you're unlikely to need the latter
any time soon.

>
> But yes, I agree that spin-down is not practical for main storage; but
> from what I read, the idea of COPAN was to make it practical by
> rearranging data such that frequently-accessed data resides on a few
> drives.

At last! Agreement! Yes, that was the very thing they failed to
accomplish.

> Anyway, for backups spin-down is totally practical, certainly
> the way we do our backups.  When the backup is written (or read), the
> disk spins up, and some time after the access, it spins down.

#14897

From	Bernd Paysan <bernd.paysan@gmx.de>
Date	2012-08-09 19:50 +0200
Message-ID	<16202327.eVmLZQkrAi@sunwukong.fritz.box>
In reply to	#14896

Alex McDonald wrote:
>> >Rated start/stop cycles; 250 average on/off cycles per year at the
>> >expected population AFR of 0.55% (Seagate Cheetah 15.7, enterprise
>> >class drive).
>>
>> What does AFR have to do with the horror stories about corrupted
>> data?
> 
> AFR includes corrupted data.

I can believe that a 15k RPM drive which takes minutes to stabilize will 
have start-stop problems, and will even suffer data corruption in
operation from vibrations causing writes onto the next track.  This
simply means that these drives are not built for reliability, but for 
speed.  We are talking about backup here.  If you think the Cheetah 15.7 
is the right drive to backup your data, you are simply wrong - you need 
an elephant for backups, not a cheetah.  I think you are simply wrong to
buy the Cheetah at all (no matter what metric), and not an SSD for
the same price per gigabyte - if you need the performance, the SSD will 
beat the Cheetah hands down.

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/

#14900

From	Alex McDonald <blog@rivadpm.com>
Date	2012-08-09 12:32 -0700
Message-ID	<45e00baf-cee5-4f1c-9e9a-ffae8efae594@j11g2000vbc.googlegroups.com>
In reply to	#14897

On Aug 9, 6:50 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Alex McDonald wrote:
> >> >Rated start/stop cycles; 250 average on/off cycles per year at the
> >> >expected population AFR of 0.55% (Seagate Cheetah 15.7, enterprise
> >> >class drive).
>
> >> What does AFR have to do with the horror stories about corrupted
> >> data?
>
> > AFR includes corrupted data.
>
> I can believe that a 15k RPM drive which takes minutes to stabilize

It takes a lot longer to get up to a stable spin speed, yes.

> will
> have start-stop problems, and will even suffer data corruption in
> operation from vibrations causing writes onto the next track.

Politely put, I'd say you were guessing, and that's not what I said.

> This
> simply means that these drives are not built for reliability, but for
> speed.

Because of the guesswork in the previous sentence, no doubt.

> We are talking about backup here.  If you think the Cheetah 15.7
> is the right drive to backup your data,

I don't think and didn't say any such thing. This was in the context
of spin down and the subsequent reliability, availability and so on vs
power savings that could be achieved as a "main storage" system. See
my reply to Anton.

> you are simply wrong - you need

Well, there's a surprise. Strawman up...

> an elephant for backups, not a cheetah.  I think you are simply wrong by
> buying the Cheetah at all (no matter what metric), and not an SSD for
> the same price per gigabyte - if you need the performance, the SSD will
> beat the Cheetah hands down.

...and knocked down.

Look, if you're happy with backups to large TB desktop class drives
and can afford the time and effort to do it several times to avoid the
lottery of unrecoverable disk errors, good on you. I'll withdraw
my "best of luck" comment and reserve it for the companies that take
your approach but go down the pan while footering around looking for
an end to end accurate & readable copy to do a restore.


#14901

From	Bernd Paysan <bernd.paysan@gmx.de>
Date	2012-08-09 22:07 +0200
Message-ID	<2087683.EjgH0TgcYv@sunwukong.fritz.box>
In reply to	#14900

Alex McDonald wrote:
> Look, if you're happy with backups to large TB desktop class drives
> and can afford the time and effort to do it several times to avoid the
> lottery of unrecoverable disk errors, good on you. I'll withdraw
> my "best of luck" comment and reserve it for the companies that take
> your approach but go down the pan while footering around looking for
> an end to end accurate & readable copy to do a restore.

Honestly, I don't understand what you mean.  No media is completely 100% 
reliable and error-proof.  When I did tape backups, I had them stored 
off-site, and I carried them to the off-site storage by bike.  So 
there's always the risk of a bus driving over the tape or the hard disk 
(this is regardless of how you transport them).  In either case, the 
medium is gone; neither will survive.  So whatever you do, you must
make sure that this is not the only backup you have.

And the hard disk is not a tape.  If you have really bad luck, and you 
end up in a situation where both hard disks you made the backup on have 
non-recoverable read errors on several blocks, you just mount them 
RAID-1, and read the RAID volume.  The RAID controller (or the software 
that mimics a RAID controller) will do all the work for you.  The RAID
controller also does the work for you to create duplicated backups,
almost effortlessly.
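A simplified model of the recovery described above, for illustration only: each backup disk is treated as a list of blocks, with `None` standing in for a block that returned an unrecoverable read error. This is an assumption of the sketch; a real RAID-1 controller or software RAID works on block devices, not Python lists, and `merge_mirrors` is a hypothetical name. Data is lost only when the same block is unreadable on both copies.

```python
from typing import List, Optional, Sequence

# None marks a block that could not be read from that copy (assumed model).
Block = Optional[bytes]


def merge_mirrors(copy_a: Sequence[Block], copy_b: Sequence[Block]) -> List[bytes]:
    """Rebuild one good image from two mirrored copies, block by block."""
    if len(copy_a) != len(copy_b):
        raise ValueError("mirror copies must have the same number of blocks")
    merged = []
    for index, (a, b) in enumerate(zip(copy_a, copy_b)):
        if a is not None:
            merged.append(a)       # first copy read fine
        elif b is not None:
            merged.append(b)       # fall back to the mirror
        else:
            # Same block bad on both disks: this data really is gone.
            raise IOError(f"block {index} unreadable on both copies")
    return merged


# Each disk has a different bad block, so the merge still succeeds.
print(merge_mirrors([b"a", None, b"c"], [None, b"b", b"c"]))
```

The approach relies on the two disks' bad blocks rarely coinciding, which holds only if their failures are reasonably independent.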

The only medium I bought in my five-year IT side-job career that was
damaged beyond recovery was an LTO tape.  The LTO drive ate it.  We got a
replacement for the drive on warranty, but the tape was completely 
destroyed.  This wasn't a problem, as said above - the tape was just one 
part of the redundant backup strategy, and it was destroyed while 
writing.  The other medium that did fail wasn't bought by me, and it was 
an expensive SAS drive - and this left the server without spares, 
because due to the high price, the IT department didn't have hot spares, 
and due to incompetence, they weren't informed about the problem.  I 
just saw the red light blinking on their server in the server room.  The 
cheap desktop hard disks I bought and intended to replace after two
years with newer, higher capacity ones, lasted all five years, because 
the bosses didn't understand why you should replace things which work 
perfectly fine ;-).

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/


#14904

From	Alex McDonald <blog@rivadpm.com>
Date	2012-08-09 13:58 -0700
Message-ID	<45b0cacd-fa89-4273-b991-39152ef433fa@b10g2000vbj.googlegroups.com>
In reply to	#14901

On Aug 9, 9:07 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Alex McDonald wrote:
> > Look, if you're happy with backups to large TB desktop class drives
> > and can afford the time and effort to do it several times to avoid the
> > lottery that are unrecoverable disk errors, good on you. I'll withdraw
> > my "best of luck" comment and reserve it for the companies that take
> > your approach but go down the pan while footering around looking for
> > an end to end accurate & readable copy to do a restore.
>
> Honestly, I don't understand what you mean.  No media is completely 100%
> reliable and error-proof.  When I did tape backups, I had them stored
> off-site, and I carried them to the off-site storage by bike.  So
> there's always the risk of a bus driving over the tape or the hard disk
> (this is regardless of how you transport them).  In either case, the
> medium is gone; neither will survive.  So whatever you do, you must
> make sure that this is not the only backup you have.
>
> And the hard disk is not a tape.  If you have really bad luck, and you
> end up in a situation where both hard disks you made the backup on have
> non-recoverable read errors on several blocks, you just mount them
> RAID-1, and read the RAID volume.  The RAID controller (or the software
> that mimics a RAID controller) will do all the work for you.  The RAID
> controller also does the work for you to create duplicated backups,
> almost effortlessly.
>
> The only medium I bought in my five-year IT side-job career that was
> damaged beyond recovery was an LTO tape.  The LTO drive ate it.  We got a
> replacement for the drive on warranty, but the tape was completely
> destroyed.  This wasn't a problem, as said above - the tape was just one
> part of the redundant backup strategy, and it was destroyed while
> writing.  The other medium that did fail wasn't bought by me, and it was
> an expensive SAS drive - and this left the server without spares,
> because due to the high price, the IT department didn't have hot spares,
> and due to incompetence, they weren't informed about the problem.  I
> just saw the red light blinking on their server in the server room.  The
> cheap desktop hard disks I bought and intended to replace after two
> years with newer, higher capacity ones, lasted all five years, because
> the bosses didn't understand why you should replace things which work
> perfectly fine ;-).
>
> --
> Bernd Paysan
> "If you want it done right, you have to do it yourself"
> http://bernd-paysan.de/


Humans are notoriously bad at assessing risk; crossing the road vs
flying will produce all sorts of negative responses for flying when
it's demonstrably safer. Very few IT specialists understand risk
assessment either; identify, estimate, evaluate, mitigate,
communicate, measure. Even I have sometimes forgotten this, and I was
recently undone by an unprofessional approach to my personal and my
company's data. The last disk drive I bought as a backup & archive
failed after a month. It was a desktop class MLC SSD. I will not be
repeating the experiment.


#14906

From	Paul Rubin <no.email@nospam.invalid>
Date	2012-08-09 17:36 -0700
Message-ID	<7xobmjk2gi.fsf@ruckus.brouhaha.com>
In reply to	#14900

Alex McDonald <blog@rivadpm.com> writes:
> Look, if you're happy with backups to large TB desktop class drives
> and can afford the time and effort to do it several times to avoid the
> lottery that are unrecoverable disk errors, good on you. I'll withdraw
> my "best of luck" comment and reserve it for the companies that take
> your approach but go down the pan while footering around looking for
> an end to end accurate & readable copy to do a restore.

I don't understand what the big deal is.

1) If your data is valuable, you need multiple backups in physically
dispersed locations in case of earthquake, meteor, etc. regardless.

2) The issue of disk errors is handled by a) redundancy within the
backup set (RAID and maybe some ECC applied within the dump streams),
plus b) storing checksums in the metadata and doing a verification pass
after writing the data.  This is surely more cost-effective than using
drives that are 2x as expensive so you can get by with a few percent
less redundancy.
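Point 2 b) can be sketched roughly as follows, assuming a fixed block size (4096 bytes, an arbitrary choice) and one SHA-256 digest per block kept as the "metadata". The function names are hypothetical, and the sketch covers only the checksum and verification-pass part, not RAID or ECC within the dump streams.

```python
import hashlib
import io
from typing import BinaryIO, List

BLOCK_SIZE = 4096  # arbitrary block size chosen for this sketch


def write_with_checksums(data: bytes, dest: BinaryIO,
                         block_size: int = BLOCK_SIZE) -> List[str]:
    """Write data block by block; return one SHA-256 hex digest per block."""
    digests = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        dest.write(block)
        digests.append(hashlib.sha256(block).hexdigest())
    return digests


def verify(src: BinaryIO, digests: List[str],
           block_size: int = BLOCK_SIZE) -> List[int]:
    """Verification pass: re-read from the start, return indices of bad blocks."""
    src.seek(0)
    bad = []
    for index, expected in enumerate(digests):
        block = src.read(block_size)
        if hashlib.sha256(block).hexdigest() != expected:
            bad.append(index)
    return bad


if __name__ == "__main__":
    backup = io.BytesIO()  # stands in for the backup medium
    sums = write_with_checksums(b"x" * 10000, backup)
    print(verify(backup, sums))  # prints [] when every block reads back intact
```

A block reported by `verify` would then be rewritten, or recovered from the redundant copy in a).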


#14907

From	Alex McDonald <blog@rivadpm.com>
Date	2012-08-10 04:13 -0700
Message-ID	<12b725a7-de8a-4dba-bb51-043077c8adb6@m13g2000vbd.googlegroups.com>
In reply to	#14906

On Aug 10, 1:36 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> Alex McDonald <b...@rivadpm.com> writes:
> > Look, if you're happy with backups to large TB desktop class drives
> > and can afford the time and effort to do it several times to avoid the
> > lottery that are unrecoverable disk errors, good on you. I'll withdraw
> > my "best of luck" comment and reserve it for the companies that take
> > your approach but go down the pan while footering around looking for
> > an end to end accurate & readable copy to do a restore.
>
> I don't understand what the big deal is.
>
> 1) If your data is valuable, you need multiple backups in physically
> dispersed locations in case of earthquake, meteor, etc. regardless.
>
> 2) The issue of disk errors is handled by a) redundancy within the
> backup set (RAID and maybe some ECC applied within the dump streams),
> plus b) storing checksums in the metadata and doing a verification pass
> after writing the data.  This is surely more cost-effective than using
> drives that are 2x as expensive so you can get by with a few percent
> less redundancy.

We've been over a lot of ground (probably OT for CLF, but even so more
interesting than Gavino on-topic).

I haven't advocated "2x more expensive drives" because I'm paid a
penny on every sale. There was also some discussion about the
bandwidth of shipping data that got lost in airline timetables and the
quality of coffee but I haven't suggested that the airlines should
drop their prices or that datacenters should be near sources of fine
Arabica beans either (well, perhaps I did tongue in cheek to Anton).

All I'm advocating is a robust backup (and I provided some information
to explain what can mitigate the issues of data corruption or loss),
and disk dumps to large multi-TB desktop drives are a no-no in my
book. The rest fell out of that discussion.


#14938

From	Paul Rubin <no.email@nospam.invalid>
Date	2012-08-11 20:27 -0700
Message-ID	<7xfw7sx00o.fsf@ruckus.brouhaha.com>
In reply to	#14907

Alex McDonald <blog@rivadpm.com> writes:
> All I'm advocating is a robust backup (and I provided some information
> to explain what can mitigate the issues of data corruption or loss),
> and disk dumps to large multi-TB desktop drives are a no-no in my
> book. The rest fell out of that discussion.

OK, I'm just missing the part about what's wrong with desktop drives
compared with enterprise drives.  You listed a number of issues but it
seems to me that all of them can be handled by software.  When hundreds or
thousands of drives are involved, a 2x cost difference per drive adds up to
a lot of cash, so it has to be justified rather rigorously.
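To make the arithmetic concrete, here is a worked comparison with made-up numbers that do not come from the thread: 1000 data drives, a desktop drive at 100 currency units, an enterprise drive at twice that, and the desktop setup buying five percentage points more redundancy (15% versus 10%). The function name and all figures are hypothetical.

```python
def fleet_cost(data_drives: int, unit_price: int, redundancy_pct: int) -> int:
    """Cost of the data drives plus redundant drives (rounded up), in integer units."""
    extra = -(-data_drives * redundancy_pct // 100)  # ceiling division on integers
    return (data_drives + extra) * unit_price


# Hypothetical figures for illustration only.
desktop = fleet_cost(1000, 100, 15)     # 1150 drives at 100 each
enterprise = fleet_cost(1000, 200, 10)  # 1100 drives at 200 each
print(desktop, enterprise, enterprise - desktop)  # 115000 220000 105000
```

Under these assumptions the enterprise fleet costs almost twice as much despite needing fewer redundant drives. Whether the extra redundancy actually closes the reliability gap is the point Alex disputes.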
