Newsgroup: comp.os.linux.misc

linux raid vs hw raid

Started by: Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us>
First post: 2011-04-05 19:39 -0700
Last post: 2011-04-12 03:37 +0000
Articles: 20 on this page of 49, 12 participants


Contents

  linux raid vs hw raid Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> - 2011-04-05 19:39 -0700
    Re: linux raid vs hw raid Tim Watts <tw@dionic.net> - 2011-04-06 08:01 +0100
      Re: linux raid vs hw raid David Brown <david@westcontrol.removethisbit.com> - 2011-04-06 10:03 +0200
        Re: linux raid vs hw raid Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> - 2011-04-06 14:00 -0700
          Re: linux raid vs hw raid David Brown <david.brown@removethis.hesbynett.no> - 2011-04-06 23:42 +0200
          Re: linux raid vs hw raid Grant <omg@grrr.id.au> - 2011-04-08 10:45 +1000
            Re: linux raid vs hw raid David Brown <david@westcontrol.removethisbit.com> - 2011-04-08 11:12 +0200
              Re: linux raid vs hw raid Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> - 2011-04-08 08:22 -0700
                Re: linux raid vs hw raid Grant <omg@grrr.id.au> - 2011-04-09 09:51 +1000
                  Re: linux raid vs hw raid Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> - 2011-04-08 17:10 -0700
                    Re: linux raid vs hw raid David Brown <david.brown@removethis.hesbynett.no> - 2011-04-09 13:14 +0200
              Re: linux raid vs hw raid Grant <omg@grrr.id.au> - 2011-04-09 09:47 +1000
                Re: linux raid vs hw raid David Brown <david.brown@removethis.hesbynett.no> - 2011-04-09 13:55 +0200
            Re: linux raid vs hw raid Tris Orendorff <triso@remove-me.cogeco.ca> - 2011-04-12 18:04 +0000
              Re: linux raid vs hw raid Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> - 2011-04-12 11:34 -0700
                Re: linux raid vs hw raid The Natural Philosopher <tnp@invalid.invalid> - 2011-04-12 21:13 +0100
                  Re: linux raid vs hw raid David Brown <david@westcontrol.removethisbit.com> - 2011-04-13 09:45 +0200
                    Re: linux raid vs hw raid Grant <omg@grrr.id.au> - 2011-04-14 13:42 +1000
                      Re: linux raid vs hw raid David Brown <david@westcontrol.removethisbit.com> - 2011-04-14 09:15 +0200
                        Re: linux raid vs hw raid Grant <omg@grrr.id.au> - 2011-04-15 08:03 +1000
                          Re: linux raid vs hw raid Tim Watts <tw@dionic.net> - 2011-04-15 07:22 +0100
                          Re: linux raid vs hw raid David Brown <david@westcontrol.removethisbit.com> - 2011-04-15 09:28 +0200
                            Re: linux raid vs hw raid Grant <omg@grrr.id.au> - 2011-04-19 11:20 +1000
                Re: linux raid vs hw raid Grant <omg@grrr.id.au> - 2011-04-14 13:38 +1000
                  Re: linux raid vs hw raid Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> - 2011-04-13 21:49 -0700
              Re: linux raid vs hw raid Grant <omg@grrr.id.au> - 2011-04-14 13:34 +1000
                Re: linux raid vs hw raid Tris Orendorff <triso@remove-me.cogeco.ca> - 2011-04-15 21:59 +0000
                  Re: linux raid vs hw raid "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2011-04-16 00:56 +0200
                    Re: linux raid vs hw raid The Natural Philosopher <tnp@invalid.invalid> - 2011-04-16 01:32 +0100
    Re: linux raid vs hw raid Tauno Voipio <tauno.voipio@notused.fi.invalid> - 2011-04-08 21:38 +0300
      Re: linux raid vs hw raid Grant <omg@grrr.id.au> - 2011-04-09 09:53 +1000
    Re: linux raid vs hw raid KR <kristian.rasmussen@broadpark.no.spam.com> - 2011-04-09 11:56 +0200
      Re: linux raid vs hw raid Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> - 2011-04-09 10:32 -0700
        Re: linux raid vs hw raid Grant <omg@grrr.id.au> - 2011-04-10 11:12 +1000
          Re: linux raid vs hw raid Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> - 2011-04-09 18:59 -0700
            Re: linux raid vs hw raid KR <kristian.rasmussen@broadpark.no.spam.com> - 2011-04-10 04:32 +0200
              Re: linux raid vs hw raid Grant <omg@grrr.id.au> - 2011-04-10 12:46 +1000
                Re: linux raid vs hw raid Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> - 2011-04-09 20:39 -0700
      Re: linux raid vs hw raid Robert Riches <spamtrap42@jacob21819.net> - 2011-04-10 03:47 +0000
        Re: linux raid vs hw raid Balwinder S Dheeman <bsd.SANSPAM@anu.homelinux.net> - 2011-04-10 11:11 +0530
          Re: linux raid vs hw raid Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> - 2011-04-09 23:29 -0700
            Re: linux raid vs hw raid Balwinder S Dheeman <bsd.SANSPAM@anu.homelinux.net> - 2011-04-10 14:05 +0530
          Re: linux raid vs hw raid Grant <omg@grrr.id.au> - 2011-04-10 20:16 +1000
            Re: linux raid vs hw raid Tim Watts <tw@dionic.net> - 2011-04-10 11:28 +0100
            Re: linux raid vs hw raid Balwinder S Dheeman <bsd.SANSPAM@anu.homelinux.net> - 2011-04-10 19:43 +0530
          Re: linux raid vs hw raid Robert Riches <spamtrap42@jacob21819.net> - 2011-04-12 03:44 +0000
            Re: linux raid vs hw raid Balwinder S Dheeman <bsd.SANSPAM@anu.homelinux.net> - 2011-04-12 13:56 +0530
        Re: linux raid vs hw raid Grant <omg@grrr.id.au> - 2011-04-10 20:09 +1000
          Re: linux raid vs hw raid Robert Riches <spamtrap42@jacob21819.net> - 2011-04-12 03:37 +0000


#605 — linux raid vs hw raid

From: Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us>
Date: 2011-04-05 19:39 -0700
Subject: linux raid vs hw raid
Message-ID: <fc0t68x5ci.ln2@goaway.wombat.san-francisco.ca.us>

Hi all,

I am attempting to build a snapshot server for a ~15TB fileserver with
old fileserver hardware I have on hand.  My initial plan was to use the
hardware card in the old fileserver in a RAID50 (the card is old enough
that it doesn't support RAID6 natively) using new 2TB enterprise hard
drives.  But, as you probably know, these drives are reasonably
expensive.  So, since this machine will not be used by end-users very
much, I was contemplating using linux software raid instead, exporting
desktop-class drives as JBODs and using mdadm to RAID them.

The obvious advantage to this is cost: I can save almost 40% of my
original estimate by using desktop drives instead, thus fulfilling the
original meaning of the I of the RAID acronym.  There are other
advantages, as well, including being able to build a RAID6, which I
slightly prefer over a RAID50, and having more flexibility later on if I
want to move to bigger disks.  (Yes, I have seen the documentation
warning against too-large RAID arrays resulting in a failure during a
rebuild.)  A tertiary advantage would be that I would learn how to work
with linux software RAID, a skill I haven't yet acquired.

The disadvantages I can think of are: higher probability of disk
failures, resulting in more work for me in swapping out and RMAing
failed drives; potential degradation in performance, due both to RAID in
software and slower disks; a learning curve for linux RAID; and a
configuration less likely to be supported by the hardware RAID vendor.

My counters to most of the disadvantages would be that performance only
has to be decent, not great, on this box; the learning curve shouldn't
be too bad; and this configuration shouldn't require support from the
hardware RAID vendor anyway.  The disk failures would be the only issue
I couldn't counter, except by trying to determine if my labor costs
would end up being more than the savings in moving to cheaper disks.

My questions:

1) Has anyone done this before, and if so, what were the results?  Was
performance acceptable in this configuration?  Are there any gotchas to
an otherwise workable configuration?

2) From what I've read so far, using desktop-class disks with linux
software RAID should not be a major problem, unlike using them on a true
hardware RAID card.  Is this reasonably accurate?  If not, are there
links that describe the difficulties?

3) Suppose that my RAID6 starts out using 12 2TB disks, with three free
drive bays (one would be a hot spare).  Later on, I want to seamlessly
replace the 2TB disks with 3TB or larger disks.  Can mdadm grow an array
like this if, say, I replace one drive, rebuild, and repeat until I've
replaced all 12 disks with larger ones?  Or will the new 3TB disks only
be used up to 2TB, the size of the original disks?

Thanks for any advice or pointers you can provide!

--keith

-- 
kkeller-usenet@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information



#608

From: Tim Watts <tw@dionic.net>
Date: 2011-04-06 08:01 +0100
Message-ID: <dmft68-2q8.ln1@squidward.dionic.net>
In reply to: #605

Keith Keller wrote:

> Hi all,
> 
> I am attempting to build a snapshot server for a ~15TB fileserver with
> old fileserver hardware I have on hand.  My initial plan was to use the
> hardware card in the old fileserver in a RAID50 (the card is old enough
> that it doesn't support RAID6 natively) using new 2TB enterprise hard
> drives.  But, as you probably know, these drives are reasonably
> expensive.  So, since this machine will not be used by end-users very
> much, I was contemplating using linux software raid instead, exporting
> desktop-class drives as JBODs and using mdadm to RAID them.
> 
> The obvious advantage to this is cost: I can save almost 40% of my
> original estimate by using desktop drives instead, thus fulfilling the
> original meaning of the I of the RAID acronym.  There are other
> advantages, as well, including being able to build a RAID6, which I
> slightly prefer over a RAID50, and having more flexibility later on if I
> want to move to bigger disks.  (Yes, I have seen the documentation
> warning against too-large RAID arrays resulting in a failure during a
> rebuild.)  A tertiary advantage would be that I would learn how to work
> with linux software RAID, a skill I haven't yet acquired.
> 
> The disadvantages I can think of are: higher probability of disk
> failures, resulting in more work for me in swapping out and RMAing
> failed drives; potential degradation in performance, due both to RAID in
> software and slower disks; a learning curve for linux RAID; and a
> configuration less likely to be supported by the hardware RAID vendor.

Hi,

Highly dependent on your server and RAID card of course, but you may find MD
software raid is quicker.

Even an older server has far more CPU horsepower available compared to a
mediocre RAID card (and by mediocre, I mean anything costing less than
hundreds of pounds).

> My counters to most of the disadvantages would be that performance only
> has to be decent, not great, on this box; the learning curve shouldn't
> be too bad; and this configuration shouldn't require support from the
> hardware RAID vendor anyway.  The disk failures would be the only issue
> I couldn't counter, except by trying to determine if my labor costs
> would end up being more than the savings in moving to cheaper disks.

The learning curve is fairly easy with mdadm - furthermore, linux MD is now 
more functionally complete than all but the better end *modern* hardware 
RAID systems. Specifically, some things linux will do that a lot of 
older/cheaper HW RAID won't:

1) Attempt to rewrite a disk block that has failed to read <- triggers a
bad block remap on most drives.

2) If you run the monitor daemon, linux will alert you if stuff goes bad, eg
failed disk (OK, a crappy HW raid knows this, but can it alert you by email
or does it just sit there with a flashing red LED?)

3) Perform a full sweep and parity verify on demand.

There are more, but those are what I consider most useful.
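
[Editor's sketch, not part of the original post.] For point 3, a minimal illustration of how an on-demand check can be requested from the Linux md driver, assuming the array's sysfs directory exposes the `sync_action` and `mismatch_cnt` attributes. The helper names and the `sysfs_root` parameter are hypothetical, chosen only for this example; writing under the real /sys needs root.

```python
from pathlib import Path


def md_attr(array: str, attr: str, sysfs_root: str = "/sys") -> Path:
    """Path of an md sysfs attribute, e.g. /sys/block/md0/md/sync_action."""
    return Path(sysfs_root) / "block" / array / "md" / attr


def request_check(array: str, sysfs_root: str = "/sys") -> None:
    """Ask the md driver to read the whole array and verify its redundancy."""
    md_attr(array, "sync_action", sysfs_root).write_text("check\n")


def mismatch_count(array: str, sysfs_root: str = "/sys") -> int:
    """Mismatch counter reported by the driver after a check has run."""
    return int(md_attr(array, "mismatch_cnt", sysfs_root).read_text().strip())
```

Calling `request_check("md0")` as root would start the sweep; progress then shows up in /proc/mdstat. Many distributions already schedule such a check from cron, so it is worth looking for an existing job before adding one.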

> My questions:
> 
> 1) Has anyone done this before, and if so, what were the results?  Was
> performance acceptable in this configuration?  Are there any gotchas to
> an otherwise workable configuration?

Yep - been running SW raid 5 at home on 1.5TB total for 3 years. I have used
a lot of mid range RAID controllers too (Chaparral, Infotrend, ARECA,
Eurologic).

> 2) From what I've read so far, using desktop-class disks with linux
> software RAID should not be a major problem, unlike using them on a true
> hardware RAID card.  Is this reasonably accurate?  If not, are there
> links that describe the difficulties?

Yep - desktop drives are fine. Enterprise class or "RAID Edition" may be better
quality and/or quicker. Quicker is usually related to RPM and at least is
checkable in the specifications. "Well built" is more abstract. I prefer to
use a mixture of makes in the same server (eg Hitachi, Seagate, Fujitsu, WD)
- that way, you lessen the risk of the "Maxtor Deathstar" whole bunch failing
at once syndrome.

> 3) Suppose that my RAID6 starts out using 12 2TB disks, with three free
> drive bays (one would be a hot spare).  Later on, I want to seamlessly
> replace the 2TB disks with 3TB or larger disks.  Can mdadm grow an array
> like this if, say, I replace one drive, rebuild, and repeat until I've
> replaced all 12 disks with larger ones?  Or will the new 3TB disks only
> be used up to 2TB, the size of the original disks?

RAID5/6 need to be spread over identically sized partitions. So you can't 
add a 3TB drive to a 2TB disk based array. You can partition and make a new 
RAID across the 1TB partition. This is where ZFS gets clever, but that's not 
really an option for linux (BTRFS will probably get there one day).

> Thanks for any advice or pointers you can provide!

One thing, whichever system you go for: set it up and do some speed and 
breakage tests to make sure it all works correctly - pull a disk out live, 
be sure you know how to put the disk back and bring the array back to fault 
tolerant and stuff like that.

It's good fun, enjoy :)

Cheers

Tim

> --keith
> 

-- 
Tim Watts



#610

From: David Brown <david@westcontrol.removethisbit.com>
Date: 2011-04-06 10:03 +0200
Message-ID: <Xe-dnXd4LLb1gwHQnZ2dnUVZ7sWdnZ2d@lyse.net>
In reply to: #608

On 06/04/2011 09:01, Tim Watts wrote:
> Keith Keller wrote:
>
>> Hi all,
>>
>> I am attempting to build a snapshot server for a ~15TB fileserver with
>> old fileserver hardware I have on hand.  My initial plan was to use the
>> hardware card in the old fileserver in a RAID50 (the card is old enough
>> that it doesn't support RAID6 natively) using new 2TB enterprise hard
>> drives.  But, as you probably know, these drives are reasonably
>> expensive.  So, since this machine will not be used by end-users very
>> much, I was contemplating using linux software raid instead, exporting
>> desktop-class drives as JBODs and using mdadm to RAID them.
>>
>> The obvious advantage to this is cost: I can save almost 40% of my
>> original estimate by using desktop drives instead, thus fulfilling the
>> original meaning of the I of the RAID acronym.  There are other
>> advantages, as well, including being able to build a RAID6, which I
>> slightly prefer over a RAID50, and having more flexibility later on if I
>> want to move to bigger disks.  (Yes, I have seen the documentation
>> warning against too-large RAID arrays resulting in a failure during a
>> rebuild.)  A tertiary advantage would be that I would learn how to work
>> with linux software RAID, a skill I haven't yet acquired.
>>
>> The disadvantages I can think of are: higher probability of disk
>> failures, resulting in more work for me in swapping out and RMAing
>> failed drives; potential degradation in performance, due both to RAID in
>> software and slower disks; a learning curve for linux RAID; and a
>> configuration less likely to be supported by the hardware RAID vendor.
>
> Hi,
>
> Highly dependent on your server and RAID card of course, but you may find MD
> software raid is quicker.
>
> Even an older server has far more CPU horsepower available compared to a
> mediocre RAID card (and by mediocre, I mean anything costing less than
> hundreds of pounds).
>

I'd go further than that and say that software raid will be faster 
unless your hardware raid card costs many 1000's of pounds.  Unless you 
are using the sort of raid card that comes with its own backup battery 
for caching, then mdadm raid is going to be faster with a modern 
processor.  Even with such a card, mdadm raid is probably going to be 
faster for raid 5 or raid 6, simply because the host has access to more 
memory for caching stripes.

A key bottleneck to consider is IO throughput, rather than CPU power. 
This is especially true for RAID1 setups - doing the RAID1 on a hardware 
card halves the IO on the host.

However, if the server is old enough, there was a time when commonly 
used hardware raid cards were faster than doing it in software on the 
host.  In particular, if the host is single core, or Intel's old and 
crappy shared bus SMP, then a hardware raid card will be faster.

Not that this matters too much to the OP, of course!

>> My counters to most of the disadvantages would be that performance only
>> has to be decent, not great, on this box; the learning curve shouldn't
>> be too bad; and this configuration shouldn't require support from the
>> hardware RAID vendor anyway.  The disk failures would be the only issue
>> I couldn't counter, except by trying to determine if my labor costs
>> would end up being more than the savings in moving to cheaper disks.
>
> The learning curve is fairly easy with mdadm - furthermore, linux MD is now
> more functionally complete than all but the better end *modern* hardware
> RAID systems. Specifically, some things linux will do that a lot of
> older/cheaper HW RAID won't:
>
> 1) Attempt to rewrite a disk block that has failed to read <- triggers a
> bad block remap on most drives.
>
> 2) If you run the monitor daemon, linux will alert you if stuff goes bad, eg
> failed disk (OK, a crappy HW raid knows this, but can it alert you by email
> or does it just sit there with a flashing red LED?)
>
> 3) Perform a full sweep and parity verify on demand.
>
> There are more, but those are what I consider most useful.
>

One hint about learning mdadm - with mdadm, you can build your arrays 
from partitions, not just whole disks.  So you can give your disks a 4 
GB partition at the start and use that when testing and learning - it's 
a lot easier to learn when your rebuild times are a couple of minutes, 
rather than most of the day!

One thing to practice is identifying drives - when a drive fails, you 
want to be very sure of which one you should be replacing :-)

>> My questions:
>>
>> 1) Has anyone done this before, and if so, what were the results?  Was
>> performance acceptable in this configuration?  Are there any gotchas to
>> an otherwise workable configuration?
>
> Yep - been running SW raid 5 at home on 1.5TB total for 3 years. I have used
> a lot of mid range RAID controllers too (Chaparral, Infotrend, ARECA,
> Eurologic).
>

I haven't tried any hardware raid cards seriously, but I've used mdadm 
raid often on servers and desktops.

Personally, I like RAID10 with "far" layout - it gives you greater 
safety than RAID5 or RAID6, and most of the speed of RAID0.  It works 
well with 2 or 3 disks (something that no hardware raid card can do).

>> 2) From what I've read so far, using desktop-class disks with linux
>> software RAID should not be a major problem, unlike using them on a true
>> hardware RAID card.  Is this reasonably accurate?  If not, are there
>> links that describe the difficulties?
>
> Yep - desktop drives are fine. Enterprise class or "RAID Edition" may be better
> quality and/or quicker. Quicker is usually related to RPM and at least is
> checkable in the specifications. "Well built" is more abstract. I prefer to
> use a mixture of makes in the same server (eg Hitachi, Seagate, Fujitsu, WD)
> - that way, you lessen the risk of the "Maxtor Deathstar" whole bunch failing
> at once syndrome.
>

I am not convinced that enterprise class disks really offer much more 
than desktop disks if you have a reasonable environment (not too hot or 
cold, reliable power, etc.).  There will be a difference in the expected 
lifetimes of the drives - but since disk failures are actually fairly 
rare, it won't show in the statistics unless you have hundreds of drives 
or drive them under very heavy load.

>> 3) Suppose that my RAID6 starts out using 12 2TB disks, with three free
>> drive bays (one would be a hot spare).  Later on, I want to seamlessly
>> replace the 2TB disks with 3TB or larger disks.  Can mdadm grow an array
>> like this if, say, I replace one drive, rebuild, and repeat until I've
>> replaced all 12 disks with larger ones?  Or will the new 3TB disks only
>> be used up to 2TB, the size of the original disks?
>
> RAID5/6 need to be spread over identically sized partitions. So you can't
> add a 3TB drive to a 2TB disk based array. You can partition and make a new
> RAID across the 1TB partition. This is where ZFS gets clever, but that's not
> really an option for linux (BTRFS will probably get there one day).
>

You can increase the size of the RAID5/6 devices (whole disks, or 
partitions) if you re-size them all.  So if you replace one 2 TB drive 
with a 3 TB drive and let it rebuild, you can't use more than the first 
2 TB.  But if you continue the process and replace all of the drives, 
you can then "grow" the array to use the new space.
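
[Editor's sketch, not part of the original post.] The replace-one-at-a-time process can be illustrated with a small model, assuming md uses the same amount of space on every member so that the smallest member sets the limit. The function name is made up for this example.

```python
def raid6_usable_tb(member_sizes_tb):
    """Potential RAID6 capacity: (members - 2) times the smallest member."""
    if len(member_sizes_tb) < 4:
        raise ValueError("RAID6 needs at least four members")
    return (len(member_sizes_tb) - 2) * min(member_sizes_tb)


members = [2] * 12            # twelve 2 TB disks to start with
for slot in range(12):
    members[slot] = 3         # swap one disk for a 3 TB disk and rebuild
    print(f"{slot + 1:2d} replaced: {raid6_usable_tb(members)} TB potential")
```

The figure stays at 20 TB until the last 2 TB disk is gone and only then becomes 30 TB. That is potential capacity only: the array keeps its old size until it is explicitly grown (with mdadm this is, as far as I know, `mdadm --grow --size=max` on the array), and the LVM physical volume and filesystem on top then need resizing as well.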

Another option for growth is to use mdadm over partitions, rather than 
whole disks.  Then when you add bigger disks, you have spare space that 
you can make into new partitions, and make another mdadm raid using 
them.  If you are using LVM to organise your real partitions (which I
highly recommend), then you can add your new raid as a new physical
volume and extend your working space.


One other thing to think about if you are planning to replace disks, is 
that you are reducing your redundancy while it is happening.  For 
example, if you have a RAID5 array and you pull one drive to replace it 
with a bigger drive, then you have no redundancy during that operation.
With RAID6 you have one drive redundancy rather than two.  And like
all rebuilds, the rebuild for the drive replacement is particularly 
stressful for the rest of the disks in the array - and you are going to 
do the whole operation 12 times in a row.

But the beauty of md raid is its flexibility.  Rather than use twelve 
disks in a RAID6, build twelve RAID1 pairs from a real drive and a 
missing drive.  Then build your RAID6 on top of these "pairs".  The 
result is the same in terms of speed, capacity and redundancy.  But when 
you want to replace a drive with a bigger disk, you do it by adding the 
new drive to one of the pairs and letting the pair "rebuild".  Then you 
remove the old disk from the pair.  You keep the same redundancy over 
the whole array throughout the operation, and the rebuild is done as a 
mirror copy from one disk - the other drives are unaffected.  You can 
happily do the replacement with multiple disks in parallel - as many as 
you have spare drive bays.
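
[Editor's sketch, not part of the original post.] The difference between the two replacement strategies can be put in numbers with a crude model. It assumes that rebuilding a RAID6 member reads every other member in full, that a mirror copy reads only the one old disk, and it counts full-disk reads as a rough proxy for stress. The function names are hypothetical.

```python
def in_place_replacement(n_disks: int, parity: int = 2) -> dict:
    """Pull one member, insert the new disk, rebuild from the rest; repeat."""
    return {
        # One member is missing during every rebuild.
        "failures_tolerated_during_swap": parity - 1,
        # Each of the n rebuilds reads the other n - 1 members end to end.
        "full_disk_reads": n_disks * (n_disks - 1),
    }


def mirror_replacement(n_disks: int, parity: int = 2) -> dict:
    """Add the new disk to a one-sided RAID1, let it sync, drop the old disk."""
    return {
        # The RAID6 layer never loses a member.
        "failures_tolerated_during_swap": parity,
        # Each swap copies from exactly one old disk.
        "full_disk_reads": n_disks,
    }


print(in_place_replacement(12))
print(mirror_replacement(12))
```

For twelve disks this model gives 132 full-disk reads for in-place replacement against 12 for the mirror approach, with the array staying double-redundant throughout in the second case.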

(Future plans for md include "hot replace" functionality that will 
effectively automate this, but that's for the future.)

>> Thanks for any advice or pointers you can provide!
>
> One thing, whichever system you go for: set it up and do some speed and
> breakage tests to make sure it all works correctly - pull a disk out live,
> be sure you know how to put the disk back and bring the array back to fault
> tolerant and stuff like that.
>
> It's good fun, enjoy :)
>
> Cheers
>
> Tim
>
>> --keith
>>
>



#612

From: Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us>
Date: 2011-04-06 14:00 -0700
Message-ID: <kr0v68xq47.ln2@goaway.wombat.san-francisco.ca.us>
In reply to: #610

Hello Tim, David, thanks so much for your comments.

I do want to make specific comments, but in general, it seems like the
take-home message is that I'm not completely stupid or insane for
thinking about attempting this.  That's what I suspected, but I do feel
a little better having it confirmed.

On 2011-04-06, David Brown <david@westcontrol.removethisbit.com> wrote:
> On 06/04/2011 09:01, Tim Watts wrote:
>>
>> Highly dependent on your server and RAID card of course, but you may find MD
>> software raid is quicker.

Yes, and I probably should have mentioned the card: it's a 3ware 9550SX,
with no BBU, on a 64bit dual-core machine.  So, based on yours and
David's comments, I probably shouldn't expect significantly worse
performance, and may even be better.  That's really all I desire given
the intended purpose.

>> The learning curve is fairly easy with mdadm - furthermore, linux MD is now
>> more functionally complete than all but the better end *modern* hardware
>> RAID systems. Specifically, some things linux will do that a lot of
>> older/cheaper HW RAID won't:
>>
>> 1) Attempt to rewrite a disk block that has failed to read <- triggers a
>> bad block remap on most drives.
>>
>> 2) If you run the monitor daemon, linux will alert you if stuff goes bad, eg
>> failed disk (OK, a crappy HW raid knows this, but can it alert you by email
>> or does it just sit there with a flashing red LED?)
>>
>> 3) Perform a full sweep and parity verify on demand.

I believe the 9550 will do #2, and it definitely does #3, with email
alerts (which I direct to my cell phone via my SMS gateway).  I did have
to work with RAID controllers which would simply blink, which was
incredibly frustrating.

> One hint about learning mdadm - with mdadm, you can build your arrays 
> from partitions, not just whole disks.  So you can give your disks a 4 
> GB partition at the start and use that when testing and learning - it's 
> a lot easier to learn when your rebuild times are a couple of minutes, 
> rather than most of the day!

Great suggestion!

> One thing to practice is identifying drives - when a drive fails, you 
> want to be very sure of which one you should be replacing :-)

Oh boy, I learned that The Hard Way (TM) many years ago, when I
accidentally pulled the wrong drive bay on a server with a failed disk.
Now I number the drive bays, verify twice that I have the right bay,
generate disk activity (or use the "identify drive" feature to blink the
light) to be sure I'm pulling an inactive drive, do that again, go back
and verify the right bay again, then pull the drive with fingers and
toes crossed.  (Fortunately, my mistake with the wrong drive wasn't
catastrophic, but it definitely made extra work for me.)

> Personally, I like RAID10 with "far" layout - it gives you greater 
> safety than RAID5 or RAID6, and most of the speed of RAID0.

Is that the "far replicas" described in the man page for md(4)?

My concern about RAID10 is that I'll lose too much capacity to
redundancy.  Because this is a snapshot server, I really need to
maximize available storage space; if I have 12 drive bays, with 2TB
drives I'd get only 12TB of usable space from a RAID10; even with 3TB
drives that's only 18TB (if my math is right).  Whereas, a RAID6 with
12 2TB drives gets me 20TB usable.  (If this were my primary fileserver
I'd be more likely to consider a RAID10.)
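
[Editor's sketch, not part of the original post.] The capacity figures above can be checked with a few lines, assuming RAID10 keeps two copies of everything and RAID6 gives up two members' worth of space to parity. Hot spares are not counted, and the function name is made up for this example.

```python
def usable_tb(level: str, n_disks: int, disk_tb: float) -> float:
    """Usable capacity in TB for the two layouts discussed in the thread."""
    if level == "raid10":
        return n_disks * disk_tb / 2       # two copies of every block
    if level == "raid6":
        return (n_disks - 2) * disk_tb     # two members' worth of parity
    raise ValueError(f"unsupported level: {level}")


for level, size in [("raid10", 2), ("raid10", 3), ("raid6", 2)]:
    print(f"{level:6s} 12 x {size} TB -> {usable_tb(level, 12, size):g} TB")
```

This reproduces the 12 TB, 18 TB and 20 TB figures: with twelve bays, RAID6 on 2 TB disks still holds more than RAID10 on 3 TB disks.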

> Another option for growth is to use mdadm over partitions, rather than 
> whole disks.  Then when you add bigger disks, you have spare space that 
> you can make into new partitions, and make another mdadm raid using 
> them.  If you are using LVM to organise your real partitions (which I
> highly recommend), then you can add your new raid as a new physical
> volume and extend your working space.

Yes, I use LVM.  Using partitions sounds like a great idea, and is
definitely something that I can't get out of a hardware RAID controller
(another reason I'm leaning this way).

> But the beauty of md raid is its flexibility.  Rather than use twelve 
> disks in a RAID6, build twelve RAID1 pairs from a real drive and a 
> missing drive.  Then build your RAID6 on top of these "pairs".  The 
> result is the same in terms of speed, capacity and redundancy.  But when 
> you want to replace a drive with a bigger disk, you do it by adding the 
> new drive to one of the pairs and letting the pair "rebuild".  Then you 
> remove the old disk from the pair.  You keep the same redundancy over 
> the whole array throughout the operation, and the rebuild is done as a 
> mirror copy from one disk - the other drives are unaffected.  You can 
> happily do the replacement with multiple disks in parallel - as many as 
> you have spare drive bays.

Another fantastic idea!  (Though I'm guessing the RAID1s will somehow
show up as ''failed''; I would need to work around that for paging
purposes.)

Again, thanks for the thoughtful responses!

--keith

-- 
kkeller-usenet@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information



#613

From: David Brown <david.brown@removethis.hesbynett.no>
Date: 2011-04-06 23:42 +0200
Message-ID: <ididnbdDi9KnQwHQnZ2dnUVZ8imdnZ2d@lyse.net>
In reply to: #612

On 06/04/11 23:00, Keith Keller wrote:
> Hello Tim, David, thanks so much for your comments.
>
> I do want to make specific comments, but in general, it seems like the
> take-home message is that I'm not completely stupid or insane for
> thinking about attempting this.  That's what I suspected, but I do feel
> a little better having it confirmed.
>
> On 2011-04-06, David Brown <david@westcontrol.removethisbit.com> wrote:
>> On 06/04/2011 09:01, Tim Watts wrote:
>>>
>>> Highly dependent on your server and RAID card of course, but you may find MD
>>> software raid is quicker.
>
> Yes, and I probably should have mentioned the card: it's a 3ware 9550SX,
> with no BBU, on a 64bit dual-core machine.  So, based on yours and
> David's comments, I probably shouldn't expect significantly worse
> performance, and may even be better.  That's really all I desire given
> the intended purpose.
>
>>> The learning curve is fairly easy with mdadm - furthermore, linux MD is now
>>> more functionally complete than all but the better end *modern* hardware
>>> RAID systems. Specifically, some things linux will do that a lot of
>>> older/cheaper HW RAID won't:
>>>
>>> 1) Attempt to rewrite a disk block that has failed to read <- triggers a
>>> bad block remap on most drives.
>>>
>>> 2) If you run the monitor daemon, linux will alert you if stuff goes bad, eg
>>> failed disk (OK, a crappy HW raid knows this, but can it alert you by email
>>> or does it just sit there with a flashing red LED?)
>>>
>>> 3) Perform a full sweep and parity verify on demand.
>
> I believe the 9550 will do #2, and it definitely does #3, with email
> alerts (which I direct to my cell phone via my SMS gateway).  I did have
> to work with RAID controllers which would simply blink, which was
> incredibly frustrating.
>
>> One hint about learning mdadm - with mdadm, you can build your arrays
>> from partitions, not just whole disks.  So you can give your disks a 4
>> GB partition at the start and use that when testing and learning - it's
>> a lot easier to learn when your rebuild times are a couple of minutes,
>> rather than most of the day!
>
> Great suggestion!
>
>> One thing to practice is identifying drives - when a drive fails, you
>> want to be very sure of which one you should be replacing :-)
>
> Oh boy, I learned that The Hard Way (TM) many years ago, when I
> accidentally pulled the wrong drive bay on a server with a failed disk.

I see this as the number one reason for preferring RAID6 to RAID5.  One 
should never underestimate the risks of human error :-)

> Now I number the drive bays, verify twice that I have the right bay,
> generate disk activity (or use the "identify drive" feature to blink the
> light) to be sure I'm pulling an inactive drive, do that again, go back
> and verify the right bay again, then pull the drive with fingers and
> toes crossed.  (Fortunately, my mistake with the wrong drive wasn't
> catastrophic, but it definitely made extra work for me.)
>
>> Personally, I like RAID10 with "far" layout - it gives you greater
>> safety than RAID5 or RAID6, and most of the speed of RAID0.
>
> Is that the "far replicas" described in the man page for md(4)?
>

No, it is a special layout choice for RAID10.  Wikipedia has a 
reasonable explanation: 
<http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10>

If you are using RAID10, then it is a good choice for many workloads (it 
is significantly faster for reads than traditional RAID10, but 
marginally slower for writes).

> My concern about RAID10 is that I'll lose too much capacity to
> redundancy.  Because this is a snapshot server, I really need to
> maximize available storage space; if I have 12 drive bays, with 2TB
> drives I'd get only 12TB of usable space from a RAID10; even with 3TB
> drives that's only 18TB (if my math is right).  Whereas, a RAID6 with
> 12 2TB drives gets me 20TB usable.  (If this were my primary fileserver
> I'd be more likely to consider a RAID10.)

Fair enough.  You choose your balance between size, cost, speed, 
redundancy, rebuild times, etc.
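
The capacity figures quoted above are easy to check with a few lines of Python.  This is only a rough sketch: it assumes equal-sized drives, two copies of every block for RAID10, and no metadata or filesystem overhead, and the name usable_tb is made up for illustration.

```python
def usable_tb(level, drives, size_tb, copies=2):
    """Rough usable capacity in TB for an array of equal-sized drives.

    Assumptions (not from the thread): no metadata or filesystem
    overhead, and RAID10 keeps `copies` replicas of every block.
    """
    if level == "raid0":
        data_drives = drives
    elif level == "raid5":
        data_drives = drives - 1        # one drive's worth of parity
    elif level == "raid6":
        data_drives = drives - 2        # two drives' worth of parity
    elif level == "raid10":
        data_drives = drives / copies   # every block is stored `copies` times
    else:
        raise ValueError(f"unsupported level: {level}")
    return data_drives * size_tb


if __name__ == "__main__":
    cases = [("raid10", 12, 2), ("raid10", 12, 3), ("raid6", 12, 2)]
    for level, drives, size in cases:
        total = usable_tb(level, drives, size)
        print(f"{level:6s} {drives} x {size}TB -> {total:g} TB usable")
```

For the twelve-bay case this gives 12TB and 18TB for RAID10 with 2TB and 3TB drives, and 20TB for RAID6 with 2TB drives, which agrees with Keith's arithmetic.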

>
>> Another option for growth is to use mdadm over partitions, rather than
>> whole disks.  Then when you add bigger disks, you have spare space that
>> you can make into new partitions, and make another mdadm raid using
>> them.  If you are using LVM to organise your real partitions (which I
>> highly recommend), then you can add your new raid as a new physical
>> volume and extend your working space.
>
> Yes, I use LVM.  Using partitions sounds like a great idea, and is
> definitely something that I can't get out of a hardware RAID controller
> (another reason I'm leaning this way).
>

I have only set up real systems with smaller numbers of drives - the 
last one I did had three drives in a RAID10 layout.  But grub won't boot 
from an mdadm RAID10 set - it is pretty non-standard.  So I put a small 
partition at the start of each disk and made a three-way RAID1 using 
those partitions (being small, the poor space efficiency doesn't 
matter).  I put /boot on that RAID1 and grub on the MBR of each disk. 
Then the rest of each disk was a single large partition, with those all 
tied together as RAID10.

>> But the beauty of md raid is its flexibility.  Rather than use twelve
>> disks in a RAID6, build twelve RAID1 pairs from a real drive and a
>> missing drive.  Then build your RAID6 on top of these "pairs".  The
>> result is the same in terms of speed, capacity and redundancy.  But when
>> you want to replace a drive with a bigger disk, you do it by adding the
>> new drive to one of the pairs and letting the pair "rebuild".  Then you
>> remove the old disk from the pair.  You keep the same redundancy over
>> the whole array throughout the operation, and the rebuild is done as a
>> mirror copy from one disk - the other drives are unaffected.  You can
>> happily do the replacement with multiple disks in parallel - as many as
>> you have spare drive bays.
>
> Another fantastic idea!  (Though I'm guessing the RAID1s will somehow
> show up as ''failed''; I would need to work around that for paging
> purposes.)
>

Yes, you will need to take these "failures" into account in your warning 
system.  It will also be an issue for hot spares - you will not want to 
make a spare drive into a general hot spare for the RAID1's, or it will 
quickly be grabbed by one of them.  I think you would have to go back to 
the old-fashioned way of using mdadm monitor to trigger a script when 
one of the mirrors fails completely, and then "manually" add in the disk 
to the correct mirror.

After a quick check of the mdadm man page, it seems you can make your 
RAID1 sets consist of only one drive.  Then your one-way "mirrors" are 
not failed.  When you want to migrate to a larger disk, you can simply 
"grow" the "mirror" to being two disks, including the new one.  Once you 
are ready to remove the old one, you fail it, remove it, then "grow" the 
"mirror" back to one disk.  I suspect you would still need some fiddling 
with mdadm-triggered scripts to get your hot spares working, as an 
automatic hot spare will not work when a "mirror" set dies completely.
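
As a rough sketch of that grow / fail / remove / shrink sequence, here is a small Python helper that only prints the mdadm command lines and runs nothing.  The device names are placeholders, the function name is made up, and the --wait step and the --force on the final shrink are my own assumptions about what mdadm wants for a one-device array - check the man page for your version before trusting it.

```python
import shlex


def single_mirror_migration(md_dev, old_part, new_part):
    """Return mdadm command lines for moving a one-disk RAID1 to a new disk.

    Nothing is executed here; all device names are placeholders.
    """
    return [
        # Add the new partition, then widen the "mirror" to two devices.
        ["mdadm", md_dev, "--add", new_part],
        ["mdadm", "--grow", md_dev, "--raid-devices=2"],
        # Block until the resync onto the new partition has finished.
        ["mdadm", "--wait", md_dev],
        # Drop the old partition and shrink back to a one-disk "mirror".
        ["mdadm", md_dev, "--fail", old_part, "--remove", old_part],
        ["mdadm", "--grow", md_dev, "--raid-devices=1", "--force"],
    ]


if __name__ == "__main__":
    # Hypothetical devices: md10 is the one-disk mirror, sda2 the old
    # partition, sdm2 the partition on the new, larger drive.
    for argv in single_mirror_migration("/dev/md10", "/dev/sda2", "/dev/sdm2"):
        print(shlex.join(argv))
```

Printing the commands rather than running them keeps this safe to try on a machine with real arrays; you would paste them in by hand once you are sure which device is which.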

> Again, thanks for the thoughtful responses!
>
> --keith
>



#625

From: Grant <omg@grrr.id.au>
Date: 2011-04-08 10:45 +1000
Message-ID: <tllsp6ltftsq6ufp048hcc4ivufupgbmki@4ax.com>
In reply to: #612
On Wed, 6 Apr 2011 14:00:04 -0700, Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> wrote:

>Hello Tim, David, thanks so much for your comments.
>
>I do want to make specific comments, but in general, it seems like the
>take-home message is that I'm not completely stupid or insane for
>thinking about attempting this.  That's what I suspected, but I do feel
>a little better having it confirmed.
>
>On 2011-04-06, David Brown <david@westcontrol.removethisbit.com> wrote:
>> On 06/04/2011 09:01, Tim Watts wrote:
>>>
>>> Highly dependent on your server and RAID card of course, but you may find MD
>>> software raid is quicker.
>
>Yes, and I probably should have mentioned the card: it's a 3ware 9550SX,
>with no BBU, on a 64bit dual-core machine.  So, based on yours and
>David's comments, I probably shouldn't expect significantly worse
>performance, and may even be better.  That's really all I desire given
>the intended purpose.
>
>>> The learning curve is fairly easy with mdadm - furthermore, linux MD is now
>>> more functionally complete than all but the better end *modern* hardware
>>> RAID systems. Specifically, some things linux will do that a lot of
>>> older/cheaper HW RAID won't:
>>>
>>> 1) Attempt to rewrite a disk block that has failed to read <- triggers a
>>> bad block remap on most drives.
>>>
>>> 2) If you run the monitor daemon, linux will alert you if stuff goes bad, eg
>>> failed disk (OK, a crappy HW raid knows this, but can it alert you by email
>>> or just sit there with a flashing red LED?)
>>>
>>> 3) Perform a full sweep and parity verify on demand?
>
>I believe the 9550 will do #2, and it definitely does #3, with email
>alerts (which I direct to my cell phone via my SMS gateway).  I did have
>to work with RAID controllers which would simply blink, which was
>incredibly frustrating.
>
>> One hint about learning mdadm - with mdadm, you can build your arrays 
>> from partitions, not just whole disks.  So you can give your disks a 4 
>> GB partition at the start and use that when testing and learning - it's 
>> a lot easier to learn when your rebuild times are a couple of minutes, 
>> rather than most of the day!
>
>Great suggestion!

RAID on partitions is a great idea, I'm using it here with 6 x 1TB drives for 
the RAID, and a 2TB drive for backup, bounce buffer.  At the moment growing 
from 5 to 6 x 1TB drives with the aid of a borrowed 1.5TB drive to keep it 
separate from my other stuff.  

So I use the fast end for OS, a 4GB partition for RAID10 swap, then 2 
partitions in the bulk of the space for data in two separate RAID6 arrays.

I'm still to find the best settings, running with a quad core CPU on an Intel 
chipset (ICH9R) mobo for the 6 raid drives, a dual SATA controller card for 
backup and external (casual) SATA drives.

One thing I'm not seeing discussed enough is the need for adjusting NCQ on the 
SATA drives.  I'm using Seagate drives that have up to 31 queue slots, and 
switched them to use 1.  But I've not yet scripted a benchmark to find out 
if there's a better queue depth to use.  The theory is that the mdadm RAID 
software is fighting command queuing, I have no idea what the impact is, but 
short tests indicate no queue is better.  I'd like more info, confirmation.
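
For anyone wanting to script that benchmark, a minimal Python sketch for reading and changing the depth might look like the following.  It assumes the usual libata sysfs file /sys/block/<dev>/device/queue_depth, takes the 1 to 31 range from the drives mentioned above, and the function names are made up; writing needs root, and I make no claim that changing the depth on a busy array is safe.

```python
from pathlib import Path


def queue_depth_path(device, sysfs_root="/sys"):
    """Path of the queue_depth control file for a block device like 'sda'."""
    return Path(sysfs_root) / "block" / device / "device" / "queue_depth"


def get_queue_depth(device, sysfs_root="/sys"):
    """Read the current queue depth as an integer."""
    return int(queue_depth_path(device, sysfs_root).read_text().strip())


def set_queue_depth(device, depth, sysfs_root="/sys"):
    """Write a new queue depth; a depth of 1 effectively turns NCQ off."""
    if not 1 <= depth <= 31:  # range assumed from the 31 slots quoted above
        raise ValueError("depth must be between 1 and 31")
    queue_depth_path(device, sysfs_root).write_text(f"{depth}\n")


if __name__ == "__main__":
    # Read-only demo: report the depth of every sd* device that has the file.
    block_dir = Path("/sys/block")
    if block_dir.is_dir():
        for dev in sorted(p.name for p in block_dir.glob("sd*")):
            if queue_depth_path(dev).exists():
                print(dev, get_queue_depth(dev))
```

The sysfs_root parameter is only there so the functions can be pointed at a scratch directory for testing; a benchmark loop would call set_queue_depth for each member drive, run the workload, and record the timings per depth.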

>
>> One thing to practice is identifying drives - when a drive fails, you 
>> want to be very sure of which one you should be replacing :-)

Mark the cables and put the drives in order!  Also spin down the drive 
you want to pull if it's out where you can feel if it's spinning?
>
>Oh boy, I learned that The Hard Way (TM) many years ago, when I
>accidentally pulled the wrong drive bay on a server with a failed disk.
>Now I number the drive bays, verify twice that I have the right bay,
>generate disk activity (or use the "identify drive" feature to blink the
>light) to be sure I'm pulling an inactive drive, do that again, go back
>and verify the right bay again, then pull the drive with fingers and
>toes crossed.  (Fortunately, my mistake with the wrong drive wasn't
>catastrophic, but it definitely made extra work for me.)
>
>> Personally, I like RAID10 with "far" layout - it gives you greater 
>> safety than RAID5 or RAID6, and most of the speed of RAID0.

I did that for the swap RAID10, unsure how to change from RAID10 with 
spare to what? now that I have 6 drives in there.  Not that I plan to 
use a lot of swap, but it is the overload area for /tmp as well (/tmp 
mounted in memory, expands to swap after it uses half of memory, 
something like that, I soon forget the details when there's no problems).
>
>Is that the "far replicas" described in the man page for md(4)?
>
>My concern about RAID10 is that I'll lose too much capacity to
>redundancy.  Because this is a snapshot server, I really need to
>maximize available storage space; if I have 12 drive bays, with 2TB
>drives I'd get only 12TB of usable space from a RAID10; even with 3TB
>drives that's only 18TB (if my math is right).  Whereas, a RAID6 with
>12 2TB drives gets me 20TB usable.  (If this were my primary fileserver
>I'd be more likely to consider a RAID10.)

RAID6 for data, if you're on a budget :)  RAID6 is slower than RAID5, 
but that extra data protection is worth it, I think.  You need to cost 
loss of data vs speed and other factors relevant for your own scenario.

To rebuild a RAID5 with a RAID5 after total data loss is madness, yet 
I know a guy doing business systems did that, 'cos the RAID controller 
didn't do RAID6 (was on a windoze box).  Madness?
>
>> Another option for growth is to use mdadm over partitions, rather than 
>> whole disks.  Then when you add bigger disks, you have spare space that 
>> you can make into new partitions, and make another mdadm raid using 
>> them.  If you are using LVM to organise your real partitions (which I 
>> highly recommend), then you can add your new raid as a new physical 
>> volume and extend your working space.
>
>Yes, I use LVM.  Using partitions sounds like a great idea, and is
>definitely something that I can't get out of a hardware RAID controller
>(another reason I'm leaning this way).

I tried telling mdadm to grow on partition size increase and it refused :(

Probably me not up there on the learning curve, but I was disappointed.

Since mdadm is under active development, I expect it to improve over time.
>
>> But the beauty of md raid is its flexibility.  Rather than use twelve 
>> disks in a RAID6, build twelve RAID1 pairs from a real drive and a 
>> missing drive.  Then build your RAID6 on top of these "pairs".  The 
>> result is the same in terms of speed, capacity and redundancy.  But when 
>> you want to replace a drive with a bigger disk, you do it by adding the 
>> new drive to one of the pairs and letting the pair "rebuild".  Then you 
>> remove the old disk from the pair.  You keep the same redundancy over 
>> the whole array throughout the operation, and the rebuild is done as a 
>> mirror copy from one disk - the other drives are unaffected.  You can 
>> happily do the replacement with multiple disks in parallel - as many as 
>> you have spare drive bays.
>
>Another fantastic idea!  (Though I'm guessing the RAID1s will somehow
>show up as ''failed''; I would need to work around that for paging
>purposes.)

Swap space?  RAID10 is best for that, from my reading.  Got to be careful 
with swap reliability because bad swap will crash the machine and possibly 
eat your data.  Same as bad memory.

Grant.



#626

From: David Brown <david@westcontrol.removethisbit.com>
Date: 2011-04-08 11:12 +0200
Message-ID: <FqSdnWp-6szqTAPQnZ2dnUVZ8hednZ2d@lyse.net>
In reply to: #625
On 08/04/2011 02:45, Grant wrote:
> On Wed, 6 Apr 2011 14:00:04 -0700, Keith
> Keller <kkeller-usenet@wombat.san-francisco.ca.us> wrote:
>
>> Hello Tim, David, thanks so much for your comments.
<snip>
>>> One hint about learning mdadm - with mdadm, you can build your
>>> arrays from partitions, not just whole disks.  So you can give
>>> your disks a 4 GB partition at the start and use that when
>>> testing and learning - it's a lot easier to learn when your
>>> rebuild times are a couple of minutes, rather than most of the
>>> day!
>>
>> Great suggestion!
>
> RAID on partitions is a great idea, I'm using it here with 6 x 1TB
> drives for the RAID, and a 2TB drive for backup, bounce buffer.  At
> the moment growing from 5 to 6 x 1TB drives with the aid of a
> borrowed 1.5TB drive to keep it separate from my other stuff.
>
> So I use the fast end for OS, a 4GB partition for RAID10 swap, then
> 2 partitions in the bulk of the space for data in two separate RAID6
> arrays.
>

The flexibility is a big advantage of mdraid.  Sometimes you want to 
emphasise redundancy, sometimes speed, sometimes space efficiency - you 
can do it all on the same disks using md raid over partitions.

Another thing you can do with software raid is use external USB (or 
eSATA, if possible) drives in your raids.  While you won't want to do 
that for normal use, it can be a great way to add in a bit of extra 
redundancy before doing operations such as moving over to larger drives.
Try doing that with hardware raid cards!

> I'm still to find the best settings, running with a quad core CPU on
> an Intel chipset (ICH9R) mobo for the 6 raid drives, a dual SATA
> controller card for backup and external (casual) SATA drives.
>
> One thing I'm not seeing discussed enough is the need for adjusting
> NCQ on the SATA drives.  I'm using Seagate drives that have up to 31
> queue slots, and switched them to use 1.  But I've not yet scripted a
> benchmark to find out if there's a better queue depth to use.  The
> theory is that the mdadm RAID software is fighting command queuing, I
> have no idea what the impact is, but short tests indicate no queue is
> better.  I'd like more info, confirmation.
>

I hadn't thought about that at all.  I'm planning on setting up a couple 
of new servers in the near future - maybe I'll get a chance to try that out.

>>
>>> One thing to practice is identifying drives - when a drive fails,
>>> you want to be very sure of which one you should be replacing
>>> :-)
>
> Mark the cables and put the drives in order!  Also spin down the
> drive you want to pull if it's out where you can feel if it's
> spinning?

Marking the cables, as well as the drives, is a great idea.  It is 
obvious when you say it, of course, but worth saying out loud.

Spinning a drive down is a nice idea to identify them (especially if you 
forgot to label the drives and cables...) - I will try that to see how 
easy it is to feel the difference.


>>
>> Oh boy, I learned that The Hard Way (TM) many years ago, when I
>> accidentally pulled the wrong drive bay on a server with a failed
>> disk. Now I number the drive bays, verify twice that I have the
>> right bay, generate disk activity (or use the "identify drive"
>> feature to blink the light) to be sure I'm pulling an inactive
>> drive, do that again, go back and verify the right bay again, then
>> pull the drive with fingers and toes crossed.  (Fortunately, my
>> mistake with the wrong drive wasn't catastrophic, but it definitely
>> made extra work for me.)
>>
>>> Personally, I like RAID10 with "far" layout - it gives you
>>> greater safety than RAID5 or RAID6, and most of the speed of
>>> RAID0.
>
> I did that for the swap RAID10, unsure how to change from RAID10
> with spare to what? now that I have 6 drives in there.  Not that I
> plan to use a lot of swap, but it is the overload area for /tmp as
> well (/tmp mounted in memory, expands to swap after it uses half of
> memory, something like that, I soon forget the details when there's
> no problems).

I too like my /tmp (and /var/tmp, and sometimes other ad-hoc temporary 
directories) on tmpfs, and so often have a large swap even when I have a 
lot of ram.  I haven't bothered using raid on the swap drives - 
mirroring swap is a bit overkill on a desktop, though it's a good idea 
on a server.  You don't need to explicitly use raid0 for swap - the 
kernel stripes across multiple swap drives/partitions automatically, as 
long as you give them the same priority.

I am not sure whether RAID10,far is the best choice for swap, as 
compared to RAID10,near.  RAID10,far is excellent for a read-mostly 
array, but writes involve more head movement than in RAID10,near - and 
swap involves writes as much as reads.  Perhaps RAID10,offset is in fact 
the best choice.

One disadvantage of RAID10 is that you can't change it after it is made 
- you can't reshape it, grow it, or change the layout.  But for swap 
that shouldn't be a problem - just turn your swap off, break down the 
existing array, and create a new one including the extra drives.  Since 
you have no data on the raid (assuming you are not using swap at the 
time), you've nothing to lose.


>> Is that the "far replicas" described in the man page for md(4)?
>>
>> My concern about RAID10 is that I'll lose too much capacity to
>> redundancy.  Because this is a snapshot server, I really need to
>> maximize available storage space; if I have 12 drive bays, with
>> 2TB drives I'd get only 12TB of usable space from a RAID10; even
>> with 3TB drives that's only 18TB (if my math is right).  Whereas, a
>> RAID6 with 12 2TB drives gets me 20TB usable.  (If this were my
>> primary fileserver I'd be more likely to consider a RAID10.)
>
> RAID6 for data, if you're on a budget :)  RAID6 is slower than
> RAID5, but that extra data protection is worth it, I think.  You need
> to cost loss of data vs speed and other factors relevant for your own
> scenario.
>

I doubt if RAID6 is noticeably slower than RAID5 for most operations. 
Modern cpu's handle the calculations easily.  The only slow point is 
that partial stripe writes will be a little slower (if they miss the 
stripe cache), since you need to read in and write out at least three 
blocks.  But these blocks are all on different disks, so they operate in 
parallel.
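
To make the partial stripe write concrete, here is a small Python sketch of the read-modify-write shortcut for the plain XOR parity (the "P" block that RAID5 and RAID6 share).  It is only an illustration with made-up function names, not how the md driver is written, and it leaves out the second RAID6 syndrome ("Q"), which needs Galois-field arithmetic and is the third block you have to read and write.

```python
def xor_blocks(*blocks):
    """XOR any number of equal-length byte blocks together."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must be the same length")
    out = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def updated_parity(old_parity, old_data, new_data):
    """New P parity after one data block changes.

    Only the old data block and the old parity are read, rather than
    every other data block in the stripe.
    """
    return xor_blocks(old_parity, old_data, new_data)


if __name__ == "__main__":
    # A toy stripe of four 8-byte data blocks.
    stripe = [bytes([n] * 8) for n in (1, 2, 3, 4)]
    parity = xor_blocks(*stripe)

    # Overwrite the third block using the shortcut...
    new_block = bytes([9] * 8)
    shortcut = updated_parity(parity, stripe[2], new_block)

    # ...and compare against recomputing parity over the whole stripe.
    stripe[2] = new_block
    print("parities match:", shortcut == xor_blocks(*stripe))
```

So a small write on RAID5 touches two blocks (data and P), and on RAID6 three (data, P and Q), each read once and written once.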

I think the days of RAID5 are numbered, except in cases where you have 
additional protection (such as RAID1+5).  Certainly RAID5 + hot spare is 
a meaningless choice - RAID6 would definitely be better.

> To rebuild a RAID5 with a RAID5 after total data loss is madness,
> yet I know a guy doing business systems did that, 'cos the RAID
> controller didn't do RAID6 (was on a windoze box).  Madness?

Many low-end hardware cards don't support RAID6.

>>
>>> Another option for growth is to use mdadm over partitions, rather
>>> than whole disks.  Then when you add bigger disks, you have spare
>>> space that you can make into new partitions, and make another
>>> mdadm raid using them.  If you are using LVM to organise your
>>> real partitions (which I highly recommend), then you can add your
>>> new raid as a new physical volume and extend your working
>>> space.
>>
>> Yes, I use LVM.  Using partitions sounds like a great idea, and is
>> definitely something that I can't get out of a hardware RAID
>> controller (another reason I'm leaning this way).
>
> I tried telling mdadm to grow on partition size increase and it
> refused :(
>
> Probably me not up there on the learning curve, but I was
> disappointed.
>

It depends on the type of array you have - some can be grown, others 
cannot.  RAID 1, 5 and 6 can be grown when you have increased the 
partition size of all components.  But RAID 0 and 10 cannot (currently) 
be grown.  Resizing RAID 10 would be complicated because of its layout, 
though I'm sure one day it will be supported.  Resizing RAID 0 sounds 
easy, but I gather that md RAID 0 is actually very general (it will work 
with different sized disks, for example), which complicates resizing.

> Since mdadm is under active development, I expect it to improve over
> time.

Some of the plans discussed on the linux-raid@vger.kernel.org mailing 
list are /very/ exciting.

>>
>>> But the beauty of md raid is its flexibility.  Rather than use
>>> twelve disks in a RAID6, build twelve RAID1 pairs from a real
>>> drive and a missing drive.  Then build your RAID6 on top of these
>>> "pairs".  The result is the same in terms of speed, capacity and
>>> redundancy.  But when you want to replace a drive with a bigger
>>> disk, you do it by adding the new drive to one of the pairs and
>>> letting the pair "rebuild".  Then you remove the old disk from
>>> the pair.  You keep the same redundancy over the whole array
>>> throughout the operation, and the rebuild is done as a mirror
>>> copy from one disk - the other drives are unaffected.  You can
>>> happily do the replacement with multiple disks in parallel - as
>>> many as you have spare drive bays.
>>
>> Another fantastic idea!  (Though I'm guessing the RAID1s will
>> somehow show up as ''failed''; I would need to work around that for
>> paging purposes.)
>
> Swap space?  RAID10 is best for that, from my reading.  Got to be
> careful with swap reliability because bad swap will crash the machine
> and possibly eat your data.  Same as bad memory.
>
> Grant.



#635

From: Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us>
Date: 2011-04-08 08:22 -0700
Message-ID: <4ql378xm3b.ln2@goaway.wombat.san-francisco.ca.us>
In reply to: #626
On 2011-04-08, David Brown <david@westcontrol.removethisbit.com> wrote:
> On 08/04/2011 02:45, Grant wrote:
>>
>> Mark the cables and put the drives in order!  Also spin down the
>> drive you want to pull if it's out where you can feel if it's
>> spinning?
>
> Marking the cables, as well as the drives, is a great idea.  It is 
> obvious when you say it, of course, but worth saying out loud.
>
> Spinning a drive down is a nice idea to identify them (especially if you 
> forgot to label the drives and cables...) - I will try that to see how 
> easy it is to feel the difference.

It sounds like these suggestions all assume a desktop-like case.  Any
decent rackmount case with hot-swap drive bays should have some way to
label the drive bays, if the trays aren't already labeled.

> One disadvantage of RAID10 is that you can't change it after it is made 
> - you can't reshape it, grow it, or change the layout.  But for swap 
> that shouldn't be a problem - just turn your swap off, break down the 
> existing array, and create a new one including the extra drives.  Since 
> you have no data on the raid (assuming you are not using swap at the 
> time), you've nothing to lose.

You could always create a swap file on some other disks (even your data
disks, if you really need to do this), swapon the new file, then swapoff
the RAID10 swap space.  This might not be a lot of fun if you've got a
lot of swap in use, but that's an indicator of other problems.  :)

> I think the days of RAID5 are numbered, except in cases where you have 
> additional protection (such as RAID1+5).  Certainly RAID5 + hot spare is 
> a meaningless choice - RAID6 would definitely be better.

I think RAID5 isn't dead yet, but it's a smaller niche.  Perhaps you
have redundant public-facing nodes with four drive bays.  Maybe you want
the extra storage space, so you don't want RAID6, but you want some
protection against failure, so you don't want RAID0.

But yes, in general I wouldn't want to go RAID5 with more than four or
so disks, and RAID5 + hot spare is almost pointless.

> Many low-end hardware cards don't support RAID6.

Yep!  The card in my original post doesn't support RAID6.  It does
support RAID50, but I think RAID6 is a better option both space-wise and
safety-wise--RAID6 can always tolerate two disk failures, whereas some
RAID50 two-disk failures will destroy the array.  (Yes, you get better
rebuild times on RAID50.)

--keith

-- 
kkeller-usenet@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information



#645

From: Grant <omg@grrr.id.au>
Date: 2011-04-09 09:51 +1000
Message-ID: <hk7vp6lanc5k2b7fahvhcnscaolfaf9nqm@4ax.com>
In reply to: #635
On Fri, 8 Apr 2011 08:22:12 -0700, Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> wrote:

>On 2011-04-08, David Brown <david@westcontrol.removethisbit.com> wrote:
>> On 08/04/2011 02:45, Grant wrote:
>>>
>>> Mark the cables and put the drives in order!  Also spin down the
>>> drive you want to pull if it's out where you can feel if it's
>>> spinning?
>>
>> Marking the cables, as well as the drives, is a great idea.  It is 
>> obvious when you say it, of course, but worth saying out loud.
>>
>> Spinning a drive down is a nice idea to identify them (especially if you 
>> forgot to label the drives and cables...) - I will try that to see how 
>> easy it is to feel the difference.
>
>It sounds like these suggestions all assume a desktop-like case.  Any
>decent rackmount case with hot-swap drive bays should have some way to
>label the drive bays, if the trays aren't already labeled.

My server is crammed into a desktop case, wish I had activity lights, 
can't see where to connect them?  (Seagate cheapie SATA drives), no 
mobo connections.  Does one add LEDs some other way and write a little 
driver?  Always did want to add lots of flashing LEDs to a PC ;^)
>
>> One disadvantage of RAID10 is that you can't change it after it is made 
>> - you can't reshape it, grow it, or change the layout.  But for swap 
>> that shouldn't be a problem - just turn your swap off, break down the 
>> existing array, and create a new one including the extra drives.  Since 
>> you have no data on the raid (assuming you are not using swap at the 
>> time), you've nothing to lose.
>
>You could always create a swap file on some other disks (even your data
>disks, if you really need to do this), swapon the new file, then swapoff
>the RAID10 swap space.  This might not be a lot of fun if you've got a
>lot of swap in use, but that's an indicator of other problems.  :)

Yes, swap is overflow space, should be able to quieten it on demand?
>
>> I think the days of RAID5 are numbered, except in cases where you have 
>> additional protection (such as RAID1+5).  Certainly RAID5 + hot spare is 
>> a meaningless choice - RAID6 would definitely be better.
>
>I think RAID5 isn't dead yet, but it's a smaller niche.  Perhaps you
>have redundant public-facing nodes with four drive bays.  Maybe you want
>the extra storage space, so you don't want RAID6, but you want some
>protection against failure, so you don't want RAID0.
>
>But yes, in general I wouldn't want to go RAID5 with more than four or
>so disks, and RAID5 + hot spare is almost pointless.
>
>> Many low-end hardware cards don't support RAID6.
>
>Yep!  The card in my original post doesn't support RAID6.  It does
>support RAID50, but I think RAID6 is a better option both space-wise and
>safety-wise--RAID6 can always tolerate two disk failures, whereas some
>RAID50 two-disk failures will destroy the array.  (Yes, you get better
>rebuild times on RAID50.)

What's RAID50, I guess two mirrored RAID5s?  RAID6 seems more efficient?

Grant.
>
>--keith



#648

From: Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us>
Date: 2011-04-08 17:10 -0700
Message-ID: <kok478x7tj.ln2@goaway.wombat.san-francisco.ca.us>
In reply to: #645
On 2011-04-08, Grant <omg@grrr.id.au> wrote:
> On Fri, 8 Apr 2011 08:22:12 -0700, Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> wrote:
>
> What's RAID50, I guess two mirrored RAID5s?  RAID6 seems more efficient?

RAID50 is two striped RAID5s.  RAID51 would be a mirror of RAID5s.
RAID6 is an improvement over RAID50, but older hardware RAID controllers
(like the one I have) don't support RAID6.

--keith

-- 
kkeller-usenet@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information



#652

From: David Brown <david.brown@removethis.hesbynett.no>
Date: 2011-04-09 13:14 +0200
Message-ID: <sJGdnWu4CssWoj3QnZ2dnUVZ8sydnZ2d@lyse.net>
In reply to: #648
On 09/04/11 02:10, Keith Keller wrote:
> On 2011-04-08, Grant <omg@grrr.id.au> wrote:
>> On Fri, 8 Apr 2011 08:22:12 -0700, Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us> wrote:
>>
>> What's RAID50, I guess two mirrored RAID5s?  RAID6 seems more efficient?
>
> RAID50 is two striped RAID5s.  RAID51 would be a mirror of RAID5s.
> RAID6 is an improvement over RAID50, but older hardware RAID controllers
> (like the one I have) don't support RAID6.
>

RAID50 has some advantages in terms of scalability and recovery, as 
compared to a single wide RAID6 - quite aside from any limitations that 
hardware controllers might have.

One is that it can be easier to manage a hierarchical setup if you have 
a lot of drives.  You might have a number of independent RAID5 boxes, 
and stripe them together as RAID0.  Or you could have more than one RAID 
controller card in the same box, each managing a RAID5 array, with RAID0 
handled in software.

There is also the issue of rebuilding.  With a RAID5, a rebuild requires 
continuous reading of all data in all the other drives in the array 
(RAID6 is only slightly less bad).  If you have your array split into 
separate RAID5 arrays, then there will be less disk work during the rebuild.

A RAID50 will also be more efficient for partial stripe writes than a 
single wide RAID5 or RAID6, since you don't have to read in so many 
blocks to calculate the parity.  With very wide arrays, a higher 
proportion of your writes will be partial stripes, so this can be a 
bottleneck to scalability.

Of course, RAID50 doesn't give you any better worst-case redundancy than 
RAID5 otherwise would - a second disk failure during a rebuild means you 
lose everything.  RAID6 gives you that extra redundancy.  However, 
RAID50 gives you average-case better reliability than a single wide 
RAID5 would, if RAID6 is not an option.
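
The "average-case" point can be put into numbers by counting which two-disk failures are fatal.  The Python sketch below assumes twelve disks split into two six-disk RAID5 sets, treats every pair of failed disks as equally likely, and ignores rebuild timing; the function name is made up for illustration.

```python
from itertools import combinations


def fatal_pairs(sets, disks_per_set):
    """Count two-disk failures that destroy a stripe of RAID5 sets.

    A pair is fatal when both failed disks sit in the same RAID5 set,
    since each set survives only one failure.
    Returns (fatal_pairs, all_pairs).
    """
    disks = [(s, d) for s in range(sets) for d in range(disks_per_set)]
    pairs = list(combinations(disks, 2))
    fatal = [pair for pair in pairs if pair[0][0] == pair[1][0]]
    return len(fatal), len(pairs)


if __name__ == "__main__":
    for label, sets, per_set in [("RAID50, 2 x 6 disks", 2, 6),
                                 ("single RAID5, 12 disks", 1, 12)]:
        fatal, total = fatal_pairs(sets, per_set)
        print(f"{label}: {fatal} of {total} two-disk failures are fatal")
```

With these assumptions 30 of the 66 possible pairs kill the RAID50, against all 66 for a single twelve-disk RAID5 and none for a twelve-disk RAID6.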


It is also perfectly possible to do RAID60, and get the benefits of both 
(at the cost of another disk in each set, obviously).



#644

From: Grant <omg@grrr.id.au>
Date: 2011-04-09 09:47 +1000
Message-ID: <tn3vp65b9h39pc81l6t65bhdleh3hi08c7@4ax.com>
In reply to: #626
On Fri, 08 Apr 2011 11:12:14 +0200, David Brown <david@westcontrol.removethisbit.com> wrote:

>On 08/04/2011 02:45, Grant wrote:
>> On Wed, 6 Apr 2011 14:00:04 -0700, Keith
>> Keller <kkeller-usenet@wombat.san-francisco.ca.us> wrote:
>>
>>> Hello Tim, David, thanks so much for your comments.
><snip>
>>>> One hint about learning mdadm - with mdadm, you can build your
>>>> arrays from partitions, not just whole disks.  So you can give
>>>> your disks a 4 GB partition at the start and use that when
>>>> testing and learning - it's a lot easier to learn when your
>>>> rebuild times are a couple of minutes, rather than most of the
>>>> day!
>>>
>>> Great suggestion!
>>
>> RAID on partitions is a great idea, I'm using it here with 6 x 1TB
>> drives for the RAID, and a 2TB drive for backup, bounce buffer.  At
>> the moment growing from 5 to 6 x 1TB drives with the aid of a
>> borrowed 1.5TB drive to keep it separate from my other stuff.
>>
>> So I use the fast end for OS, a 4GB partition for RAID10 swap, then
>> 2 partitions in the bulk of the space for data in two separate RAID6
>> arrays.
>>
>
>The flexibility is a big advantage of mdraid.  Sometimes you want to 
>emphasise redundancy, sometimes speed, sometimes space efficiency - you 
>can do it all on the same disks using md raid over partitions.
>
>Another thing you can do with software raid is use external USB (or 
>eSATA, if possible) drives in your raids.  While you won't want to do 
>that for normal use, it can be a great way to add in a bit of extra 
>redundancy before doing operations such as moving over to larger drives. 
>  Try doing that with hardware raid cards!

I have a borrowed drive out on an eSATA right now for an extra bounce buffer :)

But I'm leery of making an external drive a RAID member, prefer RAID members 
to be bolted down.
>
>> I'm still to find the best settings, running with a quad core CPU on
>> an Intel chipset (ICH9R) mobo for the 6 raid drives, a dual SATA
>> controller card for backup and external (casual) SATA drives.
>>
>> One thing I'm not seeing discussed enough is the need for adjusting
>> NCQ on the SATA drives.  I'm using Seagate drives that have up to 31
>> queue slots, and switched them to use 1.  But I've not yet scripted a
>> benchmark to find out if there's a better queue depth to use.  The
>> theory is that the mdadm RAID software is fighting command queuing, I
>> have no idea what the impact is, but short tests indicate no queue is
>> better.  I'd like more info, confirmation.
>>
>
>I hadn't thought about that at all.  I'm planning on setting up a couple 
>of new servers in the near future - maybe I'll get a chance to try that out.

It takes a long time, I think.  It's a case of writing a script that changes 
the queue depth and then calls some benchmark exercises...  After exploring 
the controls under /sys/, there's one thing I'm no longer sure of: my method 
of writing the NCQ depth to the drives directly from rc.local is probably okay, 
but changing the queue depth on the fly?  It made me wonder if I caused data 
loss yesterday while loading up the new RAID6, so I cleared the drives 
overnight and will start again.  This time I'll 'talk' to the drives through 
the kernel, which will presumably make the adjustment without losing 
in-flight data.  

Better safe than sorry.
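
For the record, going through the kernel means the per-device queue_depth file in sysfs. A minimal Python sketch of reading and setting it follows. The helper names are made up, the 1 to 31 range is taken from the drives described above, writing needs root, and I have not benchmarked any of this.

```python
from pathlib import Path


def queue_depth_path(disk, sysfs_root="/sys"):
    """Path of the kernel's queue depth control for a disk such as 'sda'."""
    return Path(sysfs_root) / "block" / disk / "device" / "queue_depth"


def get_queue_depth(disk, sysfs_root="/sys"):
    """Read the current queue depth as an integer."""
    return int(queue_depth_path(disk, sysfs_root).read_text().strip())


def set_queue_depth(disk, depth, sysfs_root="/sys"):
    """Ask the kernel to change the queue depth (1 effectively disables NCQ).

    Assumption: 31 is the largest depth the drive accepts, as with the
    Seagate drives mentioned above.
    """
    if not 1 <= depth <= 31:
        raise ValueError("depth must be between 1 and 31")
    queue_depth_path(disk, sysfs_root).write_text(f"{depth}\n")
```

A benchmark script would loop over the depths of interest, call set_queue_depth() for each member disk, and run the same workload at each setting. The sysfs_root parameter is only there so the helpers can be tried against a scratch directory first.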
>
>>>
>>>> One thing to practice is identifying drives - when a drive fails,
>>>> you want to be very sure of which one you should be replacing
>>>> :-)
>>
>> Mark the cables and put the drives in order!  Also spin down the
>> drive you want to pull if it's out where you can feel if it's
>> spinning?
>
>Marking the cables, as well as the drives, is a great idea.  It is 
>obvious when you say it, of course, but worth saying out loud.
>
>Spinning a drive down is a nice idea to identify them (especially if you 
>forgot to label the drives and cables...) - I will try that to see how 
>easy it is to feel the difference.

It was good when I had several drives sitting outside the box a few months 
ago.  I don't have a removable drive cage, so I've no idea how well that works 
in the box.  I got seven drives into a four-drive tower: four in proper spots, 
one where the floppy goes, and two up in the 5 1/4" bays with adapters.  It has 
a 600W power supply, but the UPS says the box is taking less than 150W.  

I got a UPS so I can run XFS safely, though I've yet to rewrite the crappy 
script that came with the UPS for delayed shutdown.  I think a UPS is an 
important part of RAID discussions.
>
>
>>>
>>> Oh boy, I learned that The Hard Way (TM) many years ago, when I
...
>>> made extra work for me.)
>>>
>>>> Personally, I like RAID10 with "far" layout - it gives you
>>>> greater safety than RAID5 or RAID6, and most of the speed of
>>>> RAID0.
>>
>> I did that for the swap RAID10, unsure how to change from RAID10
>> with spare to what? now that I have 6 drives in there.  Not that I
>> plan to use a lot of swap, but it is the overload area for /tmp as
>> well (/tmp mounted in memory, expands to swap after it uses half of
>> memory, something like that, I soon forget the details when there's
>> no problems).
>
>I too like my /tmp (and /var/tmp, and sometimes other ad-hoc temporary 
>directories) on tmpfs, and so often have a large swap even when I have a 
>lot of ram.  I haven't bothered using raid on the swap drives - 
>mirroring swap is a bit overkill on a desktop, though it's a good idea 
>on a server.  You don't need to explicitly use raid0 for swap - the 
>kernel does that automatically if you have multiple swap drives/partitions.

Yes, I'm building a server, if you check my headers you'll see I write 
from a windows box!
>
>I am not sure whether RAID10,far is the best choice for swap, as 
>compared to RAID10,near.  RAID10,far is excellent for a read-mostly 
>array, but writes involve more head movement than in RAID10,near - and 
>swap involves writes as much as reads.  Perhaps RAID10,offset is in fact 
>the best choice.

I don't recognise the RAID10,offset option.  As you'll see below, I chose the 
f2 option from my reading, but this is the first RAIDed swap I've put in place.  
And yes, I usually put a swap partition on each spindle and add ',pri=1' 
to each entry in /etc/fstab to have them treated as RAID0.  
A while back somebody pointed out to me that a disk failure in the swap area 
is the same as a memory failure, so a server should have redundancy 
for the swap too.  I agree, hence the RAID10.  I'm happy to adjust it to a 
better performing one :)

Do you have a reference for the ',offset' argument?  Or is it buried in 
'man mdadm' somewhere?

Hmm, I had to check my notes for the RAID10 setup, I have:

/etc/mdadm:

# swap: RAID10 - 4 x 2GiB + spare
ARRAY /dev/md/pooh:swap
        UUID=0e3121d0:613689a2:228d5e7b:570357bf
        devices=/dev/sd[abcd]3
        spares=/dev/sde3

And from my setup notes, I used:
swap array:
mdadm --create /dev/md1 --metadata=1.2 --verbose --level=10 \
	--layout=f2 --chunk=64 --raid-devices=4 /dev/sd[abcd]5 \
	--spare-devices=1 /dev/sde5

Now I have six 4GB partitions to play with for the swap array.  It's 
probably good to keep redundancy there so I don't have to swap out a disk 
that fails in just that area.  I expect total disk failure, though.  

This is my first data RAID; I avoided them until I met RAID6.  Too many horror 
stories from people writing about losing two of three RAID5 disks, possibly 
due to using two per IDE cable or something stupid...  Drive goes down, takes 
its mate on the same cable with it?

Only the create command above isn't quite what I ended up with, because I 
changed the name of the final array to /dev/md/swap, which then had to be 
/dev/md/pooh:swap to keep /etc/fstab happy.

/etc/fstab, with my notes from the time:

# 8GB RAID10 swap space
/dev/md/pooh:swap swap          swap            defaults                0 0
# RAID6 data areas
/dev/md/data1p1 /home/raid/a    ext4            defaults                0 0
#
/dev/md/data2   /home/raid/b    xfs             defaults                0 0
#
# backup of the RAID data area, actually I think this is second backup,
#  as I change my mind about duplicating lower half of this 2TB disk for
#  connection with the 5 x 1TB RAID arrays.  The size of this partition
#  memorialises that early decision, as it has room for the 1TB partition
#  layout found on the remaining drives
#
/dev/sdg1       /home/backup1   xfs             defaults,ro             0 0
#
# okay, mount top half of the shiny new 2TB drive as temp holding place,
#  lets me think about using the bottom half in the RAID, but I'm sure
#  that's a bad idea.  Alternately, since I don't yet need that space,
#  it's ready to be pushed into service as a cold spare for the RAID6
#  data partitions
#
/dev/sdg2       /home/backup2   xfs             defaults,ro             0 0
#
# borrowed John's 1.5TB drive for temp data
/dev/sdh1       /home/backup3   ext4            defaults,ro             0 0
#

So I added a sixth 1TB drive a couple of days ago, and the 2TB backup or bounce 
drive is there holding stuff that has to go onto the data RAID.  After that 
it'll be a de-duplicated backup.  My backups of stuff dating back to the 1990s 
are a mess: for some things I have a dozen copies, while for one area I found 
only one copy from the 90s floppy disk era. 
>
>One disadvantage of RAID10 is that you can't change it after it is made 
>- you can't reshape it, grow it, or change the layout.  But for swap 
>that shouldn't be a problem - just turn your swap off, break down the 
>existing array, and create a new one including the extra drives.  Since 
>you have no data on the raid (assuming you are not using swap at the 
>time), you've nothing to lose.

Exactly right, quiesce the machine as far as big jobs go and one can turn 
swap off.
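
Spelled out, the sequence would look something like the Python sketch below. It only builds and prints the commands as a dry run and does not execute anything. The array name and partition names are assumptions for illustration, and mdadm may still ask for confirmation when it finds old superblocks on the members.

```python
import shlex


def swap_rebuild_plan(array, members, layout="f2", chunk_kib=64):
    """Return the commands (as argv lists) to recreate a RAID10 swap array.

    Assumption: nothing else is using the array, so its contents can be
    thrown away once swap has been turned off.
    """
    return [
        ["swapoff", array],            # stop using the old array for swap
        ["mdadm", "--stop", array],    # break down the existing array
        ["mdadm", "--create", array, "--metadata=1.2", "--level=10",
         f"--layout={layout}", f"--chunk={chunk_kib}",
         f"--raid-devices={len(members)}", *members],
        ["mkswap", array],             # write a fresh swap signature
        ["swapon", array],             # bring swap back online
    ]


# Hypothetical device names: six partitions, one per drive.
plan = swap_rebuild_plan("/dev/md1", [f"/dev/sd{c}5" for c in "abcdef"])
for argv in plan:
    print(shlex.join(argv))
```

Passing layout="o2" or layout="n2" instead of "f2" gives the offset or near variants for comparison.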
>
>
>>> Is that the "far replicas" described in the man page for md(4)?
>>>
>>> My concern about RAID10 is that I'll lose too much capacity to
>>> redundancy.  Because this is a snapshot server, I really need to
>>> maximize available storage space; if I have 12 drive bays, with
>>> 2TB drives I'd get only 12TB of usable space from a RAID10; even
>>> with 3TB drives that's only 18TB (if my math is right).  Whereas, a
>>> RAID6 with 12 2TB drives gets me 20TB usable.  (If this were my
>>> primary fileserver I'd be more likely to consider a RAID10.)
>>
>> RAID6 for data, if you're on a budget :)  RAID6 is slower than
>> RAID5, but that extra data protection is worth it, I think.  You need
>> to cost loss of data vs speed and other factors relevant for your own
>> scenario.
>>
>
>I doubt if RAID6 is noticeably slower than RAID5 for most operations. 

It was 30% slower for the initial sync.  I can do some comparative benchmarking 
on the 'fast' RAID partitions (sd[abcdef]5), since that area is yet to be 
rebuilt.  I'm copying data from there to the sd[abcdef]6 RAID6 today, 
via the external temp 1.5TB drive.

>Modern cpu's handle the calculations easily.  The only slow point is 
>that partial stripe writes will be a little slower (if they miss the 
>stripe cache), since you need to read in and write out at least three 
>blocks.  But these blocks are all on different disks, so they operate in 
>parallel.

Well, I put in a quad core, Q6600 CPU, with 4GB memory, and the top usage 
is sitting between 2 and 3 for writing from external to the RAID6.
>
>I think the days of RAID5 are numbered, except in cases where you have 
>additional protection (such as RAID1+5).  Certainly RAID5 + hot spare is 
>a meaningless choice - RAID6 would definitely be better.

Yup!
>
>> To rebuild a RAID5 with a RAID5 after total data loss is madness,
>> yet I know a guy doing business systems did that, 'cos the RAID
>> controller didn't do RAID6 (was on a windoze box).  Madness?
>
>Many low-end hardware cards don't support RAID6.

Yes, that too.  I didn't know about RAID6 until a friend asked me to look at 
a NAS box he was buying.  At the moment it seems only Linux mdadm and high-end 
cards do RAID6?  The Intel motherboard chipsets I've seen don't know about it, 
so I'm running six AHCI drives on the ICH9R 6 x SATA chipset.
>
>>>
>>>> Another option for growth is to use mdadm over partitions, rather
>>>> than whole disks.  Then when you add bigger disks, you have spare
>>>> space that you can make into new partitions, and make another
>>>> mdadm raid using them.  If you are using LVM to organise your
>>>> real partitions (which I highly recommend), then you can add your
>>>> new raid as a new physical volume and extend your working
>>>> space.
>>>
>>> Yes, I use LVM.  Using partitions sounds like a great idea, and is
>>> definitely something that I can't get out of a hardware RAID
>>> controller (another reason I'm leaning this way).
>>
>> I tried telling mdadm to grow on partition size increase and it
>> refused :(
>>
>> Probably me not up there on the learning curve, but I was
>> disappointed.
>>
>
>It depends on the type of array you have - some can be grown, others 
>cannot.  RAID 1, 5 and 6 can be grown when you have increased the 
>partition size of all components.  

It was a RAID6 I tried to grow, but I deleted it and started over, thanks to 
the plan of running two data RAID stripes.  Seek time between them 
would be lousy, so using both at once is not the planned operation; it's more 
an active plus archive RAID.  I could always merge them with LVM, but I read 
that slows down access times markedly.

>  But RAID 0 and 10 cannot (currently) 
>be grown.  Resizing RAID 10 would be complicated because of its layout, 
>though I'm sure one day it will be supported.  Resizing RAID 0 sounds 
>easy, but I gather that md RAID 0 is actually very general (it will work 
>with different sized disks, for example), which complicates resizing.
>
>> Since mdadm is under active development, I expect it to improve over
>> time.
>
>Some of the plans discussed on the linux-raid@vger.kernel.org mailing 
>list are /very/ exciting.

Hmm, I skim through lkml, dunno if I want to see a more detailed story ;)

Grant.
>
>>>
>>>> But the beauty of md raid is its flexibility.  Rather than use
>>>> twelve disks in a RAID6, build twelve RAID1 pairs from a real
>>>> drive and a missing drive.  Then build your RAID6 on top of these
>>>> "pairs".  The result is the same in terms of speed, capacity and
>>>> redundancy.  But when you want to replace a drive with a bigger
>>>> disk, you do it by adding the new drive to one of the pairs and
>>>> letting the pair "rebuild".  Then you remove the old disk from
>>>> the pair.  You keep the same redundancy over the whole array
>>>> throughout the operation, and the rebuild is done as a mirror
>>>> copy from one disk - the other drives are unaffected.  You can
>>>> happily do the replacement with multiple disks in parallel - as
>>>> many as you have spare drive bays.
>>>
>>> Another fantastic idea!  (Though I'm guessing the RAID1s will
>>> somehow show up as ''failed''; I would need to work around that for
>>> paging purposes.)
>>
>> Swap space?  RAID10 is best for that, from my reading.  Got to be
>> careful with swap reliability because bad swap will crash the machine
>> and possibly eat your data.  Same as bad memory.
>>
>> Grant.



#653

From: David Brown <david.brown@removethis.hesbynett.no>
Date: 2011-04-09 13:55 +0200
Message-ID: <V4-dnbjdPpe_1D3QnZ2dnUVZ7vydnZ2d@lyse.net>
In reply to: #644
On 09/04/11 01:47, Grant wrote:
> On Fri, 08 Apr 2011 11:12:14 +0200, David Brown<david@westcontrol.removethisbit.com>  wrote:
>
>> On 08/04/2011 02:45, Grant wrote:
>>> On Wed, 6 Apr 2011 14:00:04 -0700, Keith
>>> Keller<kkeller-usenet@wombat.san-francisco.ca.us>   wrote:

I've done some more snipping here - these posts are getting a bit too 
long for convenience.  There is lots of interest to discuss here.

>>
>> Another thing you can do with software raid is use external USB (or
>> eSATA, if possible) drives in your raids.  While you won't want to do
>> that for normal use, it can be a great way to add in a bit of extra
>> redundancy before doing operations such as moving over to larger drives.
>>   Try doing that with hardware raid cards!
>
> I have a borrowed drive out on an eSATA right now for an extra bounce buffer :)
>
> But I'm leery of making an external drive a RAID member, prefer RAID members
> to be bolted down.

Yes, but the extra external disk during such maintenance is a great 
safety net.

>
> Got UPS so I can run XFS safely, though I've yet to rewrite the crappy script
> that came with the UPS for delayed shutdown.  I think a UPS is an important part of
> RAID discussions.

I take an UPS for granted in a server situation.  Using RAID without an 
UPS is much like having a car airbag and then not wearing a seatbelt. 
If your power dies while you are writing to the disk, then RAID will not 
save you - and it will mean /very/ long check times on restart.


>>
>>
>>>>
>>>> Oh boy, I learned that The Hard Way (TM) many years ago, when I
> ...
>>>> made extra work for me.)
>>>>
>>>>> Personally, I like RAID10 with "far" layout - it gives you
>>>>> greater safety than RAID5 or RAID6, and most of the speed of
>>>>> RAID0.
>>>
>>> I did that for the swap RAID10, unsure how to change from RAID10
>>> with spare to what? now that I have 6 drives in there.  Not that I
>>> plan to use a lot of swap, but it is the overload area for /tmp as
>>> well (/tmp mounted in memory, expands to swap after it uses half of
>>> memory, something like that, I soon forget the details when there's
>>> no problems).
>>
>> I too like my /tmp (and /var/tmp, and sometimes other ad-hoc temporary
>> directories) on tmpfs, and so often have a large swap even when I have a
>> lot of ram.  I haven't bothered using raid on the swap drives -
>> mirroring swap is a bit overkill on a desktop, though it's a good idea
>> on a server.  You don't need to explicitly use raid0 for swap - the
>> kernel does that automatically if you have multiple swap drives/partitions.
>
> Yes, I'm building a server, if you check my headers you'll see I write
> from a windows box!

If you check /my/ headers, you'll see that some of my posts are from a 
windows machine at work, others from a linux machine at home.

But even for servers, redundancy on your swap partitions is perhaps only 
an issue if you really need continuous service.  For many uses, RAID is 
about /reducing/ downtime - it is not necessary to try to /eliminate/ 
downtime.  Still, making your swap space redundant is not exactly a big 
cost - it's just a small sliver off each disk in your arrays.

>>
>> I am not sure whether RAID10,far is the best choice for swap, as
>> compared to RAID10,near.  RAID10,far is excellent for a read-mostly
>> array, but writes involve more head movement than in RAID10,near - and
>> swap involves writes as much as reads.  Perhaps RAID10,offset is in fact
>> the best choice.
>
> I don't recognise the RAID10,offset option, you see below I chose the f2
> option from my reading, but this is the first RAIDed swap I've put in place,
> and yes, I usually put a swap partition on each spindle and add the ',pri=1'
> to /etc/fstab to have them treated as RAID0.
> A while back somebody pointed out to me that a disk failure in swap area
> is same as memory failure, therefore for a server should have redundancy
> for the swap too.  I agree, hence the RAID10, I'm happy to adjust it to
> better performing one :)
>
> Do you have a reference for the ',offset' argument?  Or is it buried in
> 'man mdadm' somewhere.
>

Yes, the "offset" option is somewhere in the mdadm man page.  It doesn't 
get the same level of publicity as the "far" option, which is a feature 
exclusive to Linux md raid, and is generally the fastest choice 
("near" is pretty much standard RAID1+0, if you have 4 disks).  I think 
"offset" was added to md for compatibility with some other raid system, 
but I suspect it might actually be the best choice when you have 
lots of writes, especially small writes, such as for swap space.

There are some layout diagrams here:

<http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10>
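
If you would rather generate the diagrams than read them, here is a small Python sketch of where each chunk lands for the three layouts. It is a simplified model of a two-copy array based on my reading of those diagrams, not the md source code. The function name is made up, and real arrays have far more rows than this.

```python
def raid10_layout(kind, devices, chunks, copies=2):
    """Return rows of chunk numbers (1-based), one column per device.

    kind is "near", "offset" or "far".  Slots that hold nothing in this
    small example are None.  Illustrative model only.
    """
    stripes = -(-chunks // devices)  # rows needed for one copy (ceiling)
    placed = {}
    for c in range(chunks):
        stripe, d = divmod(c, devices)
        for j in range(copies):
            if kind == "near":
                # Copies sit next to each other within the same row.
                row, dev = divmod(c * copies + j, devices)
            elif kind == "offset":
                # Each stripe is followed at once by a copy rotated by one device.
                row, dev = stripe * copies + j, (d + j) % devices
            elif kind == "far":
                # The whole second copy lives in a later section of every
                # disk, again rotated by one device.
                row, dev = stripe + j * stripes, (d + j) % devices
            else:
                raise ValueError("kind must be 'near', 'offset' or 'far'")
            placed[(row, dev)] = c + 1
    rows = max(r for r, _ in placed) + 1
    return [[placed.get((r, dev)) for dev in range(devices)]
            for r in range(rows)]


for kind in ("near", "offset", "far"):
    print(kind)
    for row in raid10_layout(kind, devices=4, chunks=8):
        print("  ", row)
```

The output makes the difference in head movement visible: with "far", the two copies of a chunk are half a disk apart, while with "offset" they are in adjacent rows.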



> Hmm, I had to check my notes for the RAID10 setup, I have:
>
> /etc/mdadm:
>
> # swap: RAID10 - 4 x 2GiB + spare
> ARRAY /dev/md/pooh:swap
>          UUID=0e3121d0:613689a2:228d5e7b:570357bf
>          devices=/dev/sd[abcd]3
>          spares=/dev/sde3
>
> And from my setup notes, I used:
> swap array:
> mdadm --create /dev/md1 --metadata=1.2 --verbose --level=10 \
> 	--layout=f2 --chunk=64 --raid-devices=4 /dev/sd[abcd]5 \
> 	--spare-devices=1 /dev/sde5
>
> Now I have six by 4GB  partitions to play with for the swap array.  Which
> probably is good to keep redundancy there so I don't have to swap a disk
> that fails in just that area.  I expect total disk failure though.
>
> First data RAID for me, I avoided them until I met RAID6.  Too many horror
> stories people writing about losing two of three RAID5 disks, possibly due
> to using two per IDE cable or something stupid...  Drive goes down, takes
> mate on same cable with it?
>

With IDE cables, it was certainly possible for one failure to bring down 
the other disk on the same cable.  You also suffered from low bandwidth 
when you had two disks on the same cable.  Parallel SCSI had similar 
issues, but then you could have more than just two disks in the same 
chain.  On the other hand, the SCSI disks were more robust and less 
likely to be affected by the failures of other disks in the chain.

But with serial cables (SATA or SAS), you don't get these problems any more.

However, there is still a risk of losing a second disk during a RAID5 
rebuild.  Rebuilds are the most stressful action you can have on a RAID5 
(or RAID6) array, so if a second disk is feeling poorly, then a rebuild 
might be the trigger that pushes it over the edge.

As always, make sure you have good independent backups of anything 
important - even if you use RAID6!

> Only that's not quite it because I changed the name for the final one to
> /dev/md/swap, which then had to be /dev/md/pooh:swap to keep /etc/fstab
> happy.
>
> /etc/fstab, with my notes from the time:
>
> # 8GB RAID10 swap space
> /dev/md/pooh:swap swap          swap            defaults                0 0
> # RAID6 data areas
> /dev/md/data1p1 /home/raid/a    ext4            defaults                0 0
> #
> /dev/md/data2   /home/raid/b    xfs             defaults                0 0
> #
> # backup of the RAID data area, actually I think this is second backup,
> #  as I change my mind about duplicating lower half of this 2TB disk for
> #  connection with the 5 x 1TB RAID arrays.  The size of this partition
> #  memorialises that early decision, as it has room for the 1TB partition
> #  layout found on the remaining drives
> #
> /dev/sdg1       /home/backup1   xfs             defaults,ro             0 0
> #
> # okay, mount top half of the shiny new 2TB drive as temp holding place,
> #  lets me think about using the bottom half in the RAID, but I'm sure
> #  that's a bad idea.  Alternately, since I don't yet need that space,
> #  it's ready to be pushed into service as a cold spare for the RAID6
> #  data partitions
> #
> /dev/sdg2       /home/backup2   xfs             defaults,ro             0 0
> #
> # borrowed John's 1.5TB drive for temp data
> /dev/sdh1       /home/backup3   ext4            defaults,ro             0 0
> #
>
> So I added a sixth 1TB drive a couple days ago, and the 2TB backup or bounce
> drive is there holding stuff that has to go onto the data RAID, then it'll
> be a de duplicated backup, my backups for stuff dating back to the 1990s is
> a mess, some things I have a dozen copies, one area I found only one copy
> from the 90s floppy disk era.
>>
>> One disadvantage of RAID10 is that you can't change it after it is made
>> - you can't reshape it, grow it, or change the layout.  But for swap
>> that shouldn't be a problem - just turn your swap off, break down the
>> existing array, and create a new one including the extra drives.  Since
>> you have no data on the raid (assuming you are not using swap at the
>> time), you've nothing to lose.
>
> Exactly right, quiesce the machine as far as big jobs go and one can turn
> swap off.
>>
>>
>>>> Is that the "far replicas" described in the man page for md(4)?
>>>>
>>>> My concern about RAID10 is that I'll lose too much capacity to
>>>> redundancy.  Because this is a snapshot server, I really need to
>>>> maximize available storage space; if I have 12 drive bays, with
>>>> 2TB drives I'd get only 12TB of usable space from a RAID10; even
>>>> with 3TB drives that's only 18TB (if my math is right).  Whereas, a
>>>> RAID6 with 12 2TB drives gets me 20TB usable.  (If this were my
>>>> primary fileserver I'd be more likely to consider a RAID10.)
>>>
>>> RAID6 for data, if you're on a budget :)  RAID6 is slower than
>>> RAID5, but that extra data protection is worth it, I think.  You need
>>> to cost loss of data vs speed and other factors relevant for your own
>>> scenario.
>>>
>>
>> I doubt if RAID6 is noticeably slower than RAID5 for most operations.
>
> 30% slower for initial sync, I can do some comparative benchmarking on
> the 'fast' RAID partitions (sd[abcdef]5), since that area is yet to be
> rebuilt.  I'm copying data from there to the sd[abcdef]6 RAID6 today,
> via the external temp 1.5TB drive.
>

Don't place too much emphasis on the initial sync time - that's only 
done once, and doesn't matter in the long run.  Rebuild times are a bit 
more important, but you (hopefully!) don't have to rebuild often.  It's 
the speed of the array in real-time use that's important.

>> Modern cpu's handle the calculations easily.  The only slow point is
>> that partial stripe writes will be a little slower (if they miss the
>> stripe cache), since you need to read in and write out at least three
>> blocks.  But these blocks are all on different disks, so they operate in
>> parallel.
>
> Well, I put in a quad core, Q6600 CPU, with 4GB memory, and the top usage
> is sitting between 2 and 3 for writing from external to the RAID6.
>>
>> I think the days of RAID5 are numbered, except in cases where you have
>> additional protection (such as RAID1+5).  Certainly RAID5 + hot spare is
>> a meaningless choice - RAID6 would definitely be better.
>
> Yup!
>>
>>> To rebuild a RAID5 with a RAID5 after total data loss is madness,
>>> yet I know a guy doing business systems did that, 'cos the RAID
>>> controller didn't do RAID6 (was on a windoze box).  Madness?
>>
>> Many low-end hardware cards don't support RAID6.
>
> Yes, that too, I didn't know about RAID6 until a friend asked me to look at
> a NAS box he was buying.  At the moment seems only Linux mdadm and high end
> cards do RAID6?  Intel motherboard chipsets I've seen don't know about it,
> so I'm running six AHCI drives on the ICH9R 6 x SATA chipset.

The "raid" supported by motherboard chipsets is often known as 
"fakeraid".  It's a limited form of software raid, with all the 
disadvantages of software raid and all the disadvantages of hardware 
raid.  It's okay as a quick and easy solution for a desktop with either 
RAID0 or RAID1 for a pair of disks, and especially useful for OS's that 
don't have particularly good software raid (guess which one...).  But 
it's a poor choice for a more serious setup.

>>
>>>>
>>>>> Another option for growth is to use mdadm over partitions, rather
>>>>> than whole disks.  Then when you add bigger disks, you have spare
>>>>> space that you can make into new partitions, and make another
>>>>> mdadm raid using them.  If you are using LVM to organise your
>>>>> real partitions (which I highly recommend), then you can add your
>>>>> new raid as a new physical volume and extend your working
>>>>> space.
>>>>
>>>> Yes, I use LVM.  Using partitions sounds like a great idea, and is
>>>> definitely something that I can't get out of a hardware RAID
>>>> controller (another reason I'm leaning this way).
>>>
>>> I tried telling mdadm to grow on partition size increase and it
>>> refused :(
>>>
>>> Probably me not up there on the learning curve, but I was
>>> disappointed.
>>>
>>
>> It depends on the type of array you have - some can be grown, others
>> cannot.  RAID 1, 5 and 6 can be grown when you have increased the
>> partition size of all components.
>
> It was a RAID6 I tried to grow, but I deleted it and started over, thanks to
> the plan of running two data RAID stripes, though seek time between them
> would be lousy, so that's not the planned operation, sort of active plus
> archive RAID, I could always merge them with LVM, but I read that slows down
> access times markedly.
>

LVM can slow down operations in a number of ways.  The layers of 
indirection will increase access times, and it is easy to get 
non-contiguous logical volumes, especially if you have several 
physical volumes, which can mess with the filesystem's optimisations. 
But you get enormous flexibility by using it.  The usual approach is 
therefore to make your low-level RAID using md raid, getting the fastest 
setup you can with the redundancy and space requirements you need.  Then 
you put LVM on top and accept the speed costs for the flexibility gains.

>>   But RAID 0 and 10 cannot (currently)
>> be grown.  Resizing RAID 10 would be complicated because of its layout,
>> though I'm sure one day it will be supported.  Resizing RAID 0 sounds
>> easy, but I gather that md RAID 0 is actually very general (it will work
>> with different sized disks, for example), which complicates resizing.
>>
>>> Since mdadm is under active development, I expect it to improve over
>>> time.
>>
>> Some of the plans discussed on the linux-raid@vger.kernel.org mailing
>> list are /very/ exciting.
>
> Hmm, I skim through lkml, dunno if I want to see a more detailed story ;)


Have a look at <http://neil.brown.name/blog/20110216044002>.  Neil 
writes a very clear and well-thought-out article (as well as writing 
excellent software!).



#694

From: Tris Orendorff <triso@remove-me.cogeco.ca>
Date: 2011-04-12 18:04 +0000
Message-ID: <Xns9EC58F37B7D8RepublicPicturesLtd@69.16.185.250>
In reply to: #625
Grant <omg@grrr.id.au> burped up warm pablum in
news:tllsp6ltftsq6ufp048hcc4ivufupgbmki@4ax.com: 

> 
> Swap space?  RAID10 is best for that, from my reading.  Got to be
> careful with swap reliability because bad swap will crash the machine
> and possibly eat your data.  Same as bad memory.

Swap space?  Isn't that useless for a server?  We've found it
next-to-useless on our desktops even with the fastest SSDs.

-- 
Tris Orendorff
[ Anyone naming their child should spend a few minutes checking rhyming slang and dodgy sounding names. Brad and 
Angelina failed to do this when naming their kid Shiloh Pitt. At some point, someone at school is going to spoonerise her 
name.
Craig Stark ]



#695

From: Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us>
Date: 2011-04-12 11:34 -0700
Message-ID: <viie78x9sc.ln2@goaway.wombat.san-francisco.ca.us>
In reply to: #694
On 2011-04-12, Tris Orendorff <triso@remove-me.cogeco.ca> wrote:
>
>  Swap space?  Isn't that useless for a server?  We've found it next-to-useless on our desktops even with the fastest SSDs.

It's not *useless* per se.  The example I sometimes see cited (even in
this thread?) is with xfs repairs on large filesystems.  These can take
a ton of memory, but much of it isn't active, and let's face it, you
probably really really want the xfs check or repair to work no matter
what, so you'd be willing to sacrifice performance for results.  (Of
course you need plenty of free disk space on an unaffected fs!)

But even if you don't do that, it's still not completely useless.  The
kernel will move things to swap if it hasn't been used in a while and
free memory is wanted; when the memory frees up, it can leave that data
in swap so that it can use more physical memory for other tasks (e.g.,
more disk buffers).  In this use case, you're not actually using the
swap space very often.  It's true that if you're counting on swap to be
useful as active memory you're likely to be disappointed, but used as
one-off space it can be handy.

--keith



-- 
kkeller-usenet@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information



#696

From: The Natural Philosopher <tnp@invalid.invalid>
Date: 2011-04-12 21:13 +0100
Message-ID: <io2bp3$9sk$1@news.albasani.net>
In reply to: #695
Keith Keller wrote:
>   The
> kernel will move things to swap if it hasn't been used in a while and
> free memory is wanted; when the memory frees up, it can leave that data
> in swap so that it can use more physical memory for other tasks (e.g.,
> more disk buffers).  

That is key.  It means the rarely used admin processes that are 
essentially asleep do not fill the RAM.



#698

From: David Brown <david@westcontrol.removethisbit.com>
Date: 2011-04-13 09:45 +0200
Message-ID: <XcSdnU4wVYejyTjQnZ2dnUVZ8o-dnZ2d@lyse.net>
In reply to: #696
On 12/04/2011 22:13, The Natural Philosopher wrote:
> Keith Keller wrote:
>> The
>> kernel will move things to swap if it hasn't been used in a while and
>> free memory is wanted; when the memory frees up, it can leave that data
>> in swap so that it can use more physical memory for other tasks (e.g.,
>> more disk buffers).
>
> That is key. It means the rarely used admin processes that are
> essentially asleep do not fill the RAM.

Such processes are usually small, but it's a good principle none the less.


The reason I like swap space is as a backing store for tmpfs 
filesystems.  I usually put /tmp and /var/tmp on tmpfs, and sometimes 
have additional tmpfs mounts for odd purposes (such as the build 
directories for large compilations - though obviously that's more 
desktop than server usage).  Tmpfs is much faster and more efficient 
than any other filesystem, even if the data is stored on a disk rather 
than in memory, because it does not give the slightest care to data 
reliability.

The Linux kernel is good at memory management, and at balancing what 
goes in ram and what goes in swap.  Clearly you are always faster with 
x+y ram instead of x ram and y swap, but x+y ram and y swap is even better.



#711

From: Grant <omg@grrr.id.au>
Date: 2011-04-14 13:42 +1000
Message-ID: <h2rcq6hvcf8rbb0rigptkf6alrtfhsvqp7@4ax.com>
In reply to: #698
On Wed, 13 Apr 2011 09:45:58 +0200, David Brown <david@westcontrol.removethisbit.com> wrote:

>On 12/04/2011 22:13, The Natural Philosopher wrote:
>> Keith Keller wrote:
>>> The
>>> kernel will move things to swap if it hasn't been used in a while and
>>> free memory is wanted; when the memory frees up, it can leave that data
>>> in swap so that it can use more physical memory for other tasks (e.g.,
>>> more disk buffers).
>>
>> That is key. It means the rarely used admin processes that are
>> essentially asleep do not fill the RAM.
>
>Such processes are usually small, but it's a good principle none the less.
>
>
>The reason I like swap space is as a backing store for tmpfs 
>filesystems.  I usually put /tmp and /var/tmp on tmpfs, and sometimes 
>have additional tmpfs mounts for odd purposes (such as the build 
>directories for large compilations - though obviously that's more 
>desktop than server usage).  Tmpfs is much faster and more efficient 
>than any other filesystem, even if the data is stored on a disk rather 
>than in memory, because it does not give the slightest care to data 
>reliability.

Wonder if you could show your relevant /etc/fstab lines?  I'm curious 
how others do this.
>
>The Linux kernel is good at memory management, and at balancing what 
>goes in ram and what goes in swap.  Clearly you are always faster with 
>x+y ram instead of x ram and y swap, but x+y ram and y swap is even better.

Grant.



#715

From: David Brown <david@westcontrol.removethisbit.com>
Date: 2011-04-14 09:15 +0200
Message-ID: <-NmdnSetarUCAzvQnZ2dnUVZ8hKdnZ2d@lyse.net>
In reply to: #711
On 14/04/2011 05:42, Grant wrote:
> On Wed, 13 Apr 2011 09:45:58 +0200, David Brown <david@westcontrol.removethisbit.com> wrote:
>
>> On 12/04/2011 22:13, The Natural Philosopher wrote:
>>> Keith Keller wrote:
>>>> The
>>>> kernel will move things to swap if it hasn't been used in a while and
>>>> free memory is wanted; when the memory frees up, it can leave that data
>>>> in swap so that it can use more physical memory for other tasks (e.g.,
>>>> more disk buffers).
>>>
>>> That is key. It means the rarely used admin processes that are
>>> essentially asleep do not fill the RAM.
>>
>> Such processes are usually small, but it's a good principle none the less.
>>
>>
>> The reason I like swap space is as a backing store for tmpfs
>> filesystems.  I usually put /tmp and /var/tmp on tmpfs, and sometimes
>> have additional tmpfs mounts for odd purposes (such as the build
>> directories for large compilations - though obviously that's more
>> desktop than server usage).  Tmpfs is much faster and more efficient
>> than any other filesystem, even if the data is stored on a disk rather
>> than in memory, because it does not give the slightest care to data
>> reliability.
>
> Wonder if you could show your relevant /etc/fstab lines?  I'm curious
> how others do this.

Putting /tmp on tmpfs is not rocket science - if you thought I had some 
cunning secret here, I have to disappoint you:

tmpfs /tmp tmpfs defaults 0 0
tmpfs /var/tmp tmpfs defaults 0 0

(Note that /var/tmp should really survive a reboot.  However, I have 
never heard of any programs that actually rely on that - but no 
guarantees.  /tmp should always be safe on tmpfs.)

You can make a new tmpfs on another directory:

mkdir t
mount -t tmpfs tmpfs t

By default, tmpfs mounts are limited in size to half your physical ram - 
but you can change that with the "size" mount option.  tmpfs takes 
negligible space overhead - you only use ram/swap for the files stored 
there.
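
As a rough illustration of the "size" option mentioned above, an fstab entry with an explicit limit might look like the following.  The 2G figure and the mode/nosuid/nodev options are assumed purely for the example and are not taken from any setup described in this thread.

```
# /tmp on tmpfs, capped at 2 GiB instead of the default of half of RAM
tmpfs /tmp tmpfs size=2G,mode=1777,nosuid,nodev 0 0
```

An already-mounted tmpfs can normally be resized in place as root with "mount -o remount,size=... <mountpoint>", so the limit is not fixed at boot.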

>>
>> The Linux kernel is good at memory management, and at balancing what
>> goes in ram and what goes in swap.  Clearly you are always faster with
>> x+y ram instead of x ram and y swap, but x+y ram and y swap is even better.
>
> Grant.



#743

From: Grant <omg@grrr.id.au>
Date: 2011-04-15 08:03 +1000
Message-ID: <35qeq61o56nifjseocj5jf7c3h1k6snjv8@4ax.com>
In reply to: #715
On Thu, 14 Apr 2011 09:15:32 +0200, David Brown <david@westcontrol.removethisbit.com> wrote:

>On 14/04/2011 05:42, Grant wrote:
>> On Wed, 13 Apr 2011 09:45:58 +0200, David Brown <david@westcontrol.removethisbit.com> wrote:
>>
>>> On 12/04/2011 22:13, The Natural Philosopher wrote:
>>>> Keith Keller wrote:
>>>>> The
>>>>> kernel will move things to swap if it hasn't been used in a while and
>>>>> free memory is wanted; when the memory frees up, it can leave that data
>>>>> in swap so that it can use more physical memory for other tasks (e.g.,
>>>>> more disk buffers).
>>>>
>>>> That is key. It means the rarely used admin processes that are
>>>> essentially asleep do not fill the RAM.
>>>
>>> Such processes are usually small, but it's a good principle none the less.
>>>
>>>
>>> The reason I like swap space is as a backing store for tmpfs
>>> filesystems.  I usually put /tmp and /var/tmp on tmpfs, and sometimes
>>> have additional tmpfs mounts for odd purposes (such as the build
>>> directories for large compilations - though obviously that's more
>>> desktop than server usage).  Tmpfs is much faster and more efficient
>>> than any other filesystem, even if the data is stored on a disk rather
>>> than in memory, because it does not give the slightest care to data
>>> reliability.
>>
>> Wonder if you could show your relevant /etc/fstab lines?  I'm curious
>> how others do this.
>
>Putting /tmp on tmpfs is not rocket science - if you thought I had some 
>cunning secret here, I have to disappoint you:
>
>tmpfs /tmp tmpfs defaults 0 0
>tmpfs /var/tmp tmpfs defaults 0 0

So from where did I get this?
...
tmpfs           /dev/shm        tmpfs           defaults                0 0
#
# run /tmp in memory, use up to twice physical memory size, 8GB!
none    /tmp    tmpfs           size=8096M,mode=1777,nodev,nosuid       0 0
#

It works too, in that dd'ing to a new file in /tmp will use half memory 
then expand into swap space:

root@pooh:~# time (dd if=/dev/zero bs=1G count=6 of=/tmp/zeroes; sync)
6+0 records in
6+0 records out
6442450944 bytes (6.4 GB) copied, 25.6495 s, 251 MB/s

real    0m27.977s
user    0m0.003s
sys     0m9.449s

Why this confusion with GiB and GB?  dd counts in GiB (bs=1G means 2^30 
bytes) but reports in decimal GB, a bet each way?  And yes, running into 
swap space takes a lot of time ;)
Swap is on RAID10, now set to o2 :)
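
On the GiB/GB question, the two figures in the dd output are the same byte count expressed in different units.  A small sketch of the arithmetic in plain shell integer maths (the variable names are made up for illustration):

```shell
# bs=1G count=6: GNU dd treats the G suffix as 1024^3 bytes
bytes=$((6 * 1024 * 1024 * 1024))
# The same byte count in binary units (GiB) ...
gib=$((bytes / (1024 * 1024 * 1024)))
# ... and in tenths of a decimal GB (integer maths, truncated)
gb_tenths=$((bytes / 100000000))
echo "$bytes bytes = $gib GiB = $((gb_tenths / 10)).$((gb_tenths % 10)) GB"
```

This prints "6442450944 bytes = 6 GiB = 6.4 GB", matching the byte count and the "(6.4 GB)" in the dd summary above.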

root@pooh:~# ls -l /tmp/
total 6303772
-rw-r--r-- 1 root root 6442450944 2011-04-15 07:40 zeroes

root@pooh:~# cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/md127                              partition       8386300 4815424 -1

root@pooh:~# free
             total       used       free     shared    buffers     cached
Mem:       4053296    2269572    1783724          0      47532    1754452
-/+ buffers/cache:     467588    3585708
Swap:      8386300    4815332    3570968

root@pooh:~# time (dd if=/dev/zero bs=1G count=6 of=/tmp/zeroes2; sync)
dd: writing `/tmp/zeroes2': No space left on device
2+0 records in
1+0 records out
2030231552 bytes (2.0 GB) copied, 6.60451 s, 307 MB/s

real    0m8.519s
user    0m0.000s
sys     0m4.110s

root@pooh:~# ls -l /tmp/
total 8290308
-rw-r--r-- 1 root root 6442450944 2011-04-15 07:40 zeroes
-rw-r--r-- 1 root root 2030231552 2011-04-15 07:50 zeroes2

Shouldn't I get 10GB into /tmp if it has 2GB of real memory plus the 8GB 
swap space?  No, because I set the /tmp size; I had to, to make it go 
larger than the tmpfs default.
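
A quick check of the numbers above, as a rough sketch only: the file sizes are copied from the ls -l output and the limit from the size=8096M fstab option, while the variable names are invented for the example.

```shell
# Sizes in bytes, copied from the ls -l output above
zeroes=6442450944
zeroes2=2030231552
# tmpfs limit from the fstab line: size=8096M, i.e. 8096 MiB
limit=$((8096 * 1024 * 1024))
used=$((zeroes + zeroes2))
echo "used:  $used bytes"
echo "limit: $limit bytes"
echo "left:  $(( (limit - used) / 1024 / 1024 )) MiB (truncated)"
```

The two files together come to just under the configured limit, which suggests the second dd stopped at the tmpfs size cap rather than because RAM plus swap ran out.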

root@pooh:~# rm /tmp/z*

Don't leave a saturated /tmp space!
>
>(Note that /var/tmp should really survive a reboot.  However, I have 
>never heard of any programs that actually rely on that - but no 
>guarantees.  /tmp should always be safe on tmpfs.)

Hmm, I don't do anything special for /var/tmp, but on a slack-11.0 box 
that's been up 16 days, it's empty.  On the 'pooh' box above, it's got old 
crap surviving boot for KDE, 2.2MB for a single user, wonder why?  I tend 
towards wanting to flush that one on boot too, or put it on tmpfs.

root@pooh:~# ls -las /var/tmp
total 1
0 drwxrwxrwt  3 root  root   80 2011-01-07 11:10 ./
1 drwxr-xr-x 19 root  root  536 2011-02-10 08:03 ../
0 drwx------  3 grant wheel 128 2011-01-07 11:12 kdecache-grant/
root@pooh:~# ls -las /var/tmp/kdecache-grant/
total 2166
   0 drwx------ 3 grant wheel     128 2011-01-07 11:12 ./
   0 drwxrwxrwt 3 root  root       80 2011-01-07 11:10 ../
   0 drwx------ 2 grant wheel     168 2011-01-07 11:13 kpc/
2162 -rw-r--r-- 1 grant wheel 2211743 2011-01-07 11:12 ksycoca4
   4 -rw-r--r-- 1 grant wheel     358 2011-01-07 11:12 ksycoca4stamp
root@pooh:~# du -sh /var/tmp
2.2M    /var/tmp
>
>You can make a new tmpfs on another directory:
>
>mkdir t
>mount -t tmpfs tmpfs t
>
>By default, tmpfs mounts are limited in size to half your physical ram - 
>but you can change that with the "size" mount option.  tmpfs takes 
>negligible space overhead - you only use ram/swap for the files stored 
>there.

Thanks.

Grant.


