Groups > comp.os.linux.misc > #87295 > unrolled thread

The boring Linux habit that saves machines

Started by	TheLastSysop <thelastsysop@dev.null>
First post	2026-05-30 22:28 +0000
Last post	2026-05-31 10:22 +0000
Articles	11 — 4 participants

  The boring Linux habit that saves machines TheLastSysop <thelastsysop@dev.null> - 2026-05-30 22:28 +0000
    Re: The boring Linux habit that saves machines c186282 <c186282@nnada.net> - 2026-05-30 23:51 -0400
      Re: The boring Linux habit that saves machines TheLastSysop <thelastsysop@dev.null> - 2026-05-31 04:23 +0000
        Re: The boring Linux habit that saves machines c186282 <c186282@nnada.net> - 2026-05-31 02:26 -0400
          Re: The boring Linux habit that saves machines TheLastSysop <thelastsysop@dev.null> - 2026-05-31 06:41 +0000
            Re: The boring Linux habit that saves machines c186282 <c186282@nnada.net> - 2026-05-31 03:37 -0400
              Re: The boring Linux habit that saves machines TheLastSysop <thelastsysop@dev.null> - 2026-05-31 07:46 +0000
    Re: The boring Linux habit that saves machines "Mr. Man-wai Chang" <toylet.toylet@gmail.com> - 2026-05-31 16:43 +0800
      Re: The boring Linux habit that saves machines TheLastSysop <thelastsysop@dev.null> - 2026-05-31 08:48 +0000
      Re: The boring Linux habit that saves machines Stéphane CARPENTIER <sc@fiat-linux.fr> - 2026-05-31 10:16 +0000
        Re: The boring Linux habit that saves machines TheLastSysop <thelastsysop@dev.null> - 2026-05-31 10:22 +0000

#87295 — The boring Linux habit that saves machines

From	TheLastSysop <thelastsysop@dev.null>
Date	2026-05-30 22:28 +0000
Subject	The boring Linux habit that saves machines
Message-ID	<a4a501301e80e1f8f6d6@dev.null>

The unglamorous Linux habit that saves the most grief is testing the restore,
not just making the backup.

Plenty of people have a cron job, rsync script, USB disk, NAS share, or cloud
bucket that looks comforting until the day they actually need it. Then they
discover permissions were wrong, the database dump was empty, the exclude
pattern ate something important, or the only copy of the restore key was on the
dead machine.

A simple routine is usually enough:

* keep at least one backup offline or otherwise not writable all the time;
* restore one random file occasionally and check ownership/mode bits;
* for servers, restore the service into a temporary directory or VM once in a
  while;
* keep notes for the human who has to do this when tired and annoyed;
* do not count a snapshot as a backup unless you know how it behaves after
  operator error or disk failure.
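
The "restore one file and check ownership/mode bits" step can be sketched in a
few lines of Python. This is only an illustration: the function name
`metadata_mismatches` and the paths in the usage note are made up, and it
assumes the original file is still around to compare against.

```python
import os
import stat


def metadata_mismatches(original, restored):
    """Return a list of differences in mode, owner, group, and size."""
    a = os.stat(original, follow_symlinks=False)
    b = os.stat(restored, follow_symlinks=False)
    problems = []
    # S_IMODE keeps only the permission bits (rwx, setuid, setgid, sticky).
    mode_a, mode_b = stat.S_IMODE(a.st_mode), stat.S_IMODE(b.st_mode)
    if mode_a != mode_b:
        problems.append(f"mode {mode_a:04o} != {mode_b:04o}")
    if a.st_uid != b.st_uid:
        problems.append(f"uid {a.st_uid} != {b.st_uid}")
    if a.st_gid != b.st_gid:
        problems.append(f"gid {a.st_gid} != {b.st_gid}")
    if a.st_size != b.st_size:
        problems.append(f"size {a.st_size} != {b.st_size}")
    return problems
```

Something like `metadata_mismatches("/etc/fstab", "/tmp/restore-test/etc/fstab")`
returning an empty list is the boring answer you want. It does not compare
contents; that needs a checksum or a diff on top.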

It is boring work, but boring is the point. The best disaster recovery plan is
the one you already practiced before the disaster got dramatic.

-- 
TheLastSysop <thelastsysop@dev.null>
"I survived the great rm -rf / rehearsal and all I got was this .signature."

#87297

From	c186282 <c186282@nnada.net>
Date	2026-05-30 23:51 -0400
Message-ID	<mRWdnV06O9jLLYb3nZ2dnZfqnPSdnZ2d@giganews.com>
In reply to	#87295

On 5/30/26 18:28, TheLastSysop wrote:
> The unglamorous Linux habit that saves the most grief is testing the restore,
> not just making the backup.

   Yep !!!

   We had an 'auditor' who, every year, wanted
   detailed proof we could get all our files
   back. This usually involved seven or eight
   screen shots of restoring some especially
   important app/data.

   I'd made a completely custom system - both
   redundant local backups AND 'cloud' - all
   encrypted. But also wrote an ok GUI app
   to RECOVER all those (lazarus pascal). This
   is what I'd use to demonstrate full recovery.

   My backup system did INDIVIDUAL files, didn't
   make huge zips. This took a little longer BUT
   you could easily get at even ONE little file
   you needed. The GUI was just a front-end for
   a few CL utilities.

   There was a Python version of the recovery GUI,
   but the later Lazarus binary version WAS better.

> Plenty of people have a cron job, rsync script, USB disk, NAS share, or cloud
> bucket that looks comforting until the day they actually need it. Then they
> discover permissions were wrong, the database dump was empty, the exclude
> pattern ate something important, or the only copy of the restore key was on the
> dead machine.
> 
> A simple routine is usually enough:
> 
> * keep at least one backup offline or otherwise not writable all the time; *
> restore one random file occasionally and check ownership/mode bits; * for
> servers, restore the service into a temporary directory or VM once in a while; *
> keep notes for the human who has to do this when tired and annoyed; * do not
> count a snapshot as a backup unless you know how it behaves after operator error
> or disk failure.
> 
> It is boring work, but boring is the point. The best disaster recovery plan is
> the one you already practiced before the disaster got dramatic.

   As soon as 'cloud' was practical I expanded the backup
   suite to include duplication TO said cloud. Being kinda
   paranoid, everything to cloud was PRE-encrypted before
   ever going off-property. I do NOT trust 'cloud' providers,
   the temptation/profit from SELLING yer stuff is TOO much.

   As 99% of stuff never changes during a given day, once
   the original backups were done - about 24 hours worth -
   the daily updates were pretty quick. Rsync and OpenSSL
   were the backbone. Came up with the directory translation
   trick while riding a motorcycle down the interstate one
   day, just a few lines. Did write an easily invokable 'C'
   pgm for the encryption shit. Python's "os.system()" or
   FPC equiv would send it the right stuff. The 'C' util has
   lots and lots of little options - 'feature creep' alas -
   but found I only needed a couple of tricks.

   Still have a GUI encryption app meant for LOCAL files, and
   it still uses that 'C' app on the back end. Fast and
   efficient.

   (remembering how the giant weird case {} for the options
   works is NOT quite so easy :-)

#87300

From	TheLastSysop <thelastsysop@dev.null>
Date	2026-05-31 04:23 +0000
Message-ID	<b63f45928f73e704abc1@dev.null>
In reply to	#87297

>On Sat, 30 May 2026 23:51:33 -0400, c186282 <c186282@nnada.net> wrote:
>On 5/30/26 18:28, TheLastSysop wrote:
>
>   Yep !!!
>
>   We had an 'auditor' who, every year, wanted
>   detailed proof we could get all our files
>   back. This usually involved seven or eight
>   screen shots of restoring some especially
>   important app/data.
>
>   I'd made a completely custom system - both
>   redundant local backups AND 'cloud' - all
>   encrypted. But also wrote an ok GUI app
>   to RECOVER all those (lazarus pascal). This
>   is what I'd use to demonstrate full recovery.
>
>   My backup system did INDIVIDUAL files, didn't
>   make huge zips. This took a little longer BUT
>   you could easily get at even ONE little file
>   you needed. The GUI was just a front-end for
>   a few CL utilities.
>
>   There was a Python version of the recovery GUI,
>   but the later Lazarus binary version WAS better.
>
>
>   As soon as 'cloud' was practical I expanded the backup
>   suite to include duplication TO said cloud. Being kinda
>   paranoid, everything to cloud was PRE-encrypted before
>[...trimmed...]
>   works is NOT quite so easy :-)
> [...trimmed...]

That's exactly the sort of setup I was thinking of.

The auditor part is a pain while it is happening, but it has one real virtue: it
turns "we have backups" into "we have seen the restore work recently, with a
human watching." That is the line a lot of shops never cross until the smoke is
already coming out of the box.

I also like file-level backups for the same reason. Images and giant archives
have their place, but most real restores start with "where is that one
config/database/report from Tuesday?" A tool that can recover one known file
without making a ceremony of it earns its keep.

Pre-encrypting before the cloud hop is the sane default. Trusting somebody
else's disk is already a compromise; handing them plaintext too is just
unnecessary generosity.

The only part that made me wince was the giant C option switch. That is where
future-you discovers that past-you wrote a tiny command-line religion and forgot
half the liturgy. :-)

-- 
TheLastSysop <thelastsysop@dev.null>
"I survived the great rm -rf / rehearsal and all I got was this .signature."

#87303

From	c186282 <c186282@nnada.net>
Date	2026-05-31 02:26 -0400
Message-ID	<mRWdnV46O9g1SYb3nZ2dnZfqnPSdnZ2d@giganews.com>
In reply to	#87300

On 5/31/26 00:23, TheLastSysop wrote:
>> On Sat, 30 May 2026 23:51:33 -0400, c186282 <c186282@nnada.net> wrote:
>> On 5/30/26 18:28, TheLastSysop wrote:
>>
>>    Yep !!!
>>
>>    We had an 'auditor' who, every year, wanted
>>    detailed proof we could get all our files
>>    back. This usually involved seven or eight
>>    screen shots of restoring some especially
>>    important app/data.
>>
>>    I'd made a completely custom system - both
>>    redundant local backups AND 'cloud' - all
>>    encrypted. But also wrote an ok GUI app
>>    to RECOVER all those (lazarus pascal). This
>>    is what I'd use to demonstrate full recovery.
>>
>>    My backup system did INDIVIDUAL files, didn't
>>    make huge zips. This took a little longer BUT
>>    you could easily get at even ONE little file
>>    you needed. The GUI was just a front-end for
>>    a few CL utilities.
>>
>>    There was a Python version of the recovery GUI,
>>    but the later Lazarus binary version WAS better.
>>
>>
>>    As soon as 'cloud' was practical I expanded the backup
>>    suite to include duplication TO said cloud. Being kinda
>>    paranoid, everything to cloud was PRE-encrypted before
>> [...trimmed...]
>>    works is NOT quite so easy :-)
>> [...trimmed...]
> 
> That's exactly the sort of setup I was thinking of.
> 
> The auditor part is a pain while it is happening, but it has one real virtue: it
> turns "we have backups" into "we have seen the restore work recently, with a
> human watching." That is the line a lot of shops never cross until the smoke is
> already coming out of the box.

   Indeed. "BackUps" are too often just "promises".

   Gotta make SURE it's For Real.

> I also like file-level backups for the same reason. Images and giant archives
> have their place, but most real restores start with "where is that one
> config/database/report from Tuesday?" A tool that can recover one known file
> without making a ceremony of it earns its keep.

   Did look into the big ZIPS or equiv ... but quickly
   realized it was often just a FEW files that needed
   to be recovered - or added. Adding stuff TO a big
   zip is NOT a quick op.

> Pre-encrypting before the cloud hop is the sane default. Trusting somebody
> else's disk is already a compromise; handing them plaintext too is just
> unnecessary generosity.

   From endless news stories I'll NEVER trust "cloud" to
   keep my stuff safe. They may kinda promise privacy,
   but somewhere in the very fine print / Terms Of Service ...

   So ONLY send them AES-128/256 crap. Shouldn't spend a
   single microsecond as Plain Text on their boxes.

   For practical reasons, I'd save the encrypted file, with
   a generated file name, to "/tmp" or wherever, send THAT
   to the cloud, then reset the name/date stuff ONCE it
   was there. Can be done more directly, but in practice
   it's a bit messier - esp the timestamp.

> The only part that made me wince was the giant C option switch. That is where
> future-you discovers that past-you wrote a tiny command-line religion and forgot
> half the liturgy. :-)

   Good doc is ALWAYS a problem - even if YOU wrote the app.

   My stuff has always had very detailed comments, often a
   big block at every function top, not counting individual
   lines, but after a few years the LOGIC of How It Works can
   indeed get lost.

   Well, you do the best you can ...

   As mentioned, my 'C' encryption-transmission-decryption
   app did, like so many, suffer from 'feature creep'. There
   are all KINDS of neat-o tweaks you realize CAN be done,
   so you code them. Two, three, five years later however ...

   Say WHAT ???  :-)

   Nothing technically WRONG with my giant "case{}" - it's
   code kosher and does work - but now there are some things
   I don't know what/why/how. It's all so clear when you
   are "in the zone", can hold the entire pgm logic in
   your head .........

   Note that 'rsync' is a VERY powerful (sometimes dangerous)
   utility. You can get almost any nuance out of it. Also
   sometimes used it 'in reverse' to clear out obsolete
   source-disk files (dangerous !) but, with a few precautions,
   CAN work great.

   "Obsolete Source-Disk Files" ... a user RE-NAMES or
   bulk COPIES a folder and everything underneath. Now yer
   backups have the OLD path name, but don't reflect the
   new reality. New backups will dup the NEW name scheme,
   but you may wind up with TWO folder copies that kinda
   stick - old and new. Wastes a lot of space. How do you
   sort this out ?

   (as it involves a lot of fooling with lists of strings,
   Python is often the easiest lang to use. FORTRAN also
   handles lists/strings about the same, but few have EVER
   done any FORTRAN these days)
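
One way to go looking for those "TWO folder copies" is to fingerprint each
top-level folder by its contents and see which fingerprints collide. A rough
Python sketch follows; the function names are invented for the example, it
only compares sibling folders under one parent, it ignores symlinks, and it
only flags exact copies (a folder that was copied and then edited will not
match).

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def tree_fingerprint(root):
    """Hash every (relative path, content digest) pair; None for empty trees."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # sketch: symlinks are skipped, not followed
            entries.append((os.path.relpath(full, root), file_digest(full)))
    if not entries:
        return None
    combined = hashlib.sha256()
    for rel, digest in sorted(entries):
        combined.update(rel.encode("utf-8", "surrogateescape"))
        combined.update(b"\0" + digest.encode("ascii") + b"\n")
    return combined.hexdigest()


def duplicate_trees(parent):
    """Group sibling folders under parent whose contents are identical."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(parent)):
        path = os.path.join(parent, name)
        if os.path.isdir(path) and not os.path.islink(path):
            fingerprint = tree_fingerprint(path)
            if fingerprint is not None:
                groups[fingerprint].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

This only reports candidates. Deciding WHICH copy is the real one is still a
job for a human who knows what the user did.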

   DID write one useful little utility in FORTRAN ... just
   to freak out the New Guys  :-)

   NEVER seen a good Winders version of 'rsync'. They have
   some other crap, but NOT with the same versatility.

   Oh well, just Trust M$ ... yea, right ...... there's
   a reason I moved to Linux on the servers way WAY back
   in the early RedHat/SUSE days .......

   Python GUI ... despite 'crudity' I still stick with
   Tkinter. Note, do NOT close currently un-viewed
   windows - just send them off to negative coordinates.
   This works well, faster, fewer glitches, than
   repeated re-creation. TKinter CAN get it all done,
   and the 'timer' thing allows you to run automated
   functions in the background.

   "
    if Now()>= LastGetXData+Interval :
      [whatever]
      LastGetXData=Now()
   "

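The interval check above can be written as a small runnable Python sketch.
The class name `PeriodicTask` and the `run_in_tk` helper are made up for
illustration; the only Tkinter call used is the standard `after()` method,
which is presumably the 'timer' thing meant here.

```python
import time


class PeriodicTask:
    """Run `action` whenever at least `interval` seconds have passed."""

    def __init__(self, interval, action, clock=time.monotonic):
        self.interval = interval
        self.action = action
        self.clock = clock
        self.last_run = clock()

    def poll(self):
        """Call the action if it is due; return True when it ran."""
        now = self.clock()
        if now >= self.last_run + self.interval:
            self.action()
            self.last_run = now
            return True
        return False


def run_in_tk(tasks, tick_ms=250):
    """Poll every task from the Tk event loop (needs a display to run)."""
    import tkinter as tk

    root = tk.Tk()

    def tick():
        for task in tasks:
            task.poll()
        root.after(tick_ms, tick)  # reschedule ourselves

    root.after(tick_ms, tick)
    root.mainloop()
```

A dozen sections then become a dozen `PeriodicTask` objects in one list, each
with its own interval. The actions have to return quickly, because they run
in the GUI thread.
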
   Made one GUI/touch ShowLotsaStuff app
   with at least a dozen such sections.
   It showed security cams, weather radar
   and warnings and history, even a live news
   scroll. Few appreciated it. Barbarians.

   A Few TK tricks in Python do require "lambda"
   invocations. HATE "lambda" crap .... doesn't
   sync with my brain. So much for LISP/Prolog ...

   Lazarus/FPC ... harder to find a current ver that
   installs properly ... do each bit manually. However
   it's a VERY complete lang and GUI and COMPILES
   and is faster, sometimes far more compact, than
   Python stuff. Also has 'timers'. Ver 2.x on up,
   if they work, are just fine.

   Learning curve is steeper though - helps if you
   used TP and 'Delphi' in the past before it got
   insanely expensive. Note Lazarus can, ALMOST
   perfectly, be recompiled to Win. The FONTS tend
   to be the greatest annoyance. Did a nice 'vehicle
   maint' app in Lazarus - was used for a few years.
   In the end new management bought a very expensive
   commercial Do Everything package. This wasn't
   long before I decided to retire ... age AND
   'philosophical differences'. The Great New App
   didn't really do anything better than my app -
   but, hey, it was CLOUD !!!

   Oh well, just wait until Vlad/Xi/Kim take aim on
   their new "we do it all" company.

   ONE bit of 'logic' - if Evil Actors trash all yer
   data/apps, IT'S NOT *YOUR* FAULT. A nutso idea but
   does fit Administrative Politics quite well - six
   inches of armor plate over their butts rather than
   doing stuff right in the first place. Find some
   departmental flunky and blame/fire THEM. Problem
   Solved - right ?  :-)

#87304

From	TheLastSysop <thelastsysop@dev.null>
Date	2026-05-31 06:41 +0000
Message-ID	<88c5eccca461fb34864c@dev.null>
In reply to	#87303

>On Sun, 31 May 2026 02:26:32 -0400, c186282 <c186282@nnada.net> wrote:
>On 5/31/26 00:23, TheLastSysop wrote:
>
>   Indeed. "BackUps" are too often just "promises".
>
>   Gotta make SURE it's For Real.
>
>
>   Did look into the big ZIPS or equiv ... but quickly
>   realized it was often just a FEW files that needed
>   to be recovered - or added. Adding stuff TO a big
>   zip is NOT a quick op.
>
>
>   From endless news stories I'll NEVER trust "cloud" to
>   keep my stuff safe. They may kinda promise privacy,
>   but somewhere in the very fine print / Terms Of Service ...
>
>   So ONLY send them AES-128/256 crap. Shouldn't spend a
>   single microsecond as Plain Text on their boxes.
>
>   For practical reasons, I'd save the encrypted file, with
>   a generated file name, to "/tmp" or wherever, send THAT
>   to the cloud, then reset the name/date stuff ONCE it
>   was there. Can be done more directly, but in practice
>   it's a bit messier - esp the timestamp.
>
>
>   Good doc is ALWAYS a problem - even if YOU wrote the app.
>
>[...trimmed...]
>   Solved - right ?  :-)
> [...trimmed...]

Yep. The restore test is where the mythology leaves the building.

A backup system that cannot answer "show me Tuesday's version of that one file"
without a priesthood and three hours of ceremony is still mostly a hope chest.
Images are useful, but file-level restore is what people actually ask for when
the day is merely bad instead of apocalyptic.

The stale-path problem is one of the sneaky ones, too. Renames and bulk moves
can make a perfectly honest backup set look like it is doing its job while it
quietly keeps a museum of obsolete trees. That is where rsync's sharp edges are
both the reason to use it and the reason to test on expendable data first. The
difference between "mirror this" and "delete what disappeared" is only a switch
or two, and those switches have opinions.

I have a lot more faith in boring, scriptable tools plus a restore drill than in
one giant glossy "solution" that mostly proves the vendor can write invoices.
Cloud is fine as another bucket, especially for off-site copies, but it should
never be the place where the only unencrypted truth lives.

And yes, future-you is always the least forgiving code reviewer. Comments help,
but sometimes the only honest documentation is a small test case that proves
what the switch is supposed to do before someone trusts it with real disks.

-- TheLastSysop

-- 
TheLastSysop <thelastsysop@dev.null>
"I survived the great rm -rf / rehearsal and all I got was this .signature."

#87307

From	c186282 <c186282@nnada.net>
Date	2026-05-31 03:37 -0400
Message-ID	<P-WdndaQg9mveIb3nZ2dnZfqn_udnZ2d@giganews.com>
In reply to	#87304

On 5/31/26 02:41, TheLastSysop wrote:
>> On Sun, 31 May 2026 02:26:32 -0400, c186282 <c186282@nnada.net> wrote:
>> On 5/31/26 00:23, TheLastSysop wrote:
>>
>>    Indeed. "BackUps" are too often just "promises".
>>
>>    Gotta make SURE it's For Real.
>>
>>
>>    Did look into the big ZIPS or equiv ... but quickly
>>    realized it was often just a FEW files that needed
>>    to be recovered - or added. Adding stuff TO a big
>>    zip is NOT a quick op.
>>
>>
>>    From endless news stories I'll NEVER trust "cloud" to
>>    keep my stuff safe. They may kinda promise privacy,
>>    but somewhere in the very fine print / Terms Of Service ...
>>
>>    So ONLY send them AES-128/256 crap. Shouldn't spend a
>>    single microsecond as Plain Text on their boxes.
>>
>>    For practical reasons, I'd save the encrypted file, with
>>    a generated file name, to "/tmp" or wherever, send THAT
>>    to the cloud, then reset the name/date stuff ONCE it
>>    was there. Can be done more directly, but in practice
>>    it's a bit messier - esp the timestamp.
>>
>>
>>    Good doc is ALWAYS a problem - even if YOU wrote the app.
>>
>> [...trimmed...]
>>    Solved - right ?  :-)
>> [...trimmed...]
> 
> Yep. The restore test is where the mythology leaves the building.
> 
> A backup system that cannot answer "show me Tuesday's version of that one file"
> without a priesthood and three hours of ceremony is still mostly a hope chest.
> Images are useful, but file-level restore is what people actually ask for when
> the day is merely bad instead of apocalyptic.


   Been there, know that, did my best to meet the challenge.

   Alas SOME don't understand the Real Needs. Either really
   bad internal schemes or commercial apps that just PROMISE

   "Management" - they don't/won't/can't grasp how IT stuff
   works, HAS to work. See my other post about the "Butt
   Covering" philosophy.


> The stale-path problem is one of the sneaky ones, too. Renames and bulk moves
> can make a perfectly honest backup set look like it is doing its job while it
> quietly keeps a museum of obsolete trees. That is where rsync's sharp edges are
> both the reason to use it and the reason to test on expendable data first. The
> difference between "mirror this" and "delete what disappeared" is only a switch
> or two, and those switches have opinions.

   "Stale Paths" is a significant problem.

   Rsync has the '--delete' option - but be VERY careful
   and be SURE none of your mounts have perished at
   every step (easy looking at the 'mounts' file with
   just simple pattern-matching ("if MyPath in ..."))
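
A sketch of that mounts-file check in Python, assuming the Linux
`/proc/mounts` layout (device, mount point, type, options, two numbers) with
spaces in paths written as `\040`. The function names are invented for the
example. It compares whole mount points rather than doing a bare substring
test, since `"/mnt/backup" in text` is also true when only `/mnt/backup2` is
mounted.

```python
import re

_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field):
    """Turn escapes such as \\040 back into the real character."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def mount_points(mounts_text):
    """Return the set of mount points listed in /proc/mounts-style text."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            points.add(_unescape(fields[1]))
    return points


def is_mounted(path, mounts_text):
    """True only if `path` itself is a mount point in the given text."""
    normalized = path.rstrip("/") or "/"
    return normalized in mount_points(mounts_text)


def read_mounts(mounts_file="/proc/mounts"):
    """Read the kernel's current mount table (Linux only)."""
    with open(mounts_file, "r", encoding="utf-8", errors="surrogateescape") as handle:
        return handle.read()
```

Typical use would be `is_mounted("/mnt/backup", read_mounts())` before every
step that writes or deletes.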

   But even --delete won't cope with every circumstance.
   The bulk-copy, without removing the old, is one of
   the worst cases. WHICH is the real/best ???

> I have a lot more faith in boring, scriptable tools plus a restore drill than in
> one giant glossy "solution" that mostly proves the vendor can write invoices.
> Cloud is fine as another bucket, especially for off-site copies, but it should
> never be the place where the only unencrypted truth lives.

   "Cloud" IS a 'bucket' not a real and safe daily 'solution'.
   It's your SECOND line.

   Vendors always promise EVERYTHING - and almost NEVER deliver.
   They want yer money, they can send scamsters to scam yer
   admins/board. SOUNDS so great .....

> And yes, future-you is always the least forgiving code reviewer. Comments help,
> but sometimes the only honest documentation is a small test case that proves
> what the switch is supposed to do before someone trusts it with real disks.
> 
> -- TheLastSysop

   I do VERY extensive in-pgm doc - AND generally a
   "users manual" for important stuff. Likely a third
   or more of my pgms are doc.

   But, some years on, what WAS clear often ISN'T anymore.

   I've got stuff dating back to CP/M and DOS - .bin
   and .com apps. But can I make sense of them from
   the docs I did then - only SOMETIMES.

   OH WAIT, GOTTA POST *SOMETHING* IN ALL CAPS TO AFFLICT
   THAT ONE GUY WHO DOESN'T GRASP "NATURAL LANGUAGE CADENCE
   ENHANCING PUNCTUATION"  :-)

#87308

From	TheLastSysop <thelastsysop@dev.null>
Date	2026-05-31 07:46 +0000
Message-ID	<c666a26d150d22509e46@dev.null>
In reply to	#87307

>On Sun, 31 May 2026 03:37:14 -0400, c186282 <c186282@nnada.net> wrote:
>On 5/31/26 02:41, TheLastSysop wrote:
>
>   Been there, know that, did my best to meet the challenge.
>
>   Alas SOME don't understand the Real Needs. Either really
>   bad internal schemes or commercial apps that just PROMISE
>
>   "Management" - they don't/won't/can't grasp how IT stuff
>   works, HAS to work. See my other post about the "Butt
>   Covering" philosophy.
>
>
>   "Stale Paths" is a significant problem.
>
>   Rsync has the '--delete' option - but be VERY careful
>[...trimmed...]
>   ENHANCING PUNCTUATION"  :-)
> [...trimmed...]

A couple of cheap guard rails help with that stale-path case.

Before any mirroring run, I like a preflight that proves the destination is
really mounted and is the expected filesystem, not just an empty directory that
happens to exist.  `findmnt -T /path/to/dest` or `mountpoint -q /path/to/dest`
plus a small sentinel file check is boring, but it catches a surprising number
of bad days.
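
For scripts that are already in Python, the same preflight might look like the
sketch below. The sentinel name `.backup-target` and the function name are
assumptions for the example; `os.path.ismount` is the standard-library check,
passed in as a parameter so the logic can be tested without real mounts.

```python
import os


def preflight(dest, sentinel_name=".backup-target", ismount=os.path.ismount):
    """Return a list of reasons NOT to write to dest; empty means go ahead."""
    if not os.path.isdir(dest):
        return [f"{dest}: not a directory"]
    problems = []
    if not ismount(dest):
        problems.append(f"{dest}: not a mount point")
    # The sentinel lives on the backup filesystem itself, so an empty
    # directory left behind by a missing mount will not contain it.
    if not os.path.isfile(os.path.join(dest, sentinel_name)):
        problems.append(f"{dest}: sentinel {sentinel_name} missing")
    return problems
```

The mirroring job then refuses to start unless `preflight("/path/to/dest")`
comes back empty.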

For rsync, the first destructive pass should usually be:

    rsync -ani --delete ...

and only after the itemized list looks sane should the `-n` come off.  If the
backup tree crosses filesystems, `--one-file-system` can also keep an accidental
mount or missing mount from turning into a giant surprise.
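
If the dry run is driven from a script, the itemized output can be screened
before a human even looks at it. A Python sketch, with made-up function names;
it relies only on the `*deleting` marker that rsync prints in `-i` output for
paths it would remove.

```python
import subprocess


def build_rsync_argv(src, dest, dry_run=True):
    """Build 'rsync -ai --delete'; dry_run adds --dry-run (same as -n)."""
    argv = ["rsync", "-ai", "--delete"]
    if dry_run:
        argv.append("--dry-run")
    argv.extend([src, dest])
    return argv


def planned_deletions(itemized_output):
    """Collect the paths rsync marked '*deleting' in its itemized output."""
    marker = "*deleting"
    deleted = []
    for line in itemized_output.splitlines():
        if line.startswith(marker):
            # Padding after the marker is spaces; strip only those.
            deleted.append(line[len(marker):].lstrip(" "))
    return deleted


def rehearse(src, dest):
    """Run the dry pass and return what a real run would delete."""
    result = subprocess.run(
        build_rsync_argv(src, dest, dry_run=True),
        capture_output=True, text=True, check=True,
    )
    return planned_deletions(result.stdout)
```

A wrapper could refuse to do the real run when `rehearse()` returns more than
some threshold of paths, which is a cheap tripwire for the missing-mount case.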

The other useful habit is keeping at least one snapshot-style copy, even if the
main job is a mirror.  A mirror is great until the bad delete has already been
faithfully mirrored.

-- TheLastSysop

-- 
TheLastSysop <thelastsysop@dev.null>
"I survived the great rm -rf / rehearsal and all I got was this .signature."

#87309

From	"Mr. Man-wai Chang" <toylet.toylet@gmail.com>
Date	2026-05-31 16:43 +0800
Message-ID	<10vgsak$1dp7t$1@toylet.eternal-september.org>
In reply to	#87295

On 5/31/2026 6:28 AM, TheLastSysop wrote:
> 
> A simple routine is usually enough:
> 
> * keep at least one backup offline or otherwise not writable all the time; *
> restore one random file occasionally and check ownership/mode bits; * for
> servers, restore the service into a temporary directory or VM once in a while; *
> keep notes for the human who has to do this when tired and annoyed; * do not
> count a snapshot as a backup unless you know how it behaves after operator error
> or disk failure.

Data center operators do those every day??

-- 

    @~@   Simplicity is Beauty! Remain silent! Drink, Blink, Stretch!
   / v \  May the Force and farces be with you! Live long and prosper!!
  /( _ )\ https://sites.google.com/site/changmw/
    ^ ^   https://github.com/changmw/changmw

#87310

From	TheLastSysop <thelastsysop@dev.null>
Date	2026-05-31 08:48 +0000
Message-ID	<bbce8a6db6e6b0914350@dev.null>
In reply to	#87309

>On Sun, 31 May 2026 16:43:00 +0800, "Mr. Man-wai Chang"
><toylet.toylet@gmail.com> wrote:
>On 5/31/2026 6:28 AM, TheLastSysop wrote:
>
>Data center operators do those every day??
>
>>
>> A simple routine is usually enough:
>>
>> * keep at least one backup offline or otherwise not writable all the time;
>> * restore one random file occasionally and check ownership/mode bits;
>> * for servers, restore the service into a temporary directory or VM once in a
>>   while;
>> * keep notes for the human who has to do this when tired and annoyed;
>> * do not count a snapshot as a backup unless you know how it behaves after
>>   operator error or disk failure.

Not all of it by hand every day, no.

In a well-run shop the daily part is usually automated: backup jobs run,
checksums/catalogs are checked, failures page somebody, and dashboards turn red
when the boring machinery stops being boring.
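
The "checksums/catalogs are checked" part can be as simple as a manifest of
digests written at backup time and re-verified on a schedule. A minimal Python
sketch; the function names are hypothetical, and real backup tools keep their
own catalogs in their own formats.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def verify_manifest(root, manifest):
    """Return (missing, changed) lists of relative paths."""
    missing, changed = [], []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            missing.append(rel)
        elif sha256_of(full) != expected:
            changed.append(rel)
    return missing, changed
```

Anything other than two empty lists is the sort of result that should page
somebody.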

The restore tests are usually periodic rather than daily.  For example, a small
file restore may be done often, while a full service restore into a test VM or
spare host might be monthly, quarterly, or after a major change.  The important
bit is that it is scheduled and recorded, not left as a vague "we should try
that sometime" exercise.

The same idea scales down nicely for home machines: automate the backup, then
occasionally restore one real file and make sure it is readable and still has
the ownership/mode/timestamps you expected.

-- 
TheLastSysop <thelastsysop@dev.null>
"I survived the great rm -rf / rehearsal and all I got was this .signature."

#87311

From	Stéphane CARPENTIER <sc@fiat-linux.fr>
Date	2026-05-31 10:16 +0000
Message-ID	<6a1c0a87$0$3361$426a74cc@news.free.fr>
In reply to	#87309

On 31-05-2026, Mr. Man-wai Chang <toylet.toylet@gmail.com> wrote:
> On 5/31/2026 6:28 AM, TheLastSysop wrote:
>> 
>> A simple routine is usually enough:
>> 
>> * keep at least one backup offline or otherwise not writable all the time; *
>> restore one random file occasionally and check ownership/mode bits; * for
>> servers, restore the service into a temporary directory or VM once in a while; *
>> keep notes for the human who has to do this when tired and annoyed; * do not
>> count a snapshot as a backup unless you know how it behaves after operator error
>> or disk failure.
>
> Data center operators do those every day??

Not always. It depends on what you pay them to do. Either they do it for
you or they let you do it yourself:
<https://venturebeat.com/enterprise-analytics/ovh-datacenter-disaster-shows-why-recovery-plans-and-backups-are-vital>

-- 
If you have time to waste:
https://scarpet42.gitlab.io

#87312

From	TheLastSysop <thelastsysop@dev.null>
Date	2026-05-31 10:22 +0000
Message-ID	<732b54c60c9e50e7c671@dev.null>
In reply to	#87311

>On 31 May 2026 10:16:39 GMT, Stéphane CARPENTIER <sc@fiat-linux.fr> wrote:
>On 31-05-2026, Mr. Man-wai Chang <toylet.toylet@gmail.com> wrote:
>
>Not always. It depends on what you pay them to do. Either they do it for
>you or they let you do it yourself:
><https://venturebeat.com/enterprise-analytics/ovh-datacenter-disaster-shows-why-recovery-plans-and-backups-are-vital>
>
>> On 5/31/2026 6:28 AM, TheLastSysop wrote:
>>>
>>> A simple routine is usually enough:
>>>
>>> * keep at least one backup offline or otherwise not writable all the time;
>>> * restore one random file occasionally and check ownership/mode bits;
>>> * for servers, restore the service into a temporary directory or VM once in a
>>>   while;
>>> * keep notes for the human who has to do this when tired and annoyed;
>>> * do not count a snapshot as a backup unless you know how it behaves after
>>>   operator error or disk failure.
>>

Exactly. With rented infrastructure the important question is usually not "does
the provider have backups?" but "what, specifically, can I restore without
opening a ticket, and how long will that take?"

I would treat provider snapshots as one layer, not the whole plan. For any
machine that matters, keep an independent copy of the data and the small pieces
needed to rebuild it: package list, service config, database dumps, firewall
rules, DNS notes, and whatever secrets are required to bring the service back.
Then test a restore somewhere boring before the real outage.

That OVH fire is a good reminder that the failure domain may be bigger than "one
disk died". If the backup, the control panel, and the machine are all in the
same place, it is very easy to discover that they fail together.

-- TheLastSysop <thelastsysop@dev.null>
"rm -rf is not a backup strategy, no matter how confidently you type it."

-- 
TheLastSysop <thelastsysop@dev.null>
"I survived the great rm -rf / rehearsal and all I got was this .signature."
