best guess for mount-verification problem

From helbig@asclothestro.multivax.de (Phillip Helbig (undress to reply))
Newsgroups comp.os.vms, comp.sys.dec
Subject best guess for mount-verification problem
Date 2021-06-28 10:27 +0000
Organization Multivax C&R
Message-ID <sbc86h$656$1@gioia.aioe.org>

Cross-posted to 2 groups.

I have a three-node cluster (when no satellite or test system has joined
it) and physical disks (blue SBB in BA356) on each node (no dual-ported
disks; each disk has a direct connection to only one node).  All disks
are HBVS; system disks have both members on one node while others have 
both (in one case three) members on different nodes.  I've been running 
such a setup (though with different machines, even different 
architectures, different disks, different expansion boxes) for decades.

When something fails, I just replace it with something of similar build.
(The main reason for moving to SBB disks was to be able to replace a
disk, the most common failure, without having to dismount the members
the system hosts, shut it down, remove it from the shelf, open it,
replace the disk, close it, put it back on the shelf, boot it and
remount the members.)

For a while now I've noticed disks going in and out of mount
verification.  It is clear which node is involved: all disks with
members on this system, and no others, are affected, so the problem is
confined to that one node.  It is unlikely to be a problem with the
physical SCSI disks themselves.  So, my plan is to replace hardware (and
maybe try to find the problem once the hardware is out of the cluster)
and hope that it goes away.

Theoretically it could be the SCSI cable, but my guess is that it is
either the expansion box or the SCSI card.  (I have had one expansion
box fail, but it failed completely.)  Which is more likely?
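The elimination argument above can be sketched as a set computation. The suspects are the components that every affected device's path passes through and that no unaffected device uses. This is a generic fault-isolation technique, not anything OpenVMS provides. The node, card, cable, box and disk names below are hypothetical placeholders. Note that the computation only narrows the field to the shared path. It cannot rank the SCSI card, cable and BA356 against each other, and that ranking is exactly the open question.

```python
from typing import Dict, Set

# Hypothetical hardware path of each shadow-set member: node, SCSI card,
# cable, expansion box (BA356) and the physical disk itself.
PATHS: Dict[str, Set[str]] = {
    "NODEA_DKA100": {"nodeA", "nodeA_scsi", "nodeA_cable", "nodeA_ba356", "disk1"},
    "NODEA_DKA200": {"nodeA", "nodeA_scsi", "nodeA_cable", "nodeA_ba356", "disk2"},
    "NODEB_DKA100": {"nodeB", "nodeB_scsi", "nodeB_cable", "nodeB_ba356", "disk3"},
    "NODEC_DKA100": {"nodeC", "nodeC_scsi", "nodeC_cable", "nodeC_ba356", "disk4"},
}


def suspects(paths: Dict[str, Set[str]], affected: Set[str]) -> Set[str]:
    """Return the components common to all affected devices that no
    unaffected device depends on."""
    if not affected:
        return set()
    # Components that every affected device's path passes through.
    common = set.intersection(*(paths[d] for d in affected))
    # Components that at least one healthy device also uses.
    healthy = set().union(*(p for d, p in paths.items() if d not in affected))
    return common - healthy


if __name__ == "__main__":
    # Only the members on node A go into mount verification.
    print(sorted(suspects(PATHS, {"NODEA_DKA100", "NODEA_DKA200"})))
    # prints ['nodeA', 'nodeA_ba356', 'nodeA_cable', 'nodeA_scsi']
```

The individual disks drop out because no single disk is shared by all affected members. This matches the reasoning that the physical disks are unlikely to be at fault. The node itself stays on the list, which covers "something else" on that system.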

Has anyone seen anything like this before?  The mount-verification 
problem occurs regularly every few minutes, but always completes 
automatically after a few seconds or half a minute or so (depending on 
the shadow set).

It would be easiest to replace the BA356: dismount the members, power
down the box, remove the members, stick them in another box, swap the
cables, power up the other box, remount the members (and be very 
thankful for MINICOPY).  Of course, if it is exceedingly unlikely that 
the box is the problem, as opposed to the SCSI card (or something else 
which I haven't thought of), then that would be a waste of time.

Thoughts? 
