Groups > comp.unix.solaris > #18740 > unrolled thread

The Day the Uptime Died

Started by: Tom Mix <tommix@dev.null>
First post: 2025-10-26 00:55 +0000
Last post: 2025-10-26 08:13 -0400
Articles: 6 (4 participants)



Contents

  The Day the Uptime Died Tom Mix <tommix@dev.null> - 2025-10-26 00:55 +0000
    Re: The Day the Uptime Died The Wizard of Izz <horchata12839@gmail.com> - 2025-10-25 20:39 -0500
    Re: The Day the Uptime Died Marco Moock <mm+solani@dorfdsl.de> - 2025-10-26 09:58 +0100
      Re: The Day the Uptime Died Tom Mix <tommix@dev.null> - 2025-10-26 12:02 +0000
        Re: The Day the Uptime Died Winston <wbe@UBEBLOCK.psr.com.invalid> - 2025-10-26 08:26 -0400
    Re: The Day the Uptime Died Winston <wbe@UBEBLOCK.psr.com.invalid> - 2025-10-26 08:13 -0400

#18740 — The Day the Uptime Died

From: Tom Mix <tommix@dev.null>
Date: 2025-10-26 00:55 +0000
Subject: The Day the Uptime Died
Message-ID: <slrn10fqscb.j20h.tommix@devnull.org>

We had this ancient SunFire V440 that had been quietly running an internal
license daemon since the Bush administration. Nobody dared touch it. It
wasn’t in the inventory, wasn’t in monitoring, and nobody had a password we
were sure still worked. It just sat there in the corner, fans humming like a
Zen monk, serving licenses and judgment in equal measure.

Then one day, Facilities decided to move racks for “airflow optimization.”
They pulled the plug without asking anyone. When I saw it offline, my blood
went cold. That box had an uptime older than some of the new hires.

We plugged it back in, hit power, and the console lit up with hieroglyphs —
firmware banner from 2004, date in another century. The POST took five
minutes, then it stopped dead with a cheerful message:

WARNING: NVRAM checksum invalid. Restoring factory defaults.

At that moment, I knew this was going to be a séance, not a reboot. The
boot PROM had lost all its environment variables — no boot-device, no
diag-switch, nothing. It just sat there, blinking at me, waiting for a
manual boot command like it was 1999.

We had to hunt down an old Solaris 9 CD and use an external USB CD-ROM —
which, of course, didn’t work without a firmware update. That’s how I ended
up flashing a system older than my laptop battery just to make it see a
drive.

Eventually, it booted. License server came back like nothing happened. I
didn’t cheer. I just stared at it, then whispered, “Don’t you ever do that
again.”

Management said, “Make sure this never happens again.”
I said, “Sure. Step one: Don’t move racks with archaeology still in them.”

-- 
Tom Mix



#18741

From: The Wizard of Izz <horchata12839@gmail.com>
Date: 2025-10-25 20:39 -0500
Message-ID: <mm5cdrFh9fhU2@mid.individual.net>
In reply to: #18740

Tom Mix wrote:
> We had this ancient SunFire V440 that had been quietly running an internal
> license daemon since the Bush administration. 
Did you take a hammer to it?
-- 
He's got a Hologram!



#18742

From: Marco Moock <mm+solani@dorfdsl.de>
Date: 2025-10-26 09:58 +0100
Message-ID: <10dknr9$1kg87$3@solani.org>
In reply to: #18740

On 26.10.2025 at 00:55, Tom Mix wrote:

> Management said, “Make sure this never happens again.”
> I said, “Sure. Step one: Don’t move racks with archaeology still in
> them.”

And that's why running machines that are unknown to the employees is a
rather bad idea.

Ancient machines will fail at some point.
Is there someone who knows what is running on them and how to set that
up again on another machine?

No?
Then you might have a really bad day, in a situation where you can least
afford such an outage.

Do you have spare parts?
Do you have all the installation media and backups of it?

TLDR: I like to run old stuff for fun, but I would never run such a
machine in a mission-critical environment, as the risk of a long-term
outage is too high.

-- 
Regards
Marco

Please send spam and advertising to
1761432939ichwillgesperrtwerden@nirvana.admins.ws



#18743

From: Tom Mix <tommix@dev.null>
Date: 2025-10-26 12:02 +0000
Message-ID: <slrn10fs3fc.1eeu1.tommix@devnull.org>
In reply to: #18742

On 2025-10-26, Marco Moock <mm+solani@dorfdsl.de> wrote:
> On 26.10.2025 at 00:55, Tom Mix wrote:
>
>> Management said, “Make sure this never happens again.”
>> I said, “Sure. Step one: Don’t move racks with archaeology still in
>> them.”
>
> And that's why running machines that are unknown to the employees is a
> rather bad idea.
>
> Ancient machines will fail at some point.
> Is there someone who knows what is running on them and how to set that
> up again on another machine?
>
> No?
> Then you might have a really bad day, in a situation where you can least
> afford such an outage.
>
> Do you have spare parts?
> Do you have all the installation media and backups of it?
>
> TLDR: I like to run old stuff for fun, but I would never run such a
> machine in a mission-critical environment, as the risk of a long-term
> outage is too high.
>

Relax, this was just a story from 8-10 years ago. All is good.

-- 
Tom Mix



#18745

From: Winston <wbe@UBEBLOCK.psr.com.invalid>
Date: 2025-10-26 08:26 -0400
Message-ID: <yd3475etmo.fsf@UBEblock.psr.com>
In reply to: #18743

Tom Mix <tommix@dev.null> writes:
> Relax, this was just a story from 8-10 years ago. All is good

It would have been helpful to say something to that effect in the
original article.
 -WBE



#18744

From: Winston <wbe@UBEBLOCK.psr.com.invalid>
Date: 2025-10-26 08:13 -0400
Message-ID: <yd7bwheu8y.fsf@UBEblock.psr.com>
In reply to: #18740

Tom Mix <tommix@dev.null> writes:
> WARNING: NVRAM checksum invalid. Restoring factory defaults.
[...]
> We had to hunt down an old Solaris 9 CD and use an external USB CD-ROM ―
> which, of course, didn’t work without a firmware update. That’s how I ended
> up flashing a system older than my laptop battery just to make it see a
> drive.

Short-term: document that recovery procedure.
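
One possible starting point for that documentation, as a rough sketch
rather than a tested procedure: while the box is still up, keep a dated
copy of the OpenBoot settings so they can be re-entered by hand after the
next NVRAM reset. On Solaris, eeprom(1M) run with no arguments prints the
variables as name=value lines (it probably needs root). The function name,
the EEPROM_CMD override, and the output file name are made up here for
illustration.

```shell
# Sketch only: save the current OpenBoot variables to a dated text file.
# EEPROM_CMD is a hypothetical override so the function can be tried on a
# machine that has no eeprom command; by default it runs plain "eeprom".
save_obp_settings() {
    outdir=$1
    cmd=${EEPROM_CMD:-eeprom}
    stamp=`date +%Y%m%d`
    outfile="$outdir/obp-settings-$stamp.txt"

    # Dump every variable as name=value.
    $cmd > "$outfile" || return 1

    # Echo the two settings mentioned in the original post as a sanity check.
    grep '^boot-device=' "$outfile"
    grep '^diag-switch?=' "$outfile"
}
```

Something like `save_obp_settings /var/tmp` would then leave a file that
can be printed and taped inside the rack door, which is about the right
level of technology for this machine.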

Medium-term: It's sometimes possible (not easy, possible) to replace the
EEPROM battery, or to hook up another battery to the right pins on the
EEPROM chip.

> Eventually, it booted. License server came back like nothing happened. I
> didn’t cheer. I just stared at it, then whispered, “Don’t you ever do that
> again.”
>
> Management said, “Make sure this never happens again.”
> I said, “Sure. Step one: Don’t move racks with archaeology still in them.”

Medium-term: disks and fans die.  If it has RAID or mirrored disks, at
least the disk failures aren't fatal.

For the longer term: make a complete disk copy, as in
dd if=Sun-disk(s) of=whatever, and then look into virtual machines,
qemu, and the like.
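
To make the dd step a little more concrete, here is a minimal sketch that
copies a source to an image file and compares checksums afterwards. The
function name and paths are placeholders. It assumes the source is not
being written to during the copy (for example, the machine is booted from
CD or is otherwise quiet), and that the target has room for the whole
image.

```shell
# Sketch only: copy a disk (or any file) to an image and verify the copy.
# Both arguments are placeholders; on the real machine the source would be
# a raw whole-disk device, which is an assumption about your setup.
image_disk() {
    src=$1
    img=$2

    # Plain block copy; 1 MiB blocks just keep the number of reads down.
    dd if="$src" of="$img" bs=1048576 || return 1

    # cksum is POSIX, so it should be available on old and new hosts alike.
    src_sum=`cksum < "$src"` || return 1
    img_sum=`cksum < "$img"` || return 1

    if [ "$src_sum" = "$img_sum" ]; then
        echo "OK: $img matches $src"
    else
        echo "MISMATCH: $img differs from $src" >&2
        return 1
    fi
}
```

A call might look like `image_disk /dev/rdsk/c0t0d0s2 /mnt/backup/disk0.img`,
where both names are hypothetical (slice 2 is the conventional whole-disk
slice on a VTOC-labelled Solaris disk). Whether the resulting image boots
under an emulator is a separate question that this does not answer.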

Just a suggestion,
 -WBE


