Groups > comp.unix.solaris > #18740 > unrolled thread
| Started by | Tom Mix <tommix@dev.null> |
|---|---|
| First post | 2025-10-26 00:55 +0000 |
| Last post | 2025-10-26 08:13 -0400 |
| Articles | 6 — 4 participants |
The Day the Uptime Died Tom Mix <tommix@dev.null> - 2025-10-26 00:55 +0000
Re: The Day the Uptime Died The Wizard of Izz <horchata12839@gmail.com> - 2025-10-25 20:39 -0500
Re: The Day the Uptime Died Marco Moock <mm+solani@dorfdsl.de> - 2025-10-26 09:58 +0100
Re: The Day the Uptime Died Tom Mix <tommix@dev.null> - 2025-10-26 12:02 +0000
Re: The Day the Uptime Died Winston <wbe@UBEBLOCK.psr.com.invalid> - 2025-10-26 08:26 -0400
Re: The Day the Uptime Died Winston <wbe@UBEBLOCK.psr.com.invalid> - 2025-10-26 08:13 -0400
| From | Tom Mix <tommix@dev.null> |
|---|---|
| Date | 2025-10-26 00:55 +0000 |
| Subject | The Day the Uptime Died |
| Message-ID | <slrn10fqscb.j20h.tommix@devnull.org> |
We had this ancient SunFire V440 that had been quietly running an internal license daemon since the Bush administration. Nobody dared touch it. It wasn’t in the inventory, wasn’t in monitoring, and nobody had a password we were sure still worked. It just sat there in the corner, fans humming like a Zen monk, serving licenses and judgment in equal measure.

Then one day, Facilities decided to move racks for “airflow optimization.” They pulled the plug without asking anyone. When I saw it offline, my blood went cold. That box had an uptime older than some of the new hires.

We plugged it back in, hit power, and the console lit up with hieroglyphs: firmware banner from 2004, date in another century. The POST took five minutes, then it stopped dead with a cheerful message:

    WARNING: NVRAM checksum invalid. Restoring factory defaults.

At that moment, I knew this was going to be a séance, not a reboot. The boot PROM had lost all its environment variables: no boot-device, no diag-switch, nothing. It just sat there, blinking at me, waiting for a manual boot command like it was 1999.

We had to hunt down an old Solaris 9 CD and use an external USB CD-ROM, which, of course, didn’t work without a firmware update. That’s how I ended up flashing a system older than my laptop battery just to make it see a drive.

Eventually, it booted. License server came back like nothing happened. I didn’t cheer. I just stared at it, then whispered, “Don’t you ever do that again.”

Management said, “Make sure this never happens again.”
I said, “Sure. Step one: Don’t move racks with archaeology still in them.”

--
Tom Mix
| From | The Wizard of Izz <horchata12839@gmail.com> |
|---|---|
| Date | 2025-10-25 20:39 -0500 |
| Message-ID | <mm5cdrFh9fhU2@mid.individual.net> |
| In reply to | #18740 |
Tom Mix wrote:
> We had this ancient SunFire V440 that had been quietly running an internal
> license daemon since the Bush administration.

Did you take a hammer to it?

--
He's got a Hologram!
| From | Marco Moock <mm+solani@dorfdsl.de> |
|---|---|
| Date | 2025-10-26 09:58 +0100 |
| Message-ID | <10dknr9$1kg87$3@solani.org> |
| In reply to | #18740 |
On 26.10.2025 at 00:55, Tom Mix wrote:
> Management said, “Make sure this never happens again.”
> I said, “Sure. Step one: Don’t move racks with archaeology still in
> them.”

And that's why running machines that are unknown to the employees is a rather bad idea.

Ancient machines will fail at some point.
Is there someone who knows what is running on them and how to set that up again on another machine?

No?
Then you might have a really bad day, in a situation where you would least want such an outage.

Do you have spare parts?
Do you have all the installation media and backups of it?

TLDR: I like to run old stuff for fun, but I would never run such a machine in a mission-critical environment, as the risk of a long-term outage is too high.

--
Regards,
Marco

Spam and advertising, please, to
1761432939ichwillgesperrtwerden@nirvana.admins.ws
| From | Tom Mix <tommix@dev.null> |
|---|---|
| Date | 2025-10-26 12:02 +0000 |
| Message-ID | <slrn10fs3fc.1eeu1.tommix@devnull.org> |
| In reply to | #18742 |
On 2025-10-26, Marco Moock <mm+solani@dorfdsl.de> wrote:
> On 26.10.2025 at 00:55, Tom Mix wrote:
>
>> Management said, “Make sure this never happens again.”
>> I said, “Sure. Step one: Don’t move racks with archaeology still in
>> them.”
>
> And that's why running machines that are unknown to the employees is a
> rather bad idea.
>
> Ancient machines will fail at some point.
> Is there someone who knows what is running on them and how to set that
> up again on another machine?
>
> No?
> Then you might have a really bad day, in a situation where you would
> least want such an outage.
>
> Do you have spare parts?
> Do you have all the installation media and backups of it?
>
> TLDR: I like to run old stuff for fun, but I would never run such a
> machine in a mission-critical environment, as the risk of a long-term
> outage is too high.

Relax, this was just a story from 8-10 years ago. All is good.

--
Tom Mix
| From | Winston <wbe@UBEBLOCK.psr.com.invalid> |
|---|---|
| Date | 2025-10-26 08:26 -0400 |
| Message-ID | <yd3475etmo.fsf@UBEblock.psr.com> |
| In reply to | #18743 |
Tom Mix <tommix@dev.null> writes:
> Relax, this was just a story from 8-10 years ago. All is good

It would have been helpful to say something to that effect in the original article.
 -WBE
| From | Winston <wbe@UBEBLOCK.psr.com.invalid> |
|---|---|
| Date | 2025-10-26 08:13 -0400 |
| Message-ID | <yd7bwheu8y.fsf@UBEblock.psr.com> |
| In reply to | #18740 |
Tom Mix <tommix@dev.null> writes:
> WARNING: NVRAM checksum invalid. Restoring factory defaults.
[...]
> We had to hunt down an old Solaris 9 CD and use an external USB CD-ROM,
> which, of course, didn’t work without a firmware update. That’s how I ended
> up flashing a system older than my laptop battery just to make it see a
> drive.

Short-term: document that recovery procedure.

Medium-term: It's sometimes possible (not easy, possible) to replace the EEPROM battery, or to hook up another battery to the right pins on the EEPROM chip.

> Eventually, it booted. License server came back like nothing happened. I
> didn’t cheer. I just stared at it, then whispered, “Don’t you ever do that
> again.”
>
> Management said, “Make sure this never happens again.”
> I said, “Sure. Step one: Don’t move racks with archaeology still in them.”

Medium-term: disks and fans die. If it has RAID or mirrored disks, at least the disk failures aren't fatal.

For the longer term: make a complete disk copy, as in dd if=Sun-disk(s) of=whatever, and then look into virtual machines, qemu, and the like.

Just a suggestion,
 -WBE
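For readers who want to try the "complete disk copy" idea above, here is a minimal Python sketch of the same block-by-block copy that dd performs, with a checksum pass added so the image can be verified afterwards. Nothing in it comes from the thread itself: the function names, the 1 MiB block size, and the example paths are all hypothetical, and a real run would normally be done with the source disk unmounted or the system booted from other media.

```python
import hashlib


def copy_image(src_path, dst_path, block_size=1024 * 1024):
    """Copy src to dst in fixed-size blocks (as dd does).

    Returns the SHA-256 hex digest of the bytes that were read.
    The block size is an arbitrary illustrative choice.
    """
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # end of the source device or file
                break
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()


def file_sha256(path, block_size=1024 * 1024):
    """Re-read a file from disk and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def image_and_verify(src_path, dst_path):
    """Copy the source, then confirm the written image matches what was read."""
    copied = copy_image(src_path, dst_path)
    if file_sha256(dst_path) != copied:
        raise IOError("image on disk does not match the data read from source")
    return copied


# Hypothetical usage; the device and image paths are placeholders, not from the thread:
# image_and_verify("/dev/rdsk/c0t0d0s2", "/backup/v440-disk0.img")
```

Storing the returned digest next to the image gives a later reader a way to check that the copy has not rotted before it is handed to an emulator or restored to new hardware.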