


Article: Multi-Multicore Single System Image / Cloud Computing. A Good Idea? (part 1)

From Louis Ohland <ohland@charter.net>
Newsgroups comp.sys.ibm.ps2.hardware
Subject Article: Multi-Multicore Single System Image / Cloud Computing. A Good Idea? (part 1)
Date 2026-06-23 08:38 -0500
Organization csiph.com Internet News Service
Message-ID <111e28t$qta$1@csiph.com>



http://perilsofparallel.blogspot.com/2009/01/multi-multicore-single-system-image.html

http://perilsofparallel.blogspot.com/2009/01/multi-multicore-single-system-image.html?showComment=1241145060000#c8871026021207257403

"Cameron Bahar said...
Greg,

I'm a fan of your book and I read it after working at Locus Computing 
for a good number of years. Was interested to learn more about different 
cluster architectures and why Locus had essentially failed even though 
we all thought it was so cool and great to have this technology. At 
Locus, working under the leadership of Jerry Popek and company, we did 
actually build an SSI Unix-flavored operating system for IBM called 
AIX-TCF (Transparent Computing Facility). IBM ran into trouble in 1992 
and moved operations to Austin, Texas, and essentially disconnected from 
Locus over the years after Locus refused a buyout offer from IBM, which 
was a mistake. A lot of the Locus folks ended up at 
Interactive/Sun/Solaris-X86 and thus was born the Solaris X86 operating 
system. Locus's demise benefited Sunsoft and Solaris X86, even though 
Sun didn't leverage the X86 platform, to its own demise! It will be 
interesting to see what Larry will do with Solaris.

Now to SSI and why it didn't take off. I think there are a number of 
reasons. Foremost, the concept and vision are certainly attractive. We 
had single system semantics across nodes and all OS services were 
distributed. We could migrate processes from one node to another. We had 
remote devices. We had a distributed replication filesystem (I worked on 
that a little) and transparency all over the system. But I think the 
main problem was that Ethernet got us effectively 4-5 Mbits/sec on a 
10 Mbit link and we were a tightly coupled operating system passing 
messages for consistency and coherency between nodes. The slow network 
caused lots of problems and things didn't always converge to a stable 
steady state. As you increase the number of nodes, the complexity grows, 
the number of messages increases, and scalability suffers. HA was also 
problematic and not well thought out, and we didn't have shared SANs to 
help keep data around. So a shared-nothing SSI cluster with a slow 
network is dead in its tracks.

I joined Teradata after Locus and there we had a shared-nothing database 
with a dedicated fibre optic network with gigabit links. Now this worked 
a lot better.

The distinction between Teradata and Locus is interesting, because 
Teradata is extremely successful and still leads in performance today.

What is the difference?

Teradata runs one application and that is a parallel database for data 
mining. One function, entire system optimized for an RDBMS with a 
specific workload. They have a parallel application interface but the 
whole thing always runs the database and is not GENERAL PURPOSE. So they 
pick a problem and solve it well.

With Locus, the idea was that it is GENERAL PURPOSE, all apps will run 
unmodified, and it's the holy grail. As I've learned over the last 20 
years, systems are built and optimized for certain workloads and there's 
no one-size-fits-all system.

So HP/IBM/Intel/Tandem/Pyramid and all others bought the Kool-Aid, but 
at the end of the day it just doesn't work and is not needed.

What worked was a Unix server and NFS for remote access to data. That 
model WON and SSI lost. Why did this win? Because it was a loosely 
coupled collection of independent nodes and the work was partitioned 
among the nodes with great autonomy between the nodes, i.e., no message 
passing.

Jerry passed away last year, God rest his soul. He was a visionary and a 
great teacher. He also wrote the first paper on virtualization, which 
VMware is based on. At least the virtualization idea has caught on and 
has spawned the cloud computing era. Again, independent nodes with single 
operating systems providing service, no message passing.

A few years ago I founded a company called ParaScale (parallel 
scalability). We're building a cloud storage platform, which basically 
means application-layer software that runs on top of a number of 
Linux servers and federates them into a single cloud storage platform 
providing file services over standard protocols. Again, the idea here is 
loosely coupled, autonomous nodes all operating with the guidance of a 
few master nodes that are highly available. By learning from the past 
and focusing on providing a specific service (in this case storage 
services) we hope to achieve our goals of reliability and scalability 
and virtualization.

I enjoyed reading your multi-part post on SSI and it brought back some 
good memories. We have a few Locus people working at ParaScale and still 
solving interesting distributed system problems.

Best,
Cameron
--

April 30, 2009 at 8:31 PM"
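
As a rough illustration of the scaling argument in the quoted comment (message traffic growing with node count on a slow link), here is a minimal back-of-the-envelope sketch. It is not a model of how Locus or AIX-TCF actually worked. It assumes a hypothetical all-to-all scheme in which every node notifies every other node once per coherency round, a made-up 256-byte message size, and an effective 4.5 Mbit/s shared link (the midpoint of the 4-5 Mbits/sec figure quoted above). The function names are invented for this example.

```python
def pairwise_messages(nodes: int) -> int:
    """Messages in one round if every node notifies every other node.

    Hypothetical all-to-all model; the quoted comment does not say
    which messaging pattern Locus actually used.
    """
    return nodes * (nodes - 1)


def wire_time_seconds(messages: int, bytes_per_message: int,
                      effective_mbits: float) -> float:
    """Time to push all messages through one shared link, ignoring
    latency, collisions, and retransmissions."""
    bits = messages * bytes_per_message * 8
    return bits / (effective_mbits * 1_000_000)


if __name__ == "__main__":
    # 256 bytes per message is an assumed value for illustration only.
    for n in (2, 4, 8, 16, 32):
        msgs = pairwise_messages(n)
        secs = wire_time_seconds(msgs, 256, 4.5)
        print(f"{n:3d} nodes: {msgs:5d} messages, {secs * 1000:8.2f} ms on the wire")
```

Under these assumptions, doubling the node count roughly quadruples the traffic per round. A partitioned design in which nodes do not exchange coherency messages keeps that term at zero, which is the contrast the comment draws with the Unix server plus NFS model.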
