Path: csiph.com!x330-a1.tempe.blueboxinc.net!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!border3.nntp.dca.giganews.com!Xl.tags.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!local2.nntp.dca.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Mon, 06 Feb 2012 20:28:20 -0600
Message-ID: <4F308C3F.3010101@SPAM.comp-arch.net>
Date: Mon, 06 Feb 2012 18:28:15 -0800
From: "Andy (Super) Glew" <andy@SPAM.comp-arch.net>
Reply-To: andy@SPAM.comp-arch.net
Organization: comp-arch.net
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20111222 Thunderbird/9.0.1
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Single Thread Performance
References: <jgjm3k$dve$1@speranza.aioe.org> <ggtgp-3295E4.05551006022012@netnews.mchsi.com> <jgoj7u$ieh$1@gosset.csi.cam.ac.uk> <bDVXq.20080$Sh7.12987@newsfe15.iad> <jgpdvn$k34$1@gosset.csi.cam.ac.uk>
In-Reply-To: <jgpdvn$k34$1@gosset.csi.cam.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 126
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-eROruWiPDbDCs6VEnD4gZnrbCOTjK/RfI+GnvMZpIOsPNthAfnElMV6hyk90CvZlZJPoc1PTPwX5mIK!yKmLRs/2yk3aj0L022wZNkYtotGWRZvuhk+Ds1J5CBgbKKihrBazWHHeFW0uRz4=
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Original-Bytes: 6509
Xref: x330-a1.tempe.blueboxinc.net comp.arch:5753

On 2/6/2012 12:42 PM, nmm1@cam.ac.uk wrote:
> In article<bDVXq.20080$Sh7.12987@newsfe15.iad>,
> Robert Myers<rbmyersusa@gmail.com>  wrote:
>> On 2/6/2012 8:06 AM, nmm1@cam.ac.uk wrote:
>>> In article<ggtgp-3295E4.05551006022012@netnews.mchsi.com>,
>>> Brett Davis<ggtgp@yahoo.com>   wrote:
>>>>
>>>> Intel has built a 40 billion dollar a year company with 10 billion
>>>> in profits a year out of pursuing Single Thread Performance.
>>>
>>> Well, perhaps.  Sort of.
>>>
>>> Something that you miss is that even Intel has started to realise
>>> that it has hit an immovable obstacle, and to pursue multi-thread
>>> performance.  But the message has to travel from the ground to
>>> the brain and then back to where the action is, and Intel is as
>>> much a Diplodocus as IBM was in the 1970s.  Things don't happen
>>> fast.
>>
>> I'm fairly certain (in fact, absolutely certain) that there are people
>> who believe that the obstacle is not immovable.  Just how far it can be
>> moved and at what cost is another matter.
>
> There are people who believe that the earth is flat, too.  The
> fact that the obstacle has an exponential cliff and not a strictly
> asymptotic one doesn't mean that it's movable.
>
>
> Regards,
> Nick Maclaren.

A while back I corresponded witgh Uzi Vishkin, the PRAM guy, about the 
following:

I created a theoretical computation model I call the ILP-PRAM. 
Basically, a PRAM, with extensions to model what a really aggressive ILP 
processor could do.

I don't know if I have those notes any more - I may have thrown them 
away, as I did so much, when I changed jobs. (Wanting to avoid any 
suspicion of NDA violations, I left behind much that was really personal).

But basically I assumed that an ILP RAM had features such as
a) if a compiler could parallelize, it could parallelize
b) even if a compiler could not parallelize, it could parallelize using 
techniques familiar to those in this group.

And then I proved that an ILP-PRAM and a PRAM were equivalent in power. 
  At least to within polynomial.

In the theory classes I had just taken, proofs of equivalence typically 
involved emulation.   I suspect that you all can see how an implicitly 
ILP-PRAM can emulate an explicitly parallel PRAM, with equivalent 
efficiency, without even having to resort to an OOO tricks:

Basically, have the ILP-PRAM execute a program that simulates the PRAM. 
Something along the lines of

for each cycle
     for each processing element of the PRAM, p
          simulate_1_cycle(p)

Since by definition the ILP-PRAM can execute independent code, such as 
the loop body of the simulation, in parallel, then the body of the 
simulation loop executes "efficiently".

If you hypothesize that the ILP PRAM is less efficient than the PRAM, 
the inefficiency can only lie in the control.  i.e. the for loops. And 
classic PRAMs have essentially the same "loops" - the cycle explicly. 
The for each processor kloop, not really - assuming that the work has 
already been distributed.  But, the ILP-PRAM definition allows the same 
distribution.

I.e. an ILP-PRAM can simulate a PRAM "efficiently".  They are models of 
equivalent computational power. for this definition.

---

Now, I admit that this somewhat begs the question: a logically single 
threaded ILP-RAM is as "efficient" as an explicitly parallel PRAM, if 
you have an explicitly parallel program.

But I think it is interesting.

---

Perhaps somebody here can help me out: years ago, around the dawn of the 
web, I found a paper that talked about incorporating communications cost 
into theoretical models of parallel computation.  As I recall, they 
proved that with any realistic model of parallel computation, a similar 
equivalence in efficiency.

However, for the life of me I can't find the reference.

===

Anyway, theoretical equivalence aside: both parallel and serial 
computation run into the same walls.  However, computer architecture is 
a place where we worry about the constant multipliers for O(n) 
equivalent stuff.  I do not mean to say that explicit parallelism is not 
a good thing.  I just think that it is not as much of a savior as many 
would think.  Parallelism can solve bigger problems, but it won't solve 
fundamentally hard problems.

---

This being said, I keep thinking about "How does the world compute 
itself?" if not in parallel, with communication costs. I think that, 
unless quantum computation is fundamental (which may well be - but I 
think that you can define a quantum-ILP-RAM as well), and barring faster 
than light communication, that the fact that the world seems to compute 
itself must mean that the problems that it asking of itself are 
fundamentally solvable on whatever is the appropriate parallel model of 
computation and costs.

Which is the thing that causes me to wonder against Rob Myer's diatribes 
about physical modelling.

Unless... the universe has hidden dimensions that are used to increase 
the effective bandwidth of communication between the entities that 
constitute the world.