Python CPU

Started by	Brad <hwfwguy@gmail.com>
First post	2011-04-01 08:38 -0700
Last post	2011-04-03 22:59 -0700
Articles	9 on this page of 29 — 14 participants

  Python CPU Brad <hwfwguy@gmail.com> - 2011-04-01 08:38 -0700
    Re: Python CPU jkn <jkn_gg@nicorp.f9.co.uk> - 2011-04-01 08:59 -0700
    Re: Python CPU Nobody <nobody@nowhere.com> - 2011-04-01 17:52 +0100
      Re: Python CPU Stefan Behnel <stefan_ml@behnel.de> - 2011-04-01 19:00 +0200
      Re: Python CPU geremy condra <debatem1@gmail.com> - 2011-04-01 10:12 -0700
    Re: Python CPU Emile van Sebille <emile@fenx.com> - 2011-04-01 11:28 -0700
    Re: Python CPU Emile van Sebille <emile@fenx.com> - 2011-04-01 11:35 -0700
      Re: Python CPU John Nagle <nagle@animats.com> - 2011-04-01 12:18 -0700
    Re: Python CPU "BartC" <bc@freeuk.com> - 2011-04-02 23:06 +0100
      Re: Python CPU Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-04-03 04:03 +0000
    Re: Python CPU Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-04-03 12:10 +1200
      Re: Python CPU Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-04-03 04:01 +0000
        Re: Python CPU John Nagle <nagle@animats.com> - 2011-04-03 00:16 -0700
          Re: Python CPU Paul Rubin <no.email@nospam.invalid> - 2011-04-03 01:59 -0700
            Re: Python CPU Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-04-04 19:41 +1200
          Re: Python CPU Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-04-04 19:47 +1200
            Re: Python CPU Paul Rubin <no.email@nospam.invalid> - 2011-04-04 02:23 -0700
              Re: Python CPU Terry Reedy <tjreedy@udel.edu> - 2011-04-04 13:14 -0400
              Re: Python CPU Terry Reedy <tjreedy@udel.edu> - 2011-04-04 14:58 -0400
                Re: Python CPU Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-04-05 09:59 +1200
            Re: Python CPU geremy condra <debatem1@gmail.com> - 2011-04-04 09:36 -0700
              Re: Python CPU Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-04-05 10:05 +1200
            Re: Python CPU John Nagle <nagle@animats.com> - 2011-04-04 12:56 -0700
              Re: Python CPU Paul Rubin <no.email@nospam.invalid> - 2011-04-04 13:20 -0700
                Re: Python CPU Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-04-05 10:11 +1200
      Re: Python CPU Werner Thie <werner@thieprojects.ch> - 2011-04-03 17:44 +0200
        Re: Python CPU John Nagle <nagle@animats.com> - 2011-04-03 10:15 -0700
          Re: Python CPU Nobody <nobody@nowhere.com> - 2011-04-03 22:37 +0100
            Re: Python CPU Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2011-04-03 22:59 -0700

#2575

From	geremy condra <debatem1@gmail.com>
Date	2011-04-04 09:36 -0700
Message-ID	<mailman.8.1301935009.9059.python-list@python.org>
In reply to	#2556

On Mon, Apr 4, 2011 at 12:47 AM, Gregory Ewing
<greg.ewing@canterbury.ac.nz> wrote:
> John Nagle wrote:
>
>>    A tagged machine might make Python faster.  You could have
>> unboxed ints and floats, yet still allow values of other types,
>> with the hardware tagging helping with dispatch.   But it probably
>> wouldn't help all that much.  It didn't in the LISP machines.
>
> What might help more is having bytecodes that operate on
> arrays of unboxed types -- numpy acceleration in hardware.

I'd be interested in seeing the performance impact of this, although I
wonder if it'd be feasible.

Geremy Condra

#2604

From	Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date	2011-04-05 10:05 +1200
Message-ID	<8vutl0Fdm3U1@mid.individual.net>
In reply to	#2575

geremy condra wrote:

> I'd be interested in seeing the performance impact of this, although I
> wonder if it'd be feasible.

A project I have in the back of my mind goes something
like this:

1) Design an instruction set for a Python machine and
a microcode architecture to support it

2) Write a simulator for it

3) Use the simulator to evaluate how effective it would
be if actually implemented, e.g. in an FPGA.

And if I get that far:

4) (optional) Get hold of a real FPGA and implement it
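
As a rough illustration of steps 1 to 3, here is a minimal sketch of what such a simulator might look like. The opcodes (PUSH, ADD, VADD, HALT) and the run() function are invented for this example and are not CPython bytecodes or anything proposed in the thread. VADD stands in for the "bytecodes that operate on arrays of unboxed types" idea, and the dispatch count is only a crude proxy for cost, not a real cycle model.

```python
from array import array

# Hypothetical opcodes, invented for this sketch (not CPython bytecodes).
PUSH, ADD, VADD, HALT = range(4)

def run(program):
    """Interpret (opcode, operand) pairs on a small stack machine.

    Returns the final stack and the number of instructions dispatched,
    a crude stand-in for a cycle count.
    """
    stack = []
    dispatched = 0
    for opcode, operand in program:
        dispatched += 1
        if opcode == PUSH:
            stack.append(operand)
        elif opcode == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == VADD:
            # One dispatch handles a whole array of unboxed doubles.
            right = stack.pop()
            left = stack.pop()
            stack.append(array("d", (a + b for a, b in zip(left, right))))
        elif opcode == HALT:
            break
        else:
            raise ValueError("unknown opcode: %r" % (opcode,))
    return stack, dispatched

xs = array("d", [1.0, 2.0, 3.0])
ys = array("d", [10.0, 20.0, 30.0])

# Whole-array addition: a single VADD does all the element work.
vector_program = [(PUSH, xs), (PUSH, ys), (VADD, None), (HALT, None)]
vector_stack, vector_count = run(vector_program)

# The same work done element by element with scalar ADD.
scalar_program = []
for a, b in zip(xs, ys):
    scalar_program += [(PUSH, a), (PUSH, b), (ADD, None)]
scalar_program.append((HALT, None))
scalar_stack, scalar_count = run(scalar_program)
```

Comparing vector_count with scalar_count shows how many fewer dispatches the vector form needs for the same result. Whether that would translate into a real speedup in an FPGA is exactly the question step 3 is meant to answer.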

-- 
Greg

#2590

From	John Nagle <nagle@animats.com>
Date	2011-04-04 12:56 -0700
Message-ID	<4d9a2281$0$10524$742ec2ed@news.sonic.net>
In reply to	#2556

On 4/4/2011 12:47 AM, Gregory Ewing wrote:
> John Nagle wrote:
>
>> A tagged machine might make Python faster. You could have
>> unboxed ints and floats, yet still allow values of other types,
>> with the hardware tagging helping with dispatch. But it probably
>> wouldn't help all that much. It didn't in the LISP machines.
>
> What might help more is having bytecodes that operate on
> arrays of unboxed types -- numpy acceleration in hardware.

     That sort of thing was popular in the era of the early
Cray machines.  Once superscalar CPUs were developed,
the overhead on tight inner loops went down, and several
iterations of a loop could be in the pipeline at one time,
if they didn't conflict.  Modern superscalar machines have
register renaming, so the same program-visible register on
two successive iterations can map to different registers within
the CPU, allowing two iterations of the same loop to execute
simultaneously.  This eliminates the need for loop unrolling and
Duff's device.

				John Nagle

#2593

From	Paul Rubin <no.email@nospam.invalid>
Date	2011-04-04 13:20 -0700
Message-ID	<7xhbadojk0.fsf@ruckus.brouhaha.com>
In reply to	#2590

John Nagle <nagle@animats.com> writes:
>     That sort of thing was popular in the era of the early
> Cray machines.  Once superscalar CPUs were developed,
> the overhead on tight inner loops went down, and several
> iterations of a loop could be in the pipeline at one time,

Vector processors are back, they just call them GPGPU's now.

#2607

From	Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date	2011-04-05 10:11 +1200
Message-ID	<8vuu1hFg70U1@mid.individual.net>
In reply to	#2593

Paul Rubin wrote:

> Vector processors are back, they just call them GPGPU's now.

Also present to some extent in the CPU, with
MMX, AltiVec, etc.

-- 
Greg

#2530

From	Werner Thie <werner@thieprojects.ch>
Date	2011-04-03 17:44 +0200
Message-ID	<mailman.170.1301845860.2990.python-list@python.org>
In reply to	#2477

You have probably heard of the infamous FORTH chips like the Harris RTX2000
or ShBoom, which implemented a stack-oriented, very low-power design
before there were FPGAs in silicon. To my knowledge the RTX2000 is still
used for space-hardened applications, and if I search long enough I might
find the one I had sitting in my cellar.

At the time the chip was so insanely fast that it could produce video
signals with FORTH programs driving the IO pins. Chuck Moore, father of
FORTH, developed the chip on silicon in FORTH itself.

Because the instruction set of a FORTH machine is that of a very general,
stack-based von Neumann system, I believe that starting with an RTX2000
(which should be available in VHDL) one could fairly quickly reach a
point where things make sense, meaning not going for the 'fastest' ever
CPU but for the advantage of having a decent CPU programmable in Python
sitting on a chip with a lot of hardware available.

Also worth mentioning in this context is the work available at
http://www.myhdl.org/doku.php.

Werner

On 4/3/11 3:46 AM, Dan Stromberg wrote:
>
> On Sat, Apr 2, 2011 at 5:10 PM, Gregory Ewing
> <greg.ewing@canterbury.ac.nz <mailto:greg.ewing@canterbury.ac.nz>> wrote:
>
>     Brad wrote:
>
>         I've heard of Java CPUs. Has anyone implemented a Python CPU in VHDL
>         or Verilog?
>
>
>     Not that I know of.
>
>     I've had thoughts about designing one, just for the exercise.
>
>     It's doubtful whether such a thing would ever be of practical
>     use. Without as much money as Intel has to throw at CPU
>     development, it's likely that a Python chip would always be
>     slower and more expensive than an off-the-shelf CPU running
>     a tightly-coded interpreter.
>
>     It could be fun to speculate on what a Python CPU might
>     look like, though.
>
>
> One with the time and inclination could probably do a Python VM in an
> FPGA, no?
>
> Though last I heard, FPGA's weren't expected to increase in performance
> as fast as general-purpose CPU's.
>

#2532

From	John Nagle <nagle@animats.com>
Date	2011-04-03 10:15 -0700
Message-ID	<4d98ab35$0$10594$742ec2ed@news.sonic.net>
In reply to	#2530

On 4/3/2011 8:44 AM, Werner Thie wrote:
> You have probably heard of the infamous FORTH chips like the Harris RTX2000
> or ShBoom, which implemented a stack-oriented, very low-power design
> before there were FPGAs in silicon. To my knowledge the RTX2000 is still
> used for space-hardened applications, and if I search long enough I might
> find the one I had sitting in my cellar.
>
> The chip was at that time so insanely fast that it could produce video
> signals with FORTH programs driving the IO pins. Chuck Moore, father of
> FORTH developed the chip on silicon in FORTH itself.

     He did version 1, which had a broken integer divide operation.
(Divisors which were odd numbers produced wrong answers. Really.)
I came across one of those in a demo setup at a surplus store in
Silicon Valley, driving the CRT and with Moore's interface that
did everything with chords on three buttons.

> Because the instruction set of a FORTH machine is that of a very general,
> stack-based von Neumann system, I believe that starting with an RTX2000
> (which should be available in VHDL) one could fairly quickly reach a
> point where things make sense, meaning not going for the 'fastest' ever
> CPU but for the advantage of having a decent CPU programmable in Python
> sitting on a chip with a lot of hardware available.

     Willow Garage has VHDL available for a Forth CPU.  It's only 200
lines.

     The Forth CPUs have three separate memories - RAM, Forth stack,
and return stack. All three are accessed on each cycle.  Back before
microprocessors had caches, this was a win over traditional CPUs,
where memory had to be accessed sequentially for those functions.
Once caches came in, it was a lose.

     It's interesting that if you wanted to design a CPU for Google's
"nativeclient" approach for executing native code in the browser,
a separate return point stack would be a big help.  Google's
"nativeclient" system protects return points, so that you can tell,
from the source code, all the places control can go.  This is
a protection against redirection via buffer overflows, something
that's possible on x86 because the return points and other data
share the same stack.

     Note that if you run out of return point stack, or parameter
stack, you're stuck.  So there's a hardware limit on call depth.
National Semiconductor once built a CPU with a separate return
point stack with a depth of 20.  Big mistake.
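
To make the separate-stack arrangement and its call-depth limit concrete, here is a minimal Python sketch of a toy machine with a data stack and a fixed-depth return stack. The class TinyForthMachine, the exception ReturnStackOverflow and the op names are all made up for illustration and do not model any real chip. The depth of 20 is simply the figure mentioned above.

```python
class ReturnStackOverflow(Exception):
    """Raised when a call would exceed the fixed return-stack depth."""

class TinyForthMachine:
    """Toy model: separate data stack and fixed-depth return stack."""

    def __init__(self, return_depth):
        self.return_depth = return_depth   # hardware limit on nesting
        self.data_stack = []               # parameters and results
        self.return_stack = []             # return addresses only

    def run(self, program, pc=0):
        while True:
            op, arg = program[pc]
            pc += 1
            if op == "push":
                self.data_stack.append(arg)
            elif op == "add":
                b = self.data_stack.pop()
                a = self.data_stack.pop()
                self.data_stack.append(a + b)
            elif op == "call":
                if len(self.return_stack) >= self.return_depth:
                    raise ReturnStackOverflow(pc - 1)
                self.return_stack.append(pc)   # address to come back to
                pc = arg
            elif op == "ret":
                pc = self.return_stack.pop()
            elif op == "halt":
                return self.data_stack
            else:
                raise ValueError("unknown op: %r" % (op,))

# Main code at addresses 0-3, a subroutine at addresses 4-5.
program = [
    ("push", 2), ("push", 3), ("call", 4), ("halt", None),
    ("add", None), ("ret", None),
]
result = TinyForthMachine(return_depth=20).run(program)

# A routine that calls itself forever exhausts the return stack.
runaway = [("call", 0)]
machine = TinyForthMachine(return_depth=20)
try:
    machine.run(runaway)
    overflowed = False
except ReturnStackOverflow:
    overflowed = True
```

Because return addresses never share storage with data, nothing written to data_stack can redirect a "ret", which is the property the Native Client remark is about. The cost is that nesting beyond return_depth has nowhere to go.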

     (All of this is irrelevant to Python, though. Most of Python's
speed problems come from spending too much time looking up attributes
and functions in dictionaries.)
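
For a sense of what those lookups involve, here is a deliberately simplified sketch of how a plain attribute access walks through dictionaries. The function simplified_getattr and the classes Base and Derived are hypothetical names for this example. The sketch ignores descriptors, __getattr__, __slots__ and metaclasses, so it approximates the lookup chain rather than reproducing what the interpreter really does.

```python
def simplified_getattr(obj, name):
    """Approximate obj.name for plain attributes and methods.

    Returns (value, probes), where probes counts the dictionaries
    consulted before the name was found.
    """
    probes = 0
    # 1. The instance's own dictionary.
    instance_dict = getattr(obj, "__dict__", {})
    probes += 1
    if name in instance_dict:
        return instance_dict[name], probes
    # 2. Each class in the method resolution order, one dict at a time.
    for klass in type(obj).__mro__:
        probes += 1
        if name in klass.__dict__:
            return klass.__dict__[name], probes
    raise AttributeError(name)

class Base(object):
    def greet(self):
        return "hello"

class Derived(Base):
    pass

d = Derived()
d.colour = "red"

# Found straight away in the instance dictionary.
value, probes_instance = simplified_getattr(d, "colour")

# Found only after missing in the instance and in Derived.
method, probes_inherited = simplified_getattr(d, "greet")
```

Each probe is a hash-table lookup, and an inherited method costs several of them on every call unless the implementation caches the result.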

				John Nagle

#2535

From	Nobody <nobody@nowhere.com>
Date	2011-04-03 22:37 +0100
Message-ID	<pan.2011.04.03.21.36.55.47000@nowhere.com>
In reply to	#2532

On Sun, 03 Apr 2011 10:15:34 -0700, John Nagle wrote:

>      Note that if you run out of return point stack, or parameter
> stack, you're stuck.  So there's a hardware limit on call depth.
> National Semiconductor once built a CPU with a separate return
> point stack with a depth of 20.  Big mistake.

The 8-bit PIC microcontrollers have a separate return stack. The PIC10 has
a 2-level stack, the PIC16 has 8 levels, and the PIC18 has 31 levels.

But these chips range from 16 bytes of RAM and 256 words of flash for a
PIC10, through 64-256 bytes of RAM and 1-4K words of flash for a PIC16, up
to 2KiB of RAM and 16K words of flash for a PIC18, so you usually run out
of something else long before the maximum stack depth becomes an issue.

#2551

From	Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date	2011-04-03 22:59 -0700
Message-ID	<mailman.183.1301896772.2990.python-list@python.org>
In reply to	#2535

On Sun, 03 Apr 2011 22:37:16 +0100, Nobody <nobody@nowhere.com>
declaimed the following in gmane.comp.python.general:

> On Sun, 03 Apr 2011 10:15:34 -0700, John Nagle wrote:
> 
> >      Note that if you run out of return point stack, or parameter
> > stack, you're stuck.  So there's a hardware limit on call depth.
> > National Semiconductor once built a CPU with a separate return
> > point stack with a depth of 20.  Big mistake.
> 
> The 8-bit PIC microcontrollers have a separate return stack. The PIC10 has
> a 2-level stack, the PIC16 has 8 levels, and the PIC18 has 31 levels.
> 
> But these chips range from 16 bytes of RAM and 256 words of flash for a
> PIC10, through 64-256 bytes of RAM and 1-4K words of flash for a PIC16, up
> to 2KiB of RAM and 16K words of flash for a PIC18, so you usually run out
> of something else long before the maximum stack depth becomes an issue.

Not an architecture on which to code a recursive Fibonacci sequence. <G>
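
A small sketch of why: naive recursive Fibonacci nests roughly n calls deep, which can be compared with the stack levels quoted above. The helper fib_with_depth and the PIC_STACK_LEVELS table are made up for this example. Counting the outermost call as one level is an assumption, and the sketch ignores where arguments would live on a chip without a data stack.

```python
def fib_with_depth(n, depth=1, stats=None):
    """Naive recursive Fibonacci that records the deepest call level."""
    if stats is None:
        stats = {"max_depth": 0}
    stats["max_depth"] = max(stats["max_depth"], depth)
    if n < 2:
        return n, stats
    a, _ = fib_with_depth(n - 1, depth + 1, stats)
    b, _ = fib_with_depth(n - 2, depth + 1, stats)
    return a + b, stats

# Return-stack depths quoted in the thread for each PIC family.
PIC_STACK_LEVELS = {"PIC10": 2, "PIC16": 8, "PIC18": 31}

result, stats = fib_with_depth(10)

# Would the nesting needed for fib(10) fit in each return stack?
fits = {chip: stats["max_depth"] <= levels
        for chip, levels in PIC_STACK_LEVELS.items()}
```

Under these assumptions fib(10) already nests deeper than the PIC10 and PIC16 return stacks allow.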
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
