| Started by | Brad <hwfwguy@gmail.com> |
|---|---|
| First post | 2011-04-01 08:38 -0700 |
| Last post | 2011-04-03 22:59 -0700 |
| Articles | 9 on this page of 29 — 14 participants |
Python CPU Brad <hwfwguy@gmail.com> - 2011-04-01 08:38 -0700
Re: Python CPU jkn <jkn_gg@nicorp.f9.co.uk> - 2011-04-01 08:59 -0700
Re: Python CPU Nobody <nobody@nowhere.com> - 2011-04-01 17:52 +0100
Re: Python CPU Stefan Behnel <stefan_ml@behnel.de> - 2011-04-01 19:00 +0200
Re: Python CPU geremy condra <debatem1@gmail.com> - 2011-04-01 10:12 -0700
Re: Python CPU Emile van Sebille <emile@fenx.com> - 2011-04-01 11:28 -0700
Re: Python CPU Emile van Sebille <emile@fenx.com> - 2011-04-01 11:35 -0700
Re: Python CPU John Nagle <nagle@animats.com> - 2011-04-01 12:18 -0700
Re: Python CPU "BartC" <bc@freeuk.com> - 2011-04-02 23:06 +0100
Re: Python CPU Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-04-03 04:03 +0000
Re: Python CPU Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-04-03 12:10 +1200
Re: Python CPU Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-04-03 04:01 +0000
Re: Python CPU John Nagle <nagle@animats.com> - 2011-04-03 00:16 -0700
Re: Python CPU Paul Rubin <no.email@nospam.invalid> - 2011-04-03 01:59 -0700
Re: Python CPU Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-04-04 19:41 +1200
Re: Python CPU Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-04-04 19:47 +1200
Re: Python CPU Paul Rubin <no.email@nospam.invalid> - 2011-04-04 02:23 -0700
Re: Python CPU Terry Reedy <tjreedy@udel.edu> - 2011-04-04 13:14 -0400
Re: Python CPU Terry Reedy <tjreedy@udel.edu> - 2011-04-04 14:58 -0400
Re: Python CPU Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-04-05 09:59 +1200
Re: Python CPU geremy condra <debatem1@gmail.com> - 2011-04-04 09:36 -0700
Re: Python CPU Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-04-05 10:05 +1200
Re: Python CPU John Nagle <nagle@animats.com> - 2011-04-04 12:56 -0700
Re: Python CPU Paul Rubin <no.email@nospam.invalid> - 2011-04-04 13:20 -0700
Re: Python CPU Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-04-05 10:11 +1200
Re: Python CPU Werner Thie <werner@thieprojects.ch> - 2011-04-03 17:44 +0200
Re: Python CPU John Nagle <nagle@animats.com> - 2011-04-03 10:15 -0700
Re: Python CPU Nobody <nobody@nowhere.com> - 2011-04-03 22:37 +0100
Re: Python CPU Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2011-04-03 22:59 -0700
Page 2 of 2 — ← Prev page 1 [2]
| From | geremy condra <debatem1@gmail.com> |
|---|---|
| Date | 2011-04-04 09:36 -0700 |
| Message-ID | <mailman.8.1301935009.9059.python-list@python.org> |
| In reply to | #2556 |
On Mon, Apr 4, 2011 at 12:47 AM, Gregory Ewing <greg.ewing@canterbury.ac.nz> wrote:
> John Nagle wrote:
>
>> A tagged machine might make Python faster. You could have
>> unboxed ints and floats, yet still allow values of other types,
>> with the hardware tagging helping with dispatch. But it probably
>> wouldn't help all that much. It didn't in the LISP machines.
>
> What might help more is having bytecodes that operate on
> arrays of unboxed types -- numpy acceleration in hardware.

I'd be interested in seeing the performance impact of this, although I
wonder if it'd be feasible.

Geremy Condra
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2011-04-05 10:05 +1200 |
| Message-ID | <8vutl0Fdm3U1@mid.individual.net> |
| In reply to | #2575 |
geremy condra wrote:
> I'd be interested in seeing the performance impact of this, although I
> wonder if it'd be feasible.

A project I have in the back of my mind goes something like this:

1) Design an instruction set for a Python machine and a microcode
   architecture to support it

2) Write a simulator for it

3) Use the simulator to evaluate how effective it would be if
   actually implemented, e.g. in an FPGA.

And if I get that far:

4) (optional) Get hold of a real FPGA and implement it

--
Greg
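Steps 2 and 3 of the plan above might start from something as small as the following sketch: a simulator for a toy stack machine that also counts cycles, so different instruction-set choices can be compared. The opcode names and the cycle costs in `CYCLES` are invented for illustration; they are assumptions, not a proposed Python instruction set.

```python
# Hypothetical instruction set for a toy stack machine; the opcode names
# and cycle costs are invented for illustration only.
CYCLES = {"PUSH": 1, "ADD": 1, "MUL": 3, "JUMP_IF_ZERO": 2, "HALT": 1}


def simulate(program):
    """Run a list of (opcode, operand) pairs; return (stack, cycles)."""
    stack, pc, cycles = [], 0, 0
    while True:
        op, arg = program[pc]
        cycles += CYCLES[op]         # charge the assumed cost of this opcode
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "JUMP_IF_ZERO":
            if stack.pop() == 0:
                pc = arg             # operand is the jump target
        elif op == "HALT":
            return stack, cycles


# (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 4), ("MUL", None), ("HALT", None)]
result, cost = simulate(program)
```

Changing the numbers in `CYCLES`, or adding fused opcodes, and re-running the same programs is the kind of evaluation step 3 describes, on a very small scale.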
| From | John Nagle <nagle@animats.com> |
|---|---|
| Date | 2011-04-04 12:56 -0700 |
| Message-ID | <4d9a2281$0$10524$742ec2ed@news.sonic.net> |
| In reply to | #2556 |
On 4/4/2011 12:47 AM, Gregory Ewing wrote:
> John Nagle wrote:
>
>> A tagged machine might make Python faster. You could have
>> unboxed ints and floats, yet still allow values of other types,
>> with the hardware tagging helping with dispatch. But it probably
>> wouldn't help all that much. It didn't in the LISP machines.
>
> What might help more is having bytecodes that operate on
> arrays of unboxed types -- numpy acceleration in hardware.
That sort of thing was popular in the era of the early
Cray machines. Once superscalar CPUs were developed,
the overhead on tight inner loops went down, and several
iterations of a loop could be in the pipeline at one time,
if they didn't conflict. Modern superscalar machines have
register renaming, so the same program-visible register on
two successive iterations can map to different registers within
the CPU, allowing two iterations of the same loop to execute
simultaneously. This eliminates the need for loop unrolling and
Duff's device.
John Nagle
| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Date | 2011-04-04 13:20 -0700 |
| Message-ID | <7xhbadojk0.fsf@ruckus.brouhaha.com> |
| In reply to | #2590 |
John Nagle <nagle@animats.com> writes:
> That sort of thing was popular in the era of the early
> Cray machines. Once superscalar CPUs were developed,
> the overhead on tight inner loops went down, and several
> iterations of a loop could be in the pipeline at one time,

Vector processors are back, they just call them GPGPU's now.
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2011-04-05 10:11 +1200 |
| Message-ID | <8vuu1hFg70U1@mid.individual.net> |
| In reply to | #2593 |
Paul Rubin wrote:
> Vector processors are back, they just call them GPGPU's now.

Also present to some extent in the CPU, with MMX, Altivec, etc.

--
Greg
| From | Werner Thie <werner@thieprojects.ch> |
|---|---|
| Date | 2011-04-03 17:44 +0200 |
| Message-ID | <mailman.170.1301845860.2990.python-list@python.org> |
| In reply to | #2477 |
You have probably heard of the infamous FORTH chips like the Harris RTX2000,
or ShhBoom, which implemented a stack-oriented, very low-power design
before there were FPGAs in silicon. To my knowledge the RTX2000 is still
used for space-hardened applications, and if I search long enough I might
find the one I had sitting in my cellar.

The chip was at that time so insanely fast that it could produce video
signals with FORTH programs driving the IO pins. Chuck Moore, father of
FORTH, developed the chip on silicon in FORTH itself.

Because the instruction set of a FORTH machine is that of a very general
stack-based von Neumann system, I believe that starting with an RTX2000
(which should be available in VHDL) one could quite quickly reach a point
where things make sense, meaning not going for the 'fastest' ever CPU but
for the advantage of having a decent CPU programmable in Python sitting on
a chip with a lot of hardware available.

Another thing worth mentioning in this context is the work available at
http://www.myhdl.org/doku.php.

Werner

On 4/3/11 3:46 AM, Dan Stromberg wrote:
>
> On Sat, Apr 2, 2011 at 5:10 PM, Gregory Ewing
> <greg.ewing@canterbury.ac.nz> wrote:
>
> Brad wrote:
>
> I've heard of Java CPUs. Has anyone implemented a Python CPU in VHDL
> or Verilog?
>
> Not that I know of.
>
> I've had thoughts about designing one, just for the exercise.
>
> It's doubtful whether such a thing would ever be of practical
> use. Without as much money as Intel has to throw at CPU
> development, it's likely that a Python chip would always be
> slower and more expensive than an off-the-shelf CPU running
> a tightly-coded interpreter.
>
> It could be fun to speculate on what a Python CPU might
> look like, though.
>
> One with the time and inclination could probably do a Python VM in an
> FPGA, no?
>
> Though last I heard, FPGA's weren't expected to increase in performance
> as fast as general-purpose CPU's.
| From | John Nagle <nagle@animats.com> |
|---|---|
| Date | 2011-04-03 10:15 -0700 |
| Message-ID | <4d98ab35$0$10594$742ec2ed@news.sonic.net> |
| In reply to | #2530 |
On 4/3/2011 8:44 AM, Werner Thie wrote:
> You probably heard of the infamous FORTH chips like the Harris RTX2000,
> or ShhBoom, which implemented a stack oriented very low power design
> before there were FPGAs in silicon. To my knowledge the RTX2000 is still
> used for space-hardened applications, and if I search long enough I might
> find the one I had sitting in my cellar.
>
> The chip was at that time so insanely fast that it could produce video
> signals with FORTH programs driving the IO pins. Chuck Moore, father of
> FORTH developed the chip on silicon in FORTH itself.
He did version 1, which had a broken integer divide operation.
(Divisors which were odd numbers produced wrong answers. Really.)
I came across one of those in a demo setup at a surplus store in
Silicon Valley, driving the CRT and with Moore's interface that
did everything with chords on three buttons.
> Because the instruction set of a FORTH machine is that of a very
> general stack-based von Neumann system, I believe that starting
> with an RTX2000 (which should be available in VHDL) one could quite
> quickly reach a point where things make sense, meaning not going for the
> 'fastest' ever CPU but for the advantage of having a decent CPU
> programmable in Python sitting on a chip with a lot of hardware available.
Willow Garage has VHDL available for a Forth CPU. It's only 200
lines.
The Forth CPUs have three separate memories - RAM, Forth stack,
and return stack. All three are accessed on each cycle. Back before
microprocessors had caches, this was a win over traditional CPUs,
where memory had to be accessed sequentially for those functions.
Once caches came in, it was a lose.
It's interesting that if you wanted to design a CPU for Google's
"nativeclient" approach for executing native code in the browser,
a separate return point stack would be a big help. Google's
"nativeclient" system protects return points, so that you can tell,
from the source code, all the places control can go. This is
a protection against redirection via buffer overflows, something
that's possible on x86 because the return points and other data
share the same stack.
Note that if you run out of return point stack, or parameter
stack, you're stuck. So there's a hardware limit on call depth.
National Semiconductor once built a CPU with a separate return
point stack with a depth of 20. Big mistake.
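The call-depth limit described above can be modelled in a few lines. This is a minimal sketch assuming a dedicated return stack with a fixed number of slots; the class and function names are hypothetical, and raising an exception on overflow is a modelling choice, since what real hardware does when the stack is full varies and is not modelled here.

```python
class ReturnStackOverflow(Exception):
    """Raised when the simulated hardware return stack is full."""


class ReturnStack:
    """A return-address stack with a fixed number of slots."""

    def __init__(self, depth):
        self.depth = depth
        self.slots = []

    def call(self, return_address):
        # A call pushes the address to resume at; there is nowhere to spill.
        if len(self.slots) >= self.depth:
            raise ReturnStackOverflow(f"call depth exceeds {self.depth}")
        self.slots.append(return_address)

    def ret(self):
        # A return pops the most recent address.
        return self.slots.pop()


def max_nesting(stack):
    """Count how many nested calls succeed before the stack is full."""
    count = 0
    try:
        while True:
            stack.call(count)
            count += 1
    except ReturnStackOverflow:
        return count
```

With `ReturnStack(20)`, the twenty-first nested call fails no matter how much ordinary RAM is free, which is the limitation being criticised.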
(All of this is irrelevant to Python, though. Most of Python's
speed problems come from spending too much time looking up attributes
and functions in dictionaries.)
John Nagle
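The closing remark about dictionary lookups can be made visible from Python itself. The sketch below uses a hypothetical `Point` class to show where attributes and methods are stored, and the common idiom of hoisting a lookup out of a loop. It makes no claim about how much time the hoisting saves; that depends on the interpreter version and would need to be measured.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


p = Point(3, -4)

# Instance attributes live in a per-instance dictionary ...
assert p.__dict__ == {"x": 3, "y": -4}
# ... and methods are found in the class's dictionary.
assert "norm1" in Point.__dict__


def total_plain(points):
    total = 0
    for pt in points:
        total += pt.norm1()      # 'norm1' is looked up on every iteration
    return total


def total_hoisted(points):
    norm1 = Point.norm1          # one lookup, then a local variable
    total = 0
    for pt in points:
        total += norm1(pt)
    return total
```

Both functions return the same result; the second simply performs the method lookup once instead of once per element.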
| From | Nobody <nobody@nowhere.com> |
|---|---|
| Date | 2011-04-03 22:37 +0100 |
| Message-ID | <pan.2011.04.03.21.36.55.47000@nowhere.com> |
| In reply to | #2532 |
On Sun, 03 Apr 2011 10:15:34 -0700, John Nagle wrote:

> Note that if you run out of return point stack, or parameter
> stack, you're stuck. So there's a hardware limit on call depth.
> National Semiconductor once built a CPU with a separate return
> point stack with a depth of 20. Big mistake.

The 8-bit PIC microcontrollers have a separate return stack. The PIC10 has
a 2-level stack, the PIC16 has 8 levels, and the PIC18 has 31 levels.

But these chips range from 16 bytes of RAM and 256 words of flash for a
PIC10, through 64-256 bytes of RAM and 1-4K words of flash for a PIC16, up
to 2KiB of RAM and 16K words of flash for a PIC18, so you usually run out
of something else long before the maximum stack depth becomes an issue.
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2011-04-03 22:59 -0700 |
| Message-ID | <mailman.183.1301896772.2990.python-list@python.org> |
| In reply to | #2535 |
On Sun, 03 Apr 2011 22:37:16 +0100, Nobody <nobody@nowhere.com>
declaimed the following in gmane.comp.python.general:
> On Sun, 03 Apr 2011 10:15:34 -0700, John Nagle wrote:
>
> > Note that if you run out of return point stack, or parameter
> > stack, you're stuck. So there's a hardware limit on call depth.
> > National Semiconductor once built a CPU with a separate return
> > point stack with a depth of 20. Big mistake.
>
> The 8-bit PIC microcontrollers have a separate return stack. The PIC10 has
> a 2-level stack, the PIC16 has 8 levels, and the PIC18 has 31 levels.
>
> But these chips range from 16 bytes of RAM and 256 words of flash for a
> PIC10, through 64-256 bytes of RAM and 1-4K words of flash for a PIC16, up
> to 2KiB of RAM and 16K words of flash for a PIC18, so you usually run out
> of something else long before the maximum stack depth becomes an issue.
Not an architecture on which to code a recursive Fibonacci sequence. <G>
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
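To put a number on the Fibonacci remark, the following sketch instruments the naive recursive definition to record how deeply calls nest. The `depth` and `stats` parameters and the helper `deepest_call` are added purely for illustration, and the sketch counts only nested calls of `fib` itself.

```python
def fib(n, depth=1, stats=None):
    """Naive recursive Fibonacci that records its deepest call level."""
    if stats is not None:
        stats["max_depth"] = max(stats.get("max_depth", 0), depth)
    if n < 2:
        return n
    return fib(n - 1, depth + 1, stats) + fib(n - 2, depth + 1, stats)


def deepest_call(n):
    """Return the deepest nesting level reached while computing fib(n)."""
    stats = {}
    fib(n, stats=stats)
    return stats["max_depth"]
```

For n >= 1 the nesting reaches n levels, so on a part with the 8-level return stack quoted above for the PIC16, the recursion for n = 9 would already need more return addresses than the hardware can hold.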