Hosted Forths on multicore machines

Started by: Alex McDonald <blog@rivadpm.com>
First post: 2013-03-11 14:17 -0700
Last post: 2013-03-12 15:22 -0700
Articles 20 on this page of 36 — 10 participants


Contents

  Hosted Forths on multicore machines Alex McDonald <blog@rivadpm.com> - 2013-03-11 14:17 -0700
    Re: Hosted Forths on multicore machines "Clyde W. Phillips Jr." <cwpjr02@gmail.com> - 2013-03-11 17:41 -0700
      Re: Hosted Forths on multicore machines Alex McDonald <blog@rivadpm.com> - 2013-03-12 06:38 -0700
    Re: Hosted Forths on multicore machines Bernd Paysan <bernd.paysan@gmx.de> - 2013-03-12 02:19 +0100
      Re: Hosted Forths on multicore machines Alex McDonald <blog@rivadpm.com> - 2013-03-12 06:05 -0700
        Re: Hosted Forths on multicore machines Alex McDonald <blog@rivadpm.com> - 2013-03-12 06:34 -0700
          Re: Hosted Forths on multicore machines Bernd Paysan <bernd.paysan@gmx.de> - 2013-03-13 01:51 +0100
        Re: Hosted Forths on multicore machines "Elizabeth D. Rather" <erather@forth.com> - 2013-03-12 09:46 -1000
          Re: Hosted Forths on multicore machines Roelf Toxopeus <rt4all@notthis.hetnet.nl> - 2013-03-13 11:30 +0100
        Re: Hosted Forths on multicore machines Andrew Haley <andrew29@littlepinkcloud.invalid> - 2013-03-13 04:35 -0500
          Re: Hosted Forths on multicore machines Alex McDonald <blog@rivadpm.com> - 2013-03-13 06:55 -0700
            Re: Hosted Forths on multicore machines Andrew Haley <andrew29@littlepinkcloud.invalid> - 2013-03-13 09:49 -0500
              Re: Hosted Forths on multicore machines Paul Rubin <no.email@nospam.invalid> - 2013-03-13 08:18 -0700
                Re: Hosted Forths on multicore machines Andrew Haley <andrew29@littlepinkcloud.invalid> - 2013-03-13 10:39 -0500
                  Re: Hosted Forths on multicore machines Alex McDonald <blog@rivadpm.com> - 2013-03-13 16:36 -0700
                    Re: Hosted Forths on multicore machines Andrew Haley <andrew29@littlepinkcloud.invalid> - 2013-03-14 03:44 -0500
                      Re: Hosted Forths on multicore machines anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2013-03-14 09:06 +0000
                        Re: Hosted Forths on multicore machines Andrew Haley <andrew29@littlepinkcloud.invalid> - 2013-03-14 06:26 -0500
                          Re: Hosted Forths on multicore machines anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2013-03-14 15:41 +0000
                Re: Hosted Forths on multicore machines Bernd Paysan <bernd.paysan@gmx.de> - 2013-03-14 17:56 +0100
                  Re: Hosted Forths on multicore machines Andrew Haley <andrew29@littlepinkcloud.invalid> - 2013-03-15 03:26 -0500
                    Re: Hosted Forths on multicore machines Bernd Paysan <bernd.paysan@gmx.de> - 2013-03-16 23:11 +0100
    Re: Hosted Forths on multicore machines Andrew Haley <andrew29@littlepinkcloud.invalid> - 2013-03-12 05:01 -0500
      Re: Hosted Forths on multicore machines Alex McDonald <blog@rivadpm.com> - 2013-03-12 06:10 -0700
      Re: Hosted Forths on multicore machines Roelf Toxopeus <rt4all@notthis.hetnet.nl> - 2013-03-12 17:04 +0100
        Re: Hosted Forths on multicore machines Andrew Haley <andrew29@littlepinkcloud.invalid> - 2013-03-12 11:28 -0500
          Re: Hosted Forths on multicore machines Roelf Toxopeus <rt4all@notthis.hetnet.nl> - 2013-03-12 19:52 +0100
        Re: Hosted Forths on multicore machines morrimichael@gmail.com - 2013-03-12 10:11 -0700
          Re: Hosted Forths on multicore machines Alex McDonald <blog@rivadpm.com> - 2013-03-12 11:34 -0700
          Re: Hosted Forths on multicore machines Roelf Toxopeus <rt4all@notthis.hetnet.nl> - 2013-03-12 19:44 +0100
            Re: Hosted Forths on multicore machines Alex McDonald <blog@rivadpm.com> - 2013-03-12 13:13 -0700
              Re: Hosted Forths on multicore machines Roelf Toxopeus <rt4all@notthis.hetnet.nl> - 2013-03-15 15:44 +0100
                Re: Hosted Forths on multicore machines Alex McDonald <blog@rivadpm.com> - 2013-03-15 09:37 -0700
        Re: Hosted Forths on multicore machines Alex McDonald <blog@rivadpm.com> - 2013-03-12 11:37 -0700
          Re: Hosted Forths on multicore machines Roelf Toxopeus <rt4all@notthis.hetnet.nl> - 2013-03-13 10:46 +0100
    Re: Hosted Forths on multicore machines the_gavino_himself <visphatesjava@gmail.com> - 2013-03-12 15:22 -0700



#20562 — Hosted Forths on multicore machines

From: Alex McDonald <blog@rivadpm.com>
Date: 2013-03-11 14:17 -0700
Subject: Hosted Forths on multicore machines
Message-ID: <fd73df23-4c41-4cd7-936a-3141d752af25@g16g2000vbf.googlegroups.com>

As one of three Forth projects I'm pursuing at the moment (the others
are 64bit and an optimising compiler) I've been looking at specific
solutions for employing more than 1 core of my 8 (!) core laptop. (If
you own a laptop of less than 5 years vintage, it probably has more
than 1 core, and multicore machines have been available for a long
time in server land.)

There are two common architectures: SMP and NUMA. And there are a
plethora of chips and OSes that support these: Linux, Windows,
Android on ARM, Intel/AMD, to name some of the commonest.

This has raised a number of questions for hosted Forths, some of which
I pose here.

Forth's traditional "multiuser" feature isn't adequate (PAUSE). It
simply isn't extendable to multi core machines. The model assumes a
single core with all work running on a single thread. There are no
synchronisation primitives, and there is no way to handle asynchronous
interrupts.

Multicore support would appear to require:

. Memory fencing primitives (such as MFENCE SFENCE and LFENCE on x86)
. Sync primitives (CAS or LL/SC)
. Words for creating & managing threads
. Word(s) for querying processor attributes (such as # of cores)
. Closures (for which I have a proposal)

Would it be possible to agree that these are a reasonable set, and
that hosted Forths would benefit from standardising some of these?

I've looked at a number of other languages that support these kinds of
operations.

. Go uses high level constructs; this model looks interesting.
. C++ now has language intrinsics such as atomic_compare_exchange()
. Lambdas and blocks in C++, Objective-C and others
. Haskell, Erlang ...

Comments welcome.



#20574

From: "Clyde W. Phillips Jr." <cwpjr02@gmail.com>
Date: 2013-03-11 17:41 -0700
Message-ID: <72c028ab-a4d0-4ff2-a9fa-19bbcb272fb1@googlegroups.com>
In reply to: #20562

On Monday, March 11, 2013 4:17:25 PM UTC-5, Alex McDonald wrote:
> As one of three Forth projects I'm pursuing at the moment (the others
> are 64bit and an optimising compiler) I've been looking at specific
> solutions for employing more than 1 core of my 8 (!) core laptop. (If
> you own a laptop of less than 5 years vintage, it probably has more
> than 1 core, and multicore machines have been available for a long
> time in server land.)
>
> There are two common architectures: SMP and NUMA. And there are a
> plethora of chips and OSes that support these: Linux, Windows,
> Android on ARM, Intel/AMD, to name some of the commonest.
>
> This has raised a number of questions for hosted Forths, some of which
> I pose here.
>
> Forth's traditional "multiuser" feature isn't adequate (PAUSE). It
> simply isn't extendable to multi core machines. The model assumes a
> single core with all work running on a single thread. There are no
> synchronisation primitives, and there is no way to handle asynchronous
> interrupts.
>
> Multicore support would appear to require:
>
> . Memory fencing primitives (such as MFENCE SFENCE and LFENCE on x86)
> . Sync primitives (CAS or LL/SC)
> . Words for creating & managing threads
> . Word(s) for querying processor attributes (such as # of cores)
> . Closures (for which I have a proposal)
>
> Would it be possible to agree that these are a reasonable set, and
> that hosted Forths would benefit from standardising some of these?
>
> I've looked at a number of other languages that support these kinds of
> operations.
>
> . Go uses high level constructs; this model looks interesting.
> . C++ now has language intrinsics such as atomic_compare_exchange()
> . Lambdas and blocks in C++, Objective-C and others
> . Haskell, Erlang ...
>
> Comments welcome.

Would there be any value in segregating vocabularies to different cores, vs each being a complete system?



#20592

From: Alex McDonald <blog@rivadpm.com>
Date: 2013-03-12 06:38 -0700
Message-ID: <b252ac73-2faa-4f13-8fa6-507e1c0525ef@o9g2000pbt.googlegroups.com>
In reply to: #20574

On Mar 12, 12:41 am, "Clyde W. Phillips Jr." <cwpj...@gmail.com>
wrote:
> On Monday, March 11, 2013 4:17:25 PM UTC-5, Alex McDonald wrote:
> > As one of three Forth projects I'm pursuing at the moment (the others
> > are 64bit and an optimising compiler) I've been looking at specific
> > solutions for employing more than 1 core of my 8 (!) core laptop. (If
> > you own a laptop of less than 5 years vintage, it probably has more
> > than 1 core, and multicore machines have been available for a long
> > time in server land.)
> >
> > There are two common architectures: SMP and NUMA. And there are a
> > plethora of chips and OSes that support these: Linux, Windows,
> > Android on ARM, Intel/AMD, to name some of the commonest.
> >
> > This has raised a number of questions for hosted Forths, some of which
> > I pose here.
> >
> > Forth's traditional "multiuser" feature isn't adequate (PAUSE). It
> > simply isn't extendable to multi core machines. The model assumes a
> > single core with all work running on a single thread. There are no
> > synchronisation primitives, and there is no way to handle asynchronous
> > interrupts.
> >
> > Multicore support would appear to require:
> >
> > . Memory fencing primitives (such as MFENCE SFENCE and LFENCE on x86)
> > . Sync primitives (CAS or LL/SC)
> > . Words for creating & managing threads
> > . Word(s) for querying processor attributes (such as # of cores)
> > . Closures (for which I have a proposal)
> >
> > Would it be possible to agree that these are a reasonable set, and
> > that hosted Forths would benefit from standardising some of these?
> >
> > I've looked at a number of other languages that support these kinds of
> > operations.
> >
> > . Go uses high level constructs; this model looks interesting.
> > . C++ now has language intrinsics such as atomic_compare_exchange()
> > . Lambdas and blocks in C++, Objective-C and others
> > . Haskell, Erlang ...
> >
> > Comments welcome.
>
> Would there be any value in segregating vocabularies to different cores, vs each being a complete system?

I'm not sure how that helps distribute work. It sounds like a multi-
user system, one per core, if I understand what you're saying. Then
we're into IPC or RPC for communication; the discussion is for
something a little more tightly coupled than that.



#20579

From: Bernd Paysan <bernd.paysan@gmx.de>
Date: 2013-03-12 02:19 +0100
Message-ID: <khlvr3$jq3$1@online.de>
In reply to: #20562

Alex McDonald wrote:

> As one of three Forth projects I'm pursuing at the moment (the others
> are 64bit and an optimising compiler) I've been looking at specific
> solutions for employing more than 1 core of my 8 (!) core laptop. (If
> you own a laptop of less than 5 years vintage, it probably has more
> than 1 core, and multicore machines have been available for a long
> time in server land.)

Even my smartphone has four cores.  Single-core on Android sucks.

> Forth's traditional "multiuser" feature isn't adequate (PAUSE). It
> simply isn't extendable to multi core machines.

Well, in Gforth's unix/pthread.fs (using Posix threads as multitasker), 
PAUSE just maps to sched_yield().

I have

NewTask ( stacksize -- task )
\G creates a task, uses stacksize for stack, rstack, fpstack, locals
NewTask4 ( ssize rsize fpsize lsize -- task )
\G creates a task, each stack individually sized

Activation with

activate ( task r:cont -- )
\G activates task, the current procedure will be continued there

pass ( x1..xn n task r:cont -- x1..xn )
\G activates task, and passes n parameters from the data stack

sema ( "name" -- )
\G creates a semaphore "name" ( -- addr )

lock ( addr -- )
\G Acquires the lock

unlock ( addr -- )
\G releases the lock

stop ( -- )
\G stops the current task, waiting for events

stop-ns ( timeout-ns -- )
\G stops the current task, waiting for events or timeout in nanoseconds

There's an event system which needs some examples.  The easiest thing is to 
wake and stop other tasks:

event: ->wake ;
event: ->sleep  stop ;

: wake ( task -- )  <event ->wake event> ;
: sleep ( task -- ) <event ->sleep event> ;

Events are sent as a sequence of events, enclosed in <event .. event>.  The ->
convention is still questionable, as Gforth also has a recognizer that uses
->something for TO something (eliminating the need for the parsing TO).  It
is just a convention, though; you can name your events whatever you like.

<event ( -- )
\G starts a sequence of events

event> ( task -- )
\G ends a sequence and sends it to the mentioned task

You can send literals, strings, and floats as part of the events:

elit, ( n -- )
\G sends a literal

e$, ( addr u -- )
\G sends a string (actually only the address and the count, because it's
\G shared memory)

eflit, ( r -- )
\G sends a float

?events ( x1..xn -- y1..ym )
\G checks for events and executes them

event: ( "name" -- )
\G defines an event and the reaction to it as Forth code

> The model assumes a
> single core with all work running on a single thread. There are no
> synchronisation primitives, and there is no way to handle asynchronous
> interrupts.

Yes.  The lock/unlock and the events as described above serve for 
synchronisation.  The events construct commands which are executed in the 
other task's context; IMHO this is the most forthish way to let threads 
communicate with each other.
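
For readers more used to C than to Gforth's event words, the idea of queuing small commands for another task and letting that task run them in its own context can be sketched roughly as follows. This is not how unix/pthread.fs is implemented. The names task_t, event_t, send_event, run_events and demo_events are invented for the illustration, and a mutex-protected ring buffer with a made-up capacity stands in for whatever transport a real system uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_SLOTS 16   /* hypothetical fixed capacity */

/* One queued command: a function plus one argument, run by the receiver. */
typedef struct { void (*fn)(void *task, long arg); long arg; } event_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    event_t         slots[QUEUE_SLOTS];
    size_t          head, count;
    long            result;   /* state touched only by the receiving task */
    int             done;     /* set by the receiver itself via ev_quit */
} task_t;

void task_init(task_t *t) {
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->nonempty, NULL);
    t->head = t->count = 0;
    t->result = 0;
    t->done = 0;
}

/* Sender side: queue one command. Returns 0 on success, -1 if the queue is full. */
int send_event(task_t *t, void (*fn)(void *, long), long arg) {
    int rc = -1;
    pthread_mutex_lock(&t->lock);
    if (t->count < QUEUE_SLOTS) {
        size_t tail = (t->head + t->count) % QUEUE_SLOTS;
        t->slots[tail].fn = fn;
        t->slots[tail].arg = arg;
        t->count++;
        rc = 0;
        pthread_cond_signal(&t->nonempty);
    }
    pthread_mutex_unlock(&t->lock);
    return rc;
}

/* Receiver side: sleep until a command arrives, then execute it here. */
void *run_events(void *p) {
    task_t *t = p;
    while (!t->done) {
        pthread_mutex_lock(&t->lock);
        while (t->count == 0)
            pthread_cond_wait(&t->nonempty, &t->lock);
        event_t e = t->slots[t->head];
        t->head = (t->head + 1) % QUEUE_SLOTS;
        t->count--;
        pthread_mutex_unlock(&t->lock);
        e.fn(t, e.arg);   /* runs in the receiver's context */
    }
    return NULL;
}

/* Two example commands. */
void ev_add(void *task, long n)  { ((task_t *)task)->result += n; }
void ev_quit(void *task, long n) { (void)n; ((task_t *)task)->done = 1; }

/* Usage: start a receiver, send it three commands, wait for it to finish. */
long demo_events(void) {
    task_t t;
    pthread_t th;
    task_init(&t);
    int rc = pthread_create(&th, NULL, run_events, &t);
    assert(rc == 0);
    (void)rc;
    send_event(&t, ev_add, 40);
    send_event(&t, ev_add, 2);
    send_event(&t, ev_quit, 0);
    pthread_join(th, NULL);
    return t.result;
}
```

The sender never touches the receiver's state directly; it only describes what should happen, which is the property the event words are after.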

Example: You use a few helper threads to download files from the internet via
HTTP, e.g. 4 like IE does for fetching inline images; the helpers are chosen 
in a round-robin fashion.  It's asynchronous, events queue up, and you want 
to be notified when it's done, and then you want as notification *what* has 
been done (i.e. which url, and what the data is).

event: ->wdone ( content u1 url u2 -- )  cache-url rerender ;
event: ->wget ( url u task -- ) >r
       2dup wget <event 2swap e$, e$, ->wdone r> event> ;

: wget-async ( url u -- )
  <event e$, up@ elit, ->wget wget-helper event> ;

> Multicore support would appear to require:
> 
> . Memory fencing primitives (such as MFENCE SFENCE and LFENCE on x86)

x86 is easy, because it has TSO memory, anyways.  ARM has weak ordering, so 
you need it there.  We probably still need this to instruct the compiler 
that yes, it has to do fencing.  There are some optimizations the compiler 
is not allowed to do, like fetching the same value twice when there's only 
one @ in the code (VFX is an offender here).  A compiler-only fence between 
@ and DUP can help that.
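
The same concern exists in C, where it may be easier to see. The sketch below is an analogy only, not VFX or Gforth code, and the names shared_cell and fetch_once_and_double are made up. A relaxed atomic load is performed exactly once, and atomic_signal_fence() is the C11 compiler-only fence: it constrains the compiler's reordering but typically emits no machine instruction.

```c
#include <stdatomic.h>

/* A cell that other threads may change at any time. */
static _Atomic long shared_cell = 21;

/* Rough C analogue of  shared-cell @ DUP +  with a single fetch. */
long fetch_once_and_double(void) {
    /* One load; the compiler may not split it into two separate reads. */
    long v = atomic_load_explicit(&shared_cell, memory_order_relaxed);

    /* Compiler-only fence: no reordering of memory accesses across this
       point by the compiler, and no CPU fence is requested. */
    atomic_signal_fence(memory_order_seq_cst);

    return v + v;   /* both uses see the same fetched value */
}
```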

> . Sync primitives (CAS or LL/SC)

I think we should have polling loops like LL/SC outside of the high level 
code.  CAS (and CAS2 for two values), a non-conditional exchange (I use the 
name !@ for that, but usually I only need the unlocked version), and an 
atomic increment are useful primitives.
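
For comparison, the three primitives named here correspond to standard C11 operations. The wrapper names cas, xchg and incr are invented for this sketch, and the argument order is a C convention, not a proposed Forth stack effect.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* CAS: store `desired` only if the cell still holds `expected`.
   Returns true when the store happened. */
bool cas(atomic_long *a, long expected, long desired) {
    return atomic_compare_exchange_strong(a, &expected, desired);
}

/* Unconditional exchange: store `value`, return the previous contents. */
long xchg(atomic_long *a, long value) {
    return atomic_exchange(a, value);
}

/* Atomic increment: returns the value before the increment. */
long incr(atomic_long *a) {
    return atomic_fetch_add(a, 1);
}
```

On hardware with LL/SC instead of a CAS instruction, the compiler generates the retry loop inside these operations, which keeps the polling loop out of high level code.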

> . Words for creating & managing threads

Just use the same words we used for creating tasks for the PAUSE 
multitasker.

> . Word(s) for querying processor attributes (such as # of cores)

cores ( -- n )?

You may need it if you write code that should just utilize all cores, but it 
is not something you get easily ported among platforms.  glibc has a 
sysconf() query for it.
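
A minimal sketch of such a query in C, assuming a system whose sysconf() supports _SC_NPROCESSORS_ONLN (glibc does; the constant is a common extension and not part of POSIX proper). The function name cores simply mirrors the word suggested above.

```c
#include <unistd.h>

/* Number of processors currently online; falls back to 1 if the query fails. */
long cores(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```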

> . Closures (for which I have a proposal)

I prefer PASS here.  Just pass a few parameters on the stack to the code 
that runs in the other task.
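
In pthreads terms, handing a few values to the code that runs in the other task looks roughly like this. It is an analogy, not the implementation of PASS: PASS copies stack items to the new task, while this sketch passes a pointer to a struct, and the names args_t, adder and add_in_other_thread are invented.

```c
#include <assert.h>
#include <pthread.h>

/* Parameters handed to the new thread, plus a slot for its answer. */
typedef struct { long a, b, sum; } args_t;

static void *adder(void *p) {
    args_t *x = p;
    x->sum = x->a + x->b;   /* runs in the other thread */
    return NULL;
}

/* Start a thread, pass it two values, wait for it and return its result. */
long add_in_other_thread(long a, long b) {
    args_t x = { a, b, 0 };
    pthread_t th;
    int rc = pthread_create(&th, NULL, adder, &x);
    assert(rc == 0);
    (void)rc;
    pthread_join(th, NULL);   /* joining also makes x.sum visible here */
    return x.sum;
}
```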

> Would it be possible to agree that these are a reasonable set, and
> that hosted Forths would benefit from standardising some of these?

Well, maybe.  The stuff you explain is pretty low-level.  The events above 
are much higher level, and make programming with multiple threads very easy.  
You don't have to worry about low-level synchronizations, a bunch of events 
between <event and event> is constructed and then transmitted as a whole to 
the receiver (how that is achieved is totally up to the system implementer, 
including the necessary memory fence).

> I've looked at a number of other languages that support these kinds of
> operations.
> 
> . Go uses high level constructs; this model looks interesting.

Yes, the higher level constructs are more useful.

> . C++ now has language intrinsics such as atomic_compare_exchange()
> . Lambdas and blocks in C++, Objective-C and others

The <event event> thing is my Forthish equivalent to these lambdas/blocks.

> . Haskell, Erlang ...
> 
> Comments welcome.

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/



#20589

From: Alex McDonald <blog@rivadpm.com>
Date: 2013-03-12 06:05 -0700
Message-ID: <e0f429ac-b7ba-45ff-9900-293bd90cf992@oz4g2000pbc.googlegroups.com>
In reply to: #20579

On Mar 12, 1:19 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Alex McDonald wrote:
> > As one of three Forth projects I'm pursuing at the moment (the others
> > are 64bit and an optimising compiler) I've been looking at specific
> > solutions for employing more than 1 core of my 8 (!) core laptop. (If
> > you own a laptop of less than 5 years vintage, it probably has more
> > than 1 core, and multicore machines have been available for a long
> > time in server land.)
>
> Even my smartphone has four cores.  Single-core on Android sucks.
>
> > Forth's traditional "multiuser" feature isn't adequate (PAUSE). It
> > simply isn't extendable to multi core machines.
>
> Well, in Gforth's unix/pthread.fs (using Posix threads as multitasker),
> PAUSE just maps to sched_yield().

That's not useful in a multicore environment. With 4 cores and 4
threads, one thread active per core, it will return immediately.
Unless there's processor affinity set on the thread, any core can run
any thread; all PAUSE does is pester the OS scheduler for no net
return. It's only useful in single core/multithread environments; for
instance, you release a mutex and want to have other tasks that are
waiting get CPU cycles.

>
> I have
>
> NewTask ( stacksize -- task )
> \G creates a task, uses stacksize for stack, rstack, fpstack, locals
> NewTask4 ( ssize rsize fpsize lsize -- task )
> \G creates a task, each stack individually sized
>
> Activation with
>
> activate ( task r:cont -- )
> \G activates task, the current procedure will be continued there
>
> pass ( x1..xn n task r:cont -- x1..xn )
> \G activates task, and passes n parameters from the data stack
>
> sema ( "name" -- )
> \G creates a semaphore "name" ( -- addr )
>
> lock ( addr -- )
> \G Acquires the lock
>
> unlock ( addr -- )
> \G releases the lock
>
> stop ( -- )
> \G stops the current task, waiting for events
>
> stop-ns ( timeout-ns -- )
> \G stops the current task, waiting for events or timeout in nanoseconds
>
> There's an event system which needs some examples.  The easiest thing is to
> wake and stop other tasks:
>
> event: ->wake ;
> event: ->sleep  stop ;
>
> : wake ( task -- )  <event ->wake event> ;
> : sleep ( task -- ) <event ->sleep event> ;
>
> Events are sent as a sequence of events, enclosed in <event .. event>.  The ->
> convention is still questionable, as Gforth also has a recognizer that uses
> ->something for TO something (eliminating the need for the parsing TO).  It
> is just a convention, though; you can name your events whatever you like.
>
> <event ( -- )
> \G starts a sequence of events
>
> event> ( task -- )
> \G ends a sequence and sends it to the mentioned task
>
> You can send literals, strings, and floats as part of the events:
>
> elit, ( n -- )
> \G sends a literal
>
> e$, ( addr u -- )
> \G sends a string (actually only the address and the count, because it's
> \G shared memory)
>
> eflit, ( r -- )
> \G sends a float
>
> ?events ( x1..xn -- y1..ym )
> \G checks for events and executes them
>
> event: ( "name" -- )
> \G defines an event and the reaction to it as Forth code

I'll take a look at that; Win32Forth has a similar tasker and
semaphores. I'm not a big fan of locks, especially for queue
management.

>
> > The model assumes a
> > single core with all work running on a single thread. There are no
> > synchronisation primitives, and there is no way to handle asynchronous
> > interrupts.
>
> Yes.  The lock/unlock and the events as described above serve for
> synchronisation.  The events construct commands which are executed in the
> other task's context; IMHO this is the most forthish way to let threads
> communicate with each other.

That's where a closure-like facility becomes useful, since each task
has a quite distinct stack and environment.

>
> Example: You use a few helper threads to download files from the internet via
> HTTP, e.g. 4 like IE does for fetching inline images; the helpers are chosen
> in a round-robin fashion.  It's asynchronous, events queue up, and you want
> to be notified when it's done, and then you want as notification *what* has
> been done (i.e. which url, and what the data is).
>
> event: ->wdone ( content u1 url u2 -- )  cache-url rerender ;
> event: ->wget ( url u task -- ) >r
>        2dup wget <event 2swap e$, e$, ->wdone r> event> ;
>
> : wget-async ( url u -- )
>   <event e$, up@ elit, ->wget wget-helper event> ;
>
> > Multicore support would appear to require:
>
> > . Memory fencing primitives (such as MFENCE SFENCE and LFENCE on x86)
>
> x86 is easy, because it has TSO memory, anyways.

Only by using RMW instructions. There are temporal issues with load x/
store x being seen as store x/load x by other processors due to OOO
execution.
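
The reordering x86 does permit is a later load completing before an earlier store to a different address has become visible to other processors. The classic two-thread test for it can be sketched in C11 as below; this is an illustration, not code from any Forth system, and the names thread_a, thread_b and count_both_zero are invented. With the full fences in place the outcome "both threads read 0" is forbidden by the C11 memory model; without them it is allowed, and can occur on x86.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x, y;   /* two shared cells */
static int r1, r2;        /* what each thread read from the other's cell */

static void *thread_a(void *unused) {
    (void)unused;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* full fence, the MFENCE role */
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread_b(void *unused) {
    (void)unused;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Run the pair of threads `runs` times; count how often both read 0. */
int count_both_zero(int runs) {
    int hits = 0;
    for (int i = 0; i < runs; i++) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        int rc = pthread_create(&a, NULL, thread_a, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread_b, NULL);
        assert(rc == 0);
        (void)rc;
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            hits++;
    }
    return hits;
}
```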

> ARM has weak ordering, so
> you need it there.  We probably still need this to instruct the compiler
> that yes, it has to do fencing.  There are some optimizations the compiler
> is not allowed to do, like fetching the same value twice when there's only
> one @ in the code (VFX is an offender here).  A compiler-only fence between
> @ and DUP can help that.

Or a word like volatile@ ?

I think we need a set of read/write acquire/release primitives for
ARM and other relaxed consistency processors. But I don't know enough
about them to comment.

>
> > . Sync primitives (CAS or LL/SC)
>
> I think we should have polling loops like LL/SC outside of the high level
> code.  CAS (and CAS2 for two values), a non-conditional exchange (I use the
> name !@ for that, but usually I only need the unlocked version), and an
> atomic increment are useful primitives.

( v -- volatile address
  c -- comparand
  x -- exchange value
  vval -- original value of v )

: atomic-@!       ( x v -- vval )    \ as per x86 XCHG
: atomic-2@!      ( dx v -- dvval )  \ see *
: atomic-@+!      ( x v -- vval )    \ as per x86 XADD
: atomic-2@+!     ( dx v -- dvval )  \ see *
: atomic-@cmp!    ( c x v -- vval )  \ as per x86 cmpxchg (32) or cmpxchg8b (64)
: atomic-2@cmp!   ( dc dx v -- dvval ) \ ditto cmpxchg8b (32) or cmpxchg16b (64)

Atomic increment is  : atomic-incr ( -- n ) 1 var atomic-@+! ;

*On a 32bit(64bit) x86 processor, doing a 2 cell atomic exchange or
add will be problematic since there's no 64(128) bit equivalent, and
it will need to be emulated with cmpxchg8b(16b) in a spin loop.
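
The emulation mentioned in the footnote has the following shape. This is a C11 sketch with invented names (exchange_via_cas, add_via_cas), using a 64-bit value to stand in for a double cell on a 32-bit system; which instruction the compiler picks for the compare-and-exchange depends on the target.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Exchange built from CAS: retry until no other thread changed the cell
   between our read and our write. Returns the previous value. */
uint64_t exchange_via_cas(_Atomic uint64_t *v, uint64_t x) {
    uint64_t old = atomic_load(v);
    while (!atomic_compare_exchange_weak(v, &old, x)) {
        /* on failure, `old` has been refreshed with the current contents */
    }
    return old;
}

/* Fetch-and-add built the same way. Returns the previous value. */
uint64_t add_via_cas(_Atomic uint64_t *v, uint64_t x) {
    uint64_t old = atomic_load(v);
    while (!atomic_compare_exchange_weak(v, &old, old + x)) {
        /* `old + x` is recomputed from the refreshed `old` on each retry */
    }
    return old;
}
```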


>
> > . Words for creating & managing threads
>
> Just use the same words we used for creating tasks for the PAUSE
> multitasker.
>
> > . Word(s) for querying processor attributes (such as # of cores)
>
> cores ( -- n )?
>
> You may need it if you write code that should just utilize all cores, but it
> is not something you get easily ported among platforms.  glibc has a
> sysconfig query for it.
>
> > . Closures (for which I have a proposal)

I'll come back to this, since it relates to producer/consumer queues.

>
> I prefer PASS here.  Just pass a few parameters on the stack to the code
> that runs in the other task.
>
> > Would it be possible to agree that these are a reasonable set, and
> > that hosted Forths would benefit from standardising some of these?
>
> Well, maybe.  The stuff you explain is pretty low-level.  The events above
> are much higher level, and make programming with multiple threads very easy.
> You don't have to worry about low-level synchronizations, a bunch of events
> between <event and event> is constructed and then transmitted as a whole to
> the receiver (how that is achieved is totally up to the system implementer,
> including the necessary memory fence).
>
> > I've looked at a number of other languages that support these kinds of
> > operations.
>
> > . Go uses high level constructs; this model looks interesting.
>
> Yes, the higher level constructs are more useful.
>
> > . C++ now has language intrinsics such as atomic_compare_exchange()
> > . Lambdas and blocks in C++, Objective-C and others
>
> The <event event> thing is my Forthish equivalent to these lambdas/blocks.
>
> > . Haskell, Erlang ...
>
> > Comments welcome.
>
> --
> Bernd Paysan
> "If you want it done right, you have to do it yourself"
> http://bernd-paysan.de/



#20591

From: Alex McDonald <blog@rivadpm.com>
Date: 2013-03-12 06:34 -0700
Message-ID: <7b6f28a8-83d1-429e-9c20-22959df667c3@q9g2000pbf.googlegroups.com>
In reply to: #20589

On Mar 12, 1:05 pm, Alex McDonald <b...@rivadpm.com> wrote:
> On Mar 12, 1:19 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
>
> > Alex McDonald wrote:
[snip]
>
> > > Forth's traditional "multiuser" feature isn't adequate (PAUSE). It
> > > simply isn't extendable to multi core machines.
>
> > Well, in Gforth's unix/pthread.fs (using Posix threads as multitasker),
> > PAUSE just maps to sched_yield().
>
> That's not useful in a multicore environment. With 4 cores and 4
> threads, one thread active per core, it will return immediately.
> Unless there's processor affinity set on the thread, any core can run
> any thread; all PAUSE does is pester the OS scheduler for no net
> return. It's only useful in single core/multithread environments; for
> instance, you release a mutex and want to have other tasks that are
> waiting get CPU cycles.
>

(adding) Or if you have an issue with priority inversion, PAUSE may be
useful. But it is avoidable with good design.



#20610

From: Bernd Paysan <bernd.paysan@gmx.de>
Date: 2013-03-13 01:51 +0100
Message-ID: <khoiia$rt6$1@online.de>
In reply to: #20591

Alex McDonald wrote:

> On Mar 12, 1:05 pm, Alex McDonald <b...@rivadpm.com> wrote:
>> On Mar 12, 1:19 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
>>
>> > Alex McDonald wrote:
> [snip]
>>
>> > > Forth's traditional "multiuser" feature isn't adequate (PAUSE). It
>> > > simply isn't extendable to multi core machines.
>>
>> > Well, in Gforth's unix/pthread.fs (using Posix threads as multitasker),
>> > PAUSE just maps to sched_yield().
>>
>> That's not useful in a multicore environment. With 4 cores and 4
>> threads, one thread active per core, it will return immediately.
>> Unless there's processor affinity set on the thread, any core can run
>> any thread; all PAUSE does is pester the OS scheduler for no net
>> return. It's only useful in single core/multithread environments; for
>> instance, you release a mutex and want to have other tasks that are
>> waiting get CPU cycles.
>>
> 
> (adding) Or if you have an issue with priority inversion, PAUSE may be
> useful. But it is avoidable with good design.

I don't actually use PAUSE.  It's just there, for "compatibility reasons" 
and it calls sched_yield(), which sometimes may be useful.  I exclusively 
use the <event ->somesignal task event> construct.  It does everything I 
need: It sends an atomic (multi-)message from one task to the other, and I 
can define the sequence point (the event> side is the sender's sequence 
point, the ?events is the receiver's sequence point).

Example: I'm splitting my net2o packet handler code off into a task at the 
moment.  This means you can write event-driven programs that communicate 
with each other through net2o.  It turned out to be a trivial exercise:
Just put a <event ... task event> wrapper around the packet handler, and if
there is an event to signal inside the code that handles packets, signal it.  
I need to expose the unsent event queue length, because there's no need to 
send empty queues around (or is this just an optimization issue?  <event 
task event> should really do nothing, and do it fast).

This is still somehow in flux, because <event is not really needed.  
Whatever events come, queue up, and event> sends them.  The only possible 
reason why you might want <event is for nesting, something like

<event ->foo ... <event ->bar a event> ... b event>

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/



#20604

From: "Elizabeth D. Rather" <erather@forth.com>
Date: 2013-03-12 09:46 -1000
Message-ID: <ddKdnRHFKf2-G6LMnZ2dnUVZ_uWdnZ2d@supernews.com>
In reply to: #20589

On 3/12/13 3:05 AM, Alex McDonald wrote:
>>> Forth's traditional "multiuser" feature isn't adequate (PAUSE). It
>>> simply isn't extendable to multi core machines.
>>
>> Well, in Gforth's unix/pthread.fs (using Posix threads as multitasker),
>> PAUSE just maps to sched_yield().
>
> That's not useful in a multicore environment. With 4 cores and 4
> threads, one thread active per core, it will return immediately.
> Unless there's processor affinity set on the thread, any core can run
> any thread; all PAUSE does is pester the OS scheduler for no net
> return. It's only useful in single core/multithread environments; for
> instance, you release a mutex and want to have other tasks that are
> waiting get CPU cycles.

Many years ago, even before multicore machines came out, we concluded 
that the traditional Forth multitasker is inappropriate in a hosted 
environment. We completely redesigned task management for Windows 
threads. The result is documented in the SwiftForth manual (included 
with all SwiftForth downloads, including the eval version).

Cheers,
Elizabeth

-- 
==================================================
Elizabeth D. Rather   (US & Canada)   800-55-FORTH
FORTH Inc.                         +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================



#20623

From: Roelf Toxopeus <rt4all@notthis.hetnet.nl>
Date: 2013-03-13 11:30 +0100
Message-ID: <rt4all-CD47F1.11304913032013@[10.12.75.213]>
In reply to: #20604

In article <ddKdnRHFKf2-G6LMnZ2dnUVZ_uWdnZ2d@supernews.com>,
 "Elizabeth D. Rather" <erather@forth.com> wrote:

> We completely redesigned task management for Windows 
> threads. The result is documented in the SwiftForth manual (included 
> with all SwiftForth downloads, including the eval version).

What I appreciate very much is that the diverse implementations for
PolyForth, SwiftX, SwiftForth Windows, SwiftForth Linux, SwiftForth OSX
use a common set of names/words.

1980's code running on a trad multitasking Forth under a monotasking OS 
on a single core cpu, runs as good as unaltered on a multicore-tasking
Forth under a multicore-tasking OS on a multicore cpu. I think that's 
quite neat.

(Mach2/AtariGEM/68000 -> Coco-SF/OSX/i7)

Thanks!
-Roelf



#20618

FromAndrew Haley <andrew29@littlepinkcloud.invalid>
Date2013-03-13 04:35 -0500
Message-ID<p-Sdndgja6PN1d3MnZ2dnUVZ_j2dnZ2d@supernews.com>
In reply to#20589
Alex McDonald <blog@rivadpm.com> wrote:
> On Mar 12, 1:19 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
>> Well, in Gforth's unix/pthread.fs (using Posix threads as multitasker),
>> PAUSE just maps to sched_yield().
> 
> That's not useful in a multicore environment. With 4 cores and 4
> threads, one thread active per core, it will return immediately.
> Unless there's processor affinity set on the thread, any core can run
> any thread; all PAUSE does is pester the OS scheduler for no net
> return. It's only useful in single core/multithread environments; for
> instance, you release a mutex and want to have other tasks that are
> waiting get CPU cycles.

That is not true.  When you have a number of threads all trying to
acquire a lock it makes sense to use some kind of exponential backoff.
First you simply spin, then call PAUSE, and then only if you still
haven't acquired a lock do you need to go the heavyweight route of
creating a mutex and blocking.  This is a huge win in a heavily
contended environment where locks are held very briefly.
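
The three stages above can be sketched in C roughly as follows. This is a simplified illustration, not code from any Forth system: it retries an existing pthread mutex with pthread_mutex_trylock() instead of inflating a lightweight lock into a mutex on demand, and the name lock_with_backoff and both retry counts are made up for the example.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical tuning values; real ones would come from measurement. */
enum { SPIN_TRIES = 100, YIELD_TRIES = 10 };

/* Acquire m in three stages.
   Returns the stage that got the lock: 0 = spin, 1 = yield, 2 = block. */
static int lock_with_backoff(pthread_mutex_t *m)
{
    /* Stage 1: simply retry a few times. */
    for (int i = 0; i < SPIN_TRIES; i++)
        if (pthread_mutex_trylock(m) == 0)
            return 0;

    /* Stage 2: offer the CPU to the scheduler between retries (PAUSE). */
    for (int i = 0; i < YIELD_TRIES; i++) {
        sched_yield();
        if (pthread_mutex_trylock(m) == 0)
            return 1;
    }

    /* Stage 3: heavyweight route, block until the holder releases. */
    pthread_mutex_lock(m);
    return 2;
}
```

The return value is only there so a caller can count how often each stage wins; with briefly held locks one would expect most acquisitions to finish in stage 0 or 1.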

Andrew.



#20635

FromAlex McDonald <blog@rivadpm.com>
Date2013-03-13 06:55 -0700
Message-ID<376c6d69-3339-4d48-8df3-1e04730e3ac0@y9g2000vbb.googlegroups.com>
In reply to#20618
On Mar 13, 9:35 am, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:
> Alex McDonald <b...@rivadpm.com> wrote:
> > On Mar 12, 1:19 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> >> Well, in Gforth's unix/pthread.fs (using Posix threads as multitasker),
> >> PAUSE just maps to sched_yield().
>
> > That's not useful in a multicore environment. With 4 cores and 4
> > threads, one thread active per core, it will return immediately.
> > Unless there's processor affinity set on the thread, any core can run
> > any thread; all PAUSE does is pester the OS scheduler for no net
> > return. It's only useful in single core/multithread environments; for
> > instance, you release a mutex and want to have other tasks that are
> > waiting get CPU cycles.
>
> That is not true.  When you have a number of threads all trying to
> acquire a lock it makes sense to use some kind of exponential backoff.
> First you simply spin, then call PAUSE, and then only if you still
> haven't acquired a lock do you need to go the heavyweight route of
> creating a mutex and blocking.  This is a huge win in a heavily
> contended environment where locks are held very briefly.
>
> Andrew.

It's the translation of PAUSE to Yield() or sched_yield() or
equivalent that's the issue. The scheduler has been given no reason to
make the switch; a yield says "If there's someone with a higher
priority, run them as I'm willing to wait. Otherwise run me." You're
not saying "I'm waiting for a lock, so run the guy that holds it."

For several tasks of equal or lower priority to the lock getter,
nothing happens and the yield simply returns immediately, so the
likelihood of getting the lock after a spin then yield is (a) on a
single core, exactly zero (b) on a multicore, it depends on whether
the lock holding thread has released it in the time it takes to do the
yield. It may not be running on any other core either, so either way
it's close to zero as well.

Windows & POSIX programmers seem to be advised to use sleep for a very
small interval (not zero which appears to be equivalent to a yield) to
kick the scheduler into action, as sleeping indicates you are willing
to give up your timeslice. Now you're saying "Do something else, I
need a rest. Get back to me later."

Yield (or PAUSE) won't work. You've got to sleep.
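
A minimal sketch of what "sleep instead of yield" could look like in C on a POSIX system. The function name wait_with_sleep, the 50 microsecond interval and the use of an atomic flag as the thing being waited for are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Poll *flag until it becomes true or max_polls checks have been made.
   Between checks, sleep for a short non-zero interval rather than
   calling sched_yield(). Returns the final state of the flag. */
static bool wait_with_sleep(atomic_bool *flag, int max_polls)
{
    /* 50 microseconds: an arbitrary "very small interval". */
    const struct timespec nap = { .tv_sec = 0, .tv_nsec = 50 * 1000 };

    for (int i = 0; i < max_polls; i++) {
        if (atomic_load(flag))
            return true;
        nanosleep(&nap, NULL);   /* give up the timeslice for a while */
    }
    return atomic_load(flag);
}
```

How long the thread really sleeps depends on the OS timer granularity, so the interval is a lower bound rather than a precise delay.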



#20638

FromAndrew Haley <andrew29@littlepinkcloud.invalid>
Date2013-03-13 09:49 -0500
Message-ID<eZydnctgMdVxDN3MnZ2dnUVZ_sqdnZ2d@supernews.com>
In reply to#20635
Alex McDonald <blog@rivadpm.com> wrote:
> On Mar 13, 9:35 am, Andrew Haley <andre...@littlepinkcloud.invalid>
> wrote:
>> Alex McDonald <b...@rivadpm.com> wrote:
>> > On Mar 12, 1:19 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
>> >> Well, in Gforth's unix/pthread.fs (using Posix threads as multitasker),
>> >> PAUSE just maps to sched_yield().
>>
>> > That's not useful in a multicore environment. With 4 cores and 4
>> > threads, one thread active per core, it will return immediately.
>> > Unless there's processor affinity set on the thread, any core can run
>> > any thread; all PAUSE does is pester the OS scheduler for no net
>> > return. It's only useful in single core/multithread environments; for
>> > instance, you release a mutex and want to have other tasks that are
>> > waiting get CPU cycles.
>>
>> That is not true. When you have a number of threads all trying to
>> acquire a lock it makes sense to use some kind of exponential backoff.
>> First you simply spin, then call PAUSE, and then only if you still
>> haven't acquired a lock do you need to go the heavyweight route of
>> creating a mutex and blocking. This is a huge win in a heavily
>> contended environment where locks are held very briefly.
> 
> It's the translation of PAUSE to Yield() or sched_yield() or
> equivalent that's the issue. The scheduler has been given no reason
> to make the switch; a yield says "If there's someone with a higher
> priority, 

... higher or same ...

> run them as I'm willing to wait. Otherwise run me." You're not
> saying "I'm waiting for a lock, so run the guy that holds it."

Usually you don't know who holds it, and it would be effortful to find
out, so you just spin for a short while.

> For several tasks of equal or lower priority to the lock getter,
> nothing happens and the yield simply returns immediately, so the
> likelihood of getting the lock after a spin then yield is (a) on a
> single core, exactly zero (b) on a multicore, it depends on whether
> the lock holding thread has released it in the time it takes to do
> the yield.

Right, and often it will have done.  If you have a lot of threads
contending for locks held for a long time, you might as well use
heavyweight locks and block.  That's not the case I'm talking about,
which is fairly high contention but locks held for a short time.
I'm talking about a bunch of worker tasks of the same priority.

> It may not be running on any other core either, so either way it's
> close to zero as well.

Why is it close to zero?  Not IME.

> Windows & POSIX programmers seem to be advised to use sleep for a very
> small interval (not zero which appears to be equivalent to a yield) to
> kick the scheduler into action, as sleeping indicates you are willing
> to give up your timeslice. Now you're saying "Do something else, I
> need a rest. Get back to me later."
> 
> Yield (or PAUSE) won't work. You've got to sleep.

Not necessarily: even a few laps of an empty loop or just retrying the
lock may well be enough.

Andrew.



#20639

FromPaul Rubin <no.email@nospam.invalid>
Date2013-03-13 08:18 -0700
Message-ID<7xoben1g7l.fsf@ruckus.brouhaha.com>
In reply to#20638
Andrew Haley <andrew29@littlepinkcloud.invalid> writes:
> Not necessarily: even a few laps of an empty loop or just retrying the
> lock may well be enough.

I wonder if spinning is still reasonable with current cpu's, where
memory accesses (to retry the lock) cost 100's of cpu cycles, and the
energy dissipation from spinning is potentially costly in its own right.
Is there any hardware assist for this type of thing?



#20640

FromAndrew Haley <andrew29@littlepinkcloud.invalid>
Date2013-03-13 10:39 -0500
Message-ID<_fWdnYZr1Js3AN3MnZ2dnUVZ_rqdnZ2d@supernews.com>
In reply to#20639
Paul Rubin <no.email@nospam.invalid> wrote:
> Andrew Haley <andrew29@littlepinkcloud.invalid> writes:
>> Not necessarily: even a few laps of an empty loop or just retrying the
>> lock may well be enough.
> 
> I wonder if spinning is still reasonable with current cpu's, where
> memory accesses (to retry the lock) cost 100's of cpu cycles, and
> the energy dissipation from spinning is potentially costly in its
> own right.

It is.  We have snoopy caches, and you won't retry many times before
blocking.  Even on a fast box, blocking for a futex takes at least a
microsecond whereas a read from L3 is tens of nanoseconds.

> Is there any hardware assist for this type of thing?

Yes, there is.  Intel has the PAUSE instruction, which allows another
thread to use the processor.  It's recommended for just this purpose.

Andrew.



#20646

FromAlex McDonald <blog@rivadpm.com>
Date2013-03-13 16:36 -0700
Message-ID<1ebc328a-190a-4207-862a-0b1175e15c68@f6g2000yqm.googlegroups.com>
In reply to#20640
On Mar 13, 3:39 pm, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:
> Paul Rubin <no.em...@nospam.invalid> wrote:
> > Andrew Haley <andre...@littlepinkcloud.invalid> writes:
> >> Not necessarily: even a few laps of an empty loop or just retrying the
> >> lock may well be enough.
>
> > I wonder if spinning is still reasonable with current cpu's, where
> > memory accesses (to retry the lock) cost 100's of cpu cycles, and
> > the energy dissipation from spinning is potentially costly in its
> > own right.
>
> It is.  We have snoopy caches, and you won't retry many times before
> blocking.  Even on a fast box, blocking for a futex takes at least a
> microsecond whereas a read from L3 is tens of nanoseconds.
>
> > Is there any hardware assist for this type of thing?
>
> Yes, there is.  Intel has the PAUSE instruction, which allows another
> thread to use the processor.  It's recommended for just this purpose.

It does? I thought it was to reduce certain side effects on Xeon and
slow the spin down on later processors, hence reducing power burn; no
thread switching.

http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_sse2_pause.htm

>
> Andrew.



#20657

FromAndrew Haley <andrew29@littlepinkcloud.invalid>
Date2013-03-14 03:44 -0500
Message-ID<ZNOdnQ295_xlENzMnZ2dnUVZ_rKdnZ2d@supernews.com>
In reply to#20646
Alex McDonald <blog@rivadpm.com> wrote:
> On Mar 13, 3:39 pm, Andrew Haley <andre...@littlepinkcloud.invalid>
> wrote:
>> Paul Rubin <no.em...@nospam.invalid> wrote:
>> > Andrew Haley <andre...@littlepinkcloud.invalid> writes:
>> >> Not necessarily: even a few laps of an empty loop or just retrying the
>> >> lock may well be enough.
>>
>> > I wonder if spinning is still reasonable with current cpu's, where
>> > memory accesses (to retry the lock) cost 100's of cpu cycles, and
>> > the energy dissipation from spinning is potentially costly in its
>> > own right.
>>
>> It is.  We have snoopy caches, and you won't retry many times before
>> blocking.  Even on a fast box, blocking for a futex takes at least a
>> microsecond whereas a read from L3 is tens of nanoseconds.
>>
>> > Is there any hardware assist for this type of thing?
>>
>> Yes, there is.  Intel has the PAUSE instruction, which allows another
>> thread to use the processor.  It's recommended for just this purpose.
> 
> It does? I thought it was to reduce certain side effects on Xeon and
> slow the spin down on later processors, hence reducing power burn; no
> thread switching.

Hmmm.  The documentation isn't so great.  PAUSE is certainly
recommended in spin-wait loops and it causes a small delay.  I confess
that I had assumed that the other thread in a hyperthreaded core would
run, but this may well not be the case.

http://software.intel.com/en-us/articles/long-duration-spin-wait-loops-on-hyper-threading-technology-enabled-intel-processors
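
For illustration, a bounded spin-wait loop with the PAUSE hint might look like this in C. The helper names cpu_relax and spin_until_set are invented for the example, and it assumes GCC or Clang, whose __builtin_ia32_pause() emits the instruction on x86; on other architectures the helper does nothing. The sketch only shows where the hint goes and takes no position on what it does to the sibling hyperthread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Emit the x86 PAUSE hint where available; elsewhere do nothing. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();   /* GCC/Clang builtin for PAUSE */
#endif
}

/* Spin until *flag is set or max_spins iterations have passed.
   Returns true if the flag was seen set. */
static bool spin_until_set(atomic_bool *flag, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire))
            return true;
        cpu_relax();          /* small delay between reads of the flag */
    }
    return atomic_load_explicit(flag, memory_order_acquire);
}
```

The bound on the loop matters: once it is exhausted the caller is expected to fall back to something heavier, such as blocking.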

Andrew.



#20660

Fromanton@mips.complang.tuwien.ac.at (Anton Ertl)
Date2013-03-14 09:06 +0000
Message-ID<2013Mar14.100629@mips.complang.tuwien.ac.at>
In reply to#20657
Andrew Haley <andrew29@littlepinkcloud.invalid> writes:
[PAUSE on Intel CPUs]
>Hmmm.  The documentation isn't so great.  PAUSE is certainly
>recommended in spin-wait loops and it causes a small delay.  I confess
>that I had assumed that the other thread in a hyperthreaded core would
>run, but this may well not be the case.

In SMT (simultaneous multi-threading) all the threads that are on a
CPU are running.  This comes out clearer in the technical term _S_MT
than in the marketing name "hyperthreading".  One might have
instructions for changing the priority of resource allocation, but I
don't think that any SMT processor implements such priorities.

Concerning what PAUSE really does,
<http://www.postgresql.org/message-id/3FECD103.5040105@colorfullife.com>
looks plausible.  It says that (on the Pentium 4) it just causes a
delay; the main purpose apparently is to avoid some Pentium-4-specific
performance penalty, but as a side-effect it also reduces power
consumption.

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: http://www.forth200x.org/forth200x.html
   EuroForth 2013: http://www.euroforth.org/ef13/



#20662

FromAndrew Haley <andrew29@littlepinkcloud.invalid>
Date2013-03-14 06:26 -0500
Message-ID<OqCdnQbDBsxyLtzMnZ2dnUVZ_sCdnZ2d@supernews.com>
In reply to#20660
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andrew29@littlepinkcloud.invalid> writes:
> [PAUSE on Intel CPUs]
>>Hmmm.  The documentation isn't so great.  PAUSE is certainly
>>recommended in spin-wait loops and it causes a small delay.  I confess
>>that I had assumed that the other thread in a hyperthreaded core would
>>run, but this may well not be the case.
> 
> In SMT (simultaneous multi-threading) all the threads that are on a
> CPU are running. 

Of course.

> This comes out clearer in the technical term _S_MT than in the
> marketing name "hyperthreading".  One might have instructions for
> changing the priority of resource allocation, but I don't think that
> any SMT processor implements such priorities.

Certainly not, no.  Priorities are the domain of scheduler software.

> Concerning what PAUSE really does,
> <http://www.postgresql.org/message-id/3FECD103.5040105@colorfullife.com>
> looks plausible.  It says that (on the Pentium 4) it just causes a
> delay; the main purpose apparently is to avoid some Pentium-4-specific
> performance penalty, but as a side-effect it also reduces power
> consumption.

That seems reasonable.  However, I doubt that PAUSE delays the other
thread running in the same core, and thus it allows the other thread
full access to the core's execution units rather than pointlessly
spinning, which was my point.  If it really does stall both active
threads, then I'm wrong.

Andrew.



#20665

Fromanton@mips.complang.tuwien.ac.at (Anton Ertl)
Date2013-03-14 15:41 +0000
Message-ID<2013Mar14.164137@mips.complang.tuwien.ac.at>
In reply to#20662
Andrew Haley <andrew29@littlepinkcloud.invalid> writes:
>Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> This comes out clearer in the technical term _S_MT than in the
>> marketing name "hyperthreading".  One might have instructions for
>> changing the priority of resource allocation, but I don't think that
>> any SMT processor implements such priorities.
>
>Certainly not, no.

I would not put it in the realm of the impossible, it just has not
been done AFAIK.  E.g., one thread could get all the resources it
asks for, and the others would get the remainder.

>> Concerning what PAUSE really does,
>> <http://www.postgresql.org/message-id/3FECD103.5040105@colorfullife.com>
>> looks plausible.  It says that (on the Pentium 4) it just causes a
>> delay; the main purpose apparently is to avoid some Pentium-4-specific
>> performance penalty, but as a side-effect it also reduces power
>> consumption.
>
>That seems reasonable.  However, I doubt that PAUSE delays the other
>thread running in the same core, and thus it allows the other thread
>full access to the core's execution units

Yes, that's how I understand it.  

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: http://www.forth200x.org/forth200x.html
   EuroForth 2013: http://www.euroforth.org/ef13/



#20668

FromBernd Paysan <bernd.paysan@gmx.de>
Date2013-03-14 17:56 +0100
Message-ID<khsvfj$e1c$1@online.de>
In reply to#20639
Paul Rubin wrote:

> Andrew Haley <andrew29@littlepinkcloud.invalid> writes:
>> Not necessarily: even a few laps of an empty loop or just retrying the
>> lock may well be enough.
> 
> I wonder if spinning is still reasonable with current cpu's, where
> memory accesses (to retry the lock) cost 100's of cpu cycles, and the
> energy dissipation from spinning is potentially costly in its own right.
> Is there any hardware assist for this type of thing?

Transactional memory should give us the necessary assists.  Transactional 
memory works in two parts: one observes others stealing the cache lines you 
read within the transaction, the other commits the stores you made as atomic 
writes.  Fortunately, most of the hardware you need for this has been 
available for literally decades...

So you have a few cache lines to observe, and a few cache lines to keep as 
uncommitted copies.  The code looks like

begin transaction
  do some reads
  do some computations
  do some shadow writes
atomic commit

If you lost some reads or someone messed with the cache lines you write 
into, the atomic commit will fail and jump back to the begin.

Now, if you are waiting for a lock, you might want some "pause", i.e.

begin transaction
  read a lock
  check if it is free
  if not: pause and repeat
  do a shadow write to acquire the lock
atomic commit

PAUSE here would mean "wait until one of the observed cache lines is 
modified".  My PAUSE would have a timeout, because when the timeout expires, 
you should rethink your spin loop strategy, and do a full block.
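
Without transactional hardware, the shape of that loop can be approximated with a compare-and-swap, which stands in here for the "atomic commit". This is a sketch of the control flow only, not of real transactional memory: the name acquire_with_timeout is invented, the lock is assumed to be an integer with 0 = free and 1 = held, and the timeout is a simple retry count rather than a wait on cache-line modification.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try to take *lock (0 = free, 1 = held).
   Gives up after max_tries attempts and returns false, at which point
   the caller should rethink its strategy and do a full block. */
static bool acquire_with_timeout(atomic_int *lock, unsigned max_tries)
{
    for (unsigned i = 0; i < max_tries; i++) {
        /* "read a lock" */
        int seen = atomic_load_explicit(lock, memory_order_relaxed);

        /* "check if it is free; if not: pause and repeat" */
        if (seen != 0)
            continue;

        /* "atomic commit": the write succeeds only if nobody changed
           the lock since we read it; otherwise go round again. */
        if (atomic_compare_exchange_strong_explicit(
                lock, &seen, 1,
                memory_order_acquire, memory_order_relaxed))
            return true;
    }
    return false;   /* timeout expired */
}
```

Releasing the lock would be a plain atomic store of 0 with release ordering.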

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/


