Re: Refreshing cpu cache before atomic relaxed loads

Message-ID	<c7f2c29a-3cdb-409a-bf5b-9fb30efc90cf@googlegroups.com>
Newsgroups	comp.std.c++
From	itaj sherman <itajsherman@googlemail.com>
Subject	Re: Refreshing cpu cache before atomic relaxed loads
Organization	unknown
References	<f2af6f2a-c936-41c5-9461-9b0d89299993@googlegroups.com>
Date	2015-06-26 01:29 -0600


On Monday, June 22, 2015 at 10:30:04 PM UTC+3, itaj sherman wrote:
> My question turned up while implementing spin-wait
> when a thread needs to wait for another thread to complete a short work,
> so short that locking/releasing a mutex might take longer than the work.

I should add an important precondition: the waiting thread knows that the
other thread is actually doing that work right now, and that the work takes
less time than a thread context switch (or mutex/condvar operations).

> The problem the way I see it is the standard does not explicitly define a
> clear way to refresh the cpu cache before re-loading an atomic variable.
>
> In my example code below, atomic_oneway_flag::spin_wait_flag has 3
> suggested implementations. They are equivalent w.r.t memory ordering as
> seen by user code. But they might not be equal in speed.
>
> Specifically, it seems that practically (correct me if I'm wrong),
> implementation-2 of atomic_oneway_flag::spin_wait_flag below
> is faster/better than implementation-1.
> I.e. load( ..., memory_order_acquire ) or fence( memory_order_acquire )
>
> Now, is it somehow implied by the standard that an acquire operation might
> refresh the following loads faster than a relaxed operation?
> I cannot see that it is.
> Thus, I would expect to do something like implementation-3 below using an
> operation that explicitly and specifically refreshes the cache for the
> next relaxed load.
>
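
To make the two polling styles mentioned in the quote concrete, here is a minimal sketch of a wait loop that uses an acquire load on every poll, next to one that polls relaxed and issues a single acquire fence at the end. The class name, the member m and the function names are made up for illustration, and the "variant A/B" labels are mine; they are not the numbering used in the original post.

```cpp
#include <atomic>

// Hypothetical stand-in for the flag class discussed in this thread.
class oneway_flag_sketch {
    std::atomic<bool> m{false};

public:
    // Publisher side: the release store pairs with either wait below.
    void set() { m.store(true, std::memory_order_release); }

    // Variant A: every poll is an acquire load.
    void spin_wait_acquire_load() {
        while (!m.load(std::memory_order_acquire)) {
            // keep polling
        }
    }

    // Variant B: relaxed polls, then one acquire fence once the flag is seen.
    void spin_wait_relaxed_then_fence() {
        while (!m.load(std::memory_order_relaxed)) {
            // keep polling
        }
        std::atomic_thread_fence(std::memory_order_acquire);
    }
};
```

As far as the memory model is concerned, user code gets the same ordering guarantee after either wait returns. Which one is faster is exactly the part the standard does not speak to.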

It seems that what I'm asking about is something like the x86 "pause"
instruction, as explained here:
http://www.quora.com/What-is-the-purpose-of-the-pause-instruction-in-the-x86-ISA

I've seen that this instruction is used in the implementation of
boost::atomics::detail::pause().

So is there anything like that in the standard?
Was it ever discussed?


> ....
>
>   public: void spin_wait_flag() //implementation 3
>   {
>     while( true ) {
>       bool x( std::atomic::load( m, memory_order_relaxed ) );
>       if( x ) {
>         std::atomic::fence( memory_order_acquire );
>         return;
>       } else {
>         /* Some code that refreshes the cache for the */
>         /* following relaxed load. */
>         /* Supposedly std::atomic::load_memory_barrier(); */
>       }
>     }
>   }
>

To complete implementation 3 with that call:

  public: void spin_wait_flag() //implementation 3
  {
    // m is assumed to be a std::atomic<bool> member.
    while( true ) {
      bool x( m.load( std::memory_order_relaxed ) );
      if( x ) {
        std::atomic_thread_fence( std::memory_order_acquire );
        return;
      } else {
        /* Some code that refreshes the cache for the */
        /* following relaxed load. */
        boost::atomics::detail::pause();
      }
    }
  }

//code from boost_1_58_0\boost\atomic\detail\pause.hpp(30)

BOOST_FORCEINLINE void pause() BOOST_NOEXCEPT
{
#if defined(_MSC_VER) && (defined(_M_AMD64) || defined(_M_IX86))
    _mm_pause();
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__("pause;");
#endif
}
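
For anyone who wants to try this without Boost, below is a self-contained sketch of the same idea. The names spin_pause_sketch, oneway_flag_demo and handoff_demo are hypothetical and only serve the illustration. The helper copies the preprocessor conditions of the Boost snippet above and compiles to nothing on other targets. Whether the hint actually speeds up the next relaxed load depends on the hardware; the standard gives no such guarantee.

```cpp
#include <atomic>
#include <thread>
#if defined(_MSC_VER) && (defined(_M_AMD64) || defined(_M_IX86))
#include <intrin.h>  // declares _mm_pause on MSVC
#endif

// Hypothetical helper modelled on the Boost snippet above; a no-op elsewhere.
inline void spin_pause_sketch() noexcept {
#if defined(_MSC_VER) && (defined(_M_AMD64) || defined(_M_IX86))
    _mm_pause();
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__("pause");
#endif
}

// Hypothetical flag class: release store on one side, relaxed polling plus
// a single acquire fence on the other.
class oneway_flag_demo {
    std::atomic<bool> m{false};

public:
    void set_flag() { m.store(true, std::memory_order_release); }

    void spin_wait_flag() {
        while (!m.load(std::memory_order_relaxed)) {
            spin_pause_sketch();  // spin-wait hint between polls
        }
        // Pairs with the release store in set_flag().
        std::atomic_thread_fence(std::memory_order_acquire);
    }
};

// The worker writes a plain int and then sets the flag; the caller spins
// until the flag is visible and only then reads the int.
inline int handoff_demo() {
    int payload = 0;
    oneway_flag_demo flag;
    std::thread worker([&] {
        payload = 42;
        flag.set_flag();
    });
    flag.spin_wait_flag();
    int seen = payload;  // ordered after the worker's write via release + acquire fence
    worker.join();
    return seen;
}
```

Calling handoff_demo() returns 42: the relaxed load that observes the release store, followed by the acquire fence, is enough to make the worker's earlier plain write visible. The pause hint plays no part in that correctness argument; it is purely a performance hint.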

> ...
>
> regards,
> itaj
>


regards,
itaj


--
[ comp.std.c++ is moderated.  To submit articles, try posting with your ]
[ newsreader.  If that fails, use mailto:std-cpp-submit@vandevoorde.com ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]

