Re: forcing the compiler to reload from memory with c++0x

From	frege <gottlobfrege@gmail.com>
Newsgroups	comp.programming.threads
Subject	Re: forcing the compiler to reload from memory with c++0x
Date	2011-01-27 21:34 -0800
Organization	http://groups.google.com
Message-ID	<cd471590-cf8d-4830-b6fb-38aa2b7d4f30@h16g2000yqh.googlegroups.com>
References	(1 earlier) <1d21ad0a-db55-460e-aeb6-66f6d19369ea@i13g2000yqe.googlegroups.com> <4d3e2e3f$0$21517$426a34cc@news.free.fr> <4D40A4C7.16E6F9AE@web.de> <69d82db7-34be-4de7-86c4-2f39cc1e6df3@m13g2000yqb.googlegroups.com> <aebb655b-6821-4c19-a170-71633273366c@k32g2000vbn.googlegroups.com>

On Jan 27, 1:48 am, Dmitriy Vyukov <dvyu...@gmail.com> wrote:
> On Jan 27, 7:43 am, frege <gottlobfr...@gmail.com> wrote:
>
>
>
> > Either thread may see "stale" values, but never anything out of order.
> > The interesting thing is that, according to the language and memory
> > model, there is no real limit on how long a stale value can last.  I
> > worry about optimizing compilers that start optimizing based on the
> > new memory model - ie since no memory_order constraints were
> > specified, can the compiler decide to optimize away memory reads?  For
> > how long?  What does "memory" really mean...
>
> Visibility and ordering are different things.

Agreed.  They tend to be related, but agreed.

> A Compiler can do that
> with or without memory fences.

I assume by "that" you mean "optimize away memory reads".  I suppose
so.  But I worry that with the new, better-defined memory model the
compiler can technically do *more* optimizing than it could previously
get away with, i.e. now that the model is (hopefully) well defined, the
compiler has a better argument for why it did the optimization.
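
To make "optimize away memory reads" concrete, here is a minimal sketch
of the kind of loop I have in mind.  The names (poll_plain,
poll_volatile, max_spins) are made up for illustration, and the bounded
spin count is only there so that both functions always return.

```cpp
#include <cassert>

// Hypothetical illustration, not code from the paper or from this thread.

// Polls a plain int. Nothing in the loop writes *flag, so the compiler is
// allowed to load it once, keep the value in a register, and reuse it.
bool poll_plain(const int* flag, int max_spins) {
    assert(max_spins >= 0);
    for (int i = 0; i < max_spins; ++i) {
        if (*flag != 0) return true;   // may be answered from a cached copy
    }
    return false;
}

// Polls a volatile int. Each evaluation of *flag is a separate volatile
// access, so the compiler has to emit a load on every iteration.
bool poll_volatile(const volatile int* flag, int max_spins) {
    assert(max_spins >= 0);
    for (int i = 0; i < max_spins; ++i) {
        if (*flag != 0) return true;   // re-read from memory every time
    }
    return false;
}
```

Called from a single thread the two behave identically.  The difference
only shows up when some other thread is writing the flag, and even then
volatile only keeps the load in the loop; it says nothing about ordering.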

It is a bit of a vague idea at this point.  I just wonder/fear that we
are opening a door without realizing it yet.  I'm not even sure what
those optimizations would be.  Just something general like "I see this
memory is accessed by this thread and that thread, but it is being
done without synchronization of any kind, so I am free to *never* (or
nearly never) actually write the memory from this thread"...

Maybe I am being paranoid.

The example further below might help...

> Fences are for ordering, not for
> visibility. However, any sane compiler won't do that equally with or
> without memory fences.
>

You are probably right.

> By the way, those are not "stale" values. Since there is no way to
> gather any information about other threads other than based on values
> being read, what you read is always "current".
> Why you call the value "stale"? Do you observe any newer value? No? So
> why don't you just call what is the most fresh value "current"?
>

Sorry.  I knew "stale" was a bad choice of words when I typed it.
I've complained when others have used it with threading and
visibility, so I should have known better.  I was just being lazy.  I
guess what I meant was that technically the other thread may have
already issued the writes to its write request queue, so that it is
"done" from that thread's perspective.  But you are right, a CPU can
only see what it can read, so what is read is as current as anything.
I usually describe this as "you can't take the lid off the computer
and look at the memory, you can only see what the CPU and code tell
you they see" or something like that.

Again, see the example below...

> > Anyhow, I would say that this is one of those rare cases where
> > volatile is the right answer - you just need to adds lots of code
> > comments as to why.  (Particularly since volatile is often mis-used in
> > threading, and conversely it is typically stated as "useless" by those
> > knowledgeable about threading - and I would agree with that 99% of the
> > time.  This just isn't one of those times.)
>
> So, do you mean that you are going to use volatile and you will be in
> a lucky 1% of people whose programs will work, and zillions of other
> programs will be broken soon?

I'm saying that volatile is all that is needed in this situation,
possibly depending on what the proper definition of volatile (and
memory) is.  In particular, fences and memory ordering are NOT
needed, and that is rare in threading.  Typically volatile isn't
very useful with threads, and novices think it is.  This is a
counterexample.

So, the example.
What I think was originally being described (and I have now read the
paper!) was a queue that essentially looks like this:

[....0000XXXXXX0000.....]

i.e. a circular buffer, say an int buffer, where empty slots are marked
as 0 and non-empty slots hold any non-zero value, shown as X.  So the
head is at the first X and the tail is at the 0 after the last X.

Push() obviously writes a new X over the 0 at tail.
Pop() reads X at head and then replaces it with a 0.

So the question is: where are the fences needed?  I (and the paper
and its references) think none are needed.
In particular, empty/full is checked by looking at the X/0 state of
the buffer, not by comparing head to tail, so we don't need visibility
or synchronization on head and tail reads/writes.

Yet, like the original question, we do need to prevent the compiler
from completely caching things into registers.  It appears we need
volatile - no more, no less - strangely enough.
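
A rough sketch of that queue, as I read the description, follows.  This
is not the paper's code.  The assumptions are mine: one producer thread,
one consumer thread, an int payload with 0 reserved as the "empty"
marker, and hardware where aligned int loads and stores do not tear.
SlotQueue, push and pop are illustrative names.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the slot-marked ring buffer described above (hypothetical names).
// Only the slots are shared and volatile; head_ and tail_ are each private
// to one thread, so they are plain variables.
template <std::size_t N>
class SlotQueue {
public:
    SlotQueue() : head_(0), tail_(0) {
        for (std::size_t i = 0; i < N; ++i) slots_[i] = 0;  // all slots empty
    }

    // Producer only. Returns false if the slot at tail is still occupied.
    bool push(int value) {
        assert(value != 0);                    // 0 is reserved for "empty"
        if (slots_[tail_] != 0) return false;  // full: consumer has not cleared it
        slots_[tail_] = value;                 // write X over the 0
        tail_ = (tail_ + 1) % N;
        return true;
    }

    // Consumer only. Returns false if the slot at head is empty.
    bool pop(int& out) {
        int value = slots_[head_];             // volatile read, not cached
        if (value == 0) return false;          // empty: nothing written yet
        out = value;
        slots_[head_] = 0;                     // hand the slot back to the producer
        head_ = (head_ + 1) % N;
        return true;
    }

private:
    volatile int slots_[N];  // shared between the two threads
    std::size_t head_;       // touched only by the consumer
    std::size_t tail_;       // touched only by the producer
};
```

Note that neither push() nor pop() ever looks at the other side's index.
A "full" or "empty" answer may lag behind what the other thread has
already done, but that only delays the caller; it never hands out a slot
twice.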

So I thought Alexander was basically agreeing with that (always hard
to tell with Alexander).  And I chimed in since it was something I had
been thinking about recently (I didn't know this type of queue had a
name or names associated with it - I assume someone had thought about
it before me, I just hadn't read any literature on it).

And back to my original thoughts - looking at this example, does it
shed any light on compiler optimizations, threading, visibility,
etc.?  In particular, how long can the reader see 0 "after" the writer
has issued its write of X?  Without a memory_order_... could it never
see X?  More precisely - has anything changed in this regard, compared
to before C++ had a memory model?  I think you are saying no, but I'm
wondering if old compilers were working based on general "unwritten"
rules (particularly since threading wasn't covered by the C++ rules),
but now that the rules are clear, will they eventually stretch the
rules farther than they previously had?
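
For comparison, the same question can be posed with C++0x atomics and
memory_order_relaxed, which is the closest spelling I know of to "atomic
access, but no ordering constraints at all".  This is only a sketch for
a single slot; publish() and try_take() are hypothetical names, and I am
not claiming this is what the paper does.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical single-slot version of the question above: relaxed atomics,
// no fences, no ordering constraints.

// Writer side: store X into an empty slot.
inline void publish(std::atomic<int>& slot, int value) {
    assert(value != 0);                            // 0 is the "empty" marker
    slot.store(value, std::memory_order_relaxed);
}

// Reader side: one polling attempt. Returns 0 if the slot still reads as
// empty; otherwise returns the value and marks the slot empty again.
inline int try_take(std::atomic<int>& slot) {
    int value = slot.load(std::memory_order_relaxed);
    if (value != 0) {
        slot.store(0, std::memory_order_relaxed);  // hand the slot back
    }
    return value;
}
```

The reader would call try_take() in a loop.  How many of those calls may
return 0 after publish() has returned on the other thread is exactly the
open question.  As far as I can tell, the draft only says such a store
should become visible within a reasonable amount of time, without giving
a bound.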

>
> --
> Dmitriy V'jukov

Tony
