
Re: threading

Started by	Ben Finney <ben+python@benfinney.id.au>
First post	2014-04-07 13:05 +1000
Last post	2014-04-08 15:19 +0000
Articles	20 on this page of 105 — 22 participants


This discussion began before the indexed window, so earlier articles aren't shown. The article labeled "Started by" above is the oldest one visible, not the original post.

  Re: threading Ben Finney <ben+python@benfinney.id.au> - 2014-04-07 13:05 +1000
    Re: threading Roy Smith <roy@panix.com> - 2014-04-06 23:48 -0400
      Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-07 13:56 +1000
        Re: threading Roy Smith <roy@panix.com> - 2014-04-07 08:26 -0400
          Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-07 22:34 +1000
            Re: threading Roy Smith <roy@panix.com> - 2014-04-07 09:22 -0400
              Re: threading Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-04-07 14:41 +0100
              Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-07 16:49 +0300
                Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-08 00:27 +1000
                  Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-07 17:51 +0300
                    Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-08 01:12 +1000
              Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-08 00:24 +1000
        Re: threading Rick Johnson <rantingrickjohnson@gmail.com> - 2014-04-08 18:09 -0700
          Re: threading "Neil D. Cerutti" <neilc@norwich.edu> - 2014-04-09 09:50 -0400
            Re: threading Rick Johnson <rantingrickjohnson@gmail.com> - 2014-04-09 08:51 -0700
              Re: threading MRAB <python@mrabarnett.plus.com> - 2014-04-09 18:47 +0100
                Re: threading Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-04-10 11:35 +1200
                  Re: threading Roy Smith <roy@panix.com> - 2014-04-09 19:53 -0400
                    Re: threading Andrew Berg <robotsondrugs@gmail.com> - 2014-04-09 19:02 -0500
                    Re: threading Steven D'Aprano <steve@pearwood.info> - 2014-04-10 02:43 +0000
                      Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-10 13:08 +1000
                    Re: threading Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-04-10 09:23 +0100
                    Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-10 19:11 +1000
              Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-10 04:00 +1000
              Re: threading Steven D'Aprano <steve@pearwood.info> - 2014-04-10 03:44 +0000
                Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-10 13:54 +1000
      Re: threading Ben Finney <ben+python@benfinney.id.au> - 2014-04-07 15:22 +1000
      Re: threading Ethan Furman <ethan@stoneleaf.us> - 2014-04-08 11:09 -0700
      Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-08 21:41 +0200
        Re: threading Grant Edwards <invalid@invalid.invalid> - 2014-04-08 20:30 +0000
          Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-09 00:32 +0200
            Re: threading Rustom Mody <rustompmody@gmail.com> - 2014-04-08 19:17 -0700
    Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-07 08:10 +0300
      Re: threading Paul Rubin <no.email@nospam.invalid> - 2014-04-06 22:39 -0700
        Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-07 08:46 +0300
        Re: threading Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-04-07 19:47 -0400
          Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-08 08:19 +0300
            Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-08 10:47 +0000
              Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-08 15:10 +0300
                Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-08 16:37 +0000
                  Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-08 20:17 +0300
              Re: threading Roy Smith <roy@panix.com> - 2014-04-08 09:19 -0400
                Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-08 15:44 +0000
                  Re: threading Paul Rubin <no.email@nospam.invalid> - 2014-04-08 09:38 -0700
                    Re: threading Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-04-09 14:42 +0100
            Re: threading "Frank Millman" <frank@chagford.com> - 2014-04-09 15:23 +0200
              Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-09 16:55 +0300
                Re: threading "Frank Millman" <frank@chagford.com> - 2014-04-09 16:46 +0200
                  Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-09 20:31 +0300
                    Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-10 03:52 +1000
                      Re: threading Mark H Harris <harrismh777@gmail.com> - 2014-04-10 08:29 -0500
                    Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-09 19:20 +0000
            Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-09 23:47 +1000
              Re: threading Roy Smith <roy@panix.com> - 2014-04-09 10:44 -0400
            Re: threading "Frank Millman" <frank@chagford.com> - 2014-04-09 16:30 +0200
              Re: threading Roy Smith <roy@panix.com> - 2014-04-09 10:52 -0400
                Re: threading Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-04-10 11:19 +1200
              Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-09 19:48 +0300
            Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-10 00:44 +1000
            Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-09 15:29 +0000
            Re: threading Terry Reedy <tjreedy@udel.edu> - 2014-04-09 12:14 -0400
            Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-10 02:25 +1000
            Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-09 16:32 +0000
            Re: threading Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-04-09 19:44 -0400
            Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-10 11:05 +1000
            Re: threading "Frank Millman" <frank@chagford.com> - 2014-04-10 11:17 +0200
            Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-10 19:40 +1000
            Re: threading "Frank Millman" <frank@chagford.com> - 2014-04-10 13:10 +0200
              Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-10 14:43 +0300
                Re: threading Roy Smith <roy@panix.com> - 2014-04-10 08:56 -0400
                Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-10 15:24 +0000
                  Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-10 19:20 +0300
                Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-11 01:32 +1000
                  Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-10 19:25 +0300
                    Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-11 03:08 +1000
                      Re: threading Rustom Mody <rustompmody@gmail.com> - 2014-04-10 11:14 -0700
                        Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-10 22:44 +0300
                          Re: threading Rustom Mody <rustompmody@gmail.com> - 2014-04-10 13:21 -0700
                            Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-10 23:44 +0300
                              Re: threading Rustom Mody <rustompmody@gmail.com> - 2014-04-10 22:15 -0700
                                Re: threading Rustom Mody <rustompmody@gmail.com> - 2014-04-10 23:50 -0700
                                  Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-11 18:36 +0300
                                    Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-12 01:53 +1000
                                    Re: threading Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-04-11 16:58 +0100
                                    Re: threading Rustom Mody <rustompmody@gmail.com> - 2014-04-11 11:54 -0700
                                      Re: threading Marko Rauhamaa <marko@pacujo.net> - 2014-04-11 22:27 +0300
                          Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-11 01:51 +0200
                            Re: threading Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-04-11 05:35 +0000
                              Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-11 09:26 +0000
                              Re: threading Roy Smith <roy@panix.com> - 2014-04-11 08:36 -0400
                                Re: threading Grant Edwards <invalid@invalid.invalid> - 2014-04-11 16:18 +0000
                          Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-11 02:21 +0200
                          Re: threading Terry Reedy <tjreedy@udel.edu> - 2014-04-10 20:23 -0400
            Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-10 21:19 +1000
        Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-08 02:06 +0000
          Re: threading alister <alister.nospam.ware@ntlworld.com> - 2014-04-08 11:07 +0000
            Re: threading Roy Smith <roy@panix.com> - 2014-04-08 09:13 -0400
              Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-08 23:23 +1000
                Re: threading alister <alister.nospam.ware@ntlworld.com> - 2014-04-08 14:15 +0000
                  Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-08 16:06 +0000
              Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-08 15:40 +0000
                Re: threading Paul Rubin <no.email@nospam.invalid> - 2014-04-08 09:46 -0700
                  Re: threading Chris Angelico <rosuav@gmail.com> - 2014-04-09 02:46 +1000
                  Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-08 17:17 +0000
            Re: threading Sturla Molden <sturla.molden@gmail.com> - 2014-04-08 15:19 +0000


#69962

From	Terry Reedy <tjreedy@udel.edu>
Date	2014-04-09 12:14 -0400
Message-ID	<mailman.9082.1397060085.18130.python-list@python.org>
In reply to	#69832

On 4/9/2014 10:30 AM, Frank Millman wrote:

> In other words, non-blocking implies that everything required to pass off
> the request to a handler and be ready to deal with the next one must already
> be in memory, and it must not rely on communicating with any outside
> resource at all. Is this correct?

Chris said no, I would have said yes, but I think we understand the 
above differently. The important point is that there are two goals.

The first is to avoid having the cpu sitting idle when there is work to 
be done. Switching processes, switching threads within a process, and 
switching tasks within a thread are all aimed at this. (So are compiler 
code rearrangements that aim to keep various parts of a cpu, such as
integer and float arithmetic units, active simultaneously.)

The second, usually, is to keep the system responsive by not letting any 
particular work unit hog the cpu. But note that if work units are made
too small, cpu time is wasted in excessive switching overhead.

A handler should neither waste nor monopolize cpu time. If input data is 
needed for a long computation, the handler should store the data where 
it needs to be for the computation but leave the actual computation to a 
background or idle task that runs when there is nothing else to do.
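Terry's advice can be sketched as a tiny single-threaded scheduler. This is only an illustration under stated assumptions: the names `Scheduler`, `call_soon`, `add_idle` and `summing_task` are hypothetical, short handlers run first from a ready queue, and the long computation is written as a generator that yields after each chunk, so it only advances when no handler is waiting.

```python
from collections import deque


class Scheduler:
    """Toy scheduler (hypothetical): handlers first, long work only when idle."""

    def __init__(self):
        self.ready = deque()  # short callbacks (handlers) waiting to run
        self.idle = deque()   # generators doing long work in small chunks

    def call_soon(self, func, *args):
        self.ready.append((func, args))

    def add_idle(self, gen):
        self.idle.append(gen)

    def step(self):
        """Run one unit of work; return False when nothing is left."""
        if self.ready:
            func, args = self.ready.popleft()
            func(*args)
            return True
        if self.idle:
            gen = self.idle.popleft()
            try:
                next(gen)              # advance the computation by one chunk
                self.idle.append(gen)  # round-robin between idle tasks
            except StopIteration:
                pass                   # computation finished
            return True
        return False

    def run(self):
        while self.step():
            pass


def summing_task(data, results, chunk=1000):
    """A 'long' computation split into chunks, yielding between them."""
    total = 0
    for start in range(0, len(data), chunk):
        total += sum(data[start:start + chunk])
        yield
    results.append(total)


sched = Scheduler()
results = []
log = []


def handle_request(payload):
    # The handler only stores the data and leaves the heavy work for idle time.
    log.append("stored")
    sched.add_idle(summing_task(payload, results))


sched.call_soon(handle_request, list(range(10_000)))
sched.call_soon(log.append, "second request handled")
sched.run()
# log == ["stored", "second request handled"], results == [49995000]
```

The second request is handled before any of the summing happens. The `chunk` size is the knob for Terry's second point: smaller chunks keep handlers more responsive, but each extra switch costs overhead.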

-- 
Terry Jan Reedy


#69964

From	Chris Angelico <rosuav@gmail.com>
Date	2014-04-10 02:25 +1000
Message-ID	<mailman.9084.1397060752.18130.python-list@python.org>
In reply to	#69832

On Thu, Apr 10, 2014 at 2:14 AM, Terry Reedy <tjreedy@udel.edu> wrote:
> On 4/9/2014 10:30 AM, Frank Millman wrote:
>
>> In other words, non-blocking implies that everything required to pass off
>> the request to a handler and be ready to deal with the next one must
>> already
>> be in memory, and it must not rely on communicating with any outside
>> resource at all. Is this correct?
>
>
> Chris said no, I would have said yes, but I think we understand the above
> differently.

I said no because I see asynchronous I/O as a perfectly viable
structure for a program, which means that a non-blocking handler is
allowed to communicate with outside resources. Conversely, if you see
"a non-blocking handler" as meaning the one small piece that runs
uninterruptibly, then you might say that yes, it must not rely on any
outside resource.

Of course, it depends on where you're looking. Memory is itself an
outside resource that can potentially take a long time to give a
result - just look at what happens when you dip into swap space, and
RAM accesses become disk accesses. But generally, you go asynchronous
in order to increase your throughput; and if you're churning through
your page file, well, that's going to kill throughput whichever way
you look at it. It's generally safe enough to pretend that RAM can be
accessed in-line, and worry about the slowdowns elsewhere.

ChrisA


#69965

From	Sturla Molden <sturla.molden@gmail.com>
Date	2014-04-09 16:32 +0000
Message-ID	<mailman.9085.1397061189.18130.python-list@python.org>
In reply to	#69832

Sturla Molden <sturla.molden@gmail.com> wrote:

> 3. It is nice to be able to abort a read or write that hangs (for whatever
> reason). Killing a thread with pthread_cancel or TerminateThread is not
> recommended.

While "graceful timeout" is easy to do on Unix, using fcntl.fcntl or
signal.alarm, on Windows it requires overlapped I/O. This means the normal
Python file objects cannot be used for this purpose on Windows.
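The Unix approach can be sketched as below. This assumes a POSIX system; the helper names are hypothetical, and the sketch uses `fcntl.fcntl` to set `O_NONBLOCK` together with a `select.select` timeout rather than `signal.alarm`. A pipe stands in for whatever file or device might hang.

```python
import fcntl
import os
import select


def set_nonblocking(fd):
    """Set O_NONBLOCK on a file descriptor via fcntl (Unix only)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def read_with_timeout(fd, size, timeout):
    """Return up to `size` bytes from fd, or None if nothing arrives in time.

    The fd should be non-blocking, so a spurious wakeup from select()
    raises BlockingIOError instead of hanging in os.read().
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None               # timed out; nothing was read
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None               # readiness was only a hint


r, w = os.pipe()
set_nonblocking(r)
first = read_with_timeout(r, 1024, 0.1)   # None: nothing written yet
os.write(w, b"hello")
second = read_with_timeout(r, 1024, 0.1)  # b"hello"
os.close(r)
os.close(w)
```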

Sturla


#69989

From	Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date	2014-04-09 19:44 -0400
Message-ID	<mailman.9099.1397087086.18130.python-list@python.org>
In reply to	#69832

On Wed, 9 Apr 2014 23:47:04 +1000, Chris Angelico <rosuav@gmail.com>
declaimed the following:

>won't block. You might think "Duh, how can printing to the screen
>block?!?", but if your program's output is being piped into something
>else, it most certainly can :) If that were writing to a remote

	Heck, even if it isn't blocking per se, it may still be enough to slow
down the whole system (over the past year I've had to characterize throughput
on some systems -- and the console logging of "exceptions"* slowed the
overall data rate significantly)


* The unit providers' idea of "exception to be logged" just happened to be
something our intended application considered normal; hence our test data
produced LOTS of "exceptions".
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/


#69996

From	Chris Angelico <rosuav@gmail.com>
Date	2014-04-10 11:05 +1000
Message-ID	<mailman.9103.1397091924.18130.python-list@python.org>
In reply to	#69832

On Thu, Apr 10, 2014 at 9:44 AM, Dennis Lee Bieber
<wlfraed@ix.netcom.com> wrote:
> On Wed, 9 Apr 2014 23:47:04 +1000, Chris Angelico <rosuav@gmail.com>
> declaimed the following:
>
>>won't block. You might think "Duh, how can printing to the screen
>>block?!?", but if your program's output is being piped into something
>>else, it most certainly can :) If that were writing to a remote
>
>         Heck, even if it isn't blocking per se, it may still be enough to slow
> down the whole system (over the past year I've had to characterize throughput
> on some systems -- and the console logging of "exceptions"* slowed the
> overall data rate significantly)

Oh yes, definitely. Console output can be *slow*. Back in my earliest
programming days, I'd often have a program that iterated over sub-jobs
from either 0 or 1 up to some unknown top (so I can't show a
percent-done), and the obvious thing to do is (rewritten in Python):

i = 0
while stuff_to_do():
    i += 1
    print(i, end="\r")
    do_more_stuff()
print(i)

Hmm, that's really slow. I know! I'll speed this up by printing out
only once a second. That should be way faster, right? Let's see.

import time

i = time_printed = 0
while stuff_to_do():
    i += 1
    if int(time.time()) != time_printed:
        print(i, end="\r")
        time_printed = int(time.time())
    do_more_stuff()
print(i)

And that made it... waaaay slower. Turns out clock querying (at least
on those systems) is pretty slow too, even more so than console
output. Of course, what we ended up settling on was something like
this, which *does* make sense:

i = 0
while stuff_to_do():
    i += 1
    if (i & 255) == 0:
        print(i, end="\r")
    do_more_stuff()
print(i)

Replace 255 with any number one less than a power of two to print every
however-many-th count (in this case, every 256th); the test uses a bitwise
AND rather than a division.

But yeah, console output isn't something you want when you're going
for maximum throughput. Heh.
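That last loop can be wrapped into a reusable helper, sketched below. The function name `run_jobs` and its `jobs` argument (an iterable of callables standing in for `stuff_to_do()`/`do_more_stuff()`) are hypothetical; output goes to any file-like object so the progress display can be redirected or captured.

```python
import io
import sys


def run_jobs(jobs, out=None, mask=255):
    """Run each job, printing a running count every (mask + 1) jobs.

    `mask` must be one less than a power of two, so `(i & mask) == 0`
    is true exactly when i is a multiple of mask + 1.
    """
    if (mask & (mask + 1)) != 0:
        raise ValueError("mask must be one less than a power of two")
    out = sys.stdout if out is None else out
    i = 0
    for job in jobs:
        i += 1
        if (i & mask) == 0:
            print(i, end="\r", file=out)  # overwrite the same console line
        job()                             # stand-in for do_more_stuff()
    print(i, file=out)                    # final count on its own line
    return i


buf = io.StringIO()
count = run_jobs([lambda: None] * 1000, out=buf)
# count == 1000, buf.getvalue() == "256\r512\r768\r1000\n"
```

In Python, `i % 256 == 0` is an equivalent test for the default mask; the mask form simply mirrors the loop above.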

ChrisA


#70027

From	"Frank Millman" <frank@chagford.com>
Date	2014-04-10 11:17 +0200
Message-ID	<mailman.9129.1397121485.18130.python-list@python.org>
In reply to	#69832

"Chris Angelico" <rosuav@gmail.com> wrote in message 
news:CAPTjJmq2xx_WG2ymCC0NNqisDO=DNnJhneGPiD3DE+xeiy5hjg@mail.gmail.com...
> On Thu, Apr 10, 2014 at 12:30 AM, Frank Millman <frank@chagford.com> 
> wrote:
>>
>>>
>>>> How does one distinguish between 'blocking' and 'non-blocking'? Is it
>>>> either/or, or is it some arbitrary timeout - if a handler returns within
>>>> that time it is non-blocking, but if it exceeds it it is blocking?
>>>
>>> No; a blocking request is one that waits until it has a response, and
>>> a non-blocking request is one that goes off and does something, and
>>> then comes back to you when it's done.
>>

Thanks for that clarification - I think I've got it now.

>>> def nonblocking_query(id):
>>>    print("Finding out who employee #%d is..."%id)
>>>    def nextstep(res):
>>>        print("Employee #%d is %s."%(id,res[0].name))
>>>    db.asyncquery(nextstep, "select name from emp where id=12345")
>>>
>>
>> In this example, what is 'db.asyncquery'?
>>
>> If you mean that you have a separate thread to handle database queries, and
>> you use a queue or other message-passing mechanism to hand it the query and
>> get the result, then I understand it. If not, can you explain in more
>> detail?
>
> It's an imaginary function that would send a request to the database,
> and then call some callback function when the result arrives. If the
> database connection is via a TCP/IP socket, that could be handled by
> writing the query to the socket, and then when data comes back from
> the socket, looking up the callback and calling it. There's no
> additional thread here.
>
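The imaginary `db.asyncquery` described in the quote can be sketched in Python. This is a hypothetical illustration, not a real database API: the class name, the line-based `id|text` wire format, and the use of `socket.socketpair()` with a fake database on the other end are all assumptions. The query is written to the socket, its callback is remembered under a request id, and when the reply arrives the selector loop looks the callback up and calls it, all in one thread.

```python
import itertools
import selectors
import socket


class AsyncQueryClient:
    """Send queries on one socket; run a callback when each reply arrives.

    Hypothetical wire format: one line per message, "<request id>|<text>\n".
    """

    def __init__(self, sock, sel):
        self.sock = sock
        self.sel = sel
        self.pending = {}              # request id -> callback
        self.ids = itertools.count(1)
        self.inbuf = b""
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, self.on_readable)

    def asyncquery(self, callback, query):
        req_id = next(self.ids)
        self.pending[req_id] = callback
        # Assumption: requests are small enough that the kernel accepts them
        # at once; a real client would buffer partial writes.
        self.sock.sendall(f"{req_id}|{query}\n".encode())

    def on_readable(self):
        try:
            data = self.sock.recv(4096)
        except BlockingIOError:
            return                     # readiness was only a hint
        if not data:                   # the "database" closed the connection
            self.sel.unregister(self.sock)
            return
        self.inbuf += data
        while b"\n" in self.inbuf:     # dispatch every complete reply
            line, self.inbuf = self.inbuf.split(b"\n", 1)
            req_id, _, result = line.decode().partition("|")
            self.pending.pop(int(req_id))(result)


sel = selectors.DefaultSelector()
client_end, db_end = socket.socketpair()
client = AsyncQueryClient(client_end, sel)
answers = []
client.asyncquery(answers.append, "select name from emp where id=12345")

# Stand-in for the database: read the request line, send back a reply.
req_id = db_end.recv(4096).decode().split("|", 1)[0]
db_end.sendall(f"{req_id}|Alice\n".encode())

# One pass of the event loop delivers the reply; no extra thread is involved.
for key, _mask in sel.select(timeout=1):
    key.data()
# answers == ["Alice"]
```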

I need some time to get my head around that, but meanwhile can you resolve 
this stumbling block?

The current version of my program uses HTTP. As I understand it, a client 
makes a connection and submits a request. The server processes the request 
and returns a result. The connection is then closed.

In this scenario, does async apply at all? There is no open connection to 
'select' or 'poll'. You have to ensure that the request handler does not 
block the entire process, so that the main loop is ready to accept more 
connections. But passing the request to a thread for handling seems an 
effective solution.

Am I missing something?

Frank


#70028

From	Chris Angelico <rosuav@gmail.com>
Date	2014-04-10 19:40 +1000
Message-ID	<mailman.9130.1397122866.18130.python-list@python.org>
In reply to	#69832

On Thu, Apr 10, 2014 at 7:17 PM, Frank Millman <frank@chagford.com> wrote:
> The current version of my program uses HTTP. As I understand it, a client
> makes a connection and submits a request. The server processes the request
> and returns a result. The connection is then closed.
>
> In this scenario, does async apply at all? There is no open connection to
> 'select' or 'poll'. You have to ensure that the request handler does not
> block the entire process, so that the main loop is ready to accept more
> connections. But passing the request to a thread for handling seems an
> effective solution.

Let's take this to a slightly lower level. HTTP is built on top of a
TCP/IP socket. The client connects (usually on port 80), and sends a
string like this:

"""GET /foo/bar/asdf.html HTTP/1.0
Host: www.spam.org
User-Agent: Mozilla/5.0

"""

The server then sends back something like this:

"""HTTP/1.0 200 OK
Content-type: text/html

<html>
<body>
Hello, world!
</body>
</html>
"""

These are carried on a straightforward bidirectional stream socket,
so the write and read operations (or send and recv, either way) can
potentially block. With a small request, you can kinda assume that the
write won't block, but the read most definitely will: it'll block
until the server writes something for you.

So it follows the usual model of blocking vs non-blocking. In blocking
mode, you do something like this:

data = socket.read()

and it waits until it has something to return. In non-blocking mode,
you do something like this:

def data_available(socket, data):
    # whatever
socket.set_read_callback(data_available)

An HTTP handling library can then build a non-blocking request handler
on top of that, by having data_available parse out the appropriate
information, and return if it doesn't have enough content yet. So it
follows the same model; you send off the request (and don't wait for
it), and then get notified when the result is there.
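The "return if it doesn't have enough content yet" step might look like the sketch below. It is an illustration under assumptions, not any real library's parser: the class name is hypothetical, and it ignores request bodies, folded headers and size limits. Note that real HTTP terminates header lines with CRLF (`\r\n`), which the illustrative strings above leave out.

```python
class RequestHeaderReader:
    """Accumulate bytes until a full HTTP request head has arrived.

    Hypothetical helper: feed() is meant to be called from a
    data-available callback with whatever chunk recv() returned.
    """

    def __init__(self):
        self.buf = b""

    def feed(self, chunk):
        """Return (request_line, headers) once complete, else None."""
        self.buf += chunk
        head, sep, rest = self.buf.partition(b"\r\n\r\n")
        if not sep:
            return None      # not enough data yet; wait for the next callback
        self.buf = rest      # anything after the blank line (e.g. a body)
        lines = head.decode("iso-8859-1").split("\r\n")
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        return lines[0], headers


reader = RequestHeaderReader()
chunks = [b"GET /foo/bar/asdf.html HTTP/1.0\r\nHost: www.sp",
          b"am.org\r\nUser-Agent: Mozilla/5.0\r\n",
          b"\r\n"]
results = [reader.feed(chunk) for chunk in chunks]
# results[0] and results[1] are None; results[2] is the parsed head
```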

When you write the server, you effectively have the same principle,
with one additional feature: a listening socket becomes readable
whenever someone connects. So you can select() on that socket, just
like you can with the others, and whenever there's a new connection,
you add it to the collection and listen for requests on all of them.
It's basically the same concept; as soon as you can accept a new
connection, you do so, and then go back to the main loop.
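The Pike kernel can't be shown here, but the server-side idea itself can be sketched with Python's standard-library `selectors` module (Python 3.4+). This is not that Pike code: the `Server` class and its method names are hypothetical, and it only collects request bytes per connection without sending replies. The listening socket is registered for reads; when it becomes readable, the new connection is accepted and registered too, so one loop listens on all of them.

```python
import selectors
import socket


class Server:
    """Single-threaded, non-blocking accept/read loop (hypothetical sketch)."""

    def __init__(self, host="127.0.0.1", port=0):
        self.sel = selectors.DefaultSelector()
        self.received = {}                # peer address -> bytes read so far
        self.lsock = socket.socket()
        self.lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.lsock.bind((host, port))     # port 0: let the OS pick a port
        self.lsock.listen()
        self.lsock.setblocking(False)
        self.sel.register(self.lsock, selectors.EVENT_READ, self.on_accept)

    @property
    def address(self):
        return self.lsock.getsockname()

    def on_accept(self, lsock):
        # The listening socket is readable: a client is waiting to connect.
        try:
            conn, addr = lsock.accept()
        except BlockingIOError:
            return                        # readiness was only a hint
        conn.setblocking(False)
        self.received[addr] = b""
        self.sel.register(conn, selectors.EVENT_READ,
                          lambda sock: self.on_data(sock, addr))

    def on_data(self, conn, addr):
        try:
            data = conn.recv(4096)
        except BlockingIOError:
            return
        if data:
            self.received[addr] += data
        else:                             # EOF: the client closed its side
            self.sel.unregister(conn)
            conn.close()

    def run_once(self, timeout=0.1):
        """One pass of the main loop: dispatch every ready socket."""
        for key, _mask in self.sel.select(timeout):
            key.data(key.fileobj)


server = Server()
clients = [socket.create_connection(server.address) for _ in range(2)]
names = [c.getsockname() for c in clients]
for n, c in enumerate(clients):
    c.sendall(b"request %d" % n)
    c.close()
for _ in range(10):
    server.run_once()
server.sel.close()
server.lsock.close()
```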

It's pretty simple when you let a lower-level library do the work for
you :) The neat thing is, you can put all of this into a single
program; I can't demo it in Python for you, but I have a Pike kernel
that I wrote for my last job, which can handle a variety of different
asynchronous operations: TCP, UDP (which just sends single packets,
normally), a GUI (in theory), timers, the lot. It has convenience
features for creating a DNS server, an HTTP server, and a stateful
line-based server (covers lots of other protocols, like SMTP). And
(though this bit would be hard to port to Python) it can update itself
without shutting down. Yes, it can take some getting your head around,
but it's well worth it.

ChrisA


#70029

From	"Frank Millman" <frank@chagford.com>
Date	2014-04-10 13:10 +0200
Message-ID	<mailman.9131.1397128288.18130.python-list@python.org>
In reply to	#69832

"Chris Angelico" <rosuav@gmail.com> wrote in message 
news:CAPTjJmoWaHPZk=DAxbfJ=9ez2aj=4yf2C8WMbRYoF5VgN6Exsw@mail.gmail.com...
> On Thu, Apr 10, 2014 at 7:17 PM, Frank Millman <frank@chagford.com> wrote:
>> The current version of my program uses HTTP. As I understand it, a client
>> makes a connection and submits a request. The server processes the request
>> and returns a result. The connection is then closed.
>>
>> In this scenario, does async apply at all? There is no open connection to
>> 'select' or 'poll'. You have to ensure that the request handler does not
>> block the entire process, so that the main loop is ready to accept more
>> connections. But passing the request to a thread for handling seems an
>> effective solution.
>

[...]

Thanks, Chris - I am learning a lot!

I have skipped the first part of your reply, as it seems to refer to the 
client. I am using a web browser as a client, so I don't have to worry about 
programming that.

>
> When you write the server, you effectively have the same principle,
> with one additional feature: a listening socket becomes readable
> whenever someone connects. So you can select() on that socket, just
> like you can with the others, and whenever there's a new connection,
> you add it to the collection and listen for requests on all of them.
> It's basically the same concept; as soon as you can accept a new
> connection, you do so, and then go back to the main loop.
>

This is where it gets interesting. At present I am using cherrypy as a 
server, and I have not checked its internals. However, in the past I have 
dabbled with writing server programs like this -

    while self.running:
        try:
            conn,addr = self.s.accept()
            Session(args=(self, conn)).start()
        except KeyboardInterrupt:
            self.shutdown()

In this scenario, the loop blocks on 'accept'.

You seem to be suggesting that I set the socket to 'non-blocking', use 
select() to determine when a client is trying to connect, and then call 
'accept' on it to create a new connection.

If so, I understand your point. The main loop changes from 'blocking' to 
'non-blocking', which frees it up to perform all kinds of other tasks as 
well. It is no longer just a 'web server', but becomes an 'all-purpose 
server'.

Much food for thought!

Frank


#70031

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-04-10 14:43 +0300
Message-ID	<87wqexmmuc.fsf@elektro.pacujo.net>
In reply to	#70029

"Frank Millman" <frank@chagford.com>:

> You seem to be suggesting that I set the socket to 'non-blocking', use
> select() to determine when a client is trying to connect, and then
> call 'accept' on it to create a new connection.

Yes.

> If so, I understand your point. The main loop changes from 'blocking'
> to 'non-blocking', which frees it up to perform all kinds of other
> tasks as well. It is no longer just a 'web server', but becomes an
> 'all-purpose server'.

The server will do whatever you make it do.

Other points:

 * When you wake up from select() (or poll(), epoll()), you should treat
   it as a hint. The I/O call (accept()) could still raise
   socket.error(EAGAIN).

 * The connections returned from accept() have to be individually
   registered with select() (poll(), epoll()).

 * When you write() into a connection, you may be able to send only part
   of the data or get EAGAIN. You need to choose a buffering strategy --
   you should not block until all data is written out. Also take into
   account how much you are prepared to buffer.

 * There are two main modes of multiplexing: level-triggered and
   edge-triggered. Only epoll() (and kqueue()) support edge-triggered
   wakeups. Edge-triggered requires more discipline from the programmer
   but frees you from having to tell the multiplexing facility if you
   are interested in readability or writability in any given situation.

   Edge-triggered wakeups are only guaranteed after you have gotten an
   EAGAIN from an operation. Make sure you keep on reading/writing until
   you get an EAGAIN. On the other hand, watch out so one connection
   doesn't hog the process because it always has active I/O to perform.

 * You should always be ready to read to prevent deadlocks.

 * Sockets can be half-closed. Your state machines should deal with the
   different combinations gracefully. For example, you might read an EOF
   from the client socket before you have pushed the response out. You
   must not close the socket before the response has finished writing.
   On the other hand, you should not treat the half-closed socket as
   readable.

 * While a single-threaded process will not have proper race conditions,
   you must watch out for preemption. IOW, you might have Object A call
   a method of Object B, which calls some other method of Object A.
   Asyncio has a task queue facility. If you write your own main loop,
   you should also implement a similar task queue. The queue can then be
   used to make such tricky function calls in a safe context.

 * Asyncio provides timers. If you write your own main loop, you should
   also implement your own timers.

   Note that modern software has to tolerate suspension (laptop lid,
   virtual machines). Time is a tricky concept when your server wakes up
   from a coma.

 * Specify explicit states. Your connection objects should have a data
   member named "state" (or similar). Make your state transitions
   explicit and obvious in the code. In fact, log them. Resist the
   temptation of deriving the state implicitly from other object
   information.

 * Most states should be guarded with a timer. Make sure to document, for
   each state, which timers are running.

 * In each state, check that you handle all possible events and
   timeouts. The state/transition matrix will be quite sizable even for
   seemingly simple tasks.
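The partial-write and buffering points can be sketched with `selectors` and the default (level-triggered) selector. This is a hypothetical helper, not asyncio's transport: `BufferedWriter`, `drain` and the 1 MiB default limit are assumptions, and a real connection would also keep its read interest registered (for example via `sel.modify()`) so that it is always ready to read.

```python
import selectors
import socket


class BufferedWriter:
    """Non-blocking write side of a connection (hypothetical helper).

    Bytes the kernel will not take yet stay in `out` and are flushed
    when the selector reports the socket writable.
    """

    def __init__(self, sock, sel, limit=1 << 20):
        self.sock = sock
        self.sel = sel
        self.limit = limit          # the most we are prepared to buffer
        self.out = bytearray()
        self.registered = False
        sock.setblocking(False)

    def write(self, data):
        if len(self.out) + len(data) > self.limit:
            raise RuntimeError("output buffer limit exceeded; peer not reading")
        self.out += data
        self.flush()

    def flush(self):
        while self.out:
            try:
                sent = self.sock.send(self.out)  # may be a partial write
            except BlockingIOError:
                break                            # EAGAIN: retry when writable
            del self.out[:sent]
        # Ask for write events only while data is pending; a level-triggered
        # selector would otherwise report "writable" on every pass.
        if self.out and not self.registered:
            self.sel.register(self.sock, selectors.EVENT_WRITE, self.flush)
            self.registered = True
        elif not self.out and self.registered:
            self.sel.unregister(self.sock)
            self.registered = False


def drain(sock, into):
    """Read everything currently available from a non-blocking socket."""
    while True:
        try:
            chunk = sock.recv(65536)
        except BlockingIOError:
            return
        if not chunk:
            return
        into.extend(chunk)


sel = selectors.DefaultSelector()
a, b = socket.socketpair()
b.setblocking(False)
writer = BufferedWriter(a, sel, limit=8 << 20)
payload = bytes(range(256)) * 8192   # 2 MiB, more than the kernel buffer
writer.write(payload)                # sends what fits, buffers the rest
received = bytearray()
while len(received) < len(payload):
    drain(b, received)               # the peer reads some data...
    for key, _mask in sel.select(timeout=0.01):
        key.data()                   # ...so the writer can flush more
```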


Marko


#70035

From	Roy Smith <roy@panix.com>
Date	2014-04-10 08:56 -0400
Message-ID	<roy-81616C.08563310042014@news.panix.com>
In reply to	#70031

In article <87wqexmmuc.fsf@elektro.pacujo.net>,
 Marko Rauhamaa <marko@pacujo.net> wrote:

>  * When you wake up from select() (or poll(), epoll()), you should treat
>    it as a hint. The I/O call (accept()) could still raise
>    socket.error(EAGAIN).

People often misunderstand what select() does.  The common misconception 
is that a select()ed descriptor has data waiting to be read.  What the 
man page says is, "A file descriptor is considered ready if it is 
possible to perform the corresponding I/O operation (e.g., read(2)) 
without blocking."  Not blocking includes failing immediately.

And, once you introduce threading, things get even more complicated.  
Imagine two threads, both waiting in a select() call on the same socket.  
Data comes in on that socket.  Both select() calls return.  If both 
threads then do reads on the socket, you've got a race condition.  One 
of them will read the data.  The other will block in the read call, 
because the data has already been read by the other thread!

So, yes, as Marko says, use select() as a hint, but then also do your 
reads in non-blocking mode, and be prepared for them to fail, regardless 
of whether select() said the descriptor was ready.
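That advice translates fairly directly into code. A minimal sketch, assuming a non-blocking socket and a hypothetical helper name `try_recv`: the read is attempted regardless of what select() reported, and "would block" is treated as a normal outcome rather than an error. A Unix `socketpair()` stands in for a real connection.

```python
import socket


def try_recv(sock, size=4096):
    """Non-blocking read that treats readiness as a hint.

    Returns the bytes read, b"" on EOF, or None if the read would block
    (for example because another thread already consumed the data).
    """
    try:
        return sock.recv(size)
    except BlockingIOError:
        return None


a, b = socket.socketpair()
b.setblocking(False)
first = try_recv(b)    # None: nothing sent yet, but we did not hang
a.sendall(b"ping")
second = try_recv(b)   # b"ping"
a.close()
third = try_recv(b)    # b"": the peer closed its end
b.close()
```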

>    Note that modern software has to tolerate suspension (laptop lid,
>    virtual machines). Time is a tricky concept when your server wakes up
>    from a coma.

Not to mention running in a virtual machine.  Time is an equally tricky 
concept when your hardware clock is really some other piece of software 
playing smoke and mirrors.  I once worked on a time-sensitive system 
which was running in a VM.  The idiots who had configured the thing were 
running ntpd in the VM, to keep its clock in sync.  Normally, this is a 
good thing, but they were ALSO using the hypervisor's clock management 
gizmo (vmtools?) to adjust the VM clock.  The two mechanisms were 
fighting with each other, which did really weird stuff to time.

It took me forever to figure out what was going on.  How does one even 
observe that time is moving around randomly?  I eventually ended up 
writing a trivial NTP client in Python (it's only a few lines of code) 
and periodically logging the difference between the local system clock 
and what my NTP reference was telling me.  Of course, figuring out what 
was going on was the easy part.  Convincing the IT drones to fix the 
problem was considerably more difficult.
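The trivial NTP client isn't shown, but a minimal SNTP query along the same lines might look like this sketch. It is not Roy's code: the function names and the default server `pool.ntp.org` are assumptions. The packet layout (receive timestamp at bytes 32-39, transmit timestamp at 40-47, 32.32 fixed point since 1900) and the offset formula follow the NTP specification, but the reply is not validated (mode, stratum, originate timestamp) as a production client should.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def ntp_to_unix(seconds, fraction):
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to Unix time."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def build_request():
    """48-byte client request: LI=0, version=3, mode=3 (client)."""
    return b"\x1b" + bytes(47)


def parse_reply(packet):
    """Return the server's receive (T2) and transmit (T3) times."""
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", packet[32:48])
    return ntp_to_unix(rx_s, rx_f), ntp_to_unix(tx_s, tx_f)


def clock_offset(t1, t2, t3, t4):
    """NTP offset estimate; positive means the local clock is behind."""
    return ((t2 - t1) + (t3 - t4)) / 2


def query(server="pool.ntp.org", timeout=2.0):
    """Send one request and return the estimated local clock offset."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        t1 = time.time()
        s.sendto(build_request(), (server, 123))
        packet, _addr = s.recvfrom(512)
        t4 = time.time()
    t2, t3 = parse_reply(packet)
    return clock_offset(t1, t2, t3, t4)
```

Calling `query()` periodically and logging the result gives the kind of local-clock versus reference record described above.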

>  * In each state, check that you handle all possible events and
>    timeouts. The state/transition matrix will be quite sizable even for
>    seemingly simple tasks.

And those boxes in the state transition matrix that are left blank
because those transitions are impossible?  Guess what, they happen, and
you'd better have a plan for when they do :-)
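One way to have that plan is to make the transition table explicit and route every unlisted (state, event) pair to a logged fallback. The states, events and table below are hypothetical examples; the `timeout` event stands for a per-state timer firing, and the half-close entry reflects the earlier point about reading EOF before the response has been written.

```python
import logging

log = logging.getLogger("conn")


class Connection:
    """Explicit-state connection with a catch-all for 'impossible' events."""

    # (state, event) -> next state; hypothetical example table
    TRANSITIONS = {
        ("reading_request", "request_complete"): "writing_response",
        ("reading_request", "peer_eof"): "closed",
        ("reading_request", "timeout"): "closed",
        ("writing_response", "response_sent"): "closed",
        ("writing_response", "peer_eof"): "writing_response",  # half-closed
        ("writing_response", "timeout"): "closed",
    }

    def __init__(self):
        self.state = "reading_request"

    def handle(self, event):
        key = (self.state, event)
        if key in self.TRANSITIONS:
            new_state = self.TRANSITIONS[key]
        else:
            # The "blank box": log it and fail safe instead of crashing.
            log.error("unexpected event %r in state %r; closing",
                      event, self.state)
            new_state = "closed"
        log.info("%s --%s--> %s", self.state, event, new_state)
        self.state = new_state
        return new_state


conn = Connection()
s1 = conn.handle("request_complete")  # "writing_response"
s2 = conn.handle("peer_eof")          # still "writing_response"
s3 = conn.handle("request_complete")  # unexpected here: logged, "closed"
```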


#70043

From	Sturla Molden <sturla.molden@gmail.com>
Date	2014-04-10 15:24 +0000
Message-ID	<mailman.9140.1397143500.18130.python-list@python.org>
In reply to	#70031

Marko Rauhamaa <marko@pacujo.net> wrote:

> Other points:
> 
>  * When you wake up from select() (or poll(), epoll()), you should treat
>    it as a hint. The I/O call (accept()) could still raise
>    socket.error(EAGAIN).
> 
>  * The connections returned from accept() have to be individually
>    registered with select() (poll(), epoll()).
> 
>  * When you write() into a connection, you may be able to send only part
>    of the data or get EAGAIN. You need to choose a buffering strategy --
>    you should not block until all data is written out. Also take into
>    account how much you are prepared to buffer.
> 
>  * There are two main modes of multiplexing: level-triggered and
>    edge-triggered. Only epoll() (and kqueue()) support edge-triggered
>    wakeups. Edge-triggered requires more discipline from the programmer
>    but frees you from having to tell the multiplexing facility if you
>    are interested in readability or writability in any given situation.
> 
>    Edge-triggered wakeups are only guaranteed after you have gotten an
>    EAGAIN from an operation. Make sure you keep on reading/writing until
>    you get an EAGAIN. On the other hand, watch out so one connection
>    doesn't hog the process because it always has active I/O to perform.
> 
>  * You should always be ready to read to prevent deadlocks.
> 
>  * Sockets can be half-closed. Your state machines should deal with the
>    different combinations gracefully. For example, you might read an EOF
>    from the client socket before you have pushed the response out. You
>    must not close the socket before the response has finished writing.
>    On the other hand, you should not treat the half-closed socket as
>    readable.
> 
>  * While a single-threaded process will not have proper race conditions,
>    you must watch out for preemption. IOW, you might have Object A call
>    a method of Object B, which calls some other method of Object A.
>    Asyncio has a task queue facility. If you write your own main loop,
>    you should also implement a similar task queue. The queue can then be
>    used to make such tricky function calls in a safe context.
> 
>  * Asyncio provides timers. If you write your own main loop, you should
>    also implement your own timers.
> 
>    Note that modern software has to tolerate suspension (laptop lid,
>    virtual machines). Time is a tricky concept when your server wakes up
>    from a coma.
> 
>  * Specify explicit states. Your connection objects should have a data
>    member named "state" (or similar). Make your state transitions
>    explicit and obvious in the code. In fact, log them. Resist the
>    temptation of deriving the state implicitly from other object
>    information.
> 
>  * Most states should be guarded with a timer. Make sure to document for
>    each state, which timers are running.
> 
>  * In each state, check that you handle all possible events and
>    timeouts. The state/transition matrix will be quite sizable even for
>    seemingly simple tasks.
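A toy main loop can illustrate the task-queue point. This is a sketch under assumptions: `MiniLoop`, `A` and `B` are hypothetical, and in real asyncio code `loop.call_soon()` plays the role of `MiniLoop.call_soon()`. B does not call straight back into A while A is still inside `update()`. Instead, B queues the call, and the loop runs it after `update()` has returned.

```python
from collections import deque


class MiniLoop:
    """Toy main loop: just a FIFO queue of deferred calls."""

    def __init__(self) -> None:
        self._ready = deque()  # (callable, args) pairs, run in FIFO order

    def call_soon(self, fn, *args) -> None:
        self._ready.append((fn, args))

    def run_pending(self) -> None:
        # Calls queued while running are picked up in the same pass.
        while self._ready:
            fn, args = self._ready.popleft()
            fn(*args)


class A:
    def __init__(self) -> None:
        self.updating = False
        self.log = []

    def update(self, b: "B") -> None:
        self.updating = True
        b.notify(self)  # B wants to call back into A...
        self.updating = False

    def poke(self) -> None:
        # Record whether we were re-entered in the middle of update().
        self.log.append("reentered" if self.updating else "safe")


class B:
    def __init__(self, loop: MiniLoop) -> None:
        self.loop = loop

    def notify(self, a: A) -> None:
        # Calling a.poke() directly here would re-enter A mid-update,
        # so defer it to the loop's queue instead.
        self.loop.call_soon(a.poke)
```

After `a.update(b)` followed by `loop.run_pending()`, `a.log` is `["safe"]`. If you replace `self.loop.call_soon(a.poke)` with `a.poke()`, it records `"reentered"` instead.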


And exactly how is getting all of this correct any easier than just using
threads and blocking i/o?

I'd like to see the programmer who can get all of this correct, but has no
idea how to use a queue or a mutex without deadlocking.


Sturla

#70046

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-04-10 19:20 +0300
Message-ID	<87ppkpma0d.fsf@elektro.pacujo.net>
In reply to	#70043

Sturla Molden <sturla.molden@gmail.com>:

> And exactly how is getting all of this correct any easier than just
> using threads and blocking i/o?
>
> I'd like to see the programmer who can get all of this correct, but
> has no idea how to use a queue or a mutex without deadlocking.

My personal experience is that it is easier to get "all of this correct"
than to get threads right. I've done it both ways.


Marko

#70044

From	Chris Angelico <rosuav@gmail.com>
Date	2014-04-11 01:32 +1000
Message-ID	<mailman.9141.1397143980.18130.python-list@python.org>
In reply to	#70031

On Fri, Apr 11, 2014 at 1:24 AM, Sturla Molden <sturla.molden@gmail.com> wrote:
> And exactly how is getting all of this correct any easier than just using
> threads and blocking i/o?

For a start, nearly everything Marko just posted should be dealt with
by your library. I don't know Python's asyncio as it's very new and I
haven't yet found an excuse to use it, but with Pike, I just engage
backend mode, set callbacks on the appropriate socket/file/port
objects, and let things happen perfectly. All I need to do is check a
few return values (e.g., if I ask a non-blocking socket to write a whole
pile of data, it might report that it wrote only some of it, in which
case I have to buffer the rest; not hard, but it has to be done), and
make sure I always return promptly from my callbacks so as to avoid
lagging out other operations. None of the details of C-level APIs
matter to my high level code.
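Here is Chris's "buffer the rest" step, sketched in Python for a non-blocking socket. `BufferedWriter` and `BufferFull` are hypothetical names. The sketch assumes that some event loop (not shown) calls `flush()` again whenever the socket becomes writable. The `limit` parameter reflects Marko's point about deciding how much you are prepared to buffer.

```python
import socket


class BufferFull(Exception):
    """Raised when the peer is not keeping up and the buffer limit is hit."""


class BufferedWriter:
    """Write what the socket will take now; keep the rest for later."""

    def __init__(self, sock: socket.socket, limit: int = 1 << 20) -> None:
        sock.setblocking(False)
        self.sock = sock
        self.pending = bytearray()
        self.limit = limit  # how much we are prepared to buffer

    def write(self, data: bytes) -> None:
        if len(self.pending) + len(data) > self.limit:
            raise BufferFull(f"would exceed {self.limit} buffered bytes")
        self.pending += data
        self.flush()

    def flush(self) -> None:
        """Send as much pending data as possible without blocking."""
        while self.pending:
            try:
                sent = self.sock.send(self.pending)
            except BlockingIOError:
                return  # kernel buffer full (EAGAIN): wait for writability
            del self.pending[:sent]  # keep only the unsent tail
```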

ChrisA

#70047

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-04-10 19:25 +0300
Message-ID	<87lhvdm9sw.fsf@elektro.pacujo.net>
In reply to	#70044

Chris Angelico <rosuav@gmail.com>:

> For a start, nearly everything Marko just posted should be dealt with
> by your library.

Let's not kid ourselves: it is hard to get any reactive system right.

> I don't know Python's asyncio as it's very new and I haven't yet found
> an excuse to use it, but with Pike, I just engage backend mode, set
> callbacks on the appropriate socket/file/port objects, and let things
> happen perfectly.

That "set callbacks" and "let things happen" is the hard part. The
framework part is trivial.


Marko

#70048

From	Chris Angelico <rosuav@gmail.com>
Date	2014-04-11 03:08 +1000
Message-ID	<mailman.9143.1397149733.18130.python-list@python.org>
In reply to	#70047

On Fri, Apr 11, 2014 at 2:25 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
>> I don't know Python's asyncio as it's very new and I haven't yet found
>> an excuse to use it, but with Pike, I just engage backend mode, set
>> callbacks on the appropriate socket/file/port objects, and let things
>> happen perfectly.
>
> That "set callbacks" and "let things happen" is the hard part. The
> framework part is trivial.

Maybe. Here's a small self-contained Pike program that implements a simple
echo server (whatever comes in goes out again):

//Create the port (listening connection).
object mainsock=Stdio.Port(12345,accept_callback);

void accept_callback()
{
    //Get the newly-connected socket
    object sock=mainsock->accept();
    //Set up its callbacks
    sock->set_nonblocking(read_callback, write_callback, close_callback);
    //Keep track of metadata (here that'll just be the write buffer)
    sock->set_id((["sock":sock]));
}

//Attempt to write some text, buffering any that can't be written
void write(mapping info, string text)
{
    if (!text || text=="") return;
    if (info->write_me)
    {
        //There's already buffered text. Queue this text too.
        info->write_me += text;
        return;
    }
    int written = info->sock->write(text);
    if (written < 0)
    {
        //Deal with write errors brutally by closing the socket.
        info->sock->close();
        return;
    }
    info->write_me = text[written..];
}

//When more can be written, write it.
void write_callback(mapping info) {write(info, m_delete(info,"write_me"));}

void read_callback(mapping info, string data)
{
    //Simple handling: Echo the text back with a prefix.
    //Note that this isn't line-buffered or anything.
    write(info, ">> " + data);
}

//Not strictly necessary, but if you need to do something when a client
//disconnects, this is where you'd do it.
void close_callback(mapping info)
{
    info->sock = "(disconnected)";
}

//Engage backend mode.
int main() {return -1;}

Setting callbacks? One line. There's a little complexity to the "write
what you can, buffer the rest", but if you're doing anything even a
little bit serious, you'll just bury that away in a mid-level library
function. The interesting part is in the read callback, which does the
actual work (in this case, it just writes back whatever it gets). And
here's how easy it is to make it into a chat server: just replace the
read and close callbacks with these:

multiset(mapping) sockets=(<>);
void read_callback(mapping info, string data)
{
    //Register this client (on its first message) and broadcast the
    //text, with a prefix, to every registered client. Not line-buffered.
    sockets[info] = 1;
    write(indices(sockets)[*], ">> " + data);
}

//Not strictly necessary, but if you need to do something when a client
//disconnects, this is where you'd do it.
void close_callback(mapping info)
{
    info->sock = "(disconnected)";
    sockets[info] = 0;
}

If you want to handle more information (maybe get users to log in?),
you just put more data into the info mapping (it's just like a
Python dict). Handling of TELNET negotiation, line buffering, etc.
can all be added between this and the user-level code; that's
what I did with the framework I wrote for work. Effectively, you just
write one function (I had it double as the read and close callbacks
for simplicity), put a declaration at the bottom to say what port
number you want (hard-coded to 12345 in the above code), and
everything just happens. It really isn't hard to get callback-based
code to work nicely if you think about what you're doing.

I expect it'll be similarly simple with asyncio; does someone who's
worked with it feel like implementing similar functionality?
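One possible asyncio translation of the Pike chat server is sketched here under a few assumptions. It uses the callback-style `asyncio.Protocol` API, which is the closest analogue to Pike's `set_nonblocking` callbacks. It keeps connected clients in a class-level set, an illustrative choice that means every server in the process shares the set. It also registers each client on connect rather than on its first message as the Pike version does. `ChatProtocol` and `main` are hypothetical names.

```python
import asyncio
from typing import Optional


class ChatProtocol(asyncio.Protocol):
    """Callback-style chat server: whatever one client sends, all receive."""

    clients = set()  # class-level: shared by every connection

    def connection_made(self, transport: asyncio.BaseTransport) -> None:
        self.transport = transport
        self.clients.add(self)

    def data_received(self, data: bytes) -> None:
        # Not line-buffered, just like the Pike original.
        for client in list(self.clients):
            # transport.write() never blocks; asyncio buffers whatever the
            # socket cannot take yet and sends it when it becomes writable.
            client.transport.write(b">> " + data)

    def connection_lost(self, exc: Optional[Exception]) -> None:
        self.clients.discard(self)


async def main(port: int = 12345) -> None:
    loop = asyncio.get_running_loop()
    # host=None listens on all interfaces.
    server = await loop.create_server(ChatProtocol, None, port)
    async with server:
        await server.serve_forever()
```

To try it, run `asyncio.run(main())` and connect with a couple of `telnet localhost 12345` sessions. `transport.write()` already does the "write what you can, buffer the rest" work, so the Pike `write()` helper has no counterpart here. If you want to cap how much you are prepared to buffer, `transport.get_write_buffer_size()` reports the current backlog.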

ChrisA

#70052

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-04-10 11:14 -0700
Message-ID	<70d69403-456a-43bc-84a6-c546983c90e5@googlegroups.com>
In reply to	#70048

On Thursday, April 10, 2014 10:38:49 PM UTC+5:30, Chris Angelico wrote:
> On Fri, Apr 11, 2014 at 2:25 AM, Marko Rauhamaa wrote:
> >> I don't know Python's asyncio as it's very new and I haven't yet found
> >> an excuse to use it, but with Pike, I just engage backend mode, set
> >> callbacks on the appropriate socket/file/port objects, and let things
> >> happen perfectly.
> >
> > That "set callbacks" and "let things happen" is the hard part. The
> > framework part is trivial.
> 
> Maybe. Here's a simple self-contained Pike program that makes a simple
> echo server - whatever comes in goes out again:
> 

By way of analogy, let me stage a 'thought-discussion' between a C programmer and a Python programmer about data structures.


-----------------------------------------------------
PP: Isn't C's approach to data structures tedious and error-prone? How/why do 
you stick with it?
CP: Oh! Is it? And what do you propose I use?
PP: Why, Python of course! Or any modern language with first-class data and 
garbage collection!  Why spend a lifetime tracking malloc errors?!
CP: Oh! Is it? And what is Python implemented in?
PP: But that's the whole point!  Once Guido-n-gang have done their thing, we are 
unscathed by the bugs that prick and poke and torment you day in, day out.
CP: Let's look at this in more detail, shall we?
PP: Very well.
CP: You give me any Python data structure (so-called) and I'll give it to you 
in C. And note: it's very easy. I just open up the Python implementation (it's 
in C, in case you forgot) and clean up all the mess that has been added to 
support lazy Python programmers. In addition, I'll give you a couple more 
data structures/algorithms that we have easy access to, but that you can only 
use by dropping into C (HeHe!)
PP: You are setting the rules of the game... and winning. I did not say I want 
fancy algorithms and data structures. I said I want (primarily) the safety of 
garbage collection. It's also neat to have an explicit syntax for basic data 
types like lists rather than rummaging around with struct and malloc and 
pointers (hoo boy!)
CP: Yeah... Like I said, you like to be mollycoddled; we like our power and freedom.

-----------------------------------------------

If I may use somewhat heavy brush strokes:
Marko (and evidently Chris) are in the CP camp, whereas Sturla is in the PP camp.
It's just that 'data structures (and algorithms)' has now been replaced by 'concurrency'.

Both these viewpoints assume that the status quo of current (mainstream) 
language support for concurrency is a given and not negotiable. Erlang/Go etc. disprove this.

#70058

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-04-10 22:44 +0300
Message-ID	<87wqexj7ge.fsf@elektro.pacujo.net>
In reply to	#70052

Rustom Mody <rustompmody@gmail.com>:

> Marko (and evidently Chris) are in the CP camp, whereas Sturla is in
> the PP camp. It's just that 'data structures (and algorithms)' has now
> been replaced by 'concurrency'.
>
> Both these viewpoints assume that the status quo of current
> (mainstream) language support for concurrency is a given and not
> negotiable.

I think you misread me (us?). I'm not trying to make life hard on
myself. Nor am I disparaging fitting abstractions and high-level
utilities.

Threads are an essential tool when used appropriately. However, I do
believe the '90s fad of treating them as the silver bullet of
concurrency was a big mistake. The industry is noticing it, as is
evident in NIO and asyncio.

Threads are enticing in that they make it quick to put together working
prototypes. The difficulties only appear when it's too late to go back.
They definitely are not the high-level abstraction you're looking for.

> Erlang/Go etc disprove this.

<URL: http://en.wikipedia.org/wiki/Leonhard_Euler#Personal_philosophy_and_religious_beliefs>:

                  n
             a + b
        Sir, ------ = x, hence God exists—reply!
               n

Seriously, Erlang (and Go) have nice tools for managing state machines
and concurrency. However, Python (and C) are perfectly suitable for
clear asynchronous programming idioms. I'm happy that asyncio is
happening after all these long years. It would be nice if it supported
edge-triggered wakeups, but I suppose that isn't supported in all
operating systems.


Marko

#70060

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-04-10 13:21 -0700
Message-ID	<685d594b-c31b-4629-b81d-4aa64d9e3394@googlegroups.com>
In reply to	#70058

On Friday, April 11, 2014 1:14:01 AM UTC+5:30, Marko Rauhamaa wrote:

> 
> 
> Seriously, Erlang (and Go) have nice tools for managing state machines
> and concurrency. However, Python (and C) are perfectly suitable for
> clear asynchronous programming idioms. I'm happy that asyncio is
> happening after all these long years. It would be nice if it supported
> edge-triggered wakeups, but I suppose that isn't supported in all
> operating systems.
> 

Yes... Let me restate what I hear you saying.

Let's start with pure uniprocessor machines for ease of discussion (and of history).
An OS sits between the uniprocessor hardware and the programs, and provides multi{processing,users,threads,etc}.
How does it do it? By mechanisms such as process switching, interleaving, etc.
In short, all the good stuff that constitutes asyncio (and its relatives).

What you are saying is that what the OS is doing, you can do better.
Analogous to said C programmer saying that what (data structures) the python 
programmer can make he can do better.

Note that I don't entirely agree with Sturla either.
To see why, time-shift the C/Python argument 30 years back, when it was imperative
languages vs. poorly implemented, buggy, interpreted Lisp/Prolog.

In that world, your 'I'd rather do it by hand/work out my state machine'
would make considerable sense.

Analogously, if the only choice were mainstream (concurrency-wise) languages --
C/C++/Java/Python -- + native threads + overheads + ensuing errors/headaches, then
the: "Please let me work out my state machine and manage my affairs" would be sound.

But it's not the only choice!!

> http://en.wikipedia.org/wiki/Leonhard_Euler#Personal_philosophy_and_religious_beliefs
> 
>                   n
>              a + b
>         Sir, ------ = x, hence God exists--reply!
>                n

I always thought that God exists because e^(i*pi) + 1 = 0 :D
Evidently (s)he has better reasons for existing!

#70061

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-04-10 23:44 +0300
Message-ID	<87sipkkj7p.fsf@elektro.pacujo.net>
In reply to	#70060

Rustom Mody <rustompmody@gmail.com>:

> What you are saying is that what the OS is doing, you can do better.
> Analogous to said C programmer saying that what (data structures) the
> python programmer can make he can do better.

I'm sorry, but I don't quite follow you there.

I see the regular multithreaded approach as

 (1) oversimplification which makes it difficult to extend the design
     and handle all of the real-world contingencies

 (2) inviting race conditions carelessly--no mortal is immune.
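Here is a small illustration of the kind of race meant in (2). It assumes CPython threads and a hypothetical `Account` class. `withdraw_racy` checks the balance and then updates it in two separate steps, so another thread can slip in between and both threads may pass the check. `withdraw` holds a `threading.Lock` across the check and the update. Whether the racy version actually misbehaves on a given run depends on thread scheduling, and that is precisely what makes such bugs easy to ship.

```python
import threading


class Account:
    """Shared state touched by several threads."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.lock = threading.Lock()

    def withdraw_racy(self, amount: int) -> bool:
        # Check-then-act: another thread can run between these two steps,
        # so two withdrawals may both pass the check and overdraw.
        if self.balance >= amount:
            self.balance -= amount
            return True
        return False

    def withdraw(self, amount: int) -> bool:
        # Holding the lock makes the check and the update one atomic step.
        with self.lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def drain_account(method_name: str, threads: int = 8, attempts: int = 1000) -> int:
    """Hammer one account from several threads; return the final balance."""
    acct = Account(balance=1000)

    def worker() -> None:
        for _ in range(attempts):
            getattr(acct, method_name)(1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return acct.balance
```

`drain_account("withdraw")` always ends at 0, because exactly 1,000 of the 8,000 attempts succeed. `drain_account("withdraw_racy")` will usually end at 0 too, but nothing guarantees it. An unlucky interleaving can leave the balance negative.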


Marko

#70092

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-04-10 22:15 -0700
Message-ID	<b86c99de-f780-4c4a-904a-f5b054f4e891@googlegroups.com>
In reply to	#70061

On Friday, April 11, 2014 2:14:42 AM UTC+5:30, Marko Rauhamaa wrote:
> Rustom Mody:
> 
> > What you are saying is that what the OS is doing, you can do better.
> > Analogous to said C programmer saying that what (data structures) the
> > python programmer can make he can do better.
> 
> 
> 
> I'm sorry, but I don't quite follow you there.

OK, let me try again (please note I am speaking more analogically than logically).

There was a time -- say 1990 -- when there was this choice:
 - use C -- a production language with half-assed data structures support
 - use Lisp -- strong support for data structures but otherwise unrealistic

From this world and its world view, it's natural to conclude that to choose 
a language with strong data structure support is to choose an unrealistic language.

I was in the thick of this debate then
http://www.the-magus.in/Publications/chor.pdf

This argument is seen to be fallacious once we have languages like python
(and Ruby and Java and Perl and Haskell and ...)

Today we are in the same position vis-a-vis concurrency as we were with 
data structures in 1990.

We have mainstream languages -- Java,C,C++,Python -- with half-assed 
concurrency support. And we have languages like Erlang, Go, Cloud Haskell which 
make concurrency center-stage but are otherwise lacking and unrealistic.

I disagree with you when you say "We can't do better (than stay within the options
offered by mainstream languages)".

As an individual, you are probably right.
From a larger, systemic POV, (hopefully!) not!

I disagree with Sturla about what is considered invariant and what is under one's control.

He seems (?) to treat the hardware as under our control and the programming paradigm as not.
I believe that the mileage that can be achieved by working on both is more than
can be achieved by either alone.

> I see the regular multithreaded approach as
>  (2) inviting race conditions carelessly--no mortal is immune.

This I understand and concur with

> 
>  (1) oversimplification which makes it difficult to extend the design
>      and handle all of the real-world contingencies

This I don't...
