


Benefits of asyncio

Started by: Aseem Bansal <asmbansal2@gmail.com>
First post: 2014-06-02 10:40 -0700
Last post: 2014-06-02 21:54 -0700
Articles: 20 on this page of 32, 10 participants



Contents

  Benefits of asyncio Aseem Bansal <asmbansal2@gmail.com> - 2014-06-02 10:40 -0700
    Re: Benefits of asyncio Ian Kelly <ian.g.kelly@gmail.com> - 2014-06-02 12:37 -0600
    Re: Benefits of asyncio Terry Reedy <tjreedy@udel.edu> - 2014-06-02 16:07 -0400
      Re: Benefits of asyncio Roy Smith <roy@panix.com> - 2014-06-02 16:19 -0400
      Re: Benefits of asyncio Marko Rauhamaa <marko@pacujo.net> - 2014-06-02 23:28 +0300
        Re: Benefits of asyncio Paul Rubin <no.email@nospam.invalid> - 2014-06-02 13:45 -0700
          Re: Benefits of asyncio Chris Angelico <rosuav@gmail.com> - 2014-06-03 07:49 +1000
          Re: Benefits of asyncio Terry Reedy <tjreedy@udel.edu> - 2014-06-02 21:51 -0400
          Re: Benefits of asyncio Marko Rauhamaa <marko@pacujo.net> - 2014-06-03 09:36 +0300
            Re: Benefits of asyncio Chris Angelico <rosuav@gmail.com> - 2014-06-03 18:47 +1000
              Re: Benefits of asyncio Marko Rauhamaa <marko@pacujo.net> - 2014-06-03 12:10 +0300
                Re: Benefits of asyncio Chris Angelico <rosuav@gmail.com> - 2014-06-03 19:30 +1000
                  Re: Benefits of asyncio Marko Rauhamaa <marko@pacujo.net> - 2014-06-03 13:08 +0300
                    Re: Benefits of asyncio Chris Angelico <rosuav@gmail.com> - 2014-06-03 20:23 +1000
                      Re: Benefits of asyncio Marko Rauhamaa <marko@pacujo.net> - 2014-06-03 14:12 +0300
                        Re: Benefits of asyncio Paul Rubin <no.email@nospam.invalid> - 2014-06-04 00:52 -0700
                Re: Benefits of asyncio Burak Arslan <burak.arslan@arskom.com.tr> - 2014-06-03 14:05 +0300
                Re: Benefits of asyncio Chris Angelico <rosuav@gmail.com> - 2014-06-03 21:57 +1000
                Re: Benefits of asyncio Burak Arslan <burak.arslan@arskom.com.tr> - 2014-06-04 08:10 +0300
                Re: Benefits of asyncio Chris Angelico <rosuav@gmail.com> - 2014-06-04 17:30 +1000
                Re: Benefits of asyncio Paul Rubin <no.email@nospam.invalid> - 2014-06-04 00:48 -0700
            Re: Benefits of asyncio "Frank Millman" <frank@chagford.com> - 2014-06-03 13:09 +0200
            Re: Benefits of asyncio Chris Angelico <rosuav@gmail.com> - 2014-06-03 22:01 +1000
              Re: Benefits of asyncio Marko Rauhamaa <marko@pacujo.net> - 2014-06-03 16:05 +0300
                Re: Benefits of asyncio Chris Angelico <rosuav@gmail.com> - 2014-06-03 23:31 +1000
                  Re: Benefits of asyncio Marko Rauhamaa <marko@pacujo.net> - 2014-06-03 16:42 +0300
                    Re: Benefits of asyncio Chris Angelico <rosuav@gmail.com> - 2014-06-03 23:49 +1000
                      Re: Benefits of asyncio Marko Rauhamaa <marko@pacujo.net> - 2014-06-03 19:18 +0300
                Re: Benefits of asyncio Roy Smith <roy@panix.com> - 2014-06-03 11:40 -0400
          Re: Benefits of asyncio Paul Sokolovsky <pmiscml@gmail.com> - 2014-06-03 11:31 +0300
    Re: Benefits of asyncio Burak Arslan <burak.arslan@arskom.com.tr> - 2014-06-03 00:07 +0300
    Re: Benefits of asyncio Aseem Bansal <asmbansal2@gmail.com> - 2014-06-02 21:54 -0700



#72431 — Benefits of asyncio

From: Aseem Bansal <asmbansal2@gmail.com>
Date: 2014-06-02 10:40 -0700
Subject: Benefits of asyncio
Message-ID: <d1634744-39e8-4bfa-be76-1fd2292fd59f@googlegroups.com>
I read in these groups that asyncio is a great addition to Python 3. I have looked around and saw the related PEP which is quite big BTW but couldn't find a simple explanation for why this is such a great addition. Any simple example where it can be used? 

It can be used to have a queue of tasks? Like threads? Maybe light weight threads? Those were my thoughts but the library reference clearly stated that this is single-threaded. So there should be some waiting time in between the tasks. Then what is good?

These are just jumbled thoughts that came into my mind while trying to make sense of usefulness of asyncio. Anyone can give a better idea?



#72438

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2014-06-02 12:37 -0600
Message-ID: <mailman.10572.1401734686.18130.python-list@python.org>
In reply to: #72431
On Mon, Jun 2, 2014 at 11:40 AM, Aseem Bansal <asmbansal2@gmail.com> wrote:
> I read in these groups that asyncio is a great addition to Python 3. I have looked around and saw the related PEP which is quite big BTW but couldn't find a simple explanation for why this is such a great addition. Any simple example where it can be used?
>
> It can be used to have a queue of tasks? Like threads? Maybe light weight threads? Those were my thoughts but the library reference clearly stated that this is single-threaded. So there should be some waiting time in between the tasks. Then what is good?
>
> These are just jumbled thoughts that came into my mind while trying to make sense of usefulness of asyncio. Anyone can give a better idea?

You're right, neither the PEP nor the docs do much to motivate the
module's existence. I suggest you start here:
http://en.wikipedia.org/wiki/Asynchronous_I/O

The asynchronous model lets you initiate a task (typically an I/O
task) that would normally block, and then go on to do other things
(like initiating more tasks) while waiting on that task, without
having to resort to multiple threads or processes (which have the
disadvantages of consuming more system resources as well as
introducing the risk of race conditions and deadlocks).

It does this by using callbacks; when a task is complete, a callback
is called that handles its completion.  Often in asynchronous code you
end up with large networks of callbacks that can be confusing to
follow and debug because nothing ever gets called directly.  One of
the significant features of the asyncio module is that it allows
asynchronous programming using coroutines, where the callbacks are
abstracted away and essentially have the effect of resuming the
coroutine when the task completes.  Thus you end up writing code that
looks a lot like threaded, sequential code with none of the pitfalls.
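
As a rough illustration of the coroutine style described above, here is a minimal sketch. It assumes the `async`/`await` syntax and `asyncio.run` from Python 3.7+, which postdate this thread, and it uses `asyncio.sleep` as a stand-in for real I/O; the `fetch` function and its arguments are made up for the example.

```python
import asyncio

async def fetch(name, delay):
    # asyncio.sleep stands in for a slow network call (illustrative assumption).
    # While this coroutine waits, the event loop runs the other coroutines.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # Start all three "requests" together; gather returns results in argument order.
    return await asyncio.gather(
        fetch("a", 0.03),
        fetch("b", 0.02),
        fetch("c", 0.01),
    )

results = asyncio.run(main())
```

No callback appears in the code; each `await` marks a point where the coroutine is suspended and later resumed when its task completes.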



#72439

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-06-02 16:07 -0400
Message-ID: <mailman.10573.1401739639.18130.python-list@python.org>
In reply to: #72431
On 6/2/2014 1:40 PM, Aseem Bansal wrote:

The following supplements Ian's answer.

> I read in these groups that asyncio is a great addition to Python 3.
> I have looked around and saw the related PEP which is quite big BTW
> but couldn't find a simple explanation for why this is such a great
> addition. Any simple example where it can be used?

asyncio replaces the very old asyncore, which has problems, is beyond
fixing due to its design, and is now deprecated. So look up uses for
asyncore. You could think of asyncio as a lightweight version or core of
other async packages, such as Twisted or Tornado. What are they good
for? I admit that you should not have to answer the question so
indirectly. One generic answer: carry on 'simultaneous' conversations with
multiple external systems.

asyncio lets you write platform independent code while it makes good use 
of the asynchronous i/o available on each particular system. Async-i/o 
is one area where Windows has made advances over posix. But the models 
are different, and if one uses Windows' i/o completion as if it were 
posix poll/select, it works poorly. Running well on both types of 
systems was a major challenge.

> It can be used to have a queue of tasks?

Try set of tasks, as the sequencing may depend on external response times.

> Like threads? Maybe light weight threads?

Try light-weight threads, managed by Python instead of the OS.
I believe greenlets are a somewhat similar example.

> Those were my thoughts but the library reference
> clearly stated that this is single-threaded.

Meaning, asyncio itself only uses one os thread. The application, or 
individual tasks, can still spin off other os threads, perhaps for a 
long computation.
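
A minimal sketch of handing a long computation to another OS thread while the event loop stays responsive. It assumes Python 3.7+ (`asyncio.get_running_loop`, `asyncio.run`); `long_computation` is a made-up placeholder for real work.

```python
import asyncio

def long_computation(n):
    # Ordinary blocking function; runs in a worker thread, not in the event loop.
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default thread pool executor.
    future = loop.run_in_executor(None, long_computation, 100_000)
    ticks = 0
    while not future.done():
        # The loop is free to run other tasks while the thread works.
        ticks += 1
        await asyncio.sleep(0.001)
    return await future, ticks
```

Calling `asyncio.run(main())` returns the computed sum together with the number of times the loop got control back while waiting.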

 > So there should be some waiting time in between the tasks.

I do not understand this. asyncio should switch between tasks faster 
than the OS switches between threads, thus reducing waiting time.

-- 
Terry Jan Reedy



#72440

From: Roy Smith <roy@panix.com>
Date: 2014-06-02 16:19 -0400
Message-ID: <roy-BC8EF4.16195002062014@news.panix.com>
In reply to: #72439
In article <mailman.10573.1401739639.18130.python-list@python.org>,
 Terry Reedy <tjreedy@udel.edu> wrote:

> asyncio lets you write platform independent code while it makes good use 
> of the asynchronous i/o available on each particular system. Async-i/o 
> is one area where Windows has made advances over posix. But the models 
> are different, and if one uses Windows' i/o completion as if it were 
> posix poll/select, it works poorly. Running well on both types of 
> systems was a major challenge.

How would you compare using the new asyncio module to using gevent?  It 
seems like they do pretty much the same thing.  Assume, for the moment, 
that gevent runs on Python 3.x (which I assume it will, eventually).



#72441

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-06-02 23:28 +0300
Message-ID: <874n03t5t9.fsf@elektro.pacujo.net>
In reply to: #72439
Terry Reedy <tjreedy@udel.edu>:

> I do not understand this. asyncio should switch between tasks faster
> than the OS switches between threads, thus reducing waiting time.

I don't know if thread switching is slower than task switching. However,
there are two main reasons to prefer asyncio over threads:

 * Scalability. Asyncio can easily manage, say, a million contexts. Most
   operating systems will have a hard time managing more than about a
   thousand threads.

   Such scalability needs may arise in very busy network servers with
   tens of thousands of simultaneous connections or computer games that
   simulate thousands of "monsters."

 * Conceptual simplicity. Toy servers are far easier to implement using
   threads. However, before long, the seeming simplicity turns out to be
   a complication:

    - Thread programming assumes each thread is waiting for precisely
      one external stimulus in any given state -- in practice, each
      state must be prepared to handle quite a few possible stimuli.

    - Thread-safe programming is easy to explain but devilishly
      difficult to get right.

   Asyncio makes the prototype somewhat cumbersome to write. However,
   once it is done, adding features, stimuli and states is a routine
   matter.
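
To give a feel for the scalability point, the following sketch runs many simulated "monsters" in a single OS thread. It is an illustration only, assuming Python 3.7+ syntax; `monster` is a made-up name and `asyncio.sleep` stands in for waiting on an external stimulus.

```python
import asyncio

async def monster(i):
    # Each context spends nearly all of its time waiting.
    await asyncio.sleep(0.01)
    return i

async def main(n=10_000):
    # n coroutines are multiplexed onto one thread by the event loop.
    results = await asyncio.gather(*(monster(i) for i in range(n)))
    return len(results)
```

`asyncio.run(main())` creates ten thousand concurrent contexts; starting that many OS threads would be far more expensive.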

Threads have one major advantage: they can naturally take advantage of
multiple CPU cores. Generally, I would stay away from threads and use
multiple processes instead. However, threads may sometimes be the
optimal solution. The key is to keep the number of threads small (maybe
twice the number of CPUs).


Marko



#72442

From: Paul Rubin <no.email@nospam.invalid>
Date: 2014-06-02 13:45 -0700
Message-ID: <7x4n03dor0.fsf@ruckus.brouhaha.com>
In reply to: #72441
Marko Rauhamaa <marko@pacujo.net> writes:
>     - Thread programming assumes each thread is waiting for precisely
>       one external stimulus in any given state -- in practice, each
>       state must be prepared to handle quite a few possible stimuli.

Eh?  Threads typically have their own event loop dispatching various
kinds of stimuli.

>     - Thread-safe programming is easy to explain but devilishly
>       difficult to get right.

I keep hearing that but not encountering it.  Yes there are classic
hazards from sharing mutable state between threads.  However, it's
generally not too difficult to program in a style that avoids such
sharing.  Have threads communicate by message passing with immutable
data in the messages, and things tend to work pretty straightforwardly.
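
A small sketch of that message-passing style, using the standard `queue.Queue`. The worker function and the message format are invented for the example; the messages are immutable ints and tuples, so no mutable state is shared between threads.

```python
import queue
import threading

def worker(inbox, outbox):
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: no more work
            break
        outbox.put(("squared", msg * msg))  # reply with an immutable tuple

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()
for n in (1, 2, 3):
    inbox.put(n)
inbox.put(None)
t.join()
results = [outbox.get() for _ in range(3)]
```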

>    Asyncio makes the prototype somewhat cumbersome to write. However,
>    once it is done, adding features, stimuli and states is a routine
>    matter.

Having dealt with some node.js programs and the nest of callbacks they
morph into as the application gets more complicated, threads have their
advantages.



#72445

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-06-03 07:49 +1000
Message-ID: <mailman.10576.1401745786.18130.python-list@python.org>
In reply to: #72442
On Tue, Jun 3, 2014 at 6:45 AM, Paul Rubin <no.email@nospam.invalid> wrote:
>>     - Thread-safe programming is easy to explain but devilishly
>>       difficult to get right.
>
> I keep hearing that but not encountering it.  Yes there are classic
> hazards from sharing mutable state between threads.  However, it's
> generally not too difficult to program in a style that avoids such
> sharing.  Have threads communicate by message passing with immutable
> data in the messages, and things tend to work pretty straightforwardly.

It's more true on some systems than others. The issues of maintaining
"safe" state are very similar in callback systems and threads; the
main difference is that a single-threaded asyncio system becomes
cooperative, where threading systems are (usually) preemptive.

Preemption means you could get a context switch *anywhere*. (In
Python, I think the rule is that thread switches can happen only
between Python bytecodes, but that's still "anywhere" as far as your
code's concerned.) That means you have to *keep* everything safe,
rather than simply get it safe again.
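
A minimal sketch of what "keeping everything safe" tends to look like with preemptive threads: the shared counter is only touched while holding a lock. The names `total` and `add` are made up for the example.

```python
import threading

total = 0
lock = threading.Lock()

def add(n):
    global total
    for _ in range(n):
        # "total += 1" is a read-modify-write; a thread switch could land
        # in the middle of it, so the lock is held around every update.
        with lock:
            total += 1

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```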

Cooperative multitasking means your function will run to completion
before any other callback happens (or, at least, will get to a clearly
defined yield point). That means you can muck state up all you like,
and then fix it afterwards. In some ways, that's easier; but it has a
couple of risks: firstly, if your code jumps out early somewhere, you
might forget to fix the shared state, and only find out much later;
and secondly, if your function takes a long time to execute,
everything else stalls.
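
The cooperative case can be sketched like this (Python 3.7+ syntax assumed; the function names are invented). Without a yield point between the read and the write, no other task can interleave; once an `await` sits in between, other tasks run there and updates are lost.

```python
import asyncio

counter = 0

async def safe_increment():
    global counter
    value = counter
    counter = value + 1  # no await between read and write: nothing can interleave

async def unsafe_increment():
    global counter
    value = counter
    await asyncio.sleep(0)  # yield point: other tasks run here
    counter = value + 1     # writes back a stale value

async def run(worker, n=100):
    global counter
    counter = 0
    await asyncio.gather(*(worker() for _ in range(n)))
    return counter
```

`asyncio.run(run(safe_increment))` counts every increment, while `asyncio.run(run(unsafe_increment))` ends up with fewer than `n`.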

So whichever way you do it, you still have to be careful - just
careful of slightly different things. For instance, you might keep
track of network activity as a potentially slow operation, and make
sure you never block a callback waiting for a socket - but you might
do a quick and simple system call, not realizing that it involves a
directory that's mounted from a remote server. With threads, someone
else will get priority as soon as you block, but with asyncio, you
have to be explicit about everything that's done asynchronously.

Threads are massively simpler if you have a top-down execution model
for a relatively small number of clients. Works really nicely for a
sequence of prompts - you just code it exactly as if you were using
print() and input() and stuff, and then turn print() into a blocking
socket write (or whatever your I/O is done over) and your input() into
a blocking socket read with line splitting, and that's all the changes
you need. (You could even replace the actual print and input
functions, and use a whole block of code untouched.)

Async I/O is massively simpler if you have very little state, and
simply react to stimuli. Every client connects, authenticates,
executes commands, and terminates its connection. If all you need to
know is whether the client's authenticated or not (restricted
commandset before login), asyncio will be really *really* easy, and
threads are overkill. This is even more true if most of your clients
are going to be massively idle most of the time, with just tiny
queries coming in occasionally and getting responded to quickly.

Both have their advantages and disadvantages. Learning both models is,
IMO, worth doing; get to know them, then decide which one suits your
project.

>>    Asyncio makes the prototype somewhat cumbersome to write. However,
>>    once it is done, adding features, stimuli and states is a routine
>>    matter.
>
> Having dealt with some node.js programs and the nest of callbacks they
> morph into as the application gets more complicated, threads have their
> advantages.

I wrote an uberlite async I/O framework for my last job. Most of the
work was done by the lower-level facilities (actual non-blocking I/O,
etc), but basically, what I had was a single callback for each
connection type and a dictionary of state for each connection (with a
few exceptions - incoming UDP has no state, ergo no dict). Worked out
beautifully simple; each run through the callback processed one
logical action (eg a line of text arriving on a socket, terminated by
newline), updated state if required, and returned, back to the main
loop. Not all asyncio will fit into that sort of structure, but if it
does fit, this keeps everything from getting out of hand.
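
The structure described above might look roughly like the following. This is a guess at the shape, not the actual framework: `on_line`, the state keys, and the login command are all hypothetical.

```python
connections = {}  # connection id -> dict of state for that connection

def on_line(conn_id, line):
    """Single callback for this connection type: handle one logical action."""
    state = connections.setdefault(conn_id, {"authenticated": False, "log": []})
    if not state["authenticated"]:
        if line == "login secret":  # hypothetical login command
            state["authenticated"] = True
            return "welcome"
        return "please log in"
    state["log"].append(line)
    return "ok: " + line  # then return to the main loop
```

Because all state lives in the `connections` dict rather than in closures or local variables, the callback itself holds nothing between calls.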

(Plus, keeping state in a separate dict rather than using closures and
local variables meant I could update code while maintaining state. Not
important for most Python projects, but it was for us.)

Both have their merits.

ChrisA



#72457

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-06-02 21:51 -0400
Message-ID: <mailman.10583.1401760311.18130.python-list@python.org>
In reply to: #72442
To all the great responders. If anyone thinks the async intro is 
inadequate and has a paragraph to contribute, open a tracker issue.

-- 
Terry Jan Reedy



#72481

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-06-03 09:36 +0300
Message-ID: <878upe8poc.fsf@elektro.pacujo.net>
In reply to: #72442
Paul Rubin <no.email@nospam.invalid>:

> Marko Rauhamaa <marko@pacujo.net> writes:
>>     - Thread programming assumes each thread is waiting for precisely
>>       one external stimulus in any given state -- in practice, each
>>       state must be prepared to handle quite a few possible stimuli.
>
> Eh?  Threads typically have their own event loop dispatching various
> kinds of stimuli.

I have yet to see that in practice. The "typical" thread works as
follows:

    while True:
        while request.incomplete():
            request.read()                 # block
        sql_stmt = request.process()
        db.act(sql_stmt)                   # block
        db.commit()                        # block
        response = request.ok_response()
        while response.incomplete():
            response.write()               # block

The places marked with the "block" comment are states with only one
valid input stimulus.

> Have threads communicate by message passing with immutable data in the
> messages, and things tend to work pretty straightforwardly.

Again, I have yet to see that in practice. It is more common, and
naturally enforced, with multiprocessing.

> Having dealt with some node.js programs and the nest of callbacks they
> morph into as the application gets more complicated, threads have
> their advantages.

If threads simplify an asynchronous application, that is generally done
by oversimplifying and reducing functionality.

Yes, a "nest of callbacks" can get messy very quickly. That is why you
need to be very explicit with your states. Your class needs to have a
state field named "state" with clearly named state values.
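
A minimal sketch of such an explicit state field. The class, the state names, and the event methods are all invented for illustration; a real connection would have more states and stimuli.

```python
class Connection:
    def __init__(self):
        self.state = "READING"
        self.request = ""
        self.response = None

    def on_data(self, data):
        if self.state != "READING":
            return  # input arriving in any other state is ignored
        self.request += data
        if self.request.endswith("\n"):
            self.state = "WAITING_FOR_DB"

    def on_db_result(self, row):
        if self.state != "WAITING_FOR_DB":
            return
        self.response = "OK " + row
        self.state = "WRITING"

    def on_write_complete(self):
        if self.state == "WRITING":
            self.state = "DONE"

    def on_disconnect(self):
        self.state = "DONE"  # a stimulus that is valid in every state
```

Every callback checks `self.state` first, so each state spells out which stimuli it accepts.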


Marko



#72492

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-06-03 18:47 +1000
Message-ID: <mailman.10606.1401785234.18130.python-list@python.org>
In reply to: #72481
On Tue, Jun 3, 2014 at 4:36 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> I have yet to see that in practice. The "typical" thread works as
> follows:
>
>     while True:
>         while request.incomplete():
>             request.read()                 # block
>         sql_stmt = request.process()
>         db.act(sql_stmt)                   # block
>         db.commit()                        # block
>         response = request.ok_response()
>         while response.incomplete():
>             response.write()               # block
>
> The places marked with the "block" comment are states with only one
> valid input stimulus.
> ...
> Yes, a "nest of callbacks" can get messy very quickly. That is why you
> need to be very explicit with your states. Your class needs to have a
> state field named "state" with clearly named state values.

Simple/naive way to translate this into a callback system is like this:

def request_read_callback(request, data):
    request.read(data) # however that part works
    if not request.incomplete():
        request.process()

def write(request, data):
    request.write_buffer += data
    request.attempt_write() # sets up callbacks for async writing

def request.process(self): # I know this isn't valid syntax
    db.act(whatever) # may block but shouldn't for long
    db.commit() # ditto
    write(self, response) # won't block


This works as long as your database is reasonably fast and close
(common case for a lot of web servers: DB runs on same computer as web
and application and etc servers). It's nice and simple, lets you use a
single database connection (although you should probably wrap it in a
try/finally to ensure that you roll back on any exception), and won't
materially damage throughput as long as you don't run into problems.
For a database driven web site, most of the I/O time will be waiting
for clients, not waiting for your database.
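
For anyone who wants to run the callback version, here is one way to flesh it out into valid Python. Everything beyond the three original functions is an assumption: `FakeDB` is a stand-in for a fast local database, and `Request` treats a newline as the end of a request.

```python
class FakeDB:
    """Hypothetical stand-in for a fast, local database."""
    def __init__(self):
        self.rows, self.pending = [], []
    def act(self, stmt):
        self.pending.append(stmt)
    def commit(self):
        self.rows.extend(self.pending)
        self.pending.clear()

db = FakeDB()

def write(request, data):
    request.write_buffer += data
    request.attempt_write()  # would set up callbacks for async writing

class Request:
    def __init__(self):
        self.read_buffer = ""
        self.write_buffer = ""
    def read(self, data):
        self.read_buffer += data
    def incomplete(self):
        return not self.read_buffer.endswith("\n")
    def process(self):
        db.act(self.read_buffer.strip())  # may block, but not for long
        db.commit()                       # ditto
        write(self, "OK\r\n")             # won't block
    def attempt_write(self):
        pass  # real code would register for "socket writable" events here

def request_read_callback(request, data):
    request.read(data)
    if not request.incomplete():
        request.process()
```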

Getting rid of those blocking database calls means having multiple
concurrent transactions on the database. Whether you go async or
threaded, this is going to happen. Unless your database lets you run
multiple simultaneous transactions on a single connection (I don't
think the Python DB API allows that, and I can't think of any DB
backends that support it, off hand), that means that every single
concurrency point needs its own database connection. With threads, you
could have a pool of (say) a dozen or so, one per thread, with each
one working synchronously; with asyncio, you'd have to have one for
every single incoming client request, or else faff around with
semaphores and resource pools and such manually. The throughput you
gain by making those asynchronous with callbacks is quite probably
destroyed by the throughput you lose in having too many simultaneous
connections to the database. I can't prove that, obviously, but I do
know that PostgreSQL requires up-front RAM allocation based on the
max_connections setting, and trying to support 5000 connections
started to get kinda stupid.

So how do you deal with the possibility that the database will block?
"Pure" threading (one thread listens for clients, spin off a thread
for each client, end the thread when the client disconnects) copes
poorly; async I/O copes poorly. The thread pool copes well (you know
exactly how many connections you'll need - one per thread in the
pool), but doesn't necessarily solve the problem (you can get all
threads waiting on the database and none handling other requests).
Frankly, I think the only solution is to beef up the database so it
won't block for too long (and, duh, to solve any stupid locking
problems, because they WILL kill you :) ).
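
The thread-pool arrangement, with one database connection per worker thread, can be sketched as follows. `FakeConnection` is a placeholder rather than a real DB driver, and the `initializer` argument of `ThreadPoolExecutor` needs Python 3.7+.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4
local = threading.local()
opened = []                      # every connection ever opened, for inspection
opened_lock = threading.Lock()

class FakeConnection:
    """Placeholder for a blocking database connection."""
    def query(self, n):
        return n * 2

def init_worker():
    # Runs once in each worker thread: one connection per thread.
    local.conn = FakeConnection()
    with opened_lock:
        opened.append(local.conn)

def handle_request(n):
    return local.conn.query(n)   # blocks only this worker thread

with ThreadPoolExecutor(max_workers=POOL_SIZE, initializer=init_worker) as pool:
    results = list(pool.map(handle_request, range(100)))
```

The number of connections can never exceed `POOL_SIZE`, however many requests are queued.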

> If threads simplify an asynchronous application, that is generally done
> by oversimplifying and reducing functionality.

Which means that I disagree with this statement. In my opinion, both
simple models (pure threading and asyncio) can express the same
functionality; the hybrid thread-pool model may simplify things a bit
in the interests of resource usage; but threading does let you think
about code the same way for one client as for fifty, without any
change of functionality. Compare:

# Console I/O:
def print_menu():
    print("1: Spam")
    print("2: Ham")
    print("3: Quit")

def spam():
    print("Spam, spam, spam, spam,")
    while input("Continue? ")!="NO!":
        print("spam, spam, spam...")

def mainloop():
    print("Welcome!")
    while True:
        print_menu()
        x = int(input("What would you like? "))
        if x == 1: spam()
        elif x == 2: ham()
        elif x == 3: break
        else: print("I don't know numbers like %d."%x)
    print("Goodbye!")


I could translate this into a pure-threading system very easily:

# Socket I/O:
import socket
import threading
import consoleio  # the console program above, saved as consoleio.py

class TerminateRequest(Exception): pass
tls = threading.local()  # per-thread socket and read buffer

def print(s):
    tls.socket.sendall((s+"\r\n").encode()) # Don't forget, most of the internet uses \r\n!

def input(prompt):
    tls.socket.sendall(prompt.encode())
    while '\n' not in tls.readbuffer:
        data = tls.socket.recv(1024)
        if not data: raise TerminateRequest # empty read: client has disconnected
        tls.readbuffer += data.decode()
    ret, _, tls.readbuffer = tls.readbuffer.partition("\n")
    return ret.strip("\r")

# Inject the socket versions into the console module
consoleio.print = print
consoleio.input = input

def thread(sock):
    tls.socket = sock
    tls.readbuffer = ""
    try: consoleio.mainloop()
    except TerminateRequest: pass
    finally: sock.close()

mainsock = socket.socket() # bind to port, whatever
mainsock.listen(5)
while True:
    conn, addr = mainsock.accept() # accept() returns (socket, address)
    threading.Thread(target=thread, args=(conn,)).start()


The only changes I made to the application main loop are to change
(via injection) print and input to use the socket. Everything else -
the (bad) logic inside spam(), helper functions, etc, etc - works
exactly the same way. So if you're coming from a single-thread
single-client starting point, it's definitely way simpler to follow a
model like this. (And I have written enough programs along this sort
of line to be able to write the above without more than some quick
checks to see how thread-local storage is done in Python. It's a
really simple model. On the flip side, that does also mean the code
above is completely untested, not to mention stubby in places, so it
probably won't actually run.)

How do you translate that to asynchronous I/O? Normally, with piles
and piles of callbacks. I haven't looked into tulip yet, but my
understanding is that it may be possible to tulipify the above code in
the same way that I socketified it - you'd simply have a "yield from"
inside input(), which would let the system go and deal with other
requests; but otherwise, you would need to completely rewrite the main
code. The exact same functionality would look a lot messier; and you
have the exact same problems to deal with either way. Your connection
state has to cope with both input and output buffers (since you have
to make sure you don't block inside print() either), and while you can
in most applications assume that there won't be a huge amount of
output buffering (and can therefore let mainloop() pretend that it all
got written instantly, while the actual writing happens in the
background), sometimes you have to be explicit about that part of
state too. So the consoleio module above would need a lot of separate
states: every single print() or input() call becomes its own state, in
which the connection might stall. Do you really want to do that with
callbacks? Because it would *definitely* be less clear.
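
For comparison, here is a sketch of a cut-down version of the menu using asyncio streams and coroutines rather than raw callbacks. It assumes Python 3.7+ (`async`/`await`, which replaced the "yield from" spelling); `handle_client`, `print_` and `input_` are names invented here, and only the Spam and Quit options are kept.

```python
import asyncio

async def handle_client(reader, writer):
    async def print_(s):
        writer.write((s + "\r\n").encode())
        await writer.drain()            # other clients run while we wait

    async def input_(prompt):
        writer.write(prompt.encode())
        await writer.drain()
        line = await reader.readline()  # suspends only this client
        if not line:
            raise ConnectionError("client disconnected")
        return line.decode().strip()

    try:
        await print_("Welcome!")
        while True:
            await print_("1: Spam")
            await print_("3: Quit")
            choice = await input_("What would you like? ")
            if choice == "1":
                await print_("Spam, spam, spam, spam,")
            elif choice == "3":
                break
            else:
                await print_("I don't know %r." % choice)
        await print_("Goodbye!")
    except ConnectionError:
        pass
    finally:
        writer.close()

async def serve(host="127.0.0.1", port=8888):
    # port 8888 is an arbitrary choice for the example
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()
```

The main loop keeps its top-down shape, but every call that may wait has to be marked with `await`, so the console code cannot be reused untouched the way it can with threads.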

ChrisA



#72494

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-06-03 12:10 +0300
Message-ID: <87wqcyuznv.fsf@elektro.pacujo.net>
In reply to: #72492
Chris Angelico <rosuav@gmail.com>:

> def request.process(self): # I know this isn't valid syntax
>     db.act(whatever) # may block but shouldn't for long
>     db.commit() # ditto
>     write(self, response) # won't block
>
> This works as long as your database is reasonably fast and close

I find that assumption unacceptable.

The DB APIs desperately need asynchronous variants. As it stands, you
are forced to delegate your DB access to threads/processes.

> So how do you deal with the possibility that the database will block?

You separate the request and response parts of the DB methods. That's
how it is implemented internally anyway.

Say no to blocking APIs.

> but otherwise, you would need to completely rewrite the main code.

That's a good reason to avoid threads. Once you realize you would have
been better off with an async approach, you'll have to start over. You
can easily turn a nonblocking solution into a blocking one but not the
other way around.


Marko



#72496

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-06-03 19:30 +1000
Message-ID: <mailman.10609.1401787816.18130.python-list@python.org>
In reply to: #72494
On Tue, Jun 3, 2014 at 7:10 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Chris Angelico <rosuav@gmail.com>:
>
>> def request.process(self): # I know this isn't valid syntax
>>     db.act(whatever) # may block but shouldn't for long
>>     db.commit() # ditto
>>     write(self, response) # won't block
>>
>> This works as long as your database is reasonably fast and close
>
> I find that assumption unacceptable.

It is a dangerous assumption.

> The DB APIs desperately need asynchronous variants. As it stands, you
> are forced to delegate your DB access to threads/processes.
>
>> So how do you deal with the possibility that the database will block?
>
> You separate the request and response parts of the DB methods. That's
> how it is implemented internally anyway.
>
> Say no to blocking APIs.

Okay, but how do you handle two simultaneous requests going through
the processing that you see above? You *MUST* separate them onto two
transactions, otherwise one will commit half of the other's work. (Or
are you forgetting Databasing 101 - a transaction should be a logical
unit of work?) And since you can't, with most databases, have two
transactions on one connection, that means you need a separate
connection for each request. Given that the advantages of asyncio
include the ability to scale to arbitrary numbers of connections, it's
not really a good idea to then say "oh but you need that many
concurrent database connections". Most systems can probably handle a
few thousand threads without a problem, but a few million is going to
cause major issues; but most databases start getting inefficient at a
few thousand concurrent sessions.

>> but otherwise, you would need to completely rewrite the main code.
>
> That's a good reason to avoid threads. Once you realize you would have
> been better off with an async approach, you'll have to start over. You
> can easily turn a nonblocking solution into a blocking one but not the
> other way around.

Alright. I'm throwing down the gauntlet. Write me a purely nonblocking
web site concept that can handle a million concurrent connections,
where each one requires one query against the database, and one in a
hundred of them require five queries which happen atomically. I can do
it with a thread pool and blocking database queries, and by matching
the thread pool size and the database concurrent connection limit, I
can manage memory usage fairly easily; how do you do it efficiently
with pure async I/O?

ChrisA



#72498

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-06-03 13:08 +0300
Message-ID: <87ppiquwyw.fsf@elektro.pacujo.net>
In reply to: #72496
Chris Angelico <rosuav@gmail.com>:

> Okay, but how do you handle two simultaneous requests going through
> the processing that you see above? You *MUST* separate them onto two
> transactions, otherwise one will commit half of the other's work. (Or
> are you forgetting Databasing 101 - a transaction should be a logical
> unit of work?) And since you can't, with most databases, have two
> transactions on one connection, that means you need a separate
> connection for each request. Given that the advantages of asyncio
> include the ability to scale to arbitrary numbers of connections, it's
> not really a good idea to then say "oh but you need that many
> concurrent database connections". Most systems can probably handle a
> few thousand threads without a problem, but a few million is going to
> cause major issues; but most databases start getting inefficient at a
> few thousand concurrent sessions.

I will do whatever I have to. Pooling transaction contexts
("connections") is probably necessary. Point is, no task should ever
block.

I deal with analogous situations all the time, in fact, I'm dealing with
one as we speak.

> Alright. I'm throwing down the gauntlet. Write me a purely nonblocking
> web site concept that can handle a million concurrent connections,
> where each one requires one query against the database, and one in a
> hundred of them require five queries which happen atomically. I can do
> it with a thread pool and blocking database queries, and by matching
> the thread pool size and the database concurrent connection limit, I
> can manage memory usage fairly easily; how do you do it efficiently
> with pure async I/O?

Sorry, I'm going to pass. That doesn't look like a 5-liner.


Marko



#72499

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-06-03 20:23 +1000
Message-ID: <mailman.10611.1401791037.18130.python-list@python.org>
In reply to: #72498

On Tue, Jun 3, 2014 at 8:08 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Chris Angelico <rosuav@gmail.com>:
>
>> Okay, but how do you handle two simultaneous requests going through
>> the processing that you see above? You *MUST* separate them onto two
>> transactions, otherwise one will commit half of the other's work.
>
> I will do whatever I have to. Pooling transaction contexts
> ("connections") is probably necessary. Point is, no task should ever
> block.
>
> I deal with analogous situations all the time, in fact, I'm dealing with
> one as we speak.

Rule 1: No task should ever block.
Rule 2: Every task will require the database at least once.
Rule 3: No task's actions on the database should damage another task's
state. (Separate transactions.)
Rule 4: Maximum of N concurrent database connections, for any given value of N.

The only solution I can think of is to have a task wait (without
blocking) for a database connection to be available. That's a lot of
complexity, and you know what? It's going to come to exactly the same
thing as blocking database queries will - your throughput is defined
by your database.
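
The "wait (without blocking) for a database connection" idea can be sketched with `asyncio.Semaphore`. This is only an illustration under assumptions: it uses the `async`/`await` syntax of later Python versions (3.7+ for `asyncio.run`), and `fake_query` is a hypothetical placeholder for a real nonblocking database call. `MAX_DB_CONNECTIONS` plays the role of N in Rule 4.

```python
import asyncio

MAX_DB_CONNECTIONS = 3  # Rule 4's N; the value here is arbitrary


async def fake_query(stats):
    """Hypothetical stand-in for a nonblocking database call."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # yields to the event loop instead of blocking
    stats["active"] -= 1


async def handle_task(sem, stats):
    # Waiting on the semaphore suspends only this coroutine;
    # the event loop keeps running every other task.
    async with sem:
        await fake_query(stats)


async def run_tasks(n_tasks, limit=MAX_DB_CONNECTIONS):
    """Run n_tasks concurrently; return the peak number of active queries."""
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(*(handle_task(sem, stats) for _ in range(n_tasks)))
    return stats["peak"]
```

Calling `asyncio.run(run_tasks(20))` never has more than `MAX_DB_CONNECTIONS` queries in flight, which is the same ceiling a thread pool of that size would impose.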

It's the same with all sorts of other resources. What happens if your
error logging blocks? Do you code everything, *absolutely everything*,
around callbacks? Because ultimately, it adds piles and piles of
complexity and inefficiency, and it still comes back to the same
thing: stuff can make other stuff wait.

That's where threads are simpler. You do blocking I/O everywhere, and
the system deals with the rest. Has its limitations, but sure is
simpler.

ChrisA



#72506

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-06-03 14:12 +0300
Message-ID: <87lhteuu0c.fsf@elektro.pacujo.net>
In reply to: #72499

Chris Angelico <rosuav@gmail.com>:

> your throughput is defined by your database.

Asyncio is not (primarily) a throughput-optimization method. Sometimes
it is a resource consumption optimization method as the context objects
are lighter-weight than full-blown threads.

Mostly asyncio is a way to deal with anything you throw at it. What do
you do if you need to exit the application immediately and your threads
are stuck in a 2-minute timeout? With asyncio, you have full control of
the situation.
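
As a rough illustration of that control, here is a sketch that cancels a task sitting in a two-minute wait. It assumes the later `async`/`await` syntax and `asyncio.run` (Python 3.7+); `slow_operation` and `shut_down_early` are made-up names.

```python
import asyncio


async def slow_operation():
    """Pretend to wait on a peer for two minutes."""
    try:
        await asyncio.sleep(120)
        return "finished"
    except asyncio.CancelledError:
        # Cleanup would go here; re-raise so the cancellation completes.
        raise


async def shut_down_early():
    """Start the slow task, then cancel it almost immediately."""
    task = asyncio.create_task(slow_operation())
    await asyncio.sleep(0.01)  # let the task reach its long wait
    task.cancel()              # request cancellation at its next await
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"
    return "completed"
```

Cancellation is delivered only at an `await` point, so a coroutine that makes a genuinely blocking call cannot be interrupted this way.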

> It's the same with all sorts of other resources. What happens if your
> error logging blocks? Do you code everything, *absolutely everything*,
> around callbacks? Because ultimately, it adds piles and piles of
> complexity and inefficiency, and it still comes back to the same
> thing: stuff can make other stuff wait.

It would be interesting to have an OS or a programming language where no
function returns a value. Linux, in particular, suffers from the
deeply-ingrained system assumption that all file access is synchronous.

However, your protestations seem like a straw man to me. I have really
been practicing event-driven programming for decades. It is fraught with
frustrating complications but they feel like fresh air compared with the
what-now moments I've had to deal with doing multithreaded programming.


Marko



#72613

From: Paul Rubin <no.email@nospam.invalid>
Date: 2014-06-04 00:52 -0700
Message-ID: <7xsinlw1q5.fsf@ruckus.brouhaha.com>
In reply to: #72506

Marko Rauhamaa <marko@pacujo.net> writes:
> Mostly asyncio is a way to deal with anything you throw at it. What do
> you do if you need to exit the application immediately and your threads
> are stuck in a 2-minute timeout? 

Eh?  When the main thread exits, all the child threads go with it.
Sometimes there is some crap in the stderr because of resource cleanups
happening in unexpected order as the various threads exit, but it
all shuts down.  
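
In CPython this behaviour applies to threads marked as daemon threads; non-daemon threads are joined at interpreter shutdown, so the process waits for them. A small sketch, with `stuck_worker` and `start_stuck_worker` as made-up names:

```python
import threading
import time


def stuck_worker():
    """Simulate a thread stuck in a 2-minute timeout."""
    time.sleep(120)


def start_stuck_worker():
    """Start the worker as a daemon so it cannot keep the process alive."""
    thread = threading.Thread(target=stuck_worker, daemon=True)
    thread.start()
    return thread
```

With `daemon=True` the interpreter exits as soon as the main thread finishes and abandons the worker mid-sleep. Without it, exit would be delayed until the sleep ends.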

The new Tulip i/o stuff based on "yield" coroutines should combine the
advantages of async and threads.



#72504

From: Burak Arslan <burak.arslan@arskom.com.tr>
Date: 2014-06-03 14:05 +0300
Message-ID: <mailman.10614.1401793504.18130.python-list@python.org>
In reply to: #72494

On 06/03/14 12:30, Chris Angelico wrote:
> Write me a purely nonblocking
> web site concept that can handle a million concurrent connections,
> where each one requires one query against the database, and one in a
> hundred of them require five queries which happen atomically.


I don't see why that can't be done. Twisted has everything I can think of
except database bits (adb runs on threads), and I got txpostgres[1]
running in production; it seems quite robust so far. What else are we
missing?

[1]: https://pypi.python.org/pypi/txpostgres



#72509

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-06-03 21:57 +1000
Message-ID: <mailman.10617.1401796684.18130.python-list@python.org>
In reply to: #72494

On Tue, Jun 3, 2014 at 9:05 PM, Burak Arslan <burak.arslan@arskom.com.tr> wrote:
> On 06/03/14 12:30, Chris Angelico wrote:
>> Write me a purely nonblocking
>> web site concept that can handle a million concurrent connections,
>> where each one requires one query against the database, and one in a
>> hundred of them require five queries which happen atomically.
>
>
> I don't see why that can't be done. Twisted has everything I can think of
> except database bits (adb runs on threads), and I got txpostgres[1]
> running in production; it seems quite robust so far. What else are we
> missing?
>
> [1]: https://pypi.python.org/pypi/txpostgres

I never said it can't be done. My objection was to Marko's reiterated
statement that asynchronous coding is somehow massively cleaner than
threading; my argument is that threading is often significantly
cleaner than async, and that at worst, they're about the same (because
they're dealing with exactly the same problems).

ChrisA



#72594

From: Burak Arslan <burak.arslan@arskom.com.tr>
Date: 2014-06-04 08:10 +0300
Message-ID: <mailman.10678.1401858648.18130.python-list@python.org>
In reply to: #72494

On 03/06/14 14:57, Chris Angelico wrote:
> On Tue, Jun 3, 2014 at 9:05 PM, Burak Arslan <burak.arslan@arskom.com.tr> wrote:
>> On 06/03/14 12:30, Chris Angelico wrote:
>>> Write me a purely nonblocking
>>> web site concept that can handle a million concurrent connections,
>>> where each one requires one query against the database, and one in a
>>> hundred of them require five queries which happen atomically.
>>
>> I don't see why that can't be done. Twisted has everything I can think of
>> except database bits (adb runs on threads), and I got txpostgres[1]
>> running in production; it seems quite robust so far. What else are we
>> missing?
>>
>> [1]: https://pypi.python.org/pypi/txpostgres
> I never said it can't be done. My objection was to Marko's reiterated
> statement that asynchronous coding is somehow massively cleaner than
> threading; my argument is that threading is often significantly
> cleaner than async, and that at worst, they're about the same (because
> they're dealing with exactly the same problems).

Ah ok. Well, after a couple of years of writing async code, my
not-so-objective opinion is that it forces you to split your
code into functions, just like Python forces you to indent your code
properly. This in turn generally helps the quality of the codebase.

If you manage to keep yourself out of closure hell by not writing
more and more functions inside one another, I say async code and
(non-sloppy) blocking code look almost the same. (Which means, I guess,
that we mostly agree :))
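
To make the "looks almost the same" point concrete, here is a small side-by-side sketch. It uses coroutine syntax from later Python versions (3.7+) rather than Twisted-style callbacks, and the `fetch_*` and `handle_*` functions are invented for illustration.

```python
import asyncio
import time


def fetch_blocking(key):
    """Blocking lookup: the calling thread sleeps during the wait."""
    time.sleep(0.01)
    return key.upper()


def handle_blocking(key):
    value = fetch_blocking(key)
    return f"{key}={value}"


async def fetch_async(key):
    """Nonblocking lookup: the event loop runs other tasks during the wait."""
    await asyncio.sleep(0.01)
    return key.upper()


async def handle_async(key):
    value = await fetch_async(key)
    return f"{key}={value}"
```

The two handlers differ only in the `async` and `await` keywords; with nested callbacks, each wait would instead introduce another inner function.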

Burak



#72609

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-06-04 17:30 +1000
Message-ID: <mailman.10687.1401867048.18130.python-list@python.org>
In reply to: #72494

On Wed, Jun 4, 2014 at 3:10 PM, Burak Arslan <burak.arslan@arskom.com.tr> wrote:
> Ah ok. Well, after a couple of years of writing async code, my not-so-objective
> opinion is that it forces you to split your code into functions,
> just like Python forces you to indent your code properly. This in turn
> generally helps the quality of the codebase.

That's entirely possible, but it depends hugely on your
library/framework, then - see earlier comments in this thread about
Node.js and the nightmare of callbacks.

One thing I'm seeing, though, the more different styles of programming
I work with, is that since it's possible to write good code in pretty
much anything (even PHP, and my last boss used that as a
counter-argument to "PHP sucks"), and since a good programmer will
write good code in anything, neither of these is really a good
argument in favour of (or against) a feature/library/framework/style.
Python forces you to indent your code. Fine! But a good programmer
will already indent, and a sloppy programmer isn't forced to be
consistent. (At worst, you just add "if True:" every time you
unexpectedly indent.) To judge the quality of a framework based on
code style, you need to look at a *bad* programmer and what s/he
produces. A bad programmer, with just GOTO and line numbers, will
often produce convoluted code that's completely unreadable; a bad
programmer with a good suite of structured control flow will more
generally stumble to something that's at least mostly clear.

So how does async vs threaded stack up there? A competent programmer
won't have a problem with either model. A mediocre programmer probably
will think about one thing at a time, and will then run into problems.
Threading produces these problems in one set of ways, asyncio produces
problems in another set of ways. Which one would you, as an expert,
prefer to deal with in a junior programmer's code?

ChrisA
