
Using asyncio workers in a `concurrent.futures` interface

Started by	cool-RR <ram.rachum@gmail.com>
First post	2014-08-12 10:02 -0700
Last post	2014-08-13 00:03 -0600
Articles	6 — 4 participants


  Using asyncio workers in a `concurrent.futures` interface cool-RR <ram.rachum@gmail.com> - 2014-08-12 10:02 -0700
    Re: Using asyncio workers in a `concurrent.futures` interface Marko Rauhamaa <marko@pacujo.net> - 2014-08-12 21:31 +0300
    Re: Using asyncio workers in a `concurrent.futures` interface MRAB <python@mrabarnett.plus.com> - 2014-08-12 19:43 +0100
    Re: Using asyncio workers in a `concurrent.futures` interface Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-12 15:31 -0600
      Re: Using asyncio workers in a `concurrent.futures` interface Marko Rauhamaa <marko@pacujo.net> - 2014-08-13 08:03 +0300
        Re: Using asyncio workers in a `concurrent.futures` interface Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-13 00:03 -0600

#76124 — Using asyncio workers in a `concurrent.futures` interface

From	cool-RR <ram.rachum@gmail.com>
Date	2014-08-12 10:02 -0700
Subject	Using asyncio workers in a `concurrent.futures` interface
Message-ID	<d6e2ab63-c9bc-4271-abac-17e32afa4c88@googlegroups.com>

Hello everybody! I have a question.

I have a Django app running on Heroku. I need to run about 100 worker threads there to do uploads/downloads simultaneously. A Heroku Dyno has only 512MB of memory, so I'm reluctant to run 100 worker threads. (I've had Dynos crash from lack of memory when using 6 threads before.)

I heard that the asyncio module is mature and ready for usage, and I was happy because I kept hearing about it in the last year, and I saw Guido's lecture about it. If I understand correctly it would let me run multiple uploads and downloads efficiently in one thread, which would conserve more resources than using threads. (Please correct me if I'm wrong.)

Now, I am a little clueless about the whole way it's built, using coroutines and tricky usage of `yield from`. I figured that since this is eventually a library for concurrency, i.e. doing many tasks at the same time, there will be an API in a style of "Here are 100 tasks for you to do concurrently, let me know when they're done."

I looked at the asyncio documentation page and saw that it does mention futures and executors, which is my favorite interface for doing concurrency. I was happy and I skimmed the docs. But, I couldn't find a simple way to use these. I don't want to learn how to define coroutines and use `yield from` to switch between them. (I use `yield from` regularly and fully understand how it works, I just don't write my programs that way.)

What I'm expecting is something like this:

    download_file = lambda url: requests.get(url).content
    urls = ['http://google.com/file1.jpg', 'http://google.com/file2.jpg', 'http://google.com/file3.jpg'] # etc.

    with AsyncIOExecutor() as asyncio_executor:
        files = asyncio_executor.map(download_file, urls)

And that's it, no coroutines, no `yield from`. Since, if I understand correctly, asyncio requires a mainloop, it would make sense for the AsyncIOExecutor to have a thread of its own in which it could run its mainloop.

Is this possible? Did someone implement this?

Thanks for your help,
Ram.


#76129

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-08-12 21:31 +0300
Message-ID	<87egwlef15.fsf@elektro.pacujo.net>
In reply to	#76124

cool-RR <ram.rachum@gmail.com>:

> If I understand correctly [asyncio] would let me run multiple uploads
> and downloads efficiently in one thread, which would conserve more
> resources than using threads.

Asyncio does make it convenient to multiplex events on one or more
threads. Threads have their uses (exploiting multiple CPUs), only you
shouldn't associate threads with every state machine, IMO. Asyncio
allows you to separate your state machines from your threads. For
example, you might have 1,000 state machines (for 1,000 connections) but
only 8 threads for 4 CPUs.

> Now, I am a little clueless about the whole way it's built, using
> coroutines and tricky usage of `yield from`.

Asyncio actively supports (at least) *two* multiplexing models:
callbacks (aka listeners or handlers) and coroutines. Programming with
callbacks involves storing the state explicitly in a state machine
object. The callbacks are supposed to never block but return
immediately. That model is a long-time favorite of many, including me.
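
A minimal sketch of the callback style described above, assuming a toy
countdown as the "protocol". The names `Countdown` and `run_countdown` are
hypothetical; only `call_soon`, `call_later`, `create_future` and
`run_until_complete` are asyncio's own. All state lives on the object, and
each callback returns at once after rescheduling itself.

```python
import asyncio


class Countdown:
    """Explicit state machine: the state lives on the object, not on a call stack."""

    def __init__(self, loop, start, done):
        self.loop = loop
        self.remaining = start
        self.done = done      # asyncio future resolved when the countdown ends
        self.ticks = []

    def on_tick(self):
        # A callback: do a little work, reschedule, and return immediately.
        self.ticks.append(self.remaining)
        self.remaining -= 1
        if self.remaining > 0:
            self.loop.call_later(0.001, self.on_tick)
        else:
            self.done.set_result(self.ticks)


def run_countdown(start):
    loop = asyncio.new_event_loop()
    try:
        done = loop.create_future()
        machine = Countdown(loop, start, done)
        loop.call_soon(machine.on_tick)
        return loop.run_until_complete(done)
    finally:
        loop.close()
```

Calling `run_countdown(3)` drives the machine through three ticks without
any callback ever blocking.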

The coroutine model is highly analogous with the multithreading model in
that you store the state implicitly in the code. Multithreading marks
the state with blocking function calls. Coroutines mark the state with
"yield from" statements. Otherwise, a multithreading implementation will
have very much the shape of a coroutine implementation.
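
The shape of the coroutine model can be sketched as follows. This uses the
`async`/`await` syntax that later replaced `yield from`-based coroutines in
asyncio, so it is not what was available when this thread was written.
`fake_download` and `run_all` are hypothetical names, and `asyncio.sleep`
stands in for real network I/O. The sketch records thread identifiers to
show that every coroutine runs in the same thread.

```python
import asyncio
import threading


async def fake_download(n, seen_threads):
    # Stand-in for a network transfer; awaiting hands control back to the loop.
    seen_threads.add(threading.get_ident())
    await asyncio.sleep(0.01)
    return n * 2


async def run_all(count):
    seen_threads = set()
    results = await asyncio.gather(
        *(fake_download(n, seen_threads) for n in range(count))
    )
    return results, seen_threads
```

For example, `asyncio.run(run_all(100))` completes all hundred simulated
transfers concurrently, and `seen_threads` ends up holding a single thread
identifier.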

The funky aspect of the coroutines is the way they "abuse" the "yield
from" statement, whose original purpose is to pass a series of results
from a generator to the caller by chaining lower-level generators.
Coroutines employ a "trick:" the "yield from" statement does not pass
any meaningful results. Instead, the statement is used to make the
generator/coroutine pseudoblock and switch context within the same
thread.
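
For comparison, the original generator-chaining purpose of "yield from" can
be shown in a few lines. This is a sketch with hypothetical generator names
`inner` and `outer`, and it involves no asyncio at all.

```python
def inner():
    yield 1
    yield 2
    return "inner done"   # becomes the value of the `yield from` expression


def outer():
    # Re-yields 1 and 2 to outer's caller, then receives inner's return value.
    result = yield from inner()
    yield result
```

Here `list(outer())` produces the two values from `inner` followed by its
return value, which is the "series of results" use that coroutines
repurpose.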

There's one crucial advantage coroutines have over threads: you can
multiplex events. If your thread is blocked on, say, reading a byte, you
can't tell it to stop waiting and do something else instead. Coroutines
can be made to wait on alternative stimuli.

> I looked at the asyncio documentation page and saw that it does
> mention futures and executors, which is my favorite interface for
> doing concurrency.

They are there.

My favorite model is the Actor Model, where objects communicate with
each other and the outside world through asynchronous stimuli of sorts.
The thinking goes: something happened, so how do I react to it. The
actor model just needs a class with the member "self.state", which
contains the name of the internal state of the object. Then the object's
callback methods receive the inputs, send out messages, and adjust the
state.
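
As an illustration only, an actor of that kind might look like the sketch
below. The class `Connection`, its state names, and the `outbox` list used
to collect outgoing messages are all hypothetical and not taken from any
code posted in this thread.

```python
class Connection:
    """Toy actor: `self.state` names the current state; methods react to stimuli."""

    def __init__(self):
        self.state = "IDLE"
        self.outbox = []   # messages this actor has sent out

    def on_connect(self):
        if self.state == "IDLE":
            self.state = "CONNECTED"
            self.outbox.append("hello")

    def on_data(self, data):
        # Data arriving in any state other than CONNECTED is ignored.
        if self.state == "CONNECTED":
            self.outbox.append("ack:" + data)

    def on_close(self):
        self.state = "CLOSED"
```

Each method answers the question "something happened, so how do I react to
it" by checking `self.state`, emitting messages, and moving to the next
state.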

Both the actor model and the coroutines produce somewhat messy code that
is somewhat hard to get right. However, that's a reflection of how messy
the reality is. Any attempts to pave it over will ultimately result in
more trouble.

>     download_file = lambda url: requests.get(url).content
>     urls = ['http://google.com/file1.jpg',
>     'http://google.com/file2.jpg', 'http://google.com/file3.jpg'] #
>     etc.
>     
>     with AsyncIOExecutor() as asyncio_executor:
>         files = asyncio_executor.map(download_file, urls)
>
> And that's it, no coroutines, no `yield from`.

I'm not quite following you. However, one feature of the coroutines is
that you must use "yield from" everywhere. You can't delegate it to a
subroutine and forget about it.

That's my main problem with coroutines. The simple function call syntax:

    y = f(x)

is replaced with the weird:

    y = yield from f(x)

> Since, if I understand correctly, asyncio requires a mainloop, it
> would make sense for the AsyncIOExecutor to have a thread of its own
> in which it could run its mainloop.
>
> Is this possible? Did someone implement this? 

Unfortunately, I have not. I have just implemented a toy example to
satisfy my curiosity. All of my code uses callbacks and select.epoll()
-- to great success.

Here's my exploration in the classic dining philosophers problem:

   <URL: http://pacujo.net/marko/philosophers.py>

I have "fixed" the faulty protocol by having an assistant break the
deadlock by occasionally nagging the philosophers to drop everything and
get back to thinking.

The main accomplishment of the exercise was that I convinced myself the
coroutines can be used for serious things as coroutines can multiplex
stimuli with asyncio.wait(..., return_when=asyncio.FIRST_COMPLETED).
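
A minimal sketch of that kind of multiplexing, assuming two competing
stimuli simulated with `asyncio.sleep`. The names `stimulus` and
`wait_for_first` are hypothetical, and the code uses the later
`async`/`await` syntax rather than "yield from".

```python
import asyncio


async def stimulus(name, delay):
    await asyncio.sleep(delay)
    return name


async def wait_for_first():
    tasks = [
        asyncio.ensure_future(stimulus("data", 0.5)),
        asyncio.ensure_future(stimulus("timeout", 0.01)),
    ]
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()   # stop waiting on the stimuli that lost the race
    # Let the cancelled tasks finish unwinding before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return [task.result() for task in done]
```

The coroutine wakes up on whichever stimulus arrives first and abandons the
other, which a thread blocked in a single read cannot do.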


Marko


#76131

From	MRAB <python@mrabarnett.plus.com>
Date	2014-08-12 19:43 +0100
Message-ID	<mailman.12891.1407869004.18130.python-list@python.org>
In reply to	#76124

On 2014-08-12 18:02, cool-RR wrote:
> Hello everybody! I have a question.
>
> I have a Django app running on Heroku. I need to run about 100 worker
> threads there to do uploads/downloads simultaneously. A Heroku Dyno
> has only 512MB of memory, so I'm reluctant to run 100 worker threads.
> (I've had Dynos crash from lack of memory when using 6 threads
> before.)
>
> I heard that the asyncio module is mature and ready for usage, and I
> was happy because I kept hearing about it in the last year, and I saw
> Guido's lecture about it. If I understand correctly it would let me
> run multiple uploads and downloads efficiently in one thread, which
> would conserve more resources than using threads. (Please correct me
> if I'm wrong.)
>
> Now, I am a little clueless about the whole way it's built, using
> coroutines and tricky usage of `yield from`. I figured that since
> this is eventually a library for concurrency, i.e. doing many tasks
> at the same time, there will be an API in a style of "Here are 100
> tasks for you to do concurrently, let me know when they're done."
>
> I looked at the asyncio documentation page and saw that it does
> mention futures and executors, which is my favorite interface for
> doing concurrency. I was happy and I skimmed the docs. But, I
> couldn't find a simple way to use these. I don't want to learn how to
> define coroutines and use `yield from` to switch between them. (I use
> `yield from` regularly and fully understand how it works, I just
> don't write my programs that way.)
>
> What I'm expecting is something like this:
>
>      download_file = lambda url: requests.get(url).content
>      urls = ['http://google.com/file1.jpg', 'http://google.com/file2.jpg', 'http://google.com/file3.jpg'] # etc.
>
>      with AsyncIOExecutor() as asyncio_executor:
>          files = asyncio_executor.map(download_file, urls)
>
> And that's it, no coroutines, no `yield from`. Since, if I understand
> correctly, asyncio requires a mainloop, it would make sense for the
> AsyncIOExecutor to have a thread of its own in which it could run its
> mainloop.
>
> Is this possible? Did someone implement this?
>
Do you really need to upload/download that many at the same time? I'd
put them into a queue and have only a few of them uploading/downloading at
any time.
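
One way to sketch that suggestion is a small thread pool, which queues the
pending items internally and runs only `max_workers` of them at once. The
function `transfer_all` is hypothetical, `time.sleep` stands in for the
real upload or download, and the limit of 4 is an arbitrary assumption.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def transfer_all(items, max_workers=4):
    lock = threading.Lock()
    active = 0
    peak = 0

    def transfer(item):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.005)   # stand-in for the actual upload/download
        with lock:
            active -= 1
        return item.upper()

    # Items beyond max_workers wait in the pool's internal queue.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(transfer, items))
    return results, peak
```

The returned `peak` is only there to show that no more than `max_workers`
transfers were in flight at any moment.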


#76152

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2014-08-12 15:31 -0600
Message-ID	<mailman.12901.1407879145.18130.python-list@python.org>
In reply to	#76124

On Tue, Aug 12, 2014 at 11:02 AM, cool-RR <ram.rachum@gmail.com> wrote:
> And that's it, no coroutines, no `yield from`. Since, if I understand correctly, asyncio requires a mainloop, it would make sense for the AsyncIOExecutor to have a thread of its own in which it could run its mainloop.

I think that putting the event loop in a separate thread would be
necessary if the intention is that the executor be invoked from
outside. I'm not aware of this work having been done, but it sounds
perfectly feasible.


#76169

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-08-13 08:03 +0300
Message-ID	<8738d1dlru.fsf@elektro.pacujo.net>
In reply to	#76152

Ian Kelly <ian.g.kelly@gmail.com>:

> On Tue, Aug 12, 2014 at 11:02 AM, cool-RR <ram.rachum@gmail.com> wrote:
>> And that's it, no coroutines, no `yield from`. Since, if I understand
>> correctly, asyncio requires a mainloop, it would make sense for the
>> AsyncIOExecutor to have a thread of its own in which it could run its
>> mainloop.
>
> I think that putting the event loop in a separate thread would be
> necessary if the intention is that the executor be invoked from
> outside. I'm not aware of this work having been done, but it sounds
> perfectly feasible.

Multithreading will require normal locking to protect critical sections.
Care must be taken to never yield while holding a threading lock.


Marko


#76170

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2014-08-13 00:03 -0600
Message-ID	<mailman.12908.1407909872.18130.python-list@python.org>
In reply to	#76169


On Tue, Aug 12, 2014 at 11:03 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>
> Ian Kelly <ian.g.kelly@gmail.com>:
>
> > On Tue, Aug 12, 2014 at 11:02 AM, cool-RR <ram.rachum@gmail.com> wrote:
> >> And that's it, no coroutines, no `yield from`. Since, if I understand
> >> correctly, asyncio requires a mainloop, it would make sense for the
> >> AsyncIOExecutor to have a thread of its own in which it could run its
> >> mainloop.
> >
> > I think that putting the event loop in a separate thread would be
> > necessary if the intention is that the executor be invoked from
> > outside. I'm not aware of this work having been done, but it sounds
> > perfectly feasible.
>
> Multithreading will require normal locking to protect critical sections.
> Care must be taken to never yield while holding a threading lock.

You can use event_loop.call_soon_threadsafe() to schedule tasks and
callbacks on the event loop, and Queue objects to pass futures back to the
caller. Apart from whatever synchronization those use internally, I don't
think any locking would be needed.

However, it would of course require that the "functions" passed in to the
executor be coroutines in actuality. You just can't pass a blocking
function into an asynchronous framework and expect it to magically not
block.
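
A rough sketch of such an executor, under two assumptions. First, it
relies on `asyncio.run_coroutine_threadsafe`, which was added to asyncio
after this thread (Python 3.5.1) and wraps the `call_soon_threadsafe`
scheduling described above. Second, the class `AsyncIOExecutor` and the
coroutine `fake_download` are hypothetical names for illustration, not an
existing library. As noted, the work items must be coroutine functions; a
blocking call such as `requests.get` would still stall the loop.

```python
import asyncio
import threading


class AsyncIOExecutor:
    """Hypothetical executor running coroutine functions on a loop in its own thread."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(
            target=self._loop.run_forever, daemon=True
        )
        self._thread.start()

    def submit(self, coro_func, *args):
        # Returns a concurrent.futures.Future usable from the calling thread.
        return asyncio.run_coroutine_threadsafe(coro_func(*args), self._loop)

    def map(self, coro_func, iterable):
        futures = [self.submit(coro_func, item) for item in iterable]
        return [future.result() for future in futures]

    def shutdown(self):
        # Does not wait for futures still pending from submit().
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.shutdown()
        return False


async def fake_download(url):
    await asyncio.sleep(0.01)   # stand-in for non-blocking network I/O
    return "contents of " + url
```

Used as `with AsyncIOExecutor() as executor: files =
executor.map(fake_download, urls)`, the caller sees an interface close to
the one originally asked for, while all the transfers share the single loop
thread.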

