Date: Thu, 26 Feb 2015 22:04:30 +1030
Subject: asyncio POLLHUP question
From: Chris Laws <clawsicus@gmail.com>
To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.19253.1424950477.18130.python-list@python.org>

I have a system scenario where thousands of applications are running, and
via a service discovery mechanism they all get notified that a service they
are all interested in has come online. They all attempt to connect a TCP
socket to the service, virtually at the same instant.

The problem that I see is that many of the applications that try to connect
to the server get themselves into a state where they are consuming a lot of
CPU.

I am using Python 3.4.2 and asyncio, and have set the server backlog to
4000 in an effort to accommodate the connection request backlog. I am
actually using an event loop from aiozmq (but no ZMQ sockets in this
scenario), but under the covers this is just using epoll, so it should
really be the same as using the DefaultSelector.
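
A minimal reproduction sketch of the connection burst described above,
assuming modern syntax (Python 3.7+, not the `yield from` style that 3.4.2
requires) and the standard asyncio event loop rather than aiozmq's.
`backlog` is a real keyword of `loop.create_server()`, which
`asyncio.start_server()` forwards; the echo handler, the client count and
the `burst` helper are hypothetical.

```python
import asyncio


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Echo one line back to the client, then close the connection.
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def one_client(port: int) -> bytes:
    # Connect, send a line, and return whatever the server echoes back.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply


async def burst(n_clients: int) -> list:
    # backlog=4000 mirrors the value in the post; the kernel may silently
    # cap it (on Linux, at net.core.somaxconn).
    server = await asyncio.start_server(handle, "127.0.0.1", 0, backlog=4000)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # Every client connects at once, like the service-discovery burst.
        results = await asyncio.gather(
            *(one_client(port) for _ in range(n_clients)),
            return_exceptions=True,
        )
    return results


if __name__ == "__main__":
    results = asyncio.run(burst(200))
    failures = [r for r in results if isinstance(r, BaseException)]
    print(f"{len(results) - len(failures)} ok, {len(failures)} failed")
```

Pushing the client count toward the thousands mentioned above will likely
require raising the process's file-descriptor limit first, since each
client and each accepted connection holds an fd.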

Using strace on the apps exhibiting issues, I see that a socket is
continuously triggering a POLLERR|POLLHUP event. This is the cause of the
large CPU usage. The socket is the one that was attempting to connect to
the new service that was just brought up.

I am guessing that the POLLHUP is caused by the server having issues
processing the volume of connect requests.

I think I need to drop/close the socket causing the POLLHUP. However, from
looking through the asyncio source code, I don't see how I can do that from
within the _selector.select() or _process_events() functions when all I
know is which fd is causing the issue.

How do poll errors propagate up from the select loop?
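
A hedged sketch of how an error can surface through the standard
`selectors` module, using a hypothetical non-blocking connect to a local
port with nothing listening. As far as I can tell from CPython's
`selectors.py`, `EpollSelector` reports any epoll bits other than EPOLLIN
as `EVENT_WRITE` and any bits other than EPOLLOUT as `EVENT_READ`, so
EPOLLERR|EPOLLHUP shows up as ordinary readiness rather than as a distinct
error event; the handler must discover the error itself, here via
`getsockopt(SO_ERROR)`. `BaseSelector.get_key()` accepts a raw fd, which
covers the "only the fd is known" case. The `on_failure` callback and both
helper functions are illustrative, not asyncio APIs.

```python
import errno
import selectors
import socket


def closed_local_port() -> int:
    # Bind an ephemeral port and release it without listening, so a later
    # connect() to it should be refused (a small race is possible).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def connect_and_check(port: int, timeout: float = 2.0) -> int:
    """Return the errno of a non-blocking connect to 127.0.0.1:port (0 = ok)."""
    failures = []

    def on_failure(err: int) -> None:
        # Hypothetical error hook: a real client would schedule a reconnect.
        failures.append(err)

    sel = selectors.DefaultSelector()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        err = sock.connect_ex(("127.0.0.1", port))
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            return err  # loopback connects can fail synchronously
        sel.register(sock, selectors.EVENT_WRITE, data=on_failure)
        for key, _events in sel.select(timeout):
            fd = key.fd
            # Given only the fd, get_key() recovers the registration and data.
            key = sel.get_key(fd)
            err = key.fileobj.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err:
                # Unregister before reacting: a level-triggered poller would
                # otherwise keep reporting this fd on every select() call.
                sel.unregister(fd)
                key.data(err)
        return failures[0] if failures else 0
    finally:
        sel.close()
        sock.close()


if __name__ == "__main__":
    err = connect_and_check(closed_local_port())
    print(errno.errorcode.get(err, err))
```

On Linux the refused connect is expected to end up as ECONNREFUSED, whether
it fails synchronously in `connect_ex()` or later through the selector.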

I could potentially unregister the fd, but as far as I can tell this will
not cause the transport/protocol to be closed, which means my normal
error-handling code never gets a chance to reconnect to the service. The
asyncio select functions seem to ignore events other than EVENT_READ and
EVENT_WRITE.

Any help would be appreciated.

Regards,
Chris
