From: Chris Laws
Newsgroups: comp.lang.python
Subject: asyncio POLLHUP question
Date: Thu, 26 Feb 2015 22:04:30 +1030
To: python-list@python.org

I have a system scenario where thousands of applications are running and, via a service discovery mechanism, they all get notified that a service they are all interested in has come online. They all attempt to connect a TCP socket to the service. This happens virtually instantly.

The problem that I see is that many of the applications that try to connect to the server get themselves into a state where they are consuming a lot of CPU.

I am using Python 3.4.2 with asyncio, and have set the server backlog to 4000 in an effort to accommodate the connection request backlog. I am actually using an event loop from aiozmq (but no ZMQ sockets in this scenario), but under the covers this just uses epoll, so it should really be the same as using the DefaultSelector.
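
A detail that may be relevant to the backlog setting: on Linux, the `listen()` backlog is silently capped at `net.core.somaxconn`, so a requested value of 4000 is not necessarily what the kernel uses. Below is a minimal sketch for checking this, assuming a Linux host and a server built on `asyncio.start_server`. The names `effective_backlog` and `start_service` are hypothetical and not taken from the application described here, and the `async`/`await` syntax needs Python 3.5+ (on 3.4 it would be `@asyncio.coroutine` with `yield from`).

```python
import asyncio
from pathlib import Path

# Linux-specific location of the kernel's upper bound for listen() backlogs.
SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")


def effective_backlog(requested, somaxconn_path=SOMAXCONN_PATH):
    """Return the backlog the kernel is likely to honour for `requested`."""
    try:
        limit = int(somaxconn_path.read_text().strip())
    except (OSError, ValueError):
        # Not Linux, or the file is unreadable: the limit is unknown.
        return requested
    return min(requested, limit)


async def start_service(host="127.0.0.1", port=0, backlog=4000):
    """Start a placeholder server with an explicit accept backlog."""

    async def handle(reader, writer):
        # Placeholder handler: a real service would speak its protocol here.
        writer.close()

    # `backlog` is passed through to loop.create_server() and then listen().
    return await asyncio.start_server(handle, host, port, backlog=backlog)
```

If `effective_backlog(4000)` returns something much smaller than 4000, raising `net.core.somaxconn` on the server host would be one thing to try before changing the clients.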

Using strace on the apps exhibiting issues, I see that a socket is continuously triggering a POLLERR|POLLHUP event. This is the cause of the high CPU usage. The socket is the one that was attempting to connect to the new service that was just brought up.
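
The symptom can be reproduced in miniature outside of asyncio. This is a rough sketch, assuming a loopback port with no listener stands in for the overloaded server. `unused_local_port` and `probe_failed_connect` are hypothetical helpers, not part of asyncio or aiozmq.

```python
import errno
import selectors
import socket
from collections import namedtuple

ProbeResult = namedtuple("ProbeResult", "wakeups error")


def unused_local_port():
    """Return a loopback port that (most likely) has no listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tmp:
        tmp.bind(("127.0.0.1", 0))
        return tmp.getsockname()[1]


def probe_failed_connect(port, polls=3):
    """Start a non-blocking connect that will fail, then poll the socket."""
    sel = selectors.DefaultSelector()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        rc = sock.connect_ex(("127.0.0.1", port))
        if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            # Some platforms report the failure straight away.
            return ProbeResult(wakeups=0, error=rc)
        sel.register(sock, selectors.EVENT_WRITE)
        wakeups = 0
        for _ in range(polls):
            # The fd is never unregistered inside this loop, so a
            # level-triggered selector is expected to report it each time.
            if sel.select(timeout=1.0):
                wakeups += 1
        # SO_ERROR holds the pending error behind the POLLERR report.
        error = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        return ProbeResult(wakeups=wakeups, error=error)
    finally:
        # Unregistering and closing the socket is what stops the wakeups.
        sel.close()
        sock.close()
```

Calling `probe_failed_connect(unused_local_port())` should show the fd being reported on every `select()` call for as long as it stays registered, which would be consistent with the CPU usage seen in strace. The `error` field is the code read from `SO_ERROR`.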

I am guessing that the POLLHUP is caused by the server having issues processing the volume of connect requests.

I think I need to drop/close the socket causing the POLLHUP. However, from looking through the asyncio source code I don't see how I can do that from within the _selector.select() or _process_events() functions with only the knowledge of which fd is causing the issue.

How do poll errors propagate up from the select loop?
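
As far as the CPython `selectors` source shows, `EpollSelector.select()` does not drop `EPOLLERR`/`EPOLLHUP`. It folds them into `EVENT_READ` and `EVENT_WRITE`, so the error is expected to surface when the registered reader or writer callback touches the socket. The following paraphrases that mapping for illustration and is not the library code itself. `to_selector_events` is a hypothetical name, and the flag values are the Linux ones.

```python
import selectors

# Numeric values of the Linux epoll flags, spelled out so the sketch also
# runs on platforms where the select module has no epoll constants.
EPOLLIN = 0x001
EPOLLOUT = 0x004
EPOLLERR = 0x008
EPOLLHUP = 0x010


def to_selector_events(epoll_mask, registered):
    """Fold a raw epoll mask into EVENT_READ / EVENT_WRITE."""
    events = 0
    if epoll_mask & ~EPOLLIN:
        # Anything other than plain readability counts as writable.
        events |= selectors.EVENT_WRITE
    if epoll_mask & ~EPOLLOUT:
        # Anything other than plain writability counts as readable.
        events |= selectors.EVENT_READ
    # Only the events the fd was registered for are reported.
    return events & registered
```

Under this mapping, an `EPOLLERR | EPOLLHUP` report on an fd registered for writing reaches the event loop as a plain `EVENT_WRITE`. Whether the callback attached to that fd then checks `SO_ERROR` and closes the socket depends on the loop implementation in use.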

I can potentially unregister the fd, but as far as I can tell this will not trigger the transport/protocol getting closed, which prevents my normal error handling from attempting to reconnect to the service. The asyncio select functions seem to ignore events other than EVENT_READ and EVENT_WRITE.
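
For the reconnect path, here is a minimal sketch of handling a failed connect at the coroutine level rather than inside the selector loop. It assumes the client connects through `asyncio.open_connection` (the real application may use `create_connection` with a protocol instead) and uses `async`/`await` syntax from Python 3.5+; on 3.4 the same logic would use `@asyncio.coroutine` and `yield from`. The name `connect_with_backoff` and its parameters are hypothetical.

```python
import asyncio
import random


async def connect_with_backoff(host, port, attempts=5, base_delay=0.1,
                               max_delay=5.0):
    """Try to connect, sleeping a random, growing delay after each failure.

    Returns a (reader, writer) pair, or None if every attempt failed.
    """
    for attempt in range(attempts):
        try:
            return await asyncio.open_connection(host, port)
        except OSError:
            # A refused or reset connect surfaces here as an exception.
            # Jitter spreads the clients out so that they do not all
            # retry at the same instant.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, ceiling))
    return None
```

The randomised delay is a design choice aimed at the "thousands of clients connect at once" part of the scenario. It does not by itself explain or fix an fd that stays registered after an error.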

Any help would be appreciated.

Regards,
Chris
