Path: csiph.com!usenet.pasdenom.info!gegeweb.org!de-l.enfer-du-nord.net!feeder1.enfer-du-nord.net!feeds.phibee-telecom.net!newsfeed.xs4all.nl!newsfeed6.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.000 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'else:': 0.03; 'exception': 0.03; 'elif': 0.04; 'log:': 0.04; 'true,': 0.04; 'error:': 0.05; 'skip:" 60': 0.05; "'',": 0.07; 'data:': 0.07; 'false,': 0.07; 'msg': 0.07; 'parameter': 0.07; 'reason,': 0.07; 'skip:$ 30': 0.07; 'wrapped': 0.07; '**kwargs)': 0.09; '0.1': 0.09; 'ioerror:': 0.09; 'loop.': 0.09; 'parameter.': 0.09; 'rows': 0.09; 'runtime': 0.09; 'success:': 0.09; 'way:': 0.09; 'wrong,': 0.09; 'def': 0.10; 'cases': 0.15; 'server,': 0.15; "skip:' 30": 0.15; "'host':": 0.16; '0.001': 0.16; '0.01': 0.16; '10);': 0.16; '67,': 0.16; 'amd64': 0.16; 'cache:': 0.16; 'caching': 0.16; 'closed")': 0.16; 'dropping': 0.16; 'firebug': 0.16; 'linux).': 0.16; 'messages)': 0.16; 'modified.': 0.16; 'monitors': 0.16; 'poll': 0.16; 'processed.': 0.16; 'sec': 0.16; 'seconds,': 0.16; 'seconds.': 0.16; 'skip:{ 30': 0.16; 'terribly': 0.16; 'time.time()': 0.16; 'timeout:': 0.16; 'type:': 0.16; '});': 0.16; 'copied': 0.17; 'thanks,': 0.18; '(or': 0.18; 'load': 0.19; 'skip:p 30': 0.20; 'skip:" 40': 0.20; 'browsers': 0.22; 'disable': 0.22; 'dropped': 0.22; 'jquery': 0.22; 'sends': 0.22; 'tells': 0.22; 'this:': 0.23; 'raise': 0.24; 'second': 0.24; 'connected': 0.24; 'linux': 0.24; 'header:User-Agent:1': 0.26; '(most': 0.27; 'possible,': 0.27; 'see,': 0.27; 'skip:" 50': 0.27; 'post': 0.28; '"no': 0.29; '20%': 0.29; 'cpu': 0.29; 'email name:': 0.29; 'queue': 0.29; 'reduced': 0.29; 'starts': 0.29; 'skip:_ 10': 0.29; 'connection': 0.30; "skip:' 10": 0.30; 'checks': 0.30; 'url:non-standard http port': 0.30; 'function': 0.30; 'figure': 0.30; 'seconds': 0.30; 'point': 0.31; 'gets': 0.32; 'problem.': 0.32; 'file': 0.32; "skip:' 20": 0.32; 'could': 0.32; 'skip:_ 30': 0.32; 'print': 0.32; 'goes': 0.33; 'skip:s 30': 0.33; 'messages,': 0.33; 'traceback': 0.33; 'to:addr:python-list': 0.33; 'another': 0.33; 'server': 0.35; 'skip:b 50': 0.35; 'open': 0.35; 'doing': 0.35; 'received:192.168.0': 0.35; 'sometimes': 0.35; 'something': 0.35; 'there': 0.35; 'but': 0.36; 'method': 0.36; 'anything': 0.36; 'client': 0.36; 'should': 0.36; 'skip:t 40': 0.37; 'does': 0.37; 'two': 0.37; 'time,': 0.62; 'url:ip addr': 0.62; 'more': 0.63; 'within': 0.64; 'here': 0.65; 'total': 0.65; 'jul': 0.65; 'url:0': 0.67; 'operated': 0.71; '1997': 0.84; '832': 0.84; "browser's": 0.84; 'catch(err)': 0.84; 'inactive,': 0.84; 'received:192.168.0.101': 0.84; 'received:192.168.13': 0.84; 'received:62.179': 0.84; 'received:62.179.121': 0.84; 'received:upcmail.net': 0.84; 'sec.': 0.84; 'why?': 0.84; 'checks.': 0.91; 'faster.': 0.91; 'outgoing': 0.91; 'response,': 0.93; 'clients,': 0.95; 'instant': 0.98 X-SourceIP: 89.134.225.226 Date: Mon, 03 Sep 2012 08:31:43 +0200 From: Laszlo Nagy User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: python-list@python.org Subject: tornado.web ioloop add_timeout eats CPU Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 175 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1346653953 news.xs4all.nl 6978 [2001:888:2000:d::a6]:41639 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:28341 JavaScript clients (browsers) do long poll requests. Each request can take up to 10 seconds before the server responds. On the server side, every client has a queue of messages that needs to be sent to the client. When the long poll request comes in, the server checks if there are messages to be sent out. If there are no outgoing messages, then it does not finish the response, but calls ioloop's add_timeout method for doing further checks. After 10 seconds (if there are no new messages) the server returns 304/not modified. If there is a message, then it is sent back to the client as fast as possible, and the client comes back with another long poll immediately. These message queues are used for UI updates and also for instant messaging. UI must be responsive. For this reason, any message in the outgoing queue should be sent out to the client within 0.1 seconds. Sometimes (rarely) lots of messages arrive quickly, and in those cases it would be good to send them out even faster. What I did is that in the first 0.1 seconds, I call add_timeout with 0.01 seconds. So if the outgoing queue is full of messages, then they are delivered quickly. After 0.1 seconds lapsed, add_timeout is called with 0.1 sec parameter. So the server load is reduced because most clients are inactive, and they are going to get callbacks in every 0.1 sec. Here are the two most important methods of my request handler: @tornado.web.asynchronous def post(self): """Handle POST requests.""" # Disable caching self.set_header("Cache-Control","no-cache, must-revalidate") self.set_header("Expires","Mon, 26 Jul 1997 05:00:00 GMT") self.poll_start = time.time() action = self.get_argument("action") if action=="poll": self.poll() elif action=="message": self.process_incoming(self.get_argument("message")) else: self.set_status(400) self.finish() def poll(self): """Handle POLL request for the browser's message loop. This method monitors the outgoing message queue, and sends new messages to the browser when they come in (or until self.poll_interval seconds elapsed).""" poll_elapsed = time.time() - self.poll_start if poll_elapsed<0.1: poll_step = 0.01 else: poll_step = 0.1 if poll_elapsed:1() 1135775 832.283 0.001 832.283 0.001 {method 'poll' of 'select.epoll' objects} I have copied out the two relevant rows only. As you can see, total runtime was 845 seconds, and 832 seconds were spent in "epoll". Apparently, CPU load goes up linearly as I connect more clients. It means that 50 connected clients would do 20% CPU load. Which is ridiculous, because they don't do anything but wait for messages to be processed. Something terribly wrong, but I cannot figure out what? Actually I could not try this with 50 clients. If I open 15 clients, then the server starts dropping connections. (Tried from Firefox and Chrome too.) If I change the poll() method this way: else: print "No messages after %.2f seconds"%poll_elapsed self.set_status(304) self.finish() then I see this in the log: No messages after 10.01 seconds ERROR:root:Uncaught exception POST /client (127.0.0.1) HTTPRequest(protocol='http', host='127.0.0.1:8888', method='POST', uri='/client', version='HTTP/1.1', remote_ip='127.0.0.1', body='_xsrf=df157469a62142d7b28c5a4880dd8478&action=poll', headers={'Referer': 'http://127.0.0.1:8888/', 'Content-Length': '50', 'Accept-Language': 'en-us;q=0.8,en;q=0.5', 'Accept-Encoding': 'gzip, deflate', 'Host': '127.0.0.1:8888', 'Accept': '*/*', 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0', 'Connection': 'keep-alive', 'X-Requested-With': 'XMLHttpRequest', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'Cookie': 'sid="MS1acHd5b3V1WHFOQU1BbTVmSXJEeVhkLys=|1346652787|e045d786fdb89b73220a2c77ef89572d0c16901e"; _xsrf=df157469a62142d7b28c5a4880dd8478; xsfr=df157469a62142d7b28c5a4880dd8478', 'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'}) Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/tornado/stack_context.py", line 183, in wrapped callback(*args, **kwargs) File "/home/gandalf/Python/Projects/test/client.py", line 67, in poll self.finish() File "/usr/lib/python2.7/dist-packages/tornado/web.py", line 641, in finish self.request.finish() File "/usr/lib/python2.7/dist-packages/tornado/httpserver.py", line 411, in finish self.connection.finish() File "/usr/lib/python2.7/dist-packages/tornado/httpserver.py", line 186, in finish self._finish_request() File "/usr/lib/python2.7/dist-packages/tornado/httpserver.py", line 213, in _finish_request self.stream.read_until(b("\r\n\r\n"), self._header_callback) File "/usr/lib/python2.7/dist-packages/tornado/iostream.py", line 151, in read_until self._check_closed() File "/usr/lib/python2.7/dist-packages/tornado/iostream.py", line 493, in _check_closed raise IOError("Stream is closed") IOError: Stream is closed What is even more interesting is that on the client side, Firebug tells me that the connection was dropped after 15 seconds. The JavaScript message loop is operated by a jQuery AJAX call, that has timeout=15000 parameter given: that.messageLoop = function () { var xsfr = that.readCookie("xsfr"); $.ajax({ url: '/client', type: "POST", data: {"_xsrf":xsfr,"action":"poll"}, async: true, cache: false, timeout: 15000, error: function (data) { setTimeout( that.messageLoop , 1000); console.log(data); }, success: function (data) { if (data) { try { eval(data); } catch(err) { $().toastmessage('showErrorToast', err.message); } } setTimeout( that.messageLoop , 10); } }); }; But on the server side, I see that the connection was dropped after 10.01 seconds. If I start 20 clients, then about every second poll request gets dropped. Thanks, Laszlo