
Python recv loop

Started by	Ihsan Junaidi Ibrahim <ihsan@grep.my>
First post	2013-02-11 08:48 +0800
Last post	2013-02-12 03:09 +0000
Articles	9 — 4 participants


  Python recv loop Ihsan Junaidi Ibrahim <ihsan@grep.my> - 2013-02-11 08:48 +0800
    Re: Python recv loop Roy Smith <roy@panix.com> - 2013-02-10 21:24 -0500
      Re: Python recv loop Ihsan Junaidi Ibrahim <ihsan@grep.my> - 2013-02-11 22:56 +0800
        Re: Python recv loop Roy Smith <roy@panix.com> - 2013-02-11 21:44 -0500
      Re: Python recv loop MRAB <python@mrabarnett.plus.com> - 2013-02-11 15:11 +0000
      Re: Python recv loop Chris Angelico <rosuav@gmail.com> - 2013-02-12 02:24 +1100
      Re: Python recv loop Ihsan Junaidi Ibrahim <ihsan@grep.my> - 2013-02-12 09:41 +0800
      Re: Python recv loop Chris Angelico <rosuav@gmail.com> - 2013-02-12 13:20 +1100
      Re: Python recv loop MRAB <python@mrabarnett.plus.com> - 2013-02-12 03:09 +0000

#38625 — Python recv loop

From	Ihsan Junaidi Ibrahim <ihsan@grep.my>
Date	2013-02-11 08:48 +0800
Subject	Python recv loop
Message-ID	<mailman.1612.1360544258.2939.python-list@python.org>

Hi,

I'm implementing a Python client that connects to a C backend server, and I'm currently stuck as to how to proceed with receiving a variable-length byte stream coming in from the server.

I have coded the first 4 bytes (in hexadecimal) of each message coming in from the server to specify the length of the message payload, i.e. 0xad{...}

I've managed to receive and translate the message length, but the problem starts at my second recv, for which I readjusted the buffer size to the new message length.

That call failed: recv received 0 bytes. I implemented the same algorithm on the server side using C and it works, so I would appreciate it if you could help me with this.

# receive message length
    print 'receiving data'
    mlen = sock.recv(4)
    try:
        nbuf = int(mlen, 16)
    except ValueError as e:
        print 'invalid length type'
        return -1

    while True:
        buf = sock.recv(nbuf)

        if not buf:
            break

    slen = len(buf)
    str = "{0} bytes received: {1}".format(slen, buf)
    print str


#38639

From	Roy Smith <roy@panix.com>
Date	2013-02-10 21:24 -0500
Message-ID	<roy-CC4632.21243210022013@news.panix.com>
In reply to	#38625

In article <mailman.1612.1360544258.2939.python-list@python.org>,
 Ihsan Junaidi Ibrahim <ihsan@grep.my> wrote:

> I'm implementing a python client connecting to a C-backend server and am 
> currently stuck to as to how to proceed with receiving variable-length byte 
> stream coming in from the server.
> 
> I have coded the first 4 bytes (in hexadecimal) of message coming in from the 
> server to specify the length of the message payload i.e. 0xad{...}

Is this server that you're talking to something that you have control 
over, i.e. are you stuck with this protocol?  Given a choice, I'd go 
with something like JSON, for which there are pre-existing libraries for 
every language under the sun.

But, let's assume for the moment that you're stuck with this 
length-value encoding.  OK, but it's going to be more complicated than 
you think.  [I assume we're talking TCP here?]

Carefully read the documentation for socket.recv():

> socket.recv(bufsize[, flags]) [...] The maximum amount of data to be received 
> at once is specified by bufsize. 

Linger on the word "maximum", and try to grok the fullness of how 
annoying that can be.  What it means is that if the other side sent 120 
bytes (octets), recv() might return all 120 at once, or it might return 
them one at a time, or anything in between.

So, what you need to do is call recv() repeatedly in a loop, each time 
passing it a value for bufsize which represents the amount left in the 
message (i.e. the original message length parsed earlier minus all the 
little bits and pieces that have been read so far).

Keep in mind, you also need to do this when you recv() the first 4 
octets, which make up the length field.  What you've got, recv(4), will 
work MOST of the time, but it's perfectly legal for recv() to return a 
short read.  You can't predict how fragmentation and retry timeouts and 
all sorts of low-level crud will cause your message boundaries to get 
scrambled.
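
The loop described above could be sketched roughly as follows. This is a minimal sketch in Python 3 (not the Python 2 of the original post), and `recv_exact` and `recv_message` are hypothetical helper names, not part of the socket module. It assumes the 4-character hexadecimal length header described in the question.

```python
import socket


def recv_exact(sock, count):
    """Read exactly `count` bytes from `sock`, looping over short reads.

    Hypothetical helper; the socket module does not provide it.
    """
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)   # may return fewer bytes than requested
        if not chunk:                  # b"" means the peer closed the connection
            raise ConnectionError(
                "connection closed with %d bytes still expected" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Read one message: a 4-character hex length, then that many bytes."""
    header = recv_exact(sock, 4)       # the length field can arrive in pieces too
    length = int(header.decode("ascii"), 16)
    return recv_exact(sock, length)
```

Raising an exception on an early close is one possible choice here; returning whatever arrived so far would also work, as long as the caller can tell a complete message from a truncated one.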

> # receive message length
>     print 'receiving data'
>     mlen = sock.recv(4)
>     try:
>         nbuf = int(mlen, 16)
>     except ValueError as e:
>         print 'invalid length type'
>         return -1
> 
>     while True:
>         buf = sock.recv(nbuf)
> 
>         if not buf:
>             break
> 
>     slen = len(buf)
>     str = "{0} bytes received: {1}".format(slen, buf)
>     print str

Do you actually *know* what the value of nbuf is?  Is it possible that 
(somehow) it's 0?  You should print (log, whatever), the value of nbuf, 
just to make sure.
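
Separately from the value of nbuf, one possible reading of the quoted loop is that it replaces buf on every pass and only exits once recv() returns an empty string, so buf would be empty by the time it is printed, whatever arrived earlier. That would match a report of 0 bytes if the server closes the connection after replying, which is an assumption here. A minimal Python 3 sketch of the effect, with a hypothetical `FakeSocket` standing in for the real connection:

```python
class FakeSocket:
    """Hypothetical stand-in for a socket: hands out queued chunks, then b""."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def recv(self, bufsize):
        # A real socket returns b"" once the peer has closed the connection.
        # bufsize is ignored here; the queued chunks are assumed to fit.
        return self._chunks.pop(0) if self._chunks else b""


def read_like_original(sock, nbuf):
    """Same shape as the quoted loop: each pass replaces buf."""
    while True:
        buf = sock.recv(nbuf)
        if not buf:
            break
    return buf          # always empty when the loop exits this way


def read_accumulating(sock, nbuf):
    """Keep what was read, and stop once nbuf bytes have arrived."""
    data = b""
    while len(data) < nbuf:
        chunk = sock.recv(nbuf - len(data))
        if not chunk:   # peer closed early
            break
        data += chunk
    return data


payload = b'{"status": "ok"}'
print(len(read_like_original(FakeSocket([payload]), len(payload))))   # prints 0
print(read_accumulating(FakeSocket([payload[:5], payload[5:]]),
                        len(payload)) == payload)                      # prints True
```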

And, once you've got all this working, tear it all out and convert to 
using something sane like JSON.  Let somebody else worry about all the 
horrible details.


#38689

From	Ihsan Junaidi Ibrahim <ihsan@grep.my>
Date	2013-02-11 22:56 +0800
Message-ID	<mailman.1655.1360594595.2939.python-list@python.org>
In reply to	#38639

Hi Roy,

On Feb 11, 2013, at 10:24 AM, Roy Smith <roy@panix.com> wrote:
> 
> Is this server that you're talking to something that you have control 
> over, i.e. are you stuck with this protocol?  Given a choice, I'd go 
> with something like JSON, for which pre-existing libraries for every 
> language under the sun.
> 
I'm using JSON for my application messaging protocol, but with JSON and Python's default unordered dict,
there's no guarantee that a length key placed in the JSON message would end up in the first bytes. That is
why the protocol was designed with a fixed 4-byte field at the start of the message to indicate the message length.

Beyond the 4-bytes it is all JSON.

But if you have a better idea, I would certainly welcome it.

> Do you actually *know* what the value of nbuf is?  Is it possible that 
> (somehow) it's 0?  You should print (log, whatever), the value of nbuf, 
> just to make sure.

nbuf prints the right number of bytes; I removed the print statement before I made the first post.

So to clarify, I have now added a print statement between the first recv and the second.

{"msgver": "1.0", "msgid": "200", "subcode": "100", "appver": "1.0", "appid": "1.0", "data": {"1": "igb0", "2": "igb1", "ifcnt": "2"}}
connected to misty:8080
sending data
138 bytes sent: 0x86{"msgver": "1.0", "msgid": "200", "subcode": "100", "appver": "1.0", "appid": "1.0", "data": {"1": "igb0", "2": "igb1", "ifcnt": "2"}}
receiving data
message length is 188
0 bytes received:

So the subsequent recv() call is readjusted to a 188-byte buffer size, and theoretically recv shouldn't return 0.

I use the same logic when sending to the server from the Python client: the server readjusts its second recv() call based on the length information, and on that second recv() call it is able to obtain the rest of the message.
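
For reference, the figures in the log above are consistent with a 4-character header of the form 0xNN in front of the JSON text. A small Python 3 sketch of the arithmetic follows; `frame` is a hypothetical helper, and the remark about the two-digit limit is an inference from the log rather than something stated in the thread.

```python
def frame(payload):
    """Prefix payload with a 4-character header such as b"0x86"."""
    if len(payload) > 0xFF:
        # "0x" uses two of the four characters, leaving two hex digits.
        raise ValueError("payload too long for a 0xNN header")
    return b"0x%02x" % len(payload) + payload


header = b"0x86"
length = int(header.decode("ascii"), 16)   # int() accepts the "0x" prefix in base 16
print(length)                 # 134
print(len(header) + length)   # 138, the "bytes sent" figure in the log
print(hex(188))               # 0xbc, the header implied by "message length is 188"
```

If payloads longer than 255 bytes are possible, four hex digits without the prefix, or a packed binary integer, would give more headroom. Either would be a protocol change on both ends.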


#38716

From	Roy Smith <roy@panix.com>
Date	2013-02-11 21:44 -0500
Message-ID	<roy-016322.21444811022013@news.panix.com>
In reply to	#38689

In article <mailman.1655.1360594595.2939.python-list@python.org>,
 Ihsan Junaidi Ibrahim <ihsan@grep.my> wrote:

> I'm running JSON for my application messaging protocol but with JSON and 
> python default unordered dict,
> there's no guarantee if I put in the length key in the JSON message, it will 
> be placed on the first bytes hence
> why it was designed for a fixed 4-byte at the start of the message to 
> indicate the message length.
> 
> Beyond the 4-bytes it is all JSON.

I'm confused.  It sounds like you're making things way more complicated 
than they have to be.  Can you give us an example of an actual data 
message?


#38693

From	MRAB <python@mrabarnett.plus.com>
Date	2013-02-11 15:11 +0000
Message-ID	<mailman.1658.1360595493.2939.python-list@python.org>
In reply to	#38639

On 2013-02-11 14:56, Ihsan Junaidi Ibrahim wrote:
> Hi Roy,
>
> On Feb 11, 2013, at 10:24 AM, Roy Smith <roy@panix.com> wrote:
>>
>> Is this server that you're talking to something that you have control
>> over, i.e. are you stuck with this protocol?  Given a choice, I'd go
>> with something like JSON, for which pre-existing libraries for every
>> language under the sun.
>>
> I'm running JSON for my application messaging protocol but with JSON and python default unordered dict,
> there's no guarantee if I put in the length key in the JSON message, it will be placed on the first bytes hence
> why it was designed for a fixed 4-byte at the start of the message to indicate the message length.
>
> Beyond the 4-bytes it is all JSON.
>
> but if you have a better idea, i would certainly welcome it.
>
I probably wouldn't make it fixed length. I'd have the length in
decimal followed by, say, "\n".
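
A rough sketch of that framing, in Python 3, might look like the following. It leans on `socket.makefile()` so that a buffered file object does the looping over short reads. The names `send_message` and `read_message` are hypothetical, and encoding the JSON text as UTF-8 is an assumption.

```python
import json
import socket


def send_message(sock, obj):
    """Send obj as JSON, prefixed by its byte length in decimal and a newline."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(b"%d\n" % len(payload) + payload)


def read_message(stream):
    """Read one message from a binary file object wrapping the socket."""
    line = stream.readline()             # reads up to and including the newline
    if not line:
        return None                      # peer closed before a new message
    length = int(line.decode("ascii"))   # int() ignores the trailing newline
    payload = stream.read(length)        # buffered read: keeps going until
                                         # length bytes have arrived or EOF
    if len(payload) < length:
        raise ConnectionError("connection closed mid-message")
    return json.loads(payload.decode("utf-8"))


# Local demonstration over a connected socket pair.
a, b = socket.socketpair()
send_message(a, {"msgid": "200"})
send_message(a, {"msgid": "201"})
with b.makefile("rb") as stream:
    print(read_message(stream))
    print(read_message(stream))
a.close()
b.close()
```

The file object reads ahead into its own buffer, so once it is in use, all reads should go through it rather than through recv() on the underlying socket.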

>> Do you actually *know* what the value of nbuf is?  Is it possible that
>> (somehow) it's 0?  You should print (log, whatever), the value of nbuf,
>> just to make sure.
>
> nbuf is printing the right bytes amount, I removed the print statement before I made the first post.
>
> So to clarify, I added a print statement between the first recv and the second.
>
> {"msgver": "1.0", "msgid": "200", "subcode": "100", "appver": "1.0", "appid": "1.0", "data": {"1": "igb0", "2": "igb1", "ifcnt": "2"}}
> connected to misty:8080
> sending data
> 138 bytes sent: 0x86{"msgver": "1.0", "msgid": "200", "subcode": "100", "appver": "1.0", "appid": "1.0", "data": {"1": "igb0", "2": "igb1", "ifcnt": "2"}}
> receiving data
> message length is 188
> 0 bytes received:
>
> So the subsequent recv() call will be readjusted with 188 bytes buffer size so theoretically, recv shouldn't return 0.
>
> The same logic that I used to send to the server from the python client that the server will readjust the second recv() call based on the length information. On this 2nd recv() call the server is able to obtain the rest of the messages.
>


#38695

From	Chris Angelico <rosuav@gmail.com>
Date	2013-02-12 02:24 +1100
Message-ID	<mailman.1660.1360596251.2939.python-list@python.org>
In reply to	#38639

On Tue, Feb 12, 2013 at 2:11 AM, MRAB <python@mrabarnett.plus.com> wrote:
> I probably wouldn't make it fixed length. I'd have the length in
> decimal followed by, say, "\n".

Or even "followed by any non-digit". Chances are your JSON data begins
with a non-digit, so you'd just have to insert a space in the event
that you're JSON-encoding a flat integer. (Which might not ever
happen, if you know that your data will always be an object.)
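
As a sketch of the parsing side, in Python 3: `split_length` is a hypothetical helper that works on bytes already received, and it assumes the length counts every byte after the digits, including a space inserted in front of a bare integer.

```python
import json


def split_length(buf):
    """Split leading ASCII digits off buf.

    Returns (length, rest), or None if no non-digit byte has arrived yet,
    in which case the caller needs to receive more data first.
    """
    for pos, byte in enumerate(buf):
        if byte not in b"0123456789":     # first byte that is not a digit
            if pos == 0:
                raise ValueError("message does not start with a length")
            return int(buf[:pos].decode("ascii")), buf[pos:]
    return None


print(split_length(b'16{"msgid": "200"}'))   # (16, b'{"msgid": "200"}')
print(split_length(b"12"))                   # None: delimiter not seen yet

# A bare integer payload needs the extra space so the digits don't run together.
length, rest = split_length(b"3 42")
print(json.loads(rest[:length].decode("ascii")))   # 42
```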

ChrisA


#38714

From	Ihsan Junaidi Ibrahim <ihsan@grep.my>
Date	2013-02-12 09:41 +0800
Message-ID	<mailman.1675.1360633285.2939.python-list@python.org>
In reply to	#38639

On Feb 11, 2013, at 11:24 PM, Chris Angelico <rosuav@gmail.com> wrote:

> On Tue, Feb 12, 2013 at 2:11 AM, MRAB <python@mrabarnett.plus.com> wrote:
>> I probably wouldn't make it fixed length. I'd have the length in
>> decimal followed by, say, "\n".
> 
> Or even "followed by any non-digit". Chances are your JSON data begins
> with a non-digit, so you'd just have to insert a space in the event
> that you're JSON-encoding a flat integer. (Which might not ever
> happen, if you know that your data will always be an object.)
> 
> ChrisA

So on the first recv() call, I set the buffer size to 1 character and iterate one character at a time until a non-digit character
is encountered?


#38715

From	Chris Angelico <rosuav@gmail.com>
Date	2013-02-12 13:20 +1100
Message-ID	<mailman.1676.1360635633.2939.python-list@python.org>
In reply to	#38639

On Tue, Feb 12, 2013 at 12:41 PM, Ihsan Junaidi Ibrahim <ihsan@grep.my> wrote:
>
> On Feb 11, 2013, at 11:24 PM, Chris Angelico <rosuav@gmail.com> wrote:
>
>> On Tue, Feb 12, 2013 at 2:11 AM, MRAB <python@mrabarnett.plus.com> wrote:
>>> I probably wouldn't make it fixed length. I'd have the length in
>>> decimal followed by, say, "\n".
>>
>> Or even "followed by any non-digit". Chances are your JSON data begins
>> with a non-digit, so you'd just have to insert a space in the event
>> that you're JSON-encoding a flat integer. (Which might not ever
>> happen, if you know that your data will always be an object.)
>>
>> ChrisA
>
> So on the first recv() call, I set the buffer at 1 character and I iterate over single character until a non-digit character
> is encountered?

More efficient would be to guess that it'll be, say, 10 bytes, and
then retain any excess for your JSON read loop. But you'd need to sort
that out between the halves of your code.

ChrisA


#38718

From	MRAB <python@mrabarnett.plus.com>
Date	2013-02-12 03:09 +0000
Message-ID	<mailman.1678.1360638583.2939.python-list@python.org>
In reply to	#38639

On 2013-02-12 02:20, Chris Angelico wrote:
> On Tue, Feb 12, 2013 at 12:41 PM, Ihsan Junaidi Ibrahim <ihsan@grep.my> wrote:
>>
>> On Feb 11, 2013, at 11:24 PM, Chris Angelico <rosuav@gmail.com> wrote:
>>
>>> On Tue, Feb 12, 2013 at 2:11 AM, MRAB <python@mrabarnett.plus.com> wrote:
>>>> I probably wouldn't make it fixed length. I'd have the length in
>>>> decimal followed by, say, "\n".
>>>
>>> Or even "followed by any non-digit". Chances are your JSON data begins
>>> with a non-digit, so you'd just have to insert a space in the event
>>> that you're JSON-encoding a flat integer. (Which might not ever
>>> happen, if you know that your data will always be an object.)
>>>
>>> ChrisA
>>
>> So on the first recv() call, I set the buffer at 1 character and I iterate
 >> over single character until a non-digit character is encountered?
>
> More efficient would be to guess that it'll be, say, 10 bytes, and
> then retain any excess for your JSON read loop. But you'd need to sort
> that out between the halves of your code.
>
If the length is always followed by a space then it's easier to split
it off the input:

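     buf = sock.recv(10)                  # length, space, start of payload
     space_pos = buf.find(b" ")           # -1 if the space hasn't arrived yet
     nbuf = int(buf[:space_pos])          # payload length in decimal
     buf = buf[space_pos + 1:]            # keep whatever payload came along

     # Keep receiving until the whole payload has arrived.
     while len(buf) < nbuf:
         chunk = sock.recv(nbuf - len(buf))
         if not chunk:                    # peer closed the connection
             break

         buf += chunk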

I'm assuming that:

1. The initial recv returns the length followed by a space. It could,
of course, return fewer bytes (space_pos == -1), so you may need to
recv some more bytes, like what's done later on.

2. At least 10 bytes were sent. Imagine what would happen if the sender
sent b"2 []" immediately followed by b"2 []". The initial recv could
return all of it. In that case you could save the excess until next
time. Alternatively, the sender could guarantee that it would never
send fewer than the 10 bytes, padding with several b" " if necessary.
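
One way to relax both assumptions is to keep a receive buffer between calls, so that a short first read is topped up and any excess is saved for the next message. A minimal Python 3 sketch, where `LengthPrefixedReader` is a hypothetical class and the bufsize of 4096 is an arbitrary choice:

```python
import socket


class LengthPrefixedReader:
    """Hypothetical reader for b"<decimal length> <payload>" messages.

    Bytes received beyond the current message are kept for the next call.
    """

    def __init__(self, sock, bufsize=4096):
        self._sock = sock
        self._bufsize = bufsize
        self._buf = b""

    def _fill(self):
        chunk = self._sock.recv(self._bufsize)
        if not chunk:                      # peer closed the connection
            raise ConnectionError("connection closed")
        self._buf += chunk

    def read_message(self):
        # Assumption 1 relaxed: receive until the space after the length shows up.
        while b" " not in self._buf:
            self._fill()
        header, _, self._buf = self._buf.partition(b" ")
        nbuf = int(header.decode("ascii"))
        # Receive until the whole payload is buffered.
        while len(self._buf) < nbuf:
            self._fill()
        # Assumption 2 relaxed: anything past nbuf belongs to the next message.
        payload, self._buf = self._buf[:nbuf], self._buf[nbuf:]
        return payload


# Two messages sent back to back, as in the b"2 []" example.
a, b = socket.socketpair()
a.sendall(b"2 []2 []")
reader = LengthPrefixedReader(b)
print(reader.read_message())
print(reader.read_message())
a.close()
b.close()
```

With this shape the sender doesn't need to pad short messages, at the cost of the reader owning all reads on that socket.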

