
Lightwight socket IO wrapper

Started by	"James Harris" <james.harris.1@gmail.com>
First post	2015-09-20 11:22 +0100
Last post	2015-09-22 22:28 +0100
Articles	14 — 8 participants


  Lightwight socket IO wrapper "James Harris" <james.harris.1@gmail.com> - 2015-09-20 11:22 +0100
    Re: Lightwight socket IO wrapper Akira Li <4kir4.1i@gmail.com> - 2015-09-20 16:15 +0300
      Re: Lightwight socket IO wrapper "James Harris" <james.harris.1@gmail.com> - 2015-09-20 23:36 +0100
        Re: Lightwight socket IO wrapper Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-09-20 20:19 -0400
          Re: Lightwight socket IO wrapper Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-09-21 17:46 +1200
          Re: Lightwight socket IO wrapper Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-09-21 11:25 +0000
          Re: Lightwight socket IO wrapper "James Harris" <james.harris.1@gmail.com> - 2015-09-22 20:45 +0100
            Re: Lightwight socket IO wrapper Random832 <random832@fastmail.com> - 2015-09-22 19:52 -0400
              Re: Lightwight socket IO wrapper Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-09-23 12:47 +1200
        Re: Lightwight socket IO wrapper Chris Angelico <rosuav@gmail.com> - 2015-09-21 10:34 +1000
        Re: Lightwight socket IO wrapper Akira Li <4kir4.1i@gmail.com> - 2015-09-21 06:07 +0300
          Re: Lightwight socket IO wrapper "James Harris" <james.harris.1@gmail.com> - 2015-09-22 21:05 +0100
            Re: Lightwight socket IO wrapper Marko Rauhamaa <marko@pacujo.net> - 2015-09-23 00:00 +0300
              Re: Lightwight socket IO wrapper "James Harris" <james.harris.1@gmail.com> - 2015-09-22 22:28 +0100

#96872 — Lightwight socket IO wrapper

From	"James Harris" <james.harris.1@gmail.com>
Date	2015-09-20 11:22 +0100
Subject	Lightwight socket IO wrapper
Message-ID	<mtm18o$9fm$1@dont-email.me>

I guess there have been many attempts to make socket IO easier to handle 
and a good number of those have been in Python.

The trouble with trying to improve something which is already well 
designed (and consciously left as is) is that the so-called improvement
can become much more complex and overly elaborate. That can apply to the 
initial idea, for sure, but when writing helper or convenience functions 
perhaps it applies more to the temptation to keep adding just a little 
bit extra. The end result can be overly elaborate such as a framework 
which is fine where such is needed but is overkill for simpler 
requirements.

Do you guys have any recommendations of some *lightweight* additions to 
Python socket IO before I write any more of my own? Something built in 
to Python would be much preferred over any modules which have to be 
added. I had in the back of my mind that there was a high-level 
socket-IO library - much as threading was added as a wrapper to the 
basic thread module - but I cannot find anything above socket. Is there 
any?

A current specific example of where basic socket IO is limited: it
normally provides no guarantees over how many bytes are transferred at a 
time (AFAICS that's true for both streams and datagrams) so the 
delimiting of messages/records needs to be handled by the sender and 
receiver. I do already handle some of this myself but I wondered if 
there was a prebuilt solution that I should be using instead - to save 
me adding just a little bit extra. ;-)

James


#96875

From	Akira Li <4kir4.1i@gmail.com>
Date	2015-09-20 16:15 +0300
Message-ID	<mailman.37.1442754893.21674.python-list@python.org>
In reply to	#96872

"James Harris" <james.harris.1@gmail.com> writes:

> I guess there have been many attempts to make socket IO easier to
> handle and a good number of those have been in Python.
>
> The trouble with trying to improve something which is already well
> designed (and consciously left as is) is that the so-called improvement
> can become much more complex and overly elaborate. That can apply to
> the initial idea, for sure, but when writing helper or convenience
> functions perhaps it applies more to the temptation to keep adding
> just a little bit extra. The end result can be overly elaborate such
> as a framework which is fine where such is needed but is overkill for
> simpler requirements.
>
> Do you guys have any recommendations of some *lightweight* additions
> to Python socket IO before I write any more of my own? Something built
> in to Python would be much preferred over any modules which have to be
> added. I had in the back of my mind that there was a high-level
> socket-IO library - much as threading was added as a wrapper to the
> basic thread module - but I cannot find anything above socket. Is
> there any?

Does ØMQ qualify as lightweight?

> A current specific to illustrate where basic socket IO is limited: it
> normally provides no guarantees over how many bytes are transferred at
> a time (AFAICS that's true for both streams and datagrams) so the
> delimiting of messages/records needs to be handled by the sender and
> receiver. I do already handle some of this myself but I wondered if
> there was a prebuilt solution that I should be using instead - to save
> me adding just a little bit extra. ;-)

There are already convenience functions in the stdlib such as
sock.sendall(), sock.sendfile(), socket.create_connection() in addition
to the BSD Sockets API.

If you want to extend this list and have specific suggestions, see
  https://docs.python.org/devguide/stdlibchanges.html

Or just describe your current specific issue in more detail here.


#96901

From	"James Harris" <james.harris.1@gmail.com>
Date	2015-09-20 23:36 +0100
Message-ID	<mtnc9q$pqs$1@dont-email.me>
In reply to	#96875

"Akira Li" <4kir4.1i@gmail.com> wrote in message 
news:mailman.37.1442754893.21674.python-list@python.org...
> "James Harris" <james.harris.1@gmail.com> writes:
>
>> I guess there have been many attempts to make socket IO easier to
>> handle and a good number of those have been in Python.
>>
>> The trouble with trying to improve something which is already well
>> designed (and consciously left as is) is that the so-called
>> improvement
>> can become much more complex and overly elaborate. That can apply to
>> the initial idea, for sure, but when writing helper or convenience
>> functions perhaps it applies more to the temptation to keep adding
>> just a little bit extra. The end result can be overly elaborate such
>> as a framework which is fine where such is needed but is overkill for
>> simpler requirements.
>>
>> Do you guys have any recommendations of some *lightweight* additions
>> to Python socket IO before I write any more of my own? Something 
>> built
>> in to Python would be much preferred over any modules which have to 
>> be
>> added. I had in the back of my mind that there was a high-level
>> socket-IO library - much as threading was added as a wrapper to the
>> basic thread module - but I cannot find anything above socket. Is
>> there any?
>
> Does ØMQ qualify as lightweight?

It's certainly interesting. It's puzzling, too. For example,

  http://zguide.zeromq.org/py:hwserver

The Python code there includes

  message = socket.recv()

but given that this is a TCP socket it doesn't look like there is any 
way for the stack to know how many bytes to return. Either ZeroMQ layers 
another end-to-end protocol on top of TCP (which would be no good) or it 
will be guessing (which would not be good either).

There are probably answers to that query but there is a lot of 
documentation, including on reliable communication, and that in itself 
makes ZeroMQ seem overkill, even if it can be persuaded to do what I 
want.

I am impressed that they show code in many languages. I may come back to 
it but for the moment it doesn't seem to be what I was looking for. And 
it is not built in.

>> A current specific to illustrate where basic socket IO is limited: it
>> normally provides no guarantees over how many bytes are transferred 
>> at
>> a time (AFAICS that's true for both streams and datagrams) so the
>> delimiting of messages/records needs to be handled by the sender and
>> receiver. I do already handle some of this myself but I wondered if
>> there was a prebuilt solution that I should be using instead - to 
>> save
>> me adding just a little bit extra. ;-)
>
> There are already convenience functions in stdlib such as
> sock.sendall(), sock.sendfile(), socket.create_connection() in 
> addition
> to BSD Sockets API.
>
> If you want to extend this list and have specific suggestions; see
>  https://docs.python.org/devguide/stdlibchanges.html

That may be a bit overkill just now but it's a good suggestion.

> Or just describe your current specific issue in more detail here.

There are a few things and more crop up as time goes on. For example, 
over TCP it would be helpful to have a function to receive a specific 
number of bytes or one to read bytes until reaching a certain delimiter 
such as newline or zero or space etc. Even better would be to be able to 
use the iteration protocol so you could just code next() and get the 
next such chunk of read in a for loop. When sending it would be good to 
just say to send a bunch of bytes but know that you will get told how 
many were sent (or didn't get sent) if it fails. Sock.sendall() doesn't 
do that.

I thought UDP would deliver (or drop) a whole datagram but cannot find 
anything in the Python documentation to guarantee that. In fact
documentation for the send() call says that apps are responsible for 
checking that all data has been sent. They may mean that to apply to 
stream protocols only but it doesn't state that. (Of course, UDP 
datagrams are limited in size so the call may validly indicate 
incomplete transmission even when the first part of a big message is 
sent successfully.)

Receiving no bytes is taken as indicating the end of the communication. 
That's OK for TCP but not for UDP so there should be a way to 
distinguish between the end of data and receiving an empty datagram.

The recv calls require a buffer size to be supplied which is a technical 
detail. A Python wrapper could save the programmer dealing with that.

Reminder to self: encoding issues.

None of the above is difficult to write and I have written the bits I 
need myself but, basically, there are things that would make socket IO 
easier and yet still compatible with more long-winded code. So I 
wondered if there were already some Python modules which were more 
convenient than what I found in the documentation.

James


#96903

From	Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date	2015-09-20 20:19 -0400
Message-ID	<mailman.12.1442794762.28679.python-list@python.org>
In reply to	#96901

On Sun, 20 Sep 2015 23:36:30 +0100, "James Harris"
<james.harris.1@gmail.com> declaimed the following:


>
>There are a few things and more crop up as time goes on. For example, 
>over TCP it would be helpful to have a function to receive a specific 
>number of bytes or one to read bytes until reaching a certain delimiter 
>such as newline or zero or space etc. Even better would be to be able to 
>use the iteration protocol so you could just code next() and get the 
>next such chunk of read in a for loop. When sending it would be good to 
>just say to send a bunch of bytes but know that you will get told how 
>many were sent (or didn't get sent) if it fails. Sock.sendall() doesn't 
>do that.

	Note that the "buffer size" option on a TCP socket.recv() gives you
your "specific number of bytes" -- if available at that time.

	I wouldn't want to use .recv(1) though to implement your "reaching a
certain delimiter"... Much better to read as much as available and search
it for the delimiter. I'll confess, adding a .readln() FOR TCP ONLY, might
be a nice extension over BSD sockets (might need to allow option for
whether line-ends are Internet standard <cr><lf> or some other marker, and
whether they should be converted upon reading to the native format for the
host).


>
>I thought UDP would deliver (or drop) a whole datagram but cannot find 
>anything in the Python documentation to guarantee that. In fact 
>documentation for the send() call says that apps are responsible for 
>checking that all data has been sent. They may mean that to apply to 
>stream protocols only but it doesn't state that. (Of course, UDP 
>datagrams are limited in size so the call may validly indicate 
>incomplete transmission even when the first part of a big message is 
>sent successfully.)
>
	Looking in the wrong documentation <G> 

	You probably should be looking at the UDP RFC. Or maybe just

http://www.diffen.com/difference/TCP_vs_UDP

"""
Packets are sent individually and are checked for integrity only if they
arrive. Packets have definite boundaries which are honored upon receipt,
meaning a read operation at the receiver socket will yield an entire
message as it was originally sent.
"""

	Even if the IP layer has to fragment a UDP packet to meet limits of the
transport media, it should put them back together on the other end before
passing it up to the UDP layer. To my knowledge, UDP does not have a size
limit on the message (well -- a 16-bit length field in the UDP header). But
since it /is/ "got it all" or "dropped" with no inherent confirmation, one
would have to embed their own protocol within it -- sequence numbers with
ACK/NAK, for example. Problem: if using LARGE UDP packets, this protocol
would mean having LARGE resends should packets be dropped or arrive out of
sequence (and since the ACK/NAK could be dropped too, you may have to
handle the case of a duplicated packet -- also large).

	TCP is a stream protocol -- the protocol will ensure that all data
arrives, and that it arrives in order, but does not enforce any boundaries
on the data; what started as a relatively large packet at one end may
arrive as lots of small packets due to intermediate transport limits (one
can visualize a worst case: each TCP packet is broken up to fit Hollerith
cards; 20bytes for header and 60 bytes of data -- then fed to a reader and
sent on AS-IS). Boundaries are the end-user responsibility... line endings
(look at SMTP, where an email message ends on a line containing just a ".")
or embedded length counter (not the TCP packet length).

>Receiving no bytes is taken as indicating the end of the communication. 
>That's OK for TCP but not for UDP so there should be a way to 
>distinguish between the end of data and receiving an empty datagram.
>
	I don't believe UDP supports a truly empty datagram (length of 0) --
presuming a sending stack actually sends one, the receiving stack will
probably drop it as there is no data to pass on to a client (there is a PR
at work because we have a UDP driver that doesn't drop 0-length messages,
but also can't deliver them -- so the circular buffer might fill with
undeliverable headers)

-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/


#96914

From	Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date	2015-09-21 17:46 +1200
Message-ID	<d69jtnFp9qfU1@mid.individual.net>
In reply to	#96903

Dennis Lee Bieber wrote:
> worst case: each TCP packet is broken up to fit Hollerith
> cards;

Or printed on strips of paper and tied to pigeons:

https://en.wikipedia.org/wiki/IP_over_Avian_Carriers

-- 
Greg


#96932

From	Jorgen Grahn <grahn+nntp@snipabacken.se>
Date	2015-09-21 11:25 +0000
Message-ID	<slrnmvvq8v.eij.grahn+nntp@frailea.sa.invalid>
In reply to	#96903

On Mon, 2015-09-21, Dennis Lee Bieber wrote:
> On Sun, 20 Sep 2015 23:36:30 +0100, "James Harris"
> <james.harris.1@gmail.com> declaimed the following:

...
>>I thought UDP would deliver (or drop) a whole datagram but cannot find 
>>anything in the Python documentation to guarantee that. In fact 
>>documentation for the send() call says that apps are responsible for 
>>checking that all data has been sent. They may mean that to apply to 
>>stream protocols only but it doesn't state that. (Of course, UDP 
>>datagrams are limited in size so the call may validly indicate 
>>incomplete transmission even when the first part of a big message is 
>>sent successfully.)
>>
> 	Looking in the wrong documentation <G> 
>
> 	You probably should be looking at the UDP RFC. Or maybe just
>
> http://www.diffen.com/difference/TCP_vs_UDP
>
> """
> Packets are sent individually and are checked for integrity only if they
> arrive. Packets have definite boundaries which are honored upon receipt,
> meaning a read operation at the receiver socket will yield an entire
> message as it was originally sent.
> """
>
> 	Even if the IP layer has to fragment a UDP packet to meet limits of the
> transport media, it should put them back together on the other end before
> passing it up to the UDP layer. To my knowledge, UDP does not have a size
> limit on the message (well -- a 16-bit length field in the UDP header).

So they are "limited in size" like the OP wrote.  (A TCP stream OTOH is
potentially infinite.)

But also, the IPv4 RFC says:

    All hosts must be prepared to accept datagrams of up to 576 octets
    (whether they arrive whole or in fragments).  It is recommended
    that hosts only send datagrams larger than 576 octets if they have
    assurance that the destination is prepared to accept the larger
    datagrams.

As for "all or nothing" with UDP datagrams, you also have the socket
layer case where the user does read() into a 1000 octet buffer and the
datagram was 1200 octets.  With BSD sockets you can (if you try)
detect this, but the extra 200 octets are lost forever.
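
A minimal sketch of how that truncation can be detected from Python,
assuming a Unix-like platform where socket.recvmsg() and
socket.MSG_TRUNC are available. The helper names recv_datagram and
demo_truncation are made up for illustration and are not part of the
socket module.

```python
import socket


def recv_datagram(sock, bufsize):
    """Receive one datagram and report whether it was cut to fit bufsize."""
    data, _ancdata, flags, address = sock.recvmsg(bufsize)
    # MSG_TRUNC in the returned flags means the datagram was longer than
    # bufsize; the excess octets have already been discarded by the stack.
    return data, bool(flags & socket.MSG_TRUNC), address


def demo_truncation():
    """Send 1200 octets over loopback UDP and read with a 1000-octet buffer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # do not hang if the datagram is lost
        sender.sendto(b"x" * 1200, receiver.getsockname())
        data, truncated, _address = recv_datagram(receiver, 1000)
        return len(data), truncated
```

The caller learns that data was lost but cannot get it back, so the
usual remedy is a receive buffer at least as large as the biggest
datagram the protocol allows.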

> But since it /is/ "got it all" or "dropped" with no inherent confirmation, one
> would have to embed their own protocol within it -- sequence numbers with
> ACK/NAK, for example. Problem: if using LARGE UDP packets, this protocol
> would mean having LARGE resends should packets be dropped or arrive out of
> sequence (and since the ACK/NAK could be dropped too, you may have to
> handle the case of a duplicated packet -- also large).
>
> 	TCP is a stream protocol -- the protocol will ensure that all data
> arrives, and that it arrives in order, but does not enforce any boundaries
> on the data; what started as a relatively large packet at one end may
> arrive as lots of small packets due to intermediate transport limits (one
> can visualize a worst case: each TCP packet is broken up to fit Hollerith
> cards; 20bytes for header and 60 bytes of data -- then fed to a reader and
> sent on AS-IS).

The problem is IMO more this: the chunks of data that the application
writes don't map to what the other application reads.  In the lower
layers, I don't expect TCP segments to be split, and IP fragmentation
(if it happens at all) operates at an even lower level.

However the end result is still just as you write:

> Boundaries are the end-user responsibility... line endings
> (look at SMTP, where an email message ends on a line containing just a ".")
> or embedded length counter (not the TCP packet length).
>
>>Receiving no bytes is taken as indicating the end of the communication. 
>>That's OK for TCP but not for UDP so there should be a way to 
>>distinguish between the end of data and receiving an empty datagram.
>>
> 	I don't believe UDP supports a truly empty datagram (length of 0) --
> presuming a sending stack actually sends one, the receiving stack will
> probably drop it as there is no data to pass on to a client

UDP datagrams of length 0 work (just tried it on Linux).  There's
nothing special about it.
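
A small sketch of that experiment, assuming loopback UDP is usable on
the test machine. The function name zero_length_roundtrip is
hypothetical.

```python
import socket


def zero_length_roundtrip():
    """Send an empty UDP datagram over loopback and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # do not hang if the datagram is lost
        sender.sendto(b"", receiver.getsockname())
        # recvfrom() returns once a datagram arrives, even an empty one.
        data, address = receiver.recvfrom(1024)
        return data, address
```

Here an empty bytes object from recvfrom() means an empty datagram was
received, not that the peer has gone away.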

> (there is a PR
> at work because we have a UDP driver that doesn't drop 0-length messages,
> but also can't deliver them -- so the circular buffer might fill with
> undeliverable headers)

Those messages should be delivered to the receiving socket, in the
sense that they are sanity-checked, used to wake up the application
and mark the socket readable, fill up one entry in the read queue and
so on ...

Of course your system at work may have the rights to be more
restrictive, if it's special-purpose.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .


#96988

From	"James Harris" <james.harris.1@gmail.com>
Date	2015-09-22 20:45 +0100
Message-ID	<mtsb10$uoj$1@dont-email.me>
In reply to	#96903

"Dennis Lee Bieber" <wlfraed@ix.netcom.com> wrote in message 
news:mailman.12.1442794762.28679.python-list@python.org...
> On Sun, 20 Sep 2015 23:36:30 +0100, "James Harris"
> <james.harris.1@gmail.com> declaimed the following:
>
>
>>
>>There are a few things and more crop up as time goes on. For example,
>>over TCP it would be helpful to have a function to receive a specific
>>number of bytes or one to read bytes until reaching a certain 
>>delimiter
>>such as newline or zero or space etc. Even better would be to be able 
>>to
>>use the iteration protocol so you could just code next() and get the
>>next such chunk of read in a for loop. When sending it would be good 
>>to
>>just say to send a bunch of bytes but know that you will get told how
>>many were sent (or didn't get sent) if it fails. Sock.sendall() 
>>doesn't
>>do that.
>
> Note that the "buffer size" option on a TCP socket.recv() gives you
> your "specific number of bytes" -- if available at that time.

"If" is a big word!

AIUI the buffer size is not guaranteed to relate to the number of bytes 
returned except that you won't/shouldn't(!) get more than the buffer 
size.
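
A minimal sketch of a helper that keeps calling recv() until exactly
the requested number of bytes has arrived. The name recv_exact and the
choice of EOFError for an early close are assumptions made for
illustration, not stdlib behaviour.

```python
import socket


def recv_exact(sock, nbytes):
    """Return exactly nbytes from a stream socket or raise EOFError."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        # recv() may return fewer bytes than asked for, never more.
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty result on a stream socket means the peer closed.
            raise EOFError(f"connection closed with {remaining} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Usage on a connected TCP socket would be something like
header = recv_exact(sock, 4).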

> I wouldn't want to use .recv(1) though to implement your "reaching a
> certain delimiter"... Much better to read as much as available and 
> search
> it for the delimiter.

Yes, that's what I do at the moment. I keep a block of bytes, add any 
new stuff to it and scan it for delimiters.
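
A sketch of that approach written as a generator, so the records can
be consumed with a for loop or next(). The name iter_records, the
default delimiter and the 4096-byte read size are illustrative
assumptions. The delimiter is left on each record.

```python
import socket


def iter_records(sock, delimiter=b"\n", bufsize=4096):
    """Yield delimiter-terminated records read from a stream socket."""
    pending = bytearray()  # bytes received but not yet handed out
    while True:
        end = pending.find(delimiter)
        if end != -1:
            end += len(delimiter)
            yield bytes(pending[:end])  # record includes its delimiter
            del pending[:end]
            continue
        chunk = sock.recv(bufsize)
        if not chunk:
            # Peer closed: hand out any unterminated tail, then stop.
            if pending:
                yield bytes(pending)
            return
        pending += chunk
```

Each pass rescans the pending bytes from the start, which is simple
but wasteful for very long records; remembering the last scan position
would avoid that.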

> I'll confess, adding a .readln() FOR TCP ONLY, might
> be a nice extension over BSD sockets (might need to allow option for
> whether line-ends are Internet standard <cr><lf> or some other marker, 
> and
> whether they should be converted upon reading to the native format for 
> the
> host).

Akira Li pointed out that there is just such an extension: makefile. 
Scanning to <lf> is what I do just now as that includes <cr><lf> too and 
I leave them on the string. IIRC file.readline works in the same way.

>>I thought UDP would deliver (or drop) a whole datagram but cannot find
>>anything in the Python documentation to guarantee that. In fact
>>documentation for the send() call says that apps are responsible for
>>checking that all data has been sent. They may mean that to apply to
>>stream protocols only but it doesn't state that. (Of course, UDP
>>datagrams are limited in size so the call may validly indicate
>>incomplete transmission even when the first part of a big message is
>>sent successfully.)
>>
> Looking in the wrong documentation <G>
>
> You probably should be looking at the UDP RFC. Or maybe just
>
> http://www.diffen.com/difference/TCP_vs_UDP
>
> """
> Packets are sent individually and are checked for integrity only if 
> they
> arrive. Packets have definite boundaries which are honored upon 
> receipt,
> meaning a read operation at the receiver socket will yield an entire
> message as it was originally sent.
> """

I would rather see it in the Python docs because we program to the 
language standard and there can be - and often are, for good reason - 
areas where Python does not work in the same way as underlying systems.

> Even if the IP layer has to fragment a UDP packet to meet limits of 
> the
> transport media, it should put them back together on the other end 
> before
> passing it up to the UDP layer. To my knowledge, UDP does not have a 
> size
> limit on the message (well -- a 16-bit length field in the UDP 
> header). But
> since it /is/ "got it all" or "dropped" with no inherent confirmation, 
> one
> would have to embed their own protocol within it -- sequence numbers 
> with
> ACK/NAK, for example. Problem: if using LARGE UDP packets, this 
> protocol
> would mean having LARGE resends should packets be dropped or arrive 
> out of
> sequence (and since the ACK/NAK could be dropped too, you may have to
> handle the case of a duplicated packet -- also large).

Yes, it was the 16-bit limitation that I was talking about.

> TCP is a stream protocol -- the protocol will ensure that all data
> arrives, and that it arrives in order, but does not enforce any 
> boundaries
> on the data; what started as a relatively large packet at one end may
> arrive as lots of small packets due to intermediate transport limits 
> (one
> can visualize a worst case: each TCP packet is broken up to fit 
> Hollerith
> cards; 20bytes for header and 60 bytes of data -- then fed to a reader 
> and
> sent on AS-IS). Boundaries are the end-user responsibility... line 
> endings
> (look at SMTP, where an email message ends on a line containing just a 
> ".")
> or embedded length counter (not the TCP packet length).

Yes.

>>Receiving no bytes is taken as indicating the end of the 
>>communication.
>>That's OK for TCP but not for UDP so there should be a way to
>>distinguish between the end of data and receiving an empty datagram.
>>
> I don't believe UDP supports a truly empty datagram (length of 0) --
> presuming a sending stack actually sends one, the receiving stack will
> probably drop it as there is no data to pass on to a client (there is 
> a PR
> at work because we have a UDP driver that doesn't drop 0-length 
> messages,
> but also can't deliver them -- so the circular buffer might fill with
> undeliverable headers)

As others have pointed out, UDP implementations do seem to work with 
zero-byte datagrams properly. Again, I would rather see that in the 
Python documentation which is what, effectively, forms a contract that 
we should be able to rely on.

James


#97011

From	Random832 <random832@fastmail.com>
Date	2015-09-22 19:52 -0400
Message-ID	<mailman.84.1442965978.28679.python-list@python.org>
In reply to	#96988

On Tue, Sep 22, 2015, at 15:45, James Harris wrote:
> "Dennis Lee Bieber" <wlfraed@ix.netcom.com> wrote in message 
> news:mailman.12.1442794762.28679.python-list@python.org...
> > On Sun, 20 Sep 2015 23:36:30 +0100, "James Harris"
> > <james.harris.1@gmail.com> declaimed the following:
> >>Receiving no bytes is taken as indicating the end of the 
> >>communication.
> >>That's OK for TCP but not for UDP so there should be a way to
> >>distinguish between the end of data and receiving an empty datagram.
> >>
> > I don't believe UDP supports a truly empty datagram (length of 0) --
> > presuming a sending stack actually sends one, the receiving stack will
> > probably drop it as there is no data to pass on to a client (there is 
> > a PR
> > at work because we have a UDP driver that doesn't drop 0-length 
> > messages,
> > but also can't deliver them -- so the circular buffer might fill with
> > undeliverable headers)
> 
> As others have pointed out, UDP implementations do seem to work with 
> zero-byte datagrams properly. Again, I would rather see that in the 
> Python documentation which is what, effectively, forms a contract that 
> we should be able to rely on.

Isn't this technically the same problem as pressing ctrl-d at a terminal
- it's not _really_ the end of the input (you can continue reading
after), but it sends the program something it will interpret as such?


#97013

From	Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date	2015-09-23 12:47 +1200
Message-ID	<d6eb4uFi26U1@mid.individual.net>
In reply to	#97011

Random832 wrote:

> Isn't this technically the same problem as pressing ctrl-d at a terminal
> - it's not _really_ the end of the input (you can continue reading
> after), but it sends the program something it will interpret as such?

Yes. There's no concept of "closing the connection" with UDP,
because there's no connection. So if a read returns 0 bytes,
it must be because someone sent you a 0-length datagram.

-- 
Greg


#96905

From	Chris Angelico <rosuav@gmail.com>
Date	2015-09-21 10:34 +1000
Message-ID	<mailman.14.1442795696.28679.python-list@python.org>
In reply to	#96901

On Mon, Sep 21, 2015 at 10:19 AM, Dennis Lee Bieber
<wlfraed@ix.netcom.com> wrote:
>         Even if the IP layer has to fragment a UDP packet to meet limits of the
> transport media, it should put them back together on the other end before
> passing it up to the UDP layer. To my knowledge, UDP does not have a size
> limit on the message (well -- a 16-bit length field in the UDP header). But
> since it /is/ "got it all" or "dropped" with no inherent confirmation, one
> would have to embed their own protocol within it -- sequence numbers with
> ACK/NAK, for example. Problem: if using LARGE UDP packets, this protocol
> would mean having LARGE resends should packets be dropped or arrive out of
> sequence (and since the ACK/NAK could be dropped too, you may have to
> handle the case of a duplicated packet -- also large).
>

If you're going to add sequencing and acknowledgements to UDP,
wouldn't it be easier to use TCP and simply prefix every message with
a two-byte length?
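
A minimal sketch of that framing, assuming a big-endian unsigned
two-byte length (so payloads of at most 65535 bytes). The function
names send_message and recv_message are hypothetical.

```python
import socket
import struct

HEADER = struct.Struct("!H")  # two-byte unsigned length, network byte order


def send_message(sock, payload):
    """Send payload preceded by its two-byte length."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a two-byte length prefix")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock, nbytes):
    """Read exactly nbytes from the stream or raise EOFError."""
    data = bytearray()
    while len(data) < nbytes:
        chunk = sock.recv(nbytes - len(data))
        if not chunk:
            raise EOFError("connection closed mid-message")
        data += chunk
    return bytes(data)


def recv_message(sock):
    """Receive one length-prefixed message and return its payload."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return _recv_exact(sock, length)
```

With an explicit length, an empty message (length 0) and a closed
connection (EOFError) are no longer confused with each other.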

UDP is great when order doesn't matter and each packet stands entirely
alone. DNS is a well-known example - the question "What is the IP
address for www.rosuav.com?" doesn't in any way affect the question
"What is the mail server for gmail.com?", so you fire off UDP packets
for each one, and get responses whenever you get them. UDP's also
perfect for a heartbeat system - you send out a packet every
however-often, and if the monitor hasn't heard from you in X seconds,
it starts alerting people. No need for responses of any kind there.
But for working with a stream, I usually find it's a lot easier to
build on top of TCP than UDP.

ChrisA


#96910

From	Akira Li <4kir4.1i@gmail.com>
Date	2015-09-21 06:07 +0300
Message-ID	<mailman.18.1442804862.28679.python-list@python.org>
In reply to	#96901

"James Harris" <james.harris.1@gmail.com> writes:
...
> There are a few things and more crop up as time goes on. For example,
> over TCP it would be helpful to have a function to receive a specific
> number of bytes or one to read bytes until reaching a certain
> delimiter such as newline or zero or space etc. 

The answer is sock.makefile('rb') then `file.read(nbytes)` returns a
specific number of bytes.
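
A short sketch of this, using socket.socketpair() as a stand-in for a
real TCP connection. The function name demo_fixed_reads is
hypothetical. The socket documentation asks for the socket to be in
blocking mode when makefile() is used.

```python
import socket


def demo_fixed_reads():
    """Read fixed-size pieces from a stream socket via makefile('rb')."""
    left, right = socket.socketpair()
    with left, right:
        left.sendall(b"headerbody")
        left.shutdown(socket.SHUT_WR)  # signal end of data to the reader
        # Create one reader and keep using it: it buffers ahead, so bytes
        # it has already pulled in are not visible to right.recv().
        with right.makefile("rb") as reader:
            first = reader.read(6)   # blocks until 6 bytes or end of stream
            rest = reader.read(4)
            at_eof = reader.read(1)  # empty bytes object at end of stream
    return first, rest, at_eof
```

A read that comes back shorter than requested means the stream ended
before that many bytes arrived.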

`file.readline()` reads until newline (b'\n'). There is a Python issue:
"Add support for reading records with arbitrary separators to the
standard IO stack"
  http://bugs.python.org/issue1152248
See also
  http://bugs.python.org/issue17083

Perhaps, it is easier to implement read_until(sep) that is best suited
for a particular case.
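
One possible read_until(sep), sketched for any buffered binary file
object such as the result of sock.makefile('rb'). The signature is an
assumption for illustration; io.BytesIO stands in for the socket file
in the usage lines.

```python
import io


def read_until(reader, separator):
    """Read from reader up to and including separator (or end of stream)."""
    if not separator:
        raise ValueError("separator must not be empty")
    record = bytearray()
    while not record.endswith(separator):
        # One byte at a time is tolerable here because reader is buffered;
        # doing the same with sock.recv(1) would cost a system call per byte.
        byte = reader.read(1)
        if not byte:
            break  # end of stream: return whatever was collected
        record += byte
    return bytes(record)


# Usage with an in-memory stand-in for sock.makefile('rb'):
reader = io.BytesIO(b"GET /\r\nHost: x\r\ntail")
first_line = read_until(reader, b"\r\n")
second_line = read_until(reader, b"\r\n")
leftover = read_until(reader, b"\r\n")
```

Reading one byte past the separator never happens, so the same reader
can be handed on to other code afterwards.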

> Even better would be to be able to use the iteration protocol so you
> could just code next() and get the next such chunk of read in a for
> loop.

file is an iterator over lines i.e., next(file) works.
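
A minimal sketch of line iteration over a socket, with the
hypothetical helper name iter_lines. In binary mode, lines are split
on b'\n' only, so a preceding b'\r' stays on the line.

```python
import socket


def iter_lines(sock):
    """Yield lines (with their line endings) from a stream socket."""
    with sock.makefile("rb") as reader:
        for line in reader:  # the file object is its own line iterator
            yield line
```

Typical use would be a loop such as for line in iter_lines(sock), or
next() on the generator to fetch a single line.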

> When sending it would be good to just say to send a bunch of bytes but
> know that you will get told how many were sent (or didn't get sent) if
> it fails. Sock.sendall() doesn't do that.

sock.send() returns the number of bytes sent that may be less than given.
You could reimplement sock.sendall() to include the number of bytes
successfully sent in case of an error.
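
A sketch of such a reimplementation. The names send_all_counted and
PartialSendError are invented here. Note that "sent" only means the
bytes were accepted by the local network stack, not that the peer
received them.

```python
class PartialSendError(OSError):
    """Raised when sending fails part-way; .sent holds the byte count."""

    def __init__(self, sent, cause):
        super().__init__(f"send failed after {sent} bytes: {cause}")
        self.sent = sent


def send_all_counted(sock, data):
    """Send all of data; on failure report how many bytes were accepted."""
    view = memoryview(data)  # slicing a memoryview avoids copying the data
    sent = 0
    while sent < len(view):
        try:
            # send() may accept only part of what it is given.
            sent += sock.send(view[sent:])
        except OSError as exc:
            raise PartialSendError(sent, exc) from exc
    return sent
```

A caller can catch PartialSendError and inspect its sent attribute to
decide whether to retry, resume or give up.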

> I thought UDP would deliver (or drop) a whole datagram but cannot find
> anything in the Python documentation to guarantee that. In fact
> documentation for the send() call says that apps are responsible for
> checking that all data has been sent. They may mean that to apply to
> stream protocols only but it doesn't state that. (Of course, UDP
> datagrams are limited in size so the call may validly indicate
> incomplete transmission even when the first part of a big message is
> sent successfully.)
>
> Receiving no bytes is taken as indicating the end of the
> communication. That's OK for TCP but not for UDP so there should be a
> way to distinguish between the end of data and receiving an empty
> datagram.

There is no end of communication in UDP and therefore there is no end of
data. If you get zero bytes in return, it means that you have
received a zero-length datagram.

sock.recvfrom() is a thin wrapper around the corresponding C
function. You could read any docs you like about UDP sockets.
  http://stackoverflow.com/questions/5307031/how-to-detect-receipt-of-a-0-length-udp-datagram
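
A quick way to try it on a given platform, as a sketch. It assumes IPv4
loopback is available; the function name is made up, and the result only
tells you about the machine you run it on:

```python
import socket

def udp_empty_datagram_roundtrip():
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))       # let the OS pick a free port
        rx.settimeout(2.0)              # do not hang if a datagram is lost
        addr = rx.getsockname()
        tx.sendto(b"", addr)            # zero-length datagram
        tx.sendto(b"x", addr)           # ordinary datagram after it
        first, _ = rx.recvfrom(1024)
        second, _ = rx.recvfrom(1024)
    return first, second
```

If the first value comes back as b'' and the second as b'x', the empty
datagram was delivered as a message of its own rather than being treated
as end of data.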

> The recv calls require a buffer size to be supplied which is a
> technical detail. A Python wrapper could save the programmer dealing
> with that.

It is not just a buffer size. It is the maximum amount of data to be
received at once, i.e., sock.recv() may return less but never more.
You could use makefile() and read() if recv() is too low-level.
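
To illustrate the "at most" behaviour, a small sketch (socketpair() as a
stand-in for a real connection; recv_in_chunks is a made-up name):

```python
import socket

def recv_in_chunks(bufsize=4):
    a, b = socket.socketpair()          # stand-in for a real TCP connection
    with a, b:
        a.sendall(b"abcdefgh")
        a.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = b.recv(bufsize)     # returns at most bufsize bytes
            if not chunk:               # b'' on a stream socket means EOF
                break
            chunks.append(chunk)
    return chunks
```

How the data is split across chunks is up to the OS; the only guarantee
is that no chunk is longer than bufsize.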

> Reminder to self: encoding issues.
>
> None of the above is difficult to write and I have written the bits I
> need myself but, basically, there are things that would make socket IO
> easier and yet still compatible with more long-winded code. So I
> wondered if there were already some Python modules which were more
> convenient than what I found in the documentation.
>
> James


#96990

From	"James Harris" <james.harris.1@gmail.com>
Date	2015-09-22 21:05 +0100
Message-ID	<mtsc60$41t$1@dont-email.me>
In reply to	#96910

"Akira Li" <4kir4.1i@gmail.com> wrote in message 
news:mailman.18.1442804862.28679.python-list@python.org...
> "James Harris" <james.harris.1@gmail.com> writes:
> ...
>> There are a few things and more crop up as time goes on. For example,
>> over TCP it would be helpful to have a function to receive a specific
>> number of bytes or one to read bytes until reaching a certain
>> delimiter such as newline or zero or space etc.
>
> The answer is sock.makefile('rb'): `file.read(nbytes)` then returns
> exactly nbytes bytes, or fewer only if the stream ends first.

Thanks, I hadn't seen that. Now I know of it I see references to it all 
over the place but beforehand it was in hiding....

It is exactly the type of convenience wrapper I was expecting Python to 
have but expected it to be in another module. It looks as though it will 
definitely cover some of the issues I had.

> `file.readline()` reads until newline (b'\n'). There is a Python issue:
> "Add support for reading records with arbitrary separators to the
> standard IO stack"
>  http://bugs.python.org/issue1152248
> See also
>  http://bugs.python.org/issue17083
>
> Perhaps it is easier to implement a read_until(sep) that is suited
> to your particular case.

OK.

...

>> When sending it would be good to just say to send a bunch of bytes but
>> know that you will get told how many were sent (or didn't get sent) if
>> it fails. Sock.sendall() doesn't do that.
>
> sock.send() returns the number of bytes sent, which may be fewer than
> given. You could reimplement sock.sendall() to include the number of
> bytes successfully sent in case of an error.

I know. As mentioned, I wondered if there were already such functions to 
save me using my own.

>> I thought UDP would deliver (or drop) a whole datagram but cannot find
>> anything in the Python documentation to guarantee that. In fact
>> documentation for the send() call says that apps are responsible for
>> checking that all data has been sent. They may mean that to apply to
>> stream protocols only but it doesn't state that. (Of course, UDP
>> datagrams are limited in size so the call may validly indicate
>> incomplete transmission even when the first part of a big message is
>> sent successfully.)
>>
>> Receiving no bytes is taken as indicating the end of the
>> communication. That's OK for TCP but not for UDP so there should be a
>> way to distinguish between the end of data and receiving an empty
>> datagram.
>
> There is no end of communication in UDP and therefore there is no end of
> data. If you get zero bytes in return, it means that you have
> received a zero-length datagram.
>
> sock.recvfrom() is a thin wrapper around the corresponding C
> function. You could read any docs you like about UDP sockets.
>   http://stackoverflow.com/questions/5307031/how-to-detect-receipt-of-a-0-length-udp-datagram

As mentioned to Dennis just now, I would prefer to write code to conform 
with the documented behaviour of Python and its libraries, as long as 
they were known to be reliable implementations of what was documented, 
of course.

I agree with what you say. A zero-length UDP datagram should be possible 
and not indicate end of input but is that guaranteed and portable? 
(Rhetorical.) It seems not. Even the Linux man page for recv says: "If
no messages are available at the socket, the receive calls wait
for a message to arrive, unless the socket is nonblocking...." In that
case, of course, what it defines as a "message" - and whether it can be 
zero length or not - is not stated.

>> The recv calls require a buffer size to be supplied which is a
>> technical detail. A Python wrapper could save the programmer dealing
>> with that.
>
> It is not just a buffer size. It is the maximum amount of data to be
> received at once, i.e., sock.recv() may return less but never more.

My point was that we might want to request the entire next line or next 
field of input and not know a maximum length. *C* programmers are used 
to giving buffers fixed sizes often because then they can avoid fiddling 
with memory management but Python normally does that for us. I was 
suggesting that the thin wrapper around the socket recv() call is too 
thin! The makefile() approach that you mentioned seems more Pythonesque, 
though.

> You could use makefile() and read() if recv() is too low-level.

Yes.

James


#96993

From	Marko Rauhamaa <marko@pacujo.net>
Date	2015-09-23 00:00 +0300
Message-ID	<8737y6cgp6.fsf@elektro.pacujo.net>
In reply to	#96990

"James Harris" <james.harris.1@gmail.com>:

> I agree with what you say. A zero-length UDP datagram should be
> possible and not indicate end of input but is that guaranteed and
> portable?

The zero-length payload size shouldn't be an issue, but UDP doesn't make
any guarantees about delivering the message. Your UDP application must
be prepared for some, most or all of the messages disappearing without
any error indication.

In practice, you'd end up implementing your own TCP on top of UDP
(retries, timeouts, acknowledgements, sequence numbers etc).


Marko


#96995

From	"James Harris" <james.harris.1@gmail.com>
Date	2015-09-22 22:28 +0100
Message-ID	<mtsh2k$obt$1@dont-email.me>
In reply to	#96993

"Marko Rauhamaa" <marko@pacujo.net> wrote in message 
news:8737y6cgp6.fsf@elektro.pacujo.net...
> "James Harris" <james.harris.1@gmail.com>:
>
>> I agree with what you say. A zero-length UDP datagram should be
>> possible and not indicate end of input but is that guaranteed and
>> portable?
>
> The zero-length payload size shouldn't be an issue, but UDP doesn't make
> any guarantees about delivering the message. Your UDP application must
> be prepared for some, most or all of the messages disappearing without
> any error indication.
>
> In practice, you'd end up implementing your own TCP on top of UDP
> (retries, timeouts, acknowledgements, sequence numbers etc).

The unreliability of UDP was not the case in point here. Rather, it was 
about whether different platforms could be relied upon to deliver 
zero-length datagrams to the app if the datagrams got safely across the 
network.

James
