Re: Lightwight socket IO wrapper

Started by	Chris Angelico <rosuav@gmail.com>
First post	2015-09-21 17:57 +1000
Last post	2015-09-22 07:59 +0000
Articles	9 — 3 participants

This discussion began before the indexed window, so earlier articles aren't shown. The article labeled "Started by" above is the oldest one visible, not the original post.

  Re: Lightwight socket IO wrapper Chris Angelico <rosuav@gmail.com> - 2015-09-21 17:57 +1000
    Re: Lightwight socket IO wrapper Marko Rauhamaa <marko@pacujo.net> - 2015-09-21 10:59 +0300
      Re: Lightwight socket IO wrapper Chris Angelico <rosuav@gmail.com> - 2015-09-21 18:07 +1000
        Re: Lightwight socket IO wrapper Marko Rauhamaa <marko@pacujo.net> - 2015-09-21 11:38 +0300
          Re: Lightwight socket IO wrapper Chris Angelico <rosuav@gmail.com> - 2015-09-21 18:45 +1000
            Re: Lightwight socket IO wrapper Marko Rauhamaa <marko@pacujo.net> - 2015-09-21 11:48 +0300
              Re: Lightwight socket IO wrapper Marko Rauhamaa <marko@pacujo.net> - 2015-09-21 11:49 +0300
          Re: Lightwight socket IO wrapper Chris Angelico <rosuav@gmail.com> - 2015-09-21 18:50 +1000
            Re: Lightwight socket IO wrapper Jorgen Grahn <grahn+nntp@snipabacken.se> - 2015-09-22 07:59 +0000

#96921 — Re: Lightwight socket IO wrapper

From	Chris Angelico <rosuav@gmail.com>
Date	2015-09-21 17:57 +1000
Subject	Re: Lightwight socket IO wrapper
Message-ID	<mailman.23.1442822248.28679.python-list@python.org>

On Mon, Sep 21, 2015 at 4:27 PM, Cameron Simpson <cs@zip.com.au> wrote:
> I don't like embedding arbitrary size limits in protocols or data formats if
> I can easily avoid it. So (for my home grown binary protocols) I encode
> unsigned integers as big endian octets with the top bit meaning "another
> octet follows" and the bottom 7 bits going to the value. So my packets look
> like:
>
>  encoded(length)data
>
> For sizes below 128, one byte of length. For sizes 128-16383, two bytes. And
> so on. Compact yet unbounded.

Ah, the MIDI Variable-Length Integer. Decent.

It's generally a lot faster to do a read(2) than a loop with any
number of read(1), and you get some kind of bound on your allocations.
Whether that's important to you or not is another question, but
certainly your chosen encoding is a good way of allowing arbitrary
integer values.

ChrisA
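
A minimal Python sketch of the encoding Cameron describes, assuming nothing beyond his description: an unsigned integer is split into big-endian 7-bit groups, and the top bit of each octet means "another octet follows" (the same layout as the MIDI variable-length quantity). The function names are hypothetical. `read_varint` deliberately uses the one-`read(1)`-per-octet loop Chris mentions; `read_u16_length` shows the fixed-size `read(2)` alternative, which bounds the length at 65535.

```python
import io
import struct


def encode_varint(n: int) -> bytes:
    """Encode an unsigned int as big-endian base-128 octets."""
    if n < 0:
        raise ValueError("only unsigned integers are supported")
    groups = [n & 0x7F]                    # final octet: top bit clear
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))  # earlier octets: "another follows"
        n >>= 7
    return bytes(reversed(groups))


def read_varint(stream) -> int:
    """Decode one varint from a file-like object, one read(1) per octet."""
    value = 0
    while True:
        octet = stream.read(1)
        if not octet:
            raise EOFError("stream ended inside a length prefix")
        b = octet[0]
        value = (value << 7) | (b & 0x7F)
        if not b & 0x80:                   # top bit clear: this was the last octet
            return value


def read_u16_length(stream) -> int:
    """Fixed-size alternative: a single read(2), bounded at 65535."""
    header = stream.read(2)
    if len(header) < 2:
        raise EOFError("stream ended inside a length prefix")
    return struct.unpack(">H", header)[0]


# An encoded(length)data packet, as in Cameron's message.
packet = encode_varint(300) + b"x" * 300
stream = io.BytesIO(packet)
length = read_varint(stream)
payload = stream.read(length)
```

With this layout, 127 encodes as one octet, 128 as `b"\x81\x00"`, and 16384 is the first value needing three octets, matching the size ranges quoted above.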


#96922

From	Marko Rauhamaa <marko@pacujo.net>
Date	2015-09-21 10:59 +0300
Message-ID	<876134p5hw.fsf@elektro.pacujo.net>
In reply to	#96921

Chris Angelico <rosuav@gmail.com>:

> On Mon, Sep 21, 2015 at 4:27 PM, Cameron Simpson <cs@zip.com.au> wrote:
>> For sizes below 128, one byte of length. For sizes 128-16383, two bytes. And
>> so on. Compact yet unbounded.
>
> [...]
>
> It's generally a lot faster to do a read(2) than a loop with any
> number of read(1), and you get some kind of bound on your allocations.
> Whether that's important to you or not is another question, but
> certainly your chosen encoding is a good way of allowing arbitrary
> integer values.

You can read a full buffer even if you have a variable-length length
encoding.


Marko
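
One way to read Marko's point, sketched under the assumption of the same `encoded(length)data` framing: do a large read, append it to a buffer, and parse the variable-length prefix out of the buffer instead of reading it octet by octet. `take_frame` is a hypothetical helper; bytes past the end of the frame stay in the buffer for the next call.

```python
def take_frame(buf: bytearray):
    """Remove and return one encoded(length)data frame from the front of buf.

    Returns None, leaving buf untouched, if the frame is not complete yet.
    """
    length = 0
    for i, b in enumerate(buf):
        length = (length << 7) | (b & 0x7F)
        if not b & 0x80:                  # last octet of the length prefix
            start = i + 1
            end = start + length
            if len(buf) < end:
                return None               # payload not fully received yet
            payload = bytes(buf[start:end])
            del buf[:end]                 # consume prefix + payload; excess stays
            return payload
    return None                           # the length prefix itself is incomplete
```

A caller would append each `sock.recv(4096)` result to the buffer and call `take_frame` until it returns None.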


#96923

From	Chris Angelico <rosuav@gmail.com>
Date	2015-09-21 18:07 +1000
Message-ID	<mailman.24.1442822846.28679.python-list@python.org>
In reply to	#96922

On Mon, Sep 21, 2015 at 5:59 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Chris Angelico <rosuav@gmail.com>:
>
>> On Mon, Sep 21, 2015 at 4:27 PM, Cameron Simpson <cs@zip.com.au> wrote:
>>> For sizes below 128, one byte of length. For sizes 128-16383, two bytes. And
>>> so on. Compact yet unbounded.
>>
>> [...]
>>
>> It's generally a lot faster to do a read(2) than a loop with any
>> number of read(1), and you get some kind of bound on your allocations.
>> Whether that's important to you or not is another question, but
>> certainly your chosen encoding is a good way of allowing arbitrary
>> integer values.
>
> You can read a full buffer even if you have a variable-length length
> encoding.

Not sure what you mean there. Unless you can absolutely guarantee that
you didn't read too much, or can absolutely guarantee that your
buffering function will be the ONLY way anything reads from the
socket, buffering is a problem.

ChrisA
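
Chris's "read too much" problem can be reproduced without a network. `socket.makefile("rb")` returns an `io.BufferedReader`, which fills its buffer with a large chunk from the underlying stream. In this sketch an `io.BytesIO` stands in for the socket (an assumption made only so the example is self-contained), carrying a header line followed by a three-byte payload.

```python
import io

# Stand-in for a socket: a text header followed by 3 raw payload bytes.
raw = io.BytesIO(b"DATA 3\r\nxyz")
buffered = io.BufferedReader(raw)

header = buffered.readline()      # b"DATA 3\r\n"
# To find the newline, the BufferedReader pulled a whole chunk from `raw`,
# so the payload now sits in the wrapper's buffer, not in the raw stream.
leftover_in_raw = raw.read()      # b"": a direct read sees nothing
payload = buffered.read(3)        # b"xyz": only the wrapper still has it
```

This is why a plain `recv()` on the socket after using `makefile()` can miss data.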


#96925

From	Marko Rauhamaa <marko@pacujo.net>
Date	2015-09-21 11:38 +0300
Message-ID	<871tdsp3ox.fsf@elektro.pacujo.net>
In reply to	#96923

Chris Angelico <rosuav@gmail.com>:

> On Mon, Sep 21, 2015 at 5:59 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>> You can read a full buffer even if you have a variable-length length
>> encoding.
>
> Not sure what you mean there. Unless you can absolutely guarantee that
> you didn't read too much, or can absolutely guarantee that your
> buffering function will be the ONLY way anything reads from the
> socket, buffering is a problem.

Only one reader can read a socket safely at any given time so mutual
exclusion is needed.

If you read "too much," the excess can be put in the application's read
buffer where it is available for whoever wants to process the next
message.

Marko
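
A sketch of Marko's arrangement, assuming a simple newline-terminated framing for brevity: one lock provides the mutual exclusion between readers, and whatever a `recv` call returns beyond the current message stays in the wrapper's buffer for the next caller. `LineReader` is a hypothetical name; `recv` would typically be a connected socket's `sock.recv`, with one wrapper per socket.

```python
import threading


class LineReader:
    """Hypothetical wrapper: one leftover buffer per socket, guarded by a lock."""

    def __init__(self, recv, chunk_size=4096):
        self._recv = recv                  # e.g. sock.recv; any callable(n) -> bytes
        self._chunk_size = chunk_size
        self._buf = bytearray()            # excess bytes from earlier reads
        self._lock = threading.Lock()      # only one reader at a time

    def read_line(self) -> bytes:
        with self._lock:
            while True:
                end = self._buf.find(b"\n")
                if end >= 0:
                    line = bytes(self._buf[:end + 1])
                    del self._buf[:end + 1]    # excess stays for the next call
                    return line
                chunk = self._recv(self._chunk_size)
                if not chunk:
                    raise EOFError("connection closed")
                self._buf += chunk
```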


#96926

From	Chris Angelico <rosuav@gmail.com>
Date	2015-09-21 18:45 +1000
Message-ID	<mailman.25.1442825151.28679.python-list@python.org>
In reply to	#96925

On Mon, Sep 21, 2015 at 6:38 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Chris Angelico <rosuav@gmail.com>:
>
>> On Mon, Sep 21, 2015 at 5:59 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>>> You can read a full buffer even if you have a variable-length length
>>> encoding.
>>
>> Not sure what you mean there. Unless you can absolutely guarantee that
>> you didn't read too much, or can absolutely guarantee that your
>> buffering function will be the ONLY way anything reads from the
>> socket, buffering is a problem.
>
> Only one reader can read a socket safely at any given time so mutual
> exclusion is needed.
>
> If you read "too much," the excess can be put in the application's read
> buffer where it is available for whoever wants to process the next
> message.

Which works only if you have a single concept of "application's read
buffer". That means that you have only one place that can ever read
data. Imagine a


#96928

From	Marko Rauhamaa <marko@pacujo.net>
Date	2015-09-21 11:48 +0300
Message-ID	<87si68nooe.fsf@elektro.pacujo.net>
In reply to	#96926

Chris Angelico <rosuav@gmail.com>:

> On Mon, Sep 21, 2015 at 6:38 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>> Only one reader can read a socket safely at any given time so mutual
>> exclusion is needed.
>>
>> If you read "too much," the excess can be put in the application's read
>> buffer where it is available for whoever wants to process the next
>> message.
>
> Which works only if you have a single concept of "application's read
> buffer".

Well, the socket's read buffer.


Marko


#96929

From	Marko Rauhamaa <marko@pacujo.net>
Date	2015-09-21 11:49 +0300
Message-ID	<87lhc0nolx.fsf@elektro.pacujo.net>
In reply to	#96928

Marko Rauhamaa <marko@pacujo.net>:

> Chris Angelico <rosuav@gmail.com>:
>
>> On Mon, Sep 21, 2015 at 6:38 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>>> Only one reader can read a socket safely at any given time so mutual
>>> exclusion is needed.
>>>
>>> If you read "too much," the excess can be put in the application's read
>>> buffer where it is available for whoever wants to process the next
>>> message.
>>
>> Which works only if you have a single concept of "application's read
>> buffer".
>
> Well, the socket's read buffer.

To be exact, the application should associate a read buffer with each
socket.


Marko


#96930

From	Chris Angelico <rosuav@gmail.com>
Date	2015-09-21 18:50 +1000
Message-ID	<mailman.26.1442825411.28679.python-list@python.org>
In reply to	#96925

On Mon, Sep 21, 2015 at 6:38 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Chris Angelico <rosuav@gmail.com>:
>
>> On Mon, Sep 21, 2015 at 5:59 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>>> You can read a full buffer even if you have a variable-length length
>>> encoding.
>>
>> Not sure what you mean there. Unless you can absolutely guarantee that
>> you didn't read too much, or can absolutely guarantee that your
>> buffering function will be the ONLY way anything reads from the
>> socket, buffering is a problem.
>
> Only one reader can read a socket safely at any given time so mutual
> exclusion is needed.
>
> If you read "too much," the excess can be put in the application's read
> buffer where it is available for whoever wants to process the next
> message.

Oops, premature send - sorry! Trying again.

Which works only if you have a single concept of "application's read
buffer". That means that you have only one place that can ever read
data. Imagine a protocol that mainly consists of lines of text
terminated by CRLF, but allows binary data to be transmitted by
sending "DATA N\r\n" followed by N arbitrary bytes. The simplest and
most obvious way to handle the base protocol is to buffer your reads
as much as possible, but that means potentially reading the beginning
of the data stream along with its header. You therefore cannot use the
basic read() method to read that data - you have to use something from
your line-based wrapper, even though you are decidedly NOT using a
line-based protocol at that point.

That's what I mean by guaranteeing that your buffering function is the
only way data gets read from the socket. Either that, or you need an
underlying facility for un-reading a bunch of data - de-buffering and
making it readable again.

ChrisA
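
A sketch of the situation Chris describes, assuming his hypothetical `DATA N` protocol: every read of the socket goes through one wrapper, so `read_exactly` must drain the same buffer `read_line` filled instead of calling `sock.recv` directly. `unread` is one possible form of the "un-reading" facility he mentions. All names here are illustrative.

```python
class ProtocolReader:
    """Hypothetical wrapper: ALL reads of one socket must go through it."""

    def __init__(self, recv, chunk_size=4096):
        self._recv = recv                  # e.g. sock.recv
        self._chunk_size = chunk_size
        self._buf = bytearray()            # bytes read but not yet consumed

    def _fill(self):
        chunk = self._recv(self._chunk_size)
        if not chunk:
            raise EOFError("connection closed")
        self._buf += chunk

    def read_line(self) -> bytes:
        """Line mode: return one CRLF-terminated line."""
        end = self._buf.find(b"\r\n")
        while end < 0:
            self._fill()
            end = self._buf.find(b"\r\n")
        line = bytes(self._buf[:end + 2])
        del self._buf[:end + 2]
        return line

    def read_exactly(self, n: int) -> bytes:
        """Binary mode: drain the shared buffer first; a direct sock.recv()
        here would skip bytes that read_line already buffered."""
        while len(self._buf) < n:
            self._fill()
        data = bytes(self._buf[:n])
        del self._buf[:n]
        return data

    def unread(self, data: bytes) -> None:
        """The "un-reading" alternative: push bytes back to the front."""
        self._buf[:0] = data


def read_message(reader: ProtocolReader):
    """One message: a text line, or "DATA N\\r\\n" followed by N raw bytes."""
    line = reader.read_line()
    if line.startswith(b"DATA "):
        n = int(line[5:-2])                # ValueError if N is malformed
        return ("data", reader.read_exactly(n))
    return ("line", line[:-2])
```

With a real socket this would be `ProtocolReader(sock.recv)`; a payload containing CRLF is returned intact because binary mode counts bytes rather than looking for line ends.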


#96975

From	Jorgen Grahn <grahn+nntp@snipabacken.se>
Date	2015-09-22 07:59 +0000
Message-ID	<slrnn022ia.eij.grahn+nntp@frailea.sa.invalid>
In reply to	#96930

On Mon, 2015-09-21, Chris Angelico wrote:
> On Mon, Sep 21, 2015 at 6:38 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>> Chris Angelico <rosuav@gmail.com>:
>>
>>> On Mon, Sep 21, 2015 at 5:59 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>>>> You can read a full buffer even if you have a variable-length length
>>>> encoding.
>>>
>>> Not sure what you mean there. Unless you can absolutely guarantee that
>>> you didn't read too much, or can absolutely guarantee that your
>>> buffering function will be the ONLY way anything reads from the
>>> socket, buffering is a problem.
>>
>> Only one reader can read a socket safely at any given time so mutual
>> exclusion is needed.
>>
>> If you read "too much," the excess can be put in the application's read
>> buffer where it is available for whoever wants to process the next
>> message.
>
> Oops, premature send - sorry! Trying again.
>
> Which works only if you have a single concept of "application's read
> buffer". That means that you have only one place that can ever read
> data. Imagine a protocol that mainly consists of lines of text
> terminated by CRLF, but allows binary data to be transmitted by
> sending "DATA N\r\n" followed by N arbitrary bytes. The simplest and
> most obvious way to handle the base protocol is to buffer your reads
> as much as possible, but that means potentially reading the beginning
> of the data stream along with its header. You therefore cannot use the
> basic read() method to read that data - you have to use something from
> your line-based wrapper, even though you are decidedly NOT using a
> line-based protocol at that point.
>
> That's what I mean by guaranteeing that your buffering function is the
> only way data gets read from the socket. Either that, or you need an
> underlying facility for un-reading a bunch of data - de-buffering and
> making it readable again.

The way it seems to me, reading a TCP socket always ends up as:

- keep an application buffer
- do one socket read and append to the buffer
- consume zero or more complete "entries" from the beginning
  of the buffer; keep the incomplete one which may exist
  at the end
- go back and read some more when there's a chance more data
  has arrived
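
The loop above, sketched as a Python generator. `recv` is assumed to be a blocking callable such as `sock.recv`, so each call doubles as "wait until more data may have arrived"; `split_entries` is a hypothetical parser that removes complete entries from the front of the buffer and leaves an incomplete one in place. `split_lines` is a trivial example parser.

```python
def read_entries(recv, split_entries, chunk_size=4096):
    """Yield complete entries from a byte stream."""
    buf = bytearray()                          # the application buffer
    while True:
        chunk = recv(chunk_size)               # one socket read
        if not chunk:                          # peer closed the connection
            if buf:
                raise EOFError("connection closed mid-entry")
            return
        buf += chunk                           # append to the buffer
        for entry in split_entries(buf):       # consume 0 or more complete entries;
            yield entry                        # an incomplete tail stays in buf


def split_lines(buf: bytearray) -> list:
    """Example parser: newline-terminated entries."""
    entries = []
    while True:
        end = buf.find(b"\n")
        if end < 0:
            return entries
        entries.append(bytes(buf[:end + 1]))
        del buf[:end + 1]
```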

So the buffer is a circular buffer of octets, which you chop up
by parsing it so you can see it as a circular buffer of complete and
incomplete entries or messages.

At that level, yes, the line-oriented data and the binary data would
coexist in the same application buffer.
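
To illustrate that last point, a sketch of a parser for Chris's hypothetical `DATA N` protocol that works directly on the application buffer, so text lines and binary payloads are consumed from the same bytes. It is meant to be called after each socket read appends to the buffer; `split_entries` is an illustrative name, and if a payload is still incomplete its header line is left in the buffer as well.

```python
def split_entries(buf: bytearray) -> list:
    """Consume every complete entry at the front of buf.

    Entries are CRLF-terminated text lines, except that a line "DATA N"
    is followed by N raw bytes. An incomplete entry stays in buf.
    """
    out = []
    while True:
        end = buf.find(b"\r\n")
        if end < 0:
            return out                         # no complete line yet
        line = bytes(buf[:end])
        if line.startswith(b"DATA "):
            n = int(line[5:])                  # ValueError if N is malformed
            total = end + 2 + n
            if len(buf) < total:
                return out                     # payload incomplete: keep header too
            out.append(("data", bytes(buf[end + 2:total])))
            del buf[:total]
        else:
            out.append(("line", line))
            del buf[:end + 2]
```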

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .
