Reading all buffered bytes without blocking

Started by	Paul Moore <p.f.moore@gmail.com>
First post	2015-03-03 08:07 -0800
Last post	2015-03-05 22:33 -0800
Articles	6 — 4 participants

  Reading all buffered bytes without blocking Paul  Moore <p.f.moore@gmail.com> - 2015-03-03 08:07 -0800
    Re: Reading all buffered bytes without blocking Serhiy Storchaka <storchaka@gmail.com> - 2015-03-03 21:28 +0200
      Re: Reading all buffered bytes without blocking Paul  Moore <p.f.moore@gmail.com> - 2015-03-03 13:10 -0800
        Re: Reading all buffered bytes without blocking wxjmfauth@gmail.com - 2015-03-05 00:41 -0800
          Re: Reading all buffered bytes without blocking jornws0718@xs4all.nl (Oscar) - 2015-03-05 11:15 +0000
            Re: Reading all buffered bytes without blocking wxjmfauth@gmail.com - 2015-03-05 22:33 -0800

#86852 — Reading all buffered bytes without blocking

From	Paul Moore <p.f.moore@gmail.com>
Date	2015-03-03 08:07 -0800
Subject	Reading all buffered bytes without blocking
Message-ID	<4ad47d38-bd92-4516-bc60-60fddc9e0666@googlegroups.com>

Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().

I need this because I want to decode the returned bytes from UTF-8, and I *might* get a character split across the boundary of any arbitrary block size I choose. (I'm happy to ignore the possibility that the *source* did a flush part-way through a character.) I don't really want to have to do incremental decoding if I can avoid it - it looks hard...

Thanks,
Paul

#86861

From	Serhiy Storchaka <storchaka@gmail.com>
Date	2015-03-03 21:28 +0200
Message-ID	<mailman.25.1425410913.21433.python-list@python.org>
In reply to	#86852

On 03.03.15 18:07, Paul Moore wrote:
> Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().

Just specify a large size.
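
A minimal sketch of that suggestion, not part of the original reply. The helper name `read_available` and the 64 KiB limit are assumptions chosen for illustration; the pipe only stands in for a real source such as a child process's stdout.

```python
import os

def read_available(stream, limit=1 << 16):
    """Return buffered bytes if there are any, else the result of one raw read."""
    # read1() makes at most one call to the underlying raw stream, so it
    # returns as soon as some data is available instead of waiting for `limit`.
    return stream.read1(limit)

# Demo: five bytes are waiting in a pipe; read1() returns just those bytes.
r, w = os.pipe()
os.write(w, b"hello")
with open(r, "rb") as reader:  # open() in binary mode gives a BufferedReader
    chunk = read_available(reader)
os.close(w)
print(chunk)
```

The limit only caps how much one call may return. If no data is available at all, the call still waits for the first bytes unless the underlying descriptor is non-blocking.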

#86867

From	Paul Moore <p.f.moore@gmail.com>
Date	2015-03-03 13:10 -0800
Message-ID	<a56af750-eeb0-49a1-ab2c-f2020f0585fb@googlegroups.com>
In reply to	#86861

On Tuesday, 3 March 2015 19:29:19 UTC, Serhiy Storchaka  wrote:
> On 03.03.15 18:07, Paul Moore wrote:
> > Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().
> 
> Just specify a large size.

Thanks. Looking at the source, it appears that a large size will allocate a buffer of that size for the data even if the amount actually read is small (thinking about it, of course it has to, doh, because the syscall needs it).

Anyway, it's a pretty microscopic risk in practice, and when I looked at them, the incremental codecs (codecs.iterdecode) really aren't that hard to use, so I can do it that way if it matters enough.

For what it's worth, in case anyone wants to know, incremental decoding looks like this:

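import codecs
import sys

# Assumes `process` is a subprocess.Popen with stdout=PIPE and
# `encoding` names the child's output encoding (e.g. 'utf-8').
def get():
    while True:
        data = process.stdout.read(1000)  # chunk may end mid-character
        if not data:
            break
        yield data

# iterdecode holds back an incomplete character until the next chunk.
for data in codecs.iterdecode(get(), encoding):
    sys.stdout.write(data)
    sys.stdout.flush()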

Thanks.
Paul
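
For readers who want to combine both ideas from this thread, here is a hedged sketch that pairs read1() with an explicit incremental decoder. The function name `decode_stream`, the `limit` parameter, and the in-memory demo stream are assumptions for illustration, not code from the thread.

```python
import codecs
import io

def decode_stream(stream, encoding="utf-8", limit=1 << 16):
    """Yield decoded text from a binary stream, one read1() call at a time."""
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        data = stream.read1(limit)
        if not data:
            break
        # Bytes of an unfinished character stay inside the decoder.
        text = decoder.decode(data)
        if text:
            yield text
    # final=True raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

# Demo: a 2-byte limit forces the 3-byte euro sign to be split across reads.
reader = io.BufferedReader(io.BytesIO("a\u20acb".encode("utf-8")))
result = "".join(decode_stream(reader, limit=2))
print(result)
```

Compared with codecs.iterdecode, holding the decoder object directly makes it easy to flush or act on each piece of text as soon as it arrives.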

#86938

From	wxjmfauth@gmail.com
Date	2015-03-05 00:41 -0800
Message-ID	<f137c6cb-81ea-41bd-8387-2542a7fae5f9@googlegroups.com>
In reply to	#86867

On Tuesday, 3 March 2015 22:10:36 UTC+1, Paul Moore wrote:
> On Tuesday, 3 March 2015 19:29:19 UTC, Serhiy Storchaka  wrote:
> > On 03.03.15 18:07, Paul Moore wrote:
> > > Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().
> > 
> > Just specify a large size.
> 
> Thanks. Looking at the source, it appears that a large size will allocate a buffer of that size for the data even if the amount actually read is small (thinking about it, of course it has to, doh, because the syscall needs it).
> 
> Anyway, it's a pretty microscopic risk in practice, and when I looked at them, the incremental codecs (codecs.iterdecode) really aren't that hard to use, so I can do it that way if it matters enough.
> 
> For what it's worth, in case anyone wants to know, incremental decoding looks like this:
> 
> def get():
>     while True:
>         data = process.stdout.read(1000)
>         if not data:
>             break
>         yield data
> for data in codecs.iterdecode(get(), encoding):
>     sys.stdout.write(data)
>     sys.stdout.flush()
> 
> Thanks.
> Paul

======

>>> buffer = ('a'*998 + '\u20ac').encode('utf-8')[:1000]
>>> buffer.decode('utf-8')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 998-999: unexpected end of data
>>> 
>>> # BOUM
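
For comparison, a small sketch (not from the original post) that feeds the same bytes through codecs.iterdecode, split at the same 1000-byte boundary. The variable names are assumptions for illustration.

```python
import codecs

# The same data as above: 998 ASCII bytes followed by a 3-byte euro sign.
payload = ("a" * 998 + "\u20ac").encode("utf-8")

# Split at the 1000-byte mark, so the euro sign straddles two chunks.
chunks = [payload[:1000], payload[1000:]]

# A one-shot decode of the first chunk fails, as in the session above...
try:
    chunks[0].decode("utf-8")
    plain_decode_failed = False
except UnicodeDecodeError:
    plain_decode_failed = True

# ...but iterdecode keeps the dangling bytes until the next chunk arrives.
text = "".join(codecs.iterdecode(chunks, "utf-8"))
print(plain_decode_failed, len(text))
```

The failure in the session comes from decoding a truncated chunk in one shot. An incremental decoder applied to the same chunks reassembles the character.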

#86940

From	jornws0718@xs4all.nl (Oscar)
Date	2015-03-05 11:15 +0000
Message-ID	<54f83ab7$0$2849$e4fe514c@news2.news.xs4all.nl>
In reply to	#86938

In article <f137c6cb-81ea-41bd-8387-2542a7fae5f9@googlegroups.com>,
 <wxjmfauth@gmail.com> wrote:
>>>> buffer = ('a'*998 + '\u20ac').encode('utf-8')[:1000]
>>>> buffer.decode('utf-8')
>Traceback (most recent call last):
>  File "<eta last command>", line 1, in <module>
>UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 998-999: 
>unexpected end of data
>>>> 
>>>> # BOUM

hmm...

>>> import sys as jmr
>>> input = jmr.stdin.fileno()
>>> output = jmr.stdout.fileno()
>>> value = output / input
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> # BOUM

-- 
[J|O|R] <- .signature.gz

#86988

From	wxjmfauth@gmail.com
Date	2015-03-05 22:33 -0800
Message-ID	<9adf48f2-5f3e-4000-8968-7879ace1ff20@googlegroups.com>
In reply to	#86940

On Thursday, 5 March 2015 12:16:06 UTC+1, Oscar wrote:
> In article <f137c6cb-81ea-41bd-8387-2542a7fae5f9@googlegroups.com>,
>  <wxjmfauth@gmail.com> wrote:
> >>>> buffer = ('a'*998 + '\u20ac').encode('utf-8')[:1000]
> >>>> buffer.decode('utf-8')
> >Traceback (most recent call last):
> >  File "<eta last command>", line 1, in <module>
> >UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 998-999: 
> >unexpected end of data
> >>>> 
> >>>> # BOUM
> 
> hmm...
> 
> >>> import sys as jmr
> >>> input = jmr.stdin.fileno()
> >>> output = jmr.stdout.fileno()
> >>> value = output / input
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ZeroDivisionError: integer division or modulo by zero
> >>> # BOUM
> 
> -- 
> [J|O|R] <- .signature.gz

==========

My BOUM is much better:
*** It can happen at any time ***.

Your BOUM happens only if one does it explicitly.

I will not stop people from writing buggy Unicode code.
