

Groups > comp.lang.python > #86852 > unrolled thread

Reading all buffered bytes without blocking

Started by: Paul Moore <p.f.moore@gmail.com>
First post: 2015-03-03 08:07 -0800
Last post: 2015-03-05 22:33 -0800
Articles: 6 (4 participants)



Contents

  Reading all buffered bytes without blocking Paul  Moore <p.f.moore@gmail.com> - 2015-03-03 08:07 -0800
    Re: Reading all buffered bytes without blocking Serhiy Storchaka <storchaka@gmail.com> - 2015-03-03 21:28 +0200
      Re: Reading all buffered bytes without blocking Paul  Moore <p.f.moore@gmail.com> - 2015-03-03 13:10 -0800
        Re: Reading all buffered bytes without blocking wxjmfauth@gmail.com - 2015-03-05 00:41 -0800
          Re: Reading all buffered bytes without blocking jornws0718@xs4all.nl (Oscar) - 2015-03-05 11:15 +0000
            Re: Reading all buffered bytes without blocking wxjmfauth@gmail.com - 2015-03-05 22:33 -0800

#86852 — Reading all buffered bytes without blocking

From: Paul Moore <p.f.moore@gmail.com>
Date: 2015-03-03 08:07 -0800
Subject: Reading all buffered bytes without blocking
Message-ID: <4ad47d38-bd92-4516-bc60-60fddc9e0666@googlegroups.com>

Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().

I need this because I want to decode the returned bytes from UTF-8, and I *might* get a character split across the boundary of any arbitrary block size I choose. (I'm happy to ignore the possibility that the *source* did a flush part-way through a character.) I don't really want to have to do incremental decoding if I can avoid it - it looks hard...

Thanks,
Paul



#86861

From: Serhiy Storchaka <storchaka@gmail.com>
Date: 2015-03-03 21:28 +0200
Message-ID: <mailman.25.1425410913.21433.python-list@python.org>
In reply to: #86852

On 03.03.15 18:07, Paul Moore wrote:
> Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().

Just specify large size.
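
A minimal sketch of this suggestion, assuming an OS pipe stands in for the real stream. The helper name read_available and the 65536 limit are illustrative choices, not from the thread. read1() with a generous size returns whatever is already buffered, or the result of at most one raw read, rather than waiting for the full size to arrive.

```python
import os


def read_available(stream, limit=65536):
    """Return the buffered bytes, or the result of at most one raw read.

    `limit` is only an upper bound; read1() does not wait for that many bytes.
    """
    return stream.read1(limit)


def demo():
    # A pipe stands in for something like a child process's stdout.
    r, w = os.pipe()
    with os.fdopen(r, "rb") as reader, os.fdopen(w, "wb") as writer:
        writer.write(b"hello")
        writer.flush()  # push the bytes into the pipe
        # Only 5 bytes are available; read1() returns them immediately.
        return read_available(reader)


if __name__ == "__main__":
    print(demo())
```

As far as I know, Python 3.7 and later also make the size argument of read1() optional, which would cover the original request directly.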



#86867

From: Paul Moore <p.f.moore@gmail.com>
Date: 2015-03-03 13:10 -0800
Message-ID: <a56af750-eeb0-49a1-ab2c-f2020f0585fb@googlegroups.com>
In reply to: #86861

On Tuesday, 3 March 2015 19:29:19 UTC, Serhiy Storchaka  wrote:
> On 03.03.15 18:07, Paul Moore wrote:
> > Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().
> 
> Just specify large size.

Thanks. Looking at the source, it appears that a large size will allocate a buffer that size for the data even if the amount actually read is small (thinking about it, of course it has to, doh, because the syscall needs it).

Anyway, it's a pretty microscopic risk in practice, and when I looked at them, the incremental codecs (codecs.iterdecode) really aren't that hard to use, so I can do it that way if it matters enough.

For what it's worth, in case anyone wants to know, incremental decoding looks like this:

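import codecs
import sys

# Assumed context: `process` is a subprocess.Popen started with
# stdout=subprocess.PIPE, and `encoding` names its output encoding.
def get():
    # Yield raw byte chunks until end of file.
    while True:
        data = process.stdout.read(1000)
        if not data:
            break
        yield data

# iterdecode holds back an incomplete multi-byte sequence until the next chunk.
for data in codecs.iterdecode(get(), encoding):
    sys.stdout.write(data)
    sys.stdout.flush()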
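
The same idea can be sketched with an explicit incremental decoder combined with read1(), so each chunk is decoded as soon as it arrives. This is only an illustration: decode_stream and chunk_size are made-up names, and `stream` is assumed to be any buffered binary stream.

```python
import codecs
import io


def decode_stream(stream, encoding="utf-8", chunk_size=1000):
    """Yield decoded text from a binary stream, chunk by chunk."""
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        chunk = stream.read1(chunk_size)  # at most one raw read
        if not chunk:
            break
        text = decoder.decode(chunk)  # incomplete trailing bytes are kept
        if text:
            yield text
    # final=True raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


if __name__ == "__main__":
    # The euro sign takes 3 bytes in UTF-8, so it straddles the 1000-byte mark.
    payload = ("a" * 998 + "\u20ac").encode("utf-8")
    stream = io.BufferedReader(io.BytesIO(payload))
    print("".join(decode_stream(stream)) == "a" * 998 + "\u20ac")
```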

Thanks.
Paul



#86938

From: wxjmfauth@gmail.com
Date: 2015-03-05 00:41 -0800
Message-ID: <f137c6cb-81ea-41bd-8387-2542a7fae5f9@googlegroups.com>
In reply to: #86867

On Tuesday, 3 March 2015 22:10:36 UTC+1, Paul Moore wrote:
> On Tuesday, 3 March 2015 19:29:19 UTC, Serhiy Storchaka  wrote:
> > On 03.03.15 18:07, Paul Moore wrote:
> > > Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().
> > 
> > Just specify large size.
> 
> Thanks. Looking at the source, it appears that a large size will allocate a buffer that size for the data even if the amount actually read is small (thinking about it, of course it has to, doh, because the syscall needs it).
> 
> Anyway, it's a pretty microscopic risk in practice, and when I looked at them, the incremental codecs (codecs.iterdecode) really aren't that hard to use, so I can do it that way if it matters enough.
> 
> For what it's worth, in case anyone wants to know, incremental decoding looks like this:
> 
> def get():
>     while True:
>         data = process.stdout.read(1000)
>         if not data:
>             break
>         yield data
> for data in codecs.iterdecode(get(), encoding):
>     sys.stdout.write(data)
>     sys.stdout.flush()
> 
> Thanks.
> Paul

======

>>> buffer = ('a'*998 + '\u20ac').encode('utf-8')[:1000]
>>> buffer.decode('utf-8')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 998-999: unexpected end of data
>>> 
>>> # BOUM
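
For comparison, here is a small sketch of how the same split behaves when the two pieces go through codecs.iterdecode instead of a one-shot decode. The variable and function names are illustrative, and the split point at byte 1000 is the one from the session above.

```python
import codecs

encoded = ("a" * 998 + "\u20ac").encode("utf-8")  # 998 + 3 = 1001 bytes
first, rest = encoded[:1000], encoded[1000:]      # euro sign split 2 + 1


def decode_chunks(chunks, encoding="utf-8"):
    """Decode an iterable of byte chunks with an incremental decoder."""
    return list(codecs.iterdecode(chunks, encoding))


pieces = decode_chunks([first, rest])
# The two dangling bytes of the euro sign are held back until `rest` arrives.
print([len(p) for p in pieces])
```

The one-shot first.decode('utf-8') still fails, as shown above; the incremental decoder only avoids the error because it keeps the incomplete sequence between calls.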



#86940

From: jornws0718@xs4all.nl (Oscar)
Date: 2015-03-05 11:15 +0000
Message-ID: <54f83ab7$0$2849$e4fe514c@news2.news.xs4all.nl>
In reply to: #86938

In article <f137c6cb-81ea-41bd-8387-2542a7fae5f9@googlegroups.com>,
 <wxjmfauth@gmail.com> wrote:
>>>> buffer = ('a'*998 + '\u20ac').encode('utf-8')[:1000]
>>>> buffer.decode('utf-8')
>Traceback (most recent call last):
>  File "<eta last command>", line 1, in <module>
>UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 998-999: 
>unexpected end of data
>>>> 
>>>> # BOUM

hmm...

>>> import sys as jmr
>>> input = jmr.stdin.fileno()
>>> output = jmr.stdout.fileno()
>>> value = output / input
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> # BOUM

-- 
[J|O|R] <- .signature.gz



#86988

From: wxjmfauth@gmail.com
Date: 2015-03-05 22:33 -0800
Message-ID: <9adf48f2-5f3e-4000-8968-7879ace1ff20@googlegroups.com>
In reply to: #86940

On Thursday, 5 March 2015 12:16:06 UTC+1, Oscar wrote:
> In article <f137c6cb-81ea-41bd-8387-2542a7fae5f9@googlegroups.com>,
>  <wxjmfauth@gmail.com> wrote:
> >>>> buffer = ('a'*998 + '\u20ac').encode('utf-8')[:1000]
> >>>> buffer.decode('utf-8')
> >Traceback (most recent call last):
> >  File "<eta last command>", line 1, in <module>
> >UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 998-999: 
> >unexpected end of data
> >>>> 
> >>>> # BOUM
> 
> hmm...
> 
> >>> import sys as jmr
> >>> input = jmr.stdin.fileno()
> >>> output = jmr.stdout.fileno()
> >>> value = output / input
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ZeroDivisionError: integer division or modulo by zero
> >>> # BOUM
> 
> -- 
> [J|O|R] <- .signature.gz

==========

My BOUM is much better:
*** It can happen at any time ***.

Your BOUM happens only if one explicitly does it.

I will not stop people from writing buggy Unicode code.


