Groups > comp.lang.python > #86852 > unrolled thread
| Started by | Paul Moore <p.f.moore@gmail.com> |
|---|---|
| First post | 2015-03-03 08:07 -0800 |
| Last post | 2015-03-05 22:33 -0800 |
| Articles | 6 — 4 participants |
Reading all buffered bytes without blocking Paul Moore <p.f.moore@gmail.com> - 2015-03-03 08:07 -0800
Re: Reading all buffered bytes without blocking Serhiy Storchaka <storchaka@gmail.com> - 2015-03-03 21:28 +0200
Re: Reading all buffered bytes without blocking Paul Moore <p.f.moore@gmail.com> - 2015-03-03 13:10 -0800
Re: Reading all buffered bytes without blocking wxjmfauth@gmail.com - 2015-03-05 00:41 -0800
Re: Reading all buffered bytes without blocking jornws0718@xs4all.nl (Oscar) - 2015-03-05 11:15 +0000
Re: Reading all buffered bytes without blocking wxjmfauth@gmail.com - 2015-03-05 22:33 -0800
| From | Paul Moore <p.f.moore@gmail.com> |
|---|---|
| Date | 2015-03-03 08:07 -0800 |
| Subject | Reading all buffered bytes without blocking |
| Message-ID | <4ad47d38-bd92-4516-bc60-60fddc9e0666@googlegroups.com> |
Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().

I need this because I want to decode the returned bytes from UTF-8, and I *might* get a character split across the boundary of any arbitrary block size I choose. (I'm happy to ignore the possibility that the *source* did a flush part-way through a character).

I don't really want to have to do incremental decoding if I can avoid it - it looks hard...

Thanks,
Paul
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2015-03-03 21:28 +0200 |
| Message-ID | <mailman.25.1425410913.21433.python-list@python.org> |
| In reply to | #86852 |
On 03.03.15 18:07, Paul Moore wrote:
> Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().

Just specify large size.
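A minimal sketch of that suggestion, assuming a pipe stands in for the real stream. The helper name read_available and the 64 KiB cap are illustrative choices, not anything from the thread. read1() returns buffered bytes if there are any, and otherwise makes at most one read call on the underlying raw stream, so the large size only acts as an upper bound.

```python
import os

def read_available(stream, max_bytes=1 << 16):
    """Return buffered bytes, or the result of at most one raw read.

    max_bytes is an arbitrary illustrative cap, not a value from the thread.
    """
    return stream.read1(max_bytes)

# A pipe stands in for the real source (for example a child process's stdout).
r, w = os.pipe()
os.write(w, b"hello")

with os.fdopen(r, "rb") as reader:  # os.fdopen(..., "rb") gives a BufferedReader
    chunk = read_available(reader)  # returns the 5 bytes that are ready

os.close(w)
print(chunk)
```

Only the bytes already sitting in the pipe come back; the call does not wait for the full 64 KiB.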
| From | Paul Moore <p.f.moore@gmail.com> |
|---|---|
| Date | 2015-03-03 13:10 -0800 |
| Message-ID | <a56af750-eeb0-49a1-ab2c-f2020f0585fb@googlegroups.com> |
| In reply to | #86861 |
On Tuesday, 3 March 2015 19:29:19 UTC, Serhiy Storchaka wrote:
> On 03.03.15 18:07, Paul Moore wrote:
> > Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().
>
> Just specify large size.
Thanks. Looking at the source, it appears that a large size will allocate a buffer that size for the data even if the amount actually read is small (thinking about it, of course it has to, doh, because the syscall needs it).
Anyway, it's a pretty microscopic risk in practice, and when I looked at them, the incremental codecs (codecs.iterdecode) really aren't that hard to use, so I can do it that way if it matters enough.
For what it's worth, in case anyone wants to know, incremental decoding looks like this:
def get():
    while True:
        data = process.stdout.read(1000)
        if not data:
            break
        yield data

for data in codecs.iterdecode(get(), encoding):
    sys.stdout.write(data)
    sys.stdout.flush()
Thanks.
Paul
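For anyone who wants to see what the incremental codec does at a chunk boundary, here is a small self-contained sketch. It uses codecs.getincrementaldecoder directly rather than iterdecode; the function name decode_chunks and the sample payload are made up for illustration. The decoder holds back an incomplete multi-byte sequence until the rest arrives.

```python
import codecs

def decode_chunks(chunks, encoding="utf-8"):
    """Yield text for each chunk, carrying partial characters across chunks."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # final=True raises UnicodeDecodeError if the input ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

payload = "a\u20acb".encode("utf-8")   # the euro sign takes three bytes
# Split inside the euro sign: first chunk ends after its first byte.
pieces = list(decode_chunks([payload[:2], payload[2:]]))
print(pieces)
```

The first chunk yields only "a"; the euro sign appears once its remaining two bytes arrive in the second chunk.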
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2015-03-05 00:41 -0800 |
| Message-ID | <f137c6cb-81ea-41bd-8387-2542a7fae5f9@googlegroups.com> |
| In reply to | #86867 |
On Tuesday, 3 March 2015 22:10:36 UTC+1, Paul Moore wrote:
> On Tuesday, 3 March 2015 19:29:19 UTC, Serhiy Storchaka wrote:
> > On 03.03.15 18:07, Paul Moore wrote:
> > > Is it possible to say to a BufferedReader stream "give me all the bytes you have available in the buffer, or do one OS call and give me everything you get back"? The problem is that the "number of bytes" argument to read1() isn't optional, so I can't do available_bytes = fd.read1().
> >
> > Just specify large size.
>
> Thanks. Looking at the source, it appears that a large size will allocate a buffer that size for the data even if the amount actually read is small (thinking about it, of course it has to, doh, because the syscall needs it).
>
> Anyway, it's a pretty microscopic risk in practice, and when I looked at them, the incremental codecs (codecs.iterdecode) really aren't that hard to use, so I can do it that way if it matters enough.
>
> For what it's worth, in case anyone wants to know, incremental decoding looks like this:
>
> def get():
>     while True:
>         data = process.stdout.read(1000)
>         if not data:
>             break
>         yield data
> for data in codecs.iterdecode(get(), encoding):
>     sys.stdout.write(data)
>     sys.stdout.flush()
>
> Thanks.
> Paul
======
>>> buffer = ('a'*998 + '\u20ac').encode('utf-8')[:1000]
>>> buffer.decode('utf-8')
Traceback (most recent call last):
File "<eta last command>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 998-999:
unexpected end of data
>>>
>>> # BOUM
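For comparison, a sketch of the same 1000-byte slice fed through codecs.iterdecode, assuming the one remaining byte arrives in a later chunk. The names encoded, head, and tail are illustrative. The one-shot decode of the truncated slice fails as shown above, while the incremental decoder waits for the rest of the character.

```python
import codecs

encoded = ("a" * 998 + "\u20ac").encode("utf-8")   # 998 + 3 = 1001 bytes
head, tail = encoded[:1000], encoded[1000:]        # split inside the euro sign

# One-shot decoding of the truncated slice raises, as in the session above.
try:
    head.decode("utf-8")
    one_shot_failed = False
except UnicodeDecodeError:
    one_shot_failed = True

# Incremental decoding keeps the two dangling bytes until the third arrives.
text = "".join(codecs.iterdecode([head, tail], "utf-8"))
print(one_shot_failed, len(text))
```

If the stream really ended after head, iterdecode would still raise UnicodeDecodeError on its final step, which is the case where the data itself is truncated.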
| From | jornws0718@xs4all.nl (Oscar) |
|---|---|
| Date | 2015-03-05 11:15 +0000 |
| Message-ID | <54f83ab7$0$2849$e4fe514c@news2.news.xs4all.nl> |
| In reply to | #86938 |
In article <f137c6cb-81ea-41bd-8387-2542a7fae5f9@googlegroups.com>,
<wxjmfauth@gmail.com> wrote:
>>>> buffer = ('a'*998 + '\u20ac').encode('utf-8')[:1000]
>>>> buffer.decode('utf-8')
>Traceback (most recent call last):
> File "<eta last command>", line 1, in <module>
>UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 998-999:
>unexpected end of data
>>>>
>>>> # BOUM
hmm...
>>> import sys as jmr
>>> input = jmr.stdin.fileno()
>>> output = jmr.stdout.fileno()
>>> value = output / input
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> # BOUM
--
[J|O|R] <- .signature.gz
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2015-03-05 22:33 -0800 |
| Message-ID | <9adf48f2-5f3e-4000-8968-7879ace1ff20@googlegroups.com> |
| In reply to | #86940 |
On Thursday, 5 March 2015 12:16:06 UTC+1, Oscar wrote:
> In article <f137c6cb-81ea-41bd-8387-2542a7fae5f9@googlegroups.com>,
> <wxjmfauth@gmail.com> wrote:
> >>>> buffer = ('a'*998 + '\u20ac').encode('utf-8')[:1000]
> >>>> buffer.decode('utf-8')
> >Traceback (most recent call last):
> > File "<eta last command>", line 1, in <module>
> >UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 998-999:
> >unexpected end of data
> >>>>
> >>>> # BOUM
>
> hmm...
>
> >>> import sys as jmr
> >>> input = jmr.stdin.fileno()
> >>> output = jmr.stdout.fileno()
> >>> value = output / input
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> ZeroDivisionError: integer division or modulo by zero
> >>> # BOUM
>
> --
> [J|O|R] <- .signature.gz
==========
My BOUM is much better:
*** It may always happen ***.
Your BOUM happens only if one does it explicitly.
I will not stop people from writing buggy Unicode code.