Groups > comp.lang.python > #90784 > unrolled thread
| Started by | Johannes Bauer <dfnsonfsduifb@gmx.de> |
|---|---|
| First post | 2015-05-17 21:39 +0200 |
| Last post | 2015-05-22 06:51 -0700 |
| Articles | 5 — 5 participants |
textwrap.wrap() breaks non-breaking spaces Johannes Bauer <dfnsonfsduifb@gmx.de> - 2015-05-17 21:39 +0200
Re: textwrap.wrap() breaks non-breaking spaces Tim Chase <python.list@tim.thechases.com> - 2015-05-17 15:12 -0500
Re: textwrap.wrap() breaks non-breaking spaces Ned Batchelder <ned@nedbatchelder.com> - 2015-05-17 13:24 -0700
Re: textwrap.wrap() breaks non-breaking spaces Roy Smith <roy@panix.com> - 2015-05-21 22:36 -0400
Re: textwrap.wrap() breaks non-breaking spaces wxjmfauth@gmail.com - 2015-05-22 06:51 -0700
| From | Johannes Bauer <dfnsonfsduifb@gmx.de> |
|---|---|
| Date | 2015-05-17 21:39 +0200 |
| Subject | textwrap.wrap() breaks non-breaking spaces |
| Message-ID | <mjaqqd$t0m$1@news.albasani.net> |
Hey there,
textwrap.wrap() breaks lines at non-breaking spaces. Is this a bug or
intended behavior? For example:
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
>>> import textwrap
>>> for line in textwrap.wrap("foo dont\xa0break " * 20): print(line)
...
foo dont break foo dont break foo dont break foo dont break foo dont
break foo dont break foo dont break foo dont break foo dont break foo
dont break foo dont break foo dont break foo dont break foo dont break
foo dont break foo dont break foo dont break foo dont break foo dont
break foo dont break
Apparently it does recognize that \xa0 is a kind of space, but it thinks
it can break at any space, when the whole point of \xa0 is to prevent
exactly that.
Any remedy or ideas?
Cheers,
Johannes
--
>> Where EXACTLY did you say you had predicted the quake?
> At least not publicly!
Ah, the latest and so far most ingenious stunt of our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2015-05-17 15:12 -0500 |
| Message-ID | <mailman.98.1431894065.17265.python-list@python.org> |
| In reply to | #90784 |
On 2015-05-17 21:39, Johannes Bauer wrote:
> Hey there,
>
> so that textwrap.wrap() breaks non-breaking spaces, is this a bug or
> intended behavior? For example:
>
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
>
> >>> import textwrap
> >>> for line in textwrap.wrap("foo dont\xa0break " * 20):
> ...     print(line)
> ...
> foo dont break foo dont break foo dont break foo dont break foo dont
> break foo dont break foo dont break foo dont break foo dont break
> foo dont break foo dont break foo dont break foo dont break foo
> dont break foo dont break foo dont break foo dont break foo dont
> break foo dont break foo dont break
>
> Apparently it does recognize that \xa0 is a kind of space, but it
> thinks it can break any space. The point of \xa0 being exactly to
> avoid this kind of thing.
>
> Any remedy or ideas?
Since wrap() is built on the TextWrapper class, you can subclass
it and add a negative lookahead, (?!\u00a0), to the whitespace
part of its two word-separator regexes, so that a split never
starts at a non-breaking space. The pieces containing "\u00a0"
are written as non-raw strings here, which is why the backslash
in \\s is doubled. You can compare the two regular expressions
with the originals in your $STDLIB/textwrap.py:
import textwrap
import re

class MyWrapper(textwrap.TextWrapper):
    wordsep_re = re.compile(
        '((?!\u00a0)\\s+|'                        # whitespace not starting with NBSP
        r'[^\s\w]*\w+[^0-9\W]-(?=\w+[^0-9\W])|'   # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash

    # This less funky little regex just splits on recognized spaces. E.g.
    #   "Hello there -- you goof-ball, use the -b option!"
    # splits into
    #   Hello/ /there/ /--/ /you/ /goof-ball,/ /use/ /the/ /-b/ /option!/
    wordsep_simple_re = re.compile('((?!\u00a0)\\s+)')

s = 'foo dont\u00a0break ' * 20
wrapper = MyWrapper()
for line in wrapper.wrap(s):
    print(line)
Based on my tests, it gives the results you were looking
for.
-tkc
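The negative lookahead in the subclass above only inspects the first character of a whitespace run, so a run such as an ordinary space followed by U+00A0 still matches as a single separator. A minimal sketch of a variant, assuming you want U+00A0 excluded anywhere in the run, swaps \s+ for the character class [^\S\xa0]+. The class name NbspSafeWrapper is made up for illustration; the hyphen and em-dash alternatives are copied unchanged from the post.

```python
import re
import textwrap


class NbspSafeWrapper(textwrap.TextWrapper):
    """Hypothetical subclass: never treat U+00A0 as a break opportunity."""

    # [^\S\xa0] reads "whitespace, except U+00A0": a character that is
    # neither non-whitespace (\S) nor the non-breaking space itself.
    wordsep_re = re.compile(
        r'([^\S\xa0]+|'                           # whitespace other than NBSP
        r'[^\s\w]*\w+[^0-9\W]-(?=\w+[^0-9\W])|'   # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash
    wordsep_simple_re = re.compile(r'([^\S\xa0]+)')


text = 'foo dont\u00a0break ' * 20
lines = NbspSafeWrapper(width=70).wrap(text)
for line in lines:
    print(line)
```

With this class, a string like "a \xa0b" is split only at the ordinary space, and the non-breaking space stays attached to the following word.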
| From | Ned Batchelder <ned@nedbatchelder.com> |
|---|---|
| Date | 2015-05-17 13:24 -0700 |
| Message-ID | <322460a5-ac25-467e-ae90-9a610e3bbb5f@googlegroups.com> |
| In reply to | #90784 |
On Sunday, May 17, 2015 at 3:40:07 PM UTC-4, Johannes Bauer wrote:
> Hey there,
>
> so that textwrap.wrap() breaks non-breaking spaces, is this a bug or
> intended behavior? For example:
>
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
>
> >>> import textwrap
> >>> for line in textwrap.wrap("foo dont\xa0break " * 20): print(line)
> ...
> foo dont break foo dont break foo dont break foo dont break foo dont
> break foo dont break foo dont break foo dont break foo dont break foo
> dont break foo dont break foo dont break foo dont break foo dont break
> foo dont break foo dont break foo dont break foo dont break foo dont
> break foo dont break
>
> Apparently it does recognize that \xa0 is a kind of space, but it thinks
> it can break any space. The point of \xa0 being exactly to avoid this
> kind of thing.
There's a Python bug about this: http://bugs.python.org/issue20491
--Ned.
>
> Any remedy or ideas?
>
> Cheers,
> Johannes
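Independently of the status of that bug report, one workaround that does not rely on how textwrap classifies U+00A0 is to hide the non-breaking spaces before wrapping and restore them afterwards. The sketch below is not from the thread: the helper name wrap_keeping_nbsp is hypothetical, and it assumes the private-use code point U+E000 never appears in the input.

```python
import textwrap

NBSP = "\u00a0"
# Assumption: U+E000 (a private-use code point) never occurs in the input.
PLACEHOLDER = "\ue000"


def wrap_keeping_nbsp(text, width=70, **kwargs):
    """Wrap text without breaking a line at U+00A0 (hypothetical helper)."""
    if PLACEHOLDER in text:
        raise ValueError("input already contains the placeholder character")
    # The placeholder is not whitespace, so textwrap sees 'dont<X>break'
    # as a single word and cannot split inside it.
    hidden = text.replace(NBSP, PLACEHOLDER)
    return [line.replace(PLACEHOLDER, NBSP)
            for line in textwrap.wrap(hidden, width=width, **kwargs)]


for line in wrap_keeping_nbsp("foo dont\u00a0break " * 20):
    print(line)
```

Because the placeholder is a single character, line widths are counted exactly as they would be with the original non-breaking space.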
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2015-05-21 22:36 -0400 |
| Message-ID | <roy-4ADE41.22362621052015@news.panix.com> |
| In reply to | #90784 |
In article <mjaqqd$t0m$1@news.albasani.net>,
Johannes Bauer <dfnsonfsduifb@gmx.de> wrote:
> so that textwrap.wrap() breaks non-breaking spaces, is this a bug or
> intended behavior?
I opened http://bugs.python.org/issue16623 on this a couple of years ago.
Looks like it was being worked on (http://bugs.python.org/issue20491) but
got stalled in the testing stage.
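To see which behavior a given interpreter has, a small probe can wrap the sample string from the original post and look for a line break between "dont" and "break". This is only a sketch; the helper names broken_pairs and wrap_breaks_at_nbsp are made up for illustration and are not part of textwrap.

```python
import textwrap


def broken_pairs(lines):
    """Count line boundaries that fall between 'dont' and 'break'."""
    return sum(1 for prev, nxt in zip(lines, lines[1:])
               if prev.endswith("dont") and nxt.startswith("break"))


def wrap_breaks_at_nbsp(width=70):
    """Return True if this interpreter's textwrap splits at U+00A0."""
    sample = "foo dont\u00a0break " * 20
    lines = textwrap.wrap(sample, width=width)
    return broken_pairs(lines) > 0


print("textwrap breaks at U+00A0:", wrap_breaks_at_nbsp())
```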
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2015-05-22 06:51 -0700 |
| Message-ID | <6f3cc2dc-650c-4bdd-b10f-551df3ff22ea@googlegroups.com> |
| In reply to | #91028 |
----------
Textwrap will probably never work.
It is however possible to "sweat" on that subject.
It is probably not visible, but in my GUI interactive
interpreter, this works correctly (output stream).
Two variants: Unicode literals and glyphs.
Copied and pasted from my interpreter window into a Firefox
window on Windows 7.
>>> print( (('a\t' + 'bbb ሴ\u00a0䕧\U00100061 ' * 20) + '\n')*2)
a bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb
ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb
ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧
a bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb
ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb
ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧
>>> print( (('a\t' + 'bbb ሴ 䕧 ' * 20) + '\n')*2)
a bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb
ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb
ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧
a bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb
ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb
ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧 bbb ሴ 䕧
>>> sys.version
'3.2.5 (default, May 15 2013, 23:06:03) [MSC v.1500 32 bit (Intel)]'
>>>
jmf