| Date | 2015-05-17 15:12 -0500 |
|---|---|
| From | Tim Chase <python.list@tim.thechases.com> |
| Subject | Re: textwrap.wrap() breaks non-breaking spaces |
| References | <mjaqqd$t0m$1@news.albasani.net> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.98.1431894065.17265.python-list@python.org> |
On 2015-05-17 21:39, Johannes Bauer wrote:
> Hey there,
>
> So textwrap.wrap() breaks non-breaking spaces. Is this a bug or
> intended behavior? For example:
>
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
>
> >>> import textwrap
> >>> for line in textwrap.wrap("foo dont\xa0break " * 20):
> ...     print(line)
> ...
> foo dont break foo dont break foo dont break foo dont break foo dont
> break foo dont break foo dont break foo dont break foo dont break
> foo dont break foo dont break foo dont break foo dont break foo
> dont break foo dont break foo dont break foo dont break foo dont
> break foo dont break foo dont break
>
> Apparently it recognizes that \xa0 is a kind of space, but it
> thinks it can break at any space. The whole point of \xa0 is to
> avoid exactly this kind of thing.
>
> Any remedy or ideas?
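The "it recognizes \xa0 as a kind of space" part comes from Python 3's regular expressions: for str patterns, `\s` matches any Unicode whitespace character, and the `re` documentation explicitly counts non-breaking spaces among them. In the Python 3.4 textwrap source, both word-splitting patterns are built around `\s+` (compare the subclass further down), so U+00A0 becomes a break point. A minimal check using only the `re` module, with `re.ASCII` shown purely for contrast:

```python
import re

NBSP = "\u00a0"

# For str patterns, \s is Unicode-aware, so it matches NBSP;
# str.isspace() agrees that NBSP is whitespace.
print(re.fullmatch(r"\s", NBSP) is not None)            # True
print(NBSP.isspace())                                   # True

# With re.ASCII, \s only covers [ \t\n\r\f\v], so NBSP no longer matches.
print(re.fullmatch(r"\s", NBSP, re.ASCII) is not None)  # False

# Hence a \s+-based split separates "dont" from "break".
print(re.split(r"\s+", "dont" + NBSP + "break"))        # ['dont', 'break']
```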
Since textwrap.wrap() is built on the TextWrapper class, you can
subclass TextWrapper and override its two word-splitting regular
expressions so that the whitespace they split on excludes
non-breaking spaces. Note that, to use the "\u00a0" escape, that
particular string has to be a non-raw string (which is why the
backslash in its "\\s" is doubled). You can compare the two
regular expressions with those in the original source file,
$STDLIB/textwrap.py.
```python
import textwrap
import re

class MyWrapper(textwrap.TextWrapper):
    # Copies of the stdlib patterns, except that the whitespace
    # alternative may not start matching at a non-breaking space.
    wordsep_re = re.compile(
        '((?!\u00a0)\\s+|'                         # any whitespace except NBSP
        r'[^\s\w]*\w+[^0-9\W]-(?=\w+[^0-9\W])|'    # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')    # em-dash

    # This less funky little regex just splits on recognized spaces. E.g.
    #   "Hello there -- you goof-ball, use the -b option!"
    # splits into
    #   Hello/ /there/ /--/ /you/ /goof-ball,/ /use/ /the/ /-b/ /option!/
    wordsep_simple_re = re.compile('((?!\u00a0)\\s+)')

s = 'foo dont\u00a0break ' * 20
wrapper = MyWrapper()
for line in wrapper.wrap(s):
    print(line)
```
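A possible refinement, sketched here rather than offered as a tested drop-in: the lookahead `(?!\u00a0)` only stops a whitespace match from *starting* at a non-breaking space, so `\s+` can still absorb an NBSP that directly follows an ordinary space (with the simple pattern above, `"a \u00a0b"` splits as `['a', ' \xa0', 'b']`). A character class meaning "whitespace except U+00A0", `[^\S\u00a0]`, avoids that. The class name `NbspAwareWrapper` is made up for illustration; the hyphen and em-dash alternatives are unchanged. Because the `re` module itself understands `\u` escapes in patterns (Python 3.3 and later), these patterns can stay raw strings.

```python
import re
import textwrap


class NbspAwareWrapper(textwrap.TextWrapper):
    # Hypothetical variant of MyWrapper. [^\S\u00a0] means "any whitespace
    # character except U+00A0", so a run of ordinary spaces can never
    # swallow an adjacent non-breaking space.
    wordsep_re = re.compile(
        r'([^\S\u00a0]+|'                          # whitespace except NBSP
        r'[^\s\w]*\w+[^0-9\W]-(?=\w+[^0-9\W])|'    # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')    # em-dash
    wordsep_simple_re = re.compile(r'([^\S\u00a0]+)')


wrapper = NbspAwareWrapper(width=30)
for line in wrapper.wrap("foo dont\u00a0break " * 20):
    print(line)

# Unlike the lookahead version, the NBSP stays attached to "b" here.
print(NbspAwareWrapper.wordsep_simple_re.split("a \u00a0b"))
```

Since only the two class attributes change, every other TextWrapper option (width, indents, break_on_hyphens and so on) behaves as usual.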
Based on my tests, it gives the results you were looking
for.
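If you would rather not copy textwrap's wordsep_re/wordsep_simple_re attributes (they are not part of the documented TextWrapper API), a different technique is to temporarily replace each non-breaking space with a non-whitespace stand-in of the same length, wrap, and swap it back. This sketch assumes the private-use character U+E000 never occurs in your input; the helper name `wrap_keep_nbsp` is hypothetical.

```python
import textwrap

NBSP = "\u00a0"
# Assumption: the private-use code point U+E000 never occurs in the input.
PLACEHOLDER = "\ue000"


def wrap_keep_nbsp(text, width=70, **kwargs):
    """Wrap text like textwrap.wrap(), but never break at an NBSP."""
    # Swap each NBSP for a non-whitespace stand-in of the same length,
    # so textwrap treats "dont<stand-in>break" as a single word.
    protected = text.replace(NBSP, PLACEHOLDER)
    lines = textwrap.wrap(protected, width=width, **kwargs)
    # Put the real non-breaking spaces back into the wrapped lines.
    return [line.replace(PLACEHOLDER, NBSP) for line in lines]


for line in wrap_keep_nbsp("foo dont\u00a0break " * 20):
    print(line)
```

One caveat: a protected run longer than `width` can still be cut by textwrap's default `break_long_words=True`; pass `break_long_words=False` if that matters.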
-tkc
textwrap.wrap() breaks non-breaking spaces Johannes Bauer <dfnsonfsduifb@gmx.de> - 2015-05-17 21:39 +0200
Re: textwrap.wrap() breaks non-breaking spaces Tim Chase <python.list@tim.thechases.com> - 2015-05-17 15:12 -0500
Re: textwrap.wrap() breaks non-breaking spaces Ned Batchelder <ned@nedbatchelder.com> - 2015-05-17 13:24 -0700
Re: textwrap.wrap() breaks non-breaking spaces Roy Smith <roy@panix.com> - 2015-05-21 22:36 -0400
Re: textwrap.wrap() breaks non-breaking spaces wxjmfauth@gmail.com - 2015-05-22 06:51 -0700