| References | <20150610082812.2ce887c3@bigbox.christie.dr> <mailman.344.1433946513.13271.python-list@python.org> <55786fd5$0$13003$c3e8da3$5496439d@news.astraweb.com> <mailman.370.1433981374.13271.python-list@python.org> <5578f1be$0$12979$c3e8da3$5496439d@news.astraweb.com> |
|---|---|
| Date | 2015-06-11 13:05 +1000 |
| Subject | Re: Python NBSP DWIM |
| From | Chris Angelico <rosuav@gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.374.1433991937.13271.python-list@python.org> |
On Thu, Jun 11, 2015 at 12:26 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> No, despite the name, that is not a space character, it is a formatting
> character. Due to Unicode's stability policy, the name is stuck forever,
> but it should not be treated as a space character:
>
> py> unicodedata.category(' ')
> 'Zs'
> py> unicodedata.category('\u00A0') # NBSP
> 'Zs'
> py> unicodedata.category('\uFEFF') # ZWNBSP
> 'Cf'
>
>
> Ideally, outside of the BOM, you should never come across a ZWNBSP. You
> should use U+2060 WORD JOINER instead. But if you do come across one
> outside of the BOM, it should be treated as a legitimate non-space
> character:
>
> http://www.unicode.org/faq/utf_bom.html#bom6
>
> Although ZWNBSP is a "default ignorable" code point, I believe that the font
> is well within its rights to show it with a visible glyph:
>
> "Fonts can contain glyphs intended for visible display of
> default ignorable code points that would otherwise be
> rendered invisibly when not supported."
>
> http://www.unicode.org/faq/unsup_char.html
Huh. Okay, my bad. I was under the impression that it was supposed to
take up no width, as the name implies, but stability trumps logic
sometimes. Learn something new every day.
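
For anyone who wants to check the distinction locally, here is a minimal sketch that extends the quoted interpreter session. The helper name `describe` and the `CODE_POINTS` table are made up for illustration; only `unicodedata.category` and `str.isspace` come from the standard library.

```python
import unicodedata

# Code points discussed in this thread (labels are informal shorthand).
CODE_POINTS = {
    "SPACE": "\u0020",
    "NBSP": "\u00A0",
    "ZWNBSP": "\uFEFF",
    "WORD JOINER": "\u2060",
}


def describe(ch):
    """Return (general category, str.isspace() result) for one character."""
    return unicodedata.category(ch), ch.isspace()


for label, ch in CODE_POINTS.items():
    category, is_space = describe(ch)
    print(f"U+{ord(ch):04X} {label:12} category={category} isspace={is_space}")
```

SPACE and NBSP are in category Zs and count as whitespace for `str.isspace()`, while ZWNBSP and WORD JOINER are Cf format characters and do not, so `str.strip()` and `str.split()` will leave them in place.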
>> notable because it's also used as
>> the byte-order mark (as its counterpart, U+FFFE, is unallocated). I've
>> been fighting with VLC Media Player over the font it uses for subtitles;
>> for some bizarre reason, that font represents U+FEFF not with zero pixels
>> of emptiness, but with a box containing the letters "ZWN" "BSP" on two
>> lines. Yeah, because that totally takes up zero width and looks like blank
>> space.
>
> Why do the subtitles contain ZWNBSP in the first place? Surely they're not
> English subtitles?
No, they're not :) The character comes up in the Cantonese and
Japanese subs for Once Upon A December.
http://youtu.be/CEpcUeWP0bg
http://youtu.be/WFZAaHrHens
Possibly some others in the series as well. It may well be a fault in
the subtitles, but most programs I've seen don't show U+FEFF as a big
fat box.
ChrisA
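
If the subtitle text can be preprocessed before the player sees it, one possible workaround is sketched below. It assumes that a leading U+FEFF is a byte-order mark and that any interior U+FEFF was meant as a joiner, which may not hold for every file; `clean_subtitle_text` is a hypothetical helper, not part of VLC or any subtitle library.

```python
ZWNBSP = "\uFEFF"       # ZERO WIDTH NO-BREAK SPACE, also used as the BOM
WORD_JOINER = "\u2060"  # the recommended replacement outside the BOM


def clean_subtitle_text(text):
    """Drop one leading BOM, then swap any remaining U+FEFF for U+2060.

    Assumption: interior U+FEFF characters were intended as joiners.
    """
    if text.startswith(ZWNBSP):
        text = text[1:]
    return text.replace(ZWNBSP, WORD_JOINER)


sample = "\uFEFFab\uFEFFc"
print([f"U+{ord(c):04X}" for c in clean_subtitle_text(sample)])
```

Whether a given font then draws U+2060 invisibly is a separate question; this only removes the code point that the font in question renders as a box.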