


Re: Python NBSP DWIM

From random832@fastmail.us
References <20150610082812.2ce887c3@bigbox.christie.dr> <mailman.344.1433946513.13271.python-list@python.org> <55786fd5$0$13003$c3e8da3$5496439d@news.astraweb.com> <CAPTjJmqS-sx2yxrPAcN6iv625hUOQKpM8bqUqBrNfcHvyzm8AQ@mail.gmail.com>
Subject Re: Python NBSP DWIM
Date 2015-06-10 21:02 -0400
Newsgroups comp.lang.python
Message-ID <mailman.372.1433984539.13271.python-list@python.org>



On Wed, Jun 10, 2015, at 20:09, Chris Angelico wrote:
> And U+FEFF "ZERO WIDTH NO-BREAK SPACE", notable because it's also used as
> the byte-order mark (as its counterpart, U+FFFE, is unallocated). I've been
> fighting with VLC Media Player over the font it uses for subtitles; for
> some bizarre reason, that font represents U+FEFF not with zero pixels of
> emptiness, but with a box containing the letters "ZWN" "BSP" on two lines.
> Yeah, because that totally takes up zero width and looks like blank space.

As I understand it, the proper behavior is that a ZWNBSP serving as the
byte order mark should never appear in the in-memory representation of
the first line of a BOM-encoded file, or of any other line in the
concatenation of two BOM-encoded files; it should "vanish" when the file
is opened and first read. So it shouldn't be showing up in your
subtitles, regardless of how it is rendered.

The real world, needless to say, isn't so nice.
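
A rough Python sketch of the "vanishing" behavior described above, in
case it helps. The byte string is a made-up stand-in for a subtitle
file, not VLC's actual input. It assumes the file is UTF-8 with a BOM
and uses the standard "utf-8-sig" codec, which drops one leading BOM;
plain "utf-8" keeps it as an ordinary U+FEFF character.

```python
import codecs

# Hypothetical contents of a subtitle file saved with a UTF-8 BOM.
raw = codecs.BOM_UTF8 + "first line\n".encode("utf-8")

# Plain utf-8 treats the BOM as data: U+FEFF ends up in the string.
kept = raw.decode("utf-8")

# utf-8-sig strips a single leading BOM while decoding.
stripped = raw.decode("utf-8-sig")

# Concatenating two BOM-prefixed files: only the first BOM is at the
# start of the stream, so the second one survives as a ZWNBSP.
joined = (raw + raw).decode("utf-8-sig")

print(repr(kept))
print(repr(stripped))
print(joined.count("\ufeff"))
```

The last case is the not-so-nice one: the codec only looks at the very
start of the data, so a BOM in the middle of a concatenated file still
has to be removed by whoever does the concatenating.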

IIRC there's also a font in MS Windows that uses various glyphs which
are zero-width, but not blank, to represent ZWJ, ZWNJ, RLM, and LRM.
Good for seeing what is happening, bad for actually rendering text
that's intended to contain these characters. There's also an argument
that ideally a rendering engine should not render any such glyph unless
something like "visible controls" has been selected (the real world,
again, isn't so nice, which is why most symbols intended for
visible-control-style rendering have their own distinct code points
rather than using those of the control characters they represent).
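
To make the "visible controls" idea concrete, here is a minimal Python
sketch. The function name show_controls and the angle-bracket notation
are made up for illustration; this is not how any particular rendering
engine does it. It assumes that format characters (general category Cf,
which covers ZWJ, ZWNJ, LRM, RLM and ZWNBSP) should be spelled out by
name, and that C0 controls should be swapped for the separate "SYMBOL
FOR ..." code points in the Control Pictures block at U+2400.

```python
import unicodedata

def show_controls(text):
    """Return text with invisible characters made visible (a sketch)."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            # Format characters: show the Unicode name in brackets.
            out.append("<%s>" % unicodedata.name(ch, "U+%04X" % ord(ch)))
        elif ord(ch) < 0x20:
            # C0 controls: use the distinct Control Pictures code point.
            out.append(chr(0x2400 + ord(ch)))
        else:
            out.append(ch)
    return "".join(out)

sample = "a\u200db\t"
visible = show_controls(sample)
print(visible)
print(unicodedata.name(visible[-1]))
```

The original string is left untouched; the visible form is a separate
string, which is the same separation the distinct code points give you.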



