


Re: unicode by default

From harrismh777 <harrismh777@charter.net>
Newsgroups comp.lang.python
Subject Re: unicode by default
References <OkDyp.2983$M61.450@newsfe07.iad> <mailman.1433.1305151801.9059.python-list@python.org> <vpEyp.981$dL5.736@newsfe08.iad> <mailman.1435.1305157329.9059.python-list@python.org>
Message-ID <KDGyp.180$0t1.7@newsfe04.iad>
Date 2011-05-11 20:22 -0500



John Machin wrote:
> (1) You cannot work without using bytes sequences. Files are byte
> sequences. Web communication is in bytes. You need to (know / assume / be
> able to extract / guess) the input encoding. You need to encode your
> output using an encoding that is expected by the consumer (or use an
> output method that will do it for you).
>
> (2) You don't need to use bytes to specify a Unicode code point. Just use
> an escape sequence e.g. "\u0404" is a Cyrillic character.
>

Thanks John.  In reverse order, I understand point (2). I'm less clear 
on point (1).

If I generate a string of characters that I presume to be ascii/utf-8 
(no \u0404 type characters) and write them to a file (stdout), how does 
the default encoding affect that file? I'm not seeing that there is 
anything unusual going on... What if I open the file with vi? With 
gedit? With emacs?
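
A minimal sketch of the question above, assuming Python 3 and sample strings invented for illustration (they are not from the thread). It suggests why nothing unusual shows up: pure-ASCII text encodes to the same bytes under ascii, utf-8 and latin-1, so the choice of encoding only becomes visible once a character such as "\u0404" is involved.

```python
# Sample text is an assumption for illustration; any pure-ASCII string behaves the same.
text = "hello, world"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# For pure-ASCII text these three encodings produce identical bytes,
# so an editor reading the file sees the same thing either way.
same = ascii_bytes == utf8_bytes == latin1_bytes

# A non-ASCII character is where the encodings start to differ.
cyrillic = "\u0404"
cyr_utf8 = cyrillic.encode("utf-8")       # two bytes in UTF-8
cyr_utf16 = cyrillic.encode("utf-16-le")  # two different bytes in UTF-16

print(same, cyr_utf8, cyr_utf16)
```

An editor that guesses the wrong encoding would still display the ASCII-only file correctly, but would likely mangle the Cyrillic character.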

....

Another question... in mail I'm receiving many small blocks that look 
like sprites with four small hex codes, scattered about the mail... 
mostly in place of punctuation, maybe? Guessing: are these unicode code 
points, and if so, what is the best way to 'guess' the encoding? Is it 
coded in the stream somewhere, e.g. in the protocol?
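
On the "is it coded in the stream" part: for mail, the encoding is normally declared in the message's Content-Type header (the charset parameter), so it does not have to be guessed. The sketch below uses Python 3's standard email package on a made-up message (the headers and body are assumptions for illustration, not taken from any real mail). The small boxes with hex codes are quite possibly the display font lacking a glyph for a character such as a curly quote, though that is a guess.

```python
from email import message_from_bytes

# Hypothetical raw message: the charset is declared in the Content-Type header,
# and the body carries curly quotes (U+201C, U+201D) as quoted-printable UTF-8.
raw = (
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"quote: =E2=80=9Chello=E2=80=9D\r\n"
)

msg = message_from_bytes(raw)

# Read the declared charset instead of guessing it.
charset = msg.get_content_charset()

# decode=True undoes the transfer encoding and returns raw bytes.
payload = msg.get_payload(decode=True)

# Fall back to ascii if no charset was declared; replace undecodable bytes.
body = payload.decode(charset or "ascii", errors="replace")

print(charset, repr(body))
```

If the header is missing or wrong, guessing is all that is left, and that is inherently unreliable.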

thanks



