Re: unicode by default

From Terry Reedy <tjreedy@udel.edu>
Subject Re: unicode by default
Date Thu, 12 May 2011 00:12:02 -0400
Newsgroups comp.lang.python
Message-ID <mailman.1443.1305173553.9059.python-list@python.org>


On 5/11/2011 11:44 PM, harrismh777 wrote:
> Steven D'Aprano wrote:
>>> You need to understand the difference between characters and bytes.
>>
>> http://www.joelonsoftware.com/articles/Unicode.html
>>
>> is also a good resource.
>
> Thanks for being patient guys, here's what I've done:
>
> >>> astr="pound sign"
> >>> asym=" \u00A3"
> >>> afile=open("myfile", mode='w')
> >>> afile.write(astr + asym)
> 12
> >>> afile.close()
>
>
> When I edit "myfile" with vi I see the 'characters' :
>
> pound sign £
>
> ... same with emacs, same with gedit ...
>
>
> When I hexdump myfile I see this:
>
> 0000000 6f70 6375 2064 6973 6e67 c220 00a3

> This is *not* what I expected... well it is (little-endian) right up to
> the 'c2' and that is what is confusing me....

> I did not open the file with an encoding of UTF-8... so I'm assuming
> UTF-16 by default (python3) so I was expecting a '00A3' little-endian as
> 'A300' but what I got instead was UTF-8 little-endian 'c2a3' ....
>
> See my problem?... when I open the file with emacs I see the character
> pound sign... same with gedit... they're all using UTF-8 by default. By
> default it looks like Python3 is writing output with UTF-8 as default...
> and I thought that by default Python3 was using either UTF-16 or UTF-32.
> So, I'm confused here... also, I used the character sequence \u00A3
> which I thought was UTF-16... but Python3 changed my intent to 'c2a3'
> which is the normal UTF-8...

If you open a file in binary mode, you must write bytes, and they are
stored without transformation. If you open it in text mode, you must write
text (str, which is unicode in 3.2), and Python encodes it to bytes using
either the encoding you specified in the open() call or, if you gave
none, a platform-dependent default (evidently UTF-8 on your system, given
the c2 a3 in your dump). It does not matter how Python stores the unicode
internally. Does this help? Your intent is signalled by how you open the
file.
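
A minimal sketch of the point above, assuming Python 3 and a throwaway temporary file. The helper name encoded_bytes is made up for this illustration. It writes the same string in text mode under two explicit encodings and reads the raw bytes back in binary mode:

```python
import os
import tempfile

# 12 characters. "\u00A3" names a code point (POUND SIGN), not an encoding.
text = "pound sign \u00A3"

def encoded_bytes(text, encoding):
    """Hypothetical helper: write text in text mode using the given
    encoding, then return the bytes that actually landed on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: Python encodes str -> bytes with the named encoding.
        with open(path, mode="w", encoding=encoding) as f:
            f.write(text)
        # Binary mode: bytes come back exactly as stored.
        with open(path, mode="rb") as f:
            return f.read()
    finally:
        os.remove(path)

utf8 = encoded_bytes(text, "utf-8")
utf16 = encoded_bytes(text, "utf-16-le")

print(len(text), len(utf8), len(utf16))  # 12 13 24
print(utf8[-2:])    # b'\xc2\xa3'
print(utf16[-2:])   # b'\xa3\x00'
```

The same 12 characters become 13 bytes in UTF-8 and 24 in UTF-16-LE. The 13 bytes match your dump if, as I believe, hexdump is printing them as byte-swapped 16-bit words, which would also explain the trailing 00. Leaving encoding= out of open() gives whatever your platform's default happens to be, so pass it explicitly when the bytes on disk matter.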

-- 
Terry Jan Reedy
