Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!aioe.org!feeder.news-service.com!newsfeed.xs4all.nl!newsfeed5.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
In-Reply-To: <vpEyp.981$dL5.736@newsfe08.iad>
References: <OkDyp.2983$M61.450@newsfe07.iad> <mailman.1433.1305151801.9059.python-list@python.org> <vpEyp.981$dL5.736@newsfe08.iad>
Date: Thu, 12 May 2011 09:32:06 +1000
Subject: Re: unicode by default
From: "John Machin" <sjmachin@lexicon.net>
To: python-list@python.org
User-Agent: SquirrelMail/1.4.21
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Importance: Normal
Precedence: list
Reply-To: sjmachin@lexicon.net
Newsgroups: comp.lang.python
Message-ID: <mailman.1435.1305157329.9059.python-list@python.org>
Lines: 19
NNTP-Posting-Host: 82.94.164.166
Xref: x330-a1.tempe.blueboxinc.net comp.lang.python:5177

On Thu, May 12, 2011 8:51 am, harrismh777 wrote:
> Is it true that if I am
> working without using bytes sequences that I will not need to care about
> the encoding anyway, unless of course I need to specify a unicode code
> point?

Quite the contrary.

(1) You cannot work without using bytes sequences. Files are byte
sequences. Web communication is in bytes. You need to (know / assume / be
able to extract / guess) the input encoding. You need to encode your
output using an encoding that is expected by the consumer (or use an
output method that will do it for you).

(2) You don't need to use bytes to specify a Unicode code point. Just use
an escape sequence e.g. "\u0404" is a Cyrillic character.