Subject: Re: unicode by default
From: "John Machin"
Date: Thu, 12 May 2011 09:32:06 +1000
Newsgroups: comp.lang.python

On Thu, May 12, 2011 8:51 am, harrismh777 wrote:
> Is it true that if I am
> working without using bytes sequences that I will not need to care about the
> encoding anyway, unless of course I need to specify a unicode code
> point?

Quite the contrary.

(1) You cannot work without using byte sequences. Files are byte
sequences. Web communication is in bytes. You need to (know / assume / be
able to extract / guess) the input encoding. You need to encode your output
using an encoding that is expected by the consumer (or use an output method
that will do it for you).

(2) You don't need to use bytes to specify a Unicode code point. Just use
an escape sequence, e.g. "\u0404" is a Cyrillic character.
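A minimal Python 3 sketch of both points. It assumes UTF-8 for the incoming bytes and picks cp1251 as the consumer's expected output encoding purely for illustration; io.BytesIO objects stand in for a real file or socket. In practice the input encoding has to be known, assumed, extracted, or guessed, as described above.

```python
import io
import unicodedata

# Point (2): a code point can be written with a \u escape in an ordinary
# str literal. No bytes are involved.
ch = "\u0404"
name = unicodedata.name(ch)  # 'CYRILLIC CAPITAL LETTER UKRAINIAN IE'

# Point (1): data arriving from a file or the network is bytes. An
# in-memory binary stream stands in for such a source here.
incoming = "Привет\n".encode("utf-8")
raw_input = io.BytesIO(incoming)

# Decode at the input edge; UTF-8 is an assumption about the source.
text = raw_input.read().decode("utf-8")

# Work with str internally.
text = text.strip() + " " + ch

# Encode at the output edge, using whatever the consumer expects
# (cp1251 is an arbitrary example choice).
out_bytes = text.encode("cp1251")

# Alternatively, let a text-mode wrapper do the encoding for you.
sink = io.BytesIO()
writer = io.TextIOWrapper(sink, encoding="cp1251")
writer.write(text)
writer.flush()
```

Decoding once where bytes come in and encoding once where they go out keeps everything in between as plain str, which is the point of "unicode by default": the encoding question does not disappear, it moves to the edges of the program.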