From: Michael Torrie
Newsgroups: comp.lang.python
To: python-list@python.org
Date: Mon, 13 Jan 2014 08:58:50 -0700
Subject: Re: 'Straße' ('Strasse') and Python 2
References: <30dfa6f1-61b2-49b8-bc65-5fd18d498c38@googlegroups.com>

On 01/13/2014 02:54 AM, wxjmfauth@gmail.com wrote:
> Not at all. I'm afraid I'm understanding Python (on this
> aspect very well).

Are you sure about that? It seems to me you're still confused about the
difference between unicode and encodings.

>
> Do you belong to this group of people who are naively
> writing wrong Python code (usually not properly working)
> during more than a decade?
>
> 'ß' is the the fourth character in that text "Straße"
> (base index 0).
>
> This assertions are correct (byte string and unicode).

How can they be? The byte-string assertion is only true for the default
encoding and character set you are using, which happens to represent 'ß'
as a single byte. Hence the byte-string half of your little Python 2.7
snippet is not using unicode at all, in any form. It's using a
non-unicode character set. There are methods that decode from your
character set to unicode and encode from unicode back to it. But let's
be clear: your byte streams are not unicode!

If the default byte encoding is UTF-8, which uses a variable number of
bytes per character, your byte-string assertion is completely wrong;
only the u'' one still holds. Maybe it's time you stopped programming on
Windows and used OS X or Linux, which throw out the random single-byte
character sets and instead provide a UTF-8 terminal environment that
supports non-Latin characters.
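
For what it's worth, here is a minimal sketch of the decode/encode round trip mentioned above. It assumes Latin-1 as a stand-in for "your character set" (I don't know which code page your console actually uses), and the variable names are mine, purely for illustration. It is written so that it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# u'Stra\xdfe' is u'Straße'; the escape keeps the example independent
# of whatever encoding this source file or terminal happens to use.
text = u'Stra\xdfe'

# Encoding turns unicode text into bytes. The result depends on the codec.
latin1_bytes = text.encode('latin-1')  # assumed single-byte character set
utf8_bytes = text.encode('utf-8')      # variable number of bytes per character

print(len(text))          # 6 characters
print(len(latin1_bytes))  # 6 bytes: one byte per character
print(len(utf8_bytes))    # 7 bytes: the eszett takes two bytes

# Decoding with the matching codec recovers the same unicode text.
assert latin1_bytes.decode('latin-1') == text
assert utf8_bytes.decode('utf-8') == text
```

The byte lengths differ while the text stays the same, which is the whole point: the bytes are one particular encoding of the text, not the text itself.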
>
> >>> sys.version
> '2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)]'
> >>> assert 'Straße'[4] == 'ß'
> >>> assert u'Straße'[4] == u'ß'
> >>>
>
> jmf
>
> PS Nothing to do with Py2/Py3.
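
To make the point about the quoted assertions concrete, here is a rough sketch of what the byte-string comparison amounts to under two different encodings. The helper `byte_assertion_holds` is hypothetical (my own name, not anything from Python), and cp1252 is only my guess at the Windows code page in play. Slicing with `[4:5]` instead of indexing with `[4]` keeps the behaviour the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

def byte_assertion_holds(encoding):
    """Mimic the Python 2 check 'Straße'[4] == 'ß' on byte strings,
    as if the source/terminal used the given encoding (hypothetical helper)."""
    text = u'Stra\xdfe'   # u'Straße'
    eszett = u'\xdf'      # u'ß'
    # Take the single byte at offset 4 and compare it with the encoded 'ß'.
    return text.encode(encoding)[4:5] == eszett.encode(encoding)

print(byte_assertion_holds('cp1252'))  # True: 'ß' is the single byte 0xDF
print(byte_assertion_holds('utf-8'))   # False: 'ß' is the two bytes 0xC3 0x9F

# The unicode version does not depend on any byte encoding.
assert u'Stra\xdfe'[4] == u'\xdf'
```

So the byte-string assertion passes or fails depending on the encoding, while the unicode one holds regardless.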