From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Python 3 is killing Python
Date: Thu, 17 Jul 2014 02:07:28 +1000

On Thu, Jul 17, 2014 at 1:48 AM, Marko Rauhamaa wrote:
> it is dangerous to assume that the file formats agree with
> the locale.

Of course. You never assume anything about encodings. What you do is
expect something about the encoding, and either throw an error if it's
wrong, or figure out some other encoding to use. With anything that
you broadly control (eg if your program is configured by a file in
/etc that nothing else uses), you just decode with whatever you
document your program as using, and any failure is *not your problem*.
It's that simple.
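
To make "expect, then either throw or try something else" concrete, here is a minimal sketch. The function name decode_expected and its fallback parameter are made up for illustration; only bytes.decode and UnicodeDecodeError are real Python.

```python
from typing import Optional


def decode_expected(data: bytes, expected: str = "utf-8",
                    fallback: Optional[str] = None) -> str:
    """Decode with the documented encoding (hypothetical helper).

    If the bytes are not valid in `expected`, either re-raise the error
    (no fallback given) or try the one fallback encoding supplied.
    """
    try:
        return data.decode(expected)
    except UnicodeDecodeError:
        if fallback is None:
            # The input broke the documented contract: report it, don't guess.
            raise
        return data.decode(fallback)
```

Called with no fallback, bad input raises UnicodeDecodeError and the caller gets to complain about the file. Passing a fallback is the "figure out some other encoding" branch; which encoding to try is a policy decision this sketch leaves to the caller.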

You don't replace /etc/passwd with a JPEG-encoded photograph of your
family tree and expect all your family to be able to log in; no more
should you expect a file to be parsed correctly if it's meant to be
UTF-8 and you save it in ISO-8859-4. The two cases are equally
ridiculous.

The only thing that might be an issue is that you can't use open(fn)
to read your files, but you have to explicitly state the encoding.
That would be an understandable problem, especially for someone who
develops on a single platform and forgets that the default differs. As
long as you always explicitly say encoding="utf-8", and document that
you do so, any problems are someone else's.

ChrisA
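
For anyone following along, a small sketch of the open(fn) point, assuming Python 3. The helper names read_config and write_config are invented for this example; the encoding= keyword of the built-in open() is the only part that matters.

```python
import os
import tempfile


def write_config(path: str, text: str) -> None:
    # Always name the encoding; a bare open(path, "w") would use a
    # platform-dependent default instead.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_config(path: str) -> str:
    # Same rule on the way in: the documented encoding, stated explicitly.
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fn = os.path.join(tmp, "example.conf")
        write_config(fn, "name = Zo\u00eb\n")
        print(read_config(fn))
```

With the encoding spelled out on both sides, the round trip behaves the same whatever the platform default happens to be.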