From: Tim Chase
Date: Wed, 12 Feb 2014 21:29:53 -0600
To: python-list@python.org
Subject: Re: Wait... WHAT?
Newsgroups: comp.lang.python

On 2014-02-13 00:59, Mark Lawrence wrote:
> > >>> s = "\u3141" # HANGUL LETTER MIEUM
> > >>> f = open('test.txt', 'w')
> > >>> f.write("\u3141")
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > UnicodeEncodeError: 'ascii' codec can't encode character '\u3141'
> > in position 0: ordinal not in range(128)
> >
> > Just because the open() call hides the specification of how Python
> > should do that encoding doesn't prevent the required encoding from
> > happening. :-)
>
> Which clearly reinforces the fact that what you originally said is
> incorrect, I don't have to do anything, Python very kindly does
> things for me under the covers.

...and when they break, you get to keep both pieces.
:)

If you don't know that encoding is being done, it's a lot harder to
trust the assumption that you can directly write strings to files
when exceptions like the above happen.

My original point (though perhaps not conveyed as well as I'd
intended) was that only bytes get written to the disk, and that some
encoding must take place. It can be done implicitly using some
defaults which may break (as demoed), whereas one would be better off
doing it explicitly such as Chris shows:

  >>> f = open('test.txt', 'w', encoding='utf-8')
  >>> f.write("\u3141")
  1

UTF-8'rs gonna 8. (or whatever memes the cool kids are riffing these
days)

-tkc
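
For anyone who wants to poke at the "only bytes hit the disk" point, here is a minimal sketch, assuming Python 3. It is not from the thread itself: the temporary directory and the names `path`, `ascii_failed`, `chars_written` and `raw` are made up for illustration, and `encoding="ascii"` is passed explicitly only to reproduce the failure regardless of what the locale default happens to be on a given machine.

```python
import os
import tempfile

s = "\u3141"  # HANGUL LETTER MIEUM

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")  # hypothetical scratch file

    # Assumption: ask for ASCII explicitly so the failure is reproducible
    # instead of depending on the platform's default encoding.
    try:
        with open(path, "w", encoding="ascii") as f:
            f.write(s)
        ascii_failed = False
    except UnicodeEncodeError:
        ascii_failed = True

    # Explicit UTF-8: write() reports characters written, not bytes.
    with open(path, "w", encoding="utf-8") as f:
        chars_written = f.write(s)

    # Reading in binary mode shows what actually landed on disk.
    with open(path, "rb") as f:
        raw = f.read()

print(ascii_failed, chars_written, len(raw), raw)
```

The one character written comes back as three bytes when the file is read in binary mode, which is the encoding step that text-mode open() otherwise performs out of sight.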