From: Ian Kelly
Date: Wed, 14 May 2014 09:56:54 -0600
Subject: Re: Everything you did not want to know about Unicode in Python 3
Newsgroups: comp.lang.python

On Wed, May 14, 2014 at 9:30 AM, Robin Becker wrote:
> Doesn't this issue also come up wherever bytes are being read ie in sockets,
> pipe file handles etc? Some sources may have well defined encodings and so
> allow use of unicode strings but surely not all. I imagine all of the
> problems associated with a broken encoding promise for stdin can also occur
> with sockets & other sources ie error messages failing to be printable etc
> etc. Since bytes in Python 3 are not equivalent to the old str (Python 3
> bytes != Python 2 str) using bytes everywhere has its own problems.

Sockets send and receive bytes, and pipes created by the subprocess
module are opened in binary mode by default. Pipes inherited as stdin
are still assumed to be Unicode text, though.
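
To make the distinction concrete, here is a minimal sketch (not from the original thread) of the three cases. The payload, the variable names and the child one-liners are made up for illustration. It assumes Python 3.5 or later and that sys.executable points at a usable interpreter. The last case reads stdin through sys.stdin.buffer, the underlying binary stream, which is one way to sidestep the text-mode assumption.

```python
import socket
import subprocess
import sys

# Illustrative payload: "café" encoded as UTF-8 (hypothetical example data).
payload = b"caf\xc3\xa9"

# 1. A pipe created by subprocess is binary by default: we get bytes back.
child_write = "import sys; sys.stdout.buffer.write(b'caf\\xc3\\xa9')"
pipe_output = subprocess.check_output([sys.executable, "-c", child_write])

# 2. Sockets send and receive bytes; a local socket pair keeps this self-contained.
left, right = socket.socketpair()
try:
    left.sendall(payload)
    received = right.recv(1024)
finally:
    left.close()
    right.close()

# 3. In the child, sys.stdin is a text-mode wrapper around the inherited pipe,
#    but sys.stdin.buffer exposes the raw bytes, so nothing is decoded here.
child_echo = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"
echoed = subprocess.run(
    [sys.executable, "-c", child_echo],
    input=payload,
    stdout=subprocess.PIPE,
).stdout

# Decoding is an explicit step, done only once the encoding is actually known.
text = pipe_output.decode("utf-8")
print(type(pipe_output), type(received), type(echoed), type(text))
```

In all three cases the program holds bytes until it chooses to decode them. Only the plain sys.stdin object (without .buffer) makes an encoding promise on the program's behalf.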