From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Newbie question about text encoding
Date: Sun, 8 Mar 2015 04:00:44 +1100

On Sun, Mar 8, 2015 at 3:54 AM, Marko Rauhamaa wrote:
> You can't operate on file names and text files using Python strings. Or
> at least, you will need to add (nontrivial) exception catching logic.

You can't operate on a JPG file using a Unicode string, nor an array
of integers. What of it? You can't operate on an array of integers
using a dictionary, either. So? How is this a failing of UTF-8?

If you really REALLY can't use the bytes() type to work with something
that is, yaknow, bytes, then you could use an alternative encoding
that has a value for every byte. It's still not Unicode text, so it
doesn't much matter which encoding you use. But it's much better to
use the bytes type to work with bytes. It is not text, so don't treat
it as text.

ChrisA
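
To illustrate the point about bytes versus text, here is a minimal sketch. It assumes Latin-1 as the "encoding that has a value for every byte" (the post does not name one), and the sample data is just the usual first few bytes of a JPEG file, chosen for illustration.

```python
# Binary data that is not text: the typical leading bytes of a JPEG file.
data = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10])

# Decoding it as UTF-8 fails, because the byte 0xFF never occurs in valid UTF-8.
try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 (assumed here) maps each of the 256 byte values to one code point,
# so decoding and re-encoding round-trips losslessly. The resulting str is
# still not meaningful text; it is bytes in disguise.
as_str = data.decode("latin-1")
round_tripped = as_str.encode("latin-1")

# The preferred approach: leave the data as bytes and use bytes operations.
is_jpeg = data.startswith(b"\xff\xd8")

print(utf8_ok, round_tripped == data, is_jpeg)
```

The Latin-1 round trip works, but it buys nothing that the bytes type does not already provide, which is the argument for keeping binary data as bytes in the first place.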