From: Ben Finney
Newsgroups: comp.lang.python
Subject: Re: non printable (moving away from Perl)
Date: Sat, 12 Mar 2016 06:52:42 +1100

Fillmore writes:

> On 3/11/2016 2:23 PM, MRAB wrote:
> > Python 3 (Unicode) strings have an .isprintable method:
> >
> >     mystring.isprintable()
>
> my strings are UTF-8. Will it work there too?

You need to always be clear on the difference between text (the Python 3
‘str’ type) versus bytes.

It only makes sense to talk about an encoding, when talking about bytes.
Text itself is an abstract data type; the content of a Unicode string
does not have any encoding because it is not encoded.

The content of a byte stream (such as a file's content) is not text, it
is bytes.

    >>> foo = "こんにちは"
    >>> foo.isprintable()
    True
    >>> foo_encoded = foo.encode("utf-8")
    >>> foo_encoded.isprintable()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'bytes' object has no attribute 'isprintable'

You can only ask ‘isprintable’ about text. Bytes are not printable
because bytes are not text; you need to decode the bytes to text before
asking whether that text is printable.

    >>> infile = open('lorem.txt', 'rb')
    >>> infile_bytes = infile.read()
    >>> infile_bytes.isprintable()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'bytes' object has no attribute 'isprintable'

    >>> infile = open('lorem.txt', 'rt', encoding="utf-8")
    >>> infile_text = infile.read()
    >>> infile_text.isprintable()
    True

-- 
 \     “Telling pious lies to trusting children is a form of abuse, |
  `\                  plain and simple.” —Daniel Dennett, 2010-01-12 |
_o__)                                                                |
Ben Finney
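
To make the "decode first, then ask" step concrete, here is a minimal sketch. The helper name `bytes_are_printable` and its `encoding` parameter are made up for illustration and are not part of any library. It assumes that bytes which fail to decode should simply count as not printable, which may not be the policy you want. Note also that `str.isprintable()` treats a newline as non-printable, so text read from a typical multi-line file will give False.

```python
def bytes_are_printable(data: bytes, encoding: str = "utf-8") -> bool:
    """Decode bytes to text, then ask whether that text is printable.

    Hypothetical helper: the name and the "undecodable means False"
    policy are assumptions made for this example.
    """
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        # These bytes are not valid in the given encoding, so there is
        # no text to ask the question about.
        return False
    return text.isprintable()


print(bytes_are_printable("こんにちは".encode("utf-8")))  # True
print(bytes_are_printable(b"hello\n"))  # False: newline is not printable
print(bytes_are_printable(b"\xff\xfe"))  # False: not valid UTF-8
```

If line endings should not count against the text, one option is to check each line separately, for example with `all(line.isprintable() for line in text.splitlines())`.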