From: "Prasad, Ramit"
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: RE: "convert" string to bytes without changing data (encoding)
Date: Wed, 28 Mar 2012 18:31:00 +0000

> You can read as bytes and decode as ASCII but ignoring the troublesome
> non-text characters:
>
> >>> print(open('text.txt', 'br').read().decode('ascii', 'ignore'))
> Das fr ASCII nicht benutzte Bit kann auch fr Fehlerkorrekturzwecke
> (Parittsbit) auf den Kommunikationsleitungen oder fr andere
> Steuerungsaufgaben verwendet werden. Heute wird es aber fast immer zur
> Erweiterung von ASCII auf einen 8-Bit-Code verwendet. Diese
> Erweiterungen sind mit dem ursprnglichen ASCII weitgehend kompatibel,
> so dass alle im ASCII definierten Zeichen auch in den verschiedenen
> Erweiterungen durch die gleichen Bitmuster kodiert werden. Die
> einfachsten Erweiterungen sind Kodierungen mit sprachspezifischen
> Zeichen, die nicht im lateinischen Grundalphabet enthalten sind.
>
> The paragraph is from the German Wikipedia on ASCII, in UTF-8.

I see no non-ASCII characters in that output, and I am not sure whether
that is because the source has none or for some other reason. From this
example I would not say that the rest of the text is "unchanged": words
such as "für" and "Paritätsbit" come out as "fr" and "Parittsbit". Also,
decode() converts bytes to a Unicode str; did you mean encode()?

I think "ignore" removes characters that cannot be decoded rather than
leaving them in the returned string.

Ramit


Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423
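The points in this reply can be checked with a short sketch. It assumes, as the quoted post says, that the file is UTF-8; the sample string is an illustrative phrase taken from the quoted paragraph with its umlauts restored, not the actual file contents. It shows that encode() goes from str to bytes, decode() goes from bytes to str, and that errors="ignore" silently drops the non-ASCII bytes instead of keeping them.

```python
# Assumption: the file is UTF-8, as the quoted post states. This sample is a
# shortened, illustrative phrase from the quoted paragraph, not the real file.
original = "Das für ASCII nicht benutzte Bit (Paritätsbit)"

# encode(): str -> bytes. Reading the file in binary mode gives bytes like these.
data = original.encode("utf-8")

# decode(): bytes -> str. With the ASCII codec and errors="ignore", every byte
# >= 0x80 (here the two-byte UTF-8 sequences for "ü" and "ä") is dropped.
lossy = data.decode("ascii", "ignore")
print(lossy)  # Das fr ASCII nicht benutzte Bit (Parittsbit)

# With the default errors="strict", the same call raises instead of dropping data.
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    print("strict ASCII decode failed:", exc.reason)

# Decoding with the codec the file was actually written in keeps the text intact.
exact = data.decode("utf-8")
print(exact == original)  # True
```

So the "ignore" result is not the original text minus nothing: each accented letter disappears entirely, which is why "für" shows up as "fr".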