From: Alec Taylor
To: Vlastimil Brom
Cc: python-list@python.org
Newsgroups: comp.lang.python
Date: Wed, 14 Sep 2011 01:02:05 +1000
Subject: Re: How do I automate the removal of all non-ascii characters from my code?
References: <4E6DC028.1020101@islandtraining.com> <4e6dc7b4$0$29986$c3e8da3$5496439d@news.astraweb.com>

Hmm, nothing mentioned so far works for me...

Here's a very small test case:

>>> python -u "Convert to Creole.py"
  File "Convert to Creole.py", line 1
SyntaxError: Non-ASCII character '\xe2' in file Convert to Creole.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
>>> Exit Code: 1

Line 1: a=u'''≤'''.encode("ascii", "ignore").decode("ascii")

On Tue, Sep 13, 2011 at 11:33 PM, Vlastimil Brom wrote:
> 2011/9/13 ron :
>>
>> Depending on the load, you can do something like:
>>
>> "".join([x for x in string if ord(x) < 128])
>>
>> It's worked great for me in cleaning input on webapps where there's a
>> lot of copy/paste from varied sources.
>>
> Well, for this kind of dirty "data cleaning" you may as well use e.g.
> >>>> u"=C3=A4te=C3=B6xt =C3=9B=C3=9C=C3=9D wi=C3=89=C3=8A=C3=8B=C3=8Cth=C3= =9E=C3=9F=C3=A0 =C3=A1=C3=A2no=C3=BB=C3=BC=C3=BD=C3=BEn AS=C9=94=C9=95=C9= =96C=C9=97=C9=98=C9=99=C9=9A=C9=9BI=C9=97=C9=98=C9=99=C9=9A=C9=9BI=CE=B5=CE= =B6 i=CE=B7=CE=B8=CE=B9n =D0=B6=D0=B7bet=D0=B8=D0=B9=D0=BA=D0=BBwee=E1=83= =9F=E1=83=A0=E1=83=A1n .=E1=83=A2=E1=83=A3..=E1=83=A4".encode("ascii", "ign= ore").decode("ascii") > u'text =C2=A0with non ASCII in between ...' >>>> > > vbr > -- > http://mail.python.org/mailman/listinfo/python-list >
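
A note on the test case above: the SyntaxError is raised while Python 2 parses the file, before the .encode(...) call on line 1 ever runs, so the cleaning has to be done on the source file from outside. A minimal sketch of that, assuming the file is UTF-8 encoded (the '\xe2' byte suggests so, but that is an assumption); strip_non_ascii and strip_non_ascii_file are hypothetical helper names, not something posted in this thread:

```python
import io


def strip_non_ascii(text):
    """Drop every character outside the 7-bit ASCII range."""
    return text.encode("ascii", "ignore").decode("ascii")


def strip_non_ascii_file(src_path, dst_path, encoding="utf-8"):
    """Write an ASCII-only copy of src_path to dst_path.

    The source encoding is an assumption (UTF-8 by default); a wrong
    guess raises UnicodeDecodeError instead of silently mangling text.
    """
    with io.open(src_path, "r", encoding=encoding) as src:
        cleaned = strip_non_ascii(src.read())
    with io.open(dst_path, "w", encoding="ascii") as dst:
        dst.write(cleaned)
    return cleaned
```

Called as strip_non_ascii_file("Convert to Creole.py", "Convert to Creole.ascii.py"), this would leave the original untouched and write the cleaned copy next to it. Bear in mind that it deletes the characters outright, string literals included; if the non-ASCII text is meant to stay, the encoding declaration described in PEP 263 is the other way out.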