| References | (4 earlier) <mailman.1021.1315817058.27778.python-list@python.org> <4e6dc7b4$0$29986$c3e8da3$5496439d@news.astraweb.com> <ee50be9b-2710-423b-9c69-744cc173ac85@dq7g2000vbb.googlegroups.com> <CAHzaPEM7OURL_UESEEJGBVHOaTT0TS6LgVf+SV2UuGQg8czS6w@mail.gmail.com> <CAO+9iGfMToWvMOyha__DnAH=S-uaDS6wZEk+uFrH15seuO7ynQ@mail.gmail.com> |
|---|---|
| Date | 2011-09-13 20:13 +0200 |
| Subject | Re: How do I automate the removal of all non-ascii characters from my code? |
| From | Vlastimil Brom <vlastimil.brom@gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1081.1315937627.27778.python-list@python.org> |
2011/9/13 Alec Taylor <alec.taylor6@gmail.com>:
> Hmm, nothing mentioned so far works for me...
>
> Here's a very small test case:
>
>>>> python -u "Convert to Creole.py"
> File "Convert to Creole.py", line 1
> SyntaxError: Non-ASCII character '\xe2' in file Convert to Creole.py
> on line 1, but no encoding declared; see
> http://www.python.org/peps/pep-0263.html for details
>>>> Exit Code: 1
>
> Line 1: a=u'''≤'''.encode("ascii", "ignore").decode("ascii")
>
> On Tue, Sep 13, 2011 at 11:33 PM, Vlastimil Brom
> <vlastimil.brom@gmail.com> wrote:
>> 2011/9/13 ron <vacorama@gmail.com>:
>>>
>>> Depending on the load, you can do something like:
>>>
>>> "".join([x for x in string if ord(x) < 128])
>>>
>>> It's worked great for me in cleaning input on webapps where there's a
>>> lot of copy/paste from varied sources.
>>> --
>>> http://mail.python.org/mailman/listinfo/python-list
>>>
>> Well, for this kind of dirty "data cleaning" you may as well use e.g.
>>
>>>>> u"äteöxt ÛÜÝ wiÉÊËÌthÞßà áânoûüýþn ASɔɕɖCɗɘəɚɛIɗɘəɚɛIεζ iηθιn жзbetийклweeჟრსn .ტუ..ფ".encode("ascii", "ignore").decode("ascii")
>> u'text with non ASCII in between ...'
>>>>>
>>
>> vbr
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>>
>
Ok, in that case the file's encoding would probably be utf-8; \xe2 is
just the first byte of the encoded character:
>>> u'≤'.encode("utf-8")
'\xe2\x89\xa4'
>>>
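For anyone following along on Python 3, where str is unicode by default, the same check might be sketched roughly as below. The variable names are mine, purely for illustration.

```python
# Python 3 sketch: how the single character U+2264 is stored in UTF-8.
symbol = "\u2264"  # LESS-THAN OR EQUAL TO, written as an escape to keep this file ASCII

encoded = symbol.encode("utf-8")
print(encoded)          # b'\xe2\x89\xa4'
print(hex(encoded[0]))  # 0xe2, the byte named in the SyntaxError

# Decoding the three bytes as UTF-8 gives the original character back.
print(encoded.decode("utf-8") == symbol)  # True
```

So the interpreter is only complaining about the first of three bytes that together make up one character.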
Declaring this encoding at the beginning of the file, as mentioned
before (a "# -*- coding: utf-8 -*-" comment on the first or second
line, see PEP 263), might solve the problem while retaining the symbol
in question. Depending on other circumstances, you could also just
move from the syntax error to some unicode-related error...
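If you would rather strip the characters from the file than declare the encoding, a rough Python 3 sketch could look like the following. The helper names strip_non_ascii and clean_file are made up for illustration, the file is assumed to really be UTF-8, and the approach is lossy: the dropped characters are gone for good.

```python
import os
import tempfile


def strip_non_ascii(text):
    """Drop every character outside the ASCII range (lossy)."""
    return text.encode("ascii", "ignore").decode("ascii")


def clean_file(path, encoding="utf-8"):
    """Rewrite the file at `path` in place without its non-ASCII characters.

    Assumes the file is encoded as `encoding`; a wrong guess raises
    UnicodeDecodeError before anything is overwritten.
    """
    with open(path, "r", encoding=encoding) as f:
        source = f.read()
    with open(path, "w", encoding="ascii") as f:
        f.write(strip_non_ascii(source))


# Demonstration on a throwaway file containing the symbol from the test case.
with tempfile.NamedTemporaryFile(
    "w", suffix=".py", encoding="utf-8", delete=False
) as tmp:
    tmp.write("a = '\u2264 done'\n")
    demo_path = tmp.name

clean_file(demo_path)
with open(demo_path, encoding="ascii") as f:
    cleaned = f.read()
os.remove(demo_path)

print(repr(cleaned))  # "a = ' done'\n"
```

Keep a backup or use version control before running something like this over real source files, since string literals may change meaning once characters are removed.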
vbr