Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #13239

Re: How do I automate the removal of all non-ascii characters from my code?

References (4 earlier) <mailman.1021.1315817058.27778.python-list@python.org> <4e6dc7b4$0$29986$c3e8da3$5496439d@news.astraweb.com> <ee50be9b-2710-423b-9c69-744cc173ac85@dq7g2000vbb.googlegroups.com> <CAHzaPEM7OURL_UESEEJGBVHOaTT0TS6LgVf+SV2UuGQg8czS6w@mail.gmail.com> <CAO+9iGfMToWvMOyha__DnAH=S-uaDS6wZEk+uFrH15seuO7ynQ@mail.gmail.com>
Date 2011-09-13 20:13 +0200
Subject Re: How do I automate the removal of all non-ascii characters from my code?
From Vlastimil Brom <vlastimil.brom@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.1081.1315937627.27778.python-list@python.org> (permalink)

Show all headers | View raw


2011/9/13 Alec Taylor <alec.taylor6@gmail.com>:
> Hmm, nothing mentioned so far works for me...
>
> Here's a very small test case:
>
>>>> python -u "Convert to Creole.py"
>  File "Convert to Creole.py", line 1
> SyntaxError: Non-ASCII character '\xe2' in file Convert to Creole.py
> on line 1, but no encoding declared; see
> http://www.python.org/peps/pep-0263.html for details
>>>> Exit Code: 1
>
> Line 1: a=u'''≤'''.encode("ascii", "ignore").decode("ascii")
>
> On Tue, Sep 13, 2011 at 11:33 PM, Vlastimil Brom
> <vlastimil.brom@gmail.com> wrote:
>> 2011/9/13 ron <vacorama@gmail.com>:
>>>
>>> Depending on the load, you can do something like:
>>>
>>> "".join([x for x in string if ord(x) < 128])
>>>
>>> It's worked great for me in cleaning input on webapps where there's a
>>> lot of copy/paste from varied sources.
>>> --
>>> http://mail.python.org/mailman/listinfo/python-list
>>>
>> Well, for this kind of dirty "data cleaning" you may as well use e.g.
>>
>>>>> u"äteöxt ÛÜÝ wiÉÊËÌthÞßà áânoûüýþn ASɔɕɖCɗɘəɚɛIɗɘəɚɛIεζ iηθιn жзbetийклweeჟრსn .ტუ..ფ".encode("ascii", "ignore").decode("ascii")
>> u'text  with non ASCII in between ...'
>>>>>
>>
>> vbr
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>>
>

Ok, in that case the encoding probably would be utf-8; \xe2 is just
the first part of the encoded data

>>> u'≤'.encode("utf-8")
'\xe2\x89\xa4'
>>>

Setting this encoding at the beginning of the file, as mentioned
before, might solve the problem while retaining the symbol in question
(or you could move from syntax error to some unicode related error
depending on other circumstances...).

vbr

Back to comp.lang.python | Previous | NextPrevious in thread | Find similar | Unroll thread


Thread

Re: How do I automate the removal of all non-ascii characters from my code? Stefan Behnel <stefan_ml@behnel.de> - 2011-09-12 10:43 +0200
  Re: How do I automate the removal of all non-ascii characters from my code? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-09-12 18:49 +1000
    Re: How do I automate the removal of all non-ascii characters from my code? Dave Angel <davea@ieee.org> - 2011-09-12 08:09 -0400
    Re: How do I automate the removal of all non-ascii characters from my code? jmfauth <wxjmfauth@gmail.com> - 2011-09-12 07:47 -0700
      Re: How do I automate the removal of all non-ascii characters from my code? "Rhodri James" <rhodri@wildebst.demon.co.uk> - 2011-09-12 22:39 +0100
        Re: How do I automate the removal of all non-ascii characters from my code? jmfauth <wxjmfauth@gmail.com> - 2011-09-13 00:49 -0700
          Re: How do I automate the removal of all non-ascii characters from my code? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-09-13 18:15 +1000
            Re: How do I automate the removal of all non-ascii characters from my code? jmfauth <wxjmfauth@gmail.com> - 2011-09-13 02:04 -0700
    Re: How do I automate the removal of all non-ascii characters from my code? ron <vacorama@gmail.com> - 2011-09-13 05:31 -0700
      Re: How do I automate the removal of all non-ascii characters from my code? Vlastimil Brom <vlastimil.brom@gmail.com> - 2011-09-13 15:33 +0200
      Re: How do I automate the removal of all non-ascii characters from my code? Alec Taylor <alec.taylor6@gmail.com> - 2011-09-14 01:02 +1000
        Re: How do I automate the removal of all non-ascii characters from my code? Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2011-09-13 18:29 +0300
      Re: How do I automate the removal of all non-ascii characters from my code? Vlastimil Brom <vlastimil.brom@gmail.com> - 2011-09-13 20:13 +0200

csiph-web