Re: How do I automate the removal of all non-ascii characters from my code?

From Stefan Behnel <stefan_ml@behnel.de>
Subject Re: How do I automate the removal of all non-ascii characters from my code?
Date 2011-09-12 10:43 +0200
References <CAO+9iGc0f7vFLrjjm24vzZqoMV04eL4fBQrYOTfDt-eYCNEacQ@mail.gmail.com> <4E6DC028.1020101@islandtraining.com> <CAO+9iGfHABoWnz-Podk9J5D3EJFgQ-9th=ky2Z-uW8MbQJA12A@mail.gmail.com> <CAO+9iGe2JcUHY8u+KEZfv3r8VmBdJ9h9NLwL3nO2Wpm80zRT2Q@mail.gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.1021.1315817058.27778.python-list@python.org>


Alec Taylor, 12.09.2011 10:33:
> from creole import html2creole
>
> from BeautifulSoup import BeautifulSoup
>
> VALID_TAGS = ['strong', 'em', 'p', 'ul', 'li', 'br', 'b', 'i', 'a', 'h1', 'h2']
>
> def sanitize_html(value):
>
>     soup = BeautifulSoup(value)
>
>     for tag in soup.findAll(True):
>         if tag.name not in VALID_TAGS:
>             tag.hidden = True
>
>     return soup.renderContents()
> html2creole(u(sanitize_html('''<h1
> style="margin-left:76.8px;margin-right:0;text-indent:0;">Abstract</h1>
>     <p class="Standard"
> style="margin-left:76.8px;margin-right:0;text-indent:0;">
> [more stuff here]
> """))

Hi,

I'm not sure what you are trying to say with the above code, but if it's 
the code that fails for you with the exception you posted, I would guess 
that the problem is in the "[more stuff here]" part, which likely contains 
a non-ASCII character. Note that you didn't declare the source file 
encoding above. Do as Gary told you.
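
For what it's worth, here is a minimal sketch of both points, assuming a UTF-8 source file. The first line is a PEP 263 encoding declaration, which has to sit on the first or second line of the file. The find_non_ascii helper and the sample text are hypothetical (they are not from this thread or from any library); the helper only reports where non-ASCII characters occur, so you could run it over the "[more stuff here]" part to see what is in there.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper, not part of the original post: report where
# non-ASCII characters occur in a piece of source text.


def find_non_ascii(text):
    """Return a list of (line_number, column, character) tuples,
    one for each character whose code point is above 127."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                hits.append((line_number, column, char))
    return hits


# Made-up sample standing in for the HTML passed to sanitize_html().
sample = u"<h1>Abstract</h1>\n<p>caf\u00e9</p>\n"

for line_number, column, char in find_non_ascii(sample):
    print("line %d, column %d: %r" % (line_number, column, char))
```

To check a whole file, read it with the encoding it was saved in and pass the resulting text to find_non_ascii. Whether to delete, replace, or keep the reported characters is a separate decision; with the encoding declared, keeping them is usually fine.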

Stefan
