| From | Stefan Behnel <stefan_ml@behnel.de> |
|---|---|
| Subject | Re: How do I automate the removal of all non-ascii characters from my code? |
| Date | 2011-09-12 10:43 +0200 |
| References | <CAO+9iGc0f7vFLrjjm24vzZqoMV04eL4fBQrYOTfDt-eYCNEacQ@mail.gmail.com> <4E6DC028.1020101@islandtraining.com> <CAO+9iGfHABoWnz-Podk9J5D3EJFgQ-9th=ky2Z-uW8MbQJA12A@mail.gmail.com> <CAO+9iGe2JcUHY8u+KEZfv3r8VmBdJ9h9NLwL3nO2Wpm80zRT2Q@mail.gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1021.1315817058.27778.python-list@python.org> |
Alec Taylor, 12.09.2011 10:33:
> from creole import html2creole
>
> from BeautifulSoup import BeautifulSoup
>
> VALID_TAGS = ['strong', 'em', 'p', 'ul', 'li', 'br', 'b', 'i', 'a', 'h1', 'h2']
>
> def sanitize_html(value):
>     soup = BeautifulSoup(value)
>
>     for tag in soup.findAll(True):
>         if tag.name not in VALID_TAGS:
>             tag.hidden = True
>
>     return soup.renderContents()
>
> html2creole(u(sanitize_html('''<h1
> style="margin-left:76.8px;margin-right:0;text-indent:0;">Abstract</h1>
> <p class="Standard"
> style="margin-left:76.8px;margin-right:0;text-indent:0;">
> [more stuff here]
> """))
Hi,
I'm not sure what you are trying to say with the above code, but if it's
the code that fails for you with the exception you posted, I would guess
that the problem is in the "[more stuff here]" part, which likely contains
a non-ASCII character. Note that you didn't declare the source file
encoding above. Do as Gary told you.
Stefan
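
To make the guess above checkable, here is a minimal sketch (not from the original post) that reports where non-ASCII characters sit in a piece of source text. The function name `find_non_ascii` and the sample string are hypothetical, made up for illustration; the real "[more stuff here]" content is unknown. The first line shows the PEP 263 source encoding declaration that the post says was missing, assuming the file is saved as UTF-8.

```python
# -*- coding: utf-8 -*-
# The line above is a PEP 263 source encoding declaration. Python 2 requires
# one as soon as the source file contains a non-ASCII byte; utf-8 is an
# assumption here and must match how the file is actually saved.


def find_non_ascii(text):
    """Return a list of (line_number, column, character) tuples, one for
    each character in text whose code point is above 127."""
    hits = []
    # Split on "\n" only, so that unusual Unicode line separators are
    # reported as non-ASCII characters instead of being swallowed.
    for line_number, line in enumerate(text.split("\n"), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                hits.append((line_number, column, char))
    return hits


# Hypothetical sample standing in for the elided HTML.
sample = u"<p>plain text</p>\n<p>caf\u00e9</p>\n"
hits = find_non_ascii(sample)
for line_number, column, char in hits:
    print("line %d, column %d: %r" % (line_number, column, char))
```

Running something like this over the decoded file contents would show whether the elided part really contains non-ASCII characters, and where, before deciding between declaring an encoding and replacing the characters.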