| Subject | Re: how to detect the character encoding in a web page ? |
|---|---|
| From | Kurt Mueller <kurt.alfred.mueller@gmail.com> |
| Date | 2012-12-24 09:34 +0100 |
| References | <c15bad9a-a7f7-456e-8dc5-b1af67fbdd44@googlegroups.com> <2324928c-32de-4f9d-8ff1-5db6dcf5543a@googlegroups.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1245.1356338098.29569.python-list@python.org> |

On 24.12.2012 at 04:03, iMath wrote:
> but how to let python do it for you ?
> such as these 2 pages
> http://python.org/
> http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx
> how to detect the character encoding in these 2 pages by python ?

If you have the HTML code, let chardetect.py make an educated guess for you.
http://pypi.python.org/pypi/chardet

Example:

    $ wget -q -O - http://python.org/ | chardetect.py
    stdin: ISO-8859-2 with confidence 0.803579722043
    $

    $ wget -q -O - 'http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx' | chardetect.py
    stdin: utf-8 with confidence 0.87625
    $

Greetings
-- 
kurt.alfred.mueller@gmail.com
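
Since the question asked how to do this from Python rather than from the shell, here is a minimal sketch of the same guess made in-process. It assumes the chardet package from the PyPI link above is installed and relies on its `chardet.detect()` function, which takes bytes and returns a dict with `"encoding"` and `"confidence"` keys. The names `guess_encoding`, `format_guess`, `main` and the `detect` parameter are hypothetical helpers invented for this illustration, not part of chardet.

```python
import sys


def guess_encoding(raw, detect=None):
    """Return (encoding, confidence) guessed for the bytes in `raw`.

    `detect` is any callable with the same contract as chardet.detect:
    bytes in, dict with "encoding" and "confidence" keys out.
    (The parameter is an assumption of this sketch, added so the
    detector can be swapped out or stubbed.)
    """
    if detect is None:
        # Imported lazily so this module still loads without chardet.
        import chardet  # third-party: pip install chardet
        detect = chardet.detect
    result = detect(raw)
    return result["encoding"], result["confidence"]


def format_guess(name, encoding, confidence):
    """Format a guess in the style of the chardetect.py output above."""
    return "{}: {} with confidence {}".format(name, encoding, confidence)


def main():
    # Read the page as raw bytes; decoding it first would defeat the purpose.
    raw = sys.stdin.buffer.read()
    encoding, confidence = guess_encoding(raw)
    print(format_guess("stdin", encoding, confidence))
```

Calling `main()` from a script lets it stand in for chardetect.py at the end of the wget pipeline shown above. As with the command-line tool, the result is only an educated guess, so a low confidence value is a hint to check the page's declared charset as well.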
how to detect the character encoding in a web page ? iMath <redstone-cold@163.com> - 2012-12-23 16:34 -0800
Re: how to detect the character encoding in a web page ? Chris Angelico <rosuav@gmail.com> - 2012-12-24 12:23 +1100
Re: how to detect the character encoding in a web page ? Hans Mulder <hansmu@xs4all.nl> - 2012-12-24 02:30 +0100
Re: how to detect the character encoding in a web page ? iMath <redstone-cold@163.com> - 2012-12-23 18:57 -0800
Re: how to detect the character encoding in a web page ? iMath <redstone-cold@163.com> - 2012-12-23 19:03 -0800
Re: how to detect the character encoding in a web page ? iMath <redstone-cold@163.com> - 2012-12-23 19:03 -0800
Re: how to detect the character encoding in a web page ? Kurt Mueller <kurt.alfred.mueller@gmail.com> - 2012-12-24 09:34 +0100
Re: how to detect the character encoding in a web page ? Kwpolska <kwpolska@gmail.com> - 2012-12-24 13:16 +0100
Re: how to detect the character encoding in a web page ? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-24 13:50 +0000
Re: how to detect the character encoding in a web page ? Alister <alister.ware@ntlworld.com> - 2012-12-24 16:27 +0000
Re: how to detect the character encoding in a web page ? Roy Smith <roy@panix.com> - 2012-12-24 11:46 -0500
Re: how to detect the character encoding in a web page ? albert@spenarnc.xs4all.nl (Albert van der Horst) - 2013-01-14 12:50 +0000
Re: how to detect the character encoding in a web page ? python培训 <51mmj.com@gmail.com> - 2012-12-28 06:30 -0800
Re: how to detect the character encoding in a web page ? iMath <redstone-cold@163.com> - 2013-01-07 01:23 -0800