Re: how to detect the character encoding in a web page ?

Subject Re: how to detect the character encoding in a web page ?
From Kurt Mueller <kurt.alfred.mueller@gmail.com>
Date 2012-12-24 09:34 +0100
References <c15bad9a-a7f7-456e-8dc5-b1af67fbdd44@googlegroups.com> <2324928c-32de-4f9d-8ff1-5db6dcf5543a@googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.1245.1356338098.29569.python-list@python.org>


On 24.12.2012 at 04:03, iMath wrote:
> But how do you let Python do it for you?
> For example, these 2 pages:
> http://python.org/
> http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx
> How do I detect the character encoding of these 2 pages with Python?


If you have the HTML code, let
chardetect.py
make an educated guess for you.

http://pypi.python.org/pypi/chardet

Example:
$ wget -q -O - http://python.org/ | chardetect.py 
stdin: ISO-8859-2 with confidence 0.803579722043
$ 

$ wget -q -O - 'http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx' | chardetect.py 
stdin: utf-8 with confidence 0.87625
$ 
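
The same guess can be made from inside Python rather than through the command-line script. The following is only a minimal sketch, assuming Python 3 and that the chardet package is installed; the function names guess_encoding and guess_url_encoding and the detector parameter are hypothetical helpers made up for illustration, not part of chardet. Only chardet.detect() is the library's own call.

```python
from urllib.request import urlopen


def guess_encoding(data, detector=None):
    """Return an (encoding, confidence) pair for a block of raw bytes.

    `detector` is a hypothetical hook so the function can be tried out
    without chardet; by default chardet.detect() is used.
    """
    if detector is None:
        import chardet  # third-party package from PyPI
        detector = chardet.detect
    # chardet.detect() returns a dict with 'encoding' and 'confidence' keys.
    result = detector(data)
    return result["encoding"], result["confidence"]


def guess_url_encoding(url, detector=None):
    """Download a page and guess its encoding from the undecoded bytes."""
    with urlopen(url) as response:
        return guess_encoding(response.read(), detector)
```

Calling guess_url_encoding('http://python.org/') should give a pair comparable to what chardetect.py prints above. As with the script, the result is a statistical guess with a confidence value, not a definite answer.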


Greetings
-- 
kurt.alfred.mueller@gmail.com



