


Re: Guessing the encoding from a BOM

Date Thu, 16 Jan 2014 11:37:29 -0800 (PST)
From Albert-Jan Roskam <fomcl@yahoo.com>
Subject Re: Guessing the encoding from a BOM
To Chris Angelico <rosuav@gmail.com>
In-Reply-To <CAPTjJmqyO0UHrq31510iNeoQ2TcrJnosV0A6oHQOt5i-gz3njA@mail.gmail.com>
Cc "python-list@python.org" <python-list@python.org>
Newsgroups comp.lang.python
Message-ID <mailman.5601.1389901239.18130.python-list@python.org> (permalink)



--------------------------------------------
On Thu, 1/16/14, Chris Angelico <rosuav@gmail.com> wrote:

 Subject: Re: Guessing the encoding from a BOM
 To: 
 Cc: "python-list@python.org" <python-list@python.org>
 Date: Thursday, January 16, 2014, 7:06 PM
 
 On Fri, Jan 17, 2014 at 5:01 AM, Björn Lindqvist <bjourne@gmail.com> wrote:
 > 2014/1/16 Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
 >> def guess_encoding_from_bom(filename, default):
 >>     with open(filename, 'rb') as f:
 >>         sig = f.read(4)
 >>     if sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
 >>         return 'utf_16'
 >>     elif sig.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')):
 >>         return 'utf_32'
 >>     else:
 >>         return default
 >
 > You might want to add the utf8 bom too: '\xEF\xBB\xBF'.
 
 I'd actually rather not. It would tempt people to pollute UTF-8 files
 with a BOM, which is not necessary unless you are MS Notepad.
 

 ===> Can you elaborate on that? Unless your UTF-8 files contain only ASCII characters, I do not understand why you would not want a UTF-8 BOM.
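
To make the question concrete, here is a minimal sketch of what a UTF-8 BOM looks like at the byte level and how Python's standard codecs treat it. The sample string and variable names are only illustrative; the codec names and `codecs.BOM_UTF8` are from the standard library.

```python
import codecs

text = "naïve"  # illustrative sample containing a non-ASCII character

plain = text.encode("utf-8")       # no signature
signed = text.encode("utf-8-sig")  # same bytes, prefixed with the BOM

# The signature is the three bytes EF BB BF.
assert signed == codecs.BOM_UTF8 + plain

# Decoding BOM-prefixed bytes as plain UTF-8 keeps the BOM as U+FEFF.
leaked = signed.decode("utf-8")

# 'utf-8-sig' strips a leading BOM if present and also accepts input without one.
clean = signed.decode("utf-8-sig")
also_clean = plain.decode("utf-8-sig")
```

So the BOM does not change how the non-ASCII characters are encoded; it only adds a signature that a reader has to know to strip.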

Btw, isn't "read_encoding_from_bom" a better function name than "guess_encoding_from_bom"? I thought the point of BOMs was that there would be no more need to guess?
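
For reference, this is a rough sketch of how the quoted function might look with the UTF-8 BOM included and the proposed name. It is not Steven's code: the table, the helper `encoding_from_bom_bytes`, and the choice to return `'utf_8_sig'` are my assumptions. One detail I am sure of is that the UTF-32-LE BOM starts with the same two bytes as the UTF-16-LE BOM, so the sketch tests the 4-byte signatures first.

```python
import codecs

# Longest signatures first: BOM_UTF32_LE (FF FE 00 00) begins with
# BOM_UTF16_LE (FF FE), so checking UTF-16 first would shadow UTF-32.
_BOMS = (
    (codecs.BOM_UTF32_BE, "utf_32"),
    (codecs.BOM_UTF32_LE, "utf_32"),
    (codecs.BOM_UTF8, "utf_8_sig"),
    (codecs.BOM_UTF16_BE, "utf_16"),
    (codecs.BOM_UTF16_LE, "utf_16"),
)


def encoding_from_bom_bytes(sig, default):
    """Return a codec name for the BOM at the start of sig, else default.

    Hypothetical helper, split out so the logic can be tried without a file.
    """
    for bom, name in _BOMS:
        if sig.startswith(bom):
            return name
    return default


def read_encoding_from_bom(filename, default):
    """Read the first four bytes of filename and map a BOM to a codec name."""
    with open(filename, "rb") as f:
        return encoding_from_bom_bytes(f.read(4), default)
```

Even so, the result is not fully certain: a UTF-16-LE file whose first character is U+0000 starts with the same four bytes as a UTF-32-LE BOM, and a file without any BOM still falls back to `default`.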

Thanks!

Albert-Jan



Thread

Re: Guessing the encoding from a BOM Albert-Jan Roskam <fomcl@yahoo.com> - 2014-01-16 11:37 -0800
  Re: Guessing the encoding from a BOM Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-17 01:18 +0000
