From: Albert-Jan Roskam
Subject: Re: Guessing the encoding from a BOM
Date: Thu, 16 Jan 2014 11:37:29 -0800 (PST)
To: Chris Angelico
Cc: "python-list@python.org"
Newsgroups: comp.lang.python

--------------------------------------------
On Thu, 1/16/14, Chris Angelico wrote:

 Subject: Re: Guessing the encoding from a BOM
 To:
 Cc: "python-list@python.org"
 Date: Thursday, January 16, 2014, 7:06 PM

 On Fri, Jan 17, 2014 at 5:01 AM, Björn Lindqvist wrote:
 > 2014/1/16 Steven D'Aprano :
 >> def guess_encoding_from_bom(filename, default):
 >>     with open(filename, 'rb') as f:
 >>         sig = f.read(4)
 >>     if sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
 >>         return 'utf_16'
 >>     elif sig.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')):
 >>         return 'utf_32'
 >>     else:
 >>         return default
 >
 > You might want to add the utf8 bom too: '\xEF\xBB\xBF'.

 I'd actually rather not. It would tempt people to pollute UTF-8
 files with a BOM, which is not necessary unless you are MS Notepad.

===> Can you elaborate on that? Unless your UTF-8 files will only contain ASCII characters, I do not understand why you would not want a UTF-8 BOM.

Btw, isn't "read_encoding_from_bom" a better function name than "guess_encoding_from_bom"? I thought the point of BOMs was that there would be no more need to guess?

Thanks!

Albert-Jan
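
For reference, here is a minimal sketch of the BOM check discussed above, with the UTF-8 signature included. It is not Steven's code, only one possible variant. The name read_encoding_from_bom is the one suggested in this message; encoding_from_signature and _BOMS are hypothetical helpers made up for the illustration. It tests UTF-32 before UTF-16, because the UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE), so testing UTF-16 first would never reach the UTF-32 branch for little-endian files.

```python
import codecs

# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the UTF-32 entries must come first.
_BOMS = (
    (codecs.BOM_UTF32_BE, 'utf_32'),
    (codecs.BOM_UTF32_LE, 'utf_32'),
    (codecs.BOM_UTF8, 'utf_8_sig'),
    (codecs.BOM_UTF16_BE, 'utf_16'),
    (codecs.BOM_UTF16_LE, 'utf_16'),
)


def encoding_from_signature(sig, default):
    """Return a codec name for the BOM at the start of sig, else default."""
    for bom, name in _BOMS:
        if sig.startswith(bom):
            return name
    return default


def read_encoding_from_bom(filename, default):
    """Read the first four bytes of filename and map a BOM to a codec name."""
    with open(filename, 'rb') as f:
        sig = f.read(4)
    return encoding_from_signature(sig, default)
```

The 'utf_16', 'utf_32' and 'utf_8_sig' codecs all consume the BOM when decoding, so the returned name can be passed straight to open(). One caveat remains: a UTF-16 LE file whose first character is U+0000 also starts with FF FE 00 00 and would be reported as UTF-32, so some guessing is left even with a BOM.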