Date: Tue, 24 Sep 2013 16:17:18 +0100
From: "J. Bagg"
Organization: Dept of Anthropology, University of Kent
To: python-list@python.org
Subject: removing BOM prepended by codecs?
Newsgroups: comp.lang.python

I've checked the original files using od and they don't have BOMs. I'll remove them in the servlet. The overhead is probably small enough unless somebody is doing a massive search. We have a limit anyway to prevent somebody stealing the entire set of data.

I started writing the Python search because the ancient C search had started putting out BOMs. I'm actually mystified because our home Linux box does not add BOMs but my work one does, even though both run Python 2.7. The only difference is Fedora 18 vs. Fedora 17.

The BOMs are certainly there:

<86> %R 10C0203z-621 %A François-Xavier Le_Bourdonnec
0000000 206 255 373 % R 1 0 C 0 2 0 3 z - J
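
For anyone who wants to do the same check and clean-up from Python rather than od, here is a minimal sketch. It assumes the output is handled as raw bytes and that the unwanted prefix is a UTF-8 BOM, which is the bytes EF BB BF and would appear in od output as 357 273 277. The helper names `leading_octal` and `strip_utf8_bom` are made up for illustration and are not from the search code discussed above.

```python
import codecs


def leading_octal(data, count=3):
    """Format the first `count` bytes as octal, comparable to od output."""
    # bytearray() yields integers on both Python 2.7 and Python 3.
    return " ".join("%03o" % b for b in bytearray(data[:count]))


def strip_utf8_bom(data):
    """Return `data` (bytes) without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical record, modelled on the %R line in the dump above.
record = codecs.BOM_UTF8 + b"%R 10C0203z-621"
print(leading_octal(record))
print(strip_utf8_bom(record))
```

Comparing the octal values this prints with the first bytes in an od dump shows whether the prefix really is a UTF-8 BOM. Input with any other leading bytes is returned unchanged.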