From: Terry Reedy
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: encoding error
Date: Wed, 20 Feb 2013 01:13:09 -0500

On 2/19/2013 8:07 PM, halagamal2009@gmail.com wrote:
> UnicodeEncodeError: 'decimal' codec can't encode character u'\ufeff'
> in position 0: invalid decimal Unicode string

I believe that is a byte-order mark (U+FEFF). It should appear only at
the very start of the file (2 bytes in UTF-16, 3 bytes in UTF-8), and
the proper decoder should remove it when you read the file, before you
parse it.

You did not say what version of Python you used, but I would use 3.3
or, if not that, 3.2 if possible. http://pypi.python.org/pypi/Whoosh/
claims that whoosh works with python 3.

Also, read about the basics of unicode if you have not done so yet.

-- 
Terry Jan Reedy
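
To illustrate the point about using the proper decoder, here is a minimal sketch. It assumes the file is UTF-8 with a leading byte-order mark, which the original post does not confirm. The helper name read_text_without_bom and the temporary demo file are made up for the example. The 'utf-8-sig' codec is in the standard library and drops a leading BOM if one is present.

```python
import codecs
import io
import os
import tempfile


def read_text_without_bom(path):
    # Hypothetical helper: 'utf-8-sig' decodes UTF-8 and drops a leading
    # BOM if present; files without a BOM are read unchanged.
    with io.open(path, encoding="utf-8-sig") as f:
        return f.read()


# Demo file (an assumption for illustration): UTF-8 BOM followed by a number.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"42\n")

# Plain 'utf-8' keeps the BOM as the first character, u'\ufeff'.
with io.open(demo_path, encoding="utf-8") as f:
    raw = f.read()

# 'utf-8-sig' strips it, so the text can be parsed as a number.
clean = read_text_without_bom(demo_path)
os.remove(demo_path)

print(repr(raw[0]))
print(int(clean))
```

If the file turns out to be UTF-16 instead, passing encoding="utf-16" to io.open should consume the BOM in the same way, since that codec uses it to detect byte order.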