| From | Peter Otten <__peter__@web.de> |
|---|---|
| Subject | Re: encoding error in python 27 |
| Date | 2013-02-24 09:34 +0100 |
| Organization | None |
| References | <a3d3d352-c170-4165-9552-741869106830@googlegroups.com> <86c880ca-ab2d-4406-832a-129235cf59bd@googlegroups.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.2397.1361694874.2939.python-list@python.org> |
Hala Gamal wrote:
> thank you :) it worked well for a small file, but when I enter a big file I
> obtain this error:
>
> Traceback (most recent call last):
>   File "D:\Python27\yarab (4).py", line 46, in <module>
>     writer.add_document(**doc)
>   File "build\bdist.win32\egg\whoosh\filedb\filewriting.py", line 369, in add_document
>     items = field.index(value)
>   File "build\bdist.win32\egg\whoosh\fields.py", line 466, in index
>     return [(txt, 1, 1.0, '') for txt in self._tiers(num)]
>   File "build\bdist.win32\egg\whoosh\fields.py", line 454, in _tiers
>     yield self.to_text(num, shift=shift)
>   File "build\bdist.win32\egg\whoosh\fields.py", line 487, in to_text
>     return self._to_text(self.prepare_number(x), shift=shift,
>   File "build\bdist.win32\egg\whoosh\fields.py", line 476, in prepare_number
>     x = self.type(x)
> UnicodeEncodeError: 'decimal' codec can't encode characters in position 0-4: invalid decimal Unicode string
>
> I don't really know where the problem is.
>
> On Friday, February 22, 2013 4:55:22 PM UTC+2, Hala Gamal wrote:
>> my code works well with an English file, but when I use a text file
>> encoded as "utf-8" (my file contains some Arabic letters) it doesn't work.
I guess that one of the fields you require to be NUMERIC contains non-digit
characters. Replace the line
>> writer.add_document(**doc)
with something similar to
    try:
        writer.add_document(**doc)
    except UnicodeEncodeError:
        print "Skipping malformed line", repr(i)
This will allow you to inspect the lines your script cannot handle. If they
are indeed "malformed", as I am guessing, you can fix your input data.
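To narrow down which field is the culprit, something like the following rough sketch may help. It assumes the NUMERIC fields are converted with int(), which is what the x = self.type(x) line in the traceback suggests but which I have not verified against the whoosh source. The helper find_non_numeric() and the field names "id", "year" and "title" are made up for illustration; substitute the names from your own schema.

```python
from __future__ import print_function


def find_non_numeric(doc, numeric_fields, convert=int):
    """Return (field, value) pairs whose value cannot be converted to a number."""
    bad = []
    for name in numeric_fields:
        value = doc.get(name)
        if value is None:
            continue
        try:
            convert(value)
        except ValueError:
            # UnicodeEncodeError is a subclass of ValueError, so this also
            # covers the 'decimal' codec error shown in the traceback.
            bad.append((name, value))
    return bad


# Hypothetical document: "year" holds Arabic letters instead of digits.
doc = {u"id": u"12", u"year": u"\u0639\u0627\u0645", u"title": u"some text"}
print(find_non_numeric(doc, [u"id", u"year"]))
```

Calling this in the except branch for the document that failed shows the offending field and value, rather than only the raw line.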
i is a terrible name for a line in a file, btw. Also, you should avoid
readlines(), which reads the whole file into memory; iterate over the file
object directly instead:
    with codecs.open("tt.txt", encoding='utf-8-sig') as textfile:
        for line in textfile:  # no readlines(), can handle
                               # text files of arbitrary size
            ...
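Putting both suggestions together, a minimal sketch of the loop could look like this. Since I don't know how your script turns a line into a document, make_doc() below just assumes a tab-separated "number, text" layout, and fake_add_document() is a stand-in for writer.add_document so that the example runs without whoosh; both are hypothetical. It also catches ValueError, which is broader than the UnicodeEncodeError suggested above (UnicodeEncodeError is a subclass of ValueError).

```python
from __future__ import print_function


def index_lines(lines, make_doc, add_document):
    """Feed each line to add_document(); return the (number, line) pairs that failed."""
    skipped = []
    for lineno, line in enumerate(lines, 1):
        try:
            add_document(**make_doc(line))
        except ValueError:  # also catches UnicodeEncodeError, a subclass
            skipped.append((lineno, line))
    return skipped


# Hypothetical stand-ins so the sketch runs without whoosh or an input file.
def make_doc(line):
    number, _, text = line.rstrip(u"\n").partition(u"\t")
    return {"id": number, "content": text}


indexed = []


def fake_add_document(id, content):
    indexed.append((int(id), content))  # int() fails on non-digit input


sample = [u"1\tfirst\n", u"\u0639\u0627\u0645\tsecond\n", u"3\tthird\n"]
skipped = index_lines(sample, make_doc, fake_add_document)
print(len(indexed), "indexed;", len(skipped), "skipped")
```

In the real script you would pass the open file object as lines and writer.add_document as add_document. Because the file is consumed one line at a time, memory use stays flat no matter how big the file is, and the returned list tells you which line numbers to look at.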
encoding error in python 27 Hala Gamal <halagamal2009@gmail.com> - 2013-02-22 06:55 -0800
Re: encoding error in python 27 Peter Otten <__peter__@web.de> - 2013-02-22 16:40 +0100
Re: encoding error in python 27 MRAB <python@mrabarnett.plus.com> - 2013-02-22 17:35 +0000
Re: encoding error in python 27 Hala Gamal <halagamal2009@gmail.com> - 2013-02-23 20:31 -0800
Re: encoding error in python 27 Peter Otten <__peter__@web.de> - 2013-02-24 09:34 +0100