Groups > comp.lang.python > #68689 > unrolled thread
| Started by | chip9munk@gmail.com |
|---|---|
| First post | 2014-03-21 06:29 -0700 |
| Last post | 2014-03-21 15:15 +0000 |
| Articles | 6 — 3 participants |
csv read _csv.Error: line contains NULL byte chip9munk@gmail.com - 2014-03-21 06:29 -0700
Re: csv read _csv.Error: line contains NULL byte Tim Golden <mail@timgolden.me.uk> - 2014-03-21 13:39 +0000
Re: csv read _csv.Error: line contains NULL byte chip9munk@gmail.com - 2014-03-21 07:46 -0700
Re: csv read _csv.Error: line contains NULL byte chip9munk@gmail.com - 2014-03-21 07:59 -0700
Re: csv read _csv.Error: line contains NULL byte Tim Golden <mail@timgolden.me.uk> - 2014-03-21 14:59 +0000
Re: csv read _csv.Error: line contains NULL byte Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-21 15:15 +0000
| From | chip9munk@gmail.com |
|---|---|
| Date | 2014-03-21 06:29 -0700 |
| Subject | csv read _csv.Error: line contains NULL byte |
| Message-ID | <22aeefa3-cf82-457c-ab85-6f0366ff7b4e@googlegroups.com> |
Hi all!
I am reading from a huge CSV file (> 20 GB), so I have to read it line by line:
    for i, row in enumerate(input_reader):
        # and I do something on each row
Everything works fine until I get to a row with some strange symbols "0I`00�^";
at that point I get an error: _csv.Error: line contains NULL byte
How can I skip such a row and keep going, or "decipher" it in some way?
I have tried:
    csvFile = open(input_file_path, 'rb')
    csvFile = open(input_file_path, 'rU')
    csvFile = open(input_file_path, 'r')
and nothing works.
If I do:
    try:
        for i, row in enumerate(input_reader):
            # and I do something on each row
    except Exception:
        sys.exc_clear()
I simply stop at that line. I would like to skip it and move on.
Please help!
Best,
Chip Munk
| From | Tim Golden <mail@timgolden.me.uk> |
|---|---|
| Date | 2014-03-21 13:39 +0000 |
| Message-ID | <mailman.8354.1395409181.18130.python-list@python.org> |
| In reply to | #68689 |
On 21/03/2014 13:29, chip9munk@gmail.com wrote:
> Hi all!
>
> I am reading from a huge CSV file (> 20 GB), so I have to read it line by line:
>
>     for i, row in enumerate(input_reader):
>         # and I do something on each row
>
> Everything works fine until I get to a row with some strange symbols "0I`00�^";
> at that point I get an error: _csv.Error: line contains NULL byte
>
> How can I skip such a row and keep going, or "decipher" it in some way?
Well, you have several options:
Without disturbing your existing code too much, you could wrap the
input_reader in a generator which skips malformed lines. That would look
something like this:
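    def unfussy_reader(reader):
        while True:
            try:
                yield next(reader)
            except csv.Error:
                # log the problem or whatever
                continue
            except StopIteration:
                # end of input: since Python 3.7 (PEP 479) a StopIteration
                # leaking out of a generator becomes a RuntimeError
                return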
If you knew what to do with the malformed data, you could strip it out and
carry on. Whatever works best for you.
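A minimal sketch of the "strip it out" option, assuming the stray NUL characters carry no information and can simply be dropped. The helper name strip_nul and the in-memory sample are illustrative only, not something from the original post. Note that NUL bytes sometimes indicate that a file is really UTF-16 or contains binary debris, in which case dropping them may not be the right repair.

```python
import csv
import io


def strip_nul(lines):
    """Yield each text line with any NUL characters removed."""
    for line in lines:
        yield line.replace("\x00", "")


# In-memory stand-in for the real file; the second line carries a NUL.
sample = io.StringIO("abc,def\nghi\x00,klm\n123,456\n")

# csv.reader accepts any iterable of strings, so the cleaning
# generator can sit between the file object and the reader.
rows = list(csv.reader(strip_nul(sample)))
print(rows)
```

Because the lines are cleaned lazily, one at a time, this should work for a very large file as well: nothing beyond the current line is held in memory.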
Alternatively you could subclass the standard Reader and do something
equivalent to the above in the __next__ method.
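One way to get the `__next__` variant is a small wrapper class that holds a csv reader rather than inheriting from it. This is only a sketch: the class name UnfussyReader is hypothetical, and strict=True is used here purely so that the badly quoted sample line raises csv.Error.

```python
import csv


class UnfussyReader:
    """Iterator that wraps a csv reader and skips rows it cannot parse."""

    def __init__(self, *args, **kwargs):
        # Arguments are passed straight through to csv.reader.
        self._reader = csv.reader(*args, **kwargs)

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                # StopIteration from the wrapped reader propagates
                # and ends the iteration as usual.
                return next(self._reader)
            except csv.Error:
                continue  # skip the malformed row


# The second line has text after a closing quote, which strict mode rejects.
lines = ['abc,def\n', 'ghi,"k"lm\n', '123,456\n']
rows = list(UnfussyReader(lines, strict=True))
print(rows)
```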
TJG
| From | chip9munk@gmail.com |
|---|---|
| Date | 2014-03-21 07:46 -0700 |
| Message-ID | <fefcec40-3bd9-4a94-9ae8-4f214fce2302@googlegroups.com> |
| In reply to | #68690 |
On Friday, March 21, 2014 2:39:37 PM UTC+1, Tim Golden wrote:
> Without disturbing your existing code too much, you could wrap the
> input_reader in a generator which skips malformed lines. That would look
> something like this:
>
>     def unfussy_reader(reader):
>         while True:
>             try:
>                 yield next(reader)
>             except csv.Error:
>                 # log the problem or whatever
>                 continue
I am sorry I do not understand how to get to each row in this way.
Please could you explain also this:
If I define this function,
how do I change my for loop to get each row?
Thanks!
| From | chip9munk@gmail.com |
|---|---|
| Date | 2014-03-21 07:59 -0700 |
| Message-ID | <c66fbee9-d585-4d0d-98be-e925f2cfef5f@googlegroups.com> |
| In reply to | #68696 |
Ok, I have figured it out:
    for i, row in enumerate(unfussy_reader(input_reader)):
        # and I do something on each row
Sorry, it is my first "face to face" with generators!
Thank you very much!
Best,
Chip Munk
| From | Tim Golden <mail@timgolden.me.uk> |
|---|---|
| Date | 2014-03-21 14:59 +0000 |
| Message-ID | <mailman.8361.1395414009.18130.python-list@python.org> |
| In reply to | #68696 |
On 21/03/2014 14:46, chip9munk@gmail.com wrote:
> I am sorry I do not understand how to get to each row in this way.
>
> Please could you explain also this:
> If I define this function,
> how do I change my for loop to get each row?
Does this help?
<code>
#!python3
import csv


def unfussy_reader(csv_reader):
    while True:
        try:
            yield next(csv_reader)
        except csv.Error:
            # log the problem or whatever
            print("Problem with some row")
            continue
        except StopIteration:
            # end of input: since Python 3.7 (PEP 479) a StopIteration
            # leaking out of a generator becomes a RuntimeError
            return


if __name__ == '__main__':
    #
    # Generate malformed csv file for
    # demonstration purposes
    #
    with open("temp.csv", "w") as fout:
        fout.write("abc,def\nghi\x00,klm\n123,456")

    #
    # Open the malformed file for reading, fire up a
    # conventional CSV reader over it, wrap that reader
    # in our "unfussy" generator and enumerate over that
    # generator.
    #
    with open("temp.csv") as fin:
        reader = unfussy_reader(csv.reader(fin))
        for n, row in enumerate(reader):
            print(n, "=>", row)
</code>
TJG
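For the "log the problem or whatever" step, a possible sketch is to record where each bad row occurred, using the reader's line_num attribute (the number of lines read from the source so far). The function name logging_reader and the skipped list are hypothetical, and strict=True is only there so that the sample's badly quoted line raises csv.Error.

```python
import csv


def logging_reader(csv_reader, skipped):
    """Yield good rows; record (line number, message) for bad ones."""
    while True:
        try:
            yield next(csv_reader)
        except csv.Error as exc:
            # line_num counts the lines read from the source so far,
            # so it points at the line that failed to parse.
            skipped.append((csv_reader.line_num, str(exc)))
        except StopIteration:
            # End of input: finish the generator cleanly.
            return


# The second line has text after a closing quote, which strict mode rejects.
lines = ['abc,def\n', 'ghi,"k"lm\n', '123,456\n']
skipped = []
rows = list(logging_reader(csv.reader(lines, strict=True), skipped))

print(rows)
for line_no, message in skipped:
    print("skipped line", line_no, "-", message)
```

With a 20 GB input, collecting line numbers rather than the rows themselves keeps the log small, and the bad lines can be inspected afterwards.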
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-03-21 15:15 +0000 |
| Message-ID | <mailman.8362.1395414925.18130.python-list@python.org> |
| In reply to | #68696 |
On 21/03/2014 14:46, chip9munk@gmail.com wrote:
> On Friday, March 21, 2014 2:39:37 PM UTC+1, Tim Golden wrote:
>> Without disturbing your existing code too much, you could wrap the
>> input_reader in a generator which skips malformed lines. That would look
>> something like this:
>>
>>     def unfussy_reader(reader):
>>         while True:
>>             try:
>>                 yield next(reader)
>>             except csv.Error:
>>                 # log the problem or whatever
>>                 continue
>
> I am sorry I do not understand how to get to each row in this way.
>
> Please could you explain also this:
> If I define this function,
> how do I change my for loop to get each row?
>
> Thanks!
I'm pleased to see that you have answers. In return would you either use
the mailing list https://mail.python.org/mailman/listinfo/python-list or
read and action this https://wiki.python.org/moin/GoogleGroupsPython to
prevent us seeing double line spacing and single line paragraphs, thanks.
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence