Re: csv read _csv.Error: line contains NULL byte

Date 2014-03-21 13:39 +0000
From Tim Golden <mail@timgolden.me.uk>
Subject Re: csv read _csv.Error: line contains NULL byte
References <22aeefa3-cf82-457c-ab85-6f0366ff7b4e@googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.8354.1395409181.18130.python-list@python.org>

On 21/03/2014 13:29, chip9munk@gmail.com wrote:
> Hi all!
> 
> I am reading from a huge csv file (> 20 Gb), so I have to read line by line:
> 
> for i, row in enumerate(input_reader):
>       #  and I do something on each row
> 
> Everything works fine until i get to a row with some strange symbols "0I`00�^"
> at that point I get an error: _csv.Error: line contains NULL byte
> 
> How can i skip such row and continue going, or "decipher" it in some way?

Well you have several options:

Without disturbing your existing code too much, you could wrap the
input_reader in a generator which skips malformed lines. That would look
something like this:

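import csv

def unfussy_reader(reader):
    while True:
        try:
            yield next(reader)
        except StopIteration:
            # input exhausted; a bare StopIteration would become
            # RuntimeError inside a generator on Python 3.7+ (PEP 479)
            return
        except csv.Error:
            # log the problem or whatever
            continue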


If you knew what to do with the malformed data, you could strip it out
and carry on. Whatever works best for you.
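
For the "strip it out" route, here is a minimal sketch, assuming the only
problem is stray NUL characters and that simply dropping them is acceptable
for your data. The helper name strip_nul_bytes and the in-memory sample are
made up for illustration; with the real file you would pass the open file
object instead of the StringIO.

```python
import csv
import io

def strip_nul_bytes(lines):
    """Yield each input line with NUL characters removed."""
    for line in lines:
        yield line.replace("\x00", "")

# Hypothetical sample standing in for the real 20 GB file.
sample = io.StringIO("a,b\n1,2\x00\n3,4\n", newline="")

# csv.reader accepts any iterable of strings, so the cleaning
# generator can sit between the file and the reader.
rows = list(csv.reader(strip_nul_bytes(sample)))
```

Because the generator handles one line at a time, memory use should stay
flat however large the file is.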

Alternatively you could subclass the standard Reader and do something
equivalent to the above in the __next__ method.
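
One caveat on the subclassing idea: csv.reader is a factory function rather
than a class, so a thin wrapper class is probably the closest equivalent.
A rough sketch follows; the name UnfussyReader and the skipped counter are
hypothetical, not part of the csv module.

```python
import csv

class UnfussyReader:
    """Iterator wrapping csv.reader that skips rows raising csv.Error."""

    def __init__(self, *args, **kwargs):
        # Arguments are passed straight through to csv.reader.
        self._reader = csv.reader(*args, **kwargs)
        self.skipped = 0  # number of malformed rows dropped so far

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                # StopIteration from the underlying reader propagates
                # unchanged and ends the iteration normally.
                return next(self._reader)
            except csv.Error:
                # log the problem or whatever, then try the next row
                self.skipped += 1
```

It drops in wherever input_reader was used, e.g.
for i, row in enumerate(UnfussyReader(f)), and the skipped attribute
tells you afterwards how many rows were thrown away.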

TJG
