


Re: csv read _csv.Error: line contains NULL byte

Date Fri, 21 Mar 2014 13:39:37 +0000
From Tim Golden <mail@timgolden.me.uk>
To python-list@python.org
Subject Re: csv read _csv.Error: line contains NULL byte
Newsgroups comp.lang.python
Message-ID <mailman.8354.1395409181.18130.python-list@python.org>



On 21/03/2014 13:29, chip9munk@gmail.com wrote:
> Hi all!
> 
> I am reading from a huge csv file (> 20 Gb), so I have to read line by line:
> 
> for i, row in enumerate(input_reader):
>       #  and I do something on each row
> 
> Everything works fine until I get to a row with some strange symbols "0I`00�^"
> at that point I get an error: _csv.Error: line contains NULL byte
> 
> How can I skip such a row and keep going, or "decipher" it in some way?

Well, you have several options:

Without disturbing your existing code too much, you could wrap the
input_reader in a generator which skips malformed lines. That would look
something like this:

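import csv

def unfussy_reader(reader):
    while True:
        try:
            yield next(reader)
        except StopIteration:
            # end of input; on Python 3.7+ a StopIteration escaping
            # a generator is turned into RuntimeError (PEP 479)
            return
        except csv.Error:
            # log the problem or whatever
            continue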
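To make the "log the problem" part concrete, here's a sketch that records where each bad line was, using the reader's line_num attribute (the number of source lines read so far). The function name skip_bad_rows and the in-memory sample are made up for illustration. strict=True is only there so the sample reliably raises csv.Error: recent Python versions (3.11 onwards, as far as I know) accept NUL bytes rather than rejecting them.

```python
import csv
import io
import logging

def skip_bad_rows(reader, log=logging.getLogger(__name__)):
    """Yield rows from a csv reader, logging and skipping any that fail to parse."""
    while True:
        try:
            row = next(reader)
        except StopIteration:
            return  # end of input
        except csv.Error as exc:
            # line_num counts source lines consumed so far, including the bad one
            log.warning("skipping line %d: %s", reader.line_num, exc)
            continue
        yield row

# Hypothetical sample: the middle line is malformed under strict parsing.
sample = io.StringIO('a,b\n"x"y,z\nc,d\n')
rows = list(skip_bad_rows(csv.reader(sample, strict=True)))
print(rows)  # [['a', 'b'], ['c', 'd']]
```

With a real 20 GB file you'd pass an open file object instead of the StringIO; the generator pulls one line at a time, so memory use stays flat.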


If you knew what to do with the malformed data, you could strip it out and
carry on. Whatever works best for you.
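One way to "strip it out", assuming the NUL bytes are just junk you're happy to throw away, is to filter each line before the csv module ever sees it; csv.reader accepts any iterable of strings, not only files. strip_nuls is a made-up name, and simply deleting every NUL character may not be the right repair for your data.

```python
import csv
import io

def strip_nuls(lines):
    """Yield each text line with any NUL characters removed."""
    for line in lines:
        yield line.replace("\0", "")

# Hypothetical sample containing a NUL byte in the second record.
sample = io.StringIO("a,b\nx\0y,z\nc,d\n")
rows = list(csv.reader(strip_nuls(sample)))
print(rows)  # [['a', 'b'], ['xy', 'z'], ['c', 'd']]
```

For a real file, open it with newline="" (as the csv docs recommend) and wrap the file object the same way: csv.reader(strip_nuls(f)).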

Alternatively, you could write a small class that wraps the standard reader
and does something equivalent to the above in its __next__ method.
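A class-based version might look like the sketch below. It wraps the reader rather than inheriting from it, on the assumption that the object returned by csv.reader is a C type that CPython won't accept as a base class. UnfussyReader and its skipped counter are hypothetical; because __next__ is an ordinary method rather than a generator, the wrapped reader's StopIteration can simply propagate.

```python
import csv
import io

class UnfussyReader:
    """Iterator that wraps a csv reader and skips rows it can't parse."""

    def __init__(self, reader):
        self._reader = reader
        self.skipped = 0  # number of rows dropped because of csv.Error

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                # StopIteration from the wrapped reader ends iteration normally
                return next(self._reader)
            except csv.Error:
                self.skipped += 1

    @property
    def line_num(self):
        # Expose the wrapped reader's line counter, as csv reader objects do
        return self._reader.line_num

# Hypothetical sample: the middle line is malformed under strict parsing.
sample = io.StringIO('a,b\n"x"y,z\nc,d\n')
reader = UnfussyReader(csv.reader(sample, strict=True))
for i, row in enumerate(reader):
    print(i, row)
print("skipped:", reader.skipped)
```

It drops straight into the original loop: for i, row in enumerate(UnfussyReader(input_reader)).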

TJG



Thread

csv read _csv.Error: line contains NULL byte chip9munk@gmail.com - 2014-03-21 06:29 -0700
  Re: csv read _csv.Error: line contains NULL byte Tim Golden <mail@timgolden.me.uk> - 2014-03-21 13:39 +0000
    Re: csv read _csv.Error: line contains NULL byte chip9munk@gmail.com - 2014-03-21 07:46 -0700
      Re: csv read _csv.Error: line contains NULL byte chip9munk@gmail.com - 2014-03-21 07:59 -0700
      Re: csv read _csv.Error: line contains NULL byte Tim Golden <mail@timgolden.me.uk> - 2014-03-21 14:59 +0000
      Re: csv read _csv.Error: line contains NULL byte Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-21 15:15 +0000
