| Path | csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!feeder.erje.net!us.feeder.erje.net!newsfeed.fsmpi.rwth-aachen.de!goblin3!goblin1!goblin.stu.neva.ru!uio.no!news.tele.dk!news.tele.dk!small.news.tele.dk!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <mail@timgolden.me.uk> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-Spam-Status | OK 0.004 |
| Date | Fri, 21 Mar 2014 13:39:37 +0000 |
| From | Tim Golden <mail@timgolden.me.uk> |
| User-Agent | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.0 |
| MIME-Version | 1.0 |
| To | python-list@python.org |
| Subject | Re: csv read _csv.Error: line contains NULL byte |
| References | <22aeefa3-cf82-457c-ab85-6f0366ff7b4e@googlegroups.com> |
| In-Reply-To | <22aeefa3-cf82-457c-ab85-6f0366ff7b4e@googlegroups.com> |
| Content-Type | text/plain; charset=UTF-8 |
| Content-Transfer-Encoding | 8bit |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.15 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.8354.1395409181.18130.python-list@python.org> (permalink) |
| Lines | 37 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1395409181 news.xs4all.nl 2930 [2001:888:2000:d::a6]:48387 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:68690 |
On 21/03/2014 13:29, chip9munk@gmail.com wrote:
> Hi all!
>
> I am reading from a huge CSV file (> 20 GB), so I have to read it line by line:
>
>     for i, row in enumerate(input_reader):
>         # and I do something on each row
>
> Everything works fine until I get to a row with some strange symbols "0I`00�^";
> at that point I get an error: _csv.Error: line contains NULL byte
>
> How can I skip such a row and carry on, or "decipher" it in some way?

Well, you have several options:

Without disturbing your existing code too much, you could wrap the
input_reader in a generator that skips malformed lines. That would look
something like this:
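
    import csv

    def unfussy_reader(reader):
        while True:
            try:
                yield next(reader)
            except StopIteration:
                # Input exhausted. Since Python 3.7 a StopIteration escaping
                # a generator is turned into RuntimeError, so return instead.
                return
            except csv.Error:
                # log the problem or whatever
                continue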
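
If "log the problem" should mean something concrete, here is a minimal sketch of a logging variant. The name `unfussy_reader_logged` is hypothetical; it uses the standard `logging` module and the reader's `line_num` attribute (the number of source lines read so far) to report where the bad row was. The sample input triggers `csv.Error` with a quoting error under `strict=True` rather than with a NUL byte, so the demo does not depend on how a particular Python version treats NULs.

```python
import csv
import io
import logging

logger = logging.getLogger(__name__)


def unfussy_reader_logged(reader):
    """Yield rows from a csv reader, logging and skipping rows that raise csv.Error."""
    while True:
        try:
            row = next(reader)
        except StopIteration:
            return  # input exhausted: end the generator cleanly
        except csv.Error as exc:
            # line_num counts source lines read so far, i.e. where parsing failed
            logger.warning("skipping line %d: %s", reader.line_num, exc)
            continue
        yield row


# Stand-in for the real file: line 2 has text after a closing quote,
# which strict=True rejects with csv.Error.
data = io.StringIO('x,y\n"a"b,c\nz,w\n')
rows = list(unfussy_reader_logged(csv.reader(data, strict=True)))
print(rows)  # [['x', 'y'], ['z', 'w']]
```

In the original loop this would be `for i, row in enumerate(unfussy_reader_logged(input_reader)):`. With a 20 GB file and many bad rows, counting skips may be more practical than logging each one.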

If you know what to do with the malformed data, you could strip it out
and carry on. Whatever works best for you.
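
For the case in the question, where the culprit is a NUL character, one way to strip it out and carry on is to filter the lines before the csv module sees them. This is a minimal sketch, assuming that simply deleting NULs is acceptable for your data; `strip_nul` and the filename `huge.csv` are hypothetical.

```python
import csv
import io


def strip_nul(lines):
    """Yield each input line with any NUL characters removed."""
    for line in lines:
        yield line.replace("\x00", "")


# Stand-in for open("huge.csv", newline=""): the second line contains a NUL.
data = io.StringIO("a,b\nc\x00,d\ne,f\n")
rows = list(csv.reader(strip_nul(data)))
print(rows)  # [['a', 'b'], ['c', 'd'], ['e', 'f']]
```

Because both the file object and the generator are lazy, the big file is still read one line at a time. Note that only the NULs are removed; any other odd characters in that row pass through into the fields unchanged.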

Alternatively, you could subclass the standard reader and do something
equivalent to the above in its __next__ method.
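
A sketch of the class-based route follows. As far as I know, the object returned by `csv.reader()` is not designed to be subclassed directly, so this hypothetical `UnfussyReader` wraps one instead and does the skipping in its own `__next__`. Unlike the generator version, a `__next__` method can simply let `StopIteration` propagate at the end of input. It also counts skipped rows, as one possible way to "log the problem".

```python
import csv
import io


class UnfussyReader:
    """Iterate over CSV rows, skipping any row that raises csv.Error."""

    def __init__(self, f, *args, **kwargs):
        # Extra arguments (dialect, strict=..., etc.) go straight to csv.reader.
        self._reader = csv.reader(f, *args, **kwargs)
        self.skipped = 0  # rows dropped because of csv.Error

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                # At end of input this raises StopIteration, which ends iteration.
                return next(self._reader)
            except csv.Error:
                self.skipped += 1

    @property
    def line_num(self):
        """Source lines read so far, as reported by the wrapped reader."""
        return self._reader.line_num


# Same malformed sample as before: line 2 fails under strict=True.
data = io.StringIO('x,y\n"a"b,c\nz,w\n')
reader = UnfussyReader(data, strict=True)
rows = []
for i, row in enumerate(reader):
    rows.append(row)  # per-row processing would go here
print(rows, "skipped:", reader.skipped)
```

The wrapper drops into the original loop unchanged, since it is just another iterator of rows.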

TJG
csv read _csv.Error: line contains NULL byte chip9munk@gmail.com - 2014-03-21 06:29 -0700
Re: csv read _csv.Error: line contains NULL byte Tim Golden <mail@timgolden.me.uk> - 2014-03-21 13:39 +0000
Re: csv read _csv.Error: line contains NULL byte chip9munk@gmail.com - 2014-03-21 07:46 -0700
Re: csv read _csv.Error: line contains NULL byte chip9munk@gmail.com - 2014-03-21 07:59 -0700
Re: csv read _csv.Error: line contains NULL byte Tim Golden <mail@timgolden.me.uk> - 2014-03-21 14:59 +0000
Re: csv read _csv.Error: line contains NULL byte Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-21 15:15 +0000