Re: getting fileinput to do errors='ignore' or 'replace'?

From Oscar Benjamin <oscar.j.benjamin@gmail.com>
Newsgroups comp.lang.python
Subject Re: getting fileinput to do errors='ignore' or 'replace'?
Date Thu, 3 Dec 2015 22:26:22 +0000
Message-ID <mailman.187.1449181591.14615.python-list@python.org>
References <fn26jcxltl.ln2@news.ducksburg.com> <8336jcxi2m.ln2@news.ducksburg.com> <n3prpd$pt4$1@ger.gmane.org>


On 3 Dec 2015 16:50, "Terry Reedy" <tjreedy@udel.edu> wrote:
>
> On 12/3/2015 10:18 AM, Adam Funk wrote:
>>
>> On 2015-12-03, Adam Funk wrote:
>>
>>> I'm having trouble with some input files that are almost all proper
>>> UTF-8 but with a couple of troublesome characters mixed in, which I'd
>>> like to ignore instead of throwing ValueError.  I've found the
>>> openhook for the encoding
>>>
>>> for line in fileinput.input(options.files,
>>>         openhook=fileinput.hook_encoded("utf-8")):
>>>      do_stuff(line)
>>>
>>> which the documentation describes as "a hook which opens each file
>>> with codecs.open(), using the given encoding to read the file", but
>>> I'd like codecs.open() to also have the errors='ignore' or
>>> errors='replace' effect.  Is it possible to do this?
>>
>>
>> I forgot to mention: this is for Python 2.7.3 & 2.7.10 (on different
>> machines).
>
>
> fileinput is an ancient module that predates iterators (and generators)
> and context managers. Since by 2.7 open files are both context managers and
> line iterators, you can easily write your own multi-file line iteration
> that does exactly what you want.  At minimum:
>
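> import codecs
>
> for file in files:
>     # encoding is required: without it codecs.open() returns a plain
>     # byte-mode file and errors= has no effect
>     with codecs.open(file, encoding='utf-8', errors='ignore') as f:
>         for line in f:
>             do_stuff(line)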

The above is fine but...

> To make this reusable, wrap in 'def filelines(files):' and replace
> 'do_stuff(line)' with 'yield line'.

That doesn't work entirely correctly as you end up yielding from inside a
with statement. If the user of your generator function doesn't fully
consume the generator then whichever file is currently open is not
guaranteed to be closed.
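
One way around this is sketched below. It is only an illustration, not
something from the quoted posts. The helper names filelines and first_line
are made up, and io.open is used here in place of codecs.open because it
accepts encoding= and errors= on both 2.7 and 3. The idea is that the caller
wraps the generator in contextlib.closing(), so that leaving the block calls
the generator's close() method, which raises GeneratorExit at the suspended
yield and lets the inner with statement close the file.

```python
import io
from contextlib import closing


def filelines(paths, encoding="utf-8", errors="replace"):
    """Yield decoded lines from each path in turn (hypothetical helper)."""
    for path in paths:
        # The file stays open while the generator is suspended at the yield.
        with io.open(path, encoding=encoding, errors=errors) as f:
            for line in f:
                yield line


def first_line(paths):
    """Return the first line only, without leaving a file open behind us."""
    # closing() calls the generator's close() on exit. That raises
    # GeneratorExit at the suspended yield, so the with block inside
    # filelines() runs its cleanup and the current file is closed.
    with closing(filelines(paths)) as lines:
        for line in lines:
            return line
    return None
```

Without the closing() wrapper, an abandoned generator is only cleaned up
whenever the interpreter gets around to finalizing it, which is the situation
described above.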

--
Oscar
