Re: iterating over a file with two pointers

From Joshua Landau <joshua@landau.ws>
Date Thu, 19 Sep 2013 08:04:51 +0100
Subject Re: iterating over a file with two pointers
To python-list <python-list@python.org>
Newsgroups comp.lang.python
Message-ID <mailman.145.1379574349.18130.python-list@python.org> (permalink)

Although "tee" is almost certainly preferable, because repeating the
I/O costs far more than the small amount of memory "tee" will use, you
do have this option:

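    def iterate_file_lines(file):
        """
        Iterate over lines in a file. Unlike normal
        iteration, this allows seeking.
        """
        while True:
            # Read from the argument, not the global "thefile".
            # readline(), unlike next(), leaves tell() and seek() usable.
            line = file.readline()
            if not line:
                break

            yield line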


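    # is_start() and is_end() are assumed to be defined elsewhere.
    thefile = open("/tmp/thefile")
    thelines = iterate_file_lines(thefile)

    for line in thelines:
        print("Outer:", repr(line))

        if is_start(line):
            # Remember where the outer loop should resume.
            outer_position = thefile.tell()

            for line in thelines:
                print("Inner:", repr(line))

                if is_end(line):
                    break

            # Rewind so the outer loop continues just after the start line.
            thefile.seek(outer_position)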

It's simpler than having two files but probably not faster. "tee" will
almost certainly be a much better choice (unless the subsections can't
fit in memory), and this approach gives up the ability to change the
order of these things.
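
For comparison, here is a rough sketch of what the "tee" version might look like. It is only an illustration: scan_sections, the BEGIN/END markers and the is_start/is_end helpers are made up for the example, and it collects each inner section into a list rather than printing. The one thing to watch is that the iterator passed to itertools.tee must not be used again afterwards, so the outer loop advances with next() instead of a "for" statement.

```python
import io
from itertools import tee


def is_start(line):
    # Hypothetical marker test; substitute your own.
    return line.startswith("BEGIN")


def is_end(line):
    # Hypothetical marker test; substitute your own.
    return line.startswith("END")


def scan_sections(lines):
    """Collect the lines that follow each start marker, up to an end marker."""
    outer = iter(lines)
    sections = []
    while True:
        line = next(outer, None)
        if line is None:
            break
        if is_start(line):
            # Fork the iterator. The old "outer" must not be used after
            # tee(), so rebind the name to one of the two new iterators.
            outer, inner = tee(outer)
            section = []
            for inner_line in inner:
                section.append(inner_line)
                if is_end(inner_line):
                    break
            sections.append(section)
    return sections


sample = io.StringIO("a\nBEGIN\nb\nBEGIN\nc\nEND\nd\nEND\ne\n")
sections = scan_sections(sample)
```

Only the lines the inner loop has run ahead of the outer loop are buffered, which is why the memory cost stays small as long as each subsection is small.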

If you want to change the order to another defined order, you can
think about storing the subsections, but if you want to support
independent iteration you'll need to seek before every "readline",
which is a bit silly.
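
To make the "seek before every readline" idea concrete, here is a minimal sketch. FileCursor is a name invented for this example, not something from the standard library; it assumes a seekable file object and only ever seeks to 0 or to a value previously returned by tell().

```python
import io


class FileCursor:
    """One independent read position over a shared, seekable file object."""

    def __init__(self, file, position=0):
        self.file = file
        self.position = position

    def __iter__(self):
        return self

    def __next__(self):
        # Another cursor may have moved the file, so always seek first.
        self.file.seek(self.position)
        line = self.file.readline()
        if not line:
            raise StopIteration
        # Remember where this cursor should resume next time.
        self.position = self.file.tell()
        return line


shared = io.StringIO("one\ntwo\nthree\n")
first = FileCursor(shared)
second = FileCursor(shared)

first_two = [next(first), next(first)]
# "second" still starts from the top, whatever "first" has done.
second_one = next(second)
```

Every line costs an extra seek and tell, which is the silliness referred to above.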

Basically, read it all into memory like Steven D'Aprano suggested. If
you really don't want to, use "tee". If you can't handle non-constant
memory usage (really? You're reading lines, man) I'd suggest my
method. If you can't handle the inflexibility there, use multiple
files.
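
For completeness, the read-it-all-into-memory option might look something like the sketch below. This is not Steven's code, just one way to do it under the same assumptions as before: scan_in_memory, the BEGIN/END markers and the is_start/is_end helpers are hypothetical.

```python
import io


def is_start(line):
    # Hypothetical marker test; substitute your own.
    return line.startswith("BEGIN")


def is_end(line):
    # Hypothetical marker test; substitute your own.
    return line.startswith("END")


def scan_in_memory(file):
    """Read every line up front, then use list indexes as the two pointers."""
    lines = file.readlines()
    sections = []
    for i, line in enumerate(lines):
        if is_start(line):
            section = []
            # The inner pointer is just another index into the same list.
            for j in range(i + 1, len(lines)):
                section.append(lines[j])
                if is_end(lines[j]):
                    break
            sections.append(section)
    return sections


sample = io.StringIO("a\nBEGIN\nb\nBEGIN\nc\nEND\nd\nEND\ne\n")
in_memory_sections = scan_in_memory(sample)
```

With the lines in a list there is no file position to save or restore, and the two pointers can move in any order.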

There, are those enough choices?
