From: Joshua Landau
Date: Thu, 19 Sep 2013 08:04:51 +0100
Subject: Re: iterating over a file with two pointers
To: python-list
Newsgroups: comp.lang.python

Although "tee" is most certainly preferable, because IO is far slower than the small amounts of memory "tee" will use, you do have this option:

    def iterate_file_lines(file):
        """
        Iterate over lines in a file. Unlike normal
        iteration, this allows seeking.
        """
        # readline() keeps tell()/seek() usable; a plain
        # "for line in file" loop does not.
        while True:
            line = file.readline()
            if not line:
                break
            yield line

    thefile = open("/tmp/thefile")
    thelines = iterate_file_lines(thefile)

    for line in thelines:
        print("Outer:", repr(line))

        if is_start(line):
            # Remember where the outer loop should resume.
            outer_position = thefile.tell()

            for line in thelines:
                print("Inner:", repr(line))

                if is_end(line):
                    break

            # Rewind so the outer loop continues just after the start line.
            thefile.seek(outer_position)

It's simpler than having two files but probably not faster; "tee" will almost definitely be a far better choice (unless the subsections can't fit in memory), and this method forfeits being able to change up the order of these things.
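
For comparison, here is a rough sketch of what the "tee" route might look like. It is only an illustration: collect_sections, is_start and is_end are placeholder names, the "BEGIN"/"END" markers are made up, and an in-memory io.StringIO stands in for the real file.

```python
import io
from itertools import tee


def is_start(line):
    # Hypothetical marker test; substitute whatever starts a subsection.
    return line.startswith("BEGIN")


def is_end(line):
    # Hypothetical marker test; substitute whatever ends a subsection.
    return line.startswith("END")


def collect_sections(lines):
    """Return, for each start line, the lines after it up to and
    including the next end line. The outer scan then resumes just
    after the start line, like the seek-based loop does."""
    outer = iter(lines)
    sections = []
    while True:
        line = next(outer, None)
        if line is None:
            break
        if is_start(line):
            # Split the stream: "inner" runs ahead while "outer" keeps
            # its place. tee buffers only the lines between the two.
            outer, inner = tee(outer)
            section = []
            for inner_line in inner:
                section.append(inner_line)
                if is_end(inner_line):
                    break
            sections.append(section)
    return sections


sample = io.StringIO("a\nBEGIN 1\nx\nBEGIN 2\ny\nEND\nz\n")
print(collect_sections(sample))
```

The while loop with next() is deliberate: once an iterator has been passed to tee it should not be advanced directly any more, so a plain for loop over the original iterator would not be safe here.
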
If you want to change up the order to another defined order, you can think about storing the subsections, but if you want to support independent iteration you'll need to seek before every "readline", which is a bit silly.

Basically, read it all into memory like Steven D'Aprano suggested. If you really don't want to, use "tee". If you can't handle non-constant memory usage (really? You're reading lines, man) I'd suggest my method. If you can't handle the inflexibility there, use multiple files.

There, is that enough choices?
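
In case the "seek before every readline" idea is unclear, a minimal sketch of it might look like the following. LineCursor and fork are made-up names for illustration, and io.StringIO stands in for a real seekable file.

```python
import io


class LineCursor:
    """One independent read position over a shared, seekable file."""

    def __init__(self, file, position=0):
        self.file = file
        self.position = position

    def __iter__(self):
        return self

    def __next__(self):
        # Another cursor may have moved the file, so restore our own
        # position before every read.
        self.file.seek(self.position)
        line = self.file.readline()
        if not line:
            raise StopIteration
        self.position = self.file.tell()
        return line

    def fork(self):
        # A new cursor that starts wherever this one currently is.
        return LineCursor(self.file, self.position)


f = io.StringIO("a\nb\nc\n")
outer = LineCursor(f)
first = next(outer)
inner = outer.fork()
ahead = list(inner)   # runs to the end without disturbing outer
rest = list(outer)    # outer picks up right after its first line
```

Every read costs an extra seek, which is why this only makes sense when memory really has to stay constant.
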