avoid for loop calling Generator function

Started by	Arshpreet Singh <arsh840@gmail.com>
First post	2016-02-22 03:15 -0800
Last post	2016-02-22 09:11 -0700
Articles	5 — 4 participants

  avoid for loop calling Generator function Arshpreet Singh <arsh840@gmail.com> - 2016-02-22 03:15 -0800
    Re: avoid for loop calling Generator function Peter Otten <__peter__@web.de> - 2016-02-22 14:34 +0100
      Re: avoid for loop calling Generator function Arshpreet Singh <arsh840@gmail.com> - 2016-02-22 07:38 -0800
        Re: avoid for loop calling Generator function Chris Angelico <rosuav@gmail.com> - 2016-02-23 02:46 +1100
        Re: avoid for loop calling Generator function Ian Kelly <ian.g.kelly@gmail.com> - 2016-02-22 09:11 -0700

#103333 — avoid for loop calling Generator function

From	Arshpreet Singh <arsh840@gmail.com>
Date	2016-02-22 03:15 -0800
Subject	avoid for loop calling Generator function
Message-ID	<e5e4a934-4eeb-46ed-892f-cda9e903c1cd@googlegroups.com>

Hi, I am converting a PDF into a text file, using the following code.

from PyPDF2 import PdfFileReader

def read_pdf(pdfFileName):
    pdf = PdfFileReader(pdfFileName)
    yield from (pg.extractText() for pg in pdf.pages)

for i in read_pdf('book.pdf'):
    print(i)

I want to avoid the for loop. I also tried creating another function that calls read_pdf() using yield from, but I think I am missing the real picture here.

#103341

From	Peter Otten <__peter__@web.de>
Date	2016-02-22 14:34 +0100
Message-ID	<mailman.40.1456148110.20994.python-list@python.org>
In reply to	#103333

Arshpreet Singh wrote:

> Hi, I am converting PDF into text file, I am using following code.
> 
> from pypdf2 import PdfFileReader
> 
> def read_pdf(pdfFileName):
> 
>     pdf = PdfFileReader(pdfFileName)
> 
>     yield from (pg.extractText() for pg in pdf.pages)
> 
> for i in read_pdf('book.pdf'):
>      print(i)
> 
> I want to avoid for loop , I also tried to create another function and
> call read_pdf() inside that new function using yield from but I think I am
> missing real picture here

While it is possible to replace the loop with

next(filter(print, read_pdf("book.pdf")), None)

or the slightly less convoluted

sys.stdout.writelines(map("{}\n".format, read_pdf("book.pdf")))

the for loop is the obvious and therefore recommended solution. Personally, 
I would also replace

>     yield from (pg.extractText() for pg in pdf.pages)

with the good old

for pg in pdf.pages:
    yield pg.extractText()

and reserve the generator expression for occasions where it has a 
demonstrable advantage in readability.
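
A minimal sketch of the writelines() variant above, for anyone who wants to try it without a PDF at hand. fake_pages() and write_pages() are hypothetical stand-ins introduced here (they are not in the original post); fake_pages() plays the role of read_pdf(), and an in-memory buffer replaces sys.stdout so the result can be inspected.

```python
import io
import sys


def fake_pages():
    # Hypothetical stand-in for read_pdf(): yields one string per "page".
    yield "page one"
    yield "page two"


def write_pages(pages, out=sys.stdout):
    # writelines() adds no line separators itself, so map() appends a
    # newline to each page before it is written.
    out.writelines(map("{}\n".format, pages))


# Write into an in-memory buffer instead of the terminal.
buf = io.StringIO()
write_pages(fake_pages(), buf)
```

The loop has not disappeared; writelines() simply does the iterating internally.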

#103343

From	Arshpreet Singh <arsh840@gmail.com>
Date	2016-02-22 07:38 -0800
Message-ID	<0c289f93-ce64-4eff-93f3-c70d7ff50817@googlegroups.com>
In reply to	#103341

On Monday, 22 February 2016 19:05:24 UTC+5:30, Peter Otten  wrote:
> Arshpreet Singh wrote:
> 
> > Hi, I am converting PDF into text file, I am using following code.
> > 
> > from pypdf2 import PdfFileReader
> > 
> > def read_pdf(pdfFileName):
> > 
> >     pdf = PdfFileReader(pdfFileName)
> > 
> >     yield from (pg.extractText() for pg in pdf.pages)
> > 
> > for i in read_pdf('book.pdf'):
> >      print(i)
> > 
> > I want to avoid for loop , I also tried to create another function and
> > call read_pdf() inside that new function using yield from but I think I am
> > missing real picture here
> 
> While it is possible to replace the loop with
> 
> next(filter(print, read_pdf("book.pdf")), None)

Why are we using filter here?
 
> or the slightly less convoluted
> 
> sys.stdout.writelines(map("{}\n".format, read_pdf("book.pdf")))

Actually, I am using this function in an Android app built with Kivy, where I write the whole text into a file. Which way do you think would be more efficient?

> the for loop is the obvious and therefore recommended solution. Personally, 
> I would also replace
> 
> >     yield from (pg.extractText() for pg in pdf.pages)
> 
> with the good old
> 
> for pg in pdf.pages:
>     yield pg.extractText()
> 
> and reserve the generator expression for occasions where it has a 
> demonstrable advantage in readability.

But when I call read_pdf() from another function to avoid the for loop, why does it not work?
Say:

def hello():
    yield from read_pdf('book.pdf')

print(hello()) # still prints a memory location instead of the text. If I am not wrong, yield from can be used to avoid the for loop?

#103344

From	Chris Angelico <rosuav@gmail.com>
Date	2016-02-23 02:46 +1100
Message-ID	<mailman.42.1456156002.20994.python-list@python.org>
In reply to	#103343

On Tue, Feb 23, 2016 at 2:38 AM, Arshpreet Singh <arsh840@gmail.com> wrote:
>> next(filter(print, read_pdf("book.pdf")), None)
>
> Why are we using filter here?

It's a beautiful hack. It'll filter according to the "print"
predicate, which always returns None, and will thus filter everything
out. One single call to next() will thus process and print the entire
PDF, and then - since it has a second parameter - return None instead
of raising StopIteration.

Or maybe it's a gross and ugly hack, with the exact same description.
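
To see the mechanics in isolation, here is a small sketch under stated assumptions: fake_pages() is a hypothetical stand-in for read_pdf(), and record() is a hypothetical stand-in for print() that, like print(), has a side effect and returns None.

```python
def fake_pages():
    # Hypothetical stand-in for read_pdf(): yields one string per "page".
    yield "page one"
    yield "page two"


seen = []


def record(item):
    # Like print(), this has a side effect and implicitly returns None,
    # so filter() treats every item as rejected.
    seen.append(item)


# filter() is lazy: nothing happens until next() asks for the first item
# that passes. No item ever passes, so next() drains the whole generator
# and then falls back to the default value, None.
result = next(filter(record, fake_pages()), None)
```

After this runs, seen holds every page and result is None.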

ChrisA

#103346

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2016-02-22 09:11 -0700
Message-ID	<mailman.43.1456157520.20994.python-list@python.org>
In reply to	#103343

On Mon, Feb 22, 2016 at 8:38 AM, Arshpreet Singh <arsh840@gmail.com> wrote:
> On Monday, 22 February 2016 19:05:24 UTC+5:30, Peter Otten  wrote:
>> or the slightly less convoluted
>>
>> sys.stdout.writelines(map("{}\n".format, read_pdf("book.pdf")))
>
> Actually I am using this function in Android App which is being built
> using Kivy, Where I am returning whole text into a file, So what you think
> will be more efficient way?

Profile them and find out, but I don't think you'll find the difference is
great enough to be overly concerned with. Pick the way that is more
readable and doesn't introduce any gross inefficiencies (such as
concatenating strings in a loop).
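
As a rough illustration of avoiding string concatenation in a loop when writing the text to a file, one option is str.join(). This is only a sketch: fake_pages() and save_text() are hypothetical names introduced here, fake_pages() stands in for read_pdf(), and the output path is a temporary file chosen for the example.

```python
import os
import tempfile


def fake_pages():
    # Hypothetical stand-in for read_pdf(): yields one string per "page".
    yield "page one"
    yield "page two"


def save_text(pages, path):
    # str.join() builds the result in one pass, instead of creating a new,
    # longer string on every iteration the way repeated `text += page` can.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(pages))


# Write to a temporary location so the example has no lasting side effects.
path = os.path.join(tempfile.mkdtemp(), "book.txt")
save_text(fake_pages(), path)
```

Whether this is measurably faster than the other variants for a given PDF is something only profiling will show.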

> But when I am calling read_pdf() from another function to avoid for loop
> why it is not working?
> say:
>
> def hello():
>     yield from read_pdf('book.pdf')

This uses yield from, which makes it a generator function.

>
> print(hello()) # still returns memory location instead of text. If I am
> not wrong yield from can be used to avoid for loop?

hello is a generator function, so calling it just creates a generator
object. Printing it then prints out the repr of that generator object,
which is just something like <generator object hello at 0x7f6e82b124c0>.

Notably, you haven't actually *executed* the generator object, which would
require iterating over it, e.g.:

for i in hello():
    print(i)

So you haven't actually avoided creating a for loop; you've just added a
redundant layer between the for loop and the thing it's actually iterating
over.
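
The difference between creating a generator and consuming it can be sketched as follows. fake_pages() is a hypothetical stand-in for read_pdf() so the example runs without PyPDF2; str.join() is used here only as one example of something that iterates on your behalf.

```python
import inspect


def fake_pages():
    # Hypothetical stand-in for read_pdf(): yields one string per "page".
    yield "page one"
    yield "page two"


def hello():
    # Delegating with `yield from` still makes hello() a generator function.
    yield from fake_pages()


gen = hello()

# Calling hello() ran none of its body; it only built a generator object.
is_generator = inspect.isgenerator(gen)

# Something has to iterate to get the values out. Here join() does it.
text = "\n".join(gen)

# A generator can be consumed only once, so nothing is left afterwards.
leftover = list(gen)
```

In every case some consumer iterates over the generator; the only choice is whether that iteration is written as an explicit for loop.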
