Groups > comp.lang.python > #103333 > unrolled thread
| Started by | Arshpreet Singh <arsh840@gmail.com> |
|---|---|
| First post | 2016-02-22 03:15 -0800 |
| Last post | 2016-02-22 09:11 -0700 |
| Articles | 5 — 4 participants |
avoid for loop calling Generator function Arshpreet Singh <arsh840@gmail.com> - 2016-02-22 03:15 -0800
Re: avoid for loop calling Generator function Peter Otten <__peter__@web.de> - 2016-02-22 14:34 +0100
Re: avoid for loop calling Generator function Arshpreet Singh <arsh840@gmail.com> - 2016-02-22 07:38 -0800
Re: avoid for loop calling Generator function Chris Angelico <rosuav@gmail.com> - 2016-02-23 02:46 +1100
Re: avoid for loop calling Generator function Ian Kelly <ian.g.kelly@gmail.com> - 2016-02-22 09:11 -0700
| From | Arshpreet Singh <arsh840@gmail.com> |
|---|---|
| Date | 2016-02-22 03:15 -0800 |
| Subject | avoid for loop calling Generator function |
| Message-ID | <e5e4a934-4eeb-46ed-892f-cda9e903c1cd@googlegroups.com> |
Hi, I am converting a PDF into a text file, using the following code:

    from PyPDF2 import PdfFileReader

    def read_pdf(pdfFileName):
        pdf = PdfFileReader(pdfFileName)
        yield from (pg.extractText() for pg in pdf.pages)

    for i in read_pdf('book.pdf'):
        print(i)

I want to avoid the for loop. I also tried creating another function that calls read_pdf() using yield from, but I think I am missing the real picture here.
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2016-02-22 14:34 +0100 |
| Message-ID | <mailman.40.1456148110.20994.python-list@python.org> |
| In reply to | #103333 |
Arshpreet Singh wrote:
> Hi, I am converting PDF into text file, I am using following code.
>
> from pypdf2 import PdfFileReader
>
> def read_pdf(pdfFileName):
>
> pdf = PdfFileReader(pdfFileName)
>
> yield from (pg.extractText() for pg in pdf.pages)
>
> for i in read_pdf('book.pdf'):
> print(i)
>
> I want to avoid for loop , I also tried to create another function and
> call read_pdf() inside that new function using yield from but I think I am
> missing real picture here
While it is possible to replace the loop with

    next(filter(print, read_pdf("book.pdf")), None)

or the slightly less convoluted

    sys.stdout.writelines(map("{}\n".format, read_pdf("book.pdf")))

the for loop is the obvious and therefore recommended solution. Personally,
I would also replace

> yield from (pg.extractText() for pg in pdf.pages)

with the good old

    for pg in pdf.pages:
        yield pg.extractText()

and reserve the generator expression for occasions where it has a
demonstrable advantage in readability.
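
As a rough illustration of the two spellings compared above, the sketch below runs both against a hypothetical FakePage class (a stand-in invented here so the example runs without a PDF; it is not part of PyPDF2). Under that assumption, both generator functions yield the same sequence of strings.

```python
class FakePage:
    """Hypothetical stand-in for a PDF page object (not a PyPDF2 class)."""

    def __init__(self, text):
        self._text = text

    def extractText(self):
        return self._text


def read_pages_genexp(pages):
    # Delegate to a generator expression with "yield from".
    yield from (pg.extractText() for pg in pages)


def read_pages_loop(pages):
    # The "good old" explicit loop with a plain yield.
    for pg in pages:
        yield pg.extractText()


pages = [FakePage("page one"), FakePage("page two")]
genexp_result = list(read_pages_genexp(pages))
loop_result = list(read_pages_loop(pages))
print(genexp_result == loop_result)
```

Since the behaviour is the same, the choice between them comes down to readability, which is the point made above.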
| From | Arshpreet Singh <arsh840@gmail.com> |
|---|---|
| Date | 2016-02-22 07:38 -0800 |
| Message-ID | <0c289f93-ce64-4eff-93f3-c70d7ff50817@googlegroups.com> |
| In reply to | #103341 |
On Monday, 22 February 2016 19:05:24 UTC+5:30, Peter Otten wrote:
> Arshpreet Singh wrote:
>
> > Hi, I am converting PDF into text file, I am using following code.
> >
> > from pypdf2 import PdfFileReader
> >
> > def read_pdf(pdfFileName):
> >
> > pdf = PdfFileReader(pdfFileName)
> >
> > yield from (pg.extractText() for pg in pdf.pages)
> >
> > for i in read_pdf('book.pdf'):
> > print(i)
> >
> > I want to avoid for loop , I also tried to create another function and
> > call read_pdf() inside that new function using yield from but I think I am
> > missing real picture here
>
> While it is possible to replace the loop with
>
> next(filter(print, read_pdf("book.pdf")), None)
Why are we using filter here?
> or the slightly less convoluted
>
> sys.stdout.writelines(map("{}\n".format, read_pdf("book.pdf")))
Actually, I am using this function in an Android app built with Kivy, where I write the whole text into a file. Which way do you think would be more efficient?
> the for loop is the obvious and therefore recommended solution. Personally,
> I would also replace
>
> > yield from (pg.extractText() for pg in pdf.pages)
>
> with the good old
>
> for pg in pdf.pages:
> yield pg.extractText()
>
> and reserve the generator expression for occasions where it has a
> demonstrable advantage in readability.
But when I call read_pdf() from another function to avoid the for loop, why does it not work?
Say:

    def hello():
        yield from read_pdf('book.pdf')

    print(hello())  # still prints a memory location instead of the text

If I am not wrong, yield from can be used to avoid the for loop?
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-02-23 02:46 +1100 |
| Message-ID | <mailman.42.1456156002.20994.python-list@python.org> |
| In reply to | #103343 |
On Tue, Feb 23, 2016 at 2:38 AM, Arshpreet Singh <arsh840@gmail.com> wrote:
>> next(filter(print, read_pdf("book.pdf")), None)
>
> Why we are w=using filter here?
It's a beautiful hack. It'll filter according to the "print"
predicate, which always returns None, and will thus filter everything
out. One single call to next() will thus process and print the entire
PDF, and then - since it has a second parameter - return None instead
of raising StopIteration.
Or maybe it's a gross and ugly hack, with the exact same description.
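
To see the mechanism in isolation, here is a minimal sketch that swaps the PDF reader for a hypothetical fake_pages() generator (an assumption made so the example runs without a PDF). Output is captured with contextlib.redirect_stdout only so that it can be inspected afterwards.

```python
import io
from contextlib import redirect_stdout


def fake_pages():
    # Hypothetical stand-in for read_pdf("book.pdf").
    yield "page one"
    yield "page two"


buffer = io.StringIO()
with redirect_stdout(buffer):
    # print() returns None for every item, so filter() rejects them all.
    # next() therefore drains the whole iterator looking for a match,
    # printing each item on the way, and finally returns the default.
    result = next(filter(print, fake_pages()), None)

printed = buffer.getvalue()
```

After this runs, result is None and printed holds both page strings, one per line. The printing is purely a side effect of the predicate, which is what makes it a hack.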
ChrisA
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2016-02-22 09:11 -0700 |
| Message-ID | <mailman.43.1456157520.20994.python-list@python.org> |
| In reply to | #103343 |
On Mon, Feb 22, 2016 at 8:38 AM, Arshpreet Singh <arsh840@gmail.com> wrote:
> On Monday, 22 February 2016 19:05:24 UTC+5:30, Peter Otten wrote:
>> or the slightly less convoluted
>>
>> sys.stdout.writelines(map("{}\n".format, read_pdf("book.pdf")))
>
> Actually I am using this function in Android App which is being built
> using Kivy, Where I am returning whole text into a file, So what you think
> will be more efficient way?
Profile them and find out, but I don't think you'll find the difference is
great enough to be overly concerned with. Pick the way that is more
readable and doesn't introduce any gross inefficiencies (such as
concatenating strings in a loop).
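
As one possible sketch of writing the text to a file without building up a big string with += in a loop, the generator's items can be handed straight to writelines(). The names fake_pages and save_pages are hypothetical, and a temporary directory stands in for wherever the app would actually store the file.

```python
import os
import tempfile


def fake_pages():
    # Hypothetical stand-in for read_pdf("book.pdf").
    yield "page one"
    yield "page two"


def save_pages(pages, path):
    # writelines() consumes the iterable one item at a time and does not
    # add newlines itself, so each page gets one appended here.
    with open(path, "w", encoding="utf-8") as out:
        out.writelines(f"{page}\n" for page in pages)


path = os.path.join(tempfile.mkdtemp(), "book.txt")
save_pages(fake_pages(), path)

with open(path, encoding="utf-8") as f:
    saved = f.read()
```

Whether this is measurably faster than a plain for loop with out.write() is something only profiling can tell.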
> But when I am calling pdf_read() from nother function to avoid for loop
> why it is not working?
> say:
>
> def hello()
> yield from read_pdf('book.pdf')
This uses yield from, which makes it a generator function.
>
> print(hello()) # still returns memory location instead of text. If I am
> not wrong yield from can be used to avoid for loop?
hello is a generator function, so calling it just creates a generator
object. Printing it then prints out the repr of that generator object,
which is just something like <generator object hello at 0x7f6e82b124c0>.
Notably, you haven't actually *executed* the generator object, which would
require iterating over it, e.g.:
    for i in hello():
        print(i)
So you haven't actually avoided creating a for loop; you've just added a
redundant layer between the for loop and the thing it's actually iterating
over.
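
A small sketch of the difference, using a hypothetical fake_pages() generator in place of read_pdf() so that it runs without a PDF. Something has to consume the generator; list() and str.join() do that without an explicit for statement, although they still iterate internally.

```python
def fake_pages():
    # Hypothetical stand-in for read_pdf("book.pdf").
    yield "page one"
    yield "page two"


def hello():
    yield from fake_pages()


gen = hello()
print(gen)  # prints the repr of a generator object, not the text

# Consuming the generator is what actually runs its body.
texts = list(hello())
joined = "\n".join(hello())
print(joined)
```

Note that gen above is never iterated, so the body of hello() never runs for that object.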