
Reading Adobe PDF File

Started by: Shrewd Investor <cltung@gmail.com>
First post: 2012-01-28 21:52 -0800
Last post: 2012-01-30 08:22 -0500
Articles: 5 (4 participants)



Contents

  Reading Adobe PDF File Shrewd Investor <cltung@gmail.com> - 2012-01-28 21:52 -0800
    Re: Reading Adobe PDF File Shrewd Investor <cltung@gmail.com> - 2012-01-28 21:52 -0800
    Re: Reading Adobe PDF File Chris Rebert <clp2@rebertia.com> - 2012-01-28 21:59 -0800
    Re: Reading Adobe PDF File Matej Cepl <mcepl@redhat.com> - 2012-01-30 09:09 +0100
    Re: Reading Adobe PDF File Adam Tauno Williams <awilliam@whitemice.org> - 2012-01-30 08:22 -0500

#19570 — Reading Adobe PDF File

From: Shrewd Investor <cltung@gmail.com>
Date: 2012-01-28 21:52 -0800
Subject: Reading Adobe PDF File
Message-ID: <a54dcb32-1ecd-4186-81a7-3a55c275c9b0@4g2000pbz.googlegroups.com>
Hi,

I have a very large Adobe PDF file.  I was hoping to use a script to
extract the information from it.  Is there a way to loop through a PDF
file using Python?

Or do I need to find a way to convert a PDF file into a text file?  If
so how?

Thanks!



#19571

From: Shrewd Investor <cltung@gmail.com>
Date: 2012-01-28 21:52 -0800
Message-ID: <de3d9d25-7ece-49a7-b583-c2d697bf372c@ra5g2000pbc.googlegroups.com>
In reply to: #19570
If somebody could point me to a resource (a past posting, web page, or
book) that I could learn from, or post a sample script, that would be
greatly appreciated.




#19572

From: Chris Rebert <clp2@rebertia.com>
Date: 2012-01-28 21:59 -0800
Message-ID: <mailman.5191.1327816772.27778.python-list@python.org>
In reply to: #19570
On Sat, Jan 28, 2012 at 9:52 PM, Shrewd Investor <cltung@gmail.com> wrote:
> Hi,
>
> I have a very large Adobe PDF file.  I was hoping to use a script to
> extract the information for it.  Is there a way to loop through a PDF
> file using Python?

Haven't used it myself, but:
http://www.unixuser.org/~euske/python/pdfminer/

> Or do I need to find a way to convert a PDF file into a text file?  If
> so how?

The pdf2txt.py script from the same package happens to do exactly this.
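
For doing the same from Python rather than the command line, here is a rough sketch. It assumes the maintained fork pdfminer.six (not the 2012 release linked above), whose `pdfminer.high_level.extract_pages` yields one page layout at a time, so a large file does not have to be held in memory at once. The helper names `text_of` and `iter_page_text` are made up for illustration.

```python
def text_of(layout):
    """Join the text of every element on a page that exposes get_text()."""
    parts = []
    for element in layout:
        get_text = getattr(element, "get_text", None)
        if callable(get_text):
            parts.append(get_text())
    return "".join(parts)


def iter_page_text(pdf_path):
    """Yield (page_number, text) pairs, one page at a time."""
    # Assumption: pdfminer.six is installed. The import lives here so that
    # text_of() above can be used and tested without it.
    from pdfminer.high_level import extract_pages

    for number, layout in enumerate(extract_pages(pdf_path), start=1):
        yield number, text_of(layout)
```

Looping over `iter_page_text("big.pdf")` (a placeholder file name) and writing each page's text to an open file would give roughly what pdf2txt.py produces, though layout details may differ.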

Cheers,
Chris



#19597

From: Matej Cepl <mcepl@redhat.com>
Date: 2012-01-30 09:09 +0100
Message-ID: <mailman.5210.1327917151.27778.python-list@python.org>
In reply to: #19570
On 29.1.2012 06:52, Shrewd Investor wrote:
> Or do I need to find a way to convert a PDF file into a text file?  If
> so how?

http://en.wikipedia.org/wiki/Pdftotext ?
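
If the pdftotext command-line tool is installed, one way to drive it from a script is through `subprocess`. The sketch below is only an illustration: the function names and the default timeout are invented here, while `-f`, `-l`, `-layout`, `-enc` and `-` (write to standard output) are standard pdftotext options.

```python
import subprocess


def pdftotext_command(pdf_path, txt_path="-", first_page=None,
                      last_page=None, layout=False):
    """Build the pdftotext argument list; '-' sends the text to stdout."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]   # first page to convert (1-based)
    if last_page is not None:
        cmd += ["-l", str(last_page)]    # last page to convert
    if layout:
        cmd.append("-layout")            # try to keep the physical layout
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def pdf_to_text(pdf_path, timeout=120, **options):
    """Run pdftotext and return the extracted text as a string."""
    # Assumption: the pdftotext binary is on PATH. The timeout guards
    # against a conversion that never finishes on a very large file.
    result = subprocess.run(
        pdftotext_command(pdf_path, "-", **options),
        capture_output=True, check=True, timeout=timeout,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

For a very large file, calling `pdf_to_text("big.pdf", first_page=1, last_page=50)` and then moving on to the next range keeps each run small; "big.pdf" is a placeholder name.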



#19599

From: Adam Tauno Williams <awilliam@whitemice.org>
Date: 2012-01-30 08:22 -0500
Message-ID: <mailman.5213.1327929838.27778.python-list@python.org>
In reply to: #19570
On Sat, 2012-01-28 at 21:59 -0800, Chris Rebert wrote:
> On Sat, Jan 28, 2012 at 9:52 PM, Shrewd Investor <cltung@gmail.com> wrote:
> > I have a very large Adobe PDF file.  I was hoping to use a script to
> > extract the information for it.  Is there a way to loop through a PDF
> > file using Python?
> Haven't used it myself, but:
> http://www.unixuser.org/~euske/python/pdfminer/

It is very prone to hanging and/or crashing.  I haven't yet found a
really reliable way to read text from a PDF.

PyPDF provides a PdfFileReader class with an extractText method.  The
output is indeed the text although it can be a bit thorny to look at.
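
A page-by-page loop along these lines might look like the sketch below. It is written against the current `pypdf` package, where the class is `PdfReader`, the pages are in `reader.pages`, and the method is `page.extract_text()`; the PyPDF of the time spelled these `PdfFileReader`, `getPage(i)` and `extractText()`. The helper names `iter_pages` and `open_reader` are invented for this example.

```python
def iter_pages(reader):
    """Yield (page_number, text) for any reader exposing .pages,
    where each page has an extract_text() method."""
    for number, page in enumerate(reader.pages, start=1):
        # A page with no extractable text may give back nothing useful;
        # normalise that to an empty string.
        yield number, page.extract_text() or ""


def open_reader(pdf_path):
    """Open a PDF with pypdf (assumption: pypdf is installed)."""
    from pypdf import PdfReader
    return PdfReader(pdf_path)
```

Usage would be something like `for number, text in iter_pages(open_reader("big.pdf")): ...`, with "big.pdf" standing in for the real file. The text still comes out in whatever order the PDF stores it, so expect to clean it up.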

> > Or do I need to find a way to convert a PDF file into a text file?  If
> > so how?
> The pdf2txt.py script from the same package happens to do exactly this.


-- 
System & Network Administrator [ LPI & NCLA ]
<http://www.whitemiceconsulting.com>
OpenGroupware Developer <http://www.opengroupware.us>
Adam Tauno Williams


