Date: Sat, 28 Jan 2012 21:59:29 -0800
Subject: Re: Reading Adobe PDF File
From: Chris Rebert <clp2@rebertia.com>
To: Shrewd Investor <cltung@gmail.com>
Newsgroups: comp.lang.python

On Sat, Jan 28, 2012 at 9:52 PM, Shrewd Investor <cltung@gmail.com> wrote:
> Hi,
>
> I have a very large Adobe PDF file.  I was hoping to use a script to
> extract the information for it.  Is there a way to loop through a PDF
> file using Python?

Haven't used it myself, but:
http://www.unixuser.org/~euske/python/pdfminer/
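
A rough sketch of what looping over the pages might look like. It
assumes the maintained pdfminer.six fork (pip install pdfminer.six)
rather than the original package linked above, whose API may differ.
The function name iter_page_text is made up for illustration.

```python
import sys


def iter_page_text(pdf_path):
    """Yield (page_number, text) for each page of the PDF at pdf_path."""
    # Assumption: pdfminer.six is installed. Imported lazily so that
    # merely loading this module does not require the library.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    # extract_pages yields one layout object per page, so a very large
    # file is processed page by page rather than all at once.
    for page_number, layout in enumerate(extract_pages(pdf_path), start=1):
        # Keep only the layout elements that carry text.
        chunks = [
            element.get_text()
            for element in layout
            if isinstance(element, LTTextContainer)
        ]
        yield page_number, "".join(chunks)


if __name__ == "__main__":
    # Usage: python loop_pdf.py some_file.pdf
    for number, text in iter_page_text(sys.argv[1]):
        print("--- page %d: %d characters ---" % (number, len(text)))
```

Since it is a generator, you can stop early or filter pages without
reading the whole file first.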

> Or do I need to find a way to convert a PDF file into a text file? =C2=A0=
If
> so how?

The pdf2txt.py script from the same package happens to do exactly this.
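
If you would rather do the conversion from your own script, something
along these lines should be close. Again this assumes pdfminer.six,
whose high-level extract_text helper returns the whole document as one
string; pdf_to_text_file is just a name I picked.

```python
import sys
from pathlib import Path


def pdf_to_text_file(pdf_path, txt_path):
    """Write the text of pdf_path to txt_path; return the character count."""
    # Assumption: pdfminer.six is installed. Imported lazily so that
    # merely loading this module does not require the library.
    from pdfminer.high_level import extract_text

    # extract_text reads the whole document into memory as a single
    # string, which may be heavy for a very large PDF.
    text = extract_text(pdf_path)
    Path(txt_path).write_text(text, encoding="utf-8")
    return len(text)


if __name__ == "__main__":
    # Usage: python pdf_to_txt.py input.pdf output.txt
    count = pdf_to_text_file(sys.argv[1], sys.argv[2])
    print("wrote %d characters" % count)
```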

Cheers,
Chris