Groups > comp.lang.python > #19570 > unrolled thread
| Started by | Shrewd Investor <cltung@gmail.com> |
|---|---|
| First post | 2012-01-28 21:52 -0800 |
| Last post | 2012-01-30 08:22 -0500 |
| Articles | 5 — 4 participants |
Reading Adobe PDF File Shrewd Investor <cltung@gmail.com> - 2012-01-28 21:52 -0800
Re: Reading Adobe PDF File Shrewd Investor <cltung@gmail.com> - 2012-01-28 21:52 -0800
Re: Reading Adobe PDF File Chris Rebert <clp2@rebertia.com> - 2012-01-28 21:59 -0800
Re: Reading Adobe PDF File Matej Cepl <mcepl@redhat.com> - 2012-01-30 09:09 +0100
Re: Reading Adobe PDF File Adam Tauno Williams <awilliam@whitemice.org> - 2012-01-30 08:22 -0500
| From | Shrewd Investor <cltung@gmail.com> |
|---|---|
| Date | 2012-01-28 21:52 -0800 |
| Subject | Reading Adobe PDF File |
| Message-ID | <a54dcb32-1ecd-4186-81a7-3a55c275c9b0@4g2000pbz.googlegroups.com> |
Hi,

I have a very large Adobe PDF file. I was hoping to use a script to extract the information for it. Is there a way to loop through a PDF file using Python?

Or do I need to find a way to convert a PDF file into a text file? If so how?

Thanks!
| From | Shrewd Investor <cltung@gmail.com> |
|---|---|
| Date | 2012-01-28 21:52 -0800 |
| Message-ID | <de3d9d25-7ece-49a7-b583-c2d697bf372c@ra5g2000pbc.googlegroups.com> |
| In reply to | #19570 |
If somebody could point me to a resource (past posting, web page, or book) so I could learn, or post a sample script, that would be greatly appreciated.

On Jan 28, 9:52 pm, Shrewd Investor <clt...@gmail.com> wrote:
> Hi,
>
> I have a very large Adobe PDF file. I was hoping to use a script to
> extract the information for it. Is there a way to loop through a PDF
> file using Python?
>
> Or do I need to find a way to convert a PDF file into a text file? If
> so how?
>
> Thanks!
| From | Chris Rebert <clp2@rebertia.com> |
|---|---|
| Date | 2012-01-28 21:59 -0800 |
| Message-ID | <mailman.5191.1327816772.27778.python-list@python.org> |
| In reply to | #19570 |
On Sat, Jan 28, 2012 at 9:52 PM, Shrewd Investor <cltung@gmail.com> wrote:
> Hi,
>
> I have a very large Adobe PDF file. I was hoping to use a script to
> extract the information for it. Is there a way to loop through a PDF
> file using Python?

Haven't used it myself, but:
http://www.unixuser.org/~euske/python/pdfminer/

> Or do I need to find a way to convert a PDF file into a text file? If
> so how?

The pdf2txt.py script from the same package happens to do exactly this.

Cheers,
Chris
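For readers who would rather call the library from their own script than run pdf2txt.py, here is a minimal sketch. It assumes the maintained fork, pdfminer.six, is installed and uses its `pdfminer.high_level.extract_text` helper; the thread itself only names the original PDFMiner package. The function names `pdf_to_text` and `nonblank_lines` are made up for illustration.

```python
def pdf_to_text(pdf_path):
    """Return all text in the PDF as one string.

    Assumption: pdfminer.six is installed. The import is done lazily so the
    rest of this module still works without it.
    """
    from pdfminer.high_level import extract_text
    return extract_text(pdf_path)


def nonblank_lines(text):
    """Yield each non-empty line of the extracted text, stripped of whitespace."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            yield line


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py some_file.pdf
    for line in nonblank_lines(pdf_to_text(sys.argv[1])):
        print(line)
```

Once the text is a plain string, "looping through the PDF" becomes ordinary line-by-line processing. For a very large file this holds the whole text in memory, which may or may not be acceptable.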
| From | Matej Cepl <mcepl@redhat.com> |
|---|---|
| Date | 2012-01-30 09:09 +0100 |
| Message-ID | <mailman.5210.1327917151.27778.python-list@python.org> |
| In reply to | #19570 |
On 29.1.2012 06:52, Shrewd Investor wrote:
> Or do I need to find a way to convert a PDF file into a text file? If
> so how?

http://en.wikipedia.org/wiki/Pdftotext ?
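The pdftotext tool can also be driven from Python. The following is a rough sketch, assuming the `pdftotext` binary (from Xpdf or Poppler) is on the PATH. It uses the `-f`, `-l`, and `-layout` options and passes `-` as the output file so the text goes to standard output. The helper names are hypothetical.

```python
import subprocess


def build_pdftotext_command(pdf_path, first_page=None, last_page=None, keep_layout=False):
    """Build the argument list for a pdftotext call that writes to stdout."""
    cmd = ["pdftotext"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]   # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]    # last page to convert
    if keep_layout:
        cmd.append("-layout")            # try to preserve the physical layout
    cmd += [pdf_path, "-"]               # "-" sends the text to stdout
    return cmd


def pdftotext(pdf_path, **options):
    """Run pdftotext and return its output as a string.

    Raises subprocess.CalledProcessError if the tool exits with an error,
    and FileNotFoundError if the binary is not installed.
    """
    result = subprocess.run(
        build_pdftotext_command(pdf_path, **options),
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

For a very large file, converting a page range at a time with `first_page` and `last_page` keeps each call small and makes it easier to see which pages cause trouble.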
| From | Adam Tauno Williams <awilliam@whitemice.org> |
|---|---|
| Date | 2012-01-30 08:22 -0500 |
| Message-ID | <mailman.5213.1327929838.27778.python-list@python.org> |
| In reply to | #19570 |
On Sat, 2012-01-28 at 21:59 -0800, Chris Rebert wrote:
> On Sat, Jan 28, 2012 at 9:52 PM, Shrewd Investor <cltung@gmail.com> wrote:
> > I have a very large Adobe PDF file. I was hoping to use a script to
> > extract the information for it. Is there a way to loop through a PDF
> > file using Python?
> Haven't used it myself, but:
> http://www.unixuser.org/~euske/python/pdfminer/

It is very prone to hanging and/or crashing. I haven't yet found a really reliable way to read text from a PDF.

PyPDF provides a PdfFileReader class with an extractText method. The output is indeed the text, although it can be a bit thorny to look at.

> > Or do I need to find a way to convert a PDF file into a text file? If
> > so how?
> The pdf2txt.py script from the same package happens to do exactly this.

--
System & Network Administrator [ LPI & NCLA ]
<http://www.whitemiceconsulting.com>
OpenGroupware Developer <http://www.opengroupware.us>
Adam Tauno Williams
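The page-by-page approach described above might look roughly like this. Note the assumption: this sketch targets the current `pypdf` package, where (as far as I know) the class is named `PdfReader` and the method `extract_text`, rather than the `PdfFileReader` / `extractText` names from the 2012-era library mentioned in the post. `iter_page_text` and `read_pdf` are illustrative names only.

```python
def iter_page_text(reader):
    """Yield (page_number, text) pairs, with page numbers starting at 1.

    `reader` is any object with a `pages` sequence whose items provide an
    `extract_text()` method, such as a pypdf PdfReader.
    """
    for number, page in enumerate(reader.pages, start=1):
        # Fall back to an empty string if a page yields no text.
        yield number, page.extract_text() or ""


def read_pdf(path):
    """Open a PDF and return a list of (page_number, text) pairs.

    Assumption: the pypdf package is installed; it is imported lazily here.
    """
    from pypdf import PdfReader
    return list(iter_page_text(PdfReader(path)))


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py some_file.pdf
    for number, text in read_pdf(sys.argv[1]):
        print("--- page %d: %d characters ---" % (number, len(text)))
```

As the post warns, the extracted text can be messy, so expect to do some cleanup on each page's string before using it.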