
Re: Converting folders of jpegs to single pdf per folder

Started by	Tim Golden <mail@timgolden.me.uk>
First post	2014-01-16 19:41 +0000
Last post	2014-01-16 12:20 -0800
Articles	9 — 3 participants


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: Converting folders of jpegs to single pdf per folder Tim Golden <mail@timgolden.me.uk> - 2014-01-16 19:41 +0000
    Re: Converting folders of jpegs to single pdf per folder vasishtha.spier@gmail.com - 2014-01-16 11:50 -0800
      Re: Converting folders of jpegs to single pdf per folder Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-16 20:01 +0000
      Re: Converting folders of jpegs to single pdf per folder Tim Golden <mail@timgolden.me.uk> - 2014-01-16 20:07 +0000
        Re: Converting folders of jpegs to single pdf per folder vasishtha.spier@gmail.com - 2014-01-16 21:42 -0800
          Re: Converting folders of jpegs to single pdf per folder Tim Golden <mail@timgolden.me.uk> - 2014-01-17 08:53 +0000
          Re: Converting folders of jpegs to single pdf per folder Tim Golden <mail@timgolden.me.uk> - 2014-01-17 09:01 +0000
      Re: Converting folders of jpegs to single pdf per folder Tim Golden <mail@timgolden.me.uk> - 2014-01-16 20:12 +0000
        Re: Converting folders of jpegs to single pdf per folder vasishtha.spier@gmail.com - 2014-01-16 12:20 -0800

#64101 — Re: Converting folders of jpegs to single pdf per folder

From	Tim Golden <mail@timgolden.me.uk>
Date	2014-01-16 19:41 +0000
Subject	Re: Converting folders of jpegs to single pdf per folder
Message-ID	<mailman.5602.1389901256.18130.python-list@python.org>

On 16/01/2014 19:11, Harry Spier wrote:
>
> Dear list members,
>
> I have a directory that contains about a hundred subdirectories named
> J0001,J0002,J0003 . . . etc.
> Each of these subdirectories contains about a hundred JPEGs named
> P001.jpg, P002.jpg, P003.jpg etc.
>
> I need to write a python script that will cycle thru each directory and
> convert ALL JPEGs in each directory into a single PDF file and save
> these PDF files (one per directory) to an output file.
>
> Any pointers on how to do this with a Python script would be
> appreciated. Reading on the internet it appears that using ImageMagick
> wouldn't work because of using too much memory. Can this be done using
> the Python Image Library or some other library? Any sample code would
> also be appreciated.

The usual go-to library for PDF generation is ReportLab. I haven't used 
it for a long while but I'm quite certain it would have no problem 
including images.

Do I take it that it's the PDF-generation side of things you're asking 
about? Or do you need help iterating over hundreds of directories and files?

TJG


#64103

From	vasishtha.spier@gmail.com
Date	2014-01-16 11:50 -0800
Message-ID	<0c648d2d-d8b3-4848-8b37-1e5e1ae40327@googlegroups.com>
In reply to	#64101

On Thursday, January 16, 2014 11:41:04 AM UTC-8, Tim Golden wrote:
> On 16/01/2014 19:11, Harry Spier wrote:
> 
> >
> 
> > Dear list members,
> 
> >
> 
> > I have a directory that contains about a hundred subdirectories named
> 
> > J0001,J0002,J0003 . . . etc.
> 
> > Each of these subdirectories contains about a hundred JPEGs named
> 
> > P001.jpg, P002.jpg, P003.jpg etc.
> 
> >
> 
> > I need to write a python script that will cycle thru each directory and
> 
> > convert ALL JPEGs in each directory into a single PDF file and save
> 
> > these PDF files (one per directory) to an output file.
> 
> >
> 
> > Any pointers on how to do this with a Python script would be
> 
> > appreciated. Reading on the internet it appears that using ImageMagick
> 
> > wouldn't work because of using too much memory. Can this be done using
> 
> > the Python Image Library or some other library? Any sample code would
> 
> > also be appreciated.
> 
> 
> 
> The usual go-to library for PDF generation is ReportLab. I haven't used 
> 
> it for a long while but I'm quite certain it would have no problem 
> 
> including images.
> 
> 
> 
> Do I take it that it's the PDF-generation side of things you're asking 
> 
> about? Or do you need help iterating over hundreds of directories and files?
> 
> 
> 
> TJG

It's mostly the PDF-generating side I need, but I haven't yet used the Python directory and file traversing functions, so an example of this would also be useful, especially one showing how I could capture the directory name and use that as the name of the PDF file I'm creating from the directory contents.

Thanks again,
Harry


#64104

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2014-01-16 20:01 +0000
Message-ID	<mailman.5603.1389902490.18130.python-list@python.org>
In reply to	#64103

On 16/01/2014 19:50, vasishtha.spier@gmail.com wrote:
> On Thursday, January 16, 2014 11:41:04 AM UTC-8, Tim Golden wrote:
>> On 16/01/2014 19:11, Harry Spier wrote:
>>
>>>
>>
>>> Dear list members,
>>
>>>
>>
>>> I have a directory that contains about a hundred subdirectories named
>>
>>> J0001,J0002,J0003 . . . etc.
>>
>>> Each of these subdirectories contains about a hundred JPEGs named
>>
>>> P001.jpg, P002.jpg, P003.jpg etc.
>>
>>>
>>
>>> I need to write a python script that will cycle thru each directory and
>>
>>> convert ALL JPEGs in each directory into a single PDF file and save
>>
>>> these PDF files (one per directory) to an output file.
>>
>>>
>>
>>> Any pointers on how to do this with a Python script would be
>>
>>> appreciated. Reading on the internet it appears that using ImageMagick
>>
>>> wouldn't work because of using too much memory. Can this be done using
>>
>>> the Python Image Library or some other library? Any sample code would
>>
>>> also be appreciated.
>>
>>
>>
>> The usual go-to library for PDF generation is ReportLab. I haven't used
>>
>> it for a long while but I'm quite certain it would have no problem
>>
>> including images.
>>
>>
>>
>> Do I take it that it's the PDF-generation side of things you're asking
>>
>> about? Or do you need help iterating over hundreds of directories and files?
>>
>>
>>
>> TJG
>
> It's mostly the PDF-generating side I need, but I haven't yet used the Python directory and file traversing functions, so an example of this would also be useful, especially one showing how I could capture the directory name and use that as the name of the PDF file I'm creating from the directory contents.
>
> Thanks again,
> Harry
>

I'm sorry that I can't help with your problem, but would you please read 
and action this https://wiki.python.org/moin/GoogleGroupsPython to 
prevent us seeing the double line spacing above, thanks.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence


#64105

From	Tim Golden <mail@timgolden.me.uk>
Date	2014-01-16 20:07 +0000
Message-ID	<mailman.5604.1389902872.18130.python-list@python.org>
In reply to	#64103

On 16/01/2014 19:50, vasishtha.spier@gmail.com wrote:
> On Thursday, January 16, 2014 11:41:04 AM UTC-8, Tim Golden wrote:
>> The usual go-to library for PDF generation is ReportLab. I haven't used
>>
>> it for a long while but I'm quite certain it would have no problem
>>
>> including images.
>>
>>
>>
>> Do I take it that it's the PDF-generation side of things you're asking
>>
>> about? Or do you need help iterating over hundreds of directories and files?
>>
>>
>>
>> TJG
>
> It's mostly the PDF-generating side I need, but I haven't yet used the Python directory and file traversing functions, so an example of this would also be useful, especially one showing how I could capture the directory name and use that as the name of the PDF file I'm creating from the directory contents.
>
> Thanks again,
> Harry
>

Here's a quick example. (And, by the way, please try to avoid the sort 
of double-spacing above, especially if you're coming from Google Groups 
which tends to produce such effects).

This should walk down the Python directory, creating a text file for 
each directory. The textfile will contain the names of all the files in 
the directory. (NB this might create a lot of text files so run it 
inside some temp directory).

<code>
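import os

root = "c:/temp"

# os.walk yields (dirpath, dirnames, filenames) for root and every directory below it
for dirpath, dirnames, filenames in os.walk(root):
    print("Looking at", dirpath)
    # Name the text file after the directory; it is written to the current working directory
    txt_filename = os.path.basename(dirpath) + ".txt"
    with open(txt_filename, "w") as f:
        f.write("\n".join(filenames))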

</code>


TJG
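
For the layout described at the top of the thread (J0001, J0002, ... each holding P001.jpg, P002.jpg, ...), a minimal sketch of the traversal might look like the following. The function name jpegs_by_directory is made up for illustration, and it assumes directory names are unique under the root, since the name is used as the dictionary key. os.walk does not promise any particular order for file names, so they are sorted explicitly here.

```python
import os


def jpegs_by_directory(root):
    """Map each directory name under root to its sorted list of JPEG paths."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place makes os.walk visit J0001, J0002, ... in order.
        dirnames.sort()
        # Match .jpg and .jpeg case-insensitively; sort so P001 precedes P002.
        jpegs = sorted(
            name for name in filenames
            if name.lower().endswith((".jpg", ".jpeg"))
        )
        if jpegs:
            # normpath drops any trailing separator, so basename is never empty.
            key = os.path.basename(os.path.normpath(dirpath))
            result[key] = [os.path.join(dirpath, name) for name in jpegs]
    return result


if __name__ == "__main__":
    # "c:/temp" is the placeholder root from the example above.
    for name, paths in jpegs_by_directory("c:/temp").items():
        print(name, len(paths))
```

Directories without any JPEGs are left out of the result, so the caller only sees directories that would actually produce a PDF.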


#64141

From	vasishtha.spier@gmail.com
Date	2014-01-16 21:42 -0800
Message-ID	<96cfad0d-143f-4084-827f-a67006bf6db2@googlegroups.com>
In reply to	#64105

On Thursday, January 16, 2014 12:07:59 PM UTC-8, Tim Golden wrote:
>
> Here's a quick example. 
> This should walk down the Python directory, creating a text file for 
> each directory. The textfile will contain the names of all the files in 
>  the directory. (NB this might create a lot of text files so run it 
>  inside some temp directory).

> <code>
> import os
> root = "c:/temp"
> for dirpath, dirnames, filenames in os.walk(root):
>      print("Looking at", dirpath)
>      txt_filename = os.path.basename(dirpath) + ".txt"
>      with open(txt_filename, "w") as f:
>        f.write("\n".join(filenames))
> </code>
> TJG

Thanks, Tim. It worked like a charm and saved me weeks of work with a drag-and-drop utility. About 250 PDF files were created, of 50 to 100 pages each. Here's the code in case anyone else can use it.
----------------
import os
from reportlab.pdfgen import canvas
from reportlab.lib.utils import ImageReader

root = "C:\\Users\\Harry\\"

try:
    n = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Name each PDF after its directory; it is written to the current working directory
        PdfOutputFileName = os.path.basename(dirpath) + ".pdf"
        c = canvas.Canvas(PdfOutputFileName)
        # n == 0 is the root directory itself, which is skipped
        if n > 0:
            for filename in filenames:
                LowerCaseFileName = filename.lower()
                if LowerCaseFileName.endswith(".jpg"):
                    print(filename)
                    filepath = os.path.join(dirpath, filename)
                    print(filepath)
                    # Size each page to match its image
                    im = ImageReader(filepath)
                    imagesize = im.getSize()
                    c.setPageSize(imagesize)
                    c.drawImage(filepath, 0, 0)
                    c.showPage()
                    c.save()
        n = n + 1
        print("PDF of Image directory created: " + PdfOutputFileName)

except:
    print("Failed creating PDF")
-------------------------
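
For anyone adapting this, here is a hedged sketch of the PDF-writing step pulled out into a function that saves once, after all pages have been added. It uses the same ReportLab calls as the code above (canvas.Canvas, ImageReader, setPageSize, drawImage, showPage, save) and assumes ReportLab is installed. The names pdf_path_for and images_to_pdf, and the out_dir parameter, are made up for illustration. The naming helper also guards against a directory path that ends in a separator, for which os.path.basename returns an empty string.

```python
import os


def pdf_path_for(dirpath, out_dir):
    """Return out_dir/<directory name>.pdf for the given image directory."""
    # normpath strips a trailing separator, which would otherwise make
    # os.path.basename return an empty string.
    name = os.path.basename(os.path.normpath(dirpath))
    return os.path.join(out_dir, name + ".pdf")


def images_to_pdf(image_paths, pdf_path):
    """Write one PDF page per image, each page sized to its image."""
    # Imported here so the naming helper above works without ReportLab installed.
    from reportlab.pdfgen import canvas
    from reportlab.lib.utils import ImageReader

    c = canvas.Canvas(pdf_path)
    for path in image_paths:
        image = ImageReader(path)
        width, height = image.getSize()
        c.setPageSize((width, height))
        c.drawImage(image, 0, 0, width=width, height=height)
        c.showPage()
    # Save once, after every page has been added.
    c.save()
```

The caller would pass the image paths already sorted into page order, since the function writes pages in the order it receives them.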


#64144

From	Tim Golden <mail@timgolden.me.uk>
Date	2014-01-17 08:53 +0000
Message-ID	<mailman.5625.1389948805.18130.python-list@python.org>
In reply to	#64141

On 17/01/2014 05:42, vasishtha.spier@gmail.com wrote:
> On Thursday, January 16, 2014 12:07:59 PM UTC-8, Tim Golden wrote:
>> 
>> Here's a quick example. This should walk down the Python directory,
>> creating a text file for each directory. The textfile will contain
>> the names of all the files in the directory. (NB this might create
>> a lot of text files so run it inside some temp directory).

[.. snip sample code ...]

> 
> Thanks Tim.  It worked like a charm and saved me weeks of work using
> a drag and drop utility. About 250 pdf files created of 50 to 100
> pages each.  Here's the code in case anyone else can use it. [snip]

Glad it was helpful. And thanks for coming back with the solution:
hopefully future searchers will find it useful. (And it's a great advert
for how easy it is to do useful things in just a few lines of Python).

TJG


#64146

From	Tim Golden <mail@timgolden.me.uk>
Date	2014-01-17 09:01 +0000
Message-ID	<mailman.5626.1389949315.18130.python-list@python.org>
In reply to	#64141

On 17/01/2014 05:42, vasishtha.spier@gmail.com wrote:
> try:
>      n = 0
>      for dirpath, dirnames, filenames in os.walk(root):
>           PdfOutputFileName = os.path.basename(dirpath) + ".pdf" 
>           c = canvas.Canvas(PdfOutputFileName)
>           if n > 0 :
>                for filename in filenames:
>                     LowerCaseFileName = filename.lower()
>                     if LowerCaseFileName.endswith(".jpg"):
>                          print(filename)
>                          filepath    = os.path.join(dirpath, filename)
>                          print(filepath)
>                          im          = ImageReader(filepath)
>                          imagesize   = im.getSize()
>                          c.setPageSize(imagesize)
>                          c.drawImage(filepath,0,0)
>                          c.showPage()
>                          c.save()
>           n = n + 1
>           print "PDF of Image directory created" + PdfOutputFileName
>            
> except:
>      print "Failed creating PDF"
> -------------------------

One thing I would point out (assuming that this is your final code):
your try-except is too broad, both in terms of the code it encloses and
in terms of the exceptions it traps.

As it stands, your code will drop straight out as soon as it hits an
error, with the message "Failed creating PDF" -- which is what it would
have done anyway, only you've removed the informative traceback which
would have told you what went wrong!

In the circumstances, you presumably want to attempt to recover from
some failure (perhaps caused by a corrupt JPEG or a permissions issue)
and continue to generate the remaining PDFs. In that case, you'd do
better with a structure of this sort:

<semi-pseudocode>
import logging
import os

# The default level is WARNING; INFO is needed for the logging.info call to show
logging.basicConfig(level=logging.INFO)

for d, ds, fs in os.walk("..."):

    # init pdf
    try:
        # create PDF
    except:
        logging.exception("Couldn't create PDF for %s", d)
        continue
    else:
        logging.info("Created PDF for %s", d)
        # write PDF

</semi-pseudocode>

If you could narrow down the range of exceptions you want to recover
from, that would go in the "except:" clause, but in this situation you
might not be in a position to do that.

TJG
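
A runnable sketch of that structure, under some assumptions: process_directories is a made-up name, and build_pdf stands in for whatever function actually creates the PDF for one directory. It catches Exception rather than using a bare except, so that Ctrl-C still stops the run; a narrower exception type could replace it once the likely failures are known.

```python
import logging
import os

logging.basicConfig(level=logging.INFO)


def process_directories(root, build_pdf):
    """Call build_pdf(dirpath) for each subdirectory of root, logging failures."""
    created, failed = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        if dirpath == root:
            continue  # skip the root itself, as the n > 0 check did
        try:
            build_pdf(dirpath)
        except Exception:
            # logging.exception records the message together with the traceback.
            logging.exception("Couldn't create PDF for %s", dirpath)
            failed.append(dirpath)
            continue
        else:
            logging.info("Created PDF for %s", dirpath)
            created.append(dirpath)
    return created, failed
```

Returning the two lists lets the caller print a summary at the end, or re-run only the directories that failed.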


#64106

From	Tim Golden <mail@timgolden.me.uk>
Date	2014-01-16 20:12 +0000
Message-ID	<mailman.5605.1389903114.18130.python-list@python.org>
In reply to	#64103

On 16/01/2014 20:07, Tim Golden wrote:
> This should walk down the Python directory,

s/the Python directory/some directory/

(Sorry, I initially had it walking os.path.dirname(sys.executable))

TJG


#64107

From	vasishtha.spier@gmail.com
Date	2014-01-16 12:20 -0800
Message-ID	<0ffbcdd6-05c7-46c7-b5b1-25c4069aa955@googlegroups.com>
In reply to	#64106

On Thursday, January 16, 2014 12:12:01 PM UTC-8, Tim Golden wrote:
> On 16/01/2014 20:07, Tim Golden wrote:
> 
> > This should walk down the Python directory,
> s/the Python directory/some directory/
> (Sorry, I initially had it walking os.path.dirname(sys.executable))
> TJG

Thanks Tim, that's very helpful. Sorry about the double lines. For some reason I wasn't getting the posts directly in my email and was using Google Groups. I've changed my subscription parameters and hopefully I'll get the replies directly.
Cheers,
Harry

