Using try-catch to handle multiple possible file types?

Started by: Victor Hooi <victorhooi@gmail.com>
First post: 2013-11-18 23:13 -0800
Last post: 2013-11-20 01:50 +0000
Articles: 8 (6 participants)


Contents

  Using try-catch to handle multiple possible file types? Victor Hooi <victorhooi@gmail.com> - 2013-11-18 23:13 -0800
    Re: Using try-catch to handle multiple possible file types? Chris Angelico <rosuav@gmail.com> - 2013-11-19 18:22 +1100
    Re: Using try-catch to handle multiple possible file types? Amit Saha <amitsaha.in@gmail.com> - 2013-11-19 17:22 +1000
    Re: Using try-catch to handle multiple possible file types? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-19 09:36 +0000
      Re: Using try-catch to handle multiple possible file types? Victor Hooi <victorhooi@gmail.com> - 2013-11-19 16:30 -0800
        Re: Using try-catch to handle multiple possible file types? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-20 01:56 +0000
          Re: Using try-catch to handle multiple possible file types? Neil Cerutti <mr.cerutti@gmail.com> - 2013-11-20 10:05 -0500
        Re: Using try-catch to handle multiple possible file types? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-20 01:50 +0000

#59959 — Using try-catch to handle multiple possible file types?

From: Victor Hooi <victorhooi@gmail.com>
Date: 2013-11-18 23:13 -0800
Subject: Using try-catch to handle multiple possible file types?
Message-ID: <8379f7c2-c248-4a67-82ed-2d288a1635d2@googlegroups.com>

Hi,

I have a script that needs to handle input files of different types (uncompressed, gzipped etc.).

My question is regarding how I should handle the different cases.

My first thought was to use a try-catch block and attempt to open it using the most common filetype, then if that failed, try the next most common type etc. before finally erroring out.

So basically, using exception handling for flow-control.

However, is that considered bad practice, or un-Pythonic?

What alternative constructs could I use, and what are their pros and cons?

(I was thinking I could also use python-magic which wraps libmagic, or I can just rely on file extensions).

Other thoughts?

Cheers,
Victor


#59961

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-11-19 18:22 +1100
Message-ID: <mailman.2889.1384845774.18130.python-list@python.org>
In reply to: #59959

On Tue, Nov 19, 2013 at 6:13 PM, Victor Hooi <victorhooi@gmail.com> wrote:
> My first thought was to use a try-catch block and attempt to open it using the most common filetype, then if that failed, try the next most common type etc. before finally erroring out.
>
> So basically, using exception handling for flow-control.
>
> However, is that considered bad practice, or un-Pythonic?

It's fairly common to work that way. But you may want to be careful
what order you try them in; some codecs might be technically capable
of reading other formats than you wanted, so start with the most
specific.

Alternatively, looking at a file's magic number (either with
python-magic/libmagic or by manually reading in a few bytes) might be
more efficient. Either way can work, take your choice!
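
A minimal sketch of the "manually reading in a few bytes" option, assuming only gzip and uncompressed input need to be told apart. The helper name open_by_magic is made up for illustration; the two-byte signature 1f 8b is the standard gzip magic number.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def open_by_magic(path):
    """Return a binary file object, picking the opener from the file's leading bytes."""
    with open(path, "rb") as probe:
        head = probe.read(len(GZIP_MAGIC))
    if head == GZIP_MAGIC:
        return gzip.open(path, "rb")
    # Anything else is treated as uncompressed here; add more signatures as needed.
    return open(path, "rb")


if __name__ == "__main__":
    # Demo on a throwaway gzip file so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blah.txt")
        with gzip.open(path, "wb") as out:
            out.write(b"hello\n")
        with open_by_magic(path) as f:
            for line in f:
                print(line)
```

The file name plays no part in the decision, so a gzipped file called blah.txt is still handled correctly.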

ChrisA



#59962

From: Amit Saha <amitsaha.in@gmail.com>
Date: 2013-11-19 17:22 +1000
Message-ID: <mailman.2890.1384846203.18130.python-list@python.org>
In reply to: #59959

On Tue, Nov 19, 2013 at 5:13 PM, Victor Hooi <victorhooi@gmail.com> wrote:
> Hi,
>
> I have a script that needs to handle input files of different types (uncompressed, gzipped etc.).
>
> My question is regarding how I should handle the different cases.
>
> My first thought was to use a try-catch block and attempt to open it using the most common filetype, then if that failed, try the next most common type etc. before finally erroring out.
>
> So basically, using exception handling for flow-control.
>
> However, is that considered bad practice, or un-Pythonic?
>
> What alternative constructs could I use, and what are their pros and cons?
>
> (I was thinking I could also use python-magic which wraps libmagic, or I can just rely on file extensions).
>
> Other thoughts?

How about starting with a dictionary like this:

file_opener = {'.gz': gz_opener,
               '.txt': text_opener,
               '.zip': zip_opener}
               # and so on.

where each *_opener is, say, a function that does the job of actually
opening the file.
The above dictionary is keyed on file extensions, but perhaps you
would be better off using MIME types instead.

Assuming you go ahead with MIME types, how about using
python-magic to detect the type and then looking it up in your
dictionary above to see whether there is a corresponding file_opener
entry? If you get a KeyError, you can raise an exception saying that
you cannot handle this file.
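
A rough sketch of the dictionary dispatch, keyed on file extension and using only the standard library. The opener bodies, the open_any helper and the choice of ValueError are assumptions for illustration. A MIME-keyed version would have the same shape, with the lookup key coming from a type detector instead of os.path.splitext.

```python
import gzip
import os


def text_opener(path):
    return open(path, "rb")


def gz_opener(path):
    return gzip.open(path, "rb")


# Keyed on file extension; the opener names follow the post, their bodies are assumptions.
FILE_OPENERS = {
    ".gz": gz_opener,
    ".txt": text_opener,
}


def open_any(path):
    """Look up an opener by extension and fail loudly for unknown types."""
    ext = os.path.splitext(path)[1].lower()
    try:
        opener = FILE_OPENERS[ext]
    except KeyError:
        raise ValueError("cannot handle file type: %r" % ext)
    return opener(path)
```

Supporting another format then means adding one dictionary entry, not another try/except branch.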


How does that sound?

Best,
Amit.


-- 
http://echorand.me



#59969

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2013-11-19 09:36 +0000
Message-ID: <mailman.2897.1384853817.18130.python-list@python.org>
In reply to: #59959

On 19/11/2013 07:13, Victor Hooi wrote:
>
> So basically, using exception handling for flow-control.
>
> However, is that considered bad practice, or un-Pythonic?
>

If it works for you, use it; practicality beats purity :)

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



#60043

From: Victor Hooi <victorhooi@gmail.com>
Date: 2013-11-19 16:30 -0800
Message-ID: <4c5f2389-177c-4056-8857-19e2950f8aa7@googlegroups.com>
In reply to: #59969

Hi,

Is either approach (try-excepts, or using libmagic) considered more idiomatic? What would you guys prefer yourselves?

Also, is it possible to use either approach with a context manager ("with"), without duplicating lots of code?

For example:

try:
	with gzip.open('blah.txt', 'rb') as f:
		for line in f:
			print(line)
except IOError as e:
	with open('blah.txt', 'rb') as f:
		for line in f:
			print(line)

I'm not sure how to do this without duplicating the processing lines (everything inside the with).

And using:

try:
	f = gzip.open('blah.txt', 'rb')
except IOError as e:
	f = open('blah.txt', 'rb')
finally:
	for line in f:
		print(line)

won't work, since the exception won't get thrown until you actually try to read from the file. Plus, I'm under the impression that I should be using context managers where I can.
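
To make the timing of that exception concrete, here is a small sketch, assuming Python 3 (where IOError is an alias of OSError). The helper names where_gzip_fails and demo are made up for illustration. It reports the stage at which gzip rejects a plain-text file; in the CPython versions this was checked against, gzip.open() itself succeeds and the error only surfaces on the first read.

```python
import gzip
import os
import tempfile


def where_gzip_fails(path):
    """Report whether gzip rejects `path` at 'open', at 'read', or not at all (None)."""
    try:
        f = gzip.open(path, "rb")
    except OSError:  # IOError is an alias of OSError on Python 3
        return "open"
    with f:
        try:
            f.read(1)  # the gzip header is only parsed once data is requested
        except OSError:
            return "read"
    return None


def demo():
    """Run the check against a throwaway plain-text file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blah.txt")
        with open(path, "wb") as out:
            out.write(b"plain text, not gzip\n")
        return where_gzip_fails(path)
```

This is why a try/except wrapped around gzip.open() alone cannot pick the right opener: the read has to happen inside the try block as well.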

Also, on another note, python-magic will return a string as a result, e.g.:

gzip compressed data, was "blah.txt", from Unix, last modified: Wed Nov 20 10:48:35 2013

I suppose it's enough to just do:

    if "gzip compressed data" in results:

or is there a better way?
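
If matching on a human-readable description string feels fragile, one alternative is to branch on a structured result instead. The sketch below swaps in the standard library's mimetypes.guess_type, which is not python-magic: it looks only at the file name and never at the contents, so it is only as reliable as the extension. The helper name looks_gzipped is an assumption.

```python
import mimetypes


def looks_gzipped(filename):
    """True if the *name* implies gzip encoding; the file content is never read."""
    _type, encoding = mimetypes.guess_type(filename)
    return encoding == "gzip"


if __name__ == "__main__":
    for name in ("blah.txt", "blah.txt.gz"):
        print(name, looks_gzipped(name))
```

A gzipped file that is simply named blah.txt will be reported as not gzipped, so this suits inputs whose names can be trusted.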

Cheers,
Victor



#60045

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-11-20 01:56 +0000
Message-ID: <528c16b5$0$29992$c3e8da3$5496439d@news.astraweb.com>
In reply to: #60043

On Tue, 19 Nov 2013 16:30:46 -0800, Victor Hooi wrote:

> Hi,
> 
> Is either approach (try-excepts, or using libmagic) considered more
> idiomatic? What would you guys prefer yourselves?

Specifically in the case of file types, I consider it better to use 
libmagic. But as a general technique, using try...except is a reasonable 
approach in many situations.


> Also, is it possible to use either approach with a context manager
> ("with"), without duplicating lots of code?
> 
> For example:
> 
> try:
> 	with gzip.open('blah.txt', 'rb') as f:
> 		for line in f:
> 			print(line)
> except IOError as e:
> 	with open('blah.txt', 'rb') as f:
> 		for line in f:
> 			print(line)
> 
> I'm not sure how to do this without duplicating the
> processing lines (everything inside the with).

Write a helper function:

def process(opener):
    with opener('blah.txt', 'rb') as f:
        for line in f:
            print(line)


try:
    process(gzip.open)
except IOError:
    process(open)


If you have many different things to try:


for opener in [gzip.open, open, ...]:
    try:
        process(opener)
    except IOError:
        continue
    else:
        break
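
A runnable variant of that loop, sketched under a few assumptions: Python 3 (so OSError is caught where the snippets above catch IOError), gzip tried before the plain opener because plain open() accepts anything, and a made-up helper name read_lines that returns the lines instead of printing them.

```python
import gzip


def read_lines(path, openers=(gzip.open, open)):
    """Return the file's lines using the first opener that can read it."""
    for opener in openers:
        try:
            with opener(path, "rb") as f:
                # Read everything before returning, so a failure partway through
                # one opener never leaves half-processed output behind.
                return f.readlines()
        except OSError:  # also covers gzip's "not a gzipped file" error
            continue
    # Reached only if every opener failed, e.g. the file does not exist.
    raise ValueError("no opener could read %r" % path)
```

Note that OSError also covers a missing file, so a nonexistent path ends in the final ValueError after every opener has been tried.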



[...]
> Also, on another note, python-magic will return a string as a result,
> e.g.:
> 
> gzip compressed data, was "blah.txt", from Unix, last modified: Wed Nov
> 20 10:48:35 2013
> 
> I suppose it's enough to just do:
> 
>     if "gzip compressed data" in results:
> 
> or is there a better way?

*shrug*

Read the docs of python-magic. Do they offer a programmable API? If not, 
that kinda sucks.



-- 
Steven



#60090

From: Neil Cerutti <mr.cerutti@gmail.com>
Date: 2013-11-20 10:05 -0500
Message-ID: <mailman.2967.1384959910.18130.python-list@python.org>
In reply to: #60045

On Tue, Nov 19, 2013 at 8:56 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Write a helper function:
>
> def process(opener):
>     with opener('blah.txt', 'rb') as f:
>         for line in f:
>             print(line)

As another option, you can enter the context manager after you decide.

try:
    f = gzip.open('blah.txt', 'rb')
except IOError:
    f = open('blah.txt', 'rb')
with f:
    # processing
    for line in f:
        print(line)

contextlib.ExitStack was designed to handle cases where entering
context is optional, and so also works for this use case.

with contextlib.ExitStack() as stack:
    try:
        f = gzip.open('blah.txt', 'rb')
    except IOError:
        f = open('blah.txt', 'rb')
    stack.enter_context(f)
    for line in f:
        print(line)
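
One caveat when deciding before entering the context: gzip.open() normally does not validate the file at open time, so the decision needs a read of some kind. The sketch below assumes Python 3 and uses GzipFile.peek() as the probe; the helper names open_maybe_gzip and print_lines are made up for illustration.

```python
import gzip


def open_maybe_gzip(path):
    """Open `path` as gzip if its header checks out, otherwise as a plain binary file."""
    f = gzip.open(path, "rb")
    try:
        f.peek(1)  # forces the gzip header to be parsed without consuming data
    except OSError:
        f.close()
        f = open(path, "rb")
    return f


def print_lines(path):
    # The context manager is entered only after the opener has been chosen.
    with open_maybe_gzip(path) as f:
        for line in f:
            print(line)
```

Because peek() does not advance the read position, the loop in print_lines still sees the file from its first byte.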

-- 
Neil Cerutti



#60047

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2013-11-20 01:50 +0000
Message-ID: <mailman.2948.1384913119.18130.python-list@python.org>
In reply to: #60043

On 20/11/2013 00:30, Victor Hooi wrote:
> Hi,
>
> Is either approach (try-excepts, or using libmagic) considered more idiomatic? What would you guys prefer yourselves?
>
> Also, is it possible to use either approach with a context manager ("with"), without duplicating lots of code?
>
> For example:
>
> try:
> 	with gzip.open('blah.txt', 'rb') as f:
> 		for line in f:
> 			print(line)
> except IOError as e:
> 	with open('blah.txt', 'rb') as f:
> 		for line in f:
> 			print(line)
>
> I'm not sure how to do this without duplicating the processing lines (everything inside the with).

Something like

for filetype in filetypes:
    try:
        process(filetype)
        break
    except IOError:
        pass

??? as it's 01:50 GMT and I can't sleep :(

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence
