

comp.lang.python, thread #43160

newbie question about confusing exception handling in urllib

Started by: cabbar@gmail.com
First post: 2013-04-09 04:41 -0700
Last post: 2013-04-09 13:11 -0600
Articles: 8, participants: 7



Contents

  newbie question about confusing exception handling in urllib cabbar@gmail.com - 2013-04-09 04:41 -0700
    Re: newbie question about confusing exception handling in urllib Peter Otten <__peter__@web.de> - 2013-04-09 14:19 +0200
    Re: newbie question about confusing exception handling in urllib cabbar@gmail.com - 2013-04-09 06:19 -0700
      Re: newbie question about confusing exception handling in urllib Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-09 15:05 +0000
        Re: newbie question about confusing exception handling in urllib Chris Angelico <rosuav@gmail.com> - 2013-04-10 02:23 +1000
        RE: newbie question about confusing exception handling in urllib "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2013-04-12 21:29 +0000
    Re: newbie question about confusing exception handling in urllib Terry Jan Reedy <tjreedy@udel.edu> - 2013-04-09 10:19 -0400
    Re: newbie question about confusing exception handling in urllib Ian Kelly <ian.g.kelly@gmail.com> - 2013-04-09 13:11 -0600

#43160 — newbie question about confusing exception handling in urllib

From: cabbar@gmail.com
Date: 2013-04-09 04:41 -0700
Subject: newbie question about confusing exception handling in urllib
Message-ID: <1ae3261b-078d-4362-abff-ea4471addd6a@googlegroups.com>
Hi,

I have been using Java/Perl professionally for many years and have been trying to learn Python 3 recently. As my first program, I tried writing a class for a small project, and I am having a really hard time understanding exception handling in urllib and in Python in general...
Basically, what I want to do is very simple: try to fetch something with "urllib.request.urlopen(request)", and:
  - If the request times out or the connection is reset, retry n times
  - If it still fails, return an error
  - If it works, return the content.

But, this simple requirement became a nightmare for me. I am really confused about how I should be checking this because:
  - When the connection times out, I sometimes get URLError with its "reason" field set to socket.timeout, and checking isinstance(exception.reason, socket.timeout) works fine
  - But sometimes I get a socket.timeout exception directly, and it has no "reason" field, so the above check fails.
  - Connection reset is a totally different exception
  - Not to mention, some exceptions have msg / reason / errno fields but some don't, so there is no way of knowing exception details unless you check them one by one. The only common thing I could find was calling __str__()?
  - Since there are too many possible exceptions, you need to catch BaseException (I received URLError, socket.timeout, ConnectionRefusedError, ConnectionResetError, BadStatusLine, and none share a common parent). And catching the top-level exception is not a good thing.

So, I ended up writing the following, but from everything I know, this looks really ugly and wrong???

        try: 
            response = urllib.request.urlopen(request)
            content = response.read()
        except BaseException as ue:
            if (isinstance(ue, socket.timeout) or (hasattr(ue, "reason") and isinstance(ue.reason, socket.timeout)) or isinstance(ue, ConnectionResetError)):
                print("REQUEST TIMED OUT")

or, something like:

        except:
            (a1,a2,a3) = sys.exc_info()
            errorString = a2.__str__()
            if ((errorString.find("Connection reset by peer") >= 0) or (errorString.find("error timed out") >= 0)):
                print("REQUEST TIMED OUT")

Am I missing something here? I mean, is this really how I should be doing it?

Thanks.



#43161

From: Peter Otten <__peter__@web.de>
Date: 2013-04-09 14:19 +0200
Message-ID: <mailman.344.1365509864.3114.python-list@python.org>
In reply to: #43160
cabbar@gmail.com wrote:

> Am I missing something here? I mean, is this really how I should be doing
> it?

Does it help if you reorganize your code a bit? For example:

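import socket
import urllib.request
from urllib.error import URLError

def read_content(request):
    try:
        response = urllib.request.urlopen(request)
        content = response.read()
    except socket.timeout:
        return None  # timeout raised directly
    except URLError as ue:
        if isinstance(ue.reason, socket.timeout):
            return None  # timeout wrapped inside URLError
        raise  # any other URLError is a real failure
    return content

# max_tries and request are assumed to be defined elsewhere.
for i in range(max_tries):
    content = read_content(request)
    if content is not None:
        break
else:
    # The else clause runs only if the loop never hit "break".
    print("Could not download", request)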

Instead of returning an out-of-band response (None), you could also raise a 
custom exception (called MyTimeoutError below). The retry loop would then 
become:

class MyTimeoutError(Exception):
    pass

for i in range(max_tries):
    try:
        content = read_content(request)
    except MyTimeoutError:
        pass
    else:
        break
else:
    print("Could not download", request)
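The two halves of this suggestion can be combined. The sketch below has a read_content() that raises MyTimeoutError for both forms of timeout. It also does this for ConnectionResetError, which the original poster wanted to retry as well. A retry wrapper goes around it.

A few names here are assumptions added for illustration: fetch_with_retries(), the timeout default, and the opener parameter. The opener parameter exists only so the retry logic can be exercised without a network.

Reading the urllib.request source, it appears that errors raised while the request is being sent get wrapped in URLError. Errors raised while waiting for or reading the response seem to propagate unwrapped. That would explain why the same timeout arrives in two shapes, but treat it as an observation, not a documented guarantee.

```python
import socket
import urllib.request
from urllib.error import URLError


class MyTimeoutError(Exception):
    """A fetch failed in a way that is worth retrying."""


# Failures treated as transient; this choice is an assumption, adjust to taste.
TRANSIENT = (socket.timeout, ConnectionResetError)


def read_content(request, timeout=10.0, opener=urllib.request.urlopen):
    """Return the response body, or raise MyTimeoutError on a transient failure."""
    try:
        with opener(request, timeout=timeout) as response:
            return response.read()
    except TRANSIENT as exc:
        # The error surfaced directly (e.g. while reading the response).
        raise MyTimeoutError(exc) from exc
    except URLError as exc:
        # The error was wrapped by urllib; the underlying cause is in .reason.
        if isinstance(exc.reason, TRANSIENT):
            raise MyTimeoutError(exc.reason) from exc
        raise  # HTTP errors, refused connections, unknown hosts, ...


def fetch_with_retries(request, max_tries=3, **kwargs):
    """Try up to max_tries times; re-raise MyTimeoutError if all attempts fail."""
    for attempt in range(1, max_tries + 1):
        try:
            return read_content(request, **kwargs)
        except MyTimeoutError:
            if attempt == max_tries:
                raise
    raise ValueError("max_tries must be at least 1")
```

Here is how it behaves with the real urllib.request.urlopen as the opener, for example fetch_with_retries(request, max_tries=5):

- It returns the body bytes on success.
- It re-raises MyTimeoutError after the last failed attempt.
- It lets every other error (HTTPError, refused connections, genuine bugs) propagate immediately instead of hiding it.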




#43164

From: cabbar@gmail.com
Date: 2013-04-09 06:19 -0700
Message-ID: <c395da30-c593-4aeb-960e-69bf98c31880@googlegroups.com>
In reply to: #43160
Ah, looks better. 

But, 2 questions:

1. I should also catch ConnectionResetError, I am guessing.
2. How do I handle all other exceptions? Just write "except Exception:" and handle them? I want to silently ignore them.

Thanks...




#43181

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-04-09 15:05 +0000
Message-ID: <51642e32$0$30003$c3e8da3$5496439d@news.astraweb.com>
In reply to: #43164
On Tue, 09 Apr 2013 06:19:09 -0700, cabbar wrote:

> How do I
> handle all other exceptions, just say Exception: and handle them? I want
> to silently ignore them.

Please don't. That is normally poor practice, since it simply hides bugs 
in your code.

As a general rule, you should only catch exceptions that you know are 
harmless, and that you can recover from. If you don't know that it's 
harmless, then it's probably a bug, and you should let it raise, so you 
can see the traceback and fix it. In the words of Chris Smith:

    "I find it amusing when novice programmers believe their 
     main job is preventing programs from crashing. ... More 
     experienced programmers realize that correct code is 
     great, code that crashes could use improvement, but 
     incorrect code that doesn’t crash is a horrible nightmare."


One exception to this rule (no pun intended) is that sometimes you want 
to hide the details of unexpected tracebacks from your users. In that 
case, it may be acceptable to wrap your application's main function in a 
try block, catch any unexpected exceptions, log the exception, and then 
quietly exit with a short, non-threatening error message that won't scare 
the civilians:


import sys

try:
    main()
except Exception as err:
    log(err)  # log() stands in for your application's logging function
    print("Sorry, an unexpected error has occurred.")
    print("Please contact support for assistance.")
    sys.exit(-1)


Still want to catch all unexpected errors, and ignore them? If you're 
absolutely sure that this is the right thing to do, then:

try:
    code_goes_here()
except Exception:
    pass

But really, you shouldn't do this.

(For experts only: you *probably* shouldn't do this.)
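Sometimes the goal is to ignore one specific error you have decided is harmless, rather than everything. In that case, naming that error explicitly keeps every other failure visible.

Below is a minimal sketch using contextlib.suppress, which assumes Python 3.4 or later. On older versions, a plain "try: ... except FileNotFoundError: pass" does the same. remove_if_present() is a hypothetical helper:

```python
import contextlib
import os


def remove_if_present(path):
    """Delete path, treating 'already gone' as harmless but nothing else."""
    # Only FileNotFoundError is swallowed; PermissionError, IsADirectoryError
    # and every other error still propagate, so real problems stay visible.
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
```

The design choice is the same as in the advice above: the except clause lists exactly what you know how to recover from, and nothing more.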



-- 
Steven



#43194

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-04-10 02:23 +1000
Message-ID: <mailman.362.1365524611.3114.python-list@python.org>
In reply to: #43181
On Wed, Apr 10, 2013 at 1:05 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> One exception to this rule (no pun intended) is that sometimes you want
> to hide the details of unexpected tracebacks from your users. In that
> case, it may be acceptable to wrap your application's main function in a
> try block, catch any unexpected exceptions, log the exception, and then
> quietly exit with a short, non-threatening error message that won't scare
> the civilians

This is important to some types of security concern, too; for
instance, if I'm running a web server, I probably don't want to leak
details of exceptions and tracebacks to a potential attacker. Same
again: catch the exception, log it, return simple error message;
additionally, you can return that message as an HTTP response rather
than simply bombing the web server. But again, a bare except should
almost always be logging its exceptions.
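For the web-server case, here is a minimal sketch of the "catch, log, return a simple error" pattern written as WSGI middleware (PEP 3333). The catch_all() wrapper and the "myapp" logger name are hypothetical.

One limitation: it only covers exceptions raised while the application is being called. Errors raised later, while the server iterates over the response body, are not caught.

```python
import logging
import sys

logger = logging.getLogger("myapp")  # hypothetical logger name


def catch_all(app):
    """Wrap a WSGI app so unexpected errors are logged, not shown to clients."""

    def wrapper(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            # The full traceback goes to the server log only.
            logger.exception("Unhandled error for %s", environ.get("PATH_INFO", "?"))
            body = b"Sorry, an internal error occurred.\n"
            # Passing exc_info lets the server re-raise if headers were already sent.
            start_response(
                "500 Internal Server Error",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
                sys.exc_info(),
            )
            return [body]

    return wrapper
```

The client sees only the generic message. The exception text, which might contain internal details, ends up in the log.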

True story, though not in Python: After taking over the code of an
ex-coworker, I was trying to fix some crazy problems. Everything I did
seemed to kinda-work, but nothing properly worked. Trying to clean up
the code to comply with "use strict" mode (which will tell you what
language this is, and it isn't Perl) was a matter of blundering about
in the dark. Turned out there was an event handler somewhere that
buried the *entire file full of code* behind a callback that caught
and suppressed everything. Gee, thanks. Web browsers these days are
pretty good at reporting exceptions - we were mainly using Chrome's
inbuilt Firebug-equivalent - but our brilliant coworker saw fit to
hide them all.

Exceptions are a huge boon.

ChrisA



#43486

From: "Prasad, Ramit" <ramit.prasad@jpmorgan.com>
Date: 2013-04-12 21:29 +0000
Message-ID: <mailman.537.1365802247.3114.python-list@python.org>
In reply to: #43181
Steven D'Aprano wrote:
> try:
>     main()
> except Exception as err:
>     log(err)
>     print("Sorry, an unexpected error has occurred.")
>     print("Please contact support for assistance.")
>     sys.exit(-1)
> 
> 

I like the traceback[0] module for logging the last exception raised.
See traceback.format_exc() or traceback.print_exc().

import traceback
trace = traceback.format_exc()  # call this inside an except block
log.error('I was trying to do <action>, but hit an unexpected error.\n{0}'.format(trace))

[0] http://docs.python.org/2/library/traceback.html
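If the standard logging module is already in use, Logger.exception() does the same job in one call. It logs at ERROR level and appends the traceback of the exception currently being handled, so it should only be called from inside an except block.

A minimal sketch follows. The "example" logger name and the risky() function are placeholders:

```python
import logging

log = logging.getLogger("example")  # placeholder logger name


def risky():
    return 1 / 0  # stands in for real work that fails unexpectedly


def run():
    try:
        risky()
    except Exception:
        # Same as log.error(msg, exc_info=True): the message plus the traceback.
        log.exception("I was trying to run risky(), but hit an unexpected error")
```

Once a handler is attached (for example via logging.basicConfig()), calling run() writes the message followed by the full "Traceback (most recent call last)" block.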


~Ramit






#43176

From: Terry Jan Reedy <tjreedy@udel.edu>
Date: 2013-04-09 10:19 -0400
Message-ID: <mailman.351.1365517159.3114.python-list@python.org>
In reply to: #43160
On 4/9/2013 7:41 AM, cabbar@gmail.com wrote:
> Hi,
>
> I have been using Java/Perl professionally for many years and have been trying to learn Python 3 recently. As my first program, I tried writing a class for a small project, and I am having a really hard time understanding exception handling in urllib and in Python in general...
> Basically, what I want to do is very simple,

Very funny ;-). What you are trying to do, as your first project, is 
interact with the large, multi-layered, non-deterministic monster known 
as the Internet, with timeout handling, through multiple layers of library 
code. When it comes to exception handling, this is about the most 
complex thing you can do.

> try to fetch something with "urllib.request.urlopen(request)", and:
>    - If request times out or connection is reset, re-try n times
>    - If it fails, return an error
>    - If it works return the content.
>
> But, this simple requirement became a nightmare for me. I am really confused about how I should be checking this because:
>    - When the connection times out, I sometimes get URLError with its "reason" field set to socket.timeout, and checking isinstance(exception.reason, socket.timeout) works fine
>    - But sometimes I get a socket.timeout exception directly, and it has no "reason" field, so the above check fails.

If you are curious why you get different exceptions for seemingly the same 
problem, you can look at the printed traceback to see where the 
different exceptions come from. Either don't catch the exceptions, 
re-raise them, or explicitly grab the traceback (from exc_info, I believe).
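Here is a small sketch of the last option. It grabs the traceback from sys.exc_info() inside the handler and asks the traceback module for the innermost frame, which is where the exception was actually raised.

where_raised() is a hypothetical helper. Reading attributes such as .name and .lineno from the frame summaries assumes Python 3.5 or later:

```python
import sys
import traceback


def where_raised():
    """Call from inside an except block; describe where the exception originated."""
    exc_type, exc, tb = sys.exc_info()
    frames = traceback.extract_tb(tb)  # outermost frame first, innermost last
    origin = frames[-1]
    return exc_type.__name__, origin.filename, origin.lineno, origin.name
```

You could print where_raised() from an "except Exception:" handler placed around the urlopen() call and the read() call. Comparing the reported function names for the URLError case and the bare socket.timeout case should show which layer of the library produced each one.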

>    - Connection reset is a totally different exception
>    - Not to mention, some exceptions have msg / reason / errno fields but some don't, so there is no way of knowing exception details unless you check them one by one. The only common thing I could find was calling __str__()?

The system is probably a bit more ragged than it might be if completely 
re-designed from scratch.

>    - Since there are too many possible exceptions, you need to catch BaseException (I received URLError, socket.timeout, ConnectionRefusedError, ConnectionResetError, BadStatusLine, and none share a common parent). And catching the top-level exception is not a good thing.

You are right, catching BaseException is bad. In particular, it will 
catch KeyboardInterrupt from a user trying to stop the process. It is 
also unnecessary as all the exceptions you want to catch are derived 
from Exception, which itself is derived from BaseException.
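This is easy to check interactively. Below is a short sketch that tests each exception type the original poster reported. It assumes a current Python 3 (3.6+ for the f-strings). URLError and socket.timeout became OSError subclasses in 3.3.

```python
import http.client
import socket
import urllib.error

# Exception types the original poster reported seeing.
reported = {
    "URLError": urllib.error.URLError,
    "socket.timeout": socket.timeout,
    "ConnectionRefusedError": ConnectionRefusedError,
    "ConnectionResetError": ConnectionResetError,
    "BadStatusLine": http.client.BadStatusLine,
}

for name, exc_type in reported.items():
    print(f"{name:24} Exception={issubclass(exc_type, Exception)} "
          f"OSError={issubclass(exc_type, OSError)}")

# KeyboardInterrupt derives from BaseException only, not from Exception.
print("KeyboardInterrupt is an Exception:", issubclass(KeyboardInterrupt, Exception))
```

On such versions, every entry should report Exception=True. Every entry except BadStatusLine (an http.client.HTTPException) should also report OSError=True, so "except OSError" covers most of the list.

KeyboardInterrupt reports False. That is why "except Exception" leaves Ctrl-C alone.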

> So, I ended up writing the following, but from everything I know, this looks really ugly and wrong???
>
>          try:
>              response = urllib.request.urlopen(request)
>              content = response.read()
>          except BaseException as ue:

except Exception as ue:

>              if (isinstance(ue, socket.timeout) or (hasattr(ue, "reason") and isinstance(ue.reason, socket.timeout)) or isinstance(ue, ConnectionResetError)):
>                  print("REQUEST TIMED OUT")
>
> or, something like:
>
>          except:

except Exception:

>              (a1,a2,a3) = sys.exc_info()
>              errorString = a2.__str__()
>              if ((errorString.find("Connection reset by peer") >= 0) or (errorString.find("error timed out") >= 0)):

--
Terry Jan Reedy



#43216

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2013-04-09 13:11 -0600
Message-ID: <mailman.378.1365534727.3114.python-list@python.org>
In reply to: #43160
On Tue, Apr 9, 2013 at 5:41 AM,  <cabbar@gmail.com> wrote:
>         try:
>             response = urllib.request.urlopen(request)
>             content = response.read()
>         except BaseException as ue:
>             if (isinstance(ue, socket.timeout) or (hasattr(ue, "reason") and isinstance(ue.reason, socket.timeout)) or isinstance(ue, ConnectionResetError)):
>                 print("REQUEST TIMED OUT")

I'm surprised nobody has yet pointed out that you can catch multiple
specific exception types in the except clause rather than needing to
organize them under a catch-all base class.  These two code blocks are
basically equivalent:

try:
    do_stuff()
except BaseException as ue:
    if isinstance(ue, (socket.timeout, ConnectionResetError)):
        handle_it()
    else:
        raise

try:
    do_stuff()
except (socket.timeout, ConnectionResetError) as ue:
    handle_it()
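Here is a small runnable check of that equivalence, with a raise standing in for do_stuff(). Both hypothetical helpers return "handled" for the listed types and let anything else propagate.

One consequence worth knowing is that subclasses match too. On Python 3.5+, for example, http.client.RemoteDisconnected subclasses ConnectionResetError, so it is caught as well.

```python
import socket

# The exception types from the example above, gathered in one tuple.
RETRYABLE = (socket.timeout, ConnectionResetError)


def via_isinstance(exc):
    try:
        raise exc
    except BaseException as ue:
        if isinstance(ue, RETRYABLE):
            return "handled"
        raise  # anything else propagates unchanged


def via_tuple(exc):
    try:
        raise exc
    except RETRYABLE:
        return "handled"
```

The tuple form is shorter. It also never intercepts unrelated exceptions, even briefly, so there is no re-raise to forget.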

Cheers,
Ian


