Re: Possible bug with stability of mimetypes.guess_* function output

From Peter Otten <__peter__@web.de>
Subject Re: Possible bug with stability of mimetypes.guess_* function output
Date 2014-02-07 20:40 +0100
Organization None
References <ld37bb$7ji$1@news.albasani.net> <03a2c4c8-313f-4382-8be9-5163d8bf644c@googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.6497.1391802017.18130.python-list@python.org> (permalink)


Asaf Las wrote:

> On Friday, February 7, 2014 8:06:36 PM UTC+2, Johannes Bauer wrote:
>> Hi group,
>> 
>> I'm using Python 3.3.2+ (default, Oct  9 2013, 14:50:09) [GCC 4.8.1] on
>> linux and have found what is very peculiar behavior at best and a bug at
>> worst. It regards the mimetypes module and in particular the
>> guess_all_extensions and guess_extension functions.
>> 
>> I've found that these do not return stable output. When running the
>> following commands, it returns one of:
>> 
>> $ python3 -c 'import mimetypes;
>> print(mimetypes.guess_all_extensions("text/html"),
>> mimetypes.guess_extension("text/html"))'
>> ['.htm', '.html', '.shtml'] .htm
>> 
>> $ python3 -c 'import mimetypes;
>> print(mimetypes.guess_all_extensions("text/html"),
>> mimetypes.guess_extension("text/html"))'
>> ['.html', '.htm', '.shtml'] .html
>> 
>> So guess_extension(x) seems to always return guess_all_extensions(x)[0].
>> 
>> Curiously, "shtml" is never the first element. The other two are mixed
>> with a probability of around 50% which leads me to believe they're
>> internally managed as a set and are therefore affected by the
>> (relatively new) nondeterministic hashing function initialization.
>> 
>> 
>> I don't know if stable output is guaranteed for these functions, but it
>> sure would be nice. Messes up a whole bunch of things otherwise :-/
>> 
>> Please let me know if this is a bug or expected behavior.
>> 
>> Best regards,
>> 
>> Johannes
> 
> dictionary. same for v3.3.3 as well.
> 
> it might be you could try to query using sequence below :
> 
> import mimetypes
> mimetypes.init()
> mimetypes.guess_extension("text/html")
> 
> i got only 'htm' for 5 consecutive attempts

As Johannes mentioned, this depends on the hash seed:

$ PYTHONHASHSEED=0 python3 -c 'print({".htm", ".html", ".shtml"}.pop())'
.html
$ PYTHONHASHSEED=1 python3 -c 'print({".htm", ".html", ".shtml"}.pop())'
.htm
$ PYTHONHASHSEED=2 python3 -c 'print({".htm", ".html", ".shtml"}.pop())'
.shtml
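
The same experiment can be scripted from Python instead of the shell. This is
only a sketch: the helper name first_popped is made up for illustration, and
it assumes Python 3.7 or later for subprocess.run(capture_output=...). Which
element comes out for a given seed may differ between Python versions and
builds, so no expected output is shown.

```python
import os
import subprocess
import sys

EXTENSIONS = {".htm", ".html", ".shtml"}


def first_popped(seed):
    """Start a fresh interpreter with PYTHONHASHSEED=seed and return
    the element that set.pop() yields for the three extensions."""
    # Hypothetical helper, not part of the mimetypes module.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    code = 'print({".htm", ".html", ".shtml"}.pop())'
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The seed must be set before the interpreter starts, hence the
    # child process; changing os.environ in-process has no effect.
    for seed in range(5):
        print(seed, first_popped(seed))
```

For a fixed seed the result is repeatable; only the choice of seed changes
which element comes first.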

You never see ".shtml" as the guessed extension because it is not in the
original mimetypes.types_map dict; it is read programmatically from a file
such as /etc/mime.types and then appended to the list of extensions.

Johannes,
I'd like the guessed extension to be consistent, too, but even if that is
rejected, the current behaviour should be documented.

Please file a bug report.
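
Until that is settled, one possible workaround is to pick the extension
yourself from guess_all_extensions() with a rule that does not depend on the
order. A minimal sketch, assuming any stable choice is acceptable (the name
stable_guess_extension is made up, and the alphabetically smallest candidate
is not necessarily the "preferred" extension):

```python
import mimetypes


def stable_guess_extension(mime_type, strict=True):
    """Return the alphabetically smallest known extension for
    mime_type, or None if the type is unknown."""
    # Hypothetical wrapper; guess_all_extensions() is the real API.
    candidates = mimetypes.guess_all_extensions(mime_type, strict=strict)
    # min() ignores the order of the list, so the hash seed no
    # longer influences the result.
    return min(candidates) if candidates else None


if __name__ == "__main__":
    print(stable_guess_extension("text/html"))
```

The result can still differ between machines whose mime.types files list
different extensions, but on one machine it no longer varies from run to run.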
