| From | Peter Otten <__peter__@web.de> |
|---|---|
| Subject | Re: Possible bug with stability of mimetypes.guess_* function output |
| Date | 2014-02-07 20:40 +0100 |
| Organization | None |
| References | <ld37bb$7ji$1@news.albasani.net> <03a2c4c8-313f-4382-8be9-5163d8bf644c@googlegroups.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.6497.1391802017.18130.python-list@python.org> (permalink) |
Asaf Las wrote:
> On Friday, February 7, 2014 8:06:36 PM UTC+2, Johannes Bauer wrote:
>> Hi group,
>>
>> I'm using Python 3.3.2+ (default, Oct 9 2013, 14:50:09) [GCC 4.8.1] on
>> linux and have found what is very peculiar behavior at best and a bug at
>> worst. It regards the mimetypes module and in particular the
>> guess_all_extensions and guess_extension functions.
>>
>> I've found that these do not return stable output. When running the
>> following commands, it returns one of:
>>
>> $ python3 -c 'import mimetypes;
>> print(mimetypes.guess_all_extensions("text/html"),
>> mimetypes.guess_extension("text/html"))'
>> ['.htm', '.html', '.shtml'] .htm
>>
>> $ python3 -c 'import mimetypes;
>> print(mimetypes.guess_all_extensions("text/html"),
>> mimetypes.guess_extension("text/html"))'
>> ['.html', '.htm', '.shtml'] .html
>>
>> So guess_extension(x) seems to always return guess_all_extensions(x)[0].
>>
>> Curiously, "shtml" is never the first element. The other two are mixed
>> with a probability of around 50% which leads me to believe they're
>> internally managed as a set and are therefore affected by the
>> (relatively new) nondeterministic hashing function initialization.
>>
>>
>> I don't know if stable output is guaranteed for these functions, but it
>> sure would be nice. Messes up a whole bunch of things otherwise :-/
>>
>> Please let me know if this is a bug or expected behavior.
>>
>> Best regards,
>>
>> Johannes
>
> dictionary. same for v3.3.3 as well.
>
> it might be you could try to query using sequence below :
>
> import mimetypes
> mimetypes.init()
> mimetypes.guess_extension("text/html")
>
> i got only 'htm' for 5 consecutive attempts
As Johannes mentioned, this depends on the hash seed:
$ PYTHONHASHSEED=0 python3 -c 'print({".htm", ".html", ".shtml"}.pop())'
.html
$ PYTHONHASHSEED=1 python3 -c 'print({".htm", ".html", ".shtml"}.pop())'
.htm
$ PYTHONHASHSEED=2 python3 -c 'print({".htm", ".html", ".shtml"}.pop())'
.shtml
You never see ".shtml" as the guessed extension because it is not in the
original mimetypes.types_map dict, but is instead read programmatically from a
file like /etc/mime.types and then added to a list of extensions.
Johannes,
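Until the ordering is settled in the library, a caller can sidestep it by choosing from guess_all_extensions() with an explicit rule instead of relying on element [0]. The following is only a sketch: the helper name stable_guess_extension and the "shortest, then alphabetical" tie-break are my own assumptions, not anything the mimetypes module provides or promises.

```python
import mimetypes


def stable_guess_extension(mime_type, strict=True):
    """Return one extension for mime_type, chosen independently of hash order.

    Hypothetical helper, not part of the mimetypes module.
    """
    candidates = mimetypes.guess_all_extensions(mime_type, strict=strict)
    # Arbitrary but deterministic rule: shortest extension first,
    # alphabetical order as the tie-break. Returns None for unknown types.
    return min(candidates, key=lambda ext: (len(ext), ext), default=None)


print(stable_guess_extension("text/html"))
```

The result can still differ between machines, because the candidate list itself depends on which mime.types files are present, but on one machine it no longer depends on the hash seed.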
I'd like the guessed extension to be consistent, too, but even if that is
rejected the current behaviour should be documented.
Please file a bug report.