| Started by | Johannes Bauer <dfnsonfsduifb@gmx.de> |
|---|---|
| First post | 2014-02-07 19:06 +0100 |
| Last post | 2014-02-08 02:59 -0800 |
| Articles | 11 — 4 participants |
Possible bug with stability of mimetypes.guess_* function output Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-02-07 19:06 +0100
Re: Possible bug with stability of mimetypes.guess_* function output Asaf Las <roegltd@gmail.com> - 2014-02-07 11:09 -0800
Re: Possible bug with stability of mimetypes.guess_* function output Asaf Las <roegltd@gmail.com> - 2014-02-07 11:17 -0800
Re: Possible bug with stability of mimetypes.guess_* function output Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-02-07 19:28 +0000
Re: Possible bug with stability of mimetypes.guess_* function output Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-02-07 20:39 +0100
Re: Possible bug with stability of mimetypes.guess_* function output Peter Otten <__peter__@web.de> - 2014-02-07 20:40 +0100
Re: Possible bug with stability of mimetypes.guess_* function output Asaf Las <roegltd@gmail.com> - 2014-02-07 12:25 -0800
Re: Possible bug with stability of mimetypes.guess_* function output Peter Otten <__peter__@web.de> - 2014-02-08 08:51 +0100
Re: Possible bug with stability of mimetypes.guess_* function output Asaf Las <roegltd@gmail.com> - 2014-02-08 00:24 -0800
Re: Possible bug with stability of mimetypes.guess_* function output Peter Otten <__peter__@web.de> - 2014-02-08 09:39 +0100
Re: Possible bug with stability of mimetypes.guess_* function output Asaf Las <roegltd@gmail.com> - 2014-02-08 02:59 -0800
| From | Johannes Bauer <dfnsonfsduifb@gmx.de> |
|---|---|
| Date | 2014-02-07 19:06 +0100 |
| Subject | Possible bug with stability of mimetypes.guess_* function output |
| Message-ID | <ld37bb$7ji$1@news.albasani.net> |
Hi group,
I'm using Python 3.3.2+ (default, Oct 9 2013, 14:50:09) [GCC 4.8.1] on
linux and have found what is very peculiar behavior at best and a bug at
worst. It regards the mimetypes module and in particular the
guess_all_extensions and guess_extension functions.
I've found that these do not return stable output. When running the
following commands, it returns one of:
$ python3 -c 'import mimetypes;
print(mimetypes.guess_all_extensions("text/html"),
mimetypes.guess_extension("text/html"))'
['.htm', '.html', '.shtml'] .htm
$ python3 -c 'import mimetypes;
print(mimetypes.guess_all_extensions("text/html"),
mimetypes.guess_extension("text/html"))'
['.html', '.htm', '.shtml'] .html
So guess_extension(x) seems to always return guess_all_extensions(x)[0].
Curiously, "shtml" is never the first element. The other two are mixed
with a probability of around 50% which leads me to believe they're
internally managed as a set and are therefore affected by the
(relatively new) nondeterministic hashing function initialization.
I don't know if stable output is guaranteed for these functions, but it
sure would be nice. Messes up a whole bunch of things otherwise :-/
Please let me know if this is a bug or expected behavior.
Best regards,
Johannes
--
>> Where EXACTLY did you predict the quake again?
> At least not publicly!
Ah, the latest and to this day most brilliant stunt by our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>
| From | Asaf Las <roegltd@gmail.com> |
|---|---|
| Date | 2014-02-07 11:09 -0800 |
| Message-ID | <03a2c4c8-313f-4382-8be9-5163d8bf644c@googlegroups.com> |
| In reply to | #65603 |
On Friday, February 7, 2014 8:06:36 PM UTC+2, Johannes Bauer wrote:
> Curiously, "shtml" is never the first element. The other two are mixed
> with a probability of around 50% which leads me to believe they're
> internally managed as a set and are therefore affected by the
> (relatively new) nondeterministic hashing function initialization.
Dictionary. Same for v3.3.3 as well.
You could try querying with the sequence below:
import mimetypes
mimetypes.init()
mimetypes.guess_extension("text/html")
I got only 'htm' for 5 consecutive attempts.
/Asaf
| From | Asaf Las <roegltd@gmail.com> |
|---|---|
| Date | 2014-02-07 11:17 -0800 |
| Message-ID | <01c8e74a-5451-40a8-958a-c58c86a9f77f@googlegroups.com> |
| In reply to | #65604 |
Btw, I saw this after my own post: the example usage calls mimetypes.init() before calling the module functions.
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-02-07 19:28 +0000 |
| Message-ID | <mailman.6496.1391801309.18130.python-list@python.org> |
| In reply to | #65605 |
On 07/02/2014 19:17, Asaf Las wrote:
> btw, had seen this after own post -
> example usage includes mimetypes.init()
> before call to module functions.
>
From http://docs.python.org/3/library/mimetypes.html#module-mimetypes, third paragraph:
"The functions described below provide the primary interface for this module. If the module has not been initialized, they will call init() if they rely on the information init() sets up."
Draw your own conclusions :)
--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.
Mark Lawrence
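The documented behaviour quoted above can be checked from a fresh interpreter. This is only a minimal sketch, assuming CPython and the documented module flag `mimetypes.inited`; the helper name `init_flags` and the probe script are made up for illustration. If the docs hold, the flag should be unset right after import and set after the first guess, without any explicit init() call.

```python
import subprocess
import sys

# Executed in a child process so the flag starts from its import-time state.
PROBE = (
    "import mimetypes\n"
    "before = mimetypes.inited\n"
    "mimetypes.guess_extension('text/html')  # no explicit init() call\n"
    "print(before, mimetypes.inited)\n"
)


def init_flags():
    """Return the flag before and after the first guess, as printed strings."""
    out = subprocess.run(
        [sys.executable, "-c", PROBE],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    before, after = out.split()
    return before, after
```

Calling `init_flags()` returns a pair of strings, so the two states can be compared directly.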
| From | Johannes Bauer <dfnsonfsduifb@gmx.de> |
|---|---|
| Date | 2014-02-07 20:39 +0100 |
| Message-ID | <ld3cpd$i42$1@news.albasani.net> |
| In reply to | #65604 |
On 07.02.2014 20:09, Asaf Las wrote:
> it might be you could try to query using sequence below :
>
> import mimetypes
> mimetypes.init()
> mimetypes.guess_extension("text/html")
>
> i got only 'htm' for 5 consecutive attempts
Doesn't change anything. With this:
#!/usr/bin/python3
import mimetypes
mimetypes.init()
print(mimetypes.guess_extension("application/msword"))
And a call like this:
$ for i in `seq 100`; do ./x.py ; done | sort | uniq -c
I get
35 .doc
24 .dot
41 .wiz
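The shell loop above could also be written in Python. A rough sketch, assuming each run should start a fresh interpreter so that a new hash seed is drawn; the names `SCRIPT` and `tally` are hypothetical. Note that if PYTHONHASHSEED is set in the environment, every run uses the same seed, and newer Python versions may return a single stable value anyway.

```python
import subprocess
import sys
from collections import Counter

# Same query as x.py, passed inline to a fresh interpreter on every run.
SCRIPT = (
    "import mimetypes; "
    "print(mimetypes.guess_extension('application/msword'))"
)


def tally(runs=100):
    """Count which extension each of `runs` fresh interpreters reports."""
    counts = Counter()
    for _ in range(runs):
        out = subprocess.run(
            [sys.executable, "-c", SCRIPT],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
        counts[out] += 1
    return counts
```

`tally(100).most_common()` gives the same kind of summary as `sort | uniq -c`.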
Regards,
Johannes
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-02-07 20:40 +0100 |
| Message-ID | <mailman.6497.1391802017.18130.python-list@python.org> |
| In reply to | #65604 |
Asaf Las wrote:
> dictionary. same for v3.3.3 as well.
>
> it might be you could try to query using sequence below :
>
> import mimetypes
> mimetypes.init()
> mimetypes.guess_extension("text/html")
>
> i got only 'htm' for 5 consecutive attempts
As Johannes mentioned, this depends on the hash seed:
$ PYTHONHASHSEED=0 python3 -c 'print({".htm", ".html", ".shtml"}.pop())'
.html
$ PYTHONHASHSEED=1 python3 -c 'print({".htm", ".html", ".shtml"}.pop())'
.htm
$ PYTHONHASHSEED=2 python3 -c 'print({".htm", ".html", ".shtml"}.pop())'
.shtml
You never see ".shtml" as the guessed extension because it is not in the
original mimetypes.types_map dict, but instead programmatically read from a
file like /etc/mime.types and then added to a list of extensions.
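The same experiment can be scripted to survey several seeds at once. A minimal sketch, assuming a fresh interpreter per seed; `first_element` and `survey` are illustrative names, and `next(iter(...))` stands in for `.pop()` as a way to see which element the set yields first. Which extension a given seed produces may differ between Python versions, so no particular output is claimed here; the only firm expectation is that a fixed seed gives the same answer every time.

```python
import os
import subprocess
import sys

# Prints whichever element the set happens to yield first.
SNIPPET = 'print(next(iter({".htm", ".html", ".shtml"})))'


def first_element(seed):
    """Run SNIPPET in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def survey(seeds):
    """Map each seed to the element that came out first under that seed."""
    return {seed: first_element(seed) for seed in seeds}
```

For example, `survey(range(10))` shows how the first element moves around as the seed changes.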
Johannes,
I'd like the guessed extension to be consistent, too, but even if that is
rejected the current behaviour should be documented.
Please file a bug report.
| From | Asaf Las <roegltd@gmail.com> |
|---|---|
| Date | 2014-02-07 12:25 -0800 |
| Message-ID | <c26b109c-0247-4c99-80c3-dccfa3d7ab06@googlegroups.com> |
| In reply to | #65608 |
On Friday, February 7, 2014 9:40:06 PM UTC+2, Peter Otten wrote:
> As Johannes mentioned, this depends on the hash seed:
> $ PYTHONHASHSEED=0 python3 -c 'print({".htm", ".html", ".shtml"}.pop())'
> .html
> $ PYTHONHASHSEED=1 python3 -c 'print({".htm", ".html", ".shtml"}.pop())'
> .htm
> $ PYTHONHASHSEED=2 python3 -c 'print({".htm", ".html", ".shtml"}.pop())'
> .shtml
>
> You never see ".shtml" as the guessed extension because it is not in the
> original mimetypes.types_map dict, but instead programmatically read from a
> file like /etc/mime.types and then added to a list of extensions.
>
As mimetypes.py lists a bunch of candidate files, repeatability could only
be achieved at the level of a particular machine:
"/etc/mime.types",
"/etc/httpd/mime.types",
"/etc/httpd/conf/mime.types",
"/etc/apache/mime.types",
"/etc/apache2/mime.types",
"/usr/local/etc/httpd/conf/mime.types",
"/usr/local/lib/netscape/mime.types",
"/usr/local/etc/httpd/conf/mime.types",
"/usr/local/etc/mime.types"
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-02-08 08:51 +0100 |
| Message-ID | <mailman.6517.1391845924.18130.python-list@python.org> |
| In reply to | #65609 |
Asaf Las wrote:
> On Friday, February 7, 2014 9:40:06 PM UTC+2, Peter Otten wrote:
>> You never see ".shtml" as the guessed extension because it is not in the
>> original mimetypes.types_map dict, but instead programmatically read from
>> a file like /etc/mime.types and then added to a list of extensions.
> as there are bunch of files in mimetypes.py the only repeatability could
> be achieved on particular machine level.
At least the mimetypes already defined in the module could easily produce the same guessed extension consistently.
| From | Asaf Las <roegltd@gmail.com> |
|---|---|
| Date | 2014-02-08 00:24 -0800 |
| Message-ID | <e5cc422f-e324-4ad3-9f24-cf8c462ddf15@googlegroups.com> |
| In reply to | #65638 |
On Saturday, February 8, 2014 9:51:48 AM UTC+2, Peter Otten wrote:
> At least the mimetypes already defined in the module could easily produce
> the same guessed extension consistently.
IMHO one workaround for the OP could be to supply his own map file to init(), ensuring an unambiguous mapping across every platform and distribution. I guess some libraries already do that. Or write a wrapper and process all guesses to eliminate ambiguity as far as needed. That is, in case the bug report is rejected.
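The wrapper idea could look roughly like this. It is only a sketch: the table `PREFERRED_EXTENSIONS`, its entries, and the function name `guess_extension_pinned` are hypothetical choices, not part of the mimetypes module. Types listed in the table get a fixed answer; everything else falls through to the standard guess, which stays as ambiguous as before.

```python
import mimetypes

# Hypothetical preference table: the application decides which extension
# wins for the MIME types it cares about.
PREFERRED_EXTENSIONS = {
    "text/html": ".html",
    "application/msword": ".doc",
}


def guess_extension_pinned(mime_type, strict=True):
    """Return the pinned extension if there is one, else the module's guess."""
    pinned = PREFERRED_EXTENSIONS.get(mime_type)
    if pinned is not None:
        return pinned
    # Unlisted types still depend on whatever mimetypes decides.
    return mimetypes.guess_extension(mime_type, strict=strict)
```

The table only needs entries for types where a stable answer actually matters.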
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-02-08 09:39 +0100 |
| Message-ID | <mailman.6526.1391848759.18130.python-list@python.org> |
| In reply to | #65644 |
Asaf Las wrote:
> On Saturday, February 8, 2014 9:51:48 AM UTC+2, Peter Otten wrote:
>> At least the mimetypes already defined in the module could easily produce
>> the same guessed extension consistently.
>
> imho one workaround for OP could be to supply own map file in init() thus
> ensure unambiguous mapping across every platform and distribution. guess
> some libraries already doing that. or write wrapper and process
> all_guesses to eliminate ambiguity up to needed requirement.
> that is in case if bug request will be rejected.
You also have to set mimetypes.types_map and mimetypes.common_types to an empty dict (or an OrderedDict).
| From | Asaf Las <roegltd@gmail.com> |
|---|---|
| Date | 2014-02-08 02:59 -0800 |
| Message-ID | <3ca96b6c-ff2d-4fe2-8492-e0a8ff961ede@googlegroups.com> |
| In reply to | #65650 |
On Saturday, February 8, 2014 10:39:06 AM UTC+2, Peter Otten wrote:
> You also have to set mimetypes.types_map and mimetypes.common_types to an
> empty dict (or an OrderedDict).
Hmmm, yes. Then the quickest workaround is to get the list of all guesses, sort it, and use the one at index 0.
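A short sketch of that workaround, assuming plain lexicographic sorting is an acceptable tie-breaker; the function name `stable_extension` is made up for illustration. The result is stable for a given set of candidates, but it can still differ between machines whose mime.types files contribute different extensions.

```python
import mimetypes


def stable_extension(mime_type, strict=True):
    """Return the alphabetically first known extension, or None if none."""
    candidates = sorted(mimetypes.guess_all_extensions(mime_type, strict=strict))
    return candidates[0] if candidates else None
```

Sorting makes the choice independent of set or dict iteration order, and therefore of the hash seed.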