| From | Cara <ceridwen.mailing.lists@gmail.com> |
|---|---|
| Newsgroups | linux.debian.maint.python |
| X-Original-To | debian-python@lists.debian.org |
| Subject | CPython hash randomization makes some Python packages unreproducible |
| Date | Sat, 09 Apr 2016 19:30:02 +0200 |
| Message-ID | <rm4RQ-8vg-11@gated-at.bofh.it> |
| X-Original-Message-ID | <1460222739.5012.44.camel@gmail.com> |
I've been investigating why some Python packages are unreproducible[1]
and have discovered that in some cases the problem can be traced to
CPython's hash randomization. This happens whenever a package writes
files whose contents depend on the iteration order of dictionaries or
sets. One example is python-phply[2], which depends on PLY[3], an LALR
parser generator for Python. Given a grammar, PLY generates LALR parse
tables and writes them to a file so they don't need to be regenerated;
in generating that file, PLY iterates over dict.items()[4]. The problem
has also come up elsewhere: Sphinx, for instance, had a reproducibility
issue[5] related to hash randomization. Another example is pickle. On
CPython 3.5 and earlier, where dictionary order follows the key hashes,
running the following script with different values of PYTHONHASHSEED
will generate different pickle files, because pickle writes a
dictionary's items in iteration order and dict(d.items()) is built in
whatever order the current seed produces.
```python
import pickle
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
with open('temp.pickle', 'wb') as f:  # close the file deterministically
    pickle.dump(dict(d.items()), f)
```
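The sketch below is an illustrative addition, not part of the original report. It pickles a set of strings rather than a dict, since set iteration order follows the string hashes on any CPython with hash randomization, and it starts a fresh interpreter for each seed because PYTHONHASHSEED only takes effect at interpreter startup. `CHILD` and `pickle_under_seed` are hypothetical names, and it assumes Python 3 (it uses `sys.stdout.buffer`).

```python
import os
import subprocess
import sys

# Program run in a fresh interpreter: pickle a set of strings and write the
# raw pickle bytes to stdout.
CHILD = (
    "import pickle, sys\n"
    "s = {'alpha', 'beta', 'gamma', 'delta', 'epsilon', 'zeta', 'eta', 'theta'}\n"
    "sys.stdout.buffer.write(pickle.dumps(s, protocol=2))\n"
)


def pickle_under_seed(seed):
    """Return the bytes CHILD produces when started with PYTHONHASHSEED=seed."""
    # PYTHONHASHSEED is read only at interpreter startup, hence the subprocess.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    return subprocess.check_output([sys.executable, "-c", CHILD], env=env)


if __name__ == "__main__":
    # Re-running with the same seed gives byte-identical output.
    print("seed 0 reproducible:", pickle_under_seed(0) == pickle_under_seed(0))
    # Different seeds usually reorder the set, so the bytes usually differ.
    variants = {pickle_under_seed(seed) for seed in range(1, 6)}
    print("distinct pickles for seeds 1-5:", len(variants))
```

A fixed seed always reproduces the same bytes; how many distinct pickles the different seeds yield depends on the interpreter, so no exact count is claimed here.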
There's often no simple solution for these problems at the level of the
packages themselves. In PLY's case, sorting the parse tables before
writing them to a file doesn't work because of how PLY iterates over
dictionaries during table generation[6]. I doubt that the other
solution proposed in that GitHub issue, using an ordered dictionary,
will be accepted by David Beazley, because it would cause a significant
performance hit on CPython 3.4 and earlier, particularly CPython 2.7: a
C implementation of ordered dictionaries was only added in 3.5. More
broadly, patching every affected Python package individually is
impractical, both because of the number of packages involved and
because any individual patch may be quite complicated, if a fix is
possible at all.
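Why sorting at write time can fail is easier to see in a toy model. The sketch below is a hypothetical illustration, not PLY's implementation (the report only points to [4] and [6] for PLY's internals): if numbers are assigned while iterating over a hash-ordered collection, sorting the output lines afterwards fixes their order but not the numbers already on them. All function names are made up for the example, and plain lists stand in for two iteration orders a set might have under different seeds.

```python
# Hypothetical toy, not PLY's actual code: numbers are handed out while
# iterating over a collection, the way a table generator might number items
# in the order it happens to visit them.
def number_by_iteration(symbols):
    """Assign 0, 1, 2, ... in whatever order `symbols` iterates."""
    return {sym: i for i, sym in enumerate(symbols)}


def write_sorted(numbering):
    """Attempted fix: sort the output lines by symbol name before writing."""
    return "\n".join("{} {}".format(sym, num)
                     for sym, num in sorted(numbering.items()))


def number_deterministically(symbols):
    """Remove the dependence at its source by iterating in a fixed order."""
    return {sym: i for i, sym in enumerate(sorted(symbols))}


if __name__ == "__main__":
    # Two orders in which different hash seeds might iterate the same set;
    # lists stand in for the set so the example is deterministic.
    order_a = ["expr", "term", "factor"]
    order_b = ["factor", "expr", "term"]
    print(write_sorted(number_by_iteration(order_a)) ==
          write_sorted(number_by_iteration(order_b)))  # False
    print(number_deterministically(order_a) ==
          number_deterministically(order_b))  # True
```

Fixing the order at the point of iteration, as `number_deterministically` does, is the toy analogue of changing the generator itself, which is exactly the kind of per-package surgery the paragraph above describes as impractical.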
I think a better solution is to disable hash randomization by setting
PYTHONHASHSEED=0 when building Python packages with CPython for Debian,
probably somewhere in dh-python. This isn't necessary for PyPy, which
doesn't have hash randomization[7]. Hash randomization was implemented
to prevent "[h]ash collisions [being] exploited to DoS a web framework
that automatically parses input forms into dictionaries"[8]. That
attack shouldn't be a concern at build time, and whenever CPython later
runs to read the files written during the build, hash randomization
will be enabled again.
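A minimal sketch of what setting PYTHONHASHSEED=0 during a build amounts to, assuming the build steps run as child processes: the variable must be in the environment when each interpreter starts. `run_with_fixed_hash_seed` is a hypothetical helper, not part of dh-python, and the check relies on `sys.flags.hash_randomization`, which reports 0 when randomization is disabled (assumed here to be a current Python 3).

```python
import os
import subprocess
import sys


def run_with_fixed_hash_seed(argv, seed="0"):
    """Run argv with PYTHONHASHSEED pinned (hypothetical helper, not dh-python's API).

    The variable is read only when an interpreter starts, so it must be placed
    in the child's environment; changing os.environ inside an already-running
    interpreter does not affect that interpreter's own hashing.
    """
    env = dict(os.environ, PYTHONHASHSEED=seed)
    return subprocess.check_output(argv, env=env, universal_newlines=True)


if __name__ == "__main__":
    # sys.flags.hash_randomization is 0 when PYTHONHASHSEED=0 disables it.
    probe = [sys.executable, "-c", "import sys; print(sys.flags.hash_randomization)"]
    print(run_with_fixed_hash_seed(probe).strip())
```

In a Debian package the same effect could come from exporting the variable in the build environment, for example an `export PYTHONHASHSEED=0` line in debian/rules; doing it centrally in dh-python, as proposed here, would avoid repeating that in every package.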
Ceridwen
[1] https://wiki.debian.org/ReproducibleBuilds
[2] https://packages.debian.org/stretch/python/python-phply
[3] https://github.com/dabeaz/ply
[4] https://github.com/dabeaz/ply/blob/master/ply/yacc.py#L2733
[5] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=795976;msg=29
[6] https://github.com/dabeaz/ply/issues/79
[7] http://doc.pypy.org/en/latest/cpython_differences.html#miscellaneous
[8] https://bugs.python.org/issue13703