Re: What is heating the memory here? hashlib?

From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Subject Re: What is heating the memory here? hashlib?
Date 2016-02-14 13:01 +1100
Message-ID <mailman.102.1455415322.22075.python-list@python.org> (permalink)
References <n9o06t$1hjo$1@gioia.aioe.org> <n9oaja$39n$1@gioia.aioe.org> <mailman.99.1455403508.22075.python-list@python.org> <n9om6l$hkt$1@gioia.aioe.org>


On Sun, Feb 14, 2016 at 12:44 PM, Paulo da Silva
<p_s_d_a_s_i_l_v_a_ns@netcabo.pt> wrote:
>> What happens if, after hashing each file (and returning from this
>> function), you call gc.collect()? If that reduces your RAM usage, you
>> have reference cycles somewhere.
>>
> I have used gc and del. No luck.
>
> The most probable cause seems to be hashlib not correctly handling
> updates with big buffers. I am working on one computer and testing on
> another. As for the second part, maybe I somehow forgot to transfer the
> change to the other computer. Unlikely but possible.

I'd like to see the problem boiled down to just the hashlib calls.
Something like this:

import hashlib
data = b"*" * (4*1024*1024)  # one constant 4 MiB buffer
lastdig = None
while "simulating files":  # non-empty string is always true: loop forever
    h = hashlib.sha256()
    hu = h.update
    for chunk in range(100):
        hu(data)
    dig = h.hexdigest()  # take the digest once, after all chunks are fed in
    if lastdig is None:
        lastdig = dig
        print("Digest:", dig)
    else:
        if lastdig != dig:
            print("Digest fail!")

Running this on my system (Python 3.6 on Debian Linux) produces a
long-running process with stable memory usage, which is exactly what
I'd expect. Even using different data doesn't change that:

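import hashlib
import itertools
byte = itertools.count()  # endless counter used to vary the data
data = b"*" * (4*1024*1024)
while "simulating files":  # non-empty string is always true: loop forever
    h = hashlib.sha256()
    hu = h.update
    for chunk in range(100):
        # append one changing byte so every chunk is different
        hu(data + bytes([next(byte) & 255]))
    dig = h.hexdigest()
    print("Digest:", dig)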
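
If watching the process in top is too coarse, one way to put numbers on
"stable memory usage" is tracemalloc. This is only a rough sketch: the
function names are made up for illustration, the chunk count is cut down
so it finishes quickly, and tracemalloc only sees allocations made
through Python's own allocator, so a leak inside a C library would need
an OS-level measurement instead.

```python
import hashlib
import tracemalloc

CHUNK = b"*" * (4 * 1024 * 1024)  # constant 4 MiB buffer standing in for file data


def hash_fake_file(chunks=10):
    """Hash `chunks` copies of CHUNK, standing in for hashing one file."""
    h = hashlib.sha256()
    for _ in range(chunks):
        h.update(CHUNK)
    return h.hexdigest()


def traced_memory_per_round(rounds=5):
    """Return the traced allocation size (bytes) after each simulated file."""
    tracemalloc.start()
    try:
        samples = []
        for _ in range(rounds):
            hash_fake_file()
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
        return samples
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    for i, size in enumerate(traced_memory_per_round()):
        print("After file", i, "traced bytes:", size)
```

If the numbers stay roughly flat from one simulated file to the next,
the hashing loop itself is not holding on to the buffers.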

Somewhere between my code and yours is something that consumes all
that memory. Can you neuter the actual disk reading (replacing it with
constants, like this) and make a complete and shareable program that
leaks all that memory?
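
To show what I mean by neutering the disk reading, here is a minimal
sketch. I haven't seen your code, so fake_read_chunks and hash_chunks
are hypothetical names and the chunk size is a guess; the idea is just
to swap whatever reads your files for a generator that yields constant
blocks, leaving the rest of your program alone.

```python
import hashlib


def fake_read_chunks(total_chunks, chunk_size=1024 * 1024):
    """Hypothetical stand-in for file reading: yields constant blocks, no disk I/O."""
    block = b"*" * chunk_size
    for _ in range(total_chunks):
        yield block


def hash_chunks(chunks):
    """Feed an iterable of bytes objects into SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Simulate a handful of "files"; identical input must give identical digests.
    digests = {hash_chunks(fake_read_chunks(8)) for _ in range(5)}
    print("Distinct digests:", len(digests))
```

If memory still climbs with the reading replaced like this, the problem
is in the code around the hashing; if it doesn't, look at how the files
are being read.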

ChrisA
