


Re: What is heating the memory here? hashlib?

From Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt>
Newsgroups comp.lang.python
Subject Re: What is heating the memory here? hashlib?
Date 2016-02-15 08:05 +0000
Organization Aioe.org NNTP Server
Message-ID <n9s0sp$u3i$1@gioia.aioe.org> (permalink)
References <n9o06t$1hjo$1@gioia.aioe.org> <56bfe49e$0$1587$c3e8da3$5496439d@news.astraweb.com>



At 02:21 on 14-02-2016, Steven D'Aprano wrote:
> On Sun, 14 Feb 2016 06:29 am, Paulo da Silva wrote:
...

Thanks, Steven, for your advice.
This is a small script to solve a specific problem.
It will probably be reused in the future, with small changes, to solve
other similar problems.
When I found it eating memory, I fixed what I thought was the main cause.
When it still ate memory, I started to suspect something less obvious.
After all, it seems there is nothing wrong with it (see my other post).

> That's your first clue that, perhaps, you should be reading in relatively
> small blocks, more like 4K than 4MB. Sure enough, a quick bit of googling
> shows that typically you should read from files in small-ish chunks, and
> that trying to read in large chunks is often counter-productive:
> 
> https://duckduckgo.com/html/?q=file+read+buffer+size
> 
> The first three links all talk about optimal sizes being measured in small
> multiples of 4K, not 40MB.
>
I didn't know about this!
Most of my files are about 30MB or more, so I chose 40MB to avoid Python
loops. After all, Python should be able to optimize those things.
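
For illustration, here is a minimal sketch of hashing a file in small chunks, so that memory use stays bounded by the chunk size rather than the file size. The function name `hash_file` and the 64 KiB default are assumptions made for this example; they are not taken from the script discussed in this thread.

```python
import hashlib

def hash_file(pathname, algorithm="sha256", chunk_size=64 * 1024):
    """Return the hex digest of a file, reading chunk_size bytes at a time.

    hash_file and the 64 KiB default are illustrative choices, not part
    of the original script.
    """
    h = hashlib.new(algorithm)
    with open(pathname, "rb") as f:
        # iter() with a sentinel calls f.read(chunk_size) repeatedly and
        # stops when it returns b"", i.e. at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The Python-level loop runs once per chunk, so for a 30MB file and 64 KiB chunks that is a few hundred iterations, which is usually small next to the cost of the hashing itself.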

> You can try to increase the system buffer, by changing the "open" line to:
> 
>     with open(pathname, 'rb', buffering=40*M) as f:
> 
This is a different matter. The amount of data I request is one thing;
the actual buffer size is another. (I didn't know about this
argument, thanks.)
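
The distinction between the two sizes can be sketched as follows: the `buffering` argument of `open()` sets the size of the internal buffer of the `io.BufferedReader`, while the argument to `read()` is the amount of data requested. The helper name `read_sizes` is hypothetical and exists only for this example.

```python
import io

def read_sizes(pathname, buffer_size, request_size):
    """Return the length of each chunk obtained with read(request_size).

    buffer_size is passed to open() as the internal buffer size;
    read_sizes is a hypothetical helper for illustration only.
    """
    sizes = []
    with open(pathname, "rb", buffering=buffer_size) as f:
        # For binary mode with buffering > 1, open() returns a BufferedReader.
        assert isinstance(f, io.BufferedReader)
        while True:
            chunk = f.read(request_size)
            if not chunk:
                break
            sizes.append(len(chunk))
    return sizes
```

On a regular file, the chunk lengths depend only on `request_size` and the file length; changing `buffer_size` alters how the data is fetched from the operating system underneath, not what `read()` returns.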
...

> By the way, do you need a cryptographic checksum? sha256 is expensive to
> calculate. If all you are doing is trying to match files which could have
> the same content, you could use a cheaper hash, like md5 or even crc32.
I don't know the probability of collision for each of them. The script
has sha256 and md5 as options. For the run that failed, I had chosen
sha256. I didn't check whether it takes much more time. A collision might
cause data loss. So ...
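
As a rough guide to the collision question, the birthday approximation estimates the chance that at least two of n files share a digest, assuming the hash behaves like a uniform random function and nobody is crafting inputs on purpose. The function name `collision_probability` is an assumption for this sketch; the digest widths used below are 32 bits for crc32, 128 for md5 and 256 for sha256.

```python
import math

def collision_probability(num_files, hash_bits):
    """Approximate probability that any two of num_files digests collide.

    Uses the birthday approximation 1 - exp(-pairs / 2**hash_bits),
    which assumes uniformly distributed, non-adversarial digests.
    """
    pairs = num_files * (num_files - 1) // 2
    # -expm1(-x) equals 1 - exp(-x) but keeps precision for tiny x.
    return -math.expm1(-pairs / 2 ** hash_bits)

# Compare the three digest widths mentioned in this thread for a
# hypothetical collection of one million files.
for name, bits in (("crc32", 32), ("md5", 128), ("sha256", 256)):
    print(name, collision_probability(10 ** 6, bits))
```

Under these assumptions an accidental collision is close to certain for crc32 at that scale and negligible for md5 and sha256. Note that md5 is known to be vulnerable to deliberately constructed collisions, which this estimate does not cover.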

Thank you.
Paulo


