Re: program to generate data helpful in finding duplicate large files

Date Fri, 19 Sep 2014 23:59:34 +1000
Subject Re: program to generate data helpful in finding duplicate large files
From Chris Angelico <rosuav@gmail.com>
Cc "python-list@python.org" <python-list@python.org>
Newsgroups comp.lang.python
Message-ID <mailman.14147.1411135178.18130.python-list@python.org>



On Fri, Sep 19, 2014 at 11:32 PM, David Alban <extasia@extasia.org> wrote:
> thanks for the responses.   i'm having quite a good time learning python.

Awesome! But while you're at it, you may want to consider learning
English on the side; capitalization does make your prose more
readable. Also, it makes you look careless - you appear not to care
about your English, so it's logical to expect that you may not care
about your Python either. That may be completely false, but it's still
the impression you're creating.

> On Thu, Sep 18, 2014 at 11:45 AM, Chris Kaynor <ckaynor@zindagigames.com>
> wrote:
>>
>> Additionally, you may want to specify binary mode by using open(file_path,
>> 'rb') to ensure platform-independence ('r' uses Universal newlines, which
>> means on Windows, Python will convert "\r\n" to "\n" while reading the
>> file). Additionally, some platforms will treat binary files differently.
>
> would it be good to use 'rb' all the time?

Only if you're reading binary files. In the program you're doing here,
yes; you want binary mode.
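
A small sketch of why that matters for checksums, assuming Python 3 (where
text mode translates newlines on every platform, not only on Windows). The
sample data and the helper name read_both_ways are made up for illustration:

```python
import os
import tempfile

# Hypothetical sample data containing a Windows-style line ending.
SAMPLE = b"line one\r\nline two"

def read_both_ways(data):
    """Write data to a temporary file, then read it back in 'rb' and 'r' modes."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Binary mode: the bytes come back exactly as stored on disk.
        with open(path, "rb") as f:
            as_bytes = f.read()
        # Text mode: universal newlines turn "\r\n" into "\n" while reading.
        with open(path, "r", encoding="ascii") as f:
            as_text = f.read()
    finally:
        os.remove(path)
    return as_bytes, as_text

as_bytes, as_text = read_both_ways(SAMPLE)
print(as_bytes)
print(repr(as_text))
```

The text-mode result is one character shorter than the file on disk, so a
hash computed from it would not match a hash of the real file contents.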

> if you omit the exit statement it in this example, and
> $report_mode is not set, your shell program will give a non-zero return code
> and appear to have terminated with an error.  in shell the last expression
> evaluated determines the return code to the os.

IMO that's a terrible misfeature. If you actually want the return
value to be propagated, you should have to say so - something like:

#!/bin/sh
run_program
exit $?

Fortunately, Python isn't like that.
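
A quick way to check this, sketched with the standard subprocess module (the
helper name exit_status is hypothetical): a script whose last expression is
false still exits with status 0, and a non-zero status has to be requested
explicitly, for example with sys.exit.

```python
import subprocess
import sys

def exit_status(source):
    """Run source in a fresh Python interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", source]).returncode

# The last expression evaluates to False, but the script ends normally.
status_false_last = exit_status("1 == 2")

# A non-zero status only happens when the script asks for it (or crashes).
status_explicit = exit_status("import sys; sys.exit(3)")

print(status_false_last, status_explicit)
```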

> style question:  if there is only one, possibly short statement in a block,
> do folks usually move it up to the line starting the block?
>
>   if not S_ISREG( mode ) or S_ISLNK( mode ):
>     return
>
> vs.
>
>   if not S_ISREG( mode ) or S_ISLNK( mode ): return
>
> or even:
>
>   with open( file_path, 'rb' ) as f: md5sum = md5_for_file( file_path )

Only if it's really short AND it makes very good sense that way. Some
people would say "never". In the first case, I might do it, but not
the second. (Though that's not necessary at all, there; md5_for_file
opens and closes the file, so you don't need to open it redundantly
before calling.)
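
For illustration only, here is roughly what that could look like. The
original md5_for_file is not shown in this message, so the version below is a
hypothetical stand-in that opens the file itself in binary mode and reads it
in chunks; checksum_if_regular and chunk_size are likewise made-up names. The
guard condition is the one from the quoted question, with the return on its
own line.

```python
import hashlib
import os
from stat import S_ISLNK, S_ISREG

def md5_for_file(file_path, chunk_size=1024 * 1024):
    """Hypothetical stand-in for the md5_for_file discussed in the thread."""
    md5 = hashlib.md5()
    # The function opens (and, via 'with', closes) the file on its own.
    with open(file_path, "rb") as f:
        # Read fixed-size chunks so large files never sit in memory whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

def checksum_if_regular(file_path):
    """Return the MD5 hex digest of a regular file, or None for anything else."""
    mode = os.lstat(file_path).st_mode
    if not S_ISREG(mode) or S_ISLNK(mode):
        return None
    # No surrounding open() needed: md5_for_file handles the file itself.
    return md5_for_file(file_path)
```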

ChrisA


