


Re: program to generate data helpful in finding duplicate large files

Date Fri, 19 Sep 2014 16:45:22 +1000
Subject Re: program to generate data helpful in finding duplicate large files
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python



On Fri, Sep 19, 2014 at 3:45 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> David Alban wrote:
>> *import sys*
>
> Um, how did you end up with leading and trailing asterisks? That's going to
> stop your code from running.

They're not part of the code, they're part of the mangling of the
formatting. So this isn't a code issue, it's a mailing list /
newsgroup one. David, if you set your mail/news client to send plain
text only (not rich text or HTML or formatted or anything like that),
you'll avoid these problems.

>> *    sep = ascii_nul*
>
> Seems a strange choice of a delimiter.

But one that he explained in the body of his post :)

>> *    print "%s%c%s%c%d%c%d%c%d%c%d%c%s" % ( thishost, sep, md5sum, sep,
>> dev, sep, ino, sep, nlink, sep, size, sep, file_path )*
>
> Arggh, my brain! *wink*
>
> Try this instead:
>
>     s = '\0'.join([thishost, md5sum, dev, ino, nlink, size, file_path])
>     print s

That won't work on its own; several of the values are integers, and
str.join accepts only strings. So either they need to be str()'d or
something in the output system needs to know to convert them to
strings. I'm inclined to the latter option, which simply means
importing print_function from __future__ and setting sep=chr(0).
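
A minimal sketch of both options, written for Python 3 (on 2.7 you'd
put "from __future__ import print_function" as the first statement of
the file). The function names and the sample values are made up for
illustration; they're not from David's program.

```python
import io
import sys

NUL = chr(0)  # the separator David chose


def write_record(fields, out=sys.stdout):
    """Option 2: let print() convert each field with str() and insert NUL."""
    print(*fields, sep=NUL, file=out)


def join_record(fields):
    """Option 1: convert each field explicitly, then join."""
    return NUL.join(str(field) for field in fields)


# Hypothetical record: host, md5sum, dev, ino, nlink, size, path.
sample = ["myhost", "d41d8cd98f00b204e9800998ecf8427e",
          2049, 131073, 1, 0, "/tmp/empty"]

buf = io.StringIO()
write_record(sample, out=buf)

# Both routes should produce the same text; print() adds a trailing newline.
print(buf.getvalue() == join_record(sample) + "\n")
```

Either way the integers get converted; the print() route just keeps
the conversion out of sight.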

>> *exit( 0 )*
>
> No need to explicitly call sys.exit (just exit won't work) at the end of
> your code.

Hmm, you sure exit won't work? I normally use sys.exit to set return
values (though as you say, it's unnecessary at the end of the
program), but I tested it (Python 2.7.3 on Debian) and it does seem to
be functional. Do you know what provides it?
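
For what it's worth, the bare exit name is installed into the builtins
by the site module at startup, so it goes missing if the interpreter
is run with -S. A rough sketch to check this on a given install,
assuming Python 3 (the module is called __builtin__ on 2.x); the
exit_status helper is made up for the demonstration:

```python
import builtins
import sys


def exit_status(func, *args):
    """Call func and return the status carried by the SystemExit it raises."""
    try:
        func(*args)
    except SystemExit as exc:
        return exc.code
    return None  # func returned normally


# None when the interpreter was started with -S (site not imported).
site_exit = getattr(builtins, "exit", None)

print("sys.exit(3) carries status:", exit_status(sys.exit, 3))
if site_exit is None:
    print("no builtin exit here (site was not imported)")
else:
    print("builtin exit is defined in module:", type(site_exit).__module__)
```

Since sys.exit is always available and the builtin depends on site,
sys.exit is the safer one to use in a script.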

ChrisA


