Re: program to generate data helpful in finding duplicate large files

Path	csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.xs4all.nl!newsfeed2.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Return-Path	<rosuav@gmail.com>
X-Original-To	python-list@python.org
Delivered-To	python-list@mail.python.org
MIME-Version	1.0
X-Received	by 10.112.199.197 with SMTP id jm5mr7761362lbc.19.1411148193565; Fri, 19 Sep 2014 10:36:33 -0700 (PDT)
In-Reply-To	<CALwzidnTwOgLwt1WGs+oqgCfPtB2hc5NgV4KDMZ4xSx5YRXnuA@mail.gmail.com>
References	<mailman.14114.1411063879.18130.python-list@python.org> <541bc310$0$29975$c3e8da3$5496439d@news.astraweb.com> <CAPTjJmpRnN3FaT2EHiFFuMBOArVBuuvf+f4siex95SW6vPGdcQ@mail.gmail.com> <CALwzidnTwOgLwt1WGs+oqgCfPtB2hc5NgV4KDMZ4xSx5YRXnuA@mail.gmail.com>
Date	Sat, 20 Sep 2014 03:36:33 +1000
Subject	Re: program to generate data helpful in finding duplicate large files
From	Chris Angelico <rosuav@gmail.com>
Cc	Python <python-list@python.org>
Content-Type	text/plain; charset=UTF-8
Newsgroups	comp.lang.python
Message-ID	<mailman.14151.1411148195.18130.python-list@python.org>
Lines	21
NNTP-Posting-Host	2001:888:2000:d::a6
X-Trace	1411148195 news.xs4all.nl 2911 [2001:888:2000:d::a6]:60721
X-Complaints-To	abuse@xs4all.nl
Xref	csiph.com comp.lang.python:78084


On Sat, Sep 20, 2014 at 3:20 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> On Fri, Sep 19, 2014 at 12:45 AM, Chris Angelico <rosuav@gmail.com> wrote:
>> On Fri, Sep 19, 2014 at 3:45 PM, Steven D'Aprano
>>>     s = '\0'.join([thishost, md5sum, dev, ino, nlink, size, file_path])
>>>     print s
>>
>> That won't work on its own; several of the values are integers. So
>> either they need to be str()'d or something in the output system needs
>> to know to convert them to strings. I'm inclined to the latter option,
>> which simply means importing print_function from __future__ and
>> setting sep=chr(0).
>
> Personally, I lean toward converting them with map in this case:
>
>     s = '\0'.join(map(str, [thishost, md5sum, dev, ino, nlink, size,
>                             file_path]))

There are many ways to do it. I'm not seeing this as particularly less
ugly than the original formatting code, tbh, but it does work.
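
For comparison, here is a minimal sketch of the variants discussed above,
written for Python 3 (on Python 2 the print() variant would also need
"from __future__ import print_function"). The field values are made-up
placeholders, not output from the original program, and the names
joined_map, joined_gen and joined_print are only for illustration.

```python
import io

# Hypothetical sample record: host, md5sum, dev, ino, nlink, size, path.
# Several of the values are integers, as in the original program.
fields = ["myhost", "d41d8cd98f00b204e9800998ecf8427e",
          2049, 131081, 1, 0, "/tmp/empty"]

# A bare join fails because str.join() accepts only strings.
try:
    "\0".join(fields)
    plain_join_failed = False
except TypeError:
    plain_join_failed = True

# Variant 1: convert each field with map(), then join.
joined_map = "\0".join(map(str, fields))

# Variant 2: the same conversion spelled as a generator expression.
joined_gen = "\0".join(str(f) for f in fields)

# Variant 3: let print() do the conversion and use NUL as the separator.
# The output is captured in a buffer here only so it can be compared.
buf = io.StringIO()
print(*fields, sep="\0", end="", file=buf)
joined_print = buf.getvalue()
```

All three produce the same NUL-separated record; the difference is only
where the conversion to str happens.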

ChrisA


