Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From: Tim Chase <python.list@tim.thechases.com>
Newsgroups: comp.lang.python
Subject: Re: A sets algorithm
Date: Sun, 7 Feb 2016 18:20:50 -0600
Lines: 35
Message-ID: <mailman.82.1454892109.2317.python-list@python.org>
References: <n98e0f$15lj$1@gioia.aioe.org> <mailman.81.1454885914.2317.python-list@python.org> <n98m3o$1h8k$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <n98m3o$1h8k$1@gioia.aioe.org>
Precedence: list
Xref: csiph.com comp.lang.python:102642

On 2016-02-08 00:05, Paulo da Silva wrote:
> At 22:17 on 07-02-2016, Tim Chase wrote:
>>   all_files = list(generate_MyFile_objects())
>>   interesting = [
>>     (my_file1, my_file2)
>>     for i, my_file1
>>     in enumerate(all_files, 1)
>>     for my_file2
>>     in all_files[i:]
>>     if my_file1 == my_file2
>>     ]
>
> "my_file1 == my_file2" can be implemented into MyFile class taking
> advantage of caching sizes (if different files are different),
> hashes or even content (for small files) or file headers (first n
> bytes). However this seems to have a problem:
> all_files: a b c d e ...
> If a==b then comparing b with c,d,e is useless.

Depends on what the OP wants to happen when more than two input
files are equal, i.e., a == b == c.  Does one just want "a has
duplicates" (and optionally "and here's one of them"), or does one
want "a == b", "a == c" and "b == c" in the output?
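
As a rough sketch of the two output styles (the helper name
group_by_equality is made up for illustration, and the strings below
are just stand-ins for MyFile objects), one could compare each item
only against one representative per group, which also skips the
redundant b-vs-c,d,e comparisons, and expand the groups into pairs
only if every pairing is really wanted:

```python
from itertools import combinations

def group_by_equality(items):
    """Group items using only ==, comparing each new item against
    the first member of every existing group."""
    groups = []
    for item in items:
        for group in groups:
            if item == group[0]:
                group.append(item)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([item])
    return groups

# Stand-in data: equal strings play the role of equal files.
items = ["spam", "spam", "eggs", "spam"]
groups = group_by_equality(items)

# Style 1: "a has duplicates" -> keep groups with more than one member.
duplicates = [g for g in groups if len(g) > 1]

# Style 2: every pairing ("a == b", "a == c", "b == c").
pairs = [pair for g in duplicates for pair in combinations(g, 2)]
```

This is still quadratic in the worst case (when nothing matches), but
each item is compared once per distinct group rather than once per
remaining item.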

> Another solution I thought of, could be defining some methods (I
> still don't know which ones) in MyFile so that I could use sets
> intersection. Would this one be a faster solution?

Adding __hash__ (kept consistent with __eq__) would allow for the
set operations, but would require (as ChrisA points out) knowing how
to create a hash function that encompasses the information you want
to compare.
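
For what it's worth, a minimal sketch of what that might look like,
assuming "equal" means "same size and same SHA-256 digest of the
contents".  This MyFile is a hypothetical stand-in that takes its
bytes directly rather than reading from disk, and the choice of key is
my assumption, not something the OP specified:

```python
import hashlib

class MyFile:
    """Hypothetical stand-in: holds a name plus a comparison key
    derived from the contents (size and SHA-256 digest)."""

    def __init__(self, name, data):
        self.name = name
        self.size = len(data)
        self.digest = hashlib.sha256(data).hexdigest()

    def _key(self):
        # Everything that defines equality goes into this one tuple.
        return (self.size, self.digest)

    def __eq__(self, other):
        if not isinstance(other, MyFile):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self._key())

    def __repr__(self):
        return f"MyFile({self.name!r})"

left = {MyFile("a", b"spam"), MyFile("b", b"eggs")}
right = {MyFile("c", b"spam"), MyFile("d", b"ham")}

# Set intersection now works on content rather than object identity.
common = left & right
```

Note that the name is deliberately left out of the key, so "a" and "c"
count as the same element; which of the two ends up in the result is
not something I'd rely on.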

-tkc