Re: A sets algorithm

From Tim Chase <python.list@tim.thechases.com>
Newsgroups comp.lang.python
Subject Re: A sets algorithm
Date Sun, 7 Feb 2016 18:20:50 -0600
Message-ID <mailman.82.1454892109.2317.python-list@python.org>
References <n98e0f$15lj$1@gioia.aioe.org> <mailman.81.1454885914.2317.python-list@python.org> <n98m3o$1h8k$1@gioia.aioe.org>


On 2016-02-08 00:05, Paulo da Silva wrote:
> At 22:17 on 07-02-2016, Tim Chase wrote:
>>   all_files = list(generate_MyFile_objects())
>>   interesting = [
>>     (my_file1, my_file2)
>>     for i, my_file1
>>     in enumerate(all_files, 1)
>>     for my_file2
>>     in all_files[i:]
>>     if my_file1 == my_file2
>>     ]
> 
> "my_file1 == my_file2" can be implemented in the MyFile class, taking
> advantage of cached sizes (if the sizes differ, the files differ),
> hashes, or even content (for small files) or file headers (the first n
> bytes). However, this seems to have a problem:
> all_files: a b c d e ...
> If a==b then comparing b with c,d,e is useless.

That depends on what the OP wants to happen when more than two input
files are equal, i.e., a == b == c.  Does one just want "a has
duplicates" (and optionally "and here's one of them"), or does one
want "a == b", "a == c", and "b == c" in the output?
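
Either kind of output can be derived from groups of equal files, which
also avoids re-comparing b with c, d, e once a == b is known.  A minimal
sketch, assuming each file can be reduced to a hashable key; the
(name, content) tuples and the names group_duplicates and all_pairs are
hypothetical stand-ins, not anything from the thread:

```python
from collections import defaultdict
from itertools import combinations


def group_duplicates(items, key=lambda item: item):
    """Group items with equal keys; keep only groups of two or more."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return [group for group in groups.values() if len(group) > 1]


def all_pairs(group):
    """Expand one group into every "x == y" pair, if that is what's wanted."""
    return list(combinations(group, 2))


# Hypothetical stand-in data: (name, content) tuples instead of real files.
files = [("a", b"x"), ("b", b"x"), ("c", b"x"), ("d", b"y")]

groups = group_duplicates(files, key=lambda f: f[1])
names = [[name for name, _ in group] for group in groups]
pairs = [(p[0], q[0]) for p, q in all_pairs(groups[0])]

print(names)  # one group per set of duplicates
print(pairs)  # every pair within the first group
```

Reporting "a has duplicates" only needs the groups; the pairwise output
is the optional second step.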

> Another solution I thought of could be defining some methods (I
> still don't know which ones) in MyFile so that I could use set
> intersection. Would this be a faster solution?

Adding __hash__ would allow for the set operations, but would
require (as ChrisA points out) knowing how to create a hash function
that encompasses the information you want to compare.
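
As a rough sketch of what that could look like, assuming "equal" means
"same size and same SHA-256 digest" (an assumption for illustration, not
a byte-by-byte comparison) and using a hypothetical in-memory MyFile
rather than the OP's actual class:

```python
import hashlib


class MyFile:
    """Hypothetical stand-in for the thread's MyFile; content is held in memory."""

    def __init__(self, name, data):
        self.name = name
        self.size = len(data)
        self.digest = hashlib.sha256(data).hexdigest()

    def _key(self):
        # Assumption: size plus digest is the information we want to compare.
        return (self.size, self.digest)

    def __eq__(self, other):
        if not isinstance(other, MyFile):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__: equal objects produce equal hashes.
        return hash(self._key())

    def __repr__(self):
        return f"MyFile({self.name!r})"


left = {MyFile("a", b"spam"), MyFile("b", b"eggs")}
right = {MyFile("c", b"spam"), MyFile("d", b"ham")}

common = left & right  # set intersection now works on MyFile objects
print(common)
```

Note that this computes the digest of every file up front, which may or
may not be faster than the pairwise comparison with cached sizes.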

-tkc



Thread

A sets algorithm Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> - 2016-02-07 21:46 +0000
  Re: A sets algorithm Chris Angelico <rosuav@gmail.com> - 2016-02-08 08:58 +1100
  Re: A sets algorithm Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-02-07 22:03 +0000
  Re: A sets algorithm Tim Chase <python.list@tim.thechases.com> - 2016-02-07 16:17 -0600
    Re: A sets algorithm Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> - 2016-02-08 00:05 +0000
      Re: A sets algorithm Tim Chase <python.list@tim.thechases.com> - 2016-02-07 18:20 -0600
  Re: A sets algorithm Cem Karan <cfkaran2@gmail.com> - 2016-02-07 20:07 -0500
  Re: A sets algorithm Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> - 2016-02-08 02:22 +0000
  Re: A sets algorithm Random832 <random832@fastmail.com> - 2016-02-08 09:49 -0500
  Re: A sets algorithm Chris Angelico <rosuav@gmail.com> - 2016-02-09 02:11 +1100
    Re: A sets algorithm Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-02-09 15:13 +1100
      Re: A sets algorithm Chris Angelico <rosuav@gmail.com> - 2016-02-09 15:27 +1100
        Re: A sets algorithm Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-02-09 17:48 +1300