| Path | csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Tim Chase <python.list@tim.thechases.com> |
| Newsgroups | comp.lang.python |
| Subject | Re: A sets algorithm |
| Date | Sun, 7 Feb 2016 18:20:50 -0600 |
| Lines | 35 |
| Message-ID | <mailman.82.1454892109.2317.python-list@python.org> (permalink) |
| References | <n98e0f$15lj$1@gioia.aioe.org> <mailman.81.1454885914.2317.python-list@python.org> <n98m3o$1h8k$1@gioia.aioe.org> |
| In-Reply-To | <n98m3o$1h8k$1@gioia.aioe.org> |
On 2016-02-08 00:05, Paulo da Silva wrote:
> At 22:17 on 07-02-2016, Tim Chase wrote:
>> all_files = list(generate_MyFile_objects())
>> interesting = [
>>     (my_file1, my_file2)
>>     for i, my_file1
>>         in enumerate(all_files, 1)
>>     for my_file2
>>         in all_files[i:]
>>     if my_file1 == my_file2
>>     ]
>
> "my_file1 == my_file2" can be implemented into MyFile class taking
> advantage of caching sizes (if different files are different),
> hashes or even content (for small files) or file headers (first n
> bytes). However this seems to have a problem:
> all_files: a b c d e ...
> If a==b then comparing b with c,d,e is useless.

Depends on what the OP wants to have happen if more than one input
file is equal. I.e., a == b == c. Does one just want "a has
duplicates" (and optionally "and here's one of them"), or does one
want "a == b", "a == c" and "b == c" in the output?

> Another solution I thought of, could be defining some methods (I
> still don't know which ones) in MyFile so that I could use sets
> intersection. Would this one be a faster solution?

Adding __hash__ would allow for the set operations, but would require
(as ChrisA points out) knowing how to create a hash function that
encompasses the information you want to compare.

-tkc
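
For illustration only, here is a minimal sketch of what adding __eq__ and __hash__ to MyFile might look like. None of it comes from the thread beyond the class name: the `size` and `digest` attributes, the `group_duplicates` helper, and the choice of SHA-256 are all assumptions, and treating equal digests as equal files accepts a (tiny) risk of hash collisions. The grouping helper reports each set of equal files once (a, b and c together) rather than every pair, which is one of the two outputs discussed above.

```python
import hashlib
import os
from collections import defaultdict


class MyFile:
    """Hypothetical stand-in for the MyFile class discussed in the thread."""

    def __init__(self, path):
        self.path = path
        self.size = os.path.getsize(path)  # cheap, so cached up front
        self._digest = None                # expensive, so computed lazily

    @property
    def digest(self):
        # Assumption: a SHA-256 of the full content stands in for the content.
        if self._digest is None:
            h = hashlib.sha256()
            with open(self.path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            self._digest = h.digest()
        return self._digest

    def __eq__(self, other):
        if not isinstance(other, MyFile):
            return NotImplemented
        # Different sizes mean different files; digests are read only on a tie.
        return self.size == other.size and self.digest == other.digest

    def __hash__(self):
        # Equal objects must hash equal; hashing the size alone guarantees
        # that without reading any file content.
        return hash(self.size)


def group_duplicates(files):
    """Return lists of MyFile objects that compare equal (hypothetical helper)."""
    by_size = defaultdict(list)
    for f in files:
        by_size[f.size].append(f)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for f in same_size:
            by_digest[f.digest].append(f)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups
```

With both methods defined, expressions such as `set(files_a) & set(files_b)` work as well. Whether that beats the pairwise loop depends on how many files share a size, since files in the same hash bucket still fall back to __eq__.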