| Date | Fri, 11 Jan 2013 10:14:15 -0500 |
|---|---|
| From | Dave Angel <d@davea.name> |
| To | python-list@python.org |
| Subject | Re: Multiple disjoint sample sets? |
| References | <roy-7E69C0.09152911012013@news.panix.com> <50F02389.70507@mrabarnett.plus.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.402.1357917280.2939.python-list@python.org> |
On 01/11/2013 09:36 AM, MRAB wrote:
> On 2013-01-11 14:15, Roy Smith wrote:
>> I have a list of items. I need to generate n samples of k unique items
>> each. I not only want each sample set to have no repeats, but I also
>> want to make sure the sets are disjoint (i.e. no item repeated between
>> sets).
>>
>> random.sample(items, k) will satisfy the first constraint, but not the
>> second. Should I just do random.sample(items, k*n), and then split the
>> resulting big list into n pieces? Or is there some more efficient way?
>>
>> Typical values:
>>
>> len(items) = 5,000,000
>> n = 10
>> k = 100,000
>>
> I don't know how efficient it would be, but couldn't you shuffle the
> list and then use slicing to get the samples?

I like that answer best, but just to offer another choice...

You start with a (presumably unique) list of items. After collecting
your first sample, you could subtract those items from the list, and use
the smaller list for the next sample.

One way is to convert the list to a set, subtract the sample, then
convert back to a list.

--
DaveA
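Roy's own idea, a single random.sample(items, k*n) call split into n pieces, can be sketched as below. This is a minimal illustration, not code from the thread: the function name `disjoint_samples` and the `rng` parameter (any object with a `sample` method, such as the `random` module or a seeded `random.Random`) are hypothetical. It relies on the documented behaviour that random.sample() returns its result in selection order, so every sub-slice is itself a valid random sample; because no position is chosen twice, the slices are disjoint whenever `items` has no duplicates.

```python
import random


def disjoint_samples(items, n, k, rng=random):
    """Hypothetical helper: draw n disjoint samples of k items each.

    One random.sample() call of size n*k never picks the same position
    twice, so cutting its result into n consecutive slices gives samples
    that share no positions (and no items, if `items` is unique).
    """
    if n * k > len(items):
        raise ValueError("not enough items for n disjoint samples of size k")
    pool = rng.sample(items, n * k)
    # Slice the combined sample into n chunks of length k.
    return [pool[i * k:(i + 1) * k] for i in range(n)]
```

For Roy's typical values this draws n*k = 1,000,000 of the 5,000,000 items in one call, e.g. `disjoint_samples(items, 10, 100_000, random.Random(42))`.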
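MRAB's suggestion, shuffling the list and slicing it, might look like the following sketch. The function name `shuffled_slices` and the `rng` parameter are hypothetical, and the copy via `list(items)` is an added assumption so that the caller's list is not reordered; drop it if shuffling in place is acceptable.

```python
import random


def shuffled_slices(items, n, k, rng=random):
    """Hypothetical helper: shuffle a copy of `items` and return its
    first n*k entries as n consecutive, non-overlapping slices of length k."""
    if n * k > len(items):
        raise ValueError("not enough items for n disjoint samples of size k")
    pool = list(items)  # copy so the caller's list is left untouched
    rng.shuffle(pool)   # shuffle the whole copy in place
    return [pool[i * k:(i + 1) * k] for i in range(n)]
```

One trade-off to keep in mind: the shuffle does work proportional to len(items) (5,000,000 in Roy's example), even though only n*k = 1,000,000 items end up in the samples.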
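The subtraction approach described in the post (sample, remove the sample from the pool, repeat) can be sketched as below. The name `samples_by_subtraction` and the `rng` parameter are hypothetical, and like the post it assumes `items` contains no duplicates, since converting to a set would collapse them. Converting back to a list is required on current Python: since 3.11, random.sample() no longer accepts a set as its population.

```python
import random


def samples_by_subtraction(items, n, k, rng=random):
    """Hypothetical helper: draw n disjoint samples by removing each
    sample from the pool before drawing the next (assumes unique items)."""
    remaining = list(items)
    samples = []
    for _ in range(n):
        sample = rng.sample(remaining, k)
        samples.append(sample)
        # Convert to a set, subtract the sample, then convert back to a
        # list, because random.sample() needs a sequence, not a set.
        remaining = list(set(remaining) - set(sample))
    return samples
```

Each pass rebuilds a set of up to len(items) elements, so for Roy's sizes this is likely slower than a single sample-and-split; its appeal is that each draw only sees items not yet used.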
- Multiple disjoint sample sets? Roy Smith <roy@panix.com> - 2013-01-11 09:15 -0500
- Re: Multiple disjoint sample sets? MRAB <python@mrabarnett.plus.com> - 2013-01-11 14:36 +0000
- Re: Multiple disjoint sample sets? Dave Angel <d@davea.name> - 2013-01-11 10:14 -0500
- Re: Multiple disjoint sample sets? Peter Otten <__peter__@web.de> - 2013-01-13 11:16 +0100