| Started by | duncan smith <duncan@invalid.invalid> |
|---|---|
| First post | 2015-12-04 19:43 +0000 |
| Last post | 2015-12-05 00:18 +0000 |
| Articles | 5 — 3 participants |
counting unique numpy subarrays duncan smith <duncan@invalid.invalid> - 2015-12-04 19:43 +0000
RE: counting unique numpy subarrays Albert-Jan Roskam <sjeik_appie@hotmail.com> - 2015-12-04 22:36 +0000
Re: counting unique numpy subarrays duncan smith <duncan@invalid.invalid> - 2015-12-05 00:13 +0000
Re: counting unique numpy subarrays Peter Otten <__peter__@web.de> - 2015-12-05 00:06 +0100
Re: counting unique numpy subarrays duncan smith <duncan@invalid.invalid> - 2015-12-05 00:18 +0000
| From | duncan smith <duncan@invalid.invalid> |
|---|---|
| Date | 2015-12-04 19:43 +0000 |
| Subject | counting unique numpy subarrays |
| Message-ID | <Q1m8y.334924$rR1.113623@fx19.iad> |
Hello,
I'm trying to find a computationally efficient way of identifying
unique subarrays, counting them and returning an array containing only
the unique subarrays and a corresponding 1D array of counts. The
following code works, but is a bit slow.
###############
from collections import Counter
import numpy

def bag_data(data):
    # data (a numpy array) is bagged along axis 0
    # returns concatenated array and corresponding array of counts
    vec_shape = data.shape[1:]
    counts = Counter(tuple(arr.flatten()) for arr in data)
    data_out = numpy.zeros((len(counts),) + vec_shape)
    cnts = numpy.zeros((len(counts),))
    # .items() works on Python 2 and 3; .iteritems() is Python 2 only
    for i, (tup, cnt) in enumerate(counts.items()):
        data_out[i] = numpy.array(tup).reshape(vec_shape)
        cnts[i] = cnt
    return data_out, cnts
###############
I've been looking through the numpy docs, but don't seem to be able to
come up with a clean solution that avoids Python loops. TIA for any
useful pointers. Cheers.
Duncan
| From | Albert-Jan Roskam <sjeik_appie@hotmail.com> |
|---|---|
| Date | 2015-12-04 22:36 +0000 |
| Message-ID | <mailman.208.1449268679.14615.python-list@python.org> |
| In reply to | #100011 |
Hi

(Sorry for topposting)

numpy.ravel is faster than numpy.flatten (no copy)
numpy.empty is faster than numpy.zeros
numpy.fromiter might be useful to avoid the loop (just a hunch)

Albert-Jan
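
The two speed hints above can be checked with a small sketch. It is only an illustration: the array `data` and the variable names are made up for the example, and `ravel` returns a view only when the memory layout allows it (otherwise it copies, like `flatten`).

```python
import numpy as np

# Hypothetical sample data, not taken from the thread.
data = np.arange(6.0).reshape(3, 2)
row = data[0]

flat_copy = row.flatten()   # always allocates a new array
flat_view = row.ravel()     # a view here, because row is contiguous

shares_copy = np.shares_memory(row, flat_copy)
shares_view = np.shares_memory(row, flat_view)
print(shares_copy, shares_view)

# numpy.empty skips the zero-fill that numpy.zeros performs, so its
# contents are arbitrary until every element has been assigned.
out = np.empty(3)
out[:] = [1.0, 2.0, 3.0]
print(out)
```

The saving from `numpy.empty` only matters if every element is overwritten afterwards, which is the case in the original `bag_data` loop.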
| From | duncan smith <duncan@invalid.invalid> |
|---|---|
| Date | 2015-12-05 00:13 +0000 |
| Message-ID | <e%p8y.287627$dc2.166641@fx24.iad> |
| In reply to | #100016 |
On 04/12/15 22:36, Albert-Jan Roskam wrote:
> Hi
>
> (Sorry for topposting)
>
> numpy.ravel is faster than numpy.flatten (no copy)
> numpy.empty is faster than numpy.zeros
> numpy.fromiter might be useful to avoid the loop (just a hunch)
>
> Albert-Jan
>

Thanks, I'd forgotten the difference between numpy.flatten and
numpy.ravel. I wasn't even aware of numpy.empty.

Duncan
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-12-05 00:06 +0100 |
| Message-ID | <mailman.213.1449270605.14615.python-list@python.org> |
| In reply to | #100011 |
duncan smith wrote:
> I've been looking through the numpy docs, but don't seem to be able to
> come up with a clean solution that avoids Python loops.
Me neither :(
> TIA for any
> useful pointers. Cheers.
Here's what I have so far:
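import numpy

def bag_data(data):
    # counts[i] holds the count for the subarray first seen at index i
    counts = numpy.zeros(data.shape[0])
    seen = {}  # raw bytes of a subarray -> index of its first occurrence
    for i, arr in enumerate(data):
        # tobytes() replaces tostring(), which is a deprecated alias
        sarr = arr.tobytes()
        if sarr in seen:
            counts[seen[sarr]] += 1
        else:
            seen[sarr] = i
            counts[i] = 1
    nz = counts != 0  # true only at first occurrences
    return numpy.compress(nz, data, axis=0), numpy.compress(nz, counts)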
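
The dictionary lookup above depends on turning each subarray into a hashable key. A minimal sketch of why that works, and of one caveat, follows. The sample values are invented for illustration, and `tobytes()` is used as the current spelling of `tostring()`.

```python
import numpy as np

# Hypothetical sample data, not taken from the thread.
data = np.array([[1.0, 2.0], [3.0, 4.0], [1.0, 2.0]])

# ndarrays are unhashable, so they cannot be dict keys directly;
# the raw bytes of each row can.
keys = [row.tobytes() for row in data]
same_key = keys[0] == keys[2]   # identical rows give identical bytes
diff_key = keys[0] != keys[1]   # different rows give different bytes

# Caveat: the key compares bytes, not values. 0.0 and -0.0 are equal
# numerically but are encoded differently, so they are counted apart.
zeros = np.array([0.0, -0.0])
equal_values = bool(zeros[0] == zeros[1])
equal_bytes = zeros[:1].tobytes() == zeros[1:].tobytes()
print(same_key, diff_key, equal_values, equal_bytes)
```

For the same reason, every subarray must share one dtype for the byte comparison to be meaningful, which holds automatically when they are all slices of a single array.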
| From | duncan smith <duncan@invalid.invalid> |
|---|---|
| Date | 2015-12-05 00:18 +0000 |
| Message-ID | <93q8y.177413$ij2.5605@fx08.iad> |
| In reply to | #100021 |
On 04/12/15 23:06, Peter Otten wrote:
> Here's what I have so far:
Three times as fast as what I had, and a bit cleaner. Excellent. Cheers.
Duncan
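
For readers on a newer NumPy than the one available when this thread was written: `numpy.unique` gained an `axis` argument in NumPy 1.13, which appears to cover the original request without a Python loop. The sketch below assumes such a version; the function name `bag_data_unique` and the sample array are made up for illustration. Unlike the dictionary version above, the result comes back sorted rather than in first-occurrence order.

```python
import numpy as np

def bag_data_unique(data):
    """Return (unique subarrays along axis 0, their counts).

    Assumes NumPy >= 1.13, where numpy.unique accepts `axis`.
    """
    data = np.asarray(data)
    # Subarrays along axis 0 are compared as whole units.
    uniq, counts = np.unique(data, axis=0, return_counts=True)
    return uniq, counts

# Hypothetical sample: three 2x2 subarrays, the first and last identical.
data = np.array([[[1, 2], [3, 4]],
                 [[5, 6], [7, 8]],
                 [[1, 2], [3, 4]]])
uniq, counts = bag_data_unique(data)
print(uniq.shape)  # (2, 2, 2)
print(counts)      # [2 1]
```

Whether this is faster than the loop-based versions in the thread depends on the data shape and is worth timing on real input.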