Data mining/pattern recognition software in Python?

Started by	Grzegorz Staniak <gstaniak@gmail.com>
First post	2012-03-23 16:43 +0000
Last post	2012-03-24 05:21 +0000
Articles	3 — 2 participants

  Data mining/pattern recognition software in Python? Grzegorz Staniak <gstaniak@gmail.com> - 2012-03-23 16:43 +0000
    Re: Data mining/pattern recognition software in Python? Jon Clements <joncle@googlemail.com> - 2012-03-23 20:10 -0700
      Re: Data mining/pattern recognition software in Python? Grzegorz Staniak <gstaniak@gmail.com> - 2012-03-24 05:21 +0000

#22086 - Data mining/pattern recognition software in Python?

From	Grzegorz Staniak <gstaniak@gmail.com>
Date	2012-03-23 16:43 +0000
Subject	Data mining/pattern recognition software in Python?
Message-ID	<jki97s$66e$1@mx1.internetia.pl>

Hello,

I've been asked by a colleague for help in a small educational
project, which would involve the recognition of patterns in a live 
feed of data points (readings from a measuring appliance), and then 
a more general search for patterns on archival data. The language 
of preference is Python, since the lab uses software written in
Python already. I can see there are packages like OpenCV,
scikit-learn, and Orange that could perhaps be of use for the mining
phase -- and even if they are slanted towards image pattern
recognition, I think I'd be able to find an appropriate package
for the time-series analyses. But I'm wondering about the "live"
phase -- what approach would you suggest? I wouldn't want to
reinvent the wheel; perhaps there are already packages/modules that
could be used to read data in a loop, e.g. every 10 seconds,
maintain a buffer of 15 readings, and ring a bell when the data
in the buffer form a specific pattern (a spike, a trough, whatever)?

I'll be grateful for a push in the right direction. Thanks,

GS
-- 
Grzegorz Staniak   <gstaniak _at_ gmail [dot] com>

#22109

From	Jon Clements <joncle@googlemail.com>
Date	2012-03-23 20:10 -0700
Message-ID	<30719938.516.1332558628851.JavaMail.geo-discussion-forums@vbut24>
In reply to	#22086

It might also be worth checking out pandas[1] and scikits.statsmodels[2].

In terms of reading data in a loop, I would probably go for a producer-consumer model (possibly using a Queue[3]). Have the producer constantly try to get another reading and put it on the queue; the consumer picks each reading up and can then determine whether it has enough data to detect a peak/trough. This article is also a fairly good read[4].
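
A minimal sketch of that producer-consumer idea, using only the standard library. The 15-reading buffer comes from the original question; everything else is an assumption made for illustration: the names `classify`, `producer`, `consumer`, `run` and `THRESHOLD` are hypothetical, the feed is simulated with a plain list rather than a real appliance, and the "newest reading versus mean of the older ones" rule is just one simple way to flag a spike or trough.

```python
import queue
import threading
import time
from collections import deque

BUFFER_SIZE = 15     # readings kept in the sliding window (from the question)
THRESHOLD = 5.0      # hypothetical: minimum jump from the window mean that counts
_SENTINEL = object() # marks the end of the feed for the consumer


def classify(window, threshold=THRESHOLD):
    """Compare the newest reading with the mean of the older ones."""
    *older, newest = window
    baseline = sum(older) / len(older)
    if newest - baseline > threshold:
        return "spike"
    if baseline - newest > threshold:
        return "trough"
    return None


def producer(readings, out_queue, interval=0.0):
    """Put each reading on the queue; a real feed would poll the appliance here."""
    for value in readings:
        out_queue.put(value)
        if interval:
            time.sleep(interval)
    out_queue.put(_SENTINEL)


def consumer(in_queue, alerts):
    """Keep the last BUFFER_SIZE readings and record any detected pattern."""
    window = deque(maxlen=BUFFER_SIZE)  # old readings fall off automatically
    index = 0
    while True:
        value = in_queue.get()          # blocks until a reading is available
        if value is _SENTINEL:
            break
        window.append(value)
        if len(window) == BUFFER_SIZE:  # only judge once the buffer is full
            pattern = classify(window)
            if pattern is not None:
                alerts.append((index, value, pattern))
        index += 1


def run(readings, interval=0.0):
    """Run the producer in this thread and the consumer in a worker thread."""
    q = queue.Queue()
    alerts = []
    worker = threading.Thread(target=consumer, args=(q, alerts))
    worker.start()
    producer(readings, q, interval)
    worker.join()
    return alerts


if __name__ == "__main__":
    feed = [10.0] * 20 + [25.0] + [10.0] * 20 + [0.0]
    for index, value, pattern in run(feed):
        print(f"reading {index}: {value} looks like a {pattern}")
```

For a live feed, passing `interval=10` would space the readings 10 seconds apart, and the `alerts.append(...)` call is the place where the "ring a bell" action would go.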

Those are some pointers, anyway,

hth,

Jon.


[1] http://pandas.pydata.org/
[2] http://statsmodels.sourceforge.net/
[3] http://docs.python.org/library/queue.html
[4] http://www.laurentluce.com/posts/python-threads-synchronization-locks-rlocks-semaphores-conditions-events-and-queues/

#22114

From	Grzegorz Staniak <gstaniak@gmail.com>
Date	2012-03-24 05:21 +0000
Message-ID	<jkjlk5$jfa$1@mx1.internetia.pl>
In reply to	#22109

Thanks for the suggestions.

GS
-- 
Grzegorz Staniak   <gstaniak _at_ gmail [dot] com>
