Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #19301
| Date | 2012-01-24 10:49 +1100 |
|---|---|
| From | Cameron Simpson <cs@zip.com.au> |
| Subject | Re: Parsing a serial stream too slowly |
| References | <d250b98e-6409-474b-ba56-146e5b490e01@l1g2000vbc.googlegroups.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.4999.1327363168.27778.python-list@python.org> (permalink) |
On 23Jan2012 13:48, M.Pekala <mcdpekala@gmail.com> wrote:
| Hello, I am having some trouble with a serial stream on a project I am
| working on. I have an external board that is attached to a set of
| sensors. The board polls the sensors, filters them, formats the
| values, and sends the formatted values over a serial bus. The serial
| stream comes out like $A1234$$B-10$$C987$, where "$A.*$" is a sensor
| value, "$B.*$" is a sensor value, "$C.*$" is a sensor value, ect...
|
| When one sensor is running my python script grabs the data just fine,
| removes the formatting, and throws it into a text control box. However
| when 3 or more sensors are running, I get output like the following:
|
| Sensor 1: 373
| Sensor 2: 112$$M-160$G373
| Sensor 3: 763$$A892$
|
| I am fairly certain this means that my code is running too slow to
| catch all the '$' markers. Below is the snippet of code I believe is
| the cause of this problem...
Your code _is_ slow, but as you can see above you're not missing data,
you're gathering too much data.
Some point by point remarks below. The actual _bug_ is your use of ".*"
in your regexps. Some change suggestions below the code.
| def OnSerialRead(self, event):
| text = event.data
| self.sensorabuffer = self.sensorabuffer + text
| self.sensorbbuffer = self.sensorbbuffer + text
| self.sensorcbuffer = self.sensorcbuffer + text
Slow and memory wasteful. Supposing a sensor never reports? You will
accumulate an ever growing buffer string. And extending a string gets
expensive as it grows.
| if sensoraenable:
| sensorresult = re.search(r'\$A.*\$.*', self.sensorabuffer )
Slow and buggy.
The slow: You're compiling the regular expression _every_ time you come
here (unless the re module caches things, which I seem to recall it may.
But that efficiency is only luck.
The bug: supposing you get multiple sensor reports, like this:
$A1$$B2$$C3$
Your regexp matches the whole thing! Because ".*" is greedy.
You want "[^$]*" - characters that are not a "$".
| if sensorresult:
| s = sensorresult.group(0)
| s = s[2:-1]
| if self.sensor_enable_chkbox.GetValue():
| self.SensorAValue = s
| self.sensorabuffer = ''
What if there are multiple values in the buffer? After fixing your
regexp you will now be throwing them away. Better to go:
self.sensorabuffer = self.sensorabuffer[sensorresult.end():]
[...]
| I think that regex is too slow for this operation, but I'm uncertain
| of another method in python that could be faster. A little help would
| be appreciated.
Regex _is_ slow. It is good for flexible lexing, but generally Not
Fast. It can be faster than in-Python lexing because the inner
interpreation of the regex is C code, but is often overkill when speed
matters. (Which you may find it does not for your app - fix the bugs
first and see how it behaves).
I would be making the following changes if it were me:
- keep only one buffer, and parse it into sensor "tokens"
pass each token to the right sensor as needed
- don't use regexps
this is a speed thing; if you code is more readable with regexps and
still faster enough you may not do this
To these ends, untested attempt 1 (one buffer, lex into tokens, still
using regexps):
re_token = re.compile( r'\$([A-Z])([^$]*)\$' )
def OnSerialRead(self, event):
# accessing a local var is quicker and more readable
buffer = self.buffer
text = event.data
buffer += text
m = re_token.search(buffer)
while m:
sensor, value = m.group(1), m.group(2)
buffer = buffer[m.end():]
if sensor == 'A':
# ...
elif sensor == 'B':
# ...
else:
warning("unsupported sensor: %s", sensor)
# stash the updated buffer for later
self.buffer = buffer
I'm assuming here that you can get noise in the serial stream. If you
are certain to get only clean "$Ax$" sequences and nothing else you can
make the code much simpler. And faster again.
Pretending clean data and no regexps:
def OnSerialRead(self, event):
# accessing a local var is quicker and more readable
buffer = self.buffer
text = event.data
buffer += text
while buffer:
if not buffer.startswith('$'):
raise ValueError("bad data in buffer! code rewrite needed!")
mark2 = buffer.find('$', 1)
if mark2 < 0:
# end of token not present
# look again later
break
token = buffer[1:mark2]
buffer = buffer[mark2+1:]
if not token:
raise ValueError("no data in packet!")
sensorm value = token[1], token[1:]
... hand off to sensors as above ...
Cheers,
--
Cameron Simpson <cs@zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/
If your new theorem can be stated with great simplicity, then there
will exist a pathological exception. - Adrian Mathesis
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Parsing a serial stream too slowly "M.Pekala" <mcdpekala@gmail.com> - 2012-01-23 13:48 -0800
Re: Parsing a serial stream too slowly Jon Clements <joncle@googlemail.com> - 2012-01-23 14:00 -0800
Re: Parsing a serial stream too slowly "M.Pekala" <mcdpekala@gmail.com> - 2012-01-23 14:03 -0800
Re: Parsing a serial stream too slowly Nick Dokos <nicholas.dokos@hp.com> - 2012-01-23 18:28 -0500
Re: Parsing a serial stream too slowly Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> - 2012-01-24 00:13 +0100
Re: Parsing a serial stream too slowly Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> - 2012-01-24 14:04 +0100
Re: Parsing a serial stream too slowly Cameron Simpson <cs@zip.com.au> - 2012-01-24 10:49 +1100
Re: Parsing a serial stream too slowly "M.Pekala" <mcdpekala@gmail.com> - 2012-01-23 17:07 -0800
Re: Parsing a serial stream too slowly Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-01-24 05:08 +0000
Re: Parsing a serial stream too slowly Cameron Simpson <cs@zip.com.au> - 2012-01-24 19:23 +1100
Re: Parsing a serial stream too slowly Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2012-01-24 11:15 +0100
csiph-web