Logging data from Arduino using PySerial

Started by: Thomas <t.tchorzewski@gmail.com>
First post: 2014-02-03 20:07 -0800
Last post: 2014-02-04 14:05 +0000
Articles: 6 (4 participants)


Contents

  Logging data from Arduino using PySerial Thomas <t.tchorzewski@gmail.com> - 2014-02-03 20:07 -0800
    Re: Logging data from Arduino using PySerial Chris Angelico <rosuav@gmail.com> - 2014-02-04 15:47 +1100
      Re: Logging data from Arduino using PySerial Thomas <t.tchorzewski@gmail.com> - 2014-02-03 20:57 -0800
        Re: Logging data from Arduino using PySerial Chris Angelico <rosuav@gmail.com> - 2014-02-04 16:18 +1100
    Re: Logging data from Arduino using PySerial Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-02-04 08:56 -0500
    Re: Logging data from Arduino using PySerial MRAB <python@mrabarnett.plus.com> - 2014-02-04 14:05 +0000

#65401 — Logging data from Arduino using PySerial

From: Thomas <t.tchorzewski@gmail.com>
Date: 2014-02-03 20:07 -0800
Subject: Logging data from Arduino using PySerial
Message-ID: <e3b195f8-5a17-4536-9926-2b2ab193719c@googlegroups.com>

I've written a script to log data from my Arduino to a csv file. The script works well enough but it's very, very slow. I'm quite new to Python and I just wanted to put this out there to see if any Python experts could help optimise my code. Here it is:

    import serial
    import re
    import csv
    import numpy as np
    import matplotlib.pyplot as plt
    
    portPath = "/dev/ttyACM0"
    baud = 9600
    sample_time = 0.5
    sim_time = 30
    
    
    # Initializing Lists
    # Data Collection
    data_log = []
    line_data = []
    
    def map(x, in_min, in_max, out_min, out_max):
        return (((x - in_min) * (out_max - out_min))/(in_max - in_min)) + out_min
        
    # Establishing Serial Connection
    connection = serial.Serial(portPath,baud)
    
    # Calculating the length of data to collect based on the
    # sample time and simulation time (set by user)
    max_length = sim_time/sample_time
    
    # Collecting the data from the serial port
    while True:
        data_log.append(connection.readline())
        if len(data_log) > max_length - 1:
            break
                
    # Cleaning the data_log and storing it in data.csv
    with open('data.csv','wb') as csvfile:
        for line in data_log:
            line_data = re.findall('\d*\.\d*',line) # Find all digits
            line_data = filter(None,line_data)    # Filter out empty strings
            line_data = [float(x) for x in line_data] # Convert Strings to float
            
            for i in range(1,len(line_data)):
                line_data[i]=map(line_data[i],0,1023,0,5)
                
            csvwrite = csv.writer(csvfile)
            csvwrite.writerow(line_data)
            
            
    
    plt.clf()
    plt.close()
    plt.plotfile('data.csv',(0,1,2),names=['time (s)','voltage2 (V)','voltage1 (V)'],newfig=True)
    plt.show()


I'd appreciate any help/tips you can offer.



#65402

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-02-04 15:47 +1100
Message-ID: <mailman.6376.1391489248.18130.python-list@python.org>
In reply to: #65401

On Tue, Feb 4, 2014 at 3:07 PM, Thomas <t.tchorzewski@gmail.com> wrote:
> I've written a script to log data from my Arduino to a csv file. The script works well enough but it's very, very slow. I'm quite new to Python and I just wanted to put this out there to see if any Python experts could help optimise my code. Here it is:
>

The most important question is: Have you profiled your code? That is,
do you know which part(s) are slow?

And the first part of that question is: Define "slow". How long does
your program actually take? How much data are you accumulating in that
time? By my reading, you're getting 60 lines from the serial port; how
long is each line?

For a basic back-of-the-envelope timing estimate, work out how long
each line is (roughly), and multiply by 60 (number of lines), then
divide by 960 (your baud rate, divided by 10, which gives a rough
bytes-per-second rate). That'll give you a ball-park figure for how
many seconds this loop will take:

>     # Collecting the data from the serial port
>     while True:
>         data_log.append(connection.readline())
>         if len(data_log) > max_length - 1:
>             break

If your lines are 80 characters long, give or take, then that works
out to 80*60/960 = five seconds just to fetch the data from the serial
port. So if the script is taking anywhere from 3 to 8 seconds to run,
then you can assume that it's probably spending most of its time right
here. Yes, that might feel like it's really slow, but it's all down to
your baud rate. You can easily confirm this by surrounding that loop
with:

import time
start_time = time.time()
while ...
   ... append ...
print("Time to fetch from serial port:",time.time()-start_time)

(If this is Python 2, this will produce slightly ugly output as it'll
display it as a tuple. It'll work though.)

Put that in, and then run the program with some kind of external
timing. On Linux, that would be:

$ time python scriptname.py

If the total script execution is barely more than the time spent in
that loop, don't bother optimizing any of the rest of the code - it's
a waste of time improving something that's not materially affecting
your total run time.
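
The back-of-the-envelope estimate from earlier in this post can be sketched as a small function. This is only an illustration, written for Python 3: the function name is hypothetical, and the divide-by-10 assumes the common 8N1 framing (one start bit, eight data bits, one stop bit per byte).

```python
def estimate_read_seconds(line_length, line_count, baud):
    """Rough time to receive line_count lines of line_length bytes each."""
    # Assumption: 8N1 framing, so each byte costs about 10 bits on the wire.
    bytes_per_second = baud / 10.0
    return line_length * line_count / bytes_per_second


# 60 lines of roughly 80 characters at 9600 baud, as in the example above.
estimate = estimate_read_seconds(80, 60, 9600)
print("Estimated serial read time: %.1f s" % estimate)
```

This is only a lower bound set by the link speed. If the Arduino sketch sends one line every sample_time seconds (an assumption, since the sketch is not shown in the thread), the read loop would instead take about sim_time, roughly 30 seconds, regardless of baud rate.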

But while I'm here looking at your code, I'll take the liberty of
making a few stylistic suggestions :) Feel free to ignore these, but
this is the more Pythonic way of writing the code. Starting with the
serial port fetch loop:

>     # Collecting the data from the serial port
>     while True:
>         data_log.append(connection.readline())
>         if len(data_log) > max_length - 1:
>             break

An infinite loop with a conditional break exactly at one end. This
would be clearer written as a straight-forward conditional loop, with
the condition inverted:

while len(data_log) < max_length:
    data_log.append(connection.readline())

This isn't strictly identical to your previous version, but since
len(data_log) will always be an integer, it is - I believe -
functionally equivalent. But make sure I haven't introduced any bugs.
:)

(There is another difference, in that my version checks the condition
on entry where yours would check it only after doing the first append
- your version would guarantee a minimum of one line in the log, mine
won't. I'm guessing that this won't be significant in normal usage; if
it is, go back to your version of the code.)
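
If a fixed number of lines is all that is wanted, the loop can also be sketched as a comprehension over range(). This is an illustration for Python 3, not part of the original script: the io.BytesIO object is a stand-in for the serial connection, and the sample line content is made up. Note the int() call, since sim_time/sample_time produces a float here and range() needs an integer.

```python
import io

sim_time = 30
sample_time = 0.5
max_length = int(sim_time / sample_time)  # 60 lines

# Stand-in for serial.Serial(...); the line format is hypothetical.
connection = io.BytesIO(b"0.00 512.00 256.00\n" * 100)

# Read exactly max_length lines.
data_log = [connection.readline() for _ in range(max_length)]
print(len(data_log))
```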

>     def map(x, in_min, in_max, out_min, out_max):
>         return (((x - in_min) * (out_max - out_min))/(in_max - in_min)) + out_min

This shadows the built-in map() function. It works, but it's usually
safer to not do this, in case you subsequently want the default map().
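
A minimal sketch of the same function under a different name, so the built-in stays available. The name map_range is just a suggestion, and the float() call is an addition to avoid integer division if the script is run on Python 2 with integer arguments.

```python
def map_range(x, in_min, in_max, out_min, out_max):
    """Linearly rescale x from [in_min, in_max] to [out_min, out_max]."""
    return (x - in_min) * (out_max - out_min) / float(in_max - in_min) + out_min


# The built-in map() is still usable alongside the helper.
volts = list(map(lambda raw: map_range(raw, 0, 1023, 0, 5), [0, 1023]))
print(volts)
```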

>             line_data = re.findall('\d*\.\d*',line) # Find all digits
>             line_data = filter(None,line_data)    # Filter out empty strings
>             line_data = [float(x) for x in line_data] # Convert Strings to float

You can combine these.

line_data = [float(x) for x in re.findall('\d*\.\d*',line) if x]

Optionally break out the re into a separate line, but definitely I'd
go with the conditional comprehension above the separate filter():

line_data = re.findall('\d*\.\d*',line)
line_data = [float(x) for x in line_data if x]
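
As a self-contained sketch of the combined version (Python 3, with a made-up sample line, since the Arduino's actual output format is not shown in the thread): the pattern here is deliberately changed to \d+\.\d+, a stricter variant that cannot match a lone "." which float() would reject. With that pattern no match can be empty, so the "if x" filter is not needed.

```python
import re

# Stricter variant of the original pattern: digits required on both sides.
NUMBER = re.compile(r"\d+\.\d+")


def parse_line(line):
    """Return every decimal number found in line as a float."""
    return [float(x) for x in NUMBER.findall(line)]


# Hypothetical sample line: time followed by two readings.
print(parse_line("0.50,512.00,1023.00"))
```

On Python 3, PySerial's readline() returns bytes, so a line would need something like line.decode("ascii", "replace") before being passed to parse_line().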

>             for i in range(1,len(line_data)):
>                 line_data[i]=map(line_data[i],0,1023,0,5)

Is it deliberate that you start from 1 here? Are you consciously
skipping the first element, leaving it unchanged? If not, I would go
for another comprehension:

line_data = [map(x,0,1023,0,5) for x in line_data]

but if so, this warrants a code comment explaining why the first one is special.
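
If the skip is deliberate (the plot labels in the original script name the first column 'time (s)', which suggests it is), one way to keep the comprehension is to slice. A sketch with made-up values, using a hypothetical scale() helper in place of the shadowing map():

```python
def scale(x, in_min, in_max, out_min, out_max):
    """Linearly rescale x from [in_min, in_max] to [out_min, out_max]."""
    return (x - in_min) * (out_max - out_min) / float(in_max - in_min) + out_min


line_data = [0.5, 1023.0, 0.0]  # hypothetical: time, ADC reading, ADC reading

# Keep the first field (assumed to be the timestamp) and rescale the rest.
line_data = line_data[:1] + [scale(x, 0, 1023, 0, 5) for x in line_data[1:]]
print(line_data)
```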

>     plt.clf()
>     plt.close()
>     plt.plotfile('data.csv',(0,1,2),names=['time (s)','voltage2 (V)','voltage1 (V)'],newfig=True)
>     plt.show()

This, btw, is the other place that I'd look for a potentially large
slab of time. Bracket this with time.time() calls as above, and see
whether it's slow enough to mean nothing else is a concern. But I'd
first look at the serial port read loop; depending on how long your
lines are, that could easily be quite a few seconds of time just on
its own.

ChrisA



#65403

From: Thomas <t.tchorzewski@gmail.com>
Date: 2014-02-03 20:57 -0800
Message-ID: <b21f1406-a294-4e70-b039-09024e7928e0@googlegroups.com>
In reply to: #65402

Wow...Thanks Chris! I really appreciate your suggestions (including the stylistic ones). I'll definitely be revising my code as soon as I find the time. As far as profiling goes, I've used timeit in the past but it's quite a pain going through any program block by block. I wish there were a program in which you could just toss in a script and it would spit out the bottlenecks in your code (with suggested performance improvements perhaps)...



#65405

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-02-04 16:18 +1100
Message-ID: <mailman.6378.1391491099.18130.python-list@python.org>
In reply to: #65403

On Tue, Feb 4, 2014 at 3:57 PM, Thomas <t.tchorzewski@gmail.com> wrote:
> Wow...Thanks Chris! I really appreciate your suggestions (including the stylistic ones). I'll definitely be revising my code as soon as I find the time. As far as profiling goes, I've used timeit in the past but it's quite a pain going through any program block by block. I wish there were a program in which you could just toss in a script and it would spit out the bottlenecks in your code (with suggested performance improvements perhaps)...
>

Well, timeit is good for microbenchmarking. (How useful
microbenchmarking itself is, now, that's a separate question.) For
anything where you can "feel" the program's run time as a human, it's
easy enough to use the time.time() function.

Add a little helper like this (untested):

last_time = time.time()
def tt(desc):
    global last_time
    cur_time=time.time()
    print("%s: %f"%(desc,cur_time-last_time))
    last_time=cur_time


Then put this all through your code:

....

    # Calculating the length of data to collect based on the
    # sample time and simulation time (set by user)
    max_length = sim_time/sample_time

    tt("Init")

    # Collecting the data from the serial port
    while True:
        data_log.append(connection.readline())
        if len(data_log) > max_length - 1:
            break

    tt("Serial")

...

etc etc. Give it a short description saying what's just happened
(because it'll give the time since the previous timepoint), and then
just eyeball the results to see where the biggest numbers are. If you
do it right, you'll find a whole pile of sections with tiny numbers,
which you can easily ignore, and just a handful that even register.
Then you dig into those sections and see where the slowness is.

Be careful, though. You can easily waste hundreds of expensive dev
hours trying to track down an insignificant time delay. :)
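
For the "toss in a script" wish, the standard library's cProfile module comes close: it reports where time was spent, though it does not suggest fixes. A minimal sketch for Python 3, where work() is a made-up stand-in for the code being profiled:

```python
import cProfile
import io
import pstats


def work():
    """Stand-in for the code being profiled."""
    return sum(i * i for i in range(100000))


profiler = cProfile.Profile()
profiler.enable()
result = work()
profiler.disable()

# Collect the five most expensive entries, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

A whole script can be profiled without editing it by running "python -m cProfile -s cumulative scriptname.py". For a script dominated by waiting on the serial port, expect most of the time to show up under the read calls.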

ChrisA



#65423

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2014-02-04 08:56 -0500
Message-ID: <mailman.6387.1391522176.18130.python-list@python.org>
In reply to: #65401

On Mon, 3 Feb 2014 20:07:48 -0800 (PST), Thomas <t.tchorzewski@gmail.com>
declaimed the following:

>I've written a script to log data from my Arduino to a csv file. The script works well enough but it's very, very slow. I'm quite new to Python and I just wanted to put this out there to see if any Python experts could help optimise my code. Here it is:
>
>    import serial
>    import re
>    import csv
>    import numpy as np
>    import matplotlib.pyplot as plt
>    
>    portPath = "/dev/ttyACM0"
>    baud = 9600
>    sample_time = 0.5
>    sim_time = 30
>    
>    
>    # Initializing Lists
>    # Data Collection
>    data_log = []
>    line_data = []
>    
>    def map(x, in_min, in_max, out_min, out_max):
>        return (((x - in_min) * (out_max - out_min))/(in_max - in_min)) + out_min
>
	Doesn't the Arduino have a map() function internally? If you have
control over the Arduino couldn't you set it up to return the desired
mapping values directly?
        
>    # Establishing Serial Connection
>    connection = serial.Serial(portPath,baud)
>    
>    # Calculating the length of data to collect based on the
>    # sample time and simulation time (set by user)
>    max_length = sim_time/sample_time
>    
>    # Collecting the data from the serial port
>    while True:
>        data_log.append(connection.readline())
>        if len(data_log) > max_length - 1:
>            break
>
	Here you are building up a list of raw lines...
                
>    # Cleaning the data_log and storing it in data.csv
>    with open('data.csv','wb') as csvfile:
>        for line in data_log:
>            line_data = re.findall('\d*\.\d*',line) # Find all digits
>            line_data = filter(None,line_data)    # Filter out empty strings
>            line_data = [float(x) for x in line_data] # Convert Strings to float
>
>            for i in range(1,len(line_data)):
>                line_data[i]=map(line_data[i],0,1023,0,5)
>                
>            csvwrite = csv.writer(csvfile)

	You are creating a new csv writer instance on each pass!

>            csvwrite.writerow(line_data)
>
	And then you loop over all the lines looking for particular values,
just to scale them into another range, to write to a CSV file.

	Personally, I'd have opened the CSV file at the start, and done all
this filtering/transforming on each line as it was read from the Arduino.

	csvfile = open("data.csv", "wb")
	csvwriter = csv.writer(csvfile)
	connection.readline()	#discard the first line read
	count = 0
	while count < max_length:
		line = connection.readline()
		count += 1
		# do all your filtering here, leaving a list of values in row
		row = [float(x) for x in re.findall(r'\d*\.\d*', line) if x]
		if row:		#not empty, so filtering didn't wipe it out
			csvwriter.writerow(row)
	csvfile.close()
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/



#65424

From: MRAB <python@mrabarnett.plus.com>
Date: 2014-02-04 14:05 +0000
Message-ID: <mailman.6388.1391522708.18130.python-list@python.org>
In reply to: #65401

On 2014-02-04 04:07, Thomas wrote:
> I've written a script to log data from my Arduino to a csv file. The script works well enough but it's very, very slow. I'm quite new to Python and I just wanted to put this out there to see if any Python experts could help optimise my code. Here it is:
>
[snip]
>      # Cleaning the data_log and storing it in data.csv
>      with open('data.csv','wb') as csvfile:
>          for line in data_log:
>              line_data = re.findall('\d*\.\d*',line) # Find all digits
>              line_data = filter(None,line_data)    # Filter out empty strings
>              line_data = [float(x) for x in line_data] # Convert Strings to float
>
>              for i in range(1,len(line_data)):
>                  line_data[i]=map(line_data[i],0,1023,0,5)
>
You're doing this for every line in the log:

>              csvwrite = csv.writer(csvfile)
[snip]

Try moving it before the 'for' loop so it's done only once.
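
A small sketch of that change, written for Python 3 with made-up rows. The io.StringIO buffer stands in for the opened data.csv so the example runs without touching the disk.

```python
import csv
import io

rows = [[0.0, 2.5, 5.0], [0.5, 1.25, 3.75]]  # hypothetical cleaned rows

buffer = io.StringIO()       # stand-in for the open data.csv file
writer = csv.writer(buffer)  # created once, before the loop

for row in rows:
    writer.writerow(row)

output = buffer.getvalue()
print(output)
```

With a real file on Python 3, the csv module expects text mode with newline='', that is open('data.csv', 'w', newline=''), rather than the 'wb' mode used on Python 2.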


