Groups > comp.lang.python > #15766 > unrolled thread
| Started by | Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> |
|---|---|
| First post | 2011-11-16 10:08 +0100 |
| Last post | 2011-11-17 21:00 -0500 |
| Articles | 7 — 4 participants |
unit-profiling, similar to unit-testing Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2011-11-16 10:08 +0100
Re: unit-profiling, similar to unit-testing Roy Smith <roy@panix.com> - 2011-11-16 09:36 -0500
Re: unit-profiling, similar to unit-testing Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2011-11-17 09:53 +0100
Re: unit-profiling, similar to unit-testing Roy Smith <roy@panix.com> - 2011-11-17 09:03 -0500
Re: unit-profiling, similar to unit-testing "spartan.the" <spartan.the@gmail.com> - 2011-11-17 13:28 -0800
Re: unit-profiling, similar to unit-testing Tycho Andersen <tycho@tycho.ws> - 2011-11-17 14:45 -0600
Re: unit-profiling, similar to unit-testing Roy Smith <roy@panix.com> - 2011-11-17 21:00 -0500
| From | Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> |
|---|---|
| Date | 2011-11-16 10:08 +0100 |
| Subject | unit-profiling, similar to unit-testing |
| Message-ID | <95bcp8-bft.ln1@satorlaser.homedns.org> |
Hi!

I'm currently trying to establish a few tests here that evaluate certain performance characteristics of our systems. As part of this, I found that these tests are rather similar to unit-tests, only that they are much more fuzzy and obviously dependent on the systems involved, CPU load, network load, day of the week (Tuesday is virus scan day) etc.

What I'd just like to ask is how you do such things. Are there tools available that help? I was considering using the unit testing framework, but the problem with that is that the results are too hard to interpret programmatically and too easy to misinterpret manually. Any suggestions?

Cheers!

Uli
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2011-11-16 09:36 -0500 |
| Message-ID | <roy-DBE11D.09364016112011@news.panix.com> |
| In reply to | #15766 |
In article <95bcp8-bft.ln1@satorlaser.homedns.org>,
Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> wrote:

> Hi!
>
> I'm currently trying to establish a few tests here that evaluate certain
> performance characteristics of our systems. As part of this, I found
> that these tests are rather similar to unit-tests, only that they are
> much more fuzzy and obviously dependent on the systems involved, CPU
> load, network load, day of the week (Tuesday is virus scan day) etc.
>
> What I'd just like to ask is how you do such things. Are there tools
> available that help? I was considering using the unit testing framework,
> but the problem with that is that the results are too hard to interpret
> programmatically and too easy to misinterpret manually. Any suggestions?

It's really, really, really hard to either control for, or accurately measure, things like CPU or network load. There's so much stuff you can't even begin to see. The state of your main memory cache. Disk fragmentation. What I/O is happening directly out of kernel buffers vs having to do a physical disk read. How slow your DNS server is today.

What I suggest is instrumenting your unit test suite to record not just the pass/fail status of every test, but also the test duration. Stick these into a database as the tests run. Over time, you will accumulate a whole lot of performance data, which you can then start to mine.

While you're running the tests, gather as much system performance data as you can (output of top, vmstat, etc) and stick that into your database too. You never know when you'll want to refer to the data, so just collect it all and save it forever.
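One possible way to record test durations alongside pass/fail status, as suggested above, is to subclass the standard `unittest` result class and write each outcome to SQLite. This is only a sketch: the class name `TimedResult`, the helpers `save_timings` and `run_and_record`, the `test_runs` table layout, and the `ExampleTests` case are all made up for illustration and are not from the thread.

```python
import sqlite3
import time
import unittest


class TimedResult(unittest.TextTestResult):
    """Text test result that also remembers how long each test took."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timings = []    # (test id, outcome, seconds) tuples
        self._started = {}   # test id -> start timestamp

    def startTest(self, test):
        self._started[test.id()] = time.perf_counter()
        super().startTest(test)

    def _record(self, test, outcome):
        started = self._started.pop(test.id(), None)
        if started is None:
            return  # e.g. a setUpClass error: no individual test was started
        self.timings.append((test.id(), outcome, time.perf_counter() - started))

    def addSuccess(self, test):
        super().addSuccess(test)
        self._record(test, "pass")

    def addFailure(self, test, err):
        super().addFailure(test, err)
        self._record(test, "fail")

    def addError(self, test, err):
        super().addError(test, err)
        self._record(test, "error")


def save_timings(conn, timings):
    """Append one row per test to a (hypothetical) test_runs table."""
    run_at = time.time()
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "CREATE TABLE IF NOT EXISTS test_runs "
            "(run_at REAL, test_id TEXT, outcome TEXT, seconds REAL)"
        )
        conn.executemany(
            "INSERT INTO test_runs VALUES (?, ?, ?, ?)",
            [(run_at, test_id, outcome, secs) for test_id, outcome, secs in timings],
        )


def run_and_record(conn, test_case_class, stream=None):
    """Run every test in one TestCase class and store the timings."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case_class)
    runner = unittest.TextTestRunner(stream=stream, resultclass=TimedResult)
    result = runner.run(suite)
    save_timings(conn, result.timings)
    return result


class ExampleTests(unittest.TestCase):
    """Stand-in test case so the sketch can be run as-is."""

    def test_sum(self):
        self.assertEqual(sum(range(1000)), 499500)


if __name__ == "__main__":
    run_and_record(sqlite3.connect("timings.db"), ExampleTests)
```

The measured time includes `setUp` and `tearDown`, and skipped tests are not recorded; system data such as `vmstat` output would need its own table.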
| From | Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> |
|---|---|
| Date | 2011-11-17 09:53 +0100 |
| Message-ID | <kkuep8-nqd.ln1@satorlaser.homedns.org> |
| In reply to | #15773 |
On 16.11.2011 15:36, Roy Smith wrote:

> It's really, really, really hard to either control for, or accurately
> measure, things like CPU or network load. There's so much stuff you
> can't even begin to see. The state of your main memory cache. Disk
> fragmentation. What I/O is happening directly out of kernel buffers vs
> having to do a physical disk read. How slow your DNS server is today.

Fortunately, I am in a position where I'm running tests on one system (generic desktop PC) while the system to test is another one, and there both hardware and software are under my control. Since this is rather smallish and embedded, the power and load of the desktop don't play a significant role, the other side is usually the bottleneck. ;)

> What I suggest is instrumenting your unit test suite to record not just
> the pass/fail status of every test, but also the test duration. Stick
> these into a database as the tests run. Over time, you will accumulate
> a whole lot of performance data, which you can then start to mine.

I'm not sure. I see unit tests as something that makes sure things run correctly. For performance testing, I have functions to set up and tear down the environment. Then, I found it useful to have separate code to prime a cache, which is something done before each test run, but which is not part of the test run itself. I'm repeating each test run N times, recording the times and calculating maximum, minimum, average and standard deviation. Some of this is similar to unit testing (code to set up/tear down), but other things are too different.

Also, sometimes I can vary tests with a factor F, then I would also want to capture the influence of this factor. I would even wonder if you can't verify the behaviour against an expected Big-O complexity somehow.

All of this is rather general, not specific to my use case, hence my question if there are existing frameworks to facilitate this task. Maybe it's time to create one...

> While you're running the tests, gather as much system performance data
> as you can (output of top, vmstat, etc) and stick that into your
> database too. You never know when you'll want to refer to the data, so
> just collect it all and save it forever.

Yes, this is surely something that is necessary, in particular since there are no clear success/failure outputs like for unit tests and they require a human to interpret them.

Cheers!

Uli
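The workflow Uli describes (set up, prime a cache before each run without timing it, repeat N times, then summarise) could be sketched roughly as follows. The function names `benchmark` and `benchmark_factors` and their parameters are invented for this illustration; they are not an existing framework.

```python
import statistics
import time


def benchmark(func, repeat=10, setup=None, prime=None, teardown=None):
    """Time func() `repeat` times and summarise the wall-clock durations.

    setup/teardown run once around the whole series; prime runs before
    every timed call (e.g. to warm a cache) and is deliberately not timed.
    """
    if repeat < 2:
        raise ValueError("need at least two runs for a standard deviation")
    if setup is not None:
        setup()
    durations = []
    try:
        for _ in range(repeat):
            if prime is not None:
                prime()
            start = time.perf_counter()
            func()
            durations.append(time.perf_counter() - start)
    finally:
        if teardown is not None:
            teardown()  # always clean up, even if func() raised
    return {
        "runs": repeat,
        "min": min(durations),
        "max": max(durations),
        "mean": statistics.mean(durations),
        "stdev": statistics.stdev(durations),
    }


def benchmark_factors(make_func, factors, **kwargs):
    """Repeat the benchmark for each value of a factor F.

    make_func(f) must return the zero-argument callable to time for factor f.
    """
    return {f: benchmark(make_func(f), **kwargs) for f in factors}


if __name__ == "__main__":
    results = benchmark_factors(lambda n: (lambda: sorted(range(n, 0, -1))),
                                [1_000, 10_000], repeat=5)
    for factor, stats in results.items():
        print(factor, stats)
```

Keeping the priming step outside the timed region is the one design choice taken directly from the post; everything else is a convenience.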
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2011-11-17 09:03 -0500 |
| Message-ID | <roy-56C820.09031517112011@news.panix.com> |
| In reply to | #15812 |
In article <kkuep8-nqd.ln1@satorlaser.homedns.org>,
Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> wrote:

> Yes, this is surely something that is necessary, in particular since
> there are no clear success/failure outputs like for unit tests and they
> require a human to interpret them.

As much as possible, you want to automate things so no human intervention is required.

For example, let's say you have a test which calls foo() and times how long it takes. You've already mentioned that you run it N times and compute some basic (min, max, avg, sd) stats. So far, so good.

The next step is to do some kind of regression against past results. Once you've got a bunch of historical data, it should be possible to look at today's numbers and detect any significant change in performance.

Much as I loathe the bureaucracy and religious fervor which has grown up around Six Sigma, it does have some good tools. You might want to look into control charts (http://en.wikipedia.org/wiki/Control_chart). You think you've got the test environment under control, do you? Try plotting a month's worth of run times for a particular test on a control chart and see what it shows.

Assuming your process really is under control, I would write scripts that did the following kinds of analysis:

1) For a given test, do a linear regression of run time vs date. If the line has any significant positive slope, you want to investigate why.

2) You already mentioned, "I would even wonder if you can't verify the behaviour against an expected Big-O complexity somehow". Of course you can. Run your test a bunch of times with different input sizes. I would try something like a 1-2-5 progression over several decades (i.e. input sizes of 10, 20, 50, 100, 200, 500, 1000, etc). You will have to figure out what an appropriate range is, and how to generate useful input sets. Now, curve fit your performance numbers to various shape curves and see what correlation coefficient you get.
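The second analysis above (1-2-5 input sizes, then comparing candidate curve shapes by correlation coefficient) might look something like this in plain Python. The helper names `one_two_five`, `pearson` and `rank_models`, and the particular set of candidate shapes, are assumptions made for this sketch.

```python
import math


def one_two_five(decades, start=10):
    """Input sizes in a 1-2-5 progression, e.g. 10, 20, 50, 100, 200, 500."""
    return [start * m * 10 ** d for d in range(decades) for m in (1, 2, 5)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Candidate growth shapes; a constant model is left out because its
# correlation is undefined (zero variance).
CANDIDATES = {
    "O(log n)": lambda n: math.log(n),
    "O(n)": lambda n: n,
    "O(n log n)": lambda n: n * math.log(n),
    "O(n^2)": lambda n: n * n,
}


def rank_models(sizes, times):
    """Return (model name, correlation) pairs, best-correlated first."""
    scores = {
        name: pearson([shape(n) for n in sizes], times)
        for name, shape in CANDIDATES.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sizes = one_two_five(3)
    # Synthetic, noise-free timings purely for demonstration.
    fake_times = [3e-6 * n * n for n in sizes]
    for name, r in rank_models(sizes, fake_times):
        print(f"{name:12s} r = {r:.4f}")
```

With real, noisy measurements, neighbouring shapes such as n and n log n can score almost identically, so the ranking is a hint to investigate rather than a verdict.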
All that being said, in my experience, nothing beats plotting your data and looking at it.
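Before plotting, the run-time-versus-date regression and the control chart idea from this post can be approximated in a few lines. This is a simplified sketch: the limits here are just the baseline mean plus or minus three sample standard deviations, which is cruder than a textbook control chart, and the function names are invented for illustration.

```python
import statistics


def slope_per_day(days, seconds):
    """Least-squares slope of run time (seconds) against date (day number)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, seconds))
    return sxy / sxx


def control_limits(baseline, sigmas=3.0):
    """Lower and upper limit derived from a baseline period of run times."""
    centre = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    return centre - sigmas * spread, centre + sigmas * spread


def out_of_control(baseline, recent, sigmas=3.0):
    """Return (index, value) for recent run times outside the limits."""
    low, high = control_limits(baseline, sigmas)
    return [(i, value) for i, value in enumerate(recent)
            if not low <= value <= high]


if __name__ == "__main__":
    days = list(range(10))
    times = [1.0 + 0.01 * d for d in days]  # synthetic: 10 ms slower per day
    print("slope (s/day):", slope_per_day(days, times))
    print("flagged:", out_of_control([1.00, 1.02, 0.98, 1.01, 0.99],
                                     [1.01, 1.20, 0.99]))
```

What counts as a "significant" slope is left to the reader; the sketch only computes the number.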
| From | "spartan.the" <spartan.the@gmail.com> |
|---|---|
| Date | 2011-11-17 13:28 -0800 |
| Message-ID | <fa20036c-aeda-4306-90cb-d30283f10fb9@k10g2000yqn.googlegroups.com> |
| In reply to | #15818 |
On Nov 17, 4:03 pm, Roy Smith <r...@panix.com> wrote:

> In article <kkuep8-nqd....@satorlaser.homedns.org>,
> Ulrich Eckhardt <ulrich.eckha...@dominolaser.com> wrote:
>
> > Yes, this is surely something that is necessary, in particular since
> > there are no clear success/failure outputs like for unit tests and they
> > require a human to interpret them.
>
> As much as possible, you want to automate things so no human
> intervention is required.
>
> For example, let's say you have a test which calls foo() and times how
> long it takes. You've already mentioned that you run it N times and
> compute some basic (min, max, avg, sd) stats. So far, so good.
>
> The next step is to do some kind of regression against past results.
> Once you've got a bunch of historical data, it should be possible to
> look at today's numbers and detect any significant change in performance.
>
> Much as I loathe the bureaucracy and religious fervor which has grown up
> around Six Sigma, it does have some good tools. You might want to look
> into control charts (http://en.wikipedia.org/wiki/Control_chart). You
> think you've got the test environment under control, do you? Try
> plotting a month's worth of run times for a particular test on a control
> chart and see what it shows.
>
> Assuming your process really is under control, I would write scripts
> that did the following kinds of analysis:
>
> 1) For a given test, do a linear regression of run time vs date. If the
> line has any significant positive slope, you want to investigate why.
>
> 2) You already mentioned, "I would even wonder if you can't verify the
> behaviour against an expected Big-O complexity somehow". Of course you
> can. Run your test a bunch of times with different input sizes. I
> would try something like a 1-2-5 progression over several decades (i.e.
> input sizes of 10, 20, 50, 100, 200, 500, 1000, etc) You will have to
> figure out what an appropriate range is, and how to generate useful
> input sets. Now, curve fit your performance numbers to various shape
> curves and see what correlation coefficient you get.
>
> All that being said, in my experience, nothing beats plotting your data
> and looking at it.

I strongly agree with Roy here.

Ulrich, I recommend you explore how Google measures App Engine's health here: http://code.google.com/status/appengine.

Unit tests are inappropriate here; any single unit test can answer PASS or FAIL, YES or NO. It can't answer the question "how much". Unless you just want to use unit tests. Then any arguments here just don't make sense.

I suggest:

1. Decide what you want to measure. Each measured result must be a number in a range (0..100, -5..5), so you can plot it.

2. Write no-UI programs to get each number (measure) and write it to CSV. Run each of them several times, discard the 1 worst and 1 best result, and take the average of the rest.

3. Collect the data for some period of time.

4. Plot those averages over a time axis (it's easy with the CSV format).

5. Make sure you automate this process (batch files or so) so the plot is generated automatically each hour or each day.

And then after a month you can decide if you want to divide your number ranges into green-yellow-red zones. More often than not you may find that your measures are so inaccurate and random that you can't trust them. Then you'll either forget about it or dive into math (statistics). You have about a 5% chance of succeeding ;)
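Steps 2 and 3 of the list above (run several times, drop the best and worst result, average the rest, append to a CSV file) could be sketched like this. The function names, the CSV column order (timestamp, measure name, value) and the file name are assumptions for illustration only.

```python
import csv
import time
from datetime import datetime, timezone


def trimmed_mean(samples):
    """Average after discarding the single lowest and highest sample."""
    if len(samples) < 3:
        raise ValueError("need at least three samples to drop best and worst")
    kept = sorted(samples)[1:-1]
    return sum(kept) / len(kept)


def measure(func, runs=5):
    """Time func() `runs` times and return the trimmed mean in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return trimmed_mean(samples)


def append_measure(csv_path, name, value):
    """Append one timestamped row so the file can be plotted over time."""
    with open(csv_path, "a", newline="") as handle:
        csv.writer(handle).writerow(
            [datetime.now(timezone.utc).isoformat(), name, f"{value:.6f}"]
        )


if __name__ == "__main__":
    seconds = measure(lambda: sorted(range(100_000, 0, -1)))
    append_measure("measures.csv", "sort_100k", seconds)
```

Running such a script from cron or a scheduled batch job would cover step 5; plotting is left to whatever tool reads the CSV.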
| From | Tycho Andersen <tycho@tycho.ws> |
|---|---|
| Date | 2011-11-17 14:45 -0600 |
| Message-ID | <mailman.2810.1321562763.27778.python-list@python.org> |
| In reply to | #15773 |
On Wed, Nov 16, 2011 at 09:36:40AM -0500, Roy Smith wrote:

> In article <95bcp8-bft.ln1@satorlaser.homedns.org>,
> Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> wrote:
>
> > Hi!
> >
> > I'm currently trying to establish a few tests here that evaluate certain
> > performance characteristics of our systems. As part of this, I found
> > that these tests are rather similar to unit-tests, only that they are
> > much more fuzzy and obviously dependent on the systems involved, CPU
> > load, network load, day of the week (Tuesday is virus scan day) etc.
> >
> > What I'd just like to ask is how you do such things. Are there tools
> > available that help? I was considering using the unit testing framework,
> > but the problem with that is that the results are too hard to interpret
> > programmatically and too easy to misinterpret manually. Any suggestions?
>
> It's really, really, really hard to either control for, or accurately
> measure, things like CPU or network load. There's so much stuff you
> can't even begin to see. The state of your main memory cache. Disk
> fragmentation. What I/O is happening directly out of kernel buffers vs
> having to do a physical disk read. How slow your DNS server is today.

While I agree there's a lot of things you can't control for, you can get a more accurate picture by using CPU time instead of wall time (e.g. the clock() system call). If what you care about is mostly CPU time, you can control for the "your disk is fragmented", "your DNS server died", or "my cow-orker was banging on the test machine" this way.

\t
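To see the difference between CPU time and wall-clock time in current Python, one option is to measure both around the same call. The sketch below uses `time.process_time()` and `time.perf_counter()` from the standard library (the older `time.clock()` was removed in Python 3.8); the helper name `timed` is made up for this example.

```python
import time


def timed(func):
    """Call func() and return (result, wall_seconds, cpu_seconds).

    wall_seconds is elapsed real time; cpu_seconds is CPU time consumed
    by this process, which excludes time spent sleeping or blocked.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


if __name__ == "__main__":
    # Waiting (here: sleeping) costs wall time but almost no CPU time.
    _, wall, cpu = timed(lambda: time.sleep(0.5))
    print(f"sleep:   wall={wall:.3f}s cpu={cpu:.3f}s")

    # A pure computation costs roughly the same in both.
    _, wall, cpu = timed(lambda: sum(i * i for i in range(2_000_000)))
    print(f"compute: wall={wall:.3f}s cpu={cpu:.3f}s")
```

Recording both numbers side by side keeps the choice of which one matters open until the data has been looked at.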
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2011-11-17 21:00 -0500 |
| Message-ID | <roy-5F44F7.21000017112011@news.panix.com> |
| In reply to | #15833 |
In article <mailman.2810.1321562763.27778.python-list@python.org>,
Tycho Andersen <tycho@tycho.ws> wrote:

> While I agree there's a lot of things you can't control for, you can
> get a more accurate picture by using CPU time instead of wall time
> (e.g. the clock() system call). If what you care about is mostly CPU
> time [...]

That's a big if. In some cases, CPU time is important, but more often, wall-clock time is more critical. Let's say I've got two versions of a program. Here are some results for my test run:

| Version | CPU Time | Wall-Clock Time |
|---|---|---|
| 1 | 2 hours | 2.5 hours |
| 2 | 1.5 hours | 5.0 hours |

Between versions, I reduced the CPU time to complete the given task, but increased the wall clock time. Perhaps I doubled the size of some hash table. Now I get a lot fewer hash collisions (so I spend less CPU time re-hashing), but my memory usage went up so I'm paging a lot and my locality of reference went down so my main memory cache hit rate is worse.

Which is better? I think most people would say version 1 is better.

CPU time is only important in a situation where the system is CPU bound. In many real-life cases, that's not at all true. Things can be memory bound. Or I/O bound (which, when you consider paging, is often the same thing as memory bound). Or lock-contention bound.

Before you start measuring things, it's usually a good idea to know what you want to measure, and why :-)