| Newsgroups | comp.benchmarks |
|---|---|
| From | Charles Polisher <cpolish@nonesuch.com> |
| Subject | Re: What would you like to see most in a storage benchmark test harness? |
| References | <slrnkejnm3.uio.cpolish@localhost.localdomain> <g3jle8p6q18h131s22ji9clm27npqdtk1t@4ax.com> |
| Message-ID | <slrnkeltm1.68i.cpolish@localhost.localdomain> |
| Organization | UseNetServer.com |
| Date | 2013-01-07 16:09 +0000 |
On 2013-01-07, Mark F <mark53916@gmail.com> wrote:
> Charles Polisher wrote:
>
>> Hello everybody out there,
>> starting to get ready to share. I'd like to get feedback on
>> features, implementation, and venue for source code.
> 1. Copy of 10MB or larger files from one place to another on the
> device under test. (Make sure the total size of files copied
> is large enough to not be effectively cached on the device under
> test and that you don't let the operating system do any "extra"
> buffering beyond what it has given to the device under test to do
> I/O on.)
Caching is notoriously hard to account for. The drives under
test have a cache, the controller has a cache, the device driver
may cache, the OS has buffer caches, the filesystem caches
inodes, and applications cache. Sometimes I want only the actual
drive characteristics; at other times I'm after performance with
one or more of the caches folded in. Is a range of settable test
profiles needed?
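As an illustration, here is a minimal sketch of one such profile
knob, assuming Linux: read through the page cache normally, or
bypass it with O_DIRECT, and flush the cache between runs. The
function names are hypothetical, not from any existing harness.

```python
import mmap
import os
import time

def read_throughput(path, size_mb=256, direct=False):
    """Sequential read throughput in MB/s; direct=True bypasses
    the OS page cache via O_DIRECT (Linux-only)."""
    flags = os.O_RDONLY | (os.O_DIRECT if direct else 0)
    fd = os.open(path, flags)
    # O_DIRECT requires an aligned buffer; mmap memory is page-aligned.
    buf = mmap.mmap(-1, 1 << 20)  # 1 MiB
    total, t0 = 0, time.perf_counter()
    while total < size_mb << 20:
        n = os.preadv(fd, [buf], total)
        if n == 0:
            break
        total += n
    os.close(fd)
    return total / (time.perf_counter() - t0) / 1e6

def drop_page_cache():
    """Flush the OS page cache between runs (Linux, needs root)."""
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")
```

A test profile would then be a set of such switches: which caches
to defeat (dropping between runs, O_DIRECT, disabling the drive's
write cache with hdparm -W0) and which to leave folded in.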
> 2. Copy trees with lots of files from one location
> to another on the devices under test. ("lots" should be 10K files
> or more for spinning devices, 100K for SSDs.) The file sizes
> should vary, but make sure the smallest files are large enough not
> to get stored in the metadata in the file system.
Agreed. Tobi Oetiker has done some nice work on filesystem
benchmarking (http://oss.oetiker.ch/optools/wiki/FsOpBench).
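Generating such a tree is simple enough to sketch; the sizes and
names below are my own assumptions, not a recommendation:

```python
import os
import random
import shutil
import time

def make_tree(root, nfiles=10_000, min_kb=4, max_kb=64, seed=1):
    """Create nfiles files of varying size, spread across subdirs,
    all large enough not to be inlined in filesystem metadata."""
    rng = random.Random(seed)
    for i in range(nfiles):
        d = os.path.join(root, f"d{i % 100:02d}")
        os.makedirs(d, exist_ok=True)
        with open(os.path.join(d, f"f{i}"), "wb") as f:
            f.write(rng.randbytes(rng.randint(min_kb, max_kb) << 10))

def time_tree_copy(src, dst):
    """Wall-clock a whole-tree copy, metadata churn included."""
    t0 = time.perf_counter()
    shutil.copytree(src, dst)
    return time.perf_counter() - t0
```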
> 3. Compare the times to copy using 1 stream to using 2, 3, and more
> streams. For this test do both copying data from a second device
> to the device under test and from location to location on the
> device under test.
Agreed. With spinning disks it's good to account for the
different transfer rates of inner and outer zones.
Sometimes I want to isolate the performance of a device under
test with no bottleneck on the source or destination. Usually I
use dd to/from /dev/null and take the reported throughput, which
seems to agree with expectations based on device specs.
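A rough equivalent of that dd probe, extended to sample several
offsets so the zone effect shows up (the device path is a
placeholder; run against a cold cache, or with O_DIRECT as above):

```python
import time

def zone_throughput(dev, offsets_gb, sample_mb=512):
    """Sequential-read MB/s at several offsets on a raw device;
    on spinning disks the outer (low-offset) zones should be faster."""
    results = {}
    with open(dev, "rb", buffering=0) as f:
        for off in offsets_gb:
            f.seek(off << 30)
            total, t0 = 0, time.perf_counter()
            while total < sample_mb << 20:
                chunk = f.read(1 << 20)
                if not chunk:
                    break
                total += len(chunk)
            results[off] = total / (time.perf_counter() - t0) / 1e6
    return results

# e.g. zone_throughput("/dev/sdX", [0, 200, 450])  # start/middle/end
```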
> All of the data should be random.
Some benchmarks (for example iozone) try to account for and
test for deduplication, which I understand some SSDs implement,
and which ZFS implements at the filesystem level. This implies
that in addition to a normal RNG there's a need for repeated
sequences of random numbers.
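That repeatable randomness is easy to get from a seeded RNG. A
sketch of a generator with a settable uniqueness ratio, so dedup
can be defeated or deliberately invited (the names are mine, not
from iozone):

```python
import random

def data_blocks(nblocks, block=4096, unique_ratio=1.0, seed=7):
    """Yield nblocks data blocks. unique_ratio=1.0 gives all-unique
    random data (defeats dedup and compression); lower ratios
    re-emit earlier blocks so a deduplicating SSD or ZFS can
    collapse them."""
    rng = random.Random(seed)
    pool = []
    for _ in range(nblocks):
        if pool and rng.random() > unique_ratio:
            yield rng.choice(pool)      # repeated block, dedup-able
        else:
            b = rng.randbytes(block)    # fresh random block
            pool.append(b)
            yield b
```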
> Use filesystem operations so that problems with filesystem
> metadata access are not hidden. (Large file performance shouldn't
> be affected much by filesystem overhead, so you still should be
> able to match manufacturer's performance. Small file performance
> will be affected, but we need to see those problems.)
Some people might want to test filesystem performance plus other
layers such as LVM; LVM shouldn't have a performance impact, but
show me the data!
> Many solid state devices take more than 100 times as long to copy
> from one location to another on the device, as compared to copying
> from a second device to the device under test, so it is important
> to copy within the device. Many test results reported on the
> web don't include copying from one location to another on the
> same device. However, while copying from one location to another
> on a device is the operation that solid state devices should be able
> to perform the best compared to spinning, it often is the operation
> that the solid state device performs worst compared to a spinning
> device.
I have one SSD I've started to characterize, but not much experience
yet.
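The measurement itself is straightforward; a sketch, with fsync
so the clock counts data actually reaching the device (paths are
placeholders):

```python
import os
import shutil
import time

def timed_copy(src, dst):
    """Wall-clock one large-file copy, synced to the device."""
    t0 = time.perf_counter()
    with open(src, "rb") as fi, open(dst, "wb") as fo:
        shutil.copyfileobj(fi, fo, 1 << 20)
        fo.flush()
        os.fsync(fo.fileno())
    return time.perf_counter() - t0

# Compare on-device vs. cross-device:
#   timed_copy("/mnt/ssd/a.bin", "/mnt/ssd/b.bin")
#   timed_copy("/mnt/hdd/a.bin", "/mnt/ssd/c.bin")
```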
>> I'm concentrating on experimental technique, data management,
>> simulation (prediction), and visualization. My goal is to share
>> code, discuss techniques and observed data. Eventually I want to
>> be satisfied with performance, reliability and cost tradeoffs of
>> my storage.
It seems enough actual benchmarking pieces are already written
(tiobench, iozone, bonnie, dt, ...) but there are missing pieces
outside those actual tests. For instance, setting up an
m-element RAIDn with chunksize o, stripe width p, and parity
scheme q, in normal and degraded modes. Such tests are tedious
to plan, set up, run, and manage, and it is tedious to predict,
compare, and interpret their results. That's where I see a need
for automation. My direction is to make a framework to plop the
pieces into, the pieces being benchmark software, test setups,
datasets, notes, visualization software, etc. I'd like to be
able to share data with others and to attempt meaningful
comparisons. I've heard tales of aberrations in individual
device performance; it would help to discover whether one is
dealing with a specific device's issue or one common to a make
and model.
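To make the sweep idea concrete, a sketch of how such a
framework might enumerate a test matrix (the parameters and
helper names are hypothetical):

```python
import itertools

sweep = {
    "members":  [4, 6, 8],            # m-element array
    "level":    ["raid5", "raid6"],   # parity scheme
    "chunk_kb": [64, 256, 512],
    "mode":     ["normal", "degraded"],
}

matrix = [dict(zip(sweep, vals))
          for vals in itertools.product(*sweep.values())]

for cfg in matrix:
    run_id = "-".join(str(v) for v in cfg.values())
    # set_up(cfg); run_benchmarks(run_id, cfg); tear_down(cfg)
    print(run_id)   # 36 configurations to set up, test, and record
```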
--
Charles Polisher cpolish@surewest.net