From: sfeam
Newsgroups: comp.graphics.apps.gnuplot
Subject: Re: Stacked boxes
Date: Thu, 14 Apr 2011 08:57:17 -0700
Organization: gnuplot development team

James Waldby wrote:

> On Wed, 13 Apr 2011 15:07:57 -0700, sfeam wrote:
>
>> Mike Rhodes wrote:
>>
>>> On 4/11/2011 8:48 PM, Mike Rhodes wrote:
>>>> On 4/11/11 8:41 PM, sfeam wrote:
>>>>> Mike Rhodes wrote:
>>>>>
>>>>> It seems like perhaps what you need to do is to treat time as a
>>>>> sequence of discrete intervals (1 second?), [...]
>>> To follow up on this, I have some example plots.
>>>
>>> The problem with aggregating the data into coarser intervals is that it
>>> creates inaccurate amplitudes. As an example, [...]
>>> http://s3.amazonaws.com/3tbVapQP/boxes.png
>>> http://s3.amazonaws.com/3tbVapQP/seconds.png
>>> http://s3.amazonaws.com/3tbVapQP/milli.png
>>>
>>> [...] the millisecond plot exactly maps to my input data,
>>> but the aggregated seconds plot is inaccurate.
>>>
>>> So to get the graph I want, I need to use millisecond precision. But
>>> the problem with remapping my boxes data into millisecond bins is the
>>> file size. In this example, the millisecond file is 250x larger than
>>> the boxes file -- and it's the exact same data, just represented
>>> differently. At that rate, a 4MB boxes file becomes a 1GB millisecond
>>> file! [...]
>> Now you have a better feel for why the problem is hard :-)
>>
>> How is the program to decide if two boxes overlap, if not by exactly the
>> same process that you are saying generates too much data? Can you think
>> of some other way to handle the data internally, other than treating
>> each "box" as a series of samples at some pre-defined interval? I'm
>> serious in asking - if there's a clever way to do it then great,
>> otherwise it strikes me as impractical for the reason you've already
>> identified.
>
> The usual method for processing such data is to use a heap [see eg
> ]. One could dump all of the item start times and all of the item end
> times into a big heap, and then plot the heap min until the heap is
> empty. A more complicated method (idea as sketched below, might
> need a fix or two) would use a small heap, of size proportional
> to maximum box stack depth:
>
> Suppose data lines are sorted by increasing time and each line
> contains {.t, .f, .d} fields for {time, flow, duration}, and
> heap entries contain {.t, .f} fields for {time, flow}.
>
> (1) If heap is empty and no more items, quit.
> (2) If heap is empty, put next item x on heap as follows:
>     Make an 'up' entry, {x.t, x.f} and a 'down' entry
>     at {x.t+x.d, -x.f} and advance next.
> (3) Get min item x from heap. If next item y has y.t <= x.t,
>     add y up and down to heap (ie, {y.t, y.f} and {y.t+y.d, -y.f})
>     and advance next.
> (4) Add x.f to current flow f and output (x.t, f) for plotting
>     (Adapt appropriately if making filled area boxes as in
>     suggestion below. Box width equals time difference between
>     previous and current heap min item, or between current and
>     next, depending on type of plot)
> (5) Go to 1.

Thanks. Some variant of that approach does seem likely to work.

Can anyone offer pointers to typical data sets or examples of this type
of presentation? What is such a plot style called?

Would there typically be any information to present beyond the three
quantities start/stop/flow? I imagine each entry could also carry at
least one "type" or "class" property that might be encoded by color.
Is that done?