

Groups > comp.lang.java.programmer > #11821 > unrolled thread

Re: Interplatform (interprocess, interlanguage) communication

Started by: jebblue <n@n.nnn>
First post: 2012-02-07 12:11 -0600
Last post: 2012-02-08 00:55 -0700
Articles: 10 on this page of 70 (7 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Interplatform (interprocess, interlanguage) communication jebblue <n@n.nnn> - 2012-02-07 12:11 -0600
    Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-07 16:38 -0700
      Re: Interplatform (interprocess, interlanguage) communication Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-02-07 20:26 -0400
        Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-08 01:41 -0700
          Re: Interplatform (interprocess, interlanguage) communication Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-02-08 07:19 -0400
            Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-08 12:07 -0700
              Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-08 21:16 -0500
                Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-08 19:50 -0700
                  Re: Interplatform (interprocess, interlanguage) communication Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-02-09 06:24 -0400
                    Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-09 09:15 -0700
                      Re: Interplatform (interprocess, interlanguage) communication Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-02-09 18:58 -0400
                        Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-09 16:15 -0700
                        Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-09 18:50 -0500
                          Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-09 21:40 -0700
                            Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-11 14:47 -0500
                              Re: Interplatform (interprocess, interlanguage) communication Lew <lewbloch@gmail.com> - 2012-02-11 12:06 -0800
                                Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-11 15:18 -0500
                                  Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-11 23:03 -0700
                                    Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-12 09:27 -0500
                                      Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-12 13:33 -0700
                                        Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-12 15:50 -0500
                                          Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-12 14:34 -0700
                      Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-09 18:48 -0500
                        Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-09 21:46 -0700
                          Re: Interplatform (interprocess, interlanguage) communication Lew <lewbloch@gmail.com> - 2012-02-10 08:51 -0800
                            Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-10 10:43 -0700
                              Re: Interplatform (interprocess, interlanguage) communication Lew <lewbloch@gmail.com> - 2012-02-10 13:15 -0800
                                Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-10 14:50 -0700
                                  Re: Interplatform (interprocess, interlanguage) communication Lew <lewbloch@gmail.com> - 2012-02-10 14:32 -0800
                                    Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-10 17:10 -0700
                                      Re: Interplatform (interprocess, interlanguage) communication Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-02-10 22:08 -0400
                                        Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-11 00:49 -0700
                                          Re: Interplatform (interprocess, interlanguage) communication Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-02-11 14:04 -0400
                                Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-11 14:55 -0500
                              Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-11 14:52 -0500
                                Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-11 20:06 -0700
                                  Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-11 22:41 -0500
                                    Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-12 00:46 -0700
                                      Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-12 09:29 -0500
                                      Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-12 09:31 -0500
                                    Re: Interplatform (interprocess, interlanguage) communication Martin Gregorie <martin@address-in-sig.invalid> - 2012-02-12 16:02 +0000
                                      Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-12 11:16 -0500
                                        Re: Interplatform (interprocess, interlanguage) communication Martin Gregorie <martin@address-in-sig.invalid> - 2012-02-12 22:46 +0000
                                      Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-12 11:33 -0700
                                  Re: Interplatform (interprocess, interlanguage) communication Lew <lewbloch@gmail.com> - 2012-02-11 20:18 -0800
                                    Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-12 01:36 -0700
                                      Re: Interplatform (interprocess, interlanguage) communication Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-02-12 13:52 -0600
                                        Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-12 14:43 -0700
                          Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-11 14:49 -0500
                    Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-09 18:46 -0500
                  Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-09 18:45 -0500
          Re: Interplatform (interprocess, interlanguage) communication Lew <lewbloch@gmail.com> - 2012-02-08 14:02 -0800
            Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-08 18:49 -0700
              Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-08 21:14 -0500
                Re: Interplatform (interprocess, interlanguage) communication Lew <lewbloch@gmail.com> - 2012-02-08 20:07 -0800
                  Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-08 23:29 -0700
                    Re: Interplatform (interprocess, interlanguage) communication Lew <lewbloch@gmail.com> - 2012-02-09 09:40 -0800
                      Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-09 17:02 -0700
                Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-08 21:10 -0700
                  Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-09 18:54 -0500
                    Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-10 10:25 -0700
                      Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-11 14:45 -0500
                        Re: Interplatform (interprocess, interlanguage) communication Lew <lewbloch@gmail.com> - 2012-02-11 12:14 -0800
                          Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-11 15:20 -0500
                            Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-11 22:20 -0700
                              Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-12 09:23 -0500
                                Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-12 12:13 -0700
      Re: Interplatform (interprocess, interlanguage) communication Arne Vajhøj <arne@vajhoej.dk> - 2012-02-07 20:24 -0500
      Re: Interplatform (interprocess, interlanguage) communication Martin Gregorie <martin@address-in-sig.invalid> - 2012-02-08 01:31 +0000
        Re: Interplatform (interprocess, interlanguage) communication BGB <cr88192@hotmail.com> - 2012-02-08 00:55 -0700



#11910

From: BGB <cr88192@hotmail.com>
Date: 2012-02-10 10:25 -0700
Message-ID: <jh3k0f$mbj$1@news.albasani.net>
In reply to: #11892

On 2/9/2012 4:54 PM, Arne Vajhøj wrote:
> On 2/8/2012 11:10 PM, BGB wrote:
>> On 2/8/2012 7:14 PM, Arne Vajhøj wrote:
>>> On 2/8/2012 8:49 PM, BGB wrote:
>>>> as noted, many people neither use schemas nor any sort of schema
>>>> validation. in many use-cases, schemas are overly constraining to the
>>>> ability of using XML to represent free-form data, or using them
>>>> otherwise would offer little particular advantage.
>>>
>>> xsd:any do provide some flexibility in schemas.
>>>
>>
>> yep, but one can wonder what is the gain of using a schema if one is
>> just going to use "xsd:any"?...
>
> You still have some structure.
>

probably.
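
Editorial sketch of the "you still have some structure" point: a schema can fix the parts that matter and leave a free-form tail open via an `xs:any` wildcard. The schema below (a `message` element with a required integer `id`) and the class name `AnyWildcardDemo` are invented for illustration and do not come from the thread; only the standard `javax.xml.validation` API is used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class AnyWildcardDemo {
    // Hypothetical schema: one required <id>, then any number of arbitrary
    // elements. processContents='lax' validates them only if declared.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='message'><xs:complexType><xs:sequence>"
        + "<xs:element name='id' type='xs:int'/>"
        + "<xs:any processContents='lax' minOccurs='0' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    /** Returns true if the document validates, false if validation fails. */
    static boolean isValid(String xml) {
        try {
            compile().newValidator()
                     .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a Validator without an ErrorHandler throws on errors
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Free-form tail is accepted because the wildcard covers it.
        System.out.println(isValid(
            "<message><id>7</id><note lang='en'><b>free-form</b></note></message>")); // true
        // The fixed part is still enforced: <id> is missing here.
        System.out.println(isValid("<message><note/></message>")); // false
    }
}
```

The wildcard only relaxes the tail; the required `id` and its type are still checked, which is the residual structure being referred to.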


>> it is also a mystery how well EXI behaves in this case (admittedly, I
>> have not personally looked into EXI in-depth, as I only briefly skimmed
>> over the spec a long time ago).
>
> No idea. But I would assume EXI supports what is valid XML and XSD.
>

yes, it is just that, IIRC, EXI uses the schema to know how to 
efficiently encode structures (values are directly coded), and falls 
back to a more naive strategy (describing the encoded tags) if the 
schema doesn't cover a given case.

admittedly, having only skimmed the spec, I am not entirely certain how 
EXI works (I would have to invest a bit more time in reading over the 
spec).

note: even in the worst case, the output will still likely be tiny vs 
textual XML.


more skimming... sudden mystery: if the format is a bitstream, why are 
they apparently using a byte-aligned scheme for storing integers?... 
(the cost here is that one has to then re-align with the next byte 
boundary, potentially wasting on average several bits).
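
Editorial sketch of the arithmetic behind the "several bits" estimate: if a bitstream has to skip to the next byte boundary, the padding depends on the current bit offset. Assuming (and this is only an assumption) that the offset within a byte is equally likely to be any of 0..7, the mean padding works out as below. The class and method names are hypothetical, and the code makes no claim about how EXI itself lays out integers.

```java
final class AlignmentCost {
    /** Bits skipped to reach the next byte boundary from a given bit offset. */
    static int paddingBits(long bitOffset) {
        return (int) ((8 - (bitOffset % 8)) % 8);
    }

    /** Mean padding, assuming the offset within a byte is uniform over 0..7. */
    static double averagePaddingBits() {
        int total = 0;
        for (int offset = 0; offset < 8; offset++) {
            total += paddingBits(offset);
        }
        return total / 8.0;
    }

    public static void main(String[] args) {
        System.out.println(paddingBits(13));      // 3 (bit 13 -> next boundary at 16)
        System.out.println(averagePaddingBits()); // 3.5
    }
}
```

Under that uniformity assumption each re-alignment costs 3.5 bits on average, with a worst case of 7.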


>>>> say, if one is using XML for compiler ASTs or similar (say, the XML is
>>>> used to represent a just-parsed glob of source-code), do they really
>>>> need any sort of schema?
>>>
>>> I would expect syntax trees to follow certain rules and not be free
>>> form.
>>>
>>
>> well, there are some rules, but the question is more if a schema or the
>> use of validation would offer much advantage to make using it worth the
>> bother?...
>
> Enforcing correctness of data is usually a good idea.
>

potentially, but checking against schemas isn't free.
depending on the application, it could be hard to justify spending the 
extra clock cycles (except maybe for debugging purposes or similar).

an issue with ASTs is that they come in several forms:
giant, like in the output of a C compiler, where many tasks tend towards 
"expensive" (it may take easily anywhere from 250ms-1500ms to shove all 
this stuff through the various compiler stages);
small, like in a script-language VM, where typically it is desirable 
that compile times still be fairly fast, since a major strength of 
scripting languages is trying to keep "eval" and similar fairly close to 
free.

granted, one could debate the sanity of using XML for ASTs in the first 
place, but this started originally as a historical accident in my case 
(I was writing an interpreter, and it was what I had on-hand, actually: 
I partly hacked an existing XML-RPC implementation into being a script 
interpreter...). however, it doesn't seem to actually hurt performance 
too badly (ironically, in my C compiler, much more time goes into the 
preprocessor and tokenizer, which are far more efficient and more highly 
optimized).

side note: the C compiler doesn't use a standard DOM, but rather a 
highly specialized, but still DOM-like, system (and may still dump ASTs 
as text-form XML for debugging reasons). it involves, among other 
things, optimizations for numerical data (attributes may store numeric 
data directly, vs needing to use a string) and large hash-tables and 
chaining for look-ups, as well as specialized operations to reduce typing.

my current scripting VM, however, internally uses lists/s-expressions 
(note: they are neither AST compatible, nor will C code work effectively 
on my scripting VM). this was due to a later rewrite "switching over" (I 
was also reusing a lot of parts from a prior Scheme interpreter of mine 
for this one).


but, anyways, I am more left thinking schema-checking would probably 
make sense more when either some sort of security is a concern, or maybe 
when sending data "over the wire" between multiple parties.

inserting a schema check between one's parser and one's bytecode emitter 
doesn't seem nearly as compelling.


I guess, if a person really wanted, they could write a schema for the 
ASTs, but it is not clear how useful it would be to do so (since, 
generally, apart from someone mucking around with the compiler 
internals, there is little direct reason to know or care what is going 
on in there...).


or such...



#11936

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-02-11 14:45 -0500
Message-ID: <4f36c569$0$294$14726298@news.sunsite.dk>
In reply to: #11910

On 2/10/2012 12:25 PM, BGB wrote:
> On 2/9/2012 4:54 PM, Arne Vajhøj wrote:
>> On 2/8/2012 11:10 PM, BGB wrote:
>>> On 2/8/2012 7:14 PM, Arne Vajhøj wrote:
>>>> On 2/8/2012 8:49 PM, BGB wrote:
>>>>> say, if one is using XML for compiler ASTs or similar (say, the XML is
>>>>> used to represent a just-parsed glob of source-code), do they really
>>>>> need any sort of schema?
>>>>
>>>> I would expect syntax trees to follow certain rules and not be free
>>>> form.
>>>>
>>>
>>> well, there are some rules, but the question is more if a schema or the
>>> use of validation would offer much advantage to make using it worth the
>>> bother?...
>>
>> Enforcing correctness of data is usually a good idea.
>>
>
> potentially, but checking against schemas isn't free.
> depending on the application, it could be hard to justify spending the
> extra clock cycles (except maybe for debugging purposes or similar).

One of the points is that you can validate during integration tests
and when you encounter a problem, but keep validation turned off otherwise.

And besides I would assume the big XML parser libraries to have
optimized the validation quite a bit.

Arne
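
Editorial sketch of the "validate in tests, off otherwise" idea using the standard JAXP API. The schema, the `OptionalValidation` class, and the `ast.validate` system property are all hypothetical names chosen for illustration; nothing here is taken from any poster's actual code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class OptionalValidation {
    // Hypothetical AST-like schema: <add> must contain exactly two <int> children.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='add'><xs:complexType><xs:sequence>"
        + "<xs:element name='int' type='xs:int' minOccurs='2' maxOccurs='2'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Parses xml; schema checking happens only when validate is true. */
    static boolean accepts(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            if (validate) {
                factory.setSchema(SchemaFactory
                        .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                        .newSchema(new StreamSource(new StringReader(XSD))));
            }
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Without a throwing handler, validation errors are only reported.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed, or invalid while validation is on
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // e.g. run the test suite with -Dast.validate=true (hypothetical flag)
        boolean validate = Boolean.getBoolean("ast.validate");
        String incomplete = "<add><int>1</int></add>"; // well-formed but invalid
        System.out.println(accepts(incomplete, validate));
    }
}
```

With the flag off the incomplete tree parses without complaint; with it on, the same input is rejected, so the cost of checking is paid only where it is wanted.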



#11944

From: Lew <lewbloch@gmail.com>
Date: 2012-02-11 12:14 -0800
Message-ID: <14291890.498.1328991256328.JavaMail.geo-discussion-forums@pbr7>
In reply to: #11936

On Saturday, February 11, 2012 11:45:42 AM UTC-8, Arne Vajhøj wrote:
> On 2/10/2012 12:25 PM, BGB wrote:
> > On 2/9/2012 4:54 PM, Arne Vajhøj wrote:
> >> On 2/8/2012 11:10 PM, BGB wrote:
> >>> On 2/8/2012 7:14 PM, Arne Vajhøj wrote:
> >>>> On 2/8/2012 8:49 PM, BGB wrote:
> >>>>> say, if one is using XML for compiler ASTs or similar (say, the XML is
> >>>>> used to represent a just-parsed glob of source-code), do they really
> >>>>> need any sort of schema?
> >>>>
> >>>> I would expect syntax trees to follow certain rules and not be free
> >>>> form.
> >>>>
> >>>
> >>> well, there are some rules, but the question is more if a schema or the
> >>> use of validation would offer much advantage to make using it worth the
> >>> bother?...
> >>
> >> Enforcing correctness of data is usually a good idea.
> >>
> >
> > potentially, but checking against schemas isn't free.

Oh, yeah, micro-optimize that last $0.0000001 of performance.

Great thinking.

Checking against schemas isn't so expensive, either. You spout this drivel, 
BGB, about "isn't free", but where are your numbers? Show us reality, dude - 
exactly how "not free" is schema validation, under what loads, on what 
platforms? Hm?

I thought not.

> > depending on the application, it could be hard to justify spending the
> > extra clock cycles (except maybe for debugging purposes or similar).
>

How many "extra clock cycles", and does it cost less than the damage your 
development techniques cause?

> One of the points is that you can validate during integration test
> and if you encounter a problem but keep validation turned off otherwise.
> 
> And besides I would assume the big XML parser libraries to have
> optimized the validation quite a bit.

Given that BGB is just spewing dream talk with zero or less than zero facts, 
evidence or measurement behind it, it's pretty safe to dismiss his 
"conclusions".

or such ...

-- 
Lew



#11946

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-02-11 15:20 -0500
Message-ID: <4f36cd93$0$289$14726298@news.sunsite.dk>
In reply to: #11944

On 2/11/2012 3:14 PM, Lew wrote:
> On Saturday, February 11, 2012 11:45:42 AM UTC-8, Arne Vajhøj wrote:
>> On 2/10/2012 12:25 PM, BGB wrote:
>>> On 2/9/2012 4:54 PM, Arne Vajhøj wrote:
>>>> On 2/8/2012 11:10 PM, BGB wrote:
>>>>> On 2/8/2012 7:14 PM, Arne Vajhøj wrote:
>>>>>> On 2/8/2012 8:49 PM, BGB wrote:
>>>>>>> say, if one is using XML for compiler ASTs or similar (say, the XML is
>>>>>>> used to represent a just-parsed glob of source-code), do they really
>>>>>>> need any sort of schema?
>>>>>>
>>>>>> I would expect syntax trees to follow certain rules and not be free
>>>>>> form.
>>>>>>
>>>>>
>>>>> well, there are some rules, but the question is more if a schema or the
>>>>> use of validation would offer much advantage to make using it worth the
>>>>> bother?...
>>>>
>>>> Enforcing correctness of data is usually a good idea.
>>>>
>>>
>>> potentially, but checking against schemas isn't free.
>
> Oh, yeah, micro-optimize that last $0.0000001 of performance.
>
> Great thinking.
>
> Checking against schemas isn't so expensive, either. You spout this drivel,
> BGB, about "isn't free", but where are your numbers? Show us reality, dude -
> exactly how "not free" is schema validation, under what loads, on what
> platforms? Hm?
>
> I thought not.
>
>>> depending on the application, it could be hard to justify spending the
>>> extra clock cycles (except maybe for debugging purposes or similar).
>>
>
> How many "extra clock cycles", and does it cost less than the damage your
> development techniques cause?
>
>> One of the points is that you can validate during integration test
>> and if you encounter a problem but keep validation turned off otherwise.
>>
>> And besides I would assume the big XML parser libraries to have
>> optimized the validation quite a bit.
>
> Given that BGB is just spewing dream talk with zero or less than zero facts,
> evidence or measurement behind it, it's pretty safe to dismiss his
> "conclusions".
>
> or such ...

In science you dismiss hypotheses by proving them wrong,
not by noting the lack of proof.

Arne



#11961

From: BGB <cr88192@hotmail.com>
Date: 2012-02-11 22:20 -0700
Message-ID: <jh7i91$tl2$1@news.albasani.net>
In reply to: #11946

On 2/11/2012 1:20 PM, Arne Vajhøj wrote:
> On 2/11/2012 3:14 PM, Lew wrote:
>> On Saturday, February 11, 2012 11:45:42 AM UTC-8, Arne Vajhøj wrote:
>>> On 2/10/2012 12:25 PM, BGB wrote:
>>>> On 2/9/2012 4:54 PM, Arne Vajhøj wrote:
>>>>> On 2/8/2012 11:10 PM, BGB wrote:
>>>>>> On 2/8/2012 7:14 PM, Arne Vajhøj wrote:
>>>>>>> On 2/8/2012 8:49 PM, BGB wrote:
>>>>>>>> say, if one is using XML for compiler ASTs or similar (say, the
>>>>>>>> XML is
>>>>>>>> used to represent a just-parsed glob of source-code), do they
>>>>>>>> really
>>>>>>>> need any sort of schema?
>>>>>>>
>>>>>>> I would expect syntax trees to follow certain rules and not be free
>>>>>>> form.
>>>>>>>
>>>>>>
>>>>>> well, there are some rules, but the question is more if a schema
>>>>>> or the
>>>>>> use of validation would offer much advantage to make using it
>>>>>> worth the
>>>>>> bother?...
>>>>>
>>>>> Enforcing correctness of data is usually a good idea.
>>>>>
>>>>
>>>> potentially, but checking against schemas isn't free.
>>
>> Oh, yeah, micro-optimize that last $0.0000001 of performance.
>>
>> Great thinking.
>>
>> Checking against schemas isn't so expensive, either. You spout this
>> drivel,
>> BGB, about "isn't free", but where are your numbers? Show us reality,
>> dude -
>> exactly how "not free" is schema validation, under what loads, on what
>> platforms? Hm?
>>
>> I thought not.
>>
>>>> depending on the application, it could be hard to justify spending the
>>>> extra clock cycles (except maybe for debugging purposes or similar).
>>>
>>
>> How many "extra clock cycles", and does it cost less than the damage your
>> development techniques cause?
>>
>>> One of the points is that you can validate during integration test
>>> and if you encounter a problem but keep validation turned off otherwise.
>>>
>>> And besides I would assume the big XML parser libraries to have
>>> optimized the validation quite a bit.
>>
>> Given that BGB is just spewing dream talk with zero or less than zero
>> facts,
>> evidence or measurement behind it, it's pretty safe to dismiss his
>> "conclusions".
>>
>> or such ...
>
> In science you dismiss hypothesis's based on proving them wrong
> not by noting the lack of proof.
>

yeah...


and anyways, I am not about "making conclusions" or "decreeing how 
things should be done" or anything, rather, my view is there may be a 
time and place for everything (and whatever is or is not the case can be 
decided on a case-by-case basis or similar, based on whatever may apply 
in the particular case in question, and whichever options may be cheaper 
or more expensive, and similar).

IMHO, the idea that a person "should" always do things the same way in 
every situation is itself arguably questionable. the same goes for the 
belief that something is universally required or universally 
prohibited, ...


[ decided to leave out most of the rest of what I wrote. ]

basically, it all amounted to the frustration that there is little point 
in trying to "prove" something which ultimately amounts to little more 
than "hair splitting over a few percentage points...".

the thing is... textual XML is kind of bulky, but doing damn near 
anything to it (like running it through deflate) will significantly 
reduce its size (say, to around 10-25% of its original size). one can 
outperform this with specialized formats, but at this point it is 
worrying about a few percentage points +/-.
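
Editorial sketch of how one might measure the deflate claim rather than argue about it, using `java.util.zip.Deflater` from the standard library. The generated sample document and the class name `XmlDeflateRatio` are assumptions for illustration; synthetic, highly repetitive XML like this will tend to compress better than real documents, so the printed ratio should not be read as confirming or refuting the 10-25% figure.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class XmlDeflateRatio {
    /** Returns the number of bytes produced by deflating the input. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // only the count is needed here
        }
        deflater.end();
        return total;
    }

    /** Builds a made-up XML document with the given number of records. */
    static String sampleXml(int records) {
        StringBuilder sb = new StringBuilder("<records>\n");
        for (int i = 0; i < records; i++) {
            sb.append("  <record id=\"").append(i).append("\"><name>item")
              .append(i).append("</name><value>").append(i * 37 % 1000)
              .append("</value></record>\n");
        }
        return sb.append("</records>\n").toString();
    }

    public static void main(String[] args) {
        byte[] raw = sampleXml(1000).getBytes(StandardCharsets.UTF_8);
        int packed = deflatedSize(raw);
        System.out.printf("raw=%d bytes, deflated=%d bytes (%.1f%% of original)%n",
                raw.length, packed, 100.0 * packed / raw.length);
    }
}
```

Swapping `sampleXml` for the bytes of a real document is the obvious next step if actual numbers are wanted.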

what is the point of "proving" something which is ultimately of a fairly 
limited significance and scope?...

maybe one can try to "prove" that people "should" actually give a crap.

or, for that matter, one could try to find a particular claim to disprove 
(say, that X is always true or always false). this is rarely possible 
with data compression, as it is typically more about averages, and 
there are cases for which the data may actually get bigger (about the 
only real "absolute" in data compression is the entropy bound, commonly 
known as the "Shannon limit").

secondarily is the "law of diminishing returns" (itself a natural result 
of the Shannon limit), where essentially the compressibility of a piece 
of data will form a sort of curve, and any (lossless) algorithms will 
fall somewhere along this curve, and typically with a fairly consistent 
ordering (say, for example, LZMA tends to compress better than BZip2 
which tends to compress better than Deflate/GZip).

one can look at how each algorithm works internally, or experiment with 
how they can use the basic parts to build other things or achieve 
interesting results (and note mostly that the parts themselves tend to 
fall along these sorts of curves, reducing "compression" mostly to a 
matter of "going mix and match" with various parts and making 
cost/benefit tradeoffs between particular combinations of parts).

note that going further along the curve tends to become increasingly 
costly, hence why tradeoffs need to be made.


but, ultimately, how much something is relevant will itself tend to 
depend somewhat on context.



#11974

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-02-12 09:23 -0500
Message-ID: <4f37cb55$0$281$14726298@news.sunsite.dk>
In reply to: #11961

On 2/12/2012 12:20 AM, BGB wrote:
> On 2/11/2012 1:20 PM, Arne Vajhøj wrote:
>> On 2/11/2012 3:14 PM, Lew wrote:
>>> On Saturday, February 11, 2012 11:45:42 AM UTC-8, Arne Vajhøj wrote:
>>>> One of the points is that you can validate during integration test
>>>> and if you encounter a problem but keep validation turned off
>>>> otherwise.
>>>>
>>>> And besides I would assume the big XML parser libraries to have
>>>> optimized the validation quite a bit.
>>>
>>> Given that BGB is just spewing dream talk with zero or less than zero
>>> facts,
>>> evidence or measurement behind it, it's pretty safe to dismiss his
>>> "conclusions".
>>>
>>> or such ...
>>
>> In science you dismiss hypothesis's based on proving them wrong
>> not by noting the lack of proof.
>>
>
> yeah...
>
> and anyways, I am not about "making conclusions" or "decreeing how
> things should be done" or anything, rather, my view is there may be a
> time and place for everything (and whatever is or is not the case can be
> decided on a case-by-case basis or similar, based on whatever may apply
> in the particular case in question, and whichever options may be cheaper
> or more expensive, and similar).
>
> IMHO, the idea that a person "should" always do things the same way in
> every situation is itself arguably questionable. likewise goes for a
> beliefs that something is universally required or universally
> prohibited, ...
...
 > but, ultimately, how much something is relevant will itself tend to
 > depend somewhat on context.

The fact that there are exceptions to most rules should not lead to
a perception that rules do not matter.

You should strive to go by the rules and only very reluctantly go
for the exception if it is really needed.

Arne



#11997

From: BGB <cr88192@hotmail.com>
Date: 2012-02-12 12:13 -0700
Message-ID: <jh931v$q33$1@news.albasani.net>
In reply to: #11974

On 2/12/2012 7:23 AM, Arne Vajhøj wrote:
> On 2/12/2012 12:20 AM, BGB wrote:
>> On 2/11/2012 1:20 PM, Arne Vajhøj wrote:
>>> On 2/11/2012 3:14 PM, Lew wrote:
>>>> On Saturday, February 11, 2012 11:45:42 AM UTC-8, Arne Vajhøj wrote:
>>>>> One of the points is that you can validate during integration test
>>>>> and if you encounter a problem but keep validation turned off
>>>>> otherwise.
>>>>>
>>>>> And besides I would assume the big XML parser libraries to have
>>>>> optimized the validation quite a bit.
>>>>
>>>> Given that BGB is just spewing dream talk with zero or less than zero
>>>> facts,
>>>> evidence or measurement behind it, it's pretty safe to dismiss his
>>>> "conclusions".
>>>>
>>>> or such ...
>>>
>>> In science you dismiss hypothesis's based on proving them wrong
>>> not by noting the lack of proof.
>>>
>>
>> yeah...
>>
>> and anyways, I am not about "making conclusions" or "decreeing how
>> things should be done" or anything, rather, my view is there may be a
>> time and place for everything (and whatever is or is not the case can be
>> decided on a case-by-case basis or similar, based on whatever may apply
>> in the particular case in question, and whichever options may be cheaper
>> or more expensive, and similar).
>>
>> IMHO, the idea that a person "should" always do things the same way in
>> every situation is itself arguably questionable. likewise goes for a
>> beliefs that something is universally required or universally
>> prohibited, ...
> ...
>> but, ultimately, how much something is relevant will itself tend to
>> depend somewhat on context.
>
> The fact that there is exceptions to most rules should not lead to
> a perception that rules does not matter.
>
> You should strive to go by the rules and only very reluctant go
> for the exception if it is really needed.
>

possible.

others may go for an "all is allowed in programming, so long as it works 
ok and gets the job done" mindset. whether or not rules are followed may 
in turn depend on an evaluation of whether or not the rules work in 
one's favor.

so, on one hand: well, I can follow this rule, and get certain desirable 
effects.

or, it may also work out as: this rule is stupid and inconvenient, I am 
not going to bother following it.

or maybe: the existing rule is stupid/inconvenient/..., so I am going to 
make up my own rules and follow them instead.


this does not necessarily mean making a standard of non-standard, as 
some piece of standardized technology (formally, or de-facto, it really 
doesn't matter) may itself carry desirable benefits.

as well noted, PNGs and JPEGs are an example of this:
they allow compatibility with existing applications which use these 
formats, etc, ...

so, although one could devise their own graphics format (I have done so 
before), using it may turn out to be so incredibly inconvenient for 
everyone involved that using it is ultimately not worth the bother.


likewise, in the everyday world, breaking laws may lead in turn to the 
police breaking down one's door, and breaking moral and ethical rules 
may lead to various other consequences (do bad things and bad things may 
follow in turn).

so, all this doesn't give a person license to do "whatever they want, 
whenever they want", because the rules of cost/benefit will prevent this 
(too many costs in these cases, defeating the benefits).

likewise, making a standard of non-standard, though not inherently bad, 
would likely end up being overly costly (in terms of use or maintenance 
or whatever else).


but, I am not going to try to list all of the costs and benefits one 
might encounter or how one may weigh them, as there are too many and 
how much each may apply in a given situation is itself prone to vary.



#11838

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-02-07 20:24 -0500
Message-ID: <4f31ced9$0$282$14726298@news.sunsite.dk>
In reply to: #11835
On 2/7/2012 6:38 PM, BGB wrote:
> On 2/7/2012 11:11 AM, jebblue wrote:
>> On Fri, 03 Feb 2012 19:52:08 +0000, Stefan Ram wrote:
>>> »X« below is another language than Java, for example,
>>> VBA, C#, or C.
>>>
>>> When an X process and a Java process have to exchange information on
>>> the same computer, what possibilities are there? The Java process
>>> should act as a client, sending commands to the X process and also
>>> wants to read answers from the X process. So, the X process is a kind
>>> of server.
>>>
>>> My criteria are: reliability and it should not be extremely slow (say
>>> exchanging a string should not take more than about 10 ms). The main
>>> criterion is reliability.
>>>
>>
>>> Sockets
>>>
>>> This is slightly less transparent than files, but has the advantage
>>> that it becomes very easy to have the two processes running on
>>> different computers later, if this should ever be required. Debugging
>>> should be possible by a man-in-the-middle proxy that prints all
>>> information it sees or by connecting to the server with a terminal.
>>>
>>
>> I recommend using sockets.
>
> in general, I agree (sockets generally make the most sense),

> another issue (besides how to pass messages), is what sort of form to
> pass messages in.
>
> usually, in my case, if storing data in files, I tend to prefer
> ASCII-based formats.
>
> usually, for passing messages over sockets, I have used "compact"
> specialized binary formats, typically serialized data from some other
> form (such as XML nodes or S-Expressions). although "magic byte value"
> based message formats are initially simpler, they tend to be harder to
> expand later (whereas encoding/decoding some more generic form, though
> initially more effort, can turn out to be easier to maintain and extend
> later).

If you want compact and text go for JSON.

Arne



#11839

From: Martin Gregorie <martin@address-in-sig.invalid>
Date: 2012-02-08 01:31 +0000
Message-ID: <jgsj9c$sl6$2@localhost.localdomain>
In reply to: #11835
On Tue, 07 Feb 2012 16:38:31 -0700, BGB wrote:

> in general, I agree (sockets generally make the most sense), although
> there are cases where file-based communications can make sense, although
> probably not in the form as described in the OP.
>
Yes, for small amounts of data or message passing between processes I 
tend to like sockets - as others have said, the fact that they are 
agnostic about the location of the communicating processes is often very 
useful.
  
> usually, for passing messages over sockets, I have used "compact"
> specialized binary formats,
>
Yep. ASN.1 has to be about the most compact way of encoding structured, 
multi-field messages with XML occupying the other end of the scale.

That said, for short list-of-fields messages I often use a CSV string 
preceded by an unsigned binary byte value containing the string length: 
this type of message is easy to transfer, even if the connection 
wants to fragment it during transmission, and, by having a printable text 
payload, it's also convenient for troubleshooting.
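
A minimal Java sketch of that framing, under some assumptions not stated in the post: the payload is ASCII, fields contain no commas (no escaping is attempted), and the class and method names are made up for illustration. The single length byte limits a message to 255 bytes.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class LengthPrefixedCsv {
    // Writes one unsigned length byte followed by the comma-joined fields.
    public static void writeMessage(OutputStream out, String... fields) throws IOException {
        byte[] payload = String.join(",", fields).getBytes(StandardCharsets.US_ASCII);
        if (payload.length > 255) {
            throw new IllegalArgumentException("payload exceeds 255 bytes");
        }
        out.write(payload.length);
        out.write(payload);
        out.flush();
    }

    // Reads one message; readFully keeps reading until the whole payload
    // has arrived, so a fragmented transmission is handled here.
    public static String[] readMessage(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        int length = din.readUnsignedByte();
        byte[] payload = new byte[length];
        din.readFully(payload);
        // limit -1 keeps trailing empty fields
        return new String(payload, StandardCharsets.US_ASCII).split(",", -1);
    }
}
```

The same two methods work unchanged on the streams returned by a `java.net.Socket`.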


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |



#11844

From: BGB <cr88192@hotmail.com>
Date: 2012-02-08 00:55 -0700
Message-ID: <jgt9s7$f7i$1@news.albasani.net>
In reply to: #11839
On 2/7/2012 6:31 PM, Martin Gregorie wrote:
> On Tue, 07 Feb 2012 16:38:31 -0700, BGB wrote:
>
>> in general, I agree (sockets generally make the most sense), although
>> there are cases where file-based communications can make sense, although
>> probably not in the form as described in the OP.
>>
> Yes, for small amounts of data or message passing between processes I
> tend to like sockets - as others have said, the fact that they are
> agnostic about the location of the communicating processes is often very
> useful.
>

yep.


>> usually, for passing messages over sockets, I have used "compact"
>> specialized binary formats,
>>
> Yep. ASN.1 has to be about the most compact way of encoding structured,
> multi-field messages with XML occupying the other end of the scale.
>

I disagree partly WRT ASN.1:
a disadvantage of ASN.1 is that a lot of times it tends to use 
fixed-width integer encodings (and often sends structures in a 
"reasonably raw" form), whereas one can shave more bytes using a 
variable-length-integer scheme (why encode an integer in 4 bytes if you 
only need 1 byte in a given case?). it is also possible to shave more 
bytes if one makes the format use an adaptive/context-sensitive encoding 
scheme and maybe a variant of Huffman coding or similar (and possibly 
encode integer values using a similar scheme to that used in Deflate). 
it is in fact not particularly difficult to outperform ASN.1 in these 
regards.


granted, yes, custom Huffman-based data encodings are probably not "the 
norm" for network protocols (though some programs, such as the Quake 3 
engine, have used Huffman-compressed network protocols).

there is also "arithmetic coding" and "range coding", but with these it 
is a lot harder to make the codec be acceptably fast (whereas there are 
some tricks to allow optimizing Huffman codecs).


in cases where I have used XML, I have typically used a custom binary 
XML variant, which can greatly reduce the overhead vs textual XML. in 
terms of saving bytes, my encoding can be more compact than WBXML or 
XML+Deflate, but is arguably more "esoteric", and as-is doesn't make use 
of schemas (it is instead a basic adaptive coding, and is vaguely 
similar to an LZ-Markov coding, attempting to exploit repeating patterns 
in tag-structure and similar via prediction, but like most adaptive 
codings initially transmits the data in a less dense form as it needs to 
build up a new context for each message). the coding in question doesn't 
use Huffman coding (for sake of simplicity, and because I don't always 
particularly need "maximum compactness"), but a Huffman-based variant 
could be created if needed.

there is also EXI, but I don't know how my encoding compares (EXI 
probably does better though, given that IIRC it uses binary universal 
codes and schemas).


for something else of mine I am using S-Expression based messages 
(currently between components within the same process), and had 
considered using a vaguely similar binary coding if/when I get around to it.


> That said, for short list-of-fields messages I often use a CSV string
> preceded by an unsigned binary byte value containing the string length:
> this type of message is easy to transfer, even if the connection
> wants to fragment it during transmission, and, by having a printable text
> payload, it's also convenient for troubleshooting.
>

yes, this is possible.

also possible would be a TLV encoding (say, possibly doing something 
similar to the Matroska MKV file-format).


say, the integer values are encoded something like (range, encoding):
0-127		0xxxxxxx
128-16383	10xxxxxx xxxxxxxx
16384-2097151	110xxxxx xxxxxxxx xxxxxxxx
2097152-...	...

likewise, one can get a signed variant by folding the sign into the LSB, 
forming a pattern like: 0, -1, 1, -2, 2, ...
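
a rough Java sketch of the above, covering only the three forms listed (larger values are rejected, since the table leaves the longer forms open). the class and method names are made up for illustration, and putting the most significant bits first is an assumption read off the bit layout shown.

```java
public class Vli {
    // Encodes 0..2097151 using the 1-, 2- and 3-byte forms from the table.
    public static byte[] encode(int v) {
        if (v < 0 || v > 2097151) {
            throw new IllegalArgumentException("out of range for this sketch: " + v);
        }
        if (v < 128) {
            return new byte[] { (byte) v };                               // 0xxxxxxx
        }
        if (v < 16384) {
            return new byte[] { (byte) (0x80 | (v >> 8)), (byte) v };     // 10xxxxxx xxxxxxxx
        }
        return new byte[] { (byte) (0xC0 | (v >> 16)), (byte) (v >> 8), (byte) v }; // 110xxxxx ...
    }

    // Decodes a value starting at b[0]; the leading bits select the form.
    public static int decode(byte[] b) {
        int first = b[0] & 0xFF;
        if ((first & 0x80) == 0) return first;
        if ((first & 0x40) == 0) return ((first & 0x3F) << 8) | (b[1] & 0xFF);
        if ((first & 0x20) == 0) {
            return ((first & 0x1F) << 16) | ((b[1] & 0xFF) << 8) | (b[2] & 0xFF);
        }
        throw new IllegalArgumentException("longer forms are not covered by this sketch");
    }

    // Folds the sign into the LSB: 0, -1, 1, -2, 2 map to 0, 1, 2, 3, 4.
    public static int fold(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Inverse of fold.
    public static int unfold(int u) {
        return (u >>> 1) ^ -(u & 1);
    }
}
```

a signed value would be written as `Vli.encode(Vli.fold(n))`, as long as the folded value still fits the three-byte form.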

then, one defines tags as:
{
VLI tag;
VLI length;
byte data[length];
}

where tags can hold either data or messages (and, the smallest tag size 
needs 2 bytes, or 3 bytes if one has 1 byte of payload for the tag).
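
as a sketch of how such tags might be written in Java (names and tag numbers are hypothetical, and the VLI writer only handles the three forms listed above): a tag's payload is either raw bytes or the concatenation of already-encoded sub-tags, which is how a tag can hold messages.

```java
import java.io.ByteArrayOutputStream;

public class Tlv {
    // Writes a VLI in the 1-, 2- or 3-byte form (values up to 2097151 only).
    static void writeVli(ByteArrayOutputStream out, int v) {
        if (v < 0 || v > 2097151) {
            throw new IllegalArgumentException("out of range for this sketch: " + v);
        }
        if (v >= 16384) {
            out.write(0xC0 | (v >> 16));
            out.write(v >> 8);          // write(int) keeps only the low 8 bits
            out.write(v);
        } else if (v >= 128) {
            out.write(0x80 | (v >> 8));
            out.write(v);
        } else {
            out.write(v);
        }
    }

    // Builds { VLI tag; VLI length; byte data[length]; } where the data is
    // the concatenation of the given parts (raw bytes or encoded sub-tags).
    public static byte[] tag(int tag, byte[]... parts) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (byte[] p : parts) {
            body.write(p, 0, p.length);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVli(out, tag);
        writeVli(out, body.size());
        out.write(body.toByteArray(), 0, body.size());
        return out.toByteArray();
    }
}
```

for example, `Tlv.tag(2, Tlv.tag(1, new byte[] { 42 }))` nests a 3-byte tag inside a 5-byte one.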


if the length is optional (presence depends on tag), one can reduce the 
typical tag size to 1 byte. likewise, tags can be combined with an 
MTF/MRU scheme such that any recently used tags have a small value (and 
can thus be encoded in a single byte). (many of my formats define tags 
inline, rather than relying on some large hard-coded tag-list).

more bytes can be saved if more of the message structure is known, say 
that not only does the tag encode a particular tag-type, but also may 
carry information about what follows after it (various combinations of 
attributes, and if it contains sub-tags and what they might be, ...).

if a new tag is defined, it is added to the MRU, but if not used 
frequently may move "backwards" (towards higher index numbers) or 
eventually be forgotten (falls off the end of the list).
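
a minimal sketch of such a tag list in Java, using plain move-to-front with a fixed capacity (both are assumptions; the class name is made up, and a real format would also need the control tags for defining new entries). encoder and decoder have to apply the same updates so their lists stay in step.

```java
import java.util.ArrayList;
import java.util.List;

public class TagMru {
    private final List<String> recent = new ArrayList<>();
    private final int capacity;

    public TagMru(int capacity) {
        this.capacity = capacity;
    }

    // Returns the tag's current index (small for recently used tags), or -1
    // if the tag is unknown and would have to be defined inline. Either way
    // the tag ends up at index 0; other tags drift towards higher indices,
    // and the last one falls off when a new tag arrives at full capacity.
    public int use(String name) {
        int index = recent.indexOf(name);
        if (index >= 0) {
            recent.remove(index);
        } else if (recent.size() == capacity) {
            recent.remove(recent.size() - 1);
        }
        recent.add(0, name);
        return index;
    }
}
```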

note that some hard-coded tag-numbers will be needed for basic control 
purposes (encoding new/unfamiliar tags, ...).


a Huffman-based variant could be similar, just one may encode integers 
differently. an example scheme is to use a prefix value (Huffman coded) 
and a suffix bit pattern (similar to Deflate). a simpler (but less 
compact) scheme was used in JPEG, and IIRC I had before "compromised" 
between them by having the Huffman table be stored using Rice codes.


example (prefix range, value range, suffix bits):
0-15	0-15		0
16-23	16-31		1
24-31	32-63		2
32-39	64-127		3
40-47	128-255		4
48-55	256-511		5
56-63	512-1023	6
64-71	1024-2047	7
72-79	2048-4095	8
80-87	4096-8191	9
...
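
a Java sketch of the prefix/suffix split, following the rows of the table through 128-255 and assuming the same pattern simply continues (every further group of 8 prefixes covers the next power-of-two range with one more suffix bit). only the mapping is shown; the Huffman coding of the prefix is left out, and the names are made up.

```java
public class PrefixSuffix {
    // Number of raw suffix bits that follow the prefix for value v (v >= 0).
    public static int suffixBits(int v) {
        if (v < 16) return 0;
        int topBit = 31 - Integer.numberOfLeadingZeros(v); // 4 for 16-31, 5 for 32-63, ...
        return topBit - 3;
    }

    // Prefix symbol for v: values below 16 map to themselves; above that,
    // each power-of-two range gets 8 prefixes selected by the 3 bits that
    // follow the leading 1 bit.
    public static int prefix(int v) {
        if (v < 16) return v;
        int bits = suffixBits(v);
        return 16 + (bits - 1) * 8 + ((v >> bits) & 7);
    }

    // The low bits of v that are sent verbatim after the prefix.
    public static int suffix(int v) {
        return v & ((1 << suffixBits(v)) - 1);
    }

    // Rebuilds the value from a prefix symbol and its suffix bits.
    public static int value(int prefix, int suffix) {
        if (prefix < 16) return prefix;
        int bits = (prefix - 16) / 8 + 1;
        int top = 8 + (prefix - 16) % 8; // implicit leading 1 plus 3 bits
        return (top << bits) | suffix;
    }
}
```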

also note that a nifty thing (also used in Deflate) is to compress the 
Huffman table itself using Huffman coding.


likewise, one can save a few bytes if the encoder is smart enough to 
recognize when tags encode numeric data (mostly specific to XML, with 
S-Expressions or similar one knows when they are dealing with numeric data).

likewise, one can encode floats as a pair of integer values (although 
floats present a few of their own complexities). one can also devise 
special encodings for things like numeric vectors, quaternions, ... if 
needed as well.


likewise, either an LZ77 or LZ-Markov scheme can be used for encoding 
strings (an example would be to use a fixed-size rotating window like 
in Deflate, and essentially using the same basic encoding for strings, 
albeit likely with the use of an "End-Of-String" marker).

say (range, meaning):
0-255: literal byte values
258: End Of String
259-321: LZ77 Run (encodes length, followed by window offset).

String encoding would be used, say, for encoding both literal text, and 
also for escaping things like tag and attribute names.

...


the main variability is mostly in terms of the type of payload being 
transmitted:
be it XML-based, S-Expression based, or potentially object-based 
(similar to either JSON, or a sort of "heap pickling" style system).


for most structured data, it shouldn't be needed to change the 
"fundamentals" too much. the main difference is between tree-structured 
and heap-like / graph-structured data, as graph-structured data is often 
better sent as a flat list of objects with a certain entry being a "root 
node" than as a tree (this can be accomplished either by building a 
list, or using an algorithm to detect and break-up cycles when needed).
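
a small Java sketch of the "flat list of objects" approach (the Node type and the record format are made up for illustration): each object gets an index in the order it is first reached, the root is index 0, and references are sent as indices, so cycles need no special handling.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class GraphFlattener {
    public static class Node {
        final String label;
        final List<Node> refs = new ArrayList<>();

        public Node(String label) {
            this.label = label;
        }

        public void addRef(Node target) {
            refs.add(target);
        }
    }

    // Returns one record per reachable object, root first, with each
    // reference replaced by the index of the object it points to.
    public static List<String> flatten(Node root) {
        Map<Node, Integer> index = new IdentityHashMap<>();
        List<Node> order = new ArrayList<>();
        index.put(root, 0);
        order.add(root);
        // Breadth-first walk; the list grows while it is being scanned.
        for (int i = 0; i < order.size(); i++) {
            for (Node ref : order.get(i).refs) {
                if (!index.containsKey(ref)) {
                    index.put(ref, order.size());
                    order.add(ref);
                }
            }
        }
        List<String> records = new ArrayList<>();
        for (Node n : order) {
            List<Integer> targets = new ArrayList<>();
            for (Node ref : n.refs) {
                targets.add(index.get(ref));
            }
            records.add(n.label + " -> " + targets);
        }
        return records;
    }
}
```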


granted, for most use-cases something like this is likely to be overkill.


or such...
