Re: Looking for XML linearization information

From BGB <cr88192@hotmail.com>
Newsgroups comp.programming, comp.text.xml, comp.sources.d
Subject Re: Looking for XML linearization information
Date 2011-02-03 13:36 -0700
Organization albasani.net
Message-ID <iif3lg$1gv$1@news.albasani.net> (permalink)
References <0ba9f95a-83d8-4a9e-9d95-c87046a71f20@y26g2000yqd.googlegroups.com>

Cross-posted to 3 groups.


On 2/3/2011 12:29 PM, Generic Usenet Account wrote:
> Hello,
>
> Are there any tools/W3C standards/design patterns etc. for linearizing
> XML content?  Basically I want to send information, which is natively
> in XML, to a resource-constrained device that does not have XML
> awareness.  In other words, the resource-constrained device does not
> do any DOM or SAX processing of XML.
>

It depends on what exactly you want...


If you want a library:

One option is to use (or write) an XML library, but depending on the 
available memory, this may be too memory-hungry (for example, a large 
XML document held as DOM nodes can eat up a big chunk of memory even on 
a desktop PC).

If you tailor the implementation to your needs, you can write a DOM-like 
implementation that needs a lot less memory than standard DOM: omitting 
namespaces and doubly-linked structures, and using ASCII or UTF-8 rather 
than UTF-16, saves a fair bit.
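As a rough illustration of where those savings come from, here is a minimal sketch (class and method names are hypothetical) of a node that keeps only a first-child and a next-sibling link (no parent or previous-sibling pointers), has no namespace fields, and stores names and text as UTF-8 bytes. On a small device this would be a C struct; in Python, `__slots__` just keeps each node free of a per-instance dict.

```python
from typing import Iterator, List, Optional, Tuple


class Node:
    """Hypothetical compact DOM-like node: singly linked, no namespaces.

    Tag names, attribute values and text are kept as UTF-8 bytes
    rather than UTF-16 strings.
    """

    __slots__ = ("tag", "attrs", "text", "first_child", "next_sibling")

    def __init__(self, tag: bytes,
                 attrs: Optional[List[Tuple[bytes, bytes]]] = None,
                 text: bytes = b"") -> None:
        self.tag = tag
        self.attrs = attrs or []        # flat (name, value) pairs, no namespace URIs
        self.text = text
        self.first_child: Optional["Node"] = None
        self.next_sibling: Optional["Node"] = None

    def append(self, child: "Node") -> "Node":
        # No last-child pointer is stored, so appending walks the sibling
        # chain: O(n) appends in exchange for one less pointer per node.
        if self.first_child is None:
            self.first_child = child
        else:
            cur = self.first_child
            while cur.next_sibling is not None:
                cur = cur.next_sibling
            cur.next_sibling = child
        return child

    def children(self) -> Iterator["Node"]:
        # Iterate children by following next_sibling links.
        cur = self.first_child
        while cur is not None:
            yield cur
            cur = cur.next_sibling


# Usage: <msg><to>device-1</to><body>héllo</body></msg>
root = Node(b"msg")
root.append(Node(b"to", text=b"device-1"))
root.append(Node(b"body", text="héllo".encode("utf-8")))
```

Dropping the parent and previous-sibling links means the tree can only be walked downward and forward, which is usually enough for building and serializing a message.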


SAX could be better, as it allows a small implementation that does not 
need to hold the document in memory.
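To show why an event-based parser can stay small, here is a minimal sketch of a SAX-style tokenizer that scans the input once and reports start/end/text events to callbacks without building a tree. It assumes a deliberately restricted XML subset (no comments, CDATA, processing instructions, DOCTYPE or entity decoding; attribute values in double quotes; no `>` inside values). The function and callback names are hypothetical and are not the `xml.sax` API.

```python
import re
from typing import Callable, Dict

# name="value" pairs inside a start tag (double quotes only: an assumption).
ATTR_RE = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*"([^"]*)"')


def parse_events(data: str,
                 on_start: Callable[[str, Dict[str, str]], None],
                 on_end: Callable[[str], None],
                 on_text: Callable[[str], None]) -> None:
    """Scan `data` once, firing callbacks; no tree is kept in memory."""
    i, n = 0, len(data)
    while i < n:
        lt = data.find("<", i)
        if lt < 0:
            lt = n
        if lt > i:
            text = data[i:lt]
            if text.strip():            # skip whitespace-only runs between tags
                on_text(text)
        if lt >= n:
            break
        gt = data.find(">", lt)
        if gt < 0:
            raise ValueError("unterminated tag at offset %d" % lt)
        body = data[lt + 1:gt]
        if body.startswith("/"):         # </name>
            on_end(body[1:].strip())
        else:
            self_closing = body.endswith("/")
            if self_closing:
                body = body[:-1]
            parts = body.split(None, 1)
            if not parts:
                raise ValueError("empty tag at offset %d" % lt)
            name = parts[0]
            attrs = dict(ATTR_RE.findall(parts[1])) if len(parts) > 1 else {}
            on_start(name, attrs)
            if self_closing:             # <name/> = start immediately followed by end
                on_end(name)
        i = gt + 1
```

The only state carried between events is the current offset, so memory use does not grow with the document (apart from the input buffer itself, which a streaming version would replace with a small read window).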


If you want a binary interchange format, WBXML could work.

http://en.wikipedia.org/wiki/WBXML
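To give a feel for WBXML on the wire, here is a minimal sketch of an encoder for a tiny subset: the header (version 1.3, unknown public ID, UTF-8 charset, empty string table), tag tokens from one code page, inline strings (`STR_I`) and `END`. Header integers use WBXML's mb_u_int32 form (7 bits per byte, high bits first, top bit set on all but the last byte). The `TAGS` code page is hypothetical, since real applications define their own, and attributes are not handled.

```python
from typing import Dict, List, Tuple, Union

# WBXML global tokens used here.
END = 0x01
STR_I = 0x03
CONTENT_FLAG = 0x40   # tag has content, closed later by END

# Hypothetical application code page: tag name -> token (0x05..0x3F).
TAGS: Dict[str, int] = {"msg": 0x05, "to": 0x06, "body": 0x07}

# An element is (tag, children); each child is text or another element.
Element = Tuple[str, List[Union[str, "Element"]]]


def mb_u_int32(value: int) -> bytes:
    """WBXML multi-byte integer: 7 bits per byte, high bits first,
    MSB set on every byte except the last."""
    if not 0 <= value < 1 << 32:
        raise ValueError("out of range for mb_u_int32")
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append(0x80 | (value & 0x7F))
        value >>= 7
    return bytes(reversed(out))


def encode_element(elem: Element, out: bytearray) -> None:
    tag, children = elem
    token = TAGS[tag]
    if children:
        out.append(token | CONTENT_FLAG)
        for child in children:
            if isinstance(child, str):
                out.append(STR_I)
                out += child.encode("utf-8") + b"\x00"   # inline, NUL-terminated
            else:
                encode_element(child, out)
        out.append(END)
    else:
        out.append(token)        # empty element: no content flag, no END


def encode_document(root: Element) -> bytes:
    out = bytearray()
    out.append(0x03)             # version 1.3
    out += mb_u_int32(0x01)      # public ID: unknown
    out += mb_u_int32(106)       # charset: UTF-8 (IANA MIBenum 106)
    out += mb_u_int32(0)         # string table length: no string table
    encode_element(root, out)
    return bytes(out)
```

For example, `encode_document(("msg", [("to", ["dev1"]), ("body", [])]))` yields 15 bytes, of which 4 are header and 5 are the inline string `dev1` plus its terminator.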

There is also EXI, but EXI looks likely to require a more complex 
implementation (though its bit-packed, grammar-driven encoding, with 
optional DEFLATE compression, could save some bytes).

http://en.wikipedia.org/wiki/Efficient_XML_Interchange
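Much of EXI's compactness comes from writing each event code as an n-bit unsigned integer, where n is just large enough to distinguish the m alternatives the current grammar allows (n = ⌈log2 m⌉, and zero bits when only one choice is possible), instead of spending a whole byte per token as WBXML does. The sketch below shows only that bit-packing idea with a simple MSB-first bit writer; it is not an EXI encoder (grammars, string tables, headers and multi-part event codes are omitted, and the bit order here is just for illustration).

```python
def bits_needed(m: int) -> int:
    """Width of a code choosing among m alternatives: ceil(log2 m)."""
    if m < 1:
        raise ValueError("need at least one alternative")
    return (m - 1).bit_length()    # 0 bits when m == 1


class BitWriter:
    """Packs unsigned integers of arbitrary bit width, most significant bit first."""

    def __init__(self) -> None:
        self._bytes = bytearray()
        self._acc = 0      # bits not yet flushed to _bytes
        self._nbits = 0    # number of valid bits in _acc

    def write(self, value: int, width: int) -> None:
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in %d bits" % width)
        self._acc = (self._acc << width) | value
        self._nbits += width
        while self._nbits >= 8:          # flush whole bytes
            self._nbits -= 8
            self._bytes.append((self._acc >> self._nbits) & 0xFF)
        self._acc &= (1 << self._nbits) - 1   # keep only unflushed bits

    def getvalue(self) -> bytes:
        """Return packed bytes, zero-padding the final partial byte."""
        out = bytearray(self._bytes)
        if self._nbits:
            out.append((self._acc << (8 - self._nbits)) & 0xFF)
        return bytes(out)


# Four hypothetical events as (chosen code, number of alternatives):
w = BitWriter()
for code, m in [(0, 1), (2, 3), (1, 2), (5, 6)]:
    w.write(code, bits_needed(m))
packed = w.getvalue()    # 0 + 2 + 1 + 3 = 6 bits, so a single byte
```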


Also maybe relevant:
http://msdn.microsoft.com/en-us/library/cc219210%28PROT.10%29.aspx


For my uses, I rolled my own format (which I call SBXE). It is 
structurally vaguely similar to WBXML, but in my tests it is generally 
more compact (for generic/schema-free operation, which is my main use 
case), and it is simpler and faster to decode than textual XML. Its main 
difference from WBXML is that tags/strings are defined inline and go 
into MRU lists; once a tag or string is in a list, it is referenced by 
its MRU index (a variant of "move to front" was used).
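The inline-definition plus MRU-index idea can be sketched roughly as follows. This is a hypothetical illustration using plain move-to-front, not SBXE's actual wire format or its MTF variant: each string is emitted either as a definition (index 0 followed by the literal) or as a reference (its 1-based position in the MRU list), and the string then moves to the front, so frequently repeated tags get small indices. The bounded list size is an assumption.

```python
from typing import List, Optional, Tuple

Token = Tuple[int, Optional[str]]   # (0, literal) defines; (k, None) references


class MRUTable:
    """Bounded move-to-front list of recently seen strings (hypothetical)."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        self.items: List[str] = []

    def touch(self, s: str) -> None:
        # Move (or insert) s to the front; drop least recently used entries.
        if s in self.items:
            self.items.remove(s)
        self.items.insert(0, s)
        del self.items[self.capacity:]


def encode_strings(strings: List[str], capacity: int = 64) -> List[Token]:
    table = MRUTable(capacity)
    out: List[Token] = []
    for s in strings:
        if s in table.items:
            out.append((table.items.index(s) + 1, None))   # reference by MRU index
        else:
            out.append((0, s))                             # define inline
        table.touch(s)
    return out


def decode_strings(tokens: List[Token], capacity: int = 64) -> List[str]:
    # The decoder rebuilds the same MRU list, so indices resolve identically.
    table = MRUTable(capacity)
    out: List[str] = []
    for index, literal in tokens:
        s = literal if index == 0 else table.items[index - 1]
        out.append(s)
        table.touch(s)
    return out
```

For the tag sequence `msg, to, body, to, to, msg`, the encoder defines the first three inline and then emits the references 2, 1, 3, so a tag used in a tight loop keeps referencing index 1.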

It also compresses well with deflate.

Some info (if the server stays up...):
http://cr88192.dyndns.org/2010-10-27_SBXE11.txt

It was first defined/implemented around 2005, but I forgot about it for 
several years because I did not have much use for it at the time.

I designed a newer variant that could potentially be more compact, but 
the improvement would likely be modest and not worth the hassle of 
re-implementing it.

Looking it over, there are a few holes in the spec. For one, the UVLI 
(unsigned variable-length integer) scheme works like this:
0..127: 0xxxxxxx
128..16383: 10xxxxxx xxxxxxxx
16384..2097151: 110xxxxx xxxxxxxx xxxxxxxx
...

Note: the high bits/bytes come first (big-endian).
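A minimal encoder/decoder for the three UVLI forms listed above, as I read them: the leading 1 bits of the first byte give the number of extra bytes, the remaining bits hold the value, high bits first. The longer forms are elided with "...", so this sketch rejects values that do not fit in three bytes rather than guess at them. The function names are hypothetical.

```python
from typing import Tuple


def encode_uvli(value: int) -> bytes:
    """Encode an unsigned value using the 1-, 2- or 3-byte UVLI forms."""
    if value < 0:
        raise ValueError("UVLI is unsigned")
    if value < 0x80:                  # 0xxxxxxx
        return bytes([value])
    if value < 0x4000:                # 10xxxxxx xxxxxxxx
        return bytes([0x80 | (value >> 8), value & 0xFF])
    if value < 0x200000:              # 110xxxxx xxxxxxxx xxxxxxxx
        return bytes([0xC0 | (value >> 16), (value >> 8) & 0xFF, value & 0xFF])
    raise ValueError("longer UVLI forms are not specified here")


def decode_uvli(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one UVLI at data[pos]; return (value, position after it)."""
    b = data[pos]
    if b < 0x80:
        return b, pos + 1
    if b < 0xC0:
        return ((b & 0x3F) << 8) | data[pos + 1], pos + 2
    if b < 0xE0:
        return ((b & 0x1F) << 16) | (data[pos + 1] << 8) | data[pos + 2], pos + 3
    raise ValueError("longer UVLI forms are not specified here")
```

Because the length is fully determined by the first byte, a decoder can skip over a value without examining the rest of it, unlike schemes with a continuation bit in every byte.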

Signed values (VLI) are sign-folded into the LSB, so the unsigned 
values 0, 1, 2, 3, 4, 5, 6, ... represent:
0, -1, 1, -2, 2, -3, 3, ...
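This folding keeps small magnitudes of either sign small, so they still fit the short UVLI forms; it is the same "zigzag" mapping Protocol Buffers uses for its signed varint types. A minimal sketch (function names hypothetical):

```python
def fold_sign(v: int) -> int:
    """Signed -> unsigned: 0, -1, 1, -2, 2, ... become 0, 1, 2, 3, 4, ..."""
    return 2 * v if v >= 0 else -2 * v - 1


def unfold_sign(u: int) -> int:
    """Unsigned -> signed, inverse of fold_sign: the LSB carries the sign."""
    return (u >> 1) if (u & 1) == 0 else -((u + 1) >> 1)
```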
