Re: Piggypack Encoding/Decoding on RandomAccessFile

From Eric Sosman <esosman@ieee-dot-org.invalid>
Newsgroups comp.lang.java.programmer
Subject Re: Piggypack Encoding/Decoding on RandomAccessFile
Date 2011-11-03 20:40 -0400
Organization A noiseless patient Spider
Message-ID <j8vcaa$tnj$1@dont-email.me> (permalink)
References <j8ulue$or4$1@news.albasani.net> <j8uq9l$17l$1@dont-email.me> <j8urao$53e$1@news.albasani.net>


On 11/3/2011 3:50 PM, Jan Burse wrote:
> Joshua Cranmer wrote:
>> The "standard way" (at least, all of the use cases I've ever had for
>> RandomAccessFile) effectively uses the methods that are associated with
>> java.io.DataInput to read data: read(byte[]), and read*().
>
> I would like to use an arbitrary encoding/decoding on top of the
> byte stream to get a character stream. But since RandomAccessFile
> does not implement InputStream/OutputStream, I cannot create
> an InputStreamReader/OutputStreamWriter on top.

     For a completely "arbitrary" encoding, I think you're out of luck.
Stateful encodings (where the encoding of byte B[n] is a function of
B[n-1],B[n-2],...) make it difficult to begin in medias res: You cannot
know how to decode the first byte you read without already having seen
all its predecessors.
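
To make the point concrete, here is a deliberately toy stateful scheme (hypothetical, not any real charset or cipher): each encoded byte is the plain byte XORed with the previous plain byte. Decoding byte n needs plain byte n-1, which needs n-2, and so on back to the start; jumping in mid-stream with a guessed state goes wrong from that point on.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical toy stateful encoding, for illustration only:
 * enc[i] = plain[i] ^ plain[i-1], with plain[-1] taken as 0.
 */
public class ChainedXor {

    static byte[] encode(byte[] plain) {
        byte[] enc = new byte[plain.length];
        byte prev = 0;                              // initial state
        for (int i = 0; i < plain.length; i++) {
            enc[i] = (byte) (plain[i] ^ prev);
            prev = plain[i];                        // state = previous plain byte
        }
        return enc;
    }

    /** Decodes enc[from..], assuming the plain byte just before 'from' was 'state'. */
    static byte[] decodeFrom(byte[] enc, int from, byte state) {
        byte[] out = new byte[enc.length - from];
        byte prev = state;
        for (int i = from; i < enc.length; i++) {
            out[i - from] = (byte) (enc[i] ^ prev);
            prev = out[i - from];                   // decoded byte becomes the new state
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] enc = encode("hello, world".getBytes(StandardCharsets.US_ASCII));
        // From the start, with the true initial state: round-trips.
        System.out.println(new String(decodeFrom(enc, 0, (byte) 0), StandardCharsets.US_ASCII));   // hello, world
        // Jump to byte 7 with a guessed state of 0: every later byte is off.
        System.out.println(new String(decodeFrom(enc, 7, (byte) 0), StandardCharsets.US_ASCII));   // WORLD
        // Jump to byte 7 with the true state, known only after decoding bytes 0..6.
        System.out.println(new String(decodeFrom(enc, 7, (byte) ' '), StandardCharsets.US_ASCII)); // world
    }
}
```

The bad guess leaves every later byte XORed with the missing state (a space, 0x20), which here happens to flip the letters to upper case; with other data the output is simply garbage.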

     To support random access, where you'd like to jump directly to B[n]
without plowing through all that goes before, one usually addresses the
problem by restricting the valid starting points to multiples of some
"block size," and encoding each "block" independently. You seek to the
nearest multiple of 32K (or whatever) at or below n, set your
decryptor/compressor/decoder to its initial state, and roll merrily along.
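
A minimal sketch of the block-restart idea, using a hypothetical toy chaining scheme (each output byte is the plain byte XORed with the previous plain byte) with the state reset to 0 at every block boundary. Because this toy encoding maps K bytes to exactly K bytes, original offset n is also file offset n, so reading byte n means seeking to the block start at or below n and decoding forward from the initial state. The class and method names, and the 8-byte block in the demo, are illustrative assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Hypothetical toy encoding: chained XOR, restarted at every block boundary. */
public class BlockedXor {

    /** Encodes 'plain', resetting the chaining state every 'block' bytes. */
    static byte[] encode(byte[] plain, int block) {
        byte[] enc = new byte[plain.length];
        byte prev = 0;
        for (int i = 0; i < plain.length; i++) {
            if (i % block == 0) prev = 0;        // fresh state at each block start
            enc[i] = (byte) (plain[i] ^ prev);
            prev = plain[i];
        }
        return enc;
    }

    /** Reads original byte n: seek to its block, decode from the initial state. */
    static byte readByte(RandomAccessFile raf, long n, int block) throws IOException {
        long start = (n / block) * block;        // block boundary at or below n
        int skip = (int) (n - start);
        raf.seek(start);
        byte prev = 0;                           // decoder back in its initial state
        byte p = 0;
        for (int i = 0; i <= skip; i++) {
            int b = raf.read();
            if (b < 0) throw new IOException("offset " + n + " is past end of file");
            p = (byte) (b ^ prev);
            prev = p;
        }
        return p;
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "the quick brown fox".getBytes(StandardCharsets.US_ASCII);
        File f = File.createTempFile("blocked", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.write(encode(plain, 8));
            System.out.println((char) readByte(raf, 16, 8)); // f
        }
    }
}
```

Each lookup decodes at most one block's worth of bytes, which is the price paid for having restart points.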

     There's a problem if the encoding does not always map K input bytes
to f(K) output bytes: compressors, for example, output different amounts
of data depending on the values of the bytes compressed.  There are two
principal methods for dealing with this difficulty:

     1) Encode the original in blocks of 32K (say), and store each
encoded block in a file region that's sure to be large enough: 40K,
perhaps.  Pad with nulls or other junk values as needed, so long as
your decompressor can recognize and ignore the padding.  Then original
byte N is in block number N/32K (integer division), whose encoding
starts at (N/32K)*40K in the file; seek to that spot and start decoding.
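
Method 1 might look like this in Java, assuming java.util.zip's Deflater as the compressor and the 32K/40K sizes from the text. Each block gets a fresh Deflater, and its output goes into a zero-filled 40K slot; on the way back, InflaterInputStream stops at the end of the deflate stream, so the padding is ignored without any special marker. The class and method names are hypothetical, and readAllBytes() needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.util.Random;
import java.util.zip.Deflater;
import java.util.zip.InflaterInputStream;

/** Sketch of fixed-size slots: each block deflated independently, zero-padded. */
public class SlottedDeflate {
    static final int BLOCK = 32 * 1024;   // original bytes per block
    static final int SLOT = 40 * 1024;    // file bytes reserved per encoded block

    static void write(RandomAccessFile raf, byte[] data) throws IOException {
        for (int off = 0, b = 0; off < data.length; off += BLOCK, b++) {
            Deflater d = new Deflater();                 // fresh compressor state per block
            d.setInput(data, off, Math.min(BLOCK, data.length - off));
            d.finish();
            byte[] slot = new byte[SLOT];                // starts zeroed: the zeros are the padding
            int used = 0;
            while (!d.finished()) {
                if (used == SLOT) throw new IOException("block " + b + " overflows its slot");
                used += d.deflate(slot, used, SLOT - used);
            }
            d.end();
            raf.seek((long) b * SLOT);
            raf.write(slot);
        }
    }

    static byte readByte(RandomAccessFile raf, long n) throws IOException {
        raf.seek((n / BLOCK) * SLOT);                    // (N/32K)*40K, integer division
        byte[] slot = new byte[SLOT];
        raf.readFully(slot);
        byte[] block;
        // InflaterInputStream reports EOF at the end of the deflate stream,
        // so the zero padding after it is never decoded.
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(slot))) {
            block = in.readAllBytes();
        }
        int i = (int) (n % BLOCK);
        if (i >= block.length) throw new EOFException("offset " + n + " is past the data");
        return block[i];
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        new Random(1).nextBytes(data);                   // incompressible: worst case for the slot
        File f = File.createTempFile("slotted", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            write(raf, data);
            System.out.println(readByte(raf, 70_000) == data[70_000]); // true
        }
    }
}
```

The slot has to cover the compressor's worst case. Deflate grows incompressible input only slightly, so 40K is comfortably enough for a 32K block, which is why the demo feeds it random bytes.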

     2) As before, encode the original in fixed-size blocks, but write
them cheek by jowl to the file.  As you do so, also write an index file
that's essentially Map<OriginalByteNumber,EncodedByteNumber> for each
block boundary.  Then original byte N is in the block beginning at
theMap.get((N/32K)*32K); seek to that spot and start decoding.
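
Method 2 as a sketch, again assuming java.util.zip's Deflater and 32K blocks. The blocks are written back to back, and a long[] indexed by block number stands in for the Map (block b's key would be original byte b*32K); here it is kept in memory rather than written to a separate index file. An extra final entry records where the last block ends, so each block's encoded length is known before reading it. Names are hypothetical; readAllBytes() needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.util.zip.Deflater;
import java.util.zip.InflaterInputStream;

/** Sketch of contiguous compressed blocks plus an in-memory block index. */
public class IndexedDeflate {
    static final int BLOCK = 32 * 1024;   // original bytes per block (illustrative)

    /** Returns index[b] = file offset of block b; the extra last entry marks the end. */
    static long[] write(RandomAccessFile raf, byte[] data) throws IOException {
        int blocks = (data.length + BLOCK - 1) / BLOCK;
        long[] index = new long[blocks + 1];
        byte[] buf = new byte[8192];
        raf.seek(0);
        for (int b = 0; b < blocks; b++) {
            index[b] = raf.getFilePointer();
            int off = b * BLOCK;
            Deflater d = new Deflater();                 // fresh compressor state per block
            d.setInput(data, off, Math.min(BLOCK, data.length - off));
            d.finish();
            while (!d.finished()) {
                raf.write(buf, 0, d.deflate(buf));       // blocks land cheek by jowl
            }
            d.end();
        }
        index[blocks] = raf.getFilePointer();
        return index;
    }

    static byte readByte(RandomAccessFile raf, long[] index, long n) throws IOException {
        if (n < 0 || n / BLOCK >= index.length - 1) throw new EOFException("offset " + n + " is out of range");
        int b = (int) (n / BLOCK);                       // integer division
        byte[] enc = new byte[(int) (index[b + 1] - index[b])];
        raf.seek(index[b]);
        raf.readFully(enc);
        byte[] block;
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(enc))) {
            block = in.readAllBytes();
        }
        int i = (int) (n % BLOCK);
        if (i >= block.length) throw new EOFException("offset " + n + " is past the data");
        return block[i];
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 251);   // compressible
        File f = File.createTempFile("indexed", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            long[] index = write(raf, data);
            System.out.println(raf.length() < data.length);                  // true
            System.out.println(readByte(raf, index, 70_000) == data[70_000]); // true
        }
    }
}
```

Compared with fixed slots, nothing is wasted on padding; the cost is keeping the index alongside the data.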

     Elsethread you mention that RandomAccessFile provides neither
InputStream nor OutputStream.  If you think about this a bit, you'll
see it's a natural consequence of the "Random" part: a Stream provides
the abstraction of a linear sequence of things, and does not admit of
leaping forward or backward to unrelated positions.  Yes, there are
skip() and mark() and reset(), but I think you'll agree these are of
a different character than "read bytes 3000-3999, then 10000-10999,
then 936-22728."  Streams are sequential; Random isn't.
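
That said, once you have chosen a starting point (a block boundary, or, for a self-synchronizing encoding such as UTF-8, any character boundary), nothing stops you from laying a sequential view over the file from there. One way, sketched with a hypothetical readerAt() helper, is to seek and then wrap the file's channel with java.nio.channels.Channels.newInputStream() and an InputStreamReader. UTF-8 is an assumption for the demo.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.io.Reader;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;

public class RafReaderView {

    /** Hypothetical helper: a sequential UTF-8 Reader starting at byte offset pos. */
    static Reader readerAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);                                   // the channel shares this file pointer
        InputStream in = Channels.newInputStream(raf.getChannel());
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("view", ".txt");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.write("h\u00e9llo w\u00f6rld\n".getBytes(StandardCharsets.UTF_8));
            // The accented e takes two bytes in UTF-8, so byte 7 starts the second word.
            BufferedReader r = new BufferedReader(readerAt(raf, 7));
            System.out.println(r.readLine());
        }
    }
}
```

Some caveats: InputStreamReader reads ahead, so the RandomAccessFile's file pointer usually ends up past the characters actually consumed; closing the Reader closes the channel, which in OpenJDK also closes the RandomAccessFile; and a pos in the middle of a multibyte sequence decodes to replacement characters at first. Channels.newOutputStream() plus an OutputStreamWriter covers the writing side.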

-- 
Eric Sosman
esosman@ieee-dot-org.invalid

