


Re: Binary file: SAT

From William Rutiser <wruyahoo05@comcast.net>
Newsgroups comp.lang.ruby
Subject Re: Binary file: SAT
Date 2011-04-23 13:20 -0500
Organization Service de news de lacave.net
Message-ID <4DB31887.20803@comcast.net> (permalink)
References (3 earlier) <BANLkTimZvkXz=7bTAcdnwdU2AsH6MzYd8w@mail.gmail.com> <2cddb6943b03d69eca28d3dffeba1374@ruby-forum.com> <b1b002b0b933deeeb92ca664c4f7b79f@ruby-forum.com> <a9a17c857d094cdaea68f811a6bdfa1c@ruby-forum.com> <d05142eefdc15ef8496889ba2fdc2918@ruby-forum.com>



On 2011-04-22 2:49 PM, 7stud -- wrote:
> Alessandro Barracco wrote in post #994473:
>>> Do not think of binary files as containing lines.  A binary file is a
>>> long continuous sequence of integers contained in a varying number of
>>> bytes.
>> That's OK. but the file I need to parse is a special txt file (DXF
>> format) that consist of couple-of-line:
> Binary files do not have lines.  Until you can understand that, you
> cannot proceed.  Binary files consist of blocks of bytes.  Each block
> contains some data.  Each block consists of a different number of bytes.
>
It's not too helpful to someone trying to deal with DXF files to make such
a strong distinction between binary and text files. I haven't worked
with them and hope I never have to. A quick look at the Wikipedia
article and the most recent AutoCAD spec suggests that the files may be
best thought of as a mixture of binary and ASCII data. The original DXF
files were text files made up of pairs of lines, a group code followed by
its value, with the value often a decimal representation of a floating
point number. There is now an optional file format that contains binary
representations of the numbers to reduce precision losses caused by
repeated conversions and save some space. Most of the 270-page
specification appears to describe the ASCII format with the binary
format introduced on page 242.
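
For the ASCII flavour, a minimal sketch of reading the key/value pairs might
look like the following. It assumes each pair spans two lines, an integer
group code followed by its value, which matches the "couple-of-line"
description quoted above. The function name read_group_pairs and the sample
data are made up for illustration and are not taken from the spec.

```ruby
require 'stringio'

# Reads an ASCII DXF-style stream as (group code, value) pairs.
# Assumption: one line holds an integer code, the next line holds its value.
# Returns an array of [Integer, String] pairs; values are left as strings
# because their meaning depends on the group code.
def read_group_pairs(io)
  pairs = []
  until io.eof?
    code_line  = io.gets
    value_line = io.gets
    break if value_line.nil?          # dangling code line at end of input
    pairs << [Integer(code_line.strip, 10), value_line.strip]
  end
  pairs
end

# Made-up sample input, only for trying the function out.
sample = StringIO.new("  0\nSECTION\n  2\nHEADER\n  0\nENDSEC\n  0\nEOF\n")
pairs  = read_group_pairs(sample)
pairs.each { |code, value| puts "#{code}: #{value}" }
```

Converting a value to a number would then be a second step driven by the
group code, using whatever ranges the spec assigns to each type.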


You can get a recent DXF spec at:
http://images.autodesk.com/adsk/files/autocad_2012_pdf_dxf-reference_enu.pdf

This may give a helpful overview:
http://en.wikipedia.org/wiki/Dxf

Alessandro's problem is to read and parse a file that contains small fields
to be interpreted as ASCII text, binary integers, floating point numbers,
etc. What comes next is determined by what came just before, as interpreted
through a 270-page document that has a few examples in Visual Basic 6.

I would proceed as follows:

* Figure out which kinds of primitive data are expected in the files of interest.

* For each kind, write and test a function to read and convert one such item.

* Write a function to read the next entity record from the file. It's likely that this function
should return a Ruby object that represents the particular kind of entity.
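
The second step could be sketched roughly as below: one small, separately
testable reader per primitive kind. The field widths, the little-endian byte
order, and the length-prefixed string layout are all assumptions made for
illustration; the real encodings have to come from the spec and from looking
at actual files.

```ruby
require 'stringio'

# Read exactly n bytes or fail loudly; a short read usually means an
# earlier field was decoded with the wrong width.
def read_exact(io, n)
  data = io.read(n)
  raise EOFError, "wanted #{n} bytes" if data.nil? || data.bytesize < n
  data
end

# Assumed encoding: 32-bit signed integer, little-endian.
def read_int32_le(io)
  read_exact(io, 4).unpack('l<').first
end

# Assumed encoding: 64-bit IEEE double, little-endian.
def read_float64_le(io)
  read_exact(io, 8).unpack('E').first
end

# Assumed encoding: one length byte followed by that many bytes of text.
def read_counted_string(io)
  length = read_exact(io, 1).unpack('C').first
  read_exact(io, length)
end

# Build a small test stream with Array#pack and read it back.
io = StringIO.new([-2, 1.5].pack('l<E') + [3].pack('C') + 'abc')
number = read_int32_le(io)
ratio  = read_float64_le(io)
label  = read_counted_string(io)
```

Testing against a StringIO built with Array#pack keeps each reader checkable
without needing a real data file.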

The ACIS spec says "The header is followed by a sequence of entity records.
Each entity record consists of a sequence number (optional), an entity type identifier,
the entity data, and a terminator."

So to read an entity record, first read the sequence number if present, then read the type identifier. The type identifier should be used to select an appropriate function to read the data part of the entity record. Then read the terminator unless it was already used to end the entity data.
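
A rough sketch of that procedure follows. It works on a toy, whitespace-separated
token layout invented for this example, not on the real file format: the
"-N" form of the sequence number, the "#" terminator, the "point" type and
the DATA_READERS table are all assumptions to be replaced with what the spec
actually says.

```ruby
Entity = Struct.new(:sequence, :type, :data)

TERMINATOR = '#'                      # assumed terminator token

# Hypothetical per-type readers, selected by the type identifier.
DATA_READERS = {
  'point' => lambda { |tokens| tokens.map { |t| Float(t) } }
}
RAW_READER = lambda { |tokens| tokens }   # unknown types keep their raw tokens

# Consumes one entity record from the front of the token array and returns
# an Entity, or nil when no tokens are left.
def read_entity_record(tokens)
  return nil if tokens.empty?

  # The sequence number is optional; here it is assumed to look like "-12".
  sequence = nil
  sequence = Integer(tokens.shift[1..-1], 10) if tokens.first =~ /\A-\d+\z/

  type = tokens.shift

  # Everything up to the terminator is the entity data.
  data_tokens = []
  data_tokens << tokens.shift until tokens.empty? || tokens.first == TERMINATOR
  raise "missing terminator after #{type}" if tokens.empty?
  tokens.shift                        # consume the terminator

  reader = DATA_READERS.fetch(type, RAW_READER)
  Entity.new(sequence, type, reader.call(data_tokens))
end

# Made-up input: one record with a sequence number, one without.
tokens = '-0 point 1.0 2.5 0 # body $1 #'.split
first  = read_entity_record(tokens)
second = read_entity_record(tokens)
```

Keeping the per-type readers in a table means a new entity kind only needs a
new entry, and unknown kinds can still be skipped over without losing sync.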


Essential tools:

Something to examine and print pieces of the data in hexadecimal. Use this to explore the
data and resolve questions about byte order, number encoding, etc.
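
If no hex dump utility is at hand, a few lines of Ruby will do. This is only
a minimal sketch; the name hex_dump and the output layout (offset, hex bytes,
printable ASCII) are choices made for this example.

```ruby
# Returns a hex dump of a string: offset, hex bytes, and printable ASCII,
# with `width` bytes per line. Non-printable bytes are shown as '.'.
def hex_dump(data, width = 16)
  lines = []
  data.bytes.each_slice(width).with_index do |chunk, i|
    hex   = chunk.map { |b| format('%02x', b) }.join(' ')
    ascii = chunk.map { |b| (32..126).cover?(b) ? b.chr : '.' }.join
    # Pad the hex column so the ASCII column lines up on a short last line.
    lines << format("%08x  %-#{width * 3 - 1}s  %s", i * width, hex, ascii)
  end
  lines.join("\n")
end

# Made-up bytes; for a real file, something like
# hex_dump(File.binread(path)[0, 64]) would show the first 64 bytes.
dump = hex_dump("AB\x00")
puts dump
```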

Ruby's Array#pack and String#unpack methods.
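
A small round trip shows how Array#pack and String#unpack fit together, and
how decoding the same four bytes with two different directives helps settle
a byte order question. The values and formats here are arbitrary examples,
not anything taken from the file format under discussion.

```ruby
# Pack a 32-bit little-endian signed integer followed by a 64-bit
# little-endian double, then unpack them again with the same format.
bytes = [1, 2.5].pack('l<E')
int, float = bytes.unpack('l<E')

# The same four bytes read as unsigned 32-bit values in both byte orders.
raw    = [0x01, 0x00, 0x00, 0x00].pack('C4')
little = raw.unpack('V').first    # 'V' is little-endian
big    = raw.unpack('N').first    # 'N' is big-endian (network order)

puts "little-endian: #{little}, big-endian: #{big}"
```

Whichever reading gives plausible values for a field whose content is known
is most likely the right byte order.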

Possibly an assortment of colored pencils to mark up printed hex dumps of the data.

There may be some Ruby tools specifically intended for this kind of work.



Caveat:
I may have written more than I know about some of the details, but I think the general ideas are correct.


-- Bill







Thread

Binary file: SAT Alessandro Barracco <bomastudio@gmail.com> - 2011-04-20 17:00 -0500
  Re: Binary file: SAT 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-20 19:02 -0500
    Re: Binary file: SAT 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-20 19:45 -0500
      Re: Binary file: SAT Roger Braun <roger@rogerbraun.net> - 2011-04-20 21:53 -0500
        Re: Binary file: SAT Alessandro Barracco <bomastudio@gmail.com> - 2011-04-21 03:06 -0500
          Re: Binary file: SAT 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-21 12:25 -0500
            Re: Binary file: SAT Alessandro Barracco <bomastudio@gmail.com> - 2011-04-22 04:51 -0500
              Re: Binary file: SAT 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-22 13:49 -0500
                Re: Binary file: SAT William Rutiser <wruyahoo05@comcast.net> - 2011-04-23 13:20 -0500
  Re: Binary file: SAT 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-21 12:53 -0500
