


Re: Binary file: SAT

From William Rutiser <wruyahoo05@comcast.net>
Newsgroups comp.lang.ruby
Subject Re: Binary file: SAT
Date Sat, 23 Apr 2011 13:20:55 -0500
Message-ID <4DB31887.20803@comcast.net> (permalink)



On 2011-04-22 2:49 PM, 7stud -- wrote:
> Alessandro Barracco wrote in post #994473:
>>> Do not think of binary files as containing lines.  A binary file is a
>>> long continuous sequence of integers contained in a varying number of
>>> bytes.
>> That's OK. but the file I need to parse is a special txt file (DXF
>> format) that consist of couple-of-line:
> Binary files do not have lines.  Until you can understand that, you
> cannot proceed.  Binary files consist of blocks of bytes.  Each block
> contains some data.  Each block consists of a different number of bytes.
>
It's not too helpful to someone trying to deal with DXF files to make such
a strong distinction between binary and text files. I haven't worked
with them and hope I never have to. A quick look at the Wikipedia
article and the most recent AutoCAD spec suggests that the files may be
best thought of as a mixture of binary and ASCII data. The original DXF
files were text files where each line was a key-value pair with the
value generally a decimal representation of a floating point number.
There is now an optional file format that contains binary
representations of the numbers to reduce precision losses caused by
repeated conversions and to save some space. Most of the 270-page
specification appears to describe the ASCII format, with the binary
format introduced on page 242.


You can get a recent DXF spec at:
http://images.autodesk.com/adsk/files/autocad_2012_pdf_dxf-reference_enu.pdf

This may give a helpful overview:
http://en.wikipedia.org/wiki/Dxf

Alessandro's problem is to read and parse a file that contains small fields to be
interpreted as ASCII text, binary integers, floating point numbers, etc. Just what
comes next is determined by what came just before, with reference to a 270-page
document that has a few examples in Visual Basic 6.

I would proceed as follows:

* Figure out which kinds of primitive data are expected in the files of interest.

* For each kind, write and test a function to read and convert one such item.

* Write a function to read the next entity record from the file. It's likely that this function
should return a Ruby object that represents the particular kind of entity.
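
As a rough sketch of the second step, here is one small reader per primitive kind. The byte order (little-endian) and the length-prefixed string layout are assumptions made for illustration, not something taken from the spec, and the function names are made up.

```ruby
require "stringio"

# Read exactly n bytes or raise; a short read usually means a wrong offset.
def read_bytes(io, n)
  data = io.read(n)
  raise EOFError, "wanted #{n} bytes" if data.nil? || data.bytesize < n
  data
end

# 32-bit signed integer, little-endian (assumed byte order).
def read_int32(io)
  read_bytes(io, 4).unpack1("l<")
end

# 64-bit IEEE double, little-endian (assumed byte order).
def read_double(io)
  read_bytes(io, 8).unpack1("E")
end

# Hypothetical string layout: one length byte, then that many ASCII bytes.
def read_short_string(io)
  len = read_bytes(io, 1).unpack1("C")
  read_bytes(io, len).force_encoding("ASCII")
end

# Build a small in-memory sample so each reader can be tested on its own.
io = StringIO.new([42, 1.5].pack("l<E") + [3].pack("C") + "abc")
p read_int32(io)        # 42
p read_double(io)       # 1.5
p read_short_string(io) # "abc"
```

Testing against a StringIO built with pack means each reader can be checked before it ever touches a real file.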

The ACIS spec says "The header is followed by a sequence of entity records.
Each entity record consists of a sequence number (optional), an entity type identifier,
the entity data, and a terminator."

So to read an entity record, first read the sequence number if present, then read the type identifier. The type identifier should be used to select an appropriate function to read the data part of the entity record. Then read the terminator unless it was already used to end the entity data.
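
A minimal sketch of that dispatch, working on a list of tokens. The conventions here are assumptions for the sake of the example: tokens are whitespace separated, an optional sequence number looks like "-12", "#" is the terminator, and the "point" layout is hypothetical. Check all of these against the spec.

```ruby
# Hypothetical entity classes; a real parser would have one per entity type.
Point = Struct.new(:seq, :x, :y, :z)
Unknown = Struct.new(:seq, :type, :fields)

# One reader per type identifier; each consumes only its own data tokens.
READERS = {
  "point" => ->(seq, tokens) { Point.new(seq, *tokens.shift(3).map { |t| Float(t) }) }
}

def read_entity(tokens)
  # Optional sequence number (assumed form: a dash followed by digits).
  seq = tokens.first.to_s.match?(/\A-\d+\z/) ? tokens.shift[1..-1].to_i : nil
  type = tokens.shift
  reader = READERS[type]
  entity = reader ? reader.call(seq, tokens) : nil
  # Collect whatever is left before the terminator.
  rest = []
  rest << tokens.shift until tokens.empty? || tokens.first == "#"
  raise "missing terminator" unless tokens.shift == "#"
  entity || Unknown.new(seq, type, rest)
end

tokens = "-0 point 1.0 2.0 3.5 # circle 9 #".split
p read_entity(tokens) # a Point with seq 0
p read_entity(tokens) # an Unknown with type "circle"
```

Returning an Unknown object for types that have no reader yet lets you get through a whole file early on and add readers one entity type at a time.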


Essential tools:

Something to examine and print pieces of the data in hexadecimal. Use this to explore the
data and resolve questions about byte order, number encoding, etc.
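
If no hex viewer is at hand, a few lines of Ruby will do. This is only a sketch; the hexdump name and the output layout are my own choices.

```ruby
# Return one formatted row per `width` bytes: offset, hex pairs, printable ASCII.
def hexdump(data, width = 16)
  data.bytes.each_slice(width).with_index.map do |row, i|
    hex = row.map { |b| format("%02x", b) }.join(" ")
    text = row.map { |b| (32..126).cover?(b) ? b.chr : "." }.join
    format("%08x  %-#{width * 3 - 1}s  |%s|", i * width, hex, text)
  end
end

# Sample data: some text followed by a packed little-endian double.
puts hexdump("DXF\r\n" + [1.5].pack("E"))
```

For a real file, something like File.binread("some_file", 64) (file name hypothetical) gives the first 64 bytes to pass in.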

Ruby's Array#pack and String#unpack methods.
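
A short illustration of why these matter: Array#pack turns values into bytes, String#unpack turns bytes back into values, and the directive chosen decides the byte order. The particular values are arbitrary.

```ruby
# "C" = unsigned byte, "S<" = little-endian uint16, "E" = little-endian double.
bytes = [1, 258, 1.5].pack("CS<E")
p bytes.bytesize            # 11 (1 + 2 + 8)
p bytes.unpack("CS<E")      # [1, 258, 1.5]

# The same two bytes read with each byte order give different integers.
p "\x01\x02".unpack1("S<")  # 513 (0x0201, little-endian)
p "\x01\x02".unpack1("S>")  # 258 (0x0102, big-endian)
```

If a number read from the file looks wildly wrong, trying the other byte order on the same bytes is a quick first check.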

Possibly an assortment of colored pencils to mark up printed hex dumps of the data.

There may be some Ruby tools specifically intended for this kind of work.



Caveat:
I may have written more than I know about some of the details, but I think the general ideas are correct.


-- Bill







