Groups > comp.lang.ruby > #3271 > unrolled thread

Binary file: SAT

Started by: Alessandro Barracco <bomastudio@gmail.com>
First post: 2011-04-20 17:00 -0500
Last post: 2011-04-21 12:53 -0500
Articles: 10 (4 participants)



Contents

  Binary file: SAT Alessandro Barracco <bomastudio@gmail.com> - 2011-04-20 17:00 -0500
    Re: Binary file: SAT 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-20 19:02 -0500
      Re: Binary file: SAT 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-20 19:45 -0500
        Re: Binary file: SAT Roger Braun <roger@rogerbraun.net> - 2011-04-20 21:53 -0500
          Re: Binary file: SAT Alessandro Barracco <bomastudio@gmail.com> - 2011-04-21 03:06 -0500
            Re: Binary file: SAT 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-21 12:25 -0500
              Re: Binary file: SAT Alessandro Barracco <bomastudio@gmail.com> - 2011-04-22 04:51 -0500
                Re: Binary file: SAT 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-22 13:49 -0500
                  Re: Binary file: SAT William Rutiser <wruyahoo05@comcast.net> - 2011-04-23 13:20 -0500
    Re: Binary file: SAT 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-21 12:53 -0500

#3271 — Binary file: SAT

From: Alessandro Barracco <bomastudio@gmail.com>
Date: 2011-04-20 17:00 -0500
Subject: Binary file: SAT
Message-ID: <f82185e10f19c74e1962c03aeccd4771@ruby-forum.com>

Hi all. I have never worked with binary files before, and I'm a bit
confused. I need to read, from a text file (a *.dxf file), a block of
lines encoded in a binary format (the SAT/SAB file format, described here:

http://paulbourke.net/dataformats/sat/sat.pdf)

How can I do it?

-- 
Posted via http://www.ruby-forum.com/.



#3278

From: 7stud -- <bbxx789_05ss@yahoo.com>
Date: 2011-04-20 19:02 -0500
Message-ID: <4fa8adc3b92c44287a399f6cb1aab3ff@ruby-forum.com>
In reply to: #3271

Alessandro Barracco wrote in post #994136:
> Hi all. I never work before with binary file, and I'm a bit
> confused.....
>

Both numbers and characters are stored as integers in a file (or anywhere 
on a computer).  One method of storing characters in a file is with the 
ASCII encoding.  For instance, in the ASCII encoding 'a' is stored as 
the integer 97, taking up one byte total.  Note that you could also 
store the integer 97 in 4 bytes--the other three bytes would just be all 
0's.

You may also want to store the count of the number of banks in New York, 
which is 97.  You could also store that in one byte.  So the question 
becomes, how do you know whether a 97 you read from the file is supposed 
to be the count of banks or the letter 'a'?  The answer is: you have to 
know how the data in the file is supposed to be interpreted.

If the integer in the first byte in a file is supposed to be an integer, 
then you read in the integer as is; and if the integer in the second 
byte in the file is supposed to be a letter, then you need to convert 
the integer to a letter.  In other words, you have to know what each 
byte in the file is supposed to represent.
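
A minimal sketch of that idea in Ruby, assuming a single byte holding the ASCII code of 'a' (0x61). The variable names are only for illustration. The byte is the same in both cases; only the unpack directive that interprets it changes.

```ruby
# One byte read from a file; nothing in the byte itself says what it means.
byte = "\x61".b

as_count  = byte.unpack("C").first   # interpret as an unsigned 8-bit integer
as_letter = byte.unpack("a").first   # interpret as a one-character string

puts as_count    # 97
puts as_letter   # a

# The same value stored in 4 bytes (big-endian): three zero bytes, then 0x61.
wide = [97].pack("N")
p wide.bytes     # [0, 0, 0, 97]
```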

-- 
Posted via http://www.ruby-forum.com/.



#3281

From: 7stud -- <bbxx789_05ss@yahoo.com>
Date: 2011-04-20 19:45 -0500
Message-ID: <e003a225f227d13a192196cfde414b7c@ruby-forum.com>
In reply to: #3278

7stud -- wrote in post #994163:
>
> Once you are familiar with what each byte in your file represents, you
> can use String#unpack to tell ruby how many bytes each integer occupies,
> and how to interpret the integer.

But, I can't get a simple unpack() example to work, so what do I know:

str = "\x00\x00\x00\x61"  # 0x61 is 97 in decimal; the value takes up 4 bytes

results = str.unpack("L")
p results

--output:--
[1627389952]

-- 
Posted via http://www.ruby-forum.com/.



#3285

From: Roger Braun <roger@rogerbraun.net>
Date: 2011-04-20 21:53 -0500
Message-ID: <BANLkTimZvkXz=7bTAcdnwdU2AsH6MzYd8w@mail.gmail.com>
In reply to: #3281

Hi,

2011/4/21 7stud -- <bbxx789_05ss@yahoo.com>:
> 7stud -- wrote in post #994163:
>>
>> Once you are familiar with what each byte in your file represents, you
>> can use String#unpack to tell ruby how many bytes each integer occupies,
>> and how to interpret the integer.
>
> But, I can't get a simple unpack() example to work, so what do I know:
>
> str = "\x00\x00\x00\x61"  #97 in hex, taking up 4 bytes
>
> results = str.unpack("L")
> p results
>
> --output:--
> [1627389952]

It's the correct result. L uses your system's endianness, which seems
to be little-endian. If you force big-endian by using N instead of L,
you will get your expected 97.

ruby-1.9.2-p180 :008 > str = "\x61\x00\x00\x00"
 => "a\u0000\u0000\u0000"
ruby-1.9.2-p180 :009 > results = str.unpack("L")
 => [97]

ruby-1.9.2-p180 :011 > str = "\x00\x00\x00\x61"
 => "\u0000\u0000\u0000a"
ruby-1.9.2-p180 :012 > results = str.unpack("N")
 => [97]
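
To avoid depending on the machine the script runs on, the byte order can be spelled out in the directive. This is a small sketch using the same four bytes; the `<` and `>` modifiers need Ruby 1.9.3 or later, and `native_is_little` is just an illustrative name.

```ruby
bytes = "\x00\x00\x00\x61".b

p bytes.unpack("N")    # big-endian 32-bit unsigned    => [97]
p bytes.unpack("V")    # little-endian 32-bit unsigned => [1627389952]
p bytes.unpack("L>")   # L forced to big-endian, same as N
p bytes.unpack("L<")   # L forced to little-endian, same as V

# Quick check of what plain "L"/"S" will do on this machine.
native_is_little = [1].pack("S") == "\x01\x00".b
puts native_is_little ? "native order: little-endian" : "native order: big-endian"
```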


-- 
Roger Braun
rbraun.net | humoralpathologie.de



#3298

From: Alessandro Barracco <bomastudio@gmail.com>
Date: 2011-04-21 03:06 -0500
Message-ID: <2cddb6943b03d69eca28d3dffeba1374@ruby-forum.com>
In reply to: #3285

Thank you all. I'm beginning to understand a bit....

These are the first 20 lines of the binary-block in the file:
------------------------------------------------------------------------

 1
mogoo mih m o
  1
_ll P/:1 [:,681 ^ 336>1<: ^ \VL ]*63;:- _nk ^ \VL mogqoo QK _mk H:; ^ /- 
mo mmeogemi monn
  1
n fqfffffffffffffffj:rooh n:rono
  1
>,27:>;:- {rn rn _nm mnmqoqoqjgmm |
  1
=0;& {m rn {rn {l {rn {rn |
  1
-:9@)+r:&:r>++-6= {rn rn {rn {rn {n {k {j |
  1
3*2/ {i rn {rn {rn {h {n |
  1
:&:@-:961:2:1+ {rn rn _j 8-6;  n _l +-6 n _k ,*-9 o _l >;5 o _k 8->; o 
_f /0,+<7:<4 o _k ,+03 oqoohilhjnjnfgklfljfh _k 1+03 lo _k ;,63 o _g 
93>+1:,, o _h /6'>-:> o _k 72>' o _i 8-6;>- o _j 28-6; looo _j *8-6; o 
_j )8-6; o _no :1;@96:3;, |
  1
):-+:'@+:2/3>+: {rn rn l o n g |
  1
-:9@)+r:&:r>++-6= {rn rn {rn {rn {l {k {j |

-------------------------------------------------------------------------

It consists of pairs of lines: the first is a code (always 1), the 
second is the data. I think that the latter is written according to the 
SAT format (well, to the SAB format, since it's binary....).
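
A hedged observation on the block above: the data lines look less like true SAB binary and more like SAT text that has been obfuscated one character at a time. The sketch below assumes the substitution commonly reported for ACIS data embedded in DXF: each printable character with code c becomes 159 - c, spaces are left alone, and "^ " stands for a literal caret. That rule is an assumption, not something stated in this thread, and `decode_acis_line` is a made-up helper name.

```ruby
# Assumed rule: printable ASCII (33..126) maps to 159 - code; spaces stay.
# "^ " is assumed to be the DXF escape for a literal caret.
def decode_acis_line(line)
  line.gsub("^ ", "^").each_char.map { |ch|
    (33..126).cover?(ch.ord) ? (159 - ch.ord).chr : ch
  }.join
end

puts decode_acis_line("mogoo mih m o")               # 20800 267 2 0
puts decode_acis_line("H:; ^ /- mo mmeogemi monn")   # Wed Apr 20 22:08:26 2011
```

If the assumption holds, the first data line reads like a SAT header, and the decoded text could then be parsed as ordinary SAT rather than with unpack.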

ACIS supports two kinds of save files, SAT and SAB, which stand for
“Standard ACIS Text” and “Standard ACIS Binary”, respectively. Although
one is ASCII text and the other is binary data, the model data
information stored in the two formats is identical.

A SAB file has a .sab file extension. A SAB file uses delimiters
between elements and binary tags, without additional formatting.
The binary formats supported are:

int      4-byte 2s complement (as long)
long     4-byte 2s complement
double   8-byte IEEE
char     1-byte ASCII

where “byte” is eight bits, and files are considered to be byte strings.
For multi-byte data items, byte order normally just matches that of the
processor being used, but a specific order may be imposed by compiling
with the preprocessor macro BIG_ENDIAN or LITTLE_ENDIAN defined.
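
As a rough illustration of those four primitive types with String#unpack, here is a sketch that builds a sample buffer and reads it back. Little-endian byte order is an assumption made for the example (the text above says the order depends on how the writer was built), and the sample values are invented.

```ruby
# Directives: l< = 4-byte signed int, little-endian
#             E  = 8-byte IEEE double, little-endian
#             a  = 1-byte character
# Sample buffer: int 42, double 12.5, char 'Z' (values are made up).
buf = [42, 12.5, "Z"].pack("l<Ea")
puts buf.bytesize                      # 13 = 4 + 8 + 1

int_value, double_value, char_value = buf.unpack("l<Ea")
p int_value      # 42
p double_value   # 12.5
p char_value     # "Z"

# 2's complement: a negative int round-trips through the signed directive.
p [-1].pack("l<").unpack("l<")         # [-1]
```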

-- 
Posted via http://www.ruby-forum.com/.



#3324

From: 7stud -- <bbxx789_05ss@yahoo.com>
Date: 2011-04-21 12:25 -0500
Message-ID: <b1b002b0b933deeeb92ca664c4f7b79f@ruby-forum.com>
In reply to: #3298

Alessandro Barracco wrote in post #994230:
> Thanx you all. I'm beginning to understand a bit....
>
> These are the first 20 lines of the binary-block in the file:
>

Binary files aren't human readable, i.e. they look like nonsense.

>
> It consists of pairs of lines: the first is a code (always 1), the
> second is the data.

Do not think of binary files as containing lines.  A binary file is a 
long continuous sequence of bytes.  And you have to know exactly what 
each byte means to read the data.  For instance, you have to know that 
the first 4 bytes are the count of banks in New York, and the next byte 
is a letter, and the next 2 bytes are the year, and the next 2 bytes are 
the month, etc.

-- 
Posted via http://www.ruby-forum.com/.



#3375

From: Alessandro Barracco <bomastudio@gmail.com>
Date: 2011-04-22 04:51 -0500
Message-ID: <a9a17c857d094cdaea68f811a6bdfa1c@ruby-forum.com>
In reply to: #3324

> Do not think of binary files as containing lines.  A binary file is a
> long continuous sequence of integers contained in a varying number of
> bytes.

That's OK, but the file I need to parse is a special text file (DXF 
format) that consists of pairs of lines: the 1st is a code that specifies 
an object property (the colour of a line, the center of a circle, the 
height of a text, etc.), and the 2nd is the value associated with it.
There is a special object, the 3dsolid, that has 4 or 5 pairs 
like the above, followed by a long series of pairs in which the 1st line 
is always 1 and the 2nd one is binary data.

Group code   Description
8            Layer name
70           Modeler format version number (currently = 1)
..           ....
1            Proprietary data (multiple lines < 255 characters each)
3            Additional lines of proprietary data (if previous group 1
             string is greater than 255 characters) (optional)

For example, the following draws a line, in the layer "Walls", from the 
point (16.5, 12.5, 0.0) to (46.5, 12.5, 0.0).

 0
LINE
  8
Walls
 10
16.5
 20
12.5
 30
0.0
 11
46.5
 21
12.5
 31
0.0
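
Reading this code/value layout is plain text processing. A minimal sketch, assuming the file really is strict pairs of lines as in the LINE example; `dxf_pairs` is a hypothetical helper name, and the sample string just repeats the entity above.

```ruby
# Turn DXF text into [group_code, value] pairs: odd lines are codes,
# even lines are the values that belong to them.
def dxf_pairs(text)
  text.lines.map(&:chomp).each_slice(2).map do |code, value|
    [Integer(code.strip, 10), value.to_s]
  end
end

sample = [" 0", "LINE", "  8", "Walls", " 10", "16.5", " 20", "12.5",
          " 30", "0.0", " 11", "46.5", " 21", "12.5", " 31", "0.0"].join("\n")

pairs  = dxf_pairs(sample)
fields = pairs.to_h   # fine here; repeated codes (like group 1) would collapse

start_point = [10, 20, 30].map { |code| fields[code].to_f }
end_point   = [11, 21, 31].map { |code| fields[code].to_f }

p fields[0]      # "LINE"
p fields[8]      # "Walls"
p start_point    # [16.5, 12.5, 0.0]
p end_point      # [46.5, 12.5, 0.0]
```

For the 3dsolid entity, the repeated group 1 and group 3 values would have to be collected from `pairs` in order rather than through the hash.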


My task is to "understand" the object "3dsolid", which also has the 
"Proprietary data", i.e. the binary data. Searching on Google, I found that 
this data is laid out according to the ACIS *.sab standard (the link in the 
first post), so I think I can read that binary..... can't I?

-- 
Posted via http://www.ruby-forum.com/.



#3387

From: 7stud -- <bbxx789_05ss@yahoo.com>
Date: 2011-04-22 13:49 -0500
Message-ID: <d05142eefdc15ef8496889ba2fdc2918@ruby-forum.com>
In reply to: #3375

Alessandro Barracco wrote in post #994473:
>> Do not think of binary files as containing lines.  A binary file is a
>> long continuous sequence of integers contained in a varying number of
>> bytes.
>
> That's OK. but the file I need to parse is a special txt file (DXF
> format) that consist of couple-of-line:

Binary files do not have lines.  Until you can understand that, you 
cannot proceed.  Binary files consist of blocks of bytes.  Each block 
contains some data.  Each block consists of a different number of bytes.

-- 
Posted via http://www.ruby-forum.com/.



#3412

From: William Rutiser <wruyahoo05@comcast.net>
Date: 2011-04-23 13:20 -0500
Message-ID: <4DB31887.20803@comcast.net>
In reply to: #3387

On 2011-04-22 2:49 PM, 7stud -- wrote:
> Alessandro Barracco wrote in post #994473:
>>> Do not think of binary files as containing lines.  A binary file is a
>>> long continuous sequence of integers contained in a varying number of
>>> bytes.
>> That's OK. but the file I need to parse is a special txt file (DXF
>> format) that consist of couple-of-line:
> Binary files do not have lines.  Until you can understand that, you
> cannot proceed.  Binary files consist of blocks of bytes.  Each block
> contains some data.  Each block consists of a different number of bytes.
>
It's not too helpful to someone trying to deal with DXF files to make such 
a strong distinction between binary and text files. I haven't worked 
with them and hope I never have to. A quick look at the Wikipedia 
article and the most recent Autocad spec suggests that the files may be 
best thought of as a mixture of binary and ASCII data. The original DXF 
files were text files where each line was a key value pair with the 
value generally a decimal representation of a floating point number. 
There is now an optional file format that contains binary 
representations of the numbers to reduce precision losses caused by 
repeated conversions and save some space. Most of the 270 page 
specification appears to describe the ASCII format with the binary 
format introduced on page 242.


You can get a recent DXF spec at:
http://images.autodesk.com/adsk/files/autocad_2012_pdf_dxf-reference_enu.pdf

This may give a helpful overview:
http://en.wikipedia.org/wiki/Dxf

Alessandro's problem is to read and parse a file that contains small fields to be interpreted as ASCII text, binary integers, floating point numbers, etc. Just what will come next is determined by what came just before with reference to a 270 page document which has a few
examples in Visual Basic 6.

I would proceed as follows:

* Figure out which kinds of primitive data are expected in the files of interest.

* For each kind, write and test a function to read and convert one such item.

* Write a function to read the next entity record from the file. It's likely that this function
should return a Ruby object that represents the particular kind of entity.

The ACIS spec says "The header is followed by a sequence of entity records.
Each entity record consists of a sequence number (optional), an entity type identifier,
the entity data, and a terminator."

So to read an entity record, first read the sequence number if present, then read the type identifier. The type identifier should be used to select an appropriate function to read the data part of the entity record. Then read the terminator unless it was already used to end the entity data.
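
The dispatch step described above could be sketched like this. The record layout here is a toy invented for illustration (whitespace-separated tokens, an optional "-N" sequence number, a type name, data fields, and "#" as terminator); it is not taken from the ACIS spec, and `READERS` and `read_record` are hypothetical names.

```ruby
# One reader per entity type; each turns the raw data fields into a value.
READERS = {
  "point" => ->(fields) { fields.map(&:to_f) },
  "name"  => ->(fields) { fields.join(" ") }
}

def read_record(line)
  tokens = line.split

  # Optional sequence number, written as "-N" in this toy layout.
  seq = nil
  seq = tokens.shift[1..-1].to_i if tokens.first.to_s.match?(/\A-\d+\z/)

  type = tokens.shift
  raise ArgumentError, "missing terminator" unless tokens.pop == "#"

  # Unknown types fall back to returning the raw fields untouched.
  reader = READERS.fetch(type) { ->(fields) { fields } }
  { seq: seq, type: type, data: reader.call(tokens) }
end

p read_record("-3 point 1.5 2 0 #")
p read_record("name left wall #")
```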


Essential tools:

Something to examine and print pieces of the data in hexadecimal. Use this to explore the
data and resolve questions about byte order, number encoding, etc.

The ruby String pack and unpack functions.

Possibly an assortment of colored pencils to mark up printed hex dumps of the data.

There may be some Ruby tools specifically intended for this kind of work.
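
For the first tool on that list, a small hex viewer is easy to write in Ruby itself. This is only a sketch; `hex_dump` is a made-up helper, and the layout (offset, hex bytes, printable characters) is one common convention.

```ruby
# Return one formatted line per `width` bytes: offset, hex, printable text.
def hex_dump(data, width = 16)
  data.bytes.each_slice(width).with_index.map do |chunk, i|
    hex  = chunk.map { |b| format("%02x", b) }.join(" ")
    text = chunk.map { |b| (32..126).cover?(b) ? b.chr : "." }.join
    format("%08x  %-#{width * 3 - 1}s  %s", i * width, hex, text)
  end
end

puts hex_dump("\x00\x00\x00\x61".b)
# To inspect a real file, read it in binary mode first, for example:
#   puts hex_dump(File.binread("some_file.bin"))   # path is a placeholder
```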



Caveat:
I may have written more than I know about some of the details but I think the general ideas are correct.


-- Bill







#3327

From: 7stud -- <bbxx789_05ss@yahoo.com>
Date: 2011-04-21 12:53 -0500
Message-ID: <00a061658a9946c7c43c68a0fe83f318@ruby-forum.com>
In reply to: #3271

Suppose your file contains this data:

"\x00\x00\x00\x01"

Scenario 1:
The four bytes could represent the number of widgets sold (=1).

Scenario 2:
Or the first two bytes could represent the number of widgets sold (=0), 
the third byte is the number of widgets in inventory (=0), and the 
fourth byte is the number of widgets in transit to the factory (=1).

So unless you know what each byte in the file is supposed to represent, 
you cannot read the file correctly.  If someone hands you the file with 
the above data in it, and says, "Here's your data.  Get cracking!", and 
the person walks out the door, how would you know if Scenario 1 or 
Scenario 2 is the way the data is laid out?
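
In Ruby terms, the two scenarios are just two different unpack format strings applied to the same bytes. A short sketch, assuming big-endian integers; the variable names are illustrative.

```ruby
data = "\x00\x00\x00\x01".b

# Scenario 1: one 32-bit big-endian integer.
widgets_sold = data.unpack("N").first
p widgets_sold                      # 1

# Scenario 2: one 16-bit big-endian integer, then two single bytes.
sold, inventory, in_transit = data.unpack("nCC")
p [sold, inventory, in_transit]     # [0, 0, 1]
```

Both calls succeed without complaint, which is the point: the bytes alone cannot say which layout is right.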

-- 
Posted via http://www.ruby-forum.com/.


