splitting binary data

Started by	hroyd hroyd <hroyd@mailinator.com>
First post	2011-04-20 03:26 -0500
Last post	2011-04-21 12:18 -0500
Articles	9 — 4 participants

  splitting binary data hroyd hroyd <hroyd@mailinator.com> - 2011-04-20 03:26 -0500
    Re: splitting binary data 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-20 12:54 -0500
      Re: splitting binary data Iñaki Baz Castillo <ibc@aliax.net> - 2011-04-21 05:52 -0500
        Re: splitting binary data 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-21 12:13 -0500
          Re: splitting binary data Iñaki Baz Castillo <ibc@aliax.net> - 2011-04-21 12:27 -0500
            Re: splitting binary data 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-21 12:56 -0500
              Re: splitting binary data "Y. NOBUOKA" <nobuoka@r-definition.com> - 2011-04-26 04:47 -0500
    Re: splitting binary data hroyd hroyd <hroyd@mailinator.com> - 2011-04-21 05:01 -0500
      Re: splitting binary data 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-21 12:18 -0500

#3221 — splitting binary data

From	hroyd hroyd <hroyd@mailinator.com>
Date	2011-04-20 03:26 -0500
Subject	splitting binary data
Message-ID	<a27c4fbb30b28a4543b4ed0920037c24@ruby-forum.com>

Hello,

This is my first post (I am new to Ruby :-)). Can you help?

I am using EventMachine to read TCP segments off the network. I have
read a TCP segment that contains 4 messages. Its binary data is shown
below, where
\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\ is the
marker for each message. I would like to split the data into the 4
messages, but am having trouble doing so. When I split the data, the
whole message gets inserted into the first array element. I understand I
may need to escape the \, but how would I do that for the following
data? I can split it by unpacking to hex and then splitting, but that
is inefficient for my needs, as I use BinData to inspect the packet. Any
help is appreciated.

Thanks


\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00R\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x18\n\x10\x8E\b\x04\x18\x02\x02\x02
\n\x13\x00\x01
\x01\x01\x01\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00A\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x10\x03\x03\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00d\x02\x00\x00\x00I@\x01\x01\x00@\x02\x1E\x02\x0E=\xD6R\x132H2H2H2H2H2H2H2H\x8A\xEA\x8A\xEA\x8A\xEA\x8A\xEA@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAE\xF0\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00N\x02\x00\x00\x003@\x01\x01\x00@\x02\b\x02\x03=\xD6R\xE3\xC0\x1F@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAD\xA8

-- 
Posted via http://www.ruby-forum.com/.

#3261

From	7stud -- <bbxx789_05ss@yahoo.com>
Date	2011-04-20 12:54 -0500
Message-ID	<cfb58e3ca13b30817c52febc64a9b79e@ruby-forum.com>
In reply to	#3221

hroyd hroyd wrote in post #993957:
> Hello,
>
> This is my first post (I am new to Ruby :-)). Can you help?
>
> I am using EventMachine to read TCP segments off the network. I have
> read a TCP segment that contains 4 messages. Its binary data is shown
> below, where
> \xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\ is the
> marker for each message. I would like to split the data into the 4
> messages, but am having trouble doing so. When I split the data, the
> whole message gets inserted into the first array element.

I'm not seeing that. Your data starts with the delimiter, so the
first element of the array will be an empty string:

str = "\xFF\xFF" +
      "\x61" +
      "\xFF\xFF" +
      "\x62" +
      "\xFF\xFF" +
      "\x63" +
      "\xFF\xFF" +
      "\x64"

pattern = "\xFF\xFF"
p str.split(pattern)

--output:--
["", "a", "b", "c", "d"]
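The same idea applied to the 16-byte marker from the original question can be sketched as follows. This is only a minimal sketch: the names MARKER and split_messages are hypothetical (they do not appear in the thread), and it assumes the buffer should be treated as raw bytes, so both the buffer and the marker are tagged ASCII-8BIT before splitting.

```ruby
# Hypothetical names: MARKER and split_messages are not from the thread.
MARKER = ("\xFF" * 16).force_encoding(Encoding::ASCII_8BIT).freeze

def split_messages(buffer)
  # Work on a binary-tagged copy so the caller's string is left untouched.
  data = buffer.dup.force_encoding(Encoding::ASCII_8BIT)
  parts = data.split(MARKER)
  # A buffer that starts with the marker yields a leading "" element; drop it.
  parts.shift if parts.first == ""
  parts
end

segment = ("\xFF" * 16) + "\x00R\x02" + ("\xFF" * 16) + "\x00A\x02"
split_messages(segment).each { |msg| p msg.unpack("C*") }
# prints [0, 82, 2] and then [0, 65, 2]
```

Dropping only the first empty element (rather than rejecting every empty string) keeps an empty message between two adjacent markers visible, if that can ever happen in the data.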

-- 
Posted via http://www.ruby-forum.com/.

#3307

From	Iñaki Baz Castillo <ibc@aliax.net>
Date	2011-04-21 05:52 -0500
Message-ID	<BANLkTiktwYj=Q-2B+FFDhEnymiucqHp8CQ@mail.gmail.com>
In reply to	#3261

2011/4/20 7stud -- <bbxx789_05ss@yahoo.com>:
> str = "\xFF\xFF" +
>      "\x61" +
>      "\xFF\xFF" +
>      "\x62" +
>      "\xFF\xFF" +
>      "\x63" +
>      "\xFF\xFF" +
>      "\x64"
>
> pattern = "\xFF\xFF"
> p str.split(pattern)
>
> --output:--
> ["", "a", "b", "c", "d"]

Note that this fails under Ruby 1.9:

  p str.split(pattern)
  ArgumentError: invalid byte sequence in UTF-8
        from (irb):10:in `split'

-- 
Iñaki Baz Castillo
<ibc@aliax.net>

#3322

From	7stud -- <bbxx789_05ss@yahoo.com>
Date	2011-04-21 12:13 -0500
Message-ID	<bbe459046541b7ad39aca75e5822aa7d@ruby-forum.com>
In reply to	#3307

"Iñaki Baz Castillo" <ibc@aliax.net> wrote in post #994264:
> 2011/4/20 7stud -- <bbxx789_05ss@yahoo.com>:
>> p str.split(pattern)
>>
>> --output:--
>> ["", "a", "b", "c", "d"]
>
> Note that this fails under Ruby 1.9:
>
>   p str.split(pattern)
>   ArgumentError: invalid byte sequence in UTF-8
>         from (irb):10:in `split'

I guess you missed this:

puts RUBY_VERSION

..
..
..

--output:--
1.9.2

-- 
Posted via http://www.ruby-forum.com/.

#3326

From	Iñaki Baz Castillo <ibc@aliax.net>
Date	2011-04-21 12:27 -0500
Message-ID	<BANLkTikpNdaK9nVbavU4BNzkaNY=s+SXJQ@mail.gmail.com>
In reply to	#3322

2011/4/21 7stud -- <bbxx789_05ss@yahoo.com>:
>> Note that this fails under Ruby 1.9:
>>
>>   p str.split(pattern)
>>   ArgumentError: invalid byte sequence in UTF-8
>>         from (irb):10:in `split'
>
> I guess you missed this:
>
> puts RUBY_VERSION
>
> ...
>
> --output:--
> 1.9.2


Interesting. I also use 1.9.2, but I have realized that it fails under
irb and not when I run the above code from a separate file.
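A possible explanation for the irb/file difference, offered here as an assumption rather than something confirmed at this point in the thread: Ruby 1.9 treats a script without a magic comment as US-ASCII source, where a literal containing high-byte \x escapes is tagged ASCII-8BIT, while irb takes its encoding from the locale (often UTF-8). The hypothetical helper literal_report below only reports what the current environment does, so the two cases can be compared side by side. No expected output is shown because the result depends on where the code runs.

```ruby
# Hypothetical diagnostic helper; the name literal_report is not from the thread.
def literal_report(str)
  {
    :source_encoding => __ENCODING__,      # encoding of the source being parsed
    :string_encoding => str.encoding,      # encoding tag carried by the string
    :valid           => str.valid_encoding? # false if the bytes do not fit the tag
  }
end

# Run this once from a script file and once from irb, then compare.
p literal_report("\xFF\xFF")
```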

-- 
Iñaki Baz Castillo
<ibc@aliax.net>

#3329

From	7stud -- <bbxx789_05ss@yahoo.com>
Date	2011-04-21 12:56 -0500
Message-ID	<1c812474cfe40a58c9c935e0b1559f78@ruby-forum.com>
In reply to	#3326

"Iñaki Baz Castillo" <ibc@aliax.net> wrote in post #994342:
> 2011/4/21 7stud -- <bbxx789_05ss@yahoo.com>:
>> ...
>>
>> --output:--
>> 1.9.2
>
>
> Interesting. I also use 1.9.2, but I have realized that it fails under
> irb and not when I run the above code from a separate file.

I never use irb-like interfaces in any language anymore; they are
unreliable.

-- 
Posted via http://www.ruby-forum.com/.

#3505

From	"Y. NOBUOKA" <nobuoka@r-definition.com>
Date	2011-04-26 04:47 -0500
Message-ID	<BANLkTin=eOvsZfo5fD9bNc0twZUuBJ5A+g@mail.gmail.com>
In reply to	#3329

On Ruby 1.9, a String object knows its own encoding.
If a String object contains byte sequences that are invalid in that encoding,
the String#split method raises an error.

Without the magic comment, it does not matter that a string literal contains
non-ASCII bytes.

## example: OK!!
#-------------------------------------------------
#! ruby-1.9.2

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
p str.encoding         #=> #<Encoding:ASCII-8BIT>
p str.valid_encoding?  #=> true

pattern = "\xFF\xFF"
p str.split( pattern ) #=> ["", "a", "b", "c", "d"]
#-------------------------------------------------

However, when the magic comment declares the file encoding to be UTF-8,
it does matter that a string literal contains non-ASCII bytes.

## example: NG
#-------------------------------------------------
#! ruby-1.9.2
# coding: UTF-8

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
p str.encoding         #=> #<Encoding:UTF-8>
p str.valid_encoding?  #=> false

pattern = "\xFF\xFF"
p pattern.valid_encoding? #=> false
p str.split( pattern ) # ERROR OCCURS!!!
#-------------------------------------------------

To avoid this problem, you must change the encoding of any string that contains
non-ASCII bytes to ASCII-8BIT.

## example: avoiding the problem
#-------------------------------------------------
#! ruby-1.9.2
# coding: UTF-8

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
# change the encoding of the string
str.force_encoding Encoding::ASCII_8BIT
p str.encoding         #=> #<Encoding:ASCII-8BIT>
p str.valid_encoding?  #=> true

pattern = "\xFF\xFF".force_encoding Encoding::ASCII_8BIT
p pattern.valid_encoding? #=> true
p str.split( pattern ) #=> ["", "a", "b", "c", "d"]
#-------------------------------------------------
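For readers on Ruby 2.0 or later (an assumption; the thread itself targets 1.9.2, where this method does not exist), String#b offers a shorter way to get a binary-tagged copy than force_encoding. A minimal sketch of the same split using it:

```ruby
# String#b (Ruby 2.0+) returns a copy tagged ASCII-8BIT; the original is unchanged.
str     = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64".b
pattern = "\xFF\xFF".b

parts = str.split(pattern)
p parts.map { |part| part.unpack("C*") }  # byte values of each piece
# prints [[], [97], [98], [99], [100]]
```

Unlike force_encoding, String#b does not modify its receiver, so it is also safe to call on frozen string literals.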

Kind regards,
-- 
NOBUOKA Yu

#3305

From	hroyd hroyd <hroyd@mailinator.com>
Date	2011-04-21 05:01 -0500
Message-ID	<7568e966e6a587834b24d6893c5b6c41@ruby-forum.com>
In reply to	#3221

Thanks for the reply, that works.

I was trying to split on

"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\"

but the trailing \ was the problem. Dropping it and splitting on

"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"

gives the 4 messages after a leading empty string:

["",
"\x00R\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x18\n\x10\x8E\b\x04\x18\x02\x02\x02
\n\x13\x00\x01 \x01\x01\x01\x01",
"\x00A\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x10\x03\x03",
"\x00d\x02\x00\x00\x00I@\x01\x01\x00@\x02\x1E\x02\x0E=\xD6R\x132H2H2H2H2H2H2H2H\x8A\xEA\x8A\xEA\x8A\xEA\x8A\xEA@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAE\xF0",
"\x00N\x02\x00\x00\x003@\x01\x01\x00@\x02\b\x02\x03=\xD6R\xE3\xC0\x1F@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAD\xA8"]

Thanks for your help

-- 
Posted via http://www.ruby-forum.com/.

#3323

From	7stud -- <bbxx789_05ss@yahoo.com>
Date	2011-04-21 12:18 -0500
Message-ID	<bd8e6df73eed51dd17b0d11bfdc15f5b@ruby-forum.com>
In reply to	#3305

hroyd hroyd wrote in post #994257:
>
> Thanks for your help
>

Sure. Also, note that Ruby lets you do this:

pattern = "\xFF" * 16
p pattern

--output:--
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"


...so that you don't have to write that out by hand.
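Another way to build the marker, sketched here as an alternative rather than anything suggested in the thread, is Array#pack. The string it returns for the "C*" directive is tagged ASCII-8BIT, so it does not depend on the source encoding of the file or irb session.

```ruby
# Build sixteen 0xFF bytes from integers; pack("C*") yields an ASCII-8BIT string.
marker = ([0xFF] * 16).pack("C*")

p marker.bytesize                          # prints 16
p marker.encoding == Encoding::ASCII_8BIT  # prints true
p marker.valid_encoding?                   # prints true
```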

-- 
Posted via http://www.ruby-forum.com/.
