


Unicode escapes and String literals?

Started by: Knute Johnson <nospam@knutejohnson.com>
First post: 2012-12-13 09:31 -0800
Last post: 2012-12-17 02:42 -0800
Articles: 6 on this page of 26, 9 participants



Contents

  Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 09:31 -0800
    Re: Unicode escapes and String literals? Thomas Richter <thor@math.tu-berlin.de> - 2012-12-13 18:51 +0100
      Re: Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 10:47 -0800
        Re: Unicode escapes and String literals? Lew <lewbloch@gmail.com> - 2012-12-13 11:41 -0800
          Re: Unicode escapes and String literals? rossum <rossum48@coldmail.com> - 2012-12-14 13:32 +0000
            Re: Unicode escapes and String literals? Lew <lewbloch@gmail.com> - 2012-12-14 15:16 -0800
        Re: Unicode escapes and String literals? markspace <-@.> - 2012-12-13 12:58 -0800
          Re: Unicode escapes and String literals? David Lamb <dalamb@cs.queensu.ca> - 2012-12-13 16:21 -0500
            Re: Unicode escapes and String literals? markspace <-@.> - 2012-12-13 14:00 -0800
              Re: Unicode escapes and String literals? David Lamb <dalamb@cs.queensu.ca> - 2012-12-13 17:17 -0500
              Re: Unicode escapes and String literals? David Lamb <dalamb@cs.queensu.ca> - 2012-12-13 17:19 -0500
              Re: Unicode escapes and String literals? Lew <lewbloch@gmail.com> - 2012-12-13 17:11 -0800
            Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-13 19:38 -0500
    Re: Unicode escapes and String literals? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-12-13 11:46 -0800
      Re: Unicode escapes and String literals? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-12-13 11:49 -0800
      Re: Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 14:55 -0800
        Re: Unicode escapes and String literals? markspace <-@.> - 2012-12-13 15:32 -0800
    Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-13 18:09 -0500
      Re: Unicode escapes and String literals? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-12-13 15:52 -0800
        Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-13 19:40 -0500
      Re: Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 16:11 -0800
        Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-13 19:43 -0500
          Re: Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 17:08 -0800
    Re: Unicode escapes and String literals? Roedy Green <see_website@mindprod.com.invalid> - 2012-12-14 02:28 -0800
      Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-14 21:05 -0500
    Re: Unicode escapes and String literals? Roedy Green <see_website@mindprod.com.invalid> - 2012-12-17 02:42 -0800



#20310

From: Knute Johnson <nospam@knutejohnson.com>
Date: 2012-12-13 16:11 -0800
Message-ID: <kadqs2$3m9$1@dont-email.me>
In reply to: #20307

On 12/13/2012 3:09 PM, Arne Vajhøj wrote:
> On 12/13/2012 12:31 PM, Knute Johnson wrote:
>> I just had a great revelation as I was putting together my SSCCE
>> for the question I was going to ask.  So it has changed my
>> question.  How do I do the conversion of unicode escape sequences
>> to a String that are done by string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in
>> it without using the literal it stays \u0066\u0065\u0064.  Is there
>> a built in mechanism in Java for doing that translation to a
>> String?
>
> I don't think there is anything built in.
>
> But it is trivial to code.
>
> This was posted just a few months back:
>
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
>
> public class Unescape {
>     private static final Pattern p = Pattern.compile("\\\\u([0-9A-F]{4})");
>     public static String U2U(String s) {
>         //String res = s;
>         //Matcher m = p.matcher(res);
>         //while (m.find()) {
>         //  res = res.replaceAll("\\" + m.group(0), Character.toString((char) Integer.parseInt(m.group(1), 16)));
>         //}
>         //return res;
>         Matcher m = p.matcher(s);
>         StringBuffer res = new StringBuffer();
>         while (m.find()) {
>             m.appendReplacement(res, Character.toString((char) Integer.parseInt(m.group(1), 16)));
>         }
>         m.appendTail(res);
>         return res.toString();
>     }
>     public static void main(String[] args) {
>         System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));
>     }
> }
>
> Arne

Well, brilliant minds think alike.  Where were you when I asked the 
first time :-).  I don't remember a thread on this going by but that's 
getting harder to do all the time.  I originally had String.valueOf() 
instead of Character.toString().  I think the latter is better but not 
sure if it makes any difference.  Could be a non-trivial Unicode gotcha 
eh Daniel?
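
On the Character.toString() versus String.valueOf() question: for a single char both return the same one-character String, so the choice is a matter of style. A small check, using a hypothetical class and helper name that are not from the thread, also suggests that a supplementary character written as two escapes survives being decoded one half at a time:

```java
final class CharToStringCheck {
    /** Decodes four hex digits into the UTF-16 code unit they name. */
    static char decode(String hex) {
        return (char) Integer.parseInt(hex, 16);
    }

    public static void main(String[] args) {
        char c = decode("0066");
        // Both calls build a one-character String from the same char.
        System.out.println(Character.toString(c).equals(String.valueOf(c)));

        // U+1F600 is written as the surrogate pair D83D DE00; decoding each
        // half as a char and concatenating restores the code point.
        String pair = Character.toString(decode("D83D")) + String.valueOf(decode("DE00"));
        System.out.println(Integer.toHexString(pair.codePointAt(0)));
    }
}
```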

Thanks everybody.


import java.util.regex.*;

public class test6 {
     public static void main(String[] args) {
         String clear = "byte me!";
         System.out.println(clear);
         String escpd = unicodeEscape(clear);
         System.out.println(escpd);

         Pattern p = Pattern.compile("\\\\u([0-9a-fA-F]{4})");
         Matcher m = p.matcher(escpd);

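         StringBuffer buf = new StringBuffer();
         while (m.find()) {
             String repl =
              Character.toString((char)Integer.parseInt(m.group(1),16));
             // quoteReplacement keeps a decoded '$' or '\' from being
             // treated as a group reference or escape in the replacement
             m.appendReplacement(buf,Matcher.quoteReplacement(repl));
         }
         m.appendTail(buf);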

         System.out.println(buf);
     }

     public static String unicodeEscape(char c) {
         return String.format("\\u%04x",(int)c);
     }

     public static String unicodeEscape(Character c) {
         if (c == null)
             return null;

         return unicodeEscape(c.charValue());
     }

     public static String unicodeEscape(String str) {
         StringBuilder buf = new StringBuilder();
         for (int i=0; i<str.length(); i++)
             buf.append(unicodeEscape(str.charAt(i)));

         return buf.toString();
     }
}

C:\Documents and Settings\Knute Johnson>java test6
byte me!
\u0062\u0079\u0074\u0065\u0020\u006d\u0065\u0021
byte me!
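
One thing to watch with appendReplacement(): the replacement string gives special meaning to '$' and '\', so an escape that decodes to either of those characters needs to go through Matcher.quoteReplacement() first. The following is a minimal sketch of the same regex approach packaged as a method; the class and method names are made up for illustration, and it only handles the plain backslash-u-plus-four-hex-digits form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeUnescape {
    // A backslash, the letter u, then exactly four hex digits (captured).
    private static final Pattern ESCAPE =
        Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // Quote the replacement so a decoded '$' or backslash is
            // inserted literally instead of being interpreted.
            m.appendReplacement(out,
                Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u0066\\u0065\\u0064")); // fed
        System.out.println(unescape("cost: \\u00245"));        // cost: $5
    }
}
```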

-- 

Knute Johnson



#20313

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-12-13 19:43 -0500
Message-ID: <50ca763e$0$287$14726298@news.sunsite.dk>
In reply to: #20310

On 12/13/2012 7:11 PM, Knute Johnson wrote:
> On 12/13/2012 3:09 PM, Arne Vajhøj wrote:
>> On 12/13/2012 12:31 PM, Knute Johnson wrote:
>>> I just had a great revelation as I was putting together my SSCCE
>>> for the question I was going to ask.  So it has changed my
>>> question.  How do I do the conversion of unicode escape sequences
>>> to a String that are done by string literals?
>>>
>>> String s = "\u0066\u0065\u0064";
>>>
>>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in
>>> it without using the literal it stays \u0066\u0065\u0064.  Is there
>>> a built in mechanism in Java for doing that translation to a
>>> String?
>>
>> I don't think there is anything built in.
>>
>> But it is trivial to code.
>>
>> This was posted just a few months back:

> Well, brilliant minds think alike.   Where were you when I asked the
> first time :-).  I don't remember a thread on this going by but that's
> getting harder to do all the time.

I am pretty sure that it was here that I posted the
code, but with the commented-out implementation, and
that someone (Daniel?) suggested the new implementation
as an improvement.

Arne



#20315

From: Knute Johnson <nospam@knutejohnson.com>
Date: 2012-12-13 17:08 -0800
Message-ID: <kadu73$kn7$1@dont-email.me>
In reply to: #20313

On 12/13/2012 4:43 PM, Arne Vajhøj wrote:
> On 12/13/2012 7:11 PM, Knute Johnson wrote:
>> On 12/13/2012 3:09 PM, Arne Vajhøj wrote:
>>> On 12/13/2012 12:31 PM, Knute Johnson wrote:
>>>> I just had a great revelation as I was putting together my SSCCE
>>>> for the question I was going to ask.  So it has changed my
>>>> question.  How do I do the conversion of unicode escape sequences
>>>> to a String that are done by string literals?
>>>>
>>>> String s = "\u0066\u0065\u0064";
>>>>
>>>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in
>>>> it without using the literal it stays \u0066\u0065\u0064.  Is there
>>>> a built in mechanism in Java for doing that translation to a
>>>> String?
>>>
>>> I don't think there is anything built in.
>>>
>>> But it is trivial to code.
>>>
>>> This was posted just a few months back:
>
>> Well, brilliant minds think alike.   Where were you when I asked the
>> first time :-).  I don't remember a thread on this going by but that's
>> getting harder to do all the time.
>
> I am pretty sure that it was here that I posted the
> code, but with the out commented implementation and
> that someone (Daniel?) suggested the new implementation
> as an improvement.
>
> Arne
>

Well I appreciate everybody's help.  It was driving me nuts for two days.

-- 

Knute Johnson



#20321

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2012-12-14 02:28 -0800
Message-ID: <3mvlc8tm36j2nrnunpl6dc3rr37jvf0p6t@4ax.com>
In reply to: #20294

On Thu, 13 Dec 2012 09:31:18 -0800, Knute Johnson
<nospam@knutejohnson.com> wrote, quoted or indirectly quoted someone
who said :

>I just had a great revelation as I was putting together my SSCCE for the 
>question I was going to ask.  So it has changed my question.  How do I 
>do the conversion of unicode escape sequences to a String that are done 
>by string literals?
>
>String s = "\u0066\u0065\u0064";
>
>becomes "fed" but if you create a String with \u0066\u0065\u0064 in it 
>without using the literal it stays \u0066\u0065\u0064.  Is there a built 
>in mechanism in Java for doing that translation to a String?

have a look at native2ascii

IIRC it uses sequences like that in its ASCII representation which you
can then convert to any encoding you like.

see http://mindprod.com/jgloss/encoding.html#NATIVE2ASCII

A little finite state machine should handle that fairly easily.
If you find that difficult, I would write one for you.
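
A finite state machine for this could look roughly like the sketch below. The class, enum, and method names are hypothetical, it handles only the plain backslash-u-plus-four-hex-digits form, and it does not try to follow the compiler's rules about repeated u's or backslash parity. Anything that turns out not to be a complete escape is copied through unchanged.

```java
final class EscapeStateMachine {
    private enum State { TEXT, BACKSLASH, HEX }

    /** Returns the value of an ASCII hex digit, or -1 if c is not one. */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        StringBuilder pending = new StringBuilder(); // raw text of a possible escape
        State state = State.TEXT;
        int value = 0;
        int digits = 0;
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            boolean consumed = true;
            switch (state) {
                case TEXT:
                    if (c == '\\') { pending.append(c); state = State.BACKSLASH; }
                    else { out.append(c); }
                    break;
                case BACKSLASH:
                    if (c == 'u') {
                        pending.append(c); value = 0; digits = 0; state = State.HEX;
                    } else {
                        consumed = false; // not an escape after all
                    }
                    break;
                case HEX:
                    int d = hexValue(c);
                    if (d < 0) {
                        consumed = false; // fewer than four hex digits
                    } else {
                        pending.append(c);
                        value = value * 16 + d;
                        if (++digits == 4) {
                            out.append((char) value);
                            pending.setLength(0);
                            state = State.TEXT;
                        }
                    }
                    break;
            }
            if (consumed) {
                i++;
            } else {
                // Emit the partial escape literally and re-read c as plain text.
                out.append(pending);
                pending.setLength(0);
                state = State.TEXT;
            }
        }
        out.append(pending); // an unfinished escape at the end is kept as-is
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u0062\\u0079\\u0074\\u0065\\u0020\\u006d\\u0065\\u0021"));
    }
}
```

Compared with the regex version, this makes a single pass with no replacement-string quoting to worry about, at the cost of more code.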

-- 
Roedy Green Canadian Mind Products http://mindprod.com
Students who hire or con others to do their homework are as foolish 
as couch potatoes who hire others to go to the gym for them. 



#20337

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-12-14 21:05 -0500
Message-ID: <50cbdb02$0$294$14726298@news.sunsite.dk>
In reply to: #20321

On 12/14/2012 5:28 AM, Roedy Green wrote:
> On Thu, 13 Dec 2012 09:31:18 -0800, Knute Johnson
> <nospam@knutejohnson.com> wrote, quoted or indirectly quoted someone
> who said :
>
>> I just had a great revelation as I was putting together my SSCCE for the
>> question I was going to ask.  So it has changed my question.  How do I
>> do the conversion of unicode escape sequences to a String that are done
>> by string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>> without using the literal it stays \u0066\u0065\u0064.  Is there a built
>> in mechanism in Java for doing that translation to a String?
>
> have a look at native2ascii
>
> IIRC it uses sequences like that in its ASCII representation which you
> can then convert to any encoding you like.
>
> see http://mindprod.com/jgloss/encoding.html#NATIVE2ASCII

First: it does not do what Knute asked for. It actually
generates the escape sequences that Knute is trying to
convert from.

Second: even if it had done what Knute asked for, then:
- create a file with the String
- use Runtime exec (or ProcessBuilder) to run native2ascii
- read a new String from the new file
seems about the least efficient solution possible.

> A little finite state machine should handle that fairly easily.
> If you find that difficult, I would write one for you.

Based on the above: hmmmmmmm.

Arne



#20395

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2012-12-17 02:42 -0800
Message-ID: <ojttc8pn6a66gaj9j45cegbum067u48jql@4ax.com>
In reply to: #20294

On Thu, 13 Dec 2012 09:31:18 -0800, Knute Johnson
<nospam@knutejohnson.com> wrote, quoted or indirectly quoted someone
who said :

>I just had a great revelation as I was putting together my SSCCE for the 
>question I was going to ask.  So it has changed my question.  How do I 
>do the conversion of unicode escape sequences to a String that are done 
>by string literals?

The code you want exists inside Quoter.

See the FromJavaStringLiteral
and ToJavaStringLiteral classes.

Source is available from http://mindprod.com/products.html#QUOTER
You can play with it as an Applet at
http://mindprod.com/applet/quoter.html
-- 
Roedy Green Canadian Mind Products http://mindprod.com
Students who hire or con others to do their homework are as foolish 
as couch potatoes who hire others to go to the gym for them. 
