


Re: Unicode escapes and String literals?

From: Knute Johnson <nospam@knutejohnson.com>
Newsgroups: comp.lang.java.programmer
Subject: Re: Unicode escapes and String literals?
Date: 2012-12-13 16:11 -0800
Organization: A noiseless patient Spider
Message-ID: <kadqs2$3m9$1@dont-email.me>
References: <kad3d6$eoo$1@dont-email.me> <50ca6046$0$284$14726298@news.sunsite.dk>



On 12/13/2012 3:09 PM, Arne Vajhøj wrote:
> On 12/13/2012 12:31 PM, Knute Johnson wrote:
>> I just had a great revelation as I was putting together my SSCCE
>> for the question I was going to ask.  So it has changed my
>> question.  How do I do the conversion of unicode escape sequences
>> to a String that are done by string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in
>> it without using the literal it stays \u0066\u0065\u0064.  Is there
>> a built in mechanism in Java for doing that translation to a
>> String?
>
> I don't think there is anything built in.
>
> But it is trivial to code.
>
> This was posted just a few months back:
>
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
>
> public class Unescape {
>     private static final Pattern p = Pattern.compile("\\\\u([0-9A-F]{4})");
>     public static String U2U(String s) {
>         //String res = s;
>         //Matcher m = p.matcher(res);
>         //while (m.find()) {
>         //  res = res.replaceAll("\\" + m.group(0),
>         //      Character.toString((char) Integer.parseInt(m.group(1), 16)));
>         //}
>         //return res;
>         Matcher m = p.matcher(s);
>         StringBuffer res = new StringBuffer();
>         while (m.find()) {
>             m.appendReplacement(res,
>                 Character.toString((char) Integer.parseInt(m.group(1), 16)));
>         }
>         m.appendTail(res);
>         return res.toString();
>     }
>     public static void main(String[] args) {
>         System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));
>     }
> }
>
> Arne

Well, brilliant minds think alike.  Where were you when I asked the
first time? :-)  I don't remember a thread on this going by, but
remembering is getting harder all the time.  I originally had
String.valueOf() instead of Character.toString().  I think the latter
is better, but I'm not sure it makes any difference.  Could be a
non-trivial Unicode gotcha, eh Daniel?
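
On the String.valueOf() versus Character.toString() question, here is a minimal sketch suggesting that the two give the same result for a single char, and that a supplementary character written as two \u escapes (a surrogate pair) still comes out as one code point once the two decoded chars sit next to each other. The class and method names (CharToStringDemo, viaCharacter, viaValueOf) are made up for illustration and are not from the thread.

```java
public class CharToStringDemo {
    // Decode the four hex digits of one escape (e.g. "0066") via Character.toString.
    public static String viaCharacter(String hex) {
        return Character.toString((char) Integer.parseInt(hex, 16));
    }

    // Same decoding, but via String.valueOf.
    public static String viaValueOf(String hex) {
        return String.valueOf((char) Integer.parseInt(hex, 16));
    }

    public static void main(String[] args) {
        // Both calls build a one-char String with the same content.
        System.out.println(viaCharacter("0066").equals(viaValueOf("0066")));

        // U+1F600 lies outside the BMP, so it is escaped as the pair D83D DE00.
        String pair = viaCharacter("d83d") + viaCharacter("de00");
        System.out.println(pair.length());                           // two UTF-16 code units
        System.out.println(pair.codePointCount(0, pair.length()));   // one code point
        System.out.println(Integer.toHexString(pair.codePointAt(0)));
    }
}
```

So for this use the choice between the two calls looks like a matter of taste; the escapes are UTF-16 code units either way.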

Thanks everybody.


import java.util.regex.*;

public class test6 {
     public static void main(String[] args) {
         String clear = "byte me!";
         System.out.println(clear);
         String escpd = unicodeEscape(clear);
         System.out.println(escpd);

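         // Match a backslash, 'u', and exactly four hex digits (group 1).
         Pattern p = Pattern.compile("\\\\u([0-9a-fA-F]{4})");
         Matcher m = p.matcher(escpd);

         StringBuffer buf = new StringBuffer();
         while (m.find()) {
             // Decode the hex digits into a single UTF-16 code unit.
             String repl =
              Character.toString((char)Integer.parseInt(m.group(1),16));
             m.appendReplacement(buf,repl);
         }
         // Copy whatever follows the last match.
         m.appendTail(buf);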

         System.out.println(buf);
     }

     public static String unicodeEscape(char c) {
         return String.format("\\u%04x",(int)c);
     }

     public static String unicodeEscape(Character c) {
         if (c == null)
             return null;

         return unicodeEscape(c.charValue());
     }

     public static String unicodeEscape(String str) {
         StringBuilder buf = new StringBuilder();
         for (int i=0; i<str.length(); i++)
             buf.append(unicodeEscape(str.charAt(i)));

         return buf.toString();
     }
}

C:\Documents and Settings\Knute Johnson>java test6
byte me!
\u0062\u0079\u0074\u0065\u0020\u006d\u0065\u0021
byte me!
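
One possible gotcha with the loop above: Matcher.appendReplacement() treats '$' and '\' in the replacement string specially, so an input containing \u0024 or \u005c would decode to a replacement that the matcher rejects. A hedged sketch of one way around that is to wrap the decoded text in Matcher.quoteReplacement(). The class name UnescapeSketch and method name unescape are hypothetical, and this does not try to reproduce everything the compiler does with escapes (for example, repeated 'u' characters or counting preceding backslashes).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeSketch {
    // A backslash, 'u', then exactly four hex digits captured as group 1.
    private static final Pattern ESCAPE =
        Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    public static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String decoded =
                String.valueOf((char) Integer.parseInt(m.group(1), 16));
            // quoteReplacement makes '$' and '\' literal in the replacement.
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u0066\\u0065\\u0064"));
        // Decodes to a dollar sign and a backslash without throwing.
        System.out.println(unescape("cost: \\u00245 and a \\u005c"));
    }
}
```

The first call should print fed; the second exercises the two characters that would otherwise trip up appendReplacement().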

-- 

Knute Johnson



