Path: csiph.com!usenet.pasdenom.info!gegeweb.org!eternal-september.org!feeder.eternal-september.org!mx04.eternal-september.org!.POSTED!not-for-mail
From: Knute Johnson <nospam@knutejohnson.com>
Newsgroups: comp.lang.java.programmer
Subject: Re: Unicode escapes and String literals?
Date: Thu, 13 Dec 2012 16:11:46 -0800
Organization: A noiseless patient Spider
Lines: 101
Message-ID: <kadqs2$3m9$1@dont-email.me>
References: <kad3d6$eoo$1@dont-email.me> <50ca6046$0$284$14726298@news.sunsite.dk>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 14 Dec 2012 00:11:46 +0000 (UTC)
Injection-Info: mx04.eternal-september.org; posting-host="9b3fcb0d22708969e4dc99e7aa0ef1f9"; logging-data="3785"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18hI8clNJlXszhkHop78xXm"
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/17.0 Thunderbird/17.0
In-Reply-To: <50ca6046$0$284$14726298@news.sunsite.dk>
Cancel-Lock: sha1:18cOHbU3/WI3u5ziKiCQzxGiwz8=
Xref: csiph.com comp.lang.java.programmer:20310

On 12/13/2012 3:09 PM, Arne Vajhøj wrote:
> On 12/13/2012 12:31 PM, Knute Johnson wrote:
>> I just had a great revelation as I was putting together my SSCCE
>> for the question I was going to ask.  So it has changed my
>> question.  How do I do the conversion of unicode escape sequences
>> to a String that are done by string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in
>> it without using the literal it stays \u0066\u0065\u0064.  Is there
>> a built in mechanism in Java for doing that translation to a
>> String?
>
> I don't think there is anything built in.
>
> But it is trivial to code.
>
> This was posted just a few months back:
>
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
>
> public class Unescape {
>     private static final Pattern p = Pattern.compile("\\\\u([0-9A-F]{4})");
>
>     public static String U2U(String s) {
>         //String res = s;
>         //Matcher m = p.matcher(res);
>         //while (m.find()) {
>         //  res = res.replaceAll("\\" + m.group(0), Character.toString((char) Integer.parseInt(m.group(1), 16)));
>         //}
>         //return res;
>         Matcher m = p.matcher(s);
>         StringBuffer res = new StringBuffer();
>         while (m.find()) {
>             m.appendReplacement(res, Character.toString((char) Integer.parseInt(m.group(1), 16)));
>         }
>         m.appendTail(res);
>         return res.toString();
>     }
>
>     public static void main(String[] args) {
>         System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));
>     }
> }
>
> Arne

Well, brilliant minds think alike.  Where were you when I asked the
first time :-).  I don't remember a thread on this going by, but
remembering is getting harder all the time.  I originally had
String.valueOf() instead of Character.toString().  I think the latter
is better, but I'm not sure it makes any difference.  Could be a
non-trivial Unicode gotcha, eh Daniel?
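
On the String.valueOf() versus Character.toString() question, here is a
quick check, offered only as a sketch.  The class and method names are
made up for illustration.  It compares the two calls for one decoded
char, and also decodes a surrogate pair written as two escapes to see
whether the char-by-char approach loses anything for characters outside
the BMP.

```java
public class CharToStringSketch {
    // Hypothetical helper: decode four hex digits with String.valueOf.
    static String viaValueOf(String hex) {
        return String.valueOf((char) Integer.parseInt(hex, 16));
    }

    // Hypothetical helper: decode four hex digits with Character.toString.
    static String viaToString(String hex) {
        return Character.toString((char) Integer.parseInt(hex, 16));
    }

    // Decode two escapes (high and low surrogate) one char at a time.
    static String decodePair(String highHex, String lowHex) {
        return viaToString(highHex) + viaToString(lowHex);
    }

    public static void main(String[] args) {
        // Both calls yield the same one-character String.
        System.out.println(viaValueOf("0066").equals(viaToString("0066"))); // true

        // The two halves join back into a single code point.
        String pair = decodePair("d83d", "de00");
        System.out.println(pair.codePointCount(0, pair.length()));    // 1
        System.out.println(Integer.toHexString(pair.codePointAt(0))); // 1f600
    }
}
```

So for this use the two calls look interchangeable, and surrogate pairs
survive because each half is still just one char.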

Thanks everybody.


import java.util.regex.*;

public class test6 {
     public static void main(String[] args) {
         String clear = "byte me!";
         System.out.println(clear);
         String escpd = unicodeEscape(clear);
         System.out.println(escpd);

         Pattern p = Pattern.compile("\\\\u([0-9a-fA-F]{4})");
         Matcher m = p.matcher(escpd);

         StringBuffer buf = new StringBuffer();
         while (m.find()) {
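             // quoteReplacement keeps a decoded '$' or '\' from being
             // treated specially by appendReplacement
             String repl = Matcher.quoteReplacement(
              Character.toString((char)Integer.parseInt(m.group(1),16)));
             m.appendReplacement(buf,repl);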
         }
         m.appendTail(buf);

         System.out.println(buf);
     }

     public static String unicodeEscape(char c) {
         return String.format("\\u%04x",(int)c);
     }

     public static String unicodeEscape(Character c) {
         if (c == null)
             return null;

         return unicodeEscape(c.charValue());
     }

     public static String unicodeEscape(String str) {
         StringBuilder buf = new StringBuilder();
         for (int i=0; i<str.length(); i++)
             buf.append(unicodeEscape(str.charAt(i)));

         return buf.toString();
     }
}

C:\Documents and Settings\Knute Johnson>java test6
byte me!
\u0062\u0079\u0074\u0065\u0020\u006d\u0065\u0021
byte me!
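
One thing to watch when reusing this kind of loop: appendReplacement()
gives '$' and backslash a special meaning in the replacement string, so
an escape that decodes to either of them needs
Matcher.quoteReplacement().  Below is a minimal sketch of the
unescaping pulled out into its own method.  The class name
UnescapeSketch and the method name unescape are assumptions for
illustration, not anything from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeSketch {
    // Same idea as the pattern in the post: backslash, 'u', four hex digits.
    private static final Pattern ESCAPE =
        Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Hypothetical helper: replaces each escape with the char it names.
    public static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            String decoded =
                String.valueOf((char) Integer.parseInt(m.group(1), 16));
            // Quote the replacement so a decoded '$' or backslash is
            // appended literally instead of being interpreted.
            m.appendReplacement(buf, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(buf);
        return buf.toString();
    }

    public static void main(String[] args) {
        // The escapes here decode to '$' and to a backslash.
        System.out.println(unescape("cost: \\u00245, path: C:\\u005ctemp"));
    }
}
```

With the quoting in place, the sample line should print
cost: $5, path: C:\temp rather than failing inside appendReplacement().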

-- 

Knute Johnson