| From | Knute Johnson <nospam@knutejohnson.com> |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: Unicode escapes and String literals? |
| Date | 2012-12-13 16:11 -0800 |
| Organization | A noiseless patient Spider |
| Message-ID | <kadqs2$3m9$1@dont-email.me> |
| References | <kad3d6$eoo$1@dont-email.me> <50ca6046$0$284$14726298@news.sunsite.dk> |
On 12/13/2012 3:09 PM, Arne Vajhøj wrote:
> On 12/13/2012 12:31 PM, Knute Johnson wrote:
>> I just had a great revelation as I was putting together my SSCCE
>> for the question I was going to ask. So it has changed my
>> question. How do I do the conversion of unicode escape sequences
>> to a String that are done by string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in
>> it without using the literal it stays \u0066\u0065\u0064. Is there
>> a built in mechanism in Java for doing that translation to a
>> String?
>
> I don't think there is anything built in.
>
> But it is trivial to code.
>
> This was posted just a few months back:
>
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
>
> public class Unescape {
>     private static final Pattern p = Pattern.compile("\\\\u([0-9A-F]{4})");
>
>     public static String U2U(String s) {
>         //String res = s;
>         //Matcher m = p.matcher(res);
>         //while (m.find()) {
>         //    res = res.replaceAll("\\" + m.group(0), Character.toString((char) Integer.parseInt(m.group(1), 16)));
>         //}
>         //return res;
>         Matcher m = p.matcher(s);
>         StringBuffer res = new StringBuffer();
>         while (m.find()) {
>             m.appendReplacement(res, Character.toString((char) Integer.parseInt(m.group(1), 16)));
>         }
>         m.appendTail(res);
>         return res.toString();
>     }
>
>     public static void main(String[] args) {
>         System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));
>     }
> }
>
> Arne

Well, brilliant minds think alike. Where were you when I asked the
first time? :-) I don't remember a thread on this going by, but
remembering is getting harder all the time. I originally had
String.valueOf() instead of Character.toString(). I think the latter is
better, but I'm not sure it makes any difference. Could be a
non-trivial Unicode gotcha, eh Daniel?

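On the String.valueOf() versus Character.toString() question, here is a small sketch for checking it directly. The class name CharToStringSketch and the decode() helper are made up for illustration and are not from the thread. As far as I can tell, the two calls give the same one-character String when handed a char; the place where a difference could creep in is a dropped (char) cast, because String.valueOf(int) formats the number in decimal instead.

```java
// Hypothetical demo class; the names here are not from the thread.
final class CharToStringSketch {
    // Decode four hex digits to a char, the same way the thread's code does.
    static char decode(String hex) {
        return (char) Integer.parseInt(hex, 16);
    }

    public static void main(String[] args) {
        char c = decode("0066");

        // For a char argument both calls yield a one-character String.
        System.out.println(String.valueOf(c).equals(Character.toString(c)));

        // Without the (char) cast, String.valueOf(int) is chosen and the
        // result is the decimal text of the number, not the character.
        System.out.println(String.valueOf(Integer.parseInt("0066", 16)));
    }
}
```

With the cast in place the choice between the two looks like a matter of taste; Character.toString() arguably states the intent more plainly.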
Thanks everybody.

import java.util.regex.*;

public class test6 {
    public static void main(String[] args) {
        String clear = "byte me!";
        System.out.println(clear);

        // Escape every char, then decode the escapes again.
        String escpd = unicodeEscape(clear);
        System.out.println(escpd);

        // Matches a backslash, 'u', and exactly four hex digits.
        Pattern p = Pattern.compile("\\\\u([0-9a-fA-F]{4})");
        Matcher m = p.matcher(escpd);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            String repl =
                Character.toString((char) Integer.parseInt(m.group(1), 16));
            // appendReplacement treats '$' and '\' in the replacement
            // specially, so quote it in case the decoded char is one of them.
            m.appendReplacement(buf, Matcher.quoteReplacement(repl));
        }
        m.appendTail(buf);
        System.out.println(buf);
    }

    // Formats one char as a backslash-u escape with four lowercase hex digits.
    public static String unicodeEscape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static String unicodeEscape(Character c) {
        if (c == null)
            return null;
        return unicodeEscape(c.charValue());
    }

    // Escapes every char of the string, one UTF-16 code unit at a time.
    public static String unicodeEscape(String str) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < str.length(); i++)
            buf.append(unicodeEscape(str.charAt(i)));
        return buf.toString();
    }
}

C:\Documents and Settings\Knute Johnson>java test6
byte me!
\u0062\u0079\u0074\u0065\u0020\u006d\u0065\u0021
byte me!
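
One thing worth checking with the appendReplacement() approach in general: the replacement string is itself scanned for '$' (group references) and '\' (escapes), so an escape that decodes to either character can throw or come out wrong unless the replacement is quoted. The following is a minimal sketch of the decoding step as a reusable method, with Matcher.quoteReplacement() applied. The class name UnescapeSketch and the method name unescape() are hypothetical, chosen only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are not from the thread.
final class UnescapeSketch {
    // A backslash, 'u', and exactly four hex digits in either case.
    private static final Pattern ESCAPE =
        Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String decoded =
                String.valueOf((char) Integer.parseInt(m.group(1), 16));
            // quoteReplacement keeps a decoded '$' or '\' literal.
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u0066\\u0065\\u0064")); // fed
        System.out.println(unescape("cost: \\u00245"));       // cost: $5
        System.out.println(unescape("path\\u005cname"));      // path\name
    }
}
```

Because each escape is decoded to a single UTF-16 code unit and appended in order, a character outside the BMP written as two consecutive surrogate escapes should come back as the same surrogate pair.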
--
Knute Johnson