Groups > comp.lang.java.programmer > #20294 > unrolled thread
| Started by | Knute Johnson <nospam@knutejohnson.com> |
|---|---|
| First post | 2012-12-13 09:31 -0800 |
| Last post | 2012-12-17 02:42 -0800 |
| Articles | 6 on this page of 26 — 9 participants |
Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 09:31 -0800
Re: Unicode escapes and String literals? Thomas Richter <thor@math.tu-berlin.de> - 2012-12-13 18:51 +0100
Re: Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 10:47 -0800
Re: Unicode escapes and String literals? Lew <lewbloch@gmail.com> - 2012-12-13 11:41 -0800
Re: Unicode escapes and String literals? rossum <rossum48@coldmail.com> - 2012-12-14 13:32 +0000
Re: Unicode escapes and String literals? Lew <lewbloch@gmail.com> - 2012-12-14 15:16 -0800
Re: Unicode escapes and String literals? markspace <-@.> - 2012-12-13 12:58 -0800
Re: Unicode escapes and String literals? David Lamb <dalamb@cs.queensu.ca> - 2012-12-13 16:21 -0500
Re: Unicode escapes and String literals? markspace <-@.> - 2012-12-13 14:00 -0800
Re: Unicode escapes and String literals? David Lamb <dalamb@cs.queensu.ca> - 2012-12-13 17:17 -0500
Re: Unicode escapes and String literals? David Lamb <dalamb@cs.queensu.ca> - 2012-12-13 17:19 -0500
Re: Unicode escapes and String literals? Lew <lewbloch@gmail.com> - 2012-12-13 17:11 -0800
Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-13 19:38 -0500
Re: Unicode escapes and String literals? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-12-13 11:46 -0800
Re: Unicode escapes and String literals? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-12-13 11:49 -0800
Re: Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 14:55 -0800
Re: Unicode escapes and String literals? markspace <-@.> - 2012-12-13 15:32 -0800
Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-13 18:09 -0500
Re: Unicode escapes and String literals? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-12-13 15:52 -0800
Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-13 19:40 -0500
Re: Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 16:11 -0800
Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-13 19:43 -0500
Re: Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 17:08 -0800
Re: Unicode escapes and String literals? Roedy Green <see_website@mindprod.com.invalid> - 2012-12-14 02:28 -0800
Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-14 21:05 -0500
Re: Unicode escapes and String literals? Roedy Green <see_website@mindprod.com.invalid> - 2012-12-17 02:42 -0800
| From | Knute Johnson <nospam@knutejohnson.com> |
|---|---|
| Date | 2012-12-13 16:11 -0800 |
| Message-ID | <kadqs2$3m9$1@dont-email.me> |
| In reply to | #20307 |
On 12/13/2012 3:09 PM, Arne Vajhøj wrote:
> On 12/13/2012 12:31 PM, Knute Johnson wrote:
>> I just had a great revelation as I was putting together my SSCCE
>> for the question I was going to ask. So it has changed my
>> question. How do I do the conversion of unicode escape sequences
>> to a String that are done by string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in
>> it without using the literal it stays \u0066\u0065\u0064. Is there
>> a built in mechanism in Java for doing that translation to a
>> String?
>
> I don't think there is anything built in.
>
> But it is trivial to code.
>
> This was posted just a few months back:
>
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
>
> public class Unescape {
>     private static final Pattern p =
>         Pattern.compile("\\\\u([0-9A-F]{4})");
>     public static String U2U(String s) {
>         //String res = s;
>         //Matcher m = p.matcher(res);
>         //while (m.find()) {
>         //    res = res.replaceAll("\\" + m.group(0),
>         //        Character.toString((char) Integer.parseInt(m.group(1), 16)));
>         //}
>         //return res;
>         Matcher m = p.matcher(s);
>         StringBuffer res = new StringBuffer();
>         while (m.find()) {
>             m.appendReplacement(res,
>                 Character.toString((char) Integer.parseInt(m.group(1), 16)));
>         }
>         m.appendTail(res);
>         return res.toString();
>     }
>     public static void main(String[] args) {
>         System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));
>     }
> }
>
> Arne
Well, brilliant minds think alike. Where were you when I asked the
first time :-). I don't remember a thread on this going by but that's
getting harder to do all the time. I originally had String.valueOf()
instead of Character.toString(). I think the latter is better but not
sure if it makes any difference. Could be a non-trivial Unicode gotcha
eh Daniel?
Thanks everybody.
import java.util.regex.*;

public class test6 {
    public static void main(String[] args) {
        String clear = "byte me!";
        System.out.println(clear);

        // Turn every char into a six-character Unicode escape.
        String escpd = unicodeEscape(clear);
        System.out.println(escpd);

        // Decode the escapes again: group 1 holds the four hex digits.
        Pattern p = Pattern.compile("\\\\u([0-9a-fA-F]{4})");
        Matcher m = p.matcher(escpd);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            String repl =
                Character.toString((char)Integer.parseInt(m.group(1),16));
            // quoteReplacement() stops a decoded '$' or '\' from being
            // interpreted as a group reference by appendReplacement().
            m.appendReplacement(buf,Matcher.quoteReplacement(repl));
        }
        m.appendTail(buf);
        System.out.println(buf);
    }

    public static String unicodeEscape(char c) {
        return String.format("\\u%04x",(int)c);
    }

    public static String unicodeEscape(Character c) {
        if (c == null)
            return null;
        return unicodeEscape(c.charValue());
    }

    public static String unicodeEscape(String str) {
        StringBuilder buf = new StringBuilder();
        for (int i=0; i<str.length(); i++)
            buf.append(unicodeEscape(str.charAt(i)));
        return buf.toString();
    }
}
C:\Documents and Settings\Knute Johnson>java test6
byte me!
\u0062\u0079\u0074\u0065\u0020\u006d\u0065\u0021
byte me!
--
Knute Johnson
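
The decode loop discussed in this post can be pulled out into a reusable method. What follows is only a sketch under a few assumptions: the class name UnescapeSketch and the method name unescape are made up for illustration, the pattern is widened to accept a run of several 'u' characters (which the Java language itself permits in source escapes), and the replacement goes through Matcher.quoteReplacement so that a decoded '$' or backslash is appended literally. It also prints a comparison of String.valueOf(char) and Character.toString(char), which as far as I know produce the same one-character String.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the thread.
final class UnescapeSketch {
    // A backslash, one or more 'u', then exactly four hex digits (captured).
    private static final Pattern ESCAPE =
        Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the four hex digits into a single UTF-16 code unit.
            String decoded =
                String.valueOf((char) Integer.parseInt(m.group(1), 16));
            // Quote the replacement so a decoded '$' or backslash is literal.
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u0066\\u0065\\u0064")); // prints: fed
        System.out.println(unescape("cost: \\u00245"));        // prints: cost: $5
        // Both calls build the same one-character String.
        System.out.println(
            String.valueOf('A').equals(Character.toString('A'))); // prints: true
    }
}
```

One known gap in this sketch: unlike the compiler, it does not check whether the backslash is itself escaped by a preceding backslash, so it decodes slightly more than a Java string literal would. Characters outside the Basic Multilingual Plane need no special handling here, since two consecutive escapes decode to the two halves of a surrogate pair.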
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2012-12-13 19:43 -0500 |
| Message-ID | <50ca763e$0$287$14726298@news.sunsite.dk> |
| In reply to | #20310 |
On 12/13/2012 7:11 PM, Knute Johnson wrote:
> On 12/13/2012 3:09 PM, Arne Vajhøj wrote:
>> On 12/13/2012 12:31 PM, Knute Johnson wrote:
>>> I just had a great revelation as I was putting together my SSCCE
>>> for the question I was going to ask. So it has changed my
>>> question. How do I do the conversion of unicode escape sequences
>>> to a String that are done by string literals?
>>>
>>> String s = "\u0066\u0065\u0064";
>>>
>>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in
>>> it without using the literal it stays \u0066\u0065\u0064. Is there
>>> a built in mechanism in Java for doing that translation to a
>>> String?
>>
>> I don't think there is anything built in.
>>
>> But it is trivial to code.
>>
>> This was posted just a few months back:
>
> Well, brilliant minds think alike. Where were you when I asked the
> first time :-). I don't remember a thread on this going by but that's
> getting harder to do all the time.

I am pretty sure that it was here that I posted the
code, but with the commented-out implementation, and
that someone (Daniel?) suggested the new implementation
as an improvement.

Arne
| From | Knute Johnson <nospam@knutejohnson.com> |
|---|---|
| Date | 2012-12-13 17:08 -0800 |
| Message-ID | <kadu73$kn7$1@dont-email.me> |
| In reply to | #20313 |
On 12/13/2012 4:43 PM, Arne Vajhøj wrote:
> On 12/13/2012 7:11 PM, Knute Johnson wrote:
>> On 12/13/2012 3:09 PM, Arne Vajhøj wrote:
>>> On 12/13/2012 12:31 PM, Knute Johnson wrote:
>>>> I just had a great revelation as I was putting together my SSCCE
>>>> for the question I was going to ask. So it has changed my
>>>> question. How do I do the conversion of unicode escape sequences
>>>> to a String that are done by string literals?
>>>>
>>>> String s = "\u0066\u0065\u0064";
>>>>
>>>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in
>>>> it without using the literal it stays \u0066\u0065\u0064. Is there
>>>> a built in mechanism in Java for doing that translation to a
>>>> String?
>>>
>>> I don't think there is anything built in.
>>>
>>> But it is trivial to code.
>>>
>>> This was posted just a few months back:
>
>> Well, brilliant minds think alike. Where were you when I asked the
>> first time :-). I don't remember a thread on this going by but that's
>> getting harder to do all the time.
>
> I am pretty sure that it was here that I posted the
> code, but with the commented-out implementation, and
> that someone (Daniel?) suggested the new implementation
> as an improvement.
>
> Arne
>

Well I appreciate everybody's help. It was driving me nuts for two days.

--
Knute Johnson
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2012-12-14 02:28 -0800 |
| Message-ID | <3mvlc8tm36j2nrnunpl6dc3rr37jvf0p6t@4ax.com> |
| In reply to | #20294 |
On Thu, 13 Dec 2012 09:31:18 -0800, Knute Johnson
<nospam@knutejohnson.com> wrote, quoted or indirectly quoted someone
who said :

>I just had a great revelation as I was putting together my SSCCE for the
>question I was going to ask. So it has changed my question. How do I
>do the conversion of unicode escape sequences to a String that are done
>by string literals?
>
>String s = "\u0066\u0065\u0064";
>
>becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>without using the literal it stays \u0066\u0065\u0064. Is there a built
>in mechanism in Java for doing that translation to a String?

have a look at native2ascii

IIRC it uses sequences like that in its ASCII representation which you
can then convert to any encoding you like.

see http://mindprod.com/jgloss/encoding.html#NATIVE2ASCII

A little finite state machine should handle that fairly easily.
If you find that difficult, I would write one for you.
--
Roedy Green Canadian Mind Products http://mindprod.com
Students who hire or con others to do their homework are as foolish
as couch potatoes who hire others to go to the gym for them.
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2012-12-14 21:05 -0500 |
| Message-ID | <50cbdb02$0$294$14726298@news.sunsite.dk> |
| In reply to | #20321 |
On 12/14/2012 5:28 AM, Roedy Green wrote:
> On Thu, 13 Dec 2012 09:31:18 -0800, Knute Johnson
> <nospam@knutejohnson.com> wrote, quoted or indirectly quoted someone
> who said :
>
>> I just had a great revelation as I was putting together my SSCCE for the
>> question I was going to ask. So it has changed my question. How do I
>> do the conversion of unicode escape sequences to a String that are done
>> by string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>> without using the literal it stays \u0066\u0065\u0064. Is there a built
>> in mechanism in Java for doing that translation to a String?
>
> have a look at native2ascii
>
> IIRC it uses sequences like that in its ASCII representation which you
> can then convert to any encoding you like.
>
> see http://mindprod.com/jgloss/encoding.html#NATIVE2ASCII

First: it does not do what Knute asked for. It actually
generates the escape sequences that Knute is trying to
convert from.

Second: even if it had done what Knute asked for, then:
- create a file with the String
- use Runtime exec (or ProcessBuilder) to run native2ascii
- read a new String from the new file
seems about the least efficient solution possible.

> A little finite state machine should handle that fairly easily.
> If you find that difficult, I would write one for you.

Based on the above: hmmmmmmm.

Arne
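
For comparison with the regex version, the "little finite state machine" idea can be approximated in memory, with no files or external process involved. This is a sketch only, not anybody's posted code: it is a hand-written single-pass scanner, not a formal state table, and the names EscapeScanner, decode and isHexDigit are invented for illustration. It assumes that anything which is not a well-formed escape should be copied through unchanged.

```java
// Hypothetical single-pass decoder; names are illustrative assumptions.
final class EscapeScanner {
    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length() && s.charAt(i + 1) == 'u') {
                // Skip the run of 'u' characters that follows the backslash.
                int j = i + 1;
                while (j < s.length() && s.charAt(j) == 'u') {
                    j++;
                }
                // Accept the escape only if four hex digits follow.
                if (j + 4 <= s.length() && allHex(s, j, j + 4)) {
                    out.append((char) Integer.parseInt(s.substring(j, j + 4), 16));
                    i = j + 4;
                    continue;
                }
            }
            // Ordinary character, or a malformed escape: copy it as-is.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    private static boolean allHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (!isHexDigit(s.charAt(k))) {
                return false;
            }
        }
        return true;
    }

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        System.out.println(decode("\\u0062\\u0079\\u0074\\u0065 me!")); // prints: byte me!
        // A malformed escape is left unchanged.
        System.out.println(decode("\\u00zz stays"));
    }
}
```

Whether this is preferable to the Pattern/Matcher loop is a matter of taste. It avoids the replacement-string quoting question entirely, at the cost of more code to read.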
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2012-12-17 02:42 -0800 |
| Message-ID | <ojttc8pn6a66gaj9j45cegbum067u48jql@4ax.com> |
| In reply to | #20294 |
On Thu, 13 Dec 2012 09:31:18 -0800, Knute Johnson
<nospam@knutejohnson.com> wrote, quoted or indirectly quoted someone
who said :

>I just had a great revelation as I was putting together my SSCCE for the
>question I was going to ask. So it has changed my question. How do I
>do the conversion of unicode escape sequences to a String that are done
>by string literals?

The code you want exists inside Quoter. see FromJavaStringLiteral and
ToJavaStringLiteral classes.

Source is available from http://mindprod.com/products.html#QUOTER

you can play with it as an Applet at
http://mindprod.com/applet/quoter.html
--
Roedy Green Canadian Mind Products http://mindprod.com
Students who hire or con others to do their homework are as foolish
as couch potatoes who hire others to go to the gym for them.