From: "Arne Vajhøj"
To: Daniel Pitts
Subject: Re: retriving escape unicode sequences from files ...
Newsgroups: comp.lang.java.programmer
Date: Thu, 09 Aug 2012 18:44:31 GMT
Message-ID: <5023FE37.56456.calajapr@time.synchro.net>
Organization: tds.net

On 8/3/2012 11:49 PM, Daniel Pitts wrote:
> On 8/3/12 5:37 PM, Arne Vajhoj wrote:
>> On 8/2/2012 11:52 PM, qwertmonkey@syberianoutpost.ru wrote:
>>> Why is it that if you save a unicode sequence in a file, say
>>> "français"
>>> ~
>>> \u0066\u0072\u0061\u006e\u00e7\u0061\u0069\u0073
>>> ~
>>> and then retrieve as a String you can't then convert it back to a
>>> UTF-8 String
>>> ~
>>
>> Some code from my shelf:
>>
>> import java.util.regex.Matcher;
>> import java.util.regex.Pattern;
>>
>> public class Unescape {
>>     private static final Pattern p =
>>         Pattern.compile("\\\\u([0-9A-F]{4})");
>>     public static String U2U(String s) {
>>         String res = s;
>>         Matcher m = p.matcher(res);
>>         while(m.find()) {
>>             res = res.replaceAll("\\" + m.group(0),
>>                 Character.toString((char)Integer.parseInt(m.group(1), 16)));
>>         }
>>         return res;
>>     }
>>     public static void main(String[] args) {
>>         System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));
>>     }
>> }

> And if you wanted this to be efficient, you'd use appendReplacement
> instead of res.replaceAll()

I did not even know that existed.

So:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Unescape {
    // A literal backslash, the letter u, and four hex digits in either case.
    private static final Pattern p = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");
    public static String U2U(String s) {
        Matcher m = p.matcher(s);
        StringBuffer res = new StringBuffer();
        while (m.find()) {
            String ch = Character.toString((char) Integer.parseInt(m.group(1), 16));
            // quoteReplacement keeps a decoded backslash or dollar sign from
            // being interpreted as an escape or group reference.
            m.appendReplacement(res, Matcher.quoteReplacement(ch));
        }
        m.appendTail(res);
        return res.toString();
    }
    public static void main(String[] args) {
        System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));
    }
}

Arne

-+- BBBS/Li6 v4.10 Dada-1
 + Origin: Prism bbs (1:261/38)
-+- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
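Background to the original question: Java translates \uXXXX escapes only when the compiler reads source code, so text read from a file at runtime keeps each escape as six literal characters, and something like U2U is needed to decode them before the String can be encoded as UTF-8. Below is a minimal sketch of the same decoding done with a hand-written scanner instead of a regex, followed by the UTF-8 conversion. The class and method names (UnescapeDemo, unescape, isHex) are hypothetical, and it assumes that incomplete or malformed escapes should simply be copied through unchanged.

```java
import java.nio.charset.StandardCharsets;

public class UnescapeDemo {
    // True only for ASCII hex digits (Character.digit would also accept
    // other Unicode digit forms, which we do not want here).
    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Decodes backslash-u escapes followed by four hex digits.
    // Anything that is not a complete escape is copied as-is (an assumption).
    public static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) == '\\' && i + 5 < s.length() && s.charAt(i + 1) == 'u'
                    && isHex(s.charAt(i + 2)) && isHex(s.charAt(i + 3))
                    && isHex(s.charAt(i + 4)) && isHex(s.charAt(i + 5))) {
                sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append(s.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // As the text would arrive from a file: literal backslashes, not compiler escapes.
        String raw = "\\u0066\\u0072\\u0061\\u006e\\u00e7\\u0061\\u0069\\u0073";
        String text = unescape(raw);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(text.length() + " chars, " + utf8.length + " UTF-8 bytes");
    }
}
```

Running main prints "8 chars, 9 UTF-8 bytes": the decoded String is "français", and the ç needs two bytes in UTF-8. The character count is printed instead of the word itself to avoid depending on the console encoding.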