From: Daniel Pitts
Newsgroups: comp.lang.java.programmer
Subject: Re: retriving escape unicode sequences from files ...
Date: Sat, 04 Aug 2012 18:41:42 GMT

On 8/3/12 5:37 PM, Arne Vajhoj wrote:
> On 8/2/2012 11:52 PM, qwertmonkey@syberianoutpost.ru wrote:
>> Why is it that if you save a unicode sequence in a file, say "français"
>> ~
>> \u0066\u0072\u0061\u006e\u00e7\u0061\u0069\u0073
>> ~
>> and then retrieve as a String you can't then convert it back to a
>> UTF-8 String
>> ~
>
> Some code from my shelf:
>
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
>
> public class Unescape {
>     private static final Pattern p =
>         Pattern.compile("\\\\u([0-9A-F]{4})");
>     public static String U2U(String s) {
>         String res = s;
>         Matcher m = p.matcher(res);
>         while(m.find()) {
>             res = res.replaceAll("\\" + m.group(0),
>                 Character.toString((char)Integer.parseInt(m.group(1), 16)));
>         }
>         return res;
>     }
>     public static void main(String[] args) {
>         System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));
>     }
> }

And if you wanted this to be efficient, you'd use
Matcher.appendReplacement() instead of res.replaceAll(), which
recompiles the pattern and rescans the whole string on every match.
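
The appendReplacement suggestion can be sketched roughly as follows. This is only one possible shape, not code from the thread: the class name UnescapeSketch and the method name unescape are made up for illustration, and the pattern is widened to accept lowercase hex digits (an assumption, since the quoted pattern only matches A-F while the original question used \u00e7). StringBuffer is used because the StringBuilder overload of appendReplacement only exists in newer Java versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, chosen for this sketch only.
final class UnescapeSketch {
    // Assumption: accept both upper- and lower-case hex digits.
    private static final Pattern ESCAPE =
        Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces each literal \uXXXX sequence in one pass over the input.
    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer(s.length());
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops a decoded '$' or '\' from being
            // interpreted as a group reference in the replacement string.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        // Copy whatever follows the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(
            unescape("\\u0066\\u0072\\u0061\\u006e\\u00e7\\u0061\\u0069\\u0073"));
    }
}
```

For the escaped input in main, unescape returns the string "français"; whether the ç prints correctly depends on the console encoding. Each match is handled once, so the work grows with the length of the input rather than with length times number of matches.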