From: "qwertmonkey"
Subject: retrieving escape unicode sequences from files ...
Newsgroups: comp.lang.java.programmer
Message-ID: <501C1568.56042.calajapr@time.synchro.net>
Date: Fri, 03 Aug 2012 18:54:20 GMT

From: qwertmonkey@syberianoutpost.ru

 Why is it that if you save a unicode escape sequence in a file, say "français":
~
\u0066\u0072\u0061\u006e\u00e7\u0061\u0069\u0073
~
 and then retrieve it as a String, you can't convert it back to a UTF-8 String?
~
 As you can test with this piece of code, you can simply declare the String as a literal or give it on the command line, but retrieving what seems to be the same sequence of characters (as they print to standard out) from a file doesn't seem to work
~
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;

// __
public class UniKdEnk00Test{
 private static final String aNWLn = System.getProperty("line.separator");

// __
 public static void main (String[] aArgs){
  try{
// __ exactly one argument is required; note that it is only checked for presence
//    here, the String that gets encoded below is the literal
   if((aArgs == null) || (aArgs.length != 1)){
    throw new IOException(aNWLn + "// __ usage:" + aNWLn + aNWLn +
     " java UniKdEnk00Test \\u0066\\u0072\\u0061\\u006e\\u00e7\\u0061\\u0069\\u0073" + aNWLn);
   }

// __ the String declared as a literal
   String aUniKdEnk = "\u0066\u0072\u0061\u006e\u00e7\u0061\u0069\u0073";

// __ encode to UTF-8 bytes, pass them through a byte stream, decode them again
   byte[] bAr = aUniKdEnk.getBytes("UTF-8");
   ByteArrayOutputStream BOS = new ByteArrayOutputStream();
   BOS.write(bAr, 0, bAr.length);
   String aUTF8L = new String(BOS.toByteArray(), "UTF-8");
   System.out.println(aUTF8L);
   BOS.reset();
  }catch(UnsupportedEncodingException UEncX){ UEncX.printStackTrace(); }
  catch(IOException IOX) { IOX.printStackTrace(); }
// __
 }
}
~
 lbrtchx
 comp.lang.java.programmer: escape unicode sequences in files ...
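
 A likely explanation for the behaviour described above: the Java compiler translates backslash-u escapes in source files before the program ever runs, so a String literal already holds the eight characters of "français", whereas text read from a file (or passed on the command line) still holds the six-character escapes themselves. What follows is a minimal sketch, not a definitive implementation, of decoding such text by hand. The class name UnicodeEscapeDecoder, its decode method and the temporary file are hypothetical names chosen for illustration, and escaped backslashes in the input are not handled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class UnicodeEscapeDecoder {

    // A backslash, one or more 'u', then exactly four hex digits.
    private static final Pattern ESCAPE =
        Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    // Replaces every textual escape with the UTF-16 code unit it names.
    static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder(text.length());
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // The doubled backslashes make this the raw text a file would hold:
        // six characters per escape, 48 in total.
        String escaped =
            "\\u0066\\u0072\\u0061\\u006e\\u00e7\\u0061\\u0069\\u0073";

        // Round trip through a temporary file (an assumption of this sketch).
        Path tmp = Files.createTempFile("escapes", ".txt");
        Files.write(tmp, escaped.getBytes(StandardCharsets.US_ASCII));
        String fromFile =
            new String(Files.readAllBytes(tmp), StandardCharsets.US_ASCII);
        Files.delete(tmp);

        String decoded = decode(fromFile);
        byte[] utf8 = decoded.getBytes(StandardCharsets.UTF_8);

        System.out.println(fromFile.length()); // 48
        System.out.println(decoded.length());  // 8
        System.out.println(utf8.length);       // 9, the c-cedilla takes two bytes
    }
}
```

 The lengths are printed rather than the text itself because whether the c-cedilla shows up correctly on standard out depends on the console encoding. If the file is in .properties format, java.util.Properties.load performs this same escape translation and may be the simpler route.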