From: qwertmonkey@syberianoutpost.ru
Newsgroups: comp.lang.java.programmer
Subject: retrieving escape unicode sequences from files ...
Date: Fri, 3 Aug 2012 03:52:12 +0000 (UTC)
Organization: Aioe.org NNTP Server

 Why is it that if you save a Unicode escape sequence in a file, say "français" ~

 \u0066\u0072\u0061\u006e\u00e7\u0061\u0069\u0073

 ~ and then retrieve it as a String, you can't then convert it back to a UTF-8 String?

 As you can test with this piece of code, you can simply declare the String as a literal or give it at the command prompt, but retrieving what seems to be the same sequence of characters (as they print to standard out) from a file doesn't seem to work ~

import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.io.IOException;

// __
public class UniKdEnk00Test{
 private static final String aNWLn = System.getProperty("line.separator");

 // __
 public static void main (String[] aArgs){
  try{
   // __ the argument is only checked for presence here; it is not read below
   if((aArgs == null) || (aArgs.length != 1)){
    throw new IOException(aNWLn + "// __ usage:" + aNWLn + aNWLn +
      " java UniKdEnk00Test \\u0066\\u0072\\u0061\\u006e\\u00e7\\u0061\\u0069\\u0073" + aNWLn);
   }

   // __ string literal written with escapes in the source file
   String aUniKdEnk = "\u0066\u0072\u0061\u006e\u00e7\u0061\u0069\u0073";

   // __ encode to UTF-8 bytes, pass them through a buffer, decode again
   byte[] bAr = aUniKdEnk.getBytes("UTF-8");
   ByteArrayOutputStream BOS = new ByteArrayOutputStream();
   BOS.write(bAr, 0, bAr.length);
   String aUTF8L = new String(BOS.toByteArray(), "UTF-8");
   System.out.println(aUTF8L);
   BOS.reset();
  }catch(UnsupportedEncodingException UEncX){ UEncX.printStackTrace(); }
   catch(IOException IOX) { IOX.printStackTrace(); }
  // __
 }
}

 ~
 lbrtchx
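
 A likely explanation for the difference described above: escapes in a Java source file are translated by the compiler, so the literal in the test already holds the eight characters of "français" at run time. Text read from a file is never passed through that translation, so the String holds the backslash, the "u" and the hex digits as ordinary characters, and a round trip through UTF-8 returns them unchanged. Note also that the posted test checks aArgs but never reads aArgs[0], so it always prints the literal. The sketch below decodes such escapes by hand; the class name UnescapeSketch, the method unescape and the temporary file are hypothetical names chosen for illustration, and it assumes the file contains only well-formed four-digit escapes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeSketch {
    // Matches a literal backslash, one or more 'u' characters, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    // Replaces each escape found in the text with the UTF-16 code unit it names.
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());                    // copy text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16));  // decode the four hex digits
            last = m.end();
        }
        out.append(text, last, text.length());                    // copy whatever follows the last escape
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // The doubled backslashes keep the compiler from translating these escapes,
        // so this String holds the same 48 characters a text file would.
        String escaped = "\\u0066\\u0072\\u0061\\u006e\\u00e7\\u0061\\u0069\\u0073";

        // Write the escaped text to a temporary file and read it back as UTF-8.
        Path tmp = Files.createTempFile("escapes", ".txt");
        try {
            Files.write(tmp, escaped.getBytes(StandardCharsets.UTF_8));
            String fromFile = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);

            String decoded = unescape(fromFile);
            System.out.println("raw length:     " + fromFile.length());
            System.out.println("decoded length: " + decoded.length());
            System.out.println(decoded);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

 Whether the decoded text displays correctly on standard out still depends on the console encoding, which is a separate matter from the decoding itself.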