Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!news.ripco.com!news-out.news.tds.net!newsreading01.news.tds.net!53ab2750!not-for-mail
From: "Arne Vajhøj" <arne.vajhøj@1:261/38.remove-p82-this>
Subject: Re: retriving escape unicode sequences from files ...
Message-ID: <5021F864.56292.calajapr@time.synchro.net>
X-Comment-To: Daniel Pitts
Newsgroups: comp.lang.java.programmer
In-Reply-To: <501D6353.56120.calajapr@time.synchro.net>
References: <501D6353.56120.calajapr@time.synchro.net>
X-FTN-AREA: COMP.LANG.JAVA.PROGRAMMER
X-FTN-MSGID: 1:261/38 513bed3e
X-FTN-REPLY: 1:261/38 d8bcb221
Content-Type: text/plain; charset=IBM437
Content-Transfer-Encoding: 8bit
X-Gateway: time.synchro.net [Synchronet 3.16a-Win32 NewsLink 1.98]
Lines: 74
Date: Wed, 08 Aug 2012 06:20:18 GMT
NNTP-Posting-Host: 69.21.70.65
X-Complaints-To: news@tds.net
X-Trace: newsreading01.news.tds.net 1344406818 69.21.70.65 (Wed, 08 Aug 2012 01:20:18 CDT)
NNTP-Posting-Date: Wed, 08 Aug 2012 01:20:18 CDT
Organization: tds.net
Xref: csiph.com comp.lang.java.programmer:17338

  To: Daniel Pitts
From: Arne Vajhoj <arne@vajhoej.dk>

On 8/3/2012 11:49 PM, Daniel Pitts wrote:
> On 8/3/12 5:37 PM, Arne Vajhoj wrote:
>> On 8/2/2012 11:52 PM, qwertmonkey@syberianoutpost.ru wrote:
>>>   Why is it that if you save a unicode sequence in a file, say
>>> "français"
>>> ~
>>> \u0066\u0072\u0061\u006e\u00e7\u0061\u0069\u0073
>>> ~
>>>   and then retrieve as a String you can't then convert it back to a
>>> UTF-8 String
>>> ~
>>
>> Some code from my shelf:
>>
>> import java.util.regex.Matcher;
>> import java.util.regex.Pattern;
>>
>> public class Unescape {
>>      private static final Pattern p =
>> Pattern.compile("\\\\u([0-9A-F]{4})");
>>      public static String U2U(String s) {
>>          String res = s;
>>          Matcher m = p.matcher(res);
>>          while(m.find()) {
>>              res = res.replaceAll("\\" + m.group(0),
>> Character.toString((char)Integer.parseInt(m.group(1), 16)));
>>          }
>>          return res;
>>      }
>>      public static void main(String[] args) {
>>          System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));
>>      }
>> }
> And if you wanted this to be efficient, you'd use appendReplacement
> instead of res.replaceAll()

I did not even know that existed.

So:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Unescape {
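        // Literal backslash, 'u', then exactly four hex digits (either case).
        private static final Pattern p = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");
        public static String U2U(String s) {
                Matcher m = p.matcher(s);
                StringBuffer res = new StringBuffer();
                while (m.find()) {
                        char c = (char) Integer.parseInt(m.group(1), 16);
                        // quoteReplacement stops a decoded '\' or '$' from being
                        // read as special syntax in the replacement string.
                        m.appendReplacement(res, Matcher.quoteReplacement(Character.toString(c)));
                }
                m.appendTail(res);
                return res.toString();
        }
        public static void main(String[] args) {
                System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));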
        }
}
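
For comparison, here is a minimal sketch of the same idea run against the
lowercase escapes from the original question. It assumes Java 9 or later,
where Matcher.replaceAll accepts a function. The names UnescapeSketch,
ESCAPE and unescape are only illustrative and do not come from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeSketch {
    // Literal backslash, 'u', then four hex digits in either case.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Hypothetical helper: decodes each backslash-u escape into the char it names.
    public static String unescape(String s) {
        return ESCAPE.matcher(s).replaceAll(mr ->
            // The function result is used as a replacement string, so quote it
            // in case the decoded char is a backslash or a dollar sign.
            Matcher.quoteReplacement(
                String.valueOf((char) Integer.parseInt(mr.group(1), 16))));
    }

    public static void main(String[] args) {
        // The escaped text from the original question (lowercase hex digits).
        System.out.println(unescape(
            "\\u0066\\u0072\\u0061\\u006e\\u00e7\\u0061\\u0069\\u0073"));
        // U+0024 decodes to a dollar sign, which needs the quoting above.
        System.out.println(unescape("price: \\u002450"));
    }
}
```

Like the appendReplacement version, this makes a single pass over the input.
It decodes one UTF-16 code unit per escape, so a character outside the BMP
has to appear in the file as two consecutive escapes (a surrogate pair).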

Arne
