| From | "Arne Vajhøj" <arne.vajhøj@1:261/38.remove-k2r-this> |
|---|---|
| Subject | Re: retriving escape unicode sequences from files ... |
| Message-ID | <5023FE37.56456.calajapr@time.synchro.net> |
| Newsgroups | comp.lang.java.programmer |
| Date | Thu, 09 Aug 2012 18:44:31 GMT |
To: Daniel Pitts
From: "Arne Vajhoj" <arne.vajhoj@1:261/38.remove-qhs-this>
On 8/3/2012 11:49 PM, Daniel Pitts wrote:
> On 8/3/12 5:37 PM, Arne Vajhoj wrote:
>> On 8/2/2012 11:52 PM, qwertmonkey@syberianoutpost.ru wrote:
>>> Why is it that if you save a unicode sequence in a file, say
>>> "français"
>>> ~
>>> \u0066\u0072\u0061\u006e\u00e7\u0061\u0069\u0073
>>> ~
>>> and then retrieve as a String you can't then convert it back to a
>>> UTF-8 String
>>> ~
>>
>> Some code from my shelf:
>>
>> import java.util.regex.Matcher;
>> import java.util.regex.Pattern;
>>
>> public class Unescape {
>>     private static final Pattern p =
>>         Pattern.compile("\\\\u([0-9A-F]{4})");
>>     public static String U2U(String s) {
>>         String res = s;
>>         Matcher m = p.matcher(res);
>>         while(m.find()) {
>>             res = res.replaceAll("\\" + m.group(0),
>>                 Character.toString((char)Integer.parseInt(m.group(1), 16)));
>>         }
>>         return res;
>>     }
>>     public static void main(String[] args) {
>>         System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));
>>     }
>> }
> And if you wanted this to be efficient, you'd use appendReplacement
> instead of res.replaceAll()
I did not even know that existed.
So:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Unescape {
    // Matches a literal backslash, 'u', and four hex digits (either case).
    private static final Pattern p = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    public static String U2U(String s) {
        Matcher m = p.matcher(s);
        StringBuffer res = new StringBuffer();
        while (m.find()) {
            String ch = Character.toString((char) Integer.parseInt(m.group(1), 16));
            // Quote the replacement so a decoded '$' or backslash is taken literally.
            m.appendReplacement(res, Matcher.quoteReplacement(ch));
        }
        m.appendTail(res);
        return res.toString();
    }

    public static void main(String[] args) {
        System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));
    }
}
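
To tie this back to the original question, here is a rough sketch that runs the same kind of loop over the "français" sequence from the first post and then encodes the result as UTF-8 bytes. The class name UnescapeDemo and the method name unescape are illustrative only, and the loop is repeated so the example compiles on its own. It assumes the escapes may use lower-case hex digits and that a decoded character could be '$' or a backslash, so the replacement is quoted.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeDemo {
    // Literal backslash, 'u', then four hex digits in either case.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    public static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String decoded = Character.toString((char) Integer.parseInt(m.group(1), 16));
            // quoteReplacement stops a decoded '$' or backslash from being
            // interpreted as replacement syntax by appendReplacement.
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The escaped text as it would sit in the file: eight six-character escapes.
        String text = unescape("\\u0066\\u0072\\u0061\\u006e\\u00e7\\u0061\\u0069\\u0073");
        // A Java String is UTF-16 internally; UTF-8 only appears when encoding to bytes.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(text.length() + " chars, " + utf8.length + " UTF-8 bytes");
        // prints "8 chars, 9 UTF-8 bytes" (the c-cedilla takes two bytes in UTF-8)
    }
}
```

The counts are printed rather than the text itself, because whether the decoded string displays correctly depends on the console encoding.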
Arne
-+- BBBS/Li6 v4.10 Dada-1
+ Origin: Prism bbs (1:261/38)
-+- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24