comp.lang.java.programmer, thread #20294 (unrolled)

Unicode escapes and String literals?

Started by	Knute Johnson <nospam@knutejohnson.com>
First post	2012-12-13 09:31 -0800
Last post	2012-12-17 02:42 -0800
Articles	20 on this page of 26 — 9 participants

  Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 09:31 -0800
    Re: Unicode escapes and String literals? Thomas Richter <thor@math.tu-berlin.de> - 2012-12-13 18:51 +0100
      Re: Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 10:47 -0800
        Re: Unicode escapes and String literals? Lew <lewbloch@gmail.com> - 2012-12-13 11:41 -0800
          Re: Unicode escapes and String literals? rossum <rossum48@coldmail.com> - 2012-12-14 13:32 +0000
            Re: Unicode escapes and String literals? Lew <lewbloch@gmail.com> - 2012-12-14 15:16 -0800
        Re: Unicode escapes and String literals? markspace <-@.> - 2012-12-13 12:58 -0800
          Re: Unicode escapes and String literals? David Lamb <dalamb@cs.queensu.ca> - 2012-12-13 16:21 -0500
            Re: Unicode escapes and String literals? markspace <-@.> - 2012-12-13 14:00 -0800
              Re: Unicode escapes and String literals? David Lamb <dalamb@cs.queensu.ca> - 2012-12-13 17:17 -0500
              Re: Unicode escapes and String literals? David Lamb <dalamb@cs.queensu.ca> - 2012-12-13 17:19 -0500
              Re: Unicode escapes and String literals? Lew <lewbloch@gmail.com> - 2012-12-13 17:11 -0800
            Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-13 19:38 -0500
    Re: Unicode escapes and String literals? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-12-13 11:46 -0800
      Re: Unicode escapes and String literals? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-12-13 11:49 -0800
      Re: Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 14:55 -0800
        Re: Unicode escapes and String literals? markspace <-@.> - 2012-12-13 15:32 -0800
    Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-13 18:09 -0500
      Re: Unicode escapes and String literals? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-12-13 15:52 -0800
        Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-13 19:40 -0500
      Re: Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 16:11 -0800
        Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-13 19:43 -0500
          Re: Unicode escapes and String literals? Knute Johnson <nospam@knutejohnson.com> - 2012-12-13 17:08 -0800
    Re: Unicode escapes and String literals? Roedy Green <see_website@mindprod.com.invalid> - 2012-12-14 02:28 -0800
      Re: Unicode escapes and String literals? Arne Vajhøj <arne@vajhoej.dk> - 2012-12-14 21:05 -0500
    Re: Unicode escapes and String literals? Roedy Green <see_website@mindprod.com.invalid> - 2012-12-17 02:42 -0800

#20294 — Unicode escapes and String literals?

From	Knute Johnson <nospam@knutejohnson.com>
Date	2012-12-13 09:31 -0800
Subject	Unicode escapes and String literals?
Message-ID	<kad3d6$eoo$1@dont-email.me>

I just had a great revelation as I was putting together my SSCCE for the 
question I was going to ask.  So it has changed my question.  How do I 
do the conversion of unicode escape sequences to a String that are done 
by string literals?

String s = "\u0066\u0065\u0064";

becomes "fed" but if you create a String with \u0066\u0065\u0064 in it 
without using the literal it stays \u0066\u0065\u0064.  Is there a built 
in mechanism in Java for doing that translation to a String?
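A minimal sketch of the distinction the question turns on (the class name is illustrative only). The first String is written with escapes in the source code. The second is written with doubled backslashes, so the escape text survives into the running program as ordinary characters.

```java
class LiteralVsRuntime {
    public static void main(String[] args) {
        // Translated before compilation proper: the compiler sees "fed" here.
        String fromLiteral = "\u0066\u0065\u0064";

        // A backslash preceded by another backslash is not eligible to start
        // an escape, so this String holds 18 characters of escape text.
        String fromRuntime = "\\u0066\\u0065\\u0064";

        System.out.println(fromLiteral + " " + fromLiteral.length());  // prints fed 3
        System.out.println(fromRuntime.length());                      // prints 18
    }
}
```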

-- 

Knute Johnson

#20295

From	Thomas Richter <thor@math.tu-berlin.de>
Date	2012-12-13 18:51 +0100
Message-ID	<kad4io$see$1@news2.informatik.uni-stuttgart.de>
In reply to	#20294

On 13.12.2012 18:31, Knute Johnson wrote:
> I just had a great revelation as I was putting together my SSCCE for the
> question I was going to ask. So it has changed my question. How do I do
> the conversion of unicode escape sequences to a String that are done by
> string literals?
>
> String s = "\u0066\u0065\u0064";
>
> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
> without using the literal it stays \u0066\u0065\u0064. Is there a built
> in mechanism in Java for doing that translation to a String?

Yes. It's called "compiler". The same part of the compiler that 
translates a "\t" in a string literal to the TAB control character also 
replaces the unicode sequences in the string literal to the 
corresponding unicode encoding.

Greetings,
	Thomas

#20296

From	Knute Johnson <nospam@knutejohnson.com>
Date	2012-12-13 10:47 -0800
Message-ID	<kad7rb$btq$1@dont-email.me>
In reply to	#20295

On 12/13/2012 9:51 AM, Thomas Richter wrote:
> On 13.12.2012 18:31, Knute Johnson wrote:
>> I just had a great revelation as I was putting together my SSCCE for the
>> question I was going to ask. So it has changed my question. How do I do
>> the conversion of unicode escape sequences to a String that are done by
>> string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>> without using the literal it stays \u0066\u0065\u0064. Is there a built
>> in mechanism in Java for doing that translation to a String?
>
> Yes. It's called "compiler". The same part of the compiler that
> translates a "\t" in a string literal to the TAB control character also
> replaces the unicode sequences in the string literal to the
> corresponding unicode encoding.
>
> Greetings,
>      Thomas

I want to be able to do it to a String not to a string literal.

-- 

Knute Johnson

#20297

From	Lew <lewbloch@gmail.com>
Date	2012-12-13 11:41 -0800
Message-ID	<8d054b85-a6ed-4bab-a8a9-81e967a84fda@googlegroups.com>
In reply to	#20296

Knute Johnson wrote:
> Thomas Richter wrote:
>> Knute Johnson wrote:
>>> I just had a great revelation as I was putting together my SSCCE for the
>>> question I was going to ask. So it has changed my question. How do I do
>>> the conversion of unicode [sic] escape sequences to a String that are done by
>>> string literals?

They aren't done by String literals.

>>> String s = "\u0066\u0065\u0064";
>>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it

Exactly how?

>>> without using the literal it stays \u0066\u0065\u0064. Is there a built
>>> in mechanism in Java for doing that translation to a String?

No.

>> Yes. It's called "compiler". The same part of the compiler that

That's not exactly correct, and it certainly is not the same part that translates '\t'.

>> translates a "\t" in a string literal to the TAB control character also
>> replaces the unicode sequences in the string literal to the
>> corresponding unicode encoding.

Nope.

> I want to be able to do it to a String not to a string literal.

You want to do what, exactly? I'm not clear on what you're trying to accomplish.

'\u' sequences are processed pre-compile, not during compile. Their presence is exactly equivalent 
to typing the corresponding Unicode character directly. 

You can embed them in identifiers, directives, anywhere the corresponding character can go.

Not just literals.

For that matter, you can use them in numeric literals.

<sscce>
package temp;

/**
 * ShowUnicodeEscapes.
 */
public class ShowUnicodeEscapes {

    static final \u0069nt COUN\u0054 = \u0030\u003b

    /**
     * main.
     * 
     * @param args String array of arguments.
     */
    public static void main(String[] args) {
        System.out.println("COUNT = \u0022+ COUNT);
    }
}
</sscce>

#20326

From	rossum <rossum48@coldmail.com>
Date	2012-12-14 13:32 +0000
Message-ID	<0camc85sbkqmmgic2r05toov7p65c0hoi1@4ax.com>
In reply to	#20297

On Thu, 13 Dec 2012 11:41:03 -0800 (PST), Lew <lewbloch@gmail.com>
wrote:

>>>> String s = "\u0066\u0065\u0064";
>>>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>
>Exactly how?

  StringBuilder sb = new StringBuilder(18);
  sb.append('\\');
  sb.append("u0066");
  sb.append('\\');
  sb.append("u0065");
  sb.append('\\');
  sb.append("u0064");
        
  String ss = sb.toString();
        
  System.out.println(ss);

Produces: \u0066\u0065\u0064

Which still leaves the question why?

rossum

#20334

From	Lew <lewbloch@gmail.com>
Date	2012-12-14 15:16 -0800
Message-ID	<7db64abf-3fdd-412b-8af6-11b6e0a7231c@googlegroups.com>
In reply to	#20326

rossum wrote:
> Lew wrote:
>>>>>  if you create a String with \u0066\u0065\u0064 in it
>>
>>Exactly how?
> 
> StringBuilder sb = new StringBuilder(18);
>   sb.append('\\');
>   sb.append("u0066");
>   sb.append('\\');
>   sb.append("u0065");
>   sb.append('\\');
>   sb.append("u0064");
>         
>   String ss = sb.toString();
>   System.out.println(ss);
> 
> Produces: \u0066\u0065\u0064
> 
> Which still leaves the question why?

This has been explained to death upthread already.

Those are not Unicode escapes, that's why.

You have created a String that comprises backslashes, the letter "u" and 
various digits. That happens at runtime.

There is no way for the pre-compiler to see those and convert them.

That code sequence is exactly equivalent to this one:

  StringBuilder sb = new StringBuilder(\u0031\u0038); 
  sb.append('\u005c\u005c\u0027)\u003b 
  sb.append("\u0075\u0030\u0030\u0036\u0036"); 
  sb.append('\u005c\u005c\u0027)\u003b 
  sb.append("u006\u0035\u0022); 
  sb.append('\u005c\u005c\u0027)\u003b 
  sb.append(\u0022\u00750064"); 

Unicode escape sequence processing is a pre-compiler operation, not a compiler 
operation and not a run-time operation. 

To do what you want you have to parse the string and convert it yourself.
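A minimal sketch of the hand-written conversion described above, assuming each escape is a backslash, a 'u' and exactly four ASCII hex digits, and that a backslash never needs to be escaped itself. The names `UnicodeUnescaper` and `unescape` are hypothetical; incomplete sequences are copied through rather than rejected.

```java
class UnicodeUnescaper {

    private static final String HEX = "0123456789abcdefABCDEF";

    /**
     * Replaces each backslash + 'u' + four hex digits with the UTF-16 char it
     * names. Anything else, including an incomplete sequence, is copied as is.
     */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 6 <= s.length() && s.charAt(i + 1) == 'u'
                    && isHex4(s, i + 2)) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;  // consume the whole six-character escape
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if the four chars starting at 'from' are all ASCII hex digits.
    private static boolean isHex4(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if (HEX.indexOf(s.charAt(k)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u0066\\u0065\\u0064"));  // prints fed
    }
}
```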

-- 
Lew

#20300

From	markspace <-@.>
Date	2012-12-13 12:58 -0800
Message-ID	<kadfh3$r6o$1@dont-email.me>
In reply to	#20296

On 12/13/2012 10:47 AM, Knute Johnson wrote:
>
> I want to be able to do it to a String not to a string literal.
>

Daniel showed one way to interpret your request.  Here's another.  Pay 
special attention to the bits outside the quotes.  This program prints 
"fed".

public class EscapeTest {
    public static void main(String[] args) {
       String \u0066\u0065\u0064 = "\u0066\u0065\u0064";
       System.out.println( fed );
    }
}

#20301

From	David Lamb <dalamb@cs.queensu.ca>
Date	2012-12-13 16:21 -0500
Message-ID	<kadgt6$862$1@dont-email.me>
In reply to	#20300

On 13/12/2012 3:58 PM, markspace wrote:
> On 12/13/2012 10:47 AM, Knute Johnson wrote:
>>
>> I want to be able to do it to a String not to a string literal.
>>
>
> Daniel showed one way to interpret your request.  Here's another.  Pay
> special attention to the bits outside the quotes.  This program prints
> "fed".
>
>
> public class EscapeTest {
>     public static void main(String[] args) {
>        String \u0066\u0065\u0064 = "\u0066\u0065\u0064";
>        System.out.println( fed );
>     }
> }

Cute. But presupposing that the OP isn't the idiot some people seem to 
have assumed, I suspect he meant something more like

   String line = someBufferedFile.readLine();
   ... change all \u escapes into unicode in "line" ... [1]

where by "\u escapes" he means the 6-character substrings one usually 
types in string literals. The OP needs to look into "code points" and 
the corresponding codepoint to Character conversions at
http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html

[1] which, for the pedantic, really means "create a new string(buffer) 
from line"

#20302

From	markspace <-@.>
Date	2012-12-13 14:00 -0800
Message-ID	<kadj6o$m6v$1@dont-email.me>
In reply to	#20301

On 12/13/2012 1:21 PM, David Lamb wrote:
>
> Cute. But presupposing that the OP isn't the idiot some people seem to
> have assumed, I suspect he meant something more like
>
>    String line = someBufferedFile.readLine();
>    ... change all \u escapes into unicode in "line" ... [1]

Maybe.  But your code above is obvious, imo.  Either Knute had a brain 
fart and forgot about \\ to escape a backslash, or he ran into some other 
problem.

My point was that there's a very simple pre-compiler for Java.  It 
translates all \u-escapes into characters before the compiler proper 
sees it.  There's no difference to the Java compiler between "fed" and 
"\u0066\u0065\u0064".  It literally can't tell the difference.

That's an important distinction.
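A small sketch of the point that the compiler cannot tell the two spellings apart (class name hypothetical). Because the escapes disappear before compilation, both lines declare the same string literal, and identical string literals share one interned String instance, so even a reference comparison succeeds.

```java
class SameLiteral {
    public static void main(String[] args) {
        String plain = "fed";
        String escaped = "\u0066\u0065\u0064";
        // Escape translation happens first, so both lines hold the same literal;
        // identical literals are interned, so even == compares true here.
        System.out.println(plain == escaped);       // prints true
        System.out.println(plain.equals(escaped));  // prints true
    }
}
```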

#20304

From	David Lamb <dalamb@cs.queensu.ca>
Date	2012-12-13 17:17 -0500
Message-ID	<kadk4j$r35$1@dont-email.me>
In reply to	#20302

On 13/12/2012 5:00 PM, markspace wrote:
> My point was that there's a very simple pre-compiler for Java.  It
> translates all \u-escapes into characters before the compiler proper
> sees it.  There's no difference to the Java compiler between "fed" and
> "\u0066\u0065\u0064".  It literally can't tell the difference.

I should probably have found a different point in the thread to hang my 
comment, since you're perfectly correct.

#20305

From	David Lamb <dalamb@cs.queensu.ca>
Date	2012-12-13 17:19 -0500
Message-ID	<kadk8e$r35$2@dont-email.me>
In reply to	#20302

On 13/12/2012 5:00 PM, markspace wrote:
> Either Knute had a brain fart and forgot about \\ to escape a backslash, or
> he ran into some other problem.

Some other problem. As I said, I suspect he didn't know about the 
codepoint-to-character methods. Let's wait to see if he responds to my 
suggestion. Or for Lew to condemn him for not thinking of the right spot 
to read in the API docs.

#20316

From	Lew <lewbloch@gmail.com>
Date	2012-12-13 17:11 -0800
Message-ID	<8dbfcaee-b736-40f7-ae16-ea7fb7e4ac00@googlegroups.com>
In reply to	#20302

markspace wrote:
> David Lamb wrote:
>> Cute. But presupposing that the OP isn't the idiot some people seem to
>> have assumed, I suspect he meant something more like
>>
>>    String line = someBufferedFile.readLine();
>>    ... change all \u escapes into unicode in "line" ... [1]

That was not obvious to me, hence my question as to what he did mean.

> Maybe.  But your code above is obvious, imo.  Either Knute had a brain 
> fart and forgot about \\ to escape a backslash, or he ran into some other 
> problem.
> 
> My point was that there's a very simple pre-compiler for Java.  It 
> translates all \u-escapes into characters before the compiler proper 
> sees it.  There's no difference to the Java compiler between "fed" and 
> "\u0066\u0065\u0064".  It literally can't tell the difference.

That was also the point of my SSCCE.

> That's an important distinction.

-- 
Lew

#20311

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-12-13 19:38 -0500
Message-ID	<50ca7505$0$287$14726298@news.sunsite.dk>
In reply to	#20301

On 12/13/2012 4:21 PM, David Lamb wrote:
> Cute. But presupposing that the OP isn't the idiot some people seem to
> have assumed, I suspect he meant something more like
>
>    String line = someBufferedFile.readLine();
>    ... change all \u escapes into unicode in "line" ... [1]
>
> where by "\u escapes" he means the 6-character substrings one usually
> types in string literals. The OP needs to look into "code points" and
> the corresponding codepoint to Character conversions at
> http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html

Why?

I think he is only asking for conversion between strings containing
escapes and 16 bit chars.

The mess with code points and surrogate pairs is no
different from usual.
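A sketch of why surrogate pairs need no special handling when escapes are decoded one 16-bit char at a time: U+1F600 is written as two escapes, and decoding each one independently yields a valid pair that the String code point methods then treat as a single code point. The helper `decodeEscapesOnly` is hypothetical and assumes its input is nothing but back-to-back six-character escapes.

```java
class SurrogateEscapes {

    // Hypothetical helper: assumes s is nothing but back-to-back six-character
    // escapes (backslash, 'u', four hex digits) and decodes each into one char.
    static String decodeEscapesOnly(String s) {
        StringBuilder sb = new StringBuilder(s.length() / 6);
        for (int i = 0; i + 6 <= s.length(); i += 6) {
            sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String decoded = decodeEscapesOnly("\\uD83D\\uDE00");
        System.out.println(decoded.length());                             // prints 2
        System.out.println(decoded.codePointCount(0, decoded.length()));  // prints 1
        System.out.println(Integer.toHexString(decoded.codePointAt(0)));  // prints 1f600
    }
}
```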

Arne

#20298

From	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date	2012-12-13 11:46 -0800
Message-ID	<Ggqys.16833$Sl.15694@newsfe27.iad>
In reply to	#20294

On 12/13/12 9:31 AM, Knute Johnson wrote:
> I just had a great revelation as I was putting together my SSCCE for the
> question I was going to ask.  So it has changed my question.  How do I
> do the conversion of unicode escape sequences to a String that are done
> by string literals?
>
> String s = "\u0066\u0065\u0064";
>
> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
> without using the literal it stays \u0066\u0065\u0064.  Is there a built
> in mechanism in Java for doing that translation to a String?
>

Do you mean that you have a String whose value is "\\u0066\\u0065\\u0064", 
and you want to pass that String to a method which will return fed?

meaning

String foo = "\\u0066\\u0065\\u0064";

System.out.println(foo); // prints \u0066\u0065\u0064
System.out.println(magicFunction(foo)); // prints fed

There might be such a function in Apache Commons library, but I don't 
think there is one in the standard API.  I could be wrong though.

#20299

From	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date	2012-12-13 11:49 -0800
Message-ID	<fjqys.16834$Sl.8271@newsfe27.iad>
In reply to	#20298

On 12/13/12 11:46 AM, Daniel Pitts wrote:
> On 12/13/12 9:31 AM, Knute Johnson wrote:
>> I just had a great revelation as I was putting together my SSCCE for the
>> question I was going to ask.  So it has changed my question.  How do I
>> do the conversion of unicode escape sequences to a String that are done
>> by string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>> without using the literal it stays \u0066\u0065\u0064.  Is there a built
>> in mechanism in Java for doing that translation to a String?
>>
>
> Do you mean that you have a String whose value is "\\u0066\\u0065\\u0064",
> and you want to pass that String to a method which will return fed?
>
> meaning
>
> String foo = "\\u0066\\u0065\\u0064";
>
> System.out.println(foo); // prints \u0066\u0065\u0064
> System.out.println(magicFunction(foo)); // prints fed
>
> There might be such a function in Apache Commons library, but I don't
> think there is one in the standard API.  I could be wrong though.

Two minutes of googling and reading a Stack Overflow post gave me this link:

<http://commons.apache.org/lang/api/org/apache/commons/lang3/StringEscapeUtils.html#unescapeJava%28java.lang.String%29>

#20306

From	Knute Johnson <nospam@knutejohnson.com>
Date	2012-12-13 14:55 -0800
Message-ID	<kadmcl$74d$1@dont-email.me>
In reply to	#20298

On 12/13/2012 11:46 AM, Daniel Pitts wrote:
> On 12/13/12 9:31 AM, Knute Johnson wrote:
>> I just had a great revelation as I was putting together my SSCCE for the
>> question I was going to ask.  So it has changed my question.  How do I
>> do the conversion of unicode escape sequences to a String that are done
>> by string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>> without using the literal it stays \u0066\u0065\u0064.  Is there a built
>> in mechanism in Java for doing that translation to a String?
>>
>
> Do you mean that you have a String whose value is "\\u0066\\u0065\\u0064",
> and you want to pass that String to a method which will return fed?
>
> meaning
>
> String foo = "\\u0066\\u0065\\u0064";
>
> System.out.println(foo); // prints \u0066\u0065\u0064
> System.out.println(magicFunction(foo)); // prints fed
>
> There might be such a function in Apache Commons library, but I don't
> think there is one in the standard API.  I could be wrong though.

I obviously didn't explain it well the first time around, so let me try 
again.  I understand that the compiler reads unicode escape sequences 
pretty much anywhere and converts them to characters.  What I want to be 
able to do is to do that conversion on characters that are in a String. 
So if in my String I had the characters \u0066\u0065\u0064, I would 
like to convert those to a String of "fed".

I did look at the apache commons link you sent and that would probably 
do it but if the compiler can translate them it must have a method 
already.  Maybe it's not public but that is what I was asking.

So thanks everybody for your answers.

-- 

Knute Johnson

#20308

From	markspace <-@.>
Date	2012-12-13 15:32 -0800
Message-ID	<kadoht$kab$1@dont-email.me>
In reply to	#20306

On 12/13/2012 2:55 PM, Knute Johnson wrote:

> I did look at the apache commons link you sent and that would probably
> do it but if the compiler can translate them it must have a method
> already.  Maybe it's not public but that is what I was asking.

The compiler's internal methods aren't part of the public API.  The 
closest thing I'm aware of is Properties#load(), which does convert \u 
and some other escape sequences in a properties file.  However, the 
method it uses to do that is private.

I think if it's in the Apache utils then it's fair to say there's no 
Java API equivalent.  Otherwise, why make an Apache utils method?
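A sketch of borrowing the public `Properties.load(Reader)` for this, assuming the input is a single line: the text is wrapped as the value of a one-entry properties document under an arbitrary key `k` (the helper name is hypothetical). Note that `load` also interprets other escapes such as `\t`, `\n` and `\\`, silently drops a backslash before any other character, strips leading whitespace from the value and treats a trailing backslash as a line continuation, so it does not match Java source rules exactly. A malformed escape makes `load` throw `IllegalArgumentException`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class PropertiesUnescape {

    // Hypothetical helper: wraps s as the value of a one-line properties
    // document under the arbitrary key "k" and lets Properties.load decode it.
    static String unescapeViaProperties(String s) {
        Properties props = new Properties();
        try {
            props.load(new StringReader("k=" + s));
        } catch (IOException e) {
            // StringReader does no real I/O, so this is not expected
            throw new IllegalStateException(e);
        }
        return props.getProperty("k");
    }

    public static void main(String[] args) {
        System.out.println(unescapeViaProperties("\\u0066\\u0065\\u0064"));  // prints fed
    }
}
```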

#20307

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-12-13 18:09 -0500
Message-ID	<50ca6046$0$284$14726298@news.sunsite.dk>
In reply to	#20294

On 12/13/2012 12:31 PM, Knute Johnson wrote:
> I just had a great revelation as I was putting together my SSCCE for the
> question I was going to ask.  So it has changed my question.  How do I
> do the conversion of unicode escape sequences to a String that are done
> by string literals?
>
> String s = "\u0066\u0065\u0064";
>
> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
> without using the literal it stays \u0066\u0065\u0064.  Is there a built
> in mechanism in Java for doing that translation to a String?

I don't think there is anything built in.

But it is trivial to code.

This was posted just a few months back:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Unescape {
     // a literal backslash, 'u', then four hex digits (either case)
     private static final Pattern p = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");
     public static String U2U(String s) {
         //String res = s;
         //Matcher m = p.matcher(res);
         //while (m.find()) {
         //  res = res.replaceAll("\\" + m.group(0), Character.toString((char) Integer.parseInt(m.group(1), 16)));
         //}
         //return res;
         Matcher m = p.matcher(s);
         StringBuffer res = new StringBuffer();
         while (m.find()) {
             String c = Character.toString((char) Integer.parseInt(m.group(1), 16));
             // quoteReplacement stops a decoded '$' or backslash from being
             // treated as a group reference or escape by appendReplacement
             m.appendReplacement(res, Matcher.quoteReplacement(c));
         }
         m.appendTail(res);
         return res.toString();
     }
     public static void main(String[] args) {
         System.out.println(U2U("\\u0041\\u0042\\u0043\\u000A\\u0031\\u0032\\u0033"));
     }
}

Arne

#20309

From	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date	2012-12-13 15:52 -0800
Message-ID	<_Stys.6673$Ep5.1891@newsfe08.iad>
In reply to	#20307

On 12/13/12 3:09 PM, Arne Vajhøj wrote:
> On 12/13/2012 12:31 PM, Knute Johnson wrote:
>> I just had a great revelation as I was putting together my SSCCE for the
>> question I was going to ask.  So it has changed my question.  How do I
>> do the conversion of unicode escape sequences to a String that are done
>> by string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>> without using the literal it stays \u0066\u0065\u0064.  Is there a built
>> in mechanism in Java for doing that translation to a String?
>
> I don't think there is anything built in.
>
> But it is trivial to code.
Famous last words.  Nothing in Unicode is trivial.  It may seem trivial, 
but there are potential gotchas in the spec.

I don't know of any off the top of my head, but I wouldn't just assume 
it was trivial unless I knew the spec backward and forward.
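Two gotchas the Java Language Specification (section 3.3) does contain, shown in a hedged sketch of its Unicode-escape translation step (class and method names are hypothetical): the `u` may be repeated, so `\uuu0041` is a valid escape for `A`, and a backslash only starts an escape when an even number of raw backslashes immediately precede it, which is why `\\u0041` in source is not an escape. A character produced by an escape never starts another one. As with javac, a malformed escape is an error; here it is signalled with `IllegalArgumentException`.

```java
class JlsStyleUnescaper {

    private static final String HEX = "0123456789abcdefABCDEF";

    /**
     * Sketch of the Unicode-escape step of JLS 3.3: a backslash starts an
     * escape only if an even number of raw backslashes directly precede it,
     * the 'u' may be repeated, and exactly four hex digits must follow.
     */
    static String translate(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int rawBackslashes = 0;  // contiguous raw backslashes just before index i
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && rawBackslashes % 2 == 0
                    && i + 1 < s.length() && s.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < s.length() && s.charAt(j) == 'u') {
                    j++;  // skip every 'u' of the marker
                }
                if (j + 4 > s.length() || !isHex4(s, j)) {
                    throw new IllegalArgumentException("malformed escape at index " + i);
                }
                out.append((char) Integer.parseInt(s.substring(j, j + 4), 16));
                i = j + 4;
                rawBackslashes = 0;  // the produced char is not a raw backslash
            } else {
                out.append(c);
                rawBackslashes = (c == '\\') ? rawBackslashes + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex4(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if (HEX.indexOf(s.charAt(k)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(translate("\\uuu0041"));  // prints A
        System.out.println(translate("\\\\u0041"));  // not an escape: the 7 chars come back unchanged
    }
}
```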

#20312

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-12-13 19:40 -0500
Message-ID	<50ca7595$0$287$14726298@news.sunsite.dk>
In reply to	#20309

On 12/13/2012 6:52 PM, Daniel Pitts wrote:
> On 12/13/12 3:09 PM, Arne Vajhøj wrote:
>> I don't think there is anything built in.
>>
>> But it is trivial to code.
> Famous last words.  Nothing in Unicode is trivial.  It may seem trivial,
> but there are potential gotchas in the spec.
>
> I don't know of any off the top of my head, but I wouldn't just assume
> it was trivial unless I knew the spec backward and forward.

Unicode can be tricky.

But in reality this is not really a unicode problem.

It is about converting substrings of length 6 to
16 bit chars.

Which substantially reduces the complexity.
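A sketch of the single-substring case (names hypothetical): one six-character escape maps to exactly one 16-bit char. The regex check keeps `Integer.parseInt` from accepting inputs such as a leading sign.

```java
class SixCharEscape {

    // Hypothetical helper: decodes exactly one escape of the form
    // backslash, 'u', four hex digits into the 16-bit char it names.
    static char decodeOne(String escape) {
        if (!escape.matches("\\\\u[0-9A-Fa-f]{4}")) {
            throw new IllegalArgumentException("not a six-character escape: " + escape);
        }
        return (char) Integer.parseInt(escape.substring(2), 16);
    }

    public static void main(String[] args) {
        System.out.println(decodeOne("\\u0041"));  // prints A
    }
}
```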

Arne
