Re: Why No Supplemental Characters In Character Literals?

From	markspace <nospam@nowhere.com>
Newsgroups	comp.lang.java.programmer
Subject	Re: Why No Supplemental Characters In Character Literals?
Date	2011-02-04 11:04 -0800
Organization	A noiseless patient Spider
Message-ID	<iihikc$dpj$1@news.eternal-september.org>
References	<iig4k2$sus$1@lust.ihug.co.nz> <iig6j2$dul$2@news.albasani.net> <iig84e$uqu$1@lust.ihug.co.nz> <iigtgn$ieq$1@news.eternal-september.org> <vihok6l4j8bjpetle24j639im2buguab6m@4ax.com>

On 2/4/2011 10:36 AM, Roedy Green wrote:
> On Fri, 04 Feb 2011 08:04:23 -0500, Joshua Cranmer
> <Pidgeot18@verizon.invalid>  wrote, quoted or indirectly quoted someone
> who said :
>
>> The JLS clearly states that a char is an unsigned 16-bit value.
>
> Perhaps char will be redefined as 32 bits, or a new unsigned 32-bit
> echar type will be invented.

An int is already used for this purpose: the API passes code points
around as int values.  For example, Character.codePointAt(CharSequence,int)
returns the code point at the given index as an int.
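
A minimal sketch of what that looks like in practice. The class name, the CLEF constant and the firstCodePoint helper are made up for illustration; only the Character and String calls are real API. The string holds one supplementary character (U+1D11E) written as its two UTF-16 surrogates:

```java
public class CodePointDemo {
    // U+1D11E MUSICAL SYMBOL G CLEF, spelled as its UTF-16 surrogate pair.
    // (Hypothetical constant, chosen only as a sample supplementary character.)
    static final String CLEF = "\uD834\uDD1E";

    // Hypothetical helper: returns the code point starting at index 0 as an int.
    static int firstCodePoint(CharSequence s) {
        return Character.codePointAt(s, 0);
    }

    public static void main(String[] args) {
        int cp = firstCodePoint(CLEF);
        System.out.println(CLEF.length());            // number of char values: 2
        System.out.println(Integer.toHexString(cp));  // 1d11e
        System.out.println(Character.charCount(cp));  // chars needed for cp: 2
    }
}
```

So the String reports a length of 2, but there is only one code point in it, and that code point only fits in an int.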

<http://download.oracle.com/javase/6/docs/api/java/lang/Character.html>

Also, from that same page, this explains the whole story in one go:

"Unicode Character Representations

"The char data type (and therefore the value that a Character object 
encapsulates) are based on the original Unicode specification, which 
defined characters as fixed-width 16-bit entities. The Unicode standard 
has since been changed to allow for characters whose representation 
requires more than 16 bits. The range of legal code points is now U+0000 
to U+10FFFF, known as Unicode scalar value. (Refer to the definition of 
the U+n notation in the Unicode standard.)

"The set of characters from U+0000 to U+FFFF is sometimes referred to as 
the Basic Multilingual Plane (BMP). Characters whose code points are 
greater than U+FFFF are called supplementary characters. The Java 2 
platform uses the UTF-16 representation in char arrays and in the String 
and StringBuffer classes. In this representation, supplementary 
characters are represented as a pair of char values, the first from the 
high-surrogates range, (\uD800-\uDBFF), the second from the 
low-surrogates range (\uDC00-\uDFFF).

"A char value, therefore, represents Basic Multilingual Plane (BMP) code 
points, including the surrogate code points, or code units of the UTF-16 
encoding. An int value represents all Unicode code points, including 
supplementary code points. The lower (least significant) 21 bits of int 
are used to represent Unicode code points and the upper (most 
significant) 11 bits must be zero.

...etc....
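
To make the surrogate-pair and int-range parts of that quote concrete, here is a rough sketch. The class and method names (SurrogateDemo, encode, countCodePoints) are my own for illustration; the Character and String methods they call have been in the platform since Java 5:

```java
public class SurrogateDemo {
    // Hypothetical helper: UTF-16 encoding of one code point.
    // Returns one char for BMP code points, two (high, low surrogate) otherwise.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    // Hypothetical helper: counts code points rather than char values.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Highest legal code point: becomes the last high/low surrogate pair.
        char[] pair = encode(Character.MAX_CODE_POINT);
        System.out.println(Integer.toHexString(pair[0]));  // dbff
        System.out.println(Integer.toHexString(pair[1]));  // dfff
        System.out.println(Character.isHighSurrogate(pair[0]));
        System.out.println(Character.isLowSurrogate(pair[1]));

        // One BMP character followed by one supplementary character.
        String s = "A" + new String(encode(0x10000));
        System.out.println(s.length());          // char values: 3
        System.out.println(countCodePoints(s));  // code points: 2

        // Anything above U+10FFFF is not a valid code point, even though
        // it fits comfortably in an int.
        System.out.println(Character.isValidCodePoint(0x110000));
    }
}
```

Note that Character.toChars throws IllegalArgumentException if you hand it an int outside the valid code point range, so the "upper 11 bits must be zero" rule is actually enforced there.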
