


Re: Actual width of unicode chracters.

Newsgroups comp.lang.java.help
From "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject Re: Actual width of unicode chracters.
References (2 earlier) <jr9h8n$ark$1@dont-email.me> <jr9j3p$qg0$1@tnews.hananet.net> <ac90eba4-9efe-4c35-9f2b-2590326f1fe5@googlegroups.com> <75foa9-b46.ln1@s.simpson148.btinternet.com> <4fd99db2$0$6118$426a74cc@news.free.fr>
Date 2012-06-14 15:32 +0200
Message-ID <slrnjtjpvk.8bu.hjp-usenet2@hrunkner.hjp.at>



On 2012-06-14 08:26, mayeul.marguet <mayeul.marguet@free.fr> wrote:
> On 14/06/2012 09:06, Steven Simpson wrote:
>> On 13/06/12 22:14, Lew wrote:
>>> Young wrote:
>>>> Thank you for the tries, I don't understand why I should use
>>>> codePointCount() method. The length() method gives same result. I
>>>> want to
>>> Not in general it doesn't.
>>>
>>> Read the Javadocs for the two methods and you'll see why.
>>
>> I've just read it, and not seen any surprises - it doesn't seem to have
>> anything to do with the OP's problem, counting spaces occupied by a
>> character when displayed on a console. Whether a code point takes up two
>> chars inside a program is unrelated to whether it takes up two display
>> positions on a console. Am I missing something?

Right. Counting Java chars is very wrong. Counting code points is less
wrong, but still wrong, since not every code point takes the same amount
of screen space: if we assume a text terminal, a code point may take up
0, 1 or 2 positions. You'll have to loop over the code points and add up
the width of each code point. (A method which does this probably already
exists, but it isn't codePointCount().)
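
The loop described above could look roughly like the sketch below. The
class TerminalWidth and its methods are hypothetical names, not part of
the JDK, and the table of wide ranges is a hand-written approximation in
the spirit of wcwidth(), not the full Unicode East Asian Width data. It
treats combining marks, format and control characters as width 0, a few
well-known CJK and Hangul ranges as width 2, and everything else as 1.

```java
final class TerminalWidth {
    private TerminalWidth() {}

    // Assumption: a hand-picked subset of the ranges that terminals
    // usually render two cells wide. Not a complete East Asian Width table.
    static boolean isWide(int cp) {
        return (cp >= 0x1100 && cp <= 0x115F)                    // Hangul Jamo initial consonants
            || (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F)    // CJK radicals .. Yi
            || (cp >= 0xAC00 && cp <= 0xD7A3)                    // Hangul syllables
            || (cp >= 0xF900 && cp <= 0xFAFF)                    // CJK compatibility ideographs
            || (cp >= 0xFE30 && cp <= 0xFE6F)                    // CJK compatibility forms
            || (cp >= 0xFF00 && cp <= 0xFF60)                    // fullwidth forms
            || (cp >= 0xFFE0 && cp <= 0xFFE6)                    // fullwidth signs
            || (cp >= 0x20000 && cp <= 0x3FFFD);                 // supplementary ideographs
    }

    // Approximate number of terminal cells for one code point: 0, 1 or 2.
    static int cellWidth(int cp) {
        int type = Character.getType(cp);
        if (type == Character.NON_SPACING_MARK
                || type == Character.ENCLOSING_MARK
                || type == Character.FORMAT
                || type == Character.CONTROL) {
            return 0;
        }
        return isWide(cp) ? 2 : 1;
    }

    // Iterate over code points (not chars) and add up the cell widths.
    static int displayWidth(String s) {
        int width = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            width += cellWidth(cp);
            i += Character.charCount(cp);   // 2 for a surrogate pair, else 1
        }
        return width;
    }
}
```

For the string "\uD840\uDC00" (the single ideograph U+20000) length()
is 2 and codePointCount() is 1, while this sketch reports 2 cells; for
"e\u0301" (e plus a combining acute accent) both counts are 2, while
the sketch reports 1 cell.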


>  From the start, what the OP calls a 'width' is actually the number of 
> bytes used to represent the character.
> Korean characters might be big and large, but not to the point that 
> they'd be twice as large as a monospace roman character. Even when using 
> strange fonts where that would happen, they wouldn't be /exactly/ twice 
> as large, and therefore trying to maintain alignment would be futile.

If the OP is trying to align them on a text terminal: No, it wouldn't be
futile. Text terminals have a fixed character grid, and wide Asian
characters occupy 2 character cells. This is what the Unicode wide,
narrow, fullwidth and halfwidth properties are about (somebody already
posted a link to the relevant specs).

Just start a text terminal (xterm, gnome-terminal, konsole, or whatever)
and look at some text with Asian characters.

> Some encodings for korean characters use two bytes for korean characters 
> and one byte for ASCII characters.

Yes, but that's irrelevant for the OP's problem (although in some Asian
encodings the two-byte characters are exactly those which also occupy
two positions on the screen, so converting to such an encoding and
counting the number of bytes would yield the right answer).
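
A minimal sketch of that byte-counting shortcut, assuming EUC-KR as the
encoding and assuming the running JVM supports it (Java only guarantees
a handful of charsets, and EUC-KR is not one of them, although common
JDK builds ship it). The class and method names are hypothetical. Note
that characters which EUC-KR cannot represent are replaced by the
encoder's substitution byte and so count as 1, and that the result only
matches the screen width where the encoding's two-byte characters
really are the double-width ones.

```java
import java.nio.charset.Charset;

final class EucKrWidth {
    private EucKrWidth() {}

    // Estimate terminal cells as the number of bytes in the EUC-KR encoding.
    // Throws UnsupportedCharsetException if the JVM lacks EUC-KR.
    static int widthViaEucKr(String s) {
        Charset eucKr = Charset.forName("EUC-KR");
        return s.getBytes(eucKr).length;
    }
}
```

With this, widthViaEucKr("abc") is 3 and widthViaEucKr("\uD55C\uAE00")
(two Hangul syllables) is 4.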

	hp


-- 
   _  | Peter J. Holzer    | Deprecating human carelessness and
|_|_) | Sysadmin WSR       | ignorance has no successful track record.
| |   | hjp@hjp.at         | 
__/   | http://www.hjp.at/ |  -- Bill Code on asrg@irtf.org
