


Re: Actual width of unicode chracters.

Newsgroups comp.lang.java.help
From "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject Re: Actual width of unicode chracters.
References (4 earlier) <ac90eba4-9efe-4c35-9f2b-2590326f1fe5@googlegroups.com> <75foa9-b46.ln1@s.simpson148.btinternet.com> <4fd99db2$0$6118$426a74cc@news.free.fr> <slrnjtjpvk.8bu.hjp-usenet2@hrunkner.hjp.at> <4fd9f71a$0$1992$426a74cc@news.free.fr>
Date 2012-06-16 20:39 +0200
Message-ID <slrnjtpkmu.ue3.hjp-usenet2@hrunkner.hjp.at>



On 2012-06-14 14:47, mayeul.marguet <mayeul.marguet@free.fr> wrote:
> On 14/06/2012 15:32, Peter J. Holzer wrote:
>> If the OP is trying to align them on a text terminal: No it wouldn't be
>> futile. Text terminals have a fixed character grid, and wide Asian
>> characters occupy 2 character cells. This is what the Unicode wide,
>> narrow, fullwidth and halfwidth properties are about (Somebody already
>> posted a link to the relevant specs).
>>
>> Just start a text terminal (xterm, gnome-terminal, konsole, or whatever)
>> and look at some text with Asian characters.
>
> I'll have to trust you on that for now, but that makes sense. I'll 
> verify later with an up-to-date system.

There is a screenshot on Wikipedia:
http://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms


> Then that would mean that the OP meant exactly what he said and pretty 
> much everything I said and was understood here, was wrong.
> I was still right, though, in implying that codePointCount() is 
> pointless. First because, as you point out, counting chars is wrong. 
> Second because, in the context of this problem, everything is in the BMP 
> and codePointCount() will make no difference with length().

It would still be stupid to assume that all characters are from the
BMP. While it is very tempting to assume that (since all the "important"
characters are in the BMP) it is bound to fail sooner or later.

Doing it right is only marginally more complicated.
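
To illustrate the difference between chars and code points, here is a
minimal sketch using only java.lang.String and java.lang.Character. The
class name CodePointDemo and the sample string are made up for this
example; U+20000 is simply a convenient character from CJK Extension B,
which lies outside the BMP.

```java
final class CodePointDemo {
    /** Number of Unicode code points in s; a surrogate pair counts once. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+20000 is outside the BMP, so Java stores it as a surrogate pair.
        String s = "A" + new String(Character.toChars(0x20000));

        System.out.println(s.length());     // 3 (UTF-16 code units)
        System.out.println(codePoints(s));  // 2 (code points)

        // Walk the string one code point at a time instead of one char at a time.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            System.out.printf("U+%04X%n", cp);
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
    }
}
```

The loop is the "marginally more complicated" part: the only change from
a plain char loop is codePointAt() plus Character.charCount().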


> I guess it's a simple matter of telling fullwidth & non-fullwidth 
> characters apart, then counting them, fullwidth counting for two.
> I am not knowledgeable enough about the Korean language to find out how
> to tell them apart, maybe the list of characters is a known Unicode range?

For Korean, probably yes. But then somebody enters a Chinese name ...
(A Korean would probably think of including the main CJK block. But
would they think of the four extension blocks (3 of which are not in the
BMP)?)
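
A small sketch of how such a hand-picked range check goes wrong, using
Character.UnicodeBlock from Java SE. The class and method names are
hypothetical, and the check is deliberately the naive one ("main CJK
block only"), not a recommendation.

```java
final class BlockCheckDemo {
    /** Naive test: true only for the main CJK Unified Ideographs block. */
    static boolean inMainCjkBlock(int codePoint) {
        return Character.UnicodeBlock.of(codePoint)
                == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
    }

    public static void main(String[] args) {
        int[] samples = {
            0x4E2D,   // main CJK block
            0x3400,   // CJK Extension A (still in the BMP)
            0x20000,  // CJK Extension B (outside the BMP)
        };
        for (int cp : samples) {
            System.out.printf("U+%04X %s main-block=%b%n",
                    cp, Character.UnicodeBlock.of(cp), inMainCjkBlock(cp));
        }
    }
}
```

Both extension characters are ideographs that occupy two cells on a
terminal, yet the naive check rejects them.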

Don't invent a narrow specialized method when a generic method already
exists. In this case the property (East Asian Width) is already defined
for every Unicode character; you just have to use it. (Java SE doesn't
seem to provide a way to get at this information, but a few minutes of
googling turned up icu4j, which seems to provide it:
http://userguide.icu-project.org/strings/properties).
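
A rough sketch of the counting itself, assuming the width lookup is
supplied from outside. The class ColumnWidth and its isWide parameter
are hypothetical; the predicate in main() is a stand-in for
demonstration only. With icu4j on the classpath the predicate would
presumably query the East Asian Width property instead, which as far as
I can tell is exposed as UCharacter.getIntPropertyValue(cp,
UProperty.EAST_ASIAN_WIDTH), treating WIDE and FULLWIDTH as two cells.
The sketch ignores combining and control characters, which a real
terminal would not give a cell of their own.

```java
import java.util.function.IntPredicate;

final class ColumnWidth {
    /**
     * Terminal columns needed for s: two for each code point that
     * isWide accepts, one for every other code point.
     */
    static int columns(String s, IntPredicate isWide) {
        int cols = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            cols += isWide.test(cp) ? 2 : 1;
            i += Character.charCount(cp);  // skip both halves of a surrogate pair
        }
        return cols;
    }

    public static void main(String[] args) {
        // Stand-in predicate (assumption): knows only the two wide
        // characters used below. Replace with a real property lookup.
        IntPredicate demoWide = cp -> cp == 0xD55C || cp == 0x20000;

        // 'a' + Hangul syllable U+D55C + CJK Extension B ideograph U+20000
        String s = "a\uD55C" + new String(Character.toChars(0x20000));
        System.out.println(columns(s, demoWide));  // 5
    }
}
```

Keeping the lookup behind a predicate means the counting loop does not
change when the stand-in is swapped for the real property.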

	hp


-- 
   _  | Peter J. Holzer    | Deprecating human carelessness and
|_|_) | Sysadmin WSR       | ignorance has no successful track record.
| |   | hjp@hjp.at         | 
__/   | http://www.hjp.at/ |  -- Bill Code on asrg@irtf.org



