| Newsgroups | comp.lang.java.help |
|---|---|
| From | "Peter J. Holzer" <hjp-usenet2@hjp.at> |
| Subject | Re: Actual width of unicode chracters. |
| References | (4 earlier) <ac90eba4-9efe-4c35-9f2b-2590326f1fe5@googlegroups.com> <75foa9-b46.ln1@s.simpson148.btinternet.com> <4fd99db2$0$6118$426a74cc@news.free.fr> <slrnjtjpvk.8bu.hjp-usenet2@hrunkner.hjp.at> <4fd9f71a$0$1992$426a74cc@news.free.fr> |
| Date | 2012-06-16 20:39 +0200 |
| Message-ID | <slrnjtpkmu.ue3.hjp-usenet2@hrunkner.hjp.at> |

On 2012-06-14 14:47, mayeul.marguet <mayeul.marguet@free.fr> wrote:
> On 14/06/2012 15:32, Peter J. Holzer wrote:
>> If the OP is trying to align them on a text terminal: No it wouldn't be
>> futile. Text terminals have a fixed character grid, and wide Asian
>> characters occupy 2 character cells. This is what the Unicode wide,
>> narrow, fullwidth and halfwidth properties are about (Somebody already
>> posted a link to the relevant specs).
>>
>> Just start a text terminal (xterm, gnome-terminal, konsole, or whatever)
>> and look at some text with Asian characters.
>
> I'll have to trust you on that for now, but that makes sense. I'll
> verify later with an up-to-date system.

There is a screenshot on wikipedia:
http://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms

> Then that would mean that the OP meant exactly what he said and pretty
> much everything I said and was understood here, was wrong.
> I was still right, though, in implying that codePointCount() is
> pointless. First because, as you point out, counting chars is wrong.
> Second because, in the context of this problem, everything is in the BMP
> and codePointCount() will make no difference with length().

It would still be stupid to assume that all characters are from the BMP.
While it is very tempting to assume that (since all the "important"
characters are in the BMP) it is bound to fail sooner or later. Doing it
right is only marginally more complicated.

> I guess it's a simple matter of telling fullwidth & non-fullwidth
> characters apart, then counting them, fullwidth counting for two.
> I am not knowledgeable enough with korean language to find out how to
> tell them apart, maybe the list of characters is a known unicode range?

For Korean, probably yes. But then somebody enters a Chinese name ...
(A Korean would probably think of including the main CJK block. But
would they think of the four extension blocks (3 of which are not in the
BMP)?)

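To make the BMP point concrete, here is a minimal sketch comparing `length()` with `codePointCount()` for one character from the main CJK block and one from CJK Extension B. The class and method names (`CodePointDemo`, `codeUnits`, `codePoints`) are made up for this illustration; only the `String` methods come from the JDK.

```java
// Illustrative sketch only: CodePointDemo and its method names are hypothetical.
final class CodePointDemo {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts as one.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u4E2D";        // U+4E2D, main CJK block, inside the BMP
        String ext = "\uD840\uDC00";  // U+20000, CJK Extension B, outside the BMP

        System.out.println(codeUnits(bmp) + " " + codePoints(bmp)); // prints "1 1"
        System.out.println(codeUnits(ext) + " " + codePoints(ext)); // prints "2 1"
    }
}
```

For text that stays inside the BMP the two counts agree, which is why the shortcut looks harmless until an Extension B character shows up in a name.
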
Don't invent a narrow specialized method when a generic method already
exists. In this case the property is already defined for every Unicode
character; you just have to use it. (Java SE doesn't seem to provide a
way to get at this information, but a few minutes of googling turned up
icu4j, which seems to provide it:
http://userguide.icu-project.org/strings/properties).

	hp

-- 
   _  | Peter J. Holzer    | Deprecating human carelessness and
|_|_) | Sysadmin WSR       | ignorance has no successful track record.
| |   | hjp@hjp.at         |
__/   | http://www.hjp.at/ |  -- Bill Code on asrg@irtf.org
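
As a footnote to the "fullwidth counting for two" idea discussed in the post, the counting step itself might be sketched as below. Everything named here (`TerminalWidth`, `columns`, `roughlyWide`) is hypothetical. In particular, `roughlyWide` is only a crude JDK-only placeholder so that the sketch runs without extra libraries; it is exactly the kind of hand-rolled range check the post advises against, and a real program would plug in a predicate backed by the Unicode East Asian Width property instead (for instance via ICU4J's `UCharacter.getIntPropertyValue` with `UProperty.EAST_ASIAN_WIDTH`, which is not exercised here). The sketch also ignores zero-width and combining characters and needs Java 8 or later.

```java
import java.util.function.IntPredicate;

// Illustrative sketch only: all names in this class are hypothetical.
final class TerminalWidth {

    // Counts terminal cells: 2 for each code point the predicate reports
    // as wide, 1 for every other code point. Iterating over code points
    // (not chars) keeps characters outside the BMP in one piece.
    static int columns(String s, IntPredicate isWide) {
        return s.codePoints().map(cp -> isWide.test(cp) ? 2 : 1).sum();
    }

    // PLACEHOLDER, an assumption for demonstration only. It approximates
    // "wide" by script and one fullwidth range, and is known to be wrong
    // for some characters (halfwidth Hangul forms, for example). Replace
    // it with a lookup of the East Asian Width property in real code.
    static boolean roughlyWide(int cp) {
        Character.UnicodeScript script = Character.UnicodeScript.of(cp);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HANGUL
            || (cp >= 0xFF01 && cp <= 0xFF60); // fullwidth ASCII variants
    }

    public static void main(String[] args) {
        IntPredicate wide = TerminalWidth::roughlyWide;
        System.out.println(columns("abc", wide));          // prints 3
        System.out.println(columns("A\u4E2D", wide));      // prints 3
        System.out.println(columns("\uD840\uDC00", wide)); // prints 2
    }
}
```

Keeping the width test behind an `IntPredicate` means the loop does not change when the placeholder is swapped for a property-based lookup.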