


Re: ascii char 26

Path csiph.com!x330-a1.tempe.blueboxinc.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!goblin2!goblin.stu.neva.ru!newsfeed1.swip.net!uio.no!ntnu.no!not-for-mail
From Bent C Dalager <bcd@pvv.ntnu.no>
Newsgroups comp.lang.java.programmer
Subject Re: ascii char 26
Date Sun, 11 Sep 2011 23:18:51 +0000 (UTC)
Organization Norwegian university of science and technology
Lines 34
Message-ID <slrnj6qger.r1i.bcd@microbel.pvv.ntnu.no>
References <16f8836c-27b9-483b-a71f-61d7d6cfd188@i2g2000yqm.googlegroups.com>
NNTP-Posting-Host microbel.pvv.ntnu.no
X-Trace orkan.itea.ntnu.no 1315783131 20484 129.241.210.179 (11 Sep 2011 23:18:51 GMT)
X-Complaints-To usenet@ntnu.no
NNTP-Posting-Date Sun, 11 Sep 2011 23:18:51 +0000 (UTC)
User-Agent slrn/pre1.0.0-18 (Linux)
Xref x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:7837



On 2011-09-11, bob <bob@coolgroups.com> wrote:
> Anyone know why ASCII char 26 is used in place of a hyphen in UTF-8?

Unicode has multiple different hyphens and hyphen-like characters.

The traditional ASCII hyphen is the Unicode "hyphen-minus" (U+002D),
which encodes to the single byte 0x2d in UTF-8.
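
To see the difference at the byte level, here is a minimal sketch that prints the UTF-8 encoding of hyphen-minus next to that of an en dash. The class and method names (HyphenBytes, utf8Hex) are made up for illustration; StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

final class HyphenBytes {
    /** Returns the UTF-8 bytes of s as space-separated lowercase hex. */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes do not print as ffffffxx.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("-"));      // hyphen-minus U+002D: prints 2d
        System.out.println(utf8Hex("\u2013")); // en dash U+2013: prints e2 80 93
    }
}
```

Only the first of the two fits in a single ASCII-range byte; the en dash takes three bytes, none of which is a valid ASCII character.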

http://www.fileformat.info/info/unicode/char/2d/index.htm lists the
following additional hyphen-like characters. Your string may actually
contain one of these, and since none of them exists in ASCII, it is
probably being mapped to 26 in your case:

hyphen U+2010
non-breaking hyphen U+2011
figure dash U+2012
en dash U+2013
minus sign U+2212
roman uncia sign U+10191

If hyphens are of particular interest to you, a better approach may be
to replace the non-ASCII hyphens from the above list with
"hyphen-minus" before you transcode to ASCII.
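
A rough sketch of that replacement step, assuming the list above is the set you care about. HyphenFolder, isHyphenLike and foldHyphens are hypothetical names, not library functions. The loop walks code points rather than chars because U+10191 lies outside the BMP and occupies two chars in a Java String.

```java
import java.nio.charset.StandardCharsets;

final class HyphenFolder {
    /** True for the hyphen-like code points listed in the post. */
    static boolean isHyphenLike(int cp) {
        switch (cp) {
            case 0x2010:  // hyphen
            case 0x2011:  // non-breaking hyphen
            case 0x2012:  // figure dash
            case 0x2013:  // en dash
            case 0x2212:  // minus sign
            case 0x10191: // roman uncia sign
                return true;
            default:
                return false;
        }
    }

    /** Replaces every hyphen-like code point with hyphen-minus (U+002D). */
    static String foldHyphens(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(isHyphenLike(cp) ? '-' : cp);
            i += Character.charCount(cp); // 2 for supplementary characters
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String folded = foldHyphens("2010\u20132011 \u2212 a\u2011b");
        System.out.println(folded); // prints 2010-2011 - a-b
        // After folding, this particular string is encodable as plain ASCII.
        System.out.println(
            StandardCharsets.US_ASCII.newEncoder().canEncode(folded)); // prints true
    }
}
```

Any other non-ASCII character in the input is left alone by this sketch, so it will still be substituted when you transcode.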

One would think there ought to be a library function somewhere that
converts a Unicode string to the closest ASCII equivalents of its
characters where possible, and that you should be using that instead.
I don't know if such a function is easily available.
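
One partial approximation, offered as a sketch rather than the function asked for: java.text.Normalizer (Java 6 and later) can apply compatibility decomposition (NFKD), after which the combining marks can be stripped. That handles accented letters and ligatures. As far as I know it does not turn the dashes listed above into hyphen-minus, so a hand-written replacement is still needed for those. AsciiFold and fold are hypothetical names.

```java
import java.text.Normalizer;

final class AsciiFold {
    /**
     * Decomposes s with NFKD and drops combining marks, which leaves
     * the base letters behind. Characters with no decomposition
     * (including most dashes) pass through unchanged.
     */
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Input is "naive" with i-diaeresis, "cafe" with e-acute,
        // and "file" spelled with the fi ligature U+FB01.
        System.out.println(fold("na\u00efve caf\u00e9 \ufb01le")); // prints naive cafe file
    }
}
```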

Cheers,
	Bent D
-- 
Bent Dalager - bcd@pvv.org - http://www.pvv.org/~bcd
                                    powered by emacs



