From: Joshua Cranmer
Newsgroups: comp.lang.java.programmer
Subject: Re: ascii char 26
Date: Sun, 11 Sep 2011 18:37:45 -0500

On 9/11/2011 6:18 PM, Bent C Dalager wrote:
> One would tend to think there ought to be a library function somewhere
> to convert a unicode string to ASCII-supported variants of its various
> characters where possible, that you should be using instead. I don't
> know if such a function is easily available.

This generally falls under the umbrella of Unicode normalization, which
can resolve, e.g., Å (U+212B, the Angstrom symbol) and Å (U+00C5, the
Swedish letter) to the same representation (may require compatibility
normalization). You can do this in Java using the java.text.Normalizer
class.

-- 
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
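
For illustration, here is a minimal sketch of how java.text.Normalizer could be used for this. The class name NormalizeDemo and the helpers toNfc and stripMarks are hypothetical names made up for the example, not part of any library. As far as I know, U+212B has a canonical decomposition to U+00C5, so canonical normalization (NFC) should already be enough for the Angstrom case. The ASCII-folding step (decompose, then drop combining marks) is one common approach under that assumption, not a complete transliteration: letters without a decomposition, such as ø or ß, pass through unchanged.

```java
import java.text.Normalizer;

class NormalizeDemo {
    // Canonical composition (NFC): U+212B ANGSTROM SIGN and U+00C5
    // end up as the same code point.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Hypothetical helper: compatibility decomposition (NFKD) splits
    // accented letters into base letter + combining mark, then the
    // regex removes every character in Unicode category M (marks).
    static String stripMarks(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String angstromSign = "\u212B"; // Angstrom symbol
        String swedishLetter = "\u00C5"; // Latin capital A with ring above

        // The raw strings differ, the normalized ones do not.
        System.out.println(angstromSign.equals(swedishLetter));               // false
        System.out.println(toNfc(angstromSign).equals(toNfc(swedishLetter))); // true

        // Folding an accented word down to its ASCII base letters.
        System.out.println(stripMarks("\u00C5ngstr\u00F6m"));                 // Angstrom
    }
}
```

Whether to use NFC/NFD or the compatibility forms NFKC/NFKD depends on how aggressive the folding should be: the compatibility forms also flatten things like ligatures and full-width characters, which may or may not be wanted.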