From: Joshua Cranmer
Newsgroups: comp.lang.java.programmer
Subject: Re: ascii char 26
Date: Sun, 11 Sep 2011 21:25:53 -0500

On 9/11/2011 9:12 PM, bob wrote:
> You're right. I messed up, and it was the em dash. It turned into 26
> after going thru 'b = html.getBytes("US-ASCII");'
>
> Here's the new code:

Hardcoding a table of character replacements is generally not a good idea; in particular, I don't think it's going to solve your problem. I have also seen sites that use the Unicode ff and fi ligature characters (U+FB00 and U+FB01) instead of relying on the font to form those ligatures automatically, and a hand-written table is unlikely to anticipate every character like that.

If I may ask, why do you need to convert the string to US-ASCII as opposed to UTF-8? US-ASCII is going to cause major problems for the ~90% of the world that doesn't speak English as their main language.

> Also, I'm on Android 2.1, so import java.text.Normalizer; doesn't
> work.

It shouldn't be that hard to find another Java Unicode normalization library out there.
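To make the US-ASCII vs. UTF-8 point concrete, here is a minimal sketch, assuming a desktop Java SE 6+ runtime rather than Android 2.1. The class name AsciiLossDemo and the helpers fitsInAscii and utf8RoundTrip are hypothetical. It shows that getBytes with US-ASCII silently substitutes a replacement byte for each character it cannot map (the exact byte depends on the runtime's encoder, so it need not be 26), that a CharsetEncoder set to REPORT lets you detect such characters instead of losing them, and that UTF-8 round-trips the em dash and the fi ligature intact.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

public class AsciiLossDemo {
    static final Charset ASCII = Charset.forName("US-ASCII");
    static final Charset UTF8 = Charset.forName("UTF-8");

    /** Returns true only if every character in s can be encoded as US-ASCII. */
    static boolean fitsInAscii(String s) {
        // A fresh encoder told to REPORT throws instead of substituting.
        CharsetEncoder strict = ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Encodes to UTF-8 and decodes again; UTF-8 covers all of Unicode. */
    static String utf8RoundTrip(String s) {
        return new String(s.getBytes(UTF8), UTF8);
    }

    public static void main(String[] args) {
        // "a", EM DASH (U+2014), "b", LATIN SMALL LIGATURE FI (U+FB01)
        String html = "a\u2014b\uFB01";

        // getBytes(Charset) puts the charset's replacement byte in place of
        // each unmappable character, so the em dash and ligature are lost.
        System.out.println("US-ASCII bytes: " + Arrays.toString(html.getBytes(ASCII)));
        System.out.println("Fits in US-ASCII? " + fitsInAscii(html));
        System.out.println("UTF-8 round trip intact? " + utf8RoundTrip(html).equals(html));
    }
}
```

The strict encoder is the safer choice if you really must emit ASCII: it tells you which input needs handling rather than quietly corrupting it.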
-- 
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth