From: Arne Vajhøj
Newsgroups: comp.lang.java.programmer
Subject: Re: ascii char 26
Date: Sun, 11 Sep 2011 17:48:02 -0400

On 9/11/2011 5:33 PM, bob wrote:
> Anyone know why ASCII char 26 is used in place of a hyphen in UTF-8?
>
> I had to write this function to deal with this:
>
> public static String convertToAscii(String html) {
>     html = html.replaceAll("\u2019", "'");
>     html = html.replaceAll("\u201D", "\"");
>     html = html.replaceAll("\u201C", "\"");
>
>     byte[] b = null;
>     try {
>         b = html.getBytes("US-ASCII");
>     } catch (UnsupportedEncodingException e) {
>         e.printStackTrace();
>     }
>
>     // hyphen replace
>     for (int ctr = 0; ctr< b.length; ctr++)
>         if (b[ctr] == 26)
>             b[ctr] = 45;
>
>     html = new String(b);
>     return html;
> }

ASCII code 26 (the SUB control character, Ctrl-Z) is not, in general,
used in place of a hyphen.

If you are asking why some code may do it: in some contexts (usually
on the Windows platform), ASCII code 26 indicates EOF.

Arne
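
For comparison, the quoted function could be sketched without the round trip through a byte array. This is only a sketch under stated assumptions: the class name AsciiCleaner and the method toAscii are hypothetical, the mapping of the en dash and em dash to a hyphen is a guess at what the original text contained, and the mapping of code 26 to a hyphen simply mirrors the quoted code rather than any general rule. The main method also shows that String.getBytes with US-ASCII turns an unmappable character into '?' (63), so that call by itself would not produce a 26.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative and not from the thread.
final class AsciiCleaner {

    // Map a few typographic characters to ASCII look-alikes.
    // Every other non-ASCII code point becomes '?'.
    static String toAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case 0x2018: // left single quotation mark
                case 0x2019: // right single quotation mark
                    out.append('\'');
                    break;
                case 0x201C: // left double quotation mark
                case 0x201D: // right double quotation mark
                    out.append('"');
                    break;
                case 0x2013: // en dash (assumed to be the "hyphen" in question)
                case 0x2014: // em dash (same assumption)
                case 0x1A:   // SUB, code 26: mirrors the quoted code only
                    out.append('-');
                    break;
                default:
                    out.append(cp < 128 ? (char) cp : '?');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An em dash is unmappable in US-ASCII; the encoder substitutes '?'.
        byte[] b = "a\u2014b".getBytes(StandardCharsets.US_ASCII);
        System.out.println(b[1]); // prints 63

        System.out.println(toAscii("\u201CIt\u2019s here\u201D \u2014 ok"));
        // prints "It's here" - ok
    }
}
```

Working on code points instead of bytes keeps the platform default charset out of the picture, which the quoted new String(b) call depends on.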